Calculates the pairwise cosine dissimilarity between the columns of a matrix.

Usage

cosine(x, weighted = TRUE, threads = 1)

Arguments

x

A matrix, sparseMatrix or Matrix.

weighted

A boolean value: use abundances (weighted = TRUE) or presence/absence (weighted = FALSE) (default: TRUE).

threads

A whole number, the number of threads to use in setThreadOptions (default: 1).

Value

A column x column dist object.

Details

The cosine dissimilarity between two samples \(A\) and \(B\), each of length \(n\), is defined as:

\(d(A,B) = 1 - \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}}\)

where \(A_i\) and \(B_i\) are the abundances of the \(i\)-th feature in samples \(A\) and \(B\), respectively. When weighted is set to FALSE, abundances are replaced by presence/absence data before the dissimilarity is computed.
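
As an illustration of the formula above, here is a minimal base R sketch of the same computation on a small dense matrix. The helper `cosine_sketch` and the toy matrix `m` are hypothetical names introduced only for this example; this is not the package's implementation, which may differ (for instance in how it handles sparse input, threading, or all-zero columns).

```r
# Hypothetical helper, NOT part of OmicFlow: a plain base R version of the
# cosine dissimilarity formula, computed pairwise between columns.
cosine_sketch <- function(x, weighted = TRUE) {
  x <- as.matrix(x)
  if (!weighted) {
    # Assumption: presence/absence is encoded as 1 (value > 0) or 0.
    x <- (x > 0) * 1
  }
  dots  <- crossprod(x)        # t(x) %*% x: dot products between all column pairs
  norms <- sqrt(diag(dots))    # Euclidean norm of each column
  d     <- 1 - dots / outer(norms, norms)
  as.dist(d)                   # keep the lower triangle as a dist object
}

# Toy data: columns are samples, rows are features.
m <- cbind(A = c(1, 0, 2), B = c(2, 0, 4), C = c(0, 3, 0))

cosine_sketch(m)                    # abundance-based
cosine_sketch(m, weighted = FALSE)  # presence/absence-based
```

In this toy matrix, columns A and B are proportional, so their dissimilarity is 0 up to floating-point rounding, while A and C share no features and have a dissimilarity of 1. Note that this sketch returns NaN for any pair involving an all-zero column, because the denominator is then zero.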

References

Deza, M. M., & Deza, E. (2009). Encyclopedia of Distances. Springer Science & Business Media, p. 308.

Examples

library("OmicFlow")

metadata_file <- system.file("extdata", "metadata.tsv", package = "OmicFlow")
counts_file <- system.file("extdata", "counts.tsv", package = "OmicFlow")
features_file <- system.file("extdata", "features.tsv", package = "OmicFlow")
tree_file <- system.file("extdata", "tree.newick", package = "OmicFlow")

taxa <- metagenomics$new(
    metaData = metadata_file,
    countData = counts_file,
    featureData = features_file,
    treeData = tree_file
)
#>  metaData template passed the JSON validation.
#>  Checking for duplicated identifiers ..
#>  featureData is loaded.
#>  countData is loaded.
#>  treeData is loaded.
#>  Final steps .. cleaning & creating back-up
#> 
#> ── <metagenomics> object 
#> metaData: 9 variables × 4 samples
#> countData: 4 samples × 242 features
#> featureData: 7 attributes × 242 features
#> treeData: 242 tips × 241 nodes

taxa$feature_subset(Kingdom == "Bacteria")
#> 
#> ── <metagenomics> object 
#> metaData: 9 variables × 4 samples
#> countData: 4 samples × 185 features
#> featureData: 7 attributes × 185 features
#> treeData: 185 tips × 184 nodes

taxa$scale(method = "tss")

cosine(taxa$countData)
#>           S100      S103      S115
#> S103 1.0000000                    
#> S115 0.7676911 1.0000000          
#> S120 1.0000000 0.9093691 1.0000000