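Yet another tool from the Curtis Huttenhower lab~
MaAsLin2 is a comprehensive R package for efficiently determining multivariable associations between clinical metadata and microbiome features. It relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal ones, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the successor to MaAsLin.
MaAsLin2 needs two input files, a feature abundance table and a metadata table: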
The data file can contain samples not included in the metadata file (along with the reverse case). For both cases, those samples not included in both files will be removed from the analysis. Also the samples do not need to be in the same order in the two files.
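- Data (or features) file
  Tab-delimited; samples as rows and features as columns (the transpose also works); possible features include microbes, genes, pathways, etc.
- Metadata file
  Tab-delimited; samples as rows and metadata variables as columns (the transpose also works); variables can be categorical or continuous.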
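The sample-matching rule described above (only samples present in both files are kept, and row order does not matter) can be pictured with a few lines of base R. This is only a sketch on made-up toy tables, not MaAsLin2's internal code; the sample IDs, column names, and values are all hypothetical.

```r
# Toy tables standing in for the two input files (hypothetical values).
features <- data.frame(
  taxon_a = c(0.10, 0.25, 0.00, 0.40),
  taxon_b = c(0.90, 0.75, 1.00, 0.60),
  row.names = c("S1", "S2", "S3", "S4")
)
metadata <- data.frame(
  diagnosis = c("CD", "nonIBD", "UC"),
  age = c(34, 51, 27),
  row.names = c("S4", "S2", "S5")  # different order; S5 has no feature data
)

# Keep only the samples that appear in both tables.
shared_samples <- intersect(rownames(features), rownames(metadata))

# Index both tables by the shared names so their rows line up.
features_matched <- features[shared_samples, , drop = FALSE]
metadata_matched <- metadata[shared_samples, , drop = FALSE]

print(shared_samples)
```

Here S1 and S3 (no metadata) and S5 (no feature data) drop out, leaving S2 and S4 in the same order in both tables.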
Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Maaslin2")
Usage
library(Maaslin2)
## Locate the example input files shipped with the package
input_data = system.file(
    "extdata", "HMP2_taxonomy.tsv", package="Maaslin2") # The abundance table file
input_data
input_metadata = system.file(
    "extdata", "HMP2_metadata.tsv", package="Maaslin2") # The metadata table file
input_metadata
## Read in the example data to take a look at it
df_input_data = read.table(file = input_data, header = TRUE, sep = "\t",
                           row.names = 1,
                           stringsAsFactors = FALSE)
df_input_data[1:5, 1:5]
df_input_metadata = read.table(file = input_metadata, header = TRUE, sep = "\t",
                               row.names = 1,
                               stringsAsFactors = FALSE)
df_input_metadata[1:5, ]
## Run Maaslin2; results are written to the "demo_output" folder
fit_data = Maaslin2(
    input_data = input_data,
    input_metadata = input_metadata,
    output = "demo_output",
    fixed_effects = c("diagnosis", "dysbiosis"))
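Before a run like the one above, it can help to confirm that every name passed to fixed_effects is really a column of the metadata table. The check below is a small base R sketch on a made-up metadata frame (hypothetical sample IDs and values); it is a suggested pre-flight step, not something MaAsLin2 requires.

```r
# Toy metadata table (hypothetical values) with the two variables used above.
metadata <- data.frame(
  diagnosis = c("CD", "nonIBD", "UC"),
  dysbiosis = c(0.8, 0.1, 0.5),
  row.names = c("S1", "S2", "S3")
)

fixed_effects <- c("diagnosis", "dysbiosis")

# Names requested as fixed effects that are not metadata columns.
missing_vars <- setdiff(fixed_effects, colnames(metadata))

if (length(missing_vars) > 0) {
  stop("Not found in metadata: ", paste(missing_vars, collapse = ", "))
}
```

With the real example data, the same check could be run against df_input_metadata.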
Output files
- Significant associations
  - metadata: the variable name being associated with a microbial feature.
  - feature: the microbial feature (taxon, gene, pathway, etc.).
  - value: for categorical features, the specific feature level for which the coefficient and significance of association is being reported.
  - coef: the model coefficient value (effect size).
    - Coefficients for categorical variables indicate the contrast between the category specified in value versus the reference category.
    - MaAsLin2 by default sets the first category in alphabetical order as the reference. See 4.1 Setting Reference Levels on how to change this behavior.
  - stderr: the standard error from the model.
  - N: the total number of samples used in the model for this association (since e.g. missing values can be excluded).
  - N.not.0: the total number of these samples in which the feature is non-zero.
  - pval: the nominal significance of this association.
  - qval: the corrected significance, computed with p.adjust using the chosen correction method (BH, etc.).
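Two of the columns above are easier to read with a small base R illustration: how coef depends on the reference category, and how qval comes out of p.adjust. The sketch below uses made-up numbers and a plain lm() fit; it is not MaAsLin2's own pipeline (which also normalizes and transforms the data), and relevel() here is just the generic R way to change a factor's reference, assumed for illustration.

```r
# Toy data (hypothetical values): one feature, one categorical variable.
toy <- data.frame(
  abundance = c(1.0, 1.2, 0.8, 2.0, 2.2, 1.8, 3.0, 3.2, 2.8),
  diagnosis = factor(rep(c("CD", "UC", "nonIBD"), each = 3))
)

# factor() orders levels alphabetically, so "CD" is the reference here.
reference_default <- levels(toy$diagnosis)[1]

# Each coefficient is a contrast against the reference category.
fit_default <- lm(abundance ~ diagnosis, data = toy)
coef_nonibd_vs_cd <- unname(coef(fit_default)["diagnosisnonIBD"])

# Changing the reference flips which contrast is reported.
toy$diagnosis_ref <- relevel(toy$diagnosis, ref = "nonIBD")
fit_releveled <- lm(abundance ~ diagnosis_ref, data = toy)
coef_cd_vs_nonibd <- unname(coef(fit_releveled)["diagnosis_refCD"])

# Nominal p-values (hypothetical) corrected with Benjamini-Hochberg.
pvals <- c(0.001, 0.01, 0.03, 0.04, 0.5)
qvals <- p.adjust(pvals, method = "BH")

print(c(coef_nonibd_vs_cd, coef_cd_vs_nonibd))
print(qvals)
```

With CD as the reference, the nonIBD coefficient is the difference in group means (3.0 - 1.0); after releveling to nonIBD, the CD coefficient is the same contrast with the opposite sign.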