limma/voom,edgeR,DESeq2分析注意事项,差异分析表达矩阵与分组信息

给粉丝朋友们带来了很多理解上的挑战,所以我们开辟专栏慢慢介绍其中的一些概念性的问题,上一期: 箱线图的生物学含义

这一讲我们来说一下limma/voom,edgeR,DESeq2,转录组差异分析的三大R包!

差异分析的第一步是要构建符合不同模型的R对象,主要包括两部分的信息:表达矩阵和分组信息。 这次主要讨论一下limma/voom,edgeR,DESeq2是转录组差异分析的三大R包的表达矩阵和分组矩阵构建,主要针对二分组转录组数据的差异分析。

一、limma和edgeR包的表达矩阵和分组信息

1.limma和edgeR包DEGList对象的构建

limma和edgeR包都是由一个研究团队开发,方法之间互相继承。edgeR是专门针对转录组数据开发的,limma包最早是用来进行芯片数据的差异分析,对转录组数据差异分析的功能是后来添加的,表达矩阵的构建方法直接使用edgeR包中的DGEList函数

DEGList函数的参数示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">DGEList(counts = matrix(0, 0, 0), lib.size = colSums(counts),
norm.factors = rep(1,ncol(counts)), samples = NULL,
group = NULL, genes = NULL, remove.zeros = FALSE)</pre>

使用airway中的转录组表达矩阵来演示

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># BiocManager::install(c("airway", "edgeR"))

library(airway)
data(airway)

获取基因counts矩阵

exprSet <- assay(airway)
exprSet[1:6,1:6]
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517
ENSG00000000003 679 448 873 408 1138 1047
ENSG00000000005 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799
ENSG00000000457 260 211 263 164 245 331
ENSG00000000460 60 55 40 35 78 63
ENSG00000000938 0 0 2 0 1 0
exprSet <- assay(airway)

获取分组信息

group_list <- colData(airway)$dex
group_list
[1] untrt trt untrt trt untrt trt untrt trt
Levels: trt untrt</pre>

使用 DEGList函数构建limma和edgeR包需要的输入矩阵:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> dge <- edgeR::DGEList(counts=exprSet)

dge
An object of class "DGEList"
$counts
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520
ENSG00000000003 679 448 873 408 1138 1047 770
ENSG00000000005 0 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799 417
ENSG00000000457 260 211 263 164 245 331 233
ENSG00000000460 60 55 40 35 78 63 76
SRR1039521
ENSG00000000003 572
ENSG00000000005 0
ENSG00000000419 508
ENSG00000000457 229
ENSG00000000460 60
64097 more rows ...

$samples
group lib.size norm.factors
SRR1039508 1 20637971 1
SRR1039509 1 18809481 1
SRR1039512 1 25348649 1
SRR1039513 1 15163415 1
SRR1039516 1 24448408 1
SRR1039517 1 30818215 1
SRR1039520 1 19126151 1
SRR1039521 1 21164133 1</pre>

可以看到包含了counts矩阵和一些其他用于差异分析要使用的信息,还可以把分组信息添加进来。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> DEG <- edgeR::DGEList(counts=exprSet,group=factor(group_list))

DEG
An object of class "DGEList"
$counts
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520
ENSG00000000003 679 448 873 408 1138 1047 770
ENSG00000000005 0 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799 417
ENSG00000000457 260 211 263 164 245 331 233
ENSG00000000460 60 55 40 35 78 63 76
SRR1039521
ENSG00000000003 572
ENSG00000000005 0
ENSG00000000419 508
ENSG00000000457 229
ENSG00000000460 60
64097 more rows ...

$samples
group lib.size norm.factors
SRR1039508 untrt 20637971 1
SRR1039509 trt 18809481 1
SRR1039512 untrt 25348649 1
SRR1039513 trt 15163415 1
SRR1039516 untrt 24448408 1
SRR1039517 trt 30818215 1
SRR1039520 untrt 19126151 1
SRR1039521 trt 21164133 1</pre>

这些使用方法适用于绝大多数limma和edgeR包差异分析的表达矩阵构建。

2.limma和edgeR包分组矩阵的设置

limma和edgeR的假设都是数据符合正态分布,构建线性模型。 使用model.matrix函数构建分组信息的矩阵,就是将分组信息二值化,用0和1构成的矩阵来代表不同的分组信息

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> design <- model.matrix(~0+factor(group_list))

colnames(design) <- levels(factor(group_list))
rownames(design) <- colnames(exprSet)
design
trt untrt
SRR1039508 0 1
SRR1039509 1 0
SRR1039512 0 1
SRR1039513 1 0
SRR1039516 0 1
SRR1039517 1 0
SRR1039520 0 1
SRR1039521 1 0
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"</pre>

需要注意的一点是下面两种构建模型矩阵的区别

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> design <- model.matrix(~factor(group_list))

design
(Intercept) factor(group_list)untrt
1 1 1
2 1 0
3 1 1
4 1 0
5 1 1
6 1 0
7 1 1
8 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"

design <- model.matrix(~0+factor(group_list))
design
factor(group_list)trt factor(group_list)untrt
1 0 1
2 1 0
3 0 1
4 1 0
5 0 1
6 1 0
7 0 1
8 1 0
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"</pre>

第一种方法是将第一列的分组信息作为线性模型的截距,第二列开始依次与第一列比较,通过coef参数可以把差异分析结果依次提取出来。

第二种方法,仅仅是分组信息而已,需要通过makeContrasts函数来制作差异比较矩阵控制。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> # 通过makeContrasts设置需要进行对比的分组

comp='trt-untrt'
cont.matrix <- makeContrasts(contrasts=c(comp),levels = design)
cont.matrix
Contrasts
Levels trt-untrt
trt 1
untrt -1</pre>

二、DESeq2包的表达矩阵和分组信息的构建

1.DESeq2包输入文件:DESeqDataSet对象的制作

构建DESeqDataSet函数的参数示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">DESeqDataSet(se, design, ignoreRank = FALSE)

DESeqDataSetFromMatrix(countData, colData, design, tidy = FALSE,
ignoreRank = FALSE, ...)

DESeqDataSetFromHTSeqCount(sampleTable, directory = ".", design,
ignoreRank = FALSE, ...)

DESeqDataSetFromTximport(txi, colData, design, ...)</pre>

DESeqDataSet使用示例:从SummarizedExperiment流程中产生的数据导入

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> library(airway)

data(airway)
ddsSE <- DESeq2::DESeqDataSet(airway, design = ~ cell + dex)
ddsSE
class: DESeqDataSet
dim: 64102 8
metadata(2): '' version
assays(1): counts
rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(9): SampleName cell ... Sample BioSample
</pre>

DESeqDataSetFromMatrix使用示例:从count矩阵中构建DESeqDataSet:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> colData <- data.frame(row.names=colnames(exprSet),group_list=group_list)

dds <- DESeq2::DESeqDataSetFromMatrix(countData = exprSet,colData = colData, design = ~ group_list)
dds

class: DESeqDataSet
dim: 64102 8
metadata(1): version
assays(1): counts
rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(1): group_list</pre>

DESeqDataSetFromHTSeqCount使用示例:从HTSeqCount中导入数据

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 指定表达矩阵文件所在的文件夹

directory <- system.file("extdata", package="pasilla",

  •                      mustWork=TRUE)
    

directory
[1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/pasilla/extdata"
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleFiles
[1] "treated1fb.txt" "treated2fb.txt" "treated3fb.txt" "untreated1fb.txt"
[5] "untreated2fb.txt" "untreated3fb.txt" "untreated4fb.txt"
sampleCondition <- sub("(.treated).","\1",sampleFiles)
sampleCondition
[1] "treated" "treated" "treated" "untreated" "untreated" "untreated" "untreated"

构建包含表达矩阵文件信息的数据框

sampleTable <- data.frame(sampleName = sampleFiles,

  •                       fileName = sampleFiles,
    
  •                       condition = sampleCondition)
    

导入为DESeqDataSet对象

ddsHTSeq <- DESeq2::DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,

  •                                    directory = directory,
    
  •                                    design= ~ condition)
    

ddsHTSeq
class: DESeqDataSet
dim: 70463 7
metadata(1): version
assays(1): counts
rownames(70463): FBgn0000003:001 FBgn0000008:001 ... FBgn0261575:001
FBgn0261575:002
rowData names(0):
colnames(7): treated1fb.txt treated2fb.txt ... untreated3fb.txt untreated4fb.txt
colData names(1): condition</pre>

DESeqDataSetFromTximport使用示例:通过Tximport导入不基于比对的基因定量矩阵,主要是以下四个常用软件

  • Salmon (Patro et al. 2017)
  • Sailfish (Patro, Mount, and Kingsford 2014)
  • kallisto (Bray et al. 2016)
  • RSEM (Li and Dewey 2011)

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 加载R包及示例数据

library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")

设置分组信息

group_list <- factor(rep(c("A","B"),each=3))

获取表达矩阵所在的文件夹,salmon的结果为例

files <- list.files(file.path(dir,"salmon"),pattern = "*quant.sf.gz", recursive = TRUE)
full_files_path <- file.path(dir,"salmon",files)

读入转录本和基因对应列表

tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))
Parsed with column specification:
cols(
TXNAME = col_character(),
GENEID = col_character()
)
head(tx2gene)

A tibble: 6 x 2

TXNAME GENEID
<chr> <chr>
1 ENST00000456328.2 ENSG00000223972.5
2 ENST00000450305.2 ENSG00000223972.5
3 ENST00000473358.1 ENSG00000243485.5
4 ENST00000469289.1 ENSG00000243485.5
5 ENST00000607096.1 ENSG00000284332.1
6 ENST00000606857.1 ENSG00000268020.3

txi倒入需要两个参数:表达矩阵所在路径和基因转录本对应的列表

txi <- tximport(full_files_path, type="salmon", tx2gene=tx2gene)
reading in files with read_tsv
1 2 3 4 5 6
summarizing abundance
summarizing counts
summarizing length
sampleName<- c("ERR188021", "ERR188088", "ERR188288","ERR188297", "ERR188329", "ERR188356")
colData <- cbind(sampleName, group_list)
ddsTxi <- DESeq2::DESeqDataSetFromTximport(txi,

  •                                        colData = colData,
    
  •                                        design = ~ group_list)
    

using counts and average transcript lengths from tximport

ddsTxi
class: DESeqDataSet
dim: 58288 6
metadata(1): version
assays(2): counts avgTxLength
rownames(58288): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000284747.1
ENSG00000284748.1
rowData names(0):
colnames: NULL
colData names(2): sampleName group_list</pre>

2.DESeq2分组信息的设置

DESeq2的差异分析的分组信息设置比较简单,主要通过resuls函数实现

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">results(object, contrast, name, lfcThreshold = 0,
altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
listValues = c(1, -1), cooksCutoff, independentFiltering = TRUE,
alpha = 0.1, filter, theta, pAdjustMethod = "BH", filterFun,
format = c("DataFrame", "GRanges", "GRangesList"), test,
addMLE = FALSE, tidy = FALSE, parallel = FALSE,
BPPARAM = bpparam(), minmu = 0.5)

resultsNames(object)

removeResults(object)</pre>

使用示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">results(dds, contrast=c("condition","C","B"))</pre>

dds代表DESeq2得到了差异分析的结果,contrast的输入一个长度为3的向量,与上面构建DESeqDataSet时输入的分组信息对应。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 如输入的分组信息是如下的因子向量

group_list
[1] A A A B B B
Levels: A B

提取A和B差异分析结果的示例如下,A代表对照组,B代表处理组,注意先后顺序,与edgeR正好相反

results(dds, contrast=c("group_list","B","A"))</pre>

三、总结

limma,edgeR,DESeq2三大包基本是做转录组差异分析的金标准,大多数转录组的文章都是用这三个R包进行差异分析。 edgeR差异分析速度快,得到的基因数目比较多,假阳性高(实际不差异结果差异)。 DESeq2差异分析速度慢,得到的基因数目比较少,假阴性高(实际差异结果不差异)。 需要注意的是制作分组信息的因子向量是,因子水平的前后顺序,在R的很多模型中,默认将因子向量的第一个水平看作对照组

四、假如是多个分组呢

比如,大家都知道,TCGA的乳腺癌可以分成PAM50的5类,那么差异分析就复杂了,大家可以拿我3年前的WGCNA的教程做例子,下面是分组信息啦

image

这个时候有两个策略来做差异分析,当然,分组比较多的时候,差异分析并不是最好的策略啦,WGCNA等其它算法更好!

策略1:在分组信息里面挑选

参考我GitHub代码, https://github.com/jmzeng1314/my-R/tree/master/10-RNA-seq-3-groups

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">group_list
cont.matrix=makeContrasts(contrasts=c('treat_12-control','treat_2-control'),levels = design)
fit2=contrasts.fit(fit,cont.matrix)
fit2=eBayes(fit2)

tempOutput = topTable(fit2, coef='treat_12-control', n=Inf)
DEG_treat_12_limma_voom = na.omit(tempOutput)
write.csv(DEG_treat_12_limma_voom,"DEG_treat_12_limma_voom.csv",quote = F)

tempOutput = topTable(fit2, coef='treat_2-control', n=Inf)
DEG_treat_2_limma_voom = na.omit(tempOutput)
write.csv(DEG_treat_2_limma_voom,"DEG_treat_2_limma_voom.csv",quote = F)</pre>

在提取差异分析结果的时候,需要指定是哪个组和哪个组在进行比较。

值得一提的是, 我的GitHub简直就是宝藏,我上面提到的3年前的WGCNA的教程做例子,最近看到有两个文章就拿同样的数据代码和图片发了一个4分,一个5分的文章!!!

你懂得!

策略2:提取子矩阵和子分组信息

这个很容易理解了,把表达矩阵根据自己想要进行的两两比对来筛选即可,这样就可以多次做差异分析啦,而且保证每次都只有两个分组。

参考资料

http://www.bio-info-trainee.com/1514.html http://www.bio-info-trainee.com/255.html http://www.bio-info-trainee.com/1533.html https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html http://www.bioconductor.org/packages/release/bioc/html/limma.html http://www.bioconductor.org/packages/release/bioc/html/edgeR.html http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html https://rdrr.io/bioc/DESeq2/man/DESeqDataSet.html https://rdrr.io/bioc/DESeq2/man/results.html https://rdrr.io/bioc/limma/man/11RNAseq.html

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 194,319评论 5 459
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 81,801评论 2 371
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 141,567评论 0 319
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 52,156评论 1 263
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 61,019评论 4 355
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 46,090评论 1 272
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 36,500评论 3 381
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 35,192评论 0 253
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 39,474评论 1 290
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 34,566评论 2 309
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 36,338评论 1 326
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 32,212评论 3 312
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 37,572评论 3 298
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 28,890评论 0 17
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,169评论 1 250
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 41,478评论 2 341
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 40,661评论 2 335

推荐阅读更多精彩内容