在运行ESTIMATE包的时候,需要先把表达矩阵输出成txt,然后运行filterCommonGenes函数,转换成gct文件,在转换这一步总是报错,错误提示是:
> filterCommonGenes(input.f="./step_7_estimate/exp.txt",
+ output.f="exp_genes.gct",
+ id="GeneSymbol")
[1] "Merged dataset includes 0 genes (10412 mismatched)."
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
排除了变量类型(numric)的问题,打开txt文件看了一眼,发现行名和列名多了很多引号,于是回过头去看输出txt的代码:
write.table(a,file="./step_7_estimate/exp.txt")
突发奇想加上了sep="\t",于是不报错了,但是依然显示没有merge到基因:
> write.table(a,file="./step_7_estimate/exp.txt",sep="\t")
> filterCommonGenes(input.f="./step_7_estimate/exp.txt",
+ output.f="exp_genes.gct",
+ id="GeneSymbol")
[1] "Merged dataset includes 0 genes (10412 mismatched)."
但是手动grep一下,发现很多基因其实是可以找到的
> grep("DCN",rownames(a))
[1] 4783
> grep("THBS2",rownames(a))
[1] 1903
于是去看了filterCommonGenes这个函数的源码
> filterCommonGenes
function (input.f, output.f, id = c("GeneSymbol", "EntrezID"))
{
stopifnot((is.character(input.f) && length(input.f) == 1 &&
nzchar(input.f)) || (inherits(input.f, "connection") &&
isOpen(input.f, "r")))
stopifnot((is.character(output.f) && length(output.f) ==
1 && nzchar(output.f)))
id <- match.arg(id)
input.df <- read.table(input.f, header = TRUE, row.names = 1,
sep = "\t", quote = "", stringsAsFactors = FALSE)
merged.df <- merge(common_genes, input.df, by.x = id, by.y = "row.names")
rownames(merged.df) <- merged.df$GeneSymbol
merged.df <- merged.df[, -1:-ncol(common_genes)]
print(sprintf("Merged dataset includes %d genes (%d mismatched).",
nrow(merged.df), nrow(common_genes) - nrow(merged.df)))
outputGCT(merged.df, output.f)
}
<bytecode: 0x000002a43c866268>
<environment: namespace:estimate>
把变量名赋值好之后,尝试分行run一下,在读取表达矩阵这一步发现了问题
> input.df <- read.table(input.f, header = TRUE, row.names = 1,
+ sep = "\t", quote = "", stringsAsFactors = FALSE)
可以看到读取的时候其实是有很多参数的,除了sep="\t"之外,还有quete=“”,而且发现读进来的表达矩阵,行名和列名都有引号。所以可能是引号的问题。
在write.table函数里加上quote=“”这个参数看一下
write.table(a,file="./step_7_estimate/exp.txt",sep="\t",
+ quote = F)
> filterCommonGenes(input.f="./step_7_estimate/exp.txt",
+ output.f="exp_genes.gct",
+ id="GeneSymbol")
[1] "Merged dataset includes 7099 genes (3313 mismatched)."
问题解决了,merge到3313个基因。