I have to give R a shout-out: it really is fantastic! Everything below is done in R.
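First install biomaRt: BiocManager::install('biomaRt') (if BiocManager itself is not installed yet, run install.packages('BiocManager') first)
Then load it: library(biomaRt)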
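The idea is to use the Ensembl gene ID (ENSG) as the intermediate key for the conversion: SNP → Ensembl gene ID → gene name.
This needs two datasets: hsapiens_snp and hsapiens_gene_ensembl.
However, the attribute names you need are not the same in these two datasets. You can list them with listAttributes().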
Here is what I ran (dbsnp and mart are the marts created with useMart() in the example below):
# list every attribute of each mart and save it as a real comma-separated file
SCdbsnp <- listAttributes(dbsnp)
write.csv(SCdbsnp, file = "listAttributessnp.csv", row.names = FALSE)
SC <- listAttributes(mart)
write.csv(SC, file = "listAttributes.csv", row.names = FALSE)
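Scrolling through hundreds of attribute rows is tedious, so it can help to filter the listAttributes() output by keyword. Below is a minimal sketch: find_attributes() is a hypothetical helper (not part of biomaRt), and attrs_demo is a made-up stand-in for the data frame listAttributes() returns (columns name, description, page), so the example runs offline.

```r
# Hypothetical helper (not a biomaRt function): keep the rows of a
# listAttributes()-style data frame whose name or description matches a pattern.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, c("name", "description"), drop = FALSE]
}

# Made-up stand-in for listAttributes(dbsnp), so the sketch needs no network.
attrs_demo <- data.frame(
  name = c("refsnp_id", "ensembl_gene_stable_id", "chr_name"),
  description = c("SNP identifier (placeholder)",
                  "Gene ID (placeholder)",
                  "Chromosome (placeholder)"),
  page = "snp",
  stringsAsFactors = FALSE
)

find_attributes(attrs_demo, "gene")
```

In a live session this would be find_attributes(listAttributes(dbsnp), "gene"). Recent biomaRt versions also provide searchAttributes(mart, pattern), which does a similar search against the mart directly.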
Once you have decided which pieces of information you want, you are ready to go!!!
Here is my own example:
dbsnp <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")  # name this mart dbsnp
snps <- read.csv("EPUrs.csv", header = TRUE, sep = ",")[, 1]  # load my CSV file; column 1 holds the rs IDs, so take it and call it snps
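Before sending the rs IDs to getBM(), it can be worth tidying them, since spreadsheet exports often carry stray spaces, blank cells, or duplicates. This is a minimal sketch; clean_rsids() is a hypothetical helper, and it assumes every valid ID looks like a lowercase "rs" followed by digits.

```r
# Hypothetical helper: tidy a vector of rs IDs before querying biomaRt.
# Assumes valid IDs match "rs" + digits (lowercase prefix).
clean_rsids <- function(x) {
  x <- trimws(as.character(x))       # strip stray spaces; also handles factors
  x <- x[!is.na(x) & x != ""]        # drop blank cells and NAs
  x <- x[grepl("^rs[0-9]+$", x)]     # keep only well-formed rs IDs
  unique(x)                          # query each SNP only once
}

clean_rsids(c(" rs123", "rs123", "", NA, "chr1:12345", "rs456 "))
# [1] "rs123" "rs456"
```

With the file above this would be snps <- clean_rsids(read.csv("EPUrs.csv", header = TRUE)[, 1]).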
getsnps <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "consequence_type_tv", "study_type",
                                "study_external_ref", "study_description", "associated_gene", "phenotype_name",
                                "phenotype_description", "doi"),
                 filters = "snp_filter", values = snps, mart = dbsnp)  # query with getBM: returns the rs ID, gene ID, etc.; call the result getsnps
write.table(getsnps, file = "EPUsnpstd.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)  # write the result to EPUsnpstd.csv (sep = "\t" makes it tab-separated text)
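This returns a lot of SNP-related information. Next, use the gene IDs to fetch gene information; the procedure is basically the same as above.
!!! Note: although the file is named .csv, sep = "\t" makes it tab-separated, so before moving on, split the text into columns (e.g. with Excel's Text to Columns) and save it again as a CSV !!!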
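As an alternative to the manual splitting step, the tab-separated file written by R can be read back directly with read.delim(), and the gene-ID column picked by name rather than by position. This is a sketch under assumptions: extract_gene_ids() is a hypothetical helper, it expects the file exactly as write.table() produced it (not the Excel re-save), and the demo values are made-up placeholders.

```r
# Hypothetical helper: read the tab-separated getBM output and return the
# unique, non-empty gene IDs from the named column.
extract_gene_ids <- function(path, column = "ensembl_gene_stable_id") {
  # The header has one field fewer than the data rows (row.names = TRUE),
  # so read.delim() uses the first column as row names automatically.
  tab <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
  if (!column %in% names(tab)) stop("column not found: ", column)
  ids <- as.character(tab[[column]])
  ids <- ids[!is.na(ids) & ids != ""]  # SNPs outside genes have no gene ID
  unique(ids)
}

# Offline round trip that mimics the write.table() call used for getsnps.
demo <- data.frame(
  refsnp_id = c("rs1", "rs2", "rs3"),
  ensembl_gene_stable_id = c("ENSG_A", "", "ENSG_A"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.table(demo, file = path, sep = "\t", col.names = TRUE,
            row.names = TRUE, append = FALSE, quote = FALSE)
extract_gene_ids(path)
# [1] "ENSG_A"
```

In the workflow above this would be genes <- extract_gene_ids("EPUsnpstd.csv"). If getsnps is still in memory, the file can be skipped altogether by taking unique non-empty values of getsnps$ensembl_gene_stable_id.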
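mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
genes <- read.csv("EPUsnpstd.csv", header = TRUE, sep = ",")[, 3]  # in the split file the gene IDs are in column 3 (column 1 holds the row names written by write.table), so read column 3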
getgenes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "description", "gene_biotype",
                                 "study_external_id", "source", "external_synonym", "phenotype_description"),
                  filters = "ensembl_gene_id", values = genes, mart = mart)
# Note: some attribute names differ between the two marts, so check listAttributes(dbsnp) and listAttributes(mart) carefully.
# In particular, the Ensembl gene ID is called ensembl_gene_stable_id in the SNP mart but ensembl_gene_id in the gene mart.
write.table(getgenes, file = "EPUgenetd.csv", sep = "\t",
            col.names = TRUE, row.names = TRUE, append = FALSE, quote = FALSE)
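The two query results can then be joined into a single SNP → gene name table, using the Ensembl gene ID as the key even though its column name differs between the marts. A minimal sketch with base R merge(); snp_tab and gene_tab are made-up placeholders standing in for getsnps and getgenes.

```r
# Made-up placeholders standing in for getsnps and getgenes.
snp_tab <- data.frame(
  refsnp_id = c("rs1", "rs2", "rs3"),
  ensembl_gene_stable_id = c("ENSG_A", "ENSG_B", ""),
  stringsAsFactors = FALSE
)
gene_tab <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B"),
  external_gene_name = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Join on the gene ID: by.x / by.y bridge the different column names.
snp2gene <- merge(snp_tab, gene_tab,
                  by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id",
                  all.x = TRUE)  # keep SNPs without a gene (name becomes NA)

# Reorder rows by SNP and put the columns in SNP -> gene ID -> name order.
snp2gene <- snp2gene[order(snp2gene$refsnp_id),
                     c("refsnp_id", "ensembl_gene_stable_id", "external_gene_name")]
snp2gene
```

With the real tables the call would be merge(getsnps, getgenes, by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id", all.x = TRUE). If one SNP has several gene IDs, merge() keeps one row per SNP-gene pair.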
Here is part of my output: