While reproducing an lncRNA paper, I found that the authors used a database called Animal QTLdb.
The Animal Quantitative Trait Loci (QTL) Database (Animal QTLdb) strives to collect all publicly available trait mapping data, i.e. QTL (phenotype/expression, eQTL), candidate gene and association data (GWAS), and copy number variations (CNV) mapped to livestock animal genomes, in order to facilitate locating and comparing discoveries within and between species. New data and database tools are continually developed to align various trait mapping data to map-based genome features such as annotated genes.
After downloading the file, take a look at its contents:
wc -l qdwnld82711OVKG.txt
30195 qdwnld82711OVKG.txt
grep -v '^#' qdwnld82711OVKG.txt |less -SN
Several rows have no coordinates and need to be removed.
Use awk to look at the second column:
grep -v '^#' qdwnld82711OVKG.txt | awk '{print $2}' | less -SN
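As you can see, the rows without coordinates have a text string in this column instead of a number.
Look at the first character of the second column: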
grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |less -SN
Count how often each first character occurs:
grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |sort | uniq -c
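The same tally can also be done inside awk with an associative array. The sketch below is only an illustration: the five input values are made up to stand in for column 2 of the real file, and the variable names `sample` and `counts` are hypothetical.

```shell
# Hypothetical values standing in for column 2 of the downloaded file.
sample=$(printf '1000\n15\nunmapped\n35\nNA\n')

# Tally the first character of each value in an awk array,
# then print "count character" pairs sorted by character.
counts=$(printf '%s\n' "$sample" \
  | awk '{ n[substr($1, 1, 1)]++ } END { for (c in n) print n[c], c }' \
  | LC_ALL=C sort -k2,2)

printf '%s\n' "$counts"
```

The `sort | uniq -c` version is easier to type interactively; the array version avoids sorting the whole column first, which may matter on larger files.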
We only need the rows whose second column starts with a digit (0-9).
grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |less -SN
grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |wc -l
28594
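To keep the filtered rows rather than just count them, the comment filter and the digit test can be combined in one awk call and redirected to a file. This is a minimal sketch on a made-up four-line sample, not the real download; the file names and the sample rows are assumptions for illustration only.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)

# Hypothetical sample: one comment line, two rows with coordinates, one without.
printf '#header line\nChr.1\t1000\t2000\tQTL_A\nChr.2\tunmapped\t\tQTL_B\nChr.3\t35\t90\tQTL_C\n' \
  > "$tmpdir/sample_qtl.txt"

# Skip comment lines and keep rows whose second field starts with a digit.
# `$2 ~ /^[0-9]/` is equivalent to `substr($2,1,1) ~ /[0-9]/`.
awk '!/^#/ && $2 ~ /^[0-9]/' "$tmpdir/sample_qtl.txt" > "$tmpdir/sample_qtl.filtered.txt"

# Number of rows that survived the filter.
kept=$(wc -l < "$tmpdir/sample_qtl.filtered.txt" | tr -d ' ')
echo "$kept"
```

On the real file, replacing the sample path with `qdwnld82711OVKG.txt` should give the same row count as the `grep | awk | wc -l` pipeline above, since both apply the same two conditions.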
What I learned
- awk's string function substr()
- awk's pattern-matching operator ~ //
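As a quick reference, here is a small sketch of both features on toy input. The input strings and variable names are made up for illustration.

```shell
# substr(s, start, length): awk strings are 1-indexed.
first=$(echo "12345678" | awk '{ print substr($1, 1, 1) }')
middle=$(echo "12345678" | awk '{ print substr($1, 3, 4) }')
echo "$first"    # first character
echo "$middle"   # four characters starting at position 3

# value ~ /regex/ is true when the value matches the regular expression.
labels=$(printf '42\nNA\n7x\n' \
  | awk '{ print $1, ($1 ~ /^[0-9]+$/ ? "numeric" : "other") }')
printf '%s\n' "$labels"
```

Note that `/[0-9]/` without anchors matches a digit anywhere in the string, which is why the walkthrough applies it to a one-character `substr()` result; anchoring with `^` achieves the same thing directly on the field.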