Genome annotation methods used in the literature (excluding repeats and ncRNAs)

Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement (NG, 2019)

1. Homology-based annotation

For homolog evidence, 744,030 annotated protein sequences of six species (Arabidopsis thaliana, Brachypodium distachyon, Oryza sativa, Setaria italica, Sorghum bicolor, Zea mays) were aligned to the genome using exonerate, and then clustered and filtered to result in the final homolog gene set.
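The paper does not spell out how the alignments were clustered and filtered. As a hedged illustration only, the sketch below groups protein-to-genome hits (such as exonerate alignments) that overlap on the same sequence and strand, and keeps the best-scoring hit in each cluster. The `Hit` record, its fields and the `min_score` cutoff are hypothetical; a real pipeline would parse exonerate's own output and would likely also filter on coverage and identity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    query: str    # protein ID (hypothetical naming)
    seqid: str    # chromosome or scaffold
    strand: str   # "+" or "-"
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # alignment score reported by the aligner


def cluster_and_filter(hits: List[Hit], min_score: float = 0.0) -> List[Hit]:
    """Cluster hits that overlap on the same sequence and strand,
    then keep only the best-scoring hit from each cluster."""
    groups: Dict[Tuple[str, str], List[Hit]] = {}
    for h in hits:
        if h.score >= min_score:  # drop weak alignments first
            groups.setdefault((h.seqid, h.strand), []).append(h)

    kept: List[Hit] = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda h: (h.start, h.end))
        cluster = [ordered[0]]
        cluster_end = ordered[0].end
        for h in ordered[1:]:
            if h.start <= cluster_end:  # overlaps the current cluster
                cluster.append(h)
                cluster_end = max(cluster_end, h.end)
            else:  # gap: close this cluster and start a new one
                kept.append(max(cluster, key=lambda x: x.score))
                cluster, cluster_end = [h], h.end
        kept.append(max(cluster, key=lambda x: x.score))
    return kept


hits = [
    Hit("At_p1", "chr1", "+", 100, 500, 80),
    Hit("Os_p1", "chr1", "+", 300, 700, 120),
    Hit("Sb_p1", "chr1", "+", 1000, 1500, 60),
    Hit("Zm_p1", "chr1", "-", 100, 400, 90),
]
print([h.query for h in cluster_and_filter(hits)])  # ['Os_p1', 'Sb_p1', 'Zm_p1']
```

The At_p1 and Os_p1 hits overlap on the plus strand, so only the higher-scoring Os_p1 survives; hits on different strands are never merged.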

2. Transcriptome-based annotation

A total of 327,904 high-quality full-length transcripts were generated from Iso-Seq, and 1,795,841 transcripts were assembled with Trinity from the RNA-seq data. The RNA-seq and Iso-Seq transcripts were further validated by PASA.

3. De novo prediction

We used Augustus and FGENESH, trained on 2,000 homolog genes that were supported by Iso-Seq full-length transcripts and monocot transcripts, respectively.
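One plausible way to select such "supported" genes, assumed here rather than taken from the paper, is to keep multi-exon models whose intron chain exactly matches that of a full-length Iso-Seq transcript, then sample at most 2,000 of them. The tuple-based model representation and the `select_training_genes` helper are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Exon = Tuple[int, int]               # 1-based, inclusive
Model = Tuple[str, str, List[Exon]]  # (seqid, strand, exons), hypothetical layout


def intron_chain(model: Model) -> Tuple:
    """Key a model by its sequence, strand and ordered intron coordinates."""
    seqid, strand, exons = model
    ex = sorted(exons)
    introns = tuple((ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1))
    return (seqid, strand, introns)


def select_training_genes(models: Dict[str, Model], full_length: List[Model],
                          n: int = 2000, seed: int = 1) -> List[str]:
    """Keep multi-exon models whose intron chain exactly matches a
    full-length transcript, then sample at most n of them."""
    supported_chains = {intron_chain(t) for t in full_length if len(t[2]) > 1}
    supported = sorted(gid for gid, m in models.items()
                       if len(m[2]) > 1 and intron_chain(m) in supported_chains)
    if len(supported) <= n:
        return supported
    return sorted(random.Random(seed).sample(supported, n))


models = {
    "g1": ("chr1", "+", [(100, 200), (300, 400)]),
    "g2": ("chr1", "+", [(1000, 1100), (1200, 1300)]),
    "g3": ("chr1", "+", [(5000, 5600)]),  # single-exon: no intron chain to check
}
iso_seq = [("chr1", "+", [(90, 200), (300, 450)])]  # UTRs may extend past the CDS
print(select_training_genes(models, iso_seq))  # ['g1']
```

Matching on introns rather than exon ends lets a full-length transcript support a model even when its UTRs extend beyond the coding exons.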

4. Integration

All the evidence was submitted to MAKER, resulting in 40,936 gene models and 48,224 transcripts. The MAKER output was then refined by PASA, retaining only the validated transcripts.
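Gene and transcript totals like these are normally read off the final GFF3. A minimal sketch, assuming a MAKER-style GFF3 in which gene models carry `maker` in the source column while evidence alignments from other sources are mixed in, and in which sequences may be appended after a `##FASTA` directive:

```python
from collections import Counter
from typing import Iterable, Optional


def count_features(gff_lines: Iterable[str], source: Optional[str] = "maker") -> Counter:
    """Count feature types (column 3) in a GFF3 file.

    With source="maker", evidence alignments contributed by other sources
    (protein_match, expressed_sequence_match, ...) are skipped; pass None
    to count every feature line.
    """
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if source is None or cols[1] == source:
            counts[cols[2]] += 1
    return counts


demo = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-t1;Parent=g1\n",
    "chr1\tmaker\tmRNA\t1\t800\t.\t+\t.\tID=g1-t2;Parent=g1\n",
    "chr1\tblastx\tprotein_match\t5\t600\t.\t+\t.\tID=p1\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
]
c = count_features(demo)
print(c["gene"], c["mRNA"])  # 1 2
```

For a real annotation, pass an open file handle instead of the demo list; the file name depends on how the MAKER outputs were merged.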

The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication

1. Homology-based annotation

2. Transcriptome-based annotation

RNA-seq and Iso-Seq reads were mapped to the reference genome using TopHat and Bowtie 2, respectively. Hints giving the locations of potential intron–exon boundaries were generated from the alignment files with bam2hints in the MAKER package. MAKER with AUGUSTUS was then used to predict genes in the repeat-masked reference genome.
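The idea behind intron hints can be sketched as follows: introns show up as `N` operations in the CIGAR strings of spliced alignments, and each distinct intron becomes one hint line whose multiplicity counts the supporting reads. This is a simplified stand-in for bam2hints, not its implementation. The `b2h` source label, the `mult=N;pri=4;src=E` attributes and the 20 bp minimum are assumptions modeled on AUGUSTUS hints files, and the intron strand is assumed to be known already (for example from the aligner's `XS` tag).

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def introns_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Introns (1-based, inclusive) encoded as N operations; pos is the SAM POS."""
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns


def intron_hints(alignments: Iterable[Tuple[str, str, int, str]],
                 min_len: int = 20) -> List[str]:
    """alignments: (seqid, strand, pos, cigar); returns one hint line per intron."""
    counts: Counter = Counter()
    for seqid, strand, pos, cigar in alignments:
        for start, end in introns_from_cigar(pos, cigar):
            if end - start + 1 >= min_len:  # skip very short gaps
                counts[(seqid, start, end, strand)] += 1
    return [
        f"{seqid}\tb2h\tintron\t{start}\t{end}\t0\t{strand}\t.\tmult={mult};pri=4;src=E"
        for (seqid, start, end, strand), mult in sorted(counts.items())
    ]


reads = [
    ("chr1", "+", 100, "50M200N50M"),
    ("chr1", "+", 100, "50M200N50M"),
    ("chr1", "+", 1000, "30M10N30M"),  # 10 bp gap: below min_len, ignored
]
for line in intron_hints(reads):
    print(line)
```

Collapsing identical introns into one line with a `mult` count keeps the hints file small while still telling AUGUSTUS how strongly each splice junction is supported.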

3. De novo prediction

AUGUSTUS, SNAP and GeneMark were used for ab initio gene prediction, with models trained on coding sequences from A. ipaensis, A. duranensis, G. max and A. thaliana.

4. Integration

Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. (NG, 2018)

MAKER was run for two rounds, followed by extensive manual adjustment. Only the first round is recorded here; see the paper for details.
1. Homology-based annotation

2. Transcriptome-based annotation
Genome-guided Trinity-assembled transcripts were fed to PASA, and the PASA-assembled transcripts were used for training.

3. De novo prediction

SNAP, GeneMark and AUGUSTUS were each trained on those selected proteins.

4. Integration

The MAKER pipeline was used to integrate multiple tiers of coding evidence (ab initio gene predictions, transcript evidence and protein evidence) and to generate a comprehensive set of protein-coding genes.

Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense

1. Homology-based annotation

For the homolog-based approach, GeMoMa (version 1.3.1) software was applied by using protein sequences from Populus trichocarpa, Arabidopsis thaliana, Vitis vinifera, Theobroma cacao and Gossypium raimondii.

2. Transcriptome-based annotation

For transcript-based prediction, HISAT (version 2.0.4) and StringTie (version 1.2.3) were used for reference-based transcriptome assembly (data from NCBI BioProjects PRJNA248163 and PRJNA266265). TransDecoder (version 2.0; https://github.com/TransDecoder/TransDecoder/) and GeneMarkS-T (version 5.1) were used to predict genes from the transcripts. PASA (version 2.0.2) was used to predict genes from unigenes and from full-length transcripts obtained by PacBio sequencing.
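TransDecoder starts by scanning transcripts for long open reading frames. A minimal sketch of that idea, assuming only complete ORFs (ATG through stop codon) on either strand are wanted; the real tool also reports 5'- and 3'-partial ORFs and scores coding potential, and the 100-aa default here is an assumption chosen to mirror TransDecoder's usual minimum protein length.

```python
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str, min_aa: int = 100) -> Optional[Tuple[str, int, int]]:
    """Longest complete ORF (ATG ... stop) over both strands and all three frames.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    searched strand (on the reverse complement for "-"), or None if the longest
    ORF encodes fewer than min_aa amino acids.
    """
    seq = seq.upper()
    best: Optional[Tuple[str, int, int]] = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this frame opens the ORF
                elif start is not None and codon in STOPS:
                    if best is None or i + 3 - start > best[2] - best[1]:
                        best = (strand, start, i + 3)
                    start = None  # look for the next ORF in this frame
    if best is not None and (best[2] - best[1]) // 3 - 1 >= min_aa:
        return best
    return None


toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
print(longest_orf(toy, min_aa=5))  # ('+', 2, 23)
```

The toy ORF encodes only 6 amino acids, so with the default `min_aa=100` the same call returns `None`.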

3. De novo prediction

For de novo prediction, five programs were used to scan the repeat-masked genome: Genscan, Augustus (version 2.4), GlimmerHMM (version 3.0.4), GeneID (version 1.4) and SNAP (version 2006-07-28).

4. Integration

Gene models from these different approaches were combined using the EVM software (version 1.1.1).
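EVM combines gene models by weighting each evidence source; its actual algorithm is a dynamic-programming search over exon and intron evidence, which this note does not reproduce. The toy sketch below only illustrates why the weights matter: each candidate model is scored by summing the weights of the sources that predicted each of its exons exactly, and the top-scoring candidate wins. Source names and weights are hypothetical, chosen so that transcript evidence (PASA) outweighs single ab initio predictors.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive


def score_model(exons: List[Exon], evidence: Dict[str, Set[Exon]],
                weights: Dict[str, float]) -> float:
    """Sum, over the model's exons, the weight of every source that
    predicted exactly that exon (sources without a weight count 0)."""
    return sum(weights.get(src, 0.0)
               for exon in exons
               for src, predicted in evidence.items()
               if exon in predicted)


def pick_consensus(candidates: Dict[str, List[Exon]],
                   evidence: Dict[str, Set[Exon]],
                   weights: Dict[str, float]) -> str:
    """Return the name of the highest-scoring candidate model."""
    return max(candidates, key=lambda name: score_model(candidates[name], evidence, weights))


# Hypothetical weights: transcript evidence counts far more than ab initio calls.
weights = {"augustus": 1, "snap": 1, "pasa": 10}
evidence = {
    "augustus": {(100, 200), (300, 400)},
    "snap": {(100, 200), (350, 400)},
    "pasa": {(100, 200), (300, 400)},
}
candidates = {"A": [(100, 200), (300, 400)], "B": [(100, 200), (350, 400)]}
print(pick_consensus(candidates, evidence, weights))  # A
```

Model A scores 23 and model B scores 13; lowering the PASA weight to 0 and raising SNAP's would flip the choice to B, which is why the weight settings deserve as much attention as the predictors themselves.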

The rubber tree genome reveals new insights into rubber production and species adaptation (NP, 2016)

1. Homology-based annotation

SPALN was used to search for protein homologues (parameters "-Q4 -O0 -M10 -H180") against Malpighiales proteins from NRDB and UniProt.

2. Transcriptome-based annotation

The transcripts assembled from transcriptome sequencing were used by PASA to construct gene models for training the predictors, and the most likely coding sequences (CDS) were extracted with TransDecoder, which is built into PASA.
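Training sets for ab initio predictors are usually restricted to complete gene models. A hedged sketch of one common completeness check, not stated in the paper: the CDS begins with ATG, ends with a stop codon, has a length divisible by three and contains no in-frame internal stop. The dictionary of CDS sequences in the usage line is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(cds: str) -> bool:
    """True if cds runs from ATG to a stop codon with no in-frame stop in between."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False  # too short, or not a whole number of codons
    if not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        return False  # 5'- or 3'-partial
    internal = (cds[i:i + 3] for i in range(3, len(cds) - 3, 3))
    return not any(codon in STOP_CODONS for codon in internal)


cds_by_gene = {"g1": "ATGAAATAA", "g2": "ATGTAAAAATAA", "g3": "AAAAAATAA"}
training = {g: s for g, s in cds_by_gene.items() if is_complete_cds(s)}
print(sorted(training))  # ['g1']
```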

3. De novo prediction

Four HMM-based predictors were used for ab initio prediction: AUGUSTUS, GlimmerHMM, SNAP and FGENESH++. The first three were trained on the PASA-built training sets, and FGENESH++ was run with pre-trained parameters specific to Hevea.

4. Integration

All results from the three types of prediction were integrated with EVM.

In addition, the authors wrote scripts to filter these results using four criteria; see the paper for details.

Finally, all gene models were updated and curated with PASA to confirm the UTR regions and alternative splicing forms. Highly repetitive genes, such as "Retro-transposon", were manually removed from the candidate set.
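The paper removed repetitive genes by hand. A keyword filter over functional descriptions can at least produce the candidate list for that review. The pattern below is a hypothetical example built around the "Retro-transposon" label quoted above and should be tuned to whatever annotation source supplied the descriptions.

```python
import re
from typing import Dict

# Hypothetical keyword list for transposable-element-like descriptions.
TE_PATTERN = re.compile(
    r"retro-?transposon|transposase|gag[- ]pol|reverse transcriptase|integrase",
    re.IGNORECASE,
)


def remove_te_like(descriptions: Dict[str, str]) -> Dict[str, str]:
    """Drop genes whose functional description matches a TE-related keyword."""
    return {gid: d for gid, d in descriptions.items() if not TE_PATTERN.search(d)}


genes = {
    "g1": "Retro-transposon protein, putative",
    "g2": "Cellulose synthase",
    "g3": "RNA-directed DNA polymerase (reverse transcriptase)",
}
print(remove_te_like(genes))  # {'g2': 'Cellulose synthase'}
```

Genes removed this way are better logged than silently discarded, so that the manual check the authors describe can still confirm each removal.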
