A chromosome-level genome of the Chinese peacock butterfly based on third-generation sequencing and Hi-C

Chromosomal-level reference genome of Chinese peacock butterfly (Papilio bianor) based on third-generation DNA sequencing and Hi-C analysis

The Papilio bianor genome

Abstract

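Researchers at the Kunming Institute of Zoology combined PacBio and Hi-C sequencing to assemble, de novo, a chromosome-level genome of Papilio bianor. The final assembly is 421.52 Mb across 30 chromosomes (29 autosomes and 1 Z sex chromosome), with a scaffold N50 of 13.12 Mb, 15,375 protein-coding genes, and 233.09 Mb of repetitive sequence. Divergence analysis indicates that P. bianor and P. xuthus share a common ancestor and split roughly 23.69-36.07 million years ago. Demographic history analysis suggests that the population expanded between the last interglacial period and the Last Glacial Maximum, possibly in connection with natural enemies and adaptation to the glacial climate.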

Raw data for the paper: https://www.ebi.ac.uk/ena/data/view/PRJNA530186
Software parameters and code used in the analyses: ftp://parrot.genomics.cn/gigadb/pub/10.5524/100001_101000/100653/
Cloud-drive copy of the code: https://share.weiyun.com/5e6tovj (password: hktmze)

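Background

Butterflies are among the most ornamental and eye-catching animals in the world. Their unusual wing patterns differ between species, sexes, and seasons, and the theory of mimicry in particular put butterflies back in the spotlight. Because insect genomes are highly heterozygous, however, reference genomes exist for only 37 butterfly species in 6 families, and the only chromosome-level genomes are those of Heliconius melpomene, Melitaea cinxia, and Papilio xuthus, all of which were obtained by building genetic linkage maps.
Third-generation single-molecule sequencing, combined with high-throughput chromosome conformation capture (Hi-C), now makes it possible to obtain chromosome-scale maps directly. This approach has already been applied to fruit flies, mosquitoes, and moths.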


Characterization of Papilio bianor

Female adult. Left, dorsal view; right, ventral view

Data Description

Insect collection and breeding

Two fifth-instar larvae were used for Hi-C sequencing, and one adult male butterfly was used for the genome survey and de novo genome sequencing.

Genome survey using Illumina sequencing technology

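Genomic DNA was extracted from the thorax and abdomen of one adult male. Libraries with insert sizes of 150 bp and 500 bp were built and sequenced on the Illumina HiSeq2000. Using k = 17 and the relation G = k-mer number / k-mer depth, the genome size was estimated at 496.05 Mb with a heterozygosity of 1.81%.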
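
The relation G = k-mer number / k-mer depth can be illustrated with a minimal Python sketch. It assumes a k-mer depth histogram is already available; the histogram below is invented toy data, and the `min_depth` cutoff for discarding likely sequencing-error k-mers is an assumption of this sketch, not a parameter reported in the paper.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as (total k-mer number) / (peak k-mer depth).

    histogram: dict mapping depth -> number of distinct k-mers seen at that depth.
    min_depth: depths below this are treated as sequencing errors (assumed cutoff).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Total k-mer number: each distinct k-mer contributes 'depth' occurrences.
    total_kmers = sum(depth * count for depth, count in usable.items())
    # Peak k-mer depth: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    return total_kmers / peak_depth


# Toy histogram (hypothetical values, not the paper's data).
toy_histogram = {1: 1000, 2: 100, 9: 50, 10: 100, 11: 50}
toy_estimate = estimate_genome_size(toy_histogram)
print(toy_estimate)
```

In a highly heterozygous genome the histogram usually shows a second peak at about half the main depth, so picking the peak by maximum count alone may need manual checking.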


Kmer analysis

Library construction and sequencing using SMRT and Hi-C technologies

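After genomic DNA extraction, a 20-kb library was built for PacBio sequencing. Ten SMRT cells on the PacBio RSII platform produced 43.19 Gb of subreads with a mean length of 16.4 kb. For Hi-C, a library with 400-700 bp inserts was sequenced on the Illumina HiSeq X Ten platform (PE150), producing about 75.11 Gb of raw reads.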
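
To put these yields in context, fold coverage can be approximated as total sequenced bases divided by genome size. The sketch below uses the 496.05 Mb k-mer estimate as the denominator; that choice is an assumption made here for illustration, and the paper may report depth against a different size.

```python
def fold_coverage(total_bases, genome_size):
    """Approximate sequencing depth as sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


ESTIMATED_GENOME_SIZE = 496.05e6  # bp, from the k-mer survey

pacbio_depth = fold_coverage(43.19e9, ESTIMATED_GENOME_SIZE)  # PacBio subreads
hic_depth = fold_coverage(75.11e9, ESTIMATED_GENOME_SIZE)     # Hi-C raw reads

print(f"PacBio: {pacbio_depth:.1f}x, Hi-C: {hic_depth:.1f}x")
```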


Statistics

Chromosomal-level genome assembly

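Because the P. bianor genome is highly heterozygous, the assembly proceeded in four steps.
  • First, a PacBio-only assembly was built with Wtdbg (--tidy-reads 5000 -k 0 -p 17 -S 1). Wtdbg is an assembler based on the fuzzy Bruijn graph algorithm, designed for PacBio and Nanopore data.
  • Second, the PacBio-only assembly was polished with Illumina reads: the reads were mapped to the assembly, and Pilon then ran two rounds of base correction.
  • Third, because the corrected assembly still contained many short contigs with low sequencing depth, short low-depth contigs (size < 1,000 bp and coverage < 50, or size < 10,000 bp and coverage < 35) with identity > 90% were merged into longer contigs.
  • Fourth, Juicer and the 3D de novo assembly pipeline were used to align the Hi-C raw reads to the polished assembly and improve assembly quality.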
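
The size and coverage thresholds in the third step can be written as a small predicate. This is only a sketch: it assumes the two conditions group as "(size and coverage) or (size and coverage)", which is the natural reading but is not spelled out in the source, and the function name is hypothetical.

```python
def is_short_low_depth(size_bp, coverage):
    """Return True if a contig meets the short, low-depth criteria.

    Assumed grouping: (size < 1,000 and coverage < 50)
                   or (size < 10,000 and coverage < 35).
    """
    very_short = size_bp < 1_000 and coverage < 50
    short = size_bp < 10_000 and coverage < 35
    return very_short or short


# Hypothetical contigs: (name, size in bp, mean coverage).
contigs = [("ctg1", 800, 40), ("ctg2", 800, 60), ("ctg3", 5_000, 30), ("ctg4", 20_000, 10)]
candidates = [name for name, size, cov in contigs if is_short_low_depth(size, cov)]
print(candidates)
```

Only contigs flagged this way, and also sharing more than 90% identity with another sequence, would be candidates for merging.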
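In the end, 90.5% of the contigs were anchored onto 30 super-scaffolds, which probably correspond to the 30 chromosomes. The resulting chromosome-level P. bianor genome is 421.52 Mb with a scaffold N50 of 13.12 Mb. The assembly covers about 85% of the estimated genome size, a gap attributed to the high heterozygosity.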
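
For readers unfamiliar with the scaffold N50 statistic, here is a minimal sketch of how it is computed: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The lengths below are toy values, not the paper's scaffolds.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum covers half the assembly.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```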


chromosomal interactions

Heat map of chromosomal interactions. Each chromosome is framed with a blue block, and each scaffold is framed with a green block.

Quality evaluation of assembled genome

Assembly quality was assessed in three ways, following the 3C principle (completeness, base level contiguity, and accuracy). First, completeness was evaluated with BUSCO (insecta_odb9), which gave 96.60% coverage of the core genes. Second, mapping rates of Illumina and PacBio reads against the assembly were calculated with BWA and BLASR: 96.31% of Illumina reads and 96.86% of PacBio reads mapped to the assembled genome, with few heterozygous regions. Third, chromosome-level synteny between the P. bianor and P. xuthus genomes was analyzed: 61,082,412 bp of the P. bianor assembly could be aligned (1:1) with high confidence (-m 0.01) to the P. xuthus reference genome.

syntenic relationships

Circos plot of P. bianor chromosome-level reference genome with the previously released Papilio xuthus genome (obtained from a Chinese group). Shown from outermost to innermost are (1) gene density, (2) repeat element density, (3) GC content, and (4) syntenic regions with P. xuthus (left).

Genome annotation

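Genome annotation covers:

  • Repeat annotation
  • Gene structure annotation
  • Gene function annotation
  1. Repeat annotation
    Repeats comprise tandem repeats and interspersed repeats.
    Tandem repeats mainly include microsatellites, minisatellites, and other short tandem repeats.
    Interspersed repeats include DNA/RNA transposons, LTRs (long terminal repeats), LINEs (long interspersed nuclear elements), SINEs (short interspersed nuclear elements), and others.

Overall, genome annotation relies mainly on de novo prediction and homology-based alignment.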

Repeats
  • Tandem Repeats Finder was used to annotate the tandem repeats (Tandem Repeats Database)

  • TEs were identified with de novo and homology-based approaches at both the DNA and protein levels

    • At the DNA level

      • RepeatModeler to construct a de novo repeat library
      • RepeatMasker to search for similar TEs against the known Repbase TE library and the de novo repeat library
      • LTR_FINDER to find long terminal repeats (LTRs)
    • At the protein level

      • RepeatProteinMask to search the assembled genome against the TE protein database using the WU-BLASTX engine
Protein-coding genes
  • Structural annotation

    • De novo gene prediction: the repeat-masked genome was analyzed with SNAP, GENSCAN, GlimmerHMM, and AUGUSTUS
    • Homology-based prediction: TBLASTN with an E-value cut-off of 1e−5 was used to align the protein sequences of the reference gene set to the genome, and GeneWise was used to perform more precise alignment
    • EVidenceModeler (EVM) was used to integrate the genes predicted by the homology and de novo approaches and generate a comprehensive, non-redundant gene set
  • Functional annotation

    • The KEGG, TrEMBL, SwissProt, and COG databases were searched with BLASTP for best matches to the P. bianor protein sequences produced by EVM
    • The Pfam, PRINTS, ProDom, and SMART databases were searched for known motifs and domains using InterProScan
    • All predicted gene sequences were searched against the GenBank non-redundant protein (nr) database using BLASTP
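
The E-value cut-off used in the homology-based step can be sketched as a simple filter over alignment hits. This example assumes the hits are stored in BLAST+ 12-column tabular format (`-outfmt 6`), where the E-value is the 11th column; the source does not say which output format was used, and the example hits are invented. Whether the cut-off is inclusive is also an assumption here.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Keep (query, subject, evalue) for tabular BLAST hits with evalue <= max_evalue."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of -outfmt 6 is the E-value
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Two hypothetical hits: a strong match and a weak one.
example_lines = [
    "protA\tscaffold_1\t85.0\t200\t30\t0\t1\t200\t1000\t1600\t1e-50\t180",
    "protB\tscaffold_2\t40.0\t50\t30\t0\t1\t50\t300\t450\t0.002\t30",
]
kept_hits = filter_hits(example_lines)
print(kept_hits)
```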


      transposable elements (TEs)


(a) Breakdown of the whole-genome assemblies into different functional classes in Papilio

Gene family identification and evolutionary analysis

  • OrthoMCL to cluster the annotated genes



(b) Venn diagram of the shared gene families of Papilio. The result showed that 293 gene families were specific to P. bianor.

  • CAFE was used to identify expanded and contracted gene families

  • A total of 1,378 one-to-one single-copy orthologs that contain
    only 1 protein for each species were collected and clustered
    by OrthoMCL

  • Nucleic acid sequences were aligned using PRANK; the gene alignments were then concatenated, and phylogenetic trees were constructed using RAxML with the GTR+G+I model


  • The phylogeny was further analyzed with MCMCtree in PAML to estimate the divergence times of these species



(d) Maximum likelihood phylogenetic tree of Papilionoidea constructed from the concatenated alignment of 1,378 1-to-1 single-copy ortholog genes. The numbers in square brackets on the nodes are the 95% confidence intervals of divergence time. The red dots are fossil evidence downloaded from the TimeTree website, and the black dots are inferred times obtained from the TimeTree website. Both were used to calibrate divergence times.
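
The selection of 1-to-1 single-copy orthologs used for the tree can be sketched as a filter over gene clusters: keep a cluster only if every species contributes exactly one gene. The data structure, species labels, and gene names below are hypothetical and do not reflect the actual OrthoMCL output format.

```python
from collections import Counter


def single_copy_orthogroups(groups, species):
    """Return clusters in which every listed species has exactly one gene.

    groups: dict mapping cluster id -> list of (species, gene_id) pairs.
    species: iterable of all species that must be represented.
    """
    wanted = set(species)
    selected = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _ in members)
        # Require the exact species set, with one gene per species.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[group_id] = dict(members)  # species -> gene_id
    return selected


toy_groups = {
    "OG1": [("Pbia", "g1"), ("Pxut", "g7"), ("Bmor", "g9")],  # one gene each: kept
    "OG2": [("Pbia", "g2"), ("Pbia", "g3"), ("Pxut", "g8"), ("Bmor", "g4")],  # duplicate
    "OG3": [("Pbia", "g5"), ("Pxut", "g6")],  # one species missing
}
single_copy = single_copy_orthogroups(toy_groups, ["Pbia", "Pxut", "Bmor"])
print(sorted(single_copy))
```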

  • Demographic history was inferred with Pairwise Sequentially Markovian Coalescent (PSMC) analysis, after mapping Illumina short reads to the assembled genome with BWA and calling variants with SAMtools



(c) Changes in effective population size over time, plotted using PSMC with 100 bootstrap replicates to assess the robustness of the estimates. The parameter "g" is the generation time in years, and the parameter "μ" is the mutation rate per generation.
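
The roles of "g" and "μ" can be made concrete with a sketch of the scaling described in the PSMC documentation: N0 = θ0 / (4μ) / s, time in years = 2·N0·t·g, and effective population size = N0·λ. The bin size s = 100 is assumed to be the PSMC default, and every number below is hypothetical; the post does not give the values of g and μ used in the paper.

```python
def scale_psmc(theta0, intervals, mu, generation_years, bin_size=100):
    """Convert PSMC scaled output to (years, effective population size) pairs.

    theta0: scaled mutation rate per bin reported by PSMC.
    intervals: list of (t_k, lambda_k) pairs in PSMC's scaled units.
    mu: mutation rate per site per generation.
    generation_years: generation time g in years.
    bin_size: bases per bin in the PSMC input (assumed default of 100).
    """
    n0 = theta0 / (4 * mu * bin_size)
    return [
        (2 * n0 * t_k * generation_years, n0 * lambda_k)
        for t_k, lambda_k in intervals
    ]


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
scaled = scale_psmc(theta0=0.04, intervals=[(0.5, 2.0)], mu=1e-8, generation_years=1.0)
print(scaled)
```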

Next up:

The codling moth genome

https://www.nature.com/articles/s41467-019-12175-9

Reference:

https://academic.oup.com/gigascience/article/8/11/giz128/5612101
