Broad Institute Video Notes: Introduction to Germline Variant Discovery

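These notes cover lecture 5 of this video series. Some parts were taken in English and some in Chinese: I watched the videos and took notes in scattered bits of spare time, so the language depended on my mood at the moment.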

Video link: here

What is the difference between germline and somatic variants? Germline variants are essentially all the variants you are born with, inherited from your parents: one half from your mother, the other half from your father. There are also some germline variants that are unique to each person, and those number only about 30. In this workshop we focus on short variants: point mutations, plus insertions and deletions shorter than 50 bp (an arbitrary threshold that we set).
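
The scope described above can be written down as a small check. This is only a sketch: the function name, the category labels, and the choice to measure an indel by the length difference between REF and ALT are my own assumptions; the only value taken from the talk is the 50 bp cutoff.

```python
SHORT_VARIANT_MAX_BP = 50  # the arbitrary cutoff mentioned in the talk


def classify_variant(ref: str, alt: str) -> str:
    """Classify a REF/ALT allele pair by length (illustrative labels only)."""
    if len(ref) == len(alt):
        # Same length: a single-base change is a point mutation.
        return "point mutation" if len(ref) == 1 else "multi-base substitution"
    size = abs(len(ref) - len(alt))
    if size >= SHORT_VARIANT_MAX_BP:
        # Too long to count as a short variant in this workshop.
        return "outside short-variant scope"
    return "insertion" if len(alt) > len(ref) else "deletion"


for ref, alt in [("A", "G"), ("A", "ATTC"), ("ATTC", "A"), ("A", "A" + "T" * 60)]:
    print(len(ref), len(alt), classify_variant(ref, alt))
```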

This is the overall Best Practices pipeline for germline variant discovery. We already went over the first column, data pre-processing. This talk gives an overview of variant calling, the second column. The third column is filtering your variants and refining the called genotypes.

The key player in germline variant discovery is the tool HaplotypeCaller, which essentially follows four steps. The first step is identifying active regions. HaplotypeCaller looks across your genome and says, "I'm going to focus on the regions that show the most variation." Much of your genome is identical to the reference, so concentrating on the regions that vary makes the process more efficient. Second, it performs local realignment, using graph-based assembly to build haplotypes from the reads around complex regions of the genome. Third, it takes each read and determines, with a pair HMM, its likelihood against each of the haplotypes it built. The fourth step is determining the genotype for your calls.
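
To make the last two steps more concrete, here is a minimal sketch of how per-read likelihoods can be combined into a diploid genotype call at a single biallelic site. It is the textbook diploid likelihood model, not HaplotypeCaller's actual code; the function names and the example likelihood values are made up for illustration, and in the real tool the per-read numbers would come from the read-versus-haplotype step.

```python
import math


def genotype_log_likelihoods(read_likelihoods):
    """read_likelihoods: list of (P(read | ref allele), P(read | alt allele))."""
    genotypes = {"0/0": (0, 0), "0/1": (0, 1), "1/1": (1, 1)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for per_allele in read_likelihoods:
            # Each read is assumed to come from either chromosome copy
            # with probability 1/2.
            total += math.log10(0.5 * per_allele[a1] + 0.5 * per_allele[a2])
        result[name] = total
    return result


def call_genotype(read_likelihoods):
    """Return the genotype with the highest log-likelihood."""
    lls = genotype_log_likelihoods(read_likelihoods)
    return max(lls, key=lls.get)


# Hypothetical data: five reads support the reference, five the alternate.
reads = [(0.99, 0.01)] * 5 + [(0.01, 0.99)] * 5
print(call_genotype(reads))  # 0/1
```

With an even split of reads, the heterozygous genotype explains every read equally well, while each homozygous genotype has to treat half of the reads as errors.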

So why do we do joint analysis? A single genome by itself does not give you much information. For example, say you find a variant in a sample and want to know whether it is associated with a disease. How do you know that a variant call has any biological consequence? Maybe the variant is also present in one of the parents, who is healthy and shows no disease phenotype, or maybe most people in that population carry it. Adding family and population data helps you filter out all of those variants. That is essentially the idea behind joint analysis: you want to focus on variants that are rare in the population, because those are more likely to be related to a disease-causing phenotype.

Previously, we ran HaplotypeCaller and joint calling together. If you had 10 samples, HaplotypeCaller would build a graph for all 10 samples and then do joint calling on the entire dataset. As you increase the number of samples, the graph just gets bigger and bigger, which makes HaplotypeCaller very time consuming. And suppose a first wave of sequencing gave you 10 samples, and then a second wave arrives and you need to add samples to your original data: you have to go through the whole HaplotypeCaller process all over again.

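As described above, if you want to add a few more samples to your existing data, you have to rerun HaplotypeCaller on everything, which wastes a lot of time. So the speaker presented a new approach: run HaplotypeCaller on each sample separately to produce a GVCF file. If you later want to add samples, you only need to run HaplotypeCaller once on each new sample, then combine the GVCF files and redo the joint analysis.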
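
The bookkeeping behind this per-sample workflow can be sketched as a toy model. Nothing here calls GATK: the `Cohort` class, `_call_sample`, and `joint_analysis` are hypothetical stand-ins, and a "GVCF" is reduced to a set of variant positions. The point is only that the expensive per-sample step runs once per sample, while the merge step is repeated.

```python
class Cohort:
    """Toy model of the per-sample GVCF workflow (not GATK itself)."""

    def __init__(self):
        self.gvcfs = {}           # sample name -> cached per-sample result
        self.per_sample_runs = 0  # counts the expensive per-sample step

    def _call_sample(self, variant_positions):
        # Hypothetical stand-in for running HaplotypeCaller on one sample.
        self.per_sample_runs += 1
        return frozenset(variant_positions)

    def add_sample(self, name, variant_positions):
        # Only samples that have no cached result are processed.
        if name not in self.gvcfs:
            self.gvcfs[name] = self._call_sample(variant_positions)

    def joint_analysis(self):
        # Cheap step: merge cached results into site -> carrier samples.
        table = {}
        for name, positions in self.gvcfs.items():
            for pos in positions:
                table.setdefault(pos, []).append(name)
        return {pos: sorted(names) for pos, names in sorted(table.items())}


cohort = Cohort()
for name, positions in [("s1", [100, 250]), ("s2", [100]), ("s3", [400])]:
    cohort.add_sample(name, positions)
first = cohort.joint_analysis()

cohort.add_sample("s4", [250, 400])  # second wave: one new sample
second = cohort.joint_analysis()
print(cohort.per_sample_runs)  # 4 per-sample runs in total, not 3 + 4
```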

In GVCF mode there are also four steps, the same four as described above.

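What is a GVCF file? It is a genomic VCF file. It has the same format as a VCF file except for one thing: a GVCF has information for all the positions in your genome. But why do we need this information? In the figure for this slide, the yellow parts are all the variant information in your sample, and the blue parts are reference information. Suppose you have several samples in your dataset and you find an interesting variant in one of them; if the other samples all have good read coverage at that position, you can tell that the variant really is interesting. Likewise, if you see no variant at some position, you can tell whether there truly is none or whether reads simply did not cover that region. So if your file contains the reference information, you can answer the questions just mentioned.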
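
A small sketch of that distinction follows. The data layout is a simplification I made up for illustration (a dict with variant records and reference blocks given as start, end, and depth), and the `min_depth` threshold of 10 is an arbitrary assumption, not a GATK default. It shows how reference information lets you separate "confidently reference" from "no data" when lining samples up at one site.

```python
NO_CALL = "./."
HOM_REF = "0/0"


def genotype_at(gvcf, pos, min_depth=10):
    """Look up one position in a simplified per-sample record.

    gvcf: {"variants": {pos: genotype}, "ref_blocks": [(start, end, depth)]}
    """
    if pos in gvcf["variants"]:
        return gvcf["variants"][pos]
    for start, end, depth in gvcf["ref_blocks"]:
        # Covered by a reference block with enough reads: truly no variant.
        if start <= pos <= end and depth >= min_depth:
            return HOM_REF
    # No variant and no usable coverage: we cannot tell.
    return NO_CALL


samples = {
    "s1": {"variants": {1500: "0/1"},
           "ref_blocks": [(1000, 1499, 30), (1501, 2000, 28)]},
    "s2": {"variants": {}, "ref_blocks": [(1000, 2000, 35)]},
    "s3": {"variants": {}, "ref_blocks": [(1000, 1400, 32)]},
}

row = {name: genotype_at(g, 1500) for name, g in samples.items()}
print(row)  # {'s1': '0/1', 's2': '0/0', 's3': './.'}
```

With only plain VCFs, s2 and s3 would look the same at position 1500, because neither file would contain a record there.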

After joint analysis, you need to filter your variants and refine your genotypes.

There are several ways to filter variants. The figure above was produced with a tool called VQSR. VQSR takes seven different annotations and uses a combination of them to decide which variants are good and which ones pass. In this figure, all the red points are good variants, the ones you want to keep. The green points are variants that are not especially bad but not very good either. If you tried to do this with just two annotations, you would put any two of them on the X and Y axes and try to draw a box that encloses the good variants, and there is no really good way to do that with only two annotations.
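
The problem with a box can be illustrated with a toy example. This is not VQSR: a single two-dimensional Gaussian fitted to known-good variants stands in for whatever model the real tool fits, and the annotation values are invented. It only shows why scoring annotations jointly can reject a variant that a rectangular cutoff would accept.

```python
import math


def fit_gaussian_2d(points):
    """Fit mean and covariance (sxx, sxy, syy) to 2D training points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the 2D Gaussian density at point."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


def in_box(point, points):
    """Hard filter: inside the bounding box of the training points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)


# Invented "good" variants whose two annotations are strongly correlated.
good = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mean, cov = fit_gaussian_2d(good)

typical = (3.0, 3.0)  # follows the same trend as the good variants
odd = (1.0, 5.0)      # inside the box, but far from the trend

for name, p in [("typical", typical), ("odd", odd)]:
    print(name, in_box(p, good), round(log_density(p, mean, cov), 1))
```

Both candidates pass the box test, but the joint score of the second one is far lower, which is the kind of separation a box on two axes cannot express.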

You can add population priors and family priors to improve your calls and refine the genotypes. In the example above, we found 417 de novo mutations in one sample. That number is unreasonable, because a person has only about 30 mutations that are unique to them, so it is clearly too high. After applying population priors and family priors, the number drops to 17. When we keep only high-confidence de novo mutations, we end up with 8. That number now makes sense.
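
How a prior can change a genotype call is easiest to see with Bayes' rule. The sketch below assumes a Hardy-Weinberg prior derived from a population allele frequency, which is my own simplification and not necessarily what the GATK tool does; the likelihood values and frequencies are invented. A family prior would enter the same way, as another factor multiplied into the likelihoods.

```python
def hardy_weinberg_prior(alt_freq):
    """Genotype prior from a population alt-allele frequency (assumed model)."""
    p, q = 1.0 - alt_freq, alt_freq
    return {"0/0": p * p, "0/1": 2 * p * q, "1/1": q * q}


def genotype_posteriors(likelihoods, alt_freq):
    """Posterior is proportional to likelihood times prior, then normalized."""
    prior = hardy_weinberg_prior(alt_freq)
    unnormalized = {g: likelihoods[g] * prior[g] for g in likelihoods}
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


def best_genotype(posteriors):
    return max(posteriors, key=posteriors.get)


# Invented likelihoods: weak evidence that slightly favors a het call.
weak_evidence = {"0/0": 0.3, "0/1": 0.6, "1/1": 0.1}

rare = genotype_posteriors(weak_evidence, alt_freq=0.001)
common = genotype_posteriors(weak_evidence, alt_freq=0.5)
print(best_genotype(rare))    # 0/0
print(best_genotype(common))  # 0/1
```

When the allele is very rare in the population, weak read evidence is not enough to support a variant call, which is one way poorly supported de novo candidates get removed.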

Once you have a call set, the last step is to evaluate it and ask: how good were my calls? How sensitive were they? How specific were they? What is the ratio of true positives to false positives or false negatives? To answer this you need a truth dataset, against which you run a concordance analysis.
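
A concordance analysis boils down to comparing two sets of calls. In this sketch a variant is a (chromosome, position, ref, alt) tuple and both sets are invented; the function name is hypothetical. Because true negatives are not counted here, precision is reported in place of specificity.

```python
def concordance(calls, truth):
    """Compare a call set with a truth set; both are sets of variant keys."""
    tp = len(calls & truth)  # called and in the truth set
    fp = len(calls - truth)  # called but not in the truth set
    fn = len(truth - calls)  # in the truth set but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Invented example data.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "GA"), ("chr2", 900, "TC", "T")}
calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "GA"), ("chr3", 7, "A", "C")}

print(concordance(calls, truth))
```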
