Key genes and hub genes (from a biological network perspective)

Preface

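Simply put, a hub gene is a gene that is connected to many other nodes in a network. (A bottleneck is a different concept, based on betweenness rather than degree; the distinction is discussed further down.)
This post again draws on several papers plus my own notes, and it is mainly about key genes and hub genes. Many people assume that hub genes are key genes, and some even treat differentially expressed genes as key genes. Before the main text, here is how I personally understand the relationship among the three. If you see it differently, please leave a comment.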

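  • Differentially expressed genes (DEGs) are genes that differ statistically between two groups. On a microarray, for example, perhaps 1,000 or so out of tens of thousands of probes are differential (the number varies a lot with the chosen thresholds).
  • Hub genes are genes with high degree, that is, high connectivity in a gene expression network. Betweenness and similar measures play no part in the definition. Hub selection is also quite subjective: there is no fixed rule on whether to take the top 5% or the top 10%, although 5% is the usual suggestion. In other words, it is a loose criterion.
  • Key genes: some people pick the top-ranked hubs, others pick the differentially expressed genes with the smallest p-values. So what actually counts as a key gene? Broadly, if knocking the gene down makes the phenotype clearly disappear, it is certainly a key gene. But how do you pick one from bioinformatic analysis alone? No single method settles this. This post looks only at the expression-network angle; I plan to write a later post on screening key genes from several angles. Key genes are a narrower set than hubs, but a hub is not necessarily a key gene, and a key gene is not necessarily a hub.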

In short, in terms of number or scope:

DEGs > Hubs > key genes (candidate genes)
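
To make the "top 5% or top 10%" point concrete, here is a minimal sketch of picking hubs by degree rank. The function name `top_fraction_hubs` and the toy degree table are made up for illustration; the only thing taken from the text above is the idea of cutting the degree ranking at a chosen fraction.

```python
import math

def top_fraction_hubs(degree, fraction=0.05):
    """Return the genes whose degree ranks in the top `fraction` of the network."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank from highest to lowest degree; ties are broken by gene name
    # so the result is reproducible.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    # round() guards against floating-point noise before taking the ceiling.
    n_hubs = max(1, math.ceil(round(len(ranked) * fraction, 9)))
    return ranked[:n_hubs]

# Hypothetical toy network of 20 genes: g0 and g1 are highly connected,
# the other 18 genes have degree 1 or 2.
toy_degree = {f"g{i}": 1 + (i % 2) for i in range(2, 20)}
toy_degree.update({"g0": 19, "g1": 10})

hubs_5 = top_fraction_hubs(toy_degree, 0.05)   # top 5% of 20 genes
hubs_10 = top_fraction_hubs(toy_degree, 0.10)  # top 10% of 20 genes
print(hubs_5, hubs_10)
```

Moving the cutoff from 5% to 10% changes the hub list without any change in the data, which is exactly why the threshold is a loose, user-chosen setting.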

------------------------------------------------

OK, on to the main text.

Hub genes

The WGCNA approach typically deals with the identification of gene modules by using the gene expression levels that are highly correlated across samples. This technique has been successfully utilized to detect gene modules in Arabidopsis, rice, maize and poplar for various biotic and abiotic stresses. Further, this approach also leads to construction of Gene Co-expression Network (GCN), a scale free network, where genes are represented as nodes and edges depict associations among genes. In such network, highly connected genes are called hub genes, which are expected to play an important role in understanding the biological mechanism of response under stresses/conditions. Identification of hub genes will also help in mitigating the stress in plants through genetic engineering. The existing approaches have mainly focused on hub gene identification, based only on gene connection degrees in the GCN. Moreover, these techniques select such genes empirically without any statistical criteria. Besides, few approaches can be found in the literature for the identification of hub nodes in a scale free network.

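From this we can see that hub genes are defined within a scale-free gene co-expression network (GCN) and correspond to degree. Existing methods focus on identifying hub genes from connection degree in the GCN, and they select them empirically, without any statistical criterion. In addition, the literature offers few methods for identifying hub nodes in scale-free networks.
So the authors proposed an algorithm and wrote a package that assigns a p-value to each hub gene, so that the number of hub genes can be reduced with a p-value cutoff.
Package link
Paper link 1
Paper link 2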
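
As a rough illustration of "connection degree in the GCN", the sketch below computes a WGCNA-style unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β, and sums it per gene to get connectivity. This is not the p-value algorithm from the package mentioned above. The expression values, the gene names, and the choice β = 6 are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile with zero variance has no correlation")
    return cov / (sx * sy)

def connectivity(expr, beta=6):
    """Sum of soft-threshold adjacencies |cor|**beta over all other genes."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k

# Hypothetical expression profiles across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [1.1, 2.0, 2.9, 4.2, 5.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
k = connectivity(expr)
for gene in sorted(k, key=k.get, reverse=True):
    print(gene, round(k[gene], 3))
```

Raising the correlation to a power shrinks weak correlations toward zero, so geneD, which only loosely tracks the others, ends up with almost no connectivity while geneA, geneB and geneC reinforce each other.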

It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins (“hubs”). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many “shortest paths” going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., “bottleneck-ness”) is a much more significant indicator of essentiality than degree (i.e., “hub-ness”). Furthermore, bottlenecks correspond to the dynamic components of the interaction network: they are significantly less well coexpressed with their neighbors than nonbottlenecks, implying that expression dynamics is wired into the network topology.
A network is a graph consisting of a number of nodes with edges connecting them. Recently, network models have been widely applied to biological systems. Here, we are mainly interested in two types of biological networks: the interaction network, where nodes are proteins and edges connect interacting partners; and the regulatory network, where nodes are proteins and edges connect transcription factors and their targets. Betweenness is one of the most important topological properties of a network. It measures the number of shortest paths going through a certain node. Therefore, nodes with the highest betweenness control most of the information flow in the network, representing the critical points of the network. We thus call these nodes the “bottlenecks” of the network. Here, we focus on bottlenecks in protein networks. We find that, in the regulatory network, where there is a clear concept of information flow, protein bottlenecks indeed have a much higher tendency to be essential genes. In this type of network, betweenness is a good predictor of essentiality. Biological researchers can therefore use the betweenness as one more feature to choose potential targets for detailed analysis.
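
To make "the number of shortest paths going through a certain node" concrete, here is a small sketch of betweenness centrality for an unweighted, undirected graph, using Brandes' algorithm. The graph is a made-up example: two triangles joined through a single bridge node `x`. Node names and layout are illustrative only.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality (Brandes' algorithm) for an unweighted,
    undirected graph given as {node: set_of_neighbors}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Two triangles (a, b, c) and (d, e, f) joined only through x.
toy = build_graph([("a", "b"), ("a", "c"), ("b", "c"),
                   ("d", "e"), ("d", "f"), ("e", "f"),
                   ("c", "x"), ("x", "d")])
bc = betweenness(toy)
degree = {v: len(nbrs) for v, nbrs in toy.items()}
print(bc["x"], degree["x"])
```

Node `x` has only two neighbors, yet every path between the two triangles runs through it, so it has the highest betweenness in the graph: a bottleneck that is not a hub.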


Below is an explanation of the difference between hubs and bottlenecks.

Central complex members have a low betweenness and are hub–nonbottlenecks.

Because of the high connectivity inside these complexes, paths can go through them and all their neighbors. On the other hand, hub–bottlenecks tend to correspond to highly central proteins that connect several complexes or are peripheral members of central complexes.

The fact that they have a high betweenness suggests that these proteins are not, however, simply members of large protein complexes (which is true for nonbottleneck–hubs), but are those members that connect the complex to the rest of the graph; in a sense, real connectivity bottlenecks. While hub–nonbottlenecks mainly consist of structural proteins, hub–bottlenecks are more likely to be part of signal transduction pathways.

In short:
  • Hub–nonbottlenecks are mostly structural proteins sitting inside a complex.
  • Hub–bottlenecks are more likely to be part of signal transduction pathways; they link a complex to the rest of the network and are the real connectivity bottlenecks.
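
The four categories can be sketched as a simple cross of two rankings. In the example below, a node is a hub if its degree is in the top fraction and a bottleneck if its betweenness is in the top fraction. The 20% cutoff, the function names, and the degree and betweenness values are all assumptions chosen for illustration.

```python
import math

def top_set(scores, fraction):
    """Nodes ranked in the top `fraction` by score (ties broken by name)."""
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    n_top = max(1, math.ceil(round(len(ranked) * fraction, 9)))
    return set(ranked[:n_top])

def classify(degree, betweenness, fraction=0.2):
    """Label every node as hub/nonhub crossed with bottleneck/nonbottleneck."""
    hubs = top_set(degree, fraction)
    bottlenecks = top_set(betweenness, fraction)
    labels = {}
    for node in degree:
        hub_part = "hub" if node in hubs else "nonhub"
        bn_part = "bottleneck" if node in bottlenecks else "nonbottleneck"
        labels[node] = f"{hub_part}-{bn_part}"
    return labels

# Hypothetical values for ten proteins.
toy_degree = {"p1": 12, "p2": 11, "p3": 3, "p4": 4, "p5": 2,
              "p6": 2, "p7": 1, "p8": 1, "p9": 1, "p10": 1}
toy_betweenness = {"p1": 50.0, "p2": 5.0, "p3": 40.0, "p4": 4.0, "p5": 3.0,
                   "p6": 2.0, "p7": 0.0, "p8": 0.0, "p9": 0.0, "p10": 0.0}

labels = classify(toy_degree, toy_betweenness)
for node in ("p1", "p2", "p3", "p4"):
    print(node, labels[node])
```

Here `p1` comes out as a hub-bottleneck, `p2` as a hub-nonbottleneck (many partners but few paths through it, as expected for a member inside a dense complex), and `p3` as a nonhub-bottleneck.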

Furthermore, hub–bottlenecks are (by construction) the most efficient in disrupting the network upon hub removal. This relates nicely to the date/party-hub concept by Han et al.: hub–bottlenecks tend to be date-hubs, whereas hub–nonbottlenecks tend to be party-hubs.

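In other words, among hubs, removing a hub-bottleneck disrupts the network most effectively. This lines up well with the hub concept of Han et al.: hub-bottlenecks tend to be date hubs, and hub-nonbottlenecks tend to be party hubs. (Han's paper makes this clear: date hubs are more likely to be the organizers and maintainers of the overall architecture, the big bosses.) Han et al. published this view in Nature, and it is summarized below.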

The Nature paper by Han et al. mentioned above:
https://www.nature.com/articles/nature02555
In apparently scale-free protein–protein interaction networks, or ‘interactome’ networks, most proteins interact with few partners, whereas a small but significant proportion of proteins, the ‘hubs’, interact with many partners.
That is, in a scale-free interactome most proteins have only a few partners, and the hubs are the small fraction that have many.

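Non-hub bottlenecks are usually less coexpressed with their neighbors than non-hub non-bottleneck proteins. This fits the observation that betweenness is an indicator of the average correlation with neighboring proteins: non-hub bottlenecks are rarely members of complexes, and most of them are regulatory proteins and signal transduction machinery.
Any scale-free network, biological or not, is resistant to random node removal but very sensitive to the removal of hubs.
Roughly speaking, in yeast, knocking out genes that encode hub proteins is about three times more likely to be lethal than knocking out non-hub genes. Han et al. found two types of hub: party hubs, which interact with most of their partners at the same time, and date hubs, which bind different partners at different times or locations.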
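
The robustness claim above (tolerant of random node loss, fragile when a hub is lost) can be illustrated by measuring the largest connected component after deleting a node. This is a toy sketch on a made-up hub-and-spoke graph, not a simulation of a real interactome. The graph and the helper names are assumptions.

```python
def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def largest_component(adj, removed=()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

# Hypothetical network: one hub with eight partners, plus one edge
# between two of the partners.
edges = [("hub", f"leaf{i}") for i in range(8)] + [("leaf0", "leaf1")]
net = build_graph(edges)

intact = largest_component(net)
after_leaf = largest_component(net, removed={"leaf7"})
after_hub = largest_component(net, removed={"hub"})
print(intact, after_leaf, after_hub)
```

Deleting a peripheral node barely changes the network, while deleting the hub shatters it into small pieces. In a real scale-free network most nodes are peripheral, so a randomly chosen deletion usually resembles the first case.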


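Thus, based on their partners' expression profiles, hubs in the yeast interaction network fall into two classes: date hubs and party hubs. This distinction suggests a model of organized modularity for the yeast proteome, in which modules are connected through regulators, mediators or adaptors: these are the date hubs. Party hubs are integral components inside distinct modules. They are important for the functions those modules mediate (and so tend to be essential proteins), and they tend to operate at a lower level of proteome organization. (Roughly: date hubs are the big bosses who coordinate and connect, while party hubs are the small bosses inside a module.) The authors propose that date hubs are essential for the global organization of biological modules in the whole proteome network and take part in wide-range integrating connections (although some date hubs may simply be shared and mediate local functions within or across modules). Key features of interaction networks, such as genetic robustness and resilience to environmental perturbation, are easier to understand with this modular organization as a framework.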
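
Since the date/party split rests on the partners' expression profiles, one way to picture it is the average Pearson correlation (PCC) between a hub and its partners: high for party hubs, low for date hubs. The sketch below follows that idea only. The cutoff of 0.5, the function names, and every expression value are assumptions made for illustration, not figures taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_pcc(hub_profile, partner_profiles):
    """Mean correlation between a hub and each of its interaction partners."""
    values = [pearson(hub_profile, p) for p in partner_profiles]
    return sum(values) / len(values)

def hub_type(hub_profile, partner_profiles, cutoff=0.5):
    """'party' if the hub is coexpressed with its partners on average,
    otherwise 'date'. The cutoff is a hypothetical, user-chosen value."""
    return "party" if average_pcc(hub_profile, partner_profiles) >= cutoff else "date"

# Hypothetical expression profiles over six conditions.
hub_profile = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Partners that rise and fall together with the hub.
party_partners = [[2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
                  [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]]

# Partners expressed under different conditions from one another.
date_partners = [[6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
                 [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]

print(hub_type(hub_profile, party_partners))
print(hub_type(hub_profile, date_partners))
```

The same hub profile is labeled differently depending on whether its partners are expressed together with it (one module acting at once) or at different times (a connector shared among modules).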

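So the so-called date hubs are the hubs with high betweenness (hub-bottlenecks), while party hubs are more likely to be hubs with low betweenness (hub-nonbottlenecks).
This finding may point to a link between the dynamic and topological properties of interaction networks that had not been recognized before.
The authors believe that, despite current practical difficulties, betweenness will prove to be a very useful tool for many protein networks, especially those with directed edges (regulatory networks).
In summary, they provide an integrated analysis of two complementary topological properties that applies to different network types. This integrated approach reveals previously unknown connections among network topology, protein essentiality and expression dynamics. They believe that integrated approaches like the one presented here will be crucial for future predictive models.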

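Author: Y大宽
Link: https://www.jianshu.com/p/e2acfee2ba5f
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, please credit the source.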
