Overview
本节的目的是使用您在前几节中学习的内容创建协议。在本教程中,我们将准备抗体抗原结合结构,在抗体上进行点突变,记录结合强度的变化,并生成显示能量差异的热图。该方法广泛用于抗体界面设计,以提高结合亲和力或扩大结合范围。
Protocol
整个Protocol分为八个步骤:
- 使用FastRelax()准备结构
- 编写函数以执行突变PackMover()
- 编写函数以解除绑定抗体抗原结合结构unbind()
- 编写获取野生型氨基酸的函数
- 编写函数以正确变异和pack特定的残基并输出能量情况
- 循环通过接口位置,用输出文件将它们变为20个氨基酸
- 汇总结合能分析的所有输入文件
Loading structures
from pyrosetta import *
init()
pose = pose_from_pdb("inputs/1jhl.clean.pdb")
testPose = Pose()
testPose.assign(pose)
print(testPose)
PDB file name: inputs/1jhl.clean.pdb
Total residues: 353
Sequence: DIELTQSPSYLVASPGETITINCRASKSISKSLAWYQEKPGKTNNLLIYSGSTLQSGIPSRFSGSGSGTDFTLTISSLEPEDFAMYICQQHNEYPWTFGGGTKLEIKRQVQLQQSGAELVRPGASVKLSCKASGYTFISYWINWVKQRPGQGLEWIGNIYPSDSYTNYNQKFKDKATLTVDKSSSTAYMQLSSPTSEDSAVYYCTRDDNYGAMDYWGQGTTVTVKVYGRCELAAAMKRMGLDNYRGYSLGNWVCAAKFESNFNTGATNRNTDGSTDYGILQINSRWWCNDGRTPGSKNLCHIPCSALLSSDITASVNCAKKIVSDGDGMNAWVAWRKHCKGTDVNVWIRGCRL
Fold tree:
FOLD_TREE EDGE 1 108 -1 EDGE 1 109 1 EDGE 109 224 -1 EDGE 1 225 2 EDGE 225 353 -1
Step 1. Prepare the starting structure with FastRelax()
适当relax结构对于Rosetta的设计至关重要。非松弛结构可能无法很好地克服不良的全局能量,因此会影响以下关于结合能的分析。
FastRelax()用于relax结构。虽然我们希望将结构置于最低能量状态,但我们希望尽可能保留晶体结构中的骨架信息(最低RMSD)。因此,我们将constrain_relax_to_start_coords(True)应用于FastRelax()。
由于FastRelax()占用了大量资源,运行它似乎会使notebook崩溃,因此我们删除了“应用”部分(执行松弛的部分),并打印出松弛Mover()。我们将松弛的结构上传到输入文件夹,以便进一步分析。
坐标约束relax是一种常规准备。有关此准备方法的更高级版本,请参阅:https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059004
from pyrosetta.rosetta.protocols.relax import FastRelax
relax = FastRelax()
scorefxn = get_fa_scorefxn()
relax.set_scorefxn(scorefxn)
relax = rosetta.protocols.relax.FastRelax()
relax.constrain_relax_to_start_coords(True)
print(relax)
#relax.apply(testPose)
#testPose.dump_pdb('test.relax.pdb')
Writing Function in Python
函数是组织代码的好方法。从本节开始,我将介绍一些功能来促进协议。
要在python中定义函数,需要使用“def”关键字。函数既可以返回值,也可以简单地执行代码块。定义的函数也可以在主函数或代码的其他部分中调用。
Step 2. Write the function for mutation
此函数利用了先前教程中演示的ResidueSelectorMover()。允许设计和repack突变位置,而附近的残留物仅限于repack。最后的突变将使用PackMover()执行。
from pyrosetta.rosetta.core.pack.task import *
from pyrosetta.rosetta.protocols import *
from pyrosetta.rosetta.core.select import *
def pack(pose, posi, amino, scorefxn):
# Select Mutate Position
mut_posi = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector()
mut_posi.set_index(posi)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(mut_posi.apply(pose)))
# Select Neighbor Position
nbr_selector = pyrosetta.rosetta.core.select.residue_selector.NeighborhoodResidueSelector()
nbr_selector.set_focus_selector(mut_posi)
nbr_selector.set_include_focus_in_subset(True)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(nbr_selector.apply(pose)))
# Select No Design Area
not_design = pyrosetta.rosetta.core.select.residue_selector.NotResidueSelector(mut_posi)
#print(pyrosetta.rosetta.core.select.get_residues_from_subset(not_design.apply(pose)))
# The task factory accepts all the task operations
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()
# These are pretty standard
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.InitializeFromCommandline())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.IncludeCurrent())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.NoRepackDisulfides())
# Disable Packing
prevent_repacking_rlt = pyrosetta.rosetta.core.pack.task.operation.PreventRepackingRLT()
prevent_subset_repacking = pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(prevent_repacking_rlt, nbr_selector, True )
tf.push_back(prevent_subset_repacking)
# Disable design
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(
pyrosetta.rosetta.core.pack.task.operation.RestrictToRepackingRLT(),not_design))
# Enable design
aa_to_design = pyrosetta.rosetta.core.pack.task.operation.RestrictAbsentCanonicalAASRLT()
aa_to_design.aas_to_keep(amino)
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(aa_to_design, mut_posi))
# Create Packer
packer = pyrosetta.rosetta.protocols.minimization_packing.PackRotamersMover()
packer.task_factory(tf)
#Perform The Move
if not os.getenv("DEBUG"):
packer.apply(pose)
#Load the previously-relaxed pose.
relaxPose = pose_from_pdb("inputs/1jhl.relax.pdb")
#Clone it
original = relaxPose.clone()
scorefxn = get_score_function()
print("\nOld Energy:", scorefxn(original),"\n")
pack(relaxPose, 130, 'A', scorefxn)
print("\nNew Energy:", scorefxn(relaxPose),"\n")
#Set relaxPose back to original
relaxPose = original.clone()
Old Energy: -1056.7705978308402
New Energy: -1057.2353638196532
Step 3. unbind()
该函数用于结合能分析。为了计算结合能,我们需要对结合态结构的总能量进行评分,分离(解除结合)抗原和抗体,然后对未结合态总能量进行打分。bound energy - unbound energy。您还可以使用位于rosetta.protocols.analysis中的InterfaceAnalyzerOver(IAM)。
from pyrosetta.rosetta.protocols import *
def unbind(pose, partners):
STEP_SIZE = 100
JUMP = 2
docking.setup_foldtree(pose, partners, Vector1([-1,-1,-1]))
trans_mover = rigid.RigidBodyTransMover(pose,JUMP)
trans_mover.step_size(STEP_SIZE)
trans_mover.apply(pose)
##Reset the original pose
relaxPose = original.clone()
scorefxn = get_score_function()
bound_score = scorefxn(relaxPose)
print("\nBound State Score",bound_score,"\n")
unbind(relaxPose, "HL_A")
unbound_score = scorefxn(relaxPose)
print("\nUnbound State Score", unbound_score,"\n")
print('dG', bound_score - unbound_score, 'Rosetta Energy Units (REU)')
Bound State Score -1056.7706161583792
Unbound State Score -998.7222028352029
dG -58.048413323176305 Rosetta Energy Units (REU)
Step 4. wildtype()
评估结合改善的一个重要指标是突变体结合能与野生型结合能的比率。此函数返回给定位置的野生型氨基酸。
def wildtype(aatype = 'AA.aa_gly'):
AA = ['G','A','L','M','F','W','K','Q','E','S','P'
,'V','I','C','Y','H','R','N','D','T']
AA_3 = ['AA.aa_gly','AA.aa_ala','AA.aa_leu','AA.aa_met','AA.aa_phe','AA.aa_trp'
,'AA.aa_lys','AA.aa_gln','AA.aa_glu', 'AA.aa_ser','AA.aa_pro','AA.aa_val'
,'AA.aa_ile','AA.aa_cys','AA.aa_tyr','AA.aa_his','AA.aa_arg','AA.aa_asn'
,'AA.aa_asp','AA.aa_thr']
for i in range(0, len(AA_3)):
if(aatype == AA_3[i]):
return AA[i]
print(wildtype(str(relaxPose.aa(130))))
Setp 4a. Exploration.
我们也可以使用一些实用函数来代替这个函数。
我们可以使用返回的序列STRING。注意,在Rosetta中,字符串的索引为0。
print(relaxPose.sequence())
print("\n",relaxPose.sequence()[2])
DIELTQSPSYLVASPGETITINCRASKSISKSLAWYQEKPGKTNNLLIYSGSTLQSGIPSRFSGSGSGTDFTLTISSLEPEDFAMYICQQHNEYPWTFGGGTKLEIKRQVQLQQSGAELVRPGASVKLSCKASGYTFISYWINWVKQRPGQGLEWIGNIYPSDSYTNYNQKFKDKATLTVDKSSSTAYMQLSSPTSEDSAVYYCTRDDNYGAMDYWGQGTTVTVKVYGRCELAAAMKRMGLDNYRGYSLGNWVCAAKFESNFNTGATNRNTDGSTDYGILQINSRWWCNDGRTPGSKNLCHIPCSALLSSDITASVNCAKKIVSDGDGMNAWVAWRKHCKGTDVNVWIRGCRL
E
现在让我们从保存化学信息的ResidueType对象中获取一些信息。
print(relaxPose.residue_type(1))
ASP:NtermProteinFull (ASP, D):
Base: ASP
Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS SC_ORBITALS POLAR CHARGED NEGATIVE_CHARGE METALBINDING ALPHA_AA L_AA
Variant types: LOWER_TERMINUS_VARIANT
Main-chain atoms: N CA C
Backbone atoms: N CA C O 1H 2H 3H HA
Side-chain atoms: CB CG OD1 OD2 1HB 2HB
最后,让我们从类型中获取残留物
resT = relaxPose.residue_type(1)
print(resT.base_name())
print(resT.name3())
print(resT.name1())
ASP
ASP
D
Step 5. Integrate functions for mutate and output
def mutate(pose, posi, amino, partners):
#main function for mutation
CSV_PREFIX = 'notec'
PDB_PREFIX = 'notep'
#Initiate test pose
testPose = Pose()
testPose.assign(pose)
#Initiate energy function
scorefxn = get_fa_scorefxn()
unbind(testPose, partners)
native_ub = scorefxn(testPose)
testPose.assign(pose)
#Variables initiation
content = ''
name = CSV_PREFIX + str(posi)+str(amino) + '.csv'
pdbname = PDB_PREFIX + str(posi)+str(amino) + '.pdb'
wt = wildtype(str(pose.aa(posi)))
# energy before mutaion
pack(testPose, posi, amino, scorefxn)
testPose.dump_pdb(pdbname)
bound = scorefxn(testPose)
unbind(testPose, partners)
unbound = scorefxn(testPose)
binding = unbound - bound
testPose.assign(pose)
# energy after mutaion
if (wt == amino):
wt_energy = binding
else:
pack(testPose, posi, wt, scorefxn)
wtbound = scorefxn(testPose)
unbind(testPose, partners)
wtunbound = scorefxn(testPose)
wt_energy = wtunbound - wtbound
testPose.assign(pose)
content=(content+str(pose.pdb_info().pose2pdb(posi))+','+str(amino)+','
+str(native_ub)+','+str(bound)+','+str(unbound)+','+str(binding)+','
+str(wt_energy)+','+str(wt)+','+str(binding/wt_energy)+'\n')
f = open(name,'w+')
f.write(content)
f.close()
mutate(relaxPose, 130, 'A', 'HL_A')
Step 6. Loop through interface positions
以下代码循环通过所有重链和轻链位置,将它们突变为所有20个氨基酸。实际运行可能再次使笔记本崩溃。
您可以任意输出数据。pose.scores()将具有数据、pandas Dataframes数据帧。在这里,我们将简单地以csv文件的形式输出每个突变的能量信息,并将它们分类到一个文件中以供将来分析。将输出文件上载到输入文件夹以进行R分析。
在实际工作环境中,可以实现并行化。请参见PyRosetta一章。对于此任务,您不一定需要完成整个循环。我们在inputs文件夹中有一个完成的版本。
#NOTE - This takes a very long time!!
# You may skip this block to continue the tutorial with pre-configured outputs.
if not os.getenv("DEBUG"):
for i in [92,85,68,53,5,45,44,42,32,31,22,108,100]:
print("\nMutating Position: ",str(i),"\n")
for j in ['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']:
mutate(relaxPose, i, j, 'HL_A')
Analysis of binding data
在收集汇总的结合能量信息后,我们使用panda进行过滤和可视化。我们过滤掉了较低的非束缚态能量结构,即那些非束缚态总能量高于原生态的结构,并从过滤后的数据中生成热图。如果您不想在第8步完成for循环,我们将合并输出csv的完成版本上载到名为“note_output.csv”的输入文件夹。
#import modules need for analysis
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
csv_file_name = 'inputs/note_output.csv'
#Generating heatmaps
UNBOUND_CUTOFF = -995
RATIO_CUTOFF = 1.001
#load the data into a pandas dataframe
df = pd.read_csv(csv_file_name, names='Position Amino.Acid Native Bound Unbound Binding WT_Binding WT Ratio'.split(), index_col=False )
#Add wildtype AA to position (for display)
df['Position_WT_aa'] = df.Position + ' (' + df.WT + ')'
#filter values
df = df.query(f'Unbound<{UNBOUND_CUTOFF} and Ratio>{RATIO_CUTOFF}')
# convert from tidy format (https://en.wikipedia.org/wiki/Tidy_data) to a wider format
df = pd.pivot_table(df, values='Ratio',
index=['Position_WT_aa'],
columns='Amino.Acid')
#plot the heatmap
f, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(df, cmap='coolwarm', linewidths= 1, linecolor='lightgray')
plt.xlabel('Amino acid mutation');
plt.ylabel('Position chain and wild type AA');
sns.set_context("talk") #make labels bigger