Immune repertoire analysis with vdjtools

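mixcr and vdjtools are Java-based immune analysis tools that process large volumes of immune repertoire data, from raw sequences to quantified clonotypes. Make sure a working Java environment is available before using them.
Download the Java Runtime Environment (JRE) from the official site; the JRE is the environment that runs Java programs.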

java -version  # check that the Java environment works

Download and install vdjtools (latest release).
Plotting in vdjtools depends on several R visualization packages, so install the required R packages.

Install them with the command built into vdjtools:

java -jar /path to vdjtools/vdjtools-1.2.1.jar Rinstall

The packages can also be installed manually in R.


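Convert the analyzed data into a format that vdjtools recognizes. For the upstream analysis, see "Building an immune repertoire with mixcr and downstream analysis".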

Build the metadata file
The metadata file should list every sample name together with the location of each sample file.

metadata.txt
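As a rough illustration of what such a file might look like, the sketch below writes a tab-delimited metadata table. The column layout (a file path column, a sample ID column, then free annotation columns such as `disease_state` and `Sex`), the header spelling, the `NA` filler, and the sample values are all assumptions made for this example; check the vdjtools documentation for the exact header it expects.

```python
import csv
import io

def build_metadata(samples):
    """Return tab-delimited metadata text for a list of sample dicts."""
    # Collect the extra annotation columns in first-seen order.
    extra = []
    for s in samples:
        for key in s:
            if key not in ("file_name", "sample_id") and key not in extra:
                extra.append(key)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed header layout: file path first, sample id second.
    writer.writerow(["#file_name", "sample_id"] + extra)
    for s in samples:
        row = [s["file_name"], s["sample_id"]]
        row += [s.get(k, "NA") for k in extra]  # "NA" filler is hypothetical
        writer.writerow(row)
    return buf.getvalue()

# Hypothetical samples and annotations.
samples = [
    {"file_name": "sample1.txt", "sample_id": "sample1",
     "disease_state": "healthy", "Sex": "F"},
    {"file_name": "sample2.txt", "sample_id": "sample2",
     "disease_state": "patient", "Sex": "M"},
]
metadata_text = build_metadata(samples)
print(metadata_text)
```

The annotation column names are what `-f` would later refer to, so they should match the factor names used in the commands.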

# convert 
java -jar /path to vdjtools/vdjtools-1.2.1.jar Convert -S mixcr -m metadata.txt output_prefix
#or
java -jar /path to vdjtools/vdjtools-1.2.1.jar Convert -S mixcr sample1.txt sample2.txt ...  output_prefix
# /path to vdjtools/: the vdjtools installation path
# output_prefix: output path

Table after conversion

1. Basic analysis

1.1 CalcBasicStats

This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcBasicStats sample1.txt sample2.txt ... output_prefix
#or
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcBasicStats -m metadata.txt output_prefix
# /path to vdjtools/: the vdjtools installation path
# output_prefix: output path
all.basicstats.txt

Tabular output

The following table with .basicstats.txt suffix is generated:

Column              Description
sample_id           Sample unique identifier
...                 Metadata columns. See Metadata section
count               Number of reads in a given sample
diversity           Number of clonotypes in a given sample
mean_frequency      Mean clonotype frequency
geomean_frequency   Geometric mean of clonotype frequency
nc_diversity        Number of non-coding clonotypes
nc_frequency        Frequency of reads that belong to non-coding clonotypes
mean_cdr3nt_length  Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency
mean_insert_size    Mean number of inserted random nucleotides in CDR3 sequence. Characterizes V-J insert for receptor chains without D segment, or a sum of V-D and D-J insert sizes
mean_ndn_size       Mean number of nucleotides that lie between V and J segment sequences in CDR3
convergence         Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence
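To make a few of these columns concrete, here is a minimal sketch that recomputes them for a hypothetical four-clonotype sample. It follows the column descriptions above and is not vdjtools' own implementation; the toy sequences and the `basic_stats` helper are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy sample: (read count, CDR3 nucleotide, CDR3 amino acid)
clonotypes = [
    (50, "TGTGCCAGCAGTTTC", "CASSF"),
    (30, "TGCGCCAGCAGTTTC", "CASSF"),
    (15, "TGTGCCAGCAGCCTGTTC", "CASSLF"),
    (5, "TGTGCCTGGTTC", "CAWF"),
]

def basic_stats(clonotypes):
    count = sum(c for c, _, _ in clonotypes)   # total reads
    diversity = len(clonotypes)                # number of clonotypes
    freqs = [c / count for c, _, _ in clonotypes]
    # Group distinct nucleotide variants under each amino acid sequence.
    nt_by_aa = defaultdict(set)
    for _, nt, aa in clonotypes:
        nt_by_aa[aa].add(nt)
    return {
        "count": count,
        "diversity": diversity,
        "mean_frequency": sum(freqs) / diversity,
        "geomean_frequency": math.exp(
            sum(math.log(f) for f in freqs) / diversity),
        # CDR3 length weighted by clonotype frequency.
        "mean_cdr3nt_length": sum(
            f * len(nt) for f, (_, nt, _) in zip(freqs, clonotypes)),
        # Mean number of nucleotide variants per amino acid sequence.
        "convergence": sum(len(s) for s in nt_by_aa.values()) / len(nt_by_aa),
    }

stats = basic_stats(clonotypes)
print(stats)
```

In this toy sample two nucleotide sequences encode CASSF, which is what pushes the convergence value above 1.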

1.2 CalcSegmentUsage

This routine computes Variable (V) and Joining (J) segment usage vectors.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "disease_state" -m metadata.txt ./results/disease_state
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "Sex" -m metadata.txt ./results/Sex
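# -p: plot the results (depends on the R packages)
# -f: the factor to group by; the grouping information comes from the metadata file
# --plot-type png: write the plot as a PNG image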
output

disease_state.segments.wt.V
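A segment usage vector is simply the share of each segment within a sample. The sketch below computes one for V segments from a hypothetical list of (read count, V segment) pairs. The `weighted` switch, which chooses between counting reads and counting unique clonotypes, is an assumption for this example and not a vdjtools parameter name.

```python
from collections import Counter

def segment_usage(clonotypes, weighted=True):
    """Return {segment: frequency} for (read_count, segment) pairs."""
    usage = Counter()
    for count, segment in clonotypes:
        # Weighted: add the reads. Unweighted: count each clonotype once.
        usage[segment] += count if weighted else 1
    total = sum(usage.values())
    return {segment: n / total for segment, n in usage.items()}

# Hypothetical sample: (read count, V segment)
sample = [
    (50, "TRBV5-1"),
    (30, "TRBV7-2"),
    (15, "TRBV5-1"),
    (5, "TRBV20-1"),
]
v_usage_reads = segment_usage(sample)
v_usage_clonotypes = segment_usage(sample, weighted=False)
print(v_usage_reads)
print(v_usage_clonotypes)
```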

1.3 CalcSpectratype

Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcSpectratype -a -m metadata.txt output_prefix
#-a :Will use CDR3 amino acid sequences for calculation instead of nucleotide ones

output

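aa: frequency distribution of CDR3 amino acid sequence lengths
insert: frequency distribution of the lengths of the nucleotide sequences inserted at the V-J/V-D/D-J junctions in CDR3
ndn: frequency distribution of the lengths of the nucleotide sequence between the V and J segments in CDR3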
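The idea behind a spectratype can be sketched in a few lines: sum the reads of all clonotypes that share a CDR3 length, then normalize. The toy data and the `use_aa` switch (mirroring what `-a` is described to do) are assumptions for illustration only.

```python
from collections import Counter

def spectratype(clonotypes, use_aa=False):
    """Return {CDR3 length: fraction of reads} for a list of clonotypes."""
    hist = Counter()
    for count, cdr3nt, cdr3aa in clonotypes:
        seq = cdr3aa if use_aa else cdr3nt
        hist[len(seq)] += count  # histogram is weighted by read count
    total = sum(hist.values())
    return {length: reads / total for length, reads in sorted(hist.items())}

# Hypothetical toy sample: (read count, CDR3 nucleotide, CDR3 amino acid)
clonotypes = [
    (50, "TGTGCCAGCAGTTTC", "CASSF"),
    (30, "TGCGCCAGCAGTTTC", "CASSF"),
    (15, "TGTGCCAGCAGCCTGTTC", "CASSLF"),
    (5, "TGTGCCTGGTTC", "CAWF"),
]
nt_spectratype = spectratype(clonotypes)
aa_spectratype = spectratype(clonotypes, use_aa=True)
print(nt_spectratype)
print(aa_spectratype)
```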

1.4 PlotFancySpectratype

Plots a spectratype that also displays CDR3 lengths for the top N clonotypes in a given sample. This plot makes it possible to detect highly expanded clonotypes.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 sample1.txt output_prefix
# -t: Number of top clonotypes to visualize. Should not exceed 20, default is 10
# works on a single sample
fancyspectra

1.5 PlotFancyVJUsage

Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotFancyVJUsage sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes
fancyvj.wt

1.6 PlotSpectratypeV

Plots a detailed spectratype that additionally displays the CDR3 length distribution for clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases that could arise under pathological conditions.

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotSpectratypeV sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes
# -t: Number of top (by frequency) V segments to visualize. Should not exceed 12, default is 12
spectraV.wt

2. Diversity estimation

2.1 PlotQuantileStats

Plots a three-layer donut chart to visualize the repertoire clonality.

• The first layer (“set”) includes the frequency of singleton (“1”, met once), doubleton (“2”, met twice) and high-order (“3+”, met three or more times) clonotypes.
• The second layer (“quantile”) displays the abundance of the top 20% (“Q1”), next 20% (“Q2”), ... (up to “Q5”) clonotypes for clonotypes from the “3+” set.
• The last layer (“top”) displays the individual abundances of the top N clonotypes.
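The first two layers can be sketched as follows. This is one reading of the description above, not vdjtools' code: quintiles are formed by ranking the “3+” clonotypes by read count and cutting the ranking into five groups of equal clonotype number, which is an assumption. The read counts are hypothetical.

```python
def quantile_stats(counts):
    """Return (set layer, quantile layer) as fractions of all reads."""
    total = sum(counts)
    sets = {"1": 0, "2": 0, "3+": 0}
    high_order = []
    for c in counts:
        if c == 1:
            sets["1"] += c
        elif c == 2:
            sets["2"] += c
        else:
            sets["3+"] += c
            high_order.append(c)
    # Rank the "3+" clonotypes from most to least abundant.
    high_order.sort(reverse=True)
    quantiles = {f"Q{i}": 0 for i in range(1, 6)}
    n = len(high_order)
    for rank, c in enumerate(high_order):
        q = rank * 5 // n + 1  # assumed rule: equal-sized rank groups
        quantiles[f"Q{q}"] += c
    set_layer = {k: v / total for k, v in sets.items()}
    quantile_layer = {k: v / total for k, v in quantiles.items()}
    return set_layer, quantile_layer

# Hypothetical read counts, one per clonotype.
counts = [28, 20, 12, 10, 8, 6, 4, 3, 3, 3, 2, 1]
set_layer, quantile_layer = quantile_stats(counts)
print(set_layer)
print(quantile_layer)
```

The quantile fractions add up to the “3+” fraction, which is why the second ring of the donut sits inside the “3+” arc of the first.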

java -jar /path to vdjtools/vdjtools-1.2.1.jar PlotQuantileStats -t 10 sample.txt output_prefix
#-t:Number of top clonotypes to visualize. Should not exceed 10, default is 5
qstat

2.2 RarefactionPlot

Plots rarefaction curves for specified list of samples, that is, the dependencies between sample diversity and sample size.

java -jar /path to vdjtools/vdjtools-1.2.1.jar RarefactionPlot -m metadata.txt output_prefix
#-f: factor

rarefaction.strict

Solid and dashed lines mark interpolated and extrapolated regions of rarefaction curves respectively,
points mark exact sample size and diversity. Shaded areas mark 95% confidence intervals.
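For intuition about the interpolated part of such a curve, the sketch below uses the classical hypergeometric rarefaction formula: the expected number of distinct clonotypes when m reads are drawn without replacement from a sample of n reads. The source does not state which estimator vdjtools uses, so treat this as a textbook illustration with hypothetical counts; it does not cover extrapolation or confidence intervals.

```python
from math import comb

def rarefy(counts, m):
    """Expected number of clonotypes in a random subsample of m reads."""
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the total read count")
    denom = comb(n, m)
    # A clonotype with c reads is missed with probability C(n-c, m)/C(n, m).
    return sum(1 - comb(n - c, m) / denom for c in counts)

# Hypothetical read counts, one per clonotype.
counts = [5, 3, 1, 1]
curve = [(m, rarefy(counts, m)) for m in range(0, sum(counts) + 1)]
for m, expected in curve:
    print(m, round(expected, 3))
```

The curve starts at 0, passes through exactly 1 clonotype for a single read, and reaches the observed diversity at the full sample size.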

2.3 CalcDiversityStats

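Diversity estimation. Two tables are written: one holds diversity estimates computed on the original data, the other (with the .resampled suffix) holds estimates computed on resampled data.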

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcDiversityStats -m metadata.txt output_prefix
all.diversity.strict.resampled
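The source does not list which indices the tables contain. As an assumption, the sketch below shows three estimators that are common in repertoire diversity analysis (Shannon entropy, the inverse Simpson index, and Chao1), computed from hypothetical read counts. It is meant to clarify what such numbers express, not to reproduce vdjtools output.

```python
import math

def shannon(counts):
    """Shannon entropy (natural log) of the clonotype frequencies."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)

def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared frequencies."""
    n = sum(counts)
    return 1 / sum((c / n) ** 2 for c in counts)

def chao1(counts):
    """Chao1 lower-bound estimate of total diversity."""
    observed = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form used when there are no doubletons.
        return observed + f1 * (f1 - 1) / 2
    return observed + f1 * f1 / (2 * f2)

# Hypothetical read counts, one per clonotype.
counts = [5, 3, 2, 2, 1, 1]
print(shannon(counts), inverse_simpson(counts), chao1(counts))
```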

3. Repertoire overlap analysis

Clonotype sharing between samples

3.1 OverlapPair

Performs a comprehensive analysis of clonotype sharing for a pair of samples.

java -jar /path to vdjtools/vdjtools-1.2.1.jar OverlapPair -p --plot-area-v2 sample1.txt sample2.txt output_prefix
#-p: plot
#--plot-area-v2:Alternative plotting mode, clonotype CDR3 sequences are shown at plot sides and connected to corresponding areas with lines.

Overlap type

Shorthand  Rule                               Note
strict     CDR3nt (AND) V (AND) J (AND) SHMs  Require full match for receptor nucleotide sequence
nt         CDR3nt
ntV        CDR3nt (AND) V
ntVJ       CDR3nt (AND) V (AND) J
aa         CDR3aa
aaV        CDR3aa (AND) V
aaVJ       CDR3aa (AND) V (AND) J
aa!nt      CDR3aa (AND) ((NOT) CDR3nt)        Removes nearly all contamination bias from overlap results. Should not be used for samples from the same donor/tracking experiments
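The rules in this table amount to choosing which clonotype fields must match. The sketch below applies the key-based rules to two hypothetical samples and treats aa!nt as "same amino acid sequence, different nucleotide sequence". The `strict` rule is left out because the toy records carry no SHM information. The field names and helper functions are invented for illustration.

```python
# Matching key per rule: which fields must be identical in both samples.
KEYS = {
    "nt": lambda c: (c["cdr3nt"],),
    "ntV": lambda c: (c["cdr3nt"], c["v"]),
    "ntVJ": lambda c: (c["cdr3nt"], c["v"], c["j"]),
    "aa": lambda c: (c["cdr3aa"],),
    "aaV": lambda c: (c["cdr3aa"], c["v"]),
    "aaVJ": lambda c: (c["cdr3aa"], c["v"], c["j"]),
}

def shared(sample_a, sample_b, rule):
    """Return the set of matching keys shared by two samples."""
    key = KEYS[rule]
    return {key(c) for c in sample_a} & {key(c) for c in sample_b}

def shared_aa_not_nt(sample_a, sample_b):
    """aa!nt: same CDR3 amino acids encoded by different nucleotides."""
    return {
        a["cdr3aa"]
        for a in sample_a
        for b in sample_b
        if a["cdr3aa"] == b["cdr3aa"] and a["cdr3nt"] != b["cdr3nt"]
    }

# Hypothetical clonotypes for two samples.
s1 = [
    {"cdr3nt": "TGTGCCAGCAGTTTC", "cdr3aa": "CASSF", "v": "TRBV5-1", "j": "TRBJ1-1"},
    {"cdr3nt": "TGTGCCTGGTTC", "cdr3aa": "CAWF", "v": "TRBV7-2", "j": "TRBJ2-1"},
]
s2 = [
    {"cdr3nt": "TGCGCCAGCAGTTTC", "cdr3aa": "CASSF", "v": "TRBV5-1", "j": "TRBJ1-1"},
    {"cdr3nt": "TGTGCCTGGTTC", "cdr3aa": "CAWF", "v": "TRBV7-2", "j": "TRBJ2-7"},
]
overlap_sizes = {rule: len(shared(s1, s2, rule)) for rule in KEYS}
print(overlap_sizes)
print(shared_aa_not_nt(s1, s2))
```

The looser the rule, the more clonotypes count as shared, so the same pair of samples can report quite different overlap sizes.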
strict.paired.scatter

paired.strict.table.collapsed

Clonotype scatterplot. Main frame contains a scatterplot of clonotype abundances (overlapping clonotypes only) and a linear regression. Point size is scaled to the geometric mean of clonotype frequency in both samples. Scatterplot axes represent log10 clonotype frequencies in each sample. Two marginal histograms show the overlapping (red) and total clonotype (grey) abundance distributions in corresponding sample. Histograms are weighted by clonotype abundance, i.e. they display read distribution by clonotype size.
Shared clonotype abundance plot. Plot shows details for top 20 clonotypes shared between samples, as well as collapsed (“NotShown”) and non-overlapping (“NonOverlapping”) clonotypes. Clonotype CDR3 amino acid sequence is plotted against the sample where the clonotype reaches maximum abundance.

3.2 CalcPairwiseDistances

Performs an all-versus-all pairwise overlap for a list of samples and computes a set of repertoire similarity measures. At least 3 samples should be provided.

java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p  [sample1.txt sample2.txt sample3.txt or -m metadata.txt] output_prefix
#-p: plot
intersect.batch.aa

Pairwise overlap circos plot. Count, frequency and diversity panels correspond to the read count, frequency (both non-symmetric) and the total number of clonotypes that are shared between samples. Pairwise overlaps are stacked, i.e. segment arc length is not equal to sample size.

3.3 ClusterSamples

Runs a cluster analysis, taking the text output of CalcPairwiseDistances as input.

java -jar /path to vdjtools/vdjtools-1.2.1.jar ClusterSamples -p  input_prefix output_prefix
# input_prefix is the output_prefix that was passed to CalcPairwiseDistances (do not add a suffix)
#-p: plot
#-f: factor
#-n:Specifies if plotting factor is continuous

For example:
java -jar /path to vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p e:/data/ -m metadata.txt e:/results/all
java -jar /path to vdjtools/vdjtools-1.2.1.jar ClusterSamples -p -f "Sex" e:/results/all e:/results/Sex
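The key point of the two commands is that the output prefix of CalcPairwiseDistances is reused as the input prefix of ClusterSamples. A small sketch of that chaining is shown below. It only builds argument lists from the flags already shown (`-p`, `-m`, `-f`) and does not run anything. The helper names, the jar path, and the result paths are placeholders chosen for this example.

```python
def vdjtools_cmd(jar, routine, *args):
    """Build the argument list for one vdjtools routine."""
    return ["java", "-jar", jar, routine, *args]

def cluster_pipeline(jar, metadata, distance_prefix, cluster_prefix, factor):
    """Return the two commands, sharing distance_prefix between them."""
    distances = vdjtools_cmd(
        jar, "CalcPairwiseDistances", "-p", "-m", metadata, distance_prefix)
    clusters = vdjtools_cmd(
        jar, "ClusterSamples", "-p", "-f", factor,
        distance_prefix, cluster_prefix)
    return [distances, clusters]

# Placeholder paths; adjust to the local installation.
commands = cluster_pipeline(
    jar="vdjtools-1.2.1.jar",
    metadata="metadata.txt",
    distance_prefix="results/all",
    cluster_prefix="results/Sex",
    factor="Sex",
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` so that the second step only starts once the first has finished successfully.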

Reference image from the official documentation

3.4 TestClusters

This routine makes it possible to test whether a given factor influences repertoire clustering. It assesses compactness of samples that have the same factor level and separation between samples with distinct factor levels for the factor specified in ClusterSamples.
(This routine can only be used when -f was specified for ClusterSamples; it checks how the factor affects the clustering.)

java -jar /path to vdjtools/vdjtools-1.2.1.jar TestClusters   input_prefix output_prefix

Image from the official documentation

3.5 TrackClonotypes

This routine performs an all-vs-all intersection between an ordered list of samples for clonotype tracking purposes. The user can specify the sample whose clonotypes will be traced, e.g. the pre-therapy sample.

java -jar /path to vdjtools/vdjtools-1.2.1.jar TrackClonotypes [options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix
#-m:metadata
#-f:factor
#-p:plot