Visualizing Multiple Sequence Alignments with the R Package ggmsa (2020-05-31)

## 1. Set the working directory

setwd("./ggmsa")
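#@ setwd() stops with an error when the target folder does not exist. A minimal sketch, assuming the ggmsa folder should be created inside the current directory on first run (work_dir is an illustrative name, not part of the original script):

```r
# Illustrative variable: folder used as the working directory for this post
work_dir <- "ggmsa"

# Only switch when we are not already inside that folder
if (basename(getwd()) != work_dir) {
  # Create the folder on first run so that setwd() does not fail
  if (!dir.exists(work_dir)) {
    dir.create(work_dir)
  }
  setwd(work_dir)
}

# Show where we ended up
getwd()
```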

## 2. Install and load the R packages

# install.packages("ggmsa")

library(ggmsa)

library(ggplot2)
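#@ The output shown in this post comes from ggmsa 0.0.4, and argument names may differ in other releases. A small sketch, using base R only, that reports missing packages and the installed ggmsa version before the rest of the script is run (pkgs and missing_pkgs are illustrative names):

```r
# Packages this post relies on
pkgs <- c("ggmsa", "ggplot2")

# requireNamespace() returns TRUE when a package can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Compare this with the 0.0.4 release used in the post
  message("ggmsa version: ", as.character(packageVersion("ggmsa")))
}
```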

## 3. Package overview

help(package = "ggmsa")

# Package: ggmsa

# Title: Plot Multiple Sequence Alignment using 'ggplot2'

# Version: 0.0.4

# Authors@R: c( person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-6485-8781")),

#              person("Lang", "Zhou",      email = "nyzhoulang@gmail.com",    role = "aut"),

#              person("Huina", "Huang",    email = "1185796994@qq.com",      role = "ctb"))

# Description: Supports visualizing multiple sequence alignment of DNA and protein sequences using 'ggplot2'. It supports a number of colour schemes, including Chemistry, Clustal, Shapely, Taylor and Zappo. Multiple sequence alignment can easily be combined with other 'ggplot2' plots, such as aligning a phylogenetic tree produced by 'ggtree' with multiple sequence alignment.

# Depends: R (>= 3.5.0)

# Imports: Biostrings, ggplot2, magrittr, tidyr, utils, stats, stringr

# Suggests: ape, cowplot, ggtree, knitr, methods, seqmagick

# License: Artistic-2.0

# Encoding: UTF-8

# LazyData: true

# RoxygenNote: 7.1.0

# VignetteBuilder: knitr

# NeedsCompilation: no

# Packaged: 2020-05-28 08:15:32 UTC; ygc

# Author: Guangchuang Yu [aut, cre] (<https://orcid.org/0000-0002-6485-8781>),

# Lang Zhou [aut],

# Huina Huang [ctb]

# Maintainer: Guangchuang Yu <guangchuangyu@gmail.com>

#  Repository: CRAN

# Date/Publication: 2020-05-28 10:50:10 UTC

# Built: R 3.6.3; ; 2020-05-29 14:03:22 UTC; windows

ls("package:ggmsa")

# [1] "available_colors" "available_fonts"  "available_msa" 

# [4] "facet_msa"        "geom_asterisk"    "geom_GC"       

# [7] "geom_msa"        "geom_seed"        "geom_seqlogo"   

# [10] "ggmotif"          "ggmsa"            "tidy_msa" 

## 4. Testing

# Plot multiple sequence alignment using ggplot2 with multiple color schemes supported.

# ggmsa supports visualizing multiple sequence alignments of DNA and protein sequences using ggplot2. It supports a number of colour schemes, including Chemistry, Clustal, Shapely, Taylor and Zappo. Multiple sequence alignment can easily be combined with other ‘ggplot2’ plots, such as aligning a phylogenetic tree produced by ‘ggtree’ with multiple sequence alignment.

### 4.1 Load sample data

# Three sample datasets ship with the ggmsa package. Note that ggmsa supports not only FASTA files but other objects as well. available_msa() lists the MSA input types currently supported.

available_msa()

# files currently available:

#  .fasta

# XStringSet objects from 'Biostrings' package:

#  DNAStringSet RNAStringSet AAStringSet BStringSet DNAMultipleAlignment RNAMultipleAlignment AAMultipleAlignment

# bin objects from 'seqmagick' package:

#  DNAbin AAbin

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa")

nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")

path.package("ggmsa")

# [1] "C:/Users/lenovo/Documents/R/win-library/3.6/ggmsa"
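#@ Since available_msa() lists AAStringSet among the supported inputs, the sample FASTA file can also be read into an object first and then passed to ggmsa(). A sketch, assuming Biostrings is installed (it is listed under Imports above); aa_set is an illustrative name:

```r
library(ggmsa)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# system.file() returns "" when the file cannot be found
stopifnot(nzchar(protein_sequences))

# Read the aligned protein sequences into an AAStringSet
aa_set <- Biostrings::readAAStringSet(protein_sequences)

# Number of sequences and the first few sequence names
length(aa_set)
head(names(aa_set))

# Plot from the object instead of the file path
ggmsa(aa_set, start = 265, end = 300)
```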

# Visualizing Multiple Sequence Alignments #

### 4.2 The simplest way to use ggmsa

?ggmsa

#@ Basic plot

ggmsa(protein_sequences, start = 265, end = 300)


#@ Adjust the parameters to customize the alignment plot

ggmsa(protein_sequences, start = 265, end = 300, font = "TimesNewRoman", color = "Clustal", char_width = 0.8, none_bg = TRUE, seq_name = TRUE)


ggmsa(protein_sequences, start = 265, end = 300, font = "TimesNewRoman", color = "Chemistry_AA", char_width = 0.8, none_bg = FALSE)
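#@ ggmsa() builds its figure with ggplot2, so the result can be stored in a variable and written to disk with ggsave(). A sketch; the file name, size and resolution are arbitrary choices for illustration:

```r
library(ggmsa)
library(ggplot2)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Keep the plot object instead of printing it straight away
p <- ggmsa(protein_sequences, start = 265, end = 300, color = "Chemistry_AA")

# Illustrative output path; a wide, short canvas suits alignment plots
out_file <- file.path(tempdir(), "msa_265_300.png")
ggsave(out_file, plot = p, width = 10, height = 3, dpi = 300)
```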


# Colour Schemes #

available_colors()

# color schemes for nucleotide sequences currently available:

#  Chemistry_NT Shapely_NT Taylor_NT Zappo_NT

# color schemes for AA sequences currently available:

#  Clustal Chemistry_AA Shapely_AA Zappo_AA Taylor_AA
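#@ The nucleotide schemes are not exercised elsewhere in this post. A sketch that applies one of them to the nt_sequences sample defined in 4.1, assuming start and end can be left at their defaults so that the whole alignment is drawn:

```r
library(ggmsa)

nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")

# Use a nucleotide colour scheme for DNA input
p_nt <- ggmsa(nt_sequences, color = "Chemistry_NT")
p_nt
```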

### 4.3 Clustal X Colour Scheme(Default)

#@ This is an emulation of the default colour scheme used for alignments in Clustal X, a graphical interface for the ClustalW multiple sequence alignment program. Each residue in the alignment is assigned a colour if the amino acid profile of the alignment at that position meets some minimum criteria specific for the residue type.

ggmsa(protein_sequences, start = 320, end = 360, color = "Clustal")


### 4.4 Color by Chemistry

#@ Amino acids are colored according to their side chain chemistry:

ggmsa(protein_sequences, start = 320, end = 360, color = "Chemistry_AA")


### 4.5 Color by Shapely

#@ This color scheme matches the RasMol amino acid and RasMol nucleotide color schemes, which are, in turn, based on Robert Fletterick’s “Shapely models”.

ggmsa(protein_sequences, start = 320, end = 360, color = "Shapely_AA")


### 4.6 Color by Taylor

#@ This color scheme is taken from Taylor (1997) and is also used in Jalview (Waterhouse et al. 2009).

ggmsa(protein_sequences, start = 320, end = 360, color = "Taylor_AA")


### 4.7 Color by Zappo

#@ This scheme colors residues according to their physico-chemical properties, and is also used in Jalview (Waterhouse et al. 2009).

ggmsa(protein_sequences, start = 320, end = 360, color = "Zappo_AA")
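#@ To compare the amino acid schemes on the same region without repeating the call, the scheme names reported by available_colors() can be looped over. A sketch; aa_schemes and plots are illustrative names:

```r
library(ggmsa)
library(ggplot2)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Amino acid colour schemes as listed by available_colors()
aa_schemes <- c("Clustal", "Chemistry_AA", "Shapely_AA", "Zappo_AA", "Taylor_AA")

# One plot per scheme, titled with the scheme name
plots <- lapply(aa_schemes, function(scheme) {
  ggmsa(protein_sequences, start = 320, end = 360, color = scheme) +
    ggtitle(scheme)
})
names(plots) <- aa_schemes

# Draw the plots one after another
for (p in plots) print(p)
```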


### 4.8 Font

#@ Several classic fonts for MSA ship with the package. Likewise, available_fonts() lists the fonts currently available.

available_fonts()

# font families currently available:

#  helvetical mono TimesNewRoman DroidSansMono

# helvetical

ggmsa(protein_sequences, start = 320, end = 360, font = "helvetical", color = "Chemistry_AA")


# TimesNewRoman

ggmsa(protein_sequences, start = 320, end = 360, font = "TimesNewRoman", color = "Chemistry_AA")


# DroidSansMono

ggmsa(protein_sequences, start = 320, end = 360, font = "DroidSansMono", color = "Chemistry_AA")


#@ If you specify font = NULL, only the tiles are plotted.

ggmsa(protein_sequences, start = 320, end = 360, font = NULL, color = "Chemistry_AA", seq_name = FALSE)


ggmsa(protein_sequences, start = 320, end = 360, font = NULL, color = "Chemistry_AA", seq_name = TRUE)


### 4.9 Character width

#@ Character width is set with char_width. The default is 0.9.

ggmsa(protein_sequences, start = 320, end = 360, char_width = 0.5, color = "Chemistry_AA")


### 4.10 Background

#@ The background is controlled by none_bg. If none_bg = TRUE, only the characters are plotted.

ggmsa(protein_sequences, start = 320, end = 360, none_bg = TRUE) + theme_void()


### 4.11 Position highlighting

#@ Positions to highlight are specified with posHighligthed (the argument name is spelled this way in the package). Keep none_bg = FALSE when highlighting positions with posHighligthed.

# Highlight non-contiguous positions

ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA",
      posHighligthed = c(185, 190))


ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA", posHighligthed = c(180, 190, 200))


# Highlight a contiguous range

ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA",
      posHighligthed = 180:200)
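#@ Single positions and ranges can be mixed in one vector. A sketch, assuming the posHighligthed argument of ggmsa 0.0.4 used in this post (other releases may name it differently, hence the check on formals()); highlight_pos, win_start and win_end are illustrative names:

```r
library(ggmsa)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Plotted window
win_start <- 164
win_end <- 213

# Two ranges plus one single position
highlight_pos <- c(170:175, 190, 200:205)

# Drop anything that falls outside the plotted window
highlight_pos <- highlight_pos[highlight_pos >= win_start & highlight_pos <= win_end]

# Only call ggmsa() this way if the installed version has the argument
if ("posHighligthed" %in% names(formals(ggmsa))) {
  print(ggmsa(protein_sequences, win_start, win_end, color = "Chemistry_AA",
              posHighligthed = highlight_pos))
} else {
  message("This ggmsa version has no posHighligthed argument; see ?ggmsa")
}
```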


### 4.12 Sequence names

#@ seq_name defaults to NULL, which means that sequence names are displayed when font = NULL but not when a font is set. Set seq_name = TRUE to display the sequence names whenever you need them.

ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA", seq_name = TRUE)


#@ If seq_name = FALSE, the sequence names are not displayed in any case.

ggmsa(protein_sequences, 164, 213, font = NULL, color = "Chemistry_AA", seq_name = FALSE)


## 5. Wrap-up

# RUNRPTEST("./ggmsa", rpackage = "ggmsa",install_method = "website", rpackage_repository = "cran")

sessionInfo()

# R version 3.6.3 (2020-02-29)

# Platform: x86_64-w64-mingw32/x64 (64-bit)

# Running under: Windows 10 x64 (build 18363)

#

# Matrix products: default

#

# locale:

#  [1] LC_COLLATE=Chinese (Simplified)_China.936

# [2] LC_CTYPE=Chinese (Simplified)_China.936 

# [3] LC_MONETARY=Chinese (Simplified)_China.936

# [4] LC_NUMERIC=C                             

# [5] LC_TIME=Chinese (Simplified)_China.936   

#

# attached base packages:

#  [1] stats    graphics  grDevices utils    datasets  methods 

# [7] base   

#

# other attached packages:

#  [1] ggplot2_3.3.0 ggmsa_0.0.4 

#

# loaded via a namespace (and not attached):

#  [1] Rcpp_1.0.4.6        pillar_1.4.4        compiler_3.6.3   

# [4] XVector_0.26.0      tools_3.6.3        zlibbioc_1.32.0   

# [7] digest_0.6.25      packrat_0.5.0      lifecycle_0.2.0   

# [10] tibble_3.0.1        gtable_0.3.0        pkgconfig_2.0.3   

# [13] rlang_0.4.6        rstudioapi_0.11    seqmagick_0.1.3   

# [16] parallel_3.6.3      withr_2.2.0        dplyr_0.8.5       

# [19] stringr_1.4.0      Biostrings_2.54.0  S4Vectors_0.24.3 

# [22] vctrs_0.3.0        IRanges_2.20.2      stats4_3.6.3     

# [25] grid_3.6.3          tidyselect_1.1.0    glue_1.4.1       

# [28] R6_2.4.1            purrr_0.3.4        tidyr_1.1.0       

# [31] farver_2.0.3        magrittr_1.5        scales_1.1.1     

# [34] ellipsis_0.3.1      BiocGenerics_0.32.0 assertthat_0.2.1 

# [37] colorspace_1.4-1    labeling_0.3        stringi_1.4.6     

# [40] munsell_0.5.0      crayon_1.3.4

#@ Two references for interested readers

# Taylor, W. R. 1997. “Residual Colours: A Proposal for Aminochromography.” Protein Eng 10 (7): 743–46.

# Waterhouse, A. M., J. B. Procter, D. M. Martin, M. Clamp, and G. J. Barton. 2009. “Jalview Version 2: A Multiple Sequence Alignment Editor and Analysis Workbench.” Bioinformatics 25 (9): 1189–91.
