Study Group Day 6 Notes -- Jia

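R packages: study notes and examples

Learning R packages

The main reason to learn R is to use its plotting capabilities and the many bioinformatics analysis packages distributed through Bioconductor.

CRAN is the package repository R uses by default, and with the default settings install.packages() installs packages published on CRAN. Several other package repositories exist; Bioconductor is the one dedicated to genomic data analysis, and its packages are installed with a separate command.
-- quoted from a Jianshu article

Note: the examples below come from the WeChat public account 生信星球.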

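To check which CRAN repository packages are downloaded from, run options()$repos.
To check which Bioconductor mirror is used, run options()$BioC_mirror.
Configuring mirrors

file.edit('~/.Rprofile') opens the .Rprofile file for editing.
Enter the following two lines, run them, and save the file (other mirror sites can be used instead):

options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))  # repos is the URL packages are downloaded from; this is the Tsinghua (TUNA) mirror
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") # USTC (University of Science and Technology of China) mirror

After restarting RStudio, the mirrors are set automatically at startup and do not need to be configured again.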
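
The same settings can also be applied and checked inside a running session. The following is a minimal sketch using only base R options() and getOption(), with the mirror URLs given above. Updating only the CRAN entry, rather than replacing the whole repos vector, is a choice made here so that any other configured repositories are kept.

```r
# Update only the "CRAN" entry so other configured repositories are preserved
repos <- getOption("repos")
repos["CRAN"] <- "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"
options(repos = repos)

# Bioconductor mirror option
options(BioC_mirror = "https://mirrors.ustc.edu.cn/bioc/")

# Inspect what is currently configured
getOption("repos")["CRAN"]
getOption("BioC_mirror")
```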

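Installing R packages

The install command is install.packages("pkg") or BiocManager::install("pkg"),
depending on whether the package is hosted on CRAN or on Bioconductor. BiocManager itself is installed from CRAN with install.packages("BiocManager").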
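
To avoid reinstalling a package that is already present, the choice between the two installers can be wrapped in a small function. This is only a sketch: install_if_missing() and its bioc argument are hypothetical names introduced for illustration, not part of base R or BiocManager.

```r
# Hypothetical helper: install `pkg` only when it is not already available,
# choosing the installer according to where the package is hosted.
install_if_missing <- function(pkg, bioc = FALSE) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (bioc) {
    # BiocManager itself is hosted on CRAN
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  } else {
    install.packages(pkg)
  }
  invisible(TRUE)
}

install_if_missing("stats")  # ships with R, so nothing is installed
```

For a CRAN package the call would be install_if_missing("dplyr"); for a Bioconductor package, pass bioc = TRUE.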

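Loading R packages

library(pkg)
require(pkg)

Both commands load an installed package. If the package is missing, library() stops with an error, whereas require() returns FALSE with a warning.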
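
The difference between the two loaders shows up when a package is not installed. A small sketch, assuming that no package with the made-up name below exists on the machine:

```r
pkg <- "notARealPackage123"  # hypothetical name, assumed not to be installed

# require() returns FALSE when the package is missing
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
ok   # FALSE

# library() signals an error instead, which tryCatch() can intercept
res <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
res  # "error"
```

This is why require() tends to appear inside if () conditions, while library() is the usual choice at the top of a script, where a missing package should stop execution.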

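dplyr: installation and worked examples

With the mirrors configured, run:

install.packages("dplyr")
library(dplyr)

Use a reduced version of the built-in iris dataset (the first two rows of each species) as the example data:
test <- iris[c(1:2, 51:52, 101:102), ]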

Basic dplyr functions (using the iris example data above)

The examples come from the WeChat public account 生信星球.

mutate(): add a new column
mutate(test, new = Sepal.Length * Sepal.Width)
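
mutate() can also create several columns in one call, and a later column may refer to one created earlier in the same call. A short sketch on the same data; the column names area and ratio are made up for illustration:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Columns are created left to right, so `ratio` can use `area`
out <- mutate(
  test,
  area  = Sepal.Length * Sepal.Width,
  ratio = area / Petal.Length
)
out
```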

select(): select columns
select(test, 1)  # by column position

select(test,c(1,5))

select(test,Sepal.Length)

Selecting by column name

select(test, Petal.Length, Petal.Width)
or
vars <- c("Petal.Length", "Petal.Width")
select(test, all_of(vars))  # one_of() still works but is superseded by all_of()/any_of() in current dplyr
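
Besides positions and explicit names, select() accepts helper functions and negative selection. A brief sketch with starts_with() and the minus sign, both part of dplyr's selection syntax; the variable names are only illustrative:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

petal_cols <- select(test, starts_with("Petal"))  # Petal.Length, Petal.Width
no_species <- select(test, -Species)              # every column except Species

names(petal_cols)
names(no_species)
```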

filter(): filter rows
filter(test, Species == "setosa")

filter(test, Species == "setosa" & Sepal.Length > 5)

filter(test, Species %in% c("setosa","versicolor"))
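
Conditions passed to filter() as separate arguments are combined with AND, so they behave like the & form above, while | keeps rows that satisfy either condition. A small sketch on the same six rows; the result names are illustrative:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Comma-separated conditions are combined with AND
a <- filter(test, Species == "setosa", Sepal.Length > 5)
b <- filter(test, Species == "setosa" & Sepal.Length > 5)
isTRUE(all.equal(a, b))

# `|` keeps rows matching either condition
either <- filter(test, Species == "virginica" | Sepal.Length < 5)
either
```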

arrange(): sort the whole table by one or more columns
arrange(test, Sepal.Length)  # ascending order by default

arrange(test, desc(Sepal.Length))  # desc() sorts in descending order
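
Sorting by several columns works by listing them in order of priority: later columns only break ties in earlier ones. A sketch that sorts by Species in reverse factor-level order and then by Sepal.Length ascending within each species:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Species descending (virginica first), then Sepal.Length ascending within species
out <- arrange(test, desc(Species), Sepal.Length)
out$Sepal.Length  # 5.8 6.3 6.4 7.0 4.9 5.1
```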

summarise(): compute summary statistics (usually combined with group_by())

summarise(test, mean(Sepal.Length), sd(Sepal.Length))  # mean and standard deviation of Sepal.Length

Group by Species first, then compute the mean and standard deviation of Sepal.Length within each group:

group_by(test, Species)
summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length))
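
The nested call above can also be written with the %>% pipe that dplyr re-exports, and naming the summaries gives the result readable column names. This is a stylistic sketch rather than the form used in the original examples; n, mean_sl, and sd_sl are names chosen here.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

by_species <- test %>%
  group_by(Species) %>%
  summarise(
    n       = n(),                 # rows per group
    mean_sl = mean(Sepal.Length),  # group mean
    sd_sl   = sd(Sepal.Length)     # group standard deviation
  )
by_species
```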

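PS: stringsAsFactors = FALSE means character columns are not converted to factors; they are read in as plain character strings. Since R 4.0.0 this is the default.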
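
The effect of stringsAsFactors is easiest to see by building the same small data frame both ways. The gene names below are illustrative values only:

```r
df_chr <- data.frame(gene = c("TP53", "BRCA1"), stringsAsFactors = FALSE)
df_fct <- data.frame(gene = c("TP53", "BRCA1"), stringsAsFactors = TRUE)

class(df_chr$gene)  # "character"
class(df_fct$gene)  # "factor"
```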

Advanced dplyr operations (also based on the iris example above)

Last edited: 2020.08.19 22:53:32