This section introduces rowwise(), an important function in dplyr that lets you process a data frame one row at a time.
Reference: https://dplyr.tidyverse.org/articles/rowwise.html
As before, we use the familiar iris dataset.
library(tidyverse)
iris %>% as_tibble()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
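The data has five columns: one factor (Species) and four numeric columns. Only the numeric columns are used in the calculations below.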
Row-wise mean
Method 1
iris %>% as_tibble() %>% select(-Species) %>%
mutate(.,mean=rowMeans(.))
Sepal.Length Sepal.Width Petal.Length Petal.Width mean
<dbl> <dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 2.55
2 4.9 3 1.4 0.2 2.38
3 4.7 3.2 1.3 0.2 2.35
Method 2
Select only the numeric columns with a predicate instead of dropping Species by name.
# select_if() is superseded; select(where(is.numeric)) is the current equivalent
iris %>% as_tibble() %>% select(where(is.numeric)) %>%
mutate(.,mean=rowMeans(.))
Sepal.Length Sepal.Width Petal.Length Petal.Width mean
<dbl> <dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 2.55
2 4.9 3 1.4 0.2 2.38
3 4.7 3.2 1.3 0.2 2.35
Method 3
iris %>% as_tibble() %>% select_if(is.numeric) %>%
rowwise() %>%
mutate(mean = mean(c_across(Sepal.Length:Petal.Width)))
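rowwise() tells dplyr to evaluate the following operations one row at a time, and c_across() combines the values of several columns in the current row into a single vector.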
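To make the effect of rowwise() concrete, here is a minimal sketch that runs the same kind of mutate() call with and without it. The toy tibble df and its columns x, y, z are hypothetical and not part of the original example.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(x = c(1, 4), y = c(2, 5), z = c(3, 6))

# Without rowwise(): mean() sees whole columns, so every row gets the overall mean
ungrouped <- df %>%
  mutate(m = mean(c(x, y, z)))

# With rowwise(): mean() is evaluated once per row
by_row <- df %>%
  rowwise() %>%
  mutate(m = mean(c_across(x:z))) %>%
  ungroup()  # drop the row-wise grouping once it is no longer needed

ungrouped$m  # 3.5 3.5
by_row$m     # 2 5
```

Calling ungroup() at the end is a design choice: it keeps later steps from silently running row by row.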
Method 4
iris %>% as_tibble() %>%
rowwise() %>%
mutate(mean = rowMeans(across(where(is.numeric))))
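Here across(where(is.numeric)) selects only the numeric columns, so Species does not need to be dropped first. across() is very flexible and, used well, keeps code concise.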
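Because rowMeans() already works row by row, the rowwise() step in Method 4 is arguably optional. The sketch below compares the two forms; the names iris_tbl, vectorised and by_row are introduced here for illustration only.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# rowMeans() is vectorised over rows, so across() can be used without rowwise()
vectorised <- iris_tbl %>%
  mutate(mean = rowMeans(across(where(is.numeric))))

# Same calculation with rowwise(), as in Method 4
by_row <- iris_tbl %>%
  rowwise() %>%
  mutate(mean = rowMeans(across(where(is.numeric)))) %>%
  ungroup()

# unname() strips any row names carried over by rowMeans() before comparing
all.equal(unname(vectorised$mean), unname(by_row$mean))  # TRUE
```

On larger data the version without rowwise() is likely to be faster, since the function is called once rather than once per row.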
Row-wise minimum
iris %>% as_tibble() %>%
rowwise() %>%
mutate(min = min(across(where(is.numeric))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species min
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 0.2
2 4.9 3 1.4 0.2 setosa 0.2
3 4.7 3.2 1.3 0.2 setosa 0.2
4 4.6 3.1 1.5 0.2 setosa 0.2
Row-wise maximum
iris %>% as_tibble() %>%
rowwise() %>%
mutate(max = max(across(where(is.numeric))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species max
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 5.1
2 4.9 3 1.4 0.2 setosa 4.9
3 4.7 3.2 1.3 0.2 setosa 4.7
4 4.6 3.1 1.5 0.2 setosa 4.6
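For minimum and maximum, base R's pmin() and pmax() offer a vectorised alternative to rowwise(). This is a sketch of a different technique, not the approach used above; the column range is written out explicitly so that the newly created min column is not picked up by the second selection.

```r
library(dplyr)

res <- as_tibble(iris) %>%
  mutate(
    # a tibble is a list of columns, so do.call() passes each column to pmin()/pmax()
    min = do.call(pmin, across(Sepal.Length:Petal.Width)),
    max = do.call(pmax, across(Sepal.Length:Petal.Width))
  )

head(res$min, 3)  # 0.2 0.2 0.2
head(res$max, 3)  # 5.1 4.9 4.7
```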
Row-wise sum
iris %>% as_tibble() %>%
rowwise() %>%
mutate(sum = sum(across(where(is.numeric))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 10.2
2 4.9 3 1.4 0.2 setosa 9.5
3 4.7 3.2 1.3 0.2 setosa 9.4
4 4.6 3.1 1.5 0.2 setosa 9.4
Row-wise standard deviation
iris %>% as_tibble() %>%
rowwise() %>%
mutate(sd = sd(across(where(is.numeric))))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sd
<dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 5.1 3.5 1.4 0.2 setosa 2.18
2 4.9 3 1.4 0.2 setosa 2.04
3 4.7 3.2 1.3 0.2 setosa 2.00
4 4.6 3.1 1.5 0.2 setosa 1.91
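The four summaries above can also be computed in a single rowwise() pass. The sketch below uses c_across(), which returns a plain vector for the current row, and an explicit column range so that columns created earlier in the same mutate() are not included by accident. The name stats is hypothetical.

```r
library(dplyr)

stats <- as_tibble(iris) %>%
  rowwise() %>%
  mutate(
    min = min(c_across(Sepal.Length:Petal.Width)),
    max = max(c_across(Sepal.Length:Petal.Width)),
    sum = sum(c_across(Sepal.Length:Petal.Width)),
    sd  = sd(c_across(Sepal.Length:Petal.Width))
  ) %>%
  ungroup()  # return to an ordinary tibble

# First row: min 0.2, max 5.1, sum 10.2, sd about 2.18
stats[1, c("min", "max", "sum", "sd")]
```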
Count the values in each row that meet a condition (here, values greater than 3)
iris %>% as_tibble() %>% select(-Species) %>%
mutate(.,n=rowSums(. > 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width n
<dbl> <dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 2
2 4.9 3 1.4 0.2 1
3 4.7 3.2 1.3 0.2 2
4 4.6 3.1 1.5 0.2 2
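The same count can be made without dropping Species by applying the comparison to across(). The sketch below also counts exact matches of a single value; the column names n_gt3 and n_eq02 are hypothetical, and exact equality on floating-point numbers should be used with care.

```r
library(dplyr)

counts <- as_tibble(iris) %>%
  mutate(
    # comparing the selected columns with a scalar gives a logical matrix;
    # rowSums() then counts the TRUE values in each row
    n_gt3  = rowSums(across(Sepal.Length:Petal.Width) > 3),
    n_eq02 = rowSums(across(Sepal.Length:Petal.Width) == 0.2)
  )

head(unname(counts$n_gt3), 4)   # 2 1 2 2
head(unname(counts$n_eq02), 4)  # 1 1 1 1
```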