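ggforce is an extension of ggplot2 that makes certain kinds of plots easier to build, for example drawing outlines around groups of data and zooming in on regions of a plot.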
1. Expanding and contracting shapes
```r
library(ggplot2)
library(ggforce)

# Adapted from geom_polygon documentation
ids <- factor(c("1.1", "2.1", "1.2", "2.2", "1.3", "2.3"))
values <- data.frame(
  id = ids,
  value = c(3, 3.1, 3.1, 3.2, 3.15, 3.5)
)
positions <- data.frame(
  id = rep(ids, each = 4),
  x = c(2, 1, 1.1, 2.2, 1, 0, 0.3, 1.1, 2.2, 1.1, 1.2, 2.5, 1.1, 0.3,
        0.5, 1.2, 2.5, 1.2, 1.3, 2.7, 1.2, 0.5, 0.6, 1.3),
  y = c(-0.5, 0, 1, 0.5, 0, 0.5, 1.5, 1, 0.5, 1, 2.1, 1.7, 1, 1.5,
        2.2, 2.1, 1.7, 2.1, 3.2, 2.8, 2.1, 2.2, 3.3, 3.2)
)
datapoly <- merge(values, positions, by = c("id"))

# A negative expand contracts each polygon by 3 mm
ggplot(datapoly, aes(x = x, y = y)) +
  geom_shape(aes(fill = value, group = id), expand = unit(-3, 'mm'))

# radius rounds each polygon's corners with a 3 mm radius
ggplot(datapoly, aes(x = x, y = y)) +
  geom_shape(aes(fill = value, group = id), radius = unit(3, 'mm'))
```
![image.png](https://upload-images.jianshu.io/upload_images/20297934-699f032dfe9f7dbb.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
```r
# Expand by 3 mm and round corners with a 2 mm radius; alpha = 0.5 keeps overlaps visible
ggplot(datapoly, aes(x = x, y = y)) +
  geom_shape(aes(fill = value, group = id), expand = unit(3, 'mm'), radius = unit(2, 'mm'), alpha = 0.5)
```
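The example above draws six polygons at once, which makes it hard to see what each argument does on its own. Below is a minimal sketch on a single square. The `square` data frame and the `p_poly`/`p_shape` names are illustrative assumptions, not part of the original post. According to the geom_shape documentation, a positive `expand` grows the shape, a negative one contracts it, and `radius` rounds the corners.

```r
library(ggplot2)
library(ggforce)

# Hypothetical toy data: a single unit square (not from the original post)
square <- data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1))

# Plain polygon, for comparison
p_poly <- ggplot(square, aes(x, y)) +
  geom_polygon(fill = "grey70")

# Same square, grown by 5 mm on every side and with 5 mm rounded corners
p_shape <- ggplot(square, aes(x, y)) +
  geom_shape(fill = "steelblue", expand = unit(5, "mm"), radius = unit(5, "mm"))

# Draw them with print(p_poly) and print(p_shape)
```

The amounts here are grid units in millimetres, so the expansion and corner rounding are applied in absolute size when the plot is drawn, not in data coordinates.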
2. Parallel sets
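Parallel sets are a way to show multidimensional categorical data. Thick diagonal ribbons are drawn between the levels on parallel categorical axes, which reveals how levels overlap across several categories. The Titanic survival data set is a classic example.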
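Note that this kind of data is usually stored with each categorical variable in its own column. That layout does not work with ggplot2, which needs all values for the same axis to be in a single column. ggforce provides a helper function, gather_set_data(), that converts the data into the shape ggplot2 expects.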
```r
# Turn the 4-D Titanic table into a long data frame (one row per cell)
data <- reshape2::melt(Titanic)
head(data)
##   Class    Sex   Age Survived value
## 1   1st   Male Child       No     0
## 2   2nd   Male Child       No     0
## 3   3rd   Male Child       No    35
## 4  Crew   Male Child       No     0
## 5   1st Female Child       No     0
## 6   2nd Female Child       No     0

# Stack columns 1-4 (Class, Sex, Age, Survived): adds id, x (axis name) and y (level)
data <- gather_set_data(data, 1:4)
head(data)
##   Class    Sex   Age Survived value id     x    y
## 1   1st   Male Child       No     0  1 Class  1st
## 2   2nd   Male Child       No     0  2 Class  2nd
## 3   3rd   Male Child       No    35  3 Class  3rd
## 4  Crew   Male Child       No     0  4 Class Crew
## 5   1st Female Child       No     0  5 Class  1st
## 6   2nd Female Child       No     0  6 Class  2nd

ggplot(data, aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')
```
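With the full Titanic table, it is hard to see how small the reshaping step actually is. Below is a minimal sketch on a hypothetical four-row table (`toy`, `toy_long` and `p` are illustrative names, not from the original post). As the output above suggests, gather_set_data() stacks the chosen columns. The result has one row per original row and axis, where `id` identifies the original row, `x` holds the column name and `y` holds its level.

```r
library(ggplot2)
library(ggforce)

# Hypothetical toy data: one row per combination, with a count in n
toy <- data.frame(
  colour = c("red", "red", "blue", "blue"),
  size   = c("small", "large", "small", "large"),
  n      = c(10, 5, 3, 8)
)

# Stack columns 1 and 2 into long form: 4 rows x 2 axes = 8 rows
toy_long <- gather_set_data(toy, 1:2)

# Same layer recipe as the Titanic plot, with n as the ribbon width
p <- ggplot(toy_long, aes(x, id = id, split = y, value = n)) +
  geom_parallel_sets(aes(fill = colour), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = "white")
```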
3. Sina plots (geom_sina)
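The sina plot is inspired by the violin plot. It jitters points along the x-axis, but uses the normalized density of the points to limit how far they can be jittered. The data are still shown in a simple way and the shape of the density distribution is obvious. The plot also shows how many data points each category contains and whether outliers are driving the tails of the distribution. In this way it conveys the mean/median, the variance, the actual number of data points and the density distribution together.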
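To make the jitter rule concrete, here is a rough base-R sketch of the idea. It is not ggforce's actual implementation. `sina_jitter()` is a hypothetical helper that estimates a kernel density for each group with stats::density(). Each point's horizontal jitter range is then scaled by the density at its value, normalized so the densest region of a group gets the full `max_width`. The sketch assumes every group has at least two values, because density() needs that many.

```r
# Hypothetical helper sketching the sina idea; NOT ggforce's implementation.
# Assumes every group has at least two values (stats::density() needs them).
sina_jitter <- function(values, group, max_width = 0.45) {
  stopifnot(length(values) == length(group))
  group <- as.factor(group)
  x <- numeric(length(values))
  for (i in seq_along(levels(group))) {
    idx <- which(group == levels(group)[i])
    y <- values[idx]
    # Kernel density estimate for this group
    d <- stats::density(y)
    # Density at each observed value, by linear interpolation of the KDE grid
    dens_at_y <- stats::approx(d$x, d$y, xout = y)$y
    # Normalize so the densest region of the group gets the full half-width
    half_width <- max_width * dens_at_y / max(d$y)
    # Uniform jitter around the group's integer position i
    x[idx] <- i + stats::runif(length(idx), -half_width, half_width)
  }
  x
}

# Example: two simulated groups with different spreads
set.seed(1)
vals <- c(rnorm(200, 0, 1), rnorm(200, 3, 0.5))
grp  <- rep(c("a", "b"), each = 200)
xpos <- sina_jitter(vals, grp)
```

Passing the result to base graphics, e.g. plot(xpos, vals), gives a rough sina-like scatter. Points in dense regions spread wide and points in the tails stay close to the group's centre line.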
```r
# Sample Gaussian distributions with 1, 2 and 3 modes
df <- data.frame(
  "Distribution" = c(rep("Unimodal", 500),
                     rep("Bimodal", 250),
                     rep("Trimodal", 600)),
  "Value" = c(rnorm(500, 6, 1),
              rnorm(200, 3, .7), rnorm(50, 7, 0.4),
              rnorm(200, 2, 0.7), rnorm(300, 5.5, 0.4), rnorm(100, 8, 0.4))
)

# Set the level order explicitly. Since R 4.0, data.frame() keeps strings as
# character, so levels(df$Distribution) would be NULL and every value would become NA
df$Distribution <- factor(df$Distribution,
                          levels = c("Unimodal", "Bimodal", "Trimodal"))

p <- ggplot(df, aes(Distribution, Value))
p + geom_violin(aes(fill = Distribution))
p + geom_sina(aes(color = Distribution), size = 1)
```
For more, see the original ggforce Visual Guide: http://cran.univ-paris1.fr/web/packages/ggforce/vignettes/Visual_Guide.html