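Xiao Guo's ramblings:
Last year I noticed that Yunlai had written about Sankey diagrams. They looked fun, but I did not know how to use them and felt no urge to learn. Recently I saw that Guozi was also writing about Sankey diagrams, which rekindled my curiosity, so I took some spare time to study the R packages for drawing them and am noting them down here for future use 😂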
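What's that?
Sankey diagram: a type of flow diagram used to depict flows, in which the width of each branch corresponds to the size of the flow it carries. It is commonly used to visualize data in energy, material composition, finance, and similar fields. It was first proposed by the Irishman Matthew Henry Phineas Riall Sankey, who was a captain as well as an engineer. In 1898, in an article on the energy efficiency of steam engines in the Minutes of Proceedings of the Institution of Civil Engineers, Sankey published the first energy flow diagram, which was later named the Sankey diagram (transliterated into Chinese as 桑基图).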
Implementations in R
the ggalluvial package;
the ggforce package.
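This document is only meant for learning how to use the R packages related to Sankey diagrams. Since I have no real data on hand, I will not go any further than that.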
1. ggalluvial : Alluvial Diagrams in ggplot2
Preparation
install.packages("ggalluvial")
library(ggalluvial)
## Use vignette() to view the package tutorial
vignette(topic = "ggalluvial", package = "ggalluvial")
Alluvial data
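ggalluvial recognizes two forms of data: the "wide" and "long" formats for categorical repeated-measures data. A third type, "tabular" (or "array"), which stores data cross-classified by several categorical dimensions, is also popular; the Titanic and UCBAdmissions datasets are examples.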
To stay consistent with the data format that ggplot2 expects, ggalluvial does not accept tabular input; base::as.data.frame() converts such an array into an acceptable data frame.
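To make the conversion concrete, here is a minimal base-R sketch using the built-in Titanic table. The variable name titanic_df is only illustrative.

```r
# Titanic ships with base R as a 4-dimensional contingency table.
is.table(Titanic)
dim(Titanic)   # one extent per dimension: Class, Sex, Age, Survived

# as.data.frame() unrolls the table: one row per cell, counts in `Freq`.
titanic_df <- as.data.frame(Titanic)
str(titanic_df)

# The conversion keeps every count, so the totals must agree.
sum(titanic_df$Freq) == sum(Titanic)
```

Each combination of the four categorical variables becomes one row, which is the shape the wide-format plots in this post start from.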
A simple example
> head(as.data.frame(UCBAdmissions), n = 12)
Admit Gender Dept Freq
1 Admitted Male A 512
2 Rejected Male A 313
3 Admitted Female A 89
4 Rejected Female A 19
5 Admitted Male B 353
6 Rejected Male B 207
7 Admitted Female B 17
8 Rejected Female B 8
9 Admitted Male C 120
10 Rejected Male C 205
11 Admitted Female C 202
12 Rejected Female C 391
> is_alluvia_form(as.data.frame(UCBAdmissions), axes = 1:3, silent = TRUE)
[1] TRUE
ggplot(as.data.frame(UCBAdmissions),
       aes(y = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  # label.strata is no longer supported in recent ggalluvial releases;
  # the stratum name is taken from the computed variable instead
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Gender", "Dept"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("UC Berkeley admissions and rejections, by sex and department")
Alluvia formats: wide & long
Wide format: as.data.frame
ggplot(as.data.frame(Titanic),
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  # label.strata is no longer supported in recent ggalluvial releases
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
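Parameter notes: data sets the data source; axis1, axis2, ... set the axes (columns) to display; y gives the numeric value; geom_alluvium() draws the alluvia that connect the groups, filled here by survival; geom_stratum() draws the strata, that is, the stacked bar at each axis; geom_text() shows the labels inside the strata; theme_minimal() is one of the available themes; ggtitle() sets the plot title.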
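The same pieces can be tried on a tiny hand-made data set. The sketch below assumes ggplot2 (3.3.0 or later, for after_stat()) and a recent ggalluvial are installed; the data frame toy_wide and its columns source, target and n are hypothetical and exist only for illustration.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data in wide format: one row per alluvium.
toy_wide <- data.frame(
  source = c("A", "A", "B", "B"),
  target = c("X", "Y", "X", "Y"),
  n      = c(10, 5, 3, 12)
)

p <- ggplot(toy_wide, aes(axis1 = source, axis2 = target, y = n)) +
  geom_alluvium(aes(fill = source)) +      # flows between the two axes
  geom_stratum() +                         # stacked bar at each axis
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("source", "target")) +
  theme_minimal()
print(p)
```

With only four rows it is easy to see that each flow's width follows its n value.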
Converting to long format: to_lodes_form
titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic",
                              axes = 1:3)
> head(titanic_long)
Survived Freq alluvium Demographic stratum
1 No 0 1 Class 1st
2 No 0 2 Class 2nd
3 No 35 3 Class 3rd
4 No 0 4 Class Crew
5 No 0 5 Class 1st
6 No 0 6 Class 2nd
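The conversion can be sanity-checked and reversed. The following is a sketch that assumes ggalluvial is installed; the names titanic_wide and titanic_back are illustrative, while is_lodes_form() and to_alluvia_form() are ggalluvial functions used here with the default column names produced by to_lodes_form().

```r
library(ggalluvial)

titanic_wide <- as.data.frame(Titanic)
titanic_long <- to_lodes_form(titanic_wide, key = "Demographic", axes = 1:3)

# Each wide row contributes one long row per axis.
nrow(titanic_wide)
nrow(titanic_long)

# Confirm that the result is valid lodes (long) format.
is_lodes_form(titanic_long,
              key = "Demographic", value = "stratum", id = "alluvium",
              silent = TRUE)

# Convert back to one row per alluvium.
titanic_back <- to_alluvia_form(titanic_long,
                                key = "Demographic", value = "stratum",
                                id = "alluvium")
nrow(titanic_back)
```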
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
Use coord_flip() to swap the x and y axes
ggplot(as.data.frame(Titanic),
       aes(y = Freq,
           axis1 = Survived, axis2 = Sex, axis3 = Class)) +
  geom_alluvium(aes(fill = Class),
                width = 0, knot.pos = 0, reverse = FALSE) +
  # guides(fill = FALSE) is deprecated in current ggplot2; use "none"
  guides(fill = "none") +
  geom_stratum(width = 1/8, reverse = FALSE) +
  # label.strata is no longer supported in recent ggalluvial releases
  geom_text(stat = "stratum", aes(label = after_stat(stratum)),
            reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
  coord_flip() +
  ggtitle("Titanic survival by class and sex")
Alluvial plot with unequal heights
# The Refugees dataset comes from the separate alluvial package,
# which must be installed for this call to work
data(Refugees, package = "alluvial")
# Map each country of origin to a region
country_regions <- c(
  Afghanistan = "Middle East",
  Burundi = "Central Africa",
  `Congo DRC` = "Central Africa",
  Iraq = "Middle East",
  Myanmar = "Southeast Asia",
  Palestine = "Middle East",
  Somalia = "Horn of Africa",
  Sudan = "Central Africa",
  Syria = "Middle East",
  Vietnam = "Southeast Asia"
)
Refugees$region <- country_regions[Refugees$country]
ggplot(data = Refugees,
       aes(x = year, y = refugees, alluvium = country)) +
  geom_alluvium(aes(fill = country, colour = country),
                alpha = .75, decreasing = FALSE) +
  scale_x_continuous(breaks = seq(2003, 2013, 2)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  scale_color_brewer(type = "qual", palette = "Set3") +
  facet_wrap(~ region, scales = "fixed") +
  ggtitle("refugee volume by country and region of origin")
Warning message:
In f(...) :
Some differentiation aesthetics vary within alluvia, and will be diffused by their first value.
Consider using `geom_flow()` instead.
Equal heights, unequal flows
data(majors)
majors$curriculum <- as.factor(majors$curriculum)
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  # "frontback" is the name used in current ggalluvial releases for the
  # guidance option that older releases called "rightleft"
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("student curricula across several semesters")
State transitions over time
data(vaccinations)
# Reverse the order of the levels without relabelling the observations;
# assigning rev(levels(x)) to levels(x) would rename the responses instead
vaccinations$response <- factor(vaccinations$response,
                                levels = rev(levels(vaccinations$response)))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")
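Reversing factor levels is easy to get subtly wrong in R, so here is a small base-R sketch of the difference between relabelling and reordering. The response vector is hypothetical toy data, not the vaccinations dataset.

```r
# Hypothetical toy responses.
response <- factor(c("Agree", "Agree", "Disagree"),
                   levels = c("Agree", "Disagree"))

# Relabelling: the level names are swapped, so every "Agree" observation
# is now reported as "Disagree". The data themselves have changed.
relabelled <- response
levels(relabelled) <- rev(levels(relabelled))
as.character(relabelled)

# Reordering: only the order of the levels changes; the values are intact.
reordered <- factor(response, levels = rev(levels(response)))
as.character(reordered)
levels(reordered)
```

For a plot where only the stacking order of the strata should flip, the second idiom is the one that leaves the underlying responses untouched.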
2. ggforce: Visual Guide
# gather_set_data() and the geom_parallel_sets*() layers come from ggforce
library(ggforce)
data <- reshape2::melt(Titanic)
head(data)
Class Sex Age Survived value
1 1st Male Child No 0
2 2nd Male Child No 0
3 3rd Male Child No 35
4 Crew Male Child No 0
5 1st Female Child No 0
6 2nd Female Child No 0
data <- gather_set_data(data, 1:4)
head(data)
Class Sex Age Survived value id x y
1 1st Male Child No 0 1 Class 1st
2 2nd Male Child No 0 2 Class 2nd
3 3rd Male Child No 35 3 Class 3rd
4 Crew Male Child No 0 4 Class Crew
5 1st Female Child No 0 5 Class 1st
6 2nd Female Child No 0 6 Class 2nd
ggplot(data, aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')
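To see what shape of data the parallel-sets layers consume, the following base-R sketch approximates the reshaping shown in the head() output above. stack_sets() is a hypothetical helper written for illustration and is not part of ggforce; the real gather_set_data() may differ in details such as column types.

```r
wide <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
set_cols <- c("Class", "Sex", "Age", "Survived")

# Hypothetical helper, not part of ggforce: stack one copy of the data per
# set column, recording the row id, the column name (x) and its value (y).
stack_sets <- function(df, cols) {
  pieces <- lapply(cols, function(col) {
    cbind(df,
          id = seq_len(nrow(df)),
          x = col,
          y = as.character(df[[col]]))
  })
  do.call(rbind, pieces)
}

long <- stack_sets(wide, set_cols)
head(long)
nrow(long)   # 32 original rows times 4 set columns
```

The id column is what lets the plot trace one original row across all four axes.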
Standing on the shoulders of giants
ggalluvial : Alluvial Diagrams in ggplot2
ggforce: Visual Guide
A simple implementation of Sankey diagrams
How to read and draw Sankey diagrams (with R code)