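Introduction
The forcats package provides tools for working with categorical variables. Factors are easier to work with than character vectors.
Creating factors
① Create a list of the valid levels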
x1 <- c("Dec", "Apr", "Jan", "Mar")
month_level <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
② Create the factor
y1 <- factor(x1, levels = month_level)
Any value that is not in the set of levels is silently converted to NA
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_level)
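A minimal sketch of what the two steps above produce, using base R only. The name `month_levels_ok` is introduced here for illustration and is filled from the built-in constant `month.abb`. Sorting follows the level order rather than the alphabet, and the misspelled "Jam" turns into NA.

```r
# Valid levels in calendar order; month.abb is a built-in base R constant
month_levels_ok <- month.abb

x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels_ok)
sorted <- sort(y1)              # sorted by level order, not alphabetically
print(as.character(sorted))     # "Jan" "Mar" "Apr" "Dec"

x2 <- c("Dec", "Apr", "Jam", "Mar")   # "Jam" is a typo, not a valid level
y2 <- factor(x2, levels = month_levels_ok)
print(is.na(y2))                # FALSE FALSE TRUE FALSE
```

Because the conversion to NA raises no warning, checking `anyNA(y2)` after creating a factor is one simple way to catch typos in the input.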
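- If you skip the step of defining the levels, they are taken from the data and sorted alphabetically
- To make the level order match the order of first appearance in the data, there are two options: a. set the levels to unique(x) b. apply fct_inorder() to the factor after creating it
- To access the set of valid levels directly, use levels()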
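The three points above can be sketched as follows. The vector `x` and the names `f_alpha`, `f_unique` and `f_inorder` are chosen here for illustration only.

```r
library(forcats)

x <- c("Dec", "Apr", "Jan", "Mar")

f_alpha   <- factor(x)                      # no levels given: sorted alphabetically
f_unique  <- factor(x, levels = unique(x))  # option a: order of first appearance
f_inorder <- fct_inorder(factor(x))         # option b: reorder after creation

print(levels(f_alpha))    # "Apr" "Dec" "Jan" "Mar"
print(levels(f_unique))   # "Dec" "Apr" "Jan" "Mar"
print(levels(f_inorder))  # "Dec" "Apr" "Jan" "Mar"
```

Option b is convenient inside a pipeline, where the factor already exists and the original character vector is no longer at hand.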
Modifying factor levels
fct_recode() changes or recodes the value of each level
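A small sketch of fct_recode() on a toy factor. The vector `party` and the new labels are made up for illustration. The new name goes on the left and the old name on the right, and levels that are not mentioned are left as they are.

```r
library(forcats)

party <- factor(c("Strong republican", "Independent",
                  "Strong democrat", "Independent"))

# new name = old name; "Independent" is not mentioned and stays unchanged
party2 <- fct_recode(party,
  "Republican, strong" = "Strong republican",
  "Democrat, strong"   = "Strong democrat"
)

print(as.character(party2))
```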
fct_collapse() merges multiple levels into one
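A small sketch of fct_collapse() on a toy factor. The vector `party` and the group names `rep` and `dem` are chosen here for illustration. Each argument names a new level and lists the old levels that are merged into it.

```r
library(forcats)

party <- factor(c("Strong republican", "Not str republican",
                  "Independent", "Strong democrat"))

# new level = vector of old levels; unmentioned levels are kept as-is
party3 <- fct_collapse(party,
  rep = c("Strong republican", "Not str republican"),
  dem = "Strong democrat"
)

print(table(party3))
```

The four original levels become three: both republican levels are now counted under `rep`.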
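Exercise, p. 159
Approach: first use fct_collapse() to merge the partyid levels into four groups, then use count() to tally respondents by year and group and, after group_by(year), compute each group's share within the year. Finally use ggplot() to draw a line chart with year on the x-axis, the share on the y-axis, and one line per group. The answer is as follows.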
library(tidyverse)  # attaches dplyr, ggplot2 and forcats (gss_cat ships with forcats)

gss_cat %>%
  # collapse the ten partyid levels into four groups
  mutate(partyid = fct_collapse(partyid,
    other = c("No answer", "Don't know", "Other party"),
    rep = c("Strong republican", "Not str republican"),
    ind = c("Ind,near rep", "Independent", "Ind,near dem"),
    dem = c("Not str democrat", "Strong democrat"))) %>%
  count(year, partyid) %>%      # respondents per year and group
  group_by(year) %>%
  mutate(p = n / sum(n)) %>%    # share of each group within a year
  ggplot(aes(x = year, y = p,
             colour = fct_reorder2(partyid, year, p))) +
  geom_point() +
  geom_line() +
  labs(colour = "Party ID.")