Learning R packages: broom

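#The broom package takes the messy output of built-in R functions, such as lm, nls, or t.test, and turns it into tidy data frames.

#In other words, it tidies messy output that is not a data frame into a data frame.

#broom works well together with dplyr.

#It has three main functions: tidy, augment, and glance.

#Example 1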

```

lmfit <- lm(mpg ~ wt, mtcars)

lmfit

summary(lmfit)

library(broom)

tidy(lmfit)

```

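#tidy returns a data frame; the row names have become a column named term.

# Instead of looking at the coefficients, you may be interested in the fitted values and residuals for each original point in the regression.

# For that, use augment, which augments the original data with information from the model.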

augment(lmfit)

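#The added columns are prefixed with a dot (.) to avoid overwriting the original columns.

#Several summary statistics are computed for the regression as a whole; the glance function returns them.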

glance(lmfit)
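# As a rough sketch of how the three functions differ, the snippet below refits the same model and compares
# the number of rows each function returns. The names coef_table, point_table and fit_table are made up
# for this illustration.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# tidy: one row per model term (intercept and wt)
coef_table <- tidy(lmfit)

# augment: one row per observation in the original data
point_table <- augment(lmfit)

# glance: a single row summarising the whole fit
fit_table <- glance(lmfit)

# Row counts reflect what each row represents
c(tidy = nrow(coef_table), augment = nrow(point_table), glance = nrow(fit_table))
```

# Since mtcars has 32 rows and the model has two terms, the counts should come out as 2, 32 and 1.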

#Example 2

```

#Generalized linear and non-linear models

glmfit <- glm(am ~ wt, mtcars, family="binomial")

tidy(glmfit)

augment(glmfit)

glance(glmfit)

#These functions work just as well for non-linear models

nlsfit <- nls(mpg ~ k / wt + b, mtcars, start=list(k=1, b=0))

tidy(nlsfit)

augment(nlsfit, mtcars)

glance(nlsfit)

#The tidy function can also be applied to htest objects,

#such as those output by popular built-in functions like

#t.test, cor.test, and wilcox.test.

tt <- t.test(wt ~ am, mtcars)

tidy(tt)

#Named wtest so the result is not confused with the wt column of mtcars

wtest <- wilcox.test(wt ~ am, mtcars)

tidy(wtest)

glance(tt)

glance(wtest)

#augment method is defined only for chi-squared tests

chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))

tidy(chit)

augment(chit)

```

# All functions

# The output of the tidy, augment and glance functions is always a data frame.

# The output never has rownames. This ensures that you can combine it with other tidy outputs without

# fear of losing information (since rownames in R cannot contain duplicates).

# Some column names are kept consistent, so that they can be combined across different models and so

# that you know what to expect (in contrast to asking “is it pval or PValue?” every time). The examples

# below are not all the possible column names, nor will all tidy output contain all or even any of these

# columns.
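# To illustrate why consistent column names and the absence of rownames matter, here is a minimal sketch
# that stacks the tidy output of two different models. It assumes dplyr is installed; the label column
# name "model" and the object name combined are choices made for this example only.

```r
library(broom)
library(dplyr)

lmfit  <- lm(mpg ~ wt, mtcars)
glmfit <- glm(am ~ wt, mtcars, family = "binomial")

# Both outputs share the same column names, so they stack cleanly;
# .id records which model each row came from
combined <- bind_rows(
  list(lm = tidy(lmfit), glm = tidy(glmfit)),
  .id = "model"
)

combined
```

# The term column repeats "(Intercept)" and "wt" for each model, which would not be possible if the
# terms were stored as rownames.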

# tidy functions

# Each row in a tidy output typically represents some well-defined concept, such as one term in a

# regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident.

# The one thing each row cannot represent is a point in the initial data (for that, use the augment method).

# Common column names include:

# term: the term in a regression or model that is being estimated.

# p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to

# be consistent with functions in R’s built-in stats package.

# statistic: a test statistic, usually the one used to compute the p-value. Combining these across

# many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.

# estimate: the estimated value of the term, such as a regression coefficient.

# conf.low: the low end of a confidence interval on the estimate.

# conf.high: the high end of a confidence interval on the estimate.

# df: degrees of freedom.
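# The column names listed above can be seen in a small sketch that fits one regression per sub-group and
# tidies each fit. It assumes dplyr is installed; grouping by cyl and the object name by_cyl are
# illustrative choices, not something prescribed by broom.

```r
library(broom)
library(dplyr)

# Fit mpg ~ wt separately within each cylinder group and tidy each fit;
# conf.int = TRUE asks tidy() to add conf.low and conf.high
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tidy(lm(mpg ~ wt, data = .x), conf.int = TRUE)) %>%
  ungroup()

# Keep only the slope rows, one per group
by_cyl %>%
  filter(term == "wt") %>%
  select(cyl, estimate, conf.low, conf.high, p.value)
```

# Because every group yields the same columns, the per-group results can be filtered and compared
# like any other data frame.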

# augment functions

# augment(model, data) adds columns to the original data.

# If the data argument is missing, augment attempts to reconstruct the data from the model (note that

# this may not always be possible, and usually won’t contain columns not used in the model).

# Each row in an augment output matches the corresponding row in the original data.

# If the original data contained rownames, augment turns them into a column called .rownames.

# Newly added column names begin with . to avoid overwriting columns in the original data.

# Common column names include:

# .fitted: the predicted values, on the same scale as the data.

# .resid: residuals: the actual y values minus the fitted values

# .cluster: cluster assignments
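# A short sketch of the points above, using the linear model from the first example. The data frame
# new_cars and its weights are hypothetical values chosen only for illustration.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

# Passing the original data keeps every column, including those not used in the model
aug <- augment(lmfit, data = mtcars)

# .resid is the observed response minus .fitted, so this difference should be close to zero
max(abs(aug$.resid - (aug$mpg - aug$.fitted)))

# Hypothetical new weights, chosen only for illustration
new_cars <- data.frame(wt = c(2.5, 3.5))

# With newdata, each row corresponds to a row of new_cars and .fitted holds the prediction
pred <- augment(lmfit, newdata = new_cars)
pred
```

# Supplying data explicitly is the safer habit when later steps need columns that the model formula
# did not use.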

# glance functions

# glance always returns a one-row data frame.

# The only exception is that glance(NULL) returns an empty data frame.

# We avoid including arguments that were given to the modeling function. For example, a glm glance

# output does not need to contain a field for family, since that is decided by the user calling glm rather

# than the modeling function itself.

# Common column names include:

# r.squared: the fraction of variance explained by the model.

# adj.r.squared: R^2 adjusted based on the degrees of freedom.

# sigma: the square root of the estimated variance of the residuals.
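# As a quick check of what these columns contain, the sketch below compares the glance output for the
# linear model from the first example with the corresponding entries of summary(). The object names
# fit_summary and model_summary are made up for this example.

```r
library(broom)

lmfit <- lm(mpg ~ wt, mtcars)

fit_summary <- glance(lmfit)
model_summary <- summary(lmfit)

# The glance columns should match the corresponding entries of summary.lm
all.equal(fit_summary$r.squared, model_summary$r.squared)
all.equal(fit_summary$adj.r.squared, model_summary$adj.r.squared)
all.equal(fit_summary$sigma, model_summary$sigma)
```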

Last edited: 2019.06.16 22:03:06
