Day 3: Summaries of data - two-dimensional summaries
Example 1: multiple boxplots. How do win rates differ across leagues?
> temp <- read.csv("basketball_teams.csv")
> teamdata <- as.data.frame(temp)
> teamdata$new_column <- ifelse(teamdata$games == 0, NA, teamdata$won / teamdata$games)
> stats <- teamdata[, c("name","lgID", "year","new_column")]
> boxplot(new_column ~ lgID, data = stats, col = "red")
The result is shown below:
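To back up the boxplot with numbers, the per-league summaries can be computed directly. The sketch below is only an illustration: `toy` is a made-up data frame (not the real basketball_teams.csv) whose column names mirror `stats`, and `five_num` and `league_median` are names introduced here.

```r
# Made-up toy data; only the column names match the `stats` data frame above.
toy <- data.frame(
  lgID = c("ABA", "ABA", "ABA", "NBA", "NBA", "NBA", "NBA"),
  new_column = c(0.40, 0.55, 0.70, 0.25, 0.50, NA, 0.75)
)

# Five-number summary (min, lower hinge, median, upper hinge, max) per league.
# The hinges and median correspond to the box; the whiskers of a boxplot can
# stop short of the min/max when there are outliers. NA win rates are dropped.
five_num <- tapply(toy$new_column, toy$lgID, fivenum, na.rm = TRUE)
print(five_num)

# Median win rate per league, for a quick one-number comparison.
league_median <- tapply(toy$new_column, toy$lgID, median, na.rm = TRUE)
print(league_median)
```

With the real data, replacing `toy` by `stats` should give the numbers behind each box.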
We can also use histograms:
> par(mfrow = c(2,1), mar = c(4,4,2,1))
> hist(subset(stats$new_column, stats$lgID == "ABA"), col = "green", main = "ABA", xlab = "Win rate")
> hist(subset(stats$new_column, stats$lgID == "NBA"), col = "green", main = "NBA", xlab = "Win rate")
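Stacked histograms are easier to compare when both panels share the same bins. A minimal sketch of that idea follows, assuming win rates lie between 0 and 1; `toy`, `breaks`, and `hists` are illustrative names and the values are made up.

```r
# Made-up toy data; only the column names match the `stats` data frame above.
toy <- data.frame(
  lgID = c("ABA", "ABA", "ABA", "NBA", "NBA", "NBA", "NBA"),
  new_column = c(0.40, 0.55, 0.70, 0.25, 0.50, NA, 0.75)
)

# Shared bin edges so both panels use the same x-axis and bar widths.
breaks <- seq(0, 1, by = 0.1)

par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
hists <- list()
for (lg in c("ABA", "NBA")) {
  # hist() silently drops NA values and returns the bin counts invisibly.
  hists[[lg]] <- hist(toy$new_column[toy$lgID == lg], breaks = breaks,
                      col = "green", main = lg, xlab = "Win rate")
}
```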
Scatterplot
> with(stats, plot(year, new_column))
> abline(h = 0.7, lwd = 2, lty = 2)
Add color to the scatterplot
> with(stats, plot(year, new_column, col = as.factor(lgID)))
From this plot we can see how the win rates of the teams in each league (ABA, NBA) are distributed.
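A colored scatterplot is hard to read without knowing which color belongs to which league. One way to handle this, sketched here on made-up data (`toy` and `lg` are illustrative names), is to convert the league to a factor once and reuse its levels for the legend.

```r
# Made-up toy data; only the column names match the `stats` data frame above.
toy <- data.frame(
  year = c(1968, 1970, 1972, 1968, 1970, 1972),
  lgID = c("ABA", "ABA", "ABA", "NBA", "NBA", "NBA"),
  new_column = c(0.40, 0.55, 0.70, 0.25, 0.50, 0.75)
)

# A factor maps each league to an integer code, which plot() reads as a color.
lg <- factor(toy$lgID)
plot(toy$year, toy$new_column, col = as.integer(lg), pch = 19,
     xlab = "Year", ylab = "Win rate")

# The legend uses the same level order, so colors and labels line up.
legend("topright", legend = levels(lg), col = seq_along(levels(lg)), pch = 19)
```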
Alternatively, we can draw several scatterplots
and look at the NBA and NBL win rates separately:
> with(subset(stats, lgID == "NBA"), plot(year, new_column, main = "NBA"))
> with(subset(stats, lgID == "NBL"), plot(year, new_column, main = "NBL"))
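When there are more than two leagues, repeating the same call gets tedious. A small helper can draw one panel per league. This is only a sketch on made-up data: `plot_league` is a hypothetical function written for this example, not part of base R, and the fixed `ylim` assumes win rates between 0 and 1.

```r
# Made-up toy data; only the column names match the `stats` data frame above.
toy <- data.frame(
  year = c(1950, 1960, 1970, 1946, 1948),
  lgID = c("NBA", "NBA", "NBA", "NBL", "NBL"),
  new_column = c(0.45, 0.60, 0.52, 0.35, 0.66)
)

# Hypothetical helper: scatterplot of win rate by year for a single league.
# Returns the number of rows plotted (invisibly) so the result can be checked.
plot_league <- function(data, league) {
  one <- subset(data, lgID == league)
  plot(one$year, one$new_column, main = league,
       xlab = "Year", ylab = "Win rate", ylim = c(0, 1))
  invisible(nrow(one))
}

# One panel per league, side by side, all on the same y-axis scale.
par(mfrow = c(1, 2))
n_plotted <- sapply(c("NBA", "NBL"), plot_league, data = toy)
```

Keeping the same `ylim` in every panel makes the leagues directly comparable by eye.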