Intro & qplot

Reference information

Book Name: ggplot2 - Elegant Graphics for Data Analysis
Author: Hadley Wickham
Publisher: Springer
ISBN: 978-0-387-98140-6
e-ISBN: 978-0-387-98141-3

Intro

  • Create new graphics that are precisely tailored for your problem

Grammar of graphics

A statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).
The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.
Faceting can be used to generate the same plot for different subsets of the dataset.

  • data
  • aes: describes how variables in the data are mapped to aesthetic attributes
  • geom: what you actually see on the plot: points, lines, polygons, etc.
  • stat: statistical transformations that summarise the data
  • scale: maps values in the data space to values in an aesthetic space (colour, size, shape)
  • coord: how data coordinates are mapped to the plane of the graphic, including axes and gridlines; Cartesian, polar and map projections
  • facet: breaks up the data into subsets and displays those subsets
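
The components above can be spelled out one per line by building a plot in layers instead of going through qplot(). This is a minimal sketch, assuming the ggplot2 package and its bundled diamonds data; the specific choices (log scales, a linear smoother, faceting by color) are illustrative and not taken from the book.

```r
library(ggplot2)

# One grammar component per line.
p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +  # data + aes
  geom_point(alpha = 0.1) +                 # geom: what is drawn
  stat_smooth(method = "lm", se = FALSE) +  # stat: a fitted summary of the data
  scale_x_log10() +                         # scale: data space -> aesthetic space
  scale_y_log10() +
  coord_cartesian() +                       # coord: the Cartesian plane
  facet_wrap(~ color)                       # facet: one panel per diamond colour
```

Printing p draws the plot. Dropping any line except the first two still gives a valid plot, because every component other than the data, the mapping and at least one layer has a default.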

Relevant resources

  • Which plot to produce: Chambers et al. (1983); Cleveland (1993a); Robbins (2004); Tukey (1977).
  • Create an attractive plot: Tufte (1990, 1997, 2001, 2006).
  • Dynamic and interactive graphics: Cook and Swayne (2007), rggobi package.

qplot

qplot is short for quick plot.

Basic use

The first two arguments to qplot() are x and y.

qplot(carat,price, data=diamonds)
qplot(log(carat), log(price), data=diamonds)
qplot(carat, x*y*z, data=diamonds)

Colour, size, shape and other aesthetic attributes

  • With base R's plot() it's your responsibility to convert a categorical variable in your data into something that plot() knows how to use (e.g. "red", "blue").
  • qplot can do this for you automatically, and it will automatically provide a legend that maps the displayed attributes to the data values.
    Augment the plot of carat and price with information about diamond colour and cut.
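set.seed(1410)  # dsmall: a random sample of 100 diamonds, as used in the book
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
qplot(carat, price, data=dsmall, colour=color)
qplot(carat, price, data=dsmall, shape=cut)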

You can also manually set the aesthetics using I().
For large datasets, semitransparent points are often useful to alleviate some of the overplotting.
It's often useful to specify the transparency as a fraction, e.g., 1/10 or 2/10, as the denominator specifies the number of points that must overplot to get a completely opaque colour.

qplot(carat, price, data=diamonds, alpha=I(1/10))
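
The difference between setting an aesthetic with I() and mapping it can be sketched as follows, assuming ggplot2 and its diamonds data. The fractions 1/100 and 1/200 and the mapped variable are illustrative choices.

```r
library(ggplot2)

# Setting: I() marks the value as a constant, so every point gets the same
# transparency and no legend is drawn.
p10  <- qplot(carat, price, data = diamonds, alpha = I(1/10))
p100 <- qplot(carat, price, data = diamonds, alpha = I(1/100))
p200 <- qplot(carat, price, data = diamonds, alpha = I(1/200))

# Mapping: without I(), the value is treated as data and handed to a scale,
# the same way colour = color is, so a legend appears.
mapped <- qplot(carat, price, data = diamonds, alpha = carat)
```

Printing the three constant-alpha plots side by side shows where the bulk of the points lies as the transparency increases.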

Plot geoms

  • geom='point'
    default
  • geom='smooth'
    fits a smoother to the data and displays the smooth and its standard error
  • geom='boxplot'
  • geom='path' and geom = 'line'
    A line plot is constrained to produce lines that travel from left to right, while paths can go in any direction.
  • 1d distributions, continuous variables
    geom='histogram' draws a histogram (the default), geom='freqpoly' a frequency polygon, and geom='density' creates a density plot
  • 1d distributions, discrete variables
    geom='bar' makes a bar chart

Adding a smoother to a plot

qplot(carat, price, data=diamonds, geom=c('point','smooth'))

If you want to turn the confidence interval off, use se = FALSE.
You can choose between many different smoothers with the method argument.

  • method='loess'
    The default for small n; uses a smooth local regression.
    The wiggliness of the line is controlled by the span parameter, which ranges from 0 (exceedingly wiggly) to 1 (not so wiggly).
qplot(carat, price, data=dsmall, geom=c('point','smooth'), span=0.2)

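Loess does not work well for large datasets (it is O(n^2) in memory), so an alternative smoothing algorithm is used when n is greater than 1000.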
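
The effect of span can be compared directly by building the two extremes as explicit layers. This is a sketch under assumptions: dsmall is taken to be a random sample of 100 diamonds (the seed is arbitrary), and geom_smooth() is used instead of qplot() so that span reaches only the smoother layer; depending on the installed ggplot2 version, qplot() may hand extra arguments to every geom and warn about it.

```r
library(ggplot2)

set.seed(1410)  # assumption: any seed works, this only fixes the sample
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

base <- ggplot(dsmall, aes(carat, price)) + geom_point()

# Small span: each local fit sees few neighbours, so the curve is wiggly.
wiggly <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

# Large span: each local fit sees most of the data, so the curve is stiff.
stiff <- base + geom_smooth(method = "loess", formula = y ~ x, span = 1)

# se = FALSE drops the confidence band.
no_band <- base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```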

  • method='gam'
    Load the mgcv package first.
    Use formula=y~s(x) to fit a generalised additive model.
    This is similar to using a spline with lm, but the degree of smoothness is estimated from the data.
    For large data, use the formula y~s(x,bs='cs') (this is the default when there are more than 1000 points).
library(mgcv)
qplot(carat, price, data = dsmall, geom=c('point', 'smooth'), method='gam', formula=y~s(x))
qplot(carat, price, data = diamonds, geom=c('point','smooth'),
method='gam', formula=y~s(x,bs='cs'))
  • method='lm'
    - default: a straight line.
    - formula=y~poly(x,2) : specify a degree 2 polynomial
    - formula=y~ns(x,2) : load the splines package and use a natural spline. (The second parameter is the degrees of freedom; a higher number will create a wigglier curve.)
library(splines)
qplot(carat, price, data=dsmall, geom=c('point','smooth'),method='lm')
qplot(carat, price, data=dsmall, geom=c('point','smooth'),method='lm',formula=y~ns(x,5))
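
What the degrees-of-freedom argument of ns() does can be inspected without drawing anything. The sketch below assumes the splines package and the diamonds data from ggplot2; fitting with lm() directly is an illustration of what method='lm' does with the formula, not the exact code the smoother runs.

```r
library(splines)
library(ggplot2)  # only for the diamonds data

# ns() expands x into one basis column per degree of freedom.
basis_2 <- ns(diamonds$carat, 2)
basis_5 <- ns(diamonds$carat, 5)
dim(basis_2)  # one row per diamond, 2 columns
dim(basis_5)  # one row per diamond, 5 columns

# The linear model then estimates one coefficient per column plus an
# intercept, so more degrees of freedom means more freedom to bend.
fit_2 <- lm(price ~ ns(carat, 2), data = diamonds)
fit_5 <- lm(price ~ ns(carat, 5), data = diamonds)
length(coef(fit_2))  # 3
length(coef(fit_5))  # 6
```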

Boxplots and jittered points

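When a dataset includes a categorical variable and one or more continuous variables, these plots show how the values of the continuous variables vary with the levels of the categorical variable.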

  • geom='jitter'
  • geom='boxplot'
    Boxplots summarise the bulk of the distribution with only five numbers, while jittered plots show every point but can suffer from overplotting.
    A boxplot shows the median and the adjacent quartiles.
    The overplotting in a jittered plot can be alleviated somewhat with semi-transparent points, set through the alpha argument.
qplot(color, price/carat, data=diamonds, geom='jitter', alpha=I(1/50))

Aesthetics: size, colour and shape; boxplots also accept fill.
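
The handful of numbers a boxplot condenses each group into can be computed directly. This is a rough sketch using base R's fivenum() on the same price/carat ratio, assuming the diamonds data from ggplot2. It is an approximation of what the boxplot geom draws: the plotted whiskers usually stop short of the minimum and maximum, with more distant points drawn individually.

```r
library(ggplot2)  # only for the diamonds data

price_per_carat <- diamonds$price / diamonds$carat

# One column per diamond colour, one row per summary number.
by_colour <- sapply(split(price_per_carat, diamonds$color), fivenum)
rownames(by_colour) <- c("min", "lower hinge", "median", "upper hinge", "max")

round(by_colour)
```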

Histogram and density plots

qplot(carat, data = diamonds, geom='histogram')
qplot(carat, data= diamonds, geom='density')

For the density plot, the adjust argument controls the degree of smoothness (high values of adjust produce smoother plots).
For the histogram, the binwidth argument controls the amount of smoothing by setting the bin size. (Break points can also be specified explicitly, using the breaks argument.)
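
A quick way to see both smoothing controls at work is to build the same plot at several settings. The sketch assumes ggplot2 and its diamonds data; the specific bin widths, adjust values and the xlim range are illustrative.

```r
library(ggplot2)

# Histograms: a wide bin shows only the gross shape, a narrow one shows
# the fine structure.
coarse <- qplot(carat, data = diamonds, geom = "histogram",
                binwidth = 1, xlim = c(0, 3))
medium <- qplot(carat, data = diamonds, geom = "histogram",
                binwidth = 0.1, xlim = c(0, 3))
fine   <- qplot(carat, data = diamonds, geom = "histogram",
                binwidth = 0.01, xlim = c(0, 3))

# Densities: adjust multiplies the default bandwidth, so values above 1
# smooth more and values below 1 smooth less.
rough  <- qplot(carat, data = diamonds, geom = "density", adjust = 0.5)
smooth_density <- qplot(carat, data = diamonds, geom = "density", adjust = 2)
```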

  • Gross features of the data show up well at a large bin width, while finer features require a very narrow width.

To compare the distributions of different subgroups, just add an aesthetic mapping, as in the following code.
qplot(carat, data=diamonds, geom='density', colour = color)
qplot(carat, data=diamonds, geom='histogram', fill = color)

The density plot is more appealing at first because it seems easy to read and compare the various curves. However, it is more difficult to understand exactly what a density plot is showing.
In addition, the density plot makes some assumptions that may not be true for our data, i.e. that it is unbounded, continuous and smooth.

Bar charts

The discrete analogue of the histogram is the bar chart, drawn with geom='bar'.
The bar geom counts the number of instances of each class so that you don't need to tabulate your values beforehand.
If you'd like to tabulate class members in some other way, such as by summing up a continuous variable, you can use the weight aesthetic.

qplot(color, data=diamonds, geom='bar',weight=carat)+scale_y_continuous('carat')
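
The two tabulations behind the bar chart can be reproduced in base R, which makes it clear what weight changes. A minimal sketch, assuming the diamonds data from ggplot2:

```r
library(ggplot2)  # only for the diamonds data

# Default bar chart: bar height is the number of diamonds of each colour.
counts <- table(diamonds$color)

# With weight = carat: bar height is the total carat weight of each colour.
total_carat <- tapply(diamonds$carat, diamonds$color, sum)

counts
round(total_carat)
```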

Time series with line and path plots

Line and path plots are typically used for time series data.

  • Line plots join the points from left to right.
  • Path plots join them in the order that they appear in the dataset.
qplot(date, unemploy/pop, data = economics, geom='line')
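
The difference between the two geoms only shows up when the x values are not already sorted. The sketch below uses a small hypothetical data frame (toy is made up for illustration) and assumes ggplot2.

```r
library(ggplot2)

# Hypothetical toy data: five observations whose x values are out of order.
toy <- data.frame(x = c(1, 3, 2, 5, 4), y = c(1, 2, 3, 4, 5))

as_line <- qplot(x, y, data = toy, geom = "line")  # joins points by increasing x
as_path <- qplot(x, y, data = toy, geom = "path")  # joins points in row order

# Row indices in the order each geom visits them.
line_order <- order(toy$x)        # 1 3 2 5 4
path_order <- seq_len(nrow(toy))  # 1 2 3 4 5
```

For the economics data the rows are already sorted by date, so a line and a path give the same picture when date is on the x-axis.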

We could draw a scatterplot of unemployment rate vs. length of unemployment, but then we could no longer see the evolution over time. The solution is to join points adjacent in time with line segments, forming a path plot.
Apply the colour aesthetic to the line to make it easier to see the direction of time.

year <- function(x) as.POSIXlt(x)$year + 1900  # helper: calendar year of a Date
qplot(unemploy/pop, uempmed, data = economics, geom='path', colour = year(date)) + scale_size_area()

Faceting

We have already discussed using aesthetics (colour and shape) to compare subgroups, drawing all groups on the same plot. Faceting takes an alternative approach.

qplot(carat, data=diamonds, facets=color~., geom='histogram',binwidth=0.1, xlim=c(0,3))
qplot(carat, ..density.., data=diamonds, facets=color~., geom='histogram', binwidth=0.1, xlim=c(0,3))
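
In the second call, ..density.. tells ggplot2 to put the density computed by the histogram stat on the y-axis instead of the default count, which makes panels with different numbers of diamonds comparable. A sketch of the same plot as explicit layers follows, assuming ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density..; the values mirror the qplot call above.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(binwidth = 0.1) +  # same bin width as the qplot() call
  facet_grid(color ~ .) +           # rows = colour, single column
  xlim(0, 3)                        # same x range as the qplot() call
```

The formula color ~ . reads as rows ~ columns: one row of panels per colour and no split across columns.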

Other options

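xlim, ylim : limits for the x- and y-axes, each a numeric vector of length two, e.g. xlim=c(0, 20).
log : which axes (if any) should be logged, e.g. log='x' will log the x-axis, log='xy' will log both.
main : main title of the plot, can be a string or an expression.
xlab, ylab : labels for the x- and y-axes, can be strings or expressions.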
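
These options can be combined in a single call. The following is a sketch under assumptions: dsmall is taken to be a random sample of 100 diamonds (the seed is arbitrary), and the label strings are illustrative.

```r
library(ggplot2)

set.seed(1410)  # assumption: any seed works, this only fixes the sample
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Plain-text title and axis labels.
labelled <- qplot(carat, price, data = dsmall,
                  xlab = "Weight (carats)", ylab = "Price ($)",
                  main = "Price-weight relationship")

# An expression as a label, plus a restricted x range.
zoomed <- qplot(carat, price / carat, data = dsmall,
                xlab = "Weight (carats)",
                ylab = expression(frac(price, carat)),
                main = "Small diamonds", xlim = c(0.2, 1))

# Log-transform both axes.
logged <- qplot(carat, price, data = dsmall, log = "xy")
```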
