reshape — Convert data from wide to long form and vice versa

Description

reshape converts data from wide to long form and vice versa.

Quick start

Create v from 2 time periods stored in v1 and v2 for observations identified by idvar and add tvar identifying time period

reshape long v, i(idvar) j(tvar)

Create v from 2 subobservations stored in v1 and v2 for observations identified by idvar and add subobs identifying each subobservation

reshape long v, i(idvar) j(subobs)

As above, but allow subobs to contain strings

reshape long v, i(idvar) j(subobs) string

Undo results from above

reshape wide

Create v1 and v2 from v with observations identified by idvar and time period identified by tvar

reshape wide v, i(idvar) j(tvar)

Undo results from above

reshape long

Create var and time identifier tvar from v1ar and v2ar with observation identifier idvar

reshape long v@ar, i(idvar) j(tvar)

Syntax

Overview

       long
    +------------+                  wide
    | i  j  stub |                 +----------------+
    |------------|                 | i  stub1 stub2 |
    | 1  1   4.1 |     reshape     |----------------|
    | 1  2   4.5 |   <--------->   | 1    4.1   4.5 |
    | 2  1   3.3 |                 | 2    3.3   3.0 |
    | 2  2   3.0 |                 +----------------+
    +------------+

To go from long to wide:

                                j existing variable
                               /
reshape wide stub, i(i) j(j)

To go from wide to long:

reshape long stub, i(i) j(j)
                            \
                              j new variable

To go back to long after using reshape wide:

reshape long

To go back to wide after using reshape long:

reshape wide

Basic syntax

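Convert data from wide form to long form

reshape long stubnames, i(varlist) [options]

Convert data from long form to wide form

reshape wide stubnames, i(varlist) [options]

Convert data back to long form after using reshape wide

reshape long

Convert data back to wide form after using reshape long

reshape wide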

List problem observations when reshape fails

reshape error

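options                 Description

i(varlist)              use varlist as the ID variables
j(varname [values])     long->wide: varname, existing variable
                        wide->long: varname, new variable
                        optionally specify values to subset varname
string                  varname is a string variable (default is numeric)

i(varlist) is required.

values is #[-#] [...] if varname is numeric (the default), or "string" ["string" ...] if varname is a string.

stubnames are variable names (long->wide) or stubs of variable names (wide->long); either way, they may contain @, denoting where j appears or is to appear in the name. In the example above, where we wrote "reshape wide stub", we could have written "reshape wide stub@", because j by default ends up as a suffix. Had we written stu@b, the wide variables would have been named stu1b and stu2b.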

Advanced syntax

reshape i varlist
reshape j varname [values] [, string]
reshape xij fvarnames [, atwl(chars)]
reshape xi [varlist]
reshape [query]
reshape clear

Options

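i(varlist) specifies the variables whose unique values denote a logical observation. i() is required.
j(varname [values]) specifies the variable whose unique values denote a subobservation. values lists the unique values to be used from varname; these are usually not stated explicitly because reshape determines them automatically from the data.
string specifies that j() may contain string values.
atwl(chars), available only with the advanced syntax and not shown in the dialog box, specifies that plain ASCII chars be substituted for the @ character when converting the data from wide to long form.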

Description of basic syntax

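Before using reshape, you need to determine whether the data are in long or wide form. You also must determine the logical observation (i) and the subobservation (j) by which to organize the data. Suppose that you had the following data, which could be organized in wide or long form as follows:

[Image not preserved: the example data in wide form (left) and long form (right)]

Given these data, you could use reshape to convert from one form to the other:

reshape long inc, i(id) j(year) /* goes from left form to right form */
reshape wide inc, i(id) j(year) /* goes from right form to left form */

Because we did not specify sex in the command, Stata assumes that it is constant within the logical observation, here id.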
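As a minimal sketch of this round trip, the following do-file fragment enters a small wide dataset by hand and converts it in both directions. The values are made up for illustration and are not taken from the lost screenshot; only the variable names id, sex, and inc follow the text.

```stata
* Hypothetical wide data: one row per person (values are illustrative)
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

* Wide -> long: inc80-inc82 are stacked into inc, and year is created
reshape long inc, i(id) j(year)
list, sepby(id)

* Long -> wide: year is dropped and inc80-inc82 are rebuilt
reshape wide inc, i(id) j(year)
list
```

sex is not mentioned in either command, so it is carried along as a variable that is constant within id.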

Wide and long data forms

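Think of the data as a collection of observations Xij, where i is the logical observation, or group identifier, and j is the subobservation, or within-group identifier. Wide-form data are organized by logical observation, storing all the data on a particular observation in one row. Long-form data are organized by subobservation, storing the data in multiple rows.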

Example 1

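For example, we might have data on a person's ID, gender, and annual income over the years 1980 to 1982. We have two Xij variables, with the data in wide form:

use http://www.stata-press.com/data/r15/reshape1
list

[Image not preserved: listing of the wide-form data]

To convert these data to long form, we type

reshape long inc ue, i(id) j(year)

[Image not preserved: output of the reshape long command]

There is no variable named year in our original, wide-form dataset; year will be a new variable in our long dataset. After this conversion, we have

[Image not preserved: listing of the long-form data]

We can return to our original, wide-form dataset by using reshape wide.

[Image not preserved: output of the reshape wide command]

Converting from wide to long creates the j (year) variable. Converting back from long to wide drops the j (year) variable.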

Technical note

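If your data are in wide form and you do not have a group identifier variable (the required i(varlist) option), you can easily create one by using generate; see [D] generate. For instance, in the last example, if we did not have the id variable in our dataset, we could have created it by typing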

generate id = _n

Avoiding and correcting mistakes

reshape often detects when the data are not suitable for reshaping; it then issues an error, and the data remain unchanged.

Example 2

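The following wide data contain a mistake:

[Images not preserved: listing of the wide data and the error reported by reshape long]

The i variable must be unique when the data are in wide form; we typed i(id), yet we have 2 observations for which id is 2. (Is person 2 a male or a female?)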
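The situation can be reproduced with a small hand-entered dataset. This is only a sketch: the values are hypothetical, and the use of duplicates tag as a check before reshaping is a suggestion, not part of the original example.

```stata
* Hypothetical wide data with id 2 entered twice by mistake
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
2 0 2400 2500 2400
end

* Check i() before reshaping: tag every row whose id is repeated
duplicates tag id, generate(dup)
list if dup > 0
drop dup

* capture noisily shows reshape's message but lets the do-file continue
capture noisily reshape long inc, i(id) j(year)
display "reshape return code: " _rc
```

Because reshape refuses the conversion, the data are still in their original wide form afterward.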

Example 3

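It is not a mistake for the i variable to be repeated when the data are in long form, but the following data have a similar mistake:

[Image not preserved: listing of the long data and the failed reshape wide command]

In long form, i(id) does not have to be unique, but j(year) must be unique within i; otherwise, what is the value of inc in 1981 for id==1?
reshape told us to type reshape error to view the problem observations.

[Image not preserved: output of reshape error]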
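A sketch of the same kind of mistake with hypothetical data follows. The duplicates list command at the end is an additional, reshape-independent way to look at the repeated rows and is not part of the original example.

```stata
* Hypothetical long data: id 1 has two rows for year 81
clear
input id year sex inc
1 80 0 5000
1 81 0 5500
1 81 0 5400
1 82 0 6000
2 80 1 2000
2 81 1 2200
2 82 1 3300
end

* reshape refuses to convert and leaves the data unchanged
capture noisily reshape wide inc, i(id) j(year)

* List the offending observations, as reshape's message suggests
reshape error

* duplicates list gives a similar view without involving reshape
duplicates list id year
```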

Example 4

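Consider some long-form data that have no mistakes. We list the first 4 observations.

[Image not preserved: listing of the first 4 observations]

When we convert the data to wide form, however, we forget to mention the ue variable (which varies within person).

[Image not preserved: the failed reshape wide command]

Here reshape observed that ue was not constant within id and so could not restructure the data so that there were single observations on id. We should have typed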

reshape wide inc ue, i(id) j(year)

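In summary, there are three cases in which reshape will refuse to convert the data:

  1. The data are in wide form and i is not unique.
  2. The data are in long form and j is not unique within i.
  3. The data are in long form and an unmentioned variable is not constant within i.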
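Each of the three conditions can be checked before calling reshape. The fragment below is a sketch with hypothetical data; isid and the bysort/assert idiom are general-purpose Stata tools used here as an assumption about how one might test the conditions, not something reshape requires.

```stata
* Hypothetical long data used to illustrate the three checks
clear
input id year sex inc ue
1 80 0 5000 0
1 81 0 5500 1
1 82 0 6000 0
2 80 1 2000 1
2 81 1 2200 0
2 82 1 3300 0
end

* Case 2: j must be unique within i (isid exits with an error if not)
isid id year

* Case 3: a variable left out of the stub list must be constant within i
bysort id (sex): assert sex[1] == sex[_N]

* Both checks pass, so this conversion goes through
reshape wide inc ue, i(id) j(year)

* Case 1: in wide form, i must identify the rows uniquely
isid id
```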

Example 5

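With some mistakes, reshape will go ahead and convert the data and produce a surprising result. Suppose that we forget to mention that the ue variable varies within id in the following wide data:

[Images not preserved: listing of the wide data and the result of reshaping without ue]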

We did not state that ue varied within i, so the variables ue80, ue81, and ue82 were left as is.
reshape did not complain. There is no real problem here because no information has been lost. In
fact, this may actually be the result we wanted. Probably, however, we simply forgot to include ue among the Xij variables. If you obtain an unexpected result, here is how to undo it:

  1. If you typed reshape long . . . to produce the result, type reshape wide (without arguments) to undo it.
  2. If you typed reshape wide . . . to produce the result, type reshape long (without arguments) to undo it.
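The following sketch walks through the slip and its repair using hypothetical values: ue is left out of the first command, the result is undone with reshape wide, and the intended command is then issued.

```stata
* Hypothetical wide data with two Xij variables, inc and ue
clear
input id sex inc80 inc81 inc82 ue80 ue81 ue82
1 0 5000 5500 6000 0 1 0
2 1 2000 2200 3300 1 0 0
3 0 3000 2000 1000 0 0 1
end

* ue is forgotten: ue80-ue82 are carried along unchanged in every row
reshape long inc, i(id) j(year)
describe, simple

* Undo the conversion, then issue the intended command
reshape wide
reshape long inc ue, i(id) j(year)
list, sepby(id)
```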

reshape long and reshape wide without arguments

Whenever you type a reshape long or reshape wide command with arguments, reshape remembers it. Thus you might type

reshape long inc ue, i(id) j(year)

and work with the data like that. You could then type

reshape wide

to convert the data back to the wide form. Then later you could type

reshape long

to convert them back to the long form. If you save the data, you can even continue using reshape wide and reshape long without arguments during a future Stata session.

Be careful. If you create new Xij variables, you must tell reshape about them by typing the full reshape command, although no real damage will be done if you forget. If you are converting from long to wide form, reshape will catch your error and refuse to make the conversion. If you are converting from wide to long, reshape will convert the data, but the result will be surprising: remember what happened when we forgot to mention the ue variable and ended up with ue80, ue81, and ue82 in our long data; see example 5. You can type reshape wide to undo the unwanted change and then try again.

So, we can type

reshape wide

to get back to our original, wide-form data and then type the reshape long command that we intended:

reshape long inc ue, i(id) j(year)

Missing variables

When converting data from wide form to long form, reshape does not demand that all the variables exist. Missing variables are treated as variables with missing observations.

Example 6

Let’s drop ue81 from the wide form of the data:

[Images not preserved: the wide data after dropping ue81 and the result of reshape long]

reshape placed missing values where ue81 values were unavailable. If we reshaped these data back to wide form by typing

reshape wide inc ue, i(id) j(year)

the ue81 variable would be created and would contain all missing values.
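A minimal sketch of this behavior with made-up values:

```stata
* Hypothetical wide data; ue81 is then removed
clear
input id sex inc80 inc81 inc82 ue80 ue81 ue82
1 0 5000 5500 6000 0 1 0
2 1 2000 2200 3300 1 0 0
3 0 3000 2000 1000 0 0 1
end
drop ue81

* ue81 does not exist, so ue is set to missing where year is 81
reshape long inc ue, i(id) j(year)
list if year == 81

* Going back to wide creates ue81, filled with missing values
reshape wide inc ue, i(id) j(year)
list id ue80 ue81 ue82
```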

Advanced issues with basic syntax: i()

The i() option can indicate one i variable (as our past examples have illustrated) or multiple variables. An example of multiple i variables would be hospital ID and patient ID within each hospital.

reshape . . . , i(hid pid)

Unique pairs of values for hid and pid in the data define the grouping variable for reshape.
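For instance, with hypothetical hospital and patient IDs (the values and the inc stub are assumptions for illustration), the pair identifies each logical observation even though neither variable does so alone:

```stata
* Hypothetical data: patient IDs restart at 1 within each hospital
clear
input hid pid inc80 inc81
1 1 5000 5500
1 2 2000 2200
2 1 3000 2000
2 2 4000 4100
end

* Neither hid nor pid is unique alone, but the pair is
isid hid pid

reshape long inc, i(hid pid) j(year)
list, sepby(hid pid)
```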

Advanced issues with basic syntax: j()

The j() option takes a variable name (as our past examples have illustrated) or a variable name and a list of values. When the values are not provided, reshape deduces them from the data. Specifying
the values with the j() option is rarely needed. reshape never makes a mistake when the data are in long form and you type reshape wide. The values are easily obtained by tabulating the j variable.
reshape can make a mistake when the data are in wide form and you type reshape long if your variables are poorly named. Say that you have the inc80, inc81, and inc82 variables, recording
income in each of the indicated years, and you have a variable named inc2, which is not income but indicates when the area was reincorporated. You type

reshape long inc, i(id) j(year)

reshape sees the inc2, inc80, inc81, and inc82 variables and decides that there are four groups in which j = 2, 80, 81, and 82.
The easiest way to solve the problem is to rename the inc2 variable to something other than “inc” followed by a number; see [D] rename.
You can also keep the name and specify the j values. To perform the reshape, you can type

reshape long inc, i(id) j(year 80-82)

or

reshape long inc, i(id) j(year 80 81 82)

You can mix the dash notation for value ranges with individual numbers. reshape would understand 80 82-87 89 91-95 as a valid values specification.
At the other extreme, you can omit the j() option altogether with reshape long. If you do, the j variable will be named _j.
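The inc2 case can be sketched as follows. The data are hypothetical, with inc2 holding a made-up reincorporation year; restricting j to 80-82 keeps reshape from treating inc2 as income for a year 2.

```stata
* Hypothetical wide data: inc2 is not an income variable
clear
input id inc80 inc81 inc82 inc2
1 5000 5500 6000 1975
2 2000 2200 3300 1968
end

* Only inc80, inc81, and inc82 are stacked; inc2 is carried along as is
reshape long inc, i(id) j(year 80-82)
list, sepby(id)
```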

Advanced issues with basic syntax: xij

When specifying variable names, you may include @ characters to indicate where the numbers go.

Example 7

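Let’s reshape the following data from wide to long form:

[Images not preserved: listing of the wide data, followed by the command reshape long inc@r ue, i(id) j(year) and its output]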

At most one @ character may appear in each name. If no @ character appears, results are as if the @ character appeared at the end of the name. So, the equivalent reshape command to the one above is

reshape long inc@r ue@, i(id) j(year)

inc@r specifies variables named inc#r in the wide form and incr in the long form. The @ notation may similarly be used for converting data from long to wide format:

reshape wide inc@r ue, i(id) j(year)
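A short sketch of the @ notation in both directions, using hypothetical data whose income variables carry the year in the middle of the name:

```stata
* Hypothetical wide data: inc80r, inc81r, inc82r and ue80, ue81, ue82
clear
input id sex inc80r inc81r inc82r ue80 ue81 ue82
1 0 5000 5500 6000 0 1 0
2 1 2000 2200 3300 1 0 0
end

* @ marks where the year sits: inc80r -> incr, ue80 -> ue
reshape long inc@r ue, i(id) j(year)
list, sepby(id)

* The same stubs rebuild the wide names
reshape wide inc@r ue, i(id) j(year)
describe, simple
```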

Advanced issues with basic syntax: String identifiers for j()

The string option allows j to take on string values.

Example 8

Consider the following wide data on husbands and wives. In these data, incm is the income of the man and incf is the income of the woman.

[Image not preserved: listing of the wide data]

These data can be reshaped into separate observations for males and females by typing

reshape long inc, i(id) j(sex) string

The string option specifies that j take on nonnumeric values. The result is

[Image not preserved: listing of the long data]

sex will be a string variable. Similarly, these data can be converted from long to wide form by typing

reshape wide inc, i(id) j(sex) string

Strings are not limited to being single characters or even having the same length. You can specify the location of the string identifier in the variable name by using the @ notation.
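A minimal sketch of the string option with hypothetical household data (the kids variable and all values are assumptions for illustration):

```stata
* Hypothetical wide data: one row per couple
clear
input id kids incm incf
1 0 5000 5500
2 1 2000 2200
3 2 3000 2000
end

* The suffixes m and f become the values of the string variable sex
reshape long inc, i(id) j(sex) string
list, sepby(id)

* string is given again when going back to wide
reshape wide inc, i(id) j(sex) string
```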

Example 9

Suppose that our variables are named id, kids, incmale, and incfem.

[Images not preserved: listing of the wide data and the result of reshaping it to long form]

If the wide data had variables named minc and finc, the appropriate reshape command would have been

reshape long @inc, i(id) j(sex) string

The resulting variable in the long form would be named inc.
We can also place strings in the middle of the variable names. If the variables were named incMome and incFome, the reshape command would be

reshape long inc@ome, i(id) j(sex) string
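The prefix case can be sketched with hypothetical data in which the identifying letter comes before inc:

```stata
* Hypothetical wide data with variables minc and finc
clear
input id kids minc finc
1 0 5000 5500
2 1 2000 2200
end

* @ in front: the text before "inc" becomes the value of sex
reshape long @inc, i(id) j(sex) string
list, sepby(id)
```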

Be careful with string identifiers because it is easy to be surprised by the result. Say that we have wide data having variables named incm, incf, uem, uef, agem, and agef. To make the data long, we might type

reshape long inc ue age, i(id) j(sex) string

Along with these variables, we also have the variable agenda. reshape will decide that the sexes are m, f, and nda. This would not happen without the string option if the variables were named inc0, inc1, ue0, ue1, age0, and age1, even with the agenda variable present in the data.

Advanced issues with basic syntax: Second-level nesting

Sometimes the data may have more than one possible j variable for reshaping. Suppose that your data have both a year variable and a sex variable. One logical observation in the data might be represented in any of the following four forms:

[Image not preserved: the four forms of the data (long-long, long-wide, wide-long, and wide-wide)]

reshape can convert any of these forms to any other. Converting data from the long–long form to the wide–wide form (or any of the other forms) takes two reshape commands. Here is how we would do it:

[Image not preserved: the two reshape commands and their output]
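Because the original commands are not recoverable from the lost image, the following is only a sketch of one way to chain two reshape commands. The variable names hid, sex, year, and inc and all values are assumptions chosen for illustration.

```stata
* Hypothetical long-long data: one row per household, sex, and year
clear
input hid str1 sex year inc
1 "f" 90 3200
1 "f" 91 4700
1 "m" 90 4500
1 "m" 91 4600
end

* Step 1: spread sex across columns, giving finc and minc
reshape wide @inc, i(hid year) j(sex) string

* Step 2: spread year across columns, giving finc90 finc91 minc90 minc91
reshape wide minc finc, i(hid) j(year)
list
```

In the first step, the pair hid and year serves as i because sex is the subobservation being spread; in the second step, hid alone is i.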

Description of advanced syntax

The advanced syntax is simply a different way of specifying the reshape command, and it has one seldom-used feature that provides extra control. Rather than typing one reshape command to describe the data and perform the conversion, such as

reshape long inc, i(id) j(year)

you type a sequence of reshape commands. The initial commands describe the data, and the last command performs the conversion:

reshape i id
reshape j year
reshape xij inc
reshape long

reshape i corresponds to i() in the basic syntax.
reshape j corresponds to j() in the basic syntax.
reshape xij corresponds to the variables specified in the basic syntax. reshape xij also accepts the atwl() option for use when @ characters are specified in the fvarnames. atwl stands for at-when-long. When you specify names such as inc@r or ue@, in the long form the names become incr and ue, and the @ character is ignored. atwl() allows you to change @ into whatever you specify. For example, if you specify atwl(X), the long-form names become incXr and ueX.

There is also one more specification, which has no counterpart in the basic syntax:

reshape xi varlist

In the basic syntax, Stata assumes that all unspecified variables are constant within i. The advanced syntax works the same way, unless you specify the reshape xi command, which names the constant-within-i variables. If you specify reshape xi, any variables that you do not explicitly specify are dropped from the data during the conversion. As a practical matter, you should explicitly drop the unwanted variables before conversion. For instance, suppose that the data have variables inc80, inc81, inc82, sex, age, and age2 and that you no longer want the age2 variable. You could specify

reshape xi sex age

or

drop age2

and leave reshape xi unspecified.

reshape xi does have one minor advantage. It saves reshape the work of determining which variables are unspecified. This saves a relatively small amount of computer time.

Another advanced-syntax feature is reshape query, which is equivalent to typing reshape by itself. reshape query reports which reshape parameters have been defined. reshape i, reshape j, reshape xij, and reshape xi specifications may be given in any order and may be repeated to change or correct what has been specified.

Finally, reshape clear clears the definitions. reshape definitions are stored with the dataset when you save it. reshape clear allows you to erase these definitions.

The basic syntax of reshape is implemented in terms of the advanced syntax, so you can mix basic and advanced syntaxes.
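Putting the advanced commands together, a sketch with hypothetical data might look like this. The variables match the ones named in the text (inc80, inc81, inc82, sex, age, and age2); the id variable and all values are assumptions.

```stata
* Hypothetical wide data including an unwanted age2 variable
clear
input id sex age age2 inc80 inc81 inc82
1 0 30  900 5000 5500 6000
2 1 40 1600 2000 2200 3300
end

* Describe the data piece by piece
reshape i id
reshape j year
reshape xij inc

* Only sex and age are named, so age2 is dropped by the conversion
reshape xi sex age

* Perform the conversion
reshape long
list, sepby(id)

* Report the stored definitions, then erase them
reshape query
reshape clear
```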
