Working with string data using stringr
Tools
Detecting matches
Detect the presence or absence of a pattern in a string.
Put simply, it checks whether each string being tested contains the pattern you are looking for.
Setup
library(tidyverse)
A simple run
x <- c("apple","banana","pear")
str_detect(x,"e")
[1] TRUE FALSE TRUE
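Going further
- In R, FALSE counts as 0 and TRUE as 1 when a logical vector is used in arithmetic. This means the math functions covered earlier can be applied to the result of str_detect().
# Break the following statement down
sum(str_detect(words, "^t"))
# head(str_detect(words, "^t"))
[1] FALSE FALSE FALSE FALSE FALSE FALSE
# Pipe form of the same thing: aaa <- str_detect(words, "^t") %>% sum()
# 915 * 0 + 65 * 1 = 65  (words has 980 entries, 65 of which start with "t")
mean(str_detect(words, "[aeiou]$")) # Same idea as above, but mean() gives the proportion of words that end in a vowel.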
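To make the 0/1 arithmetic easier to check by hand, here is a minimal sketch on a tiny vector instead of `words`. The vector `fruits` and the variable `starts_with_t` are made-up names for illustration only.

```r
library(stringr)

# Hypothetical sample data, small enough to verify by eye
fruits <- c("tomato", "apple", "tangerine", "pear", "turnip")

# Logical vector: does each element start with "t"?
starts_with_t <- str_detect(fruits, "^t")
starts_with_t               # TRUE FALSE TRUE FALSE TRUE

# Coercion to numbers turns TRUE into 1 and FALSE into 0
as.integer(starts_with_t)   # 1 0 1 0 1

sum(starts_with_t)          # 3   (number of matches)
mean(starts_with_t)         # 0.6 (proportion of matches)
```

So sum() counts the matches and mean() gives their share of the whole vector.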
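- Using regular expressions for more complex logical conditions
Explain the following regular expression:
^[^aeiou]+$
Start with the brackets: [^aeiou] matches any single character that is not a, e, i, o or u.
The + sign means the preceding item is repeated one or more times.
The leading ^ and the trailing $ anchor the match to the start and the end of the string, so the pattern matches words made up entirely of non-vowel characters.
- Subsetting and filtering with filter
- str_count returns, for each string, the number of matches of the pattern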
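The three points above (a negated character class, subsetting or filtering on the result, and str_count) can be sketched together on a small example. The vector `w`, the data frame `df` and the other variable names are hypothetical. The pattern assumes the intent is "words with no vowels at all".

```r
library(stringr)
library(dplyr)   # only needed for the filter() variant

# Hypothetical sample words, chosen so the results are easy to verify by hand
w <- c("sky", "tree", "rhythm", "banana", "by")

# ^[^aeiou]+$ : from start to end, only characters other than a, e, i, o, u
no_vowels <- str_detect(w, "^[^aeiou]+$")
no_vowels                   # TRUE FALSE TRUE FALSE TRUE

# Logical subsetting on a plain character vector
w[no_vowels]                # "sky" "rhythm" "by"

# The same selection on a data frame column with filter()
df <- data.frame(word = w)
only_consonants <- filter(df, str_detect(word, "^[^aeiou]+$"))

# str_count() returns one number per string: how many matches it contains
vowel_counts <- str_count(w, "[aeiou]")
vowel_counts                # 0 2 0 3 0
```

str_detect() answers "is there a match?", while str_count() answers "how many?", which is why the second returns integers rather than TRUE/FALSE.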
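Extracting matches
What if I want to know which words the match detection returned TRUE for? Then the matching elements themselves have to be extracted.
One thing to note: the str_* functions need both a string and a regular expression (the pattern).
For example: str_detect(string, pattern)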
has_color <- str_subset(sentences,colors)
has_color
[1] "The spot on the blotter was made by green ink."
[2] "Torn scraps littered the stone floor."
[3] "It is hard to erase blue or red ink."
[4] "The box is held by a bright red snapper."
[5] "Nine men were hired to dig the ruins."
[6] "A man in a blue sweater sat at the desk."
[7] "The sky in the west is tinged with orange red."
has_color <- str_subset(sentences,color_match)
has_color
[1] "Glue the sheet to the dark blue background."
[2] "Two blue fish swam in the tank."
[3] "The colt reared and threw the tall rider."
[4] "The wide road shimmered in the hot sun."
[5] "See the cat glaring at the scared mouse."
[6] "A wisp of cloud hung in the blue air."
[7] "Leaves turn brown and yellow in the fall."
[8] "He ordered peach pie with ice cream."
[9] "Pure bred poodles have curls."
[10] "The spot on the blotter was made by green ink."
[11] "Mud was spattered on the front of his white shirt."
[12] "The sofa cushion is red and of light weight."
[13] "The sky that morning was clear and bright blue."
[14] "Torn scraps littered the stone floor."
[15] "The doctor cured him with these pills."
[16] "The new girl was fired today at noon."
[17] "The third act was dull and tired the players."
[18] "A blue crane is a tall wading bird."
[19] "Lire wires should be kept covered."
[20] "It is hard to erase blue or red ink."
[21] "The wreck occurred by the bank on Main Street."
[22] "The lamp shone with a steady green flame."
[23] "The box is held by a bright red snapper."
[24] "The prince ordered his head chopped off."
[25] "The houses are built of red clay bricks."
[26] "The red tape bound the smuggled food."
[27] "Nine men were hired to dig the ruins."
[28] "The flint sputtered and lit a pine torch."
[29] "Hedge apples may stain your hands green."
[30] "The old pan was covered with hard fudge."
[31] "The plant grew large and green in the window."
[32] "The store walls were lined with colored frocks."
[33] "The purple tie was ten years old."
[34] "Bathe and relax in the cool green grass."
[35] "The clan gathered on each dull night."
[36] "The lake sparkled in the red hot sun."
[37] "Mark the spot with a sign painted red."
[38] "Smoke poured out of every crack."
[39] "Serve the hot rum to the tired heroes."
[40] "The couch cover and hall drapes were blue."
[41] "He offered proof in the form of a lsrge chart."
[42] "A man in a blue sweater sat at the desk."
[43] "The sip of tea revives his tired friend."
[44] "The door was barred, locked, and bolted as well."
[45] "A thick coat of black paint covered all."
[46] "The small red neon lamp went out."
[47] "Paint the sockets in the wall dull green."
[48] "Wake and rise, and step into the green outdoors."
[49] "The green light in the brown box flickered."
[50] "He put his last cartridge into the gun and fired."
[51] "The ram scared the school children off."
[52] "Tear a thin sheet from the yellow pad."
[53] "Dimes showered down from all sides."
[54] "The sky in the west is tinged with orange red."
[55] "The red paper brightened the dim stage."
[56] "The hail pattered on the burnt brown grass."
[57] "The big red apple fell to the ground."
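I actually had a question here: why do the separate color strings have to be combined into a single string?
Then I read the help page and it suddenly made sense.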
str_subset() is a wrapper around x[str_detect(x, pattern)], and is equivalent to grep(pattern, x, value = TRUE). str_which() is a wrapper around which(str_detect(x, pattern)), and is equivalent to grep(pattern, x).
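The equivalences quoted from the help page can be checked directly. This is a minimal sketch that reuses the `x` vector from the simple example near the top and the single pattern "e".

```r
library(stringr)

x <- c("apple", "banana", "pear")

# str_subset() returns the matching elements themselves
str_subset(x, "e")                                          # "apple" "pear"
identical(str_subset(x, "e"), x[str_detect(x, "e")])        # TRUE
identical(str_subset(x, "e"), grep("e", x, value = TRUE))   # TRUE

# str_which() returns the positions of the matching elements
str_which(x, "e")                                           # 1 3
identical(str_which(x, "e"), which(str_detect(x, "e")))     # TRUE
identical(str_which(x, "e"), grep("e", x))                  # TRUE
```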
sum(str_detect(sentences,colors))
[1] 7
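Because str_detect() only returns a logical vector of the same length as the input vector, so with a vector of patterns each sentence is checked only against the pattern in its own position! What a painful lesson!
From now on, use it together with if, for and apply!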
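To see why the vector of patterns and the collapsed pattern give such different counts, here is a small sketch. The names `cols`, `s`, `pairwise`, `col_match` and `any_color` are made up for illustration, and I am assuming the collapsed pattern in the notes was built with str_c(..., collapse = "|"). The two vectors are kept the same length on purpose: as far as I know, stringr 1.5.0 and later only recycle length-1 vectors, so a pattern vector shorter than the input may raise an error there instead of being recycled.

```r
library(stringr)

# Hypothetical data: three patterns and three strings, same length
cols <- c("red", "blue", "green")
s <- c("a red door", "a red car", "a blue sky")

# Vector pattern: element i of s is tested only against element i of cols
pairwise <- str_detect(s, cols)
pairwise                    # TRUE FALSE FALSE
sum(pairwise)               # 1

# Collapsed pattern: every string is tested against "red|blue|green"
col_match <- str_c(cols, collapse = "|")
any_color <- str_detect(s, col_match)
any_color                   # TRUE TRUE TRUE
sum(any_color)              # 3
```

The second string contains "red" but is compared with "blue" in the pairwise call, which is the kind of miss that shrinks the result in the notes from 57 sentences to 7.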