1. Remove lines that contain a specified string
Sample text (file test):
(1) sed command
Usage:
sed -e '/pattern/d'   # pattern is the character(s) to match (a regular expression); d deletes every matching line
Example:
$ cat test | sed -e '/[Nn][Oo][Tt]4[Qq][Aa]/d'
adfadsfsdfasd
ewfqwewdvs
weafdfdf
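As the output shows, a line is deleted no matter where in the line the pattern appears. (Inside a bracket expression every character is literal, so [N,n] would also match a comma and [T|t] would also match a |; [Nn] alone is enough for case-insensitive matching.)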
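For plain line deletion, `grep -v` (print the lines that do *not* match) combined with `-i` (ignore case) should give the same result as the `sed '/.../d'` command above. A minimal sketch, assuming a small made-up sample file in place of the original `test` file:

```bash
#!/usr/bin/env bash
# Hypothetical sample file standing in for "test"
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' Not4qa nOT4QA adfadsfsdfasd ewfqwNnot4qaewdvs weafdfdf > "$f"

# Delete every line containing not4qa in any letter case, two ways
by_sed=$(sed -e '/[Nn][Oo][Tt]4[Qq][Aa]/d' "$f")
by_grep=$(grep -vi 'not4qa' "$f")

printf '%s\n' "$by_sed"
[ "$by_sed" = "$by_grep" ] && echo "same result"
```

Both commands leave only `adfadsfsdfasd` and `weafdfdf`. The sed form stays useful when the deletion is one step in a longer sed script.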
What if we only want to delete lines that start with Not4QA (in any letter case)? Anchor the pattern with ^:
$ cat test | sed -e '/^[Nn][Oo][Tt]4[Qq][Aa]/d'
adfadsfsdfasd
ewfqwewdvs
weafdfdf
ewfqwNnot4qaewdvs
And to delete only lines that end with Not4QA (in any letter case), anchor it with $:
$ cat test | sed -e '/[Nn][Oo][Tt]4[Qq][Aa]$/d'
adfadsfsdfasd
ewfqwewdvs
weafdfdf
ewfqwNnot4qaewdvs
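GNU sed (not POSIX or BSD sed) also accepts an `I` modifier after an address regex, which makes that match case-insensitive and avoids spelling out every letter as `[Nn][Oo]...`. A minimal sketch, assuming GNU sed and made-up sample lines:

```bash
#!/usr/bin/env bash
# Hypothetical sample lines, not the original test file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' Not4qa adfadsfsdfasd ewfqwNnot4qaewdvs xyzNOT4QA > "$f"

# GNU sed only: /regex/I matches case-insensitively, then d deletes the line
starts=$(sed -e '/^not4qa/Id' "$f")   # drop lines that start with not4qa
ends=$(sed -e '/not4qa$/Id' "$f")     # drop lines that end with not4qa

printf '%s\n' "$starts" --- "$ends"
```

The line `ewfqwNnot4qaewdvs` survives both commands, because the match is neither at its start nor at its end.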
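2. Replace string a with string b
(1) sed command
Usage:
sed 's/a/b/g'   # a is the existing string (a regular expression) in the text, b is the replacement; g replaces every match on a line, not just the first
Example:
$ cat test | sed -e 's/[Nn][Oo]/Yes/g'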
Yest4qa
Yest4qa
Yest4qa
Yest4qa
YesT4qa
YesT4qa
YesT4qa
Yest4qa
adfadsfsdfasd
ewfqwewdvs
weafdfdf
ewfqwNYest4qaewdvs
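The trailing `g` matters: without it, `s` replaces only the first match on each line, and a number flag such as `2` replaces only the second match. A small sketch on one made-up line containing three case variants of "no":

```bash
#!/usr/bin/env bash
line='Not4qa no NO'   # made-up input line

first=$(printf '%s\n' "$line" | sed -e 's/[Nn][Oo]/Yes/')    # first match only
second=$(printf '%s\n' "$line" | sed -e 's/[Nn][Oo]/Yes/2')  # second match only
all=$(printf '%s\n' "$line" | sed -e 's/[Nn][Oo]/Yes/g')     # every match

printf '%s\n' "$first" "$second" "$all"
# Yest4qa no NO
# Not4qa Yes NO
# Yest4qa Yes Yes
```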
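(2) Bash parameter expansion ${...}
Usage:
${string/substring/replacement}    # replace the first match of $substring with $replacement
${string//substring/replacement}   # replace every match of $substring with $replacement
${string/#substring/replacement}   # if $string starts with $substring, replace that prefix with $replacement
${string/%substring/replacement}   # if $string ends with $substring, replace that suffix with $replacement
Note: substring is a shell glob pattern (for example * or [Nn]), not a regular expression.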
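Example (the expansions are quoted so that the *** in a result is not treated as a filename pattern by the shell):
$ string=STringstriNG
$ echo "${string//ing/***}"
STr***striNG
$ echo "${string/#ST/***}"
***ringstriNG
$ echo "${string/%NG/***}"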
STringstri***
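The example above skips the single-slash form, which replaces only the first match. A bash-only sketch (POSIX `sh` lacks these forms) that runs all four forms on the same `string`, using a bracket glob `[Nn][Gg]` so the case-insensitive match hits twice:

```bash
#!/usr/bin/env bash
string=STringstriNG

first="${string/[Nn][Gg]/_}"    # first match only:         STri_striNG
every="${string//[Nn][Gg]/_}"   # every match:              STri_stri_
prefix="${string/#ST/_}"        # only if at the start:     _ringstriNG
suffix="${string/%NG/_}"        # only if at the end:       STringstri_
stars="${string//ing/***}"      # quoted, so *** is literal: STr***striNG

printf '%s\n' "$first" "$every" "$prefix" "$suffix" "$stars"
```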
Now that we can substitute, does setting the replacement to an empty string delete the matched string?
$ cat test | sed -e 's/[Nn][Oo]//g' | head -3
t4qa
t4qa
t4qa
$ string=STringstriNG
$ echo "${string/#ST/}"
ringstriNG
Yes, it does.
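Shell parameter expansion also has shorter forms for deletion: omitting the replacement entirely deletes the matches (bash), and `${var#pattern}` / `${var%pattern}` strip a matching prefix or suffix (these two are POSIX as well). A small sketch using the same `string`:

```bash
#!/usr/bin/env bash
string=STringstriNG

no_ing="${string//ing}"    # no replacement given: delete every "ing" -> STrstriNG
no_prefix="${string#ST}"   # strip a leading "ST"                    -> ringstriNG
no_suffix="${string%NG}"   # strip a trailing "NG"                   -> STringstri

printf '%s\n' "$no_ing" "$no_prefix" "$no_suffix"
```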
3. Replace a specified string and write the result back to the file
Example:
$ cat -n test
1 Not4qa
2 not4qabug 1234asdfghj
3 nOt4qaBUG 1234asdfghj
4 TheLineBeforYouWant
5 TheLineYouWant TheStringYouWantToChange TheStringYouDoNotWantToChange
6 adfadsfsdfasd
7 ewfqwewdvs
8 weafdfdf
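$ sed -i "5s/TheStringYouWantToChange/TheNewString/" test
# 5s/.../.../ applies the substitution to line 5 only; -i edits the file in place (GNU sed; BSD/macOS sed needs -i '')
# Do not write -ig: GNU sed would read the g as a backup suffix and leave a copy named testg
# Alternatively, write to a temporary file and move it over the original:
$ sed -e "5s/TheStringYouWantToChange/TheNewString/" test > test_temp
$ mv test_temp test
# end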
$ cat -n test
1 Not4qa
2 not4qabug 1234asdfghj
3 nOt4qaBUG 1234asdfghj
4 TheLineBeforYouWant
5 TheLineYouWant TheNewString TheStringYouDoNotWantToChange
6 adfadsfsdfasd
7 ewfqwewdvs
8 weafdfdf
Reference: https://coolshell.cn/articles/9104.html
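A slightly safer version of the temporary-file approach uses `mktemp` for a unique name and runs `mv` only if `sed` succeeded, so a failed edit never replaces the original. A minimal sketch, assuming a made-up two-line file whose line 2 plays the role of line 5 above:

```bash
#!/usr/bin/env bash
set -eu
# Hypothetical file; line 2 stands in for line 5 of the example
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' TheLineBeforYouWant \
  'TheLineYouWant TheStringYouWantToChange TheStringYouDoNotWantToChange' > "$f"

# Write the edited copy to a unique temp file; mv runs only if sed succeeded
tmp=$(mktemp)
sed -e '2s/TheStringYouWantToChange/TheNewString/' "$f" > "$tmp" && mv "$tmp" "$f"

cat "$f"
```

One caveat: `mktemp` creates its file with mode 0600, and after `mv` the edited file keeps that mode rather than the original file's.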
4. Extract a specified string from the text
(1) grep command
Example (-o prints only the matched part of each line):
$ cat test | grep -o "[Bb][Uu][Gg] [0-9]\+"
bug 1234
BUG 1234
$ cat test | grep -o "[Bb][Uu][Gg] [0-9]\+" | grep -o "[0-9]\+"
1234
1234
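Alternatively, a single `sed` call can extract just the number with a capture group, without the second `grep`. A sketch, assuming made-up lines that mirror lines 1-3 of the sample file shown in section 3. Because the leading `.*` is greedy, a line with several "bug NNNN" matches would yield only the last one:

```bash
#!/usr/bin/env bash
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' Not4qa 'not4qabug 1234asdfghj' 'nOt4qaBUG 1234asdfghj' > "$f"

# -n turns off automatic printing; the p flag prints a line only if s/// matched.
# \([0-9][0-9]*\) captures the digits after "bug " and \1 keeps only them.
nums=$(sed -n 's/.*[Bb][Uu][Gg] \([0-9][0-9]*\).*/\1/p' "$f")
echo "$nums"
```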
(2) cut command
Usage:
cut -d delimiter -f field   # -d sets the field delimiter, -f selects which field(s) to extract
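As an illustration, `cut` can replace the second `grep` in the previous example: split each "bug 1234" match on the space and keep field 2. The input lines are made up to mirror the sample file, and the colon-separated line is a hypothetical `/etc/passwd`-style record:

```bash
#!/usr/bin/env bash
# grep -o yields "bug 1234" and "BUG 1234"; cut splits each on a space
# (-d ' ') and keeps field 2 (-f 2), i.e. the number.
out=$(printf '%s\n' 'not4qabug 1234asdfghj' 'nOt4qaBUG 1234asdfghj' \
  | grep -o '[Bb][Uu][Gg] [0-9][0-9]*' \
  | cut -d ' ' -f 2)
echo "$out"

# -f also takes a list; the selected fields are re-joined with the delimiter
fields=$(echo 'root:x:0:0' | cut -d : -f 1,3)
echo "$fields"   # root:0
```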
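5. Extract the line that follows a line containing a specified string
Example (-A 1 prints each matching line plus the line after it; grep -v then drops the matching line itself):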
$ grep -A 1 'TheLineBeforYouWant' test
TheLineBeforYouWant
TheLineYouWant
$ grep -A 1 'TheLineBeforYouWant' test | grep -v 'TheLineBeforYouWant'
TheLineYouWant
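When the pattern matches more than once, GNU `grep -A 1` also prints `--` between non-adjacent groups, and the `grep -v` filter does not remove those separators. A single `sed` command avoids this: on a matching line, `n` loads the next line and `p` prints it. A sketch with a made-up file:

```bash
#!/usr/bin/env bash
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' TheLineBeforYouWant TheLineYouWant adfadsfsdfasd > "$f"

# -n: print nothing by default; on a matching line, n replaces the pattern
# space with the next input line and p prints it
next=$(sed -n '/TheLineBeforYouWant/{n;p;}' "$f")
echo "$next"
```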
6. Check a file's encoding and convert it
Check:
$ file test
test: UTF-8 Unicode text
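Convert:
# -c: When this option is given, characters that cannot be converted are silently discarded, instead of leading to a conversion error.
# //TRANSLIT (glibc): characters missing from the target charset are approximated by similar-looking ones where possible.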
$ iconv -c -f utf8 -t gb2312//TRANSLIT test > test_gb2312
$ file test_gb2312
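test_gb2312: ISO-8859 text
The file command only guesses the encoding from byte values and does not recognize GB2312, so it reports the converted file as ISO-8859 text.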
Reference: https://stackoverflow.com/questions/7688464/iconv-unicode-unknown-input-format
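Because `file` can only guess, one way to check a conversion is to convert back and compare with the original. A sketch, assuming glibc `iconv` with the `GB2312` charset available and a made-up two-character sample:

```bash
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '中文\n' > "$dir/utf8.txt"                         # made-up sample
iconv -f UTF-8 -t GB2312 "$dir/utf8.txt" > "$dir/gb.txt"  # UTF-8 -> GB2312
iconv -f GB2312 -t UTF-8 "$dir/gb.txt" > "$dir/back.txt"  # and back again

# Each of these two characters takes 3 bytes in UTF-8 but 2 in GB2312
utf8_bytes=$(( $(wc -c < "$dir/utf8.txt") ))   # 7, including the newline
gb_bytes=$(( $(wc -c < "$dir/gb.txt") ))       # 5

echo "UTF-8: $utf8_bytes bytes, GB2312: $gb_bytes bytes"
if cmp -s "$dir/utf8.txt" "$dir/back.txt"; then echo "round trip OK"; fi
```

If the round trip is not byte-identical, characters were lost or changed, for example by `-c` or `//TRANSLIT`.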