Help in Python
- Help on built-in function encode:
encode(...) method of builtins.str instance
S.encode(encoding='utf-8', errors='strict') -> bytes
Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.
- Help on built-in function decode:
decode(encoding='utf-8', errors='strict') method of builtins.bytes instance
Decode the bytes using the codec registered for encoding.
encoding
The encoding with which to decode the bytes.
errors
The error handling scheme to use for the handling of decoding errors.
The default is 'strict' meaning that decoding errors raise a
UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that
can handle UnicodeDecodeErrors.
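The error handlers named in the help text above can be compared side by side. This is a minimal sketch; the ASCII codec and the sample strings are chosen only for illustration and do not come from the notes.

```python
text = 'caf\u00e9'  # 'café'; the last character cannot be represented in ASCII

# 'strict' (the default) raises UnicodeEncodeError
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode('ascii', 'ignore')             # drops the offending character
replaced = text.encode('ascii', 'replace')           # substitutes '?'
xml_ref = text.encode('ascii', 'xmlcharrefreplace')  # numeric character reference

# Decoding accepts 'ignore' and 'replace' as well
bad_bytes = b'abc\xff'  # 0xff is never valid in UTF-8
decoded_ignore = bad_bytes.decode('utf-8', 'ignore')
decoded_replace = bad_bytes.decode('utf-8', 'replace')  # inserts U+FFFD

print(strict_failed, ignored, replaced, xml_ref)
print(repr(decoded_ignore), repr(decoded_replace))
```

Note that `xmlcharrefreplace` only applies to encoding, which matches the two help texts: it is listed for `encode` but not for `decode`.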
Code
# Encode the string '汉字' as utf-8, i.e. convert the characters into the byte sequence b'\xe6\xb1\x89\xe5\xad\x97'
'汉字'.encode('utf-8')
# Decode the bytes back into the corresponding string object
b'\xe6\xb1\x89\xe5\xad\x97'.decode('utf-8')
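To make the round trip concrete, the sketch below encodes the same string with two codecs and shows that the bytes must be decoded with the codec that produced them. The variable names are illustrative only.

```python
text = '汉字'

utf8_bytes = text.encode('utf-8')  # 3 bytes per character for these two characters
gbk_bytes = text.encode('gbk')     # 2 bytes per character

print(len(text), len(utf8_bytes), len(gbk_bytes))

# Decoding with the matching codec restores the original string
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text

# Decoding gbk bytes as utf-8 fails, because they are not valid UTF-8
try:
    gbk_bytes.decode('utf-8')
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(mismatch_failed)
```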
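Encoding issues on Windows
- Windows (on Chinese-locale systems) uses gbk as its default encoding, so text that came from another encoding may fail to decode or display.
In that case, encode the string to gbk first, then decode it for display.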
# Convert text from another encoding into text that Windows can display
# text (utf-8) --encode(gbk)--> bytes --decode(gbk)--> text (gbk)
'其它编码文本'.encode('gbk').decode('gbk')
- If the source text contains special characters and an error like "gbk codec can't encode xxxxx" appears, pass 'ignore' as the errors argument
'其它编码文本'.encode('gbk', 'ignore').decode('gbk')
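As a small illustration of the 'ignore' workaround above, the sketch below uses an emoji as a stand-in for a "special character" that gbk cannot represent. The sample string is an assumption made for the example; 'replace' is shown alongside 'ignore' because it leaves a visible '?' where a character was lost.

```python
text = '汉字\U0001F600'  # the emoji is outside the gbk character set

# Without an error handler, encoding raises UnicodeEncodeError
error_message = None
try:
    text.encode('gbk')
except UnicodeEncodeError as exc:
    error_message = str(exc)

# 'ignore' silently drops characters that gbk cannot represent
cleaned = text.encode('gbk', 'ignore').decode('gbk')

# 'replace' keeps a '?' placeholder instead
marked = text.encode('gbk', 'replace').decode('gbk')

print(error_message)
print(cleaned, marked)
```

Both handlers lose information, so they suit display purposes rather than storing or forwarding the text.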
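- When reading from a file, you can read the raw bytes first and then decode them
# lago1.html is a utf-8 encoded file, but on Windows it is read as gbk by default, which raises an error
with open('lago1.html', 'r') as f:
    content = f.read()
    print(content)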
# In that case, read the file in binary mode first, then decode the bytes as utf-8
with open('lago1.html', 'rb') as f:
    content = f.read()
    print(content.decode('utf-8'))
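An alternative to reading bytes and decoding by hand is to pass `encoding='utf-8'` to the built-in `open()`, which should give the same result regardless of the platform default. The sketch below writes a temporary file so that it runs on its own; the file name `sample.html` and its contents are hypothetical stand-ins for `lago1.html`.

```python
import os
import tempfile

# Hypothetical sample file standing in for lago1.html
path = os.path.join(tempfile.mkdtemp(), 'sample.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<p>汉字</p>')

# Option 1: text mode with an explicit encoding
with open(path, 'r', encoding='utf-8') as f:
    text_mode = f.read()

# Option 2: binary mode, then decode (the approach used in the notes above)
with open(path, 'rb') as f:
    binary_mode = f.read().decode('utf-8')

print(text_mode == binary_mode)
```

Naming the encoding in `open()` keeps the decoding step next to the file access, which is usually easier to read than a separate `decode` call.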