001:
2022.1.6 13:30
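After getting used to Linux, switching to Windows brings all sorts of unexpected problems. Yesterday, for example, read_csv raised an error while reading a file, and the real cause turned out to be how the path was written.
The error output was long: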
---------------------------------------------------------------------------------------
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
E:\Temp/ipykernel_2916/1776550767.py in <module>
      1 import pandas as pd
      2 # Read the practice data; file path is './工作/test_data.csv', encoding is 'utf-8'
----> 3 test_data=pd.read_csv('E:\Downloads\课程素材\工作\test_data.csv',encoding='utf-8')
      4 # View test_data
      5 test_data

d:\program files\python37\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309 stacklevel=stacklevel,
    310 )
--> 311 return func(*args, **kwargs)
    312
    313 return wrapper

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584 kwds.update(kwds_defaults)
    585
--> 586 return _read(filepath_or_buffer, kwds)
    587
    588

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    480
    481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
    483
    484 if chunksize or iterator:

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
    809 self.options["has_index_names"] = kwds["has_index_names"]
    810
--> 811 self._engine = self._make_engine(self.engine)
    812
    813 def close(self):

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
   1038 )
   1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041
   1042 def _failover_to_python(self):

d:\program files\python37\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
     49
     50 # open handles
---> 51 self._open_handles(src, kwds)
     52 assert self.handles is not None
     53

d:\program files\python37\lib\site-packages\pandas\io\parsers\base_parser.py in _open_handles(self, src, kwds)
    227 memory_map=kwds.get("memory_map", False),
    228 storage_options=kwds.get("storage_options", None),
--> 229 errors=kwds.get("encoding_errors", "strict"),
    230 )
    231

d:\program files\python37\lib\site-packages\pandas\io\common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    705 encoding=ioargs.encoding,
    706 errors=errors,
--> 707 newline="",
    708 )
    709 else:

OSError: [Errno 22] Invalid argument: 'E:\\Downloads\\课程素材\\工作\test_data.csv'
-----------------------------------------------------------------------------------------------
The part that matters most is the last line:
OSError: [Errno 22] Invalid argument: 'E:\\Downloads\\课程素材\\工作\test_data.csv'
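In that message, the folder separators I had copied from Windows appear as \\, while the separator I typed myself in front of the file name appears as a single \. That hinted at a path format problem: the message shows the string in repr form, where \\ stands for one real backslash and \t stands for a tab character, so the \t at the start of \test_data.csv had been read as an escape sequence.
Following an article I found online (quoted below), I tried several ways of writing the path: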
1) r'E:\Downloads\课程素材\工作\test_data.csv' -----OK
2) 'E:\\Downloads\\课程素材\\工作\\test_data.csv' -----OK
3) r'E:\\Downloads\\课程素材\\工作\\test_data.csv' -----OK
4) r'E:\\Downloads\\课程素材\\工作\test_data.csv' -----OK
5) 'E:/Downloads/课程素材/工作/test_data.csv' -----OK
6) r'E:/Downloads/课程素材/工作/test_data.csv' -----OK
7) r'E:\Downloads\\课程素材\/工作/test_data.csv' -----OK
8) 'E:\Downloads\\课程素材\/工作/test_data.csv' -----OK
9) 'E:\Downloads\\课程素材\/工作\/test_data.csv' -----OK
10) 'E:\Downloads\\课程素材\/工作/\test_data.csv' -----fails (no r prefix, so the \t before the file name becomes a tab)
11) r'E:\Downloads\\课程素材\/工作/\test_data.csv' -----OK
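The pattern in the list above can be checked without touching the file system. The following is a minimal sketch, not the exact test I ran: the folder names are shortened to ASCII placeholders, and has_tab is a hypothetical helper introduced only for illustration. It shows that the raw and the escaped spellings are the same string, and that only the non-raw literal with \t in front of the file name ends up containing a tab.

```python
# Each literal mirrors one style from the list; folder names are
# ASCII placeholders (an assumption made for illustration).
raw = r'E:\data\work\test_data.csv'        # raw string: backslashes kept as typed
escaped = 'E:\\data\\work\\test_data.csv'  # each \\ is one real backslash
forward = 'E:/data/work/test_data.csv'     # forward slashes need no escaping
broken = 'E:/data/work/\test_data.csv'     # no r prefix: \t becomes a TAB character


def has_tab(path):
    """Return True if the path string contains a tab character."""
    return '\t' in path


print(raw == escaped)    # True: same characters, two spellings
print(has_tab(raw))      # False
print(has_tab(forward))  # False
print(has_tab(broken))   # True
print(repr(broken))      # 'E:/data/work/\test_data.csv' (repr shows the tab as \t)
```

The last line reproduces the form seen in the OSError message: a \t in repr output is a tab inside the string, not a backslash followed by the letter t.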
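Reference:
The root of the problem: Windows paths use \, but inside a string \ is the escape character, so Python offers two ways to write such a path:
'd:\\a.txt', escaping the backslash
r'd:\a.txt', a raw string, which declares that no escapes are processed
This lets Python use \ in paths on Windows. It is somewhat cumbersome, though, so a few cases are explained below:
Issue 1: In Python, an absolute path can often be pasted directly from Windows,
for example:
C:\Users\Administrator\Desktop\python\source.txt  This path is fine (note: in Python 3 this holds only for a raw string or a / path; in a normal string literal, \U starts a Unicode escape, so 'C:\Users\...' is a SyntaxError)
However, if the absolute path is correct and the call still fails, the file name is the culprit, for example:
C:\Users\Administrator\Desktop\python\t1.txt  This path is certain to fail, because \t is treated as an escape
Python reads it as C:\Users\Administrator\Desktop\python 1.txt (a tab character in place of \t), which is bound to fail
If you switch to the form below, the error goes away (this "/" style is recommended and avoids many such errors)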
C:/Users/Administrator/Desktop/python/t1.txt
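----------------
Copyright notice: this is an original article by CSDN blogger 「讲测试的古古奇老师」, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/jusulysunbeamy/article/details/51290080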
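As a sketch of the "/" recommendation in the quoted article, the standard-library pathlib module can build a Windows path from forward-slash pieces, so no backslash ever appears in a string literal. PureWindowsPath only manipulates text, so this runs on any operating system. The helper find_control_chars is a hypothetical name introduced here to show one way of spotting an accidentally escaped path before it is handed to read_csv.

```python
from pathlib import PureWindowsPath

# Build the path from forward-slash pieces; pathlib renders the backslashes.
path = PureWindowsPath('C:/Users/Administrator/Desktop/python') / 't1.txt'


def find_control_chars(path_text):
    """Return the control characters (tab, newline, ...) found in a path string."""
    return [ch for ch in path_text if ord(ch) < 32]


print(str(path))                               # C:\Users\Administrator\Desktop\python\t1.txt
print(find_control_chars(str(path)))           # []
print(find_control_chars('C:/python\t1.txt'))  # ['\t']
```

A non-empty result from find_control_chars is a strong sign that an escape sequence such as \t or \n was interpreted inside the literal.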
002:
2022.1.6 15:11
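With the path format problem solved, the sample CSV downloaded from the web opened fine, but the CSV I had exported from an Excel spreadsheet still raised an error.
Error message: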
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
E:\Temp/ipykernel_2916/486158148.py in <module>
      1 import pandas as pd
      2 # Read the practice data; file path is './工作/test_data.csv', encoding is 'utf-8'
----> 3 test_data=pd.read_csv(r'I:\test_data.csv')
      4 # View test_data
      5 test_data

d:\program files\python37\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309 stacklevel=stacklevel,
    310 )
--> 311 return func(*args, **kwargs)
    312
    313 return wrapper

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584 kwds.update(kwds_defaults)
    585
--> 586 return _read(filepath_or_buffer, kwds)
    587
    588

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    480
    481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
    483
    484 if chunksize or iterator:

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
    809 self.options["has_index_names"] = kwds["has_index_names"]
    810
--> 811 self._engine = self._make_engine(self.engine)
    812
    813 def close(self):

d:\program files\python37\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
   1038 )
   1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041
   1042 def _failover_to_python(self):

d:\program files\python37\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
     67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
     68 try:
---> 69 self._reader = parsers.TextReader(self.handles.handle, **kwds)
     70 except Exception:
     71 self.handles.close()

d:\program files\python37\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

d:\program files\python37\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

d:\program files\python37\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

d:\program files\python37\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 566: invalid start byte
The last line suggests an encoding problem.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 566: invalid start byte
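Could it be that a CSV saved from Excel is not UTF-8 by default? A quick Google search confirmed it: the default is ANSI, meaning the system's Windows code page (typically GBK on Simplified Chinese Windows). ( https://zhidao.baidu.com/question/2014606813258805588.html )
Adding encoding='ANSI' to the read_csv call above let the CSV open correctly. Python maps 'ANSI' to the Windows-only mbcs codec, so this spelling works only on Windows; elsewhere the code page has to be named explicitly, for example encoding='gbk'.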
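When it is unclear which encoding a CSV uses, one option is to try a short list of candidates in order and keep the first that decodes the whole file. The following is a minimal sketch under stated assumptions: detect_encoding is a hypothetical helper, the candidate list ('utf-8-sig', then 'gbk') is my guess for files produced on Simplified Chinese Windows, and the sample content is made up. UTF-8 is tried first because GBK can decode many byte sequences without raising an error even when the result is wrong.

```python
import os
import tempfile


def detect_encoding(path, candidates=('utf-8-sig', 'gbk')):
    """Return the first candidate encoding that decodes the whole file, else None."""
    with open(path, 'rb') as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Simulate one CSV saved in a legacy Chinese code page and one saved as UTF-8.
sample = '姓名,成绩\n张三,90\n'
tmp_dir = tempfile.mkdtemp()
gbk_file = os.path.join(tmp_dir, 'gbk.csv')
utf8_file = os.path.join(tmp_dir, 'utf8.csv')
with open(gbk_file, 'wb') as f:
    f.write(sample.encode('gbk'))
with open(utf8_file, 'wb') as f:
    f.write(sample.encode('utf-8'))

print(detect_encoding(gbk_file))   # gbk
print(detect_encoding(utf8_file))  # utf-8-sig
```

The returned name could then be passed on as pd.read_csv(path, encoding=...). Unlike 'ANSI', an explicit name such as 'gbk' behaves the same on Linux and Windows.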