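Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, an object writes its current state to temporary or persistent storage. The object can later be recreated by reading its state back from storage, that is, by deserializing it.
In scrapy_redis, a Request object is first deduplicated by the DupeFilter and then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object, and Redis cannot store such an object directly, so the request has to be serialized before it is stored.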
The serialization module in scrapy_redis is as follows:
from scrapy_redis import picklecompat

"""A pickle wrapper module with protocol=-1 by default."""

try:
    import cPickle as pickle  # PY2
except ImportError:
    import pickle


def loads(s):
    # Deserialize a pickled byte string back into a Python object.
    return pickle.loads(s)


def dumps(obj):
    # protocol=-1 selects the highest protocol version available.
    return pickle.dumps(obj, protocol=-1)
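Python 3 uses the pickle module directly, since cPickle no longer exists there. The two most important functions in this module, for serialization and deserialization, are the ones shown above. A serialized object can be stored in a database, a file, and so on, and restored quickly later.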
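To make the round trip concrete, here is a minimal sketch that mirrors the two wrapper functions above. The `request_data` dictionary and the `fake_queue` list standing in for a Redis list are hypothetical illustrations, not the actual data structures used by scrapy_redis.

```python
import pickle


def dumps(obj):
    # Same idea as picklecompat.dumps: use the highest available protocol.
    return pickle.dumps(obj, protocol=-1)


def loads(data):
    return pickle.loads(data)


# Hypothetical request data; a real scheduler would derive this from a Request.
request_data = {
    "url": "https://example.com/page/1",
    "method": "GET",
    "meta": {"depth": 2},
}

# A plain list stands in for a Redis list here (assumption for illustration).
fake_queue = []
fake_queue.append(dumps(request_data))  # "push": only bytes are stored

stored = fake_queue.pop()               # "pop": the same bytes come back out
restored = loads(stored)

print(type(stored).__name__)
print(restored == request_data)
```

The point of the sketch is that only a byte string ever reaches the store, and the original structure is rebuilt on the way out.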
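The Memento pattern from design patterns also works particularly well when implemented this way (see "Python Design Patterns (19): The Memento Pattern"). The following objects and data types can be pickled: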
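- None, True, and False
- Integers, long integers, floating-point numbers, and complex numbers
- Normal strings and Unicode strings
- Tuples, lists, sets, and dictionaries containing only picklable objects
- Functions defined at the top level of a module
- Built-in functions defined at the top level of a module
- Classes defined at the top level of a module
- Instances of such classes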
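A quick way to see the list above in action is to round-trip one value of each kind. This is a sketch for Python 3 with arbitrary sample values; the functions and the class come from the standard library so that they are importable by name.

```python
import fractions
import math
import pickle
import textwrap

samples = [
    None, True, False,
    42, 3.14, 2 + 3j,
    "text", b"raw bytes",
    (1, 2), [1, 2], {1, 2}, {"key": [1, 2]},
    fractions.Fraction(1, 3),  # instance of a class defined at module top level
]

# Every sample survives a dumps/loads round trip unchanged.
round_trip_ok = all(pickle.loads(pickle.dumps(v)) == v for v in samples)

# Functions and classes are pickled by reference (module plus name), not by
# value, so unpickling returns the very same object.
same_function = pickle.loads(pickle.dumps(textwrap.dedent)) is textwrap.dedent
same_builtin = pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt
same_class = pickle.loads(pickle.dumps(fractions.Fraction)) is fractions.Fraction

print(round_trip_ok, same_function, same_builtin, same_class)
```

Because functions and classes are stored by name, the module that defines them must be importable wherever the data is unpickled.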
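Attempting to pickle an unpicklable object raises a PicklingError exception; when this happens, an unspecified number of bytes may already have been written to the underlying file. Attempting to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised (in Python 3.5 and later, its subclass RecursionError).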
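The failure mode can be probed with a small helper. This is only a sketch: `is_picklable` is a hypothetical helper name, and it catches `AttributeError` and `TypeError` in addition to `PicklingError` because, in Python 3, some unpicklable objects raise those instead.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, else False."""
    try:
        pickle.dumps(obj, protocol=-1)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


plain_data = is_picklable({"url": "https://example.com", "retries": 3})
anonymous_function = is_picklable(lambda x: x + 1)  # no importable name
generator = is_picklable(n for n in range(3))       # generators cannot be pickled

print(plain_data, anonymous_function, generator)
```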
Module API
The descriptions below follow the Python 2 documentation. In Python 3, pickled data is a bytes object rather than a string, and the default when protocol is omitted is a newer binary protocol rather than protocol 0.
- pickle.dump(obj, file[, protocol])
  Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: Introduced the protocol parameter.
  file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.
- pickle.load(file)
  Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().
  file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.
  This function automatically determines whether the data stream was written in binary mode or not.
- pickle.dumps(obj[, protocol])
  Return the pickled representation of the object as a string, instead of writing it to a file.
  If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
  Changed in version 2.3: The protocol parameter was added.
- pickle.loads(string)
  Read a pickled object hierarchy from a string. Characters in the string past the pickled object's representation are ignored.
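The four functions can be exercised together as follows. This sketch targets Python 3, where pickled data is `bytes` and an `io.BytesIO` buffer plays the role of the file object; the sample `state` dictionary is made up for illustration.

```python
import io
import pickle

state = {"seen": {"a", "b"}, "pending": [1, 2, 3]}

# dump/load work on a binary file-like object.
buffer = io.BytesIO()
pickle.dump(state, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)  # rewind before reading
from_file = pickle.load(buffer)

# dumps/loads work on an in-memory bytes object.
data = pickle.dumps(state, protocol=-1)
from_bytes = pickle.loads(data)

# A pickle written with protocol 2 or later starts with the PROTO opcode
# (0x80) followed by one byte holding the protocol number, so the effect of
# protocol=-1 can be read straight from the data.
protocol_used = data[1]

# Bytes after the end of the pickled object are ignored by loads.
with_trailing = pickle.loads(data + b"trailing bytes")

print(from_file == state, from_bytes == state)
print(protocol_used == pickle.HIGHEST_PROTOCOL)
print(with_trailing == state)
```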
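As for use cases, the most common ones are:
restoring the previous state when a program restarts, session storage, and transferring objects over the network.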
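As one illustration of the first use case, a program can checkpoint its state to disk and pick it up again on the next start. The file name, the `save_state` and `load_state` helpers, and the shape of the state are all hypothetical.

```python
import os
import pickle
import tempfile


def save_state(path, state):
    """Write state to path; pickle files must be opened in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "crawler_state.pkl")  # hypothetical name

    # First run: nothing saved yet, so start from the default state.
    state = load_state(checkpoint, default={"visited": set(), "page": 0})
    state["visited"].add("https://example.com/")
    state["page"] += 1
    save_state(checkpoint, state)

    # "Restart": a fresh load sees the state from the previous run.
    resumed = load_state(checkpoint, default={"visited": set(), "page": 0})

print(resumed)
```

For session storage and network transfer the same calls apply, but unpickling can execute code chosen by the data, so only load pickles that come from a trusted source.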