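I. How a Python Program Is Executed
- Before running a program, the Python interpreter first compiles the .py file into bytecode.
- The compiled bytecode is then handed to the virtual machine, which executes it instruction by instruction.
- In a Windows debug build of Python 3.7, both the compiler and the virtual machine are contained in \win32\python37_d.dll.
- Compared with languages compiled to native code, an interpreted language such as Python is easier to run across platforms but generally executes more slowly.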
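The two stages described above can be observed from Python itself. The following is a minimal sketch, assuming the built-in compile() and exec() are acceptable stand-ins for what the interpreter does when it runs a file; the source string and the name `<demo>` are made up for illustration.

```python
source = "x = 1 + 2\nresult = x * 10\n"

# Stage 1: compile the source text into a code object (bytecode).
code = compile(source, "<demo>", "exec")

# Stage 2: hand the code object to the virtual machine for execution.
namespace = {}
exec(code, namespace)

print(type(code).__name__)
print(namespace["result"])
```

The code object can be kept and executed repeatedly without recompiling, which is the idea behind caching it in a .pyc file.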
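II. The Result of Compilation
- After compilation, the static information from the source file is stored in a PyCodeObject, which may in turn be saved to a .pyc file.
- From the compiler's point of view, the PyCodeObject is the actual result of compilation; the .pyc file is only its on-disk representation.
1. About PyCodeObject
- To keep names in different scopes apart, the Python compiler splits the source into several code blocks by scope and creates one PyCodeObject for each code block.
- The PyCodeObject structure is defined as follows: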
include\code.h
/* Bytecode object */
typedef struct {
PyObject_HEAD
int co_argcount; /* #arguments, except *args */
int co_kwonlyargcount; /* #keyword only arguments */
int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */
int co_flags; /* CO_..., see below */
int co_firstlineno; /* first source line number */
PyObject *co_code; /* instruction opcodes */
PyObject *co_consts; /* list (constants used) */
PyObject *co_names; /* list of strings (names used) */
PyObject *co_varnames; /* tuple of strings (local variable names) */
PyObject *co_freevars; /* tuple of strings (free variable names) */
PyObject *co_cellvars; /* tuple of strings (cell variable names) */
/* The rest aren't used in either hash or comparisons, except for co_name,
used in both. This is done to preserve the name and line number
for tracebacks and debuggers; otherwise, constant de-duplication
would collapse identical functions/lambdas defined on different lines.
*/
Py_ssize_t *co_cell2arg; /* Maps cell vars which are arguments. */
PyObject *co_filename; /* unicode (where it was loaded from) */
PyObject *co_name; /* unicode (name, for reference) */
PyObject *co_lnotab; /* string (encoding addr<->lineno mapping) See
Objects/lnotab_notes.txt for details. */
void *co_zombieframe; /* for optimization only (see frameobject.c) */
PyObject *co_weakreflist; /* to support weakrefs to code objects */
/* Scratch space for extra data relating to the code object.
Type is a void* to keep the format private in codeobject.c to force
people to go through the proper APIs. */
void *co_extra;
} PyCodeObject;
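Field | Meaning |
---|---|
co_argcount | Number of positional arguments of the code block (excluding *args). |
co_kwonlyargcount | Number of keyword-only arguments of the code block. |
co_nlocals | Number of local variables in the code block, including its arguments. |
co_stacksize | Number of evaluation-stack entries needed to execute the code block. |
co_flags | CO_* bit flags describing the code block (for example, whether it is a generator). |
co_firstlineno | Line number in the .py file where the code block starts. |
co_code | The bytecode produced by compilation, stored as a bytes object (PyBytesObject). |
co_consts | All constants used in the code block, stored as a tuple (PyTupleObject). |
co_names | All names (symbols) used in the code block, stored as a tuple (PyTupleObject). |
co_varnames | Tuple of the local variable names in the code block. |
co_freevars | Names of free variables, i.e. variables the code block references from an enclosing scope (closure). |
co_cellvars | Names of local variables that are referenced by nested functions. |
co_filename | Name of the file the code block was loaded from. |
co_name | Name of the code block, usually a function or class name. |
co_lnotab | Mapping between bytecode offsets and source line numbers. |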
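To make the "one PyCodeObject per code block" rule and a few of these fields concrete, here is a small sketch. The functions `outer` and `inner` and the helper `find_code` are hypothetical names introduced only for this example; nested code objects are located by walking co_consts.

```python
import types

source = '''
def outer(a, b=2, *, key=None):
    total = a + b
    def inner():
        return total      # "total" is captured from outer()
    return inner
'''

module_code = compile(source, "demo.py", "exec")

def find_code(code, name):
    """Return the nested code object called `name`, searching co_consts recursively."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            if const.co_name == name:
                return const
            found = find_code(const, name)
            if found is not None:
                return found
    return None

outer_code = find_code(module_code, "outer")
inner_code = find_code(module_code, "inner")

print(module_code.co_name)
print(outer_code.co_argcount, outer_code.co_kwonlyargcount)
print(outer_code.co_cellvars, inner_code.co_freevars)
```

The module, `outer`, and `inner` each get their own code object, and the variable shared through the closure shows up in co_cellvars of the outer block and co_freevars of the inner one.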
- The PyCode_Type type object:
objects\codeobject.c
PyTypeObject PyCode_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"code",
sizeof(PyCodeObject),
0,
(destructor)code_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
(reprfunc)code_repr, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
(hashfunc)code_hash, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT, /* tp_flags */
code_doc, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
code_richcompare, /* tp_richcompare */
offsetof(PyCodeObject, co_weakreflist), /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
code_methods, /* tp_methods */
code_memberlist, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
code_new, /* tp_new */
};
2. About .pyc files
- A .pyc file is a binary file used to store a PyCodeObject.
demo.pyc
420d 0d0a 0000 0000 5945 6160 3400 0000
e300 0000 0000 0000 0000 0000 0003 0000
0040 0000 0073 1a00 0000 4700 6400 6401
8400 6401 8302 5a00 6402 6403 8400 5a01
6404 5300 2905 6300 0000 0000 0000 0000
... ...
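- If a .pyc file exists, Python checks whether the source modification time and size recorded in its header match those of the corresponding .py file. If they match, the .pyc is loaded; if they do not, or if no .pyc exists yet, Python compiles the .py file, writes a new .pyc, and loads the result.
- A .pyc file does not always exist, however: it is generated under the __pycache__ folder only when the module is imported by other code.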
demo.py
class Demo:
    ...
def func():
    ...
main.py
import demo
if __name__ == '__main__':
    d = demo.Demo()
    demo.func()
- After running main.py, the __pycache__ folder contains:
xxxx/xx/xx xx:xx <DIR> .
xxxx/xx/xx xx:xx <DIR> ..
xxxx/xx/xx xx:xx 351 demo.cpython-37.pyc
1 File(s) 351 bytes
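A .pyc file produced this way can be opened and taken apart by hand. The sketch below assumes Python 3.7 or later, where the header is 16 bytes long; it uses py_compile on a throwaway demo.py in a temporary directory (both hypothetical) instead of relying on a real import.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "demo.py")
    with open(source_path, "w") as f:
        f.write("def func():\n    return 42\n")
    source_size = os.path.getsize(source_path)

    # Writes __pycache__/demo.<tag>.pyc next to the source, as an import would.
    pyc_path = py_compile.compile(
        source_path,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        data = f.read()

# 16-byte header: magic number, flags, source mtime, source size.
magic = data[:4]
flags, mtime, size = struct.unpack("<III", data[4:16])

# Everything after the header is the marshalled code object.
code = marshal.loads(data[16:])

namespace = {}
exec(code, namespace)
print(magic == importlib.util.MAGIC_NUMBER, flags, size, namespace["func"]())
```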
III. Accessing PyCodeObject from Python
- In Python, a PyCodeObject is exposed as a code object.
- The compile() function returns a code object, whose attributes can then be inspected.
>>> co = compile("s = 'Hello World!'", 'demo.py', 'exec')
>>> print(type(co))
>>> for c in dir(co):
...     print(f"{c}={getattr(co, c)}")
<class 'code'>
__class__=<class 'code'>
__delattr__=<method-wrapper '__delattr__' of code object at 0x0000020A1D9BD8A0>
__dir__=<built-in method __dir__ of code object at 0x0000020A1D9BD8A0>
__doc__=code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
constants, names, varnames, filename, name, firstlineno,
lnotab[, freevars[, cellvars]])
Create a code object. Not for the faint of heart.
__eq__=<method-wrapper '__eq__' of code object at 0x0000020A1D9BD8A0>
__format__=<built-in method __format__ of code object at 0x0000020A1D9BD8A0>
__ge__=<method-wrapper '__ge__' of code object at 0x0000020A1D9BD8A0>
__getattribute__=<method-wrapper '__getattribute__' of code object at 0x0000020A1D9BD8A0>
__gt__=<method-wrapper '__gt__' of code object at 0x0000020A1D9BD8A0>
__hash__=<method-wrapper '__hash__' of code object at 0x0000020A1D9BD8A0>
__init__=<method-wrapper '__init__' of code object at 0x0000020A1D9BD8A0>
__init_subclass__=<built-in method __init_subclass__ of type object at 0x00007FF848BFB8C0>
__le__=<method-wrapper '__le__' of code object at 0x0000020A1D9BD8A0>
__lt__=<method-wrapper '__lt__' of code object at 0x0000020A1D9BD8A0>
__ne__=<method-wrapper '__ne__' of code object at 0x0000020A1D9BD8A0>
__new__=<built-in method __new__ of type object at 0x00007FF848BFB8C0>
__reduce__=<built-in method __reduce__ of code object at 0x0000020A1D9BD8A0>
__reduce_ex__=<built-in method __reduce_ex__ of code object at 0x0000020A1D9BD8A0>
__repr__=<method-wrapper '__repr__' of code object at 0x0000020A1D9BD8A0>
__setattr__=<method-wrapper '__setattr__' of code object at 0x0000020A1D9BD8A0>
__sizeof__=<built-in method __sizeof__ of code object at 0x0000020A1D9BD8A0>
__str__=<method-wrapper '__str__' of code object at 0x0000020A1D9BD8A0>
__subclasshook__=<built-in method __subclasshook__ of type object at 0x00007FF848BFB8C0>
co_argcount=0
co_cellvars=()
co_code=b'd\x00Z\x00d\x01S\x00'
co_consts=('Hello World!', None)
co_filename=demo.py
co_firstlineno=1
co_flags=64
co_freevars=()
co_kwonlyargcount=0
co_lnotab=b''
co_name=<module>
co_names=('s',)
co_nlocals=0
co_stacksize=1
co_varnames=()
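The raw co_code bytes above are easier to read once disassembled. As a brief sketch using the standard dis module on the same one-line source (exact opcode names vary between Python versions):

```python
import dis

co = compile("s = 'Hello World!'", "demo.py", "exec")

# Each instruction's argument is an index into co_consts or co_names;
# argrepr shows the value that index resolves to.
opnames = []
for ins in dis.get_instructions(co):
    opnames.append(ins.opname)
    print(ins.offset, ins.opname, ins.argrepr)
```

The STORE_NAME instruction refers to entry 0 of co_names (`s`), and the constant it stores comes from co_consts.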
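IV. How .pyc Files Are Generated
1. The flow that produces the .pyc file and the PyCodeObject
- The flow is traced here starting from load_module() in the legacy imp module, which is called first.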
lib\imp.py
def load_module(name, file, filename, details):
    suffix, mode, type_ = details
    if mode and (not mode.startswith(('r', 'U')) or '+' in mode):
        raise ValueError('invalid file open mode {!r}'.format(mode))
    elif file is None and type_ in {PY_SOURCE, PY_COMPILED}:
        msg = 'file object required for import (type code {})'.format(type_)
        raise ValueError(msg)
    elif type_ == PY_SOURCE:
        return load_source(name, filename, file)
    elif type_ == PY_COMPILED:
        return load_compiled(name, filename, file)
    elif type_ == C_EXTENSION and load_dynamic is not None:
        if file is None:
            with open(filename, 'rb') as opened_file:
                return load_dynamic(name, filename, opened_file)
        else:
            return load_dynamic(name, filename, file)
    elif type_ == PKG_DIRECTORY:
        return load_package(name, filename)
    elif type_ == C_BUILTIN:
        return init_builtin(name)
    elif type_ == PY_FROZEN:
        return init_frozen(name)
    else:
        msg = "Don't know how to import {} (type code {})".format(name, type_)
        raise ImportError(msg, name=name)
- Based on the module type, it then dispatches to load_source() for a source file.
lib\imp.py
def load_source(name, pathname, file=None):
    loader = _LoadSourceCompatibility(name, pathname, file)
    spec = util.spec_from_file_location(name, pathname, loader=loader)
    if name in sys.modules:
        module = _exec(spec, sys.modules[name])
    else:
        module = _load(spec)
    # To allow reloading to potentially work, use a non-hacked loader which
    # won't rely on a now-closed file object.
    module.__loader__ = machinery.SourceFileLoader(name, pathname)
    module.__spec__.loader = module.__loader__
    return module
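The same spec-then-load sequence is available through the public importlib.util API. The following is only a sketch of that equivalent path, not the code imp itself runs; the module name `demo_mod` and its contents are hypothetical.

```python
import importlib.machinery
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module file created just for this demonstration.
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("def func():\n    return 42\n")

    # Build a spec for the file, create the module, then execute it.
    spec = importlib.util.spec_from_file_location("demo_mod", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    result = module.func()
    loader_type = type(spec.loader)

print(loader_type.__name__, result)
```

For a .py path, the spec's loader is a SourceFileLoader, the same class load_source() assigns to module.__loader__.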
- load_source() then calls _load().
lib\importlib\_bootstrap.py
def _load(spec):
    """Return a new module object, loaded by the spec's loader.
    The module is not added to its parent.
    If a module is already in sys.modules, that existing module gets
    clobbered.
    """
    with _ModuleLockManager(spec.name):
        return _load_unlocked(spec)
- At runtime this actually runs _frozen_importlib, the frozen copy of this module whose bytecode is embedded in the interpreter.
Python\importlib.h
/* Auto-generated by Programs/_freeze_importlib.c */
const unsigned char _Py_M__importlib[] = {
99,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,
0,64,0,0,0,115,208,1,0,0,100,0,90,0,100,1,
97,1,100,2,100,3,132,0,90,2,100,4,100,5,132,0,
... ...
- _load() then calls _load_unlocked().
lib\importlib\_bootstrap.py
def _load_unlocked(spec):
    # A helper for direct use by the import system.
    if spec.loader is not None:
        # not a namespace package
        if not hasattr(spec.loader, 'exec_module'):
            return _load_backward_compatible(spec)
    module = module_from_spec(spec)
    with _installed_safely(module):
        if spec.loader is None:
            if spec.submodule_search_locations is None:
                raise ImportError('missing loader', name=spec.name)
            # A namespace package so do nothing.
        else:
            spec.loader.exec_module(module)
    # We don't ensure that the import-related module attributes get
    # set in the sys.modules replacement case. Such modules are on
    # their own.
    return sys.modules[spec.name]
- Next, the exec_module() method of the _LoaderBasics class is called.
lib\importlib\_bootstrap_external.py
def exec_module(self, module):
    """Execute the module."""
    code = self.get_code(module.__name__)
    if code is None:
        raise ImportError('cannot load module {!r} when get_code() '
                          'returns None'.format(module.__name__))
    _bootstrap._call_with_frames_removed(exec, code, module.__dict__)
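The last line of exec_module() boils down to running a code object with the module's `__dict__` as its namespace. A minimal sketch of that step, where compile() stands in for whatever get_code() would return and the module contents are invented for the example:

```python
import types

source = "GREETING = 'hello'\ndef func():\n    return GREETING.upper()\n"

# Hypothetical stand-in for the code object returned by loader.get_code().
code = compile(source, "demo.py", "exec")

# An empty module object; executing the code fills in its attributes.
module = types.ModuleType("demo")
exec(code, module.__dict__)

print(module.GREETING, module.func())
```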
- Finally, the get_code() method of the SourceLoader class (a subclass of _LoaderBasics) is called; it produces the PyCodeObject and the .pyc bytecode file.
lib\importlib\_bootstrap_external.py
def get_code(self, fullname):
    """Concrete implementation of InspectLoader.get_code.
    Reading of bytecode requires path_stats to be implemented. To write
    bytecode, set_data must also be implemented.
    """
    source_path = self.get_filename(fullname)
    source_mtime = None
    source_bytes = None
    source_hash = None
    hash_based = False
    check_source = True
    try:
        bytecode_path = cache_from_source(source_path)
    except NotImplementedError:
        bytecode_path = None
    else:
        try:
            st = self.path_stats(source_path)
        except OSError:
            pass
        else:
            source_mtime = int(st['mtime'])
            try:
                data = self.get_data(bytecode_path)
            except OSError:
                pass
            else:
                exc_details = {
                    'name': fullname,
                    'path': bytecode_path,
                }
                try:
                    flags = _classify_pyc(data, fullname, exc_details)
                    bytes_data = memoryview(data)[16:]
                    hash_based = flags & 0b1 != 0
                    if hash_based:
                        check_source = flags & 0b10 != 0
                        if (_imp.check_hash_based_pycs != 'never' and
                            (check_source or
                             _imp.check_hash_based_pycs == 'always')):
                            source_bytes = self.get_data(source_path)
                            source_hash = _imp.source_hash(
                                _RAW_MAGIC_NUMBER,
                                source_bytes,
                            )
                            _validate_hash_pyc(data, source_hash, fullname,
                                               exc_details)
                    else:
                        _validate_timestamp_pyc(
                            data,
                            source_mtime,
                            st['size'],
                            fullname,
                            exc_details,
                        )
                except (ImportError, EOFError):
                    pass
                else:
                    _bootstrap._verbose_message('{} matches {}', bytecode_path,
                                                source_path)
                    return _compile_bytecode(bytes_data, name=fullname,
                                             bytecode_path=bytecode_path,
                                             source_path=source_path)
    if source_bytes is None:
        source_bytes = self.get_data(source_path)
    code_object = self.source_to_code(source_bytes, source_path)
    _bootstrap._verbose_message('code object from {}', source_path)
    if (not sys.dont_write_bytecode and bytecode_path is not None and
            source_mtime is not None):
        if hash_based:
            if source_hash is None:
                source_hash = _imp.source_hash(source_bytes)
            data = _code_to_hash_pyc(code_object, source_hash, check_source)
        else:
            data = _code_to_timestamp_pyc(code_object, source_mtime,
                                          len(source_bytes))
        try:
            self._cache_bytecode(source_path, bytecode_path, data)
            _bootstrap._verbose_message('wrote {!r}', bytecode_path)
        except NotImplementedError:
            pass
    return code_object
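The timestamp branch of get_code() can be condensed into a small predicate. This is a simplified sketch, not the library's implementation: `pyc_matches_source` is a hypothetical helper, it assumes the 16-byte header layout of Python 3.7+, and it treats hash-based .pyc files as "not handled" instead of validating the hash.

```python
import importlib.util
import struct

def pyc_matches_source(pyc_data, source_mtime, source_size):
    """Return True if a timestamp-based pyc header matches the source stats."""
    if len(pyc_data) < 16 or pyc_data[:4] != importlib.util.MAGIC_NUMBER:
        return False            # truncated, or written by another Python version
    flags, mtime, size = struct.unpack("<III", pyc_data[4:16])
    if flags & 0b1:
        return False            # hash-based pyc: outside the scope of this sketch
    # Both values are stored as unsigned 32-bit integers.
    return (mtime == (int(source_mtime) & 0xFFFFFFFF)
            and size == (source_size & 0xFFFFFFFF))

# Fabricated header (no code payload), only to exercise the check.
fake_pyc = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 1000, 52)
print(pyc_matches_source(fake_pyc, 1000, 52))
print(pyc_matches_source(fake_pyc, 1001, 52))
```

When the check fails, get_code() falls through to recompiling the source and rewriting the cache, which is the second half of the function above.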
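2. Source code that generates the PyCodeObject
- The source_to_code() call in get_code() uses the built-in compile(), so the PyCodeObject is ultimately produced by its C implementation, builtin_compile_impl.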
Python\bltinmodule.c
static PyObject *
builtin_compile_impl(PyObject *module, PyObject *source, PyObject *filename,
const char *mode, int flags, int dont_inherit,
int optimize)
/*[clinic end generated code: output=1fa176e33452bb63 input=0ff726f595eb9fcd]*/
{
PyObject *source_copy;
const char *str;
int compile_mode = -1;
int is_ast;
PyCompilerFlags cf;
int start[] = {Py_file_input, Py_eval_input, Py_single_input};
PyObject *result;
cf.cf_flags = flags | PyCF_SOURCE_IS_UTF8;
if (flags &
~(PyCF_MASK | PyCF_MASK_OBSOLETE | PyCF_DONT_IMPLY_DEDENT | PyCF_ONLY_AST))
{
PyErr_SetString(PyExc_ValueError,
"compile(): unrecognised flags");
goto error;
}
/* XXX Warn if (supplied_flags & PyCF_MASK_OBSOLETE) != 0? */
if (optimize < -1 || optimize > 2) {
PyErr_SetString(PyExc_ValueError,
"compile(): invalid optimize value");
goto error;
}
if (!dont_inherit) {
PyEval_MergeCompilerFlags(&cf);
}
if (strcmp(mode, "exec") == 0)
compile_mode = 0;
else if (strcmp(mode, "eval") == 0)
compile_mode = 1;
else if (strcmp(mode, "single") == 0)
compile_mode = 2;
else {
PyErr_SetString(PyExc_ValueError,
"compile() mode must be 'exec', 'eval' or 'single'");
goto error;
}
is_ast = PyAST_Check(source);
if (is_ast == -1)
goto error;
if (is_ast) {
if (flags & PyCF_ONLY_AST) {
Py_INCREF(source);
result = source;
}
else {
PyArena *arena;
mod_ty mod;
arena = PyArena_New();
if (arena == NULL)
goto error;
mod = PyAST_obj2mod(source, arena, compile_mode);
if (mod == NULL) {
PyArena_Free(arena);
goto error;
}
if (!PyAST_Validate(mod)) {
PyArena_Free(arena);
goto error;
}
result = (PyObject*)PyAST_CompileObject(mod, filename,
&cf, optimize, arena);
PyArena_Free(arena);
}
goto finally;
}
str = source_as_string(source, "compile", "string, bytes or AST", &cf, &source_copy);
if (str == NULL)
goto error;
result = Py_CompileStringObject(str, filename, start[compile_mode], &cf, optimize);
Py_XDECREF(source_copy);
goto finally;
error:
result = NULL;
finally:
Py_DECREF(filename);
return result;
}
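The mode and optimize checks in builtin_compile_impl are visible from Python as ValueError exceptions. A short sketch, where the source strings and the bad mode name `run` are made up for the example:

```python
# The three accepted modes map to compile_mode 0, 1 and 2 in the C code.
exec_code = compile("x = 1", "<demo>", "exec")
eval_code = compile("1 + 2", "<demo>", "eval")
value = eval(eval_code)

errors = []
for kwargs in ({"mode": "run"}, {"mode": "exec", "optimize": 5}):
    try:
        compile("x = 1", "<demo>", **kwargs)
    except ValueError as exc:
        # Unknown mode or out-of-range optimize level is rejected up front.
        errors.append(str(exc))

print(value)
print(errors)
```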
- Near the end of the function above, source_as_string is called to convert the source object into a C string.
Python\bltinmodule.c
static const char *
source_as_string(PyObject *cmd, const char *funcname, const char *what, PyCompilerFlags *cf, PyObject **cmd_copy)
{
const char *str;
Py_ssize_t size;
Py_buffer view;
*cmd_copy = NULL;
if (PyUnicode_Check(cmd)) {
cf->cf_flags |= PyCF_IGNORE_COOKIE;
str = PyUnicode_AsUTF8AndSize(cmd, &size);
if (str == NULL)
return NULL;
}
else if (PyBytes_Check(cmd)) {
str = PyBytes_AS_STRING(cmd);
size = PyBytes_GET_SIZE(cmd);
}
else if (PyByteArray_Check(cmd)) {
str = PyByteArray_AS_STRING(cmd);
size = PyByteArray_GET_SIZE(cmd);
}
else if (PyObject_GetBuffer(cmd, &view, PyBUF_SIMPLE) == 0) {
/* Copy to NUL-terminated buffer. */
*cmd_copy = PyBytes_FromStringAndSize(
(const char *)view.buf, view.len);
PyBuffer_Release(&view);
if (*cmd_copy == NULL) {
return NULL;
}
str = PyBytes_AS_STRING(*cmd_copy);
size = PyBytes_GET_SIZE(*cmd_copy);
}
else {
PyErr_Format(PyExc_TypeError,
"%s() arg 1 must be a %s object",
funcname, what);
return NULL;
}
if (strlen(str) != (size_t)size) {
PyErr_SetString(PyExc_ValueError,
"source code string cannot contain null bytes");
Py_CLEAR(*cmd_copy);
return NULL;
}
return str;
}
- Py_CompileStringObject then turns that string into a PyCodeObject.
Python\pythonrun.c
PyObject *
Py_CompileStringObject(const char *str, PyObject *filename, int start,
PyCompilerFlags *flags, int optimize)
{
PyCodeObject *co;
mod_ty mod;
PyArena *arena = PyArena_New();
if (arena == NULL)
return NULL;
mod = PyParser_ASTFromStringObject(str, filename, start, flags, arena);
if (mod == NULL) {
PyArena_Free(arena);
return NULL;
}
if (flags && (flags->cf_flags & PyCF_ONLY_AST)) {
PyObject *result = PyAST_mod2obj(mod);
PyArena_Free(arena);
return result;
}
co = PyAST_CompileObject(mod, filename, flags, optimize, arena);
PyArena_Free(arena);
return (PyObject *)co;
}
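Py_CompileStringObject works in two steps: parse the string into an AST, then compile the AST into a code object. The same two steps can be reproduced with the ast module, as in this sketch (the source string is invented for the example):

```python
import ast

source = "value = 2 ** 5"

# Step 1: source string -> AST (what PyParser_ASTFromStringObject does in C).
tree = ast.parse(source, filename="demo.py", mode="exec")

# Step 2: AST -> code object (what PyAST_CompileObject does in C).
code = compile(tree, "demo.py", "exec")

# With PyCF_ONLY_AST, compile() stops after step 1 and returns the AST.
only_ast = compile(source, "demo.py", "exec", flags=ast.PyCF_ONLY_AST)

namespace = {}
exec(code, namespace)
print(type(only_ast).__name__, namespace["value"])
```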
3. Source code that converts a PyCodeObject into binary data
- In get_code(), the function that converts the PyCodeObject into binary data is _code_to_timestamp_pyc().
lib\importlib\_bootstrap_external.py
def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
    "Produce the data for a timestamp-based pyc."
    data = bytearray(MAGIC_NUMBER)
    data.extend(_w_long(0))
    data.extend(_w_long(mtime))
    data.extend(_w_long(source_size))
    data.extend(marshal.dumps(code))
    return data
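- As shown, four 4-byte header fields are written before the marshalled code object.
Bytes | Meaning |
---|---|
MAGIC_NUMBER | Identifies the bytecode version and differs between Python versions. In 3.7, MAGIC_NUMBER = (3394).to_bytes(2, 'little') + b'\r\n' |
0 | Flags field; 0 marks a timestamp-based .pyc (bit 0 is set for hash-based ones). |
mtime | Time of the most recent modification of the .py file. |
source_size | Size of the source file in bytes. |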
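As a worked example, the first 16 bytes of the demo.pyc hex dump shown earlier can be decoded against this layout. The sketch below only re-reads bytes already present in that dump; the variable names are chosen for illustration.

```python
import struct

# First row of the demo.pyc hex dump.
header = bytes.fromhex("420d0d0a 00000000 59456160 34000000")

# Bytes 0-1: version number (little-endian); bytes 2-3: b'\r\n'.
magic_number = int.from_bytes(header[:2], "little")
magic_tail = header[2:4]

# Bytes 4-15: three unsigned 32-bit little-endian integers.
flags, mtime, source_size = struct.unpack("<III", header[4:16])

print(magic_number, magic_tail, flags, mtime, source_size)
```

The decoded magic number agrees with the 3.7 value given in the table, and the flags field is 0, so this dump is a timestamp-based .pyc.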
- marshal.dumps(code) is then called to append the serialized PyCodeObject; in the C source this corresponds to marshal_dumps_impl.
Python\marshal.c
static PyObject *
marshal_dumps_impl(PyObject *module, PyObject *value, int version)
/*[clinic end generated code: output=9c200f98d7256cad input=a2139ea8608e9b27]*/
{
return PyMarshal_WriteObjectToString(value, version);
}
- marshal_dumps_impl simply calls PyMarshal_WriteObjectToString.
Python\marshal.c
PyObject *
PyMarshal_WriteObjectToString(PyObject *x, int version)
{
WFILE wf;
memset(&wf, 0, sizeof(wf));
wf.str = PyBytes_FromStringAndSize((char *)NULL, 50);
if (wf.str == NULL)
return NULL;
wf.ptr = wf.buf = PyBytes_AS_STRING((PyBytesObject *)wf.str);
wf.end = wf.ptr + PyBytes_Size(wf.str);
wf.error = WFERR_OK;
wf.version = version;
if (w_init_refs(&wf, version)) {
Py_DECREF(wf.str);
return NULL;
}
w_object(x, &wf);
w_clear_refs(&wf);
if (wf.str != NULL) {
char *base = PyBytes_AS_STRING((PyBytesObject *)wf.str);
if (wf.ptr - base > PY_SSIZE_T_MAX) {
Py_DECREF(wf.str);
PyErr_SetString(PyExc_OverflowError,
"too much marshal data for a bytes object");
return NULL;
}
if (_PyBytes_Resize(&wf.str, (Py_ssize_t)(wf.ptr - base)) < 0)
return NULL;
}
if (wf.error != WFERR_OK) {
Py_XDECREF(wf.str);
if (wf.error == WFERR_NOMEMORY)
PyErr_NoMemory();
else
PyErr_SetString(PyExc_ValueError,
(wf.error==WFERR_UNMARSHALLABLE)?"unmarshallable object"
:"object too deeply nested to marshal");
return NULL;
}
return wf.str;
}
- That function calls w_object, which in turn calls w_complex_object.
Python\marshal.c
static void
w_object(PyObject *v, WFILE *p)
{
char flag = '\0';
p->depth++;
if (p->depth > MAX_MARSHAL_STACK_DEPTH) {
p->error = WFERR_NESTEDTOODEEP;
}
else if (v == NULL) {
w_byte(TYPE_NULL, p);
}
else if (v == Py_None) {
w_byte(TYPE_NONE, p);
}
else if (v == PyExc_StopIteration) {
w_byte(TYPE_STOPITER, p);
}
else if (v == Py_Ellipsis) {
w_byte(TYPE_ELLIPSIS, p);
}
else if (v == Py_False) {
w_byte(TYPE_FALSE, p);
}
else if (v == Py_True) {
w_byte(TYPE_TRUE, p);
}
else if (!w_ref(v, &flag, p))
w_complex_object(v, flag, p);
p->depth--;
}
- w_complex_object is where the PyCodeObject is actually converted to binary data.
Python\marshal.c
static void
w_complex_object(PyObject *v, char flag, WFILE *p)
{
... ...
else if (PyCode_Check(v)) {
PyCodeObject *co = (PyCodeObject *)v;
W_TYPE(TYPE_CODE, p);
w_long(co->co_argcount, p);
w_long(co->co_kwonlyargcount, p);
w_long(co->co_nlocals, p);
w_long(co->co_stacksize, p);
w_long(co->co_flags, p);
w_object(co->co_code, p);
w_object(co->co_consts, p);
w_object(co->co_names, p);
w_object(co->co_varnames, p);
w_object(co->co_freevars, p);
w_object(co->co_cellvars, p);
w_object(co->co_filename, p);
w_object(co->co_name, p);
w_long(co->co_firstlineno, p);
w_object(co->co_lnotab, p);
}
... ...
}
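The effect of W_TYPE(TYPE_CODE, p) can be checked from Python: the first byte of the marshalled data is the type code `c`, possibly with the high reference-flag bit set. That is why the hex dump shown earlier has 0xe3 (0x63 with 0x80 set) right after the 16-byte header. A small round-trip sketch with an invented source string:

```python
import marshal

code = compile("answer = 6 * 7", "demo.py", "exec")
payload = marshal.dumps(code)

# Mask off the reference flag (0x80) to recover the type code.
type_code = chr(payload[0] & 0x7F)

# marshal.loads() reverses the process and rebuilds an equivalent code object.
restored = marshal.loads(payload)
namespace = {}
exec(restored, namespace)

print(type_code, namespace["answer"])
```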
4. Source code that writes the binary data to the .pyc file
- get_code() calls _cache_bytecode() of the SourceFileLoader class to write the binary data to the .pyc file.
lib\importlib\_bootstrap_external.py
def _cache_bytecode(self, source_path, bytecode_path, data):
    # Adapt between the two APIs
    mode = _calc_mode(source_path)
    return self.set_data(bytecode_path, data, _mode=mode)
- _cache_bytecode() in turn calls the set_data() method.
lib\importlib\_bootstrap_external.py
def set_data(self, path, data, *, _mode=0o666):
    """Write bytes data to a file."""
    parent, filename = _path_split(path)
    path_parts = []
    # Figure out what directories are missing.
    while parent and not _path_isdir(parent):
        parent, part = _path_split(parent)
        path_parts.append(part)
    # Create needed directories.
    for part in reversed(path_parts):
        parent = _path_join(parent, part)
        try:
            _os.mkdir(parent)
        except FileExistsError:
            # Probably another Python process already created the dir.
            continue
        except OSError as exc:
            # Could be a permission error, read-only filesystem: just forget
            # about writing the data.
            _bootstrap._verbose_message('could not create {!r}: {!r}',
                                        parent, exc)
            return
    try:
        _write_atomic(path, data, _mode)
        _bootstrap._verbose_message('created {!r}', path)
    except OSError as exc:
        # Same as above: just don't write the bytecode.
        _bootstrap._verbose_message('could not create {!r}: {!r}', path,
                                    exc)
- The key function inside set_data(), _write_atomic(), is what creates the .pyc file.
lib\importlib\_bootstrap_external.py
def _write_atomic(path, data, mode=0o666):
    """Best-effort function to write data to a path atomically.
    Be prepared to handle a FileExistsError if concurrent writing of the
    temporary file is attempted."""
    # id() is used to generate a pseudo-random filename.
    path_tmp = '{}.{}'.format(path, id(path))
    fd = _os.open(path_tmp,
                  _os.O_EXCL | _os.O_CREAT | _os.O_WRONLY, mode & 0o666)
    try:
        # We first write data to a temporary file, and then use os.replace() to
        # perform an atomic rename.
        with _io.FileIO(fd, 'wb') as file:
            file.write(data)
        _os.replace(path_tmp, path)
    except OSError:
        try:
            _os.unlink(path_tmp)
        except OSError:
            pass
        raise
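The write-to-a-temporary-file-then-rename pattern can be tried outside importlib. The sketch below is an adaptation that uses the public os module and the built-in open() in place of the private _os and _io names; `write_atomic` and the target file name are hypothetical.

```python
import os
import tempfile

def write_atomic(path, data, mode=0o666):
    """Write `data` to a temporary file, then rename it over `path`."""
    path_tmp = "{}.{}".format(path, id(path))
    # O_EXCL makes the call fail if the temporary file already exists.
    fd = os.open(path_tmp, os.O_EXCL | os.O_CREAT | os.O_WRONLY, mode & 0o666)
    try:
        with open(fd, "wb") as file:
            file.write(data)
        # Readers see either the old file or the complete new one.
        os.replace(path_tmp, path)
    except OSError:
        try:
            os.unlink(path_tmp)
        except OSError:
            pass
        raise

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.pyc")
    write_atomic(target, b"first")
    write_atomic(target, b"second")   # overwrites the earlier content
    with open(target, "rb") as f:
        content = f.read()
    leftovers = os.listdir(tmp)

print(content, leftovers)
```

Because the rename happens only after the data is fully written, a concurrent import never observes a half-written .pyc.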