大师兄的Python源码学习笔记(十）: Python的编译过程

大师兄的Python源码学习笔记(九）: 自制Python
大师兄的Python源码学习笔记(十一）: Python的虚拟机框架

一、Python程序的执行过程

Python在执行程序之前，解释器(interpreter)首先要将.py文件编译为字节码。
编译后的字节码会被交给虚拟机逐条执行。
而解释器和虚拟机都包含在\win32\python37_d.dll文件中。
相较编译型语言,Python这种解释型语言更擅长跨平台，但执行效率较低。

二、编译的结果

编译结束后，会将源文件中的静态信息存储在PyCodeObject对象中，而PyCodeObject对象有可能会保存在.pyc文件中。
对于Python编译器来说，PyCodeObject对象是编译的结果，.pyc文件是硬盘的表现形式。

1. 关于PyCodeObject

为了区分成员名，Python编译器会将源码以作用域分割位数个编码块(Code Block)，并对应每个编码块(Code Block)创建一个PyCodeObject对象。
PyCodeObject的数据结构如下:

include\code.h

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;
} PyCodeObject;

元素	含义
co_argcount	Code Block的位置参数的个数。
co_kwonlyargcount	CodeBlock中的关键参数的个数
co_nlocals	Code Block中局部变量的个数，包含位置参数的个数。
co_stacksize	执行该段Code Block需要的栈空间。
co_flags	N/A
co_firstlineno	Code Block在对应.py文件的起始行。
co_code	编辑后得到的字节码，是一个PyStringObject对象。
co_consts	保存Code Block中的所有常量，是一个PyTupleObject对象。
co_names	保存Code Block中的所有符号，是一个PyTupleObject对象。
co_varnames	Code Block中的局部变量名集合。
co_freevars	闭包保存的元素。
co_cellvars	Code Block中内部嵌套函数所引用的局部变量名集合。
co_filename	Code Block所在的文件名。
co_name	Code Block的名字，通常是函数名或者类名。
co_lnotab	字节码指令与python源代码的行号之间的对应关系。

PyCode_Type类型对象:

objects\codeobject.c

PyTypeObject PyCode_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "code",
    sizeof(PyCodeObject),
    0,
    (destructor)code_dealloc,           /* tp_dealloc */
    0,                                  /* tp_print */
    0,                                  /* tp_getattr */
    0,                                  /* tp_setattr */
    0,                                  /* tp_reserved */
    (reprfunc)code_repr,                /* tp_repr */
    0,                                  /* tp_as_number */
    0,                                  /* tp_as_sequence */
    0,                                  /* tp_as_mapping */
    (hashfunc)code_hash,                /* tp_hash */
    0,                                  /* tp_call */
    0,                                  /* tp_str */
    PyObject_GenericGetAttr,            /* tp_getattro */
    0,                                  /* tp_setattro */
    0,                                  /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,                 /* tp_flags */
    code_doc,                           /* tp_doc */
    0,                                  /* tp_traverse */
    0,                                  /* tp_clear */
    code_richcompare,                   /* tp_richcompare */
    offsetof(PyCodeObject, co_weakreflist),     /* tp_weaklistoffset */
    0,                                  /* tp_iter */
    0,                                  /* tp_iternext */
    code_methods,                       /* tp_methods */
    code_memberlist,                    /* tp_members */
    0,                                  /* tp_getset */
    0,                                  /* tp_base */
    0,                                  /* tp_dict */
    0,                                  /* tp_descr_get */
    0,                                  /* tp_descr_set */
    0,                                  /* tp_dictoffset */
    0,                                  /* tp_init */
    0,                                  /* tp_alloc */
    code_new,                           /* tp_new */
};

2. 关于.pyc文件

.pyc文件是用于储存PyCodeObject对象的二进制文件。

demo.pyc

420d 0d0a 0000 0000 5945 6160 3400 0000
e300 0000 0000 0000 0000 0000 0003 0000
0040 0000 0073 1a00 0000 4700 6400 6401
8400 6401 8302 5a00 6402 6403 8400 5a01
6404 5300 2905 6300 0000 0000 0000 0000
... ...

如果.pyc文件存在，Python将检查.pyc文件的内部时间戳是否不早于相应的.py文件。如果是，则加载.pyc；如果不是，或者.pyc还不存在，则Python将.py文件编译为.pyc并加载它。
但.pyc文件并不总是存在，当被内部引用时(import)时，会在__pycache__文件夹下生成.pyc文件。

demo.py

class Demo:
    ...

def func():
    ...

main.py

import demo

if __name__ == '__main__':
    d = demo.Demo()
    demo.func()

执行main.py后：

xxxx/xx/xx  xx:xx    <DIR>          .
xxxx/xx/xx  xx:xx    <DIR>          ..
xxxx/xx/xx  xx:xx               351 demo.cpython-37.pyc
               1 个文件            351 字节

三. 在Python中访问PyCodeObject对象

Python中对应PyCodeObject的对象为code对象。
通过compile()函数可以获得一个code对象，并访问其属性。

>>>co = compile("s = 'Hello World!'",'demo.py','exec')
>>>print(type(co))
>>>for c in dir(co):
>>>    print(f"{c}={eval('co.'+c)}")
<class 'code'>
__class__=<class 'code'>
__delattr__=<method-wrapper '__delattr__' of code object at 0x0000020A1D9BD8A0>
__dir__=<built-in method __dir__ of code object at 0x0000020A1D9BD8A0>
__doc__=code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
      constants, names, varnames, filename, name, firstlineno,
      lnotab[, freevars[, cellvars]])

Create a code object.  Not for the faint of heart.
__eq__=<method-wrapper '__eq__' of code object at 0x0000020A1D9BD8A0>
__format__=<built-in method __format__ of code object at 0x0000020A1D9BD8A0>
__ge__=<method-wrapper '__ge__' of code object at 0x0000020A1D9BD8A0>
__getattribute__=<method-wrapper '__getattribute__' of code object at 0x0000020A1D9BD8A0>
__gt__=<method-wrapper '__gt__' of code object at 0x0000020A1D9BD8A0>
__hash__=<method-wrapper '__hash__' of code object at 0x0000020A1D9BD8A0>
__init__=<method-wrapper '__init__' of code object at 0x0000020A1D9BD8A0>
__init_subclass__=<built-in method __init_subclass__ of type object at 0x00007FF848BFB8C0>
__le__=<method-wrapper '__le__' of code object at 0x0000020A1D9BD8A0>
__lt__=<method-wrapper '__lt__' of code object at 0x0000020A1D9BD8A0>
__ne__=<method-wrapper '__ne__' of code object at 0x0000020A1D9BD8A0>
__new__=<built-in method __new__ of type object at 0x00007FF848BFB8C0>
__reduce__=<built-in method __reduce__ of code object at 0x0000020A1D9BD8A0>
__reduce_ex__=<built-in method __reduce_ex__ of code object at 0x0000020A1D9BD8A0>
__repr__=<method-wrapper '__repr__' of code object at 0x0000020A1D9BD8A0>
__setattr__=<method-wrapper '__setattr__' of code object at 0x0000020A1D9BD8A0>
__sizeof__=<built-in method __sizeof__ of code object at 0x0000020A1D9BD8A0>
__str__=<method-wrapper '__str__' of code object at 0x0000020A1D9BD8A0>
__subclasshook__=<built-in method __subclasshook__ of type object at 0x00007FF848BFB8C0>
co_argcount=0
co_cellvars=()
co_code=b'd\x00Z\x00d\x01S\x00'
co_consts=('Hello World!', None)
co_filename=demo.py
co_firstlineno=1
co_flags=64
co_freevars=()
co_kwonlyargcount=0
co_lnotab=b''
co_name=<module>
co_names=('s',)
co_nlocals=0
co_stacksize=1
co_varnames=()

四、.pyc文件的生成过程

1. .pyc文件和PyCodeObject对象的生成流程

编译器首先调用load_module()函数。

lib\imp.py

def load_module(name, file, filename, details):
    suffix, mode, type_ = details
    if mode and (not mode.startswith(('r', 'U')) or '+' in mode):
        raise ValueError('invalid file open mode {!r}'.format(mode))
    elif file is None and type_ in {PY_SOURCE, PY_COMPILED}:
        msg = 'file object required for import (type code {})'.format(type_)
        raise ValueError(msg)
    elif type_ == PY_SOURCE:
        return load_source(name, filename, file)
    elif type_ == PY_COMPILED:
        return load_compiled(name, filename, file)
    elif type_ == C_EXTENSION and load_dynamic is not None:
        if file is None:
            with open(filename, 'rb') as opened_file:
                return load_dynamic(name, filename, opened_file)
        else:
            return load_dynamic(name, filename, file)
    elif type_ == PKG_DIRECTORY:
        return load_package(name, filename)
    elif type_ == C_BUILTIN:
        return init_builtin(name)
    elif type_ == PY_FROZEN:
        return init_frozen(name)
    else:
        msg =  "Don't know how to import {} (type code {})".format(name, type_)
        raise ImportError(msg, name=name)

后根据条件判断调用load_source()函数。

lib\imp.py

def load_source(name, pathname, file=None):
    loader = _LoadSourceCompatibility(name, pathname, file)
    spec = util.spec_from_file_location(name, pathname, loader=loader)
    if name in sys.modules:
        module = _exec(spec, sys.modules[name])
    else:
        module = _load(spec)
    # To allow reloading to potentially work, use a non-hacked loader which
    # won't rely on a now-closed file object.
    module.__loader__ = machinery.SourceFileLoader(name, pathname)
    module.__spec__.loader = module.__loader__
    return module

再调用_load()函数。

lib\importlib\_bootstrap.py

def _load(spec):
    """Return a new module object, loaded by the spec's loader.

    The module is not added to its parent.

    If a module is already in sys.modules, that existing module gets
    clobbered.

    """
    with _ModuleLockManager(spec.name):
        return _load_unlocked(spec)

此处实际调用了_frozen_importlib。

Python\importlib.h

/* Auto-generated by Programs/_freeze_importlib.c */
const unsigned char _Py_M__importlib[] = {
   99,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,
   0,64,0,0,0,115,208,1,0,0,100,0,90,0,100,1,
   97,1,100,2,100,3,132,0,90,2,100,4,100,5,132,0,
... ...

再调用_load_unlocked()函数

lib\importlib\_bootstrap.py

def _load_unlocked(spec):
    # A helper for direct use by the import system.
    if spec.loader is not None:
        # not a namespace package
        if not hasattr(spec.loader, 'exec_module'):
            return _load_backward_compatible(spec)

    module = module_from_spec(spec)
    with _installed_safely(module):
        if spec.loader is None:
            if spec.submodule_search_locations is None:
                raise ImportError('missing loader', name=spec.name)
            # A namespace package so do nothing.
        else:
            spec.loader.exec_module(module)

    # We don't ensure that the import-related module attributes get
    # set in the sys.modules replacement case.  Such modules are on
    # their own.
    return sys.modules[spec.name]

再调用_LoaderBasics类中的exec_module()方法。

lib\importlib\_bootstrap_external.py

def exec_module(self, module):
        """Execute the module."""
        code = self.get_code(module.__name__)
        if code is None:
            raise ImportError('cannot load module {!r} when get_code() '
                              'returns None'.format(module.__name__))
        _bootstrap._call_with_frames_removed(exec, code, module.__dict__)

最后调用SourceLoader(_LoaderBasics)类中的get_code()方法，生成PyCodeObject和.pyc字节码文件。

lib\importlib\_bootstrap_external.py

  def get_code(self, fullname):
        """Concrete implementation of InspectLoader.get_code.

        Reading of bytecode requires path_stats to be implemented. To write
        bytecode, set_data must also be implemented.

        """
        source_path = self.get_filename(fullname)
        source_mtime = None
        source_bytes = None
        source_hash = None
        hash_based = False
        check_source = True
        try:
            bytecode_path = cache_from_source(source_path)
        except NotImplementedError:
            bytecode_path = None
        else:
            try:
                st = self.path_stats(source_path)
            except OSError:
                pass
            else:
                source_mtime = int(st['mtime'])
                try:
                    data = self.get_data(bytecode_path)
                except OSError:
                    pass
                else:
                    exc_details = {
                        'name': fullname,
                        'path': bytecode_path,
                    }
                    try:
                        flags = _classify_pyc(data, fullname, exc_details)
                        bytes_data = memoryview(data)[16:]
                        hash_based = flags & 0b1 != 0
                        if hash_based:
                            check_source = flags & 0b10 != 0
                            if (_imp.check_hash_based_pycs != 'never' and
                                (check_source or
                                 _imp.check_hash_based_pycs == 'always')):
                                source_bytes = self.get_data(source_path)
                                source_hash = _imp.source_hash(
                                    _RAW_MAGIC_NUMBER,
                                    source_bytes,
                                )
                                _validate_hash_pyc(data, source_hash, fullname,
                                                   exc_details)
                        else:
                            _validate_timestamp_pyc(
                                data,
                                source_mtime,
                                st['size'],
                                fullname,
                                exc_details,
                            )
                    except (ImportError, EOFError):
                        pass
                    else:
                        _bootstrap._verbose_message('{} matches {}', bytecode_path,
                                                    source_path)
                        return _compile_bytecode(bytes_data, name=fullname,
                                                 bytecode_path=bytecode_path,
                                                 source_path=source_path)
        if source_bytes is None:
            source_bytes = self.get_data(source_path)
        code_object = self.source_to_code(source_bytes, source_path)
        _bootstrap._verbose_message('code object from {}', source_path)
        if (not sys.dont_write_bytecode and bytecode_path is not None and
                source_mtime is not None):
            if hash_based:
                if source_hash is None:
                    source_hash = _imp.source_hash(source_bytes)
                data = _code_to_hash_pyc(code_object, source_hash, check_source)
            else:
                data = _code_to_timestamp_pyc(code_object, source_mtime,
                                              len(source_bytes))
            try:
                self._cache_bytecode(source_path, bytecode_path, data)
                _bootstrap._verbose_message('wrote {!r}', bytecode_path)
            except NotImplementedError:
                pass
        return code_object

2. 生成PyCodeObject的源码

生成PyCodeObject的源码对应builtin_compile_impl方法。

Python\bltinmodule.c

static PyObject *
builtin_compile_impl(PyObject *module, PyObject *source, PyObject *filename,
                     const char *mode, int flags, int dont_inherit,
                     int optimize)
/*[clinic end generated code: output=1fa176e33452bb63 input=0ff726f595eb9fcd]*/
{
    PyObject *source_copy;
    const char *str;
    int compile_mode = -1;
    int is_ast;
    PyCompilerFlags cf;
    int start[] = {Py_file_input, Py_eval_input, Py_single_input};
    PyObject *result;

    cf.cf_flags = flags | PyCF_SOURCE_IS_UTF8;

    if (flags &
        ~(PyCF_MASK | PyCF_MASK_OBSOLETE | PyCF_DONT_IMPLY_DEDENT | PyCF_ONLY_AST))
    {
        PyErr_SetString(PyExc_ValueError,
                        "compile(): unrecognised flags");
        goto error;
    }
    /* XXX Warn if (supplied_flags & PyCF_MASK_OBSOLETE) != 0? */

    if (optimize < -1 || optimize > 2) {
        PyErr_SetString(PyExc_ValueError,
                        "compile(): invalid optimize value");
        goto error;
    }

    if (!dont_inherit) {
        PyEval_MergeCompilerFlags(&cf);
    }

    if (strcmp(mode, "exec") == 0)
        compile_mode = 0;
    else if (strcmp(mode, "eval") == 0)
        compile_mode = 1;
    else if (strcmp(mode, "single") == 0)
        compile_mode = 2;
    else {
        PyErr_SetString(PyExc_ValueError,
                        "compile() mode must be 'exec', 'eval' or 'single'");
        goto error;
    }

    is_ast = PyAST_Check(source);
    if (is_ast == -1)
        goto error;
    if (is_ast) {
        if (flags & PyCF_ONLY_AST) {
            Py_INCREF(source);
            result = source;
        }
        else {
            PyArena *arena;
            mod_ty mod;

            arena = PyArena_New();
            if (arena == NULL)
                goto error;
            mod = PyAST_obj2mod(source, arena, compile_mode);
            if (mod == NULL) {
                PyArena_Free(arena);
                goto error;
            }
            if (!PyAST_Validate(mod)) {
                PyArena_Free(arena);
                goto error;
            }
            result = (PyObject*)PyAST_CompileObject(mod, filename,
                                                    &cf, optimize, arena);
            PyArena_Free(arena);
        }
        goto finally;
    }

    str = source_as_string(source, "compile", "string, bytes or AST", &cf, &source_copy);
    if (str == NULL)
        goto error;

    result = Py_CompileStringObject(str, filename, start[compile_mode], &cf, optimize);
    Py_XDECREF(source_copy);
    goto finally;

error:
    result = NULL;
finally:
    Py_DECREF(filename);
    return result;
}

在上面代码的最后部分，调用了source_as_string方法将.py源码生成了字符串。

Python\bltinmodule.c

static const char *
source_as_string(PyObject *cmd, const char *funcname, const char *what, PyCompilerFlags *cf, PyObject **cmd_copy)
{
    const char *str;
    Py_ssize_t size;
    Py_buffer view;

    *cmd_copy = NULL;
    if (PyUnicode_Check(cmd)) {
        cf->cf_flags |= PyCF_IGNORE_COOKIE;
        str = PyUnicode_AsUTF8AndSize(cmd, &size);
        if (str == NULL)
            return NULL;
    }
    else if (PyBytes_Check(cmd)) {
        str = PyBytes_AS_STRING(cmd);
        size = PyBytes_GET_SIZE(cmd);
    }
    else if (PyByteArray_Check(cmd)) {
        str = PyByteArray_AS_STRING(cmd);
        size = PyByteArray_GET_SIZE(cmd);
    }
    else if (PyObject_GetBuffer(cmd, &view, PyBUF_SIMPLE) == 0) {
        /* Copy to NUL-terminated buffer. */
        *cmd_copy = PyBytes_FromStringAndSize(
            (const char *)view.buf, view.len);
        PyBuffer_Release(&view);
        if (*cmd_copy == NULL) {
            return NULL;
        }
        str = PyBytes_AS_STRING(*cmd_copy);
        size = PyBytes_GET_SIZE(*cmd_copy);
    }
    else {
        PyErr_Format(PyExc_TypeError,
          "%s() arg 1 must be a %s object",
          funcname, what);
        return NULL;
    }

    if (strlen(str) != (size_t)size) {
        PyErr_SetString(PyExc_ValueError,
                        "source code string cannot contain null bytes");
        Py_CLEAR(*cmd_copy);
        return NULL;
    }
    return str;
}

而Py_CompileStringObject方法将字符串生成PyCodeObject对象。

Python\pythonrun.c

PyObject *
Py_CompileStringObject(const char *str, PyObject *filename, int start,
                       PyCompilerFlags *flags, int optimize)
{
    PyCodeObject *co;
    mod_ty mod;
    PyArena *arena = PyArena_New();
    if (arena == NULL)
        return NULL;

    mod = PyParser_ASTFromStringObject(str, filename, start, flags, arena);
    if (mod == NULL) {
        PyArena_Free(arena);
        return NULL;
    }
    if (flags && (flags->cf_flags & PyCF_ONLY_AST)) {
        PyObject *result = PyAST_mod2obj(mod);
        PyArena_Free(arena);
        return result;
    }
    co = PyAST_CompileObject(mod, filename, flags, optimize, arena);
    PyArena_Free(arena);
    return (PyObject *)co;
}

3. 将PyCodeObject转为二进制数据的源码

在get_code()方法中，可以看到是由_code_to_timestamp_pyc()方法将PyCodeObject转化为二进制数据的源码。

lib\importlib\_bootstrap_external.py

def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
    "Produce the data for a timestamp-based pyc."
    data = bytearray(MAGIC_NUMBER)
    data.extend(_w_long(0))
    data.extend(_w_long(mtime))
    data.extend(_w_long(source_size))
    data.extend(marshal.dumps(code))
    return data

可以看到在code前添加了一些字节。

字节	含义
MAGIC_NUMBER	不同版本有不同的MAGIC_NUMBER 3.7版本MAGIC_NUMBER = (3394).to_bytes(2, 'little') + b'\r\n'
0	N/A
mtime	.py文件最近一次修改时间。
source_size	源码大小

之后会调用marshal.dumps(code)添加PyCodeObject对象的二进制字节, 在源码中对应marshal_dumps_impl。

Python\marshal.c

static PyObject *
marshal_dumps_impl(PyObject *module, PyObject *value, int version)
/*[clinic end generated code: output=9c200f98d7256cad input=a2139ea8608e9b27]*/
{
    return PyMarshal_WriteObjectToString(value, version);
}

而marshal_dumps_impl直接调用了PyMarshal_WriteObjectToString。

Python\marshal.c

PyObject *
PyMarshal_WriteObjectToString(PyObject *x, int version)
{
    WFILE wf;

    memset(&wf, 0, sizeof(wf));
    wf.str = PyBytes_FromStringAndSize((char *)NULL, 50);
    if (wf.str == NULL)
        return NULL;
    wf.ptr = wf.buf = PyBytes_AS_STRING((PyBytesObject *)wf.str);
    wf.end = wf.ptr + PyBytes_Size(wf.str);
    wf.error = WFERR_OK;
    wf.version = version;
    if (w_init_refs(&wf, version)) {
        Py_DECREF(wf.str);
        return NULL;
    }
    w_object(x, &wf);
    w_clear_refs(&wf);
    if (wf.str != NULL) {
        char *base = PyBytes_AS_STRING((PyBytesObject *)wf.str);
        if (wf.ptr - base > PY_SSIZE_T_MAX) {
            Py_DECREF(wf.str);
            PyErr_SetString(PyExc_OverflowError,
                            "too much marshal data for a bytes object");
            return NULL;
        }
        if (_PyBytes_Resize(&wf.str, (Py_ssize_t)(wf.ptr - base)) < 0)
            return NULL;
    }
    if (wf.error != WFERR_OK) {
        Py_XDECREF(wf.str);
        if (wf.error == WFERR_NOMEMORY)
            PyErr_NoMemory();
        else
            PyErr_SetString(PyExc_ValueError,
              (wf.error==WFERR_UNMARSHALLABLE)?"unmarshallable object"
               :"object too deeply nested to marshal");
        return NULL;
    }
    return wf.str;
}

其中调用了w_object方法，而它又调用了w_complex_object方法。

Python\marshal.c

static void
w_object(PyObject *v, WFILE *p)
{
    char flag = '\0';

    p->depth++;

    if (p->depth > MAX_MARSHAL_STACK_DEPTH) {
        p->error = WFERR_NESTEDTOODEEP;
    }
    else if (v == NULL) {
        w_byte(TYPE_NULL, p);
    }
    else if (v == Py_None) {
        w_byte(TYPE_NONE, p);
    }
    else if (v == PyExc_StopIteration) {
        w_byte(TYPE_STOPITER, p);
    }
    else if (v == Py_Ellipsis) {
        w_byte(TYPE_ELLIPSIS, p);
    }
    else if (v == Py_False) {
        w_byte(TYPE_FALSE, p);
    }
    else if (v == Py_True) {
        w_byte(TYPE_TRUE, p);
    }
    else if (!w_ref(v, &flag, p))
        w_complex_object(v, flag, p);

    p->depth--;
}

w_complex_object实际将PyCodeObject转化为二进制数据。

Python\marshal.c

static void
w_complex_object(PyObject *v, char flag, WFILE *p)
{
    ... ...
    else if (PyCode_Check(v)) {
        PyCodeObject *co = (PyCodeObject *)v;
        W_TYPE(TYPE_CODE, p);
        w_long(co->co_argcount, p);
        w_long(co->co_kwonlyargcount, p);
        w_long(co->co_nlocals, p);
        w_long(co->co_stacksize, p);
        w_long(co->co_flags, p);
        w_object(co->co_code, p);
        w_object(co->co_consts, p);
        w_object(co->co_names, p);
        w_object(co->co_varnames, p);
        w_object(co->co_freevars, p);
        w_object(co->co_cellvars, p);
        w_object(co->co_filename, p);
        w_object(co->co_name, p);
        w_long(co->co_firstlineno, p);
        w_object(co->co_lnotab, p);
    }
    ... ...
}

4. 将二进制数据写入.pyc文件的源码

在get_code()方法中，调用了SourceFileLoader类中的_cache_bytecode()将二进制数据写入pyc文件。

lib\importlib\_bootstrap_external.py

    def _cache_bytecode(self, source_path, bytecode_path, data):
        # Adapt between the two APIs
        mode = _calc_mode(source_path)
        return self.set_data(bytecode_path, data, _mode=mode)

在_cache_bytecode()中調用了set_data()方法。

lib\importlib\_bootstrap_external.py

    def set_data(self, path, data, *, _mode=0o666):
        """Write bytes data to a file."""
        parent, filename = _path_split(path)
        path_parts = []
        # Figure out what directories are missing.
        while parent and not _path_isdir(parent):
            parent, part = _path_split(parent)
            path_parts.append(part)
        # Create needed directories.
        for part in reversed(path_parts):
            parent = _path_join(parent, part)
            try:
                _os.mkdir(parent)
            except FileExistsError:
                # Probably another Python process already created the dir.
                continue
            except OSError as exc:
                # Could be a permission error, read-only filesystem: just forget
                # about writing the data.
                _bootstrap._verbose_message('could not create {!r}: {!r}',
                                            parent, exc)
                return
        try:
            _write_atomic(path, data, _mode)
            _bootstrap._verbose_message('created {!r}', path)
        except OSError as exc:
            # Same as above: just don't write the bytecode.
            _bootstrap._verbose_message('could not create {!r}: {!r}', path,
                                        exc)

而set_data()中的关键方法_write_atomic()创建了.pyc文件。

lib\importlib\_bootstrap_external.py

def _write_atomic(path, data, mode=0o666):
    """Best-effort function to write data to a path atomically.
    Be prepared to handle a FileExistsError if concurrent writing of the
    temporary file is attempted."""
    # id() is used to generate a pseudo-random filename.
    path_tmp = '{}.{}'.format(path, id(path))
    fd = _os.open(path_tmp,
                  _os.O_EXCL | _os.O_CREAT | _os.O_WRONLY, mode & 0o666)
    try:
        # We first write data to a temporary file, and then use os.replace() to
        # perform an atomic rename.
        with _io.FileIO(fd, 'wb') as file:
            file.write(data)
        _os.replace(path_tmp, path)
    except OSError:
        try:
            _os.unlink(path_tmp)
        except OSError:
            pass
        raise

最后编辑于：2021.04.11 11:12:01

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,772评论 6赞 477
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,458评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,610评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,640评论 1赞 276
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,657评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,590评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,962评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,631评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,870评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,611评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,704评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,386评论 4赞 319
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,969评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,944评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,179评论 1赞 260
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 44,742评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,440评论 2赞 342

大师兄的Python源码学习笔记(十）: Python的编译过程

一、Python程序的执行过程

二、编译的结果

1. 关于PyCodeObject

2. 关于.pyc文件

三. 在Python中访问PyCodeObject对象

四、.pyc文件的生成过程

1. .pyc文件和PyCodeObject对象的生成流程

2. 生成PyCodeObject的源码

3. 将PyCodeObject转为二进制数据的源码

4. 将二进制数据写入.pyc文件的源码

推荐阅读更多精彩内容