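II. From type objects to class objects
1. Handling the base class and the type information
- When Python starts up, important fields are filled into the PyTypeObject of every built-in type. This process starts with PyType_Ready: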
Objects\typeobject.c
int
PyType_Ready(PyTypeObject *type)
{
    PyObject *dict, *bases;
    PyTypeObject *base;
    Py_ssize_t i, n;
    if (type->tp_flags & Py_TPFLAGS_READY) {
        assert(_PyType_CheckConsistency(type));
        return 0;
    }
    assert((type->tp_flags & Py_TPFLAGS_READYING) == 0);
    type->tp_flags |= Py_TPFLAGS_READYING;
#ifdef Py_TRACE_REFS
    /* PyType_Ready is the closest thing we have to a choke point
     * for type objects, so is the best place I can think of to try
     * to get type objects into the doubly-linked list of all objects.
     * Still, not all type objects go through PyType_Ready.
     */
    _Py_AddToAllObjects((PyObject *)type, 0);
#endif
    if (type->tp_name == NULL) {
        PyErr_Format(PyExc_SystemError,
                     "Type does not define the tp_name field.");
        goto error;
    }
    /* Initialize tp_base (defaults to BaseObject unless that's us) */
    base = type->tp_base;
    if (base == NULL && type != &PyBaseObject_Type) {
        base = type->tp_base = &PyBaseObject_Type;
        Py_INCREF(base);
    }
    /* Now the only way base can still be NULL is if type is
     * &PyBaseObject_Type.
     */
    /* Initialize the base class */
    if (base != NULL && base->tp_dict == NULL) {
        if (PyType_Ready(base) < 0)
            goto error;
    }
    /* Initialize ob_type if NULL. This means extensions that want to be
       compilable separately on Windows can call PyType_Ready() instead of
       initializing the ob_type field of their type objects. */
    /* The test for base != NULL is really unnecessary, since base is only
       NULL when type is &PyBaseObject_Type, and we know its ob_type is
       not NULL (it's initialized to &PyType_Type). But coverity doesn't
       know that. */
    if (Py_TYPE(type) == NULL && base != NULL)
        Py_TYPE(type) = Py_TYPE(base);
    ... ...
}
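- PyType_Ready first tries to obtain the base class specified in the type's tp_base:
  - If tp_base is specified, that base class is used.
  - If tp_base is not specified (and the type is not PyBaseObject_Type itself), the default base class PyBaseObject_Type, i.e. <class 'object'>, is assigned.
- Once it has the base class, it checks whether the base class has already been initialized (a NULL tp_dict means it has not). If not, the base class is initialized first by calling PyType_Ready on it recursively.
- Finally, if the type's ob_type is still NULL, it is set to the base class's ob_type through the Py_TYPE macro: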
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
- The ob_type here is the metaclass.
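- The base class information (the static tp_base value, before PyType_Ready runs) of some built-in class objects is shown below. A NULL entry is replaced with &PyBaseObject_Type by the step above:
class object | tp_base |
---|---|
PyType_Type | NULL |
PyLong_Type | NULL |
PyBool_Type | &PyLong_Type |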
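The base-class handling above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyType`, `toy_meta`, `toy_object` and `toy_ready` are hypothetical names, not CPython APIs, and the `ready` flag stands in for both Py_TPFLAGS_READY and the "tp_dict is non-NULL" check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the fields PyType_Ready handles first. */
typedef struct ToyType {
    const char *name;
    struct ToyType *ob_type;   /* the metaclass, like ob_type */
    struct ToyType *base;      /* like tp_base */
    int ready;                 /* stands in for "tp_dict != NULL" */
} ToyType;

static ToyType toy_meta = {"type", NULL, NULL, 1};           /* plays <class 'type'> */
static ToyType toy_object = {"object", &toy_meta, NULL, 0};  /* plays <class 'object'> */

/* Simplified mirror of the first steps of PyType_Ready. */
static int toy_ready(ToyType *t)
{
    if (t->ready)
        return 0;
    ToyType *base = t->base;
    if (base == NULL && t != &toy_object)    /* default base is object */
        base = t->base = &toy_object;
    if (base != NULL && !base->ready)        /* ready the base class first */
        if (toy_ready(base) < 0)
            return -1;
    if (t->ob_type == NULL && base != NULL)  /* inherit the metaclass */
        t->ob_type = base->ob_type;
    t->ready = 1;
    return 0;
}
```

Readying a bool-like type whose base is an int-like type first readies the int-like type, which in turn readies object; both end up with object's metaclass.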
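2. Handling the list of base classes
- Because Python supports multiple inheritance, every Python class object has a list of base classes. PyType_Ready then goes on to process this list: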
Objects\typeobject.c
int
PyType_Ready(PyTypeObject *type)
{
    PyObject *dict, *bases;
    PyTypeObject *base;
    Py_ssize_t i, n;
    ... ...
    /* Initialize tp_bases */
    bases = type->tp_bases;
    if (bases == NULL) {
        if (base == NULL)
            bases = PyTuple_New(0);
        else
            bases = PyTuple_Pack(1, base);
        if (bases == NULL)
            goto error;
        type->tp_bases = bases;
    }
    ... ...
}
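- If tp_bases is NULL, it is built from base:
  - If base is also NULL (which only happens for PyBaseObject_Type), tp_bases is set to an empty tuple.
  - Otherwise, tp_bases is set to a one-element tuple containing base.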
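A minimal sketch of the same defaulting rule, assuming a fixed-size array in place of a real PyTuple object so that no allocation is needed. `ToyType`, `ToyTuple` and `toy_init_bases` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyType ToyType;

/* Hypothetical stand-in for a tuple of base classes. */
typedef struct {
    size_t size;
    ToyType *items[4];
} ToyTuple;

struct ToyType {
    ToyType *base;     /* like tp_base */
    ToyTuple *bases;   /* like tp_bases */
    ToyTuple storage;  /* backing store, so the sketch needs no malloc */
};

/* Mirrors "Initialize tp_bases": only filled in when tp_bases is NULL. */
static void toy_init_bases(ToyType *t)
{
    if (t->bases != NULL)
        return;                          /* explicitly given bases are kept */
    t->storage.size = 0;                 /* base == NULL: empty, like PyTuple_New(0) */
    if (t->base != NULL)                 /* otherwise one item, like PyTuple_Pack(1, base) */
        t->storage.items[t->storage.size++] = t->base;
    t->bases = &t->storage;
}
```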
3. Populating tp_dict
- Populating tp_dict is a fairly involved process:
Objects\typeobject.c
int
PyType_Ready(PyTypeObject *type)
{
    PyObject *dict, *bases;
    PyTypeObject *base;
    Py_ssize_t i, n;
    ... ...
    /* Initialize tp_dict */
    dict = type->tp_dict;
    if (dict == NULL) {
        dict = PyDict_New();
        if (dict == NULL)
            goto error;
        type->tp_dict = dict;
    }
    /* Add type-specific descriptors to tp_dict */
    if (add_operators(type) < 0)
        goto error;
    if (type->tp_methods != NULL) {
        if (add_methods(type, type->tp_methods) < 0)
            goto error;
    }
    if (type->tp_members != NULL) {
        if (add_members(type, type->tp_members) < 0)
            goto error;
    }
    if (type->tp_getset != NULL) {
        if (add_getset(type, type->tp_getset) < 0)
            goto error;
    }
    ... ...
}
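- It is at this stage that tp_dict receives entries such as the one associating the name __add__ with the slot function &nb_add (wrapped in a descriptor).
3.1 Slots and their ordering
- Inside Python, a slot can be thought of as representing one operation defined in PyTypeObject: each operation corresponds to one slot.
- A slot, however, holds more than a function pointer; it also carries other information.
- A slot is implemented as a slotdef struct, and all slots are kept in one global array of slotdef entries.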
Objects\typeobject.c
/*
Table mapping __foo__ names to tp_foo offsets and slot_tp_foo wrapper functions.
The table is ordered by offsets relative to the 'PyHeapTypeObject' structure,
which incorporates the additional structures used for numbers, sequences and
mappings. Note that multiple names may map to the same slot (e.g. __eq__,
__ne__ etc. all map to tp_richcompare) and one name may map to multiple slots
(e.g. __str__ affects tp_str as well as tp_repr). The table is terminated with
an all-zero entry. (This table is further initialized in init_slotdefs().)
*/
typedef struct wrapperbase slotdef;
Include\descrobject.h
struct wrapperbase {
    const char *name;
    int offset;
    void *function;
    wrapperfunc wrapper;
    const char *doc;
    int flags;
    PyObject *name_strobj;
};
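- Each slot stores the various pieces of information related to one operation of PyTypeObject: its name, offset, slot function, wrapper, doc string, flags, and the interned name string object.
- Python provides several macros for defining a slot; the most basic ones are TPSLOT, FLSLOT and ETSLOT: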
Objects\typeobject.c
#define TPSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    {NAME, offsetof(PyTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
     PyDoc_STR(DOC)}
#define FLSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC, FLAGS) \
    {NAME, offsetof(PyTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
     PyDoc_STR(DOC), FLAGS}
#define ETSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    {NAME, offsetof(PyHeapTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
     PyDoc_STR(DOC)}
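- TPSLOT computes the offset of the operation's function pointer within PyTypeObject.
- ETSLOT computes the offset of the function pointer within PyHeapTypeObject.
- FLSLOT is the same as TPSLOT except for the additional FLAGS argument.
- Now look at PyHeapTypeObject: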
Include\object.h
typedef struct _heaptypeobject {
    /* Note: there's a dependency on the order of these members
       in slotptr() in typeobject.c . */
    PyTypeObject ht_type;
    PyAsyncMethods as_async;
    PyNumberMethods as_number;
    PyMappingMethods as_mapping;
    PySequenceMethods as_sequence; /* as_sequence comes after as_mapping,
                                      so that the mapping wins when both
                                      the mapping and the sequence define
                                      a given operator (e.g. __getitem__).
                                      see add_operators() in typeobject.c . */
    PyBufferProcs as_buffer;
    PyObject *ht_name, *ht_slots, *ht_qualname;
    struct _dictkeysobject *ht_cached_keys;
    /* here are optional user slots, followed by the members. */
} PyHeapTypeObject;
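- The first member of PyHeapTypeObject is a PyTypeObject (ht_type), so the offsets computed by TPSLOT and FLSLOT are also offsets relative to PyHeapTypeObject. All slot offsets therefore share one coordinate system and can be compared with each other.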
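The "first member" argument can be checked with offsetof on heavily reduced stand-in structs. The `Toy*` structs and the `TOY_TPSLOT`/`TOY_ETSLOT` macros are hypothetical; they keep only the shape of the real definitions: a pointer to the number table in the type object, and an embedded number table in the heap type.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*toy_func)(void *);

typedef struct {
    toy_func nb_add;
    toy_func nb_subtract;
} ToyNumberMethods;

typedef struct {
    const char *tp_name;
    toy_func tp_repr;
    ToyNumberMethods *tp_as_number;  /* a pointer, as in PyTypeObject */
} ToyTypeObject;

typedef struct {
    ToyTypeObject ht_type;           /* must be the first member */
    ToyNumberMethods as_number;      /* embedded, as in PyHeapTypeObject */
} ToyHeapTypeObject;

/* Same shape as TPSLOT / ETSLOT: only the struct passed to offsetof differs. */
#define TOY_TPSLOT(SLOT) offsetof(ToyTypeObject, SLOT)
#define TOY_ETSLOT(SLOT) offsetof(ToyHeapTypeObject, SLOT)
```

Because ht_type sits at offset 0, a TPSLOT-style offset equals the ETSLOT-style offset of the same field reached through ht_type, while every as_number entry lies past the end of the embedded type object.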
- Python predefines the whole set of slots in the slotdefs array:
Objects\typeobject.c
static slotdef slotdefs[] = {
    ... ...
    BINSLOT("__matmul__", nb_matrix_multiply, slot_nb_matrix_multiply,
            "@"),
    RBINSLOT("__rmatmul__", nb_matrix_multiply, slot_nb_matrix_multiply,
             "@"),
    IBSLOT("__imatmul__", nb_inplace_matrix_multiply, slot_nb_inplace_matrix_multiply,
           wrap_binaryfunc, "@="),
    MPSLOT("__len__", mp_length, slot_mp_length, wrap_lenfunc,
           "__len__($self, /)\n--\n\nReturn len(self)."),
    ... ...
};
- Macros such as BINSLOT and MPSLOT are simple wrappers around ETSLOT:
Objects\typeobject.c
... ...
#define AMSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    ETSLOT(NAME, as_async.SLOT, FUNCTION, WRAPPER, DOC)
#define SQSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    ETSLOT(NAME, as_sequence.SLOT, FUNCTION, WRAPPER, DOC)
#define MPSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
    ETSLOT(NAME, as_mapping.SLOT, FUNCTION, WRAPPER, DOC)
... ...
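The wrapping pattern can be reproduced on a toy table. In this sketch `TOY_ETSLOT`, `TOY_MPSLOT`, `TOY_SQSLOT` and the `Toy*` structs are hypothetical simplifications: each entry keeps only a name and an offset, and the table ends with an all-zero entry like slotdefs. It also shows the same name (__len__, __getitem__) mapping to two different slots.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*toy_func)(void *);
typedef struct { toy_func mp_length; toy_func mp_subscript; } ToyMappingMethods;
typedef struct { toy_func sq_length; toy_func sq_item; } ToySequenceMethods;

typedef struct {
    const char *tp_name;
    ToyMappingMethods *tp_as_mapping;
    ToySequenceMethods *tp_as_sequence;
} ToyTypeObject;

typedef struct {
    ToyTypeObject ht_type;
    ToyMappingMethods as_mapping;    /* mapping before sequence, as in CPython */
    ToySequenceMethods as_sequence;
} ToyHeapTypeObject;

typedef struct { const char *name; size_t offset; } ToySlotdef;

/* A generic macro plus thin wrappers that prepend the sub-table name. */
#define TOY_ETSLOT(NAME, SLOT) {NAME, offsetof(ToyHeapTypeObject, SLOT)}
#define TOY_MPSLOT(NAME, SLOT) TOY_ETSLOT(NAME, as_mapping.SLOT)
#define TOY_SQSLOT(NAME, SLOT) TOY_ETSLOT(NAME, as_sequence.SLOT)

static ToySlotdef toy_slotdefs[] = {
    TOY_MPSLOT("__len__", mp_length),
    TOY_MPSLOT("__getitem__", mp_subscript),
    TOY_SQSLOT("__len__", sq_length),
    TOY_SQSLOT("__getitem__", sq_item),
    {NULL, 0}                        /* all-zero terminator */
};
```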
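- Looking at slotdefs, you can see that operation names and operations do not correspond one to one: __eq__, __ne__ and others all map to tp_richcompare, while __getitem__ maps to both a mapping slot and a sequence slot. When one name corresponds to several operations, populating tp_dict could become ambiguous.
- For this reason the slots must be ordered by their offset. In the version shown here the slotdefs table is already written in offset order; init_slotdefs does not sort it, but only asserts that order (in debug builds) and interns each slot's name into name_strobj: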
Objects\typeobject.c
static int slotdefs_initialized = 0;
/* Initialize the slotdefs table by adding interned string objects for the
   names. */
static void
init_slotdefs(void)
{
    slotdef *p;
    if (slotdefs_initialized)
        return;
    for (p = slotdefs; p->name; p++) {
        /* Slots must be ordered by their offset in the PyHeapTypeObject. */
        assert(!p[1].name || p->offset <= p[1].offset);
        p->name_strobj = PyUnicode_InternFromString(p->name);
        if (!p->name_strobj || !PyUnicode_CHECK_INTERNED(p->name_strobj))
            Py_FatalError("Out of memory interning slotdef names");
    }
    slotdefs_initialized = 1;
}
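The ordering invariant that init_slotdefs asserts can be expressed as a small check. This sketch is an assumption-laden simplification: `ToySlotdef` and `toy_slotdefs_ordered` are hypothetical, the check returns a flag instead of using assert, and name interning is left out. Equal offsets are allowed, because several names can share one slot.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; size_t offset; } ToySlotdef;

/* Same condition as the assert in init_slotdefs: every entry's offset must
   not exceed the next entry's offset. Returns 1 if ordered, 0 otherwise. */
static int toy_slotdefs_ordered(const ToySlotdef *p)
{
    for (; p->name; p++)
        if (p[1].name && p->offset > p[1].offset)
            return 0;
    return 1;
}
```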
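3.2 Building the associations
- The ordered slots remain in slotdefs. The virtual machine walks slotdefs from beginning to end, creates a descriptor for each slot, and records in tp_dict a mapping from the operation name to that descriptor. This is done in add_operators: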
Objects\typeobject.c
static int
add_operators(PyTypeObject *type)
{
    PyObject *dict = type->tp_dict;
    slotdef *p;
    PyObject *descr;
    void **ptr;
    init_slotdefs();
    for (p = slotdefs; p->name; p++) {
        if (p->wrapper == NULL)
            continue;
        ptr = slotptr(type, p->offset);
        if (!ptr || !*ptr)
            continue;
        if (PyDict_GetItem(dict, p->name_strobj))
            continue;
        if (*ptr == (void *)PyObject_HashNotImplemented) {
            /* Classes may prevent the inheritance of the tp_hash
               slot by storing PyObject_HashNotImplemented in it. Make it
               visible as a None value for the __hash__ attribute. */
            if (PyDict_SetItem(dict, p->name_strobj, Py_None) < 0)
                return -1;
        }
        else {
            descr = PyDescr_NewWrapper(type, p, *ptr);
            if (descr == NULL)
                return -1;
            if (PyDict_SetItem(dict, p->name_strobj, descr) < 0) {
                Py_DECREF(descr);
                return -1;
            }
            Py_DECREF(descr);
        }
    }
    if (type->tp_new != NULL) {
        if (add_tp_new_wrapper(type) < 0)
            return -1;
    }
    return 0;
}
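The core rule of the loop above (skip empty slots, and never overwrite a name that is already in tp_dict) can be sketched with a toy dictionary. Everything here is hypothetical: `ToyDict` is a tiny fixed-size array instead of a real dict, each slot carries its already-resolved function pointer instead of an offset, and the raw pointer is stored where CPython would store a wrapper descriptor. With mapping slots ordered before sequence slots, the mapping version of __getitem__ wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_func)(void);
typedef struct { const char *name; toy_func fn; } ToyEntry;
typedef struct { ToyEntry items[8]; size_t len; } ToyDict;    /* toy tp_dict */
typedef struct { const char *name; toy_func fn; } ToySlot;    /* fn: resolved *ptr */

static void toy_mp_subscript(void) {}   /* stand-in mapping slot function */
static void toy_sq_item(void) {}        /* stand-in sequence slot function */

static toy_func toy_dict_get(const ToyDict *d, const char *name)
{
    for (size_t i = 0; i < d->len; i++)
        if (strcmp(d->items[i].name, name) == 0)
            return d->items[i].fn;
    return NULL;
}

/* Mirrors the core of add_operators on the toy types. */
static void toy_add_operators(ToyDict *d, const ToySlot *slots)
{
    for (const ToySlot *p = slots; p->name; p++) {
        if (p->fn == NULL)                   /* like !ptr || !*ptr */
            continue;
        if (toy_dict_get(d, p->name))        /* name already present: first wins */
            continue;
        if (d->len == sizeof d->items / sizeof d->items[0])
            return;                          /* toy capacity limit */
        d->items[d->len].name = p->name;     /* CPython stores a descriptor here */
        d->items[d->len].fn = p->fn;
        d->len++;
    }
}
```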
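- add_operators first calls init_slotdefs, which initializes the slot table (interning the names) exactly once.
- It then walks the slotdefs array and, for each slot, uses slotptr to find where the corresponding function pointer lives for this type. Slots without a wrapper, or whose function pointer is NULL for this type, are skipped.
- The virtual machine also checks whether the operation name already exists in tp_dict; if it does, no new association is created, so for a given name the first slot in offset order wins.
- Otherwise it creates a descriptor with PyDescr_NewWrapper and maps the operation name (slotdef.name_strobj) to that descriptor in tp_dict. As a special case, a tp_hash of PyObject_HashNotImplemented is exposed as __hash__ = None.
- The offset stored in a slot is relative to PyHeapTypeObject, where the method tables (as_number, as_mapping, ...) are embedded, whereas a PyTypeObject only holds pointers to those tables (tp_as_number, tp_as_mapping, ...). Because the two structures are laid out differently, slotptr is needed to convert a slot's offset into the address of the real function pointer: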
Objects\typeobject.c
static void **
slotptr(PyTypeObject *type, int ioffset)
{
    char *ptr;
    long offset = ioffset;
    /* Note: this depends on the order of the members of PyHeapTypeObject! */
    assert(offset >= 0);
    assert((size_t)offset < offsetof(PyHeapTypeObject, as_buffer));
    if ((size_t)offset >= offsetof(PyHeapTypeObject, as_sequence)) {
        ptr = (char *)type->tp_as_sequence;
        offset -= offsetof(PyHeapTypeObject, as_sequence);
    }
    else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_mapping)) {
        ptr = (char *)type->tp_as_mapping;
        offset -= offsetof(PyHeapTypeObject, as_mapping);
    }
    else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_number)) {
        ptr = (char *)type->tp_as_number;
        offset -= offsetof(PyHeapTypeObject, as_number);
    }
    else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_async)) {
        ptr = (char *)type->tp_as_async;
        offset -= offsetof(PyHeapTypeObject, as_async);
    }
    else {
        ptr = (char *)type;
    }
    if (ptr != NULL)
        ptr += offset;
    return (void **)ptr;
}
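- The checks start with PySequenceMethods, the last of these tables in PyHeapTypeObject, and work backwards. Since member offsets increase in declaration order, the first table whose start offset is not greater than the slot's offset is the table that contains the slot; the offset is then rebased onto that table as reached through the type object's pointer.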
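A reduced version of slotptr, as a sketch under stated assumptions: the `Toy*` structs and `toy_slotptr` are hypothetical, and only two sub-tables (number and sequence) are modeled instead of CPython's four. The logic is the same: pick the region from the back, subtract its start offset, and add the rest to the pointer stored in the type object.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*toy_func)(void *);
typedef struct { toy_func nb_add; toy_func nb_subtract; } ToyNumberMethods;
typedef struct { toy_func sq_length; toy_func sq_item; } ToySequenceMethods;

typedef struct {
    toy_func tp_repr;
    ToyNumberMethods *tp_as_number;      /* pointers in the type object... */
    ToySequenceMethods *tp_as_sequence;
} ToyTypeObject;

typedef struct {
    ToyTypeObject ht_type;
    ToyNumberMethods as_number;          /* ...but embedded in the heap type */
    ToySequenceMethods as_sequence;
} ToyHeapTypeObject;

/* Translate a heap-type offset into the address of the real function pointer. */
static void **toy_slotptr(ToyTypeObject *type, size_t offset)
{
    char *ptr;
    if (offset >= offsetof(ToyHeapTypeObject, as_sequence)) {
        ptr = (char *)type->tp_as_sequence;
        offset -= offsetof(ToyHeapTypeObject, as_sequence);
    }
    else if (offset >= offsetof(ToyHeapTypeObject, as_number)) {
        ptr = (char *)type->tp_as_number;
        offset -= offsetof(ToyHeapTypeObject, as_number);
    }
    else {
        ptr = (char *)type;              /* a field of the type object itself */
    }
    if (ptr != NULL)                     /* a missing sub-table yields NULL */
        ptr += offset;
    return (void **)ptr;
}

static void *toy_add(void *x) { return x; }  /* stand-in nb_add function */
```

If a type has no sequence table, any sequence slot resolves to NULL, which is why add_operators checks `!ptr` before dereferencing.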
- After add_operators, PyList_Type looks like this:
  [Figure: PyList_Type after add_operators]
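- The part hanging off PyList_Type.tp_as_mapping is fixed at compile time.
- The part hanging off tp_dict is built only when the Python runtime is initialized.
- After add_operators has added the operators of the PyTypeObject, PyType_Ready also calls add_methods, add_members and add_getset to add the tp_methods, tp_members and tp_getset tables defined in the PyTypeObject: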
Objects\typeobject.c
int
PyType_Ready(PyTypeObject *type)
{
    PyObject *dict, *bases;
    PyTypeObject *base;
    Py_ssize_t i, n;
    ... ...
    /* Add type-specific descriptors to tp_dict */
    if (add_operators(type) < 0)
        goto error;
    if (type->tp_methods != NULL) {
        if (add_methods(type, type->tp_methods) < 0)
            goto error;
    }
    if (type->tp_members != NULL) {
        if (add_members(type, type->tp_members) < 0)
            goto error;
    }
    if (type->tp_getset != NULL) {
        if (add_getset(type, type->tp_getset) < 0)
            goto error;
    }
    ... ...
}
- These add_* functions work much like add_operators, except that the descriptors they put into tp_dict are no longer PyWrapperDescrObject but PyMethodDescrObject, PyMemberDescrObject and PyGetSetDescrObject, respectively.
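A toy illustration of the shared pattern: each add_* function walks a NULL-terminated table and stores one descriptor of its own kind per entry. The enum, `ToyDict` and `toy_add_table` are hypothetical; only the four descriptor kinds correspond to the CPython types named above, and the duplicate-name handling of the real functions is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the four descriptor types mentioned in the text. */
typedef enum { TOY_WRAPPER, TOY_METHOD, TOY_MEMBER, TOY_GETSET } ToyDescrKind;

typedef struct { const char *name; ToyDescrKind kind; } ToyDescr;
typedef struct { ToyDescr items[16]; size_t len; } ToyDict;   /* toy tp_dict */

/* Walk a NULL-terminated name table and add one descriptor per entry.
   Duplicate-name handling is omitted in this sketch. */
static int toy_add_table(ToyDict *d, const char *const *names, ToyDescrKind kind)
{
    for (; *names != NULL; names++) {
        if (d->len == sizeof d->items / sizeof d->items[0])
            return -1;                   /* toy capacity limit */
        d->items[d->len].name = *names;
        d->items[d->len].kind = kind;
        d->len++;
    }
    return 0;
}
```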
3.2.1 A class that overrides a special list operation
demo.py
>>>class A(list):
>>>    def __repr__(self):
>>>        return "Hello!"
>>>if __name__ == '__main__':
>>>    print(f"{A()}")
Hello!
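- Whenever an object's repr is needed (through repr(), or here through str() when the f-string formats A()), the virtual machine ultimately calls the type's tp_repr.
- Under the normal layout, demo.py would end up in list_repr, since A inherits tp_repr from list; yet what actually runs is A.__repr__().
- This is because of the following entry in slotdefs: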
Objects\typeobject.c
static slotdef slotdefs[] = {
    ... ...
    TPSLOT("__repr__", tp_repr, slot_tp_repr, wrap_unaryfunc,
           "__repr__($self, /)\n--\n\nReturn repr(self)."),
    ... ...
};
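- When the virtual machine creates a class, it checks whether the class's tp_dict contains __repr__. Because <class A> defines __repr__, A's tp_repr is overridden and set to slot_tp_repr.
- So when the virtual machine invokes tp_repr, what actually runs is slot_tp_repr: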
Objects\typeobject.c
static PyObject *
slot_tp_repr(PyObject *self)
{
    PyObject *func, *res;
    _Py_IDENTIFIER(__repr__);
    int unbound;
    func = lookup_maybe_method(self, &PyId___repr__, &unbound);
    if (func != NULL) {
        res = call_unbound_noarg(unbound, func, self);
        Py_DECREF(func);
        return res;
    }
    PyErr_Clear();
    return PyUnicode_FromFormat("<%s object at %p>",
                                Py_TYPE(self)->tp_name, self);
}
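- slot_tp_repr looks up the object bound to the __repr__ attribute, namely the __repr__() function overridden in A's definition, which is in fact a PyFunctionObject, and calls it. If the lookup fails, it falls back to the default "<... object at ...>" string.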
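The redirection can be modeled in a few lines. This is a toy sketch, not CPython code: `ToyType`, `toy_slot_tp_repr` and `toy_fixup_repr` are hypothetical, the class dict holds a single entry, and the fallback string is simplified. It shows the two steps described above: at class creation the tp_repr slot is replaced by a generic dispatcher, and at call time the dispatcher looks __repr__ up in the class dict.

```c
#include <assert.h>
#include <string.h>

typedef struct ToyObject ToyObject;
typedef const char *(*toy_reprfunc)(ToyObject *);

typedef struct {
    const char *name;
    toy_reprfunc tp_repr;
    const char *dict_key;        /* one-entry stand-in for tp_dict */
    toy_reprfunc dict_value;
} ToyType;

struct ToyObject { ToyType *type; };

static const char *toy_list_repr(ToyObject *self) { (void)self; return "[]"; }
static const char *A_repr(ToyObject *self) { (void)self; return "Hello!"; }

/* Like slot_tp_repr: look __repr__ up on the type, else use a fallback. */
static const char *toy_slot_tp_repr(ToyObject *self)
{
    ToyType *t = self->type;
    if (t->dict_key && strcmp(t->dict_key, "__repr__") == 0)
        return t->dict_value(self);
    return "<object>";
}

/* At class creation: a class dict defining __repr__ redirects tp_repr. */
static void toy_fixup_repr(ToyType *t)
{
    if (t->dict_key && strcmp(t->dict_key, "__repr__") == 0)
        t->tp_repr = toy_slot_tp_repr;
}
```

Before the fixup, A's toy tp_repr is the inherited list-style function; afterwards every call through tp_repr is routed to the function stored under "__repr__".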
- For A, the memory layout after initialization is shown below:
  [Figure: memory layout of class A after initialization]