V. Garbage Collection in Python
3. The Mark-and-Sweep Method
3.2 Marking Garbage
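- Once the root object set has been found, the collector can start from the root objects. It follows the reference chains and marks, one object at a time, every piece of memory that must not be reclaimed.
- Objects in the root object set cannot be reclaimed. Therefore no object they reference, directly or indirectly, can be reclaimed either.
- Before starting from the root objects, the existing object list is first split in two:
  - one list holds the root object set and is called the root list;
  - the other list holds the remaining objects and is called the unreachable list.
- Python performs this split of the original list in move_unreachable: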
Modules/gcmodule.c
```c
/* Move the unreachable objects from young to unreachable. After this,
 * all objects in young have gc_refs = GC_REACHABLE, and all objects in
 * unreachable have gc_refs = GC_TENTATIVELY_UNREACHABLE. All tracked
 * gc objects not in young or unreachable still have gc_refs = GC_REACHABLE.
 * All objects in young after this are directly or indirectly reachable
 * from outside the original young; and all objects in unreachable are
 * not.
 */
static void
move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
{
    PyGC_Head *gc = young->gc.gc_next;
    /* Invariants: all objects "to the left" of us in young have gc_refs
     * = GC_REACHABLE, and are indeed reachable (directly or indirectly)
     * from outside the young list as it was at entry. All other objects
     * from the original young "to the left" of us are in unreachable now,
     * and have gc_refs = GC_TENTATIVELY_UNREACHABLE. All objects to the
     * left of us in 'young' now have been scanned, and no objects here
     * or to the right have been scanned yet.
     */
    while (gc != young) {
        PyGC_Head *next;
        if (_PyGCHead_REFS(gc)) {
            /* gc is definitely reachable from outside the
             * original 'young'. Mark it as such, and traverse
             * its pointers to find any other objects that may
             * be directly reachable from it. Note that the
             * call to tp_traverse may append objects to young,
             * so we have to wait until it returns to determine
             * the next object to visit.
             */
            PyObject *op = FROM_GC(gc);
            traverseproc traverse = Py_TYPE(op)->tp_traverse;
            assert(_PyGCHead_REFS(gc) > 0);
            _PyGCHead_SET_REFS(gc, GC_REACHABLE);
            (void) traverse(op,
                            (visitproc)visit_reachable,
                            (void *)young);
            next = gc->gc.gc_next;
            if (PyTuple_CheckExact(op)) {
                _PyTuple_MaybeUntrack(op);
            }
        }
        else {
            /* This *may* be unreachable. To make progress,
             * assume it is. gc isn't directly reachable from
             * any object we've already traversed, but may be
             * reachable from an object we haven't gotten to yet.
             * visit_reachable will eventually move gc back into
             * young if that's so, and we'll see it again.
             */
            next = gc->gc.gc_next;
            gc_list_move(gc, unreachable);
            _PyGCHead_SET_REFS(gc, GC_TENTATIVELY_UNREACHABLE);
        }
        gc = next;
    }
}
```
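The comment above move_unreachable states its postcondition. Afterwards, young holds exactly the objects that are reachable from outside the original young list, and unreachable holds the rest. As a point of reference, here is a minimal Python sketch that computes the same split in the "obvious" way: a breadth-first search starting from the objects whose gc_refs is still positive. This is not how CPython does it. The function name `reachable_partition` and the dict-based object representation are hypothetical.

```python
from collections import deque


def reachable_partition(objects, refs, gc_refs):
    """Split `objects` into (reachable, unreachable) by breadth-first search.

    objects -- names of the objects in the list being collected
    refs    -- name -> names it references
    gc_refs -- name -> references left after subtracting internal ones;
               a value > 0 means the object is referenced from outside (a root)
    """
    members = set(objects)
    roots = [o for o in objects if gc_refs.get(o, 0) > 0]
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in refs.get(current, []):
            # Only objects inside the list matter; anything else is ignored.
            if child in members and child not in seen:
                seen.add(child)
                queue.append(child)
    reachable = [o for o in objects if o in seen]
    unreachable = [o for o in objects if o not in seen]
    return reachable, unreachable


refs = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["e"], "e": ["d"]}
print(reachable_partition(["b", "d", "c", "a", "e"], refs, {"a": 1}))
# (['b', 'c', 'a'], ['d', 'e'])
```

The search needs its own queue and visited set. move_unreachable instead walks the young list itself in order and re-appends objects to it. That is why it must tolerate objects that are temporarily misclassified as unreachable.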
Modules/gcmodule.c
```c
/* A traversal callback for move_unreachable. */
static int
visit_reachable(PyObject *op, PyGC_Head *reachable)
{
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        const Py_ssize_t gc_refs = _PyGCHead_REFS(gc);
        if (gc_refs == 0) {
            /* This is in move_unreachable's 'young' list, but
             * the traversal hasn't yet gotten to it. All
             * we need to do is tell move_unreachable that it's
             * reachable.
             */
            _PyGCHead_SET_REFS(gc, 1);
        }
        else if (gc_refs == GC_TENTATIVELY_UNREACHABLE) {
            /* This had gc_refs = 0 when move_unreachable got
             * to it, but turns out it's reachable after all.
             * Move it back to move_unreachable's 'young' list,
             * and move_unreachable will eventually get to it
             * again.
             */
            gc_list_move(gc, reachable);
            _PyGCHead_SET_REFS(gc, 1);
        }
        /* Else there's nothing to do.
         * If gc_refs > 0, it must be in move_unreachable's 'young'
         * list, and move_unreachable will eventually get to it.
         * If gc_refs == GC_REACHABLE, it's either in some other
         * generation so we don't care about it, or move_unreachable
         * already dealt with it.
         * If gc_refs == GC_UNTRACKED, it must be ignored.
         */
        else {
            assert(gc_refs > 0
                   || gc_refs == GC_REACHABLE
                   || gc_refs == GC_UNTRACKED);
        }
    }
    return 0;
}
```
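To see the two functions working together, here is a toy Python model. It rests on several assumptions:

- a plain Python list stands in for the doubly linked PyGC_Head list;
- `Obj.refs` plays the role of tp_traverse;
- the two sentinel constants are arbitrary negative values, not CPython's actual ones;
- tuple untracking and other generations are not modeled.

The class `Obj` and the Python function bodies are illustrative only.

```python
# Toy stand-ins for CPython's sentinel gc_refs values (actual values differ).
GC_REACHABLE = -3
GC_TENTATIVELY_UNREACHABLE = -4


class Obj:
    """A toy container object.

    gc_refs -- references left after subtracting internal ones
               (> 0 means referenced from outside the young list)
    refs    -- objects this one points to (what tp_traverse would visit)
    """

    def __init__(self, name, gc_refs=0):
        self.name = name
        self.gc_refs = gc_refs
        self.refs = []


def visit_reachable(op, young, unreachable):
    if op.gc_refs == 0:
        # Not scanned yet and still in young: just flag it as reachable.
        op.gc_refs = 1
    elif op.gc_refs == GC_TENTATIVELY_UNREACHABLE:
        # Moved out earlier but reachable after all:
        # append it to young so the scan reaches it again.
        unreachable.remove(op)
        young.append(op)
        op.gc_refs = 1
    # gc_refs > 0 or GC_REACHABLE: nothing to do.


def move_unreachable(young, unreachable):
    i = 0
    while i < len(young):  # len() is re-read because visits may append
        op = young[i]
        if op.gc_refs:
            op.gc_refs = GC_REACHABLE
            for child in op.refs:  # plays the role of tp_traverse
                visit_reachable(child, young, unreachable)
            i += 1
        else:
            # Assume it is garbage for now; a later visit may move it back.
            young.pop(i)
            unreachable.append(op)
            op.gc_refs = GC_TENTATIVELY_UNREACHABLE


# a is referenced from outside; a -> b -> c -> a; d <-> e is an isolated cycle.
a, b, c, d, e = Obj("a", 1), Obj("b"), Obj("c"), Obj("d"), Obj("e")
a.refs, b.refs, c.refs, d.refs, e.refs = [b], [c], [a], [e], [d]
young, unreachable = [b, d, c, a, e], []
move_unreachable(young, unreachable)
print([o.name for o in young], [o.name for o in unreachable])
# ['a', 'b', 'c'] ['d', 'e']
```

The scan reaches b and c before a, so both are first moved to unreachable. Once a is scanned, b is pulled back, and scanning b pulls back c. The isolated cycle d <-> e stays behind as garbage.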
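- move_unreachable walks forward through the list of collectable objects one by one and checks each object's PyGC_Head.gc.gc_refs value.
- Note that it walks the list in order; it does not follow reference chains outward from the root object set.
- As a consequence, finding an object whose gc_refs is 0 does not prove that it is garbage. A root object further along the list may still reference it.
- The object is therefore only marked GC_TENTATIVELY_UNREACHABLE for now, although gc_list_move already moves it to the unreachable list.
- Suppose move_unreachable meets an object A whose gc_refs is non-zero. Then A is either a root object or reachable from one, so none of the objects A references can be reclaimed either.
- move_unreachable therefore calls A's traverse operation, which invokes visit_reachable on each object referenced by A: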
Modules/gcmodule.c
```c
static void
move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
{
    PyGC_Head *gc = young->gc.gc_next;
    while (gc != young) {
        PyGC_Head *next;
        if (_PyGCHead_REFS(gc)) {
            /* gc is definitely reachable from outside the
             * original 'young'. Mark it as such, and traverse
             * its pointers to find any other objects that may
             * be directly reachable from it. Note that the
             * call to tp_traverse may append objects to young,
             * so we have to wait until it returns to determine
             * the next object to visit.
             */
            PyObject *op = FROM_GC(gc);
            traverseproc traverse = Py_TYPE(op)->tp_traverse;
            assert(_PyGCHead_REFS(gc) > 0);
            _PyGCHead_SET_REFS(gc, GC_REACHABLE);
            (void) traverse(op,
                            (visitproc)visit_reachable,
                            (void *)young);
            next = gc->gc.gc_next;
            if (PyTuple_CheckExact(op)) {
                _PyTuple_MaybeUntrack(op);
            }
        }
... ...
```
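From Python you can see which objects a type's tp_traverse hands to a visit callback. `gc.get_referents()` returns the objects visited by its arguments' C-level tp_traverse methods. The small example below uses arbitrary variable names. It shows that a list reports every item, including a non-container str. Filtering out such objects is the job of the PyObject_IS_GC check at the top of visit_reachable.

```python
import gc

inner = [1, 2]            # a container element (a GC object)
outer = [inner, "tag"]    # "tag" is a str, which is not a GC object

referents = gc.get_referents(outer)          # what list's tp_traverse visits
print(any(r is inner for r in referents))    # True
print(any(r == "tag" for r in referents))    # True: non-GC items are visited too
print(gc.get_referents(42))                  # []: an int is not a GC object
```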
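- If an object referenced by A was earlier marked GC_TENTATIVELY_UNREACHABLE but is now reachable through A, it cannot be reclaimed either.
- Python therefore moves it from the unreachable list back to the original list.
- The reachable parameter here is the young list passed in by move_unreachable, i.e. the root object list.
- The code also sets this object's gc_refs to 1, marking it as reachable. When move_unreachable reaches it again, it is treated as non-reclaimable and its own references are traversed.
- Likewise, visit_reachable sets gc_refs to 1 for every object referenced by A whose gc_refs is 0. move_unreachable has not visited these objects yet, and setting gc_refs to 1 removes in advance the condition that would otherwise send them to the unreachable list.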
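- When move_unreachable finishes, the original single list has been split into two lists:
  - the young list holds the objects that are reachable and must be kept;
  - the unreachable list holds the garbage that was found, and these objects are what the collector will reclaim.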
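The end-to-end effect is visible from Python. Two objects that reference only each other never reach a reference count of 0. They survive until a collection puts them on the unreachable list and frees them. The sketch below assumes CPython, and the class name `Node` is arbitrary. It uses a weak reference to observe the object without keeping it alive. Automatic collection is disabled briefly so that only the explicit `gc.collect()` call can reclaim the cycle.

```python
import gc
import weakref


class Node:
    pass


gc.disable()              # keep automatic collections out of the way
gc.collect()              # start from a clean slate
a, b = Node(), Node()
a.peer, b.peer = b, a     # a <-> b: each is referenced only by the other
probe = weakref.ref(a)    # a weak reference does not keep `a` alive

del a, b                  # refcounts never drop to 0 because of the cycle
print(probe() is not None)   # True: nothing has reclaimed the cycle yet
found = gc.collect()         # the pair ends up in 'unreachable' and is freed
print(probe() is None)       # True
print(found >= 2)            # True: collect() returns the unreachable count
gc.enable()
```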
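- However, not all of these garbage objects can be reclaimed safely. The trouble comes from one special kind of container object: instances created from classes.
- When defining a class in Python, you can give it a special method, __del__, known as a finalizer.
- When an instance with a finalizer is destroyed, its finalizer is called first. The finalizer is the hook that developers provide for releasing resources when the object goes away.
- The problem is that every object left in the unreachable list is kept alive only by reference cycles and must be destroyed.
- Suppose the unreachable list holds two objects, and B's finalizer calls some operation on A. Safe collection then requires A to be reclaimed after B.
- But Python cannot guarantee any particular order when it reclaims garbage. A might be destroyed first, and B's finalizer would then access an A that no longer exists.
- Python therefore takes a conservative approach. Every PyInstanceObject with a finalizer in the unreachable list is moved into a PyListObject named garbage (exposed as gc.garbage). This is the classic Python 2 behavior. Since Python 3.4 (PEP 442), objects with __del__ in unreachable cycles are finalized and then freed, and only objects with a legacy tp_del finalizer are still moved into gc.garbage.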
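On CPython 3.4 and later (PEP 442, "Safe object finalization"), a cycle whose objects define __del__ is finalized and freed rather than parked in gc.garbage. The quick check below assumes such an interpreter. The class `Resource` and its attributes are arbitrary. The order in which the two finalizers run is not specified, so the result is sorted before printing.

```python
import gc


class Resource:
    finalized = []  # records which finalizers ran

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        Resource.finalized.append(self.name)


gc.collect()
x, y = Resource("x"), Resource("y")
x.other, y.other = y, x  # a cycle of two objects that both define __del__
del x, y
gc.collect()

print(sorted(Resource.finalized))                        # ['x', 'y']
print(any(isinstance(o, Resource) for o in gc.garbage))  # False
```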