The Python Descriptor Protocol

Abstract

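This article explains Python descriptors by answering three questions:

What is a descriptor?
How is a descriptor invoked?
How do non-data descriptors and data descriptors differ?

After answering these three questions, it walks through several examples of descriptors in use:

How @property works
Implementing @cached_property on top of property
The difference between Python functions and methods

If you are new to Python descriptors, it is worth reading the Descriptor How To Guide first. Readers who are less comfortable with English can also consult Chinese-language resources.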

What is a descriptor?

class Descriptor(object):
    
    def __init__(self):
        self.val = "init"
    
    def __get__(self, obj, type=None):
        """
        return value
        """
        return self.val
    
    def __set__(self, obj, value):
        """
        return None
        """
        self.val = value
        
    def __delete__(self, obj):
        """
        return None
        """
        del self.val
        
class A(object):
    
    x = Descriptor()
    
In [2]: a = A()

In [3]: a.x
Out[3]: 'init'

In [4]: a.x = "change"

In [5]: a.x
Out[5]: 'change'

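If a class that inherits from object implements any one of __get__, __set__, or __delete__, instances of that class are descriptors. In the example above, the Descriptor instance stored as the class attribute x of A is a descriptor, and a.x is routed through it.

Why must the class inherit from object? Because the descriptor mechanism only applies to new-style classes. In Python 3 every class is new-style, so the explicit object base only matters in Python 2.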
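To see where the descriptor actually lives, it helps to compare a read through the class __dict__ with a normal attribute read. The sketch below is written for Python 3 and uses a hypothetical Loud descriptor and Owner class that are not part of the original example.

```python
class Loud:
    """Hypothetical descriptor that records every __get__ call."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append((obj, objtype))
        return "from __get__"


class Owner:
    x = Loud()


o = Owner()
descriptor = Owner.__dict__["x"]  # reads the class __dict__, bypassing the protocol
result = o.x                      # normal attribute access, routed through Loud.__get__
```

Here descriptor is the Loud instance itself, while result is whatever __get__ returned.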

How is a descriptor invoked?

Descriptor invocation is built on Python's attribute access mechanism, so it is worth understanding that mechanism first.

Before Descriptor

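Before the descriptor mechanism existed, accessing a.x used the following default lookup order:

a.__dict__['x']
type(a).__dict__['x']
the __dict__ of each base class of type(a)

If none of these contains x, AttributeError is raised.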
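The descriptor-free lookup order (instance __dict__, then the class, then its base classes) can be sketched in a few lines. This is only an illustration, assuming Python 3 and an object that has a __dict__; plain_lookup, Base, and Child are hypothetical names.

```python
def plain_lookup(obj, name):
    """Sketch of attribute lookup with no descriptor support."""
    # 1. The instance's own __dict__
    if name in vars(obj):
        return vars(obj)[name]
    # 2. The class, then each base class in method resolution order
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    # 3. Nothing found
    raise AttributeError(name)


class Base:
    x = "base"


class Child(Base):
    pass


c = Child()
from_base = plain_lookup(c, "x")      # found on Base
c.__dict__["x"] = "instance"
from_instance = plain_lookup(c, "x")  # the instance entry now wins
```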

After Descriptor

With descriptors, attribute access on Python objects changed. The Python code below simulates the lookup:

def object_getattr(obj, name):
    value, cls = class_lookup(obj.__class__, name)
    
    # First check for a data descriptor
    if value is not None and hasattr(value, "__get__") and hasattr(value, "__set__"):
        return value.__get__(obj, cls)
    
    # Then check the object's own __dict__ (Object Attribute)
    w = obj.__dict__.get(name)
    if w is not None:
        return w
    
    if value is not None:
        
        # Check for a non-data descriptor
        if hasattr(value, "__get__"):
            return value.__get__(obj, cls)
        else:
            # Otherwise it is a plain class attribute
            return value
        
    raise AttributeError

    
def class_lookup(cls, name):
    value = cls.__dict__.get(name)
    if value is not None:
        return value, cls
    
    # Walk the base classes looking for the attribute
    for base in cls.__bases__:
        value, owner = class_lookup(base, name)
        if value is not None:
            return value, owner
        
    return None, None

The code above makes it easy to see how the descriptor precedence stressed in the How To Guide is enforced: Data Descriptor > Object Attribute > Non Data Descriptor.
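One limitation of a simulation that uses None to mean "not found" is that an attribute whose value really is None looks missing. The variant below is a sketch, not CPython's actual algorithm: it uses a sentinel, walks the MRO, and checks for __get__ and __set__ on the type of the value, which is closer to what the interpreter does. It assumes Python 3 and an object with a __dict__; lookup_on_class and object_getattr_sketch are hypothetical names.

```python
_MISSING = object()  # sentinel, so attributes whose value is None are still found


def lookup_on_class(cls, name):
    """Return the first match along the MRO, or _MISSING."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def object_getattr_sketch(obj, name):
    cls = type(obj)
    value = lookup_on_class(cls, name)
    value_type = type(value)
    has_get = value is not _MISSING and hasattr(value_type, "__get__")

    # 1. Data descriptor found on the class
    if has_get and (hasattr(value_type, "__set__") or hasattr(value_type, "__delete__")):
        return value_type.__get__(value, obj, cls)

    # 2. Entry in the instance __dict__
    if name in vars(obj):
        return vars(obj)[name]

    # 3. Non-data descriptor, then plain class attribute
    if has_get:
        return value_type.__get__(value, obj, cls)
    if value is not _MISSING:
        return value

    raise AttributeError(name)
```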

One special case worth sharing: if the object found in obj.__dict__ is itself a descriptor, the descriptor mechanism is not triggered. That is, after a.__dict__['x'] = Descriptor(), accessing a.x does not call __get__.

class Descriptor(object):
    
    def __init__(self, val):
        self.val = val
        
    def __get__(self, obj, type=None):
        return self.val
    
    def __set__(self, obj, val):
        self.val = val
        
class A(object):
    
    pass

In [2]: a = A()

In [3]: a.__dict__['x'] = Descriptor("descriptor")

In [4]: a.x
Out[4]: <__main__.Descriptor at 0x10b03c750>

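Attribute access on a class differs slightly from attribute access on an instance:

In Python a class is itself an object, created by a metaclass, so class attribute access replaces the class_lookup function above with a metaclass_lookup function.
The Object Attribute lookup is replaced by a walk over class.__mro__, and during that walk any object found that has a __get__ method has it called.
When __get__ is called for class attribute access, the obj argument is None and the type argument is the class.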
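The difference in the arguments passed to __get__ is easy to observe with a descriptor that simply returns what it received. This is a small Python 3 sketch; Recorder and Holder are hypothetical names.

```python
class Recorder:
    """Hypothetical non-data descriptor that returns the arguments it was given."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class Holder:
    x = Recorder()


h = Holder()
via_instance = h.x    # __get__ receives (h, Holder)
via_class = Holder.x  # __get__ receives (None, Holder)
```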

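Readers who want to explore class attribute lookup in more detail can read object-attribute-lookup-in-python.

How do non-data descriptors and data descriptors differ?

A non-data descriptor implements only __get__; a data descriptor implements __get__ together with __set__ (defining __delete__ also makes a descriptor a data descriptor).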

To make a data descriptor read-only, raise an exception (typically AttributeError) in __set__.
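A read-only data descriptor along these lines might look like the following sketch (Python 3; ReadOnly and Config are hypothetical names, and the error message is arbitrary).

```python
class ReadOnly:
    """Hypothetical read-only data descriptor."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor; raising makes it read-only
        raise AttributeError("read-only attribute")


class Config:
    version = ReadOnly("1.0")


cfg = Config()
try:
    cfg.version = "2.0"
    error = None
except AttributeError as exc:
    error = str(exc)
```

Because __set__ exists, the assignment never reaches cfg.__dict__, so the descriptor cannot be shadowed by an instance attribute.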

The main difference between the two is the precedence described above. When a descriptor has the same name as an entry in obj.__dict__:

If it is a data descriptor, the result of descriptor.__get__ is returned.
If it is a non-data descriptor, the value from obj.__dict__ is returned.

class NonDataDescriptor(object):
    
    def __init__(self):
        self.value = "nondata_descriptor"
        
    def __get__(self, instance, owner):
        return self.value
    
    
class DataDescriptor(object):
    
    def __init__(self):
        self.value = "data_descriptor"
        
    def __get__(self, instance, owner):
        return self.value
    
    def __set__(self, instance, value):
        self.value = value
        
        
class Container(object):
    
    d1 = NonDataDescriptor()
    d2 = DataDescriptor()
    
In [14]: c = Container()

In [15]: c.__dict__['d1'] = "nondata"

In [16]: c.__dict__['d2'] = "data"

In [17]: c.d1
Out[17]: 'nondata'

In [18]: c.d2
Out[18]: 'data_descriptor'

How @property works

In the Python source, property is implemented as a class; it offers a quick way to define a data descriptor.
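One consequence of property being a data descriptor is that an entry in the instance __dict__ cannot shadow it, even when no setter is defined. A short Python 3 sketch, with a hypothetical Circle class:

```python
class Circle:
    @property
    def radius(self):
        return 1


c = Circle()
c.__dict__["radius"] = 99  # try to shadow the property through the instance __dict__
shadowed = c.radius        # still goes through the property
has_set = hasattr(property, "__set__")
```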

A pure-Python implementation of property looks like this:

class Property(object):
    
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc
        
    def __get__(self, obj, type=None):
        if obj is None:
            return self
        
        if self.fget is None:
            raise AttributeError("UnReadable Attribute.")
            
        return self.fget(obj)
    
    
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set Attribute.")
            
        self.fset(obj, value)
        
        
    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete Attribute.")
            
        self.fdel(obj)
    
    def getter(self, fget):
        return type(self)(fget, fset=self.fset, fdel=self.fdel, doc=self.__doc__)
    
    def setter(self, fset):
        return type(self)(self.fget, fset, fdel=self.fdel, doc=self.__doc__)
    
    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

There are two common ways to use property:

# First style: decorators
@property
def a(self):
    ...

@a.setter
def a(self, value):
    ...

# Second style: call property() directly
def get_a(self):
    ...

def set_a(self, value):
    ...

a = property(get_a, set_a)

The @ symbol is often confusing; it is essentially equivalent to a = property(a).
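The equivalence can be checked by writing the same class both ways. A minimal Python 3 sketch, where Decorated and Manual are hypothetical names:

```python
class Decorated:
    @property
    def a(self):
        return "value"


class Manual:
    def a(self):
        return "value"

    a = property(a)  # what the @property line does for us


decorated_value = Decorated().a
manual_value = Manual().a
```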

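With the first style, the method decorated with setter must have the same name as the method decorated with property. Otherwise the name a is never rebound to the property that has the setter, so assigning to a fails; there is a Stack Overflow discussion of this problem as well.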
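What goes wrong with mismatched names can be seen in a small experiment. In this Python 3 sketch (Mismatch is a hypothetical class), a.setter returns a new property, which ends up bound to the name set_a, while a keeps the original getter-only property.

```python
class Mismatch:
    @property
    def a(self):
        return self._a

    @a.setter
    def set_a(self, value):  # different name: the new property is bound to set_a
        self._a = value


m = Mismatch()
try:
    m.a = 1
    a_is_settable = True
except AttributeError:
    a_is_settable = False

m.set_a = 2       # the property bound to set_a has both getter and setter
value_via_a = m.a
```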

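Implementing @cached_property on top of property

The @cached_property decorator is a widely used utility; Django and Werkzeug both ship similar implementations. Two points need care when implementing it:

Whether a change made through one instance pollutes class-level state
Whether updating the underlying value also updates the cached value

Here is an incorrect implementation I wrote at first: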

class CachedProperty(property):
    
    def __init__(self, *args, **kwargs):
        super(CachedProperty, self).__init__(*args, **kwargs)
        self._cached_property = None
        
    def __get__(self, obj, type=None):
        if obj is None:
            return self
        
        if self.fget is None:
            raise AttributeError("UnReadable Attribute.")
            
        if self._cached_property is None:
            self._cached_property = self.fget(obj)
            
        return self._cached_property
    
    
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set Attribute.")
            
        self.fset(obj, value)
        self._cached_property = self.fget(obj)
        
    def __delete__(self, obj):
        super(CachedProperty, self).__delete__(obj)
        self._cached_property = None
        
class Container(object):
    
    @CachedProperty
    def attr(self):
        return self._attr
    
    @attr.setter
    def attr(self, value):
        self._attr = value

What is wrong with this implementation? Its instances pollute state shared through the class:

In [11]: c1 = Container()

In [12]: c1.attr = "c1 attr"

In [13]: c1.attr
Out[13]: 'c1 attr'

In [14]: c2 = Container()

In [15]: c2.attr
Out[15]: 'c1 attr'

In [16]: c2.attr = "c2 attr"

In [17]: c1.attr
Out[17]: 'c2 attr'

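The cached value therefore must not be stored on the CachedProperty instance. That instance is a class attribute shared by every Container, so each Container instance could change the attr seen by all the others.

A correct implementation stores each instance's cache in that instance's own __dict__: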

class CachedProperty(property):

    def __init__(self, *args, **kwargs):
        super(CachedProperty, self).__init__(*args, **kwargs)

    def __get__(self, obj, type=None):
        if obj is None:
            return self

        if self.fget is None:
            raise AttributeError("Unreadable attribute")

        cached_key = self.fget.__name__
        if cached_key not in obj.__dict__:
            obj.__dict__[cached_key] = self.fget(obj)

        return obj.__dict__[cached_key]

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("Unset attribute")

        self.fset(obj, value)
        cached_key = self.fget.__name__
        obj.__dict__[cached_key] = self.fget(obj)

    def __delete__(self, obj):
        super(CachedProperty, self).__delete__(obj)
        # Drop the cached value if present; avoids KeyError when nothing was cached
        obj.__dict__.pop(self.fget.__name__, None)
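A different design, shown here only as a sketch, makes the cache a non-data descriptor instead of a property subclass. After the first access the computed value sits in the instance __dict__, which takes precedence over a non-data descriptor, so __get__ is not called again for that instance. The standard library's functools.cached_property (Python 3.8+) takes a similar approach. cached_property_sketch and Report are hypothetical names, and the code assumes Python 3.

```python
class cached_property_sketch:
    """Hypothetical non-data descriptor that caches per instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Store on the instance: later lookups find this entry before the descriptor
        obj.__dict__[self.name] = value
        return value


class Report:
    calls = 0  # counts how often summary is actually computed

    def __init__(self, label):
        self.label = label

    @cached_property_sketch
    def summary(self):
        Report.calls += 1
        return "summary of " + self.label


r1 = Report("r1")
r2 = Report("r2")
first = r1.summary   # computed
second = r1.summary  # served from r1.__dict__
other = r2.summary   # computed separately for r2
```

Deleting the entry with del r1.summary removes it from the instance __dict__ and forces a recomputation on the next access. The trade-off is that this version has no setter hook.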

The difference between Python functions and methods

In Python a function is implemented as a non-data descriptor. Expressed in Python code, Function looks like this:

import types

class Function(object):
    
    def __get__(self, obj, type=None):
        # Python 2 signature; Python 3 uses types.MethodType(self, obj)
        return types.MethodType(self, obj, type)

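When a function is defined in a class, reading it directly from the class __dict__ still gives a function object. It is not yet a method, because the function has not been bound to an instance.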

class A(object):
    
    def test(self):
        pass
    
In [2]: A.__dict__['test']
Out[2]: <function __main__.test>

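When the method is accessed through an instance, a.test is equivalent to types.MethodType(test, a, A) and returns a bound method: the test function is now bound to the instance a.
When it is accessed through the class, A.test is equivalent to types.MethodType(test, None, A) and returns an unbound method. This is Python 2 behaviour; Python 3 has no unbound methods, and A.test simply returns the function itself.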
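The binding step can be reproduced by hand in Python 3, where types.MethodType takes only the function and the instance. A short sketch (the return value of test is changed here purely so the call has something to show):

```python
import types


class A:
    def test(self):
        return "called on " + type(self).__name__


a = A()
func = A.__dict__["test"]           # plain function stored in the class
bound = func.__get__(a, A)          # what a.test does under the hood
manual = types.MethodType(func, a)  # Python 3 signature: (function, instance)
```

Both bound and manual carry the instance in __self__ and the original function in __func__.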

@classmethod

classmethod needs to bind the class. A Python implementation:

class classmethod(object):
    
    def __init__(self, f):
        self.f = f
        
    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
            
        def newfunc(*args, **kwargs):
            return self.f(klass, *args, **kwargs)
        
        return newfunc

So if a method is decorated with @classmethod, the first argument passed in is still the class, even when the method is called from an instance.
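This is simple to confirm with the built-in classmethod. In the Python 3 sketch below, Shape and Square are hypothetical classes and create just returns the class it received.

```python
class Shape:
    @classmethod
    def create(cls):
        return cls  # expose what was passed as the first argument


class Square(Shape):
    pass


from_class = Square.create()
from_instance = Square().create()
```

In both cases the first argument is Square, the class the lookup started from, not Shape where the method was defined.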

@staticmethod

staticmethod does not need to bind anything, neither the instance nor the class. A Python implementation:

class staticmethod(object):
    
    def __init__(self, f):
        self.f = f
        
    def __get__(self, obj, type=None):
        return self.f
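With the built-in staticmethod the same behaviour can be observed directly: the class __dict__ holds the staticmethod wrapper, while attribute access through either the class or an instance hands back the underlying function. A Python 3 sketch with a hypothetical Tools class:

```python
class Tools:
    @staticmethod
    def add(x, y):
        return x + y


raw = Tools.__dict__["add"]  # the staticmethod wrapper object
via_class = Tools.add        # the plain function
via_instance = Tools().add   # the same plain function, nothing bound
total = via_instance(1, 2)
```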

References:

Descriptor How To Guide
如何理解 Python 的 Descriptor? (How should one understand Python descriptors?)
object-attribute-lookup-in-python (highly recommended)