1. 介绍
空间自相关描述了空间上一系列点的属性分布模式。正相关反映了属性值在空间上的相似性,负相关则反映了差异性。这种自相关性显然不是随机产生的。
空间自相关有两种分析方式——全局自相关和局部自相关。顾名思义,全局自相关研究整个地图上属性分布模式的聚类情况。局部自相关则将关注点放在了整体的内部,去识别聚类或是热点,它可能和整体聚类模式相符,也可能反映了和整体模式相反的异质性。
2. 全局自相关
PySAL有5种全局自相关检验:Gamma值、Join Count、Moran’s I、Geary’s C、和Getis and Ord’s G。
2.1 Gamma值
2.2 Join Count
2.3 Moran's I
对n个空间单位中,属性y的的I值计算公式如下:
其中w_ij是空间权重,z_i = y_i-\bar{y},S_0=ΣΣw_ij。
下面以圣路易斯周围78个郡的蓄意谋杀率的案例研究为例:
先读取数据:
>>> import pysal
>>> import numpy as np
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
下面创建权重矩阵:
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
获取Moran's I的实例:
>>> mi = pysal.Moran(y, w, two_tailed=False)
>>> "%.3f"%mi.I
'0.244'
>>> mi.EI
-0.012987012987012988
>>> "%.5f"%mi.p_norm
'0.00014'
从结果可以看出,在假定谋杀率服从正态分布的情况下,观察值远高于期望值。
下面我们来看看mi对象的参数和属性:
class Moran(builtins.object)
| Moran's I Global Autocorrelation Statistic
|
| Parameters
| ----------
|
| y : array
| variable measured across n spatial units
| w : W
| spatial weights instance
| transformation : string
| weights transformation, default is row-standardized "r".
| Other options include "B": binary, "D":
| doubly-standardized, "U": untransformed
| (general weights), "V": variance-stabilizing.
| permutations : int
| number of random permutations for calculation of
| pseudo-p_values
| two_tailed : boolean
| If True (default) analytical p-values for Moran are two
| tailed, otherwise if False, they are one-tailed.
|
| Attributes
| ----------
| y : array
| original variable
| w : W
| original w object
| permutations : int
| number of permutations
| I : float
| value of Moran's I
| EI : float
| expected value under normality assumption
| VI_norm : float
| variance of I under normality assumption
| seI_norm : float
| standard deviation of I under normality assumption
| z_norm : float
| z-value of I under normality assumption
| p_norm : float
| p-value of I under normality assumption
| VI_rand : float
| variance of I under randomization assumption
| seI_rand : float
| standard deviation of I under randomization assumption
| z_rand : float
| z-value of I under randomization assumption
| p_rand : float
| p-value of I under randomization assumption
| two_tailed : boolean
| If True p_norm and p_rand are two-tailed, otherwise they
| are one-tailed.
| sim : array
| (if permutations>0)
| vector of I values for permuted samples
| p_sim : array
| (if permutations>0)
| p-value based on permutations (one-tailed)
| null: spatial randomness
| alternative: the observed I is extreme if
| it is either extremely greater or extremely lower
| than the values obtained based on permutations
| EI_sim : float
| (if permutations>0)
| average value of I from permutations
| VI_sim : float
| (if permutations>0)
| variance of I from permutations
| seI_sim : float
| (if permutations>0)
| standard deviation of I under permutations.
| z_sim : float
| (if permutations>0)
| standardized I based on permutations
| p_z_sim : float
| (if permutations>0)
| p-value based on standard normal approximation from
| permutations
哦,原来我们的推断不一定要基于正态假设,也可以基于随机置换的一组值。
>>> np.random.seed(10)
>>> mir = pysal.Moran(y, w, permutations = 9999)
>>> // 得到伪p值:
>>> print(mir.p_sim)
0.0022
关于置换检验
这里观察到21个置换值和实际的I值一样极端,因此p值就是(21+1)/(9999+1)=0.0022。或者,我们也可使用Z变换,将置换检验中得到的I值和原来的I值比较:
>>> print(mir.EI_sim)
-0.0118217511619
>>> print(mir.z_sim)
4.55451777821
>>> print(mir.p_z_sim)
2.62529422013e-06
如果我们的研究变量y是基于不同数量总体的比率,那么y的I值就需要调整,以解释总体间的差异。
为了进行这项调整,我们需要创建一个Moran_Rate的实例,而不是Moran的实例。比方说,我们打算估计死于婴儿瘁死综合症的新生儿比率。下面先读取数据,再创建W实例和Moran_Rate实例。
>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))
>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()
>>> mi = pysal.esda.moran.Moran_Rate(e, b, w, two_tailed=False)
>>> "%6.4f" % mi.I
'0.1662'
>>> "%6.4f" % mi.EI
'-0.0101'
>>> "%6.4f" % mi.p_norm
'0.0042'
显然,在调整后,I值远高于期望值。
2.4 Geary's C
2.5 Getis and Ord's G
3. 局部自相关
要定量地度量局部自相关性,PySAL提供了Moran's I和Getis and Ord's G的空间关联局部指标(LISA)。
3.1 Local Moran's I
先给出局部自相关I值的公式:
可以计算出n个局部空间自相关I值,每个空间单元1个。依然拿圣路易斯为例,LISA统计量可以这样获得:
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
>>> np.random.seed(12345)
>>> lm = pysal.Moran_Local(y,w)
>>> lm.n
78
>>> len(lm.Is)
78
78个LISA存储在向量lm.ls中。对这些值的推断由条件随机获得(除了i以外的n-1个空间单元被用来生成i的LISA统计量),从而得到每个LISA的伪p值。
>>> lm.p_sim
array([ 0.176, 0.073, 0.405, 0.267, 0.332, 0.057, 0.296, 0.242,
0.055, 0.062, 0.273, 0.488, 0.44 , 0.354, 0.415, 0.478,
0.473, 0.374, 0.415, 0.21 , 0.161, 0.025, 0.338, 0.375,
0.285, 0.374, 0.208, 0.3 , 0.373, 0.411, 0.478, 0.414,
0.009, 0.429, 0.269, 0.015, 0.005, 0.002, 0.077, 0.001,
0.088, 0.459, 0.435, 0.365, 0.231, 0.017, 0.033, 0.04 ,
0.068, 0.101, 0.284, 0.309, 0.113, 0.457, 0.045, 0.269,
0.118, 0.346, 0.328, 0.379, 0.342, 0.39 , 0.376, 0.467,
0.357, 0.241, 0.26 , 0.401, 0.185, 0.172, 0.248, 0.4 ,
0.482, 0.159, 0.373, 0.455, 0.083, 0.128])
numpy的索引可以帮助我们得到显著的p值:
>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.025, 0.009, 0.015, 0.005, 0.002, 0.001, 0.017, 0.033,
0.04 , 0.045])
也可以获得这些显著值在莫兰指数散点图中的象限:
(1:HH高聚类,2:LH高包围低,3:LL低值聚类,4:HL低包围高)
>>> lm.q[sig]
array([4, 1, 3, 1, 3, 1, 1, 3, 3, 3])
和全局莫兰指数一样,如果是比率,也要进行调整,同样以新生儿数据为例:
>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))
>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()
>>> np.random.seed(12345)
>>> lm = pysal.esda.moran.Moran_Local_Rate(e, b, w)
>>> lm.Is[:10]
array([-0.13452366, -1.21133985, 0.05019761, 0.06127125, -0.12627466,
0.23497679, 0.26345855, -0.00951288, -0.01517879, -0.34513514])
获取显著的p值:
>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.021, 0.04 , 0.047, 0.015, 0.001, 0.017, 0.032, 0.031,
0.019, 0.014, 0.004, 0.048, 0.003])