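Python provides a consistent way to iterate over sequences. This feature is implemented through the iterator protocol, a generic way to make objects iterable. For example, iterating over a dict yields its keys: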
In [1]: some_dict = {'a': 1, 'b': 2, 'c': 3}
In [2]: for key in some_dict:
   ...:     print(key)
   ...:
a
b
c
When you write for key in some_dict, the Python interpreter first tries to create an iterator from some_dict:
In [3]: dict_iterator = iter(some_dict)
In [4]: dict_iterator
Out[4]: <dict_keyiterator at 0x501d318>
Most methods that expect a list or list-like object as an argument will also accept any iterable object, including an iterator:
In [5]: tuple(dict_iterator)
Out[5]: ('a', 'b', 'c')
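The for loop shown earlier can be unrolled by hand to make the iterator protocol visible. What follows is only a minimal sketch: the helper name manual_for is hypothetical and used purely for illustration, and the key order assumes Python 3.7 or later, where dicts preserve insertion order.

```python
def manual_for(iterable):
    """Collect items roughly the way a for loop would, using iter() and next()."""
    iterator = iter(iterable)      # ask the object for an iterator
    items = []
    while True:
        try:
            item = next(iterator)  # request the next element
        except StopIteration:      # raised once the iterator is exhausted
            break
        items.append(item)
    return items


some_dict = {'a': 1, 'b': 2, 'c': 3}
keys = manual_for(some_dict)           # ['a', 'b', 'c']

# An iterator is single-pass: once consumed, it has nothing left to give.
dict_iterator = iter(some_dict)
first_pass = tuple(dict_iterator)      # ('a', 'b', 'c')
second_pass = tuple(dict_iterator)     # ()
```

The second call to tuple returns an empty tuple because dict_iterator was already consumed; calling iter(some_dict) again would produce a fresh iterator.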
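A generator is a concise way to construct a new iterable object. To create a generator, use the yield keyword in a function instead of return. When you call the generator, no code runs immediately; its body only starts executing once you request elements from it.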
In [7]: def squares(n=10):
   ...:     print('Generating squares from 1 to {0}'.format(n ** 2))
   ...:     for i in range(1, n + 1):
   ...:         yield i ** 2
   ...:
In [8]: gen = squares()
In [9]: gen
Out[9]: <generator object squares at 0x0000000005099048>
In [12]: gen = squares()
In [13]: for x in gen:
    ...:     print(x, end=' ')
    ...:
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100
In [14]: gen = squares()
In [15]: for x in gen:
    ...:     print(x, end=',')
    ...:
Generating squares from 1 to 100
1,4,9,16,25,36,49,64,81,100,
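The lazy behavior can be seen more directly by pulling values one at a time with the built-in next(). This is a small sketch that reuses the squares function from above with n=3 to keep it short; the variable names first, second, rest, and leftover are illustrative only.

```python
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2


gen = squares(3)       # nothing is printed yet: the body has not started
first = next(gen)      # the body runs up to the first yield; the message prints here
second = next(gen)     # execution resumes right after the previous yield
rest = list(gen)       # drains whatever values remain
leftover = list(gen)   # an exhausted generator yields nothing more
```

This is also why the sessions above call squares() again before each loop: a generator object can only be consumed once.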
A generator expression is written like a list comprehension, with the square brackets replaced by parentheses:
In [16]: gen = (x ** 2 for x in range(100))
In [17]: gen
Out[17]: <generator object <genexpr> at 0x000000000522E7D8>
This is equivalent to the following, more verbose generator:
In [18]: def _make_gen():
    ...:     for x in range(100):
    ...:         yield x ** 2
    ...:
In [19]: gen = _make_gen()
In some cases, a generator expression can be passed as a function argument in place of a list comprehension:
In [20]: sum(x ** 2 for x in range(100))
Out[20]: 328350
In [21]: dict((i, i ** 2) for i in range(5))
Out[21]: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
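The practical difference between the two forms is when the values are produced. The sketch below compares them side by side; the variable names are illustrative and do not come from the original notes.

```python
squares_list = [x ** 2 for x in range(100)]   # builds all 100 values up front
squares_gen = (x ** 2 for x in range(100))    # produces values only on demand

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)    # consumes the generator while summing

# The list can be traversed again; the generator is already exhausted.
list_total_again = sum(squares_list)
gen_total_again = sum(squares_gen)   # sums an empty stream, so the result is 0
```

Both forms give the same total on the first pass. The generator version avoids materializing an intermediate list, at the cost of being usable only once.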
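The itertools module
The standard library's itertools module provides a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, and groups consecutive elements of the sequence by the function's return value. (《利用Python进行数据分析》, Python for Data Analysis, p. 78)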
In [26]: import itertools
In [27]: first_letter = lambda x: x[0]
In [28]: names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
    ...:
    ...: for letter, group in itertools.groupby(names, first_letter):
    ...:     print(letter, list(group))  # group is an iterator
    ...:
A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
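Note that 'A' appears twice in the output above, because groupby only groups elements that are adjacent. If the goal is one group per key, a common approach is to sort by the same key first. The following is a minimal sketch; the names sorted_names and grouped are illustrative and not from the original example.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sort by the grouping key so that equal keys become adjacent.
# sorted() is stable, so names keep their original relative order within a group.
sorted_names = sorted(names, key=first_letter)

grouped = {
    letter: list(group)  # materialize each group before moving to the next one
    for letter, group in itertools.groupby(sorted_names, first_letter)
}

for letter, members in grouped.items():
    print(letter, members)
```

With the sort in place, 'Albert' lands in the same group as 'Alan' and 'Adam' rather than forming a separate 'A' group.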