Python Cookbook笔记

Python Cookbook

标签(空格分隔): Python


Data Structures and Algorithms

Iteration is fundamental to data processing. And when scanning datasets that don’t fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about.

The yield keyword allows the construction of generators, which work as iterators.

Every generator is an iterator: generators fully implement the iterator interface. But an iterator—as defined in the GoF book - retrieves items from a collection.

Every collection in Python is iterable, and iterators are used internally to support:

  • for loops
  • Collection types construction and extension
  • Looping over text files line by line
  • List, dict, and set comprehensions
  • Tuple unpacking
  • Unpacking actual parameters with * in function calls

Sentence Take #1: A Sequence of Words

Example 14-1. sentence.py: A Sentence as a sequence of words.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text) 

    def __getitem__(self, index):
        return self.words[index]  

    def __len__(self, index): 
        return len(self.words)

    def __repr__(self):
        # reprlib.repr is a utility function to generate abbreviated string representations of data structures that can be very large.
        return 'Sentence(%s)' % reprlib.repr(self.text)

Example 14-2. Testing iteration on a Sentence instance

>>> s = Sentence('"The time has come," the Walrus said,')
>>> s
Sentence('"The time ha... Walrus said,') 
>>> for word in s: 
...     print(word)
The
time
has
come
the
Walrus
said
>>> list(s)
['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']
>>> s[0]
'The'
>>> s[5]
'Walrus'
>>> s[-1]
'said'

Why Sequences Are Iterable: The iter Function

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).

The iter built-in function:

  1. Checks whether the object implements iter, and calls that to obtain an iterator.
  2. If __iter__ is not implemented, but __getitem__ is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero).
  3. If that fails, Python raises TypeError, usually saying “C object is not iterable,” where C is the class of the target object.

That is why any Python sequence is iterable: they all implement __getitem__. In fact, the standard sequences also implement __iter__, and yours should too, because the special handling of __getitem__ exists for backward compatibility reasons and may be gone in the future.

An object is considered iterable if it implements the __iter__ method. No subclassing or registration is required.

>>> class Foo:
...     def __iter__(self):
...         pass
...
>>> from collections import abc
>>> issubclass(Foo, abc.Iterable)
True
>>> f = Foo()
>>> isinstance(f, abc.Iterable)
True

As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle a TypeError exception if it isn’t. This is more accurate than using isinstance(x, abc.Iterable), because iter(x) also considers the legacy __getitem__ method, while the Iterable ABC does not.

Iterables Versus Iterators

It’s important to be clear about the relationship between iterables and iterators: Python obtains iterators from iterables.

Here is a simple for loop iterating over a str. The str 'ABC' is the iterable here.

>>> s = 'ABC'
>>> for char in s:
... print(char)
...
A
B
C

If there was no for statement and we had to emulate the for machinery by hand with a while loop.

>>> s = 'ABC'
>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it)) 
...     except StopIteration:
...         del it
...         break
...
A
B
C

StopIteration signals that the iterator is exhausted. This exception is handled internally in for loops and other iteration contexts like list comprehensions, tuple unpacking, etc.

The standard interface for an iterator has two methods:

  • __next__: Returns the next available item, raising StopIteration when there are no more items.
  • __iter__: Returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.
p018.png-109.4kB
p018.png-109.4kB

Example 14-3. abc.Iterator class; extracted from Lib/_collections_abc.py

class Iterator(Iterable):
    __slots__ = ()
    
    @abstractmethod
    def __next__(self):
    'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration
        
    def __iter__(self):
        return self
        
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            if (any("__next__" in B.__dict__ for B in C.__mro__) and
                any("__iter__" in B.__dict__ for B in C.__mro__)):
                return True
            return NotImplemented

The best way to check if an object x is an iterator is to call isinstance(x, abc.Iterator). Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.

You can clearly see how the iterator is built by iter(…) and consumed by next(…) using the Python console.

>>> s3 = Sentence('Pig and Pepper') 
>>> it = iter(s3) 
>>> it
<iterator object at 0x...>
>>> next(it) 
'Pig'
>>> next(it)
'and'
>>> next(it)
'Pepper'
>>> next(it) 
Traceback (most recent call last):
...
StopIteration
>>> list(it) 
[]
>>> list(iter(s3)) 
['Pig', 'and', 'Pepper']

Because the only methods required of an iterator are __next__ and __iter__, there is no way to check whether there are remaining items, other than to call next() and catch StopInteration. Also, it’s not possible to “reset” an iterator. If you need to start over, you need to call iter(…) on the iterable that built the iterator in the first place.

  • iterator: Any object that implements the __next__ no-argument method that returns the next item in a series or raises StopIteration when there are no more items. Python iterators also implement the __iter__ method so they are iterable as well.

Sentence Take #2: A Classic Iterator

The next Sentence class is built according to the classic Iterator design pattern following the blueprint in the GoF book.

Example 14-4. sentence_iter.py: Sentence implemented using the Iterator pattern.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):  
        # __iter__ fulfills the iterable protocol by instantiating and returning an iterator.
        return SentenceIterator(self.words) 


class SentenceIterator:

    def __init__(self, words):
        self.words = words 
        self.index = 0 

    def __next__(self):
        try:
            word = self.words[self.index]  
        except IndexError:
            raise StopIteration()  
        self.index += 1  
        return word  

    def __iter__(self): 
        return self

Note that implementing __iter__ in SentenceIterator is not actually needed for this example to work, but the it’s the right thing to do: iterators are supposed to implement both __next__ and __iter__, and doing so makes our iterator pass the issubclass(SentenceInterator, abc.Iterator) test.

Making Sentence an Iterator: Bad Idea

A common cause of errors in building iterables and iterators is to confuse the two. To be clear: iterables have an __iter__ method that instantiates a new iterator every time. Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.

Therefore, iterators are also iterable, but iterables are not iterators.

It may be tempting to implement __next__ in addition to __iter__ in the Sentence class, making each Sentence instance at the same time an iterable and iterator over itself. But this is a terrible idea.

A proper implementation of the pattern requires each call to iter(my_itera ble) to create a new, independent, iterator. That is why we need the SentenceIterator class in this example.

An iterable should never act as an iterator over itself. In other words, iterables must implement __iter__, but not __next__. On the other hand, for convenience, iterators should be iterable. An iterator’s __iter__ should just return self.

Sentence Take #3: A Generator Function

A Pythonic implementation of the same functionality uses a generator function to replace the SequenceIterator class.

Example 14-5. sentence_gen.py: Sentence implemented using a generator function.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words: 
            yield word 
        return #This return is not needed

How a Generator Function Works

Any Python function that has the yield keyword in its body is a generator function.

Here is the simplest function useful to demonstrate the behavior of a generator:

>>> def gen_123(): 
...     yield 1
...     yield 2
...     yield 3
...
>>> gen_123
<function gen_123 at 0x...>
>>> gen_123()
<generator object gen_123 at 0x...> 
>>> for i in gen_123(): 
...     print(i)
1
2
3
>>> g = gen_123()
>>> next(g) 
1
>>> next(g)
2
>>> next(g)
3
>>> next(g) 
Traceback (most recent call last):
...
StopIteration

When we invoke next(…) on the generator object, execution advances to the next yield in the function body, and the next(…) call evaluates to the value yielded when the function body is suspended.

Example 14-6. A generator function that prints messages when it runs.

>>> def gen_AB():
...     print('start')
...     yield 'A' 
...     print('continue')
...     yield 'B' 
...     print('end.') 
...
>>> for c in gen_AB(): 
...     print('-->', c) 
...
start
--> A
continue
--> B
end.

In Example 14-5 __iter__ is a generator function which, when called, builds a generator object that implements the iterator interface, so the SentenceIterator class is no longer needed.

This second version of Sentence is much shorter than the first, but it’s not as lazy as it could be. Nowadays, laziness is considered a good trait, at least in programming languages and APIs. A lazy implementation postpones producing values to the last possible moment. This saves memory and may avoid useless processing as well.

Sentence Take #4: A Lazy Implementation

The Iterator interface is designed to be lazy: next(my_iterator) produces one item at a time.

Our Sentence implementations so far have not been lazy because the __init__ eagerly builds a list of all words in the text, binding it to the self.words attribute. This will entail processing the entire text, and the list may use as much memory as the text itself.

The re.finditer function is a lazy version of re.findall which, instead of a list, returns a generator producing re.MatchObject instances on demand.

Example 14-7. sentence_gen2.py: Sentence implemented using a generator function calling the re.finditer generator function.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for match in RE_WORD.finditer(self.text): 
            yield match.group()

Generator functions are an awesome shortcut, but the code can be made even shorter with a generator expression.

Sentence Take #5: A Generator Expression

A generator expression can be understood as a lazy version of a list comprehension. In other words, if a list comprehension is a factory of lists, a generator expression is a factory of generators.

Example 14-8. The gen_AB generator function is used by a list comprehension, then by a generator expression.

>>> def gen_AB(): 
...     print('start')
...     yield 'A'
...     print('continue')
...     yield 'B'
...     print('end.')
...
>>> res1 = [x*3 for x in gen_AB()]
start
continue
end.
>>> for i in res1:
...     print('-->', i)
...
--> AAA
--> BBB
>>> res2 = (x*3 for x in gen_AB())
>>> res2
<generator object <genexpr> at 0x10063c240>
>>> for i in res2:
...     print('-->', i)
...
start
--> AAA
continue
--> BBB
end.

Example 14-9. sentence_genexp.py: Sentence implemented using a generator expression.

import re
import reprlib

RE_WORD = re.compile('\w+')


class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

Generator Expressions: When to Use Them

In Example 14-9, we saw that a generator expression is a syntactic shortcut to create a generator without defining and calling a function. On the other hand, generator functions are much more flexible: you can code complex logic with multiple statements, and can even use them as coroutines(协同程序).

Another Example: Arithmetic Progression Generator

Example 14-11. The ArithmeticProgression class.

class ArithmeticProgression:

    def __init__(self, begin, step, end=None):  
        self.begin = begin
        self.step = step
        self.end = end  # None -> "infinite" series

    def __iter__(self):
        result = type(self.begin + self.step)(self.begin)  
        forever = self.end is None 
        index = 0
        while forever or result < self.end: 
            yield result 
            index += 1
            result = self.begin + self.step * index 

Example 14-10. Demonstration of an ArithmeticProgression class.

>>> ap = ArithmeticProgression(0, 1, 3)
>>> list(ap)
[0, 1, 2]
>>> ap = ArithmeticProgression(1, .5, 3)
>>> list(ap)
[1.0, 1.5, 2.0, 2.5]
>>> ap = ArithmeticProgression(0, 1/3, 1)
>>> list(ap)
[0.0, 0.3333333333333333, 0.6666666666666666]
>>> from fractions import Fraction
>>> ap = ArithmeticProgression(0, Fraction(1, 3), 1)
>>> list(ap)
[Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]
>>> from decimal import Decimal
>>> ap = ArithmeticProgression(0, Decimal('.1'), .3)
>>> list(ap)
[Decimal('0.0'), Decimal('0.1'), Decimal('0.2')]

Example 14-12 shows a generator function called aritprog_gen that does the same job as ArithmeticProgression but with less code.

Example 14-12. The aritprog_gen generator function

def aritprog_gen(begin, step, end=None):
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index

There are plenty of ready-to-use generators in the standard library, and the next section will show an even cooler implementation using the itertools module.

Arithmetic Progression with itertools

The itertools module in Python 3.4 has 19 generator functions that can be combined in a variety of interesting ways.

For example, the itertools.count function returns a generator that produces numbers.

>>> import itertools
>>> gen = itertools.count(1, .5)
>>> next(gen)
1
>>> next(gen)
1.5
>>> next(gen)
2.0
>>> next(gen)
2.5

However, itertools.count never stops, so if you call list(count()), Python will try to build a list larger than available memory and your machine will be very grumpy long before the call fails.

On the other hand, there is the itertools.takewhile function: it produces a generator that consumes another generator and stops when a given predicate evaluates to False.

>>> gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))
>>> list(gen)
[1, 1.5, 2.0, 2.5]

Example 14-13. aritprog_v3.py: this works like the previous aritprog_gen functions.

import itertools


def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is not None:
        ap_gen = itertools.takewhile(lambda n: n < end, ap_gen)
    return ap_gen

When implementing generators, know what is available in the standard library, otherwise there’s a good chance you’ll reinvent the wheel.

Generator Functions in the Standard Library

The os.walk generator function is impressive.

I summarize two dozen of them, from the built-in, itertools, and functools modules.

The first group are filtering generator functions: they yield a subset of items produced by the input iterable, without changing the items themselves.

p019.png-82.3kB
p019.png-82.3kB

p020.png-70.2kB
p020.png-70.2kB

Example 14-14. Filtering generator functions examples.

>>> def vowel(c):
...     return c.lower() in 'aeiou'
...
>>> list(filter(vowel, 'Aardvark'))
['A', 'a', 'a']
>>> import itertools
>>> list(itertools.filterfalse(vowel, 'Aardvark'))
['r', 'd', 'v', 'r', 'k']
>>> list(itertools.dropwhile(vowel, 'Aardvark'))
['r', 'd', 'v', 'a', 'r', 'k']
>>> list(itertools.takewhile(vowel, 'Aardvark'))
['A', 'a']
>>> list(itertools.compress('Aardvark', (1,0,1,1,0,1)))
['A', 'r', 'd', 'a']
>>> list(itertools.islice('Aardvark', 4))
['A', 'a', 'r', 'd']
>>> list(itertools.islice('Aardvark', 4, 7))
['v', 'a', 'r']
>>> list(itertools.islice('Aardvark', 1, 7, 2))
['a', 'd', 'a']

The next group are the mapping generators: they yield items computed from each individual item in the input iterable—or iterables, in the case of map and starmap.

p021.png-147kB
p021.png-147kB

Example 14-15. itertools.accumulate generator function examples.

>>> sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
>>> import itertools
>>> list(itertools.accumulate(sample)) 
[5, 9, 11, 19, 26, 32, 35, 35, 44, 45]
>>> list(itertools.accumulate(sample, min)) 
[5, 4, 2, 2, 2, 2, 2, 0, 0, 0]
>>> list(itertools.accumulate(sample, max)) 
[5, 5, 5, 8, 8, 8, 8, 8, 9, 9]
>>> import operator
>>> list(itertools.accumulate(sample, operator.mul)) 
[5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]
>>> list(itertools.accumulate(range(1, 11), operator.mul))
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

Example 14-16. Mapping generator function examples.

>>> list(enumerate('albatroz', 1))
[(1, 'a'), (2, 'l'), (3, 'b'), (4, 'a'), (5, 't'), (6, 'r'), (7, 'o'), (8, 'z')]
>>> import operator
>>> list(map(operator.mul, range(11), range(11)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> list(map(operator.mul, range(11), [2, 4, 8]))
[0, 4, 16]
>>> list(map(lambda a, b: (a, b), range(11), [2, 4, 8]))
[(0, 2), (1, 4), (2, 8)]
>>> import itertools
>>> list(itertools.starmap(operator.mul, enumerate('albatroz', 1))) 
['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']
>>> sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
>>> list(itertools.starmap(lambda a, b: b/a,
...     enumerate(itertools.accumulate(sample), 1))) 
[5.0, 4.5, 3.6666666666666665, 4.75, 5.2, 5.333333333333333, 5.0, 4.375, 4.888888888888889, 4.5]

Next, we have the group of merging generators—all of these yield items from multiple input iterables. chain and chain.from_iterable consume the input iterables sequentially (one after the other), while product, zip, and zip_longest consume the input iterables in parallel.

p022.png-186kB
p022.png-186kB

Example 14-17. Merging generator function examples.

>>> list(itertools.chain('ABC', range(2))) 
['A', 'B', 'C', 0, 1]
>>> list(itertools.chain(enumerate('ABC'))) 
[(0, 'A'), (1, 'B'), (2, 'C')]
>>> list(itertools.chain.from_iterable(enumerate('ABC'))) 
[0, 'A', 1, 'B', 2, 'C']
>>> list(zip('ABC', range(5))) 
[('A', 0), ('B', 1), ('C', 2)]
>>> list(zip('ABC', range(5), [10, 20, 30, 40])) 
[('A', 0, 10), ('B', 1, 20), ('C', 2, 30)]
>>> list(itertools.zip_longest('ABC', range(5))) 
[('A', 0), ('B', 1), ('C', 2), (None, 3), (None, 4)]
>>> list(itertools.zip_longest('ABC', range(5), fillvalue='?')) 
[('A', 0), ('B', 1), ('C', 2), ('?', 3), ('?', 4)]

Example 14-18. itertools.product generator function examples.

>>> list(itertools.product('ABC', range(2))) 
[('A', 0), ('A', 1), ('B', 0), ('B', 1), ('C', 0), ('C', 1)]
>>> suits = 'spades hearts diamonds clubs'.split()
>>> list(itertools.product('AK', suits)) 
[('A', 'spades'), ('A', 'hearts'), ('A', 'diamonds'), ('A', 'clubs'), ('K', 'spades'), ('K', 'hearts'), ('K', 'diamonds'), ('K', 'clubs')]
>>> list(itertools.product('ABC')) 
[('A',), ('B',), ('C',)]
>>> list(itertools.product('ABC', repeat=2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]
>>> list(itertools.product(range(2), repeat=3))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
>>> rows = itertools.product('AB', range(2), repeat=2)
>>> for row in rows: print(row)
...
('A', 0, 'A', 0)
('A', 0, 'A', 1)
('A', 0, 'B', 0)
('A', 0, 'B', 1)
('A', 1, 'A', 0)
('A', 1, 'A', 1)
('A', 1, 'B', 0)
('A', 1, 'B', 1)
('B', 0, 'A', 0)
('B', 0, 'A', 1)
('B', 0, 'B', 0)
('B', 0, 'B', 1)
('B', 1, 'A', 0)
('B', 1, 'A', 1)
('B', 1, 'B', 0)
('B', 1, 'B', 1)

Some generator functions expand the input by yielding more than one value per input item.

p023.png-159.4kB
p023.png-159.4kB

p024.png-28.9kB
p024.png-28.9kB

Example 14-19. count, cycle, and repeat.

>>> ct = itertools.count() 
>>> next(ct) 
0
>>> next(ct), next(ct), next(ct) 
(1, 2, 3)
>>> list(itertools.islice(itertools.count(1, .3), 3)) 
[1, 1.3, 1.6]
>>> cy = itertools.cycle('ABC') 
>>> next(cy)
'A'
>>> list(itertools.islice(cy, 7)) 
['B', 'C', 'A', 'B', 'C', 'A', 'B']
>>> rp = itertools.repeat(7) 
>>> next(rp), next(rp)
(7, 7)
>>> list(itertools.repeat(8, 4)) 
[8, 8, 8, 8]
>>> list(map(operator.mul, range(11), itertools.repeat(5))) 
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

Example 14-20. Combinatoric generator functions yield multiple values per input item.

>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]
>>> list(itertools.combinations_with_replacement('ABC', 2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
>>> list(itertools.permutations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
>>> list(itertools.product('ABC', repeat=2))
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]

The last group of generator functions we’ll cover in this section are designed to yield all items in the input iterables, but rearranged in some way.

p025.png-55.4kB
p025.png-55.4kB

p026.png-56.9kB
p026.png-56.9kB

Example 14-21. itertools.groupby.

>>> list(itertools.groupby('LLLLAAGGG'))
[('L', <itertools._grouper object at 0x102227cc0>),
('A', <itertools._grouper object at 0x102227b38>),
('G', <itertools._grouper object at 0x102227b70>)]
>>> for char, group in itertools.groupby('LLLLAAAGG'):
...     print(char, '->', list(group))
...
L -> ['L', 'L', 'L', 'L']
A -> ['A', 'A',]
G -> ['G', 'G', 'G']
>>> animals = ['duck', 'eagle', 'rat', 'giraffe', 'bear',
...             'bat', 'dolphin', 'shark', 'lion']
>>> animals.sort(key=len)
>>> animals
['rat', 'bat', 'duck', 'bear', 'lion', 'eagle', 'shark', 'giraffe', 'dolphin']
>>> for length, group in itertools.groupby(animals, len):
...     print(length, '->', list(group))
...
3 -> ['rat', 'bat']
4 -> ['duck', 'bear', 'lion']
5 -> ['eagle', 'shark']
7 -> ['giraffe', 'dolphin']
>>> for length, group in itertools.groupby(reversed(animals), len):
...     print(length, '->', list(group))
...
7 -> ['dolphin', 'giraffe']
5 -> ['shark', 'eagle']
4 -> ['lion', 'bear', 'duck']
3 -> ['bat', 'rat']

Example 14-22. itertools.tee yields multiple generators, each yielding every item of the input generator.

>>> list(itertools.tee('ABC'))
[<itertools._tee object at 0x10222abc8>, <itertools._tee object at 0x10222ac08>]
>>> g1, g2 = itertools.tee('ABC')
>>> next(g1)
'A'
>>> next(g2)
'A'
>>> next(g2)
'B'
>>> list(g1)
['B', 'C']
>>> list(g2)
['C']
>>> list(zip(*itertools.tee('ABC')))
[('A', 'A'), ('B', 'B'), ('C', 'C')]

Iterable Reducing Functions

The functions in Table 14-6 all take an iterable and return a single result. Actually, every one of the builtins listed here can be implemented with functools.reduce, but they exist as built-ins because they address some common use cases more easily.

p027.png-68.3kB
p027.png-68.3kB

p028.png-128.9kB
p028.png-128.9kB

Example 14-23. Results of all and any for some sequences.

>>> all([1, 2, 3])
True
>>> all([1, 0, 3])
False
>>> all([])
True
>>> any([1, 2, 3])
True
>>> any([1, 0, 3])
True
>>> any([0, 0.0])
False
>>> any([])
False
>>> g = (n for n in [0, 0.0, 7, 8])
>>> any(g)
True
>>> next(g)
8

A Closer Look at the iter Function

It can be called with two arguments to create an iterator from a regular function or any callable object. In this usage, the first argument must be a callable to be invoked repeatedly (with no arguments) to yield values, and the second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise StopIteration instead of yielding the sentinel.

The following example shows how to use iter to roll a six-sided die until a 1 is rolled:

>>> def d6():
...     return randint(1, 6)
...
>>> d6_iter = iter(d6, 1)
>>> d6_iter
<callable_iterator object at 0x00000000029BE6A0>
>>> for roll in d6_iter:
...     print(roll)
...
4
3
6
3

A useful example is found in the iter built-in function documentation. This snippet reads lines from a file until a blank line is found or the end of file is reached:

with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)


Context Managers and else Blocks

Context Managers and with Blocks

Context manager objects exist to control a with statement, just like iterators exist to control a for statement.

The with statement was designed to simplify the try/finally pattern, which guarantees that some operation is performed after a block of code, even if the block is aborted because of an exception, a return or sys.exit() call. The code in the finally clause usually releases a critical resource or restores some previous state that was temporarily changed.

The most common example is making sure a file object is closed.

Example 15-1. Demonstration of a file object as a context manager.

>>> with open('mirror.py') as fp:
...     src = fp.read(60)
...
>>> len(src)
60
>>> fp
<_io.TextIOWrapper name='mirror.py' mode='r' encoding='UTF-8'>
>>> fp.closed, fp.encoding
(True, 'UTF-8')
>>> fp.read(60)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.


Concurrency with Futures

This chapter focuses on the concurrent.futures library introduced in Python 3.2.

Example: Web Downloads in Three Styles

To handle network I/O efficiently, you need concurrency(并发性), as it involves high latency — so instead of wasting CPU cycles waiting, it’s better to do something else until a response comes back from the network.

I wrote three simple programs to download images of 20 country flags from the Web.

The first one, flags.py, runs sequentially: it only requests the next image when the previous one is downloaded and saved to disk. The other two scripts make concurrent downloads: they request all images practically at the same time, and save the files as they arrive. The flags_threadpool.py script uses the concurrent.futures package, while flags_asyncio.py uses asyncio.

A Sequential Download Script

Example 17-2. flags.py: sequential download script; some functions will be reused by the other scripts.

import os
import time
import sys

import requests  

POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
            'MX PH VN ET EG DE IR TR CD FR').split() 

BASE_URL = 'http://flupy.org/data/flags' 

DEST_DIR = 'downloads/'  


def save_flag(img, filename):
    path = os.path.join(DEST_DIR, filename)
    with open(path, 'wb') as fp:
        fp.write(img)


def get_flag(cc): 
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    resp = requests.get(url)
    return resp.content


def show(text): 
    print(text, end=' ')
    sys.stdout.flush()


def download_many(cc_list):
    for cc in sorted(cc_list):
        image = get_flag(cc)
        show(cc)
        save_flag(image, cc.lower() + '.gif')

    return len(cc_list)


def main(download_many): 
    t0 = time.time()
    count = download_many(POP20_CC)
    elapsed = time.time() - t0
    msg = '\n{} flags downloaded in {:.2f}s'
    print(msg.format(count, elapsed))


if __name__ == '__main__':
    main(download_many)

The requests library by Kenneth Reitz is available on PyPI and is more powerful and easier to use than the urllib.request module from the Python 3 standard library.

Downloading with concurrent.futures

The main features of the concurrent.futures package are the ThreadPoolExecutor and ProcessPoolExecutor classes, which implement an interface that allows you to submit callables for execution in different threads or processes, respectively.

Example 17-3. flags_threadpool.py: threaded download script using futures.ThreadPoolExecutor.

rom concurrent import futures

from flags import save_flag, get_flag, show, main

MAX_WORKERS = 20


def download_one(cc):
    image = get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc


def download_many(cc_list):
    # Set the number of worker threads: use the smaller number between the maximum we want to allow (MAX_WORKERS) and the actual items to be processed, so no unnecessary threads are created.
    workers = min(MAX_WORKERS, len(cc_list))
    # Instantiate the ThreadPoolExecutor with that number of worker threads; the executor.__exit__ method will call executor.shutdown(wait=True), which will block until all threads are done.
    with futures.ThreadPoolExecutor(workers) as executor:
        # The map method is similar to the map built-in, except that the download_one function will be called concurrently from multiple threads; it returns a generator that can be iterated over to retrieve the value returned by each function.
        res = executor.map(download_one, sorted(cc_list)) 

    return len(list(res))


if __name__ == '__main__':
    main(download_many)
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 206,214评论 6 481
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 88,307评论 2 382
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 152,543评论 0 341
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 55,221评论 1 279
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 64,224评论 5 371
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,007评论 1 284
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,313评论 3 399
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,956评论 0 259
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 43,441评论 1 300
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,925评论 2 323
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,018评论 1 333
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,685评论 4 322
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,234评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,240评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,464评论 1 261
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 45,467评论 2 352
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,762评论 2 345

推荐阅读更多精彩内容

  • 函数调用 Built-in Functions abs(x) Return the absolute value ...
    叫我七夜阅读 1,161评论 0 0
  • pyspark.sql模块 模块上下文 Spark SQL和DataFrames的重要类: pyspark.sql...
    mpro阅读 9,446评论 0 13
  • (稻盛哲学学习会)打卡第106天 姓名:李晨珂 部门:分水碶 组别:待定 【知~学习】 诵读《京瓷哲学》第一章 度...
    李晨珂分水碶阅读 133评论 0 0
  • 内存优化基本可以分为以下几个方面:下面就分别介绍。 减少对象的内存占用 对内存对象进行复用 避免对象的内存泄漏 内...
    wangling90阅读 315评论 0 0
  • 友友可欣是个聪明又善于交谈的女孩子。 如果没有碰巧先生,可欣大概会一直带着这样一个令人羡慕的光环,继续在女神的道路...
    花脆阅读 407评论 0 1