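Suppose we have the following list:
[1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
We want to count how many times each element occurs and end up with a result such as {8: 2, 9: 2, ...}, that is, {element: number of occurrences, ...}.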
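- Method 1:
First, use the elements as dictionary keys to build a dictionary whose values all start at 0, then walk the list and add 1 to the matching entry for each element: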
>>> from random import randint
>>> data = [randint(1, 10) for _ in range(10)]
>>> data
[1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
>>> d = dict.fromkeys(data, 0)
>>> d
{1: 0, 4: 0, 5: 0, 7: 0, 8: 0, 9: 0, 10: 0}
>>> for x in data:
...     d[x] += 1
...
>>> d
{1: 1, 4: 1, 5: 1, 7: 1, 8: 2, 9: 2, 10: 2}
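The same idea can be wrapped in a small reusable function. The sketch below is a variant of Method 1 rather than the exact code above: it uses dict.get to supply the initial 0, so no pre-filled dictionary is needed. The function name count_occurrences is made up for illustration.

```python
def count_occurrences(items):
    """Return a dict mapping each distinct item to how often it appears."""
    counts = {}
    for item in items:
        # dict.get returns 0 the first time an item is seen.
        counts[item] = counts.get(item, 0) + 1
    return counts


data = [1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
counts = count_occurrences(data)
print(sorted(counts.items()))
# [(1, 1), (4, 1), (5, 1), (7, 1), (8, 2), (9, 2), (10, 2)]
```

Compared with dict.fromkeys, this makes a single pass over the list, at the cost of one extra lookup per element.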
- Method 2:
Use Counter from the collections module. Counter is a simple counter:
>>> from collections import Counter
>>> c = Counter(data)
>>> c
Counter({1: 1, 4: 1, 5: 1, 7: 1, 8: 2, 9: 2, 10: 2})
>>> isinstance(c, dict)
True
# A Counter object is a subclass of dict, so a count can be looked up by key
>>> c[1]
1
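# most_common(n) returns the n most frequent elements with their counts
# (the order among elements with equal counts can differ between Python versions)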
>>> c.most_common(2)
[(8, 2), (9, 2)]
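In this data, 8, 9 and 10 all occur twice, so which two of them most_common(2) returns depends on how ties are ordered. When a reproducible result matters, one option is to sort the counts explicitly. The following is a minimal sketch under that assumption. The helper name top_n and its tie-breaking rule (smaller element first) are illustrative choices, not part of Counter.

```python
from collections import Counter


def top_n(items, n):
    """Return the n most frequent items as (item, count) pairs.

    Ties are broken by the item's own value (smallest first), so the
    result does not depend on how Counter orders equal counts.
    """
    counts = Counter(items)
    # Sort by descending count, then by ascending item value.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


data = [1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
print(top_n(data, 2))  # [(8, 2), (9, 2)]
```

This sorts every distinct element, which is fine for small inputs. For large data where ties do not matter, most_common(n) is the simpler choice.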
Reference documentation (output of help(Counter)):
class Counter(__builtin__.dict)
| Dict subclass for counting hashable items. Sometimes called a bag
| or multiset. Elements are stored as dictionary keys and their counts
| are stored as dictionary values.
|
| >>> c = Counter('abcdeabcdabcaba') # count elements from a string
|
| >>> c.most_common(3) # three most common elements
| [('a', 5), ('b', 4), ('c', 3)]
| >>> sorted(c) # list all unique elements
| ['a', 'b', 'c', 'd', 'e']
| >>> ''.join(sorted(c.elements())) # list elements with repetitions
| 'aaaaabbbbcccdde'
| >>> sum(c.values()) # total of all counts
| 15
|
| >>> c['a'] # count of letter 'a'
| 5
| >>> for elem in 'shazam': # update counts from an iterable
| ... c[elem] += 1 # by adding 1 to each element's count
| >>> c['a'] # now there are seven 'a'
| 7
| >>> del c['b'] # remove all 'b'
| >>> c['b'] # now there are zero 'b'
| 0
|
| >>> d = Counter('simsalabim') # make another counter
| >>> c.update(d) # add in the second counter
| >>> c['a'] # now there are nine 'a'
| 9
|
| >>> c.clear() # empty the counter
| >>> c
| Counter()
|
| Note: If a count is set to zero or reduced to zero, it will remain
| in the counter until the entry is deleted or the counter is cleared:
|
| >>> c = Counter('aaabbc')
| >>> c['b'] -= 2 # reduce the count of 'b' by two
| >>> c.most_common() # 'b' is still in, but its count is zero
| [('a', 3), ('c', 1), ('b', 0)]
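The note at the end of the docstring means that a count which has dropped to zero still shows up in the counter. If such entries are unwanted, one way to drop them is to rebuild the counter from its positive counts. This is a sketch of that approach, not something the docstring itself prescribes, and the variable name positive is illustrative.

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2  # 'b' stays in the counter with a count of zero

# Rebuild the counter, keeping only strictly positive counts.
positive = Counter({elem: n for elem, n in c.items() if n > 0})

print(sorted(c.items()))         # [('a', 3), ('b', 0), ('c', 1)]
print(sorted(positive.items()))  # [('a', 3), ('c', 1)]
print('b' in positive)           # False
```

The original counter c is left unchanged. Use del c['b'] instead if the entry should be removed in place.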