C++11 标准库源代码分析：连载之八

无序关联容器

无序关联容器（Unordered associative container）是C++11标准库中新增的类型，包括

unordered_set
unordered_multiset
unordered_map
unordered_multimap

共四种类型，它们的共同特点是

在容器内部，元素的排列是没有特定顺序的，这也正是它们被叫做“unordered container”的原因。
都是通过hast table实现的。这点我们稍后会详细介绍。

标准库中已经有了map和set，为什么还要定义unordered的map和set呢？答案还是那两个字：效率。我们知道，map和set的底层数据结构是红黑树，执行查找、插入，删除等操作的时间复杂度为O(logn)；而unordered map和unordered set的底层数据为hash table，执行查找、插入和删除等操作的时间复杂度为O(1)，明显要快很多，所以如果对元素的排列顺序没有要求，建议使用无序关联容器。

既然无序关联容器都是基于hash table的，那我们就有必要先了解一下C++ 11中的hash算法。

C++ 11中的hash算法

C++ 11标准对如何计算hash有明确的要求：

hash方程应该是一个function object，声明如下所示：

template<T>
struct hash {
    size_t operator()(T key) const noexcept;
};

对于类型为T的参数t1和t2，如果t1 == t2，则hash<T>()(t1) == hash<T>()(t2);
对于类型为T的参数t1和t2，如果t1 != t2，则hash<T>()(t1) == hash<T>()(t2)的概率应近似于1.0/std::numeric_limits<size_t>::max()（在我的MacBook Pro上，这个值为0.00000000000000000005.421，小数点后面19个0）。

不过，C++标准只是规定了hash方程的形式和必须满足的条件，具体到如何计算hash值，则没有要求。就libc++而言，针对不同的类型，其计算方法也不尽相同。

简单数值类型

对于简单数值类型，如bool、int、char等，libc++的hash算法也很简单：直接返回数值本身。

// header: <functional>

template<class T> struct hash; // forward declaration

// specialization for bool
template<>
struct hash<bool> : pubic unary_function<bool, size_t> {
    size_t operator()(bool value) const noexcept {
        return static_cast<size_t>)(value);
    }
};

// specialization for int
template<>
struct hash<int> : public unary_function<int, size_t> {
    size_t operator()(int value) const noexcept {
        return static_cast<size_t>(value);
    }
};

// specialization for char
template<>
struct hash<char> : pubic unary_function<char, size_t> {
    size_t operator(char value) const noexcept {
        return static_cast<size_t>(value);
    }
};

浮点数值类型

针对复杂数值类型，如float、double等，libc++提供了两种hash算法：murmur2和cityhash64：

// header: <memory>

template<class Size, size_t = sizeof(Size) * 8>
struct murmur2_or_cityhash;

// use murmur2 on 32-bit system, 
// because size_t is 32 bits on 32-bit system
template<class Size>
struct murmur2_or_cityhash<Size, 32> {
    Size operator()(const void* key, Size len) {
        // murmur2 hash算法
        ...
    }
};

// use cityhash64 on 64-bit system,
// because size_t is 64 bits on 64-bit system
template<class Size>
struct murmur2_or_cityhash<Size, 64> {
    Size operator()(const void* key, Size len) {
        // cityhash64 hash算法
        ...
    }
};

murmur2_or_cityhash::operator()接受两个参数，并不满足C++标准的要求，为了方便使用，libc++又定义了一个外敷类scalar_hash：

template<class T, size_t = sizeof(T) / sizeof(size_t)>
struct scalar_hash;

template<class T>
struct scalar_hash<T, 0> : public unary_function<T, size_t> {
    size_t operator()(T value) const {
        union {
            T      t;
            size_t a;
        } _u;
        _u.a = 0;
        _u.t = value;
        return _u.a;
    }
};

template <class T>
struct scalar_hash<T, 1> : public unary_function<T, size_t> {
    size_t operator()(T value) const {
        union{
            T        t;
            size_t   a;
        } _u;
        _u.t = value;
        return _u.a;
    }
};

template <class T>
struct scalar_hash<T, 2> : public unary_function<T, size_t> {
    size_t operator()(Tp value) const {
        union {
            Tp t;
            struct {
                size_t a;
                size_t b;
            } s;
        } _u;
        _u.t = value;
        return murmur2_or_cityhash<size_t>()(&_u, sizeof(_u));
    }
};

template <class T>
struct scalar_hash<T, 3> : public unary_function<T, size_t> {
    size_t operator()(T value) const {
        union {
            T t;
            struct {
                size_t _a;
                size_t _b;
                size_t _c;
            } s;
        } _u;
        _u.t = value;
        return murmur2_or_cityhash<size_t>()(&_u, sizeof(_u));
    }
};

template <class T>
struct scalar_hash<T, 4> : public unary_function<T, size_t> {
    size_t operator()(T value) const {
        union {
            T t;
            struct {
                size_t a;
                size_t b;
                size_t c;
                size_t d;
            } s;
        } _u;
        _u.t = value;
        return murmur2_or_cityhash<size_t>()(&_u, sizeof(_u));
    }
};

浮点数值类型最终的hash计算方法如下：

template <>
struct hash<float> : public scalar_hash<float> {
    size_t operator()(float value) const
    {
        // -0.0 and 0.0 should return same hash
       if (value == 0)
           return 0;
        return scalar_hash<float>::operator()(value);
    }
};

template <>
struct hash<double> : public scalar_hash<double> {
    size_t operator()(double value) const
    {
        // -0.0 and 0.0 should return same hash
       if (value == 0)
           return 0;
        return scalar_hash<double>::operator()(value);
    }
};

内置非数值类型

对于标准库中的非数值类型，比如string等，标准库也提供了hash方程：

// header: <string>

template<class CharT, class Traits, class Allocator>
struct hash<basic_string<CharT, Traits, Allocator> >
    : public unary_function<basic_string<CharT, Traits, Allocator>, size_t> {

    size_t operator()(const basic_string<CharT, Traits, Allocator>& val) const noexcept;
};

template<class CharT, class Traits, class Allocator>
size_t hash<basic_string<CharT, Traits, Allocator> >::operator()(
        const basic_string<CharT, Traits, Allocator>& val) const noexcept {
    return __do_string_hash(val.data(), val.data() + val.size());
}

__do_string_hash定义在文件<__string>中：

// header: <__string>

template<class Ptr>
inline size_t __do_string_hash(Ptr p, Ptr e) {
    typedef typename iterator_traits<Ptr>::value_type value_type;
    return murmur2_or_cityhash<size_t>()(p, (e-p)*sizeof(value_type));
}

自定义类型

如果你有一个类，

struct Foo {
    int         _i;
    double      _d;
    std::string _s;

    Foo(int i, double d, const std::string &s)
        : _i(i), _d(d), _s(s) {}
};

你可以这样计算hash：

template<>
struct hash<Foo> : public unary_function<Foo, size_t> {
    size_t operator()（const Foo& foo）const noexcept {
        return murmur2_or_cityhash<size_t>()
            (static_cast<const void*>(&foo), sizeof(Foo));
    }
};

现在我们已经对hash算法有所了解，接下来我们就讲讲那些基于hash算法的容器是如何实现的。虽然C++ 11标准库中基于hash算法的容器一共有四种，但是它们的实现方式大同小异，所以我们就讲讲最典型的hash容器：unordered_map。

unordered_map

首先来看unordered_map的声明：

// file: unordered_map

template<class _Key, class _Tp, class _Hash = hash<_Key>, class _Pred = equal_to<_key>, 
         class _Alloc = allocator<pair<const _Key, _Tp> > >
class unordered_map {
public:
    // types
    typedef _Key                                           key_type;
    typedef _Tp                                            mapped_type;
    typedef _Hash                                          hasher;
    typedef _Pred                                          key_equal;
    typedef _Alloc                                         allocator_type;
    typedef pair<const key_type, mapped_type>              value_type;
    typedef pair<key_type, mapped_type>                    __nc_value_type;
    typedef value_type&                                    reference;
    typedef const value_type&                              const_reference;

private:
    typedef __hash_table<__value_type, __hasher, __key_equal, __allocator_type>   __table;
    __table __table_;

    // ...
};

可以看到unordered_map的实现采用了Pimpl idiom，unordered_map只是个wrapper，真正的实现是在__hash_table中。要讲清楚__hash_table不是一件容易的事情，好在libc++的__hash_table从技术上讲并没有什么奇特之处，仍然采用了大家都很熟悉的bucket list形式，如下图所示：

下面正式进入__hash_table：

template<class _Tp, class _Hash, class _Equal, class _Alloc>
class __hash_table {
public:
    typedef _Tp         value_type;
    typedef _Hash       hasher;
    typedef _Equal      key_equal;
    typedef _Alloc      allocator_type;

private:
    typedef std::allocator_traits<allocator_type> __alloc_traits;
    typedef typename __make_hash_node_types<value_type, typename __alloc_traits::void_pointer>::type _NodeTypes;

public:
    typedef typename _NodeTypes::__node_value_type           __node_value_type;
    typedef typename _NodeTypes::__container_value_type      __container_value_type;
    typedef typename _NodeTypes::key_type                    key_type;
    typedef value_type&                              reference;
    typedef const value_type&                        const_reference;
    typedef typename __alloc_traits::pointer         pointer;
    typedef typename __alloc_traits::const_pointer   const_pointer;
    typedef typename __alloc_traits::size_type       size_type;
    typedef typename _NodeTypes::difference_type     difference_type;

public:
    // Create __node

    typedef typename _NodeTypes::__node_type __node;
    typedef typename std::__rebind_alloc_helper<__alloc_traits, __node>::type __node_allocator;
    typedef std::allocator_traits<__node_allocator>       __node_traits;
    typedef typename _NodeTypes::__void_pointer      __void_pointer;
    typedef typename _NodeTypes::__node_pointer      __node_pointer;
    typedef typename _NodeTypes::__node_pointer      __node_const_pointer;
    typedef typename _NodeTypes::__node_base_type    __first_node;
    typedef typename _NodeTypes::__node_base_pointer __node_base_pointer;
    typedef typename _NodeTypes::__next_pointer      __next_pointer;

private:
    typedef typename std::__rebind_alloc_helper<__node_traits, __next_pointer>::type __pointer_allocator;
    typedef std::__bucket_list_deallocator<__pointer_allocator> __bucket_list_deleter;
    typedef std::unique_ptr<__next_pointer[], __bucket_list_deleter> __bucket_list;
    typedef std::allocator_traits<__pointer_allocator>          __pointer_alloc_traits;
    typedef typename __bucket_list_deleter::pointer       __node_pointer_pointer;

    // --- Member data begin ---
    __bucket_list                               __bucket_list_;
    std::pair<__first_node, __node_allocator>   __p1_;
    std::pair<size_type, hasher>                __p2_;
    std::pair<float, key_equal>                 __p3_;

public:
    size_type size() const noexcept { return __p2_.first();}

    float max_load_factor() const noexcept { return __p3_.first();}
};

我们来梳理一下，每一个__hash_table都有一个__bucket_list_：

__bucket_list __bucket_list_

而__bucket_list_是一个smart pointer：

typedef typename __make_hash_node_types<value_type, typename __alloc_traits::void_pointer>::type _NodeTypes;
typedef typename _NodeTypes::__next_pointer                      __next_pointer;
typedef std::unique_ptr<__next_pointer[], __bucket_list_deleter> __bucket_list;

这里的关键就是__make_hash_node_types：

template <class _NodePtr, class _NodeT = typename pointer_traits<_NodePtr>::element_type>
struct __hash_node_types;

template <class _NodePtr, class _Tp, class _VoidPtr>
struct __hash_node_types<_NodePtr, __hash_node<_Tp, _VoidPtr> >
    : public __hash_key_value_types<_Tp>, __hash_map_pointer_types<_Tp, _VoidPtr>{

    typedef _NodePtr                                       __node_pointer;
    typedef __hash_node_base<__node_pointer>               __node_base_type;
    typedef typename __node_base_type::__next_pointer      __next_pointer;

    // ...
};

template<class _NodeValueTp, class _VoidPtr>
struct __make_hash_node_types {
    typedef __hash_node<_NodeValueTp, _VoidPtr> _NodeTp;
    typedef typename std::__rebind_pointer<_VoidPtr, _NodeTp>::type _NodePtr;
    typedef __hash_node_types<_NodePtr> type;
};

经过这一连串让人头晕目眩的类型代换后，__bucket_list_大致可以写成：

typedef std::pair<Key, Value>                               _Tp;
typedef __hash_node<_Tp, _VoidPtr>*                         _NodePtr;
typedef typename __hash_node_base<_NodePtr>::__next_pointer __next_pointer;
std::unique_ptr<__next_pointer[]>                           __bucket_list_;

__hash_node和__hash_node_base就是一切开始的地方：

// file: __hash_table

template<class _NodePtr>
struct __hash_node_base {
    typedef typename std::pointer_traits<_NodePtr>::element_type __node_type;
    typedef __hash_node_base __first_node;
    typedef typename std::__rebind_pointer<_NodePtr, __first_node>::type __node_base_pointer;
    typedef _NodePtr __node_pointer;
    typedef __node_base_pointer __next_pointer;

    __next_pointer __next_;
};

template<class _Tp, class _VoidPtr>
struct __hash_node 
    : public __hash_node_base<typename std::__rebind_pointer<_VoidPtr, __hash_node<_Tp, _VoidPtr> >::type> {
    typedef _Tp __node_value_type;

    size_t  __hash_;
    __node_value_type __value_;
};

纵观源代码，我们会发现90%的代码其实都是类型定义和代换，大量的typedef让人头晕目眩，真不知道这样的代码是怎样构思出来的，又是怎样测试的。

写了这么多，也仅仅是了解了__hash_table的声明而已。篇幅有限，我们不能详细讲解__hash_table具体的实现了，就挑两个方法讲一下吧：

默认构造函数

默认构造函数非常简单：

// file: __hash_table

template<class _Tp, class _Hash, class _Equal, class _Alloc>
inline __hash_table<_Tp, _Hash, _Equal, _Alloc>::__hash_table()
    : __p2_(0), __p3(1.0f) {

}

插入元素

源代码有点复杂，不过只要看懂了__hash_table声明部分的代码，这部分代码相对来说还是比较容易的。

// file: unordered_map

pair<iterator, bool> insert(const value_type& __) {
    return __table_.__insert_unique(__x);
}

// file: __hash_table

pair<iterator, bool> __insert_unique(__container_value_type&& __x) {
    return __emplace_unique_key_args(_NodeType::__get_key(__x), std::move(__x));
}

template<class _Tp, class _Hash, class _Equal, class _Alloc>
template<class _key, class ..._Args>
pair<typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator, bool>
__hash_table<_Tp, _Hash, _Equal, _Alloc>::__emplace_unique_key_args(_Key const& __k, _Args&&... __args) {
    size_t __hash = hash_function()(__k);
    size_type __bc = bucket_count();
    bool __inserted = false;
    __next_pointer __nd;
    size_t __chash;

    if (__bc != 0) {
        __chash = __constrain_hash(__hash, __bc);
        __nd = __bucket_list_[_chash];
        if (__nd != nullptr) {
            for (__nd = __nd->next; __nd != nullptr && (__nd->__hash() == __hash || __constrain_hash(__nd->__hash(), __bc) == __chash);
                    __nd = __nd->__next_) {
                if (key_eq()(__nd->__upcast()->__value, __k))
                    goto __done;
            }
        }
    }
    {
        __node_holder __h = __construct_node_hash(__hash, std::forward<_Args>(__args)...);
        if (size() + 1 > __bc * max_load_factor() || __bc == 0) {
            // rehash
        }

        __next_pointer __pn = __bucket_list_[__chash];
        if (__pn == nullptr) {
            __pn = __p1_.first().__ptr();
            __h->__next_ = __pn->__next_;
            __pn->__next_ = __h.get()->__ptr();

            __bucket_list_[__chash] = __pn;
            if (__h->__next_ != nullptr)
                __bucket_list_[__constrain_hash(__h->__next_->__hash(), __bc)] = __h.get()->__ptr();
        }
        else {
            __h->__next_ = __pn->__next_;
            __pn->__next_ = static_cast<__next_pointer>(__h.get());
        }
        __nd = static_cast<__next_pointer>(__h.release());
        ++size();
        __inserted = true;
    }
__done:
    return pair<iterator, bool>(iterator(__nd), __inserted);
}

小结

对unordered_map的到此算是告一段落，至于其余三个unordered容器，它们的实现方法大同小异，就不一一赘述了。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,772评论 6赞 477
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,458评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,610评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,640评论 1赞 276
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,657评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,590评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,962评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,631评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,870评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,611评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,704评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,386评论 4赞 319
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,969评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,944评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,179评论 1赞 260
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 44,742评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,440评论 2赞 342