Java-ConcurrentHashMap中英文注释:二,开始

Overview:


    The primary design goal of this hash table is to maintain

    concurrent readability (typically method get(), but also

    iterators and related methods) while minimizing update

    contention. Secondary goals are to keep space consumption about

    the same or better than java.util.HashMap, and to support high

    initial insertion rates on an empty table by many threads.


    This map usually acts as a binned (bucketed) hash table.  Each

    key-value mapping is held in a Node.  Most nodes are instances

    of the basic Node class with hash, key, value, and next

    fields. However, various subclasses exist: TreeNodes are

    arranged in balanced trees, not lists.  TreeBins hold the roots

    of sets of TreeNodes. ForwardingNodes are placed at the heads

    of bins during resizing. ReservationNodes are used as

    placeholders while establishing values in computeIfAbsent and

    related methods.  The types TreeBin, ForwardingNode, and

    ReservationNode do not hold normal user keys, values, or

    hashes, and are readily distinguishable during search etc

    because they have negative hash fields and null key and value

    fields. (These special nodes are either uncommon or transient,

    so the impact of carrying around some unused fields is

    insignificant.)


    The table is lazily initialized to a power-of-two size upon the

    first insertion.  Each bin in the table normally contains a

    list of Nodes (most often, the list has only zero or one Node).

    Table accesses require volatile/atomic reads, writes, and

    CASes.  Because there is no other way to arrange this without

    adding further indirections, we use intrinsics

    (sun.misc.Unsafe) operations.


    We use the top (sign) bit of Node hash fields for control

    purposes -- it is available anyway because of addressing

    constraints.  Nodes with negative hash fields are specially

    handled or ignored in map methods.


    Insertion (via put or its variants) of the first node in an

    empty bin is performed by just CASing it to the bin.  This is

    by far the most common case for put operations under most

    key/hash distributions.  Other update operations (insert,

    delete, and replace) require locks.  We do not want to waste

    the space required to associate a distinct lock object with

    each bin, so instead use the first node of a bin list itself as

    a lock. Locking support for these locks relies on builtin

    "synchronized" monitors.


    Using the first node of a list as a lock does not by itself

    suffice though: When a node is locked, any update must first

    validate that it is still the first node after locking it, and

    retry if not. Because new nodes are always appended to lists,

    once a node is first in a bin, it remains first until deleted

    or the bin becomes invalidated (upon resizing).


    The main disadvantage of per-bin locks is that other update

    operations on other nodes in a bin list protected by the same

    lock can stall, for example when user equals() or mapping

    functions take a long time.  However, statistically, under

    random hash codes, this is not a common problem.  Ideally, the

    frequency of nodes in bins follows a Poisson distribution

    (http://en.wikipedia.org/wiki/Poisson_distribution) with a

    parameter of about 0.5 on average, given the resizing threshold

    of 0.75, although with a large variance because of resizing

    granularity. Ignoring variance, the expected occurrences of

    list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The

    first values are:


    0:    0.60653066

    1:    0.30326533

    2:    0.07581633

    3:    0.01263606

    4:    0.00157952

    5:    0.00015795

    6:    0.00001316

    7:    0.00000094

    8:    0.00000006

    more: less than 1 in ten million


    Lock contention probability for two threads accessing distinct

    elements is roughly 1 / (8 * #elements) under random hashes.


    Actual hash code distributions encountered in practice

    sometimes deviate significantly from uniform randomness.  This

    includes the case when N > (1<<30), so some keys MUST collide.

    Similarly for dumb or hostile usages in which multiple keys are

    designed to have identical hash codes or ones that differs only

    in masked-out high bits. So we use a secondary strategy that

    applies when the number of nodes in a bin exceeds a

    threshold. These TreeBins use a balanced tree to hold nodes (a

    specialized form of red-black trees), bounding search time to

    O(log N).  Each search step in a TreeBin is at least twice as

    slow as in a regular list, but given that N cannot exceed

    (1<<64) (before running out of addresses) this bounds search

    steps, lock hold times, etc, to reasonable constants (roughly

    100 nodes inspected per operation worst case) so long as keys

    are Comparable (which is very common -- String, Long, etc).

    TreeBin nodes (TreeNodes) also maintain the same "next"

    traversal pointers as regular nodes, so can be traversed in

    iterators in the same way.


    The table is resized when occupancy exceeds a percentage

    threshold (nominally, 0.75, but see below).  Any thread

    noticing an overfull bin may assist in resizing after the

    initiating thread allocates and sets up the replacement array.

    However, rather than stalling, these other threads may proceed

    with insertions etc.  The use of TreeBins shields us from the

    worst case effects of overfilling while resizes are in

    progress.  Resizing proceeds by transferring bins, one by one,

    from the table to the next table. However, threads claim small

    blocks of indices to transfer (via field transferIndex) before

    doing so, reducing contention.  A generation stamp in field

    sizeCtl ensures that resizings do not overlap. Because we are

    using power-of-two expansion, the elements from each bin must

    either stay at same index, or move with a power of two

    offset. We eliminate unnecessary node creation by catching

    cases where old nodes can be reused because their next fields

    won't change.  On average, only about one-sixth of them need

    cloning when a table doubles. The nodes they replace will be

    garbage collectable as soon as they are no longer referenced by

    any reader thread that may be in the midst of concurrently

    traversing table.  Upon transfer, the old table bin contains

    only a special forwarding node (with hash field "MOVED") that

    contains the next table as its key. On encountering a

    forwarding node, access and update operations restart, using

    the new table.


    Each bin transfer requires its bin lock, which can stall

    waiting for locks while resizing. However, because other

    threads can join in and help resize rather than contend for

    locks, average aggregate waits become shorter as resizing

    progresses.  The transfer operation must also ensure that all

    accessible bins in both the old and new table are usable by any

    traversal.  This is arranged in part by proceeding from the

    last bin (table.length - 1) up towards the first.  Upon seeing

    a forwarding node, traversals (see class Traverser) arrange to

    move to the new table without revisiting nodes.  To ensure that

    no intervening nodes are skipped even when moved out of order,

    a stack (see class TableStack) is created on first encounter of

    a forwarding node during a traversal, to maintain its place if

    later processing the current table. The need for these

    save/restore mechanics is relatively rare, but when one

    forwarding node is encountered, typically many more will be.

    So Traversers use a simple caching scheme to avoid creating so

    many new TableStack nodes. (Thanks to Peter Levart for

    suggesting use of a stack here.)


    The traversal scheme also applies to partial traversals of

    ranges of bins (via an alternate Traverser constructor)

    to support partitioned aggregate operations.  Also, read-only

    operations give up if ever forwarded to a null table, which

    provides support for shutdown-style clearing, which is also not

    currently implemented.


    Lazy table initialization minimizes footprint until first use,

    and also avoids resizings when the first operation is from a

    putAll, constructor with map argument, or deserialization.

    These cases attempt to override the initial capacity settings,

    but harmlessly fail to take effect in cases of races.


    The element count is maintained using a specialization of

    LongAdder. We need to incorporate a specialization rather than

    just use a LongAdder in order to access implicit

    contention-sensing that leads to creation of multiple

    CounterCells.  The counter mechanics avoid contention on

    updates but can encounter cache thrashing if read too

    frequently during concurrent access. To avoid reading so often,

    resizing under contention is attempted only upon adding to a

    bin already holding two or more nodes. Under uniform hash

    distributions, the probability of this occurring at threshold

    is around 13%, meaning that only about 1 in 8 puts check

    threshold (and after resizing, many fewer do so).


    TreeBins use a special form of comparison for search and

    related operations (which is the main reason we cannot use

    existing collections such as TreeMaps). TreeBins contain

    Comparable elements, but may contain others, as well as

    elements that are Comparable but not necessarily Comparable for

    the same T, so we cannot invoke compareTo among them. To handle

    this, the tree is ordered primarily by hash value, then by

    Comparable.compareTo order if applicable.  On lookup at a node,

    if elements are not comparable or compare as 0 then both left

    and right children may need to be searched in the case of tied

    hash values. (This corresponds to the full list search that

    would be necessary if all elements were non-Comparable and had

    tied hashes.) On insertion, to keep a total ordering (or as

    close as is required here) across rebalancings, we compare

    classes and identityHashCodes as tie-breakers. The red-black

    balancing code is updated from pre-jdk-collections

    (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)

    based in turn on Cormen, Leiserson, and Rivest "Introduction to

    Algorithms" (CLR).


    TreeBins also require an additional locking mechanism.  While

    list traversal is always possible by readers even during

    updates, tree traversal is not, mainly because of tree-rotations

    that may change the root node and/or its linkages.  TreeBins

    include a simple read-write lock mechanism parasitic on the

    main bin-synchronization strategy: Structural adjustments

    associated with an insertion or removal are already bin-locked

    (and so cannot conflict with other writers) but must wait for

    ongoing readers to finish. Since there can be only one such

    waiter, we use a simple scheme using a single "waiter" field to

    block writers.  However, readers need never block.  If the root

    lock is held, they proceed along the slow traversal path (via

    next-pointers) until the lock becomes available or the list is

    exhausted, whichever comes first. These cases are not fast, but

    maximize aggregate expected throughput.


    Maintaining API and serialization compatibility with previous

    versions of this class introduces several oddities. Mainly: We

    leave untouched but unused constructor arguments refering to

    concurrencyLevel. We accept a loadFactor constructor argument,

    but apply it only to initial table capacity (which is the only

    time that we can guarantee to honor it.) We also declare an

    unused "Segment" class that is instantiated in minimal form

    only when serializing.


    Also, solely for compatibility with previous versions of this

    class, it extends AbstractMap, even though all of its methods

    are overridden, so it is just useless baggage.


    This file is organized to make things a little easier to follow

    while reading than they might otherwise: First the main static

    declarations and utilities, then fields, then main public

    methods (with a few factorings of multiple public methods into

    internal ones), then sizing methods, trees, traversers, and

    bulk operations.

概述:

这个哈希表的主要设计目标是维护

并发可读性(通常是方法get(),但是

迭代器和相关方法),同时最小化更新

争论。次要目标是保持空间消耗

相同或优于java.util.HashMap文件,并支持高

许多线程在空表上的初始插入速率。

这个映射通常充当一个装箱的哈希表。每个

键值映射保存在节点中。大多数节点都是实例

包含hash、key、value和next的基本节点类的

领域。然而,存在着不同的子类:TreeNodes是

排列在平衡的树中,而不是列表中。树根长在树上

一组树节点。转发节点放置在头部

重新调整大小期间的箱子数量。ReservationNodes用作

在ComputeFabSent和

相关方法。TreeBin、ForwardingNode和

ReservationNode不保存普通用户键、值或

散列,并且在搜索过程中容易区分

因为它们有负的散列字段和空的键和值

领域。(这些特殊节点要么不常见,要么短暂,

因此,携带一些未使用的土地的影响是

无关紧要。)

表被惰性地初始化为

第一次插入。表中的每个箱子通常包含一个

节点列表(通常,列表只有零个或一个节点)。

表访问需要易失性/原子性读、写和

案例。因为没有别的办法

添加更多的间接指令,我们使用内部函数

(sun.misc.不安全)操作。

我们使用节点散列字段的顶部(符号)位进行控制

目的——由于寻址,它仍然可用

约束条件。具有负散列字段的节点

在映射方法中处理或忽略。

将第一个节点(通过put或其变体)插入

清空垃圾箱只需将其装入垃圾箱即可。这是

到目前为止,将操作置于

密钥/散列分布。其他更新操作(插入,

删除和替换)需要锁。我们不想浪费时间

将不同的锁对象与关联所需的空间

每个bin,因此使用bin列表本身的第一个节点作为

一把锁。对这些锁的锁定支持依赖于内置的

“同步”监视器。

将列表的第一个节点用作锁本身并不能

不过,这就足够了:当一个节点被锁定时,必须首先进行任何更新

验证它是否仍然是锁定后的第一个节点,以及

否则重试。因为新节点总是附加到列表中,

一旦一个节点是bin中的第一个节点,它将保持第一个节点,直到被删除

或者箱子失效(在调整大小时)。

per-bin锁的主要缺点是

在受相同文件保护的bin列表中的其他节点上的操作

锁定可能会暂停,例如当user equals()或映射时

函数需要很长时间。然而,从统计上看

随机散列码,这不是一个常见的问题。理想情况下

箱中节点的频率服从泊松分布

(http://en.wikipedia.org/wiki/Poisson\u分布)带着

给定调整阈值,参数平均约为0.5

0.75,但由于调整大小而有较大差异

粒度。忽略方差

列表大小k是(exp(-0.5)*pow(0.5,k)/阶乘(k))。这个

第一个值是:

0: 0.60653066

1: 0.30326533

2: 0.07581633

3: 0.01263606

4: 0.00157952

5: 0.00015795

6: 0.00001316

7: 0.00000094

8: 0.00000006

更多:不到千万分之一

两个线程访问不同进程的锁争用概率

在随机散列下,元素大约是1/(8*#个元素)。

实践中遇到的实际哈希代码分布

有时明显偏离均匀随机性。这个

包括N>(1<<30)时的大小写,因此某些键必须发生碰撞。

类似地,对于多个键都是

设计为具有相同的散列码或仅不同的散列码

隐藏的高位。所以我们使用了一个次要的策略

当存储箱中的节点数超过

门槛。这些树状结构使用一个平衡树来保存节点(一个

红黑树的特化形式),搜索时间限定到

O(对数N)。树索引中的每个搜索步骤至少是

在常规列表中缓慢,但考虑到N不能超过

(1<<64)(地址用完之前)此边界搜索

步骤,锁保持时间等,以合理的常数(大致

每个操作检查100个节点(最坏情况下),只要键

是可比较的(这是非常常见的——字符串、长字符串等)。

树节点(TreeNodes)也是主要的

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 204,189评论 6 478
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,577评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 150,857评论 0 337
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,703评论 1 276
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,705评论 5 366
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,620评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,995评论 3 396
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,656评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,898评论 1 298
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,639评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,720评论 1 330
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,395评论 4 319
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,982评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,953评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,195评论 1 260
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 44,907评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,472评论 2 342

推荐阅读更多精彩内容