8. 全域哈希和完全哈希

A fundamental weakness of hashing:
For any choice of hash function, there exists a bad set of keys that all hash to the same slot.
避免方法:Randomness
The idea is that you choose a hash function at random.
The name of the scheme is universal hashing.

Universal Hashing

Def. Let U be a universe of keys and H be a finite collection of hash functions mapping U to the slots in our hash table.
H is universal: if ∀x:∀y: x∈U ∧ y∈U ∧ x≠y, | {h∈H: h(x) = h(y)} | = |H|/m

Theorem

If we choose h randomly from the set of hash functions H and then we suppose we're hashing n keys into m slots in table T. Then for given key x, the expected number of collision with x = n/m = α(the load factor of the table)

proof

Let Cx be the random variable denoting the total number of collisions of keys in T with x.
C_{xy}=\begin{cases} 1, h(x) = h(y)\\ 0, otherwise \end{cases}
Note: E[Cxy] = 1/m and Cx = \sum_{y∈T*{x}}{C_{xy}}

Constructing a universal hash function

Let m be prime. Decompose key k into r + 1 digits:
k=<k0, k1, ..., kr> where 0 < ki <= m-1
Pick a = <a0, a1, ..., ar>, each ai is chosen randomly from {0, 1, ..., m-1}
Define ha(k) = \left(\sum_{i = 0}^r{a_ik_i}\right)mod(m)
How big is H? |H| = mr+1

Theorem

H is universal

proof

Let x = <x0, x1, ..., xr>
y = <y0, y1, ..., yr> be distinct keys.

They differ in at least one digit, without loss of generality position 0. For how many ha ∈ H do x and y collide?
Must have ha(x) = ha(y).
⇒\sum_{i=0}^r{a_ix_i}\equiv\sum_{i=0}^r{a_iy_i}(mod(m))
⇒\sum_{i=0}^r{a_i(x_i-y_i)}\equiv0(mod(m))
⇒a_0(x_0-y_0) + \sum_{i=1}^r{a_i(x_i-y_i)}\equiv0(mod(m))
⇒a_0(x_0-y_0) \equiv -\sum_{i=1}^r{a_i(x_i-y_i)}(mod(m))
⇒a_0 \equiv (-\sum_{i=1}^r{a_i(x_i-y_i)})*(x_0-y_0)^{-1}(mod(m))
Thus for any choices of a1, a2, ... , ar, exactly 1 of the m choices for a0 causes x and y to collide, and no collision for other m-1 choices for a0.
has that cause x, y to collides = mr = |H|/m

Number theory fact

Let m be prime for any z∈Zm(integers mod m) such that z \not\equiv 0, ∃ unique z-1 ∈ Zm such that z*z-1 \equiv 1 (mod(m))

Perfect Hashing

Problem: Given n keys, construct a static hash table of size m = O(n) such that search takes O(1) time in the worst case.
Idea: 2-level scheme with universal hashing at both levels.
No collision at level 2.
The reason why we don't have collisions in the second level:
If there are ni item that hash to level one slot i, then we're going to use mi = ni2 in the level two hash table. Under these circumstances, it's easy to find hash functions such that there are no collisions.

Level 2 analysis

Theorem

Hash n keys into m = n2 slots, using a random hash function in a universal set H. Then the expected number of collision is less than on half.

proof

The probability that two given keys collide under h is 1/(n2).
There are n(n-1)/2 pairs of keys.
E[#collisions] = (n(n-1)/2)*(1/(n2)) = n(n-1)/(2n2) < 1/2

Markov's Inequality

For random variable X \geq 0, Pr{X \geq t} \leq E[X]/t
proof:
E[x] = \sum_{x=0}^\infty{x*Pr\{X = \})} \geq \sum_{x=t}^\infty{x*Pr\{X = x\}}
\geq \sum_{x=t}^\infty{t*Pr\{X = x\}} = t*Pr\{x \geq t\}

Corollary

Pr{no collisions} \geq 1/2
proof Pr{\geq 1 cooision} \leq E[#collisoins]/1 < 1/2
To find a good level-2 hash function, just test a few at random. Find one quickly, since at least half will work.

Analysis of storage

For level 1, choose m = n, and let ni be the random variable for the number for #keys that hash to slot i in T. Let mi = ni2 slots in each level 2 table S sub i.
E[total storage] = n + E[\sum_{i=0}^{m-1}{θ(n_i^2)}] = θ(n) by bucket sort analysis.

(⇒⇔↔¬∧∨∀∃∈⊂⊃∪∩→← )

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 203,362评论 5 477
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,330评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 150,247评论 0 337
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,560评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,580评论 5 365
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,569评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,929评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,587评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,840评论 1 297
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,596评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,678评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,366评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,945评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,929评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,165评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 43,271评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,403评论 2 342

推荐阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 7,279评论 0 10
  • 曾经记忆里最深刻的是一头红发的樱木花道大喊“我是天才”,那是一代人的童年。 当你关注某个话题之后,会发现看到的满是...
    艾字阅读 1,280评论 0 51
  • 这是中乔大三农集团的领头人红旗渠精神传承人乔书领董事长来到会场亲切的和各位客户打招呼。每位客户见到乔总也是兴高彩烈...
    hopesh阅读 215评论 0 0