A structure that maps keys to values is a hash table.

That Every element has equal probability of hashing into any of the slots is simple uniform hashing.

A function that computes the location of the key in the array is a hash function.

Standard deletion cannot be performed in an open addressing hash table, because the cells might have caused collision. Hence, the hash tables implement lazy deletion.

 The average retrieval time when n keys hash to the same slot is given by Theta(n) as the collision occurs in the hash table.

基本要素

重点:数据字段

TableSize:表的大小

哈希函数:理想情况下,该函数应易于计算,并应确保任何两个不同的键得到不同的单元格。每个键被映射到范围O到tablesize -1中的某个数字,并放置在适当的单元格中。

哈希的基本思想:处理函数的选择,决定当碰撞发生时该做什么,决定表的大小。

特殊情况

A collision occurs when we hash two nonidentical identifiers into the same bucket, i.e.  f ( i1 ) = f ( i2 ) when i1 != i2 .

冲突发生在当我们把两个不相同的标识符放入一个桶中。

An overflow occurs when we hash a new identifier into a full bucket.

过载发生在当我们把新的标识符放入一个满的桶中。

解决冲突方法

开放寻址法

线性探查法linear probing

把数据存放在下一个非空单元内

f(i)=i is the correct function definition for linear probing.

Primary collision occurs due to linear probing technique. It is overcome using a quadratic probing technique.

平方探查法quadratic probing

在发生冲突的单元d[i],加上探测次数的平方,如果找到的单元为空,放入数据。

例如第五次探查,如果d[i]+5^2单元为空,放入数据

双散列函数探查法double hashing

 

 Note:

1. If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy.如果双哈希是正确实现的,仿真表明预期的探测数量几乎是相同的随机碰撞解决策略。           

2. Quadratic probing does not require the use of a second hash function and is thus likely to be simpler and faster in practice.二次探测不需要使用第二个哈希函数,因此在实践中可能更简单、更快。

伪随机探查法

The load factor for an open addressing technique should be 0.5. For separate chaining technique, the load factor is 1

条件:1. f ( x ) must be easy to compute and minimizes the number of collisions.

           2.f ( x ) should be unbiased.  That is, for any x and any i, we have that Probability( f ( x ) = i ) = 1 / b.  

 

 

 

 图样:

重哈希rehashing

重哈希不是开放性址法Rehashing is not a collision resolution strategy for open addressing.

Clustering is not a theoretical problem but it occurs in implementations of hashing. Rehashing is a kind of hashing.

怎么做

1. Build another table that is about twice as big; 构建另一个大约两倍大的表;

2. Scan down the entire original hash table for non-deleted elements; 扫描整个原始哈希表中未删除的元素;

3. Use a new function to hash those elements into the new table.使用一个新函数将这些元素散列到新表中。

什么时候重哈希When to rehash:

1.As soon as the table is half full 等表格满一半的时候

2.When an insertion fails 当插入失败时

3.When the table reaches a certain load factor当表达到一定的负载系数时

Note: Usually there should have been N/2 insertions before rehash, so O(N) rehash only adds a constant cost to each insertion. However, in an interactive system, the unfortunate user whose insertion caused a rehash could see a slowdown.通常在重哈希之前应该有N/2次插入,所以O(N)重哈希只会给每次插入增加一个常数的代价。然而,在一个交互系统中,不幸的用户的插入导致了重新hash,可能会看到减速。

链地址法

公式:

建立公共溢出区

文章链接

评论可见,请评论后查看内容,谢谢!!!评论后请刷新页面。