散列表数据结构模拟实现哈希表超详解（C++）

 hello！我是bug。前面几期我们讲解了树的相关结构，今天我们就要进入哈希表的学习了：（代码可能会有一点问题，请各位老铁指正   ）

文章目录

一、哈希 1、什么是哈希？ 2、直接定址法3、除留余数法（重点）4、解决哈希冲突 闭散列 （1）线性探测  （2）二次探测

 开散列

二、闭散列（哈希）的模拟实现三、开散列（哈希）的模拟实现

一、哈希

在前面，我们学习了红黑树、AVL树的相关性质，他们增删查改的时间复杂度都是O（logN），这个效率已经是比较高的了。但是有没有一种数据结构，（不需要进行比较就可以直接增删查改）它进行增删查改的时间复杂度为O（1）呢？下面我们就要介绍这种数据结构——哈希表。

 1、什么是哈希？

哈希（Hash）：散列函数，其本质是映射。

 那么映射又是什么呢？如下图：（例如医院的挂号系统）其中我们存入一对键值，通过序号我们就可以找到人员。而序号和姓名之间就是一种映射关系。（也就是KV模型）

 哈希表（HashTable）：是根据关键码值(Key value)而直接进行访问的数据结构。通过把关键码值映射到表中一个位置来访问记录，以加快查找的速度。这个映射函数叫做散列函数，存放记录的数组叫做散列表。通过这种映射关系，那么我们就可以直接通过key值直接找的相应的数据，插入和删除数据时也可以直接根据key值进行插入删除，其时间复杂度就是O（1）。

作用：哈希表在增删查改上的效率都是十分优秀的，一般我们都是通过哈希表来快速查找数据是否出现在表中。

哈希冲突：在绝大多数情况下，我们都会面临一个问题，那就是如何使一个位置映射一个数据。当我们开辟一个数组时，我们是采用下标来进行映射的，即每一个下标都会映射数据。但是当多个数据都映射到同一个位置时，那么就产生了哈希冲突。对于哈希冲突我们是要尽量避免的，因为哈希冲突会降低我们增删查改的效率。（具体原因下面再介绍）

哈希表的K模型和KV模型：和前面的红黑树（AVL树）一样，所以哈希表也可以封装map和set。C++中的unordered_map和unordered_set的底层就是哈希表。

缺点：当我们要对大量数据进行操作时，要开辟大量的空间来存储数据。而且，散列函数的采用也是至关重要的（尽量避免哈希冲突），那么我们要采取哪种散列函数来进行映射呢？（下面进行介绍）

 2、直接定址法

假设有这样一个数组，我们要将它存入哈希表中：

我们直接根据元素的实际大小进行插入。如上图，即每一个下标都对应一个相应的数据，没有对应的数据就为空。

 注意 ❗️ ❗️

 （1）开辟数组的时候，一定要知道数据的范围，并根据数据的范围开辟数组，如果数据的大小超过了数组的大小就会越界。 （2）采用直接定址法一定要数据比较集中，因为最大最小数据如果相差太大，那么浪费的空间也会很多。 （3）如果我们存储的是整型，那么不需要担心哈希冲突，因为每个数值都对应唯一的位置。

 但是，当我们存入的数据不是整型的时候我们就要考虑如何来进行映射？  其实在大多数情况下，我们采用unordered_map和unordered_set存储的都是字符串，那我们如何将字符串和下标联系起来呢？

可能你会觉得字符都有对应的ASCII码，我们可以用字符的ASCII来进行对应。但是这种情况下就不可避免的会出现哈希冲突，因为只要我们不规定字符串的长度，那么字符串就是无穷无尽的，通过其ASCII来对应是一定会产生冲突的。（对于哈希冲突的解决我们下面再来进行详细的介绍）

3、除留余数法（重点）

当一组数据中的最大和最小相差比较大时，就会浪费大量的空间。所以直接定址法就不再适用了，这时候我们就要采用除留余数法。

 什么是除留余数法呢？假设有这样一个数组，如下图：  如何通过除留余数法将数据存入哈希表中呢？

即我们将每一个数据对表的大小进行取模，取模过后数据就可以通过下标来映射相应的位置。

所以插入后数据如图：

除留余数法可以很好的避免空间大量的浪费，所以一般我们都采用除留余数法。当然，除留余数法也会产生哈希冲突。

除了直接定址法和除留余数法，我们还有其他的方法比如：平方取中法、折叠法、随机数法、数学分析法。（但是这些方法不常用，这里不做介绍）

 注意❗️ ❗️

 哈希函数设计的越精妙，产生哈希冲突的可能性就越低，但是无法避免哈希冲突。

  下面就是实现重点了，嘿嘿。

4、解决哈希冲突

上面直接定址法和除留余数法都会产生哈希冲突的问题，下面我们就来解决：

 闭散列

闭散列：也叫做开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。那如何寻找下一个空位置呢？

 （1）线性探测

线性探测：线性探测是计算机程序解决哈希表冲突时所采取的一种策略。

还是这个数组：

我们通过除留余数法，可以算得8和15取模后，其对应同一个位置，那么我们从这个位置开始，一个一个往后面去探测，直到找到一个空位置，就将数据插入。

但是会有这种情况，即：取模后发生冲突，然后向后面探测，直到探测到最后一个位置都没有发现空位置，那么我们就再从头开始进行探测。

（线性探测中探测的终点就是取模后对应的位置，因为当我们从开始取模位置探测，探测到最后一个位置都没有，那么我们再从头开始探测，如果还是没有，那么会又回到取模位置。所以我们表的大小一定要超过数据的个数。）

 优点：十分简单，好操作。

 缺点：会造成成片冲突。一旦发生哈希冲突，所有的冲突连在一起，容易产生数据“堆积”。（如下图）

如下数组：其中每个数据取模后都对应下标为1的位置，通过线性探测插入后，如下图：当我们进行删除查找的时候，都是通过取模找到位置1，如果位置1不是我们要查找的数据，就会向后面遍历，直到找到数据为止。这也是为什么哈希冲突的发生会造成效率下降，而当大量的哈希冲突扎堆时，其效率更会大幅度下降。所以，也就引出了下面的二次探测来缓解成堆的哈希冲突。

 （2）二次探测

二次探测：线性探测的缺陷是产生冲突的数据堆积在一块，这与其找下一空位置有关系，因为找空位置的方式就是挨着往后逐个去找，因此二次探测为了避免该问题，找下一个空位置的方法为： Hi= (H0 + i2)% m,或者：Hi = (H0 - i2)% m。其中：i = 1,2,3…，是通过散列函数Hash(x)对元素的关键码 key 进行计算得到的位置，m是表的大小。

还是看图，有这样一组数据：

通过二次探测后，即15、22和8都对应位置1，先插入15，接着我们插入22时，二次探测，从i^2开始向后找空位置。i = 1时，就是取模位置的下一个，插入22 。插入8，向后面探测，i = 1时，已经插入了22。那么再探测i = 2时，也就是第一次位置的后面第四个位置，插入8 。

 注意❗️ ❗️

 二次探测可以防止成堆的哈希冲突都连续存储，缓解了效率的下降。

 开散列

开散列：也叫做拉链法、哈希桶。首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。其本质就是一个指针数组，数组中存储的是单链表的首元素地址。

还是这个数组：

我们要将它插入哈希表（开散列）中，如下图：当我们插入15、22和8时，取模计算位置，然后采用头插的方式，直接插入。发生哈希冲突的元素就通过链表进行存储。

二、闭散列（哈希）的模拟实现

 问题一：哈希表内部有什么结构？（迭代器+哈希结点+哈希表。）

迭代器：正向迭代器内部成员有结点指针+哈希表的指针，因为结点和结点之间没有联系，我们要从一个结点跳转到下一个结点就要通过哈希表来进行调整。所以需要哈希表的指针，同时传参过程可以很巧妙地利用this指针完成。反向迭代器通过封装正向迭代器来实现，闭散列的哈希表可以完成自减的重载，所以反向迭代器可以实现。哈希结点：成员变量有data数据，state状态。哈希表：封装哈希结点和迭代器于一体。

  问题二：当我们插入删除数据时，怎么判断当前位置是否可以操作？使用一个枚举类型，里面包含EMPTY、EXITS、DELETE三种状态。默认状态为EMPTY，插入数据后状态变为EXITS，删除数据时，只要把状态改成DELETE即可。

  是不是会觉得很困惑，为什么存在三种状态？DELETE有什么作用呢？

假设有一个哈希表，如下图：当我们删除数据22时，将其置为EMPTY，但是会影响查找过程。查找时先计算数据的下标，如果当前位置数据不是查找的数据，那么向后依次查找，直到查到当前位置被标记为EMPTY时停止。在我们查找的过程中，如果删除数据就将其置EMPTY，当发生扎堆的哈希冲突，删除冲突中间的一个数据，就会影响对后面冲突数据的查询，所以要用DELETE进行标识，这样不会影响查找过程。如下图：

当我们查找8的时候，计算下标为1 。但是如果22被标记成EMPTY，那么查找到22这个位置时就停止，但是如果是DELETE，那么会继续向后面查找。

  问题三：怎么控制哈希表的大小？当我们的表过小时，会产生大量的哈希冲突，效率会大幅度下滑。但是当我们的表过大时，哈希冲突得到了缓解，但是大量的空间又会被浪费。所以这里我们引入负载因子，负载因子即：有效的数据的数量/哈希表的长度，当负载因子超过0.7的时候，我们就要给哈希表扩容。为什么选中0.7作为标准值呢？在0.7的时候，哈希冲突和空间浪费之间得到了平衡，算的上是比较好的选择了。当我们对哈希表进行扩容后，映射关系要进行调整。因为扩容的本质是重新开辟更大的空间，再将数据拷贝过去。但是，当哈希表扩容，其映射关系也会发生改变。所以我们不能直接将数据原封不动的拷贝过去，而是要重新建立映射关系。在VS的编译器里，其采用了素数表。因为按照素数进行扩容，取模计算下标时不容易发生哈希冲突。（虽然我也不太清楚怎么计算的）

 问题四：哈希表有K模型和KV模型，我们如何实现呢？采用仿函数来取pair键对中的key，即如果是K模型，那么直接返回；如果是键对模型，那么选出里面的key再返回。

 问题五：字符串（比较常用）通过什么方式进行转化可以更好的避免哈希冲突呢？（这里我们也要采用仿函数来实现。）

（不可取）（1）首字符的ACSII码字符总共只有256个，如果只用首字符来映射的话，很容易产生哈希冲突。（2）字符串内所有字符ASCII码值之和所有字符ACSII码值之和也容易冲突，即字符串中的字符相同，但是顺序不同时，全部都会冲突。

对于字符串，一群大佬经过研究后得出了下面几种不容易冲突的算法：

 BKDR哈希算法⬇️ ⬇️ :

hash = hash * 131 + ch; // 也可以乘以31、131、1313、13131、131313..

 SDBM哈希算法⬇️ ⬇️ :

`hash = 65599 * hash + ch; `

 RS哈希算法⬇️ ⬇️ ：

template

size_t RSHash(const T *str)

{

size_t magic = 63689;

while (size_t ch = (size_t)*str++)

{

hash = hash * magic + ch;

magic *= 378551;

}

return hash;

}

（剩下的就不介绍了，有兴趣的可以去看看。网址：哈希算法）

 问题六：查找中怎么判断当前数据是否是我们需要的查找内容？如果是内置类型，直接用等号比较即可。但是如果是自定义类型，它的比较就需要我们自己来完成。例如：字符串的比较，我们可以用string自己的等号重载来判断。所以采用仿函数来完成这项内容，需要比较自定义类型，那么就自己来写仿函数进行判断。

  问题七：方括号（“[]”）的重载？在KV模型中，方括号的重载是很重要的。即可以直接使用方括号进行插入数据，或者对数据进行修改。进行重载中，调用insert函数进行插入，insert函数的返回是键对值。键对值中第一个参数迭代器类型，是插入位置的迭代器；第二个参数是布尔类型，判断插入是否成功。我们根据返回值中的迭代器参数可以获得插入位置的指针，然后再通过指针获得插入键对值中的V（这里插入过程中，V值为默认值），将V返回后，我们可以通过等号进行修改⬇️ ⬇️

V& operator[](const K& key)

{

return (insert(std::make_pair(key, V())).first._pnode->_data).second;

}

//使用方式

HTs["look"] = "看";

HTs.operator[]("eat") = "吃";

 其他问题：因为迭代器中要使用哈希表的指针，而哈希表又要使用迭代器，所以我们需要添加一个前置声明。迭代器中需要访问哈希表的私有成员_table，所以我们要把迭代器定义为哈希表的友元类。（不到万不得已，不使用友元，因为友元会破坏封装）

  HashTable（闭散列）的完整代码⬇️ ⬇️ ：

#pragma once

#include

using std::cin;

using std::cout;

using std::endl;

using std::vector;

using std::string;

using std::pair;

using std::make_pair;

//选出key

template

struct PairSelect1st

{

const K& operator()(const pair& kv) { return kv.first; }

};

template

struct KSelect1st

{

const K& operator()(const K& k) { return k; }

};

//转成整型

template

struct HashFunc

{

size_t operator()(const K& val) { return val; }

};

//模板的特化

template<>

struct HashFunc

{

size_t operator()(const std::string& s1)

{

size_t sum = 0;

for (size_t i = 0; i < s1.size(); i++)

{

sum = sum * 131 + s1[i];

}

return sum;

}

};

//比较判断

template

struct equal_to

{

bool operator()(const K& lval, const K& rval) { return lval == rval; }

};

template<>

//模板特化

struct equal_to

{

bool operator()(const std::string& s1, const std::string& s2) { return s1 == s2; }

};

//素数表

const int PRIMECOUNT = 28;

const size_t primeList[PRIMECOUNT] = {

53ul, 97ul, 193ul, 389ul, 769ul,

1543ul, 3079ul, 6151ul, 12289ul, 24593ul,

49157ul, 98317ul, 196613ul, 393241ul, 786433ul,

1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,

50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,

1610612741ul, 3221225473ul, 4294967291ul

};

namespace CloseHash

{

enum State { EMPTY, EXITS, DELETE };

template

struct HashData

{

T _data;//存储可能是k，可能是kv

State _state = EMPTY;//标识状态

HashData(const T& data = T(), State state = EMPTY) :_data(data), _state(state) {}

};

template

class HashTable;

template

, class Select1st = PairSelect1st, class HashFunc = HashFunc>

struct Iterator

{

typedef HashData HD;

typedef HashTable HT;

typedef Iterator self;

typedef Ref reference;

typedef Ptr pointer;

HD* _pnode;

HT* _pht;

Iterator(HD* pnode = nullptr, HT* pht = nullptr) :_pnode(pnode), _pht(pht) { }

bool operator!=(const self& it)const { return _pnode != it._pnode; }

bool operator==(const self& it)const { return _pnode == it._pnode; }

Ref operator*() { return _pnode->_data; }

Ptr operator->() { return &_pnode->_data; }

self& operator++()

{

size_t index = HashFunc()(Select1st()(_pnode->_data)) % _pht->_table.size();

HD* cur = &_pht->_table[index];

//找到_pnode的位置

while (_pnode != cur)

{

index++;

//如果到尾没找到，就从头开始找

if (index >= _pht->_table.size())

index = 0;

cur = &_pht->_table[index];

if (_pnode == cur)

break;

}

index++;

if (index >= _pht->_table.size())

{

_pnode = nullptr;

return *this;

}

//去找_pnode后面的一个数据

cur = &_pht->_table[index];

while (cur->_state != EXITS)

{

index++;

//如果下标不小于_table的大小时，遍历结束

if (index >= _pht->_table.size())

{

_pnode = nullptr;

return *this;

}

cur = &_pht->_table[index];

}

_pnode = cur;

return *this;

}

self operator++(int) { self tmp = *this; ++*this; return tmp; }

self& operator--()

{

int index = HashFunc()(Select1st()(_pnode->_data)) % _pht->_table.size();

HD* cur = &_pht->_table[index];

//找到_pnode的位置

while (_pnode != cur)

{

index++;

if (index >= _pht->_table.size())

index = 0;

cur = &_pht->_table[index];

if (_pnode == cur)

break;

}

index--;

if (index < 0)

{

_pnode = nullptr;

return *this;

}

//去找_pnode前面的一个数据

cur = &_pht->_table[index];

while (cur->_state != EXITS)

{

index--;

//如果index小于0，那么就置空返回。

if (index < 0)

{

_pnode = nullptr;

return *this;

}

cur = &_pht->_table[index];

}

_pnode = cur;

return *this;

}

self operator--(int) { self tmp = *this; --(*this); return tmp; }

};

template

struct Reverse_Iterator

{

typedef const iterator const_reverse_iterator;

typedef typename iterator::reference Ref;

typedef typename iterator::pointer Ptr;

typedef Reverse_Iterator self;

iterator _it;

Reverse_Iterator(iterator it) :_it(it) {}

Ref operator*() { return *_it; }

Ptr operator->() { return &(*_it) ; }

self& operator++() { _it--; return *this; }

self operator++(int) { self tmp = *this; _it--; return tmp; }

bool operator ==(const self& it) { return _it._pnode == it._it._pnode; }

bool operator != (const self& it) { return _it._pnode != it._it._pnode; }

};

template

,class Select1st = PairSelect1st, class HashFunc = HashFunc>

class HashTable

//参数有KVT（键对值），仿函数选择键对值的key，仿函数将非整型转换成整型,仿函数进行比较

{

template

friend struct Iterator;

typedef HashData HD;

typedef HashTable HT;

public:

typedef Iterator const_iterator;

typedef Iterator iterator;

typedef Reverse_Iterator const_reverse_iterator;

typedef Reverse_Iterator reverse_iterator;

private:

vector _table;

size_t _n;//记录有效数据的个数

public:

HashTable() :_n(0) { }

HashTable(const HT& h)

{

for (auto& e : h._table)

{

if(e._state == EXITS)

insert(e._data);

}

_n = h._n;

}

HT& operator=(const HT& h) { HT tmp(h); swap(tmp); return *this; }

iterator begin()

{

//如果迭代器为空直接返回

if (_n == 0)

return iterator(nullptr, this);

size_t index = 0;

HD* cur = &_table[index];

while (cur->_state != EXITS)

{

index++;

if (index >= _table.size())

return iterator(nullptr, this);

cur = &_table[index];

}

return iterator(cur, this);

}

iterator end() { return iterator(nullptr, this); }

const_iterator cbegin()

{

//如果迭代器为空直接返回

if (_n == 0)

return const_iterator(nullptr, this);

size_t index = 0;

HD* cur = &_table[index];

while (cur->_state != EXITS)

{

index++;

if (index >= _table.size())

return const_iterator(nullptr, this);

cur = &_table[index];

}

return const_iterator(cur, this);

}

const_iterator cend() { return const_iterator(nullptr, this); }

reverse_iterator rbegin()

{

//如果迭代器为空直接返回

if (_n == 0)

return reverse_iterator(iterator(nullptr, this));

size_t index = _table.size() - 1;

HD* cur = &_table[index];

while (cur->_state != EXITS)

{

index--;

if (index < 0)

return reverse_iterator(iterator(nullptr, this));

cur = &_table[index];

}

return reverse_iterator(iterator(cur, this));

}

reverse_iterator rend() { return reverse_iterator(iterator(nullptr, this)); }

const_reverse_iterator crbegin()

{

//如果迭代器为空直接返回

if(_n == 0)

return const_reverse_iterator(const_iterator(nullptr, this));

size_t index = _table.size() - 1;

HD* cur = &_table[index];

while (cur->_state != EXITS)

{

index--;

if (index < 0)

return const_reverse_iterator(const_iterator(nullptr, this));

cur = &_table[index];

}

return const_reverse_iterator(const_iterator(cur, this));

}

const_reverse_iterator crend() { return const_reverse_iterator(const_iterator(nullptr, this)); }

void swap(HT& h)

{

_table.swap(h._table);

std::swap(_n, h._n);

}

V& operator[](const K& key) { return (insert(std::make_pair(key, V())).first._pnode->_data).second; }

pair insert(const T& data)

{

//仿函数挑选

Select1st st1;

//仿函数转化

HashFunc hf;

if (!_table.size())

_table.resize(53ul);

//先判断是否key重复

iterator ret = find(data);

if (ret._pnode != nullptr)

return std::make_pair(ret,false);

//负载因子调节表的大小

if ((double)_n / (double) _table.size() >= 0.7)

{

HT tmp;

tmp._table.resize(GetNextPrime(_table.size()));

auto it = _table.begin();

while (it != _table.end())

{

if (it->_state == EXITS)

tmp.insert(it->_data);

it++;

}

_table.swap(tmp._table);

}

//插入过程

size_t i = 1;//探测的数据

size_t index = (hf(st1(data))) % _table.size();//下标计算

while (_table[index]._state == EXITS)

{

//线性探测

index += i;

//二次探测

//index += i * i;

//i++;

if (index >= _table.size())

index %= _table.size();

}

_table[index]._data = data;

_table[index]._state = EXITS;

_n++;

return std::make_pair(iterator(& _table[index],this),true);

}

iterator find(const T& data)

{

Select1st slt;

HashFunc hf;

size_t index = hf(slt(data)) % _table.size();

//相等为真,这里需要先判断一次，再进循环

if (Pred()(slt(_table[index]._data), slt(data)))

return iterator(&_table[index], this);

while(1)

{

//相等为真

if (Pred()(slt(_table[index]._data), slt(data)))

return iterator(&_table[index], this);

if (_table[index]._state == EMPTY)

break;

index++;

//超过了,就取模

if (index >= _table.size())

index %= _table.size();

}

return iterator(nullptr,this);

}

bool erase(const T& data)

{

if (!empty())

{

iterator ret = find(data);

if (ret._pnode != nullptr)

{

ret._pnode->_state = DELETE;

_n--;

return true;

}

return false;

}

//获得素数

size_t GetNextPrime(size_t prime)

{

size_t i = 0;

for (; i < PRIMECOUNT; i++)

{

if (primeList[i] > prime)

return primeList[i];

}

return primeList[i];

}

bool empty()const { return _n == 0; }

size_t size()const { return _n; }

};

}

  测试代码⬇️ ⬇️

//CloseHash(闭散列)

void Test_KV1()//KV模型：字典

{

CloseHash::HashTable

, equal_to,PairSelect1st> HTs;

pair arr[] = {

make_pair("left", "左边") ,make_pair("right", "右边"),make_pair("up", "向上")

,make_pair("down", "向下"),make_pair("left","左边"),make_pair("eat","吃")

,make_pair("sleep","睡觉"),make_pair("run","跑"),make_pair("jump","跳") };

for (auto e : arr)

HTs.insert(e);

//方括号测试

HTs["look"] = "看";

//非const迭代器

CloseHash::HashTable

, equal_to, PairSelect1st>::iterator it1 = HTs.begin();

while (it1 != HTs.end())

{

cout << it1->first << ":" << it1->second << endl;

it1++;

}

cout << endl;

//反向迭代器

auto rit1 = HTs.rbegin();

while (rit1 != HTs.rend())

{

cout << rit1->first << ":" << rit1->second << endl;

rit1++;

}

cout << endl;

//删除测试

HTs.erase(make_pair("left", "左边"));

HTs.erase(make_pair("down", "向下"));

//const迭代器

CloseHash::HashTable

, equal_to, PairSelect1st>::const_reverse_iterator crit1 = HTs.crbegin();

while (crit1 != HTs.crend())

{

cout << crit1->first << ":" << crit1->second << endl;

crit1++;

}

cout << endl;

}

void Test_K1()//K模型

{

CloseHash::HashTable, KSelect1st>HTs;

string arr[] = {

"left", "左边" ,"right", "右边","up", "向上"

,"down", "向下","left","左边","eat","吃"

,"sleep","睡觉","run","跑","jump","跳" };

for (auto e : arr)

HTs.insert(e);

//非const迭代器

CloseHash::HashTable

equal_to, KSelect1st>::iterator it = HTs.begin();

while (it != HTs.end())

{

cout << *it << endl;

it++;

}

cout << endl;

//删除功能

HTs.erase("r");

HTs.erase("up");

HTs.erase("dowm");

//const迭代器

auto cit = HTs.cbegin();

while (cit != HTs.cend())

{

cout << *cit << endl;

cit++;

}

cout << endl;

}

三、开散列（哈希）的模拟实现

开散列中许多内容和闭散列一致，这里我们主要谈谈它的区别：

  问题一：开散列需不需要状态来进行标识？开散列不需要标识，插入发生哈希冲突时，直接头插即可。删除结点，如果只有一个结点，直接释放后，将_table中的指针置空。如果有多个结点，释放结点后，直接将前一个结点和后一个结点相连。查找也只需要计算下标后，找到对应的桶，然后进行查找。

  问题二：开散列的迭代器如何实现？和闭散列的迭代器类似，即需要结点指针和哈希表的指针。遍历过程就是一个一个桶进行遍历，桶内遍历直接根据next指针即可。桶间的转换需要借助哈希表的指针。因为我们的哈希表采用的是vector+forward_list，所以不能够重载自减，即无法实现反向迭代器。（单链表不支持，双向链表可以支持）

  问题三：迭代器中的自增如何重载？先计算当前结点指针处于哪一个桶中，然后去桶中遍历，直到找到当前结点指针，然后再判断，如果为尾，那么我们就去找下一个桶，直到找到存储数据的桶，然后将我们的结点指针修改成桶中的第一个指针。

  问题四：极端情况应该怎么解决？极端情况发生的概率极低，而且发生之后即使效率大幅度降低也只是暂时的，只要继续插入数据，效率又会恢复。但是在java中对这种情况还是进行了处理，即每次插入数据之后都会统计链表的长度，如果链表中的结点个数达到了8个，那么就会将单链表转化成红黑树来进行存储。如果发生扩容，即重新映射后，结点的个数小于8，那么又会转化成单链表进行存储。

 问题五：开散列的平衡因子该怎么来控制？在开散列中其平衡因子设置为达到1时，进行扩容。即数据个数等于桶的个数。在开散列中，我们不用去担心数据的插入问题，因为不论多少个数据一定都可以插入，因为数据存储在链表中。但是当链表的长度过长时，其操作效率会大幅度下降，所以我们要控制好扩容时机，不能浪费过多的空间，同时也不能使效率下降的过多。

  HashTable（开散列）的完整代码⬇️ ⬇️

#pragma once

#include

using std::cin;

using std::cout;

using std::endl;

using std::vector;

using std::string;

using std::pair;

using std::make_pair;

//选出key

template

struct PairSelect1st

{

const K& operator()(const pair& kv) { return kv.first; }

};

template

struct KSelect1st

{

const K& operator()(const K& k) { return k; }

};

//转成整型

template

struct HashFunc

{

size_t operator()(const K& val) { return val; }

};

//模板的特化

template<>

struct HashFunc

{

size_t operator()(const std::string& s1)

{

size_t sum = 0;

for (size_t i = 0; i < s1.size(); i++)

{

sum = sum * 131 + s1[i];

}

return sum;

}

};

//比较判断

template

struct equal_to

{

bool operator()(const K& lval, const K& rval) { return lval == rval; }

};

template<>

//模板特化

struct equal_to

{

bool operator()(const std::string& s1, const std::string& s2) { return s1 == s2; }

};

//素数表

const int PRIMECOUNT = 28;

const size_t primeList[PRIMECOUNT] = {

53ul, 97ul, 193ul, 389ul, 769ul,

1543ul, 3079ul, 6151ul, 12289ul, 24593ul,

49157ul, 98317ul, 196613ul, 393241ul, 786433ul,

1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,

50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,

1610612741ul, 3221225473ul, 4294967291ul

};

namespace OpenHash

{

template

struct HashNode

{

typedef HashNode Node;

typedef HashNode* pNode;

HashNode* _next;

T _data;

public:

HashNode(const T& data = T())

:_next(nullptr)

,_data(data)

{}

};

template

class HashTable;

template

struct Iterator

{

typedef HashNode Node;

typedef HashTable HashTable;

typedef Iterator self;

Node* _pnode;

HashTable* _pHT;

Iterator(Node* pnode = nullptr, HashTable* pHT = nullptr) : _pnode(pnode), _pHT(pHT) { }

Ref operator*() { return _pnode->_data; }

const Ref operator*()const { return _pnode->_data; }

Ptr operator->() { return &_pnode->_data; }

const Ptr operator->()const { return &_pnode->_data; }

self& operator++()

{

if (_pnode == nullptr)

return *this;

if (_pnode->_next != nullptr)

{

_pnode = _pnode->_next;

return *this;

}

//_pnode->next == nullptr我们要去找现在的结点属于哪一个桶

size_t index = HashFunc()(Select1st()(_pnode->_data)) % _pHT->_table.size() + 1;

for (; index < _pHT->_table.size(); index++)

{

Node* cur = _pHT->_table[index];

if (cur != nullptr)

{

_pnode = cur;

return *this;

}

_pnode = nullptr;

return *this;

}

self operator++(int)

{

self tmp = *this;

++(*this);

return tmp;

}

bool operator!=(const self& it)const { return _pnode != it._pnode; }

bool operator==(const self& it)const { return _pnode == it._pnode; }

};

template

class Select1st = PairSelect1st, class HashFunc = HashFunc>

class HashTable

{

typedef HashNode* pNode;

typedef HashNode Node;

template

friend struct Iterator;

private:

//存结点指针

vector _table;

size_t _n;

public:

typedef Iterator const_iterator;

typedef Iterator iterator;

HashTable() :_n(0) { }

void clear()

{

for (size_t index = 0; index < _table.size(); index++)

{

pNode cur = _table[index];

pNode prev = cur;

while (cur)

{

prev = cur;

cur = cur->_next;

delete prev;

}

~HashTable()

{

clear();

}

iterator begin()

{

size_t index = 0;

for (; index < _table.size(); index++)

{

pNode cur = _table[index];

if (cur != nullptr)

return iterator(cur,this);

}

return iterator(nullptr, this);

}

iterator end() { return iterator(nullptr, this); }

const_iterator cbegin()

{

size_t index = 0;

for (; index < _table.size(); index++)

{

pNode cur = _table[index];

if (cur != nullptr)

return const_iterator(cur, this);

}

return const_iterator(nullptr, this);

}

const_iterator cend() { return const_iterator(nullptr, this); }

pair insert(const T& data)

{

//如果为空，则开空间

if (!_table.size())

_table.resize(53ul);

//挑选key

Select1st st1;

//转换整型

HashFunc hf;

//判断是否冗余

iterator ret = find(data);

if (ret._pnode != nullptr)

return std::make_pair(iterator(nullptr,this), false);

//判断是否需要扩容

if ((double)_n / (double)_table.size() >= 1)

{

vector new_table(GetNextPrime(_table.size()));

for (size_t i = 0; i < _table.size(); i++)

{

pNode cur = _table[i];

if (cur != nullptr)

{

pNode next = _table[i];

while (cur)

{

next = cur->_next;

size_t new_index = (hf(st1(cur->_data))) % new_table.size();

//头插

cur->_next = new_table[new_index];

new_table[new_index] = cur;

cur = next;

}

_table[i] = nullptr;

}

//不推荐，插入的时候重新创建结点，浪费

/*while(e != nullptr)

{

tmp.insert(e->_kv);

e = e->_next;

}*/

}

new_table.swap(_table);

}

//计算hashbucket的下标

size_t index = hf(st1(data)) % _table.size();

pNode newNode = new Node(data);

//头插

newNode->_next = _table[index];

_table[index] = newNode;

_n++;

return std::make_pair(iterator(newNode,this), true);

}

iterator find(const T& data)

{

HashFunc hf;

Select1st slt;

if (_table.size() == 0)

return iterator(nullptr,this);

size_t index = hf(slt(data)) % _table.size();

pNode cur = _table[index];

while (cur)

{

if (Pred()(slt(cur->_data), slt(data)))

return iterator(cur,this);

else

cur = cur->_next;

}

return iterator(nullptr,this);

}

bool erase(const T& data)

{

Select1st st1;

size_t index = HashFunc()(st1(data)) % _table.size();

pNode cur = _table[index];

pNode prev = cur;

while (cur)

{

if (Pred()(st1(cur->_data) , st1(data)))

{

//找到了

if (cur == _table[index])

{

_table[index] = cur->_next;

_n--;

delete cur;

return true;

}

else

{

prev->_next = cur->_next;

_n--;

delete cur;

return true;

}

else//没找到

{

prev = cur;

cur = cur->_next;

}

return false;

}

size_t GetNextPrime(size_t prime)

{

size_t i = 0;

for (; i < PRIMECOUNT; i++)

{

if (primeList[i] > prime)

return primeList[i];

}

return primeList[i];

}

size_t size() const{ return _n; }

};

}

  测试代码⬇️ ⬇️

//OpenHash(开散列)

void Test_KV2()//KV模型

{

OpenHash::HashTable> hts;

pair arr[] = {

make_pair("left", "左边") ,make_pair("right", "右边"),make_pair("up", "向上")

,make_pair("down", "向下"),make_pair("left","左边"),make_pair("eat","吃")

,make_pair("sleep","睡觉"),make_pair("run","跑"),make_pair("jump","跳")};

for (auto e : arr)

hts.insert(e);

//非const迭代器

OpenHash::HashTable>::iterator it = hts.begin();

while (it != hts.end())

{

cout << it->first << ":" << it->second << endl;

it++;

}

cout << endl;

hts.erase(make_pair("sleep", "睡觉"));

hts.erase(make_pair("left", "左边"));

hts.erase(make_pair("up", "向上"));

//const类型

OpenHash::HashTable>::const_iterator cit = hts.cbegin();

while (cit != hts.cend())

{

cout << cit->first << ":" << cit->second << endl;

cit++;

}

cout << endl;

}

void Test_K2()//K模型

{

OpenHash::HashTable, KSelect1st, HashFunc> hts;

string arr[] = {

"left", "左边" ,"right", "右边","up", "向上"

,"down", "向下","left","左边","eat","吃"

,"sleep","睡觉","run","跑","jump","跳" };

for (auto e : arr)

hts.insert(e);

OpenHash::HashTable, KSelect1st, HashFunc>::iterator it = hts.begin();

while (it != hts.end())

{

cout << *it << " ";

it++;

}

cout << endl;

}

  总结：空间消耗：在开散列中虽然增加了链表的链接指针，但是其空间消耗还是要比闭散列小，因为闭散列的平衡因子一定会比开散列的小，闭散列的平衡因子一定小于1，而且越接近1，越容易发生哈希冲突。所以相同数量的数据进行存储时，开散列开辟的空间更小。

  今天的内容到这里就结束了，希望小伙伴们能够有所收获。

精彩内容

评论可见，请评论后查看内容，谢谢！！！评论后请刷新页面。

夸智网

散列表数据结构模拟实现哈希表超详解（C++）

算法 java leetcode 动态规划课堂5-----子序列问题（动态规划 + 哈希表）

散列表 Rust踩雷笔记（2）——一道hard带来的思考[哈希表、字符串、滑动窗口]

发表评论取消回复

夸智网

散列表 数据结构 模拟实现哈希表超详解（C++）

算法 java leetcode 动态规划课堂5-----子序列问题（动态规划 + 哈希表）

散列表 Rust踩雷笔记（2）——一道hard带来的思考[哈希表、字符串、滑动窗口]

相关文章

发表评论取消回复

散列表数据结构模拟实现哈希表超详解（C++）