What is Hashing in DBMS?

In DBMS, hashing is a technique to directly search the location of desired data on the disk without using index structure. Data is stored in the form of data blocks whose address is generated by applying a hash function in the memory location where these records are stored known as a data block or data bucket.

In this DBMS tutorial, you will learn,

Why do we need Hashing?

Here, are the situations in the DBMS where you need to apply the Hashing method:

  • For a huge database structure, it's tough to search all the index values through all its level and then you need to reach the destination data block to get the desired data.
  • Hashing method is used to index and retrieve items in a database as it is faster to search that specific item using the shorter hashed key instead of using its original value.
  • Hashing is an ideal method to calculate the direct location of a data record on the disk without using index structure.
  • It is also a helpful technique for implementing dictionaries.

Important Terminologies using in Hashing

Here, are important terminologies which are used in Hashing:

  • Data bucket – Data buckets are memory locations where the records are stored. It is also known as Unit Of Storage.
  • Key: A DBMS key is an attribute or set of an attribute which helps you to identify a row(tuple) in a relation(table). This allows you to find the relationship between two tables.
  • Hash function: A hash function, is a mapping function which maps all the set of search keys to the address where actual records are placed.
  • Linear Probing – Linear probing is a fixed interval between probes. In this method, the next available data block is used to enter the new record, instead of overwriting on the older record.
  • Quadratic probing- It helps you to determine the new bucket address. It helps you to add Interval between probes by adding the consecutive output of quadratic polynomial to starting value given by the original computation.
  • Hash index – It is an address of the data block. A hash function could be a simple mathematical function to even a complex mathematical function.
  • Double Hashing –Double hashing is a computer programming method used in hash tables to resolve the issues of has a collision.
  • Bucket Overflow: The condition of bucket-overflow is called collision. This is a fatal stage for any static has to function.

There are mainly two types of SQL hashing methods:

  1. Static Hashing
  2. Dynamic Hashing

Static Hashing

In the static hashing, the resultant data bucket address will always remain the same.

Therefore, if you generate an address for say Student_ID = 10 using hashing function mod(3), the resultant bucket address will always be 1. So, you will not see any change in the bucket address.

Therefore, in this static hashing method, the number of data buckets in memory always remains constant.

Static Hash Functions

  • Inserting a record: When a new record requires to be inserted into the table, you can generate an address for the new record using its hash key. When the address is generated, the record is automatically stored in that location.
  • Searching: When you need to retrieve the record, the same hash function should be helpful to retrieve the address of the bucket where data should be stored.
  • Delete a record: Using the hash function, you can first fetch the record which is you wants to delete. Then you can remove the records for that address in memory.

Static hashing is further divided into

  1. Open hashing
  2. Close hashing.

Open Hashing

In Open hashing method, Instead of overwriting older one the next available data block is used to enter the new record, This method is also known as linear probing.

For example, A2 is a new record which you wants to insert. The hash function generates address as 222. But it is already occupied by some other value. That's why the system looks for the next data bucket 501 and assigns A2 to it.

How Open Hash Works

Close Hashing

In the close hashing method, when buckets are full, a new bucket is allocated for the same hash and result are linked after the previous one.

Dynamic Hashing

Dynamic hashing offers a mechanism in which data buckets are added and removed dynamically and on demand. In this hashing, the hash function helps you to create a large number of values.

Comparison of Ordered Indexing and Hashing

Parameters Order Indexing Hashing
Storing of address Addresses in the memory are sorted according to a key value called the primary key Addresses are always generated using a hash function on the key value.
Performance It can decrease when the data increases in the hash file. As it stores the data in a sorted form when there is any (insert/delete/update) operation performed which decreases its performance. Performance of hashing will be best when there is a constant addition and deletion of data. However, when the database is huge, then hash file organization and its maintenance will be costlier.
Use for Preferred for range retrieval of data- which means whenever there is retrieval data for a particular range, this method is an ideal option. This is an ideal method when you want to retrieve a particular record based on the search key. However, it will only perform well when the hash function is on the search key.
Memory management There will be many unused data blocks because of the delete/update operation. These data blocks can't be released for re-use. That's why regular maintenance of the memory is required. In static and dynamic hashing methods, memory is always managed. Bucket overflow is also handled perfectly to extend static hashing.

What is Collision?

Hash collision is a state when the resultant hashes from two or more data in the data set, wrongly map the same place in the hash table.

How to deal with Hashing Collision?

There are two technique which you can use to avoid a hash collision:

  1. Rehashing: This method, invokes a secondary hash function, which is applied continuously until an empty slot is found, where a record should be placed.
  2. Chaining: Chaining method builds a Linked list of items whose key hashes to the same value. This method requires an extra link field to each table position.

Summary:

  • In DBMS, hashing is a technique to directly search the location of desired data on the disk without using index structure.
  • Hashing method is used to index and retrieve items in a database as it is faster to search that specific item using the shorter hashed key instead of using its original value.
  • Data bucket, Key , Hash function, Linear Probing, Quadratic probing , Hash index, Double Hashing, Bucket Overflow are important terminologies used in hashing
  • Two types of hashing methods are 1) static hashing 2) dynamic hashing
  • In the static hashing, the resultant data bucket address will always remain the same.
  • Dynamic hashing offers a mechanism in which data buckets are added and removed dynamically and on demand.
  • In order Indexing addresses in the memory are sorted according to a critical value while in hashing addresses are always generated using a hash function on the key value.
  • Hash collision is a state when the resultant hashes from two or more data in the data set, wrongly map the same place in the hash table.
  • Rehashing and chaining are two methods which help you to avoid hashing collision.

 

Guru99 is Sponsored by AQUAFOLD
Aquafold

Aqua Data Studio is proud to sponsor Guru99's informative website for database professionals. Take your productivity to new heights with query tools, visual analytics, and ER diagramming for multiple platforms.