HBase Architecture: Use Cases, Components & Data Model

⚡ Smart summary

HBase architecture is built from four coordinating components — HMaster, Region Servers, ZooKeeper, and HDFS — that store data in a column-oriented model, split it into regions, and serve low-latency random reads and writes.

🧭 HMaster: Assigns regions to Region Servers, handles load balancing and failover, and manages schema and metadata changes.
🗄️ Region Servers: Serve client read and write requests, host regions, and split regions automatically as data grows.
🧱 Regions and stores: Each region keeps one store per column family, built from a MemStore in memory and HFiles on disk.
🔗 ZooKeeper: Coordinates the cluster, tracks server failures, and holds the quorum configuration clients use to connect.
🧮 Data model: Tables group column families and rows, and a row key acts as the primary key for every access.
⚡ HBase vs HDFS: HBase adds low-latency random reads and writes on top of HDFS batch storage.


Apache HBase is a distributed, column-oriented NoSQL database that runs on top of Hadoop and the Hadoop Distributed File System (HDFS). Its architecture combines a coordinating master, region servers, and ZooKeeper to store very large tables and serve fast random reads and writes.

HBase Architecture and Its Important Components

HBase architecture has the following main components:

HMaster
HRegionServer
HRegions
ZooKeeper
HDFS

Below is a detailed architecture of HBase with its components, as shown in the diagram.

HMaster

HMaster is the implementation of the Master server in HBase architecture. It monitors all Region Server instances in the cluster and acts as the interface for all metadata changes. In a distributed cluster environment, the Master typically runs on the NameNode host. The Master runs several background threads.

The following are important roles performed by HMaster in HBase:

Plays a vital role in the performance and maintenance of nodes in the cluster.
HMaster handles administrative operations and distributes services to different region servers.
HMaster assigns regions to region servers.
HMaster controls load balancing and failover to distribute the load across the nodes in the cluster.
When a client wants to change a schema or perform any other metadata operation, HMaster takes responsibility for these operations.
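To make region assignment and load balancing concrete, here is a minimal Python sketch that assumes a simple count-based policy. The `balance` function and the server and region names are hypothetical. HBase's real balancer weighs more factors than region counts. Regions whose server is no longer live are reassigned first, which mirrors failover. After that, regions are moved one at a time from the busiest server to the idlest until the counts differ by at most one.

```python
def balance(assignment: dict[str, str], live_servers: list[str]) -> dict[str, str]:
    """Return a new region -> server map (hypothetical, count-based policy)."""
    result: dict[str, str] = {}
    load = {server: 0 for server in live_servers}
    orphans = []
    for region, server in sorted(assignment.items()):
        if server in load:
            result[region] = server
            load[server] += 1
        else:
            orphans.append(region)  # its server is gone, so it needs failover

    # Failover: give each orphaned region to the currently least-loaded live server.
    for region in orphans:
        target = min(live_servers, key=lambda s: (load[s], s))
        result[region] = target
        load[target] += 1

    # Balancing: move one region at a time from the busiest to the idlest server.
    while True:
        busiest = max(live_servers, key=lambda s: (load[s], s))
        idlest = min(live_servers, key=lambda s: (load[s], s))
        if load[busiest] - load[idlest] <= 1:
            return result
        region = min(r for r, s in result.items() if s == busiest)
        result[region] = idlest
        load[busiest] -= 1
        load[idlest] += 1
```

For example, if four regions all sit on `rs1`, balancing them across `rs1` and `rs2` moves two of them to `rs2`.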

Some of the methods exposed by the HMaster interface are primarily metadata-oriented methods:

Table (create table, remove table, enable, disable)
ColumnFamily (add column, modify column)
Region (move, assign)

The client communicates in a bi-directional way with both HMaster and ZooKeeper. For read and write operations, it directly contacts the HRegion servers. HMaster assigns regions to region servers and, in turn, checks the health status of region servers.

In the entire architecture, we have multiple region servers. Each region server has an HLog (its write-ahead log), which records every edit before it is applied.

HBase Region Servers

When an HBase Region Server receives read and write requests from the client, it assigns each request to the specific region where the relevant column family resides. The client contacts the HRegion servers directly and does not need HMaster permission to communicate with them. The client needs HMaster only for operations that involve metadata and schema changes.
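Because the client talks to Region Servers directly, it must work out which region holds a given row key. In HBase, the client gets this information from the `hbase:meta` table, finds the location of that table through ZooKeeper, and caches the result. The Python sketch below assumes a small, hypothetical region map and shows only the lookup step. Regions are sorted by start key, and a binary search finds the region whose range contains the row key. Python string comparison stands in for HBase's byte-wise ordering, which gives the same order for ASCII keys.

```python
import bisect

# Hypothetical region map for one table, sorted by start key.
# Each region covers [start_key, next region's start_key); "" means "from the first key".
REGIONS = [
    ("", "rs1"),   # rows before "g"
    ("g", "rs2"),  # rows from "g" up to, but not including, "p"
    ("p", "rs3"),  # rows from "p" onward
]
START_KEYS = [start for start, _ in REGIONS]

def locate(row_key: str) -> tuple[str, str]:
    """Return (region start key, region server) for the region holding row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region we want is the one just before it.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[index]
```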

HRegionServer is the Region Server implementation. It is responsible for serving and managing regions, or the data that is present in a distributed cluster. The region servers run on the Data Nodes present in the Hadoop cluster.

HMaster can contact multiple HRegion servers. Each HRegion server performs the following functions:

Hosting and managing regions
Splitting regions automatically
Handling read and write requests
Communicating directly with the client

HBase Regions

HRegions are the basic building elements of an HBase cluster. Tables are distributed across the cluster as regions, and each region holds a contiguous range of a table's rows across all of its column families. A region contains multiple stores, one for each column family. Each store mainly consists of two components: the MemStore and HFiles.
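The region and store layout described above can be modelled with a few Python data classes. This is a hypothetical sketch, not HBase's internal classes. The `Region` and `Store` names, the string row keys, and the use of plain dicts and lists for the MemStore and HFiles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    """One store per column family: an in-memory MemStore plus on-disk HFiles."""
    family: str
    memstore: dict = field(default_factory=dict)  # recent edits, kept in memory
    hfiles: list = field(default_factory=list)    # flushed, immutable files

@dataclass
class Region:
    """A contiguous row range [start_key, end_key) of one table."""
    table: str
    start_key: str
    end_key: str  # "" means "up to the end of the table"
    stores: dict = field(default_factory=dict)  # column family -> Store

    @classmethod
    def create(cls, table: str, start_key: str, end_key: str, families: list[str]) -> "Region":
        # One Store per column family defined in the table schema.
        return cls(table, start_key, end_key, {f: Store(f) for f in families})

    def contains(self, row_key: str) -> bool:
        return row_key >= self.start_key and (self.end_key == "" or row_key < self.end_key)
```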

ZooKeeper

HBase uses ZooKeeper, a centralized service that maintains configuration information and provides distributed synchronization. Distributed synchronization coordinates the distributed applications running across the cluster by providing coordination services between nodes. If the client wants to communicate with regions, it must contact ZooKeeper first.

It is an open source project, and it provides many important services.

Services provided by ZooKeeper:

Maintains configuration information
Provides distributed synchronization
Establishes client communication with region servers
Provides ephemeral nodes that represent different region servers
Lets the Master server use these ephemeral nodes to discover available servers in the cluster
Tracks server failure and network partitions

The Master and HBase slave nodes (region servers) register themselves with ZooKeeper. The client needs access to the ZooKeeper (ZK) quorum configuration to connect with the master and region servers.

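When a node in the HBase cluster fails, its ZooKeeper session expires and the ephemeral node that represents it is removed. ZooKeeper notifies the HMaster, which then starts recovery, for example by reassigning the failed server's regions to live region servers.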
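The failure-detection idea behind ephemeral nodes can be sketched in Python. `ToyRegistry` is a hypothetical stand-in, not the ZooKeeper API. Each server registers a node and keeps it alive with heartbeats. If no heartbeat arrives within the session timeout, the server loses its node. The list returned by `expire_sessions` is what a watcher such as the HMaster would act on. Time is passed in explicitly so the behaviour is deterministic.

```python
class ToyRegistry:
    """Hypothetical stand-in for ZooKeeper ephemeral nodes (for example under /hbase/rs)."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout
        self.last_heartbeat: dict[str, float] = {}

    def register(self, server: str, now: float) -> None:
        self.last_heartbeat[server] = now  # create the ephemeral node

    def heartbeat(self, server: str, now: float) -> None:
        if server in self.last_heartbeat:
            self.last_heartbeat[server] = now  # keep the session alive

    def expire_sessions(self, now: float) -> list[str]:
        """Remove nodes whose session timed out; return servers needing recovery."""
        dead = sorted(s for s, t in self.last_heartbeat.items()
                      if now - t > self.session_timeout)
        for server in dead:
            del self.last_heartbeat[server]
        return dead

    def live_servers(self) -> list[str]:
        return sorted(self.last_heartbeat)
```

With a 30-second timeout, a server that last sent a heartbeat at 25 s survives a check at 40 s. A server silent since 0 s is reported dead.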

HDFS

HDFS is the Hadoop Distributed File System. As the name implies, it provides a distributed environment for storage, and it is a file system designed to run on commodity hardware. It stores each file in multiple blocks, and to maintain fault tolerance, the blocks are replicated across a Hadoop cluster.

HDFS provides a high degree of fault tolerance. Storage and processing capacity grows by adding more cheap commodity nodes to the cluster, rather than by upgrading an existing setup.

By default, the data in each block is replicated across 3 nodes. If any single node goes down, no data is lost, and HDFS re-replicates the affected blocks to restore the replication factor.
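The replication idea can be sketched as follows, assuming a simplified round-robin placement with a replication factor of 3 (the HDFS default). `place_blocks` and `survives` are hypothetical helpers. Real HDFS uses 128 MB blocks by default and places replicas with rack awareness. The tiny block size here only keeps the example small.

```python
def place_blocks(file_size: int, block_size: int, nodes: list[str],
                 replication: int = 3) -> dict[int, list[str]]:
    """Split a file into fixed-size blocks and place each block on distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        first = block % len(nodes)  # rotate the starting node per block
        placement[block] = [nodes[(first + i) % len(nodes)] for i in range(replication)]
    return placement

def survives(placement: dict[int, list[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica after failed_node is lost."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())
```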

HBase stores its data files (HFiles) and write-ahead logs in HDFS, which keeps large amounts of data distributed across the cluster.

HBase Data Model

The HBase Data Model is a set of components that consists of Tables, Rows, Column families, Cells, Columns, and Versions. HBase tables contain column families and rows, and each row is identified by a row key that acts as the primary key. A column in an HBase table represents an attribute of the objects stored in it.

The HBase Data Model consists of the following elements:

Set of tables
Each table with column families and rows
Each table must have an element defined as a primary key.
The row key acts as a primary key in HBase.
Any access to HBase tables uses this primary key.
Each column present in HBase denotes an attribute corresponding to an object.
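One way to picture this data model is as a nested map: row key, then column family, then column qualifier, then timestamp, then value. The Python sketch below uses plain dicts to represent that structure. `put` and `get_latest` are hypothetical helpers, not HBase client calls. HBase also keeps these maps sorted, which plain dicts do not. Every lookup starts from the row key, matching the rule that all access goes through the primary key.

```python
from typing import Optional

# row key -> column family -> column qualifier -> timestamp -> value
Table = dict[str, dict[str, dict[str, dict[int, bytes]]]]

def put(table: Table, row: str, family: str, qualifier: str, ts: int, value: bytes) -> None:
    """Store one version of one cell (hypothetical helper)."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})[ts] = value

def get_latest(table: Table, row: str, family: str, qualifier: str) -> Optional[bytes]:
    """Look up a cell through its row key and return the newest version, if any."""
    versions = table.get(row, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]
```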

HBase Use Cases

The following are examples of HBase use cases with a detailed explanation of the solution HBase provides to various technical problems.

Problem statement	Solution
The telecom industry faces the following technical challenges: storing billions of Call Detail Record (CDR) log records generated by the telecom domain; providing real-time access to CDR logs and billing information of customers; and providing a cost-effective solution compared to traditional database systems.	HBase is used to store billions of rows of call detail records. If 20 TB of data is added per month to the existing RDBMS database, its performance deteriorates. HBase is the best solution for handling the large amount of data in this use case. HBase performs fast queries and displays the records.
The banking industry generates millions of records on a daily basis. In addition, the banking industry also needs an analytics solution that can detect fraud in money transactions.	To store, process, and update vast volumes of data and to perform analytics, an ideal solution is HBase integrated with several Hadoop ecosystem components.

Apart from that, HBase can be used:

Whenever there is a need for write-heavy applications.
For performing online log analytics and generating compliance reports.

Storage Mechanism in HBase

HBase is a column-oriented database, and data is stored in tables. The tables are sorted by RowId. As shown below, each row is identified by its RowId and holds the column families defined for the table.

The column families are defined in the schema, and their data is stored as key-value pairs. Each column family can have multiple columns. The column values are stored on disk. Each cell of the table has its own metadata, such as a timestamp and other information.

The column-oriented storage layout, with row keys, column families, and cells, is shown below.

The following are the key terms representing an HBase table schema:

Table: Collection of rows present.
Row: Collection of column families.
Column Family: Collection of columns.
Column: Collection of key-value pairs.
Namespace: Logical grouping of tables.
Cell: A {row, column, version} tuple that exactly specifies a cell definition in HBase.
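Because a cell is addressed by row, column, and version, HBase keeps cells sorted first by row key, then by column (family and qualifier), and then by timestamp in descending order. As a result, the newest version of a cell is read first. The Python sketch below reproduces that ordering with a sort key. The `Cell` tuple and the sample data are hypothetical. Real HBase compares raw bytes and also orders by cell type, which is left out here.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str

def sort_key(cell: Cell):
    # Row, then column (family:qualifier) ascending; timestamp descending,
    # so the newest version of each cell comes first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)

cells = [
    Cell("row2", "cf", "a", 5, "x"),
    Cell("row1", "cf", "b", 1, "old"),
    Cell("row1", "cf", "b", 9, "new"),
    Cell("row1", "cf", "a", 3, "y"),
]
ordered = sorted(cells, key=sort_key)
```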

Column-Oriented vs Row-Oriented Storages

Column-oriented and row-oriented storages differ in their storage mechanism. Traditional relational models store data in a row-based format, in terms of rows of data, while column-oriented storages store data tables in terms of columns and column families.

The following table gives some key differences between these two storages.

Column-oriented Database	Row-oriented Database
Used when the situation involves processing and analytics, such as Online Analytical Processing and its applications.	Online Transactional Processing, such as banking and finance domains, uses this approach.
The amount of data that can be stored in this model is very large, in terms of petabytes.	It is designed for a small number of rows and columns.
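The practical difference shows up when a query needs only one attribute. In the Python sketch below, the same three hypothetical records are laid out row-wise and column-wise. Summing one field in the row layout walks every record, while the column layout reads a single list. HBase actually groups data by column family rather than by individual column, so treat the per-column layout as a simplification.

```python
rows = [
    {"id": 1, "name": "Ann", "balance": 120},
    {"id": 2, "name": "Bob", "balance": 80},
    {"id": 3, "name": "Cid", "balance": 200},
]

# Row-oriented: all fields of one record are stored together.
row_layout = [(r["id"], r["name"], r["balance"]) for r in rows]

# Column-oriented: all values of one column are stored together.
column_layout = {col: [r[col] for r in rows] for col in ("id", "name", "balance")}

def total_balance_row_store(layout):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in layout)

def total_balance_column_store(layout):
    # Reads just the one column's values.
    return sum(layout["balance"])
```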

HBase Read and Write Data Explained

The read and write operations from the client into the HFile are shown in the diagram below.

Step 1) To write data, the client first communicates with the Region Server, which passes the request to the appropriate region.

Step 2) The region contacts the MemStore for storing the data associated with the column family.

Step 3) The data is first stored in the MemStore, where it is sorted, and is later flushed into an HFile. The main reason for using the MemStore is to accumulate data sorted by row key before it is written to the distributed file system. The MemStore is placed in the Region Server main memory, while HFiles are written into HDFS.

Step 4) The client wants to read data from the regions.

Step 5) The Region Server handles the request and first checks the MemStore, which holds the most recent writes.

Step 6) The Region Server also reads the HFiles as needed, and the retrieved data is returned to the client.
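The write and read steps above can be tied together in a small Python model of one store. `ToyStore` is hypothetical and heavily simplified. It counts distinct row keys to decide when to flush (HBase uses a memory size), uses a counter as the timestamp, and looks up HFiles with plain dicts instead of block indexes, Bloom filters, and the block cache. Writes go to the WAL, then to the MemStore, and are flushed as a sorted file. Reads merge the MemStore with all HFiles and return the newest version.

```python
class ToyStore:
    """Hypothetical write/read path for one store of one region."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold  # distinct row keys before a flush
        self.wal = []        # (row_key, (timestamp, value)) in arrival order
        self.memstore = {}   # row_key -> (timestamp, value), in memory
        self.hfiles = []     # each flush: a list of (row_key, cell) sorted by row key
        self._clock = 0

    def put(self, row_key: str, value: str) -> None:
        self._clock += 1
        cell = (self._clock, value)
        self.wal.append((row_key, cell))  # 1. log the edit for durability
        self.memstore[row_key] = cell     # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                  # 3. write out a sorted, immutable file

    def flush(self) -> None:
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key: str):
        """Return the newest value for row_key from the MemStore and HFiles."""
        candidates = []
        if row_key in self.memstore:
            candidates.append(self.memstore[row_key])
        for hfile in self.hfiles:
            cell = dict(hfile).get(row_key)
            if cell is not None:
                candidates.append(cell)
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]
```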

The MemStore holds in-memory modifications to the store. The hierarchy of objects in HBase regions, from top to bottom, is shown in the table below.

Table	HBase table present in the HBase cluster
Region	HRegions for the tables
Store	One Store per column family for each region of the table
MemStore	MemStore for each store for each region for the table. It sorts data before flushing into HFiles. Write and read performance increase because of sorting.
StoreFile	StoreFiles for each Store for each region of the table
Block	Blocks present in StoreFiles

HBase vs. HDFS

HBase runs on top of HDFS and Hadoop. Some key differences between HDFS and HBase are in terms of data operations and processing.

HBase	HDFS
Low-latency operations	High-latency operations
Random reads and writes	Write once, read many times
Accessed through shell commands, a client API in Java, REST, Avro, or Thrift	Primarily accessed through MapReduce (MR) jobs
Both storage and processing can be performed	It is used only for storage

Typical industry applications use HBase together with Hadoop, for example stock exchange data and online banking data operations, where HBase is the best-suited solution. Once your cluster is ready, you can read and write data in HBase or install HBase on a fresh node.

Frequently Asked Questions

Yes. HBase is a distributed, column-oriented NoSQL database modeled on Google Bigtable and built on top of HDFS. It stores sparse data in tables of column families and does not use fixed schemas or SQL joins like a relational database.

The WAL, also called HLog, records every write on the Region Server before it enters the MemStore. It is stored on HDFS, so if a Region Server crashes before a flush, HBase replays the WAL to recover the unsaved edits.
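WAL replay can be sketched in Python. The sketch assumes that each WAL entry carries an increasing sequence id and that the store remembers the highest sequence id already flushed to an HFile. `recover_memstore` is a hypothetical helper. It skips edits that are already safe on disk and replays the rest in order to rebuild the lost MemStore.

```python
def recover_memstore(wal: list[tuple[int, str, str]], flushed_up_to: int) -> dict[str, str]:
    """Rebuild a lost MemStore by replaying WAL edits newer than the last flush.

    wal: (sequence_id, row_key, value) entries in write order.
    flushed_up_to: highest sequence id already persisted in an HFile.
    """
    memstore = {}
    for seq_id, row_key, value in wal:
        if seq_id > flushed_up_to:     # older edits are already safe in HFiles
            memstore[row_key] = value  # later edits overwrite earlier ones
    return memstore
```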

Compaction merges HFiles to keep reads fast. Minor compaction combines several small adjacent HFiles into one. Major compaction rewrites all HFiles of a column family into a single file and physically removes deleted and expired cells.
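The difference between the two kinds of compaction can be sketched in Python. This model is hypothetical and simplified. Each HFile is a dict from row key to a `(timestamp, value)` pair, and a value of `None` stands in for a delete marker (tombstone). Only the newest version of each row is kept, whereas HBase can retain several versions. A minor compaction merges files but keeps tombstones. A major compaction also drops deleted cells and, when a TTL is given, expired ones.

```python
TOMBSTONE = None  # hypothetical stand-in for an HBase delete marker

def minor_compact(hfiles: list[dict]) -> dict:
    """Merge several HFiles into one sorted file, keeping delete markers."""
    merged = {}
    for hfile in hfiles:
        for row, (ts, value) in hfile.items():
            if row not in merged or ts >= merged[row][0]:
                merged[row] = (ts, value)  # newest version wins
    return dict(sorted(merged.items()))

def major_compact(hfiles: list[dict], now: int = 0, ttl=None) -> dict:
    """Merge all HFiles and physically drop deleted and (optionally) expired cells."""
    merged = minor_compact(hfiles)
    return {row: (ts, value) for row, (ts, value) in merged.items()
            if value is not TOMBSTONE and (ttl is None or now - ts <= ttl)}
```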

Both are Bigtable-inspired NoSQL stores, but HBase runs on HDFS with a single active HMaster and strong consistency, while Cassandra is masterless with tunable, eventually consistent replication. HBase suits Hadoop analytics; Cassandra suits always-on writes.

Design row keys so reads and writes spread evenly across regions. Avoid monotonically increasing keys, which create hotspots on one Region Server. Use salting, hashing, or field reversal, and keep keys short because they repeat in every cell.
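Salting can be sketched in Python by prefixing each row key with a stable hash bucket. Keys that would otherwise arrive in increasing order, such as timestamps, then land in different key ranges and therefore in different regions. The bucket count and the `salted_key` helper are hypothetical choices. CRC32 from the standard library is used only as a convenient stable hash. The trade-off is that a scan over a key range must now query every bucket.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical; pick based on how many regions should share the load

def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix row_key with a deterministic two-digit bucket number and a dash."""
    bucket = zlib.crc32(row_key.encode("utf-8")) % buckets
    return f"{bucket:02d}-{row_key}"
```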

A region splits automatically when its store grows past a configured size threshold. The Region Server divides it into two child regions at the middle row key, and HMaster may reassign one of them to another server to balance the load.
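The split rule described above can be sketched in Python. `maybe_split` is a hypothetical helper that treats the region as a sorted list of row keys plus a size. In real HBase, the threshold comes from settings such as `hbase.hregion.max.filesize` together with the configured split policy. The split point is also derived from the store files rather than from a full list of keys.

```python
from typing import Optional

def maybe_split(region_rows: list[str], size_bytes: int,
                threshold_bytes: int) -> Optional[tuple[str, list[str], list[str]]]:
    """Split a region at its middle row key once it grows past the threshold.

    region_rows: sorted row keys in the region (a stand-in for its store files).
    Returns None if no split is needed, else (split_key, left_rows, right_rows).
    """
    if size_bytes <= threshold_bytes or len(region_rows) < 2:
        return None
    split_key = region_rows[len(region_rows) // 2]
    left = [r for r in region_rows if r < split_key]    # daughter A: [start, split_key)
    right = [r for r in region_rows if r >= split_key]  # daughter B: [split_key, end)
    return split_key, left, right
```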

AI and machine-learning tools analyze query and access patterns to suggest row-key and column-family designs that avoid hotspots. They also scan Region Server metrics and logs to flag anomalies such as skewed regions or failing nodes early.

Yes. GitHub Copilot drafts HBase Java client code, shell commands, and scan filters from a short comment. Review its output for correct table names, column families, and API classes such as Connection and Table before running it on a real cluster.
