HBase Architecture: Use Cases, Components & Data Model

⚡ Smart Summary

HBase architecture is built from four coordinating components — HMaster, Region Servers, ZooKeeper, and HDFS — that store data in a column-oriented model, split it into regions, and serve low-latency random reads and writes.

🧭 HMaster: Assigns regions to Region Servers, handles load balancing and failover, and manages schema and metadata changes.
Region Servers: Serve client read and write requests, host regions, and split regions automatically as data grows.
🧱 Regions and stores: Each region keeps one store per column family, built from a MemStore in memory and HFiles on disk.
🔗 ZooKeeper: Coordinates the cluster, tracks server failures, and holds the quorum configuration clients use to connect.
🧮 Data model: Tables group column families and rows, and a row key acts as the primary key for every access.
⚡ HBase vs HDFS: HBase adds low-latency random reads and writes on top of HDFS batch storage.

Read more

Apache HBase is a distributed, column-oriented NoSQL database that runs on top of Hadoop and the Hadoop Distributed File System (HDFS). Its architecture combines a coordinating master, region servers, and ZooKeeper to store very large tables and serve fast random reads and writes.

HBase Architecture and Its Important Components

HBase architecture has the following main components:

HMaster
HRegionServer
HRegions
ZooKeeper
HDFS

Below is a detailed architecture of HBase with its components, as shown in the diagram.

HMaster

HMaster is the implementation of the Master server in the HBase architecture. It monitors all Region Server instances in the cluster and acts as the interface for all metadata changes. In a distributed cluster, the Master typically runs on the same node as the HDFS NameNode. The Master also runs several background threads.

The following are important roles performed by HMaster in HBase:

Plays a vital role in the performance of the cluster and in maintaining its nodes.
Provides administrative functions and distributes services to the different region servers.
Assigns regions to region servers.
Controls load balancing and failover to spread the load over the nodes in the cluster.
Takes responsibility for schema changes and other metadata operations that a client requests.

Some of the methods exposed by the HMaster interface are primarily metadata-oriented methods:

Table (createTable, removeTable, enable, disable)
ColumnFamily (add column, modify column)
Region (move, assign)

The client communicates in a bi-directional way with both HMaster and ZooKeeper. For read and write operations, it directly contacts the HRegion servers. HMaster assigns regions to region servers and, in turn, checks the health status of region servers.

The architecture contains multiple region servers. Each region server holds an HLog, the write-ahead log that records the edits made on that server.

HBase Region Servers

When an HBase Region Server receives write and read requests from the client, it assigns the request to a specific region, where the actual column family resides. The client can directly contact the HRegion servers; there is no need for mandatory HMaster permission for the client to communicate with the HRegion servers. The client requires HMaster help only when operations related to metadata and schema changes are required.

HRegionServer is the Region Server implementation. It is responsible for serving and managing regions, or the data that is present in a distributed cluster. The region servers run on the Data Nodes present in the Hadoop cluster.

HMaster can be in contact with multiple HRegion servers, each of which performs the following functions:

Hosting and managing regions
Splitting regions automatically
Handling read and write requests
Communicating with the client directly
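
The routing of a request to the region that holds the data can be sketched in a few lines. This is a minimal illustration, not HBase code: it assumes each region covers a contiguous range of row keys identified by its start key, and the region boundaries and server names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical region layout: each region covers [start_key, next start_key).
# The first region starts at the empty key, which sorts before every row key.
REGIONS = [
    ("", "regionserver-1"),
    ("g", "regionserver-2"),
    ("n", "regionserver-3"),
    ("t", "regionserver-1"),
]

def locate_region(row_key):
    """Return (start_key, server) of the region whose range contains row_key."""
    start_keys = [start for start, _ in REGIONS]
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    index = bisect_right(start_keys, row_key) - 1
    return REGIONS[index]

print(locate_region("alice"))   # ('', 'regionserver-1')
print(locate_region("harry"))   # ('g', 'regionserver-2')
print(locate_region("zoe"))     # ('t', 'regionserver-1')
```

Because the start keys are kept sorted, the lookup is a binary search, which is one reason sorted row keys matter so much in HBase.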

HBase Regions

HRegions are the basic building blocks of an HBase cluster. Each region holds a portion of a table and is made up of column families. A region contains multiple stores, one for each column family, and each store mainly consists of two components: the MemStore and the HFile.

ZooKeeper

ZooKeeper in HBase is a centralized monitoring server that maintains configuration information and provides distributed synchronization. Distributed synchronization coordinates the distributed applications running across the cluster, providing coordination services between nodes. If the client wants to communicate with regions, the client has to approach ZooKeeper first.

It is an open source project, and it provides many important services.

Services provided by ZooKeeper:

Maintains configuration information
Provides distributed synchronization
Establishes client communication with region servers
Provides ephemeral nodes that represent different region servers
Lets the Master server use these ephemeral nodes to discover available servers in the cluster
Tracks server failure and network partitions

The Master and HBase slave nodes (region servers) register themselves with ZooKeeper. The client needs access to the ZooKeeper (ZK) quorum configuration to connect with the master and region servers.

When a node in the HBase cluster fails, its ephemeral node in ZooKeeper expires. The ZooKeeper quorum reports the failure, and HMaster can then reassign the regions of the failed server.
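
The role of ephemeral nodes in failure detection can be illustrated with a toy model. The sketch below is not the ZooKeeper API: `TinyCoordinator`, its methods, and the node paths are hypothetical names chosen for illustration. It only shows the idea that a node disappears when the session that created it ends, and that a watcher (here playing the Master) is told about it.

```python
class TinyCoordinator:
    """Toy stand-in for ZooKeeper: ephemeral nodes live as long as their session."""

    def __init__(self):
        self.ephemeral = {}   # node path -> owning session id
        self.watchers = []    # callbacks invoked with the path of a removed node

    def register(self, session_id, path):
        self.ephemeral[path] = session_id

    def watch(self, callback):
        self.watchers.append(callback)

    def expire_session(self, session_id):
        # When a session ends, every ephemeral node it owned disappears.
        gone = [p for p, s in self.ephemeral.items() if s == session_id]
        for path in gone:
            del self.ephemeral[path]
            for callback in self.watchers:
                callback(path)

    def live_servers(self):
        return sorted(self.ephemeral)


coordinator = TinyCoordinator()
failed = []
coordinator.watch(failed.append)   # the "master" records servers that vanish

coordinator.register("session-1", "/hbase/rs/regionserver-1")
coordinator.register("session-2", "/hbase/rs/regionserver-2")
coordinator.expire_session("session-2")   # simulate a crashed region server

print(coordinator.live_servers())  # ['/hbase/rs/regionserver-1']
print(failed)                      # ['/hbase/rs/regionserver-2']
```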

HDFS

HDFS is the Hadoop Distributed File System. As the name implies, it provides a distributed environment for storage, and it is a file system designed to run on commodity hardware. It stores each file in multiple blocks, and to maintain fault tolerance, the blocks are replicated across a Hadoop cluster.

HDFS provides a high degree of fault tolerance and runs on cheap commodity hardware. Storage and processing capacity grow by adding nodes to the cluster, which makes it more cost-effective than scaling up a single existing system.

By default, the data stored in each block is replicated across 3 nodes, so if any one node goes down, no data is lost and the block can be recovered from the remaining copies.

HDFS gets in contact with the HBase components and stores a large amount of data in a distributed manner.

HBase Data Model

The HBase Data Model is a set of components that consists of Tables, Rows, Column families, Cells, Columns, and Versions. HBase tables contain column families and rows, with elements defined as primary keys. A column in the HBase data model table represents an attribute of the objects.

The HBase Data Model consists of the following elements:

Set of tables
Each table with column families and rows
Each table must have an element defined as a primary key.
The row key acts as a primary key in HBase.
Any access to HBase tables uses this primary key.
Each column present in HBase denotes an attribute corresponding to an object.
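
The elements listed above can be pictured as a nested map. The following is a rough in-memory sketch, assuming a single table and using made-up row keys, column families, and values; it mirrors the table, row key, column family, column, and version hierarchy but is not how HBase stores data on disk.

```python
import time

# table[row_key][column_family][qualifier] -> list of (timestamp, value), newest first
table = {}

def put(row_key, family, qualifier, value, timestamp=None):
    ts = timestamp if timestamp is not None else int(time.time() * 1000)
    versions = table.setdefault(row_key, {}).setdefault(family, {}).setdefault(qualifier, [])
    versions.append((ts, value))
    versions.sort(reverse=True)   # keep the newest version first

def get(row_key, family, qualifier):
    """Return the newest value for a cell, or None if the cell does not exist."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, [])
    return versions[0][1] if versions else None

put("user#1001", "profile", "name", "Ada", timestamp=1)
put("user#1001", "profile", "name", "Ada L.", timestamp=2)
put("user#1001", "contact", "email", "ada@example.com", timestamp=1)

print(get("user#1001", "profile", "name"))   # Ada L.
print(get("user#1001", "contact", "phone"))  # None
```

Every lookup starts from the row key, which is why the row key behaves like a primary key. Cells that were never written simply do not exist, which is how sparse data stays cheap.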

HBase Use Cases

The following are examples of HBase use cases with a detailed explanation of the solution HBase provides to various technical problems.

Problem Statement	Solution
The telecom industry faces the following technical challenges: storing billions of Call Detail Record (CDR) log records generated by the telecom domain; providing real-time access to CDR logs and billing information of customers; and providing a cost-effective solution compared to traditional database systems.	HBase is used to store billions of rows of detailed call records. If 20 TB of data is added per month to an existing RDBMS database, performance will deteriorate. To handle a large amount of data in this use case, HBase is the best solution. HBase performs fast queries and displays records.
The banking industry generates millions of records on a daily basis. In addition, the banking industry also needs an analytics solution that can detect fraud in money transactions.	To store, process, and update vast volumes of data and to perform analytics, an ideal solution is HBase integrated with several Hadoop ecosystem components.

Apart from that, HBase can be used:

Whenever there is a need for write-heavy applications.
For performing online log analytics and generating compliance reports.

Storage Mechanism in HBase

HBase is a column-oriented database, and data is stored in tables. The tables are sorted by row key (RowId). As shown below, each RowId identifies a row, which is a collection of the column families present in the table.

The column families in the schema hold key-value pairs, and each column family can have multiple columns. The column values are stored on disk. Each cell of the table has its own metadata, such as a timestamp and other information.

The column-oriented storage layout, with row keys, column families, and cells, is shown below.

The following are the key terms representing an HBase table schema:

Table: Collection of rows present.
Row: Collection of column families.
Column Family: Collection of columns.
Column: Collection of key-value pairs.
Namespace: Logical grouping of tables.
Cell: A {row, column, version} tuple that exactly specifies a cell definition in HBase.

Column-oriented vs Row-oriented Storages

Column-oriented and row-oriented storages differ in their storage mechanism. As we all know, traditional relational models store data in a row-based format, in terms of rows of data. Column-oriented storages store data tables in terms of columns and column families.

The following table gives some key differences between these two storages.

Column-oriented Database	Row-oriented Database
Used when the situation involves processing and analytics, such as Online Analytical Processing and its applications.	Used for Online Transactional Processing, such as in the banking and finance domains.
The amount of data that can be stored in this model is very large, in terms of petabytes.	It is designed for a small number of rows and columns.

HBase Read and Write Data Explained

The read and write operations from the client into the HFile are shown in the diagram below.

Step 1) The client wants to write data and, in turn, first communicates with the Region Server and then the regions.

Step 2) The region contacts the MemStore for storing the data associated with the column family.

Step 3) The data is first stored in the MemStore, where it is sorted, and is later flushed into an HFile. The main reason for using the MemStore is that data must be written to the distributed file system sorted by row key. The MemStore resides in the Region Server's main memory, while HFiles are written to HDFS.

Step 4) The client wants to read data from the regions.

Step 5) The read request is served from the MemStore when the requested data is still held in memory.

Step 6) Otherwise, the read goes to the HFiles to get the data. The data is fetched and returned to the client.
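
The steps above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the HBase implementation: the `Store` class, the row-count flush threshold, and the linear scan of HFiles are all illustrative choices (real flushes are triggered by size in bytes, and real HFiles are indexed).

```python
class Store:
    """Toy store for one column family: one MemStore plus a list of HFiles."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}            # row key -> value, held in memory
        self.hfiles = []              # each HFile is an immutable sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by row key, write out as a new immutable file, then clear memory.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        # Newest data first: MemStore, then HFiles from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


store = Store(flush_threshold=3)
for key, value in [("row3", "c"), ("row1", "a"), ("row2", "b"), ("row4", "d")]:
    store.put(key, value)

print(store.hfiles)        # [[('row1', 'a'), ('row2', 'b'), ('row3', 'c')]]
print(store.memstore)      # {'row4': 'd'}
print(store.get("row2"))   # b
print(store.get("row4"))   # d
```

Note that the rows arrive out of order but the flushed file is sorted by row key, which is the property the MemStore exists to provide.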

The MemStore holds in-memory modifications to the store. The hierarchy of objects in HBase regions, from top to bottom, is shown in the table below.

Table	HBase table present in the HBase cluster
Region	HRegions for the presented tables
Store	One store per column family for each region of the table
MemStore	One MemStore per store for each region of the table. It sorts data before flushing into HFiles. Write and read performance increase because of sorting.
StoreFile	StoreFiles for each store for each region of the table
Block	Blocks present inside StoreFiles

HBase versus HDFS

HBase runs on top of HDFS and Hadoop. Some key differences between HDFS and HBase are in terms of data operations and processing.

HBase	HDFS
Low-latency operations	High-latency operations
Random reads and writes	Write once, read many times
Accessed through shell commands, a client API in Java, REST, Avro, or Thrift	Primarily accessed through MapReduce (MR) jobs
Both storage and processing can be performed	Used for storage only

Some typical IT industrial applications use HBase operations along with Hadoop. Applications include stock exchange data and online banking data operations, where HBase is the best-suited solution. Once your cluster is ready, you can read and write data in HBase or install HBase on a fresh node.

FAQs

Yes. HBase is a distributed, column-oriented NoSQL database modeled on Google Bigtable and built on top of HDFS. It stores sparse data in tables of column families and does not use fixed schemas or SQL joins like a relational database.

The WAL, also called HLog, records every write on the Region Server before it enters the MemStore. It is stored on HDFS, so if a Region Server crashes before a flush, HBase replays the WAL to recover the unsaved edits.
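
A minimal sketch of write-ahead logging and replay follows. It assumes a plain Python list as a stand-in for the log file on HDFS, and the `RegionServer` class and its `recover` method are hypothetical names used only for illustration.

```python
class RegionServer:
    """Toy region server: every write goes to the WAL before the MemStore."""

    def __init__(self, wal):
        self.wal = wal        # durable log (a list standing in for a file on HDFS)
        self.memstore = {}    # volatile: lost when the process crashes

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1. append to the log first
        self.memstore[row_key] = value      # 2. then apply in memory

    @classmethod
    def recover(cls, wal):
        # Rebuild the in-memory state by replaying logged edits in order.
        server = cls(wal)
        for row_key, value in wal:
            server.memstore[row_key] = value
        return server


durable_wal = []
server = RegionServer(durable_wal)
server.put("row1", "a")
server.put("row2", "b")
server.put("row1", "a2")

del server                                  # simulate a crash before any flush
recovered = RegionServer.recover(durable_wal)
print(recovered.memstore)                   # {'row1': 'a2', 'row2': 'b'}
```

Replaying in the original order matters: the later edit to `row1` must win over the earlier one.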

Compaction merges HFiles to keep reads fast. Minor compaction combines several small adjacent HFiles into one. Major compaction rewrites all HFiles of a column family into a single file and physically removes deleted and expired cells.
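
The effect of a major compaction can be sketched as a merge of sorted files. This is an illustrative model only: `TOMBSTONE` and `major_compact` are hypothetical names, each HFile is a list of (row key, value) pairs, and cell versions and expiry are left out for brevity.

```python
TOMBSTONE = object()   # marker written in place of a value when a cell is deleted

def major_compact(hfiles):
    """Merge HFiles (ordered oldest to newest) into one sorted file.

    Newer values overwrite older ones, and deleted cells are dropped.
    """
    merged = {}
    for hfile in hfiles:                 # later files win over earlier ones
        for row_key, value in hfile:
            merged[row_key] = value
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)


old = [("row1", "a"), ("row2", "b"), ("row3", "c")]
newer = [("row2", "b2"), ("row3", TOMBSTONE)]
newest = [("row4", "d")]

print(major_compact([old, newer, newest]))
# [('row1', 'a'), ('row2', 'b2'), ('row4', 'd')]
```

After the merge, a read touches one file instead of three, and the deleted `row3` no longer takes up space.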

Both are Bigtable-inspired NoSQL stores, but HBase runs on HDFS with a single active HMaster and strong consistency, while Cassandra is masterless with tunable, eventually consistent replication. HBase suits Hadoop analytics; Cassandra suits always-on writes.

Design row keys so reads and writes spread evenly across regions. Avoid monotonically increasing keys, which create hotspots on one Region Server. Use salting, hashing, or field reversal, and keep keys short because they repeat in every cell.
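
Two of the techniques named above, salting and field reversal, can be sketched as follows. The bucket count, the key format, and the helper names `salted_key` and `reversed_key` are assumptions made for illustration, not HBase functions.

```python
import hashlib

NUM_BUCKETS = 4   # assumed number of salt buckets, e.g. one per region

def salted_key(row_key):
    """Prefix the key with a stable bucket derived from its hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}-{row_key}"

def reversed_key(row_key):
    """Field reversal: the fastest-changing characters come first."""
    return row_key[::-1]

# Monotonically increasing keys such as timestamps would otherwise
# all land at the end of the key space, on a single region.
keys = [f"2024010100{i:02d}" for i in range(8)]
for key in keys:
    print(salted_key(key), reversed_key(key))
```

The trade-off is that a salted prefix breaks the natural sort order, so a range scan over the original keys has to query every bucket and merge the results.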

A region splits automatically when its store grows past a configured size threshold. The Region Server divides it into two child regions at the middle row key, and HMaster may reassign one of them to another server to balance the load.
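
A region split at the middle row key can be sketched as below. For simplicity this example assumes the threshold is a row count, whereas the text describes a size threshold; `split_region` and `max_rows` are hypothetical names.

```python
def split_region(rows, max_rows):
    """Split a sorted list of (row_key, value) at the middle key once it grows too large.

    Returns a list with either the original region or two child regions.
    """
    if len(rows) <= max_rows:
        return [rows]
    middle = len(rows) // 2
    return [rows[:middle], rows[middle:]]


region = [(f"row{i:02d}", i) for i in range(6)]
children = split_region(region, max_rows=4)

for child in children:
    print(child[0][0], "..", child[-1][0], f"({len(child)} rows)")
# row00 .. row02 (3 rows)
# row03 .. row05 (3 rows)
```

Each child covers a contiguous key range, so either one can be moved to another server without affecting lookups in the other.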

AI and machine-learning tools analyze query and access patterns to suggest row-key and column-family designs that avoid hotspots. They also scan Region Server metrics and logs to flag anomalies such as skewed regions or failing nodes early.

Yes. GitHub Copilot drafts HBase Java client code, shell commands, and scan filters from a short comment. Review its output for correct table names, column families, and API classes such as Connection and Table before running it on a real cluster.
