HBase Architecture: Use Cases, Components, and Data Model

⚡ Smart Summary

HBase architecture is built from four coordinating components — HMaster, Region Servers, ZooKeeper, and HDFS — that store data in a column-oriented model, split it into regions, and serve low-latency random reads and writes.

🧭 HMaster: Assigns regions to Region Servers, handles load balancing and failover, and manages schema and metadata changes.
Region Servers: Serve client read and write requests, host regions, and split regions automatically as data grows.
🧱 Regions and stores: Each region keeps one store per column family, built from a MemStore in memory and HFiles on disk.
🔗 ZooKeeper: Coordinates the cluster, tracks server failures, and holds the quorum configuration clients use to connect.
🧮 Data model: Tables group column families and rows, and a row key acts as the primary key for every access.
⚡ HBase vs HDFS: HBase adds low-latency random reads and writes on top of HDFS batch storage.


Apache HBase is a distributed, column-oriented NoSQL database that runs on top of Hadoop and the Hadoop Distributed File System (HDFS). Its architecture combines a coordinating master, region servers, and ZooKeeper to store very large tables and serve fast random reads and writes.

HBase Architecture and Its Important Components

HBase architecture has the following main components:

HMaster
HRegionServer
HRegions
ZooKeeper
HDFS

Below is a detailed architecture of HBase with its components, as shown in the diagram.

HMaster

HMaster is the implementation of the Master server in HBase architecture. It monitors all Region Server instances in the cluster and acts as the interface for all metadata changes. In a distributed cluster, the Master typically runs on the same node as the NameNode. The Master also runs several background threads.

The following are important roles performed by HMaster in HBase:

Plays a vital role in terms of performance and maintaining nodes in the cluster.
HMaster provides admin performance and distributes services to different region servers.
HMaster assigns regions to region servers.
HMaster controls load balancing and failover to handle the load over nodes present in the cluster.
When a client wants to change any schema or any metadata operation, HMaster takes responsibility for these operations.

Some of the methods exposed by the HMaster interface are primarily metadata-oriented methods:

Table (createTable, removeTable, enable, disable)
ColumnFamily (add column, modify column)
Region (move, assign)

The client communicates in a bi-directional way with both HMaster and ZooKeeper. For read and write operations, it directly contacts the HRegion servers. HMaster assigns regions to region servers and, in turn, checks the health status of region servers.
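The assignment and failover roles described above can be illustrated with a small sketch. This is not HBase's balancer; it is a simplified round-robin model, and the names `assign_regions`, `reassign_after_failure`, `rs1`, and `region-0` are hypothetical.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Assign regions to servers round-robin (a simplification of load balancing)."""
    assignment = defaultdict(list)
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


def reassign_after_failure(assignment, failed_server):
    """Move the failed server's regions to the least-loaded surviving servers."""
    survivors = {s: list(r) for s, r in assignment.items() if s != failed_server}
    for region in assignment.get(failed_server, []):
        target = min(survivors, key=lambda s: len(survivors[s]))
        survivors[target].append(region)
    return survivors


regions = [f"region-{i}" for i in range(6)]
servers = ["rs1", "rs2", "rs3"]

plan = assign_regions(regions, servers)       # two regions per server
after = reassign_after_failure(plan, "rs2")   # rs2's regions move to rs1 and rs3
```

In this toy model every server holds the same number of regions before and after the failure, which is the goal the load-balancing role aims for.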

The architecture contains multiple region servers. Each region server keeps an HLog, the write-ahead log that records its edits.

HBase Region Servers

When an HBase Region Server receives write and read requests from the client, it assigns the request to a specific region, where the actual column family resides. The client can directly contact the HRegion servers; there is no need for mandatory HMaster permission for the client to communicate with the HRegion servers. The client requires HMaster help only when operations related to metadata and schema changes are required.

HRegionServer is the Region Server implementation. It is responsible for serving and managing regions, or the data that is present in a distributed cluster. The region servers run on the Data Nodes present in the Hadoop cluster.

HMaster can contact multiple HRegion servers, and each HRegion server performs the following functions:

Hosting and managing regions
Splitting regions automatically
Handling read and write requests
Communicating with the client directly

HBase Regions

HRegions are the basic building elements of an HBase cluster. Each region holds a portion of a table's rows and is made up of column families. A region contains multiple stores, one for each column family. Each store mainly consists of two components: the MemStore and the HFile.

ZooKeeper

HBase ZooKeeper is a centralized monitoring server that maintains configuration information and provides distributed synchronization. Distributed synchronization coordinates the distributed applications running across the cluster, providing coordination services between nodes. If the client wants to communicate with regions, the client has to approach ZooKeeper first.

It is an open source project, and it provides many important services.

Services provided by ZooKeeper:

Maintains configuration information
Provides distributed synchronization
Establishes client communication with region servers
Provides ephemeral nodes that represent different region servers
Lets the Master server use these ephemeral nodes to discover available servers in the cluster
Tracks server failure and network partitions

The Master and HBase slave nodes (region servers) register themselves with ZooKeeper. The client needs access to the ZooKeeper (ZK) quorum configuration to connect with the master and region servers.

When a node in the HBase cluster fails, its ZooKeeper session expires and its ephemeral node disappears. The quorum reports the failure, and recovery of the failed node's regions begins.
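The idea of ephemeral nodes can be sketched without a real ZooKeeper quorum. The example below is a toy model, not the ZooKeeper API: the `EphemeralRegistry` class, the heartbeat mechanism, and the 30-second timeout are assumptions chosen only to show how a server that stops reporting drops out of the live list.

```python
class EphemeralRegistry:
    """Toy stand-in for ephemeral nodes: an entry lives only while its session does."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # server name -> time of last heartbeat

    def heartbeat(self, server, now):
        """Register the server or refresh its session."""
        self.last_heartbeat[server] = now

    def live_servers(self, now):
        """Servers whose session has not timed out yet."""
        return sorted(s for s, t in self.last_heartbeat.items()
                      if now - t <= self.session_timeout)

    def expire(self, now):
        """Remove entries whose session timed out and return their names."""
        dead = sorted(s for s, t in self.last_heartbeat.items()
                      if now - t > self.session_timeout)
        for server in dead:
            del self.last_heartbeat[server]
        return dead


registry = EphemeralRegistry(session_timeout=30)
registry.heartbeat("rs1", now=0)
registry.heartbeat("rs2", now=0)
registry.heartbeat("rs1", now=25)   # rs2 stops sending heartbeats

failed = registry.expire(now=40)    # rs2's entry is removed
alive = registry.live_servers(now=40)
```

A master watching this registry would learn that `rs2` is gone and could then reassign its regions.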

HDFS

HDFS is the Hadoop Distributed File System. As the name implies, it provides a distributed environment for storage, and it is a file system designed to run on commodity hardware. It stores each file in multiple blocks, and to maintain fault tolerance, the blocks are replicated across a Hadoop cluster.

HDFS provides a high degree of fault tolerance and runs on cheap commodity hardware. Because capacity grows by adding commodity nodes to the cluster for both storage and processing, it can give the client better results than scaling up an existing setup.

Here, the data stored in each block is replicated across 3 nodes (the default HDFS replication factor), so if any one node goes down, no data is lost; the remaining replicas serve as the backup for recovery.
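A quick calculation shows what block splitting and 3-way replication mean for storage. The sketch assumes a 128 MB block size, which is a common HDFS default but is configurable; the function name `hdfs_footprint` is hypothetical.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate block count and raw storage for one file (assumed defaults)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                              # logical blocks in the file
        "block_replicas": blocks * replication,        # physical copies in the cluster
        "raw_storage_mb": file_size_mb * replication,  # disk space consumed
    }


footprint = hdfs_footprint(1000)  # a 1000 MB file
```

With these assumptions, a 1000 MB file becomes 8 blocks, 24 block replicas, and about 3000 MB of raw disk usage.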

HDFS gets in contact with the HBase components and stores a large amount of data in a distributed manner.

HBase Data Model

The HBase Data Model is a set of components that consists of Tables, Rows, Column families, Cells, Columns, and Versions. HBase tables contain column families and rows, with elements defined as primary keys. A column in the HBase data model table represents an attribute of the objects.

The HBase Data Model consists of the following elements:

Set of tables
Each table with column families and rows
Each table must have an element defined as a primary key.
The row key acts as a primary key in HBase.
Any access to HBase tables uses this primary key.
Each column present in HBase denotes an attribute corresponding to an object.

HBase Use Cases

The following are examples of HBase use cases with a detailed explanation of the solution HBase provides to various technical problems.

Problem Statement	Solution
The telecom industry faces the following technical challenges: storing billions of Call Detail Record (CDR) log records generated by the telecom domain; providing real-time access to CDR logs and billing information of customers; and providing a cost-effective solution compared to traditional database systems.	HBase is used to store billions of rows of call detail records. If 20 TB of data is added per month to the existing RDBMS database, performance will deteriorate. To handle a large amount of data in this use case, HBase is the best solution. HBase performs fast queries and displays records.
The banking industry generates millions of records on a daily basis. In addition, the banking industry also needs an analytics solution that can detect fraud in money transactions.	To store, process, and update vast volumes of data and to perform analytics, an ideal solution is HBase integrated with several Hadoop ecosystem components.

Apart from that, HBase can be used:

Whenever there is a need for write-heavy applications.
For performing online log analytics and generating compliance reports.

Storage Mechanism in HBase

HBase is a column-oriented database, and data is stored in tables. The tables are sorted by RowId. As shown below, HBase has a RowId, which is the collection of several column families that are present in the table.

The column families that are present in the schema are key-value pairs. Looking closer, each column family has multiple columns. The column values are stored on disk. Each cell of the table has its own metadata, such as a timestamp and other information.

The column-oriented storage layout, with row keys, column families, and cells, is shown below.

The following are the key terms representing an HBase table schema:

Table: Collection of rows present.
Row: Collection of column families.
Column Family: Collection of columns.
Column: Collection of key-value pairs.
Namespace: Logical grouping of tables.
Cell: A {row, column, version} tuple that exactly specifies a cell definition in HBase.
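The terms above map naturally onto nested dictionaries. The following is a minimal in-memory sketch, not the HBase client API: the `ToyTable` class, the `family:qualifier` column naming, and the sample row keys are assumptions used only to show how a cell is addressed by row, column, and version.

```python
from collections import defaultdict


class ToyTable:
    """In-memory sketch: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        """Store one cell version at (row, column, timestamp)."""
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column, timestamp=None):
        """Return the requested version, or the newest one when none is given."""
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self):
        """Yield row keys in sorted order, as tables are sorted by row key."""
        for row_key in sorted(self.rows):
            yield row_key


table = ToyTable()
table.put("user#42", "info:email", "old@example.com", timestamp=1)
table.put("user#42", "info:email", "new@example.com", timestamp=2)
table.put("user#07", "info:email", "x@example.com", timestamp=1)

latest = table.get("user#42", "info:email")                # newest version
first = table.get("user#42", "info:email", timestamp=1)    # explicit version
ordered_keys = list(table.scan())
```

Every lookup starts from the row key, which is why the row key behaves like a primary key in this model.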

Column-oriented vs. Row-oriented Storages

Column-oriented and row-oriented storages differ in their storage mechanism. As we all know, traditional relational models store data in a row-based format, in terms of rows of data. Column-oriented storages store data tables in terms of columns and column families.

The following table gives some key differences between these two storages.

Column-oriented Database	Row-oriented Database
Used when the situation involves processing and analytics, such as Online Analytical Processing and its applications.	Online Transactional Processing, such as banking and finance domains, uses this approach.
The amount of data that can be stored in this model is very large, in terms of petabytes.	It is designed for a small number of rows and columns.
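The difference between the two layouts is easiest to see on a tiny data set. This sketch uses made-up records and plain Python containers; it illustrates the general idea only and does not reflect how HBase lays out HFiles.

```python
records = [
    {"id": 1, "city": "Rome", "amount": 120},
    {"id": 2, "city": "Milan", "amount": 80},
    {"id": 3, "city": "Rome", "amount": 50},
]

# Row-oriented layout: all fields of one record sit together.
row_store = [tuple(r.values()) for r in records]

# Column-oriented layout: all values of one column sit together.
column_store = {name: [r[name] for r in records] for name in records[0]}

# An analytic query over one column touches only that column's values.
total_amount = sum(column_store["amount"])

# Fetching one whole record is a single lookup in the row layout.
second_record = row_store[1]
```

Aggregations such as the sum read a single list in the column layout, while transactional access to one full record is simpler in the row layout.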

HBase Read and Write Data Explained

The read and write operations from the client into the HFile are shown in the diagram below.

Step 1) The client wants to write data and, in turn, first communicates with the Region Server and then the regions.

Step 2) The region contacts the MemStore for storing the data associated with the column family.

Step 3) First, the data is stored in the MemStore, where the data is sorted, and after that it flushes into the HFile. The main reason for using the MemStore is to store data in a distributed file system based on the row key. The MemStore is placed in the Region Server main memory, while HFiles are written into HDFS.

Step 4) The client wants to read data from the regions.

Step 5) In turn, the client can have direct access to the MemStore and can request data.

Step 6) The client approaches the HFiles to get the data. The data is fetched and retrieved by the client.
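The six steps above can be sketched as a toy store. This is a simplified model under stated assumptions: the `ToyStore` class is hypothetical, the flush is triggered by an entry count rather than a size in bytes, and HFiles are plain sorted lists instead of files on HDFS.

```python
class ToyStore:
    """One store (one column family of one region): a MemStore plus flushed HFiles."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}   # in-memory edits
        self.hfiles = []     # flushed files, oldest first, each sorted by row key

    def put(self, row_key, value):
        """Steps 1-3: write into the MemStore, flush when it is full."""
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore contents out as one sorted, immutable HFile."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        """Steps 4-6: check the MemStore first, then HFiles from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


store = ToyStore(flush_threshold=3)
for key, value in [("row3", "c"), ("row1", "a"), ("row2", "b"), ("row1", "a2")]:
    store.put(key, value)

newest = store.get("row1")    # served from the MemStore
flushed = store.get("row2")   # served from the flushed HFile
```

The third write fills the MemStore and triggers a flush, so the later update to `row1` lives only in memory while the older values sit in the sorted HFile.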

The MemStore holds in-memory modifications to the store. The hierarchy of objects in HBase regions, from top to bottom, is shown in the table below.

Table	HBase table present in the HBase cluster
Region	HRegions for the presented tables
Store	Stores one per column family for each region for the table
MemStore	MemStore for each store for each region for the table. It sorts data before flushing into HFiles. Write and read performance increase because of sorting.
StoreFile	StoreFiles for each store for each region for the table
Block	Blocks present inside StoreFiles

HBase vs. HDFS

HBase runs on top of HDFS and Hadoop. Some key differences between HDFS and HBase are in terms of data operations and processing.

HBase	HDFS
Low-latency operations	High-latency operations
Random reads and writes	Write once, read many times
Accessed through shell commands, a client API in Java, REST, Avro, or Thrift	Primarily accessed through MapReduce (MR) jobs
Both storage and processing can be performed	It is only for storage areas

Some typical IT industrial applications use HBase operations along with Hadoop. Applications include stock exchange data and online banking data operations, where HBase is the best-suited solution. Once your cluster is ready, you can read and write data in HBase or install HBase on a fresh node.

FAQs

Yes. HBase is a distributed, column-oriented NoSQL database modeled on Google Bigtable and built on top of HDFS. It stores sparse data in tables of column families and does not use fixed schemas or SQL joins like a relational database.

The WAL, also called HLog, records every write on the Region Server before it enters the MemStore. It is stored on HDFS, so if a Region Server crashes before a flush, HBase replays the WAL to recover the unsaved edits.
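A minimal sketch of the log-then-apply idea follows. It is a toy model, not HBase's WAL implementation: the `ToyRegionServer` class is hypothetical, and a Python list stands in for the durable log that HBase keeps on HDFS.

```python
class ToyRegionServer:
    """Toy write path: append to a durable log first, then update memory."""

    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []   # durable log (survives a crash)
        self.memstore = {}                          # volatile (lost on a crash)

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1) record the edit in the WAL
        self.memstore[row_key] = value      # 2) apply it to the MemStore

    def recover(self):
        """Replay the WAL in order to rebuild the unflushed MemStore contents."""
        for row_key, value in self.wal:
            self.memstore[row_key] = value


server = ToyRegionServer()
server.put("row1", "a")
server.put("row2", "b")

# Simulated crash: the MemStore is gone, only the log remains.
restarted = ToyRegionServer(wal=server.wal)
empty_before_replay = dict(restarted.memstore)
restarted.recover()
```

After the replay, the restarted server holds the same edits the crashed one had accepted but not yet flushed.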

Compaction merges HFiles to keep reads fast. Minor compaction combines several small adjacent HFiles into one. Major compaction rewrites all HFiles of a column family into a single file and physically removes deleted and expired cells.
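Major compaction can be sketched as a merge that keeps the newest value per key and drops delete markers. The `TOMBSTONE` sentinel and the `major_compact` function are assumptions for illustration; real HFiles also carry timestamps and versions, which are left out here.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a delete


def major_compact(hfiles):
    """Merge HFiles (oldest first) into one sorted file, dropping deleted cells."""
    merged = {}
    for hfile in hfiles:
        for row_key, value in hfile:
            merged[row_key] = value   # entries from newer files overwrite older ones
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)


hfiles = [
    [("row1", "a"), ("row2", "b")],          # oldest
    [("row2", TOMBSTONE), ("row3", "c")],    # row2 was deleted later
    [("row1", "a2")],                        # newest: row1 was updated
]

compacted = major_compact(hfiles)
```

Three files become one, the deleted `row2` disappears physically, and a read now has a single file to search.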

Both are Bigtable-inspired NoSQL stores, but HBase runs on HDFS with a single active HMaster and strong consistency, while Cassandra is masterless with tunable, eventually consistent replication. HBase suits Hadoop analytics; Cassandra suits always-on writes.

Design row keys so reads and writes spread evenly across regions. Avoid monotonically increasing keys, which create hotspots on one Region Server. Use salting, hashing, or field reversal, and keep keys short because they repeat in every cell.
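Salting and field reversal can be sketched in a few lines. The bucket count of 4, the two-digit prefix format, and the function names are assumptions for illustration; a real design would choose them to match the number of regions and the read patterns.

```python
import hashlib


def salted_key(row_key, buckets=4):
    """Prefix the key with a bucket derived from its hash to spread sequential keys."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    salt = digest[0] % buckets
    return f"{salt:02d}-{row_key}"


def reversed_key(row_key):
    """Reverse the key so the fastest-changing characters come first."""
    return row_key[::-1]


# Monotonically increasing timestamps would otherwise land in one region.
timestamps = [f"2024-01-01T00:00:{s:02d}" for s in range(60)]
salted = [salted_key(ts) for ts in timestamps]
buckets_used = {key[:2] for key in salted}
```

Because the salt is computed from the key itself, a reader can recompute the prefix for a point lookup; range scans, however, must query every bucket.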

A region splits automatically when its store grows past a configured size threshold. The Region Server divides it into two child regions at the middle row key, and HMaster may reassign one of them to another server to balance the load.
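The split at the middle row key can be sketched as follows. The function `split_region` is hypothetical and works on a plain sorted list; the real split point is chosen from store file metadata rather than by counting keys.

```python
def split_region(sorted_row_keys):
    """Split a region's sorted row keys into two child ranges at the middle key."""
    mid = len(sorted_row_keys) // 2
    split_key = sorted_row_keys[mid]
    # The first child covers keys below the split key, the second the rest.
    return split_key, sorted_row_keys[:mid], sorted_row_keys[mid:]


keys = ["a", "c", "f", "k", "m", "z"]
split_key, child_a, child_b = split_region(keys)
```

Each child region covers a contiguous key range, so either one can be handed to a different server without touching the other.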

AI and machine-learning tools analyze query and access patterns to suggest row-key and column-family designs that avoid hotspots. They also scan Region Server metrics and logs to flag anomalies such as skewed regions or failing nodes early.

Yes. GitHub Copilot drafts HBase Java client code, shell commands, and scan filters from a short comment. Review its output for correct table names, column families, and API classes such as Connection and Table before running it on a real cluster.
