Top 30 HBase Interview Questions (2023)

Here are Apache HBase interview questions and answers for freshers as well as experienced candidates to help them get their dream job.

 

HBase Interview Questions and Answers for Freshers

1) Explain What is HBase?

HBase is a column-oriented database management system that runs on top of HDFS (Hadoop Distributed File System). HBase is not a relational data store, and it does not support a structured query language like SQL.

In HBase, a master node manages the cluster, while region servers store portions of the tables and perform the work on the data.


2) Explain why to use HBase?

  • High capacity storage system
  • Distributed design to cater to large tables
  • Column-Oriented Stores
  • Horizontally Scalable
  • High performance & Availability
  • The base goal of HBase is to handle millions of columns, thousands of versions and billions of rows
  • Unlike HDFS (Hadoop Distributed File System), it supports random, real-time CRUD operations

3) Mention what are the key components of HBase?

HBase architecture consists mainly of the following components:

  • ZooKeeper: It does the coordination work between the client and the HBase Master
  • HBase Master: The HBase Master monitors the RegionServers
  • RegionServer: A RegionServer monitors the Regions it hosts
  • Region: It contains an in-memory data store (MemStore) and HFiles
  • Catalog Tables: Catalog tables consist of -ROOT- and .META. (from HBase 0.96 onward, only a single catalog table, hbase:meta, remains)
HBase Architecture Diagram

4) Explain what HBase consists of?

  • HBase consists of a set of tables
  • And each table contains rows and columns like traditional database
  • Each table must contain an element defined as a Primary Key (the row key)
  • HBase column denotes an attribute of an object
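The logical layout described above can be pictured as a nested, sparse map. The sketch below is a toy model in plain Python, not the HBase API: the names `table`, `put` and `get_latest` are made up for illustration, and the "family:qualifier" column naming is assumed.

```python
from collections import defaultdict

# Toy model only: row key -> "family:qualifier" -> {timestamp: value}.
# Real HBase stores bytes in HFiles; this just mirrors the logical layout.
table = defaultdict(lambda: defaultdict(dict))

def put(row, column, value, ts):
    """Store one version of a cell, addressed by (row, column, timestamp)."""
    table[row][column][ts] = value

def get_latest(row, column):
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]

put("user1", "info:name", "Ann", ts=1)
put("user1", "info:name", "Anna", ts=2)               # newer version of the same cell
put("user2", "info:email", "bob@example.com", ts=1)   # rows need not share columns
```

Here `get_latest("user1", "info:name")` returns the newest version, while a column that was never written for a row simply does not exist, which is why sparse tables cost nothing for missing values.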

5) Mention how many operational commands there are in HBase?

There are mainly Five types of Operational commands in HBase:

  • Get
  • Put
  • Delete
  • Scan
  • Increment
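To make the five operations concrete, here is a minimal in-memory sketch. `ToyTable` is a hypothetical class written for this answer, not the HBase client; it keeps only the latest value per column and ignores versions and column families.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (latest value only)."""

    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return dict(self.rows.get(row, {}))

    def delete(self, row):
        self.rows.pop(row, None)

    def scan(self, start=None, stop=None):
        """Yield (row, columns) in sorted row-key order; stop is exclusive."""
        for row in sorted(self.rows):
            if start is not None and row < start:
                continue
            if stop is not None and row >= stop:
                break
            yield row, dict(self.rows[row])

    def increment(self, row, column, amount=1):
        cols = self.rows.setdefault(row, {})
        cols[column] = cols.get(column, 0) + amount
        return cols[column]

t = ToyTable()
t.put("r1", "cf:a", "x")
t.put("r2", "cf:a", "y")
t.put("r3", "cf:a", "z")
t.increment("r1", "cf:hits")
t.increment("r1", "cf:hits")
scanned = [row for row, _ in t.scan(start="r2")]  # rows from "r2" onward
t.delete("r2")
```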

6) Explain what is WAL and HLog in HBase?

WAL (Write Ahead Log) is similar to the MySQL binary log; it records all the changes that occur in the data. HLog is the HBase class that implements the WAL. The log is a standard Hadoop sequence file that stores HLogKey instances. These keys consist of a sequence number as well as the actual data, and they are used to replay not-yet-persisted data after a server crash. So, in case of server failure, the WAL works as a lifeline and recovers the lost data.
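The replay idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not HBase code: `ToyRegionServer` is a hypothetical class, a Python list stands in for the durable log file, and a dict stands in for the in-memory store.

```python
class ToyRegionServer:
    """Illustrative only: a write goes to the log first, then to memory."""

    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []  # stands in for the durable log
        self.memstore = {}                          # lost on a crash
        self.next_seq = len(self.wal) + 1

    def put(self, row, value):
        # 1) append to the log with a sequence number, 2) update the in-memory store
        self.wal.append((self.next_seq, row, value))
        self.next_seq += 1
        self.memstore[row] = value

    def replay(self):
        """Rebuild the in-memory store from the log, in sequence order."""
        for _seq, row, value in sorted(self.wal):
            self.memstore[row] = value

server = ToyRegionServer()
server.put("r1", "a")
server.put("r1", "b")
server.put("r2", "c")

surviving_log = server.wal                 # the log outlives the crashed process
recovered = ToyRegionServer(wal=surviving_log)
state_before_replay = dict(recovered.memstore)
recovered.replay()
```

Because entries are replayed in sequence order, the later write to `r1` wins, which is the same result the crashed server held in memory.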


7) When should you use HBase?

  • Data size is huge: When you have many millions of records to operate on
  • Complete Redesign: When you move from an RDBMS to HBase, treat it as a complete redesign rather than merely a port
  • SQL-Less commands: Make sure you can live without RDBMS features like transactions, inner joins, typed columns, etc.
  • Infrastructure Investment: You need a large enough cluster for HBase to be really useful

8) In HBase, what are column families?

Column families comprise the basic unit of physical storage in HBase, to which features like compression are applied.


9) Explain what is the row key?

The row key is defined by the application. As the combined key is prefixed by the row key, it enables the application to define the desired sort order. It also allows logical grouping of cells and makes sure that all cells with the same row key are co-located on the same server.
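Since row keys are compared as raw bytes, the sort order is lexicographic rather than numeric. The short sketch below shows the effect using Python byte strings; `padded_key` is a hypothetical helper, and the `user` prefix and width of 6 are arbitrary choices for the example.

```python
# Byte-wise comparison: ordering is lexicographic, not numeric.
naive_keys = [b"user2", b"user10", b"user1"]
naive_sorted = sorted(naive_keys)   # b"user10" lands before b"user2"

def padded_key(user_id, width=6):
    """Hypothetical helper: zero-pad the number so byte order matches numeric order."""
    return b"user" + str(user_id).zfill(width).encode("ascii")

padded_sorted = sorted(padded_key(n) for n in (2, 10, 1))
```

With zero padding, the keys sort as 1, 2, 10, so a range scan over consecutive IDs touches neighbouring rows.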


10) Explain deletion in HBase? Mention what are the three types of tombstone markers in HBase?

When you delete a cell in HBase, the data is not actually deleted; instead, a tombstone marker is set, making the deleted cells invisible. Deleted cells are actually removed during major compactions.

Three types of tombstone markers are there:

  • Version delete marker: For deletion, it marks a single version of a column
  • Column delete marker: For deletion, it marks all the versions of a column
  • Family delete marker: For deletion, it marks all columns of a column family

11) Explain how does HBase actually delete a row?

In HBase, whatever you write is first held in RAM and then flushed to disk; these disk writes are immutable, barring compaction. During deletion, major compactions process delete markers while minor compactions don't. A normal delete results in a tombstone marker, and the deleted data it represents is removed during compaction.

Also, if you delete data and then add more data with a timestamp earlier than the tombstone's timestamp, further Gets are masked by the tombstone marker, so you will not receive the inserted value. Such writes only take effect again after a major compaction has removed the tombstone.
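The masking behaviour is easier to see in a small simulation. The sketch below models only a column delete marker on a single column; `ToyColumn` and its methods are hypothetical names, and real HBase tracks markers per store file rather than in one field.

```python
class ToyColumn:
    """One column of one row; a sketch of column-delete-marker semantics only."""

    def __init__(self):
        self.versions = {}        # timestamp -> value
        self.tombstone_ts = None  # timestamp of the newest column delete marker

    def put(self, value, ts):
        self.versions[ts] = value

    def delete(self, ts):
        # Nothing is erased here; a marker is recorded instead.
        if self.tombstone_ts is None or ts > self.tombstone_ts:
            self.tombstone_ts = ts

    def get(self):
        visible = [ts for ts in self.versions
                   if self.tombstone_ts is None or ts > self.tombstone_ts]
        return self.versions[max(visible)] if visible else None

    def major_compact(self):
        # Drop masked versions for good, then drop the marker itself.
        if self.tombstone_ts is not None:
            self.versions = {ts: v for ts, v in self.versions.items()
                             if ts > self.tombstone_ts}
            self.tombstone_ts = None

col = ToyColumn()
col.put("v1", ts=10)
col.delete(ts=20)
col.put("v2", ts=15)      # written after the delete, but with an earlier timestamp
masked = col.get()        # still hidden by the tombstone
col.major_compact()
col.put("v3", ts=15)      # same timestamp, now that the marker is gone
after = col.get()
```

In this model the put of `"v2"` succeeds but has no visible effect, while the identical put after the compaction is readable.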


12) Explain what happens if you alter the block size of a column family on an already occupied database?

When you alter the block size of a column family, new data uses the new block size while old data remains in the old block size. New files are written with the new block size as they are flushed, and existing data continues to be read correctly. After the next major compaction, all data should have been rewritten to the new block size.


13) Mention the difference between HBase and Relational Database?

Here are some important differences between Apache HBase and Relational Database:

HBase:
  • It is schema-less
  • It is a column-oriented data store
  • It is used to store de-normalized data
  • It contains sparsely populated tables
  • Automated partitioning is done in HBase

Relational Database:
  • It is a schema-based database
  • It is a row-oriented data store
  • It is used to store normalized data
  • It contains thin tables
  • There is no such provision or built-in support for partitioning

14) What is HBaseFsck class?

HBase ships with a tool called hbck, which is implemented by the HBaseFsck class. It offers several command-line switches that influence its behavior.


HBase Interview Questions and Answers for Experienced

15) What are the main key structures of HBase?

Row key and Column key are the two most important key structures used in HBase.


16) Discuss how you can use filters in Apache HBase

The Filter Language was introduced in Apache HBase 0.92. It lets you perform server-side filtering when accessing HBase over the HBase shell or Thrift.


17) Does HBase support a syntax structure like SQL, yes or no?

No, unfortunately, SQL support for HBase is not available currently. However, by using Apache Phoenix, we can retrieve data from HBase through SQL queries.


18) What is the meaning of compaction in HBase?

During periods of heavy incoming writes, it is impossible to achieve optimal performance by having one file per store. HBase therefore combines HFiles to reduce the number of disk seeks needed for every read. This process is known as compaction in HBase.
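As a rough picture of what combining files means, the sketch below merges several sorted toy "HFiles" into one. It is a simplification under stated assumptions: `compact` is a hypothetical function, each file is a list of `(row_key, timestamp, value)` tuples sorted by row key, and only the newest value per row key is kept.

```python
import heapq

def compact(hfiles):
    """Merge sorted files into one sorted file, keeping the newest value per key."""
    merged = {}
    for row, ts, value in heapq.merge(*hfiles):
        # Keep only the highest-timestamp entry for each row key.
        if row not in merged or ts > merged[row][0]:
            merged[row] = (ts, value)
    return [(row, ts, value) for row, (ts, value) in sorted(merged.items())]

hfile_old = [("r1", 1, "a"), ("r3", 1, "c")]
hfile_new = [("r1", 2, "a2"), ("r2", 2, "b")]
compacted = compact([hfile_old, hfile_new])
```

Before the merge, a read for `r1` has to consult both files; afterwards a single sorted file answers it.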


19) How will you implement joins in HBase?

HBase does not support joins directly, but join queries can be implemented with MapReduce jobs that retrieve data from different HBase tables.
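One common way to do this is a reduce-side join. The sketch below imitates the map, shuffle and reduce steps in plain Python rather than with Hadoop; the function names and the sample `users` and `orders` rows are invented for illustration.

```python
from collections import defaultdict

def map_phase(table_name, rows, join_column):
    """Emit (join_key, (table_name, row)) pairs, as a mapper would."""
    for row in rows:
        yield row[join_column], (table_name, row)

def reduce_side_join(left_rows, right_rows, join_column):
    # Shuffle: group the mapper output by join key.
    groups = defaultdict(lambda: {"left": [], "right": []})
    for key, (side, row) in map_phase("left", left_rows, join_column):
        groups[key][side].append(row)
    for key, (side, row) in map_phase("right", right_rows, join_column):
        groups[key][side].append(row)
    # Reduce: pair up rows from both sides that share a key (inner join).
    joined = []
    for key in sorted(groups):
        for left in groups[key]["left"]:
            for right in groups[key]["right"]:
                joined.append({**left, **right})
    return joined

users = [{"user_id": "u1", "name": "Ann"}, {"user_id": "u2", "name": "Bob"}]
orders = [
    {"user_id": "u1", "item": "book"},
    {"user_id": "u1", "item": "pen"},
    {"user_id": "u3", "item": "cup"},
]
joined_rows = reduce_side_join(users, orders, "user_id")
```

Only `u1` appears on both sides, so the result contains two joined rows and drops the unmatched `u2` and `u3`.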


20) Explain JMX concerning HBase

Java Management Extensions, or JMX, is the standard for exporting the status of Java applications.


21) What is the use of MasterServer?

The MasterServer assigns regions to the region servers. It also handles load balancing.


22) Define the Term Thrift

Apache Thrift is written in C++. It provides schema compilers for various programming languages like C++, Perl, PHP, Python, Ruby, and more.


23) Why use HColumnDescriptor class?

Details about a column family, such as compression settings and the number of versions, are stored in HColumnDescriptor.


24) What is a cell in HBase?

A cell is the smallest unit of an HBase table. It holds a piece of data identified by the tuple {row, column, version}.


25) What is a Bloom filter?

HBase supports Bloom filters, which help improve the overall throughput of the cluster. An HBase Bloom filter is a space-efficient mechanism to test whether an HFile includes a certain row or row-column cell.
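A generic Bloom filter can be sketched as follows. This is not HBase's implementation: `ToyBloomFilter`, the bit-array size, the number of hashes and the use of salted SHA-256 are all assumptions chosen to keep the example short.

```python
import hashlib

class ToyBloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bloom = ToyBloomFilter()
for row_key in ("row-1", "row-2", "row-3"):
    bloom.add(row_key)
```

A read can skip any file whose filter answers `False` for the requested row. A `True` answer may be a false positive, so the file still has to be checked.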


26) Tell me about the types of HBase Operations?

Two types of HBase Operations are:

  • Read Operation
  • Write Operation

27) What is the use of HBase HMaster?

Main responsibilities of a master are:

  1. Coordinating the region servers
  2. Admin functions

28) Which technique can you use in HBase to access HFile directly without the help of HBase?

To access HFile directly without using HBase, we use HFile.main() method.


29) Can the region server be located on all DataNodes?

Yes, Region Servers run on the same servers as the DataNodes.


30) Name the filter which accepts the page size as the parameter in HBase

A filter named PageFilter accepts the page size as the parameter.
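Paging through a table usually means asking for one page and then restarting the scan just after the last row returned. The sketch below shows that pattern over a sorted list of byte keys. It does not call HBase: `scan_page` is a hypothetical function, and appending a zero byte to form the next start row is a common convention assumed here.

```python
def scan_page(sorted_rows, start_row, page_size):
    """Return up to page_size row keys >= start_row from an already sorted list."""
    page = [row for row in sorted_rows if row >= start_row][:page_size]
    # The next page starts just after the last key returned: append a zero byte.
    next_start = page[-1] + b"\x00" if page else None
    return page, next_start

rows = [b"r1", b"r2", b"r3", b"r4", b"r5"]
page1, next1 = scan_page(rows, b"", 2)
page2, next2 = scan_page(rows, next1, 2)
page3, next3 = scan_page(rows, next2, 2)
```

The last page comes back shorter than the page size, which signals that the table has been exhausted.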

These interview questions will also help in your viva (orals).