HBase Interview Questions and Answers

Last updated on 05th Oct 2020

About author

AjithKumar (Lead Engineer - Director Level )

Highly experienced in his industry domain, with 7+ years of experience. He has also been a technical blog writer for the past 4 years, sharing informative knowledge for job seekers.

HBase is an open-source non-relational distributed database modeled after Google’s Bigtable and written in Java. It is developed as part of Apache Software Foundation’s Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).

1.What is Apache HBase?

Ans:

Apache HBase is an open-source, distributed database from the Hadoop ecosystem that has its genesis in Google’s Bigtable. It is written in Java, and it is now considered an integral part of the Apache Software Foundation as well as the Hadoop ecosystem.

2.What is HBaseFsck class?

Ans:

There is a tool named hbck available in HBase, which is implemented by the HBaseFsck class. Basically, it offers several command-line switches that influence its behavior.

3.What is REST?

Ans:

REST (Representational State Transfer) defines the semantics that let us use a protocol in a generic way to address remote resources. It also supports different message formats for communicating with the server, offering many choices for a client application.

4.Define Thrift?

Ans:

Apache Thrift is written in C++, but it offers schema compilers for many programming languages, including Java, C++, Perl, PHP, Python, Ruby, and more.

5.What are the fundamental key structures of HBase?

Ans:

Row key and Column key are the fundamental key structures of HBase.

6.What is JMX?

Ans:

Java Management Extensions (JMX) is the standard technology for exporting the status of Java applications.

7.What is Nagios?

Ans:

Nagios is a very commonly used support tool for gaining qualitative data regarding cluster status. It polls current metrics on a regular basis and compares them with given thresholds.

8.What is the syntax of describe Command?

Ans:

Syntax –

hbase> describe 'tablename'

9.What is the use of the exists command?

Ans:

The exists command is used to check whether the specified table exists or not.

10.What is the use of MasterServer?

Ans:

The MasterServer assigns regions to the region servers and handles load balancing.

11.What is HBase Shell?

Ans:

HBase Shell is an interactive command-line interface through which we communicate with HBase. It is built on JRuby and uses the HBase Java client API under the hood.

12.What are the different commands used in Hbase operations?

Ans:

There are 5 atomic commands that carry out different operations in HBase.

Get, Put, Delete, Scan and Increment.

13.How to connect to Hbase?

Ans:

A connection to HBase is established interactively through the HBase Shell, which is a command-line interface, or programmatically through the Java client API.

14.What is the role of Zookeeper in Hbase?

Ans:

ZooKeeper maintains configuration information, provides distributed synchronization, and also maintains the communication between clients and region servers.

15.When do we need to disable a table in Hbase?

Ans:

In HBase, a table is disabled so that it can be modified or have its settings changed. When a table is disabled, it cannot be accessed through the scan command.

16.Give a command to check if a table is disabled.

Ans:

  • hbase> is_disabled 'tablename'

17.What does the following command do?

  • hbase> disable_all 'p.*'

Ans:

The command disables all tables whose names match the regular expression p.*, that is, all tables whose names start with the letter p.
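
To see which names a pattern such as p.* selects, here is a minimal Python sketch. It is not HBase code: it assumes the pattern is matched against the whole table name, and the table names are made up for illustration.

```python
import re

# Hypothetical table names, for illustration only.
tables = ["patients", "payments", "users", "apps"]

# Assumption: the regular expression must match the entire table name.
pattern = re.compile(r"p.*")
matched = [name for name in tables if pattern.fullmatch(name)]

print(matched)
```

Under that assumption, "apps" is not selected even though it contains the letter p, because the name does not begin with it.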

18.What are the different types of filters used in Hbase?

Ans:

Filters are used to get specific data from a Hbase table rather than all the records.

They are of the following types.

  • Column Value Filter
  • Column Value comparators
  • KeyValue Metadata filters.
  • RowKey filters.

19.Name three disadvantages Hbase has as compared to RDBMS?

Ans:

  • Hbase does not have in-built authentication/permission mechanism
  • The indexes can be created only on a key column, but in RDBMS it can be done in any column.
  • With one HMaster node there is a single point of failure.

20.What are catalog tables in Hbase?

Ans:

The catalog tables in HBase maintain the metadata information. They are named -ROOT- and .META. The -ROOT- table stores the location of the .META. table, and the .META. table holds information about all regions and their locations. (Since HBase 0.96, -ROOT- has been removed and .META. has been renamed hbase:meta.)

21.Is Hbase a scale out or scale up process?

Ans:

HBase runs on top of Hadoop, which is a distributed system. Hadoop scales out as and when required by adding more machines on the fly, so HBase is a scale-out process.

22.What are the steps in writing something into Hbase by a client?

Ans:

In HBase, the client does not write directly into the HFile. The write is first appended to the WAL (Write-Ahead Log) and then placed in the MemStore. The MemStore flushes its data to permanent storage (HFiles) from time to time.
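
The write path can be sketched with a toy model in plain Python. This is only an illustration, not the HBase implementation: the class name, the flush threshold, and the use of Python lists and dicts are all assumptions made for the example.

```python
class ToyStore:
    """Toy model of the write path: WAL append, MemStore insert, flush to a file."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of every edit (for durability)
        self.memstore = {}     # in-memory buffer of recent edits
        self.hfiles = []       # immutable, sorted files produced by flushes
        self.flush_threshold = flush_threshold  # hypothetical size limit

    def put(self, row, value):
        self.wal.append((row, value))   # 1. record the edit in the WAL first
        self.memstore[row] = value      # 2. then place it in the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # 3. flush when the buffer is full

    def flush(self):
        if self.memstore:
            # A flush writes the buffered rows, sorted by row key, as one file.
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}


store = ToyStore(flush_threshold=3)
for row, value in [("r3", "c"), ("r1", "a"), ("r2", "b"), ("r4", "d")]:
    store.put(row, value)
```

After the four puts, the WAL holds all four edits, one sorted file has been flushed, and the last edit is still waiting in the MemStore.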

23.What is compaction in Hbase?

Ans:

As more and more data is written to HBase, many HFiles get created. Compaction is the process of merging these HFiles into one file and, after the merged file is created successfully, discarding the old files.

24.What are the different compaction types in Hbase?

Ans:

There are two types of compaction: minor and major.

  • In minor compaction, adjacent small HFiles are merged into a single HFile without removing deleted cells. The files to be merged are chosen by the compaction policy.
  • In major compaction, all the HFiles of a column family are merged into a single HFile. Deleted cells are discarded, and it is often triggered manually.
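
The difference between the two compaction types can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: each file is a list of (row, value) pairs, files are ordered oldest to newest, and a delete is represented by a hypothetical TOMBSTONE marker.

```python
TOMBSTONE = None  # hypothetical marker standing in for a delete


def compact(hfiles, major=False):
    """Merge files (ordered oldest to newest) into one sorted file."""
    merged = {}
    for hfile in hfiles:
        for row, value in hfile:
            merged[row] = value          # a newer file overrides an older one
    if major:
        # Only a major compaction drops deleted entries for good.
        merged = {r: v for r, v in merged.items() if v is not TOMBSTONE}
    return sorted(merged.items(), key=lambda item: item[0])


files = [
    [("r1", "a"), ("r2", "b")],          # older file
    [("r2", TOMBSTONE), ("r3", "c")],    # newer file deletes r2
]
minor_result = compact(files)
major_result = compact(files, major=True)
```

In the minor result the delete marker for r2 is carried along, because older files outside the merge might still contain r2. The major result sees every file, so it can drop r2 entirely.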

25.What is the difference between the commands delete column and delete family?

Ans:

The delete column command deletes all versions of a column, whereas the delete family command deletes all columns of a particular family.

26.What is a cell in Hbase?

Ans:

A cell in Hbase is the smallest unit of a Hbase table which holds a piece of data in the form of a tuple{row,column,version}

27.What is the role of the class HColumnDescriptor in Hbase?

Ans:

This class is used to store information about a column family such as the number of versions, compression settings, etc. It is used as input when creating a table or adding a column.

28.What is the lower bound of versions in Hbase?

Ans:

The lower bound of versions indicates the minimum number of versions to be kept in HBase for a column, and it is used together with TTL. For example, if the value is set to 3, then the three latest versions will be maintained even after their TTL has expired, and older expired ones will be removed.

29.What is TTL (Time to live) in Hbase?

Ans:

TTL is a data retention technique by which a version of a cell is preserved for a specific time period. Once that period has elapsed, the specific version is removed.
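
How TTL interacts with the upper and lower bounds on versions can be sketched as follows. This is a plain-Python approximation, not HBase code: the function name and parameters are hypothetical, and timestamps are plain integers.

```python
def retained_versions(timestamps, now, max_versions=3, ttl=None, min_versions=0):
    """Return the version timestamps that survive, newest first."""
    # The upper bound keeps only the newest max_versions versions.
    newest_first = sorted(timestamps, reverse=True)[:max_versions]
    if ttl is None:
        return newest_first
    kept = []
    for position, ts in enumerate(newest_first):
        expired = (now - ts) > ttl
        # The lower bound protects the newest min_versions from TTL expiry.
        if not expired or position < min_versions:
            kept.append(ts)
    return kept


versions = [100, 200, 300, 400]
no_ttl = retained_versions(versions, now=500)
with_ttl = retained_versions(versions, now=500, ttl=150)
with_floor = retained_versions(versions, now=500, ttl=150, min_versions=2)
```

With no TTL the three newest versions remain. With a TTL of 150 only the version written at 400 is young enough. Adding a lower bound of 2 keeps the version at 300 as well, even though it has expired.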

30.Does Hbase support table joins?

Ans:

HBase does not support table joins. But using a MapReduce job, we can implement join queries to retrieve data from multiple HBase tables.

31.What is a rowkey in Hbase?

Ans:

Each row in HBase is identified by a unique array of bytes called the row key.

32.What are the two ways in which you can access data from Hbase?

Ans:

The data in Hbase can be accessed in two ways.

  • Using the rowkey and table scan for a range of row key values.
  • Using mapreduce in a batch manner.
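
Row-key access works well because rows are kept sorted by key. The following Python sketch mimics a point lookup and a range scan over sorted byte keys. The data and function names are invented for the example, and the scan assumes an inclusive start row and an exclusive stop row.

```python
from bisect import bisect_left

# Hypothetical rows, kept sorted by their byte-array row key.
rows = sorted([
    (b"user#003", "Cy"),
    (b"user#001", "Ann"),
    (b"user#004", "Dee"),
    (b"user#002", "Bob"),
])
keys = [key for key, _ in rows]


def get(rowkey):
    """Point lookup by exact row key."""
    i = bisect_left(keys, rowkey)
    if i < len(keys) and keys[i] == rowkey:
        return rows[i][1]
    return None


def scan(start_row, stop_row):
    """Range scan: start row inclusive, stop row exclusive."""
    return rows[bisect_left(keys, start_row):bisect_left(keys, stop_row)]


one = get(b"user#002")
some = scan(b"user#002", b"user#004")
```

Both operations locate their starting position with a binary search instead of reading every row.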

33.What are the two types of table design approach in Hbase?

Ans:

They are − (i) Short and Wide (ii) Tall and Thin

34.In which scenario should we consider creating a short and wide Hbase table?

Ans:

The short and wide table design is considered when:

  • There is a large number of columns
  • There is a small number of rows

35.In Which scenario should we consider a Tall-thin table design?

Ans:

The tall and thin table design is considered when:

  • There is a small number of columns
  • There is a large number of rows

36.Give a command to store 4 versions in a table rather than the default 3?

Ans:

  • hbase> alter 'tablename', {NAME => 'ColFamily', VERSIONS => 4}

37.What does the following command do?

Ans:

  • hbase> alter 'tablename', {NAME => 'colFamily', METHOD => 'delete'}

This command deletes the column family from the table.

38.Give the commands to add a new column family ("newcolfamily") to a table ("tablename") which has an existing column family ("oldcolfamily")?

Ans:

  • hbase> disable 'tablename'
  • hbase> alter 'tablename', {NAME => 'newcolfamily'}
  • hbase> enable 'tablename'

39.What is the HBase shell command to return only 10 records from a table?

Ans:

  • scan 'tablename', {LIMIT => 10, STARTROW => "start_row", STOPROW => "stop_row"}

40.How does Hbase support Bulk data loading?

Ans:

There are two main steps to do a data bulk load in Hbase.

  • Generate the HBase data files (StoreFiles) from the data source using a custom MapReduce job. The StoreFiles are created in HBase's internal format, which can be loaded efficiently.
  • Import the prepared files into a running cluster using a tool such as completebulkload. Each file gets loaded into one specific region.

41.How does Hbase provide high availability?

Ans:

Hbase uses a feature called region replication. In this feature for each region of a table, there will be multiple replicas that are opened in different RegionServers. The Load Balancer ensures that the region replicas are not co-hosted in the same region servers.

42.What is HMaster?

Ans:

The HMaster is the Master server responsible for monitoring all RegionServer instances in the cluster, and it is the interface for all metadata changes. In a distributed cluster, it typically runs on the NameNode.

43.What is HRegionServer in Hbase?

Ans:

HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. In a distributed cluster, a RegionServer runs on a DataNode.

44.What are the different Block Caches in Hbase?

Ans:

HBase provides two different BlockCache implementations: the default on-heap LruBlockCache and the BucketCache, which is (usually) off-heap.

45.How does WAL help when a RegionServer crashes?

Ans:

The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
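
Replay after a crash can be pictured with a short Python sketch. It is a toy model: the sequence numbers, the tuple layout, and the function name are assumptions for illustration only.

```python
def replay_wal(wal, last_flushed_seq):
    """Rebuild a lost MemStore from WAL edits newer than the last flush."""
    memstore = {}
    for seq, row, value in wal:        # the WAL is read in write order
        if seq > last_flushed_seq:     # older edits are already in HFiles
            memstore[row] = value      # later edits to a row win
    return memstore


# Hypothetical WAL entries: (sequence number, row, value).
wal = [(1, "r1", "a"), (2, "r2", "b"), (3, "r1", "a2"), (4, "r3", "c")]
recovered = replay_wal(wal, last_flushed_seq=2)
```

Only edits 3 and 4 had not been flushed when the crash happened, so only those are replayed into the new MemStore.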

46.Why is MultiWAL needed?

Ans:

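With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be sequential. This causes the WAL to be a performance bottleneck. MultiWAL allows a RegionServer to write multiple WAL streams in parallel, which increases total write throughput.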

47.In Hbase what is log splitting?

Ans:

All regions on a RegionServer share the same WAL file. When a region is opened, the edits in the WAL file which belong to that region need to be replayed. Therefore, edits in the WAL file must be grouped by region so that particular sets can be replayed to regenerate the data in a particular region. The process of grouping the WAL edits by region is called log splitting.
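
The grouping step itself is simple to sketch in Python. The tuple layout and region names here are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def split_log(wal_edits):
    """Group WAL edits by the region they belong to, preserving write order."""
    by_region = defaultdict(list)
    for region, row, value in wal_edits:
        by_region[region].append((row, value))
    return dict(by_region)


# Hypothetical shared WAL: (region, row, value), interleaved across regions.
shared_wal = [
    ("region-A", "a1", "x"),
    ("region-B", "m7", "y"),
    ("region-A", "a2", "z"),
]
per_region = split_log(shared_wal)
```

Each region can then replay only its own list of edits.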

48.How can you disable WAL? What is the benefit?

Ans:

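The WAL can be disabled to remove this performance bottleneck, at the risk of losing data if a RegionServer crashes before the MemStore is flushed. This is done by calling Mutation.writeToWAL(false) in the HBase client; newer client versions use Mutation.setDurability(Durability.SKIP_WAL) instead.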

49.When do we do manual Region splitting?

Ans:

Manual region splitting is done when there is an unexpected hotspot in a table because many clients are querying the same region of it.

50.What is a Hbase Store?

Ans:

An HBase Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.

51.Which file in Hbase is designed after the SSTable file of BigTable?

Ans:

The HFile in HBase, which stores the actual data (not metadata), is designed after the SSTable file of BigTable.

52.Why do we pre-create empty regions?

Ans:

Tables in HBase are initially created with one region by default. Then for bulk imports, all clients will write to the same region until it is large enough to split and become distributed across the cluster. So empty regions are created to make this process faster.
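
One way to picture pre-created regions is to compute evenly spaced split points up front. The Python sketch below assumes row keys that begin with a two-character lowercase hexadecimal prefix; the function names are hypothetical and this is not the HBase split API.

```python
from bisect import bisect_right


def hex_split_points(num_regions, width=2):
    """Evenly spaced split keys over a hex prefix space of the given width."""
    space = 16 ** width
    return [format(i * space // num_regions, "0%dx" % width)
            for i in range(1, num_regions)]


def region_for(rowkey, split_points):
    """Index of the region that would host the row key."""
    return bisect_right(split_points, rowkey)


splits = hex_split_points(4)
regions = [region_for(key, splits) for key in ["3f99", "40aa", "a1b2", "ffee"]]
```

With four regions the keys land in regions 0, 1, 2 and 3, so a bulk import with well-spread prefixes writes to all of them from the start.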

53.What is hotspotting in Hbase?

Ans:

Hotspotting is a situation when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. This traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability.

54.What are the approaches to avoid hotspotting?

Ans:

Hotspotting can be avoided or minimized by distributing the rowkeys across multiple regions. The different techniques to do this are salting and hashing.
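
A minimal Python sketch of salting follows. It derives the salt from a hash of the row key so that readers can recompute it; the bucket count, the two-digit prefix format, and the function names are assumptions made for this example.

```python
import hashlib


def salted_key(rowkey: bytes, buckets: int = 4) -> bytes:
    """Prefix the row key with a bucket number derived from its hash."""
    digest = hashlib.md5(rowkey).digest()
    salt = digest[0] % buckets            # deterministic, so reads can redo it
    return b"%02d-" % salt + rowkey


def original_key(salted: bytes) -> bytes:
    """Strip the salt prefix to recover the original row key."""
    return salted.split(b"-", 1)[1]


key = b"event-20200101-0001"
stored = salted_key(key)
```

Sequential keys such as timestamps would otherwise sort next to each other and hit one region. With a prefix, they are spread over as many key ranges as there are buckets, at the cost of having to query every bucket for a range scan.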

55.Why should we try to minimize the row name and column name sizes in Hbase?

Ans:

HBase values are always freighted with their coordinates; as a cell value passes through the system, it is accompanied by its row, column name, and timestamp. If the row and column names are large, especially compared to the size of the cell value, then the indices that are kept on HBase store files (StoreFile (HFile)) to facilitate random access may end up occupying a larger share of the RAM allotted to HBase than the data itself, because the cell value coordinates are large.
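
A rough calculation shows how much of each stored cell can be coordinates rather than data. The sketch below assumes the classic KeyValue layout without tags (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type); treat the numbers as an approximation, not an exact account of any HBase version.

```python
# Assumed fixed overhead per cell: 4 + 4 + 2 + 1 + 8 + 1 bytes.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate serialized size of one cell, in bytes."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


# Same 4-byte value, stored with long names and with short names.
verbose = cell_size(row_len=40, family_len=2, qualifier_len=20, value_len=4)
compact = cell_size(row_len=8, family_len=1, qualifier_len=2, value_len=4)
print(verbose, compact)
```

Under these assumptions the same 4-byte value costs 86 bytes with long names and 35 bytes with short ones, and the difference is repeated for every cell.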

56.What is the scope of a rowkey in Hbase?

Ans:

Rowkeys are scoped to ColumnFamilies. The same rowkey could exist in each ColumnFamily that exists in a table without collision.

57.What is the information stored in hbase:meta table?

Ans:

The hbase:meta table stores details of the regions in the system in the following format.

  • info:regioninfo (serialized HRegionInfo instance for this region)
  • info:server (server:port of the RegionServer containing this region)
  • info:serverstartcode (start-time of the RegionServer process containing this region)

58.What is a Namespace in Hbase?

Ans:

A Namespace is a logical grouping of tables. It is similar to a database object in a relational database system.

59.How do we get the complete list of columns that exist in a column Family?

Ans:

The complete list of columns in a column family can be obtained only by querying all the rows for that column family.

60.When the records are fetched from a Hbase table, in which order are they sorted?

Ans:

The records fetched from HBase are always sorted in the order of rowkey -> column family -> column qualifier -> timestamp (newest first).
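
This ordering can be reproduced with an ordinary sort in Python. The cells below are invented for illustration; each is a (rowkey, family, qualifier, timestamp) tuple, and the timestamp is negated so that newer versions sort first.

```python
# Hypothetical cells: (rowkey, column family, column qualifier, timestamp).
cells = [
    ("row2", "cf", "a", 5),
    ("row1", "cf", "b", 9),
    ("row1", "cf", "a", 3),
    ("row1", "cf", "a", 7),
]

# Ascending by row, family and qualifier; descending by timestamp.
ordered = sorted(cells, key=lambda c: (c[0], c[1], c[2], -c[3]))
```

The two versions of row1/cf:a come out with timestamp 7 before timestamp 3, so a reader meets the latest version first.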

61.Define catalog tables in HBase?

Ans:

In order to maintain the metadata information, we use Catalog tables.

62.What is the use of HColumnDescriptor class?

Ans:

Information about a column family, such as compression settings and the number of versions, is stored in HColumnDescriptor.

63.What is the function of HMaster?

Ans:

The HMaster (MasterServer) is responsible for monitoring all RegionServer instances in the cluster.

64.Which filter accepts the page size as the parameter in HBase?

Ans:

A filter named PageFilter accepts the page size as the parameter.

65.Which method is used to access HFile directly without using HBase?

Ans:

In order to access HFile directly without using HBase, we use HFile.main() method.

66.Pros of HBase?

Ans:

There are various advantages of HBase, like:

  • Large data sets:
    It can easily handle and store large datasets on top of HDFS file storage.
  • Databases breakdown:
    HBase shines at the scale where relational databases break down.
  • Fast Processing:
    In HBase, data reading and processing take less time.
  • Failover support and load sharing:
    Since HDFS is internally distributed and automatically recovered, and HBase runs on top of HDFS, HBase is automatically recovered as well. RegionServer replication also provides a failover facility.
  • Scalability:
    HBase supports scalability in both linear and modular forms.

67.Cons of HBase?

Ans:

There are various disadvantages to HBase, like:

  • Single point of failure:
    When only one HMaster is used, it is a single point of failure.
  • No transaction support:
    In HBase, there is no support for the transaction.
  • No handling of JOINS in database:
    Instead of the database itself, JOINs are handled in the MapReduce layer.

68.Specify some uses of HBase.

Ans:

Most Use cases of Apache HBase are:

  • Apache HBase is great for random, real-time read/write access to Big Data.
  • Apache HBase is a great choice for hosting very large tables on top of clusters of commodity hardware.
  • HBase is a non-relational database model.

69.State some applications of HBase.

Ans:

Some applications of HBase are:

  • For write-heavy applications, we can use Apache HBase.
  • Moreover, for fast random access to available data, HBase is a good choice.
  • Companies such as Twitter, Facebook, Yahoo, and Adobe use HBase internally.

70.Compare HBase vs HDFS?

Ans:

a. Built on: 

  • HBase

HBase is a database built on top of HDFS.

  • HDFS

HDFS is a distributed file system that is suitable for storing large files.

b.lookups

  • HBase

For larger tables, basically, it offers fast lookups.

  • HDFS

However, HDFS does not offer fast lookups.

c.Latency

  • HBase

It provides low latency access.

  • HDFS

HDFS provides high-latency batch processing.

71.Compare HBase vs RDBMS.

Ans:

Below given is the feature wise comparison of HBase vs RDBMS:

a.Structure

  • HBase

Structure of HBase is schema-less.

  • RDBMS

An RDBMS is governed by its schema.

b.Scalability

  • HBase

HBase is built for wide tables and is horizontally scalable.

  • RDBMS

An RDBMS, by contrast, is thin and built for small tables, and it is hard to scale.

c.Transaction

  • HBase

No transactions possible in HBase.

  • RDBMS

RDBMS is transactional.

72.Any 3 Features of HBase.

Ans:

Some features of HBase are:

a.Consistency

For high-speed requirements, we can use it due to its consistent reads and writes.

b.Atomic read and write

At the time of one read or write process, different processes are prevented from performing any read or write operations that’s why it is named as Atomic read and write.

c.Sharding

HBase offers automatic and manual splitting of regions into smaller subregions when it reaches a threshold size.

73.How many types of HBase Operations are there?

Ans:

There are two basic types of HBase Operations:

  • Read Operation
  • Write Operation

74.Explain HBase Architecture in brief?

Ans:

Basically, servers in an HBase Architecture are of 3 types  HMaster, Region Server, and ZooKeeper.

i.  Servers which serve data for reads and write purposes are Region servers. That means while accessing data clients can directly communicate with HBase RegionServers.

ii. HBase Master handles the region assignment as well as DDL (create, delete tables) operations.

iii. And, Zookeeper maintains a live cluster state.

75.Explain HBase Meta Table?

Ans:

The META table is a special HBase catalog table. Mainly, it holds the location of the regions in the cluster.

76.Give the name of the key components of HBase

Ans:

The key components of HBase are Zookeeper, RegionServer, Region, Catalog Tables and HBase Master.

77.What is S3?

Ans:

S3 stands for Simple Storage Service, and it is one of the file systems used by HBase.

78.What is the use of get() method?

Ans:

get() method is used to read the data from the table.

79.What is the reason for using HBase?

Ans:

HBase is used because it provides random read and write operations and can perform a large number of operations per second on a large data set.

80.In how many modes can HBase run?

Ans:

There are two run modes of HBase i.e. standalone and distributed.

81.Define the difference between hive and HBase?

Ans:

HBase supports record-level operations, whereas Hive does not support record-level operations.

82.Define column families?

Ans:

A column family is a collection of columns, whereas a row is a collection of column families.

83.Define standalone mode in HBase?

Ans:

It is a default mode of HBase. In standalone mode, HBase does not use HDFS—it uses the local filesystem instead—and it runs all HBase daemons and a local ZooKeeper in the same JVM process.

84.What are decorating filters?

Ans:

Decorating filters are useful for modifying, or extending, the behavior of another filter to gain additional control over the returned data.

85.What is the full form of YCSB?

Ans:

YCSB stands for Yahoo! Cloud Serving Benchmark.

86.What is the use of YCSB?

Ans:

It can be used to run comparable workloads against different storage systems.

87.Which operating system is supported by HBase?

Ans:

HBase supports any operating system that supports Java, such as Windows and Linux.

88.What is the most common file system of HBase?

Ans:

The most common file system of HBase is HDFS i.e. Hadoop Distributed File System.

89.Define Pseudo Distributed mode?

Ans:

A pseudo distributed mode is simply a distributed mode that is run on a single host.

90.What is the regionservers file?

Ans:

It is a configuration file which lists the known region server names.

91.Define MapReduce.

Ans:

MapReduce as a process was designed to solve the problem of processing in excess of terabytes of data in a scalable way.

92.What are the operational commands of HBase?

Ans:

Operational commands of HBase are Get, Delete, Put, Increment, and Scan.

93.Which code is used to open the connection in Hbase?

Ans:

Following code is used to open a connection:

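  • Configuration myConf = HBaseConfiguration.create();
  • HTableInterface usersTable = new HTable(myConf, "users");

This is the older client API. From HBase 1.0 onward, the HTable constructors are deprecated in favor of ConnectionFactory.createConnection(myConf) followed by connection.getTable(TableName.valueOf("users")).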

94.Which command is used to show the version?

Ans:

Version command is used to show the version of HBase.

  • Syntax – hbase> version

95.What is the use of tools command?

Ans:

This command is used to list the HBase surgery tools.

96.What is the use of shutdown command?

Ans:

It is used to shut down the cluster.

97.What is the use of truncate command?

Ans:

It is used to disable, drop, and recreate the specified table.

98.Which command is used to run HBase Shell?

Ans:

$ ./bin/hbase shell command is used to run the HBase shell.

99.Which command is used to show the current HBase user?

Ans:

The whoami command is used to show the current HBase user.

100.What is the use of InputFormat in MapReduce process?

Ans:

The InputFormat first splits the input data, and then it returns a RecordReader instance that defines the classes of the key and value objects and provides a next() method that is used to iterate over each input record.

101.Define LZO?

Ans:

Lempel-Ziv-Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed, and written in ANSI C.
