
Introduction to HBase and Its Architecture | A Complete Guide For Beginners

Last updated on 02nd Nov 2022, Articles, Blog

About the author

Karthika (Data Engineer)

Karthika has a wealth of experience in cloud computing, including BI, Perl, Salesforce, Microstrategy, and Cobit. She also has over 9 years of experience as a Data Engineer.

In this article you will learn:

    • 1. Introduction to HBase.
    • 2. HBase History.
    • 3. What is HBase?
    • 4. Why HBase?
    • 5. HBase Architecture.
    • 6. Characteristics of HBase.
    • 7. Applications of HBase.
    • 8. Features of HBase.
    • 9. Conclusion.

Introduction to HBase:

A few decades ago, before the internet was widely available, far less data was generated, and most of it was structured. Structured data has a well-defined format and follows a standard order. Such data could be stored in a relational database (RDBMS) without any hassle. As the internet evolved, large volumes of structured and semi-structured data began to be generated. Semi-structured data includes emails, JSON, XML, and .csv files, to name a few. Huge amounts of semi-structured data were created across the globe, and storing and processing this data became a major challenge.

HBase History:

  • Back in November 2006, Google released a paper on Bigtable.
  • Then, in February 2007, an HBase prototype was created as a Hadoop contribution.
  • In October 2007, the first usable HBase was released along with Hadoop 0.15.0, and HBase became a subproject of Hadoop in January 2008. HBase 0.18.1, 0.19.0, and 0.20.0 were released between October 2008 and September 2009.
  • Finally, in May 2010, HBase became an Apache top-level project.

What is HBase?

  • HBase is modeled after Google's Bigtable, which is a distributed storage system for structured data.
  • Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
  • Some of the companies that use HBase as a core system are Facebook, Netflix, Yahoo, Adobe, and Twitter.
  • The goal of HBase is to host large tables, with billions of rows and millions of columns, on top of clusters of commodity hardware.

Why HBase?

  • HBase can store huge amounts of data in a tabular format for extremely fast reads and writes.
  • HBase is mostly used in scenarios that require regular, consistent insertion and overwriting of data.
  • Hadoop on its own, however, performs only batch processing, in which data is accessed sequentially.
  • This means that the entire dataset has to be searched for even the simplest of jobs.
  • Hence, a solution was required to access, read, or write data at any time, regardless of its position in the cluster's data. HBase fills this role.

HBase Real Life Connect – Example:

You may be aware that Facebook introduced the new Social Inbox, which integrates email, IM, SMS, text messages, and on-site Facebook messages. It needs to store over 135 billion messages a month. Facebook chose HBase because it needed a system that could handle two types of data patterns:

  • An ever-growing dataset that is rarely accessed.
  • An ever-growing dataset that is highly volatile: users read what is in their inbox and then rarely look at it again.

HBase Architecture:

  • Apache ZooKeeper monitors the system, while the HBase Master assigns regions and handles load balancing.
  • Region Servers serve data for reads and writes. They run on the various machines of the Hadoop cluster.
  • Each Region Server consists of Regions, an HLog, Stores, MemStores, and store files. The persistent parts of this (the HLog and the store files) live in the HDFS storage system.
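The division of labour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the HBase API: the class name `ToyMaster`, the server names `rs1` and `rs2`, the split keys, and the round-robin assignment are all hypothetical. A real cluster tracks region locations in a catalog table and uses a far more sophisticated balancer.

```python
from bisect import bisect_right


class ToyMaster:
    """Toy stand-in for the HBase Master: assigns regions to region servers."""

    def __init__(self, split_keys, servers):
        # Each region covers a contiguous row-key range, identified here by
        # its start key. b"" marks the open-ended first region.
        self.start_keys = [b""] + sorted(split_keys)
        # Round-robin assignment: a deliberately naive form of load balancing.
        self.assignment = {
            start: servers[i % len(servers)]
            for i, start in enumerate(self.start_keys)
        }

    def locate(self, row_key):
        """Return (region_start_key, server) for the region holding row_key."""
        idx = bisect_right(self.start_keys, row_key) - 1
        start = self.start_keys[idx]
        return start, self.assignment[start]


master = ToyMaster(split_keys=[b"g", b"p"], servers=["rs1", "rs2"])
print(master.locate(b"alice"))     # (b'', 'rs1')
print(master.locate(b"karthika"))  # (b'g', 'rs2')
print(master.locate(b"zoe"))       # (b'p', 'rs1')
```

Because regions are contiguous key ranges, finding the right server for a row key is a binary search over the region start keys rather than a scan of the whole table.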

Characteristics of HBase:

  • HBase is a type of NoSQL database and is classified as a key-value store.
  • Each value is identified by a key.
  • Both keys and values are byte arrays, which means binary formats can be stored easily.
  • Values are stored in key order.
  • Values can be accessed quickly by key.
  • HBase is a database in which tables have no fixed schema; column families, not columns, are defined at table creation time.
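To make the key-value characteristics above concrete, here is a minimal sketch of a store with byte-array keys kept in sorted order. It is a toy illustration, not HBase code: `ToySortedStore` and its methods are hypothetical names, and everything lives in memory.

```python
from bisect import bisect_left, insort


class ToySortedStore:
    """Toy key-value store: byte-array keys kept in sorted order."""

    def __init__(self):
        self._keys = []    # row keys, always sorted in byte order
        self._values = {}  # row key -> value (both are bytes)

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            insort(self._keys, key)  # insert while keeping the list sorted
        self._values[key] = value    # an existing key is simply overwritten

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


store = ToySortedStore()
for key in (b"row3", b"row1", b"row2"):
    store.put(key, b"value-of-" + key)

print(store.get(b"row2"))                            # b'value-of-row2'
print([k for k, _ in store.scan(b"row1", b"row3")])  # [b'row1', b'row2']
```

Keeping keys sorted is what makes both point lookups and range scans cheap: a scan jumps to its start key and then reads neighbours in order.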

Applications of HBase:

There are a number of HBase applications across different industries, from healthcare to e-commerce to sports. For instance:

  • In the healthcare sector, HBase is used for storing genome sequences and the disease history of people or of a particular area.
  • In the field of e-commerce, HBase is used for storing logs of customer search history, and for running analytics and targeted advertising for better business insights.
  • In sports, HBase is used to store match details and the history of every match. This data is then used for better predictions.

Features of HBase:

Scalable: HBase lets data scale out across different nodes, as it is stored in HDFS.

Automatic failure support: A write-ahead log (WAL) is kept across the cluster, which provides automatic recovery from failures.
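The idea behind a write-ahead log can be sketched as follows. This is a simplified illustration under stated assumptions, not how HBase implements its WAL: `ToyRegionServer` is a hypothetical name, and a plain Python list stands in for a durable log file.

```python
class ToyRegionServer:
    """Toy model of write-ahead logging: log first, then update memory."""

    def __init__(self, wal):
        self.wal = wal      # a list standing in for a durable log file
        self.memstore = {}  # in-memory state, lost if the server crashes

    def put(self, key, value):
        self.wal.append((key, value))  # 1. record the edit in the log
        self.memstore[key] = value     # 2. only then apply it in memory

    @classmethod
    def recover(cls, wal):
        """Rebuild the in-memory state by replaying the log in order."""
        server = cls(wal)
        for key, value in wal:
            server.memstore[key] = value
        return server


wal = []
server = ToyRegionServer(wal)
server.put(b"row1", b"a")
server.put(b"row1", b"b")
server.put(b"row2", b"c")

del server  # simulate a crash: the in-memory state is gone, the log survives

recovered = ToyRegionServer.recover(wal)
print(recovered.memstore)  # {b'row1': b'b', b'row2': b'c'}
```

Because every edit reaches the log before it reaches memory, replaying the log in order reproduces the state the server had at the time of the crash.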

Consistent read and write: HBase offers consistent reads and writes of data.

Java API for client access: HBase offers an easy-to-use Java API for clients.

Block cache and Bloom filters: HBase supports a block cache and Bloom filters for high-volume query optimization.
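A Bloom filter answers the question "could this key be in this file?" without reading the file. The sketch below shows the general technique only; it is not HBase's implementation, and the class name, bit-array size, and hash scheme are assumptions chosen for readability.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key: bytes):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = ToyBloomFilter()
print(bloom.might_contain(b"row1"))  # False: nothing has been added yet
bloom.add(b"row1")
print(bloom.might_contain(b"row1"))  # True
```

When the filter says a key is definitely absent, a read can skip that file entirely, which is where the query optimization comes from.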

Conclusion:

HBase architecture components:

  • HMaster, HRegionServer, HRegions, ZooKeeper, and HDFS.
  • HMaster is the implementation of the master server in the HBase architecture.
  • Regions are the basic building blocks of an HBase cluster; they hold the distributed parts of tables and are made up of column families.
  • HDFS offers a high level of fault tolerance and runs on inexpensive commodity hardware.
  • The HBase data model is a set of components that includes Tables, Rows, Column Families, Cells, Columns, and Versions.
  • Column-oriented and row-oriented stores differ in their storage mechanism.
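The data model elements listed above (tables, rows, column families, columns, cells, and versions) can be pictured with a small sketch. This is a toy model for illustration, not the HBase client API: `ToyTable`, the `users` table, the `info` family, and the integer timestamps are all hypothetical.

```python
class ToyTable:
    """Toy data model: row -> (family, qualifier) -> versioned cell values."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed at creation
        self.rows = {}  # row key -> {(family, qualifier): [(ts, value), ...]}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Column qualifiers need no declaration; they appear on first write.
        cell = self.rows.setdefault(row, {}).setdefault((family, qualifier), [])
        cell.append((ts, value))
        cell.sort(key=lambda version: version[0], reverse=True)  # newest first

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        cell = self.rows.get(row, {}).get((family, qualifier))
        return cell[0][1] if cell else None


table = ToyTable("users", families=["info"])
table.put(b"user1", "info", "email", b"old@example.com", ts=1)
table.put(b"user1", "info", "email", b"new@example.com", ts=2)
print(table.get(b"user1", "info", "email"))  # b'new@example.com'
```

Writing to a column family that was not declared at creation raises an error, while a new column qualifier inside an existing family is accepted without any schema change.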
