Hive Interview Questions and Answers


Last updated on 14th Oct 2020

About author

Keerthana (Sr Tech Lead Engineer - Director Level)

She has 7+ years of experience in Hive. Her passion lies in developing entrepreneurs and in guiding enthusiastic job seekers.


The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

1. What are the different types of tables available in Hive?

Ans:

There are two types: managed tables and external tables. In a managed table, both the data and the schema are under the control of Hive, whereas in an external table only the schema is under Hive's control, as sketched below.
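A minimal sketch (the table and column names here are hypothetical):

CREATE TABLE managed_emp (id INT, name STRING);

CREATE EXTERNAL TABLE external_emp (id INT, name STRING)
LOCATION '/data/emp';

DROP TABLE managed_emp;   -- removes the metadata and the data files
DROP TABLE external_emp;  -- removes only the metadata; the files under /data/emp remain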

2. Is Hive suitable to be used for OLTP systems? Why?

Ans:

No. Hive does not provide insert and update at the row level, so it is not suitable for OLTP systems.

3. How can a table be renamed in Hive?

Ans:

ALTER TABLE table_name RENAME TO new_name;

4. How can we change the data type of a column in a hive table?

Ans:

Using the REPLACE COLUMNS option:

ALTER TABLE table_name REPLACE COLUMNS ……
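A hedged example of the full form (the column list here is hypothetical); note that REPLACE COLUMNS rewrites the entire column list, so every column must be restated with its desired type:

ALTER TABLE employee REPLACE COLUMNS (id INT, name STRING, salary BIGINT);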

5. What is a metastore in Hive?

Ans:

It is a relational database storing the metadata of Hive tables, partitions, Hive databases, etc.

6. What is the need for custom Serde?

Ans:

Depending on the nature of the data a user has, the inbuilt SerDe may not satisfy its format, so users need to write their own Java code to satisfy their data format requirements.

7. Why do we need Hive?

Ans:

Hive is a tool in the Hadoop ecosystem that provides an interface to organize and query data in a database-like fashion and to write SQL-like queries. It is suitable for accessing and analyzing data in Hadoop using SQL syntax.

8. What is the default location where hive stores table data?

Ans:

hdfs://namenode_server/user/hive/warehouse

9. What are the three different modes in which Hive can be run?

Ans:

  • Local mode
  • Distributed mode
  • Pseudo Distributed mode

10. Is there a date data type in Hive?

Ans:

Yes. The TIMESTAMP data type stores dates in the java.sql.Timestamp format.

11. What are collection data types in Hive?

Ans:

There are three collection data types in Hive.

  • ARRAY
  • MAP
  • STRUCT

12. Can we run unix shell commands from hive? Give examples.

Ans:

Yes, by putting the ! mark just before the command.

For example, !pwd at the hive prompt will list the current directory.


13. What is a Hive variable? What do we use it for?

Ans:

A Hive variable is a variable created in the Hive environment that can be referenced by Hive scripts. It is used to pass values to hive queries when the query starts executing, as in the sketch below.
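A minimal sketch (the variable, table, and column names are hypothetical):

SET hivevar:min_salary=50000;
SELECT name FROM employee WHERE salary > ${hivevar:min_salary};

The same variable can also be passed from the command line with hive --hivevar min_salary=50000.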

14. Can hive queries be executed from script files? How?

Ans:

Using the source command.

Example −

Hive> source /path/to/file/file_with_query.hql

15. What is the importance of the .hiverc file?

Ans:

It is a file containing a list of commands needed to run when the hive CLI starts. For example setting the strict mode to be true etc.

16. What are the default record and field delimiter used for hive text files?

Ans:

The default record delimiter is − \n

And the field delimiters are − \001, \002, \003
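These defaults are equivalent to declaring the delimiters explicitly, as in this sketch (the table and columns are hypothetical):

CREATE TABLE t (id INT, tags ARRAY<STRING>, props MAP<STRING,STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n';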

17. What do you mean by schema on read?

Ans:

The schema is validated against the data when the data is read, rather than enforced when the data is written.

18. How do you list all databases whose name starts with p?

Ans:

SHOW DATABASES LIKE 'p.*';

19. What does the “USE” command in hive do?

Ans:

With the USE command you fix the database on which all subsequent hive queries will run.

20. How can you delete the DBPROPERTY in Hive?

Ans:

There is no way to delete the DBPROPERTY.

21. What kind of applications are supported by Apache Hive?

Ans:

By exposing its Thrift server, Hive supports all client applications written in Java, PHP, Python, C++, or Ruby.

22. What does the following query do?
INSERT OVERWRITE TABLE employees PARTITION (country, state) SELECT …, se.cnty, se.st FROM staged_employees se;

Ans:

It creates partitions on the employees table, with the partition values coming from the columns in the SELECT clause. This is called a dynamic partition insert.

23. Where does the data of a Hive table get stored?

Ans:

By default, the Hive table is stored in the HDFS directory /user/hive/warehouse. One can change it by specifying the desired directory in the hive.metastore.warehouse.dir configuration parameter present in hive-site.xml.

24. Why does Hive not store metadata information in HDFS?

Ans:

Hive stores metadata information in the metastore using an RDBMS instead of HDFS in order to achieve low latency, because HDFS read/write operations are time-consuming.

25. What is the difference between local and remote metastore?

Ans:

Local Metastore: The metastore service runs in the same JVM in which the Hive service is running and connects to a database running in a separate JVM, either on the same machine or on a remote machine.

Remote Metastore: In this configuration, the metastore service runs on its own separate JVM and not in the Hive service JVM.

26. What is the default database provided by Apache Hive for metastore?

Ans:

By default, it offers an embedded Derby database instance backed by the local disk for the metastore. This is what we call the embedded metastore configuration.

27. What is the difference between the external table and managed table?

Ans:

Managed table: If one drops a managed table, the metadata information along with the table data is deleted from the Hive warehouse directory.

External table: If one drops an external table, Hive deletes only the metadata information regarding the table and leaves the table data present in HDFS untouched.

28. Is it possible to change the default location of a managed table?

Ans:

Yes, by using the clause LOCATION '<hdfs_path>' we can change the default location of a managed table.

29. When should we use SORT BY instead of ORDER BY?

Ans:

We should use SORT BY instead of ORDER BY when we have to sort huge datasets. The reason is that SORT BY sorts the data using multiple reducers, whereas ORDER BY sorts all of the data together using a single reducer, so ORDER BY takes a long time to execute on large inputs. The sketch below contrasts the two.
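For example (the table and column names are hypothetical):

SELECT * FROM sales ORDER BY amount DESC;  -- one reducer, totally ordered output
SELECT * FROM sales SORT BY amount DESC;   -- many reducers, each reducer's output sorted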

30. What is a partition in Hive?

Ans:

Hive organizes tables into partitions in order to group similar data together on the basis of a column or partition key. Each table can have one or more partition keys to identify a particular partition. Physically, a Hive partition is a sub-directory in the table directory.

31. Why do we perform partitioning in Hive?

Ans:

Partitioning provides granularity in a Hive table. It reduces query latency by scanning only the relevant partitioned data instead of the whole dataset, for example:
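A minimal sketch (names hypothetical); queries that filter on the partition key scan only the matching sub-directory:

CREATE TABLE logs (msg STRING) PARTITIONED BY (dt STRING);
SELECT msg FROM logs WHERE dt = '2020-10-14';  -- reads only the dt=2020-10-14 sub-directory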

32. What is dynamic partitioning and when is it used?

Ans:

In dynamic partitioning, the values of the partition columns become known at runtime, i.e., while the data is being loaded into the Hive table.

Usage:

  1. While we load data from an existing non-partitioned table, to improve the sampling and thus decrease the query latency.
  2. While we do not know all the values of the partitions beforehand, since finding these partition values manually in a huge dataset is a tedious task. A sketch follows below.
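A sketch of a dynamic partition insert, reusing the query from question 22 (the staging column names are hypothetical); the two SET commands enable the feature:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE employees PARTITION (country, state)
SELECT se.name, se.salary, se.cnty, se.st FROM staged_employees se;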

33. Why do we need buckets?

Ans:

Basically, there are two main reasons for performing bucketing on a partition:

  • A map-side join requires the data belonging to a unique join key to be present in the same partition.
  • It allows us to decrease the query time and makes the sampling process more efficient.

34. How does Hive distribute the rows into buckets?

Ans:

Hive determines the bucket number for a row using the formula: hash_function (bucketing_column) modulo (num_of_buckets). The hash_function depends on the column data type; for an integer column it is simply:

  • hash_function (int_type_column) = value of int_type_column

A small sketch follows.
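A minimal sketch (names hypothetical):

CREATE TABLE emp_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
-- for an integer key, a row with id = 10 goes to bucket 10 % 4 = 2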

35. What is indexing and why do we need it?

Ans:

A Hive index is a Hive query optimization technique. Basically, we use it to speed up access to a column or set of columns in a Hive database. With an index, the database system does not need to read all the rows in a table to find the selected data. A hedged sketch follows.
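A sketch using the classic (pre-Hive 3) index DDL; the index and table names are hypothetical:

CREATE INDEX emp_salary_idx
ON TABLE employee (salary)
AS 'COMPACT' WITH DEFERRED REBUILD;

ALTER INDEX emp_salary_idx ON employee REBUILD;  -- builds the index data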


36. What is the use of HCatalog?

Ans:

Basically, we use HCatalog to share data structures with external systems. It offers access to the Hive metastore to users of other tools on Hadoop so that they can read and write data to Hive's data warehouse.

37. Where is table data stored in Apache Hive by default?

Ans:

hdfs://namenode_server/user/hive/warehouse

38. Are multi-line comments supported in Hive?

Ans:

No

39. What is ObjectInspector functionality?

Ans:

To analyze the structure of individual columns and the internal structure of row objects, we use ObjectInspector. Basically, it provides access to complex objects that can be stored in multiple formats in Hive.

40. Explain about the different types of join in Hive.

Ans:

There are 4 different types of joins in HiveQL:

  • JOIN - similar to an inner join in SQL: it returns only the records that fulfill the join condition in both tables.
  • FULL OUTER JOIN - combines the records of both the left and right tables that fulfill the join condition.
  • LEFT OUTER JOIN - all the rows from the left table are returned, even if there are no matches in the right table.
  • RIGHT OUTER JOIN - all the rows from the right table are returned, even if there are no matches in the left table.

41. How can you configure remote metastore mode in Hive?

Ans:

Basically, the hive-site.xml file has to be configured with the below property to configure a remote metastore in Hive:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://node1:9083</value>
  <description>IP address (or hostname) and port of the metastore host</description>
</property>

42. Is it possible to change the default location of Managed Tables in Hive, if so how?

Ans:

Yes. By using the LOCATION keyword while creating the managed table, we can change the default location of managed tables. The one condition is that the user has to specify the storage path of the managed table as the value of the LOCATION keyword.

43. How does data transfer happen from HDFS to Hive?

Ans:

If the data is already present in HDFS, the user need not use LOAD DATA, which moves the files to /user/hive/warehouse/. Instead, using the keyword EXTERNAL, which creates the table definition in the hive metastore, the user just has to define the table:

CREATE EXTERNAL TABLE table_name (
  id int,
  myfields string
)
LOCATION '/my/location/in/hdfs';

44. What are the different components of a Hive architecture?

Ans:

There are several components of Hive Architecture. Such as –

  1. User Interface - calls the execute interface of the driver. The driver creates a session handle for the query and sends the query to the compiler to generate an execution plan for it.
  2. Metastore - sends the metadata to the compiler for the execution of the query on receiving the sendMetaData request.
  3. Compiler - generates the execution plan, which is a DAG of stages where each stage is either a metadata operation, a map or reduce job, or an operation on HDFS.
  4. Execution Engine - submits each of these stages to the relevant components while managing the dependencies between them.

45. Whenever I run a hive query from a different directory, it creates a new metastore_db. Please explain the reason for it.

Ans:

Basically, when we run hive in embedded mode, it creates a local metastore; before creating it, Hive checks whether a metastore already exists in the current directory. This behavior comes from the property "javax.jdo.option.ConnectionURL" in the configuration file hive-site.xml, whose default value is "jdbc:derby:;databaseName=metastore_db;create=true". To change the behavior, change this location to an absolute path, so that the metastore will be used from that one location.

46. Is it possible to use the same metastore by multiple users, in case of the embedded hive?

Ans:

No, we cannot use the metastore in sharing mode with the embedded hive. To share it, we must use a standalone "real" database such as MySQL or PostgreSQL.

47. Usage of Hive.

Ans:

Here, we will look at the following Hive usages.

We use Hive for Schema flexibility as well as evolution.

Moreover, it is possible to partition and bucket tables in Apache Hive.

Also, we can use JDBC/ODBC drivers, since they are available in Hive.

48. Features and Limitations of Hive.

Ans:

Features of Hive

  1. The best feature is that it offers data summarization, query, and analysis in a much easier manner.
  2. Hive supports external tables, which make it possible to process data without actually storing it in HDFS.
  3. Moreover, it fits the low-level interface requirement of Hadoop perfectly.

Limitation of Hive

  1. We cannot perform real-time queries with Hive, and it does not offer row-level updates.
  2. Moreover, the latency Hive offers makes it unsuitable for interactive data browsing.
  3. Also, Hive is not the right choice for online transaction processing.

49. What is the significance of the line

set hive.mapred.mode = strict;

Ans:

It sets the MapReduce jobs to strict mode, in which queries on partitioned tables cannot run without a WHERE clause on the partition column. This prevents very large jobs from running for a long time.

50. How do you check if a particular partition exists?

Ans:

This can be done with the following query:

SHOW PARTITIONS table_name PARTITION(partitioned_column='partition_value')

51. Which java class handles the Input record encoding into files which store the tables in Hive?

Ans:

org.apache.hadoop.mapred.TextInputFormat

52. Which java class handles the output record encoding into files which result from Hive queries?

Ans:

org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

53. What is the significance of the 'IF EXISTS' clause while dropping a table?

Ans:

When we issue the command DROP TABLE IF EXISTS table_name, Hive does not throw an error even if the table being dropped does not exist in the first place. Without the IF EXISTS clause, dropping a non-existent table raises an error.

54. When you point a partition of a hive table to a new directory, what happens to the data?

Ans:

The data stays in the old location. It has to be moved manually.

55. Write a query to insert a new column (new_col INT) into a hive table (htab) at a position before an existing column (x_col).

Ans:

Hive has no BEFORE keyword; the column is added first and then repositioned with FIRST or AFTER (col_before_x_col is a placeholder for whichever column precedes x_col):

ALTER TABLE htab ADD COLUMNS (new_col INT);
ALTER TABLE htab CHANGE COLUMN new_col new_col INT AFTER col_before_x_col;

56. Does the archiving of Hive tables give any space saving in HDFS?

Ans:

No. It only reduces the number of files, which makes them easier for the namenode to manage.

57. How can you stop a partition from being queried?

Ans:

By using the ENABLE OFFLINE clause with the ALTER TABLE statement, for example:
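A sketch (table and partition names hypothetical; this clause exists only in older Hive releases):

ALTER TABLE logs PARTITION (dt='2020-10-14') ENABLE OFFLINE;  -- queries on this partition now fail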

58. While loading data into a hive table using the LOAD DATA clause, how do you specify it is an HDFS file and not a local file?

Ans:

By omitting the LOCAL clause in the LOAD DATA statement, as contrasted below.
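A minimal sketch contrasting the two (the paths and table name are hypothetical):

LOAD DATA INPATH '/data/incoming/file.txt' INTO TABLE t;     -- file is taken from HDFS
LOAD DATA LOCAL INPATH '/home/user/file.txt' INTO TABLE t;   -- file is taken from the local filesystem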

59. If you omit the OVERWRITE clause while loading data into a hive table, what happens to files which are new and files which already exist?

Ans:

The new incoming files are just added to the target directory, and existing files whose names match incoming files are overwritten. Other files whose names do not match any of the incoming files continue to exist.

If you add the OVERWRITE clause, then all the existing data in the directory will be deleted before the new data is written.

60. Differentiate between Pig and Hive.

Ans:

Criteria            | Apache Pig                           | Apache Hive
Nature              | Procedural data flow language        | Declarative SQL-like language
Application         | Used for programming                 | Used for report creation
Used by             | Researchers and programmers          | Mainly data analysts
Operates on         | The client side of a cluster         | The server side of a cluster
Accessing raw data  | Not as fast as HiveQL                | Faster with in-built features
Schema or data type | Always defined in the script itself  | Stored in the local database
Ease of learning    | Takes a little extra time and effort | Easy to learn for database experts

61. How to skip header rows from a table in Hive?

Ans:

Imagine that header records in a table are as follows:

System=…

Version=…

Sub-version=…

Suppose we do not want to include the above three lines of headers in our Hive query. To skip the header lines from our table in Hive, we set a table property:

CREATE EXTERNAL TABLE employee (
  name STRING,
  job STRING,
  dob STRING,
  id INT,
  salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
LOCATION '/user/data'
TBLPROPERTIES("skip.header.line.count"="3");

62. Explain the process to access subdirectories recursively in Hive queries.

Ans:

By using the below commands, we can access subdirectories recursively in Hive:

hive> Set mapred.input.dir.recursive=true;

hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher level directory, and this is suitable for the directory structure like:

/data/country/state/city/

63. Can we change the settings within a Hive session? If yes, how?

Ans:

Yes, we can change the settings within a Hive session using the SET command. It helps change the Hive job settings for the current session. For example, the following command ensures that buckets are populated according to the table definition:

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET with the property name. SET will list all the properties with their values set by Hive.

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

This list will not include the defaults of Hadoop. So, we should use the below code:

SET -v

It will list all the properties including the Hadoop defaults in the system.

64. Is it possible to add 100 nodes when we already have 100 nodes in Hive? If yes, how?

Ans:

Yes, we can add the nodes by following the below steps:

Step 1: Take a new system; create a new username and password

Step 2: Install SSH and set up SSH connections with the master node

Step 3: Add the SSH public RSA key to the authorized_keys file

Step 4: Add the new DataNode hostname, IP address, and other details to /etc/hosts and to the slaves file:

192.168.1.102 slave3.in slave3

Step 5: Start the DataNode on a new node

Step 6: Log in to the new node, e.g., su hadoop or:

ssh -X hadoop@192.168.1.103

Step 7: Start HDFS of the newly added slave node by using the following command:

./bin/hadoop-daemon.sh start datanode

Step 8: Check the output of the jps command on the new node

65. Explain the concatenation function in Hive with an example.

Ans:

The concatenation function joins the input strings. We can specify any number of strings separated by commas.

Example:

  • CONCAT('Rose','-','is','-','a','-','beautiful','-','flower');

Output:

Rose-is-a-beautiful-flower

Here we specify the separator '-' between every pair of strings. If the separator is common for every string, Hive provides another command, CONCAT_WS. In this case, we have to specify the separator first:

  • CONCAT_WS('-','Rose','is','a','beautiful','flower');

Output:

Rose-is-a-beautiful-flower

66. Explain the Trim and Reverse functions in Hive with examples.

Ans:

The TRIM function deletes the leading and trailing spaces of a string.

Example:

  • TRIM(' ABBBBBBCCCC');

Output:

ABBBBBBCCCC

To remove the leading space:

LTRIM(' ABBBBBBCCCC');

To remove the trailing space:

RTRIM('ABBBBBBCCCC ');

In the reverse function, characters are reversed in the string.

Example:

  • REVERSE('ABBBBBBCCCC');

Output:

CCCCBBBBBBA

67. How to change the column data type in Hive? Explain RLIKE in Hive.

Ans:

We can change the column data type by using ALTER and CHANGE as follows:

ALTER TABLE table_name CHANGE column_name column_name new_datatype;

For example, if we want to change the data type of the salary column from integer to bigint in the employee table, we can use the following:

ALTER TABLE employee CHANGE salary salary BIGINT;

RLIKE: It is short for "regexp like" and is a special function in Hive. A RLIKE B evaluates to true if the string A matches the Java regular expression B; a plain substring also counts as a match.

Example:

  • 'ABBBBBBCCCC' RLIKE 'BBB'  True (substring match)
  • 'ABBBBBBCCCC' RLIKE '^A.*'  True (this is a regular expression)

68. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Ans:

Hive is not considered a full database. The design rules and regulations of Hadoop and HDFS have put restrictions on what Hive can do. However, Hive is most suitable for data warehouse applications because it:

  • Analyzes relatively static data
  • Does not require fast response times
  • Does not make rapid changes in data

Although Hive doesn’t provide fundamental features required for Online Transaction Processing (OLTP), it is suitable for data warehouse applications in large datasets. There are two types of tables in Hive:

  • Managed tables
  • External tables

69. What is the present version of Hive? Explain ACID transactions in Hive.

Ans:

At the time of writing, the current stable release line of Hive is 3.x. Hive supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions, introduced in version 0.13, and provides them at row level. Following are the operations Hive uses to support ACID transactions:

  • Insert
  • Delete
  • Update

70. What is the maximum size of a string data type supported by Hive? Explain how Hive supports binary formats.

Ans:

The maximum size of a string data type supported by Hive is 2 GB. Hive supports the text file format by default, and it also supports the binary format sequence files, ORC files, Avro data files, and Parquet files.

  • Sequence file: It is a splittable, compressible, and row-oriented file with a general binary format.
  • ORC file: The Optimized Row Columnar (ORC) format file is a record-columnar, column-oriented storage file. It divides the table into groups of rows (stripes), and within each group the values are stored column by column.
  • Avro data file: Like a sequence file, it is splittable, compressible, and row-oriented, and it additionally supports schema evolution and multi-language binding.
  • Parquet file: In the Parquet format, along with storing rows of data adjacent to one another, we can also store column values adjacent to each other, so that the dataset is partitioned both horizontally and vertically.

71. What is the precedence order of Hive configuration?

Ans:

We are using a precedence hierarchy for setting properties:

  1. The SET command in Hive
  2. The command-line -hiveconf option
  3. hive-site.xml
  4. hive-default.xml
  5. hadoop-site.xml
  6. hadoop-default.xml

72. If you run a select * query in Hive, why doesn’t it run MapReduce?

Ans:

The hive.fetch.task.conversion property of Hive lowers the latency of MapReduce overhead: when executing simple queries such as SELECT with FILTER or LIMIT, it skips MapReduce and serves the result with a fetch task instead, for example:
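A sketch (assuming a table named employee):

SET hive.fetch.task.conversion=more;
SELECT * FROM employee LIMIT 10;  -- served by a fetch task, no MapReduce job is launched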

73. How can we improve the performance with ORC format tables in Hive?

Ans:

We can store Hive data in a highly efficient manner in the Optimized Row Columnar (ORC) file format, which overcomes many limitations of the other Hive file formats. We can improve the performance by using ORC files while reading, writing, and processing data:

SET hive.compute.query.using.stats=true;
SET hive.stats.dbclass=fs;
CREATE TABLE orc_table (
  id int,
  name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS ORC;

74. Explain the functionality of ObjectInspector.

Ans:

ObjectInspector helps analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in memory, such as:

  • An instance of Java class
  • A standard Java object
  • A lazily initialized object

ObjectInspector tells the structure of the object and also the ways to access the internal fields inside the object.

75. Whenever we run a Hive query, a new metastore_db is created. Why?

Ans:

A local metastore is created when we run Hive in embedded mode. Before creating it, Hive checks whether a metastore already exists, and this metastore property is defined in the configuration file, hive-site.xml. The property is:

javax.jdo.option.ConnectionURL

with the default value:

  • jdbc:derby:;databaseName=metastore_db;create=true

Therefore, we have to change this location to an absolute path so that the metastore will be used from that location.


76. Differentiate between Hive and HBase.

Ans:

Hive                               | HBase
Enables most SQL queries           | Does not allow SQL queries
Operations do not run in real time | Operations run in real time
A data warehouse framework         | A NoSQL database
Runs on top of MapReduce           | Runs on top of HDFS

77. How can we access the subdirectories recursively?

Ans:

By using the below commands, we can access subdirectories recursively in Hive:

  • hive> Set mapred.input.dir.recursive=true;
  • hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher level directory, and this is suitable for the directory structure:

/data/country/state/city/

78. What are the uses of Hive Explode?

Ans:

Hadoop developers consider an array as their input and convert it into separate table rows. To convert complicated data types into desired table formats, Hive uses explode, as sketched below.
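A minimal sketch, assuming an employee table with a skills ARRAY<STRING> column:

SELECT explode(skills) AS skill FROM employee;

SELECT e.name, s.skill
FROM employee e
LATERAL VIEW explode(e.skills) s AS skill;  -- pairs each skill with the row's other columns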

79. What is the available mechanism for connecting applications when we run Hive as a server?

Ans:

  • Thrift Client: Using Thrift, we can call Hive commands from various programming languages, such as C++, PHP, Java, Python, and Ruby.
  • JDBC Driver: JDBC Driver enables accessing data with JDBC support, by translating calls from an application into SQL and passing the SQL queries to the Hive engine.
  • ODBC Driver: It implements the ODBC API standard for the Hive DBMS, enabling ODBC-compliant applications to interact seamlessly with Hive.

80. How do we write our own custom SerDe?

Ans:

Mostly, end users prefer writing a Deserializer instead of a full SerDe when they want to read their own data format rather than write it, e.g., RegexDeserializer deserializes data with the help of the configuration parameter 'regex' and a list of column names.

If our SerDe supports DDL (i.e., SerDe with parameterized columns and column types), we will probably implement a protocol based on DynamicSerDe, instead of writing a SerDe. This is because the framework passes DDL to SerDe through the ‘Thrift DDL’ format and it’s totally unnecessary to write a “Thrift DDL” parser.

81. Mention the date and collection data types supported by Hive.

Ans:

The TIMESTAMP data type stores data in the java.sql.Timestamp format; Hive also provides a DATE type for calendar dates.

Three collection data types in Hive are:

  • Arrays
  • Maps
  • Structs

82. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? If yes, how? Give an example.

Ans:

Yes, we can run UNIX shell commands from Hive using an '!' mark just before the command. For example, !pwd at the Hive prompt will display the current directory.

We can execute Hive queries from the script files using the source command.

Example:

  • Hive> source /path/to/file/file_with_query.hql

83. What is a Table generating Function on hive?

Ans:

A table-generating function is a function which takes a single column as argument and expands it to multiple columns or rows, for example explode().

84. How can Hive avoid mapreduce?

Ans:

If we set the property hive.exec.mode.local.auto to true, Hive avoids submitting MapReduce jobs to the cluster for small queries and runs them locally in a single JVM instead:
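For example:

SET hive.exec.mode.local.auto=true;
-- queries over small inputs now run in a single local JVM instead of as cluster MapReduce jobs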

85. What is the difference between LIKE and RLIKE operators in Hive?

Ans:

The LIKE operator behaves the same way as the regular SQL LIKE operator used in SELECT queries. Example:

street_name LIKE '%Chi'

The RLIKE operator uses the more advanced regular expressions available in Java.

Example: street_name RLIKE '.*(Chi|Oho).*', which will select any value that has either Chi or Oho in it.

86. Is it possible to create Cartesian join between 2 tables, using Hive?

Ans:

No, as this kind of join cannot be implemented in MapReduce.

87. As part of optimizing the queries in Hive, what should be the order of table size in a join query?

Ans:

In a join query, the smallest table should be placed in the first position and the largest table in the last position.

88. What is the usefulness of the DISTRIBUTE BY clause in Hive?

Ans:

It controls how the map output is distributed among the reducers: all rows with the same DISTRIBUTE BY value go to the same reducer. It is useful in the case of streaming data, for example:
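A sketch (names hypothetical): all rows with the same dt value are routed to the same reducer, and each reducer's rows are then sorted.

SELECT * FROM logs DISTRIBUTE BY dt SORT BY dt, msg;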

89. How will you convert the string ’51.2’ to a float value in the price column?

Ans:

Select cast(price as FLOAT)

90. What will be the result when you do cast(‘abc’ as INT)?

Ans:

Hive will return NULL

91. Can the name of a view be the same as the name of a hive table?

Ans:

No. The name of a view must be unique when compared to all other tables and views present in the same database.

92. What types of costs are associated with creating an index on hive tables?

Ans:

Indexes occupy space and there is a processing cost in arranging the values of the column on which index is created.

93. Give the command to see the indexes on a table.

Ans:

SHOW INDEX ON table_name

This will list all the indexes created on any of the columns in the table table_name.

94. What does /*streamtable(table_name)*/ do?

Ans:

It is a query hint to stream a table into memory before running the query. It is a query optimization technique, for example:
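A sketch (the table names are hypothetical), streaming the larger table instead of buffering it in memory:

SELECT /*+ STREAMTABLE(big) */ big.id, small.name
FROM big JOIN small ON big.id = small.id;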

95. Can a partition be archived? What are the advantages and Disadvantages?

Ans:

Yes, a partition can be archived. The advantage is that it decreases the number of files stored in the namenode, and the archived file can still be queried using hive. The disadvantage is that it causes less efficient queries and does not offer any space savings, for example:
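A sketch (names hypothetical; archiving must first be switched on):

SET hive.archive.enabled=true;
ALTER TABLE logs ARCHIVE PARTITION (dt='2020-10-14');
ALTER TABLE logs UNARCHIVE PARTITION (dt='2020-10-14');  -- restores the original files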

96. What is a generic UDF in hive?

Ans:

It is a UDF which is created using a Java program to serve some specific need not covered by the existing functions in Hive. It can detect the type of its input arguments programmatically and provide an appropriate response.
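A sketch of how such a UDF is registered and used once compiled into a jar (the jar path, class name, and function name are hypothetical):

ADD JAR /path/to/my_udfs.jar;
CREATE TEMPORARY FUNCTION my_func AS 'com.example.MyGenericUDF';
SELECT my_func(name) FROM employee;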

97. The following statement failed to execute. What can be the cause?

LOAD DATA LOCAL INPATH '${env:HOME}/country/state/'

OVERWRITE INTO TABLE address;

Ans:

The LOCAL INPATH should point to a file and not a directory. The ${env:HOME} is a valid variable available in the hive environment.

98. How do you specify the table creator name when creating a table in Hive?

Ans:

The TBLPROPERTIES clause is used to add the creator name while creating a table.

  • The TBLPROPERTIES is added like −
  • TBLPROPERTIES('creator'='Joan')

99. How do ORC format tables help Hive to enhance the performance?

Ans:

Using the ORC (Optimized Row Columnar) file format, you can store the Hive data efficiently, as it helps to overcome numerous limitations of the other Hive file formats.

100. What is the function of the Object-Inspector?

Ans:

In Hive, the ObjectInspector helps to analyze the internal structure of a row object and the individual structure of columns. Furthermore, it also offers ways to access complex objects that can be stored in different formats in memory.

101. What are the components used in the Hive Query Processor?

Ans:

Following are the components of a Hive Query Processor:

  • Parse and Semantic Analysis (ql/parse)
  • Metadata Layer (ql/metadata)
  • Type Interfaces (ql/typeinfo)
  • Sessions (ql/session)
  • Map/Reduce Execution Engine (ql/exec)
  • Plan Components (ql/plan)
  • Hive Function Framework (ql/udf)
  • Tools (ql/tools)
  • Optimizer (ql/optimizer)
