Data Warehouse Interview Questions and Answers


Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.

1. Define data warehouse?

Ans:

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management’s decision-making process.

2. What does subject-oriented data warehouse signify?

Ans:

Subject-oriented signifies that the data warehouse stores information around a particular subject, such as product, customer, or sales.

3. List any five applications of data warehouse

Ans:

Some applications include financial services, banking services, consumer goods, the retail sector, and controlled manufacturing.

4. What do OLAP and OLTP stand for?

Ans:

OLAP is an acronym for Online Analytical Processing, and OLTP is an acronym for Online Transaction Processing.

5. What is the very basic difference between data warehouse and operational databases?

Ans:

A data warehouse contains historical information that is made available for analysis of the business whereas an operational database contains current information that is required to run the business.

6. List the Schema that a data warehouse system can implement.

Ans:

A data warehouse can implement a star schema, snowflake schema, or fact constellation schema.

7. List the processes that are involved in Data Warehousing.

Ans:

Data warehousing involves data cleaning, data integration, and data consolidation.

8. List the functions of data warehouse tools and utilities.

Ans:

The functions performed by Data warehouse tools and utilities are Data Extraction, Data Cleaning, Data Transformation, Data Loading and Refreshing.

9. What do you mean by Data Extraction?

Ans:

Data extraction means gathering data from multiple heterogeneous sources.

10. Define metadata

Ans:

Metadata is simply defined as data about data. In other words, metadata is the summarized data that leads us to the detailed data.

11. What does the Metadata Repository contain?

Ans:

The metadata repository contains the definition of the data warehouse, business metadata, operational metadata, the data for mapping from the operational environment to the data warehouse, and the algorithms for summarization.

12. How does a Data Cube help?

Ans:

A data cube helps us represent data in multiple dimensions. The data cube is defined by dimensions and facts.
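
In SQL terms, many databases can compute the aggregates a data cube holds directly. A minimal sketch, assuming a hypothetical sales_fact table and a dialect that supports GROUP BY CUBE (PostgreSQL 9.5+, Oracle, SQL Server):

```sql
-- GROUP BY CUBE computes totals for every combination of the listed
-- dimensions: per region and product, per region, per product, and overall.
-- These are exactly the aggregates a two-dimensional cube would hold.
SELECT region,
       product,
       SUM(sales_amount) AS total_sales
FROM   sales_fact
GROUP  BY CUBE (region, product);
```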

13. Define dimension?

Ans:

The dimensions are the entities with respect to which an enterprise keeps the records.

14. Explain data mart.

Ans:

A data mart contains a subset of organization-wide data that is valuable to a specific group within the organization. In other words, a data mart contains data specific to a particular group.

15. What is Virtual Warehouse?

Ans:

A view over an operational data warehouse is known as a virtual warehouse.

16. List the phases involved in the data warehouse delivery process.

Ans:

The phases are IT strategy, education, business case analysis, technical blueprint, building the version, history load, ad hoc query, requirement evolution, automation, and extending scope.

17. Define load manager.

Ans:

A load manager performs the operations required for the extract and load process. The size and complexity of a load manager vary between specific solutions, from one data warehouse to another.

18. Define the functions of a load manager.

Ans:

A load manager extracts data from the source system, fast-loads the extracted data into a temporary data store, and performs simple transformations to give the data a structure similar to the one in the data warehouse.

19. Define a warehouse manager.

Ans:

A warehouse manager is responsible for the warehouse management process. It consists of third-party system software, C programs, and shell scripts. The size and complexity of warehouse managers vary between specific solutions.

20. Define the functions of a warehouse manager.

Ans:

The warehouse manager:

  • performs consistency and referential integrity checks;
  • creates the indexes, business views, and partition views against the base data;
  • transforms and merges the source data from the temporary store into the published data warehouse;
  • backs up the data in the data warehouse; and
  • archives the data that has reached the end of its captured life.

21. What is Summary Information?

Ans:

Summary Information is the area in the data warehouse where the predefined aggregations are kept.

22. What is the Query Manager responsible for?

Ans:

Query Manager is responsible for directing the queries to the suitable tables.

23. List the types of OLAP server

Ans:

There are four types of OLAP servers, namely Relational OLAP, Multidimensional OLAP, Hybrid OLAP, and Specialized SQL Servers.

24. Which one is faster, Multidimensional OLAP or Relational OLAP?

Ans:

Multidimensional OLAP is faster than Relational OLAP.

25. List the functions performed by OLAP.

Ans:

OLAP performs functions such as roll-up, drill-down, slice, dice, and pivot.

26. How many dimensions are selected in Slice operation?

Ans:

Only one dimension is selected for the slice operation.

27. How many dimensions are selected in dice operation?

Ans:

For the dice operation, two or more dimensions are selected from a given cube.

28. How many fact tables are there in a star schema?

Ans:

There is only one fact table in a star schema.

29. What is Normalization?

Ans:

Normalization splits up the data into additional tables.

30. Out of star schema and snowflake schema, whose dimension table is normalized?

Ans:

Snowflake schema uses the concept of normalization.

31. What is the benefit of normalization?

Ans:

Normalization helps in reducing data redundancy.
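
As an illustration with hypothetical tables: normalization moves a repeated city/region pair out of a customer dimension into its own table, so each pair is stored only once.

```sql
-- Before: customer_dim repeated city_name and region on every customer row.
-- After: the pair lives once in city_dim and is referenced by key.
CREATE TABLE city_dim (
    city_id   INTEGER PRIMARY KEY,
    city_name VARCHAR(50),
    region    VARCHAR(50)
);

CREATE TABLE customer_dim (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100),
    city_id       INTEGER REFERENCES city_dim (city_id)
);
```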

32. Which language is used for defining Schema Definition?

Ans:

Data Mining Query Language (DMQL) is used for Schema Definition.

33. What language is the base of DMQL?

Ans:

DMQL is based on Structured Query Language (SQL).

34. What are the reasons for partitioning?

Ans:

Partitioning is done for various reasons, such as easier management, assisting backup and recovery, and enhancing performance.

35. What kind of costs are involved in Data Marting?

Ans:

Data marting involves hardware and software costs, network access costs, and time costs.

36. Compare a database with Data Warehouse.

Ans:

Criteria | Database | Data Warehouse
Type of data | Relational or object-oriented data | Large volumes with multiple data types
Data operations | Transaction processing | Data modeling and analysis
Dimensions of data | Two-dimensional data | Multidimensional data
Data design | ER-based, application-oriented design | Star/snowflake schema, subject-oriented design
Size of the data | Small (in GB) | Large (in TB)
Functionality | High availability and performance | High flexibility and user autonomy

A database uses a relational model to store data, whereas a Data Warehouse uses various schemas, such as the star schema. In a star schema, each dimension is represented by a single dimension table. A Data Warehouse supports dimensional modeling, which is a design technique to support end-user queries.
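
A minimal star schema sketch (all names are hypothetical): a single fact table referencing one denormalized table per dimension.

```sql
CREATE TABLE date_dim (
    date_id   INTEGER PRIMARY KEY,
    full_date DATE,
    cal_year  INTEGER,
    cal_month INTEGER
);

CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)    -- denormalized: the hierarchy stays in one table
);

CREATE TABLE sales_fact (
    date_id      INTEGER REFERENCES date_dim (date_id),
    product_id   INTEGER REFERENCES product_dim (product_id),
    units_sold   INTEGER,
    sales_amount DECIMAL(12,2)
);
```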

37. What is the purpose of cluster analysis in Data Warehousing?

Ans:

Cluster analysis groups objects without requiring predefined class labels. It analyzes the data present in the Data Warehouse and assigns sets of objects to groups, known as clusters, so that objects within a cluster are more similar to one another than to those in other clusters. It is used to perform data mining jobs using techniques such as statistical data analysis, and it draws on knowledge from many fields, such as machine learning, pattern recognition, image analysis, and bioinformatics. Cluster analysis is an iterative process of knowledge discovery that involves trial and failure, and it is used with pre-processing and other parameters to achieve the desired properties.

Purpose of cluster analysis:

  • Scalability
  • Ability to deal with different kinds of attributes
  • Discovery of clusters with arbitrary shape
  • High dimensionality
  • Ability to deal with noise
  • Interpretability

38. What is the difference between agglomerative and divisive hierarchical clustering?

Ans:

  • The agglomerative hierarchical clustering method builds clusters from the bottom up, so the algorithm always starts from the sub-components and then moves to the parents, whereas divisive hierarchical clustering uses a top-to-bottom approach in which the parent is visited first and then the children.
  • In agglomerative hierarchical methods, each object starts in its own cluster, and these clusters are grouped together to create larger clusters. Merging continues until all the single clusters are combined into one complete big cluster that contains all the objects of the child clusters. In divisive clustering, by contrast, the parent cluster is divided into smaller clusters, and the division continues until each cluster holds a single object.

39. Explain the chameleon method used in Data Warehousing.

Ans:

Chameleon is a hierarchical clustering algorithm that overcomes the limitations of existing models and methods in Data Warehousing. It operates on a sparse graph whose nodes represent data items and whose edges represent the weights (similarities) between the data items.

This representation allows large datasets to be processed successfully. The method finds the clusters in the dataset using a two-phase algorithm.

  • The first phase consists of the graph partitioning that allows the clustering of the data items into a large number of sub-clusters.
  • The second phase uses an agglomerative hierarchical clustering algorithm to search for the clusters that are genuine and can be combined together with the sub-clusters that are produced.

40. What is Virtual Data Warehousing?

Ans:

  • A Virtual Data Warehouse provides a collective view of the complete data. It holds no historical data and can be considered a logical data model of the given metadata.
  • Virtual Data Warehousing is a ‘de facto’ information system strategy for supporting analytical decision-making. It is one of the best ways of translating raw data and presenting it in a form that decision-makers can use. It provides a semantic map that allows the end user to view the data as virtualized.

41. What is Active Data Warehousing?

Ans:

  • An Active Data Warehouse represents a single state of a business. Active Data Warehousing considers the analytic perspectives of customers and suppliers. It helps deliver the updated data through reports.
  • A repository of captured transactional data is known as an ‘Active Data Warehouse.’ Using this concept, trends and patterns are found and used for future decision-making. An Active Data Warehouse can integrate data changes while scheduled cycles refresh, and enterprises use it to draw a statistical picture of the company.

42. What is a snapshot with reference to Data Warehouse?

Ans:

  • A snapshot refers to a complete visualization of data at the time of extraction. It occupies less space and can be used to back up and restore data quickly.
  • A snapshot records the activities performed at a point in time. It is stored in a report format from a specific catalog, and the report is generated soon after the catalog is disconnected.

43. What is XMLA?

Ans:

  • XMLA stands for XML for Analysis and can be considered a standard for accessing data in OLAP, data mining, or data sources on the Internet. It is based on the Simple Object Access Protocol (SOAP). XMLA uses the ‘Discover’ and ‘Execute’ methods: ‘Discover’ fetches information from the data source, while ‘Execute’ allows applications to run against the data sources.
  • XMLA is an industry standard for accessing data in analytical systems, such as OLAP. It is based on XML, SOAP, and HTTP.
  • XMLA specifies MDXML as a query language. In the XMLA 1.1 version, the only construct in MDXML is an MDX statement enclosed in the <Statement> tag.

44. What is ODS?

Ans:

  • An operational data store (ODS) is a database designed to integrate data from multiple sources for additional operations on the data. Unlike a master data store, the data is not sent back to operational systems. It may be passed for further operations and to the Data Warehouse for reporting.
  • In ODS, data can be scrubbed, resolved for redundancy, and checked for compliance with the corresponding business rules. This data store can be used for integrating disparate data from multiple sources so that business operations, analysis, and reporting can be carried out. This is the place where most of the data used in the current operation is housed before it’s transferred to the Data Warehouse for longer-term storage or archiving.
  • An ODS is designed for relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the Data Warehouse.
  • An ODS is similar to the short-term memory where it only stores very recent information. On the contrary, the Data Warehouse is more like long-term memory, storing relatively permanent information.

45. What is the level of granularity of a fact table?

Ans:

A fact table is usually designed at a low level of granularity. This means that we need to find the lowest level of information that can be stored in the fact table. For example, “employee performance” is a very high level of granularity, while employee_performance_daily and employee_performance_weekly are lower levels of granularity.

Granularity is the lowest level of information stored in the fact table; the depth of the data level is known as its granularity. In a date dimension, the level of granularity could be year, quarter, month, period, week, or day.

The process consists of the following two steps:

  • Determining the dimensions that are to be included
  • Determining the location to find the hierarchy of each dimension of the information

The factors above are revisited as the requirements evolve.
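
For example (hypothetical tables; date_dim is assumed to carry a week_of_year attribute), declaring the grain as one row per employee per day and then rolling up to weekly figures:

```sql
-- Grain: one row per employee per day, the lowest level stored.
CREATE TABLE employee_performance_daily (
    employee_id  INTEGER,
    date_id      INTEGER,
    tasks_closed INTEGER,
    hours_worked DECIMAL(5,2),
    PRIMARY KEY (employee_id, date_id)
);

-- Weekly performance is derived by rolling the daily grain up.
SELECT f.employee_id,
       d.week_of_year,
       SUM(f.tasks_closed) AS tasks_closed
FROM   employee_performance_daily f
JOIN   date_dim d ON d.date_id = f.date_id
GROUP  BY f.employee_id, d.week_of_year;
```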

46. What is the difference between ‘view’ and ‘materialized view’?

Ans:

View:

  • A view provides a tailored representation of the data and is used to access data from its base table.
  • It has a logical structure only and does not occupy storage space.
  • Changes in the base tables are reflected in the view.

Materialized view:

  • Pre-calculated data persists in the materialized view.
  • It occupies physical storage space.
  • Changes in the base tables are not reflected until the materialized view is refreshed.
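
A short sketch of the difference, reusing the hypothetical star schema tables from earlier (PostgreSQL/Oracle-style syntax; the refresh command shown is PostgreSQL's):

```sql
-- View: a stored query, re-evaluated on every access; occupies no space.
CREATE VIEW monthly_sales_v AS
SELECT d.cal_month, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f JOIN date_dim d ON d.date_id = f.date_id
GROUP  BY d.cal_month;

-- Materialized view: the result set is stored physically and must be
-- refreshed before base-table changes become visible.
CREATE MATERIALIZED VIEW monthly_sales_mv AS
SELECT d.cal_month, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f JOIN date_dim d ON d.date_id = f.date_id
GROUP  BY d.cal_month;

REFRESH MATERIALIZED VIEW monthly_sales_mv;   -- PostgreSQL refresh syntax
```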

47. What is junk dimension?

Ans:

  • In scenarios where certain data may not be appropriate to store in the schema, the data (or attributes) can be stored in a junk dimension. The nature of the data of junk dimension is usually Boolean or flag values.
  • A junk dimension is a single dimension formed by lumping together a number of small dimensions with unrelated attributes. Grouping random flags and text attributes into one distinct dimension is the essence of a junk dimension.
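
A hypothetical junk dimension: three unrelated low-cardinality attributes lumped into one small table, so the fact table carries a single key instead of three flag columns.

```sql
CREATE TABLE order_junk_dim (
    junk_id        INTEGER PRIMARY KEY,
    is_gift_wrap   CHAR(1),        -- 'Y' / 'N'
    is_rush_order  CHAR(1),        -- 'Y' / 'N'
    payment_method VARCHAR(20)     -- small free-text attribute
);
-- An orders fact table would then reference order_junk_dim (junk_id).
```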

48. What are the different types of SCDs used in Data Warehousing?

Ans:

SCDs (slowly changing dimensions) are the dimensions in which the data changes slowly, rather than changing regularly on a time basis.

Three types of SCDs are used in Data Warehousing:

  • SCD1: The new record replaces the original record; only the current data is kept, and no history is preserved.
  • SCD2: A new record is added to the dimension table, so the database holds both the current data and the previous data as history.
  • SCD3: The original record is modified to hold the new data alongside the old, typically by adding a column that preserves the previous value.
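
Minimal SQL sketches of SCD1 and SCD2, assuming a hypothetical customer_dim with effective_from / effective_to / is_current tracking columns for Type 2:

```sql
-- SCD1: overwrite in place; the old value is lost.
UPDATE customer_dim
SET    city = 'Chennai'
WHERE  customer_id = 42;

-- SCD2: expire the current row, then insert a new version,
-- so history is preserved alongside the current data.
UPDATE customer_dim
SET    effective_to = CURRENT_DATE,
       is_current   = 'N'
WHERE  customer_id = 42
AND    is_current  = 'Y';

INSERT INTO customer_dim
       (customer_id, city, effective_from, effective_to, is_current)
VALUES (42, 'Chennai', CURRENT_DATE, NULL, 'Y');
```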

49. What is Hybrid SCD?

Ans:

Hybrid SCDs are a combination of both SCD1 and SCD2.

It may happen that in a table, some columns are important and we need to track changes for them, i.e., capture the historical data for them, whereas in some columns even if the data changes we do not have to bother. For such tables, we implement Hybrid SCDs, wherein some columns are Type 1 and some are Type 2.

50. Why do we override the execute method in Struts?

Ans:

As part of Struts Framework, we can develop the Action Servlets and the ActionForm Servlets and other servlet classes.

In case of ActionForm class, we can develop the validate() method. This method will return the ActionErrors object. In this method, we can write the validation code.

  • If this method returns null or an ActionErrors object with size = 0, the web container will call execute() on the Action class.
  • If it returns size > 0, it will not call the execute() method; it will instead forward to the JSP, servlet, or HTML file given as the value of the input attribute in the struts-config.xml file.

51. What is VLDB?

Ans:

A very large database (VLDB) is a database that contains an extremely large number of tuples (database rows) or occupies an extremely large physical file-system storage space. A one-terabyte database would normally be considered a VLDB.

52. How do you load the time dimension?

Ans:

Time dimensions are usually loaded by a program that loops through all possible dates appearing in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
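
One way to write such a loader in PostgreSQL (hypothetical date_dim columns; generate_series plays the role of the looping program):

```sql
INSERT INTO date_dim (date_id, full_date, cal_year, cal_month)
SELECT TO_CHAR(g.d, 'YYYYMMDD')::int,   -- smart key, e.g. 20001231
       g.d::date,
       EXTRACT(YEAR  FROM g.d),
       EXTRACT(MONTH FROM g.d)
FROM   generate_series(DATE '2000-01-01',
                       DATE '2020-12-31',
                       INTERVAL '1 day') AS g(d);   -- one row per day
```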

53. What are conformed dimensions?

Ans:

  • Conformed dimensions are the dimensions which can be used across multiple data marts in combination with multiple fact tables accordingly.
  • A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. It can refer to multiple tables in multiple data marts within the same organization.

54. What is the main difference between Inmon and Kimball philosophies of Data Warehousing?

Ans:

Both differ in the concept of building the Data Warehouse.

  • Kimball views Data Warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in an organization, and the Data Warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.
  • Inmon advocates creating a Data Warehouse on a subject-by-subject-area basis. Hence, the development of the Data Warehouse can start with data from the online store, and other subject areas can be added to the Data Warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.
  • Hence, the process will be as follows:
    Kimball > First Data Marts > Combined Ways > Data Warehouse
    Inmon > First Data Warehouse > Data marts

55. What is the difference between a data warehouse and a data mart?

Ans:

A data warehouse is a set of data isolated from operational systems; it helps an organization deal with its decision-making process. A data mart is a subset of a data warehouse that is geared to a particular business line. Data marts provide condensed data collected in the organization for analysis of a particular field or entity.

A data warehouse typically has a size greater than 100 GB, while the size of a data mart is generally less than 100 GB. Due to the disparity in scope, the design and utility of data marts are comparatively simpler.

56. Explain the ETL cycle’s 3-layer architecture.

Ans:

The staging layer, the data integration layer, and the access layer are the three layers that are involved in an ETL cycle.

  • Staging layer: It is used to store the data extracted from various data structures of the source.
  • Data integration layer: Data from the staging layer is transformed and transferred to the database using the integration layer. The data is arranged into hierarchical groups (often referred to as dimensions), facts, and aggregates. In a DW system, the combination of facts and dimensions tables is called a schema.
  • Access layer: For analytical reporting, end-users use the access layer to retrieve the data.

57. What does data purging mean?

Ans:

Data purging is a process involving methods that erase data permanently from storage. Several techniques and strategies are used for data purging.

The process of data purging often contrasts with data deletion. Deleting data is a more temporary step, while data purging removes data permanently. This, in turn, frees up storage and/or memory space, which can be utilized for other purposes.

The purging process allows us to archive data even as it is permanently removed from the main source, giving us the option of retrieving it from the archive if needed. The deletion process also removes data, but it does not necessarily keep a backup, and it generally involves insignificant amounts of data.

58. Can you define the five main testing phases of a project?

Ans:

The ETL test is performed in five stages as follows:

  • The identification of data sources and requirements
  • The acquisition of data
  • Implementing business logic and dimensional modeling
  • Building and publishing data
  • Reports building

59. What do you mean by the slice action? How many slice-operated dimensions are used?

Ans:

A slice operation is the filtration process in a data warehouse. It selects a specific dimension from a given cube and provides a new sub-cube. In the slice operation, only a single dimension is used.
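
In relational terms (hypothetical tables, reusing the star schema sketched earlier), a slice is a filter on exactly one dimension, while a dice filters two or more:

```sql
-- Slice: fix a single dimension (calendar year), keep the rest of the cube.
SELECT p.product_name, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim    d ON d.date_id    = f.date_id
JOIN   product_dim p ON p.product_id = f.product_id
WHERE  d.cal_year = 2019
GROUP  BY p.product_name;

-- Dice: filter on two or more dimensions at once, e.g.
-- WHERE d.cal_year IN (2018, 2019) AND p.category IN ('Books', 'Toys')
```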

60. What is data analytics in terms of a data warehouse?

Ans:

Data analytics is the science of examining raw data to draw business-driven conclusions from it. The data warehouse enables data analytics.

61. What does OLAP mean, and what are its types?

Ans:

OLAP is a system that processes, manages, and collects multi-dimensional data for management. It stands for Online Analytical Processing.

There are four types of OLAP Servers given below:

  • Hybrid OLAP
  • Relational OLAP
  • Specialized SQL Servers
  • Multi-dimensional OLAP

62. What is the difference between OLAP and OLTP?

Ans:

OLAP is a software tool used for data analysis that helps in business decisions while OLTP is a transaction-oriented application used in a three-tier architecture. Below are some of the differences between OLAP and OLTP: 

OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Contains historical data collected from different databases. | Contains current operational data.
Used in data analytics, data mining, and decision making. | Application-oriented; used for various business-related tasks.
Stores huge amounts of data, typically in TB. | Stores small amounts of data, typically in MB or GB.
Works slowly because the data volume is large. | Works very fast; queries touch only around 5% of the stored data.
Needs data backups only from time to time. | Backup and recovery of data occur regularly.
Mainly used for read operations; writes are rare. | Used for both read and write operations.

63. What is the ER Diagram?

Ans:

ER Diagram stands for Entity-Relationship Diagram that shows the interrelationships between the entities in the database.

64. What is SCD?

Ans:

SCD stands for slowly changing dimensions, and it applies to such cases where records change over time.

65. Define the types of SCD.

Ans:

There are 3 types of SCD as given below:

SCD 1: The new record replaces the original record.

SCD 2: The new record is added to the existing dimension table.

SCD 3: The original data is changed to include the new data.

66. What is a Snowflake Schema?

Ans:

A Snowflake Schema has a primary dimension table to which one or more further dimension tables can be joined. The primary dimension table is the only table that joins directly with the fact table.
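
A sketch of the snowflake arrangement (hypothetical names): the fact table joins only the primary product dimension, which in turn joins out to a normalized category table.

```sql
CREATE TABLE category_dim (
    category_id   INTEGER PRIMARY KEY,
    category_name VARCHAR(50)
);

-- Primary dimension table: the only one the fact table joins directly.
CREATE TABLE product_dim_sf (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category_id  INTEGER REFERENCES category_dim (category_id)
);
```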

67. Define Star Schema.

Ans:

A Star Schema arranges the tables so that results can be readily retrieved in the data warehouse environment: a central fact table is surrounded by denormalized dimension tables.

68. Define BUS Schema.

Ans:

The BUS Schema consists of a suite of conformed dimensions and standardized definitions of facts.

69. Define the core dimension.

Ans:

A Core Dimension is a dimension table that is mainly used for a data mart or a single fact table.

70. Define the loops in the data warehouse.

Ans:

Loops may exist between the tables in the data warehouse. If there are loops between the tables, query generation takes more time and creates ambiguity, so it is always recommended to avoid loops between the tables.

71. Define the Cube in the Data warehouse.

Ans:

Cubes in a data warehouse are the representation of multi-dimensional data. The body of the cube holds the data values, and the edges of the cube hold the dimension members.

72. What is the execution plan, and what approach does the optimizer use during the execution plan?

Ans:

The execution plan is the plan used by the optimizer to choose the combination of steps for the execution of SQL queries. The optimizer selects the most efficient combination of steps for executing the SQL queries. The optimizer uses the two approaches in the execution plan, i.e., rule-based and cost-based.
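
Inspecting the chosen plan is dialect-specific; a PostgreSQL-style sketch follows (Oracle instead uses EXPLAIN PLAN FOR ... and then queries PLAN_TABLE). Table names reuse the hypothetical star schema from earlier answers.

```sql
EXPLAIN
SELECT p.product_name, SUM(f.sales_amount)
FROM   sales_fact f
JOIN   product_dim p ON p.product_id = f.product_id
GROUP  BY p.product_name;
-- The output lists the steps (scans, joins, aggregates) the optimizer chose.
```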

73. What are the different tools used in ETL (Extraction, Transform, and Load)?

Ans:

Below is the list of ETL tools:

  • Informatica
  • Oracle
  • Data Stage
  • Data Junction
  • Ab Initio
  • Warehouse builder

74. How are metadata and data dictionaries different?

Ans:

Metadata describes the data. It contains all the information about the data, such as its source, who collected it, and its format, and it is crucial for understanding the information stored in the data warehouse. A data dictionary, on the other hand, is the basic definition of the database: it lists the files present in the database, the number of records in each file, and all the information about the fields in the database.

75. What approaches are used to design the data warehouse?

Ans:

There are mainly two approaches used for the data warehouse design:

  1. Inmon approach: It is the top-down approach, in which the data warehouse is created first and the data marts are then built from it. In this approach, the data warehouse acts as the center of the Corporate Information Factory and provides a logical framework for delivering business intelligence.
  2. Kimball approach: It is the bottom-up approach, in which the data marts are created first and then integrated to form the complete data warehouse. The integration of different data marts is called the data warehouse bus architecture.

76. What is a real-time data warehouse, and what are its benefits?

Ans:

A real-time data warehouse is a data warehouse concept in which data is captured as soon as it occurs and made available in the data warehouse.

Benefits of a real-time data warehouse:

  • It helps in easy decision-making.
  • It removes the batch window.
  • It resolves the issue related to the ideal data load.
  • It offers an optimized way to run the transformations in the database.
  • It offers quick recovery of data.

77. What are some benefits of cloud-based data warehouses when compared to on-premise solutions?

Ans:

In the last few years, cloud computing has become prevalent, and most companies now prefer cloud-based data warehouses over traditional on-site warehouses. Below are the top reasons companies use cloud-based data warehouses:

  • It is easy and practical to scale data warehouses in the cloud.
  • It is economical to store a data warehouse on the cloud as it eliminates the hardware and licensing cost, which is required for on-site warehouse setup.
  • The cloud data warehouse is optimized for data analytics because it uses Massively Parallel Processing (MPP) and columnar storage, which offer better performance and help in executing complex queries.

78. What are the advantages of the Inmon approach?

Ans:

Below are some advantages of top-down or Inmon design:

  • Easy to maintain; though the initial cost is high, the subsequent project development cost is low
  • Offers consistent dimensional views of data across all Data Marts
  • A highly robust approach toward frequent business changes

79. What are the disadvantages of the Inmon approach?

Ans:

Below are some disadvantages of top-down or Inmon design:

  • Represents a large chunk of data thus cost of implementing design is high
  • Requires more time for initial set up
  • Represents substantial projects and hence it is complex

80. What are the advantages of the Kimball approach?

Ans:

Below are some advantages of bottom-up or Kimball design:

  • Contains consistent Data Marts which are easy to deliver
  • Data Marts showcase reporting capabilities
  • The initial setup is quick and easy, so new business units can be accommodated by merely creating new Data Marts and clubbing them with the existing data warehouse

81. What are the disadvantages of the Kimball approach?

Ans:

Below are some disadvantages of bottom-up or Kimball design:

  • The overall cost is high
  • Data Mart and data warehouse positions are differentiated
  • At times difficult to maintain

82. What are the different stages of data warehouse decision support evolution?

Ans:

Below are the 5 stages involved in data warehouse decision support evolution:

  1. Report
  2. Analyze
  3. Predict
  4. Operationalize
  5. Active warehousing

83. Name the components of data warehousing.

Ans:

Below are the 5 components of data warehousing:

  1. Data Warehouse Database
  2. Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
  3. Metadata
  4. Query Tools
  5. Data Warehouse Bus Architecture

84. What do you know about Amazon Redshift’s architecture?

Ans:

Amazon Redshift, based on PostgreSQL, is the most popular cloud data warehouse service offered by Amazon Web Services. It is widely used for handling petabyte-scale data, and its features let analysts query that data in seconds. Redshift is relatively inexpensive and is easy to set up and maintain.

Redshift can be integrated with other BI and analytical tools and works with Extract, Transform, and Load (ETL) tools.

Below are some features of Redshift:

  1. Columnar storage and MPP processing
  2. Compression (column-level operation)
  3. Management and Security
  4. Data Types
  5. Updates and Upserts

85. What should you avoid when planning to construct a real-time data warehouse?

Ans:

One must avoid mistakes like:

  • Not focusing on data integrity when constructing real-time data
  • Overlooking traditional OLTP systems
  • Not initiating business process changes in real-time data warehousing

86. State the difference between Star and Snowflake schema.

Ans:

Below is the list of differences between star schema and snowflake schema:

Star Schema | Snowflake Schema
Dimension hierarchy is stored within a single dimension table | Hierarchy is split across multiple tables
Dimension tables surround the fact table | Further dimension tables surround the dimension tables
A single join links the fact table to each dimension table | Multiple joins are required to establish the relationships
DB design is simple | DB design is complex
Data redundancy is possible | Data redundancy is hardly possible
Fast cube processing | Cube processing is somewhat slower
Denormalized data structure | Normalized data structure

87. Define a Galaxy schema.

Ans:

Galaxy schema, also known as Fact Constellation Schema, contains two or more fact tables that share dimension tables. In other words, it can be viewed as a combination of stars.

88. What are the types of fact tables?

Ans:

In the dimensional model, the fact table is the primary table; it contains the facts and foreign keys to the dimension tables and is used for measurement in the business process. There are three types of fact tables:

Fact Table Type | Description
Additive | Measures can be summed across all dimensions
Semi-Additive | Measures can be summed across some dimensions but not all (e.g., balances over time)
Non-Additive | Measures cannot be summed across any dimension (e.g., ratios, percentages)
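
A worked illustration of semi-additivity, assuming a hypothetical account_balance_fact holding one balance per account per day:

```sql
-- Valid: balances are additive across the account dimension for one day.
SELECT date_id, SUM(balance) AS total_balance
FROM   account_balance_fact
GROUP  BY date_id;

-- Not valid: summing the same balances across days double-counts;
-- across the time dimension, use AVG(balance) or the period-end balance.
```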

89. What are the types of dimension tables?

Ans:

Joined via a foreign key, a dimension table holds the dimensions of facts. Dimension tables are denormalized tables that offer descriptive characteristics of the facts. Below are the types of dimension tables:

  • Conformed dimensions
  • Outrigger dimensions
  • Shrunken rollup dimensions
  • Dimension-to-dimension table joins
  • Junk dimensions

90. Give the steps to start and shut down the database.

Ans:

Below are the steps to start a database:

  • Start an instance
  • Mount the database
  • Open the database

Below are the steps to shut down a database:

  • Close the database
  • Dismount the database
  • Shutdown the instance
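
On Oracle, for example, these steps map onto SQL*Plus commands run as SYSDBA (a sketch; other databases use different commands):

```sql
STARTUP NOMOUNT           -- start the instance only
ALTER DATABASE MOUNT;     -- mount the database
ALTER DATABASE OPEN;      -- open it for normal use

SHUTDOWN IMMEDIATE        -- closes, dismounts, and shuts down the instance
```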

91. Define the surrogate key.

Ans:

A surrogate key functions as a substitute for the natural primary key.
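
A minimal sketch (standard SQL identity column, supported by PostgreSQL 10+ and Oracle 12c+; names are hypothetical): the surrogate key customer_sk stands in for the natural key, which may change or repeat over time.

```sql
CREATE TABLE customer_dim (
    customer_sk     INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_number VARCHAR(20),    -- natural/business key from the source
    customer_name   VARCHAR(100)
);
```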

92. When do you use bteqexport?

Ans:

Whenever the total number of rows is less than half a million, bteqexport is used.

93. When do you use fastexport?

Ans:

“fastexport” is used when the total number of rows is more than half a million.

94. Name the primary functions of dimensions.

Ans:

The primary functions of the dimensions are:

  • Filtering
  • Grouping
  • Labeling

95. As a data warehouse manager, what were your key job responsibilities in the previous company?

Ans:

Some of my prime responsibilities are:

  • Work on creating data warehouse process models
  • Verify the integrity of warehouse data and ensure consistent changes
  • Implement data extraction procedures
  • Maintain data standards
  • Handle data related troubleshooting
  • Use different computer languages and methods to perform data analysis
  • Implement metadata processes
  • Review data designs, codes, and test plans
  • Use database management system software like Apache, MongoDB, Oracle to smoothly perform data warehousing functions

96. Which data warehousing skills do you master?

Ans:

In the data warehousing interview, you can talk about your critical technical skills. You can say: Some of my strengths are:

  • Enterprise system management software
  • Apache Avro
  • Human resource management software HRMS
  • Data mining software like Rapid-I RapidMiner, SAP NetWeaver Business Warehouse
  • CRM software
  • Data analysis
  • MS office

97. Define the meaning of aggregate tables.

Ans:

An aggregate table contains data from an existing warehouse that has been grouped to a certain level of dimensions. It is easier to retrieve data from aggregate tables than from the original tables because they contain far fewer records. Aggregate tables reduce the load on the database server and increase overall query performance.
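
A sketch of building one (hypothetical names, reusing the star schema tables from earlier answers): a monthly aggregate derived from the daily-grain fact table.

```sql
CREATE TABLE sales_monthly_agg AS
SELECT d.cal_year,
       d.cal_month,
       f.product_id,
       SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim d ON d.date_id = f.date_id
GROUP  BY d.cal_year, d.cal_month, f.product_id;
-- Month-level queries read this small table instead of the raw fact rows.
```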
