ETL Testing Interview Questions
Last updated on 27th Sep 2020, Blog, Interview Question
Business information and data are of key importance to any company. Many companies invest a lot of time and money in analyzing and sorting this vital information.
Data analysis and integration have become a large market, and to make this process organized and simple, many software vendors have introduced ETL testing tools.
There are many open-source ETL tools available, and vendors let users download the free versions directly from their official websites. The free version includes all the basic functions, but to upgrade to the next level the company needs to buy a paid subscription from the vendor.
Each company has a different business structure and model, so it needs to analyze its requirements clearly before choosing an ETL tool. Open-source ETL tools give a business the opportunity to try out the software without a large investment.
1. What is ETL?
- ETL stands for Extract-Transform-Load. It is an important component of a data warehouse, used to manage the data of a business.
- Extract reads the data from the source database.
- Transform converts the data into a form that can be used for analysis and reporting.
- Load writes the data to the target database.
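The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming an in-memory list as the source and an in-memory SQLite database as the target; the names `SOURCE_ROWS`, `extract`, `transform`, `load`, and the `sales` table are hypothetical and do not come from any ETL tool.

```python
import sqlite3

# Hypothetical source rows; in practice these would come from a source database or file.
SOURCE_ROWS = [
    {"id": 1, "name": " alice ", "amount": "10.50"},
    {"id": 2, "name": "BOB", "amount": "7.25"},
]

def extract():
    """Extract: read the rows from the source."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: tidy up names and convert amounts from text to numbers."""
    return [(r["id"], r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Keeping each step in its own function makes it easier to test extract, transform, and load separately.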
2. What are the operations included in ETL testing?
Following are the operations included in ETL testing:
- Verifying that the data is converted to the required business format.
- Verifying that the data is loaded into the data warehouse without any truncation or data loss.
- Ensuring that invalid data is reported and replaced with default values.
- Ensuring that the data loads within the expected time frame, to improve performance and scalability.
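A few of these checks can be expressed as small functions. The sketch below is only an illustration under assumptions: the rows are hard-coded tuples of (key, name, country), `DEFAULT_COUNTRY` is an assumed default value, and the function names are hypothetical.

```python
DEFAULT_COUNTRY = "UNKNOWN"  # assumed default value used to replace invalid data

# Hypothetical rows: (key, name, country)
source = [("C1", "Alexander Hamilton", "US"), ("C2", "Bo", None)]
target = [("C1", "Alexander Hamilton", "US"), ("C2", "Bo", "UNKNOWN")]

def check_row_count(src, tgt):
    """No rows lost or duplicated during the load."""
    return len(src) == len(tgt)

def check_no_truncation(src, tgt):
    """Names in the target must match the source character for character."""
    tgt_names = {key: name for key, name, _country in tgt}
    return all(tgt_names.get(key) == name for key, name, _country in src)

def check_defaults(src, tgt):
    """A missing country in the source must appear as the default in the target."""
    tgt_country = {key: country for key, _name, country in tgt}
    for key, _name, country in src:
        expected = country if country is not None else DEFAULT_COUNTRY
        if tgt_country.get(key) != expected:
            return False
    return True

results = {
    "row_count": check_row_count(source, target),
    "no_truncation": check_no_truncation(source, target),
    "defaults": check_defaults(source, target),
}
```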
3. What are the different types of data warehouse applications?
Following are the different types of data warehouse applications:
- Information processing.
- Analytical processing.
- Data mining.
4. What is the difference between data mining and data warehousing?
Following is the table explaining the difference between data mining and data warehousing:
|Parameter||Data mining||Data warehousing|
|Definition||Data mining refers to extracting information from hidden patterns in data.||Data warehousing refers to collecting data from various places and storing it in one place.|
|Key features||Likely outcomes can be predicted. Patterns can be discovered automatically.||Data can be obtained within a fixed time frame. Heterogeneous data can be used to build the final database.|
|Advantages||The marketing of the product is direct. Detailed analysis of trends in the marketplace.||Productivity and performance are better. Cost-effective.|
5. Name the types of tools used in ETL.
Following are the different types of tools that are used in ETL:
- Oracle Warehouse Builder.
- Cognos Decision Stream.
- Business warehouse from SAS.
- Enterprise ETL server from SAS.
- Business Objects XI.
6. Define fact.
A fact is the central component of a multi-dimensional model. It contains the measures that are to be analyzed, and facts are related to dimensions.
7. What are the different types of facts?
Following are the different types of facts:
- Additive facts
- Semi-additive facts
- Non-additive facts
8. What do Cubes mean?
Cubes are data processing units that consist of fact tables and dimensions obtained from the data warehouse. They are used for multi-dimensional analysis.
9. What do OLAP Cubes mean?
OLAP stands for Online Analytical Processing. An OLAP cube stores multidimensional data on a large scale for reporting purposes. It consists of facts, called measures, that are categorized by dimensions.
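The idea of measures categorized by dimensions can be sketched with a plain dictionary. This is only a toy illustration, not how an OLAP engine stores data; the `facts` rows, the two dimensions (year and region), and the function `roll_up_by_year` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, sales_amount)
facts = [
    ("2020", "North", 100.0),
    ("2020", "South", 50.0),
    ("2021", "North", 70.0),
    ("2020", "North", 30.0),
]

# Each cell of the "cube" holds the measure for one combination of dimension values.
cube = defaultdict(float)
for year, region, amount in facts:
    cube[(year, region)] += amount

def roll_up_by_year(cells):
    """Aggregate away the region dimension, keeping only year."""
    totals = defaultdict(float)
    for (year, _region), amount in cells.items():
        totals[year] += amount
    return dict(totals)

by_year = roll_up_by_year(cube)
```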
10. What does the tracing level mean?
The tracing level is defined as the amount of data that is stored in the log files.
11. What are the different types of tracing levels?
Following are the two different types of tracing levels:
- Normal
- Verbose
12. Explain the types of tracing levels.
The Normal level is the first type of tracing level. It logs detailed status and error information but does not log individual rows. The Verbose level is the second type. It logs tracing information for every row.
13. What do you mean by the term “Grain of Fact”?
Grain of fact, also known as Fact Granularity, is the level of detail at which the fact information is stored.
14. Define measures.
Measures are the numeric data stored in the columns of a fact table.
15. What is a factless fact schema?
A factless fact schema is a fact table without any measures. It is used to count the number of occurrences of events.
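As a small illustration, a factless fact table can be pictured as rows that contain only dimension keys, with events counted rather than summed. The attendance data and column layout below are hypothetical.

```python
from collections import Counter

# Hypothetical factless fact rows: (date_key, student_key, course_key), no measures.
attendance = [
    ("2020-09-01", "s1", "math"),
    ("2020-09-01", "s2", "math"),
    ("2020-09-01", "s1", "physics"),
    ("2020-09-02", "s1", "math"),
]

# With no measure column to sum, analysis means counting the event rows.
attendance_per_course = Counter(course for _date, _student, course in attendance)
```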
16. What is the transformation?
A transformation is a repository object that generates, modifies, or passes data.
17. How many types of transformation are there?
There are two types of transformation:
- Active transformation
- Passive transformation
18. What is an active transformation?
An active transformation can modify the rows of data and can also change the number of rows that pass through it. An example of an active transformation is the Filter transformation.
19. What is a passive transformation?
A passive transformation does not change the number of rows: the number of output rows is the same as the number of input rows. An example of a passive transformation is the Lookup transformation.
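The difference shows up in the row counts. The sketch below is a plain Python analogy rather than Informatica code; the function names and the sample rows are hypothetical, and an expression-style function stands in for the passive case.

```python
rows = [{"id": 1, "amount": 120}, {"id": 2, "amount": 40}, {"id": 3, "amount": 300}]

def filter_transformation(rows, min_amount):
    """Active: rows that fail the condition are dropped, so the row count can change."""
    return [r for r in rows if r["amount"] >= min_amount]

def expression_transformation(rows):
    """Passive: every input row produces exactly one output row."""
    return [{**r, "amount_in_cents": r["amount"] * 100} for r in rows]

filtered = filter_transformation(rows, 100)
expressed = expression_transformation(rows)
```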
20. What is the use of Lookup transformation?
Following are the uses of Lookup transformation:
- Getting a related value from a table by using a column value.
- Updating slowly changing dimension tables.
- Verifying whether records already exist in the table.
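Two of these uses, getting a related value and checking for existing records, can be pictured with a dictionary and a set. This is only an analogy in Python; `country_lookup`, `existing_keys`, and both function names are hypothetical.

```python
# Hypothetical lookup table: country code -> country name
country_lookup = {"US": "United States", "IN": "India"}
# Hypothetical keys already present in the target table
existing_keys = {101, 102}

def lookup_country(code, default="Unknown"):
    """Return the related value for a column value, or a default when there is no match."""
    return country_lookup.get(code, default)

def record_exists(customer_id):
    """Check whether the record is already in the target table."""
    return customer_id in existing_keys
```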
21. What is partitioning?
Partitioning is defined as the division of the data storage to improve performance. There are two types of partitioning:
- Round-robin partitioning
- Hash partitioning
22. What is round-robin partitioning?
Round-robin partitioning distributes the data uniformly across all the partitions, so that each partition processes approximately the same number of rows.
23. What is hash partitioning?
Hash partitioning groups the data based on keys and ensures that all rows of a group are processed in the same partition. Hash partitioning is available in the Informatica server.
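Both schemes can be sketched as follows. This is a simplified illustration, not the Informatica implementation; the function names and the `orders` data are hypothetical, and `zlib.crc32` is used only as a convenient stable hash.

```python
import zlib

def round_robin_partition(rows, n):
    """Send rows to partitions in turn so the partition sizes stay nearly equal."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 gives the same result on every run, unlike the built-in hash() for strings
        index = zlib.crc32(str(row[key]).encode("utf-8")) % n
        parts[index].append(row)
    return parts

orders = [{"customer": c} for c in ["a", "b", "a", "c", "b", "a"]]
rr_parts = round_robin_partition(orders, 3)
hash_parts = hash_partition(orders, 3, "customer")
```

Round-robin balances the load, while hash partitioning may produce uneven partitions but keeps each group together, which matters for key-based operations such as aggregation.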
24. What is the advantage of using the DataReader Destination Adapter?
The advantage of using a DataReader Destination Adapter is that it populates a recordset of records and columns in memory, so that the data from the DataFlow task is available for other applications to consume.
25. What is Informatica?
Informatica is a software development company that offers products related to data integration. Informatica's products are used for ETL, data quality, master data management, data masking, etc.
26. Name the list of transformations that are available on Informatica.
Following are some of the transformations that are available in Informatica:
- Rank transformation.
- Sequence Generator transformation.
- Transaction Control transformation.
- Source Qualifier transformation.
- Normalizer transformation.
27. What is filter transformation?
Filter transformation is an active transformation which is used for filtering the records on the basis of filter condition.
28. What is SSIS?
SSIS stands for SQL Server Integration Services. It is a component of Microsoft SQL Server that is used for a wide range of data integration tasks. SSIS is used in ETL testing because it is fast and flexible, and because it makes moving data from one database to another easy.
29. How to update the table with the help of SSIS?
Following are the ways to update the table with the help of SSIS:
- By using the SQL command.
- By using a staging table.
- With the help of cache.
- By using the script task.
30. Name the two types of ETL testing that are available.
Following are the two types of ETL testing that are available:
- Application testing
- Data centric testing
31. Define dimensions.
Dimensions are the categories, such as time, product, or region, under which the summarized data is stored and analyzed.
32. Why do we need ETL testing?
Following are the reasons why we need ETL testing:
- To check the efficiency and speed of the process.
- To keep an eye on the transfer of data from one system to the other.
- To get familiar with the ETL process before running the entire business on it.
33. What do you mean by the term “staging area”?
During data integration, the data is stored temporarily in one place so that it can be cleaned and checked for duplicates. This storage area is known as a staging area.
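A staging step of this kind can be sketched as cleaning each row and then dropping duplicates before anything is loaded. The data and the function names `clean` and `deduplicate` are hypothetical.

```python
# Hypothetical raw rows as they land in the staging area
raw_staging = [
    {"id": 1, "email": " A@Example.com "},
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]

def clean(row):
    """Trim whitespace and normalize case so equal values compare equal."""
    return {"id": row["id"], "email": row["email"].strip().lower()}

def deduplicate(rows):
    """Keep the first occurrence of each (id, email) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["id"], row["email"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

staged = deduplicate([clean(r) for r in raw_staging])
```

Cleaning before deduplicating matters here: the first two rows only become duplicates once the email is trimmed and lower-cased.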
34. Define ETL mapping sheets.
An ETL mapping sheet is a document that holds all the information related to the source file, including all of its rows and columns. This sheet is very helpful when testing with ETL tools.
35. Name a few ETL bugs.
Following is the list of ETL bugs:
- Bugs related to ECP (equivalence class partitioning).
- Bugs related to load conditions.
- Source related bugs.
- Bugs related to calculations.
- Bug related to the user interface.
36. Name a few test cases.
Following is the list of test cases:
- Issues related to correctness.
- Data checker
- Validation on mapping doc
37. What is the use of mapping doc validation?
With the help of mapping doc validation, one can check if the provided information is available in the mapping doc.
38. What is the purpose of data check as a test case?
The data check test case covers the date check, the number check, and the null check.
39. What is the use of the correctness issue test case?
As the name suggests, the correctness issues test case helps in finding misspelled data, null data, and inaccurate data.
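Null, number, and date checks of this kind can be written as small helper functions. The sketch below is only an illustration; the function names and the assumed date format `%Y-%m-%d` are hypothetical choices.

```python
from datetime import datetime

def is_null(value):
    """Treat None and blank strings as null."""
    return value is None or str(value).strip() == ""

def is_number(value):
    """True when the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def is_date(value, fmt="%Y-%m-%d"):
    """True when the value matches the assumed date format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False
```

A test case would run these helpers over each column of the loaded data and report the rows that fail.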
40. What is the difference between power mart and power center?
Following is the table explaining the difference between power mart and power center:
|Power mart||Power center|
|It is used only for local storage||It is used for local and global storage|
|There is no specification for the conversion of local data into global data||It can be used for the conversion of local data into global data|
|ERP sources are not supported||ERP sources such as SAP are supported|
|The main purpose of power mart is to process low volumes of data||The main purpose of power center is to process huge amounts of data|
41. What is the difference between unconnected and connected lookup?
Following is the table explaining the difference between unconnected and connected lookup:
|Unconnected lookup||Connected lookup|
|The cache used is static||The cache used can be either static or dynamic|
|Only a single output port can be used||Multiple output ports can be used|
|Only a single transformation can be used||Multiple transformations can be used|
42. What is the difference between OLAP tools and ETL tools?
Following is the table explaining the difference between OLAP tools and ETL tools:
|OLAP tools||ETL tools|
|OLAP tools are used for reporting data from the OLAP database||ETL tools are used for the extraction of data from the system and to load them at the specific database|
|Cognos is an example of the OLAP tool||Informatica is an example of the ETL tool|
43. What do you understand by the term data purging?
Data purging is defined as the process of deleting junk data from the data warehouse.
44. What is the bus schema?
Bus schema is used for identifying similar dimensions in various business processes. Bus schema provides standard information with precise dimensions.
45. What are schema objects?
Schema objects are the logical structures that refer directly to the data in the database. These objects include tables, indexes, database links, functions, packages, etc.
46. What is the purpose of a staging area?
Following are the purposes of staging area:
- Restructuring the source data for proper extraction and transformation.
- Cleaning the data and transforming values.
- Assigning surrogate keys.
47. Explain the following terms:
Mapplet: This is used for arranging or creating a set of transformations.
Worklet: This is used for representing a specific set of tasks.
Workflow: This is a set of instructions that tells the server how to execute the tasks.
Session: This is a set of parameters that tells the server how to move data from the source to the target.
48. Explain the steps for the extraction of SAP data using Informatica.
Following are the steps for the extraction of SAP data using Informatica:
- SAP data is extracted in Informatica by using the option called PowerConnect.
- Install and configure the PowerConnect tool.
49. What is the data source view?
A data source view is used for defining the relational schema that is used in the Analysis Services databases.
50. What is the use of dynamic cache and static cache in connected and unconnected transformation?
Static cache is used for flat files, while dynamic cache is used when the master table and slowly changing dimensions have to be updated.
51. Difference between tAggregateSortedRow and tAggregateRow
tAggregateSortedRow expects input data that is already in sorted order
tAggregateRow will accept input data in any order
52. Difference between hash and buffer components
Hash: Hash is used within a single job
Buffer: Buffer can be used across different jobs
53. Difference between partition and departition
Partition: Partition splits the data into a number of flows
Departition: Departition merges a number of flows back into a single flow
54. How can we improve the performance of a job?
We can improve the performance of a job by doing the following:
- Reduce the number of maps
- Remove unnecessary columns
- Optimize the query, if one is included
55. Design a job where the job should start if there are records in the DB.
Use a parameter that holds the record count: if it is greater than 0, the job should start.
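Outside of any particular ETL tool, the same idea can be sketched in Python: count the records first and start the job only when the count is greater than 0. The `orders` table and both function names are hypothetical, and an in-memory SQLite database stands in for the real DB.

```python
import sqlite3

def pending_record_count(conn):
    """Return how many records are waiting in the (hypothetical) orders table."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

def run_job_if_records(conn, job):
    """Start the job only when the record count is greater than 0."""
    if pending_record_count(conn) > 0:
        job()
        return True
    return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
runs = []

started_empty = run_job_if_records(conn, lambda: runs.append("run"))
conn.execute("INSERT INTO orders VALUES (1)")
started_with_data = run_job_if_records(conn, lambda: runs.append("run"))
```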
56. Design a job where it can read files when the file is in the folder.
Use a file component and add a condition that checks whether the file exists before reading it.
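In plain Python, the file-exists condition can be sketched with `pathlib`. The function name `read_if_present` and the file name `input.csv` are hypothetical, and a temporary folder is used so the example is self-contained.

```python
import tempfile
from pathlib import Path

def read_if_present(path):
    """Read the file only when it exists; otherwise signal that there is nothing to do."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.read_text(encoding="utf-8").splitlines()

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "input.csv"
    missing = read_if_present(target)          # the file has not arrived yet
    target.write_text("id,name\n1,Alice\n", encoding="utf-8")
    present = read_if_present(target)          # the file is now in the folder
```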
57. Design a job where updated records or newly inserted records must be loaded into the target.
Design the job so that it picks up from the next record, using the count of records already in the DB.
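A count-based pickup can be sketched as follows, assuming the source is append-only and ordered by `id`. The `source` and `target` tables and the function `load_new_records` are hypothetical. Note that a count offset only finds newly inserted rows; detecting updated rows would need something extra, such as a last-modified column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

def load_new_records(conn):
    """Skip as many source rows as the target already holds, then load the rest."""
    already_loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    new_rows = conn.execute(
        "SELECT id, name FROM source ORDER BY id LIMIT -1 OFFSET ?",
        (already_loaded,),
    ).fetchall()
    conn.executemany("INSERT INTO target VALUES (?, ?)", new_rows)
    return new_rows

first_run = load_new_records(conn)
second_run = load_new_records(conn)
```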
58. Explain parallelism.
Jobs can run in parallel, which can increase performance.
59. How to improve map performance?
The advanced settings include an option to delete cached data, and we have to use it.
60. What is the use of a unique match?
It picks a unique record from a set of records.
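The effect can be sketched as keeping one record per key. Which of several matching records is kept depends on the tool; this sketch assumes the last one wins, and the function name `unique_match` is hypothetical.

```python
def unique_match(rows, key):
    """Keep a single record per key value; a later record replaces an earlier one (assumption)."""
    matched = {}
    for row in rows:
        matched[row[key]] = row
    return list(matched.values())

records = [
    {"id": 1, "value": "old"},
    {"id": 2, "value": "x"},
    {"id": 1, "value": "new"},
]
unique_records = unique_match(records, "id")
```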
61. Explain the operations included in ETL testing.
ETL testing includes the following:
- Verify whether the data is transforming correctly according to business requirements
- Verify that the projected data is loaded into the data warehouse without any truncation and data loss
- Make sure that the ETL application reports invalid data and replaces it with default values.
- Make sure that data loads at the expected time frame to improve scalability and performance.
62. What are the types of data warehouse applications?
The types of data warehouse applications are:
- Info Processing
- Analytical Processing
- Data Mining
63. What are the various tools used in ETL?
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI
- SAS business warehouse
- SAS Enterprise ETL
64. What is fact?
It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to dimensions.
65. Explain what Cubes and OLAP Cubes are.
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.
OLAP stands for Online Analytical Processing, and an OLAP cube stores large data in multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
66. Explain what the tracing level is and what its types are.
Tracing level is the amount of data stored in the log files. Tracing levels can be classified into two types: Normal and Verbose. The Normal level logs detailed status and error information but does not log individual rows, while the Verbose level logs information for each and every row.
67. Explain what a Grain of Fact is.
Grain of fact can be defined as the level at which the fact information is stored. It is also known as Fact Granularity.
68. Explain what a factless fact schema is and what Measures are.
A fact table without measures is known as a factless fact table. It can be used to count the number of occurring events. For example, it can record an event that is used to derive the employee count in a company.
The numeric data stored in the columns of a fact table is known as Measures.
69. Explain what a transformation is.
A transformation is a repository object which generates, modifies, or passes data. Transformations are of two types: Active and Passive.
70. Explain the use of Lookup Transformation.
The Lookup Transformation is useful for:
- Getting a related value from a table using a column value
- Updating a slowly changing dimension table
- Verifying whether records already exist in the table
71. What are the types of facts?
The types of facts are:
- Additive Facts
- Semi-additive Facts
- Non-additive Facts
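The difference between additive and semi-additive facts can be shown with a small bank example. The data is hypothetical: deposits can be summed across every dimension, while balances can be summed across accounts but not across dates. Non-additive facts, such as ratios, cannot be summed across any dimension and are not shown here.

```python
# Hypothetical fact rows: (date, account, deposit_amount, end_of_day_balance)
facts = [
    ("2020-09-01", "A", 100, 1100),
    ("2020-09-02", "A", 50, 1150),
    ("2020-09-01", "B", 200, 500),
    ("2020-09-02", "B", 0, 500),
]

# Additive: deposits can be summed across accounts and across dates.
total_deposits = sum(deposit for _date, _account, deposit, _balance in facts)

# Semi-additive: balances can be summed across accounts for one date...
balance_on_sep_2 = sum(
    balance for date, _account, _deposit, balance in facts if date == "2020-09-02"
)

# ...but summing balances across dates gives a number with no business meaning.
meaningless_total = sum(balance for _date, _account, _deposit, balance in facts)
```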
72. What is the advantage of using a DataReader Destination Adapter?
The advantage of using the DataReader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory and exposes the data from the DataFlow task by implementing the DataReader interface, so that other applications can consume the data.
73. Using SSIS (SQL Server Integration Services), what are the possible ways to update tables?
To update a table using SSIS, the possible ways are:
- Use a SQL command
- Use a staging table
- Use Cache
- Use the Script Task
- Use the full database name for updating if MSSQL is used
74. In case you have a non-OLEDB (Object Linking and Embedding Database) source for the lookup, what would you do?
If you have a non-OLEDB source for the lookup, you have to use Cache to load the data and use it as the source.
75. In what case do you use dynamic cache and static cache in connected and unconnected transformations?
Dynamic cache is used when you have to update the master table and slowly changing dimensions (SCD) type 1. For flat files, static cache is used.
76. Explain what a data source view is.
A data source view allows you to define the relational schema which will be used in the Analysis Services databases. Dimensions and cubes are created from data source views rather than directly from data source objects.
77. Explain the difference between OLAP tools and ETL tools.
An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.
Example: DataStage, Informatica, etc.
An OLAP tool is meant for reporting purposes. In OLAP, the data is available in a multi-dimensional model.
Example: Business Objects, Cognos, etc.
78. How can you extract SAP data using Informatica?
- With the PowerConnect option, you can extract SAP data using Informatica
- Install and configure the PowerConnect tool
- Import the source into the Source Analyzer. PowerConnect acts as a gateway between Informatica and SAP. The next step is to generate the ABAP code for the mapping, and only then can Informatica pull data from SAP
- PowerConnect is used to connect to and import sources from external systems
79. Explain what a staging area is.
A staging area is where you hold the data temporarily on a data warehouse server.
Data staging includes the following steps:
- Source data extraction and data transformation (restructuring)
- Data transformation (data cleansing, value transformation)
- Surrogate key assignments
A BUS schema is used to identify the common dimensions across the various business processes. It comes with conformed dimensions along with a standardized definition of information.
80. Explain what is data purging?
Data purging is a process of deleting data from a data warehouse. It deletes junk data like rows with null values or extra spaces.
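A purge of this kind can be sketched with two SQL statements run from Python. The `customers` table and its data are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, None), (3, "   "), (4, " Bob ")],
)

# Delete junk rows: names that are NULL or contain only spaces.
purged = conn.execute(
    "DELETE FROM customers WHERE name IS NULL OR TRIM(name) = ''"
).rowcount

# Remove extra spaces from the rows that are kept.
conn.execute("UPDATE customers SET name = TRIM(name)")
remaining = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```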
81. Explain what Schema Objects are.
Schema objects are the logical structures that directly refer to the database’s data. Schema objects include tables, views, sequences, synonyms, indexes, clusters, functions, packages, and database links.
82. Explain these terms: Session, Worklet, Mapplet and Workflow.
Mapplet: It arranges or creates sets of transformations
Worklet: It represents a specific set of tasks
Workflow: It’s a set of instructions that tell the server how to execute tasks
Session: It is a set of parameters that tells the server how to move data from sources to target
83. What is an ETL process?
ETL is the process of Extraction, Transformation, and Loading.
84. How many steps are there in an ETL process?
An ETL process has three steps: data is first extracted from a source, such as a database server; it is then transformed according to business rules; and finally it is loaded into the target.
85. What are the steps involved in an ETL process?
The steps involved are defining the source and the target, creating the mapping, creating the session, and creating the workflow.
86. Can there be sub-steps for each of the ETL steps?
Yes. Each of the steps involved in ETL has several sub-steps, and the transform step has the most.
87. What are initial load and full load?
In ETL, the initial load is the process of populating all the data warehouse tables for the very first time. In a full load, all the records are loaded in one go, depending on the volume: the existing contents of the table are erased and the table is reloaded with fresh data.
88. What is meant by incremental load?
Incremental load refers to applying only the changes, as and when required, at specific intervals and on predefined schedules.
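The contrast between a full load and an incremental load can be sketched as follows. This is only an illustration under assumptions: the tables `src` and `dw`, the `updated_at` column, and the date used as the last-run marker are all hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE dw (id INTEGER PRIMARY KEY, val TEXT, updated_at TEXT)")

def full_load(conn):
    """Erase the target table and reload everything from the source."""
    conn.execute("DELETE FROM dw")
    conn.execute("INSERT INTO dw SELECT id, val, updated_at FROM src")

def incremental_load(conn, last_run):
    """Apply only the rows changed since the last run (inserted or updated)."""
    changed = conn.execute(
        "SELECT id, val, updated_at FROM src WHERE updated_at > ?", (last_run,)
    ).fetchall()
    conn.executemany("INSERT OR REPLACE INTO dw VALUES (?, ?, ?)", changed)
    return len(changed)

conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "a", "2020-09-01"), (2, "b", "2020-09-01")],
)
full_load(conn)

# Later, one row is updated and one is inserted in the source.
conn.execute("UPDATE src SET val = 'b2', updated_at = '2020-09-02' WHERE id = 2")
conn.execute("INSERT INTO src VALUES (3, 'c', '2020-09-02')")
changed_count = incremental_load(conn, "2020-09-01")
dw_rows = conn.execute("SELECT id, val FROM dw ORDER BY id").fetchall()
```

The incremental run touches only two rows, whereas another full load would have rewritten the whole table.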
89. What is a 3-tier system in ETL?
The data warehouse is considered to be the 3-tier system in ETL.
90. What are the names of the layers in ETL?
The first layer in ETL is the source layer, where the data lands. The second layer is the integration layer, where the data is stored after transformation. The third layer is the dimension layer, which serves as the actual presentation layer.
91. What is meant by snapshots?
Snapshots are read-only copies of the data that is stored in the master table.
92. What are the characteristics of snapshots?
Snapshots are located on remote nodes and are refreshed periodically so that the changes in the master table are captured. They are also replicas of tables.
93. What are views?
Views are built using the attributes of one or more tables. Views with a single table can be updated, but those with multiple tables cannot be updated.
94. What is meant by a materialized view log?
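A materialized view is a pre-computed table with aggregated or joined data from the fact tables, as well as the dimension tables. A materialized view log records the changes made to the master table so that the materialized view can be refreshed with only those changes.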
95. What is the difference between PowerCenter and PowerMart?
PowerCenter processes large volumes of data, whereas PowerMart processes small volumes of data.
96. With which apps can PowerCenter be connected?
PowerCenter can be connected with ERP sources such as SAP, Oracle Apps, PeopleSoft, etc.
97. Which partition is used to improve the performances of ETL transactions?
To improve the performances of ETL transactions, the session partition is used.
98. Does PowerMart provide connections to ERP sources?
No! PowerMart does not provide connections to any of the ERP sources. It also does not allow session partitioning.
99. What is meant by an operational data store?
The operational data store (ODS) is the repository that exists between the staging area and the data warehouse. The data stored in ODS has low granularity.