Data Modeling Interview Questions and Answers
Last updated on 07th Oct 2020, Blog, Interview Question
Here I am going to share some Data Modeling interview questions and detailed answers based on my own experience during interview interactions in a few renowned IT MNCs.
Below question-answers can be of your great help if you get a chance to face or take an interview on Data Modelling.
1. What Are the Three Types of Data Models?
The three types of data models:
Physical data model – This is where the framework or schema describes how data is physically stored in the database.
Conceptual data model – This model focuses on the high-level, user’s view of the data in question
Logical data models – They straddle between physical and theoretical data models, allowing the logical representation of data to exist apart from the physical storage.
2. What is a Table?
A table consists of data stored in rows and columns. Columns, also known as fields, show data in vertical alignment. Rows, also called a record or tuple, represent data’s horizontal alignment.
3. What is Normalization?
Database normalization is the process of designing the database in such a way that it reduces data redundancy without sacrificing integrity.
4. What Does a Data Modeler Use Normalization For?
The purposes of normalization are:
- Remove useless or redundant data
- Reduce data complexity
- Ensure relationships between the tables in addition to the data residing in the tables
- Ensure data dependencies and that the data is stored logically.
5. So, What is Denormalization, and What is its Purpose?
Denormalization is a technique where redundant data is added to an already normalized database. The procedure enhances read performance by sacrificing write performance.
6. What Does ERD Stand for, and What is it?
ERD stands for Entity Relationship Diagram, and is a logical entity representation, defining the relationships between the entities. Entities reside in boxes, and arrows symbolize relationships.
7. What’s the Definition of a Surrogate Key?
A surrogate key, also known as a primary key, enforces numerical attributes. This surrogate key replaces natural keys. Instead of having primary or composite primary keys, data modelers create the surrogate key, which is a valuable tool for identifying records, building SQL queries, and enhancing performance.
8. What Are the Critical Relationship Types Found in a Data Model? Describe Them.
The main relationship types are:
- Identifying. A relationship line normally connects parent and child tables. But if a child table’s reference column is part of the table’s primary key, the tables are connected by a thick line, signifying an identifying relationship.
- Non-identifying. If a child table’s reference column is NOT a part of the table’s primary key, the tables are connected by a dotted line, signifying a no-identifying relationship.
- Self-recursive. A recursive relationship is a standalone column in a table connected to the primary key in the same table.
9. What is an Enterprise Data Model?
This is a data model that consists of all the entries required by an enterprise.
10. What Are the Most Common Errors You Can Potentially Face in Data Modeling?
- These are the errors most likely encountered during data modeling.
- Building overly broad data models: If tables are run higher than 200, the data model becomes increasingly complex, increasing the likelihood of failure
- Unnecessary surrogate keys: Surrogate keys must only be used when the natural key cannot fulfill the role of a primary key
- The purpose is missing: Situations may arise where the user has no clue about the business’s mission or goal. It’s difficult, if not impossible, to create a specific business model if the data modeler doesn’t have a workable understanding of the company’s business model
- Inappropriate denormalization: Users shouldn’t use this tactic unless there is an excellent reason to do so. Denormalization improves read performance, but it creates redundant data, which is a challenge to maintain.
11. Explain the Two Different Design Schemas.
The two design schemas are called Star schema and Snowflake schema. The Star schema has a fact table centered with multiple dimension tables surrounding it. A Snowflake schema is similar, except that the level of normalization is higher, which results in the schema looking like a snowflake.
12. What is a Slowly Changing Dimension?
These are dimensions used to manage both historical data and current data in data-warehousing. There are four different types of slowly changing dimensions: SCD
Type 0 through SCD Type 3.
13. What is Data Mart?
A data mart is the most straightforward set of data warehousing and is used to focus on one functional area of any given business. Data marts are a subset of data warehouses oriented to a specific line of business or functional area of an organization (e.g., marketing, finance, sales). Data enters data marts by an assortment of transactional systems, other data warehouses, or even external sources.
14. What is Granularity?
Granularity represents the level of information stored in a table. Granularity is defined as high or low. High granularity data contains transaction-level data. Low granularity has low-level information only, such as that found in fact tables.
15. What is Data Sparsity, and How Does it Impact Aggregation?
Data sparsity defines how much data we have for a model’s specified dimension or entity. If there is insufficient information stored in the dimensions, then more space is needed to store these aggregations, resulting in an oversized, cumbersome database.
Subscribe For Free Demo
Error: Contact form not found.
16. What Are Subtype and Supertype Entities?
Entities can be broken down into several sub-entities, or grouped by specific features. Each sub-entity has relevant attributes and is called a subtype entity. Attributes common to every entity are placed in a higher or super level entity, which is why they are called supertype entities.
17. In the Context of Data Modeling, What is the Importance of Metadata?
Metadata is defined as “data about data.” In the context of data modeling, it’s the data that covers what types of data are in the system, what it’s used for, and who uses it.
18. Should All Databases Be Rendered in 3NF?
No, it’s not an absolute requirement. However, denormalized databases are easily accessible, easier to maintain, and less redundant.
19. What’s the Difference Between Forward and Reverse Engineering, in the Context of Data Models?
Forward engineering is a process where Data Definition Language (DDL) scripts are generated from the data model itself. DDL scripts can be used to create databases. Reverse Engineering creates data models from a database or scripts. Some data modeling tools have options that connect with the database, allowing the user to engineer a database into a data model.
20. What Are Recursive Relationships, and How Do You Rectify Them?
Recursive relationships happen when a relationship exists between an entity and itself. For instance, a doctor could be in a health center’s database as a care provider, but if the doctor is sick and goes in as a patient, this results in a recursive relationship. You would need to add a foreign key to the health center’s number in each patient’s record.
21. What’s a Conformed Dimension?
If a dimension is confirmed, it’s attached to at least two fact tables.
22. Why Are NoSQL Databases More Useful than Relational Databases?
NoSQL databases have the following advantages:
- They can store structured, semi-structured, or unstructured data
- They have a dynamic schema, which means they can evolve and change as quickly as needed
- NoSQL databases have sharding, the process of splitting up and distributing data to smaller databases for faster access
- They offer failover and better recovery options thanks to the replication
- It’s easily scalable, growing or shrinking as necessary
23. What’s a Junk Dimension?
This is a grouping of low-cardinality attributes like indicators and flags, removed from other tables, and subsequently “junked” into an abstract dimension table. They are often used to initiate Rapidly Changing Dimensions within data warehouses.
24. If a Unique Constraint Gets Applied to a Column, Will It Generate an Error If You Attempt to Place Two Nulls in It?
No, it won’t, because null error values are never equal. You can put in numerous null values in a column and not generate an error.
25. What is a sequence clustering algorithm?
Sequence clustering algorithm collects paths which are similar or related to each other and sequences of data having events.
26. What is discrete and continuous data?
Discrete data is a finite data or defined data. E.g., gender, telephone numbers. Continuous data is data that changes in a continuous and ordered manner. E.g., age.
27. What is the time series algorithm?
Time series algorithm is a method to predict continuous values of data in tables. E.g., Performance one employee can forecast the profit or influence.
28. What is Business Intelligence?
BI (Business Intelligence) is a set of processes, architectures, and technologies that convert raw data into meaningful information that drives profitable business actions. It is a suite of software and services to transform data into actionable intelligence and knowledge.
29. What is a bit mapped index?
Bitmap indexes are a special type of database index that uses bitmaps (bit arrays) to answer queries by executing bitwise operations.
30. Explain data warehousing in detail
Data warehousing is a process for collecting and managing data from varied sources. It provides meaningful business enterprise insights. Data warehousing is typically used to connect and analyse data from heterogeneous sources. It is the core of the BI system, which is built for data analysis and reporting.
31. Explain data scheme
Data Scheme is a diagrammatic representation that illustrates data relationships and structures.
32. Explain data collection frequency
Data collection frequency is the rate to collect the data. It also passes through various stages. These stages are: 1) extracting from various sources, 3) transforming, 4) cleansing, and 5) storing.
33. What is database cardinality?
Cardinality is a numerical attribute of the relationship between two entities or entity sets.
34. What are the different types of cardinal relationships?
Different types of key cardinal relationships are:
- One-to-One Relationships
- One-to-Many Relationships
- Many-to-One Relationships
- Many-to-Many Relationships
35. Define Critical Success Factor and list its four types
Critical Success Factor is a favorable result of any activity needed for the organization to reach its goal.
Four types of critical success factor are:
- Industry CSFs
- Strategy CSFs
- Environmental CSFs
- Temporal CSFs
36. What is data mining?
Data mining is a multi-disciplinary skill that uses machine learning, statistics, AI, and database technology. It is all about discovering unsuspected / previously unknown relationships amongst the data.
37. What is the difference between Star schema and Snowflake schema?
|Hierarchies for the dimensions are stored in the dimensional table.
|Hierarchies are divided into separate tables.
|It contains a fact table surrounded by dimension tables.
|One fact table surrounded by dimension table which is inturn surrounded by dimension table
|In a star schema, only a single join creates the relationship between the fact table and any dimension tables.
|A snowflake schema requires many joins to fetch the data.
|It has a simple database design
|It has a complex database design
|Denormalized data structure and query also run faster.
|Normalized data Structure.
|High level of data redundancy
|Very low-level data redundancy
|Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.
|The Snow Flake Schema is represented by a centralized fact table which is unlikely connected with multiple dimensions.
38. What is an identifying relationship?
Identifying entity relationships in DBMS is used to identify a relationship between two entities:
- strong entity
- weak entity.
Learn Data Modeling Training to Advance Your Career with Expert-led Course
- Instructor-led Sessions
- Real-life Case Studies
39. What is a self-recursive relationship?
Recursive relationship is a standalone column in a table which is connected to the primary key of the same table.
40. Explain relational data modelling
Relational data modelling is representation of objects in a relational database, which is usually normalized.
41. What is predictive modelling analytics?
The process of validating or testing a model which would be used to predict testing and validating outcomes. It can be used for machine learning, artificial intelligence, as well as statistics.
42. What is the difference between logical data model and physical data model?
|Logical data model
|Physical data model
|A logical data model can design the requirement of business logically.
|A physical data model provides information about the target database source and its properties.
|It is responsible for the actual implementation of data which is stored in the database.
|A physical data model helps you to create a new database model from existing and apply the referential integrity constraint.
|It contains an entity, primary key attributes, Inversion keys, alternate key, rule, business relation, definition, etc.
|A physical data model contains a table, key constraints, unique key, columns, foreign key, indexes, default values, etc.
43. What are the different types of constraints?
A different type of constraint could be unique, null values, foreign keys, composite key or check constraint, etc.
44. What is a data-modelling tool?
Data modelling tool is a software that helps in constructing data flow and the relation between data. Examples of such tools are Borland Together, Altova DatabaseSpy, casewise, Case Studio 2, etc.
45. What is hierarchical DBMS?
In the hierarchical database, model data is organized in a tree-like structure. Data is stored in a hierarchical format. Data is represented using a parent-child relationship. In hierarchical DBMS parents may have many children, children have only one parent.
46. What are the drawbacks of the hierarchical data model?
The drawbacks of the hierarchical data model are:
- It is not flexible as it takes time to adapt to the changing needs of the business.
- The structure poses the issue in, inter-departmental communication, vertical communication, as well as inter-agency communication.
- Hierarchical data models can create problems of disunity.
47. Explain the process-driven approach of data modelling
Process-driven approach used in data modelling follows a step by step method on the relationship between the entity-relationship model and organizational process.
48. What are the advantages of using data modelling?
The advantages of using data modelling in data warehousing are:
- It helps you to manage business data by normalizing it and defining its attributes.
- Data modelling integrates the data of various systems to reduce data redundancy.
- It enables efficient database design.
- Data modelling helps the organization department to function as a team.
- It facilitates access to data with ease.
49. What are the disadvantages of using data modelling?
The disadvantages of using data modelling are:
- It has less structural independency
- It can make the system complex.
50. What is an index?
Index is used for a column or group of columns to retrieve data fast.
51. What are the characteristics of a logical data model?
Characteristics of logical data model are:
- Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.
- Designed and developed independently from the DBMS.
- Data attributes will have data types with exact precisions and length.
- Normalization processes to the model, which are generally applied typically till 3NF.
52. What are the characteristics of physical data models?
Characteristics of physical data model are:
- The physical data model describes data needed for a single project or application. It may be integrated with other physical data models based on project scope.
- Data model contains relationships between tables that address cardinality and nullability of the relationships.
- Developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
- Columns should have exact data types, lengths assigned, and default values.
- Primary and foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
53. What are the two types of data modelling techniques?
Two types of data modelling techniques are:
- entity-relationship (E-R) Model
- UML (Unified Modelling Language).
54. What is UML?
UML (Unified Modelling Language) is a general-purpose, database development, modelling language in the field of software engineering. The main intention is to provide a generalized way to visualize system design.
55. Explain object-oriented database model
The object-oriented database model is a collection of objects. These objects can have associated features as well as methods.
Join Data Modeling Course with Global Recognised CertificationWeekday / Weekend BatchesSee Batch Details
56. What is a network model?
It is a model which is built on a hierarchical model. It allows more than one relationship to link records, which indicates that it has multiple records. It is possible to construct a set of parent records and child records. Each record can belong to multiple sets that enable you to perform complex table relationships.
57. What is hashing?
Hashing is a technique which is used to search all the index values and retrieve desired data. It helps to calculate the direct location of data, which is recorded on disk without using the structure of the index.
58. What are business or natural keys?
business or natural keys is a field that uniquely identifies an entity. For example, client ID, employee number, email etc.
59. What is a compound key?
When more than one field is used to represent a key, it is referred to as a compound key.
60. What is the first normal form?
First normal form or 1NF is a property of a relation available in a relational database management system. Any relation is called the first normal form if the domain of every attribute contains values which are atomic. It contains one value from that domain.
61. What is the difference between primary key and foreign key?
|Primary key helps you to uniquely identify a record in the table.
|Foreign key is a field in the table that is the primary key of another table.
|Primary Key never accepts null values.
|A foreign key may accept multiple null values.
|Primary key is a clustered index, and data in the DBMS table are physically organized in the sequence of the clustered index.
|A foreign key cannot automatically create an index, clustered, or non-clustered. However, you can manually create an index on the foreign key.
|You can have the single Primary key in a table.
|You can have multiple foreign keys in a table.
62. What are the requirements of the second normal form?
The requirements of second normal form are:
- It should be in first normal form.
- It does not contain any non-prime attribute, which is functionally dependent on any subset of candidate keys of the table relation.
63. What are the rules for third normal form?
Rules for third normal forms are:
- It should be in second normal form
- It has no transitive functional dependencies.
64. What is the importance of using keys?
- Keys help you to identify any row of data in a table. In a real-world application, a table could contain thousands of records.
- Keys ensure that you can uniquely identify a table record despite these challenges.
- Allows you to establish a relationship between and identify the relation between tables
- Help you to enforce identity and integrity in the relationship.
65. Explain alternate key in detail
Alternate key is a column or group of columns in a table that uniquely identifies every row in that table. A table can have multiple choices for a primary key, but only one can be set as the primary key. All the keys which are not primary keys are called an Alternate Key.
66. What is the fourth normal form in DBMS?
Fourth normal form is a level of database normalization where there must not have non trivial dependency other than candidate key.
67. What is a database management system?
Database management system or DBMS is a software for storing and retrieving user data. It consists of a group of programs which manipulate the database.
68. What is the rule of fifth normal form?
A table is in 5th normal form only if it is in 4th normal form, and it cannot be decomposed into any number of smaller tables without loss of data.
69. Explain the characteristics of a database management system
- Provides security and removes redundancy
- Self-describing nature of database system
- Insulation between programs and data abstraction
- Support of multiple views of data.
- Sharing of data and multi user transaction processing
- DBMS allows entities and relations among them to form tables.
- It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
- DBMS supports a multi-user environment that allows users to access and access and manipulate data in parallel.
70. List out popular DBMS software
Popular DBMS software is:
9.Microsoft SQL Server.
71. Explain the concept of RDBMS
Relational Database Management System is a software which is used to store data in the form of tables. In this kind of system, data is managed and stored in rows and columns, which is known as tuples and attributes. RDBMS is a powerful data management system and is widely used across the world.
72. What are the advantages of data models?
Advantages of the data model are:
- The main goal of a designing data model is to make sure that data objects offered by the functional team are represented accurately.
- The data model should be detailed enough to be used for building the physical database.
- The information in the data model can be used for defining the relationship between tables, primary and foreign keys, and stored procedures.
- Data Model helps businesses to communicate within and across organizations.
- Data model helps to documents data mappings in the ETL process
- Help to recognize correct sources of data to populate the model
73. What are the disadvantages of Data Model?
Disadvantages of Data model are:
- To develop Data model, one should know physical data stored characteristics.
- This is a navigational system that produces complex application development, management. Thus, it requires knowledge of the biographical truth.
- Even smaller changes made in structure require modification in the entire application.
- There is no set of data manipulation language in DBMS.
74. Explain various types of fact tables
There are three types of fact tables:
Additive: It is a measure that is added to any dimension.
Non-additive: It is a measure that can’t be added to any dimension.
Semi-additive: It is a measure that can be added to a few dimensions.
75. What is an aggregate table?
The aggregate table contains aggregated data that can be calculated using functions such as:
76. What is Conformed dimension?
A conformed dimension is a dimension which is designed in a way that can be used across many fact tables in various areas of a data warehouse.
77. List types of Hierarchies in data modelling
There are two types of Hierarchies: 1) Level based hierarchies and 2) Parent-child hierarchies.
78. What is the difference between a data mart and data warehouse?
|Data mart focuses on a single subject area of business.
|Data warehouse focuses on multiple areas of business.
|It is used to make tactical decisions for business growth.
|It helps business owners to take a strategic decision
|Data mart follows the bottom-up model
|Data warehouse follows a top-down model
|Data source comes from one data source
|Data source comes from more than one heterogeneous data source.
79. What is XMLA?
XMLA is an XML analysis that is considered as standard for accessing data in Online Analytical Processing (OLAP).
80. Explain junk dimension
Junk dimension helps to store data. It is used when data is not proper to store in a schema.
81. Explain chained data replication
The situation when a secondary node selects a target using ping time or when the closest node is a secondary, it is called chained data replication.
82. Explain Virtual Data Warehousing
A virtual data warehouse gives a collective view of the completed data. A virtual data warehouse does not have historical data. It is considered as a logical data model having metadata.
83. Explain snapshot of data warehouse
Snapshot is a complete visualization of data at the time when the data extraction process begins.
84. What is a bi-directional extract?
The ability of the system to extract, cleanse, and transfer data in two directions is called a directional extract.
85. What is the surrogate key? Explain it with an example.
- Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate is not generated by the user but by the system. A primary difference between a primary key and surrogate key in few databases is that PK uniquely identifies a record while a SK uniquely identifies an entity.
- E.g. an employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record while the surrogate key will be generated by the system (say a serial number) since the SK is NOT derived from the data.
86. What is a level of Granularity of a fact table?
- A fact table is usually designed at a low level of Granularity. This means that we need to find the lowest level of information that can store in a fact table.
- E.g. Employee performance is a very high level of granularity. Employee_performance_daily, employee_perfomance_weekly can be considered lower levels of granularity.
87. What are Critical Success Factors?
Key areas of activity in which favorable results are necessary for a company to reach its goal.
There are four basic types of CSFs which are:
- Industry CSFs
- Strategy CSFs
- Environmental CSFs
- A few CSFs are:
- Your future
- Customer satisfaction
- Product or service development
- Intellectual capital
- Strategic relationships
- Employee attraction and retention
88. Distinguish between OLTP and OLAP?
- OLTP stands for the Online Transaction Processing System & OLAP stands for the Online Analytical Processing System. OLTP maintains the transactional data of the business & is highly normalized generally. On the contrary, OLAP is for analysis and reporting purposes & it is in de-normalized form.
- This difference between OLAP and OLTP also gives you the way to choosing the design of a schema. If your system is OLTP, you should go with star schema design and if your system is OLAP, you should go with snowflake schema.
89. What do you understand by data mart?
- Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments.
- For Example, I used to work for a health insurance provider company that had different departments in it like Finance, Reporting, Sales and so forth.
- We had a data warehouse that was holding the information pertaining to all these departments and then we have few data marts built on top of this data warehouse. These DataMart were specific to each department. In simple words, you can say that a DataMart is a subset of a data warehouse.
90. What are the different types of measures?
We have three types of measures, namely
- Non- additive measures
- Semi- additive measures
- Additive measures
- Non-additive measures are the ones on top of which no aggregation function can be applied. For Example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc. is a non-additive measure.
- Semi- additive measures are the ones on top of which some (but not all) aggregation functions can be applied. For Example, fee rate or account balance.
- Additive measures are the ones on top of which all aggregation functions can be applied. For Example, units purchased.
91. What is a Surrogate key? How is it different from a primary key?
Surrogate Key is a unique identifier or a system-generated sequence number key that can act as a primary key. It can be a column or a combination of columns. Unlike a primary key, it is not picked up from the existing application data fields.
92. Is this true that all databases should be in 3NF?
It is not mandatory for a database to be in 3NF. However, if your purpose is the easy maintenance of data, less redundancy, and efficient access then you should go with a denormalized database.
93. Have you ever come across the scenario of recursive relationships? If yes, how did you handle it?
- A recursive relationship occurs in the case where an entity is related to itself. Yes, I have come across such a scenario.
- Talking about the healthcare domain, it is a possibility that a healthcare provider (say, a doctor) is a patient to any other health care provider. Because, if the doctor himself falls ill and needs surgery, he will have to visit some other doctor for getting the surgical treatment.
- So, in this case, the entity – health care provider is related to itself. A foreign key to the health insurance provider’s number will have to present in each member’s (patient) record.
94. List out a few common mistakes encountered during Data Modelling?
Few common mistakes encountered during Data Modelling are:
- Building massive data models: Large data models are likely to have more design faults. Try to restrict your data model to not more than 200 tables.
- Lack of purpose: If you do not know what your business solution is intended for, you might come up with an incorrect data model. So having clarity on the business purpose is very important to come up with the right data model.
- Inappropriate use of surrogate keys: Surrogate keys should not be used unnecessarily. Use a surrogate key only when the natural key cannot serve the purpose of a primary key.
- Unnecessary denormalization: Don’t denormalize until and unless you have a solid & clear business reason to do so because denormalization creates redundant data which is difficult to maintain.
95. What is the number of child tables that can be created out from a single parent table?
The number of child tables that can be created out of the single parent table is equal to the number of fields/columns in the parent table that are non-keys.
96. Employee health details are hidden from his employer by the health care provider. Which level of data hiding is this? Conceptual, physical or external?
This is the scenario of an external level of data hiding.
97. What particulars would you need to come up with a conceptual model in a healthcare domain project?
For a health care project, below details would suffice the requirement to design a basic conceptual model
Are you looking training with Right Jobs?Contact Us
- Sharepoint Tutorial
- Salesforce Tutorial
- Adobe Experience Manager Tutorial
- WordPress Interview Questions and Answers
- Sitecore Interview Questions and Answers
- What is Dimension Reduction? | Know the techniques
- Difference between Data Lake vs Data Warehouse: A Complete Guide For Beginners with Best Practices
- What is Dimension Reduction? | Know the techniques
- What does the Yield keyword do and How to use Yield in python ? [ OverView ]
- Agile Sprint Planning | Everything You Need to Know