Data Architect Interview Questions and Answers

Data Architect Interview Questions and Answers

Last updated on 19th Oct 2020, Blog, Interview Question

About author

Vikram (Sr Advisor - Data Architect )

High level Domain Expert in TOP MNCs with 8+ Years of Experience. Also, Handled Around 20+ Projects and Shared his Knowledge by Writing these Blogs for us.

(5.0) | 12253 Ratings 2298

Data Architects design, deploy and maintain systems to ensure company information is gathered effectively and stored securely. They analyze both user and database system requirements, create data models and provide functional solutions.Your ideal candidates should have solid technical backgrounds, acquired by Data Science or relevant IT degrees. Use the following interview questions to test candidates on their knowledge of key database structure principles and statistical analysis tools.

1. Have you ever taken part in improving a company’s existing data architecture? Please describe your involvement in the process and the overall impact the changes had on the company?

Ans:

Routine tasks and maintenance are an important part of a data architect’s job. However, as a data architect, you should also be proactive and strive to improve the company’s data processes and structures. Employers want to hire data architects with a critical mindset who are willing to take part in increasing the efficiency and productivity of current environments. So, do your best to show the interviewer you don’t get preoccupied with routine tasks, and you don’t lose sight of the bigger picture.

Example

“In my work experience, marrying external data with internal data in corporate systems can pose a variety of threats to data integrity. That’s why I launched a project where I established a step-by-step screening process for our 3-rd party purchased data. I also managed to further improve the relationship with our data supplier, who, in turn, agreed to run a few checks on their data before sending it to us. This initiative had a positive impact on the company’s data reliability and decreased database errors by 29% within 1 year.”

2. As a data architect, have you faced any challenges related to the company’s data security? How did you ensure the integrity of the data was not compromised?

Ans:

Data security is a top priority for every company. That’s why hiring managers would like to learn more about your experience with data security issues. When answering this question, emphasize that data security is an important aspect of your job, although your background isn’t focused in that particular field.

Example

“When working in a team, it’s sometimes hard to agree on what could pose a security risk. I remember a situation when some colleagues of mine wanted to change the established process for uploading franchise data to our system. I was sure these changes could result in security risks. So, in order to validate my point, I calculated the possible financial loss to the company in case security was compromised. This prompted the team members to modify their plan to strengthen data security measures.”

3. As a data architect, you should be up to date with the latest technologies and developments in the field. How do you keep yourself informed about the new trends in data architecture?

Ans:

When working in a technical role, it’s common to get absorbed in the company’s current processes and miss out on the latest industry developments. Hiring managers will value your willingness to educate yourself despite your busy schedule. So, try to list news resources you’re subscribed to, and mention some conferences or training, or industry events you attend when you have the chance.

Example

“I do my best to stay informed about the latest industry trends and technology advancements. I believe this helps me learn things that can improve my work… Or inspire me to come up with an idea that will benefit the company’s status quo. I’m subscribed to newsfeeds such as InformationWeek and TechNewsWorld. I also attend 2-3 conferences a year where I network with other professionals in the field. Whenever my schedule allows it, I attend specialized training and seminars.”

4. A lot of companies use data from both internal and internal sources. Have you faced any problems while trying to integrate a new external data source into the existing company’s infrastructures? How did you solve these problems?

Ans:

External data often comes from sources using different data formats and systems. Obviously, that may cause a series of issues when importing this data into the company’s data systems. As a data architect, you have to make sure the data format is readable and ready-to-use, before storing it in the data warehouse. With this question, hiring managers want to assess your problem-solving skills when faced with external data integration challenges. So, try to provide an answer that will demonstrate how you address such issues.

Example

“In my work experience, the cause for external data integration issues is usually a different system that creates the data in an incompatible format. Unfortunately, it isn’t possible for all companies to use the same systems. So, I solved this problem by creating and running a script prior to uploading the data in my company’s warehouse tables. The script not only changed the external data format but also ran tests to ensure the new format was compatible with our systems.”

5. Have you worked with open source technology? Tell us about some issues you have come across when using it?

Ans:

When an interviewer asks a specific question like that, the company is either considering using open source technology in the future or is already utilizing it. If you have relevant experience, give some particular examples. And be sure you also highlight your ability to modify the open source programming code. If you haven’t encountered any problems using it, mention any possible disadvantages to open source technology you’re aware of.

Example

“I’ve worked with both Hadoop and MySQL without facing any major problems. Nevertheless, I realize that using open source databases or software utilities has its drawbacks. For example, you have to rely on advice from user forums, as there is no formal customer support to address your issue. Another thing is that developers don’t spend a lot of time on their user interface, so you may lack the resources you need to get started.”

6.  State and describe the different types of SQL joins?

Ans:

The basic types of SQL joins are: inner, left, and right (in SQL theory, there is one more type of join – full. However, it is used very rarely today). The easiest and most intuitive way to explain the difference between the inner, left, and right joins is by using a Venn diagram, which shows all possible logical relations between data sets.

The SQL INNER JOIN lets us select all records from Table A and Table B, as long as there is a match between the columns.

The SQL LEFT JOIN returns all records from the left table, plus the matched values from the right table. In case there are no matches, the left join still returns all rows from the left table and a NULL value from the right.

Regarding the functionality of the SQL RIGHT JOINS – it is identical to LEFT JOINS, but with the opposite direction of the operation.

7. What is a primary key and a foreign key?

Ans:

A primary key is a column (or a set of columns) whose value exists and is unique for every record in a table. It’s important to know that each table can have one and only one primary key.

Therefore, you can think of a primary key as the field (or group of fields) that identifies the content of a table in a unique way. For this reason, the primary keys are also called the unique identifiers of a table.

Another crucial feature of primary keys is they cannot contain null values. This means, in an example with a single-column primary key, there must always be a value inserted in the rows under this column. You cannot leave it blank.

One last remark about primary keys – not all tables you work with will have a primary key, although almost all tables in any database will have a single-column or a multi-column primary key.

A foreign key, instead, is a column (or a set of columns) that references a column (most often the primary key) of another table. Foreign keys can be called identifiers, too, but they identify the relationships between tables, not the tables themselves.

In the relational schemas form of representation, relations between tables are expressed in the following way – the column name that designates the logical match is a foreign key in one table, and it is connected with a corresponding column from another table. Often, the relationship goes from a foreign key to a primary key, but in more advanced circumstances, this will not be the case. To catch the relations on which a database is built, we should always look for the foreign keys, as they show us where the relations are.

8. How many types of data structures does R have?

Ans:

This question is important because virtually everything you do in R involves data in some shape or form. The most commonly used data structures in R are these:

  • Vectors (atomic and lists);
  • Matrixes;
  • Data frames;
  • Factors.

9. What modeling tools have you used in your work so far? Which do you consider efficient or powerful?

Ans:

Even if data modeling isn’t one of your main responsibilities, your role as a data architect requires you to have an in-depth understanding of data modeling. If you lack the experience, demonstrate that you are well-informed on the topic and mention the data modeling tools you find most useful. The interviewer will value that you’re at least familiar with the subject.

Example

 “I’ve used mainly both Oracle SQL Developer Data Modeler, and PowerDesigner. I can say that the Oracle Data Modeler has been more than sufficient for my needs with its dimensional modeling, and integrated source code control that supports collaborative development. However, PowerDesigner also boasts some wonderful technology-centric metadata management capabilities for data architects, and business-centric techniques for non-technical co workers. Overall, I think both tools are worth the try, depending on the company’s needs.”

10. What’s your experience with batch and real-time data processing?

Ans:

Each of these data processing methods can be applied depending on the business case. If you have experience with only one of them, provide examples of situations where the other processing method would be a better fit. This will indicate you have a basic understanding of both batch and real-time data processing.

Example

 ‘I’m familiar with both types of data processing. However, I’ve had more exposure to batch processing. That’s because one of my responsibilities was to write programs that captured, processed, and produced output for the company’s billing department. As I mentioned, I’ve had less experience with real-time data processing. However, I know our company uses it to take immediate action on the data collected from our stores’ POS systems.”

11. In your role as a data architect, what metrics have you created or used to measure the quality of new and existing data?

Ans:

Having established processes to ensure the quality of data is key to a company’s data infrastructure. With this question, the hiring manager wants to assess your relevant experience. Make sure you highlight the particular dimensions you’ve monitored to validate the data quality.

Example

“I’ve always been involved in ensuring data quality in my job as a data architect. My team and I monitored some specific dimensions to validate the quality of data. These included completeness, uniqueness, timeliness, validity, accuracy, and consistency. Monitoring these dimensions helped us detect inconsistencies that could negatively affect the accuracy of data analysis.”

Behavioral Questions

Data architects often work with coworkers from various departments, backgrounds, and responsibilities. This is why you should be prepared to answer some behavioral questions focused on your work style and ability to handle conflict in cross-functional teams.

12. What challenges have you faced working with colleagues with no technical background? How did you address and overcome these challenges?

Ans:

Data architects often work with other departments within a company. That involves collaborating with people who lack technical background and understanding of the data processes. The interviewer would like to assess your communication style and your ability to reach common ground with your coworkers, in spite of your differences. Describe a specific situation to illustrate the issues you encountered and how you solved them.

Example

“I believe a good data architect should understand the needs of the different departments across the company. That said, I’ve had to work with people who don’t fully understand my role and responsibilities on numerous occasions. Some of my coworkers would pose requests that I had to reject due to our data architecture limitations. And that has led to certain tensions. I’d say overcoming such challenges takes time. Gradually, we learned more about each other’s work which helped us brainstorm possible solutions. All in all, making the extra step to educate myself and the others made has made all the difference.”

13. How would you describe your work style?

Ans:

This question is not so much about your personality, but more about how you approach your work to get things done. Talk about the way you handle tasks and projects, and how you communicate with coworkers and clients. Your work style might be: collaborative, well-structured, speedy, flexible, or independent. No matter what word you choose to describe it, keep the job description in mind and how your work style fits the profile.

Example

“I’d describe my work style as collaborative. I like to work on full-team participation projects and co-create with my teammates. If I’m not sure of the direction I should take on a project, I always consult with my team. This way we can work toward consensus and align our ideas.”

14. How would you resolve a conflict within your team?

Ans:

The hiring manager wants to hear about your ability to professionally solve team issues when they occur. Think of an example where you had to use your communication skills to handle a conflict with your coworkers. Or when you managed to help 2 of your teammates find common ground as a mediator.

Example

“I like to think I have excellent conflict management skills. As a data architect in a large company, I’ve worked in a high-stress environment. And that has sometimes caused tension to build up among team members. When this escalates to a conflict, I try to deal with it openly. Usually, I’d organize a group meeting where everyone can voice their concerns. This is how we can sort out the issue and move on with our work on the project.”

15. What is the most critical factor for you when taking a job?

Ans:

There are a lot of factors that may influence your decision to take on a new job. These include:

  • career growth opportunity;
  • compensation;
  • work/life balance;
  • travel required for the role;
  • medical and dental benefits;
  • perks such as a gym membership, onsite kids center, spending account;
  • paid vacation time;
  • the company’s location;
  • the company’s reputation and culture.

Share with the interviewer which factors are most important to you when you consider starting a new job. If you aren’t sure about all the details regarding this position, this is a good time to get informed.

Example

“The most important factors for me, as a data architect, are the company’s industry and the workplace culture. The first one predefines the projects I’ll be involved in. The second one indicates if the work environment will be positive and teamwork-oriented. To me, those are equally important to compensation and benefits.”

Subscribe For Free Demo

Error: Contact form not found.

16. Are you also interviewing with any of our close competitors?

Ans:

If the interviewer wants to know if you’re also applying for a job at a competitor’s company, you can give a direct answer. However, you should refrain from giving away the name of the company or sharing too many details. Let the interviewer know you aren’t putting all of your eggs in one basket. At the same time, try to leave the impression that you are critical when it comes to the companies you apply at.

Example

“I wouldn’t disclose the names of the competitors I’m currently interviewing with. However, I can tell you that I’m in the mid-interview stages with 3 other companies. That said, your company is my first choice and I’m happy that we’ve reached the final stage in the process.

17. How would you assess your performance in the data architect interview questions so far?

Ans:

This is a question you should answer openly. Generally, you would know if you performed well, or if your interview was a disaster. In fact, if you address the issues of your performance, you might get a chance to answer some additional questions that could give you extra points.”

Example

If you think that your performance in the interview is going great:

“I’m positive that the interview has been quite successful and I’m satisfied with my performance. Is there anything you’d like me to clarify from our talk?”

If you think that your performance in the interview is not satisfactory:

“I don’t think I managed to portray myself in the best light possible in this interview. However, I’m always trying to do my best. So, if there’s anything I could further clarify for you, I’d be more than happy to do so.”

Brainteasers help the interviewer assess your logical thinking combined with your ability to come up with a creative problem solution on the spot.

18. What is the sum of the numbers from 1 to 100?

Ans:

There’s a little bit of history coming with this question. The math teacher of young Karl Gauss, the famous mathematician, asked the entire class to sum the numbers from 1 to 100. He expected that the task would require at least half an hour to his students, but was shocked when Gauss gave him the exact number within mere seconds. Anyway, here is how this question is solved.

There are precisely 50 pairs of numbers from 1 to 100, whose sum is 101.

1 + 100 = 101, 2 + 99 = 101, 3 + 98 =101, etc.

50 * 101 = 5050

This trick will work for any series of numbers provided that they are evenly spaced. You need to find the sum of the first and the last number and then multiply by the number of pairs.

19. You are given two containers – one is 5 and the other one is 7 gallons. How do you use them to measure 4 gallons of water?

Ans:

Fill the entire 7 gallon container with water. Then use the water in the 7 gallon container in order to fill the entire 5 gallon container. This would leave 2 gallons of water in the 7 gallon container. Dump the water in the 5 gallon container and then pour in it the 2 gallons of water that are in the 7 gallon container. Fill the entire 7 gallon container with water and then start pouring the water in the 5 gallon container. Given that it is already filled with 2 gallons of water, you will be able to pour only 3 gallons, which means that 4 gallons would remain within the 7 gallon container. This is how you are able to measure 4 gallons of water.

20. Who is a data architect, please explain?

Ans:

The individual who is into data architect roles is a person who can be considered as a data architecture practitioner. So when it comes to data architecture it includes the following stages:

  • Designing
  • Creating
  • Deploying
  • Managing

All of these activities are carried out with the organization’s data architecture.

With their help and skill set, the organization can take a constructive decision of how the data is stored, how the data is consumed and how the data is integrated into different IT systems. In a sense, this process is closely aligned with business architecture, because they should be aware of this process so that the security policies are also taken into consideration.

21.  What are the fundamental skills of a Data Architect?

Ans:

The fundamental skills of a Data Architect are as follows:

  • The individual should possess knowledge about data modeling in detail
  • Physical data modeling concepts
  • Should be familiar with ETL process
  • Should be familiar with Data warehousing concepts
  • Hands-on experience with data warehouse tools and different software
  • Should have experience in terms of developing data strategies
  • Build data policies and plans for executions

22.  What is a data block and what is a data file? Please explain briefly?

Ans:

A data block is nothing but a logical space where the Oracle database data is stored.

A data file is nothing but a file where all the data is available. For every Oracle database, we will be having one or more data files associated.

23.  What is cluster analysis? What is the purpose of cluster analysis?

Ans:

A cluster analysis is defined as a process where an object is defined without giving any label to it. It uses statistical data analysis techniques and processes the data mining job. Using cluster analysis, an iterative process of knowledge discovery is processed in the form of trails.

The purpose of cluster analysis:

  • It is scalable
  • It can deal with different set of attributes
  • High dimensionality
  • Interpretability

24.  What is virtual Data warehousing?

Ans:

A virtual data warehouse provides a view of completed data. Within Virtual data warehousing, it doesn’t have any historical data and it can be considered as a logical data model which has the metadata. A virtual data warehouse is a perfect information system where it acts as an appropriate analytical decision-making system.

It is one of the best ways of portraying raw data in the form of meaningful data for executive users which makes business sense and at the same time it provides suggestions at the time of decision making.

25.  What is a snapshot with reference to a data warehouse?

Ans:

As the name itself implies, the snapshot is nothing but a set of complete data visualization when a data extraction is executed. The best part is that it uses less space and it can be easily used to take backup and also the data can be restored quickly from a snapshot.

26.  What is XMLA?

Ans:

XMLA is nothing but XML for analysis purposes. This is considered as a standard for access of data in OLAP. XMLA actually uses discover and execute methods. So the Discover method actually is used to fetch the information from the internet and the execute method is used for the applications to execute against all the data sources that are available.

27.  What is the main difference between view and materialized view?

Ans:

The main difference between view and materialized view is as follows:

View:

  • Data representation is provided by view where the data is accessed from its table.
  • View has a logical structure which does not occupy space
  • All the changes are affected in corresponding tables.

Materialized View:

  • Within materialized view, pre-calculated data is available
  • The materialized view has a physical structure which does occupy space
  • All the changes are not reflected in the corresponding tables.

28.  What is junk dimension?

Ans:

A junk dimension is nothing but a dimension where a certain type of data is stored which is not appropriate to store in the schema. The nature of the junk dimension is usually a Boolean has flag values.

A single dimension is formed by a group of small dimensions gathered together. This can be considered as a junk dimension.

29.  What is data warehouse architecture?

Ans:

The data warehouse architecture is a three-tier architecture. The following is the three-tier architecture:

  • Bottom Tier
  • Middle Tier
  • Upper Tier

It is nothing but a repository of integrating data which is extracted from different data sources.

30. What are Integrity constraints? What are different types of Integrity constraints?

Ans:

An integrity constraint is nothing but a specific requirement that the data in the database has to meet. It is nothing but a business rule for a particular column in a table. In the data warehouse concept, they are 5 integrity constraints

The following are the integrity constraints:

  • Null
  • Unique key
  • Primary key
  • Foreign key
  • Check

31. Why is that data architect actually monitoring and enforcing compliance data standards? What is the need?

Ans:

The primary idea of keeping the standards high on compliance for data standards is because it will help to reduce the data redundancy and help the team to have quality data. As this information is actually carried out or used throughout the organization.

32.  Explain the different data models that are available in detail?

Ans:

There are three different kinds of data models that are available and they are as follows:

  • Conceptual
  • Logical
  • Physical

Conceptual data model:

As the name itself implies that this data model depicts the high-level design of the available physical data.

Logical data model:

Within the logical model, the entity names, entity relationships, attributes, primary keys and foreign keys will show up.

Physical data model:

Based on this data model, the view will give out more information and showcases how the model is implemented in the database. All the primary keys, foreign keys, tables names and column names will be showing up.

33.  Differentiate between dimension and attribute?

Ans:

In short, dimensions are nothing but which represents qualitative data. For example data like a plan, product, class are all considered as dimensions.

The attribute is nothing but a subset of a dimension. Within a dimension table, we will have attributes. The attributes can be textual or descriptive. For example, product name and product category are nothing but an attribute of product dimensions.

34.  Differentiate between OLTP and OLAP?

Ans:

  • OLTP stands for Online Transaction Process System
  • OLTP is known for maintaining transactional level data of the organization and generally, they are highly normalized. If it is OLTP route then it is going to be a star schema design.
  • OLAP stands for Online Analytical process system.
  • OLAP is known for a lot of analysis and fulfills reporting purposes. It is a denormalized form.

If it is an OLAP route then it is going to be a snowflake schema design.

35.  How to become a data architect?

Ans:

The following are the prerequisites for an individual to start his career in Data Architect.

  • A bachelor’s degree is essential and preferably in computer science background
  • No predefined certifications are necessary, but it is always good to have few certifications related to the field because few of the companies might expect. It is advisable to go through CDMA (Certified 3. Data Management Professional)
  • Should have at least 3-8 years of IT experience.
  • Should be creative, innovative and good at problem-solving.
  • Has good programming knowledge and data modeling concepts
  • Should be well versed with the tools like SOA, ETL, ERP, XML etc

36. The responsibilities of a data architect and data administrator are the same?

Ans:

No, not at all. The responsibilities of data architect are completely different from that of data administrator. For example:

Data architect works on data modeling and designs the database design in a robust manner where the users will be able to extract the information easily. When it comes to data administrators, they are responsible for having the databases run efficiently and effectively.

37.  Is data architect and data scientist roles similar?

Ans:

No, data architect and data scientist roles are two different roles in an organization. The following are few activities that data architect is involved :

  • Data warehousing solutions
  • ETL activities
  • Data Architecture development activities
  • Data modelling
  • The following are few activities that data scientist is involved in:
  • Data cleansing and processing
  • Predictive modelling
  • Machine learning
  • Statistical analysis applied
  • Data visualization

38. What are the different types of measures available?

Ans:

The three different types of measures are available, they are as follows:

  • Non-additive measures
  • Semi-additive measures
  • Additive measures
Course Curriculum

Get Data Architect Training with In-Depth Modules for Beginner to Experts

  • Instructor-led Sessions
  • Real-life Case Studies
  • Assignments
Explore Curriculum

39. What are the common mistakes that encounter during data modeling activity, list them out?

Ans:

The common mistakes that are encountered during data modeling activities are listed below:

  • First and foremost is trying to build massive data models. The problem with large massive data models is that they have more design faults. The ideal case scenarios is to have a data model build which is under 200 table limit
  • Misunderstanding of the business problem, if this is the case then the data model that is built will not suffice the purpose.
  • An inappropriate way of surrogate key usage
  • Carrying out unnecessary denormalization

40. What do you understand by Data Modelling?

Ans:

Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then the logical model and finally move to the physical model.

Generally, the data models are created in data analysis & design phase of software development life cycle.

41.Explain your understanding of different data models?

Ans:

 There are three types of data models – conceptual, logical and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.

The conceptual model shows a very basic high level of design while the physical data model shows a very detailed view of design.

  • Conceptual Model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.
  • Logical Model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.
  • Physical Data Model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.

42. Throw some light on your experience in Data Modelling with respect to projects you have worked on till date?

Ans:

 I have worked on a project for a health insurance provider company where we have interfaces build in Informatica that transforms and process the data fetched from Facets database and sends out useful information to vendors.

Each of the data entities has its own data attributes. For Example, a data attribute of the provider will be provider identification number, few data attributes of the membership will be subscriber ID, member ID, one of the data attribute of claim will claim ID, each healthcare product or plan will be having a unique product ID and so on.

43. What are the different design schemas in Data Modelling? Explain with the example?

Ans:

There are two different kinds of schemas in data modeling

  • Star Schema
  • Snowflake Schema

Now, I will be explaining each of these schemas one by one.

The simplest of the schemas is star schema where we have a fact table in the center that references multiple dimension tables around it. All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.

The star schema is quite simple, flexible and it is in de-normalized form.

In a snowflake schema, the level of normalization increases. The fact table here remains the same as in star schema. However, the dimension tables are normalized.  Due to several layers of dimension tables, it looks like a snowflake and thus it is named as snowflake schema.

44. Which schema is better – star or snowflake?

Ans:

The choice of a schema always depends upon the project requirements & scenarios.

Since star schema is in de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. Coming to the snowflake schema, since it is in normalized form, it will require a number of joins as compared to a star schema, the query will be complex and execution will be slower than star schema.

Another significant difference between these two schemas is that snowflake schema does not contain redundant data and thus it is easy to maintain. On the contrary, star schema has a high level of redundancy and thus it is difficult to maintain.

Now, which one to choose for your project? If the purpose of your project is to do more of dimension analysis, you should go for snowflake schema. For Example, if you need to find out that “how many subscribers are tied to a particular plan which is currently active?” – go with the snowflake model.

If the purpose of your project is to do more of a metrics analysis, you should go with a star schema.In my project, we used snowflake schema because we had to do analysis across several dimensions and generate summary reports for the business. Another reason for using snowflake schema was it is less memory consumption.

45.What do you understand by dimension and attribute?

Ans:

Dimensions represent qualitative data. For Example, plan, product, class are all dimensions.

A dimension table contains descriptive or textual attributes. For Example, the product category & product name are the attributes of the product dimension.

46. What is a fact & a fact table?

Ans:

 Facts represent quantitative data.

For Example, the net amount due is a fact. A fact table contains numerical data and foreign keys from related dimensional tables. An example of the fact table can be seen from Figure 2 shown above.

47.What are the different types of dimensions you have come across? Explain each of them in detail with an example?

Ans:

There are typically five types of dimensions.

a) Conformed dimensions: A Dimension that is utilized as a part of different areas are called a conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses.

For Example, if the subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as a conformed dimension.

b) Junk Dimension: It is a dimension table comprising of attributes that don’t have a place in the fact table or in any of the current dimension tables. Generally, these are properties like flags or indicators.

For Example, it can be a member eligibility flag set as ‘Y’ or ‘N’ or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called a junk dimension having unique junk IDs with a possible combination of all the indicator values.

c) Role-Playing Dimension: These are the dimensions that are utilized for multiple purposes in the same database.

For Example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called a Role-playing dimension. The primary key of the Date dimension will be associated with multiple foreign keys in the fact table.

d) Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs

  • Type-0: These are the dimensions where attribute value remains steady with time. For Example, Subscriber’s DOB is a type-0 SCD because it will always remain the same irrespective of the time.
  • Type-1: These are the dimensions where the previous value of the attribute is replaced by the current value. No history is maintained in the Type-1 dimension. For Example, Subscriber’s address (where the business requires to keep the only current address of subscriber) can be a Type-1 dimension.
  • Type-2: These are the dimensions where unlimited history is preserved. For Example, Subscriber’s address (where the business requires to keep a record of all the previous addresses of the subscriber). In this case, multiple rows for a subscriber will be inserted in the table with his/her different addresses. There will be some column(s) that will identify the current address. For Example, ‘Start date’ and ‘End date’. The row where ‘End date’ value will be blank would contain the subscriber’s current address and all other rows will be having previous addresses of the subscriber.
  • Type-3: These are the type of dimensions where limited history is preserved. And we use an additional column to maintain the history. For Example, Subscriber’s address (where the business requires to keep a record of current & just one previous address). In this case, we can dissolve the ‘address’ column into two different columns – ‘current address’ and ‘previous address’. So, instead of having multiple rows, we will be having just one-row showing current as well as the previous address of the subscriber.
  • Type-4: In this type of dimension, the historical data is preserved in a separate table. The main dimension table holds only the current data. For Example, the main dimension table will have only one row per subscriber holding its current address. All other previous addresses of the subscriber will be kept in the separate history table. This type of dimension is hardly ever used.

e) Degenerated Dimension: A degenerated dimension is a dimension that is not a fact but presents in the fact table as a primary key. It does not have its own dimension table. We can also call it as a single attribute dimension table.

But, instead of keeping it separately in a dimension table and putting an additional join, we put this attribute in the fact table directly as a key. Since it does not have its own dimension table, it can never act as a foreign key in the fact table.

48. Give your idea regarding factless fact? And why do we use it?

Ans:

 Factless fact table is a fact table that contains no fact measure in it. It has only the dimension keys in it.

At times, certain situations may arise in the business where you need to have a factless fact table.

You can see that the above table does not contain any measure. Now, if you want to answer the below question, you can do easily using the above single factless fact table rather than having two separate fact tables:

“How many employees of a particular department were present on a particular day?”

So, the factless fact table offers flexibility to the design.

Data Architect Sample Resumes! Download & Edit, Get Noticed by Top Employers! Download

49. Distinguish between OLTP and OLAP?

Ans:

 OLTP stands for the Online Transaction Processing System & OLAP stands for the Online Analytical Processing System. OLTP maintains the transactional data of the business & is highly normalized generally. On the contrary, OLAP is for analysis and reporting purposes & it is in de-normalized form.

This difference between OLAP and OLTP also gives you the way to choosing the design of schema. If your system is OLTP, you should go with star schema design and if your system is OLAP, you should go with snowflake schema.

50. What do you understand by data mart?

Ans:

Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments.

51. What is a Surrogate key? How is it different from a primary key?

Ans:

 Surrogate Key is a unique identifier or a system-generated sequence number key that can act as a primary key. It can be a column or a combination of columns. Unlike a primary key, it is not picked up from the existing application data fields.

52. Is this true that all databases should be in 3NF?

Ans:

 It is not mandatory for a database to be in 3NF. However, if your purpose is the easy maintenance of data, less redundancy, and efficient access then you should go with a denormalized database.

53. Have you ever come across the scenario of recursive relationships? If yes, how did you handle it?

Ans:

A recursive relationship occurs in the case where an entity is related to itself. Yes, I have come across such a scenario.

Talking about the health care domain, it is a possibility that a health care provider (say, a doctor) is a patient to any other health care provider. Because, if the doctor himself falls ill and needs surgery, he will have to visit some other doctor for getting the surgical treatment.

So, in this case, the entity – health care provider is related to itself. A foreign key to the health insurance provider’s number will have to present in each member’s (patient) record.

54. List out a few common mistakes encountered during Data Modelling?

Ans:

 Few common mistakes encountered during Data Modelling are:

  • Building massive data models: Large data models are like to have more design faults. Try to restrict your data model to not more than 200 tables.
  • Lack of purpose: If you do not know that what is your business solution is intended for, you might come up with an incorrect data model. So having clarity on the business purpose is very important to come up with the right data model.
  • Inappropriate use of surrogate keys: Surrogate key should not be used unnecessarily. Use surrogate key only when the natural key cannot serve the purpose of a primary key.
  • Unnecessary de-normalization: Don’t denormalize until and unless you have a solid & clear business reason to do so because de-normalization creates redundant data which is difficult to maintain.

55. What is the number of child tables that can be created out from a single parent table?

Ans:

The number of child tables that can be created out of the single parent table is equal to the number of fields/columns in the parent table that are non-keys.

56. Employee health details are hidden from his employer by the health care provider. Which level of data hiding is this? Conceptual, physical or external?

Ans:

 This is the scenario of an external level of data hiding.

57.What is the form of fact table & dimension table?

Ans:

 Generally, the fact table is in normalized form and the dimension table is in de-normalized form.

58. What particulars you would need to come up with a conceptual model in a health care domain project?

Ans:

 For a health care project, below details would suffice the requirement to design a basic conceptual model

  • Different categories of health care plans and products.
  • Type of subscription (group or individual).
  • Set of health care providers.
  • Claim and billing process overview.

59. If a unique constraint is applied to a column then will it throw an error if you try to insert two nulls into it?

Ans:

 No, it will not throw any error in this case because a null value is unequal to another null value. So, more than one null will be inserted in the column without any error.

60. Can you quote an example of a subtype and supertype entity?

Ans:

 Yes, let’s say we have these different entities – vehicle, car, bike, economy car, family car, sports car.

Here, a vehicle is a super-type entity. Car and bike are its subtype entities. Furthermore, economy cars, sports cars, and family cars are subtype entities of its super-type entity- car.

A super-type entity is the one that is at a higher level. Subtype entities are ones that are grouped together on the basis of certain characteristics

61. What is the significance of metadata?

Ans:

 Metadata is data about data. It tells you what kind of data is actually stored in the system, what is its purpose and for whom it is intended.

Are you looking training with Right Jobs?

Contact Us

Popular Courses