Last updated on 08th Oct 2020, Blog, Tutorials
What is Data Modeling?
The process of creating a model for the storage of data in a database is termed as data modeling. It is a theoretical presentation of data objects and associations among various data objects. Data modeling is a process of formulating data in an information system in a structured format. It helps in analyzing data easily which will further help in meeting business requirements.
The process of data modeling requires data modelers which will precisely work with stakeholders and prospective users of an information system. Data modeling process ends with the creation of a data model that supports the business information system infrastructure. This process also involves understanding the structure of an organization and proposing a solution that enables the organization to achieve its objectives. It bridges the gaps between technical and functional areas.
Subscribe For Free Demo[contact-form-7 404 "Not Found"]
Check out this insightful video on Data Modeling:
Why use a data model?
Primary reasons for using a data model are listed below:
- Visual representation of data helps improve data analysis. It provides a holistic picture of the data which can be used by developers to create a physical database.
- All important data of an enterprise are accurately presented in the model. The data model reduces the chances of data omission. Data omission can lead to incorrect results and faulty reports.
- The data model portrays a better understanding of business requirements.
- It helps in the creation of a robust design that brings the entire data of an organization on the same platform. It assists in identifying the redundant, duplicate, and missing data as well.
- A qualified data model helps in providing better consistency across all projects of an enterprise.
- It improves data quality.
- It helps Project Managers with a better scope and quality management. It also improves performance to the core.
- It defines relational tables, stored procedures, and primary and foreign keys.
Three Perspectives of a Data Model :
Conceptual Model :
This level defines what needs to be present in the structure of the model in order to define and organize business concepts. It mainly focuses on business-oriented entries, attributes, and relations. It is basically designed by Data Architects and Business Stakeholders.
Logical Model :
The logical model defines how the model should be implemented. It broadly includes all kinds of data that need to be captured such as tables, columns, etc. This model is generally designed by Business Analysts and Data Architects.
Physical Model :
The physical model defines how to implement a data model with the help of the database management system. It outlines the implementation methodology in terms of tables, CRUD operations, indexes, partitioning, etc. It is created by Database Administrators and Developers.
Types of Data Models :
There are various approaches to data modeling, but the base concept remains the same for all types of models. Let’s take a glance at some of the data models that are popularly used:
Hierarchical Model :
This is a database modeling that is based on a tree-like structuring. Here, each of the records has a single root or parent. When it comes to sibling records, they are sorted in a particular order. This order is used as the physical order for storing the database. You can use this type of modeling for many real-world model relationships. During 1960s and 1970s, this database model was all the rage. But due to some inefficiencies, they are very rarely used now.
Relational Model :
This was initially proposed as an alternative to the hierarchical model in 1970 by an IBM researcher. It doesn’t require developers to define the data path. Here, data segments are explicitly combined with the help of tables. This model has reduced the program complexity. It requires detailed knowledge of the physical data storage adopted by the organization. Soon after the introduction of this model, it was combined with Structured Query Language (SQL).
Network Model :
The network model can be built on the hierarchical model, wherein it allows multiple relationships among linked records which implies that it has multiple parent records. As per the mathematical set theory, we construct the model with sets of related records. Each set consists of a parent record and multiple child records. Each record can belong to multiple sets and allows the model for conveying complex relationships.
Object-oriented Database Model :
The object-oriented database model consists of a collection of objects. These objects have associated features and methods. We have various kinds of object-oriented databases, namely multimedia database, hypertext database, and more. This type of a database model is known as a post-relational database model as it is not limited to tables, even though it incorporates tables. We can call such types of database models as hybrid models.
Entity–Relationship Model :
As the name indicates, the entity–relationship model is a graphical presentation of entities and their relationships. It is also known as the entity–relationship (E–R) diagram. An entity can be called as a concept, a piece of data, or an object about which the data (and the relations surrounding the data) is stored.
Object-relational Model :
We can think of the object-relational model as a relational model having the advanced functionality of the object-oriented database model. What this type of a database model allows is to let designers incorporate the functions into a familiar table structure.
Facts and Dimensions :
To learn data modeling, one really needs to understand its facts and dimensions.
1. Fact Table: It is a table containing measurements and granularity of every measurement. Facts can be additive or semi-additive, for example, sales.
2. Dimension Table: It is a table that collects fields containing descriptions of business elements and is referred by multiple fact tables.
Dimensional Modeling :
Dimensional modeling is a design technique of data warehouse. It uses confirmed dimensions and facts and helps in easy navigation. Dimensional modeling design helps in fast performance query. Dimensional models are casually known as star schemas.
Keys Related to Dimensional Modeling :
Keys are important to understand while we learn data modeling. Keys of dimensional modeling are divided into five categories.
- 1. Business or Natural Keys: It is a field that uniquely identifies an entity. For example, customer ID, employee number, etc.
- 2. Primary and Alternate Keys: Any field containing a unique record can be called as a primary key. The user needs to select one of all available primary keys, and the remaining become alternate keys.
- 3. Composite or Compound Keys: When more than one field is used to represent a key, it is referred to as a composite key.
- 4. Surrogate Keys: It is a field with no business meaning and is usually auto-generated.
- 5. Foreign Keys: It is a key that points to another key in some other table.
The process of data modeling involves designing and producing all types of data models. These data models are then converted through a data definition language. Data definition language is used to generate a database. This database will then be termed as a fully attributed data model.
Advantages and Disadvantages of Data Models :
- 1. Data objects provided by the functional team are presented accurately with data modeling.
- 2. Data modeling allows you to query data from the database and derive various reports based on the data. It indirectly contributes to data analysis with the help of reports. These reports can be used for improving the quality and productivity of the project.
- 3. Businesses have plenty of data in a variety of formats. Data modeling provides a structured system for such unstructured forms of data.
- 4. Data modeling improves business intelligence by making data modelers work closely with the ground realities of the project which include gathering data from multiple unstructured sources, reporting requirements, spending patterns, etc.
- 5. It improves communication across the organization.
- 6. It helps in documenting data mapping during the ETL process.
- 1. Development of a data model is a very tedious job. One should be aware of the physical characteristics of the data storage.
- 2. This system involves complex application development and knowledge of biographical truth.
- 3. The model is not quite user-friendly. Small changes induced in the system require major modification in the entire application.
Data models are developed for the data to be stored in a database. The main objective of these data models is to ensure that data objects created by the functional team are denoted accurately. As said earlier, even the smallest change in the system will require changes in the entire model. However, despite some drawbacks, the data modeling concept is the first and major phase of database design, because it defines data entities, the relations among data objects, etc. A data model holistically talks about the business rules, government policies, and regulatory compliance on the data.
Data Models Describe Business Entities And Relationships :
Data models are made up of entities, which are the objects or concepts we want to track data about, and they become the tables in a database. Products, vendors, and customers are all examples of potential entities in a data model. Entities have attributes, which are details we want to track about entities—you can think of attributes as the columns in a table. If we have a product entity, the product name could be an attribute.
Entities don’t exist in isolation; they’re connected to each other. The connections between entities in a data model are called relationships, and relationships reflect business rules. Relationships between entities can be one-to-one, one-to-many, or many-to-many.
The relationship between products and vendors can illustrate a one-to-many relationship. For example, if Vendor A makes Widget 1 and Widget 2, that means a single vendor makes many products. If you think about this relationship in the other direction, it still makes sense—I can purchase multiple products from the same vendor. One-to-many relationships are by far the most common type, but it’s important to know how to deal with the other two kinds of relationships.
One-to-one relationships usually occur when a business needs to use entities that are supertypes and subtypes, also called parents and children. For example, a product could be a supertype, and a specific kind of product, like a book, could be the subtype. In some situations, there are good reasons for storing general product data in a separate table from more specific data about books, and this can help in representing hierarchical relationships among entities. In this example, a single book title should be associated with a single product ID number, and vice versa, making this a one-to-one relationship between products and books.
The third kind of relationship, a many-to-many relationship, happens when multiple instances of one entity are related to multiple instances of another entity. Books and authors are a great example of this relationship—one author can write multiple books, and a single book can be written by multiple authors. Although many-to-many relationships exist, you don’t actually see them in data models. Instead, we use associative entities, which are tables that break the many-to-many relationship into many-to-one relationships. In this case, you might have a books table, an authors table, and an author-book table to link the two.
Data Modeling Should Enforce Data Integrity :
When we talk with clients about leveraging their data, data integrity is a crucial prerequisite. Before companies can start using their data to make decisions, they need to be able to trust that the data sets are accurate and reliable. In data modeling, there are two kinds of rules that are foundational to maintaining data integrity—entity integrity and referential integrity.
Entity integrity means that the data within a single entity or table are reliable. The use of primary keys is an essential step toward entity integrity. Primary keys are unique identifiers, such as product ID numbers, that serve the purpose of identifying a particular record and preventing data duplication. There are three parts to the entity integrity rule:
- All entities should have a primary key.
- The values of all primary keys must be unique.
- The value of a primary key cannot be null.
Referential integrity means that the relationship between two entities or tables is reliable. The use of foreign keys is an essential step toward referential integrity. Foreign keys are the primary keys of one table that appear in a different table. The rule of referential integrity says that for any foreign key value in one table, there must be a matching primary key value in the referenced table.
Data Modeling Involves Normalization :
An introduction to data modeling would not be complete without mentioning the concept of normalization. Normalization is the process by which anomalies are avoided and redundancy is eliminated, and the concept was first suggested by a data scientist named Edgar Codd. According to Codd, normalizing a data model means structuring data so that each entity only has one theme or topic. In more technical terms, we refer to this as removing partial dependencies and transitive dependencies.
Let’s say I want to store data about customer purchases. I’ll want to know information like the dollar amount purchased, the customer name, and the store where the purchase took place. I could store all of these data points in a single table, but that would lead to problems. For example, a customer who has made hundreds of purchases may get married and change her name. To reflect that change in my records, I would have to go through my table and change her name for every single purchase she’s ever made. Instead of dealing with hundreds of changes, a normalized data model would allow me to make only one change. In this situation, I should store customer data in a customer table, product data in a product table, store data in a store table, and so on. By using primary keys and foreign keys, I can link these tables together and access all the information I need. Better yet, when I need to change a customer’s name, I only have to make the change in one place. The diagram shown here is a very simple illustration—most data models are much larger and more complex.
Data modeling plays a crucial role in the growth of any business who understands that data-driven decisions are key to their success. Having your data in the right format makes sure that you can get to the answers to your business questions more easily and quickly.
Every day 2.5 quintillion bytes of data are created, and this pace is likewise accelerating at a daily rate. With so much information at our disposal, it is becoming increasingly important for organizations and enterprises to access and analyze the relevant data to predict outcomes and improve services.
However, arbitrarily organizing the data into random structures and relationships is not enough. In order to access the data properly and extract the most out of it, it is essential to model your data correctly.
Are you looking training with Right Jobs?Contact Us
- Salesforce Tutorial
- Adobe Experience Manager Tutorial
- Salesforce Developer Tutorial
- What is Dimension Reduction? | Know the techniques
- Difference between Data Lake vs Data Warehouse: A Complete Guide For Beginners with Best Practices
- What is Dimension Reduction? | Know the techniques
- What does the Yield keyword do and How to use Yield in python ? [ OverView ]
- Agile Sprint Planning | Everything You Need to Know