Scd type LEARNOVITA

What are Slowly Changing Dimension | SCD Types and Implementations | Step-By-Step Process

Last updated on 04th Nov 2022, Artciles, Blog

About author

Bharathi Suresh (Talend ETL Developer )

Bharathi Suresh has extensive experience with SQL, Teradata, Data warehousing, Talend open studio, MDM & MDM, ETL, MySQL. His articles assist in sharing information and abilities in core fields and provide students with informative knowledge.

(5.0) | 18854 Ratings 2156
    • In this article you will learn:
    • 1.Introduction.
    • 2.Slowly changing dimension.
    • 3.Types of Slowly changing dimension.
    • 4.Conclusion.

Introduction:

A Slowly Changing Dimension (SCD) is the dimension that stores and manages the both current and historical data over time in data warehouse.

Slowly changing dimension:

  • In a data management and data warehousing, a slowly changing dimension (SCD) is the dimension that consists of relatively static data that can change slowly but unexpectedly, rather than on regular schedule. Some examples of a specific slowly changing dimensions are entities in the form of the names of a geographic locations, customers or products.
  • For example the database may contain fact table that are stores sales records. This fact table will be linked to dimensions by a foreign keys. One of these dimensions can contain a data about a company’s salespeople: for example regional offices they work in.
  • However a salespeople are sometimes transferred from the one regional office to another. A Historical sales reporting purposes may need keeping records of the fact that a particular sales person was assigned to the particular regional office at earlier date, whereas that sales person is now assigned to a various regional office.
  • Tackling these problems are includes SCD management practises referred to as a Types 0 to 6. Type 6 SCDs are sometimes called a hybrid SCDs.
SCD Types

Types of Slowly changing dimension:

Type 0: retain original

Type 0 dimension attributes are never change and are assigned to attributes that have durable values or are explained as ‘native’.Example: date of birth, original credit score.

Type 1: overwrite

This method overwrites an old data with a new data, and therefore does not track a historical data.Example of a supplier table:

  • supplier_key supplier_code supplier_name supplier_state
  • 123 ABC Acme Supply Company Ca
  • In above example, supplier_code is a natural key and supplier_key is surrogate key. Technically, surrogate key is not necessary, as the row will be unique by a natural key (supplier_code).
  • If supplier moves a headquarters to Illinois, the record will be an overwritten by:
  • supplier_key supplier_code supplier_name supplier_state
  • 123 ABC Acme Supply Company IL
  • The disadvantage of a Type 1 method is that there is no history in a data warehouse. However it has advantage that it is simple to maintain.
  • If one has computed an aggregate table summarizing facts by a supplier state, it will need to be a recalculated when supplier_state changes.

Type 2: add new row

This method tracks a historical data by creating the multiple records for a given natural key in a dimensional tables with various surrogate keys and/or different version numbers. Unlimited history is a preserved for each insert.For example, if a supplier moves to Illinois, version number will be an incremented sequentially: supplier_key supplier_code supplier_name supplier_state version

  • 123 ABC Acme Supply Company Ca 0
  • 124 ABC ACME SUPPLY COMPANY IL 1
  • The Another way is to add an Effective Date column.
  • supplier_key supplier_code supplier_name supplier_state start_date end_date.
  • The start date/time of a second row is equal to the end date/time of previous row. The null End_Date in line two indicates a current tuple version. Instead of a standardized surrogate higher date (such as 9999-12-31) can be used as a end date, so that the field can be included in index, and so that zero-value replacement is not be required when querying Ho.
  • And third method uses an effective date and also current flag.
  • Supplier_key supplier_code supplier_name supplier_state effective_date current_flag.
  • The Current_Flag value of ‘Y’ indicates a current tuple version.
  • Transactions that refer to the particular surrogate key (supplier_key) are then permanently bound to time slice defined by that row of the slowly changing dimension table. A composite table summarizing facts by the supplier state showing a historical position that is the state the supplier was in at a time of the transaction No update is be required. In order to refer to an entity via the natural key, it is important to overcome unique constraint that makes it impossible by aDBMS.
  • If changes are made retroactively to content of a dimension, or if new attributes are added to dimension (for example the Sales_Rep column) that have various effective dates than those already explained , this may result in the need for an existing transactions Updated to reflect a new status. This can be an expensive database operation, so Type 2 SCD is not good choice if a dimensional model is subject to frequent changes.

Type 3: add new attribute

Type 3 preserves the limited amount of history because it is limited to a number of columns specified for a storing historical data. Type 1 and Type 2 have same basic table structure but Type 3 adds additional columns. In following example, an additional column has been added to a table to record the original position of supplier – only a past history is stored.

Type 4: add history table

The Type 4 method is usually referred to be using a “history tables” where one table holds a current data, and an additional table is used to maintain a record of some or all of changes. Both the surrogate keys are referenced in a fact table to increase query performance.

SCD Type 2

Type 5 : Technology build

  • The Type 5 technology builds on a Type 4 mini-dimension by an embedding a “current profile” mini-dimension key in a base dimension which is overwritten as Type 1 attribute. This approach is called a type 5 because 4 + 1 equals 5. Type 5 slowly changing the dimensions allows currently assigned a mini-dimension attribute values to be accessed with the others of base dimension without being linked by a fact table.
  • Logically usually represent a base dimension and the current mini-dimension profile outrigger as a table in a presentation layer.
  • The outrigger attribute must have distinct column names, like “current income level”, to distinguish them from the attributes of a mini-dimensions associated with fact table.
  • Whenever a current mini-dimension changes over time,an ETL team must update/overwrite a Type 1 mini-dimension reference.
  • If outrigger approach does not offer a satisfactory query performance, mini-dimension attributes can be physically embedded (and updated) in a base dimension.

Type 6: Combined approach

  • The type 6 method combines a methods of types 1, 2 and 3 (1 + 2 + 3 = 6). One possible explanation for an origin of the term was that it was coined by a Ralph Kimball during a conversation with the Stephen Pace of Kaleido [citation needed]. Ralph Kimball in a Data Warehouse Toolkit calls this method “unexpected changes with a single-version overlays”.
  • The supplier table begins with the record for an example supplier: supplier_key row_key supplier_code supplier_name current_state historical_state start_date end date current_flag 123 1 ABC Acme Supply Company CA CA 2000-01-01T00:00:00 9999-12-31T23:59:59 Y Current_state and historical_state are be same. The optional Current_Flag attribute indicates the whether this is a current or latest record for this supplier.
  • When Acme Supply Company moves to the Illinois, we add a new record, as in Type 2 processing, although row key is included to ensure we have a unique key for every row: supplier_key row_key supplier_code supplier_name current_state historical_state start_date end date current_flag
  • 123 1 ABC Acme Supply Company IL CA 2000-01-01T00:00:00 2004-12-22T00:00:00 N
  • 123 2 ABC ACME SUPPLY COMPANY IL IL 2004-12-22T00:00:00 9999-12-31T23:59:59 Y

Overwrite a Current_State information in a first record (Row_Key = 1) with a new information, as in type 1 processing. create a new record to track changes, as in a type 2 processing. And can store the history in the another state column (Historical_State), which contains a type 3 processing.For example, if a supplier were to be moved again, would add another record to supplier dimension, and would overwrite contents of the Current_State column.

Type 7: Hybrid[4]

  • An alternative implementation is to place the both surrogate key and natural key in a fact tableThis allows the user to select an appropriate dimension record based on.
  • The primary effective date on a fact record (above), most recent or a current information, Any other date attached to a record of fact.
  • This method allows for much flexible link to dimension, even if one used Type 2 approach rather than Type 6.

Conclusion:

To understand an importance of SCDs, it is best to look at example. A company that sells a cell phones divides devices into three categories: high, medium and basic. In this area, technology develops a rapidly, so that high-end (for example) terminals may be considered a mid-range with an advent of a new equipment a few months later.The company’s ERP transactional system stores an information in the standardised database. On one hand, have the “Sales” table, where each transaction is be recorded. Another “Products” table contains a technical information for mobile, which includes field indicating the range.The marketing department, while loading a data into data warehouse, needs to know how many cell phones are sold every year in each category. What happens when cell phone changes range? By modifying this data in a Products table, all the past sales are also changed to the new category, falsifying the information and making it impossible to know this data for a sure.

Are you looking training with Right Jobs?

Contact Us

Popular Courses