Monday, March 23, 2015

Dynamic Labelling of Dimension Attributes

In Dimensional Modelling, dimensions are usually implemented as single, fully denormalized tables. This morphology translates into a simplified and more readable structure as well as into a performance gain when processing user queries, since less joins between tables are involved.

Dimensions essentially consist of attributes. Each dimension attribute should always be uniquely identified by a code and might show one or more descriptive fields:

The "Car" Dimension and its attributes.

In this example, each car to be sold is associated with a model, a manufacturer, a building year, a price, and a color. "building year" and "price" are numeric attributes and don't need therefore any further additional descriptive information - in some cases we might even copy the price field back to the fact tables, in order to allow user analysis over it. But what about "model", "manufacturer" and "color"?

Those attributes are uniquely identified by a code. However, during the processing of the dimensional table, we need a way to dynamically associate these codes with their corresponding text labels. Here a simple, ANSI-standard SQL-based solution:

; with
        ID_iPvt                         =    ID // Dimension ID
        , Color_Descr                   =    max([color])
        , Manufacturer_Descr            =    max([manufacturer])
        , Model_Descr                   =    max([model])
                , m.Code
                , m.Description
                tb_Dim_Car k
                inner join tb_Car_Attribute_Map m on
                    (m.Code = k.Color_Code and m.Type = 'Color')
                    or (m.Code = k.Manufacturer_Code and m.Type = 'Manufacturer')    
                    or (m.Code = k.Model_Code and m.Type = 'Model')
        ) Src
            for [Attribute]  in
                , [manufacturer]
                , [model]
        ) Pvt   
    group by   
    ID                              =    k.ID   
    , Color_Code                    =    k.Color_Code
    , Color_Descr                   =    coalesce(p.Color_Descr, 'n/a')
    , Manufacturer_Code             =    k.Manufacturer_Code
    , Manufacturer_Descr            =    coalesce(p.Manufacturer_Descr, 'n/a')
    , Model_Code                    =    k.Model_Code
    , Model_Descr                   =    coalesce(p.Model_Descr, 'n/a')
    , Building_Year                 =    k.Building_Year
    , Price                         =    k.Price
    tb_Dim_Car k
    left outer join iPvt p on
        p.ID_iPvt = k.ID


This simple query joins every attribute code/key of our "tb_Dim_Car" dimension with the "tb_Car_Attribute_Map", a 2NF table containing all metadata we need in order to allow a dynamic mapping across all attributes:

The tb_Car_Attribute_Map mapping table.

The result of this join will be then pivoted back and outer joined with the original dimensional table. In order to allow this, however, we need a unique "ID" for each dimension row - in case we lack it, a ranking windowing functions comes of course at handy - if supported by the database engine.

This pattern can be easly extended in case of multilngual description attributes by simply adding a "Language Code" column in the mapping table.

No comments:

Post a Comment