Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Possible alternative to concatenate, or link tables?

I am reading about using concatenation and link tables to resolve circular references associated with a multi-fact QV data model.  As a star schema data modeler, I have always created “logical stars”, consisting of separate fact tables for each business process (Ex: Sales, Purchasing, Customer Service, Inventory, etc.)  I struggle with mixing apples and oranges in to a single table (concatenation).  I also struggle with introducing the possibility of dirty data by "manufacturing data" (in ETL), for all combinations of facts (fact link table).  I am used to other metadata based tools like OBIEE or Business Objects , which use “alias” column names to resolve circular references. 

 

I have demonstrated this in the below QV data model below.  Doing it this way allows multiple fact stars in the same data model with no circular references, and a “real” star schema.

 

Does anyone see any reason my logic is flawed?

 

-------------------------------------------------------------------------------

  

Dimensional Modeling (star schema) for QV data model :

No need to concatenate fact tables, or use link tables to avoid circular references.  Also, xxx_id "foreign keys" are necessary in the facts to allow for slowly changing dimensions.

   

FactSales:

 

%sales_product_id,

%sales_period_id,

%Promotion_id,

sales_amount,

--------------------------


FactPurchases:

 

%purchase_product_id,

%purchase_period_id,

purchase_amount,

-----------------------------------

  

DimProduct:


%sales_product_id,

%purchase_product_id,    (same value as %sales_product_id)

product_code,                     (business key, unique by start_date)

product_name,                   (SCD Type 2)

category,                              (SCD Type2)

product_cost                     (SCD Type2)

start_date,

end_date,

IsCurrentRecordFlag      (set to Yes for only one distinct product_code)

...

--------------

 

DimCalendar:

 

%periodID,  (AutoNumber() generated key, used in set analysis YTD, LYTD, etc)

%sales_period_id  (same value as %PeriodID)

%purchase_period_id (same value as %PeriodID)

QuarterID,  (used in set analysis)

Date,

Year,

Month,

MonthYear,

Quarter,

YearWeek

...

---------------------

 

SalesPromotion:

 

%Promotion_id,

Promotion_name,

promotion_start

promotion_end

...

1 Solution

Accepted Solutions
hic
Former Employee
Former Employee

I strongly agree with Steve here. A very common example is, as Steve points out, Budget vs Actuals. Here you almost always have different granularity in the two fact tables. If you just load the tables as they are, you will get several circular references:

Image1.png

So you have only two options: either to use a link table (containing all the keys), or to use concatenation. My experience is that a concatenation has better performance and is easier to implement. And - you can use (roughly) the same dimensional tables as you have.

HIC

View solution in original post

7 Replies
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi David,

There are a couple of reasons why I tend to opt for bringing as many dimensions as possible into my main fact table, rather than going for the star or snowflake approach.

The first is one of performance, as users click around your QlikView app all the charts and tables that are visible will be recalculated.  QlikView has to ensure it has all the data required to do that recalculation to hand and associated.  It takes slightly more processing to do this if there are more tables in play.

Another reason I like bringing everything together at the QVD generation stage is that you create a smaller number of more useful QVD files.  If you want a number of developers to all build on the same data layer then you will need to do less explaining if all the dimensions those developers need are in a single file.

Regarding concatenation, I would again advocate this as an approach.  As well as performance here; selections can become a lot simpler if tables are concatenated.  A prime example is an Actuals file and a Budget file.  If you try and join the two on a composite key of Month / Office / SalesPerson (or whatever) then if there are rows in the budget that don't exist in the acutals selections don't work - that is true even when your actuals go up to today and your budget is for the whole year.  If you concatenate the budget onto your actuals any columns that are the same can be selected on (Month, Product Type, Office) and both acutal and budget rows will be selected.  If you pick a dimension in the actuals that doesn't exist in the budget (perhaps Date or SKU) then the budget numbers are excluded - but then a comparison of a single SKU against the whole months budget would not make sense anyway.

With actuals and budgets in separate tables with a join key between them selections on fields can cause a fair bit of weirdness to take place.  Put them in one table and everything is quite a bit simpler.  The same is also true of other combinations of files, such as purchases and sales files.

When concatenating tables in this way you do need to keep an eye on whether you are keeping loads from QVD's optimised (at least for large tables) and there are approaches that can help ensure this.

These sort of performance best practices are exactly the sort of thing I often blog about over on the Quick Intelligence blog.

Cheers,

Steve

Quick Intelligence

Not applicable
Author

Steve,

QVD load design can still be utilized, using multiple facts, (as well as dimensions), so not sure why you say separate facts would not allow that.  Ex Products.qvd, Actuals.qvd, Budget.qvd, etc.

Regarding rows being eliminated in the sales vs actuals, I question that, as long as a calendar selection is made (Ex. Month / year, etc).  Are you saying that if ProductName is also selected in addition to MonthYear (actuals has product grain, then budget numbers (no product grain) would not show?   I will test that out.  If you are right on that one, then my theory will definitely now work.

As far as join keys are concerned, I would use always surrogate number based keys, so there would be no concatenation of "business" keys.

I will test your scenarios out with multiple facts and differing grains, and let you know.  Thanks so much getting back, and shedding light on these issues.

Not applicable
Author

Steve,

My model still generates a circular reference, due to DimCalendar.  So, my theory is incorrect.  Definitely need concatenated tables.

Thanks so much for your insight.

hic
Former Employee
Former Employee

I strongly agree with Steve here. A very common example is, as Steve points out, Budget vs Actuals. Here you almost always have different granularity in the two fact tables. If you just load the tables as they are, you will get several circular references:

Image1.png

So you have only two options: either to use a link table (containing all the keys), or to use concatenation. My experience is that a concatenation has better performance and is easier to implement. And - you can use (roughly) the same dimensional tables as you have.

HIC

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Thank you Henric for adding your thoughts here - it is always nice to have my opinions by someone who really knows the inner workings of QlikView.

The fact of the matter is that with most RDMSs you can have a different set of joins for each query run, and pull data from different data islands to merge into a single report.  QlikView requires you to sort your data model once and everything then works off that.  This can present an initial challenge, but once your data model is right building numerous different interpretations and presentations of that data is easy.

As I always say, when you are building a QlikView app you will typically be spending longer getting the back end right that you do on the presentation.

As long as you recognise, understand and embrace the differences with QlikView data modelling to that of other systems you are away.

Steve

Not applicable
Author

Yes, thanks - I was working on something similar last night which proved out that flaw.  I am now a believer in concatenation.

Now I will have to figure out how Qlikview deals with "role playing" dimensions, which would cause circular references as well.  Ex.  OrderDate, ShipDate both in the same fact table.  But I will not hijack this post with that

hic
Former Employee
Former Employee