Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Using fact tables to build reference table

I'm looking for some advice on a project I'm working on.  Essentially I have multiple fact tables, and multiple dimensional/reference tables.  Here's the problem: The reference tables are sometimes incomplete.  They need information from the fact tables appended to them to have a complete list of the items.

Good example: I have two sets of sales data that have the Seller ID and Seller Name, and Customer ID and Customer Name.  Because people are always coming and going, one sales data set has some seller IDs or Customer IDs, and the other set may have the other.  The reference/dimensional table that is supposed to have a list of all seller IDs and Names may be missing the name or ID from one or both fact sets.

I can't build a reference/dimensional table from just the facts sets because the fact sets may be missing some of the information (IE may have a customer ID without a name).  Any thoughts would be appreciated.

5 Replies
jagan
Luminary Alumni
Luminary Alumni

Hi Jordan,

If you attach sample data then it would be easier to understand and provide the solution.

Regards,

Jagan.

Anonymous
Not applicable
Author

Hi,

I would suggest that instead of having facts, have dimensions. This way, the dimensions will have data associated to that dataset. EG: a person dimension will have; name, customerID, etc. The fact than will just have references to the dimensions as well as the measures that are associated. Basically, create a star-schema model.

jonathandienst
Partner - Champion III
Partner - Champion III

Here is a design pattern for loading a dimension into a separate dimension table, including possible multiple sources, and handling non referenced id values. This option is good if the dimension contains many attributes:

// Load Fact table

Fact:

LOAD ...,

  CustomerID,

  ...;

// Load first dimension source

// Not exists ensures that only one instance of each ID is loaded

DimCustomer:

LOAD CustomerID,

  CustomerID As _CID,

  CustomerName,

  ...

From Source1

WHERE Not Exists(_CID);

// If you have multiple sources for the dimension, load them like this

// Again, the not exists ensures that you load each ID only once, and

// if it was loaded for the frst source, it is not loaded again.

LOAD CustomerID,

  CustomerID As _CID,

  CustomerName,

  ...

FROM Source2

WHERE Not Exists(_CID);

// Now load the unknowns. These IDs were not found in the reference data

LOAD Distinct CustomerID,

  'Unknown' As CustomerName

Resident Fact

Where Not Exists(_CID, CustomerID);


DROP Field _CID;   // not needed any more


Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
jonathandienst
Partner - Champion III
Partner - Champion III

If you want to integrate the CustomerName (in the previous example) into the act table, then use this pattern instead. This option is good if you only loading one or two fields in the dimension.

// Load first dimension source

// Not exists ensures that only one instance of each ID is loaded

DimCustomer:

LOAD CustomerID,

  CustomerID As _CID,

  CustomerName

From Source1

WHERE Not Exists(_CID);

// If you have multiple sources for the dimension, load them like this

// Again, the not exists ensures that you load each ID only once, and

// if it was loaded for the frst source, it is not loaded again.

LOAD CustomerID,

  CustomerID As _CID,

  CustomerName

FROM Source2

WHERE Not Exists(_CID);

MapCustomers:

Mapping LOAD * Resident DimCustomer;

// Load Fact table

Fact:

LOAD ...,

  CustomerID,

  ApplyMap('MapCustomers', CustomerID, 'Unknown') As CustomerName,

  ...;

DROP Table DimCustomer;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Not applicable
Author

This actually seems like exactly what I'm looking for.  I will definitely try this out today and see if it works.