Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
joioan
Contributor III
Contributor III

A join that doesn't surprisingly work

Hello everyone, I would like to discuss this case and I'm sure someone can explain to me why this happens.
In my following script, table1 is a fact table of data about Activities that actually have happened and table 2 is a  fact able about all activities, including the ones in table1 and more that are announced but haven't happened yet so they cannot be in table1 yet. Dim_team is a dimension table with team details. The cardinality is many to 1 such as an Event_id can only belong to an ID_Activity, but an ID_Activity and can have many Event_ids.  So the question now is: I tried to right join the tables with a composite key "Activity_Key" (because in fact ID_Activity and ID_for_Activity are the same and so are ID_Team and ID_of_Team) so that I can have all data for Activities even when they have nulls in table1.
The result I'm getting is not what I expected, it gets partial information, doesn't match info as it should.
A link table saved the day, but I was -and still am- very confused about that.
 
 
 
 
dim_team:
 
LOAD ID_Team,
      Team_Name
      Team_Code
FROM [$(vPathQVD)team.qvd]
(qvd);
 
 
 
fact_table1:   
//
LOAD
    "Event_id",
    "ID_Person",
    "ID_Team",
    "ID_Activity",
    "ID_Team"&"ID_Activity" as Activity_Key
 
FROM [$(vPathQVD)table1.qvd]
(qvd);
 
 
right join(fact_table1)
//fact_table2
LOAD
 
    "ID_for_Activity",
    "ID_of_Team",
    "ID_of_Team"&"ID_for_Activity" as Activity_Key,
    "Activity_Name",
    "Participants_Number",
     Date,
     Place
 
    
FROM [$(vPathQVD)Activity.qvd]
(qvd);
 
 
I could use some help! Thank you 🙂
Labels (3)
9 Replies
jchx
Contributor III
Contributor III

"so that I can have all data for Activities even when they have nulls in table1." Are you missing the null activities? Any chance you unchecked "Include Null Values" in the measure?

BrunPierre
Partner - Master
Partner - Master

The issue might be because you're using a right join, which retains all records from the right table (fact_table2) and only matches records from the left table (fact_table1). Since fact_table1 is your main fact table containing data about activities that have happened, it's possible that some records from fact_table2 are not present in fact_table1, leading to partial or unexpected results.

joioan
Contributor III
Contributor III
Author

Hello everyone,

@BrunPierre but the thing is I wanted to have data for every activity (even it hasn't happened yet) along with the details for those who happened, so a right join would solve this for me - every activity_id, with nulls if it hasn't happened and all the info matched if it indeed has. Am I grasping something the wrong way about right join? 

 

They were partial in a very weird way, not even retaining all info from fact_table2... weird things. The cardinality shouldn't be a problem right?

BrunPierre
Partner - Master
Partner - Master

Based on your description, it appears that the result would resemble that of an "OUTER JOIN." The final table combines data from both tables based on a related field, while including all records, even if there is no match.

marcus_sommer

What you are doing seems to be suitable to your description. This means if it's not working your data are different to your expectation, for example if the relationship isn't 1:n else n:m and/or the fact_table2 has not all keys of the fact-table1 and/or there are issues with the data-quality of your keys respectively to the way they are created.

If the combined ID fields contain numbers you may get invalide ones by applying:

ID1 & ID2 as Key

Better would be to add a delimiter between them, like:

ID1 & '|' & ID2 as Key

joioan
Contributor III
Contributor III
Author

@marcus_sommer I will certainly try it, because yes, I believe they're ints.

In fact another thing I tried before was to only right join fact_table2 with fact_table1 on field Activity_id, but guess what, data was off, then I created Activity_Key to get all possible combinations for common fields of both tables and then right joined, but only my solution with link table worked. This is a first-timer.

I'm pretty sure the cardinality is 1 to many between my fact tables. 

marcus_sommer

I think if you investigate the missing and/or wrong data more closely you should detect why it happens - just picking one or two and then fixing the cause and then going to the next - because there might be several different reasons.

This shouldn't be done with any charts else with table-boxes and very helpful might be also to add recno() and rowno() in both loads and naming them differently. With it you could track each record. Sometimes it's also useful to load the origin tables additionally as independent tables, like:

qualify *; t1: load *, recno() as RecNo, rowno()as RowNo from fact1; unqualify *;
qualify *; t2: load *, recno() as RecNo, rowno()as RowNo from fact2; unqualify *;

which enables a direct comparing of key-values from the sources as well as from the target.

In each case take a closer look on the above mentioned invalide key-creation.

Personally I wouldn't use either a join nor a link-table else I would use a mapping to merge the information from both facts, for example:

m: mapping load 
    "ID_Team" & '|' & "ID_Activity", 
    "Event_id" & '|' &  "ID_Person"  & '|' & "ID_Team"  & '|' & "ID_Activity"
FROM [$(vPathQVD)table1.qvd] (qvd);
 
and then:
 
fact: load *,
   subfield(applymap('m', "ID_of_Team" & '|' & "ID_for_Activity", 'no match|'), '|', 1) as Event_id,
   ...
from FROM [$(vPathQVD)Activity.qvd] (qvd);
 
because a mapping has no risks to change the number of records, provides the possibilities to calculate directly with the results without applying any following loads, providing the specify a non-matching default-value to avoid NULL and the mapping-table(s) as well as the applymap-call could be nested in various ways and are not mandatory restrict to a single return-field.
joioan
Contributor III
Contributor III
Author

@marcus_sommer , wow such a cool bunch of ideas to try and to think about.

Regarding your last proposed solution, I'm only confused as per why do I need Event_id as a column in my Activity table when all I want to have is activities. Did you do that in order to join with fact1 on event_id after this?

 

marcus_sommer

I just took the fields from your table1 and the right join should be added them, too. And it was also a demonstration to combine multiple returns and how to split them again with a subfield(). If you don't need this field just skip it.