Solved: Join data duplicating records - Qlik Community

qlikuser233 · ‎2023-11-16

I am joining two data sources, the issue is that I am brining in certain columns for each and they share 4 columns with the primary key being unique identifier. The columns that are missing from Tracker 2 are populating as 0's instead of merging into one line.

Tracker:

LOAD Client,

[Client ID],

[Team Cd],

[Policy/Contract Effective Date],

[Policy/Contract Expiration Date],

[Pol/Cont Num],

[Captive or Non-Captive],

[Policy/Contract Effective Date Conversion],

Adjustment,

[Client ID]&[Pol/Cont Num]&[Policy/Contract Effective Date] as [Unique Identifier],
[3133]

[3137]

Outer join (Tracker)

Tracker_2:

LOAD A as [Client ID],

B as [Client],
C as [Pol/Cont Num]

D as [Policy/Contract Effective Date],

I as [Unique Identifier],

Z as [Timing - Waiting for Registration (New/RNW or Adjustment) - Premium - COU, PAU, ACR],

Attached is a photo for one record. It splits into 2 lines, the top line is coming from tracker 2 as it has 0's where it should be filling in the information from Tracker. I have crossed out the unique identifier number but they are identical so I cannot figure out why this is creating a line for each source rather than merging them when the primary key for both sources is identical.

marcus_sommer · ‎2023-11-16

Like @Or mentioned a join is performed against all key-fields which means that all field-values of them must be matching.

Further missing keys and/or any duplicates will have an impact on the resulting data-set and may adding respectively removing records (depending on the applied join-type). To get rid of it could be quite difficult especially if there is not enough knowledge and control in regard to the data-quality. Therefore I suggest to change the join-logic to a mapping which is often easier, more flexible and performant and without any risk to change the number of records:

Don't join - use Applymap instead - Qlik Community - 1467592

View solution in original post

Or · ‎2023-11-16

You marked out the exact parts needed to try and see what's going on, so having to go on a guess here, but... keep in mind that joins are based on all fields with identical names. There's no concept of "primary key" in a Qlik based join.

qlikuser233 · ‎2023-11-16

Everything I marked out is identical but I can redo it with sample data:

Client ID	Client	Pol/Cont Num	effective date	Unique ID	3033	3137
123	Crumbl	456	999	123456999	0	0
123	Crumbl	456	999	123456999	1	587

so 3033 and 3137 are coming from Tracker and I would expect to see one line here but it is creating 2. @Or

qlikuser233 · ‎2023-11-16

Client ID, Pol/Cont Num, effective date and Unique ID are all shared fields but it should only be using Unique ID to match. The rest of the columns have potential duplicates so how can I force it to match based only on unique ID? @Or

Or · ‎2023-11-16

These don't seem to match up - you create the unique ID by concatenating a date, but in this case it's clearly not based on that date? It's also missing at least one of the fields that has an identical name, [Policy/Contract Effective Date], which in your screen capture is showing different values between rows.

Or · ‎2023-11-16

You would want to change the field names on one side - presumably on the outer side - for anything that isn't supposed to match. Otherwise, I don't see how you'd expect to end up with one row when the field has two potential values. You could add a second run with a Group By to handle the doubled rows in whichever way you want to, though.

qlikuser233 · ‎2023-11-16

m

Or · ‎2023-11-16

Again, you can't join two tables which have different values in a column and expect anything other than duplication. This is the correct result. If you want to achieve a different result, you need to use a Group By function to deal with the "duplication" and tell Qlik how you want to pick a value between the two options, or rename the columns so they don't share the same name.

marcus_sommer · ‎2023-11-16

You are not looking really on the table-data else on an UI view. This means the extra row might not come from the join-statement else from another association between your data.

Just comment everything else from your script unless this join-stuff. Further adding a recno() and a rowno() to the load will be quite helpful to track what happens - and then put everything in a table-box and not a table-chart. If it results really in any duplicates it would mean that there are duplicates within the sources and/or that the key-values are exactly matching together. For example identically looking data must not mandatory be equally like a real date and a per date() formatted timestamp ...

qlikuser233 · ‎2023-11-16

They don't have different values in the columns. The shared columns in each source contain identical values. The only thing I am trying to do is add columns on to the Tracker table using Unique Identifier. If I rename the columns then that would just create further duplication. Why would it not use the unique ID to pull in info from the other columns?

These columns are identical in both A as [Client ID],

C as [Pol/Cont Num],

D as [Policy/Contract Effective Date],

I as [Unique Identifier],
The only new columns I need from Tracker 2 are from V-AA. @Or If I am misunderstanding how would I use group by with the ID to achieve this?

Join data duplicating records

Application Development