Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Rayk
Contributor II
Contributor II

Duplicate records within multiple fields

Hi,

I want to be able to find duplicated records within a table with around 100 fields.

I have a table with over 100 fields and i have a script that finds duplicated ID's but I want to know which field is responsible for the duplication.

ID Mobile Number Cost
10045022 07842479330 32,100
10045022 07982879541 32,100

 

Here is an example the ID is the same so as the cost but the number is different.

 

Any help ill be great.

Labels (1)
1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID. 

Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;

For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then

TempDistinct:
// Make a Temp table instead of preceding because distinct
// is applied at end of LOAD
LOAD Distinct
ID, [$(vFname)]
Resident Data
Where ApplyMap('DupeIds', ID) > 1
;
DupeFields:
Load
ID, '$(vFname)' as DupeField
Where Count < ApplyMap('DupeIds', ID)
;
LOAD
ID,
'$(vFname)' as DupeField,
Count([$(vFname)]) as Count
Resident TempDistinct
Group by ID
;
Drop Table TempDistinct;
End if
Next idx

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

View solution in original post

3 Replies
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I may be overthinking it, but here is a generalized script to build a table of IDs and Fieldnames that have duplicate values within the ID. 

Data:
LOAD * Inline [
ID, Fa, Fb, Fc, Fd
1, 34, hello, there, again
1, 34, is, a, duplicate
2, 34, not, a, duplicate
3, 99, 2, 3, 4
3, 00, 2, 3, 88
4, 11, 22, 33, 44
]
;
DupeIds:
Mapping
Load ID, Count(ID) as Count
Resident Data
Group By ID;

For idx = 1 to NoOfFields('Data')
Let vFname = FieldName($(idx), 'Data');
If '$(vFname)' <> 'ID' Then

TempDistinct:
// Make a Temp table instead of preceding because distinct
// is applied at end of LOAD
LOAD Distinct
ID, [$(vFname)]
Resident Data
Where ApplyMap('DupeIds', ID) > 1
;
DupeFields:
Load
ID, '$(vFname)' as DupeField
Where Count < ApplyMap('DupeIds', ID)
;
LOAD
ID,
'$(vFname)' as DupeField,
Count([$(vFname)]) as Count
Resident TempDistinct
Group by ID
;
Drop Table TempDistinct;
End if
Next idx

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

Rayk
Contributor II
Contributor II
Author

Hi Rwunderlich,

It worked with a bit of of tweak, but I already have a table with a duplicated IDs and this table tells me which table it is in, but I need to do your script for each table in the list.

here is an example of the table:

Error ,PrimaryKeyID ,Table
PK Duplication ,10045022 ,Order
PK Duplication ,10045037 ,Order
PK Duplication ,10046679 ,Order
PK Duplication ,ML_10036414/10109581 ,Order
PK Duplication ,ML_10037063/10111962 ,Order
PK Duplication ,ML_10038116/10114074 ,MasterListCSVprovisional
PK Duplication ,ML_10053793/10139362 ,Driver
PK Duplication ,ML_10053793/10139437 ,Driver
PK Duplication ,ML_10053793/10142502 ,Driver
PK Duplication ,ML_10111446/10111962 ,Driver
PK Duplication ,ML_10111446/10114074 ,Driver
PK Duplication ,ML_10139640/10109581 ,Driver
PK Duplication ,ML_10139640/10113870 ,Driver
PK Duplication ,ML_10139640/10113875 ,Driver

 

So, for each table in the field table I want it to do your script.

Hope this make sense.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

So you'll just load from the Resident variable table name instead of Resident Data. 

You can loop over the TableNames from in your dupes table like this:

For i = 1 to FieldValueCount('TableName')
Let vTableName = FieldValue('TableName', $(i);

// The field processing loop here using 
Resident [$(vTableName)];

Next i

-Rob