Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
lbunnell
Creator
Creator

Grouping Misspelled or Duplicate Names By ID

Can someone show me how I can group names that may be misspelled or otherwise duplicated in a table due to inclusion of middle initials based on a common User ID number so that they can be aggregated in a table or chart?

1 Solution

Accepted Solutions
MK_QSL
MVP
MVP

or even like below

Load

  Name as OriginalName,

  Capitalize(SubField(Name,' ',1)) & ' ' & Capitalize(SubField(Name, ' ',-1)) as NewName

Inline

[

  Name

  Tom Smith

  tom L. smith

  Tom  smith

];

View solution in original post

9 Replies
MK_QSL
MVP
MVP

provide some example of such names to work

hic
Former Employee
Former Employee

I would use a mapping table that lists different variants of the same name. See more on Data Cleansing.

HIC

sunny_talwar

I think you will need to normalize the data using a master normalize file. Once you have the normalized file, you can use a Mapping Load with ApplyMap() function to get it working.

lbunnell
Creator
Creator
Author

Tom Smith

Tom L. Smith

Tom  Smith (double space between first and last name)

All of the above have the same User ID

lbunnell
Creator
Creator
Author

I could see that working with a small number of records for Country, but there are thousands of names in the list. To create a mapping table for each name would be cumbersome. Is there not a way to group the names based on ID or am I thinking about this the wrong way?

MK_QSL
MVP
MVP

try something like below

Load

  Name as OriginalName,

  SubField(Name,' ',1) & ' ' & SubField(Name, ' ',-1) as NewName

Inline

[

  Name

  Tom Smith

  Tom L. Smith

  Tom  Smith

];

MK_QSL
MVP
MVP

or even like below

Load

  Name as OriginalName,

  Capitalize(SubField(Name,' ',1)) & ' ' & Capitalize(SubField(Name, ' ',-1)) as NewName

Inline

[

  Name

  Tom Smith

  tom L. smith

  Tom  smith

];

lbunnell
Creator
Creator
Author

I couldn't tell at first from the way example formatted, but I just needed to put a space between the single quotes  ' ' and it worked.

Thanks!

hic
Former Employee
Former Employee

No, you can group the names based on ID. Then you could, for example, start by loading the IDs and the names in one separate table, e.g.

Load

     ID,

     Name

     From <ListOfNamesAndIDs>

     Where Not Exists(ID);

The Where-clause will effectively stop any duplicate names from being loaded. This way you will get only one name per ID. It be the first name found.

Then you can proceed loading all other tables - with the field "ID", but without the field "Name".

HIC