Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Can someone show me how I can group names that may be misspelled or otherwise duplicated in a table due to inclusion of middle initials based on a common User ID number so that they can be aggregated in a table or chart?
or even like below
Load
Name as OriginalName,
Capitalize(SubField(Name,' ',1)) & ' ' & Capitalize(SubField(Name, ' ',-1)) as NewName
Inline
[
Name
Tom Smith
tom L. smith
Tom smith
];
provide some example of such names to work
I would use a mapping table that lists different variants of the same name. See more on Data Cleansing.
HIC
I think you will need to normalize the data using a master normalize file. Once you have the normalized file, you can use a Mapping Load with ApplyMap() function to get it working.
Tom Smith
Tom L. Smith
Tom Smith (double space between first and last name)
All of the above have the same User ID
I could see that working with a small number of records for Country, but there are thousands of names in the list. To create a mapping table for each name would be cumbersome. Is there not a way to group the names based on ID or am I thinking about this the wrong way?
try something like below
Load
Name as OriginalName,
SubField(Name,' ',1) & ' ' & SubField(Name, ' ',-1) as NewName
Inline
[
Name
Tom Smith
Tom L. Smith
Tom Smith
];
or even like below
Load
Name as OriginalName,
Capitalize(SubField(Name,' ',1)) & ' ' & Capitalize(SubField(Name, ' ',-1)) as NewName
Inline
[
Name
Tom Smith
tom L. smith
Tom smith
];
I couldn't tell at first from the way example formatted, but I just needed to put a space between the single quotes ' ' and it worked.
Thanks!
No, you can group the names based on ID. Then you could, for example, start by loading the IDs and the names in one separate table, e.g.
Load
ID,
Name
From <ListOfNamesAndIDs>
Where Not Exists(ID);
The Where-clause will effectively stop any duplicate names from being loaded. This way you will get only one name per ID. It be the first name found.
Then you can proceed loading all other tables - with the field "ID", but without the field "Name".
HIC