Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
pedromsouza
Creator
Creator

Does it make sense to use distinct in mapping load?

Does it make sense to use distinct in mapping load? Wouldn't it load distinct by default ? Does it make any difference in performance?

Carga_Mapa_Competencia:
mapping load
distinct
k_smm, ano_mes_atendido
Resident Smm;

Qliking since '09
Labels (1)
5 Replies
Or
MVP
MVP

It would not load distinct by default. Barring extremely large tables, it's not going to impact performance to a significant degree either way. If you're dealing with extremely large tables, try it both ways and see which runs faster.

BrunPierre
Partner - Master
Partner - Master

Hi, The "Mapping LOAD" statement does not inherently load distinct values by default. Using "distinct" can ensure unique combinations of fields in the loaded data.

marcus_sommer

A mapping table is not intended to be distinct. Like the function LOOKUP in Excel it will return the value from the first match - and this could be used to load multiple return-values for a lookup-value in a hierarchically priority just by concatenating multiple sources in a certain order and/or loading the data with an appropriate order by statement, for example if for an ID exists multiple names and you want to ensure that always the newest entry is taken.

Or
MVP
MVP

But in that example... the values wouldn't be distinct. They'd be different?

Near as I can tell, there is no impact to speak of with using Distinct or not since ApplyMap() will just grab the first value that matches. Both calculating Distinct and running over a slightly larger mapping table make little to no difference with performance, at least up to reasonable table sizes. I didn't check for anything absurdly large (billions of values) because I can't imagine that comes up very often.

marcus_sommer

In regard to the performance it's not important if a mapping-load is distinct or not. I never tried to measure any differences for these scenarios especially as I believe that the mapping isn't an access to a data-table else to a system-table which has in general only distinct field-values.

The reason for it is that I'm using a lot of vertically and horizontally nested mapping-tables within multiple nested applymap() with sometimes many millions of records and the performance is usually very reasonable and far better as with all other possibilities. I get rather a RAM issue by keeping multiple of such large mapping-tables too long within the RAM ...