Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a fact table data in daily QVD's (e.g. Traffic_20170827.qvd). Each file has multiple "entity_name_raw" and each day there are some new entity_name_raw. I need to assign a unique ID to each entity_name_raw.
I am using the below script to do that. Although this script runs fine on most of the days but sometimes a different ID is assigned to a entity_name_raw each day the script runs.
Entity_Bigpond_TID:
LOAD
Distinct
entity_name_raw
FROM
[$(vL.QVDDestPath)Traffic_*.qvd]
(qvd);
Entity_temp:
Load
entity_name_raw,
text('E_BT' & num(AutoNumber(entity_name_raw, 'auto1'), '000000000')) as entity_id
Resident Entity_Bigpond_TID;
Can anyone suggest a better way to achieve this?
Regards,
Shashank
I have created a sample for you...
Day 1
Table:
LOAD Entity
FROM
FirstFile.xlsx
(ooxml, embedded labels, table is Sheet1);
UniqueID:
LOAD Entity,
If(Entity = Previous(Entity), Peek('RowNo'), RangeSum(Peek('RowNo'), 1)) as RowNo
Resident Table
Order By Entity;
STORE UniqueID into UniqueID.qvd (qvd);
You run your main fact table and create a new table UniqueID where you create the ID (I have used the name RowNo)
Day 2 and Onwards (In the sample I have did Day2 and Day3... but the script is the same)
UniqueID:
LOAD Entity as EntityCheck,
RowNo
FROM UniqueID.qvd (qvd);
MaxID:
LOAD Max(RowNo) as Max
Resident UniqueID;
LET vMaxID = Peek('Max');
TRACE $(vMaxID);
DROP Table MaxID;
Table:
LOAD Entity
FROM
SecondFile.xlsx
(ooxml, embedded labels, table is Sheet1);
Temp:
NoConcatenate
LOAD Entity as EntityCheck,
If(Entity = Previous(Entity), Peek('RowNo'), RangeSum(Alt(Peek('RowNo'), $(vMaxID)), 1)) as RowNo
Resident Table
Where not Exists(EntityCheck, Entity)
Order By Entity;
FinalUniqueID:
LOAD EntityCheck as Entity,
RowNo
Resident UniqueID;
DROP Table UniqueID;
Concatenate(FinalUniqueID)
LOAD EntityCheck as Entity,
RowNo
Resident Temp;
STORE FinalUniqueID into UniqueID.qvd (qvd);
RENAME Table FinalUniqueID to UniqueID;
DROP Table Temp;
So, basically Day 1 is needed one time to initiate the qvd building process. Once created, you will always need to run Day 2 and onward script...
In the sample just run the script for one day at a time and comment the other days and start with Day 1.
Best,
Sunny
Although this script runs fine on most of the days but sometimes a different ID is assigned to a entity_name_raw each day the script runs.
I am not sure I understand this? A different ID is assigned to a entity_name_raw each day? This happens only sometimes? Can you elaborate of may be show some examples?
An alternative might be this, but not sure if this will address the above issue or not
Entity_Bigpond_TID:
LOAD
Distinct
entity_name_raw
FROM
[$(vL.QVDDestPath)Traffic_*.qvd]
(qvd);
Entity_temp:
LOAD *,
text('E_BT' & num(RowNo, '000000000')) as entity_id;
LOAD
entity_name_raw,
If(entity_name_raw = Previous(entity_name_raw), Alt(Peek('RowNo'), 1), RangeSum(Peek('RowNo'), 1)) as RowNo
Resident Entity_Bigpond_TID
Order By entity_name_raw;
Hi Sunny,
Look at the screenshot below. Until 20178/20, E_CM000000001 was assigned to a particular Entity and it changed later.
Did you try the approach I provided above?
Do you mean that a different ID is assigned by Autonumber? Autonumber() starts from scratch at each load and it numbers on the load order sequence. If your source data changes the load order, then the autonumber will change as well.
That's true Jonathan. My assumption was that when I am loading the data from QVD's using Traffic_*.qvd method the new entities would automatically be appended in the last.
Correct me if I am wrong, but my understanding is that all the distinct entity_name_raw from the first QVD would be loaded in memory first and then all the new entity_name_raw from other QVD's (which are not present in the first QVD) would be appended at the end.
Can you help me understand how load using *.qvd works.
Hi Sunny,
I reloaded all my data today using your code and it seems to be working fine at the moment.
I will keep you posted.
Thanks a lot for the help. Appreciate it!!
Hi Shashank,
in order to avoid collisions like this try using AutoNumberHash, instead of AutoNumber.
The key difference here is that AutoNumberHash is deterministic, and thus should avoid assigning the same result to more than one row.
https://community.qlik.com/message/683831
Marcus
Reading through this does make sense. Thanks Marcus.
I think this should be the correct solution.
At the moment, Sunny's code is working fine for me but I will test the AutoNumberHash() funtion as well.
Hi Guys,
Both the approaches failed today (Sunny's and Marcus') because both of them are dependent on the Load Order of the entity_name_raw from the QVD files. The entity_name_raw must be in a certain fixed order with new ones added at the end for both the methods to work correctly.
To my surprise, QV doesn't keep the load order from the original files and loads them in random order sometimes (which is strange).
Can someone explain how the ordering works while loading the data and why does QV shuffle the load order?