Voyager_RVA
Contributor II

Initial Data Load Too Large - alternatives???

I'm new to Qlik Sense. I'm a business analyst building a Qlik Sense app to replace Excel VBA code that pulls historical purchases for one or more items - IT will no longer support the Excel VBA :(. My organization is large, and the total number of records in my PO data warehouse is well over 100M. The requirement is that I use Qlik Sense to pull the same data as the Excel VBA code.

My only search criterion is item number. There is no limit on time, geography, or entity.

When I build the SQL against the data warehouse, I write it in the Data Load Editor and it pulls all PO data - over 100M records. A couple of challenges:

1/ When Qlik Sense users attempt to use the app, they often run into Out of Memory errors.

2/ When the app works, it takes more than 20 minutes for the sheet to display so that users can enter item numbers.

Questions:

1/ Is it expected behavior for Qlik Sense to frequently error out on such large data sets? I filtered the data set to return about 50M records and it still sometimes errors.

2/ Is there a way in Qlik Sense to let users enter item numbers BEFORE the initial data load, so that only the PO history for those items is pulled?

3/ I tried building a QVD file, but it takes 20 minutes to load the data and another 30 minutes to index. Waiting nearly an hour to run a report doesn't seem to make sense.

What other options, if any, do I have in Qlik Sense to limit the initial data load?

1 Solution

Accepted Solutions
marcus_sommer

The error seems to come from your machine running out of RAM - a look at the Task Manager should clearly show the resource consumption.

Wanting to show all the data doesn't mean you can't divide it into n buckets during the load, for example with a loop like:

t: load date(addmonths(makedate(2022), recno() - 1), 'YYYYMM') as Period autogenerate 24;

for i = 0 to noofrows('t') - 1
   let vFilter = peek('Period', $(i), 't');
   [$(vFilter)]: sql select * from MyTable where Period = $(vFilter);
   store [$(vFilter)] into [$(vFilter).qvd] (qvd);
   drop tables [$(vFilter)];
next

assuming there is a period field - in general, any type of period field could be used.
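Once the monthly QVDs exist, the app can read them all back with a single wildcard load instead of querying the database again - a minimal sketch, assuming the QVDs were stored with the relative 'YYYYMM'.qvd names from the loop above (in Qlik Sense the path would typically sit behind a lib:// folder connection):

PO:
load * from [2*.qvd] (qvd);

A load from QVD without transformations runs in optimized mode, which is typically far faster than repeating the original SQL pull.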


10 Replies
marcus_sommer

The data set isn't too large; rather, the data model and/or the UI isn't suitable. Ideally the data model is a star scheme with a single fact table and n dimension tables, with all essential logic implemented in the dimensional layer, so that just a few sum() and count() expressions are enough in the UI. Even with 100M+ records, response times < 1 second are possible.
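As a rough illustration of such a star scheme - all table and field names here are assumed, not taken from your model:

Facts:
load PONumber, ItemNumber, VendorID, OrderDate, Quantity, Amount
from [POFacts.qvd] (qvd);

Items:
load ItemNumber, ItemDescription, ItemGroup
from [Items.qvd] (qvd);

Vendors:
load VendorID, VendorName, Country
from [Vendors.qvd] (qvd);

Each dimension table shares exactly one key field with the fact table, so Qlik associates them without synthetic keys or circular references, and UI expressions can stay as simple as sum(Amount).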

What you are describing sounds like transferring a SQL schema into Qlik, which may cause multiple synthetic keys and/or circular loops, and probably creates cartesian products within the UI.

Besides this, you should consider an incremental approach with sliced QVDs - maybe just by YYYYMM periods - and then load only the new/changed records from the database.
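A minimal sketch of that incremental idea - table and field names are assumptions: only the current YYYYMM slice is re-queried, while the older slices stay untouched in their QVDs:

let vCurrentPeriod = date(today(), 'YYYYMM');

[$(vCurrentPeriod)]:
sql select * from MyTable where Period = $(vCurrentPeriod);

store [$(vCurrentPeriod)] into [$(vCurrentPeriod).qvd] (qvd);
drop tables [$(vCurrentPeriod)];

Each reload then costs only one month of database I/O instead of the full history.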

Voyager_RVA
Contributor II
Author

Thank you for the quick response. I apologize for the basic and long-winded questions.

I have a single fact table that combines PO headers and PO lines. The only dimension tables are items and vendors.

I may be missing something with QVDs: incremental QVDs split by time don't work for me because I need to search the entire history by item.

When attempting to load the entire data set, I pull the entire PO + items data set (100M+ records). I get an error in the Data Load Editor from the SQL statement: "General error;4 cannot allocate enough memory: search table error: [9] Memory allocation failed;exception 1000002: Allocation failed;" - perhaps you can help: is this a Qlik error or something else?

 


rwunderlich
Partner Ambassador/MVP

Are you selecting only the fields that you need? It might make sense to keep OrderHeader and OrderDetail in separate tables.
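A sketch of keeping them separate - table and field names are assumed; both tables share only the order key, so Qlik links them automatically:

OrderHeader:
sql select OrderID, OrderDate, VendorID
from PO_Header;

OrderDetail:
sql select OrderID, ItemNumber, Quantity, Price
from PO_Line;

Keeping only OrderID in both tables avoids a synthetic key and spares the RAM that a combined, denormalized table would spend repeating header values on every line.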

If you decide to segment QVDs, here are some script patterns to do so.

https://qlikviewcookbook.com/2022/03/how-to-segment-qvd-files/

-Rob
http://www.easyqlik.com
http://masterssummit.com
http://qlikviewcookbook.com

Voyager_RVA
Contributor II
Author

Great - I will take a look at building a loop to pull the data. I will need to look into your code... still a noob learning Qlik :). 

Voyager_RVA
Contributor II
Author

Yes - I'm pulling only the fields I need. Unfortunately, I don't have any control over the database tables. 

Thank you for the link - I will look into it to see how best to make this work!

marcus_sommer

If your data set contains fields with high cardinality, like timestamps, you should consider splitting them into dates and times. Also look at the fields again: any record IDs are seldom useful in the report layer. Further, avoiding any row-level formatting could also reduce the RAM footprint significantly.

Voyager_RVA
Contributor II
Author

Thank you for the further clarification - I'm not clear on the following:

"Further avoiding any row-level formatting could also reduce the RAM footprint significantly" - by this, do you mean avoiding formatting in:

1 - the SQL select statement pulling the data or

2 - the LOAD statement or

3 - both? 

This seems to imply that formatting and other functions are best done when displaying the data, if memory use is a concern during loading?

marcus_sommer

The mentioned formatting is related to Qlik and means the following. If you load a timestamp that is appropriately interpreted by the interpretation variables (usually listed at the beginning of the script), each distinct value is an 8-byte number, and the formatting information is stored once in the header metadata of the field. But if there is more than a single format, this information can't be stored as metadata; instead it is stored at row level per value, which can easily add more than 20 extra bytes per value.

With timestamps, which usually have millions of distinct values, this can consume a lot of RAM. Hence, again, the hint to split such values into dates and times with something like:

date(floor(Timestamp)) as Date,
time(frac(Timestamp)) as Time

to reduce the number of distinct values.
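If times are only needed to the minute, rounding inside the same expression reduces the distinct count even further - a sketch, assuming a source table Raw with a Timestamp field:

Data:
load date(floor(Timestamp)) as Date,
     time(round(frac(Timestamp), 1/1440), 'hh:mm') as Time
resident Raw;

1/1440 is one minute expressed as a fraction of a day, so Time can have at most 1,440 distinct values, and Date at most one per calendar day.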