Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Splitting up a large dataset

I have a client with a large (1.5 billion rows), somewhat-generic database that I need to use for several different projects.

I have a working QVW in place that has downloaded the database into QVD "slice" files generally 50 million records in size, and now each day the records added to the database are loaded into a new slice. For reference, the database has several million rows added to it every day.

The next step is to process these time-based "slices" into a second set of QVD files based on one field (I'll call it "ProjectCode" here.)

I have a system in place that does this too, but I'm looking to make it more efficient if possible. The net effect I'm going for is for it to take a Slice#####.qvd file and append the data from that file onto Project1.qvd, Project2.qvd, Project3.qvd etc. based on the value of the ProjectCode field.

What I'd really like is a where clause in the Store statement!

What I'm doing right now is to load the Slice#####.qvd file to a temp table, load a table of distinct ProjectCodes that are present in the slice, loop through each ProjectCode and read its QVD, do a resident load of the slice where the ProjectCode is the one I'm currently looking at, join that to the table I read in from the ProjectCode QVD, store the full set of records back to the ProjectCode QVD, move on to the next ProjectCode in the list, etc.

What gets slow is that I have to do that resident load over and over again with a different where clause for each ProjectCode in the table. So, what I'm essentially doing is this:

Load 8MillionRecords;

Load Distinct ProjectCode Resident 8MillionRecords;

For Each ProjectCode

     Load ProjectCode1 from ProjectCode1.qvd;

     Concatenate (ProjectCode1) Load * Resident 8MillionRecords Where ProjectCode = ProjectCode1;

     Store * From ProjectCode1 into ProjectCode1.qvd;

     Drop Table ProjectCode1;

Next ProjectCode;

Is there any faster way to split things up into ProjectCodes than the resident load loop? Even with 8 million records, some of the ProjectCodes may only have a handful of records.

1 Reply
Gysbert_Wassenaar

I've been told that optimized loads from qvd can be faster than using a resident load. What you could try is create a temporary table with only one projectcode value and use a where exists on the load from the qvd so it still loads optimized. See this blog post for more information on optimized qvd loads.


talk is cheap, supply exceeds demand