Anonymous
Not applicable

Import files one at a time, check for fields, and overwrite existing files

Hi, I need some help automating ETL code to run on multiple files in a folder, one at a time.

I have over a year's worth of data broken down into monthly files.  We've been building these files over the year, and the script has occasionally changed, causing some of them to have 2 extra fields while others do not.  I need all the files to have these extra fields.  The files contain either both fields or neither, and the fields are just calculations from existing fields, so no extra data is needed.  The monthly dated file structure needs to remain the same – this script is just to add the fields where necessary.

This is across ~90 files with different naming conventions and fields, but the two fields I am trying to add are consistent across all files.  All are in one folder.

Filename syntax, e.g.:

   AgentSourceData_20141101.qvd

   AgentSourceData_20141201.qvd

   TeamSourceData_20151101.qvd

   TeamSourceData_20151201.qvd

   CentreSourceData_20151201.qvd

I need to be able to run a fairly automated script that:

   1. Imports a file,

   2. Checks if it has these extra columns:

          if true, discards the data and imports the next file;

          if false, adds the columns, exports to the same filename (overwriting the existing file), and discards the data,

   3. Imports the next file,

   4. etc.

I’ve written the code below, which works, but obviously only on one specific file at a time.  I need to know how to make it loop through all the files in the folder without having to specify each one individually.

Test1:
Load *
FROM [AgentSourceData_20151101.qvd] (qvd);

IF FieldNumber('DateYear', 'Test1') = 0 then
    Test2:
    Load *
        ,Year(Date) as "DateYear"
        ,Month(Date) as "DateMonth"
    Resident Test1;

    Store Test2 into [AgentSourceData_20151101.qvd];
    Drop Table Test2;
    Drop Table Test1;
Else
    Drop Table Test1;
Endif;

Test1:
Load *
FROM [AgentSourceData_20151201.qvd] (qvd);

etc...

Hope that makes sense.

Cheers,


Dave

Accepted Solution
marcus_sommer

For such things you could use FileList to loop through your folders and then make your checks on things like filetime/filesize or the contained fields. For field checks it's enough to read only one record with the First statement, and for QVD files you could also use the file functions like QvdFieldName() and QvdNoOfFields(). If you search on FileList you will find many examples, like this one: Re: Pick the latest excel file from folders

- Marcus
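Marcus's suggestion can be sketched in Qlik script roughly as below. This is only an illustration, not the poster's final code: the vDir variable and the file mask are assumptions, and it uses QvdNoOfFields()/QvdFieldName() to test the QVD header for the field without loading any data at all – the full load only happens when the field is missing.

    // Loop over every QVD in the folder (vDir is an assumed variable).
    For each vFile in FileList('$(vDir)\*.qvd')

        // Check the QVD header for the field without loading any records.
        Let vHasField = 0;
        For i = 1 to QvdNoOfFields('$(vFile)')
            If QvdFieldName('$(vFile)', $(i)) = 'DateYear' then
                Let vHasField = 1;
            Endif;
        Next i

        // Only files missing the field are loaded and rewritten.
        If vHasField = 0 then
            Tmp:
            Load *
                ,Year(Date)  as DateYear
                ,Month(Date) as DateMonth
            FROM [$(vFile)] (qvd);

            Store Tmp into [$(vFile)] (qvd);
            Drop Table Tmp;
        Endif;

    Next vFile

Note this header-only check cannot tell whether an existing field actually contains data; for that you still need to read at least one record, as discussed later in the thread.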


5 Replies
Kushal_Chawda

Do you want to load all the QVDs in the folder and create the Year & Month fields from the file name?


Anonymous
Not applicable
Author

Thanks Marcus.  

This is my finalised working script.  I had to add some additional logic using the Len() function to look into the columns if they existed, and recreate them if there was no data in there, as we had some files with the column headings but no data.

This is working, but any suggestions for optimisation / improvement are welcome.

For each vFileName in FileList('$(SDL_Directory)*')

    Test1:
    Load *
    FROM [$(vFileName)] (qvd);

    IF FieldNumber('DateYear', 'Test1') = 0 then
        Test2:
        Load *
            ,Year(Date) as "DateYear"
            ,Month(Date) as "DateMonth"
        Resident Test1;

        Store Test2 into [$(vFileName)];
        Drop Table Test2;
        Drop Table Test1;
    Else
        IF Len(Peek('DateYear', 0, 'Test1')) = 0 then
            Drop Fields DateYear, DateMonth;

            Test2:
            Load *
                ,Year(Date) as "DateYear"
                ,Month(Date) as "DateMonth"
            Resident Test1;

            Store Test2 into [$(vFileName)];
            Drop Table Test2;
            Drop Table Test1;
        Else
            Drop Table Test1;
        Endif;
    Endif;

Next vFileName

Cheers,


Dave

marcus_sommer

A possible optimization might be to load only the first record into Test1:

Test1:
First 1 Load ...

and then have the second load, Test2, load again from the QVD. But you will need larger datasets to see a noticeable difference here.
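Put together, the optimisation might look like the sketch below. The variable and field names are assumptions carried over from the thread; the point is that the existence/emptiness check reads only one record, and the full load only runs for files that actually need fixing.

    For each vFileName in FileList('$(SDL_Directory)*.qvd')

        // Read a single record just to inspect the fields.
        Check:
        First 1 Load *
        FROM [$(vFileName)] (qvd);

        Let vNeedsFields = (FieldNumber('DateYear', 'Check') = 0
                            or Len(Peek('DateYear', 0, 'Check')) = 0);
        Drop Table Check;

        If vNeedsFields then
            Test1:
            Load *
            FROM [$(vFileName)] (qvd);

            // If the columns exist but are empty, remove them before recreating.
            If FieldNumber('DateYear', 'Test1') > 0 then
                Drop Fields DateYear, DateMonth;
            Endif;

            Test2:
            Load *
                ,Year(Date)  as DateYear
                ,Month(Date) as DateMonth
            Resident Test1;

            Store Test2 into [$(vFileName)];
            Drop Table Test2;
            Drop Table Test1;
        Endif;

    Next vFileName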

But a bigger improvement could be to skip this task completely. It's often not necessary to hold month/year fields within a table if you have a date field, which can later be connected to a master calendar that provides all the needed period fields.
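As a sketch of that master-calendar idea (the SourceData table name is an assumption, not part of the thread's code), the period fields can be generated once from the date range instead of being stored in every QVD:

    // Find the date range of the fact table.
    Temp:
    Load Min(Date) as MinDate, Max(Date) as MaxDate
    Resident SourceData;

    Let vMinDate = Peek('MinDate', 0, 'Temp');
    Let vMaxDate = Peek('MaxDate', 0, 'Temp');
    Drop Table Temp;

    // Generate one row per day; links to SourceData via the Date field.
    MasterCalendar:
    Load
        Date,
        Year(Date)  as DateYear,
        Month(Date) as DateMonth,
        'Q' & Ceil(Month(Date) / 3) as DateQuarter;
    Load
        Date($(vMinDate) + IterNo() - 1) as Date
    AutoGenerate 1
    While $(vMinDate) + IterNo() - 1 <= $(vMaxDate);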

- Marcus

Anonymous
Not applicable
Author

Great advice on the first row.  Some of these QVDs are huge, so that will save significant time.

Without going into the boring detail, the month/year columns are needed for a daily automated extraction script.  When the files go through an ETL script once a week to be brought into our dashboard, these columns are indeed dropped and linked to a master calendar.

Cheers again.