Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Substring from text field from variable start position

Hi all,

In my script I need to extract from an excel field a substring that begins with the letters "PT" and is followed by 8 numeric characters. The source text field from which to extract the substring can have multiple occurrences of substring "PT....." but only one followed by 8 numeric characters.

Let me explain with an example:

source text string: "Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

The only string to extract is "PT63004910" and not "PT7aW" or "PT8".

Thanks for the help.

16 Replies
swuehl
MVP
MVP

I would suggest using a regular expression here. Unfortunately, QV doesn't support regular expressions itself. But we could easily use a VBScript Macro for that and call that from script.

I just reused the Macro code posted here

http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/

which just encapsulates the VB code for a regular expression quite nicely.

Then, the resulting script code just looks like

LOAD Source,

RegExFind(Source, 'PT[0-9]{8}',',',false()) as Result

INLINE [

Source

"Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

];

Not applicable
Author

thank you so much!

Not applicable
Author

Hi Stefan,

Is it possible to apply that same function if you just want to extract one specific string form multi rows?

Let's say I have text file where I just want to extract the string 'CIP', 'CPT' or 'EXWORKS' and name it as Terms,

The problem is that text is stored in serveral rows instead of one line or at one fixed position. Would it still be possible to apply the RegExFind function? Or should I look for another function/statement?

Hope you can help.

Carter

swuehl
MVP
MVP

Maybe you don't even need the regular expression for this, simple searches can also be done with QV functions, e.g. Wildmatch(). Maybe like this:

INPUT:

LOAD

mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

date(FileTime()) As DateCreated,

@1

FROM

[.\Commercial_ED_REL_GCPR_*.txt]  // <<<<<< I've removed the rest of the filename in order to load multiple textfiles without adding them manualy.

(txt, codepage is 1252, no labels, delimiter is ';', msq);

TERMS:

LOAD OrderNo, DateCreated,

concat(@1) as Text,

pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms

Resident INPUT group by OrderNo,DateCreated;

//drop table INPUT;

//drop field Text;

Not applicable
Author

Hi Stefan,

Firstly my apologizies for not thank you for your help! I thought I replied to your last message.

I'm trying to extract a City name from an invoice. Unfortunatelly the invoice contains alot of gibberich. The city name happens to be between 'Delivery To: and ', 'Customer No:'

I first tried to extract the destintion between a specific code and two letters by using the following expression:

Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', ))))

Unfortunatelly not all the destination are mention between this two strings, but they are however between 'Delivery To' and 'Customer NO:'

So I tried the following expression:

Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*Delivery To:*', City), City), pick(wildmatch(Text, '*Customer No:*'), City ))))

But that loaded all of the Text :S

Do you know how to tackle this issue?

Please find my qvw file.

Thanks in advance

Carter

swuehl
MVP
MVP

Could you point me to one of those OrderNo, please? As you said, there is lot of gibberich around...

And could you highlight / post the part of the OrderNo Text you are interested in?

Not applicable
Author

Hi stefan,

I analysed the data before loading it in to qlikview. I've noticed that the original textfile, after ' %%[ ProductName: GPL Ghostscript ]%%' , starts with ' Commercial Invoice'.

When I load the text file in Qlikview, the file also starts with '%%[ ProductName: GPL Ghostscript ]%%',' but not followed by 'Commercial Invoice'.

Deliver To:// from this area

ell Eval/Demo AccountELL SAS CASANEARSHORE1100 BD EL QODS-sidi maarouf

CASABLANCA20150Morocco // in this case I need to extract 'CASABLANCA'

mehdi zbaidiCustomer No:/*till this area */

I've attached the original files, and converted them to html  in order to point out in what text range i'm interessed in.

Since there are multiple 'Deliver to:' in the text file, i'm to take a part of the text that I pointed out and 'dump' everything else.

Hope you can help me.

Regards,

Carter

Message was edited by: Carter James

swuehl
MVP
MVP

I've also noticed that the Text field shows not the same content is the original file.

The lines are out of order, since concat without a sort weight will sort the lines alphabetically.

But you can create a sort weight by creating a LineID field in your INPUT load

INPUT:

LOAD

@1,

recno() as LineID,

mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

subfield(filename(), '_', 5) as Order,

mid(filename(),Index(filename(),'PR_')+3,2) as OrderNot,

//pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms,

if(mid(filename(),index(filename(),'PR_')+3,2),'EDB','ROW') as Service,

date(FileTime()) As DateCreated

...

and then concat using the weight:

TERMS:

LOAD

OrderNo,

concat(@1,' ',LineID) as Text,

pick(WildMatch(Concat(@1,' ',LineID),'*CIP*','*CPT*','*EXWORKS*','*CIF*','*DDP*','*DAP*','*DDU*', '*FOC*', '*free of charge*', '*FCA*', '*CFR*'),

                          'CIP','CPT','EXWORKS','CIF','DDP','DAP','DDU', 'FOC', 'free of charge', 'FCA', 'CFR') as Term   

Resident INPUT group by OrderNo;


Then I changed your third test expression to

=Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1)))

(You don't really need the pick / match combinations here)

I think this will give you the text portions you were asking for, but I think it is still going to be pretty hard to extract the something like 'Casablanca' out of

: Dell Eval/Demo Accountdell Sas Casanearshore1100 Bd El Qods-Sidi Maarouf Casablanca20150morocco Mehdi Zbaidi

Please see also attached.

Regards,

Stefan

P.S: I assume that you really need to parse in the postscript files, it would be much easier and better to directly access the original data in a DB. Isn't there any possible way?

Not applicable
Author

Can you see if this sample code works for you?

[Master Data]:

LOAD

          RowNo ()          as master_row,

          F1

INLINE [

    F1

    Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments

          This is a test

          Second PT test

          PT12345678 third test

          PT1234567 fourth PTABCDEFGH

];

[Child Data]:

NoConcatenate

LOAD

          master_row                              as rowid,

          F1                                                  as temp_F1,

          SubField (F1, ' ')          as temp_F2

Resident

          [Master Data];

 

Left Join ([Master Data])

LOAD

          rowid          as master_row,

          temp_F2          as F2

Resident

          [Child Data]

Where

          (Mid (temp_F2, 1, 2) = 'PT')          and

          (Len (temp_F2) = 10)                              and

          (IsNum (Mid (temp_F2, 3, 4)));

DROP TAble [Child Data];

DROP Field master_row;