Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
In my script I need to extract from an excel field a substring that begins with the letters "PT" and is followed by 8 numeric characters. The source text field from which to extract the substring can have multiple occurrences of substring "PT....." but only one followed by 8 numeric characters.
Let me explain with an example:
source text string: "Support Automation PT7aW. Resolution PT8. Related Issues Activities PT63004910 Date% Event Log 11. Attachments"
The only string to extract is "PT63004910" and not "PT7aW" or "PT8".
Thanks for the help.
I would suggest using a regular expression here. Unfortunately, QV doesn't support regular expressions itself. But we could easily use a VBScript Macro for that and call that from script.
I just reused the Macro code posted here
http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/
which just encapsulates the VB code for a regular expression quite nicely.
Then, the resulting script code just looks like
LOAD Source,
RegExFind(Source, 'PT[0-9]{8}',',',false()) as Result
INLINE [
Source
"Support Automation PT7aW. Resolution PT8. Related Issues Activities PT63004910 Date% Event Log 11. Attachments"
];
thank you so much!
Hi Stefan,
Is it possible to apply that same function if you just want to extract one specific string form multi rows?
Let's say I have text file where I just want to extract the string 'CIP', 'CPT' or 'EXWORKS' and name it as Terms,
The problem is that text is stored in serveral rows instead of one line or at one fixed position. Would it still be possible to apply the RegExFind function? Or should I look for another function/statement?
Hope you can help.
Carter
Maybe you don't even need the regular expression for this, simple searches can also be done with QV functions, e.g. Wildmatch(). Maybe like this:
INPUT:
LOAD
mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,
date(FileTime()) As DateCreated,
@1
FROM
[.\Commercial_ED_REL_GCPR_*.txt] // <<<<<< I've removed the rest of the filename in order to load multiple textfiles without adding them manualy.
(txt, codepage is 1252, no labels, delimiter is ';', msq);
TERMS:
LOAD OrderNo, DateCreated,
concat(@1) as Text,
pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms
Resident INPUT group by OrderNo,DateCreated;
//drop table INPUT;
//drop field Text;
Hi Stefan,
Firstly my apologizies for not thank you for your help! I thought I replied to your last message.
I'm trying to extract a City name from an invoice. Unfortunatelly the invoice contains alot of gibberich. The city name happens to be between 'Delivery To: and ', 'Customer No:'
I first tried to extract the destintion between a specific code and two letters by using the following expression:
Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', ))))
Unfortunatelly not all the destination are mention between this two strings, but they are however between 'Delivery To' and 'Customer NO:'
So I tried the following expression:
Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*Delivery To:*', City), City), pick(wildmatch(Text, '*Customer No:*'), City ))))
But that loaded all of the Text :S
Do you know how to tackle this issue?
Please find my qvw file.
Thanks in advance
Carter
Could you point me to one of those OrderNo, please? As you said, there is lot of gibberich around...
And could you highlight / post the part of the OrderNo Text you are interested in?
Hi stefan,
I analysed the data before loading it in to qlikview. I've noticed that the original textfile, after ' %%[ ProductName: GPL Ghostscript ]%%' , starts with ' Commercial Invoice'.
When I load the text file in Qlikview, the file also starts with '%%[ ProductName: GPL Ghostscript ]%%',' but not followed by 'Commercial Invoice'.
Deliver To:// from this area
ell Eval/Demo AccountELL SAS CASANEARSHORE1100 BD EL QODS-sidi maarouf
CASABLANCA20150Morocco // in this case I need to extract 'CASABLANCA'
mehdi zbaidiCustomer No:/*till this area */
I've attached the original files, and converted them to html in order to point out in what text range i'm interessed in.
Since there are multiple 'Deliver to:' in the text file, i'm to take a part of the text that I pointed out and 'dump' everything else.
Hope you can help me.
Regards,
Carter
Message was edited by: Carter James
I've also noticed that the Text field shows not the same content is the original file.
The lines are out of order, since concat without a sort weight will sort the lines alphabetically.
But you can create a sort weight by creating a LineID field in your INPUT load
INPUT:
LOAD
@1,
recno() as LineID,
mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,
subfield(filename(), '_', 5) as Order,
mid(filename(),Index(filename(),'PR_')+3,2) as OrderNot,
//pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms,
if(mid(filename(),index(filename(),'PR_')+3,2),'EDB','ROW') as Service,
date(FileTime()) As DateCreated
...
and then concat using the weight:
TERMS:
LOAD
OrderNo,
concat(@1,' ',LineID) as Text,
pick(WildMatch(Concat(@1,' ',LineID),'*CIP*','*CPT*','*EXWORKS*','*CIF*','*DDP*','*DAP*','*DDU*', '*FOC*', '*free of charge*', '*FCA*', '*CFR*'),
'CIP','CPT','EXWORKS','CIF','DDP','DAP','DDU', 'FOC', 'free of charge', 'FCA', 'CFR') as Term
Resident INPUT group by OrderNo;
Then I changed your third test expression to
=Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1)))
(You don't really need the pick / match combinations here)
I think this will give you the text portions you were asking for, but I think it is still going to be pretty hard to extract the something like 'Casablanca' out of
: Dell Eval/Demo Accountdell Sas Casanearshore1100 Bd El Qods-Sidi Maarouf Casablanca20150morocco Mehdi Zbaidi
Please see also attached.
Regards,
Stefan
P.S: I assume that you really need to parse in the postscript files, it would be much easier and better to directly access the original data in a DB. Isn't there any possible way?
Can you see if this sample code works for you?
[Master Data]:
LOAD
RowNo () as master_row,
F1
INLINE [
F1
Support Automation PT7aW. Resolution PT8. Related Issues Activities PT63004910 Date% Event Log 11. Attachments
This is a test
Second PT test
PT12345678 third test
PT1234567 fourth PTABCDEFGH
];
[Child Data]:
NoConcatenate
LOAD
master_row as rowid,
F1 as temp_F1,
SubField (F1, ' ') as temp_F2
Resident
[Master Data];
Left Join ([Master Data])
LOAD
rowid as master_row,
temp_F2 as F2
Resident
[Child Data]
Where
(Mid (temp_F2, 1, 2) = 'PT') and
(Len (temp_F2) = 10) and
(IsNum (Mid (temp_F2, 3, 4)));
DROP TAble [Child Data];
DROP Field master_row;