16 Replies Latest reply: Feb 20, 2012 4:31 AM by qliksus RSS

    Substring from text field from variable start position

    qlikvuser

      Hi all,

      In my script I need to extract from an excel field a substring that begins with the letters "PT" and is followed by 8 numeric characters. The source text field from which to extract the substring can have multiple occurrences of substring "PT....." but only one followed by 8 numeric characters.

      Let me explain with an example:

      source text string: "Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

      The only string to extract is "PT63004910" and not "PT7aW" or "PT8".

      Thanks for the help.

        • Re: Substring from text field from variable start position
          swuehl

          I would suggest using a regular expression here. Unfortunately, QV doesn't support regular expressions itself. But we could easily use a VBScript Macro for that and call that from script.

           

          I just reused the Macro code posted here

          http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/

          which just encapsulates the VB code for a regular expression quite nicely.

           

           

          Then, the resulting script code just looks like

           

          LOAD Source,

          RegExFind(Source, 'PT[0-9]{8}',',',false()) as Result

          INLINE [

          Source

          "Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

          ];

            • Re: Substring from text field from variable start position
              qlikvuser

              thank you so much!

              • Re: Substring from text field from variable start position
                cartierr

                Hi Stefan,

                 

                Is it possible to apply that same function if you just want to extract one specific string form multi rows?

                Let's say I have text file where I just want to extract the string 'CIP', 'CPT' or 'EXWORKS' and name it as Terms,

                The problem is that text is stored in serveral rows instead of one line or at one fixed position. Would it still be possible to apply the RegExFind function? Or should I look for another function/statement?

                 

                 

                 

                Hope you can help.

                 

                Carter

                  • Re: Substring from text field from variable start position
                    swuehl

                    Maybe you don't even need the regular expression for this, simple searches can also be done with QV functions, e.g. Wildmatch(). Maybe like this:

                     

                    INPUT:

                    LOAD

                    mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

                    date(FileTime()) As DateCreated,

                    @1

                    FROM

                    [.\Commercial_ED_REL_GCPR_*.txt]  // <<<<<< I've removed the rest of the filename in order to load multiple textfiles without adding them manualy.

                    (txt, codepage is 1252, no labels, delimiter is ';', msq);

                     

                    TERMS:

                    LOAD OrderNo, DateCreated,

                    concat(@1) as Text,

                    pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms

                    Resident INPUT group by OrderNo,DateCreated;

                     

                    //drop table INPUT;

                    //drop field Text;

                      • Re: Substring from text field from variable start position
                        cartierr

                        Hi Stefan,

                         

                        Firstly my apologizies for not thank you for your help! I thought I replied to your last message.

                         

                        I'm trying to extract a City name from an invoice. Unfortunatelly the invoice contains alot of gibberich. The city name happens to be between 'Delivery To: and ', 'Customer No:'

                         

                        I first tried to extract the destintion between a specific code and two letters by using the following expression:

                         

                        Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', ))))

                         

                        Unfortunatelly not all the destination are mention between this two strings, but they are however between 'Delivery To' and 'Customer NO:'

                         

                        So I tried the following expression:

                         

                        Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*Delivery To:*', City), City), pick(wildmatch(Text, '*Customer No:*'), City ))))

                         

                        But that loaded all of the Text :S

                         

                        Do you know how to tackle this issue?

                         

                        Please find my qvw file.

                         

                        Thanks in advance

                         

                        Carter

                          • Re: Substring from text field from variable start position
                            swuehl

                            Could you point me to one of those OrderNo, please? As you said, there is lot of gibberich around...

                            And could you highlight / post the part of the OrderNo Text you are interested in?

                              • Re: Substring from text field from variable start position
                                cartierr

                                Hi stefan,

                                 

                                I analysed the data before loading it in to qlikview. I've noticed that the original textfile, after ' %%[ ProductName: GPL Ghostscript ]%%' , starts with ' Commercial Invoice'.

                                When I load the text file in Qlikview, the file also starts with '%%[ ProductName: GPL Ghostscript ]%%',' but not followed by 'Commercial Invoice'.

                                 

                                Deliver To:// from this area

                                ell Eval/Demo AccountELL SAS CASANEARSHORE1100 BD EL QODS-sidi maarouf

                                CASABLANCA20150Morocco // in this case I need to extract 'CASABLANCA'

                                mehdi zbaidiCustomer No:/*till this area */

                                 

                                 

                                 

                                I've attached the original files, and converted them to html  in order to point out in what text range i'm interessed in.

                                Since there are multiple 'Deliver to:' in the text file, i'm to take a part of the text that I pointed out and 'dump' everything else.

                                 

                                Hope you can help me.

                                 

                                Regards,

                                 

                                Carter

                                 

                                Message was edited by: Carter James

                                  • Re: Substring from text field from variable start position
                                    swuehl

                                    I've also noticed that the Text field shows not the same content is the original file.

                                    The lines are out of order, since concat without a sort weight will sort the lines alphabetically.

                                    But you can create a sort weight by creating a LineID field in your INPUT load

                                     

                                    INPUT:

                                    LOAD

                                    @1,

                                    recno() as LineID,

                                    mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

                                    subfield(filename(), '_', 5) as Order,

                                    mid(filename(),Index(filename(),'PR_')+3,2) as OrderNot,

                                    //pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms,

                                    if(mid(filename(),index(filename(),'PR_')+3,2),'EDB','ROW') as Service,

                                    date(FileTime()) As DateCreated

                                    ...

                                     

                                    and then concat using the weight:

                                     

                                    TERMS:

                                    LOAD

                                    OrderNo,

                                    concat(@1,' ',LineID) as Text,

                                    pick(WildMatch(Concat(@1,' ',LineID),'*CIP*','*CPT*','*EXWORKS*','*CIF*','*DDP*','*DAP*','*DDU*', '*FOC*', '*free of charge*', '*FCA*', '*CFR*'),

                                                              'CIP','CPT','EXWORKS','CIF','DDP','DAP','DDU', 'FOC', 'free of charge', 'FCA', 'CFR') as Term   

                                    Resident INPUT group by OrderNo;


                                    Then I changed your third test expression to

                                    =Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1)))

                                     

                                    (You don't really need the pick / match combinations here)

                                    I think this will give you the text portions you were asking for, but I think it is still going to be pretty hard to extract the something like 'Casablanca' out of

                                     

                                    : Dell Eval/Demo Accountdell Sas Casanearshore1100 Bd El Qods-Sidi Maarouf Casablanca20150morocco Mehdi Zbaidi

                                     

                                    Please see also attached.

                                     

                                    Regards,

                                    Stefan

                                     

                                    P.S: I assume that you really need to parse in the postscript files, it would be much easier and better to directly access the original data in a DB. Isn't there any possible way?

                                      • Re: Substring from text field from variable start position
                                        Ed Hilsinger

                                        Can you see if this sample code works for you?

                                         

                                        [Master Data]:

                                        LOAD

                                                  RowNo ()          as master_row,

                                                  F1

                                        INLINE [

                                            F1

                                            Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments

                                                  This is a test

                                                  Second PT test

                                                  PT12345678 third test

                                                  PT1234567 fourth PTABCDEFGH

                                        ];

                                         

                                         

                                        [Child Data]:

                                        NoConcatenate

                                        LOAD

                                                  master_row                              as rowid,

                                                  F1                                                  as temp_F1,

                                                  SubField (F1, ' ')          as temp_F2

                                        Resident

                                                  [Master Data];

                                         

                                         

                                         

                                        Left Join ([Master Data])

                                        LOAD

                                                  rowid          as master_row,

                                                  temp_F2          as F2

                                        Resident

                                                  [Child Data]

                                        Where

                                                  (Mid (temp_F2, 1, 2) = 'PT')          and

                                                  (Len (temp_F2) = 10)                              and

                                                  (IsNum (Mid (temp_F2, 3, 4)));

                                         

                                         

                                         

                                         

                                        DROP TAble [Child Data];

                                        DROP Field master_row;

                                        • Substring from text field from variable start position
                                          cartierr

                                          Hi Stefan,

                                           

                                          Thanks for you help stefan.

                                          You are right! I think this is to difficult to extract a city name. What I wanted to do is first select text between to words in this case:

                                          Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1))) as DestinationTest,

                                           

                                          Then use something like:

                                           

                                          =Pick(Wildmatch(DestinationTest, *City*), City)

                                           

                                          City in this case in not a string but a fieldname that contains all the cities. I wanted to use the wildmatch function to extract the City that matches the field City.


                                          So if the field City, contains ''Casablcana" that matches in the DestinationTest, it would pick that City name.

                                          I tried it but it doesn't work.

                                           

                                          But thanks for all your help!!! I really appriciate it.

                                          I wish they had the city names in the database, it would have been much easier..

                                            • Re: Substring from text field from variable start position
                                              swuehl

                                              I don't think that you can use a field in Wildmatch (well, no, you probably could using a concat technique, but let's keep that for a later time).

                                              You could probably use a mapping table together with mapsubstring function to search and map your city names in case you have all citynames available:

                                               

                                              CITYMAP:

                                              MAPPING LOAD * INLINE [

                                              F1, F2

                                              ALMATY, §1Almaty§2

                                              CASABLANCA, §1Casablanca§2

                                              ];

                                               

                                              ...

                                               

                                              Destination:

                                              Load *,

                                              TextBetween(MapSubString('CITYMAP',DestTest1),'§1','§2') as DestTest1Result;

                                              Load

                                              *,

                                              Trim(Capitalize(TextBetween(Text,pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', )))) as Destination,

                                              Trim(UPPER(TextBetween(Text, 'Deliver To', 'Customer',1))) as DestTest1,

                                              TextBetween(Text, pick(WildMatch(Text, '*Deliver To:*'), 'Deliver To:'), pick(WildMatch(Text,'*Page 1*'), 'Page 1')) as TestDest

                                              ...

                                               

                                              The Citymap mapping table could be loaded from any source, read the city field one time in in UPPER case, one time concatenated with the two delimiters (which are arbitrary).

                                               

                                              See also attached.

                                               

                                              Regards,

                                              Stefan

                              • Substring from text field from variable start position
                                qliksus

                                Hi ,

                                 

                                Try this approach if you dont want to use macro

                                 

                                test1:
                                load SubStringCount( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-') as new1 ;
                                LOAD * INLINE [
                                a
                                Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments ];

                                 

                                let loopval = peek('new1',0,'test1') ;

                                 

                                test:
                                LOAD * INLINE [
                                a
                                Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments ];


                                for i = 0 to $(loopval)

                                load

                                left(if ( len(textbetween( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-','-' ,$(i)))>=10 ,

                                textbetween( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-','-' ,$(i)) ),10) as new
                                 
                                resident test ;

                                NEXT