Solved: Permutations Math - Qlik Community

richardouellett · ‎2015-06-18

I have created a field called 'Sequence' a 7 character (Upper Case) text field that is made up from the sequence (ascending order) of 7 timestamps that are captured for each record transaction. The timestamps are:

O = Ordered product

G = List product

L = Label Product

C = Collect

A = Arrive Product

P = Prepare Product

R = Report on Product

Depending on the sequence order they are performed a typical Sequence field could look like OGLCAPR or OLGCAPR. My analysis thus far is displayed below:

I now want to perform this time analysis above based on distinct combinations of 2 , i.e. Permutations without repetition (n=7, r=2)

List has 42 maximum entries.

OL OG OC OA OP OR LO LG LC LA LP LR GO GL GC GA GP GR CO CL CG CA CP CR AO AL AG AC AP AR PO PL PG PC PA PR RO RL RG RC RA RP

If I take the same table above and apply the MID() function to the Sequence filed I would get the results below.

Challenge/Question?

I now want to perform an analysis similar to the above on unique timestamp permutations for example 'OG' occurred 646 + 23 + 5 = 474 times.and PR occurred 905 times. (In the above example I want to analyze the performance of OG regardless of it being performed in step 1 or step 2.) The new analysis would consists of unique permutations, e.g., OG, PR, LO with corresponding Count, % and throughput statistics. Hope this all makes sense. How can this be accomplished in QlikView?

swuehl · ‎2015-06-18

If I've understood correctly, I think your interval calculations can be simplified if you perform them same place you calculate your sequence:

DATA:
LOAD
  recno() as ID,
  Timestamp(today() + RAND()) as O,
  Timestamp(today() + RAND()) as G,
  Timestamp(today() + RAND()) as L,
  Timestamp(today() + RAND()) as C,
  Timestamp(today() + RAND()) as A,
  Timestamp(today() + RAND()) as P,
  Timestamp(today() + RAND()) as R
AutoGenerate 20;

X:
CrossTable (Field, Time) LOAD * RESIDENT DATA;

LEFT JOIN (DATA)
LOAD
  ID,
  Concat(Field, '',Time) as Sequence,
  (min(Time,7)-min(Time,1))*24 as i1_7,
  (min(Time,2)-min(Time,1))*24 as i1_2,
  (min(Time,3)-min(Time,2))*24 as i2_3,
  (min(Time,4)-min(Time,3))*24 as i3_4,
  (min(Time,5)-min(Time,4))*24 as i4_5,
  (min(Time,6)-min(Time,5))*24 as i5_6,
  (min(Time,7)-min(Time,6))*24 as i6_7   
Resident X
Group BY ID;

drop table X;

LOAD Sequence, mid(Sequence,IterNo(),2) as Permut, 1 as Count
RESIDENT DATA
WHILE iterno() <= 6;

I also added the link table to your permutations (and not shown here, an INLINE table of all permutations, see attached QVW).

Does this allow to perform the analysis you want to do?

View solution in original post

sunny_talwar · ‎2015-06-18

Wow this is some high end stuff you are doing here (at least seems like that). To tell you the truth I don't understand the question well, but would love to know what you have already done (see it in a qvw) and then may be try to understand what you are looking to do and may be propose a solution ultimately (there is no certainty that I will be able to provide a solution, but I would still love to give it a try )

If we are in agreement and if you can share the qvw of what you have done, we can get the ball rolling.

Best,

Sunny

swuehl · ‎2015-06-18

Maybe like

=COUNT({<Sequence = {"*OG*"}>} Sequence)

resp.

=COUNT({<Sequence = {"*PR*"}>} Sequence)

or

COUNT( IF(Sequence LIKE '*OG*', Sequence))

richardouellett · ‎2015-06-18

Thank you for this suggestion. This is a large database with many millions of records. The goal is to find a way to determine which of the 42 possible permutations had activity for a give query and then do the analysis. In the above example I believe there were only 18 of 42 permutations that had activity.

swuehl · ‎2015-06-18

You do already have a field with your permutations, right? Let's assume it's called PERMUT.

You can create a chart with PERMUT as dimension and an expression:

=COUNT(IF (Sequence LIKE '*' & PERMUT & '*', Sequence))

How does this perform in your setting? There are some alternatives for the match in the front end, you could also consider creating a link table between Sequence field and its permutations in the script.

richardouellett · ‎2015-06-18

The only field I have to work with is the 7 character Sequence field. Each successive 2 letter combination from this field provides the permutation equivalent, e.g., Mid(Sequence,1,2). I tried to show this in the last image above.

richardouellett · ‎2015-06-18

Hit return too fast. I like your second suggestion, can you provide a bit more detail?

swuehl · ‎2015-06-18

Richard Ouellette wrote:

The only field I have to work with is the 7 character Sequence field. Each successive 2 letter combination from this field provides the permutation equivalent, e.g., Mid(Sequence,1,2). I tried to show this in the last image above.

Well, I think you also have a list of permuations:

List has 42 maximum entries.

OL OG OC OA OP OR LO LG LC LA LP LR GO GL GC GA GP GR CO CL CG CA CP CR AO AL AG AC AP AR PO PL PG PC PA PR RO RL RG RC RA RP

Just create an INLINE LOAD for the list and you're set.

And you liked my second suggestion, which one?

edit:

This is how you can create your link table:

LOAD Sequence, mid(Sequence,iterno(),2) as Permut 
INLINE [
Sequence
OGLCAPR
OLGCAPR
] 
while IterNo() <= 6;

Instead of the INLINE, you can do a RESIDENT LOAD of your Sequence table.

richardouellett · ‎2015-06-18

Swuehl

Below is some sample code that you graciously assisted me with last week. As you can see I am calculating the time intervals as load time due to the complexity of the calculations and application impact. I would like to now perform the time durations based on the various time stamp permutations based on your above recommendation but am at a bit of a loss with how to proceed. Can you assist? Also my interval calculation is long and complex... is there an easier way to accomplish this?

DATA:

LOAD

recno() as ID,

Timestamp(today() + RAND()) as O,

Timestamp(today() + RAND()) as G,

Timestamp(today() + RAND()) as L,

Timestamp(today() + RAND()) as C,

Timestamp(today() + RAND()) as A,

Timestamp(today() + RAND()) as P,

Timestamp(today() + RAND()) as R

AutoGenerate 20;

X:

CrossTable (Field, Time) LOAD * RESIDENT DATA;

LEFT JOIN (DATA)

LOAD

ID,

Concat(Field, '',Time) as Sequence

Resident X

Group BY ID;

drop table X;

Drop fields ID;

Store DATA into Data.qvd (qvd);

Drop Table DATA;

DATA:

Load O,

G,

L,

C,

A,

P,

R,

Sequence,

Interval((If( MID(Sequence,7,1)='O',O, If( MID(Sequence,7,1)='L',L, If( MID(Sequence,7,1)='G',G, If( MID(Sequence,7,1)='C',C,

If( MID(Sequence,7,1)='A',A, If( MID(Sequence,7,1)='P',P,R))))))) - (If( MID(Sequence,1,1)='O',O, If( MID(Sequence,1,1)='L',L,

If( MID(Sequence,1,1)='G',G, If( MID(Sequence,1,1)='C',C, If( MID(Sequence,1,1)='A',A, If( MID(Sequence,1,1)='P',P,R)))))))

, 'hh:mm')*24 as i1_7,

Interval((If( MID(Sequence,2,1)='O',O, If( MID(Sequence,2,1)='L',L, If( MID(Sequence,2,1)='G',G, If( MID(Sequence,2,1)='C',C,

If( MID(Sequence,2,1)='A',A, If( MID(Sequence,2,1)='P',P,R))))))) - (If( MID(Sequence,1,1)='O',O, If( MID(Sequence,1,1)='L',L,

If( MID(Sequence,1,1)='G',G, If( MID(Sequence,1,1)='C',C, If( MID(Sequence,1,1)='A',A, If( MID(Sequence,1,1)='P',P,R)))))))

, 'hh:mm')*24 as i1_2,

Interval((If( MID(Sequence,3,1)='O',O, If( MID(Sequence,3,1)='L',L, If( MID(Sequence,3,1)='G',G, If( MID(Sequence,3,1)='C',C,

If( MID(Sequence,3,1)='A',A, If( MID(Sequence,3,1)='P',P,R))))))) - (If( MID(Sequence,2,1)='O',O, If( MID(Sequence,2,1)='L',L,

If( MID(Sequence,2,1)='G',G, If( MID(Sequence,2,1)='C',C, If( MID(Sequence,2,1)='A',A, If( MID(Sequence,2,1)='P',P,R)))))))

, 'hh:mm')*24 as i2_3,

Interval((If( MID(Sequence,4,1)='O',O, If( MID(Sequence,4,1)='L',L, If( MID(Sequence,4,1)='G',G, If( MID(Sequence,4,1)='C',C,

If( MID(Sequence,4,1)='A',A, If( MID(Sequence,4,1)='P',P,R))))))) - (If( MID(Sequence,3,1)='O',O, If( MID(Sequence,3,1)='L',L,

If( MID(Sequence,3,1)='G',G, If( MID(Sequence,3,1)='C',C, If( MID(Sequence,3,1)='A',A, If( MID(Sequence,3,1)='P',P,R)))))))

, 'hh:mm')*24 as i3_4,

Interval((If( MID(Sequence,5,1)='O',O, If( MID(Sequence,5,1)='L',L, If( MID(Sequence,5,1)='G',G, If( MID(Sequence,5,1)='C',C,

If( MID(Sequence,5,1)='A',A, If( MID(Sequence,5,1)='P',P,R))))))) - (If( MID(Sequence,4,1)='O',O, If( MID(Sequence,4,1)='L',L,

If( MID(Sequence,4,1)='G',G, If( MID(Sequence,4,1)='C',C, If( MID(Sequence,4,1)='A',A, If( MID(Sequence,4,1)='P',P,R)))))))

, 'hh:mm')*24 as i4_5,

Interval((If( MID(Sequence,6,1)='O',O, If( MID(Sequence,6,1)='L',L, If( MID(Sequence,6,1)='G',G, If( MID(Sequence,6,1)='C',C,

If( MID(Sequence,6,1)='A',A, If( MID(Sequence,6,1)='P',P,R))))))) - (If( MID(Sequence,5,1)='O',O, If( MID(Sequence,5,1)='L',L,

If( MID(Sequence,5,1)='G',G, If( MID(Sequence,5,1)='C',C, If( MID(Sequence,5,1)='A',A, If( MID(Sequence,5,1)='P',P,R)))))))

, 'hh:mm')*24 as i5_6,

Interval((If( MID(Sequence,7,1)='O',O, If( MID(Sequence,7,1)='L',L, If( MID(Sequence,7,1)='G',G, If( MID(Sequence,7,1)='C',C,

If( MID(Sequence,7,1)='A',A, If( MID(Sequence,7,1)='P',P,R))))))) - (If( MID(Sequence,6,1)='O',O, If( MID(Sequence,6,1)='L',L,

If( MID(Sequence,6,1)='G',G, If( MID(Sequence,6,1)='C',C, If( MID(Sequence,6,1)='A',A, If( MID(Sequence,6,1)='P',P,R)))))))

, 'hh:mm')*24 as i6_7

FROM

Data.qvd (QVD);

swuehl · ‎2015-06-18

If I've understood correctly, I think your interval calculations can be simplified if you perform them same place you calculate your sequence:

DATA:
LOAD
  recno() as ID,
  Timestamp(today() + RAND()) as O,
  Timestamp(today() + RAND()) as G,
  Timestamp(today() + RAND()) as L,
  Timestamp(today() + RAND()) as C,
  Timestamp(today() + RAND()) as A,
  Timestamp(today() + RAND()) as P,
  Timestamp(today() + RAND()) as R
AutoGenerate 20;

X:
CrossTable (Field, Time) LOAD * RESIDENT DATA;

LEFT JOIN (DATA)
LOAD
  ID,
  Concat(Field, '',Time) as Sequence,
  (min(Time,7)-min(Time,1))*24 as i1_7,
  (min(Time,2)-min(Time,1))*24 as i1_2,
  (min(Time,3)-min(Time,2))*24 as i2_3,
  (min(Time,4)-min(Time,3))*24 as i3_4,
  (min(Time,5)-min(Time,4))*24 as i4_5,
  (min(Time,6)-min(Time,5))*24 as i5_6,
  (min(Time,7)-min(Time,6))*24 as i6_7   
Resident X
Group BY ID;

drop table X;

LOAD Sequence, mid(Sequence,IterNo(),2) as Permut, 1 as Count
RESIDENT DATA
WHILE iterno() <= 6;

I also added the link table to your permutations (and not shown here, an INLINE table of all permutations, see attached QVW).

Does this allow to perform the analysis you want to do?