Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Angazi
Contributor II
Contributor II

Split strings by delimiter into multiple subfields similar to how CSV does it but using Qlik

Hi,

I have some data exported from Shopify which is effect is a whole bunch of comma delimited tag words of a particular item but stored in a single field. In practice its something like a Tshirt in size Medium where a customer has picked multiple single items of the various colors on sale and has purchased a multi color pack of goods. I want to split up this field (which can be any length) into separate columns in a Qlik table, similar to what would happen if you imported the field as if it were a CSV without the " " around the string when it gets split into separate columns. I'm using Qlikview here.

Some example data below:

[Table1]:
LOAD * INLINE [
Order, ColorList
#43308, "White,White,White,White,White,White"
#43327, "Powder Beige,Powder Beige,Powder Beige,Powder Beige,White,White,White"
#43329, "Black,Black,Black,Black,Powder Beige"
#43335, "Powder Beige,Powder Beige,Powder Beige,Powder Beige,Powder Beige"
#43342, "Black,Black,Black,Midnight Navy,Midnight Navy,Midnight Navy"
#43346, "Black,Black"
#43347, "Black,Black,Black,Black,Black,Black,Black"
#43349, "Black,Black,Black,Black,Black,Black,Black"
#43354, "Powder Beige,Powder Beige,Powder Beige,Black,Black"
#43354, "Black,Cherry,Black"
#43361, "Blood Stone,Blood Stone,Midnight Navy,Midnight Navy,Black,Black,Black"
];

Ideally I want to display the above data as:

Order       White    PowderBeige   Black   ... etc.

#43308        6

#43308        3                 4

#43329                           1                  4

... etc.

If the formatting above doesnt get mangled when I post this I hope you'll get the idea. I simply want to create additional columns using the unique names of the colors as the title of the additional columns and then put the count of the number of times that color name is repeated per order line in the ColorList string(field).

With help from other posts in this forum 

https://community.qlik.com/t5/QlikView-App-Dev/Split-String-Function-In-QlikView/td-p/221912

https://community.qlik.com/t5/QlikView-App-Dev/How-to-get-Subfield-value-count/td-p/488174

https://community.qlik.com/t5/New-to-Qlik-Analytics/SUBFIELD-and-CountIF/td-p/1786994

and https://www.bitmetric.nl/blog/qlik-sense-subfield/ 

its fairly straightforward to determine the unique items in the ColorList strings and also to count them, and then using set expressions and manually creating a Table report column per unique color name achieve a result.

For example by doing this:

LOAD Order, SubField(ColorList, ',') AS UniqueColor RESIDENT [Table1];

LOAD Order, SubStringCount(ColorList,',')+1 AS ColorCount RESIDENT [Table1];

However, the caveat is this isnt easily scalable. Doing it with 10 items is easy, but when I have a list of 500+ colors and 50000 records I need to find a way to do this for much longer comma delimited strings and have a way of in script splitting up this string into separate fields or a subtable of the color list items which I can then further manipulate as desired.

This expression works (obviously I change the color name in the {' '} section for other colors in additional expressions per color but this doesnt scale well so its a lot of manual model front end maintenance to keep this updated.

=Count({<UniqueColor={'Black'}>}SubField(ColorList,','))

Can anyone suggest a way I can do this in Qlik code (or some other method that's generic and automated) for different delimited color names (or any other tags) in the loaded string automatically in Qlik.

Thanks

Labels (3)
1 Solution

Accepted Solutions
rodrigo_martins
Partner - Creator
Partner - Creator

Hello @Angazi 
Yes, I did mean tall table, sorry about that!

In my example I built the Pivot Table with the same model as yours (up to [Table2]), and the following settings (it's in pt-br):

rodrigo_martins_0-1715197653644.png

Another way to achieve the same result would be to create a third table in your model, grouping the orders and colors and counting the number of colors in each group, like this:

Table2:
Load
Order,
SubField(ColorList, ',') as Colors
Resident Table1;

Drop Table Table1;

Table3:
Load
SubField(KeyOrderColor, '|', 1) as Order,
SubField(KeyOrderColor, '|', 2) as Color,
Qty;
Load 
Order & '|' & Colors as KeyOrderColor,
Count(Colors) as Qty
Resident Table2 Group By Order & '|' & Colors ;

Drop Table Table2;

The resulting model would look like this

rodrigo_martins_1-1715197817436.png

The Pivot Table would then need to be modified to:

rodrigo_martins_2-1715197834921.png

(it can be Max, Min, Sum, as there will only be a single record for the key order + color)

Hope that helps!

View solution in original post

4 Replies
rodrigo_martins
Partner - Creator
Partner - Creator

Hi @Angazi ,

I'm not sure I understand exactly what you need. If you have already tested the alternative I am going to present, I am sorry.
From your example model, just run this script:

Table2:
Load
Order,
SubField(ColorList, ',') as Colors
Resident Table1;

To have Qlik automatically generate this data model:

rodrigo_martins_0-1715006838266.png

 

In this scenario, working with many columns (an undefined number) would not be ideal. The best way is actually a "high" table, which is already the result offered by Qlik. With this table you can create the visualization you presented in your question (the pivot table generates the columns automatically):

rodrigo_martins_1-1715006870854.png

On the other hand, if you need the columns in your data model, you can try using the Generic prefix in the result of the first transformation. It does the opposite of a Crosstable, generating tables for different values ​​from the 2nd dimension of the original table. The disadvantage is that it generates a table for each value found, which will leave your model very cluttered (500+ tables):

rodrigo_martins_2-1715006979740.png

You can even work on joining these tables, but in that case you will still be stuck with a "fixed" list of colors. As for performance, I believe that the first option (without using Generic) is better, but the ideal is to always test the available options and compare.

Hope this helps.

Angazi
Contributor II
Contributor II
Author

Hi Rodrigo,
Thank you very much for your prompt response and detailed explanations.
 
I am unclear by what you mean by a "high" table - is that the same thing as a "tall" table as described here?
 
I have tried in the past 2 days various script permutations using crosstable and generic prefixes but have not had any success. My crosstable syntax produces incorrect results and generic gives me many tables but they contain multiple color names per named table which is also not what I'm expecting.
 
I have made a little progress in the following direction however,
If I take your expression, which appears to be the same as my "UniqueColor" example.
 
Table2:
Load Order, SubField(ColorList, ',') as Colors Resident Table1;
 
I see this, which is helpful because I am getting the Colors - the colorname - repeated multiple times per order - which makes sense because in my original ColourList data this colorname was repeated that many times separated by the comma delimiter, so then in theory if I can count the number of times a particular colorname repeats per order# I have something useful to start with.
 

Angazi_4-1715190665330.png

 
Table3:
LOAD Colors, TextCount(Colors) as ColorTextCount Resident Table2 Group By Colors;
 
I see this which seems to show a correct TOTAL count of the colornames across all the ColorList column data in the example INLINE sample data I provided initially.
 

Angazi_5-1715190745223.png

So now while I have gotten to a total above which is great, I'd also like to have an intermediate sum of the count of unique colornames per individual order#.
 
Like I think you were suggesting I need to use a Pivot table for this. However, clearly I'm missing something fundamental and/or my aggregation logic is broken, because the below isnt getting me the result I want and when I add Colors as a dimension the results look even worse.
 

Angazi_6-1715190824404.png

I'm trying to get to the view you showed like this:
 

Angazi_7-1715190844021.png

Thanks
rodrigo_martins
Partner - Creator
Partner - Creator

Hello @Angazi 
Yes, I did mean tall table, sorry about that!

In my example I built the Pivot Table with the same model as yours (up to [Table2]), and the following settings (it's in pt-br):

rodrigo_martins_0-1715197653644.png

Another way to achieve the same result would be to create a third table in your model, grouping the orders and colors and counting the number of colors in each group, like this:

Table2:
Load
Order,
SubField(ColorList, ',') as Colors
Resident Table1;

Drop Table Table1;

Table3:
Load
SubField(KeyOrderColor, '|', 1) as Order,
SubField(KeyOrderColor, '|', 2) as Color,
Qty;
Load 
Order & '|' & Colors as KeyOrderColor,
Count(Colors) as Qty
Resident Table2 Group By Order & '|' & Colors ;

Drop Table Table2;

The resulting model would look like this

rodrigo_martins_1-1715197817436.png

The Pivot Table would then need to be modified to:

rodrigo_martins_2-1715197834921.png

(it can be Max, Min, Sum, as there will only be a single record for the key order + color)

Hope that helps!

Angazi
Contributor II
Contributor II
Author

Aha, thank you, I was over complicating things, your approach is much simpler than what I was trying and more importantly works perfectly🙂 In en-za I get to this in Qlikview:
 

Angazi_2-1715269422706.png

Trying the table3 approach, I get this which is also as expected.
 

Angazi_3-1715269440757.png

Excellent, I'm pretty sure I can scale this concept generically across all the excessive comma delimited "tag" fields Shopify data exports insist on frustrating me with so that I can build proper models that make much more sense to me by using Qlik.
 
Wonderful, thank you very much for your superb assistance - this is a great solution. Cheers