difference in meaning in (set) expressions

DutchArjo · ‎2024-03-22

For a Qlik sense training course I have created a gauge displaying the ratio of males as a percentage of the total population. Because I wanted to have a benchmark, I set a limit which is dynamic with the use of a set-expression.

Whenever I select a different age group, the gauge changes to display a new number.

I have created a master item (measure) for this ratio and added this expression:

count(distinct(if(Gender='Male',PatientID)))
/
count(distinct(PatientID))

I don't know why but I thought of finding out a different way to write this expression. I tried some things and finally ended up with this:

count({1<Gender={'Male'}>} distinct(PatientID))
/
count(distinct(PatientID))

Can I conclude that both expressions actually do the same and performance-wise: does one way have an advantage over the other?

BrunPierre · ‎2024-03-22

The second expression using set analysis may offer better performance, particularly with a larger data set, even though both will produce the same result.

The first expression utilizes a conditional IF statement to filter males before counting.

marcus_sommer · ‎2024-03-22

Your second approach of: {1<Gender={'Male'}>} is called set analysis and in general the better method because it worked like a selection against the field which only contained distinct values - just flagging the values with TRUE/FALSE in dependency to the selection state.

Your first method of: if(Gender='Male',PatientID) is a if-loop and is much more worse from a performance point of view because it will check the condition for each single record.

Beside this is a set analysis much more powerful because you could change/overwrite the selection state in many ways like you already did with the 1 as set identifier. The if-loop instead will always be following the selection state directly. It needs to be only applied if the conditions needs to be checked against a row-level.

DutchArjo · ‎2024-03-22

However, When looking at the expressions again, I am wondering: the numerator has the set analysis, but the denumerator has not. So the denumator changes for every selection causing a completely wrong result I assume.

Therefore, the total combination of the expressions should be like this?
Male ratio Expression:

count({$<Gender = {'Male'}>} distinct(PatientID))

/

count(distinct(PatientID))

Male Ratio benchmark expression:

Count({1<Gender = {'Male'}>} distinct(PatientID))

 /

Count({1} distinct(PatientID))

And while I created separate master items for both the male and female ratio, but also the male and female benchmark, I was thinking: What would be faster/better?

1. have both expressions for the benchmarks.

2. 1 benchmark expression and the second one derived from the first? Because with only 2 possible options, the second benchmark for example is 1 minus the first benchmark.

(but only with 2 possible options of course)

marcus_sommer · ‎2024-03-25

If you could use an expression like 1 - count() it would be faster as count() / count() but in many scenarios you won't notice the differences of the calculation time.

Personally I'm not so sure that you have here only two options because there might be already currently or in the future any kind of X-value between both "classical" genders respectively '<NONE>' and also the possibility to have issues with the data-quality and the values are written differently or might be NULL.

If any set analysis needs to be included within the numerator as well as within the denumerator depends on the aimed view - should it in some way be independent to the selection state or not? Various views might be intended but defining a fix set analysis may prevent them respectively returning incorrect results and/or forcing you to create multiple versions.

In this regard I'm not sure that using master-measures for it is a good idea - at least not from a developing and maintaining point of view. Ideally an application has only simple sum() and count() expressions and the user selects directly the wanted views. Adding a set analysis (respectively any other kind of condition) to them should be only done if there are specific requirements but not as a general approach.

Further the most expressions shouldn't be moved into master-measures and/or (parallelized) variables because it always adds some complexity. Only if there is a higher degree of redundancy and/or complexity of the expression and/or they is really usable to combine and/or nesting them with other expression parts they could have an added value. Only if master-measure and variables really simplify the matter they should be used - otherwise it's a detour.

DutchArjo

Yes, in case of gender, nowadays it is not limited to 2 possible options anymore. But let's say in a 1 or 0 or yes or no field (using a validation list), the '1 minus' solution would be the more efficient option if I understand you correctly.

The main reason I made these expressions as master-item is because of the course I'm following to learn/better understand Qlik sense. And that day the subject 'creating master items' was on the agenda 😉

marcus_sommer

Numeric values will be in general need less storage and processing as strings but by just two respectively only a few values it won't be significantly (Qlik stored only the distinct values).

Quite useful in many cases is to use dual-values, like:

pick(match(Gender, 'male', 'female', 'X') + 1,
dual('no value', 0), dual('male', 1), dual('female', 2), dual('X', 3)) as GenderFlag

to provide descriptions for ID/Flag-values or maybe adding a customized sorting to strings and making it possible to query them as strings as well as numbers, for example:

sum({< GenderFlag = {'female', 'male'}>} Value)

respectively

sum({< GenderFlag = {">0<3"}>} Value)

Another approach might be to create multiple flags, like:

-(Gender='male') as MaleFlag

and returning 0/1 (-1 is reversed) and then this flag might be applied like above in the set analysis but also like:

sum(Value) * MaleFlag

I never worked with master-items but like above hinted I have serious doubts that it would be sensible to place nearly everything in them - even if they are reusable between several applications and could be globally maintained because it's additionally layer which costs some overhead.

expression

Qlik Sense

Set Analysis