Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Replace count distinct by sum : bad results

Hello,

I have attached an example.

I want to test to replace count distinct big expression with many if conditions by sum with if conditions to compare calculation speed.

I used a table with my data (TMP) and an column containing Id

I have added a table with only the distinct Id and a column flag.

But with replacing my count distinct by a sum of the flag column, I have bad results.

Could you explain me what is the problem?

Cause to the filter conditions I can't use the peek style load script because the first line with the Id which obtain the flag may have other fields values which validate only certain condition and not others. But i want my count distinct evaluate expressions dynamically on selected data.

That why I have attempted to use a table (TMPDistinct)

to associate one single Id value to one single flag 1.

Regards,

1 Solution

Accepted Solutions
swuehl
MVP
MVP

You could try something like this (if you are limiting your evaluation to a certain date):

=If(

     CategoryTtl<>'Total'

     ,sum({$<Date={'$(=vDate)'}>} aggr(if(CategoryTtl=Category,  flag),CategoryTtl,Category,Date,Id)) 

      ,sum({$<Date={'$(=vDate)'}>} aggr(flag,CategoryTtl, Date, Id))

)

View solution in original post

13 Replies
Not applicable
Author

I have done some other test with formulas in attached application.

I don't know if the thing I want is possible:

use the current line value of the "fake" dimension CategoryTtl as value condition on the true aggregated sum  on field Category in the "Set analysis" formula.

The "best" I have done is the yellow column with an explicit if switch. I don't think it's really "better" concerning performance...

And with the If conditions I don't understand why all my Sum result are wrong.

swuehl
MVP
MVP

Hi user5674,

as soon as you use something like

=sum(If(Date=vDate,flag))

in your chart with dimension Category, you tell QV to go through your table containing Date and return flag for every line where Date=vDate.

For example, for your chosen date and Cat A, there are three lines. But you are only interested in the distinct two lines.

If you use the set analysis version,

=sum( {$<Date={'$(=vDate)'}>} flag)


you limit the possible values for ID for Cat A to 1 and 4 by your set expression, so if you do the sum over flag, only two values of 1 are summed up (again, QV will look at each line of your flag table, but here there are only two lines to be regarded).

This sounds a bit complicated and I think it is. I hope I made my point a bit clear.

Regards,

Stefan


Not applicable
Author

Hi,

Thanks for your answer.

I understand what is the problem on the simple Sum(if(...)) now. It take all lines of the TMP table validating the condition instead of all lines of the flag table (TMPDistinct ).

The first "set analysis" on the left pivot table (Sum set analysis) seems give the correct results.

But I don't know why the "set analysis" don't take account of the selection. if I select the 20/05/2012 the column :

-first table: Sum set analysis

-second table: Sum set analysis with explicit if-switch

remain the same but I haven't the condition Date=vDate (=25/05/2012) validated as Date=20/05/2012.

So why I haven't "0" ?I have written the "$" though in order to use current selection.

I haven't solution to have the good result on my second table with the "fake" dimension "CategoryTtl".

For this one I'm not sure if  it's possible to heve the correct results without the horrible If/Switch conditions: I think a count distinct with the condition if(Category=CategoryTtl,...) is less heavy in calculation time, so it's meaningless to use set analysis with this method.

swuehl
MVP
MVP

>So why I haven't "0" ?I have written the "$" though in order to use current selection.

Sure, you are using the current selection set, but you override your selection in Date by using set modifier

<Date={'$(=vDate)'}>}


and vDate equals 25/05/2012, right?

Looking at your second question using the data island, may I ask why do you want to use a data island here?

I think you can make your expression work like:

=If(

    CategoryTtl<>'Total'

    ,sum({$<Date={'$(=vDate)'}>} aggr(if(CategoryTtl=Category,  flag),CategoryTtl,Category,Date,Id))  

     ,sum({$<Date={'$(=vDate)'}>} flag)

)

Also, maybe this is of use to understand how QV handles set analysis in dimension context:

http://community.qlik.com/docs/DOC-1335

Regards,

Stefan

Not applicable
Author

About the <Date={'$(=vDate)'}>} I misunderstood set analysis here. I thought that was just a sort of filter on current selection, not a forced selection which override current selection.

But for this type of formula this isn't a big problem here because the variable is automatically updated by macro in the real application when I change the date field.

Concerning the goal of the data island this is related to this subject: http://community.qlik.com/message/221295

I have a total aggregated over a "fake" dimension label in order to have two total rows with distinct formula.

I have deleted the two total rows in this dicussion because it isn't directly related to the subject.

Regards,

Sylvain

Not applicable
Author

Hello Stefan,

Thanks for the formula

=If(

    CategoryTtl<>'Total'

    ,sum({$<Date={'$(=vDate)'}>} aggr(if(CategoryTtl=Category,  flag),CategoryTtl,Category,Date,Id))  

     ,sum({$<Date={'$(=vDate)'}>} flag)

)

It return good dynamic results on "normal" rows, but the total row isn't dynamic (always "4" and not "0" when the 2012-05-20 date is selected). I have tried to change as sum({$<Date={'$(=vDate)'}>} aggr(flag,Date) ) but it don't give me the good result.

Have you a solution for this ?

Regards,

Sylvain

swuehl
MVP
MVP

You could try something like this (if you are limiting your evaluation to a certain date):

=If(

     CategoryTtl<>'Total'

     ,sum({$<Date={'$(=vDate)'}>} aggr(if(CategoryTtl=Category,  flag),CategoryTtl,Category,Date,Id)) 

      ,sum({$<Date={'$(=vDate)'}>} aggr(flag,CategoryTtl, Date, Id))

)

Not applicable
Author

Thanks!

It solves my problem.

Not applicable
Author

Hi,

I have a question on the method sum( {SET} aggr(, Dimensions) 😞

I must add each field used in the set analysis formula as a dimensions of the aggr?

I have many "IF" condition with text field range exclusion and date field range inclusion to replace by a set analysis formula in order to boost performances in my true application.

If I must add them, I think I don't optimize but decrease performance as I aggregate over the ID I want count, and other fields.

My full expression is like this:

Count(DISTINCT
IF(
  [Category]=[CategoryTtl]
  AND [Period]=vLastPeriod
  AND Class='B'
  AND ((vActive=0 AND Action='Y') OR vActive=1)
  AND CStatus<>'SOLVED'
  AND CStatus<>'CANCELED'
  AND (not isnull([CStatus))
  AND CStatus<>''
  AND Record.Date<=vLastDate
  AND Record.Date>vLastDate_1
  ,FlagDistinctID
)
)


I have set this expression but it doesn't work (only 0 on each row) :

(variable)

Sum(

    {$<
      Period={'$(=vLastPeriod)'}
      ,Class={'B'}
      ,CStatus - = {'SOLVED','CANCELED',''}
      ,"Record.Date"={'>$(=vLastDate_1)<=$(=vLastDate)'}
     >}

     aggr(
          IF(
               [Category]=[CategoryTtl]
               AND ((vActive=0 AND Action='Y') OR vActive=1)
               ,FlagDistinctID
          )
          ,[CategoryTtl],[Category]
      )
)

My goal is to decrease expressions calculation time which is too long.

I can't do something as sub aggregates/fields on the load script because I have many expressions like this one and my application have many selectable field for users, so my expressions must remain dependant of users selections.

And It's to difficult to estimate all possible sub aggregates to flag distinct ID in the load script.

( expression A->distinct ID, expression B->distinct ID,...)

Could you help me ?