Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Aggregation problem

Hi all,

I have data as QUARTER, MONTH, RESID. For each RESID I need to count its occurrences in the data set but only if it occurs more than, say, 5 times. I have figured out how to accomplish this when counting all occurrences of RESID with the above mentioned condition by using the following expression:

=sum(aggr(nodistinct if (count(RESID)>5, 1, 0), MONTH, RESID))

However, if I remove the nodistinct keyword (or change it to distinct) to count distinct RESIDs instead of all occurrences I get very odd results as many RESID that do occur more than five times don't get any value on RESID level and thus sum up as zero.

Edit: I know it should not be zero as I also have another expression in the pivot showing the number of occurrences ( =count(RESID) ).

This might be something rather basic as I feel I haven't really grasped some of concepts of QV that behave differently from more traditional products...

Thanks in advance,

Esa

15 Replies
Not applicable
Author

Hi Esa,

I tried it with a dataset I made up, and I thought it was working. But, may be, I did not understand exactly what you are asking.

Anyway, if you like to have a look at my test application, it is attached. It is trying to find out the number of RESIDs whose COUNTs are greater than 2 for a given YEAR, QTR, MONTH combination.

Best regards,

Not applicable
Author

Hi Idaydin,

I checked out your dataset and it would seem to me that with it you get actually too little number for 2008/Q1, as it should be two (both bba and fba qualify) but at least my QV shows the result as only one.

The ultimate difference between your data and my data is that in my data most of the resources exceed the limit every month. See what happens if you add three entries for RESID fba to months jan, mar and apr on 2008. With the initial data you get 3 for the yearly total and one and two for Q1 and Q2 respectively. The resources behind the numbers are bba, dba and fba. With the altered data (which does not add any active resources) you get 6 for yearly total and 3 for both quarters even though none of the figures should have changed (fba was qualified for counting already with the initial data).

Br,

Esa

Not applicable
Author

hi,

I tried another solution. It is attached.

Best regards,

http://quickdevtips.blogspot.com

Not applicable
Author

Hi Ibaydin,

That seems like a great solution judging from your test project, but so far I've had trouble incorporating that in my project. Could this be rather heavy solution as my data size is around 1,7 million rows and after getting all pieces right qv.exe is running 100% on this 16-core piece of HW after almost 20 minutes it does still go on.

Anyway, that was a fresh angle at the task and I learned some new things here, again. Nice. I'll go and double-check my work with a limited data set.

Edit: BTW, is there a specific reason (other than simple typo) for the count expression being

=pick(Dimensionality(),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula),$(=vLevel4Formula))

and not

=pick(Dimensionality(),$(=vLevel1Formula),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula))

Edit 2: Yeah, got it working, and it seems like this approach is very sensitive to the amount of data. With <30000 rows there's no perceivable delay, but with 150 000 rows it already takes several seconds. Unfortunately it's not going to work for our data.

Br,

Esa

Not applicable
Author

Seems like the slowness was due to one additional dimension field that came from an joined table. During my attempt to pre-calculate the conditional counts in load script I moved that field to the same table as the rest of the data and suddenly the solution with pick(Dimensionality(),...) worked like a charm.

Thanks a lot Ibaydin for your help!

Br,

Esa

Not applicable
Author

Thanks for the update. I am glad that it worked.

Best regards,