15 Replies Latest reply: May 16, 2012 3:16 PM by B Aydin RSS

    Aggregation problem

    Esa Tikka

      Hi all,

       

      I have data as QUARTER, MONTH, RESID. For each RESID I need to count its occurrences in the data set but only if it occurs more than, say, 5 times. I have figured out how to accomplish this when counting all occurrences of RESID with the above mentioned condition by using the following expression:

       

      =sum(aggr(nodistinct if (count(RESID)>5, 1, 0), MONTH, RESID))
      

       

      However, if I remove the nodistinct keyword (or change it to distinct) to count distinct RESIDs instead of all occurrences I get very odd results as many RESID that do occur more than five times don't get any value on RESID level and thus sum up as zero.

      Edit: I know it should not be zero as I also have another expression in the pivot showing the number of occurrences ( =count(RESID) ).

       

      This might be something rather basic as I feel I haven't really grasped some of concepts of QV that behave differently from more traditional products...

       

      Thanks in advance,

       

      Esa

        • Aggregation problem
          Bin Xue

          Hi Esa,

           

          A simple example would illustrate what's the difference between distinct and nodistinct aggr. Say if we have a table

          Table1:

          key, value

          1,a

          1,b

          2,c

          2,d

          3,a

           

          aggr(distinct count(value),key) would produce the following output

          key, distinct count(value)

          1,2

          2,2

          3,1

           

          aggr(nodistinct count(value),key) would produce the following output

          key, nodistinct count(value)

          1,2

          1,2

          2,2

          2,2

          3,1

           

          If you do not specify distinct or nodistinct keyword, distinct would be assumed. That's why when you remove your nodistinct keyword, the sum would be different as a different table is produced by aggr function. Hope it helps.

           

          Regards,

          Xue Bin

          • Re: Aggregation problem
            Henric Cronström

            I assume that count(RESID)>5 should be defined globally and not just within each grouping defined by the group by fields? If so, I suggest you put the if function outside the aggregation:

             

            if(count(RESID)>5, <aggregation>, null())

             

            Secondly, you have it in a pivot table. If you have Month and RESID as dimensions, then I do not understand why you use the Aggr()-function. It would work straight off with

             

            if(count(total RESID)>5, count(RESID), null())

             

            HIC

              • Aggregation problem
                Esa Tikka

                I do want my counting to be defined on group level, so that the user can see the number of RESIDs on the desired level. The purpose is to follow the activity of resources on yearly, quarterly and monthly level, and to see the number of resources that have more than minimal amount of activity.

                 

                I have RESID as a dimension merely for the purpose to be able to verify my results, in the final version I will drop it off and just have the (time) dimensions plus the number of individual resources that have had more than minimal amount of acitivty within the time period the grouping.

                 

                Trying

                 

                =if(count(total RESID)>5, count(RESID), null())

                 

                produces exactly the same results as having just

                 

                =count(RESID)

                 

                but without "total"

                 

                =if(count(RESID)>5, count(RESID), null())

                 

                it does implement what I want, but still it does not solve the requirement of counting only distinct RESIDs fulfilling the condition. That is why I ended up putting "1" instead of "count(RESID)" in the THEN part of the IF clause, and that's probably why I ended up messing around with the aggregation (to get all the 1s added up).

                 

                I'll keep on trying to wrap my brain around this. I might have already solved this in Cognos (which I have more experience on) and I guess the biggest obstacle I have is my failure at forgetting the Cognos way of doing things...

                 

                Br,

                Esa

                  • Re: Aggregation problem
                    Henric Cronström

                    Then you should try =if(count(RESID)>5, count(distinct RESID), null())

                     

                    HIC

                      • Aggregation problem
                        Esa Tikka

                        Henric Cronström wrote:

                         

                        Then you should try =if(count(RESID)>5, count(distinct RESID), null())

                         

                        HIC

                        Unfortunately that results the same numbers as with having just

                        =count(distinct RESID)

                        thus disabling the condition count>5 from the aggregation. That's exactly what I had before I received the additional requirement of weeding out those resources that have minimal activity.

                        So basically my problem is how I get that additional condition to be applied to all aggregation.

                         

                         

                        I'll try to attach an annotated pic with screen copies from the report. In those,

                        "Assigned resources"=if(count(RESID)>5, count(distinct RESID), null())

                        and

                        "Assignments"=count(RESID)

                        http://onelove.doesntexist.org/~etikka/extra/sample_results1.GIF

                        In the upper part of the pic you see that for the month "huhti" none of the resources have over 5 assignments (thus count(RESID)<=5 meaning do not count those).

                         

                        In the lower part where the months have been collapsed "Assigned resources" results 3 for month "huhti" although it should be zero/null. In fact, the results are quite odd since the resource with the ID starting "f9a..." should not be counted at all even on the quarter level as there are not enough assignments (occurrences in the data) to qualify that.

                         

                        Thanks for your time anyway.

                         

                        Br,

                        Esa

                          • Re: Aggregation problem
                            Esa Tikka

                            I guess that from the screenshot it becomes clear that my issue is actually caused by having the expression evaluated always on the grouped numbers, not the data on the detail level or even on the lowest level in the dimension hierarchy in the pivot.

                             

                            Could someone please tell me if the thing I'm trying to do is somehow possible to accomplish directly in QV or is it implicitly "forbidden" to do such things with this tool?

                             

                            Browsing the docs it seems that grouping data is possible in load scripts so should I there calculate the values for some base interval and then just use that in the pivot?

                             

                            Br,

                            Esa

                             

                            Message was edited by: Esa Tikka after reading the manual...

                            • Aggregation problem
                              B Aydin

                              Hi,

                               

                              In the pivot tabIe (which is seen in your screenshots above), could you try the following expression for 'Assigned resources'?

                               

                              =sum(

                                   if(aggr(count(ResID)>5, Year,Quarter,Month,ResID)=-1,1,0)

                              )

                               

                               

                              Regards,

                                • Re: Aggregation problem
                                  Esa Tikka

                                  Ibaydin,

                                  On ResID level that works ok, but on Quarter and Year levels it gives way too high values, which I think is because the same ResID is counted many times. I don't quite know how the aggregation works in QV but I would guess that the same ResID occurs under multiple Months thus affecting Quarter level, and the same thing happens also between Quarter and Year.

                                   

                                  But thanks for providing something that works at least with limited level of grouping. I just need to put the different groupings to different pivots - that's a bit ugly but at least we now get figures that look sane. I need to do some double-checking, of course, but surely that looks promising!

                                   

                                  Br,

                                  Esa

                        • Re: Aggregation problem
                          B Aydin

                          Hi Esa,

                           

                          I tried it with a dataset I made up, and I thought it was working. But, may be, I did not understand exactly what you are asking.

                           

                          Anyway, if you like to have a look at my test application, it is attached. It is trying to find out the number of RESIDs whose COUNTs are greater than 2 for a given YEAR, QTR, MONTH combination.

                           

                          Best regards,

                            • Re: Aggregation problem
                              Esa Tikka

                              Hi Idaydin,

                              I checked out your dataset and it would seem to me that with it you get actually too little number for 2008/Q1, as it should be two (both bba and fba qualify) but at least my QV shows the result as only one.

                               

                              The ultimate difference between your data and my data is that in my data most of the resources exceed the limit every month. See what happens if you add three entries for RESID fba to months jan, mar and apr on 2008. With the initial data you get 3 for the yearly total and one and two for Q1 and Q2 respectively. The resources behind the numbers are bba, dba and fba. With the altered data (which does not add any active resources) you get 6 for yearly total and 3 for both quarters even though none of the figures should have changed (fba was qualified for counting already with the initial data).

                               

                              Br,

                              Esa

                                • Re: Aggregation problem
                                  B Aydin

                                  hi,

                                   

                                  I tried another solution. It is attached.

                                   

                                  Best regards,

                                   

                                  http://quickdevtips.blogspot.com

                                    • Re: Aggregation problem
                                      Esa Tikka

                                      Hi Ibaydin,

                                      That seems like a great solution judging from your test project, but so far I've had trouble incorporating that in my project. Could this be rather heavy solution as my data size is around 1,7 million rows and after getting all pieces right qv.exe is running 100% on this 16-core piece of HW after almost 20 minutes it does still go on.

                                       

                                      Anyway, that was a fresh angle at the task and I learned some new things here, again. Nice. I'll go and double-check my work with a limited data set.

                                       

                                      Edit: BTW, is there a specific reason (other than simple typo) for the count expression being

                                      =pick(Dimensionality(),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula),$(=vLevel4Formula))

                                      and not

                                      =pick(Dimensionality(),$(=vLevel1Formula),$(=vLevel2Formula),$(=vLevel3Formula),$(=vLevel4Formula))

                                       

                                      Edit 2: Yeah, got it working, and it seems like this approach is very sensitive to the amount of data. With <30000 rows there's no perceivable delay, but with 150 000 rows it already takes several seconds. Unfortunately it's not going to work for our data.

                                       

                                      Br,

                                      Esa

                                      • Re: Aggregation problem
                                        Esa Tikka

                                        Seems like the slowness was due to one additional dimension field that came from an joined table. During my attempt to pre-calculate the conditional counts in load script I moved that field to the same table as the rest of the data and suddenly the solution with pick(Dimensionality(),...) worked like a charm.

                                         

                                        Thanks a lot Ibaydin for your help!

                                         

                                        Br,

                                        Esa