I also looked into the document. It is indeed a very interesting challenge, but I'm a bit confused about the correct/wanted result.
Could you repost the Excel file after adding a column with the correct/wanted average per line? Then I can see where it changes and how it is calculated.
Most of the time an Excel file with the correct data brings 50% of the solution to these 'challenges'.
Thanks for the interest in the challenge.
I have attached a new Excel document with some of the working. Note that in this example I am only checking for points above the line; however, if we can resolve the issue for points above the line, we will be able to apply it to points below the line as well. Please also note that a lot of this is done manually in my example, e.g. having to identify the 9th point and then restarting the running average in a new column. Hopefully this makes the desired outcome clearer. Just let me know if there is anything else.
Please note that we need to be comparing each data point to a RUNNING average.
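To make the "running average" comparison concrete, here is a minimal sketch in Python (not QlikView script, just to illustrate the logic). It assumes the running average at a row is the mean of all values up to and including that row; the function names and the sample numbers are made up for illustration.

```python
from itertools import accumulate


def running_average(values):
    """Cumulative mean after each row: mean of values[0..i]."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


def side_of_average(values):
    """+1 if a point is above the running average at its own row,
    -1 if it is below, 0 if it is equal."""
    averages = running_average(values)
    return [(v > a) - (v < a) for v, a in zip(values, averages)]


data = [90, 92, 88, 95]  # made-up sample values
print(running_average(data))
print(side_of_average(data))
```

Each point is compared with the average as it stood at that row, not with the overall average of the whole data set.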
Data Average Changes V2.xls 74.0 K
I think the Excel-solution is not far away.
Attached is an Excel file with the same results as you gave me, but I still have to copy 2 columns and manually activate the formula.
I don't think this is a good moment to look into the Excel file, but could you give me a new list of data (only Date + Data is OK, no chart, no correct figures)? I will use the formulas to predict the results. After that it will be up to you to check them, and if they are OK I think we can continue to QlikView.
PS: I don't think it is possible to solve the challenge with formulas. I suppose we will have to use the script, and that could slow down the load. Are we talking about a great number of records?
PS2: Erica, if you find the solution first, please do NOT say it was easy, it would ruin my week
Data Average Changes V3.xls 98.5 K
This is definitely another step closer.
I have attached another spreadsheet with new data. The first tab is the same data as before, but with no chart and extra columns. On the second sheet I have modified the records in the Data column.
The data sets will not be large, so load times should not be an issue. My only concern with doing this in the script is that the chart will not be dynamic, for example if we wanted to filter for a particular year or region. However, if we can do it in the script, that is still better than doing it manually in Excel.
New Data Average Changes.xls 37.5 K
Here's a solution in script, I think, using some random data, but it proves the method.
// Sample Data - row order will need to be accurate to ensure correct calculation later
Data1:
Load
    Rowno() & ':' & ID & ':' & Value as UID,
    *
inline [
ID, Value
...
];
Let SeedSample = 3; //specify how many rows to sample to start off average
Let RunPeriod = 8; //specify how many consecutive periods should trigger a change
FirstAvg:
Load Avg(Value) as SeedAvg resident Data1 where RecNo() <= $(SeedSample);
Let Seed = peek('SeedAvg');
Drop Table FirstAvg;
let runningAvg = 0;
let runningSum = 0;
let AvgSequence = 0;
let LastAvg = 0;
let seq = 1;
let AvgRun = 0;
//Loop over each row
for r = 1 to NoOfRows('Data1')
    let runningAvgPrev = runningAvg;
    // the Value is the third ':'-separated part of UID
    let runningSum = (runningSum + SubField(FieldValue('UID',$(r)),':',3));
    let runningAvg = runningSum / seq;
    // 1 = running average at or above the active average, -1 = below it
    if runningAvg >= Seed then
        let AvgSequence = 1;
    else
        let AvgSequence = -1;
    end if
    // count how many consecutive rows have been on the same side
    if AvgSequencePrev=AvgSequence then
        let AvgRun = AvgRun + 1;
    else
        let AvgRun = 1;
    end if
    let AvgSequencePrev = AvgSequence;
    // after RunPeriod consecutive rows on one side, reset the active average
    if AvgRun = $(RunPeriod) then
        let seq = 1;
        let runningSum = SubField(FieldValue('UID',$(r)),':',3);
        let runningAvg = runningAvgPrev;
        let Seed = runningAvg;
        let AvgRun = 1;
    end if
    let seq = seq + 1;
    RunningAvgs:
    Load
        $(r) as ID,
        $(runningSum) as RunningSum,
        $(runningAvg) as RunningAvg,
        $(AvgSequence) as AvgSequence,
        $(AvgRun) as AvgRun,
        $(Seed) as ActiveAvg
    AutoGenerate 1;
next r
Try copying it into a new doc and see how it works.
Thanks for responding. I have put your script into a QVW and input the data (see QVW attached). It seems to be recalculating the average every 8 points (or whatever 'RunPeriod' is set to). What we need to trigger a change in the average is 9 (RunPeriod) consecutive points above the average. I feel like your script and the loop approach are another step closer; one of the remaining pieces of the puzzle for this approach is getting the step-up trigger correct.
I look forward to seeing if we crack this.
Flipside eg new numbers.qvw 202.8 K
I have made this change and we are still having the average calculated at regular intervals instead of only when there are 9 consecutive points above the running average.
I think what we are missing is that on every loop we need to re-compare the previous 9 'Values' to the new 'RunningAvg' to see whether they are above it or not. If they are all above the new running average, this would trigger a change.
Again, I feel like this approach is close.
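The re-comparison described here could look something like the sketch below. It is Python rather than QlikView script, purely to pin down the logic, and it assumes the "new RunningAvg" is the cumulative mean up to and including the current row. The function name and the test values are hypothetical.

```python
def last_n_all_above(values, row, n=9):
    """True if the n values ending at index `row` (0-based) are all above
    the cumulative average of values[0..row]."""
    if row + 1 < n:
        return False  # not enough rows yet to form a window of n points
    current_avg = sum(values[: row + 1]) / (row + 1)
    window = values[row + 1 - n : row + 1]
    return all(v > current_avg for v in window)


values = [1, 1, 1, 1, 1, 10, 10, 10]  # made-up data
print(last_n_all_above(values, row=7, n=3))
print(last_n_all_above(values, row=5, n=3))
```

The difference from a simple streak counter is that every point in the window is tested against the same, latest average instead of the average that applied at its own row.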
Hmm, I'm not entirely sure what you need. My script changes the ActiveAvg when it finds x consecutive values above (or x consecutive values below) its value. Once ActiveAvg is reset it then compares this new value with the average reset from the next row of data (so it will be that row's data value divided by 1) and so on until it finds another run of x consecutive values above (or below) its new value. In your example your data looks like it has an increasing trend so the average is continually increasing. Try putting in a sequence of some lower values mid way to see if it drags the average down.
What I would suggest however is that you build up a script starting as follows so you can be sure the logic is as you require it ...
1. Ensure your data has a unique field in it. I have created the field UID. This is so we can guarantee that the FieldValue returns the rows without skipping any values.
2. Build up a table called RunningAvgs row by row in a loop starting with totals only. Your first script might look like this ...
let runningSum = 0;
for r = 1 to NoOfRows('Data1')
    let runningSum = (runningSum + SubField(FieldValue('UID',$(r)),':',3));
    RunningAvgs:
    Load
        $(r) as ID,
        $(runningSum) as RunningSum
    AutoGenerate 1;
next r
You can compare the calculated values with an Excel spreadsheet to see if the logic is as you require, and moreover, that the rows are being called in the correct sequence.
3. Add in logic to calculate the average. At this point you could just divide the RunningSum by r, but because you want to reset the average later on, you'll need to use a second counter, seq. You can't use r because it would break the loop sequence and/or cause an infinite loop. Remember to increase the seq counter within the loop.
4. Once you have your RunningAvg, you can start comparing it to the Seed average values or Data values to determine whether it is above or below (1, -1), and then on the next loop check if there is a sequence of above/below values and increase the sequence counter.
It's easier to understand when you've built up the logic from the ground up.
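As a rough illustration of steps 2 and 3, here is the same build-up in Python (not QlikView script). The point is the second counter: r always walks the rows, while seq counts rows since the last reset. The `reset_rows` parameter is a hypothetical stand-in for whatever rule later decides when to restart the average.

```python
def running_avgs(values, reset_rows=()):
    """Build the RunningAvgs table row by row.

    r is the loop index over all rows; seq counts rows since the last
    reset, so the average can restart without disturbing the loop.
    """
    rows = []
    running_sum = 0.0
    seq = 0
    for r, value in enumerate(values, start=1):
        if r in reset_rows:  # restart the average at this row
            running_sum = 0.0
            seq = 0
        running_sum += value
        seq += 1
        rows.append(
            {"ID": r, "RunningSum": running_sum, "RunningAvg": running_sum / seq}
        )
    return rows


for row in running_avgs([2, 4, 6, 8], reset_rows={3}):
    print(row)
```

Checking a small table like this against a spreadsheet, column by column, is an easy way to confirm the row order and the reset behaviour before adding the above/below logic.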
Hope this helps
Enclosed is the new data with the Excel formula.
The first set gives the same result; the second has to be checked.
Did you test flipside's solution? I'm very curious, but I have no time at the moment to test it myself.
Perhaps later on.
flipside, you already have my respect
New Data Average Changes V2.xls 100.5 K
The second sheet is very close to being perfect. The only thing is that once the rule has been met, it cannot be met again until there have been at least another 9 points. I have reattached it and highlighted in red where this rule would come into effect. I am not sure how or where we add the rule that says that if the average has restarted, the next trigger needs to be at least 9 points away from where it started, but I think once we get it we will have automated it in Excel at least.
Thanks again for all your help to date
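One way to express the "at least another 9 points" rule is as a filter over candidate trigger rows. The sketch below is Python for illustration only. It assumes the gap is measured from the previous accepted trigger row, which is one reading of the rule; the function name and row numbers are made up.

```python
def enforce_min_gap(trigger_rows, min_gap=9):
    """Keep a trigger only if it is at least min_gap rows after the
    previously kept trigger."""
    kept = []
    for row in sorted(trigger_rows):
        if not kept or row - kept[-1] >= min_gap:
            kept.append(row)
    return kept


print(enforce_min_gap([10, 11, 12, 19, 25, 28]))
```

If the gap should instead be counted from the row where the new average starts, only the comparison value changes.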
New Data Average Changes V4.xls 103.5 K
I hope you are still interested in an Excel solution, because I have not followed the QlikView solution yet.
The previous Excel file was correct, but I had to copy the calculated formulas manually to 2 columns.
In this new version I used the Excel INDIRECT function (new to me), so you only have to fill in columns A and B and the average will show up in column R (you can hide all the other columns).
I also added 9 columns to test whether the data is decreasing, but I think there is no such point in the current sample.
Could you confirm the correct results (and test other series of data)?
If it's OK, we have a tool to compare with the QV results, and I can start looking into it (but I think the others are more skilled in QV than me!!)
It is possible in the chart.
If you set 2 more expressions as counters
R2CUMA_revCT: how many rows above until the next time there is a run of 8
R2CUMA_abvCT: how many rows below until the next time there is a run of 8
Then use these in a rangesum calculation to get the average for "R2CUMA_revCT" rows below and "R2CUMA_abvCT" rows above.
Example attached. NB: the shift is barely detectable, but still there!
It's not quite there yet, but something to work from!
Thanks Erica. This also sounds like it is a step closer, and if we can do it in the chart that would be AMAZING!!! In your example, though, the average is not what I would have expected. I would have expected it to be around 91% to start with and then jump to around 94% on 01/11/13 through to the end of the period. This is based on 9 consecutive points. Again, I feel like the concept is there and it is just a matter of tweaking to get it right.
I have to head to a wedding now so don't have time to fully understand the new fields sorry but I will hopefully get a chance over the weekend.
It's probably because there are issues with the Above function. For some reason rangecount(above(Data,0,30)) is returning 17, not 30! I think it's because the field is dynamic, i.e. $(variable) rather than the field name. The counters also need to be tweaked to recognise the end of a run rather than just 8 steps into it.
Anyway enjoy your weekend, and the wedding
If you change the counters to this:
if(([R2CUMA]<=-8 and Below([R2CUMA])=0) or ([R2CUM]<=-8 and Below([R2CUM])=0),
if((above([R2CUMA])<=-8 and [R2CUMA]=0) or (above([R2CUM])<=-8 and [R2CUM]=0),
Then they will commence at the end of a run of 8 or more negative/ positive values.
I'm losing the plot with the Range and above/below functions though when it comes to the average. Even just a simple expression rangecount(above(total [Data],-30,30)) is returning strange values. (it should return 30 - because that is the range that is returned by the above function...!)
Any ideas on that anyone?
Okay - this works, but only as a temporary fix, as either I don't fully understand the above / below functions or there may be a bug.
I've replaced the "average 2" formula with a function which uses the below() function if it is at the start of a run (seems to work fine when offset=0) otherwise, steal the value above it!
Thanks again for having another crack at this however the average should be jumping up on 1/09/11 (assuming 9 consecutive points).
I am pretty sure that in your example it is jumping up later because 'R2CUMA_abvCT' and 'R2CUMA_revCT' are linked to 'R2CUMA' and 'R2CUM', which are both linked to the overall 'Average'. We really need to be testing each data point against a cumulative average; a good example is Paul's Excel document. I have reattached this with the original data for comparison purposes.
I think what we need is:
- a cumulative average field. Then:
- R2CUMA and R2CUM to be changed so that at each data point they check the previous 9 data points against the current 'CumAvg' and come up with the number that meet the criteria. Your other logic should then flow from there. My understanding of how Above and Below work is far less than yours, but hopefully it makes sense what I THINK we need to do.
From what I can tell, the other functionality seems to be working.
This is an interesting problem. There are a great many clever approaches above. I could not help but give it a try, so here is another variant. I used two counters, one that keeps track of consecutive points above the average and one for points below. The Average formula then checks these counters and updates the average when it is time.
The values are consistent with my manual check in Excel, but the points where the average adjusts differ from the picture in Excel. Maybe I misunderstood the logic. Anyhow, here is the solution.
Expression for Counter to track points above average (CounterH):
Expression for Counter to track points below average (CounterL):
The mod function ensures that the counter is reset after 9 consecutive points. The Counters need to be set as invisible and also put on the right axis.
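The idea of a counter that wraps via mod can be sketched as follows. This is Python for illustration, not the chart expression itself. The counter here counts from 1 and wraps to 0 when the run completes, which is an assumption about the offsets; the function name and the flags are hypothetical.

```python
def streak_counter(flags, run_length=9):
    """Count consecutive True flags, wrapping back to 0 once run_length
    in a row have been seen. Returns (counts, triggers), where triggers
    marks the row on which a full run completes."""
    counts, triggers = [], []
    current = 0
    for flag in flags:
        raw = current + 1 if flag else 0
        triggers.append(raw == run_length)
        current = raw % run_length  # the mod resets the counter after a full run
        counts.append(current)
    return counts, triggers


above = [True, True, False, True, True, True, True]  # made-up flags
print(streak_counter(above, run_length=3))
```

A second counter fed with the "below average" flags works the same way.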
Average starts with an average of all data, and a change kicks in when any of the counters reach 8 (0-8 = 9 points) and creates an average of points below.
=if(RowNo()<=8, Avg(Total Data),
if(CounterH<8 AND CounterL<8, Above(Average,1),rangesum(Below(sum(Data),0,NoOfRows()-RowNo()))/ (NoOfRows()-RowNo()) ))
Thanks for replying to the problem, it is great to see yet another approach. I am looking forward to seeing which one will finally crack it.
I think the reason it is different from the picture in Excel is that we should be using a cumulative average to compare to each data point right from the start. I think currently it is only doing this after there have been 8 points above the TOTAL average.
Please note that we are also looking for 9 points above or below the average.
I look forward to hopefully seeing if a revised version is possible.
Just Googled Control Chart. http://www.wikihow.com/Create-a-Control-Chart
The graph is out-of-control if any of the following are true:
- Any point falls beyond the red zone (above or below the 3-sigma line).
- 8 consecutive points fall on one side of the centerline.
- 2 of 3 consecutive points fall within zone A.
- 4 of 5 consecutive points fall within zone A and/or zone B.
- 15 consecutive points are within Zone C.
- 8 consecutive points not in zone C.
Zone C is within 1 standard deviation of the centerline, Zone B is between 1 and 2, and Zone A is between 2 and 3 standard deviations.
Do you wish all these rules to apply when re-adjusting the mean?
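For reference, two of the listed rules are easy to state in code. The sketch below is Python for illustration and covers only the 3-sigma rule and the "8 consecutive points on one side of the centerline" rule; the centerline and sigma are passed in, and the function names and sample values are made up.

```python
def beyond_three_sigma(values, center, sigma):
    """Indices of points more than 3 sigma away from the centerline."""
    return [i for i, v in enumerate(values) if abs(v - center) > 3 * sigma]


def run_on_one_side(values, center, run_length=8):
    """Indices at which at least run_length consecutive points have fallen
    on the same side of the centerline."""
    hits = []
    run, prev_side = 0, 0
    for i, v in enumerate(values):
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == prev_side:
            run += 1
        elif side != 0:
            run = 1
        else:
            run = 0
        prev_side = side
        if run >= run_length:
            hits.append(i)
    return hits


sample = [11] * 8 + [9] + [12] * 3  # made-up data around a centerline of 10
print(run_on_one_side(sample, center=10))
print(beyond_three_sigma([10, 14, 3, 10], center=10, sigma=1))
```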
At this stage we are only focused on the rule of 9 consecutive points above the cumulative average as the trigger to re-adjust the mean. Once this has been triggered, the cumulative average would begin again, with the search for another 9 points above or below the mean. Does this make sense? Paul's Excel example above provides a good overview of the desired result.
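Put together, the rule could be sketched like this in Python (an illustration of the logic, not a QlikView implementation and not a copy of the Excel formulas). The assumptions: each point is compared with the cumulative average of the current section including that point, the 9th consecutive point still belongs to the old average, and the new section starts on the following row. All names and the test data are made up.

```python
def stepped_average(values, run_length=9):
    """Cumulative average per row that restarts after run_length
    consecutive points on the same side of it.

    Returns (averages, triggers), where triggers holds the 0-based index
    of each row that completed a run."""
    averages, triggers = [], []
    total, count = 0.0, 0
    run, prev_side = 0, 0
    for i, v in enumerate(values):
        total += v
        count += 1
        avg = total / count
        side = (v > avg) - (v < avg)  # +1 above, -1 below, 0 equal
        if side != 0 and side == prev_side:
            run += 1
        elif side != 0:
            run = 1
        else:
            run = 0
        prev_side = side
        averages.append(avg)
        if run == run_length:  # restart the section on the next row
            triggers.append(i)
            total, count = 0.0, 0
            run, prev_side = 0, 0
    return averages, triggers


avgs, hits = stepped_average(list(range(1, 21)))  # steadily rising made-up data
print(hits)
print(avgs[9], avgs[10])
```

Because the run counter is reset together with the average, a new trigger cannot occur until at least 9 more points have been seen in the new section.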
I gave this some more thought (honestly, it bugged me a lot!) and I see that this challenge is a bit trickier than it seemed at first glance. It has a built-in trap. I'll explain how.
Generate an average for a section. The section ends when 9 consecutive points are above or below the section average. The average should be based on all points in the section, excluding the 9th point. The next section begins at the 9th consecutive point.
The rule is a sort of recursive logic, where you have to iterate the size of each section until the end of the data or until 9 consecutive points are found. With this rule, there are data sets that cannot be resolved because of the logic loops that occur. With your specific data set you do not encounter the loop situation. You do, however, with the set below.
The data set below triggers the logic loop. As the first section is expanded to find the 9 points in a row that are below or above average, no such pattern is found until you include point 21 - a very high value.
Suddenly, the first 9 points in the section are below the cumulated average. The section should then end at point 8. You need to recalculate the average up until point 8. However, now you don't find any 8 consecutive points below or above average anymore. Not until the section once again is expanded until point 21. And so it goes…
You have a logical loop that never ends.
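The trap can be reproduced with a few lines of Python. This sketch assumes the reading described above, where all points in a section are compared against one average computed over the whole section. The data is made up (20 ordinary points followed by one very high point 21), and the function name is hypothetical.

```python
from statistics import mean


def first_n_all_below(values, section_end, n=9):
    """True if the first n points are all below the mean of the section
    values[0..section_end-1]."""
    section_mean = mean(values[:section_end])
    return all(v < section_mean for v in values[:n])


values = [9, 11] * 10 + [1000]  # made-up: point 21 is very high

print(first_n_all_below(values, section_end=20))  # section of 20 points
print(first_n_all_below(values, section_end=21))  # include point 21
print(first_n_all_below(values, section_end=8))   # section cut back to point 8
```

Extending the section to point 21 creates the run, cutting the section back to point 8 removes it again, and so the section size never settles.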
Since there are control diagrams with changing averages out there, they should have a different set of rules.
Sorry to not be able to help you more.
Thanks for taking another look at this, and apologies for the delay in responding. I needed a couple of days not looking at this to keep my sanity!!!
I have taken your data set and put it into Paul's Excel solution (see attached), and it seems to calculate as expected in Excel. Note that the 21st point does not actually trigger the change, as that data point (100000000) is still above the average even though all the others are below. It does get triggered on line 30, though, when (100000000) is no longer part of the 9 consecutive points being compared.
You noted "The average should be based on all points in the section, excluding the 9th point." Please note the average actually includes the 9th consecutive point.
Based on the above do you think this is something you could take another look at? I feel like your initial approach was close and I think my notes here hopefully mean recursive logic is no longer an issue.
I think we may have a solution, please see QVW attached. This basically takes the logic of Paul's spreadsheet and puts it into a QlikView table/chart. If you have a chance it would be great if people could try and pick some holes in the solution to check it holds up.
I am also intrigued to see if people have any suggestions for:
1. Is there a better way to count the number of points that are above or below the CumAvg instead of all the Pos. Line1 and Neg Line 1? If so, we may be able to let users choose the number of consecutive points above or below the average with a variable.
2. How to add further rules as per Erica's original Control Chart.
a) Highlight the points that make up values that are above/below the average to trigger the step change.
b) Highlight series of points that are increasing and decreasing.
My fingers are crossed that this is going to hold up under cross-examination!!!! Thank you all again for soooooo much help. This would have got nowhere without so many ideas bouncing around. I had all but given up hope that this was possible.
Erica, if this does end up a solution then please feel free to write it up in your blog.