Best Practices Re: a Calendar Table and Multiple D... - Qlik Community

Report Inappropriate Content · ‎2011-11-28

Howdy. I've searched through the forum, but not found a post which quite addresses this question. This one was close, but represented the opposite problem. The "similar posts" tool found more results han my searching: this post, this one, and this one cover some solutions, but don't discuss which is best, or more generally what advantages and disadvantages exist. What's easiest? What's most commonly used? Has anyone used one of these solutions for 12 months or so and has some feedback on how well it's worked?

A common practice in data warehouses is to have a Dates table which gives values such as year, quarter, and month for each day. It looks like this is common in QlikView too, and the name "calendar table" seems to be widely-used. If this calendar table has a Date field as its key, any queries which also contain a Date field will automagically join to it. Groovy!

But... what do you do when you have multiple fact tables, with different date fields, which should not join together? For example, say there's a Customers.Date field representing the date the customer was created, and an Orders.Date field representing the date an order was placed. A report which contains both fields will need to rename at least one of them, breaking the join.

Also, a fact table may contain multiple date dimensions. In my case, the Orders table has OrderDate, DatePaid, and ShipDate fields. I could create three more calendar tables, with key fields of OrderDate, DatePaid, and ShipDate, respectively. This seems cumbersome; is this the best practice, or is there a simpler option?

In SQL, I'd write three JOINs, to the same table each time but specifying a different field from the first table each time. But QlikView seems to be restricted to natural joins, which as a DBA makes me wince.

Instead of using one or more calendar tables, you could just import the date values as dates, and use QlikView date functions to populate your date hierarchies. Is this what most users end up using?

I appreciate any input. Thank you!

johnw · ‎2011-11-28

I think your third link above covers the two basic options - create a separate calendar for each field, or create a new table that links dates by date type to a single calendar. It also discusses the advantages and disadvantages of each approach, though mostly in "how the users will interact with it" terms, which in my opinion is primary.

In your case, it sounds like you want to keep all these dates separate. You would never want to select November 28, 2011 in a single Date field, and see orders that were ordered, paid OR shipped on that date. If you want to see orders ordered on that date, you would explicitly select that OrderDate. That's option 1 in the third linked thread, which is also the easiest and clearest for most developers, I'd guess.

I do consider a separate calendar table for each distinct kind of date to be the best practice when you want your dates to be distinct. However, there's no real disadvantage to handling it with SQL-like joins. Denormalization isn't really a problem in QlikView, and any extra memory it might have required will be recovered by compression. If anything, the user interface may actually work marginally faster. We tend to not do this, though, perhaps because the script may be a little slower, or maybe just because everyone else seems to do it with a separate calendar, so that's just what everyone learns, even if there's no real reason for it.

What I've personally done, though I'm not sure that this is common, is create a Calendar.qvd. This helps me enforce standards and centralizes logic if we, say, decide to change our definition of week numbers. If you wished, your example could work with a resident calendar, or you could generate a new calendar for every field. Anyway, the script for the SQL-like join approach would be something like this:

LEFT JOIN (Orders)
LOAD
Date as OrderDate
,Month as OrderMonth
,Year as OrderYear
FROM Calendar.qvd (QVD)
;
LEFT JOIN (Orders)
LOAD
Date as DatePaid
,Month as MonthPaid
,Year as YearPaid
FROM Calendar.qvd (QVD)
;

And so on for every date field in every table. I can't think of anything wrong with that if you prefer the explicit joins instead of QlikView's association. I suspect it is more common to have separate tables, but there's really no need. The same is true even if you just have a single date field - it may be most common for us to create a calendar table, but it isn't necessary, and there's no real reason NOT to just join the data.

Report Inappropriate Content · ‎2011-11-29

Correct; I'd like users to be able to filter on each date field separate. Orders placed in September, paid in October, and cancelled in November, for example.

I have a Dates QVD, so your rename-and-join method works, which would save me from over-normalizing my fact tables. What makes me hesitate it that I'd have to write dozens of lines of aliases, one for each of the 22 fields in the date table x the number of date fields in my fact table. That's well over 100 lines, to be repeated in almost all reports. Thousands of lines of code, just to stop the system from joining! And if someday a date field is renamed or deleted, I'd need to change the script for every report, making one change for each date field, so several hundred manual changes. This wouldn't be so bad if one could edit QlikView scripts as plain text, separately from the associated reports.

One could save typing by QUALIFYing each copy of the dates table, but then the fields would have ugly names like "Order Date.Is year-to-date". Since field names are the primary interface for non-technical users, I'd like to keep them "pretty".

Using multiple QVDs, one for each date field in the warehouse, would still require hundreds of aliases, but at least I would not have to repeat them in each report. It bugs me to store the same data a dozen times over, though it's a small table.

These all seem like hacks, painful to build and fragile in use. Are there other approaches? Is there's no way to specify the field(s) by which tables are to be joined, other than aliasing every field?

johnw · ‎2011-11-29

This seems a fundamental limitation of what you want your users to see, not a QlikView limitation. Note that if you were doing this purely with SQL, you can join any way you want, but if you want your users to see a field DatePaid, either you have a table with DatePaid on it, or somewhere in your SQL you're going to see Date as DatePaid.

As far as a better way to do all that aliasing, I'm running off to a meeting, but I think you could use qualify initially, then use a loop through the fields, renaming them according to your pattern. I'm sure we could come in well under 100 lines if you were doing a load *. Though I'd personally avoid load * because it opens you up to unintended associations if people add new fields to the calendar QVD.

Report Inappropriate Content · ‎2011-11-29

Well, the big difference in SQL is that I'd only put aliases on the fields I'm using, so three or four per report, not dozens.

Hmm, so one can rename specific fields after the load? That would be something, I could live with ugly "Order Date"."Is year-to-date" fields if it meant I just needed to alias a few fields in each report. Something like this?

Orders:
LOAD * FROM Orders.qvd (QVD);
OrderDate:
QUALIFY *;
ALIAS OrderDate.%DateKey AS %OrderDateKey;
LOAD * FROM Dates.qvd (QVD);

It's a bit ugly (the resulting QVW, that is; the script is nice and consise), but it works. I am torn as to whether to accept this or to create the dozen separate QVDs after all.

johnw · ‎2011-11-29

So only rename the fields you're using. Why would you load and force yourself to rename fields you don't need? As I suggested earlier, LOAD * FROM Dates.qvd (QVD); is a dangerous practice that leaves you open to unintended associations as people add new fields to the QVD. It also wastes memory to load fields you don't need. Same for ANY load, really. Avoid LOAD * except from something like an inline table where you have full and obvious control over what's being loaded.

Following up on what I mentioned in my previous post, I've come up with the attached example that automatically creates calendars associated with every date field in every table. Again, I wouldn't do the load * like in this example, but you could. I suspect it also has a bug if you have the same date field in two different tables already, as it will probably try to load the same calendar and field names twice. I'm sure it's a solvable problem, but I wouldn't actually use this code anyway, so it's just an example of what's possible. As an example, I think it's good enough. In practice, what I do is load the specific fields I need, no more, no less, and rename as appropriate in the load.

Edit: I suspect the script can be simplified.

Report Inappropriate Content · ‎2011-12-01

It's necessary to rename all the fields for two reasons. Because I'm producing raw data which other people may use, I can't be sure about which fields they may want. Also, if I leave any field names as-is, QlikView will join my date tables together (sigh).

I'm OK with "LOAD *" (flashback to the Commodore 64 magazine), because I have control over the QVDs and the SQL views from which they get their data. No one else will be changing these files. Plus it's tedious typing every field thrice over (once in the SQL field list, and twice in the LOAD list). The QVDs need all fields, per se; they must store everything so that actual QVWs have access to everything.

Thanks for the sample file. So if I used lower-case names in the Dates.qvd file, and "Order date" as the field name in the orders table, I'd end up with "Order date year", "Order date month", "Order date is business day", "Order date is year-to-date", etc. Nice! That's a little prettier than QUALIFY, which gives me "Order date.Year", "Order date.Month", etc. The downside is 20 lines of code which would need to appear in every script, but there should be no reason for the script to change if, say, we add or rename a field, so that's probably not a time bomb.

I'm still curious about what other approaches people have used, and among the two talked about in this thread and the linked threads, which is more popular and why. Are there long-term gotchas with either approach?

johnw · ‎2011-12-01

You can make arbitrarily-complicated logic to recognize your date fields, probably up to and including regular expressions, though I'd have to refresh my memory on how that is done. This was just an example, and an example that assumed your naming convention for date fields was '* Date'. If it's '* date', then use ' date' where I used ' Date'. Alternatively, just use a "for each in" loop, and specify the date fields you want to build calendars for. The example was trying to go all the way, where merely adding a date field to some QVD the program knows nothing about would result in a calendar being loaded for that date field. But following your earlier argument, you have control over the QVDs, so when you add a date field, you can know to put it in the list. A combination might also work, where anything following the standard pattern would be picked up automatically, while anything that didn't could be placed in an explicit list.

As far as which of the two main approaches is more popular, I don't see how popularity has anything to do with it because they aren't two ways of solving the same problem. They're solutions to two DIFFERENT problems. Your users' requirements should determine which approach you use, or if you use both at once. So let's say you have an application that displays order data. This system has two dates, an order date and a ship date.

Do your users want to be able to select December 1, 2011, and see all orders placed OR shipped on that date? Then create a single generic calendar linked by date type.
Do your users want to be able to see orders that were both placed on November 15, 2011 and then shipped on November 20, 2011? Then create separate calendars.
Do your users want to be able to do either of these things? Then create separate calendars AND a generic calendar.

I'm going to guess that the separate calendars are more popular in the community by far. I'm also going to guess that this has little to do with user requirements, and more to do with it being more easily understood by more developers, so that's just what they do. If users ask for the other requirement, I expect many developers to get confused, or even say "QlikView can't do that", and leave it at that. In my own applications, I suspect that it's around a 50/50 mix, and I think I only have one application with both approaches at the same time, and even there, one approach (generic calendar) is primary and obvious, and the other is secondary and somewhat hidden, supplied only for a single power user. We've been using QlikView for 6 years using both approaches, and I don't think there have been any gotchas with either. I'm sure there has been SOME confusion with both approaches, but none that ever made it back to me (and I get a lot of phone calls and emails from users).

Edit: One of my applications has the generic calendar approach but can still answer the kind of questions that separate calendars answer. It does this through an "add to memory" button. So for the example you'd select the "order" date type, and select November 15, 2011. Then hit "add to memory", and what it does is select all of the possible order IDs. Then select the "shipped" date type, and select November 20, 2011. Now you have the same orders as you would have had with separate calendars. A user could of course do this manually without a button. But either way, if your users really want to do that, I'd just provide separate calendars, since that makes it easier for them.

Sorry, I don't mean to monopolize the replies when you're surely trying to get more of a community perspective instead of my personal perspective. I should probably stop answering and give other people a chance to provide their own opinions. And maybe people will be more likely to do that if I make it obvious like this...

What are other people's thoughts?

Report Inappropriate Content · ‎2011-12-01

Sure; I made a few tweaks to your code, mostly to clarify what was happening, but that's just detail. This is exactly what I needed!

I should have been more clear: the two approaches I was contrasting were (A) multiple date files or (B) joining to the same date file multiple times, using some mechanism to rename its fields for each join. I see how the former approach, which does not require any scripting, would appeal to most users.

Certainly what's most popular shouldn't be the only factor, or even primary, but if most people are doing one thing, it deserves consideration. We've examined this from a couple of angles over the past few days, but there may be more approaches we haven't considered. Though if there is such an animal, given six years with QlikView and 4,500 forum posts, you probably would've seen it by now.

johnw · ‎2011-12-01

I think in your situation, with a whole lot of different date fields, and wanting all the calendar information for each, I'd left join all the calendar data back onto the main QVD multiple times. I wouldn't want a separate calendar QVD for each date field - that's a lot of joins in the user application. And I wouldn't want a single calendar QVD, because that's just as many joins PLUS a bunch of renames. So I'm thinking you modify the loop to do the joins when you're creating the main QVD.

I haven't thought about it long and hard, but that's certainly my initial impulse.

In fact, it has me wondering if I should do something like that around here. We DO have a lot of dates on the order item, but don't tend to use more than one or two in any given user application. That tends to make the load from a generic calendar QVD pretty simple, and I didn't want to inflate the size of my QVDs, so the choice seemed clear at the time. But maybe we shouldn't be doing the generic calendar load at all. Maybe all the date fields should have all calendar fields associated with them by default in the order item QVD. Maybe I've been doing it the "wrong" way all along.

Hmmm.

Edit: Oh, and I'd guess that neither of your approaches is the most popular. I'd guess that the most popular approach is to just add additional fields in the user application during the QVD load, simple month(Date) as Month types of things, deoptimizing the load, or as a left join by key with probably similar performance penalties. I'd bet that the second most popular approach would be to generate calendar tables after the main load in the user application, which can have high performance if done with fieldvalue(). A calendar QVD is probably at least third on the list. Multiple calendar QVDs or building everything into the main QVD I'm guessing are rather rare. So I'm thinking the best approach for you is probably one of the most rare things to see people actually do.

Best Practices Re: a Calendar Table and Multiple Date Fields