Friday, 29 June 2018

Reading Reports: An analogy

I was trying to think of the right analogy for what finding and seeing patterns in data is like, and I think the closest I'm going to get is this:


The objects making up the logo are there all along, hanging in seemingly disconnected places in the air. But as the camera's point of view moves through the scene, suddenly we look at them from the right point of view, and then the 4 is clearly visible.

That's what finding something in the data is like: it's always there, but you need to be standing in the right place, thinking of the right questions, to see it.


(
Obviously there are more forced pespective sculptures in the world than just the Channel 4 logo, but that was the one that came most readily to mind. Here's one:

 
(

Wednesday, 27 June 2018

Reading Reports

One of the challenges we're working on at the moment is helping people read reports.

The funny thing is, writing reports is easy. (Easy, if your SQL and XSLT are up to scratch, but they aren't very hard skills to learn). When we started out, the system shipped with hardly any reports, and I said to people "I've got few reports here, but if there's a report that you need that we don't have, we'll write it for you. Writing reports is easy". So now, we have loads of reports. And it turns out that reading reports is hard.

Even reading report names is hard. Because we now have loads of reports, and because it's hard to describe what they do and how they work in a single sentence, our most common report related support question is "Which report contains the information I want?". Right now someone's going over all of them with a view to rationalising them and improving the clarity of the description. I don't know what the answer is going to be but I expect it will look like more tags and more filtering options.

The reason why it's hard to find the report you want is that there are lots of ways a report can do its calculations.

Suppose you sell a ticket on the 1st of the month for show A, and then you exchange it for a ticket for show B a week later.

Some reports are based on the transaction log, so if you ask for the sales for show A broken down by day for that month, you see a positive entry on the 1st, and a negative entry on the 8th. If you ran the report just for the first week of the month, you wouldn't know that anything had been cancelled at all. Reports based on these transaction logs are useful for payment reconciliations: if you're trying to reconcile what was in the bank at the end of the 1st with what the database says you've sold, then you absolutely need to know about that sale: from the point of view of the bank statement at the end of the day, the cancellation hasn't happened yet.

Some reports are based on the current status of the orders / tickets. If you run a report saying "how many seats have been sold for show A as of now" and then you cancel an order and then you run the report again, the cancelled order no longer shows up. We don't care that it had been sold in the past and then cancelled, we just want to know whether or not it is sold right now. And even these reports can be broken down by day: "of the tickets that are currently sold for this show, how many were sold per day over the last month" is a valid question. If you ran a report like that on our scenario above, the tickets that were sold and then cancelled wouldn't show up at all - even if you filtered the report down to "the first 7 days of the month".  Reports based on the current state of orders are useful for sales analysis and promoter returns: if you want to know whether an advert you bought on the 2nd performed well and you're looking at sales you made on the 3rd, you definitely want to ignore any sales that were made on the 3rd and subsequently cancelled: those don't count towards your ad performance.

So the second most common question is "Why do these two reports on the same date range return different numbers?". And the answer to that is usually this difference in the basis for calculation: both methods are "right" in different contexts, but they'll give you slightly different answers.

If it's not that, then it's either "One figure includes booking fees and transaction fees and delivery fees" (useful for payment reconciliation) and "One figure is just the ticket value net of fees" (useful for communicating with promoters: the fees you charge are none of their business), or sometimes even "One report's date range is filtering by performance date and the other is filtering by order date".

Given that people consuming reports don't much want to understand how they work, and that reports can work in different ways that it's hard to express without a couple of paragraphs of text, that makes reading reports hard.

What people say they want is "a sheet with all the data so that I can see the various combinations for myself". But although such reports exist, they're not much use. The raw data is just a list of ticket amounts with transaction dates - plus an awful lot of parameters that they have to be grouped by in order to mean anything much: show or performance or genre or order date or performance date or sales channel or ledger code... The best reports answer a specific question: generic lists of numbers to allow someone to figure it out themselves don't get used much. You can reconstruct a sales movement report over a month by giving someone the job of running the "payment reconciliation today" report every day of the month and coping the figures into a spreadsheet (I've seen it done in venues converting from another system to Monad), but really, your sales movement by day is one thing and your payment reconciliation another.

Because good reports can only answer one question at a time, if you're looking for strategic information on what you're doing well and what you could be doing better, don't go looking in your day-to-day operational reports. Your front of house report is designed to give the FOH manager everything they need to get the audience in and the show open: it will be hard work spotting long term sales trends in it. Your payment reconciliation reports are designed to make sure the numbers we calculate for sales today align with the numbers in the bank account - broken down by transaction date and payment type, but not by show or genre. Your promoter sales returns show the final destination of the show's sales performance, but not the route it took to get there.

Reading reports with an eye to identifying challenges and opportunities means reading reports to identify patterns, and those patterns won't be visible if you aren't asking the right questions. You might be interested in how far in advance of a show people book, and a report like "Chart: Advance Booking Trends" with parameters "All Films" / "Performance in the last week"


(scale obscured because this is from real data)

might make you think that most of your cinema sales are on the door and you shouldn't worry too much about advance booking... but drilling down into just a single PG rated film shows a different story:



People who've promised their children that they're going to see a film behave differently from the general public - who knew?! I didn't, until I looked.

And a music event tells a very different story:

With people either sure they're going well in advance (dedicated fans - but not enough to sell out) and people deciding on the night, but almost no sales in between.

Your ticketing system will be collecting an enormous amount of data for you, and that data is there to be analysed, but in order to do that you have to start with a question. You can't just stare at reports hoping that the answer will leap out at you like magic lawyer speed reading a telephone directory in a movie.

Let's suppose we want to sell more tickets, because who doesn't? Well, to sell more tickets, we need to know why people buy tickets, and what stops people from buying tickets. Making people want to buy tickets is someone else's job (and, the show has to be good), but it's easy to want to go to a show and not buy a ticket. There are loads of plays and films and gigs I want to see that I don't buy tickets for. So let's look at what stops people buying tickets. Is it money? Then let's look at sales by day of the month. If most people get paid at the start of the month we might expect to see on average more orders on the 5th than on the 25th. Is it the time it takes to make the booking? Then let's look at sales by time of day, and if people are too busy to research and book tickets at work we might see a spike at 7pm when they get home. Is our box office open to help with these calls, or do most staff go home at the same time as our customers do? Is it that they're too sober? Let's look as see if there's a spike at 11pm when the pubs kick out. Is it the time that it takes to attend the performance? Then let's get some hard numbers on how much Saturday tickets out sell Wednesday tickets, and use that to inform how much more we charge for a ticket on the weekend or during half term.

I don't even work at a venue, and there's a bunch of questions right there that I find interesting and that I wish that my customers asked more often. We don't currently have reports that break down sales by time of day or day of week, because literally no one has ever asked, but that's exactly the kind of report I would be delighted to write if I thought it would ever get used.