When a count goes mad ... or how to count orders over time in DAX
A count of orders looks simple, doesn't it? But what happens when we have to broaden our scope to include more orders than visible in the current period? Let's look at possible solutions.
by Salvatore Cagliari
One of my clients approached me with the following two requests:
1. I would like to count all orders, including the open orders during a specific period.
2. I would like to categorise my customers based on the order count over time.
This sounded easy, but after some thought, I had to think twice about the possible solutions.
To solve these two tasks, we must manipulate the filter context to include more rows than visible in the current filter context.
The data model I used looked like this:
Figure 1 - Data model (For this article, I used the Contoso data model (see below for the source and license information).
The online sales table had three relationships to the date table:
- Order Date -> Date
- DueDate -> Date (Inactive)
- ShipDate -> Date (Inactive)
The first step was to define a truth table to specify the desired outcome:
Figure 2 - Truth table for open orders.
As you can see, each order that was created before the start of the observed period but shipped during the observed period is defined as open.
In addition, all orders created during the observed period have to be included in the count.
To fulfil these requirements, we need to create two measures:
1. Simple order count
This measure counts the order number in the fact table.
As each order has several order lines (one per product in the order), I had to use the DISTINCTCOUNT() function for this measure:
This measure returns the count of the orders in the current filter context.
2. Count of active orders
This measure is a little more complex, as I had to include all open orders that have their order date before the start of the observed period (the actual filter context for the date table):
I used the measure [Online Order Count] to count the orders.
Then I disabled the active relationship to the date table using CROSSFILTER().
This function makes sure that the filter context is not applied for this calculation. Alternatively, you can use ALL('Date') to remove the actual filter context on the date table.
Afterwards, I used the variable FirstActualDate, defined at the beginning of the measure, to include only orders that were created before the actual period/filter context and were not yet shipped.
Writing a measure to get all orders in the current filter context and all open orders, as defined in the truth table above, would have resulted in a highly complex measure. Consequently, I decided to change the last line of the measure to bring together the [Online Order Count] measure and the new measure to get the correct result:
Thus, I was able to include all other orders in the result with a straightforward approach.
This solution is much more complex.
The precise requirement was the following: Each customer must be observed and categorised by the number of orders over the last 180 days (the following requirements are adapted to the Contoso dataset from my client's original requirements):
1. When a customer has placed less than 20 orders, the customer has to be categorised as "To contact"
2. When a customer has placed between 20 and 35 orders, the customer has to be categorised as a "Normal customer"
3. When a customer has placed between 36 and 50 orders, the customer has to be categorised as a "Good customer"
4. When a customer has placed over 50 orders, the customer has to be categorised as a "Very Good customer"
My client emphasised his wish to make this categorisation for each day and each customer.
The single customer is not very important at this stage. But it is crucial to observe the development of all the customers over time.
My first thought was to calculate the distance between each customer's order and from that derive the category of each customer.
The most straightforward approach for getting this result is to create a calculated column. Another is to create a measure and calculate this information on the fly.
Each approach has benefits and drawbacks. I was sure that the calculated column could support the solution as each customer would change between categories over time. However, the final solution was a measure, as a customer can create 100 orders in one month but only ten in the following three months, which would shift them to another category over time.
On the way to find the solution in DAX
I needed 3 attempts to finally solve this problem. First, I used context transition to get the date of the previous order for each customer and each order. To learn more about context transition, read this article:
The next step was to create the measures – one measure for each category. Multiple measures are the only way to dynamically count the customers. The calculated column I used to generate a result. As the measure returned only a number when there was an order and the client needed one number per period, it was unfortunately impossible to get the correct result with this approach.
My second attempt to solve this challenge resulted in a measure that took over five minutes to return the result for the first two categories, which was unacceptable. The reason for this lengthy response was the GENERATE() function that I used.
I tried to generate one row with every combination of each customer and each row in the date table and used the resulting table in a FILTER() function to filter out the needed rows. To achieve this, I calculated the OrderCount column in the result of SUMMARIZE() using context transition ( CALCULATE ( DISTINCTCOUNT('Online Sales'[Sales Order Number]) ) ).
While I got a result with the Contoso Data Set, with 19'000 customers, it wouldn't have worked with my client's data: they had over 1.4 Million rows on their customer table. Moreover, the result was actually incorrect.
Trying to find a new approach, I actually didn't have to generate a table with one row for each date and each customer to count the orders. After all, I always had a date context when creating a report based on a date hierarchy.
Thus, I separated the problem into two questions:
- How many orders have been created over the last 180 days?
- How many orders has each customer created over the last 180 days?
Based on this information, I could categorise the customers.
The first measure was to count the orders over the last 180 days:
I generated a table with all dates before the actual filter context – 180 days. In a second step, I used CALCULATETABLE() and SUMMARIZE() to get a list of all orders over this time.
After having counted the rows of the second step, I used this information to count the orders per customer. For this calculation, I created the following measure for the first category:
As I needed to consider each customer, I had to iterate this throughout the entire customer table and check the order count for each customer separately.
Let’s look at this measure in more detail:
- COUNTROWS() counts the number of table rows with the count of orders per customer.
- CALCULATETABLE() generates a table.
a. Inside CALCULATETABLE(), I used VALUES(‘Customer[CustomerKey]) to get all customers.
b. Then I used FILTER() to filter the customer table by the number of orders per customer.
As FILTER() is an iterator, I leveraged context transition to get the order count over the last 180 days for each customer. The measure [Orders last 180 days] gets the actual customer as filter context and calculates the number of orders for each customer.
For all periods before the first order and after the last order, I had to use NOT ISBLANK([Orders last 180 Days]) to exclude periods without any order. Without this exclusion, I would have gotten the count of all customers for all periods, even without orders.
The result was a list of customers with more than 20 orders over the last 180 days. The rows in this list were the input for COUNTROWS():
Figure 3 - Result of final solution (figure by the author).
This approach had the following two advantages:
- I created a report for each customer, as the filter context per customer pre-filtered the customer table. This way, I got the correct result per customer.
- The calculation of the result was completed in a very short time.
Figure 4 - Query performance of solution (figure by the author).
You can see that the storage engine did most of the work (blue part). Even though the engine had 27 queries to do, they could be parallelised and executed in a very short time.
I transferred this solution to the Power BI file of my client with 1.4 million entries in their customer table.
The first visual took more than one minute to display the result, but after I had set a filter on the order year, the power BI completed the calculation of the result within less than a second.
Going forward, I will have to look at the performance and discuss it with my client to see if this is OK, or if I have to optimise the solution further.
While the solution for the first challenge was relatively straightforward, the second challenge proved to be much more difficult.
The reason why I struggled with solving the second problem was that I started thinking in the same way in which my client had defined the problem. As soon as I looked at it from a fresh perspective, the solution came to me. I only had to dissect the challenge into smaller pieces, and it worked.
Looking back, I recommend not to start searching for a solution while discussing the problem with your client. First, complete the definition of the challenge. Then sit back for a few minutes and look at the challenge after switching your mind to the "DAX mode".
In this article, I used the Contoso sample dataset. You can download the ContosoRetailDW Dataset for free from Microsoft here.
The Contoso Data can be freely used under the MIT License, as described here.
I enlarged the dataset to make the DAX engine work harder.
The online sales table contains 71 million rows (instead of 12.6 million rows), and the retail sales table contains 15.5 million rows (instead of 3.4 million rows).