Get performance data from Power BI with DAX Studio
You have slow Power BI reports and want to find out why? In this hack, I'll show you how to use DAX Studio to collect performance data from Power BI and find areas of improvement.
by Salvatore Cagliari
Slow loading times of reports in Power BI can be quite tedious. However, if you want to fix the problem, you first need to know where it lies. And the way to do this is to collect and analyze performance metrics.
To understand what these metrics tell us, you first need to understand the internal structure of a Power BI (or Analysis Services) data model. After that, we will look at an example of how to collect the metrics and what they mean.
Each tabular data model has two engines:
- Storage engine
- Formula engine
So, let's take a look at what different properties these two have and what tasks they perform.
Storage engine (SE)
The storage engine (SE for short) is the interface between the DAX query and the data stored in the table model. It takes any DAX query and sends it to the Vertipaq storage engine, which stores the data in the data model.
To do this, the SE uses a language called xmSQL, which is based on the standard SQL language, but supports only a few and simple operators: +, -, /, *, =, <> and IN. To aggregate data, xmSQL supports SUM, MIN, MAX, COUNT and DCOUNT (Distinct Count). It also supports GROUP BY, WHERE and JOINs. Accordingly, a basic understanding of SQL queries is very helpful to understand SE queries.
In terms of performance metrics, it is important to know that SE is a multithreaded component. I.e., it uses multiple CPU cores to speed up the execution of a query. In addition, the SE can cache queries and their results, which makes it execute a repeated execution of the same query faster.
Formula engine (FE)
The formula engine (FE for short) is the DAX engine. All functions that cannot be executed by the SE are executed by the FE. For example, the SE fetches the data from the data model and passes the result to the FE. This process is called "materialization" because the data is stored in memory to be processed by the FE.
The SE can also call the FE if an xmSQL query contains functions that the SE cannot perform. This operation is called "CallbackDataID".
Unlike the SE, the FE is single-threaded and has no cache. This means:
- No parallelism due to the use of multiple CPU cores
- No reuse of repeated execution of the same query
For this reason, one should outsource as many operations as possible to the SE and bypass materialization and CallbackDataID if possible. Unfortunately, however, it is not possible to directly define which part of the DAX code is executed by which engine. Thus, it is important to avoid certain patterns to ensure that the right engine does the job in the shortest possible time.
Let's now take a look at how we can use DAX Studio to retrieve performance data from Power BI and thus find out how much time is needed by each engine.
These are the top 3 hacks from Salvatore Cagliari
How can you get the performance data?
As a basic requirement to be able to retrieve performance metrics, you need DAX Studio on your computer. The download link is listed under the references if needed. If you can't install the software, it is possible to get a portable version from the same website. Just download the ZIP file, unzip it, save it in any local folder and then run DAXStudio.exe to use all functions without any restrictions.
Now you have to get the DAX queries from Power BI. To do this, you first need to start the performance analyzer in Power BI Desktop:
Figure 1: Starting the performance analyzer in Power BI Desktop
Once the performance analyzer window is visible, the recording of performance data and DAX query for all visuals can begin:
Figure 2: Starting the performance analyzer in Power BI Desktop
First click "Start recording", then click "Refresh visuals" to restart the rendering of all visuals of the current page. If you now click on the dropdown menu of a column in the list, you can check if the corresponding visual is activated. There is also a link to copy the DAX query to the clipboard.
Figure 3: Select visual and copy query to clipboard
As you can see, Power BI needed 80,606 milliseconds to complete the rendering of the matrix visuals. The DAX query alone took 80,194 milliseconds. This is a very bad sign for the Measure used in this visual.
Now DAX Studio can be started. If you have it installed on your computer, you can find it in the "External Tools" ribbon:
Figure 4: Starting DAX Studio in Power BI Desktop
DAX Studio will be started and automatically connected to the current Power BI Desktop file.
If you need to start DAX Studio separately, you can connect in the open Power BI Desktop as follows:
Figure 5: Manual connection of DAX Studio with Power BI desktop
After the connection is established, an empty query is opened in DAX Studio. In the lower part of the DAX Studio window there is a log area where you can see what is happening.
However, before you insert the DAX query from Power BI Desktop, you need to start the server timings in DAX Studio (upper right corner of DAX Studio window):
Figure 6: Starting the server timings in DAX Studio
After a few seconds, it will be reported in the log area that the trace has been started.
After you have pasted the query into the empty editor window, you have to activate the "Clear on Run" button and then “Run” the query:
Figure 7: Activating the "Clear on Run" function
"Clear on Run" ensures that the storage engine cache is cleared before the query is executed. Clearing the cache before measuring performance data is the best practice to get a consistent starting point.
After the query is executed, the Server Timings page is displayed at the bottom of the DAX Studio window:
Figure 8: Server Timings page in DAX Studio
Now you can see the information we're going to examine next to find out why the Power BI report was so slow.
What can you learn from this data?
On the left side of "Server Timings" you have insight into the execution times:
Figure 9: Execution times
Here you can see the following numbers:
- Total: The total execution time in milliseconds (ms)
- SE CPU: The total CPU time spent by the storage engine (SE) to execute the query (usually this number is larger than the total time because parallel execution uses multiple CPU cores)
- FE: The time spent by the formula engine (FE) and the percentage of the total execution time
- SE: The time spent by the storage engine (SE) and the percentage of the total execution time
- SE Queries: The number of storage engine queries required for the DAX query
- SE Cache: The usage of the storage engine cache, if it was used at all
As a rule of thumb: the larger the percentage of storage engine time compared to formula engine time, the better.
The middle section shows a list of Storage Engine queries:
Figure 10: List of storage engine queries
This list shows how many SE queries were executed for the DAX query and includes some statistical columns:
- Line/Index line: Usually, not all lines are shown. The full lines can be seen, if you click on the "Cache" and "Internal" buttons in the upper right corner of the server's time window. However, they are not very useful because they are an internal representation of the visible queries. Sometimes it can be helpful to see the cache queries to see which part of the query was accelerated by the SE cache.
- Subclass: Usually "Scan”
- Duration: Time spent for each SE query
- CPU: CPU time spent for each SE query
- Par: Parallelism of each SE query
- Rows and KB: Size of the "materialization" by the SE query
- Waterfall: time sequence of the SE queries
- Query: The beginning of each SE query
In this case, the first SE query returned 12,527,422 rows to the formula engine (the number of rows of the entire fact table), consuming 1 GB of memory. This is not good, as large "materialization" like this are performance killers.
It is clear that a significant mistake was made in this DAX code.
Finally, the actual xmSQL code of the selected row can be read:
Figure 11: Query code of the selected line
With this you can try to understand the problem of the DAX query. In the present case, you can see that there is a highlighted "CallbackDataID". DAX Studio makes all queries in the query list bold that contain a CallbackDataID.
As you can see, in this case an IF() function is passed to the formula engine (FE) because the SE cannot process this function. However, the SE knows that the FE can execute this function. So, it calls the FE for each row of the result. In this case, over 12 million times. As you could see in the measurement, this takes a lot of time.
Now you know that the written DAX code is faulty, and the SE calls the FE many times to execute a DAX function. Also, you need 1 GB of RAM to execute the query and the parallelism is only 1.9 times, which could be much better.
What can you do now?
The DAX query contains only the query created by Power BI Desktop. But in most cases, we need the measure code to find and fix the error.
DAX Studio provides a function called "Define Measures" to extract the DAX code of the measure from the data model:
- Insert one or two blank lines at the beginning of the query
- Put the cursor on the first (empty) row
- Search for the key figure in the data model
- Click with the right mouse button on the measure and then on "Define Measure"
- If your measure calls another measure, you can click “Define Dependent Measures”. With this feature, DAX Studio extracts the code of all measures used by the selected measure.
Figure 12: Defining measures in DAX Studio
The result is a DEFINE statement followed by one or more MEASURE statements containing the DAX code of the measure that is responsible for the slow report.
Now the measure's code can be tweaked without modifying the measure in the data model.
After tweaking the code in this example, I ran the new query and compared the server times to the original data:
Figure 13: Comparison between slow and fast DAX code
Now the whole query took only 55 milliseconds, and the SE created a "materialization" of only 19 lines. The parallelism is 2.6 times, which is better than 1.9 times. It looks like the SE does not need that much processing power to increase the parallelism.
This is a very good sign.
The final step now is to transfer the modified measure code into the data model to optimize the report itself.
If you have a slow visual in the Power BI report and want to find out the reason, you need some information first. Using the performance analyzer in Power BI Desktop, you can find out where the time is spent rendering the result of the visual.
If you see that the DAX query is taking a lot of time to execute, you need to use DAX Studio – as described in this hack – to figure out the problem and fix it.
But fixing performance problems in measures, is another story – which I will show you in a separate hack …
Download DAX Studio here for free
Free SQLBI Tools Training: DAX Tools Video Course - SQLBI
I use the Contoso sample dataset as in my previous articles. You can download the ContosoRetailDW dataset for free from Microsoft here.