OData has been adopted by many software solutions and has been around for many years. Most solutions are using the OData is to serve their transactional processes. But as we know, Power BI is an analytical solution that can fetch hundreds of thousands (or millions) rows of data in a single table. So, obviously, OData is not optimised for that kind of purpose. One of the biggest challenges many Power BI developers face when working with OData connections is performance issues. The performance depends on numerous factors such as the size of tables in the backend database that the OData connection is serving, peak read data volume over periods of time, throttling mechanism to control over-utilisation of resources etc…
So, generally speaking, we do not expect to get a blazing fast data refresh performance over OData connections, that’s why in many cases using OData connections for analytical tools such as Power BI is discouraged. So, what are the solutions or alternatives if we do not use OData connections in Power BI? Well, the best solution is to migrate the data into an intermediary repository, such as Azure SQL Database or Azure Data Lake Store or even a simple Azure Storage Account, then connect from Power BI to that database. We must decide on the intermediary repository depending on the business requirements, technology preferences, costs, desired data latency, future support requirement and expertise etc…
But, what if we do not have any other options for now, and we have to use OData connection in Power BI without blasting the size and costs of the project by moving the data to an intermediary space? And.. let’s face it, many organisations dislike the idea of using an intermediary space for various reasons. The simplest one is that they simply cannot afford the associated costs of using intermediary storage or they do not have the expertise to support the solution in long term.
In this post, I am not discussing the solutions involving any alternatives; instead, I provide some tips and tricks that can improve the performance of your data refreshes over OData connections in Power BI.
The tips in this post will not give you blazing-fast data refresh performance over OData, but they will help you to improve the data refresh performance. So if you take all the actions explained in this post and you still do not get an acceptable performance, then you might need to think about the alternatives and move your data into a central repository.
If you are getting data from a D365 data source, you may want to look at some alternatives to OData connection such as Dataverse (SQL Endpoint), D365 Dataverse (Legacy) or Common Data Services (CDS). But keep in mind, even those connectors have some limitations and might not give you an acceptable data refresh performance. For instance, Dataverse (SQL Endpoint) has 80MB table size limitation. There might be some other reasons for not getting a good performance over those connections such as having extra wide tables. Believe me, I’ve seen some tables with more than 800 columns.
Some suggestions in this post apply to other data sources and are not limited to OData connections only.
Suggestion 1: Measure the data source size
It is always good to have an idea of the size of the data source we are dealing with and OData connection is no different. In fact, the backend tables on OData sources can be wast. I wrote a blog post around that before, so I suggest you use the custom function I wrote to understand the size of the data source. If your data source is large, then the query in that post takes a long time to get the results, but you can filter the tables to get the results quicker.
Suggestion 2: Avoid getting throttled
As mentioned earlier, many solutions have some throttling mechanisms to control the over-utilisation of resources. Sending many API requests may trigger throttling which limits our access to the data for a short period of time. During that period, our calls are redirected to a different URL.
Tip 1: Disabling Parallel Loading of Tables
One of the many reasons that Power BI requests many API calls is loading the data into multiple tables in Parallel. We can disable this setting from Power BI Desktop by following these steps:
- Click the File menu
- Click Options and settings
- Click Options
- Click the Data Load tab from the CURREN FILE section
- Untick the Enable parallel loading of tables option
With this option disabled, the tables will get refreshed sequentially, which significantly decreases the number of calls, therefore, we do not get throttled prematurely.
Tip 2: Avoiding Multiple Calls in Power Query
Another reason (of many) that the OData calls in Power BI get throttled is that Power Query calls the same API multiple times. There are many known reasons that Power Query runs a query multiple times such as checking for data privacy or the way that the connector is built or having referencing queries. Here is a comprehensive list of reasons for running queries multiple times and the ways to avoid them.
Tip 3: Delaying OData Calls
If you have done all the above and you still get throttled, then it is a good idea to review your queries in Power Query and look to see if you have used any custom functions. Especially, if the custom function appends data, then it is highly likely that invoking function is the culprit. The amazing Chris Webb explains how to use the
Function.InvokeAfter() function on his blog post here.
Suggestion 3: Consider Querying OData Instead of Loading the Entire Table
This is one of the best ways to optimise data load performance over OData connections in Power BI. As mentioned earlier, some backend tables exposed via OData are pretty wide with hundreds (if not thousands) of columns. A common mistake many of us make is that we simply use the OData connector and get the entire table and think that we will remove all the unnecessary columns later. If the underlying table is large then we are in trouble. Luckily, we can use OData queries in the OData connector in Power BI. You can learn more about OData Querying Options here.
If you are coming from an SQL background, then you may love this one as much I do.
Let’s have a look at the OData query options with an example. I am using the official test data from the OData website.
- I initially load the OData URL in the Power Query Editor from Power BI Desktop using the OData connector
- Select the tables, remember we will change the Source of each table later
This is what many of us typically do. We connect to the source and get all tables. Hopefully we get only the required ones. But, the whole purpose of this post is not to do so. In the next few steps, we change the Source step.
- In the Power Query Editor, select the desired query from the Queries pane, I selected the PersonDetails table
- Click the Advanced Editor button
- Replace the OData URL with an OData query
- Click Done
As you can see, we can select only the required columns from the table. Here are the results of running the preceding query:
In real-wrold scenarios, as you can imagine, the performance of running a query over an OData connection would be much better than getting all columns from the same connection and then removing unwanted ones.
The possibilities are endless when it comes to querying a data source and OData querying in no different. For instance, let’s say we require to analyse the data for people older than 24. So we can narrow down the number of rows by adding a filter to the query. Here are the results:
Some Extra Resources to Learn More
Here are some invaluable resources for your reference:
While I was looking for the resources I found the following amazing weblogs. There are very good reads:
As always, I would be happy to know about your opinion and experience, so leave your comments below.
2 thoughts on “Optimising OData Refresh Performance in Power Query for Power BI and Excel”
I’ve read so many posts and articles about this topic.
I’ve read through the documentation forwards and backwards.
This article finally delivers what I needed, thank you so much.
I’m going to link people back here at every opportunity.
Thank you, finally a succinct answer!