Optimising OData Refresh Performance in Power Query for Power BI and Excel

OData has been adopted by many software solutions and has been around for many years. Most solutions are using the OData is to serve their transactional processes. But as we know, Power BI is an analytical solution that can fetch hundreds of thousands (or millions) rows of data in a single table. So, obviously, OData is not optimised for that kind of purpose. One of the biggest challenges many Power BI developers face when working with OData connections is performance issues. The performance depends on numerous factors such as the size of tables in the backend database that the OData connection is serving, peak read data volume over periods of time, throttling mechanism to control over-utilisation of resources etc…

So, generally speaking, we do not expect to get a blazing fast data refresh performance over OData connections, that’s why in many cases using OData connections for analytical tools such as Power BI is discouraged. So, what are the solutions or alternatives if we do not use OData connections in Power BI? Well, the best solution is to migrate the data into an intermediary repository, such as Azure SQL Database or Azure Data Lake Store or even a simple Azure Storage Account, then connect from Power BI to that database. We must decide on the intermediary repository depending on the business requirements, technology preferences, costs, desired data latency, future support requirement and expertise etc…

But, what if we do not have any other options for now, and we have to use OData connection in Power BI without blasting the size and costs of the project by moving the data to an intermediary space? And.. let’s face it, many organisations dislike the idea of using an intermediary space for various reasons. The simplest one is that they simply cannot afford the associated costs of using intermediary storage or they do not have the expertise to support the solution in long term.

In this post, I am not discussing the solutions involving any alternatives; instead, I provide some tips and tricks that can improve the performance of your data refreshes over OData connections in Power BI.

Notes

The tips in this post will not give you blazing-fast data refresh performance over OData, but they will help you to improve the data refresh performance. So if you take all the actions explained in this post and you still do not get an acceptable performance, then you might need to think about the alternatives and move your data into a central repository.

If you are getting data from a D365 data source, you may want to look at some alternatives to OData connection such as Dataverse (SQL Endpoint), D365 Dataverse (Legacy) or Common Data Services (CDS). But keep in mind, even those connectors have some limitations and might not give you an acceptable data refresh performance. For instance, Dataverse (SQL Endpoint) has 80MB table size limitation. There might be some other reasons for not getting a good performance over those connections such as having extra wide tables. Believe me, I’ve seen some tables with more than 800 columns.

Some suggestions in this post apply to other data sources and are not limited to OData connections only.

Suggestion 1: Measure the data source size

It is always good to have an idea of the size of the data source we are dealing with and OData connection is no different. In fact, the backend tables on OData sources can be wast. I wrote a blog post around that before, so I suggest you use the custom function I wrote to understand the size of the data source. If your data source is large, then the query in that post takes a long time to get the results, but you can filter the tables to get the results quicker.

Suggestion 2: Avoid getting throttled

As mentioned earlier, many solutions have some throttling mechanisms to control the over-utilisation of resources. Sending many API requests may trigger throttling which limits our access to the data for a short period of time. During that period, our calls are redirected to a different URL.

Tip 1: Disabling Parallel Loading of Tables

One of the many reasons that Power BI requests many API calls is loading the data into multiple tables in Parallel. We can disable this setting from Power BI Desktop by following these steps:

  1. Click the File menu
  2. Click Options and settings
  3. Click Options
  4. Click the Data Load tab from the CURREN FILE section
  5. Untick the Enable parallel loading of tables option
Disabling Parallel Loading of Tables in Power BI
Disabling Parallel Loading of Tables in Power BI Desktop
Continue reading “Optimising OData Refresh Performance in Power Query for Power BI and Excel”

Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop

Incremental-Refresh-in-Power-BI-Part-1-Implementation-in-Power-BI-Desktop

Incremental refresh, or IR, refers to loading the data incrementally, which has been around in the world of ETL for data warehousing for a long time. Let us discuss incremental refresh (or incremental data loading) in a simple language to better understand how it works.

From a data movement standpoint, there are always two options when we transfer data from location A to location B:

  1. Truncation and load: We transfer the data as a whole from location A to location B. If location B has some data already, we entirely truncate the location B and reload the whole data from location A to B
  2. Incremental load: We transfer the data as a whole from location A to location B just once for the first time. The next time, we only load the data changes from A to B. In this approach, we never truncate B. Instead, we only transfer the data that exists in A but not in B

When we refresh the data in Power BI, we use the first approach, truncation and load, if we have not configured an incremental refresh. In Power BI, the first approach only applies to tables with Import or Dual storage modes. Previously, the Incremental load was available only in the tables with either Import or Dual storage modes. But the new announcement from Microsoft about Hybrid Tables greatly affects how Incremental load works. With the Hybrid Tables, the Incremental load is available on a portion of the table when a specific partition is in Direct Query mode, while the rest of the partitions are in Import storage mode.

Incremental refresh used to be available only on Premium capacities, but from Feb 2020 onwards, it is also available in Power BI Pro with some limitations. However, the Hybrid Tables are currently available on Power BI Premium Capacity and Premium Per User (PPU), not Pro. Let’s hope that Microsft will change its licensing plan for the Hybrid Tables in the future and make it available in Pro.

I will write about Hybrid Tables in a future blog post.

When we successfully configure the incremental refresh policies in Power BI, we always have two ranges of data; the historical range and the incremental range. The historical range includes all data processed in the past, and the incremental range is the current range of data to process. Incremental refresh in Power BI always looks for data changes in the incremental range, not the historical range. Therefore, the incremental refresh will not notice any changes in the historical data. When we talk about the data changes, we are referring to new rows inserted, updated or deleted, however, the incremental refresh detects updated rows as deleting the rows and inserting new rows of data.

Benefits of Incremental Refresh

Configuring incremental refresh is beneficial for large tables with hundreds of millions of rows. The following are some benefits of configuring incremental refresh in Power BI:

  • The data refreshes much faster than when we truncate and load the data as the incremental refresh only refreshes the incremental range
  • The data refresh process is less resource-intensive than refreshing the entire data all the time
  • The data refresh is less expensive and more maintainable than the non-incremental refreshes over large tables
  • The incremental refresh is inevitable when dealing with massive datasets with billions of rows that do not fit into our data model in Power BI Desktop. Remember, Power BI uses in-memory data processing engine; therefore, it is improbable that our local machine can handle importing billions of rows of data into the memory

Now that we understand the basic concepts of the incremental refresh, let us see how it works in Power BI.

Implementing Incremental Refresh Policies with Power BI Desktop

We currently can configure incremental refresh in the Power BI Desktop and in Dataflows contained in a Premium Workspace. This blog post looks at the incremental refresh implementation within the Power BI Desktop.

After successfully implementing the incremental refresh policies with the desktop, we publish the model to Power BI Service. The first data refresh takes longer as we transfer all data from the data source(s) to Power BI Service for the first time. After the first load, all future data refreshes will be incremental.

How to Implement Incremental Refresh

Implementing incremental refresh in Power BI is simple. There are two generic parts of the implementation:

  1. Preparing some prerequisites in Power Query and defining incremental policies in the data model
  2. Publishing the model to Power BI Service and refreshing the dataset

Let’s briefly get to some more details to quickly understand how the implementation works.

  • Preparing Prerequisites in Power Query
    • We require to define two parameters with DateTime data type in Power Query Editor. The names for the two parameters are RangeStart and RangeEnd, which are reserved for defining incremental refresh policies. As you know, Power Query is case-sensitive, so the names of the parameters must be RangeStart and RangeEnd.
    • The next step is to filter the table by a DateTime column using the RangeStart and RangeEnd parameters when the value of the DateTime column is between RangeStart and RangeEnd.

Notes

  • The data type of the parameters must be DateTime
  • The datat tpe of the column we use for incremental refresh must be Int64 (integer) Date or DateTime.Therefore, for scenarios that our table has a smart date key instead of Date or DateTime, we have to convert the RangeStart and RangeEnd parameters to Int64
  • When we filter a table using the RangeStart and RangeEnd parameters, Power BI uses the filter on the DateTime column for creating partitions on the table. So it is important to pay attention to the DateTime ranges when filtering the values so that only one filter condition must have an “equal to” on RangeStart or RangeEnd, not both
Continue reading “Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop”

Power BI 101, Report Authoring Tools

This is my last blog post in 2021. I wish you all a wonderful break and a happy new year.

In the first post of my Power BI 101 series, we learnt that Power BI is a data platform containing various tools and services. We also explained the currently available subscriptions within the Power BI platform. The focus of the second post of this series is on what we should learn to achieve our goals more efficiently. In this post, we focus on the reporting tools available to us according to our subscription plan. From this post onwards, we discuss more specific aspects of the Power BI platform.

We learnt so far that Power BI is not just a reporting tool to build sophisticated reports; it is indeed a platform supplying a wide range of features from data preparation, data modelling and data visualization to contribute to an organisation’s data analysis journey in many ways such as sharing datasets, reports, and dashboards. All of these are possible only if we take the correct steps in building our Power BI ecosystem. But, it is very true that Power BI gives us the flexibility to create professional-looking and eye-catching visualisations providing easy to understand insights around a subject. The most renowned tool within Power BI is Power BI Desktop, but it is not the only tool available to us to create reports. Besides, Power BI Desktop reports are not necessarily the best answer to all business requirements. In fact, the business requirements define the Power BI architecture that supports those requirements. Based on the architecture, organisations decide to acquire a certain subscription plan, and based on the subscription plan, we have various reporting tools available to us. At this point, you might ask, “well, what architecture supports my organisation requires?”. Let’s answer that question in a future blog post.

For the purpose of this blog post, it is enough to know what reporting tools are available under which subscription plans. Let’s get started.

Reporting tools available in Power BI

As mentioned earlier, there are various reporting tools available to us. Let’s first see what reporting tools are available to us regardless of the subscription plans. Then we will look at the subscription plans supporting those tools.

Power BI Service

Power BI Service is a SaaS (Software as a Service) offering from Microsoft in the cloud. The users within an organisation, depending on their access rights, may create reports directly in Power BI Service. The users can also securely share and distribute those reports. While creating or editing reports is possible in Power BI Service, it is strongly recommended to avoid this method for several reasons. The most obvious one is that the changes we make to a report may soon get overwritten by someone else that republishes the same report from Power BI Desktop. Check this blog post from SQLChick to see why you should avoid creating or editing reports directly from Power BI Service. The reports are downloadable in PBIX format. 

You can access Power BI Service here.

Power BI Desktop

It is a desktop application built for data preparation, data modelling and data visualisation. If you like to learn about data modelling with Power BI, check out my book here. We can use Power BI Desktop to connect to more than 250 different data sources, preparing, transforming and cleansing that data and at lastly visualising the data. Power BI Desktop is the predominant and most renowned report authoring tool available in the Power BI platform. It has many more functionalities and is more flexible than Power BI Service. For instance, setting up Role Level Security (RLS) is not available in Power BI Service. The file format of the reports created in Power BI Desktop is PBIX. 

Download Power BI Desktop from here.

Power BI Report Builder (Paginated)

Paginated reports aka pixel-perfect reports are formatted in a way to fit perfectly on a page. That report page might later be printed. We have exact control over page formatting to display our data in tables or charts. The reports are not as interactive as the reports created in Power BI Desktop.

Paginated reports are based on RDL technology which is standard report format in SQL Server Reporting Services (SSRS). The tool for developing a paginated report in the Power BI ecosystem is Power BI Report Builder. The reports file type is RDL. We can currently publish Paginated reports only to a Workspace that is backed with either a Premium Per User (PPU) or a Premium Capacity. 

Download Power BI Report Builder from here.

Power BI Desktop Optimised for Report Server (RS)

Power BI Report Server (PBIRS) is an on-premises server capable of rendering Power BI report files (PBIX). If we have a PBIRS up and running within our organisation and we require to publish Power BI reports to PBIRS, then we will need to create our reports in a special edition of Power BI Desktop which is optimised for PBIRS. This edition is different from Power BI Desktop, which we usually use to create and publish our reports to Power BI Service. For instance, Power BI Desktop RS does not include the preview features we used to see in Power BI Desktop until those features are released. Building reports in Power BI Desktop RS guarantees that the reports are fully functional after being deployed to our PBIRS. We can install Power BI Desktop and Power BI Desktop RS side-by-side on the same machine. 

Download Power BI Desktop RS from here.

Continue reading “Power BI 101, Report Authoring Tools”

Power BI 101, What Should I Learn?

This is the second part of my new series of Power BI posts named Power BI 101. In the previous post, I briefly discussed what Power BI is. In this post, I look into one of the most confusing parts for those who want to start learning Power BI. Many people jump straight online and look for Power BI training courses which there are plenty out there. But which one is the right training course for you? Let’s find out.

What do you want to gain from learning Power BI?

Regardless of attending paid training courses or being a self-learner, the above question is one of the most important questions you might ask yourself before going to the next steps. The answer to this question dictates the sort of training you must look for. Your answer to the preceding question can be one or none of the following:

  • I am a graduate/student looking at the job market
  • I am a business analyst and I want to know how Power BI can help you with my daily job
  • I am a database developer and I want to learn more about business intelligence and data and analytics space
  • I am a non-Microsoft Business Intelligence developer and I want to start learning more about Microsoft offerings
  • I am a system admin and I have to manage our Power BI tenant
  • I am a data scientist and I want to know how I can use Power BI
  • I am just ciourious to see what Power BI can do for me

As mentioned, your answer might not be any of the above, but, thinking about your reason(s) for learning Power BI can help you to find the best way to learn and use Power BI more efficiently. You can spend time and money taking some online courses and get even more confused. You don’t want that do you?

So, whatever reason(s) you have in mind to learn Power BI, most probably you fall into one of the following user categories:

Think about your goal(s) and what you want to achieve by learning Power BI then try to identify your user category. For instance, if you are a student thinking of joining an IT company as a data and analytics developer, then your user category is most probably a Power BI Developer or a Contributor.

To help you find out your user category let’s see what the above user categories mean.

Power BI Developers

The Power BI Developers are the beating hearts of any Power BI development project. Regardless of the project you will be involved with, you definitely require to have a certain level of knowledge of the following:

  • Data preparation/ETL processes
  • Data warehousing
  • Data modelling/Star schema
  • Data visualisation

To be a successful Power BI developer you must learn the following languages in Power BI:

  • Power Query
  • DAX

Depending on the types of projects you will be involved in, you may require to learn the following languages as well:

  • Microsoft Visual Basic (for Paginated Reports)
  • Python
  • R
  • T-SQL
  • PL/SQL

As a Power BI developer, you will write a lot of Power Query and DAX expressions. Most probably you require to learn T-SQL as well. The following resources can be pretty helpful:

Continue reading “Power BI 101, What Should I Learn?”