Incremental Refresh in Power BI, Part 3: Best Practices for Large Semantic Models

Incremental Refresh in Power BI, Best Practices for Large Semantic Models

In the two previous posts of the Incremental Refresh in Power BI series, we have learned what incremental refresh is, how to implement it, and best practices on how to safely publish the semantic model changes to Microsoft Fabric (aka Power BI Service). This post focuses on a couple of more best practices in implementing incremental refresh on large semantic models in Power BI.

Note

Since May 2023 that Microsoft announced Microsoft Fabric for the first time, Power BI is a part of Microsoft Fabric. Hence, we use the term Microsoft Fabric throughout this post to refer to Power BI or Power BI Service.

The Problem

Implementing incremental refresh on Power BI is usually straightforward if we carefully follow the implementation steps. However in some real-world scenarios, following the implementation steps is not enough. In different parts of my latest book, Expert Data Modeling with Power BI, 2’nd Edition, I emphasis the fact that understanding business requirements is the key to every single development project and data modelling is no different. Let me explain it more in the context of incremental data refresh implementation.

Let’s say we followed all the required implementation steps and we also followed the deployment best practices and everything runs pretty good in our development environment; the first data refresh takes longer, we we expected, all the partitions are also created and everything looks fine. So, we deploy the solution to production environment and refresh the semantic model. Our production data source has substantially larger data than the development data source. So the data refresh takes way too long. We wait a couple of hours and leave it to run overnight. The next day we find out that the first refresh failed. Some of the possibilities that lead the first data refresh to fail are Timeout, Out of resources, or Out of memory errors. This can happen regardless of your licensing plan, even on Power BI Premium capacities.

Another issue you may face usually happens during development. Many development teams try to keep their development data source’s size as close as possible to their production data source. And… NO, I am NOT suggesting using the production data source for development. Anyway, you may be tempted to do so. You set one month’s worth of data using the RangeStart and RangeEnd parameters just to find out that the data source actually has hundreds of millions of rows in a month. Now, your PBIX file on your local machine is way too large so you cannot even save it on your local machine.

This post provides some best practices. Some of the practices this post focuses on require implementation. To keep this post at an optimal length, I save the implementations for future posts. With that in mind, let’s begin.

Best Practices

So far, we have scratched the surface of some common challenges that we may face if we do not pay attention to the requirements and the size of the data being loaded into the data model. The good news is that this post explores a couple of good practices to guarantee smoother and more controlled implementation avoiding the data refresh issues as much as possible. Indeed, there might still be cases where we follow all best practices and we still face challenges.

Note

While implementing incremental refresh is available in Power BI Pro semantic models, but the restrictions on parallelism and lack of XMLA endpoint might be a deal breaker in many scenarios. So many of the techniques and best practices discussed in this post require a premium semantic model backed by either Premium Per User (PPU), Power BI Capacity (P/A/EM) or Fabric Capacity.

The next few sections explain some best practices to mitigate the risks of facing difficult challenges down the road.

Practice 1: Investigate the data source in terms of its complexity and size

This one is easy; not really. It is necessary to know what kind of beast we are dealing with. If you have access to the pre-production data source or to the production, it is good to know how much data will be loaded into the semantic model. Let’s say the source table contains 400 million rows of data for the past 2 years. A quick math suggests that on average we will have more than 16 million rows per month. While these are just hypothetical numbers, you may have even larger data sources. So having some data source size and growth estimation is always helpful for taking the next steps more thoroughly.

Practice 2: Keep the date range between the RangeStart and RangeEnd small

Continuing from the previous practice, if we deal with fairly large data sources, then waiting for millions of rows to be loaded into the data model at development time doesn’t make too much sense. So depending on the numbers you get from the previous point, select a date range that is small enough to let you easily continue with your development without needing to wait a long time to load the data into the model with every single change in the Power Query layer. Remember, the date range selected between the RangeStart and RangeEnd does NOT affect the creation of the partition on Microsoft Fabric after publishing. So there wouldn’t be any issues if you chose the values of the RangeStart and RangeEnd to be on the same day or even at the exact same time. One important point to remember is that we cannot change the values of the RangeStart and RangeEnd parameters after publishing the model to Microsoft Fabric.

Continue reading “Incremental Refresh in Power BI, Part 3: Best Practices for Large Semantic Models”

Unveiling Microsoft Fabric’s Impact on Power BI Developers and Analysts

Unveiling Microsoft Fabric’s Impact on Power BI Developers and Analysts

Microsoft Fabric is a new platform designed to bring together the data and analytics features of Microsoft products like Power BI and Azure Synapse Analytics into a single SaaS product. Its goal is to provide a smooth and consistent experience for both data professionals and business users, covering everything from data entry to gaining insights. A new data platform comes with new keywords and terminologies, so to get more familiar with some new terms in Microsoft Fabric, check out this blog post.

As mentioned in one of my previous posts, Microsoft Fabric is built upon the Power BI platform; therefore we expect it to provide ease of use, strong collaboration, and wide integration capabilities. While Microsoft Fabric is getting more attention in the market, so we see more and more organisations investigating the possibilities of migrating their existing data platforms to Microsoft Fabric. But what does it mean for seasoned Power BI developers? What about Power BI professional users such as data analysts and business analysts? In this post, I endeavor to answer those questions.

I have been blogging predominantly around Microsoft Data Platforms and especially Power BI since 2013. But I have never written about the history of Power BI. I believe it makes sense to touch upon the history of Power BI to better understand the size of its user base and how introducing a new data platform that includes Power BI can affect them. A quick search on the internet provides some interesting facts about it. So let’s take a moment and talk about it.

The history of Power BI

Power BI started as a top-secret project at Microsoft in 2006 by Thierry D’Hers and Amir Netz. They wanted to make a better way to analyse data using Microsoft Excel. They called their project “Gemini” at first.

In 2009, they released PowerPivot, a free extension for Excel that supports in-memory data processing. This made it faster and easier to do calculations and create reports. PowerPivot got quickly popular among Excel users, but it had some limitations. For example, it was hard to share large Excel files with others, and it was not possible to update the data automatically.

In 2015, Microsoft combined PowerPivot with another extension called Power Query, which lets users get data from different sources and clean it up. They also added a cloud service that lets users publish and share their reports online. They called this new product Power BI, which stands for Power Business Intelligence.

In the past few years, Power BI grasped a lot of attention in the market and improved a lot to cover more use cases and business requirements from data transformation, data modelling, and data visualisation to combining all these goods with the power of AI and ML to provide predictive and prescriptive analysis.

Who are Power BI Users?

Since its birth, Power BI has become one of the most popular and powerful data analysis and data visualisation tools in the world used by a wide variety of users. In the past few years, Power BI generated many new roles in the job market, such as Power BI developer, Power BI consultant, Power BI administrator, Power BI report writer, and whatnot, as well as helping many others by making their lives easier, such as data analysts and business analysts. With Power BI, the data analysts could efficiently analyse the data and make recommendations based on their findings. Business analysts could use Power BI to focus on more practical changes resulting from their analysis of the data and show their findings to the business much quicker than before. As a result, millions of users interact with Power BI on a daily basis in many ways. So, introducing a new data platform that sort of “Swallows Power BI” may sound daunting to those whose daily job relates to content creation, maintenance, or administrating Power BI environments. For many, the fear is real. But shall the developers and analysts be afraid of Microsoft Fabric? The short answer is “Absolutely not!”. Does it change the way we used to work with Power BI? Well, it depends.

To answer these questions, we first need to know who are Power BI users and how they interact with it.

Continue reading “Unveiling Microsoft Fabric’s Impact on Power BI Developers and Analysts”

Datatype Conversion in Power Query Affects Data Modeling in Power BI

Datatype Conversion in Power Query Affects Data Modeling in Power BI

In my consulting experience working with customers using Power BI, many challenges that Power BI developers face are due to negligence to data types. Here are some common challenges that are the direct or indirect results of inappropriate data types and data type conversion:

  • Getting incorrect results while all calculations in your data model are correct.
  • Poor performing data model.
  • Bloated model size.
  • Difficulties in configuring user-defined aggregations (agg awareness).
  • Difficulties in setting up incremental data refresh.
  • Getting blank visuals after the first data refresh in Power BI service.

In this blogpost, I explain the common pitfalls to prevent future challenges that can be time-consuming to identify and fix.

Background

Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is indeed a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to be able to work with Power BI: Power Query (M) and DAX. The purpose of the two languages is quite different. We use Power Query for data transformation and data preparation, while DAX is used for data analysis in the Tabular data model. Here is the point, the two languages in Power BI have different data types.

The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don’t. For instance, SQL Server has its own data types, but CSV doesn’t. When the data source has data types, the mashup engine tries to identify data types to the closest data type available in Power Query. Even though the source system has data types, the data types might not be compatible with Power Query data types. For the data sources that do not support data types, the matchup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. But, there is no guarantee that the detected data types are correct. So, it is best practice to validate the detected data types anyway.

Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.

The following table shows Power Query’s datatypes, their representations in the Power Query Editor’s UI, their mapping data types in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

Power Query and DAX (data model) data type mapping
Power Query and DAX (data model) data type mapping

As the above table shows, in Power Query’s UI, Whole Number, Decimal, Fixed Decimal and Percentage are all in type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.

Continue reading “Datatype Conversion in Power Query Affects Data Modeling in Power BI”

Endorsement in Power BI, Part 2, How to Endorse?

Endorsement in Power BI, Part 2, How to Endorse?

In the previous post I explained the basic concepts around endorsement in Power BI. We discussed that users’ ability to collaborate in creating and sharing artifacts is one of the key aspects of users’ experience in Power BI. But it would be hard, if not impossible, to identify the quality of the artifact without a mechanism to identify the artifact’s quality in large organisations. Endorsement is the answer to this challenge. We discussed the following in the previous post:

In this post, I explain the following:

How do Power BI administrators enable certification and grant rights to security groups?

In the previous post, we discussed that a Power BI administrator must enable certification and grant sufficient rights to the security groups. Therefore, all members of the specified security group are authorised to certify the artifacts. If you are a Power BI administrator, follow these steps to do so:

  1. After logging into Power BI Service, click the Settings button
  2. Click Admin Portal
  3. From the Tenant settings, scroll down to find the Export and sharing settings
  4. Find and expand the Certification setting
  5. Enable certification
  6. Put the certification process documentation URL (if any)
  7. It is not recommended to enable this feature for the entire organisation. So, select the Specific security groups option
  8. Type the security group name and select it from the list
  9. Click the Apply button

The following image shows the above steps:

Enabling certification from the Admin Portal in Power BI Service
Enabling certification from the Admin Portal in Power BI Service

It may take up to 15 minutes for the changes to go through. After that, all the members of the specified security can certify the artifacts. In the next section, we see how to certify the supported artifacts.

Note

Everyone who has “write” permission on the Workspace containing the artifact can promote it. Therefore, the users or security groups with one of the AdminMember, or Contributor roles in the Workspace can promote the artifacts.

However, one should not promote the artifacts just because he/she can. The organisations usually have a promotion process to follow, but the boundaries around promoting are often much more relaxed than certifying it.

Continue reading “Endorsement in Power BI, Part 2, How to Endorse?”