In Power BI development in Microsoft Fabric, understanding and utilising source control mechanisms is crucial for efficient collaboration and version management. This blog post delves into the essential aspects of source control for Power BI. This blog also includes the recording of my session at Saudi Arabia’s Excel User Group on the 26th of August 2023. The event was organised by Microsoft MVP, Faraz Sheik, where we walked through all the topics discussed in this blog.
Understanding Source Control
At its core, source control is a system that records changes to a file or set of files over time. This lets developers recall specific versions later, ensuring efficient collaboration and error management. It’s particularly vital for development teams, allowing multiple contributors to work on the same codebase without overwriting each other’s work.
For Power BI developers, this means tracking changes made to reports, and data models that are the most crucial components of every Power BI project.
Since May 2023 that Microsoft announced Microsoft Fabric for the first time, Power BI is a part of Microsoft Fabric. Hence, we use the term Microsoft Fabric throughout this post to refer to Power BI or Power BI Service.
The Problem
Implementing incremental refresh on Power BI is usually straightforward if we carefully follow the implementation steps. However in some real-world scenarios, following the implementation steps is not enough. In different parts of my latest book, Expert Data Modeling with Power BI, 2’nd Edition, I emphasis the fact that understanding business requirements is the key to every single development project and data modelling is no different. Let me explain it more in the context of incremental data refresh implementation.
Let’s say we followed all the required implementation steps and we also followed the deployment best practices and everything runs pretty good in our development environment; the first data refresh takes longer, we we expected, all the partitions are also created and everything looks fine. So, we deploy the solution to production environment and refresh the semantic model. Our production data source has substantially larger data than the development data source. So the data refresh takes way too long. We wait a couple of hours and leave it to run overnight. The next day we find out that the first refresh failed. Some of the possibilities that lead the first data refresh to fail are Timeout, Out of resources, or Out of memory errors. This can happen regardless of your licensing plan, even on Power BI Premium capacities.
Another issue you may face usually happens during development. Many development teams try to keep their development data source’s size as close as possible to their production data source. And… NO, I am NOT suggesting using the production data source for development. Anyway, you may be tempted to do so. You set one month’s worth of data using the RangeStart and RangeEnd parameters just to find out that the data source actually has hundreds of millions of rows in a month. Now, your PBIX file on your local machine is way too large so you cannot even save it on your local machine.
This post provides some best practices. Some of the practices this post focuses on require implementation. To keep this post at an optimal length, I save the implementations for future posts. With that in mind, let’s begin.
Best Practices
So far, we have scratched the surface of some common challenges that we may face if we do not pay attention to the requirements and the size of the data being loaded into the data model. The good news is that this post explores a couple of good practices to guarantee smoother and more controlled implementation avoiding the data refresh issues as much as possible. Indeed, there might still be cases where we follow all best practices and we still face challenges.
Note
While implementing incremental refresh is available in Power BI Pro semantic models, but the restrictions on parallelism and lack of XMLA endpoint might be a deal breaker in many scenarios. So many of the techniques and best practices discussed in this post require a premium semantic model backed by either Premium Per User (PPU), Power BI Capacity (P/A/EM) or Fabric Capacity.
The next few sections explain some best practices to mitigate the risks of facing difficult challenges down the road.
Practice 1: Investigate the data source in terms of its complexity and size
This one is easy; not really. It is necessary to know what kind of beast we are dealing with. If you have access to the pre-production data source or to the production, it is good to know how much data will be loaded into the semantic model. Let’s say the source table contains 400 million rows of data for the past 2 years. A quick math suggests that on average we will have more than 16 million rows per month. While these are just hypothetical numbers, you may have even larger data sources. So having some data source size and growth estimation is always helpful for taking the next steps more thoroughly.
Practice 2: Keep the date range between the RangeStart and RangeEnd small
Continuing from the previous practice, if we deal with fairly large data sources, then waiting for millions of rows to be loaded into the data model at development time doesn’t make too much sense. So depending on the numbers you get from the previous point, select a date range that is small enough to let you easily continue with your development without needing to wait a long time to load the data into the model with every single change in the Power Query layer. Remember, the date range selected between the RangeStart and RangeEnd does NOT affect the creation of the partition on Microsoft Fabric after publishing. So there wouldn’t be any issues if you chose the values of the RangeStart and RangeEnd to be on the same day or even at the exact same time. One important point to remember is that we cannot change the values of the RangeStart and RangeEnd parameters after publishing the model to Microsoft Fabric.
I have published a new blog explaining how to resolve the issue of CapacityAdmins disappering. Check it out here.
NOTE
This method uses the Create or Update a Resource action. This action uses PUT method which messes up with capacity admin settings on the Fabric Admin portal. This is stated by Kevin and later by Wiroj in the comments section. The REST API to scale or down the Fabric Capacity is as follows:
If you run the preceding API using PATCH method, you do not need to have your capacity running. In other words, you can run the API even when the capacity is paused. I will post a new blog explaining this but for now, I hope the information provided here is helpful.
In a previous post I explained how to manage the capacity costs of a Fabric F capacity (under Pay-As-You-Go pricing model) using Logic Apps to Suspend and Resume it.
A customer who read my previous blog asked me “Can we use a similar method to scale up and down before and after specific workloads?”. This blog post is to answer exactly that.
I want to make some important points clear first and before we dig deeper into the solution:
The method described in this post works with Fabric F SKUs under Pay-As-You-Go pricing model.
If you have a Power BI Premium capacity, then this method is not valid for your case. But you might be interested in the autoscale option for Power BI Premium capacities.
Depending on your current workload, scaling down may not work due to resource unavailability.
Depending on your workload, this method may take a while to go through.
You need to be either a Capacity Admin or a Fabric Admin to successfully implement this method.
This method works based on user authentication, however, you may want to use Service Principal or Manage Identity which require more effort but could be a more desirable method in many scenarios.
This post explains a very basic scenario, you’re welcome to scale it to your specific needs.
You can consider this post as a continuation of the previous post. So if you are unsure you correctly understand what this blog is trying to explain, then I suggest you read my previous post first where I explain the Logic Apps implementation in more detail.
The Problem
I have an F Fabric capacity and I want to upscale it to an upper tier between the pick-time from 8 AM to 12 PM local time, then downscale it to its original tier.
The Solution
There are many ways to do this including using Azure Resource Manager APIs, Manage Azure Resources in PowerShell, or using Azure Resource Manager connector that can be used on Azure Logic Apps, Power Automate Premium, and Power Apps Premium. This post explores the use of Azure Resource Manager connectors in Azure Logic Apps. With that, let’s begin.
If you are evaluating Microsoft Fabric and do not currently own a Premium Capacity, chances are you’re using Microsoft Fabric Trial Capacities. All Power BI users within an organisation or specific security groups given the rights can opt into Fabric Trial Capacities. Therefore, you may already have several Trial Fabric Capacities in your tenant. Your Fabric Administrators can specifically control who can opt into the Fabric Trial capacities within the Fabric Admin Portal, on the Help and support settings section, and enabling the Users can try Microsoft Fabric paid features setting as shown in the following image:
The authorised users can then opt into Fabric Trial by following this process:
Click the Account Manager on the top right corner of the page
Click the Start trial button
Click the Start trial button again
Provide the required details
Click the Extend my free trial button
The following image shows the preceding steps:
As you see, opting into Fabric Trial is simple, unless it isn’t!
There are cases where authorised users cannot start their Fabric Trial because their tenant has already exceeded the limit of available trial capacities. In that case, the users get the following message: