Understanding Power BI Totals: The Math, the Model, and the Misconceptions

The long-running debate around how Power BI calculates totals in tables and matrices has been part of the community conversation for years. Greg Deckler has kept the topic alive through his ongoing “broken totals” posts on social media, often suggesting that Power BI should include a simple toggle to make totals behave more like Excel. His continued campaign prompted a detailed reply from Daniel Otykier in his article No More Measure Totals Shenanigans, and earlier, Diego Scalioni explored how DAX evaluates totals internally in his post Cache me if you can: DAX Totals behind the scenes.

This blog brings all those perspectives together from a scientific and comparative angle. It looks at how totals are calculated in Power BI and compares that behaviour with Tableau, Excel, Paginated Reports, and even T-SQL. The goal is not to take sides, but to clear up the confusion around what is happening under the hood.

If you are into podcasts and prefer the audio version of this blog, I got you covered. Here an AI generated podcast for this blog. 👇

Power BI’s Broken Totals – Myth Debunked

Are Power BI Totals Really Broken?

Let’s get one thing clear right at the start, no, Power BI totals are not broken. There is no “it depends” this time. What some interpret as broken behaviour is actually how DAX and the underlying model are designed to work.

This post is not personal, it is purely scientific and technical. While I have great respect for Greg and his significant contributions to the Power BI community, I disagree with the use of the word “BROKEN.” It sounds dramatic but does not reflect the full truth. Totals in Power BI behave exactly as the model and the maths define them to. Want to know why? Keep reading.

Why this matters

When someone with Greg’s influence keeps saying totals are “broken”, it really affects how new users see Power BI. Some even start thinking the tool itself is not reliable, when what they are seeing is actually how different reporting tools do their calculations in different ways.

It helps to know the main calculation styles that these tools use:

  • Cell based: This is what you get in worksheet formulas and classic PivotTables that use Excel ranges. Totals are just simple sums of the shown items, with no model or relationships behind the scene.
  • Model driven: This is how Power BI works and also Excel PivotTables that use the Data Model (Power Pivot) or connect to a tabular dataset. Measures are calculated again for every context, so totals depend on how filters and relationships are set.
  • Query driven: Tools like Paginated Reports work this way. The report runs a query, for example SQL or DAX, gets the dataset, and then sums or averages values in the report design. The author decides how each total should be calculated.
  • Hybrid (query and context driven): Tableau fits in here. It gets the data through a query but also lets you change the level of detail and how totals behave in the visual. So sometimes it acts like a query tool and sometimes more like a model one.

Most of the confusion happens when people compare results from these tools as if they all worked the same way. Once you understand the difference between cell based, model driven, query driven, and hybrid tools, the way Power BI shows its totals starts to make full sense.

The problem that started it

Greg’s long-running example uses a small table with a single column of numbers and a DAX measure like this:

SUMX(SampleData, SampleData[Amount]) - 10

In the total row, the result shows 590, while he expects 580 (two groups of 290 each). Based on that, he argues that Power BI totals are “wrong”.

But DAX is only doing what it is told to do. In this measure, the subtraction of 10 happens after the total amount is calculated, not for each row. If the intention was to take 10 away per row, then the measure should be written like this:

SUMX(SampleData, SampleData[Amount] - 10)

This version gives the expected 580 because the subtraction now happens at the lowest level of detail, which is per row.

This might look like a small detail, but it is exactly where most of the confusion around totals begins. The difference is not about Power BI being wrong; it is about understanding where in the calculation the operation happens.

The math behind it

Before we look at the numbers, let’s first talk about what we are trying to do. We Greg’s small and very simple table that shows some amounts by Category and Colour:

CategoryColourAmount
ARed100
AGreen100
ABlue100
BRed100
BGreen100
BBlue100
Continue reading “Understanding Power BI Totals: The Math, the Model, and the Misconceptions”

Incremental Refresh in Power BI, Part 2; Best Practice; Do NOT Publish Data Model Changes from Power BI Desktop

Incremental Refresh Best Practice, Do NOT Publish Changes from Power BI Desktop

In a previous post, I shared a comprehensive guide on implementing Incremental Data Refresh in Power BI Desktop. We covered essential concepts such as truncation and load versus incremental load, understanding historical and incremental ranges, and the significant benefits of adopting incremental refresh for large tables. If you missed that post, I highly recommend giving it a read to get a solid foundation on the topic.

Now, let’s dive into Part 2 of this series where we will explore tips and tricks for implementing Incremental Data Refresh in more complex scenarios. This blog follows up on the insights provided in the first part, offering a deeper understanding of how Incremental Data Refresh works in Power BI. Whether you’re a seasoned Power BI user or just getting started, this post will provide valuable information on optimising your data refresh strategies. So, let’s begin.

When we publish a Power BI solution from Power BI Desktop to Fabric Service, we upload the data model, queries, reports, and the loaded data into the data model to the cloud. In essence, the Power Query queries, the data model and the loaded data will turn to the Semantic Model and the report will be a new report connected to the semantic model with Connect Live storage mode to the semantic model. If you are not sure what Connect Live means, then check out this post where I explain the differences between Connect Live and Direct Query storage modes.

The Publish process in Power BI Desktop makes absolute sense in the majority of Power BI developments. While Power BI Desktop is the predominant development tool to implement Power BI solutions, the publishing process is still not quite up to the task, especially on more complex scenarios such as having Incremental Data Refresh configured on one or more tables. Here is why.

As explained in this post, publishing the solution into the service for the first time does not create the partitions required for the incremental refresh. The partitions will be created after the first time we refresh the semantic model from the Fabric Service. Imagine the case where we successfully refreshed the semantic model, but we need to modify the solution in Power BI Desktop and republish the changes to the service. That’s where things get more complex than expected. Whenever we republish the new version from Power BI Desktop to Fabric Service, we get a warning that the semantic model exists in the target workspace and that we want to Overwrite it with the new one. In other words, Power BI Desktop currently does not offer to apply the semantic model changes without overwriting the entire model. This means that if we move forward, as the warning message suggests, we replace the existing semantic model and the created partitions with the new one without any partitions. So the new semantic model is now in its very first stage and the partitions of the table(s) with incremental refresh are gone. Of course, the partitions will be created during the next refresh, but this is not efficient and realistically totally unacceptable in production environments. That’s why we MUST NOT use Power BI Desktop for republishing an already published semantic model to avoid overriding the already created tables’ partitions. Now that Power BI Desktop does not support more advanced publishing scenarios such as detecting the existing partitions created by the incremental refresh process, let’s discuss our other options.

Alternatives to Power BI Desktop to Publish Changes to Fabric Service

While we should not publish the changes from Power BI Desktop to the Service, we can still use it as our development tool and publish the changes using third-party tools, thanks to the External Tools support feature. The following subsections explain using two tools that I believe are the best.

Continue reading “Incremental Refresh in Power BI, Part 2; Best Practice; Do NOT Publish Data Model Changes from Power BI Desktop”

Datatype Conversion in Power Query Affects Data Modeling in Power BI

Datatype Conversion in Power Query Affects Data Modeling in Power BI

In my consulting experience working with customers using Power BI, many challenges that Power BI developers face are due to negligence to data types. Here are some common challenges that are the direct or indirect results of inappropriate data types and data type conversion:

  • Getting incorrect results while all calculations in your data model are correct.
  • Poor performing data model.
  • Bloated model size.
  • Difficulties in configuring user-defined aggregations (agg awareness).
  • Difficulties in setting up incremental data refresh.
  • Getting blank visuals after the first data refresh in Power BI service.

In this blogpost, I explain the common pitfalls to prevent future challenges that can be time-consuming to identify and fix.

Background

Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is indeed a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to be able to work with Power BI: Power Query (M) and DAX. The purpose of the two languages is quite different. We use Power Query for data transformation and data preparation, while DAX is used for data analysis in the Tabular data model. Here is the point, the two languages in Power BI have different data types.

The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don’t. For instance, SQL Server has its own data types, but CSV doesn’t. When the data source has data types, the mashup engine tries to identify data types to the closest data type available in Power Query. Even though the source system has data types, the data types might not be compatible with Power Query data types. For the data sources that do not support data types, the matchup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. But, there is no guarantee that the detected data types are correct. So, it is best practice to validate the detected data types anyway.

Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.

The following table shows Power Query’s datatypes, their representations in the Power Query Editor’s UI, their mapping data types in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

Power Query and DAX (data model) data type mapping
Power Query and DAX (data model) data type mapping

As the above table shows, in Power Query’s UI, Whole Number, Decimal, Fixed Decimal and Percentage are all in type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.

Continue reading “Datatype Conversion in Power Query Affects Data Modeling in Power BI”

Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop

Incremental-Refresh-in-Power-BI-Part-1-Implementation-in-Power-BI-Desktop

Incremental refresh, or IR, refers to loading the data incrementally, which has been around in the world of ETL for data warehousing for a long time. Let us discuss incremental refresh (or incremental data loading) in a simple language to better understand how it works.

From a data movement standpoint, there are always two options when we transfer data from location A to location B:

  1. Truncation and load: We transfer the data as a whole from location A to location B. If location B has some data already, we entirely truncate the location B and reload the whole data from location A to B
  2. Incremental load: We transfer the data as a whole from location A to location B just once for the first time. The next time, we only load the data changes from A to B. In this approach, we never truncate B. Instead, we only transfer the data that exists in A but not in B

When we refresh the data in Power BI, we use the first approach, truncation and load, if we have not configured an incremental refresh. In Power BI, the first approach only applies to tables with Import or Dual storage modes. Previously, the Incremental load was available only in the tables with either Import or Dual storage modes. But the new announcement from Microsoft about Hybrid Tables greatly affects how Incremental load works. With the Hybrid Tables, the Incremental load is available on a portion of the table when a specific partition is in Direct Query mode, while the rest of the partitions are in Import storage mode.

Incremental refresh used to be available only on Premium capacities, but from Feb 2020 onwards, it is also available in Power BI Pro with some limitations. However, the Hybrid Tables are currently available on Power BI Premium Capacity and Premium Per User (PPU), not Pro. Let’s hope that Microsft will change its licensing plan for the Hybrid Tables in the future and make it available in Pro.

I will write about Hybrid Tables in a future blog post.

When we successfully configure the incremental refresh policies in Power BI, we always have two ranges of data; the historical range and the incremental range. The historical range includes all data processed in the past, and the incremental range is the current range of data to process. Incremental refresh in Power BI always looks for data changes in the incremental range, not the historical range. Therefore, the incremental refresh will not notice any changes in the historical data. When we talk about the data changes, we are referring to new rows inserted, updated or deleted, however, the incremental refresh detects updated rows as deleting the rows and inserting new rows of data.

Benefits of Incremental Refresh

Configuring incremental refresh is beneficial for large tables with hundreds of millions of rows. The following are some benefits of configuring incremental refresh in Power BI:

  • The data refreshes much faster than when we truncate and load the data as the incremental refresh only refreshes the incremental range
  • The data refresh process is less resource-intensive than refreshing the entire data all the time
  • The data refresh is less expensive and more maintainable than the non-incremental refreshes over large tables
  • The incremental refresh is inevitable when dealing with massive datasets with billions of rows that do not fit into our data model in Power BI Desktop. Remember, Power BI uses in-memory data processing engine; therefore, it is improbable that our local machine can handle importing billions of rows of data into the memory

Now that we understand the basic concepts of the incremental refresh, let us see how it works in Power BI.

Implementing Incremental Refresh Policies with Power BI Desktop

We currently can configure incremental refresh in the Power BI Desktop and in Dataflows contained in a Premium Workspace. This blog post looks at the incremental refresh implementation within the Power BI Desktop.

After successfully implementing the incremental refresh policies with the desktop, we publish the model to Power BI Service. The first data refresh takes longer as we transfer all data from the data source(s) to Power BI Service for the first time. After the first load, all future data refreshes will be incremental.

How to Implement Incremental Refresh

Implementing incremental refresh in Power BI is simple. There are two generic parts of the implementation:

  1. Preparing some prerequisites in Power Query and defining incremental policies in the data model
  2. Publishing the model to Power BI Service and refreshing the dataset

Let’s briefly get to some more details to quickly understand how the implementation works.

  • Preparing Prerequisites in Power Query
    • We require to define two parameters with DateTime data type in Power Query Editor. The names for the two parameters are RangeStart and RangeEnd, which are reserved for defining incremental refresh policies. As you know, Power Query is case-sensitive, so the names of the parameters must be RangeStart and RangeEnd.
    • The next step is to filter the table by a DateTime column using the RangeStart and RangeEnd parameters when the value of the DateTime column is between RangeStart and RangeEnd.

Notes

  • The data type of the parameters must be DateTime
  • The datat tpe of the column we use for incremental refresh must be Int64 (integer) Date or DateTime.Therefore, for scenarios that our table has a smart date key instead of Date or DateTime, we have to convert the RangeStart and RangeEnd parameters to Int64
  • When we filter a table using the RangeStart and RangeEnd parameters, Power BI uses the filter on the DateTime column for creating partitions on the table. So it is important to pay attention to the DateTime ranges when filtering the values so that only one filter condition must have an “equal to” on RangeStart or RangeEnd, not both
Continue reading “Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop”