Datatype Conversion in Power Query Affects Data Modeling in Power BI

In my consulting experience with customers using Power BI, many of the challenges that Power BI developers face stem from neglecting data types. Here are some common challenges that result, directly or indirectly, from inappropriate data types and data type conversions:

  • Getting incorrect results even though all calculations in your data model are correct.
  • A poorly performing data model.
  • A bloated model size.
  • Difficulties in configuring user-defined aggregations (agg awareness).
  • Difficulties in setting up incremental data refresh.
  • Getting blank visuals after the first data refresh in Power BI service.

In this blog post, I explain the common pitfalls so that you can prevent future challenges that are time-consuming to identify and fix.

Background

Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to work with Power BI: Power Query (M) and DAX. The two languages serve quite different purposes. We use Power Query for data transformation and data preparation, while DAX is used for data analysis in the Tabular data model. Here is the point: the two languages in Power BI have different data types.

The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don’t. For instance, SQL Server has its own data types, but CSV doesn’t. When the data source has data types, the mashup engine tries to map them to the closest data types available in Power Query. Even then, the source system’s data types might not be compatible with Power Query data types. For data sources that do not support data types, the mashup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. But there is no guarantee that the detected data types are correct, so it is best practice to validate the detected data types anyway.
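To illustrate why sample-based detection deserves a second look, here is a minimal sketch in Python. It is not the mashup engine's actual algorithm; the `detect_type` helper and the sample column are hypothetical, and the only point is that a guess made from the first few rows can differ from what the full column supports.

```python
def detect_type(values):
    """Guess a column type from text values: 'int', 'float', or 'text'.

    Illustrative only; this is NOT how Power Query detects types.
    """
    def parses(cast, value):
        try:
            cast(value)
            return True
        except ValueError:
            return False

    if all(parses(int, v) for v in values):
        return "int"
    if all(parses(float, v) for v in values):
        return "float"
    return "text"


# Hypothetical CSV column: the first rows look like whole numbers,
# but later rows contain a decimal and a non-numeric marker.
column = ["10", "25", "7", "13", "4.5", "N/A"]

sample_guess = detect_type(column[:4])  # guess from a preview-sized sample
full_guess = detect_type(column)        # guess from the whole column

print(sample_guess, full_guess)
```

If the sample-based guess were applied to the whole column, the last two rows would fail to convert, which is the kind of problem that validating detected types up front helps avoid.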

Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.

The following table shows Power Query’s data types, their representations in the Power Query Editor’s UI, their corresponding data types in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

Power Query and DAX (data model) data type mapping

As the above table shows, the Whole Number, Decimal, Fixed Decimal and Percentage types in the Power Query Editor’s UI are all of type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.

Automate Testing SSAS Tabular Models

In real-world SSAS Tabular projects, you need to run many different testing scenarios to prove to your customer that the data in the Tabular model is correct. If you are running a Tabular model on top of a proper data warehouse, your life is a bit easier than when you build your semantic model on top of an operational database. However, it is still a fairly time-consuming process to run many test cases on the Tabular model, run similar tests on the data warehouse, and compare the results. So your test cases always have two sides: one side is your source database, which can be a data warehouse, and the other side is the Tabular model. There are many ways to test the system. For example, you can browse your Tabular model in Excel, connect to your data warehouse in Excel, create pivot tables, and then compare the data coming from the Tabular model with the data coming from the data warehouse. But for how many measures and dimensions can you do that in Excel?

The other way is to run DAX queries on the Tabular model side. If your source database is a SQL Server database, you then need to run T-SQL queries on the database side and match the results of both sides to prove that the data in the Tabular model is correct.
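The matching step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `compare_results` helper, the tolerance, and the figures are hypothetical, and each side is assumed to have already been reduced to a mapping from a dimension value to a measure value.

```python
import math


def compare_results(tabular_rows, warehouse_rows, tolerance=0.01):
    """Return the keys whose values differ or that exist on one side only.

    Each argument maps a dimension value (for example a year) to a measure
    value. The result maps each mismatching key to (tabular, warehouse).
    """
    mismatches = {}
    for key in sorted(set(tabular_rows) | set(warehouse_rows)):
        left = tabular_rows.get(key)
        right = warehouse_rows.get(key)
        missing = left is None or right is None
        if missing or not math.isclose(left, right, abs_tol=tolerance):
            mismatches[key] = (left, right)
    return mismatches


# Made-up figures: 2012 differs and 2014 is missing on the Tabular side.
tabular = {2011: 100.0, 2012: 250.5, 2013: 300.0}
warehouse = {2011: 100.0, 2012: 250.75, 2013: 300.0, 2014: 40.0}

mismatches = compare_results(tabular, warehouse)
print(mismatches)
```

An empty result means the two sides agree within the tolerance for every dimension value.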

In this post I’d like to share with you a way to automate the DAX queries to be run on a Tabular model.

Fair warning: this is going to be a long post, so you may want to grab a cup of coffee before you read on.

While I will not cover the other side, the source or the data warehouse side, it is worth automating that part too, as you can save heaps of time. I’m sure a similar process can be developed on the SQL Server side, but I leave that part for now. What I’m going to explain in this post is just one of many possible ways to generate and run DAX queries and store the results in SQL Server. Perhaps it is not perfect, but it is a good starting point. If you have a better idea, it would be great to share it with us in the comments section below this post.

Requirements

  • SQL Server Analysis Services Tabular 2016 and later (Compatibility Level 1200 and higher)
  • An instance of SQL Server
  • SQL Server Management Studio (SSMS)

How does it work

What I’m going to explain is very simple. I want to generate and run DAX queries and capture the results. The first step is to get all measures and their relevant dimensions. Then I slice all the measures by all relevant dimensions and get the results. At the end, I capture and store the results in a SQL Server temp table. Let’s think about a simple scenario:

  • You have just one measure, [Internet Total Sales], from the ‘Internet Sales’ table
  • The measure is related to just one dimension, the “Date” dimension
  • The “Date” dimension has only four columns: Year, Month, Year-Month and Date
  • You want to slice [Internet Total Sales] by Year, Month, Year-Month and Date

So you need to write four DAX queries as below:

-- Slice by year
EVALUATE
SUMMARIZE(
    'Internet Sales'
    , 'Date'[Calendar Year]
    , "Internet Sales", [Internet Total Sales]
)

-- Slice by month
EVALUATE
SUMMARIZE(
    'Internet Sales'
    , 'Date'[Month Name]
    , "Internet Sales", [Internet Total Sales]
)

-- Slice by year-month
EVALUATE
SUMMARIZE(
    'Internet Sales'
    , 'Date'[Year-Month]
    , "Internet Sales", [Internet Total Sales]
)

-- Slice by date
EVALUATE
SUMMARIZE(
    'Internet Sales'
    , 'Date'[Date]
    , "Internet Sales", [Internet Total Sales]
)

It is easy, isn’t it? But wait. What if you have 10 measures related to 4 dimensions and each dimension has 10 columns? That is up to 10 × 4 × 10 = 400 queries, which sounds laborious, doesn’t it? Well, in real-world scenarios you won’t slice all measures by all relevant dimensions, but you still need to do a lot. What we are going to do is generate and run the DAX queries and store the results in a table in SQL Server. How cool is that?
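To make the idea of generating the queries concrete, here is a minimal sketch in Python rather than in the T-SQL and DMV approach this post builds towards. The `generate_queries` helper and its input shapes are hypothetical; it simply fills the same SUMMARIZE pattern used in the four queries above for every measure and dimension column pair.

```python
from itertools import product

# Same shape as the hand-written queries: one measure sliced by one column.
QUERY_TEMPLATE = """EVALUATE
SUMMARIZE(
    '{fact_table}'
    , '{dim_table}'[{column}]
    , "{label}", [{measure}]
)"""


def generate_queries(fact_table, measures, dimensions):
    """Yield one DAX query per (measure, dimension column) pair.

    measures:   {output column label: measure name}
    dimensions: {dimension table name: [column names]}
    """
    columns = [
        (table, column)
        for table, table_columns in dimensions.items()
        for column in table_columns
    ]
    for (label, measure), (dim_table, column) in product(measures.items(), columns):
        yield QUERY_TEMPLATE.format(
            fact_table=fact_table,
            dim_table=dim_table,
            column=column,
            label=label,
            measure=measure,
        )


# The simple scenario: one measure sliced by four Date columns.
queries = list(
    generate_queries(
        "Internet Sales",
        {"Internet Sales": "Internet Total Sales"},
        {"Date": ["Calendar Year", "Month Name", "Year-Month", "Date"]},
    )
)
print(len(queries))
print(queries[0])
```

With 10 measures and 4 dimensions of 10 columns each, the same generator yields 400 queries. The sketch does not escape quotes inside table or column names, so it assumes names without them.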

OK, this is how it works…

  • Creating a Linked Server to the SSAS Tabular instance from SQL Server
  • Generating DAX queries using Tabular DMVs
  • Running the queries against the Tabular model and storing the results in a SQL Server temp table
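As a rough sketch of the last step, a DAX query can be wrapped in a T-SQL statement that runs it through the linked server with OPENQUERY and stores the result with SELECT ... INTO. The Python helper below only builds that statement as text. The linked server name `TABULAR_LINKED`, the temp table name `#DaxResults`, and the `wrap_in_openquery` function are assumptions for illustration, not necessarily what the full post uses.

```python
def wrap_in_openquery(dax_query, linked_server, target_table):
    """Build a T-SQL statement that runs a DAX query via a linked server
    and stores the result set in the given (temp) table."""
    # The DAX text sits inside a T-SQL string literal, so every single
    # quote (for example around 'Internet Sales') must be doubled.
    escaped = dax_query.replace("'", "''")
    return (
        "SELECT *\n"
        f"INTO {target_table}\n"
        f"FROM OPENQUERY([{linked_server}], '{escaped}');"
    )


dax = (
    "EVALUATE SUMMARIZE('Internet Sales', 'Date'[Calendar Year], "
    '"Internet Sales", [Internet Total Sales])'
)

# Hypothetical linked server and temp table names.
statement = wrap_in_openquery(dax, "TABULAR_LINKED", "#DaxResults")
print(statement)
```

The quote doubling is the detail that is easiest to miss, because DAX table references use the same single quotes that delimit the T-SQL string.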
