Datatype Conversion in Power Query Affects Data Modeling in Power BI

Datatype Conversion in Power Query Affects Data Modeling in Power BI

In my consulting experience working with customers using Power BI, many challenges that Power BI developers face are due to negligence to data types. Here are some common challenges that are the direct or indirect results of inappropriate data types and data type conversion:

  • Getting incorrect results while all calculations in your data model are correct.
  • Poor performing data model.
  • Bloated model size.
  • Difficulties in configuring user-defined aggregations (agg awareness).
  • Difficulties in setting up incremental data refresh.
  • Getting blank visuals after the first data refresh in Power BI service.

In this blogpost, I explain the common pitfalls to prevent future challenges that can be time-consuming to identify and fix.

Background

Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is indeed a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to be able to work with Power BI: Power Query (M) and DAX. The purpose of the two languages is quite different. We use Power Query for data transformation and data preparation, while DAX is used for data analysis in the Tabular data model. Here is the point, the two languages in Power BI have different data types.

The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don’t. For instance, SQL Server has its own data types, but CSV doesn’t. When the data source has data types, the mashup engine tries to identify data types to the closest data type available in Power Query. Even though the source system has data types, the data types might not be compatible with Power Query data types. For the data sources that do not support data types, the matchup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. But, there is no guarantee that the detected data types are correct. So, it is best practice to validate the detected data types anyway.

Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.

The following table shows Power Query’s datatypes, their representations in the Power Query Editor’s UI, their mapping data types in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

Power Query and DAX (data model) data type mapping
Power Query and DAX (data model) data type mapping

As the above table shows, in Power Query’s UI, Whole Number, Decimal, Fixed Decimal and Percentage are all in type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.

Continue reading “Datatype Conversion in Power Query Affects Data Modeling in Power BI”

Endorsement in Power BI, Part 2, How to Endorse?

Endorsement in Power BI, Part 2, How to Endorse?

In the previous post I explained the basic concepts around endorsement in Power BI. We discussed that users’ ability to collaborate in creating and sharing artifacts is one of the key aspects of users’ experience in Power BI. But it would be hard, if not impossible, to identify the quality of the artifact without a mechanism to identify the artifact’s quality in large organisations. Endorsement is the answer to this challenge. We discussed the following in the previous post:

In this post, I explain the following:

How do Power BI administrators enable certification and grant rights to security groups?

In the previous post, we discussed that a Power BI administrator must enable certification and grant sufficient rights to the security groups. Therefore, all members of the specified security group are authorised to certify the artifacts. If you are a Power BI administrator, follow these steps to do so:

  1. After logging into Power BI Service, click the Settings button
  2. Click Admin Portal
  3. From the Tenant settings, scroll down to find the Export and sharing settings
  4. Find and expand the Certification setting
  5. Enable certification
  6. Put the certification process documentation URL (if any)
  7. It is not recommended to enable this feature for the entire organisation. So, select the Specific security groups option
  8. Type the security group name and select it from the list
  9. Click the Apply button

The following image shows the above steps:

Enabling certification from the Admin Portal in Power BI Service
Enabling certification from the Admin Portal in Power BI Service

It may take up to 15 minutes for the changes to go through. After that, all the members of the specified security can certify the artifacts. In the next section, we see how to certify the supported artifacts.

Note

Everyone who has “write” permission on the Workspace containing the artifact can promote it. Therefore, the users or security groups with one of the AdminMember, or Contributor roles in the Workspace can promote the artifacts.

However, one should not promote the artifacts just because he/she can. The organisations usually have a promotion process to follow, but the boundaries around promoting are often much more relaxed than certifying it.

Continue reading “Endorsement in Power BI, Part 2, How to Endorse?”

Endorsement in Power BI, Part 1, The Basics

Content Endorsement in Power BI, Part 1, The Basics

As you may already know, Power BI is not a report-authoring tool only. Indeed, it is much more than that. Power BI is an all-around data platform supporting many aspects you’d expect from such a platform. You can ingest the data from various data sources, transform it, model it, visualise and share it with others. Read more about what Power BI is here.

One of the key aspects of users’ experience in Power BI is their ability to collaborate in creating and sharing artifacts, making it an easy-to-use and convenient platform. But the convenience comes with the cost of having a lot of shared artifacts in large organisations raising concerns about the artifact’s quality and trustworthiness. It would be hard, if not impossible, to identify the quality of the artifacts without a mechanism to identify the quality of the artifacts. Endorsement is the answer to this.

In this series of blog posts, I answer the following questions:

But before we start, we need to know what content means in Power BI.

What does Content Mean in Power BI?

Update:
Microsoft lately updated the “Content” terminology, which is slightly different from when I wrote this blog. So I replaced content with artifact that is a more generic term. While the term content is not relevant to the topic anymore, I decided to keep this section explaining what content means in Power BI.

When we use the term Content in the context of Power BI, we refer to the artifacts related to visuals in Power BI Service. We currently have the following artifacts in Power BI:

From those artifacts, the Reports, Dashboards and Apps are Contents.

Continue reading “Endorsement in Power BI, Part 1, The Basics”

Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

I explained what SCD means in a Business Intelligence solution in my previous post. We also discussed that while we do not expect to handle SCD2 in a Power BI implementation, we can handle scenarios similar to SCD1. In this post, I explain how to do so.

Scenario

We have a retail company selling products. The company releases the list of products in Excel format, including list price and dealer price, every year. The product list is released on the first day of July when the financial year starts. We have to implement a Power BI solution that keeps the latest product data to analyse the sales transactions. The following image shows the Product list for 2013:

Products List 2013 in Excel
Products List 2013

So each year, we receive a similar Excel file to the above image. The files are stored on a SharePoint Online site.

Scenario Explained

As the previous post explains, an SCD1 always keeps the current data by updating the old data with the new data. So an ETL process reads the data from the source, identifies the existing data in the destination table, inserts the new rows to the destination, updates the existing rows, and deletes the removed rows.

Here is why our scenario is similar to SCD1, with one exception:

  • We do not actually update the data in the Excel files and do not create an ETL process to read the data from the Excel files, identify the changes and apply the changes to an intermediary Excel file
  • We must read the data from the source Excel files, keep the latest data while filtering out the old ones and load the data into the data model.

As you see, while we are taking a very different implementation approach, the results are very similar with an exception: we do not delete any rows.

Implementation

Here is what we should do to achieve the goal:

  • We get the data in Power Query Editor using the SharePoint Folder connector
  • We combite the files
  • We use the ProductNumber column to identify the duplicated products
  • We use the Reporting Date column to identify the latest dates
  • We only keep the latest rows

Getting Data from SharePoint Online Folder

As we get the data from multiple files stored on SharePoint Online, we have to use the SharePoint Folder connector. Follow these steps:

  1. Login to SharePoint Online and navigate to the site holding the Product list Excel files and copy the site URL from the browser
Getting SharePoint Online Site URL
Getting SharePoint Online Site URL
  1. From the Get Data in the Power BI Desktop, select the SharePoint Folder connector
  2. Click Connect
Connecting to SharePoint Online Folder from Power BI
Connecting to SharePoint Online Folder from Power BI
  1. Paste the Site URL copied on step 1
  2. Click OK
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
  1. Click Transform Data
Transforming data in Power Query Editor
Transforming data in Power Query Editor
Continue reading “Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1”