Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

I explained what SCD means in a Business Intelligence solution in my previous post. We also discussed that while we do not expect to handle SCD2 in a Power BI implementation, we can handle scenarios similar to SCD1. In this post, I explain how to do so.

Scenario

We have a retail company selling products. The company releases the list of products in Excel format, including list price and dealer price, every year. The product list is released on the first day of July when the financial year starts. We have to implement a Power BI solution that keeps the latest product data to analyse the sales transactions. The following image shows the Product list for 2013:

Products List 2013 in Excel
Products List 2013

So each year, we receive a similar Excel file to the above image. The files are stored on a SharePoint Online site.

Scenario Explained

As the previous post explains, an SCD1 always keeps the current data by updating the old data with the new data. So an ETL process reads the data from the source, identifies the existing data in the destination table, inserts the new rows to the destination, updates the existing rows, and deletes the removed rows.

Here is why our scenario is similar to SCD1, with one exception:

  • We do not actually update the data in the Excel files and do not create an ETL process to read the data from the Excel files, identify the changes and apply the changes to an intermediary Excel file
  • We must read the data from the source Excel files, keep the latest data while filtering out the old ones and load the data into the data model.

As you see, while we are taking a very different implementation approach, the results are very similar with an exception: we do not delete any rows.

Implementation

Here is what we should do to achieve the goal:

  • We get the data in Power Query Editor using the SharePoint Folder connector
  • We combite the files
  • We use the ProductNumber column to identify the duplicated products
  • We use the Reporting Date column to identify the latest dates
  • We only keep the latest rows

Getting Data from SharePoint Online Folder

As we get the data from multiple files stored on SharePoint Online, we have to use the SharePoint Folder connector. Follow these steps:

  1. Login to SharePoint Online and navigate to the site holding the Product list Excel files and copy the site URL from the browser
Getting SharePoint Online Site URL
Getting SharePoint Online Site URL
  1. From the Get Data in the Power BI Desktop, select the SharePoint Folder connector
  2. Click Connect
Connecting to SharePoint Online Folder from Power BI
Connecting to SharePoint Online Folder from Power BI
  1. Paste the Site URL copied on step 1
  2. Click OK
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
  1. Click Transform Data
Transforming data in Power Query Editor
Transforming data in Power Query Editor
Continue reading “Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1”

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD

Slowly changing dimension (SCD) is a data warehousing concept coined by the amazing Ralph Kimball. The SCD concept deals with moving a specific set of data from one state to another. Imagine a human resources (HR) system having an Employee table. As the following image shows, Stephen Jiang is a Sales Manager having ten sales representatives in his team:

SCD in Power BI, Stephen Jiang is the sales manager of a team of 10 sales representatives
Image 1: Stephen Jiang is the sales manager of a team of 10 sales representatives

Today, Stephen Jiang got his promotion to the Vice President of Sales role, so his team has grown in size from 10 to 17. Stephen is the same person, but his role is now changed, as shown in the following image:

SCD in Power BI, Stephen's team after he was promoted to Vice President of Sales
Image 2: Stephen’s team after he was promoted to Vice President of Sales

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options to deal with the data depending on the business requirements, leading us to different types of SDCs. It is crucial to note that the data changes in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via ETL (Extract, Transform, and Load) processes and land it in a data warehouse, where the SCD concept kicks in. SCD is about how changes in the source systems reflect the data in the data warehouse. These kinds of changes in the source system do not happen very often hence the term slowly changing. Many SCD types have been developed over the years, which is out of the scope of this post, but for your reference, we cover the first three types as follows.

SCD type zero (SCD 0)

With this type of SCD, we ignore all changes in a dimension. So, when a person’s residential address changes in the source system (an HR system, in our example), we do not change the landing dimension in our data warehouse. In other words, we ignore the changes within the data source. SCD 0 is also referred to as fixed dimensions.

Continue reading “Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD”

Thin Reports, Real-world Challenges

Power BI Thin Reports, Real-world Challenges

I previously explained in a blog post what thin reports are and why we should care about them. I also explained Report Level Measures in another blog post. In this post, I try to raise some real-world challenges we face when developing thin reports. I also provide a solution to those challenges.

Report Level Measure Related Challenges

Creating and using Report Level Measures is relatively easy, but there are some challenges that we face from time to time, such as:

  • Distinguishing Report Level Measures from Dataset Level Measures
  • Report Level Measure dependencies

Determining Report Level Measures from Dataset Level Measures

One of the challenges that Power BI Developers face is creating many report level measures. Unfortunately, Power BI Desktop currently uses the same iconography for both types of measures, making it hard to distinguish the actual measures created within the dataset from the report level measures. It gets even more challenging if we need to write technical documentation for an existing thin report. We have to open the PBIX file of the thin report in the Power BI Desktop and click every single measure. If the expression bar appears, the selected measure is a report level measure; otherwise, it is a dataset level measure.

So unless we use third-party tools, which I explain in this post, we must go through the manual process.

Report Level Measure dependencies

Another pain point related to the previous challenge is finding the dependencies between the report level measures. It is crucial to be aware of the interdependencies when doing impact analysis. We need to understand how a change in a report level measure impacts other report level measures. Again, Power BI Desktop does not currently have any options supporting that, so we have to click every measure and read through the DAX expressions to identify the dependencies or use the third-party tools to save development time.

Dataset and Thin Reports Dependency Challenges

The other challenges are even more difficult to overcome relate to interdependencies between datasets and thin reports. Power BI Service provides a lineage view that shows the dependencies between a dataset and its connected thin reports. But the challenges can get more complex to overcome manually. The following are some real-world examples of more complex situations:

  • What if we need to analyse the impact of changes in a dataset measure on all report level measures of the connected thin reports?
  • How do we analyse the impact of changes on a dataset measure on all connected thin reports, including the visuals, filters, etc…?
  • What if we need to tune the performance and we want to find a list of all unused tables or unused fields?

As you can see, the situation can get pretty complex, so manual operations are virtually impossible.

But there is a third party tool we can use which provides heaps of capabilities with a couple of clicks.

Continue reading “Thin Reports, Real-world Challenges”

Thin Reports, Report Level Measures vs Data Model Measures

Thin Reports, Report Level Measures vs Data Model Measures

The previous post explained what Thin reports are, why we should care and how we can create them. This post focuses on a more specific topic, Report Level Measures. We discuss what report-level measures are, when and why we need them and how we create them.

If you are not sure what Thin Report means, I suggest you check out my previous blog post before reading this one.

What are report level measures?

Report level measures are the measures created by the report writers within a Thin Report. Hence, the report level measures are available within the hosting Thin Report only. In other words, the report level measures are locally available within the containing report only. These measures are not written back to the underlying dataset, hence not available to any other reports.

In contrast, the data model measures, are the measures created by data modellers and appear on the dataset level and are independent from the reports.

Why and when do we need report level measures?

It is a common situation in real-world scenarios when the business requires a report urgently, but the nuts and bolts of the report are not being created on the underlying dataset yet. For instance, the business requires to present a report to the board showing year-to-date sales analysis but the year-to-date sales measure hasn’t been created in the dataset yet. The business analyst approaches the Power BI developers to add the measure, but they are under the pump to deliver some other functionalities which adding a new measure is not even in their project delivery plan. It is perhaps too late if we wait for the developers to plan for creating the required measure, go through the release process, and make it available for us in the dataset. Here is when the report level measures come to the rescue. We can simply create the missing measure in the Thin Report itself, where we can later share it with the developers to implement it as a dataset measure.

Continue reading “Thin Reports, Report Level Measures vs Data Model Measures”