Microsoft Fabric: A SaaS Analytics Platform for the Era of AI


Microsoft Fabric is a new and unified analytics platform in the cloud that integrates various data and analytics services, such as Azure Data Factory, Azure Synapse Analytics, and Power BI, into a single product that covers everything from data movement to data science, real-time analytics, and business intelligence. Microsoft Fabric is built upon the well-known Power BI platform, which provides industry-leading visualization and AI-driven analytics that enable business analysts and users to gain insights from data.

Basic concepts

On May 23rd, 2023, Microsoft announced a new product called Microsoft Fabric at the Microsoft Build conference. Microsoft Fabric is a SaaS Analytics Platform that covers end-to-end business requirements. As mentioned earlier, it is built upon the Power BI platform and extends the capabilities of Azure Synapse Analytics to all analytics workloads. This means that Microsoft Fabric is an enterprise-grade analytics platform. But wait, let’s first see what a SaaS Analytics Platform actually means.

What is an analytics platform?

An analytics platform is a comprehensive software solution designed to facilitate data analysis to enable organisations to derive meaningful insights from their data. It typically combines various tools, technologies, and frameworks to streamline the entire analytics lifecycle, from data ingestion and processing to visualisation and reporting. Here are some key characteristics you would expect to find in an analytics platform:

  1. Data Integration: The platform should support integrating data from multiple sources, such as databases, data warehouses, APIs, and streaming platforms. It should provide capabilities for data ingestion, extraction, transformation, and loading (ETL) to ensure a smooth flow of data into the analytics ecosystem, as illustrated in the sketch after this list.
  2. Data Storage and Management: An analytics platform needs to have a robust and scalable data storage infrastructure. This could include data lakes, data warehouses, or a combination of both. It should also support data governance practices, including data quality management, metadata management, and data security.
  3. Data Processing and Transformation: The platform should offer tools and frameworks for processing and transforming raw data into a usable format. This may involve data cleaning, denormalisation, enrichment, aggregation, or advanced analytics on large data volumes, including streaming IoT (Internet of Things) data. Handling large volumes of data efficiently is crucial for performance and scalability.
  4. Analytics and Visualisation: A core aspect of an analytics platform is its ability to perform advanced analytics on the data. This includes providing a wide range of analytical capabilities, such as descriptive, diagnostic, predictive, and prescriptive analytics with ML (Machine Learning) and AI (Artificial Intelligence) algorithms. Additionally, the platform should offer interactive visualisation tools to present insights in a clear and intuitive manner, enabling users to explore data and generate reports easily.
  5. Scalability and Performance: Analytics platforms need to be scalable to handle increasing volumes of data and user demands. They should have the ability to scale horizontally or vertically. High-performance processing engines and optimised algorithms are essential to ensure efficient data processing and analysis.
  6. Collaboration and Sharing: An analytics platform should facilitate collaboration among data analysts, data scientists, and business users. It should provide features for sharing data assets, analytics models, and insights across teams. Collaboration features may include data annotations, commenting, sharing dashboards, and collaborative workflows.
  7. Data Security and Governance: As data privacy and compliance become increasingly important, an analytics platform must have robust security measures in place. This includes access controls, encryption, auditing, and compliance with relevant regulations such as GDPR or HIPAA. Data governance features, such as data lineage, data cataloging, and policy enforcement, are also crucial for maintaining data integrity and compliance.
  8. Flexibility and Extensibility: An ideal analytics platform should be flexible and extensible to accommodate evolving business needs and technological advancements. It should support integration with third-party tools, frameworks, and libraries to leverage additional functionality.
  9. Ease of Use: Usability plays a significant role in an analytics platform’s adoption and effectiveness. It should have an intuitive user interface and provide user-friendly tools for data exploration, analysis, and visualisation. Self-service capabilities empower business users to access and analyse data without heavy reliance on IT or data specialists.
These characteristics collectively enable organisations to harness the power of data and make data-driven decisions. An effective analytics platform helps unlock insights, identify patterns, discover trends, and drive innovation across various domains and industries.
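To make the data integration and processing characteristics above a little more concrete, here is a minimal ETL sketch in Python using pandas. Everything in it, the file names, column names, and cleaning rules, is an illustrative assumption rather than part of any particular platform:

import pandas as pd

# Extract: ingest raw data from a source system
# (the file name and columns below are assumptions for this sketch).
raw_sales = pd.read_csv("raw_sales.csv")  # columns: order_id, customer_id, amount, order_date

# Transform: apply basic data quality and enrichment rules.
raw_sales = raw_sales.dropna(subset=["order_id", "amount"])        # drop incomplete records
raw_sales["order_date"] = pd.to_datetime(raw_sales["order_date"])  # normalise data types
raw_sales["order_year"] = raw_sales["order_date"].dt.year          # enrich for analysis

# Aggregate: a simple descriptive analytics step.
yearly_sales = raw_sales.groupby("order_year", as_index=False)["amount"].sum()

# Load: land the result in the analytics store; a Parquet file
# stands in for a data lake or data warehouse table here.
yearly_sales.to_parquet("yearly_sales.parquet", index=False)

Real platforms wrap these extract, transform, and load steps in managed, scheduled pipelines, but the flow of data is essentially the same.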

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD

Slowly changing dimension (SCD) is a data warehousing concept coined by the amazing Ralph Kimball. The SCD concept deals with how a specific set of data changes from one state to another over time. Imagine a human resources (HR) system with an Employee table. As the following image shows, Stephen Jiang is a Sales Manager with ten sales representatives on his team:

Image 1: Stephen Jiang is the sales manager of a team of 10 sales representatives

Today, Stephen Jiang got his promotion to the Vice President of Sales role, so his team has grown in size from 10 to 17. Stephen is the same person, but his role is now changed, as shown in the following image:

SCD in Power BI, Stephen's team after he was promoted to Vice President of Sales
Image 2: Stephen’s team after he was promoted to Vice President of Sales

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options to deal with the data depending on the business requirements, leading us to the different types of SCDs. It is crucial to note that the data changes happen in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via ETL (Extract, Transform, and Load) processes and land it in a data warehouse, which is where the SCD concept kicks in. SCD is about how changes in the source systems are reflected in the data warehouse. These kinds of changes in the source system do not happen very often, hence the term slowly changing. Many SCD types have been developed over the years; covering them all is out of the scope of this post, but for your reference, we cover the first three types as follows.

SCD type zero (SCD 0)

With this type of SCD, we ignore all changes in a dimension. So, when a person’s residential address changes in the source system (an HR system, in our example), we do not change the landing dimension in our data warehouse. In other words, we ignore the changes within the data source. SCD 0 is also referred to as fixed dimensions.
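As a simple illustration, the following Python sketch (using pandas, with made-up table and column names) shows SCD 0 behaviour during a dimension load: changes to existing members are ignored, and only brand-new members are appended:

import pandas as pd

# The customer dimension as it currently stands in the data warehouse.
dim_customer = pd.DataFrame({
    "customer_id": [1, 2],
    "address": ["12 Oak St", "3 Elm Ave"],
})

# The incoming extract from the source system: customer 2 has moved,
# and customer 3 is brand new.
source = pd.DataFrame({
    "customer_id": [2, 3],
    "address": ["99 Pine Rd", "7 Birch Ln"],
})

# SCD 0: ignore all changes to existing members and only append
# members we have never seen before.
new_members = source[~source["customer_id"].isin(dim_customer["customer_id"])]
dim_customer = pd.concat([dim_customer, new_members], ignore_index=True)

print(dim_customer)
# Customer 2 keeps "3 Elm Ave" (the change is ignored); customer 3 is added.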


Thin Reports, Report Level Measures vs Data Model Measures


The previous post explained what Thin Reports are, why we should care, and how we can create them. This post focuses on a more specific topic, report level measures. We discuss what report level measures are, when and why we need them, and how we create them.

If you are not sure what Thin Report means, I suggest you check out my previous blog post before reading this one.

What are report level measures?

Report level measures are the measures created by report writers within a Thin Report. Hence, report level measures are only available within the hosting report; they are not written back to the underlying dataset and are therefore not available to any other report.

In contrast, data model measures are the measures created by data modellers at the dataset level, independently of any report.

Why and when do we need report level measures?

It is a common situation in real-world scenarios: the business requires a report urgently, but the nuts and bolts of the report have not been created in the underlying dataset yet. For instance, the business needs to present a year-to-date sales analysis to the board, but the year-to-date sales measure hasn’t been created in the dataset. The business analyst asks the Power BI developers to add the measure, but they are under the pump delivering other functionality, and adding a new measure is not even in their project delivery plan. If we wait for the developers to plan for the required measure, go through the release process, and make it available in the dataset, it may well be too late. This is when report level measures come to the rescue: we can simply create the missing measure in the Thin Report itself and later share it with the developers to implement as a dataset measure.
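A report level measure itself is a DAX expression, created in the thin report exactly as a data model measure would be. Purely to illustrate the calculation such a year-to-date measure performs, here is a small pandas sketch with made-up column names; the real measure would be written in DAX (for example, a TOTALYTD-style expression), not in Python:

import pandas as pd

# An illustrative sales table (the column names are assumptions).
sales = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2023-01-15", "2023-02-10", "2023-03-05",
        "2024-01-20", "2024-02-14",
    ]),
    "amount": [100.0, 250.0, 80.0, 300.0, 120.0],
})

# Year-to-date sales: within each calendar year, accumulate the amounts
# in date order, which is what a YTD measure evaluates for each date.
sales = sales.sort_values("order_date")
sales["ytd_sales"] = sales.groupby(sales["order_date"].dt.year)["amount"].cumsum()

print(sales)
# Each row now shows cumulative sales from the start of its year to that date.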


Thin Reports, What Are They, Why Should I Care and How Can I Create Them?


Shared Datasets have been around for quite a while now. In June 2019, Microsoft announced a new feature called Shared and Certified Datasets with the mindset of supporting enterprise-grade BI within the Power BI ecosystem. In essence, the shared dataset feature allows organisations to have a single source of truth across the organisation serving many reports.

A Thin Report is a report that connects to an existing dataset on Power BI Service using the Connect Live connectivity mode. So, we basically have multiple reports connected to a single dataset. Now that we know what a thin report is, let’s see why it is best practice to follow this approach.

Prior to the Shared and Certified Datasets announcement, we used to create separate reports in Power BI Desktop and publish those reports into Power BI Service. This approach had many disadvantages, such as:

  • Having many disparate islands of data instead of a single source of truth.
  • Consuming more storage on Power BI Service by having repetitive tables across many datasets.
  • Reducing collaboration between data modellers and report creators (contributors) as Power BI Desktop is not a multi-user application.
  • Reports were tightly coupled to their underlying dataset, so it was hard, if not totally impossible, to decouple a report from one dataset and connect it to a different one. This was quite restrictive for developers wanting to follow a Dev/Test/Prod approach.
  • If we had a fairly large report with many pages, say more than 20 pages, it was almost impossible to break the report down into smaller, more business-centric reports.
  • Putting too much load on the data sources connected to many disparate datasets. The situation got even worse when we scheduled multiple refreshes a day. In some cases, the data refresh process put exclusive locks on the source system, which could potentially cause many issues down the road.
  • Having many datasets and reports made it harder and more expensive to maintain the solution.

In my previous blog, I explained the different components of a Business Intelligence solution and how they map to the Power BI ecosystem. In that post, I mentioned that Power BI Service Datasets map to the Semantic Layer of a Business Intelligence solution. So, when we create a Power BI report in Power BI Desktop and publish it to the Power BI Service, we are creating a semantic layer and a report connected to it, all at once. By creating many disparate reports in Power BI Desktop and publishing them to the Power BI Service, we are indeed creating many semantic layers with many repeated tables on top of our data, which does not make much sense.

On the other hand, having a few shared datasets with many connected thin reports makes a lot of sense. This approach addresses all the disadvantages of the previous development method; in addition, it reduces confusion for report writers about which dataset to connect to, it helps with storage management in the Power BI Service, and it makes it easier to comply with security and privacy requirements.
