What is Microsoft Fabric? Part 3: Performance

25.01.2024 Pinja Blog Business Intelligence


You can read the other parts of the series via the links below:
What is Microsoft Fabric? Part 1: From the past to the present
What is Microsoft Fabric? Part 2: Technology
What is Microsoft Fabric? Part 4: Licenses
What is Microsoft Fabric? Part 5: Fabric is Microsoft’s favorite child and it is developed rapidly
What is Microsoft Fabric? Part 6: Lakehouse integrates artificial intelligence and combines disparate data

In this part, we focus on the performance of Microsoft Fabric.

The technical performance of the data platform is an important part of data management and affects every aspect of the data product. If data transfer or processing is slow, Power BI reports are updated more slowly and, in the worst case, additional costs may arise. On a slow data platform, real-time data, or data that is updated several times a day, is something you can only dream about.

Fabric allows you to improve performance by, for example, shortening the data pipeline. This is a good and cost-effective approach, as it does not require more computing power. In Fabric, the data pipeline is shortened because data is not duplicated in OneLake and because Power BI reports can use the Direct Lake feature.

Because Fabric’s OneLake does not physically replicate data across the layers of the data warehouse, data moves through the data warehouse stages (e.g. Bronze, Silver, Gold) faster than in traditional data warehouses. In addition, external data stores can be made available through simple shortcuts, so data does not necessarily have to be transferred at all.

With Power BI’s Direct Lake feature, there is no need to load data from the Fabric data warehouse (Lakehouse or Warehouse) into the Power BI service on a schedule; instead, the required data is read from the data warehouse when the report is run. In this sense, Direct Lake works like the DirectQuery and Live connection modes already familiar from Power BI. However, the performance of Direct Lake is closer to that of a dataset loaded in Import mode than to that of a dataset/semantic model based on DirectQuery.

In addition to shortening the data pipeline, Direct Lake makes it possible to report on tables with very large numbers of rows and columns. As a test, we ran a report on a view with 100 columns and 100 million rows through Direct Lake and were impressed by the performance. However, better performance does not eliminate the need for smart planning.

The capacity of the Fabric environment is like a Tesla battery

The capacity of a Fabric environment can be selected from 11 options between F2 and F2048. Each level doubles the capacity of the previous one: F2 -> F4 -> F8 and so on. What does capacity mean? The number in the SKU name is the number of capacity units (CUs), and consumption is measured in capacity unit seconds (CU-seconds). CU-seconds can be thought of as “computing time”, or as the charge in a Tesla battery. How much computing time the environment needs per day helps determine the right capacity.
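To make the “computing time” idea concrete, here is a minimal sketch that lists the 11 F SKUs and the CU-seconds each could consume in a day when running at exactly its purchased level. It assumes only that an F&lt;n&gt; SKU provides n CUs, i.e. n CU-seconds per second of wall-clock time; the function names are hypothetical, and bursting and smoothing are ignored.

```python
# Minimal sketch: daily CU-second budget of each Fabric F SKU.
# Assumption: SKU "F<n>" provides n capacity units (CUs), i.e. it can
# consume n CU-seconds for every second of wall-clock time.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def fabric_skus() -> list[str]:
    """Return the 11 F SKUs from F2 to F2048, doubling at each step."""
    return [f"F{2 ** k}" for k in range(1, 12)]


def cu_seconds_per_day(sku: str) -> int:
    """CU-seconds the capacity can consume in a day without any overage."""
    capacity_units = int(sku.lstrip("F"))
    return capacity_units * SECONDS_PER_DAY


for sku in fabric_skus():
    print(f"{sku:>6}: {cu_seconds_per_day(sku):>12,} CU-seconds/day")
```

Comparing the daily CU-second need of your workloads against such a budget is one rough way to shortlist a capacity; the Fabric Capacity Metrics app gives the actual consumption figures.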

Smaller capacities can be thought of as Tesla’s Plaid models. In Ludicrous mode, the car accelerates like the fastest hypercars, but the battery cannot sustain extreme acceleration constantly. Likewise, even the smallest F2 capacity can deliver high processing power momentarily, but its CU-seconds are not sufficient for the continuous processing needs of a large environment. Fabric’s bursting and smoothing features make this higher-than-purchased computing power possible. In Microsoft’s example, bursting lets you use four times the capacity, 256 CUs instead of the 64 CUs you bought, reducing processing time to a quarter. (Bursting is automatic: the extra power is borrowed against your capacity’s CU-seconds.)
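The arithmetic behind Microsoft’s bursting example can be sketched as follows. The sketch assumes that a job’s total cost in CU-seconds is fixed, so bursting changes only how fast those CU-seconds are spent, not how many; the one-hour job is a hypothetical figure chosen for illustration.

```python
# Sketch of the bursting arithmetic: an F64 job bursting to 4x (256 CUs).
# Assumption: the job costs a fixed number of CU-seconds regardless of
# how many CUs are used to run it.

def run_time_seconds(job_cu_seconds: float, cus_in_use: float) -> float:
    """Wall-clock time needed to finish a job at a given compute level."""
    return job_cu_seconds / cus_in_use


PURCHASED_CUS = 64
BURST_FACTOR = 4
JOB_CU_SECONDS = 64 * 3600  # hypothetical job: one hour on F64 without bursting

normal = run_time_seconds(JOB_CU_SECONDS, PURCHASED_CUS)
burst = run_time_seconds(JOB_CU_SECONDS, PURCHASED_CUS * BURST_FACTOR)

print(f"Without bursting: {normal / 60:.0f} min")
print(f"With 4x bursting: {burst / 60:.0f} min")
```

The job finishes in a quarter of the time, but the same CU-seconds are consumed, and the part above the purchased 64 CUs has to be paid back later, which is where smoothing comes in.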

Just as you can’t accelerate constantly in Ludicrous mode with a Tesla, you can’t constantly run Fabric at power levels that exceed your capacity. The smoothing feature spreads consumption over time and keeps track of overruns. If consumption exceeds the purchased capacity for less than 10 minutes, performance is not limited. If the overrun lasts more than 10 minutes, performance is throttled and, in the worst case, even scheduled processes are not run. At that point the electric rocket’s battery is running low, and you have to stop and recharge it. The Microsoft Fabric Capacity Metrics app helps you choose the right size for your Fabric environment: it lets you monitor the environment’s capacity consumption and any throttling.
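The 10-minute rule can be illustrated with a simplified, hypothetical model. It expresses the accumulated overage as minutes of future capacity needed to pay it back and maps that to a throttling stage. The stages beyond the 10-minute limit follow Microsoft’s throttling documentation as we understand it at the time of writing and may change; the model also ignores the smoothing windows themselves, so treat it as a sketch rather than Fabric’s actual algorithm.

```python
# Simplified, hypothetical model of Fabric overage handling.
# Overage = CU-seconds consumed above the purchased capacity, expressed
# as minutes of *future* capacity needed to pay it back.
# Assumption: stage thresholds (10 min, 60 min, 24 h) as documented by
# Microsoft at the time of writing; check current docs before relying on them.

def overage_minutes(overage_cu_seconds: float, capacity_cus: int) -> float:
    """How many minutes of full capacity the overage would use up."""
    return overage_cu_seconds / (capacity_cus * 60)


def throttling_stage(minutes: float) -> str:
    """Map minutes of future capacity to a throttling stage."""
    if minutes <= 10:
        return "no throttling (overage protection)"
    if minutes <= 60:
        return "interactive requests delayed"
    if minutes <= 24 * 60:
        return "interactive requests rejected"
    return "background jobs rejected"


# Example: an F64 capacity bursting to 256 CUs, i.e. 192 CUs above its
# purchased level, for 3 and for 5 minutes.
for burst_minutes in (3, 5):
    overage = 192 * burst_minutes * 60
    mins = overage_minutes(overage, capacity_cus=64)
    print(f"{burst_minutes} min burst -> {mins:.0f} min of future capacity: "
          f"{throttling_stage(mins)}")
```

In this model, a 4x burst uses up future capacity three times faster than real time, which is why even a short burst can cross the 10-minute limit.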

Fabric performance test results

So how does the performance of Microsoft Fabric compare to, say, Synapse or Azure SQL? That’s what we found out with our performance test.

We chose the F64 capacity for our Fabric environment. For comparison, we used a traditional Synapse Analytics Warehouse of size DWU500, which at the time of writing costs about the same as Fabric F64 (around $5,500/month). Synapse Analytics Serverless was not included in the comparison, as its usability and features are very limited for lakehouse/data warehouse use. In addition, we added a traditional Azure SQL database with 8 vCores ($1,700/month) and with 24 vCores ($4,700/month). Azure SQL was tested at these two sizes, and there was little difference in data loading speed between them, i.e. loading does not scale with the number of vCores.

Performance was compared on two datasets, one of which was tested in two sizes:

Large demo table (240 million rows, over 100 columns)
NYC Taxi Yellow (73 million rows)
NYC Taxi Yellow (282 million rows)

The large demo table could not be loaded into Synapse at all. We tried loading it with direct inserts via a PolyBase external table and also with Synapse’s own Data Factory. Both attempts eventually failed after about two hours of loading, either because tempdb ran out of space or because of a timeout error. Loading the same data into a Fabric Lakehouse took 14 minutes, which is a very good result for this dataset.

Below is a summary table of the test results:

Test	Fabric (F64)	Synapse (DWU500)	Azure SQL	Description
NYC Taxi Yellow 73m transfer	1 min	5 min	25 min	External Azure Data Lake Storage Gen2 source
Facts + Dimensions query	4 sec	13 sec	1 min	Data model with a fact table and 7 dimensions
Loading a complex view	1 h	1 h 15 min	2 h	A more complex view created from a data model, with CASE WHEN logic. NYC Taxi Yellow duplicated to 280 million rows as the fact table. Transfer from one table to another in the Warehouse.
Loading a complex view into Lakehouse	1 h	n/a	n/a	Loading a complex view from Warehouse to Lakehouse
Loading a complex view inside Lakehouse	3 min	n/a	n/a	Loading a complex view inside Lakehouse
Loading a large table into the data warehouse	14 min	2+ h	3+ h	In Fabric via a shortcut to the Lakehouse. In Synapse, the load often failed with an error, both with Data Factory and via an external table.
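To put the measured times side by side, the following sketch converts the results table into seconds and computes how many times faster Fabric was. Rows marked n/a are left out, and the “2+ h” and “3+ h” values are lower bounds, so those ratios are lower bounds too. The table does not say which Azure SQL size the Azure SQL column refers to; the row labels and the `speedups` helper are our own illustrative names.

```python
# Relative speed of Fabric (F64) vs Synapse (DWU500) and Azure SQL,
# using the times from the results table converted to seconds.

results = {
    # test: (fabric, synapse, azure_sql), all in seconds
    "NYC Taxi Yellow 73m transfer": (60, 5 * 60, 25 * 60),
    "Facts + Dimensions query": (4, 13, 60),
    "Loading a complex view": (3600, 75 * 60, 2 * 3600),
    # Synapse and Azure SQL times are lower bounds ("2+ h", "3+ h")
    "Loading a large table (lower bound)": (14 * 60, 2 * 3600, 3 * 3600),
}


def speedups(fabric: float, synapse: float, sql: float) -> tuple[float, float]:
    """How many times faster Fabric was than Synapse and than Azure SQL."""
    return synapse / fabric, sql / fabric


for test, (fabric, synapse, sql) in results.items():
    vs_synapse, vs_sql = speedups(fabric, synapse, sql)
    print(f"{test}: {vs_synapse:.1f}x vs Synapse, {vs_sql:.1f}x vs Azure SQL")
```

The ratios range from modest (the complex view load) to large (the 73-million-row transfer), which is why single headline numbers say little without the workload behind them.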

Summary of Fabric’s performance

Concrete tests show that Fabric really does offer a lot of value for money. Even at medium capacity, performance is top-notch, and Fabric provides tools that fit virtually any scenario, helping bring your data warehouse project smoothly to the finish line. In addition, the product includes innovative performance-enhancing features such as smoothing, bursting and Direct Lake.

Another strength of Fabric is that all the tools are truly centralized in one portal, so the entire development team and end users are on the same playing field. Pricing is also relatively simple, as Fabric replaces almost all the individual Azure services that would normally be configured as separate entities. Fabric provides almost all the services you need.

Lakehouse/Warehouse thinking allows the development team’s Data Engineers and Data Scientists to use the same powerful tools and the same already cleaned data. This part of the series focused on Fabric’s performance; in the next part, we go into more detail about Fabric’s licensing.

Knowledge management – How to use data more effectively
Microsoft Fabric and AI make it easier for educational institutions to manage data and self-service reporting
What is Microsoft Fabric?
What is Microsoft Fabric? Part 1: From the past to the present
What is Microsoft Fabric? Part 2: Technology
What is Microsoft Fabric? Part 4: Licenses
What is Microsoft Fabric? Part 5: Fabric is Microsoft’s favorite child and it is developed rapidly
What is Microsoft Fabric? Part 6: Lakehouse integrates artificial intelligence and combines disparate data
Pinja’s knowledge management and business intelligence services

Aleksi Rytkönen

I work at Pinja as a data warehouse architect. I design technical solutions that suit customers’ needs and help implement them. I am particularly interested in what happens under the hood of data warehouses. In my free time, I spend time with my family, play video games and exercise.
