When Microsoft introduced Microsoft Fabric, it promised to be an end-to-end analytics and data platform for enterprises looking for a unified solution. While it sparked excitement, it also raised a lot of questions.
Like many in the data community, we wondered: Why is data engineering happening in Power BI? Where should data transformation take place now? As a Principal Architect and Head of Innovation at SDK, my job is to challenge the status quo, find better ways to do things, and provide greater value to our clients. While many of us get caught up in the daily minutiae of development lifecycles and implementations, part of my job is to ask why. As the official asker of why, I had some questions for Fabric: Why should we use this? And, more importantly, can this new platform make our projects easier to deliver?
In 2024, SDK set out to answer these questions head-on. Our goal was to build an accelerator that would allow us to implement Fabric for our clients efficiently and based on best practices. This is our journey.
What Makes Fabric Different?
At first glance, Microsoft Fabric stands out because it builds on the familiar Power BI portal while offering an all-in-one solution that is simple to set up. Although Fabric doesn’t match the full flexibility of traditional Azure implementations, it provides a streamlined platform for both generating and consuming data with ease. In short, it bridges the gap between data professionals and citizen developers, making analytics more accessible and efficient.
From my perspective, here are the four reasons Microsoft Fabric is changing the game of data analytics:
- Seamless Connectivity One of Fabric’s biggest advantages is its built-in connectivity. In traditional Azure implementations, developers must manually configure linked services and then set up connections between each resource. Fabric simplifies this by handling authentication just once. And because it is all one platform, the components seamlessly communicate with one another.
- Storage Flexibility – Read and Write Across Multiple Languages A core principle for Fabric is data accessibility. All data is stored in Delta Lake, which is a leading industry standard for storage. Microsoft Fabric enhances this by making Delta Lake available across multiple languages: PySpark or Spark SQL writes to a Lakehouse, SQL writes to a Warehouse, and KQL writes to an Eventhouse. The game-changer? Fabric uniquely allows reading data across these languages seamlessly. This interoperability makes working with data more intuitive compared to other platforms.
- Tools for Every Skill Level Microsoft Fabric brings together tools for data engineers, analysts, and citizen developers under one roof. Whether you’re a data engineer who uses a notebook as your main tool, a data analyst who uses SQL, or a citizen developer who uses a low-code/no-code approach with dataflows to assist in transformations, Fabric provides all users with the ability to build out standardized patterns to load data.
- Feature Roadmap – Continuous Innovation If you were around in 2015 when the launch of Power BI completely transformed the data landscape, you may be familiar with Microsoft’s rapid development cycle. Microsoft has adopted the same agile approach with Fabric, releasing new features every month. While Fabric initially had some gaps, Microsoft has consistently delivered on its feature roadmap (give or take a month).
How We Built Our Fabric Accelerator
SDK has a strong track record of developing enterprise-ready Accelerators, pre-built frameworks that simplify data implementation, drive operational efficiency, and accelerate cloud and AI adoption using best practices and purpose-built analytic models. Our Accelerators have been successfully deployed at scale across multiple clients.
For SDK’s Fabric Accelerator, dubbed Fabric Manager, we focused on designing an Accelerator that incorporates six key features:
- Configurable, Data-Driven, and Parameterized Pipelines Traditional code-based platforms can become unmanageable when handling large datasets. Instead, we prefer a configuration-driven platform. For example, if you need to extract 100 tables, does that mean you need 100 separate data pipelines? That could quickly become unmanageable. Instead, we build one data pipeline that takes in parameters, with the configuration stored in a Fabric Warehouse (like SQL). With our parameterized pipelines, adding 20 new tables is as simple as updating the configuration.
- Orchestration That Works the Way You Need It Orchestration is the glue that holds it all together. As you know, data engineering workflows require precise sequencing – extract, transform, load. Dimensions are loaded before facts, then the semantic model is run. We used Fabric Workflows (based on Airflow) to manage orchestration. By combining this with our warehouse-stored pipeline configurations, we can configure execution order and concurrency for maximum performance.
- Modular, Maintainable, Object-Oriented Code Historically, data professionals followed a flow-based approach to their coding practices, influenced by the declarative nature of SQL. As they transitioned to Python, this flow-based approach often carried over. In addition, SQL wasn’t originally designed as a class/method language, so developers often needed to duplicate the same code in multiple places. When building Fabric Manager, we applied object-oriented principles, allowing developers to create reusable components, which streamlines maintenance.
- Pre-Built Functions for Faster Development Fabric Manager ships with a wheel file – a packaged code library of key functions – that works much like Excel’s built-in formulas. Just as you don’t need to write an Average function in Excel, Fabric Manager comes equipped with the essential functions for data extraction and transformation, reducing development time and improving reliability.
- Maintainable and Deployable Microsoft continues to add new CI/CD features and REST APIs to Fabric. While Fabric’s Git integration is still maturing, we worked through these challenges to get Fabric Manager into a state of semi-automatic deployment. Code and configurations are stored in Git, ensuring version control.
- Robust Monitoring To troubleshoot and optimize a company’s data estate, organizations need visibility into what pipelines are running, how long it takes, and what data is being moved. Without this key information, cost savings and performance measures are difficult to quantify. SDK’s Fabric Manager logs all data movement for duration, errors, and key metrics. A Power BI report helps the business pinpoint areas for optimization.
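The configuration-driven pipeline pattern described above can be sketched in plain Python. This is an illustrative simplification, not Fabric Manager’s actual code: a list of dictionaries stands in for the configuration table stored in the Fabric Warehouse, and `extract_table` is a hypothetical placeholder for a real extraction step such as a pipeline Copy activity.

```python
def extract_table(source: str, table: str, mode: str) -> str:
    """Placeholder for a real extraction step (e.g. a Copy activity)."""
    return f"extracted {source}.{table} ({mode})"

# In practice this configuration would live in a Warehouse table;
# a list of dicts stands in for it here. Adding a table to the
# pipeline means adding a row, not building a new pipeline.
CONFIG = [
    {"source": "erp", "table": "customers", "mode": "incremental"},
    {"source": "erp", "table": "orders", "mode": "incremental"},
    {"source": "crm", "table": "contacts", "mode": "full"},
]

def run_pipeline(config: list[dict]) -> list[str]:
    """One parameterized pipeline handles every configured table."""
    return [extract_table(**row) for row in config]

results = run_pipeline(CONFIG)
```

The key design point is that the pipeline logic never changes as the table list grows; only the configuration data does.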
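The dependency-ordered orchestration described above (dimensions before facts, then the semantic model) can be illustrated with Python’s standard-library topological sorter. Fabric Manager expresses the same ordering through its Airflow-based Workflow configuration; the task names below are invented for illustration.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it runs.
# Dimensions have no prerequisites, the fact table depends on them,
# and the semantic model refresh runs last.
dependencies = {
    "dim_customer": set(),
    "dim_product": set(),
    "fact_sales": {"dim_customer", "dim_product"},
    "refresh_semantic_model": {"fact_sales"},
}

# static_order() yields a valid execution sequence; tasks with no
# mutual dependency (the two dimensions) could also run concurrently.
order = list(TopologicalSorter(dependencies).static_order())
```

An orchestrator can also use this graph to decide which independent tasks to run in parallel, which is how execution concurrency is tuned for performance.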
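The object-oriented, reusable-component idea can be sketched as follows. This is a hedged example, not Fabric Manager’s actual class hierarchy: shared load logic lives once in a base class, and each source overrides only the part that differs.

```python
from abc import ABC, abstractmethod

class TableLoader(ABC):
    """Reusable base class: common load logic is written once."""

    def __init__(self, table: str):
        self.table = table

    def load(self) -> str:
        rows = self.read()
        return f"loaded {len(rows)} rows into {self.table}"

    @abstractmethod
    def read(self) -> list:
        """Source-specific read, supplied by each subclass."""

class CsvLoader(TableLoader):
    """Only the source-specific step is overridden."""

    def read(self) -> list:
        return ["r1", "r2"]  # stand-in for reading a CSV file

result = CsvLoader("dim_customer").load()
```

Adding a new source type means writing one small subclass rather than copying the whole load routine, which is what keeps maintenance manageable.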
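The monitoring approach – logging duration, errors, and key metrics for every data movement – can be sketched with a simple decorator. The step name and log structure below are assumptions for illustration; in Fabric Manager the equivalent records land in a Warehouse table feeding a Power BI report.

```python
import time

run_log: list[dict] = []

def logged(step):
    """Record duration and status for each pipeline step, even on failure."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            out = step(*args, **kwargs)
            status = "succeeded"
            return out
        except Exception:
            status = "failed"
            raise
        finally:
            run_log.append({
                "step": step.__name__,
                "status": status,
                "duration_s": round(time.perf_counter() - start, 3),
            })
    return wrapper

@logged
def copy_orders():
    return 120  # stand-in for rows copied

copy_orders()
```

Because the `finally` block always runs, failed steps are logged too, which is what makes the log useful for pinpointing errors and optimization targets.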
Lessons Learned: Challenges & Considerations
While Fabric is a powerful platform, we encountered some gotchas throughout our journey. None of these challenges were showstoppers while building Fabric Manager; rather, they are areas we have noted for improvement. The great news is that the Fabric team is listening to the community and adding new features often.
Here are a few gotchas we have noticed when using Fabric:
- Speed Considerations If you’re used to SQL Server, you may initially find Fabric to be slow when first starting a session. However, performance improves as Fabric caches data during a session.
- Security Limitations Fabric offers workspace-level and data-level security. For workspace security, fine-grained permissions are more rigid than some organizations may expect. Data security primarily relies on the SQL endpoint controls, offering column-, row- and object-level permissions. However, managing security without schema organization can be cumbersome. Schema support is currently in preview, but enabling it limits certain security controls.
- SPN Support is Lacking Service principal (SPN) support is lacking on APIs, requiring the use of service or user accounts instead. This creates security and supportability concerns, as service accounts can be exploited by malicious actors. Additionally, if a user account is used and that user leaves, the transition to a new user can be cumbersome.
- Developer Experience Fabric has some gaps as a developer platform. For example, you can’t work on multiple components (a notebook and a data pipeline, say) in the same tab. We tried working locally instead, but the flow wasn’t quite there. The Git integration is roughly 80% complete, with more improvements on the way. Warehouses also lack key SQL features, such as Identity columns and the Merge statement. While workarounds exist, they can be challenging for those unfamiliar with them.
- External Connections and Key Vault Connections outside of Fabric, such as to source systems, cannot be Key Vault-backed. Additionally, these connections are tenant-wide, meaning that when switching between Dev, Test, and Prod environments, all connections must exist within the same tenant. This requires manual updates during deployment, adding extra complexity.
It is important to note that these gotchas are specific to the time of writing. Many of them may not be relevant in the future and are intended only to reflect our past experience.
What’s Next? What Features We’re Excited to Try
Despite some challenges, we’re excited to continue to develop in Fabric and enhance our Fabric Manager accelerator.
Upcoming features we are excited to try next:
- Copy job for easier data movement
- The AI Skill enhancement to the “chat with my data” feature
- The Metrics Layer of Fabric, which will allow users to generate custom models for ad hoc analysis without having to learn an entire semantic model.
Microsoft Fabric is evolving rapidly, and the platform’s development team consistently delivers on its roadmap. We look forward to seeing how these new features address some of the challenges we’ve encountered.
Ursula Pflanz, Principal Architect & Head of Innovation at SDK