Data Lake

Microsoft Build 2023: What businesses need to know
AI, unsurprisingly, gets top billing at Microsoft's Build developers conference this year. But there are some other announcements that enterprises may find more applicable and compelling.

Microsoft’s Build developers conference is happening this week, starting on May 23. This year’s event is “hybrid,” again, with attendees participating in person in Seattle or watching virtually. Unsurprisingly, there are lots of announcements around Microsoft’s Copilot AI assistance technology this week. But there also are other product and strategy reveals that likely will have more near-term and widespread customer impact.

Microsoft wants Build attendees to take away one big thing from the show: Copilots and plug-ins are here! (Except, in reality, they’re not -- other than in Bing and GitHub.)

Microsoft has been pre-announcing a range of Copilot assistants for just about every consumer and commercial product in its line-up since the start of 2023. The business-focused Microsoft Copilots, which Microsoft now officially defines as “applications that use modern AI and large language models to assist you with a complex cognitive task,” are in private preview for a (very) select few at this point. Microsoft still hasn’t said when they’ll be commercially available, how they’ll be licensed, or how much they’ll cost to use.

At Build, officials will continue to tout the productivity benefits of Copilots and the company’s decision to back the same plug-in standard that OpenAI is using for ChatGPT. The plug-ins we’ve seen demoed to date are almost all consumer ones, such as OpenTable and Instacart. But Microsoft is promising that corporate developers will be able to build their own custom plug-ins using Visual Studio and the Teams Toolkit for Visual Studio Code to provide users help with tasks such as processing contracts or managing corporate travel expense reports. Microsoft also is labeling Teams message extensions and Power Platform connectors as plug-ins, going forward. Officials said the 20 customers currently in the M365 Copilot Early Access program have access to more than 50 plug-ins from Microsoft, as well as Atlassian, Adobe, ServiceNow, Thomson Reuters and other vendors.

(I’m thinking that Microsoft execs are busily re-making the “Power House” Power Platform apps, which are customizable, template-like apps for specific business functions that we wrote about a few months ago, to fit into this Copilot/plug-in story. Maybe we will hear more on that this summer.)

Because no Microsoft event is complete these days without the announcement of EVEN MORE Copilots, Microsoft is bringing a Copilot to Windows 11 (with a preview coming in June, presumably to Windows Insiders) in the name of helping users navigate commands and work with multiple apps simultaneously. The Microsoft Edge browser is getting the Microsoft 365 Copilot. The newly announced Microsoft Fabric data platform will be getting a Fabric Copilot “soon.” Power Platform’s Power Pages Copilot is now in public preview and the Power BI Copilot for Data Analysis Expressions (DAX), announced today, is in public preview now, too.

Fabric, Azure Linux and Win32 Isolation (Oh, my!)

Enough AI smoke. Here are the other top announcements of potential interest to Microsoft business customers from Build:

Microsoft Fabric: “Fabric” is the new end-to-end data and analytics platform Microsoft is unveiling at Build. It’s a combination of updated versions of existing tools, plus a few new ones, delivered in the form of Software as a Service (SaaS). The platform is built on top of a common data lake based on the open delta/parquet format, which Microsoft has dubbed “OneLake.” Microsoft’s goal with Fabric is to provide cross-suite integration. A bunch of the pieces of Fabric, including Data Factory; Synapse Data Engineering, Data Science, Data Warehousing and Real Time Analytics; as well as OneLake, are in public preview now.

“Azure Linux”: Listen at Build for mentions of something called “Azure Linux” this week. We already knew that Microsoft has its own Linux distribution, called CBL-Mariner (CBL = Common Base Linux), but officials repeatedly have pointed out that Mariner is for Microsoft’s own internal use and not a commercial Linux distribution. Microsoft has disclosed previously that it used Mariner as the base for the Linux virtual machine that is in Azure Kubernetes Service (AKS) for edge devices. But until this week, officials have called this “AKS-Lite VM.”

When I asked about the name-change, I got the following statement: "Azure Linux is the commercial product for CBL-Mariner, and it is supported as a container host OS for AKS. Mariner (CBL-Mariner) as an AKS container host has been renamed to Azure Linux container host for AKS. Microsoft does not provide broad commercial support for Azure Linux as a server operating system," said Jim Perrin, Principal Program Manager Lead, Linux Systems Group. My interpretation: Mariner has graduated beyond an internal-use-only skunkworks project to more of a commercial Linux, even if limited. I'll be curious if Microsoft goes further in making Azure Linux a "real" Linux distribution at some point.

New Win32 app isolation technology: Microsoft previewed plans to add new Win32 app isolation technology to Windows 11 back at the BlueHat conference in April. This update will be in public preview on May 24 for both consumers and commercial customers. Microsoft is presenting this capability as a step on its journey to making Windows “adminless” in the name of security. Many Win32 “classic” apps don’t run with least privilege, and isolation could help contain security issues and reduce damage if an app is compromised. Win32 app isolation in Windows 11 will rely on the “Helium” container tech that lives on top of the existing registry and file system, which is used by the MSIX Windows app package format. Microsoft isn’t providing information on when this isolation capability will be available commercially or how it will be priced/licensed.

Even more Windows goodies: After a few years of making Windows an afterthought at Build, it seems Microsoft is making quite a few Windows-related announcements at the show this year. In addition to the Win32 app isolation announcement, Microsoft is showing off a new Dev Home app (in preview) that will let developers more easily set up Windows as their dev machine with WinGet configuration, GitHub integration and more built in. Microsoft also is officially announcing “Dev Drive,” which is a new virtual hard disk (VHD) storage volume tailored for developers that is based on ReFS (Resilient File System). And Win365 Boot, the capability allowing Windows 365 users to log directly into their Win365 Cloud PCs and designate them as their primary Windows experience (with no interim steps required after the initial log in), is finally going to preview.

Microsoft Mesh and Teams avatars: Before it was all-in on AI, Microsoft was gearing up to hop on the metaverse train. The company gave its Mesh mixed-reality collaboration platform and avatars for Teams lots of promotional love in 2021 and 2022. Both technologies get a small nod at Build this year. Mesh is (finally) in private preview for those who want to build virtual experiences for town halls, employee training, onboarding, and virtual tours. And the avatars for Teams will be generally available for all Microsoft 365 Business and Enterprise customers in the Teams desktop app on Windows and Mac starting this week.

Entra External ID: Microsoft’s customer identity and access solution, Entra External ID, which Microsoft announced earlier this year, will hit preview this summer. (Entra is the brand for a set of services built on top of Azure Active Directory for identity management.) While Microsoft made Entra Verified ID generally available in 2022, allowing users to share proof of employment, education and similar personal information, the External ID component lets organizations employ an identity provider to manage their IDs and then manage access to apps with Azure AD or Azure AD B2C.

Microsoft Fabric: Microsoft's family of data products to get the suite treatment
Microsoft is readying updated releases of a number of its key data assets, including Power BI, Azure Synapse Analytics, and Data Factory, and is enabling them to work with a common 'OneLake' data lake storage back-end. Expect a big reveal at the Microsoft Build conference in late May.

While AI announcements are expected to get top billing at Microsoft's Build developer conference next week, its data platform reveals are going to get a lot of time in the limelight, too. The new "Microsoft Fabric" platform, built to make data access and insights more easily accessible and integrated, will be the subject of more than a few Build sessions, based on what's in the conference's online session catalog.

On May 17, Microsoft sleuth "WalkingCat" on Twitter posted what look to be pre-recorded video excerpts from the coming Microsoft Fabric announcements. In those segments, Arun Ulag, Corporate Vice President of Azure Data, outlines the coming Fabric data platform "for the era of AI."

In the video, Ulag says Fabric will be an end-to-end data and analytics platform, based on updated versions of existing tools, delivering a unified architecture, security and data-sharing experience in the form of Software as a Service (SaaS). The Fabric platform will be built on top of a common SaaS data lake, based on the open Delta Parquet format, accessible by all components. He calls this platform "OneLake," which he says is "almost like OneDrive, but for your data."

The Fabric platform will deliver data analytics directly to users in the Office apps by leveraging a new, integrated version of Power BI, Microsoft's business intelligence offering.

Microsoft Fabric, which was codenamed "Trident," will consist of seven core workloads or apps, each built for particular personas and tasks, according to the videos. The seven:

  • Data Factory: Data Factory pipelines are already part of Synapse Analytics and will continue to provide data integration services to bring data together, cleanse it, and get it ready for engineering
  • Synapse Data Engineering: An updated environment for Synapse's Spark allows leveraging notebooks for collaboration
  • Synapse Data Science: A new analytics module backed by Azure Machine Learning (ML), also will use notebooks for collaboration
  • Synapse Data Warehousing: An update to the SQL Server data warehouse component, updated for larger scaling, with the ability to use the Delta Parquet open data format
  • Synapse Real Time Analytics engine: An update to the Azure Data Explorer component enabling it to analyze massive amounts of data, including IoT data
  • Power BI: An expansion of Microsoft's BI platform which is integrated into Fabric for workspace sharing and security
  • Data Activator: Another new data tool about which details are few (so far)

All of these workloads will store and access data in OneLake via workspace folders, and all the data will be in the Delta Parquet open-source data file format. (Ulag touts the decision to go with open-source formats in Fabric as a major differentiator between Microsoft and Snowflake, Google and AWS.) All data across the Fabric platform will be automatically provisioned with the tenant and be organized in a hierarchical namespace. The data will be automatically indexed for discovery, governance, compliance and more, according to the videos.
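To make the "hierarchical namespace" idea concrete, here is a minimal Python sketch of a single tenant-wide lake in which workspaces are top-level folders and every workload reads and writes files under them. This is purely illustrative and not OneLake's actual API: the Lake and Folder classes, the "contoso" tenant and the file paths are all hypothetical, and the workspace-as-top-level-folder layout is an assumption based on the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Folder:
    """A node in a hypothetical hierarchical namespace."""
    name: str
    children: Dict[str, "Folder"] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)


class Lake:
    """One lake per tenant; workspaces are its top-level folders (assumption)."""

    def __init__(self, tenant: str) -> None:
        self.root = Folder(tenant)

    def add_file(self, path: str) -> None:
        """Register a file at 'workspace/folder/.../name.parquet'."""
        *folders, filename = path.strip("/").split("/")
        node = self.root
        for part in folders:
            # Create intermediate folders on demand.
            node = node.children.setdefault(part, Folder(part))
        node.files.append(filename)

    def list_files(self, prefix: str = "") -> List[str]:
        """Return every file path under the given folder prefix, sorted."""
        node = self.root
        parts = [p for p in prefix.strip("/").split("/") if p]
        for part in parts:
            if part not in node.children:
                return []
            node = node.children[part]
        found: List[str] = []
        stack = [("/".join(parts), node)]
        while stack:
            base, current = stack.pop()
            for name in current.files:
                found.append(f"{base}/{name}" if base else name)
            for child in current.children.values():
                child_base = f"{base}/{child.name}" if base else child.name
                stack.append((child_base, child))
        return sorted(found)


if __name__ == "__main__":
    lake = Lake("contoso")  # hypothetical tenant name
    lake.add_file("sales/orders/2023.parquet")
    lake.add_file("hr/people.parquet")
    print(lake.list_files("sales"))
```

The point of the sketch is only that a single tree per tenant lets any workload discover data by path prefix, which is the property the videos attribute to OneLake.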

So what's the AI piece of this? (Given Microsoft seemingly has decided no product or service can be announced without an AI component, there must be at least one.) Microsoft is expected to add its Copilot AI assistant technology to Power BI. Sources of mine say Copilot will be coming to all the other pieces of Microsoft Fabric, too, at some point, which will enable customers to use natural language to code, create reports, glean insights and more. Additionally, in the leaked videos, Azure OpenAI is mentioned as a key binding service across the Fabric platform, although there are few details as to what that actually will bring to the Fabric data platform party.

Last year at Build, Microsoft announced the Microsoft Intelligent Data Platform, which included everything from the products already in the Azure Data space (Azure Data Factory, Azure Data Explorer, SQL Server 2022, Azure SQL, Cosmos DB, etc.) to the Synapse Analytics products, to Power BI, to the newly rebranded Purview compliance/governance product family. Microsoft officials said the unification was meant to weave analytics more tightly into the other components of customers' data estates.

With only these few details about the Build news available publicly at this point, it's hard to know for sure whether Microsoft Fabric is yet another marketecture-type announcement, or a major revamp of some of the company's most important data assets.

What these changes mean for the existing platforms that make up Fabric, and how they will affect existing customer deployments, is unknown. Most of the Fabric components appear to be further enhancements and integrations of existing capabilities. On paper, at least, the idea of a common back-end storage platform and more open APIs sounds like a potentially big customer win. But more needs to be known about how easy it is to migrate from Microsoft's existing tools and how much it will cost organizations to do so.

Based on what we know so far, "Fabric feels more like an important evolution of Microsoft’s cloud analytics and data lake service, rather than a wholesale change. And that bodes well for customers already using the underlying services like Power BI and Data Lake and Synapse Analytics," said Directions on Microsoft analyst Andrew Snodgrass. "Fabric should help existing customers improve what they’re already doing and open the door to orgs that haven’t adopted the individual services because it’s hard to combine them all."

Microsoft’s data lake services provide high-speed ingestion and global scaling and support open-source technologies, which makes them viable options for enterprise applications.

Azure Data Lake Analytics is retiring on Feb. 29, 2024, when Azure Data Lake Storage (ADLS) Gen 1 retires, which is no surprise as Analytics has a dependency on ADLS Gen 1.

The chart compares Microsoft’s on-premises and hosted database offerings.

Sidebar explains the various data types and technologies used in Microsoft database management offerings.

Azure Data Lake Storage offers a new back-end option to improve performance and scalability and raises questions about the future of Azure Data Lake Analytics

Azure Data Lake Storage provides scalable, petabyte-level storage and high-speed ingestion for unstructured data

Azure provides several Hadoop-based (Big Data) data management services, each providing specialized features with varying costs and configurations

Roadmap for Azure Data Lake, a Microsoft cloud service for analyzing Big Data.
