Updated: October 21, 2024
Understanding Fabric Spark Pools
The screenshot shows the creation of a capacity-level Spark pool definition. In this example, an administrator is creating a Spark pool definition for a Fabric F16 capacity, and the pool will use the maximum number of Large memory-optimized nodes the capacity allows.
What Is an Apache Spark Pool?
An Apache Spark pool is a predefined set of compute resources that an admin creates to simplify management of the back-end infrastructure and enforce organizational standards. A Spark pool is not a deployment but rather a definition of how to deploy a set of resources to run Spark applications such as jobs, scripts, and notebooks. A Spark pool definition specifies the number and size of compute nodes, along with scaling settings, that a service like Fabric Data Engineering or Databricks uses to build a compute cluster.
Spark pools are needed because Fabric capacities are created with a defined set of compute resources (CPU and memory) that are shared on a first-come, first-served basis. A single workload, such as a Data Engineering Spark job, could consume all available capacity resources and lock out other users and processes. Spark pool definitions allow admins to enforce a limit on compute usage and avoid potential contention.
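As a rough illustration of what a pool definition contains, the sketch below creates a custom Spark pool through the Fabric REST API. The payload shape (node family, node size, autoscale bounds) and the placeholder workspace ID and token are assumptions for this example, not a confirmed recipe; capacity-level pool settings like those in the screenshot above are managed in the Fabric admin portal rather than through this call, so verify the details against the current API reference.

```python
import requests

# Placeholders -- replace with a real workspace GUID and a Microsoft Entra access token.
WORKSPACE_ID = "<workspace-guid>"
ACCESS_TOKEN = "<entra-access-token>"

# Sketch of a Spark pool definition: node family and size, plus scaling limits.
# Field names follow the Fabric custom-pool payload as an assumption; check the docs.
pool_definition = {
    "name": "LargeMemoryOptimizedPool",
    "nodeFamily": "MemoryOptimized",   # memory-optimized node family
    "nodeSize": "Large",               # one of the node sizes the capacity permits
    "autoScale": {
        "enabled": True,
        "minNodeCount": 1,
        "maxNodeCount": 10,            # capped so one pool cannot consume the whole capacity
    },
    "dynamicExecutorAllocation": {
        "enabled": True,
        "minExecutors": 1,
        "maxExecutors": 9,
    },
}

# Create the pool in the workspace; Fabric uses this definition to build
# compute clusters for Spark jobs, scripts, and notebooks that reference it.
response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/spark/pools",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=pool_definition,
)
response.raise_for_status()
print(response.json())
```

The key point is that the definition itself is cheap metadata: nothing is provisioned until a Spark application requests the pool, at which point the service builds a cluster within the node-count ceiling the admin set.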