Updated: July 23, 2020 (March 19, 2018)

HDInsight Cluster Configuration Options

by Andrew Snodgrass

Andrew analyzes and writes about Microsoft's data management, business intelligence, and machine learning solutions, as well as aspects of licensing...

HDInsight is a Microsoft-hosted Hadoop service that offers multiple native Hadoop deployment configurations optimized for different workloads. The illustration shows three panels from the Azure portal that are used to configure an HDInsight cluster.

A new cluster configuration begins with selecting the cluster type and version (left). For each cluster type, users can choose from at least the two newest community editions of the package. For example, a Spark cluster can currently be deployed using Spark 1.6.3, 2.0.2, or 2.1.0. Other Apache Hadoop components (bottom left) that support the environment, such as metastores and cluster management, are deployed automatically with the cluster.

Approved third-party applications (right) that provide specialized functionality, such as data stream management, artificial intelligence, and data security, can be added to the cluster. Third-party applications incur additional cost.

The final step is selecting the number and size of Linux VMs, called nodes (middle), to meet performance requirements. Each cluster type defaults to a VM series suited to the functions it performs. For example, Hadoop clusters typically use general-purpose A series VMs, whereas Spark clusters typically use higher-performance D series VMs, which offer more memory and solid-state drives.
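The choices described above (cluster type and version, optional third-party applications, and node count and size) can be pictured as a small configuration record. The following Python sketch is illustrative only: `ClusterConfig`, `SPARK_VERSIONS`, and `DEFAULT_VM_SERIES` are hypothetical names invented for this example, not part of the Azure portal or any Azure SDK, and the only values it encodes are the Spark versions and default VM series mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List

# Spark versions named in the article; versions for other cluster
# types are not listed there, so they are not validated here.
SPARK_VERSIONS = ("1.6.3", "2.0.2", "2.1.0")

# Typical default VM series per cluster type, as described in the article.
DEFAULT_VM_SERIES = {"Hadoop": "A", "Spark": "D"}


@dataclass
class ClusterConfig:
    """Hypothetical model of the choices made when configuring a cluster."""
    cluster_type: str
    version: str
    node_count: int
    vm_series: str = ""  # empty means "use the default for the cluster type"
    applications: List[str] = field(default_factory=list)  # optional third-party apps

    def __post_init__(self) -> None:
        # Step 1: cluster type and version.
        if self.cluster_type == "Spark" and self.version not in SPARK_VERSIONS:
            raise ValueError(f"unsupported Spark version: {self.version}")
        # Final step: number and size of nodes.
        if self.node_count < 1:
            raise ValueError("a cluster needs at least one node")
        if not self.vm_series:
            if self.cluster_type not in DEFAULT_VM_SERIES:
                raise ValueError(f"no default VM series known for {self.cluster_type}")
            self.vm_series = DEFAULT_VM_SERIES[self.cluster_type]


if __name__ == "__main__":
    config = ClusterConfig(cluster_type="Spark", version="2.1.0", node_count=4)
    print(config.vm_series)
```

Leaving `vm_series` empty mirrors the portal behavior described above, where each cluster type starts from a default series that the user can override.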
