Updated: July 15, 2020 (February 22, 2016)

  Charts & Illustrations

PolyBase Accesses Hadoop, HDInsight


By Andrew Snodgrass

Andrew analyzes and writes about Microsoft's data management, business intelligence, and machine learning solutions, as well as aspects of licensing...

The PolyBase technology in SQL Server 2016 simplifies access to unstructured data stored in Hadoop, including Azure HDInsight, Microsoft’s hosted Hadoop-based service. The illustration shows how a SQL Server 2016 cluster (top) can use PolyBase to run T-SQL queries against unstructured data in a Hadoop-based cluster (bottom). The PolyBase engine runs on the head node of a PolyBase Group of servers, where it provides access to Hadoop table schemas and coordinates queries that are distributed to compute nodes.

Hadoop data schemas generated by the Hadoop Distributed File System (HDFS) and by MapReduce jobs (bottom middle) are accessed by the PolyBase head node, which presents them as database-level external tables and supporting resources (data sources and file formats) that can be queried with T-SQL (upper left). HDFS manages the storage of and access to unstructured data in the Hadoop framework and places a high-level addressable format on the data. MapReduce is a programming model in Hadoop used to create jobs that map, filter, sort, and summarize unstructured Hadoop data into an addressable format. In both cases, the result resembles a traditional table schema.
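
As a rough illustration of the external resources described above, the following T-SQL sketch defines a data source, a file format, and an external table, then queries the table like any other. It assumes PolyBase is installed and Hadoop connectivity is already configured on the SQL Server 2016 instance. Every object name, address, port, path, and column here is hypothetical and would need to match the actual Hadoop cluster and file layout.

```sql
-- Hypothetical data source pointing at a Hadoop cluster's name node.
-- The address and ports are placeholders, not values from the article.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020',
    -- Optional: lets PolyBase push some computation down to Hadoop as MapReduce jobs.
    RESOURCE_MANAGER_LOCATION = '10.10.10.10:8050'
);

-- Hypothetical file format describing pipe-delimited text files.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
);

-- External table: a relational schema placed over files in HDFS.
-- The data stays in Hadoop; only the schema lives in the database.
CREATE EXTERNAL TABLE dbo.SensorReadings (
    SensorId    INT       NOT NULL,
    ReadingTime DATETIME2 NOT NULL,
    Temperature FLOAT     NULL
)
WITH (
    LOCATION    = '/data/sensors/',   -- hypothetical HDFS directory
    DATA_SOURCE = MyHadoopCluster,
    FILE_FORMAT = PipeDelimitedText
);

-- Once defined, the external table can be queried with ordinary T-SQL.
SELECT SensorId, AVG(Temperature) AS AvgTemperature
FROM dbo.SensorReadings
GROUP BY SensorId;
```

In this sketch the three `CREATE EXTERNAL` statements correspond to the data source, file format, and external table resources named in the text, and the final `SELECT` stands in for the T-SQL query shown at the upper left of the illustration.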
