Updated: July 15, 2020 (February 22, 2016)
Charts & Illustrations
PolyBase Accesses Hadoop, HDInsight
The PolyBase technology in SQL Server 2016 simplifies access to unstructured data stored in Hadoop, including Azure HDInsight, Microsoft’s hosted Hadoop-based service. The illustration shows how a SQL Server 2016 cluster (top) can use PolyBase to run T-SQL queries against unstructured data in a Hadoop-based cluster (bottom). The PolyBase engine runs on the head node of a PolyBase Group of servers, where it provides access to Hadoop table schemas and coordinates the queries that are distributed to compute nodes.
The PolyBase head node accesses Hadoop data schemas, which are generated in the Hadoop Distributed File System (HDFS), along with MapReduce jobs (bottom middle), and presents them as database-level external tables and resources (data sources and file formats) that can be queried with T-SQL (upper left). HDFS manages the storage of and access to unstructured data in the Hadoop framework and places a high-level addressable format on that data. MapReduce is a programming model in Hadoop used to create jobs that map, filter, sort, and summarize unstructured Hadoop data into a form similar to a traditional table schema.
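To make the map, filter, sort, and summarize steps concrete, the following is a minimal single-process sketch of the MapReduce pattern in Python. It is not Hadoop or PolyBase code: the sample log lines and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, chosen only to show how raw text lines might be turned into table-like rows.

```python
from collections import defaultdict

# Hypothetical raw lines, standing in for unstructured files stored in HDFS.
RAW_LINES = [
    "2016-02-01,web01,200",
    "2016-02-01,web02,500",
    "malformed line",
    "2016-02-02,web01,200",
    "2016-02-02,web01,404",
]


def map_phase(line):
    """Map and filter: turn one raw line into (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 3:
        return []  # drop lines that do not match the expected layout
    _date, server, _status = parts
    return [(server, 1)]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Summarize: collapse all values for one key into a single row."""
    return (key, sum(values))


def run_job(lines):
    """Run map, shuffle, and reduce in sequence and return sorted rows."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    # Sort by key so the resulting rows come back in a stable order.
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


rows = run_job(RAW_LINES)
```

Each tuple in `rows` pairs a server name with its request count, which is the kind of row-and-column shape that an external table definition could then describe to T-SQL.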