Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and big data analytics, giving you a platform to build and manage a modern data warehouse. It gives you the freedom to query data on your terms, using either serverless (on-demand) or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
How does Azure Synapse Analytics work?
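Microsoft’s service is a fully managed cloud service that can be used on demand, so it runs (and bills) only when needed. It has four components:
- SQL analytics with full T-SQL support, offered in two models: provisioned SQL pools (pay per unit of compute, measured in DWUs) and SQL on demand, now called serverless SQL pools (pay per TB of data processed).
- Fully integrated Apache Spark.
- Data integration, with connectors to multiple data sources.
- Synapse Studio, a unified web workspace for working with all of the above.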
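To make the two billing models concrete, here is a rough Python sketch that compares them for a hypothetical workload. The prices are placeholders, not real Azure rates, and treating provisioned compute as a flat price per DWU-hour is a simplification of the actual service-level pricing.

```python
def dedicated_cost(dwu: int, hours: float, price_per_dwu_hour: float) -> float:
    """Provisioned model: pay for reserved compute for every hour it is running."""
    return dwu * hours * price_per_dwu_hour


def serverless_cost(tb_processed: float, price_per_tb: float) -> float:
    """On-demand model: pay only for the data that queries actually process."""
    return tb_processed * price_per_tb


def break_even_tb(dwu: int, hours: float, price_per_dwu_hour: float, price_per_tb: float) -> float:
    """TB processed at which both models cost the same over the same period."""
    return dedicated_cost(dwu, hours, price_per_dwu_hour) / price_per_tb


if __name__ == "__main__":
    # Hypothetical placeholder prices, NOT real Azure rates.
    PRICE_PER_DWU_HOUR = 0.01
    PRICE_PER_TB = 5.0

    print(dedicated_cost(1000, 200, PRICE_PER_DWU_HOUR))  # 1,000 DWU running for 200 h
    print(serverless_cost(10, PRICE_PER_TB))               # 10 TB processed on demand
    print(break_even_tb(1000, 200, PRICE_PER_DWU_HOUR, PRICE_PER_TB))
```

With these made-up numbers, the provisioned pool costs 2,000 units for the period and only pays off once queries process more than 400 TB; below that, paying per TB processed is cheaper.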
Azure Synapse uses Azure Data Lake Storage Gen2 as its storage layer, together with a consistent data model that incorporates administration, monitoring, and metadata management. On the security side, it lets you protect, monitor, and manage your data and analytics solutions, for example through single sign-on and Azure Active Directory integration. In short, Azure Synapse covers the whole data integration and ETL process and goes well beyond a conventional data warehouse, since it includes later stages of the process and also lets users create reports and visualizations.
In terms of programming languages, it supports SQL, Python, .NET, Java, Scala, and R. This makes it well suited to different analytics workloads and to engineers with different profiles.
Everything comes together in Synapse Studio, a single workspace that makes it easy to combine artificial intelligence, machine learning, IoT, intelligent applications, and business intelligence on one unified platform.
On the Road to Maximum Compatibility and Power
Microsoft initially presents the service as a solution to two fundamental problems that companies face. The first is compatibility. Its analytics engine works with traditional structured systems as well as unstructured data from a variety of sources. It can therefore analyze data stored in systems such as customer databases (with names and addresses arranged in rows and columns, like a spreadsheet) as well as data stored in a data lake in Parquet format.
The second is power: it automatically handles more of the tasks involved in building a data analysis system. This directly reduces the work required from developers and, by extension, project development time. (Microsoft has also stated that it was the first and only analytics system to run all TPC-H queries at petabyte scale.)
Query results in milliseconds
Besides scaling compute and storage resources independently, Azure Synapse Analytics stands out for its result set caching (a fully managed cache of up to 1 TB). When caching is enabled, the results of a query are stored in this cache, so the same query can be answered from the cache the next time it runs, as long as the underlying data has not changed.
This is one of the keys to its ability to return results in milliseconds. The cache also survives pause, resume, and scale operations, which the cloud-native massively parallel processing (MPP) architecture can carry out very quickly.
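The caching behaviour described above can be modelled with a small toy class. This is a hypothetical sketch, not Synapse code: it assumes, in line with Microsoft's documentation, that a cached result is reused only for an identical query text and is invalidated whenever the underlying data changes. In the real service, caching is switched on with T-SQL run from the master database, for example `ALTER DATABASE [mydw] SET RESULT_SET_CACHING ON;`, where `mydw` is a placeholder name.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ResultSetCache:
    """Toy model of result set caching (not the Synapse implementation)."""

    data_version: int = 0  # bumped whenever the underlying tables change
    hits: int = 0
    misses: int = 0
    _entries: dict[str, tuple[int, Any]] = field(default_factory=dict)

    def run(self, query: str, execute: Callable[[str], Any]) -> Any:
        """Return a cached result for an identical, still-valid query, or execute and cache it."""
        cached = self._entries.get(query)
        if cached is not None and cached[0] == self.data_version:
            self.hits += 1
            return cached[1]
        self.misses += 1
        result = execute(query)
        self._entries[query] = (self.data_version, result)
        return result

    def modify_data(self) -> None:
        """Simulate an INSERT/UPDATE/DELETE: every cached result becomes stale."""
        self.data_version += 1


if __name__ == "__main__":
    engine_calls = []

    def fake_engine(query: str) -> list:
        engine_calls.append(query)
        return [("total", 42)]

    cache = ResultSetCache()
    q = "SELECT SUM(amount) FROM sales"
    cache.run(q, fake_engine)  # miss: the engine runs the query
    cache.run(q, fake_engine)  # hit: served from the cache
    cache.modify_data()
    cache.run(q, fake_engine)  # miss: the data changed, so the entry is stale
    print(cache.hits, cache.misses, len(engine_calls))  # 1 2 2
```

Keying on a data version rather than deleting entries keeps invalidation cheap: a stale entry is simply ignored and overwritten the next time the query runs.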
Workloads and performance
Also noteworthy are its full JSON support, dynamic data masking for security, support for SQL Server Data Tools (SSDT), and above all its workload management, which lets workloads be optimized and isolated. Multiple workloads share the deployed resources, and you can define a workload and assign it a share of CPU and concurrency.
For example, with 1,000 DWUs (Data Warehouse Units), Azure Synapse makes it easy to assign one percentage of the resources to sales and another to marketing (say 60% and 40%). The idea is to simplify administration and prioritize database queries.
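The 60/40 split above maps onto workload groups, which in T-SQL are created with `CREATE WORKLOAD GROUP` and parameters such as `MIN_PERCENTAGE_RESOURCE` and `REQUEST_MIN_RESOURCE_GRANT_PERCENT`. The Python sketch below only does the bookkeeping: it expresses each group's reserved share as DWU-equivalents and estimates guaranteed concurrency as the reserved percentage divided by the per-request grant. The group names and the 10% per-request grant are hypothetical, and the real service adjusts effective values depending on the service level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadGroup:
    """Hypothetical stand-in for a workload group definition."""

    name: str
    min_percentage_resource: int    # share of system resources reserved for the group
    request_min_grant_percent: int  # share of resources granted to each request


def reserved_dwu(total_dwu: int, groups: list[WorkloadGroup]) -> dict[str, float]:
    """Express each group's reserved share as DWU-equivalents (for intuition only)."""
    total_pct = sum(g.min_percentage_resource for g in groups)
    if total_pct > 100:
        raise ValueError(f"reservations add up to {total_pct}%, more than 100%")
    return {g.name: total_dwu * g.min_percentage_resource / 100 for g in groups}


def guaranteed_concurrency(group: WorkloadGroup) -> int:
    """Number of requests that fit in the reservation at the minimum grant each."""
    return group.min_percentage_resource // group.request_min_grant_percent


if __name__ == "__main__":
    groups = [
        WorkloadGroup("sales", 60, 10),
        WorkloadGroup("marketing", 40, 10),
    ]
    print(reserved_dwu(1000, groups))  # {'sales': 600.0, 'marketing': 400.0}
    for g in groups:
        print(g.name, guaranteed_concurrency(g))
```

Rejecting reservations that add up to more than 100% mirrors the basic constraint that isolated workloads cannot be promised more than the whole system.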
For data preparation and ingestion, it supports integrated streaming (native SQL streaming) for analytics, for example from Azure Event Hubs or IoT Hub. It offers throughput of up to 200 MB/second, delivery latencies measured in seconds, ingestion performance that scales with compute, and analysis through SQL-based queries with joins, aggregations, filters, and so on.
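The filter-then-aggregate pattern mentioned above can be illustrated with a tumbling-window sum in plain Python. This is only a conceptual sketch: it does not use Event Hubs, IoT Hub, or any Synapse streaming API, and the event layout (timestamp in seconds, device ID, value) is a hypothetical example.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_sum(
    events: Iterable[tuple[float, str, float]],
    window_seconds: float,
    min_value: float = 0.0,
) -> dict[tuple[int, str], float]:
    """Drop readings below min_value, then sum the rest per (window index, device)."""
    totals: dict[tuple[int, str], float] = defaultdict(float)
    for timestamp, device_id, value in events:
        if value < min_value:  # filter step
            continue
        window = int(timestamp // window_seconds)  # fixed-size, non-overlapping windows
        totals[(window, device_id)] += value  # aggregation step
    return dict(totals)


if __name__ == "__main__":
    events = [
        (0.5, "sensor-a", 1.0),
        (1.2, "sensor-a", 2.0),
        (5.1, "sensor-a", 4.0),
        (5.5, "sensor-b", 3.0),
        (6.0, "sensor-b", -1.0),  # filtered out
    ]
    print(tumbling_window_sum(events, window_seconds=5))
    # {(0, 'sensor-a'): 3.0, (1, 'sensor-a'): 4.0, (1, 'sensor-b'): 3.0}
```

For scale, the quoted 200 MB/second corresponds to about 720 GB of ingested data per hour (200 MB × 3,600 s).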
Some additional features
- For data preparation and loading, the COPY statement removes the need for external tables, because it loads files directly into tables in the database.
- Full support for standard CSV, including line breaks inside fields, custom delimiters, and SQL date formats.
- User-controlled file selection, with wildcard support.
- Machine learning support: models can be created and saved in ONNX format, stored in the Azure Synapse data warehouse, and scored with the native T-SQL PREDICT function.
- Integration with the data lake: Azure Synapse reads Parquet files in the data lake directly, which, according to Microsoft, improves on PolyBase performance by more than 13x.
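The wildcard file selection and CSV handling in the list above can be illustrated in plain Python. In Synapse itself this work is done by the T-SQL `COPY INTO` statement; the sketch below does not call it and only mimics the idea with the standard `fnmatch` and `csv` modules. The folder layout, file names, and `|` delimiter are hypothetical.

```python
import csv
import fnmatch
import io


def select_files(paths: list[str], pattern: str) -> list[str]:
    """Return the paths that match a shell-style wildcard pattern, sorted."""
    return sorted(p for p in paths if fnmatch.fnmatchcase(p, pattern))


def parse_csv(text: str, delimiter: str) -> list[list[str]]:
    """Parse CSV text with a custom delimiter; quoted fields may span line breaks."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader]


if __name__ == "__main__":
    files = ["sales/2020/jan.csv", "sales/2020/feb.csv", "sales/2019/dec.csv", "sales/2020/readme.txt"]
    print(select_files(files, "sales/2020/*.csv"))  # ['sales/2020/feb.csv', 'sales/2020/jan.csv']

    sample = 'id|comment|date\n1|"first line\nsecond line"|2020-01-31\n'
    for row in parse_csv(sample, delimiter="|"):
        print(row)
```

Note that `fnmatch`'s `*` also matches `/`, so this pattern would pick up files in nested folders too; the wildcard rules of `COPY INTO` may differ.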