What is Azure DataBricks ?
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform . It is a fast, easy, and collaborative Apache Spark-based analytics service
Here are some of the benefits for Data Engineers and Data Scientists for using Azure Databricks:
1. Optimized Environment
Azure Databricks is optimized from the ground up for performance and cost-efficiency in the cloud .
The Databricks Runtime adds several key capabilities to Apache Spark workloads that can increase performance and reduce costs by as much as 10-100x when running on Azure, including:
a)High-speed connectors to Azure storage services, such as Azure Blob Store and Azure Data Lake, developed together with the Microsoft teams behind these services.
b)Auto-scaling and auto-termination for Spark clusters to automatically minimize costs.
c)Performance optimizations including caching, indexing, and advanced query optimization, which can improve performance by as much as 10-100x over traditional Apache Spark deployments in cloud or on-premise environments
2. Seamless Collaboration
Notebooks on Databricks are live and shared, with real-time collaboration, so that everyone in your organization can work with your data. Dashboards enable business users to call an existing job with new parameters. Also, Databricks integrates closely with PowerBI for interactive visualization.
3. Easy to Use
Azure Databricks comes packaged with interactive notebooks that let you connect to common data sources, run machine learning algorithms, and learn the basics of Apache Spark to get started quickly. It also features an integrated debugging environment to let you analyze the progress of your Spark jobs from within interactive notebooks, and powerful tools to analyze past jobs. Finally, other common analytics libraries, such as the Python and R data science stacks, are preinstalled so that you can use them with Spark to derive insights.
Total Azure Integration
- Diversity of VM types: Customers can use all existing VMs including F-series for machine learning scenarios, M-series for massive memory scenarios, D-series for general purpose, etc.
- Security and Privacy: In Azure, ownership and control of data is with the customer. We have built Azure Databricks to adhere to these standards. We aim for Azure Databricks to provide all the compliance certifications that the rest of Azure adheres to.
- Flexibility in network topology: Customers have a diversity of network infrastructure needs. Azure Databricks supports deployments in customer VNETs, which can control which sources and sinks can be accessed and how they are accessed.
- Azure Storage and Azure Data Lake integration: These storage services are exposed to Databricks users via DBFS to provide caching and optimized analysis over existing data.
- Azure Power BI: Users can connect Power BI directly to their Databricks clusters using JDBC in order to query data interactively at massive scale using familiar tools.
- Azure Active Directory provide controls of access to resources and is already in use in most enterprises. Azure Databricks workspaces deploy in customer subscriptions, so naturally AAD can be used to control access to sources, results, and jobs.
- Azure SQL Data Warehouse, Azure SQL DB, and Azure CosmosDB: Azure Databricks easily and efficiently uploads results into these services for further analysis and real-time serving, making it simple to build end-to-end data architectures on Azure
For the first time, a leading cloud provider and leading analytics system provider have partnered to build a cloud analytics platform optimized from the ground up – from Azure’s storage and network infrastructure all the way to Databricks’s runtime for Apache Spark. We believe that Azure Databricks will greatly simplify building enterprise-grade production data applications