Dataproc

A managed service for open-source data analytics and AI on the cloud.

Vendor

Google

Product details

Google Cloud Dataproc is a fully managed service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and more than 30 other open-source data analytics tools and frameworks. It supports data lake modernization, ETL (extract, transform, load), and secure data science at scale, and it integrates with other Google Cloud services such as Cloud Storage, BigQuery, and Vertex AI.

Key Features

  • Serverless Deployment: Run autoscaling Spark applications without managing infrastructure.
  • Resizable and Autoscaling Clusters: Resize clusters manually or let autoscaling adjust them to workload demand.
  • Cloud Integration: Built-in integration with Cloud Storage, BigQuery, Vertex AI, and more.
  • Automatic or Manual Configuration: Flexibility in hardware and software setup.
  • Developer Tools: Manage clusters through the web UI, the Cloud SDK, RESTful APIs, or SSH access.
  • Initialization Actions: Run custom scripts to install software or adjust settings when a cluster is created.
  • Optional Components: Install additional open-source components like Zeppelin and Presto.
  • Workflow Templates: Define reusable workflows and run them as needed.
  • Automated Policy Management: Standardize security and infrastructure policies.
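
Several of the features above (optional components, initialization actions, autoscaling) are set in the cluster definition sent to the RESTful API. The sketch below only builds such a request body as a Python dictionary; it does not call any Google Cloud service. The field names follow the Dataproc v1 cluster resource as best understood here and should be checked against the API reference. The project ID, cluster name, bucket path, policy URI, machine type, and the helper build_cluster_request are all hypothetical placeholders.

```python
import json


def build_cluster_request(project_id, cluster_name, init_script_uri,
                          autoscaling_policy_uri=None):
    """Build a dict shaped like a Dataproc v1 cluster resource (assumed field names)."""
    config = {
        # One master node and two workers; the machine type is an illustrative choice.
        "masterConfig": {"numInstances": 1, "machineTypeUri": "n1-standard-4"},
        "workerConfig": {"numInstances": 2, "machineTypeUri": "n1-standard-4"},
        # Optional components named in the feature list; availability
        # depends on the cluster image version.
        "softwareConfig": {"optionalComponents": ["ZEPPELIN", "PRESTO"]},
        # Initialization action: a script run on cluster nodes at creation time.
        "initializationActions": [{"executableFile": init_script_uri}],
    }
    if autoscaling_policy_uri:
        # Attach an autoscaling policy only when one is supplied.
        config["autoscalingConfig"] = {"policyUri": autoscaling_policy_uri}
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": config,
    }


# All values below are hypothetical placeholders.
cluster = build_cluster_request(
    project_id="example-project",
    cluster_name="example-cluster",
    init_script_uri="gs://example-bucket/scripts/setup.sh",
    autoscaling_policy_uri=(
        "projects/example-project/regions/us-central1/"
        "autoscalingPolicies/example-policy"
    ),
)
print(json.dumps(cluster, indent=2))
```

A body like this could then be passed to whichever management path is preferred (the RESTful API or a client library); the same settings are also exposed through the web UI and the Cloud SDK.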

Benefits

  • Cost-Effective: Up to 60% lower total cost of ownership (TCO) compared to other cloud-based Spark alternatives.
  • Faster Model Training: Train models up to 5x faster with integrated AI tools.
  • Enterprise Security: Security features such as Kerberos authentication and encryption at rest.
  • Scalability: Autoscaling clusters for elastic workload management.