Databricks Model Serving

Unified deployment and governance for all AI models.

Vendor

Databricks

Company Website

Product details

Mosaic AI Model Serving is a unified service for deploying, governing, querying and monitoring models, whether fine-tuned or pre-deployed by Databricks (such as Meta Llama 3, DBRX or BGE) or hosted by another provider (such as Azure OpenAI, AWS Bedrock, Amazon SageMaker or Anthropic). This unified approach makes it easy to experiment with and productionize models from any cloud or provider to find the best candidate for your real-time application. With batch inference support, you can efficiently run AI inference on large datasets, complementing real-time serving for comprehensive model evaluation. Once models are deployed, you can A/B test them and monitor model quality on live production data. Model Serving also includes pre-deployed models such as Llama 3 70B, so you can jump-start AI applications like retrieval augmented generation (RAG), with pay-per-token access or pay-for-provisioned compute for throughput guarantees.
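As a rough sketch of what querying looks like: served models are exposed behind a per-endpoint `invocations` URL, and chat-style foundation model endpoints accept an OpenAI-style `messages` payload. The workspace URL and endpoint name below are placeholders, not values from this page.

```python
def invocation_request(workspace_url: str, endpoint_name: str, prompt: str):
    """Build the REST call for querying a Model Serving endpoint.

    Returns the per-endpoint invocations URL and a chat-style request body.
    """
    url = f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,  # illustrative generation limit
    }
    return url, payload

# Example with placeholder names:
url, payload = invocation_request(
    "https://my-workspace.cloud.databricks.com",
    "databricks-meta-llama-3-70b-instruct",
    "Summarize our Q3 sales notes.",
)
# The request would then be sent with a bearer token, e.g.:
# requests.post(url, headers={"Authorization": f"Bearer {token}"}, json=payload)
```

Because every endpoint shares this interface, swapping the model behind an application is a matter of changing the endpoint name, not the client code.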

Benefits

Simplified deployment for all AI models

Deploy any model type, from pretrained open source models to custom models built on your own data — on both CPUs and GPUs. Automated container build and infrastructure management reduce maintenance costs and speed up deployment so you can focus on building your AI projects and delivering value faster for your business.
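A minimal sketch of what an endpoint definition for a registered custom model might look like. Field names follow the Databricks serving-endpoints API; the workload types, sizes and model name here are illustrative assumptions, not prescribed values.

```python
def custom_model_endpoint_config(model_name: str, version: str, use_gpu: bool) -> dict:
    """Sketch of a serving-endpoint config for a registered custom model.

    One config shape covers both CPU and GPU serving; the platform handles
    container build and infrastructure from this declaration.
    """
    return {
        "served_entities": [
            {
                "entity_name": model_name,      # e.g. a Unity Catalog model name
                "entity_version": version,
                "workload_type": "GPU_SMALL" if use_gpu else "CPU",  # illustrative
                "workload_size": "Small",
                "scale_to_zero_enabled": True,  # scale down when idle
            }
        ]
    }

config = custom_model_endpoint_config("main.models.churn_classifier", "3", use_gpu=False)
```

The same declaration style applies whether the entity is a scikit-learn model, a PyFunc wrapper or a fine-tuned foundation model.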

Unified management for all models

Manage all models, including custom ML models like PyFunc, scikit-learn and LangChain, foundation models (FMs) on Databricks like Llama 3, MPT and BGE, and foundation models hosted elsewhere like ChatGPT, Claude 3, Cohere and Stable Diffusion. Model Serving makes all models accessible in a unified user interface and API, including models hosted by Databricks, or from another model provider on Azure or AWS.
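To illustrate how externally hosted models fit the same interface, here is a sketch of an external-model endpoint definition. The `external_model` block and the per-provider credential key follow the serving-endpoints API pattern, but the exact provider names and config keys should be treated as assumptions to verify against the API reference.

```python
def external_model_config(model_name: str, provider: str, api_key_ref: str) -> dict:
    """Sketch of an endpoint config that proxies a provider-hosted model.

    The external model sits behind the same serving API as Databricks-hosted
    models, so clients query it exactly like any other endpoint.
    """
    return {
        "served_entities": [
            {
                "external_model": {
                    "name": model_name,    # provider's model identifier
                    "provider": provider,  # e.g. "anthropic", "openai"
                    "task": "llm/v1/chat",
                    # Assumed pattern: credentials live under a per-provider
                    # config key and reference a secret, not an inline key.
                    f"{provider}_config": {f"{provider}_api_key": api_key_ref},
                }
            }
        ]
    }

config = external_model_config("claude-3-5-sonnet", "anthropic", "{{secrets/scope/key}}")
```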

Effortless batch inference

Model Serving enables efficient AI inference on large datasets, supporting all data types and models in a serverless environment. You can seamlessly integrate with Databricks SQL, Notebooks and Workflows to apply AI models across vast amounts of data in one streamlined operation. Enhance data processing, generate embeddings and evaluate models — all without complex rework.
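The batch pattern above typically runs through the SQL `ai_query()` function, which calls a serving endpoint per row so a single statement applies a model across a whole table. A small sketch that generates such a statement (table, endpoint and column names are placeholders):

```python
def batch_inference_sql(table: str, endpoint: str, text_col: str) -> str:
    """Build a batch-inference statement using the SQL ai_query() function.

    ai_query(endpoint, input) invokes a Model Serving endpoint for each row,
    turning batch scoring into one SQL operation.
    """
    return (
        f"SELECT {text_col}, "
        f"ai_query('{endpoint}', {text_col}) AS model_output "
        f"FROM {table}"
    )

sql = batch_inference_sql(
    "main.reviews.raw_reviews",
    "databricks-meta-llama-3-70b-instruct",
    "review_text",
)
# In a notebook this would run as: spark.sql(sql)
```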

Governance built-in

Integrate with Mosaic AI Gateway to meet stringent security and advanced governance requirements. You can enforce proper permissions, monitor model quality, set rate limits, and track lineage across all models whether they are hosted by Databricks or on any other model provider.
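As a sketch of what those gateway controls look like in practice, here is an illustrative AI Gateway settings block enforcing usage tracking and a per-endpoint rate limit. Field names follow the AI Gateway portion of the serving API, but the specific keys and values are assumptions for illustration.

```python
def gateway_config(calls_per_minute: int) -> dict:
    """Sketch of AI Gateway settings for a serving endpoint.

    Enables request logging for monitoring and caps request volume with a
    rate limit that renews every minute.
    """
    return {
        "usage_tracking_config": {"enabled": True},  # log requests for monitoring
        "rate_limits": [
            {
                "calls": calls_per_minute,  # max calls per renewal period
                "key": "endpoint",          # limit applies endpoint-wide
                "renewal_period": "minute",
            }
        ],
    }

config = gateway_config(100)
```

Because the gateway sits in front of every endpoint, the same policy applies whether the underlying model is hosted by Databricks or an external provider.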

Data-centric models

Accelerate deployments and reduce errors through deep integration with the Data Intelligence Platform. You can easily host generative AI models that are augmented (RAG) or fine-tuned with your enterprise data. Model Serving offers automated lookups, monitoring and governance across the entire AI lifecycle.

Cost-effective

Serve models as a low-latency API on a highly available serverless service with both CPU and GPU support. Effortlessly scale from zero to meet your most critical needs, and back down as requirements change. You can get started quickly with one or more pre-deployed models, paying per token (on demand, with no commitments) or paying for provisioned compute for guaranteed throughput. Databricks handles infrastructure management and maintenance, so you can focus on delivering business value.