A global initiative led by Cohere For AI to advance the state of the art in multilingual AI and bridge gaps between people and cultures across the world. Aya is an open science project to create new models and datasets that expand the number of languages covered by AI, involving over 3,000 independent researchers across 119 countries.
Vendor
Cohere

Overview
Aya is a global initiative led by Cohere For AI to advance the state of the art in multilingual AI and bridge gaps between people and cultures across the world. It is an open science project that creates new models and datasets to expand the number of languages covered by AI, and it involves over 3,000 independent researchers across 119 countries.

The project provides an open-access collection of datasets and models that addresses a critical gap in AI's language coverage, particularly for underserved languages. It is built on the collaborative efforts of a global network of researchers, fostering innovation and inclusivity in the field. Aya is more than a model and a dataset; it is a movement towards a more inclusive and globally representative AI landscape. It enables researchers and developers to build applications that understand and interact with users in their native languages, regardless of how widely spoken those languages are.
Features
- Aya Collection: The most extensive collection of multilingual instruction fine-tuning datasets to date, featuring 513 million prompts and completions across 114 languages. It includes rare, human-curated annotations from fluent speakers worldwide.
- Aya Model: A massively multilingual language model capable of following instructions in 101 languages. It is developed using a diverse mix of instruction data, including the Aya Dataset and the Aya Collection, and achieves state-of-the-art performance across numerous multilingual benchmarks.
- Aya 23: An open-weight model release intended to further multilingual progress. The release reports evaluation results on multiple multilingual NLP benchmarks and generation quality assessments.
- Aya Expanse Model Family: A new generation of multilingual language models in 8B and 32B parameter sizes, built to address the challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models.
- Aya Vision: Multimodal vision-language models (VLMs) in 8B and 32B parameter sizes.
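
To make "prompts and completions across many languages" more concrete, the sketch below models a single instruction fine-tuning record and counts records per language. It is only an illustration: the `InstructionRecord` class, its field names, the `count_by_language` helper, and the sample rows are all hypothetical and are not taken from the published Aya schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionRecord:
    """One prompt/completion pair. Field names are hypothetical, not the Aya schema."""
    prompt: str
    completion: str
    language: str  # short language tag chosen for this sketch, e.g. "en"


def count_by_language(records):
    """Return a Counter mapping each language tag to its number of records."""
    return Counter(record.language for record in records)


# Tiny made-up sample for illustration only; not drawn from the Aya Collection.
SAMPLE = [
    InstructionRecord("What is 2 + 2?", "4", "en"),
    InstructionRecord("¿Cuánto es 2 + 2?", "4", "es"),
    InstructionRecord("Combien font 2 + 2 ?", "4", "fr"),
    InstructionRecord("What is the capital of France?", "Paris", "en"),
]

if __name__ == "__main__":
    # Print per-language counts, most frequent first.
    for language, count in count_by_language(SAMPLE).most_common():
        print(language, count)
```

A per-language count like this is a simple way to check how evenly a multilingual dataset covers its languages, which is the gap the Aya datasets aim to narrow.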
Benefits
- Advancing Multilingual AI: Aya provides AI researchers with a foundation for accelerating progress in multilingual AI.
- Open-Source Resources: Fully open-sourced dataset and model for collaborative research and development.
- Extensive Language Coverage: Supports 101+ languages, half of which were previously underserved by existing language models.
- State-of-the-Art Performance: Achieves top performance across numerous multilingual benchmarks.
- Community-Driven Innovation: Fosters collaboration among researchers worldwide, leading to rapid advancements in multilingual AI.
- Accessibility: Offers accessible and efficient multilingual models like Aya Expanse 8B for broader use.