
NVIDIA’s Project Mellon adds natural language commands to interactive applications. It is a lightweight Python package that combines large language models (LLMs) with NVIDIA speech AI so that users can operate software by speaking to it instead of working through menus and buttons.
Vendor
NVIDIA
NVIDIA Project Mellon is a lightweight Python package that harnesses large language models (LLMs) and speech AI to transform user experiences. It adds natural language commands to interactive applications, enabling users to control complex applications with their voice. Project Mellon uses NVIDIA Riva for automatic speech recognition (ASR) and text-to-speech (TTS), and large language models, such as those available through NVIDIA NeMo, for natural language understanding (NLU).
Features
- Zero-Shot Language Models: No language model training is required.
- Python API: Issue commands and parameters to the application’s native fulfillment logic.
- Multi-LLM Support: Easy to use with multiple large language models.
- Natural Language Commands: Allows a broader group of users to use the application.
- Multi-Language Support: Extend speech control to English, Spanish, German, and Russian using NVIDIA Riva.
- Remote Services: Use remote services for ASR, TTS, and NLU, with a small local Python package.
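To illustrate the idea behind the Python API feature, here is a minimal sketch of how a command recognized by the NLU step might be routed to an application's own fulfillment logic. It does not use Project Mellon's actual API: the `CommandDispatcher` class, the `set_color` handler, and the shape of the `parsed` dictionary are all hypothetical names chosen for this example, and the ASR and LLM steps are replaced by a hard-coded result.

```python
from typing import Any, Callable, Dict


class CommandDispatcher:
    """Hypothetical registry mapping command names to application handlers.

    This is an illustration only, not Project Mellon's real interface.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a handler under a command name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._handlers[name] = func
            return func
        return decorator

    def dispatch(self, command: Dict[str, Any]) -> Any:
        """Call the handler named in `command` with its parameters."""
        name = command.get("command")
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"No handler registered for command: {name!r}")
        return handler(**command.get("parameters", {}))


dispatcher = CommandDispatcher()


@dispatcher.register("set_color")
def set_color(target: str, color: str) -> str:
    # The application's native fulfillment logic would go here.
    return f"{target} is now {color}"


# Assumed stand-in for what an NLU step might produce for the
# utterance "make the car red"; the real output format may differ.
parsed = {"command": "set_color", "parameters": {"target": "car", "color": "red"}}

result = dispatcher.dispatch(parsed)
print(result)
```

Keeping the handlers as ordinary application functions means the language layer only has to produce a command name and its parameters; the application code itself does not need to know that the request came from speech.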
Benefits
- Simplified User Experience: Users can navigate complex GUIs with voice commands.
- Enhanced Immersion: Replaces intrusive GUIs with voice commands for deeper immersion in XR applications.
- Broad Accessibility: Enables a wider range of users to interact with applications.
- Creative Interactions: Frees users from traditional button-and-menu-driven interfaces, enhancing creativity.
- Flexible Integration: Easily integrate natural language commands into existing applications.