About
Ichigo is a user-friendly speech package for developers who need efficient, accurate speech processing. It brings the core speech tasks together in one package: Automatic Speech Recognition (ASR), a Text-to-Speech (TTS) feature that is coming soon, and an experimental Speech Language Model (Ichigo-LLM). The models are available through straightforward Python interfaces or a scalable FastAPI service, so developers can skip the complexities of audio processing and concentrate on their own projects.
Highlights
Ichigo focuses on accessibility and simplicity. Its ASR component, Ichigo-ASR, is a lightweight speech tokenizer that uses the Whisper-medium model and is optimized for multilingual applications. With just 22 million parameters, Ichigo-ASR converts speech into discrete tokens, which makes its output compatible with large language models for immediate speech understanding. The model is trained on over 400 hours of English data and 1000 hours of Vietnamese data.
Batch processing lets developers transcribe multiple audio files with a single line of code, while the results stay alongside the audio they came from. For example, users can transcribe a single audio file and have the transcription saved automatically to a text file in the same directory. This makes it practical to add speech capabilities to an existing system without a complex setup.
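The batch workflow described above can be sketched roughly as follows. This is an illustration only, not Ichigo's actual API: the function `transcribe_folder`, its `transcribe_file` parameter, the default extensions, and the output file name `transcription.txt` are all hypothetical names chosen for the example. The transcriber is passed in as a plain callable, so any ASR function that maps an audio path to text could be plugged in.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def transcribe_folder(
    folder: str,
    transcribe_file: Callable[[Path], str],  # hypothetical: any path -> text function
    extensions: Iterable[str] = (".wav", ".mp3", ".flac"),  # assumed defaults
    output_name: str = "transcription.txt",  # assumed output file name
) -> Dict[str, str]:
    """Transcribe every matching audio file in `folder`.

    Returns a mapping of file name to transcription and writes the same
    information, one tab-separated line per file, to `output_name` inside
    the folder.
    """
    root = Path(folder)
    allowed = {ext.lower() for ext in extensions}

    # Sort the files so the output order is stable across runs.
    audio_files = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in allowed
    )

    results = {p.name: transcribe_file(p) for p in audio_files}

    # Keep the transcriptions next to the audio they came from.
    lines = [f"{name}\t{text}" for name, text in results.items()]
    content = "\n".join(lines) + ("\n" if lines else "")
    (root / output_name).write_text(content, encoding="utf-8")
    return results
```

With a real transcriber supplied as `transcribe_file`, the call site reduces to one line, for example `transcribe_folder("recordings", my_transcriber)`, where `my_transcriber` stands for whichever ASR function is in use.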