What languages are live right now
This page shows what is available now and what is still being built. Pipeline work is visible for planning, but it is not counted as live.
Languages now
11
Available now
Dialect clusters
37
With dialect-level metadata
Text tokens
2.4B
Across released packages
Speech hours
18.6k
Segmented + transcript-audited
Where we are live, and where we are still building
Five regions are tracked today, with the deepest coverage in the East and Northeast.
East
Bengali, Odia, Maithili, Bhojpuri, Santali
Northeast
Assamese, Meitei, Khasi (pipeline), Bodo (pipeline)
North
Hindi, Punjabi (pipeline)
West
Marathi, Gujarati (pipeline)
South
Tamil, Telugu, Kannada (pipeline), Malayalam (pipeline)
Languages you can review now
For each language we show format, size, license tier range, and QA status.
Languages planned next
Useful for planning, but not a promise until source review is done.
How a language enters the catalog
- Collected from partners or consented contributors
- Dialect labels added where they matter
- Source docs tied to the release record
- Real sample review before a contract is signed
How releases are versioned
- Every release receives a version ID and manifest hash
- Corrections ship as new versions, never silent overwrites
- Buyers get a change note when affected fields or files change
- Older versions stay visible for comparison and audit
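As a rough illustration of the versioning rules above, here is a minimal sketch of how a manifest hash and version ID could fit together. The hashing scheme (SHA-256 over a sorted JSON listing of per-file digests), the `language-vN` ID format, and the function names `manifest_hash` and `release_record` are all assumptions made for this example, not the actual release tooling.

```python
import hashlib
import json


def manifest_hash(files: dict[str, bytes]) -> str:
    """Hash a manifest of per-file SHA-256 digests (hypothetical scheme)."""
    entries = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    # Sorted keys and fixed separators make the hash independent of file order.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def release_record(language: str, version: int, files: dict[str, bytes]) -> dict:
    """Build a release record; the ID format is an assumption for illustration."""
    return {
        "language": language,
        "version_id": f"{language}-v{version}",
        "manifest_hash": manifest_hash(files),
    }


# A correction ships as a new version; the old record is kept, not overwritten.
v1 = release_record("bengali", 1, {"train.jsonl": b"old transcript\n"})
v2 = release_record("bengali", 2, {"train.jsonl": b"corrected transcript\n"})
history = [v1, v2]
```

Because any change to file contents changes the manifest hash, comparing the hashes of two records in `history` is enough to tell whether a correction touched the released files.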
Request coverage for your exact language mix.
We send a CSV with languages, format, volume, license tier, and QA status.
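If you plan to load the coverage CSV programmatically, a sketch along these lines may help. The column names (`language`, `format`, `volume`, `license_tier`, `qa_status`) are guessed from the fields listed above, and the sample rows are placeholder values, not real catalog figures; check the header of the file you receive.

```python
import csv
import io

# Assumed column names, derived from the fields named on this page.
EXPECTED = ["language", "format", "volume", "license_tier", "qa_status"]

# Placeholder rows for illustration only; not real catalog data.
SAMPLE = """language,format,volume,license_tier,qa_status
Bengali,text,EXAMPLE-VOLUME,EXAMPLE-TIER,passed
Assamese,speech,EXAMPLE-VOLUME,EXAMPLE-TIER,in_review
"""


def load_coverage(text: str) -> list[dict]:
    """Parse the coverage CSV and fail early if an expected column is missing."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(EXPECTED) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


rows = load_coverage(SAMPLE)
# Keep only languages whose QA status is marked as passed (assumed label).
passed = [row["language"] for row in rows if row["qa_status"] == "passed"]
```

Validating the header first means a renamed or missing column surfaces as a clear error instead of a silent `KeyError` later in your pipeline.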