Spoken language understanding

We offer a suite of speech and language solutions built on top of VSR (Vernacular Speech Recognition), a speech-to-text service powered by state-of-the-art deep learning techniques. Combined with our natural language understanding (NLU) technology, it forms the foundation of our conversational AI platform.

Multilingual Speech Recognition and Code-switch support

Our speech-to-text software is trained on a speech corpus of over 1 million hours of data covering diverse speaker attributes such as gender, age, and region.

  • Multilingual support for 10 languages and 100 dialects
  • Recognises code-switched utterances with accurate multilingual recognition

Streaming and synchronous recognition

  • Streaming speech recognition publishes the transcription in real-time
  • Synchronous speech recognition suits cases where delayed processing is acceptable, for example, generating a transcription for your next podcast
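
The difference between the two modes can be sketched as follows. This is only an illustration under stated assumptions: `fake_recognize_chunk`, `transcribe_streaming`, and `transcribe_sync` are hypothetical names, not part of the VSR API, and the "audio chunks" are plain strings standing in for real audio.

```python
from typing import Iterable, Iterator, List


def fake_recognize_chunk(chunk: str) -> str:
    """Hypothetical stand-in for a recognizer: each 'audio chunk' is already text."""
    return chunk.strip()


def transcribe_streaming(chunks: Iterable[str]) -> Iterator[str]:
    """Yield a growing partial transcript as each chunk arrives."""
    words: List[str] = []
    for chunk in chunks:
        words.append(fake_recognize_chunk(chunk))
        yield " ".join(words)


def transcribe_sync(chunks: Iterable[str]) -> str:
    """Process the whole input first, then return one final transcript."""
    return " ".join(fake_recognize_chunk(c) for c in chunks)
```

A caller of the streaming variant can display each partial result as soon as it is yielded, while the synchronous variant returns only after all input has been processed.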

Short and long duration speech input support

  • Short audio format for voice assistants - Get transcription for user commands
  • Long audio format for processing phone call conversations - For a custom implementation of speech analysis

Speaker characteristics

  • Configure and receive certain speaker characteristics such as speaker gender, age, and dialect or region
  • Differentiate between multiple speakers and provide annotated transcription according to the individual speakers

Generate Insights with Keyword spotting

Detect a set of custom phrases in your speech input so that you can generate insights and act on them in real time
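
As a rough illustration of the idea, the sketch below spots custom phrases in a transcript that has already been produced. It is an assumption-laden simplification: `spot_keywords` is a hypothetical helper, not a VSR function, and a production keyword spotter may work on the audio itself rather than on text.

```python
import re
from typing import Dict, Iterable, List


def spot_keywords(transcript: str, phrases: Iterable[str]) -> Dict[str, List[int]]:
    """Return each phrase found in the transcript with its character offsets."""
    hits: Dict[str, List[int]] = {}
    for phrase in phrases:
        # Match whole words only, ignoring case.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        offsets = [m.start() for m in pattern.finditer(transcript)]
        if offsets:
            hits[phrase] = offsets
    return hits
```

Phrases that never occur are simply absent from the result, so a caller can trigger an action for each key that is present.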

Intent identification

  • Identify the real meaning of the user's speech input as per your domain, not just the plain transcription

Sentiment analysis

  • Analyzes the speech input in real time using over 50 signals, such as pitch and speech rate, to infer the sentiment and emotion of the speaker
  • If the conversation needs a unique solution or needs human empathy, it can transfer the call in real-time to an appropriate agent with the call history

Robust to disfluencies

  • Our speech recognition service has built-in noise cancellation that works across multiple environments and audio file formats without losing accuracy
  • Handles common speech disfluencies such as false starts, stuttering, and repetitions
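
To make the notion of disfluency handling concrete, here is a minimal text-level sketch. It is not how VSR works internally: `remove_disfluencies` and the `FILLERS` list are hypothetical, and the sketch covers only filler words and immediate word repetitions, not false starts or acoustic stuttering.

```python
import re

FILLERS = {"um", "uh", "er", "hmm"}  # illustrative list, not exhaustive


def _normalize(word: str) -> str:
    """Lowercase a word and strip punctuation so comparisons ignore both."""
    return re.sub(r"[^\w']", "", word).lower()


def remove_disfluencies(transcript: str) -> str:
    """Drop filler words and collapse immediate word repetitions."""
    cleaned = []
    for word in transcript.split():
        token = _normalize(word)
        if token in FILLERS:
            continue  # skip filled pauses such as "um"
        if cleaned and _normalize(cleaned[-1]) == token:
            continue  # skip a word that repeats the previous kept word
        cleaned.append(word)
    return " ".join(cleaned)
```

Because fillers are dropped before the repetition check, a pattern like "I um I want" also collapses to "I want".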

Dialog management

  • Enables a multi-turn end-to-end conversation between VIVA and humans, keeping track of the state of the dialog with proper context while taking input from NLU/ASR components and deciding what to do next
  • Handles both linear and non-linear explorations of a conversation
  • Updates the dialog structure personalizing it for the speaker as per the current context
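
The state tracking described above can be sketched with a simple slot-filling loop. Everything here is an assumption made for illustration: the booking domain, `REQUIRED_SLOTS`, `DialogState`, and the action names are hypothetical and do not describe VIVA's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict

REQUIRED_SLOTS = ("date", "time", "party_size")  # hypothetical booking domain


@dataclass
class DialogState:
    """Tracks the slot values collected so far across turns."""
    slots: Dict[str, str] = field(default_factory=dict)

    def update(self, nlu_output: Dict[str, str]) -> None:
        """Merge slot values from the latest turn; later values overwrite earlier ones."""
        self.slots.update(nlu_output)

    def next_action(self) -> str:
        """Ask for the first missing slot, or confirm once all are filled."""
        for slot in REQUIRED_SLOTS:
            if slot not in self.slots:
                return f"ask_{slot}"
        return "confirm_booking"
```

Because slots can be filled in any order and overwritten later, the same loop handles a user who answers out of sequence or changes an earlier answer, which is one simple reading of a non-linear conversation.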

0.1M hrs of speech data

Trained on a diverse range of speakers across demographics

10 language support

Supports all major languages and dialects in the country

100+ dialect support

Covers regional variance of speakers with better accuracy
