Google Speech Recognition API in Unity (C#)

With Virtual Reality finally taking its place on deck, the maturity Speech Recognition has reached over the past decade or so could not have come at a more opportune time. New technologies like VR require a shift in the way we interact with the systems we use, and as anybody who has read Neal Stephenson knows, the Diamond Age of immersion will come when the characters we’re interacting with feel real to us. That realism requires a bit more than just pointing at a menu and clicking a button, which is why I’ve been playing around with getting Speech Recognition to work in Unity.


Links:
– The code.
– You’ll need a Speech API key from here. Last time I checked, you were limited to 50 requests a day. I haven’t messed with the credentials yet.


The code found here uses the Google Speech Recognition API and works in five basic steps.

  1. Record audio from your microphone.
  2. Convert audio to a temporary WAV file.
  3. Upload the WAV file to the Speech API server.
  4. Receive a JSON response from the API server.
  5. Parse the JSON string for your Transcription.
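
To make steps 2 and 5 a little more concrete, here is a rough sketch of the two pure-data pieces: wrapping raw samples in a WAV container and pulling the transcription out of the JSON reply. This is not the code from the project. The class and method names are made up for illustration, the audio is assumed to be mono 16-bit PCM, and the reply is assumed to contain a `"transcript":"..."` pair somewhere in it. The upload itself is left out because it depends on the endpoint and key you are using.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; none of these names come from the demo project.
public static class SpeechSketch
{
    // Step 2: wrap mono float samples (range -1..1) in a 16-bit PCM WAV container.
    public static byte[] EncodeWav(float[] samples, int sampleRate)
    {
        const short channels = 1;        // assumption: mono audio
        const short bitsPerSample = 16;  // assumption: 16-bit PCM
        int dataSize = samples.Length * 2;

        using (var stream = new MemoryStream(44 + dataSize))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(36 + dataSize);                              // bytes remaining after this field
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(16);                                         // fmt chunk size for PCM
            writer.Write((short)1);                                   // audio format 1 = PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(sampleRate * channels * bitsPerSample / 8);  // byte rate
            writer.Write((short)(channels * bitsPerSample / 8));      // block align
            writer.Write(bitsPerSample);
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataSize);

            foreach (float sample in samples)
            {
                // Clamp so out-of-range input cannot overflow the 16-bit value.
                float clamped = Math.Max(-1f, Math.Min(1f, sample));
                writer.Write((short)(clamped * short.MaxValue));
            }

            writer.Flush();
            return stream.ToArray();
        }
    }

    // Step 5: return the first "transcript" value found in the reply,
    // or an empty string if there is none. Escape sequences inside the
    // value are returned as-is; a real JSON parser would decode them.
    public static string ExtractTranscript(string json)
    {
        Match match = Regex.Match(json, "\"transcript\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

In Unity, the `samples` array would typically come from `AudioClip.GetData`, and the bytes returned by `EncodeWav` are what you would write to the temporary file before uploading. The regular expression is only a shortcut for a quick test; a proper JSON parser is the safer choice for anything beyond that.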


A typical disclosure for me:
I don’t necessarily know what I’m doing, I just have a tendency to not give up until it works. The code I’ve written here could probably be done in a much prettier manner.
As far as I know, this should be cross-platform compatible. I’ve used it successfully on Android and Windows. I don’t currently have the resources to test it on iOS.


To Use the Demo Project:

  1. Place the Google Speech API key you got from here in the appropriate field in the Inspector for the Speech GameObject.
  2. Play your scene.
  3. Press the Record GUI button.
  4. You have seven seconds to speak. Note: Google Speech accepts a WAV file of up to ten seconds, but the microphone is set to record only seven because we’re not trimming silence from the recording.
  5. Click Stop and Play when you’re done.
  6. Read your voice transcript in the text box.
  7. Tada!
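
For anyone curious how the seven-second cap in step 4 might look in code, here is a minimal sketch using Unity’s `Microphone` class. It is not the script from the demo project: the component name, button layout, and the 16000 Hz sample rate are all assumptions made for illustration.

```csharp
using UnityEngine;

// Hypothetical component; the demo project's own script may be organised differently.
public class SevenSecondRecorder : MonoBehaviour
{
    const int MaxSeconds = 7;      // recording cap described in the steps above
    const int SampleRate = 16000;  // assumed rate; use whatever your upload expects

    AudioClip clip;

    void OnGUI()
    {
        bool recording = Microphone.IsRecording(null); // null selects the default microphone

        if (!recording && GUI.Button(new Rect(10, 10, 120, 40), "Record"))
        {
            // Non-looping clip: recording stops on its own once MaxSeconds have elapsed.
            clip = Microphone.Start(null, false, MaxSeconds, SampleRate);
        }
        else if (recording && GUI.Button(new Rect(10, 10, 120, 40), "Stop"))
        {
            Microphone.End(null);
        }
    }

    // Copies the recorded samples out of the clip, or returns an empty array
    // if nothing has been recorded yet.
    public float[] GetSamples()
    {
        if (clip == null) return new float[0];

        float[] samples = new float[clip.samples * clip.channels];
        clip.GetData(samples, 0);
        return samples;
    }
}
```

Because the clip is allocated at the full seven seconds, stopping early leaves silence at the end of the buffer, which is why the untrimmed recording needs to stay under the ten-second limit.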


References and Source Code used:
– The code used to save your audio to WAV.
– The code used to parse the response from Google’s servers.