Building an IoT Device with Google Assistant

With companies opening access to their assistants' APIs (Alexa, Cortana, Google Assistant, Siri), IoT enthusiasts have endless opportunities to build a really smart home. Automating the house is nice, but giving it ears and a voice (and access to your home automation API) is truly amazing. Almost frightening! In this article we'll explore the Google Assistant SDK and find ways to wire it up to our home management code.

Google Assistant

I chose Google Assistant to play with, as it's probably the most advanced of the four well-known assistants. There are numerous comparisons out there saying it's the best (at the time of writing, at least), and I also feel safe having my assistant backed by… Google.

Getting Started With SDK

You can start experimenting with the Google Assistant SDK on your Ubuntu desktop, where you presumably have audio (recording and playback) properly set up. Head to the SDK Overview and choose a way to interact with the API: either the Python library or the service. Interacting via the service lets you choose among wrappers for different languages; I started with the npm package google-assistant-node. Try to get comfortable with the SDK, that is, actually talk to the Assistant and get a meaningful response. With google-assistant-node, if I turn on logging I get:
$ node mic-speaker.js
Say something!
Transcription: who  --- Done: false
Transcription: Who Am  --- Done: false
Transcription: who am I  --- Done: false
Transcription: who am I  --- Done: false
Transcription: who am I  --- Done: true
Assistant Text Response: I remember you telling me your name was Maxim
Assistant Speaking
Conversation Complete
Assistant Finished Speaking
Once you get something like this on your platform of choice, you're ready to move on to the real Assistant device setup.
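Whichever wrapper you use, the transcription arrives as a stream of interim results like the log above, and your code usually only cares about the final one. Here is a minimal sketch of that selection step; the event shape (a dict with `transcription` and `done` keys) is hypothetical, modeled on the log output rather than on any particular wrapper's API:

```python
# Hypothetical event shape, modeled on the log output above: each
# transcription event carries the text so far plus a "done" flag that
# marks the final result.
def final_transcription(events):
    """Return the first finalized transcript, or None if none arrived."""
    for event in events:
        if event["done"]:
            return event["transcription"]
    return None

events = [
    {"transcription": "who", "done": False},
    {"transcription": "Who Am", "done": False},
    {"transcription": "who am I", "done": True},
]
print(final_transcription(events))  # -> who am I
```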

Setting Up Audio On Your Device

As I didn't have a RasPi 3 at my disposal, I used my NanoPi M3 as the prototype. With a RasPi 3 you probably won't face any of the problems I faced with the M3, but I'll describe the general process of setting up audio on a Linux machine. Basically, you shouldn't have to do anything to set up audio – it should work out of the box. The following instructions are mostly for cases when something is not working. First, get acquainted with your devices. List your microphones:
fa@NanoPi-M3:~$ sudo arecord --list-devices
**** List of CAPTURE Hardware Devices ****
card 0: nanopi3audio [nanopi3-audio], device 0: c0055000.i2s-ES8316 HiFi ES8316 HiFi-0 []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
List your speakers:
fa@NanoPi-M3:~$ sudo aplay -L
null
    Discard all samples (playback) or generate zero samples (capture)
pulse
    PulseAudio Sound Server
default:CARD=nanopi3audio
    nanopi3-audio,
    Default Audio Device
sysdefault:CARD=nanopi3audio
    nanopi3-audio,
    Default Audio Device
Try some generic audio tests:
$ speaker-test -t wav -c 2
Try audio capture tests:
$ arecord -vv -f dat /dev/null
(you should see some meter activity). If any of your speakers or microphones are not working, it's time to open up alsamixer. Just type the command alsamixer and you should see the control panel. Flip the switches, twiddle the knobs, unmute everything, and rerun the tests. Repeat until you get your audio working. Finally, once your capture and playback devices are ready, note their names and create an .asoundrc config inside your user home directory. As my working devices were called sysdefault, I came up with the following:
pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}
pcm.mic {
  type plug
  slave {
    pcm "sysdefault"
  }
}
pcm.speaker {
  type plug
  slave {
    pcm "sysdefault"
  }
}
Reboot after changing/creating this file – it took me quite a while to find out that the changes won't take effect otherwise. Some more useful info can be found here: https://developers.google.com/assistant/sdk/guides/library/python/embed/audio This config will let the Google Assistant library (or service) properly use your audio.
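To sanity-check that the `default` device from your .asoundrc actually plays sound, it helps to have a known-good audio file. The snippet below generates a one-second 440 Hz test tone using only the Python standard library (the filename and tone parameters are arbitrary choices for illustration):

```python
import math
import struct
import wave

RATE = 44100   # frames per second
SECONDS = 1
FREQ = 440.0   # A4 test tone

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)   # mono
    w.setsampwidth(2)   # 16-bit signed samples
    w.setframerate(RATE)
    frames = bytearray()
    for i in range(RATE * SECONDS):
        # Half-amplitude sine wave, packed as little-endian int16.
        sample = int(32767 * 0.5 * math.sin(2 * math.pi * FREQ * i / RATE))
        frames += struct.pack("<h", sample)
    w.writeframes(bytes(frames))
```

Then play it through the device your config defines: `aplay -D default tone.wav`. If you hear the tone, playback is wired up correctly.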

Google Assistant Device

OK, you've got your audio working – it's time to register the device as a Google Assistant device so that you can use its traits and actions. I haven't found any GUI to do this – there's no such thing in the Google dashboard or elsewhere. The only way to register a device is via the API. There are several guides on this, but here is an easy one: switch to Python and use the Google Assistant gRPC example. This example includes every step required to register a device and play with it: setting up a Python 3 environment, installing the required dependencies, the Google OAuth procedure (involving a client_secret_XXXXX.json file), etc. The device registration command described in the example looks like:
python -m devicetool register --model 'my-model-identifier' \
                              --type LIGHT --trait action.devices.traits.OnOff \
                              --manufacturer 'Assistant SDK developer' \
                              --product-name 'Assistant SDK light' \
                              --description 'Assistant SDK light device' \
                              --device 'my-device-identifier' \
                              --nickname 'My Assistant Light'
After registering you'll be able to refer to your device via its device id (the one you set with the --device option). Now explore both the gRPC and library samples: try python -m pushtotalk --device-id 'my-device-identifier', then switch to the library example and try python -m hotword. Also try to add some custom logic as a reaction to detected device actions.
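That custom logic boils down to mapping incoming device action commands (such as the OnOff trait registered above) to Python callbacks. The sketch below is a self-contained, hand-rolled stand-in for that pattern – the registry here is written from scratch for illustration and is not the SDK sample's actual helper class, and the `light` dict stands in for real GPIO output:

```python
# Hand-rolled command registry, illustrating how a device action name
# can be routed to a Python callback. Not the SDK's actual API.
handlers = {}

def command(name):
    """Decorator that registers a callback for a device command name."""
    def register(fn):
        handlers[name] = fn
        return fn
    return register

light = {"on": False}  # stand-in for a real GPIO pin

@command("action.devices.commands.OnOff")
def onoff(on):
    """Handle the OnOff trait registered for the device above."""
    light["on"] = on
    print("Light is now", "on" if on else "off")

def dispatch(command_name, params):
    """Invoke the handler for a parsed device action payload."""
    handlers[command_name](**params)

dispatch("action.devices.commands.OnOff", {"on": True})
```

Saying "turn on the light" to the Assistant would then end with a `dispatch` call like the one on the last line, and the handler is where your GPIO (or any home automation API) call goes.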

See It In Action

There will be no 'see it in action' section for now, as the NanoPi M3 I used as a prototype has no GPIO library for Python, so I can't add any meaningful handling for the device actions. As soon as I get my RasPi 3, I'll update this article with a working repo.
