I constantly come across people who claim that Alexa (Amazon Echo) is spying on you. They argue that Alexa is either transcribing everything you say and sending the resulting plain text to Amazon or, even worse, simply recording audio and sending the files.
I personally find it hard to believe. If Amazon were doing that, the reputational cost of getting caught would be huge. Furthermore, recording, storing, and extracting valuable information from such large amounts of data is a daunting task, even for a company like Amazon. Finally, many of those who complain about Alexa happily carry a mobile phone at all times. Not only does a phone follow you around almost 24/7, it also comes with many more sensors than just a microphone, such as cameras and GPS.
In any case, I wanted to take a stab at figuring out whether Alexa is really spying on us. Amazon has explained over and over that, although Alexa is listening at all times (which is necessary to wake up on the magic keyword “Alexa”), it only sends data to Amazon once that keyword has been heard.
How did I go about this?
One way to investigate this is to listen to the network packets that Alexa sends. Unfortunately for us, we can’t see the actual packet contents because they are encrypted. But we can still look at how much data is sent, and when.
If Alexa is really spying on us, there are two likely scenarios:
- Alexa transcribes everything she hears and sends the text to Amazon.
- Alexa records audio and sends the raw files to Amazon.
The major advantage of the first approach is that text can be encoded and compressed very efficiently, making its network traffic easy to disguise and conceal. It also has a major drawback: sounds such as a dog barking, or speech in foreign languages, cannot be easily transcribed, which means that Alexa would be deaf to them.
The second approach doesn’t suffer from this limitation, but at a great cost. Audio is relatively heavy, which makes it much more difficult to conceal than text.
In either case, I ran a small experiment:
During an 8-hour period when my home was empty, I set up my router to capture all outbound Alexa packets. During 4 of those 8 hours, I played a podcast in the background using a speaker connected to an old phone. During the remaining 4 hours, the house was silent.
The reasoning is simple: if Alexa is really spying on us, the network traffic would likely differ between the 4 hours when the house was silent and the 4 hours when a podcast was playing in the background. In principle, we can assume that Alexa can’t tell a real conversation from a podcast, so she would send the text transcription or the sound recording over to Amazon.
I ran tcpdump on my router to capture the packets, then parsed and plotted the results using Python.
As expected, I found no significant difference in Alexa’s network activity between the silent period and the period with the podcast playing in the background.
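The comparison between the two phases can be sketched roughly as follows. This is only an illustration, not the script used for the experiment: the packet records are made-up values, the tuple layout `(seconds since start, protocol, bytes)` is an assumed intermediate format rather than raw tcpdump output, and the assumption that the podcast phase came first is mine. Plotting is left out to keep the sketch dependency-free.

```python
from collections import defaultdict

# Hypothetical records extracted from a capture:
# (seconds since capture start, protocol, payload bytes).
# These values are invented for illustration; they are NOT the real measurements.
SAMPLE_PACKETS = [
    (120.0, "TCP", 310),
    (3600.5, "UDP", 76),
    (9000.0, "TCP", 1200),
    (15000.2, "TCP", 290),
    (20000.0, "UDP", 76),
    (27000.9, "TCP", 1180),
]

PHASE_SECONDS = 4 * 3600  # each phase lasted 4 hours


def phase_of(timestamp):
    """Label a packet by phase (assumption: the podcast phase came first)."""
    return "podcast" if timestamp < PHASE_SECONDS else "silence"


def bytes_per_phase(packets):
    """Sum payload bytes for every (phase, protocol) pair."""
    totals = defaultdict(int)
    for timestamp, protocol, length in packets:
        totals[(phase_of(timestamp), protocol)] += length
    return dict(totals)


if __name__ == "__main__":
    for (phase, protocol), total in sorted(bytes_per_phase(SAMPLE_PACKETS).items()):
        print(f"{phase:8s} {protocol}: {total} bytes")
```

If the per-phase totals for each protocol come out roughly equal, as they do in this toy data, there is no sign that background speech triggers extra uploads.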
If Alexa were sending audio recordings, it would likely use UDP. However, ~165 KB of data is nowhere near enough for a 4-hour audio recording.
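To see how far off this is, here is a quick back-of-the-envelope check. The 165 KB and 4-hour figures come from the measurement above; the 6 kbit/s codec rate is my own assumption, picked as a deliberately low bitrate for speech, so treat the result as an order-of-magnitude estimate.

```python
CAPTURED_BYTES = 165_000     # ~165 KB observed during one 4-hour phase
DURATION_SECONDS = 4 * 3600  # 4 hours
ASSUMED_CODEC_BPS = 6_000    # assumption: a very low-bitrate speech codec (~6 kbit/s)


def implied_bitrate_bps(total_bytes, seconds):
    """Average bitrate in bit/s if every captured byte were audio."""
    return total_bytes * 8 / seconds


def audio_size_bytes(bitrate_bps, seconds):
    """Bytes needed to send audio at the given bitrate for the given duration."""
    return bitrate_bps * seconds / 8


if __name__ == "__main__":
    implied = implied_bitrate_bps(CAPTURED_BYTES, DURATION_SECONDS)
    needed = audio_size_bytes(ASSUMED_CODEC_BPS, DURATION_SECONDS)
    print(f"implied bitrate: {implied:.1f} bit/s")
    print(f"needed at the assumed codec rate: {needed / 1e6:.1f} MB")
```

Under these assumptions the captured traffic averages under 100 bit/s, while even a very aggressive speech codec would need around 10 MB for the same 4 hours.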
Similar reasoning applies if we assume Alexa is sending plain text (likely over TCP). During those 4 hours, a professional speaker (~160 words per minute) would have spoken about 38K words. In English, the average word is 4.79 letters long. With a 7-bit encoding, that comes to roughly 161 KB uncompressed (letters only, ignoring spaces and punctuation), or about 82 KB if we assume the text compresses to roughly half its size.
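The estimate can be reproduced with a few lines of Python. The speaking rate, word length, and 7-bit encoding are the figures from the paragraph above; the sketch counts letters only (no spaces or punctuation), and the compression ratio of 2 is an assumption of mine, chosen only to show how the uncompressed size relates to a figure in the ~80 KB range.

```python
HOURS = 4
WORDS_PER_MINUTE = 160           # professional speaker
LETTERS_PER_WORD = 4.79          # average English word length
BITS_PER_LETTER = 7              # 7-bit encoding
ASSUMED_COMPRESSION_RATIO = 2.0  # assumption: text shrinks to about half its size


def transcript_bytes(hours, wpm, letters_per_word, bits_per_letter):
    """Uncompressed size in bytes of a letters-only transcript."""
    words = hours * 60 * wpm
    bits = words * letters_per_word * bits_per_letter
    return bits / 8


if __name__ == "__main__":
    raw = transcript_bytes(HOURS, WORDS_PER_MINUTE, LETTERS_PER_WORD, BITS_PER_LETTER)
    print(f"words spoken:   {HOURS * 60 * WORDS_PER_MINUTE}")
    print(f"uncompressed:   {raw / 1000:.0f} KB")
    print(f"compressed 2:1: {raw / ASSUMED_COMPRESSION_RATIO / 1000:.0f} KB")
```

Counting spaces and punctuation would push the size up, and a better compressor would pull it down, so the result is best read as a rough range rather than a precise number.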
tcpdump reported ~110 KB of TCP traffic for those 4 hours. A full transcript fitting in there is possible but unlikely: TCP is also used for other types of traffic, which have to share those 110 KB. Also, as explained above, text transcripts would be limited to English; speaking Spanish would thwart Amazon’s evil spying plan.
Proving that something is not happening is quite hard. In light of these results, someone could argue: What if Alexa is throttling network traffic to avoid being discovered? What if Alexa is using a radio module and sending the results over AM frequencies instead of the Internet?
Sure, these are plausible scenarios, but they violate Occam’s razor: the more assumptions an explanation requires, the less likely it is. My conclusion is that Alexa is not spying on me, and I will continue to use it at home as comfortably as I use my cell phone.
That Alexa is not spying on the general public is, with this data, probably true. However, I am also quite confident that Alexa could be enabled (remotely) to record and report everything to Amazon upon government request. This is just my opinion and I could be totally wrong.