Two related links, both about using your phone to shoulder-surf your passwords. Both attacks take advantage of the fact that smartphones with accurate accelerometers are now ubiquitous. By monitoring the phone's vibrations, the attacks infer which keys were pressed on a keyboard. Both are much more proofs of concept than sophisticated attacks, but they are interesting nonetheless.
At HotSec '11, Liang Cai and Hao Chen of UC Davis were able to infer which key was pressed on an onscreen keyboard with 70% accuracy. By measuring how far the phone was torqued around both the X and Y axes, they could infer where force was applied, and thus which key was pressed. Cai and Chen made the task a bit easier for themselves. They held the phone in landscape mode, which spreads the keys out more and so produces a wider range of torques to measure. That's not necessarily a problem, since many people type in landscape mode. The bigger simplification was that they only looked at touches on the dialing pad. A more interesting paper would have attacked the alphabetical keyboard instead. I understand why they didn't: the experiment was to find out whether someone could use the accelerometers to read key presses with high enough accuracy. Looking at their confusion matrix, I think decoding alphabetical keyboard presses would need a two-step solution. First, you'd get a probability distribution over which key was pressed. You'd then combine these distributions with a Markov chain language model to determine what was actually typed. “it was the durst of timez” becomes a bit more Dickensian, a little less crappy rap-rock, and a lot less monkey.
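The two-step decoding suggested above can be sketched as a Viterbi search over a character-level Markov chain. This is a minimal Python sketch, not Cai and Chen's method: it assumes some classifier already turns each accelerometer reading into a probability distribution over keys (the `presses` values below are made up), and it trains a toy bigram model on a hypothetical one-line corpus. The helper names `train_bigram` and `viterbi` are illustrative only.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train_bigram(text, alpha=1.0):
    """Character bigram model P(cur | prev) with add-alpha smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for prev, cur in zip(text, text[1:]):
        counts[prev][cur] += 1
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev].values()) + alpha * len(ALPHABET)
        model[prev] = {cur: (counts[prev][cur] + alpha) / total for cur in ALPHABET}
    return model

def viterbi(press_dists, bigram, start=" "):
    """Most likely key sequence given per-press key distributions.

    press_dists: list of dicts mapping key -> P(key | sensor reading).
    Simplification: the classifier's output is used directly as the
    emission score. The start state " " means "a word begins here".
    """
    # best[k] = (log score, decoded path) of the best path ending in key k
    best = {start: (0.0, "")}
    for dist in press_dists:
        new_best = {}
        for cur, p_obs in dist.items():
            if p_obs <= 0:
                continue
            # Extend every surviving path by `cur` and keep the best one.
            new_best[cur] = max(
                (score + math.log(bigram[prev][cur]) + math.log(p_obs), path + cur)
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

corpus = "it was the best of times it was the worst of times "
bigram = train_bigram(corpus)
# Hypothetical classifier output for three presses of "the": each press is
# confused with a neighbouring key, which is slightly favoured every time.
presses = [{"t": 0.45, "r": 0.55}, {"h": 0.45, "g": 0.55}, {"e": 0.45, "w": 0.55}]
naive = "".join(max(d, key=d.get) for d in presses)  # pick each key independently
decoded = viterbi(presses, bigram)
print(naive, decoded)  # prints: rgw the
```

Picking the most likely key for each press on its own gives “rgw”; the language model pulls the sequence back to “the”. A real attack would need a far larger corpus and probably a higher-order model.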
Of course, sniffing the phone's own keyboard is one thing; figuring out what someone is typing on their laptop or desktop is something else, but that's exactly what Philip Marquardt and others at Georgia Tech did. In their work published at CCS 2011, they describe a technique in which a phone placed next to a keyboard reads key presses from vibrations in the table with 80% accuracy. Unlike the method above, this team used a dictionary to improve decoding accuracy. Their method senses the vibrations through the table and classifies each key as being on the left or right side of the keyboard (assuming the phone is placed to the left of the keyboard). Key presses are read in pairs, and the distance between the first and second key of each pair is classified as either “near” or “far”. Each pair thus yields a triple (left/right for each key, plus near/far), and these triples are then matched against the dictionary to find the English word most likely to have been typed. Both the left/right and the near/far classification are done using a neural net.
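To make the dictionary step concrete, here is a minimal Python sketch of matching left/right and near/far labels against precomputed word profiles. The keyboard model is an assumption for illustration, not taken from the paper: the key coordinates, the left/right split, the 2.0 key-width `NEAR_THRESHOLD`, and labelling only consecutive key pairs. The observed profile is passed in directly, standing in for the neural net's output.

```python
import math

# Hypothetical QWERTY geometry: (column, row) in key widths, with row stagger.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
KEY_POS = {ch: (offset + i, row)
           for row, (keys, offset) in enumerate(ROWS)
           for i, ch in enumerate(keys)}
LEFT_KEYS = set("qwertasdfgzxcvb")  # assumed left/right split of the keyboard
NEAR_THRESHOLD = 2.0                # assumed near/far cut-off, in key widths

def side(ch):
    return "L" if ch in LEFT_KEYS else "R"

def distance(a, b):
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x1 - x2, y1 - y2)

def word_profile(word):
    """Left/right label per key plus near/far label per consecutive pair."""
    sides = "".join(side(c) for c in word)
    gaps = "".join("N" if distance(a, b) < NEAR_THRESHOLD else "F"
                   for a, b in zip(word, word[1:]))
    return sides, gaps

def build_index(dictionary):
    """Group dictionary words by their profile."""
    index = {}
    for word in dictionary:
        index.setdefault(word_profile(word), []).append(word)
    return index

def candidates(observed_profile, index):
    """All dictionary words consistent with an observed profile."""
    return index.get(observed_profile, [])

words = ["the", "and", "tie", "cat", "yes"]  # toy dictionary
index = build_index(words)
print(candidates(("LRL", "NF"), index))  # ['the']
print(candidates(("LRL", "FF"), index))  # ['and', 'tie']
```

With this toy dictionary the profile for “the” is unique, while “and” and “tie” collide. Collisions like that are why a larger dictionary and some ranking of the candidates (for example by word frequency) would be needed in practice.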