Multi-word passphrases not all that secure, says Cambridge University

<img src="http://sophosnews.files.wordpress.com/2012/03/cambridge-university.jpg?w=640" title="Cambridge University" alt="Cambridge University" vspace="10" hspace="10" align="right" />Think that a passphrase of multiple, random dictionary words is as unguessable as long strings of gibberish, but easier to remember?

Research from the Computer Laboratory at the University of Cambridge suggests that this might not be so.

While passphrases using dictionary words may not be as vulnerable as individual passwords, they may still be cracked by dictionary attacks, the research found.

Security researcher Joseph Bonneau reports, in a recent paper written with Ekaterina Shutova, that his team studied the problem by looking not at the theoretical space of possible choices but at the real-life passphrases that people actually string together.

To find such a selection of passphrases, his team used data crawled from the now-defunct Amazon PayPhrase system, introduced last year for US users only.

The goal wasn’t to evaluate the security of the scheme as deployed by Amazon, Bonneau says, but rather to learn more about how people choose passphrases in general.

Amazon's was "a relatively limited data source", he writes, but the research results do "suggest some caution on this approach".

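In the original version of the Amazon site, passphrases had to be at least two words long, and error messages indicated when a passphrase was already in use. In effect, the site would tell anyone whether a given phrase had already been chosen by some user.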

<img src="http://sophosnews.files.wordpress.com/2012/03/amazon-pass-phrase.jpg?w=640" alt="Amazon Passphrase" title="Amazon passphrase">

The first experiment was a dictionary attack using lists of movie titles, sports team names, and dozens of other types of proper nouns crawled from Wikipedia, along with idiomatic phrases crawled from sources including Urban Dictionary.

<img src="http://sophosnews.files.wordpress.com/2012/03/passphrase-attack.jpg?w=640" alt="Passphrase attack" title="Passphrase attack">

Here's what the researchers said:

We found about 8,000 phrases using a 20,000 phrase dictionary. Using a very rough estimate for the total number of phrases and some probability calculations, this produced an estimate that passphrase distribution provides only about 20 bits of security against an attacker trying to compromise 1% of available accounts. This is far better than passwords, which are usually under 10 bits by this same metric, but not high enough to make online guessing impractical without proper rate-limiting.
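To make "bits of security against an attacker trying to compromise 1% of accounts" a little more concrete, here is a minimal sketch of one way such a figure can be computed. It assumes Bonneau's α-work-factor metric: the attacker guesses the most popular phrases first, counts how many guesses it takes to cover a fraction α of accounts, and converts that to bits as log2(guesses / fraction actually covered). The paper's exact estimator may differ, and the function name and toy counts are illustrative assumptions, not PayPhrase data.

```python
import math

def alpha_work_factor_bits(counts, alpha):
    """Effective key length in bits for an attacker who wants to
    compromise a fraction `alpha` of accounts, guessing the most
    popular passphrases first. Assumes the alpha-work-factor metric
    log2(guesses / fraction_covered). `counts` holds how many
    accounts chose each distinct passphrase (hypothetical input)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    total = sum(counts)
    covered = 0
    # Most popular passphrases first: that is the attacker's best order.
    for guesses, count in enumerate(sorted(counts, reverse=True), start=1):
        covered += count
        if covered >= alpha * total:
            fraction_covered = covered / total
            return math.log2(guesses / fraction_covered)
    raise ValueError("no accounts to guess")  # e.g. empty `counts`

# Uniform choice among 1,024 phrases: exactly 10 bits.
print(alpha_work_factor_bits([1] * 1024, 0.01))       # 10.0
# Uniform choice among 2**20 (about a million) phrases: 20 bits.
print(alpha_work_factor_bits([1] * 2**20, 0.01))      # 20.0
# Skewed toy data where one phrase is picked by half of all users.
print(alpha_work_factor_bits([50, 30, 20], 0.01))     # 1.0
```

Read this way, 20 bits means human-chosen passphrases resist a 1% attack about as well as a uniform pick from roughly a million phrases (2^20 = 1,048,576), which an online attacker could work through without rate-limiting.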

<img src="http://sophosnews.files.wordpress.com/2012/03/login.jpg?w=640" title="login screen" alt="login screen" vspace="10" hspace="10" align="right" />The debate about how easily dictionary attacks can break passphrases is interesting. I am not adept at the mathematics involved, but random-word passphrases certainly do have their proponents.

Take, for example, the Slashdot discussion on this issue.

A random selection of commenters' thoughts on the entropy (i.e., the password strength/resistance to brute-force searching) of common-word passphrases:

»IMHO, you CANNOT use straight dictionary words (regardless of language, and yes, I do mean Klingon and Sindarin!) in your passwords without some sort of numeric or symbolic character replacement pattern.

»Of course you can. If they're selected randomly, an attacker has to use the complete source space for the random selection in a brute force attack.

»diceware.com gives you 12.9 bits of entropy per word. Brute forcing that is already more trouble than it's worth at three words, and five would require nation-state resources to crack.
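The commenters' arithmetic is easy to check. A passphrase of k words, each drawn uniformly and independently from a list of N words, has k × log2(N) bits of entropy. The standard Diceware list has 6^5 = 7,776 words (one per roll of five dice), which is where the 12.9-bits-per-word figure comes from. This is a minimal sketch: the function names and the short word list are hypothetical placeholders, not the Diceware list, and the figures only hold if the words really are chosen at random, as the second commenter stresses. That is why it uses Python's `secrets` module rather than `random`.

```python
import math
import secrets

def passphrase_entropy_bits(list_size, num_words):
    """Entropy of num_words words, each chosen uniformly and
    independently from a list of list_size words."""
    return num_words * math.log2(list_size)

def random_passphrase(wordlist, num_words):
    """Pick words with a cryptographically secure RNG; the entropy
    estimate above assumes exactly this kind of uniform choice."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))

DICEWARE_LIST_SIZE = 6 ** 5  # 7,776 words

for k in (1, 3, 5):
    print(k, round(passphrase_entropy_bits(DICEWARE_LIST_SIZE, k), 1))
# 1 12.9
# 3 38.8
# 5 64.6

# Hypothetical toy list for illustration only; a real list needs thousands of words.
toy_list = ["apple", "river", "candle", "orbit", "maple", "tundra"]
print(random_passphrase(toy_list, 4))
```

These numbers describe machine-chosen passphrases. The PayPhrase research is about what happens when people choose the words themselves, which is why its estimate comes out so much lower.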

These issues are delightful and productive to ponder for those with a love for password generation nuance, but most laypeople just want to know how to choose a safe password.

We don't want to have to remember crazy combinations of uppercase and lowercase letters and random words with letters swapped out Leetspeak-ishly, plus of course a special character &$!! or two and some digits tacked onto the end. (See xkcd for a graphic representation of the insanity this causes.)

<img src="http://sophosnews.files.wordpress.com/2012/03/battery-staple.jpg?w=640" alt="Password security discussed on XKCD" title="Password security discussed on XKCD">

The research takeaway is that while passphrases are safer than passwords, they're not all that safe, depending, of course, on length.

Length is another matter entirely. Adding length can raise entropy sharply, which is why, as Paul Ducklin and Chester Wisniewski pointed out in a recent Sophos Techknow podcast on password rules and regulations, even a common-word passphrase of unrestricted length (think "Mary Had a Little Lamb, Its Fleece Was White As Snow") does actually increase entropy.

Personally, I was long ago converted to the passcode generation scheme put forth by Graham Cluley in a Sophos video.


Graham's approach is a user-friendly method that uses not random words but the first letters of a personally significant passphrase, peppered with Leet swappage such as 4 for a, 0 for o and 3 for e.

And thus is the word Leet itself rendered by Leetspeak as 1337.
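Here is a minimal sketch of the hybrid scheme Graham describes, assuming a simple rule: keep the first character of each word of a memorable phrase, then apply the Leet swaps mentioned above (4 for a, 0 for o, 3 for e), whatever the letter's case. The function name, the example phrase and the exact substitution table are illustrative assumptions, not Graham's precise recipe. In practice you would do this in your head rather than type your real phrase into a program.

```python
# Leet swaps mentioned in the article, applied to both cases (an assumption).
LEET = str.maketrans({"a": "4", "A": "4", "o": "0", "O": "0", "e": "3", "E": "3"})

def abbreviate_with_leet(phrase):
    """Keep the first character of each word, then apply the Leet swaps."""
    initials = "".join(word[0] for word in phrase.split())
    return initials.translate(LEET)

# Hypothetical example phrase; don't reuse it.
print(abbreviate_with_leet("My first car was a blue Ford Escort bought in 1998"))
# Mfcw4bF3bi1
```

The result looks like gibberish to anyone else, yet the phrase behind it is easy for its owner to reconstruct.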

As many have pointed out, Leet is too predictable to use on simple dictionary words. Everybody already knows the common character swaps, and there are Leet dictionaries out there that can be used for attacks.

"[The password myth] that annoys me the most [concerns] Leetspeak," Chester said in the password podcast. "They pick a nice word, and they say, 'Well, it's not a dictionary word. I added 0 instead of o.' But most password-cracking apps try that right off the bat, because they know how much people rely on this false sense of security from complicating their password."
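Chester's point is easy to demonstrate. With only a handful of well-known swaps, each dictionary word has very few Leet spellings, so a cracking tool can simply try them all. This sketch uses a small, assumed substitution table (not any particular cracker's rule set) and enumerates every combination of swaps for one word.

```python
from itertools import product

# Assumed table of common swaps; real cracking rule sets are larger.
SWAPS = {"a": "a4@", "e": "e3", "o": "o0", "i": "i1!", "s": "s5$", "t": "t7"}

def leet_variants(word):
    """Yield every spelling of `word` produced by applying any subset
    of the swaps in SWAPS, including the unmodified word."""
    options = [SWAPS.get(ch, ch) for ch in word.lower()]
    for combo in product(*options):
        yield "".join(combo)

variants = list(leet_variants("password"))
print(len(variants))             # 54
print("p4ssw0rd" in variants)    # True
```

Multiplying a dictionary by 54 adds at most log2(54), or fewer than 6 bits, and only if every variant were equally likely, which they are not. That is a very small price for an attacker to pay.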

But combining passphrase abbreviation with Leetspeak gives you the best of both worlds: a string that looks like random characters, plus the implicit, coherent meaningfulness of a phrase.

The debate over whether passphrases are guessable seems moot in the face of this user-friendly approach.

I'm not saying that because I write for Naked Security; I'm saying it because I've found it actually works.

Using this hybrid approach, I can call to mind random-looking strings of a dozen or more characters which, when I decipher them, form phrases that are simple for me to associate with important sites, for example that of my neighborhood bank.

And, of course, as Graham's video points out, you can use password management software to store your passphrases securely if you can't remember them.

If you're not convinced that this is the best approach, either for you or your end users if you set organizational password policy, I'm curious to hear your thoughts on how you approach password generation. So please, comment away.