Skip to content

Checking your math... #1

@bwbug

Description

@bwbug

In The Similar-Words Problem in your Readme, you wrote:

If we assumed a hypothetical 18,000 word list that was just 9,000 words and their plurals, I think the odds of getting at least one "awkward double" in a 4-word passphrase is (1/18000) * (2/18000) * (3/18000), which is a really small number. But check my math!

Although your conclusion is correct ("the odds...is a really small number"), the odds of this happening is over 600 million times more probable than what you have estimated.

The correct probability is 1/9000 + 2/9000 + 3/9000 - 11/9000**2 + 6/9000**3.

To prove this for a word list containing N words and their plurals (2 N words total), if P1 is the probability of getting at least one "awkward double", and if P0 is the probability of getting no awkward doubles, then

P1 =1 - P0

The probability if getting no awkward doubles (P0) is the number of passphrases containing only unique stems (i.e., once a word has been selected, it cannot be reselected itself, and neither can its conjugate -- the plural or singular form, whichever was not picked in the previous selection), divided by the total number of possible passphrases. For a passphrase consisting of k words, the total number of passphrases is

Ntotal = (2 N)k

To compute the number of passphrases containing only unique stems, the size of the word pool is reduced by 2 each time a word is selected (because the word itself is eliminated from further consideration, as is the plural/singular form of that word):

Nunique = (2 N)(2 N - 2)(2 N - 4) ... (2 N - 2 k +2)

Therefore, the probability of getting only unique stems is

P0 = Nunique/Ntotal = (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)

Therefore, the general solution for the probability of getting at least one "awkward double" is

P1 = 1 - (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)

For k=4, the math works out to the following result:

P1 = 6/N - 11/N2 + 6/N3

If, in the general solution, one neglects higher-order terms (N-2, N-3, etc.), the following approximate solution is obtained:

P1 ≈ 1/N + 2/N + ... + (k - 1)/N = (k(k - 1))/(2 N)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions