Ethereum: Does the BIP39 Mnemonic Structure Avoid Word Repetition?
Ethereum wallets rely on cryptographic keys, and most of them use a mnemonic seed phrase to back those keys up. The standard behind these phrases is Bitcoin Improvement Proposal (BIP) 39, or BIP39, which originated in Bitcoin and is widely used by Ethereum wallets as well. BIP39 encodes randomly generated entropy as a sequence of words and defines how that sequence is turned into a seed, from which a wallet derives the private keys used to sign transactions and interact with third-party services.
A common question about mnemonic construction is whether the specification requires all 24 words of a seed phrase to be unique. In other words, can a word occupy two positions in a valid phrase, and if so, does that lead to duplicate or incomplete keys?
BIP39 Mnemonic Construction Algorithm
BIP39 uses a simple but effective algorithm to generate mnemonic phrases. The wallet first generates 128 to 256 bits of random entropy (in multiples of 32 bits) and appends a short checksum taken from the SHA-256 hash of that entropy, one checksum bit per 32 bits of entropy. The resulting bit string is split into 11-bit groups, and each group is used as an index into a fixed list of 2048 words. This yields 12 words for 128 bits of entropy and up to 24 words for 256 bits.
Each 11-bit group is mapped to a word independently of the others, and the specification contains no step that rejects a word that has already appeared. Uniqueness of the words within a phrase is therefore not guaranteed: whenever two groups happen to hold the same value, the same word appears twice. What the algorithm does guarantee is that different entropy produces a different phrase, because the mapping from bits to words is one-to-one.
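The construction can be sketched in a few lines of Python. This is a minimal illustration rather than a wallet-grade implementation: it assumes the standard BIP39 parameters (SHA-256 checksum, 11 bits per word) and returns word indices instead of words so that it does not need the 2048-word list. The function name `entropy_to_indices` is made up for this example.

```python
import hashlib
import secrets
from typing import List


def entropy_to_indices(entropy: bytes) -> List[int]:
    """Map BIP39 entropy to a list of 11-bit word indices (0..2047)."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128-256 bits in multiples of 32")

    # The checksum is the first ENT/32 bits of SHA-256(entropy).
    cs_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs_bits)

    # Append the checksum to the entropy, then cut into 11-bit groups.
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [
        (combined >> (11 * (n_words - 1 - i))) & 0x7FF
        for i in range(n_words)
    ]


if __name__ == "__main__":
    # Nothing in the mapping prevents two positions from sharing an index.
    indices = entropy_to_indices(secrets.token_bytes(32))
    print(len(indices), "words, repeated index present:",
          len(set(indices)) < len(indices))

    # All-zero 128-bit entropy: the first eleven indices are identical.
    print(entropy_to_indices(bytes(16)))
```

Each index would then be looked up in the wordlist. Note that the loop has no memory of earlier indices, which is why repeated words can occur.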
Word Order and Duplicate Keys
To illustrate this, consider the 2048-word list and a 24-word phrase. Because each position is effectively drawn independently from 2048 possibilities, a birthday-problem estimate puts the chance that at least one word repeats at roughly 13% for 24 words and roughly 3% for 12 words. In the extreme case almost every position holds the same word: the BIP39 test vector for all-zero 128-bit entropy is “abandon” eleven times followed by “about”, and it is a valid phrase.
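The likelihood of a repeated word can be estimated with a birthday-problem calculation. The sketch below assumes the 2048-word BIP39 list and treats every position as an independent uniform draw, which is only an approximation because the final word carries checksum bits. The function name is illustrative.

```python
def repeat_probability(n_words: int, list_size: int = 2048) -> float:
    """Probability that at least one word repeats among n_words draws."""
    p_all_distinct = 1.0
    for i in range(n_words):
        # The i-th word must avoid the i words already drawn.
        p_all_distinct *= (list_size - i) / list_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (12, 15, 18, 21, 24):
        print(f"{n} words: {repeat_probability(n):.2%}")
```

Under these assumptions the estimate comes out at roughly 3% for 12 words and roughly 13% for 24 words, so a phrase with a repeated word is unusual but far from rare.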
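For example, consider two possible 24-word phrases:
Phrase A:
tool -> #8 and #20
Phrase B:
tool -> #10 and #12
In both cases, the word “tool” appears twice, and either phrase can still be valid as long as its checksum bits match. So yes, a word can occupy two positions in a valid seed phrase. This does not lead to duplicate or incomplete keys. The seed is computed from the entire phrase, in order, using PBKDF2 with HMAC-SHA512 and 2048 iterations, so the position of every word matters and two phrases that differ in any position yield unrelated seeds.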
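To see why a repeated word does not collapse two phrases into the same key material, here is a minimal sketch of the BIP39 seed derivation step using only the Python standard library. The helper name `mnemonic_to_seed` is chosen for this example, and the second phrase is simply a reordering picked for illustration. The derivation itself does not check the checksum, so that phrase is not claimed to be a valid mnemonic.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte BIP39 seed from a mnemonic sentence."""
    # BIP39 normalizes both inputs with NFKD before hashing.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


if __name__ == "__main__":
    # "abandon" eleven times, then "about": heavy repetition, still valid.
    phrase_a = " ".join(["abandon"] * 11 + ["about"])
    # Same words with "about" moved to the front (illustrative only).
    phrase_b = " ".join(["about"] + ["abandon"] * 11)

    seed_a = mnemonic_to_seed(phrase_a)
    seed_b = mnemonic_to_seed(phrase_b)
    print(seed_a.hex()[:16], seed_b.hex()[:16])
    print("seeds differ:", seed_a != seed_b)
```

Because the whole sentence is hashed as one string, the seed depends on which word sits in which position, not on whether the words are distinct.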
Conclusion
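BIP39 does not avoid word repetition. It maps each 11-bit group of entropy and checksum to a word independently, so a word can legitimately appear more than once in a valid phrase. Repetition is harmless: it does not produce duplicate keys, because the seed depends on the full phrase and on the order of its words. With that in mind, users can follow these best practices:
- Let wallet software generate the phrase from a cryptographically secure random number generator rather than selecting words by hand.
- Do not swap out or reorder words, whether to remove a repeat or to avoid words that look similar. The official English wordlist is already designed so that the first four letters identify each word, and changing a word will usually break the checksum.
- Store the phrase exactly as generated, in order and including any repeated words, in a secure location such as an offline backup or a password manager you trust.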
By understanding how the BIP39 construction algorithm works, and why a repeated word is an expected outcome rather than a defect, users can enjoy secure and private key storage on the Ethereum network.