Bitcoin wallets, just like those of other cryptocurrencies, are supposed to be highly secure. You, as the owner of a Bitcoin address – or wallet – are the only one in possession of a private key that can be used to transfer Bitcoin from your address. If you lose your private key, there is nothing that can be done. Indeed, it seems that many thousands of bitcoins have been lost this way. It should be noted that addresses were never even meant to be reused at all.
When someone generates a new Bitcoin address, they’re first generating a private key. The private key has a corresponding public key, and a Bitcoin address is just a hashed, encoded representation of that public key. In the most theoretical sense, it’s possible that someone will generate a private key that has already been used and corresponds to some existing Bitcoin address with funds in it. The probability of that happening is, however, essentially zero. The number of possible private keys is finite – albeit incredibly large – so it’s not mathematically correct to say that a newly generated key is unused with probability 1, but the chances of it being a duplicate are even lower than what is normally called astronomically low.
Let’s consider some numbers very roughly. A Bitcoin private key is 256 bits, so there can be 2^256 different key values (a bit fewer than that, to be pedantic: all-zero and some other values aren’t valid as Bitcoin keys). That is more than 10^77, pretty close to the estimated number of atoms in the observable universe, 10^80. You could assign a key to every thousand atoms in the Universe and still have keys left over. The total number of Bitcoin addresses (each of which corresponds to a key) that have been used is far smaller. As of late 2019, more than 600 million addresses have been used in total, a number that might as well be zero compared to the number of possible keys.
Randomly generating an existing key is essentially impossible. If all 7 billion people on the planet generated a million keys per second for a century, they would end up with on the order of 10^25 new keys – the probability of any of those matching a previously used key is far less than the probability of winning a lottery jackpot every day for three years. In other words, it’s impossible to “get lucky” and generate a duplicate key even if you have all the computational power on Earth and try for more than the average human lifespan.
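The rough numbers above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it treats all 2^256 values as valid keys, uses the 600 million figure quoted above, and applies a simple union bound rather than an exact probability.

```python
from fractions import Fraction

KEYSPACE = 2**256                 # upper bound on possible private keys
USED_ADDRESSES = 600_000_000      # figure quoted above for late 2019

PEOPLE = 7_000_000_000
KEYS_PER_SECOND = 1_000_000
SECONDS_PER_CENTURY = 100 * 365 * 24 * 3600

# Total keys produced by everyone on Earth over a century.
generated = PEOPLE * KEYS_PER_SECOND * SECONDS_PER_CENTURY

# Union bound: the chance that at least one generated key equals a used
# one is at most (keys generated) * (used keys) / (possible keys).
collision_bound = Fraction(generated * USED_ADDRESSES, KEYSPACE)

print(f"possible keys  ~ {float(KEYSPACE):.2e}")
print(f"keys generated ~ {float(generated):.2e}")
print(f"P(collision)  <= {float(collision_bound):.2e}")
```

The generated count comes out on the order of 10^25, and the bound on a collision is around 10^-43.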
All of this, however, rests on the presumption that the keys are generated properly, that is, that they are truly random. There’s an entire class of attacks on encryption known as random number generator attacks. These exploit the fact that some supposedly random number generator is actually predictable. The same general principle can apply to Bitcoin. If a private key was generated predictably, then the address corresponding to that private key is vulnerable.
So how do Bitcoin private keys get generated? This depends entirely on the specific software used – after all, you just need the 32 bytes (256 bits) from somewhere. A very popular way is to just use the SHA-256 hash output of some data. If the data is random, everything is fine. If the data is predictable, not so much. One could simply take the SHA-256 of an ordinary word or password and use that as a Bitcoin private key. The approach of using a passphrase to create a private key, which you then never have to store anywhere but your memory, is called a brainwallet. It’s not a great approach in most cases, but is at least satisfactory, provided a good passphrase is used. But if the SHA-256 of an easily guessable phrase is used for a Bitcoin address, it’s extremely insecure.
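To make the brainwallet idea concrete, here is a minimal sketch of turning a passphrase into a private key. The function name `brainwallet_private_key` is made up for illustration; the range check reflects the fact that a valid key must be between 1 and the order of the secp256k1 group minus one.

```python
import hashlib

# Order of the secp256k1 group; valid private keys are 1 .. N-1.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def brainwallet_private_key(passphrase: str) -> bytes:
    """Return SHA-256(passphrase) as a 32-byte candidate private key."""
    digest = hashlib.sha256(passphrase.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    if not 1 <= value < SECP256K1_N:
        # Practically never happens, but such a value is not a usable key.
        raise ValueError("hash is outside the valid private key range")
    return digest


# The same passphrase always yields the same key, which is the whole point
# of a brainwallet and also the whole problem with a guessable one.
print(brainwallet_private_key("abc").hex())
```

Anyone who can guess the passphrase can run the same three lines and obtain the same key.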
I set out to analyze how many Bitcoin addresses were created with the very insecure practice of using common dictionary words for their key. I wrote a simple script that takes the system dictionary (/usr/share/dict/words), and then for each word it computes the SHA-256 hash and the corresponding Bitcoin address, and checks the blockchain history of that address.
The words file has 102401 entries. I had expected to find but a handful of addresses. To my surprise, I found thousands of addresses that had been created by this highly insecure method.
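A scan of this kind could look roughly like the sketch below. It is not the original script: the helper names are invented, the choice of uncompressed public keys is an assumption, and `has_history` is a hypothetical callback standing in for whatever blockchain index is queried. The elliptic-curve arithmetic is written out in plain Python so that the example needs no third-party packages.

```python
import hashlib

P = 2**256 - 2**32 - 977   # secp256k1 field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def public_point(secret: int):
    """Double-and-add scalar multiplication: secret * G."""
    result, addend = None, G
    while secret:
        if secret & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        secret >>= 1
    return result


def base58check(payload: bytes) -> str:
    """Append a double-SHA-256 checksum and encode in Base58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    number, text = int.from_bytes(data, "big"), ""
    while number:
        number, rem = divmod(number, 58)
        text = B58[rem] + text
    leading_zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * leading_zeros + text


def address_for_word(word: str, compressed: bool = False) -> str:
    """Derive the P2PKH address whose private key is SHA-256(word)."""
    secret = int.from_bytes(hashlib.sha256(word.encode()).digest(), "big")
    x, y = public_point(secret)
    if compressed:
        pubkey = bytes([2 + (y & 1)]) + x.to_bytes(32, "big")
    else:
        pubkey = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
    # RIPEMD-160 is provided by OpenSSL and may be missing in some builds.
    h160 = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()
    return base58check(b"\x00" + h160)


def scan(words_path, has_history):
    """Yield (word, address) for each dictionary word whose address was used."""
    with open(words_path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word and has_history(address_for_word(word)):
                yield word, address_for_word(word)
```

Pure-Python curve arithmetic is slow but adequate for a dictionary of about a hundred thousand words; a real scan would more likely use an established library and a local copy of the address index.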
Moments after starting the scan, I saw an address that had two transactions in its history – it received 0.0000546 BTC, and sent the same amount. Both transactions were from the second half of 2013, a year that saw Bitcoin prices rally. Still, the transaction amount is very small. Given prices of around 700 USD per BTC at the time, the address had transacted less than 4 cents.
Then there was another address with 0.0000546 BTC transacted. And another. And then more of them. Clearly this wasn’t just one person doing a test transaction; the number had some kind of significance. The addresses looked similar – they’d have only two transactions, one incoming and one outgoing, for this amount, and were mostly from 2013.
A quick investigation suggests that 0.0000546 BTC was the weekly payout threshold for at least one popular “bitcoin faucet” in 2013. A bitcoin faucet or, more generally, a crypto faucet, is a site or app that rewards its users with tiny amounts of cryptocurrency for viewing ads, filling in captchas or performing other actions. Users who reached a certain threshold, such as 0.0000546 BTC, would receive a payout. This particular faucet seems to still be operational, as a bitcoin casino.
A more interesting question is how the faucet came to be associated with so many insecure addresses. It’s plausible that some people signed up for the faucet and specified an insecure address for their payouts. But the sheer number of such addresses means something else was likely going on. One possibility is that someone used a bot to sign up for the faucet with many payout accounts, and used the dictionary to generate these payout addresses.
The more intriguing possibility is, however, that the bitcoin faucet itself used thousands of insecure addresses. To examine the possibility, I picked one of these addresses (1bAd4Xp1B8DgThjcHn9ZM1bppo5dCLzGJ), let’s call it 1bAd for brevity, and examined its transactions. It received 0.0000546 BTC as part of a big transaction, where about 0.[…] BTC was spent to deposit 0.[…] BTC to some 500 addresses, 1bAd among them. This clearly looks like part of an automated process, so 1bAd could be the ultimate payout address for some user of the Bitcoin faucet, or it could be an internal address used before the payouts.
Looking at how 1bAd spent its Bitcoin balance, it’s again a massive transaction. Over 300 addresses, 1bAd among them, sent 0.0000546 BTC each to just two other addresses, neither of which has an obvious pattern staring you in the face. Also of note is that 1bAd spent its balance one week after receiving it.
It was then natural to cross-reference the incoming transaction for 1bAd with my list of insecure addresses. 1bAd received funds in transaction 7415674f9baa56a6f8ed[…], along with 500 other addresses, 439 of which are also on my list of insecure addresses. Clearly, this is not a coincidence. Moreover, the addresses used what was likely a consecutive section of some dictionary. The first recipient address in the transaction (18sQ7fY3EAaNcV6as4He7p9wUwiwVKoGKC) has the private key sha256('communicant'), and the last address to receive the same amount of BTC is 1ALsvfkoQaY7FuswvK1UDg9UAXaNnS6a2q, private key sha256('consigned'). The words between them fall in the same alphabetical range.
I then checked another insecure address with the same history. This address ([…]rUE1A5hQcrFG[…]Ljdtq9wmz4rjb7) has, just like 1bAd, 0.0000546 BTC in and out, in the same time period. Sure enough, its incoming transaction (dbb57244611a[…]) looks similar – 340 addresses on the receiving list, all but one getting the magical 0.0000546 BTC. Cross-referencing the full receiver list, I find 262 addresses on my insecure list, with private keys from sha256('allotment') to sha256('apathetic').
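The cross-referencing step is simple set membership. The sketch below assumes the scan results are kept as a mapping from address to the dictionary word behind its key; the function name and the placeholder addresses in the usage example are invented for illustration.

```python
def cross_reference(recipients, insecure):
    """Match one transaction's recipients against the insecure list.

    recipients: list of addresses paid by the transaction.
    insecure:   dict mapping insecure address -> word behind its key.
    Returns the matched words in sorted order and the matched share.
    """
    words = sorted(insecure[addr] for addr in recipients if addr in insecure)
    share = len(words) / len(recipients) if recipients else 0.0
    return words, share


# Toy data with placeholder addresses, not real ones.
insecure = {"addrA": "communicant", "addrB": "consigned", "addrC": "compass"}
recipients = ["addrA", "addrX", "addrC", "addrB"]

words, share = cross_reference(recipients, insecure)
print(words[0], "to", words[-1], f"({share:.0%} of recipients)")
```

If the first and last matched words bracket a contiguous alphabetical range, as they do in the transactions described above, that points to a script walking through a dictionary rather than to independent users.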
Interestingly, at one point nearly 400 addresses with 0.0001092 BTC transacted showed up. This number is 0.0000546 times two, and indeed, a closer look showed that such addresses followed the same pattern, except they had a total of two incoming and two outgoing transactions. Alphabetically, the keys for these addresses are right in between familiar-looking blocks of 0.0000546 BTC addresses. I suspect that the faucet’s address-generating script somehow failed to update its place in the dictionary, leading to one dictionary block being reused for address generation.
The initial deposits to these insecure addresses occurred on August 31, 2013, and they would be emptied some time in September.
The overall conclusion here seems quite clear. At the end of August 2013, a popular BTC faucet generated addresses in a highly vulnerable way. It created many addresses to hold 0.0000546 BTC, presumably before paying out to its users, and it would create such addresses simply by passing sequential words from a dictionary to SHA-256, and using the result as the private key. The bitcoin typically spent around a week in these vulnerable addresses. Anyone who had discovered this vulnerability at the time could have very easily seized many addresses from the faucet in question. All in all, my script found just over fourteen thousand insecure addresses from this faucet, all of which could have easily been breached and emptied. The total amount of bitcoin that was vulnerable still doesn’t add up to a lot, but it’s nonetheless interesting to see such an easily exploitable weakness.
In addition to the large number of vulnerable auto-generated addresses, I identified multiple insecure addresses with no obvious patterns in their usage. Most likely, they were addresses manually created by people who knowingly used the hash of a single word. This is an extremely bad idea, but it’s not too surprising to see that it has happened on a few occasions. Some of these insecure addresses have had tens of thousands of transactions (possibly trading bots?); others have only had a few.
A few accounts have a history of 0.0000646 BTC, and they seem related to a so-called cloud mining service from 2014.
Only two of the insecure addresses ever transacted a significant amount of Bitcoin. The address 1CgCMLupoVAnxFJwHTYTKrrRD3uoi3r1ag was involved in 48 thousand transactions in late 2014 and 2015, receiving a total of 5.28 BTC. One of the account’s first transactions was for 5 BTC at once. And the address 158zPR3H2yo87CZ8kLksXhx3irJMMnCFAN received […] BTC at one point – which seems, curiously enough, to have been picked up by a bot monitoring weak addresses.
I checked 102401 words in total. 19206 were capitalized, and 887 of them (about 4.6%) were used in insecure addresses. Out of the 83195 lowercase words, the insecure rate was significantly higher, with 13460 (about 16.2%) used in insecure addresses.
The vast majority, 14109, belongs to the previously described insecure faucet. Of the remaining addresses, most also transacted tiny amounts.
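The counts above can be tallied in a few lines. This sketch assumes that the 14109 faucet addresses are included in the capitalized and lowercase hit counts, which is how the figures read but is not stated outright.

```python
capitalized_words, capitalized_hits = 19206, 887
lowercase_words, lowercase_hits = 83195, 13460
faucet_hits = 14109  # assumed to be a subset of the hits above

total_words = capitalized_words + lowercase_words
total_hits = capitalized_hits + lowercase_hits

print(f"words checked:      {total_words}")
print(f"capitalized rate:   {capitalized_hits / capitalized_words:.1%}")
print(f"lowercase rate:     {lowercase_hits / lowercase_words:.1%}")
print(f"insecure addresses: {total_hits}")
print(f"not from faucet:    {total_hits - faucet_hits}")
```

Under that assumption, only a couple of hundred insecure addresses are left once the faucet is excluded.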
Looking at the histogram of transacted amounts among addresses with less than 0.001 BTC transacted, it’s dominated by addresses on the order of 0.00005 – 0.00006 BTC. The few addresses that transacted between 0.00007 and 0.001 BTC aren’t even visible in such a graph.
Excluding addresses with less than one-thousandth of a BTC transacted and the two big outliers, the distribution looks like this.
Most addresses are indeed unremarkable.
Only two addresses have transacted a lot of BTC as seen above.
Any system that relies on public-key cryptography is only useful while your private key is safe. If your keys are stolen or otherwise compromised, it gives someone else access to your data, and it even becomes impossible to prove that it’s out of your hands – someone else accessing the data (or transferring Bitcoin) with your private key looks exactly as if you had done it yourself.
The human mind isn’t good at generating random data. Some people make mistakes by using dictionary words for their Bitcoin wallets, and some go even further by using that in a script that serves a popular crypto faucet. There is no excuse, ever, for using a guessable word where randomness is needed.