What do websites know about you?
You control the information you give to the websites you visit. You shouldn’t give the sites more information than they need to provide the service you want. If you’re not expecting the sites to deliver anything to your home address, you should probably not give out your address just because the site asks you to. The same applies to your phone number, full name or even your email address. The less information you give to websites, the less you’ll lose if someone sells the information or gains access to your account.
When data leaks happen, they’re often posted on the internet for others to see. If you’ve given out your mother's maiden name or your first pet’s name as an answer for a security question, that information might now be available to anyone who knows how to use Google. These kinds of security questions are inherently insecure, as your mother’s maiden name, your pet’s name or your first address is usually really easy to find out.
But what if the leaked information is your password? Have you used the same password anywhere else? If yes, then anyone can try your email and password combination for other sites as well. The leaked information is often used in other attacks even before it becomes public.
Passwords are usually the first obstacle in the way of an attack. In later chapters, we’ll introduce other elements that can also prevent your accounts from being accessed by attackers. This defence in depth approach is a common practice in cybersecurity. It means that you have multiple layers protecting the asset you want to protect. If one layer of defence breaks, for example your password, it doesn’t mean the attackers have free access to your information.
Defence in depth is a concept used in information security in which multiple layers of security controls (defence) are placed throughout an information technology system. Its intent is to provide redundancy in the event that a security control fails or a vulnerability is exploited. The layers can cover aspects of personnel, procedural, technical and physical security for the duration of the system's life cycle.
For accounts on the internet, a password is usually the first protective layer you have. We’ll go a bit deeper into how passwords work in theory before familiarising ourselves with other layers we can use.
Enumeration attacks and user names
Enumeration attacks are used by attackers to establish valid user names as a first step in gaining a user's login credentials. Common areas for this type of attack include user login and "forgot password" functions. For example, a site can in some cases unintentionally leak information about its members by giving out different messages for login attempts based on whether the username or email exists.
Have you ever accidentally typed in a wrong user name and got back an error message, "invalid user name" or "user name does not exist"? If the site has a similar message that isolates when only the password is incorrect, then the attacker can use this information to know that a user name is valid. This allows the attackers to enumerate huge lists of emails, for example, and receive information about who has an account on the target site. This is why it is best practice for websites to have login error messages that do not isolate what is wrong with the login attempt, such as "user name OR password is invalid".
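The idea of a non-revealing error message can be sketched in a few lines of Python. This is a minimal illustration, not how any particular site works: the `USERS` store, the `login` function and the `DUMMY_HASH` value are all hypothetical, and the password hashing it uses is explained later in this section.

```python
import hashlib
import hmac

# Hypothetical in-memory user store: user name -> SHA-256 hex hash of the password.
# Unsalted SHA-256 is used only to keep the sketch short.
USERS = {"alice": hashlib.sha256(b"correct horse").hexdigest()}

GENERIC_ERROR = "user name OR password is invalid"
DUMMY_HASH = "0" * 64  # compared against when the user name is unknown


def login(username: str, password: str) -> str:
    """Return the same message for an unknown user and for a wrong password."""
    stored = USERS.get(username)
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Always perform a comparison so both failure paths do similar work.
    expected = stored if stored is not None else DUMMY_HASH
    matches = hmac.compare_digest(supplied, expected)
    if stored is not None and matches:
        return "login ok"
    return GENERIC_ERROR


print(login("alice", "wrong password"))  # known user, wrong password
print(login("mallory", "anything"))      # unknown user, same message
```

Both failed attempts print the same text, so the response alone does not tell an attacker whether "alice" or "mallory" has an account.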
In most cases this information itself is not damaging. However, if anyone can verify the existence of an account on a dating or escort service, that information can be used against the individuals.
Enumeration attacks can also be used to better target phishing attacks to individuals, and they are used to make brute force attacks easier by reducing the number of possible targets too. We will cover definitions of phishing and brute force shortly.
Enumeration attacks are not limited to login forms. They can be used with password reset or account creation as well. Account creation, for instance, usually protects from creating duplicate accounts and will inform the user trying to create a duplicate that an account with the provided email or username already exists.
The next step in gaining access to a user's login details is to know the password. That's what we will learn about next.
Passwords
Most websites you use identify you by a username and password combination. Because we as online users have to use dozens or even hundreds of different services, we generally try to create memorable passwords. The problem with easy-to-remember passwords is that they’re usually also easy to guess for attackers. Given that the most common passwords in 2020 were 123456, 123456789, qwerty and password, it is not hard to hack a huge number of accounts just by trying these passwords with random email addresses.
The most common passwords of 2020
123456
123456789
qwerty
password
1234567
12345678
12345
iloveyou
111111
123123
The most used requirements for setting passwords have also been poor over the years, and have taught people how to create bad passwords. These bad recommendations have required people to create passwords that are hard to remember but easy for computers to crack.
The webcomic XKCD sums up password security really well.
A good password doesn’t necessarily have to contain obscure symbols to be secure. Generally, the longer the password is, the more secure it is. Requirements that mandate you to include symbols, lowercase and uppercase characters and numbers in passwords do not usually take into account how hard the passwords actually are to crack. The requirements make the keyspace (the number of possible passwords of a given length) larger, although the same could be achieved by requiring a longer password.
Let’s take a look at an example:
If the possible character set you can use for a password is lowercase characters in the English alphabet, i.e., 26 characters, and the length of the password is 6 random characters, the keyspace would be calculated as:
26^6 = 308,915,776 possible passwords
^ is often used to denote “to the power of”, so 26^6 is 26 to the power of 6 (also written 26⁶). You don’t need to know the maths here, but if you’re curious you can type an equation like 26^6 into most online search engines to get the answer.
Whereas if the password is 8 random characters long, the keyspace would be:
26^8 = 208,827,064,576 possible passwords
As you can see, the keyspace is dramatically larger if the length of the password is just 2 letters longer. Let’s take another example of additional characters and add uppercase characters to the mix, doubling the number of characters usable for the password.
52^6 = 19,770,609,664 possible passwords
You immediately notice that the keyspace is a lot bigger than what we started with. However, it is not even close to the keyspace for the longer password with just lowercase letters. How about we add some general symbols, i.e. !, “, #, ¤, %, &, /, (, ) and =. This means the number of characters available is now 62.
62^6 = 56,800,235,584 possible passwords
With the addition of the 10 extra characters in the allowed character set, we almost tripled the available keyspace. At the same time we made the password much harder to remember. The keyspace is still a lot smaller than with the example password of 8 lowercase characters.
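If you would like to check these numbers yourself, the keyspace calculation is a single exponentiation. The short Python sketch below recomputes the four examples above; the function name `keyspace` is just an illustrative choice.

```python
def keyspace(charset_size: int, length: int) -> int:
    """Number of possible passwords of a given length from a given character set."""
    return charset_size ** length


examples = [
    ("lowercase only, 6 characters", 26, 6),
    ("lowercase only, 8 characters", 26, 8),
    ("lower + uppercase, 6 characters", 52, 6),
    ("lower + uppercase + 10 symbols, 6 characters", 62, 6),
]

for label, charset_size, length in examples:
    print(f"{label}: {keyspace(charset_size, length):,}")
```

Comparing the second and fourth lines of the output shows the point of the example: two extra lowercase letters enlarge the keyspace more than adding uppercase letters and symbols does.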
A purely random password also defeats other types of attacks such as dictionary attacks. Dictionary attacks use the fact that users try to create memorable passwords by using dictionary words, names, or variations of them. The second edition of the 20-volume Oxford English Dictionary contains entries for 171,476 words, so if you use a password with just one word you can think of it as using a keyspace of 170,000. Often these words are not used as is, but are modified in a way that retains their memorability.
To take the example even further, let’s see how large the keyspace is by using just plain English words. Let’s assume we randomly select 3 English words from the dictionary and create a password with the combination. The calculation would be:
170,000^3 = 4,913,000,000,000,000
A password such as this would be much more memorable, and if we’re just concerned about brute forcing, much harder to crack as well. However, there are reasons why the keyspace might not be as large as one would initially assume. Most English speakers only use a vocabulary of about 3,000 words and the passwords chosen in this way reflect that. Also, people are really bad at being random. What feels like a random combination of words to us might easily be affected by the context, our thoughts or previously selected words. If using a password of just words, use a generator instead of relying on the human concept of randomness.
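As a rough idea of what such a generator does, here is a minimal Python sketch. The eight-word `WORDS` list is a made-up placeholder that is far too small for real use; an actual generator would draw from a list of thousands of words. The `secrets` module is used because it is Python's standard source of randomness for security purposes.

```python
import secrets

# Hypothetical, deliberately tiny word list for illustration only.
WORDS = ["anchor", "biscuit", "candle", "dolphin",
         "ember", "falcon", "garden", "harbour"]


def generate_passphrase(words: list[str], count: int = 3) -> str:
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))


print(generate_passphrase(WORDS))
print(f"Keyspace with this toy list: {len(WORDS) ** 3:,}")
```

Because every word is chosen independently by the computer, the result is not influenced by context or by the words picked before it.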
A common way people try to obfuscate dictionary words is by using substitutions or adding a number to the word to create a password. Unfortunately, these modifications are generally well understood and it is trivial for a computer to try out all expected variations. Simple replacements like replacing the letter “o” with the number “0” just make the password harder to remember while just barely making a computer do more work. These simple substitutions do not make the keyspace substantially larger. Neither does adding a number or the current year at the beginning or the end of the word. Password cracking software and the people building them know of all these tricks and have taken them into account.
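To see how little work these substitutions add, the Python sketch below generates every variant of a word under a small substitution map. The map itself is an assumption made up for this example; real cracking tools ship with much larger rule sets.

```python
from itertools import product

# Assumed substitution map: each letter maps to all characters tried in its place.
SUBSTITUTIONS = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s$5"}


def variants(word: str) -> set[str]:
    """All spellings of `word` obtainable with the substitution map."""
    options = [SUBSTITUTIONS.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*options)}


candidates = variants("password")
print(len(candidates))           # 54
print("p@ssw0rd" in candidates)  # True
```

With this map, "password" turns into only 54 candidates, a number a computer tries in a fraction of a second.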
Entropy
Password strength is usually not measured by just the space of possible keys. “Entropy” is a term that is also used in cybersecurity, where it is a way to estimate how difficult a password is to crack in the real world. It is measured in bits and calculated by taking the base-2 logarithm (“log2”) of the keyspace. Don’t worry about the math, you won’t need to do it, but if you really want to, you can type the formula into most search engines to solve it. For example, try “log2(26^8)” in www.google.com.
So, what does entropy mean in practice? Using the entropy calculation result, the password strength can be measured with this chart:
< 28 bits = very weak; might keep out family members
28-35 bits = weak; should keep out most people, often enough for desktop login passwords
36-59 bits = reasonable; fairly secure passwords for network and company passwords
60-127 bits = strong; can be good for guarding financial information
128+ bits = very strong; often overkill
In our previous example, the entropy of the 8-character password would be:
log2(26^8) = 37.6 bits
A 6-character password would have an entropy of:
log2(26^6) = 28.2 bits
In the example above, the entropy of the 8 random characters (37 bits) barely makes it into the reasonable category. A password of 15 characters (70 bits) would be strong, but still not in the very strong category. The exponential nature of these calculations means that making a password longer can hugely influence the difficulty of cracking it.
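The entropy figures above can be reproduced with a few lines of Python. The function names are illustrative, and the way `rate` handles fractional values between the bands of the chart (for example 35.5 bits) is an assumption of this sketch.

```python
import math


def entropy_bits(charset_size: int, length: int) -> float:
    """log2 of the keyspace, i.e. log2(charset_size ** length)."""
    return length * math.log2(charset_size)


def rate(bits: float) -> str:
    """Map an entropy value to the categories of the chart above."""
    if bits < 28:
        return "very weak"
    if bits < 36:
        return "weak"
    if bits < 60:
        return "reasonable"
    if bits < 128:
        return "strong"
    return "very strong"


for length in (6, 8, 15):
    bits = entropy_bits(26, length)
    print(f"{length} lowercase characters: {bits:.1f} bits ({rate(bits)})")
```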
Entropy, although useful, is not a tell-all metric. A password of just 8 characters should still be considered mostly insecure; we’ll cover why in the section below.
How passwords are cracked
To see how passwords are cracked, we must first understand how they are protected.
Passwords are usually stored in a database, “encrypted” with something called a one-directional hash algorithm. A hash algorithm is a function that transforms your plain text password into coded text. It only works in one direction: a text hashed with a secure hash algorithm cannot be reversed even though the algorithm is known. The resulting output of your password being processed by the hash algorithm is known as a hash. A hash is a string of characters made of letters and numbers, representing the encoded version of the original input text.
A hash is not strictly speaking encryption. Encryption is a two-way process that can be reversed, whereas hashing is a one-way street. If the hash algorithm works as intended, the output value cannot be reversed.
You can read more on cryptographic hash functions on Wikipedia
Let's look at an example of a well-known hash algorithm that is now considered insecure: MD5.
Using MD5, if you hash the text “password” you would get an encoded output hash of “5f4dcc3b5aa765d61d8327deb882cf99”.
A tiny change in the original text will create a totally different hashed value. This is another valuable function of a hash. You cannot estimate what the hash will be based on the input.
For example, an MD5 hash for “passwore” is “a826176c6495c5116189db91770e20ce”, which does not resemble the hash for “password”.
When a user is logging in, their clear text password is hashed and the resulting hash is compared to the hash value (encoded version of the user password) stored in the database. If they match, the user is allowed to log in.
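The hashing and login check described above can be tried out with Python's standard `hashlib` module. This is only a sketch of the idea: the helper names `md5_hex` and `check_login` are made up for the example, and MD5 is used solely because it is the algorithm discussed here, not because it is suitable for real password storage.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """MD5 hash of a text, as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# What the database would store instead of the plain text password.
stored_hash = md5_hex("password")
print(stored_hash)  # 5f4dcc3b5aa765d61d8327deb882cf99


def check_login(attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return hmac.compare_digest(md5_hex(attempt), stored)


print(check_login("password", stored_hash))  # True
print(check_login("passwore", stored_hash))  # False
```

Note that the site never needs to store or compare the plain text password itself, only its hash.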
With all of this protection (plain text passwords obfuscated into unpredictable code through an irreversible algorithm), how can passwords be hacked? What makes older hash algorithms like MD5 vulnerable?
A hashed value (the encoded version of the original password) can be precomputed and stored in a huge list of passwords. These files are called rainbow tables and are easily found on the internet. For example, you can easily find a file with all hashes created with the MD5 algorithm for any password up to 10 characters long containing any mixture of lowercase letters and numbers (abcdefghijklmnopqrstuvwxyz0123456789).
Rainbow tables are created by running password dictionaries (actual dictionary words as well as lists of commonly-used passwords) found on the internet through hash algorithms. The resulting encoded hashes are lined up against their corresponding original plaintext password inputs, and then used to hack into user accounts.
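The principle of precomputing hashes can be shown with an ordinary Python dictionary. This is a simplified lookup table rather than a real rainbow table, which uses a more compact structure to save storage space, and the five-entry password list is only a stand-in for the huge dictionaries mentioned above.

```python
import hashlib

# A tiny stand-in for a password dictionary found on the internet.
COMMON_PASSWORDS = ["123456", "123456789", "qwerty", "password", "iloveyou"]


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Precompute once: hash -> original password.
LOOKUP = {md5_hex(p): p for p in COMMON_PASSWORDS}

# Later, a hash from a leaked database is reversed by a simple lookup.
leaked_hash = "5f4dcc3b5aa765d61d8327deb882cf99"
print(LOOKUP.get(leaked_hash))  # password
```

The expensive hashing work is done once in advance, and after that every leaked hash that appears in the table is "cracked" instantly.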
Salted and Hashed
Does this mean that hashing is obsolete for most passwords? No! The good news is that programmers have several clever techniques they can use to make it difficult to find the hash for a password. A hash function is often applied many times to increase the time it takes to check a password and thus delay brute force attacks.
Another way programmers protect their users' passwords is to use a technique known as "salting". Here, a special value, called “salt”, is added to the clear text after the user enters their password, but before the hashing is applied, to prevent the use of a pre-computed list of hashes (rainbow tables). When applied, a salt value makes this rainbow table file unusable and is a really easy way to make those types of attacks unfeasible.
For example using the MD5 hash algorithm, if we add a randomised salt “xhsr2d” to the cleartext “password” we get MD5(“xhsr2dpassword”), which is “ebf20a6c99eccaefa0bf4d88a5bd3456”. This is a totally different output than the original MD5-produced hash of "password" found above, which any hacker can find in a rainbow table on the internet. By salting the password, even if we had a rainbow table with hashes for every possible password with lowercase letters and maximum length of 10, we couldn’t find the match for the hash and would have defeated the use of pre-computed hashes.
What salting means in practice is that attackers must brute force every password to successfully hack the encoded hashes in a database.
Brute forcing passwords means using the power of computers to programmatically try different candidate passwords, hashing each one, until a password with a matching hash value is found. Hashing functions also rely on how hard it is to create two different messages that produce the same hash value. This is called a collision. If a hash function can be forced to generate the same values for different input data, it is considered broken.
Brute forcing passwords can be really hard. With modern computers you can calculate thousands of hashes per second. Graphics cards in modern computers are specialised hardware and they are really proficient at calculating hashes. They can in some cases reach billions of hashes per second.
However, a long password of 15 lowercase characters can have 1,677,259,342,285,725,925,376 different variations, and computing that many hashes with the speed of 1 billion hashes per second will take a little over 50,000 years. The calculation time drops down to approximately 3.5 minutes for a password with a length of 8 characters.
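These time estimates follow directly from dividing the keyspace by the guessing speed. The Python sketch below redoes the arithmetic, assuming 1 billion hashes per second and a 365-day year; the function name is illustrative.

```python
HASHES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seconds_to_try_all(charset_size: int, length: int) -> float:
    """Time needed to hash every possible password once."""
    return charset_size ** length / HASHES_PER_SECOND


long_years = seconds_to_try_all(26, 15) / SECONDS_PER_YEAR
short_minutes = seconds_to_try_all(26, 8) / 60

print(f"15 lowercase characters: about {long_years:,.0f} years")
print(f"8 lowercase characters: about {short_minutes:.1f} minutes")
```

On average an attacker finds the password after trying half of the keyspace, so the expected time is half of these worst-case figures. The gap between minutes and tens of thousands of years stays the same.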
You, as a user of an online service, usually can’t make sure that the possible preventive measures have been used by the developers. This means you should assume that rainbow tables might be used against your passwords. In practice, that means you should prefer longer passwords as they make the use of most rainbow tables impractical. Just think of the example above: by adding 7 characters to your password, you can make the life of a hacker a LOT harder!