III.
Protecting your identity

What do websites know about you?

You control the information you give to the websites you visit. You shouldn’t give sites more information than they need to provide the service you want. If you’re not expecting a site to deliver anything to your home address, you should probably not give out your address just because the site asks for it. The same applies to your phone number, full name or even your email address. The less information you give to websites, the less you’ll lose if someone sells that information or gains access to your account.

Passwords are usually the first obstacle in the way of an attack. In later chapters, we’ll introduce other elements that can also prevent your accounts from being accessed by attackers. This defence-in-depth approach is a common practice in cybersecurity. It means that you have multiple layers protecting the asset you want to protect. If one layer of defence breaks, for example your password, the attackers still don’t have free access to your information.

Enumeration attacks and user names

Enumeration attacks are used by attackers to establish valid user names as a first step in gaining a user's login credentials. Common areas for this type of attack include user login and "forgot password" functions. For example, a site can in some cases unintentionally leak information about its members by giving out different messages for login attempts based on whether the username or email exists.

In most cases this information itself is not damaging. However, if anyone can verify the existence of an account on a dating or escort service, that information can be used against the individuals.

Enumeration attacks can also be used to target phishing attacks at individuals more precisely, and they make brute force attacks easier by reducing the number of possible targets. We will cover definitions of phishing and brute force shortly.

Enumeration attacks are not limited to login forms. They can be used with password reset or account creation as well. Account creation, for instance, usually prevents duplicate accounts and will tell a user trying to create one that an account with the provided email or username already exists.
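The difference between a login form that leaks account existence and one that doesn't can be sketched in a few lines of Python. This is a simplified illustration, not a real login system: the USERS dictionary, the email addresses and both function names are made up for the example, and passwords are kept in plain text only to keep the sketch short.

```python
# Hypothetical in-memory user store; a real site would query a database
# and would never keep passwords in plain text.
USERS = {"alice@example.com": "correct horse battery staple"}


def leaky_login(email: str, password: str) -> str:
    """Returns a different message depending on whether the account exists."""
    if email not in USERS:
        return "No account with that email address."
    if USERS[email] != password:
        return "Wrong password."
    return "Welcome!"


def uniform_login(email: str, password: str) -> str:
    """Returns the same failure message whether or not the account exists."""
    if USERS.get(email) != password:
        return "Invalid email or password."
    return "Welcome!"


# An attacker probing the leaky form learns which address is registered.
print(leaky_login("alice@example.com", "guess"))
print(leaky_login("bob@example.com", "guess"))

# The uniform form gives the same answer for both addresses.
print(uniform_login("alice@example.com", "guess"))
print(uniform_login("bob@example.com", "guess"))
```

With the first function, the two failure messages tell an attacker which email addresses are worth attacking. With the second, a failed attempt reveals nothing about whether the account exists.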

The next step in gaining access to a user's login details is to find out the password. That's what we will learn about next.

Passwords

Most websites you use identify you by a username and password combination. Because we as online users have to use dozens or even hundreds of different services, we generally try to create memorable passwords. The problem with easy-to-remember passwords is that they’re usually also easy to guess for attackers. Given that the most common passwords in 2020 were 123456, 123456789, qwerty and password, it is not hard to hack a huge number of accounts just by trying these passwords with random email addresses.

The most common passwords of 2020

123456
123456789
qwerty
password
1234567
12345678
12345
iloveyou
111111
123123

The most common password requirements have also been poor over the years, and have taught people to create bad passwords. These requirements have pushed people towards passwords that are hard to remember but easy for computers to crack.

The webcomic XKCD sums up password security really well.

A good password doesn’t necessarily have to contain obscure symbols to be secure. Generally, the longer the password is, the more secure it is. Requirements that force you to include symbols, lowercase and uppercase characters and numbers in passwords do not usually take into account how hard the passwords actually are to crack. The requirements make the keyspace (the number of possible passwords of a given length) larger, although the same could be achieved by requiring a longer password.

Let’s take a look at an example:

If the possible character set you can use for a password is lowercase characters in the English alphabet, i.e., 26 characters, and the length of the password is 6 random characters, the keyspace would be calculated as:

26⁶= 308,915,776 possible passwords

Whereas if the password is 8 random characters long, the keyspace would be:

26⁸= 208,827,064,576 possible passwords

As you can see, the keyspace is dramatically larger if the length of the password is just 2 letters longer. Let’s take another example of additional characters and add uppercase characters to the mix, doubling the number of characters usable for the password.

52⁶= 19,770,609,664 possible passwords

You immediately notice that the keyspace is a lot bigger than what we started with. However, it is not even close to the keyspace for the longer password with just lowercase letters. How about we add some general symbols, i.e. !, “, #, ¤, %, &, /, (, ) and =. This means the number of characters available is now 62.

62⁶= 56,800,235,584 possible passwords

With the addition of the 10 extra characters in the allowed character set, we almost tripled the available keyspace. At the same time we made the password much harder to remember. The keyspace is still a lot smaller than with the example password of 8 lowercase characters.
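If you'd like to check these numbers yourself, the keyspace calculation is a single exponentiation. The short Python sketch below reproduces the four examples above; the function name keyspace is just a label chosen for this illustration.

```python
def keyspace(charset_size: int, length: int) -> int:
    """Number of possible passwords of the given length."""
    return charset_size ** length


# (character set size, password length) for the four examples in the text
examples = [(26, 6), (26, 8), (52, 6), (62, 6)]

for charset_size, length in examples:
    print(f"{charset_size}^{length} = {keyspace(charset_size, length):,}")

# Two extra lowercase letters beat a 6-character password
# drawn from the larger 62-character set.
print(keyspace(26, 8) > keyspace(62, 6))
```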

A purely random password also defeats other types of attacks such as dictionary attacks. Dictionary attacks use the fact that users try to create memorable passwords by using dictionary words, names, or variations of them. The second edition of the 20-volume Oxford English Dictionary contains entries for 171,476 words, so if you use a password with just one word you can think of it as using a keyspace of 170,000. Often these words are not used as is, but are modified in a way that retains their memorability.

To take the example even further, let’s see how large the keyspace is by using just plain English words. Let’s assume we randomly select 3 English words from the dictionary and create a password with the combination. The calculation would be:

170,000³= 4,913,000,000,000,000 possible passwords

A password such as this would be much more memorable and, if we’re just concerned about brute forcing, much harder to crack as well. However, there are reasons why the keyspace might not be as large as one would initially assume. Most English speakers only use a vocabulary of about 3,000 words, and passwords chosen by hand reflect that. Also, people are really bad at being random. What feels like a random combination of words to us might easily be affected by the context, our thoughts or previously selected words. If you use a password made of just words, use a generator instead of relying on the human concept of randomness.
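As a rough idea of what such a generator does, here is a minimal Python sketch. It uses the standard library's secrets module, which is designed for security-sensitive random choices. The word list, the function name and the separator are assumptions made for this example; a real generator would draw from a list of several thousand words, not eight.

```python
import secrets

# Hypothetical, tiny word list for illustration only. A real generator
# would use a list of several thousand words.
WORDS = ["apple", "bridge", "candle", "dolphin",
         "ember", "forest", "garden", "harbor"]


def generate_passphrase(word_count: int = 3, separator: str = "-") -> str:
    """Picks words independently and uniformly at random from WORDS."""
    return separator.join(secrets.choice(WORDS) for _ in range(word_count))


print(generate_passphrase())

# The keyspace is the list size raised to the number of words,
# so with this toy list it is only 8^3 = 512.
print(len(WORDS) ** 3)
```

Because every word is picked by the computer, the result isn't influenced by context or by the words chosen before it.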

Entropy

Password strength is usually not measured by just the space of possible keys. “Entropy” is a term that is also used in cybersecurity, where it is a way to measure the assumed difficulty of cracking a password in the real world. Entropy is calculated by taking the base-2 logarithm (“log2”) of the keyspace. Don’t worry about the math, you won’t need to do it, but if you really want to, you can type the formula into most search engines to solve it. For example, try “log2(26^8)” in www.google.com.

So, what does entropy mean in practice? Using the entropy calculation result, the password strength can be measured with this chart:

< 28 bits = very weak; might keep out family members

28-35 bits = weak; should keep out most people, often enough for desktop login passwords

36-59 bits = reasonable; fairly secure passwords for network and company passwords

60-127 bits = strong; can be good for guarding financial information

128+ bits = very strong; often overkill

In the example above, the entropy of the 8 random characters (37 bits) barely makes it into the reasonable category. A password of 15 characters (70 bits) would be strong, but still not in the very strong category. The exponential nature of these calculations means that making a password longer can hugely influence the difficulty of cracking it.
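For those who do want to see the math, the calculation can be sketched in Python as follows. The function names are made up for this illustration, and the category boundaries are taken from the chart above, with the assumption that fractional values between two bands (for example 35.5 bits) fall into the lower band.

```python
import math


def entropy_bits(charset_size: int, length: int) -> float:
    """log2 of the keyspace, written as length * log2(charset_size)."""
    return length * math.log2(charset_size)


def strength(bits: float) -> str:
    """Maps an entropy value to the categories of the chart in the text."""
    if bits < 28:
        return "very weak"
    if bits < 36:
        return "weak"
    if bits < 60:
        return "reasonable"
    if bits < 128:
        return "strong"
    return "very strong"


for length in (6, 8, 15):
    bits = entropy_bits(26, length)
    print(f"{length} lowercase characters: {bits:.1f} bits, {strength(bits)}")
```

For 8 and 15 random lowercase characters this gives about 37.6 and 70.5 bits, which the text rounds down to 37 and 70.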

Entropy, although useful, is not a tell-all metric. A password of just 8 characters can still be considered mostly insecure – we’ll cover why in the section below.

How passwords are cracked

To see how passwords are cracked, we must first understand how they are protected.

Passwords are usually stored in a database, “encrypted” with something called a one-directional hash algorithm. Strictly speaking, hashing is not encryption: a hash algorithm transforms your plain text password into coded text, but it only works in one direction and there is no key that turns the output back into the input. A text hashed with a secure hash algorithm cannot be reversed even though the algorithm is known. The output of your password being processed by the hash algorithm is known as a hash. A hash is a string of letters and numbers representing the encoded version of the original input text.

Let's look at an example of a well-known hash algorithm that is now considered insecure: MD5.

Using MD5, if you hash the text “password” you would get an encoded output hash of “5f4dcc3b5aa765d61d8327deb882cf99”.

A tiny change in the original text will create a totally different hashed value. This is another valuable property of a hash. You cannot estimate what the hash will be based on the input.

For example, an MD5 hash for “passwore” is “a826176c6495c5116189db91770e20ce”, which does not resemble the hash for “password”.
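You can try this yourself with Python's built-in hashlib module. The helper name md5_hex is just a label for this sketch, and MD5 is used only because it is the example in the text, not because it is suitable for protecting passwords.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 hash of the text, as a 32-character hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("password"))  # 5f4dcc3b5aa765d61d8327deb882cf99
print(md5_hex("passwore"))  # one letter changed, unrelated-looking output
```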

When a user is logging in, their clear text password is hashed and the resulting hash is compared to the hash value (encoded version of the user password) stored in the database. If they match, the user is allowed to log in.
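The login check described above can be sketched as follows. This is a deliberately simplified illustration: the user name, the STORED_HASHES dictionary and the function names are invented for the example, and it uses a single unsalted SHA-256 hash where a real service would use a dedicated password hashing scheme.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Simplified: one unsalted SHA-256 pass, for illustration only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical database table: only the hash is stored, never the password.
STORED_HASHES = {"alice": hash_password("correct horse battery staple")}


def check_login(username: str, password: str) -> bool:
    """Hashes the submitted password and compares it to the stored hash."""
    stored = STORED_HASHES.get(username)
    if stored is None:
        return False
    # compare_digest takes the same time whether or not the values match
    return hmac.compare_digest(stored, hash_password(password))


print(check_login("alice", "correct horse battery staple"))  # True
print(check_login("alice", "password"))  # False
```

Note that the site never needs to know the original password after registration. It only needs to be able to repeat the same hashing step and compare the results.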

With all of this protection (plain text passwords obfuscated into unpredictable code through an irreversible algorithm), how can passwords be hacked? What makes older hash algorithms like MD5 vulnerable?

Salted and Hashed

Does the vulnerability of older algorithms like MD5 mean that hashing is obsolete for passwords? No! The good news is that programmers have several clever techniques they can use to make it difficult to find the password behind a hash. A hash function is often applied many times to increase the time it takes to check a password and thus slow down brute force attacks.

Another way programmers protect their users' passwords is a technique known as "salting". Here, a special value called a “salt” is added to the clear text after the user enters their password but before the hashing is applied. This prevents the use of pre-computed lists of hashes, known as rainbow tables. A salt makes such pre-computed tables unusable and is a really easy way to make those types of attacks infeasible.

For example using the MD5 hash algorithm, if we add a randomised salt “xhsr2d” to the cleartext “password” we get MD5(“xhsr2dpassword”), which is “ebf20a6c99eccaefa0bf4d88a5bd3456”. This is a totally different output than the original MD5-produced hash of "password" found above, which any hacker can find in a rainbow table on the internet. By salting the password, even if we had a rainbow table with hashes for every possible password with lowercase letters and maximum length of 10, we couldn’t find the match for the hash and would have defeated the use of pre-computed hashes.
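Both techniques, salting and applying the hash many times, can be combined in a few lines of Python. Note that this sketch does not use the MD5 construction from the example above: it swaps in PBKDF2 with SHA-256 from the standard hashlib module, which handles the salt and the repetition for you. The iteration count, the 16-byte salt size and the function names are assumptions chosen for the illustration, not recommendations.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 100_000  # assumed value for this sketch


def hash_new_password(password: str) -> Tuple[bytes, bytes]:
    """Creates a fresh random salt and returns (salt, hash) for storage."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Repeats the hashing with the stored salt and compares the results."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_new_password("password")
salt_b, hash_b = hash_new_password("password")
print(hash_a != hash_b)

print(verify_password("password", salt_a, hash_a))
print(verify_password("passwore", salt_a, hash_a))
```

The salt is stored next to the hash and is not a secret. Its job is only to make sure that a table of hashes computed in advance doesn't match anything in the database.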

In practice, salting means that attackers must brute force each password individually to crack the hashes in a database.

Brute forcing passwords means using the power of computers to programmatically try different candidate passwords, hashing each one, until a password with a matching hash value is found. The security of hash functions also relies on how hard it is to create two different messages that produce the same hash value. Such a pair of messages is called a collision. If a hash function can be forced to generate the same value for different input data, it is considered broken.
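To make the idea of brute forcing concrete, here is a toy Python sketch that tries every lowercase password up to a given length against an unsalted MD5 hash. The function names and the target password "cab" are made up for this example, and real cracking tools are far faster and more sophisticated than this loop.

```python
import hashlib
import itertools
import string
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, max_length: int) -> Optional[str]:
    """Tries every lowercase candidate up to max_length characters."""
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(letters)
            if md5_hex(guess) == target_hash:
                return guess
    return None  # not found within the searched keyspace


# A 3-letter password is found after at most 26 + 26^2 + 26^3 = 18,278 guesses.
print(brute_force(md5_hex("cab"), max_length=3))  # cab
```

Every extra character multiplies the number of guesses by 26 in this example, which is why the length of the password matters so much.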

You, as a user of an online service, usually can’t check whether the developers have used these preventive measures. This means you should assume that rainbow tables might be used against your passwords. In practice, that means you should prefer longer passwords, as they make the use of most rainbow tables impractical. Just think of the example above: by adding 7 characters to your password, you can make the life of a hacker a LOT harder!
