16.3 Cryptography

Being able to send a secure message has been an important tool in warfare and affairs of state for centuries. Although the techniques for doing so have evolved over the centuries, at a basic level we are trying to get a message from one actor (we will call her Alice), to another (Bob), without an eavesdropper (Eve) intercepting the message (as shown in Figure 16.9). As you may recall, such an intercept in the field of computer security is referred to as a man-in-the-middle attack. These placeholder names are in fact the conventional ones for these roles in cryptography.

Figure 16.9 Alice transmitting to Bob with Eve intercepting the message

The figure illustrates the transmission and interception of a message between three people named Alice, Bob, and Eve.
Figure 16.9 Full Alternative Text

Eavesdropping could allow someone to get your credentials while they are being transmitted. This means even if your PIN was shielded, and no one could see it being entered over your shoulder, it can still be seen as it travels across the Internet to its destination. Back in Chapter 1, you learned how a single packet of data can be routed through any number of intermediate locations on its way to the destination. If that data is not somehow obfuscated, then getting your password is as simple as reading the data during one of the hops.

A cipher is a message that is scrambled so that it cannot easily be read, unless one has some secret knowledge. This secret is usually referred to as a key. The key can be a number, a phrase, or a page from a book. What is important in both ancient and modern cryptography is to keep the key a secret between the sender and the receiver. Alice encrypts the message (encryption) and Bob, the receiver, decrypts the message (decryption), both using their keys as shown in Figure 16.10. Eavesdropper Eve may see the scrambled message (cipher text), but cannot easily decrypt it, and must perform statistical analysis to see patterns in the message to have any hope of breaking it.

Figure 16.10 Alice and Bob using symmetric encryption to transmit messages

The figure illustrates the transmission and interception of a message between three people named Alice, Bob, and Eve using encryption and decryption.
Figure 16.10 Full Alternative Text

To ensure secure transmission of data, we must draw on mathematical concepts from cryptography. In the next subsection several ciphers are described that provide insight into how patterns are sought in seemingly random messages to encrypt and decrypt messages. The mathematics of the modern ciphers are described at a high level, but in practice the implementations are already provided inside of web servers and your web browsers.

16.3.1 Substitution Ciphers

A substitution cipher is one where each character of the original message is replaced with another character according to the encryption algorithm and key.

Caesar

The Caesar cipher, named for and used by the Roman Emperor, is a substitution cipher where every letter of a message is replaced with another letter, by shifting the alphabet over an agreed number (from 1 to 25).

The message HELLO, for example, becomes KHOOR when a shift value of 3 is used as illustrated in Figure 16.11. The encoded message can then be sent through the mail service to Bob, and although Eve may intercept and read the encrypted message, at a glance it appears to be a non-English message. Upon receiving the message, Bob, knowing the secret key, can then transcribe the message back into the original by shifting back by the agreed-to number.

Figure 16.11 Caesar cipher for shift value of 3. HELLO becomes KHOOR
The figure illustrates Caesar cipher for shift value of 3.

Even without a computer, this cipher is quite vulnerable to attack since there are only 26 possible deciphering possibilities. Even if a more complex version is adopted with each letter switching in one of 26 ways, the frequency of letters (and sets of two and three letters) is well known, as shown in Figure 16.12, so a thorough analysis with these tables can readily be used to break these codes manually. For example, if you noticed the letter J occurring most frequently, it might well be the letter E.

Figure 16.12 Letter frequency in the English alphabet using Oxford English Dictionary summary9
The figure illustrates the Letter frequency in the English alphabet using Oxford English Dictionary.

Any good cipher must, therefore, try to make the resulting cipher text letter distribution relatively flat so as to remove any trace of the telltale pattern of letter distributions. Simply swapping one letter for another does not do that, necessitating other techniques.

Modern Block Ciphers

Building on the basic ideas of replacing one character with another and aiming for a flat letter distribution, block ciphers encrypt and decrypt messages using an iterative replacing of a message with another scrambled message using 64 or 128 bits at a time (the block).

The Data Encryption Standard (DES) and its replacement, the Advanced Encryption Standard (AES) are two-block ciphers still used in web encryption today. These ciphers are not only secure, but operate with low memory and computational requirements, making them feasible for all types of computer from the smallest 8-bit devices all the way through to the 64-bit servers you use.

While the details are fascinating to a mathematically inclined reader, the details are not critical to the web developer. What happens in a broad sense is that the message is encrypted in multiple rounds where in each round the message is permuted and shifted using intermediary keys derived from the shared key and substitution boxes. The DES cipher is broadly illustrated in Figure 16.13. Decryption is identical but uses keys in the reverse order.

Figure 16.13 High-level illustration of the DES cipher
The figure is a High-level illustration of the D E S cipher.

Triple DES (perform the DES algorithm three times) is still used for many applications and is considered secure. What’s important is that the resulting letter frequency of the cipher text is almost flat, and thus not vulnerable to classic cryptanalysis.

All of the ciphers we have covered thus far use the same key to encode and decode, so we call them symmetric ciphers. The problem is that we have to have a shared private key. The next set of ciphers do not use a shared private key.

16.3.2 Public Key Cryptography

The challenge with symmetric key ciphers is that the secret must be exchanged before communication can begin. How do you get that information exchanged? Over the phone? In an email? Through the regular mail? Moreover, as you support more and more users, you must disclose the key again and again. If any of the users lose their key, it’s as though you’ve lost your key, and the entire system is broken. In a network as large as the Internet, private key ciphers are impractical.

Public key cryptography (or asymmetric cryptography) solves the problem of the secret key by using two distinct keys: a public one, widely distributed and another one, kept private. Algorithms like the Diffie-Hellman key exchange, published in 1976, provide the basis for secure communication on the WWW.10 They allow a shared secret to be created out in the open, despite the presence of an eavesdropper Eve.

Note

To adequately describe public key cryptography, the next sections describe some mathematic manipulations. You can skip over this section and still use public key cryptography, although you may want to return later to understand what’s happening under the hood.

Diffie-Hellman Key Exchange

Although the original algorithm is no longer extensively used, the mathematics of the Diffie-Hellman key exchange are accessible to a wide swath of readers, and subsequent algorithms (like RSA) apply similar thinking but with more complicated mathematics.

The algorithm relies on properties of the multiplicative group of integers modulo a prime number (modulo being the term to describe the remainder left when dividing), as illustrated in Figure 16.14, and relies on the power associative rule, which states that:

gab=gba
Figure 16.14 Illustration of a simple Diffie-Hellman Key Exchange for g=2 and p=11
The figure illustrates a simple Diffie-Hellman Key Exchange for g equals 2 and p equals 11.

The essence of the key exchange is that this gab can be used as a symmetric key for encryption, but since only ga and gb are transmitted the symmetric key isn’t intercepted.

To set up the communication, Alice and Bob agree to a prime number p and a generator g for the cyclic group modulo p.

Alice then chooses an integer a, and sends the value ga mod p to Bob.

Bob also chooses a random integer b and sends gb mod p back to Alice.

Alice can then calculate (gb)a mod p since she has both a and gb and Bob can similarly calculate (ga)b mod p. Since gab=gba, Bob and Alice now have a shared secret key that can be used for symmetric encryption algorithms such as DES or AES.

Eve, having intercepted every communication, only knows g, p, ga mod p, and gb mod p but cannot easily determine a, b, or gab. Therefore the shared encryption key has been successfully exchanged and secure encryption using that key can begin!

RSA

The RSA algorithm, named for its creators Ron Rivest, Adi Shamir, and Leonard Adleman, is the public key algorithm underpinning the HTTPS protocol used today on the web.11 In this public key encryption scheme, much like the Diffie-Hellman system, Alice and Bob exchange a function of their private keys and each, having a private key, determine the common secret used for encryption/decryption. It uses powers and modulo to encode the message and relies on the difficulty of factoring large integers to keep it secure. Its implementation is included in most operating systems and browsers, making it ubiquitous in the modern secure WWW. The algorithm itself would take pages to describe and is left as an exercise for interested readers.

Pro Tip

Drawing from number theory, the DH key exchange depends on the fact that numbers are difficult to factor. To understand some of the restrictions, consider some concepts from number theory.

When we say g is a generator, we mean that if you take all the powers of g modulo some number p, you get all values {1, 2,. ,p1}MathType@MTEF@5@5@+=feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaaeaaaaaaaaa8qacaGG7bGaaGymaiaacYcacaqGGcGaaGOmaiaacYcacaqGUaGaaeiiaiaab6cacaqGGaGaaeOlaiaabccacaqGSaGaaeiCaiabgkHiTiaaigdacaGG9baaaa@4357@. Consider p=11 and g=2. The first 11 powers of 2 mod 11 are 2,4,8,5,10,9,7,3,6,1. Since 2 generates all of the integers, it’s a generator and we can consider the DH Key exchange example as illustrated in Figure 16.14.

16.3.3 Digital Signatures

Cryptography is certainly useful for transmitting information securely, but if used in a slightly different way, it can also help in validating that the sender is really who they claim to be, through the use of digital signatures.

A digital signature is a mathematically secure way of validating that a particular digital document was created by the person claiming to create it (authenticity), was not modified in transit (integrity), and to prevent sender from denying that she or he had sent it (nonrepudiation). In many ways, digital signatures are analogous to handwritten signatures that theoretically also imbue the document they are attached to with authenticity, integrity, and nonrepudiation.

For instance, to sign a digital document, the process shown in Figure 16.15 can be employed. It uses public and private key pairs for validating the digital signature within the document. As you can see in step , Bob needs access to Alice’s public key. This step is also required for HTTPS (which is covered in the next section), and makes use of certificate authorities as the mechanism for transmitting public keys. Notice that the flow in Figure 16.15 doesn’t encrypt the message itself; it is only a way of validating the identity of the sender.

Figure 16.15 Illustration of a digital signature flow

The image contains sequence of steps from the Message is written until hash is calculated.