5.7 Validating User Input

User input must never be trusted. It could be missing. It might be in the wrong format. It might even contain JavaScript or SQL intended to cause some type of havoc. Thus, user input must almost always be tested for validity.

5.7.1 Types of Input Validation

The following list indicates most of the common types of user input validation.

Required information. Some data fields just cannot be left empty. For instance, the principal name of things or people is usually a required field. Other fields such as emails, phones, or passwords are typically required values.
Correct data type. While some input fields can contain any type of data, other fields, such as numbers or dates, must follow the rules for their data type in order to be considered valid.
Correct format. Some information, such as postal codes, credit card numbers, and social security numbers have to follow certain pattern rules. It is possible, however, to go overboard with these types of checks. Try to make life easier for the user by making user input forgiving. For instance, it is an easy matter for your program to strip out any spaces that users entered in their credit card numbers, which is a better alternative to displaying an error message when the user enters spaces into the credit card number.
Comparison. Some user-entered fields are considered correct or not in relation to an already inputted value. Perhaps the most common example of this type of validation is entering passwords: most sites require the user to enter the password twice and then a comparison is made to ensure the two entered values are identical. Other forms might require a value to be larger or smaller than some other value (this is common with date fields).
Range check. Information such as numbers and dates have infinite possible values. However, most systems need numbers and dates to fall within realistic ranges. For instance, if you are asking a user to input her birthday, it is likely you do not want to accept January 1, 214 as a value; it is quite unlikely she is 1800 years old! As a result, almost every number or date should have some type of range check performed.
Custom. Some validations are more complex and are unique to a particular application. Some custom validations can be performed on the client side. For instance, the author once worked on a project in which the user had to enter an email (i.e., it was required), unless the user entered both a phone number and a last name. This required conditional validation logic that spanned multiple fields. Other custom validations require information on the server. Perhaps the most common example is user registration forms that will ensure that the user doesn’t enter a login name or email that already exists in the system.
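
Several of the validation types listed above can be sketched in a few lines of JavaScript. The following is only an illustrative sketch, not a complete validator: the function name validateSignup, the field names (name, age, cardNumber, password, passwordConfirm), the 0 to 130 age range, and the 13 to 19 digit card length are all assumptions made for this example.

```javascript
// Illustrative sketch only: field names and limits are assumptions.
function validateSignup(input) {
  const errors = [];

  // Required information: the name must be present and not blank.
  if (typeof input.name !== "string" || input.name.trim() === "") {
    errors.push({ field: "name", message: "Name is required." });
  }

  // Correct data type: the age must be a whole number.
  const ageText = input.age == null ? "" : String(input.age).trim();
  const age = Number(ageText);
  if (ageText === "" || !Number.isInteger(age)) {
    errors.push({ field: "age", message: "Age must be a whole number." });
  } else if (age < 0 || age > 130) {
    // Range check: reject values that are numbers but not realistic.
    errors.push({ field: "age", message: "Age must be between 0 and 130." });
  }

  // Correct format, applied forgivingly: strip spaces instead of
  // rejecting a card number just because the user typed them.
  const cardNumber = String(input.cardNumber ?? "").replace(/\s/g, "");
  if (!/^\d{13,19}$/.test(cardNumber)) {
    errors.push({ field: "cardNumber", message: "Card number must contain 13 to 19 digits." });
  }

  // Comparison: the two password entries must be identical.
  if (input.password !== input.passwordConfirm) {
    errors.push({ field: "passwordConfirm", message: "The two passwords do not match." });
  }

  return { valid: errors.length === 0, errors, cleaned: { cardNumber } };
}
```

Returning the cleaned card number alongside the errors lets the rest of the program use the forgiving version of the input rather than what the user literally typed.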

5.7.2 Notifying the User

What should your pages do when a validation check fails? Clearly, the user needs to be notified, but how? Most user validation problems need to answer the following questions:

What is the problem? Users do not want to read lengthy messages to determine what needs to be changed. They need to receive a visually clear and textually concise message. These messages can be gathered together in one group and presented near the top of a page and/or beside the fields that generated the errors. Figure 5.34 illustrates both approaches.

Figure 5.34 Displaying error messages

Where is the problem? Some type of error indication should be located near the field that generated the problem. Some sites will do this by changing the background color of the input field or by placing an asterisk or even the error message itself next to the problem field. Figure 5.35 illustrates the latter approach.

Figure 5.35 Indicating where an error is located

If appropriate, how do I fix it? For instance, don’t just tell the user that a date is in the wrong format; tell him or her what format you are expecting, such as “The date should be in yy/mm/dd format.”
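
The three questions above (what, where, and how to fix) map naturally onto a small data structure. The sketch below is one hypothetical way to organize error messages so they can be shown both as a group near the top of the page and beside each field; the property names (field, label, message, hint) and both function names are assumptions made for this example.

```javascript
// Each error records where the problem is (field and label), what it is
// (message) and, if appropriate, how to fix it (hint). All names here
// are hypothetical.

// Builds the concise grouped text that could appear at the top of a page.
function buildErrorSummary(errors) {
  if (errors.length === 0) return "";
  const lines = errors.map((e) => {
    const hint = e.hint ? " " + e.hint : "";
    return "- " + e.label + ": " + e.message + hint;
  });
  const noun = errors.length === 1 ? "problem" : "problems";
  return "Please fix " + errors.length + " " + noun + ":\n" + lines.join("\n");
}

// Groups messages by field so each can be displayed beside its input.
function errorsByField(errors) {
  const grouped = {};
  for (const e of errors) {
    const text = e.message + (e.hint ? " " + e.hint : "");
    if (!grouped[e.field]) grouped[e.field] = [];
    grouped[e.field].push(text);
  }
  return grouped;
}
```

In a real page, the summary text and the per-field messages would be inserted into the document by other code; this sketch covers only the organization of the messages.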

5.7.3 How to Reduce Validation Errors

Users dislike having to do things again, so if possible, we should construct user input forms in a way that minimizes user validation errors. The basic technique for doing so is to provide the user with helpful information about the expected data before she enters it. Some of the most common ways of doing so include:

Using pop-up JavaScript alert (or other pop-up) messages. This approach is fine if you are debugging a site still in development mode or you are trying to re-create the web experience of 1998, but it is an approach that you should generally avoid for almost any other production site. Probably the only usability justification for pop-up error messages is for situations where it is absolutely essential that the user see the message. Destructive and/or consequential actions such as deleting or purchasing something might be an example of a situation requiring pop-up messages or confirmations.
Provide textual hints to the user on the form itself, as shown in Figure 5.36. These could be static or dynamic (i.e., only displayed when the field is active). The placeholder attribute in text fields is an easy way to add this type of textual hint (though it disappears once the user enters text into the field).

Figure 5.36 Providing textual hints

Using tool tips or pop-overs to display context-sensitive help about the expected input, as shown in Figure 5.37. These are usually triggered when the user hovers over an icon or perhaps the field itself. These pop-up tips are especially helpful for situations in which there is not enough screen space to display static textual hints. However, hover-based behaviors will generally not work in environments without a mouse (e.g., mobile or tablet-based browsers). Apart from the plain tool tip that browsers display for the title attribute, HTML has traditionally provided little support for styled tool tips or pop-overs, so you will generally have to use a JavaScript-based library to add this behavior to your pages. The examples shown in Figure 5.37 were added via the Bootstrap framework introduced in Chapter 4.

Figure 5.37 Using tool tips

Another technique for helping the user understand the correct format for an input field is to provide a JavaScript-based mask, as shown in Figure 5.38. The advantage of a mask is that it provides immediate feedback about the nature of the input and typically will force the user to enter the data in a correct form. While HTML5 does provide support for regular expression checks via the pattern attribute, if you want visible masking, you will have to use a JavaScript-based library to add masking to your input fields.

Figure 5.38 Using input masks

Providing sensible default values for text fields can reduce validation errors (as well as make life easier for your user). For instance, if your site is in the .uk top-level domain, make the default country for new user registrations the United Kingdom.
Finally, many user input errors can be eliminated by choosing a better data entry type than the standard <input type="text">. For instance, if you need the user to enter one of a small number of correct answers, use a select list or radio buttons instead. If you need to get a date from the user, then use either the HTML5 <input type="date"> type (or one of the many freely available JavaScript-enabled custom versions). If you need a number, use the HTML5 <input type="number"> input type.
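
The input mask technique mentioned in the list above can be illustrated with a small formatting function. This is a minimal sketch rather than a replacement for a masking library: the function name maskPhone and the ten-digit (###) ###-#### phone format are assumptions chosen for the example.

```javascript
// Minimal masking sketch (hypothetical helper): reformats whatever the
// user has typed so far into the shape (###) ###-####.
function maskPhone(raw) {
  // Keep only digits and ignore anything beyond the tenth digit.
  const digits = String(raw).replace(/\D/g, "").slice(0, 10);
  if (digits.length === 0) return "";
  if (digits.length <= 3) return "(" + digits;
  if (digits.length <= 6) {
    return "(" + digits.slice(0, 3) + ") " + digits.slice(3);
  }
  return "(" + digits.slice(0, 3) + ") " + digits.slice(3, 6) + "-" + digits.slice(6);
}
```

In a browser, a function like this would typically be called from an input event handler that writes the result back into the field, which is how the mask gives the user immediate feedback while typing.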

Pro Tip

One of the most common problems facing the developers of real-world web forms is how to ensure that the user submitting the form is actually a human and not a bot (i.e., a piece of software). The reason for this is that automated form bots (often called spam bots) can flood a web application form with hundreds or thousands of bogus requests.

This problem is generally solved by a test commonly referred to as a CAPTCHA (which stands for Completely Automated Public Turing test to tell Computers and Humans Apart) test. Most forms of CAPTCHA ask the user to enter a string of numbers and letters that are displayed in an obscured image that is difficult for a software bot to understand. Other CAPTCHAs ask the user to solve a simple mathematical question or trivia question.

We think it is safe to state that most human users dislike filling in CAPTCHA fields, as quite often the text is unreadable for humans as well as for bots. They also present a usability challenge for users with visual disabilities. As such, in general you should only add CAPTCHA capabilities to a form if your site provides some type of free service or provides a mechanism for users to post content that will appear on the site. Both of these scenarios are especially vulnerable to spam bots.

If you do need CAPTCHA capability, there are a variety of third-party solutions. Perhaps the most common is reCAPTCHA, which is a free service available from Google. It comes with a JavaScript component and PHP libraries that make it quite easy to add to any form.

5.7.4 Where to Perform Validation

Validation can be performed at three different levels. With HTML5, the browser can perform basic validation. Figure 5.39 illustrates how HTML5 validation appears in the browser. For instance, in the following example, the required and pattern attributes are used to validate a date in the format ##/##/####.

<input type="text" pattern="\d{1,2}/\d{1,2}/\d{4}" required>

Figure 5.39 HTML5 validation in the browser (a browser window with text boxes)

What is that strange set of text used in this pattern attribute? It is a regular expression, a widely used pattern language that is supported in a wide variety of programming languages and platforms for matching and manipulating text. Regular expressions will be covered in a bit more detail in Chapter 9.
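
The same regular expression can be tried out in JavaScript. The sketch below is only an approximation of what the browser does: the helper name matchesPattern is hypothetical, and the expression is wrapped in ^(?: ... )$ to imitate the way the pattern attribute must match the entire field value rather than just part of it.

```javascript
// The same expression used in the pattern attribute above. Backslashes
// are doubled because this is a JavaScript string literal.
const datePattern = "\\d{1,2}/\\d{1,2}/\\d{4}";

// Hypothetical helper that imitates the whole-value matching of the
// pattern attribute by anchoring the expression at both ends.
function matchesPattern(value, pattern) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const samples = ["12/25/2024", "1/2/2024", "12/25/24", "99/99/9999"];
for (const sample of samples) {
  console.log(sample, matchesPattern(sample, datePattern));
}
```

Note that a value such as 99/99/9999 satisfies the expression even though it is not a real date. A pattern checks only the shape of the input, so a range check is still needed.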

However, since the validation that can be achieved in HTML5 is quite basic (and there is no real control over how it looks and behaves), many web applications do not use this level of validation and instead perform validation in the browser using JavaScript (covered in Chapters 8–11). If you wish to disable browser validation (perhaps because you want a unified visual appearance to all validations), you can do so by adding the novalidate attribute to the form element:

<form id="sampleForm" method="..." action="..." novalidate>

The advantage of validation using JavaScript is that it reduces server load and provides immediate feedback to the user. The immediacy of JavaScript validation dramatically improves the user experience of data-entry forms, and for this reason it is an essential feature of any real-world web site that uses forms.

Unfortunately, JavaScript validation cannot be relied on: for instance, it might be turned off in the user’s browser, or a request might be sent to the server without using your form at all. For these reasons, validation should always be done on the server side as well. Indeed, server-side validation is arguably the most important since it is the only validation that is guaranteed to run. Figure 5.40 illustrates the interaction of the different levels of validation.

Figure 5.40 Levels of validation (two user forms, the browser, and the server, with the steps involved)

Tools Insight

Version Control

Managing your code base is a challenge for anyone who has worked in web development. You may even have adopted some personal strategies to keep backups of your work in case you break something and need to go back. Version control systems (also known as software configuration management or SCM systems) provide a way to manage all your changes for you, so that you can easily go back, track changes, and work with multiple people at the same time on the same files. That is, version control systems are analogous to a database that stores snapshots of your code (see Figure 5.41).

Figure 5.41 Version control software


There are a variety of popular version control systems available. Some make use of a centralized storage system; Concurrent Versions System (CVS) and Subversion (SVN) are two such systems that were especially popular a decade ago. Other version control systems make use of a distributed storage system (i.e., multiple computers can act as storage systems); the most popular of these is Git, which will be the focus of this tools insight.

Git (like all distributed version control systems) is a software program that, much like your web server, runs on your computer or optionally can be installed on a remote server. Popular services like GitHub and Bitbucket offer easy-to-use web-based remote repositories (described below) but should not be conflated with Git, the software that you can download, install, and run yourself for free.

Git has a reputation for being daunting to learn, and indeed we do not have the space in the book to fully teach Git. The Git website provides a comprehensive online book (https://git-scm.com/book/en/v2) that can help you learn Git; the Git Tower website also has an excellent online book (https://www.git-tower.com/learn/git/ebook). If Git seems too difficult to master, you might consider using version control as part of a larger Integrated Development Environment (IDE). However, we certainly recommend taking the time to learn Git. It has become an essential tool for all developers, and many employers expect their software developers to be proficient with it. Similarly, making use of an online remote repository such as GitHub for sharing your code has become an important part of contemporary web development workflow, and employers often expect their potential hires to have some of their code (for instance, school assignments) publicly accessible.

Once you download and install Git (and are granted access to a university, corporate, or personal repository), you can create your first repository and start interacting with the system. Git is a command-line tool, so using it involves using the Terminal (Mac) or the Command Prompt or PowerShell (Windows). In other words, learning Git involves learning a variety of different commands, visualized in Figure 5.42. We have summarized many of the key Git commands below. There are GUI tools that integrate these commands into larger IDE applications.

Figure 5.42 Git workflow


Create a Repository

You normally have a repository for each project. Use the command line to navigate to a folder you want to work in (the working folder) and type:

git init

This will create a local repository (or “repo”) by creating a folder named .git inside the working folder. It’s best to leave this folder and its content alone, since Git uses it to store data (see 1 in Figure 5.42).

Once your repository is created, you will typically be performing add/commit/push commands as the main actions using Git.

Adding Files

Whether you initialized Git on an empty folder or one with files already present, the files that you wish to track must be added explicitly. Each time you create a file in your working directory you must also add it to Git using the Git add command as follows.

git add <filename>

To add everything that has been changed to the commit you would enter:

git add .

It should be mentioned that the add command doesn’t change the repository. All it does is tell Git to include these files in the next commit. That is, it adds them to the index, which is a staging area for modified files ready to be committed (the 2 in Figure 5.42).

Committing Files

While saving files in your working folder is important (how else can you test them in the browser?), it does not save them in the repository. To update the local repository to reflect all the changes you’ve made to a file (or files), you must commit them (3 in Figure 5.42) using the commit command.

The -m flag and message used with the command allows you to attach a message with the commit; this can provide a brief summary of changes made so that later a log can be examined to determine what changes people made to code where and when. For a new file, we can commit it easily with:

git commit <filename> -m "Initial commit message"

This records the file in the local repository as a new commit and moves the HEAD of the repository to that commit. In practice, files are often committed together, which is a reminder that the HEAD is a reference to the commit itself, not to any particular file.
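
The relationship between the working folder, the index, and the local repository can be made concrete with a toy model. The JavaScript below is emphatically not how Git is implemented; it is a simplified sketch in which every name (createToyRepo, save, add, commit) is hypothetical, intended only to show that saving a file changes nothing in the repository, that add copies the file into the staging area, and that commit takes a snapshot of the staging area and moves HEAD.

```javascript
// Toy model only: real Git stores compressed, hashed objects and
// tracks far more information than this.
function createToyRepo() {
  return {
    workingFolder: {}, // filename -> content, as saved on disk
    index: {},         // staging area: content for the next commit
    commits: [],       // the local repository: a list of snapshots
    head: null,        // position of the latest commit, or null
  };
}

// Saving a file touches only the working folder.
function save(repo, filename, content) {
  repo.workingFolder[filename] = content;
}

// Like "git add <filename>": stage the current content of the file.
function add(repo, filename) {
  if (!(filename in repo.workingFolder)) {
    throw new Error("No such file: " + filename);
  }
  repo.index[filename] = repo.workingFolder[filename];
}

// Like "git commit -m <message>": snapshot the index and move HEAD.
function commit(repo, message) {
  const snapshot = { message, files: { ...repo.index }, parent: repo.head };
  repo.commits.push(snapshot);
  repo.head = repo.commits.length - 1;
  return repo.head;
}
```

In this model, editing a file after it has been added leaves the staged copy unchanged, which mirrors why a file that is modified again must be added again before the next commit.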

Pushing Files to Remote Repository

Git is a locally installed version control system. To collaborate with other developers on a single project, your files must be stored on a remote repository, which is a Git repository hosted on the internet (for instance, on GitHub or Bitbucket) or on a network accessible to the other developers. Just as you had to initialize a folder for Git once, you have to tell Git once to add a remote repository using the remote add command.

git remote add origin <url>

The word “origin” becomes a shortname that we can use to reference the remote repository in subsequent commands. If you have already run the clone command, this origin shortname will already be defined and associated with the URL used in the clone.

Once a remote repository has been added, you can push (4 in Figure 5.42) your master branch (see below) up to the remote repository with the command:

git push origin master

However, if other people have also pushed revised content to the server, Git will reject your push. You will have to fetch their work, merge it into yours, and then do the push. This is where Git shows its true power (but also becomes much more complicated).

Information Commands

There are several commands (see 5) that return information to you but do not change the local or remote version of files. For instance, to see the current status of your files (i.e., which need updating) type:

git status

Over time, each file builds up a history of the changes made to it through successive commits, which can be viewed via the log command.

git log <filename>

Branches

One of the most important features of Git is its ability to maintain multiple versions of your files. A Git branch (see 6) allows you to change content in isolation from the default master branch. For instance, imagine you are working on a production application, and you need to make a hotfix to the application to remove a bug while your coworker wants to develop a new feature. Knowing you might have to change many files, you could each create a new branch and make your changes within that branch while the rest of your team continues work on the main branch. This way you can commit changes to your own branch as you need to, knowing that you are not impacting the rest of the team. Once each of you is satisfied with the changes on your own branch, you would merge your branches into the master branch. A branch is created using the branch command:

git branch <branchname>

This only creates a new branch. To use it for subsequent adds and commits, you will need to use the checkout command.

Checking Out Files

The checkout command (see 7) provides a lot of power and flexibility. It can be used to switch to a different branch.

git checkout <branchname>

What exactly does this do? The files in the local working folder will be updated to match the version in the selected branch. The HEAD pointer in the local repository will now also point to the last commit on this branch.
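
The effect of the branch and checkout commands on the HEAD pointer and the working folder can also be sketched as a toy model. As before, this is not Git's actual implementation: the names (createBranchingRepo, branch, checkout, commitAll) are hypothetical, the model starts with an empty root commit for simplicity, and it skips the staging step by committing the whole working folder.

```javascript
// Toy model only. A branch is just a name that points at a commit, and
// HEAD records which branch is currently checked out.
function createBranchingRepo() {
  const root = { id: 0, message: "root", files: {}, parent: null };
  return {
    commits: [root],
    branches: { master: 0 }, // branch name -> id of its last commit
    head: "master",          // name of the checked-out branch
    workingFolder: {},
  };
}

// Like "git branch <branchname>": creates the branch but does not
// switch to it.
function branch(repo, name) {
  repo.branches[name] = repo.branches[repo.head];
}

// Like "git checkout <branchname>": moves HEAD and updates the working
// folder to match the last commit on that branch.
function checkout(repo, name) {
  if (!(name in repo.branches)) throw new Error("Unknown branch: " + name);
  repo.head = name;
  repo.workingFolder = { ...repo.commits[repo.branches[name]].files };
}

// Simplified commit: snapshots the whole working folder onto the
// current branch (real Git commits the index instead).
function commitAll(repo, message) {
  const id = repo.commits.length;
  const parent = repo.branches[repo.head];
  repo.commits.push({ id, message, files: { ...repo.workingFolder }, parent });
  repo.branches[repo.head] = id;
  return id;
}
```

Because a commit advances only the branch that HEAD names, work committed on a hotfix branch in this model leaves the master branch, and the files seen after checking out master, untouched.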

The checkout command can also be used to restore files from the local repository to your working folder. The checkout takes the most recent committed version of the file on the current branch (the HEAD) and overwrites your local file, if it exists. Once you have checked out a file, your edits are made locally and are only added back to the repository through a commit command.

The ability to roll back code to a previous version is one of the reasons version control is so popular. If you want to go back to the most recently committed version in the repository (the HEAD), you simply check out the file again to overwrite it with the version in the repository (assuming you have not already staged your changes with the add command).

git checkout <filename>

If you want to roll back to a particular version, use the git log command to identify the hash of its commit and then roll back to that hash:

git checkout <hash-of-version-to-checkout> <filename>

Git provides the revert and reset commands as well for undoing changes, which are not covered here.

Merge

Once a branch is complete and you want to merge its changes into its parent, you check out the parent branch and run the merge command (see 8).

git checkout master
git merge <branchname>

This process doesn’t always happen smoothly; when multiple people are merging onto the same parent branch, Git might not be able to merge your changes by itself. In such a case, you may have to use the diff command to help you merge the changes together manually.

git diff <filename>

The cryptic output returned from the git diff command shows the changes between the current local file and the staged version (which is the same as the HEAD version if you have not staged anything since your last commit), using the + symbol and green to show which lines are added and a - symbol and red to show deletions. In Chapter 13 we illustrate another (easier) way of using git diff, accessed through an Integrated Development Environment.

Pulls, Fetches, Clones, and Forks

Sometimes you will want to retrieve specific branches, or all the branches, from the remote repository, which can be accomplished via the clone, fetch, and pull commands (see 9). We won’t be covering all these commands in this already too-long tools insight section; the clone command, however, is quite useful even for beginners with Git.

You often want to begin a project by copying files from an existing remote repository, which can be done via the clone command.

git clone <url>

For instance, you can clone the start project files for this book by using the command:

git clone https://github.com/MountRoyalCSIS/funwebdev-projects-start.git

This copies (downloads) all the data and files for this repository from the publicly accessible online GitHub repository into a new folder (named after the repository) inside the current folder on your machine.

Finally, one of the key benefits of online remote repositories such as GitHub is the ability to fork another online repository. Forking a remote repository is essentially copying one remote repository into a different remote repository. This is an especially valuable way for a developer (or a set of developers) to experiment with a remote repository without modifying the original remote repository. Developers often use forking as a way to use someone else’s project as the starting point for their own project.