The Visible Part of an HTML Document

4.6 Character Entities in HTML

Although character entities have become much less important in HTML with the gradual spread of Unicode (especially UTF-8), I'll still touch on them briefly here because there are still good reasons to use them. In Section 4.5, you learned about different character sets, and, by now, you know how to specify the character set of a document with the charset attribute of a <meta> tag in the HTML document head. For example, if you’ve specified ISO-8859-1 or ISO-8859-15 as the character set and want to write the word shalom (שלום) in Hebrew characters, you’re likely to be unsuccessful:

<meta charset="iso-8859-1">
  ...
<p>Shalom: שלום</p>

The output is likely to be a cluster of cryptic characters instead of שלום. The simplest solution is to change the character set to UTF-8:

<meta charset="UTF-8">
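For context, here is a minimal sketch of a complete document that declares UTF-8 and contains the Hebrew word directly. It assumes the file itself is also saved as UTF-8 in your editor; the title text and the lang attribute are illustrative choices, not part of the original example.

```html
<!doctype html>
<html lang="en">
  <head>
    <!-- Declare the encoding early in the head; the file must also be saved as UTF-8 -->
    <meta charset="UTF-8">
    <title>Shalom</title>
  </head>
  <body>
    <!-- With UTF-8, the Hebrew letters can be written directly in the source -->
    <p>Shalom: שלום</p>
  </body>
</html>
```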

There might also be a different problem with this example: How do you type the word שלום in your editor? If you don’t happen to have a Hebrew keyboard in front of you or a virtual keyboard with Hebrew characters, a simple and quick solution is to use HTML character entities. This is how you write the word “Shalom” in Hebrew using character entities:

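<p>Shalom: &#1513;&#1500;&#1493;&#1501;</p>

The four entities are written in reading order (shin, lamed, vav, final mem); the browser displays Hebrew characters from right to left on its own.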

4.6.1 Structure of a Character Entity in HTML

As you’ve seen from the four Hebrew characters, an HTML entity starts with an ampersand (&) and ends with a semicolon (;). You have two options for writing a character as an entity:

Numeric entities
You write these in the form &#nnn;, where nnn stands for the decimal Unicode code point of the character. This form is useful when you can’t enter the character via the keyboard. You can also write the entity in the form &#xhhh;, where hhh is the hexadecimal value of the code point. The notation without x is the decimal notation.

Named entities
This is an easier-to-remember name that has been agreed on for the character. You may have already seen examples such as &lt; (lt = less than) or &gt; (gt = greater than), where people prefer to use the named entities. Alternatively, you can use a numeric entity instead of the named entity. For example, &lt;, &#60;, and &#x3C; all produce the same < sign (less-than sign).
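As a quick illustration of the two notations, the following sketch writes the same characters in decimal, hexadecimal, and (where one exists) named form. The selection of characters is my own; the Hebrew letter shin has no named entity, so only its numeric forms appear.

```html
<!-- Each paragraph shows one character several times, once per notation -->
<p>Less than: &#60; &#x3C; &lt;</p>
<p>Copyright: &#169; &#xA9; &copy;</p>
<p>Euro: &#8364; &#x20AC; &euro;</p>
<!-- Shin is U+05E9: hexadecimal 5E9 equals decimal 1513 -->
<p>Shin: &#1513; &#x5E9;</p>
```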

Masking HTML-Specific Characters

If your body text contains special characters that are part of the HTML syntax, you should mask these characters by using the appropriate entity. For example, the following line is likely to cause display problems in a web browser:

<p>Mexico City<Tokyo and Mumbai>London</p>

Web browsers would only output Mexico CityLondon here, because the part between < and > is treated as an HTML element (even though no such element exists). Although you could work around this problem by inserting a space after the < character, you should use the appropriate entities to be on the safe side. This is where the named entities come in handy:

<p>Mexico City&lt;Tokyo and Mumbai&gt;London</p>

The ampersand (&) is another such character; you should write it as &amp; in body text.
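A short sketch of how this looks in practice, using made-up sample text: the first paragraph displays a plain ampersand, and the second shows how to display an entity literally, for instance when you write about HTML itself.

```html
<!-- Displays: Research & Development -->
<p>Research &amp; Development</p>
<!-- Displays: Write &lt; to get a less-than sign -->
<p>Write &amp;lt; to get a less-than sign</p>
```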

In addition, if you want to use a double quote within an HTML attribute value, you should replace " with &quot;, as in the following example:

<img src="#" alt="Cover of book &quot;CSS from A to Z&quot;">

In the alt attribute, &quot; was used to mask the " character. If you used a literal " instead of the named entity &quot; here, the web browser would end the attribute value at the first inner quote, and the rest of the alternative text would be “swallowed.”
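As an alternative sketch, HTML also allows attribute values to be delimited with single quotes, in which case double quotes can appear inside the value without masking. Which variant you prefer is a matter of style; the example reuses the img element from the listing, and the added "author's edition" text is made up for illustration.

```html
<!-- Single-quoted attribute value: the inner double quotes need no masking -->
<img src="#" alt='Cover of book "CSS from A to Z"'>
<!-- Inside a single-quoted value, mask a literal single quote with &#39; instead -->
<img src="#" alt='Cover of book "CSS from A to Z" (author&#39;s edition)'>
```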

More Unicode Numbers

You can look up the Unicode number of any character you need at www.unicode.org/charts.