4.6    Character Entities in HTML

While the importance of character entities in HTML has diminished considerably with the gradual spread of Unicode (especially UTF-8), I should still touch on them briefly here because there are still good reasons to use them. In Section 4.5, you learned about different character sets, and, by now, you know how to specify the character set of a document with the charset attribute of a <meta> tag in the HTML document head. For example, if you’ve specified ISO-8859-1 or ISO-8859-15 as the character set and want to write the word shalom (שלום) in Hebrew characters, you’re likely to be unsuccessful:

<meta charset="iso-8859-1">
...
<p>Shalom: שלום</p>

The output is likely to be a cluster of cryptic characters instead of שלום. The simplest solution would be to change the character set to UTF-8:

<meta charset="UTF-8"> 

There might also be a different problem with this example: How do you type the word שלום in your editor? If you don’t happen to have a Hebrew keyboard in front of you or a virtual keyboard with Hebrew characters, the simple and quick solution might be to use HTML character entities. This is how you write the word “Shalom” in Hebrew using character entities:

<p>Shalom: <bdo dir="ltr">&#1501;&#1493;&#1500;&#1513;</bdo></p>
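The line above lists the letters in the order they appear on screen from left to right and relies on <bdo> to keep them in that order. As a minimal alternative sketch, assuming you would rather store the letters in reading order (shin, lamed, vav, final mem), you can leave out <bdo> entirely, because web browsers lay out Hebrew letters from right to left on their own. The second paragraph shows the same four numbers in hexadecimal notation:

```html
<!-- Letters in reading order: shin, lamed, vav, final mem -->
<p>Shalom: &#1513;&#1500;&#1493;&#1501;</p>
<!-- The same letters as hexadecimal references -->
<p>Shalom: &#x5E9;&#x5DC;&#x5D5;&#x5DD;</p>
```

Storing the letters in reading order also keeps the word intact when someone copies it from the page or searches for it.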

4.6.1    Structure of a Character Entity in HTML

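As you’ve seen from the four Hebrew characters, an HTML entity starts with the & character and ends with a semicolon. Between these two characters, you have two options for specifying the character: a named entity such as &lt; or a numeric reference to the character’s Unicode number such as &#1513;.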
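To compare the notations side by side, here is a short sketch that writes one character in each of them. The euro sign and the amount are my own choice of example and don’t come from the text above. Note that the numeric form can be written either as a decimal number or, with a leading x, as a hexadecimal number:

```html
<!-- Three spellings of the same character, the euro sign (U+20AC) -->
<p>Named entity: 10 &euro;</p>
<p>Decimal number: 10 &#8364;</p>
<p>Hexadecimal number: 10 &#x20AC;</p>
```

All three paragraphs should show the same euro sign in the web browser, regardless of the character set specified for the document.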

Masking HTML-Specific Characters

Especially if you use special characters in your body text that are part of the HTML syntax, you should mask these characters by using the appropriate entity. For example, the following line is likely to cause display problems in a web browser:

<p>Mexico City<Tokyo and Mumbai>London</p> 

A web browser would output only Mexico CityLondon here because the text between < and > is interpreted as an HTML tag (even though no such element exists). Although you could work around this problem with a space after the <, you should use the appropriate entities to be on the safe side. This is where named entities come in handy:

<p>Mexico City&lt;Tokyo and Mumbai&gt;London</p> 

The ampersand character & is also part of the HTML syntax because it introduces an entity, so you should write it as &amp; in body text.
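The following sketch shows the ampersand being masked in two typical situations. The sample texts are made up for illustration:

```html
<!-- A literal ampersand in body text -->
<p>Research &amp; Development</p>
<!-- To show an entity itself as text, mask its ampersand -->
<p>Write &amp;lt; to display the &lt; character.</p>
```

The second paragraph is displayed as “Write &lt; to display the < character.” because only the first entity has its ampersand masked.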

In addition, if you want to use a double quote within an HTML attribute value, you should replace " with &quot;, as in the following example:

<img src="#" alt="Cover of book &quot;CSS from A to Z&quot;"> 

In the alt attribute, &quot; was used as a masking character for ". If you used " instead of the named entity &quot; here, the attribute value would end at the first inner ", and the text after it would be “swallowed” by the web browser.
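As a possible alternative, and not something the example above requires, HTML also lets you enclose an attribute value in single quotes. In that case, the double quote no longer needs to be masked, but the apostrophe does. The second alt text is a made-up example:

```html
<!-- Single quotes delimit the value, so " needs no masking -->
<img src="#" alt='Cover of book "CSS from A to Z"'>
<!-- Inside single quotes, the apostrophe must be masked instead -->
<img src="#" alt='The book&#39;s cover'>
```

Using &quot; within double-quoted values remains the safer habit if you don’t want to think about which quotation mark encloses a given attribute.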

More Unicode Numbers

You can look up the Unicode number of any character you need at www.unicode.org/charts.