PageRank is an algorithm, published by Google’s founders in 1998.4 This early discussion of search engines and the thinking behind them is essential reading for anyone interested in search engines. The PageRank algorithm is the basis for search engine ranking, although in practice it has been modified and changed in the decades since its publication. According to the authors, PageRank is
a method for computing a ranking for every web page based on the graph of the web.
The graph of the web being referred to looks at the hyperlinks between web pages, and how that creates a web of pages with links. Links into a site are termed backlinks, and those backlinks are key to determining which pages are more important. Sites with thousands of backlinks (from other domains) are surely more important than sites with only a handful of backlinks into them.
The remainder of this section describes the mathematics of the PageRank algorithm. While most web developers do not need to master this math, it is helpful for understanding why search engine optimization works.
The simplified definition of a site n’s PageRank is:
In this formula the PageRank of a page, that is, , is determined by collecting every page v that links to , and summing their PageRanks divided by the number of links out . In order to apply this algorithm, we begin by assigning each page the same rank: . With these initial ranks in place, we can iteratively calculate the updated PageRank using the formula above.
To illustrate this concept look at the four web pages listed in Figure 18.4. Intuitively A is the most important since all other pages link to it, but to formalize this notion, let’s calculate the actual PageRank. To begin, assign the default rank to all pages:

Beginning with Page A, we calculate the updated PageRank.
Since all three other pages link to A, we must substitute all three components in our sum.
We know the page ranks of B, C, D and can count the links out of each , , and .
Since B has A and C backlinking to it:
C has only D backlinking to it so:
Finally, D has B and C backlinks so:
Figure 18.5 shows the four pages with PageRanks after two iterations. See if you can arrive at the same values for iteration 2. Interestingly, Page B has a higher calculated rank than A, defying our initial guess.

In practice the links can change between iterations as well if the page was re-crawled so the formula must be dynamically interpreted every time. Interestingly, the updated ranks always sum together to make one. This is not the case if one of the pages was a rank sink, that is, a page with no links as shown in Figure 18.6 where Page A has no links to other pages. There you can see with each iteration the total PageRank decreases. A more sophisticated PageRank algorithm introduces a scalar factor to prevent rank sinks.5
