April 14, 2026
The 25 billion dollar eigenvector
In my linear algebra class last week we covered PageRank. I already knew roughly how Google search worked, but I had never actually sat with the math. Once the professor drew the importance matrix on the board and wrote the punchline under it, I stopped taking notes. It was one of those moments where an abstract concept you learned in week 2 turns out to be the reason a trillion-dollar company exists.
I want to write it down while the feeling is still fresh.
The problem in the 90s
In the mid-90s, search engines like AltaVista ranked pages largely by keyword frequency. If you searched for a band name, the engine scanned every page it knew about, counted how many times those words appeared, and showed you the pages with the most matches. That was more or less the whole algorithm.
This was obviously broken. The more unsavory corners of the web figured out within weeks that you could paste the name of a popular album a thousand times into a page, often in white text on a white background, and vault to the top of the results. The engines had no way to tell a good page from a gamed one. The signal they were measuring was too easy to fake.
The insight
Larry Page and Sergey Brin noticed that the web already contained a better signal. Links. When one page links to another, it's a tiny vote of confidence. Not all votes are equal. A link from a trusted, well-linked page should count more than a link from some random blog. And a page with a hundred outgoing links divides its influence across all of them, so each individual link counts less.
You can turn this into a rule that defines itself in terms of itself. A page is important if important pages link to it. The circular definition is a feature, not a bug. If you write it down properly, it collapses into a single equation.
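Spelled out in symbols (my own notation, nothing official), the rule for a single page i reads

    r_i = \sum_{j \to i} r_j / d_j

where the sum runs over every page j that links to i, and d_j is the number of outgoing links on page j. Each inbound link hands over its source's rank, diluted across all of that source's links.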
There's a nicer way to think about it. Imagine a random surfer who starts on some page, clicks a random link, clicks another random link from the next page, and keeps going forever. The pages where the surfer spends the most time are the important ones. That intuition and the circular definition turn out to be exactly the same object.
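I couldn't resist checking this with a quick simulation. A minimal sketch, assuming a made-up four-page web; the graph, the page names, and the step count are all invented for illustration.

    import random

    # Hypothetical link structure: page -> pages it links to.
    links = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["A", "C"],
    }

    def surf(steps=100_000, seed=0):
        rng = random.Random(seed)
        visits = {page: 0 for page in links}
        page = "A"
        for _ in range(steps):
            visits[page] += 1
            page = rng.choice(links[page])  # click a random outgoing link
        # Fraction of time spent on each page approximates its importance.
        return {p: v / steps for p, v in visits.items()}

    print(surf())  # A and C dominate; nothing links to D, so its share is 0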
The math
Here is where it got good. You build an importance matrix. If you have n pages, the matrix is n by n. The entry in row i, column j is the fraction of page j's importance that flows into page i. If page j has three outgoing links, each of those links carries 1/3 of j's rank. Every column sums to 1, which makes the matrix column-stochastic.
Now let v be the vector of all page ranks. The importance rule says that the rank of each page equals the sum of the ranks flowing into it. Write that out and you get Av = v. The rank vector is an eigenvector of the importance matrix with eigenvalue 1.
This is the moment I stopped in class. Ranking the entire web was not a clever search problem. It was an eigenvector problem. Every algorithm in the textbook for finding eigenvectors was suddenly a candidate for indexing the internet.
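Here is that construction on the same made-up four-page web from the surfer sketch, with numpy doing the work. Only the construction rule comes from the definition above; the graph itself is invented.

    import numpy as np

    pages = ["A", "B", "C", "D"]
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
    idx = {p: i for i, p in enumerate(pages)}

    n = len(pages)
    A = np.zeros((n, n))
    for j, targets in links.items():
        for t in targets:
            A[idx[t], idx[j]] = 1 / len(targets)  # row = target, column = source

    # The rank vector is the eigenvector of A with eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(A)
    v = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1))])
    v = v / v.sum()  # normalize so the ranks sum to 1
    print(dict(zip(pages, v.round(3))))

The ranks it prints match the visit frequencies from the surfer simulation, which is exactly the equivalence the professor was driving at.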
Fixing the matrix
The real web breaks the nice math in two ways. Some pages have no outgoing links, so their column in the matrix is all zeros and the matrix stops being stochastic. And even when it is stochastic, you can have disconnected clusters of pages that trap the random surfer, which messes up the uniqueness of the steady state.
Page and Brin patched this with two moves. First, any dangling page with no outgoing links gets its column replaced by a column of 1/n entries, as if the surfer teleports to a random page. Second, on every click, with some small probability p, the surfer ignores the links on the current page and jumps to a completely random page anyway. Google used p = 0.15; the complementary probability d = 1 - p = 0.85, the chance of actually following a link, is what's usually called the damping factor.
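In code, both patches together are only a few lines. Same invented four pages as before, except D is now dangling so the first fix actually fires.

    import numpy as np

    pages = ["A", "B", "C", "D"]
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}  # D is dangling
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)

    A = np.zeros((n, n))
    for j, targets in links.items():
        if targets:
            for t in targets:
                A[idx[t], idx[j]] = 1 / len(targets)
        else:
            A[:, idx[j]] = 1 / n  # patch 1: a dangling page teleports uniformly

    p = 0.15  # teleport probability; d = 1 - p = 0.85 is the damping factor
    G = (1 - p) * A + p * np.ones((n, n)) / n  # patch 2: occasional random jump

    print(G.sum(axis=0))  # every column still sums to 1, every entry is positive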
The resulting matrix, the Google matrix, is a positive stochastic matrix. That matters because the Perron-Frobenius theorem guarantees it has a unique steady state with all-positive entries. That steady state is the PageRank vector. It exists, it's unique, and you can compute it by repeatedly multiplying a starting vector by the matrix until it stabilizes.
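That "multiply until it stabilizes" procedure is power iteration. A minimal sketch, assuming any positive column-stochastic matrix; the 3-by-3 G here is made up, though it is a genuine Google matrix (p = 0.15) for a three-page graph.

    import numpy as np

    def pagerank(G, tol=1e-10, max_iter=1000):
        n = G.shape[0]
        v = np.full(n, 1.0 / n)  # start with equal rank everywhere
        for _ in range(max_iter):
            v_next = G @ v  # advance the random surfer one click, in bulk
            if np.abs(v_next - v).sum() < tol:  # ranks stopped moving
                return v_next
            v = v_next
        return v

    # Pages 0 -> {1, 2}, 1 -> {0}, 2 -> {0, 1}, with p = 0.15 folded in.
    G = np.array([
        [0.050, 0.900, 0.475],
        [0.475, 0.050, 0.475],
        [0.475, 0.050, 0.050],
    ])
    print(pagerank(G).round(3))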
Why this stuck with me
The idea that Google search, at its mathematical core, is a power iteration finding the dominant eigenvector of a stochastic matrix is the kind of fact I find almost physically satisfying. No machine learning. No heuristics. No magic. A stochastic matrix, a theorem from 1907, and enough compute to multiply it a few hundred times.
The real challenge was never the math. It was scale. Google's actual matrix has billions of rows, and you can't store it or multiply it naively. Every engineering problem Google solved in its early years, from distributed storage to MapReduce, was downstream of wanting to compute this one eigenvector fast and often.
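For what it's worth, the standard trick (textbook material, not Google's actual pipeline) is to never build G at all. The link part of the multiplication is sparse and the teleport part is rank one, so one iteration only needs the raw outlink lists:

    import numpy as np

    def step(v, outlinks, p=0.15):
        """One multiplication by the Google matrix, without storing it.

        v        -- current rank vector, summing to 1
        outlinks -- outlinks[j] lists the page indices that page j links to
        """
        n = len(v)
        new = np.zeros(n)
        dangling = 0.0
        for j, targets in enumerate(outlinks):
            if targets:
                share = (1 - p) * v[j] / len(targets)
                for t in targets:
                    new[t] += share  # rank flowing along real links
            else:
                dangling += v[j]  # dangling rank gets spread uniformly
        # Teleportation plus dangling mass: a rank-one term, same for every page.
        new += (p + (1 - p) * dangling) / n
        return new

    # Hypothetical four-page web, page 3 dangling.
    outlinks = [[1, 2], [2], [0], []]
    v = np.full(4, 0.25)
    for _ in range(50):
        v = step(v, outlinks)
    print(v.round(3))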
PageRank is a reminder that some of the biggest ideas in tech come from applying old math to new systems. Page and Brin didn't invent a theorem. They looked at a problem the world was failing at, recognized that the structure of the web fit the shape of linear algebra, and wrote the translation. Sometimes the most valuable thing you can do is notice that two things are the same.
That's what I love about classes like this. You spend weeks grinding through eigenvalue calculations on toy matrices, and then one example shows up and you realize the thing you've been practicing is worth 25 billion dollars.