Looking-Glass Output Is Not Reality — An Operator's Mental Model

A looking glass is a window. You stand on one side, look through, and see a slice of the network. The slice is real. It is also one router, at one moment, with one route policy. Treat that slice as "the internet's view" and the next three hours of your investigation will be wrong.

I've sat through too many incident calls where somebody pasted a looking-glass output and declared the problem solved, only for the issue to keep happening for another forty-eight hours. The looking glass wasn't lying. It was just answering a different question than the one being asked.

What a looking glass actually returns

A looking glass is, in operational terms, a thin web interface in front of a single BGP-speaking router that lets you query that router's RIB. When you ask "what's the path to 192.0.2.0/24", the router answers with the path it would use: whatever survived its import filters, its RPKI validation if enabled, and its community-based preferences.

The path you see is the path that one router believes, right now. It is not a consensus path. It is not what your prefix looks like from "the internet". It is what one operator's edge router decided this morning.

Three ways the picture diverges from reality

Route policy distorts what you see. If the looking-glass operator has a strict ingress filter that drops paths longer than four AS hops, you won't see paths longer than four hops. They exist. The router behind the looking glass just refused to accept them.
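
A minimal sketch of that effect, assuming a hypothetical set of received paths and the four-hop limit from the example above. The AS numbers come from the documentation range and the function name is made up for illustration; a real router expresses this in its policy language, not in Python.

```python
# Hypothetical paths one router received for the same prefix.
# AS numbers are from the documentation range, not real networks.
received_paths = [
    [64500, 64496],
    [64501, 64502, 64503, 64496],
    [64504, 64505, 64506, 64507, 64496],  # five AS hops
]

MAX_AS_HOPS = 4  # assumed import-policy limit


def apply_import_filter(paths, max_hops=MAX_AS_HOPS):
    """Return only the paths the router would accept under the hop limit."""
    return [path for path in paths if len(path) <= max_hops]


visible = apply_import_filter(received_paths)
hidden = [path for path in received_paths if path not in visible]

print("visible through the looking glass:", visible)
print("announced but never shown:", hidden)
```

The five-hop path was announced to the router, but no query against this looking glass will ever return it.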

Best-path selection masks alternatives. A router learns multiple paths to a prefix and picks one. The looking glass shows you the picked one. If there are five paths and four of them would be picked under slightly different conditions, you don't see them. The single output looks definitive when it isn't.
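
To make the masking concrete, here is a deliberately simplified sketch of best-path selection. It assumes only two decision steps (highest local preference, then shortest AS path); real implementations apply more tie-breakers. The Route class, the neighbor labels, and the values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    as_path: tuple
    local_pref: int
    neighbor: str  # hypothetical label for the session the route came from


def best_path(routes):
    """Pick one route: highest local-pref first, then shortest AS path.

    Real BGP applies further tie-breakers (origin, MED, and so on);
    they are omitted here to keep the illustration small.
    """
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


candidates = [
    Route((64500, 64496), local_pref=100, neighbor="transit-a"),
    Route((64501, 64502, 64496), local_pref=200, neighbor="peer-b"),
    Route((64503, 64496), local_pref=100, neighbor="transit-c"),
]

shown = best_path(candidates)
not_shown = [r for r in candidates if r != shown]

print("looking glass shows:", shown.neighbor, shown.as_path)
print("alternatives it knows but does not show:", [r.neighbor for r in not_shown])
```

The longer path through peer-b wins on local preference alone, so a reader who sees only that one line has no way to tell that two shorter paths were available.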

Time changes the answer. A looking-glass query is a snapshot of one moment. BGP is not a steady-state system; it is a continuously updating event stream. The path you queried at 14:03 may not be the path that was active at 14:01 when the incident report was filed. Re-running the query thirty minutes later can give you a completely different answer for reasons unrelated to the original problem.

The mental model that works

Replace "what does the internet see" with "what does this specific router decide, under this specific policy, at this specific instant". Now the looking glass is a useful tool and not a misleading one.

When you have a routing question, ask it across multiple looking glasses simultaneously. Pick three or four geographically diverse vantage points. Run the same query against each. The answers will not all match. The discrepancies are the data.

If three out of four show one path and one shows another, you've found a localised route divergence, which is what you actually wanted to know. If all four show wildly different paths, you've found a propagation problem that no single router can see alone.
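
The comparison can be sketched as follows, assuming you have already collected one AS path per vantage point by whatever means each looking glass offers. The vantage-point names, the paths, and the verdict labels are hypothetical.

```python
from collections import Counter


def classify(observations):
    """Compare AS paths reported by several vantage points.

    observations maps a vantage-point name to the AS path (a tuple) it returned.
    Returns a verdict, the most common path, and the vantage points that differ.
    """
    counts = Counter(observations.values())
    majority_path, majority_count = counts.most_common(1)[0]
    outliers = sorted(vp for vp, path in observations.items() if path != majority_path)

    if not outliers:
        verdict = "agreement"
    elif majority_count > len(observations) / 2:
        verdict = "localised divergence"
    else:
        verdict = "no consensus"
    return verdict, majority_path, outliers


# Hypothetical results from four looking glasses for the same prefix.
observations = {
    "lg-amsterdam": (64500, 64496),
    "lg-ashburn": (64500, 64496),
    "lg-singapore": (64500, 64496),
    "lg-tokyo": (64501, 64502, 64496),
}

verdict, majority_path, outliers = classify(observations)
print(verdict, majority_path, outliers)
```

The outlier list is the useful output: it tells you which vantage point to look at in detail.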

The tools that aggregate

This is what RIPE RIS, RouteViews, and bgproutes.io exist for. They're not better looking glasses. They're aggregators that watch many vantage points and present the union of what each one sees. That aggregate view is closer to "what the internet sees" than any single looking-glass query.
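
As a rough illustration of what an aggregate view adds, the sketch below summarises per-peer observations for one prefix. The input is hypothetical data shaped as "collector peer to AS path, or None if that peer has no route"; it is not the response format of any of the services above.

```python
def summarise(peer_views):
    """Summarise how a prefix looks across many collector peers.

    peer_views maps a peer name to the AS path (a tuple) it carries for the
    prefix, or None if the peer has no route. Assumes at least one peer.
    """
    seen = {peer: path for peer, path in peer_views.items() if path is not None}
    return {
        "visibility": len(seen) / len(peer_views),    # share of peers with a route
        "distinct_paths": len(set(seen.values())),    # how many different paths exist
        "origins": {path[-1] for path in seen.values()},  # last AS in each path
    }


# Hypothetical peers and paths for one prefix.
peer_views = {
    "peer-1": (64500, 64496),
    "peer-2": (64500, 64496),
    "peer-3": (64501, 64502, 64496),
    "peer-4": (64503, 64496),
    "peer-5": None,  # this peer does not see the prefix at all
}

summary = summarise(peer_views)
print(summary)
```

A missing route at one peer, or a second origin AS appearing in the set, is the kind of signal a single looking glass cannot give you.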

ASPA validation, when more transits implement it, will give you a different kind of consensus: a programmatic answer to "is every hop in this path consistent with the provider authorisations published by the ASes along it". That's a step beyond pure path observation.
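
A heavily simplified sketch of the idea, not the full ASPA verification procedure. ASPA records list, for each customer AS, the providers it authorises. The check below covers only a path that should climb customer to provider at every hop and ignores the peer and downstream cases. The records and AS numbers are hypothetical.

```python
# Hypothetical ASPA records: customer AS -> set of ASes it authorises as providers.
aspa_records = {
    64496: {64500},
    64500: {64501},
}


def check_upstream_path(as_path, records):
    """Check a path in which every hop should go from customer to provider.

    as_path is written nearest AS first and origin last, the way a looking
    glass prints it. Returns "valid", "invalid", or "unknown".
    """
    hops = list(reversed(as_path))  # walk from the origin outwards
    result = "valid"
    for customer, provider in zip(hops, hops[1:]):
        authorised = records.get(customer)
        if authorised is None:
            result = "unknown"  # this AS has published no record
        elif provider not in authorised:
            return "invalid"    # a record exists and this hop contradicts it
    return result


print(check_upstream_path((64501, 64500, 64496), aspa_records))
print(check_upstream_path((64502, 64500, 64496), aspa_records))
print(check_upstream_path((64501, 64500, 64497), aspa_records))
```

The point of the sketch is the shape of the answer: a verdict derived from published records, not from what any one router happens to have selected.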

For now, the operator's discipline is: never quote a single looking-glass output as final. Always cross-reference. Always note the time. Always note which vantage point.
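
One way to make that discipline hard to skip is to record every query as a small structured note. The sketch below is one possible shape; the class name, field names, vantage-point name, and timestamp are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LookingGlassObservation:
    vantage_point: str
    prefix: str
    as_path: tuple
    observed_at: datetime  # always stored in UTC

    def as_note(self):
        """Render a one-line note suitable for pasting into an incident log."""
        path = " ".join(str(asn) for asn in self.as_path)
        stamp = self.observed_at.strftime("%Y-%m-%d %H:%M:%SZ")
        return f"{stamp} {self.vantage_point} {self.prefix} via {path}"


# A fixed, made-up timestamp keeps the example reproducible; in practice
# you would pass datetime.now(timezone.utc) at query time.
obs = LookingGlassObservation(
    vantage_point="lg-frankfurt",
    prefix="192.0.2.0/24",
    as_path=(64500, 64496),
    observed_at=datetime(2024, 1, 15, 14, 3, 0, tzinfo=timezone.utc),
)
print(obs.as_note())
```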

What this means for incident response

The investigation order for any routing issue should be:

One. Pull the customer's prefix announcement state — is it actually being advertised? Direct check against the customer's edge.

Two. Query four geographically diverse looking glasses. Note disagreements.

Three. Pull RIPE RIS or RouteViews for the prefix. The aggregated answer tells you what most of the world is doing with the prefix.

Four. Only then look at specific looking-glass paths in detail, to chase the disagreement that step two found.
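
The four steps above can be sketched as a small driver. Every callable passed in is a hypothetical stand-in for however your team actually checks the customer edge, queries a looking glass, or pulls aggregated data; only the ordering is the point.

```python
def investigate(prefix, vantage_points, is_advertised, query_looking_glass, query_aggregator):
    """Run the investigation in order and return a small report dict."""
    report = {"prefix": prefix}

    # Step one: is the prefix being advertised at all?
    report["advertised"] = is_advertised(prefix)
    if not report["advertised"]:
        report["next"] = "stop: prefix is not being advertised"
        return report

    # Step two: same query against several vantage points, note disagreements.
    paths = {vp: query_looking_glass(vp, prefix) for vp in vantage_points}
    report["paths"] = paths
    report["disagreement"] = len(set(paths.values())) > 1

    # Step three: the aggregated view.
    report["aggregate"] = query_aggregator(prefix)

    # Step four: only now decide where to look in detail.
    if report["disagreement"]:
        report["next"] = "inspect the disagreeing vantage points in detail"
    else:
        report["next"] = "no disagreement found across vantage points"
    return report


# Stubbed data so the sketch runs without touching any network.
fake_paths = {
    "lg-a": (64500, 64496),
    "lg-b": (64500, 64496),
    "lg-c": (64501, 64502, 64496),
    "lg-d": (64500, 64496),
}

report = investigate(
    "192.0.2.0/24",
    vantage_points=list(fake_paths),
    is_advertised=lambda prefix: True,
    query_looking_glass=lambda vp, prefix: fake_paths[vp],
    query_aggregator=lambda prefix: {"peers_with_route": 4, "peers_total": 5},
)
print(report["next"])
```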

Most incidents that "took six hours" took six hours because step four happened in place of steps one through three. The looking glass answered confidently, the answer was read as the whole picture, and the team spent hours chasing the wrong path.

The honest disclaimer

I run looking glasses. I've operated them, I've debugged them, I've watched them break. They are a useful tool when used as a sample, not as a source of truth. The operator who treats them as the latter is the one whose RCA documents say "intermittent issue, root cause not determined". The operator who treats them as the former has a real RCA in three hours.

The looking glass is a window. Open four of them.