Building Where in Time — a two-week solo indie dev journal

By @handleofiron · Published 2026-05-12

Two weeks from "I want to make a Wordle for history" to live launch. Vanilla JavaScript frontend, Express backend, no framework, no build step. 374 historically anchored 360° panoramas generated through a three-stage AI pipeline. Per-era scoring sigma. A handful of mistakes I had to undo. Here's how it went.

Why this game

I'd been playing Wordle for months and GeoGuessr on weekends, and the thing both games do that nothing else does is the once-a-day shared puzzle. Everyone, everywhere, getting the same problem at the same time. The spoiler risk is the social hook. Your group chat lights up because you all played the same puzzle five minutes apart.

GeoGuessr nails the where. Wordle nails the daily ritual. Nobody had cleanly merged the two AND added a year dimension — guess when, not just where. Felt like an obvious gap. The hard part was always going to be the imagery: you can't Street View Hammurabi's Babylon or D-Day on Omaha Beach. But AI image generation had reached the point where a curated panorama could pass for "you are there" if the prompt and the pipeline were tuned right.

So I made it.

The stack

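Boring on purpose: a vanilla JavaScript frontend, an Express backend, no framework, no build step, and Cloudflare R2 for panorama storage.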

The single biggest decision was the no-build-step rule. It's heretical in 2026 — React + TypeScript + Vite is the assumed default — but for a project where the bottleneck is content (scene generation) not engineering, the "edit, refresh, done" loop is a 10× productivity multiplier. There's no Webpack config to debug. There's no bundle to wait for. There's no version-mismatch hell. The whole frontend is three files; you can read them.

I'd make the same call again.

The panorama pipeline

This is the technically interesting part.

Each scene is a 360° panorama. The pipeline runs from scene draft to live game in about two minutes wall-time per scene, at roughly $0.36 in API costs. The stages:

  1. Scene card. A human-written prompt specifying the event, the year, the location (real lat/lon on Earth), the visual anchors that must be identifiable in the frame (a specific landmark, costume era, technological tell), and rough composition guidance.
  2. Cinematic still. gpt-image-2 generates a high-fidelity panoramic still from the prompt. About 90 seconds.
  3. Upscale. Topaz Photo AI (High Fidelity V2, 3×) upscales the still to the panorama resolution the sphere viewer requires. About 20 seconds.
  4. Difficulty rank. A second AI pass evaluates the result and assigns a 1–5 difficulty score. The signal is "how recognizable is the event from the panorama alone, to an American audience?" Recognizable events score easy; ambient-history scenes score hard.
  5. Upsert. Scene metadata lands in scenes.json; the panorama uploads to Cloudflare R2.
  6. Review. A human (me) plays the scene at /sphere-demo.html on localhost to spot-check before it ships. About a third of scenes get a seam-bust reshoot or a difficulty override at this step.
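
The upsert in stage 5 can be sketched as below. This is a minimal illustration, not the project's actual code: the record fields (`id`, `year`, `difficulty`) and the assumption that scenes.json holds a flat JSON list are mine.

```python
import json
from pathlib import Path


def upsert(scenes: list[dict], scene: dict) -> list[dict]:
    """Return a new list with `scene` replacing the record that shares its id,
    or appended at the end if no such record exists."""
    out = list(scenes)
    for i, existing in enumerate(out):
        if existing.get("id") == scene["id"]:
            out[i] = scene  # replace in place so the file order stays stable
            return out
    out.append(scene)
    return out


def upsert_file(path: Path, scene: dict) -> None:
    """Apply `upsert` to a JSON file (assumed layout: one flat list of records)."""
    scenes = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    updated = upsert(scenes, scene)
    # Write to a sibling temp file and rename, so a crash cannot leave
    # a half-written scenes.json behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(updated, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)
```

Keying on the scene ID means a reshoot overwrites the old record instead of creating a duplicate.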

The whole thing is orchestrated by a script (scripts/produce-scene.py) that takes a scene ID and a prompt and runs the full pipeline in one command. By the end of the build I could generate, review, and ship a new scene in about three minutes of attention.
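
For readers who want the shape of such an orchestrator, here is a minimal sketch. It is not the contents of scripts/produce-scene.py: the stage functions are hypothetical placeholders that only annotate a dict, where the real ones would call the image model, the upscaler, and the storage upload.

```python
import time
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(scene: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run the stages in order, threading the scene dict through each one
    and recording how long each stage took in seconds."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        scene = stage(scene)
        timings[name] = time.perf_counter() - start
    return {"scene": scene, "timings": timings}


# Hypothetical placeholder stages, named after the steps described above.
def generate_still(scene: dict) -> dict:
    return {**scene, "still": f"{scene['id']}-still.png"}


def upscale(scene: dict) -> dict:
    return {**scene, "panorama": f"{scene['id']}-pano.png"}


def rank_difficulty(scene: dict) -> dict:
    return {**scene, "difficulty": 3}  # a real stage would ask a model


PIPELINE = [
    ("still", generate_still),
    ("upscale", upscale),
    ("difficulty", rank_difficulty),
]

if __name__ == "__main__":
    result = run_pipeline({"id": "demo-scene", "prompt": "..."}, PIPELINE)
    print(result["scene"], result["timings"])
```

Keeping each stage as a plain function from dict to dict makes it easy to rerun a single step, such as a reshoot, without touching the rest.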

The off-Earth lesson (the Apollo 11 mistake)

Early in the library expansion I shipped a handful of scenes that turned out to be unplayable. The most egregious was Apollo 11 — the lunar surface, Buzz Aldrin in the foreground, the lander in the back. Visually stunning. Historically iconic. Completely unguessable.

The bug: the scoring math assumes Earth as the substrate. You drop a pin on a flat map of Earth. The scoring distance is measured along Earth's surface. There is no pin you can drop that scores well for "the Moon, 1969." The scene was guaranteed to give every player a near-zero distance score regardless of how well they recognized the event.

I pulled the Apollo 11 scene on 2026-05-07, along with the other off-Earth offenders, and added a hard rule to the scene-drafting skill: every scene must occur on Earth — surface, atmosphere, or oceans. The rule is now enforced at draft time, before any image generation happens. Cool ideas get refused.
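
A draft-time check along these lines might look like the sketch below. The `SceneCard` fields and the `body` attribute are assumptions for illustration; the actual drafting skill may express the rule differently. A coordinate range check alone would not catch Apollo 11, because lunar coordinates look like valid latitude and longitude, so the card has to declare which body it is on.

```python
from dataclasses import dataclass


@dataclass
class SceneCard:
    scene_id: str
    year: int            # negative for BCE (assumed convention)
    lat: float
    lon: float
    body: str = "earth"  # hypothetical field: the celestial body the scene is on


def draft_errors(card: SceneCard) -> list[str]:
    """Return the reasons a card must be refused before any image is generated."""
    errors = []
    if card.body.strip().lower() != "earth":
        errors.append(f"{card.scene_id}: scene is on '{card.body}', not Earth")
    if not -90.0 <= card.lat <= 90.0:
        errors.append(f"{card.scene_id}: latitude {card.lat} out of range")
    if not -180.0 <= card.lon <= 180.0:
        errors.append(f"{card.scene_id}: longitude {card.lon} out of range")
    return errors


if __name__ == "__main__":
    apollo = SceneCard("apollo-11", 1969, 0.67, 23.47, body="moon")
    print(draft_errors(apollo))
```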

Lesson: The scoring math is a constraint, not a feature. Any scene that breaks the constraint breaks the player's experience even if the scene itself is great. Cool ≠ playable.

The monochrome trap (the B&W mandate)

The next day, 2026-05-08, I shipped two more scenes that needed reshooting for a different reason. The VJ Day kiss in Times Square. Lunch atop a skyscraper. Both rendered in black-and-white.

The image model wasn't wrong, exactly. The famous reference photos for those moments (Alfred Eisenstaedt's VJ Day, Charles Ebbets's Lunch Atop a Skyscraper) are monochrome. The model was trained on those references and produced what looked like them. The trouble is that Where in Time isn't a documentary recreation. It's a cinematic placement of the player inside the scene as if they were there. The actual Eisenstaedt photograph was monochrome because mid-century photojournalism was. The actual moment was in color.

You wouldn't have seen Times Square in black-and-white. You'd have seen the navy blue of the sailors' uniforms, the red lipstick on the nurse, the yellow cabs in the background.

So on 2026-05-08 I added a full-color mandate to the skill: every panorama must render in full color, even when the source reference photo is famously monochrome. The skill enforces this via an explicit color directive in the visual-style block AND a B&W ban in the negative prompt. Both scenes were reshot in color. Iwo Jima and the Robert Capa D-Day photos and the Civil War daguerreotypes all got the same treatment.
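
The two-layer enforcement can be sketched as a small prompt helper. The directive wording and the list of banned terms below are illustrative guesses, not the skill's actual text.

```python
# Assumed wording; the real skill's directive may differ.
COLOR_DIRECTIVE = "Render in full, natural color, as an eyewitness would have seen it."
BW_NEGATIVES = ("black and white", "monochrome", "grayscale", "sepia")


def apply_color_mandate(visual_style: str, negative_prompt: str = "") -> tuple[str, str]:
    """Return (visual_style, negative_prompt) with the color directive appended
    and the B&W terms banned. Safe to call more than once on the same prompt."""
    style = visual_style.strip()
    if COLOR_DIRECTIVE not in style:
        style = f"{style} {COLOR_DIRECTIVE}".strip()

    # Negative prompt is treated as a comma-separated list of terms.
    terms = [t.strip() for t in negative_prompt.split(",") if t.strip()]
    seen = {t.lower() for t in terms}
    for banned in BW_NEGATIVES:
        if banned not in seen:
            terms.append(banned)
    return style, ", ".join(terms)


if __name__ == "__main__":
    print(apply_color_mandate("Times Square, August 1945, street-level crowd.", "blurry"))
```

Applying both layers in one function makes it hard to update one and forget the other.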

Lesson: Image models reproduce reference style, not reference reality. If your product needs reality, you have to say so in the prompt — every time, in every layer.

The scoring math derivation

This was the design problem I lost the most sleep over.

Each round scores two dimensions: distance (kilometers off) and year (years off). Both use Gaussian decay — score falls off smoothly with miss, no hard cutoffs. The interesting question is what sigma to use for the year score.

A 50-year miss in 2000 BCE shouldn't score the same as a 50-year miss in 1980. The historical record is denser in the modern era. Fifty years off in 1980 is three regimes, the Internet, the end of the Cold War. Fifty years off in 2000 BCE is dynastic noise.

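So the year sigma is calibrated per era: wide where the record is sparse (σ = 500 years in the Ancient pool) and narrow where it is dense (σ ≈ 25 years in the Modern pool).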

Worked example: Hammurabi's Babylon, ~1792 BCE, Ancient pool. You guess 1850 BCE. That's 58 years off. 58 ÷ 500 ≈ 0.12σ, so the Gaussian score is exp(-(0.12²)/2) ≈ 0.99 — you keep ~99% of the max year-points. Now imagine the same 58-year miss in the Modern pool (σ ≈ 25): 58 ÷ 25 ≈ 2.3σ, score is exp(-(2.3²)/2) ≈ 0.07 — you keep ~7%.

Same numerical miss. Wildly different scores. That's the per-era calibration earning its keep.
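
The worked example can be reproduced in a few lines. This sketch only includes the two sigmas quoted above (Ancient = 500, Modern ≈ 25); the dictionary keys, the function name, and the negative-years-for-BCE convention are assumptions, and the score is left as a 0 to 1 fraction of the maximum year points.

```python
import math

# Only the two pools quoted in the text; other eras are omitted here.
ERA_SIGMA_YEARS = {"ancient": 500.0, "modern": 25.0}


def year_score(guess_year: int, true_year: int, era: str) -> float:
    """Gaussian decay on the year miss, as a fraction of the maximum year points.
    BCE years are written as negative integers (assumed convention)."""
    z = (guess_year - true_year) / ERA_SIGMA_YEARS[era]
    return math.exp(-(z * z) / 2)


if __name__ == "__main__":
    # Hammurabi's Babylon: truth 1792 BCE, guess 1850 BCE, a 58-year miss.
    print(round(year_score(-1850, -1792, "ancient"), 2))  # 0.99
    # The same 58-year miss scored against the Modern sigma.
    print(round(year_score(1922, 1980, "modern"), 2))     # 0.07
```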

Distance scoring uses a similar Gaussian curve, calibrated to give meaningful credit for pinning the right continent even if the country is wrong. The distance sigma is constant across eras — Earth hasn't moved.
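
A sketch of the distance side, assuming the surface distance is the great-circle (haversine) distance. The 2,000 km sigma is a placeholder chosen for illustration; the game's real value is not stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0
DISTANCE_SIGMA_KM = 2000.0  # placeholder value, not the game's actual sigma


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_score(guess: tuple[float, float], truth: tuple[float, float],
                   sigma_km: float = DISTANCE_SIGMA_KM) -> float:
    """Gaussian decay on the distance miss, as a fraction of the maximum points."""
    z = haversine_km(*guess, *truth) / sigma_km
    return math.exp(-(z * z) / 2)


if __name__ == "__main__":
    babylon = (32.54, 44.42)
    print(distance_score((33.31, 44.36), babylon))   # Baghdad: a near miss
    print(distance_score((35.68, 139.69), babylon))  # Tokyo: wrong side of Asia
```

A wider sigma gives more credit for landing on the right continent; a narrower one rewards getting the country right.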

(There's a fuller treatment with more worked examples on the /how-it-works/ page.)

The moderation gauntlet

I'm going to be slightly cagey here because the constraints are still evolving, but the broad strokes:

Generating cinematic panoramas of real historical events sits in a moderation no-go zone for content categories you'd expect — graphic violence, mass-casualty events, identifiable real people in compromising frames. Most of those refusals are correct. A small number are friction. Refining the scene prompt — pulling the camera back, emphasizing landscape over close-up, framing the event by its setting rather than its bodies — gets most refused prompts through on the second or third try.

One firm self-imposed line: no Nazi imagery and no Holocaust-victim panoramas, ever. The image provider account is identity-verified, so I treat a moderation block in those categories as a one-strike risk for the whole project: a single ban would kill Where in Time for good. Better to never approach the line than to find out where it is.

Three things I'd do differently

  1. Calibrate difficulty from real player data, not AI guesses. The AI difficulty rank is decent but lossy. A scene the AI calls "easy" might be unplayable for someone outside the cultural reference frame; a scene it calls "hard" might be obvious to a history nerd. The right answer is to instrument actual play and adjust the difficulty rank against observed median scores. That data didn't exist on day one, so the AI-ranked difficulties are the v1 placeholder. Re-ranking against real data is on the v1.1 list.
  2. Build the era-pool architecture earlier. The first 50 scenes I generated were all over the timeline with no era buckets. When I tried to wire up the per-era scoring sigma, I had to retrofit every existing scene into a bucket. Should have decided the eras and the boundaries upfront — would have saved a chunky refactor and gotten the scoring math live faster.
  3. Write the drafting-skill rules before generating, not after. The Earth-only rule and the full-color mandate both got added after I shipped scenes that violated them. It's cheaper to enumerate constraints in the drafting skill on day one than to discover them by shipping broken scenes. I now have a checklist that runs at the prompt stage; I should have built it at the start.
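
For item 1, re-ranking against observed play could start as simply as the sketch below. Everything numeric in it is an assumption: the minimum sample size, the cutoffs, and the idea that round scores are normalized to a 0 to 1 range.

```python
from statistics import median

# Hypothetical cutoffs on the median normalized score: a median of 0.8 or
# higher means rank 1 (easiest), and anything below 0.2 means rank 5.
CUTOFFS = (0.8, 0.6, 0.4, 0.2)


def rerank_difficulty(round_scores: list[float], ai_rank: int, min_plays: int = 30) -> int:
    """Return a 1-5 difficulty rank from observed scores, falling back to the
    AI-assigned rank until the scene has been played at least `min_plays` times."""
    if len(round_scores) < min_plays:
        return ai_rank
    m = median(round_scores)
    for rank, cutoff in enumerate(CUTOFFS, start=1):
        if m >= cutoff:
            return rank
    return 5


if __name__ == "__main__":
    # The AI called this scene easy (1), but players mostly scored low.
    print(rerank_difficulty([0.25] * 40, ai_rank=1))  # 4
```

Using the median rather than the mean keeps a handful of perfect or zero scores from dragging the rank around.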

What's next

The v1 ship is Daily mode only: five scenes a day, free, no signup. Casual+ (era-pick plus unlimited 5-round sessions, paid) is the next major feature; the backend scaffolding is shipped and feature-flagged off until the LLC is in place and Stripe is flipped to live mode.

After Casual+:

If you played

If you played the daily today and have a thought on the scoring curve, scene difficulty, or which era you'd most want to see expanded, I'd love to hear it. Email contact@whereintime.ai or DM on Instagram. It's a solo project; direct emails are read.

Thanks for reading.