Designing Fact & Dimension Models for Behavioral Analytics

01 — FoundationStart with grain

The first decision in any fact table — and the one most people get wrong — is the grain. Grain is the answer to: "what does one row in this fact represent?" Until that's pinned down, every other decision is provisional.

For behavioral data, the grain is usually one row per event. One row per page view. One row per visit status change. One row per transaction. The temptation is to pre-aggregate to "one row per user per day" because it feels smaller and faster. Resist this for the source of truth. You can always aggregate later; you cannot disaggregate what you've already collapsed.

If you can't state the grain in one sentence, the model isn't ready.

02 — StructureDimensions, done right

Every fact joins to dimensions. Good dimensions share three properties:

Wide, not deep. Many descriptive columns, few rows. dim_user with 50 attributes is fine; what matters is that each row is one user.
Conformed across facts. The same dim_user joins to fact_visit, fact_message, fact_payment. One definition of "user" everywhere.
Surrogate keys. An integer user_key, not the natural email. Email changes; surrogate keys don't.

03 — HistorySCD: when, and which type

Slowly Changing Dimensions are how you handle the fact that descriptive attributes change. Two patterns matter most:

SCD Type 1 — Overwrite

Update in place. No history. Simple, fast, fine for attributes where history doesn't matter (current display name, email).

SCD Type 2 — Track history

Add a new row with effective_from / effective_to / is_current. Preserves the full history. Use for attributes where "as of last quarter" is a real question (provider specialty, customer tier, billing plan).

The decision rule: if a metric reported last quarter must remain reproducible after a dimension changes, use Type 2. If it doesn't, Type 1 is fine. Most real-world dimensions need a mix — some Type 1 columns, some Type 2 columns, on the same dimension table.

04 — VariantsThree fact types worth knowing

Transaction facts — one row per event, immutable once written. Most behavioral data.
Periodic snapshot facts — one row per entity per period. Useful for state, not events. (How many active users on this date?)
Accumulating snapshot facts — one row per process instance, updated as the process advances. Useful for funnels with definite start/end states. (Visit lifecycle, order processing.)

Behavioral analytics typically uses transaction facts as the source of truth and builds periodic snapshots on top for dashboard performance. Accumulating snapshots are the right tool when you need to track elapsed time between phases — like "how long are visits spending in checked-in status?"

→ Conformed dimensions are organizational

The hardest part of dimensional modeling isn't SQL — it's getting two teams to agree on what "user" or "customer" means. The schema is downstream of that agreement. Don't try to model around organizational misalignment; surface it and resolve it.

05 — TakeawaysDiscipline over cleverness

Dimensional modeling is unglamorous. There's no novel technology, no impressive architecture diagram. What it gives you is a warehouse where the same question, asked from two different dashboards, returns the same answer. That's the actual product.

Pin the grain first. Everything else follows.
Conformed dimensions across all facts. Inconsistency here is how trust dies.
Mix SCD types within a dimension. Not every column needs history.
Pick the right fact type. Transactional, periodic, or accumulating — they answer different questions.

When the model is right, the BI layer becomes thin. That's the goal: every dashboard becomes a question, not a recomputation.