01 — Pattern A: Snapshot
You take a full picture of the source table at regular intervals — once an hour, once a day — and store each picture. The warehouse becomes a sequence of states over time, and you reason about change by diffing snapshots.
Strengths: Simple. No source-side instrumentation required. Works on any system you can SELECT from. Captures soft deletes naturally.
Weaknesses: Storage grows with every snapshot you retain, so higher capture frequency means proportionally more storage. You lose the granularity of when a change happened — only that something differed between two snapshots. Bandwidth-heavy on large tables, since every capture is a full copy.
When it shines: Slowly-changing reference data, configuration tables, customer-managed lookups. Anything where "show me what it looked like last Tuesday" is a useful question.
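A minimal sketch of the idea in Python, assuming rows arrive as plain dicts keyed by a primary key; the table contents and the `id`/`tier` columns are made up for illustration:

```python
from datetime import datetime, timezone

def capture_snapshot(rows: list[dict], key: str) -> dict:
    """Store one full picture of the table, stamped with the capture time."""
    return {
        "captured_at": datetime.now(timezone.utc),
        "rows": {r[key]: r for r in rows},
    }

def diff_snapshots(old: dict, new: dict) -> dict:
    """Rows only in the new capture are inserts, rows only in the old one are
    deletes (vanished rows show up here with no source-side help), and rows
    present in both but with different values are updates."""
    old_rows, new_rows = old["rows"], new["rows"]
    return {
        "inserted": [k for k in new_rows if k not in old_rows],
        "deleted":  [k for k in old_rows if k not in new_rows],
        "updated":  [k for k in new_rows if k in old_rows and new_rows[k] != old_rows[k]],
    }

# Usage: two captures one interval apart.
monday  = capture_snapshot([{"id": 1, "tier": "basic"}, {"id": 2, "tier": "pro"}], key="id")
tuesday = capture_snapshot([{"id": 1, "tier": "plus"}], key="id")
print(diff_snapshots(monday, tuesday))   # {'inserted': [], 'deleted': [2], 'updated': [1]}
```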
02 — Pattern B: Change Data Capture
The source emits an event for every row-level change — insert, update, delete — and your pipeline applies those events incrementally. The warehouse stays in sync with the source, often within seconds.
Strengths: Low bandwidth (only deltas move). Fine-grained timing — you know exactly when each change happened. Near-real-time freshness possible. Naturally compatible with SCD Type 2.
Weaknesses: Requires source-side support (binlog, write-ahead log, or application-level event emission). Operational complexity — out-of-order events, replays, and schema drift all need handling. One missed event is silent corruption.
When it shines: High-velocity transactional data, where freshness matters and full snapshots are too expensive. Audit-trail-grade history requirements.
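A sketch of the apply side, assuming each event carries a monotonically increasing log position (`lsn`); the event shape and field names are illustrative and not tied to any particular CDC tool:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ChangeEvent:
    lsn: int                      # position in the source's change log
    op: str                       # "insert", "update", or "delete"
    key: Any                      # primary key of the affected row
    row: Optional[dict] = None    # full row image for inserts/updates

class Target:
    """In-memory stand-in for the warehouse table, keyed by primary key.
    Tracks the last-applied lsn per key so late or replayed events are skipped."""
    def __init__(self) -> None:
        self.rows: dict[Any, dict] = {}
        self.applied_lsn: dict[Any, int] = {}

    def apply(self, ev: ChangeEvent) -> None:
        # Out-of-order or duplicate event for this key: ignore it.
        if ev.lsn <= self.applied_lsn.get(ev.key, -1):
            return
        if ev.op == "delete":
            self.rows.pop(ev.key, None)
        else:  # insert or update: upsert the latest row image
            self.rows[ev.key] = ev.row
        self.applied_lsn[ev.key] = ev.lsn

# Usage: replaying the same batch twice leaves the target unchanged.
target = Target()
batch = [
    ChangeEvent(lsn=1, op="insert", key=42, row={"id": 42, "status": "new"}),
    ChangeEvent(lsn=2, op="update", key=42, row={"id": 42, "status": "active"}),
    ChangeEvent(lsn=3, op="delete", key=42),
]
for ev in batch + batch:   # second pass is a replay
    target.apply(ev)
assert 42 not in target.rows
```

Keeping the last-applied position per key is what makes replays and out-of-order delivery safe to tolerate: applying the same events again is a no-op rather than silent corruption.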
03 — Pattern C: Full-Load
Every run truncates the target and copies the source in its entirety. No deltas, no watermarks, no merge logic.
Strengths: The simplest possible model. Always self-consistent — there is no "drift" because every run is a fresh start. Perfect for small reference tables.
Weaknesses: Doesn't scale with table size. Bandwidth and compute proportional to total table size, not change rate. You lose all history unless you wrap it in snapshot-style captures yourself.
When it shines: Small tables (up to a few million rows). Reference data with no inherent history. Cases where "always reflects the source as of now" is the only requirement.
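The whole pattern fits in a few lines. Here is a sketch against an in-memory SQLite stand-in for the warehouse; the table and column names are hypothetical:

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE currency_codes (code TEXT PRIMARY KEY, name TEXT)")

def full_load(conn: sqlite3.Connection, table: str, source_rows: list[tuple]) -> None:
    # One transaction: delete everything, then copy the source in its entirety,
    # so readers never observe a half-loaded table.
    with conn:
        conn.execute(f"DELETE FROM {table}")          # SQLite's stand-in for TRUNCATE
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", source_rows)

# Every run replaces the target with "the source as of now": no deltas, no merge.
full_load(warehouse, "currency_codes", [("USD", "US Dollar"), ("EUR", "Euro")])
full_load(warehouse, "currency_codes", [("USD", "US Dollar"), ("GBP", "Pound Sterling")])
print(warehouse.execute("SELECT code FROM currency_codes ORDER BY code").fetchall())
# [('GBP',), ('USD',)]; the EUR row is gone because full-load keeps no history
```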
04 — Decision: How to choose
Choose Snapshot when
Source has no CDC support · table is too large for full-load but slow-changing · history of state-at-a-time is genuinely useful · you want simplicity over freshness.
Choose CDC when
Freshness is a product requirement (minutes, not hours) · source supports it natively · you have engineering capacity to operate it · history of every change matters.
Choose Full-Load when
Table is small enough that full reload is cheap · history isn't needed · you value simplicity and self-consistency above all else.
Choose hybrid when
You have a mix — most warehouses do. Use full-load for reference data, snapshot for slowly-changing dimensions, CDC for high-velocity facts. Pick per table, not per project.
The most common mistake is choosing CDC for every table because "real-time is better." It's not better — it's faster. If nobody is reading the data more than once a day, CDC's cost (operational and engineering) buys you nothing the snapshot approach wouldn't.
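One way to keep the per-table choice explicit is a small ingestion config that a scheduler dispatches on; the table names and cadences below are hypothetical:

```python
# Pattern is chosen per table, not per project.
INGESTION_CONFIG = {
    "currency_codes":    {"pattern": "full_load", "schedule": "daily"},      # tiny reference table
    "customer_segments": {"pattern": "snapshot",  "schedule": "daily"},      # slowly-changing dimension
    "orders":            {"pattern": "cdc",       "schedule": "streaming"},  # high-velocity facts
}

def plan_run(table: str) -> str:
    cfg = INGESTION_CONFIG[table]
    return f"{table}: {cfg['pattern']} ({cfg['schedule']})"

for t in INGESTION_CONFIG:
    print(plan_run(t))
```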
05 — Takeaways: Match the pattern to the data
None of these patterns is universally correct. The right answer for a single warehouse is usually all three, applied per table. The wrong answer is picking one and forcing every table into it.
- Pattern follows the data, not the project. Different tables justify different patterns.
- CDC is powerful but expensive to operate. Use it where freshness matters; not as a default.
- Snapshots are underrated. They're simple, they capture soft deletes, and they give you point-in-time views.
- Full-load is fine for small tables. Don't over-engineer reference data.
The discipline is in matching the pattern to the table — and being willing to revisit the choice when a table grows past its original assumptions.