01 — Pattern A: Snapshot
You take a full picture of the source table at regular intervals — once an hour, once a day — and store each picture. The warehouse becomes a sequence of states over time, and you reason about change by diffing snapshots.
Strengths: Simple. No source-side instrumentation required. Works on any system you can SELECT from. Captures soft deletes naturally.
Weaknesses: Storage grows with every snapshot you retain, so higher capture frequency means proportionally more storage. You lose the granularity of when a change happened — only that something differed between two snapshots. Bandwidth-heavy on large tables, since every capture is a full copy.
When it shines: Slowly-changing reference data, configuration tables, customer-managed lookups. Anything where "show me what it looked like last Tuesday" is a useful question.
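A minimal sketch of the idea in Python, assuming rows arrive as plain dicts keyed by a primary key; the table contents and the `id`/`tier` columns are made up for illustration:

```python
from datetime import datetime, timezone

def capture_snapshot(rows: list[dict], key: str) -> dict:
    """Store one full picture of the table, stamped with the capture time."""
    return {
        "captured_at": datetime.now(timezone.utc),
        "rows": {r[key]: r for r in rows},
    }

def diff_snapshots(old: dict, new: dict) -> dict:
    """Rows only in the new capture are inserts, rows only in the old one are
    deletes (vanished rows show up here with no source-side help), and rows
    present in both but with different values are updates."""
    old_rows, new_rows = old["rows"], new["rows"]
    return {
        "inserted": [k for k in new_rows if k not in old_rows],
        "deleted":  [k for k in old_rows if k not in new_rows],
        "updated":  [k for k in new_rows if k in old_rows and new_rows[k] != old_rows[k]],
    }

# Usage: two captures one interval apart.
monday  = capture_snapshot([{"id": 1, "tier": "basic"}, {"id": 2, "tier": "pro"}], key="id")
tuesday = capture_snapshot([{"id": 1, "tier": "plus"}], key="id")
print(diff_snapshots(monday, tuesday))   # {'inserted': [], 'deleted': [2], 'updated': [1]}
```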
02 — Pattern B: Change Data Capture
The source emits an event for every row-level change — insert, update, delete — and your pipeline applies those events incrementally. The warehouse stays in sync with the source, often within seconds.
Strengths: Low bandwidth (only deltas move). Fine-grained timing — you know exactly when each change happened. Near-real-time freshness possible. Naturally compatible with SCD Type 2.
Weaknesses: Requires source-side support (binlog, write-ahead log, or application-level event emission). Operational complexity — out-of-order events, replays, and schema drift all need handling. One missed event is silent corruption.
When it shines: High-velocity transactional data, where freshness matters and full snapshots are too expensive. Audit-trail-grade history requirements.
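A sketch of the apply side, assuming each event carries a monotonically increasing log position (`lsn`); the event shape and field names are illustrative and not tied to any particular CDC tool:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ChangeEvent:
    lsn: int                      # position in the source's change log
    op: str                       # "insert", "update", or "delete"
    key: Any                      # primary key of the affected row
    row: Optional[dict] = None    # full row image for inserts/updates

class Target:
    """In-memory stand-in for the warehouse table, keyed by primary key.
    Tracks the last-applied lsn per key so late or replayed events are skipped."""
    def __init__(self) -> None:
        self.rows: dict[Any, dict] = {}
        self.applied_lsn: dict[Any, int] = {}

    def apply(self, ev: ChangeEvent) -> None:
        # Out-of-order or duplicate event for this key: ignore it.
        if ev.lsn <= self.applied_lsn.get(ev.key, -1):
            return
        if ev.op == "delete":
            self.rows.pop(ev.key, None)
        else:  # insert or update: upsert the latest row image
            self.rows[ev.key] = ev.row
        self.applied_lsn[ev.key] = ev.lsn

# Usage: replaying the same batch twice leaves the target unchanged.
target = Target()
batch = [
    ChangeEvent(lsn=1, op="insert", key=42, row={"id": 42, "status": "new"}),
    ChangeEvent(lsn=2, op="update", key=42, row={"id": 42, "status": "active"}),
    ChangeEvent(lsn=3, op="delete", key=42),
]
for ev in batch + batch:   # second pass is a replay
    target.apply(ev)
assert 42 not in target.rows
```

Keeping the last-applied position per key is what makes replays and out-of-order delivery safe to tolerate: applying the same events again is a no-op rather than silent corruption.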
03 — Pattern C: Full-Load
Every run truncates the target and copies the source in its entirety. No deltas, no watermarks, no merge logic.
Strengths: The simplest possible model. Always self-consistent — there is no "drift" because every run is a fresh start. Perfect for small reference tables.
Weaknesses: Doesn't scale with table size. Bandwidth and compute proportional to total table size, not change rate. You lose all history unless you wrap it in snapshot-style captures yourself.
When it shines: Small tables (up to a few million rows). Reference data with no inherent history. Cases where "always reflects the source as of now" is the only requirement.
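The whole pattern fits in a few lines. Here is a sketch against an in-memory SQLite stand-in for the warehouse; the table and column names are hypothetical:

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE currency_codes (code TEXT PRIMARY KEY, name TEXT)")

def full_load(conn: sqlite3.Connection, table: str, source_rows: list[tuple]) -> None:
    # One transaction: delete everything, then copy the source in its entirety,
    # so readers never observe a half-loaded table.
    with conn:
        conn.execute(f"DELETE FROM {table}")          # SQLite's stand-in for TRUNCATE
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", source_rows)

# Every run replaces the target with "the source as of now": no deltas, no merge.
full_load(warehouse, "currency_codes", [("USD", "US Dollar"), ("EUR", "Euro")])
full_load(warehouse, "currency_codes", [("USD", "US Dollar"), ("GBP", "Pound Sterling")])
print(warehouse.execute("SELECT code FROM currency_codes ORDER BY code").fetchall())
# [('GBP',), ('USD',)]; the EUR row is gone because full-load keeps no history
```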
04 — Decision: How to choose
Choose Snapshot when
Source has no CDC support · table is too large for full-load but slow-changing · history of state-at-a-time is genuinely useful · you want simplicity over freshness.
Choose CDC when
Freshness is a product requirement (minutes, not hours) · source supports it natively · you have engineering capacity to operate it · history of every change matters.
Choose Full-Load when
Table is small enough that full reload is cheap · history isn't needed · you value simplicity and self-consistency above all else.
Choose hybrid when
You have a mix — most warehouses do. Use full-load for reference data, snapshot for slowly-changing dimensions, CDC for high-velocity facts. Pick per table, not per project.
The most common mistake is choosing CDC for every table because "real-time is better." It's not better — it's faster. If nobody is reading the data more than once a day, CDC's cost (operational and engineering) buys you nothing the snapshot approach wouldn't.
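One way to keep the per-table choice explicit is a small ingestion config that a scheduler dispatches on; the table names and cadences below are hypothetical:

```python
# Pattern is chosen per table, not per project.
INGESTION_CONFIG = {
    "currency_codes":    {"pattern": "full_load", "schedule": "daily"},      # tiny reference table
    "customer_segments": {"pattern": "snapshot",  "schedule": "daily"},      # slowly-changing dimension
    "orders":            {"pattern": "cdc",       "schedule": "streaming"},  # high-velocity facts
}

def plan_run(table: str) -> str:
    cfg = INGESTION_CONFIG[table]
    return f"{table}: {cfg['pattern']} ({cfg['schedule']})"

for t in INGESTION_CONFIG:
    print(plan_run(t))
```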
05 — Takeaways: Match the pattern to the data
None of these patterns is universally correct. The right answer for a single warehouse is usually all three, applied per table. The wrong answer is picking one and forcing every table into it.
- Pattern follows the data, not the project. Different tables justify different patterns.
- CDC is powerful but expensive to operate. Use it where freshness matters; not as a default.
- Snapshots are underrated. They're simple, they capture soft deletes, and they give you point-in-time views.
- Full-load is fine for small tables. Don't over-engineer reference data.
The discipline is in matching the pattern to the table — and being willing to revisit the choice when a table grows past its original assumptions.