Designing a Multi-Tenant Healthcare Data Warehouse in Snowflake

01 — ProblemOne warehouse, many customers

When you serve a B2B SaaS product into healthcare, your warehouse is rarely just your warehouse. It's the foundation under every customer-facing dashboard, every renewal conversation, every operational alert. And it has to be all of those things — for many customers — without ever leaking data across boundaries.

The warehouse I inherited had grown organically. Each customer's analytics needs were met with bespoke views and one-off SQL. Cross-tenant benchmarking was impossible without manually joining customer-specific tables. Onboarding new customers required engineering work, not configuration. And the lurking risk — quietly visible to anyone who knew where to look — was that nothing structurally prevented one tenant's row from being returned to another.

⚠ The risk

In multi-tenant analytics, the worst-case bug is silent. A query returns extra rows. A dashboard shows a number that's wrong by 3%. Nobody notices until someone does — and at that point, trust is already gone.

02 — ArchitectureThe layered architecture

The redesign was anchored on three layers, each with explicit contracts:

Staging — exact mirrors of source systems, no transformations, no joins. The job here is fidelity.
Intermediate — business logic applied. Conformed dimensions built. SCD Type 2 history tracked. Tenant ID propagated. The job here is correctness.
Marts — semantic exposure, denormalized for dashboard performance. Tenant-isolated views built on top of conformed facts. The job here is speed.

Fig. 02 · Each layer has one job. Crossing layers without going through the right contract is a code smell.

The temptation in any growing warehouse is to skip the middle layer — to let dashboards reach into staging when the business logic is "obvious." Resist this. Every shortcut that bypasses the intermediate layer is a place where business rules drift between consumers.

03 — TenancyThree patterns for tenancy

You have roughly three options for serving multiple tenants from one warehouse:

① Database per tenant

Hardest to leak across, but operationally expensive. Schema migrations multiply. Cross-tenant analytics require external joins. Reasonable when tenants need radically different schemas.

② Schema per tenant

Middle ground. Easier to manage than DB-per-tenant. Cross-tenant queries possible but ugly. Schema drift between tenants becomes a real risk.

③ Tenant column + row policies

One physical table for each fact, with tenant_id propagated and a row-access policy enforcing isolation. Cleanest for benchmarking, easiest to maintain. This is what I chose.

Hybrid

Shared dimensions in row-policy mode; tenant-specific marts in schema-per-tenant mode for customer-facing exposure. Pragmatic for most cases. Used selectively here for customer-export marts.

The decision hinges on what you optimize for. Going row-policy-first lets the warehouse benchmark, aggregate, and govern in one place — at the cost of getting the row-access policy exactly right, every time. There is no margin for error there.

There is no "mostly correct" in tenant isolation. The policy either holds for every query, or it doesn't.

04 — ConformedConformed dimensions across tenants

Some entities are tenant-private (a customer's own provider list). Others are shared and conformed across tenants (calendar, geography, ICD-10 codes, payer lookups). Mixing these without discipline is how drift creeps in.

The rule I settled on: shared dimensions are owned centrally and live in a single schema. Tenant-specific overrides go into a parallel tenant_overrides table that joins back via tenant ID. Reports always join through a view that applies overrides where present and falls back to the shared dimension otherwise.

conformed_dim_provider.sql

-- Conformed dimension with tenant overrides
CREATE OR REPLACE VIEW dim_provider AS
SELECT
  COALESCE(o.provider_key, b.provider_key)   AS provider_key,
  COALESCE(o.display_name, b.display_name)   AS display_name,
  COALESCE(o.specialty,    b.specialty)      AS specialty,
  b.tenant_id,
  b.effective_from,
  b.effective_to,
  b.is_current
FROM dim_provider_base b
LEFT JOIN dim_provider_tenant_overrides o
  ON  b.provider_key = o.provider_key
  AND b.tenant_id    = o.tenant_id;

This pattern keeps the shared schema canonical while letting individual tenants rename, recategorize, or hide entries without forking the dimension.

05 — GovernanceIsolation enforced at the platform

Code is fallible. Reviews miss things. The only durable answer is to enforce isolation at a layer below the SQL — at the warehouse itself.

In Snowflake, this means:

Row-access policies on every fact table, keyed off the calling role's tenant claim
Tag-based masking on PHI columns so dev and staging environments never see real data
Role hierarchy built top-down: tenant roles inherit from a base read-role with the row policy applied; admin roles bypass it explicitly and audibly
Warehouse grants separated from schema grants so compute access can be revoked without restructuring data permissions

→ Test it like a security control

Write tests that try to violate the policy. Pick a query, run it as Tenant A, capture the count. Run it as Tenant B, capture the count. Run it as the platform role with no tenant — capture the union. The first two must never overlap. The third must equal their sum.

06 — TakeawaysWhat this bought us

This architecture became the foundation for 60+ multi-tenant customer dashboards and dozens of internal analytical use cases. Onboarding a new customer is now a config change, not a code change. Cross-tenant benchmarking — once a manual spreadsheet exercise — runs as a single query. Drift between consumers stopped, because there is one place where business logic lives.

If I had to compress what I learned into a few lines:

Pick one tenancy pattern and apply it everywhere. Hybrid is fine, but only by exception, and only with documentation.
Make tenant_id structural, not a convention. It belongs in primary keys, indexes, and access policies — not just SELECT lists.
Conformed dimensions are organizational design, not just data design. Decide who owns them before you build them.
Test isolation like a security boundary, because that's what it is.

The dashboards on top of this warehouse are the visible product. The architecture under them is the reason the dashboards are trusted. That's a trade-off I'll always take.

01 — ProblemOne warehouse, many customers

02 — ArchitectureThe layered architecture

03 — TenancyThree patterns for tenancy

① Database per tenant

② Schema per tenant

③ Tenant column + row policies

Hybrid

04 — ConformedConformed dimensions across tenants

05 — GovernanceIsolation enforced at the platform

06 — TakeawaysWhat this bought us

Building Idempotent ETL Pipelines in Snowflake + Python

Designing a Reconciliation Layer Between CRM, Billing & Operational Systems