$ I'm Gunabhiram Billa — a Data & Analytics Engineer in New York. I build the unglamorous infrastructure underneath your dashboards: idempotent pipelines, multi-tenant warehouses, and reconciliation layers that catch mismatches before finance does. 50+ production systems shipped. 60+ multi-tenant dashboards live. No demos that pass once and break Tuesday.
I work in the unglamorous middle layer, between the source systems that emit data and the dashboards that consume it. Semantic layers, KPI frameworks, idempotent pipelines. The kind of code finance, sales, and CS teams quietly depend on without ever needing to open a ticket.
Across 50+ production projects, I've designed multi-tenant warehouses serving 60+ customer-facing dashboards, automated reporting that turned two-day Mondays into 8 a.m. emails, and reconciliation layers that catch mismatches before finance closes the books.
I care about idempotency, dimensional discipline, and cross-system reconciliation — the foundations that make the dashboards on top actually trustworthy. If a pipeline ran at 3 a.m. and nobody noticed, that's the goal.
A working toolkit, sharpened across data warehousing, automation, and cross-system integration in healthcare operational analytics. No boilerplate, no languages I've touched once in a tutorial. Tap a category to explore.
Real systems I've shipped — the architecture that held, the bugs I traced through three layers, the trade-offs that mattered. Selected from a portfolio of 50+ production projects. Click any row to expand.
Customer Success was spending two full days every week assembling a recurring KPI report — pulling SQL exports, copying values into Excel templates, refreshing charts manually, then re-typing summaries into a Word doc and exporting to PDF. The same humans were doing the same clicks every Monday.
Designed a parameterized Python pipeline that reads customer config from a control table, runs templated Snowflake queries, populates an Excel workbook (preserving formatting and pivot logic), and injects the results plus auto-generated narrative into a Word template via python-docx. Final step exports to PDF for leadership distribution.
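To make the plumbing concrete, here's a minimal sketch of the Word-injection step, assuming a {{placeholder}} convention in the template. File names, keys, and the helper are illustrative, not the production code:

```python
# Sketch of the Word-template injection step (illustrative names, not production code).
from docx import Document  # python-docx

def render_word_report(template_path: str, output_path: str, values: dict) -> None:
    """Replace {{key}} placeholders in a Word template with computed KPI values."""
    doc = Document(template_path)
    for paragraph in doc.paragraphs:
        for key, value in values.items():
            placeholder = "{{" + key + "}}"
            if placeholder in paragraph.text:
                # Assigning paragraph.text collapses run-level formatting; the real
                # pipeline edits runs individually to preserve template styling.
                paragraph.text = paragraph.text.replace(placeholder, str(value))
    doc.save(output_path)

render_word_report(
    "templates/weekly_kpi.docx",
    "out/weekly_kpi_acme.docx",
    {"customer": "Acme Health", "intake_completion": "87.4%"},
)
```

The Excel and PDF steps follow the same principle: write values into the formatted workbook rather than rebuilding it, then hand the finished Word doc to a headless converter for the PDF export.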
The Sales and Customer Success teams couldn't quantify product value at renewal conversations. Each rep was inventing their own ROI math, leading to inconsistent numbers across customers — and no defensible framework when leadership asked "what's the upsell story?"
Built a unified Snowflake-backed ROI model that quantifies four pillars: intake completion lift, scheduling adoption, reminder effectiveness (no-show reduction), and engagement uplift. Each pillar maps to specific time and revenue assumptions configurable per customer segment.
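The configurable assumptions are the point: one formula, different numbers per segment. A sketch of one pillar, with placeholder figures standing in for the real config table:

```python
# Hypothetical per-segment assumptions for the reminder-effectiveness pillar;
# the real model reads these from a Snowflake config table, one row per segment.
ASSUMPTIONS = {
    "enterprise": {"no_show_reduction": 0.18, "revenue_per_visit": 140.0},
    "mid_market": {"no_show_reduction": 0.12, "revenue_per_visit": 110.0},
}

def reminder_roi(segment: str, reminded_visits: int) -> float:
    """Dollar value of avoided no-shows: visits reminded x reduction x revenue."""
    a = ASSUMPTIONS[segment]
    return reminded_visits * a["no_show_reduction"] * a["revenue_per_visit"]

# 2,000 reminded enterprise visits: 2000 * 0.18 * 140 = ~$50,400 protected revenue
print(round(reminder_roi("enterprise", 2000)))  # 50400
```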
Multiple customers, shared infrastructure, divergent needs — and a warehouse that had grown organically into a tangle of one-off views. Cross-tenant benchmarking was unsafe, performance was inconsistent, and onboarding a new customer required custom code rather than configuration.
Redesigned as a layered architecture: staging mirrors source systems, intermediate handles business logic and conformed dimensions, marts expose tenant-isolated semantic models. SCD Type 2 captures historical attribution. Tenant ID propagates through every fact for safe row-level isolation.
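The isolation mechanism, sketched in Snowflake terms with illustrative object names: a row access policy keyed on tenant_id, mapped to roles, attached to every fact.

```python
# Illustrative DDL, executed by the deployment scripts via the Snowflake connector.
POLICY_DDL = """
CREATE OR REPLACE ROW ACCESS POLICY marts.tenant_isolation
AS (tenant_id VARCHAR) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM marts.tenant_role_map m          -- role-to-tenant mapping table
    WHERE m.tenant_id = tenant_id
      AND m.role_name = CURRENT_ROLE()
  )
"""

# Because tenant_id propagates through every fact, the same policy attaches everywhere.
ATTACH_DDL = """
ALTER TABLE marts.fct_visits
  ADD ROW ACCESS POLICY marts.tenant_isolation ON (tenant_id)
"""
```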
Daily pipelines occasionally failed mid-run or produced duplicates after partial recovery. Engineers had to manually clean state before reruns, and late-arriving source records silently corrupted aggregates. Trust in the data was eroding.
Designed an incremental framework around three principles: deterministic watermark tracking, merge-based upserts (never plain inserts), and explicit validation gates. Every run logs to a control table; every failure can be safely retried without manual cleanup.
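The shape of one idempotent increment, assuming a control table etl.watermarks(pipeline, high_water_ts) and a staging table per source (all names illustrative):

```python
# One safe-to-retry increment: read watermark, merge, then advance the watermark.
MERGE_SQL = """
MERGE INTO analytics.fct_orders AS tgt
USING (
    SELECT order_id, status, updated_at
    FROM staging.orders
    WHERE updated_at > %(high_water)s        -- only rows past the last watermark
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET tgt.status = src.status, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, status, updated_at)
                      VALUES (src.order_id, src.status, src.updated_at)
"""

def run_increment(conn) -> None:
    cur = conn.cursor()
    # A failed run never advanced the watermark, so a retry reprocesses the
    # same window; the merge makes that replay a no-op instead of a duplicate.
    cur.execute("SELECT high_water_ts FROM etl.watermarks WHERE pipeline = 'orders'")
    high_water = cur.fetchone()[0]
    cur.execute(MERGE_SQL, {"high_water": high_water})
    # Advance the watermark only after the merge succeeds.
    cur.execute(
        "UPDATE etl.watermarks SET high_water_ts = "
        "(SELECT MAX(updated_at) FROM staging.orders) WHERE pipeline = 'orders'"
    )
    conn.commit()
```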
CRM said one number of active contracts. Billing said another. The operational system said a third. Finance was reconciling these by hand each month-end, and discrepancies often weren't surfaced until they had compounded across periods.
Built a reconciliation layer that joins all three systems on canonical entity keys and applies deterministic mismatch rules. Every mismatch is logged with a category, severity, and recommended action. Daily snapshots track drift over time, with exception reports surfaced to operations.
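At its core it's a three-way join plus deterministic CASE rules. A sketch, with illustrative tables standing in for the CRM, billing, and operational models:

```python
RECON_SQL = """
WITH joined AS (
    SELECT
        COALESCE(c.account_key, b.account_key, o.account_key) AS account_key,
        c.active_contracts AS crm_count,
        b.active_contracts AS billing_count,
        o.active_contracts AS ops_count
    FROM crm.contracts c
    FULL OUTER JOIN billing.contracts b ON b.account_key = c.account_key
    FULL OUTER JOIN ops.contracts     o ON o.account_key = COALESCE(c.account_key, b.account_key)
)
SELECT
    account_key, crm_count, billing_count, ops_count,
    CASE
        WHEN crm_count IS NULL          THEN 'MISSING_IN_CRM'
        WHEN billing_count IS NULL      THEN 'MISSING_IN_BILLING'
        WHEN crm_count <> billing_count THEN 'CRM_BILLING_DRIFT'
        ELSE 'CRM_OPS_DRIFT'
    END AS mismatch_category,
    CASE WHEN crm_count IS NULL OR billing_count IS NULL THEN 'high' ELSE 'medium' END AS severity,
    CURRENT_DATE AS snapshot_date       -- daily snapshots make drift measurable
FROM joined
WHERE crm_count IS DISTINCT FROM billing_count
   OR crm_count IS DISTINCT FROM ops_count
"""
```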
Snowflake credit consumption was growing faster than usage justified. A single oversized warehouse was running every workload, ad-hoc queries were doing full table scans, and no one had visibility into which queries were the actual cost drivers.
Mined query history and warehouse metering to attribute cost by workload. Split into purpose-sized warehouses (small for BI, medium for ETL, X-small for ad hoc), enforced auto-suspend, and rewrote the costliest queries to take advantage of clustering and partition pruning. Built an internal cost dashboard for ongoing visibility.
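Attribution starts from views Snowflake already ships in ACCOUNT_USAGE. Two of the cuts, sketched (the dollar rate is a placeholder for the contract price per credit):

```python
# Credits by warehouse and day, from the metering history view.
COST_BY_WAREHOUSE_SQL = """
SELECT
    warehouse_name,
    DATE_TRUNC('day', start_time)  AS usage_day,
    SUM(credits_used)              AS credits,
    SUM(credits_used) * 3.00       AS est_dollars   -- placeholder $/credit
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY credits DESC
"""

# Worst pruning offenders: a scan ratio near 1.0 means clustering isn't helping.
TOP_SCANS_SQL = """
SELECT
    query_id, warehouse_name, partitions_scanned, partitions_total,
    partitions_scanned / NULLIF(partitions_total, 0) AS scan_ratio
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY partitions_scanned DESC
LIMIT 50
"""
```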
Visit lifecycle data lived in a noisy event stream from the UrgentIQ API. Some statuses were skipped, some were emitted out of order, and timestamps were UTC while reporting expected EST. Operations couldn't answer basic questions like "how long are visits stuck in checked-in status?"
Built an incremental ingestion pipeline that normalizes status timelines (one visit fanned out into N ordered status rows), gap-fills missing transitions with deterministic rules, standardizes timezones, and computes transition durations. Output is a clean lifecycle fact ready for funnel and bottleneck analysis.
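A sketch of the normalization step, assuming a canonical status order and raw events already landed in a pandas frame (column names illustrative):

```python
import pandas as pd

STATUS_ORDER = ["scheduled", "checked_in", "in_room", "discharged"]

def normalize_timeline(events: pd.DataFrame) -> pd.DataFrame:
    """events: one row per raw status event, columns visit_id, status, event_ts (UTC)."""
    df = events.copy()
    # Standardize timezone: UTC in, reporting timezone out.
    df["event_ts"] = pd.to_datetime(df["event_ts"], utc=True).dt.tz_convert("America/New_York")
    # Sort by canonical order, not arrival order, to repair out-of-order emissions.
    # (Gap-filling of skipped statuses happens before this step in production.)
    df["status_rank"] = df["status"].map({s: i for i, s in enumerate(STATUS_ORDER)})
    df = df.sort_values(["visit_id", "status_rank"])
    # Time stuck in each status = gap to the next observed transition.
    df["next_ts"] = df.groupby("visit_id")["event_ts"].shift(-1)
    df["seconds_in_status"] = (df["next_ts"] - df["event_ts"]).dt.total_seconds()
    return df
```

That seconds_in_status column is exactly what answers "how long are visits stuck in checked-in?"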
Customers were billed based on provider counts, but "provider" meant different things in different systems. Some part-time providers were counted as full FTEs, some full-time providers were missed entirely, and contract terms specified thresholds nobody was actively checking.
Designed a reconciliation workflow that joins operational provider activity with CRM contract terms and QuickBooks billing records. Applied FTE thresholds (active hours, encounter volume) to classify each provider into a billing category. Standardized monthly reconciliation checkpoints.
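The classification itself is deliberately boring: a pair of thresholds read from the contract terms. A sketch with placeholder numbers:

```python
# Placeholder thresholds; the real workflow reads them from the CRM contract join.
def billing_category(active_hours: float, encounters: int) -> str:
    """Classify one provider-month for the billing reconciliation."""
    if active_hours >= 120 and encounters >= 80:
        return "full_fte"
    if active_hours >= 40 or encounters >= 25:
        return "partial_fte"
    return "not_billable"
```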
SMS costs were inflated by two avoidable issues: messages tipping over the 160-character single-segment threshold (doubling per-message cost) and a fleet of 300+ phone numbers that were billed monthly but no longer in use. Nobody had audited either in years.
Audited message content across templates, identified those tipping over the segment boundary, and rewrote them under a 150-character target with a margin for personalization variables. Cross-referenced active phone numbers against actual outbound traffic and decommissioned the inactive ones.
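Segment math is mechanical once you know the boundaries: GSM-7 fits 160 characters in one segment and 153 per segment after that; any non-GSM character forces UCS-2 (70/67). A sketch of the audit check (the GSM character set below is abbreviated for the example):

```python
GSM_BASIC = set(
    "@£$¥èéùìòÇØøÅå_ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà\n\r"
)

def segment_count(message: str) -> int:
    if all(ch in GSM_BASIC for ch in message):
        return 1 if len(message) <= 160 else -(-len(message) // 153)  # ceil division
    return 1 if len(message) <= 70 else -(-len(message) // 67)        # UCS-2 encoding

assert segment_count("a" * 161) == 2   # one char over the boundary doubles the cost
assert segment_count("a" * 150) == 1   # the 150-char target leaves personalization headroom
```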
EMR vendor API costs were rising without a clear understanding of what was driving usage. Were certain endpoints inefficient? Were features double-fetching data? Engineering needed evidence before making architectural changes.
Mined six months of API logs to surface usage patterns: top endpoints by volume, redundant call chains, customer-level cost attribution, and timing correlations with downstream features. Output was a structured report engineering used to prioritize caching and batching work.
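The analysis itself was mostly disciplined groupbys. Two of the cuts, assuming logs parsed into a frame with customer_id, endpoint, and billed_units columns (illustrative names):

```python
import pandas as pd

def top_endpoints(logs: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Endpoints ranked by billed volume: the first place to look for waste."""
    return (
        logs.groupby("endpoint")
            .agg(calls=("endpoint", "size"), units=("billed_units", "sum"))
            .sort_values("units", ascending=False)
            .head(n)
    )

def cost_by_customer(logs: pd.DataFrame) -> pd.Series:
    """Attribute vendor cost to the customer whose feature triggered the calls."""
    return logs.groupby("customer_id")["billed_units"].sum().sort_values(ascending=False)
```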
Hundreds of business-critical views had grown over years with no tests, no documentation, and no consistent style. Logic was duplicated across views, dependencies were implicit, and a change in one place could quietly break a dashboard halfway across the warehouse.
Migrated transformations to dbt with a layered staging → intermediate → mart structure. Added schema tests for uniqueness, referential integrity, and accepted values. Generated docs auto-publish on every merge. CI runs the full test suite before any change reaches production.
Operational teams needed visibility into events as they happened — not 24 hours later. The existing batch pipeline ran nightly, which meant by the time an issue showed up on the dashboard, the day was over. Switching to streaming meant rethinking watermarks, ordering, and failure recovery.
Built a Kafka-to-Snowflake pipeline using Snowpipe Streaming for sub-minute landing, then Snowflake Streams + Tasks for incremental transformation. Out-of-order events handled via watermark tolerance windows. Quality metrics (lag, dropped events, schema drift) emitted to a monitoring dashboard.
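The transform half, sketched as Snowflake DDL (object names illustrative; the Snowpipe Streaming ingestion and the monitoring emitters are elided):

```python
# A stream tracks new raw rows; a task folds them in every minute. Merging on
# event_id keeps replays and late arrivals idempotent.
STREAM_DDL = "CREATE OR REPLACE STREAM raw.visit_events_stream ON TABLE raw.visit_events"

TASK_DDL = """
CREATE OR REPLACE TASK raw.fold_visit_events
  WAREHOUSE = etl_small
  SCHEDULE  = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw.visit_events_stream')
AS
  MERGE INTO analytics.fct_visit_events AS tgt
  USING (SELECT event_id, visit_id, status, event_ts FROM raw.visit_events_stream) AS src
  ON tgt.event_id = src.event_id
  WHEN NOT MATCHED THEN INSERT (event_id, visit_id, status, event_ts)
                        VALUES (src.event_id, src.visit_id, src.status, src.event_ts)
"""
```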
RAG systems are only as trustworthy as the documents they retrieve from. Stale clinical guidelines, contradictory sources, and ambiguous procedure documentation all silently corrupt LLM answers downstream — with no easy way to detect the failure after the fact.
Designed a quality framework that runs on the source corpus before embedding: freshness checks (last-updated thresholds), deduplication (semantic similarity clustering), contradiction detection (cross-document NLI scoring), and provenance tagging. Documents that fail are quarantined; the rest enter the vector index with quality metadata attached.
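Two of the four gates, sketched (the embedding model and thresholds are placeholder choices, not the production configuration):

```python
from datetime import datetime, timedelta, timezone
import numpy as np
from sentence_transformers import SentenceTransformer

MODEL = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder model choice
MAX_AGE = timedelta(days=365)                     # placeholder freshness threshold
DUP_THRESHOLD = 0.92                              # placeholder similarity cutoff

def is_fresh(last_updated: datetime) -> bool:
    """Freshness gate: stale guidelines are quarantined, not embedded."""
    return datetime.now(timezone.utc) - last_updated <= MAX_AGE

def near_duplicates(texts: list[str]) -> list[tuple[int, int]]:
    """Dedup gate: index pairs whose cosine similarity exceeds the cutoff."""
    emb = MODEL.encode(texts, normalize_embeddings=True)
    sims = np.asarray(emb) @ np.asarray(emb).T    # normalized, so dot product = cosine
    return [
        (i, j)
        for i in range(len(texts))
        for j in range(i + 1, len(texts))
        if sims[i, j] >= DUP_THRESHOLD
    ]
```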
Opinions earned by shipping data systems. Each one has a scar behind it: a 3 a.m. page, a quarter-end mismatch, a dashboard that lied for two months before anyone noticed. I write them down so I don't forget.
Most analytics bugs aren't typos. They're cardinality surprises hiding in a vague join. Pin the grain first, write SQL second. Twenty minutes of sketching saves four hours of debugging.
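Pinning the grain is a one-query habit. Before writing the join, I run something like this (table and keys illustrative) and expect zero rows back:

```python
GRAIN_CHECK_SQL = """
SELECT visit_id, status, COUNT(*) AS dupes
FROM analytics.fct_visit_status
GROUP BY visit_id, status
HAVING COUNT(*) > 1   -- any row back means the grain you assumed is wrong
"""
```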
A pipeline you can't rerun safely is a script with a person attached. Watermarks, merge upserts, control tables: make rerunning safe by construction. The first time a 3 a.m. failure becomes a 3:02 a.m. retry, you'll know it's working.
The row policy either holds for every query or it doesn't. Test isolation like a security control — write queries that try to violate it and confirm they can't. The worst-case bug is silent.
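What that test looks like in practice, sketched with illustrative role and table names:

```python
def test_tenant_isolation(conn) -> None:
    cur = conn.cursor()
    cur.execute("USE ROLE tenant_acme_reader")
    # Deliberately ask for other tenants' rows; the row access policy should
    # filter them all, so anything returned is a breach, not a test artifact.
    cur.execute("SELECT COUNT(*) FROM marts.fct_visits WHERE tenant_id <> 'acme'")
    leaked = cur.fetchone()[0]
    assert leaked == 0, f"row policy leaked {leaked} cross-tenant rows"
```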
Run it daily, review weekly, audit monthly. Drift compounds; cadence catches it early. By month-end, three systems disagreeing has cost real money. Make every mismatch visible within 24 hours.
Write SQL, comments, and docs like the person debugging this at 3 a.m. has zero context. Eventually they will. That stranger might be future you. The empathy is the same.
Technical writing on data modeling, architecture decisions, automation patterns, and governance. Most of these are post-mortems on something that broke first.