Fintech Reporting Dashboard
200M rows · 60% faster · sub-second queries
Parquet/Arrow-backed reporting layer with Plaid and Stripe ingestion and reconciliation pipelines, replacing an overloaded Postgres that was spending more time locking than serving.
The diagram, walked through in plain language
1. Two databases, two jobs
The original Postgres database keeps doing what it's good at: handling live transactions (a customer adds an invoice, payments come in via Plaid and Stripe, etc.). It is the legal source of truth and stays that way.
2. A copy goes to a 'data lake'
Every change in Postgres is mirrored to a Parquet file lake on Amazon S3 — a format that compresses well and is built for analytics rather than transactions.
3. Reports query the lake, not the live database
The dashboard talks to DuckDB (a small, fast analytics engine) reading from those Parquet files, with the most recent month held in memory for instant answers.
4. Late events get handled gracefully
Stripe disputes can arrive 90 days after a charge. A separate pipeline rewrites only the affected days in the lake without breaking older reports or invalidating cached numbers.
5. The two sides reconcile every 15 minutes
A check runs the same totals on both Postgres (truth) and the Parquet lake (reports). If they ever disagree by more than a rounding cent, the team is alerted before the customer notices.
6. Customers see the numbers, not the plumbing
The React + Grafana dashboard hits the fast path; nothing about the new architecture changes the bookkeeping the customer sees.
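The partition layout behind steps 2 and 3 is what keeps queries fast: a report only touches the month prefixes inside its date range, never the full three years of history. A minimal sketch of that pruning, with an assumed prefix scheme (`s3://lake/tenant=…/month=…/`) that stands in for the real layout:

```python
from datetime import date

def month_partitions(tenant: str, start: date, end: date) -> list[str]:
    """Enumerate the tenant/month Parquet prefixes a report query touches.

    The paths are illustrative -- the real lake is tenant- and
    month-partitioned, but the exact prefix scheme is an assumption here.
    """
    parts = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        parts.append(f"s3://lake/tenant={tenant}/month={y:04d}-{m:02d}/")
        m += 1
        if m == 13:
            y, m = y + 1, 1
    return parts

# A 90-day P&L touches at most four month partitions, so the scan stays
# small regardless of how much history the lake accumulates.
print(month_partitions("acme", date(2024, 1, 15), date(2024, 4, 14)))
```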
The brief
An SMB-focused fintech had a dashboard that was becoming an embarrassment. Queries for a 90-day P&L took 7 seconds on a good day, 40 seconds on a bad one, and the root cause was structural: they were running analytical aggregations against the same OLTP Postgres instance that handled every ingestion write.
They didn't need magic. They needed to stop punishing a transactional database for doing analytics work, without rebuilding the data model from scratch.
The constraints
- System of record stayed on Postgres. Regulatory constraints ruled out moving it.
- Every reported number had to reconcile to the cent against the Postgres ground truth, continuously and automatically.
- Ingestion had to handle Plaid's inconsistent webhook ordering and Stripe's delayed dispute events gracefully.
- Dashboard latency budget: < 1 second for 95% of queries.
- Infra cost had to be roughly flat — this was a cost problem, not a spend-more problem.
The shape we built
Postgres kept its job as the system of record. Alongside it sits a Parquet lake on S3, partitioned by tenant and month, maintained by a thin Python pipeline that reads from Postgres's logical replication slot and writes columnar files.
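The replication-to-Parquet step is essentially a grouping problem: change rows coming off the slot are bucketed by (tenant, month) so each flush writes one columnar file per affected partition. A sketch under assumptions -- the row fields and the decoder (e.g. wal2json) are illustrative, and the real pipeline would hand each group to `pyarrow.parquet.write_table`:

```python
from collections import defaultdict

def partition_changes(rows):
    """Group logical-replication change rows by (tenant, month) so each
    flush writes one file per affected partition.

    Row shape is assumed; the real pipeline decodes these from
    Postgres's logical replication slot before grouping.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["tenant"], row["occurred_at"][:7])  # 'YYYY-MM' bucket
        groups[key].append(row)
    return dict(groups)

rows = [
    {"tenant": "acme", "occurred_at": "2024-06-03", "amount_cents": 1250},
    {"tenant": "acme", "occurred_at": "2024-05-30", "amount_cents": -400},
    {"tenant": "bex",  "occurred_at": "2024-06-01", "amount_cents": 900},
]
print(sorted(partition_changes(rows).keys()))
```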
Queries stopped hitting Postgres. The dashboard hits a DuckDB process running against Parquet files, with the hot month held in memory via Arrow. Reconciliation runs every fifteen minutes: the same aggregate computed in both places, diffed, and alerted on if the difference exceeds a cent.
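The reconciliation check itself is small: the same per-tenant aggregate from both sides, diffed against a one-cent tolerance. A minimal sketch with illustrative names -- the real job feeds these totals from SQL on Postgres and DuckDB respectively:

```python
from decimal import Decimal

def reconcile(pg_totals, lake_totals, tolerance=Decimal("0.01")):
    """Compare per-tenant totals from Postgres (truth) and the Parquet
    lake (reports); return tenants whose difference exceeds tolerance.

    Totals are Decimals in dollars; the dict-of-totals shape is an
    assumption for the sketch.
    """
    drift = {}
    for tenant in pg_totals.keys() | lake_totals.keys():
        delta = abs(pg_totals.get(tenant, Decimal("0"))
                    - lake_totals.get(tenant, Decimal("0")))
        if delta > tolerance:
            drift[tenant] = delta
    return drift  # non-empty result -> page the team
```

A green run returns an empty dict; anything else carries the per-tenant deltas straight into the alert payload.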
Ingestion got its own reliability layer. Plaid webhooks land in an idempotent inbox; out-of-order events are reordered by timestamp before commit. Stripe disputes, which can arrive 90 days after the original charge, flow through a separate late-arriving events pipeline that rewrites only the affected daily partitions.
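A minimal sketch of that inbox: duplicate deliveries are dropped by event ID, and the buffer is re-sorted by source timestamp before commit. The field names are assumptions, not Plaid's actual webhook schema:

```python
class WebhookInbox:
    """Idempotent inbox for Plaid-style webhooks.

    receive() drops duplicate deliveries by event ID; drain() returns
    the buffered events in source-timestamp order for commit.
    """

    def __init__(self):
        self._seen = set()
        self._buffer = []

    def receive(self, event) -> bool:
        if event["event_id"] in self._seen:
            return False  # duplicate delivery; ack upstream and drop
        self._seen.add(event["event_id"])
        self._buffer.append(event)
        return True

    def drain(self):
        """Re-order out-of-arrival events by timestamp before commit."""
        batch = sorted(self._buffer, key=lambda e: e["timestamp"])
        self._buffer.clear()
        return batch
```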
What was hard
- Timezone semantics. “Monthly revenue” meant four different things across four different reports. We standardized on account-local midnight-to-midnight, which sounds obvious and took two weeks to sell.
- Reconciliation tolerance. Float arithmetic on a long series of fees produced 0.01-cent differences that were technically real and technically unactionable. We round to the cent at the boundary and log the pre-rounding delta separately.
- Stripe dispute backfill. The late-arriving events pipeline rewrites history, which means caching has to be smarter than “invalidate after 24 hours.” We tag every cached aggregate with the latest source event ID.
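The rounding boundary from the reconciliation-tolerance point can be made concrete: round the float aggregate to the cent for reporting, and keep the pre-rounding residue for the log. A sketch; `ROUND_HALF_EVEN` is an assumption about the rounding mode:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_at_boundary(raw: float) -> tuple[Decimal, Decimal]:
    """Round a float aggregate to the cent at the reporting boundary,
    returning (reported, residue) so the sub-cent delta can be logged
    separately instead of triggering a reconciliation alert.
    """
    exact = Decimal(str(raw))
    cents = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
    return cents, exact - cents

reported, residue = round_at_boundary(1234.56789)
# `reported` goes to the dashboard; `residue` goes to the delta log
```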
What it does today
The dashboard that used to take 7 seconds now answers in 420ms at p95. Median queries are 60% faster than the Postgres baseline across the full reporting surface. The Parquet lake indexes 200M rows across three years of history. Reconciliation runs every 15 minutes against Postgres and has not found a material discrepancy since cutover. Infra spend is within 5% of the pre-migration number.
What I'd do differently
I'd have shipped reconciliation before the new dashboard, not alongside it. Turning on a new reporting path when you can't yet prove its numbers match the old one is a conversation you do not want to have with finance. Once reconciliation was green, everything else felt like paperwork.
- Python · PyArrow · DuckDB
- Plaid · Stripe ingestion
- Postgres (system of record)
- AWS Lambda + S3 (Parquet lake)
- dbt (transformation layer)
- Grafana + custom React dashboard
Have a similar problem?
If this shape of engagement fits what you're working on, I'd be happy to scope it.