The Modern Data Stack Is Over. Here’s What’s Next.

For almost a decade, the Modern Data Stack defined how teams thought about analytics.
Snowflake was the destination.
dbt handled transformation.
Fivetran delivered the data.
Orchestration, governance, and dashboards framed everything else.
It made sense in an era when SQL-first engineers owned the definition of “data,” and when the problem was mostly about standardizing a fragmented landscape.
But as Data Talks on the Rocks host Michael Driscoll explores in this week’s conversation with Matthaus Krzykowski – founder of dltHub – the Modern Data Stack didn’t fall out of favor because something shinier arrived. It fell out of favor because the conditions that made it necessary have fundamentally changed.
And once those conditions shift, the entire model shifts with them.
The Maintenance Burden No One Wants to Admit
Every data engineer knows the quiet truth – a truth Mike surfaces early in the episode:
Most of the job isn’t building pipelines. It’s repairing them.
APIs move.
Schemas drift.
Stale documentation misleads.
Internal services change without warning.
A connector breaks at 2 a.m., and suddenly your “modern” stack looks anything but modern.
The diagrams always looked clean. The reality never was.
As Matthaus puts it:
“People think they’re choosing tools. What they’re actually choosing is a maintenance burden.”
And as Mike points out, maintenance simply doesn’t scale — especially not in a world where companies now produce thousands of potential data sources and every business unit expects all of them to be captured.
The Modern Data Stack promised order. The real world delivered entropy.
The Python Generation Arrives
In 2018, roughly seven million developers used Python.
Today, that number is twenty-two million.
Within a few years, Matthaus believes it could reach one hundred million.
That shift alone is enormous. But something deeper is happening. The new wave of data practitioners isn’t entering through SQL editors or warehouse modeling. They’re arriving through Python notebooks, lightweight compute engines, and increasingly through AI-powered coding environments like Cursor, Continue, GitHub Copilot, Claude, and ChatGPT.
They’re not writing code the way past generations did. They’re generating it.
And when they ask an AI editor to scaffold a pipeline or clean a dataset or pull from a strange API, the answer that comes back is almost always Python.
What the Modern Data Stack treated as an edge case – irregular sources, semi-structured datasets, bespoke ingestion – is exactly where this new generation starts.
As Mike observes, the center of gravity has moved.
Agents Change the Economics
This is where dlt's story becomes a proxy for something bigger – a point Mike and Matthaus unpack together.
dlt began as a practical Python library for handling the messy parts of ingestion: pagination, normalization, lineage, type inference, all the quiet details that make a pipeline stable over time. It was built for senior data engineers who needed reliability more than magic.
But once AI systems began generating large volumes of Python, dlt quietly became the substrate that made that output safe to run. It absorbed the inconsistency, enforced structure, and gave LLM-generated code predictable semantics — a point Mike continually circles back to.
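To make that concrete, here is a minimal sketch of what an LLM-scaffolded ingestion script can look like once it runs through dlt. The API endpoint, field names, and destination settings are illustrative assumptions, not details from the episode; the point is that dlt's resource and pipeline abstractions handle pagination, schema inference, and normalization so the generated code doesn't have to.

```python
import dlt
import requests

# Hypothetical paginated API; the URL and payload shape are placeholders.
@dlt.resource(name="events", write_disposition="append")
def events(api_url="https://api.example.com/events"):
    page = 1
    while True:
        resp = requests.get(api_url, params={"page": page})
        resp.raise_for_status()
        rows = resp.json()
        if not rows:
            break
        # dlt infers column types and normalizes nested JSON into child tables,
        # so the generated code never hand-rolls a schema.
        yield rows
        page += 1

pipeline = dlt.pipeline(
    pipeline_name="events_pipeline",
    destination="duckdb",  # assumption: a local DuckDB file keeps the sketch self-contained
    dataset_name="raw_events",
)

if __name__ == "__main__":
    load_info = pipeline.run(events())
    print(load_info)
```

An AI editor can plausibly produce the loop above; what dlt adds is the part the model tends to get wrong: consistent typing, lineage, and a predictable load contract.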
And then the numbers started to climb.
dlt now supports thousands of data sources — not because the company hired an army of engineers, but because LLMs generate them. Entire pipelines that once took weeks of effort are now scaffolded in minutes, then hardened and deployed through dlt's standardization layer.
As Mike frames it: This isn’t a productivity boost. It’s a structural break.
Fivetran succeeded by building a catalog of stable connectors.
dlt succeeds by embracing the long tail — the sources no product team would ever prioritize.
And as Matthaus notes:
“If pipelines can be generated infinitely, you can’t charge by the pipeline.”
That single sentence marks the end of a core Modern Data Stack business model.
Why Object Storage Is Suddenly the New Warehouse
Another signal Matthaus highlights is where dlt pipelines are landing.
For years, Snowflake and Postgres were dlt's most common destinations. Now, it’s S3, GCS, and Azure Blob.
On the surface, that looks like cost optimization. But the deeper reason is cultural, and Mike draws the line clearly:
Python-first teams prefer file formats like Parquet and CSV.
They lean on Ibis, DuckDB, and Python tooling.
They favor flexible structures over rigid schemas.
And they want AI systems to operate directly on raw or semi-structured data.
The Modern Data Stack relied on structure. The next era relies on adaptability.
Object storage isn’t replacing the warehouse — as Mike puts it — it’s outgrowing it.
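As a rough illustration of that Python-first, object-storage-first workflow, the sketch below lands rows in S3 as Parquet through dlt's filesystem destination and then queries the files in place with DuckDB. The bucket name and paths are placeholders, and credential setup is omitted for brevity; this is an assumed setup, not one described in the episode.

```python
import os
import dlt
import duckdb

# Assumption: bucket and prefix are placeholders; cloud credentials are
# expected to be configured separately for both dlt and DuckDB.
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "s3://my-data-lake/raw"

pipeline = dlt.pipeline(
    pipeline_name="events_to_lake",
    destination="filesystem",  # dlt's object-storage destination
    dataset_name="raw_events",
)

# Any Python iterable or dlt resource works; Parquet is one supported file format.
pipeline.run(
    [{"event_id": 1, "kind": "signup"}, {"event_id": 2, "kind": "login"}],
    table_name="events",
    loader_file_format="parquet",
)

# Query the Parquet files directly, no warehouse load step required.
# The path layout below mirrors dataset/table naming and is illustrative.
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
print(
    con.execute(
        "SELECT kind, count(*) FROM "
        "read_parquet('s3://my-data-lake/raw/raw_events/events/*.parquet') "
        "GROUP BY kind"
    ).fetchall()
)
```

The warehouse isn't gone in this picture; it simply stops being the mandatory first stop for every byte.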
What Comes After the Modern Data Stack
Matthaus is careful in this conversation not to project too far into the future. But as Mike draws out in the final section of the episode, the direction is unambiguous.
We are moving from standardized tools to adaptive systems. From hand-written pipelines to pipelines generated by agents. From SQL as the lingua franca to Python as the shared interface for both humans and machines.
The Modern Data Stack was built for a world where data teams were centralized, pipelines were curated, and the edges were manageable.
The world has changed.
Data sources have multiplied.
AI has altered how engineers work.
The bottleneck is no longer tooling; it's the assumptions the tooling was built on.
The Modern Data Stack was designed for control. The next era is designed for change.
If you want to understand this shift, this conversation between Michael Driscoll and Matthaus Krzykowski is essential.
Watch the full episode on Data Talks on the Rocks.