ETL is Dead, Long Live ELT: Building Modern Data Pipelines

For decades, ETL — Extract, Transform, Load — was the gold standard for moving data. You pulled data from source systems, transformed it in a staging area, and loaded the clean result into a data warehouse. It worked. But the world has changed.
Why traditional ETL is struggling
Traditional ETL was designed for a world where storage was expensive and compute was limited. Transformations happened before loading because you could not afford to store raw data. Today, cloud warehouses like Snowflake, BigQuery, and Redshift offer virtually unlimited storage at a fraction of the cost, with massively parallel compute on demand.
The bottleneck has shifted. The problem is no longer "where do I put all this data?" — it is "how fast can I get raw data into the warehouse so analysts can start working with it?"
The ELT paradigm
ELT flips the order: Extract, Load, Transform. Raw data lands in the warehouse first, and transformations happen inside the warehouse using SQL. This approach has several advantages:
- Speed to value — Raw data is available immediately. Analysts do not wait for engineering to build transformation logic before they can explore.
- Full data lineage — Because raw data is preserved, you can always rebuild transformations without going back to the source.
- Separation of concerns — Ingestion and transformation become independent processes, each with its own tooling and pace of change.
The modern data stack in practice
A typical modern pipeline looks like this:
- Ingestion — Tools like Fivetran, Airbyte, or custom Python scripts extract data from APIs, databases, and SaaS platforms and load it into a cloud warehouse.
- Transformation — dbt (data build tool) applies version-controlled SQL transformations inside the warehouse, with built-in testing and documentation.
- Orchestration — Apache Airflow, Dagster, or Prefect manage dependencies, scheduling, and alerting across the pipeline.
- Data quality — Great Expectations or dbt tests validate data at every stage, catching issues before they reach dashboards.
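The ingestion step above can be sketched in plain Python. This is a minimal, hypothetical extract-and-load loop, not any particular tool's API: the `fetch_page` and `load_batch` callables stand in for a real API client and warehouse writer, which you would supply.

```python
from typing import Callable

def extract_and_load(
    fetch_page: Callable[[int], list[dict]],
    load_batch: Callable[[list[dict]], None],
) -> int:
    """Pull pages from a source until exhausted, loading each batch raw.

    In ELT, no transformation happens here: records land in the
    warehouse exactly as the source produced them.
    """
    page, total = 0, 0
    while True:
        batch = fetch_page(page)
        if not batch:  # an empty page signals the source is exhausted
            break
        load_batch(batch)
        total += len(batch)
        page += 1
    return total
```

A managed tool like Fivetran or Airbyte replaces this loop entirely; the sketch just shows how little logic the "EL" half of ELT needs when transformation is deferred to the warehouse.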
```sql
-- Example dbt model: monthly revenue by product
SELECT
    product_id,
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM {{ ref('stg_orders') }}
WHERE status = 'completed'
GROUP BY 1, 2
```
Common pitfalls we see
Over-engineering from day one. You do not need Kubernetes, Spark, and a real-time streaming layer for 10 GB of data. Start simple. A cloud warehouse, a managed ingestion tool, and dbt will handle most use cases up to hundreds of gigabytes comfortably.
Ignoring data quality. Pipelines without tests are a ticking time bomb. One schema change in a source API can silently corrupt weeks of reports. Build tests early — they pay for themselves within the first incident they prevent.
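As a sketch of what early tests can look like, here is a dbt schema file declaring column-level tests for a staging model like `stg_orders`. The column names and accepted values are illustrative assumptions, not a prescription:

```yaml
# models/staging/schema.yml -- illustrative dbt tests for stg_orders
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id        # assumed primary-key column
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'pending', 'cancelled']
```

If a source API starts emitting a new status value, `dbt test` fails loudly instead of letting the change slip silently into reports.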
Treating pipelines as one-off scripts. Data pipelines are production software. They need version control, CI/CD, monitoring, and documentation. A Jupyter notebook that "just runs" is not a pipeline — it is technical debt.
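One concrete habit that follows: transformation helpers get unit tests that run in CI like any other software. A minimal sketch, where `normalize_status` and its vocabulary are hypothetical:

```python
def normalize_status(raw: str) -> str:
    """Map a messy source value onto a canonical status vocabulary."""
    return raw.strip().lower().replace(" ", "_")

# A test like this runs on every commit, so a source-side format
# change breaks the build instead of silently corrupting reports.
def test_normalize_status():
    assert normalize_status("  Completed ") == "completed"
    assert normalize_status("In Progress") == "in_progress"
```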
When to invest in pipeline modernization
If your team spends more time debugging data issues than building new analyses, your pipelines need attention. If analysts wait days for new data sources to be integrated, your architecture is holding the business back.
At BIGCODE, we design and implement data pipelines that are reliable, testable, and maintainable. Whether you are migrating from legacy ETL or building from scratch, we help you get it right from the start.
