How Snowflake Works · Sachin Kukreja

Running a data warehouse used to mean running a compromise. You estimated your busiest day of the year — the quarterly board report, the month-end close, the Black Friday analytics run — provisioned hardware to survive it, and paid for that hardware every other day of the year. Storage and compute were physically inseparable: one cluster, one shared pool of resources, one queue. The BI team's dashboard refresh slowed down the data science team's model training, which slowed down the ingestion pipeline, which slowed down everyone. The only fix was to add more hardware — which made the idle-time waste even worse.

The three engineers who founded Snowflake in 2012 — Benoit Dageville and Thierry Cruanes from Oracle, and Marcin Zukowski from the Dutch research institute CWI — understood this problem at the source. Their observation was straightforward: cloud object storage had become extraordinarily cheap and durable. If storage no longer needed to live next to compute, why were data warehouses still built as if it did?

Separating storage from compute is Snowflake's founding idea. Everything else — every feature, every pricing decision, every performance characteristic — follows from it.

The Three-Layer Architecture

Snowflake divides into three distinct, independently operating layers. Each layer can scale, fail, or be swapped without affecting the others. Understanding them individually matters less than understanding how they relate — that relationship explains almost every behaviour that makes Snowflake distinctive.

Layer 1 — Cloud Services

Query Optimizer Metadata Store Auth & ACL Transaction Mgmt Result Cache

↕ coordinates query execution

Layer 2 — Virtual Warehouse

BI Team

CPU Memory SSD Cache

Layer 2 — Virtual Warehouse

Data Science

CPU Memory SSD Cache

Layer 2 — Virtual Warehouse

Ingestion Pipeline

CPU Memory SSD Cache

↕ reads micro-partitions

Layer 3 — Storage

μP
001

μP
002

μP
003

μP
004

μP
005

μP
006

μP
007

μP
008

μP
009

μP
010

μP
011

μP
012

· · ·

Amazon S3 Azure Blob Google GCS

Storage

All data in Snowflake lives in cloud object storage — S3, Azure Blob, or GCS depending on your chosen cloud. Snowflake manages this on your behalf; you never interact with the underlying bucket. The data is stored in Snowflake's own compressed, columnar format, similar in structure to Apache Parquet but optimised for Snowflake's query execution.

The fundamental unit is the micro-partition. Every table is divided into micro-partitions: immutable, contiguous files of 50 to 500 MB of compressed data. Within each micro-partition, data is stored column by column. A query reading three columns out of fifty fetches only the bytes belonging to those three columns — nothing else is read from storage.

What makes micro-partitions more than just a storage format is the metadata attached to each one. For every column in every partition, Snowflake records the minimum and maximum value, distinct value count, and null count. This metadata lives in the cloud services layer and is available without touching the data files.

orders table — divided into micro-partitions at ingest

μP 001 Jan 1–3

μP 002 Jan 3–5

μP 003 Jan 5–7

μP 004 Jan 7–9

μP 005 Jan 9–11

μP 006 Jan 11–13

· · ·

Metadata for μP 001 — queryable without reading the data file

rows 42,819

compressed size 128 MB

event_date range Jan 1 → Jan 3

user_id range 1,001 → 99,847

amount range $0.99 → $4,299

null counts all columns: 0

When a query arrives with a filter like WHERE event_date = '2024-01-07', the query optimiser checks this metadata across every partition and skips any partition whose date range does not include that value — before reading a single byte of actual data. On large tables with ordered data, this pruning can eliminate over 99% of the scan. No manual indexing required; it happens automatically as data is loaded.

Micro-partitions are immutable. Updates and deletes do not modify existing partitions — they write new ones and mark the old ones as superseded. This immutability, which sounds like a constraint, is what enables Time Travel, fail-safe recovery, and zero-copy cloning.

Compute — Virtual Warehouses

The compute layer consists of Virtual Warehouses: independent clusters of cloud compute instances that execute queries. Each warehouse has its own CPU, memory, and local SSD cache. Warehouses share nothing with each other.

This isolation eliminates the noisy-neighbor problem. A heavy ETL job running on the data engineering warehouse has zero effect on the response time of the BI dashboard warehouse — even if both are reading from the same underlying tables. They hit the same storage layer but run on entirely separate hardware.

Warehouses are sized from X-Small to 6X-Large. Each step doubles the node count and roughly doubles both throughput and hourly cost. A larger warehouse is not faster for simple queries — it is faster for queries that parallelise across many partitions simultaneously. Choosing the wrong size is the most common source of Snowflake overspending.

Warehouses can be paused at any time and cost nothing while paused. An auto-suspend setting keeps idle warehouses from running overnight when no one is using them. Auto-resume restarts them automatically when a query arrives, adding a few seconds to the first query's latency. For high-concurrency workloads, multi-cluster warehouses spin up additional compute clusters to absorb simultaneous query load and decommission them when demand falls.

Cloud Services

The cloud services layer is the coordination brain. It runs continuously, independently of any warehouse, and handles everything that is not raw storage or query execution: SQL parsing and validation, query optimisation, transaction management (Snowflake is fully ACID), authentication, role-based access control, and the result cache.

When a query arrives, the cloud services layer determines which micro-partitions are relevant by consulting the metadata store, prunes everything else, and generates an execution plan for the chosen warehouse to carry out. This entire process happens before any data is read from storage. The cloud services layer also holds detailed statistics — column cardinality, data distributions, recent write patterns — used by the cost-based optimiser to pick efficient join strategies and scan orders.

Cloud services billing is typically a small percentage of total compute spend and is often invisible in practice. The layer is always on, and its cost is not something most teams think about.

Caching

Snowflake has three separate caching layers that work in concert. Understanding them explains much of the apparently unpredictable performance new users encounter — why the same query sometimes takes two seconds and sometimes takes zero, why the first query after a resume is slower, why two users running identical queries do not both trigger compute.

Level 1 — Metadata Cache

Answers queries resolvable from partition statistics alone — COUNT(*), MIN, MAX on non-nullable columns. Returns in milliseconds; no warehouse required.

Cloud Services
Always available

Level 2 — Result Cache

Returns the exact result of a prior query if the underlying data has not changed, within 24 hours. Global across all users and warehouses — the second person to run a slow report sees it return instantly. No warehouse required.

Cloud Services
24-hour TTL

Level 3 — Local SSD Cache

Caches recently accessed micro-partitions on the warehouse's own local disk. Subsequent queries touching the same partitions read from SSD rather than cloud storage. Warm as long as the warehouse is running; lost on pause/resume.

Virtual Warehouse
Warehouse lifetime

"The result cache is global. The first analyst to run a slow report pays the cost. Everyone after them gets it for free — until the data changes."

Features

The architecture creates a foundation for features that would be difficult or impossible to build on traditional coupled systems. Several of these are genuinely novel; others are improvements on ideas that existed but never worked well at scale.

Time Travel

Query your data as it existed at any point in the past

Because micro-partitions are immutable and superseded partitions are retained for a configurable window, Snowflake can reconstruct the state of any table at any prior timestamp. The window is 1 day on Standard editions and up to 90 days on Enterprise. Useful for recovering from accidental deletes, auditing changes, debugging pipelines against historical states, and comparing current vs. past aggregations.

-- query the orders table as it existed on Jan 1st
SELECT * FROM orders
  AT(TIMESTAMP => '2024-01-01 00:00:00'::timestamp_tz);

-- restore accidentally deleted rows
INSERT INTO orders
  SELECT * FROM orders BEFORE(STATEMENT => '<query_id>')
  MINUS
  SELECT * FROM orders;

Zero-copy Cloning

Clone a full database or table in milliseconds, for free

Cloning creates a new object that references the same underlying micro-partitions as the source. No data is copied; no storage is consumed until the clone diverges through writes. As the clone receives new data, new partitions are written for it while the originals remain shared. Transforms the economics of dev and staging environments: clone production to staging in a single command, pay nothing until you start making changes.

-- clone the full production database to staging, instantly
CREATE DATABASE staging_db CLONE production_db;

-- or just one table
CREATE TABLE orders_dev CLONE orders;

Data Sharing

Share live data with other accounts without copying it

A Snowflake account can grant another account read-only access to a database as a live share. The consumer queries your data in real time but the data never moves — the share is a pointer to your micro-partitions. Works cross-account and cross-cloud. The Snowflake Marketplace is a directory of these shares: organisations publish datasets that anyone with a Snowflake account can subscribe to and query immediately, without ETL or file transfers.

Semi-structured Data

Store and query JSON, Avro, Parquet, and XML natively

The VARIANT column type accepts any semi-structured payload without requiring a schema at load time. Fields are accessed using dot notation; arrays are unnested using the FLATTEN table function. Snowflake automatically detects frequently accessed sub-fields and stores them in internal sub-columns with typed compression, improving both storage efficiency and query speed. Schema evolution is free — just load the new payload shape.

-- access nested fields directly
SELECT
  payload:user:email::string        AS email,
  payload:event:type::string        AS event_type,
  payload:metadata:version::integer AS version
FROM raw_events;

-- flatten a nested array
SELECT e.value:sku::string AS sku
FROM orders, LATERAL FLATTEN(input => line_items) e;

Snowpipe

Continuous, serverless ingestion triggered by file arrival

Standard ingestion requires a running warehouse and explicit execution. Snowpipe is an event-driven alternative: define a pipe with a COPY INTO statement, configure an event notification on your cloud storage bucket, and Snowpipe automatically loads new files within minutes of arrival. No warehouse needed; billing is per compute resource consumed, not per warehouse hour. The natural fit for streaming-adjacent workloads where data arrives continuously as files.

Streams, Tasks & Dynamic Tables

Build incremental pipelines entirely inside Snowflake

Streams track every change to a table — inserts, updates, deletes — as a CDC log, giving you a precise record of what changed since you last processed it. Tasks are scheduled SQL jobs, analogous to cron, that can consume a stream incrementally on a defined interval. Together they let you build ELT pipelines inside Snowflake without external orchestration. Dynamic Tables are a newer, cleaner abstraction: declare the SQL result you want, and Snowflake keeps it materialised and refreshed automatically as upstream data changes — replacing the streams + tasks pattern for most pipeline use cases.

Snowpark

Run Python, Java, or Scala directly on Snowflake compute

Snowpark lets you write transformation and ML logic in Python (or Java, Scala) and execute it on Snowflake's own compute nodes, rather than pulling data into an external environment. The Python DataFrame API is intentionally similar to pandas and PySpark, making adoption low-friction. Snowpark ML adds model training primitives, a feature store, and a model registry directly inside the warehouse — a bid to reclaim ML workloads that were previously going to Databricks or SageMaker.

Why Teams Choose Snowflake

The recurring theme across most Snowflake adoption stories is not a specific feature but a reduction in operational overhead. Teams stop managing cluster sizes, vacuuming tables, building and maintaining indexes, and tuning autoscaling policies. They write SQL; Snowflake handles the rest.

The workload isolation model matters at scale. A single copy of your data can serve analysts, data scientists, ingestion jobs, and dashboards simultaneously without any team's workload affecting another's. Before Snowflake, achieving this either required expensive hardware overprovisioning or careful scheduling of jobs around each other.

The data engineering features — Time Travel, zero-copy cloning, streams, dynamic tables — reduce the amount of external tooling a team needs. Cloning production to staging is a SQL command. Auditing a data change is a SQL query. Incremental pipelines are SQL jobs. Teams that previously required Airflow, dbt, and a custom recovery process can often collapse that stack significantly.

How Snowflake Compares

Dimension	Snowflake	Redshift	BigQuery	Databricks
Storage / compute	Fully separated from day one	Separated via RA3 nodes (2019)	Fully serverless, no sizing	Separated (DBFS + compute clusters)
Multi-cloud	AWS, Azure, GCP — cross-cloud sharing	AWS only	GCP only	AWS, Azure, GCP
Primary language	SQL-first	SQL-first	SQL-first	Python/Spark-first, SQL available
Semi-structured data	Native VARIANT type, dot notation	SUPER type (limited)	Native JSON fields	Native via Delta Lake
Data sharing	Cross-cloud live sharing + Marketplace	Data sharing within AWS	Analytics Hub	Unity Catalog shares
Python / ML	Snowpark (Python, Java, Scala)	SageMaker integration	BigQuery ML (SQL-native)	Native MLflow, notebooks, GPUs
Storage format	Proprietary (optimised columnar)	Proprietary	Proprietary (Capacitor)	Delta Lake (open)
Best for	SQL teams, data engineering, enterprise sharing	AWS-native teams, predictable workloads	GCP-native, fully serverless preference	ML/data science, open formats

The clearest battle today is between Snowflake and Databricks, and it is a convergence story. Snowflake added Python via Snowpark; Databricks added a SQL warehouse layer. Both are trying to become the single data platform for an organisation. Snowflake still wins on SQL ergonomics, operational simplicity, and data sharing. Databricks still wins on ML experimentation, streaming, and open formats — Delta Lake is Apache-licensed, meaning data is not locked to Databricks' runtime.

Where Snowflake Falls Short

Not designed for OLTP

Row-level updates and deletes are expensive — each one rewrites a micro-partition. Snowflake is an analytical system built for bulk loads and read-heavy workloads. Latency is too high for transactional use; you will not replace PostgreSQL with it.

Cold start latency

The first query after a warehouse auto-resumes adds two to five seconds of startup time. For dashboards on auto-suspending warehouses, this means the first visitor after an idle period waits noticeably longer. Something to design around, not a dealbreaker.

Proprietary storage format and vendor lock-in

Data is stored in Snowflake's own format and is not directly readable by external tools. Getting data out at scale requires re-exporting via COPY INTO, which is a real migration cost. Compared to Databricks' Delta Lake (open and readable by Spark, Trino, Athena without Databricks), this is a meaningful difference for organisations that value portability.

No true real-time streaming

Snowpipe is event-driven but not instantaneous — typical latency is one to five minutes. Kafka Snowflake connector reduces this but does not eliminate it. If your use case requires sub-second data freshness, Snowflake is not the right system. It is a near-real-time analytical store, not a stream processing engine.

Cost can surprise at scale

The credit model rewards auto-suspending idle compute but punishes careless warehouse sizing. A large warehouse left running continuously by an unattended process can consume a month's budget in a few days. Snowflake requires active governance — query tagging, resource monitors, warehouse policies — to stay predictable at scale.

Snowpark ML is still maturing

The bid to reclaim ML workloads through Snowpark is directionally right but the ecosystem is not yet at parity with Databricks. Teams doing serious model training, experimentation tracking, and deployment will still find Databricks or dedicated ML platforms more complete.

Time Travel, zero-copy cloning, cross-cloud data sharing, the global result cache — none of these started as features. They are consequences of one decision: making micro-partitions immutable and separating storage from compute. If a single architectural constraint from 2012 is still generating new capabilities today, how many of the things your team treats as hard problems are actually just constraints waiting to be questioned?