Direct Lake vs Import vs DirectQuery: How to Stop Guessing and Actually Choose

Power BI now has three meaningfully different ways to connect a semantic model to data, and the internet is full of articles that treat them as a spec sheet rather than an architectural decision. "Import is fast. DirectQuery is live. Direct Lake is new." Thanks — extremely useful for choosing which one to use on a 200-million-row sales fact table in a Microsoft Fabric lakehouse.

This is the article I wish existed when I first had to make that choice in production. It covers what each mode actually does at a storage and query execution level, the performance model that determines which one wins in which scenario, the five most common mistakes teams make, and a decision framework that actually leads to an answer.

One Thing to Understand Before Everything Else

There is one architectural fact that changes how you think about all three modes:

Import and Direct Lake both use VertiPaq for query execution. DirectQuery usually does not.

VertiPaq is Power BI's in-memory columnar engine — the thing that makes DAX fast. When your data is in VertiPaq, it's encoded, compressed, and resident in RAM. Complex aggregations over millions of rows happen in sub-second time. When your data is not in VertiPaq (DirectQuery), every report interaction potentially translates into SQL sent to your source system and waits for the result. These are fundamentally different performance profiles, and the mode you choose determines which one your users experience.

Import

VertiPaq resident

Data copied into the semantic model during refresh. VertiPaq is the primary storage and query engine. Queries hit RAM. Fastest, most predictable. Data age = last refresh time.

Direct Lake

VertiPaq on demand

Data stays in OneLake Delta/Parquet. Columns loaded into VertiPaq when first queried. Near-Import performance when warm; cold-start penalty on first query after eviction.

DirectQuery

Source engine

No local data storage. DAX queries translated to SQL at runtime. Performance depends entirely on source system speed, indexes, concurrency, and network latency.

Import Mode: The Reliable Workhorse

Import mode is fifteen years old and still the best default for most scenarios. When you refresh a Power BI Import model, it copies data from the source, compresses it with VertiPaq's dictionary and RLE encoding, and stores it in the semantic model. When a user opens a report, every DAX query hits RAM — there is no round-trip to any external system.

The result is the fastest and most predictable user experience Power BI can deliver. Report performance doesn't degrade when the source database has a noisy neighbor. It doesn't slow down at peak business hours when fifty other applications are hitting the same SQL Server. It doesn't timeout when someone puts a "Top N" slicer on a 500-million-row table. The data is local, encoded, and ready.

Where Import Falls Short

The obvious limitation is that the data is only as fresh as the last refresh. If you refresh hourly, users see data that's up to 59 minutes stale. For most business reporting, this is fine. For operational monitoring dashboards where people need the last five minutes of data, it isn't.

The less-obvious limitation is refresh cost. For a 10 GB semantic model refreshing every hour, you're running 24 full data reads per day from your source. Large models with expensive transformations can consume enough compute that the refresh pipeline itself becomes a problem to manage. This is where incremental refresh partitioning (available in Power BI Premium / Fabric capacity) becomes essential — refresh only the data that changed, not the entire dataset.

Use Import when: performance is the top priority, daily or hourly refresh is acceptable, you need advanced DAX modeling capabilities (calculated tables, calculated columns, complex relationship patterns), or you want predictable performance regardless of source system load.

Direct Lake: The Fabric-Native Path

Direct Lake is a Microsoft Fabric storage mode that reads data directly from OneLake Delta Parquet files rather than importing it. The critical distinction from DirectQuery is that it still uses VertiPaq — it just loads columns into VertiPaq on demand rather than during a scheduled refresh.

The mechanism: when a report query references a column, Direct Lake loads that column's Parquet data from OneLake into VertiPaq memory. Subsequent queries on that same column hit the in-memory VertiPaq cache, like Import mode. If memory pressure evicts a column (capacity is full, new queries need space), the next query that references it incurs a reload cost.

Cold, Warm, and Hot

Direct Lake performance lives in one of three states, and understanding which state your queries are actually in is the key to honest performance expectations:

Cold — columns not loaded Warm — columns in memory Hot — columns + query caches active

Cold means the required columns are not in VertiPaq memory. Direct Lake must read the Parquet files from OneLake, decode and transcode them into VertiPaq's format, then execute the query. This takes seconds to tens of seconds depending on column size, Parquet file layout, and whether V-Order optimization was applied when the files were written. On a Friday morning when the first user opens the weekly sales report, they hit the cold path. Everyone after them gets the warm path.

Warm means the columns are loaded. Query performance approaches Import mode — you're running VertiPaq scans against in-memory columnar data with the full benefit of SIMD vectorization, dictionary encoding, and RLE compression. For a well-designed model, "warm Direct Lake" and "Import" are largely indistinguishable to users.

Hot means columns are loaded and the VertiPaq internal result cache (VertiScan cache) has results for your specific query patterns. The second report user on the same filter context gets hot-path performance: cached results, sub-100ms response.

The key insight: Direct Lake's selling point isn't that it's faster than Import — it often isn't, especially warm Import vs warm Direct Lake. The selling point is that it eliminates the refresh pipeline. Your Gold Delta table is always current; Power BI reads it on demand. No scheduled refresh job, no incremental refresh partitioning, no "the report was delayed because the 3am refresh failed." Your Fabric pipeline updates the Delta table; Direct Lake sees the new snapshot immediately.

Direct Lake on OneLake vs Direct Lake on SQL Endpoint

There are two flavors. Direct Lake on OneLake connects the semantic model directly to OneLake Delta files — this is the clean, strategic path for Fabric-native architectures. Direct Lake on SQL Endpoint routes through the Fabric Warehouse or Lakehouse SQL endpoint, which can introduce DirectQuery fallback behavior for certain operations (SQL views, complex joins). For new Fabric architectures, target Direct Lake on OneLake and design your Gold layer as Delta tables rather than SQL views.

The Delta Table Quality Tax

Direct Lake performance is directly proportional to the quality of your Delta tables. Small files, poor partitioning, high-cardinality columns without sorting, and non-V-Order Parquet all add to the cold-load time. A Gold table with 50,000 tiny files takes much longer to load than one with 200 well-sized 256 MB files. This means the data engineering team's decisions upstream directly affect report performance downstream — a political fact that is worth surfacing early in any Fabric architecture project.

DirectQuery: Powerful Tool, Frequently Misused

DirectQuery is the mode where Power BI sends queries directly to the source system at runtime. When a user changes a slicer, Power BI translates the resulting DAX into SQL (or whatever native query language the source supports) and waits for the answer. No local data. No VertiPaq. No caching of facts.

This sounds appealing: no refresh delays, always-fresh data, no storage overhead. In practice, it's the mode that generates the most support tickets. Here's why.

The translated SQL Power BI generates is often not what a human would write. It tends to be verbose, uses subqueries, and doesn't always leverage indexes optimally. Multiply this by 50 concurrent users, each with 12 visuals on their dashboard, each visual potentially generating its own query. A busy dashboard can generate hundreds of SQL queries per minute against your source system. A database optimized for OLTP will cry.

Use DirectQuery when: data genuinely cannot be copied (regulatory, contractual, or security constraints), reports must show current state at second-level freshness, you're querying a source that's already optimized for analytical queries (Synapse Dedicated SQL Pool, Snowflake, BigQuery), or you're building a composite model where only a small live table needs DirectQuery while historical data is imported.

"DirectQuery for real-time data" is usually the wrong architecture. DirectQuery is not a streaming solution — it's a synchronous SQL poll. If you need truly real-time data (seconds), you need Power BI Streaming Datasets, Fabric Real-Time Intelligence, or a pre-aggregated table in your source that Direct Lake or Import can read with a short refresh interval. DirectQuery with a report auto-refresh every 5 seconds on a shared database is a great way to make the DBA page you at 2am.

The Architecture: Where Each Mode Lives in a Fabric Stack

flowchart LR
    subgraph Sources["Source Systems"]
        ERP["ERP / CRM / SaaS"]
        OLTP["Operational DBs"]
        ExtDW["External DW\nSnowflake / Databricks"]
    end

    subgraph Fabric["Microsoft Fabric — OneLake"]
        DF["Data Factory Pipelines\n+ Dataflows Gen2\nIngestion + orchestration"]
        Bronze["Bronze Delta Tables\nRaw ingested data"]
        Silver["Silver Delta Tables\nCleaned + conformed"]
        Gold["Gold Delta Tables\nStar schema — facts + dims"]
        WH["Fabric Warehouse\nSQL endpoint over Gold"]
    end

    subgraph SemanticLayer["Power BI Semantic Layer"]
        IMPORT["Import Semantic Model\nGold → copied into VertiPaq\nFull-speed DAX, scheduled refresh"]
        DL["Direct Lake Semantic Model\nGold Delta → columns loaded on demand\nNo refresh job needed"]
        DQ["DirectQuery Semantic Model\nLive SQL to Warehouse or ext. source\nAlways current, source-dependent perf"]
        VP["VertiPaq Engine\nIn-memory columnar — Import + Direct Lake"]
    end

    ERP & OLTP --> DF --> Bronze --> Silver --> Gold
    ExtDW -->|"OneLake shortcut"| Gold
    Gold -->|"scheduled/incr. refresh"| IMPORT
    Gold -->|"Direct Lake — no refresh"| DL
    Gold --> WH -->|"live SQL"| DQ
    ExtDW -->|"live SQL"| DQ
    OLTP -->|"composite model live tail"| DQ
    IMPORT --> VP
    DL --> VP

In a Fabric medallion architecture, all three modes can coexist. Import and Direct Lake sit over the same Gold Delta tables but differ in how data reaches VertiPaq. DirectQuery routes through the SQL endpoint or external sources, bypassing VertiPaq for fact data.

Performance Reality Check

Here's the honest comparison of what each mode actually delivers at query time, stripped of marketing language:

Factor	Import	Direct Lake (warm)	Direct Lake (cold)	DirectQuery
Simple aggregation (SUM over 10M rows)	10–50ms	10–80ms	2–15 seconds	0.5–5 seconds (source-dependent)
Complex DAX (context transitions)	Fast (VertiPaq FE)	Fast (VertiPaq FE)	Slow + load penalty	Very slow or timeout
50 concurrent users	Good — cached in RAM	Good when warm	Contention at cold load	Degrades — source concurrency
Data freshness after pipeline run	Next scheduled refresh	Immediate	Immediate	Immediate
Model size limit	RAM-bound per capacity	OneLake (virtually unlimited)	OneLake (virtually unlimited)	No local limit
DAX modeling richness	Full	Full (with some Direct Lake caveats)	Full (with some Direct Lake caveats)	Limited — many patterns inefficient

The table makes one thing clear: Import and warm Direct Lake are roughly equivalent for most DAX patterns. Cold Direct Lake is noticeably slower for the first user. DirectQuery is performance-variable and DAX-constrained in ways that bite you during report development, not just in production.

The Decision Tree

flowchart TD
    Start(["Need a Power BI model"]) --> Q1{"Where is your\ncurated data?"}

    Q1 -->|"In Fabric OneLake\nDelta tables"| Q2{"Refresh latency\ntolerance?"}
    Q1 -->|"External DB / DW\nor operational source"| Q3{"Can data be\ncopied into Power BI?"}

    Q2 -->|"Daily / hourly OK\nand I want max performance"| IMP1["Import\nBest default for performance\n+ modeling flexibility"]
    Q2 -->|"Always current after\npipeline runs"| DL["Direct Lake\nGold Delta → VertiPaq on demand\nNo refresh pipeline needed"]

    Q3 -->|"Yes — can copy"| IMP2["Import\nBest performance +\nrichest modeling"]
    Q3 -->|"No — live source or\nsource-side security required"| DQ["DirectQuery\nAccept performance tradeoff\nDesign reports carefully"]

    DL --> Q4{"Performance OK?"}
    Q4 -->|"Cold queries slow"| OPT1["Optimize Delta tables upstream:\nV-Order, file consolidation,\ncolumn sort order"]
    Q4 -->|"DirectQuery fallback"| OPT2["Switch to Direct Lake on OneLake\n(not SQL Endpoint)\nRemove SQL views from model"]
    Q4 -->|"Model too complex"| OPT3["Simplify Gold schema:\nstar schema, reduce high-cardinality\ncalculated columns"]

    IMP1 & IMP2 & DL --> VP(["VertiPaq — fast DAX"])
    DQ --> SQ(["Source query engine\n— variable performance"])

Start with where your data lives and how fresh it needs to be. Most teams land on Import or Direct Lake; DirectQuery is the exception that requires a genuine live-data or data-residency requirement to justify its trade-offs.

Composite Models: The Real-World Pattern

Real enterprise semantic models rarely fit neatly into one mode. The practical pattern is composite models: a single semantic model where different tables use different storage modes.

The most common composite pattern in Fabric:

Direct Lake for large fact tables — sales transactions, events, logs. Too large to import efficiently; benefits most from Direct Lake's "no refresh" freshness.
Import for small helper tables — date dimension, product hierarchy, static mappings, disconnected slicer tables. Small enough that import is trivial; rich modeling features (calculated columns, M transformations) are available for these tables.
DirectQuery for a live tail — in some architectures, the most recent hour of data lives in a transactional source while historical data is in Gold. DirectQuery on the live table, Import or Direct Lake on the Gold table, with a relationship or Union in DAX connecting them. This is the "hybrid historical + live" pattern that legacy Lambda architectures used to solve with two separate pipelines.

The composite model approach is pragmatic and explicitly supported by Fabric's architecture. Don't feel like you're cheating by mixing modes — you're using the right tool for each part of the data.

Five Mistakes That Will Ruin Your Friday Afternoon

Mistake 1: Thinking Direct Lake Is Always Faster Than Import

It isn't. A well-designed Import model where all columns are already in VertiPaq is faster than Direct Lake in cold state, and roughly equivalent in warm state. Direct Lake's advantage is eliminating the refresh pipeline, not beating Import on raw query speed. If your model has a two-hour refresh window that's causing operational pain, Direct Lake is your fix. If your model already refreshes in 15 minutes and performance is fine, don't migrate for the marketing story.

Mistake 2: Using DirectQuery Because It Sounds Like Real-Time Analytics

DirectQuery is not real-time analytics. It's synchronous SQL polling. If a user changes a slicer and Power BI fires a SQL query against a Postgres database that's also handling 500 concurrent application transactions, somebody's experience is going to suffer. Use DirectQuery when the source is actually built for it (Synapse, Snowflake, BigQuery), not when you want to avoid building a proper pipeline.

Mistake 3: Connecting Power BI Directly to Bronze or Silver Tables

This is the "I'll skip the Gold layer, it's just overhead" mistake. It isn't. Bronze tables have raw data, naming inconsistencies, no star schema, and columns that VertiPaq will encode inefficiently. Silver tables are better but still normalized — normalized data means complex DAX relationships that push query logic into the Formula Engine rather than the Storage Engine. The Gold star schema exists specifically because it's what VertiPaq processes efficiently. Skip it and your report performance will be consistently mediocre, and you'll spend six months blaming the wrong thing.

Mistake 4: Ignoring Delta Table Physical Quality for Direct Lake

If your Gold Delta table has 100,000 files of 1 MB each (classic streaming write without compaction), Direct Lake cold-load will be slow regardless of any other optimization. The Parquet file reader has to open each file, read the footer, check the row group statistics. With 100,000 files, this metadata overhead dominates even before reading a single value. Compact aggressively — aim for 128 MB to 512 MB files. Apply V-Order on write (Fabric-native tools do this automatically; external writers don't). Sort dimension columns for maximum RLE compression.

Mistake 5: Building Complex Transformations in DAX Instead of Upstream

DAX is a query language, not a transformation engine. Calculated columns that join multiple tables, complex SWITCH statements implementing business rules, ADDCOLUMNS that materialize derived fields from large tables — all of these compute at refresh time and store the result in VertiPaq memory, or worse, re-compute at query time. Heavy business logic belongs in Spark notebooks, dbt models, or Dataflows Gen2, where it runs at scale on distributed compute and the result lands in a clean Gold column. If your DAX has more than 10 lines per measure and involves multiple table joins, something has gone wrong upstream.

The Recommended Pattern for Fabric

For most Microsoft Fabric enterprise analytics platforms, this is the architecture that makes the fewest regrettable decisions:

Source Systems
    ↓ (Fabric Data Factory / CDC Mirroring)
Bronze Delta Tables  (raw, append-only)
    ↓ (Spark notebooks or Dataflows Gen2)
Silver Delta Tables  (cleaned, conformed, optionally Data Vault 2.0)
    ↓ (dbt or Spark)
Gold Delta Tables    (star schema — facts + dimensions, V-Order, compacted)
    ↓
Direct Lake Semantic Model  (over Gold, no refresh pipeline)
    + Import for small helper / lookup tables
    + DirectQuery only where live source access is genuinely required
    ↓
Power BI Reports / Dashboards

This pattern gives you: scalable data engineering in Delta Lake, no Power BI refresh pipeline to maintain, VertiPaq performance for the query layer, and the flexibility to add Import or DirectQuery tables where the standard pattern genuinely doesn't fit — without compromising the rest of the model.

The one-sentence rule: Import by default. Direct Lake when your data lives in Fabric OneLake and refresh avoidance matters. DirectQuery only when the data genuinely must be queried live at the source. When in doubt, start with Import — it's the most predictable, easiest to debug, and hardest to make accidentally slow.

Choosing in Practice

The question "which mode should I use?" almost always resolves to one of three actual situations:

"I have data in Fabric OneLake and want to build a Power BI report on it." → Direct Lake. No import needed, no refresh pipeline, just build the semantic model and point it at your Gold Delta tables.
"I have data in Power BI or a traditional SQL source and this is a standard BI project." → Import. Standard tooling, best modeling flexibility, predictable performance.
"I have data that legally or contractually cannot leave the source system, or I need sub-minute freshness on a source-optimized warehouse." → DirectQuery. Accept the trade-offs and design reports accordingly: fewer visuals per page, reduce high-cardinality slicers, validate every measure in DAX Studio against the generated SQL.

Most of the time, you already know the answer before you finish reading the question. The challenge is resisting the urge to pick a fancier-sounding option when the boring one is clearly correct. "Import" has served perfectly well for fifteen years. Direct Lake is genuinely useful in the Fabric context. DirectQuery is a specialized tool that requires specialized discipline. Know which situation you're in, and the rest is implementation detail.