Snowflake Cortex AI 2026: Agents, AISQL, Document AI & Use Cases

❄️ This is Part 2 of a 3-part series: Snowflake Deep Dive (2026)

Snowflake Internals: How the Three-Layer Architecture Actually Works
Snowflake Cortex AI in 2026: Agents, Analyst, and the Agentic Data Cloud (you are here)
Real-Time Snowflake on AWS: Snowpipe Streaming, Dynamic Tables, and Lessons Learned

When I covered Cortex AI in late 2024, the pitch was "LLM functions inside your data warehouse" — run SNOWFLAKE.CORTEX.COMPLETE() over a table, get managed RAG with Cortex Search, ask questions in English with a preview of Cortex Analyst. Useful, but mostly a set of building blocks you assembled yourself.

Eighteen months later the framing has changed. Cortex Analyst and Cortex Search went GA, the LLM functions consolidated under AISQL, and — the headline shift — Snowflake shipped Cortex Agents: a fully-managed agentic runtime that plans, calls tools, executes code, and reasons over your structured and unstructured data in one governed loop, without you operating an orchestration framework. The marketing calls the whole thing the "Agentic Data Cloud." Stripped of the slogan, the real story is that the building blocks from 2024 are now tools an agent orchestrates for you. This article maps the 2026 surface, shows how the agent loop actually works, and gives an honest read on where it fits.

If you read the 2024 piece: LLM functions, Cortex Search, Cortex Analyst, and Document AI all still exist and work as described — this isn't a rewrite of those. What's new is the layer above them (Cortex Agents), the GA status of the key pieces, and the consolidation of the function surface into AISQL. Skip to "Cortex Agents" if you only want the 2026 delta.

The 2026 Product Surface

graph TD
    subgraph Platform["Snowflake Agentic Data Cloud (2026)"]
        AGENTS["🤖 Cortex Agents (GA)\nmanaged plan→tool→reflect loop"]
        subgraph Tools["Tools an agent can call"]
            ANALYST["Cortex Analyst (GA)\nNL → SQL over semantic views"]
            SEARCH["Cortex Search (GA)\nhybrid retrieval over unstructured"]
            AISQL["AISQL functions\nCOMPLETE, AI_CLASSIFY, AI_FILTER…"]
            CUSTOM["Custom tools\nstored procs / UDFs"]
            MCP["MCP connectors\n+ web search + code exec"]
        end
        SEMMODEL["Semantic Models / Views\n(the shared business layer)"]
        HORIZON["Horizon Catalog\ngovernance, lineage, RBAC"]
    end

    AGENTS --> ANALYST
    AGENTS --> SEARCH
    AGENTS --> AISQL
    AGENTS --> CUSTOM
    AGENTS --> MCP
    ANALYST --> SEMMODEL
    HORIZON -.governs every call.-> AGENTS

Cortex AI in 2026. The agent is the new top layer; the 2024 primitives (functions, Search, Analyst) are now its tools. The semantic model is the shared contract that makes natural-language-to-SQL reliable, and Horizon governance wraps every call — the same "data never leaves the perimeter" story, now extended to autonomous agents.

AISQL: The Function Layer, Consolidated

The scalar LLM functions are still the foundation — every higher-level feature ultimately calls them — but the 2026 surface is broader and more SQL-idiomatic. Alongside the familiar COMPLETE, SUMMARIZE, SENTIMENT, and TRANSLATE, Snowflake added a family of "AI operators" designed to be composed inside ordinary queries:

-- Classic generation, now with a much larger model menu
SELECT review_id,
       SNOWFLAKE.CORTEX.COMPLETE('claude-sonnet-4-5',
           'Extract the product complaint in one sentence: ' || review_text) AS issue
FROM reviews;

-- AI_FILTER: a boolean LLM predicate you can put in a WHERE clause
SELECT * FROM support_tickets
WHERE AI_FILTER(ticket_text, 'this message describes a billing dispute');

-- AI_CLASSIFY: classification into a fixed set of labels as a first-class function
SELECT ticket_id,
       AI_CLASSIFY(ticket_text, ['billing','technical','account','other']) AS category
FROM support_tickets;

-- AI_AGG: aggregate reasoning across many rows in a group
SELECT product_line,
       AI_AGG(review_text, 'summarize the top 3 recurring complaints') AS themes
FROM reviews
GROUP BY product_line;

The meaningful change since 2024 is the model menu. Cortex now serves frontier models from multiple providers directly inside the governed perimeter, including Anthropic's Claude family and OpenAI models, alongside open-weights Llama and Mistral models and Snowflake's own. The 2024 limitation ("no GPT-4 or Claude, open-weights only") is gone. You can run a frontier model with the same data-never-leaves-Snowflake guarantee, which removes the single biggest reason teams used to route around Cortex to an external API.

Cortex Analyst and Cortex Search: Now GA, Now Tools

Cortex Analyst (GA) is still the natural-language-to-SQL feature, and the semantic model is still the thing that makes it trustworthy: a YAML definition (now exposed as a first-class semantic view object) that maps business terms to verified SQL, covering measures, dimensions, and the filters that belong on a metric. The reliability principle hasn't changed: it generates SQL against defined semantics, not by guessing column names off a raw schema.

Cortex Search (GA) is still managed hybrid retrieval (vector + keyword) over a Snowflake table, kept in sync with the source. What's new is that an agent can now adjust its parameters dynamically — filters, which metadata columns to return, result count, time-decay — instead of you hard-coding them.
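To make "adjusting parameters dynamically" concrete, here is a rough sketch of the kind of query body an agent (or your own code) might assemble for a Cortex Search service. The `@and` / `@eq` / `@gte` filter operators follow the Cortex Search filter syntax as I understand it; the column names (`region`, `created_at`, `ticket_id`, `ticket_text`), the date, and the helper function itself are hypothetical and only there for illustration.

```python
import json


def build_search_request(query, region, since_date, columns, limit=5):
    """Build a Cortex Search-style query body with a metadata filter.

    `region` and `created_at` are hypothetical attribute columns. A real
    service can only filter on columns declared as attributes when the
    search service was created.
    """
    return {
        "query": query,
        "columns": list(columns),  # which columns to return with each hit
        "filter": {
            "@and": [
                {"@eq": {"region": region}},
                {"@gte": {"created_at": since_date}},
            ]
        },
        "limit": limit,  # result count
    }


body = build_search_request(
    "customers complaining about renewal pricing",
    region="EMEA",
    since_date="2026-01-01",  # placeholder date
    columns=["ticket_id", "ticket_text", "created_at"],
    limit=10,
)
print(json.dumps(body, indent=2))
```

The point is only that filters, returned columns, and result count are plain data, so an agent can vary them per question instead of relying on values you fixed at design time.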

The important reframing: in 2024 you called these directly and wrote your own glue. In 2026 they are the two primary tools a Cortex Agent reaches for — Analyst for "the answer is in a number in a table," Search for "the answer is in prose in a document." The agent decides which (or both) a question needs.

Cortex Agents: The Managed Loop

This is the genuinely new thing. A Cortex Agent is a managed runtime that executes the standard agentic cycle — plan → use tools → reflect → respond — without you building or operating the orchestration loop, the runtime, or a code sandbox. Snowflake runs all of that.

sequenceDiagram
    participant U as User / App
    participant A as Cortex Agent (managed)
    participant AN as Cortex Analyst
    participant SE as Cortex Search
    participant DB as Governed Data
    U->>A: "Why did EMEA churn rise last quarter, and what are customers saying?"
    A->>A: Plan — split into (1) churn metric (2) qualitative reasons
    A->>AN: Tool call: NL → SQL over semantic view
    AN->>DB: SELECT churn metrics (RBAC enforced)
    DB-->>AN: numbers
    AN-->>A: structured result
    A->>SE: Tool call: search support tickets / NPS verbatims (EMEA, last quarter)
    SE->>DB: hybrid retrieval (RBAC enforced)
    DB-->>SE: relevant passages + citations
    SE-->>A: passages
    A->>A: Reflect — enough to answer? combine quant + qual
    A->>U: Grounded answer with SQL + cited sources

A single agent turn combining structured and unstructured data. The agent decided, on its own, that the question needed both Analyst (the churn number) and Search (the "why"), then reconciled them. Every data access still runs under the asking user's role — governance is not bypassed by the agent.
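If the plan → tool → reflect cycle feels abstract, a toy version in Python may help. This is emphatically not how Snowflake implements Cortex Agents: the keyword-based `plan` function stands in for an LLM planning step, the two tools are stubs that return placeholder strings, and every name here is hypothetical. It only shows the shape of the loop, including a step cap.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    question: str
    evidence: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


def plan(state: AgentState, tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Return tools still needed. A real agent would ask an LLM here."""
    q = state.question.lower()
    wanted = []
    if "churn" in q:
        wanted.append("analyst")  # the answer is a number in a table
    if "saying" in q:
        wanted.append("search")   # the answer is prose in a document
    return [t for t in wanted if t in tools and t not in state.evidence]


def run_agent(question: str,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 4) -> Tuple[str, List[str]]:
    state = AgentState(question)
    for _ in range(max_steps):
        pending = plan(state, tools)
        if not pending:  # reflect: nothing left to gather, so stop
            break
        name = pending[0]
        state.trace.append(name)
        state.evidence[name] = tools[name](question)  # tool call
    answer = " | ".join(f"{k}: {v}" for k, v in state.evidence.items())
    return answer, state.trace


# Stub tools returning placeholder text, not real results.
tools = {
    "analyst": lambda q: "<rows from generated SQL>",
    "search": lambda q: "<retrieved passages with citations>",
}

answer, trace = run_agent(
    "Why did EMEA churn rise last quarter, and what are customers saying?",
    tools,
)
print(trace)
print(answer)
```

The `max_steps` cap matters more than it looks: it is the toy equivalent of bounding how much work (and spend) a single question can trigger.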

What an agent can call

Beyond Analyst and Search, a Cortex Agent's toolbelt includes custom tools (your own stored procedures and UDFs implementing business logic), code execution in a secure sandbox (Python, when enabled), MCP connectors to remote Model Context Protocol servers (Atlassian, Salesforce, and custom apps), web search, and data-to-chart visualization. Tools are described to the agent; it chooses which to invoke per step.

How you actually invoke one

Agents are GA in Snowsight, via SQL, and through a REST API. The API supports two patterns: define a reusable agent object once, or pass the full configuration per request. Authentication is via PAT, key-pair JWT, or OAuth.

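-- Define a reusable agent object backed by a semantic view + a search service.
-- Sketch only: verify the specification schema against the CREATE AGENT
-- reference for your release before relying on it.
CREATE OR REPLACE AGENT support_insights_agent
  COMMENT = 'Answers questions over support metrics + ticket text'
  PROFILE = '{"display_name": "Support Insights"}'
  FROM SPECIFICATION $$
  models:
    orchestration: claude-sonnet-4-5
  instructions:
    response: "Always cite the ticket IDs you used. Show the SQL for any number."
  tools:
    # Each tool is declared by type and name, with a description the agent reads
    - tool_spec:
        type: cortex_analyst_text_to_sql
        name: metrics
        description: "Support metrics via the semantic view"
    - tool_spec:
        type: cortex_search
        name: tickets
        description: "Free-text search over support tickets"
  tool_resources:
    # Each tool is bound to its backing object here, keyed by tool name
    metrics:
      semantic_view: analytics.prod.support_semantics
    tickets:
      name: analytics.prod.ticket_search
  $$;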

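# Call the agent from an app over the REST API
import os
import requests

account = os.environ["SNOWFLAKE_ACCOUNT"]  # hypothetical env var: account identifier
token = os.environ["SNOWFLAKE_TOKEN"]      # hypothetical env var: PAT, key-pair JWT, or OAuth token

# Assumption: the agent object was created in analytics.prod. Agent objects are
# addressed by database/schema/name; confirm the route in the Agents REST reference.
url = (f"https://{account}.snowflakecomputing.com"
       "/api/v2/databases/analytics/schemas/prod/agents/support_insights_agent:run")

resp = requests.post(
    url,
    # Some token types also need an X-Snowflake-Authorization-Token-Type header.
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "messages": [{"role": "user", "content": [{"type": "text",
            "text": "Why did EMEA churn rise last quarter, and what are customers saying?"}]}],
    },
    stream=True,   # the response is streamed, so read it incrementally
    timeout=300,
)
resp.raise_for_status()
# Response streams the agent's tool calls, generated SQL, retrieved citations,
# and the final grounded answer. The orchestration loop runs server-side.
for line in resp.iter_lines(decode_unicode=True):
    if line:
        print(line)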
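Because the agent response arrives as a stream, you usually want to turn it into discrete events before doing anything with it. Below is a minimal server-sent-events parser, assuming the stream uses the standard `event:` / `data:` line format with JSON payloads. The event names and payload fields in the sample (`tool_use`, `text`, `tool`) are made up for the demo and are not the actual Cortex Agents event schema.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group SSE lines into (event_name, json_payload) pairs.

    A blank line terminates an event; multiple data: lines are joined
    with newlines, as the SSE format specifies.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
        # Lines starting with ":" are comments/keep-alives and are ignored.
    if data:  # flush a final event that was not followed by a blank line
        yield event, json.loads("\n".join(data))


# Hypothetical sample stream, for illustration only.
sample = [
    "event: tool_use",
    'data: {"tool": "metrics"}',
    "",
    ": keep-alive",
    "event: text",
    'data: {"text": "..."}',
    "",
]

events = list(parse_sse(sample))
for name, payload in events:
    print(name, payload)
```

With `requests`, the same function can be fed from a response opened with `stream=True`, for example `parse_sse(resp.iter_lines(decode_unicode=True))`, as long as blank lines are passed through rather than filtered out.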

The Rest of the 2026 Platform

A few more pieces round out the surface and frequently come up in real designs:

Cortex Code — an AI coding/data-engineering assistant in Snowsight (GA) and a CLI, including agent teams that split large assignments into parallel work. It is Snowflake's "build on your data with an agent" developer surface.
Snowflake Cortex / CoWork — the knowledge-worker-facing chat surface that sits on top of agents, for non-engineers to ask governed questions.
Openflow — managed data integration (built on Apache NiFi) for getting structured and unstructured sources into Snowflake so agents have something to reason over.
Document AI — still the no-code path for extracting structured fields from PDFs and images at scale, now commonly used as a preprocessing step feeding agent workflows.
Cortex Knowledge Extensions — packaged third-party content (e.g. licensed reference data) consumable by Search/agents via the Marketplace.

Use Cases in Practice

The product surface is abstract until you map it to work people actually do. Here are the patterns that show up most often in production, organized by which Cortex capability does the heavy lifting.

AISQL: AI as a column in your pipeline

The highest-ROI Cortex pattern is also the least flashy: treat an LLM as a SQL function and run it over data you already have, in batch, inside your existing transformations. Concrete uses that pay for themselves quickly:

Support-ticket triage at SQL speed. AI_CLASSIFY routes tickets to queues, SENTIMENT flags at-risk customers, and AI_FILTER in a WHERE clause isolates (say) billing disputes, turning a manual reading task into a query. The win teams describe is "recoverable capacity": the model handles routine volume so humans focus on the ambiguous, high-risk cases.
Entity and field extraction from free text — pulling structured fields out of contracts, product reviews, or clinical notes with EXTRACT_ANSWER (which returns a confidence score you can threshold on) or a structured COMPLETE prompt.
Multimodal enrichment — transcribing call recordings, summarizing documents, and translating multilingual content inline in a transformation so downstream dashboards reflect a complete global view rather than just the English subset.
Aggregate reasoning — AI_AGG summarizes the top recurring themes across all reviews in a product line in one grouped query, replacing a manual sampling exercise.
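The "confidence score you can threshold on" idea from the extraction pattern is easy to sketch once the results are out of SQL. The snippet below assumes each row carries a list of `{"answer", "score"}` candidates, which is the shape EXTRACT_ANSWER returns as far as I know; the row layout, the `extracted` key, the 0.5 threshold, and the sample values are all assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple


def split_by_confidence(rows: List[Dict[str, Any]],
                        threshold: float = 0.5) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Keep the best candidate per row if it clears the threshold.

    Rows with no candidates, or with a best score below the threshold,
    are returned separately so they can go to human review.
    """
    accepted = []
    needs_review = []
    for row in rows:
        candidates = row.get("extracted") or []
        best = max(candidates, key=lambda c: c["score"], default=None)
        if best is not None and best["score"] >= threshold:
            accepted.append({"id": row["id"], "answer": best["answer"]})
        else:
            needs_review.append(row["id"])
    return accepted, needs_review


# Made-up sample rows, not real extraction output.
rows = [
    {"id": 1, "extracted": [{"answer": "2026-03-31", "score": 0.92}]},
    {"id": 2, "extracted": [{"answer": "net 30", "score": 0.18}]},
    {"id": 3, "extracted": []},
]

accepted, needs_review = split_by_confidence(rows, threshold=0.5)
print(accepted)
print(needs_review)
```

The threshold itself is something to tune against a reviewed sample rather than pick by feel.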

Cortex Agents: questions that span numbers and prose

Agents earn their keep on questions a single tool can't answer. The canonical pattern is "what happened, and why" — a metric plus its explanation:

Governed self-serve analytics: "Why did EMEA churn rise last quarter, and what are customers saying?" The agent calls Cortex Analyst for the churn number (via a semantic view) and Cortex Search over support tickets/NPS verbatims for the qualitative reasons, then reconciles both — under the asking user's permissions.
Operational copilots that combine a custom tool (a stored procedure implementing business logic) with retrieval — e.g. an agent that looks up an account's current status via a UDF and pulls the relevant policy text via Search.
Cross-system workflows via MCP — agents calling out to Atlassian, Salesforce, or custom MCP servers to fetch context that lives outside Snowflake, while keeping the data-grounded reasoning inside the governed perimeter.

Document AI: unstructured documents into the warehouse

Document AI is the workhorse for turning PDFs and images into rows. It is most valuable as a preprocessing step that feeds the other capabilities: extract invoice fields, contract terms, or form data, land them as structured columns, then let AISQL classify them or an agent reason over them. Typical homes are finance (invoice and receipt extraction), legal (contract clause extraction), insurance (claims forms), and healthcare (intake documents) — anywhere a custom OCR/parsing pipeline used to live.

Cortex Knowledge Extensions: licensed knowledge, governed

Cortex Knowledge Extensions (CKEs) are Cortex Search services packaged and shared — typically via the Snowflake Marketplace or private listings — so you can ground an agent or RAG app in third-party or licensed content (market research, reference texts, industry corpora, regulatory libraries) without building and maintaining that ingestion yourself. Two patterns dominate:

Consuming a CKE to augment your own agent with authoritative external knowledge — the licensed content stays governed and is retrievable exactly like your internal Search services.
Publishing a CKE — if you own valuable content, you can package it as a CKE and monetize it through Marketplace (subscriptions or trials), turning a proprietary corpus into a product other Snowflake customers ground their AI on.

Cost: Still Tokens, Still Credits, Still the Same Trap

The billing model is unchanged in shape: AISQL functions are billed per token, converted to Snowflake credits, and agent runs bill for every underlying tool call and token they consume. That last part is the new wrinkle — an agent that fans out to Analyst, Search, web search, and a couple of reflection steps consumes more than a single function call, and a multi-step agent over a large dataset can surprise you the same way SUMMARIZE over 10M rows did in 2024.

What you run	What it bills	Cost-control lever
AISQL function over a table	Tokens × model rate → credits	Test on `LIMIT 1000`, pick the smallest model that passes eval
Cortex Search service	Embedding + serving + sync compute	Tune `TARGET_LAG`; don't over-refresh static corpora
Cortex Analyst query	Orchestration model tokens	Tight semantic views reduce retries/clarifications
Cortex Agent run	Sum of every tool call + reasoning tokens	Constrain the toolset; cap steps; cheaper orchestration model

Agents multiply the 2024 batch trap. A single agent question is cheap. The same agent wired into an app and called on every page load, fanning out to four tools each time, is not. Before exposing an agent in production, measure the credit cost of a representative turn, multiply by expected traffic, and put a budget/resource monitor on the warehouse and on Cortex consumption. Use the cheapest orchestration model that holds answer quality on your eval set — the frontier model is rarely needed for the planning step.
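The "measure a turn, multiply by traffic" arithmetic is simple enough to keep in a notebook. Here is a back-of-envelope sketch: the credit rates, model labels, token counts, and traffic figure are invented placeholders, so substitute the rates from Snowflake's current consumption table and the token counts you actually observe.

```python
from typing import Dict, List, Tuple


def credits_per_turn(calls: List[Tuple[str, int]],
                     credits_per_million_tokens: Dict[str, float]) -> float:
    """Sum tokens x model rate over every call made in one agent turn."""
    return sum(
        tokens / 1_000_000 * credits_per_million_tokens[model]
        for model, tokens in calls
    )


def monthly_credits(per_turn: float, turns_per_day: int, days: int = 30) -> float:
    """Scale the cost of one representative turn to expected traffic."""
    return per_turn * turns_per_day * days


# Placeholder rates (credits per 1M tokens). NOT real Snowflake pricing.
rates = {"orchestrator": 2.0, "small": 0.5}

# One representative turn: planning/reflection plus two tool calls.
turn = [("orchestrator", 6_000), ("small", 3_000), ("small", 1_000)]

per_turn = credits_per_turn(turn, rates)
per_month = monthly_credits(per_turn, turns_per_day=5_000)

print(f"credits per turn:  {per_turn:.3f}")
print(f"credits per month: {per_month:.0f}")
```

A fraction of a credit per turn looks harmless until the daily call count is attached to it, which is exactly the page-load scenario described above.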

Best Practices

Patterns that consistently separate cost-effective, reliable Cortex deployments from expensive, flaky ones:

Filter before you call the model. Every row you eliminate with a plain SQL WHERE before an AISQL function is a row you don't pay tokens for. Push cheap predicates first; reserve the LLM for the rows that actually need it.
Right-size the model per task. Don't default to the most capable (and expensive) model. A small/cheap model handles most classification, sentiment, and extraction at a fraction of the cost; reserve frontier models for genuinely hard generation or planning. The orchestration model for an agent rarely needs to be the biggest one.
Count tokens on a sample first. Run COUNT_TOKENS over a representative sample before processing a large table — token counts are almost always higher than intuition suggests, and this is where batch jobs blow their budget.
Prefer batch over real-time for AISQL. The functions are built for set-based, batch processing; per-row interactive calls are where latency and cost get ugly.
Invest in the semantic view. Cortex Analyst (and therefore agents) is only as reliable as the semantic model behind it. Tight, well-described measures and dimensions reduce clarifying-question round-trips and wrong SQL — both of which cost tokens and trust.
Constrain the agent toolbelt. Give an agent only the tools a task needs and cap its steps. A wide-open toolset increases both latency and the chance of a wrong tool choice.
Use AI observability. Cortex's AI observability (GA mid-2025) lets you trace agent runs and evaluate quality — wire it in before you expose an agent to users, not after the first incident.
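Two of these practices, filtering first and counting tokens on a sample, combine into one quick estimate. The sketch below assumes you have already collected per-row token counts for a sample (for instance from COUNT_TOKENS) and know roughly how many rows survive your plain SQL filter; the sample counts, the prompt overhead, and the row counts are hypothetical.

```python
import statistics
from typing import Sequence


def estimate_batch_tokens(sample_counts: Sequence[int],
                          rows_to_process: int,
                          prompt_overhead: int = 0) -> int:
    """Extrapolate total input tokens from a sample.

    prompt_overhead is the fixed instruction text added to every row.
    """
    if not sample_counts:
        raise ValueError("need at least one sampled token count")
    mean_tokens = statistics.fmean(sample_counts) + prompt_overhead
    return round(mean_tokens * rows_to_process)


# Hypothetical token counts from a small sample of rows.
sample = [120, 80, 100, 300]

unfiltered = estimate_batch_tokens(sample, rows_to_process=10_000_000, prompt_overhead=20)
filtered = estimate_batch_tokens(sample, rows_to_process=2_000_000, prompt_overhead=20)

print(f"without a WHERE filter: {unfiltered:,} tokens")
print(f"after a cheap filter:   {filtered:,} tokens")
```

A mean hides long-tail rows, so on skewed text it is worth looking at the largest sampled counts as well before committing to a budget.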

Lessons Learned

Agents multiply the batch cost trap. A single agent question is cheap; the same agent on every page load, fanning out to four tools each time, is not. Measure the credit cost of a representative turn and multiply by traffic before shipping, and put resource monitors on both the warehouse and Cortex consumption.
The data-stays-put guarantee is the real differentiator. With frontier models now served inside Cortex, the strongest reason to choose it is no longer model quality — it's zero data egress and unified governance. Your proprietary data isn't used to train the underlying models, and existing RBAC applies to every tool call. For nervous compliance teams, this eliminates an entire conversation.
Bad retrieval defeats a good model every time. Most "the agent gave a bad answer" incidents trace to retrieval (a weak Search index or a sloppy semantic view), not the LLM. Fix the grounding layer first; swapping models rarely helps.
Document AI quality needs a human-reviewed sample. Don't trust extraction at scale until you've reviewed a sample and measured field-level accuracy. Threshold on the confidence scores and route low-confidence extractions to human review.
Treat CKEs like dependencies. A consumed Knowledge Extension is third-party content with its own update cadence and licensing — version and monitor it like any other external dependency, not as a static asset.
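For the Document AI lesson, "measure field-level accuracy" can be as plain as comparing extracted values against a human-reviewed sample, field by field. This is one simple way to do it, using exact-match comparison; the document IDs, field names, and values are invented, and real pipelines often need normalization (dates, currency formats) before comparing.

```python
from collections import defaultdict
from typing import Dict


def field_accuracy(extracted: Dict[str, Dict[str, str]],
                   reviewed: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy per field over the reviewed documents.

    Both arguments map doc_id -> {field_name: value}. A document or field
    missing from `extracted` counts as a miss.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for doc_id, truth in reviewed.items():
        got = extracted.get(doc_id, {})
        for field_name, expected in truth.items():
            totals[field_name] += 1
            if got.get(field_name) == expected:
                hits[field_name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Invented sample: two reviewed invoices.
reviewed = {
    "doc-1": {"invoice_no": "A-100", "total": "1200.00"},
    "doc-2": {"invoice_no": "A-101", "total": "85.50"},
}
extracted = {
    "doc-1": {"invoice_no": "A-100", "total": "1200.00"},
    "doc-2": {"invoice_no": "A-101", "total": "8550"},
}

accuracy = field_accuracy(extracted, reviewed)
print(accuracy)
```

Per-field numbers are more useful than one overall score, because a single weak field (totals, in this made-up sample) is usually what decides where human review is needed.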

When Cortex Wins in 2026 — and When It Still Doesn't

The honest assessment has shifted because the model gap closed. In 2024, the strongest reason to leave Cortex was "I need GPT-4/Claude quality." With frontier models now served inside Cortex, the decision is mostly about where your data and governance live.

Scenario	Verdict	Why
Q&A and agents over governed Snowflake data	Cortex wins	Data never leaves the perimeter; RBAC + Horizon apply to every tool call; no glue infra
Batch enrichment / classification over big tables	Cortex wins	SQL-native, scales on warehouses, no rate limits
Blending structured metrics + unstructured text in one answer	Cortex wins	Agents + Analyst + Search is purpose-built for exactly this
Low-latency consumer chatbot at the edge	External / app-side	Round-trip and cold-start latency; agent loop adds steps
Deeply custom agent graphs, human-in-the-loop tooling	External framework	LangGraph / custom orchestration still more flexible than managed loop
Data lives outside Snowflake	External	The whole value prop is governance over data already in Snowflake

The clearest 2026 use case: governed self-serve analytics, where a business user asks a question that spans a fact table and a pile of support tickets, and an agent answers it with cited SQL and cited text under the asker's own permissions. The clearest cases to stay external: latency-critical user-facing apps, and complex agent topologies that need the control of a full orchestration framework. Most Snowflake-heavy shops run both, exactly as in 2024, but the line moved, and a lot more now lands on the Cortex side of it.

In Part 1 we saw the engine these agents run on — separated storage and compute, governed by the cloud services layer. In Part 3, we close the loop on the data side: how to feed Snowflake continuously so these agents and dashboards reason over fresh data, using the next-generation Snowpipe Streaming and Dynamic Tables on AWS.

Related: the 2024 Cortex AI deep dive — the building-blocks view that this article builds on.