Snowflake Cortex AI Deep Dive: LLMs Inside Your Data Warehouse

Snowflake's pitch with Cortex AI is compelling: what if you didn't need a separate LLM infrastructure layer at all? What if you could run SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3.1-70b', 'Summarize this contract: ' || contract_text) FROM contracts and get AI-powered results on your existing data, billed to your existing Snowflake contract, governed by your existing Snowflake access controls? No API keys. No separate vector database. No orchestration infrastructure.

For organizations that are already Snowflake-native, Cortex AI deserves serious evaluation. For organizations that aren't, the question is whether the workflow simplicity justifies the trade-offs in model selection and flexibility. This article covers the full Cortex AI product surface (LLM functions, Cortex Search for hybrid RAG, Cortex Analyst for text-to-SQL, Document AI, and Arctic), with honest assessments of where it works, where it doesn't, and how costs actually add up.

The Cortex AI Product Surface

graph TD
    subgraph CortexAI["Snowflake Cortex AI"]
        LLMFunctions["LLM Functions\nCOMPLETE, SUMMARIZE\nTRANSLATE, SENTIMENT\nEXTRACT_ANSWER"]
        CortexSearch["Cortex Search\nHybrid vector+BM25\nManaged embedding + index\nRAG without infra"]
        CortexAnalyst["Cortex Analyst\nText-to-SQL on semantic model\nNatural language → verified SQL"]
        DocAI["Document AI\nPDF/image extraction\nClassification + field extraction\nno-code document processing"]
        FinetuningAPI["Cortex Fine-Tuning\nParameter-efficient fine-tuning\non your own Snowflake data"]
    end

    subgraph Models["Available Models (Dec 2024)"]
        Arctic["Snowflake Arctic\n(480B MoE, open weights)"]
        LLaMA["Meta Llama 3.1\n8B / 70B / 405B"]
        Mistral["Mistral / Mixtral\n7B / 8x7B / 8x22B"]
        Reka["Reka Flash\n(vision + text)"]
        GoogleM["Google gemma-7b"]
    end

    LLMFunctions --> Arctic
    LLMFunctions --> LLaMA
    LLMFunctions --> Mistral
    CortexSearch --> LLMFunctions
    CortexAnalyst --> LLMFunctions

Cortex AI product surface as of December 2024. LLM Functions are the foundation: in the diagram, Cortex Search and Cortex Analyst both build on them. The key constraint: no GPT-4 or Claude access; you are limited to the models Snowflake hosts directly, most of which are open-weights.

LLM Functions: SQL-Native AI

Cortex LLM functions are SQL scalar functions that call hosted LLMs as part of a query. They run inside Snowflake's compute infrastructure — no external API calls, no data leaving Snowflake, governed by existing Snowflake RBAC.

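-- Summarize, score, and classify support tickets with the task-specific functions
-- (these functions pick their own model; only COMPLETE takes a model name)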
SELECT
    ticket_id,
    created_at,
    SNOWFLAKE.CORTEX.SUMMARIZE(ticket_text)          AS summary,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_text)           AS sentiment,    -- returns float -1 to 1
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        ticket_text,
        ['billing', 'technical', 'account', 'other']
    ):label::STRING                                   AS category
FROM support_tickets
WHERE created_at >= DATEADD(day, -7, CURRENT_DATE());

-- Extract structured fields from unstructured text
SELECT
    contract_id,
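    TRY_PARSE_JSON(                       -- a plain ::VARIANT cast would keep the reply as a string
        SNOWFLAKE.CORTEX.COMPLETE(
            'mistral-large2',
            CONCAT(
                'Extract the following fields from this contract and reply with JSON only: ',
                '{"start_date": "", "end_date": "", "total_value": "", "parties": []}. ',
                'Contract: ', contract_text
            )
        )
    ) AS extracted_fields                 -- NULL when the reply is not valid JSON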
FROM contracts;

The available functions cover the most common NLP tasks: COMPLETE (free-form generation), SUMMARIZE (optimized summarization shortcut), SENTIMENT (returns a score from -1 to 1), TRANSLATE (language translation), EXTRACT_ANSWER (extractive QA from a context), CLASSIFY_TEXT (multi-class classification), and EMBED_TEXT_768 / EMBED_TEXT_1024 (generate embeddings for semantic search). All are SQL functions, composable with standard SQL.
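Because COMPLETE returns free-form text, a reply that is supposed to be JSON may arrive wrapped in extra prose. The following is a minimal client-side sketch, in plain Python, of pulling the first valid JSON object out of such a reply. The helper name extract_json_object and the sample reply are hypothetical and not part of any Snowflake API.

```python
import json


def extract_json_object(response: str):
    """Return the first valid JSON object found in a model reply, or None.

    Hypothetical helper for illustration: it scans for '{' and tries to
    decode a JSON object starting at each candidate position.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response.find("{", start + 1)
    return None


# Made-up reply in the style an LLM might produce.
reply = (
    'Here are the fields: {"start_date": "2024-01-01", '
    '"parties": ["Acme", "Globex"]} Let me know if you need more.'
)
fields = extract_json_object(reply)
print(fields["parties"])
```

Returning None instead of raising mirrors what a NULL-on-failure parse does in SQL, so rows with unusable replies can be filtered out and retried rather than failing the whole batch.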

Model availability by region: Cortex AI model availability varies by Snowflake region, and some regions, EU regions in particular, offer a smaller model menu. Check the Snowflake docs for your specific region before designing workflows around a specific model. A quick sanity check is to run a one-row SNOWFLAKE.CORTEX.COMPLETE call against the model (for example 'mistral-large2') in your account; the call fails with an error if the model is not available there.

Cortex Search: Managed RAG Without Infrastructure

Cortex Search (GA Q4 2024) is the most significant Cortex product for data teams that want RAG without building or managing a vector database. You create a "Cortex Search Service" pointing at a Snowflake table (or view) with a text column, and Snowflake automatically handles embedding, indexing, and hybrid search (dense vector + BM25 keyword). The service stays synchronized as the source table changes.

-- Create a Cortex Search service on a documents table
CREATE CORTEX SEARCH SERVICE product_docs_search
    ON doc_text                          -- column to embed + index
    ATTRIBUTES category, updated_at     -- filterable metadata columns
    WAREHOUSE = cortex_wh
    TARGET_LAG = '1 hour'              -- how often to sync with source
AS (
    SELECT
        doc_id,
        title,
        doc_text,
        category,
        updated_at
    FROM product_documentation
    WHERE is_published = TRUE
);

from snowflake.core import Root
from snowflake.cortex import Complete

# Assumes `snowflake_session` is an existing Snowpark Session.
# DOCS_DB and PUBLIC are placeholder names for the database and schema
# that contain the search service.
root = Root(snowflake_session)
service = (
    root.databases["DOCS_DB"]
    .schemas["PUBLIC"]
    .cortex_search_services["product_docs_search"]
)

query = "How do I configure SSO with Okta?"

# Query the search service from Python
results = service.search(
    query=query,
    columns=["title", "doc_text"],
    filter={"@eq": {"category": "authentication"}},  # metadata filter
    limit=5,
)

# Build RAG prompt with retrieved chunks
context = "\n---\n".join(r["doc_text"] for r in results.results)
answer = Complete(
    "mistral-large2",
    f"Answer based on context:\n{context}\n\nQuestion: {query}",
    session=snowflake_session,
)

Cortex Search's hybrid search (vector + keyword) is particularly valuable for technical documentation and code — keyword search finds exact terms (function names, error codes, product names) that semantic search alone misses. The managed sync eliminates one of the hardest operational problems in DIY RAG: keeping the vector index in sync with source data changes.
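To make the idea of hybrid retrieval concrete, here is a small sketch of reciprocal rank fusion (RRF), a common way to merge a keyword ranking with a vector ranking. This is an illustration only: the article does not say how Cortex Search combines its two signals, and RRF is swapped in here as a stand-in. The document IDs and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in; scores are
    summed across lists and documents are returned best-first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "How do I configure SSO with Okta?"
keyword_hits = ["doc_sso_okta", "doc_error_codes", "doc_saml"]  # exact-term matches
vector_hits = ["doc_saml", "doc_sso_okta", "doc_login_faq"]     # semantic matches

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, which is why an exact product name match and a semantically similar passage can reinforce each other.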

The limitation: Cortex Search is Snowflake-specific and uses Snowflake's own embedding models (snowflake-arctic-embed-m or -l variants). You can't bring your own embedding model or swap to a different model per-document. For most enterprise document search use cases, this is fine. For specialized domains (genomics, legal, code) where domain-specific fine-tuned embeddings outperform general-purpose ones, the inflexibility matters.

Cortex Analyst: Text-to-SQL on Your Semantic Model

Cortex Analyst is Snowflake's answer to the "can I ask questions about my data in English?" problem — executed properly, as a semantic model layer rather than raw schema-to-SQL. You define a semantic model YAML file describing your tables, measures, dimensions, and their business meaning, and Cortex Analyst uses it to generate verified SQL.

# semantic_model.yaml — the layer between natural language and SQL
name: Sales Analytics
tables:
  - name: orders
    base_table:
      database: ANALYTICS
      schema: PROD
      table: FCT_ORDERS
    dimensions:
      - name: order_date
        expr: order_date
        data_type: date
        label: Order Date
      - name: customer_region
        expr: region    # column on the base table; no table alias is defined here
        data_type: varchar
        label: Customer Region
    measures:
      - name: total_revenue
        expr: SUM(order_total)
        data_type: number
        label: Total Revenue
        description: "Sum of order_total for completed orders only"
        filters:
          - expr: order_status = 'completed'
      - name: order_count
        expr: COUNT(DISTINCT order_id)
        data_type: number
        label: Number of Orders

With this semantic model, a business user can ask "What was total revenue by region in Q3?" and Cortex Analyst generates the correct SQL — not by guessing column names from raw schema, but by mapping business terms to verified SQL expressions. This is substantially more reliable than naive text-to-SQL approaches that hallucinate column names or miss business logic (like the order_status = 'completed' filter that belongs on revenue but not on order count).
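The reliability argument can be illustrated with a toy sketch: if SQL is only ever assembled from expressions declared in the semantic model, an unknown column or a forgotten filter cannot slip in. This is not how Cortex Analyst is implemented; the dictionary layout and the compile_query helper are hypothetical, and the column name region is an assumption.

```python
# Toy stand-in for the semantic model: business names map to vetted SQL.
SEMANTIC_MODEL = {
    "table": "ANALYTICS.PROD.FCT_ORDERS",
    "dimensions": {"order_date": "order_date", "customer_region": "region"},
    "measures": {
        "total_revenue": {
            "expr": "SUM(order_total)",
            "filter": "order_status = 'completed'",
        },
        "order_count": {"expr": "COUNT(DISTINCT order_id)", "filter": None},
    },
}


def compile_query(model, measure, dimension):
    """Build SQL for one measure grouped by one dimension.

    Names that are not declared in the model are rejected instead of guessed.
    """
    if measure not in model["measures"]:
        raise KeyError(f"unknown measure: {measure}")
    if dimension not in model["dimensions"]:
        raise KeyError(f"unknown dimension: {dimension}")
    m = model["measures"][measure]
    dim_expr = model["dimensions"][dimension]
    where = f" WHERE {m['filter']}" if m["filter"] else ""
    return (
        f"SELECT {dim_expr} AS {dimension}, {m['expr']} AS {measure} "
        f"FROM {model['table']}{where} GROUP BY {dim_expr}"
    )


print(compile_query(SEMANTIC_MODEL, "total_revenue", "customer_region"))
print(compile_query(SEMANTIC_MODEL, "order_count", "customer_region"))
```

The revenue query carries the completed-orders filter and the order-count query does not, because each measure brings its own definition rather than relying on the question's wording.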

The honest assessment: Cortex Analyst works well for well-defined metrics on clean dimensional models. It struggles with complex multi-step analyses, ambiguous business terms, and ad-hoc exploratory questions that require understanding data distribution before formulating the query. It's a genuinely useful tool for business users asking standard questions about standard reports — not a replacement for a data analyst on novel analytical tasks.

Document AI: Unstructured Document Processing

Document AI processes PDFs, scanned images, and unstructured documents at scale — extracting structured fields (dates, names, amounts, entities) without hand-crafted parsers. The typical workflow: upload documents to a Snowflake stage, create a Document AI build, define the fields to extract, review a sample to verify accuracy, then process the full corpus as a SQL query.

-- Extract fields from invoices using Document AI
-- (After building and reviewing a Document AI model in the UI)

SELECT
    RELATIVE_PATH,
    doc_ai_model!PREDICT(
        GET_PRESIGNED_URL(@invoice_stage, RELATIVE_PATH), 1   -- 1 = model build version
    ) AS extracted
FROM DIRECTORY(@invoice_stage)
WHERE RELATIVE_PATH LIKE '%.pdf';

-- Access individual extracted fields
-- (each field comes back as an array of {score, value} candidates; take the first)
WITH predictions AS (
    SELECT
        RELATIVE_PATH,
        doc_ai_model!PREDICT(
            GET_PRESIGNED_URL(@invoice_stage, RELATIVE_PATH), 1
        ) AS extracted
    FROM DIRECTORY(@invoice_stage)
    WHERE RELATIVE_PATH LIKE '%.pdf'
)
SELECT
    RELATIVE_PATH,
    extracted:invoice_number[0]:value::VARCHAR AS invoice_number,
    extracted:invoice_date[0]:value::DATE      AS invoice_date,
    extracted:total_amount[0]:value::FLOAT     AS total_amount,
    extracted:vendor_name[0]:value::VARCHAR    AS vendor_name
FROM predictions;

Document AI is genuinely useful for finance teams processing invoices, legal teams extracting contract terms, and HR teams processing resumes — tasks that previously required custom OCR pipelines or expensive third-party services. The in-Snowflake execution means extracted data lands directly in your data warehouse with no ETL step and full lineage.
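The "review a sample to verify accuracy" step can be partly automated once results are pulled into Python. The sketch below assumes each extracted field arrives as a list of candidates carrying a score and a value, plus a metadata entry whose key starts with two underscores; that shape, the sample values, the 0.8 threshold, and the helper name split_by_confidence are all assumptions for illustration.

```python
def split_by_confidence(extracted, min_score=0.8):
    """Split extracted fields into accepted values and fields needing review.

    `extracted` is assumed to map field names to lists of
    {"score": float, "value": ...} candidates.
    """
    accepted, needs_review = {}, []
    for field, candidates in extracted.items():
        # Skip metadata entries and anything that is not a candidate list.
        if field.startswith("__") or not isinstance(candidates, list):
            continue
        scored = [c for c in candidates if "value" in c]
        if not scored:
            needs_review.append(field)
            continue
        top = max(scored, key=lambda c: c.get("score", 0.0))
        if top.get("score", 0.0) >= min_score:
            accepted[field] = top["value"]
        else:
            needs_review.append(field)
    return accepted, needs_review


# Made-up prediction for one invoice.
sample = {
    "__documentMetadata": {"ocrScore": 0.92},
    "invoice_number": [{"score": 0.97, "value": "INV-1042"}],
    "total_amount": [{"score": 0.55, "value": "1,280.00"}],
    "vendor_name": [{"score": 0.40}],
}
accepted, needs_review = split_by_confidence(sample)
print(accepted, needs_review)
```

Routing only the low-confidence fields to a human keeps the review queue small while still catching the extractions most likely to be wrong.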

Cost Model: Tokens, Credits, and Surprises

Cortex AI functions are metered in tokens, but those tokens are charged as Snowflake credits rather than against a separate API budget. This matters: every Cortex LLM function call draws down the same credit pool as the rest of your Snowflake usage, so AI spend does not appear as its own budget line item unless you track it. The table below converts per-model token rates into dollars:

Model	Tokens per credit	Effective $/1M tokens (at $3/credit)
Llama 3.1 8B	200,000	$15
Llama 3.1 70B	25,000	$120
Mistral 7B	200,000	$15
Mixtral 8x7B	33,333	$90
mistral-large2	5,000	$600
Snowflake Arctic	100,000	$30
EMBED_TEXT_768	500,000	$6

The premium for mistral-large2 (Snowflake's most capable generally-available model as of late 2024) is significant: at the rates in the table it costs 40x as much per token as Llama 3.1 8B ($600 vs. $15 per 1M tokens). For batch summarization or classification tasks, Llama 8B or Mistral 7B is usually sufficient and dramatically cheaper. Save mistral-large2 for tasks where quality genuinely matters: complex extractions, nuanced generation, Cortex Analyst query generation.

The "run AI on all your historical data" trap: Cortex makes it trivially easy to run SELECT SNOWFLAKE.CORTEX.SUMMARIZE(text) FROM my_table on millions of rows. The first time you do this on a 10M-row table, the credit bill can be shocking. Always test on a sample (LIMIT 1000), calculate the per-row cost, multiply by row count, and compare against your monthly budget before running batch AI at scale. For current token rates, consult Snowflake's Service Consumption Table; SNOWFLAKE.CORTEX.COUNT_TOKENS reports how many tokens a given input consumes for a given model.
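The sample-then-multiply check is simple arithmetic, sketched below. The tokens-per-credit figures and the $3/credit price are taken from the table in this section and should be re-checked against current Snowflake pricing; the 300 tokens per row is a made-up average, and both function names are hypothetical.

```python
def dollars_per_million_tokens(tokens_per_credit, dollars_per_credit=3.0):
    """Convert a tokens-per-credit rate into dollars per 1M tokens."""
    return 1_000_000 / tokens_per_credit * dollars_per_credit


def estimate_batch_cost(row_count, avg_tokens_per_row, tokens_per_credit,
                        dollars_per_credit=3.0):
    """Return (credits, dollars) for running an LLM function over a table."""
    total_tokens = row_count * avg_tokens_per_row
    credits = total_tokens / tokens_per_credit
    return credits, credits * dollars_per_credit


# Reproduce two rows of the table.
print(dollars_per_million_tokens(200_000))  # Llama 3.1 8B row
print(dollars_per_million_tokens(5_000))    # mistral-large2 row

# 10M rows at an assumed 300 tokens per row, on each of those two models.
cheap_credits, cheap_dollars = estimate_batch_cost(10_000_000, 300, 200_000)
large_credits, large_dollars = estimate_batch_cost(10_000_000, 300, 5_000)
print(cheap_dollars, large_dollars)
```

With these inputs the same job differs by a factor of 40 between the two models, which is the gap to weigh before choosing a model for a full-table run.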

Snowflake Arctic: The Open-Weights Model

Snowflake Arctic is Snowflake's own open-weights LLM, released in April 2024. It's a 480B-parameter mixture-of-experts (MoE) model with 128 experts, of which only two are activated per token (roughly 17B active parameters), making inference far cheaper than a dense 480B model would be. Arctic was optimized specifically for enterprise tasks (SQL generation, code, and data transformation), not creative writing or general reasoning.
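The gap between total and active parameters is easy to work out. The sketch below uses the architecture figures Snowflake published with the Arctic release (a 10B dense component plus 128 experts of about 3.66B parameters each, with top-2 routing); treat them as approximate, and the function name as hypothetical.

```python
DENSE_PARAMS_B = 10.0    # dense transformer component, in billions
EXPERT_PARAMS_B = 3.66   # parameters per expert, in billions
NUM_EXPERTS = 128        # experts available to the router
ACTIVE_EXPERTS = 2       # experts selected per token (top-2 gating)


def moe_param_counts(dense_b, expert_b, num_experts, active_experts):
    """Return (total, active) parameter counts in billions."""
    total = dense_b + num_experts * expert_b
    active = dense_b + active_experts * expert_b
    return total, active


total_b, active_b = moe_param_counts(
    DENSE_PARAMS_B, EXPERT_PARAMS_B, NUM_EXPERTS, ACTIVE_EXPERTS
)
print(f"total ~ {total_b:.0f}B, active ~ {active_b:.0f}B per token")
```

Only a few percent of the weights participate in any single token's forward pass, which is where the inference savings over a dense model of the same size come from.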

In practice, Arctic performs well on structured data tasks (SQL, JSON extraction, data transformation instructions) and is priced attractively within Cortex. For general conversation quality, Llama 3.1 70B is stronger. The "open weights" aspect is meaningful: Arctic's weights are available on Hugging Face under the Apache 2.0 license, so you can self-host it outside Snowflake if needed.

When Cortex AI Wins vs. External LLM APIs

Scenario	Cortex AI advantage	External API advantage	Verdict
Batch processing millions of rows	SQL-native, scales with Snowflake compute, no API rate limits	GPT-4 / Claude quality	Cortex wins (cost + simplicity)
Real-time user-facing chatbot	Governed by Snowflake IAM	Lower latency, GPT-4o / Claude quality	External API wins (latency + quality)
RAG over Snowflake data	Cortex Search, no vector DB needed, data stays in Snowflake	Custom embedding models	Cortex wins (simplicity)
Text-to-SQL for business users	Cortex Analyst + semantic model = verified SQL	More models to choose from	Cortex wins (semantic model integration)
Complex multi-step AI agents	Limited agent framework	LangChain, LangGraph, CrewAI ecosystem	External wins (ecosystem)
Data governance / compliance	Data never leaves Snowflake, existing IAM applies	Requires data egress policies	Cortex wins (compliance)

The clearest Cortex AI use cases: batch document processing on large tables in Snowflake, semantic search over Snowflake-managed documents, and text-to-SQL for non-technical users on well-defined metrics. The clearest cases for external APIs: user-facing latency-sensitive applications, tasks requiring frontier model quality (GPT-4o, Claude), and complex agentic workflows that need a full orchestration framework.

Most Snowflake-native organizations end up running both: Cortex for data-warehouse-side AI (batch enrichment, self-serve analytics), external LLM APIs for application-side AI (user-facing features, complex agents). The data governance story is Cortex's strongest argument — if your compliance team is nervous about sending customer data to an external API, Cortex eliminates that conversation entirely.