Snowflake's pitch with Cortex AI is compelling: what if you didn't need a separate LLM infrastructure layer at all? What if you could run SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3.1-70b', 'Summarize this contract: ' || contract_text) FROM contracts and get AI-powered results on your existing data, billed to your existing Snowflake contract, governed by your existing Snowflake access controls? No API keys. No separate vector database. No orchestration infrastructure.
For organizations that are already Snowflake-native, Cortex AI is genuinely worth serious evaluation. For organizations that aren't, the question is whether the workflow simplicity is worth any trade-offs in model selection and flexibility. This article covers the full Cortex AI product surface — LLM functions, Cortex Search (hybrid RAG), Cortex Analyst (text-to-SQL), Document AI, and Arctic — with honest assessments of where it works, where it doesn't, and how costs actually add up.
The Cortex AI Product Surface
graph TD
subgraph CortexAI["Snowflake Cortex AI"]
LLMFunctions["LLM Functions\nCOMPLETE, SUMMARIZE\nTRANSLATE, SENTIMENT\nEXTRACT_ANSWER"]
CortexSearch["Cortex Search\nHybrid vector+BM25\nManaged embedding + index\nRAG without infra"]
CortexAnalyst["Cortex Analyst\nText-to-SQL on semantic model\nNatural language → verified SQL"]
DocAI["Document AI\nPDF/image extraction\nClassification + field extraction\nno-code document processing"]
FinetuningAPI["Cortex Fine-Tuning\nParameter-efficient fine-tuning\non your own Snowflake data"]
end
subgraph Models["Available Models (Dec 2024)"]
Arctic["Snowflake Arctic\n(480B MoE, open weights)"]
LLaMA["Meta Llama 3.1\n8B / 70B / 405B"]
Mistral["Mistral / Mixtral\n7B / 8x7B / 8x22B"]
Reka["Reka Flash\n(vision + text)"]
GoogleM["Google gemma-7b"]
end
LLMFunctions --> Arctic
LLMFunctions --> LLaMA
LLMFunctions --> Mistral
CortexSearch --> LLMFunctions
CortexAnalyst --> LLMFunctions
Cortex AI product surface as of December 2024. LLM Functions are the foundation: in the diagram, Cortex Search and Cortex Analyst both build on the same hosted-model layer that the SQL functions expose. The key constraint: no GPT-4 or Claude access; you are limited to the models Snowflake hosts directly.
LLM Functions: SQL-Native AI
Cortex LLM functions are SQL scalar functions that call hosted LLMs as part of a query. They run inside Snowflake's compute infrastructure — no external API calls, no data leaving Snowflake, governed by existing Snowflake RBAC.
-- Summarize, score, and classify the last week's support tickets.
-- SUMMARIZE, SENTIMENT, and CLASSIFY_TEXT take no model argument; only COMPLETE does.
SELECT
    ticket_id,
    created_at,
    SNOWFLAKE.CORTEX.SUMMARIZE(ticket_text) AS summary,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) AS sentiment, -- returns float -1 to 1
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        ticket_text,
        ['billing', 'technical', 'account', 'other']
    ):label::STRING AS category
FROM support_tickets
WHERE created_at >= DATEADD(day, -7, CURRENT_DATE());
-- Extract structured fields from unstructured text.
-- COMPLETE returns a string, so parse it: TRY_PARSE_JSON yields NULL when the reply is not valid JSON.
SELECT
    contract_id,
    TRY_PARSE_JSON(
        SNOWFLAKE.CORTEX.COMPLETE(
            'mistral-large2',
            CONCAT(
                'Extract the following fields from this contract as JSON: ',
                '{"start_date": "", "end_date": "", "total_value": "", "parties": []}. ',
                'Contract: ', contract_text
            )
        )
    ) AS extracted_fields
FROM contracts;
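Asking for JSON in the prompt is a request, not a guarantee: the model may wrap the object in explanatory text or omit fields. If you post-process replies in application code, a defensive parser helps. The following is a minimal sketch, assuming the reply is available as a Python string; `parse_model_json`, `missing_fields`, and `EXPECTED_KEYS` are hypothetical names introduced for illustration.

```python
import json
from typing import Any, Optional

# Field names taken from the prompt in the contract-extraction query.
EXPECTED_KEYS = {"start_date", "end_date", "total_value", "parties"}


def parse_model_json(raw: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in a model reply, or None."""
    start = raw.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(raw[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def missing_fields(obj: dict[str, Any]) -> list[str]:
    """List the expected keys that the model left out."""
    return sorted(EXPECTED_KEYS - obj.keys())


reply = 'Here is the JSON:\n{"start_date": "2024-01-01", "parties": ["A", "B"]}\nDone.'
parsed = parse_model_json(reply)
print(parsed, missing_fields(parsed))
```

Rows where parsing fails or fields are missing can be routed to a retry or a manual review queue instead of silently landing as bad data.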
The available functions cover the most common NLP tasks: COMPLETE (free-form generation), SUMMARIZE (optimized summarization shortcut), SENTIMENT (returns a score), TRANSLATE (language translation), EXTRACT_ANSWER (extractive QA from a context), CLASSIFY_TEXT (multi-class classification), and EMBED_TEXT_768/1024 (generate embeddings for semantic search). All are SQL functions, composable with standard SQL.
Model availability by region: Cortex AI model availability varies by Snowflake region, and not every model is offered everywhere. EU regions in particular have had a smaller model menu. Check the regional availability table in the Snowflake docs before designing a workflow around a specific model, and confirm it in your own account by running a trivial SNOWFLAKE.CORTEX.COMPLETE call against that model: if the model is not offered in your region, the call returns an error instead of text.
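One way to act on the availability caveat is to try models in preference order and fall back when one is missing. The sketch below is deliberately independent of any Snowflake client: `probe` is a hypothetical callable you would supply, for example a function that runs a tiny COMPLETE call and returns False when the call errors.

```python
from typing import Callable, Iterable, Optional


def first_available(
    preferred: Iterable[str],
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the first model name for which probe() succeeds, else None."""
    for model in preferred:
        try:
            if probe(model):
                return model
        except Exception:
            # Treat any probe failure as "not available here" and keep going.
            continue
    return None


# Stand-in probe for illustration: pretend only one model is offered.
offered_here = {"llama3.1-70b"}
chosen = first_available(
    ["mistral-large2", "llama3.1-70b"],
    probe=lambda name: name in offered_here,
)
print(chosen)
```

Resolving the model once at deploy time, rather than hard-coding it in every query, keeps a regional difference from turning into a runtime failure.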
Cortex Search: Managed RAG Without Infrastructure
Cortex Search (GA Q4 2024) is the most significant Cortex product for data teams that want RAG without building or managing a vector database. You create a "Cortex Search Service" pointing at a Snowflake table (or view) with a text column, and Snowflake automatically handles embedding, indexing, and hybrid search (dense vector + BM25 keyword). The service stays synchronized as the source table changes.
-- Create a Cortex Search service on a documents table
CREATE CORTEX SEARCH SERVICE product_docs_search
    ON doc_text                        -- column to embed + index
    ATTRIBUTES category, updated_at    -- filterable metadata columns
    WAREHOUSE = cortex_wh
    TARGET_LAG = '1 hour'              -- how often to sync with source
AS (
    SELECT
        doc_id,
        title,
        doc_text,
        category,
        updated_at
    FROM product_documentation
    WHERE is_published = TRUE
);
from snowflake.core import Root
from snowflake.cortex import Complete

# Look up the search service through the Snowflake Python API.
# ANALYTICS and PROD are placeholder database and schema names.
root = Root(snowflake_session)
service = (
    root.databases["ANALYTICS"]
    .schemas["PROD"]
    .cortex_search_services["product_docs_search"]
)

# Query the search service from Python
query = "How do I configure SSO with Okta?"
results = service.search(
    query=query,
    columns=["title", "doc_text"],
    filter={"@eq": {"category": "authentication"}},  # metadata filter
    limit=5,
)

# Build RAG prompt with retrieved chunks
context = "\n---\n".join(r["doc_text"] for r in results.results)
answer = Complete(
    "mistral-large2",
    f"Answer based on context:\n{context}\n\nQuestion: {query}",
    session=snowflake_session,
)
Cortex Search's hybrid search (vector + keyword) is particularly valuable for technical documentation and code — keyword search finds exact terms (function names, error codes, product names) that semantic search alone misses. The managed sync eliminates one of the hardest operational problems in DIY RAG: keeping the vector index in sync with source data changes.
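To make the "vector + keyword" idea concrete, here is a sketch of reciprocal rank fusion (RRF), one common way to merge two ranked lists. This illustrates the general technique only; it is an assumption, not a description of how Cortex Search combines its results internally. The document IDs are made up.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]],
    k: int = 60,
) -> list[str]:
    """Merge ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Keyword search surfaces the exact error code; vector search surfaces
# semantically related guides. Fusion rewards documents both lists agree on.
keyword_hits = ["err_4012", "sso_setup", "okta_guide"]
vector_hits = ["okta_guide", "sso_setup", "saml_overview"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers rise to the top, while an exact-match hit that only keyword search found still stays in the result set instead of being dropped.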
The limitation: Cortex Search is Snowflake-specific and uses Snowflake's own embedding models (snowflake-arctic-embed-m or -l variants). You can't bring your own embedding model or swap to a different model per-document. For most enterprise document search use cases, this is fine. For specialized domains (genomics, legal, code) where domain-specific fine-tuned embeddings outperform general-purpose ones, the inflexibility matters.
Cortex Analyst: Text-to-SQL on Your Semantic Model
Cortex Analyst is Snowflake's answer to the "can I ask questions about my data in English?" problem — executed properly, as a semantic model layer rather than raw schema-to-SQL. You define a semantic model YAML file describing your tables, measures, dimensions, and their business meaning, and Cortex Analyst uses it to generate verified SQL.
# semantic_model.yaml: the layer between natural language and SQL
name: Sales Analytics
tables:
  - name: orders
    base_table:
      database: ANALYTICS
      schema: PROD
      table: FCT_ORDERS
    dimensions:
      - name: order_date
        expr: order_date
        data_type: date
        label: Order Date
      - name: customer_region
        expr: customer_region  # assumes this column exists on FCT_ORDERS
        data_type: varchar
        label: Customer Region
    measures:
      - name: total_revenue
        expr: SUM(order_total)
        data_type: number
        label: Total Revenue
        description: "Sum of order_total for completed orders only"
        filters:
          - expr: order_status = 'completed'
      - name: order_count
        expr: COUNT(DISTINCT order_id)
        data_type: number
        label: Number of Orders
With this semantic model, a business user can ask "What was total revenue by region in Q3?" and Cortex Analyst generates the correct SQL — not by guessing column names from raw schema, but by mapping business terms to verified SQL expressions. This is substantially more reliable than naive text-to-SQL approaches that hallucinate column names or miss business logic (like the order_status = 'completed' filter that belongs on revenue but not on order count).
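The value of the mapping is easiest to see in a toy version. The sketch below is not how Cortex Analyst works internally (it uses an LLM to interpret the question); it only illustrates, under simplified assumptions, why resolving business terms through a reviewed dictionary keeps the filter on the right measure and rejects names that do not exist. `Measure`, `build_query`, and the column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measure:
    agg: str                      # aggregate function, e.g. "SUM"
    column: str                   # column being aggregated
    distinct: bool = False
    where: Optional[str] = None   # measure-specific filter, if any

    def to_sql(self) -> str:
        inner = self.column
        if self.where:
            # Apply the filter to this measure only, not the whole query.
            inner = f"CASE WHEN {self.where} THEN {self.column} END"
        prefix = "DISTINCT " if self.distinct else ""
        return f"{self.agg}({prefix}{inner})"


TABLE = "ANALYTICS.PROD.FCT_ORDERS"
DIMENSIONS = {"customer_region": "customer_region", "order_date": "order_date"}
MEASURES = {
    "total_revenue": Measure("SUM", "order_total", where="order_status = 'completed'"),
    "order_count": Measure("COUNT", "order_id", distinct=True),
}


def build_query(measures: list[str], dimension: str) -> str:
    """Build SQL from vetted names; unknown names raise KeyError."""
    dim_expr = DIMENSIONS[dimension]
    select = [f"{dim_expr} AS {dimension}"]
    select += [f"{MEASURES[m].to_sql()} AS {m}" for m in measures]
    return f"SELECT {', '.join(select)} FROM {TABLE} GROUP BY {dim_expr}"


sql = build_query(["total_revenue", "order_count"], "customer_region")
print(sql)
```

Revenue carries the completed-orders condition while the order count does not, and a request for an undefined metric fails loudly instead of producing a plausible-looking but invented column.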
The honest assessment: Cortex Analyst works well for well-defined metrics on clean dimensional models. It struggles with complex multi-step analyses, ambiguous business terms, and ad-hoc exploratory questions that require understanding data distribution before formulating the query. It's a genuinely useful tool for business users asking standard questions about standard reports — not a replacement for a data analyst on novel analytical tasks.
Document AI: Unstructured Document Processing
Document AI processes PDFs, scanned images, and unstructured documents at scale — extracting structured fields (dates, names, amounts, entities) without hand-crafted parsers. The typical workflow: upload documents to a Snowflake stage, create a Document AI build, define the fields to extract, review a sample to verify accuracy, then process the full corpus as a SQL query.
-- Extract fields from invoices using Document AI
-- (After building and reviewing a Document AI model in the UI)
SELECT
    RELATIVE_PATH,
    doc_ai_model!PREDICT(
        GET_PRESIGNED_URL('@invoice_stage', RELATIVE_PATH), 1
    ) AS extracted
FROM DIRECTORY('@invoice_stage')
WHERE RELATIVE_PATH LIKE '%.pdf';
-- Access individual extracted fields.
-- Each field comes back as an array of {score, value} objects, so read the first value.
SELECT
    RELATIVE_PATH,
    extracted:invoice_number[0]:value::VARCHAR AS invoice_number,
    extracted:invoice_date[0]:value::DATE AS invoice_date,
    extracted:total_amount[0]:value::FLOAT AS total_amount,
    extracted:vendor_name[0]:value::VARCHAR AS vendor_name
FROM (
    SELECT RELATIVE_PATH, doc_ai_model!PREDICT(...) AS extracted
    FROM DIRECTORY('@invoice_stage')
);
Document AI is genuinely useful for finance teams processing invoices, legal teams extracting contract terms, and HR teams processing resumes — tasks that previously required custom OCR pipelines or expensive third-party services. The in-Snowflake execution means extracted data lands directly in your data warehouse with no ETL step and full lineage.
Cost Model: Tokens, Credits, and Surprises
Cortex AI functions are metered in tokens, but those tokens are charged as Snowflake credits rather than against a separate API budget. This matters: every Cortex LLM function call draws down the same credit pool as the rest of your Snowflake usage, so there is no separate AI line item to cap it. The table below converts each model's token rate into an effective dollar price:
| Model | Tokens per credit | Effective $/1M tokens (at $3/credit) |
|---|---|---|
| Llama 3.1 8B | 200,000 | $15 |
| Llama 3.1 70B | 25,000 | $120 |
| Mistral 7B | 200,000 | $15 |
| Mixtral 8x7B | 33,333 | $90 |
| mistral-large2 | 5,000 | $600 |
| Snowflake Arctic | 100,000 | $30 |
| EMBED_TEXT_768 | 500,000 | $6 |
The premium for mistral-large2 (Snowflake's most capable generally-available model as of late 2024) is significant: at the rates above it is 40x more expensive than Llama 8B ($600 vs. $15 per million tokens). For batch summarization or classification tasks, Llama 8B or Mistral 7B is usually sufficient and dramatically cheaper. Save mistral-large2 for tasks where quality genuinely matters: complex extractions, nuanced generation, Cortex Analyst query generation.
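The dollar column in the table is simple arithmetic on the tokens-per-credit column, and it is worth having as a function so you can plug in your own contract price. A minimal sketch, using the table's figures as given and the table's $3/credit assumption as the default:

```python
# Tokens-per-credit figures copied from the table above.
TOKENS_PER_CREDIT = {
    "Llama 3.1 8B": 200_000,
    "Llama 3.1 70B": 25_000,
    "Mistral 7B": 200_000,
    "Mixtral 8x7B": 33_333,
    "mistral-large2": 5_000,
    "Snowflake Arctic": 100_000,
}


def usd_per_million_tokens(tokens_per_credit: int, usd_per_credit: float = 3.0) -> float:
    """Credits needed for 1M tokens, multiplied by the price of one credit."""
    credits_per_million = 1_000_000 / tokens_per_credit
    return credits_per_million * usd_per_credit


for model, rate in TOKENS_PER_CREDIT.items():
    print(f"{model:18s} ${usd_per_million_tokens(rate):,.0f} per 1M tokens")

premium = (
    usd_per_million_tokens(TOKENS_PER_CREDIT["mistral-large2"])
    / usd_per_million_tokens(TOKENS_PER_CREDIT["Llama 3.1 8B"])
)
print(f"mistral-large2 premium over Llama 3.1 8B: {premium:.0f}x")
```

Passing your negotiated credit price as `usd_per_credit` rescales every row at once, which matters because the ratio between models stays fixed while the absolute dollars do not.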
The "run AI on all your historical data" trap: Cortex makes it trivially easy to run SELECT SNOWFLAKE.CORTEX.SUMMARIZE(text) FROM my_table on millions of rows. The first time you do this on a 10M-row table, the credit bill can be shocking. Always test on a sample (LIMIT 1000), calculate the per-row cost, multiply by row count, and compare against your monthly budget before running batch AI at scale. Token rates change, so check Snowflake's Service Consumption Table for the current per-model figures before you budget.
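The sample-then-extrapolate check can be sketched in a few lines. This assumes you already have per-row token counts for a sample (input plus output tokens); the sample values and the monthly budget below are made up, and the token rate is the Llama 3.1 70B figure from the table above.

```python
from statistics import mean
from typing import Sequence


def estimate_batch_cost(
    sample_tokens_per_row: Sequence[int],
    total_rows: int,
    tokens_per_credit: int,
    usd_per_credit: float = 3.0,
) -> dict[str, float]:
    """Extrapolate a sample's average token usage to the full table."""
    if not sample_tokens_per_row:
        raise ValueError("need at least one sampled row")
    avg_tokens = mean(sample_tokens_per_row)
    credits = avg_tokens * total_rows / tokens_per_credit
    return {
        "avg_tokens_per_row": float(avg_tokens),
        "credits": credits,
        "usd": credits * usd_per_credit,
    }


sample = [350, 420, 390, 440]   # hypothetical token counts from a small sample
estimate = estimate_batch_cost(sample, total_rows=10_000_000, tokens_per_credit=25_000)
monthly_budget_usd = 50_000     # hypothetical budget

print(estimate)
if estimate["usd"] > monthly_budget_usd:
    print("Over budget: use a cheaper model, shorten inputs, or process fewer rows.")
```

With these illustrative numbers, 400 tokens per row across 10M rows comes to 160,000 credits, which is the kind of figure you want to see before the query runs, not after.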
Snowflake Arctic: The Open-Weights Model
Snowflake Arctic is Snowflake's own open-weights LLM, released in April 2024. It's a 480B-parameter mixture-of-experts (MoE) model with 128 experts, of which only a small subset activates per token, making inference cheaper than a dense 480B model would be. Arctic was optimized specifically for enterprise tasks (SQL generation, code, and data transformation), not creative writing or general reasoning.
In practice, Arctic performs well on structured data tasks (SQL, JSON extraction, data transformation instructions) and is priced attractively within Cortex. For general conversation quality, Llama 3.1 70B is stronger. The "open weights" aspect is meaningful: Arctic's weights are available on Hugging Face under the Apache 2.0 license, so you can self-host it outside Snowflake if needed.
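The "only a subset activates per token" property comes from the router in a mixture-of-experts layer. The toy sketch below shows generic top-k routing: a gate scores every expert, the k best are kept, and their weights are renormalized. It is an illustration of the general mechanism, not Arctic's actual code; k=2 and the random gate scores are assumptions for the example.

```python
import math
import random


def top_k_route(gate_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = gate_logits[top[0]]  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
NUM_EXPERTS = 128
gate_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routed = top_k_route(gate_logits, k=2)
print(routed)
print(f"experts used for this token: {len(routed)} of {NUM_EXPERTS}")
```

Only the selected experts run their feed-forward computation for that token, which is why total parameter count overstates the per-token inference cost of an MoE model.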
When Cortex AI Wins vs. External LLM APIs
| Scenario | Cortex AI advantage | External API advantage | Verdict |
|---|---|---|---|
| Batch processing millions of rows | SQL-native, scales with Snowflake compute, no API rate limits | GPT-4 / Claude quality | Cortex wins (cost + simplicity) |
| Real-time user-facing chatbot | Governed by Snowflake RBAC | Lower latency, GPT-4o / Claude quality | External API wins (latency + quality) |
| RAG over Snowflake data | Cortex Search, no vector DB needed, data stays in Snowflake | Custom embedding models | Cortex wins (simplicity) |
| Text-to-SQL for business users | Cortex Analyst + semantic model = verified SQL | More models to choose from | Cortex wins (semantic model integration) |
| Complex multi-step AI agents | Limited agent framework | LangChain, LangGraph, CrewAI ecosystem | External wins (ecosystem) |
| Data governance / compliance | Data never leaves Snowflake, existing RBAC applies | Requires data egress policies | Cortex wins (compliance) |
The clearest Cortex AI use cases: batch document processing on large tables in Snowflake, semantic search over Snowflake-managed documents, and text-to-SQL for non-technical users on well-defined metrics. The clearest cases for external APIs: user-facing latency-sensitive applications, tasks requiring frontier model quality (GPT-4o, Claude), and complex agentic workflows that need a full orchestration framework.
Most Snowflake-native organizations end up running both: Cortex for data-warehouse-side AI (batch enrichment, self-serve analytics), external LLM APIs for application-side AI (user-facing features, complex agents). The data governance story is Cortex's strongest argument — if your compliance team is nervous about sending customer data to an external API, Cortex eliminates that conversation entirely.