Building an AI Assistant with Snowflake Cortex: Agents, Analyst, Search & Streamlit

Everyone wants a chatbot over their data. The demo is easy — point an LLM at a table, wire up a text box, and you have something that answers three questions impressively and the fourth one confidently wrong. The hard part is the gap between that demo and something a finance team will actually trust with the quarterly numbers, that won't leak a column it shouldn't, and that doesn't quietly cost more than the analyst it was meant to help.

I spent a chunk of this spring building exactly that on Snowflake Cortex — an internal assistant that answers questions over both the structured warehouse (revenue, pipeline, support volumes) and a pile of documents (product docs, policy PDFs, past incident write-ups). This is the build. Not the marketing tour of Cortex — I did that separately — but the actual architecture, the artifacts that mattered, the parts that bit, and an honest account of cost and evaluation. If you've read the RAG fundamentals piece, think of this as what it looks like when the whole retrieval-and-generation stack lives inside one governed platform instead of six services you glued together.

The one-sentence version of what I learned: the quality of a Cortex assistant is almost entirely the quality of its semantic model and its document corpus — the agent, the LLM, and the UI are the easy 20%. If your assistant is wrong, it's almost never the model's fault.

The architecture, and why it's shaped this way

The mental model that made everything click: a Cortex Agent is an orchestrator, and everything else is a tool it calls. The agent doesn't know anything about your data. It knows how to plan, pick a tool, read the tool's output, decide whether it has enough to answer, and either loop again or write a grounded response. The intelligence about your business lives in the tools.

For this assistant there were two tools that did the real work, plus the model and the UI around them:

Cortex Analyst — the structured-data tool. It turns a plain-English question into SQL against a semantic view (more on that below), runs it on a warehouse, and returns rows plus the SQL it wrote. This is the "what was Q1 net revenue in EMEA" path.
Cortex Search — the unstructured-data tool. A managed hybrid (vector + keyword) retrieval service over a chunked document corpus. This is the "what's our policy on data retention for trial accounts" path.
The Cortex Agent — the loop that decides which of those two (or both, or neither) a question needs, stitches the results, and produces an answer with citations.
Streamlit in Snowflake — the chat UI, running inside the account so no data crosses the perimeter to render.

graph TD
    U["User question
(Streamlit in Snowflake)"] --> AG["Cortex Agent
plan · route · reflect"]
    AG -->|structured Q| AN["Cortex Analyst
NL → SQL over semantic view"]
    AG -->|document Q| SE["Cortex Search
hybrid retrieval over chunks"]
    AN --> SV["Semantic view
tables · dimensions · metrics · synonyms"]
    SV --> WH["Warehouse executes SQL
RBAC + masking enforced"]
    SE --> COR["Chunked document corpus
(embeddings managed)"]
    WH --> AG
    COR --> AG
    AG --> ANS["Grounded answer
+ citations + the SQL it ran"]
    ANS --> U

The agent plans and routes; Analyst and Search are the tools that touch data. Crucially, the SQL Analyst writes runs as the calling user's role — so row-access policies and masking apply exactly as they would in a manual query. Nothing about the assistant is a governance bypass.

The reason this shape matters: nothing in it moves data out of Snowflake. The model runs inside the Cortex perimeter, the SQL runs on your warehouse under your roles, the documents never leave, and Streamlit renders in-account. For the kind of data I was working with, "the text of a clinical note never has to be exported to an LLM provider" isn't a nice-to-have — it's the reason the project was allowed to exist at all.

Step 1: the semantic view (this is 80% of the job)

Cortex Analyst is only as good as the semantic layer you hand it. Give it raw tables with columns named amt_net_usd_v2 and it will guess, and it will sometimes guess wrong in ways that look right. The semantic view is where you teach it your business vocabulary: what a "customer" is, that "revenue" means SUM(net_amount) and not gross, that "region", "territory", and "area" all mean the same column.

In 2026 the cleaner path is a semantic view — a first-class Snowflake object you create with DDL and govern with grants, rather than a YAML file you upload. Here's a trimmed version of the one I built:

CREATE OR REPLACE SEMANTIC VIEW analytics.ai.revenue_sv
  TABLES (
    orders     AS sales.core.orders     PRIMARY KEY (order_id),
    customers  AS sales.core.customers  PRIMARY KEY (customer_id)
  )
  RELATIONSHIPS (
    orders_to_customers AS
      orders (customer_id) REFERENCES customers (customer_id)
  )
  DIMENSIONS (
    customers.region    AS region    WITH SYNONYMS ('area','territory','geo'),
    customers.segment   AS segment   WITH SYNONYMS ('tier','customer type'),
    orders.order_date   AS order_date
  )
  METRICS (
    orders.net_revenue  AS SUM(orders.net_amount)
      COMMENT = 'Net booked revenue after discounts; the default meaning of "revenue"',
    orders.order_count  AS COUNT(orders.order_id),
    orders.aov          AS SUM(orders.net_amount) / NULLIF(COUNT(orders.order_id),0)
      COMMENT = 'Average order value'
  )
  COMMENT = 'Governed semantic layer for the revenue assistant';

Three things earned their keep here, and I'd fight to keep all three:

Synonyms. Real users don't say "segment." They say "tier," "customer type," "which bucket." Every synonym you add is a question that now routes correctly instead of failing. I added them reactively — every time the assistant misunderstood a word in testing, that word became a synonym.
Metric definitions with comments. Defining net_revenue once, centrally, means the assistant can never accidentally answer with gross. The comment isn't decoration — Analyst reads it as context for disambiguation.
A verified query repository. This is the highest-leverage thing nobody mentions. You attach a set of known-good question→SQL pairs, and Analyst uses them as few-shot exemplars. The handful of questions you know people will ask — "monthly revenue by region this year," "top 10 customers by net revenue" — get pinned to correct SQL. Accuracy on those goes to effectively 100%, and the pattern generalizes to nearby questions.

Scope the semantic view narrowly. My first instinct was to throw the whole warehouse at it. Wrong move — a 200-column, 40-table semantic view makes Analyst slower, more expensive, and less accurate because there are more ways to be wrong. I ended up with a deliberately small view covering the questions the assistant was for. If a question needs a table that isn't in scope, I'd rather the assistant say so than improvise a join.

Step 2: the Cortex Search service over documents

The structured side answers "how much." The document side answers "why" and "what's the rule." Cortex Search is a managed retrieval service — you point it at a table of text chunks, it handles the embedding, indexing, and serving, and it does hybrid retrieval (semantic + keyword) which in practice beats pure vector search on the exact kind of jargon-and-acronym queries enterprise users type.

The work that mattered here was upstream of the service: chunking. I'll spare you the full digression — the RAG fundamentals piece covers why this dominates quality — but the short version is that I chunked on document structure (headings, sections) rather than a blind 1,000-character window, kept a doc_url and section on every chunk so the assistant could cite precisely, and re-ran the pipeline a few times until retrieval looked right. Creating the service itself is the easy part:

CREATE OR REPLACE CORTEX SEARCH SERVICE support.kb.docs_search
  ON chunk
  ATTRIBUTES product, section, doc_url
  WAREHOUSE = search_build_wh
  TARGET_LAG = '1 hour'
  AS (
    SELECT chunk, product, section, doc_url, last_modified
    FROM   support.kb.document_chunks
  );

TARGET_LAG is the freshness knob — the service keeps itself in sync with the base table within that window, so when someone updates a policy doc and the chunks refresh, the assistant picks it up within the hour without anyone rebuilding an index. The ATTRIBUTES are what you can filter on at query time (e.g., restrict to one product line) and what you surface as citation metadata.

Step 3: defining the agent

With both tools in place, the agent is mostly configuration: which tools it has, how to choose between them, and how to behave when it answers. The instructions matter more than they look — this is where you encode "don't guess," "always cite," and "if the question needs data you don't have, say so."

# Cortex Agent configuration (abridged)
tools:
  - name: revenue_analyst
    type: cortex_analyst
    semantic_view: analytics.ai.revenue_sv
    description: >
      Use for any quantitative question about orders, revenue, customers,
      regions, or segments. Returns numbers and the SQL used.
  - name: docs_search
    type: cortex_search
    service: support.kb.docs_search
    description: >
      Use for questions about policies, product behavior, definitions, or
      "how do we..." questions answered in written documentation.

orchestration: >
  Prefer revenue_analyst for "how many / how much / trend" questions.
  Prefer docs_search for policy, definition, and procedure questions.
  Some questions need both: get the number, then explain the rule behind it.
  If neither tool can answer, say you don't have that information.

response_instructions: >
  Always cite the document section or show the SQL that produced a number.
  Never state a figure you did not retrieve. Be concise. If the result set
  is empty, say so plainly rather than inventing a plausible answer.

You can run this agent two ways. The no-code path is Snowflake Intelligence — the built-in chat UI where business users just talk to the agent, which is genuinely where most of my users ended up. The programmatic path is the Cortex Agents REST API (/api/v2/cortex/agent:run), which is what you call when you want your own UI — and that's what the Streamlit front end uses.

Step 4: the Streamlit front end

Because the app runs as Streamlit in Snowflake, it authenticates as the user's session and calls the agent endpoint without any keys or external network hops. The core of the chat loop is unremarkable, which is the point:

import json
import _snowflake
import streamlit as st
from snowflake.snowpark.context import get_active_session

session = get_active_session()

def ask_agent(question: str):
    payload = {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": [{"type": "text", "text": question}]}],
        "tools": [
            {"tool_spec": {"type": "cortex_analyst", "name": "revenue_analyst"}},
            {"tool_spec": {"type": "cortex_search",  "name": "docs_search"}},
        ],
    }
    resp = _snowflake.send_snow_api_request(
        "POST", "/api/v2/cortex/agent:run", {}, {}, payload, {}, 60_000
    )
    return json.loads(resp["content"])

st.title("📊 Ask the Data")
if q := st.chat_input("Ask about revenue, customers, or our policies..."):
    with st.chat_message("user"):
        st.write(q)
    with st.chat_message("assistant"):
        result = ask_agent(q)
        st.write(result["answer"])
        if result.get("sql"):
            with st.expander("SQL that produced this"):
                st.code(result["sql"], language="sql")
        for cite in result.get("citations", []):
            st.caption(f"📄 {cite['section']} — {cite['doc_url']}")

The two things I'd insist on in any real version: always render the SQL behind a number (the expander), and always show citations. Not because users read them — most don't — but because the few who do are exactly the skeptics whose trust you need, and the moment they can audit one answer, they stop second-guessing the rest. An assistant that shows its work gets adopted; one that asks for blind faith gets a polite "neat demo."

Governance: the part that's free, and the part that isn't

The free part is real and worth saying clearly: because Analyst's SQL runs under the caller's role, every existing access control still applies. Row-access policies, dynamic masking, column grants — the assistant can't see anything the user couldn't already query by hand. I didn't build a single new permission for the assistant; I reused the warehouse's existing RBAC. A sales rep asking about revenue gets their territory; a finance admin gets everything; the masking on customer emails stays masked in the chat exactly as it is in a worksheet.

The part that isn't free is the new surface: who can use the agent, and what's in its scope. The agent object, the semantic view, and the search service are all grantable objects — I locked the agent to specific roles. And the scoping discipline from Step 1 is itself a governance control: a narrow semantic view means the assistant structurally cannot answer questions about data you didn't intend it to touch. That's a much stronger guarantee than a prompt instruction telling it not to.

Cost: the multiplier that surprises people

Three things bill, and they don't bill the way a single LLM call does:

What	How it bills	Where it bites
LLM tokens (the agent + Analyst + responses)	per million tokens, by model	The agent loop calls the model multiple times per question — plan, tool-choice, reflect, respond. One user question is rarely one model call.
Warehouse compute (Analyst's SQL)	credits, while running	Cheap if your warehouse auto-suspends; a runaway if it doesn't.
Cortex Search serving	serving + storage for the index	Mostly steady; scales with corpus size and query volume.

The trap is the first row. A naive RAG call is one model invocation. An agent answering the same question might invoke the model four or five times as it plans, picks a tool, reads the result, decides it needs the other tool too, and finally writes the answer. That's the price of the orchestration that makes it smart — but it means your per-question cost is a multiple of what a back-of-envelope "tokens × price" estimate suggests. Budget for the loop, not the call. The mitigations that worked: a small, fast model as the default (escalating only when needed), a tight semantic view so Analyst gets it right on the first try instead of looping, and an aggressively auto-suspending warehouse.

Evaluation: how you know it's actually right

This is where most internal-assistant projects quietly die — they ship, someone catches it being wrong, and trust collapses faster than it built. The discipline that kept mine honest:

Separate retrieval from generation. Before judging answers, I checked: did Analyst write the right SQL? Did Search return the right chunks? Most "the AI is wrong" failures are actually retrieval failures — the model reasoned fine over the wrong inputs. Fixing those is a semantic-view or chunking fix, not a prompt fix.
A golden question set. Forty real questions with known-correct answers, run on every change to the semantic view or the corpus. It's the regression test for an AI feature, and it caught a semantic-view edit that silently broke three revenue questions.
Lean into abstention. Cortex Agents can decline to answer when they can't ground a response. I tuned the instructions to prefer "I don't have that" over a confident guess. An assistant that occasionally says "I can't answer that" is trusted; one that's confidently wrong even 5% of the time is not.
Log everything. Every question, the tool chosen, the SQL or chunks retrieved, and the answer — into a table. That log was how I found the missing synonyms, the out-of-scope questions worth adding, and the two questions where the model was overconfident.

Lessons learned

The semantic view is the product. I'll say it again because it's the whole game. Time spent on synonyms, metric definitions, and verified queries pays back more than anything you'll do to the agent or the prompt.
Narrow beats broad. A focused assistant that nails revenue questions beats a general one that's mediocre at everything. Scope is a feature, not a limitation.
Show the work. SQL and citations on every answer. This is what converts skeptics, and skeptics are who you're building for.
Budget for the loop, not the call. Agentic orchestration multiplies model invocations. The cost is real and worth it — just don't let it surprise you in the first month's bill.
Reuse RBAC; don't reinvent it. The assistant inheriting the user's existing permissions is the single biggest reason this was buildable on regulated data. Don't fight it; lean on it.
Prefer "I don't know." Tune for abstention. Trust is asymmetric — it takes one confident wrong answer to lose a user you spent a month earning.

Where this leaves you

The remarkable thing about building this on Cortex in 2026 isn't any single feature — it's that the entire stack a few years ago would have meant a vector database, an orchestration framework, an embedding pipeline, a model gateway, and a security review for each, all stitched together and all moving data around. Here it's a semantic view, a search service, an agent, and a Streamlit app, all inside one governed boundary. The hard engineering didn't disappear — it just moved to where it always belonged: modeling your business clearly enough that a machine can reason over it. That part is still on you, and it's still most of the work. The platform just stopped making it harder than it needs to be.

If you want the wider tour of what Cortex offers — Agents, AISQL, Document AI, Knowledge Extensions — and an honest read on where it wins versus a DIY stack, that's in Snowflake Cortex AI in 2026. For the platform underneath it, Snowflake Internals.