Module 6: AI Features

Cortex LLM assistants — not predictive ML

Duration: 45 min — Animation (3) · Think & Discuss (7) · Theory (15) · Quiz (3) · Practice (17)

1. Animation

2. Think & Discuss

Situation: Marcus saw Priya’s dashboard and asked whether AI can speed up analyst workflows. MHP demos assistants — medallion remains the source of truth.

Prompts:

  • What task is Marcus trying to speed up — pipelines, SQL, or dashboards?
  • Where would you trust AI in this project? Where would you not?
  • If AI writes SQL against Silver, what must still exist for trustworthy answers?

3. Theory

WarningModule 6 ≠ Module 9
Module Product Examples
6 — AI Features Cortex LLM AI_COMPLETE, Genie, Copilot
9 — ML (optional) Snowflake ML Functions ML.FORECAST, Snowpark ML training

Do not preview ML.FORECAST in Module 6.

NoteModel names drift

Do not hardcode LLM IDs (mistral-large2, databricks-meta-llama-3-3-70b-instruct, etc.). Verify available models in your workspace/account before class.

TipCortex Analyst & Cortex Search

Beyond AI_COMPLETE() (used hands-on below), Snowflake offers two additional AI services worth knowing:

  • Cortex Analyst — translates natural language questions into SQL, but unlike Genie, it runs against a governed semantic model (a YAML definition of tables, columns, and business metrics). This ensures the AI only queries approved, well-defined columns — making it suitable for self-service analytics where governance matters.
  • Cortex Search — provides hybrid search (semantic + keyword) over both unstructured documents and structured data, enabling RAG (Retrieval-Augmented Generation) patterns within Snowflake.

Databricks RAG equivalent: On Databricks, Mosaic AI Vector Search provides managed vector indexes for RAG patterns — embedding documents and retrieving relevant chunks to ground LLM responses. Both platforms fully support RAG; the choice depends on which ecosystem your data already lives in.

Snowflake Intelligence (announced 2025) bundles these services into a unified AI layer — think of it as the umbrella brand for all Cortex AI capabilities. For this workshop, the hands-on focus remains on AI_COMPLETE(), Genie, and notebook assistants.

3.1 Snowflake Cortex AI (20 min — Hands-on)

AI_COMPLETE — LLM-powered Data Analysis

Snowflake’s AI_COMPLETE() function runs LLMs directly in SQL queries:

SELECT
    pickup_zone,
    dropoff_zone,
    time_of_day,
    AI_COMPLETE(
        'mistral-large2',
        'Classify this NYC taxi trip as business, leisure, or commute: '
        || 'From ' || pickup_zone || ' to ' || dropoff_zone
        || ' during ' || time_of_day
    ) AS trip_classification
FROM silver_nyc_taxi_enriched
LIMIT 10;

Additional built-in Cortex AI functions include AI_SUMMARIZE_AGG() for document summaries and AI_CLASSIFY() for categorization — all without model training or external API calls.

Cortex Code — AI-Assisted SQL

  • Available in Snowsight SQL editor
  • Generates SQL from natural language descriptions
  • Explains existing queries
  • Suggests optimizations

Exercise

See AI Features Exercise for guided Cortex AI exercises.

3.2 Databricks Genie (15 min — Guided Demo + Hands-on)

Databricks Assistant

AI coding assistant integrated into notebooks:

  • Type a prompt → get PySpark code
  • Explain existing code
  • Debug errors
  • Available in every notebook cell

Try these prompts: > “Write a PySpark query to find the top 5 routes by revenue” > “Explain this data quality filter” > “Add a column that classifies tips as low/medium/high”

AI Functions — ai_query()

Run LLMs on your data via SQL:

SELECT
    pickup_zone,
    ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        CONCAT('Is this zone a tourist area? Zone: ', pickup_zone)
    ) AS is_tourist_area
FROM gold_kpi_top_pickup_zones;

Genie (formerly AI/BI Spaces)

Create a natural-language interface over Gold tables:

  1. In the Databricks sidebar, click Genie (under the SQL section)
  2. Create a new Genie space connected to your Gold schema
  3. Ask questions in plain English: “What hour has the most taxi trips?”
NoteMosaic AI stack (Databricks)

Beyond the Genie and Assistant features demonstrated here, Databricks offers the Mosaic AI stack for production AI workloads: Model Serving (managed REST endpoints for ML and LLM models), Vector Search (managed vector indexes for RAG), and Agent Framework (orchestration for multi-step AI agents). These are production-grade tools that extend beyond the workshop scope but are worth knowing for teams building AI applications on Databricks.

3.3 dbt Copilot (10 min — Trainer Demo)

Features (dbt Cloud Starter+ plans)

  • Auto-generate documentation: AI writes model descriptions from SQL
  • Auto-generate tests: AI suggests data tests based on column patterns
  • SQL generation: Natural language → dbt SQL model

dbt MCP Server (Free — Trainees Can Use)

The dbt MCP server integrates dbt with AI coding assistants in VS Code:

  • Query dbt documentation from your IDE
  • Get model recommendations
  • Auto-complete ref() and source() references

3.4 Discussion: Where Does AI Add Most Value?

Consider these scenarios:

Scenario Best AI Tool
Classify trip data at scale Cortex AI / ai_query()
Generate boilerplate SQL Databricks Assistant / Copilot
Auto-document 50 dbt models dbt Copilot
Explore data via natural language Genie / Cortex Analyst
Debug a failing query Any AI assistant
Write data tests dbt Copilot

Key insight: AI is most valuable for repetitive, pattern-based tasks — not for architectural decisions or complex business logic.

AI on governed data only

Always run AI functions against Silver or Gold tables — never against raw Bronze. LLMs produce more reliable, consistent results when the input data is clean, typed, and validated. If AI_COMPLETE() generates a classification for a zone name that contains null or garbled values, the output is garbage. Clean data in, useful AI out.

AI outputs are not deterministic

LLM-based functions (AI_COMPLETE, ai_query()) are probabilistic — the same input can produce different outputs across runs. This means: (a) never use LLM output as a primary key or join condition, (b) always validate AI-generated classifications against known ground truth before trusting them at scale, and (c) do not put AI-generated columns into financial reports without human review. AI accelerates exploration; humans validate production KPIs.

3.5 Key Takeaways

  • This module ≠ Module 9 — Module 6 covers LLM assistants (text generation, classification, code generation); Module 9 covers predictive ML (regression, forecasting)
  • AI_COMPLETE() and ai_query() run LLMs inside SQL — no separate API server or infrastructure needed
  • Genie and Cortex Analyst provide natural-language interfaces over Gold tables — ideal for self-service analytics
  • AI is most valuable for repetitive, pattern-based tasks: boilerplate SQL, data classification, documentation generation
  • AI outputs are probabilistic — always validate against governed Silver/Gold data and never use as primary keys
  • Cortex Analyst uses semantic models for governed self-service; Cortex Search enables RAG patterns

4. Quiz

Quiz: Module 6 — AI Features Quiz

Scan QR to open quiz

Before moving on, make sure you can answer:

  1. What is the difference between AI_COMPLETE() (Module 6) and ML.FORECAST (Module 9)?
  2. Why should AI functions run against Silver or Gold tables rather than Bronze?
  3. Name two scenarios where AI assistants save significant time in data engineering workflows.

5. Practice

Hands-on lab

Priya / Power BI: AI helps Priya explore Gold faster — it does not replace her KPI definitions or dashboard.

Next module

Module 7: Comparison & Wrap-up — Priya presents the finished dashboard. Marcus asks what to run in production — you decide.


Official Documentation