Open Discussion Guide
Module 7 capstone — trainee-led tool comparison
Facilitator guide for the Module 7 capstone: trainees compare Databricks, Snowflake, and dbt and defend a production tool choice for YellowLine NYC.
Duration: ~30 minutes total within Module 7
Format: Animation → silent write → open discussion → trainer close
Goal: Trainees lead; trainer facilitates — no vendor lecture
Related docs:
- architecture-decision-matrix.qmd — structured workshop matrix trainees fill during silent reflection
- reflection-prompts.qmd — Module 7 silent reflection prompts
- animation-production-scripts.md — mod-07-wrapup.mp4
- TRAINING_MATERIAL_MIGRATION_PLAN.md — storyline context
- module-prerequisites-and-order.md — editorial rules (workshop-2026 only; do not edit workshop-2026-v1/)
- exercises/ex-batch-comparison.qmd — observation table (optional Module 7 opener)
- modules/07-comparison-wrapup.qmd — technical reference (use only if discussion stalls)
Module 7 Flow (Full Block)
| Step | Activity | Duration | Doc |
|---|---|---|---|
| 1 | Play mod-07-wrapup.mp4 | 4 min | animation-production-scripts.md |
| 2 | Silent reflection | 2 min | reflection-prompts.qmd § Module 7 |
| 3 | Short theory recap (optional) | 5 min | Three pipelines, one dataset — trainer only if needed |
| 4 | Open discussion | 20–25 min | This guide |
| 5 | Trainer close | 2 min | This guide § Close |
| 6 | Power BI demo (optional) | 10 min | Pre-class checklist § Power BI |
Timing note: If running the Power BI demo, shorten Round 3 synthesis to 5 minutes or move the demo before discussion so trainees discuss with the dashboard fresh in mind.
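The arithmetic behind the flow table can be checked with a small sketch. The durations are copied from the table above; the dictionary and function names are illustrative only and not part of the workshop materials.

```python
# Durations in minutes, copied from the Module 7 flow table.
# Each value is a (min, max) pair so the 20-25 min discussion range is kept.
CORE_STEPS = {
    "animation": (4, 4),
    "silent_reflection": (2, 2),
    "open_discussion": (20, 25),
    "trainer_close": (2, 2),
}
OPTIONAL_STEPS = {
    "theory_recap": (5, 5),
    "power_bi_demo": (10, 10),
}


def block_length(include_optional=()):
    """Return (min, max) total minutes for core steps plus chosen optional steps."""
    steps = list(CORE_STEPS.values())
    steps += [OPTIONAL_STEPS[name] for name in include_optional]
    return sum(lo for lo, _ in steps), sum(hi for _, hi in steps)


print(block_length())                                   # (28, 33)
print(block_length(["power_bi_demo"]))                  # (38, 43)
print(block_length(["theory_recap", "power_bi_demo"]))  # (43, 48)
```

The core steps alone land at 28 to 33 minutes; each optional step pushes the block well past the half hour, which is why the timing note suggests trimming Round 3 when the demo runs.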
Facilitator Mindset
Do
- Ask follow-up questions: “Why?” “What would you give up?” “Who maintains it?”
- Capture trainee words on the whiteboard — not slide content.
- Let disagreement happen; compare trade-offs, not personalities.
- Remind: platform, transform layer, and consumption (Power BI) are separate decisions.
- Revisit the Story whiteboard at the end.
Do not
- Open with a comparison table slide — trainees should build it.
- Declare a single “correct” stack for YellowLine NYC.
- Let one vocal participant dominate — rotate speakers in Round 1.
- Conflate dbt with a warehouse — correct gently if someone says “we’ll use dbt instead of Snowflake.”
- Conflate Module 6 Cortex LLM with Module 9 Cortex ML — they are different APIs and sessions.
Cortex naming (say if confusion arises):
| Module | Cortex usage | Purpose |
|---|---|---|
| 6 AI Features | AI_COMPLETE, Copilot, Genie | LLM assistants — SQL, exploration |
| 9 ML (optional) | ML.FORECAST, ML.ANOMALY_DETECTION, Snowpark ML | Predictive models |
Elena’s framing (say once at start of discussion):
“Marcus doesn’t need the best tool in the industry. He needs the best fit for his SQL team, his auditors, and his budget. You built all three paths today — now recommend with evidence.”
Before Discussion — Silent Reflection (2 min)
Hand out the architecture decision matrix (one printed page per trainee), or read the prompts below aloud if there is no handout.
Display on screen or read aloud:
- My recommended stack for YellowLine NYC is: Databricks / Snowflake / dbt / combination
- One sentence why (must reference at least one row from the matrix):
- One tool I would not choose as the primary platform and why:
No talking. No laptop research. Pens only.
Facilitator: Start a 2-minute timer. When it ends, go straight to Round 1 — do not ask for hands first.
Three-constraint reminder (project during silent fill)
Trainees should weigh every recommendation against three orthogonal dimensions Marcus named over the day:
| Constraint | Marcus’s question | Where it appeared |
|---|---|---|
| Cost | Can we afford this in year 3? | Story brief; Round 2 Card B |
| Performance | Fast enough for live dispatch? | Module 8 framing; Round 2 Card C |
| Compliance | Audit-ready lineage by Q3? | Module 4 dbt motivation; Round 2 Card D |
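One way to make the three-constraint weighing concrete is a weighted score. The sketch below is a minimal illustration, assuming hypothetical 1 to 5 scores for two placeholder stacks (`stack_a`, `stack_b`); none of the numbers come from the workshop, and trainees would substitute values from their own matrix.

```python
# Hypothetical 1-5 scores for two placeholder stacks.
# Replace with values from the trainee's own decision matrix.
SCORES = {
    "stack_a": {"cost": 4, "performance": 2, "compliance": 5},
    "stack_b": {"cost": 2, "performance": 5, "compliance": 3},
}


def rank(weights):
    """Return stack names sorted by weighted score, best first."""
    totals = {
        name: sum(weights[c] * score for c, score in scores.items())
        for name, scores in SCORES.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


equal = {"cost": 1, "performance": 1, "compliance": 1}
realtime = {"cost": 1, "performance": 3, "compliance": 1}  # performance weighted up

print(rank(equal))     # ['stack_a', 'stack_b']
print(rank(realtime))  # ['stack_b', 'stack_a']
```

The ranking flips when one constraint is weighted more heavily, which is the effect a Round 2 constraint card is meant to provoke.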
Round 2 — Challenge (8 minutes)
Purpose: Stress-test recommendations against changing constraints.
Format: Announce a constraint card → ask “Does your recommendation still hold?” → brief debate → next card. Spend ~2 minutes per card; use 2–3 cards, not all six.
Constraint cards
Pick cards that match what you heard in Round 1.
Card A — SQL-only team
“Marcus confirms: five SQL analysts, zero Python developers on staff. Nothing changes for two years.”
Follow-ups:
- What happens to the Databricks notebook path?
- Is Snowpark enough, or is pure SQL required?
- Where does dbt fit for this team?
Card B — Budget cut 40%
“Finance cuts the data platform budget by forty percent. One primary platform license.”
Follow-ups:
- Do you still run two platforms?
- What do you drop — ingest tooling, transform layer, or duplicate pipelines?
- Can dbt Core on an existing warehouse reduce cost?
Card C — Real-time in six months
“Marcus needs live zone demand signals in six months — Module 8 streaming.”
Follow-ups:
- Which platform from today gives the clearest path to streaming?
- Does your batch stack choice block or help streaming later?
- Would you split batch and streaming across platforms?
Card D — Audit and lineage
“Regulators audit in Q3. Every dashboard tile must trace to source with tests.”
Follow-ups:
- What did dbt add that Snowflake worksheets alone lacked?
- Can Unity Catalog or Snowflake Horizon replace dbt docs for Marcus’s board?
- Minimum viable governance stack?
Card E — ML and tipping model
“Marcus wants tip prediction and driver incentives — Module 9 ML — within a year.”
Follow-ups:
- Does Databricks become non-negotiable?
- Can Snowflake Cortex or Snowpark ML suffice for SQL-heavy teams?
- Where do features live — dbt table, Silver, or notebook?
- Would you batch-score to Gold for Priya’s Power BI page?
If Module 9 was delivered: Ask trainees to reference their RMSE comparison from ex-ml.
Card F — Speed to first dashboard
“Marcus needs something on his desk in two weeks, full platform decision in six months.”
Follow-ups:
- Fastest path to Priya’s Overview page?
- Build throwaway vs build for production?
- Minimum Bronze/Silver/Gold for one KPI?
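If trainees struggle to picture a minimum Bronze/Silver/Gold path for a single KPI, a toy version may help. The sketch below runs in plain Python with hypothetical columns (`zone`, `fare`) and made-up rows; it stands in for the platform pipelines and is not taken from the exercises.

```python
from collections import defaultdict

# Bronze: raw rows as ingested (hypothetical columns and values).
bronze = [
    {"zone": "Midtown", "fare": "12.50"},
    {"zone": "Midtown", "fare": "bad"},  # unparseable fare
    {"zone": "Harlem", "fare": "8.00"},
    {"zone": None, "fare": "5.00"},      # missing zone
]


def to_silver(rows):
    """Keep rows with a zone and a numeric fare; cast fare to float."""
    clean = []
    for row in rows:
        try:
            fare = float(row["fare"])
        except (TypeError, ValueError):
            continue
        if row["zone"]:
            clean.append({"zone": row["zone"], "fare": fare})
    return clean


def to_gold(rows):
    """Aggregate Silver into one KPI: total fare per zone."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["zone"]] += row["fare"]
    return dict(totals)


print(to_gold(to_silver(bronze)))  # {'Midtown': 12.5, 'Harlem': 8.0}
```

Three small steps are enough for one KPI; the discussion question is how much of this a two-week deliverable can reuse in production.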
Facilitator tip: If the room converges too fast, play the opposite card (e.g. ML card after everyone picks Snowflake + dbt).
Round 3 — Synthesis (8 minutes)
Purpose: Build a comparison table from group consensus, not slides.
Format: Facilitate row by row. Ask for hands or short shouts; write exact phrases.
Whiteboard table (fill live)
| Dimension | Databricks | Snowflake | dbt |
|---|---|---|---|
| Best for | | | |
| Weak for | | | |
| Fit for Marcus’s SQL team | | | |
| Ingest strength | | | |
| Transform / governance | | | |
| Power BI consumption | | | |
Row prompts:
- “One thing Databricks clearly won today — shout it out.”
- “Where was Databricks overkill for YellowLine NYC?”
- “What does dbt do that isn’t just SQL in a worksheet?”
- “Priya connected Power BI to Gold — does that favor any platform?”
Separate three decisions (draw on board)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PLATFORM │ │ TRANSFORM │ │ CONSUMPTION │
│ (where data │ │ (logic, tests, │ │ (Power BI — │
│ runs) │ │ lineage) │ │ Priya) │
│ Databricks / │ │ dbt / native │ │ Gold KPIs │
│ Snowflake / … │ │ SQL / notebooks│ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Key teaching moment:
“dbt is not a third warehouse. It runs on Databricks or Snowflake. Marcus might choose Snowflake + dbt + Power BI — three layers, three roles.”
Round 4 — Architecture Revisit (5 minutes)
Purpose: Close the loop to the Story.
Bring back the Story whiteboard (a photo, or the original if it is still on the board).
Questions:
- “What would you change in your morning design now that you’ve built all three pipelines?”
- “What did you get right on day one?”
- “If you were Elena, what would you tell Marcus to run on Monday?”
Optional poll (hands or Mentimeter):
- Primary platform: Databricks / Snowflake / Both
- Transform layer: Native SQL / dbt / Notebooks
- Confidence: “I could defend my choice to a client” — 1–5 fingers
Trainer Close (2 minutes)
Use this script as a starting point. If the room is engaged, adapt it rather than reading it verbatim:
“Today you saw one dataset flow through medallion architecture three ways. Priya’s Power BI dashboard didn’t care which engine built Gold — same schema, same KPIs.
There’s no universal winner. Real projects choose from skills, cost, governance, and what’s next — streaming, ML, audit. Three constraints stayed with us all day:
- Cost — can YellowLine NYC afford this in year 3?
- Performance — fast enough for live dispatch later?
- Compliance — audit-ready lineage by Q3?
Remember three decisions: platform, transform layer, consumption. YellowLine NYC might combine tools. Your job as data engineers is to recommend with evidence — like you did in this room.
Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.”
Closing line (deliver as the final sentence, project as title card if using slides):
“Technology is a decision. Architecture is responsibility.”
Optional Elena line:
“MHP often lands on Snowflake + dbt for SQL-heavy clients and keeps Databricks for heavy engineering — but that’s a pattern, not a rule. Marcus pays for your recommendation, not ours.”
Reference Material — Use Only If Discussion Stalls
Do not project this at the start. Use if silence exceeds 30 seconds or factual confusion arises.
dbt clarification
| Misconception | Correction |
|---|---|
| “dbt replaces Snowflake” | dbt sends SQL to Snowflake (or Databricks) |
| “We only need dbt” | dbt does not ingest raw Parquet from ADLS2 by itself |
| “dbt is only for docs” | Tests and materializations are core value |
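The first correction can be illustrated with a toy compile step. The sketch below is not dbt's implementation: it uses a regular expression as a stand-in for dbt's Jinja rendering, and the schema names and the `trips_clean` model are hypothetical. It only shows the idea that one model file resolves to plain SQL that the chosen warehouse then executes.

```python
import re

# Hypothetical locations of the same upstream model on each platform.
TARGETS = {
    "snowflake": "ANALYTICS.SILVER",
    "databricks": "main.silver",
}

# A dbt-style model: SQL with a ref() placeholder instead of a hard-coded table.
MODEL_SQL = (
    "select zone, sum(fare) as revenue "
    "from {{ ref('trips_clean') }} group by zone"
)

REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")


def compile_model(sql, target):
    """Replace each {{ ref('name') }} with a schema-qualified name for the target."""
    schema = TARGETS[target]
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


for target in TARGETS:
    print(target, "->", compile_model(MODEL_SQL, target))
```

The model text stays the same; only the target changes. That is the sense in which dbt sits on top of a warehouse rather than replacing one.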
Comparison dimensions (trainer crib sheet)
| Dimension | Databricks | Snowflake | dbt |
|---|---|---|---|
| Primary user | Data engineer / ML engineer | SQL analyst / analytics engineer | Analytics engineer |
| Ingest | Spark, Auto Loader, DLT, direct ADLS2 | External stages, Snowpipe, COPY INTO | Reads existing Bronze tables |
| Transform | PySpark, SQL, Delta Lake | SQL, Snowpark Python | SQL + Jinja, ref(), macros |
| Governance | Unity Catalog | Horizon, tags, masking | Tests, docs, lineage graph |
| Scheduling | Workflows, DLT pipelines | Tasks, Streams | CI/CD, dbt build in GitHub Actions |
| Power BI | Gold via Databricks connector | Gold via Snowflake connector | Gold via underlying warehouse |
| Learning curve for SQL team | Higher (notebooks) | Lower (worksheets) | Low–medium (SQL + YAML) |
| Strong when | Scale, Spark, ML, streaming path | SQL ops, elastic DWH, sharing | Lineage, tests, transform-as-code |
Example “reasonable” stacks (not answers to give — discussion seeds if stalled)
| Stack | When it fits YellowLine NYC |
|---|---|
| Snowflake + dbt + Power BI | SQL team maintains; auditors need lineage |
| Databricks only + Power BI | Small eng team; ML/streaming on roadmap |
| Databricks ingest + Snowflake Gold + dbt | Rare split — discuss complexity cost |
| Snowflake only (no dbt) | Fast start; weaker lineage story for board |
Optional Power BI Demo — Placement Options
| Option | When to use |
|---|---|
| A — Before discussion | Visual payoff first; discussion references live dashboard |
| B — After discussion | Discussion stays abstract; demo as “Priya’s deliverable” |
| C — Skip live demo | Point to Exercise: Power BI; animation already showed full dashboard |
Delivery path (see pre-class checklist § Power BI): screen-share from Power BI Service (published workspace) or Desktop .pbix fallback.
Demo talking points (2 min max if time-tight):
- Built in Desktop, optionally published to trainer cloud workspace — all 12 kpi_* tables, five pages
- Same report connects to Databricks or Snowflake Gold — switch data source only
- Twelve kpi_* tables — no relationships required
- Priya’s five questions from the Story are answered on five pages
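Switching the data source only works if both Gold layers expose the same table and column names. A minimal parity check might look like the sketch below; the table name `kpi_revenue_by_zone` and its columns are hypothetical, and the comparison ignores case on the assumption that Snowflake reports unquoted identifiers in upper case.

```python
# Hypothetical column lists for one kpi_* table on each platform's Gold layer.
gold_databricks = {"kpi_revenue_by_zone": ["zone", "trip_date", "revenue"]}
gold_snowflake = {"kpi_revenue_by_zone": ["ZONE", "TRIP_DATE", "REVENUE"]}


def schema_diff(left, right):
    """Return tables whose column names differ, compared case-insensitively."""
    diffs = {}
    for table in sorted(set(left) | set(right)):
        l_cols = [c.lower() for c in left.get(table, [])]
        r_cols = [c.lower() for c in right.get(table, [])]
        if l_cols != r_cols:
            diffs[table] = (l_cols, r_cols)
    return diffs


print(schema_diff(gold_databricks, gold_snowflake))  # {}
```

An empty result means the report should be able to point at either platform; any entry names a table to fix before the demo.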
Handling Common Classroom Situations
| Situation | Response |
|---|---|
| One person dominates | “Thank you — let’s hear someone who disagrees.” |
| Vendor debate gets heated | “We’re not picking a winner for MHP — we’re advising Marcus.” |
| Trainee says “just use everything” | “Marcus has budget for one primary platform. What do you cut?” |
| Confusion on dbt | Draw platform box with dbt inside as transform layer |
| Room is quiet after Round 1 | Use constraint Card B or F — concrete scenarios unlock opinions |
| Running out of time | Skip Round 4 poll; keep synthesis table + close + Story revisit |
Printable Facilitator Timing Card
MODULE 7 — OPEN DISCUSSION (~30 min)
────────────────────────────────────
[ ] Animation mod-07-wrapup.mp4 4 min
[ ] Silent write 2 min
[ ] Round 1 — Share (3–4 speakers) 8 min
[ ] Round 2 — Challenge (2–3 cards) 8 min
[ ] Round 3 — Synthesis table 8 min
[ ] Round 4 — Story revisit 5 min ← shorten if needed
[ ] Trainer close 2 min
[ ] Power BI demo (optional) 10 min
────────────────────────────────────
OPEN: "What would YOU choose for YellowLine NYC?"
CLOSE: Platform | Transform | Consumption — three decisions
Success Signals
Discussion succeeded if trainees:
Document History
| Date | Change |
|---|---|
| 2026-05-23 | Initial Module 7 open discussion facilitation guide |
| 2026-05-23 | Expanded Module 9 ML constraint card follow-ups |
| 2026-05-23 | ex-batch-comparison opener; Cortex Module 6 vs 9 distinction |
| 2026-05-24 | Decision-matrix handout link; three-constraint reminder (cost / performance / compliance); closing tagline |