Exercise: Production Patterns
YellowLine NYC story · full hands-on lab
title: “Exercise: Production Patterns” subtitle: “YellowLine NYC story · full hands-on lab” —
Estimated time: 30–35 min (Databricks: 10 min · Snowflake: 15 min · dbt CI: 5–10 min)
YellowLine NYC context (Module 5)
What runs every night when the team is not in the room?
TipWorking Environment
Use GitHub Codespaces for a ready-to-use environment — all tools pre-installed. Open your fork on GitHub → Code → Codespaces → Create codespace on main. All production files (databricks/production/, snowflake/production/, dbt_project/production/) are at /workspace/.
Prerequisites
- Completed at least one pipeline module (Databricks, Snowflake, or dbt)
- Production files available in
databricks/production/,snowflake/production/,dbt_project/production/
Databricks Production Exercise
Review Lakeflow Declarative Pipeline (formerly DLT)
- Open
databricks/production/dlt_pipeline.py - Compare
@dlt.expect_or_drop()decorators with the manualWHEREfilters in02_silver_cleaning.py(Note: In 2026, the recommended API isfrom pyspark import pipelines as dpwith@dp.table()/@dp.expect_or_drop(). The legacyimport dlt/@dltdecorators still work.) - Answer:
- How many data quality expectations are defined?
- What happens to rows that fail an expectation?
- How does DLT know the execution order (Bronze → Silver → Gold)?
Review Asset Bundle
- Open
databricks/production/asset_bundle/databricks.yml - Answer:
- How many deployment targets are defined?
- What changes between
devandprodtargets? - What does
databricks bundle deploy -t proddo?
Stretch: Create a Lakeflow Declarative Pipeline
- In Databricks, go to Jobs & Pipelines in the sidebar → Create → Pipeline (Delta Live Tables was renamed to Lakeflow Declarative Pipelines in 2025.)
- Create a new pipeline pointing to
dlt_pipeline.py - Set target catalog to your attendee catalog
- Start the pipeline and observe the execution graph
Review Workflows Job (Lakeflow Jobs)
- Open
databricks/production/workflows_job.yml - Answer:
- How is the task DAG defined (Bronze → Silver → Gold)?
- What cluster configuration is used for production?
- How does this compare to Asset Bundles for deployment?
Snowflake Production Exercise
Create a Scheduled Task
-- 1. Create a simple task that refreshes a Gold KPI
SET attendee_id = 'XX_yourname';
CREATE OR REPLACE TASK IDENTIFIER($attendee_id || '_GOLD.REFRESH_TRIPS_BY_HOUR')
WAREHOUSE = DE_WORKSHOP_WH
SCHEDULE = 'USING CRON 0 6 * * * Europe/Berlin'
AS
CREATE OR REPLACE TABLE IDENTIFIER($attendee_id || '_GOLD.GOLD_KPI_TRIPS_BY_HOUR') AS
SELECT
EXTRACT(HOUR FROM PICKUP_DATETIME) AS PICKUP_HOUR,
COUNT(*) AS TOTAL_TRIPS
FROM IDENTIFIER($attendee_id || '_SILVER.SILVER_NYC_TAXI_ENRICHED')
GROUP BY PICKUP_HOUR;
-- 2. Start the task
ALTER TASK IDENTIFIER($attendee_id || '_GOLD.REFRESH_TRIPS_BY_HOUR') RESUME;
-- 3. Check task history
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
WHERE NAME = UPPER($attendee_id) || '_GOLD.REFRESH_TRIPS_BY_HOUR'
ORDER BY SCHEDULED_TIME DESC
LIMIT 5;
-- 4. Clean up (suspend when done)
ALTER TASK IDENTIFIER($attendee_id || '_GOLD.REFRESH_TRIPS_BY_HOUR') SUSPEND;Review Stored Procedures
- Open
snowflake/production/stored_procedures.sql - Answer:
- How does error handling work in the stored procedure?
- How would you call this procedure from a Task?
Review Tasks + Streams (CDC)
- Open
snowflake/production/tasks_and_streams.sql - This demonstrates a production CDC pattern using Streams + Tasks
- Answer:
- How does the Stream detect new data in Bronze?
- What does
WHEN SYSTEM$STREAM_HAS_DATA()do? - How is this different from a cron-scheduled Task?
Review Snowpark Stored Procedure
- Open
snowflake/production/snowpark_sproc.py - This shows deploying Snowpark Python as a Snowflake stored procedure
- Answer:
- How is the pipeline logic packaged via
session.sproc.register()? - What are the advantages of running Python inside Snowflake vs locally?
- How would you call this procedure from a Task?
- How is the pipeline logic packaged via
dbt Production Exercise
Review CI Configuration
- Open
dbt_project/production/github_actions_ci.yml - Trace through the job steps and answer:
- What triggers the CI workflow?
- What does
--select state:modified+do? - What does
--deferdo and why is it useful? - What happens if a dbt test fails?
Discussion Questions
- How does slim CI save time and compute cost?
- When would you choose dbt Cloud over GitHub Actions?
- What would you add to this CI pipeline for production readiness?
Expected Results
| Exercise | What you should see |
|---|---|
| Lakeflow Declarative Pipeline review | @dlt.expect_or_drop (or @dp.expect_or_drop) decorators define data quality rules; the pipeline auto-discovers Bronze → Silver → Gold execution order from dlt.read() calls |
| Asset Bundle review | databricks.yml defines dev and prod targets — prod uses a different catalog and job cluster config |
| Workflows review | workflows_job.yml defines a sequential task DAG with cluster config — alternative to Asset Bundles |
| Snowflake Task | Task appears in TASK_HISTORY() after RESUME; runs on cron schedule; Gold KPI table is refreshed |
| Tasks + Streams | Stream tracks CDC changes; Task runs only when stream has data — event-driven vs cron |
| Snowpark SProc | Python pipeline deployed as stored procedure via sproc.register(); runs inside Snowflake compute |
| dbt CI review | state:modified+ selects only changed models and their downstream dependents; --defer uses production state for unmodified models |
Cleanup
-- Suspend the Snowflake task to stop scheduled runs
ALTER TASK IDENTIFIER($attendee_id || '_GOLD.REFRESH_TRIPS_BY_HOUR') SUSPEND;
-- Suspend warehouse
ALTER WAREHOUSE DE_WORKSHOP_WH SUSPEND;In Databricks, terminate any running clusters via Compute → Terminate.
Return to module
Source: merged from frozen workshop-2026-v1/exercises/ex-production.qmd — do not edit workshop-2026-v1/.