Use Case — YellowLine NYC

Fictional yellow-taxi operator · NYC TLC-inspired data

Client

YellowLine NYC — a fictional NYC yellow-taxi operator (inspired by TLC public trip data, not a real company). They operate across all five boroughs and compete with ride-hail on price, wait time, and driver earnings.

Business challenge

Trip volume is stable, but revenue per mile and driver utilization vary by hour, borough, and route. Leadership suspects they are:

  • Over-supplying taxis in low-demand zones at the wrong times
  • Under-pricing or mis-allocating fleet during peak windows
  • Losing visibility because reports are built manually from spreadsheets

They need an analytics platform their SQL-heavy internal team can maintain after MHP leaves.

Success criteria

| # | Criterion                           | How the training proves it                  |
|---|-------------------------------------|---------------------------------------------|
| 1 | Ingest TLC trip data reliably       | Bronze layer in all three tool tracks       |
| 2 | Clean, enriched trip records        | Silver with quality rules + zone lookup     |
| 3 | Answer 12 operational KPI questions | Gold layer (identical KPIs across platforms) |
| 4 | Executive dashboard                 | Power BI on Gold tables                     |
| 5 | Team can extend pipelines           | Snowflake SQL + dbt maintainability story   |
| 6 | Documented lineage and quality      | dbt docs, tests, lineage graph              |
| 7 | Informed platform choice            | Module 7 open discussion                    |
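
Criterion 2 pairs quality rules with a zone lookup. As a rough illustration only, the sketch below uses Python's built-in sqlite3 module as a stand-in for the workshop platforms. The table names (bronze_trips, zone_lookup, silver_trips), the helper build_silver, the three filter rules, and the sample rows are all assumptions made for this example; the column names follow the public TLC schema.

```python
import sqlite3


def build_silver(conn: sqlite3.Connection) -> None:
    """Create silver_trips: filtered bronze rows enriched with pickup zone names."""
    conn.executescript("""
        DROP TABLE IF EXISTS silver_trips;
        CREATE TABLE silver_trips AS
        SELECT
            t.tpep_pickup_datetime,
            t.PULocationID,
            z.Borough AS pickup_borough,
            z.Zone    AS pickup_zone,
            t.trip_distance,
            t.fare_amount
        FROM bronze_trips AS t
        LEFT JOIN zone_lookup AS z
            ON z.LocationID = t.PULocationID
        -- Assumed quality rules (not taken from the workshop material):
        -- the fare must be present and positive, the distance must be non-zero.
        WHERE t.fare_amount IS NOT NULL
          AND t.fare_amount > 0
          AND t.trip_distance > 0;
    """)


# Tiny in-memory demo with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bronze_trips (
        tpep_pickup_datetime TEXT, PULocationID INTEGER,
        trip_distance REAL, fare_amount REAL);
    CREATE TABLE zone_lookup (
        LocationID INTEGER, Borough TEXT, Zone TEXT);
""")
conn.executemany("INSERT INTO bronze_trips VALUES (?, ?, ?, ?)", [
    ("2024-01-05 08:15:00", 161, 2.4, 14.5),   # passes every rule
    ("2024-01-05 08:40:00", 132, 0.0, 70.0),   # zero distance, filtered out
    ("2024-01-05 09:05:00", 236, 1.1, None),   # null fare, filtered out
])
conn.executemany("INSERT INTO zone_lookup VALUES (?, ?, ?)", [
    (161, "Manhattan", "Midtown Center"),
    (132, "Queens", "JFK Airport"),
    (236, "Manhattan", "Upper East Side North"),
])

build_silver(conn)
silver_rows = conn.execute(
    "SELECT pickup_borough, pickup_zone, fare_amount FROM silver_trips"
).fetchall()
print(silver_rows)
```

Dropping the failing rows is only one option; a Silver model could equally keep them and add a flag column so the rejected volume stays visible to Gold.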

Data source

Azure Data Lake Storage Gen2 (ADLS Gen2): mhpdeworkshopsa / nyc-taxi-data
├── raw/trips/          → yellow_tripdata_*.parquet
└── raw/lookup/         → taxi_zone_lookup.csv
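
For orientation, the layout above could be turned into storage URIs roughly as follows. This is a sketch under two assumptions that the page does not state outright: that mhpdeworkshopsa is the storage account and nyc-taxi-data is the container, and that the trip files follow the public TLC monthly naming (yellow_tripdata_YYYY-MM.parquet). The helper names abfss_uri and trip_file_uri are hypothetical.

```python
from datetime import date

ACCOUNT = "mhpdeworkshopsa"   # assumption: this is the storage account name
CONTAINER = "nyc-taxi-data"   # assumption: this is the container name


def abfss_uri(path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{CONTAINER}@{ACCOUNT}.dfs.core.windows.net/{path.lstrip('/')}"


def trip_file_uri(month: date) -> str:
    """URI of one monthly trip file, assuming the TLC YYYY-MM naming pattern."""
    return abfss_uri(f"raw/trips/yellow_tripdata_{month:%Y-%m}.parquet")


LOOKUP_URI = abfss_uri("raw/lookup/taxi_zone_lookup.csv")

print(trip_file_uri(date(2024, 1, 1)))
print(LOOKUP_URI)
```

Building the URIs in one place keeps the account and container names out of individual pipeline steps, which makes it easier to point the same code at a different environment.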

Technical detail: Architecture · Data model

Priya’s five KPI questions

  1. When are our peak revenue hours?
  2. Which pickup zones drive the most trips?
  3. Which routes are most popular?
  4. How does revenue vary by borough and distance band?
  5. Are we losing money on bad data (null fares, zero-distance trips)?

All five map to the 12 Gold tables in the data model reference.
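
To make questions 1 and 5 concrete, here is a minimal sketch of how they might be phrased in SQL, again run through Python's built-in sqlite3 module rather than any of the workshop platforms. The table name trips, the choice of fare_amount as the revenue measure, the validity filter, and the sample rows are assumptions for illustration; they are not the Gold table definitions from the data model reference.

```python
import sqlite3

# Question 1: revenue by pickup hour, counting only rows that look valid.
PEAK_HOURS_SQL = """
    SELECT CAST(strftime('%H', tpep_pickup_datetime) AS INTEGER) AS pickup_hour,
           COUNT(*)                    AS trips,
           ROUND(SUM(fare_amount), 2)  AS revenue
    FROM trips
    WHERE fare_amount > 0 AND trip_distance > 0
    GROUP BY pickup_hour
    ORDER BY revenue DESC
"""

# Question 5: how many rows are suspect, and how much of the total is that?
BAD_DATA_SQL = """
    SELECT SUM(CASE WHEN fare_amount IS NULL THEN 1 ELSE 0 END) AS null_fares,
           SUM(CASE WHEN trip_distance = 0 THEN 1 ELSE 0 END)   AS zero_distance_trips,
           COUNT(*)                                             AS total_trips
    FROM trips
"""

# Made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (tpep_pickup_datetime TEXT, trip_distance REAL, fare_amount REAL)"
)
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", [
    ("2024-01-05 08:15:00", 2.4, 14.5),
    ("2024-01-05 08:40:00", 5.0, 26.0),
    ("2024-01-05 17:05:00", 1.1, 9.0),
    ("2024-01-05 17:30:00", 0.0, 70.0),   # zero-distance trip
    ("2024-01-05 23:10:00", 3.2, None),   # null fare
])

peak_hours = conn.execute(PEAK_HOURS_SQL).fetchall()
bad_data = conn.execute(BAD_DATA_SQL).fetchone()
print(peak_hours)
print(bad_data)
```

In this sample the 70.00 fare on the zero-distance trip is excluded from the hourly ranking; without the filter it would push hour 17 ahead of hour 8, which is the kind of distortion question 5 is meant to surface. The strftime call is SQLite-specific, so the hour extraction would be written differently on Snowflake.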

The Three Constraints

Tip: Three constraints to weigh all day

Marcus will judge every recommendation against three orthogonal dimensions. Keep them in mind from the story introduction through Module 7; Elena revisits them in the wrap-up.

| Constraint  | Marcus’s question                         | Where it lands               |
|-------------|-------------------------------------------|------------------------------|
| Cost        | Can YellowLine NYC afford this in year 3? | Module 7 Round 2 Card B      |
| Performance | Fast enough for live dispatch later?      | Module 8 streaming           |
| Compliance  | Audit-ready lineage by Q3?                | Module 4 dbt; Module 7 Card D |

By the end of the day you should be able to answer all three questions, not just “which tool is best?”