Aiven Streaming Setup Guide
Kafka cluster, SSL certificates, and secrets for Module 8 (Optional)
Most of the Aiven configuration is done once by the trainer before the workshop. Trainees only need to add SSL credentials to Databricks Secrets and run the SQL files. This guide covers both trainer setup and trainee steps.
Overview
Module 8 uses Aiven Free Kafka as the streaming broker.
| Component | Who sets it up | What it does |
|---|---|---|
| Aiven Kafka cluster | Trainer | Apache Kafka broker (free tier) |
| User Activity generator | Trainer | Publishes simulated events automatically |
| SSL certificates | Trainer downloads, shares | Authenticates consumers to Kafka |
| Databricks Secrets | Trainee | Stores SSL certs for notebook access |
| Relay consumer (00_relay_consumer.py) | Trainer | Writes Kafka events to ADLS2 as NDJSON |
| Snowflake stage + Snowpipe | Trainee (SQL file) | Loads NDJSON files into Bronze table |
Trainer Setup Steps
1. Create the Aiven Kafka cluster
- Go to aiven.io → Login → Create a new service
- Select Apache Kafka
- Choose Free plan (no credit card required)
- Region: azure-westeurope (same as workshop storage)
- Service name: mhp-de-workshop-kafka
- Click Create service (takes ~2 minutes to provision)
Free plan limits:
- 2 partitions per topic (maximum)
- 250 KiB/s throughput
- 5 topics maximum
- Idle shutdown after 24 hours of inactivity
- One free Kafka service per organisation
2. Start the User Activity generator
- Open the Aiven Console → select your Kafka service
- Navigate to Sample data tab
- Select User Activity scenario
- Duration: 4 hours (maximum for free tier)
- Click Start generating; events start flowing immediately to a user-activity topic
The generator publishes events in Avro format using the built-in Karapace Schema Registry.
Event schema:
| Field | Type | Example |
|---|---|---|
| timestamp | ISO8601 string | 2026-04-04T10:23:45Z |
| user_id | string | user-abc123 |
| action | string | view, click, scroll, search, purchase |
| page | string | /products/shoes |
| country | string | DE, US, GB, FR |
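As a quick sanity check on decoded events, the schema table above can be turned into a small validator. This is a minimal sketch, not part of the workshop code: the names EXPECTED_FIELDS and validate_event are hypothetical, and it assumes events have already been decoded from Avro into Python dicts.

```python
from datetime import datetime

# Field names and types taken from the event schema table above.
EXPECTED_FIELDS = {
    "timestamp": str,
    "user_id": str,
    "action": str,
    "page": str,
    "country": str,
}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems found in a decoded event (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(f"wrong type for {name}: {type(event[name]).__name__}")
    ts = event.get("timestamp")
    if isinstance(ts, str):
        try:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
            # so normalise it to an explicit UTC offset first.
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {ts}")
    return problems


# Sample values copied from the Example column of the table.
sample = {
    "timestamp": "2026-04-04T10:23:45Z",
    "user_id": "user-abc123",
    "action": "view",
    "page": "/products/shoes",
    "country": "DE",
}
print(validate_event(sample))
```

An empty list means the event matches the table; anything else points at the field to inspect.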
3. Download SSL certificates
- In the Aiven Console → your Kafka service → Connection information
- Under Available protocols → Kafka, download all three:
  - CA Certificate → save as ca.pem
  - Access Certificate → save as service.cert
  - Access Key → save as service.key
- Note the Service URI, in the format kafka-xxxxx.aivencloud.com:12345
Share the Service URI and cert content with trainees via the workshop credential sheet. Never commit cert files to Git.
4. Start the relay consumer
The relay consumer (streaming/snowflake/00_relay_consumer.py) reads from Kafka and writes NDJSON files to ADLS2 so Snowpipe can ingest them.
pip install kafka-python fastavro azure-storage-blob
export AIVEN_BOOTSTRAP_SERVERS="kafka-xxxxx.aivencloud.com:12345"
export AIVEN_TOPIC="user-activity"
export AIVEN_CA_CERT_PATH="/path/to/ca.pem"
export AIVEN_CLIENT_CERT_PATH="/path/to/service.cert"
export AIVEN_CLIENT_KEY_PATH="/path/to/service.key"
export AZURE_STORAGE_ACCOUNT="mhpdeworkshopsa"
export AZURE_STORAGE_KEY="<storage-account-key>"
python streaming/snowflake/00_relay_consumer.py
Files land in: https://mhpdeworkshopsa.blob.core.windows.net/nyc-taxi-data/streaming/user-activity/
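To show how the exported variables might be used, here is a rough sketch of two pieces a relay of this kind needs: mapping the AIVEN_* variables to kafka-python's SSL arguments, and serialising events as NDJSON. It is not the actual 00_relay_consumer.py; the helper names consumer_kwargs and to_ndjson are hypothetical.

```python
import json
import os

# Variable names as exported in the commands above.
REQUIRED_VARS = (
    "AIVEN_BOOTSTRAP_SERVERS",
    "AIVEN_TOPIC",
    "AIVEN_CA_CERT_PATH",
    "AIVEN_CLIENT_CERT_PATH",
    "AIVEN_CLIENT_KEY_PATH",
)


def consumer_kwargs(env=os.environ) -> dict:
    """Map the exported variables to kafka-python KafkaConsumer arguments."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "bootstrap_servers": env["AIVEN_BOOTSTRAP_SERVERS"],
        "security_protocol": "SSL",
        "ssl_cafile": env["AIVEN_CA_CERT_PATH"],
        "ssl_certfile": env["AIVEN_CLIENT_CERT_PATH"],
        "ssl_keyfile": env["AIVEN_CLIENT_KEY_PATH"],
    }


def to_ndjson(events) -> str:
    """Serialise decoded events as NDJSON: one JSON object per line."""
    return "".join(
        json.dumps(event, separators=(",", ":")) + "\n" for event in events
    )
```

With kafka-python installed, the result would be passed as KafkaConsumer(os.environ["AIVEN_TOPIC"], **consumer_kwargs()). Failing early on a missing variable is easier to diagnose than an SSL error raised later during the handshake.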
Trainee Setup Steps
1. Add Aiven credentials to Databricks Secrets
The Databricks streaming notebooks read SSL credentials from the Databricks Secrets store. The trainer will share the values during the session.
Open a cluster terminal or notebook and run:
# Install Databricks CLI if not present
pip install databricks-cli
# Create the secrets scope (if it doesn't exist)
databricks secrets create-scope --scope workshop-scope
# Add each secret (values provided by trainer)
databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers
databricks secrets put --scope workshop-scope --key aiven-ca-cert
databricks secrets put --scope workshop-scope --key aiven-client-cert
databricks secrets put --scope workshop-scope --key aiven-client-key
databricks secrets put --scope workshop-scope --key aiven-topic
You can also manage secrets in the Databricks workspace UI: Settings → Admin Console → Secrets, or via the Secrets UI (if enabled in your workspace).
Secret names used by the notebooks:
| Secret key | Value source |
|---|---|
| workshop-scope/aiven-bootstrap-servers | Service URI from Aiven Console |
| workshop-scope/aiven-ca-cert | Contents of ca.pem |
| workshop-scope/aiven-client-cert | Contents of service.cert |
| workshop-scope/aiven-client-key | Contents of service.key |
| workshop-scope/aiven-topic | user-activity |
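For orientation, the sketch below shows one way a notebook could turn these five secrets into Kafka source options for Structured Streaming. It is an assumption, not a copy of the workshop notebooks: the helper name kafka_options is hypothetical, and passing PEM content inline relies on the Kafka client's PEM support (client 2.7 or later, with a PKCS#8 private key). The notebooks may write the certificates to files instead.

```python
SCOPE = "workshop-scope"


def kafka_options(get_secret) -> dict:
    """Build Structured Streaming Kafka options from the workshop secrets.

    get_secret is any callable taking (scope, key); in a Databricks notebook
    that would be dbutils.secrets.get.
    """
    return {
        "kafka.bootstrap.servers": get_secret(SCOPE, "aiven-bootstrap-servers").strip(),
        "subscribe": get_secret(SCOPE, "aiven-topic").strip(),
        "kafka.security.protocol": "SSL",
        # Assumption: PEM material is passed inline rather than via files.
        "kafka.ssl.truststore.type": "PEM",
        "kafka.ssl.truststore.certificates": get_secret(SCOPE, "aiven-ca-cert"),
        "kafka.ssl.keystore.type": "PEM",
        "kafka.ssl.keystore.certificate.chain": get_secret(SCOPE, "aiven-client-cert"),
        "kafka.ssl.keystore.key": get_secret(SCOPE, "aiven-client-key"),
    }
```

In a notebook this could be used as spark.readStream.format("kafka").options(**kafka_options(dbutils.secrets.get)).load(). Taking the lookup function as a parameter keeps the helper testable outside Databricks with a plain dictionary lookup.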
2. Install Maven libraries on your Databricks cluster
The streaming notebooks require two Maven libraries that are not in the standard Databricks Runtime.
- Open your Databricks workspace → Compute → select your cluster
- Click Libraries tab → Install new
- Source: Maven — install both:
| Coordinates | Purpose |
|---|---|
| org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 | Kafka source/sink for Structured Streaming |
| org.apache.spark:spark-avro_2.12:3.5.0 | Avro decoding (from_avro) |
- Restart the cluster after installing
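One detail worth knowing when decoding with from_avro: messages produced through a Confluent-compatible schema registry such as Karapace are normally framed with a 5-byte header (a zero magic byte plus a 4-byte big-endian schema ID) ahead of the Avro payload, and from_avro in the open-source spark-avro package expects plain Avro bytes. The sketch below illustrates that framing in plain Python. It assumes the User Activity generator uses this wire format, and split_wire_format is a hypothetical helper name.

```python
import struct

MAGIC_BYTE = 0  # first byte of the registry wire format


def split_wire_format(message: bytes) -> tuple[int, bytes]:
    """Split a registry-framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte header")
    # ">bI" = big-endian: 1 signed byte (magic) + 4-byte unsigned int (schema ID)
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]
```

In a Spark notebook the same strip is commonly written as expr("substring(value, 6)") before calling from_avro; whether the workshop notebooks do this is not stated here.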
3. Set up Snowflake for streaming (Bronze + Snowpipe)
- Open a Snowsight SQL Worksheet
- Open streaming/snowflake/01_setup_streaming.sql
- Replace {ATTENDEE_ID} with your assigned ID (uppercase)
- Replace <sas-token-from-trainer> with the SAS token provided during the session
- Run all statements
- Verify the Snowpipe is created:
SELECT SYSTEM$PIPE_STATUS(
'DE_MASTERCLASS.{ATTENDEE_ID}_STREAMING.STREAMING_PIPE_USER_ACTIVITY'
);
Verify Everything is Working
Databricks: confirm the Kafka connection (Bronze notebook cell 1)
# Run first cell of 01_streaming_bronze.py
# Should print: "Aiven SSL config ready"
# Should NOT raise: AuthenticationException or ssl.SSLError
Snowflake: confirm Bronze rows are landing
-- Run this every 30 seconds after relay consumer starts
SELECT COUNT(*), MAX(ingest_ts) AS last_row
FROM DE_MASTERCLASS.{ATTENDEE_ID}_STREAMING.STREAMING_BRONZE_USER_ACTIVITY;
Expect first rows within 1–2 minutes of the relay consumer starting.
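If the count stays at zero, the JSON string returned by SYSTEM$PIPE_STATUS can help narrow down where files are stuck. The sketch below condenses two of its fields, executionState and pendingFileCount. The helper name summarize_pipe_status and the wording of its hints are hypothetical, and the example string is made up for illustration rather than captured from a real pipe.

```python
import json


def summarize_pipe_status(status_json: str) -> str:
    """Condense the JSON string returned by SYSTEM$PIPE_STATUS."""
    status = json.loads(status_json)
    state = status.get("executionState", "UNKNOWN")
    pending = status.get("pendingFileCount", 0)
    if state != "RUNNING":
        return f"pipe is {state}; check the pipe definition and stage access"
    if pending > 0:
        return f"pipe is running with {pending} file(s) still queued"
    return "pipe is running with nothing queued; check that files are reaching the stage"


# Illustrative input only, not real output from the workshop pipe.
example = '{"executionState": "RUNNING", "pendingFileCount": 3}'
print(summarize_pipe_status(example))
```

A running pipe with nothing queued usually points upstream, at the relay consumer or the storage container, rather than at Snowflake.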
Troubleshooting
| Issue | Solution |
|---|---|
| ssl.SSLError: certificate verify failed | Check that the aiven-ca-cert secret contains the full content of ca.pem (no extra whitespace) |
| AuthenticationException | Verify that the aiven-bootstrap-servers value has the format hostname:port |
| Databricks: ClassNotFoundException: kafka | Maven library not installed or cluster not restarted |
| Databricks: ClassNotFoundException: from_avro | spark-avro library not installed |
| Bronze table empty after 3 min | Confirm relay consumer is running; check ADLS2 container for files |
| Snowpipe not loading | Run ALTER PIPE ... REFRESH; manually; check pipe status |
| Aiven service offline | Free tier shuts down after 24h idle — restart from Aiven Console |
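The first two rows of the table can be checked before storing the values as secrets. The following is a minimal sketch under stated assumptions: check_secret_values is a hypothetical helper, the hostname pattern is deliberately loose, and a single trailing newline (as found at the end of most PEM files) is treated as acceptable.

```python
import re

# Loose hostname:port pattern; an illustration, not a full hostname validator.
HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def check_secret_values(bootstrap_servers: str, ca_cert: str) -> list[str]:
    """Return problems with the bootstrap address and CA cert (empty if none)."""
    problems = []
    if not HOST_PORT.match(bootstrap_servers):
        problems.append("bootstrap servers must look like hostname:port")
    # Assumption: one trailing newline is tolerated, anything else is "extra".
    trimmed = ca_cert[:-1] if ca_cert.endswith("\n") else ca_cert
    if trimmed != trimmed.strip():
        problems.append("CA cert has extra leading or trailing whitespace")
    body = ca_cert.strip()
    if not (body.startswith("-----BEGIN CERTIFICATE-----")
            and body.endswith("-----END CERTIFICATE-----")):
        problems.append("CA cert is not a complete PEM certificate")
    return problems
```

For example, check_secret_values("kafka-xxxxx.aivencloud.com", cert_text) reports the missing port before the value ever reaches Databricks.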