Music-Credit Graph Project
Architecture and Technology Study Guide
Initial hardware: ZimaBoard 832 + 1 TB NVMe + four Raspberry Pi 3B nodes + optional x600 compute
Figure: Physical system topology
Guide Map
Section | What it answers
1. Architecture decision | What runs where, and why K3s becomes reasonable with the ZimaBoard.
2. Storage and data layers | How the 1 TB NVMe is organized and which formats store which information.
3. Workload and network design | How the Wi-Fi-connected ZimaBoard coordinates wired Pi workers without becoming a bottleneck.
4. Application and public delivery | How the API, static site, live game, and offline fallback fit together.
5. Build phases | A sequence that proves the game on small data before attempting a full catalog ingest.
6. Technology study matrix | Languages, frameworks, infrastructure tools, and small exercises to learn each one.
7. Study path and glossary | An ordered curriculum and concise definitions.
1. Recommended Architecture Decision
Use the ZimaBoard 832 as the single always-on control and storage node. Install the operating system on its internal eMMC, mount the 1 TB NVMe as the project data volume, connect it to the home LAN with a Linux-supported USB 5 GHz Wi-Fi adapter, and run the K3s server there. The four Raspberry Pis become K3s agents and Python batch workers. The x600 remains optional heavy compute rather than a required control-plane member.
Host the authoritative database; run many Python processes; hold K3s control-plane state.
x600 | Local development; monthly parsing and graph builds; multi-architecture image builds; optional heavy queue worker. | Be required for daily availability.
Cloudflare | Static site delivery; tunnel ingress; caching; rate limiting; durable public exports. | Store the raw working catalog or replace the canonical local data store initially.
2. Physical Setup and Wi-Fi Constraints
2.1 ZimaBoard hardware
PCIe slot: occupied by the passive NVMe adapter and 1 TB NVMe drive.
Wi-Fi: use a USB adapter because the PCIe slot is no longer available. Prefer a chipset supported directly by the Linux kernel; avoid adapters that depend on fragile out-of-tree DKMS drivers.
Boot: keep the OS on eMMC at first. Use the NVMe for /srv/music-graph and relocate K3s data, databases, container storage, and project files there.
Filesystem: ext4 is the simplest choice for a single-drive experimental server. Use no RAID; protect only irreplaceable state with backups.
Addressing: reserve the ZimaBoard IP in eero DHCP, give it a stable hostname, disable Wi-Fi power saving, and monitor reconnection health.
2.2 What Wi-Fi changes
Traffic type | Expected behavior | Design response
K3s control traffic | Low volume; acceptable over stable Wi-Fi. | Use a reserved IP and automatic reconnect. Expect the cluster scheduler to pause during Wi-Fi loss.
PostgreSQL / queue messages | Small requests and results; normally fine. | Workers communicate through APIs/queues rather than opening database files.
Monthly raw and Parquet transfers | Tens of gigabytes; slow but infrequent. | Schedule overnight, resume interrupted transfers, verify checksums, and copy to one worker at a time.
Random graph traversal from Pis | Potentially latency-sensitive and chatty. | Do not traverse the graph over NFS. Replicate immutable graph snapshots to local Pi storage.
Public API traffic | Very small relative to data ingest. | Use Cloudflare Tunnel and cache popular results.
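The "verify checksums" step for the monthly transfers could look roughly like the sketch below. It assumes SHA-256 digests are recorded in a manifest; the guide does not name a hash algorithm, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-gigabyte snapshot never sits in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: Path, expected_hex: str) -> bool:
    """Return True only if the copied file exists and matches the recorded digest."""
    return path.is_file() and sha256_of(path) == expected_hex.strip().lower()
```

A worker would call verify_transfer on each file after the copy finishes and re-request any file that fails, which matters more on a Pi 3B than the hashing cost does.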
Upgrade path: when an extra switch port becomes available, wire the ZimaBoard to the shelf switch. The application architecture remains unchanged; only the network interface and IP reservation change.
3. Data Storage Architecture
Figure: Music-credit graph data flow
3.1 Storage layers
Layer | Format / system | Purpose | Durability
Raw source | Original XML.gz snapshots | Reproducible source inputs; current and previous month retained. | Redownloadable, but keep checksums and manifests.
Normalized catalog | Partitioned Parquet | Broad artist, release, track, role, format, and credit records; supports future pivots. | The most important reusable data product.
Analytical workspace | DuckDB over Parquet | Transformations, joins, data-quality tests, statistics, and graph-build queries. |
Do not store the graph solely as relational artist-to-artist pairs. Keep a bipartite artist → release → artist representation; otherwise a release with n contributors expands into n(n - 1)/2 pair edges, which grows quadratically with the size of the credit list. Rule-specific graphs are derived from the normalized credit records.
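To make the trade-off concrete: a release with 100 credited contributors needs 100 artist-to-release edges in the bipartite form but 4,950 edges as artist pairs. The sketch below counts pair edges and runs a breadth-first search that alternates artist → release → artist. The release and artist identifiers are made-up toy data, and the in-memory dictionaries stand in for whatever graph file format the project ends up using.

```python
from collections import deque
from math import comb

# Hypothetical toy data: release id -> credited artist ids.
RELEASE_CREDITS = {
    "r1": ["a1", "a2", "a3"],
    "r2": ["a3", "a4"],
    "r3": ["a4", "a5"],
}


def pair_edge_count(contributors: int) -> int:
    """Artist-to-artist pairs that one release would expand into."""
    return comb(contributors, 2)


def index_artists(release_credits):
    """Build the reverse direction of the bipartite graph: artist -> releases."""
    artist_releases = {}
    for release, artists in release_credits.items():
        for artist in artists:
            artist_releases.setdefault(artist, []).append(release)
    return artist_releases


def shortest_path(start, goal, release_credits):
    """Breadth-first search alternating artist -> release -> artist."""
    artist_releases = index_artists(release_credits)
    if start not in artist_releases or goal not in artist_releases:
        return None
    previous = {start: None}  # artist -> (previous artist, connecting release)
    seen_releases = set()     # each release only needs to be expanded once
    queue = deque([start])
    while queue:
        artist = queue.popleft()
        if artist == goal:
            break
        for release in artist_releases[artist]:
            if release in seen_releases:
                continue
            seen_releases.add(release)
            for neighbor in release_credits[release]:
                if neighbor not in previous:
                    previous[neighbor] = (artist, release)
                    queue.append(neighbor)
    if goal not in previous:
        return None
    # Walk back from the goal, recording the release that links each hop.
    path = [goal]
    node = goal
    while previous[node] is not None:
        artist, release = previous[node]
        path.extend([release, artist])
        node = artist
    return path[::-1]


print(pair_edge_count(100))
print(shortest_path("a1", "a5", RELEASE_CREDITS))
```

Because the path keeps the connecting releases, the game can show why two artists are linked, which a pair-only edge list would have to look up separately.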
3.2 Suggested NVMe directory layout
/srv/music-graph/
  raw/<snapshot>/               original XML.gz + checksums
  curated/<snapshot>/           partitioned Parquet tables
  graph/<snapshot>/<ruleset>/   immutable graph files
  postgres/                     PostgreSQL data directory
  redis/                        optional persistence
  exports/                      public JSON and SVG
  staging/                      temporary build files
  backups/                      compressed app-state backups
  manifests/                    schema, source, and checksum metadata
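One way to keep this layout consistent across scripts is to create and resolve the paths from a single helper. This is a minimal sketch: the directory names come from the layout above, while the function names and the snapshot and rule-set labels in the usage lines are placeholders.

```python
from pathlib import Path

# Top-level directories taken from the layout above.
TOP_LEVEL_DIRS = (
    "raw", "curated", "graph", "postgres", "redis",
    "exports", "staging", "backups", "manifests",
)


def create_layout(root: Path) -> list[Path]:
    """Create every top-level directory; safe to run repeatedly."""
    created = []
    for name in TOP_LEVEL_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def snapshot_paths(root: Path, snapshot: str, ruleset: str) -> dict[str, Path]:
    """Resolve the per-snapshot directories without creating them."""
    return {
        "raw": root / "raw" / snapshot,
        "curated": root / "curated" / snapshot,
        "graph": root / "graph" / snapshot / ruleset,
    }
```

For example, create_layout(Path("/srv/music-graph")) prepares the volume once, and snapshot_paths(Path("/srv/music-graph"), "2025-01", "default") returns the three directories a monthly build would write to, where "2025-01" and "default" are illustrative labels only.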
Allocation target | Planning allowance | Retention policy
Raw snapshots | 40–60 GB | Current + previous; older raw inputs can be redownloaded.
Curated Parquet | 100–200 GB | Several schema/snapshot versions while the parser evolves.
Graph snapshots | 80–150 GB | Current, previous, and a few rule sets.
PostgreSQL + indexes | 40–100 GB | Keep app state and selected searchable metadata, not every analytical row.
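As a quick sanity check, the four allowances listed here can be summed against the nominal drive size. The sketch covers only these rows; staging, backups, exports, and Redis are not budgeted in it, and usable capacity after formatting is somewhat below the nominal 1 TB.

```python
# Planning allowances in GB as (low, high), copied from the rows above.
ALLOWANCES_GB = {
    "Raw snapshots": (40, 60),
    "Curated Parquet": (100, 200),
    "Graph snapshots": (80, 150),
    "PostgreSQL + indexes": (40, 100),
}
DRIVE_GB = 1000  # nominal 1 TB NVMe; an assumption for rough planning only


def budget_totals(allowances):
    """Sum the low and high ends of every planning range."""
    low = sum(lo for lo, _ in allowances.values())
    high = sum(hi for _, hi in allowances.values())
    return low, high


low, high = budget_totals(ALLOWANCES_GB)
print(f"planned: {low}-{high} GB; headroom at the upper bound: {DRIVE_GB - high} GB")
# prints: planned: 260-510 GB; headroom at the upper bound: 490 GB
```

Even at the upper bound the listed layers use about half the drive, which leaves room for temporary build files during a full rebuild.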
Security boundary: Cloudflare Tunnel creates outbound connections from the ZimaBoard, so the router needs no inbound port forwarding. Restrict CORS to the site domain, validate inputs, enforce timeouts and maximum search depth, and expose only the API service, not PostgreSQL, Redis, K3s, SSH, or storage shares.
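The input-validation and search-depth rules could be enforced before any graph work starts, roughly as in the framework-independent sketch below. Every concrete value is an assumption: the depth cap of 6, the placeholder origin, numeric artist IDs, and the parameter names start, goal, and depth are not specified in this guide. Timeouts are not shown here.

```python
from dataclasses import dataclass

MAX_SEARCH_DEPTH = 6                        # hypothetical cap
MAX_ID_LENGTH = 32                          # hypothetical bound on id length
ALLOWED_ORIGINS = {"https://example.com"}   # placeholder for the real site domain


@dataclass(frozen=True)
class SearchRequest:
    start_artist: str
    goal_artist: str
    max_depth: int


def is_allowed_origin(origin) -> bool:
    """Exact-match check used when deciding whether to send CORS headers."""
    return origin in ALLOWED_ORIGINS


def parse_search_request(params: dict) -> SearchRequest:
    """Validate untrusted query parameters; raise ValueError on bad input."""
    ids = []
    for key in ("start", "goal"):
        value = str(params.get(key, "")).strip()
        # Assumes artist ids are plain ASCII digits.
        if not (value.isascii() and value.isdigit()) or len(value) > MAX_ID_LENGTH:
            raise ValueError(f"{key} must be a numeric artist id")
        ids.append(value)
    try:
        depth = int(params.get("depth", MAX_SEARCH_DEPTH))
    except (TypeError, ValueError):
        raise ValueError("depth must be an integer") from None
    if not 1 <= depth <= MAX_SEARCH_DEPTH:
        raise ValueError(f"depth must be between 1 and {MAX_SEARCH_DEPTH}")
    return SearchRequest(ids[0], ids[1], depth)
```

Rejecting an over-deep request outright, rather than silently clamping it, keeps cached responses predictable because the same query string always means the same search.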
8. Repository and Development Workflow
music-credit-graph/
  apps/api/              FastAPI application
  apps/frontend/         Astro + Svelte project
  packages/catalog/      XML parsing and normalization
  packages/graph_core/   graph format and algorithms
  packages/game_rules/   configurable connection rules
  packages/workers/      RQ jobs and research tasks
  infra/ansible/         operating-system provisioning
  infra/k8s/             Kustomize manifests
  data/contracts/        schemas, fixtures, manifests
  tests/                 unit, integration, property, browser
Development concern | Recommendation
Python environment | pyproject.toml with uv; lock dependencies; build reproducible containers.
Quality | Ruff for formatting/linting, mypy for selected typed boundaries, pytest for unit and integration tests.
Data testing | Tiny checked-in XML fixtures, schema tests, count checks, and known-path golden tests.
Containers | Docker Buildx multi-platform images for amd64 and arm64; push to GHCR.
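The data-testing row can be illustrated with a pytest-style test over a tiny fixture. The XML below is invented for the example: the element and attribute names (release, credit, artist_id, role) are assumptions and will differ from the real source dump's schema. In the repository the fixture would live in a checked-in file rather than an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture; real element names depend on the source dump's schema.
FIXTURE_XML = """
<releases>
  <release id="1">
    <title>First</title>
    <credit artist_id="10" role="Producer"/>
    <credit artist_id="11" role="Bass"/>
  </release>
  <release id="2">
    <title>Second</title>
    <credit artist_id="11" role="Vocals"/>
  </release>
</releases>
"""


def parse_credits(xml_text: str) -> list[dict]:
    """Flatten release credits into one row per (release, artist, role)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for release in root.iter("release"):
        for credit in release.iter("credit"):
            rows.append({
                "release_id": int(release.get("id")),
                "artist_id": int(credit.get("artist_id")),
                "role": credit.get("role"),
            })
    return rows


def test_fixture_counts_and_schema():
    rows = parse_credits(FIXTURE_XML)
    # Count check: three credits across two releases.
    assert len(rows) == 3
    # Schema check: every row carries exactly the expected columns.
    assert all(set(row) == {"release_id", "artist_id", "role"} for row in rows)
    # Golden check: artist 11 links release 1 and release 2.
    linked = {row["release_id"] for row in rows if row["artist_id"] == 11}
    assert linked == {1, 2}
```

Because the fixture is a few lines long, a failing count or golden check points directly at a parser regression instead of a data change.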