Music-Credit Graph Project

Architecture and Technology Study Guide

Initial hardware: ZimaBoard 832 + 1 TB NVMe + four Raspberry Pi 3B nodes + optional x600 compute

Guide Map

Topic	What it answers
Architecture decision (section 1)	What runs where, and why K3s becomes reasonable with the ZimaBoard.
Storage and data layers (section 3)	How the 1 TB NVMe is organized and which formats store which information.
Workload and network design (sections 2 and 4)	How the Wi-Fi-connected ZimaBoard coordinates wired Pi workers without becoming a bottleneck.
Application and public delivery (sections 6 and 7)	How the API, static site, live game, and offline fallback fit together.
Build phases (section 9)	A sequence that proves the game on small data before attempting a full catalog ingest.
Technology study matrix (section 10)	Languages, frameworks, infrastructure tools, and small exercises to learn each one.
Study path and glossary (sections 11 and 13)	An ordered curriculum and concise definitions.

1. Recommended Architecture Decision

Use the ZimaBoard 832 as the single always-on control and storage node. Install the operating system on its internal eMMC, mount the 1 TB NVMe as the project data volume, connect it to the home LAN with a Linux-supported USB 5 GHz Wi-Fi adapter, and run the K3s server there. The four Raspberry Pis become K3s agents and Python batch workers. The x600 remains optional heavy compute rather than a required control-plane member.

Machine	Primary responsibilities	What it should not do
ZimaBoard 832	K3s server; PostgreSQL; Redis/RQ; FastAPI; graph service; authoritative files; Cloudflare Tunnel; scheduling.	Repeatedly send random graph reads over Wi-Fi; perform every heavy full-catalog rebuild.
Pi 3B ×4	Read-only batch workers; path sampling; challenge scoring; network statistics; validation; local snapshot processing.	Host the authoritative database; run many Python processes; hold K3s control-plane state.
x600	Local development; monthly parsing and graph builds; multi-architecture image builds; optional heavy queue worker.	Be required for daily availability.
Cloudflare	Static site delivery; tunnel ingress; caching; rate limiting; durable public exports.	Store the raw working catalog or replace the canonical local data store initially.

2. Physical Setup and Wi-Fi Constraints

2.1 ZimaBoard hardware

PCIe slot: occupied by the passive NVMe adapter and 1 TB NVMe drive.
Wi-Fi: use a USB adapter because the PCIe slot is no longer available. Prefer a chipset supported directly by the Linux kernel; avoid adapters that depend on fragile out-of-tree DKMS drivers.
Boot: keep the OS on eMMC at first. Use the NVMe for /srv/music-graph and relocate K3s data, databases, container storage, and project files there.
Filesystem: ext4 is the simplest choice for a single-drive experimental server. Use no RAID; protect only irreplaceable state with backups.
Addressing: reserve the ZimaBoard IP in eero DHCP, give it a stable hostname, disable Wi-Fi power saving, and monitor reconnection health.

2.2 What Wi-Fi changes

Traffic type	Expected behavior	Design response
K3s control traffic	Low volume; acceptable over stable Wi-Fi.	Use a reserved IP and automatic reconnect. During Wi-Fi loss the Pi agents cannot reach the K3s server, so expect scheduling and status updates to pause until it reconnects.
PostgreSQL / queue messages	Small requests and results; normally fine.	Workers communicate through APIs/queues rather than opening database files.
Monthly raw and Parquet transfers	Tens of gigabytes; slow but infrequent.	Schedule overnight, resume transfers, verify checksums, and copy to one worker at a time.
Random graph traversal from Pis	Potentially latency-sensitive and chatty.	Do not traverse the graph over NFS. Replicate immutable graph snapshots to local Pi storage.
Public API traffic	Very small relative to data ingest.	Use Cloudflare Tunnel and cache popular results.
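
The "verify checksums" and "replicate immutable graph snapshots" responses in the table could look roughly like the following on a Pi worker. This is a minimal sketch under stated assumptions, not the project's sync code: the function names (sha256_of, activate_snapshot), the "current" symlink convention, and the shape of the expected-checksum mapping are all hypothetical.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large packages never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def activate_snapshot(root: Path, snapshot: str, expected: dict[str, str]) -> None:
    """Point root/current at root/<snapshot> only if every file matches.

    `expected` maps file names to SHA-256 hex digests (hypothetical shape).
    """
    snapshot_dir = root / snapshot
    for name, want in expected.items():
        if sha256_of(snapshot_dir / name) != want:
            raise ValueError(f"checksum mismatch for {name}")
    # Create the new symlink under a temporary name, then rename it over the
    # old one. The rename is atomic on POSIX, so readers see old or new.
    tmp_link = root / f".current.{os.getpid()}.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(snapshot, target_is_directory=True)
    os.replace(tmp_link, root / "current")
```

A worker that always opens files through root/current keeps reading the previous snapshot until the new one has passed verification, and a failed check leaves the old link untouched.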

Upgrade path: when an extra switch port becomes available, wire the ZimaBoard to the shelf switch. The application architecture remains unchanged; only the network interface and IP reservation change.

3. Data Storage Architecture

3.1 Storage layers

Layer	Format / system	Purpose	Durability
Raw source	Original XML.gz snapshots	Reproducible source inputs; current and previous month retained.	Redownloadable, but keep checksums and manifests.
Normalized catalog	Partitioned Parquet	Broad artist, release, track, role, format, and credit records; supports future pivots.	The most important reusable data product.
Analytical workspace	DuckDB over Parquet	Transformations, joins, data-quality tests, statistics, and graph-build queries.	Rebuildable from Parquet.
Application metadata	PostgreSQL	Artist search, aliases, challenges, scores, path cache, job records, snapshot registry.	Back up regularly.
Graph snapshot	Compact CSR-style binary arrays + metadata	Fast pathfinding with integer IDs and memory mapping.	Versioned; rebuildable from normalized data.
Transient coordination	Redis + RQ	Work queues, leases, short-lived result caching.	Treat as disposable; important results move to PostgreSQL.
Public export	JSON, SVG, optional compressed index	Daily challenges, aggregate research, cluster status, offline fallback.	Replicated to Cloudflare Pages/R2.

Do not store the graph solely as relational artist-to-artist pairs. Keep a bipartite artist → release → artist representation. Otherwise a release with n contributors generates n(n−1)/2 artist-pair edges, which grows quadratically, whereas the bipartite form needs only n artist-to-release edges. Rule-specific graphs are derived from the normalized credit records.
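
To make the edge-count argument concrete, the short calculation below compares the two representations for a single release. The function names are illustrative only.

```python
def pair_edges(contributors: int) -> int:
    """Artist-to-artist edges needed to fully connect one release: n choose 2."""
    return contributors * (contributors - 1) // 2


def bipartite_edges(contributors: int) -> int:
    """Artist-to-release edges for the same release: one per contributor."""
    return contributors


for n in (2, 10, 50, 200):
    print(n, pair_edges(n), bipartite_edges(n))
# 2 1 2
# 10 45 10
# 50 1225 50
# 200 19900 200
```

For small releases the pair form is slightly smaller, but a compilation or orchestral release with hundreds of credits dominates the pair graph while staying linear in the bipartite one.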

3.2 Suggested NVMe directory layout

/srv/music-graph/
  raw/<snapshot>/                 original XML.gz + checksums
  curated/<snapshot>/             partitioned Parquet tables
  graph/<snapshot>/<ruleset>/     immutable graph files
  postgres/                       PostgreSQL data directory
  redis/                          optional persistence
  exports/                        public JSON and SVG
  staging/                        temporary build files
  backups/                        compressed app-state backups
  manifests/                      schema, source, and checksum metadata

Allocation target	Planning allowance	Retention policy
Raw snapshots	40–60 GB	Current + previous; older raw inputs can be redownloaded.
Curated Parquet	100–200 GB	Several schema/snapshot versions while the parser evolves.
Graph snapshots	80–150 GB	Current, previous, and a few rule sets.
PostgreSQL + indexes	40–100 GB	Keep app state and selected searchable metadata, not every analytical row.
Staging/build space	150–250 GB	Clean after successful snapshot publication.
Exports, logs, backups, free space	Remainder	Keep at least 20–25% of the SSD free.
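
As a sanity check on the table, the sketch below sums the low and high ends of the planning allowances against a nominal 1,000 GB drive. The 1,000 GB figure is an assumption (formatted capacity is somewhat lower), and the dictionary keys are shorthand for the table rows.

```python
CAPACITY_GB = 1000  # assumed nominal capacity of the 1 TB NVMe

# (low, high) planning allowances in GB, copied from the table above.
ALLOWANCES = {
    "raw": (40, 60),
    "curated": (100, 200),
    "graph": (80, 150),
    "postgres": (40, 100),
    "staging": (150, 250),
}


def remaining_fraction(bound: int) -> float:
    """Fraction of the drive left with every allowance at its low (0) or high (1) end."""
    used = sum(limits[bound] for limits in ALLOWANCES.values())
    return (CAPACITY_GB - used) / CAPACITY_GB


low_free = remaining_fraction(0)   # 410 GB allocated
high_free = remaining_fraction(1)  # 760 GB allocated
print(f"free at low bounds:  {low_free:.0%}")   # 59%
print(f"free at high bounds: {high_free:.0%}")  # 24%
```

At the upper bounds only about 24% remains, and that remainder also has to hold exports, logs, and backups, so the high ends of every range cannot all be used at once if 20–25% of the SSD is to stay free.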

4. Runtime and Cluster Architecture

4.1 K3s node plan

Node label	Machine	Scheduled components
role=control, storage=nvme, arch=amd64	ZimaBoard 832	K3s server, API, PostgreSQL, Redis, scheduler, publisher, cloudflared, graph-query service.
role=worker, class=pi, arch=arm64	Each Pi 3B	One bounded Python worker, health agent, and snapshot synchronization job.
role=heavy, class=x600, arch=amd64	x600 when enabled	Parser, graph builder, intensive statistics, multi-arch builds, optional GPU experiments.

Use one K3s server with the default SQLite datastore. High availability is unnecessary for this hobby build.
Put /var/lib/rancher/k3s and container data on the NVMe, not the eMMC.
Use K3s local-path storage only on the ZimaBoard for stateful services. Do not deploy Longhorn across 1 GB Pis.
Use node selectors and resource requests/limits. Start with one worker process per Pi and memory limits around 200–300 MB per application worker.
Build linux/amd64 and linux/arm64 container images. Run all Pis on a current 64-bit Raspberry Pi OS Lite image.

4.2 Services

Service	Placement	Role
api	ZimaBoard	FastAPI endpoints for search, evidence, paths, challenge retrieval, and health.
graph-engine	ZimaBoard; library reused by workers	Memory-mapped adjacency traversal; bidirectional BFS and weighted path search.
postgres	ZimaBoard	Search metadata, challenges, path cache, scores, job and snapshot registry.
redis	ZimaBoard	RQ broker and short-lived cache.
scheduler	K3s CronJobs	Daily challenge publication, monthly ingest, cache maintenance, and backup jobs.
worker	One per Pi	Independent batches: sample paths, score difficulty, validate data, estimate graph statistics.
snapshot-server	ZimaBoard	Read-only HTTP endpoint for versioned graph packages and checksums.
publisher	ZimaBoard	Writes sanitized JSON/SVG exports to the public hosting layer.
cloudflared	ZimaBoard	Outbound-only public route to the API.
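
The graph-engine row mentions adjacency traversal with bidirectional BFS. The following is a minimal sketch of that idea over CSR-style offset and neighbor arrays. It assumes artists and releases share one integer ID space and uses plain Python lists in place of memory-mapped arrays; build_csr and bidirectional_bfs are hypothetical names, not the project's API.

```python
from collections import deque


def build_csr(node_count, edges):
    """Build undirected CSR arrays (offsets, neighbors) from an edge list."""
    adjacency = [[] for _ in range(node_count)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    offsets, neighbors = [0], []
    for row in adjacency:
        neighbors.extend(sorted(row))
        offsets.append(len(neighbors))
    return offsets, neighbors


def _join(parents, meeting):
    """Stitch the two parent chains together at the meeting node."""
    path, node = [], meeting
    while node is not None:
        path.append(node)
        node = parents[0][node]
    path.reverse()  # now runs source ... meeting
    node = parents[1][meeting]
    while node is not None:
        path.append(node)
        node = parents[1][node]
    return path


def bidirectional_bfs(offsets, neighbors, source, target):
    """Return one shortest path as a list of node IDs, or None if unreachable."""
    if source == target:
        return [source]
    parents = ({source: None}, {target: None})
    frontiers = (deque([source]), deque([target]))
    while frontiers[0] and frontiers[1]:
        # Expand the smaller frontier by one full level.
        side = 0 if len(frontiers[0]) <= len(frontiers[1]) else 1
        seen, other = parents[side], parents[1 - side]
        for _ in range(len(frontiers[side])):
            node = frontiers[side].popleft()
            for nxt in neighbors[offsets[node]:offsets[node + 1]]:
                if nxt in seen:
                    continue
                seen[nxt] = node
                if nxt in other:
                    return _join(parents, nxt)
                frontiers[side].append(nxt)
    return None


# Artists 0-2 and releases 3-4: 0 and 1 share release 3; 1 and 2 share release 4.
offsets, neighbors = build_csr(5, [(0, 3), (1, 3), (1, 4), (2, 4)])
print(bidirectional_bfs(offsets, neighbors, 0, 2))  # [0, 3, 1, 4, 2]
```

Because each side is expanded a whole level at a time, the first node found in both visited sets lies on a shortest path. The returned list alternates artist and release IDs, which is the evidence chain the API would decorate with metadata.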

5. Data Pipeline and Snapshot Contract

Register a source snapshot with its date, URLs, file sizes, checksums, parser version, and schema version.
Stream the compressed XML with lxml.etree.iterparse; never inflate the whole release file onto disk or load it into memory.
Write broad normalized Parquet tables, preserving raw role text, normalized role category, track position, release/master IDs, formats, country, year, aliases, and provenance.
Use DuckDB to validate counts, find duplicate keys, profile missing values, classify credits, and create stable internal integer IDs.
Build one or more rule-specific bipartite graph snapshots. Produce adjacency arrays, role/flag arrays, searchable metadata, and a manifest.
Run integrity tests: reciprocal adjacency, valid offsets, known fixture paths, disconnected-component checks, and reproducible hashes.
Publish the snapshot to the ZimaBoard and mark it current only after validation. Keep the previous snapshot available for rollback.
Synchronize the graph package to the Pis one at a time. Workers switch snapshots atomically after checksum verification.
Generate path samples, daily challenges, difficulty scores, and aggregate findings. Persist accepted results in PostgreSQL.
Export small public artifacts and invalidate relevant web caches.
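
The streaming step in the pipeline above can be sketched as follows. To stay dependency-free, this example swaps in the standard library's xml.etree.ElementTree.iterparse, whose event loop resembles the lxml one named in the text. The element and attribute names (release, credit, artist, role) are invented for the fixture and are not the Discogs schema.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def iter_release_credits(stream: BinaryIO) -> Iterator[tuple[str, str, str]]:
    """Yield (release_id, artist_id, role) from a gzip-compressed XML stream."""
    with gzip.open(stream, "rb") as xml_bytes:
        context = iter(ET.iterparse(xml_bytes, events=("start", "end")))
        _, root = next(context)  # first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == "release":
                release_id = elem.get("id", "")
                for credit in elem.iter("credit"):
                    yield release_id, credit.get("artist", ""), credit.get("role", "")
                # Detach finished releases from the root so memory stays flat.
                root.clear()


# Hypothetical fixture, compressed in memory to mimic an XML.gz snapshot.
SAMPLE = b"""<releases>
  <release id="r1"><credit artist="a1" role="Bass"/><credit artist="a2" role="Drums"/></release>
  <release id="r2"><credit artist="a2" role="Producer"/></release>
</releases>"""

rows = list(iter_release_credits(io.BytesIO(gzip.compress(SAMPLE))))
print(rows)
```

The generator decompresses and parses incrementally, so the same function could read a multi-gigabyte file opened in binary mode and feed rows to a batched Parquet writer.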

Snapshot manifest example fields: source_month, parser_schema, ruleset, graph_format, artist_count, release_count, credit_count, file checksums, build commit, built_at, and compatibility version.
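
One way to pin these fields down is a small typed record that round-trips through JSON. The sketch below is illustrative: the field types, the class name SnapshotManifest, and every example value are assumptions, since the text lists only field names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SnapshotManifest:
    source_month: str
    parser_schema: str
    ruleset: str
    graph_format: str
    artist_count: int
    release_count: int
    credit_count: int
    file_checksums: dict[str, str]  # file name -> hex digest (assumed shape)
    build_commit: str
    built_at: str  # ISO 8601 timestamp (assumed format)
    compatibility_version: int

    def to_json(self) -> str:
        """Serialize with sorted keys so identical manifests hash identically."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SnapshotManifest":
        """Rebuild a manifest; unknown or missing keys raise TypeError."""
        return cls(**json.loads(text))


# Placeholder values for illustration only.
example = SnapshotManifest(
    source_month="2024-01",
    parser_schema="v1",
    ruleset="performers-only",
    graph_format="csr-u32",
    artist_count=3,
    release_count=2,
    credit_count=4,
    file_checksums={"offsets.u32": "0" * 64},
    build_commit="abc1234",
    built_at="2024-01-15T03:00:00Z",
    compatibility_version=1,
)
restored = SnapshotManifest.from_json(example.to_json())
```

Workers can compare compatibility_version against the graph-format version they were built for and refuse to load a snapshot they do not understand.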

6. Game and API Architecture

6.1 Core endpoints

Endpoint	Purpose	Caching
GET /artists/search?q=	Autocomplete and disambiguation.	Short public cache; normalized query key.
GET /artists/{id}	Artist metadata and a small neighborhood summary.	Longer cache by snapshot.
POST /paths	Shortest or weighted evidence-backed route between two artists.	Cache by endpoints + ruleset + snapshot.
GET /challenges/daily	Current curated challenge and constraints.	Static/public cache.
POST /challenges/{id}/validate	Validate a player-built path against graph evidence.	Do not cache user-specific response.
GET /research/summary	Published aggregate graph findings.	Static/public cache.
GET /health	API, database, graph snapshot, queue, and worker health.	No or very short cache.
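
The "normalized query key" for search caching could be derived along the following lines. This is one possible normalization, not a settled rule: stripping accents and folding case are assumptions about how autocomplete should match, and normalize_query is a hypothetical helper.

```python
import unicodedata


def normalize_query(raw: str) -> str:
    """Fold case, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


print(normalize_query("  Björk "))     # bjork
print(normalize_query("Sigur   Rós"))  # sigur ros
```

Using the normalized string as the cache key lets differently typed spellings of the same query share one cached response.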

The browser never receives database credentials, filesystem paths, or direct SQL access.
All public responses are snapshot-versioned and include evidence links or source identifiers.
The API is read-only for anonymous users except bounded game-validation requests.
Rate-limit path generation and cache common pairs. Reject unbounded neighborhood expansion requests.
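
The bounding and caching rules above might be expressed as a validated request object plus a deterministic cache key. The sketch uses only the standard library; in the real API, Pydantic models would likely fill this role. The names PathRequest, MAX_DEPTH_LIMIT, and cache_key, the depth ceiling of 12, and the choice to ignore endpoint order (which assumes an undirected graph) are all assumptions.

```python
from dataclasses import dataclass

MAX_DEPTH_LIMIT = 12  # hypothetical ceiling; tune against the real graph


@dataclass(frozen=True)
class PathRequest:
    source_id: int
    target_id: int
    ruleset: str
    max_depth: int = 8

    def __post_init__(self) -> None:
        # Reject malformed or unbounded requests before any graph work starts.
        if self.source_id < 0 or self.target_id < 0:
            raise ValueError("artist IDs must be non-negative")
        if not 1 <= self.max_depth <= MAX_DEPTH_LIMIT:
            raise ValueError(f"max_depth must be between 1 and {MAX_DEPTH_LIMIT}")


def cache_key(request: PathRequest, snapshot: str) -> str:
    """Key by endpoints + ruleset + snapshot; endpoint order is ignored."""
    low, high = sorted((request.source_id, request.target_id))
    return f"path:{snapshot}:{request.ruleset}:{low}:{high}:{request.max_depth}"


key = cache_key(PathRequest(7, 3, "performers"), "2024-01")
print(key)  # path:2024-01:performers:3:7:8
```

Including the snapshot in the key means a newly published snapshot never serves stale paths, and old entries can simply expire.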

6.2 Front end

Layer	Technology	Use
Static shell	Astro	Project pages, explanations, daily challenge shell, SEO, and deployment to Cloudflare Pages.
Interactive game	Svelte + TypeScript	Artist search, manual path construction, hints, state, and responsive interaction.
Visualization	SVG with selective D3 utilities	Readable evidence chain and small local graph; avoid giant force-directed networks.
Testing	Playwright	Search, path display, mobile interaction, offline fallback, and challenge completion tests.

7. Public-Site Delivery and Availability

Experience	Delivery path	Behavior if home cluster is offline
Project explanation	Astro on Cloudflare Pages	Fully available.
Daily challenge	Published JSON bundled or fetched from R2/Pages	Available using the last published challenge.
Research findings	Static JSON/SVG export	Fully available.
Arbitrary artist search and pathfinding	Browser → Cloudflare → Tunnel → FastAPI on ZimaBoard	Graceful offline message; static modes remain available.
Cluster-status display	Small periodically published artifact	Shows last update time rather than failing.

Security boundary: Cloudflare Tunnel creates outbound connections from the ZimaBoard, so the router needs no inbound port forwarding. Restrict CORS to the site domain, validate inputs, enforce timeouts and maximum search depth, and expose only the API service—not PostgreSQL, Redis, K3s, SSH, or storage shares.

8. Repository and Development Workflow

music-credit-graph/
  apps/api/                 FastAPI application
  apps/frontend/            Astro + Svelte project
  packages/catalog/         XML parsing and normalization
  packages/graph_core/      graph format and algorithms
  packages/game_rules/      configurable connection rules
  packages/workers/         RQ jobs and research tasks
  infra/ansible/            operating-system provisioning
  infra/k8s/                Kustomize manifests
  data/contracts/           schemas, fixtures, manifests
  tests/                    unit, integration, property, browser

Development concern	Recommendation
Python environment	pyproject.toml with uv; lock dependencies; build reproducible containers.
Quality	Ruff for formatting/linting, mypy for selected typed boundaries, pytest for unit and integration tests.
Data testing	Tiny checked-in XML fixtures, schema tests, count checks, and known-path golden tests.
Containers	Docker Buildx multi-platform images for amd64 and arm64; push to GHCR.
Deployment	Ansible provisions hosts; Kustomize applies environment-specific Kubernetes manifests.
CI	GitHub Actions for tests and image builds; deployment remains explicit at first.
Secrets	Kubernetes Secrets initially; SOPS + age when the repository and deployment mature.
Versioning	Every result records source snapshot, graph ruleset, schema version, and application commit.
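
For the data-testing row, two structural checks on a CSR-style graph (valid offsets and reciprocal adjacency) are small enough to run against checked-in fixtures. The sketch below assumes plain integer lists and hypothetical function names; a pytest suite could call these on every built snapshot.

```python
def check_offsets(offsets: list[int], neighbors: list[int]) -> None:
    """Offsets must start at 0, never decrease, and end at len(neighbors)."""
    if not offsets or offsets[0] != 0 or offsets[-1] != len(neighbors):
        raise ValueError("offsets do not span the neighbor array")
    if any(a > b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets decrease somewhere")
    node_count = len(offsets) - 1
    if any(not 0 <= n < node_count for n in neighbors):
        raise ValueError("neighbor ID out of range")


def check_reciprocal(offsets: list[int], neighbors: list[int]) -> None:
    """Every edge u -> v must have a matching v -> u."""
    node_count = len(offsets) - 1
    edges = {
        (u, v)
        for u in range(node_count)
        for v in neighbors[offsets[u]:offsets[u + 1]]
    }
    missing = [(u, v) for (u, v) in edges if (v, u) not in edges]
    if missing:
        raise ValueError(f"{len(missing)} edges lack a reverse edge")


# Fixture: artists 0-2, releases 3-4, stored as an undirected graph.
GOOD_OFFSETS = [0, 1, 3, 4, 6, 8]
GOOD_NEIGHBORS = [3, 3, 4, 4, 0, 1, 1, 2]
check_offsets(GOOD_OFFSETS, GOOD_NEIGHBORS)
check_reciprocal(GOOD_OFFSETS, GOOD_NEIGHBORS)
```

The set-based reciprocity check is fine for fixtures; a full-catalog build would need a streaming or sorted-merge variant to stay within memory.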

9. Build Phases

Phase	Deliverable	Exit criterion
0. Hardware and baseline	NVMe mounted; Wi-Fi stable; Ansible inventory; fresh OS images; monitoring.	All five nodes survive reboot and are reachable by hostname.
1. Cluster skeleton	K3s server on Zima; four Pi agents; sample multi-arch Python worker.	A job is scheduled on every Pi and returns a result.
2. Small-data vertical slice	Handmade or collection-sized catalog; FastAPI path search; Astro/Svelte result screen.	A complete artist → release → artist route works end to end.
3. Durable data model	Parquet schemas, role taxonomy, internal IDs, snapshot manifests, PostgreSQL search.	Changing a game ruleset requires only a graph rebuild, not an XML reparse.
4. Medium graph	Thousands to low millions of credits; local Pi snapshots; RQ challenge generation.	Workers generate reproducible challenges without network graph reads.
5. Full dump pipeline	Full monthly ingest and graph build, preferably accelerated by x600.	A validated snapshot can be published and rolled back.
6. Public beta	Cloudflare Pages, Tunnel API, caching, rate limiting, offline fallback.	The public site fails gracefully when the ZimaBoard is disconnected.
7. Research and pivots	New rule sets, collection mode, hidden-contributor puzzles, network findings.	A new mode is built from normalized data without changing ingestion.

10. Technology Study Matrix

Technology	Category	Role in project	Priority	First study exercise
Python	Primary language	Parsing, graph algorithms, API, workers, tests.	Core	Build a bidirectional BFS over a small bipartite graph.
SQL	Query language	PostgreSQL app queries and DuckDB transformations.	Core	Write a query that finds releases with multiple qualifying performers.
TypeScript	Front-end language	Typed UI state and API contracts.	Core	Render a typed path response as evidence cards.
Bash / YAML	Operations languages	Automation, container entrypoints, Ansible and Kubernetes configuration.	Working knowledge	Automate a health check and deployment variable override.
Ansible	Provisioning	Users, SSH, packages, hostnames, mounts, K3s installation.	Core infra	Rebuild one Pi from a clean image without manual package installation.
K3s / Kubernetes	Orchestration	Scheduling, service discovery, CronJobs, resource limits, rollout.	Core infra	Deploy one worker to every Pi (for example, with a DaemonSet) and pin PostgreSQL to the ZimaBoard.
Docker / Buildx	Packaging	Reproducible amd64/arm64 services.	Core infra	Publish one image that runs on ZimaBoard and Pi.
PostgreSQL	Transactional database	Search metadata, challenges, caches, game state.	Core data	Add an indexed normalized-name search table.
Redis + RQ	Queue/cache	Distribute independent Python jobs to Pi workers.	Core compute	Queue 1,000 pair-scoring jobs and collect results safely.
Parquet	Columnar file format	Compact normalized catalog and batch interchange.	Core data	Write and read partitioned credits by snapshot or role category.
DuckDB	Analytical SQL engine	Query Parquet, normalize data, validate builds.	Core data	Calculate credit counts and missing-role profiles directly from Parquet.
lxml.etree.iterparse	Streaming XML	Process very large compressed XML without loading it into RAM.	Core data	Parse a fixture while clearing elements to keep memory flat.
NumPy / SciPy sparse	Numerical storage	Compact integer arrays and CSR-style adjacency.	Core graph	Serialize offsets and neighbor arrays, then memory-map them.
NetworkX	Graph reference tool	Prototype and test algorithms on small subgraphs.	Supporting	Compare your custom BFS results with NetworkX fixtures.
FastAPI + Pydantic	API framework	Validated public endpoints and typed response schemas.	Core app	Create /paths with bounded inputs and structured errors.
Astro	Static web framework	Fast project pages and Cloudflare Pages deployment.	Core front end	Create an always-available daily-challenge page.
Svelte	Interactive UI	Search, path construction, hints, state, and transitions.	Core front end	Build a path editor that alternates artist and release cards.
SVG / D3 utilities	Visualization	Evidence chains and small local graphs.	Supporting	Draw a responsive horizontal/vertical path without force layout.
Cloudflare Pages / Tunnel	Public delivery	Static hosting and secure outbound API ingress.	Core delivery	Serve a local health endpoint through a private tunnel hostname.
pytest / Playwright	Testing	Parser, graph, API, and browser regression coverage.	Core quality	Golden-test a path and complete the same challenge in a browser test.
Dask	Optional distributed compute	Larger parameter sweeps or DataFrame experiments after the RQ pipeline works.	Later	Compare a batch calculation with RQ and Dask implementations.
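
The NumPy / SciPy exercise in the matrix (serialize offsets and neighbor arrays, then memory-map them) can be previewed with only the standard library. The sketch below uses the array and mmap modules in place of NumPy; the helper names, the .u32 file extension, and the native-endian 32-bit layout are assumptions for illustration.

```python
import mmap
import os
import tempfile
from array import array


def write_u32(path: str, values: list[int]) -> None:
    """Write integers as raw native-endian unsigned 32-bit values."""
    data = array("I", values)
    assert data.itemsize == 4, "this sketch assumes a 4-byte unsigned int"
    with open(path, "wb") as handle:
        data.tofile(handle)


def open_u32(path: str) -> tuple[mmap.mmap, memoryview]:
    """Memory-map a file from write_u32 and view it as unsigned 32-bit ints."""
    with open(path, "rb") as handle:
        mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped, memoryview(mapped).cast("I")


directory = tempfile.mkdtemp()
offsets_path = os.path.join(directory, "offsets.u32")
write_u32(offsets_path, [0, 1, 3, 4, 6, 8])

mapped, offsets = open_u32(offsets_path)
row = (offsets[1], offsets[2])  # slice bounds for node 1's neighbors
print(row)  # (1, 3)

# Release the view before closing the map, or close() raises BufferError.
offsets.release()
mapped.close()
```

Only the pages actually touched are read from disk, which is what lets a 1 GB Pi traverse a graph file larger than its free memory. A real format should also record byte order and element width in the manifest rather than rely on native defaults.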

11. Suggested Study Sequence

Order	Focus
1. Python data contracts	Dataclasses/Pydantic, pyproject, pytest, small XML fixtures.
2. Discogs domain model	Artists, releases, masters, credits, track credits, aliases, and source provenance.
3. Streaming ingestion	lxml.etree.iterparse, incremental writes, memory profiling, and restartable jobs.
4. Parquet and DuckDB	Columnar schemas, partitions, analytical SQL, and data-quality checks.
5. Graph representation	Bipartite graphs, integer IDs, adjacency arrays, BFS, Dijkstra, hub penalties.
6. PostgreSQL and API	Indexes, autocomplete, caches, Pydantic schemas, FastAPI testing.
7. Containers and multi-architecture	Dockerfiles, Buildx, amd64/arm64 images, resource constraints.
8. Ansible and K3s	Repeatable hosts, node labels, storage placement, CronJobs, rollouts.
9. Distributed jobs	RQ semantics, retries, idempotency, leases, result validation, local snapshots.
10. Astro/Svelte game UI	Search, path editing, evidence display, accessibility, responsive SVG.
11. Public delivery and security	Cloudflare Tunnel, caching, rate limits, offline fallback, backups.
12. Research methods	Sampling bias, approximate centrality, challenge difficulty, reproducibility.

12. Decisions Intentionally Deferred

Question	Default now	Revisit when
Final game mechanic	Automatic path plus evidence; manual relay can follow.	The medium-size graph is playable and path quality is understood.
Album artwork	No dependency; generated or typographic release tiles.	Discogs and rights constraints are clarified.
SQLite vs PostgreSQL	PostgreSQL from the start for app state.	Only reconsider if operational simplicity becomes more important than learning.
RQ vs Dask	RQ for independent jobs.	A workload needs distributed DataFrame/array execution rather than queued tasks.
Full catalog on day one	No; use staged subsets.	Parser, normalized schema, and graph integrity tests are stable.
Wired ZimaBoard	Wi-Fi accepted with local graph replication.	A switch port or shelf layout makes wiring easy.
Public arbitrary search	After static daily/curated mode works.	Rate limiting, caching, and API availability are proven.

13. Compact Glossary

Term	Meaning here
Bipartite graph	A graph with two node types—artists and releases—where edges only cross between types.
CSR	Compressed sparse row: compact offset and neighbor arrays used for efficient adjacency access.
Memory map	Opening a file as if it were an array while the OS loads only needed pages.
Parquet	Compressed columnar files optimized for analytics rather than transactional updates.
DuckDB	An embedded analytical database that can query Parquet directly.
Snapshot	An immutable, versioned set of graph and metadata files built from one source month and ruleset.
Idempotent job	A task that can safely run again without duplicating or corrupting its result.
Control plane	The K3s components that schedule and manage workloads; hosted on the ZimaBoard.
Agent node	A machine that runs scheduled containers; each Pi is an agent.
Ruleset	Configuration defining which roles, release types, dates, or tracks count as valid connections.
Provenance	The source snapshot and record identifiers that explain where a connection came from.
Static fallback	Prepublished content that remains available when the live home API is offline.

14. Primary Documentation to Bookmark

Discogs API Terms of Use
Discogs developer documentation
Discogs public data snapshots
ZimaBoard 832 product specifications
Zima PCIe to NVMe adapter
K3s requirements and resource profiling
K3s storage documentation
Ansible documentation
PostgreSQL documentation
Redis documentation
RQ documentation
DuckDB documentation
Apache Parquet documentation
FastAPI documentation
Astro documentation
Svelte documentation
Cloudflare Tunnel documentation
Cloudflare Pages documentation