# L3 I-Benchmark — CASSI KAIST 10-Scene

**ID:** `PWM-L3-cassi-kaist-10scenes`
**Parent Spec:** [`PWM-L2-em-cassi-forward`](../spec/spec.md)
**Status:** ⊙ Testnet (genesis)
**Submitter:** _(wallet address)_

---

## 1. Dataset

10 hyperspectral scenes from the KAIST Hyperspectral Dataset (Choi et al., *ACM TOG* 2017), cropped and pre-processed into uniform 256×256×28 cubes covering 450–650 nm (28 bands, ~7.4 nm spacing).

All 10 scenes share a single deterministic 256×256 binary mask, generated from a fixed seed and shipped as `mask.npy`, so every solver is evaluated against exactly the same measurement operator.

## 2. Dataset Access

| Property | Value |
|----------|-------|
| **URL** | `ipfs://bafybeic-genesis-cassi-kaist-10s/` |
| **Mirror** | `https://physicsworldmodel.org/datasets/cassi-kaist-10s.tar.gz` |
| **SHA-256** | `0000000000000000000000000000000000000000000000000000000000000000` |
| **Size** | 38 MB compressed, 220 MB uncompressed |
| **License** | CC BY 4.0 (KAIST scenes) — see `LICENSE.txt` in archive |

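Before unpacking, the download can be checked against the published digest. The sketch below streams the file through Python's `hashlib`. `sha256_of_file` and `verify_archive` are hypothetical helper names. The all-zero genesis digest is a placeholder, so the check only becomes meaningful once a real hash is published.

```python
import hashlib
from pathlib import Path

# Placeholder digest copied from the table above; replace it once the real hash is published.
EXPECTED_SHA256 = "0" * 64


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the archive is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True iff the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_archive(Path("cassi-kaist-10s.tar.gz"))` checks a local copy of the mirror download. Against the placeholder digest, it returns `False` for any real file.
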
The archive contains:

```
cassi-kaist-10s/
├── scene_01.npy       # (256, 256, 28) float32, normalized [0,1]
├── scene_02.npy
├── ...
├── scene_10.npy
├── mask.npy           # (256, 256) uint8, binary, density 0.5
├── dispersion.npy     # (28,) int, δ_b = b (linear)
└── ground_truth.json  # PSNR/SSIM targets per scene
```
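
Here is a minimal loader sketch that checks the unpacked files against the manifest above. It assumes NumPy and the exact directory layout listed. `load_benchmark` and `_require` are hypothetical helpers. The dispersion check assumes 0-based band indices (δ_b = b for b = 0..27); adjust it if the archive stores 1-based shifts.

```python
from pathlib import Path

import numpy as np

N_SCENES, H, W, BANDS = 10, 256, 256, 28


def _require(cond: bool, msg: str) -> None:
    """Raise instead of using assert, so the checks still run under `python -O`."""
    if not cond:
        raise ValueError(msg)


def load_benchmark(root: Path):
    """Load the scenes, shared mask, and dispersion vector, validating each against the manifest."""
    scenes = []
    for s in range(1, N_SCENES + 1):
        x = np.load(root / f"scene_{s:02d}.npy")
        _require(x.shape == (H, W, BANDS) and x.dtype == np.float32, f"scene_{s:02d}: got {x.shape} {x.dtype}")
        _require(float(x.min()) >= 0.0 and float(x.max()) <= 1.0, f"scene_{s:02d}: values outside [0, 1]")
        scenes.append(x)

    mask = np.load(root / "mask.npy")
    _require(mask.shape == (H, W) and mask.dtype == np.uint8, f"mask: got {mask.shape} {mask.dtype}")
    _require(bool(np.isin(mask, (0, 1)).all()), "mask: values other than 0/1")

    dispersion = np.load(root / "dispersion.npy")
    _require(dispersion.shape == (BANDS,) and np.issubdtype(dispersion.dtype, np.integer),
             "dispersion: expected shape (28,) with an integer dtype")
    # Assumption: 0-based band index, so the linear law delta_b = b gives shifts 0..27.
    _require(np.array_equal(dispersion, np.arange(BANDS)), "dispersion: expected delta_b = b")
    return scenes, mask, dispersion
```

The mask density is not asserted, because 0.5 is a nominal value. `mask.mean()` reports the actual fraction of open pixels.
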

## 3. Success Metric

| Metric | Threshold | Aggregation |
|--------|-----------|-------------|
| PSNR (dB) | ≥ 28.0 | Per-band PSNR, averaged over the 28 bands, then over the 10 scenes |
| SSIM | ≥ 0.85 | Same as PSNR |
| SAM (degrees) | ≤ 0.20 | Per-pixel spectral angle, averaged over all pixels, then over the 10 scenes (lower is better) |
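
One way to compute PSNR and SAM under the aggregation in the table is sketched below with NumPy. It assumes a peak value of 1.0, because the cubes are normalized to [0, 1], and reports SAM in degrees. The function names are hypothetical, and the pinned evaluation container remains the authority on exact conventions. SSIM is omitted here. A library routine such as scikit-image's `skimage.metrics.structural_similarity` with `data_range=1.0` is one option, although its window settings would have to match the evaluator's.

```python
import numpy as np


def psnr_per_band(x_hat: np.ndarray, x_gt: np.ndarray, peak: float = 1.0) -> float:
    """PSNR (dB) computed separately for each band of an (H, W, B) cube, then averaged over bands."""
    diff = x_hat.astype(np.float64) - x_gt.astype(np.float64)
    mse = np.mean(diff ** 2, axis=(0, 1))  # one MSE per band
    mse = np.maximum(mse, 1e-12)           # guard against log10 of infinity on a perfect band
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))


def sam_degrees(x_hat: np.ndarray, x_gt: np.ndarray, eps: float = 1e-12) -> float:
    """Mean spectral angle (degrees) between reconstructed and true spectra over all pixels."""
    a = x_hat.reshape(-1, x_hat.shape[-1]).astype(np.float64)
    b = x_gt.reshape(-1, x_gt.shape[-1]).astype(np.float64)
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))  # clip protects against rounding just outside [-1, 1]
    return float(np.degrees(np.mean(angles)))
```

Per-scene values from these functions would then be averaged over the 10 scenes. SAM is scale-invariant, so a reconstruction that is off by a constant gain per pixel still scores near 0°.
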

A Solution passes iff **all three** thresholds hold simultaneously on the held-out 10 scenes. Sensor noise σ = 0.01 is added at evaluation time (fixed seed).
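
The pass rule can be written as a short check over the scene-averaged scores. This is an illustrative sketch. The metric keys (`psnr`, `ssim`, `sam`) and the helper names are assumptions, not the evaluator's interface.

```python
import operator
import statistics
from typing import Dict, List

# Thresholds from the table above: (comparator, threshold).
THRESHOLDS = {
    "psnr": (operator.ge, 28.0),
    "ssim": (operator.ge, 0.85),
    "sam": (operator.le, 0.20),
}


def aggregate(per_scene: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the scenes (each entry already holds band/pixel-averaged values)."""
    return {name: statistics.fmean(scores[name] for scores in per_scene) for name in THRESHOLDS}


def passes(scores: Dict[str, float]) -> bool:
    """True iff every aggregated metric meets its threshold at the same time."""
    return all(cmp(scores[name], thr) for name, (cmp, thr) in THRESHOLDS.items())
```
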

## 4. Difficulty Tier

`hard`: a challenging Ω that requires a strong spectral prior and robustness to dispersion artifacts, and is sensitive to mask alignment. Genesis-tier benchmark intended to anchor the L4 leaderboard.

## 5. Evaluation Protocol

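```
mask       = load(mask.npy)          # (256, 256) binary, shared by all scenes
dispersion = load(dispersion.npy)    # (28,) per-band shifts
for each scene s in {01..10}:
    x_gt  = load(scene_s.npy)                                  # (256, 256, 28)
    y     = forward_cassi(x_gt, mask, dispersion, sigma=0.01)  # noise drawn with the fixed seed
    x_hat = solver(y, mask, dispersion, sigma=0.01)            # must return (256, 256, 28)
    record psnr(x_hat, x_gt), ssim(x_hat, x_gt), sam(x_hat, x_gt)

report the mean of each metric over the 10 scenes
```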
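
For orientation, the sketch below shows one possible `forward_cassi` for a single-disperser CASSI model. It assumes the following:

- The mask is applied before dispersion.
- Band b is shifted δ_b columns along the width axis.
- The noise is additive Gaussian.

The authoritative operator is defined in the parent spec `PWM-L2-em-cassi-forward`. `seed=0` is a hypothetical stand-in for the unpublished fixed seed. With δ_b = b, the measurement is 256 × (256 + 27) = 256 × 283.

```python
import numpy as np


def forward_cassi(x, mask, dispersion, sigma=0.01, seed=0):
    """Sketch of a single-disperser CASSI forward model.

    x: (H, W, B) cube; mask: (H, W) binary; dispersion: (B,) non-negative integer column shifts.
    Returns a 2-D measurement of shape (H, W + max(dispersion)).
    """
    h, w, bands = x.shape
    shifts = np.asarray(dispersion, dtype=int)
    y = np.zeros((h, w + int(shifts.max())), dtype=np.float64)
    coded = x * mask[:, :, None]  # coded aperture modulates every band identically
    for b in range(bands):
        d = shifts[b]
        y[:, d:d + w] += coded[:, :, b]  # disperser shifts band b by d columns; detector sums bands
    rng = np.random.default_rng(seed)  # fixed seed: identical noise on every evaluation run
    return y + sigma * rng.standard_normal(y.shape)
```

A solver receives only `y`, `mask`, and `dispersion`. It has to undo both the per-band shift and the summation over the 28 bands, which is where the spectral prior does its work.
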

Evaluation runs in a pinned Docker container (see `solution/code/` for the reference solver and exact dependency manifest). Solver wall-time cap: 60 seconds per scene on a single A100 GPU.
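
A harness could record per-scene wall time roughly as sketched below. `run_with_cap` is a hypothetical name. This version only flags an overrun after the solver returns. Actually enforcing the 60-second cap would need a preemptive mechanism, such as running the solver in a subprocess with a timeout.

```python
import time
from typing import Any, Callable, Tuple

WALL_TIME_CAP_S = 60.0  # per-scene cap stated above


def run_with_cap(solver: Callable[..., Any], *args: Any, cap_s: float = WALL_TIME_CAP_S,
                 **kwargs: Any) -> Tuple[Any, float, bool]:
    """Call the solver, returning (result, elapsed seconds, whether it stayed within the cap)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = solver(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= cap_s
```

For GPU solvers, this timing only means something if the returned result is already materialized, for example after device synchronization. Otherwise, asynchronous kernels would still be running when the clock stops.
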

## 6. Reference Solver (Baseline)

The genesis baseline for this benchmark is `PWM-L4-cassi-kaist-gaptv` (see `solution/`), a GAP-TV solver that scores PSNR ≈ 32.4 dB at default parameters. To take a leaderboard position, a Solution must strictly improve at least one of PSNR, SSIM, or SAM without regressing either of the other two.
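
Read literally, the leaderboard rule is a Pareto-dominance test: the candidate must be strictly better on at least one metric and no worse on any. The sketch below assumes exact comparisons with no tie tolerance, and treats SAM as the only lower-is-better metric. `improves_on` is a hypothetical helper. The document publishes only the baseline's PSNR, so any SSIM or SAM values fed in are up to the caller.

```python
from typing import Mapping

# +1: higher is better; -1: lower is better.
DIRECTION = {"psnr": +1, "ssim": +1, "sam": -1}


def improves_on(candidate: Mapping[str, float], incumbent: Mapping[str, float]) -> bool:
    """True iff candidate is strictly better on at least one metric and no worse on all of them."""
    gains = [DIRECTION[m] * (candidate[m] - incumbent[m]) for m in DIRECTION]
    return any(g > 0 for g in gains) and all(g >= 0 for g in gains)
```
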

## 7. References

- Choi, I. et al. "High-Quality Hyperspectral Reconstruction Using a Spectral Prior." *ACM TOG* 36, 6 (2017). (KAIST dataset)
- Yuan, X. "Generalized Alternating Projection Based Total Variation Minimization for Compressive Sensing." *ICIP* 2016. (GAP-TV reference solver)

---

## File Mapping

| File | Role | How to regenerate |
|------|------|-------------------|
| `benchmark.md` | Source of truth | Human or LLM |
| `benchmark.json` | Structured metadata, dataset URL, metric thresholds | LLM regenerates from §1 Dataset, §2 Dataset Access, §3 Success Metric, §4 Difficulty Tier, §5 Evaluation Protocol |

**Prompt for your LLM after editing `benchmark.md`:**

> Read `benchmark.md`. Regenerate `benchmark.json` so every field matches.
> Schema:
> `{ id, spec_id, name, description, dataset: {url, mirror_url, sha256, size_bytes, license, file_manifest[]}, success_metrics[{metric, threshold, comparator, aggregation}], difficulty_tier, instance_count, evaluation: {wall_time_seconds, gpu, fixed_seed, noise_sigma}, references[] }`
> Output only the JSON object.
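
For reference, the sketch below assembles the fields that §3 through §5 pin down. It is partial and hand-written. The comparator strings and aggregation labels are assumed encodings. `fixed_seed` is left as `None` because the seed value is not published. The dataset block is omitted, since `size_bytes` needs an exact byte count that the text does not give.

```python
import json


def benchmark_fragment() -> dict:
    """Partial benchmark.json built from this file; fields not stated here are omitted or None."""
    return {
        "id": "PWM-L3-cassi-kaist-10scenes",
        "spec_id": "PWM-L2-em-cassi-forward",
        "difficulty_tier": "hard",
        "instance_count": 10,
        "success_metrics": [
            {"metric": "PSNR", "threshold": 28.0, "comparator": ">=", "aggregation": "mean over bands, then scenes"},
            {"metric": "SSIM", "threshold": 0.85, "comparator": ">=", "aggregation": "mean over bands, then scenes"},
            {"metric": "SAM", "threshold": 0.20, "comparator": "<=", "aggregation": "mean over pixels, then scenes"},
        ],
        "evaluation": {
            "wall_time_seconds": 60,
            "gpu": "A100",
            "fixed_seed": None,  # benchmark.md says "fixed seed" but does not publish the value
            "noise_sigma": 0.01,
        },
    }


print(json.dumps(benchmark_fragment(), indent=2))
```
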
