Pharlux — Testing Guide
The practical "how to run tests locally" reference for contributors. The deeper what and why — the test tiers, the proptest invariants, the crash-recovery rationale — is in TEST_STRATEGY.md. This page is the runbook that complements it.
If you are about to commit code, scroll to The four pre-commit gates — that is the bar.
Contents
- The four pre-commit gates
- Crash recovery — the 10-run gate
- Test tiers
- Test infrastructure
- Frontend tests
- Adding tests for new code
- What CI runs
- Common pitfalls
- Where each test tier lives
The four pre-commit gates
These four commands must pass before you declare a task complete. They mirror what CI runs, so a green local run is a strong predictor of a green CI run. From CLAUDE.md:
cargo fmt --check
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace --all-targets
cargo build --workspace --release
cargo nextest run is the test runner, not cargo test. Nextest runs each test in its own process (real isolation, not threads inside one process) and runs those processes in parallel, which makes runs faster and results more deterministic. The pinned version in CI is cargo-nextest@0.9.132.
If you don't have nextest installed:
cargo install cargo-nextest --version 0.9.132 --locked
You can use cargo test for ad-hoc work but the gate is nextest — some race-sensitive tests will pass under nextest's per-test process isolation and fail under cargo test's shared-process default.
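One way a test can come to depend on per-test process isolation is process-wide state. The sketch below is illustrative only: OPEN_SEGMENTS, open_segment and both "test bodies" are hypothetical names, not code from this workspace. It shows two checks that each assume they start from a fresh process.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical process-wide state, standing in for any global a test might
// touch (a registry, a cached config, an open-file table).
static OPEN_SEGMENTS: AtomicUsize = AtomicUsize::new(0);

/// Registers a segment and returns how many are open in this process.
fn open_segment() -> usize {
    OPEN_SEGMENTS.fetch_add(1, Ordering::SeqCst) + 1
}

/// Body of a hypothetical test that assumes it starts from a fresh process.
fn first_segment_is_number_one() -> bool {
    open_segment() == 1
}

/// Body of a second hypothetical test making the same assumption.
fn fresh_store_opens_one_segment() -> bool {
    open_segment() == 1
}
```

Called one after the other in the same process, the first check holds and the second does not, because the counter is shared. With one process per test, each check sees a fresh counter and both hold.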
Faster iteration during development
Per-crate runs are fastest:
cargo nextest run -p pharlux-store
cargo nextest run -p pharlux-store --test crash_recovery # specific test target
cargo nextest run -p pharlux-api -- handlers::query # filter by name substring
cargo clippy --workspace --all-targets -- -D warnings is slow; for a single crate use cargo clippy -p <crate> --all-targets -- -D warnings. Run the workspace form before pushing.
Crash recovery — the 10-run gate
From Phase 1 onwards, the crash-recovery test suite must pass 10 consecutive runs without a single flake. This is a hard gate. A flaky crash test is a bug, never an acceptable characteristic — do not retry, do not #[ignore], do not add sleeps.
for i in 1 2 3 4 5 6 7 8 9 10; do
cargo nextest run --test crash_recovery || { echo "FLAKY on run $i"; exit 1; }
done
The crash-recovery test (pharlux-store/tests/crash_recovery.rs) drives a separate wal_harness binary via assert_cmd, sends 10,000 WAL records, SIGKILLs the process after 7,500, then replays and asserts the count is ≥ 7,500 with no corruption panics. It exercises the prost + length-prefix + CRC32 framing from ADR-0018.
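To make the replay assertion concrete, here is a minimal sketch of length-prefix + CRC32 framing with torn-tail detection. The frame layout (little-endian u32 length, then little-endian u32 CRC of the payload, then the payload) and the function names are assumptions for illustration; the actual layout is whatever ADR-0018 specifies, and in the real WAL the payload would be prost-encoded bytes rather than arbitrary strings.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Appends one frame: [payload length: u32 LE][CRC-32 of payload: u32 LE][payload].
/// This layout is an assumption for the sketch, not the ADR-0018 layout.
fn append_frame(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&crc32(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Replays frames from the start and stops, without panicking, at the first
/// truncated or corrupt frame. Everything before that point is returned.
fn replay(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    // A frame needs at least the 8-byte header.
    while log.len() - pos >= 8 {
        let len = read_u32_le(log, pos) as usize;
        let expected_crc = read_u32_le(log, pos + 4);
        let start = pos + 8;
        if log.len() - start < len {
            break; // torn tail: the payload was cut short by the crash
        }
        let payload = &log[start..start + len];
        if crc32(payload) != expected_crc {
            break; // corrupt frame: stop rather than return bad data
        }
        records.push(payload.to_vec());
        pos = start + len;
    }
    records
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn replay_drops_only_the_torn_tail() {
        let mut log = Vec::new();
        for payload in [&b"a"[..], &b"bb"[..], &b"ccc"[..]].iter() {
            append_frame(&mut log, payload);
        }
        log.truncate(log.len() - 1); // simulate a kill mid-write
        assert_eq!(replay(&log).len(), 2);
    }
}
```

The property the crash test relies on is the same in miniature: every record that was fully written before the kill is recovered, and a partial final record is dropped cleanly instead of causing a panic.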
If the gate goes red, follow the procedure in TEST_STRATEGY.md § When a crash test becomes flaky. The root cause is always one of:
- missing fsync,
- incorrect partial-record detection on replay,
- a race between WAL flush and rename, or
- unrelated test bleeding state via a shared tempdir.
Fix the root cause. Add a regression test that reproduces the flake deterministically. Re-run the 10-consecutive-runs gate before considering it fixed.
Test tiers
Unit tests (inline #[cfg(test)] modules)
Each crate tests its own logic in isolation. Use tempfile::tempdir() for any test that touches disk — never reuse a fixed path. Async tests use #[tokio::test].
Location: <crate>/src/**/*.rs, inside #[cfg(test)] mod tests { ... } blocks.
cargo nextest run -p pharlux-common
cargo nextest run -p pharlux-auth -- jwt:: # filter by module
cargo nextest run -p pharlux-store wal:: # filter by name
Per-crate test counts as of V1.0.0:
| Crate | Inline tests |
|---|---|
| pharlux-store | 60+ inline (heaviest) plus 2 integration test files |
| pharlux-api | 38 inline (covers handlers + admin endpoints + dashboards) |
| pharlux-auth | 23 inline (jwt round-trip, Argon2id, db migrations) |
| pharlux-query | 17 inline (plan rewriter, tenant filter) |
| pharlux-ingest | 16 (translate, channel, tenant resolver) |
| pharlux-dashboard | 16 inline (CRUD, UNIQUE-violation mapping) |
| pharlux-common | 14 inline (config, schema, tenant) |
| pharlux-alert | 13 (state machine, payload builders, axum mock) |
The full workspace test count at V1.0.0 was 293 passing. The number grows with each commit; the rule from DOCUMENTATION_POLICY.md is new code without tests does not merge.
Integration tests (<crate>/tests/)
Spin up real components against real disk and real network. Slower than unit tests (10–30s each) but the highest-value tests because they exercise the full stack end-to-end.
| Test file | What it does |
|---|---|
| pharlux-bin/tests/e2e.rs | Boots the full server binary in a tempdir, sends OTLP via HTTP to :4318, queries via REST on :3100, asserts results. The Phase 1 exit-gate end-to-end test. |
| pharlux-store/tests/storage_integration.rs | End-to-end ingest + query against real Parquet files in a tempdir. Exercises the WAL → Parquet flush → TableProvider → DataFusion path. |
| pharlux-store/tests/crash_recovery.rs | The 10-run gate (above). |
Run them individually:
cargo nextest run -p pharlux-bin --test e2e
cargo nextest run -p pharlux-store --test storage_integration
cargo nextest run -p pharlux-store --test crash_recovery
Property-based tests
Proptest invariants — WAL round-trip, OTLP-to-Arrow round-trip, schema-evolution compatibility, timestamp policy, backpressure — are documented in TEST_STRATEGY.md § Proptest invariants. The proptest harness (including the proptest-extended feature flag) is V1.1 work and is not yet wired into this workspace. The proptest crate is not currently a workspace dependency. Do not invoke the feature flag yet — it does not exist.
Race-condition tests
The WAL + Parquet union TableProvider (see crate-map.md § pharlux-store) is the highest-complexity component in the project. Targeted tests use tokio::sync::Barrier to synchronise flush + query at known race points and assert exact row counts. Touching this code requires a human pair-review session per CLAUDE.md — don't add or modify race tests in a solo session.
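The barrier pattern can be sketched without the real TableProvider. The example below is an assumption-laden stand-in: it uses std::sync::Barrier and OS threads instead of tokio::sync::Barrier and tasks so that it compiles with no dependencies, and Tables, flush and count_rows are hypothetical names. It lines a flush and a query up at the same starting point and returns the row count the query observed.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical stand-in for the union view: every row is either still in the
/// WAL or already flushed, never both and never neither.
struct Tables {
    wal: Vec<u64>,
    parquet: Vec<u64>,
}

/// Moves every WAL row into the flushed set under a single lock acquisition.
fn flush(tables: &Mutex<Tables>) {
    let mut t = tables.lock().unwrap();
    let drained: Vec<u64> = t.wal.drain(..).collect();
    t.parquet.extend(drained);
}

/// Counts rows across both sides of the union under the same lock.
fn count_rows(tables: &Mutex<Tables>) -> usize {
    let t = tables.lock().unwrap();
    t.wal.len() + t.parquet.len()
}

/// Releases a flush and a query at the same instant and returns what the
/// query saw. The barrier replaces any sleep-based attempt at lining them up.
fn race_flush_against_query(rows: u64) -> usize {
    let tables = Arc::new(Mutex::new(Tables {
        wal: (0..rows).collect(),
        parquet: Vec::new(),
    }));
    let barrier = Arc::new(Barrier::new(2));

    let flusher = {
        let (tables, barrier) = (Arc::clone(&tables), Arc::clone(&barrier));
        thread::spawn(move || {
            barrier.wait(); // both threads leave this point together
            flush(&tables);
        })
    };
    let reader = {
        let (tables, barrier) = (Arc::clone(&tables), Arc::clone(&barrier));
        thread::spawn(move || {
            barrier.wait();
            count_rows(&tables)
        })
    };

    flusher.join().unwrap();
    reader.join().unwrap()
}
```

A test built on this would assert that race_flush_against_query(1_000) returns exactly 1_000 on every iteration of a loop: whichever side wins the race, the query must see each row once. This is only a shape to recognise when reading the real tests; changes to them still go through the pair-review session described above.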
Load tests
The pharlux-loadtest crate is a standalone OTLP load generator. Build it with cargo build --release -p pharlux-loadtest, then point it at a running server. The V1 reference target (used to validate the 500k pts/sec ceiling on a 4 vCPU / 8 GB VPS) is documented in the release notes for v1.0.0.
cargo build --release -p pharlux-loadtest
./target/release/pharlux-loadtest --points-per-sec 500000 --duration 60s --endpoint http://localhost:4318
Test infrastructure
| Crate | Role |
|---|---|
| tempfile | Ephemeral directories for every test that touches disk. Never share a fixed path between tests. |
| assert_cmd | Subprocess management for crash-recovery and e2e tests (Command::cargo_bin("pharlux")). |
| tokio (test feature) | #[tokio::test] on async tests, tokio::sync::Barrier for race synchronisation. |
| reqwest (in tests) | HTTP client for e2e and webhook-mock tests. |
| nextest | The runner. CI pins cargo-nextest@0.9.132. |
| wal_harness (pharlux-store test bin) | Subprocess wrapper used by the crash-recovery test. Built automatically by the test target. |
Logging in tests: set RUST_LOG=trace (or pharlux_store=trace) when reproducing a flake. The default test config keeps logs quiet.
Frontend tests
pharlux-ui/ is a Vite + TypeScript project. The CI pipeline runs npm ci && npm run build before any cargo step (the build populates pharlux-ui/dist/ for rust-embed).
There is no Rust-side browser-rendering test in V1. The pages and panels are exercised manually during development; the API surface they depend on is exercised by pharlux-api's inline tests and pharlux-bin's e2e test. Headless-browser smoke tests are planned for V1.x.
Local iteration:
cd pharlux-ui
npm ci
npm run dev # vite dev server with HMR; talks to a separately-running pharlux server
npm run build # production build → pharlux-ui/dist
npm run lint # eslint flat config
The known set-state-in-effect warning in DashboardPage.tsx is non-blocking and not in CI's lint scope. Don't introduce new warnings.
Adding tests for new code
DOCUMENTATION_POLICY.md's definition-of-done says: new tests exist for all new code (at least one unit test per new function; integration test for new public APIs). Apply it strictly.
Patterns that pass review:
- New SQLite store method → at least one inline unit test using tempdir() to create the db, exercising the happy path and one error path.
- New REST endpoint → at least one integration test in pharlux-api/src/handlers/...'s #[cfg(test)] block, plus an e2e cross-check in pharlux-bin/tests/e2e.rs if it touches the public surface.
- New OTLP validation rule → a translate-time unit test in pharlux-ingest/src/translate.rs plus, once the V1.1 proptest harness is wired in, a proptest invariant if the rule is generic (timestamp bounds, body size, etc.).
- New CLI subcommand → an inline test in pharlux-bin/src/main.rs against a temp data directory (the existing pharlux user tests are the reference pattern).
- New cross-tenant code path → a test that creates two tenants and verifies the cross-tenant call returns 404, not 403.
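The cross-tenant pattern is the easiest one to get subtly wrong, so here is a minimal sketch of its shape. Dashboard, get_dashboard and the integer tenant ids are hypothetical and do not mirror the real pharlux-api or pharlux-dashboard types. The point is that a resource owned by another tenant is reported exactly like a resource that does not exist, presumably so the response does not reveal its existence.

```rust
use std::collections::HashMap;

/// Hypothetical resource with an owning tenant.
struct Dashboard {
    tenant_id: u32,
    title: String,
}

/// Looks up a dashboard on behalf of `caller_tenant`.
/// A missing id and an id owned by another tenant both map to 404.
fn get_dashboard<'a>(
    store: &'a HashMap<u64, Dashboard>,
    caller_tenant: u32,
    id: u64,
) -> Result<&'a Dashboard, u16> {
    match store.get(&id) {
        Some(d) if d.tenant_id == caller_tenant => Ok(d),
        _ => Err(404),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn cross_tenant_read_is_not_found() {
        let mut store = HashMap::new();
        store.insert(1, Dashboard { tenant_id: 10, title: "latency".to_string() });
        store.insert(2, Dashboard { tenant_id: 20, title: "errors".to_string() });

        // Tenant 10 can read its own dashboard.
        assert_eq!(get_dashboard(&store, 10, 1).map(|d| d.title.as_str()), Ok("latency"));
        // Tenant 10 asking for tenant 20's dashboard gets 404, not 403.
        assert_eq!(get_dashboard(&store, 10, 2).err(), Some(404));
        // Indistinguishable from an id that does not exist at all.
        assert_eq!(get_dashboard(&store, 10, 99).err(), Some(404));
    }
}
```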
Patterns that don't:
- #[ignore] to silence flakes. Never. Find the root cause.
- Sleeps to "fix" timing flakes. Never. Use barriers or explicit signals.
- Tests that depend on external network. Mock with an in-process axum server (the pharlux-alert webhook tests are the reference pattern).
- Tests that share a fixed /tmp/... path between runs. Use tempdir(). Always.
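Two of the points above (no external network, no sleeps) combine naturally. The sketch below is a dependency-free stand-in for the axum-based mock that the pharlux-alert tests use: it swaps in std::net and a channel, and spawn_mock_webhook, post_alert, the /hook path and the JSON body are all hypothetical. The listener is bound before the client connects, and the test waits on a channel rather than on a timer.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, SocketAddr, TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

/// Starts a one-shot mock webhook receiver on an OS-assigned loopback port.
/// Returns its address and a channel that yields the raw request it received.
fn spawn_mock_webhook() -> (SocketAddr, mpsc::Receiver<String>) {
    // Port 0 lets the OS pick a free port, so parallel tests never collide.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let (mut conn, _) = listener.accept().unwrap();
        let mut request = String::new();
        // Reads until the client half-closes its side of the connection.
        conn.read_to_string(&mut request).unwrap();
        conn.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
            .unwrap();
        // Explicit signal to the test: the request has been fully received.
        tx.send(request).unwrap();
    });
    (addr, rx)
}

/// Sends a minimal HTTP POST to the mock and returns the raw response.
fn post_alert(addr: SocketAddr, body: &str) -> String {
    let mut stream = TcpStream::connect(addr).unwrap();
    let request = format!(
        "POST /hook HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        addr,
        body.len(),
        body
    );
    stream.write_all(request.as_bytes()).unwrap();
    stream.shutdown(Shutdown::Write).unwrap(); // tell the mock the request is complete
    let mut response = String::new();
    stream.read_to_string(&mut response).unwrap();
    response
}
```

A test would call spawn_mock_webhook(), fire post_alert(addr, ...), then block on the receiver (recv_timeout with a generous bound keeps a broken test from hanging) and assert on the captured request. In this workspace the same shape is built with axum and reqwest, which are already pinned.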
What CI runs
CI is defined in .github/workflows/ci.yml and runs on every PR and main-branch push. The job is named check and runs:
1. npm ci && npm run build in pharlux-ui/ (populates dist/ for rust-embed).
2. cargo fmt --all --check.
3. cargo clippy --workspace --all-targets -- -D warnings.
4. cargo nextest run --workspace --all-targets --no-tests=pass.
5. cargo build --workspace --release.
In addition, the dco job runs on PRs only. It verifies that every commit has a Signed-off-by: trailer (git commit -s adds it automatically). Missing sign-offs block merge.
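For intuition about what a sign-off check looks for, here is a minimal sketch. It is not the implementation of the dco job, whose exact rules are not described here (a real check may, for example, also compare the trailer against the commit author). has_signoff and unsigned_commits are hypothetical helpers that only test for a non-empty Signed-off-by: line.

```rust
/// Returns true when the commit message carries a non-empty
/// `Signed-off-by:` trailer line.
fn has_signoff(message: &str) -> bool {
    message.lines().any(|line| {
        line.strip_prefix("Signed-off-by:")
            .map_or(false, |rest| !rest.trim().is_empty())
    })
}

/// Given (sha, message) pairs, returns the shas that are missing a sign-off.
fn unsigned_commits<'a>(commits: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    commits
        .iter()
        .filter(|(_, message)| !has_signoff(message))
        .map(|(sha, _)| *sha)
        .collect()
}
```

Fed the commits of a PR, a non-empty result from unsigned_commits corresponds to the case where the check fails and the commit needs to be redone with -s.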
The toolchain is pinned: Rust 1.95.0, Node 22, cargo-nextest@0.9.132. CI uses sccache over GitHub Actions cache for warm compile caches.
The integration / nightly workflow with the 10-consecutive-runs crash-recovery gate is described in TEST_STRATEGY.md § CI matrix and is operator-managed (not yet wired as a separate workflow file at V1.0.0).
The musl release build runs out-of-CI on tag, via cross build --release --target x86_64-unknown-linux-musl. This is the path used to produce the binary attached to GitHub Releases.
Common pitfalls
- cargo test instead of cargo nextest run. The gate is nextest. Some race-sensitive tests pass under nextest's per-test process isolation and fail under cargo test. Always use nextest before pushing.
- Forgetting to build the frontend. cargo build --release in pharlux-bin will fail with a rust-embed error if pharlux-ui/dist/ is missing or empty. Run npm ci && npm run build in pharlux-ui/ first.
- Shared /tmp/pharlux-test/... path between tests. Two test processes running in parallel under nextest will collide. Use tempfile::tempdir() for everything.
- Forgetting the SPDX header. Every new .rs file needs // SPDX-License-Identifier: AGPL-3.0-only at the top. Clippy doesn't enforce this; reviewers do.
- Forgetting git commit -s. The DCO check fails the PR. Easiest fix: re-do the commit with -s.
- Using RUST_LOG=trace and missing the actual error in the noise. Filter narrowly: RUST_LOG=pharlux_store=trace,info.
- Adding a new dep to satisfy a test. Don't. Every dep needs a VERSIONS.md entry and human approval. Use what's already pinned, or escalate.
Where each test tier lives
| Tier | Location | Run with |
|---|---|---|
| Unit (inline) | <crate>/src/**/*.rs (#[cfg(test)]) | cargo nextest run -p <crate> |
| Integration (e2e binary) | pharlux-bin/tests/e2e.rs | cargo nextest run -p pharlux-bin --test e2e |
| Integration (storage) | pharlux-store/tests/storage_integration.rs | cargo nextest run -p pharlux-store --test storage_integration |
| Crash recovery | pharlux-store/tests/crash_recovery.rs | cargo nextest run --test crash_recovery (×10 for the gate) |
| Load | pharlux-loadtest/ (binary crate) | cargo run --release -p pharlux-loadtest -- ... |
| Frontend | pharlux-ui/ (Vite + TS) | npm run lint, npm run build |
For the rationale and invariants behind each tier, see TEST_STRATEGY.md. For the per-crate breakdown of where tests are concentrated, see crate-map.md.
Last updated: 2026-05-02.