<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Pharlux Blog</title>
        <link>https://pharlux.com/blog</link>
        <description>Pharlux Blog</description>
        <lastBuildDate>Tue, 05 May 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Pharlux © 2026 Veltara Works · AGPL-3.0 + Commercial</copyright>
        <item>
            <title><![CDATA[Running Pharlux on a $20/month VPS]]></title>
            <link>https://pharlux.com/blog/pharlux-on-a-20-dollar-vps</link>
            <guid>https://pharlux.com/blog/pharlux-on-a-20-dollar-vps</guid>
            <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A practical setup guide for running Pharlux on a single sub-$25/month VPS — one binary, one config, one systemd unit, no Docker.]]></description>
            <content:encoded><![CDATA[<p><em>Last updated: 2026-05-05 · Pharlux v1.0.0 · By Ian Holt</em></p>
<p>If your Datadog bill is creeping past the comfort line — or if your Loki + Mimir + Tempo + Grafana + Alertmanager stack has become a part-time job you didn't sign up for — there <em>is</em> a third option.</p>
<p>This is the practical setup guide for running Pharlux on a single sub-$25/month VPS: one binary, one config file, one systemd unit, no Docker, no external databases beyond embedded SQLite. By the end you will have an OpenTelemetry-native ingestion endpoint (OTLP gRPC and HTTP/protobuf), durable storage, and SQL across metrics and logs through Apache DataFusion. Setup is about thirty minutes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-honest-math">The honest math<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#the-honest-math" class="hash-link" aria-label="Direct link to The honest math" title="Direct link to The honest math" translate="no">​</a></h2>
<p>Three options for a small team running 5 services:</p>
<table><thead><tr><th>Option</th><th>Monthly cost</th><th>Operational overhead</th></tr></thead><tbody><tr><td>Datadog Pro across 5 hosts (with logs + APM)</td><td>Low hundreds of dollars per month, varying with retention, log volume, and APM coverage</td><td>Low: the vendor operates the platform; you operate the bill</td></tr><tr><td>Self-operated LGTM stack on a 4 vCPU / 16 GB VPS</td><td>~$50–80 for the box, plus the engineer-hours to operate it</td><td>High: Loki + Mimir + Tempo + Grafana + Alertmanager means 5 components, 5 configs, and 5 upgrade cycles, and every service you add means more to operate</td></tr><tr><td>Pharlux Community on a 4 GB / 2 vCPU VPS</td><td>~$15–25 for the box, $0 for the software (AGPL-3.0)</td><td>Low: single binary, single config, single systemd unit</td></tr></tbody></table>
<p>(The Datadog figures are deliberately given as a range. They are based on Datadog's published pricing for Infrastructure Pro, Log Management, and APM, and they move with host count, retention, and event volume; check Datadog's pricing page for current specifics.)</p>
<p>The cost gap is real. The operational gap is harder to prove — and that one comes down to design choices.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-a-4-gb--2-vcpu-vps-is-enough">Why a 4 GB / 2 vCPU VPS is enough<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#why-a-4-gb--2-vcpu-vps-is-enough" class="hash-link" aria-label="Direct link to Why a 4 GB / 2 vCPU VPS is enough" title="Direct link to Why a 4 GB / 2 vCPU VPS is enough" translate="no">​</a></h2>
<p>Pharlux's design centre is 1–10 services on a single VPS. The architecture pays for that:</p>
<ul>
<li class="">A single statically-linked Rust binary, ~85 MB on disk. No Docker daemon, no init system inside a container, no orchestrator. systemd starts it.</li>
<li class="">Embedded SQLite for metadata. No Postgres, no Kafka, no ClickHouse.</li>
<li class="">A custom write-ahead log (WAL) followed by per-signal Apache Parquet files on local disk. Frozen formats: WAL framing per ADR-0018, Parquet schemas per ADR-0003.</li>
<li class="">Apache DataFusion as the in-process query engine, capped at a 256 MB MemoryPool in V1 (ADR-0011) so a runaway query cannot OOM the box.</li>
<li class="">A custom DataFusion <code>TableProvider</code> that unions the live WAL with on-disk Parquet (ADR-0002) — freshly-ingested data is queryable without delay.</li>
<li class="">Memory-safe TLS via rustls. Zero OpenSSL in the dependency tree. The binary is genuinely static musl — no glibc surprise on the target VPS.</li>
</ul>
<p>Sustained load testing on a 4 vCPU / 8 GB VPS produced 577,000 metric points/sec over 17.36 million points, with zero errors and a 7 ms average request latency. The 4 GB / 2 vCPU tier recommended here handles considerably less than that, but it still covers a small team's production traffic with headroom to spare; the architectural ceiling sits well above small-team workloads.</p>
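<p>As a sanity check on those two figures, dividing total points by sustained throughput gives the length of the run: about 30 seconds. The function name below is only for illustration.</p>

```bash
#!/usr/bin/env bash
# Derive the benchmark duration from the two published figures:
# total points ingested divided by sustained throughput (points/sec).
benchmark_seconds() {
  local total_points=$1 points_per_sec=$2
  awk -v t="$total_points" -v r="$points_per_sec" 'BEGIN { printf "%.1f\n", t / r }'
}

benchmark_seconds 17360000 577000   # prints 30.1
```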
<p>Where you would outgrow this setup: 250+ hosts, multi-region deployments, or dedicated SaaS-grade isolation with regulatory attestations. Pharlux V1 does not try to cover those cases.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="picking-a-provider">Picking a provider<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#picking-a-provider" class="hash-link" aria-label="Direct link to Picking a provider" title="Direct link to Picking a provider" translate="no">​</a></h2>
<p>Any reputable provider with a 4 GB / 2 vCPU tier will run this fine. Approximate monthly pricing for that tier as of writing:</p>
<table><thead><tr><th>Provider</th><th>Plan</th><th>Approximate monthly cost</th></tr></thead><tbody><tr><td>Hetzner Cloud</td><td>CX22</td><td>~€6</td></tr><tr><td>OVHCloud</td><td>VPS Comfort</td><td>~$15</td></tr><tr><td>BinaryLane</td><td>std-2vcpu</td><td>~$20</td></tr><tr><td>DigitalOcean</td><td>Premium AMD 4 GB</td><td>$24</td></tr><tr><td>Linode (Akamai)</td><td>Shared 4 GB</td><td>$24</td></tr><tr><td>Vultr</td><td>Cloud Compute 4 GB</td><td>$24</td></tr></tbody></table>
<p>This is not a recommendation — Pharlux runs the same way on any of them. <em>(Disclosure: Veltara Works dogfoods on BinaryLane.)</em> Pick whichever you already trust, in the region closest to whatever you are observing.</p>
<p>What to check when choosing:</p>
<ul>
<li class=""><strong>Network egress pricing</strong> — Pharlux is the receiving side of OTLP traffic, so ingress is free almost everywhere. Outbound is mostly your dashboard sessions.</li>
<li class=""><strong>Snapshot or backup pricing</strong> — see the FAQ section below.</li>
<li class=""><strong>IPv6 availability</strong> — useful for OTLP from IPv6-only services.</li>
<li class=""><strong>Region proximity</strong> — keep the box close to your services to keep OTLP latency low.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="30-minute-setup">30-minute setup<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#30-minute-setup" class="hash-link" aria-label="Direct link to 30-minute setup" title="Direct link to 30-minute setup" translate="no">​</a></h2>
<p>The canonical reference is the <a class="" href="https://pharlux.com/docs/getting-started/">Getting Started guide</a> — this section is the abbreviated version.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-provision-the-vps">1. Provision the VPS<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#1-provision-the-vps" class="hash-link" aria-label="Direct link to 1. Provision the VPS" title="Direct link to 1. Provision the VPS" translate="no">​</a></h3>
<p>Ubuntu 24.04 LTS is the supported baseline. Other recent systemd-based distributions work too: the binary is statically linked against musl, so the host's glibc version is not a constraint.</p>
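<p>A preflight check before downloading can save a confusing failure later. This is a sketch under two assumptions: that you want the <code>pharlux-linux-amd64</code> asset used in the next step (so the host should report <code>x86_64</code>), and that the usual <code>/run/systemd/system</code> test is enough to confirm systemd is the init system. Both function names are hypothetical.</p>

```bash
#!/usr/bin/env bash
# Preflight sketch. Assumption: the asset you will download is
# pharlux-linux-amd64, so the CPU architecture should be x86_64.
check_arch() {
  local arch=${1:-$(uname -m)}
  case "$arch" in
    x86_64|amd64) echo "ok: $arch matches pharlux-linux-amd64" ;;
    *) echo "not amd64: $arch (check the release page before continuing)" >&2
       return 1 ;;
  esac
}

# /run/systemd/system exists only when systemd is the running init
# system (the same test sd_booted(3) uses); the install step needs it.
check_systemd() {
  if [ -d /run/systemd/system ]; then
    echo "ok: systemd is running"
  else
    echo "systemd not detected" >&2
    return 1
  fi
}

# Usage on the VPS:
#   check_arch && check_systemd
```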
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-download-the-v100-binary">2. Download the v1.0.0 binary<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#2-download-the-v100-binary" class="hash-link" aria-label="Direct link to 2. Download the v1.0.0 binary" title="Direct link to 2. Download the v1.0.0 binary" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> </span><span class="token function" style="color:rgb(220, 220, 170)">curl</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-fSL</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-o</span><span class="token plain"> /usr/local/bin/pharlux </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  https://github.com/Veltara-Works/pharlux/releases/download/v1.0.0/pharlux-linux-amd64</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> </span><span class="token function" style="color:rgb(220, 220, 170)">chmod</span><span class="token plain"> +x /usr/local/bin/pharlux</span><br></div></code></pre></div></div>
<p>The download is one file, ~85 MB. Verify the checksum from the release page:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token comment" style="color:rgb(106, 153, 85)"># The .sha256 file may name the release asset rather than the installed path,</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token comment" style="color:rgb(106, 153, 85)"># so extract the hash and check it against /usr/local/bin/pharlux directly.</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">expected=$(</span><span class="token function" style="color:rgb(220, 220, 170)">curl</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-fsSL</span><span class="token plain"> https://github.com/Veltara-Works/pharlux/releases/download/v1.0.0/pharlux-linux-amd64.sha256 </span><span class="token operator" style="color:rgb(212, 212, 212)">|</span><span class="token plain"> </span><span class="token function" style="color:rgb(220, 220, 170)">awk</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">'{print $1}'</span><span class="token plain">)</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token builtin class-name" style="color:rgb(78, 201, 176)">echo</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"</span><span class="token string variable" style="color:rgb(156, 220, 254)">$expected</span><span class="token string" style="color:rgb(206, 145, 120)">  /usr/local/bin/pharlux"</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">|</span><span class="token plain"> sha256sum </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-c</span><span class="token plain"> -</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-install-the-systemd-unit">3. Install the systemd unit<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#3-install-the-systemd-unit" class="hash-link" aria-label="Direct link to 3. Install the systemd unit" title="Direct link to 3. Install the systemd unit" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> pharlux </span><span class="token function" style="color:rgb(220, 220, 170)">install</span><br></div></code></pre></div></div>
<p>That single command writes <code>/etc/systemd/system/pharlux.service</code> with <code>MemoryMax=1G</code>, <code>Restart=always</code>, <code>DynamicUser=yes</code>, <code>StateDirectory=pharlux</code>, and <code>ConfigurationDirectory=pharlux</code>. It also creates <code>/etc/pharlux/jwt.secret</code> (64 random bytes, mode <code>0640</code>) and prepares <code>/var/lib/pharlux/</code> for the systemd-managed dynamic UID. No host-level <code>pharlux</code> user is needed.</p>
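<p>To confirm the generated unit matches that description, a small check can look for each directive. This is a hypothetical helper, not part of the Pharlux CLI, and it assumes one <code>Key=Value</code> per line with no spaces around <code>=</code>. If you prefer the service manager's view, <code>systemctl show pharlux -p Restart -p DynamicUser</code> reports the effective values instead.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a systemd unit file contains the directives
# `pharlux install` is described as writing. Prints any that are missing
# and returns non-zero if at least one is absent.
check_unit_directives() {
  local unit_file=$1 missing=0 directive
  for directive in MemoryMax=1G Restart=always DynamicUser=yes \
                   StateDirectory=pharlux ConfigurationDirectory=pharlux; do
    # -x matches the whole line; -F treats the directive as a literal string.
    if ! grep -qxF "$directive" "$unit_file"; then
      echo "missing: $directive"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the VPS:
#   check_unit_directives /etc/systemd/system/pharlux.service
```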
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-optional-configure">4. (Optional) Configure<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#4-optional-configure" class="hash-link" aria-label="Direct link to 4. (Optional) Configure" title="Direct link to 4. (Optional) Configure" translate="no">​</a></h3>
<p>A zero-length <code>/etc/pharlux/pharlux.toml</code> is a valid configuration — every key has a sensible default, including binding <code>0.0.0.0</code> on ports <code>3100</code> (HTTP/UI), <code>4317</code> (OTLP gRPC), and <code>4318</code> (OTLP HTTP). Override only what you need to change. The full key reference is at <a class="" href="https://pharlux.com/docs/getting-started/"><code>/docs/getting-started/</code></a> and <a class="" href="https://pharlux.com/docs/sizing-guide/"><code>/docs/sizing-guide/</code></a>.</p>
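<p>Once the service is running (step 6), you can confirm the default listeners are up. The sketch below is a hypothetical helper that reads the output of <code>ss -ltnH</code> (listening TCP sockets, numeric, no header) and reports any of the three default ports that has no listener. It assumes you have not overridden the ports in <code>pharlux.toml</code>.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltnH` output on stdin and print any of the
# default Pharlux ports that has no listener; returns non-zero if one is missing.
missing_ports() {
  awk -v want="3100 4317 4318" '
    {
      addr = $4                      # local address, e.g. 0.0.0.0:4317 or [::]:4317
      sub(/.*:/, "", addr)           # keep only the text after the last colon
      listening[addr] = 1
    }
    END {
      n = split(want, ports, " ")
      rc = 0
      for (i = 1; i <= n; i++)
        if (!(ports[i] in listening)) { print "not listening: " ports[i]; rc = 1 }
      exit rc
    }'
}

# Usage on the VPS:
#   ss -ltnH | missing_ports && echo "all default ports are up"
```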
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-open-the-otlp-and-ui-ports">5. Open the OTLP and UI ports<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#5-open-the-otlp-and-ui-ports" class="hash-link" aria-label="Direct link to 5. Open the OTLP and UI ports" title="Direct link to 5. Open the OTLP and UI ports" translate="no">​</a></h3>
<p>Pharlux relies on a reverse proxy for TLS termination. For OTLP exposed to the public internet, run it behind nginx, Caddy, or Cloudflare Tunnel; on a private network, internal-only access is fine. The rules below admit traffic on all three ports from any source address.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> ufw allow </span><span class="token number" style="color:rgb(181, 206, 168)">3100</span><span class="token plain">/tcp comment </span><span class="token string" style="color:rgb(206, 145, 120)">'Pharlux UI/API'</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> ufw allow </span><span class="token number" style="color:rgb(181, 206, 168)">4317</span><span class="token plain">/tcp comment </span><span class="token string" style="color:rgb(206, 145, 120)">'OTLP gRPC'</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> ufw allow </span><span class="token number" style="color:rgb(181, 206, 168)">4318</span><span class="token plain">/tcp comment </span><span class="token string" style="color:rgb(206, 145, 120)">'OTLP HTTP'</span><br></div></code></pre></div></div>
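<p>If your OTLP senders all live on a known private network, a tighter alternative is to admit the ports only from that range. The hypothetical helper below prints the <code>ufw</code> commands for review rather than running them; <code>10.0.0.0/8</code> in the usage line is a placeholder for your own subnet.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ufw rules that admit the Pharlux ports only
# from one source network, so they can be reviewed before being applied.
print_scoped_ufw_rules() {
  local cidr=$1 port
  for port in 3100 4317 4318; do
    echo "sudo ufw allow from $cidr to any port $port proto tcp"
  done
}

# Usage: review the output first, then pipe it to a shell to apply it.
#   print_scoped_ufw_rules 10.0.0.0/8
#   print_scoped_ufw_rules 10.0.0.0/8 | sh
```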
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-start-it-and-create-the-bootstrap-admin">6. Start it and create the bootstrap admin<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#6-start-it-and-create-the-bootstrap-admin" class="hash-link" aria-label="Direct link to 6. Start it and create the bootstrap admin" title="Direct link to 6. Start it and create the bootstrap admin" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> systemctl daemon-reload</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> systemctl </span><span class="token builtin class-name" style="color:rgb(78, 201, 176)">enable</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--now</span><span class="token plain"> pharlux</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token comment" style="color:rgb(106, 153, 85)"># Confirm the service is healthy (no auth required for /health):</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">curl</span><span class="token plain"> http://localhost:3100/api/v1/health</span><br></div></code></pre></div></div>
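<p>Right after <code>systemctl enable --now</code>, the first health check may run before the listener is up. A small retry wrapper (a hypothetical helper, not something Pharlux ships) keeps a transient connection error from looking like a failed install:</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between
# attempts, and succeed as soon as one attempt succeeds.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# Usage on the VPS (-f makes curl fail on HTTP errors, -sS hides progress
# but still shows errors):
#   retry 10 1 curl -fsS http://localhost:3100/api/v1/health
```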
<p>The bootstrap admin is created via the host CLI (the operator-trust path); subsequent users are added through the API by an existing admin.</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> systemctl stop pharlux</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token comment" style="color:rgb(106, 153, 85)"># Prompt for the password without echoing it; the echo restores the newline.</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token builtin class-name" style="color:rgb(78, 201, 176)">read</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-rsp</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">'Admin password: '</span><span class="token plain"> PHARLUX_PW</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"> </span><span class="token builtin class-name" style="color:rgb(78, 201, 176)">echo</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> pharlux user </span><span class="token function" style="color:rgb(220, 220, 170)">add</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--username</span><span class="token plain"> admin </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--password</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"</span><span class="token string variable" style="color:rgb(156, 220, 254)">$PHARLUX_PW</span><span class="token string" style="color:rgb(206, 145, 120)">"</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">--admin</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">sudo</span><span class="token plain"> systemctl start pharlux</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token builtin class-name" style="color:rgb(78, 201, 176)">unset</span><span class="token plain"> PHARLUX_PW</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-point-your-opentelemetry-collector-at-it">7. Point your OpenTelemetry Collector at it<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#7-point-your-opentelemetry-collector-at-it" class="hash-link" aria-label="Direct link to 7. Point your OpenTelemetry Collector at it" title="Direct link to 7. Point your OpenTelemetry Collector at it" translate="no">​</a></h3>
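<p>Sample exporter and pipeline sections for <code>otel-collector.yaml</code>. The pipelines reference an <code>otlp</code> receiver, so this assumes your Collector config already defines one:</p>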
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token key atrule">exporters</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">otlp/pharlux</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">endpoint</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"your-vps-ip:4317"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">tls</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">insecure</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token boolean important">true</span><span class="token plain">   </span><span class="token comment" style="color:rgb(106, 153, 85)"># only on a trusted network; use TLS in production</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div 
class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token key atrule">service</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token key atrule">pipelines</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">metrics</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">receivers</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">[</span><span class="token plain">otlp</span><span class="token punctuation" style="color:rgb(212, 212, 212)">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">exporters</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">[</span><span class="token plain">otlp/pharlux</span><span class="token punctuation" style="color:rgb(212, 212, 212)">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token key atrule">logs</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span 
class="token key atrule">receivers</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">[</span><span class="token plain">otlp</span><span class="token punctuation" style="color:rgb(212, 212, 212)">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">      </span><span class="token key atrule">exporters</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">[</span><span class="token plain">otlp/pharlux</span><span class="token punctuation" style="color:rgb(212, 212, 212)">]</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-verify-it-is-working">8. Verify it is working<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#8-verify-it-is-working" class="hash-link" aria-label="Direct link to 8. Verify it is working" title="Direct link to 8. Verify it is working" translate="no">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token comment" style="color:rgb(106, 153, 85)"># Log in as the bootstrap admin to get a JWT.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token assign-left variable" style="color:rgb(156, 220, 254)">TOKEN</span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token variable" style="color:rgb(156, 220, 254)">$(</span><span class="token variable function" style="color:rgb(220, 220, 170)">curl</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable parameter variable" style="color:rgb(156, 220, 254)">-s</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable parameter variable" style="color:rgb(156, 220, 254)">-X</span><span class="token variable" style="color:rgb(156, 220, 254)"> POST http://your-vps-ip:3100/api/v1/auth/login </span><span class="token variable punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token variable" style="color:rgb(156, 220, 254)"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token variable" style="color:rgb(156, 220, 254)">  </span><span class="token variable parameter variable" style="color:rgb(156, 220, 254)">-H</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable string" style="color:rgb(206, 145, 120)">"Content-Type: application/json"</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable 
punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token variable" style="color:rgb(156, 220, 254)"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token variable" style="color:rgb(156, 220, 254)">  </span><span class="token variable parameter variable" style="color:rgb(156, 220, 254)">-d</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable string" style="color:rgb(206, 145, 120)">'{"username":"admin","password":"your-password"}'</span><span class="token variable" style="color:rgb(156, 220, 254)"> </span><span class="token variable operator" style="color:rgb(212, 212, 212)">|</span><span class="token variable" style="color:rgb(156, 220, 254)"> jq </span><span class="token variable parameter variable" style="color:rgb(156, 220, 254)">-r</span><span class="token variable" style="color:rgb(156, 220, 254)"> .token</span><span class="token variable" style="color:rgb(156, 220, 254)">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token comment" style="color:rgb(106, 153, 85)"># Run a query.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token function" style="color:rgb(220, 220, 170)">curl</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-s</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-X</span><span class="token plain"> POST http://your-vps-ip:3100/api/v1/query </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token 
plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-H</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"Content-Type: application/json"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-H</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">"Authorization: Bearer </span><span class="token string variable" style="color:rgb(156, 220, 254)">$TOKEN</span><span class="token string" style="color:rgb(206, 145, 120)">"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(156, 220, 254)">-d</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">'{"sql":"SELECT count(*) FROM metrics WHERE timestamp &gt; now() - INTERVAL '</span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain">'</span><span class="token string" style="color:rgb(206, 145, 120)">'5 minutes'</span><span class="token punctuation" style="color:rgb(212, 212, 212)">\</span><span class="token plain">'</span><span class="token string" style="color:rgb(206, 145, 120)">'"}'</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">|</span><span class="token plain"> jq </span><span class="token builtin class-name" style="color:rgb(78, 201, 176)">.</span><br></div></code></pre></div></div>
<p>If you have been sending OTLP for at least a minute, you should see a non-zero count.</p>
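<p>The <code>'\''</code> quoting in that query gets fiddly as soon as you edit the SQL. One option is a small helper that builds the <code>{"sql": ...}</code> request body for you. <code>sql_body</code> is a hypothetical name; it escapes backslashes and double quotes and assumes single-line SQL. If <code>jq</code> is installed, <code>jq -n --arg sql "$SQL" '{sql: $sql}'</code> produces an equivalent body with full JSON escaping.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a single-line SQL string in the {"sql": ...}
# body used by /api/v1/query. Backslashes are escaped first, then double
# quotes, so the result is valid JSON for single-line input.
sql_body() {
  local sql=$1
  sql=${sql//\\/\\\\}
  sql=${sql//\"/\\\"}
  printf '{"sql":"%s"}\n' "$sql"
}

# Usage (TOKEN obtained from the login call above):
#   curl -s -X POST http://your-vps-ip:3100/api/v1/query \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $TOKEN" \
#     -d "$(sql_body "SELECT count(*) FROM metrics WHERE timestamp > now() - INTERVAL '5 minutes'")" | jq .
```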
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-you-will-see">What you will see<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#what-you-will-see" class="hash-link" aria-label="Direct link to What you will see" title="Direct link to What you will see" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Pharlux dashboard with the eight metric series from a 30-second OTLP load test — top metrics by count chart on the left, log severity distribution on the right (empty here because the test sent only metrics), recent logs panel below." src="https://pharlux.com/assets/images/dashboard-fc30173c16af8c12ab0dcfd9f12e5dcb.png" width="1440" height="900" class="img_ev3q"></p>
<p>The shipped UI handles ad-hoc SQL, saved queries, and basic dashboards. Cross-signal queries — for example, joining metrics and logs by <code>trace_id</code> — are one-line SQL through the DataFusion <code>TableProvider</code>. The same data is queryable through the HTTP API by any tool that can speak HTTP and SQL.</p>
<p><img decoding="async" loading="lazy" alt="Pharlux SQL Query view — SELECT name, count(*) AS cnt FROM metrics GROUP BY name ORDER BY cnt DESC returns 8 rows in a few hundred milliseconds against ~60k metric points." src="https://pharlux.com/assets/images/query-results-006d538c4127163c824f1b5a9cda481d.png" width="1440" height="900" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="12-month-cost-summary">12-month cost summary<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#12-month-cost-summary" class="hash-link" aria-label="Direct link to 12-month cost summary" title="Direct link to 12-month cost summary" translate="no">​</a></h2>
<p>For a small team running 5 services on a single 4 GB / 2 vCPU VPS:</p>
<table><thead><tr><th>Line item</th><th>Monthly</th><th>Annual</th></tr></thead><tbody><tr><td>VPS rental (4 GB / 2 vCPU)</td><td>~$20</td><td>~$240</td></tr><tr><td>Pharlux Community (AGPL-3.0)</td><td>$0</td><td>$0</td></tr><tr><td>Optional snapshot backups</td><td>~$2</td><td>~$24</td></tr><tr><td><strong>Total</strong></td><td><strong>~$22</strong></td><td><strong>~$264</strong></td></tr></tbody></table>
<p>For comparison, a Datadog Pro deployment for the same 5 services with logs and APM lands in the low-hundreds-of-dollars-per-month range, depending on host count, log volume, and APM coverage. The cost-reduction ratio for the small-team scenario typically falls between 10× and 25×.</p>
<p>We cite ranges for Datadog because those numbers move. We cite the Pharlux number exactly because it does not: $0 is $0, and the VPS line is whatever your provider charges.</p>
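<p>The table's arithmetic, and what the 10× to 25× ratio implies when read in reverse, can be checked in a few lines of Rust. The constants are the table's approximate figures and the function names are illustrative only. At ~$22/month, a 10× ratio corresponds to roughly $220/month on the alternative and 25× to roughly $550/month.</p>

```rust
// Approximate monthly line items from the cost table, in US dollars.
const VPS_MONTHLY: f64 = 20.0;
const PHARLUX_COMMUNITY_MONTHLY: f64 = 0.0;
const SNAPSHOT_BACKUPS_MONTHLY: f64 = 2.0;

/// Sum of the monthly line items.
pub fn monthly_total() -> f64 {
    VPS_MONTHLY + PHARLUX_COMMUNITY_MONTHLY + SNAPSHOT_BACKUPS_MONTHLY
}

/// Twelve months of a monthly figure.
pub fn annual(monthly: f64) -> f64 {
    monthly * 12.0
}

/// Monthly spend on the alternative implied by a given cost-reduction ratio.
pub fn implied_alternative_monthly(pharlux_monthly: f64, ratio: f64) -> f64 {
    pharlux_monthly * ratio
}

/// One-line comparison for a given ratio.
pub fn summary(ratio: f64) -> String {
    let ours = monthly_total();
    let theirs = implied_alternative_monthly(ours, ratio);
    format!(
        "Pharlux ~${ours:.0}/mo (~${:.0}/yr) vs ~${theirs:.0}/mo (~${:.0}/yr) at {ratio:.0}x",
        annual(ours),
        annual(theirs)
    )
}
```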
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="honest-limitations">Honest limitations<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#honest-limitations" class="hash-link" aria-label="Direct link to Honest limitations" title="Direct link to Honest limitations" translate="no">​</a></h2>
<p>What you give up at this scale, in V1:</p>
<ul>
<li class=""><strong>No traces yet.</strong> Traces ship in V1.1 (ADR-0005). If you need distributed tracing today, run Pharlux for metrics + logs and a separate trace store in parallel.</li>
<li class=""><strong>No PromQL yet.</strong> Queries are SQL via DataFusion in V1; PromQL ships in V1.1 (ADR-0005). If your team has months of PromQL muscle memory, that is a real switching cost.</li>
<li class=""><strong>Single-VPS architecture.</strong> Multi-VPS clustering and the S3 cold tier are V1.1+ commercial-tier features. The Scale tier ceiling is 250 hosts. If you require more hosts, <a href="mailto:licensing@pharlux.com?subject=Pharlux%20enquiry%20%E2%80%94%20more%20than%20250%20hosts" target="_blank" rel="noopener noreferrer" class="">send us a message</a>.</li>
<li class=""><strong>No managed cloud option.</strong> Pharlux is self-hosted-first. If you need a fully-managed SaaS with credit-card sign-up, that is not the V1 product.</li>
<li class=""><strong>No SAML / OIDC / LDAP in Community.</strong> Those are commercial-tier features. JWT and admin/read-only auth are in V1 Community.</li>
</ul>
<p>There are also scenarios where Pharlux is genuinely the wrong choice. If your team has months of Grafana dashboards-as-code investment and a dedicated SRE who enjoys operating the LGTM stack, the migration cost outweighs the operational savings. If you need traces today, see above. If your shop is large enough to have a Datadog budget and uses every feature of the platform, the cost framing of this post does not apply to you.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="frequently-asked-questions">Frequently asked questions<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#frequently-asked-questions" class="hash-link" aria-label="Direct link to Frequently asked questions" title="Direct link to Frequently asked questions" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="will-a-20month-vps-actually-handle-my-workload">Will a $20/month VPS actually handle my workload?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#will-a-20month-vps-actually-handle-my-workload" class="hash-link" aria-label="Direct link to Will a $20/month VPS actually handle my workload?" title="Direct link to Will a $20/month VPS actually handle my workload?" translate="no">​</a></h3>
<p>For 1–10 services with typical metric and log volume, yes. The 4 GB / 2 vCPU tier is the documented minimum spec. Teams operating closer to the upper end of that range — 8–10 active services with high-cardinality metrics — should consider the next tier up (8 GB / 4 vCPU, around $30–50/month depending on provider).</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="can-i-run-other-things-on-the-same-vps">Can I run other things on the same VPS?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#can-i-run-other-things-on-the-same-vps" class="hash-link" aria-label="Direct link to Can I run other things on the same VPS?" title="Direct link to Can I run other things on the same VPS?" translate="no">​</a></h3>
<p>Yes. Pharlux's memory budget is bounded — DataFusion is capped at a 256 MB MemoryPool in V1 (ADR-0011), and the rest of the binary's working set is small. The original design target was deliberately co-tenancy-friendly. If the VPS already runs a small web service or two, Pharlux fits alongside them.</p>
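<p>DataFusion enforces a cap like this through a memory pool (for example <code>GreedyMemoryPool</code>) that operators reserve from before buffering data. The sketch below is neither DataFusion's API nor Pharlux code; it is a std-only illustration of the same reservation idea, with hypothetical names (<code>BoundedPool</code>, <code>try_grow</code>, <code>shrink</code>), and it assumes "256 MB" means 256 MiB.</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// 256 MB budget, read here as 256 MiB (an assumption about ADR-0011's unit).
pub const POOL_LIMIT: usize = 256 * 1024 * 1024;

/// Error returned when a reservation would exceed the pool's limit.
#[derive(Debug, PartialEq)]
pub struct ResourcesExhausted {
    pub requested: usize,
    pub available: usize,
}

/// A fixed memory budget shared by all queries. Callers reserve bytes before
/// buffering data and release them afterwards; a reservation that would push
/// the total past `limit` is refused instead of growing the process.
pub struct BoundedPool {
    limit: usize,
    used: AtomicUsize,
}

impl BoundedPool {
    pub fn new(limit: usize) -> Self {
        BoundedPool { limit, used: AtomicUsize::new(0) }
    }

    /// Atomically reserve `bytes` if the new total stays within `limit`.
    pub fn try_grow(&self, bytes: usize) -> Result<(), ResourcesExhausted> {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                used.checked_add(bytes).filter(|&total| total <= self.limit)
            })
            .map(|_| ())
            .map_err(|used| ResourcesExhausted {
                requested: bytes,
                available: self.limit - used,
            })
    }

    /// Release `bytes` previously reserved with `try_grow`.
    /// Releasing more than was reserved would wrap the counter; callers must not.
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::AcqRel);
    }

    /// Bytes currently reserved.
    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
}
```

<p>With a fixed cap, a heavy query is refused memory rather than growing the process, which is what keeps neighbouring services on the same VPS predictable.</p>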
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-is-the-backup-story">What is the backup story?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#what-is-the-backup-story" class="hash-link" aria-label="Direct link to What is the backup story?" title="Direct link to What is the backup story?" translate="no">​</a></h3>
<p>Pharlux's state lives in two places: the Parquet directory on disk, and the embedded SQLite metadata file. A nightly filesystem snapshot at the provider level (most VPS providers offer this for a few dollars a month) is the cheapest workable backup. For finer-grained backups, the Parquet files are append-only after rotation — <code>rsync</code> to off-box storage works well.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-do-i-upgrade-pharlux">How do I upgrade Pharlux?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#how-do-i-upgrade-pharlux" class="hash-link" aria-label="Direct link to How do I upgrade Pharlux?" title="Direct link to How do I upgrade Pharlux?" translate="no">​</a></h3>
<p><code>systemctl stop pharlux</code>, replace the binary, <code>systemctl start pharlux</code>. The WAL format is frozen (ADR-0018) and Parquet schemas are frozen (ADR-0003), so V1.x patch upgrades and the V1.0 → V1.1 transition are binary swaps, not data migrations.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-if-i-outgrow-the-20month-tier">What if I outgrow the $20/month tier?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#what-if-i-outgrow-the-20month-tier" class="hash-link" aria-label="Direct link to What if I outgrow the $20/month tier?" title="Direct link to What if I outgrow the $20/month tier?" translate="no">​</a></h3>
<p>Move to the 8 GB / 4 vCPU tier on the same provider. The systemd unit and config carry over; the data directory is rsync-able. If you outgrow that, the commercial tiers — Team ($49/month, 10 hosts), Business ($199/month, 50 hosts), Scale ($899/month, 250 hosts) — extend retention, add SAML / OIDC / LDAP, audit log, white-label, and S3 cold tier as you grow.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="can-i-migrate-from-grafana--loki-gradually">Can I migrate from Grafana + Loki gradually?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#can-i-migrate-from-grafana--loki-gradually" class="hash-link" aria-label="Direct link to Can I migrate from Grafana + Loki gradually?" title="Direct link to Can I migrate from Grafana + Loki gradually?" translate="no">​</a></h3>
<p>Yes. Point your OpenTelemetry Collector at both Pharlux and your existing stack in parallel. Verify Pharlux is seeing the data and the queries you care about work. Cut over alerting last. There is no big-bang migration.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="is-the-community-tier-really-free-forever">Is the Community tier really free forever?<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#is-the-community-tier-really-free-forever" class="hash-link" aria-label="Direct link to Is the Community tier really free forever?" title="Direct link to Is the Community tier really free forever?" translate="no">​</a></h3>
<p>Yes. The Community edition is AGPL-3.0 and unmetered: run it at any scale. The commercial tiers exist for organisations that want SAML, audit log, white-label, S3 cold tier, or support. Pharlux is dual-licensed (ADR-0022), and a commercial licence replaces the AGPL terms for organisations that need that.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-pharlux">Get Pharlux<a href="https://pharlux.com/blog/pharlux-on-a-20-dollar-vps#get-pharlux" class="hash-link" aria-label="Direct link to Get Pharlux" title="Direct link to Get Pharlux" translate="no">​</a></h2>
<ul>
<li class=""><strong>Download v1.0.0</strong> — <a href="https://github.com/Veltara-Works/pharlux/releases/tag/v1.0.0" target="_blank" rel="noopener noreferrer" class="">github.com/Veltara-Works/pharlux/releases/tag/v1.0.0</a></li>
<li class=""><strong>Documentation</strong> — <a href="https://pharlux.com/docs/getting-started/" target="_blank" rel="noopener noreferrer" class="">pharlux.com/docs/getting-started/</a></li>
<li class=""><strong>Source</strong> — <a href="https://github.com/Veltara-Works/pharlux" target="_blank" rel="noopener noreferrer" class="">github.com/Veltara-Works/pharlux</a></li>
</ul>
<p>Pharlux is one of several developer tools built by <a href="https://veltaraworks.com/" target="_blank" rel="noopener noreferrer" class="">Veltara Works</a> — alongside email hosting, cloud infrastructure, and software license management. See <a href="https://veltaraworks.com/" target="_blank" rel="noopener noreferrer" class="">veltaraworks.com</a> for the full portfolio.</p>]]></content:encoded>
            <category>tutorial</category>
            <category>self-hosting</category>
            <category>observability</category>
            <category>otel</category>
        </item>
        <item>
            <title><![CDATA[How the WAL+Parquet union query works]]></title>
            <link>https://pharlux.com/blog/wal-parquet-union-query</link>
            <guid>https://pharlux.com/blog/wal-parquet-union-query</guid>
            <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A walkthrough of the Apache DataFusion TableProvider that lets Pharlux serve queries against in-memory WAL records and on-disk Parquet files as a single consistent view — the load-bearing piece behind sub-second freshness.]]></description>
            <content:encoded><![CDATA[<p><em>Last updated: 2026-05-05 · Pharlux v1.0.0 · By Ian Holt</em></p>
<p>The thing most observability platforms don't talk about: you cannot query data you have not persisted, and persisting hurts throughput. ClickHouse-style batch writers compress beautifully and scan fast, but typical batch intervals introduce tens of seconds of staleness before new data is queryable. In-memory buffers are immediate but lose data on crash. The two requirements — durability and freshness — pull in opposite directions, and most platforms pick one and document around the other.</p>
<p>Pharlux ships a different design. A custom Apache DataFusion <code>TableProvider</code> unions an in-memory WAL buffer with on-disk Apache Parquet files into one consistent view at query time. Freshly ingested data is queryable as soon as it lands in the WAL, with no flush wait and no buffer-window staleness, and it is queryable through the same SQL surface as historical data on disk.</p>
<!-- -->
<p>This is the load-bearing piece behind Pharlux's freshness story, and the riskiest decision we made in the whole architecture — risky enough that we held the proof-of-concept as a mandatory gate before writing any production code. It lives in <a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/pharlux-store/src/table_provider.rs" target="_blank" rel="noopener noreferrer" class=""><code>pharlux-store/src/table_provider.rs</code></a>. This post walks through the design, the runtime semantics, and the trade-offs.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-trade-off-concretely">The trade-off, concretely<a href="https://pharlux.com/blog/wal-parquet-union-query#the-trade-off-concretely" class="hash-link" aria-label="Direct link to The trade-off, concretely" title="Direct link to The trade-off, concretely" translate="no">​</a></h2>
<p>Three specific scenarios drive the design:</p>
<ol>
<li class=""><strong>Acknowledged-but-not-flushed data must survive a crash.</strong> When OTel Collectors send a batch and Pharlux returns 200 OK, that data is persisted. A <code>SIGKILL</code> immediately afterwards must not lose the batch.</li>
<li class=""><strong>Just-acknowledged data must be queryable.</strong> Operators run <code>pharlux user add</code>, restart a service, hit a dashboard. The dashboard query lands within seconds. The metric points just emitted by the restarted service must appear in that query — not "in 30 seconds when the next flush completes."</li>
<li class=""><strong>Compressed historical data must scan fast.</strong> Six months of metrics needs to live in a format the query engine can scan efficiently — column-pruned, predicate-pushed, page-skipped. That is what Parquet was designed for.</li>
</ol>
<p>A single storage format that satisfies all three is hard to find. Parquet is columnar, optimised for analytical scans, and not designed for high-frequency small writes. An append-only WAL is fast to write and easy to crash-recover, but it offers none of Parquet's compression or scan performance, which depend on larger, well-organised files. Pharlux uses both, with the WAL as the durable buffer and Parquet as the compressed steady state, which moves the unification problem to query time.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-naive-options-that-dont-work">The naive options that don't work<a href="https://pharlux.com/blog/wal-parquet-union-query#the-naive-options-that-dont-work" class="hash-link" aria-label="Direct link to The naive options that don't work" title="Direct link to The naive options that don't work" translate="no">​</a></h2>
<p>A few approaches the design considered and rejected:</p>
<ul>
<li class=""><strong>Query Parquet only, accept N-second staleness.</strong> Loses requirement #2. ClickHouse, SigNoz, and most batch-oriented platforms accept this. Pharlux's design centre — small teams debugging incidents in real time — does not.</li>
<li class=""><strong>Flush WAL → Parquet on every batch.</strong> Parquet writers have meaningful per-file overhead (footer encoding, schema serialisation, dictionary pages). Flushing on every batch destroys throughput and produces thousands of tiny Parquet files that are slow to scan.</li>
<li class=""><strong>In-memory buffer only, no on-disk WAL.</strong> Loses requirement #1. Any crash between ingest and the next flush loses data. Rejected explicitly in ADR-0018.</li>
<li class=""><strong>Two storage layers, two query paths.</strong> Some systems route "recent" queries to a hot store and "historical" queries to cold storage, requiring the user to know which is which. The user does not want to know which is which. They want one SQL surface.</li>
<li class=""><strong>A real database (PostgreSQL, ClickHouse, etc.).</strong> Violates the single-binary constraint. Pharlux is one statically-linked Rust binary; there is no PostgreSQL or ClickHouse process to run alongside it.</li>
</ul>
<p>The design that satisfies all three requirements is: keep the WAL durable on disk, keep recently-acknowledged records in an in-memory snapshot of that WAL, write Parquet files when the WAL exceeds a size or time threshold, and let the query engine see both at once.</p>
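<p>The flush trigger in that design (a size threshold or a time threshold, whichever is reached first) is small enough to sketch. The names <code>FlushPolicy</code> and <code>WalBuffer</code> are hypothetical, and this post does not state Pharlux's actual threshold values.</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical flush policy: rotate the WAL tail into a Parquet file once the
/// buffered bytes reach `max_bytes` or the oldest buffered record reaches
/// `max_age`, whichever happens first.
pub struct FlushPolicy {
    pub max_bytes: usize,
    pub max_age: Duration,
}

/// Bookkeeping for the records currently buffered in the WAL tail.
pub struct WalBuffer {
    bytes: usize,
    oldest: Option<Instant>,
}

impl WalBuffer {
    pub fn new() -> Self {
        WalBuffer { bytes: 0, oldest: None }
    }

    /// Account for an appended frame of `len` bytes written at `now`.
    pub fn append(&mut self, len: usize, now: Instant) {
        self.bytes += len;
        if self.oldest.is_none() {
            self.oldest = Some(now);
        }
    }

    /// Reset once the buffered records are safely in a Parquet file.
    pub fn mark_flushed(&mut self) {
        self.bytes = 0;
        self.oldest = None;
    }
}

impl FlushPolicy {
    /// True when either threshold is reached; never true for an empty buffer.
    pub fn should_flush(&self, buf: &WalBuffer, now: Instant) -> bool {
        match buf.oldest {
            None => false,
            Some(oldest) => {
                buf.bytes >= self.max_bytes || now.duration_since(oldest) >= self.max_age
            }
        }
    }
}
```

<p>The size bound keeps Parquet files large enough to compress and scan well (the failure of the flush-on-every-batch option above), while the time bound caps how long records sit only in the WAL.</p>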
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-split">The split<a href="https://pharlux.com/blog/wal-parquet-union-query#the-split" class="hash-link" aria-label="Direct link to The split" title="Direct link to The split" translate="no">​</a></h2>
<p>Pharlux runs two storage layers in the same process:</p>
<p><strong>The WAL</strong> is an append-only file on disk with a strict frame format (length-prefixed prost-encoded protobuf records, each followed by a CRC32; see <a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0018-wal-file-format-prost-crc32.md" target="_blank" rel="noopener noreferrer" class="">ADR-0018</a>). Every accepted batch is written to the WAL, under a configurable fsync policy, before the API returns 200 OK. On crash, replay reads the WAL forward, validates each record's CRC, truncates at the first invalid record (a partial write at the tail), and rebuilds the in-memory state. One of the V1 release gates is a hard crash-recovery test: 10 consecutive runs with zero flakes.</p>
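<p>The frame layout and the truncate-at-first-bad-record rule can be sketched in std-only Rust. This is an illustration under stated assumptions, not the ADR-0018 format: it assumes a little-endian <code>u32</code> length prefix, a CRC-32 (IEEE) computed over the payload only, and opaque payload bytes where Pharlux stores prost-encoded protobuf. The function names are hypothetical.</p>

```rust
/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), computed bit by bit.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Read a little-endian u32 at `at`, or None if fewer than 4 bytes remain.
fn read_u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Append one frame (assumed layout): u32 length, payload, u32 CRC of payload.
pub fn encode_frame(out: &mut Vec<u8>, payload: &[u8]) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&crc32(payload).to_le_bytes());
}

/// Replay frames from the start of `wal`, stopping at the first frame that is
/// incomplete (a torn write at the tail) or fails its CRC check. Returns the
/// valid payloads and the byte offset the file should be truncated to.
pub fn replay(wal: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while let Some(len) = read_u32_le(wal, pos) {
        let start = pos + 4;
        let end = start + len as usize;
        let payload = match wal.get(start..end) {
            Some(p) => p,
            None => break, // payload cut off
        };
        let stored_crc = match read_u32_le(wal, end) {
            Some(c) => c,
            None => break, // checksum cut off
        };
        if stored_crc != crc32(payload) {
            break; // corrupt record: it and everything after it are discarded
        }
        records.push(payload.to_vec());
        pos = end + 4;
    }
    (records, pos)
}
```

<p>In a real recovery path, the returned offset would presumably be passed to <code>File::set_len</code> so that new appends start directly after the last good record.</p>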
<p><strong>Parquet</strong> is the on-disk steady state. Per-signal schemas (metrics, logs, V1.1 traces) are frozen (<a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0003-separate-parquet-schemas-per-signal.md" target="_blank" rel="noopener noreferrer" class="">ADR-0003</a>). Files live under <code>/var/lib/pharlux/{metrics,logs}/{tenant_id}/YYYY/MM/DD/HH/</code> so retention and compaction can operate on time partitions without scanning the whole tree. Storage is columnar, with dictionary encoding for high-repeat fields like <code>name</code> and <code>scope_name</code> and timestamp-sorted row groups, so predicate pushdown on time ranges can skip whole row groups.</p>
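<p>Mapping a record to its hour partition is a pure function of a timestamp. The sketch below assumes partitions are keyed by the record's own timestamp in UTC (OTLP carries timestamps as Unix nanoseconds) with zero-padded path components; <code>partition_dir</code> is a hypothetical helper, and the date conversion follows Howard Hinnant's <code>civil_from_days</code> algorithm.</p>

```rust
/// Days since 1970-01-01 to a (year, month, day) civil date in the proleptic
/// Gregorian calendar, following Howard Hinnant's `civil_from_days`.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // month index with March = 0, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Hypothetical helper: the hour-partition directory for a record whose
/// timestamp is `unix_nanos` nanoseconds since the Unix epoch (UTC).
pub fn partition_dir(signal: &str, tenant_id: &str, unix_nanos: i64) -> String {
    let secs = unix_nanos.div_euclid(1_000_000_000);
    let days = secs.div_euclid(86_400);
    let hour = secs.rem_euclid(86_400) / 3_600;
    let (y, m, d) = civil_from_days(days);
    format!("/var/lib/pharlux/{signal}/{tenant_id}/{y:04}/{m:02}/{d:02}/{hour:02}/")
}
```

<p>Under that assumption, retention can work at directory granularity: every <code>HH/</code> directory older than the cutoff can be removed whole without opening any Parquet file.</p>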
<p><strong>The link between them</strong> is the <code>PharluxMetricsTable</code> (and its <code>PharluxLogsTable</code> sibling), which holds:</p>
<div class="language-rust codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-rust codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token keyword" style="color:rgb(86, 156, 214)">struct</span><span class="token plain"> </span><span class="token type-definition class-name" style="color:rgb(78, 201, 176)">TableState</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Vec</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">WalRecord</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain">      </span><span class="token comment" style="color:rgb(106, 153, 85)">// in-memory snapshot of the WAL tail</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    parquet_files</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Vec</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">PathBuf</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;</span><span 
class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain">      </span><span class="token comment" style="color:rgb(106, 153, 85)">// committed files on disk</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">pub</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">struct</span><span class="token plain"> </span><span class="token type-definition class-name" style="color:rgb(78, 201, 176)">PharluxMetricsTable</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    schema</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">SchemaRef</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Arc</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">RwLock</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token 
class-name" style="color:rgb(78, 201, 176)">TableState</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;&gt;</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><br></div></code></pre></div></div>
<p>The <code>RwLock</code> is the load-bearing primitive. Ingest takes the write lock briefly to push a <code>WalRecord</code> into the buffer. Query takes the read lock briefly to snapshot both halves. Flush takes the write lock briefly to drain WAL records and register newly written Parquet paths.</p>
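<p>The three lock holders can be modelled with <code>std::sync::RwLock</code> alone. In this sketch <code>Table</code>, <code>commit_flush</code>, and the one-field <code>WalRecord</code> are hypothetical stand-ins that mirror the <code>TableState</code> shape above. It assumes the flush writes its Parquet file outside the lock and only then, in a single write-lock section, drains exactly the records it wrote and registers the new path, so a snapshot sees each record once: either in the WAL tail or in a file.</p>

```rust
use std::path::PathBuf;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a decoded WAL record.
#[derive(Clone, Debug, PartialEq)]
pub struct WalRecord(pub u64);

/// Same shape as the `TableState` shown above.
#[derive(Default)]
struct TableState {
    wal_records: Vec<WalRecord>, // in-memory WAL tail, not yet in Parquet
    parquet_files: Vec<PathBuf>, // committed Parquet files
}

/// Hypothetical table handle; clones share the same state.
#[derive(Clone, Default)]
pub struct Table {
    state: Arc<RwLock<TableState>>,
}

impl Table {
    /// Ingest: brief write lock to append one record to the WAL tail.
    pub fn ingest(&self, rec: WalRecord) {
        self.state.write().unwrap().wal_records.push(rec);
    }

    /// Query: brief read lock to copy both halves as one consistent snapshot.
    pub fn snapshot(&self) -> (Vec<WalRecord>, Vec<PathBuf>) {
        let state = self.state.read().unwrap();
        (state.wal_records.clone(), state.parquet_files.clone())
    }

    /// Flush commit: drop the first `flushed` records (now in `new_file`) and
    /// register the file in the same critical section. Records ingested while
    /// the file was being written stay in the tail. Panics if `flushed`
    /// exceeds the tail length.
    pub fn commit_flush(&self, flushed: usize, new_file: PathBuf) {
        let mut state = self.state.write().unwrap();
        state.wal_records.drain(..flushed);
        state.parquet_files.push(new_file);
    }
}
```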
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-a-query-executes">How a query executes<a href="https://pharlux.com/blog/wal-parquet-union-query#how-a-query-executes" class="hash-link" aria-label="Direct link to How a query executes" title="Direct link to How a query executes" translate="no">​</a></h2>
<p>The implementation lives in <code>impl TableProvider for PharluxMetricsTable</code>. The DataFusion query planner calls <code>scan()</code> with the projection, filters, and limit; Pharlux returns an <code>ExecutionPlan</code> that yields Arrow <code>RecordBatch</code>es covering both layers.</p>
<p>The interesting bit is the snapshot:</p>
<div class="language-rust codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-rust codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token keyword" style="color:rgb(86, 156, 214)">async</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">fn</span><span class="token plain"> </span><span class="token function-definition function" style="color:rgb(220, 220, 170)">scan</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token operator" style="color:rgb(212, 212, 212)">&amp;</span><span class="token keyword" style="color:rgb(86, 156, 214)">self</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">...</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">-&gt;</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Result</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">Arc</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token keyword" style="color:rgb(86, 156, 214)">dyn</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">ExecutionPlan</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token comment" style="color:rgb(106, 153, 85)">// Snapshot WAL records and open Parquet file handles under read lock.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token comment" style="color:rgb(106, 153, 85)">// Opening handles inside the lock ensures compaction cannot delete</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token comment" style="color:rgb(106, 153, 85)">// files between path observation and handle acquisition.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> parquet_handles</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> state </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">self</span><span class="token punctuation" style="color:rgb(212, 212, 
212)">.</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">read</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">unwrap</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> records </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">clone</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> handles</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Vec</span><span class="token operator" 
style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">File</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">parquet_files</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">iter</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">filter_map</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token closure-params closure-punctuation punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token closure-params">p</span><span class="token closure-params closure-punctuation punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">File</span><span class="token punctuation" style="color:rgb(212, 212, 212)">::</span><span class="token function" style="color:rgb(220, 220, 170)">open</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">p</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">ok</span><span class="token punctuation" style="color:rgb(212, 
212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">            </span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">collect</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">        </span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> handles</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token comment" style="color:rgb(106, 153, 85)">// ... build RecordBatches, delegate to MemTable for projection/filter/limit</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><br></div></code></pre></div></div>
<p>Two subtle properties matter here:</p>
<ul>
<li class=""><strong>Open handles inside the lock.</strong> A naive implementation would clone the path list under the read lock and open files later. Compaction or retention could delete one of those files in the gap between the path snapshot and the open call, causing a query to fail with <code>ENOENT</code> mid-flight. Opening the handles inside the read lock means each query holds a kernel-level reference to its set of files for the duration of the scan, even if the on-disk path is unlinked.</li>
<li class=""><strong>Clone the WAL records.</strong> Cloning the <code>Vec&lt;WalRecord&gt;</code> is a real cost (every buffered record is copied, payload and all), but it decouples the query from the lock. The query proceeds on a private copy and drops the read lock as soon as the snapshot is taken, so ingest and flush can keep working on the original buffer instead of waiting for the full scan to finish.</li>
</ul>
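<p>The snapshot pattern can be reduced to a std-only sketch. <code>Store</code>, <code>StoreState</code>, and the two-field <code>WalRecord</code> below are hypothetical stand-ins, not Pharlux's types. What matters is the scope of the read guard: both the clone and the <code>File::open</code> calls happen while it is held, and it is released when <code>snapshot()</code> returns.</p>

```rust
use std::fs::File;
use std::path::PathBuf;
use std::sync::RwLock;

/// Hypothetical stand-in for a WAL record; the real `WalRecord` carries more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct WalRecord {
    pub name: String,
    pub value: f64,
}

/// Hypothetical shared state: unflushed WAL records plus registered Parquet paths.
pub struct StoreState {
    pub wal_records: Vec<WalRecord>,
    pub parquet_files: Vec<PathBuf>,
}

pub struct Store {
    pub state: RwLock<StoreState>,
}

impl Store {
    /// Takes a query snapshot: clones the WAL buffer and opens every registered
    /// file while the read lock is held.
    pub fn snapshot(&self) -> (Vec<WalRecord>, Vec<File>) {
        let state = self.state.read().unwrap();
        // Private copy of the WAL: later appends or drains do not affect it.
        let records = state.wal_records.clone();
        // Handles are acquired before the lock is released, so no file can be
        // unregistered and deleted between observing its path and opening it.
        let handles: Vec<File> = state
            .parquet_files
            .iter()
            .filter_map(|p| File::open(p).ok())
            .collect();
        (records, handles)
        // The read guard `state` is dropped here, so writers can proceed.
    }
}
```

<p>After <code>snapshot()</code> returns, a writer can take the write lock immediately, append records, or unregister and delete a file; the returned records and handles are unaffected.</p>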
<p>After the snapshot, WAL records are converted into a single Arrow <code>RecordBatch</code> matching the production Parquet schema (timestamp as <code>Timestamp(Nanosecond, UTC)</code>; <code>name</code> as a <code>Dictionary(Int32, Utf8)</code>; <code>tenant_id</code> as a non-null <code>Utf8</code>; etc.). Parquet files are read through <code>ParquetRecordBatchReaderBuilder</code>, which yields the same Arrow shape. All batches are collected into a <code>MemTable</code>: <code>MemTable::scan()</code> applies the projection, and DataFusion applies the filters and the optional <code>LIMIT</code> with the same optimisations it runs for any other table.</p>
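<p>Pharlux builds a real Arrow <code>RecordBatch</code>; the sketch below uses only the standard library and shows just the row-to-column pivot, including how a <code>Dictionary(Int32, Utf8)</code> column stores repeated metric names as small integer keys into a string table. <code>WalRecord</code>, <code>ColumnarBatch</code>, and <code>to_columnar</code> are hypothetical names with a reduced field set.</p>

```rust
use std::collections::HashMap;

/// Hypothetical row-oriented WAL record (reduced field set).
#[derive(Clone, Debug)]
pub struct WalRecord {
    pub timestamp_ns: i64,
    pub tenant_id: String,
    pub name: String,
    pub value: f64,
}

/// Column-oriented view: one Vec per column, with `name` dictionary-encoded
/// as i32 keys into `name_dictionary`, mirroring Dictionary(Int32, Utf8).
#[derive(Debug, Default)]
pub struct ColumnarBatch {
    pub timestamp_ns: Vec<i64>,
    pub tenant_id: Vec<String>,
    pub name_keys: Vec<i32>,
    pub name_dictionary: Vec<String>,
    pub value: Vec<f64>,
}

pub fn to_columnar(records: &[WalRecord]) -> ColumnarBatch {
    let mut batch = ColumnarBatch::default();
    // Maps each distinct name to its key in the dictionary.
    let mut dict_index: HashMap<&str, i32> = HashMap::new();
    for r in records {
        batch.timestamp_ns.push(r.timestamp_ns);
        batch.tenant_id.push(r.tenant_id.clone());
        // Reuse the key for a name seen before; otherwise append a dictionary entry.
        let key = *dict_index.entry(r.name.as_str()).or_insert_with(|| {
            batch.name_dictionary.push(r.name.clone());
            (batch.name_dictionary.len() - 1) as i32
        });
        batch.name_keys.push(key);
        batch.value.push(r.value);
    }
    batch
}
```

<p>For three records named cpu, mem, cpu, the dictionary holds two strings and the key column is [0, 1, 0].</p>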
<p>The user-side SQL surface is just SQL:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token keyword" style="color:rgb(86, 156, 214)">SELECT</span><span class="token plain"> name</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> </span><span class="token function" style="color:rgb(220, 220, 170)">count</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token operator" style="color:rgb(212, 212, 212)">*</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">AS</span><span class="token plain"> cnt</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">FROM</span><span class="token plain"> metrics</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">WHERE</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">timestamp</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;</span><span class="token plain"> </span><span class="token function" style="color:rgb(220, 220, 170)">now</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">-</span><span class="token plain"> </span><span 
class="token keyword" style="color:rgb(86, 156, 214)">INTERVAL</span><span class="token plain"> </span><span class="token string" style="color:rgb(206, 145, 120)">'5 minutes'</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">GROUP</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">BY</span><span class="token plain"> name</span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">BY</span><span class="token plain"> cnt </span><span class="token keyword" style="color:rgb(86, 156, 214)">DESC</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><br></div></code></pre></div></div>
<p>There is no <code>metrics_recent</code> vs <code>metrics_historical</code> distinction in the schema. The query engine sees one logical table; the <code>TableProvider</code> produces the union; the user does not need to know.</p>
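<p>What the union buys can be seen in a std-only sketch of the query above: WAL rows and flushed rows are chained into one iterator, then filtered, grouped, and ordered. Modeling flushed files as in-memory vectors, and the names <code>Row</code> and <code>count_by_name_since</code>, are assumptions for illustration; in Pharlux this work is planned and executed by DataFusion.</p>

```rust
use std::collections::HashMap;

/// Hypothetical row with only the columns the query touches.
#[derive(Clone, Debug)]
pub struct Row {
    pub timestamp_ns: i64,
    pub name: String,
}

/// Equivalent of:
///   SELECT name, count(*) AS cnt FROM metrics
///   WHERE timestamp > cutoff GROUP BY name ORDER BY cnt DESC
/// where `metrics` is the union of unflushed WAL rows and flushed-file rows.
pub fn count_by_name_since(
    wal_rows: &[Row],
    flushed_files: &[Vec<Row>],
    cutoff_ns: i64,
) -> Vec<(String, u64)> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    // One logical table: WAL rows followed by every flushed file's rows.
    let all_rows = wal_rows.iter().chain(flushed_files.iter().flatten());
    for row in all_rows.filter(|r| r.timestamp_ns > cutoff_ns) {
        *counts.entry(row.name.as_str()).or_insert(0) += 1;
    }
    let mut out: Vec<(String, u64)> = counts
        .into_iter()
        .map(|(name, cnt)| (name.to_string(), cnt))
        .collect();
    // ORDER BY cnt DESC, with ties broken by name so the output is deterministic.
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

<p>Because the aggregation runs over the chained rows, moving a row from the WAL into a file leaves the result unchanged, which is exactly the guarantee the flush path has to preserve.</p>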
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="atomic-transitions">Atomic transitions<a href="https://pharlux.com/blog/wal-parquet-union-query#atomic-transitions" class="hash-link" aria-label="Direct link to Atomic transitions" title="Direct link to Atomic transitions" translate="no">​</a></h2>
<p>The other moving piece is what happens when WAL records <em>become</em> Parquet files. The flush path:</p>
<div class="language-rust codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#9CDCFE;--prism-background-color:#1E1E1E"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-rust codeBlock_bY9V thin-scrollbar" style="color:#9CDCFE;background-color:#1E1E1E"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#9CDCFE"><span class="token comment" style="color:rgb(106, 153, 85)">// Step 1: read-lock to clone the records we'll flush</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> to_flush </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> state </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">self</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">read</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">unwrap</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 
212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> n </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> count</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">min</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">len</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">[</span><span class="token punctuation" style="color:rgb(212, 212, 212)">..</span><span class="token plain">n</span><span class="token punctuation" style="color:rgb(212, 212, 212)">]</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">to_vec</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token comment" style="color:rgb(106, 153, 85)">// Step 2: write Parquet files — no lock held during I/O</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> written </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token namespace">parquet_writer</span><span class="token namespace punctuation" style="color:rgb(212, 212, 212)">::</span><span class="token function" style="color:rgb(220, 220, 170)">flush_records_to_parquet</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">data_dir</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">&amp;</span><span class="token plain">to_flush</span><span class="token punctuation" style="color:rgb(212, 212, 212)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(212, 212, 212)">...</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token operator" style="color:rgb(212, 212, 212)">?</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> paths</span><span class="token punctuation" style="color:rgb(212, 212, 212)">:</span><span class="token plain"> </span><span class="token class-name" style="color:rgb(78, 201, 176)">Vec</span><span class="token operator" style="color:rgb(212, 212, 212)">&lt;</span><span class="token class-name" style="color:rgb(78, 201, 176)">PathBuf</span><span class="token operator" style="color:rgb(212, 212, 212)">&gt;</span><span class="token plain"> </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> written</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">iter</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">map</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token closure-params closure-punctuation punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token closure-params">w</span><span class="token closure-params closure-punctuation punctuation" style="color:rgb(212, 212, 212)">|</span><span class="token plain"> w</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">path</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">clone</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">collect</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token comment" style="color:rgb(106, 153, 85)">// Step 3: write-lock to drain flushed records and register new files</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">mut</span><span class="token plain"> state </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(86, 156, 214)">self</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">write</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token 
function" style="color:rgb(220, 220, 170)">unwrap</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    </span><span class="token keyword" style="color:rgb(86, 156, 214)">let</span><span class="token plain"> n </span><span class="token operator" style="color:rgb(212, 212, 212)">=</span><span class="token plain"> to_flush</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">len</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">min</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">wal_records</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">len</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">wal_records</span><span class="token punctuation" 
style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">drain</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token punctuation" style="color:rgb(212, 212, 212)">..</span><span class="token plain">n</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain">    state</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token plain">parquet_files</span><span class="token punctuation" style="color:rgb(212, 212, 212)">.</span><span class="token function" style="color:rgb(220, 220, 170)">extend</span><span class="token punctuation" style="color:rgb(212, 212, 212)">(</span><span class="token plain">paths</span><span class="token punctuation" style="color:rgb(212, 212, 212)">)</span><span class="token punctuation" style="color:rgb(212, 212, 212)">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#9CDCFE"><span class="token plain"></span><span class="token punctuation" style="color:rgb(212, 212, 212)">}</span><br></div></code></pre></div></div>
<p>The relevant invariant is in the doc comment on this function: <em>"records transition from WAL to Parquet atomically under the write lock — a concurrent query sees records in either WAL or Parquet, never both and never missing."</em></p>
<p>That property is what makes the union query correct. Step 2 (the actual Parquet write) runs with no lock held, so disk I/O cannot block ingest or queries. Step 3 holds the write lock only long enough to update the two in-memory lists: drain the flushed prefix from <code>wal_records</code> and push the new <code>PathBuf</code>s into <code>parquet_files</code>. A query whose snapshot was taken before this step sees the records as WAL entries; a query whose snapshot was taken after it sees them as Parquet entries. There is no in-between state where a record is missing or duplicated.</p>
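<p>The invariant can be exercised with a small std-only model of the same three steps. Here "files" are vectors of record ids held in memory and step 2 is only a comment; <code>State</code>, <code>Store</code>, and <code>snapshot_ids</code> are hypothetical. Like the flush path above, the model assumes a single flusher and append-only ingest.</p>

```rust
use std::sync::RwLock;

/// Hypothetical in-memory model: each "file" is the list of record ids it holds.
#[derive(Default)]
pub struct State {
    pub wal_records: Vec<u64>,
    pub parquet_files: Vec<Vec<u64>>,
}

pub struct Store {
    pub state: RwLock<State>,
}

impl Store {
    /// Flushes up to `count` records and returns how many moved.
    pub fn flush(&self, count: usize) -> usize {
        // Step 1: read lock, copy the prefix to flush.
        let to_flush: Vec<u64> = {
            let state = self.state.read().unwrap();
            let n = count.min(state.wal_records.len());
            state.wal_records[..n].to_vec()
        };
        if to_flush.is_empty() {
            return 0;
        }
        // Step 2: the Parquet write would happen here, with no lock held.
        // Step 3: one write-lock section drains the prefix AND registers the file.
        let mut state = self.state.write().unwrap();
        let n = to_flush.len().min(state.wal_records.len());
        state.wal_records.drain(..n);
        state.parquet_files.push(to_flush);
        n
    }

    /// A query snapshot: every record id visible through the union.
    pub fn snapshot_ids(&self) -> Vec<u64> {
        let state = self.state.read().unwrap();
        let mut ids = state.wal_records.clone();
        for file in &state.parquet_files {
            ids.extend_from_slice(file);
        }
        ids
    }
}
```

<p>A quick check is to prefill the WAL, run one thread that calls <code>flush</code> in small chunks, and have several reader threads call <code>snapshot_ids</code> in a loop: every snapshot, once sorted, should contain each id exactly once. If step 3 were split into two lock acquisitions (drain first, register later), a reader could land between them and miss the flushed records.</p>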
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance-characteristics">Performance characteristics<a href="https://pharlux.com/blog/wal-parquet-union-query#performance-characteristics" class="hash-link" aria-label="Direct link to Performance characteristics" title="Direct link to Performance characteristics" translate="no">​</a></h2>
<p>The design's runtime profile, on a 4 GB / 2 vCPU VPS:</p>
<ul>
<li class=""><strong>Ingest path:</strong> OTLP request → bounded <code>tokio::sync::mpsc</code> channel (default 1,000 batches) → WAL writer task → fsync. The channel bound is the primary backpressure signal; when full, ingest returns HTTP 429 / gRPC <code>RESOURCE_EXHAUSTED</code> after a 100 ms send timeout (per <a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0015-ingest-channel-bounded-mpsc-backpressure.md" target="_blank" rel="noopener noreferrer" class="">ADR-0015</a>). OTel Collectors retry 429 with exponential backoff out of the box.</li>
<li class=""><strong>WAL ceiling:</strong> 64 MB. Above this, a flush is forced regardless of the configured flush interval. The 64 MB figure is a deliberate cap on memory-resident WAL state, since the in-memory <code>Vec&lt;WalRecord&gt;</code> is what the query path snapshots.</li>
<li class=""><strong>Query path:</strong> <code>scan()</code> snapshots state under read lock (microseconds), reads Parquet (column-pruned per the projection, predicate-pushed per the filter), unions with the WAL batch, and streams back through DataFusion. Memory is bounded by an explicit <code>MemoryPool</code> cap of 256 MB (per <a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0011-memory-budget-200-430mb.md" target="_blank" rel="noopener noreferrer" class="">ADR-0011</a>) — pathological queries fail with a clear error rather than OOM-killing the box.</li>
<li class=""><strong>Process ceiling:</strong> 1 GB hard <code>MemoryLimit</code> enforced by systemd. If the process ever exceeds 1 GB, the kernel kills it; <code>Restart=always</code> brings it back; WAL replay rebuilds the in-memory state from disk; no acknowledged data is lost.</li>
</ul>
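<p>The ingest backpressure rule can be sketched with the standard library alone. Pharlux uses a bounded <code>tokio::sync::mpsc</code> channel (tokio's bounded <code>Sender</code> has a <code>send_timeout</code> method for this); to stay dependency-free, this sketch swaps in <code>std::sync::mpsc::sync_channel</code> and polls <code>try_send</code> until a deadline. <code>IngestStatus</code>, <code>enqueue_with_timeout</code>, and <code>ingest_channel</code> are hypothetical names.</p>

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome reported to the OTLP client (hypothetical enum for this sketch).
#[derive(Debug, PartialEq)]
pub enum IngestStatus {
    Accepted,
    /// Maps to HTTP 429 / gRPC RESOURCE_EXHAUSTED: the client backs off and retries.
    TooManyRequests,
    /// The WAL writer (receiver) has shut down.
    Unavailable,
}

/// Creates the bounded ingest channel; `capacity` plays the role of the
/// 1,000-batch default (it must be > 0, since 0 makes a rendezvous channel).
pub fn ingest_channel<T>(capacity: usize) -> (SyncSender<T>, Receiver<T>) {
    sync_channel(capacity)
}

/// Tries to enqueue a batch, waiting at most `timeout` for space.
/// std's SyncSender has no send-with-timeout, so this polls `try_send`.
pub fn enqueue_with_timeout<T>(tx: &SyncSender<T>, mut batch: T, timeout: Duration) -> IngestStatus {
    let deadline = Instant::now() + timeout;
    loop {
        match tx.try_send(batch) {
            Ok(()) => return IngestStatus::Accepted,
            Err(TrySendError::Disconnected(_)) => return IngestStatus::Unavailable,
            Err(TrySendError::Full(returned)) => {
                if Instant::now() >= deadline {
                    return IngestStatus::TooManyRequests;
                }
                // Take the batch back and retry shortly.
                batch = returned;
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}
```

<p>With the timeout set to 100 ms, a full channel turns into <code>TooManyRequests</code>, which the HTTP and gRPC layers would map to 429 and <code>RESOURCE_EXHAUSTED</code>. Polling with a 1 ms sleep is a simplification; an async channel wakes the sender as soon as space frees up.</p>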
<p>Sustained load testing on a 4 vCPU / 8 GB VPS produced 577,000 metric points/sec over 17.36 million points with zero errors and 7 ms average request latency. The 4 GB / 2 vCPU tier handles considerably less than that in absolute throughput — but the architectural ceiling sits well above small-team production traffic.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-give-up">What we give up<a href="https://pharlux.com/blog/wal-parquet-union-query#what-we-give-up" class="hash-link" aria-label="Direct link to What we give up" title="Direct link to What we give up" translate="no">​</a></h2>
<p>The design is honest about its limits:</p>
<ul>
<li class=""><strong>No per-record delete.</strong> Parquet is append-only. GDPR-style erasure runs at the partition level via retention plus targeted deletes, not row-level.</li>
<li class=""><strong>No incremental Parquet.</strong> Flushed Parquet files are immutable; updates to old data require a rewrite of the affected file by the compaction job.</li>
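<li class=""><strong>DataFusion has no full-text index.</strong> Row-group statistics and page indexes can prune range and equality predicates, but they cannot rule out a substring match, so full-text log search via <code>LIKE</code> on a large logs table is a full Parquet scan. ADR-0005 documents the V1 threshold (~10 GB/day) above which Tantivy indexing becomes the V1.1 scaling path.</li>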
<li class=""><strong>MemoryPool ceiling is real.</strong> Unbounded <code>GROUP BY</code> over a long time range can fail with a 256 MB MemoryPool error. The error message names the cap and recommends narrowing the time range or adding <code>LIMIT</code>. We chose this over silent OOM.</li>
</ul>
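<p>The fail-instead-of-OOM behaviour comes down to reserve-before-allocate accounting. The toy below is not DataFusion's <code>MemoryPool</code> API; <code>QueryMemoryBudget</code> and <code>ResourcesExhausted</code> are hypothetical names. It shows the contract: an operator asks for bytes before buffering data, and a request that would cross the cap returns an error that names the cap instead of allocating.</p>

```rust
use std::fmt;

/// Error that names the cap, in the spirit of the message described above.
#[derive(Debug, PartialEq)]
pub struct ResourcesExhausted {
    pub requested: usize,
    pub in_use: usize,
    pub limit: usize,
}

impl fmt::Display for ResourcesExhausted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "query memory limit of {} MiB reached ({} bytes in use, {} more requested); \
             narrow the time range or add LIMIT",
            self.limit / (1024 * 1024),
            self.in_use,
            self.requested
        )
    }
}

/// Toy budget: operators reserve bytes before buffering and release them after.
pub struct QueryMemoryBudget {
    limit: usize,
    in_use: usize,
}

impl QueryMemoryBudget {
    pub fn new(limit: usize) -> Self {
        QueryMemoryBudget { limit, in_use: 0 }
    }

    /// Reserves `bytes` or fails without changing the current reservation.
    pub fn try_grow(&mut self, bytes: usize) -> Result<(), ResourcesExhausted> {
        match self.in_use.checked_add(bytes) {
            Some(total) if total <= self.limit => {
                self.in_use = total;
                Ok(())
            }
            _ => Err(ResourcesExhausted { requested: bytes, in_use: self.in_use, limit: self.limit }),
        }
    }

    /// Releases a previous reservation.
    pub fn shrink(&mut self, bytes: usize) {
        self.in_use = self.in_use.saturating_sub(bytes);
    }

    pub fn in_use(&self) -> usize {
        self.in_use
    }
}
```

<p>With a 256 MiB budget, a hash aggregation that has reserved 200 MiB and asks for another 100 MiB gets an error; the query fails, and the process keeps running.</p>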
<p>These trade-offs are listed in the ADRs. They are not surprises; they are the cost paid for the single-binary, embedded-execution architecture.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-this-design-over-alternatives">Why this design over alternatives<a href="https://pharlux.com/blog/wal-parquet-union-query#why-this-design-over-alternatives" class="hash-link" aria-label="Direct link to Why this design over alternatives" title="Direct link to Why this design over alternatives" translate="no">​</a></h2>
<p>As I said up top, this was the highest-risk decision we made — risky enough that we treated the proof-of-concept as a mandatory gate. The PoC ran the full path (WAL crash recovery, concurrent ingest + query, compaction, two-tenant isolation) before any production code was written. It passed. The fallback — embedded DuckDB behind the <code>QueryEngine</code> trait (<a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0014-query-engine-trait-abstraction.md" target="_blank" rel="noopener noreferrer" class="">ADR-0014</a>) — remains documented but unused.</p>
<p>The reason the design is preferred over a real database engine is alignment, not novelty. ClickHouse, embedded or otherwise, would deliver excellent compression and scan performance — and would also bring its own internal WAL, its own page cache, its own memory accounting, and 1-2 GB of resident memory in its own right. On a 4 GB VPS that is most of the budget. The DataFusion + Parquet path lives within Pharlux's memory budget (200-430 MB realistic, 1 GB hard ceiling) because there is no second engine to feed.</p>
<p>The reason it is preferred over an in-memory-only design is durability. Acknowledged data must survive a crash; the WAL is the contract.</p>
<p>The reason it is preferred over a flush-on-every-batch design is throughput and Parquet hygiene. Many small Parquet files are slow to scan: every file carries its own footer and metadata, which must be read and decoded before any data, and a tiny file pays that fixed cost for very few rows. The WAL absorbs the small writes, so Parquet sees fewer, larger, well-organised files.</p>
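<p>How the WAL turns many small writes into one larger file can be sketched as a buffer with two flush triggers: a byte ceiling (64 MB in the runtime profile above) and the configured flush interval. <code>WalBuffer</code> and <code>FlushPolicy</code> are hypothetical, and the real WAL also persists writes to disk with fsync, which this sketch leaves out.</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical flush policy: size ceiling or elapsed interval, whichever comes first.
pub struct FlushPolicy {
    pub max_bytes: usize,
    pub interval: Duration,
}

pub struct WalBuffer {
    pub records: Vec<Vec<u8>>,
    pub bytes: usize,
    pub last_flush: Instant,
}

impl WalBuffer {
    pub fn new(now: Instant) -> Self {
        WalBuffer { records: Vec::new(), bytes: 0, last_flush: now }
    }

    /// Absorbs one small write; returns true if a flush is now due.
    pub fn append(&mut self, payload: Vec<u8>, policy: &FlushPolicy, now: Instant) -> bool {
        self.bytes += payload.len();
        self.records.push(payload);
        self.should_flush(policy, now)
    }

    /// The size ceiling forces a flush regardless of the interval.
    pub fn should_flush(&self, policy: &FlushPolicy, now: Instant) -> bool {
        !self.records.is_empty()
            && (self.bytes >= policy.max_bytes
                || now.saturating_duration_since(self.last_flush) >= policy.interval)
    }

    /// Hands every buffered write over as one batch, destined for one larger file.
    pub fn take_batch(&mut self, now: Instant) -> Vec<Vec<u8>> {
        self.bytes = 0;
        self.last_flush = now;
        std::mem::take(&mut self.records)
    }
}
```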
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="frequently-asked-questions">Frequently asked questions<a href="https://pharlux.com/blog/wal-parquet-union-query#frequently-asked-questions" class="hash-link" aria-label="Direct link to Frequently asked questions" title="Direct link to Frequently asked questions" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-an-in-memory-wal-snapshot-if-the-wal-is-on-disk">Why an in-memory WAL snapshot if the WAL is on disk?<a href="https://pharlux.com/blog/wal-parquet-union-query#why-an-in-memory-wal-snapshot-if-the-wal-is-on-disk" class="hash-link" aria-label="Direct link to Why an in-memory WAL snapshot if the WAL is on disk?" title="Direct link to Why an in-memory WAL snapshot if the WAL is on disk?" translate="no">​</a></h3>
<p>The on-disk WAL is the durable record and the source of truth for crash recovery. The <code>Vec&lt;WalRecord&gt;</code> in memory holds the same records, kept as Arrow-shaped objects so that hot-path queries never re-read or re-parse protobuf from disk. After a crash, the in-memory copy is rebuilt by replaying the on-disk WAL.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-happens-if-a-query-is-running-when-the-wal-flushes">What happens if a query is running when the WAL flushes?<a href="https://pharlux.com/blog/wal-parquet-union-query#what-happens-if-a-query-is-running-when-the-wal-flushes" class="hash-link" aria-label="Direct link to What happens if a query is running when the WAL flushes?" title="Direct link to What happens if a query is running when the WAL flushes?" translate="no">​</a></h3>
<p>It depends on which side of the flush's write-lock step (step 3) the query took its snapshot. A query that snapshotted before step 3 sees the flushed records as WAL entries in its private copy; a query that snapshots after step 3 sees them as Parquet entries. The snapshot and step 3 are serialized by the same lock, so there is no race window where a record is missing or duplicated. That is the property the doc comment on <code>flush()</code> calls out explicitly.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-is-tenant_id-enforced-across-the-union">How is <code>tenant_id</code> enforced across the union?<a href="https://pharlux.com/blog/wal-parquet-union-query#how-is-tenant_id-enforced-across-the-union" class="hash-link" aria-label="Direct link to how-is-tenant_id-enforced-across-the-union" title="Direct link to how-is-tenant_id-enforced-across-the-union" translate="no">​</a></h3>
<p><code>tenant_id</code> is a non-null <code>Utf8</code> column in every Parquet schema and a field on every <code>WalRecord</code>. Every query goes through <code>TenantScopedQueryBuilder</code>, which applies a mandatory <code>WHERE tenant_id = ?</code> predicate before the query reaches the <code>TableProvider</code>. Community deployments use the constant <code>"default"</code> tenant, and the code path is identical. Multi-tenancy has been one of Pharlux's hard invariants since day one; it is not a retrofit.</p>
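<p>One way to make the tenant predicate impossible to forget is to make the tenant the only way to construct a query. The sketch below is hypothetical and is not the API of <code>TenantScopedQueryBuilder</code>; it builds SQL text with a <code>?</code> placeholder and keeps the tenant value as a bound parameter.</p>

```rust
/// Hypothetical scoped builder: the only constructors take a tenant.
pub struct TenantScoped {
    tenant_id: String,
}

/// SQL text with positional `?` placeholders plus their bound values.
#[derive(Debug, PartialEq)]
pub struct BoundQuery {
    pub sql: String,
    pub params: Vec<String>,
}

impl TenantScoped {
    pub fn new(tenant_id: impl Into<String>) -> Self {
        TenantScoped { tenant_id: tenant_id.into() }
    }

    /// Community deployments use the constant "default" tenant.
    pub fn community() -> Self {
        Self::new("default")
    }

    /// Builds `SELECT <columns> FROM <table> WHERE tenant_id = ? [AND (<filter>)]`.
    /// The tenant value is bound as a parameter, never spliced into the SQL text.
    /// `table`, `columns`, and `filter` are assumed to be trusted, validated fragments.
    pub fn select(&self, table: &str, columns: &str, filter: Option<&str>) -> BoundQuery {
        let mut sql = format!("SELECT {columns} FROM {table} WHERE tenant_id = ?");
        if let Some(f) = filter {
            // Parenthesize so an OR inside the caller's filter cannot escape the tenant scope.
            sql.push_str(&format!(" AND ({f})"));
        }
        BoundQuery { sql, params: vec![self.tenant_id.clone()] }
    }
}
```

<p>The parentheses around the caller's filter matter: without them, a filter such as <code>a = 1 OR b = 2</code> would bind as <code>(tenant_id = ? AND a = 1) OR b = 2</code> and return rows from other tenants.</p>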
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="can-datafusion-push-my-filter-into-the-parquet-reader">Can DataFusion push my filter into the Parquet reader?<a href="https://pharlux.com/blog/wal-parquet-union-query#can-datafusion-push-my-filter-into-the-parquet-reader" class="hash-link" aria-label="Direct link to Can DataFusion push my filter into the Parquet reader?" title="Direct link to Can DataFusion push my filter into the Parquet reader?" translate="no">​</a></h3>
<p>Yes, for predicate pushdown on standard column types: DataFusion uses row-group statistics and Parquet page indexes to skip row groups and pages that cannot match. Projection pushdown applies on every scan. The WAL side is in-memory Arrow, where filters are evaluated at scan time without an indexing pass; at typical WAL sizes (well under the 64 MB ceiling) this is fast.</p>
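<p>The statistics-based skipping can be illustrated with min/max values alone. In this sketch, <code>RowGroupStats</code> and <code>prune_row_groups</code> are hypothetical; the real pruning happens inside DataFusion's Parquet reader, which also consults page-level statistics.</p>

```rust
/// Hypothetical per-row-group statistics for the timestamp column, like the
/// min/max values Parquet stores in each row group's metadata.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub min_ts: i64,
    pub max_ts: i64,
}

/// Returns the indexes of row groups that may contain rows with
/// `timestamp > cutoff`; every other row group can be skipped unread.
pub fn prune_row_groups(stats: &[RowGroupStats], cutoff: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        // If even the maximum is <= cutoff, no row in the group can match.
        .filter(|(_, s)| s.max_ts > cutoff)
        .map(|(i, _)| i)
        .collect()
}
```

<p>The same reasoning explains the full-text limitation: knowing a string column's minimum and maximum says nothing about whether some value contains a given substring, so a <code>LIKE '%error%'</code> predicate cannot skip a row group this way.</p>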
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-happens-if-the-parquet-file-is-deleted-by-retention-while-my-query-is-reading-it">What happens if the Parquet file is deleted by retention while my query is reading it?<a href="https://pharlux.com/blog/wal-parquet-union-query#what-happens-if-the-parquet-file-is-deleted-by-retention-while-my-query-is-reading-it" class="hash-link" aria-label="Direct link to What happens if the Parquet file is deleted by retention while my query is reading it?" title="Direct link to What happens if the Parquet file is deleted by retention while my query is reading it?" translate="no">​</a></h3>
<p>The query holds an open <code>File</code> handle taken inside the read lock. On Linux, deleting an open file unlinks the directory entry but leaves the file content readable through any open handle until the last handle closes. The query completes against the original content; the disk space is reclaimed when the query finishes. This is documented in the <code>scan()</code> comment: <em>"opening handles inside the lock ensures compaction cannot delete files between path observation and handle acquisition."</em></p>
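<p>The Linux behaviour is easy to verify with the standard library. <code>read_after_unlink</code> is a hypothetical helper: it opens a file, deletes it the way retention would, and then reads the original content through the handle it opened first.</p>

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Opens `path`, deletes it, then reads through the earlier handle. Unlinking
/// removes only the directory entry; the data stays reachable until the last
/// open handle is closed.
pub fn read_after_unlink(path: &Path) -> io::Result<String> {
    // The query's handle, taken while the file was still registered.
    let mut handle = File::open(path)?;
    // Retention deletes the file mid-query.
    fs::remove_file(path)?;
    // Still readable through the open handle.
    let mut content = String::new();
    handle.read_to_string(&mut content)?;
    Ok(content)
    // `handle` is dropped here; only now can the filesystem reclaim the space.
}
```

<p>After the call, the path no longer exists, yet the returned string holds the full original content. This holds on Linux and other Unix-like systems; Windows has different deletion semantics for open files.</p>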
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-not-just-use-embedded-duckdb">Why not just use embedded DuckDB?<a href="https://pharlux.com/blog/wal-parquet-union-query#why-not-just-use-embedded-duckdb" class="hash-link" aria-label="Direct link to Why not just use embedded DuckDB?" title="Direct link to Why not just use embedded DuckDB?" translate="no">​</a></h3>
<p>DuckDB was the documented Phase 0 fallback (<a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/adr/0013-phase-0-poc-gate.md" target="_blank" rel="noopener noreferrer" class="">ADR-0013</a>) and would also have worked: it has a battle-hardened Parquet reader and real production mileage in embedded mode. DataFusion is the preferred choice for forward compatibility. Pharlux writes plain Apache Parquet files, readable directly by DuckDB, Trino, Ballista, Polars, Spark, and any other engine that speaks Parquet, so operators who outgrow Pharlux take their data with them in an open interchange format that every analytical tool understands. DataFusion's Arrow-native API also lets Pharlux compose query results as Arrow streams end to end, from the Parquet scan through the engine to the API response, with no intermediate row format; DuckDB's embedding API, by contrast, is oriented around SQL strings. That Arrow composition matters for the dashboard latency budget.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-pharlux">Get Pharlux<a href="https://pharlux.com/blog/wal-parquet-union-query#get-pharlux" class="hash-link" aria-label="Direct link to Get Pharlux" title="Direct link to Get Pharlux" translate="no">​</a></h2>
<ul>
<li class=""><strong>Download v1.0.0</strong> — <a href="https://github.com/Veltara-Works/pharlux/releases/tag/v1.0.0" target="_blank" rel="noopener noreferrer" class="">github.com/Veltara-Works/pharlux/releases/tag/v1.0.0</a></li>
<li class=""><strong>Documentation</strong> — <a href="https://pharlux.com/docs/getting-started/" target="_blank" rel="noopener noreferrer" class="">pharlux.com/docs/getting-started/</a></li>
<li class=""><strong>Source</strong> — <a href="https://github.com/Veltara-Works/pharlux" target="_blank" rel="noopener noreferrer" class="">github.com/Veltara-Works/pharlux</a></li>
<li class=""><strong>The TableProvider source</strong> — <a href="https://github.com/Veltara-Works/pharlux/blob/v1.0.0/pharlux-store/src/table_provider.rs" target="_blank" rel="noopener noreferrer" class="">pharlux-store/src/table_provider.rs</a></li>
</ul>
<p>Pharlux is one of several developer tools built by <a href="https://veltaraworks.com/" target="_blank" rel="noopener noreferrer" class="">Veltara Works</a> — alongside email hosting, cloud infrastructure, and software license management. See <a href="https://veltaraworks.com/" target="_blank" rel="noopener noreferrer" class="">veltaraworks.com</a> for the full portfolio.</p>]]></content:encoded>
            <category>engineering</category>
            <category>deep-dive</category>
            <category>datafusion</category>
            <category>parquet</category>
            <category>observability</category>
        </item>
    </channel>
</rss>