How the WAL+Parquet union query works

· 15 min read
Ian Holt
Founder, Veltara Works

Last updated: 2026-05-05 · Pharlux v1.0.0 · By Ian Holt

The thing most observability platforms don't talk about: you cannot query data you have not persisted, and persisting hurts throughput. ClickHouse-style batch writers compress beautifully and scan fast, but typical batch intervals introduce tens of seconds of staleness before new data is queryable. In-memory buffers make data queryable immediately but lose it on a crash. The two requirements, durability and freshness, pull in opposite directions, and most platforms pick one and document around the other.

Pharlux ships a different design. A custom Apache DataFusion TableProvider unions an in-memory WAL buffer with on-disk Apache Parquet files into one consistent view at query time. Freshly ingested data is queryable as soon as it lands in the WAL, with no flush wait and no buffer-window staleness, and it is queryable through the same SQL surface as historical data on disk.
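
To make the union idea concrete, here is a minimal sketch in plain Python. It is not Pharlux's code and does not use the DataFusion API: `UnionTable`, `ingest`, `flush`, and `scan` are hypothetical names, the row shape is made up, and plain tuples stand in for Parquet files. It assumes only what the paragraph above describes: a scan reads the flushed files and the in-memory buffer together, so a row is visible before it is ever flushed.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator

Row = dict  # hypothetical row shape, e.g. {"ts": 1, "msg": "boot"}


@dataclass
class UnionTable:
    """Toy stand-in for a table that unions a WAL buffer with flushed files."""

    wal: list = field(default_factory=list)       # in-memory rows, not yet flushed
    segments: list = field(default_factory=list)  # each tuple stands in for one Parquet file

    def ingest(self, row: Row) -> None:
        # A real system would write to a durable log first; this sketch only buffers.
        self.wal.append(row)

    def flush(self) -> None:
        # Move buffered rows into a new immutable segment and reset the buffer,
        # so every row lives in exactly one of the two sources.
        if self.wal:
            self.segments.append(tuple(self.wal))
            self.wal = []

    def scan(self, predicate: Callable[[Row], bool] = lambda r: True) -> Iterator[Row]:
        # Union at query time: flushed segments first, then the fresh buffer.
        for segment in self.segments:
            yield from (r for r in segment if predicate(r))
        yield from (r for r in self.wal if predicate(r))


table = UnionTable()
table.ingest({"ts": 1, "msg": "boot"})
table.flush()                              # row 1 now lives in a flushed segment
table.ingest({"ts": 2, "msg": "request"})  # row 2 exists only in the buffer

fresh = [r["ts"] for r in table.scan()]                         # both sources, one view
recent = [r["ts"] for r in table.scan(lambda r: r["ts"] >= 2)]  # same filter applies to both
```

The caller never says which source to read. The same `scan` call and the same predicate cover both, which is the property the single SQL surface gives in the real design. What the sketch leaves out is everything that makes this hard in practice, such as keeping the view consistent while a flush is in progress.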