Logs Query Performance

Full-text search approach (V1)

Pharlux V1 searches log bodies with DataFusion's SQL LIKE and ILIKE operators, scanning the Parquet files directly. This avoids the complexity of maintaining a separate full-text index while still covering the log volumes of most small-team deployments.

Performance characteristics

| Daily log volume | 7-day LIKE scan time (2 vCPU) | 7-day LIKE scan time (4 vCPU) | Assessment |
|---|---|---|---|
| < 1 GB/day | < 2 seconds | < 1 second | Excellent |
| 1–5 GB/day | 2–15 seconds | 1–8 seconds | Good |
| 5–10 GB/day | 15–30 seconds | 8–15 seconds | Acceptable |
| 10–20 GB/day | 30–60 seconds | 15–30 seconds | Degraded; consider tantivy index (V1.1) |
| > 20 GB/day | > 60 seconds | > 30 seconds | Not recommended without full-text index |

The recommended threshold for LIKE-based search is 10 GB/day sustained log volume. Above this, 7-day LIKE scans will exceed the 30-second response time target on a typical 2-vCPU VPS.
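To make the tiers concrete, here is a minimal Python sketch that maps a sustained daily log volume onto the table's assessments and computes how much log data a 7-day search window covers. The helper names (`assess_volume`, `window_volume_gb`, `like_search_recommended`) are hypothetical and not part of Pharlux. How values that fall exactly on a tier boundary are assigned is an assumption, chosen so that 10 GB/day stays within the recommended LIKE threshold.

```python
# Hypothetical helpers (not part of Pharlux) that encode the performance table.
# Boundary handling is an assumption: 1 GB/day counts as "Good" (the table says
# "< 1" for Excellent), while 5, 10 and 20 GB/day fall in the lower of the two
# adjacent tiers, so 10 GB/day stays within the LIKE threshold.

LIKE_THRESHOLD_GB_PER_DAY = 10.0  # recommended ceiling for LIKE-based search
WINDOW_DAYS = 7                   # the table reports 7-day scan times


def assess_volume(gb_per_day: float) -> str:
    """Map a sustained daily log volume to the table's assessment column."""
    if gb_per_day < 0:
        raise ValueError("log volume cannot be negative")
    if gb_per_day < 1:
        return "Excellent"
    if gb_per_day <= 5:
        return "Good"
    if gb_per_day <= 10:
        return "Acceptable"
    if gb_per_day <= 20:
        return "Degraded; consider tantivy index (V1.1)"
    return "Not recommended without full-text index"


def window_volume_gb(gb_per_day: float, days: int = WINDOW_DAYS) -> float:
    """Total log volume covered by a search window of `days` days."""
    return gb_per_day * days


def like_search_recommended(gb_per_day: float) -> bool:
    """True if the volume is within the 10 GB/day LIKE threshold."""
    return gb_per_day <= LIKE_THRESHOLD_GB_PER_DAY


if __name__ == "__main__":
    for volume in (0.5, 3, 8, 15, 30):
        print(f"{volume:>5} GB/day -> {assess_volume(volume)}, "
              f"{window_volume_gb(volume):g} GB per 7-day window")
```

For example, a team ingesting 3 GB/day lands in the "Good" tier, and a 7-day search at 10 GB/day covers 70 GB of logs before compression and column pruning reduce what is actually read.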

Why LIKE is sufficient for small teams

  • Parquet's columnar layout means a LIKE '%pattern%' predicate only has to read the body column (plus any other columns the query references), not entire rows
  • Zstd compression reduces I/O, and dictionary encoding on low-cardinality fields (severity_text, scope_name) shrinks those columns further
  • DataFusion's predicate pushdown and partition pruning (by time range) reduce the data scanned
  • For the target audience (1–10 services on a single VPS), log volume is typically 0.5–5 GB/day

Query patterns

```sql
-- Search log bodies (case-sensitive substring match)
SELECT timestamp, severity_text, body
FROM logs
WHERE tenant_id = 'default'
  AND timestamp > now() - INTERVAL '1 hour'
  AND body LIKE '%connection refused%'
ORDER BY timestamp DESC
LIMIT 100;

-- Case-insensitive search
SELECT timestamp, body
FROM logs
WHERE tenant_id = 'default'
  AND timestamp > now() - INTERVAL '24 hours'
  AND body ILIKE '%error%'
ORDER BY timestamp DESC
LIMIT 100;

-- Cross-signal correlation via trace_id (16-byte binary literal);
-- widen the time range if the trace may be older
SELECT l.timestamp, l.body, l.severity_text
FROM logs l
WHERE l.tenant_id = 'default'
  AND l.timestamp > now() - INTERVAL '24 hours'
  AND l.trace_id = X'0102030405060708090a0b0c0d0e0f10'
ORDER BY l.timestamp;
```
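LIKE and ILIKE match substrings, not words: '%error%' also matches "errors" and "terror". The following Python sketch is a simplified model of the pattern semantics (`%` matches any run of characters, including none; `_` matches exactly one character), which can help check what a pattern will match before running it. It is not DataFusion's implementation: ESCAPE clauses are not modelled, ILIKE is approximated with Python's `re.IGNORECASE` (whose handling of non-ASCII case may differ), and `like`/`ilike` are hypothetical helper names.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Compile a SQL LIKE pattern into an equivalent regular expression.

    '%' matches any run of characters (including none) and '_' matches
    exactly one character. ESCAPE clauses are not modelled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat every other character literally
    flags = re.DOTALL  # let '%' and '_' span newlines in multi-line bodies
    if case_insensitive:
        flags |= re.IGNORECASE
    return re.compile("".join(parts), flags)


def like(value: str, pattern: str) -> bool:
    """Model of `value LIKE pattern`: the whole value must match."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value: str, pattern: str) -> bool:
    """Model of `value ILIKE pattern` (case-insensitive LIKE)."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("dial tcp 10.0.0.5:5432: connection refused", "%connection refused%"))
    print(like("Connection Refused", "%connection refused%"))
    print(ilike("Connection Refused", "%connection refused%"))
    print(ilike("no errors found", "%error%"))
```

Because a leading `%` means the pattern can start anywhere in the body, every body in the scanned partitions has to be examined; this is why the time-range filter matters so much for LIKE queries.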

Partition pruning

Pharlux stores logs in hourly partitions (data/logs/{tenant_id}/YYYY/MM/DD/HH/). Queries that include a time range filter (WHERE timestamp > ...) automatically benefit from partition pruning — only relevant hour partitions are scanned.

Always include a time range in log queries to minimize scan volume.
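The directory layout determines which partitions a time-bounded query touches. This Python sketch lists the hourly partition directories that overlap a half-open range [start, end). It assumes partition hours are UTC and models only the layout, not how DataFusion actually performs the pruning; `hour_partitions` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def hour_partitions(tenant_id: str, start: datetime, end: datetime) -> list[str]:
    """List the hourly partition directories overlapping [start, end).

    Hypothetical helper: assumes partitions follow
    data/logs/{tenant_id}/YYYY/MM/DD/HH/ with hours in UTC.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    # Truncate to the top of the hour: a partition is needed if any part of
    # its hour overlaps the query range.
    hour = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while hour < end:
        paths.append(f"data/logs/{tenant_id}/{hour:%Y/%m/%d/%H}/")
        hour += timedelta(hours=1)
    return paths


if __name__ == "__main__":
    end = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
    # Roughly models: WHERE timestamp > now() - INTERVAL '1 hour'
    for path in hour_partitions("default", end - timedelta(hours=1), end):
        print(path)
```

A one-hour window that does not start on the hour spans two partitions: 11:30 to 12:30 UTC reads both `.../2024/05/01/11/` and `.../2024/05/01/12/`, while a query with no time range has to consider every hour on disk.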

Future: tantivy full-text index (V1.1)

V1.1 is planned to add an optional tantivy inverted index per log partition, targeting sub-second full-text search regardless of log volume. The storage layer is designed so that this index can be added without schema changes: the index is a sidecar file stored alongside each Parquet partition.

tantivy is not yet a Pharlux dependency; the V1.1 work will add it through the standard VERSIONS.md review process. See DESIGN.md §"Full-text log search" for the design-in note.