Skip to main content

SQL Query Reference

Pharlux uses Apache DataFusion as its query engine. Queries are standard SQL with full support for aggregations, joins, subqueries, and window functions.

Tables

metrics

ColumnTypeNullableDescription
tenant_idUtf8NoTenant identifier
timestampTimestamp(ns, UTC)NoWhen the metric was recorded
nameDictionary(Int32, Utf8)NoMetric name (e.g. http.request.duration)
valueFloat64NoMetric value
attributesUtf8 (JSON)YesKey-value attributes as JSON
resourceUtf8 (JSON)YesResource attributes as JSON
scope_nameDictionary(Int32, Utf8)YesInstrumentation scope name
scope_versionDictionary(Int32, Utf8)YesInstrumentation scope version

logs

ColumnTypeNullableDescription
tenant_idUtf8NoTenant identifier
timestampTimestamp(ns, UTC)NoWhen the event occurred
observed_timestampTimestamp(ns, UTC)YesWhen the event was collected
severity_numberInt32YesOTel severity (1-24)
severity_textDictionary(Int32, Utf8)YesSeverity label (TRACE, DEBUG, INFO, WARN, ERROR, FATAL)
bodyUtf8YesLog message body
nameDictionary(Int32, Utf8)YesLog name
attributesUtf8 (JSON)YesKey-value attributes as JSON
resourceUtf8 (JSON)YesResource attributes as JSON
scope_nameDictionary(Int32, Utf8)YesInstrumentation scope name
scope_versionDictionary(Int32, Utf8)YesInstrumentation scope version
trace_idFixedSizeBinary(16)YesW3C trace ID (for log-trace correlation)
span_idFixedSizeBinary(8)YesW3C span ID
flagsUInt32YesOTel log record flags

Query patterns

Metrics

-- Top 10 metric names by count
SELECT name, count(*) as cnt
FROM metrics
GROUP BY name
ORDER BY cnt DESC
LIMIT 10;

-- Average value over time (5-minute buckets)
SELECT
date_trunc('minute', timestamp) as bucket,
avg(value) as avg_value
FROM metrics
WHERE name = 'http.request.duration'
AND timestamp > now() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;

-- Latest value per metric
SELECT DISTINCT ON (name) name, value, timestamp
FROM metrics
ORDER BY name, timestamp DESC;

Logs

-- Recent errors
SELECT timestamp, severity_text, body
FROM logs
WHERE severity_number >= 17 -- ERROR and above
AND timestamp > now() - INTERVAL '1 hour'
ORDER BY timestamp DESC
LIMIT 100;

-- Full-text search (LIKE)
SELECT timestamp, body
FROM logs
WHERE body LIKE '%connection refused%'
AND timestamp > now() - INTERVAL '24 hours'
ORDER BY timestamp DESC;

-- Case-insensitive search (ILIKE)
SELECT timestamp, body
FROM logs
WHERE body ILIKE '%timeout%'
ORDER BY timestamp DESC
LIMIT 50;

-- Log count by severity
SELECT severity_text, count(*) as cnt
FROM logs
WHERE timestamp > now() - INTERVAL '1 hour'
GROUP BY severity_text
ORDER BY cnt DESC;

Cross-signal correlation

-- Find logs associated with a specific trace
SELECT timestamp, severity_text, body
FROM logs
WHERE trace_id = X'0102030405060708090a0b0c0d0e0f10'
ORDER BY timestamp;

Tenant isolation

All queries are automatically scoped to the authenticated tenant. The WHERE tenant_id = ? filter is injected at the logical plan level — it cannot be bypassed via SQL.

Community (single-tenant) deployments use the constant tenant "default".

Limits

  • Memory pool: 256 MB per query (ADR-0011). Queries exceeding this are rejected.
  • Default row limit: 100,000 rows per query (configurable via [query].max_result_rows).
  • Query timeout: 30 seconds (configurable via [query].query_timeout_seconds).

REST API

POST /api/v1/query
Authorization: Bearer <JWT>
Content-Type: application/json

{
"sql": "SELECT ...",
"limit": 1000 // optional, overrides default max_result_rows
}

Response:

{
"rows": [ {"name": "cpu.usage", "value": 0.75}, ... ],
"row_count": 1
}

Supported SQL features

Pharlux uses DataFusion 53.0.0 which supports:

  • SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET
  • Aggregate functions: count, sum, avg, min, max, stddev, variance
  • Window functions: row_number, rank, lead, lag, first_value, last_value
  • Date/time: now(), date_trunc, date_part, INTERVAL
  • String: LIKE, ILIKE, length, upper, lower, trim, substring
  • Math: abs, ceil, floor, round, power, sqrt
  • Subqueries, CTEs (WITH), UNION ALL, JOIN
  • EXPLAIN and EXPLAIN ANALYZE for query planning