⚙️ Determinism & Execution Semantics¶

This document defines elevata’s rules for deterministic SQL generation and execution.
It applies to both standard generation and custom query logic (Query Trees).

🔧 1. Why determinism matters in elevata¶

elevata is built for reproducibility:

SQL previews must match executed SQL
CI checks must be stable
the same metadata must produce the same output across runs
multi-dialect rendering must not introduce semantic drift

Determinism is therefore treated as a correctness requirement, not a “best practice”.

🔧 2. Determinism model: errors vs warnings¶

elevata classifies findings as:

ERROR (blocking): execution is ambiguous or unsafe
WARNING (advisory): execution is valid, but quality or semantics may be degraded

The Query Builder UI surfaces this via:

deterministic / needs ordering badges
error/warning counts

🔧 3. Preflight Validation Phase¶

elevata includes a preflight validation phase executed before any DDL or DML statements are applied.

The preflight phase guarantees that execution behavior is fully predictable.

🧩 Responsibilities¶

The preflight phase performs:

schema introspection
materialization planning
type drift detection
validation of blocking conditions
execution safety checks

No SQL affecting data or schema is executed before preflight completes successfully.

🧩 Deterministic Failure Modes¶

Execution may fail during preflight when:

unsafe schema evolution is required
narrowing or incompatible type drift is detected
required dialect capabilities are missing
metadata inconsistencies are found

Failures always occur before execution starts.

This guarantees:

no partially applied schema changes
no partial data loads
reproducible execution behavior.

🧩 Full Refresh Exception¶

Datasets using full refresh materialization are exempt from type drift blocking
because the table is recreated during execution.

Type drift warnings may still be emitted for visibility.

🔧 4. Window functions¶

Some window functions are inherently nondeterministic without ordering.

Rule:

If a window function requires ordering, an ORDER BY clause is mandatory. Missing ORDER BY → ERROR

Examples of functions requiring ORDER BY:

ROW_NUMBER, RANK, DENSE_RANK
LAG, LEAD
FIRST_VALUE, LAST_VALUE, NTH_VALUE
NTILE

Windowed aggregates (SUM/AVG/…) may not require ORDER BY:

missing ORDER BY is usually ok → optional warning depending on policy

🔧 5. Aggregation determinism¶

Aggregations can become nondeterministic if result ordering is undefined in the aggregation semantics.

Rule patterns:

Ordered aggregates (e.g. STRING_AGG) require explicit ORDER BY inside the function. Missing ordering → ERROR (or strict WARNING, depending on policy)

Other aggregates (SUM, COUNT, MIN, MAX, AVG) are deterministic without ordering.

🔧 6. Contract stability and collisions¶

The output contract must be stable and unambiguous.

Rules:

Output column name collisions → ERROR
Missing inputs / disconnected tree → ERROR
Cycles in the Query Tree → ERROR

🔧 7. Why elevata is not a semantic layer¶

elevata does not implement query-time semantics (like BI semantic layers or metric stores).
Instead, elevata materializes semantics into datasets deterministically:

business logic belongs in bizcore
consumer shaping belongs in serving
execution is metadata-native and explainable via lineage + query contract

This avoids tool-specific logic and ensures reproducible pipelines.