⚙️ Load SQL Architecture¶
This document describes how elevata transforms metadata into executable SQL for load operations.
🔧 1. Overview¶
The load SQL pipeline turns metadata into complete SQL statements suitable for execution on analytical backends.
High‑level flow:
Metadata → Logical Plan → Expression AST → Dialect Rendering → SQL Load Statement → Execution
The system is designed so that:
- metadata remains backend‑agnostic
- logical plans describe what is required, not how it is written
- dialects encapsulate syntactic differences
- execution engines are decoupled from SQL generation
Note
This document focuses exclusively on SQL generation and rendering.
Execution order, orchestration semantics, retries, failure handling, and execution observability are described separately in Load Execution & Orchestration Architecture.
🔧 2. Logical Plans for Load Operations¶
elevata represents load operations using SQL primitives:
LogicalSelect– core building blockLogicalUnion– multi-source resolutionSubquerySource– ranking, filtering, pre‑aggregation
Logical plans are intentionally dialect‑neutral.
🔧 3. Expression AST in Load SQL¶
Every column expression is represented as an AST node during load generation.
Supported node types include:
ColumnRefLiteralExprRefConcatExprConcatWsExprCoalesceExprWindowFunctionExprHash256Expr
The AST guarantees that:
- hashing is consistent across dialects
- CONCAT / COALESCE behave uniformly
- window functions are structured, not string‑built
- SQL remains predictable and comparable
🔧 4. Dialect Rendering¶
After a logical plan is constructed, the selected dialect renders the plan and its AST into concrete SQL.
sql = dialect.render_select(plan)
Each dialect implements:
- identifier quoting
- literal rendering
- hashing functions
- CONCAT / CONCAT_WS
- COALESCE
- window functions
- subqueries and unions
This ensures consistent semantics while using native SQL syntax per backend.
🔧 5. Intent layer (generation + ingestion)¶
elevata uses a small "intent layer" to keep core decisions consistent across target generation, ingestion, drift detection and quality checks.
🧩 Landing intent¶
The function landing_required(SourceDataset) determines whether a dataset conceptually requires a RAW landing object.
Decision rules:
SourceDataset.integratemust be True (hard gate).SourceDataset.generate_raw_tableoverrides the system default.- If unset, inherit
System.generate_raw_tables.
This decision is shared across:
- RAW target generation
- ingestion execution planning
- future drift detection and quality checks
🧩 Ingestion mode¶
The function resolve_ingest_mode(SourceDataset) determines how RAW is populated:
native: elevata extracts and loadsexternal: an external tool populates RAW; elevata validates + continuesnone: no landing ingestion (federated/virtual access)
If RAW landing is required but include_ingest='none', configuration is inconsistent and must fail fast.
🧩 Allowed states (RAW landing × include_ingest)¶
The decision whether a RAW landing exists is driven by landing_required(SourceDataset). The execution mode is driven by System.include_ingest, but only becomes relevant when RAW landing is required.
| landing_required | include_ingest | Result / behavior |
|---|---|---|
| false | none | ✅ Valid. No RAW landing. Ingestion is skipped (federated/virtual or intentionally no landing). |
| false | external | ✅ Valid (ignored). No RAW landing, so ingestion is skipped. |
| false | native | ✅ Valid (ignored). No RAW landing, so ingestion is skipped. |
| true | native | ✅ Valid. elevata performs native ingestion (extract + load into RAW). |
| true | external | ✅ Valid. RAW is expected to be populated by an external tool; elevata validates RAW existence and continues with downstream steps (drift/quality/etc.). |
| true | none | ❌ Invalid. Configuration inconsistency: RAW landing is required but no ingestion mode is enabled. Must fail fast. |
Notes:
include_ingestis only actionable whenlanding_required=True.- For
include_ingest=external, elevata does not extract data but still logs runs and can run drift/quality checks on RAW.
🔧 6. Load Runner¶
The Load Runner CLI (elevata_load) orchestrates SQL generation and execution.
It:
- resolves the active profile and target system
- reads target dataset metadata
- constructs the logical plan
- renders SQL via the active dialect
- optionally executes SQL in the target warehouse
The same pipeline is used for SQL preview and execution.
🔧 7. Deterministic Generation¶
The SQL generation pipeline is fully deterministic:
- stable business-key ordering
- stable hashing patterns
- stable helper column naming
- stable logical plan structure
This guarantees reproducible SQL and predictable diffs.
🔧 8. Merge‑based Incremental SQL Generation (Rawcore)¶
This section documents how merge‑based incremental loads are implemented for Rawcore targets.
🧩 8.1 Source Resolution¶
For targets using incremental_strategy = "merge", the SQL layer resolves the Stage upstream dataset as the merge source:
- source:
stage.<table> AS s - target:
rawcore.<table> AS t
Lineage metadata guarantees compatible natural keys and attribute sets.
🧩 8.2 Natural Key Join¶
Natural key fields define:
- the merge join condition
- identification of new vs. existing rows
- delete‑detection scope
If no natural key is defined, SQL generation fails.
🧩 8.3 Logical Plan Reuse¶
All column expressions used in UPDATE and INSERT branches are reused from the logical plan.
Business logic is defined once and rendered consistently.
🧩 8.4 Dialect‑dependent Strategy¶
Dialects choose between:
- native
MERGEstatements - fallback
UPDATE+INSERT ... WHERE NOT EXISTSpatterns
Both paths reuse the same logical plan expressions.
🧩 8.5 Delete Detection¶
Delete detection is implemented as a separate anti‑join statement that runs before the merge.
The SQL layer translates incremental scope filters from source lineage into target column expressions.
The incremental scope used for delete detection is derived from the SourceDataset.increment_filter and translated via lineage into target column references.
🔧 9. Execution Semantics¶
Execution semantics are defined by target layer:
| Layer | Behaviour |
|---|---|
raw |
Replace when loaded or seeded |
stage |
Always replace (truncate before insert) |
rawcore |
Replace only when mode = full; incremental runs never truncate |
*_hist |
Never truncate; versioned updates only |
Execution always runs inside the target system.
🔧 10. Execution, Auto‑Provisioning & Warehouse Logging¶
🧩 10.1 Execution Modes¶
elevata_load supports:
- Dry‑run: render SQL without executing it
- Execute (
--execute): render and execute SQL in the target warehouse
🧩 10.2 Layer-aware execution (RAW = ingestion)¶
elevata_load --execute is intentionally layer-aware:
- For
rawtargets,--executedoes not produce SQL but triggers ingestion logic instead. The same execution command therefore has different semantics depending on the target layer. - For downstream layers (
stage,rawcore,*_hist),--executerenders and executes warehouse-native SQL as usual.
Why this matters:
- elevata treats ingestion as a first-class citizen of the pipeline.
- The same lineage metadata that drives target generation also drives ingestion planning.
- This closes an important gap in dbt-style stacks: dbt excels at transformations but does not provide ingestion as part of its core execution model.
Practical rule:
- If you can
--executea RAW table, elevata will bring the data in (native/external mode). - If you
--executea Stage model, elevata assumes that its upstream exists inside the target execution context
(RAW landing or federated/external availability), and will fail fast otherwise.
🔎 Ingestion configuration (ingestion_config)¶
For RAW ingestion, source connection details and behavior are defined via SourceDataset.ingestion_config.
- For relational sources, scoping is controlled via
static_filterandincrement_filter. - For non-relational sources (Files / REST),
ingestion_configcontains connector-specific parameters (e.g.uriorurl).
Regardless of source type, RAW ingestion is always executed as Full Replace:
- Drop (if supported)
- Create
- Truncate
- Insert
🧩 10.3 Auto‑Provisioning¶
When enabled, execution automatically provisions:
- target schemas
- the meta schema
- the
load_run_logtable
All DDL is idempotent.
🧩 10.4 Warehouse‑level Load Run Log¶
Each executed load writes a row into meta.load_run_log, capturing:
- batch and load run IDs
- target dataset and system
- load mode and flags
- timestamps and durations
- execution status and error details
This enables warehouse‑native observability and auditing.
🧩 10.5 Schema Evolution (MigrationPlan-driven)¶
Before executing generated DML, elevata applies deterministic schema evolution steps that align physical target tables with metadata-defined schemas:
ENSURE_SCHEMA(idempotent)RENAME TABLE(dataset rename viaformer_names)RENAME COLUMN(column rename viaformer_names)ADD COLUMN(when safe and supported)DROP COLUMN(policy-gated; disabled by default)- Base tables: enabled via
ELEVATA_ALLOW_AUTO_DROP_COLUMNS=true _histtables: enabled viaELEVATA_ALLOW_AUTO_DROP_HIST_COLUMNS=true
- Base tables: enabled via
Schema evolution steps are derived from the Architecture MigrationPlan and executed by the applier. Destructive operations remain policy-gated and are only executed when explicitly enabled.
Table creation remains the responsibility of the load runner via ensure_target_table(...).
🔧 11. Load Observability & Debugging¶
Load runs expose structured summaries, batch grouping, and CLI‑level logging to support debugging and monitoring.
🔧 12. CLI Usage¶
The elevata_load command supports preview, debugging, batch execution, and warehouse execution.
🔧 13. Execute Mode¶
The --execute flag enables direct execution of load SQL in the target warehouse via dialect‑specific execution engines.
© 2025-2026 elevata - Technical Documentation