⚙️ elevata Architecture Overview

A high-level view of how elevata transforms metadata into executable SQL - from ingestion to lineage, from logical plans to dialect-aware rendering.

This overview connects the core concepts behind Generation Logic, Incremental Load, Load SQL Architecture, Lineage & Logical Plan, and the Dialect System into one visual narrative.

🔧 1. Core Architecture at a Glance

Source Metadata (DB reflection, APIs)
  ↓
Metadata Model (Datasets, Columns, Lineage)
  ↓
Generation Logic (TargetDataset & Columns)
  ↓
Lineage Model (Dataset + Column Lineage)
  ↓
Logical Plan Builder (Structured Query Representation)
  ↓
SQL Renderer (Deterministic SQL Formatting)
  ↓
get_active_dialect() (Dialect Adapter)
  ↓
Load SQL (Full · Merge · Delete Detection)
  ↓
Target Warehouse (Raw · Stage · Rawcore)
  ↓
Schema Evolution (MigrationPlan)
  ↓
DDL Applier (safe DDL only)

This flow represents the central principle of elevata:

Metadata → Logical Plan → Dialect-aware SQL → Warehouse

Architecture Control provides review, approval, controlled execution, and audit artifacts around the same architecture state:

Architecture State
  ↓
Architecture Diff
  ↓
MigrationPlan
  ↓
Policy Decisions
  ↓
Architecture Change Report
  ↓
Architecture Review Briefing
  ↓
Architecture Approval Artifact
  ↓
Execution Preview
  ↓
Controlled Execution
  ↓
Architecture Execution Record

🔧 2. Architecture Layers

🧩 2.1 Metadata Ingestion Layer

Reads schema, columns, keys from source systems
Normalizes metadata into elevata’s internal models
Reports created, changed, unchanged and removed source metadata outcomes
No SQL generation occurs here

🔎 2.1.1 Source Metadata Import Review

Source Metadata Import Review makes source onboarding inspectable immediately after import.

It reports what elevata discovered and how the SourceColumn metadata changed:

created columns
changed columns
unchanged columns
removed columns
detected primary key columns
skipped datasets
datasets that need manual review

Changed and unchanged are intentionally separated. A changed column means the stored technical source metadata now differs from the previous state. An unchanged column means the source was checked and still matches the previous metadata state.

The review result is a transient, read-only report. It does not persist import history, introduce a new workflow, execute loads, or generate target architecture. It only makes the existing metadata import outcome transparent before downstream generation and control steps.
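The four column outcomes can be illustrated with a minimal sketch. It assumes column metadata is held as plain dictionaries keyed by column name; `ImportReview` and `review_columns` are hypothetical names used for illustration and are not elevata's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ImportReview:
    """Transient, read-only summary of one metadata import (hypothetical shape)."""
    created: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def review_columns(previous: dict[str, dict], current: dict[str, dict]) -> ImportReview:
    """Classify each column by comparing stored metadata with freshly imported metadata."""
    review = ImportReview()
    for name in sorted(previous.keys() | current.keys()):
        if name not in previous:
            review.created.append(name)      # newly discovered in the source
        elif name not in current:
            review.removed.append(name)      # no longer present in the source
        elif previous[name] != current[name]:
            review.changed.append(name)      # technical metadata differs
        else:
            review.unchanged.append(name)    # checked, still matches
    return review


previous = {
    "id": {"type": "INTEGER", "nullable": False},
    "name": {"type": "VARCHAR(50)", "nullable": True},
    "legacy_code": {"type": "CHAR(3)", "nullable": True},
}
current = {
    "id": {"type": "INTEGER", "nullable": False},
    "name": {"type": "VARCHAR(100)", "nullable": True},
    "email": {"type": "VARCHAR(255)", "nullable": True},
}
review = review_columns(previous, current)
print(review)
```

Nothing is written back in this sketch, which mirrors the read-only nature of the review: the result exists only for inspection.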

🧩 2.2 Generation Layer

Creates TargetDatasets in Raw, Stage, Rawcore
Injects surrogate keys where required
Produces column mappings based entirely on lineage

Incremental scoping and ingestion behavior are derived from SourceDataset metadata and consistently applied across ingestion, merge, and delete detection.

Raw datasets may be ingested via native ingestion or skipped entirely in federated setups.

🧩 2.3 Lineage Layer

Establishes dataset-level and column-level lineage
Feeds the Logical Plan Builder
Ensures traceability from source to Rawcore

🧩 2.4 Logical Plan Layer

Builds structured plans (not SQL!)
Vendor-neutral representation of SELECT, JOIN, UNION logic
Used by Raw → Stage → Rawcore previews and loads
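To make "structured plans, not SQL" concrete, the following sketch models a SELECT with one JOIN as immutable data. The class names (`ColumnRef`, `Join`, `SelectPlan`) and the example tables are assumptions for illustration; elevata's real plan classes may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    table_alias: str
    name: str


@dataclass(frozen=True)
class Join:
    table: str
    alias: str
    left: ColumnRef
    right: ColumnRef
    kind: str = "inner"


@dataclass(frozen=True)
class SelectPlan:
    """Vendor-neutral description of a query; contains no SQL text."""
    from_table: str
    from_alias: str
    columns: tuple[tuple[ColumnRef, str], ...]  # (source column, output name)
    joins: tuple[Join, ...] = ()


plan = SelectPlan(
    from_table="stage.customer",
    from_alias="c",
    columns=(
        (ColumnRef("c", "customer_id"), "customer_id"),
        (ColumnRef("o", "order_total"), "order_total"),
    ),
    joins=(
        Join(
            table="stage.orders",
            alias="o",
            left=ColumnRef("c", "customer_id"),
            right=ColumnRef("o", "customer_id"),
            kind="left",
        ),
    ),
)

output_names = [output for _, output in plan.columns]
print(output_names)
```

Because the plan is plain, hashable data, two plans can be compared for equality without rendering any SQL.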

🧩 2.5 SQL Rendering Layer

Applies formatting rules (indentation, aliasing, column order)
Hands off dialect-specific tasks to the dialect adapter
Deterministic output for UI and CI

🧩 2.6 Dialect Adapter Layer

Implements quoting, merge syntax, hashing, concatenation
Ensures the generated SQL behaves consistently across platforms (BigQuery, Databricks, DuckDB, Fabric Warehouse, MSSQL, Postgres, Snowflake)
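The division of labor between renderer and adapter can be sketched with two toy dialects that differ only in identifier quoting and string concatenation. The `Dialect` interface and both subclasses are hypothetical and do not reflect elevata's actual class hierarchy.

```python
from abc import ABC, abstractmethod


class Dialect(ABC):
    """Hypothetical adapter interface covering two dialect-specific tasks."""

    @abstractmethod
    def quote(self, identifier: str) -> str: ...

    @abstractmethod
    def concat(self, parts: list[str]) -> str: ...


class AnsiStyleDialect(Dialect):
    """Double-quoted identifiers and the || operator (as in Postgres or DuckDB)."""

    def quote(self, identifier: str) -> str:
        return '"' + identifier.replace('"', '""') + '"'

    def concat(self, parts: list[str]) -> str:
        return " || ".join(parts)


class TSqlStyleDialect(Dialect):
    """Bracketed identifiers and CONCAT() (as in SQL Server)."""

    def quote(self, identifier: str) -> str:
        return "[" + identifier.replace("]", "]]") + "]"

    def concat(self, parts: list[str]) -> str:
        return "CONCAT(" + ", ".join(parts) + ")"


def render_key_expression(dialect: Dialect, columns: list[str]) -> str:
    """Renderer-side code stays dialect-agnostic and delegates syntax to the adapter."""
    return dialect.concat([dialect.quote(c) for c in columns])


ansi_sql = render_key_expression(AnsiStyleDialect(), ["first_name", "last_name"])
tsql_sql = render_key_expression(TSqlStyleDialect(), ["first_name", "last_name"])
print(ansi_sql)
print(tsql_sql)
```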

🧩 2.7 Load SQL Layer

Full load: INSERT INTO ... SELECT
Incremental merge: upsert logic based on natural key lineage
Delete detection: anti-join removal of missing rows
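The full-load and delete-detection shapes can be tried end to end with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in engine, unquoted hypothetical table names, and a `NOT EXISTS` predicate as the anti-join form. The SQL that elevata actually emits is dialect-specific and will differ.

```python
import sqlite3


def full_load_sql(target: str, source: str, columns: list[str]) -> str:
    """INSERT INTO ... SELECT for a full load (illustrative, unquoted identifiers)."""
    cols = ", ".join(columns)
    return f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {source}"


def delete_detection_sql(target: str, source: str, key_columns: list[str]) -> str:
    """Remove target rows whose key no longer exists in the source (anti-join)."""
    match = " AND ".join(f"s.{k} = {target}.{k}" for k in key_columns)
    return (
        f"DELETE FROM {target} "
        f"WHERE NOT EXISTS (SELECT 1 FROM {source} AS s WHERE {match})"
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stage_customer (customer_id INTEGER, name TEXT);
    CREATE TABLE rawcore_customer (customer_id INTEGER, name TEXT);
    INSERT INTO stage_customer VALUES (1, 'Ada'), (2, 'Grace');
    """
)

# Full load: copy every staged row into the target.
conn.execute(full_load_sql("rawcore_customer", "stage_customer", ["customer_id", "name"]))

# Simulate a source-side delete, then run delete detection on the target.
conn.execute("DELETE FROM stage_customer WHERE customer_id = 2")
conn.execute(delete_detection_sql("rawcore_customer", "stage_customer", ["customer_id"]))

remaining = conn.execute(
    "SELECT customer_id, name FROM rawcore_customer ORDER BY customer_id"
).fetchall()
print(remaining)
```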

🧩 2.7.1 Schema Evolution (MigrationPlan + Applier)

Before executing load SQL, elevata derives a MigrationPlan from the Architecture Diff and translates it into deterministic schema evolution steps:

Dataset renames are expressed as RENAME TABLE
Column renames are expressed as RENAME COLUMN
Missing columns may be added (ADD COLUMN) when supported
Column drops are policy-gated and disabled by default
- Base tables: ELEVATA_ALLOW_AUTO_DROP_COLUMNS=true enables physical DROP COLUMN
- _hist tables: physical drops require ELEVATA_ALLOW_AUTO_DROP_HIST_COLUMNS=true
- Without the hist flag, removed business columns in _hist are retired (inactive + detached lineage)

Important design principle:
Schema evolution does not provision missing tables. Table provisioning is handled centrally by the load runner (ensure_target_table(...)) and executed via the target execution engine.

Preflight validation includes schema introspection and dialect-aware semantic equivalence rules to suppress non-actionable type differences.
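A semantic equivalence rule can be as simple as grouping type names that a dialect treats as aliases. The groups below are a small hypothetical subset chosen for illustration, and `types_equivalent` is not an elevata function. A real rule set would also need to handle lengths, precision, and scale.

```python
# Hypothetical alias groups per dialect; deliberately incomplete.
EQUIVALENT_TYPES: dict[str, list[set[str]]] = {
    "duckdb": [{"INT", "INTEGER", "INT4"}, {"VARCHAR", "TEXT", "STRING"}],
    "postgres": [{"INT", "INTEGER", "INT4"}, {"BOOL", "BOOLEAN"}],
}


def types_equivalent(dialect: str, expected: str, actual: str) -> bool:
    """True when two type names are identical or aliases within the given dialect."""
    a, b = expected.strip().upper(), actual.strip().upper()
    if a == b:
        return True
    groups = EQUIVALENT_TYPES.get(dialect, [])
    return any(a in group and b in group for group in groups)


# An INTEGER column reported back as INT4 is not an actionable difference.
print(types_equivalent("duckdb", "integer", "INT4"))
# INTEGER versus BIGINT is a real difference and should surface in preflight.
print(types_equivalent("duckdb", "INTEGER", "BIGINT"))
```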

🧩 2.7.2 Architecture Catalog

Architecture Catalog provides the read-only discovery layer for metadata-defined executable architecture.

It helps users inspect:

dataset inventory
schema / layer placement
materialization semantics
incremental strategy
ownership
metadata health
query logic
upstream and downstream relationships
column contract signals
serving-layer Data Product readiness
layer maps and dependency matrices
latest execution evidence references
architecture quality and governance insights
Architecture Control review status summaries

The Catalog links to dedicated pages for:

dataset details
lineage
query contracts
Catalog Data Products
Architecture Control
execution history

Architecture Catalog does not edit metadata and does not execute loads.

Catalog Data Products provide a read-only consumer-readiness perspective for serving-layer datasets. They combine ownership, metadata health, query contracts, lineage, review state and execution evidence into transparent readiness groups: Consumption-ready, Review recommended and Not consumption-ready.

Catalog Insights provide read-only signals for ownership gaps, metadata health findings, custom query logic, downstream consumer visibility, inactive datasets with consumers, and missing execution evidence. Dataset-specific insight signals are also shown on Catalog detail pages.

Catalog Maps provide a read-only architecture lens across populated schemas and direct TargetDataset dependencies. Layer cards, layer flow overview, dependency matrix and transition examples make architecture structure visible without introducing graph editing, execution controls or metadata mutation.

🧩 2.7.3 Architecture Control

Architecture Control makes metadata-defined architecture reviewable, approvable, executable through controlled scopes, and auditable.

It provides deterministic artifacts for:

Architecture State
Architecture Change Reports
Architecture Promotion Reports
Architecture Approval Artifacts
Architecture Execution Records
policy decisions
report fingerprints

Controlled execution is delegated to the load runner. Architecture Control does not bypass preflight validation, materialization policy checks, Architecture Guard enforcement, or dialect-owned SQL rendering.

Command responsibilities:

| Command | Responsibility |
| --- | --- |
| `elevata_state` | Render the metadata-defined architecture state |
| `elevata_plan` | Render architecture change intent and policy decisions |
| `elevata_promote` | Compare two architecture state artifacts |
| `elevata_approve` | Create architecture approval artifacts |
| `elevata_approval_check` | Verify approval artifacts |
| `elevata_load` | Execute loads with preflight and guard checks |

Architecture Control uses the same semantic path as execution:

Architecture State → Architecture Diff → MigrationPlan → Policy Decisions

The Architecture Control UI adds a constrained operational layer. Before approval or execution, the Architecture Review Briefing summarizes what needs reviewer attention, drawing on the current scoped report, review status and execution preview. The UI layer covers:

scope-aware report and review status
compact Architecture Review Briefing
approval artifact creation and verification
execution preview
controlled load execution
target-only execution for TargetDataset scopes
captured execution output
persisted Architecture Execution Records

Execution scopes are explicit:

| Scope | Execution behavior |
| --- | --- |
| All datasets | Executes all active target datasets with dependency ordering |
| Schema | Executes selected schema roots with dependency ordering |
| TargetDataset | Executes the selected TargetDataset with dependency ordering |
| TargetDataset, target-only | Executes only the selected TargetDataset |

The default execution path remains lineage-aware. Target-only execution is available only for TargetDataset scopes and is intended for focused iteration when upstream data is already available.
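The difference between lineage-aware and target-only execution for a TargetDataset scope can be sketched with the standard library's `graphlib`. The dependency map and the `execution_order` helper are hypothetical, and the sketch assumes that "dependency ordering" means running the selected dataset's upstream datasets first.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: dataset -> set of direct upstream datasets.
UPSTREAM: dict[str, set[str]] = {
    "raw.customer": set(),
    "stage.customer": {"raw.customer"},
    "rawcore.customer": {"stage.customer"},
    "stage.order": set(),
    "rawcore.order": {"stage.order"},
}


def execution_order(
    target: str, upstream: dict[str, set[str]], target_only: bool = False
) -> list[str]:
    """Return datasets to execute for a TargetDataset scope, upstream first."""
    if target_only:
        return [target]  # assumes upstream data is already available

    # Collect the target and everything it transitively depends on.
    needed: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(upstream.get(node, ()))

    # Sort so that every dataset runs after its upstream datasets.
    graph = {node: upstream.get(node, set()) & needed for node in needed}
    return list(TopologicalSorter(graph).static_order())


print(execution_order("rawcore.customer", UPSTREAM))
print(execution_order("rawcore.customer", UPSTREAM, target_only=True))
```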

Architecture Execution Records capture the audit context of controlled execution:

execution identifier
operator
timestamps and duration
status and message
Architecture Control scope
dependency mode
report fingerprint
approval identifier
preview fingerprint
command invocation metadata
output and error tails
deterministic record fingerprint
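One common way to obtain a deterministic fingerprint is to hash a canonical serialization of the record. The sketch below uses sorted-key JSON and SHA-256. Both choices, and the field names in the example record, are assumptions rather than elevata's documented scheme.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Hash a canonical JSON form so equal content always yields the same digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record fields, loosely following the list above.
record = {
    "execution_id": "exec-001",
    "operator": "alice",
    "status": "success",
    "scope": {"type": "TargetDataset", "name": "rawcore.customer"},
    "dependency_mode": "lineage-aware",
}

# Same content with a different key order produces the same fingerprint.
reordered = {key: record[key] for key in reversed(list(record))}
fingerprint = record_fingerprint(record)
print(fingerprint == record_fingerprint(reordered))
```

Any change to a field value, such as the status, yields a different digest, which is what makes the fingerprint useful for audit comparisons.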

🔧 3. Bizcore - Business Semantics as Metadata

elevata introduces a dedicated Bizcore layer for modeling business meaning, rules, and calculations as first-class metadata.

Bizcore sits explicitly between Core and Serving:

RAW → STAGE → CORE → BIZCORE → SERVING

🧩 What Bizcore is

A business semantics layer, not a technical projection
Explicitly modeled datasets and columns
Deterministically executed like all other datasets
Fully lineage-aware and explainable

Bizcore datasets express:

business concepts (e.g. Customer, Contract, Revenue)
business rules and classifications
derived business identifiers
KPIs and domain logic as dataset fields

🧩 What Bizcore is not

No BI semantic layer
No metric store
No query-time metric resolution
No tool-specific abstraction

Bizcore logic is compiled into the same logical plans and SQL as technical datasets, preserving elevata’s guarantees around determinism, transparency, and reproducibility.

🧩 Serving - Presentation Logic & Consumer Hand-off

Serving is the presentation-facing layer. Serving datasets typically expose Bizcore datasets 1:1 (often as views), while allowing consumer-specific shaping such as naming, ordering, and lightweight joins where required. Serving is intended as the hand-off layer to BI tools / semantic layers / frontend use cases - without moving business logic out of Bizcore.

🧩 Custom Query Logic (Query Tree)

For most datasets, elevata generates SQL automatically from metadata. In semantic layers (bizcore, serving), elevata additionally supports Custom Query Logic via an explicit Query Tree.

The Query Tree defines the shape of a query (e.g. windowing, aggregation steps, union composition) while remaining fully metadata-native.

If enabled, the Query Tree is compiled into the same Logical Plan and Expression AST used by the default generation pipeline. If disabled, elevata falls back to fully automatic SQL generation.

This ensures advanced query shaping without introducing manual SQL or breaking determinism, lineage, or governance guarantees.

🔧 4. Incremental Processing Path

Stage Dataset
  ↓  (Lineage Mapping)
Merge SQL
  ↓
Delete Detection
  ↓
Rawcore Dataset

These two strategies are currently implemented:

full
merge

Both operate exclusively on the Stage → Rawcore path.

🔧 5. Dialect Resolution Overview

ELEVATA_SQL_DIALECT env var  →  Dialect Adapter (override)
Active Profile (elevata_profiles.yaml)  →  Dialect Adapter
DuckDBDialect (fallback)  →  Dialect Adapter

The resolution order is:

Environment override
Profile definition
DuckDB fallback

🔧 6. Unified SQL Generation Pipeline

Metadata Model
  → Logical Plan Builder
  → SQL Renderer
  → Dialect Adapter
  → Load SQL (full, merge, delete)

🔧 7. Why This Architecture Matters

Vendor neutrality via dialect adapters
Determinism via SQL rendering rules
Traceability via lineage-driven logic
Extensibility (new dialects, strategies, materializations)
Incremental-ready via merge + delete detection
Safe for CI/CD - predictable SQL for diffing and testing
Execution & Logging are part of the system