Changelog¶
All notable changes to this project will be documented in this file.
This project adheres to Semantic Versioning and Keep a Changelog.
📈 For the full roadmap, see Project Readme
🧾 Licensed under the AGPL-v3 — open, governed, and community-driven.
💡 elevata keeps evolving — one small, meaningful release at a time.
[Unreleased]¶
TBD
[1.4.1] - 2026-03-01¶
This patch release introduces and stabilizes elevata’s dialect-specific SQL keyword registry.
The focus of this release is deterministic, dialect-owned handling of reserved identifiers across all supported warehouses.
✨ Added¶
Deterministic SQL Keyword Registry¶
- Introduced dialect-specific reserved keyword modules (rendering/dialects/keywords/*.py)
- Added elevata_generate_reserved_keywords management command
- Engine-truth extraction for:
- Databricks (sql_keywords())
- DuckDB (duckdb_keywords())
- Documentation-based extraction with strict validation for:
- PostgreSQL
- Snowflake
- MSSQL
- Fabric Warehouse
- BigQuery
- Sanity checks guarding against incomplete or malformed keyword sets
- Optional core SQL overlay for defensive fallback safety
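A minimal sketch of how a per-dialect keyword set might drive identifier quoting. The names `SAMPLE_RESERVED` and `quote_identifier` are hypothetical, and the keyword set is a tiny illustrative sample, not elevata's generated registry.

```python
# Hypothetical sketch: quote an identifier only when it is reserved or not a plain name.
import re

# Tiny illustrative sample; a real registry would be generated per dialect.
SAMPLE_RESERVED = frozenset({"select", "from", "order", "group", "table"})

# Plain identifiers: lowercase letters, digits and underscores, not starting with a digit.
_PLAIN_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_identifier(name: str, reserved: frozenset = SAMPLE_RESERVED) -> str:
    """Return the name unquoted when safe, otherwise wrapped in double quotes."""
    needs_quoting = name.lower() in reserved or not _PLAIN_IDENTIFIER.match(name)
    if not needs_quoting:
        return name
    # Double any embedded quote character before wrapping.
    return '"' + name.replace('"', '""') + '"'
```

Because the keyword set is passed in as data, the same function could serve several dialects; the double-quote style used here is an assumption and would differ for engines that quote with backticks or brackets.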
🔄 Improved¶
- Fully restored dialect-owned SQL rendering for keyword extraction
- Ensured no vendor-specific SQL remains in management commands
- Stabilized documentation parsing across vendor doc layouts
- Harmonized quoting behavior across all dialects
🔒 Architectural Integrity¶
- Keyword extraction is deterministic and reproducible
- All SQL used in extraction is owned by the dialect layer
- Identifier quoting behavior is now fully dialect-driven
[1.4.0] - 2026-02-28¶
This release significantly expands elevata’s ingestion capabilities (REST + Files)
while further strengthening deterministic, dialect-owned execution semantics
across all supported warehouses.
✨ Added¶
Ingestion Framework (RAW)¶
- Native RAW ingestion for REST APIs and file-based sources via SourceDataset.ingestion_config
- Support for CSV, JSON, JSONL, Excel and Parquet sources
- Environment variable expansion for file URIs (e.g. ${ELEVATA_INGEST_ROOT}/...)
- CSV parsing options (delimiter, quotechar, encoding)
- Excel ingestion options (sheet_name / sheet_index, header_row, max_rows) with strict validation
- Chunked file processing for large datasets with accurate total row tracking
- Standardized RAW landing with preserved JSON payload (system role)
- Deterministic JSON path–driven column mapping for REST and JSON ingestion
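The environment variable expansion for file URIs can be illustrated with the standard library. This is only a sketch: `expand_uri` is a hypothetical helper, and failing loudly on an undefined variable is an assumption about the desired behavior, not a description of elevata's implementation.

```python
# Hypothetical sketch: expand ${VAR} placeholders in a file URI.
import os
from string import Template


def expand_uri(uri: str, env=None) -> str:
    """Substitute ${NAME} placeholders, raising if a variable is undefined."""
    env = os.environ if env is None else env
    try:
        return Template(uri).substitute(env)
    except KeyError as exc:
        # Fail fast instead of silently leaving the placeholder in the path.
        raise ValueError(f"Undefined environment variable in URI: {exc.args[0]}") from exc
```

For example, with `ELEVATA_INGEST_ROOT=/data`, the URI `${ELEVATA_INGEST_ROOT}/orders.csv` would expand to `/data/orders.csv`.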
Ingestion Routing¶
- Unified ingestion dispatcher as single entry point (ingest_raw_for_source_dataset(...))
✨ Improved¶
Dialect-Owned Merge Rendering¶
- Refactored merge rendering to delegate SQL shape entirely to dialects
- load_sql now provides semantic ingredients only (source select, keys, columns)
- Cross-platform MERGE fallbacks where native MERGE is not available
- Improved dialect diagnostics and merge validation behavior
Fully Executable Historization (SCD Type 2)¶
- History datasets (*_hist) now generate execution-ready SQL
- Historization pipeline fully dialect-owned (incremental history primitives)
- Strict validation for required SCD technical columns and INSERT alignment
🛠️ Fixed¶
- SQLAlchemy 2.0 execution compatibility in ingestion landing paths
- Correct cumulative row reporting for chunked RAW ingestion
- Cross-dialect merge, historization and delete-detection edge cases
- BigQuery hashing stability and literal rendering correctness
- Windows file URI normalization for file-based ingestion
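To illustrate what cumulative row reporting means for chunked ingestion, here is a small sketch using only the standard library. The function names are hypothetical and the landing step is omitted; it only shows that the reported total accumulates across chunks rather than reflecting the last chunk.

```python
# Hypothetical sketch: read a CSV in chunks and report the cumulative row count.
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of up to chunk_size rows until the reader is exhausted."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def count_ingested_rows(text, chunk_size, delimiter=",", quotechar='"'):
    """Return the total number of data rows seen across all chunks."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        # A real pipeline would land the chunk here; the sketch only counts it.
        total += len(chunk)
    return total
```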
[1.3.1] - 2026-02-18¶
This release stabilizes the Airflow example environment and resolves several issues
that could prevent successful execution when testing elevata with external target
systems.
The focus of this release is usability and reliability of the Airflow example setup,
ensuring that users can immediately validate elevata orchestration against their own
target database without requiring additional infrastructure.
✨ Improved¶
Airflow Example Stability¶
- Airflow example now installs backend-specific dependencies at container startup instead of build time
- Backend selection is now driven entirely by environment configuration via ELEVATA_SQL_DIALECT
- Removed hardcoded backend assumptions from Dockerfile and docker-compose
- Airflow containers now run elevata inside an isolated virtual environment, preventing dependency conflicts with Airflow itself
- Improved compatibility with Databricks, PostgreSQL, MSSQL, Snowflake, BigQuery and other supported targets
Configuration & Environment Handling¶
- Environment configuration now consistently sourced from .env
- Removed docker-compose defaults overriding user configuration
- Simplified switching between target systems without rebuilding images
- Improved portability for local testing and Open Source usage
Dependency Handling¶
- Backend-specific requirements are installed only when required
- Prevented SQLAlchemy version conflicts with Airflow dependencies
- Optional dialect imports (e.g. DuckDB) no longer fail when backend is not active
🛠️ Fixed¶
- Airflow example failing to start due to dependency conflicts
- SQLAlchemy downgrade caused by backend requirement installation
- Incorrect backend loading when multiple dialects were present
- Environment variable precedence issues between .env, docker-compose defaults and container runtime
- Multiple startup issues related to entrypoint execution
- Improved robustness of dialect imports across environments
🔧 Internal¶
- Refactored Airflow entrypoint initialization logic
- Added backend installation stamp mechanism to avoid repeated installs
- Improved separation between Airflow runtime dependencies
and elevata execution dependencies
[1.3.0] - 2026-02-17¶
This release introduces execution safety improvements, deterministic schema evolution,
and orchestration-level predictability enhancements.
The focus of this release is long-term stability and reproducible execution behavior
across platforms, rather than expanding modeling capabilities.
Execution planning, schema evolution, and load execution are now aligned around a
deterministic preflight model that guarantees predictable outcomes before execution begins.
✨ Added¶
Execution Safety & Predictability¶
- Introduced preflight validation phase before execution
- Deterministic failure behavior for unsafe schema evolution scenarios
- Execution now blocks before SQL execution when unsafe changes are detected
- Consistent error vs warning classification during materialization planning
- Improved transparency of execution behavior and failure causes
Schema Evolution & Type Drift Handling¶
- Canonical datatype comparison across all supported dialects
- Type drift detection during materialization planning
- Drift classification into:
- equivalent
- widening (safe)
- narrowing / incompatible (unsafe)
- Automatic schema evolution for safe widening changes
- Dialect-aware ALTER COLUMN generation where supported
- Deterministic rebuild fallback for dialects without ALTER support
- Consistent schema evolution behavior across Snowflake, Databricks, PostgreSQL,
MSSQL and DuckDB
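The three drift classes can be sketched as a small classifier over canonical type names. The widening ranks below are an illustrative assumption covering only a few integer types; `Drift` and `classify_drift` are hypothetical names, not elevata's API.

```python
# Hypothetical sketch: classify a type change as equivalent, widening or unsafe.
from enum import Enum


class Drift(Enum):
    EQUIVALENT = "equivalent"
    WIDENING = "widening (safe)"
    UNSAFE = "narrowing / incompatible (unsafe)"


# Assumed widening order for a handful of canonical integer types.
_INTEGER_RANK = {"SMALLINT": 1, "INTEGER": 2, "BIGINT": 3}


def classify_drift(current: str, desired: str) -> Drift:
    """Compare the physical type (current) with the metadata type (desired)."""
    cur, des = current.upper(), desired.upper()
    if cur == des:
        return Drift.EQUIVALENT
    if cur in _INTEGER_RANK and des in _INTEGER_RANK:
        # Moving to a larger integer type is safe; the reverse may lose data.
        return Drift.WIDENING if _INTEGER_RANK[des] > _INTEGER_RANK[cur] else Drift.UNSAFE
    # Anything outside the known family is treated as incompatible.
    return Drift.UNSAFE
```

Treating unknown combinations as unsafe is the conservative choice here: a planner built this way would block rather than guess.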
Materialization & Execution Architecture¶
- Materialization split into planning and application phases
- MaterializationPlan now explicitly represents:
- required DDL steps
- warnings
- blocking errors
- Structural synchronization of rawcore _hist datasets aligned with base datasets
- Improved consistency between schema introspection, metadata, and execution behavior
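A rough sketch of the planning/application split, assuming a plan object that carries DDL steps, warnings and blocking errors as listed above. The class and function names are hypothetical; only the three plan components come from this changelog.

```python
# Hypothetical sketch: a plan is built first and applied only if nothing blocks it.
from dataclasses import dataclass, field


@dataclass
class MaterializationPlanSketch:
    ddl_steps: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    blocking_errors: list = field(default_factory=list)

    @property
    def is_executable(self) -> bool:
        return not self.blocking_errors


def apply_plan(plan: MaterializationPlanSketch, execute) -> int:
    """Run every DDL step through `execute`, or refuse before any SQL is sent."""
    if not plan.is_executable:
        raise RuntimeError("Plan blocked: " + "; ".join(plan.blocking_errors))
    for step in plan.ddl_steps:
        execute(step)
    return len(plan.ddl_steps)
```

Checking for blocking errors before the first step runs is what keeps a rejected plan from leaving a partially applied schema.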
Orchestration & Execution Planning¶
- Metadata-driven execution manifest generation
- Deterministic dependency graph for dataset execution order
- Airflow example DAG for lineage-based execution orchestration
- Execution planning fingerprint for reproducibility validation
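One way to obtain a deterministic execution order and a reproducibility fingerprint is a topological sort with sorted tie-breaking followed by a hash of the result. This is a sketch under that assumption; the function names and the choice of SHA-256 over a JSON encoding are illustrative, not elevata's actual manifest format.

```python
# Hypothetical sketch: deterministic dataset ordering plus a plan fingerprint.
import hashlib
import heapq
import json


def execution_order(deps: dict) -> list:
    """deps maps each dataset to the set of its upstream datasets."""
    nodes = set(deps) | {up for ups in deps.values() for up in ups}
    remaining = {node: set(deps.get(node, ())) for node in nodes}
    # A heap makes tie-breaking alphabetical, independent of input order.
    ready = [node for node, ups in remaining.items() if not ups]
    heapq.heapify(ready)
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for other, ups in remaining.items():
            if node in ups:
                ups.remove(node)
                if not ups:
                    heapq.heappush(ready, other)
    if len(order) != len(nodes):
        raise ValueError("Dependency cycle detected")
    return order


def plan_fingerprint(order: list) -> str:
    """Hash the ordered plan so two runs can be compared for equality."""
    return hashlib.sha256(json.dumps(order).encode("utf-8")).hexdigest()
```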
Query Builder UX Improvements¶
- Query Builder now displays inline validation and dependency conflict messages directly in the editing context instead of failing silently
- Blocking mutations caused by downstream dataset dependencies are surfaced as contextual warnings in the grid UI
- Inline editors remain open when a mutation is rejected, allowing users to immediately correct input
- Conflict warnings are automatically cleared when cancelling or completing edits
- Improved transparency when renaming or modifying query-derived columns referenced downstream
🛠️ Fixed¶
- Multiple edge cases where schema drift could result in inconsistent execution behavior
- Incorrect handling of schema evolution across dialect-specific type representations
- Improved stability of materialization planning for renamed datasets and columns
- Various internal consistency improvements in execution planning and validation
🔒 Governance & Determinism¶
- Execution behavior is now fully determined before execution begins
- Unsafe schema changes are blocked deterministically
- Reduced risk of partial schema application or inconsistent load states
- Improved alignment between metadata contracts and physical schema evolution
[1.2.0] - 2026-02-08¶
This release introduces a major upgrade to Query Builder contract handling,
query-derived schema synchronization, and datatype inference.
It significantly improves determinism, usability, and correctness of
metadata-driven query modeling while simplifying internal typing logic.
✨ Added¶
Multi-platform execution support¶
- Added native execution dialects for:
- Snowflake
- Databricks SQL (Unity Catalog)
- Microsoft Fabric Warehouse
- Unified execution semantics across platforms using the same metadata model
- No platform-specific modeling required
- Architecture defined once, executed consistently across engines
This release marks the first version where elevata execution semantics are aligned across
multiple modern cloud data warehouse platforms.
Query Builder & Contract Handling¶
- Query-derived TargetColumns are now fully synchronized with the Query Tree output contract
- Automatic creation, rename, update and deletion of query-derived columns
- Contract-based schema alignment between:
- SQL preview
- logical plan
- materialized dataset schema
- Aggregate nodes now redefine output contracts explicitly as:
- group keys + measures only
- Window and Aggregate operators expose correct upstream input columns
without requiring intermediate target column assignment
Datatype inference¶
- Deterministic datatype inference for:
- window functions (ROW_NUMBER, RANK, DENSE_RANK)
- aggregate measures (COUNT, SUM, MIN, MAX, AVG)
- COUNT and ranking functions now default to INTEGER instead of BIGINT
- Datatypes inferred from upstream input columns where possible
- Canonical datatype resolution aligned with DATATYPE_CHOICES
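The inference rules above can be sketched as a lookup: counting and ranking functions yield INTEGER, while the other aggregates take the datatype of their input column. `infer_datatype` is a hypothetical helper, and returning `None` when the input type is unknown is an assumption about how "where possible" might be handled.

```python
# Hypothetical sketch: infer the output datatype of an aggregate or window function.
from typing import Optional

_INTEGER_FUNCTIONS = {"COUNT", "ROW_NUMBER", "RANK", "DENSE_RANK"}
_INPUT_TYPED_FUNCTIONS = {"SUM", "MIN", "MAX", "AVG"}


def infer_datatype(function: str, input_type: Optional[str] = None) -> Optional[str]:
    """Return the inferred datatype, or None when it cannot be determined."""
    name = function.upper()
    if name in _INTEGER_FUNCTIONS:
        return "INTEGER"
    if name in _INPUT_TYPED_FUNCTIONS:
        # Inherit from the upstream input column when it is known.
        return input_type
    raise ValueError(f"Unsupported function: {function}")
```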
Query Builder UX improvements¶
- Aggregate editor now shows input dataset columns directly
- Window and Aggregate nodes operate on input contracts instead of materialized target schema
- Eliminates unnecessary intermediate column creation
Databricks execution improvements¶
- Improved raw ingestion performance for Databricks SQL Warehouse by batching multi-row INSERT execution
- Eliminates per-row execution overhead caused by connector-level executemany behavior
- No changes required to metadata models or query definitions
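The batching idea can be shown as a generator that turns many rows into a few multi-row INSERT statements. This is a sketch only: the function name is hypothetical, and the `?` placeholder style is an assumption that depends on the database driver in use.

```python
# Hypothetical sketch: build multi-row INSERT statements instead of one per row.
def build_batched_inserts(table, columns, rows, batch_size):
    """Yield (sql, params) pairs, each covering up to batch_size rows."""
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join(row_placeholder for _ in batch)
        sql = f"INSERT INTO {table} ({column_list}) VALUES {values}"
        # Flatten the batch so parameters line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params
```

With three rows and a batch size of two, this yields two statements instead of three round trips.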
🛠️ Fixed¶
- Window columns incorrectly defaulting to STRING datatype
- Aggregate measures not inheriting input column datatype
- Contract sync overriding inferred datatypes
- Missing input columns in Aggregate editor selection
- Incorrect system-managed flag for query-derived columns
- Multiple contract sync race conditions caused by partial state inference
🔒 Governance & Determinism¶
- Query contract inference now consistently evaluates against query_head
- Dataset schema always reflects effective query output contract
- Deterministic datatype normalization across query-derived columns
- Reduced risk of schema drift between metadata and generated SQL
[1.1.0] - 2026-01-31¶
This release introduces a major upgrade to the Query Builder and UNION workflow,
together with critical stability and governance improvements across metadata,
schema sync, and historization.
✨ Added¶
Query Builder & UNION workflow¶
- Full UNION operator support with:
- Branch management
- Output schema contracts
- Column mappings per branch
- Guided UNION toolbar with:
- Output / Branch navigation
- Schema copy from branch
- Auto-map by name
- Validation and “Set as head”
- Contract snapshot with input → output diff
- Determinism and governance indicators (ordering, window/aggregate checks)
- Contextual node summaries (SELECT, AGGREGATE, WINDOW, UNION)
Metadata-driven governance¶
- Clear separation of query_root (anchor) and query_head (current output)
- Head-based validation, SQL preview and contract inference
- Guardrails for destructive operations with downstream dependents
Schema sync & historization¶
- Robust metadata-driven rename propagation via former_names
- Orphan preservation in _hist (inactive, detached, not dropped)
- Safe FK reference deletion that never removes hist columns
- Technical tail columns always appended last
- Deterministic ordinal normalization for system-managed schemas
🛠️ Fixed¶
- UNIQUE constraint violations in hist regeneration
- Broken rename propagation across rawcore → hist
- False-positive orphan deletions
- Inconsistent candidate column inference in query builder
- UNION validation / navigation not triggering correctly
- Output schema copy errors due to field mismatches
- Multiple signal / sync race conditions
🔒 Governance & Safety¶
- UNION validation enforces compatible schemas
- Set-as-head explicitly marks final dataset output
- Defensive rebuild of hist metadata with key lineage preservation
- Deterministic ordering guarantees for window and aggregate functions
[1.0.0] - 2026-01-25 — First stable release¶
This release marks the first stable, backwards-compatible version of elevata.
elevata has evolved from a SQL generation layer into a metadata-driven, deterministic data platform engine.
From this version onwards, interfaces, metadata structures and execution semantics are considered stable.
✨ Highlights¶
🪄 Query Builder (Major Feature)¶
- Explicit, metadata-driven query planning with SELECT, AGGREGATE, WINDOW and UNION nodes
- Deterministic query execution via a formal Query Tree
- Clear separation between query structure and generated SQL
- Guided UI for building and evolving queries without writing SQL
- SQL preview reflects the exact executed query
🧠 Query Contracts¶
- Explicit, inspectable output schema contracts
- Field-level validation and error reporting
- Early detection of incompatible unions, missing mappings and invalid transformations
🧬 Lineage & Explainability¶
- End-to-end lineage across datasets, fields and query nodes
- Full traceability from source to serving layer
- Query tree visualization for complex transformations
🔒 Determinism & Governance¶
- Deterministic execution guarantees for aggregates and window functions
- ORDER BY / PARTITION BY governance with clear warnings and errors
- Clear distinction between execution errors and policy violations
🧩 Platform Maturity¶
- Stable execution semantics
- Backwards compatibility guarantees from this release onwards
- Foundation for orchestration and governance integrations in future releases
[0.9.1] – 2026-01-14¶
✨ Improved¶
- Refined lineage visualization with configurable multi-hop upstream and downstream views
- Clearer lineage semantics distinguishing direct inputs from extended execution dependencies
- Improved lineage UX with consistent ordering and scope labeling
🧠 Serving Layer¶
- Serving datasets now support presentation-oriented identifiers (friendly names)
- Dataset and column naming in Serving layer allows casing, spaces, and special characters
- Identifier handling is dialect-aware and uses proper quoting where required
🛡️ Validation & Health¶
- Extended metadata validation to distinguish blocking vs advisory findings
- Improved health checks for Serving and Bizcore datasets
- Validation logic consolidated and aligned across layers
🧩 Internal¶
- Improved consistency between lineage analysis, validators, and UI
- Minor internal cleanups in metadata services and views
[0.9.0] – 2026-01-12¶
🧠 Bizcore: Business Semantics as First-Class Metadata¶
This release introduces Bizcore, a dedicated layer for modeling
business meaning, rules, and calculations as explicit metadata —
executed deterministically alongside technical datasets.
Bizcore makes elevata business-capable by design, without introducing
BI-style semantic layers or query-time abstractions.
✨ Added¶
- Bizcore datasets and columns as first-class metadata objects
- Multi-upstream join support for Bizcore datasets
- UI support for building and validating Bizcore structures
- Deterministic SQL preview for Bizcore datasets
- Lineage-driven qualification of expressions and joins
- End-to-end traceability from Core → Bizcore → Serving
🔄 Changed¶
- SQL generation now fully respects semantic lineage in expressions
- Join aliasing and qualification are applied consistently across layers
🧪 Quality & Stability¶
- Extensive validation of join correctness and expression rendering
- Scoped and non-scoped UI flows aligned under a single metadata model
- No breaking changes to existing Raw, Stage, or Core pipelines
✨ Improved¶
- Manual expressions now automatically qualify unaliased column references
with the correct input alias during SQL generation.
This ensures consistent, unambiguous SQL for Bizcore calculations
without requiring users to manually prefix column names.
This release marks a major milestone: elevata now supports
explicit business semantics as metadata, not as BI-layer logic.
[0.8.0] – 2026-01-04¶
⚙️ Execution & Orchestration as First-Class Architecture¶
This release introduces an explicit, metadata-driven execution model,
establishing orchestration, failure semantics, and observability as first-class concerns in elevata.
Execution is now planned, executed, and explained independently of SQL generation,
providing a robust foundation for platform-native orchestration and governance.
✨ Added¶
- Explicit Execution Plan model separating planning from execution
- Dependency-graph–based dataset execution with deterministic ordering
- Multi-dataset batch execution with a shared batch_run_id
- Structured execution policies (continue_on_error, max_retries)
- Retry semantics with per-attempt tracking (attempt_no)
- Distinct failure semantics:
- blocked (dependency-based non-execution)
- aborted (policy-based fail-fast non-execution)
- Load Run Snapshot (meta.load_run_snapshot)
- Batch-level, JSON-based execution state
- Captures plan, policy, dependencies, and aggregated outcomes
- Extended Load Run Log (meta.load_run_log)
- Orchestration-only events (blocked / aborted)
- Best-effort, non-blocking meta logging
- CLI execution diagnostics:
- Execution snapshot printing (--debug-execution)
- Snapshot persistence (--write-execution-snapshot)
- Deterministic BigQuery table qualification for execution and metadata writes (prevents sporadic cross-project NotFound errors during streaming inserts)
- Global execution modes:
- single-dataset execution with dependencies (default)
- platform-wide execution in deterministic order (--all)
- optional schema-scoped execution (--schema)
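The interplay of retries, `blocked` and `aborted` can be sketched as a small batch runner. The policy fields `continue_on_error` and `max_retries` and the `attempt_no` counter come from the list above; the `ExecutionPolicy` class, the `run_batch` function and the status strings `success` and `failed` are hypothetical. The sketch assumes every upstream dataset is part of the same batch.

```python
# Hypothetical sketch: retries, dependency blocking and fail-fast aborting.
from dataclasses import dataclass


@dataclass
class ExecutionPolicy:
    continue_on_error: bool = False
    max_retries: int = 0


def run_batch(order, deps, run_dataset, policy):
    """Run datasets in order; run_dataset(name, attempt_no) raises on failure."""
    results = {}
    aborted = False
    for name in order:
        if aborted:
            # Fail-fast policy: nothing runs after the first failure.
            results[name] = "aborted"
            continue
        if any(results.get(up) != "success" for up in deps.get(name, ())):
            # An upstream did not succeed, so this dataset is never attempted.
            results[name] = "blocked"
            continue
        status = "failed"
        for attempt_no in range(1, policy.max_retries + 2):
            try:
                run_dataset(name, attempt_no)
                status = "success"
                break
            except Exception:
                continue
        results[name] = status
        if status == "failed" and not policy.continue_on_error:
            aborted = True
    return results
```

In this sketch `blocked` results from a dependency that did not succeed, while `aborted` results purely from the policy; the two never overlap because the abort check runs first.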
🔄 Changed¶
- Execution semantics are no longer implicit in SQL or CLI flow
- Load execution is now driven by an explicit execution model
- Fail-fast behavior is deterministic and explicitly reported
- Execution observability is metadata-first and dialect-agnostic
🧪 Quality & Stability¶
- Extensive unit tests for execution ordering, retries, fail-fast, and blocking
- Guardrails for orchestration-only events and best-effort persistence
- Clear separation of execution core vs CLI and dialect adapters
- No destructive changes to existing materialization or SQL generation logic
This release establishes elevata as a self-orchestrating, explainable
data platform core, laying the groundwork for native scheduling,
governance rules, and external orchestration integrations.
[0.7.1] – 2025-12-29¶
🧱 Metadata-Driven Schema Evolution¶
This release completes and stabilizes the first materialization layer for
safe, deterministic schema evolution in target warehouses.
Schema changes are now derived explicitly from metadata and applied in a
controlled, lineage-aware manner — without implicit inference from SQL.
✨ Added¶
- Metadata-driven materialization planning for target datasets
- Automatic provisioning of missing target tables
- Deterministic column synchronization (additive, non-destructive)
- Explicit handling of dataset and column renames via former_names
- Lineage-aware propagation of renames into history (_hist) datasets
- Deterministic INSERT … (column list) generation across all dialects
- DuckDB-native introspection via PRAGMA with execution-engine consistency
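A sketch of how `former_names` could be used to decide between keeping, renaming and adding a column. The function and the action labels are hypothetical; the ambiguity guard mirrors the "multiple former matches" guardrail mentioned in this release, but the exact behavior is an assumption.

```python
# Hypothetical sketch: resolve a column rename from metadata, without guessing.
def resolve_rename(desired_name, former_names, physical_columns):
    """Return an (action, old_name) pair for one metadata column."""
    if desired_name in physical_columns:
        return ("keep", None)
    matches = [name for name in former_names if name in physical_columns]
    if len(matches) > 1:
        # More than one former name still exists physically: refuse to pick one.
        raise ValueError(f"Ambiguous rename for {desired_name}: {matches}")
    if matches:
        return ("rename", matches[0])
    return ("add", None)
```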
🔄 Changed¶
- Materialization is now planned separately from SQL rendering
- Table existence is determined by effective provisioning steps, not schema creation alone
- Incremental loads reliably auto-provision target tables when required
- History datasets are provisioned deterministically and stay structurally aligned with base tables
🧪 Quality & Stability¶
- Extensive unit tests for materialization planning and rename scenarios
- Guardrails for ambiguous rename situations (multiple former matches)
- Improved separation of introspection vs execution concerns
- Removed duplicate provisioning paths and race conditions
This release lays the foundation for controlled schema evolution,
future governance rules, and automated validation layers.
[0.7.0] – 2025-12-21¶
Added¶
- Dataset-driven, lineage-aware execution with automatic dependency resolution
- Unified RAW execution semantics via physical ingestion (Source → RAW)
- Stable technical column model across all layers
- BigQuery execution backend with native ingestion support
- Dialect-aware hashing and surrogate key generation (BigQuery, DuckDB, Postgres, MSSQL)
- SourceDataset-level static and incremental filters with runtime {{DELTA_CUTOFF}} resolution
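Runtime resolution of the `{{DELTA_CUTOFF}}` placeholder can be pictured as a simple substitution. The helper name and the rendered literal format are assumptions; a dialect-aware implementation would render the timestamp through the active dialect instead.

```python
# Hypothetical sketch: replace {{DELTA_CUTOFF}} in an incremental filter.
from datetime import datetime


def resolve_delta_cutoff(filter_sql: str, cutoff: datetime) -> str:
    """Substitute the placeholder with a quoted timestamp literal."""
    literal = "'" + cutoff.strftime("%Y-%m-%d %H:%M:%S") + "'"
    return filter_sql.replace("{{DELTA_CUTOFF}}", literal)
```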
Changed¶
- Execution operates on datasets rather than layers
- RAW datasets are treated as an optional ingestion layer
- RAW ingestion always rebuilds tables, while source extraction may be incrementally scoped
Fixed¶
- Load plan debug output and execution logging consistency
- Signal handling for historization execution order
- Correct application of incremental filters during ingestion and delete detection
- Lineage-based translation of incremental scope filters across renamed columns
- Cross-dialect consistency for incremental execution (DuckDB, MSSQL, Postgres, BigQuery)
[0.6.1] – 2025-12-15¶
Fixed¶
- Correct introspection of SQL Server alias types (e.g. dbo.Name, dbo.Flag)
- Proper handling of bit columns during ingestion (no fallback to string types)
- Correct precision and scale mapping for money and smallmoney columns
- Stable and deterministic column ordering during metadata import
Improved¶
- Lossless ingestion of source datatypes via source_datatype_raw
- Strict, fail-fast dialect-specific type rendering to prevent silent fallbacks
Notes¶
- This release significantly improves correctness for SQL Server as a source system.
- Re-importing source metadata is recommended to benefit from the improved typing behavior.
[0.6.0] – 2025-12-14¶
🚀 Warehouse-Native Execution & SCD Historization¶
This release introduces the foundation for a fully warehouse-native execution framework.
elevata now manages entire data load pipelines end-to-end — from metadata to SQL generation to execution, historization and observability.
✨ Major Features¶
1. Execution Engine (--execute)¶
elevata can now execute rendered SQL directly against target systems, measure performance, record affected rows, and log complete run metadata.
This shifts elevata beyond SQL rendering into a full pipeline engine.
2. Full SCD Type 2 Historization¶
A deterministic, metadata-driven historization framework:
- automatic change detection via row-hash
- version closing for changed and deleted keys
- insertion of new and changed versions
- lineage-aware attribute propagation
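The historization steps above can be sketched in plain Python over in-memory rows. This is an illustration of the SCD Type 2 logic only: elevata generates SQL for this, and the function names, the `<NULL>` token and the `OPEN_END` stand-in for the maximum timestamp are assumptions.

```python
# Hypothetical sketch: SCD Type 2 versioning driven by a row hash.
import hashlib

OPEN_END = "9999-12-31T00:00:00"  # assumed stand-in for an open-ended version


def row_hash(row, attrs):
    """Hash the tracked attributes in a fixed order, with explicit NULL handling."""
    joined = "|".join("<NULL>" if row[a] is None else str(row[a]) for a in attrs)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def historize(history, current_rows, key, attrs, now):
    """Close changed and deleted versions, then insert new and changed ones."""
    open_versions = {v[key]: v for v in history if v["version_ended_at"] == OPEN_END}
    current = {row[key]: row for row in current_rows}
    for k, version in open_versions.items():
        row = current.get(k)
        if row is None or row_hash(row, attrs) != version["row_hash"]:
            version["version_ended_at"] = now
    for k, row in current.items():
        version = open_versions.get(k)
        if version is None or version["version_ended_at"] == now:
            new_version = {key: k, **{a: row[a] for a in attrs}}
            new_version["row_hash"] = row_hash(row, attrs)
            new_version["version_started_at"] = now
            new_version["version_ended_at"] = OPEN_END
            history.append(new_version)
    return history
```

Unchanged rows keep their open version untouched, so running the function twice with the same input adds nothing.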
3. Metadata-Driven Incremental Merge Loads¶
Complete incremental pipeline including:
- new-row inserts
- changed-row updates
- delete detection
- MERGE or UPDATE+INSERT fallback depending on dialect
4. Auto-Provisioning of Warehouse Structures¶
elevata can automatically create:
- target schemas (raw, stage, rawcore, ...)
- the meta.load_run_log table
- all required objects for execution and logging
Controlled via .env flags.
5. Warehouse-Level Load Logging¶
A new table meta.load_run_log provides full observability into load executions:
- load mode, historization flags, dialect
- start/end timestamps, render/execution duration
- rows affected, error messages, status
- batch and run identifiers
6. Documentation Expansion¶
- New historization architecture document
- Extended execution, logging, and provisioning sections
- Revised dialect and SQL generation chapters
🧪 Testing Improvements¶
- Deterministic SQL tests for merge and historization pipelines
- Combined historization pipeline tests
- Prepared E2E execution flow for dialect-specific execution engines
This release establishes the execution foundation on which future orchestration, validation and automation layers will be built.
[0.5.3] — 2025-12-10¶
🔹 Historization Structure & Dialect Engine Enhancements¶
This release completes the metadata foundation required for full historized incremental loading in v0.6.0. It finalizes *_hist dataset structure, ensures cross-dialect consistency, and extends SQL rendering to use dialect-driven identifier rules.
✨ Highlights¶
Metadata / Historization¶
- Automatic creation and maintenance of <dataset>_hist datasets in RAWCORE
- Full rename propagation for datasets and columns
- All *_hist fields are system-managed and read-only
- New technical field in RAWCORE: row_hash for change detection (persisted expression)
- Versioning strategy established:
- version_started_at inclusive, version_ended_at exclusive
- open-ended validity via max timestamp
- version_state (current, changed, deleted)
SQL Generation / Dialects¶
- Unified render_identifier() and render_table_identifier() for consistent quoting
- All SQL generation now uses dialect identifier rendering
- Delete detection routing tested and guarded per dialect capability
Load Runner¶
- elevata_load supports --execute with safe stub execution via ExecutionEngine
- Logging improvements and full dry-run support remain functional
Testing & Stability¶
- Expanded test coverage for historization and dialect routing
- Full suite green across merge, delete detection & *_hist scenarios
[0.5.2] — 2025-12-07¶
🛠️️ Metadata stability & History (HIST) foundation¶
This release significantly improves the robustness, determinism, and safety of history metadata generation in the RAWCORE schema.
✨ Highlights¶
Metadata / Historization¶
- Deterministic generation of *_hist datasets based on lineage_key.
- Robust schema sync between RAWCORE and *_hist (idempotent, safe deletes).
- History SK expression based on rawcore SK + version_started_at.
- History BK definition: rawcore SK + version_started_at.
- History datasets and columns are fully system-managed (no UI unlock).
Signals & UI¶
- Automatic *_hist sync on dataset rename and column changes in rawcore.
- Inline rename refreshes both rawcore and corresponding *_hist rows.
- Inline editing is disabled for *_hist datasets and columns.
SQL Preview¶
- build_sql_preview_for_target returns a clear comment for history targets instead of misleading SQL.
- Tests added to guard the _hist-preview behaviour.
[0.5.1] — 2025-12-04¶
🧹 Documentation & Consistency Release¶
This patch focuses on improving the clarity, coherence, and structure of elevata’s developer documentation.
✨ Highlights¶
- Full harmonization of all architecture documents
- Removal of outdated version references and legacy wording
- Unified heading and layout style across all Markdown files
- Consistent terminology for LogicalPlan, Expression DSL, Dialects, and Load SQL
- Improved mkdocs navigation structure
- Minor text corrections and consistency fixes across the docs
🚫 No functional changes¶
This release does not modify the SQL engine, metadata model, or any public API surface.
All test suites remain unchanged and green.
[0.5.0] — 2025-12-01¶
🛠️️ Multi-Dialect Engine, MSSQL Support & Deterministic FK Hashing¶
This release delivers the next major milestone of elevata’s SQL engine: full multi-dialect SQL generation, an extensible dialect factory, runtime dialect switching in the UI, and a complete rewrite of the surrogate-key and foreign-key hashing system using a vendor-neutral DSL AST.
🚀 Major Features¶
1. Multi-Dialect SQL Rendering (Postgres, DuckDB, MSSQL)¶
- New pluggable dialect architecture (SqlDialect, dialect_factory).
- Three fully operational dialects:
- DuckDBDialect
- PostgresDialect
- MssqlDialect (new)
- Centralised dialect registry & runtime resolution via:
- profile
- env (ELEVATA_SQL_DIALECT)
- URL parameter in SQL preview
All SQL generation (preview + Load Runner) now passes through a unified, dialect-aware pipeline.
2. SQL Preview Dialect Selector (UI)¶
- New dropdown in TargetDataset detail view.
- Instant SQL refresh via HTMX request.
- Clean display of dialect-specific SQL functions (quoting, hashing, concat, types).
3. Deterministic, Cross-Dialect Hashing via DSL AST¶
A full rewrite of surrogate-key and FK hashing:
- New DSL expression system (Hash256Expression, ConcatWsExpression, Literal, ColumnRef).
- Dialect-specific SQL rendering happens exclusively in dialect classes.
- Identical logical lineage yields identical hash values across vendors.
- Fully deterministic ordering + null replacement semantics.
- Clean child-lineage FK hashing:
- BK1, child BK1, BK2, child BK2…
- ~ and | literal separators, ordered alphabetically
All existing hashing tests green after the rewrite.
4. Multi-Source Stage Identity Mode¶
- Correct logical union builder for Stage datasets with multiple upstream sources.
- Clean identity (no ranking) vs. non-identity (ranking) handling.
- Injected source_identity_id literal per upstream branch.
- All multi-source identity tests fully passing.
5. Dialect-Aware FK Rendering¶
- Parent surrogate keys and child FK keys now rendered via DSL → dialect.
- MSSQL: CONVERT(VARCHAR(64), HASHBYTES('SHA2_256', …), 2)
- Postgres: ENCODE(DIGEST(CONCAT_WS(...), 'sha256'), 'hex')
- DuckDB: SHA256(CONCAT_WS(...))
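A compact sketch of the DSL-to-dialect idea: one vendor-neutral expression tree rendered into the three SQL shapes listed above. The class names match those mentioned in this release, but their fields, the `render` function and the identifier quoting rules are assumptions made for illustration. The real engine keeps rendering inside dialect classes, whereas this sketch uses a single function for brevity.

```python
# Hypothetical sketch: render one hash expression for several dialects.
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class ConcatWsExpression:
    separator: str
    parts: tuple


@dataclass(frozen=True)
class Hash256Expression:
    inner: ConcatWsExpression


# Hash wrappers taken from the SQL shapes listed in this changelog entry.
_HASH_TEMPLATES = {
    "mssql": "CONVERT(VARCHAR(64), HASHBYTES('SHA2_256', {inner}), 2)",
    "postgres": "ENCODE(DIGEST({inner}, 'sha256'), 'hex')",
    "duckdb": "SHA256({inner})",
}


def render(expr, dialect: str) -> str:
    """Render an expression tree to SQL text for the named dialect."""
    if isinstance(expr, ColumnRef):
        # Assumed quoting: brackets for MSSQL, double quotes elsewhere.
        return f"[{expr.name}]" if dialect == "mssql" else f'"{expr.name}"'
    if isinstance(expr, Literal):
        return "'" + expr.value.replace("'", "''") + "'"
    if isinstance(expr, ConcatWsExpression):
        args = [render(Literal(expr.separator), dialect)]
        args += [render(part, dialect) for part in expr.parts]
        return "CONCAT_WS(" + ", ".join(args) + ")"
    if isinstance(expr, Hash256Expression):
        return _HASH_TEMPLATES[dialect].format(inner=render(expr.inner, dialect))
    raise TypeError(f"Unsupported expression: {expr!r}")
```

Because the tree is built once and only the rendering differs, the same logical key definition produces structurally equivalent SQL on every backend.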
🔧 Internal Improvements¶
- Entire builder.py cleaned, simplified, and refactored.
- Unified render_select_for_target() and load-SQL paths.
- Removed legacy manual hashing logic.
- No raw SQL string assembly left in hashing pipeline.
- Strict quoting rules per dialect.
- Cleanly extracted DSL operators (col(), lit(), concat_ws(), hash256()).
🧪 Testing¶
- New tests:
- test_dialect_postgres.py
- test_hashing_dialects.py
- test_fk_hashing.py
- Full MSSQL hashing coverage
- Updated test helpers for DSL AST inspection.
- All Stage multi-source tests green after identity-mode rewrite.
📘 Documentation¶
- Updated architecture docs:
- Dialect System
- SQL Rendering Conventions
- Hashing Architecture
- README modernised with new capabilities and architecture.
🧭 Roadmap Shift¶
With the 0.5.0 SQL backend complete, the next stage focuses on execution:
- Load Runner CLI (Full, Merge, Dry-Run)
- Caching & improved SQL formatting
- Multi-source incremental merges
- Additional dialects (Snowflake, BigQuery, Databricks)
Impact
Version 0.5.0 transforms elevata into a true multi-backend SQL generator
with deterministic hashing, dialect-specific rendering, and a stable architectural core
for future execution engines.
[0.4.0] — 2025-11-20¶
🧠 Dialect Architecture & Load SQL Modernization¶
This release marks a major leap for elevata:
a complete SQL dialect abstraction layer, a unified Load-SQL pipeline,
and extensive new documentation that sets the foundation for future multi-backend support.
🚀 Core Features¶
Fully Modular SQL Dialect System¶
A new, extensible dialect layer powers all SQL generation:
- Central SqlDialect base class
- Concrete DuckDBDialect reference implementation
- Dialect resolution via ELEVATA_SQL_DIALECT, ELEVATA_DIALECT, and active profile
- Dialect capabilities:
- supports_merge
- supports_delete_detection
- Expression-level hooks:
- concat_expression()
- hash_expression()
- cast_expression()
- render_literal()
This architecture enables clean, vendor-neutral SQL generation for future backends
(Postgres, MSSQL, Snowflake, BigQuery, Databricks).
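A minimal sketch of dialect-name resolution from the two environment variables and the active profile. The precedence shown (ELEVATA_SQL_DIALECT, then ELEVATA_DIALECT, then the profile, then a DuckDB default) is an assumption based on the order in which the sources are listed; the function name resolve_dialect_name and its parameters are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_dialect_name(
    env: Optional[Mapping[str, str]] = None,
    profile_dialect: Optional[str] = None,
    default: str = "duckdb",
) -> str:
    """Return a lowercase dialect name (assumed precedence: env vars, profile, default)."""
    env = os.environ if env is None else env
    for var in ("ELEVATA_SQL_DIALECT", "ELEVATA_DIALECT"):
        value = env.get(var, "").strip()
        if value:
            return value.lower()
    if profile_dialect and profile_dialect.strip():
        return profile_dialect.strip().lower()
    return default


print(resolve_dialect_name({"ELEVATA_DIALECT": "Postgres"}))
print(resolve_dialect_name({}, profile_dialect="DuckDB"))
```

Passing the environment in as a mapping keeps the resolution logic testable without mutating os.environ.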
🔧 Load SQL Architecture 2.0¶
A fully redesigned, dialect-aware Load SQL engine:
Full Load¶
- render_create_replace_table
- render_insert_into_table
- Uses dialect quoting, casting, and literal handling
Incremental Merge Load¶
- Native dialect-specific MERGE for DuckDB
- Clean failure modes for dialects without merge support
- Deterministic key handling
- Automatic update/insert column mapping
Delete Detection¶
- Dialect-specific implementation (DELETE … WHERE NOT EXISTS)
- Guardrails when delete detection is requested but dialect does not support it
All Load SQL now flows through a single, coherent pipeline via load_sql.py.
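A minimal sketch of capability-guarded load SQL rendering, assuming a dialect object that exposes supports_merge and supports_delete_detection as plain attributes. Everything else here is hypothetical: the class and function names, the error type, and the generic MERGE/DELETE text, which has not been checked against any specific engine and is not the SQL that load_sql.py emits.

```python
class UnsupportedLoadModeError(RuntimeError):
    """Raised when a dialect lacks the capability a load mode needs."""


class SketchDialect:
    name = "generic"
    supports_merge = False
    supports_delete_detection = False

    def quote(self, ident: str) -> str:
        return '"' + ident.replace('"', '""') + '"'


class MergeCapableDialect(SketchDialect):
    name = "merge_capable"
    supports_merge = True
    supports_delete_detection = True


def _key_predicate(dialect, keys):
    if not keys:
        raise ValueError("at least one key column is required")
    q = dialect.quote
    return " AND ".join(f"t.{q(k)} = s.{q(k)}" for k in keys)


def render_merge(dialect, target, source, keys, columns):
    # Fail early and explicitly instead of emitting SQL the engine cannot run.
    if not dialect.supports_merge:
        raise UnsupportedLoadModeError(f"dialect {dialect.name!r} does not support MERGE")
    q = dialect.quote
    on = _key_predicate(dialect, keys)
    updates = ", ".join(f"{q(c)} = s.{q(c)}" for c in columns if c not in keys)
    cols = ", ".join(q(c) for c in columns)
    vals = ", ".join(f"s.{q(c)}" for c in columns)
    lines = [f"MERGE INTO {target} AS t", f"USING {source} AS s", f"ON {on}"]
    if updates:
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    lines.append(f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})")
    return "\n".join(lines)


def render_delete_detection(dialect, target, source, keys):
    if not dialect.supports_delete_detection:
        raise UnsupportedLoadModeError(
            f"dialect {dialect.name!r} does not support delete detection"
        )
    on = _key_predicate(dialect, keys)
    return (
        f"DELETE FROM {target} AS t\n"
        f"WHERE NOT EXISTS (SELECT 1 FROM {source} AS s WHERE {on})"
    )


print(render_merge(MergeCapableDialect(), "rawcore.rc_person", "stage.stg_person",
                   ["id"], ["id", "name"]))
```

Key columns are excluded from the UPDATE clause and included in the INSERT clause, which is one straightforward reading of "automatic update/insert column mapping".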
🧪 Testing Enhancements¶
- New test suite for:
- literal rendering (NULL, booleans, strings, dates, datetimes)
- cast expression rendering
- concat & hash expression helpers
- merge & delete detection dialect hooks
- End-to-end tests for Full and Merge load generation
- All tests green across the refactor
This ensures reliable future extensions to new SQL dialects.
📘 Documentation¶
Three major new documents added:
- Dialect System — full architectural overview of dialect abstraction
- Load SQL Architecture — how Full, Merge, and Delete Detection SQL are generated
- Incremental Load Architecture — planner, merge semantics, delete detection
All are linked from:
- index.md
- README_docs.md
- mkdocs.yml navigation
🔍 Internal Improvements¶
- Harmonized get_active_dialect() with environment and profile resolution
- Consolidated SQL preview and load paths to use the same dialect entrypoints
- Removed legacy assumptions and duplicated logic
- Fully revised DuckDB implementation as reference for new dialects
🗺️ Roadmap Impact¶
With 0.4.0 released, the following items shift to 0.5.x:
- Target System Selector (Profiles → target backend)
- Additional SQL dialects (MSSQL, Postgres, Snowflake)
- Pseudo-Lineage Graph in UI
- Multi-Source Incremental Loads
- Load-Runner CLI
These features build directly on the new architecture introduced in 0.4.0.
Impact
Version 0.4.0 delivers the foundational SQL engine for elevata’s future:
clean, extensible, and ready for multiple SQL backends.
It stabilizes the path toward 0.5.x — where elevata becomes a multi-dialect metadata-driven ETL generator.
[0.3.0] — 2025-11-12¶
Lineage-Aware Target Generation & SQL Preview¶
🚀 Core Features¶
Lineage-Driven Target Generation
- Added a stable lineage_key to both TargetDataset and TargetColumn:
- Enables fully idempotent target generation.
- Prevents duplicate targets after renaming (lineage_key is preserved).
- TargetGenerationService.apply_all() refactored into modular steps:
- Existing datasets are now matched and updated via lineage_key instead of physical names.
- Clean dataset-level and column-level re-numbering during regeneration.
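A minimal sketch of why matching on lineage_key makes regeneration idempotent. The TargetRow dataclass and regenerate function are hypothetical stand-ins, not the Django models or TargetGenerationService itself; the choice to keep the existing name and refresh only the description is an assumption for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class TargetRow:
    lineage_key: str
    name: str
    description: str = ""


def regenerate(existing: List[TargetRow], generated: List[TargetRow]) -> List[TargetRow]:
    """Match generated rows to existing ones by lineage_key, never by name."""
    by_key: Dict[str, TargetRow] = {row.lineage_key: row for row in existing}
    result = []
    for candidate in generated:
        current = by_key.get(candidate.lineage_key)
        if current is None:
            # No row with this lineage_key yet: create it.
            result.append(candidate)
        else:
            # Row exists (possibly renamed by a user): keep its name, refresh the rest.
            result.append(replace(current, description=candidate.description))
    # Existing rows with no generated counterpart are simply dropped in this sketch.
    return result


existing = [TargetRow("k1", "rc_aw_product_model", "old")]
generated = [TargetRow("k1", "rc_aw_productmodel", "new"), TargetRow("k2", "rc_aw_person", "new")]
first = regenerate(existing, generated)
print([row.name for row in first])
```

Running regenerate a second time on its own output returns the same rows, so a rename never produces a duplicate target.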
Three-Layer Data Lineage
- Explicit dataset-level lineage:
- TargetDatasetInput defines upstream relationships (source_dataset and/or upstream_target_dataset).
- combination_mode (single or union) indicates how multiple inputs are combined.
- Explicit column-level lineage:
- TargetColumnInput mirrors the same relationships for individual columns.
- upstream_columns now correctly map transformations between layers.
- Layer-specific rules:
- Raw = only source_datasets
- Stage = prefers Raw as upstream (or Source directly if generate_raw_tables=False)
- Rawcore = always built from Stage
Multi-Source Consolidation
- New SourceDatasetGroup + SourceDatasetGroupMembership model:
- Supports joining multiple SourceDatasets into a single Stage target.
- The “primary system” flag defines which source drives column order.
- TargetDatasetInput.role classifies inputs as:
- primary, enrichment, reference_lookup, or audit_only.
Surrogate & Business Keys
- Surrogate key columns are automatically renamed when their dataset is renamed (e.g. renaming rc_aw_productmodel to rc_aw_product_model renames the key column to rc_aw_product_model_key).
- Surrogate key expressions now reference upstream column names (Raw or Stage), not renamed targets.
- Deterministic column ordering:
1. Surrogate keys
2. Business keys
3. Integrated source columns
4. Artificial columns
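A minimal sketch of the four-group ordering above, assuming each column carries a role label. The Column dataclass, the role strings, and the tie-breakers inside a group (source position, then name) are assumptions for illustration and not taken from elevata's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    role: str  # one of the keys in ROLE_RANK
    source_position: int = 0


# Rank follows the documented order: surrogate keys, business keys,
# integrated source columns, artificial columns.
ROLE_RANK = {"surrogate_key": 0, "business_key": 1, "integrated": 2, "artificial": 3}


def order_columns(columns: List[Column]) -> List[Column]:
    return sorted(columns, key=lambda c: (ROLE_RANK[c.role], c.source_position, c.name))


def assign_ordinals(columns: List[Column]) -> List[Tuple[int, str]]:
    """Return (ordinal_position, column_name) pairs starting at 1."""
    return [(i, c.name) for i, c in enumerate(order_columns(columns), start=1)]


cols = [
    Column("load_ts", "artificial"),
    Column("name", "integrated", 2),
    Column("businessentityid", "business_key", 1),
    Column("rc_aw_person_key", "surrogate_key"),
]
print(assign_ordinals(cols))
```

Because the sort key is fully determined by the column attributes, the resulting order does not depend on the order in which columns were created.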
Column Generation Enhancements
- Automatic assignment of ordinal_position on save:
- Newly created columns append at the end in numeric sequence.
- Safe against manual reordering.
- Integrated columns added after initial generation are correctly appended and re-numbered without violating unique constraints.
🧠 Logical Query Model & SQL Preview¶
Logical Plan Layer
- New internal model (logical_plan.py) represents canonical SQL structure for a target dataset:
- Supports LogicalSelect, LogicalUnion, LogicalExpression, and lineage mapping.
- builder.py now constructs expressions (Surrogate Key, BK, and regular fields) from TargetColumnInput lineage.
- Dialect-specific type mapping handled cleanly via map_logical_to_duckdb_type.
SQL Preview 2.0
- SQL preview now generates true lineage-based SELECT statements, e.g.:
- Stage:
```sql
SELECT … FROM "raw"."raw_aw1_person"
UNION ALL
SELECT … FROM "raw"."raw_aw2_person"
```
- Rawcore:
```sql
SELECT hash256(…) AS rc_aw_person_key, …
FROM "stage"."stg_aw_person"
```
- Automatic field alignment:
- Columns missing in one upstream are rendered as NULL AS <column>.
- Integrated columns retain their target aliases.
- Supports both manual_expression and templated ({{ … }}) syntax.
- New visually distinct green preview box in UI with proper formatting:
- Keywords capitalized
- Indentation after SELECT
- Clean separation before FROM
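A minimal sketch of the automatic field alignment described above: every UNION ALL branch selects the same target columns, and a column missing from one upstream is rendered as NULL AS <column>. The function render_aligned_union and its input shape are hypothetical; the output layout (indentation after SELECT, FROM on its own line) only approximates the preview formatting.

```python
from typing import Sequence, Set, Tuple


def quote(ident: str) -> str:
    return '"' + ident.replace('"', '""') + '"'


def render_aligned_union(
    target_columns: Sequence[str],
    upstreams: Sequence[Tuple[str, Set[str]]],
) -> str:
    """upstreams holds (already-quoted table reference, available column names)."""
    if not upstreams:
        raise ValueError("at least one upstream is required")
    selects = []
    for table_sql, available in upstreams:
        exprs = [
            quote(c) if c in available else f"NULL AS {quote(c)}"
            for c in target_columns
        ]
        selects.append("SELECT\n  " + ",\n  ".join(exprs) + "\nFROM " + table_sql)
    return "\nUNION ALL\n".join(selects)


sql = render_aligned_union(
    ["businessentityid", "email"],
    [
        ('"raw"."raw_aw1_person"', {"businessentityid", "email"}),
        ('"raw"."raw_aw2_person"', {"businessentityid"}),
    ],
)
print(sql)
```

Iterating over the target column list, rather than over each upstream's own columns, is what keeps the branches positionally compatible.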
🧩 UI, Governance & Behavior¶
- Context-aware lineage display in detail views:
- Source Datasets and Upstream Datasets shown based on layer.
- Input relations now read like:
raw_aw1_person · businessentityid -> stg_aw_person · businessentityid
- System-managed field handling refined:
- Layer-specific read-only fields controlled by settings.
- lineage_key treated as an internal system field (hidden in forms and lists).
- Surrogate key names locked for user editing but updated automatically when renaming datasets.
🧪 Testing & Quality¶
Structured Testing Foundation
- Introduced the first complete automated test framework for the metadata generation platform.
- Added dedicated runtests.py launcher for reliable execution across environments.
- Integrated realistic DB-based lineage tests (Raw → Stage → Rawcore).
- Added logic-only tests for hashing, naming, and validators.
- Prepared SQL Preview test templates for the future rendering pipeline.
- New documentation: 🧪 Testing & Quality
Impact
This milestone establishes a solid foundation for test coverage,
ensuring safe refactoring, reproducibility, and confidence in every release.
[0.2.6] — 2025-11-03¶
⚙️ Target Generation & Surrogate Key Implementation¶
Core Features
- Introduced fully automated TargetDataset and TargetColumn generation service (TargetGenerationService).
- Deterministic surrogate key creation using SHA-256 and runtime-loaded pepper.
- Added business_key_column and surrogate_key_column flags to differentiate logical vs. physical keys.
- Layer-aware naming now based on TargetSchema.physical_prefix (no hardcoded prefixes).
- Integrated filtering: only integrate=True columns included across all layers.
UI Enhancements
- Added “Generate Targets” button to SourceDataset list with progress spinner & success message.
- Improved error feedback and runtime validation for pepper and target schema scope.
- Consistent Bootstrap iconography (bi-lightning-charge) and visual feedback for active operations.
Technical Refinements
- Surrogate key expressions persisted in metadata for transparency and traceability.
- Environment-based pepper resolution via .env and get_runtime_pepper().
- Refactored naming logic (naming.py, rules.py, mappers.py) for consistent layer-specific conventions.
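A minimal sketch of deterministic surrogate key creation with SHA-256 and a pepper read from the environment. The environment variable name ELEVATA_PEPPER, the function names load_pepper and surrogate_key, the separator, and the way the pepper is appended are all assumptions for illustration; the changelog only states that a runtime-loaded pepper and SHA-256 are used.

```python
import hashlib
import os
from typing import Mapping, Optional, Sequence


def load_pepper(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the pepper from the environment (variable name is hypothetical)."""
    env = os.environ if env is None else env
    pepper = env.get("ELEVATA_PEPPER", "")
    if not pepper:
        raise RuntimeError("pepper is not configured")
    return pepper


def surrogate_key(business_key_values: Sequence[object], pepper: str, sep: str = "|") -> str:
    """Hash the business key values plus pepper into a 64-character hex digest."""
    parts = ["" if value is None else str(value) for value in business_key_values]
    payload = sep.join(parts + [pepper])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = surrogate_key(["AW", 42], pepper="s3cret")
print(len(key))
```

The same inputs and pepper always give the same key, while a different pepper gives a different key, so keys stay stable within an environment without being guessable from the business key alone.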
Impact
This release completes the Target Automation foundation for elevata —
paving the way for v0.3.0’s Meta-SQL and rendering engine. 🚀
[0.2.5] — 2025-10-27¶
🧩 Metadata Model Finalization & UI Polish¶
Core Enhancements
- Completed redesign of the core metadata model — fully aligned with the 0.3.x architecture.
- Added TargetSchema as a first-class model defining platform layers (raw, stage, rawcore, bizcore, serving).
- Introduced TargetDatasetInput and TargetColumnInput for multi-source mappings and lineage tracking.
- Added lifecycle flags (active, retired_at) for controlled dataset and column deprecation.
- Simplified incremental-load logic (increment_filter placeholder on SourceDataset).
- Unified naming conventions (*_schema_name, *_dataset_name) across all models.
- Extended governance primitives (sensitivity, access_intent) and surrogate-key configuration per layer.
- Removed obsolete fields (get_metadata, stage_dataset, etc.) and harmonized field semantics.
UI & Usability
- Introduced SourceDatasetGroup for managing groups of structurally identical source tables.
- Added governance badges and toggles for better lineage and visibility cues.
- Revised navigation order for more natural workflows.
- Improved help texts, icons, and consistent color themes across all metadata entities.
Impact
This release finalizes the metadata foundation for elevata — stable enough for automation development in 0.3.x.
No breaking structural changes expected before 0.3.0.
🪶 UI Comfort Continuation¶
- Unified color scheme for governance badges (badge-pii-high, badge-pk, …).
- Improved hover feedback and spacing in list views.
- All badges defined declaratively via ELEVATA_CRUD; no model-specific logic required.
- Updated elevata-theme.css for consistent badge geometry and hover states.
Why it matters
Version 0.2.5 concludes the “Model & Comfort” milestone:
the framework now combines a stable metadata core, polished UI, and ready groundwork for automated target generation in 0.3.x. 🚀
[0.2.4] — 2025-10-26¶
Strategic Documentation & Architecture Alignment¶
This release finalizes the strategic and architectural foundation for the upcoming metadata model freeze (v0.3.x).
It does not yet include model changes — instead, it defines the why and how for the next major milestone.
Highlights
- New and refined README with philosophy, vision, and AGPLv3 licensing
- Updated roadmap outlining the transition toward declarative architecture
- Strategic dbt decoupling paper, defining the new “governed SQL through architecture” direction
- Preparations for TargetSchema, TargetDatasetReference, and deterministic key generation to follow in v0.3.x
Why it matters
This release marks the calm before the model storm — the documentation is ready, the vision is clear, and the next step is building it. 🚀
[0.2.3] – 2025-10-25¶
🪶 UI Comfort Release¶
Highlights
- Added generic, reusable filter bar for all CRUD list views
- Added dynamic toggle buttons for boolean fields
- Improved badge rendering for PII & PK indicators
- Added sticky table headers for long datasets
Why it matters
This release focuses purely on usability and governance visibility.
It lays the groundwork for 0.3.0 (TargetDataset automation and lineage features).
[0.2.2] – 2025-10-25¶
🧹 Maintenance Release — dbt Dependency Cleanup¶
Summary¶
This minor maintenance release removes all remaining dbt-related artefacts and clarifies elevata’s independent direction ahead of the 0.3.x milestone.
🔧 Changes¶
- Removed unused dbt_project/ folder from repository.
- Deleted all DBT_* variables from .env and example configuration files.
- Removed dbt references from NOTICE.md and documentation.
- Updated README.md and dbt_decoupling.md to reflect full runtime independence.
- Adjusted Roadmap and strategy wording in CHANGELOG.md (dbt now optional adapter, not dependency).
- Minor documentation clean-ups and license consistency fixes (MIT → AGPL v3 in trademark notice).
💡 Notes¶
This release does not introduce new features but marks an important architectural boundary:
elevata ≥ 0.2.2 operates entirely without dbt or its configuration files.
The foundation for native rendering and execution begins with v0.3.x.
[0.2.1] – 2025-10-23¶
🪶 Improved¶
- Added truncation for long text fields (Description, Remark) in list views to improve readability.
- Full text now appears on hover for better UX.
- Refined visual highlighting for primary and integrate columns.
- Minor CSS polish and layout consistency fixes across metadata tables.
[0.2.0] - 2025-10-22¶
🧩 Metadata Introspection & Profiles Integration¶
Overview¶
This release marks a major milestone – elevata now connects to relational sources via SQLAlchemy and imports full schema metadata directly into its core models. The new profile and secret management architecture lays the foundation for secure, declarative, and environment-aware metadata operations.
🚀 Highlights¶
- Generic Metadata Import via SQLAlchemy
- Engine factory supporting multiple relational backends (MSSQL, Postgres, SQLite).
- Reads column definitions, data types, PK information from SourceDataset entries.
- Automatic datatype normalization across dialects (e.g. NVARCHAR → STRING, BIT → BOOLEAN).
- Flexible Secrets & Profiles
- Unified elevata_profiles.yaml config with environment-based secret resolution.
- Connection references derived by convention from type and short_name.
- Optional Azure Key Vault integration.
- Security & Configuration
- Sensitive data never stored in the database.
- Secrets resolved dynamically at runtime via .env or Key Vault.
- Clear separation of metadata and operational configuration.
- Developer Experience
- Simplified connector interfaces and improved error reporting.
- Cleaner model relationships for SourceSystem and SourceDataset.
- New code organization: connectors.py, resolver.py, ref_builder.py.
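A minimal sketch of the datatype normalization mentioned in the highlights. Only the NVARCHAR → STRING and BIT → BOOLEAN mappings come from this changelog; the remaining table entries, the function name normalize_type, and the fallback of returning the uppercased base type are assumptions for illustration.

```python
import re

# Source-confirmed entries: NVARCHAR -> STRING, BIT -> BOOLEAN.
# All other entries are illustrative assumptions.
TYPE_MAP = {
    "NVARCHAR": "STRING",
    "BIT": "BOOLEAN",
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "BOOLEAN": "BOOLEAN",
}


def normalize_type(raw_type: str) -> str:
    """Strip length/precision arguments and map the base type to a logical type."""
    base = re.sub(r"\s*\(.*\)\s*$", "", raw_type.strip()).upper()
    # Unknown types pass through unchanged (uppercased) in this sketch.
    return TYPE_MAP.get(base, base)


print(normalize_type("nvarchar(50)"))
print(normalize_type("BIT"))
```

Stripping the parenthesised arguments first means NVARCHAR(50) and NVARCHAR(MAX) resolve to the same logical type.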
🧭 Next: v0.3.0 will focus on automated target model generation and metadata lineage.¶
🧠 Technical Notes¶
- Fully decoupled from dbt profiles; all runtime connections and secrets resolved through elevata_profiles.yaml.
- All SQL renderers return expressions as plain text templates, ready for downstream ELT tools or custom runners.
- Surrogate Key hashing implemented engine-specifically (Postgres pgcrypto / MSSQL HASHBYTES).
- Supports per-profile Overrides for multi-DB systems (e.g. sap1, sap2 → sap).
- Improved ordering, idempotency and error reporting in import and generation routines.
[0.1.1] - 2025-10-19¶
🪶 UI Polish & PostgreSQL Power¶
Overview¶
A refinement release that makes elevata smoother and more flexible:
a polished Django UI meets full PostgreSQL support — available via Docker or your own setup.
Better visuals, faster workflows, and real database choice.
✨ Improvements¶
- UI & UX Enhancements
- Polished Django interface with cleaner layouts and spacing
- Improved responsiveness and overall visual consistency
- Optimized inline interactions and usability tweaks
- Database Support
- Full PostgreSQL backend support
- Works with Docker Compose or a user-provided instance
- Updated settings for seamless configuration and migrations
- Developer Experience
- Simplified environment setup (SQLite or PostgreSQL)
- Improved local testing through Docker Compose
[0.1.0] - 2025-10-14¶
🧩 Metadata Management Comes Alive¶
Overview¶
This release marks a major milestone:
elevata now provides a fully functional, metadata-driven web interface for managing your data platform’s core structures — built with Django, HTMX, and a clean Bootstrap 5 theme.
It’s the first end-to-end usable version:
from user login → to inline editing → to audit tracking — all running securely and responsively out of the box.
🚀 Highlights¶
- Complete Metadata Management Module
- Inline CRUD with audit fields and user tracking
- Automatic URL & view generation for all models
- Modern UI & UX
- Responsive elevata theme (Bootstrap 5.3)
- Autofocus & usability improvements for inline editing
- Unified form and grid styling
- Security & Reliability
- Integrated authentication (login, logout, password change)
- Safe CSRF handling for all HTMX requests
- Developer Experience
- Default SQLite backend for easy setup
- Clean folder structure: core/, metadata/, dbt_project/
- Ready for future extensions (PostgreSQL, dbt, etc.)
[0.0.1] - 2025-10-06¶
Added¶
- Project documentation scaffold (README.md)
- License file (LICENSE) under AGPLv3
- Notice file (NOTICE.md) for third-party licenses
- .gitignore for Python and dbt projects
- Placeholder requirements/base.txt
- Initial backend support for DuckDB (requirements/duckdb.txt)
- Base dbt_project/ folder