Skip to content

Changelog

Changelog

All notable changes to this project will be documented in this file.
This project adheres to Semantic Versioning and Keep a Changelog.


📈 For the full roadmap, see Project Readme

🧾 Licensed under the AGPL-v3 — open, governed, and community-driven.
💡 elevata keeps evolving — one small, meaningful release at a time.


[Unreleased]

TBD


[1.4.1] - 2026-03-01

This patch release introduces and stabilizes elevata’s dialect-specific SQL keyword registry.

The focus of this release is deterministic, dialect-owned handling of reserved identifiers across all supported warehouses.


✨ Added

Deterministic SQL Keyword Registry

  • Introduced dialect-specific reserved keyword modules
    (rendering/dialects/keywords/*.py)
  • Added elevata_generate_reserved_keywords management command
  • Engine-truth extraction for:
  • Databricks (sql_keywords())
  • DuckDB (duckdb_keywords())
  • Documentation-based extraction with strict validation for:
  • PostgreSQL
  • Snowflake
  • MSSQL
  • Fabric Warehouse
  • BigQuery
  • Sanity checks guarding against incomplete or malformed keyword sets
  • Optional core SQL overlay for defensive fallback safety

🔄 Improved

  • Fully restored dialect-owned SQL rendering for keyword extraction
  • Ensured no vendor-specific SQL remains in management commands
  • Stabilized documentation parsing across vendor doc layouts
  • Harmonized quoting behavior across all dialects

🔒 Architectural Integrity

  • Keyword extraction is deterministic and reproducible
  • All SQL used in extraction is owned by the dialect layer
  • Identifier quoting behavior is now fully dialect-driven

[1.4.0] - 2026-02-28

This release significantly expands elevata’s ingestion capabilities (REST + Files)
while further strengthening deterministic, dialect-owned execution semantics
across all supported warehouses.


✨ Added

Ingestion Framework (RAW)

  • Native RAW ingestion for REST APIs and file-based sources via SourceDataset.ingestion_config
  • Support for CSV, JSON, JSONL, Excel and Parquet sources
  • Environment variable expansion for file URIs (e.g. ${ELEVATA_INGEST_ROOT}/...)
  • CSV parsing options (delimiter, quotechar, encoding)
  • Excel ingestion options (sheet_name / sheet_index, header_row, max_rows) with strict validation
  • Chunked file processing for large datasets with accurate total row tracking
  • Standardized RAW landing with preserved JSON payload (system role)
  • Deterministic JSON path–driven column mapping for REST and JSON ingestion

Ingestion Routing

  • Unified ingestion dispatcher as single entry point (ingest_raw_for_source_dataset(...))

✨ Improved

Dialect-Owned Merge Rendering

  • Refactored merge rendering to delegate SQL shape entirely to dialects
  • load_sql now provides semantic ingredients only (source select, keys, columns)
  • Cross-platform MERGE fallbacks where native MERGE is not available
  • Improved dialect diagnostics and merge validation behavior

Fully Executable Historization (SCD Type 2)

  • History datasets (*_hist) now generate execution-ready SQL
  • Historization pipeline fully dialect-owned (incremental history primitives)
  • Strict validation for required SCD technical columns and INSERT alignment

🛠️ Fixed

  • SQLAlchemy 2.0 execution compatibility in ingestion landing paths
  • Correct cumulative row reporting for chunked RAW ingestion
  • Cross-dialect merge, historization and delete-detection edge cases
  • BigQuery hashing stability and literal rendering correctness
  • Windows file URI normalization for file-based ingestion

[1.3.1] - 2026-02-18

This release stabilizes the Airflow example environment and resolves several issues
that could prevent successful execution when testing elevata with external target
systems.

The focus of this release is usability and reliability of the Airflow example setup,
ensuring that users can immediately validate elevata orchestration against their own
target database without requiring additional infrastructure.


✨ Improved

Airflow Example Stability

  • Airflow example now installs backend-specific dependencies at container startup
    instead of build time
  • Backend selection is now driven entirely by environment configuration via ELEVATA_SQL_DIALECT
  • Removed hardcoded backend assumptions from Dockerfile and docker-compose
  • Airflow containers now run elevata inside an isolated virtual environment,
    preventing dependency conflicts with Airflow itself
  • Improved compatibility with Databricks, PostgreSQL, MSSQL, Snowflake,
    BigQuery and other supported targets

Configuration & Environment Handling

  • Environment configuration now consistently sourced from .env
  • Removed docker-compose defaults overriding user configuration
  • Simplified switching between target systems without rebuilding images
  • Improved portability for local testing and Open Source usage

Dependency Handling

  • Backend-specific requirements are installed only when required
  • Prevented SQLAlchemy version conflicts with Airflow dependencies
  • Optional dialect imports (e.g. DuckDB) no longer fail when backend is not active

🛠️ Fixed

  • Airflow example failing to start due to dependency conflicts
  • SQLAlchemy downgrade caused by backend requirement installation
  • Incorrect backend loading when multiple dialects were present
  • Environment variable precedence issues between .env,
    docker-compose defaults and container runtime
  • Multiple startup issues related to entrypoint execution
  • Improved robustness of dialect imports across environments

🔧 Internal

  • Refactored Airflow entrypoint initialization logic
  • Added backend installation stamp mechanism to avoid repeated installs
  • Improved separation between Airflow runtime dependencies
    and elevata execution dependencies

[1.3.0] - 2026-02-17

This release introduces execution safety improvements, deterministic schema evolution,
and orchestration-level predictability enhancements.

The focus of this release is long-term stability and reproducible execution behavior
across platforms, rather than expanding modeling capabilities.

Execution planning, schema evolution, and load execution are now aligned around a
deterministic preflight model that guarantees predictable outcomes before execution begins.


✨ Added

Execution Safety & Predictability

  • Introduced preflight validation phase before execution
  • Deterministic failure behavior for unsafe schema evolution scenarios
  • Execution now blocks before SQL execution when unsafe changes are detected
  • Consistent error vs warning classification during materialization planning
  • Improved transparency of execution behavior and failure causes

Schema Evolution & Type Drift Handling

  • Canonical datatype comparison across all supported dialects
  • Type drift detection during materialization planning
  • Drift classification into:
  • equivalent
  • widening (safe)
  • narrowing / incompatible (unsafe)
  • Automatic schema evolution for safe widening changes
  • Dialect-aware ALTER COLUMN generation where supported
  • Deterministic rebuild fallback for dialects without ALTER support
  • Consistent schema evolution behavior across Snowflake, Databricks, PostgreSQL,
    MSSQL and DuckDB

Materialization & Execution Architecture

  • Materialization split into planning and application phases
  • MaterializationPlan now explicitly represents:
  • required DDL steps
  • warnings
  • blocking errors
  • Structural synchronization of rawcore _hist datasets aligned with base datasets
  • Improved consistency between schema introspection, metadata, and execution behavior

Orchestration & Execution Planning

  • Metadata-driven execution manifest generation
  • Deterministic dependency graph for dataset execution order
  • Airflow example DAG for lineage-based execution orchestration
  • Execution planning fingerprint for reproducibility validation

Query Builder UX Improvements

  • Query Builder now displays inline validation and dependency conflict messages
    directly in the editing context instead of failing silently
  • Blocking mutations caused by downstream dataset dependencies are surfaced
    as contextual warnings in the grid UI
  • Inline editors remain open when a mutation is rejected, allowing users to
    immediately correct input
  • Conflict warnings are automatically cleared when cancelling or completing edits
  • Improved transparency when renaming or modifying query-derived columns
    referenced downstream

🛠️ Fixed

  • Multiple edge cases where schema drift could result in inconsistent execution behavior
  • Incorrect handling of schema evolution across dialect-specific type representations
  • Improved stability of materialization planning for renamed datasets and columns
  • Various internal consistency improvements in execution planning and validation

🔒 Governance & Determinism

  • Execution behavior is now fully determined before execution begins
  • Unsafe schema changes are blocked deterministically
  • Reduced risk of partial schema application or inconsistent load states
  • Improved alignment between metadata contracts and physical schema evolution

[1.2.0] - 2026-02-08

This release introduces a major upgrade to Query Builder contract handling,
query-derived schema synchronization, and datatype inference.

It significantly improves determinism, usability, and correctness of
metadata-driven query modeling while simplifying internal typing logic.

✨ Added

Multi-platform execution support

  • Added native execution dialects for:
  • Snowflake
  • Databricks SQL (Unity Catalog)
  • Microsoft Fabric Warehouse
  • Unified execution semantics across platforms using the same metadata model
  • No platform-specific modeling required
  • Architecture defined once, executed consistently across engines

This release marks the first version where elevata execution semantics are aligned across
multiple modern cloud data warehouse platforms.

Query Builder & Contract Handling

  • Query-derived TargetColumns are now fully synchronized with the Query Tree output contract
  • Automatic creation, rename, update and deletion of query-derived columns
  • Contract-based schema alignment between:
  • SQL preview
  • logical plan
  • materialized dataset schema
  • Aggregate nodes now redefine output contracts explicitly as:
  • group keys + measures only
  • Window and Aggregate operators expose correct upstream input columns
    without requiring intermediate target column assignment

Datatype inference

  • Deterministic datatype inference for:
  • window functions (ROW_NUMBER, RANK, DENSE_RANK)
  • aggregate measures (COUNT, SUM, MIN, MAX, AVG)
  • COUNT and ranking functions now default to INTEGER instead of BIGINT
  • Datatypes inferred from upstream input columns where possible
  • Canonical datatype resolution aligned with DATATYPE_CHOICES

Query Builder UX improvements

  • Aggregate editor now shows input dataset columns directly
  • Window and Aggregate nodes operate on input contracts instead of materialized target schema
  • Eliminates unnecessary intermediate column creation

Databricks execution improvements

  • Improved raw ingestion performance for Databricks SQL Warehouse by batching multi-row INSERT execution
  • Eliminates per-row execution overhead caused by connector-level executemany behavior
  • No changes required to metadata models or query definitions

🛠️ Fixed

  • Window columns incorrectly defaulting to STRING datatype
  • Aggregate measures not inheriting input column datatype
  • Contract sync overriding inferred datatypes
  • Missing input columns in Aggregate editor selection
  • Incorrect system-managed flag for query-derived columns
  • Multiple contract sync race conditions caused by partial state inference

🔒 Governance & Determinism

  • Query contract inference now consistently evaluates against query_head
  • Dataset schema always reflects effective query output contract
  • Deterministic datatype normalization across query-derived columns
  • Reduced risk of schema drift between metadata and generated SQL

[1.1.0] - 2026-01-31

This release introduces a major upgrade to the Query Builder and UNION workflow,
together with critical stability and governance improvements across metadata,
schema sync, and historization.

✨ Added

Query Builder & UNION workflow

  • Full UNION operator support with:
  • Branch management
  • Output schema contracts
  • Column mappings per branch
  • Guided UNION toolbar with:
  • Output / Branch navigation
  • Schema copy from branch
  • Auto-map by name
  • Validation and “Set as head”
  • Contract snapshot with input → output diff
  • Determinism and governance indicators (ordering, window/aggregate checks)
  • Contextual node summaries (SELECT, AGGREGATE, WINDOW, UNION)

Metadata-driven governance

  • Clear separation of query_root (anchor) and query_head (current output)
  • Head-based validation, SQL preview and contract inference
  • Guardrails for destructive operations with downstream dependents

Schema sync & historization

  • Robust metadata-driven rename propagation via former_names
  • Orphan preservation in _hist (inactive, detached, not dropped)
  • Safe FK reference deletion that never removes hist columns
  • Technical tail columns always appended last
  • Deterministic ordinal normalization for system-managed schemas

🛠️ Fixed

  • UNIQUE constraint violations in hist regeneration
  • Broken rename propagation across rawcore → hist
  • False-positive orphan deletions
  • Inconsistent candidate column inference in query builder
  • UNION validation / navigation not triggering correctly
  • Output schema copy errors due to field mismatches
  • Multiple signal / sync race conditions

🔒 Governance & Safety

  • UNION validation enforces compatible schemas
  • Set-as-head explicitly marks final dataset output
  • Defensive rebuild of hist metadata with key lineage preservation
  • Deterministic ordering guarantees for window and aggregate functions

[1.0.0] - 2026-01-25 — First stable release

This release marks the first stable, backwards-compatible version of elevata.

elevata has evolved from a SQL generation layer into a metadata-driven, deterministic data platform engine.
From this version onwards, interfaces, metadata structures and execution semantics are considered stable.

✨ Highlights

🪄 Query Builder (Major Feature)

  • Explicit, metadata-driven query planning with SELECT, AGGREGATE, WINDOW and UNION nodes
  • Deterministic query execution via a formal Query Tree
  • Clear separation between query structure and generated SQL
  • Guided UI for building and evolving queries without writing SQL
  • SQL preview reflects the exact executed query

🧠 Query Contracts

  • Explicit, inspectable output schema contracts
  • Field-level validation and error reporting
  • Early detection of incompatible unions, missing mappings and invalid transformations

🧬 Lineage & Explainability

  • End-to-end lineage across datasets, fields and query nodes
  • Full traceability from source to serving layer
  • Query tree visualization for complex transformations

🔒 Determinism & Governance

  • Deterministic execution guarantees for aggregates and window functions
  • ORDER BY / PARTITION BY governance with clear warnings and errors
  • Clear distinction between execution errors and policy violations

🧩 Platform Maturity

  • Stable execution semantics
  • Backwards compatibility guarantees from this release onwards
  • Foundation for orchestration and governance integrations in future releases

[0.9.1] – 2026-01-14

✨ Improved

  • Refined lineage visualization with configurable multi-hop upstream and downstream views
  • Clearer lineage semantics distinguishing direct inputs from extended execution dependencies
  • Improved lineage UX with consistent ordering and scope labeling

🧠 Serving Layer

  • Serving datasets now support presentation-oriented identifiers (friendly names)
  • Dataset and column naming in Serving layer allows casing, spaces, and special characters
  • Identifier handling is dialect-aware and uses proper quoting where required

🛡️ Validation & Health

  • Extended metadata validation to distinguish blocking vs advisory findings
  • Improved health checks for Serving and Bizcore datasets
  • Validation logic consolidated and aligned across layers

🧩 Internal

  • Improved consistency between lineage analysis, validators, and UI
  • Minor internal cleanups in metadata services and views

[0.9.0] – 2026-01-12

🧠 Bizcore: Business Semantics as First-Class Metadata

This release introduces Bizcore, a dedicated layer for modeling
business meaning, rules, and calculations as explicit metadata —
executed deterministically alongside technical datasets.

Bizcore makes elevata business-capable by design, without introducing
BI-style semantic layers or query-time abstractions.

✨ Added

  • Bizcore datasets and columns as first-class metadata objects
  • Multi-upstream join support for Bizcore datasets
  • UI support for building and validating Bizcore structures
  • Deterministic SQL preview for Bizcore datasets
  • Lineage-driven qualification of expressions and joins
  • End-to-end traceability from Core → Bizcore → Serving

🔄 Changed

  • SQL generation now fully respects semantic lineage in expressions
  • Join aliasing and qualification are applied consistently across layers

🧪 Quality & Stability

  • Extensive validation of join correctness and expression rendering
  • Scoped and non-scoped UI flows aligned under a single metadata model
  • No breaking changes to existing Raw, Stage, or Core pipelines

✨ Improved

  • Manual expressions now automatically qualify unaliased column references
    with the correct input alias during SQL generation.
    This ensures consistent, unambiguous SQL for Bizcore calculations
    without requiring users to manually prefix column names.

This release marks a major milestone: elevata now supports
explicit business semantics as metadata, not as BI-layer logic.


[0.8.0] – 2026-01-04

⚙️ Execution & Orchestration as First-Class Architecture

This release introduces an explicit, metadata-driven execution model,
establishing orchestration, failure semantics, and observability as first-class concerns in elevata.

Execution is now planned, executed, and explained independently of SQL generation,
providing a robust foundation for platform-native orchestration and governance.

✨ Added

  • Explicit Execution Plan model separating planning from execution
  • Dependency-graph–based dataset execution with deterministic ordering
  • Multi-dataset batch execution with a shared batch_run_id
  • Structured execution policies (continue_on_error, max_retries)
  • Retry semantics with per-attempt tracking (attempt_no)
  • Distinct failure semantics:
  • blocked (dependency-based non-execution)
  • aborted (policy-based fail-fast non-execution)
  • Load Run Snapshot (meta.load_run_snapshot)
  • Batch-level, JSON-based execution state
  • Captures plan, policy, dependencies, and aggregated outcomes
  • Extended Load Run Log (meta.load_run_log)
  • Orchestration-only events (blocked / aborted)
  • Best-effort, non-blocking meta logging
  • CLI execution diagnostics:
  • Execution snapshot printing (--debug-execution)
  • Snapshot persistence (--write-execution-snapshot)
  • Deterministic BigQuery table qualification for execution and metadata writes
    (prevents sporadic cross-project NotFound errors during streaming inserts)
  • Global execution modes:
  • single-dataset execution with dependencies (default)
  • platform-wide execution in deterministic order (--all)
  • optional schema-scoped execution (--schema)

🔄 Changed

  • Execution semantics are no longer implicit in SQL or CLI flow
  • Load execution is now driven by an explicit execution model
  • Fail-fast behavior is deterministic and explicitly reported
  • Execution observability is metadata-first and dialect-agnostic

🧪 Quality & Stability

  • Extensive unit tests for execution ordering, retries, fail-fast, and blocking
  • Guardrails for orchestration-only events and best-effort persistence
  • Clear separation of execution core vs CLI and dialect adapters
  • No destructive changes to existing materialization or SQL generation logic

This release establishes elevata as a self-orchestrating, explainable
data platform core
, laying the groundwork for native scheduling,
governance rules, and external orchestration integrations.


[0.7.1] – 2025-12-29

🧱 Metadata-Driven Schema Evolution

This release completes and stabilizes the first materialization layer for
safe, deterministic schema evolution in target warehouses.

Schema changes are now derived explicitly from metadata and applied in a
controlled, lineage-aware manner — without implicit inference from SQL.

✨ Added

  • Metadata-driven materialization planning for target datasets
  • Automatic provisioning of missing target tables
  • Deterministic column synchronization (additive, non-destructive)
  • Explicit handling of dataset and column renames via former_names
  • Lineage-aware propagation of renames into history (_hist) datasets
  • Deterministic INSERT … (column list) generation across all dialects
  • DuckDB-native introspection via PRAGMA with execution-engine consistency

🔄 Changed

  • Materialization is now planned separately from SQL rendering
  • Table existence is determined by effective provisioning steps, not schema creation alone
  • Incremental loads reliably auto-provision target tables when required
  • History datasets are provisioned deterministically and stay structurally aligned with base tables

🧪 Quality & Stability

  • Extensive unit tests for materialization planning and rename scenarios
  • Guardrails for ambiguous rename situations (multiple former matches)
  • Improved separation of introspection vs execution concerns
  • Removed duplicate provisioning paths and race conditions

This release lays the foundation for controlled schema evolution,
future governance rules, and automated validation layers.


[0.7.0] – 2025-12-21

Added

  • Dataset-driven, lineage-aware execution with automatic dependency resolution
  • Unified RAW execution semantics via physical ingestion (Source → RAW)
  • Stable technical column model across all layers
  • BigQuery execution backend with native ingestion support
  • Dialect-aware hashing and surrogate key generation (BigQuery, DuckDB, Postgres, MSSQL)
  • SourceDataset-level static and incremental filters with runtime {{DELTA_CUTOFF}} resolution

Changed

  • Execution operates on datasets rather than layers
  • RAW datasets are treated as an optional ingestion layer
  • RAW ingestion always rebuilds tables, while source extraction may be incrementally scoped

Fixed

  • Load plan debug output and execution logging consistency
  • Signal handling for historization execution order
  • Correct application of incremental filters during ingestion and delete detection
  • Lineage-based translation of incremental scope filters across renamed columns
  • Cross-dialect consistency for incremental execution (DuckDB, MSSQL, Postgres, BigQuery)

[0.6.1] – 2025-12-15

Fixed

  • Correct introspection of SQL Server alias types (e.g. dbo.Name, dbo.Flag)
  • Proper handling of bit columns during ingestion (no fallback to string types)
  • Correct precision and scale mapping for money and smallmoney columns
  • Stable and deterministic column ordering during metadata import

Improved

  • Lossless ingestion of source datatypes via source_datatype_raw
  • Strict, fail-fast dialect-specific type rendering to prevent silent fallbacks

Notes

  • This release significantly improves correctness for SQL Server as a source system.
  • Re-importing source metadata is recommended to benefit from the improved typing behavior.

[0.6.0] – 2025-12-14

🚀 Warehouse-Native Execution & SCD Historization

This release introduces the foundation for a fully warehouse-native execution framework.
elevata now manages entire data load pipelines end-to-end — from metadata to SQL generation to execution, historization and observability.

✨ Major Features

1. Execution Engine (--execute)

elevata can now execute rendered SQL directly against target systems, measure performance, record affected rows, and log complete run metadata.
This shifts elevata beyond SQL rendering into a full pipeline engine.

2. Full SCD Type 2 Historization

A deterministic, metadata-driven historization framework:
- automatic change detection via row-hash
- version closing for changed and deleted keys
- insertion of new and changed versions
- lineage-aware attribute propagation

3. Metadata-Driven Incremental Merge Loads

Complete incremental pipeline including:
- new-row inserts
- changed-row updates
- delete detection
- MERGE or UPDATE+INSERT fallback depending on dialect

4. Auto-Provisioning of Warehouse Structures

elevata can automatically create:
- target schemas (raw, stage, rawcore, ...)
- the meta.load_run_log table
- all required objects for execution and logging

Controlled via .env flags.

5. Warehouse-Level Load Logging

A new table meta.load_run_log provides full observability into load executions:
- load mode, historization flags, dialect
- start/end timestamps, render/execution duration
- rows affected, error messages, status
- batch and run identifiers

6. Documentation Expansion

  • New historization architecture document
  • Extended execution, logging, and provisioning sections
  • Revised dialect and SQL generation chapters

🧪 Testing Improvements

  • Deterministic SQL tests for merge and historization pipelines
  • Combined historization pipeline tests
  • Prepared E2E execution flow for dialect-specific execution engines

This release establishes the execution foundation on which future orchestration, validation and automation layers will be built.


[0.5.3] — 2025-12-10

🔹 Historization Structure & Dialect Engine Enhancements

This release completes the metadata foundation required for full historized incremental loading in v0.6.0. It finalizes *_hist dataset structure, ensures cross-dialect consistency, and extends SQL rendering to use dialect-driven identifier rules.

✨ Highlights

Metadata / Historization

  • Automatic creation and maintenance of <dataset>_hist datasets in RAWCORE
  • Full rename propagation for datasets and columns
  • All *_hist fields are system-managed and read-only
  • New technical field in RAWCORE: row_hash for change detection (persisted expression)
  • Versioning strategy established:
  • version_started_at inclusive, version_ended_at exclusive
  • open-ended validity via max timestamp
  • version_state (current, changed, deleted)

SQL Generation / Dialects

  • Unified render_identifier() and render_table_identifier() for consistent quoting
  • All SQL generation now uses dialect identifier rendering
  • Delete detection routing tested and guarded per dialect capability

Load Runner

  • elevata_load supports --execute with safe stub execution via ExecutionEngine
  • Logging improvements and full dry-run support remain functional

Testing & Stability

  • Expanded test coverage for historization and dialect routing
  • Full suite green across merge, delete detection & *_hist scenarios

[0.5.2] — 2025-12-07

🛠️️ Metadata stability & History (HIST) foundation

This release significantly improves the robustness, determinism, and safety of history metadata generation in the RAWCORE schema.

✨ Highlights

Metadata / Historization

  • Deterministic generation of *_hist datasets based on lineage_key.
  • Robust schema sync between RAWCORE and *_hist (idempotent, safe deletes).
  • History SK expression based on rawcore SK + version_started_at.
  • History BK definition: rawcore SK + version_started_at.
  • History datasets and columns are fully system-managed (no UI unlock).

Signals & UI

  • Automatic *_hist sync on dataset rename and column changes in rawcore.
  • Inline rename refreshes both rawcore and corresponding *_hist rows.
  • Inline editing is disabled for *_hist datasets and columns.

SQL Preview

  • build_sql_preview_for_target returns a clear comment for history targets instead of misleading SQL.
  • Tests added to guard the _hist-preview behaviour.

[0.5.1] — 2025-12-04

🧹 Documentation & Consistency Release

This patch focuses on improving the clarity, coherence, and structure of elevata’s developer documentation.

✨ Highlights

  • Full harmonization of all architecture documents
  • Removal of outdated version references and legacy wording
  • Unified heading and layout style across all Markdown files
  • Consistent terminology for LogicalPlan, Expression DSL, Dialects, and Load SQL
  • Improved mkdocs navigation structure
  • Minor text corrections and consistency fixes across the docs

🚫 No functional changes

This release does not modify the SQL engine, metadata model, or any public API surface.
All test suites remain unchanged and green.


[0.5.0] — 2025-12-01

🛠️️ Multi-Dialect Engine, MSSQL Support & Deterministic FK Hashing

This release delivers the next major milestone of elevata’s SQL engine: full multi-dialect SQL generation, an extensible dialect factory, runtime dialect switching in the UI, and a complete rewrite of the surrogate-key and foreign-key hashing system using a vendor-neutral DSL AST.


🚀 Major Features

1. Multi-Dialect SQL Rendering (Postgres, DuckDB, MSSQL)

  • New pluggable dialect architecture (SqlDialect, dialect_factory).
  • Three fully operational dialects:
  • DuckDBDialect
  • PostgresDialect
  • MssqlDialect (new)
  • Centralised dialect registry & runtime resolution via:
  • profile
  • env (ELEVATA_SQL_DIALECT)
  • URL parameter in SQL preview

All SQL generation (preview + Load Runner) now passes through a unified, dialect-aware pipeline.


2. SQL Preview Dialect Selector (UI)

  • New dropdown in TargetDataset detail view.
  • Instant SQL refresh via HTMX request.
  • Clean display of dialect-specific SQL functions (quoting, hashing, concat, types).

3. Deterministic, Cross-Dialect Hashing via DSL AST

A full rewrite of surrogate-key and FK hashing:

  • New DSL expression system (Hash256Expression, ConcatWsExpression, Literal, ColumnRef).
  • Dialect-specific SQL rendering happens exclusively in dialect classes.
  • Identical logical lineage yields identical hash values across vendors.
  • Fully deterministic ordering + null replacement semantics.
  • Clean child-lineage FK hashing:
  • BK1, child BK1, BK2, child BK2…
  • ~ and | literal separators, ordered alphabetically

All existing hashing tests green after the rewrite.


4. Multi-Source Stage Identity Mode

  • Correct logical union builder for Stage datasets with multiple upstream sources.
  • Clean identity (no ranking) vs. non-identity (ranking) handling.
  • Injected source_identity_id literal per upstream branch.
  • All multi-source identity tests fully passing.

5. Dialect-Aware FK Rendering

  • Parent surrogate keys and child FK keys now rendered via DSL → dialect.
  • MSSQL: CONVERT(VARCHAR(64), HASHBYTES('SHA2_256', …), 2)
  • Postgres: ENCODE(DIGEST(CONCAT_WS(...), 'sha256'), 'hex')
  • DuckDB: SHA256(CONCAT_WS(...))

🔧 Internal Improvements

  • Entire builder.py cleaned, simplified, and refactored.
  • Unified render_select_for_target() and load-SQL paths.
  • Removed legacy manual hashing logic.
  • No raw SQL string assembly left in hashing pipeline.
  • Strict quoting rules per dialect.
  • Sauber extrahierte DSL operators (col(), lit(), concat_ws(), hash256()).

🧪 Testing

  • New tests:
  • test_dialect_postgres.py
  • test_hashing_dialects.py
  • test_fk_hashing.py
  • Full MSSQL hashing coverage
  • Updated test helpers for DSL AST inspection.
  • All Stage multi-source tests green after identity-mode rewrite.

📘 Documentation

  • Updated architecture docs:
  • Dialect System
  • SQL Rendering Conventions
  • Hashing Architecture
  • README modernised with new capabilities and architecture.

🧭 Roadmap Shift

With the 0.5.0 SQL backend complete, the next stage focuses on execution:

  • Load Runner CLI (Full, Merge, Dry-Run)
  • Caching & improved SQL formatting
  • Multi-source incremental merges
  • Additional dialects (Snowflake, BigQuery, Databricks)

Impact
Version 0.5.0 transforms elevata into a true multi-backend SQL generator
with deterministic hashing, dialect-specific rendering, and a stable architectural core
for future execution engines.


[0.4.0] — 2025-11-20

🧠 Dialect Architecture & Load SQL Modernization

This release marks a major leap for elevata:
a complete SQL dialect abstraction layer, a unified Load-SQL pipeline,
and extensive new documentation that sets the foundation for future multi-backend support.


🚀 Core Features

Fully Modular SQL Dialect System

A new, extensible dialect layer powers all SQL generation:

  • Central SqlDialect base class
  • Concrete DuckDBDialect reference implementation
  • Dialect resolution via ELEVATA_SQL_DIALECT, ELEVATA_DIALECT, and active profile
  • Dialect capabilities:
  • supports_merge
  • supports_delete_detection
  • Expression-level hooks:
  • concat_expression()
  • hash_expression()
  • cast_expression()
  • render_literal()

This architecture enables clean, vendor-neutral SQL generation for future backends
(Postgres, MSSQL, Snowflake, BigQuery, Databricks).


🔧 Load SQL Architecture 2.0

A fully redesigned, dialect-aware Load SQL engine:

Full Load

  • render_create_replace_table
  • render_insert_into_table
  • Uses dialect quoting, casting, literal handling

Incremental Merge Load

  • Native dialect-specific MERGE for DuckDB
  • Clean failure modes for dialects without merge support
  • Deterministic key handling
  • Automatic update/insert column mapping

Delete Detection

  • Dialect-specific implementation (DELETE … WHERE NOT EXISTS)
  • Guardrails when delete detection is requested but dialect does not support it

All Load SQL now flows through a single, coherent pipeline via load_sql.py.


🧪 Testing Enhancements

  • New test suite for:
  • literal rendering (NULL, booleans, strings, dates, datetimes)
  • cast expression rendering
  • concat & hash expression helpers
  • merge & delete detection dialect hooks
  • End-to-end tests for Full and Merge load generation
  • All tests green across the refactor

This ensures reliable future extensions to new SQL dialects.


📘 Documentation

Three major new documents added:

  • Dialect System — full architectural overview of dialect abstraction
  • Load SQL Architecture — how Full, Merge, and Delete Detection SQL are generated
  • Incremental Load Architecture — planner, merge semantics, delete detection

All are linked from: - index.md - README_docs.md - mkdocs.yml navigation


🔍 Internal Improvements

  • Harmonized get_active_dialect() with environment and profile resolution
  • Consolidated SQL preview and load paths to use the same dialect entrypoints
  • Removed legacy assumptions and duplicated logic
  • Fully revised DuckDB implementation as reference for new dialects

🗺️ Roadmap Impact

With 0.4.0 released, the following items shift to 0.5.x:

  • Target System Selector (Profiles → target backend)
  • Additional SQL dialects (MSSQL, Postgres, Snowflake)
  • Pseudo-Lineage Graph in UI
  • Multi-Source Incremental Loads
  • Load-Runner CLI

These features build directly on the new architecture introduced in 0.4.0.


Impact
Version 0.4.0 delivers the foundational SQL engine for elevata’s future:
clean, extensible, and ready for multiple SQL backends.
It stabilizes the path toward 0.5.x — where elevata becomes a multi-dialect metadata-driven ETL generator.


[0.3.0] — 2025-11-12

Lineage-Aware Target Generation & SQL Preview

🚀 Core Features

Lineage-Driven Target Generation - Added a stable lineage_key to both TargetDataset and TargetColumn: - Enables fully idempotent target generation. - Prevents duplicate targets after renaming (lineage_key is preserved). - TargetGenerationService.apply_all() refactored into modular steps: - Existing datasets are now matched and updated via lineage_key instead of physical names. - Clean dataset-level and column-level re-numbering during regeneration.

Three-Layer Data Lineage - Explicit dataset-level lineage: - TargetDatasetInput defines upstream relationships (source_dataset and/or upstream_target_dataset). - combination_mode (single or union) indicates how multiple inputs are combined.

  • Explicit column-level lineage:
  • TargetColumnInput mirrors the same relationships for individual columns.
  • upstream_columns now correctly map transformations between layers.

  • Layer-specific rules:

  • Raw = only source_datasets
  • Stage = prefers Raw as upstream (or Source directly if generate_raw_tables=False)
  • Rawcore = always built from Stage

Multi-Source Consolidation - New SourceDatasetGroup + SourceDatasetGroupMembership model: - Supports joining multiple SourceDatasets into a single Stage target. - The “primary system” flag defines which source drives column order. - TargetDatasetInput.role classifies inputs as: - primary, enrichment, reference_lookup, or audit_only.

Surrogate & Business Keys - Surrogate key columns are automatically renamed when their dataset is renamed
→ e.g. renaming rc_aw_productmodelrc_aw_product_model auto-renames the key column to rc_aw_product_model_key. - Surrogate key expressions now reference upstream column names (Raw or Stage), not renamed targets. - Deterministic column ordering: 1. Surrogate keys
2. Business keys
3. Integrated source columns
4. Artificial columns

Column Generation Enhancements - Automatic assignment of ordinal_position on save: - Newly created columns append at the end in numeric sequence. - Safe against manual reordering. - Integrated columns added after initial generation are correctly appended and re-numbered without violating unique constraints.


🧠 Logical Query Model & SQL Preview

Logical Plan Layer - New internal model (logical_plan.py) represents canonical SQL structure for a target dataset: - Supports LogicalSelect, LogicalUnion, LogicalExpression, and lineage mapping. - builder.py now constructs expressions (Surrogate Key, BK, and regular fields) from TargetColumnInput lineage. - Dialect-specific type mapping handled cleanly via map_logical_to_duckdb_type.

SQL Preview 2.0 - SQL preview now generates true lineage-based SELECT statements, e.g.: - Stage:
sql SELECT … FROM "raw"."raw_aw1_person" UNION ALL SELECT … FROM "raw"."raw_aw2_person" - Rawcore:
sql SELECT hash256(…) AS rc_aw_person_key, … FROM "stage"."stg_aw_person" - Automatic field alignment: - Columns missing in one upstream are rendered as NULL AS <column>. - Integrated columns retain their target aliases. - Supports both manual_expression and templated ({{ … }}) syntax. - New visually distinct green preview box in UI with proper formatting: - Keywords capitalized
- Indentation after SELECT
- Clean separation before FROM


🧩 UI, Governance & Behavior

  • Context-aware lineage display in detail views:
  • Source Datasets and Upstream Datasets shown based on layer.
  • Input relations now read like: raw_aw1_person · businessentityid -> stg_aw_person · businessentityid
  • System-managed field handling refined:
  • Layer-specific read-only fields controlled by settings.
  • lineage_key treated as an internal system field (hidden in forms and lists).
  • Surrogate key names locked for user editing but updated automatically when renaming datasets.

🧪 Testing & Quality

Structured Testing Foundation - Introduced the first complete automated test framework for the metadata generation platform. - Added dedicated runtests.py launcher for reliable execution across environments. - Integrated realistic DB-based lineage tests (Raw → Stage → Rawcore). - Added logic-only tests for hashing, naming, and validators. - Prepared SQL Preview test templates for the future rendering pipeline. - New documentation: 🧪 Testing & Quality

Impact
This milestone establishes a solid foundation for test coverage,
ensuring safe refactoring, reproducibility, and confidence in every release.


[0.2.6] — 2025-11-03

⚙️ Target Generation & Surrogate Key Implementation

Core Features - Introduced fully automated TargetDataset and TargetColumn generation service (TargetGenerationService). - Deterministic surrogate key creation using SHA-256 and runtime-loaded pepper. - Added business_key_column and surrogate_key_column flags to differentiate logical vs. physical keys. - Layer-aware naming now based on TargetSchema.physical_prefix (no hardcoded prefixes). - Integrated filtering: only integrate=True columns included across all layers.

UI Enhancements - Added “Generate Targets” button to SourceDataset list with progress spinner & success message. - Improved error feedback and runtime validation for pepper and target schema scope. - Consistent Bootstrap iconography (bi-lightning-charge) and visual feedback for active operations.

Technical Refinements - Surrogate key expressions persisted in metadata for transparency and traceability. - Environment-based pepper resolution via .env and get_runtime_pepper(). - Refactored naming logic (naming.py, rules.py, mappers.py) for consistent layer-specific conventions.

Impact
This release completes the Target Automation foundation for elevata —
paving the way for v0.3.0’s Meta-SQL and rendering engine. 🚀


[0.2.5] — 2025-10-27

🧩 Metadata Model Finalization & UI Polish

Core Enhancements
- Completed redesign of the core metadata model — fully aligned with the 0.3.x architecture.
- Added TargetSchema as a first-class model defining platform layers (raw, stage, rawcore, bizcore, serving).
- Introduced TargetDatasetInput and TargetColumnInput for multi-source mappings and lineage tracking.
- Added lifecycle flags (active, retired_at) for controlled dataset and column deprecation.
- Simplified incremental-load logic (increment_filter placeholder on SourceDataset).
- Unified naming conventions (*_schema_name, *_dataset_name) across all models.
- Extended governance primitives (sensitivity, access_intent) and surrogate-key configuration per layer.
- Removed obsolete fields (get_metadata, stage_dataset, etc.) and harmonized field semantics.

UI & Usability
- Introduced SourceDatasetGroup for managing groups of structurally identical source tables.
- Added governance badges and toggles for better lineage and visibility cues.
- Revised navigation order for more natural workflows.
- Improved help texts, icons, and consistent color themes across all metadata entities.

Impact
This release finalizes the metadata foundation for elevata — stable enough for automation development in 0.3.x.
No breaking structural changes expected before 0.3.0.


🪶 UI Comfort Continuation

  • Unified color scheme for governance badges (badge-pii-high, badge-pk, …).
  • Improved hover feedback and spacing in list views.
  • All badges defined declaratively via ELEVATA_CRUD — no model-specific logic required.
  • Updated elevata-theme.css for consistent badge geometry and hover states.

Why it matters
Version 0.2.5 concludes the “Model & Comfort” milestone:
the framework now combines a stable metadata core, polished UI, and ready groundwork for automated target generation in 0.3.x. 🚀


[0.2.4] — 2025-10-26

Strategic Documentation & Architecture Alignment

This release finalizes the strategic and architectural foundation for the upcoming metadata model freeze (v0.3.x).
It does not yet include model changes — instead, it defines the why and how for the next major milestone.

Highlights - New and refined README with philosophy, vision, and AGPLv3 licensing - Updated roadmap outlining the transition toward declarative architecture - Strategic dbt decoupling paper, defining the new “governed SQL through architecture” direction - Preparations for TargetSchema, TargetDatasetReference, and deterministic key generation to follow in v0.3.x

Why it matters
This release marks the calm before the model storm — the documentation is ready, the vision is clear, and the next step is building it. 🚀

[0.2.3] – 2025-10-25

🪶 UI Comfort Release

Highlights - Added generic, reusable filter bar for all CRUD list views
- Added dynamic toggle buttons for boolean fields
- Improved badge rendering for PII & PK indicators
- Added sticky table headers for long datasets

Why it matters
This release focuses purely on usability and governance visibility.
It lays the groundwork for 0.3.0 (TargetDataset automation and lineage features).


[0.2.2] – 2025-10-25

🧹 Maintenance Release — dbt Dependency Cleanup

Summary

This minor maintenance release removes all remaining dbt-related artefacts and clarifies elevata’s independent direction ahead of the 0.3.x milestone.

🔧 Changes

  • Removed unused dbt_project/ folder from repository.
  • Deleted all DBT_* variables from .env and example configuration files.
  • Removed dbt references from NOTICE.md and documentation.
  • Updated README.md and dbt_decoupling.md to reflect full runtime independence.
  • Adjusted Roadmap and strategy wording in CHANGELOG.md (dbt now optional adapter, not dependency).
  • Minor documentation clean-ups and license consistency fixes (MIT → AGPL v3 in trademark notice).

💡 Notes

This release does not introduce new features but marks an important architectural boundary: elevata ≥ 0.2.2 operates entirely without dbt or its configuration files.
The foundation for native rendering and execution begins with v0.3.x.


[0.2.1] – 2025-10-23

🪶 Improved

  • Added truncation for long text fields (Description, Remark) in list views to improve readability.
  • Full text now appears on hover for better UX.
  • Refined visual highlighting for primary and integrate columns.
  • Minor CSS polish and layout consistency fixes across metadata tables.

[0.2.0] - 2025-10-22

🧩 Metadata Introspection & Profiles Integration

Overview

This release marks a major milestone – elevata now connects to relational sources via SQLAlchemy and imports full schema metadata directly into its core models. The new profile and secret management architecture lays the foundation for secure, declarative, and environment-aware metadata operations.


🚀 Highlights

  • Generic Metadata Import via SQLAlchemy
  • Engine factory supporting multiple relational backends (MSSQL, Postgres, SQLite).
  • Reads column definitions, data types, PK information from SourceDataset entries.
  • Automatic datatype normalization across dialects (e.g. NVARCHAR → STRING, BIT → BOOLEAN).

  • Flexible Secrets & Profiles

  • Unified elevata_profiles.yaml config with environment-based secret resolution.
  • Connection references derived convention-based from type and short_name.
  • Optional Azure Key Vault integration.

  • Security & Configuration

  • Sensitive data never stored in the database.
  • Secrets resolved dynamically at runtime via .env or Key Vault.
  • Clear separation of metadata and operational configuration.

  • Developer Experience

  • Simplified connector interfaces and improved error reporting.
  • Cleaner model relationships for SourceSystem and SourceDataset.
  • New code organization: connectors.py, resolver.py, ref_builder.py.

🧭 Next: v0.3.0 will focus on automated target model generation and metadata lineage.


🧠 Technical Notes

  • Fully decoupled from dbt profiles; all runtime connections and secrets resolved through elevata_profiles.yaml.
  • All SQL renderers return expressions as plain text templates — ready for downstream ELT tools or custom runners.
  • Surrogate Key hashing implemented engine-specifically (Postgres pgcrypto / MSSQL HASHBYTES).
  • Supports per-profile Overrides for multi-DB systems (e.g. sap1, sap2sap).
  • Improved ordering, idempotency and error reporting in import and generation routines.

[0.1.1] - 2025-10-19

🪶 UI Polish & PostgreSQL Power

Overview

A refinement release that makes elevata smoother and more flexible:
a polished Django UI meets full PostgreSQL support — available via Docker or your own setup.
Better visuals, faster workflows, and real database choice.


✨ Improvements

  • UI & UX Enhancements
  • Polished Django interface with cleaner layouts and spacing
  • Improved responsiveness and overall visual consistency
  • Optimized inline interactions and usability tweaks

  • Database Support

  • Full PostgreSQL backend support
  • Works with Docker Compose or a user-provided instance
  • Updated settings for seamless configuration and migrations

  • Developer Experience

  • Simplified environment setup (SQLite or PostgreSQL)
  • Improved local testing through Docker Compose

[0.1.0] - 2025-10-14

🧩 Metadata Management Comes Alive

Overview

This release marks a major milestone:
elevata now provides a fully functional, metadata-driven web interface for managing your data platform’s core structures — built with Django, HTMX, and a clean Bootstrap 5 theme.

It’s the first end-to-end usable version:
from user login → to inline editing → to audit tracking — all running securely and responsively out of the box.


🚀 Highlights

  • Complete Metadata Management Module
  • Inline CRUD with audit fields and user tracking
  • Automatic URL & view generation for all models

  • Modern UI & UX

  • Responsive elevata theme (Bootstrap 5.3)
  • Autofocus & usability improvements for inline editing
  • Unified form and grid styling

  • Security & Reliability

  • Integrated authentication (login, logout, password change)
  • Safe CSRF handling for all HTMX requests

  • Developer Experience

  • Default SQLite backend for easy setup
  • Clean folder structure: core/, metadata/, dbt_project/
  • Ready for future extensions (PostgreSQL, dbt, etc.)

[0.0.1] - 2025-10-06

Added

  • Project documentation scaffold (README.md)
  • License file (LICENSE) under AGPLv3
  • Notice file (NOTICE.md) for third-party licenses
  • .gitignore for Python and dbt projects
  • Placeholder requirements/base.txt
  • Initial backend support for DuckDB (requirements/duckdb.txt)
  • Base dbt_project/ folder