⚙️ Logical Plan¶
This document describes the Logical Plan layer of elevata, which includes:
- Subqueries in the FROM clause
- Window Functions (ROW_NUMBER, etc.)
- Multi-source Stage logic (identity vs non-identity)
- Updated Select/Union/Source nodes
- Integration with the Expression DSL & AST
- Dialect-safe rendering
🔧 1. Purpose of the Logical Plan¶
The Logical Plan is the dialect-agnostic intermediate representation between:
Metadata → Logical Plan → Dialect Renderer → SQL
It describes what needs to be executed, not how a specific SQL dialect expresses it.
The Logical Plan:
- is fully structured (tree-based)
- has no vendor SQL
- is deterministic
- is safe for testing
- can be rendered to any SQL dialect (BigQuery, Databricks, DuckDB, Fabric Warehouse, MSSQL, Postgres, SNowflake)
🔧 2. Query Tree and Logical Plan¶
The Query Tree operates at a higher abstraction level than the Logical Plan.
- The Query Tree defines what operations occur and in which order
(e.g. SELECT → WINDOW → AGGREGATE). - The Logical Plan defines how these operations are represented as
structured SQL components (LogicalSelect, SubquerySource, LogicalUnion, …).
When custom query logic is enabled, the Query Tree is compiled into a Logical Plan
using the same builder infrastructure as default generation.
This design keeps the Logical Plan as a stable, vendor-neutral intermediate representation,
regardless of whether SQL is generated automatically or via an explicit Query Tree.
🔧 3. Core Node Types¶
🧩 3.1 LogicalSelect¶
Represents a SELECT statement.
Fields:
- select_list: list[SelectItem]
- from_: SourceNode (table source, subquery, union, etc.)
- where: Optional[Expr]
- group_by: list[Expr]
- order_by: list[OrderItem]
Example structure:
LogicalSelect(
select_list=[SelectItem(ColumnRef("a")), SelectItem(ColumnRef("b"))],
from_=TableSource("raw_sales", alias="r")
)
🧩 3.2 SelectItem¶
Represents a single column in the SELECT list.
Fields:
- expr: Expr
- alias: Optional[str]
🧩 3.3 Source Nodes¶
🔎 TableSource¶
TableSource(table_name, alias)
🔎 SubquerySource¶
Represents:
(SELECT ...) AS alias
Used for:
- multi-source Stage ranking
- derived tables
- complex transformations
🧩 3.4 LogicalUnion¶
Represents a UNION or UNION ALL.
Fields:
- selects: list[LogicalSelect]
- union_type: "ALL" | "DISTINCT"
Rendered as:
SELECT ...
UNION ALL
SELECT ...
often wrapped in a SubquerySource.
🔧 4. Window Functions¶
Window functions appear via:
WindowFunctionExpr(
function_name="ROW_NUMBER",
partition_by=[...],
order_by=[OrderItem(...)]
)
Example AST node inside a SelectItem:
SelectItem(
expr=WindowFunctionExpr(
function_name="ROW_NUMBER",
partition_by=[ColumnRef("business_key")],
order_by=[OrderItem(ColumnRef("updated_at"), "DESC")]
),
alias="__src_rank_ord"
)
Used primarily in Stage non-identity mode.
🔧 5. Subqueries in Multi-Source Stage¶
Multi-source Stage requires optional ranking logic:
🧩 Identity Mode¶
- Each upstream dataset contributes a
source_identity_idliteral. - No ranking is needed.
- We use raw UNION ALL:
LogicalUnion([sel1, sel2, ...])
🧩 No Subquery Required.¶
🧩 Non-Identity Mode¶
- UNION ALL is wrapped into a SubquerySource.
- Outer SELECT applies ROW_NUMBER and filters.
Logical Plan pattern:
LogicalSelect(
select_list=[...],
from_=SubquerySource(
select=LogicalSelect(
select_list=[..., WindowFunctionExpr(...) AS __src_rank_ord],
from_=LogicalUnion([...])
),
alias="ranked"
),
where=ComparisonExpr(ColumnRef("__src_rank_ord"), "=", Literal(1))
)
🔧 6. Integration With Expression DSL & AST¶
All expressions inside the Logical Plan use the DSL/AST layer:
- column references
- literals
- CONCAT/CONCAT_WS
- COALESCE
- HASH256
- window functions
This ensures cross-dialect consistency.
🔧 7. Dialect Rendering Responsibilities¶
Each dialect must render:
- SELECT lists
- window functions
- subqueries
- unions
- column references with quoting rules
- expressions via the DSL AST
Example dialect responsibilities:
🧩 BigQuery¶
- window functions: identical to ANSI
- hashing:
TO_HEX(SHA256(<expr>)) - table addressing: dataset-qualified identifiers
🧩 Databricks¶
- window functions: identical to ANSI / Spark SQL
- hashing:
SHA2(<expr>, 256) - execution: runs on Databricks SQL Warehouse (Unity Catalog for catalog/schema)
🧩 DuckDB¶
- window functions: identical to ANSI
- hashing: SHA256()
🧩 Fabric Warehouse¶
- hashing via
HASHBYTES('SHA2_256', <expr>) - convert binary → hex via
CONVERT(VARCHAR(64), ..., 2) - DDL idempotency:
IF OBJECT_ID(...) IS NULLpattern (noCREATE TABLE IF NOT EXISTS)
🧩 MSSQL¶
- hashing via
HASHBYTES('SHA2_256', ...) - convert binary → hex via
CONVERT(VARCHAR(64), ..., 2)
🧩 Postgres¶
- subqueries:
(SELECT ...) AS alias - hashing:
ENCODE(DIGEST(...), 'sha256'), 'hex')
🧩 Snowflake¶
- window functions: identical to ANSI
- hashing:
LOWER(TO_HEX(SHA2(<expr>, 256))) - identifiers: quoted via
"..."and database/schema-qualified rendering
🔧 8. Logical Plan Rendering Rules (Dialect-Agnostic)¶
🧩 8.1 SELECT¶
SELECT <select_list>
FROM <source>
[WHERE <expr>]
[GROUP BY <exprs>]
[ORDER BY <order_items>]
🧩 8.2 UNION¶
SELECT ...
UNION ALL
SELECT ...
🧩 8.3 Subquery¶
(
SELECT ...
) AS alias
🧩 8.4 Window Function¶
ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)
Dialect modifies only:
- quoting
- hashing
- type conversions
🔧 9. How the Builder Constructs Logical Plans¶
Key builder patterns:
🧩 9.1 Single-Source Stage¶
LogicalSelect(from_=TableSource(...))
🧩 9.2 Multi-Source Stage Identity¶
LogicalUnion([...])
🧩 9.3 Multi-Source Stage Non-Identity¶
SubquerySource(
select=LogicalSelect(
select_list=[..., WindowFunctionExpr(...)],
from_=LogicalUnion([...])
)
)
Outer filter applied via another LogicalSelect.
🔧 10. Benefits of the Logical Plan¶
- Supports subqueries as first-class citizens
- Vendor-neutral window functions
- Clean separation between logic and SQL syntax
- Extensible architecture (CASE WHEN, JOIN, GROUPING SETS)
- Deterministic behavior for multi-source processing
- No vendor SQL stored in metadata
- Strict testability
🔧 11. Bizcore and the Logical Plan¶
Bizcore datasets do not introduce new Logical Plan node types.
Instead:
- Bizcore logic is expressed through existing constructs:
- joins
- expressions
- derived SelectItems
- Business rules compile into standard AST expressions
- No semantic shortcuts or abstractions exist at plan level
This design guarantees:
- identical behavior between technical and business datasets
- full reuse of planning, validation, and rendering logic
- uniform explainability across layers
From the Logical Plan’s perspective, Bizcore is just another deterministic dataset — with richer intent, not different mechanics.
© 2025-2026 elevata Labs — Internal Technical Documentation