⚙️ Logical Plan¶

This document describes the Logical Plan layer of elevata, which includes:

Subqueries in the FROM clause
Window Functions (ROW_NUMBER, etc.)
Multi-source Stage logic (identity vs non-identity)
Updated Select/Union/Source nodes
Integration with the Expression DSL & AST
Dialect-safe rendering

🔧 1. Purpose of the Logical Plan¶

The Logical Plan is the dialect-agnostic intermediate representation between:

Metadata → Logical Plan → Dialect Renderer → SQL

It describes what needs to be executed, not how a specific SQL dialect expresses it.

The Logical Plan:
- is fully structured (tree-based)
- has no vendor SQL
- is deterministic
- is safe for testing
- can be rendered to any SQL dialect (BigQuery, Databricks, DuckDB, Fabric Warehouse, MSSQL, Postgres, SNowflake)

🔧 2. Query Tree and Logical Plan¶

The Query Tree operates at a higher abstraction level than the Logical Plan.

The Query Tree defines what operations occur and in which order
(e.g. SELECT → WINDOW → AGGREGATE).
The Logical Plan defines how these operations are represented as
structured SQL components (LogicalSelect, SubquerySource, LogicalUnion, …).

When custom query logic is enabled, the Query Tree is compiled into a Logical Plan
using the same builder infrastructure as default generation.

This design keeps the Logical Plan as a stable, vendor-neutral intermediate representation,
regardless of whether SQL is generated automatically or via an explicit Query Tree.

🔧 3. Core Node Types¶

🧩 3.1 LogicalSelect¶

Represents a SELECT statement.

Fields:
- select_list: list[SelectItem]
- from_: SourceNode (table source, subquery, union, etc.)
- where: Optional[Expr]
- group_by: list[Expr]
- order_by: list[OrderItem]

Example structure:

LogicalSelect(
  select_list=[SelectItem(ColumnRef("a")), SelectItem(ColumnRef("b"))],
  from_=TableSource("raw_sales", alias="r")
)

🧩 3.2 SelectItem¶

Represents a single column in the SELECT list.

Fields:
- expr: Expr
- alias: Optional[str]

🧩 3.3 Source Nodes¶

🔎 TableSource¶

TableSource(table_name, alias)

🔎 SubquerySource¶

Represents:

(SELECT ...) AS alias

Used for:
- multi-source Stage ranking
- derived tables
- complex transformations

🧩 3.4 LogicalUnion¶

Represents a UNION or UNION ALL.

Fields: - selects: list[LogicalSelect] - union_type: "ALL" | "DISTINCT"

Rendered as:

SELECT ...
UNION ALL
SELECT ...

often wrapped in a SubquerySource.

🔧 4. Window Functions¶

Window functions appear via:

WindowFunctionExpr(
  function_name="ROW_NUMBER",
  partition_by=[...],
  order_by=[OrderItem(...)]
)

Example AST node inside a SelectItem:

SelectItem(
  expr=WindowFunctionExpr(
    function_name="ROW_NUMBER",
    partition_by=[ColumnRef("business_key")],
    order_by=[OrderItem(ColumnRef("updated_at"), "DESC")]
  ),
  alias="__src_rank_ord"
)

Used primarily in Stage non-identity mode.

🔧 5. Subqueries in Multi-Source Stage¶

Multi-source Stage requires optional ranking logic:

🧩 Identity Mode¶

Each upstream dataset contributes a source_identity_id literal.
No ranking is needed.
We use raw UNION ALL:

LogicalUnion([sel1, sel2, ...])

🧩 No Subquery Required.¶

🧩 Non-Identity Mode¶

UNION ALL is wrapped into a SubquerySource.
Outer SELECT applies ROW_NUMBER and filters.

Logical Plan pattern:

LogicalSelect(
  select_list=[...],
  from_=SubquerySource(
    select=LogicalSelect(
      select_list=[..., WindowFunctionExpr(...) AS __src_rank_ord],
      from_=LogicalUnion([...])
    ),
    alias="ranked"
  ),
  where=ComparisonExpr(ColumnRef("__src_rank_ord"), "=", Literal(1))
)

🔧 6. Integration With Expression DSL & AST¶

All expressions inside the Logical Plan use the DSL/AST layer:
- column references
- literals
- CONCAT/CONCAT_WS
- COALESCE
- HASH256
- window functions

This ensures cross-dialect consistency.

🔧 7. Dialect Rendering Responsibilities¶

Each dialect must render:
- SELECT lists
- window functions
- subqueries
- unions
- column references with quoting rules
- expressions via the DSL AST

Example dialect responsibilities:

🧩 BigQuery¶

window functions: identical to ANSI
hashing: TO_HEX(SHA256(<expr>))
table addressing: dataset-qualified identifiers

🧩 Databricks¶

window functions: identical to ANSI / Spark SQL
hashing: SHA2(<expr>, 256)
execution: runs on Databricks SQL Warehouse (Unity Catalog for catalog/schema)

🧩 DuckDB¶

window functions: identical to ANSI
hashing: SHA256()

🧩 Fabric Warehouse¶

hashing via HASHBYTES('SHA2_256', <expr>)
convert binary → hex via CONVERT(VARCHAR(64), ..., 2)
DDL idempotency: IF OBJECT_ID(...) IS NULL pattern (no CREATE TABLE IF NOT EXISTS)

🧩 MSSQL¶

hashing via HASHBYTES('SHA2_256', ...)
convert binary → hex via CONVERT(VARCHAR(64), ..., 2)

🧩 Postgres¶

subqueries: (SELECT ...) AS alias
hashing: ENCODE(DIGEST(...), 'sha256'), 'hex')

🧩 Snowflake¶

window functions: identical to ANSI
hashing: LOWER(TO_HEX(SHA2(<expr>, 256)))
identifiers: quoted via "..." and database/schema-qualified rendering

🔧 8. Logical Plan Rendering Rules (Dialect-Agnostic)¶

🧩 8.1 SELECT¶

SELECT <select_list>
FROM <source>
[WHERE <expr>]
[GROUP BY <exprs>]
[ORDER BY <order_items>]

🧩 8.2 UNION¶

SELECT ...
UNION ALL
SELECT ...

🧩 8.3 Subquery¶

(
  SELECT ...
) AS alias

🧩 8.4 Window Function¶

ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

Dialect modifies only:
- quoting
- hashing
- type conversions

🔧 9. How the Builder Constructs Logical Plans¶

Key builder patterns:

🧩 9.1 Single-Source Stage¶

LogicalSelect(from_=TableSource(...))

🧩 9.2 Multi-Source Stage Identity¶

LogicalUnion([...])

🧩 9.3 Multi-Source Stage Non-Identity¶

SubquerySource(
  select=LogicalSelect(
    select_list=[..., WindowFunctionExpr(...)],
    from_=LogicalUnion([...])
  )
)

Outer filter applied via another LogicalSelect.

🔧 10. Benefits of the Logical Plan¶

Supports subqueries as first-class citizens
Vendor-neutral window functions
Clean separation between logic and SQL syntax
Extensible architecture (CASE WHEN, JOIN, GROUPING SETS)
Deterministic behavior for multi-source processing
No vendor SQL stored in metadata
Strict testability

🔧 11. Bizcore and the Logical Plan¶

Bizcore datasets do not introduce new Logical Plan node types.

Instead:

Bizcore logic is expressed through existing constructs:
joins
expressions
derived SelectItems
Business rules compile into standard AST expressions
No semantic shortcuts or abstractions exist at plan level

This design guarantees:

identical behavior between technical and business datasets
full reuse of planning, validation, and rendering logic
uniform explainability across layers

From the Logical Plan’s perspective, Bizcore is just another deterministic dataset — with richer intent, not different mechanics.