⚙️ Supported Source Backends¶
This document provides an overview of supported database platforms, tested SQLAlchemy dialects,
and connection examples for elevata.
It describes how elevata connects to external systems for schema discovery, metadata introspection,
and optional federated access at the Stage layer.
Execution semantics are defined separately.
It also documents non-relational ingestion sources (Files and REST) and how to configure them via
SourceDataset.ingestion_config.
🔧 1. General Pattern (SQLAlchemy-backed systems)¶
All SQL database connections use the standard SQLAlchemy URI format:
dialect[+driver]://username:password@host:port/database[?params]
Example:
mssql+pyodbc://user:pwd@server,1433/db?driver=ODBC%20Driver%2018%20for%20SQL%20Server
In your .env, define these values using the elevata secret convention:
SEC_<ENVIRONMENT>_CONN_<TYPE>_<SHORTNAME>=<SQLAlchemy URI>
Example:
SEC_DEV_CONN_MSSQL_SAP=mssql+pyodbc://user:pwd@sql01,1433/SAPDB?driver=ODBC%20Driver%2018%20for%20SQL%20Server
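For illustration, such a secret can be consumed directly by SQLAlchemy. A minimal sketch (the variable name is the example above; the reflection call is standard SQLAlchemy, not an elevata API):

```python
import os
from sqlalchemy import create_engine, inspect

# Resolve the URI from the secret convention shown above.
uri = os.environ["SEC_DEV_CONN_MSSQL_SAP"]

engine = create_engine(uri)

# Schema discovery / metadata introspection via standard SQLAlchemy reflection.
inspector = inspect(engine)
print(inspector.get_table_names(schema="dbo"))
```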
🔧 2. Fully Supported Backends¶
| Type | Label | Driver / Package | Reflection Support | Example URI |
|---|---|---|---|---|
| `bigquery` | Google BigQuery | `pybigquery` | ✅ Columns only | `bigquery://project-id` |
| `databricks` | Databricks SQL (beta) | `sqlalchemy-databricks` | ⚠️ Columns only | `databricks://token:<TOKEN>@<host>?http_path=/sql/1.0/warehouses/<id>` |
| `duckdb` | DuckDB | `duckdb-engine` | ✅ Full | `duckdb:///C:/data/local.duckdb` |
| `mssql` | Microsoft SQL Server | `pyodbc` | ✅ Full | `mssql+pyodbc://user:pwd@host,1433/db?driver=ODBC%20Driver%2018%20for%20SQL%20Server` |
| `mysql` | MySQL / MariaDB | `mysqlclient` or `pymysql` | ✅ Full | `mysql+pymysql://user:pwd@localhost:3306/db` |
| `oracle` | Oracle | `python-oracledb` | ✅ Columns & PKs | `oracle+oracledb://user:pwd@host:1521/servicename` |
| `postgres` / `postgresql` | PostgreSQL | `psycopg2` | ✅ Full | `postgresql+psycopg2://user:pwd@localhost:5432/dbname` |
| `redshift` | Amazon Redshift | `sqlalchemy-redshift` | ✅ Columns only | `redshift+psycopg2://user:pwd@redshift:5439/dev` |
| `snowflake` | Snowflake | `snowflake-sqlalchemy` | ✅ Columns only | `snowflake://user:pwd@account/DB/SCHEMA?warehouse=WH` |
| `sqlite` | SQLite | built-in | ✅ Full | `sqlite:///C:/data/dev.db` |
🔧 3. Connection Notes¶
- Use URL-encoded driver names (e.g. `ODBC%20Driver%2018%20for%20SQL%20Server`).
- Backslashes in paths should be replaced by forward slashes (`C:/path/file.db`).
- SSL parameters and special driver arguments can be appended via query params.
- For cloud systems (e.g. Snowflake, BigQuery, Databricks), credentials and tokens can be stored in `.env` or a vault.
- For BigQuery, SQLAlchemy-based reflection is used for schema and column discovery. Execution as a target backend is described separately.
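The URL encoding mentioned in the first note can be produced with the Python standard library, e.g. (minimal sketch):

```python
from urllib.parse import quote

# Spaces become %20, matching the example URIs on this page.
driver = quote("ODBC Driver 18 for SQL Server")
uri = f"mssql+pyodbc://user:pwd@host,1433/db?driver={driver}"
print(uri)
```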
🔧 4. Beta Dialects (Reflection may be limited)¶
- `trino` / `presto`
- `clickhouse`
- `teradata`
- `db2`
- `hana`
You can register these manually under System.type if you want to experiment — elevata will attempt a
generic reflection where possible, but may not retrieve primary or foreign key details.
🔧 5. File-based Sources (RAW Ingestion)¶
elevata supports file-based sources for native RAW ingestion. The file type is determined via System.type.
🧩 Supported File Types¶
| System.type | Format | Notes |
|---|---|---|
| `csv` | CSV with header row | Header row is used as column names |
| `json` | JSON array of objects (`[{...}, {...}]`) | Each element must be an object |
| `jsonl` | Newline-delimited JSON (`{"a":1}` and `{"a":2}` on separate lines) | One object per line |
| `parquet` | Parquet | Currently: local path or `file://` |
| `excel` | Excel (`.xlsx`, `.xlsm`) | Sheet and header configurable |
RAW landing is always Full Replace: Drop/Create/Truncate/Insert.
RAW tables are system-managed landing zones and always include technical columns such as:
- load_run_id
- loaded_at
- payload (preserved JSON object for the original record)
🧩 SourceDataset.ingestion_config (File)¶
🔎 Required fields¶
{
"uri": "file:///data/orders.parquet"
}
`uri` (string, required)
Path or URI to the file.
Supported:
- `file:///...` (recommended)
- local paths (e.g. `/data/orders.csv`, `C:/data/orders.xlsx`)
- `http(s)://...` for CSV/JSON/JSONL/Excel (Parquet currently not supported over HTTP)
Local paths vs. file URIs

The `uri` field accepts both plain local paths and `file://` URIs.

Valid examples:

- Linux / macOS: `/data/orders.csv`, `file:///data/orders.csv`
- Windows: `C:/data/orders.xlsx`, `file:///C:/data/orders.xlsx`

Using `file:///` is recommended for portability and clarity in configuration files, but it is not strictly required.

Backslashes (`\`) should be avoided. Use forward slashes (`/`) instead.
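If you want to derive the `file:///` form programmatically, `pathlib` produces it (sketch; each line assumes the respective platform):

```python
from pathlib import Path

# as_uri() yields the portable file:// form recommended above.
print(Path("/data/orders.csv").as_uri())  # file:///data/orders.csv (Linux / macOS)
# On Windows: Path("C:/data/orders.xlsx").as_uri() -> file:///C:/data/orders.xlsx
```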
🔎 Environment variable expansion¶
`uri` supports environment variable expansion using the `${VAR_NAME}` syntax.
Example:
{
"uri": "${ELEVATA_INGEST_ROOT}/finance/orders.xlsx"
}
🧩 .env expectations¶
Recommended pattern:
ELEVATA_INGEST_ROOT=/mnt/dev/inbox
Then reference it in metadata:
{
"uri": "${ELEVATA_INGEST_ROOT}/finance/orders.xlsx"
}
This allows using the same metadata across dev/test/prod while switching file roots via .env.
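The expansion behaves like plain `${VAR}` substitution against the process environment. An illustrative sketch of the semantics (not elevata's actual implementation):

```python
import os
import re

def expand_env(value: str) -> str:
    # Replace ${VAR_NAME} placeholders with values from the environment.
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ[m.group(1)],
        value,
    )

os.environ["ELEVATA_INGEST_ROOT"] = "/mnt/dev/inbox"
print(expand_env("${ELEVATA_INGEST_ROOT}/finance/orders.xlsx"))
# -> /mnt/dev/inbox/finance/orders.xlsx
```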
🔎 Optional fields¶
{
"uri": "https://example.com/download?id=123",
"file_type": "jsonl"
}
`file_type` (string, optional)
Overrides detection based on file suffix. Useful if the URL does not contain a file extension (e.g. presigned URLs).
Allowed values: `csv` | `json` | `jsonl` | `parquet` | `excel`

Important: If set, `file_type` must match `System.type` (to avoid inconsistent configuration).

Detection rules (if `file_type` is not set):
- .csv → CSV
- .json → JSON array
- .jsonl / .ndjson → JSON Lines
- .parquet → Parquet
- .xlsx / .xlsm → Excel
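These rules amount to a simple suffix lookup. An illustrative sketch (not elevata's actual code):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Suffix-to-type mapping as listed above.
SUFFIX_MAP = {
    ".csv": "csv", ".json": "json", ".jsonl": "jsonl", ".ndjson": "jsonl",
    ".parquet": "parquet", ".xlsx": "excel", ".xlsm": "excel",
}

def detect_file_type(uri: str) -> str | None:
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
    return SUFFIX_MAP.get(suffix)

print(detect_file_type("file:///data/orders.parquet"))          # parquet
print(detect_file_type("https://example.com/download?id=123"))  # None -> set file_type
```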
🧩 CSV-specific notes (System.type = "csv")¶
- CSV must have a header row.
- Values are read as strings; type inference happens via auto-import / column profiling.
{
"uri": "file:///data/orders.csv",
"delimiter": ";",
"quotechar": "\"",
"encoding": "utf-8-sig"
}
- `delimiter` (string, optional, default: `,`)
  Column separator character.
  Common in German Excel exports: `;`.
- `quotechar` (string, optional, default: `"`)
  Character used to quote fields containing delimiters or line breaks.
  Standard CSV uses double quotes (`"`).
- `encoding` (string, optional, default: `utf-8`)
  Character encoding used when decoding the file.
  Useful values: `utf-8`, `utf-8-sig` (common for Excel exports on Windows), `cp1252` (legacy Windows encoding).

If not specified, elevata assumes UTF-8 encoding and standard CSV quoting.
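How these options map onto CSV reading, sketched with the standard library (the file name is a placeholder):

```python
import csv

with open("orders.csv", newline="", encoding="utf-8-sig") as fh:
    reader = csv.DictReader(fh, delimiter=";", quotechar='"')
    for row in reader:
        print(row)  # all values arrive as strings; typing happens downstream
```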
🧩 JSON-specific notes (System.type = "json")¶
- The file must contain a JSON array of objects.
- Non-object elements are ignored.
🧩 JSONL-specific notes (System.type = "jsonl")¶
- One JSON object per line.
- Empty lines are ignored.
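A minimal sketch of this contract:

```python
import json

def read_jsonl(path: str):
    # One JSON object per line; empty lines are skipped.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```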
🧩 Parquet-specific notes (System.type = "parquet")¶
- Parquet ingestion is chunked for memory safety.
- Currently supported locations: local path or `file://` URI.
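Chunked reading can be pictured with pyarrow's batch iterator (illustrative; elevata's internal chunk size and mechanism may differ):

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("/data/orders.parquet")
for batch in pf.iter_batches(batch_size=10_000):
    rows = batch.to_pylist()  # process one bounded batch at a time
```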
🧩 Excel-specific Options (System.type = "excel")¶
{
"uri": "file:///data/orders.xlsx",
"sheet_name": "Orders",
"header_row": 1,
"max_rows": 50000
}
- `sheet_name` (string, optional)
  Name of the worksheet. If not set, the first sheet is used.
- `sheet_index` (int, optional)
  0-based sheet index (alternative to `sheet_name`).
- `header_row` (int, optional, default: `1`)
  1-based row number containing the header.
- `max_rows` (int, optional)
  Limits the number of data rows read (after the header).
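How these options could map onto a reader, sketched with openpyxl (illustrative; elevata's internal reader may differ):

```python
from itertools import islice
from openpyxl import load_workbook

wb = load_workbook("/data/orders.xlsx", read_only=True)
ws = wb["Orders"]                    # sheet_name; wb.worksheets[0] for sheet_index
rows = ws.iter_rows(min_row=1, values_only=True)  # header_row = 1 (1-based)
header = next(rows)
for row in islice(rows, 50000):      # max_rows limits data rows after the header
    record = dict(zip(header, row))
```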
🔧 6. REST Sources (RAW Ingestion)¶
For REST APIs, use System.type = "rest".
REST ingestion configuration is split into:
- System-level REST connection (environment-specific): `base_url` and default headers
- Dataset-level `ingestion_config` (dataset-specific): `path`, record extraction, pagination, cursor rules
This keeps metadata stable across dev/test/prod while allowing endpoint switches via secrets.
Important: base_url is environment-/system-level configuration and must be provided via secrets
(e.g. .env or vault). It should not be duplicated per dataset.
🧩 System-level REST connection (secret)¶
REST systems resolve connection details via the profile-aware secret reference:
sec/{profile}/conn/rest/{short_name}
The resolved secret must provide at least:
`base_url` (string, required)

Optional:

- `headers` (object): default headers included with every request
- `query` (object): fixed query parameters included with every request
Example secret JSON:
{
"base_url": "https://jsonplaceholder.typicode.com",
"headers": {"Authorization": "Bearer <TOKEN>"},
"query": {"locale": "en_US"}
}
🧩 .env expectations¶
In local development, this is typically backed by .env values using the same convention as SQL connection strings.
Example (conceptual) .env entry:
SEC_DEV_CONN_REST_JSONPH={"base_url":"https://jsonplaceholder.typicode.com"}
Optional default headers and fixed query params may be provided as well:
SEC_DEV_CONN_REST_REQRES={"base_url":"https://reqres.in","headers":{"Authorization":"Bearer <TOKEN>"},"query":{"locale":"en_US"}}
Note: The exact `.env` representation depends on your secret provider. The important contract is that resolving the secret yields a JSON object containing at least `base_url`.
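Conceptually, resolving such a secret boils down to the following sketch (the env-backed provider case; other providers resolve the same JSON contract):

```python
import json
import os

secret = json.loads(os.environ["SEC_DEV_CONN_REST_JSONPH"])
base_url = secret["base_url"]                # required
default_headers = secret.get("headers", {})  # optional
fixed_query = secret.get("query", {})        # optional
```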
🧩 SourceDataset.ingestion_config (REST)¶
🔎 Required fields¶
{
"path": "/posts"
}
`path` (string, required)
Endpoint path relative to the system `base_url`.
🔎 Optional fields¶
{
"path": "/api/users",
"method": "GET",
"headers": {"X-Debug": "1"},
"params": {"limit": 100},
"record_path": "data"
}
- `method` (string, optional, default: `GET`)
  HTTP method.
- `headers` (object, optional)
  Request headers added on top of system default headers.
- `params` (object, optional)
  Request query parameters added on top of system fixed query parameters.
- `record_path` (string, optional)
  Dotted path to the list of records inside the JSON response (e.g. `data.items`). If omitted, the response must be a JSON array. See the request-building sketch below.
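Putting the system-level connection and these dataset-level fields together, a single request could be assembled as follows (illustrative sketch using `requests`; the merge order follows the descriptions above):

```python
import requests

def fetch_records(system: dict, cfg: dict) -> list:
    url = system["base_url"].rstrip("/") + cfg["path"]
    # Dataset-level headers/params are layered on top of system defaults.
    headers = {**system.get("headers", {}), **cfg.get("headers", {})}
    params = {**system.get("query", {}), **cfg.get("params", {})}
    payload = requests.request(
        cfg.get("method", "GET"), url, headers=headers, params=params
    ).json()
    # Walk the dotted record_path (e.g. "data.items"); empty path = whole body.
    for key in cfg.get("record_path", "").split("."):
        if key:
            payload = payload[key]
    return payload
```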
🔎 Pagination (page-based)¶
{
"path": "/api/users",
"page_param": "page",
"page_size_param": "per_page",
"page_size": 6,
"record_path": "data"
}
- `page_param` (string, optional)
  Name of the page parameter.
- `page_size_param` (string, optional)
  Name of the page size parameter.
- `page_size` (int, optional)
  Requested page size.
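A page-based extraction loop built from these fields could look like this (sketch; stopping on an empty page is an assumption for illustration, real APIs vary):

```python
import requests

def paginate(base_url: str, cfg: dict):
    page = 1
    while True:
        params = {cfg["page_param"]: page, cfg["page_size_param"]: cfg["page_size"]}
        data = requests.get(base_url + cfg["path"], params=params).json()
        records = data[cfg["record_path"]] if cfg.get("record_path") else data
        if not records:
            break  # empty page ends the loop
        yield from records
        page += 1
```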
🔎 Incremental scoping (cursor / since)¶
Note: RAW remains Full Replace. Cursor configuration affects only the extraction scope.
{
"path": "/events",
"since_param": "since",
"cursor_field": "updated_at",
"record_path": "items"
}
- `since_param` (string, optional)
  Query parameter receiving the delta cutoff value.
- `cursor_field` (string, optional)
  Field name used to derive a cursor/max timestamp.
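A sketch of how the two fields interact (how elevata persists the previous cursor value between runs is not covered here):

```python
def scoped_params(cfg: dict, last_cursor: str | None) -> dict:
    # Pass the previous cutoff via since_param, e.g. ?since=2024-01-01T00:00:00Z
    if last_cursor and cfg.get("since_param"):
        return {cfg["since_param"]: last_cursor}
    return {}

def next_cursor(records: list, cfg: dict) -> str | None:
    # Derive the new cutoff as the maximum cursor_field value seen in this run.
    field = cfg.get("cursor_field")
    if not field:
        return None
    return max((r[field] for r in records if field in r), default=None)
```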
© 2025-2026 elevata Labs — Internal Technical Documentation