Learn BigQuery coverage gaps

BigQuery Emulator: Coverage Gap Analysis

Stack: Python + SQLGlot + DuckDB Date: 2026-04-20 SQLGlot version analyzed: 30.5.0


Gap Layers Overview

┌─────────────────────────────────────────────────────────────────┐
│                    BigQuery Full Feature Set                     │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │      Layer 1: SQLGlot Parse/Transpilation Failures       │   │
│  │  ┌───────────────────────────────────────────────────┐   │   │
│  │  │      Layer 2: DuckDB Engine Gaps                  │   │   │
│  │  │  ┌────────────────────────────────────────────┐   │   │   │
│  │  │  │   Layer 3: Silent Semantic Wrong Results   │   │   │   │
│  │  │  └────────────────────────────────────────────┘   │   │   │
│  │  └───────────────────────────────────────────────────┘   │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │      Layer 4: Infrastructure / API-Only Concepts         │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Layer 1: Hard Parse/Transpilation Failures

SQLGlot raises ParseError or crashes — cannot process these at all.

FeatureErrorImpact
RAISE USING MESSAGEParseErrorScripting
RETURN (standalone)ParseErrorScripting
ML.EVALUATE(MODEL m, ...)ParseErrorML
ML.PREDICT(MODEL m, ...)ParseErrorML
AI.FORECAST(...)ParseErrorAI
NTH_VALUE(x, n FROM LAST)ParseErrorWindow functions
STRUCT(* FROM nested)ParseErrorStruct operations
LOAD DATA INTO ...ParseErrorData loading
Pipe |> RENAME/CALL/SET/DROP/LIMITParseError / AttributeErrorPipe syntax

Layer 2: Silent Scripting Truncation

SQLGlot silently drops everything after the first statement in scripting blocks. No error raised.

-- Input:
BEGIN
  DECLARE y INT64;
  SET y = 5;
  IF y > 0 THEN SELECT 'positive'; END IF;
END

-- SQLGlot output:
BEGIN DECLARE y INT64
-- ⚠️ EVERYTHING ELSE IS GONE

Affected constructs:

  • BEGIN...END
  • IF...THEN...ELSE...END IF
  • LOOP...END LOOP
  • WHILE...DO...END WHILE
  • FOR...IN...END FOR

Root cause: END, END IF, END FOR, END WHILE, END LOOP are parsed as separate COMMAND tokens in SQLGlot’s BigQuery parser (parsers/bigquery.py lines 331-337). Terminal keywords are not attached to the preceding block.

Note: Single DECLARE, SET, CALL statements do transpile correctly. Only block constructs truncate.


Layer 3: Silent Semantic Wrong Results

These transpile without error but produce incorrect results.

3a. Reversed Semantics

FeatureWhat Goes Wrong
TIMESTAMP('2020-01-01', 'America/New_York')BQ interprets as local→UTC. DuckDB AT TIME ZONE does UTC→local. Semantics are reversed.
TABLESAMPLE BERNOULLI(10)Converted to RESERVOIR(10 ROWS) — 10% probability → fixed 10 rows

3b. Silently Dropped Clauses

FeatureWhat’s Dropped
ARRAY_AGG(x IGNORE NULLS)IGNORE NULLS dropped — NULLs included in result
ARRAY_AGG(x RESPECT NULLS)RESPECT NULLS dropped (happens to match DuckDB default)
INSTR(s, 'sub', 1, 2)Occurrence parameter dropped — finds 1st, not 2nd
DDL PARTITION BYSilently dropped from CREATE TABLE
DDL CLUSTER BYSilently dropped from CREATE TABLE
DDL OPTIONS(...)Silently dropped from CREATE TABLE/SCHEMA
CREATE EXTERNAL TABLE OPTIONS(...)All options dropped, becomes plain CREATE TABLE
CREATE MODEL OPTIONS(...)Training config (model_type, etc.) dropped
ALTER TABLE SET OPTIONS(...)OPTIONS clause dropped entirely
EXPORT DATA OPTIONS(...)OPTIONS block dropped

3c. Invalid Output SQL

FeatureOutput
CAST(x AS BIGNUMERIC(76,38))CAST(x AS DECIMAL(38,5)(76,38)) — syntactically invalid

3d. Return Type Mismatches

FeatureDifference
APPROX_TOP_COUNT(x, 5)BQ returns ARRAY<STRUCT<value, count>>. DuckDB APPROX_TOP_K returns different structure.
APPROX_QUANTILES(x, 100)BQ returns ARRAY<T> of 101 values. DuckDB APPROX_QUANTILE uses fractional list — downstream result[OFFSET(50)] breaks.

3e. Wildcard Tables

-- Input:
SELECT * FROM `project.dataset.table_*` WHERE _TABLE_SUFFIX BETWEEN '20200101' AND '20201231'

-- Output:
SELECT * FROM "project"."dataset"."table_*" WHERE _TABLE_SUFFIX BETWEEN ...
-- Table name with * quoted as identifier (unresolvable)
-- _TABLE_SUFFIX is a BQ pseudocolumn with no DuckDB equivalent

Layer 4: DuckDB Engine Gaps

Even with perfect transpilation, DuckDB cannot execute these.

4a. Critical Gaps

GEOGRAPHY (Geodetic/Spherical)

  • DuckDB spatial extension = planar only (Euclidean geometry)
  • BigQuery GEOGRAPHY = spherical (WGS84 geodesic, results in meters)
  • ST_DISTANCE: DuckDB returns degrees, BQ returns meters
  • ST_AREA: DuckDB returns square units, BQ returns square meters on sphere
  • All relationship predicates use different geometric models
  • BQ-only: ST_CLUSTERDBSCAN, S2_CELLIDFROMPOINT, S2_COVERINGCELLIDS, ST_GEOGFROM, ST_BOUNDINGBOX (struct), ST_BUFFERWITHTOLERANCE, ST_UNION_AGG, ST_CENTROID_AGG
  • Workaround: Python UDFs wrapping S2Geometry or shapely + pyproj

Scripting Engine

  • DuckDB has no procedural interpreter
  • Missing: DECLARE, SET (variables), IF/ELSE, WHILE, LOOP, FOR...IN, BREAK, CONTINUE, RETURN, RAISE
  • Workaround: Build procedural interpreter in Python — parse into statements, maintain variable namespace, evaluate conditionals in Python, execute DuckDB statements sequentially

Partitioned / Clustered Tables

  • No DDL-level partitioning syntax
  • No _PARTITIONTIME / _PARTITIONDATE pseudocolumns
  • No partition expiration
  • No required partition filter enforcement
  • Workaround: Application-layer metadata + pseudocolumn injection + WHERE clause push-down via zonemaps

INFORMATION_SCHEMA

  • BQ has 50+ views, DuckDB has 8
  • Missing: PARTITIONS, ROUTINES, PARAMETERS, JOBS, JOBS_BY_USER, TABLE_STORAGE, OBJECT_PRIVILEGES, SEARCH_INDEXES, VECTOR_INDEXES, STREAMING_TIMELINE, RESERVATIONS, BI_CAPACITIES, and ~35 more
  • Existing views (TABLES, COLUMNS) have different schemas — BQ adds is_partitioning_column, clustering_ordinal_position, creation_time, row_count, size_bytes
  • Workaround: Synthesize all views from emulator metadata catalog

4b. High-Impact Gaps

FeatureNotesWorkaround
SAFE.* prefixBQ: SAFE.any_function() → NULL on error. No generic equivalent in DuckDB.Python function wrapper registry with try/except
NET.* functions (10)IP parsing, domain extraction, public suffix listPython UDFs (ipaddress stdlib, tldextract)
JavaScript UDFsBQ runs JS via V8 engineCannot execute — reject with clear error
HLL_COUNT.* sketch functionsSerialized HyperLogLog as first-class objectsStub or raise not-implemented
BIGNUMERIC (76-digit precision)DuckDB DECIMAL max = 38 digitsPython decimal.Decimal UDFs for overflow cases
Materialized ViewsDuckDB views always virtualMaintain physical table + refresh triggers
FOR SYSTEM_TIME AS OFNo time travelCDC records + history replay (complex)
Table Snapshots / ClonesCopy-on-write storage conceptsFull copy on CREATE SNAPSHOT
Stored Procedures (CREATE PROCEDURE)DuckDB has CREATE MACRO onlyProcedural interpreter (same as scripting)
User-Defined AggregatesNo SQL DDL for UDAFs in DuckDBPython API only
Differential PrivacyWITH DIFFERENTIAL_PRIVACY OPTIONS(epsilon=...)Out of scope

4c. Pass-Through Functions (Transpile OK, Fail at Runtime)

SQLGlot passes these through unchanged — DuckDB has no implementation:

CategoryFunctions
String encodingTO_CODE_POINTS, CODE_POINTS_TO_STRING, NORMALIZE
JSON lax conversionLAX_INT64, LAX_FLOAT64, LAX_BOOL, LAX_STRING
Approximate aggregatesAPPROX_TOP_SUM
NET namespaceNET.IP_FROM_STRING, NET.IP_TO_STRING, NET.IPV4_FROM_INT64, NET.IPV4_TO_INT64, NET.IP_NET_MASK, NET.IP_TRUNC, NET.HOST, NET.PUBLIC_SUFFIX, NET.REG_DOMAIN
Date/timeCURRENT_DATETIME(), SOUNDEX
ControlERROR() (DuckDB has error() — different casing may work)
Format specifiersFORMAT('%t', x), FORMAT('%T', x), FORMAT('%P', x)

4d. Type System Gaps

BQ TypeDuckDB EquivalentGap
GEOGRAPHYGEOMETRY (spatial ext)Planar vs geodesic — fundamentally different
BIGNUMERICDECIMAL(38,s)76 vs 38 digit precision
BYTESBLOBFunctional equivalent; SAFE_CONVERT_BYTES_TO_STRING missing
DATETIME (civil)TIMESTAMP (naive)Representable but mapping must be careful
TIMESTAMP (UTC)TIMESTAMPTZCorrect mapping
STRUCTSTRUCTBQ has NULLABLE/REQUIRED/REPEATED modes; DuckDB always nullable
ARRAYLISTBQ prohibits ARRAY<ARRAY<...>>; DuckDB allows nested — must enforce
INTERVALINTERVALMostly equivalent; BQ has JUSTIFY_HOURS/DAYS/INTERVAL (missing in DuckDB)

Layer 5: Infrastructure / API-Only Concepts

Not SQL gaps — BigQuery platform features with no emulator equivalent needed:

FeatureNotes
Slot-based distributed executionEmulator is single-node by design
Streaming buffer / streaming insertsNo buffer concept needed
Table decorators (table@timestamp)Legacy SQL — unlikely needed
BI EngineNot applicable — DuckDB is already in-memory
Row-level access policiesStub DDL, no enforcement
Column-level security / policy tagsStub DDL, no enforcement
Search indexes / SEARCH()Stub DDL, raise on SEARCH()
Remote functionsOut of scope
Reservations / billingNot applicable

What Works Correctly

For reference, these BigQuery features transpile and execute correctly:

  • MERGEMERGE INTO
  • STRUCT(a AS x, b AS y){'x': a, 'y': b}
  • SELECT AS STRUCT x, ySELECT {'x': x, 'y': y}
  • QUALIFY clause (DuckDB native support)
  • SELECT * EXCEPT(col)SELECT * EXCLUDE (col)
  • SELECT * REPLACE(expr AS col)SELECT * REPLACE (expr AS col)
  • UNNEST with WITH OFFSETUNNEST(...) WITH ORDINALITY
  • arr[OFFSET(n)]arr[n+1] (0→1 index)
  • arr[ORDINAL(n)]arr[n]
  • SAFE_CAST(x AS T)TRY_CAST(x AS T)
  • SAFE_DIVIDE(x, y)CASE WHEN y <> 0 THEN x / y ELSE NULL END
  • DATE_ADD / DATE_SUB / DATETIME_ADD → interval arithmetic
  • TIMESTAMP_SECONDS/MILLIS/MICROSMAKE_TIMESTAMP(...)
  • GENERATE_DATE_ARRAYCAST(GENERATE_SERIES(...) AS DATE[])
  • JSON_QUERY-> operator
  • JSON_VALUEJSON_VALUE(col, path)
  • TO_JSON_STRINGCAST(TO_JSON(x) AS TEXT)
  • STRING_AGG(x, sep ORDER BY y)LISTAGG(x, sep ORDER BY y)
  • SPLIT(s, delim)STR_SPLIT(s, delim)
  • REGEXP_CONTAINS(s, pat)REGEXP_MATCHES(s, pat)
  • LOGICAL_AND / LOGICAL_ORBOOL_AND / BOOL_OR
  • COUNTIF(cond)COUNT_IF(cond)
  • APPROX_COUNT_DISTINCTAPPROX_COUNT_DISTINCT
  • WITH RECURSIVE (DuckDB native)
  • EXCEPT ALL / INTERSECT ALL (DuckDB native)
  • PIVOT / UNPIVOT (DuckDB native)
  • DECLARE x INT64 DEFAULT 0DECLARE x BIGINT = 0 (single statement)
  • Named params @param$param
  • GENERATE_UUID()CAST(UUID() AS TEXT)
  • MAKE_INTERVAL(year=>1, day=>3)INTERVAL '1 year 3 day'
  • Pipe \|> WHERE/SELECT/AGGREGATE/JOIN/ORDER BY/DISTINCT/EXTEND/AS → CTEs

Priority Matrix

                        ┌─────────────────────────────────────┐
                  High  │  PARTITIONING    INFORMATION_SCHEMA │
                        │  SCRIPTING       _TABLE_SUFFIX      │
         Frequency      │                                     │
         in real        ├─────────────────────────────────────┤
         BQ usage       │  SAFE.* prefix   TIMESTAMP timezone │
                        │  ARRAY_AGG       DDL OPTIONS        │
                        │  IGNORE NULLS                       │
                        ├─────────────────────────────────────┤
                  Low   │  GEOGRAPHY       BigQuery ML        │
                        │  NET.*           HLL sketches       │
                        │  JS UDFs         Diff Privacy       │
                        └─────────────────────────────────────┘
                          High                          Low
                               Severity of Gap
  1. P0 — Must have for MVP:

    • Partition DDL metadata + _PARTITIONTIME pseudocolumns
    • INFORMATION_SCHEMA core views (TABLES, COLUMNS, SCHEMATA, PARTITIONS, ROUTINES)
    • Fix TIMESTAMP(datetime, timezone) reversed semantics
    • Fix ARRAY_AGG IGNORE NULLS behavior
    • DDL property preservation (PARTITION BY, CLUSTER BY, OPTIONS)
  2. P1 — High value:

    • Scripting interpreter (DECLARE, SET, IF, LOOP, BEGIN…END)
    • SAFE.* prefix wrapper system
    • _TABLE_SUFFIX wildcard table support
    • NET.* functions via Python UDFs
  3. P2 — Medium value:

    • Materialized view emulation
    • BIGNUMERIC precision handling
    • Missing string/JSON functions (TO_CODE_POINTS, LAX_*, NORMALIZE)
    • FORMAT %t/%T specifiers
    • JUSTIFY_HOURS/DAYS/INTERVAL
  4. P3 — Low priority / stub:

    • GEOGRAPHY (complex, niche usage in testing)
    • BigQuery ML (out of scope)
    • JavaScript UDFs (reject with error)
    • HLL sketch serialization
    • Differential privacy
    • Time travel / snapshots / clones
    • Row/column security policies

Last updated: May 2025