Parameter type inference in SQL

When SQL contains a query parameter (i.e. $1), PostgreSQL will try to determine types of that parameter or determine that the type is unknown and fallback to string encoding.

For example here:

SELECT $1::uuid;

… postgres will infer that $1 is of type pg_catalog.uuid. Great.

As a bonus, when passing parameters to your PostgreSQL client of choice, they will probably also accept a text-encoded UUID.

But this all relies on Postgres’s ability to infer the type. And this is not always possible. For example:

SELECT column1::uuid FROM (VALUES ($1))

… postgres will infer that $1 is of unknown type and will this expect a string encoding. Passing in a UUID type to the client will result in:

asyncpg.exceptions.DataError:
  invalid input for query argument $1:
    UUID('6f0a1da5-...') (expected str, got UUID)

Well, this is now annoying.

But my problem spans further, and that’s because I need to implement this exact behavior for EdgeDB too, since we are aiming for full compatibility with PostgreSQL.

And this gets hard because we compile queries like this one:

INSERT INTO "Document" (owner_id) VALUES ($1)

… into something like:

WITH
    ins_contents AS (
         SELECT column1::uuid FROM (VALUES ($1))
    )
    ins AS (
         INSERT INTO "Document" (owner_id)
         SELECT * FROM ins_contents
         RETURNING ...
    )
SELECT ...

In the original query, EdgeDB users (or software using EdgeDB) will expect $1 to have type UUID, since this is what postgres would infer.

In the compiled query, postgres will not be able to infer type of $1, thus breaking anything that is trying to pass in UUIDs.

My solution is a workaround, pushing type annotation into the VALUES query:

SELECT column1::uuid FROM (
    VALUES ($1::uuid) -- I need to inject this type cast here
)

This will not cover all the cases of how postgres infers types, but it will work in the most common case: using VALUES in INSERT.

Again, SQL is not a designed language. There is no clear separation between stuff that’s in the standard and stuff that is just accidental behavior that seems right. Because of that, the full extent of all behaviors is hard to document, test and replicate.

The correct way to do parameter typing would be an explicit one: the language should require that all parameters have an explicit type cast ¹, which would simplify type inference, but also life of anyone debugging broken SQL queries.

EdgeQL does exactly that. ↩︎