Aljaž Mur Eržen

Binary operator precedence in SQL

Table of contents

I've recently read a blog post where the author is documenting one of their struggles when migrating from Oracle Database to PostgreSQL.

I want to drill down on the core problem further and expose what's the difference between the two SQL dialects. This is the query that caused problems the original author:

CASE
    WHEN TO_CHAR(varmonth,'MI') + 1 = 60
    THEN varhr - 1 || TO_CHAR(varmonth,'MI') + 1 + 40
    ELSE varhr - 1 || TO_CHAR(varmonth,'MI') + 1
END AS varhour

After some experimentation, I've boiled the snippet down to just this:

select '5' || 2 + 10;

PostgreSQL

postgres@localhost:postgres> select '5' || 2 + 10;
+----------+
| ?column? |
|----------|
| 512      |
+----------+

Oracle

SQL> select '5' || 2 + 10;

'5'||2+10
_________
       62

What's going on here?

I cannot say for sure, because I cannot use EXPLAIN to analyze what's happening, because PostgreSQL is optimizing the whole expression down to '512'::text:

postgres@localhost:postgres> select '5' || 2 + 10;
○ Total Cost: 0
○ Planning Time: <1 ms
○ Execution Time: <1 ms
├─⌠ Result
│ │ Returns result
│ │ ○ Duration: <1 ms (17%)
│ │ ○ Cost: 0 (100%)
│ │ ○ Rows: 1
│ ⌡► '512'::text

But I can speculate! My suspicion is that PostgreSQL interprets the query the same as it would these:

select '5' || (2 + 10);
select '5' || (12);
select '5' || '12';
select '512';

... while Oracle deducts like this:

select '5' || 2 + 10;
select ('5' || 2) + 10;
select '52' + 10;
select TO_NUMBER('52') + 10;
select 62;

Both approaches are reasonable. Both try to be helpful in their own way and both try to guess what the author meant. They just follow a bit different deduction rules.

In particular, in PostgreSQL + has a higher precedence than ||. In Oracle, they have the same precedence, using left-to-right associativity.

Also, Oracle allows implicit conversion from text to numbers.

Why this is all wrong

Now is time for my option on all this.

First of all, I want to say that both approaches are valid and each database engine can use any language that they want. But both claiming that they use The SQL language, gives an impression that this is the same language and can be used interchangeably.

Then, I'd like to point out a few language "features":

Oracle's implicit text-to-number conversion

Great in theory, seems like the language trying to be helpful, but in practice, programming languages are no place for guessing. Slight changes in the program can cause the guessing logic to act differently and produce very unexpected results.

Implicit number-to-text conversion of || in both engines

Again, the same problem, the same argument. It's a bad design of JavaScript, it's a bad design of SQL.

PostgreSQL's type inference via text literals

This is not directly reflecting this article, but my experimentation to find the root problem.

If you don't know, in PostgreSQL, any literal can be quoted as a string. This is useful for expressing 'NaN' or 'infinity' for floats, but it works for any type:

postgres@localhost:postgres> select abs('-4.5');
+-----+
| abs |
|-----|
| 4.5 |
+-----+

I'm using abs function here to show that '-4.5' is indeed interpreted as a float, because it rejects expressions that are of type text:

postgres@localhost:postgres> select abs('-4.5'::text);
function abs(text) does not exist
LINE 1: select abs('-4.5'::text)
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

So I was thinking that maybe PostgreSQL is interpreting literals in our example as some unexpected type? And that is then causing problems? But no, it was simpler than that.

This behavior is useful, because without it, it would be hard to express some literal values, for example infinity. If we wanted to use just the infinity identifier, that might clash with column or table names.

So the behavior is justified, but this unexpected to me and probably many developers. String literals look like a string, but they are not always a string?

If I were designing a language (and I am), I would have literals for most used primitive types, have special values (like infinity) accessible as a constant in the standard library, and have other types be constructable from primitive types.

Verdict

SQL is not standardized not for the lack of effort, but because the language is huge for what it is able to express and documenting and unifying its behavior is a monumental effort. On top of that, each of the databases want to keep backward compatibility, preventing language unification.

Small differences between dialects (like the precedence in this article) cause substantial issues when migrating between databases.

Tags: #language