For the Haters

Despite the fact that SQL has become the de-facto standard language for relational data, it isn't perfect. There are many valid criticisms of the language. You must know these shortcomings to gain true mastery of the language.

SQL Is Bloated

SQL has a ton of reserved keywords, e.g. SELECT, WHERE, ORDER, etc. These are forbidden from being used as identifiers, so you need to take care not to name a table, column, or function any of these words. You also need to be familiar with what they do to write queries. The latest version of the standard, SQL:2016, has over 400 reserved keywords. That's not normal. The programming language Go has only 25. This makes SQL very difficult to learn.

SQL Tries to Look Like Natural English, But Fails

When designing SQL, the goal was to create an intuitive syntax for "non-professionals" to be able to read and comprehend SQL queries. For some simple queries, that is true.

SELECT age
FROM daters
WHERE name = 'Ben';

Most people can figure that one out. However, once you start doing joins, subqueries, window functions, or anything more complex, intuition is thrown out the window.

SELECT
  s.*,
  m.mood,
  m.ts
FROM (
  SELECT
    *,
    CAST(ts AS DATE) AS d,
    row_number() OVER (
      PARTITION BY user_id, CAST(ts AS DATE)
      ORDER BY ts DESC
    ) AS ts_num
  FROM mood_updates
) m
JOIN sleep_reports s
  ON m.d + INTERVAL '1 day' = CAST(s.ts AS DATE)
WHERE ts_num = 1
  AND efficiency > 70
ORDER BY s.user_id;

Try showing that one to a non-professional. See if they can make any sense of that. Actually, most professionals will probably struggle to interpret that!

Compared to that, the equivalent code in a typical modern programming language will be much more comprehensible to most programmers because they don't strive to look like english. It will look more like a tree of for-loops and if-statements and will be easier to reason about. Natural language is actually a terrible way to express algorithms.

Randomly Inconsistent Function Arguments

Most of the time, function calls in SQL look exactly like in most programming languages: function_name(arg_1, arg_2, …). However, some functions randomly have different syntax with keywords thrown in.

SELECT
  substring('PostgreSQL' FROM 8 FOR 3),
  trim(BOTH FROM 'yxSQLzz', 'xyz'),
  extract(DAY FROM TIMESTAMP '2001-02-16 20:38:40');
substring
TEXT
btrim
TEXT
extract
NUMERIC
SQL
SQL
16

Why? I don't know…

NULL Is Very Confusing

After reading our page on NULL and getting tripped up by NULL in a few challenges, perhaps you don't need any convincing that NULL is confusing.

One reason it's so tricky is that it doesn't equal itself. You need a special operator to even detect it.

SELECT
  NULL = NULL AS "equal_itself?",
  NULL IS NULL AS "is_null?";
equal_itself?
BOOL
is_null?
BOOL
NULLTRUE

This property leads to some unexpected behavior. Let's look at some examples using this simple table.

SELECT * FROM dogs;
id
INTEGER
breed
VARCHAR
name
VARCHAR
1
shiba
toshi
2
NULL
judah
3
jindo
pearl

You cannot put NULL inside an IN expression.

SELECT *
FROM dogs
WHERE breed NOT IN ('shiba', NULL);
id
INTEGER
breed
VARCHAR
name
VARCHAR
no rows

You also can't do composite comparisons.

SELECT *
FROM dogs
WHERE (breed, name) = (NULL, 'judah');
id
INTEGER
breed
VARCHAR
name
VARCHAR
no rows

Despite all that stuff about NULL not equaling itself, sometimes is is considered equal to itself, like when you use UNION.

SELECT 'shiba'
UNION
SELECT 'shiba'
UNION
SELECT NULL
UNION
SELECT NULL;
?column?
TEXT
NULL
shiba

NULL Violates Boolean Algebra

It also breaks some laws of boolean logic. You would expect a WHERE clause like this to return all rows, but the NULLs are left out:

SELECT *
FROM dogs
WHERE breed = 'shiba' OR breed != 'shiba';
id
INTEGER
breed
VARCHAR
name
VARCHAR
1
shiba
toshi
3
jindo
pearl

Even if the value is unknown, there is no such concrete value that could ever return FALSE for that condition. So, we should see all 3 rows in that result.

Another thing is that 2 seemingly logically equivalent expressions can become non-equivalent when NULL is in the mix. Consider this CASE expression:

SELECT
  *,
  CASE
    WHEN breed = 'shiba' THEN 'yes'
    ELSE 'no'
  END AS i_want_to_adopt
FROM dogs;
id
INTEGER
breed
VARCHAR
name
VARCHAR
i_want_to_adopt
TEXT
1
shiba
toshi
yes
2
NULL
judah
no
3
jindo
pearl
no

Imagine I write that a different way, but logically equivalent. It should have the same result, but it doesn't…

SELECT
  *,
  CASE
    WHEN breed != 'shiba' THEN 'no'
    ELSE 'yes'
  END AS i_want_to_adopt
FROM dogs;
id
INTEGER
breed
VARCHAR
name
VARCHAR
i_want_to_adopt
TEXT
1
shiba
toshi
yes
2
NULL
judah
yes
3
jindo
pearl
no

The dog "Judah" gets a different result.

NULL Appears Unknowingly

Sometimes NULL can show up when you don't expect it. It shows up when you do a left, right, or full outer join. It sometimes gets returned from functions or operators to indicate an error.

SELECT
  (ARRAY[1, 2])[3],
  to_char(TIMESTAMP '2021-08-22 08:30:00', '');
array
INTEGER
to_char
TEXT
NULLNULL

Since a lot of operators and functions output NULL when 1 of their inputs is NULL, the NULLs can propagate throughout your query and sabotage your result.

Counting

For most aggregate functions, NULL is usually skipped, like for count(), sum(), avg(), max(), min(), etc.

SELECT count(breed)
FROM dogs;
count
BIGINT
2

But NULL isn't skipped for array_agg().

SELECT cardinality(array_agg(breed))
FROM dogs;
cardinality
INTEGER
3

SQL Is Not Composable

SQL has 2 different types of expressions which can't substitute each other: table expressions and value expressions. There are also "expression lists," such as the IN (…, …) operator, that aren't themselves considered expressions. This is very confusing. Most programming languages have syntax which treats all expressions as interchangeable.

Consider this snippet of JavaScript:

let favoriteFood
favoriteFood = "kimchi"
favoriteFood = getFavoriteFood(user)

let topFoods
topFoods = ["steak", "sushi"]
topFoods = getTopFoods(user)

A function call, like getFavoriteFood(), that gets a primitive value, such as the string 'kimchi', is interchangeable with the string itself. The same is true for collections of values, like arrays. Arrays are expressions, and they can be substituted for other expressions which compute arrays. This makes JavaScript composable, and it's a useful feature when computing values gets very complex.

SQL has functions though, right? So isn't SQL composable too? The problem isn't the lack of functions. It's the inability to substitute some pieces for others. This is why having non-interchangeable value expressions, table expressions, and expression lists is inconvenient. Consider this example:

SELECT *
FROM employees
WHERE department IN ('Marketing', 'Functionality');

This query gets all employees in "marketing" or "functionality," whatever that is. What if our logic got a little more complicated, and we wanted different departments for certain companies? For example, let's say we want "research" and "solutions" employees for Footbook. If the expression list is an expression, it should be able to be substituted with another expression, like a CASE.

SELECT *
FROM employees
WHERE department IN CASE
  WHEN company = 'Footbook' THEN ('Research', 'Solutions')
  ELSE ('Marketing', 'Functionality')
END;
error: syntax error at or near "CASE"

Nope! The expression list itself is not an expression, so it can't be substituted for a value expression. Well, as we learned in the subquery chapter, it can be substituted for a subquery. This is only because the grammar allows a conversion of subquery to expression list, and it will only be valid if the table expression returns a table with 1 column. For some reason, it accepts subqueries but not other table expressions like table functions.

Value expressions also aren't interchangeable with table expressions. There is a complicated set of rules for conversion between the 2 under certain conditions. One example we've learned in the Value Expressions section. Subqueries are always allowed in table expressions but only sometimes in value expressions depending on the result.

SELECT
  *,
  (
    SELECT m.mood
    FROM mood_updates m
    WHERE m.user_id = s.user_id
      AND CAST(m.ts AS DATE) = CAST(s.ts AS DATE)
  ) AS mood
FROM sleep_reports s
WHERE s.user_id = 340679
  AND CAST(s.ts AS DATE) = '2022-08-05';
id
INTEGER
user_id
INTEGER
ts
TIMESTAMPTZ
efficiency
INTEGER
rem_perc
INTEGER
duration_min
INTEGER
mood
INTEGER
440574080
340679
2022-08-05 05:45:49+00
63
40
482
8

This query is legal because it has a subquery in the SELECT clause which returns 1 row. But for a different date…

SELECT
  *,
  (
    SELECT m.mood
    FROM mood_updates m
    WHERE m.user_id = s.user_id
      AND CAST(m.ts AS DATE) = CAST(s.ts AS DATE)
  ) AS mood
FROM sleep_reports s
WHERE s.user_id = 340679
  AND CAST(s.ts AS DATE) = '2022-08-01';
error: more than one row returned by a subquery used as an expression

The same query is illegal because this time the subquery returns multiple rows. We are forced to rewrite this query using a join to make it work.

There are other strange conversions that can happen. Only some value expressions (like normal functions) can go in FROM/JOIN clauses, where a table expression is normally expected.

SELECT *
FROM upper('confusing');
upper
TEXT
CONFUSING

The expression upper('confusing') outputs a single value: 'CONFUSING'. It gets automatically converted to a 1-column, 1-row table to make it a valid table expression.

SELECT *
FROM 2 + 2;
error: syntax error at or near "2"

Strangely, math expressions like 2 + 2 do not get converted.

The last example I'll discuss is that table functions can go in the SELECT list. However, this is the most counter-intuitive conversion in my opinion. Here is a valid query. See if you can predict what the result would be before looking at it.

SELECT
  name,
  generate_series(1, 3) AS s1,
  generate_series(4, 5) AS s2
FROM dogs;
name
VARCHAR
s1
INTEGER
s2
INTEGER
toshi
1
4
toshi
2
5
toshi
3
NULL
judah
1
4
judah
2
5
judah
3
NULL
pearl
1
4
pearl
2
5
pearl
3
NULL

The outputs of the table functions in SELECT are implicitly outer-joined by an unseen "row number," or index, and then that becomes the right-hand table of a cross-join with the table in the FROM clause. This is still valid even for table function that return multiple rows and multiple columns.

Why should table functions convert in this way but not subqueries that return multiple rows and multiple columns? Recall that those with throw an error instead.

Summary

In summary, SQL is an imperfect language. It is nice that the same language and the same model, the relational model, are used by different databases. It is also nice that this model can represent the data for any application. However, SQL…

  1. Is bloated. It is hard to learn because it has so many keywords.
  2. Has unnatural syntax. It strived to look like natural language, but natural language doesn't express these operations well.
  3. Has Inconsistent syntax. Some functions have "special" syntax for their arguments.
  4. Has confusing NULL semantics.
  5. Is not composable. Some pieces of the language don't fit into the spaces where other pieces can go.

Overall, it is expressive enough to support whatever application you may have. As the "lingua franca" of data, it allows all kinds of professionals to carry concepts between different databases. These strengths make it worth using despite the shortcomings.

© 2022 Andrew Carlson. All rights reserved.