For the Haters
Despite the fact that SQL has become the de-facto standard language for relational data, it isn't perfect. There are many valid criticisms of the language. You must know these shortcomings to gain true mastery of the language.
SQL Is Bloated
SQL has a ton of reserved keywords, e.g. SELECT
, WHERE
, ORDER
, etc.
These are forbidden from being used as identifiers,
so you need to take care not to name a table, column, or function any of these words.
You also need to be familiar with what they do to write queries.
The latest version of the standard, SQL:2016, has over 400 reserved keywords.
That's not normal.
The programming language Go has only 25.
This makes SQL very difficult to learn.
SQL Tries to Look Like Natural English, But Fails
When designing SQL, the goal was to create an intuitive syntax for "non-professionals" to be able to read and comprehend SQL queries. For some simple queries, that is true.
SELECT age
FROM daters
WHERE name = 'Ben';
Most people can figure that one out. However, once you start doing joins, subqueries, window functions, or anything more complex, intuition is thrown out the window.
SELECT
s.*,
m.mood,
m.ts
FROM (
SELECT
*,
CAST(ts AS DATE) AS d,
row_number() OVER (
PARTITION BY user_id, CAST(ts AS DATE)
ORDER BY ts DESC
) AS ts_num
FROM mood_updates
) m
JOIN sleep_reports s
ON m.d + INTERVAL '1 day' = CAST(s.ts AS DATE)
WHERE ts_num = 1
AND efficiency > 70
ORDER BY s.user_id;
Try showing that one to a non-professional. See if they can make any sense of that. Actually, most professionals will probably struggle to interpret that!
Compared to that, the equivalent code in a typical modern programming language will be much more comprehensible to most programmers because they don't strive to look like english. It will look more like a tree of for-loops and if-statements and will be easier to reason about. Natural language is actually a terrible way to express algorithms.
Randomly Inconsistent Function Arguments
Most of the time, function calls in SQL look exactly like in most programming languages: function_name(arg_1, arg_2, …)
.
However, some functions randomly have different syntax with keywords thrown in.
SELECT
substring('PostgreSQL' FROM 8 FOR 3),
trim(BOTH FROM 'yxSQLzz', 'xyz'),
extract(DAY FROM TIMESTAMP '2001-02-16 20:38:40');
substring TEXT | btrim TEXT | extract NUMERIC |
---|---|---|
SQL | SQL | 16 |
Why? I don't know…
NULL Is Very Confusing
After reading our page on NULL
and getting tripped up by NULL
in a few challenges,
perhaps you don't need any convincing that NULL
is confusing.
One reason it's so tricky is that it doesn't equal itself. You need a special operator to even detect it.
SELECT
NULL = NULL AS "equal_itself?",
NULL IS NULL AS "is_null?";
equal_itself? BOOL | is_null? BOOL |
---|---|
NULL | TRUE |
This property leads to some unexpected behavior. Let's look at some examples using this simple table.
SELECT * FROM dogs;
id INTEGER | breed VARCHAR | name VARCHAR |
---|---|---|
1 | shiba | toshi |
2 | NULL | judah |
3 | jindo | pearl |
You cannot put NULL
inside an IN
expression.
SELECT *
FROM dogs
WHERE breed NOT IN ('shiba', NULL);
id INTEGER | breed VARCHAR | name VARCHAR |
---|---|---|
no rows |
You also can't do composite comparisons.
SELECT *
FROM dogs
WHERE (breed, name) = (NULL, 'judah');
id INTEGER | breed VARCHAR | name VARCHAR |
---|---|---|
no rows |
Despite all that stuff about NULL
not equaling itself, sometimes is is considered equal to itself,
like when you use UNION
.
SELECT 'shiba'
UNION
SELECT 'shiba'
UNION
SELECT NULL
UNION
SELECT NULL;
?column? TEXT |
---|
NULL |
shiba |
NULL Violates Boolean Algebra
It also breaks some laws of boolean logic.
You would expect a WHERE
clause like this to return all rows,
but the NULL
s are left out:
SELECT *
FROM dogs
WHERE breed = 'shiba' OR breed != 'shiba';
id INTEGER | breed VARCHAR | name VARCHAR |
---|---|---|
1 | shiba | toshi |
3 | jindo | pearl |
Even if the value is unknown, there is no such concrete value that could ever return FALSE
for that condition.
So, we should see all 3 rows in that result.
Another thing is that 2 seemingly logically equivalent expressions can become non-equivalent when NULL
is in the mix.
Consider this CASE
expression:
SELECT
*,
CASE
WHEN breed = 'shiba' THEN 'yes'
ELSE 'no'
END AS i_want_to_adopt
FROM dogs;
id INTEGER | breed VARCHAR | name VARCHAR | i_want_to_adopt TEXT |
---|---|---|---|
1 | shiba | toshi | yes |
2 | NULL | judah | no |
3 | jindo | pearl | no |
Imagine I write that a different way, but logically equivalent. It should have the same result, but it doesn't…
SELECT
*,
CASE
WHEN breed != 'shiba' THEN 'no'
ELSE 'yes'
END AS i_want_to_adopt
FROM dogs;
id INTEGER | breed VARCHAR | name VARCHAR | i_want_to_adopt TEXT |
---|---|---|---|
1 | shiba | toshi | yes |
2 | NULL | judah | yes |
3 | jindo | pearl | no |
The dog "Judah" gets a different result.
NULL Appears Unknowingly
Sometimes NULL
can show up when you don't expect it.
It shows up when you do a left, right, or full outer join.
It sometimes gets returned from functions or operators to indicate an error.
SELECT
(ARRAY[1, 2])[3],
to_char(TIMESTAMP '2021-08-22 08:30:00', '');
array INTEGER | to_char TEXT |
---|---|
NULL | NULL |
Since a lot of operators and functions output NULL
when 1 of their inputs is NULL
,
the NULL
s can propagate throughout your query and sabotage your result.
Counting
For most aggregate functions, NULL
is usually skipped, like for count()
, sum()
, avg()
, max()
, min()
, etc.
SELECT count(breed)
FROM dogs;
count BIGINT |
---|
2 |
But NULL
isn't skipped for array_agg()
.
SELECT cardinality(array_agg(breed))
FROM dogs;
cardinality INTEGER |
---|
3 |
SQL Is Not Composable
SQL has 2 different types of expressions which can't substitute each other: table expressions and value expressions.
There are also "expression lists," such as the IN (…, …)
operator, that aren't themselves considered expressions.
This is very confusing.
Most programming languages have syntax which treats all expressions as interchangeable.
Consider this snippet of JavaScript:
let favoriteFood
favoriteFood = "kimchi"
favoriteFood = getFavoriteFood(user)
let topFoods
topFoods = ["steak", "sushi"]
topFoods = getTopFoods(user)
A function call, like getFavoriteFood()
, that gets a primitive value, such as the string 'kimchi'
, is interchangeable with the string itself.
The same is true for collections of values, like arrays.
Arrays are expressions, and they can be substituted for other expressions which compute arrays.
This makes JavaScript composable, and it's a useful feature when computing values gets very complex.
SQL has functions though, right? So isn't SQL composable too? The problem isn't the lack of functions. It's the inability to substitute some pieces for others. This is why having non-interchangeable value expressions, table expressions, and expression lists is inconvenient. Consider this example:
SELECT *
FROM employees
WHERE department IN ('Marketing', 'Functionality');
This query gets all employees in "marketing" or "functionality," whatever that is.
What if our logic got a little more complicated, and we wanted different departments for certain companies?
For example, let's say we want "research" and "solutions" employees for Footbook.
If the expression list is an expression, it should be able to be substituted with another expression, like a CASE
.
SELECT *
FROM employees
WHERE department IN CASE
WHEN company = 'Footbook' THEN ('Research', 'Solutions')
ELSE ('Marketing', 'Functionality')
END;
error: syntax error at or near "CASE"
Nope! The expression list itself is not an expression, so it can't be substituted for a value expression. Well, as we learned in the subquery chapter, it can be substituted for a subquery. This is only because the grammar allows a conversion of subquery to expression list, and it will only be valid if the table expression returns a table with 1 column. For some reason, it accepts subqueries but not other table expressions like table functions.
Value expressions also aren't interchangeable with table expressions. There is a complicated set of rules for conversion between the 2 under certain conditions. One example we've learned in the Value Expressions section. Subqueries are always allowed in table expressions but only sometimes in value expressions depending on the result.
SELECT
*,
(
SELECT m.mood
FROM mood_updates m
WHERE m.user_id = s.user_id
AND CAST(m.ts AS DATE) = CAST(s.ts AS DATE)
) AS mood
FROM sleep_reports s
WHERE s.user_id = 340679
AND CAST(s.ts AS DATE) = '2022-08-05';
id INTEGER | user_id INTEGER | ts TIMESTAMPTZ | efficiency INTEGER | rem_perc INTEGER | duration_min INTEGER | mood INTEGER |
---|---|---|---|---|---|---|
440574080 | 340679 | 2022-08-05 05:45:49+00 | 63 | 40 | 482 | 8 |
This query is legal because it has a subquery in the SELECT
clause which returns 1 row.
But for a different date…
SELECT
*,
(
SELECT m.mood
FROM mood_updates m
WHERE m.user_id = s.user_id
AND CAST(m.ts AS DATE) = CAST(s.ts AS DATE)
) AS mood
FROM sleep_reports s
WHERE s.user_id = 340679
AND CAST(s.ts AS DATE) = '2022-08-01';
error: more than one row returned by a subquery used as an expression
The same query is illegal because this time the subquery returns multiple rows. We are forced to rewrite this query using a join to make it work.
There are other strange conversions that can happen.
Only some value expressions (like normal functions) can go in FROM
/JOIN
clauses, where a table expression is normally expected.
SELECT *
FROM upper('confusing');
upper TEXT |
---|
CONFUSING |
The expression upper('confusing')
outputs a single value: 'CONFUSING'
.
It gets automatically converted to a 1-column, 1-row table to make it a valid table expression.
SELECT *
FROM 2 + 2;
error: syntax error at or near "2"
Strangely, math expressions like 2 + 2
do not get converted.
The last example I'll discuss is that table functions can go in the SELECT
list.
However, this is the most counter-intuitive conversion in my opinion.
Here is a valid query.
See if you can predict what the result would be before looking at it.
SELECT
name,
generate_series(1, 3) AS s1,
generate_series(4, 5) AS s2
FROM dogs;
name VARCHAR | s1 INTEGER | s2 INTEGER |
---|---|---|
toshi | 1 | 4 |
toshi | 2 | 5 |
toshi | 3 | NULL |
judah | 1 | 4 |
judah | 2 | 5 |
judah | 3 | NULL |
pearl | 1 | 4 |
pearl | 2 | 5 |
pearl | 3 | NULL |
The outputs of the table functions in SELECT
are implicitly outer-joined by an unseen "row number," or index,
and then that becomes the right-hand table of a cross-join with the table in the FROM
clause.
This is still valid even for table function that return multiple rows and multiple columns.
Why should table functions convert in this way but not subqueries that return multiple rows and multiple columns? Recall that those with throw an error instead.
Summary
In summary, SQL is an imperfect language. It is nice that the same language and the same model, the relational model, are used by different databases. It is also nice that this model can represent the data for any application. However, SQL…
- Is bloated. It is hard to learn because it has so many keywords.
- Has unnatural syntax. It strived to look like natural language, but natural language doesn't express these operations well.
- Has Inconsistent syntax. Some functions have "special" syntax for their arguments.
- Has confusing
NULL
semantics. - Is not composable. Some pieces of the language don't fit into the spaces where other pieces can go.
Overall, it is expressive enough to support whatever application you may have. As the "lingua franca" of data, it allows all kinds of professionals to carry concepts between different databases. These strengths make it worth using despite the shortcomings.