Junction Tables
We have seen examples of one-to-one and one-to-many relationships.
For a social media website like "Tweeter", tweeters
and tweets
is an example of one-to-many.
This is because each user can have many tweets, and each tweet must belong to only 1 user.
In this case, the relationship is represented as a foreign key: the user_id
column in the tweets
table.
This representation works because we know each tweet is owned by only 1 user.
The same is true for a one-to-one relationship.
Users Can Like Tweets
Our app will have a feature where you can like a tweet, and see all the tweets you've liked. Users liking tweets is another relationship! It's a totally new kind: many-to-many. A user can like many tweets, and a tweet can be liked by many users.
How would we represent this in our tables?
Can we add a foreign key in the tweeters
or the tweets
table?
Think back to our other examples of foreign keys.
A user_id
in tweets
works because a tweet belongs to 1 user.
A user_id
in drivers_licenses
works because a license belongs to 1 user.
Trying to add some sort of liker_id
to tweets
or liked_id
to tweeters
won't work because there are many values.
So what do we do?
Junction Table
We can represent a many-to-many relationship using a separate table, likes
, for this relationship.
column_name | type |
---|---|
id | INTEGER |
username | VARCHAR |
email | VARCHAR |
registered_at | TIMESTAMPTZ |
column_name | type | reference |
---|---|---|
id | INTEGER | NULL |
body | TEXT | NULL |
user_id | INTEGER | tweeters.id |
created_at | TIMESTAMPTZ | NULL |
column_name | type | reference |
---|---|---|
user_id | INTEGER | tweeters.id |
tweet_id | INTEGER | tweets.id |
created_at | TIMESTAMPTZ | NULL |
The likes
table is just a table of 2 foreign keys to tweeters
and tweets
.
Each row in likes
is a like, and it stores which tweet was liked and who did it.
There are many different names for this type of table: a "join table", "cross table", or "junction table".
In this diagram, we have 2 users, @joey
and @ross
.
Joey likes the tweets "how you doin'" and "what's up" while Ross likes "what's up" and "were on a break".
Joining Through the Junction Table
What if we wanted to get all the tweets liked by the user named @hilda.grant53
?
We need information (the username, to get the right user) from tweeters
and from tweets
.
To know which tweets to get, we need to join through likes
.
So, we need to join tweeters
, likes
and tweets
, in a triple join.
SELECT tweets.*
FROM tweeters liker
JOIN likes
ON likes.user_id = liker.id
JOIN tweets
ON likes.tweet_id = tweets.id
WHERE liker.username = 'hilda.grant53';
id INTEGER | body TEXT | user_id INTEGER | created_at TIMESTAMPTZ |
---|---|---|---|
22 | You can't hack the bandwidth without quantifying the auxiliary SQL firewall! | 3 | 2019-12-03 07:27:54.773+00 |
27 | You can't index the driver without connecting the wireless FTP application! | 10 | 2021-04-26 20:48:05.334+00 |
33 | We need to back up the back-end AGP pixel! | 10 | 2021-09-23 20:10:58.765+00 |
37 | generating the protocol won't do anything, we need to back up the virtual USB hard drive! | 1 | 2021-12-20 11:41:28.716+00 |
35 | I'll back up the mobile AI circuit, that should array the EXE hard drive! | 9 | 2021-11-10 23:56:02.234+00 |
Joining the Same Table Multiple Times
We can get all the tweets that @hilda.grant53
likes.
However, we don't know the usernames of who wrote those tweets.
The username
column is back in the tweeters
table, so to fetch it we'll need to join tweeters
again.
SELECT
tweets.*,
authors.username AS author_username
FROM tweeters liker
JOIN likes
ON likes.user_id = liker.id
JOIN tweets
ON likes.tweet_id = tweets.id
JOIN tweeters authors
ON authors.id = tweets.user_id
WHERE liker.username = 'hilda.grant53';
id INTEGER | body TEXT | user_id INTEGER | created_at TIMESTAMPTZ | author_username VARCHAR |
---|---|---|---|---|
22 | You can't hack the bandwidth without quantifying the auxiliary SQL firewall! | 3 | 2019-12-03 07:27:54.773+00 | brisa_jenkins39 |
27 | You can't index the driver without connecting the wireless FTP application! | 10 | 2021-04-26 20:48:05.334+00 | madge.wehner |
33 | We need to back up the back-end AGP pixel! | 10 | 2021-09-23 20:10:58.765+00 | madge.wehner |
37 | generating the protocol won't do anything, we need to back up the virtual USB hard drive! | 1 | 2021-12-20 11:41:28.716+00 | deonte_murray45 |
35 | I'll back up the mobile AI circuit, that should array the EXE hard drive! | 9 | 2021-11-10 23:56:02.234+00 | enrico.durgan1 |
This can get confusing.
The tweeters
table needs to be referenced twice because the relationship of the users to the other rows in the query are different for each reference.
We had to get tweets liked by a certain user, so tweeters
represents "people who liked tweets" at first.
But then we also reference tweeters
to get "people who wrote tweets" which is a different set of users.
Each time we reference the tweeters
table we should alias it appropriately.
Think of each alias as a separate instance of data from the underlying table,
and you can use those aliases to specify which instance you query.
In the last query, we get authors.username
to specify that we aren't looking for username of the liker, @hilda.grant53
.