Junction Tables
We have seen examples of one-to-one and one-to-many relationships.
For a social media website like "Tweeter", tweeters and tweets is an example of one-to-many.
This is because each user can have many tweets, and each tweet must belong to only 1 user.
In this case, the relationship is represented as a foreign key: the user_id column in the tweets table.
This representation works because we know each tweet is owned by only 1 user.
The same is true for a one-to-one relationship.
Users Can Like Tweets
Our app will have a feature where you can like a tweet, and see all the tweets you've liked. Users liking tweets is another relationship! It's a totally new kind: many-to-many. A user can like many tweets, and a tweet can be liked by many users.
How would we represent this in our tables?
Can we add a foreign key in the tweeters or the tweets table?
Think back to our other examples of foreign keys.
A user_id in tweets works because a tweet belongs to 1 user.
A user_id in drivers_licenses works because a license belongs to 1 user.
Trying to add some sort of liker_id to tweets or liked_id to tweeters won't work because there are many values.
So what do we do?
Junction Table
We can represent a many-to-many relationship using a separate table, likes, for this relationship.
| column_name | type |
|---|---|
id | INTEGER |
username | VARCHAR |
email | VARCHAR |
registered_at | TIMESTAMPTZ |
| column_name | type | reference |
|---|---|---|
id | INTEGER | NULL |
body | TEXT | NULL |
user_id | INTEGER | tweeters.id |
created_at | TIMESTAMPTZ | NULL |
| column_name | type | reference |
|---|---|---|
user_id | INTEGER | tweeters.id |
tweet_id | INTEGER | tweets.id |
created_at | TIMESTAMPTZ | NULL |
The likes table is just a table of 2 foreign keys to tweeters and tweets.
Each row in likes is a like, and it stores which tweet was liked and who did it.
There are many different names for this type of table: a "join table", "cross table", or "junction table".
In this diagram, we have 2 users, @joey and @ross.
Joey likes the tweets "how you doin'" and "what's up" while Ross likes "what's up" and "were on a break".
Joining Through the Junction Table
What if we wanted to get all the tweets liked by the user named @hilda.grant53?
We need information (the username, to get the right user) from tweeters and from tweets.
To know which tweets to get, we need to join through likes.
So, we need to join tweeters, likes and tweets, in a triple join.
SELECT tweets.*
FROM tweeters liker
JOIN likes
ON likes.user_id = liker.id
JOIN tweets
ON likes.tweet_id = tweets.id
WHERE liker.username = 'hilda.grant53';
| id INTEGER | body TEXT | user_id INTEGER | created_at TIMESTAMPTZ |
|---|---|---|---|
22 | You can't hack the bandwidth without quantifying the auxiliary SQL firewall! | 3 | 2019-12-03 07:27:54.773+00 |
27 | You can't index the driver without connecting the wireless FTP application! | 10 | 2021-04-26 20:48:05.334+00 |
33 | We need to back up the back-end AGP pixel! | 10 | 2021-09-23 20:10:58.765+00 |
37 | generating the protocol won't do anything, we need to back up the virtual USB hard drive! | 1 | 2021-12-20 11:41:28.716+00 |
35 | I'll back up the mobile AI circuit, that should array the EXE hard drive! | 9 | 2021-11-10 23:56:02.234+00 |
Joining the Same Table Multiple Times
We can get all the tweets that @hilda.grant53 likes.
However, we don't know the usernames of who wrote those tweets.
The username column is back in the tweeters table, so to fetch it we'll need to join tweeters again.
SELECT
tweets.*,
authors.username AS author_username
FROM tweeters liker
JOIN likes
ON likes.user_id = liker.id
JOIN tweets
ON likes.tweet_id = tweets.id
JOIN tweeters authors
ON authors.id = tweets.user_id
WHERE liker.username = 'hilda.grant53';
| id INTEGER | body TEXT | user_id INTEGER | created_at TIMESTAMPTZ | author_username VARCHAR |
|---|---|---|---|---|
22 | You can't hack the bandwidth without quantifying the auxiliary SQL firewall! | 3 | 2019-12-03 07:27:54.773+00 | brisa_jenkins39 |
27 | You can't index the driver without connecting the wireless FTP application! | 10 | 2021-04-26 20:48:05.334+00 | madge.wehner |
33 | We need to back up the back-end AGP pixel! | 10 | 2021-09-23 20:10:58.765+00 | madge.wehner |
37 | generating the protocol won't do anything, we need to back up the virtual USB hard drive! | 1 | 2021-12-20 11:41:28.716+00 | deonte_murray45 |
35 | I'll back up the mobile AI circuit, that should array the EXE hard drive! | 9 | 2021-11-10 23:56:02.234+00 | enrico.durgan1 |
This can get confusing.
The tweeters table needs to be referenced twice because the relationship of the users to the other rows in the query are different for each reference.
We had to get tweets liked by a certain user, so tweeters represents "people who liked tweets" at first.
But then we also reference tweeters to get "people who wrote tweets" which is a different set of users.
Each time we reference the tweeters table we should alias it appropriately.
Think of each alias as a separate instance of data from the underlying table,
and you can use those aliases to specify which instance you query.
In the last query, we get authors.username to specify that we aren't looking for username of the liker, @hilda.grant53.