Types of joins in SQL (cheatsheet for beginners)
This article is used in my book "Programming with databases".
Better to see once than to hear a hundred times. This small article explains what are the different kinds of joins in SQL.
At the first time you should take into account that SQL is a declarative 4th generation language. So the best way to use it (especially for beginners) is to write the query and let SQL optimizer do its job. That means thinking in set-oriented terms and avoiding sub-queries.
Let's take two example tables: Companies and Contacts.
CREATE TABLE companies (
company_id INT NOT NULL,
company_name VARCHAR(64) NOT NULL,
phone VARCHAR(16) NULL,
CONSTRAINT pk_companies PRIMARY KEY (company_id)
);
CREATE TABLE contacts (
contact_id INT NOT NULL,
contact_name VARCHAR(64),
phone VARCHAR(16) NULL,
company_id INT NULL,
CONSTRAINT pk_contacts PRIMARY KEY (contact_id),
CONSTRAINT fk_contact_company
FOREIGN KEY (company_id)
REFERENCES companies(company_id)
);
Fill these tables by following data:
INSERT INTO companies VALUES (1, 'AT&T', null);
INSERT INTO companies VALUES (2, 'Microsoft', '322-223');
INSERT INTO contacts VALUES (1, 'Bjarne Stroustrup', null, 1);
INSERT INTO contacts VALUES (2, 'Niklaus Wirth', '322-223', null);
INSERT INTO contacts VALUES (3, 'Bill Gates', '322-223', 2);
Here are the tables with data:
Companies
company_id | company_name | phone |
---|---|---|
1 | AT&T | NULL |
2 | Microsoft | 322-223 |
Contacts
contact_id | contact_name | phone | company_id |
---|---|---|---|
1 | Bjarne Stroustrup | NULL | 1 |
2 | Niklaus Wirth | 322-223 | NULL |
3 | Bill Gates | 322-223 | 2 |
Now it's time to write queries.
INNER JOIN
SELECT contact_name, company_name
FROM contacts
INNER JOIN companies
ON contacts.company_id = companies.company_id
ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Bill Gates | Microsoft |
LEFT OUTER JOIN
SELECT contact_name, company_name
FROM contacts
LEFT OUTER JOIN companies
ON contacts.company_id = companies.company_id
ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Niklaus Wirth | NULL |
Bill Gates | Microsoft |
RIGHT OUTER JOIN
In this example tables are joint by a non-key attribute. In relational model you don't need to think about a physical link existence.
SELECT contact_name, company_name
FROM contacts
RIGHT OUTER JOIN companies
ON contacts.phone = companies.phone
ORDER BY contact_name
contact_name | company_name |
---|---|
NULL | AT&T |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
Let's write the symmetric "left outer join", too because it's required for the next example.
SELECT contact_name, company_name
FROM contacts
LEFT OUTER JOIN companies
ON contacts.phone = companies.phone
ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | NULL |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
FULL JOIN
SELECT contact_name, company_name
FROM contacts
FULL OUTER JOIN companies
ON contacts.phone = companies.phone
ORDER BY contact_name
As you can see, FULL JOIN
is the union of results returned by LEFT
and RIGHT
outer joins.
contact_name | company_name |
---|---|
NULL | AT&T |
Bjarne Stroustrup | NULL |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
CROSS JOIN is the Cartesian product
CROSS JOIN
is also called "Cartesian product".
SELECT contact_name, company_name
FROM contacts
CROSS JOIN companies
ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Bjarne Stroustrup | Microsoft |
Niklaus Wirth | AT&T |
Niklaus Wirth | Microsoft |
Bill Gates | AT&T |
Bill Gates | Microsoft |
CROSS APPLY
The CROSS APPLY
is also called lateral join. Let's add some data to see the difference.
INSERT INTO companies VALUES (3, 'Google', '333-222');
INSERT INTO contacts VALUES (4, 'Anders Hejlsberg', NULL, 2);
Now, if you run the following query, the result will be the same as for INNER JOIN
.
-- inner join
SELECT company_name, contact_name
FROM companies
INNER JOIN contacts
ON contacts.company_id = companies.company_id
ORDER BY contact_name;
-- lateral join
SELECT company_name, contact_name
FROM companies
CROSS APPLY(SELECT * FROM contacts
WHERE contacts.company_id = companies.company_id
) contacts
ORDER BY contact_name
company_name contact_name
------------- ------------------
Microsoft Anders Hejlsberg
Microsoft Bill Gates
AT&T Bjarne Stroustrup
So what the difference?
There are several scenario whether the CROSS APPLY
is useful.
Join with a table function
Many DBMS allow to write user defined functions returning a dataset (table). For example, in SQL Server you may create following function which returns contacts for the given company.
CREATE FUNCTION dbo.company_get_contacts(@company_id int)
RETURNS TABLE
AS RETURN (
SELECT contact_id, contact_name
FROM contacts
WHERE company_id = @company_id
)
If you want to join companies with this function, you need to provide the value of the company_id
column as an argument for @company_id
parameters. Without CROSS APPLY the only SELECT section where you could call this function.
SELECT company_id,
dbo.company_get_contacts(company_id) AS contact
FROM companies
However, the company_get_contacts()
function returns the table, not the scalar value so the query is completely wrong!
The CROSS APPLY
resolve the problem.
SELECT company_name, contact_name
FROM companies
CROSS APPLY dbo.company_get_contacts(companies.company_id) contacts
ORDER BY contact_name
The result is still the same but we encapsulated some logic in the function that may be changed with a minimum modifications in the queries. For example, you may decide to filter and return only some contacts. In this case you have to modify only the function, the query stay unchanged.
Join with a first row only
In some scenarios (usually in OLAP) you may need to show only the first matched row. For example, show the companies that have at least one contact having a blank phone number.
SELECT company_name, contact_name
FROM companies
CROSS APPLY(SELECT TOP 1 contact_name FROM contacts
WHERE contacts.company_id = companies.company_id
AND contacts.phone IS NULL
) contacts
ORDER BY contact_name
Result
company_name contact_name
------------- ------------------
Microsoft Anders Hejlsberg
AT&T Bjarne Stroustrup
OUTER APPLY
Compared with CROSS APPLY
, the OUTER APPLY
has the same meaning that OUTER JOIN
compared with INNER JOIN
.
If you run again the query returning contacts having blank phone numbers but using OUTER APPLY instead, all companies will be selected regardless whether it has linked contacts or not.
SELECT company_name, contact_name
FROM companies
OUTER APPLY(SELECT TOP 1 contact_name FROM contacts
WHERE contacts.company_id = companies.company_id
AND contacts.phone IS NULL
) contacts
ORDER BY contact_name
Result
company_name contact_name
------------- ------------------
Google NULL
Microsoft Anders Hejlsberg
AT&T Bjarne Stroustrup
Have a nice SQL!
This is a revised and completed translation of my article published in April 2006 in "PC World (Russian edition)", N°12, 2007
blog comments powered by Disqus