When working with databases, SQL joins (Structured Query Language table joins) are used to combine rows from two or more tables, based on a related set of columns between them. In this article, we’ll take a look at the different types of SQL table joins that we can use to fetch related information from our databases.
To get started with SQL joins, we’ll refreshing some things about relational databases (a digital database whose organization is based on the relational model of data, as proposed by E. F. Codd in 1970). At a very basic level, a relational database is a collection of data entities. Entities are physical or abstract elements such as a single person, place, or thing about which data can be stored) organized as a set of formally-described tables. From these tables, we can access/reassemble out data in many different ways without having to reorganize the tables themselves.
Relational databases are usually normalized. Normalization means dividing a database into two or more tables and defining relationships between them by using common attributes (called keys). This eliminates duplication of information, such as when entities have one-to-many relationships.
Working with SQL Joins
Imagine a list of Department associated with a number of Employees. After normalizing the tables, All Departments are stored in the “Departments” table whilst all Employees are stored on their own table. In order to find related rows between the 2 tables, we use keys to match Departments with their Employees. To speed up queries, we use indexes such as single or composite indexes.
This allows people to query a database to find things such as which Employees work in any given Department or which Departments do not have any associated Employees. In order to find such relationships, we use a number of different SQL joins. Using a simple table model, we can model our database with Employees linked to Departments through the IDDepartment column.
Inner SQL joins focus on finding matching data between two tables. When using an inner join, there must be at least some matching columns between two (or more) tables that we use to return a non-empty set of data. In our example, using an inner join returns rows where a Department has at least an Employee with a matching IDDepartment value.
SELECT Departments.*, Employees.* FROM Departments INNER JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment);
Outer joins return a set of records (or rows) that include what an inner join would return but also includes other rows for which no corresponding match is found in the other table. There are three possible types of outer SQL joins: LEFT JOINS, RIGHT JOINS and FULL OUTER JOINS (also known as CROSS JOINS).
A LEFT JOIN returns all rows in the table on the left side of the SQL query (in this case, the Departments table) regardless of whether Departments have matching Employees. Whenever a row from the table on the left (Departments) does not have a matching row on the right side (Employees), the columns belonging to the table on the right are filled by NULL values.
The number of rows returned by the query will be equal to the number of matches between Departments and Employees plus one row per unmatched Department (with NULL Employees columns). In other words, a Left Join return all rows an Inner Join does plus one row per unmatched Department.
SELECT Departments.*, Employees.* FROM Departments LEFT JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment);
A RIGHT JOIN is the opposite on a LEFT JOIN, as it returns all rows in the table on the right side of the SQL query (Employees table) regardless of whether Employees are related to a Department. Whenever a row from the table on the right (Employees) does not have a matching row on the left side (Departments), the columns belonging to the table on the left are filled by NULL values. In other words, right SQL joins return all rows from an Inner Join plus one row per unmatched Employee.
SELECT Departments.*, Employees.* FROM Departments RIGHT JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment);
What happens when all you want is to try all possible combinations between 2 tables (in our case, test all Departments vs. Employees possibilities)? Even if a Department is not currently related to an Employee and vice-versa. Left Joins are not enough because they only test Departments -> Employees combinations and leave out some possible Departments <- Employees mixes. The opposite happens with a Right Join as it leaves out some Departments -> Employees combinations. What we need here is a Cartesian Product (an All Department vs. All Employees combination) or (more technically called) a Full Outer Join.
SELECT Departments.*, Employees.* FROM Departments FULL OUTER JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment);
Left, Right and Full SQL Joins return rows from unmatched Departments/Employees combinations plus all rows from matched ones. What happens if we only want to find out which Departments do not have Employees or which Employees are not associated with any Department? That’s where filtering comes into the rescue.
If we want to find which Departments are currently “empty” (i.e. no associated Employees), we can use a Left Join and then filter out those rows which have data in columns from the Employees table (that is, non-null values). To that end, we use the WHERE SQL keyword.
SELECT Departments.*, Employees.* FROM Departments LEFT JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment) WHERE Employees.IDEmployee IS NULL;
Conversely, if we want to find all Employees without a Department, we should use a Right Join and filter all rows where there’s data for a Department, leaving only Employees with no assigned Department.
SELECT Departments.*, Employees.* FROM Departments RIGHT JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment) WHERE Employees.IDEmployee IS NULL;
Finally, if want to find all Departments with no Employees and All Employees with no assigned Department (both at once), we can use a Full Outer Join and only return rows where Employee data is NULL or Department data is NULL. As shown above, this is accomplished by first running a Left Join query and then a Right Join. The returned rowset will be identical but it’s less efficient, as we execute 2 SQL joins instead of one.
SELECT Departments.*, Employees.* FROM Departments FULL OUTER JOIN Employees ON (Departments.IDDepartment = Employees.IDDepartment) WHERE Departments.IDDepartment IS NULL OR Employees.IDEmployee IS NULL;
SQL handles relationships through the use of table JOINs. In other words, SQL joins are clauses in SQL language that link two or more tables based on one or more fields called keys. In short, relational databases reduce redundancy and improve query efficiency through the use of JOINs, making data access faster and the whole database easier to maintain.