## CodingBison

This pages focuses on queries that operate on multiple tables using JOIN and UNION queries. JOIN and UNION queries help us retrieve data from multiple tables.

For the sake of consistency, examples on this use a fictitious Blue Sky Toys company. The Blue Sky Toys database manages a chain of stores distributed across various locations and the database keeps a record of employees, locations, items, and their inventory.

The record keeping is enabled using four tables: employees (with columns employee_id, employee_name, date_of_birth, store_id, and employee_address), stores (with columns store_id and store_location), items (with columns item_code, item_name, item_type, and item_price), and inventory (with columns store_id, item_code, and item_quantity).

### JOIN Queries

There are several types of JOIN queries.

Perhaps, the most common is that an INNER JOIN query, where two tables are compared against one column from each table; these columns need to hold a common value. The output of this query contains rows built by rows by matching the common column from both tables.

Besides INNER JOIN, the other types of JOIN queries are OUTER JOIN queries. OUTER JOIN queries are also referred to as LEFT OUTER JOIN or RIGHT OUTER JOIN, or, often simply as "LEFT JOIN" and "RIGHT JOIN". In these LEFT/RIGHT JOIN queries, each row of one table (let us call it the driver table) is matched against the rows of the other table.

There is one important difference between INNER and OUTER JOIN queries. An OUTER JOIN query prints all rows from the driver table, whether there is a match or not in the second table.

#### INNER JOIN

Let us use an INNER JOIN query to calculate the sum of cost for all items present in a store. For this we need to use three tables: (a) stores table so that we can group the output store-wise, (b) inventory table to find out which store has what times and in what quantity, and (c) items table since this table has price for each item.

Because this query is non-trivial, for the sake of better readability, we will build this query, one step at a time!

We start by using "INNER JOIN" to connect two tables: inventory and stores. We provide the format of the "INNER JOIN" query in the output provided below. An "INNER JOIN" query works with SELECT and we need to put the names of the table on either side of the "INNER JOIN" query. Instead of WHERE, we need to put the "ON" keyword followed by the condition.

``` mysql> SELECT * FROM inventory INNER JOIN stores ON inventory.store_id = stores.store_id;
+----------+-----------+---------------+----------+---------------------------------+
| store_id | item_code | item_quantity | store_id | store_location                  |
+----------+-----------+---------------+----------+---------------------------------+
|        1 |      1003 |            11 |        1 | 110 Ocean Drive, Springfield    |
|        1 |      1007 |             7 |        1 | 110 Ocean Drive, Springfield    |
|        2 |      1003 |            11 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1001 |             2 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1006 |            17 |        2 | 220 Pacific Avenue, Quahog      |
|        3 |      1003 |             3 |        3 | 330 Atlantic Way, Langely Falls |
|        3 |      1005 |            22 |        3 | 330 Atlantic Way, Langely Falls |
+----------+-----------+---------------+----------+---------------------------------+
7 rows in set (0.00 sec)
```

As we can see the output matches the inventory items with their store location. It is good to note that the same output using the WHERE clause: "SELECT * FROM inventory,stores WHERE inventory.store_id = stores.store_id;". The INNER JOIN (and so do other joins) print the output in the same order as we specify the tables. Thus, in this query, we specify inventory followed by stores and so, the output from the inventory table is on the left and the output from the stores table is on the right.

For the INNER JOIN query, not specifying the ON clause would mean that MySQL would print each row of first table against each row of the second table (also known as Cartesian product). Since inventory has 14 rows and stores has 3 rows, this would mean a total of 42 rows! Thus, without an ON clause, an INNER JOIN query is generally less useful.

If needed, we can always add a WHERE clause on top of "INNER JOIN" and ON keywords. Let us reuse the earlier query but only for the stores with store_id as 3.

``` mysql> SELECT * FROM inventory INNER JOIN stores ON inventory.store_id = stores.store_id
-> WHERE stores.store_id = 3;
+----------+-----------+---------------+----------+---------------------------------+
| store_id | item_code | item_quantity | store_id | store_location                  |
+----------+-----------+---------------+----------+---------------------------------+
|        3 |      1003 |             3 |        3 | 330 Atlantic Way, Langely Falls |
|        3 |      1005 |            22 |        3 | 330 Atlantic Way, Langely Falls |
+----------+-----------+---------------+----------+---------------------------------+
2 rows in set (0.00 sec)
```

In the next step, let us throw another table in the ring!

Now, we print the price of each item in the above result as well. However, the price of each item is available only in the items table and so we would need to run the above INNER JOIN by adding items table as well. To do this, we add "INNER JOIN items" in the previous query to add items table as well. Next, we add "AND items.item_code = inventory.item_code;" to make sure that we print records that are matched using the item_code between the two tables.

Lastly, to be more selective, we print only selected columns from all three tables; doing a "SELECT *" would join columns from all tables and the output would be rather verbose!

``` mysql> SELECT stores.store_id, inventory.item_code, inventory.item_quantity, items.item_price,
-> items.item_type, stores.store_location
-> FROM inventory INNER JOIN stores INNER JOIN items
-> ON inventory.store_id = stores.store_id AND items.item_code = inventory.item_code;
+----------+-----------+---------------+------------+-----------+---------------------------------+
| store_id | item_code | item_quantity | item_price | item_type | store_location                  |
+----------+-----------+---------------+------------+-----------+---------------------------------+
|        1 |      1003 |            11 |      13.49 |         1 | 110 Ocean Drive, Springfield    |
|        1 |      1007 |             7 |      19.99 |         1 | 110 Ocean Drive, Springfield    |
|        2 |      1003 |            11 |      13.49 |         1 | 220 Pacific Avenue, Quahog      |
|        2 |      1001 |             2 |      29.99 |         2 | 220 Pacific Avenue, Quahog      |
|        2 |      1006 |            17 |      23.49 |         2 | 220 Pacific Avenue, Quahog      |
|        3 |      1003 |             3 |      13.49 |         1 | 330 Atlantic Way, Langely Falls |
|        3 |      1005 |            22 |      17.49 |         2 | 330 Atlantic Way, Langely Falls |
+----------+-----------+---------------+------------+-----------+---------------------------------+
7 rows in set (0.03 sec)
```

As the last step, let us aim to get the total inventory cost of all items for every store. We can easily do this by using the "GROUP BY" clause that would group all the items in a store based on the item_code. Next, within each group, we can use "SUM(item_quantity * items.item_price)" to take product of item quantity with their respective price. Please note that since item_quantity appears only in the stores table, there is no need to write "stores.item_quantity".

``` mysql> SELECT stores.store_id, SUM(item_quantity * items.item_price), stores.store_location
-> FROM stores INNER JOIN inventory INNER JOIN items
-> ON inventory.store_id = stores.store_id and inventory.item_code = items.item_code
-> GROUP BY stores.store_id;
+----------+---------------------------------------+---------------------------------+
| store_id | SUM(item_quantity * items.item_price) | store_location                  |
+----------+---------------------------------------+---------------------------------+
|        1 |                                288.32 | 110 Ocean Drive, Springfield    |
|        2 |                                607.70 | 220 Pacific Avenue, Quahog      |
|        3 |                                425.25 | 330 Atlantic Way, Langely Falls |
+----------+---------------------------------------+---------------------------------+
3 rows in set (0.00 sec)
```

Since the name of the column "SUM(item_quantity * items.item_price)" appears a little bit unwieldy, we can use the "AS" alias keyword to rename the column to something more appropriate.

``` mysql>  SELECT stores.store_id, SUM(item_quantity * items.item_price) AS inventory_cost,
->  stores.store_location FROM stores INNER JOIN inventory INNER JOIN items
->  ON inventory.store_id = stores.store_id and inventory.item_code = items.item_code
->  GROUP BY stores.store_id;
+----------+----------------+---------------------------------+
| store_id | inventory_cost | store_location                  |
+----------+----------------+---------------------------------+
|        1 |         288.32 | 110 Ocean Drive, Springfield    |
|        2 |         607.70 | 220 Pacific Avenue, Quahog      |
|        3 |         425.25 | 330 Atlantic Way, Langely Falls |
+----------+----------------+---------------------------------+
3 rows in set (0.00 sec)
```

#### LEFT/RIGHT JOIN

Also, referred to as OUTER JOINs, LEFT JOIN and RIGHT JOIN allow us to see even those records that do not have any matching entry in the other row. LEFT JOIN can also be written as "LEFT OUTER JOIN" and RIGHT JOIN can also be written as "RIGHT OUTER JOIN".

In both of these joins, one table becomes the driver table and the JOIN compares rows from the other table against the rows of the driver's table. Let us see this using a simple LEFT JOIN.

``` mysql> SELECT * FROM inventory LEFT JOIN stores ON inventory.store_id = stores.store_id;
+----------+-----------+---------------+----------+---------------------------------+
| store_id | item_code | item_quantity | store_id | store_location                  |
+----------+-----------+---------------+----------+---------------------------------+
|        1 |      1003 |            11 |        1 | 110 Ocean Drive, Springfield    |
|        1 |      1007 |             7 |        1 | 110 Ocean Drive, Springfield    |
|        2 |      1003 |            11 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1001 |             2 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1006 |            17 |        2 | 220 Pacific Avenue, Quahog      |
|        3 |      1003 |             3 |        3 | 330 Atlantic Way, Langely Falls |
|        3 |      1005 |            22 |        3 | 330 Atlantic Way, Langely Falls |
|     NULL |      1001 |            40 |     NULL | NULL                            |
|     NULL |      1002 |            14 |     NULL | NULL                            |
|     NULL |      1006 |            25 |     NULL | NULL                            |
|     NULL |      1004 |            10 |     NULL | NULL                            |
+----------+-----------+---------------+----------+---------------------------------+
11 rows in set (0.00 sec)
```

In the above query, the two tables are placed on either side of the "LEFT JOIN" (similar to that of "INNER JOIN"). The difference here is that the table on the left of the "LEFT JOIN" (inventory) becomes the driver table and the SELECT prints record by comparing each row of "inventory" table against the rows from the "stores" table.

In the above output, some of the records from the inventory table have NULL as store ID (let us say, these items are still in transit and the final store has not yet been decided). "LEFT JOIN" prints those rows as well and since there is no matching row in the stores table, it prints NULL for store_id and store_location.

Also, as stated earlier, if we were to add OUTER to the LEFT JOIN, then the query have the same result: "SELECT * FROM inventory LEFT OUTER JOIN stores ON inventory.store_id = stores.store_id;"

Now, let us turn the tables (pun intended!) and make stores table the driver table; we can do this by simply reversing their order with respect to "LEFT JOIN" clause:

``` mysql> SELECT * FROM stores LEFT JOIN inventory ON inventory.store_id = stores.store_id;
+----------+---------------------------------+----------+-----------+---------------+
| store_id | store_location                  | store_id | item_code | item_quantity |
+----------+---------------------------------+----------+-----------+---------------+
|        1 | 110 Ocean Drive, Springfield    |        1 |      1003 |            11 |
|        1 | 110 Ocean Drive, Springfield    |        1 |      1007 |             7 |
|        2 | 220 Pacific Avenue, Quahog      |        2 |      1003 |            11 |
|        2 | 220 Pacific Avenue, Quahog      |        2 |      1001 |             2 |
|        2 | 220 Pacific Avenue, Quahog      |        2 |      1006 |            17 |
|        3 | 330 Atlantic Way, Langely Falls |        3 |      1003 |             3 |
|        3 | 330 Atlantic Way, Langely Falls |        3 |      1005 |            22 |
|        4 | 440 Sea Way, Springfield         |     NULL |      NULL |          NULL |
+----------+---------------------------------+----------+-----------+---------------+
8 rows in set (0.00 sec)
```

In this case, the records of the stores table are matched against those from the inventory table.

Even though the store with the store_id of 4 does not have any entries in the inventory, it still shows up in this query, albeit, the corresponding row from the inventory table has all NULL values. Also, note that for the case of earlier INNER JOIN, since there is no matching entry of store_id of 4 in the inventory table, so the store_id of 4 is never printed.

With RIGHT JOIN, it is the table on the right that becomes the driver table. Thus, we can rewrite the above query using the RIGHT JOIN as well. With RIGHT JOIN, we need to change the order of the table since we would want to keep stores as the driver table.

``` mysql> SELECT * FROM inventory RIGHT JOIN stores ON inventory.store_id = stores.store_id;
+----------+-----------+---------------+----------+---------------------------------+
| store_id | item_code | item_quantity | store_id | store_location                  |
+----------+-----------+---------------+----------+---------------------------------+
|        1 |      1003 |            11 |        1 | 110 Ocean Drive, Springfield    |
|        1 |      1007 |             7 |        1 | 110 Ocean Drive, Springfield    |
|        2 |      1003 |            11 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1001 |             2 |        2 | 220 Pacific Avenue, Quahog      |
|        2 |      1006 |            17 |        2 | 220 Pacific Avenue, Quahog      |
|        3 |      1003 |             3 |        3 | 330 Atlantic Way, Langely Falls |
|        3 |      1005 |            22 |        3 | 330 Atlantic Way, Langely Falls |
|     NULL |      NULL |          NULL |        4 | 440 Sea Way, Springfield         |
+----------+-----------+---------------+----------+---------------------------------+
8 rows in set (0.00 sec)
```

Likewise, the output of "SELECT * FROM stores RIGHT JOIN inventory ON inventory.store_id = stores.store_id;" would be similar to that of "SELECT * FROM inventory LEFT JOIN stores ON inventory.store_id = stores.store_id;" except that rows from the inventory would be on the right side of the output.

### UNION Queries

Sometimes, we want to gather input from multiple tables and compile them together as one list. For such cases, we can use SQL's UNION method. As long as the number of columns from various queries are same, we can use UNION to add these output together.

Let us consider a (non-useful) example for the sake of demonstrating the concept of SQL UNION method. Let us say, we would like to gather the address of all stores and of employees in one report. Clearly, we would need input from two tables, stores and employees. We can run SQL queries individually for both tables and then do a UNION to combine these output together!

``` mysql> (SELECT employee_id, employee_address FROM employees)
-> UNION
-> (SELECT store_id, store_location FROM stores);
+-------------+------------------------------------+
+-------------+------------------------------------+
|         101 | 742 Evergreen Terrace, Springfield |
|         103 | 31 Spooner Street, Quahog          |
|         112 | 470 Evergreen Lane, Quahog         |
|         120 | 742 Evergreen Terrace, Springfield |
|         121 | 31 Spooner Street, Quahog          |
|         110 | 43 Cherry Street, Langely Falls    |
|         122 | 43 Cherry Street, Langely Falls    |
|         123 | 200 Main Street, Springfield        |
|         124 | 200 Lost Road, Springfield         |
|           1 | 110 Ocean Drive, Springfield       |
|           2 | 220 Pacific Avenue, Quahog         |
|           3 | 330 Atlantic Way, Langely Falls    |
|           4 | 440 Sea Way, Springfield            |
+-------------+------------------------------------+
13 rows in set (0.00 sec)
```

Please note that the UNION prints the column names from the first query. If we wanted to avoid that, we could have use column aliases (using the keyword "AS") to do that. For example: "(SELECT employee_id AS generic_id, employee_address AS address FROM employees) UNION (SELECT store_id, store_location FROM stores);"

If we need to add results from additional tables, then we can use UNION several times. For example, if we were to have a third address table (let us say, supplier tables with two columns as supplier_id and supplier_address), then we can easily extend the above query to include address of supplies as well; this way, the new query would be "(SELECT employee_id AS generic_id, employee_address AS address FROM employees) UNION (SELECT store_id, store_location FROM stores) UNION (SELECT supplier_id, supplier_address FROM suppliers);"