Using MySQL, Advanced Queries
Workshop Requirements
You should have completed Parts One, Two, Three, Four and Five of this series.
You should also have access to the MySQL command line client software.
You should also have full permissions on a database.
Introduction
Earlier in this series when we looked at SQL statements we were primarily concerned with either retrieving, changing or deleting values within the database and also manipulating the database structures. SQL is more powerful than that and in this workshop I aim to introduce some of the advanced functions and queries that can be useful when building more complex applications.
As
One of the simplest manipulations is to define a new structural element (table or column) by aliasing an existing value. A common use for this is to create a shorthand reference to elements with long names to make the SQL statements shorter and reduce the chance of typos in the longer names.
SELECT <columns> FROM <existing_table_name> AS <new_table_name>
It is important to remember that the table hasn't actually been renamed, but instead the <new_table_name> is simply a reference that exists for the duration of the SQL statement. For example to see this working lets create a simple SELECT statement that retrieves the name column from the artists table (that we created in Part 5) using a reference 't1'.
mysql> SELECT t1.name -> FROM artists -> AS t1; +----------------------+ | name | +----------------------+ | Jamiroquai | | Various | | westlife | | Various | | Abba |
As we can see t1 is a lot easier to type that artists. To see the full benefit of this let's revisit one of the join statements from part 5.
The existing statement is:
mysql> SELECT artists.Artist, cds.title, genres.genre FROM cds -> LEFT JOIN genres ON cds.genreID = genres.genreID -> LEFT JOIN artists ON cds.artistID = artists.artistID -> WHERE (genres.genre = 'Pop');
Whereas our modified statement would look like so:
mysql> SELECT t2.Artist, t1.title, t3.genre FROM cds AS t1 -> LEFT JOIN genres AS t3 ON t1.genreID = t3.genreID -> LEFT JOIN artists AS t2 ON t1.artistID = t2.artistID -> WHERE (t3.genre = 'Pop');
Using aliases also has few other advantages including:
- Avoiding any reserved (used by MySQL) words.
- Allowing Multiple Joins to the same table
- Allowing Self-Joins
- Assigning the result of MySQL function to a temporary column name.
While there is not much more to be said about MySQL reserved words and MySQL functions are covered in Part 9 of this Virtual Workshop series, we will take a look at Self-Joins next.
Practical Uses of As
In the earlier Virtual Workshop on joins we looked at joining different tables together, we are going to extend the use of joins using AS to in allow multiple joins on the same table and Self-Joins.
To illustrate how these work we will create a new 'producer' column in the cds table (the reasons for which will become apparent later) that we last modified in part 5. This will again hold a foreign key reference to a table holding details about the producer, but rather than creating a 'producer' table, we can just reuse the artist table as producers are artists too.
Altering the database structures and adding data
In order to include this new producer column (and include meaningful data) we need to alter the structure of the cds table.
mysql> ALTER TABLE cds -> ADD producerID INT(3);
At this point you can if you want you can enter the appropriate producer information for your existing CDs entered. For use in this workshop we also need to enter the details of an album produced by the artist and an album produced by a outside party. In this case the famously self reliant Prince (or whatever he is calling himself this week) and The Beatles with and their producer the '5th Beatle' Sir George Martin.
Artist | Title | Year | Label | Bought | Tracks | Genre | Producer |
Prince | Sign of the Times | 1987 | Warner Brothers | 1987-11-07 | 16 | Soul | Prince |
The Beatles | The White Album | 1990 | Capitol | 1994-07-20 | 30 | Classic Rock | George Martin |
This data has to be entered into the artists table first:
mysql> INSERT INTO artists -> VALUES ('?','The Beatles'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO artists -> VALUES ('?','Prince'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO artists -> VALUES ('?','George Martin'); Query OK, 1 row affected (0.00 sec)
Then the get the artistIDs using a SELECT statement
mysql> SELECT * -> FROM artists; +----------+----------------------+ | artistID | name | +----------+----------------------+ -- snip -- | 17 | The Beatles | | 18 | Prince | | 19 | George Martin | +----------+----------------------+ 19 rows in set (0.00 sec)
Next enter the label details in the label table.
mysql> INSERT INTO label -> VALUES ('?','Warner Brothers'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO label -> VALUES ('?','Capitol'); Query OK, 1 row affected (0.00 sec)
And getting those newly created label IDs as well.
mysql> SELECT * -> FROM label; +---------+-----------------+ | labelID | name | +---------+-----------------+ -- snip -- | 5 | Warner Brothers | | 6 | Capitol | +---------+-----------------+ 6 rows in set (0.02 sec)
We should also know the genreIDs as well (Classic Rock = 3; Soul = 5) and we can enter the cd details into the cds table (you can see why it is better to have a scripted interface to MySQL).
mysql> INSERT INTO cds ( -> title, year, bought, tracks, genreID, artistID, labelID, producerID -> ) VALUES ( -> 'Sign of the Times','1987','1987-11-07','16','5','18','5','18' -> ); Query OK, 1 row affected (0.08 sec) mysql> INSERT INTO cds ( -> title, year, bought, tracks, genreID, artistID, labelID, producerID -> ) VALUES ( -> 'The White Album','1990','1994-07-20','30','3','17','6','19' -> ); Query OK, 1 row affected (0.01 sec)
This will now allow us to see some of the good things that you can do with AS.
Multiple Joins to One Table.
Consider the following SQL statement and resulting output.
mysql> SELECT cds.title, artists.name -> FROM cds -> LEFT JOIN artists -> USING (artistID) -> WHERE artists.artistID = 17; +-----------------+-------------+ | title | name | +-----------------+-------------+ | The White Album | The Beatles | +-----------------+-------------+ 1 row in set (0.00 sec)
This is fine, but if we wanted to include the producer details we need to join to a producer table, which we don't have because we decided that producers are artists and this creates a problem. We solve this problem by making ANOTHER join to the artists table only this time calling it the producers table using AS to create this alias.
LEFT JOIN artists AS producers ....etc
There is another small consideration. In order to complete the join to the newly aliased producers tale we have to construct the join without the USING clause. This is because we need to join the cds.producerID field on the producers.artistID field which obviously aren't the same name that USING requires. SO to construct the whole query.
mysql> SELECT cds.title, artists.name AS Artist, producers.name AS Producer -> FROM cds -> LEFT JOIN artists -> USING (artistID) -> LEFT JOIN artists AS producers -> ON cds.producerID = producers.artistId -> WHERE artists.artistID = 17; +-----------------+-------------+---------------+ | title | Artist | Producer | +-----------------+-------------+---------------+ | The White Album | The Beatles | George Martin | +-----------------+-------------+---------------+ 1 row in set (0.01 sec)
NOTE: The artists.name and producers.name field has also been aliased to differentiate them in the output.
We can also use this method of aliasing tables to to join a table to itself or perform a Self-Join as it is known.
The Self-Join
A Self-Join (as the name suggests) is when a table is joined to itself and is made possible by aliasing the original table. The main reasons for using a Self-Join arise when you need to either compare the values in differ net rows in the same table.
Unfortunately in the CDs examples we have used so far there is no obvious candidate to demonstrate the Self-Join correctly so I will use a different example this one time only. The classic example given when explaining Self-Joins (I read this in many different books when learning SQL) involves an 'Employees' table that stores the details of their supervisor as well.
EmployeeID | Name | Salary | ManagerID |
023452 | Ken Smith | 45000 | NULL |
087652 | Linda Jones | 25000 | 023452 |
Employees Table
In order to get the name of the Linda's manager you would have to join the table to itself (using an alias to create a 'Manager' table). The join would occur ON Employees.ManagerID = Manager.EmployeeID. Thus we can construct the SQL statement using AS like so:
SELECT Employees.EmployeeID, Employees.Name, Employees.Salary, Manager.Name AS Manager FROM Employees LEFT JOIN Employees AS Manager ON Employees.ManagerID = Manager.EmployeeID WHERE (Employees.EmployeeID = '087652';
This would result in the following being output.
EmployeeID | Name | Salary | Manager |
087652 | Linda Jones | 25000 | Ken Smith |
Aliasing Function output
Using AS with built in functions is discussed in more depth in Part 9 of this workshop series, but worth covering quickly here. Consider a quick query to find the average number of tracks.
mysql> SELECT AVG(tracks) -> FROM cds; +-------------+ | AVG(tracks) | +-------------+ | 22.1667 | +-------------+ 1 row in set (0.00 sec)
AVG(tracks) does tell us certain information, but this field would be difficult to use as part of a script and therefore we can use AS to give a more meaningful name.
mysql> SELECT AVG(tracks) AS AverageTracks -> FROM cds; +---------------+ | AverageTracks | +---------------+ | 22.1667 | +---------------+ 1 row in set (0.03 sec
UNION Joins
Union Joins allow the results of two queries to be combined into one outputted result set. This is done by having the 2 (or more) queries glued together by the UNION operator.
SELECT <fields> FROM <table> WHERE <condition> UNION SELECT <fields> FROM <table> WHERE <condition>
For example if you wanted to search for artist beginning with either P or G you would construct two statements that searched each phrase and use the UNION statement.
mysql> SELECT artists.name -> FROM artists -> WHERE (artists.name LIKE 'P%') -> UNION -> SELECT artists.name -> FROM artists -> WHERE (artists.name LIKE 'G%'); +-----------------+ | name | +-----------------+ | pop goes the 80 | | Prince | | George Martin | +-----------------+ 3 rows in set (0.03 sec)
By now with the knowledge you possess you will have worked out that this could just have easily been done by using two where conditions.
mysql> SELECT artists.name -> FROM artists -> WHERE ((artists.name LIKE 'P%') || (artists.name LIKE 'G%')); +-----------------+ | name | +-----------------+ | pop goes the 80 | | Prince | | George Martin | +-----------------+ 3 rows in set (0.00 sec)
However UNION also allows you to combine the results from different tables not just the same one. To give a practical, but somewhat unrealistic in a 'real world' sense lets return to our CDs database and select all the genres and all the labels that start with letters A to M.
mysql> SELECT label.name -> FROM label -> WHERE (label.name BETWEEN 'A%' AND 'M%') -> UNION -> SELECT genres.genre -> FROM genres -> WHERE (genres.genre BETWEEN 'A%' AND 'M%'); +----------------+ | name | +----------------+ | jive | | EMI | | Capitol | | Easy Listening | | Heavy Metal | | Eighties | | Hip Hop | | Jazz | | Guitar Bands | +----------------+ 9 rows in set (0.04 sec)
A more practical example would be to imagine if we had in fact made a separate producers table rather than just alias the artists table. We could query both tables using UNION to produce a result set.
Temporary Tables
As there name suggests, temporary tables are fleeting in nature lasting only for the length of the MySQL session. Some of the reasons that you may wish to do this include.
- A large, busy site may create a temporary table copies (which are issued and die with a session for example) for each client to use, thus easing congestion in on the original table.
- MySQL does not yet support subselects. Using temporary tables can, in some instances, overcome that problem.
- Storing data during structural database changes.
There are many other reasons that I could go into but let's just look at one cool trick.
Creating the Temporary Table.
The syntax for creating temporary tables is almost identical that used for creating a normal table. Except that there is an extra TEMPORARY clause.
CREATE TEMPORARY TABLE <table> ( field definitions )
This will work by using a query to create the table as well, like so.
CREATE TEMPORARY TABLE <newtable> SELECT * FROM <oldtable>
And it is here that the 'cool trick' comes in. I received an e-mail asking for help (as I often do) and I was able to provide a solution using a temporary table. The problem was related to how to remove duplicate rows from a database. This came about from a web form that fed into a database and users were submitting it multiple times, thus creating duplicated data. The correspondent wished to remove any duplicate entries. We can replicate this problem by creating a copy of the cds table with all the data duplicated.
First make the duplicate cds table:
mysql> CREATE TABLE cdsdupe -> SELECT * FROM cds;
Apply the same structural constraints to the table:
mysql> ALTER TABLE cdsdupe MODIFY cdID INT(3) auto_increment primary key;
Next insert the same data again only omitting the cdID which would cause a conflict with the existing cdIDs (this means using the longer method of specifying which fields to insert into and select from).
mysql> INSERT INTO cdsdupe -> (title, year, bought, tracks, genreID, artistID, labelID, producerID) -> SELECT cds.title, cds.year, cds.bought, cds.tracks, cds.genreID, -> cds.artistID, cds.labelID, cds.producerID -> FROM cds; Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0
You can check that there are indeed duplicate entries using a standard select statement. So how do we solve this problem? We saw in an earlier workshop that the DISTINCT qualifier can be used to only output records one if there are duplicates - which is what we want. Thus if we create a temporary table using a DISTINCT qualifier in a SELECT statement we can filter out the duplicates.
mysql> CREATE TEMPORARY TABLE cdstemp -> SELECT DISTINCT cdsdupe.title, cdsdupe.year, cdsdupe.bought, -> cdsdupe.tracks, cdsdupe.genreID, cdsdupe.artistID, -> cds.labelID, cds.producerID -> FROM cdsdupe; Query OK, 8 rows affected (0.03 sec) Records: 8 Duplicates: 0 Warnings: 0
Again you can check this with a select statement. We next need to remove the data from the original table using a TRUNCATE statement
TRUNCATE TABLE <table_name>
This is better than a delete statement as it drops then recreates the table. This means that it is faster on large tables and more importantly can be used as part of a transaction.
mysql> TRUNCATE TABLE cdsdupe; Query OK, 0 rows affected (0.03 sec)
We then use the temporary table to repopulate the cdsdupe table.
mysql> INSERT INTO cdsdupe -> (title, year, bought, tracks, genreID, artistID, labelID, producerID) -> SELECT cdstemp.title, cdstemp.year, cdstemp.bought, cdstemp.tracks, -> cdstemp.genreID, cdstemp.artistID, cdstemp.labelID, cdstemp.producerID -> FROM cdstemp;
Thus we have removed the duplicates from the cdsdupe table.
Using Full Text Searches
Full Text searches allow you to search for a phrase that can appear ANYWHERE in fields that you specify. This is a kind of 'super-wildcard' approach. To do this you first have to alter any table nominating fields to be searchable within brackets as part of a FULL TEXT defination.
ALTER TABLE <table> ADD FULLTEXT (fields)
To illustrate lets make our title field of the cds table searchable.
mysql> ALTER TABLE cds ADD FULLTEXT (title); Query OK, 8 rows affected (0.36 sec) Records: 8 Duplicates: 0 Warnings: 0
This done we could query the database using the MATCH() and AGAINST() functions, where the field is matched against the search term.
SELECT * FROM cds WHERE MATCH(fields) AGAINST ('search term')
So if we wanted to see all the cds with 'funk' in the title (just because there are bound to be some).
mysql> SELECT cdID, title, year, tracks -> FROM cds -> WHERE -> MATCH(title) AGAINST ('hits'); +------+---------------------+------+--------+ | cdID | title | year | tracks | +------+---------------------+------+--------+ | 5 | Abbas Greatest Hits | 1998 | 24 | +------+---------------------+------+--------+ 1 row in set (0.02 sec)
The MATCH function also sorts multiple results be relevance with the first result being the most relevant. To demonstrate lets add another cds with hits in the title (The Hits by Prince).
mysql> INSERT INTO cds ( -> title, year, bought, tracks, genreID, artistID, labelID, producerID -> ) VALUES ( -> 'The Hits','1993','1993-10-07','58','5','18','5','18' -> ); Query OK, 1 row affected (0.08 sec)
Running the FULL TEXT query again will put the newly added CD top.
mysql> SELECT cdID, title, year, tracks -> FROM cds -> WHERE -> MATCH(title) AGAINST ('hits'); +------+---------------------+------+--------+ | cdID | title | year | tracks | +------+---------------------+------+--------+ | 9 | The Hits | 1993 | 58 | | 5 | Abbas Greatest Hits | 1998 | 24 | +------+---------------------+------+--------+ 2 rows in set (0.00 sec)
We can ask MySQL to show us the rating as well by SELECTing the result of the match().
mysql> SELECT title, MATCH(title) AGAINST ('hits') AS Rating -> FROM cds -> WHERE MATCH(title) AGAINST ('hits'); +---------------------+-----------------+ | title | Rating | +---------------------+-----------------+ | The Hits | 1.238520026207 | | Abbas Greatest Hits | 1.2109839916229 | +---------------------+-----------------+ 2 rows in set (0.01 sec)
You can see that The Hits (1.23....) has a higher rating than Abbas Greatest Hits (1.21.....)
Conclusion
As these workshops progress and you, the reader, become more confident with SQL statements I have to introduce more complex problems and solutions to keep you entertained ;-). This workshop has dealt with more complex queries but if you have not had enough yet, can I suggest investigating SubSelects available in the MySQL 4.1 alpha releases.