CK's Projects
SQL Student/Course Solution
 

 Note: This site is not being maintained any more. If you want anything from it, feel free to save your own copies of the appropriate pages.

During a job interview sometime in the past, I was given a SQL problem. I was unable to give an answer immediately, though I had some ideas on how to do it. I have since given it more thought and come up with a solution, which I ran by a colleague at Oracle named Martin Long (who then came up with another solution). This page describes both solutions.

Problem statement

Given a table of courses offered at a school, and a table showing which students have taken which courses, construct SQL to show which students have taken all available courses.

My solution

My idea was to build an intermediate SELECT containing two columns which listed students and the number of courses that they have taken, and then select from that the students whose course count equaled the count of courses in the course table. After some experimentation I came up with a solution, which was further refined after discussion with Martin.

SELECT student_id
FROM (
SELECT student_id, COUNT(DISTINCT studentcourses.COURSE_ID) sumcol
FROM studentcourses, courses
WHERE studentcourses.COURSE_ID = courses.COURSE_ID
GROUP BY student_id) alias1
WHERE alias1.sumcol = (SELECT COUNT(DISTINCT COURSE_ID) FROM courses);

Martin's solution

Martin also used a comparison of COUNTs, but used a HAVING clause instead of a subSELECT:

SELECT student_id, COUNT(DISTINCT studentcourses.COURSE_ID)
FROM studentcourses, courses
WHERE studentcourses.COURSE_ID = courses.COURSE_ID
GROUP BY student_id
HAVING COUNT(DISTINCT studentcourses.COURSE_ID) =
(SELECT COUNT(DISTINCT COURSE_ID) FROM courses);

Notes

Both solutions handle "dirty" data (e.g. a course identifier in the student/course table that does not occur in the course table). These rows are thrown out by the WHERE clauses in both SQLs.

My solution returns only the column with the student names, but this may or may not be significant in terms of performance. I'm assuming that some front-end GUI program would handle the display, and that program would choose which column(s) to display.

Associated files

I wrote a shell script and a Python script to generate sample data, load it into MySQL and run both queries. To run them, you must have Python and MySQL installed, and the "test" database already created in your MySQL server. I recommend creating a directory, putting the two scripts into that directory and running the shell script from within that directory. The shcll script can be named whatever you like, while the Python script must be named as shown in the file. Note that the shell script deletes and then creates a temporary directory named "tmp" within the current directory to hold temporary files.

The Python script uses recursion which I would not use in a production environment (given the potential issues with stack overflow).

Home

Last Modified: Sunday, August 26, 2007