SQL Queries For Mere Mortals: Thinking In Sets Part 1

SQL Queries For Mere Mortals: Thinking In Sets Part 1

What Is a Set, Anyway?

  • If you remember in high school, you might have studied set theory in a mathematics course. If you were introduced to set algebra, you probably wondered why any of it would ever be useful.
  • A set can be as little as the data from one column from one row in one table. Actually, you can construct a request in SQL that returns no rows an empty set.
  • A set can also be multiple columns (including columns you create with expressions) from multiple rows fetched from multiple tables. Each row in a result set is a member of the set. The values in the columns are specific attributes of each member (data items that describe the member of the set).

Operations on Sets

The three most common set operations are as follows:

  1. Intersection
    You use this to find the common elements in two or more different sets:

    “List all students and the classes for which they are currently enrolled.”
    “Show me the recipes that contain both lamb and rice.”
    “Show me the customers who ordered both bicycles and helmets.”

  2. Difference
    You use this to find items that are in one set but not in another:

    “Show me the recipes that contain lamb but do not contain rice.”
    “Show me the customers who ordered a bicycle but not a helmet.”

  3. Union
    You use this to combine two or more similar sets:

    “Show me all the recipes that contain either lamb or rice.”
    “Show me the customers who ordered either a bicycle or a helmet.”
    “List the names and addresses for both staff and students.”

Intersection

An intersection of two sets contains the common elements of two sets. Let’s first take a look at an intersection from the pure perspective of set theory and then see how you can use an intersection to solve business problems.

Intersection in Set Theory

  • An intersection is a very powerful mathematical tool often used by scientists and engineers
  • you might be interested in finding common points between two sets of chemical or physical sample data.
  • Let’s take a look at intersection in action by examining two sets of numbers. In this example, each single number is a member of the set.
  • The first set of numbers is as follows: 1, 5, 8, 9, 32, 55, 78
  • The second set of numbers is as follows: 3, 7, 8, 22, 55, 71, 99
  • The intersection of these two sets of numbers is the numbers common to both sets: 8, 55
  • The individual entries—the members—of each set don’t have to be just single values. In fact, when solving problems with SQL, you’ll probably deal with sets of rows.

Intersection between Result Sets

  • According to set theory, when a member of a set is something more than a single number or value, each member (or object) of the set has multiple attributes or bits of data that describe the properties of each member.
  • For example, your favorite lunch recipe is a complex member of the set of all recipes that contains many different ingredients. Each ingredient is an attribute of your complex lunch member.
  • To find the intersection between two sets of complex set members, you have to find the members that match on all the attributes.
  • Also, all the members in each set you’re trying to compare must have the same number and type of attributes.
  • For example, suppose you have a complex set like the one below, in which each row represents a member of the set (a lunch recipe), and each column denotes a particular attribute (an ingredient).

My Recipes Set lunch-set.PNG

My Friend's Mike Recipes Set friend-lunch-set.PNG

The intersection of these two sets is the two members whose attributes all match in both sets—that is, the two recipes that Mike and Me have in common.

lunch-intersection.PNG

Intersection in Venn Diagram

  • Sometimes it’s easier to see how intersection works using a set diagram.
  • A set diagram is an elegant yet simple way to diagram sets of information and graphically represent how the sets intersect or overlap.
  • You might also have heard this sort of diagram called a Euler or Venn diagram That we will use.
  • Let’s assume you have a nice database containing all your favorite recipes.
  • So you’re interested in finding all recipes that contain both beef and onions.
  • Figure 7-1 below shows the set diagram that helps you visualize how to solve this problem.

figure 7-1.PNG

  • The upper circle represents the set of recipes that contain beef.
  • The lower circle represents the set of recipes that contain onions.
  • Where the two circles overlap is where you’ll find the recipes that contain both the intersection of the two sets.
  • As you can imagine, you first ask SQL to fetch all the recipes that have beef.
  • In the second query, you ask SQL to fetch all the recipes that have onions.
  • As you’ll see later, you can use a special SQL keyword—INTERSECT—to link the two queries to get the final answer.

Intersection In SQL — INTERSECT

The Sales Orders Database Schema below:

Sales Orders Example Schema.jpg

Let’s say you’re trying to solve the following seemingly simple problem:

❖ NOTE:

  • It is know in this database that bike products has one of this ids: 1, 2, 6, 11.
  • It is know in this database that helmet products has one of this ids: 10, 25, 26.

Example:

  • Problem statement: “Show me the orders that contain both a bike and a helmet.”
  • Translation: Select the distinct order numbers from the order details table where the product number is in the list of bike and helmet product numbers.
  • SQL:
    SELECT DISTINCT OrderNumber
    FROM Order_Details
    WHERE ProductNumber IN
    (1, 2, 6, 10, 11, 25, 26);
    
  • That seems to do the trick at first, but the answer includes orders that contain either a bike or a helmet.
  • And you really want to find ones that contain both a bike and a helmet! If you visualize orders with bicycles and orders with helmets as two distinct sets, it’s easier to understand the problem.
  • Figure 7-6 below shows one possible relationship between the two sets of orders using a set diagram.

figure 7-6.PNG

  • In Figure 7-6, some orders have a bicycle in the list of products ordered, but no helmet. Some have a helmet, but no bicycle.
  • The overlapping area, or intersection, of the two sets is where you’ll find orders that have both a bicycle and a helmet.
  • Seeing word “both” in your request's problem statement suggests you’re probably going to have to break the solution into separate sets of data and then link the two sets in some way. (Your request also needs to be broken into two parts.)

The Problem After breaking it into two parts:
Problem 1:

  • Problem statement: “Show me the orders that contain a bike.”
  • Translation: Select the distinct order numbers from the order details table where the product number is in the list of bike product numbers
  • SQL:
    SELECT DISTINCT OrderNumber
    FROM Order_Details
    WHERE ProductNumber IN (1, 2, 6, 11);
    

Problem 2:

  • Problem statement: “Show me the orders that contain a helmet.”
  • Translation: Select the distinct order numbers from the order details table where the product number is in the list of helmet product numbers
  • SQL:
    SELECT DISTINCT OrderNumber
    FROM Order_Details
    WHERE ProductNumber IN (10, 25, 26);
    
  • Now you’re ready to get the final solution by using—you guessed it—an intersection of the two sets.
  • Figure 7-8 below shows the SQL syntax diagram that handles this problem. (Note that you can use INTERSECT more than once to combine multiple SELECT statements.)

intersect.PNG

  • You can now take the two parts of your request and link them with an INTERSECT operator to get the correct answer:
  • SQL:
    SELECT DISTINCT OrderNumber
    FROM Order_Details
    WHERE ProductNumber IN (1, 2, 6, 11)
    INTERSECT
    SELECT DISTINCT OrderNumber
    FROM Order_Details
    WHERE ProductNumber IN (10, 25, 26);
    
  • The sad news is that not many commercial implementations of SQL yet support the INTERSECT operator. But all is not lost! I’ll show you an alternative method (JOIN) in The coming article that can solve this type of problem in another way. The good news is that virtually all commercial implementations of SQL do support JOIN.

Problems You Can Solve with an Intersection

  • As you might guess, you can use an intersection to find the matches between two or more sets of information.
  • Here’s just a small sample of the problems you can solve using an intersection technique with data from the sample databases:

    “Show me customers and employees who have the same name.”
    “Find all the customers who ordered a bicycle and also ordered a helmet.” “Show me the students who have an average score of 85 or better in Art and who also have an average score of 85 or better in Computer Science.”
    “Show me the recipes that have beef and garlic.”

❖ NOTE:

  • One of the limitations of using a pure intersection is that the values must match in all the columns in each result set.
  • This works well if you’re intersecting two or more sets from the same table for example, customers who ordered bicycles and customers who ordered helmets.
  • It also works well when you’re intersecting sets from tables that have similar columns for example, customer names and employee names.
  • In many cases, however, you’ll want to find solutions that require a match on only a few column values from each set.
  • For this type of problem, SQL provides an operation called a JOIN an intersection on key values.
  • Here’s a sample of problems you can solve with a JOIN:

    “Show me customers and employees who live in the same city.” (JOIN on the city name.)
    “List customers and the entertainers they booked.” (JOIN on the engagement number.)
    “Show me the students and their teachers who have the same first name.” (JOIN on the first name.)
    “Find the bowlers who are on the same team.” (JOIN on the team ID.)