Eliminating Duplicate Rows

Retrieving Data › Eliminating Duplicate Rows

Eliminating Duplicate Rows

Sometimes a row in a selected column contains information that is the same as information in another row.

Why Duplicates Occur

The EXPERTISE table contains skill IDs, the IDs of employees who have the skills, the level of ability an employee has in a skill, and the date the ability was acquired. Commonwealth Auto may have several employees who match a particular skill in the SKILL table and may have no employees who match another skill.

How It's Done

To obtain a list of skill IDs associated with at least one employee, enter:

select skill_id
       from expertise;

This gives a list of skill IDs that have been matched to employees who have that skill. Any skill that has no employees associated with it will not show up in the result.

The result looks like this:

SKILL_ID -------- 1000 6470 1000 6770 6770 7000 3065 3333 6770 4430 7000 5309 1000 6670 6470 3333 4444 7000 4250 4370 5180 1030 4490 5200 6666 5420

5430 1000 5500 5309 5180 1000 4430 3333 6650 6670 6770 6770 5309 5500 6650 5200 7000 7000 7000 5309 5200 6666 4370 4410 7000 7000 4370 4410 4420 7000 1000 4430 5500 3065 6670 7000 4250 5130 5309 5130 6770 7000 5200 69 rows processed

Why are some of the skill IDs repeated?

The result shows the skill ID for each employee. If more than one employee has that particular skill, the skill ID is repeated.

To see this more clearly, look at the skills and the associated employees by entering:

select skill_id, emp_id
       from expertise;

The result looks like this:

SKILL_ID EMP_ID -------- ------ 1000 1003 6470 1034 1000 1234 6770 1765 6770 2004 7000 2010 3065 2096 3333 2096 6770 2106 4430 2174 7000 2180 5309 2209 1000 2246 6670 2246 6470 2424 3333 2437 4444 2437 7000 2448 4250 2461 4370 2461 5180 2461 1030 2466 4490 2466 5200 2466 6666 2598 5420 2781 5430 2781 1000 2894 5500 2894 5309 3082 5180 3118 1000 3222 4430 3222 3333 3288 6650 3288 6670 3288 6770 3294 6770 3338 5309 3341 5500 3411 6650 3433 5200 3449 7000 3704 7000 3764 7000 3767 5309 3769 5200 3778 6666 3778 4370 3841 4410 3841 7000 3991 7000 4001 4370 4002 4410 4002 4420 4008

7000 4027 1000 4321 4430 4321 5500 4358 3065 4456 6670 4456 7000 4660 4250 4703 5130 4703 5309 4773 5130 4962 6770 5008 7000 5090 5200 5103 69 rows processed

You want to eliminate the duplicate rows resulting from your first SELECT statement in order to see each skill ID only once.

To eliminate these rows, you can use the DISTINCT option immediately after the word SELECT.

How It's Done with DISTINCT

Using the first SELECT statement, add DISTINCT after SELECT:

select distinct skill_id
       from expertise;

The result looks like this:

SKILL_ID -------- 1000 1030 3065 3333 4250 4370 4410 4420 4430 4444 4490 5130 5180 5200 5309 5420 5430 5500 6470 6650 6666 6670 6770 7000 24 rows processed

Now the result shows a list of skill IDs with no duplicates.

Since using DISTINCT eliminates duplicate rows, fewer rows are returned.