
Python Sets and Set Theory Tutorial

Learn about Python sets: what they are, how to create them, when to use them, built-in functions and their relationship to set theory operations.
Updated Dec 2022  · 13 min read

Python Sets vs Lists and Tuples

Lists and tuples are standard Python data types that store values in a sequence. Sets are another standard Python data type that also stores values. The major difference is that sets, unlike lists or tuples, cannot contain multiple occurrences of the same element, and they store their values in no particular order.
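A quick way to see both differences is to pass the same values to list() and set(). This is a minimal sketch; the skills values are illustrative and not part of the original tutorial.

```python
# Illustrative list of skills that contains 'Python' twice
skills = ['Python', 'SQL', 'Python', 'Git']

skills_list = list(skills)  # keeps every occurrence, in insertion order
skills_set = set(skills)    # keeps one copy of each value, in no guaranteed order

print(len(skills_list))  # 4
print(len(skills_set))   # 3
print(skills_set == {'Git', 'Python', 'SQL'})  # True
```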

Advantages of a Python Set

Because sets cannot contain multiple occurrences of the same element, they are highly useful for efficiently removing duplicate values from a list or tuple and for performing common math operations like unions and intersections.

If you'd like to sharpen your Python skills, or you're just a beginner, be sure to take a look at our Python Programmer career track on DataCamp.

With that, let's get started.

How to Create a Set in Python

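A set is a mutable, unordered collection of distinct (unique) values. The values themselves must be hashable, which in practice means immutable types such as numbers, strings, and tuples of immutable values.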

You can initialize an empty set by using set().

emptySet = set()

To initialize a set with values, you can pass in a list to set().

dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])
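set() is not limited to lists: it accepts any iterable, and duplicates collapse to a single element just as they do with a list. The variable names below are illustrative.

```python
# A tuple works the same way as a list
fromTuple = set(('Python', 'R', 'Python'))
# A string is an iterable of characters, so each character becomes an element
fromString = set('banana')
# range() objects are iterables too
fromRange = set(range(3))

print(sorted(fromTuple))   # ['Python', 'R']
print(sorted(fromString))  # ['a', 'b', 'n']
print(sorted(fromRange))   # [0, 1, 2]
```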



If you print dataScientist and dataEngineer, you'll notice that the values are not displayed in the order they were added. This is because sets are unordered.

Sets containing values can also be initialized by using curly braces.

dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}


Keep in mind that curly braces can only be used to initialize a set that contains values. Empty curly braces, {}, create an empty dictionary, not a set, so use set() when you need an empty set.

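Checking the types directly makes the difference between {} and set() concrete. A minimal sketch with illustrative variable names:

```python
emptyBraces = {}
emptySet = set()

print(type(emptyBraces))  # <class 'dict'>
print(type(emptySet))     # <class 'set'>

# Curly braces that contain values do create a set
print(type({'Python'}))   # <class 'set'>
```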

Add and Remove Values from Python Sets

To add or remove values from a set, you first have to initialize a set.

# Initialize set with values
graphicDesigner = {'InDesign', 'Photoshop', 'Acrobat', 'Premiere', 'Bridge'}

Add Values to a Python Set

You can use the method add to add a value to a set.

graphicDesigner.add('Illustrator')


It is important to note that you can only add hashable values (such as strings, numbers, or tuples of immutable values) to a set. For example, you will get a TypeError if you try to add a list, because lists are mutable and therefore unhashable.

# Raises TypeError: a list cannot be a set element
graphicDesigner.add(['Powerpoint', 'Blender'])

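If you want to store a group of values as a single element, one option is to convert the list to a tuple, which is hashable as long as everything inside it is hashable. A minimal sketch that catches the error and then adds a tuple instead:

```python
graphicDesigner = {'InDesign', 'Photoshop', 'Acrobat', 'Premiere', 'Bridge'}

addFailed = False
try:
    # Lists are mutable and therefore unhashable, so this raises TypeError
    graphicDesigner.add(['Powerpoint', 'Blender'])
except TypeError as error:
    addFailed = True
    print('TypeError:', error)

# A tuple of strings is hashable, so it is stored as one element
graphicDesigner.add(('Powerpoint', 'Blender'))
print(('Powerpoint', 'Blender') in graphicDesigner)  # True
print(len(graphicDesigner))  # 6
```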

Remove Values from Sets in Python

There are a couple of ways to remove a value from a set.

Option 1: You can use the remove method to remove a value from a set.

graphicDesigner.remove('Illustrator')


The drawback of this method is that if you try to remove a value that is not in your set, you will get a KeyError.


Option 2: You can use the discard method to remove a value from a set.

graphicDesigner.discard('Premiere')


The benefit of this approach over the remove method is that if you try to remove a value that is not part of the set, you will not get a KeyError; the set is simply left unchanged. If you are familiar with dictionaries, this is similar to how the dictionary method get avoids raising a KeyError for a missing key.
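The difference between remove and discard is easiest to see side by side. A minimal sketch using an illustrative set named tools:

```python
tools = {'InDesign', 'Photoshop'}

# discard() silently ignores a value that is not in the set
tools.discard('Blender')

# remove() raises KeyError for the same missing value
removeFailed = False
try:
    tools.remove('Blender')
except KeyError as error:
    removeFailed = True
    print('KeyError:', error)  # KeyError: 'Blender'

print(sorted(tools))  # ['InDesign', 'Photoshop']
```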

Option 3: You can also use the pop method to remove and return an arbitrary value from a set.

graphicDesigner.pop()


It is important to note that pop raises a KeyError if the set is empty.
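One common way to avoid that KeyError is to check the set first, since an empty set is falsy. A minimal sketch with an illustrative one-element set:

```python
tools = {'Bridge'}

# With a single element, pop() can only return that element
removed = tools.pop()
print(removed)  # Bridge

# The set is now empty (and falsy), so the guard skips pop()
if tools:
    tools.pop()
else:
    print('Nothing left to pop')
```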

Remove All Values from a Python Set

You can use the clear method to remove all values from a set.

graphicDesigner.clear()


Update Python Set Values

The update method adds the elements of another iterable to a set. Its argument can be a set, list, tuple, or dictionary (for a dictionary, only the keys are added), and you can pass more than one. Values that are already in the set are ignored, since a set cannot contain duplicates.

In the example, we initialize three sets and use the update method to add the elements of set2 to set1, and then the elements of set3 to set1.

# Initialize 3 sets
set1 = set([7, 10, 11, 13])
set2 = set([11, 8, 9, 12, 14, 15])
set3 = {'d', 'f', 'h'}

# Update set1 with set2
set1.update(set2)
print(set1)

# Update set1 with set3
set1.update(set3)
print(set1)

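update can also take several iterables in one call, and the in-place operator |= is its operator form (its right operand must be a set or frozenset). A minimal sketch with illustrative skill values:

```python
skills = {'Python'}

# Several iterables in one call; the dict contributes its keys only
skills.update(['SQL', 'Git'], ('R',), {'Tableau': 5})
print(sorted(skills))  # ['Git', 'Python', 'R', 'SQL', 'Tableau']

# |= updates the set in place, like update() with a single set argument
skills |= {'SAS'}
print('SAS' in skills)  # True
```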

Iterate through a Python Set

Like many standard Python data types, sets can be iterated over with a for loop.

# Initialize a set
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}

for skill in dataScientist:
    print(skill)


When you run this loop, notice that the values are not printed in the order they were added. This is because sets are unordered. For sets of strings, the order can even change between separate runs of a program, because Python randomizes string hashing by default.

Transform a Python Set into Ordered Values

This tutorial has emphasized that sets are unordered. If you need the values from your set in a particular order, you can use the sorted function, which returns a new, ordered list.

type(sorted(dataScientist))


The code below outputs the values in the set dataScientist in descending alphabetical order (Z-A in this case).

sorted(dataScientist, reverse=True)

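sorted compares strings by Unicode code point, so uppercase letters come before lowercase ones. The values in dataScientist all start with a capital letter, but if a set mixes cases, passing key=str.lower gives a case-insensitive order. A minimal sketch with illustrative tool names:

```python
tools = {'git', 'Python', 'SQL', 'awk'}

# Default ordering: 'A'-'Z' (code points 65-90) sort before 'a'-'z' (97-122)
print(sorted(tools))                 # ['Python', 'SQL', 'awk', 'git']

# Case-insensitive ordering: compare lowercased copies, return the originals
print(sorted(tools, key=str.lower))  # ['awk', 'git', 'Python', 'SQL']
```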

Remove Duplicates from a List in Python

Part of the content in this section was previously explored in the tutorial 18 Most Common Python List Questions, but it is worth emphasizing that a set is a simple and fast way to remove duplicates from a list, although the result does not keep the list's original order. To show this, let's compare the performance of two approaches.

Approach 1: Use a set to remove duplicates from a list.

print(list(set([1, 2, 3, 1, 7])))

Approach 2: Build a new list by hand, appending each value only if it is not already in the new list.

def remove_duplicates(original):
    unique = []
    for n in original:
        # 'n not in unique' scans the list, so each check gets slower as unique grows
        if n not in unique:
            unique.append(n)
    return unique

print(remove_duplicates([1, 2, 3, 1, 7]))

The performance difference can be measured using the timeit module, which lets you time small snippets of Python code. The code below runs each approach 10,000 times and prints the total time taken in seconds.

import timeit

# Approach 1: Execution time
print(timeit.timeit('list(set([1, 2, 3, 1, 7]))', number=10000))

# Approach 2: Execution time
print(timeit.timeit('remove_duplicates([1, 2, 3, 1, 7])', globals=globals(), number=10000))


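Comparing these two approaches shows that using a set to remove duplicates is faster. The gap may look small for a five-element list, but it grows quickly with list size: building a set takes roughly linear time on average, while the list-based approach checks each value against every value kept so far, which takes quadratic time in the worst case. Exact timings depend on your machine.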
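Because converting to a set does not preserve order, you may need a different technique when order matters. One alternative (not a set-based approach) is dict.fromkeys, which relies on dictionaries preserving insertion order in Python 3.7 and later. A minimal sketch with illustrative values:

```python
values = [3, 1, 3, 2, 1]

# Set-based approach: duplicates removed, but original order is not guaranteed
print(sorted(set(values)))          # [1, 2, 3]

# dict.fromkeys keeps the first occurrence of each value, in order
print(list(dict.fromkeys(values)))  # [3, 1, 2]
```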

Python Set Operations

A common use of sets in Python is computing standard math operations such as union, intersection, difference, and symmetric difference. The image below shows a couple of standard math operations on two sets A and B. The red part of each Venn diagram is the resulting set of a given set operation.

[Venn diagrams: common set operations on two sets A and B, with each resulting set shaded red]

Python sets have methods that allow you to perform these mathematical operations as well as operators that give you equivalent results.
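One practical difference between the two forms: the methods accept any iterable, while the operators require both operands to be sets (or frozensets). A minimal sketch:

```python
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])

# The method accepts a list directly
print(sorted(dataScientist.union(['Java'])))
# ['Git', 'Java', 'Python', 'R', 'SAS', 'SQL', 'Tableau']

# The operator form rejects a list operand
operatorFailed = False
try:
    dataScientist | ['Java']
except TypeError as error:
    operatorFailed = True
    print('TypeError:', error)
```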

Before exploring these methods, let's start by initializing two sets dataScientist and dataEngineer.

dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

union

A union, denoted dataScientist ∪ dataEngineer, is the set of all values that are values of dataScientist, or dataEngineer, or both. You can use the union method to find out all the unique values in two sets.

# Union using the set method
dataScientist.union(dataEngineer)

# Equivalent Result
dataScientist | dataEngineer

The set returned from the union can be visualized as the red part of the Venn diagram below.

[Venn diagram: dataScientist ∪ dataEngineer shaded red]
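For the two example sets, the union has 9 values: 6 + 6, minus the 3 values they share. union also accepts more than one argument; 'Excel' below is a hypothetical extra skill used only for illustration.

```python
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])

allSkills = dataScientist | dataEngineer
# 'Python', 'SQL', and 'Git' appear in both sets but are counted once
print(len(allSkills))  # 9

# union() can combine several iterables in one call
withExcel = dataScientist.union(dataEngineer, ['Excel'])
print(len(withExcel))  # 10
```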

intersection

An intersection of two sets dataScientist and dataEngineer, denoted dataScientist ∩ dataEngineer, is the set of all values that are values of both dataScientist and dataEngineer.

# Intersection operation
dataScientist.intersection(dataEngineer)

# Equivalent Result
dataScientist & dataEngineer


The set returned from the intersection can be visualized as the red part of the Venn diagram below.

[Venn diagram: dataScientist ∩ dataEngineer shaded red]
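intersection also accepts several arguments and keeps only the values common to all of them. dataAnalyst below is a hypothetical third set added for illustration.

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}
dataAnalyst = {'SQL', 'Excel', 'Python', 'Tableau'}  # hypothetical

print(sorted(dataScientist & dataEngineer))  # ['Git', 'Python', 'SQL']

# Values shared by all three sets
print(sorted(dataScientist.intersection(dataEngineer, dataAnalyst)))  # ['Python', 'SQL']
```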

You may come across a case where you want to make sure that two sets have no values in common. In other words, you want two sets whose intersection is empty. Such sets are called disjoint sets. You can test for disjoint sets by using the isdisjoint method.

# Initialize a set
graphicDesigner = {'Illustrator', 'InDesign', 'Photoshop'}

# These sets have elements in common so it would return False
dataScientist.isdisjoint(dataEngineer)

# These sets have no elements in common so it would return True
dataScientist.isdisjoint(graphicDesigner)


The Venn diagram below shows that the disjoint sets dataScientist and graphicDesigner have no values in common.

[Venn diagram: disjoint sets dataScientist and graphicDesigner, with no overlap]

Difference

A difference of two sets dataScientist and dataEngineer, denoted dataScientist \ dataEngineer, is the set of all values of dataScientist that are not values of dataEngineer.

# Difference Operation
dataScientist.difference(dataEngineer)

# Equivalent Result
dataScientist - dataEngineer


The set returned from the difference can be visualized as the red part of the Venn diagram below.

[Venn diagram: dataScientist \ dataEngineer shaded red]
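Unlike union and intersection, difference is not symmetric: swapping the operands gives a different result. A quick check with the example sets:

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

print(sorted(dataScientist - dataEngineer))  # ['R', 'SAS', 'Tableau']
print(sorted(dataEngineer - dataScientist))  # ['Hadoop', 'Java', 'Scala']
```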

Symmetric Difference

A symmetric difference of two sets dataScientist and dataEngineer, denoted dataScientist △ dataEngineer, is the set of all values that are values of exactly one of two sets, but not both.

# Symmetric Difference Operation
dataScientist.symmetric_difference(dataEngineer)

# Equivalent Result
dataScientist ^ dataEngineer


The set returned from the symmetric difference can be visualized as the red part of the Venn diagram below.

[Venn diagram: dataScientist △ dataEngineer shaded red]
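The symmetric difference equals the union minus the intersection, or equivalently the union of the two one-sided differences. A quick check with the example sets:

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

symDiff = dataScientist ^ dataEngineer

# Union minus intersection
print(symDiff == (dataScientist | dataEngineer) - (dataScientist & dataEngineer))  # True
# Union of the two one-sided differences
print(symDiff == (dataScientist - dataEngineer) | (dataEngineer - dataScientist))  # True

print(sorted(symDiff))  # ['Hadoop', 'Java', 'R', 'SAS', 'Scala', 'Tableau']
```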

Set Comprehension

You may have previously learned about list comprehensions, dictionary comprehensions, and generator expressions. Python also has set comprehensions, which work in a very similar way. Set comprehensions in Python can be constructed as follows:

{skill for skill in ['SQL', 'SQL', 'PYTHON', 'PYTHON']}


The output above is a set of 2 values because sets cannot have multiple occurrences of the same element.
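Like list comprehensions, set comprehensions can transform each value as they go. Here, normalizing case before deduplicating collapses spellings that differ only in capitalization; the input list is illustrative.

```python
rawSkills = ['sql', 'SQL', 'Python', 'python', 'Git']

# Upper-case each value first; the set then keeps one copy of each
normalized = {skill.upper() for skill in rawSkills}
print(sorted(normalized))  # ['GIT', 'PYTHON', 'SQL']
```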

The idea behind using set comprehensions is to let you write and reason in code the same way you would do mathematics by hand.

{skill for skill in ['GIT', 'PYTHON', 'SQL'] if skill not in {'GIT', 'PYTHON', 'JAVA'}}
 

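The code above produces the same result as the set difference {'GIT', 'PYTHON', 'SQL'} - {'GIT', 'PYTHON', 'JAVA'} that you learned about earlier: both give {'SQL'}. The comprehension simply expresses the difference as a filter.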

Membership Tests

Membership tests check whether a specific element is contained in a collection, such as a string, list, tuple, or set. One of the main advantages of sets is that they are highly optimized for membership tests; for example, sets perform membership tests much more efficiently than lists. If you have a computer science background, this is because the average-case time complexity of a membership test is O(1) for a set (a hash table lookup) versus O(n) for a list, which may need to scan every element.

The code below shows a membership test using a list.

# Initialize a list
possibleList = ['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala']

# Membership test
'Python' in possibleList


Something similar can be done for sets. Sets just happen to be more efficient.

# Initialize a set
possibleSet = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala'}

# Membership test
'Python' in possibleSet


Since possibleSet is a set and the value 'Python' is a value of possibleSet, this can be denoted as 'Python' ∈ possibleSet.

If you had a value that wasn't part of the set, like 'Fortran', it would be denoted as 'Fortran' ∉ possibleSet.
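To see the efficiency gap yourself, you can time repeated lookups with timeit. This is a rough sketch: the 10,000-integer collection and the repeat count are arbitrary choices, and the absolute timings depend on your machine.

```python
import timeit

# Hypothetical collection of 10,000 integers stored both ways
numbersList = list(range(10_000))
numbersSet = set(numbersList)

# 9_999 is the last list element, so the list lookup scans every item
listTime = timeit.timeit(lambda: 9_999 in numbersList, number=1_000)
# The set lookup hashes the value and checks a single bucket on average
setTime = timeit.timeit(lambda: 9_999 in numbersSet, number=1_000)

print(f'list: {listTime:.4f} s')
print(f'set:  {setTime:.4f} s')

# 'not in' is the code form of the ∉ notation
print(10_000 not in numbersSet)  # True
```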

Subset

A practical application of understanding membership is subsets.

Let's first initialize two sets.

possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}
 

If every value of the set mySkills is also a value of the set possibleSkills, then mySkills is said to be a subset of possibleSkills, mathematically written mySkills ⊆ possibleSkills.

You can check to see if one set is a subset of another using the method issubset.

mySkills.issubset(possibleSkills)


Since the method returns True in this case, it is a subset. In the Venn diagram below, notice that every value of the set mySkills is also a value of the set possibleSkills.

[Venn diagram: mySkills contained inside possibleSkills]
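Sets also support operator forms for these checks: <= tests for a subset, and < tests for a proper subset (a subset that is not equal to the other set). issuperset checks the same relationship from the other direction.

```python
possibleSkills = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
mySkills = {'Python', 'R'}

print(mySkills <= possibleSkills)           # True, same as issubset
print(mySkills < possibleSkills)            # True, proper subset
print(possibleSkills < possibleSkills)      # False, a set is not a proper subset of itself
print(possibleSkills.issuperset(mySkills))  # True
```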

Frozensets

You may have encountered nested lists and tuples.

# Nested Lists and Tuples
nestedLists = [['the', 12], ['to', 11], ['of', 9], ['and', 7], ['that', 6]]
nestedTuples = (('the', 12), ('to', 11), ('of', 9), ('and', 7), ('that', 6))


The problem is that you cannot normally nest Python sets: set elements must be hashable, and a regular set is mutable and therefore unhashable, so a set cannot contain another set.

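Trying to build a set of sets fails as soon as Python tries to hash the inner set. A minimal sketch with illustrative values:

```python
nestingFailed = False
try:
    # The inner sets are unhashable, so building the outer set raises TypeError
    nested = {{'Python', 'R'}, {'SQL'}}
except TypeError as error:
    nestingFailed = True
    print('TypeError:', error)
```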

This is one situation where you may wish to use a frozenset. A frozenset is very similar to a set except that a frozenset is immutable.

You make a frozenset by using frozenset().

# Initialize a frozenset
immutableSet = frozenset()


You can make a nested set if you utilize a frozenset similar to the code below.

nestedSets = set([frozenset()])

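Because frozensets are hashable, they can be stored in a set, and two frozensets with the same values are equal regardless of the order in which those values were listed. The skill pairs below are illustrative.

```python
# A set whose elements are frozensets
skillPairs = {frozenset(['Python', 'SQL']), frozenset(['R', 'SAS'])}

# Membership does not depend on the order the values were listed in
print(frozenset(['SQL', 'Python']) in skillPairs)  # True
print(len(skillPairs))  # 2
```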

It is important to keep in mind that a major disadvantage of frozensets is that, since they are immutable, you cannot add or remove values.
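Methods that would modify a frozenset, such as add, simply do not exist on it, but non-mutating set operations still work and return a new frozenset. A minimal sketch:

```python
immutableSet = frozenset(['Python', 'SQL'])

addFailed = False
try:
    # frozenset has no add() method
    immutableSet.add('R')
except AttributeError as error:
    addFailed = True
    print('AttributeError:', error)

# Set operators build a new frozenset and leave the original unchanged
extended = immutableSet | {'R'}
print(type(extended).__name__)  # frozenset
print(sorted(extended))         # ['Python', 'R', 'SQL']
print(sorted(immutableSet))     # ['Python', 'SQL']
```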

Conclusion

Python sets are highly useful for efficiently removing duplicate values from a collection like a list and for performing common math operations like unions and intersections. One challenge people often encounter is knowing when to use the various data types. For example, if you aren't sure when it is advantageous to use a dictionary versus a set, I encourage you to check out DataCamp's daily practice mode. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below or through Twitter.
