Course
Big Data Fundamentals with PySpark
Course Description
Prerequisites
Introduction to Python
Chapters
Introduction to Big Data analysis with Spark
Programming in PySpark RDDs
PySpark SQL & DataFrames
Machine Learning with PySpark MLlib
Complete
Earn Statement of Accomplishment
Add this credential to your LinkedIn profile, resume, or CV.
Share it on social media and in your performance review.
FAQs
Do I need prior Big Data experience for this course?
No. This is a beginner-level course. You only need basic Python knowledge, and the course will introduce Big Data concepts and Spark from the ground up.
What PySpark libraries does this course cover?
You will use PySpark core for RDD programming, Spark SQL for structured data queries, and MLlib for basic machine learning tasks.
What datasets are used in the exercises?
You will analyze works of William Shakespeare, explore FIFA 2018 data, and perform clustering on genomic datasets.
What jobs use PySpark skills?
Data engineers, big data developers, and machine learning engineers use PySpark to process and analyze datasets that are too large to fit in a single machine's memory.
How is the course structured?
The course has 4 chapters and 55 exercises covering Big Data fundamentals, RDD programming, Spark SQL, and machine learning with MLlib.