Cleaning Data in R
Course Description
Overcome Common Data Problems Like Removing Duplicates in R
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. The time spent cleaning is vital since analyzing dirty data can lead you to draw inaccurate conclusions.

In this course, you’ll learn a variety of techniques to help you clean dirty data using R. You’ll start by converting data types, applying range constraints, and dealing with full and partial duplicates to avoid double-counting.
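As a rough illustration of those first three steps, here is a minimal base R sketch. The `rides` data frame, its column names, and the thresholds are all made up for this example; the course itself may use different packages and data.

```r
# Hypothetical ride data; every column name and value here is invented.
rides <- data.frame(
  ride_id  = c(1, 2, 2, 3, 4),
  duration = c("12 minutes", "45 minutes", "45 minutes",
               "30 minutes", "2000 minutes"),
  rating   = c(4, 5, 5, 7, 3),
  stringsAsFactors = FALSE
)

# 1. Convert data types: strip the unit text, then convert to numeric.
rides$duration_min <- as.numeric(sub(" minutes", "", rides$duration, fixed = TRUE))

# 2. Apply range constraints: ratings are assumed to run from 1 to 5,
#    so anything outside that range is treated as missing.
rides$rating[rides$rating < 1 | rides$rating > 5] <- NA

#    Assume a ride cannot last longer than 24 hours (1440 minutes).
rides$duration_min[rides$duration_min > 1440] <- NA

# 3. Remove full duplicates: rows that are identical in every column.
rides_clean <- rides[!duplicated(rides), ]

print(rides_clean)
```

Replacing out-of-range values with `NA` is only one option; depending on the data, dropping the row or capping the value at the limit may be more appropriate.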
Delve into Advanced Data Challenges
Once you’ve practiced working on common data issues, you’ll move on to more advanced challenges such as ensuring consistency in measurements and dealing with missing data. After every new concept, you’ll have the chance to complete a hands-on exercise to cement your knowledge and build your experience.
Learn to Use Record Linkage During Data Cleaning
Record linkage is used to merge datasets when the values have issues such as typos or different spellings. You’ll explore this useful technique in the final chapter and practice applying it by joining two restaurant review datasets into a single dataset.
Prerequisites
Joining Data with dplyr
Common Data Problems
Categorical and Text Data
Advanced Data Problems
Record Linkage
FAQs
Why is data cleaning important?
Cleaning data is an essential part of the data management process. It ensures that you are working with relevant data, in a standardized format, and will derive accurate insights instead of harming your analysis by including duplicates, errors, or synonyms within the dataset. Working with clean data can help improve efficiency, reduce overall costs, and increase ROI for decisions made based on your data.
What is record linkage?
Record linkage is sometimes called entity resolution or data matching. It’s a useful technique for finding records that refer to the same subject but are written differently and share no common identifier. For example, your dataset might include people who live in NY and New York; you would want to treat these as the same place rather than counting them as two separate places.
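To make the idea concrete, here is a small sketch of matching names across two sources by edit distance, using base R's `adist()`. The restaurant names and the distance threshold are invented for illustration, and this is a simplified stand-in for record linkage rather than the specific method the course teaches.

```r
# Hypothetical restaurant names from two review sources (made-up data).
source_a <- c("Joe's Pizza", "Golden Dragon", "Cafe Luna")
source_b <- c("Joes Pizza", "Cafe Lunna", "Golden Dragn")

# Edit distance between every pair of names
# (rows follow source_a, columns follow source_b).
distances <- adist(source_a, source_b, ignore.case = TRUE)

# For each name in source_a, find the closest name in source_b.
best <- apply(distances, 1, which.min)

matches <- data.frame(
  name_a   = source_a,
  name_b   = source_b[best],
  distance = distances[cbind(seq_along(source_a), best)],
  stringsAsFactors = FALSE
)

# Keep only pairs within an arbitrary threshold of 2 edits.
linked <- matches[matches$distance <= 2, ]

print(linked)
```

Edit distance works well for typos and small spelling differences. Abbreviations such as NY for New York are far apart by this measure, so they are usually handled with an explicit lookup table instead.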
Who needs to learn how to clean data?
Data cleaning is usually carried out by data engineers, data managers, or data quality analysts. However, it’s a useful skill set for anybody who uses data for analysis and decision making on a regular basis, such as managers, marketers, finance professionals, and HR professionals. Learning data cleaning approaches and techniques will also help you spot poor data and prepare your data properly for analysis.
Is this course suitable for beginners?
This course is not suitable for complete beginners. You will need introductory R knowledge, and we recommend taking the Joining Data with dplyr course first to get the most out of this one.