Data cleaning is the process of ensuring data is correct, consistent, and usable. You can clean data by identifying errors or corruptions, correcting or deleting them, or manually processing data as needed to prevent the same errors from occurring.
There are many benefits to having clean data:
- It removes major errors and inconsistencies that are inevitable when multiple sources of data are being pulled into one dataset.
- Using tools to clean up data will make everyone on your team more efficient as you'll be able to quickly get what you need from the data available to you.
- Fewer errors mean happier customers and fewer frustrated employees.
- It allows you to map different data functions to better understand what your data is intended to do, and learn where it is coming from.
This is a housing dataset from a state in the US. Various cleaning operations were performed on this dataset, such as:
- Standardizing the date format
- Populating the property address data
- Breaking out the address into individual columns (address, city, state)
- Changing Y and N to Yes and No in the "Sold as Vacant" field
- Removing duplicate columns
- Deleting unused columns
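
As a rough illustration of the first four operations listed above, here is a minimal sketch that runs the SQL through Python's built-in `sqlite3` module. It is not the project's actual script: the table name `housing`, the column names, and the sample rows are all hypothetical, and the SQL is written in the SQLite dialect, so other databases may need different date and string functions. The sample addresses contain only a street and a city; a state part could be split out in the same way.

```python
import sqlite3

# Hypothetical table, column names, and rows; the real dataset's schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE housing (
    unique_id        INTEGER PRIMARY KEY,
    parcel_id        TEXT,
    sale_date        TEXT,
    property_address TEXT,
    sold_as_vacant   TEXT
);
INSERT INTO housing VALUES
    (1, 'P-100', '2013-04-09 00:00:00', '12 OAK ST, SPRINGFIELD', 'N'),
    (2, 'P-100', '2014-06-10 00:00:00', NULL, 'Y'),
    (3, 'P-200', '2015-01-22 00:00:00', '7 ELM AVE, RIVERTON', 'No');
""")

conn.executescript("""
-- 1. Standardize the date format: keep only the YYYY-MM-DD part.
UPDATE housing SET sale_date = date(sale_date);

-- 2. Populate missing addresses from another row with the same parcel
--    (assumption: one parcel ID always maps to one address).
UPDATE housing
SET property_address = (
    SELECT b.property_address
    FROM housing AS b
    WHERE b.parcel_id = housing.parcel_id
      AND b.property_address IS NOT NULL
    LIMIT 1
)
WHERE property_address IS NULL;

-- 3. Break the address into separate columns at the first comma.
ALTER TABLE housing ADD COLUMN street TEXT;
ALTER TABLE housing ADD COLUMN city TEXT;
UPDATE housing
SET street = trim(substr(property_address, 1, instr(property_address, ',') - 1)),
    city   = trim(substr(property_address, instr(property_address, ',') + 1));

-- 4. Change Y and N to Yes and No, leaving other values untouched.
UPDATE housing
SET sold_as_vacant = CASE sold_as_vacant
    WHEN 'Y' THEN 'Yes'
    WHEN 'N' THEN 'No'
    ELSE sold_as_vacant
END;
""")

rows = conn.execute(
    "SELECT unique_id, sale_date, street, city, sold_as_vacant "
    "FROM housing ORDER BY unique_id"
).fetchall()
for row in rows:
    print(row)
```

Each step is a plain `UPDATE`, so the statements can be run and checked one at a time. Filling the missing address by parcel ID is only safe if that assumption holds for the data at hand, so it is worth verifying before running the update on a real table.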
The full article can be found here: https://medium.com/@princedanny922/data-cleaning-with-sql-beda03968da6