Data Wrangling

Expert-defined terms from the Postgraduate Certificate in Data-Driven Science Journalism course at Stanmore School of Business. Free to read, free to share, paired with a professional course.

Data wrangling, also known as data munging, is the process of cleaning and structuring raw data so that it can be analyzed.

This crucial step in the data science workflow involves transforming and mapping data from its raw form into a format that is suitable for analysis. Data wrangling is often considered one of the most time-consuming tasks in data analysis, as raw data is typically messy, inconsistent, and unstructured.

Data wrangling involves several key tasks, including:

1. **Data Cleaning:** Removing or correcting errors, inconsistencies, and missing values in the data.
2. **Data Transformation:** Converting data into a consistent format or structure.
3. **Data Integration:** Combining data from multiple sources into a unified dataset.
4. **Data Reduction:** Reducing the size of the dataset while preserving its informational content.
5. **Data Enrichment:** Adding additional information or attributes to the dataset.
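The five tasks above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a recommended pipeline: the weather readings, the `STATION_REGIONS` lookup, and the function names `wrangle` and `mean_temp_by_station` are all hypothetical, and aggregation is only one of several ways to reduce a dataset.

```python
import csv
import io

# Hypothetical raw data: inconsistent casing, stray whitespace, a missing value.
RAW_READINGS = """station,date,temp_c
 Oslo ,2024-01-01,-3.5
oslo,2024-01-02,
BERGEN,2024-01-01,2.0
Bergen,2024-01-02,3.0
"""

# Hypothetical second source used for the integration step.
STATION_REGIONS = {"oslo": "East", "bergen": "West"}


def wrangle(raw_text, regions):
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Cleaning: skip rows whose measurement is missing.
        if not record["temp_c"].strip():
            continue
        # Transformation: normalise station names and parse numbers.
        station = record["station"].strip().lower()
        temp_c = float(record["temp_c"])
        rows.append({
            "station": station,
            "date": record["date"],
            "temp_c": temp_c,
            # Integration: attach the region from the second source.
            "region": regions.get(station, "unknown"),
            # Enrichment: derive a new attribute from an existing one.
            "temp_f": round(temp_c * 9 / 5 + 32, 1),
        })
    return rows


def mean_temp_by_station(rows):
    # Reduction: collapse daily rows into one summary value per station.
    grouped = {}
    for row in rows:
        grouped.setdefault(row["station"], []).append(row["temp_c"])
    return {station: sum(vals) / len(vals) for station, vals in grouped.items()}


clean_rows = wrangle(RAW_READINGS, STATION_REGIONS)
summary = mean_temp_by_station(clean_rows)
```

Dropping the row with the missing temperature is a deliberate simplification; in a real project you might instead flag it, impute a value, or go back to the source.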

Practical Application

Imagine you have collected data from multiple sources for a research project. The data is in different formats, contains missing values, and has inconsistencies. Before you can analyze the data, you need to clean, transform, and integrate it into a single dataset. This process of data wrangling ensures that the data is accurate, complete, and ready for analysis.
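As a rough sketch of that scenario, the Python example below reads two sources in different formats, brings them to a common shape, and merges them. The sample data, field names, and the functions `load_source_a`, `load_source_b`, and `integrate` are assumptions made up for illustration; the rule that a later source only fills gaps left by an earlier one is one possible policy among many.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source A: CSV with day/month/year dates and a missing value.
SOURCE_A = """city,date,cases
Leeds,03/02/2024,12
York,03/02/2024,
"""

# Hypothetical source B: JSON with ISO dates and different field names.
SOURCE_B = '[{"town": "york", "day": "2024-02-03", "count": 7}]'


def load_source_a(text):
    for rec in csv.DictReader(io.StringIO(text)):
        yield {
            "city": rec["city"].strip().title(),
            # Convert day/month/year into ISO format to match source B.
            "date": datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat(),
            # Keep missing values explicit as None rather than guessing.
            "cases": int(rec["cases"]) if rec["cases"].strip() else None,
        }


def load_source_b(text):
    for rec in json.loads(text):
        yield {
            "city": rec["town"].strip().title(),
            "date": rec["day"],
            "cases": rec["count"],
        }


def integrate(*sources):
    merged = {}
    for source in sources:
        for row in source:
            key = (row["city"], row["date"])
            # Later sources only fill gaps; they never overwrite a known value.
            if key not in merged or merged[key]["cases"] is None:
                merged[key] = row
    return sorted(merged.values(), key=lambda r: (r["city"], r["date"]))


dataset = integrate(load_source_a(SOURCE_A), load_source_b(SOURCE_B))
```

Here the York figure missing from the CSV is supplied by the JSON source, while the Leeds figure is kept as it was.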

Challenges

Data wrangling can be a complex and time-consuming process, as it often involves dealing with large volumes of data and multiple sources. Some of the common challenges in data wrangling include:

1. **Data Quality:** Ensuring the accuracy and completeness of the data.
2. **Data Consistency:** Dealing with inconsistencies and discrepancies in the data.
3. **Data Integration:** Combining data from different sources with varying formats.
4. **Data Scalability:** Handling large datasets efficiently.
5. **Data Security:** Ensuring the confidentiality and integrity of the data throughout the wrangling process.
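Some of these challenges can be made visible with simple checks. The Python sketch below counts rows with missing required fields and exact duplicate rows, and computes a SHA-256 digest that can be compared before and after a wrangling step to detect unintended changes. The function names `quality_report` and `checksum` and the sample rows are hypothetical, and the checks cover only a small part of quality, consistency, and integrity; they do nothing for confidentiality or scalability.

```python
import hashlib
import json


def quality_report(rows, required_fields):
    seen = set()
    report = {"rows": len(rows), "missing": 0, "duplicates": 0}
    for row in rows:
        # Data quality: count rows with an empty or absent required field.
        if any(row.get(field) in (None, "") for field in required_fields):
            report["missing"] += 1
        # Data consistency: count exact repeats of an earlier row.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            report["duplicates"] += 1
        seen.add(fingerprint)
    return report


def checksum(rows):
    # Data integrity: the SHA-256 digest changes if any value changes.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample with one missing value and one duplicate row.
sample = [
    {"city": "Leeds", "cases": 12},
    {"city": "York", "cases": None},
    {"city": "Leeds", "cases": 12},
]
report = quality_report(sample, ["city", "cases"])
```

Running a report like this before and after each step gives a quick sense of whether the wrangling is improving the data or quietly damaging it.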
