
Mapping Values with Maps and applymap() in Pandas

Updated: at 04:28 AM

Mapping values is a common data transformation task when working with Pandas DataFrames. Mapping allows you to convert values from one representation to another in a vectorized manner across an entire DataFrame or Series. This can be useful for handling missing data, standardizing data formats, encoding categorical variables, and more.

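Pandas provides two main methods for mapping values: map() and applymap(). The map() method works on a Series, which includes a single DataFrame column, while applymap() applies a function element-wise to an entire DataFrame. (Since pandas 2.1, applymap() is deprecated in favor of the equivalent DataFrame.map().) This guide covers when to use each method, how to map with dictionaries, Series, and functions, and how to handle performance, missing values, and unhashable types.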

By the end, you’ll have the skills to leverage the power and flexibility of Pandas mapping for your own data projects.


Mapping Values in Pandas

Mapping refers to the process of converting values from one representation to another. With Pandas map() and applymap(), the transformation is applied to an entire Series or DataFrame in a single call, without writing an explicit loop [1]. Dictionary and Series lookups are handled internally by Pandas, while a Python function is still called once per element.

For example, let’s say we have a Pandas Series of strings representing yes/no responses:

import pandas as pd

responses = pd.Series(["yes", "no", "maybe", "yes", "no", "yes"])

We could map these to boolean values True/False using a dictionary mapping:

mapping = {"yes": True, "no": False}

responses.map(mapping)

This converts the strings to booleans element-wise across the entire Series. Because "maybe" has no entry in the dictionary, it becomes NaN.
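As a runnable sketch of the example above (the variable name `mapped` is only for illustration), the following shows what happens to a value that has no entry in the dictionary:

```python
import pandas as pd

responses = pd.Series(["yes", "no", "maybe", "yes", "no", "yes"])
mapping = {"yes": True, "no": False}

# Values without a key in the mapping ("maybe") become NaN
mapped = responses.map(mapping)

print(mapped.tolist())
print(int(mapped.isna().sum()))  # number of unmapped responses
```

Counting the NaN values after a dictionary mapping is a cheap way to spot keys that the mapping forgot.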

When to Use map() vs applymap()

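The map() and applymap() methods in Pandas have similar use cases but differ in scope. Series.map() transforms the values of a single Series or DataFrame column and accepts a dictionary, a Series, or a function. DataFrame.applymap() applies a function to every element of a DataFrame.

So map() is intended for column-wise transformations, while applymap() works on the full DataFrame.

Use map() when you need to translate the values of one column, especially through a lookup table.

Use applymap() when the same function should be applied to every cell of a DataFrame.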

In practice, map() tends to see more common use for simpler column lookups. But applymap() provides more flexibility when the mapping relies on computation.
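A minimal side-by-side sketch of the two scopes follows. The `elementwise` helper name is hypothetical; it is there because DataFrame.map() was added in pandas 2.1 as the replacement for applymap(), so the code picks whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Series.map: transform one column with a lookup table
labels = df["A"].map({1: "low", 2: "high"})

# Element-wise over the whole frame. DataFrame.map exists from pandas 2.1;
# older versions only have applymap, so use whichever is available.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

print(labels.tolist())          # ['low', 'high']
print(doubled.values.tolist())  # [[2, 6], [4, 8]]
```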

Mapping with Dictionaries and Series

The easiest way to map values is with a dictionary passed to Series.map(). Note that applymap() accepts only a function, so a dictionary has to be wrapped in one (for example, via its get method).

For example:

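ages = {"Tom": 30, "Jenny": 20, "Ann": 25}
df = pd.DataFrame(list(ages.items()), columns=["Name", "Age"])

# Keys must match the column's values and type (integers here, not strings)
df["Age Category"] = df["Age"].map({20: "Twenties", 25: "Twenties", 30: "Thirties"})

Here we map the numeric ages to age category strings using the dictionary. The keys are integers because the Age column holds integers; string keys such as "20" would match nothing and produce NaN.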

A Series can also be used to define mappings:

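# The index holds the values to look up; the Series values hold the results
categories = pd.Series(["Twenties", "Twenties", "Thirties"], index=[20, 25, 30])

df["Age Category"] = df["Age"].map(categories)

map() looks each value up in the index of the Series and returns the corresponding Series value. This can be easier to maintain than a dictionary when the lookup table already lives in a DataFrame or a file.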
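The two lookup styles can be compared in one self-contained sketch. The names `by_dict`, `by_series`, and `lookup` are illustrative, and the example assumes the ages are stored as integers:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jenny", "Ann"], "Age": [30, 20, 25]})

# Dictionary lookup: keys must have the same type as the column values (ints here)
by_dict = df["Age"].map({20: "Twenties", 25: "Twenties", 30: "Thirties"})

# Series lookup: the index holds the keys, the values hold the results
lookup = pd.Series(["Twenties", "Twenties", "Thirties"], index=[20, 25, 30])
by_series = df["Age"].map(lookup)

print(by_dict.tolist())    # ['Thirties', 'Twenties', 'Twenties']
print(by_series.tolist())  # ['Thirties', 'Twenties', 'Twenties']
```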

Mapping with Functions

For more complex transformations, functions can be used with map() and applymap().

For example, we can write a function to bucket ages into categories:

def categorize_age(age):
    if age < 20:
        return "Teen"
    elif age < 30:
        return "Twenties"
    else:
        return "Thirties"

df["Age Category"] = df["Age"].map(categorize_age)

Here map() will call the categorize_age function on each value in the Age column.

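With applymap() we can use a function to transform each element of a DataFrame. The function must be valid for every cell, so select the numeric columns first:

def square(x):
    return x**2

squared = df[["Age"]].applymap(square)  # square each value in the numeric column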

So map() is great when transforming columns, while applymap() allows full DataFrame transformations.

Using map() with Series and DataFrame Columns

The map() method can be used with both Series and DataFrame columns.

With a Series, map() will transform each element:

s = pd.Series([1, 2, 3])
s.map({1: "one", 2: "two", 3: "three"})

# 0      one
# 1      two
# 2    three
# dtype: object

For a DataFrame column, map() operates on each element in that column:

df = pd.DataFrame({"Values": [1, 2, 3]})

df["Mapped"] = df["Values"].map({1: "one", 2: "two"})  # 3 has no key, so it becomes NaN

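map() itself handles one column at a time, and DataFrame.map() expects a function rather than a dictionary of dictionaries. To apply different mappings to several columns, loop over the per-column dictionaries:

mappings = {"Values": {1: "one"}, "OtherCol": {1: "uno"}}
df = pd.DataFrame({"Values": [1, 2, 3], "OtherCol": [1, 2, 3]})

for col, mapping in mappings.items():
    df[col] = df[col].map(mapping)  # unmatched values become NaN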

So map() provides a flexible, column-oriented mapping mechanism.
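As far as I know, DataFrame.map() expects a callable rather than a dictionary of per-column dictionaries, so the sketch below shows two ways to apply per-column mappings. It swaps in DataFrame.replace() as an alternative to map(); the `OtherCol` column is added here only so the example runs.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3], "OtherCol": [1, 2, 3]})
mappings = {"Values": {1: "one"}, "OtherCol": {1: "uno"}}

# Option 1: map() one column at a time; unmatched values become NaN
mapped = df.copy()
for col, mapping in mappings.items():
    mapped[col] = mapped[col].map(mapping)

# Option 2: replace() accepts the nested dict and keeps unmatched values
replaced = df.replace(mappings)

print(mapped["Values"].isna().tolist())  # [False, True, True]
print(replaced["Values"].tolist())       # ['one', 2, 3]
```

Which option fits depends on whether unmatched values should be flagged as missing (map) or left alone (replace).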

Applying Element-wise Mappings with applymap()

While map() works column-wise, applymap() operates element-wise on the entire DataFrame. It takes a function and applies it to each value.

For example:

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

df.applymap(lambda x: x**2) # Square each element

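applymap() does not accept a dictionary directly, but the dictionary's get method can serve as the function:

d = {1: "one", 2: "two"}

df.applymap(d.get) # Map 1 to "one", 2 to "two"; 3 and 4 have no key and become None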

The difference from map() is that applymap() applies the function to every value in the DataFrame, while Series.map() works on a single column.

Because the function is called once per cell in Python, applymap() is very flexible but usually slower than a dictionary-based map() on one column.
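When unmatched cells should keep their original value instead of turning into None, one option is to pass a default to get. This is a small sketch; `elementwise` is a hypothetical helper that chooses DataFrame.map() (pandas 2.1 and later) or applymap() on older versions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
d = {1: "one", 2: "two"}

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = df.map if hasattr(df, "map") else df.applymap

# d.get(x, x) returns x itself when x is not a key, so 3 and 4 survive
result = elementwise(lambda x: d.get(x, x))

print(result.values.tolist())  # [['one', 3], ['two', 4]]
```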

Optimizing map() Performance

Since map() is applied column-wise, performance is typically very fast, especially with dictionary/Series mappings.

Two practices help keep map() fast. First, prefer a dictionary or Series lookup over a Python function when the mapping is a fixed table, since a function is called once per element. Second, when mapping with a Series, make sure its index is unique and holds the lookup keys in the same type as the values being mapped.

Overall, map() provides fast element-wise recoding and data encoding for column transformations. With care, it can be applied efficiently even to large datasets.
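To check the performance difference on your own data, a rough timing sketch like the one below can help. The Series contents and repeat count are arbitrary assumptions, and no timings are quoted because they vary by machine and pandas version.

```python
import timeit

import pandas as pd

s = pd.Series(["NY", "CA", "TX"] * 100_000)
codes = {"NY": 0, "CA": 1, "TX": 2}

# Same result two ways: a dictionary lookup and a Python function per element
by_dict = s.map(codes)
by_func = s.map(lambda state: codes[state])
print(by_dict.tolist() == by_func.tolist())

# Compare the two approaches; absolute numbers depend on the environment
print("dict:", timeit.timeit(lambda: s.map(codes), number=3))
print("func:", timeit.timeit(lambda: s.map(lambda state: codes[state]), number=3))
```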

Handling Missing Values and Unhashable Types

When using map(), any values not present in the mapping will be converted to NaN by default.

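The na_action='ignore' option does not change this: it only tells map() to leave values that are already NaN untouched instead of passing them to the mapping. To keep unmatched values, fill the gaps from the original Series:

s = pd.Series([1, 2, 99, 4])

s.map({1: "one", 2: "two"}).fillna(s)

# 0    one
# 1    two
# 2     99
# 3      4
# dtype: object

Now unmatched values like 99 and 4 are passed through unchanged.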

For applymap(), the function is simply applied to each value, so missing values will depend on the function logic.
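The sketch below illustrates what na_action="ignore" does in Series.map() and shows one documented way to supply a default for unmatched keys: a dict subclass that defines `__missing__`, such as collections.defaultdict. The sample values and the "unknown" default are assumptions for illustration.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series(["yes", None, "no"])

# na_action="ignore" leaves missing inputs missing without calling the function;
# without it, the lambda would receive the missing value and fail on .upper()
shouted = s.map(lambda v: v.upper(), na_action="ignore")
print(shouted.isna().tolist())  # [False, True, False]

# A defaultdict supplies a fallback for keys that are absent from the mapping
fallback = defaultdict(lambda: "unknown", {"yes": True, "no": False})
with_default = pd.Series(["yes", "maybe"]).map(fallback)
print(with_default.tolist())  # [True, 'unknown']
```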

Also note that dictionary keys must be hashable, so a column of unhashable values such as lists cannot be looked up in a dictionary directly. Cast the values to tuples first or use a custom function.
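One way to handle a column of lists is to convert each list to a tuple inside a function and look it up manually. The `labels` dictionary and its contents are hypothetical:

```python
import pandas as pd

s = pd.Series([[1, 2], [3, 4], [5, 6]])
labels = {(1, 2): "a", (3, 4): "b"}

# Lists cannot be dictionary keys, so convert each one to a tuple inside a
# function and look it up; keys missing from `labels` yield a missing value
labelled = s.map(lambda pair: labels.get(tuple(pair)))

print(labelled.isna().tolist())  # [False, False, True]
```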

Real-World Examples

Value mapping has many uses in real-world work. Here are some examples:

Encode categorical values to numbers:

df["State"] = df["State"].map({"NY": 0, "CA": 1, "TX": 2})

Standardize column values:

df["Item Number"] = df["Item Number"].map(lambda x: x.lstrip("0"))

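Handle missing data (a dictionary passed to map() would also turn every unmatched value into NaN, so mask() is used here to blank out only the sentinel values):

df["Sales"] = df["Sales"].mask(df["Sales"].isin([0, -1]))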

Lookups from an external table:

value_map = pd.read_csv("value_mappings.csv")

df["Pixel Value"] = df["Pixel Value"].map(value_map.set_index("Original")["Mapped"])
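A self-contained version of the lookup-table pattern is sketched below. It uses an in-memory CSV as a stand-in for value_mappings.csv; the file contents and the `Pixel Label` column are assumptions, while the `Original` and `Mapped` column names follow the snippet above.

```python
import io

import pandas as pd

# Stand-in for value_mappings.csv (contents are made up for illustration)
csv_text = "Original,Mapped\n0,black\n255,white\n"
value_map = pd.read_csv(io.StringIO(csv_text))

# Index = values to look up, Series values = results
lookup = value_map.set_index("Original")["Mapped"]
df = pd.DataFrame({"Pixel Value": [0, 255, 128]})

# 128 has no row in the lookup table, so it maps to NaN
df["Pixel Label"] = df["Pixel Value"].map(lookup)

print(df["Pixel Label"].isna().tolist())  # [False, False, True]
```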

Custom complex mappings:

def categorize_age(age):
    # Placeholder rule for illustration; substitute your own logic
    return "Minor" if age < 18 else "Adult"

df["Age Group"] = df["Age"].map(categorize_age)

There are many more possibilities. map() and applymap() provide simple yet powerful vectorized mapping capabilities for data transformation in Pandas.

Conclusion

This guide covered mapping values with map() and applymap() in Pandas: when to use each method, mapping with dictionaries, Series, and functions, performance considerations, and the handling of missing values and unhashable types.

By mastering Pandas mapping methods, you can quickly manipulate DataFrames for analysis and visualization. They replace hand-written loops with a single expressive call, and dictionary or Series lookups are typically faster as well. For production workflows, map() should be favored over applymap() where possible.

There are many nuances to mapping data efficiently in Pandas. Be sure to refer to the official Pandas documentation and other credible sources for more details. With practice, Pandas map() and applymap() will become essential tools in your data science and analytics toolbox.