
A Comprehensive Guide to Pandas df.info() in Python

Updated: at 01:46 AM

Pandas is one of the most popular and powerful data analysis libraries in Python. It provides efficient data structures, such as DataFrames and Series, that make data analysis workflows easier and more intuitive.

One important method in Pandas is df.info(), which allows us to get a quick overview of the DataFrame including the index, columns, data types, memory usage and more. Having a solid understanding of df.info() is critical for effective exploratory data analysis using Pandas.

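In this comprehensive guide, we will dive deep into df.info() and learn how to use it to extract key details about a Pandas DataFrame. We will cover what the output says about the index, columns, data types and memory usage, then look at common use cases, the optional parameters, how the method works internally, how it compares with df.describe(), and its limitations, with example code snippets throughout.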

Overview of df.info()

The df.info() method in Pandas provides an overview of the DataFrame by outputting information about the index, columns, data types, memory usage and more.

Here is the basic syntax:

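df.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

It takes the following optional parameters: verbose (print the full per-column summary or a shortened one), buf (where to write the output, sys.stdout by default), max_cols (the column count above which the shortened summary is used), memory_usage (whether and how to report memory usage) and show_counts (whether to show non-null counts). Older releases called the last parameter null_counts; that name was deprecated in pandas 1.2 and removed in 2.0.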

Calling df.info() quickly outputs a concise summary of the DataFrame without having to write much code. This makes it very useful for initial exploratory data analysis.

Let’s look at a simple example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [1.0, 3.0],
                   'C': ['a', 'b']})

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       2 non-null      int64
 1   B       2 non-null      float64
 2   C       2 non-null      object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

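This quickly shows that the DataFrame has an integer index with 2 entries labeled 0 to 1, 3 columns (A, B and C) with 2 non-null values each, one column each of dtype int64, float64 and object, and a memory footprint of roughly 200 bytes.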

As we can see, df.info() provides a neat summary of all the main details we need to know about the structure of a DataFrame. Now let’s look at each of these elements more closely.

Index Details

df.info() provides useful details about the index of the DataFrame, including the index type, the number of entries, and the first and last labels.

By default, a Pandas DataFrame has an integer index labeled from 0 to n-1, where n is the number of rows.

We can supply our own index labels if required. Let’s see an example:

df = pd.DataFrame({'A': [1, 2],
                   'B': [1.0, 3.0]},
                  index=['row1', 'row2'])

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, row1 to row2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       2 non-null      int64
 1   B       2 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 96.0 bytes

Here we can see that the index has 2 entries, labeled row1 to row2.

The same line also shows the index type. By default this is a RangeIndex running from 0 to n-1, but the index can hold other kinds of labels, such as strings or datetimes, in which case a different index class appears.
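To make the index line concrete, here is a small sketch that builds the same data with three kinds of index and prints only the line of df.info() that describes the index. The helper name index_line and the sample values are made up for illustration, and the exact formatting of the datetime labels may differ between pandas versions.

```python
import io
import pandas as pd

def index_line(df):
    """Return the line of df.info() that describes the index."""
    buffer = io.StringIO()
    df.info(buf=buffer)
    # Line 0 is the class name; line 1 is the index summary.
    return buffer.getvalue().splitlines()[1]

values = {'A': [1, 2, 3]}

default_df = pd.DataFrame(values)
labeled_df = pd.DataFrame(values, index=['x', 'y', 'z'])
dated_df = pd.DataFrame(
    values, index=pd.date_range('2024-01-01', periods=3, freq='D')
)

for frame in (default_df, labeled_df, dated_df):
    print(index_line(frame))
```

Each printed line starts with the index class (RangeIndex, Index or DatetimeIndex), followed by the number of entries and the first and last labels.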

Column Details

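In addition to index information, df.info() also provides details about the columns in the DataFrame: the total number of columns and, for each column, its position, name, non-null count and data type.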

This allows us to quickly check if the columns and data types are as expected.

Let’s see an example:

df = pd.DataFrame({'NumericCol': [1, 2],
                   'StringCol': ['a', 'b']},
                   index=['row1', 'row2'])

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, row1 to row2
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   NumericCol  2 non-null      int64
 1   StringCol   2 non-null      object
dtypes: int64(1), object(1)
memory usage: 128.0 bytes

Here we can see that NumericCol is an int64 column and StringCol is an object column, each with 2 non-null values.

This allows us to verify the DataFrame structure at a glance.

Data Types Overview

One of the most useful parts of df.info() is that it provides a quick overview of the data types of all the columns.

The data types summary is shown in the dtypes section of the output.

For example:

dtypes: float64(2), int64(1), object(1)

This summarizes the data types in the DataFrame as 2 float64 columns, 1 int64 column and 1 object column.

This allows us to easily verify that the columns have the expected types and detect any unexpected types that could lead to errors later on.

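Checking dtypes is especially important before numeric calculations: a column that should be numeric but shows up as object often contains mixed types, such as numbers stored alongside strings, which can lead to errors or unexpected results.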
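As a rough illustration, the sketch below builds a DataFrame whose price column mixes numbers with a string, checks its dtype, and then converts it with pd.to_numeric. The column names and values are invented for the example, and coercing bad entries to NaN is only one possible cleaning strategy.

```python
import pandas as pd

# 'price' mixes floats with a string, so pandas stores it as object.
df = pd.DataFrame({'price': [10.5, 'n/a', 12.0], 'qty': [1, 2, 3]})
df.info()
dtype_before = df['price'].dtype

# errors='coerce' turns entries that cannot be parsed into NaN.
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df.info()
dtype_after = df['price'].dtype

print(dtype_before, '->', dtype_after)
```

After the conversion, the Non-Null Count for price drops by one, which makes the entry that could not be parsed visible.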

Memory Usage

When dealing with large datasets, understanding the memory footprint is important.

df.info() provides memory usage details of the DataFrame by default.

For example:

memory usage: 200.0+ bytes

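This shows the total memory, in bytes, needed to store the DataFrame's data and index. The trailing + means the figure is a lower bound: for object columns, pandas counts only the references, not the Python objects they point to. The exact number also varies with the pandas version and platform.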

We can also get deep memory usage by passing memory_usage='deep':

df.info(memory_usage='deep')

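With memory_usage='deep', pandas inspects the contents of object columns, so the reported total reflects actual memory use and the + disappears, at the cost of a slower call. df.info() still reports a single total; for a per-column breakdown, use df.memory_usage(deep=True).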
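The difference between the shallow and deep figures can be seen with df.memory_usage(), which returns per-column byte counts. This is a minimal sketch with made-up data; the label column is forced to object dtype so that the behavior is the same regardless of how a given pandas version stores strings by default.

```python
import pandas as pd

df = pd.DataFrame({
    'id': range(1000),
    # Force object dtype so each value is a separate Python string.
    'label': pd.Series(['some fairly long text value'] * 1000, dtype=object),
})

shallow = df.memory_usage(deep=False)  # counts only the 8-byte references
deep = df.memory_usage(deep=True)      # also counts the string objects

print(pd.DataFrame({'shallow': shallow, 'deep': deep}))

df.info(memory_usage='deep')
```

The numeric id column reports the same size either way, while the object column grows substantially under deep=True.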

Use Cases and Examples

Now that we’ve seen what df.info() displays, let’s go over some examples of how it can be used for exploratory data analysis.

1. Verify DataFrame structure and metadata

As seen earlier, we can use df.info() after creating a new DataFrame to verify it has the expected index, columns, data types and size. This helps catch any mismatches between assumptions and reality about the DataFrame.

2. Profile new unknown data sources

When loading datasets from new sources, we may not know the structure, data types or size beforehand. df.info() allows quickly profiling the DataFrame to understand the data better.
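For instance, a first look at a freshly loaded CSV might go like the sketch below. The CSV text and its column names are invented and read from an in-memory string so that the example is self-contained; with a real file, a path would be passed to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical data standing in for an unfamiliar CSV file.
raw_csv = io.StringIO(
    'order_id,amount,placed_at\n'
    '1,19.99,2024-01-05\n'
    '2,,2024-01-06\n'
    '3,5.00,2024-01-07\n'
)

orders = pd.read_csv(raw_csv)
orders.info()
```

The summary shows at a glance that amount has a missing value and that placed_at was not parsed as a datetime, since pd.read_csv does not parse dates unless asked to.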

3. Catch mixed data types

Using df.info() to print the data types overview can help identify any mixed types in columns. This prevents silent errors later when doing computations on such data.

4. Check for missing data

The non-null count in df.info() output can reveal columns with missing values. This helps plan data cleaning steps like imputation.
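The non-null counts can be turned into explicit missing-value counts with a line of arithmetic. The sketch below uses made-up data and relies only on df.count() and len(df).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [25, np.nan, 31, np.nan],
    'city': ['Oslo', 'Lima', None, 'Pune'],
})

df.info()

# Missing values per column: total rows minus non-null count.
# This matches df.isna().sum().
missing = len(df) - df.count()
print(missing)
```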

5. Estimate memory usage

For big data applications, the memory footprint is important. df.info() provides an estimate of memory usage to optimize system configuration.

6. Monitor memory usage during transformations

We can insert df.info() at various points while transforming data to track how memory usage changes. This helps detect memory leaks or inefficient operations.
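When the goal is to track a number rather than read a printed summary, df.memory_usage(deep=True).sum() gives the same total as an integer. The sketch below measures a DataFrame before and after converting a repetitive string column to the category dtype; the helper name total_bytes and the data are illustrative.

```python
import pandas as pd

def total_bytes(frame):
    """Total memory of a DataFrame in bytes, including object contents."""
    return int(frame.memory_usage(deep=True).sum())

# A column with many repeats of a few distinct strings.
df = pd.DataFrame({'country': ['Norway', 'Peru', 'India', 'Kenya'] * 2500})

before = total_bytes(df)
df['country'] = df['country'].astype('category')
after = total_bytes(df)

print(f'before: {before} bytes, after: {after} bytes')
```

Because a categorical column stores each distinct string once plus a small integer code per row, the total should drop noticeably for data like this.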

7. Compare DataFrames

df.info() can be used to print the summaries of two DataFrames one after the other to understand how they differ.
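Reading two printed summaries by eye works for small DataFrames. For a programmatic check, one option is to compare the dtypes directly, as in this sketch. The two frames and their column names are invented, and the exact dtype names printed for string columns depend on the pandas version.

```python
import pandas as pd

# Two small frames; 'id' has a different dtype in each, and 'note'
# exists only in the second one.
left = pd.DataFrame({'id': [1, 2], 'score': [0.5, 0.7]})
right = pd.DataFrame({'id': ['1', '2'], 'score': [0.5, 0.7], 'note': ['a', 'b']})

left_types = left.dtypes.astype(str).to_dict()
right_types = right.dtypes.astype(str).to_dict()

# Collect columns whose dtype differs or that exist in only one frame.
differences = {}
for column in sorted(set(left_types) | set(right_types)):
    left_dtype = left_types.get(column, 'missing')
    right_dtype = right_types.get(column, 'missing')
    if left_dtype != right_dtype:
        differences[column] = (left_dtype, right_dtype)

for column, (left_dtype, right_dtype) in differences.items():
    print(f'{column}: left={left_dtype}, right={right_dtype}')
```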

Additional Parameters

We briefly introduced the extra parameters available for df.info() earlier. Let’s look at them in more depth with examples:

verbose

The verbose parameter controls whether to print the full summary or just the basic details.

df.info(verbose=False)

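This replaces the per-column table with a single line giving the number of columns and the first and last column names. The dtypes summary and memory usage are still printed.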
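The effect is easy to check by capturing the shortened summary as a string. This sketch reuses the three-column DataFrame from the overview and writes the output to an io.StringIO buffer.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1.0, 3.0], 'C': ['a', 'b']})

buffer = io.StringIO()
df.info(verbose=False, buf=buffer)
short_summary = buffer.getvalue()

print(short_summary)
```

The captured text contains a "Columns: 3 entries, A to C" line in place of the per-column table.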

buf

We can pass a buffer or file handle to buf to redirect the output to a file or StringIO object.

For example:

import io

buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()  # the summary as a plain string

max_cols

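max_cols sets the column count above which df.info() switches from the full per-column table to the shortened summary. This is useful for wide DataFrames.

For example:

df.info(max_cols=5)

If the DataFrame has more than 5 columns, this prints the shortened summary (the column count and the first and last column names) instead of one row per column. When max_cols is not given, the limit comes from pandas.options.display.max_info_columns.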
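The following sketch shows how max_cols behaves on a DataFrame with 10 columns. The helper info_text and the column names c0 to c9 are made up for the example. Note that max_cols acts as a threshold for switching to the shortened summary rather than as a cap on how many columns are listed.

```python
import io
import numpy as np
import pandas as pd

def info_text(frame, **kwargs):
    """Capture the output of frame.info() as a string."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()

wide = pd.DataFrame(np.zeros((3, 10)), columns=[f'c{i}' for i in range(10)])

# 10 columns exceed max_cols=5, so the shortened summary is used.
truncated = info_text(wide, max_cols=5)
# 10 columns fit within max_cols=20, so every column is listed.
full = info_text(wide, max_cols=20)

print(truncated)
print(full)
```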

memory_usage

We discussed using memory_usage='deep' earlier to get an accurate memory figure. Passing memory_usage=False omits the memory line altogether.

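show_counts

Setting show_counts=True always includes the Non-Null Count column, and show_counts=False omits it. By default, the counts are shown only when the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. In older pandas versions this parameter was called null_counts; it was deprecated in pandas 1.2 and removed in 2.0.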
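In pandas 1.2 and later, this option is spelled show_counts. The sketch below, with made-up data and a hypothetical info_text helper, captures the summary with the counts switched on and off.

```python
import io
import pandas as pd

def info_text(frame, **kwargs):
    """Capture the output of frame.info() as a string."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()

df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', None, 'z']})

with_counts = info_text(df, show_counts=True)
without_counts = info_text(df, show_counts=False)

print(with_counts)
print(without_counts)
```

With show_counts=False, the table keeps the column names and dtypes but drops the Non-Null Count column, which can save time on very large DataFrames.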

How df.info() Works Internally

Under the hood, df.info() works by iterating through the columns of the DataFrame and extracting the index, column and data type details.

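The same details are available individually through public attributes and methods such as df.index, df.columns, df.dtypes, df.count() and df.memory_usage().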

The output summary string is constructed using this information.

Knowing this helps understand what operations are done internally by df.info(). We can avoid repeating any redundant operations in our own code.
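As a rough sketch, the pieces of the summary can be assembled by hand from public attributes and methods. This is an approximation built for illustration, not the actual pandas implementation, and the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1.0, None], 'C': ['a', 'b']})

# Index line: index class and number of entries.
print(type(df.index).__name__, len(df.index), 'entries')

# Per-column table: non-null count and dtype for each column.
summary = pd.DataFrame({
    'non_null': df.count(),
    'dtype': df.dtypes.astype(str),
})
print(summary)

# dtypes line: how many columns there are of each dtype.
dtype_counts = df.dtypes.astype(str).value_counts()
print(dtype_counts)

# Memory line: shallow total in bytes, index included.
memory_total = int(df.memory_usage(index=True).sum())
print(memory_total, 'bytes')
```

Working with these values directly is handy when the numbers are needed in code rather than as printed text.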

Comparison with df.describe()

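Both df.info() and df.describe() are used for exploratory data analysis with Pandas. But they provide different types of summaries: df.info() reports structural metadata (index, column names, non-null counts, dtypes and memory usage), while df.describe() reports descriptive statistics (count, mean, standard deviation, minimum, quartiles and maximum) for numeric columns by default.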

So df.info() complements df.describe() by providing structural metadata compared to just statistics.

Here is a comparison:

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

df.info()  # prints the summary itself and returns None
print(df.describe())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       2 non-null      int64
 1   B       2 non-null      int64
dtypes: int64(2)
memory usage: 96.0 bytes

              A         B
count  2.000000  2.000000
mean   1.500000  3.500000
std    0.707107  0.707107
min    1.000000  3.000000
25%    1.250000  3.250000
50%    1.500000  3.500000
75%    1.750000  3.750000
max    2.000000  4.000000

We can see df.info() provides structural metadata like column names, dtypes, index etc. while df.describe() provides statistical summary like mean, standard deviation etc.

Using both together gives a more comprehensive data profile.

Limitations to be Aware Of

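While df.info() is very useful, there are some limitations to keep in mind. It prints its summary and returns None, so the result cannot be assigned or chained unless the output is captured through buf. The default memory figure is a lower bound for DataFrames with object columns unless memory_usage='deep' is passed. For DataFrames with many columns, the per-column table is replaced by a shortened summary unless verbose=True or a larger max_cols is given. Finally, it describes structure only: it reports nothing about the values themselves, such as their distribution, duplicates or outliers.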

Conclusion

In this comprehensive guide, we explored df.info() in depth including its parameters, use cases, internal working and limitations.

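The key takeaways are that df.info() summarizes the index, columns, data types, non-null counts and memory usage in a single call, that its parameters control how much detail is printed and where the output goes, and that it pairs well with df.describe() for a fuller picture of the data.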

Overall, mastering df.info() provides a simple yet powerful way to understand the shape of DataFrames for effective data analysis in Python. It should be part of every Pandas user’s toolbox.

Hopefully this guide gives you the knowledge to use df.info() for profiling DataFrames confidently. The key is practice - use df.info() liberally when exploring datasets to build intuition. This will enable you to derive insights from data more effectively using Python.