How to Create a Correlation Heatmap in Python?
I want to create a correlation heatmap in Python to represent a correlation matrix. In R, there is a concept called a correlogram, but I am not sure if there’s an equivalent in Python. My correlation values range from -1 to 1, for example:
[[ 1. 0.00279981 0.95173379 0.02486161 -0.00324926 -0.00432099]
[ 0.00279981 1. 0.17728303 0.64425774 0.30735071 0.37379443]
[ 0.95173379 0.17728303 1. 0.27072266 0.02549031 0.03324756]
[ 0.02486161 0.64425774 0.27072266 1. 0.18336236 0.18913512]
[-0.00324926 0.30735071 0.02549031 0.18336236 1. 0.77678274]
[-0.00432099 0.37379443 0.03324756 0.18913512 0.77678274 1. ]]
I attempted to use the following code to create a heatmap:
plt.imshow(correlation_matrix, cmap='hot', interpolation='nearest')
However, this approach does not adequately represent negative values. I would like the heatmap to have a color gradient from blue (-1) to red (1) to effectively display the correlation range. Could you provide a solution for creating such a correlation heatmap in Python?
Creating a correlation heatmap in Python is straightforward with Seaborn’s heatmap function. Here’s how you can get started using the mpg dataset:
import seaborn as sns
# Jupyter/IPython magic; omit this line when running as a plain script
%matplotlib inline
# Load the dataset
auto_df = sns.load_dataset('mpg')
# Calculate the correlation matrix on numeric columns
corr = auto_df.select_dtypes('number').corr()
# Plot the correlation heatmap
sns.heatmap(corr)
This will display a basic heatmap showing correlations. For a more visually appealing chart, you can use a diverging color palette to represent correlations more clearly:
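cmap = sns.diverging_palette(220, 20, as_cmap=True)
# vmin/vmax pin the color scale to the full correlation range, so 0 sits at the palette's midpoint
sns.heatmap(corr, cmap=cmap, annot=True, linewidths=0.5, vmin=-1, vmax=1)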
This approach gives a clean and informative visualization of correlation values ranging from -1 to 1. I use this method all the time, especially when analyzing relationships in my data!
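If you would rather stay with the plt.imshow call from the question, the same idea should carry over: pick a diverging colormap and pin the color limits to -1 and 1. Here is a minimal sketch using the matrix from the question. The choice of 'coolwarm' is my own assumption; any blue-to-red diverging colormap would do.

```python
import numpy as np
import matplotlib.pyplot as plt

# Correlation matrix copied from the question
correlation_matrix = np.array([
    [1.0, 0.00279981, 0.95173379, 0.02486161, -0.00324926, -0.00432099],
    [0.00279981, 1.0, 0.17728303, 0.64425774, 0.30735071, 0.37379443],
    [0.95173379, 0.17728303, 1.0, 0.27072266, 0.02549031, 0.03324756],
    [0.02486161, 0.64425774, 0.27072266, 1.0, 0.18336236, 0.18913512],
    [-0.00324926, 0.30735071, 0.02549031, 0.18336236, 1.0, 0.77678274],
    [-0.00432099, 0.37379443, 0.03324756, 0.18913512, 0.77678274, 1.0],
])

fig, ax = plt.subplots()
# 'coolwarm' runs from blue (low) to red (high); vmin/vmax pin the ends to -1 and 1
im = ax.imshow(correlation_matrix, cmap='coolwarm', vmin=-1, vmax=1,
               interpolation='nearest')
# The colorbar makes the value-to-color mapping readable
fig.colorbar(im, ax=ax, label='Correlation')
```

In a script, finish with plt.show() or fig.savefig(...); in a notebook the figure renders on its own.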
Great approach, Shashank! I often use Seaborn too, but sometimes I want more control over how the heatmap looks. Combining Seaborn with a Matplotlib colormap and figure settings gives you that control. Here’s a detailed example:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset and calculate correlation
auto_df = sns.load_dataset('mpg')
corr = auto_df.select_dtypes('number').corr()
# Pick one of Matplotlib's built-in diverging colormaps
cmap = plt.cm.RdYlBu_r # Reversed Red-Yellow-Blue: blue for low values, red for high
# Plot the heatmap with additional details
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap=cmap, linewidths=0.5, vmin=-1, vmax=1)
plt.title("Correlation Heatmap with Custom Color Map")
plt.show()
Using the annot=True parameter adds correlation values to the cells, and specifying vmin and vmax ensures consistent scaling from -1 to 1. The RdYlBu_r palette is excellent for highlighting negative (blue) and positive (red) correlations.
When I need a polished visualization for presentations, this method is my go-to! It’s versatile and allows you to tweak every visual aspect.
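To see what vmin and vmax actually change, here is a small sketch that draws the same matrix twice and reads back the color limits Seaborn ended up using. The DataFrame is synthetic and its column names are made up for illustration, so nothing has to be downloaded.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (hypothetical columns): b tracks a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=200),
    "c": rng.normal(size=200),
})
corr = df.corr()

fig, (ax_auto, ax_fixed) = plt.subplots(1, 2, figsize=(10, 4))

# Left: limits are taken from the data, so the scale changes from dataset to dataset
sns.heatmap(corr, ax=ax_auto, annot=True, cmap="RdYlBu_r")
# Right: limits are pinned, so the same color always means the same correlation
sns.heatmap(corr, ax=ax_fixed, annot=True, cmap="RdYlBu_r", vmin=-1, vmax=1)

# The heatmap cells are the first collection on each axes
auto_limits = ax_auto.collections[0].get_clim()
fixed_limits = ax_fixed.collections[0].get_clim()
print("data-driven limits:", auto_limits)
print("pinned limits:", fixed_limits)
```

With the data-driven limits, the lowest correlation in the matrix gets the deepest blue even when it is close to zero, which is easy to misread. Pinning the limits avoids that.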
Sam, that’s a great way to customize a heatmap! But for those who want an interactive correlation heatmap in Python, Plotly is fantastic. It lets you zoom, hover, and explore the data dynamically. Here’s an example:
import plotly.express as px
import seaborn as sns  # only used here to load the example dataset
# Load dataset and compute correlation matrix
auto_df = sns.load_dataset('mpg')
corr = auto_df.select_dtypes('number').corr()
# Create an interactive heatmap
fig = px.imshow(corr,
color_continuous_scale='RdBu_r',
zmin=-1, zmax=1,
title="Interactive Correlation Heatmap")
fig.update_layout(xaxis_title='Variables', yaxis_title='Variables')
fig.show()
This method creates an interactive heatmap where you can hover over cells to see precise correlation values and zoom in on specific areas of interest. I use it when sharing insights with colleagues or stakeholders since they can explore the heatmap themselves. It’s super intuitive and takes visualization to the next level!