I have a CSV table with columns Indicator, Country, Year, and Value, and I want to transform it so that each unique Indicator becomes a new column, while keeping Country and Year as identifiers.
I tried using df.pivot(columns='Country', 'Year', 'Indicator', values='Value')
but it didn’t work and didn’t produce the desired result.
How can I accomplish this in Pandas?
I’ve run into this exact confusion before
. The key is that df.pivot()
needs the syntax like this:
df.pivot(index=['Country', 'Year'], columns='Indicator', values='Value')
Here’s what’s happening:
index=['Country', 'Year'] keeps your identifiers as rows.
columns='Indicator' turns each unique indicator into a new column.
values='Value' fills the new columns with the corresponding numbers.
I usually assign it to a new DataFrame:
pivoted_df = df.pivot(index=['Country', 'Year'], columns='Indicator', values='Value')
pivoted_df.reset_index(inplace=True) # optional, makes Country and Year normal columns again
This approach works perfectly for clean data without duplicate Country-Year-Indicator combinations.
If you have duplicate Country-Year-Indicator rows, pivot will throw an error. I usually switch to pivot_table
in those cases:
pivoted_df = df.pivot_table(
index=['Country', 'Year'],
columns='Indicator',
values='Value',
aggfunc='mean' # or sum, max, etc., depending on your data
)
pivoted_df.reset_index(inplace=True)
I like this method because it handles duplicates gracefully and gives you control over how to aggregate repeated values.
After pivoting, Pandas sometimes gives a MultiIndex
in columns, which can be annoying.
I usually do this to clean it up:
pivoted_df.columns.name = None # remove the column index name
pivoted_df.columns = [str(col) for col in pivoted_df.columns] # convert to plain strings
I’ve found this really helps when I want to export the DataFrame
to CSV or work with it in analysis, it looks much cleaner and behaves like a regular table.