Read XLSX File and Parse Data in Python

prynka.hans · January 2, 2025, 6:30pm

How to Python Read XLSX File and Parse Data?

I want to read a .xlsx file using the Pandas library in Python and later port the data to a PostgreSQL table.

Here’s what I have so far:

import pandas as pd
data = pd.ExcelFile("*File Name*")

I know the step is successful, but I want to understand how to parse the Excel file and map its data to the data variable. I believe data is a DataFrame object. How can I parse this DataFrame and extract each row one by one?

tim-khorev · January 2, 2025, 6:30pm

If you’re new to this, one straightforward way to work with your data is by iterating row by row using iterrows(). Here’s how it looks:

import pandas as pd  

# Read the Excel file into a DataFrame  
data = pd.read_excel("*File Name*")  

# Iterate over rows using iterrows()  
for index, row in data.iterrows():  
    print(row)  # Access each row as a pandas Series

This method gives you each row as a Series, which is handy when you’re working with smaller datasets. However, keep in mind that iterrows() can be slower for larger files.

emma-crepeau · January 2, 2025, 6:30pm

Building on Tim’s solution, if performance becomes a concern (especially for larger datasets), I recommend using itertuples(). It’s a more efficient way to iterate through rows.

import pandas as pd  

# Read the Excel file into a DataFrame  
data = pd.read_excel("*File Name*")  

# Iterate over rows using itertuples() (faster)  
for row in data.itertuples(index=False):  # index=False prevents including index in the row  
    print(row)  # Access each row as a named tuple

Unlike iterrows(), this gives you a named tuple for each row, making it both faster and cleaner to work with. If you’re dealing with larger datasets in your python read xlsx operations, this approach is ideal.

richaaroy · January 9, 2025, 10:59am

Taking Emma’s point further, another option for processing data is to convert the DataFrame into a list of lists. This approach simplifies accessing rows for certain use cases:

import pandas as pd  

# Read the Excel file into a DataFrame  
data = pd.read_excel("*File Name*")  

# Convert the DataFrame to a list of lists (each row is a list)  
data_list = data.values.tolist()  

# Print each row  
for row in data_list:  
    print(row)  # Each row is a list of values

Using values.tolist(), you effectively remove the overhead of Pandas operations when iterating. This is great if your goal is to feed this data directly into another system, like your PostgreSQL table, during a python read xlsx workflow.