How to open .xlsx files correctly in Python 3?

anjuyadav.1398 · December 17, 2024, 6:30pm

I have an .xlsx file with one sheet, and I’m trying to open it using the xlrd library. However, I encounter issues where the file seems empty. Here is the code I’m using:

import xlrd

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)

The code runs without any errors, but when I check workbook_errors.nsheets, it returns 0, even though the file contains sheets. Also, when I inspect workbook_errors, it shows:

xlrd.book.Book object at 0x2..

Why does this happen, and how can I properly read the .xlsx file? Is there a better way to handle this in Python 3?

macy-davis · December 17, 2024, 6:34pm

Hey there! If you’re considering xlrd, here’s a heads-up: Starting from version 2.0.0, xlrd dropped support for .xlsx files. If you still want to use it for older workflows, you’ll need to install version 1.2.0 specifically:

pip install xlrd==1.2.0

But honestly, you’d be better off exploring alternatives since this version is outdated and not maintained for .xlsx. For instance, check out openpyxl. It’s more modern and widely used for .xlsx workflows. Still, if you’re sticking with xlrd, your code might look something like this:

import xlrd

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)
sheet = workbook_errors.sheet_by_index(0)
print(sheet.nrows, sheet.ncols)  # Print rows and columns count

But if you can’t downgrade xlrd, let’s talk about other tools that work better for .xlsx files.

Priyadapanicker · December 20, 2024, 4:31pm

Totally agree, Macy. While xlrd was a go-to for many, these days, openpyxl is a much better fit for handling .xlsx files. It’s actively maintained and designed specifically for such tasks.

First, install it:

pip install openpyxl

Then, here’s how you could rewrite that example:

from openpyxl import load_workbook

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = load_workbook(file_errors_location)
sheet = workbook_errors.active  # Get the active sheet
print(sheet.title)  # Print sheet name
for row in sheet.iter_rows(values_only=True):
    print(row)  # Print all rows

It’s clean and reliable, and it gives you more options for reading and manipulating your Excel files. If you’re working with .xlsx workflows, this is the way to go!

dipen-soni · December 22, 2024, 4:32pm

Absolutely, Priyada. openpyxl is fantastic for .xlsx workflows, but let me add a little more. If your requirement is heavy on performance or involves manipulating large datasets, pandas can be your best friend. It works seamlessly with openpyxl under the hood for .xlsx files.

Here’s a quick example:

pip install pandas openpyxl

And then:

import pandas as pd

file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
df = pd.read_excel(file_errors_location)  # Automatically uses openpyxl
print(df.head())  # Display the first few rows

Using pandas gives you the ability to work with Excel data like you would with a database—quick and efficient. So if you’re doing .xlsx workflows that require advanced analysis or transformation, it’s definitely worth exploring!