Read CSV Rows in Python

How can I read rows from a CSV file in Python?

I have a CSV file with the following structure:

Year:  Dec: Jan:
1      50   60
2      25   50
3      30   30
4      40   20
5      10   10

I know how to read the file and print each column (for example, ['Year', '1', '2', '3', etc.]). However, what I want to do is read the rows, which would look like ['Year', 'Dec', 'Jan'] and then ['1', '50', '60'] and so on.

Then, I want to store the numbers ['1', '50', '60'] into variables so that I can total them later. For example, I could do:

Year_1 = [50, 60]
sum(Year_1)  # 110

How can I achieve this by reading the CSV file line by line in Python?

Certainly! You can use Python’s built-in csv module to read the CSV file line by line and process each row. Here’s how to do it step by step:

import csv

# Open the CSV file (the csv module docs recommend newline='')
with open('data.csv', newline='') as file:
    reader = csv.reader(file)
    headers = next(reader)  # Read the header row

    # Initialize lists for each column
    Dec_values = []
    Jan_values = []

    # Loop through each row
    for row in reader:
        Dec_values.append(int(row[1]))  # Dec column
        Jan_values.append(int(row[2]))  # Jan column

    # Calculate totals
    total_dec = sum(Dec_values)
    total_jan = sum(Jan_values)

    print(f"Total for Dec column: {total_dec}")
    print(f"Total for Jan column: {total_jan}")

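In this example, csv.reader yields one row at a time as a list of strings. The code converts the Dec and Jan fields to integers, collects them in two lists, and totals each list with sum(). It needs only the standard library. Note that csv.reader splits on commas by default, so this assumes data.csv is comma-separated. If the values are separated by spaces, as displayed in the question, each line comes back as a single field and row[1] raises an IndexError.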
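The question also asked for a total per row (for example, 110 for year 1), which the column totals above do not give. Here is a minimal sketch of that, assuming the file is comma-separated. It uses an in-memory copy of the sample data so it runs on its own; the names SAMPLE, row_totals, and year_totals are illustrative and not part of the original answer.

```python
import csv
import io

# Assumption: a comma-separated version of the sample data from the question.
SAMPLE = """Year:,Dec:,Jan:
1,50,60
2,25,50
3,30,30
4,40,20
5,10,10
"""


def row_totals(file):
    """Return {year: sum of the remaining fields} for each data row."""
    reader = csv.reader(file)
    next(reader)  # Skip the header row
    totals = {}
    for row in reader:
        if not row:  # Ignore blank lines
            continue
        year, *values = row  # First field is the label, the rest are numbers
        totals[year] = sum(int(value) for value in values)
    return totals


# io.StringIO stands in for open('data.csv', newline='')
year_totals = row_totals(io.StringIO(SAMPLE))
print(year_totals['1'])  # 110
```

If the real file is separated by spaces rather than commas, replacing csv.reader(file) with (line.split() for line in file) should produce the same rows.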

If you’re open to using external libraries, pandas handles CSV data with much less code. Here’s how to use it for this task:

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Calculate the sum for each column
total_dec = df['Dec:'].sum()
total_jan = df['Jan:'].sum()

print(f"Total for Dec column: {total_dec}")
print(f"Total for Jan column: {total_jan}")

Why use pandas?

  • It automatically parses the CSV structure.
  • You can perform operations like summing, filtering, or grouping data with minimal code.
  • Its column operations are vectorized, so they are typically faster than a Python loop over csv.reader on large files. Keep in mind that read_csv loads the whole file into memory by default.

If you work with data frequently, pandas can save a lot of time. This approach doesn’t read the CSV line by line, but it streamlines the work for larger or more complex datasets.

For more flexibility, you can use csv.DictReader, which returns each row as a dictionary keyed by the header names. This is especially helpful when you want to refer to columns by name. Here’s an example that collects the values into a dictionary of lists, one list per column:

import csv

# Initialize a dictionary of lists, one per column
data = {'Dec': [], 'Jan': []}

# Open the CSV file
with open('data.csv', newline='') as file:
    reader = csv.DictReader(file)  # Read rows as dictionaries

    # Loop through each row
    for row in reader:
        data['Dec'].append(int(row['Dec:']))
        data['Jan'].append(int(row['Jan:']))

# Calculate totals
total_dec = sum(data['Dec'])
total_jan = sum(data['Jan'])

print(f"Total for Dec column: {total_dec}")
print(f"Total for Jan column: {total_jan}")

Why use this method?

  • Because rows are keyed by header name, the code keeps working if columns are reordered or new ones are added.
  • Using DictReader, you can reference columns by name rather than index, making the code easier to maintain.
  • It still processes the CSV file line by line.
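The point about referencing columns by name can be taken a step further: instead of hard-coding 'Dec:' and 'Jan:', the column names can be read from reader.fieldnames and every column after the first totaled. This is a sketch under assumptions: the data is comma-separated, the first column is the row label, and the names SAMPLE and column_totals are illustrative.

```python
import csv
import io

# Assumption: a comma-separated version of the sample data from the question.
SAMPLE = """Year:,Dec:,Jan:
1,50,60
2,25,50
3,30,30
4,40,20
5,10,10
"""


def column_totals(file):
    """Total every column except the first, using names from the header row."""
    reader = csv.DictReader(file)
    # fieldnames is filled in from the header row on first access;
    # it is None for an empty file, hence the "or []".
    value_columns = (reader.fieldnames or [])[1:]
    totals = {name: 0 for name in value_columns}
    for row in reader:
        for name in value_columns:
            totals[name] += int(row[name])
    return totals


# io.StringIO stands in for open('data.csv', newline='')
totals = column_totals(io.StringIO(SAMPLE))
print(totals)  # {'Dec:': 155, 'Jan:': 170}
```

With this shape, adding a Feb: column to the file would not require any change to the code.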