How to iterate over rows in pandas?
Hello Asheenaraghununan,
DataFrame.iterrows is a generator that yields both the index and the row (as a Series):
import pandas as pd
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index() # ensure indexes match the number of rows
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
Output:
10 100
11 110
12 120
Important Note from the Documentation
Iterating through pandas objects is generally slow. In many cases, manual row iteration can be avoided with the following approaches:
- Vectorized solution: Use built-in methods or NumPy functions, including (boolean) indexing, to perform operations.
- Use apply(): When you have a function that cannot operate on the entire DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the documentation on function application for more details.
- Cython or Numba: If you need to perform iterative manipulations on the values and performance is crucial, consider writing the inner loop with Cython or Numba. Refer to the enhancing performance section for examples of this approach.
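As a rough illustration of the first two approaches, here is a minimal sketch using the same c1/c2 frame as above. The column names total and label and the helper describe_row are made up for this example and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Vectorized arithmetic: operates on whole columns at once, no Python loop.
df['total'] = df['c1'] + df['c2']

# Boolean indexing: select rows with a condition instead of iterating.
large = df[df['c2'] > 100]

# apply(): call a Python function once per row (axis=1).
# describe_row is a hypothetical helper, used only for illustration.
def describe_row(row):
    return f"{row['c1']}-{row['c2']}"

df['label'] = df.apply(describe_row, axis=1)

print(df)
print(large)
```

Note that apply() with axis=1 still runs a Python-level loop internally, so it is usually a convenience rather than a speedup over explicit iteration.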
For more in-depth alternatives to iter* functions, refer to other answers in this thread.
Hello Asheenaraghuninah,
Caution: Avoid Iterating Over DataFrames
Avoid using iteration in pandas whenever possible. Iteration is slow and should be the last resort.
For printing a DataFrame:
df.to_string()
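A small sketch of what that looks like in practice, assuming a toy frame with columns c1 and c2 (chosen here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# to_string() renders the whole frame as one string, without truncating rows
# or columns the way the default repr can for large frames.
text = df.to_string(index=False)
print(text)
```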
For computations, consider these methods in order:
- Vectorization: Use built-in pandas or NumPy functions.
- Cython routines: Write custom Cython extensions.
- List comprehensions: A fast and simple method.
- DataFrame.apply():
  - For reductions in Cython
  - For row-wise operations in Python
- DataFrame.itertuples(): For generating named tuples for each row.
- DataFrame.iterrows(): Use only when absolutely necessary.
Note from Documentation: Iterating through pandas objects is generally slow and often avoidable.
Examples:
Vectorized Solution:
df['new_col'] = df['col1'] + df['col2']
List Comprehension:
result = [f(x) for x in df['col']]
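When the function needs values from more than one column, the list comprehension can be paired with zip. This is only a sketch: f is a hypothetical scalar-only function, and the frame and its values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

# Hypothetical function that only works on scalars, not on whole columns.
def f(a, b):
    return a * 2 if b > 15 else a

# zip pairs up the values of the two columns row by row;
# the comprehension builds the new column as a plain list.
df['new_col'] = [f(a, b) for a, b in zip(df['col1'], df['col2'])]

print(df)
```

This avoids building a Series for every row, which is a large part of the overhead of iterrows().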
Summary:
- Prefer vectorization for performance.
- Use apply() for complex row-wise operations.
- Avoid iterrows() unless necessary for sequential processing.
Final Note: Understanding when to use iterative methods requires familiarity with pandas. If you’re unsure, stick with vectorized operations.
Hey,
Answer to Question:-
There are multiple ways to iterate over a DataFrame, but here are two methods that are both easy and efficient:
- DataFrame.iterrows()
- DataFrame.itertuples()
Example:
import pandas as pd
inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)
print(df)
# Using iterrows method
for index, row in df.iterrows():
    print(row["c1"], row["c2"])
# Using itertuples method
for row in df.itertuples(index=True, name='Pandas'):
    print(row.c1, row.c2)
Note: itertuples() is generally faster than iterrows().
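Beyond speed, the two methods also differ in how they treat dtypes. iterrows() packs each row into a single Series, so mixed columns get upcast to a common type, while itertuples() keeps each value as it comes from its own column. A minimal sketch, assuming a frame with one integer and one float column (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11], 'c2': [1.5, 2.5]})

# iterrows(): the row is a Series with one dtype for all values,
# so the integer from c1 is upcast to a float.
_, first_series = next(df.iterrows())
print(type(first_series['c1']), first_series['c1'])

# itertuples(): the row is a namedtuple and c1 stays an integer.
first_tuple = next(df.itertuples(index=False))
print(type(first_tuple.c1), first_tuple.c1)
```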