I’m looking for a clean way to create an empty pandas DataFrame and fill it iteratively for a time series use case. I want the DataFrame to start with predefined columns like A, B, and C, indexed by a range of dates. Initially, all values can be set to 0 or NaN.
From there, I’d like to populate the DataFrame row by row, using logic like df.loc[today, 'A'] = df.loc[yesterday, 'A'] + 1.
Here’s a snippet of what I’m currently doing (using valdict and separate Series), but it feels clunky and not as “pandas-native” as it could be. Is there a more elegant or idiomatic way to create an empty DataFrame and handle row-wise operations for this kind of setup?
I’ve worked a lot with time-indexed data, and for time series, pre-structuring your DataFrame makes life easier and keeps performance steady. I usually go with this setup when I want to create an empty DataFrame and fill it row by row:
import pandas as pd
dates = pd.date_range(start="2024-01-01", periods=10)
# Pre-build the full frame with a numeric fill value (use np.nan instead of 0 if you prefer)
df = pd.DataFrame(0, index=dates, columns=["A", "B", "C"])
# Each day's value depends on the previous day's value
for i in range(1, len(df)):
    today = df.index[i]
    yesterday = df.index[i - 1]
    df.loc[today, "A"] = df.loc[yesterday, "A"] + 1
This avoids the cost of growing the DataFrame inside a loop, which pandas handles inefficiently because adding a row typically means reallocating and copying the existing data. Pre-indexing keeps your temporal logic tight and predictable.
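For the NaN-initialized variant, a minimal sketch might look like the following. It assumes you want float columns, so that cells you have not filled yet stay visible as NaN; the five-day range and the seed value of 0.0 are only for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=5)

# Preallocate with NaN; dtype=float keeps the columns numeric rather than object.
df = pd.DataFrame(np.nan, index=dates, columns=["A", "B", "C"], dtype=float)

# Seed the first row, then apply the recurrence to the remaining rows.
df.loc[dates[0], "A"] = 0.0
for yesterday, today in zip(df.index[:-1], df.index[1:]):
    df.loc[today, "A"] = df.loc[yesterday, "A"] + 1

filled = df["A"].tolist()
# B and C were never written, so all of their cells are still NaN.
unfilled = int(df[["B", "C"]].isna().sum().sum())
```

Starting from NaN rather than 0 makes it easier to tell a cell that was never computed from one whose computed value happens to be zero.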
@shashank_watak’s approach is solid and I’ve used it many times too. But if you’re trying to scale or keep things cleaner, I usually take it one step further: instead of using .loc with a loop, you can go fully vectorized. Still starting from a pre-built DataFrame, here’s how I’d tweak it:
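import numpy as np
import pandas as pd
dates = pd.date_range("2024-01-01", periods=10)
df = pd.DataFrame(0, index=dates, columns=["A", "B", "C"])
# A[today] = A[yesterday] + 1, starting from 0, is a running count: 0, 1, 2, ...
# so the whole column can be assigned in one step
# (for non-constant increments, use .cumsum() on the increments instead)
df["A"] = np.arange(len(df))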
This produces the same 'A' column as the loop above, with each value equal to the previous day’s value plus one, but it’s more concise and leverages pandas’ strength: vectorization. It’s a good fit when you’re working with thousands of timestamps and want to avoid manual iteration altogether.
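When the daily increment is not constant, the same idea can be sketched with a cumulative sum. A single shift(1) only reads the values already in the column, so it cannot carry a result forward from row to row; cumsum() can. The deltas values below are made up for illustration, and the first entry is assumed to double as the starting value.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6)

# Hypothetical per-day increments; the first entry doubles as the starting value.
deltas = pd.Series([5, 2, 1, 3, 0, 4], index=dates)

# Loop version of A[today] = A[yesterday] + deltas[today]
looped = pd.Series(0, index=dates, dtype="int64")
looped.iloc[0] = deltas.iloc[0]
for i in range(1, len(dates)):
    looped.iloc[i] = looped.iloc[i - 1] + deltas.iloc[i]

# Vectorized version: a running total of the increments
vectorized = deltas.cumsum()

same = looped.equals(vectorized)
```

Checking the vectorized column against a plain loop on a small sample like this is a cheap way to confirm the two really agree before dropping the loop.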
I’ve run into cases where you’re not working with all the data at once, say because you’re consuming it in chunks or in real time. In those situations, I don’t even start with a full DataFrame. Instead, I collect one Series per row in a list and combine them all at once at the end. It’s the same goal of ending up with a structured DataFrame, just built from the bottom up:
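import pandas as pd
date_range = pd.date_range("2024-01-01", periods=10)
rows = []
for t in date_range:
    # Previous row's 'A', or 0 before any row exists
    prev = rows[-1]["A"] if rows else 0
    # name=t becomes this row's index label in the final DataFrame
    row = pd.Series({"A": prev + 1, "B": 0, "C": 0}, name=t)
    rows.append(row)
# Build the DataFrame once, after all rows are collected
df = pd.DataFrame(rows)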
This avoids the overhead of repeated .loc calls and keeps things flexible, which is ideal when your data’s trickling in. At the end, you still end up with a clean, structured DataFrame, built row by row.
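If the data arrives in chunks rather than single rows, the same pattern might be sketched with pd.concat. Here make_chunk is a hypothetical stand-in for whatever delivers your data, and the chunk size and start dates are assumptions made for the example.

```python
import pandas as pd

def make_chunk(start, periods, first_value):
    """Hypothetical data source: returns a few consecutive daily rows."""
    idx = pd.date_range(start, periods=periods)
    values = list(range(first_value, first_value + periods))
    return pd.DataFrame({"A": values, "B": 0, "C": 0}, index=idx)

chunks = []
last = 0  # last 'A' value seen so far, carried across chunks
for start in ["2024-01-01", "2024-01-04", "2024-01-07"]:
    chunk = make_chunk(start, periods=3, first_value=last + 1)
    last = int(chunk["A"].iloc[-1])
    chunks.append(chunk)

# Combine once at the end instead of growing a DataFrame inside the loop.
df = pd.concat(chunks)
```

Only the last value needs to be carried between chunks, so nothing has to be looked up in a partially built DataFrame while data is still arriving.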