How can I correctly plot a time series in Python?
I have been trying to plot a time series graph from a CSV file. I’ve successfully read the file and converted the date information from string format to a datetime object using strptime and stored it in a list. However, when I try to plot the data using matplotlib, where the list contains the date information, it plots the dates incorrectly. Instead of recognizing the dates on the x-axis, it appears as a series of dots with values like 2012, May, 31, 19:00 on the y-axis, with x = 1, and so on.
I understand that this is not the correct way of passing date information for plotting. Can someone guide me on how to pass and plot time series data correctly in Python?
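To plot a time series correctly in Python, convert your date strings into datetime objects and pass them to matplotlib as the x values, with your measurements as the y values. If you pass only the list of dates, as in plt.plot(dates), matplotlib treats the dates as y values and uses the positions 0, 1, 2, ... for x, which may be what you are seeing. Given both arguments, matplotlib handles datetime objects natively. Here’s how: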
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
# Assuming you have your dates and values stored in two lists
dates = ['2012-05-31 19:00', '2012-06-01 19:00', '2012-06-02 19:00']
values = [10, 20, 30]
# Convert string dates to datetime objects
dates = [datetime.strptime(date, '%Y-%m-%d %H:%M') for date in dates]
# Plotting the time series
plt.figure(figsize=(10, 5))
plt.plot(dates, values)
# Formatting the x-axis to show the date in a readable format
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.xticks(rotation=45)
plt.show()
This approach uses datetime.strptime to convert strings into datetime objects, which matplotlib natively supports for plotting time series. For better visuals, the x-axis is formatted using mdates.DateFormatter. This makes it easy to handle and display dates correctly.
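Since the question starts from a CSV file, here is a minimal sketch of the same idea using only the standard library for parsing. The column names date and value, the date format, and the helper names load_series and plot_series are assumptions made for illustration; adjust them to match your file.

```python
import csv
import io
from datetime import datetime


def load_series(fileobj, date_format='%Y-%m-%d %H:%M'):
    """Parse a CSV with 'date' and 'value' columns (assumed names).

    Returns two parallel lists: datetime objects and float values.
    """
    dates, values = [], []
    for row in csv.DictReader(fileobj):
        dates.append(datetime.strptime(row['date'], date_format))
        values.append(float(row['value']))
    return dates, values


def plot_series(dates, values):
    """Plot values against dates; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(dates, values)  # dates go first (x), values second (y)
    fig.autofmt_xdate()     # rotate and align the date tick labels
    plt.show()


# In-memory sample standing in for a real file (illustrative data only)
sample = io.StringIO(
    "date,value\n"
    "2012-05-31 19:00,10\n"
    "2012-06-01 19:00,20\n"
)
dates, values = load_series(sample)
```

With a real file you would call load_series(open('your_file.csv', newline='')) and then plot_series(dates, values). Keeping the dates and values in two parallel lists makes it harder to pass the dates as the y data by accident.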
If your data is in a CSV file, pandas makes it much simpler to handle and plot time series data in Python. Here’s an example:
import pandas as pd
import matplotlib.pyplot as plt
# Read CSV file with date column (ensure the 'date' column is parsed as datetime)
df = pd.read_csv('your_file.csv', parse_dates=['date'])
# Plotting the time series
plt.figure(figsize=(10, 5))
plt.plot(df['date'], df['value'])
plt.xticks(rotation=45)
plt.show()
When you use the parse_dates parameter in pandas.read_csv, pandas automatically converts your date column into datetime objects. You can directly use these for plotting with matplotlib. This eliminates the need to manually handle conversions and ensures clean and readable time series plots.
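If the dates in your file are in a format that pandas does not recognize automatically, you can convert the column explicitly and check the result before plotting. The sketch below assumes columns named date and value and uses an in-memory sample in place of a real file.

```python
import io

import pandas as pd

# In-memory sample standing in for a real CSV file (illustrative data only)
csv_text = (
    "date,value\n"
    "2012-05-31 19:00,10\n"
    "2012-06-01 19:00,20\n"
    "2012-06-02 19:00,30\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Convert explicitly, stating the expected format of the strings
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M')

# The 'date' column should now have a datetime64 dtype rather than object
print(df.dtypes)

# Using the dates as the index lets pandas place them on the x-axis
series = df.set_index('date')['value']
```

Calling series.plot() followed by plt.show() then draws the values with the dates on the x-axis. If df.dtypes still reports object for the date column, the conversion did not happen and the plot will not have a proper time axis.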
For advanced date handling or when working with large datasets, numpy’s datetime64 type is an efficient choice. It integrates well with matplotlib for plotting time series data.
import numpy as np
import matplotlib.pyplot as plt
# Convert your string dates to numpy datetime64
dates = np.array(['2012-05-31 19:00', '2012-06-01 19:00', '2012-06-02 19:00'], dtype='datetime64[m]')
values = [10, 20, 30]
# Plotting the time series
plt.figure(figsize=(10, 5))
plt.plot(dates, values)
# Formatting the x-axis
plt.xticks(rotation=45)
plt.show()
Using numpy.datetime64, you can store and manipulate date values more efficiently. This is particularly useful for large datasets, ensuring that plotting remains smooth and accurate.
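As a small illustration of what manipulating datetime64 values can look like, the sketch below builds a few timestamps with vectorized arithmetic and filters them with a boolean mask. The sample dates and values are made up for the example.

```python
import numpy as np

# Start timestamp with minute resolution
start = np.datetime64('2012-05-31T19:00', 'm')

# Three timestamps one day apart, built without a Python loop
dates = start + np.arange(3) * np.timedelta64(1, 'D')
values = np.array([10, 20, 30])

# Spacing between consecutive timestamps, as timedelta64 values
gaps = np.diff(dates)

# Keep only the points on or after 1 June 2012
mask = dates >= np.datetime64('2012-06-01')
recent_dates = dates[mask]
recent_values = values[mask]
```

The filtered arrays can be passed to plt.plot(recent_dates, recent_values) in the same way as the full arrays.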