Matplotlib & Datetimes – Tutorial 04: Grouping & Analysing Sparse Data

Extending the previous tutorial (see here), we now define a data array that holds one value for the event at each datetime. The plot of data vs time now looks like:

The data array is constructed with numpy.random:

data = np.random.randint(10000,size=len(times))

Now, we will modify the example from tutorial 03:

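def group(di):
    # Bin index: seconds since the epoch (UTC), floor-divided by the bin width
    return calendar.timegm(di.timetuple()) // binning

list_of_dates = np.array(times,dtype=datetime.datetime)
grouped_dates = [[datetime.datetime(*time.gmtime(d*binning)[:6]), len(list(g))] for d,g in itertools.groupby(list_of_dates, group)]
# list() so that grouped_dates[0] and grouped_dates[1] can be indexed on Python 3 too
grouped_dates = list(zip(*grouped_dates))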

Instead of counting the occurrences with len(list(g)), we define an analyse function that computes a statistic over the data values belonging to g:

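def group(di):
    # Bin index: seconds since the epoch (UTC), floor-divided by the bin width
    return calendar.timegm(di.timetuple()) // binning

def analyse(gi):
    # Positions in list_of_dates (and therefore in data) of every datetime in this group
    indexes = np.concatenate([np.where(list_of_dates == di)[0] for di in gi])
    return np.mean(data[indexes])

grouped_dates = [[datetime.datetime(*time.gmtime(d*binning)[:6]), analyse(g)] for d,g in itertools.groupby(list_of_dates, group)]
# list() so that grouped_dates[0] and grouped_dates[1] can be indexed on Python 3 too
grouped_dates = list(zip(*grouped_dates))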
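
To make the binning concrete, here is a small sketch that applies the same group key to four hand-picked datetimes. The names sample_times and sample_data are hypothetical stand-ins for times and data, and each value is paired with its datetime through zip rather than looked up by index, so this is a simplified variant of the approach above, not the tutorial's exact code. It assumes Python 3.

```python
import calendar
import datetime
import itertools
import time

binning = 5  # bin width in seconds, as in the tutorial

def group(di):
    # Bin index: whole seconds since the epoch (UTC), floor-divided by the bin width
    return calendar.timegm(di.timetuple()) // binning

# Hypothetical sample: four sorted datetimes with one value each
sample_times = [
    datetime.datetime(2012, 1, 1, 0, 0, 1),
    datetime.datetime(2012, 1, 1, 0, 0, 4),
    datetime.datetime(2012, 1, 1, 0, 0, 6),
    datetime.datetime(2012, 1, 1, 0, 0, 12),
]
sample_data = [10, 30, 50, 70]

# groupby only merges consecutive items with equal keys,
# so the input has to be sorted by time beforehand
binned = []
pairs = zip(sample_times, sample_data)
for key, members in itertools.groupby(pairs, lambda pair: group(pair[0])):
    values = [value for _, value in members]
    # Convert the bin index back to the datetime at which the bin starts
    bin_start = datetime.datetime(*time.gmtime(key * binning)[:6])
    binned.append((bin_start, sum(values) / len(values)))

for bin_start, mean in binned:
    print(bin_start, mean)
```

The first two datetimes fall into the bin starting at 00:00:00 (mean 20.0), while the other two each get their own bin, starting at 00:00:05 and 00:00:10. There is no bar for a 5-second window without events, because groupby never produces an empty group.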

analyse receives the group iterator as its argument. For each datetime in the group, it finds that datetime's index in list_of_dates; the collected indexes then select the matching items in the data array, and their mean is returned. The final plot will look like:

Note that each bar gets a facecolor from the jet colormap, chosen by the bar's value relative to the largest value (using import matplotlib.cm as cm):

ax = plt.subplot(212,sharex=ax)
bars = plt.bar(grouped_dates[0],grouped_dates[1],width=float(binning)/DAY)
for r,bar in zip(grouped_dates[1], bars):
    bar.set_facecolor(cm.jet(float(r)/np.amax(grouped_dates[1])))
    bar.set_alpha(0.5)
ax.xaxis_date()
plt.grid(True)
plt.title('Mean of data per %i seconds binned random datetimes' % binning)
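
The colour lookup in the loop can also be done for all bars at once, since a colormap accepts an array of floats. The sketch below uses a hypothetical means array in place of grouped_dates[1]; it only illustrates the scaling and is not a replacement for the plotting code above.

```python
import numpy as np
import matplotlib.cm as cm

# Hypothetical per-bin means, standing in for grouped_dates[1]
means = np.array([2500.0, 5000.0, 10000.0])

# Scale to the range 0..1 by the largest mean, as the bar loop does
scaled = means / means.max()

# Calling a colormap with an array of floats returns one RGBA row per value
colors = cm.jet(scaled)
print(colors.shape)
```

Each row of colors is an (R, G, B, A) tuple with components between 0 and 1, which is the kind of value set_facecolor accepts for a single bar.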

Voilà !

 

The full code is after the break:

import numpy as np
import matplotlib.pyplot as plt
import datetime, time, calendar
from matplotlib.dates import num2date, DateFormatter
import matplotlib.cm as cm
import itertools

N = 10000
starttime = time.time()
basetimes = sorted(np.random.random(N)*np.random.random(N)*1.0e3+starttime)
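# Take only the first six struct_time fields (year..second); the seventh is the weekday, not microseconds
times = [datetime.datetime(*time.gmtime(a)[:6]) for a in basetimes]
# Add the fractional part of each timestamp back as microseconds
for i, atime in enumerate(times):
    times[i] = atime + datetime.timedelta(microseconds=(basetimes[i]-int(basetimes[i])) * 1e6)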

list_of_dates = np.array(times,dtype=datetime.datetime)
data = np.random.randint(10000,size=len(times))

SECOND = 1
MINUTE = SECOND * 60
HOUR = MINUTE * 60
DAY = HOUR * 24

binning = 5*SECOND

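def group(di):
    # Bin index: seconds since the epoch (UTC), floor-divided by the bin width
    return calendar.timegm(di.timetuple()) // binning

def analyse(gi):
    # Positions in list_of_dates (and therefore in data) of every datetime in this group
    indexes = np.concatenate([np.where(list_of_dates == di)[0] for di in gi])
    return np.mean(data[indexes])

grouped_dates = [[datetime.datetime(*time.gmtime(d*binning)[:6]), analyse(g)] for d,g in itertools.groupby(list_of_dates, group)]
# list() so that grouped_dates[0] and grouped_dates[1] can be indexed on Python 3 too
grouped_dates = list(zip(*grouped_dates))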

#Let's plot !
fig = plt.figure()

ax = plt.subplot(211)
plt.scatter(times,data,alpha=0.1)
ax.xaxis_date()
plt.grid(True)
plt.title('Random datetimes plotted vs their random data values')

ax = plt.subplot(212,sharex=ax)
bars = plt.bar(grouped_dates[0],grouped_dates[1],width=float(binning)/DAY)
for r,bar in zip(grouped_dates[1], bars):
    bar.set_facecolor(cm.jet(float(r)/np.amax(grouped_dates[1])))
    bar.set_alpha(0.5)
ax.xaxis_date()
plt.grid(True)
plt.title('Mean of data per %i seconds binned random datetimes' % binning)

plt.show()
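
One remark on performance: analyse scans the whole list_of_dates once for every datetime, so the cost grows roughly with the square of N. If that becomes too slow, the per-bin mean can be computed directly from the epoch seconds with NumPy. The sketch below is one possible alternative, not the method used in this tutorial; binned_means is a hypothetical helper name, and it assumes non-negative timestamps.

```python
import numpy as np

def binned_means(epoch_seconds, values, binning):
    """Return (bin start in epoch seconds, mean of values) for every non-empty bin."""
    # Same key as group(): whole seconds floor-divided by the bin width
    keys = np.asarray(epoch_seconds, dtype=np.int64) // binning
    # inverse[i] is the position of keys[i] within unique_keys
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return unique_keys * binning, sums / counts

# Small hand-checked example: seconds 1 and 4 share the first 5-second bin
starts, means = binned_means([1, 4, 6, 12], [10, 30, 50, 70], binning=5)
print(starts, means)
```

With the variables of the full code, the call would presumably be binned_means(basetimes, data, binning), and each returned bin start s can be turned back into a datetime with datetime.datetime(*time.gmtime(s)[:6]) as before.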
