To extend the previous tutorial (see here), we define a *data* array that holds a value for the event recorded at each datetime. The plot of *data* vs *time* now looks like:

The *data* array is constructed with numpy.random:

    data = np.random.randint(10000, size=len(times))

Now, we will modify the example from tutorial 03:

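    def group(di):
        # Integer bin number: seconds since the epoch, floor-divided by the bin width.
        return int(calendar.timegm(di.timetuple())) // binning

    list_of_dates = np.array(times, dtype=datetime.datetime)
    grouped_dates = [[datetime.datetime(*time.gmtime(d * binning)[:6]), len(list(g))]
                     for d, g in itertools.groupby(list_of_dates, group)]
    # Transpose into (bin starts, counts); list() keeps the result indexable under Python 3.
    grouped_dates = list(zip(*grouped_dates))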

and instead of taking the number of occurrences with *len(list(g))*, we define an *analyse* function to do some clever stuff on *g*:

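    def group(di):
        # Integer bin number: seconds since the epoch, floor-divided by the bin width.
        return int(calendar.timegm(di.timetuple())) // binning

    def analyse(gi):
        # Positions in list_of_dates of every datetime in this group.
        # np.concatenate also copes with a datetime that occurs more than once.
        indexes = np.concatenate([np.where(list_of_dates == di)[0] for di in gi])
        return np.mean(data[indexes])

    grouped_dates = [[datetime.datetime(*time.gmtime(d * binning)[:6]), analyse(g)]
                     for d, g in itertools.groupby(list_of_dates, group)]
    # Transpose into (bin starts, means); list() keeps the result indexable under Python 3.
    grouped_dates = list(zip(*grouped_dates))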

*analyse* receives the group iterator as its argument. It walks through the group and builds an array of the *indexes* at which each *datetime* of the group sits in *list_of_dates*. This *indexes* array then selects the matching items in the *data* array, and their *mean* is returned. The final plot will look like:
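As a cross-check on the grouping described above, the same per-bin mean can be sketched with the standard library only, by pairing each datetime with its value before grouping so that no index lookup is needed. This is a variant for illustration, not the code used in this tutorial: the names `bin_key`, `mean_per_bin`, `EPOCH` and the sample values are made up, and, like the *groupby* call above, it assumes the datetimes are already sorted.

```python
import calendar
import datetime
import itertools
import statistics

EPOCH = datetime.datetime(1970, 1, 1)


def bin_key(dt, binning):
    """Integer bin number of a naive UTC datetime, for bins of `binning` seconds."""
    return calendar.timegm(dt.timetuple()) // binning


def mean_per_bin(times, values, binning):
    """Return [(bin start, mean of the values in that bin), ...].

    `times` must be sorted: groupby only merges neighbouring items.
    """
    pairs = zip(times, values)
    result = []
    for key, group in itertools.groupby(pairs, key=lambda pair: bin_key(pair[0], binning)):
        bin_start = EPOCH + datetime.timedelta(seconds=key * binning)
        result.append((bin_start, statistics.mean(value for _, value in group)))
    return result


# Made-up sample: four events falling into two 5-second bins.
sample_times = [
    datetime.datetime(2020, 1, 1, 0, 0, 1),
    datetime.datetime(2020, 1, 1, 0, 0, 3),
    datetime.datetime(2020, 1, 1, 0, 0, 7),
    datetime.datetime(2020, 1, 1, 0, 0, 9, 500000),
]
sample_values = [10, 20, 30, 50]

for bin_start, mean in mean_per_bin(sample_times, sample_values, binning=5):
    print(bin_start, mean)
```

Carrying the value along with its datetime trades the array lookup for a single pass over the data, which may matter once the number of events grows.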

Note that we plot the bars with a facecolor proportional to the data value (using import matplotlib.cm as cm):

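    ax = plt.subplot(212, sharex=ax)
    # Date axes count in days, so the bar width is converted from seconds to days.
    # align='edge' starts each bar at its bin start (current matplotlib centres bars by default).
    bars = plt.bar(grouped_dates[0], grouped_dates[1],
                   width=float(binning) / DAY, align='edge')
    for r, bar in zip(grouped_dates[1], bars):
        # Colour each bar by its height relative to the tallest bar.
        bar.set_facecolor(cm.jet(float(r) / np.amax(grouped_dates[1])))
        bar.set_alpha(0.5)
    ax.xaxis_date()
    plt.grid(True)
    plt.title('Mean of data per %i seconds binned random datetimes' % binning)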
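The two numeric conversions in this snippet can be checked in isolation, without matplotlib. Date axes count in days, which is why the bar width is *binning* seconds divided by *DAY*, and the colormap is called with a fraction between 0 and 1, which is why each value is divided by the maximum. A minimal sketch, using the time constants from the full code and made-up bar heights (`means`) purely for illustration:

```python
SECOND = 1
MINUTE = SECOND * 60
HOUR = MINUTE * 60
DAY = HOUR * 24

binning = 5 * SECOND

# Width of one bar in days, the unit used on matplotlib date axes.
width_in_days = float(binning) / DAY

# Hypothetical bar heights (mean of the data in each bin).
means = [2500.0, 5000.0, 10000.0]

# Fraction of the maximum: the number handed to the colormap for each bar.
fractions = [m / max(means) for m in means]

print(width_in_days)
print(fractions)
```

The tallest bar always gets the fraction 1.0, so it takes the colour at the top end of the colormap, whatever the absolute data values are.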

Voilà !

The full code is after the break:

    import numpy as np
    import matplotlib.pyplot as plt
    import datetime, time, calendar
    from matplotlib.dates import num2date, DateFormatter
    import matplotlib.cm as cm
    import itertools

    # Build N sorted random datetimes, clustered shortly after "now".
    N = 10000
    starttime = time.time()
    basetimes = sorted(np.random.random(N) * np.random.random(N) * 1.0e3 + starttime)
    # Whole seconds first ([:6] is year..second), then add the fractional part as microseconds.
    times = [datetime.datetime(*time.gmtime(a)[:6]) for a in basetimes]
    for i, atime in enumerate(times):
        times[i] = atime + datetime.timedelta(microseconds=(basetimes[i] - int(basetimes[i])) * 1e6)
    list_of_dates = np.array(times, dtype=datetime.datetime)

    # One random value per datetime.
    data = np.random.randint(10000, size=len(times))

    SECOND = 1
    MINUTE = SECOND * 60
    HOUR = MINUTE * 60
    DAY = HOUR * 24
    binning = 5 * SECOND

    def group(di):
        # Integer bin number: seconds since the epoch, floor-divided by the bin width.
        return int(calendar.timegm(di.timetuple())) // binning

    def analyse(gi):
        # Positions in list_of_dates of every datetime in this group.
        # np.concatenate also copes with a datetime that occurs more than once.
        indexes = np.concatenate([np.where(list_of_dates == di)[0] for di in gi])
        return np.mean(data[indexes])

    grouped_dates = [[datetime.datetime(*time.gmtime(d * binning)[:6]), analyse(g)]
                     for d, g in itertools.groupby(list_of_dates, group)]
    # Transpose into (bin starts, means); list() keeps the result indexable under Python 3.
    grouped_dates = list(zip(*grouped_dates))

    # Let's plot!
    fig = plt.figure()
    ax = plt.subplot(211)
    plt.scatter(times, data, alpha=0.1)
    ax.xaxis_date()
    plt.grid(True)
    plt.title('Random datetimes plotted vs their random data values')

    ax = plt.subplot(212, sharex=ax)
    # Date axes count in days, so the bar width is converted from seconds to days.
    # align='edge' starts each bar at its bin start (current matplotlib centres bars by default).
    bars = plt.bar(grouped_dates[0], grouped_dates[1],
                   width=float(binning) / DAY, align='edge')
    for r, bar in zip(grouped_dates[1], bars):
        # Colour each bar by its height relative to the tallest bar.
        bar.set_facecolor(cm.jet(float(r) / np.amax(grouped_dates[1])))
        bar.set_alpha(0.5)
    ax.xaxis_date()
    plt.grid(True)
    plt.title('Mean of data per %i seconds binned random datetimes' % binning)
    plt.show()