## 1 Introduction: A Space of Timing Models

Reaction times are a rich form of data that has been widely used to understand how humans and other animals make simple, two-alternative perceptual discriminations, and how they time intervals. However, statistical techniques for analyzing reaction time data have been developed more extensively in non-temporal, perceptual decision research than in timing research. This chapter describes how to apply reaction time (RT) analysis techniques from non-temporal decision research to the temporal domain, and it describes Matlab code that can be used to implement these analyses (see book’s GitHub repository).

In order to motivate the application of these techniques for interval timing research, it helps first to consider a simple class of computational models of timing, the pacemaker-accumulator (PA) models (Creelman, 1962; Gibbon & Church, 1984; Killeen & Fetterman, 1988; Treisman, 1963). Simulating these models is interesting in its own right. For the purpose of this chapter, though, simulation is merely a tool that allows researchers to generate “fake” RT data with which they can compare their empirical data, test their intuitions, and debug their analysis code.

Importantly, starting with a simple, classic family of timing models also provides an example of how RT analysis can be used to select the “best” model within a class. Model selection is currently a topic of great interest in non-temporal decision research. Furthermore, starting with simulations may also help explain why the techniques used in non-temporal decision research have so far not reached the same level of use in timing research. That is, in many cases the sophistication of these techniques may seem to be wasted on timing data, which typically have fewer degrees of freedom than decision data, as we will describe shortly. However, some examples from the literature illustrate that RT analysis can, contrary to this view, be extremely helpful in assessing models of timed behavior (e.g., Balci & Simen, 2014; Simen, Vlasov, & Papadakis, 2016).

### 1.1 Fixed Clock-speed Pacemaker-accumulators

We now consider two different types of timing model and ways to simulate them, thereby producing fake RT data. We can then apply the RT analysis methods described in the final section of this chapter to test whether these methods can correctly determine which model best fits the fake data. This is an important step to undertake before applying the methods to real RT data from an experiment. If these methods cannot accurately determine which model generated the data even when we already know the answer, then they will not be useful in the real world. For this reason, model simulation should always be the first step in vetting a model-fitting procedure. We will restrict our attention in this chapter to the most basic and most classical type of timing model: the PA.

PAs are just about the simplest possible model of timing. They operate like a stopwatch, which counts up regular or irregular clock ticks until hitting some threshold either for making a decision (such as whether the current interval is longer or shorter than another interval in memory) or for producing a response (such as pressing a lever to earn a reinforcement). We will refer to these two possibilities as decision and production tasks, respectively. There are many possible variations on the basic PA idea, however, and different variations make different predictions, especially about production times. RT analysis techniques will therefore be useful in teasing these models apart.

Creelman (1962) developed an early PA model in which a source of clock pulses emits pulses randomly, at a fixed rate. The time between pulses is exponentially distributed, which makes the pulse count a Poisson counting process. Since this basic process is central to several of the models for which we provide Matlab code, we begin with it. *Poisson_countermodel.m* contains the code in its entirety (see book’s GitHub repository). It generates exponentially distributed random inter-pulse intervals, then adds them up, and checks when their sum has exceeded a threshold.
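The steps just described can be sketched in a few lines. The following is a minimal illustration and not the contents of *Poisson_countermodel.m*; the parameter values and variable names (rate, T, theta, n_trials) are assumptions chosen for the example, and exponential intervals are drawn with the inverse-CDF method so that no toolbox is required.

```matlab
% Poisson counter sketch (illustrative; not the contents of Poisson_countermodel.m).
% Assumed parameters: pulse rate, timed duration, and number of trials.
rate     = 50;      % pacemaker rate (pulses per second)
T        = 2;       % duration to be timed (s)
n_trials = 5000;    % number of simulated trials

% Exponential inter-pulse intervals via the inverse-CDF method; allocate far
% more pulses per trial than an interval of length T is likely to contain.
max_pulses  = ceil(3 * rate * T);
ipi         = -log(rand(n_trials, max_pulses)) / rate;
pulse_times = cumsum(ipi, 2);            % running sum = time of each pulse

% Decision-style readout: the pulse count accumulated by time T.
counts = sum(pulse_times <= T, 2);

% Production-style readout: the time at which a fixed count threshold is hit.
theta = round(rate * T);                 % threshold = expected count at T
rt    = pulse_times(:, theta);           % time of the theta-th pulse

fprintf('mean count = %.1f (expected %.1f)\n', mean(counts), rate * T);
fprintf('mean rt = %.3f s, cv = %.3f (expected %.3f)\n', ...
        mean(rt), std(rt) / mean(rt), 1 / sqrt(theta));
```

With a fixed rate, the production times follow a gamma distribution whose CV is 1/sqrt(theta), so the CV shrinks as the timed duration grows.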

An important variation on this model allows for changes in pacemaker rate across trials, as well as variability in the pulse-count threshold across trials. Gibbon and Church (1990) considered how these variations could resolve a problem with Creelman’s model, namely its conflict with a widely observed phenomenon in timing known as ‘scalar invariance’. This benchmark phenomenon in interval timing research is one in which the standard deviation of remembered interval durations appears to grow linearly with the timed duration, whereas in a fixed-rate Poisson counter the standard deviation grows only with the square root of the timed duration. The scalar pattern yields a constant ratio of the standard deviation divided by the mean, a ratio known as the coefficient of variation (CV). Indeed, in production tasks, the production time distributions for different durations frequently superimpose when the data from different duration-conditions are divided by their respective means.

Treisman (1963) developed an early variant of this PA model that did not make specific assumptions about the distribution of clock pulses except for a key constraint on their statistics. The constraint is that the inter-pulse intervals are mostly relatively short in some trials, compared to the global average over trials, and mostly relatively long in others. That is, the clock speed varies from trial to trial (but always around a fixed average). Another way to say it is that, across trials, the *i*th inter-pulse interval following the clock-start is correlated with the *j*th inter-pulse interval in the same trial, for all intervals *i* and *j* occurring prior to the end of the timed duration. Simulations of this model demonstrate that it takes only a remarkably small amount of such correlation (i.e., variation in the pacemaker rate) across trials to recover the pattern of constant CVs. *Treisman63.m* contains code (see book’s GitHub repository) that allows any desired distribution of inter-pulse intervals to be simulated while still observing the constraint. Part of what the code produces is a correlation matrix for selecting random inter-pulse intervals with the specified level of correlation. As users can see for themselves, correlations can be reduced nearly to 0 in a Creelman-style PA model while still yielding constant CVs.
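The effect of trial-to-trial clock-speed variation on the CV can be explored with a rough sketch like the one below. It does not reproduce the correlation-matrix approach of *Treisman63.m*; instead it swaps in a simpler device, a single random rate multiplier per trial, which is one way of making all inter-pulse intervals within a trial positively correlated. The multiplier distribution, its standard deviation (rate_sd), and the other parameter values are assumptions made for illustration.

```matlab
% Sketch: trial-to-trial clock-speed variability and the CV (illustrative
% assumptions throughout; this is not the algorithm in Treisman63.m).
base_rate = 100;            % average pacemaker rate (pulses per second)
rate_sd   = 0.15;           % assumed SD of the per-trial rate multiplier
durations = [1 2 4 8];      % timed durations (s)
n_trials  = 2000;

cv_fixed    = zeros(size(durations));
cv_variable = zeros(size(durations));
for d = 1:numel(durations)
    theta = round(base_rate * durations(d));   % count threshold for this duration
    % Time of the theta-th pulse at unit rate: a sum of theta exponential intervals.
    unit_time = sum(-log(rand(n_trials, theta)), 2);
    % Fixed clock speed: the same rate on every trial.
    rt_fixed = unit_time / base_rate;
    % Variable clock speed: one multiplier per trial, bounded away from zero.
    multiplier  = max(0.2, 1 + rate_sd * randn(n_trials, 1));
    rt_variable = unit_time ./ (base_rate * multiplier);
    cv_fixed(d)    = std(rt_fixed) / mean(rt_fixed);
    cv_variable(d) = std(rt_variable) / mean(rt_variable);
end
% Rows: durations, CV with fixed clock speed, CV with variable clock speed.
disp([durations; cv_fixed; cv_variable]);
```

Under these assumptions the fixed-rate CV falls steadily as the duration grows, while the variable-rate CV stays much closer to constant.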

### 1.2 Variable Clock-speed Pacemaker-accumulators

The models in the preceding section used a pacemaker that emits random pulses at a constant rate on average, and stores different pulse-count totals to encode different durations *T*. In contrast, the models in this section use a fixed pulse-count threshold (call it *θ*) and a variable pacemaker rate, *A* = *θ*/*T*. When the pulses are purely excitatory, the model is equivalent to Killeen and Fetterman’s (1988) Behavioral Theory of Timing (BeT) model (although the “pulses” in that case are interpreted as transitions between states of behavior). When we add negative pulses emitted at a rate proportional to the positive pulse emission, the resulting process closely approximates a diffusion process (Simen et al., 2013). Such processes are idealized, one-dimensional Brownian motion systems, in which a particle (equivalent to a pulse-count) drifts upward while being continuously perturbed by noise at every moment. This kind of perturbation (in three dimensions) accounts for the diffusion of particles in a liquid or gas over time. One-dimensional drift-diffusion models (DDMs) are leading models of two-alternative perceptual decision making, where the “particle position” represents something like the log odds ratio of one hypothesis vs. the alternative. In either case, BeT or the opponent-pulse diffusion model, *opponent_Poisson_Appendix3.m* explicitly compares the simulation of actual pulse-counting to a diffusion approximation of pulse-counting (specifically, the Euler-Maruyama method, in which spike counting *per se* does not occur). The primary computational difference between these approaches is that in the former, a sequence of pulse-times is generated and then summed; in the latter, properly scaled deterministic and random noise increments are added to a running sum at every one of a sequence of time steps. The primary theoretical difference is that the pulses are finite in number and spaced out at separate points in time in the former, while they are idealized as occurring infinitely often at every moment in the latter. Although the latter theoretical assumption of the diffusion model is not likely to be true in reality, it is worth noting that this assumption allows some very simple approximate mathematical expressions to be used to fit RT distributions.
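The computational contrast between the two approaches can be sketched as follows. This is a minimal illustration under assumed parameter values, not the contents of *opponent_Poisson_Appendix3.m*. It assumes independent excitatory and inhibitory Poisson pulse streams, so that the net count has drift equal to the difference of the two rates and variance per second equal to their sum; the names rate_exc, rate_inh, theta, and dt are hypothetical.

```matlab
% Opponent Poisson pulse counting vs. an Euler-Maruyama diffusion approximation.
% All parameter values are illustrative assumptions.
theta    = 40;     % pulse-count threshold
rate_exc = 60;     % excitatory pulse rate (pulses per second)
rate_inh = 20;     % inhibitory pulse rate (pulses per second)
n_trials = 2000;

drift    = rate_exc - rate_inh;   % net rate A; expected time is theta / A = 1 s
var_rate = rate_exc + rate_inh;   % variance of the net count per second

% (1) Explicit pulse counting: generate pulse times and sum signed pulses.
rt_pulse = zeros(n_trials, 1);
p_exc = rate_exc / var_rate;      % probability that the next pulse is excitatory
for i = 1:n_trials
    count = 0;
    t = 0;
    while count < theta
        t = t - log(rand) / var_rate;   % exponential wait until the next pulse
        if rand < p_exc
            count = count + 1;
        else
            count = count - 1;
        end
    end
    rt_pulse(i) = t;
end

% (2) Euler-Maruyama: add drift and scaled Gaussian noise at every time step.
dt      = 0.001;                  % time step (s)
x       = zeros(n_trials, 1);     % running sums, one per trial
rt_diff = nan(n_trials, 1);
running = true(n_trials, 1);      % trials that have not yet reached threshold
t = 0;
while any(running)
    t = t + dt;
    n_run = sum(running);
    x(running) = x(running) + drift * dt + sqrt(var_rate * dt) * randn(n_run, 1);
    crossed = running & (x >= theta);
    rt_diff(crossed) = t;
    running(crossed) = false;
end

fprintf('pulse counting: mean = %.3f s, cv = %.3f\n', ...
        mean(rt_pulse), std(rt_pulse) / mean(rt_pulse));
fprintf('diffusion:      mean = %.3f s, cv = %.3f\n', ...
        mean(rt_diff), std(rt_diff) / mean(rt_diff));
```

The two sets of simulated times should have similar means and CVs; the remaining differences reflect sampling noise and the finite time step of the diffusion approximation.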

## 2 Reaction Time Analysis Methods

Built-in Matlab functions that are particularly useful for reaction time analysis include sorting functions (sort), plotting functions (plot, hist, ksdensity), and many of the features of the Matlab Statistics Toolbox (this toolbox is an add-on product that comes with its own licensing fee). These functions are incorporated into several of the functions accompanying this chapter.

Growing numbers of researchers these days are also opting for alternatives to Matlab, such as the free, open-source, Python programming and scripting language, or the statistical programming language, R, in addition to open-source versions of Matlab such as Octave. Python, for example, operates in many ways like Matlab, particularly when the NumPy and SciPy numerical and scientific computing packages and the Matplotlib graphics packages are imported (see, e.g., Anaconda, for a complete development environment for scientific computing in Python). Here, however, we restrict our attention to programs in Matlab. The primary virtue of Matlab, in the author’s opinion, is that, although it must be purchased and renewed yearly for updates, the software is stable and platform-independent across Mac, Windows, and Linux computing platforms; documentation for each function is generally trustworthy and well organized; and there are fewer of the problems that seem to accompany open-source software (e.g., the confusion surrounding multiple, slightly different parallel versions – e.g., Python 2 vs. Python 3 – and updates that frequently break functioning software until workarounds are developed). Whether those features are worth the cost is a matter of personal opinion.

### 2.1 Using Moments to Characterize the Data and Evaluate Conformance to the Scalar Invariance Pattern

The shape of a probability distribution can be characterized by its moments: that is, by the mean, variance, skewness, kurtosis, etc. Mathematically, the *n*th central moment is defined as the expected value of the *n*th power of the deviation of the data from the mean (when such an expectation exists). In the case of skewness and kurtosis, the third and fourth central moments are normalized through division by the third and fourth power of the standard deviation, respectively. In Matlab, each of these moments can be computed with a single function call. For this, we will assume that the reaction time values are contained in a vector having n elements in the variable rt. With such a variable, we can compute:
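A minimal sketch of such calls is given here. The fake data in the first line are an assumption made only so that the example runs on its own, and skewness and kurtosis are assumed to be available (in Matlab they belong to the Statistics Toolbox).

```matlab
% Illustrative fake data: shifted exponential RTs in seconds (replace with real data).
rt = 0.3 - 0.2 * log(rand(1000, 1));

m  = mean(rt);        % mean
v  = var(rt);         % variance
sk = skewness(rt);    % third standardized moment (Statistics Toolbox)
ku = kurtosis(rt);    % fourth standardized moment (Statistics Toolbox)
```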

Documentation for any of these can be easily obtained at the Matlab command prompt by typing, e.g.:
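For instance, assuming the skewness function is installed, the following command prints its help text in the command window:

```matlab
help skewness
```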

In the literature on timing, the most famous phenomenon that can be observed by measuring RTs is the constancy of relative temporal precision (Gibbon, 1977), that is, the constant ratio between the standard deviation and the mean of production time distributions. In studies that require animals to learn to press a lever after a delay of *T* seconds from a stimulus to obtain a reward, production times are typically found to be accurate: the average of the production time distribution is close to *T*. The precision of the production time across trials, however, decreases as *T* increases. Such a relationship can be captured by computing the CV, which is the standard deviation (square root of the variance) divided by the mean of the response times contained in the vector variable rt. In Matlab, the CV can be obtained with:
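A minimal sketch, using a handful of made-up production times so that the example is self-contained:

```matlab
rt = [0.92 1.10 1.05 0.88 1.21 0.97 1.02 1.15];   % illustrative production times (s), not real data
cv = std(rt) / mean(rt);                          % coefficient of variation
```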

CVs are often found to be roughly constant as a function of *T* across groups of participants. Violations of CV constancy are often observed, however, if multiple durations *T* are mixed in a single experiment, with both human and non-human participants (Bizo, Chu, Sanabria, & Killeen, 2006).

### 2.2 Skewness and Model Selection

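Skewness, the third standardized moment, captures how asymmetric a distribution is. Gaussian distributions (a shape that is often used to fit production time distributions) are symmetric and therefore have zero skewness. In contrast, the gamma distribution produced by a BeT-style pulse counter and the inverse Gaussian distribution produced by a DDM are skewed to the right, with skewness equal to 2 and 3 times the CV, respectively.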

It should be noted that when the CV is small, skewness is, therefore, also expected to be small. Thus, if conditions can be created that generate large CVs, such as by drawing attentional resources away from the primary task, then skewness should be expected to increase for models such as BeT and the DDM, but not for models that predict a normal distribution of RTs, such as the information processing implementation (Gibbon, Church, & Meck, 1984) of scalar expectancy theory (SET; Gibbon, 1977).
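The link between CV and skewness can be checked numerically. The sketch below assumes a BeT-style counter whose production time is the sum of theta exponential inter-pulse intervals, which follows a gamma distribution with skewness equal to twice the CV, and compares it with normally distributed times of matching CV. The thresholds and sample size are illustrative assumptions, and skewness is computed by hand so that no toolbox is needed.

```matlab
% Skewness versus CV for gamma (pulse-counter) and normal production times.
thetas   = [4 9 25];       % assumed pulse-count thresholds
n_trials = 50000;

cv_gamma = zeros(size(thetas));
sk_gamma = zeros(size(thetas));
sk_norm  = zeros(size(thetas));
for i = 1:numel(thetas)
    % Gamma-distributed times: sum of thetas(i) unit-rate exponential intervals.
    rt  = sum(-log(rand(n_trials, thetas(i))), 2);
    dev = rt - mean(rt);
    cv_gamma(i) = std(rt) / mean(rt);
    sk_gamma(i) = mean(dev.^3) / mean(dev.^2)^1.5;
    % Normal times with the same mean and CV for comparison.
    rt_n  = mean(rt) * (1 + cv_gamma(i) * randn(n_trials, 1));
    dev_n = rt_n - mean(rt_n);
    sk_norm(i) = mean(dev_n.^3) / mean(dev_n.^2)^1.5;
end
% Rows: CV, gamma skewness, ratio of skewness to CV, normal skewness.
disp([cv_gamma; sk_gamma; sk_gamma ./ cv_gamma; sk_norm]);
```

As the CV grows, the skewness of the gamma-distributed times grows with it, while the skewness of the normal times stays near zero.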

In summary, skewness is worth computing as it helps in model selection. The skewness function in Matlab’s Statistics Toolbox can be used in this way. The toolbox also includes helpful functions for assessing normality or deviations from normality in a set of RT data in variable rt (e.g., normplot(rt)).

### 2.3 Quantiles and Timescale Invariance

In Section 2.1, we discussed how the CV tends to remain the same for different base intervals. Another famous feature of timing data that implies constant CVs, but is a stronger form of invariance, is called scalar invariance or timescale invariance. This is a phenomenon in which the entire distribution of RTs can be rescaled, through division by the mean, so that RT distributions with different means superimpose on each other perfectly after rescaling.

A good test of this form of invariance is to examine a feature of the RT data that is widely used in 2AFC research. To obtain a compact empirical description of the distribution, the RTs are divided into quantiles. For example, after the RTs are sorted from fastest to slowest, the 10th, 30th, 50th, 70th, and 90th percentiles can be estimated as the RTs below which 10%, 30%, 50%, 70%, and 90% of the sorted RTs fall, respectively. If there is scalar invariance, the quantiles from one duration condition should fall on a straight line when plotted against the corresponding quantiles from another condition.

The following Matlab commands compute the quantiles of the reaction time data stored in the vector variable rt: they sort the data, determine how many trials there are, and then read the quantiles off the sorted values:
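One way to carry out these three steps is sketched below. The fake data and the names sorted_rt, probs, and idx are assumptions made for illustration; the rounding rule used to pick each quantile is a simple choice, and other conventions give slightly different values.

```matlab
rt = 1 + 0.2 * randn(1000, 1);    % illustrative fake RTs (s); replace with real data

sorted_rt = sort(rt);             % sort from fastest to slowest
n = length(sorted_rt);            % determine how many trials there are
probs = [0.1 0.3 0.5 0.7 0.9];    % quantile probabilities
idx = max(1, round(probs * n));   % rank of the RT that closes each quantile
quantiles = sorted_rt(idx);       % the five RT quantiles
```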

Supposing that there are quantiles from a different data set stored in the variable quantiles2, the two RT distributions can be examined for timescale invariance by plotting their quantiles against each other:
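A self-contained sketch of such a plot follows. The two fake data sets are assumptions constructed to be timescale invariant: both are drawn from the same unit-mean gamma shape and then scaled to means of 1 s and 2 s.

```matlab
probs = [0.1 0.3 0.5 0.7 0.9];
n     = 2000;      % trials per condition (assumed)
shape = 16;        % number of exponential stages in the fake gamma data (assumed)

% Fake scalar-invariant data: same shape, means of 1 s and 2 s.
rt1 = 1 * sum(-log(rand(n, shape)), 2) / shape;
rt2 = 2 * sum(-log(rand(n, shape)), 2) / shape;

sorted1 = sort(rt1);
sorted2 = sort(rt2);
quantiles  = sorted1(round(probs * n));
quantiles2 = sorted2(round(probs * n));

plot(quantiles, quantiles2, 'o-');
xlabel('RT quantiles, condition 1 (s)');
ylabel('RT quantiles, condition 2 (s)');
```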

These plot points should form nearly a straight line, with slope equal to the ratio of the average of the timed durations in the two different conditions. Additional analyses based on linear regression can be computed and the best fitting line can be superimposed on the data (a quick but incomplete way of doing this is to go to the Tools menu of the figure window in Matlab, select the Basic Fitting menu option, and then select “linear” in the popup dialog box that appears).
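The line fit can also be done at the command prompt with polyfit, as in the sketch below. The quantile values are made up for illustration (they are not real data) and are chosen so that the second set is roughly twice the first.

```matlab
quantiles  = [0.70 0.85 0.98 1.12 1.33];   % illustrative values (s), not real data
quantiles2 = [1.41 1.69 1.97 2.25 2.65];   % illustrative values (s), not real data

coeffs = polyfit(quantiles, quantiles2, 1);   % coeffs(1) = slope, coeffs(2) = intercept
fitted = polyval(coeffs, quantiles);          % points on the best-fitting line

plot(quantiles, quantiles2, 'o', quantiles, fitted, '-');
xlabel('RT quantiles, condition 1 (s)');
ylabel('RT quantiles, condition 2 (s)');
```

Under timescale invariance, the slope should be close to the ratio of the two mean durations and the intercept close to zero.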

### 2.4 Maximum Likelihood Fitting of Distributions to RT Data

The most sophisticated method for testing model predictions is to fit model parameters to empirical RT distributions. In two-alternative perceptual decision research, this approach is powerful and widely used. Despite having a similar goal, different fitting methods vary greatly in terms of their computational speed, their robustness (resilience to data not actually produced by the process under investigation, but instead delayed by distraction, for example), and the amount of data they require for fitting accurately. For this reason, a large amount of work is devoted to comparing the fitting methods and trying to find new ones.

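Maximum likelihood fitting is a standard technique for fitting models to data in many areas. The *likelihood function* is the point-by-point product of a model’s probability density evaluated at each observed RT, viewed as a function of the model parameters; maximum likelihood fitting selects the parameter values that maximize it. Code for this kind of fitting is contained in *fit_models.m* (see book’s GitHub repository). It is relatively concise, because the fitdist function hides a large degree of complexity.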
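To make the idea concrete without relying on fitdist, the sketch below fits two distributions whose maximum likelihood estimates have closed forms, the inverse Gaussian and the normal, and compares their log likelihoods. It is an illustration under assumed parameter values and is not the code in *fit_models.m*. The fake inverse Gaussian data are generated with a standard transformation method, and all variable names are hypothetical.

```matlab
% Maximum likelihood sketch for two closed-form RT distributions.
mu_true     = 1.0;     % assumed true mean (s)
lambda_true = 20;      % assumed true inverse Gaussian shape parameter
n           = 5000;    % number of fake RTs

% Fake inverse Gaussian RTs (transformation of squared normal variates).
y = randn(n, 1).^2;
x = mu_true + (mu_true^2 * y) / (2 * lambda_true) ...
    - (mu_true / (2 * lambda_true)) * sqrt(4 * mu_true * lambda_true * y + mu_true^2 * y.^2);
u  = rand(n, 1);
rt = x;
swap = u > mu_true ./ (mu_true + x);       % choose the other root for these samples
rt(swap) = mu_true^2 ./ x(swap);

% Inverse Gaussian: closed-form maximum likelihood estimates and log likelihood.
mu_hat     = mean(rt);
lambda_hat = n / sum(1 ./ rt - 1 / mu_hat);
ll_ig = sum(0.5 * log(lambda_hat ./ (2 * pi * rt.^3)) ...
            - lambda_hat * (rt - mu_hat).^2 ./ (2 * mu_hat^2 * rt));

% Normal: ML estimates are the sample mean and the SD normalized by n.
sigma_hat = std(rt, 1);
ll_norm = sum(-0.5 * log(2 * pi * sigma_hat^2) ...
              - (rt - mu_hat).^2 / (2 * sigma_hat^2));

fprintf('inverse Gaussian: mu = %.3f, lambda = %.2f, ll = %.1f\n', mu_hat, lambda_hat, ll_ig);
fprintf('normal:           mu = %.3f, sigma  = %.3f, ll = %.1f\n', mu_hat, sigma_hat, ll_norm);
```

Because the fake data were generated from an inverse Gaussian, that model should achieve the higher log likelihood. With the Statistics Toolbox, a call such as fitdist(rt, 'InverseGaussian') performs the same kind of fit, and negloglik applied to the returned object gives the negative log likelihood.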

The maximum likelihood method is particularly useful in timing research, because PA models produce RT distributions with simple, closed-form expressions, such as the normal, gamma, or inverse Gaussian distributions. The most common distributions are built into Matlab’s Statistics Toolbox, allowing samples to be drawn easily. Moreover, because of the closed-form expression, the fit to the data can be done by simply adjusting the parameters of the distribution and finding the best fit, rather than by sampling the data and simulating the model output as in the case of models with no closed-form expression. For models without a known, closed-form RT distribution, by contrast, a computation-intensive process is required to evaluate the likelihood function for the data, so that maximum likelihood methods are frequently outperformed by methods such as the chi-square method (Ratcliff & Tuerlinckx, 2002). Furthermore, new methods such as hierarchical Bayesian methods (e.g., Wiecki, Sofer, & Frank, 2013) are becoming increasingly interesting to researchers, given their ability to fit small amounts of data and to infer population-level parameters efficiently. For example, they can infer a set of parameter values that represents patients with attention deficit hyperactivity disorder versus a set of values that represents neurotypical control participants. We do not go into those methods here, since they are at the forefront of development in the two-alternative decision domain, and have not yet (to my knowledge) been widely used in fitting timing data.

### 2.5 Model Complexity

Maximum likelihood fitting offers one other useful property, which is that likelihood methods can easily be adapted so as to penalize for model complexity. Occam’s Razor is the principle that the simplest explanation of real data should be preferred, all else being equal. When one statistical model has more parameters than another, it has more flexibility to fit a wider range of data patterns. In the extreme, a model that has as many parameters as data points is likely to fit the data perfectly. However, such a model is not at all likely to generalize well to data that has not been fit. Such models are said to overfit observed data, at the risk of failing to fit unobserved data. To combat this risk, tests such as the Akaike Information Criterion (Akaike, 1974), and the Bayesian Information Criterion (Schwarz, 1978), can be used to rule out overly complex models. Both of these criteria combine the logarithm of the maximized likelihood with a penalty whose size depends on the number of parameters. Thus, a model A that fits the data less well, but with fewer parameters than model B, may in the end have a better penalized score. In such a case, we select model A over model B.

If the log likelihood of the data is computed by the fitting methods described in the previous section and stored in variable ll, it can be adjusted for parameter penalties as follows, where the number of parameters in the model is k (for example, Creelman’s model has a pulse rate parameter and a pulse-count threshold parameter, so k = 2):
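In its standard form, the AIC is twice the number of parameters minus twice the log likelihood. A minimal sketch, with an illustrative value of ll standing in for the output of a fit:

```matlab
ll = -120.5;            % illustrative log likelihood; in practice, take it from the fit
k  = 2;                 % number of free parameters in the model
aic = 2 * k - 2 * ll;   % Akaike Information Criterion (smaller is better)
```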

Because the log likelihood is subtracted from the parameter penalty in the standard formulation of the AIC, the goal is to select the model with the minimum AIC score.

The Bayesian Information Criterion is also used for model selection and tends to implement a stronger penalty for parameters:
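In its standard form, the BIC replaces the factor of 2 in the penalty with the logarithm of the number of observations. A minimal sketch, again with illustrative values for ll and for the number of RTs n:

```matlab
ll = -120.5;                 % illustrative log likelihood; in practice, take it from the fit
k  = 2;                      % number of free parameters in the model
n  = 100;                    % number of RTs that were fit
bic = k * log(n) - 2 * ll;   % Bayesian Information Criterion (smaller is better)
```

The per-parameter penalty log(n) exceeds the AIC's penalty of 2 whenever n is 8 or more, which is why the BIC tends to favor simpler models.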

### 2.6 Outlier Treatment

One factor that bedevils RT research is the presence of contaminated data. If a participant does not pay attention during a trial of an experiment, they may issue a response that is far later, or far earlier, than would normally occur. There is a host of different approaches to removing outliers from data, though none can be assured of doing it correctly (see, e.g., Ratcliff, 1993). After all, a very

Here is a simple technique for eliminating outliers that is by no means guaranteed to work perfectly, but that users can adapt to be more or less conservative as they see fit. It is included here primarily to emphasize an incredibly useful technique for selecting RT data according to whether they lie inside or outside an outlier cutoff range.

We can eliminate unusually long RTs by computing the standard deviation of the data, and then keeping only those data that are no larger than some number of standard deviations above the mean, for example 3:
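A minimal sketch of this step, with made-up data containing one very slow response so that the example runs on its own:

```matlab
rt = [linspace(0.4, 0.6, 200)'; 5];   % illustrative RTs (s) plus one 5 s outlier, not real data

cutoff = mean(rt) + 3 * std(rt);      % 3 standard deviations above the mean
rt = rt(rt <= cutoff);                % keep only the RTs at or below the cutoff
```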

The syntax inside the outer parentheses creates a vector of logical 1s and 0s. Only those elements of the rt array that have a logical 1 in the corresponding array created by the <= operation will be assigned to rt. The result is that any rt greater than 3 standard deviations above the mean will be deleted from the array rt.

## 3 Conclusion

The Matlab code that accompanies this chapter (see book’s GitHub repository) is intended to help researchers who are new to RT analysis to begin analyzing their data with the aim of model selection. Arguably the best thing about Matlab is its extensive, easily searchable Help documentation (for Matlab version 2014a or later, type doc at the Matlab command prompt to bring up the documentation viewer). This author learned Matlab simply by progressing through the Matlab documentation and trying out the examples, which are provided in nearly every help topic at the command prompt. The MathWorks website also includes tutorial videos and forums for getting help from other users. After consulting the Matlab tutorial documentation section (called Getting Started With Matlab in version 2014b), I recommend that new users try out some of the functions provided with this chapter, using the debugger in Matlab to step through lines of code one by one to see how variables in memory are changing as a script or function is executing, and to learn how particular Matlab built-in commands are used for RT analysis. Experienced users may wish to use this code as a stepping-off point for investigating methods more

I have not addressed the very useful technique of fitting RTs in retrospective timing tasks, as was done for example in Balci and Simen (2014). In such tasks, estimates of time intervals are used as the inputs to a decision process. Balci and Simen (2014) applied this technique to data from a temporal bisection task, in which intervals are presented and the participant must categorize them as being either closer to a short reference interval, or closer to a long reference interval. Choice probabilities for long and short choices and corresponding RTs in this case are not necessarily directly related to the mechanism of time estimation, but the RT data here provide important information about the mechanism by which temporal discriminations are made after an interval is over. Because the literature on non-temporal, two-alternative decision making covers this type of RT analysis extensively, and because fitting two-choice RT data is more complicated than fitting “1-choice” production times, I do not address such methods here. However, they are really just extensions of the approaches described here. Ratcliff and Tuerlinckx (2002) offer a comprehensive discussion of how model fitting is done, as just one example, and a number of fitting algorithms and tutorials exist in the two-choice perceptual decision domain.

## References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.

Balci, F. & P. Simen (2014). Decision processes in temporal discrimination. Acta Psychologica 149, 157–168.

Bizo, L.A., J.Y.M. Chu, F. Sanabria & P.R. Killeen (2006). The failure of Weber’s law in time perception and production. Behavioral Processes 71, 201–210.

Creelman, C.D. (1962). Human discrimination of auditory duration. The Journal of the Acoustical Society of America 34, 582–593.

Gibbon, J. (1977). Scalar expectancy theory and Weber’s law in animal timing. Psychological Review 84, 279–325.

Gibbon, J. (1992). Ubiquity of scalar timing with a Poisson clock. Journal of Mathematical Psychology 35, 283–293.

Gibbon, J. & R.M. Church (1984). Sources of variance in an information processing theory of timing. In Roitblat, H.L., T.G. Bever & H.S. Terrace (Eds.), Animal Cognition (pp. 465–488). Erlbaum.

Gibbon, J. & R.M. Church (1990). Representation of time. Cognition 37, 23–54.

Gibbon, J., R.M. Church & W.H. Meck (1984). Scalar timing in memory. In Gibbon, J. & L.G. Allan (Eds.), Annals of the New York Academy of Sciences: Timing and Time Perception, Vol. 423 (pp. 52–77). New York Academy of Sciences.

Killeen, P.R. & J.G. Fetterman (1988). A behavioral theory of timing. Psychological Review 95(2), 274–295.

Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin 114, 510–532.

Ratcliff, R. & F. Tuerlinckx (2002). Estimating parameters of the diffusion model: Approaches to dealing with contaminant reaction times and parameter variability. Psychonomic Bulletin and Review 9(3), 438–481.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.

Simen, P., F. Balci, L. deSouza, J.D. Cohen & P. Holmes (2011). A model of interval timing by neural integration. Journal of Neuroscience 31, 9238–9253.

Simen, P., F. Rivest, E.A. Ludvig, F. Balci & P.R. Killeen (2013). Timescale invariance in the pacemaker-accumulator family of timing models. Timing & Time Perception 1, 159–188.

Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the ‘internal clock’. Psychological Monographs 77, 1–31.

Wiecki, T., I. Sofer & M.J. Frank (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics 7, 14.