Duration bisection is a prospective, perceptual timing task. It is prospective because the participant is aware in advance that duration judgments will be required (Chapters 2 and 4, this volume), and perceptual because the participant is instructed to classify the presentation durations of probe stimuli relative to a standard or standards rather than to produce or reproduce a given duration. The major advantage of prospective timing tasks is that multiple trials can be conducted during a test session, thereby allowing robust estimates of timing behavior and, equally important, psychophysical analysis. A major advantage of duration perception tasks is that the motor response does not contribute to the estimate of timing accuracy or variability.
Duration bisection has seen extensive use since its introduction in modern form by Church and Deluty (1977). Although originally applied to non-human animals (i.e., rats), following early work by Allan and Gibbon (1991) and Wearden (1991), the task has proven popular for use in human studies. Indeed, since 1991 more than 90 articles reporting data from the duration bisection task have been published (see Appendix – Table 5.1). These data come from participants ranging in age from the very young (e.g., Droit-Volet & Wearden, 2001) to the very old (e.g., Lustig & Meck, 2011), and although the preponderance of work has been in samples of healthy individuals, typically college students, numerous studies have applied the bisection task to clinical samples (e.g., Melgire et al., 2005; Nichelli et al., 1996). In some cases, the task has been used to address fundamental questions about the perceptual, cognitive, and neural mechanisms that underlie interval timing (e.g., How does subjective time scale with objective time?; Church & Deluty, 1977), whereas other studies have used the task to elucidate whether and how interval timing interacts with other perceptual and cognitive processes (e.g., Droit-Volet, Fayolle, & Gil, 2016).
In this chapter, we provide a brief overview of the duration bisection task, describe the analysis approaches used for duration bisection data, and conclude with some guidelines for use. Basic stimulus presentation programs and analysis code, written in MATLAB and R, are available at the book’s GitHub repository.
2 Duration Bisection – Origins
In an early study of temporal discrimination in non-human animals (Cowles & Finan, 1941), rats received reinforcement for running down one alleyway of a Y shaped discrimination box after a 10 s restraint period and the other alleyway after 30 s of restraint. After 600 trials of training, six of nine animals showed evidence of having learned the temporal discrimination. Although this experiment demonstrated that (some) rats could learn to discriminate the durations, it did not speak to the limits of temporal discrimination, nor to the processes underlying it. Later the same decade, Heron (1949) used a similar temporal restraint design, but a different maze layout, in an “attempt to determine the differential limen for temporal discrimination”. Eleven rats were initially trained to discriminate 5 and 45 s and then progressed to more difficult discriminations. Eight rats learned to discriminate 5 vs. 35 s and 5 vs. 25 s, but only one learned to discriminate 5 vs. 10 s.
Approximately twenty years later, Stubbs (1968) reported a procedure that examined timing of a range of durations during a single test session. Specifically, he reinforced pigeons for responding on one keylight for durations from 1 to 5 s and a second keylight for durations from 6 to 10 s (Experiment 1). Subsequently, Stubbs (1976) developed a task that measured the bird’s subjective experience of time by not reinforcing intermediate durations. Pigeons controlled the illumination color (green or red) of a reinforcement key by pecking on a separate changeover key. Whether pecking on the reinforcement key was rewarded depended on the key’s color and how long it had been illuminated. For example, in one condition, reinforcement followed pecks on the green key after 2 to 4 s of illumination and pecks on the red key after 60 to 64 s of illumination. However, pecking during the period from 4 to 60 s was not reinforced. The critical measure of interest was when pigeons switched the color of the reinforcement key from green to red by pecking on the changeover key, thereby changing the response from the short interval key (green) to the long interval key (red).
3 Duration Bisection – Modern Form
Church and Deluty (1977) reported the first use of the duration bisection task in what is now its most common form. In an initial training phase, rats received reinforcement for pressing one lever following presentation of a short duration stimulus (short anchor) and a second lever following presentation of a long duration stimulus (long anchor). In the test phase, these short and
The human version of duration bisection is similar. In the test phase, the anchor durations and intermediate durations are presented, but feedback is provided for the anchor durations only, which ensures that the task measures subjective perception of duration. However, the training/anchor learning phase is not as extensive for human participants. It usually involves a relatively limited number of presentations of each of the anchor durations only (i.e., 4 to 10), with each presentation followed by feedback indicating whether it was the long or short anchor, and a single training session prior to testing rather than multiple sessions.
It is worth noting that there is a variant of duration bisection in which participants are not explicitly trained on the anchor durations. Rather, in this partition version of the duration bisection task for humans (e.g., Droit-Volet & Rattat, 2007; Wearden & Ferrara, 1995; Wiener et al., 2014) participants are merely asked to classify each presented stimulus as short or long. Importantly, after the participants have had some experience with the range of probe durations, the response functions calculated from subsequent trials in the test session are sigmoidal. This indicates that participants are able to categorize the probe durations appropriately in the absence of explicit training with the anchor durations.
In animal studies, the anchor durations typically comprise 50% of the trials (25% each for short and long), whereas in human studies all durations, whether anchor or intermediate, are usually presented an equal number of times. Notably, the presentation probability of the anchor durations does affect stimulus classification (Akdoğan & Balcı, 2016). For example, when fewer short than long anchor stimuli are presented in the test phase, mice are more likely to classify a given probe duration as ‘long’ than when the short and long anchors are presented with equal frequency during the test session. Moreover, in animal work, correct classification of the anchor durations in the test phase is normally reinforced on fewer than 100% of the correct trials. This ensures that animals continue to respond on the intermediate probe trials, which are never reinforced, rather than learning to discriminate reinforced anchor durations from unreinforced probe durations (i.e., learning not to respond on the intermediate probes as they do not lead to reward). Indeed, with extensive training and 100% reinforcement on the anchor durations, rats learn not to respond following intermediate probe durations (Brown et al., 2011).
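The two presentation schemes described above can be sketched as a trial-list generator. This is a minimal illustration only: the function name `build_trials`, the `anchor_share` parameter, the trial counts, and the probe durations are all hypothetical choices, not values taken from any of the cited studies.

```python
import random

def build_trials(durations, n_trials, anchor_share=None, seed=0):
    """Return a shuffled list of test-phase durations (hypothetical helper).

    anchor_share=None -> every duration appears equally often
                         (the usual human design described above)
    anchor_share=0.5  -> the two anchors jointly fill half of the trials
                         (the usual animal design described above)
    """
    short, long_, probes = durations[0], durations[-1], durations[1:-1]
    if anchor_share is None:
        per_duration = n_trials // len(durations)
        trials = [d for d in durations for _ in range(per_duration)]
    else:
        n_anchor_each = int(n_trials * anchor_share / 2)
        per_probe = (n_trials - 2 * n_anchor_each) // len(probes)
        trials = [short] * n_anchor_each + [long_] * n_anchor_each
        trials += [d for d in probes for _ in range(per_probe)]
    random.Random(seed).shuffle(trials)  # fixed seed keeps the order reproducible
    return trials

# Illustrative 2 vs. 8 s anchor pair with five intermediate probes
durations = [2.0, 2.52, 3.17, 4.0, 5.04, 6.35, 8.0]
human_style = build_trials(durations, 70)                      # 10 trials per duration
animal_style = build_trials(durations, 100, anchor_share=0.5)  # 25 + 25 anchor trials
```

Partial reinforcement of correct anchor responses would be handled separately at run time, for example by drawing a random number on each correct anchor trial.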
In contrast, human participants are usually given feedback on all anchor duration trials, unless the researcher has a specific reason for omitting feedback (e.g., Wiener et al., 2014). Human participants are also frequently told to treat the classification as a judgment of whether the probe duration is “closer to” the short or the long anchor rather than as an endorsement that the probe duration is exactly the same as the short or long exemplar.
Neither human nor non-human animal participants are normally reinforced for classifications of the intermediate probe durations (but see Kim et al., 2013, and Ward et al., 2009, for exceptions). Consequently, the classification is a measure of the participant’s subjective judgment of time rather than a test of whether the participant remembers being reinforced for making a particular response for a particular duration.
4 Data Presentation
Data from the duration bisection task are most often plotted as the probability with which a given test duration is classified as ‘long’, although in some early papers the ‘short’ classification probability was plotted (e.g., Gibbon, 1981). If the participant has learned the temporal discrimination, the probability of a long response should be zero or close to zero for the short anchor duration and 100% or close to 100% for the long anchor duration. Due to variance in timing, classification of the intermediate probes as ‘long’ increases relatively smoothly with increasing duration (see Figure 5.1). That is to say, the psychometric response function usually has the form of a sigmoidal function rather than a step function. Absent timing variation, a step function should manifest because all durations below some magnitude would be consistently classified as short on all trials and all durations above that magnitude would be classified as long on all trials.
The parameters of interest that can be derived from the psychometric response function include the bisection point (bp; also known as the Point of Subjective Equality or pse), the difference limen (dl), and the Weber fraction (wf; also known as the Weber ratio or wr). The bp is the duration value at which the participant is equally likely to classify the stimulus (i.e., test duration) as short or long. The bp can be determined in a variety of ways from the response function. For example, Church & Deluty (1977) fitted a straight line to the three most central durations in the range presented to participants to determine the duration corresponding to 50% long classifications. Fitting a mathematical function (e.g., sigmoidal, pseudo-logistic, Weibull) and determining the bp from that fitted function is also a common
The steepness of the psychometric response function reflects temporal sensitivity and can be characterized by the difference limen (dl), with smaller dls (steeper slopes) indicating greater temporal sensitivity. The dl is usually defined as one half of the difference between the duration corresponding to a p(‘long’) of 75% and the duration corresponding to a p(‘long’) of 25%. The
5 Location of the Bisection Point (bp)
The location of the bp is important theoretically because it constrains the models that can account for bisection performance. For example, whether
Gibbon (1981) provided a detailed analysis of the combination of timescale and decision processes that can account for the form and location of the duration bisection psychometric function. He noted that Weber’s Law for time requires that “the discriminability of two durations remains constant at constant ratios of these durations, regardless of their absolute durations” and that “the subjective “middle” between two time values appears to lie at the geometric mean of these values” (p. 59). Accordingly, different probe durations, T1 and T2, from two different test conditions will be classified ‘short’ with the same probability when the T/S ratio is equivalent in both conditions and the S/L ratio is constant. For example, if the ‘short’ report probability is .75 after 3 s has elapsed when S is 2 s and L is 8 s, then we would expect the short report probability to be .75 at 6 s when the S and L durations are 4 and 16 s, respectively. Indeed, Church and Deluty (1977) found that the Weber Fraction was constant across anchor duration conditions that had gms of 2, 4, 6, and 8 s, indicating that timing variability scaled with the duration being timed. Although Gibbon’s analyses ruled out certain combinations of timescale and decision process (e.g., Poisson timing with a likelihood decision rule and Scalar timing with a likelihood decision rule), they did not allow him to discriminate among log timing with a likelihood decision rule, log timing with a similarity decision rule, and Scalar timing with a similarity decision rule. That said, subsequent work by Gibbon and colleagues generally modeled duration bisection as a Scalar timing process with a similarity decision rule.
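The ratio invariance in the worked example above can be written compactly. The notation $P_{\text{short}}(T \mid S, L)$ is introduced here for illustration only and is not Gibbon’s own.

```latex
P_{\text{short}}(T_1 \mid S_1, L_1) = P_{\text{short}}(T_2 \mid S_2, L_2)
\quad \text{whenever} \quad
\frac{T_1}{S_1} = \frac{T_2}{S_2}
\ \text{ and } \
\frac{S_1}{L_1} = \frac{S_2}{L_2}.

% Worked example from the text:
\frac{T_1}{S_1} = \frac{3}{2} = 1.5 = \frac{6}{4} = \frac{T_2}{S_2},
\qquad
\frac{S_1}{L_1} = \frac{2}{8} = 0.25 = \frac{4}{16} = \frac{S_2}{L_2},
\qquad
\sqrt{2 \times 8} = 4, \quad \sqrt{4 \times 16} = 8 .
```

The last line gives the geometric means of the two anchor pairs (4 s and 8 s), which is where the subjective middle is expected to fall under this account.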
Although early experiments in rats and pigeons (e.g., Church & Deluty, 1977; Platt & Davis, 1983), as well as some work in humans (Allan & Gibbon, 1991), indicated that the bp was closer to the gm than the am, this result has not been found consistently for human participants. Bisection at or close to the am of the short and long anchor durations (e.g., Wearden, 1991; Wearden & Ferrara, 1995) is found as frequently as, if not more frequently than, bisection at or close to the gm. It is clear from the literature that various experimental design factors, such as stimulus spacing (Wearden & Ferrara, 1995) and stimulus range (Wearden & Ferrara, 1996), have a substantial effect on the location of the bp. For instance, studies reporting gm bisection in humans have typically used logarithmic spacing of probe stimuli whereas those reporting am bisection have typically used linear spacing. Indeed, Allan (2002) advised that because the bp was susceptible to bias due to paradigm features such as stimulus spacing, L/S ratio, and duration range, it should not be interpreted as indicating the time value that is equally confused with the S and L anchor durations. We briefly describe the effects of stimulus spacing and stimulus range below.
5.1 Stimulus Spacing
In rats, Church and Deluty (1977) failed to obtain evidence for an effect of stimulus spacing on the location of the bp with L/S pairs of 4:1. However, Raslear (1983, 1985), also with rats, found a significantly larger bp for linear as compared to logarithmic spacing with an L/S ratio of 100:1, but no spacing effects for less extreme L/S ratios.
Although Allan and Gibbon (1991) found the bp was closer to the gm than the am for both arithmetic (Exp. 1) and logarithmic spacing (Exp. 2) conditions in humans, the same year Wearden (1991) reported duration bisection at the am for conditions using 200 vs. 800 ms and 100 vs. 900 ms anchor duration pairs. Subsequently, Wearden and Ferrara (1995) explicitly compared logarithmic and linear spacing. They found leftward shifts in the response functions for logarithmic as compared to linear spacing at both duration ranges tested (Exp. 1: 200 vs. 800 ms and 100 vs. 900 ms). In a second experiment, they used unequal arithmetic spacing, such that in some conditions there was a larger number of shorter than longer durations. Response functions in these conditions were shifted to the left compared to those conditions with a larger number of longer than shorter durations.
Wearden, Rogers, and Thomas (1997) used longer durations (1 vs. 4 s and 2 vs. 8 s), but included a concurrent task to prevent counting (verbally shadowing visually presented digits). Response functions appeared leftward shifted in the logarithmic spacing condition as compared to linear spacing, but there was no statistical difference between the bps. They also examined L/S ratios of 2:1 and 5:1, but found little effect on the location of the bp, although the bp was closer to the am than the gm in all conditions. Interestingly, participants showed greater timing sensitivity for the more difficult L/S ratios (see Kopec & Brody, 2010, for a brief review).
Wearden and Ferrara (1995) proposed that stimulus spacing affects the location of the bp because participants calculate the midpoint of the distribution of probe durations presented in the test phase and decide whether to respond short or long based on the magnitude of the current duration relative to the midpoint. The midpoint (here, the arithmetic mean of the presented durations) of a logarithmically spaced S/L range will be smaller than the midpoint of a linearly spaced S/L range. For example, the midpoint of a log spaced 2:8 s range (i.e., 2, 2.52, 3.17, 4.00, 5.04, 6.35, and 8.00) is 4.44, whereas the midpoint of a linearly spaced 2:8 s range (i.e., 2, 3, 4, 5, 6, 7, 8) is 5.00. Hence, the psychometric response functions will differ between these two spacing conditions. Although this explanation accounts well for some spacing effects, it does not provide an explanation for all findings. As noted above, the L/S ratio impacts the presence or absence of stimulus spacing effects (Raslear, 1983, 1985). Indeed, task difficulty, indicated by L/S ratio, was greater in Allan and Gibbon (1991) than in Wearden (1991).
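The spacing example above can be reproduced in a few lines. The function names are illustrative, and the "midpoint" is taken to be the arithmetic mean of the presented durations, which is the reading that yields the 4.44 and 5.00 values quoted in the text.

```python
def linear_spacing(short, long_, n):
    """n durations from short to long_ in equal arithmetic steps."""
    step = (long_ - short) / (n - 1)
    return [short + i * step for i in range(n)]

def log_spacing(short, long_, n):
    """n durations from short to long_ in equal ratio (logarithmic) steps."""
    ratio = (long_ / short) ** (1 / (n - 1))
    return [short * ratio ** i for i in range(n)]

def mean(values):
    return sum(values) / len(values)

lin_durs = linear_spacing(2.0, 8.0, 7)
log_durs = log_spacing(2.0, 8.0, 7)

print([round(d, 2) for d in log_durs])  # the log spaced probe set
print(round(mean(log_durs), 2))         # midpoint of the log spaced set
print(round(mean(lin_durs), 2))         # midpoint of the linearly spaced set
```

Because the log spaced set places more probes below the arithmetic center of the range, its mean is smaller, which is the basis of the leftward shift predicted by this account.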
5.2 Stimulus Range
The presence of spacing effects for some L/S ratios, but not others, suggests that the stimulus range (i.e., ratio of the short and long anchor durations) the participant experiences also impacts the location of the bp. Wearden and Ferrara (1996) examined stimulus range effects in humans using the partition method of bisection (see above). For three groups of participants, they examined L/S ratios of 5:1 and 2:1 in situations where the difference between the S and L values was kept constant at 400 ms. For another three groups the L/S ratio was kept constant at 4:1, but the difference between short and long values ranged from 300 to 600 ms. They obtained bisection at the am for all groups except the 2:1 group, which showed gm bisection. Experiment 2 of the same study explored linear and logarithmic spacing, L/S ratios of 2:1 (450 vs. 900 ms) and 19:1 (50 vs. 900 ms), and explicitly trained the participants on the S and L anchor durations instead of using the partition method. They obtained an effect of stimulus spacing for the large 19:1 ratio, but not the 2:1 ratio, and a bp close to the gm for the 2:1 ratio. Hence, they concluded that gm bisection manifests for small L/S ratios and that linear versus logarithmic spacing effects manifest only when L/S ratios are large. They also showed that Wearden’s modified difference model (Wearden, 1991) accounted for the obtained data patterns reasonably well. Unfortunately from the perspective of finding generally applicable models, it does not account for the stimulus spacing effects described in the preceding section.
With a view toward developing a single model that could account for the idiosyncrasies found in the duration bisection literature, including effects of stimulus spacing and stimulus range such as those described above, Kopec and Brody (2010) analyzed data from 148 experiments reported in 18 different studies of human duration bisection. They developed a two-step decision model in which the participant first determines whether the probe duration is one of the anchor durations and, if not, conducts the second step in which the relative distance of the probe duration from the anchor durations is compared.
5.3 Timing Precision
As noted above, the dl and wf provide measures of temporal sensitivity in the duration bisection task. The steeper the participant’s response function, the more precise (i.e., less variable) the participant’s timing on a trial-to-trial basis. Several studies in human and non-human animals have shown more precise timing performance with more difficult L/S ratios (e.g., Church & Deluty, 1977). However, when the L/S ratios are extremely difficult (e.g., 3 vs. 3.6 s), timing anomalies may arise. For example, Penney et al. (2008) found reversals in the psychometric response functions, meaning that participants classified a
6 Analysis of Duration Bisection Data
Data from the duration bisection task are often analyzed in what could be termed an atheoretical manner. The simplest approach is to compare the probability of a long classification, p(‘long’), at each test phase duration between conditions. In other words, probe duration and experimental condition can be treated as factors in an ANOVA. This analytic approach can reveal whether duration classification differs between conditions, but it has several shortcomings.
First, even when there are substantial condition effects at intermediate probe durations, the stimulus classifications at or very close to the anchor durations may not be different. Both human and non-human subjects often classify the shortest and longest stimuli as ‘short’ and ‘long’ with perfect, or near perfect, accuracy (e.g., Droit-Volet & Wearden, 2001; Meck, 1983). Because a main effect of Condition collapses across the test durations, a condition difference among intermediate probe durations could be concealed. Although a Condition difference may manifest as a Duration x Condition interaction, detecting a significant interaction often requires greater statistical power as compared to detecting a significant main effect.
Second, in the event a Duration x Condition interaction manifests, one would typically then test the effect of Condition at each level of test duration. This may result in significant differences at some intermediate test durations, but not others, particularly when one corrects for multiple comparisons. However, inconsistent effects at the intermediate probe durations may be difficult to interpret in a meaningful manner.
Consequently, rather than analyzing p(‘long’) values, researchers often analyze the bp, dl, and wf values derived from the response function. Differences in the bp indicate whether the manipulation of interest shifted the response function to the left or right and changes in temporal sensitivity are revealed by differences in the dl or wf.
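As a minimal sketch of how these summary measures might be derived, the code below reads the bp, dl, and wf off a response function by linear interpolation between adjacent test durations. This is one simple option among several (fitting a sigmoidal function is also common); the function names and the p(‘long’) values are made up for illustration, and the definitions used are the ones given in this chapter (dl as half the 25% to 75% distance, wf as dl divided by bp).

```python
def duration_at(p_target, durations, p_long):
    """Linearly interpolate the duration at which p('long') reaches p_target."""
    for i in range(len(durations) - 1):
        p0, p1 = p_long[i], p_long[i + 1]
        if p0 <= p_target <= p1 and p1 > p0:
            frac = (p_target - p0) / (p1 - p0)
            return durations[i] + frac * (durations[i + 1] - durations[i])
    raise ValueError("p_target is not bracketed by the response function")

def bisection_summary(durations, p_long):
    """Return bp, dl, and wf for one response function."""
    t25 = duration_at(0.25, durations, p_long)
    bp = duration_at(0.50, durations, p_long)
    t75 = duration_at(0.75, durations, p_long)
    dl = (t75 - t25) / 2          # half the 25%-75% distance
    return {"bp": bp, "dl": dl, "wf": dl / bp}

# Made-up data for a linearly spaced 2 vs. 8 s condition
durations = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
p_long = [0.0, 0.1, 0.3, 0.6, 0.8, 0.95, 1.0]
summary = bisection_summary(durations, p_long)
print(summary)
```

Interpolation assumes a response function that rises monotonically through the 25%, 50%, and 75% points; noisy or non-monotonic data are better served by a fitted function.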
Whether a manipulation shifts the psychometric response function in an additive or multiplicative manner provides critical information for understanding the putative psychological mechanisms underlying effects on interval timing. For example, within the Scalar Timing Theory (STT; Gibbon, Church,
Ideally, to determine whether a manipulation shifts the bp in an additive or multiplicative manner, the experiment should comprise at least three anchor duration pairs. This allows the between condition bp difference to be calculated for each anchor duration pair, so that one can determine whether the shift is a constant absolute amount (additive), a constant proportion of the timed duration (multiplicative; i.e., a linear increase with duration), or some other functional form. In practice, very few duration bisection studies have used more than one or two pairs of anchor durations. Consequently, most discussions of additive versus multiplicative shifts in the bisection function have centered on whether there is evidence for superimposition of the response functions after normalization by their respective bisection points. We return to this issue below.
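The logic of the three-pair design can be illustrated with a purely descriptive sketch. The bp values below are hypothetical, the function names are invented for this example, and a real analysis would require an inferential test rather than a comparison of spreads.

```python
def shift_profile(bp_control, bp_treatment):
    """Absolute and proportional bp shifts for each anchor duration pair."""
    abs_shift = [t - c for c, t in zip(bp_control, bp_treatment)]
    prop_shift = [t / c for c, t in zip(bp_control, bp_treatment)]
    return abs_shift, prop_shift

def spread(values):
    """Range of a list; near zero means the shift is constant across pairs."""
    return max(values) - min(values)

# Hypothetical control bps for three anchor pairs (e.g., 1 vs. 4, 2 vs. 8, 4 vs. 16 s)
bp_control = [2.0, 4.0, 8.0]
bp_multiplicative = [c * 0.9 for c in bp_control]  # constant 10% leftward shift
bp_additive = [c - 0.3 for c in bp_control]        # constant 0.3 s leftward shift

abs_m, prop_m = shift_profile(bp_control, bp_multiplicative)
abs_a, prop_a = shift_profile(bp_control, bp_additive)

print(spread(abs_m), spread(prop_m))  # absolute shift grows, proportion is constant
print(spread(abs_a), spread(prop_a))  # absolute shift is constant, proportion varies
```

With only one anchor pair the two patterns cannot be told apart, because a single difference is compatible with both a constant amount and a constant proportion.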
The dl, which reflects the steepness of the psychometric response function, provides a measure of the participant’s temporal sensitivity or acuity. The more consistently a participant categorizes the same stimulus duration in the test phase, the steeper the participant’s response function and the smaller the corresponding dl. For example, continuing with the 2 vs. 8 s anchor pair example described above, a participant who has a 25% long value of 3.17 s and a 75% long value of 5.04 s has a much sharper response function and smaller dl than a participant with a 25% long value of 2.52 s and a 75% long value of 6.35 s (dls of .93 vs. 1.92 s).
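The arithmetic behind this comparison, using the rounded duration values quoted in the text, is shown below. The symbols $T_{25}$ and $T_{75}$ are shorthand introduced here for the durations at which p(‘long’) equals 25% and 75%.

```latex
\mathrm{dl} = \frac{T_{75} - T_{25}}{2}, \qquad
\mathrm{dl}_1 = \frac{5.04 - 3.17}{2} = 0.935\ \text{s}, \qquad
\mathrm{dl}_2 = \frac{6.35 - 2.52}{2} = 1.915\ \text{s}.
```

These agree with the values reported in the text up to rounding.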
As noted above, the wf provides a measure of temporal sensitivity that is corrected for the magnitude of the timed duration, which in the case of duration bisection means dividing the dl by the bp. If Weber’s Law holds, then the wf is constant across conditions. However, to determine the constancy of the wf, researchers usually statistically test whether there is a between condition difference in the wf and, if not, conclude that Weber’s Law holds. Obviously, the failure to find a statistical difference among wf values does not constitute particularly strong evidence that the wfs are equivalent.
It is worth noting that the Scalar Property for time imposes a stricter requirement on response functions than mere equivalence of wf values. Rather, when the time axis is normalized as T/bp, where T is a probe duration, the entire psychometric response functions should superimpose (Gibbon, 1981; Penney et al., 2000). However, response function superimposition is usually tested “by eye” in combination with the wf analysis approach just described. One could subject the normalized p(‘long’) values for all test durations to a statistical analysis to determine whether values differ between conditions, but this approach also entails the “accepting the null hypothesis” problem.
To determine whether superimposition held in the presence of a bp difference between auditory and visual timing stimuli, Penney et al. (2000) compared the superimposition of the response functions when normalized in different ways. These were multiplicative normalization, in which p(‘long’) was plotted against T/bp (i.e., the typical approach), and what the authors termed “lateral” normalization, in which they rescaled the objective time axis by adding one half of the difference between the auditory and visual bps to each T value for the auditory modality and subtracting half of the bp difference from each T value for the visual modality. Sigmoidal equations were fit to the resulting response functions and the quality of the fit used as a measure of the degree of superimposition. A statistical test revealed that fit quality was better for the multiplicative normalization than the lateral normalization, a result that was taken as support for a clock speed interpretation of the shift rather than a timing onset interpretation.
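The two rescalings of the time axis can be sketched as follows. This is an illustration of the description above, not the original analysis code: the function names and bp values are hypothetical, and the sign of the bp difference is assumed to be chosen so that the two shifted functions meet at a common bp.

```python
def multiplicative_axis(durations, bp):
    """Rescale each probe duration T to T / bp, so the bp falls at 1.0."""
    return [t / bp for t in durations]

def lateral_axes(durations, bp_aud, bp_vis):
    """Shift the auditory axis up and the visual axis down by half the bp difference.

    Assumption: the difference is taken as bp_vis - bp_aud, so that both
    shifted functions cross 50% at the same point on the new axis.
    """
    half_diff = (bp_vis - bp_aud) / 2.0
    aud_axis = [t + half_diff for t in durations]
    vis_axis = [t - half_diff for t in durations]
    return aud_axis, vis_axis, half_diff

# Hypothetical probe durations (s) and bisection points
durations = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
bp_aud, bp_vis = 4.2, 4.8

aud_mult = multiplicative_axis(durations, bp_aud)
vis_mult = multiplicative_axis(durations, bp_vis)
aud_lat, vis_lat, half_diff = lateral_axes(durations, bp_aud, bp_vis)

# After the lateral shift both bps land at the midpoint of the two original bps
print(bp_aud + half_diff, bp_vis - half_diff)
```

Either set of rescaled axes can then be paired with the p(‘long’) values from each modality and submitted to a common sigmoidal fit, with fit quality serving as the index of superimposition.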
Balcı and colleagues introduced a more principled method for examining shifts in psychometric response functions obtained from the duration bisection task (Balcı & Gallistel, 2006; Çoşkun et al., 2015). Although not explicitly used for this purpose in their papers, the approach can easily be applied to test superimposition of two, or more, response functions. First, the best fitting cumulative normal distribution function (or another sigmoidal function that best describes the data) is determined for the p(‘long’) data in each condition and the log likelihood of each data point for each distribution fit is calculated. These log likelihoods are summed to obtain the likelihood of the data under different cumulative distributions with different mean and variance parameters (i.e., the non-superimposition model). The superimposition model can be obtained, for instance, by applying one of the best fitting distributions obtained in the first step to the data from all conditions. As above, log likelihoods of each data point are obtained and summed to obtain the likelihood of the data under the single cumulative distribution model. The superimposition and non-superimposition models can then be compared. The likelihood of the data will always be higher for the latter than the former model, but this difference might
6.2 Theory-based Analysis
Many different theoretical models have been applied to data from the duration bisection task. Perhaps the most popular applications have been of models that fall within the framework provided by Scalar Expectancy Theory (set; Gibbon, 1977) and its information processing companion model, Scalar Timing Theory (stt; Gibbon, Church, & Meck, 1984). There are several different duration bisection models within the set/stt framework, with the differences among models centering on what parameters are allowed to vary and the decision rules assumed to apply. For example, Meck (1983) applied Gibbon’s Referents Known Exactly (rke) model to account for pharmacological and electric shock induced effects on timing. As the name suggests, this model assumes that perceptual variance is greater than the variance in the values stored in reference memory. However, the Sample Known Exactly (ske) model has been more commonly used. It was originally proposed by Gibbon (1981) and, subsequently, modified by various researchers to account for a broader range of experimental results (e.g., Meck, 1984; Penney, Gibbon, & Meck, 2000, 2008). The ske model posits that the participant maintains a veridical representation of the probe duration presented on the test trial (i.e., the current sample is known exactly), whereas there is variability in the memory representations for S and L. In its simplest form, the model has two free parameters: g, which reflects variation in the S and L memory representations, and b, which represents bias to respond long. The decision rule to respond “long” in the ske model is $m(T)^2 > (w_S w_L)/b$, where $m(T)$ represents mean subjective time, and $w_S$ and $w_L$ are samples from the memory distributions for S and L. In the absence of bias, this decision rule results in bisection at the geometric mean of S and L.
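The ske decision rule can be explored with a small Monte Carlo sketch. Several details below are assumptions made for illustration rather than part of the model as stated: subjective time is taken to be veridical (m(T) = T), and the memory samples are drawn from Gaussian distributions whose standard deviation is g times the anchor duration. The function name, parameter values, and trial count are likewise hypothetical.

```python
import random

def p_long_ske(T, S, L, g, b=1.0, n_trials=20000, seed=0):
    """Estimate p('long') for probe T under the rule m(T)^2 > (w_S * w_L) / b.

    Assumptions (illustrative only): m(T) = T, and w_S, w_L are Gaussian
    samples with means S and L and standard deviations g*S and g*L.
    """
    rng = random.Random(seed)
    n_long = 0
    for _ in range(n_trials):
        w_s = rng.gauss(S, g * S)   # remembered short anchor on this trial
        w_l = rng.gauss(L, g * L)   # remembered long anchor on this trial
        if T * T > (w_s * w_l) / b:
            n_long += 1
    return n_long / n_trials

S, L, g = 2.0, 8.0, 0.15
gm = (S * L) ** 0.5                          # geometric mean = 4.0 s

p_short_anchor = p_long_ske(S, S, L, g)      # should be near 0
p_at_gm = p_long_ske(gm, S, L, g)            # should be close to .5 when b = 1
p_long_anchor = p_long_ske(L, S, L, g)       # should be near 1
p_at_gm_biased = p_long_ske(gm, S, L, g, b=1.5)  # b > 1 favors 'long'
print(p_short_anchor, p_at_gm, p_long_anchor, p_at_gm_biased)
```

Sweeping T across the probe range with this function traces out a sigmoidal response function, and increasing g flattens it, which is how the parameter maps onto temporal sensitivity in this sketch.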
Of course, many other duration bisection models exist in the literature (e.g., Allan, 2002; Balcı & Simen, 2014; Kopec & Brody, 2010; Machado & Pata, 2005; Rodríguez-Gironés & Kacelnik, 2001; Wearden, 1991).
The major benefit of a theoretical analysis is that it provides an interpretive framework for understanding the pattern of results obtained in an experiment. For example, it can help determine whether a between condition difference is due to a clock speed effect or a memory storage effect. Meck (1983) used the
7 Implementation Recommendations
What constitutes best practice in the bisection task depends on the question being asked. For example, if one wishes to unambiguously demonstrate that a shift in the psychometric response function is multiplicative rather than additive, then three sets of anchor duration pairs should be used in a within-subjects design. However, interference between sets of learned anchor durations is a concern when more than one duration pair is presented within the same test session. Moreover, particularly when using seconds range durations, the test session may be rather lengthy. Consequently, very few duration bisection papers published within the past decade have used more than two pairs of anchor durations and most have used a single anchor duration pair.
The selection of L/S range is also important. The task should not be too difficult as this may result in poor performance. However, it also should not be too easy as this may result in participants not being particularly attentive. Perhaps the most commonly used L/S range has been 4:1, although most of these studies came from the same research group or closely related groups. Choice of stimulus spacing is less critical, but if one intends to compare the results with previous findings in the literature, then it should be considered carefully because, as described above, it can influence the location of the bp.
We recommend presenting at least 10 trials at each probe duration during the test phase of the task. The goal is to have enough resolution in the timing measure to provide relatively smooth response functions, while not inducing task fatigue in the participants. Of course, how many trials can be run without
For normal, adult participants, we recommend using seven durations in the test phase (i.e., two anchor durations and five intermediate probes). This number of durations is more likely to reveal the true form of the response
In sum, the duration bisection task is relatively easy to implement across a wide range of participant populations and, therefore, has seen substantial use, particularly in situations where one wants to eliminate the influence of motor responding on the duration estimate. However, the details of task implementation can have a profound impact on the data obtained. Consequently, task parameters must be carefully considered in light of the particular psychological question being asked.