Abstract
As we celebrate the launch of a new mathematics education journal focused (at least in part) on replication studies, I add to a conversation initiated by others (e.g., Aguilar, 2020) around what it would mean for our field to be more amenable to replication. This paper begins by examining several replication studies in mathematics education as a way to reflect upon types of replication studies and the importance of replication in our field. I then problematize the idea of a conceptual replication, and I explore the boundary between a replication and a follow-up study. Finally, I consider what it might take for the field of mathematics education to become a more replication-friendly culture by introducing a distinction between idea-initiated research and results-initiated research.
1 Introduction
In 2008 I published a paper describing a study that explored preservice mathematics teachers’ ability to notice (Star & Strickland, 2008). In this study, I worked with 28 preservice secondary mathematics teachers who were enrolled in a semester-long mathematics methods course that I taught. Improving teachers’ ability to notice was one of the explicit goals of the course. Toward the beginning of the course, these preservice teachers watched an 8th grade, 50-minute videotaped lesson on exponents (from the TIMSS video study) and then took an assessment that explored what they noticed about the lesson. At the conclusion of the course, these same teachers watched another TIMSS 8th grade videotaped lesson on angles and then took an assessment exploring what they noticed about this lesson. Both assessments included multiple-choice and short-answer questions that were specifically tied to the context of the lesson, in the categories of classroom environment, classroom management, tasks, mathematical content, and communication; both assessments were developed by my research team. The results of this study indicated that these preservice teachers were not particularly skilled observers of classroom practice, but that their observation skills substantially improved by the conclusion of the course.
My 2008 paper provides a useful context to begin to situate my reflections in the present paper on the topic of replication. I start by noting that it is exciting that many scholars are thinking about and writing about replication right now. In the field of mathematics education, the latest incarnation of interest in replication studies emerged around 2017, with the establishment of working groups at CERME10 (e.g., Jankvist et al., 2017), PME42 (e.g., Inglis et al., 2018), and CERME11 (e.g., Aguilar et al., 2019); as well as articles published as part of a special issue of the Journal for Research in Mathematics Education (e.g., Cai et al., 2018; Star, 2018) and those published soon thereafter (e.g., Aguilar, 2020; Jacobson & Simpson, 2019). Each of these efforts to consider the issue of replication in mathematics education has built on similar conversations in other fields (particularly the social sciences) over the past decade (e.g., Anczyk et al., 2019; Brandt et al., 2014; Chhin et al., 2018; Coyne et al., 2016; Earp & Trafimow, 2015; Hüffmeier et al., 2016; Makel & Plucker, 2014; Markee, 2017; Marsden et al., 2018; Porte & Richards, 2012; Schmidt, 2009).
Looking across these many recent conversations about replication, a useful place to start is to identify points of agreement. Judging from these recent writings within and outside of mathematics education, many scholars in the social sciences appear to agree about the following ideas as related to replication. First, replication of studies in an academic research field is desirable. At its core, replication provides a means for research results to be confirmed and/or for the field to identify possible restrictions or limitations on results. Replication allows us to explore the generality of our research findings and helps us to take advantage of “the self-correcting nature of science” (Porte & Richards, 2012, p. 285).

Second, although replication is quite common in many scientific fields (e.g., medicine), it is much less frequently done in the social sciences, including in mathematics education. Good estimates about the prevalence of replication studies in any field are difficult to obtain, in part because there is not widespread agreement on exactly what makes a study a replication study (more on this below). But (as one example) in the field of special education, Coyne and colleagues suggest that as few as 0.4% of studies are replications (Coyne et al., 2016). Cai et al. (2018) provide a very similar estimate for the number of published replication studies in the field of mathematics education.

Third, one reason that replication studies are relatively uncommon is that there may be disincentives in the field that push scholars away from doing this kind of work. There is a strong perception that journals and grant review boards are more favorably disposed to studies that are original, new, and innovative (Makel & Plucker, 2014). As a result, pursuit of replication studies may be viewed by some scholars in the social sciences as a risky career move.

Fourth, when we speak of replication studies, there is agreement that there are several types of replications. There are a variety of terminological frameworks used to describe types of replication studies, using adjectives such as “exact” and “direct” to describe replication studies that are methodologically identical or extremely close to an original study, and “conceptual” to describe replication studies where one or more features of the original study’s design have been modified from the original study. Of particular interest is the suggestion (Coyne et al., 2016; Hüffmeier et al., 2016; Marsden et al., 2018) that one can conceive of replication studies as existing on a continuum, ranging from studies that are very closely aligned to an original study, to those that introduce a small number of intentional, principled changes to the study in order to test its validity and generalizability, to those studies that are more distal to the original study as a result of several changes to one or more of the study’s variables.

And finally, scholars in the social sciences—and in mathematics education more specifically—agree that it is very difficult in our field to conduct exact replication studies (e.g., Brandt et al., 2014). In the most literal sense, in an exact (direct) replication, the same researchers who conducted the original study would need to repeat all aspects of that study, including keeping the study intervention, setting, and participants exactly the same.
Given that much of our work in mathematics education involves studying the learning and teaching of mathematics in authentic and dynamic educational contexts, it is unlikely that a replication could include a group of participants who are completely identical to the original participants, including in their demographic composition as well as the state of their knowledge about a particular mathematical concept, much less that the study could take place in a classroom context that is exactly the same as the one used in the original study.
2 Exploring Examples of Replication Studies in Mathematics Education
Although replication studies (of any type) are not particularly common in mathematics education, they do occur. It is instructive to look at a few examples of replication studies in our field to consider what kinds of replications are occurring as well as the motivations and goals for these replication studies. I searched Google Scholar for mathematics education studies since 2000 that self-identified as a replication (e.g., by including this word in the title or abstract of the paper). There were fewer than ten hits, several of which I discuss below.
As a first example of a replication study in mathematics education, many may recall a series of studies by Kaminski and colleagues from around 2008 (e.g., Kaminski et al., 2008) that were widely reported on in the field and in the popular press. These authors conducted several studies exploring the relative impact of concrete versus abstract examples in learning mathematics. Participants in these studies were undergraduate students who were enrolled in an introductory psychology class; these students learned either an abstract instantiation of a mathematical concept or a concrete instantiation of the same concept. Immediately following this period of learning, a learning test was administered. This was followed by a transfer test, which focused on a mathematically isomorphic transfer domain. Kaminski et al. (2008) found that participants whose initial learning experience relied on abstract examples performed better on the transfer test. Kaminski and colleagues argued that these results could be generalized—both to other mathematics topics as well as to learners other than the specific population studied.
The results of these studies by Kaminski and colleagues ran counter to widely held views in mathematics education, and as a result several critical commentaries were published (e.g., Jones, 2009). De Bock and colleagues (De Bock et al., 2011) elaborated on two critiques of the original study and undertook a “replication and extension” study to “support these two elements of critique empirically” (p. 111). De Bock and colleagues’ study maintained the methodology of the original study as closely as possible, working with a similar population of participants (undergraduate students in an introduction to education sciences course) and with a similar study design (a learning phase, followed by a learning test, followed by a transfer test). But as a way to explore possible alternative explanations for the original study’s surprising (in their view) findings, the authors supplemented the original study in two ways: They introduced two additional study conditions, and they incorporated an additional, alternative form of assessment (open-ended assessment questions to supplement the multiple-choice questions used in the original study). These new study features were specifically designed and incorporated as a way to explore alternative explanations for the study’s results.
This basic pattern—a replication study essentially duplicating an original study’s methodology, but with the addition of a small number of intentionally-introduced changes to explore the original study’s validity and generality—can be seen in other replication studies in our field. In 2005, Fuchs and colleagues (Fuchs et al., 2005) investigated the effectiveness of a mathematics tutoring intervention for first-grade students, specifically 127 students who were identified as at risk for mathematics difficulty. The intervention was a supplement to these children’s regular mathematics instruction and consisted of small-group scripted tutoring sessions three times per week for 16 weeks. Fuchs and colleagues found that this mathematics intervention was quite effective for these students.
In 2015, Gersten and colleagues undertook a replication study of the original Fuchs and colleagues’ study (Gersten et al., 2015). Gersten and colleagues note that a primary rationale for doing their “large-scale replication (or scale-up)” was to expand the generality of the results of the original study by investigating the intervention in a “large-scale, real-world setting” (p. 521). Gersten and colleagues provide a detailed comparison of the study features of the replication study and the original study (see Gersten et al., 2015, Table 1, p. 523), making clear that the differences between the original and the replication study were few but carefully selected. The replication involved more students (about seven times as many) from a wider range of schools and districts and a much larger number of tutors (from a wider variety of backgrounds)—and this larger scale necessitated several other small changes in the study’s procedures.
Another example of a very similar replication study was conducted by Doabler and colleagues (Doabler et al., 2016). In this case, this author team replicated a prior study that they themselves had done several years earlier (Clarke et al., 2016). In the replication study (which the authors refer to as a “closely aligned conceptual replication”), Doabler and colleagues investigated the same treatment (a kindergarten mathematics intervention) as was used in the original study, delivered in the same manner, and using the same measures. But they sought to test whether the previously reported effectiveness of the intervention would be maintained under slightly different circumstances, including different participants (suburban and rural Oregon students in the original, a more diverse group of urban and suburban students from Boston in the replication), slightly different timing of the intervention (January start for the intervention in the original, November start for the same intervention in the replication), and a different control condition. The results of the replication largely confirmed those of the original study. Doabler and colleagues suggest that these results served to strengthen the evidentiary base for this kindergarten intervention, given that it was successful in both the original and replication study.
For further examples of this type of replication, we can look to Jitendra et al. (2019), which replicated Jitendra et al. (2015), as well as Star et al. (2011), which replicated Star and Strickland (2008). In both of these cases, the authors of the original study themselves conducted a replication that preserved almost all of the features of the original study but with a few small changes, notably in the number and types of participants—with the goal of strengthening the validity of their results and improving generality. And as a final example of a replication that is extremely similar to the original study but one that was conducted by a different author team, we note that Jacobson and Simpson (2019) replicated an earlier study by Thanheiser (2010).
3 Problematizing Conceptual Replications
Apart from confirming that self-identified replication studies are quite rare in mathematics education, this examination of replication studies in our field yields the following points. First, despite the prevalence of many typologies for describing replication studies, it seems quite difficult to reliably situate a given article along the continuum of replications—and thus it may be the case that these terminological frameworks for categorizing replication studies are more confusing than helpful. While we may agree that none of the replication studies discussed above is an exact or direct replication, I challenge the reader to determine which replication type label best fits each study and to reconcile the various similar terms. Would we agree with Doabler et al. (2016) that their study is a conceptual replication, or is “partial replication” (Marsden et al., 2018) a better fit? Conversely, do we agree with De Bock et al. (2011) that their study is a partial replication, or is this study more accurately called a conceptual replication? Where does the phrase “closely aligned” fall in the replication continuum, as used by both Doabler et al. (2016) and Jacobson and Simpson (2019)? Is Gersten et al.’s (2015) label of “scale-up replication” closer to a “close replication” or a “conceptual replication in the field”, using Hüffmeier et al.’s (2016) framework? These terminological frameworks may appear to be helpful but in practice are quite difficult to use. As Marsden et al. (2018) comment, “In terms of subtypes of replication, we found a very wide range of labels and negligible relations between these labels and the amount or type of change between the initial and replication studies” (p. 366). While it is reasonable to keep in mind that there may be a continuum of replication studies—including those that are more or less closely aligned to the original study—it seems difficult to use a precise categorization of replication study types, as the previously discussed examples illustrate. To simplify matters, a somewhat radical proposal would be to refer to all replication studies in social scientific fields such as mathematics education as conceptual replications, acknowledging the impossibility of doing a truly exact or direct replication in these fields and the fact that all replication studies will be at least a little different from the original studies.
However, if we are to refer to all mathematics education replication studies as conceptual replications, it is also the case that we need additional clarity as to what it means for a study to be a “conceptual replication.” I am concerned that the category of conceptual replication, which appears to offer the scholar interested in replication the most degrees of freedom in designing a study, has been interpreted too broadly. Here I diverge from some prior authors who have stressed that a conceptual replication can have “different methods” (e.g., Aguilar, 2020). It is undeniable that replicating a study using at least somewhat different methods is advantageous; as Polit and Beck (2010) note, “If concepts, relationships, patterns, and successful interventions can be confirmed in multiple contexts, varied times, and with different types of people, confidence in their validity and applicability will be strengthened. Indeed, the more diverse the contexts and populations, the greater will be the ability to sort out ‘irrelevancies’ from general truths” (p. 1454). Yet there seems to be confusion about how different the methods of a conceptual replication can be from the original study for it to still be considered a replication; I believe that this confusion arises because we have taken the definition of conceptual replication out of the experimental psychological research domains in which it was proposed.
In particular, Schmidt (2009) is cited by many authors as a foundational source when discussing the issue of conceptual replications. As noted by many subsequent authors, it is true that Schmidt defines a conceptual replication as “repetition of a test of a hypothesis or a result of earlier work with different methods” (p. 91, emphasis added). But how different can or should the methods be in a conceptual replication? Schmidt illustrates what he means by a conceptual replication study by discussing Rosenthal’s classic psychological experiments on expectations. In an original study with rats that had been trained to run through a maze, Rosenthal told participants that some of the rats were from a breed that was quite good at mazes and others were from a breed that was quite bad at mazes—when in reality all of the rats were from the same breed. Rosenthal wanted to see if the participants’ (incorrect) expectations about the rats’ performance would impact the rats’ actual performance. In a subsequent study that Schmidt refers to as a conceptual replication of this study, Rosenthal did the same thing but with teachers and students. Teachers were told that (as a result of a test given prior to the study) some of the students were predicted to do remarkably well in their classes over the next months, when in reality this group of students was selected randomly from among the class. Rosenthal wanted to see if the teachers’ (incorrect) expectations about students’ performance would actually impact the students’ performance. Schmidt notes that these two studies differ on many dimensions, including the participants and the setting. But at their core, these two studies are structurally isomorphic and have the same underlying hypotheses. Despite their many differences, this isomorphism is what earns the second study the label of a conceptual replication.
Continuing to follow the trail of citations, Schmidt cites Hendrick (1991) and Sargent (1981) as key sources for his use of the phrase conceptual replication. Looking first at the former, Hendrick defines a conceptual replication as “an attempt to convey the same crucial structure of information in the independent variables to subjects, but by a radical transformation of the procedural variables in [the primary information focus],” (p. 45, emphasis added), where the primary information focus refers to “the set of instructions or events brought to bear on subjects” (p. 44). But as was the case with Schmidt’s use of the word different above, it is essential to put Hendrick’s use of radical in the proper context. Hendrick gives an example of a “classic and successful” conceptual replication. In the original study described by Hendrick (Aronson & Mills, 1959), undergraduate female students volunteered to participate in a group discussion about the psychology of sex. Prior to joining the group, the women were told that they needed to take a screening test to determine whether they were suitable for participating in the discussion. For some of the women, this screening test was designed to be quite stressful and involved reading aloud a list of 12 sexually explicit and obscene words; other participants were given a much milder screening test that involved reading a more innocuous list of words related to sex but that were not obscene (e.g., prostitute). The experimenters wanted to investigate whether the mode of initiation into the group discussion (a traumatic vs. a mild screening test) influenced participants’ subsequent perceptions about the group itself. This original study was subsequently replicated by Gerard and Mathewson (1966). These authors made a number of changes to the original study, including changing the context of the group discussion (from the psychology of sex to cheating in college) and changing the screening test for group membership from the reading of mild or explicit sexual words to receiving a severe or a mild electric shock (!). As with the Rosenthal studies example above from Schmidt, the original Aronson and Mills study and the Gerard and Mathewson conceptual replication are essentially isomorphic—both are studying the same phenomenon but with some key (“radical”) differences that allowed the researchers to “rule out a number of alternative explanations of an effect found in a previous experiment” (Gerard & Mathewson, 1966, p. 278).
The other original source that Schmidt (2009) references when discussing conceptual replication is Sargent (1981). Sargent indicates that he is credited by many for introducing the distinction between concrete and conceptual replications (although he also says that this distinction was originally made in Stanford, 1974). As with Schmidt (2009) and Hendrick (1991), Sargent discusses these types of replications within the very narrow context of a specific type of experimental laboratory study. What Sargent calls a concrete replication is very similar to what subsequent authors call a direct or exact replication, while a conceptual replication refers to “testing different predictions about the outcomes of experiments” (p. 429). As an example, Sargent asks the reader to consider a study that explores a hypothesized positive correlation between two variables (say X and Y) by manipulating experimental conditions such that X increases and observing whether or not Y also increases. Assuming that this study was successful, a concrete (or exact) replication could be conducted where (again) the variable X is made to increase, in order to observe whether the variable Y also increases. Sargent recognizes the importance of such replications but also notes that “a sequence of such experiments is profoundly boring” (p. 429). As a more interesting alternative, he suggests a conceptual replication. This would involve keeping all aspects of the study the same, except that instead of increasing X (to look for an increase in Y), the experiment would manipulate experimental conditions so that X decreases and then look for a resulting decrease in Y. Such a study is not an exact (concrete) replication, since it involves important differences in the methodology. But clearly from this example, one can see that the two studies (the original and the conceptual replication) are very similar to each other, even isomorphic.
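To make Sargent’s toy example concrete, the sketch below simulates both the concrete and the conceptual replication under an assumed data-generating process. Everything here is my own invented illustration (in Python): the linear relation between X and Y, the noise level, and the sample sizes are all hypothetical, chosen only to mirror the structure of Sargent’s description.

```python
import random

random.seed(1)  # reproducible illustration

def observe_y(x_level):
    """Hypothetical data-generating process: Y rises with X, plus noise.
    The positive X-Y relation is the hypothesis under test."""
    return 2.0 * x_level + random.gauss(0, 1)

def mean_y(x_level, trials=200):
    """Average outcome across repeated trials at a fixed level of X."""
    return sum(observe_y(x_level) for _ in range(trials)) / trials

baseline = mean_y(1.0)

# Original study and its concrete (exact) replication:
# manipulate X upward and check whether Y also increases.
print("increase X -> Y rises:", mean_y(2.0) > baseline)

# Conceptual replication in Sargent's sense: same design, but test a
# different prediction of the same hypothesis -- decrease X, expect Y to fall.
print("decrease X -> Y falls:", mean_y(0.5) < baseline)
```

The point of the sketch is structural: the second experiment changes only the direction of the manipulation, so the original design remains fully recognizable; this is precisely the isomorphism discussed above.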
From the perspective of highly controlled experimental psychological experiments, it is indeed the case—as noted by Sargent (1981), Hendrick (1991), Schmidt (2009), and many contemporary authors—that the methodological features of a conceptual replication study are quite different (even radically so) from the original study. But it is also clear that, for the examples selected by these authors to specifically illustrate the category of conceptual replication, the original and the conceptual replication studies are mostly similar—essentially isomorphic—and only somewhat different in a few carefully chosen methodological details. A conceptual replication is indeed different methodologically from the original study. But despite these differences, the structure of the original study largely remains recognizable and intact in the conceptual replication.
As we take this category of conceptual replication that was introduced in the context of a very specific type of experimental laboratory research and try to apply it in the much broader context of social science research generally, I worry that we may be taking liberties with our interpretation of conceptual replications, choosing a much broader notion of how different the studies can be and still count as (conceptual) replications. In my view, a replication study by design replicates most of the methodological features of the original study. Even in a conceptual replication, we would still expect that the replication study would mostly be the same, methodologically, as the original study—with only a few carefully selected areas of divergence. As emphasized by Porte and Richards (2012), “The essential component one is looking for [in a conceptual replication] is enough similarity—rather than sameness—across the replicated and original studies to permit their satisfactory and constructive comparison across one or more areas” (p. 286). I note that the authors of the mathematics education replication studies described above went to great lengths to highlight the many ways that their studies duplicated the methods of the original studies as well as the very few and targeted ways that the replication departed from the original (e.g., see Table 1 in Jitendra et al., 2019 and Table 1 in Gersten et al., 2015). In any replication study—including a conceptual replication—the methodological shell or essence of the original study should still be present and readily apparent. Conceptual replications are not exploratory studies or those where features of several different studies are combined to create a new, never-before-tried study. Rather, a conceptual replication is very similar to the original study in design—essentially isomorphic—but with a small number of changes introduced to explore the validity and generality of the original study’s findings. (Perhaps it is helpful to be mindful of the fact that the replication process aims to replicate a study, not to replicate results. As in the De Bock et al. (2011) study described above, sometimes we replicate a study with the belief that we may not replicate its results). Replicating a study requires that we maintain most of the study’s methodological features to enable a comparison of the original and the replicated studies, even in a conceptual replication.
4 Follow-Up Studies in Mathematics Education
If we are to adopt this more conservative and narrow definition of a conceptual replication, we may be left with studies that purport to be conceptual replications but that may no longer be considered replications at all. Here it becomes useful to leverage the distinction between a (conceptual) replication study and a follow-up study (Porte & Richards, 2012; Schmidt, 2009; see also Markee (2017), who used the phrase comparative re-production research). In a follow-up study, the authors may be interested in some of the same goals as in a replication study, including exploring the generality of findings, determining which intervention components are necessary and/or sufficient, exploring effects due to boundary conditions or participant characteristics, etc. But (especially thinking about mathematics education research) follow-up studies are freed from the constraints of methodological comparability with the original study. Rather, follow-up studies are the next step in the line of work that produced the original study. Replication studies fundamentally direct our gaze back to the original study, given that the replication asks us to consider the extent to which the original study’s results generalize. In contrast, follow-up studies focus our attention on the present or even the future, as we hope to discover new results from a new study that has been informed by prior work. Porte and Richards (2012) summarize this distinction as follows:
The focus in any replication is always on the original study being replicated and what the results of our replication tell us about that study, its robustness, and perhaps its generalizability. The aim is not to extend the methodology to another context and report new outcomes in a “What happened when we did the same thing in …” situation. This is because we assume that the objective in the replication was closely to assess the findings of that original in some way, rather than simply transfer the original study’s methodology and procedures to produce a new study in a different context and with different subjects. In such “follow-up” studies one usually finds a number of new variables entering the mix: In general, the more variables changed from the original study, the more distant the new study is from what was done originally. Therefore the two become less comparable and the focus shifts from the original study to the outcomes in the new context studied. (p. 286)
In a follow-up study, it may not be possible to identify one specific study whose method is almost identical to the current study (and follow-up study authors often do not explicitly make such a connection)—but rather the methodology of the follow-up study draws from several prior related studies and also may include methodological features that are new and innovative. As Marsden et al. (2018) note, “making many or unacknowledged/unspecified changes to a study lies in tension with being able to account for whether differences in findings compared to the initial study are ascribable to the heterogeneity that was introduced (intentionally or otherwise) or to some other factor,” (p. 333) and that “too many changes or changes that are unmotivated or unacknowledged impede the ability to account for differences in the findings between studies” (p. 353). A replication study is inseparably and permanently linked to the original study on which it is based, while a follow-up study is afforded a much greater degree of independence and generally can exist as a stand-alone piece of work—albeit one with important foundations in prior studies.
This distinction between a replication study and a follow-up study helps explain the similar reactions that both Schoenfeld (2018) and Star (2018) had to the question of whether the paper by Jamil and colleagues (Jamil et al., 2018) included in the JRME special issue on replications could even be considered to be a replication study. Neither Schoenfeld nor Star felt that this study was a replication study; Schoenfeld suggested that it was instead a “triangulation and extension” (p. 95) study. The method of the study by Jamil and colleagues does not appear to closely follow one specific prior study but rather is a conglomeration of and extension of many prior studies that have studied a similar phenomenon. But classifying the Jamil et al. study as a follow-up study is not intended to be a critique per se but rather a prompt to the field for us to be clearer about what we mean by a replication study. This seems especially important as interest in replications continues to increase.
Reflecting more generally on the distinction between a replication and a follow-up study and considering the very small number of published replication studies in our field, it appears that mathematics education scholars are much more predisposed to pursue follow-up studies than to do replication studies. There may be many reasons why this is the case—including my points above about structural disincentives around replication related to publishing and grant funding. Furthermore, I acknowledge that it is unclear what the optimal ratio of follow-up studies to replication studies might be. But at present, follow-up studies are over-represented in our field; I believe that it would perhaps behoove us to repurpose some of our follow-up studies into replication studies—and such a transformation would not be especially difficult.
To make this case, consider the Star and Strickland (2008) paper, first mentioned at the opening of this paper. I noted above that I personally replicated this study (Star et al., 2011) by duplicating nearly all aspects of the study’s methodology. Other than my replication, and although the original 2008 study has been cited dozens of times, it does not appear that this study has ever been replicated by any other authors. Yet there have been many follow-up studies of Star and Strickland. Some of these follow-up studies are similar in design and scope to my original study, in that they investigate the development of secondary preservice mathematics teachers’ lesson observation skills by asking them to watch video during a mathematics methods class and to complete a written assessment. But each follow-up study makes one or more major changes in the study methodology such that it is no longer structurally and methodologically similar to Star and Strickland (2008) and thus is not a replication. For example, Males (2017) analyzed secondary preservice teachers’ noticing of their own unique videos (rather than having all students watch the same video, as was done in Star and Strickland), using the same analytical framework from Star and Strickland. Had Males asked all teachers to collectively watch the same video in her study, it could have been considered a conceptual replication, but instead it is a follow-up study. In a similar follow-up study, Roller (2016) also analyzed secondary preservice mathematics teachers’ noticing of their own lesson videos but using a framework of her own design. As another example, Amador et al. (2017) asked elementary (instead of secondary, as in Roller’s and Males’ studies) preservice mathematics teachers to collectively watch a much shorter video than was used in Star and Strickland. Teachers then completed an assessment that similarly sought to investigate their noticing but that differed along many dimensions from the one used by Star and Strickland. Finally, Stockero et al. (2017) also asked preservice mathematics teachers to view lesson video and to complete assessments documenting what they noticed, but the authors used a very different type of assessment than the one that was used by Star and Strickland (or Males, or Roller, or Amador et al.).
The preceding paragraph is certainly not intended to be a criticism of these authors’ work. Each of these studies is an excellent follow-up to Star and Strickland (2008). They—along with many other similar follow-up studies that could have been mentioned here—helped to advance the field’s understanding of mathematics teacher noticing. Many of these studies shared several common features, including that the participants (usually preservice mathematics teachers) viewed lesson video and then completed an assessment to document what they noticed in the video. This pattern—where we see many very similar follow-up studies around a given research topic, each incorporating new and often unique methodologies, assessments, and modes of analysis—appears to be the predominant norm in mathematics education research. We are a field focused on follow-up studies, with very little incentive or interest in doing replication work. This characteristic of our field is not necessarily problematic, but it does further help to explain why replication studies are so rare. Mathematics education scholars are encouraged and incentivized to incorporate creativity and innovation into their studies, to leverage features of their local contexts and teaching environments in their research, and to make individual decisions about whether to emphasize or de-emphasize particular aspects of prior studies (e.g., Arcavi, 2000). Each of these factors pushes against the drive to ‘merely’ replicate the methodologies of a given study to incrementally advance knowledge in a more step-by-step and systematic manner.
5 Low Citation Counts among Published Articles in Our Top Journals
The prevalence of (and incentives to produce) follow-up studies in our field represents a formidable challenge in any attempt to increase the number of replication studies in mathematics education. The ultimate goal may never be for replication studies to outnumber follow-up studies, but it is worth considering how we might push the field to be more replication friendly. Aguilar (2020) does an excellent job of beginning to answer this question by wondering if the field of mathematics education is ready for replication. I agree with Aguilar that a transformation in our field along many dimensions may be necessary in order to change our perception of replication studies. For example, Aguilar suggests that we need to move beyond the small-scale studies that predominate in the field as well as to include additional methodological detail in our published work. To Aguilar’s wish list (and drawing from similar recommendations posed by other scholars; e.g., Marsden et al., 2018), I would add items such as (a) the need for a greater willingness among researchers to share original data for others to analyze (and for journals to provide logistical and technical supports for doing so), (b) journals allocating space to enable researchers to routinely provide technical appendices or supplements with more detailed information about methods and participants, and (c) a willingness to engage in (and the infrastructure to implement) preregistration of analyses prior to data collection. But beyond these, I also want to offer some recommendations related to establishing a replication culture in mathematics education.
Fundamentally, my recommendations seek to address the most salient and troublesome manifestation of the lack of a replication culture in our field, which is the very low citation numbers in our top journals. Using data from the Web of Science, I recently looked at all published articles from the period 2009 to 2018 in the three journals that many mathematics education researchers consider to be our top outlets for empirical work (Toerner & Arzarello, 2012; Williams & Leatham, 2017): Journal for Research in Mathematics Education (JRME), Educational Studies in Mathematics (ESM), and Mathematical Thinking and Learning (MTL). I compared citation counts for these recent mathematics education articles to those from two other top-tier journals, Learning and Instruction (L&I) and Journal of Educational Psychology (JEP), for the same time period. (For this analysis, I excluded papers published after 2018, as it is to be expected that it may take a few years for a study to be discovered by researchers in a field and thus cited. Also, note that different websites calculate citation numbers in very different ways, with Web of Science citation counts often being much lower than Google Scholar citation counts. Thus it is important in this analysis to focus on the relative differences between the Web of Science citation numbers reported for the three mathematics education journals and the two comparison journals (L&I and JEP), rather than on absolute citation numbers.)

My findings, reporting the percent of recently published articles that fall into certain citation ranges, are displayed in the histogram below (see Figure 1). Among the three mathematics education journals, about one-third of all articles published between 2009 and 2018 have fewer than 5 citations (JRME: 36%; ESM: 32%; MTL: 42%). In fact, many of these articles (JRME: 25%; ESM: 16%; MTL: 28%) have one or zero citations. In comparison, only 5% of JEP articles published between 2009 and 2018, and only 8% of L&I articles from this time period, have fewer than 5 citations (with only 2% (JEP) and 3% (L&I) having one or zero citations). The median number of citations for articles published between 2009 and 2018 for these three top mathematics education journals was about 8 (8, 8, and 7 for JRME, ESM, and MTL, respectively), as compared to a median of 30 for JEP and 24 for L&I. Looking at the more highly cited articles published between 2009 and 2018, 26% of JEP articles and 20% of L&I articles were cited at least 60 times, with 6% and 3% (for JEP and L&I, respectively) of articles cited more than 150 times. For JRME, only 5% of articles were cited at least 60 times; this figure drops to 2% for ESM and less than 1% for MTL. Furthermore, combining across all three mathematics education journals, there were only 2 articles published since 2009 that were cited more than 150 times, both from JRME. (For curious readers, these two articles are Jacobs et al. (2010) and Gutiérrez (2013).) I acknowledge that there may be substantial differences between fields in what it means for an article to be frequently cited. But even with this caveat, it is hard to deny the picture that Figure 1 paints: The vast majority of articles published in our top journals are very infrequently cited, with citation counts that are much lower than we see in other fields’ top journals.
Figure 1. Histogram of citation counts for published articles (2009–2018) in Learning and Instruction (L&I) (n = 598), Journal of Educational Psychology (JEP) (n = 740), Journal for Research in Mathematics Education (JRME) (n = 288), Educational Studies in Mathematics (ESM) (n = 682), and Mathematical Thinking and Learning (MTL) (n = 171). Source: Web of Science.
There may be innocuous explanations for the relatively low citation numbers in top mathematics education journals. Such explanations might include the fact that we are a relatively small sub-field of academic study (as compared to educational psychology or instructional psychology, for example) with fewer researchers and fewer publications, or that many of our field’s most citable publications cross boundaries into other related sub-fields and their journals (e.g., in mathematics teacher education, Journal for Mathematics Teacher Education) and/or into more general journals (e.g., Cognition and Instruction). It may also be the case that scholars in mathematics education publish less in research journals because of their interest in also publishing in practitioner-oriented journals. But it seems unlikely that these explanations would drastically change the trend of relatively low citation numbers that is so clear from the JRME, MTL, and ESM data.
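As an aside for readers who wish to assemble similar statistics for other journals or time periods, the short sketch below shows one way the figures reported above could be computed from per-article citation counts. It is a minimal illustration only, written in Python: the counts in the example are invented placeholders, not the actual Web of Science data, and a real analysis would start from a citation report exported from the database.

```python
from statistics import median

# Invented placeholder counts, one integer per article; a real analysis
# would load per-article citation counts exported from Web of Science.
citations = {
    "JRME-like": [0, 0, 1, 2, 4, 8, 8, 15, 40, 70],
    "JEP-like":  [3, 6, 12, 22, 30, 35, 55, 80, 120, 160],
}

def summarize(counts):
    """Compute the kinds of summary statistics reported in the text."""
    n = len(counts)
    return {
        "% with < 5 citations": round(100 * sum(c < 5 for c in counts) / n),
        "% with 0-1 citations": round(100 * sum(c <= 1 for c in counts) / n),
        "median citations":     median(counts),
        "% cited >= 60 times":  round(100 * sum(c >= 60 for c in counts) / n),
        "% cited > 150 times":  round(100 * sum(c > 150 for c in counts) / n),
    }

for journal, counts in citations.items():
    print(journal, summarize(counts))
```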
These relatively low citation numbers for published mathematics education research are a symptom of a larger concern: Low citation numbers point to a field’s failure to systematically build on existing research (King, 1995), which is also reflected in the scarcity of replication studies. Generally speaking, our published articles do not make a clear and explicit case—via discussion of related past work—that current studies build upon prior scholarship in mathematics education. In many fields, incremental scientific advancement is chronicled in published papers via the citations of series of closely related studies. In such cases, the field’s march forward is made explicit when authors acknowledge the empirical genealogy of their work—and this has the effect of increasing citation numbers as well. Yet in general this does not appear to be happening in published research in mathematics education, as evidenced by our low citation numbers.
Changing the ways that we cite prior research should not be thought of as an end unto itself. Rather, I consider citation numbers as one indicator of the ‘health’ of an academic field. Relatively high average citation counts would indicate that we are closely attending to the empirical history of mathematics education research in our published work, leveraging and building upon this history in an attempt to steadily make progress toward solving complex and nuanced problems relating to mathematics teaching and learning. Relatively high citation counts would also suggest that we are generally aware of prior empirical work in a particular subfield and that we acknowledge the proximal and distal influences of this prior work upon our current studies. Low citation counts are not caused by a scarcity of replication studies, nor is it likely that a marked increase in the number of replication studies in mathematics education would uniformly increase average citation numbers. Rather, relatively low citation numbers and the infrequency of replication articles may both be caused by the same underlying issue in our field—our failure to systematically build upon prior research. Improvement in this aspect of the culture of our field would pave the way toward a greater appreciation for replication studies.
I acknowledge that this point may be puzzling to some mathematics education scholars, who might correctly note that published work in our field does indeed provide connections and citations to prior scholarship in the field—e.g., look at any published JRME article and you will see many paragraphs describing past related empirical work. But there is a subtle but important distinction between what we tend to do at present in terms of connecting to prior work and what we would need to do to establish a more replication-friendly culture in mathematics education. The distinction arises when we consider the ways that authors conceive of and then write about their research studies.
6 Idea-Initiated vs. Results-Initiated Studies
Reflecting both on my own process for designing, implementing, and publishing research, as well as on the published articles that I have read as a journal reviewer and editor over many years, my hunch is that many studies in our field come into existence because of researchers’ curiosity around a set of ideas—what I might refer to as an idea-initiated research process (see also Arcavi’s (2000) notion of “problem-driven” research). I personally am motivated to do research in a particular area based on a number of factors, including what I find interesting to explore in the field, what seems most powerful and productive for moving the field forward, and what gaps or holes I find compelling or important to fill. My knowledge of the relevant literature plays a key initial role in the design of my studies, in that this knowledge helps me to identify gaps that I am curious to explore. But in looking to the literature, often my focus is on the ideas that are present in prior work (as opposed to their methods).
There is nothing inherently wrong with this type of idea-initiated research process—arguably it has the benefit of propelling scholars to produce creative and innovative research. But the ubiquity of idea-initiated research in our field has implications for the establishment of a replication culture, in two ways. First, idea-initiated studies tend to become what I refer to above as follow-up studies—studies that do not bear a sufficiently close resemblance methodologically to prior work such that they could be considered replications. In idea-initiated studies, scholars want to follow their curiosity around an idea without being bound by the methods of prior work, unless it is convenient to do so. Prior work may inform the design of our studies, but we feel no obligation to closely adhere to the methods of the studies that inspired us, given our central commitment to following our ideas in whatever ways are most compelling to us.
Second, idea-initiated studies are also recognizable from the ways that citations are used in the opening sections of the resulting published papers. The literature review sections of idea-initiated studies trace the genesis and evolution of an idea, including who is credited with early and important work around the idea, major landmark studies and findings related to the idea, and theoretical orientations underlying the idea—all of which can lead to disproportionate citation of seminal or foundational work and far less attention to the subsequent empirical work that incrementally advanced the idea. These opening sections do a nice job of conceptually situating our studies among relevant theoretical frameworks and key articles. But we tend to be less explicit about the empirical studies that are the direct ancestors to our study and about tracing via citation the exact genealogy of the work. As a result, for a given research topic, we find that there are a very small number of seminal (theoretical and empirical) papers that get cited by most studies in the subfield, while the vast majority of articles (which are not seminal but are idea-initiated follow-up studies) are almost never cited. This is exactly the pattern that we see with the Star and Strickland (2008) article cited above—where many follow-up studies cite this work but few themselves are cited more than a handful of times. Thus, the dominance of idea-initiated research in mathematics education helps account for the preponderance of follow-up studies and the generally low citation numbers in our field—both of which are strong signals about the absence of a replication culture.
To change to a more replication-friendly culture in the field, we may need to increase the proportion of studies that are results-initiated, as opposed to idea-initiated. In results-initiated work, scholars are driven as much by the results of prior studies as by the ideas explored by them. In this paradigm, I might undertake a study because I am compelled by interesting or surprising results from a prior study—results that cry out to be confirmed or refuted. Given my interest in prior results, it is essential that I attend very carefully to the methods of the prior work as I design my study. I may end up trying to replicate the prior results, or I may conduct a follow-up study—but regardless, my focus on results means that I am thinking carefully about which aspects of the prior study’s methodology I want to retain. And in the published paper that results from a results-initiated study, I would include both a thorough description and justification of the methods from prior studies as well as an explanation for why I chose to keep or modify these methods in my study.
All of the replications described earlier in this paper are excellent examples of papers reporting results-initiated studies. De Bock et al. (2011) explicitly describe their motivation for the study as arising from their critical reaction to the results of prior studies by Kaminski et al. (2008). Gersten et al. (2015) and Doabler et al. (2016) expressed great interest in the results from a specific prior study and sought to confirm these findings with a different population of participants. And each of these articles devotes considerable space in its opening pages to describing the methods of the prior studies and to justifying the decision to retain or modify these prior methods.
Clearly a healthy field of scholarship must include both idea-initiated and results-initiated studies. But if we seek a more replication-friendly culture in mathematics education, we may need to increase the number of results-initiated studies that we conduct, some of which will be replication studies. In the papers that we publish from these results-initiated studies, our literature review sections should not only touch on the theoretical and conceptual grounding for the work and prior seminal studies; they should also provide specifics about the prior studies whose results are the primary motivators for the present work and explain how the methods of prior closely-related studies are being used or slightly modified—and why.
One way of supporting an increase in the proportion of results-initiated studies would be to incorporate this kind of research into graduate training (Eastman, 1975; Porte & Richards, 2012). For example, prior to beginning work on the dissertation, doctoral students could conduct a replication of a study of interest. The field would benefit from the resulting replication studies that are produced and published, and students would gain valuable training in how to conduct research. Some students might become very interested in doing replication work through this experience and continue doing so after earning their degree, but more generally the awareness and appreciation of replication studies would gradually increase across the field.
7 Conclusion
Replication studies play an essential role in the advancement of knowledge in any research domain. In the field of mathematics education, much has been written about the importance of replication over the past few years—with authors frequently lamenting the rarity of published replication studies and speculating about how we might encourage more scholars to conduct such studies (as well as journals to publish them). In this paper, I attempt to contribute to this conversation by reflecting on what it means for a study to be a replication study. I first examined several studies that self-identify as replications, such as the De Bock et al. (2011) replication of a study by Kaminski et al. (2008). I note that in each of these replications, the authors deliberately varied certain aspects of the original study’s methodology. But on the whole, the design and methods of each of the replication studies are extremely similar to the original. I then discuss commonly used frameworks for describing types of replications, noting that none of these frameworks seems straightforward to apply. I take particular aim at the idea of a conceptual replication, noting my concern that scholars in mathematics education may be misinterpreting the degree to which the methodology of a conceptual replication can differ from the original study. I look to the origins of this phrase to suggest that a conceptual replication should essentially be isomorphic in methodology to the original. I also suggest that a large proportion of the research conducted in our field could be characterized as follow-up studies. But I worry that our predilection toward follow-up studies—particularly those that I refer to as idea-initiated—has led to a problematically low number of citations for most published work in our field, as well as a research culture that is not particularly conducive to replications. I applaud the launch of this new journal and hope that this outlet can propel us toward a more replication-friendly research culture in mathematics education.
Impact Sheet
The impact sheet to this article can be accessed at
References
Aguilar, M. S. (2020). Replication studies in mathematics education: What kind of questions would be productive to explore? International Journal of Science and Mathematics Education, 18(1, Suppl.), S37–S50. https://doi.org/10.1007/s10763-020-10069-7.
Aguilar, M. S., Kuzle, A., Wæge, K. & Misfeldt, M. (2019). Introduction to the papers of TWG23: Implementation of research findings in mathematics education. In U. T. Jankvist, M. van den Heuvel-Panhuizen & M. Veldhuis (Eds.), Proceedings of the Eleventh Congress of the European Society for Research in Mathematics Education (pp. 4355–4362). Freudenthal Group & Freudenthal Institute; Utrecht University; ERME.
Amador, J. M., Estapa, A., de Araujo, Z., Kosko, K. W. & Weston, T. L. (2017). Eliciting and analyzing preservice teachers’ mathematical noticing. Mathematics Teacher Educator, 5(2), 158–177. https://doi.org/10.5951/mathteaceduc.5.2.0158.
Anczyk, A., Grzymała-Moszczyńska, H., Krzysztof-Świderska, A. & Prusak, J. (2019). The replication crisis and qualitative research in the psychology of religion. The International Journal for the Psychology of Religion, 29(4), 278–291. https://doi.org/10.1080/10508619.2019.1687197.
Arcavi, A. (2000). Problem-driven research in mathematics education. The Journal of Mathematical Behavior, 19(2), 141–173. https://doi.org/10.1016/S0732-3123(00)00042-0.
Aronson, E. & Mills, J. (1959). The effect of severity of initiation on liking for a group. The Journal of Abnormal and Social Psychology, 59(2), 177–181. https://doi.org/10.1037/h0047195.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., Grange, A. J., Perugini, M., Spies, J. R. & van ’t Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224. https://doi.org/10.1016/j.jesp.2013.10.005.
Cai, J., Morris, A., Hohensee, C., Hwang, S., Robison, V. & Hiebert, J. (2018). The role of replication studies in educational research. Journal for Research in Mathematics Education, 49(1), 2–8. https://doi.org/10.5951/jresematheduc.49.1.0002.
Chhin, C. S., Taylor, K. A. & Wei, W. S. (2018). Supporting a culture of replication: An examination of education and special education research grants funded by the Institute of Education Sciences. Educational Researcher, 47(9), 594–605. https://doi.org/10.3102/0013189X18788047.
Clarke, B., Doabler, C., Smolkowski, K., Nelson, E. K., Fien, H., Baker, S. K. & Kosty, D. (2016). Testing the immediate and long-term efficacy of a tier 2 kindergarten mathematics intervention. Journal of Research on Educational Effectiveness, 9(4), 607–634. https://doi.org/10.1080/19345747.2015.1116034.
Coyne, M. D., Cook, B. G. & Therrien, W. J. (2016). Recommendations for replication research in special education: A framework of systematic, conceptual replications. Remedial and Special Education, 37(4), 244–253. https://doi.org/10.1177/0741932516648463.
De Bock, D., Deprez, J., Van Dooren, W., Roelens, M. & Verschaffel, L. (2011). Abstract or concrete examples in learning mathematics? A replication and elaboration of Kaminski, Sloutsky, and Heckler’s study. Journal for Research in Mathematics Education, 42(2), 109–126. https://doi.org/10.5951/jresematheduc.42.2.0109.
Doabler, C. T., Clarke, B., Kosty, D. B., Kurtz-Nelson, E., Fien, H., Smolkowski, K. & Baker, S. K. (2016). Testing the efficacy of a Tier 2 mathematics intervention: A conceptual replication study. Exceptional Children, 83(1), 92–110. https://doi.org/10.1177/0014402916660084.
Earp, B. D. & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6, Article 621. https://doi.org/10.3389/fpsyg.2015.00621.
Eastman, P. M. (1975). Replication studies: Why so few? Journal for Research in Mathematics Education, 6(2), 67–68. https://doi.org/10.5951/jresematheduc.6.2.0067.
Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D. & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty. Journal of Educational Psychology, 97(3), 493–513. https://doi.org/10.1037/0022-0663.97.3.493.
Gerard, H. B. & Mathewson, G. C. (1966). The effects of severity of initiation on liking for a group: A replication. Journal of Experimental Social Psychology, 2(3), 278–287. https://doi.org/10.1016/0022-1031(66)90084-9.
Gersten, R., Rolfhus, E., Clarke, B., Decker, L. E., Wilkins, C. & Dimino, J. (2015). Intervention for first graders with limited number knowledge: Large-scale replication of a randomized controlled trial. American Educational Research Journal, 52(3), 516–546. https://doi.org/10.3102/0002831214565787.
Gutiérrez, R. (2013). The sociopolitical turn in mathematics education. Journal for Research in Mathematics Education, 44(1), 37–68. https://doi.org/10.5951/jresematheduc.44.1.0037.
Hendrick, C. (1991). Replication, strict replications, and conceptual replications: Are they important? In J. W. Neuliep (Ed.), Replication research in the social sciences (pp. 41–49). Sage.
Hüffmeier, J., Mazei, J. & Schultze, T. (2016). Reconceptualizing replication as a sequence of different studies: A replication typology. Journal of Experimental Social Psychology, 66, 81–92. https://doi.org/10.1016/j.jesp.2015.09.009.
Inglis, M., Schukajlow, S., Van Dooren, W. & Hannula, M. S. (2018). Replication in mathematics education. In E. Bergqvist, M. Österholm, C. Granberg & L. Sumpter (Eds.), Proceedings of the 42nd Conference of the International Group for the Psychology of Mathematics Education (Vol. 1, pp. 195–196). PME.
Jacobs, V. R., Lamb, L. L. & Philipp, R. A. (2010). Professional noticing of children’s mathematical thinking. Journal for Research in Mathematics Education, 41(2), 169–202. https://doi.org/10.5951/jresematheduc.41.2.0169.
Jacobson, E. & Simpson, A. (2019). Prospective elementary teachers’ conceptions of multidigit number: Exemplifying a replication framework for mathematics education. Mathematics Education Research Journal, 31(1), 67–88. https://doi.org/10.1007/s13394-018-0242-x.
Jamil, F. M., Larsen, R. A. & Hamre, B. K. (2018). Exploring longitudinal changes in teacher expectancy effects on children’s mathematics achievement. Journal for Research in Mathematics Education, 49(1), 57–90. https://doi.org/10.5951/jresematheduc.49.1.0057.
Jankvist, U. T., Aguilar, M. S., Bergman Ärlebäck, J. & Wæge, K. (2017). Introduction to the papers of TWG23: Implementation of research findings in mathematics education. In T. Dooley & G. Gueudet (Eds.), Proceedings of the Tenth Congress of the European Society for Research in Mathematics Education (pp. 3769–3775). DCU Institute of Education; ERME. https://hal.archives-ouvertes.fr/hal-01950532/document.
Jitendra, A. K., Harwell, M. R., Dupuis, D. N., Karl, S. R., Lein, A. E., Simonson, G. & Slater, S. C. (2015). Effects of a research-based intervention to improve seventh-grade students’ proportional problem solving: A cluster randomized trial. Journal of Educational Psychology, 107(4), 1019–1034. https://doi.org/10.1037/edu0000039.
Jitendra, A. K., Harwell, M. R., Im, S.-h., Karl, S. R. & Slater, S. C. (2019). Improving student learning of ratio, proportion, and percent: A replication study of schema-based instruction. Journal of Educational Psychology, 111(6), 1045–1062. https://doi.org/10.1037/edu0000335.
Jones, M. G. (2009). Transfer, abstraction, and context. Journal for Research in Mathematics Education, 40(2), 80–89. https://doi.org/10.5951/jresematheduc.40.2.0080.
Kaminski, J. A., Sloutsky, V. M. & Heckler, A. F. (2008). The advantage of abstract examples in learning math. Science, 320(5875), 454–455. https://doi.org/10.1126/science.1154659.
King, G. (1995). Replication, replication. PS: Political Science & Politics, 28(3), 444–452. https://doi.org/10.2307/420301.
Makel, M. C. & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316. https://doi.org/10.3102/0013189X14545513.
Males, L. M. (2017). Using video of peer teaching to examine grades 6–12 preservice teachers’ noticing. In E. O. Schack, M. H. Fisher & J. A. Wilhelm (Eds.), Teacher noticing: Bridging and broadening perspectives, contexts, and frameworks (pp. 91–109). Springer. https://doi.org/10.1007/978-3-319-46753-5_6.
Markee, N. (2017). Are replication studies possible in qualitative second/foreign language classroom research? A call for comparative re-production research. Language Teaching, 50(3), 367–383. https://doi.org/10.1017/S0261444815000099.
Marsden, E., Morgan‐Short, K., Thompson, S. & Abugaber, D. (2018). Replication in second language research: Narrative and systematic reviews and recommendations for the field. Language Learning, 68(2), 321–391. https://doi.org/10.1111/lang.12286.
Polit, D. F. & Beck, C. T. (2010). Generalization in quantitative and qualitative research: Myths and strategies. International Journal of Nursing Studies, 47(11), 1451–1458. https://doi.org/10.1016/j.ijnurstu.2010.06.004.
Porte, G. & Richards, K. (2012). Focus article: Replication in second language writing research. Journal of Second Language Writing, 21(3), 284–293. https://doi.org/10.1016/j.jslw.2012.05.002.
Roller, S. A. (2016). What they notice in video: A study of prospective secondary mathematics teachers learning to teach. Journal of Mathematics Teacher Education, 19(5), 477–498. https://doi.org/10.1007/s10857-015-9307-x.
Sargent, C. L. (1981). The repeatability of significance and the significance of repeatability. European Journal of Parapsychology, 3(4), 423–443.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13(2), 90–100. https://doi.org/10.1037/a0015108.
Schoenfeld, A. H. (2018). On replications. Journal for Research in Mathematics Education, 49(1), 91–97. https://doi.org/10.5951/jresematheduc.49.1.0091.
Stanford, R. G. (1974). An experimentally testable model for spontaneous psi events: I. Extrasensory events. Journal of the American Society for Psychical Research, 68(1), 34–57.
Star, J. R. (2018). When and why replication studies should be published: Guidelines for mathematics education journals. Journal for Research in Mathematics Education, 49(1), 98–103. https://doi.org/10.5951/jresematheduc.49.1.0098.
Star, J. R. & Strickland, S. K. (2008). Learning to observe: Using video to improve preservice mathematics teachers’ ability to notice. Journal of Mathematics Teacher Education, 11(2), 107–125. https://doi.org/10.1007/s10857-007-9063-7.
Star, J. R., Lynch, K. & Perova, N. (2011). Using video to improve mathematics teachers’ abilities to attend to classroom features: A replication study. In M. G. Sherin, V. R. Jacobs & R. A. Philipp (Eds.), Mathematics teacher noticing: Seeing through teachers’ eyes (pp. 117–133). Routledge.
Stockero, S. L., Rupnow, R. L. & Pascoe, A. E. (2017). Learning to notice important student mathematical thinking in complex classroom interactions. Teaching and Teacher Education, 63, 384–395. https://doi.org/10.1016/j.tate.2017.01.006.
Thanheiser, E. (2010). Investigating further preservice teachers’ conceptions of multidigit whole numbers: Refining a framework. Educational Studies in Mathematics, 75(3), 241–251. https://doi.org/10.1007/s10649-010-9252-7.
Toerner, G. & Arzarello, F. (2012). Grading mathematics education research journals. Newsletter of the European Mathematical Society, 86, 52–54.
Williams, S. R. & Leatham, K. R. (2017). Journal quality in mathematics education. Journal for Research in Mathematics Education, 48(4), 369–396. https://doi.org/10.5951/jresematheduc.48.4.0369.