
Stochastic Remembering and Distributed Mnemonic Agency

Recalling Twentieth Century Activists with ChatGPT

In: Memory Studies Review
Authors:
Rik Smit, Research Centre for Media and Journalism Studies, University of Groningen, Groningen, The Netherlands

Thomas Smits, Amsterdam School of Historical Studies, University of Amsterdam, Amsterdam, The Netherlands

Samuel Merrill, Department of Sociology, Umeå University, Umeå, Sweden

Open Access

Abstract

This paper introduces the concept of stochastic remembering and uses two prompt engineering techniques to critically examine the text generated by ai chatbots. These techniques – step-by-step prompting and chain of thought reasoning – are then experimentally applied to understand how ChatGPT, the most commonly used ai chatbot, shapes how we remember historical activists. This experiment suggests that hegemonic forms of memory influence the data on which these chatbots are trained and underlines how stochastic patterns affect how humans and ai systems collectively remember the past. Humans and ai systems prompt each other to remember. In conclusion, the paper argues that ai chatbots are a new kind of mnemonic actor that, in interaction with users, renders a probabilistic past. Methodologically, the paper introduces, in an explorative way, an experimental method that can reveal the dynamics of stochastic remembering.

1 Introduction

Increasingly, databases “speak to” us through artificial intelligence (ai) systems such as Bing, Bard, and ChatGPT. These instruction-tuned text generators, often described as ai chatbots, rely, at their core, on Large Language Models (llm s) – neural networks with many parameters that have been trained on vast quantities of unlabeled textual data. Connecting with the field of critical ai studies (Verdegem, 2021; Lindgren, 2023) and early efforts to understand the possible consequences of algorithms and ai for memory and history (Kansteiner, 2022; Makhortykh, 2024; Merrill, 2023; Shur-Ofry & Pessach, 2020; Smit, 2024), this paper explores how instruction-tuned text generators affect the process of remembering. It acknowledges that these ai systems will likely come to play an increasingly important role in how we engage with the past. If all remembering is at least in part a socially and culturally situated activity, occurring as much “in the wild” as “in the head” (Barnier & Hoskins, 2018), what happens when ai systems are used for mnemonic purposes? Based on Bender et al. (2021), we argue that chatbots, currently the most widely used ai systems, introduce a new, stochastic rendering of the past, based on the probabilistic distribution of words in their training datasets.

Reflecting this, the paper examines the distributed mnemonic agency of humans and instruction-tuned text generators. As contemporary memory studies research has shown, the social act of remembering has historically involved the reproduction of hegemonic forms of memory, for example through commemorations and history textbooks. Therefore, the problem is not that humans use ai systems to remember but that they are “deceived” (Natale, 2021) into thinking that these models have (super)human-like agency in this process. We instead argue that this remembering is the result of ongoing and shifting interactions between human and non-human actors and highlight some of the potential dangers of this process. For example, the increased use of ai systems in remembering may lead to the unconscious reproduction of hegemonic memory and might, in turn, hamper our ability to critically engage with the past in the context of the present.

First, however, we discuss the technicalities and common critiques of ChatGPT and llm s at three junctures: training dataset, training tasks, and user input. This discussion helps create a common understanding of what instruction-tuned text generators are and seeks to familiarise non-computer science readers with key terms and processes. Our goal here is to avoid general talk about ai, which might add to the myths and misunderstandings about it in both popular and scholarly discourse (Cave, Dihal & Dillon, 2020).

In the context of human/ai remembering, the way in which ai models are prompted to recall the past significantly influences how humans/ai can remember the past together. In the paper’s second section, we theorise this dispersed form of engaging with the past as “stochastic remembering”. Moving beyond common critiques of the potentially biased output of llm s and the hegemonic memory these outputs embody, we draw attention to the distributed mnemonic processes and agencies behind such outputs.

In the final section of the article, we prompt ChatGPT to “remember” the most and least well-known activists of the twentieth century and use two specific prompt engineering techniques called step-by-step prompting and chain of thought reasoning in order to critically explore the texts that it can produce about the past. Comparing and analysing ChatGPT’s outputs gives us a better grasp of the hegemonic memory contained in the data on which llm-based models were trained, but more significantly for our purposes, helps us explore how stochastic processes influence how humans and ai systems remember the past together. Ultimately, we explore how ChatGPT is a new mnemonic actor that, in interaction with users, renders the past probabilistically.

2 ChatGPT and Large Language Models: Technicalities and Common Critiques

ChatGPT (GPT = Generative Pre-trained Transformer) is a state-of-the-art ai chatbot, or, more precisely, a “transformer-based instruction-tuned text generator” that was launched by the company OpenAI in November 2022 (Liesenfeld et al., 2023). In producing textual outputs, it can be said to rely on three main junctures: training dataset, training tasks, and user input.

2.1 Training Datasets

The precise training datasets behind ChatGPT’s gpt-3.5 and gpt-4 llm s are kept secret, but they include large amounts of text that have been computationally collected (“scraped”) from the public internet up to 2021 (OpenAI, 2023). For instance, ChatGPT’s earlier gpt-3 llm relied on five main public data sources. These were: 1) a 2016–2019 subset of Common Crawl’s free-to-use web archive (see https://commoncrawl.org/); 2) the WebText2 dataset that contains the text of web pages that are linked to Reddit posts with three or more upvotes; 3 & 4) two internet-based books corpora called Books1 and Books2; and 5) English Wikipedia pages (Brown et al., 2020). However, the datasets used to train the gpt-3.5 and gpt-4 llm s have changed and are likely to continue to change in the future. For example, in April 2023 Reddit updated its api regulations to discourage the use of its users’ content to train ai models (Kemper, 2023). More recently, it has been revealed that OpenAI has also used transcribed YouTube clips to train gpt-4 (Metz et al., 2024).

While the amount of data used to train ChatGPT is impressive, it is also limited. First, its training data is limited in time. llm s’ knowledge of human experience before the internet is restricted to digitised sources, and ChatGPT’s llm s have not generally been trained on content relating to people, events, or experiences after the 2021 cut-off of their training datasets (although some newer or paid versions extend beyond this cut-off to some extent). Second, the size of these training sets “does not guarantee their diversity” (Bender et al., 2021, p. 613). Internet use has sharply increased in the last decade, but in 2021 an estimated 2.9 billion people had still never used the internet. Most of these people (96%) lived in developing countries (International Telecommunication Union, 2021). Without internet access, the daily lives and concerns (and questions) of a large portion of the world’s population are not captured in the online texts that undergird ChatGPT. In addition, the daily lives and concerns of the most avid internet users – often young and relatively wealthy cisgender males from Western countries – are overrepresented (Bender et al., 2021). For example, recent surveys of Reddit users show them predominantly to be male (67%) and between the ages of 18 and 29 (64%), while surveys of Wikipedia contributors show few to be women (8.8–15%) (cited in Bender et al., 2021). Similarly, certain languages, most obviously English, are overrepresented, and the conventions of user-generated internet content from particular websites are prioritised on account of being accessible.

2.2 Training Tasks

While it is arguably common knowledge that ai chatbots like ChatGPT learn from large volumes of data, the training objectives of these ai systems are frequently misunderstood to be very broad or even all-encompassing. For example, people might think that ai chatbots learn to “talk” like humans (Natale, 2021). In actuality, because they have to be validated (tested), the training objectives of most llm s are purposefully simple and unambiguous prediction tasks. Based on patterns in their training data, models try to correctly predict unseen data. llm s, for example, are not trained to “answer questions” but only to “predict the next word”. In this sense, the models that underpin ai chatbots work stochastically: “haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning” (Bender et al., 2021, p. 617). In other words, llm s have learned the probability of words following one another. When they generate text, llm s use these probabilities to produce an output. However, this output can vary (slightly) for each prompt, even if the input is the same. This randomness in the output is what is referred to as stochastic: the pattern of word probabilities is learned and somewhat predictable, but the exact output is not deterministic. Bender et al. (2021) use this characteristic to point out that ai chatbots do not speak or think but mimic language patterns: they are “stochastic parrots”.

Based on a prompt (P), the model is tasked to predict the next token (T1). Tokens are the basic textual units of an llm. They can be as short as one character or as long as one word. Punctuation marks and spaces, for example, are important tokens, as are single words. Based on (P + T1), the model then predicts the next token (T2). The output (T1 + … + Tn) should be a pleasing response to the initial prompt.

How does ChatGPT become good at predicting pleasing next words? Although its training pipeline is kept secret, most scholars agree that it consists of three steps: an unsupervised large language model, supervised instruction tuning, and reinforcement learning from human feedback (rlhf) (Liesenfeld et al., 2023).
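To make the token-by-token prediction described above concrete, the following toy sketch (our illustration, not OpenAI’s code) samples next tokens from a hand-made probability table. The table and its probabilities are invented for illustration; a real llm learns distributions over tens of thousands of tokens from its training data.

```python
import random

# Toy illustration of stochastic next-token prediction (not OpenAI code):
# a hand-made table of next-token probabilities stands in for the patterns
# an llm learns from its training data.
next_token_probs = {
    "the": {"20th": 0.5, "most": 0.3, "civil": 0.2},
    "20th": {"century": 1.0},
    "century": {"activist": 0.6, "was": 0.4},
    "activist": {"Martin": 0.7, "Rosa": 0.3},
}

def generate(prompt_token: str, length: int = 4) -> list[str]:
    """Append tokens one by one, each sampled from the learned distribution."""
    sequence = [prompt_token]
    for _ in range(length):
        options = next_token_probs.get(sequence[-1])
        if options is None:  # no learned continuation: stop
            break
        tokens, weights = zip(*options.items())
        # random.choices makes the output non-deterministic: the same prompt
        # can yield different, but statistically patterned, continuations.
        sequence.append(random.choices(tokens, weights=weights)[0])
    return sequence

print(generate("the"))  # e.g. ['the', '20th', 'century', 'activist', 'Martin']
```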

The first step involves the training datasets discussed above. Based on these large collections of “natural” (produced by humans) language, the model trains an encoder, which turns words into numerical representations (embeddings) that capture semantic meaning and relationships between words. Having access to billions of examples, the encoder determines the optimal way of turning a word into an embedding, which can be used to predict the sequence of words in the original training data (Brown et al., 2020).
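The minimal sketch below illustrates, with invented three-dimensional vectors, what it means for embeddings to capture relationships numerically: words used in similar contexts end up with similar vectors, which can be compared with a measure such as cosine similarity. Real embeddings have hundreds or thousands of dimensions and are learned rather than hand-made.

```python
import math

# Toy, hand-made embeddings: purely illustrative stand-ins for the
# high-dimensional vectors a real model learns from billions of examples.
embeddings = {
    "activist":  [0.9, 0.1, 0.3],
    "protester": [0.8, 0.2, 0.35],
    "teapot":    [0.05, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Compare two vectors: 1.0 means identical direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words that occur in similar contexts sit close together in embedding space.
print(cosine_similarity(embeddings["activist"], embeddings["protester"]))  # high
print(cosine_similarity(embeddings["activist"], embeddings["teapot"]))     # low
```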

In the second step (instruction tuning), the model is trained to predict next words in a way that seems more responsive. In simple terms, the llm is fed a large number of questions and correct (satisfying) answers. These question-and-answer (q&a) pairs partly stem from interactions with the ai chatbot – i.e., every time we interact with a bot, we provide training material for future versions. However, this training data is increasingly synthetic in nature, meaning that one ai, trained to judge if answers are satisfying, is ranking the responses of another ai, trained to provide satisfying answers. Note that this is a supervised learning task on a different dataset – humans (and ai models) show the model the desired pattern. Instead of learning only from a large collection of texts, the model learns to adjust its answers based on a large number of q&a pairs.
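A hypothetical instruction-tuning example might look like the sketch below. The field names and the example pair are our own illustration; OpenAI’s actual training schema is not public.

```python
# A hypothetical instruction-tuning example: field names and content are our
# illustration, not OpenAI's actual (undisclosed) training schema.
instruction_pair = {
    "prompt": "Name five activists of the 20th century.",
    "response": "Certainly, here are five prominent activists from the 20th century: ...",
}

# During instruction tuning the model is still only predicting next tokens,
# but now on text of the form "<prompt><response>", which nudges it towards
# answering questions rather than merely continuing arbitrary text.
training_text = instruction_pair["prompt"] + "\n" + instruction_pair["response"]
```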

The final rlhf phase is meant to achieve a similar objective. The model presents humans with a couple of possible answers to a prompt. The human picks the best (most pleasing) answer. The model is rewarded (reinforced) if its best answers match the human’s. This phase is also used to “teach” the model not to provide answers that many users will find offensive. Because texts on the internet can be offensive, the next words that an llm predicts might similarly be objectionable. rlhf helps the llm to “behave”: to predict next words that are pleasing to its users. Because rlhf is relatively labour-intensive – labour that is often, and problematically, outsourced to low-wage countries – instruction-tuned text generators have also increasingly turned to synthetic data in this phase (Wang et al., 2023). In other words, the q&a data is produced by having a model talk to itself, although it is unknown whether ChatGPT also makes use of this kind of artificial data (Liesenfeld et al., 2023).
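A hypothetical rlhf preference comparison might be structured as in the sketch below; again, the field names are ours and the actual format used by OpenAI is not public.

```python
# A hypothetical rlhf preference comparison (field names are ours): a human
# (or, increasingly, another model) ranks two candidate answers to a prompt.
comparison = {
    "prompt": "Name the five most well-known activists of the 20th century.",
    "chosen": "Martin Luther King Jr., Mahatma Gandhi, Nelson Mandela, ...",
    "rejected": "I refuse to answer questions about politics.",
}

# A reward model is trained so that score(chosen) > score(rejected); the llm
# is then fine-tuned to produce answers that this reward model scores highly,
# i.e. answers that are "pleasing" to users.
```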

As a result of these different training objectives, chatbots like ChatGPT are very good at generating texts that are pleasing to most of their users. This can save humans a lot of time on certain tasks, especially when the output can easily be checked, as with computer code. However, it also presents ethical and societal risks. Most prominently, many users expect ai chatbots to have an answer to (almost) everything. However, the next words that these ai chatbots predict are, as already explained, stochastically biased by the shape of their training data. The second and third steps of the training pipeline can be used to mitigate these biases. At the same time, this kind of “model alignment” also introduces new forms of bias.

2.3 User Input

According to recent polls, users of ai chatbots tend to be male, rich, and relatively young (Orth, 2023). The prompts that these users feed the model are critical insofar as they provide the starting point for the prediction of pleasing outputs but also in that they are used to finetune the model. In stochastically rendering these responses, ai chatbots are prone to “hallucinating”. This term is used by OpenAI and others to describe factually inaccurate or simply made-up elements of ChatGPT’s output. For example, if asked to name five books, ChatGPT might return three existing books and two with convincing-sounding titles that have never been written. In general terms, ai chatbots are trained to be plausible and pleasing but not necessarily correct. In fact, ai chatbots cannot differentiate between right and wrong or between fact and fiction: they can only ascertain if an output is pleasing and adheres to a stochastic pattern. Answers to prompts are likely to contain factual information because the data on which ai chatbots were trained contains a lot of facts and users prefer factual answers. At the same time, if users are unable to (fact)check the outputs, they have no way of knowing if they contain – stochastically plausible but possibly highly problematic – counterfactuals. Connected to this, ai chatbots are also unable to indicate on which part of their training data an answer was based, which makes ad hoc (fact)checking difficult.

To mitigate some of the problems described above – the limited temporal scope of llm training data, the hallucinations, and the inability to reference sources – researchers have devised a method called retrieval-augmented generation (rag) (Lewis et al., 2020). This method combines next-token prediction with a document retrieval task (the basic function of a search engine like Google). Before generating an answer to a prompt, the ai chatbot is forced to look at a dataset that contains up-to-date information considered factual – for example, Wikipedia and recent articles from the New York Times. In predicting next tokens, the model must privilege information from this external database over the distributions it learned during the first phase of its training. In addition, it also has to return the document that contains the new information. While rag might mitigate some of the problematic aspects of ai chatbots, it is important to note that these problems might be more fundamental than their creators wish to acknowledge. For example, rag might be used to ground the answers of an ai chatbot in facts, but these can be the facts of Wikipedia, the Flat Earth Society, or the Chinese Communist Party.
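The minimal sketch below illustrates the general rag pattern described above: retrieve documents from an external collection, prepend them to the prompt, and instruct the model to cite its source. The tiny document store and the `generate` stub are toy stand-ins for a real index and a real llm call.

```python
# Minimal retrieval-augmented generation (rag) sketch. The document store and
# the `generate` stub are toy stand-ins for a real index and a real llm call.
DOCUMENTS = [
    {"id": "doc1", "text": "Bayard Rustin (1912-1987) organised the 1963 March on Washington."},
    {"id": "doc2", "text": "Greta Thunberg started the school strike for climate in 2018."},
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Naive keyword retrieval: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(words & set(d["text"].lower().split())))
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned text generator."""
    return f"(model output conditioned on: {prompt[:60]}...)"

def answer_with_sources(question: str) -> str:
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in retrieve(question))
    prompt = (
        "Answer using only the documents below and cite the document id.\n"
        f"{context}\nQuestion: {question}"
    )
    # The model is steered to privilege the retrieved documents over the word
    # distributions learned during pre-training, and to name its source.
    return generate(prompt)

print(answer_with_sources("Who organised the March on Washington?"))
```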

User input can also be thought of in terms of the ways users imagine an ai chatbot to work. People’s perceptions of a technology steer how they use it. That is, people’s technological “imaginaries” are performative; they shape use (Bucher, 2016). One way to think of this relationship is through the notion of affordances. Affordances are the perceived range of possible actions associated with an object or environment (Gibson, 1979; Faraj & Azad, 2012). An affordance is therefore both objective (part of the material properties of an object or environment) and subjective (psychological and social). Affordances, then, “are rooted in a relational ontology which gives equal play to the material as well as the social” (Faraj & Azad, 2012, p. 238). Users, therefore, have certain expectations of technology. “These expectations may not be encoded hard and fast into such tools by design, but they nevertheless become part of the users’ perceptions of what actions are available to them […] Imagined affordances emerge between users’ perceptions, attitudes, and expectations; between the materiality and functionality of technologies; and between the intentions and perceptions of designers” (Nagy & Neff, 2015, p. 5).

The graphical user interface of ChatGPT and other ai chatbots actively steers the “imagined affordances” of users in a certain direction. Most prominently, interactions with instruction-tuned text generators are styled as a conversation between two humans. However, ChatGPT 3.5, the standard version of the ai chatbot, currently only takes the last 4,096 tokens into account, which translates into approximately 3,072 words. These 4,096 tokens work as a sliding window. In other words, while users are presented with the full exchange and can easily scroll back to the starting point of an interaction, the ai chatbot bases its output on the last 4,096 tokens alone. In a similar way, the responses of ai chatbots are also limited in length. Users are steered into believing that they are having an interaction with a human or human-like intelligent agent. In reality, ai chatbots only generate pleasing texts of a certain length based on part of the user’s input. However, the expectations of users – the imagined affordances of ChatGPT – heavily influence the input of users: the prompts they provide to ai chatbots.
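As a rough illustration of such a sliding window, the sketch below keeps only the most recent tokens of a conversation. Splitting on whitespace is a simplification of real sub-word tokenisation, and the 4,096-token limit is the one discussed above.

```python
# Illustration of a sliding context window. Real models count sub-word tokens;
# splitting on whitespace is a simplification, and 4,096 is the limit of the
# ChatGPT 3.5 version discussed above.
CONTEXT_WINDOW = 4096

def visible_context(conversation: list[str], limit: int = CONTEXT_WINDOW) -> list[str]:
    """Return only the most recent tokens the model can condition on."""
    tokens = " ".join(conversation).split()
    return tokens[-limit:]

# The user sees (and can scroll back through) the whole exchange, but anything
# before the last `limit` tokens no longer influences the model's next output.
print(len(visible_context(["a very long exchange ..."] * 2000)))
```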

3 Memory and ChatGPT Beyond Bias: Stochastic Remembering and ai Mnemo-Technics

Many of the common critiques of ai chatbots, which originate primarily from the field of critical ai studies, have already started to filter down into memory studies (Merrill, 2023; Makhortykh, 2024). Memory scholars have also helped prefigure broader political economy and political ecology critiques of ai systems in terms of their negative environmental impact (Reading, 2014; Bender et al., 2021). Overall, however, and as with many of the contributions to this special issue, these critiques have predominantly been turned towards highlighting the various mnemonic risks and challenges that ai systems present. For example, with regard to ai chatbots like ChatGPT, there is growing concern about the historical accuracy of the outputs that are generated (Kansteiner, 2022) and how this accuracy is affected by the different biases introduced by their datasets, training tasks, and fine-tuning.

However, critiquing ai systems only in terms of “bias” is also problematic. Lindgren (2023) writes of a “bias bias”, a disproportionate tendency to focus on and critique bias as something that is reduced to a “glitch” in ai systems that can be fixed. He argues that racist, sexist, and transphobic outputs are not “biases”, in the sense that they are something that “models get slightly wrong, and that can be nudged back on track. These things are deeply rooted in history, social structure, language, and ideology in ways that defy technological fixes” (Lindgren, 2023, p. 157). Approaching bias in this way is not pessimistic but realistic (Lindgren, 2023). Understanding and acknowledging how the output of ai chatbots can reflect deep-rooted social, political, and cultural imbalances in power and hegemonically shape public memory and remembrance cultures is, in other words, important, as we illustrate in our later empirical explorations. Thus, we seek primarily to move beyond debates of bias and accuracy that tend to rely on the epistemologically privileged ideas of history rather than the more fallible notions of memory while pitching humans against ai systems. We do this by promoting a view of remembering as dynamically distributed between these two sets of actors.

Who or what remembers when we ask ChatGPT to produce a text about the past? Can we even call the reproduction and re-presentation of the past by instruction-tuned text generators remembering? The answers to these questions, we argue, lie in recognising that memory relies on a distributed process that is informed and shaped by technological, social, cultural, and linguistic contexts and the moment of remembering (the present). This holds true for both individual and collective forms of remembering. We are not “simply” our individual cognition; we exist in relation to others, and we are always already part of a semiotic and technological system that predates our birth. Seen in this light, neither ChatGPT nor prompting users remember “on their own”. Rather, they do so in relation to each other and the contexts in which they operate.

This contention connects to Halbwachs’s theorisation of the social frameworks of memory (Halbwachs, 1992). Without such frameworks, Halbwachs notes, remembering would be akin to dreams or hallucinations: “one may say that the individual remembers by placing himself in the perspective of the group, but one may also affirm that the memory of the group realizes and manifests itself in individual memories” (Halbwachs, 1992, p. 40). Furthermore, “with the aid of the material traces, rites, texts, and traditions left behind by that past” and “moreover of recent psychological and social data, that is to say, with the present” (p. 119), individual remembering is intrinsically connected to the social world.

What counts as “social”, though, can be extended beyond the human, to include technological and other non-human actors. Latour (2005) argues that “there is no society, no social realm, and no social ties, but there exist translations between mediators that may generate traceable associations” (p. 108, original emphasis). These mediators translate, that is, they transform, distort, or alter the meaning or elements they transport within the network (i.e., they are not neutral). Following this line of thinking, remembering can be argued to be a “social” process wherein connections are made, and continually remade, between people and things that mediate and associate. Consequently, this affects how we might view mnemonic agency. Instead of locating agency in the human alone, or anthropomorphising ai chatbots (assigning them human-like agency), we should regard the agency of humans and machines as contingent, relational, and distributed. Instead of asking “who or what has agency?”, we should ask “who or what acts or matters?” in a network of associations, and indeed “who or what remembers?” This line of thinking aligns with what Makhortykh (2024) calls “human-to-robot memory communication” in which ai systems become active memory actors.

When ChatGPT is used for mnemonic purposes, for example, to create a list of historically important activists (as we do later in this article), we can see how this distributed agency plays out. The prompt and its answer constitute a specific socio-technical assemblage. ChatGPT has a “knowable” past in the form of a database (see above), although it never “understands” this past – it has no “historical consciousness” or “historical imagination”; it is a stochastic rendering of the past. We could say that the possible pasts ChatGPT can generate are based on its database and its, partly human-assisted, training. This data, as we explained, can be human-made or (increasingly) synthetic, but it is made or based on human utterances scraped from the internet. When a user inserts a prompt, the ai chatbot creates a textual mnemonic object that sets “the stage for various acts of remembering” and different engagements with reactions (Jacobsen, 2020, p. 2). In that sense, ChatGPT is simultaneously recombining and re-presenting a probable past and “prompting” a user to remember in a specific way. So, the user and the chatbot prompt each other to remember. We will return to this process later, but what is important to note for now is that we can observe a continuous feedback loop between mnemonic inputs and outputs, and between the human and non-human.

Rather than seeing the human and non-human as strictly separate entities in remembering, we regard the human and non-human as co-constituting remembering. We remember together, with others and with things (see Stiegler, 1998, 2009; Hayles, 2012). Such approaches are gaining salience in some corners of the memory studies field, including with respect to biological and ecological non-human actors, but less so with respect to technological actors, even as the overlaps between these three categories become all the more apparent in terms of, for example, the mineral underpinnings of digital technology and the environmental impacts of ai systems (see Reading, 2014; Crawford, 2021). While this is not a new insight, it reminds us to place ai chatbots in a long history of technologies that have influenced remembering. Critically, however, these technologies have tended to be conceived by memory scholars in a human-centered and instrumentalist manner, “as serving human ends rather than being ends themselves” (Merrill, 2023, p. 180). Notably, whereas in the past the tendency has been to reserve agency for the human and attribute little or none of it to (older) technology, the most recent wave of ai developments and their associated marketing and hype has encouraged the growth of a popular imagination characterised by the near opposite: ai will destroy/save humanity. In short, we are witnessing a reversal of Latour’s (1987) earlier observation of the limited agency attributed by humans to instruments and machines, with ai models frequently attributed extreme or even super-human levels of agency (Smits & Wevers, 2022).

These changing conditions help better foreground the distribution of agency between humans and technology that Stiegler (1998, 2009) has highlighted via the concepts of technogenesis and technics, which convey how humans and technology have always been co-constitutive of one another. These popular imaginaries correctly suggest that something has changed with the most recent wave of increasingly accessible ai technologies. But in characterising that change as involving the wholesale redistribution of agency from human to machine they are mistaken. Instead, what we are witnessing is a shift in the distribution of agency (and indeed perhaps a redefinition of agency) between these two sets of actors. Returning to remembering, Stiegler, in fact, discusses a specific type of memory-related technics called mnemotechnics that has changed with different technological developments over time – including, for example, writing but also, more recently, digital technology, with the latter leading to the industrialisation of memory (see Merrill, 2023; Prey & Smit, 2023). The contemporary spread and popularisation of ai systems like ChatGPT arguably indicate the advance of a yet newer mnemotechnic – an ai mnemotechnic (Merrill, 2023). In short, OpenAI’s and others’ efforts to build instruction-tuned text generators based on publicly accessible human expressions and documents contribute to the next step in the “systematic industrialization of human memory and cognition through digital technologies” (Lemmens, 2011, pp. 33–34).

Moreover, as a mnemotechnology (Lemmens, 2011; Stiegler, 1998, 2009), ai chatbots engage with the past-as-data in specific ways, again stochastically. Thus, in this new assemblage of human and non-human remembering, stochasticity is increasingly brought to bear on predominantly associative modes of human memory. Yet, and as the shift in views about technological agency also foregrounds, it is important to keep in mind that the distributed mnemonic agency behind remembering is never stable. It is not a case of balanced distribution but rather a constantly sliding scale from one moment to the next and from one setting to the next. Thus, instead of seeing ai chatbots as “conversational agents” or entities with independent agency, we challenge the reader to think of them as systems that combine the qualities of a scientific instrument and a medium: a system that has agency in combination with a human and that produces information whose reliability depends largely on trust.

4 Remembering Twentieth Century Activists with ChatGPT

So far, we have argued that ai chatbots and humans remember together. Through a case study – remembering the most and least well-known activists of the twentieth century in different languages – this section explores how the mnemonic agency of humans and (different parts of) ChatGPT is distributed. In addition to different kinds of prompts, we apply two prompt engineering techniques: step-by-step prompting and chain of thought reasoning. These techniques were developed to improve the reliability of ai models. We show that they might also help us understand the influence of training data, training tasks, and user prompts on ChatGPT’s outputs: the top five lists of twentieth-century activists generated by the model. While the model cannot explain itself, prompt engineering techniques can help us to examine how it approaches the past stochastically.

We focus on activists because they can be contentious, both historically and in the present. This contentiousness contrasts with the central tendency of ai chatbots to strive for pleasing and apolitical answers. We start by providing the model with three prompts: “Name five activists of the 20th century” / “Name the five most well-known activists of the 20th century” / “Name the five least well-known activists of the 20th century” (Table 1). We repeated these prompts five times on the same day in order to show to what extent ChatGPT is (in)consistent. The order in which these names appear is the order provided by ChatGPT.
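The repetition of prompts could also be scripted against OpenAI’s api, as in the hedged sketch below (assuming the openai Python package, v1.x, an api key set in the environment, and the gpt-3.5-turbo model); this is offered as an illustration of the procedure rather than the exact set-up we used.

```python
# Hedged sketch: automating the repeated prompts via OpenAI's API. Assumes the
# `openai` Python package (v1.x) and an OPENAI_API_KEY environment variable;
# the model name may need updating.
from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "Name five activists of the 20th century",
    "Name the five most well-known activists of the 20th century",
    "Name the five least well-known activists of the 20th century",
]

for prompt in PROMPTS:
    for run in range(5):  # five repetitions per prompt, as in Table 1
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        print(prompt, run + 1, response.choices[0].message.content, sep="\n")
```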

Table 1: Activists named by ChatGPT 3.5 in response to prompts 1–3 (five repetitions per prompt)

As Table 1 conveys, for prompts 1 and 2, the same six activists are consistently named, indicating the stability of ChatGPT’s responses. Prompt 1 also generated the same answers as prompt 2. If a human were given prompt 1, their associative mode of remembering would allow them the opportunity to name a more diverse array of activists (hypothetically at least). ChatGPT, however, answers stochastically regardless of the prompt. This renders the “most well-known” component of prompt 2 partly obsolete. If given the opportunity, ChatGPT will always name those (as combinations of tokens) most likely to be associated with the words “activists” and “20th century”. Thus, Martin Luther King Jr. is named frequently because his name has greater “weight” in the ai model in relation to the words “activists” and “twentieth century”. Only a certain number of names in the total dataset will have comparable statistical “weight”.

For prompt 3, however, ChatGPT provides a wider variety of names (17 names relating to 16 individuals). This reveals how the model’s response to “activists” and “well-known activist” is predominantly male-orientated and skewed towards American Civil Rights activists, whereas its response to “least well-known” is more female-orientated and indicates a more diverse array of activist causes. Again, these reflect the expected bias of the datasets on which the ai chatbot has been trained, biases that, as mentioned, may have been compounded by the supervised instruction tuning and rlhf carried out by OpenAI. Beyond these biases, however, the diversity of names provided by ChatGPT in response to prompt 3 conveys that there are many names in its database with much less weight (in relation to “activist”) than that of Martin Luther King Jr. In other words, ChatGPT communicates something akin to the so-called “long tail”. The “long tail” is the part of a statistical distribution where numerous instances of whatever is being distributed appear far from the head, or central part, of the distribution. In actuality, there is an even longer tail of activist names excluded from ChatGPT’s response. However, our prompt (we only asked for five), the training data, and the training tasks of the chatbot shape what it returns as “activists”. It is important to acknowledge that ChatGPT is not naming these activists because (it knows) they are less frequently represented in its database. It names them because they are probabilistically most likely to be associated with “least well-known activist of the twentieth century”.

Bayard Rustin (1912–1987) is a good example. A prominent member of social movements for civil rights, socialism, nonviolence, and gay rights, Rustin is often described as forgotten or faded from memory because other leaders were uncomfortable with his homosexuality. In other words, Rustin is partly known for being unknown, which makes him stand out at the head of the long tail. Finally, we might also suggest that even discussing these matters in terms of such things as long tails and stochasticity reflects our human uptake of these ai systems’ modes of remembering. We learn how the model recalls activists’ names and start to remember in different ways ourselves: we are not only prompting the ai chatbot; it is prompting us as well.

Besides the names and short explainers of each activist, ChatGPT also provided different disclaimers and summaries when responding to our prompts. These are likely the result of q&a finetuning and rlhf. As such, these texts highlight additional points where human agency intervened to influence the mnemonic objects of ChatGPT’s outputs, particularly in terms of making them more pleasing. In its responses to prompt 1, ChatGPT’s disclaimers reveal its tendency towards prominent activists even though we did not prompt for this: “Certainly, here are five prominent activists from the 20th century”. In a response to prompt 2, it paradoxically (when considering its response to prompt 1) highlighted the difficulty of selecting just the five most well-known activists: “There were numerous activists who made significant contributions to various social and political causes during the 20th century. While it’s difficult to narrow it down to just five, here are five well-known activists who had a major impact”. This difficulty was echoed in a response to prompt 3:

The recognition and fame of activists can vary widely, and it’s challenging to definitively determine the ‘least well-known’ activists of the 20th century because it depends on various factors like region, field of activism, and historical context. However, here are five activists who may not be as widely recognized as some others from the 20th century.

Taken together, these disclaimers reveal the tensions caused by ChatGPT’s stochastic and pleasing rendering of the past. It may be difficult to narrow the list down to just the five most well-known activists of the twentieth century, but ChatGPT does not actually know that it is difficult. Despite this difficulty, when given the chance again it will generally name the same five activists because of its stochastic mode of “remembering” and because it only “remembers” the last 4,096 tokens of our interactions with it; these tokens are not (yet) being fed back into its datasets. Notably, the results of our prompts above are clearly gendered, with women appearing in the results for prompt 3, whereas men dominate the lists for prompts 1 and 2.

4.1 Step-by-Step

Are there ways to better understand why an ai chatbot picks out some activists but neglects to mention others? In order to mitigate the problem of hallucinations, especially for logical problems, scholars have argued that prompts give better results if a model is asked to gradually reason out a response rather than jumping immediately to the final answer. The idea is that the response of the model to a step-by-step prompt contains information that might help it to provide a better overall answer. For instance, Kojima et al. (2023) showed that by adding “Let’s think step by step” to a prompt they could significantly improve the accuracy of an llm in solving maths problems. We examined whether this kind of “reasoning extraction” would change the results concerning the remembrance of activists. Of course, the reasoning for this kind of question is different from that for a maths problem. Moreover, there are no right or wrong answers: we do not have a definitive list of the most or least well-known activists of the twentieth century.

Simply adding “Let’s think step by step” to our prompts does not result in different outputs. In contrast to a logical problem, this part of the prompt does not make the model clarify its reasoning: what is an activist, for example, and when is an activist considered to be well known? We can explicitly ask the model to describe the two most important elements of our prompt and then add them to a new prompt (see Figure 1 and the sketch below). But again, this does not change the response to our prompts. Most obviously, this failed experiment shows that questions about how activists are remembered are not logical problems. Secondly, and more relevantly, it also shows that humans expect an ai chatbot to apply human-like reasoning to a problem. When confronted with a question about well-known activists, humans would start by setting definitions to determine the boundaries of the question. As our example shows, adding this kind of information does not change ChatGPT’s output. Again, this shows that it does not think like humans but approaches problems stochastically.

Figure 1: An example of step-by-step prompt engineering (ChatGPT 3.5, 23 December 2023)
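A schematic version of this two-stage, step-by-step (“reasoning extraction”) prompting is sketched below. The `generate` function is a placeholder for a call to ChatGPT, and the prompt wording only approximates the exchange shown in Figure 1.

```python
# Sketch of step-by-step ("reasoning extraction") prompting. `generate` is a
# placeholder for a call to ChatGPT; our exact prompts are shown in Figure 1.
def generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned text generator."""
    return "(model output)"

question = "Name the five most well-known activists of the 20th century."

# Stage 1: ask the model to reason about the key terms of the question first.
reasoning = generate(
    question + " Let's think step by step. First, describe what an activist is "
    "and what makes an activist well known."
)

# Stage 2: feed the extracted "reasoning" back in and ask for the final answer.
answer = generate(
    reasoning + "\nTherefore, the five most well-known activists of the "
    "20th century are:"
)
print(answer)
```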

4.2 Chain of Thought

llm s are trained on unlabeled data and can operate in a zero-shot manner: we do not need to show them examples of what we are looking for. To mitigate the problem of hallucinations, Wei et al. (2023) proposed that providing one or more worked question-and-answer examples in a prompt might lead to more accurate answers for maths problems. The idea is that providing the model with a successful shot (a correct answer) will help it to solve a similar problem.

Can we apply this prompt engineering technique to human-ai remembering? Again, there are no right answers here. The question is how humans can assert agency over (steer) the outcomes of the model via a prompt. As Figure 2 shows, giving the model one or more examples enables it to produce more relevant responses. Instead of asking the ai chatbot to list the most well-known activists, we provide it with an example of what we think is a successful activist (Greta Thunberg) and prompt the model to find activists like her. ChatGPT returns a list of young, female, climate activists, who are indeed very similar to Thunberg. As these two forms of prompt engineering show, we cannot influence the outcomes of the model by setting boundaries in the form of definitions or reasoning; we can, however, have the model produce results that are in line with the stochastic pattern we are looking for (see the sketch after Figure 2). We can regard this prompting technique as a “framework” by which or through which the model should “remember”, according to set parameters. Such interactions between ChatGPT (or similar technologies) and a user do require an understanding of how the model works, which most users of the system may lack.

Figure 2: An example of chain of thought prompt engineering (ChatGPT 3.5, 23 December 2023)
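The example-based prompting shown in Figure 2 can be sketched schematically as follows; `generate` is again a placeholder for a call to ChatGPT, and the wording approximates, rather than reproduces, our actual prompt.

```python
# Sketch of example-based ("one-shot") prompting, approximating the exchange
# in Figure 2. `generate` is again a placeholder for a call to ChatGPT.
def generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned text generator."""
    return "(model output)"

prompt = (
    "Q: Name a successful activist.\n"
    "A: Greta Thunberg, who as a teenager started the school strike for climate "
    "and became the face of a global youth movement.\n"
    "Q: Name five activists of the 20th century who are like her.\n"
    "A:"
)

# The example in the prompt supplies the stochastic pattern the model should
# follow: it returns young, female, climate-oriented activists similar to
# Thunberg rather than the usual Civil Rights figures.
print(generate(prompt))
```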

5 Conclusion

In this paper, we have demonstrated that ChatGPT presents a “pleasing” past when used for mnemonic purposes, and why it does so. Reflecting its training data and its stochastic mode of operation, it will most likely represent the past through familiar historical themes and people and mainstream narratives. In this conclusion, we move beyond this critique and return to the question of agency. Instead of seeing ChatGPT and other instruction-tuned text generators as possessing super- or supra-human agency, we propose to regard agency as unequally distributed among multiple human and non-human actors. This, in turn, shapes how the past is represented and who and what has the power to influence this representation. We scrutinise three sets of actors that constitute and “associate” with the socio-technical assemblage called ChatGPT: non-expert users, expert users, and designers. This is by no means meant as an exhaustive analysis of actors and agency but serves as a possible avenue for future research.

Most “everyday”, non-expert users of ChatGPT will still have their own imaginaries regarding the technology. These are often socially informed, for example by mediated discourses and narratives of “ai”. Of course, these imaginaries can vary widely, ranging from those attributing super-human intelligence to a monolithic ai system to more sophisticated imaginaries based on non-technical metaphors. Nevertheless, these imaginaries will shape how such users engage with and perceive the outcomes of the system. An important “actant” (Latour, 2005) in this interaction is the online interface of the ai system. ChatGPT is presented as a chatbot, a friendly and helpful assistant that mimics a conversation, while in fact it “forgets” very quickly and is not context aware. Moreover, these users might not be aware that they are engaging with a “moving target”, a constantly changing ai system that combines different forms of machine learning, operable through a single interface. This is related to the truism that the simpler a technological interface is, the more it obfuscates. This serves a purpose, though: the more users perceive ChatGPT as a chatbot, the better it becomes at mimicking a natural conversation, or in our case, acting as a knowledgeable and legitimate mnemonic actor.

Expert users (i.e., people with technical knowledge of llm s) might understand all of this but do not have the access needed to change the technology. They know, however, how to leverage the system’s capabilities and integrate it into other technologies for their own benefit. Their agency is driven less by imaginaries (although they might very well have them) than by their understanding of the system. Designers (albeit a tiered group with different access levels), in contrast, do have such access and can change the system for specific purposes or to reflect certain values. However, they are, in turn, affected by a group of actors that have a stake in the “success” of the technology: shareholders or investors. In terms of representing the past, designers are served by ChatGPT producing answers that are pleasing to the largest possible group of users; that is, Civil Rights activists “speak to” a majority of users. This leads to the production of hegemonic, pleasing texts about the past. In an ironic feedback loop, the more people like what the system produces, the better it becomes at predicting and providing what most people will like as an answer (a reason why it asks for constant feedback), and so on. Important future research could further explore some of the political dimensions we touched upon, for example by focusing on the gendered aspects of results, on prompting from a particular political stance, or on the focus on specific historical periods when prompting in a specific language.

Acknowledgements

We are grateful to Umeå University’s Centre for Transdisciplinary ai for supporting this research via a micro-project grant.

Rik Smit is an Assistant Professor at the Centre for Media and Journalism Studies at the University of Groningen, the Netherlands. He teaches and does research within the fields of memory studies, digital media, algorithmic culture, and critical ai studies. His research has appeared in a range of journals and books, including New Media & Society, Convergence and Memory Studies.

Thomas Smits is Assistant Professor of Digital History & ai at the University of Amsterdam. His work is centered on modern visual (news) culture and is located at the intersections of digital humanities and social sciences. He both critiques ai and applies it in his research.

Samuel Merrill is Associate Professor at Umeå University’s Department of Sociology and Centre for Digital Social Research (digsum) in Northern Sweden. He specializes in digital and cultural sociology and his research interests concern, among other things, the intersections between memory and digital technology, social media platforms, and ai systems.

Works Cited

  • Barnier, A. J., & Hoskins, A. (2018). Is there memory in the head, in the wild? Memory Studies, 11(4), 386–390. https://doi.org/10.1177/1750698018806440.

  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 acm Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922.

  • Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

  • Bucher, T. (2016). The algorithmic imaginary: Exploring the ordinary affects of Facebook algorithms. Information, Communication & Society, 20(1), 30–44. https://doi.org/10.1080/1369118X.2016.1154086.

  • Cave, S., Dihal, K., & Dillon, S. (2020). Introduction: Imagining ai. In S. Cave, K. Dihal, & S. Dillon (Eds.), ai narratives: A history of imaginative thinking about intelligent machines. Oxford University Press. https://doi.org/10.1093/oso/9780198846666.003.0001.

  • Crawford, K. (2021). Atlas of ai: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.

  • Faraj, S., & Azad, B. (2012). The materiality of technology: An affordance perspective. In P. M. Leonardi, B. A. Nardi, & J. Kallinikos (Eds.), Materiality and organizing: Social interaction in a technological world. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199664054.003.0012.

  • Gibson, J. J. (1979). The ecological approach to visual perception. Houghton Mifflin.

  • Halbwachs, M. (1992). On collective memory. University of Chicago Press.

  • Hayles, N. K. (2012). How we think: Digital media and contemporary technogenesis. University of Chicago Press.

  • International Telecommunication Union. (2021). Measuring digital development: Facts and figures. Retrieved from https://www.itu.int/en/ITU-D/Statistics/Pages/facts/default.aspx.

  • Jacobsen, B. N. (2020). Sculpting digital voids: The politics of forgetting on Facebook. Convergence, 27(2), 357–370. https://doi.org/10.1177/1354856520907390.

  • Kansteiner, W. (2022). Digital doping for historians: Can history, memory, and historical theory be rendered artificially intelligent? History and Theory, 61(4), 119–133. https://doi.org/10.1111/hith.12282.

  • Kemper, J. (2023, April 20). Reddit ends its role as a free ai training data goldmine. The Decoder. Retrieved from https://the-decoder.com/reddit-ends-its-role-as-a-free-ai-training-data-goldmine/.

  • Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2023). Large language models are zero-shot reasoners. Retrieved from https://doi.org/10.48550/arXiv.2205.11916.

  • Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Open University Press.

  • Latour, B. (2005). Reassembling the social: An introduction to actor-network-theory. Oxford University Press.

  • Lemmens, P. (2011). “This system does not produce pleasure anymore”: An interview with Bernard Stiegler. Krisis | Journal for Contemporary Philosophy, 2011(1), 33–41. https://krisis.eu/article/view/39064.

  • Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-T., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481f5-Abstract.html.

  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators. Proceedings of the 5th International Conference on Conversational User Interfaces, 1–6. https://doi.org/10.1145/3571884.3604316.

  • Lindgren, S. (2023). Critical theory of ai. Polity.

  • Makhortykh, M. (2024). Shall the robots remember? Conceptualising the role of non-human agents in digital memory communication. Memory, Mind & Media, 3, Article e7. https://doi.org/10.1017/mem.2024.2.

  • Merrill, S. (2023). Artificial intelligence and social memory: Towards the cyborgian remembrance of an advancing mnemo-technic. In S. Lindgren (Ed.), Handbook of critical studies of artificial intelligence (pp. 173–186). Edward Elgar.

  • Metz, C., Kang, C., Frenkel, S., Thompson, S. A., & Grant, N. (2024, April 6). How tech giants cut corners to harvest data for A.I. The New York Times. Retrieved from https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html.

  • Nagy, P., & Neff, G. (2015). Imagined affordance: Reconstructing a keyword for communication theory. Social Media + Society, 1(2). https://doi.org/10.1177/2056305115603385.

  • Natale, S. (2021). Deceitful media: Artificial intelligence and social life after the Turing test. Oxford University Press.

  • OpenAI. (2023). gpt-4 technical report. Retrieved from https://doi.org/10.48550/arXiv.2303.08774.

  • Orth, T. (2023). What Americans think about ChatGPT and ai-generated text. YouGov. Retrieved from https://today.yougov.com/technology/articles/45128-what-americans-think-about-chatgpt-and-ai-text.

  • Prey, R., & Smit, R. (2023). From personal to personalized memory: Social media as mnemotechnology. In Z. Papacharissi (Ed.), A networked self and birth, life, death (pp. 209–223). Routledge.

  • Reading, A. (2014). Seeing red: A political economy of digital memory. Media, Culture & Society, 36(6), 748–760. https://doi.org/10.1177/0163443714532980.

  • Shur-Ofry, M., & Pessach, G. (2020). Robotic collective memory. Washington University Law Review, 97(3), 975–1005.

  • Smit, R. (2024). When memories become data; or, the platformization of memory. In Q. Wang & A. Hoskins (Eds.), The remaking of memory in the internet age. Oxford University Press.

  • Smits, T., & Wevers, M. (2022). The agency of computer vision models as optical instruments. Visual Communication, 21(2), 329–349. https://doi.org/10.1177/1470357221992097.

  • Stiegler, B. (1998). Technics and time 1: The fault of Epimetheus (R. Beardsworth & G. Collins, Trans.). Stanford University Press.

  • Stiegler, B. (2009). Technics and time 2: Disorientation (S. Barker, Trans.). Stanford University Press.

  • Verdegem, P. (Ed.). (2021). ai for everyone? Critical perspectives. University of Westminster Press. https://doi.org/10.2307/j.ctv26qjjhj.

  • Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2023). Chain-of-thought prompting elicits reasoning in large language models. Retrieved from https://doi.org/10.48550/arXiv.2201.11903.
