Laboratory experiments, often conducted with college students as subjects, have proposed several linguistic phenomena as indicators of speaker deception. We have identified a subset of these phenomena that can be formalized as a linguistic model. The model incorporates three classes of language-based deception cues: (1) linguistic devices used to avoid making a direct statement of fact, for example, hedges; (2) a preference for negative expressions in word choice, syntactic structure, and semantics; and (3) inconsistencies in verb and noun forms, for example, changes in verb tense. The question our research addresses is whether the cues we have adapted from laboratory studies can identify deception in real-world statements by suspects and witnesses.
This paper addresses how to test the accuracy of these linguistic cues in identifying deception. To perform the test, we assembled a corpus of criminal statements, police interrogations, and civil testimony, which we annotated in two distinct ways: first for language-based deception cues, and second for verification of the claims made in the narrative data. The paper discusses the possible methods for building a corpus to test the deception cue hypothesis, the linguistic phenomena associated with deception, and the issues involved in assembling a forensic corpus.
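As an illustration of how the two annotation layers might be compared, the following minimal sketch scores cue annotations against verification annotations using precision and recall. It is not the procedure used in this work: the `Segment` type, its field names, the `cue_accuracy` function, the choice of precision and recall as measures, and the toy data are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated stretch of narrative (hypothetical unit of analysis)."""
    label: str
    has_cue: bool         # annotation layer 1: a deception cue was tagged
    verified_false: bool  # annotation layer 2: the claim was verified as false


def cue_accuracy(segments):
    """Return (precision, recall) of cue tags as predictors of false claims."""
    tp = sum(s.has_cue and s.verified_false for s in segments)
    fp = sum(s.has_cue and not s.verified_false for s in segments)
    fn = sum(not s.has_cue and s.verified_false for s in segments)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented toy data, not drawn from any corpus.
toy = [
    Segment("A", has_cue=True, verified_false=True),
    Segment("B", has_cue=True, verified_false=False),
    Segment("C", has_cue=False, verified_false=True),
    Segment("D", has_cue=False, verified_false=False),
]
precision, recall = cue_accuracy(toy)
```

Keeping the two layers as separate fields on each segment reflects the idea that cue tagging and claim verification are annotated independently and only compared afterwards.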