Recent interest in “domain tuning,” the customization of lexicons to improve results in machine translation and summarization tasks, has driven the need for better training and testing corpora. Traditional methods of automated document identification rely on word-based features to determine the genre, domain, or authorship of a document. However, selecting good training corpora, especially for machine translation systems, requires automated document selection methods that do not rely on these lexically based techniques. Because syntactic structures and syntactic feature densities can heavily affect machine translation quality, syntactic feature-based methods of document selection should be used in choosing training and testing corpora. This paper provides evidence that document genres can be distinguished on the basis of syntactic-tag densities alone, supporting the idea that automated document identification is possible using syntactic rather than lexical features. Such methods would be well suited to creating corpora that are balanced syntactically as well as lexically, across both genre and subject matter.
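
To make the notion of a syntactic-tag density concrete, the following is a minimal sketch, assuming that a document is already available as (token, tag) pairs and that a tag's density is its share of all tokens in the document. The function names `tag_densities` and `density_distance`, the Penn Treebank-style tags, and the toy sentences are illustrative assumptions; the distance measure is one simple way to compare two profiles and is not presented as the method used in this paper.

```python
from collections import Counter


def tag_densities(tagged_tokens):
    """Return each tag's share of all tokens in one document.

    tagged_tokens: iterable of (token, tag) pairs, e.g. tagger output.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty document: no densities to report
    return {tag: n / total for tag, n in counts.items()}


def density_distance(a, b):
    """Sum of absolute density differences over all tags seen in either profile.

    Assumption: an L1 distance is used here only for illustration.
    """
    tags = set(a) | set(b)
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in tags)


# Toy, hand-tagged examples (hypothetical data, not drawn from any corpus).
news = [("The", "DT"), ("court", "NN"), ("ruled", "VBD"), ("today", "NN")]
chat = [("I", "PRP"), ("really", "RB"), ("like", "VBP"), ("it", "PRP")]

news_profile = tag_densities(news)
chat_profile = tag_densities(chat)
print(news_profile)
print(density_distance(news_profile, chat_profile))
```

Because the profiles contain no words, two documents on the same topic can still be far apart, and two documents on different topics can be close, which is the property that makes such features usable independently of the lexicon.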