Integrating Empirical Methods into Computer Science
Empirical Lab Repository
Title: Author Identification via Word Lengths
Author: Dave Reed, Creighton University, davereed@creighton.edu
Possible Courses: CS1
Empirical Concepts Introduced: data analysis
Computer Science Concepts Used: file I/O, arrays, counters, loops, string manipulation
Summary:
This assignment involves analyzing patterns that may appear in an author's works of literature. It has been shown that authors tend to follow consistent patterns in their writing style (e.g., favoring longer, more sophisticated words over short, simple words), and researchers have used these patterns to identify the authors of uncredited works. In this assignment, students will analyze works of literature with respect to word lengths. Each work is read from a file one word at a time, each word is stripped of all non-letters, and a count is maintained for each word length. The absolute and relative frequencies of each word length are then displayed in a table for analysis.
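The processing described in the summary could be sketched roughly as follows. This is only an illustrative Python sketch, not the assignment's reference solution: the function names, the cap of 15 on tracked word lengths, and the sample sentence are all assumptions made for the example.

```python
import io

MAX_LENGTH = 15  # assumed cap; longer words are counted in the last slot


def strip_non_letters(token):
    """Return the token with every non-letter character removed."""
    return "".join(ch for ch in token if ch.isalpha())


def count_word_lengths(stream, max_length=MAX_LENGTH):
    """Read whitespace-separated words and count how many have each length.

    counts[n] holds the number of words of length n (index 0 is unused).
    """
    counts = [0] * (max_length + 1)
    for line in stream:
        for token in line.split():
            word = strip_non_letters(token)
            if word:  # skip tokens that contained no letters at all
                counts[min(len(word), max_length)] += 1
    return counts


def print_table(counts):
    """Display absolute and relative frequencies for each word length."""
    total = sum(counts)
    print(f"{'Length':>6} {'Count':>8} {'Percent':>8}")
    for length in range(1, len(counts)):
        percent = 100.0 * counts[length] / total if total else 0.0
        print(f"{length:>6} {counts[length]:>8} {percent:>7.1f}%")


if __name__ == "__main__":
    # A file opened with open(path) could be passed in place of this sample.
    sample = io.StringIO("It was the best of times, it was the worst of times.")
    print_table(count_word_lengths(sample))
```

Using a list indexed by word length keeps the example close to the arrays-and-counters approach listed among the concepts used; a dictionary would work equally well if no upper bound on length is wanted.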
Variations: To further emphasize the role of analysis, additional questions could be asked of the students. For example, the instructor might provide several works of literature by two different authors, then provide an uncredited work and ask the student to identify (and justify) the author.
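One possible way a student might approach the identification variation is to compare relative word-length frequencies numerically. The sketch below is an assumption, not a method prescribed by the assignment: it uses the sum of absolute differences between frequency lists as a distance, and the function names and the counts in the usage example are hypothetical placeholders rather than data from real works.

```python
def relative_frequencies(counts):
    """Convert absolute counts into fractions of the total word count."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def distance(freqs_a, freqs_b):
    """Sum of absolute differences between two frequency lists (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def closest_author(unknown_counts, known_counts_by_author):
    """Return the author whose word-length profile is nearest the unknown work."""
    unknown = relative_frequencies(unknown_counts)
    return min(
        known_counts_by_author,
        key=lambda author: distance(
            unknown, relative_frequencies(known_counts_by_author[author])
        ),
    )


if __name__ == "__main__":
    # Hypothetical counts for word lengths 0..4, invented for illustration only.
    known = {
        "Author A": [0, 10, 30, 40, 20],
        "Author B": [0, 5, 10, 25, 60],
    }
    unknown = [0, 4, 14, 22, 10]
    print(closest_author(unknown, known))
```

A numeric distance like this is only a starting point for discussion; students would still need to justify the conclusion, for instance by noting how close the competing distances are.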