CSC 221: Introduction to Programming
Fall 2013

HW6: Lists and Data

This assignment must be completed independently (with assistance from the instructor as needed).



PART 1: CodingBat Practice (50%)

CodingBat is a free educational site developed by Nick Parlante at Stanford. It contains short programming problems that can be solved online in either Java or Python. Problems are divided into categories such as String-1 (string manipulation exercises) and Logic-1 (conditional and boolean logic exercises). Each problem gives a short description and asks you to enter your code into a function skeleton. When you click the Go button, your code is automatically compiled, executed, and graded right on the page.

For the first part of this assignment, you must complete:

To receive credit for your work, you must create an account at CodingBat and list me as your instructor (davereed@creighton.edu). That way, I can review your answers and keep track of how many problems you have solved. Note: you must be logged in while working on problems for CodingBat to remember your work.


PART 2: Big Data (50%)

In class, we studied functions that processed Twitter data from very large files. In particular, the tagVolume function took a year as input and displayed volume statistics for the hashtags in that year's tweets. It did so by processing each month in turn: it selected only the lines corresponding to that month and counted the number of hashtags. It displayed the total number of hashtags for each month, followed by the total for the year.
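The month-by-month strategy described above can be sketched in Python as follows. This is not the tagVolume code from class: it assumes, hypothetically, that each line of a hashtags<year>.tsv file has the form YYYY-MM-DD<TAB>hashtag and records exactly one hashtag, and the function names are illustrative only.

```python
# ASSUMPTION (not from the assignment): each line of hashtags<year>.tsv is
# "YYYY-MM-DD<TAB>hashtag", one hashtag occurrence per line.

def line_month(line):
    """Return the month (1-12) taken from the date field of an assumed-format line."""
    date = line.split("\t")[0]
    return int(date[5:7])

def tag_volume_from_lines(lines):
    """Filter the lines for each month, count them, and print monthly and yearly totals."""
    lines = list(lines)  # store the lines so they can be scanned once per month
    monthly = []
    for month in range(1, 13):
        # keep only the lines that belong to this month
        month_lines = [ln for ln in lines if line_month(ln) == month]
        monthly.append(len(month_lines))
        print("Month", month, "hashtags:", len(month_lines))
    print("Year total:", sum(monthly))
    return monthly

def tag_volume_sketch(year):
    """Hypothetical file-reading wrapper; not the tagVolume function from class."""
    with open("hashtags" + str(year) + ".tsv") as infile:
        return tag_volume_from_lines(infile)
```

Reading the file into a list first lets the loop make twelve passes over the same data. A single pass that increments a per-month counter would avoid rescanning the data, at the cost of departing from the filter-then-count structure described above.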

You are to define two functions that similarly process a Twitter data file. The first, countOccurrences, takes two inputs: a year and a tag (a string). It should process the data file for that year and display how many times that tag occurred, both for each month and for the entire year. For example:

The second function, showContains, takes two inputs, a year and a tag, and displays every hashtag from that year that contains the specified tag. Each hashtag should be printed only once, even if it occurs many times. For example:

You will want to download the following data files to test your functions: hashtags2006.tsv, hashtags2007.tsv, hashtags2008.tsv, and hashtags2009.tsv.
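One possible shape for countOccurrences and showContains is sketched below, under a hypothetical line format (YYYY-MM-DD<TAB>hashtag, one hashtag per line) that may not match the actual data files. It is one approach rather than the required solution: the helper names, the exact case-sensitive comparison used for counting, and the substring test used for containment are all assumptions.

```python
# ASSUMPTION (not from the assignment): each line of hashtags<year>.tsv is
# "YYYY-MM-DD<TAB>hashtag", one hashtag occurrence per line.

def parse_line(line):
    """Split an assumed-format line into (month, hashtag)."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[0][5:7]), fields[-1]

def count_occurrences_from_lines(lines, tag):
    """Print and return per-month counts of hashtags exactly equal to tag."""
    monthly = [0] * 12
    for line in lines:
        month, hashtag = parse_line(line)
        if hashtag == tag:  # exact, case-sensitive comparison (an assumption)
            monthly[month - 1] += 1
    for month, count in enumerate(monthly, start=1):
        print("Month", month, "count:", count)
    print("Year total:", sum(monthly))
    return monthly

def show_contains_from_lines(lines, tag):
    """Print each distinct hashtag containing tag, once, in first-seen order."""
    seen = set()   # fast "already printed?" membership test
    found = []     # keeps hashtags in the order they first appear
    for line in lines:
        _, hashtag = parse_line(line)
        if tag in hashtag and hashtag not in seen:
            seen.add(hashtag)
            found.append(hashtag)
            print(hashtag)
    return found

def count_occurrences_sketch(year, tag):
    """Hypothetical file-reading wrapper around count_occurrences_from_lines."""
    with open("hashtags" + str(year) + ".tsv") as infile:
        return count_occurrences_from_lines(infile, tag)

def show_contains_sketch(year, tag):
    """Hypothetical file-reading wrapper around show_contains_from_lines."""
    with open("hashtags" + str(year) + ".tsv") as infile:
        return show_contains_from_lines(infile, tag)
```

For instance, given the lines "2009-02-01\t#obama", "2009-02-14\t#obama", and "2009-11-30\t#obama2012", count_occurrences_from_lines(lines, "#obama") reports 2 for month 2 and 2 for the year, while show_contains_from_lines(lines, "obama") prints #obama and #obama2012 once each. Both functions make a single pass over the lines instead of filtering month by month, which matters when the files are very large.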