Required Skills

Table 1 presents our candidate list of core empirical competencies for practicing computer science. In constructing this table, we first examined typical tasks in which computer scientists apply empirical reasoning and extracted the concepts and skills necessary to carry out those tasks. We also reviewed the contents of numerous Quantitative Reasoning and basic statistics courses, and met with colleagues in both computer science and statistics. We believe this table provides a framework for successfully integrating empirical reasoning into computer science curricula.

Table 1. Core Empirical Competencies for Computer Science
Introductory
  Concepts:
  • mean vs. median
  • informal definition of probability
  • chance error
  • expected values
  • Law of Large Numbers
  • consistency vs. accuracy
  • benefits/limitations of models
  Skills:
  • Be able to conduct a well-defined experiment, summarize the results, and compare with the expected results.
  • Be able to evaluate the persuasiveness of experimental conclusions, focusing on issues such as the clarity of the hypothesis, the sample size, and data consistency.

Intermediate
  Concepts:
  • uniform vs. normal distribution
  • standard deviation
  • sampling large populations
  • methods for obtaining a good sample
  • curve-fitting (e.g., linear regression)
  • data presentation (e.g., tables, scatter-plots, histograms)
  • software testing & debugging strategies
  • validity threats (e.g., confounding variables, non-generalizability)
  Skills:
  • Be able to plot and describe the relation between two variables, such as problem size vs. efficiency when analyzing algorithms.
  • Be able to use a sample to make predictions about a population.
  • Be able to evaluate the persuasiveness of experimental conclusions, focusing on issues such as appropriate statistical measures, relevance of a model, and generality of the conclusions.
  • Be able to design a test suite for a software project and conduct systematic debugging.

Advanced
  Concepts:
  • standard error
  • confidence intervals
  • measures of goodness of fit (e.g., correlation coefficient)
  • observational study vs. controlled experiment
  • correlation vs. causality
  • significance tests (e.g., t-test, z-score)
  Skills:
  • Be able to apply experimental methods to the analysis of complex systems in different domains.
  • Be able to formulate testable hypotheses and design experiments for supporting or refuting those hypotheses.

Table 1 is divided into three levels corresponding to the stages of the undergraduate computer science curriculum. The purpose of this division is twofold. First, it emphasizes an incremental approach to developing empirical knowledge and skills. Generally, the concepts and skills at the introductory level are basic and informal, while those at more advanced levels require a more formal understanding and pair well with upper-division computer science courses. For example, at the introductory level students learn that a larger sample is more likely to produce an accurate estimate of a population value, but they do not learn about confidence intervals for quantifying that accuracy until the advanced level. This organization allows many empirical concepts to be introduced progressively, encouraging students to develop intuition about them before learning more formal or mathematical definitions.
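To make this progression concrete, consider a minimal sketch of an introductory-level exercise (our own illustration, not taken from the paper): simulating rolls of a fair die and comparing sample means against the expected value of 3.5. Students can observe that larger samples tend to land closer to the expected value, building intuition for the Law of Large Numbers well before confidence intervals formalize the idea.

    import random

    def estimate_die_mean(num_rolls):
        """Estimate the expected value of a fair die from num_rolls samples."""
        rolls = [random.randint(1, 6) for _ in range(num_rolls)]
        return sum(rolls) / num_rolls

    random.seed(42)  # fixed seed so the experiment is repeatable
    for n in (10, 100, 10_000):
        # The true expected value is 3.5; larger n should tend to land closer.
        print(f"{n:>6} rolls: sample mean = {estimate_die_mean(n):.3f} (expected 3.5)")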

The second reason for dividing the competencies into levels is to complement the traditional structure of computer science curricula. In most undergraduate programs, introductory courses focus on programming and problem-solving skills. Intermediate courses (e.g., data structures, algorithms, computer organization) emphasize foundational knowledge and skills, introducing more formality in design and analysis. Finally, upper-level courses build on these foundations to study specific areas of computing in depth. Table 1 organizes the empirical concepts and skills to fit these traditional practices in curriculum design. For example, many of the concepts listed at the introductory level could easily be introduced in typical CS1 and CS2 assignments, such as simulating dice rolls or random walks. Basic data presentation using scatter-plots appears at the intermediate level because graphing problem size versus running time is a natural exercise in a data structures or algorithms course, as the sketch below illustrates.
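As a sketch of such an exercise (again our own illustration, assuming only the Python standard library), the following collects problem-size/running-time pairs for Python's built-in sort; the printed pairs are exactly the kind of data students would then present as a scatter-plot and, at the intermediate level, fit with a curve.

    import random
    import time

    def time_sort(n):
        """Measure the wall-clock time to sort a list of n random floats."""
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        data.sort()
        return time.perf_counter() - start

    # Each (problem size, running time) pair becomes one point on a scatter-plot.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"n = {n:>9,}: {time_sort(n):.4f} seconds")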

It is important to note that the divisions in Table 1 are not absolute or inflexible. One instructor might take a more formal approach to algorithms in CS1 and CS2, introducing standard deviation or curve fitting early; conversely, another instructor might defer some topics until later in the curriculum.

A full description of this table, along with illustrative examples, can be found in the paper "Core Empirical Concepts and Skills for Computer Science" by Grant Braught, Craig Miller, and David Reed. The paper appeared in the Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education (SIGCSE Bulletin 36(1), 2004) and is available through the ACM Digital Library. For those without ACM Digital Library access, a draft of the paper is available under the "Related Papers" link above.