| Course Description: | Scientists today face an avalanche of data. Oceanographers generate terabytes of data with daily forecasts of temperature, elevation, and velocity.
Astronomers acquire hundreds of millions of images from increasingly powerful telescopes. Physicists are already discussing petabyte-scale datasets collected from particle accelerators. Biologists have sequenced the human genome, itself a large dataset, and are now describing the complex interactions among all 20,000 to 80,000 protein-encoding genes, not to mention the interactions among the proteins they encode. In all cases, scientists' ability to collect data has outpaced their ability to manage it. Add non-standard data types, extreme performance demands, and ever-changing requirements, and you have one of the major data management challenges of today.
What do these applications have in common, and why are traditional data management tools inadequate? In this course, we will investigate this question from the perspective of modern database research. We will look at what scientific datasets in different domains have in common, and what sets them apart. We will survey the literature in this area, and work with tools used in practice.
The course is open to all computer science students as well as students in other scientific disciplines who face data management challenges in their jobs or research. We prefer that students be familiar with relational databases or, ideally, have completed CS 386: Introduction to Databases. All students should have solid programming experience. Homework assignments will require access to a computer. Students who need to use lab equipment should sign up for a PSU account at the beginning of the course. |