CS 410 Top: Introduction to Computational Biology
| Credit Hours: | 4 |
| Course Coordinator: | Pavel Sumazin |
| Course Description: | The life sciences sector employes computer scientists to tackle the computational core of problems, and to develop efficient tools for researchers.
In this course we study elegant, effcient, and practical algorithms for sequence analysis problems, arguably the life-science area to which the computer science contribution is greatest.
This is an advanced algorithms course that studies the application of theoretical constructs to interesting, complex, and practical problems.
The course will focus on techniques fo rthe design and analysis of data structures and algorithms for sequence-analysis problems, including sequence comparison, sequence matching, longest common substring, longest common subsequence, shortest superstring, motif discovery, phylogeny alignment, sequence based predictions, and heurisitic design. |
| Prerequisites: | CS 350 Algorithms and Complexity, or CS 584 Algorithm Design and Analysis, or instructor's permission.
The class is designed for engineering students and there are no biology prerequisites. Interested non-engineering students are welcome, but should consult with the instructor. |
| Goals: | The investigation of sequence-analysis problems from the life sciences through computational models, and the design of algorithms and heuristics.
An illustration of the impact of technology on the life sciences, an of the practical significance of efficient design.
Students will submit term projects, which in most cases will include implementations of models, algorithms, or heuristics. |
| Textbooks: | Algorithms on Strings, Trees, and Sequences, by Dan Gusfield. |
| References: | Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, ISBN: 0521585198
http://www.math.tau.ac.il/~rshamir/algmb/01/algmb01.html |
| Major Topics: | We study algorithms and heuristics for computational biology problems including shotgun sequencing, sequencing by hybridization, restriction map and physical map construction, regulatory network inference, and RNA and protein structure inference.
The course is dominated by the study string problems and algorithms including: shortest superstring hardness and approximations, approximate string matching, exact string matching, string searching, suffix trees and arrays, lowest common ancestor, approximate string matching and edit distance. |
| Laboratory Exercises: | None. |
| CAC Category Credits |
Core | | Advanced |
| Data Structures |
| 0.5 |
| Algorithms |
| 3.0 |
| Software Design |
| 0.5 |
| Computer Architecture |
| |
| Programming Languages |
| |
| Oral and Written Communications: | Each student is required to submit homework assignments (3 in this
class), and a project.
Projects are submitted both orally and in writing.
Project maybe in the form of a literature survey, algorithm
implementation or original work in computational biology. |
| Social and Ethical Issues: | None. |
| Theoretical Content: | We address computational problems that arise in molecular biology research.
We discuss limited issues in computational complexity and approximation
algorithms to NP-complete problems.
Learning algorithms and combinatorial search methods.
The class is most concerned with string and tree algorithms --
searching, matching and alignment problems. |
| Problem Analysis: | The course is focused on combinatorial analysis. |
| Solution Design: | Implementation based project will involve software design issues.
None are explicitly covered during class. |
|