Instructor |
David Maier, maier at cs dot pdx dot edu, 115-14 FAB. Note: Please put ‘cs510’ at the beginning of the subject line. |
Lab Assistant |
Jeremy Steinhauer jsteinha at cs dot pdx dot edu, 115-H FAB |
Phone: |
Mitchell: 503-725-2412; Maier: 503-725-2406 |
Class Meeting |
Tuesday, Thursday 10:00-11:15 AM, 150 FAB |
Office Hours |
Maier: Mondays 2-3pm; Mitchell: T, Th 4-5pm; Steinhauer: Fridays 1-2pm. You are welcome to ask questions by e-mail or phone. |
Guest Lecturers |
TBA |
Wk |
Date |
Topic |
Reading (will be refined) |
Slides |
Homework Due Tuesdays (at 10 AM) |
1a |
Tues, 30 Mar DM |
Introduction to Information Retrieval; Boolean Retrieval
|
Ch. 1 |
Written Assignment 1 |
|
1b |
Thurs, 1 Apr DM, JS |
Text Processing; Introduction to Software for Project |
Ch. 2 |
Lucene Project, Part I assigned |
|
2a |
Tues, 6 Apr DM |
Indexing |
Ch. 4 Zobel & Moffat |
|
Written Assignment 1
Experimental Project, Parts A and B, assigned |
2b |
Thurs, 8 Apr DM |
Scoring, Weighting, VSM |
Ch. 6
|
||
3a |
Tues, 13 Apr DM, JS |
Catch up, Project info
|
Section 7.1 |
Lecture 5
|
Experimental Project Part A |
3b |
Thurs, 15 Apr MM |
Evaluation |
Ch. 8 |
|
|
4a |
Tues, 20 Apr MM (DM away) |
Probability Review; Text Classification |
Chs. 11, 13 |
Lecture 7a (Evaluation 2); Lecture 7b (Text Classification 1) |
Lucene Project Part I due (10 points)
Written Assignment 2 assigned
Lucene Project, Part II, assigned
|
4b |
Thurs, 22 Apr MM |
Vector-Space Classification |
Ch. 14 |
Lecture 8a (Text Classification 2); Lecture 8b (Vector Space Classification) |
|
5a |
Tues, 27 Apr MM |
Support Vector Machines
|
Ch. 15
|
Support Vector Machines |
Written Assignment 2 |
5b |
Thurs, 29 Apr DM (MM away)
|
Relevance Feedback |
Ch. 9 |
|
|
6a |
Tues, 4 May DM |
Information Extraction, Segmentation, Summarization |
Written Assignment 3 |
||
6b |
Thurs, 6 May MM |
Clustering 1 |
Ch. 16 |
||
7a |
Tues, 11 May MM |
Clustering 2 |
Ch. 17 |
|
Written Assignment 4 due (8 points) |
7b |
Thurs, 13 May MM |
Latent Semantic Indexing |
Ch. 18 |
|
|
8a |
Tues, 18 May Guest: Steven Bedrick, OHSU |
Image Retrieval |
|
Lecture 15 |
Experimental Project Part B; Written Assignment 5 assigned |
8b |
Thurs, 20 May DM |
IR and the Web |
Chs. 19, 20 |
|
|
9a |
Tues, 25 May DM |
IR and the Web |
Written Assignment 5 due (8 points) |
||
9b |
Thurs, 27 May DM |
Making Information Findable |
|
|
|
10a |
Tues, 1 June MM |
Network Structure and Search, Part 1 |
Ch. 21 |
Lucene Project Part II due (20 points) |
|
10b |
Thurs, 3 June MM |
Network Structure and Search, Part 2 |
|
|
|
11 |
8, 10 June |
Final exam week |
|
|
|
The e-mail list for this class is cs510iri@cs.pdx.edu. It will be used for announcements from the instructor. You can also send questions and answers to this mailing list. You can subscribe to the list at https://mailhost.cecs.pdx.edu/mailman/listinfo/cs510iri.
The Internet has seen the most extensive application of information retrieval (IR) techniques to date. At the same time, the Internet has often stressed traditional IR methods to the breaking point. This course introduces classical IR concepts, discusses how they are stressed when applied in the large, distributed and dynamic setting of the Internet, and covers some of the techniques used to get around the limitations. The first half of the course will address standard IR topics: keyword-based retrieval and indexing, classification, clustering, and evaluation metrics for different approaches. The second half of the course will cover several challenges for IR on the Internet, and technologies being developed to address them, such as:
• Collection building: crawling the web, duplicate detection, sampling, digital libraries
• Providing context: annotation, metadata
• Distribution: scalable architectures
• Heterogeneity: scraping, wrapping, translation
• Enterprise and organizational issues: standards, interoperation
The course will also examine particular systems for searching and intermediation on the Internet. This course may be used in the Databases track of the CS MS.
REQUIRED:
Introduction to Information Retrieval. By Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008, ISBN-13: 9780521865715.
You can access the book and related materials at: http://nlp.stanford.edu/IR-book/information-retrieval-book.html
Readings will come both from the textbook and from supplementary materials.
There will be written assignments and project assignments. The written assignments will generally involve some manual exercise, with a short write-up due (possibly with answers to specific questions).
There will also be two projects. The first will involve formulating a hypothesis about the behavior of a search engine (such as Google), devising a way to test the hypothesis, conducting that test and analyzing your results. The second will involve collecting, indexing and searching web content using the Lucene search-engine library.
Students registered for CS 510 (rather than 410) will have an additional section of each assignment to complete.