In this course, we'll study the theory, design, and implementation of text-based and Web-based information retrieval systems, including an examination of web and social media mining algorithms and techniques at the core of modern search and data mining applications. By the end of the semester you will be able to:
All course announcements will be sent to the official course mailing list (at your TAMU account), so you should check your TAMU email often. If you have a specific question for me or the TA, please email us with 670 in the subject line. We will make our best effort to respond promptly, but we only guarantee a response within one week. The class discussion forum is a Google Group: csce670-spring2012.
I expect all students to have had some previous exposure to basic probability, statistics, algorithms, and data structures. You should be able to design and develop large programs and learn new software libraries on your own.
The primary textbook is IIR: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. It is available from Cambridge University Press, Amazon, and other fine booksellers.
We'll also read some selections from:
You may find some of these optional textbooks helpful, though none are required:
It is critically important that you study the relevant course readings before class so that we can make the most of our limited class time together. I treat our class meetings as opportunities to highlight significant aspects of the material, to answer questions, to engage in discussions about particular topics, and so on. We cannot cover all of the material in class, so it is up to you to stay on top of the readings and the assignments.
The course grading policy is as follows: 10% Participation, 30% In-class quizzes, 20% Final exam, 40% Project. The grading scale is A: 90-100, B: 80-89, C: 70-79, D: 60-69, F: 0-59.
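As a worked illustration of the weighting above, the short Python sketch below combines the four component scores into a final percentage and maps it to a letter grade. The function names and the assumption that every component is scored on a 0-100 scale are hypothetical, chosen for illustration only. The syllabus also does not say how a score between the listed ranges (for example, 89.5) is handled; this sketch simply treats the lower bound of each range as the cutoff.

```python
from typing import Dict

# Weights taken from the grading policy; component scores are ASSUMED to be on a 0-100 scale.
WEIGHTS: Dict[str, float] = {
    "participation": 0.10,
    "quizzes": 0.30,
    "final": 0.20,
    "project": 0.40,
}

# ASSUMPTION: a total at or above a cutoff earns that letter.
# The syllabus lists integer ranges only, so fractional totals are handled this way for illustration.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def weighted_total(scores: Dict[str, float]) -> float:
    """Hypothetical helper: combine component scores (0-100) using the course weights."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(total: float) -> str:
    """Hypothetical helper: map a 0-100 total to a letter using the cutoffs above."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"participation": 100, "quizzes": 85, "final": 78, "project": 92}
    total = weighted_total(example)
    print(round(total, 1), letter_grade(total))
```

For the example scores, the weighted total works out to 10 + 25.5 + 15.6 + 36.8 = 87.9, which falls in the B range. Note how the project, at 40%, moves the total more than any other single component.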
Participation (10%). Attendance in class and participation in discussion are both important to your success in the course. As one crude measure of your participation, we will have around 3 to 5 low-stress quick quizzes (less than 5 minutes each) spread across the semester. These quick quizzes will not be graded for correctness; I will use them to gauge which topics need more of our time and as an indicator that you were in class. Additionally, we expect you to participate in online discussions at csce670-spring2012. Over the course of the semester, you should write at least two substantive, interesting posts to the discussion forum and respond to at least six posts made by others. The final day to post to the discussion group for participation credit is April 19. (Of course, you are welcome to continue posting afterwards, but those posts will not count toward your participation grade.)
Quizzes (30%). We'll have three in-class quizzes, each counting for 10% of your final grade. All quizzes are closed book.
Final (20%). The final exam is closed book and will be held on Monday, May 7, from 1 to 3 p.m. You may bring one standard 8.5" by 11" sheet of paper with any notes you deem appropriate or significant (front and back). No calculators, iPads, iPhones, Blackberries, Android phones/tablets, or abacuses are allowed.
Project (40%). For the project, you will work either individually or with a partner on a problem of your choosing that is interesting, significant, and relevant to Information Storage & Retrieval. The ultimate goal of your course project is to develop a new tool to tackle some interesting real-world problem. At the end of the semester, we will hold a two-day workshop during our regular class time.
Regrade Policy: If you believe we have made an error in grading a quiz, you may resubmit it for a regrade. You must include a brief written statement describing which portion was graded in error. Note that we reserve the right to regrade the entire quiz, so we may also find errors that we missed the first time.