CSCE 670 :: Information Storage and Retrieval :: Spring 2013
For the project, you will work in teams of either two or three students on a problem of your choosing that is interesting, significant, and relevant to Information Storage & Retrieval. You will have great latitude in what you choose to work on, so take advantage of this opportunity to make a big impact!
The primary requirements of the project are:
- Your project code must live on GitHub. We prefer it to be public. However, if you're too scared to share your code with future employers, you can claim a private GitHub account (with a .edu email address).
- Your project must use some non-trivial data that your team collects. You may sample social media data (e.g., from one of the APIs listed over at Programmable Web), download an existing collection (e.g., Wikipedia, IMDB), write a simple crawler to collect pre-organized data (e.g., CIA docs, The Simpsons), or write your own custom web crawler.
- Your project must implement at least one core algorithm that is presented in class or is closely related to the course topics (e.g., Hubs and Authorities, hierarchical clustering, collaborative filtering, learning to rank). You are welcome to reuse your homework code, but we expect the project to implement at least one *new* algorithm (not covered in the homeworks).
Here are some sample projects (from the undergrad class).
The course project counts for 20% of your final grade. You will receive an overall rating based on the performance of your entire team, as well as an individual rating based on feedback from your teammates and your participation in the final project presentation. Typically, the individual rating can raise or lower your project grade by a small amount (for example, moving a group rating of 85/100 up or down by 5 points). In rare cases, a group member's project score may be lowered significantly if that member made only a superficial contribution to the project.
Recall that late days apply to homework assignments only. All project milestones are due on their respective due dates. No late project milestones will be accepted.
Project proposal (April 1) [1 to 2 pages (PDF); Post on Google Group]
Each group should post a 1-2 page project proposal in PDF to the course discussion forum by April 1 at 11:59pm. Be sure to start a new thread for your proposal and name the thread "Proposal: [project_name]", where [project_name] is a brief, descriptive name for your project. Choose a project name that is memorable!
In the proposal, you should address the following issues (adapted from C. Zhai):
- What exactly is the function of your tool? That is, what will it do?
- Why would we need such a tool and who would you expect to use it and benefit from it?
- Do similar tools already exist? If so, how is your tool different from them? Would people care about the difference? How hard is it to build such a tool? What is the challenge?
- How do you plan to build it? You should mention the data you will use and the core algorithm that you will implement.
- What existing resources can you use?
- How will you demonstrate the usefulness of your tool?
29 and 30, we will hold the 670 project workshop during our regular class time. Each team will give a very brief project overview (a 30-second elevator pitch) and a demo. The format will be a sequence of elevator pitches followed by an open demo session. All students are required to participate, both by giving their own team's demo and by evaluating other teams' demos.
Project report (May 5)
You should write your report as if you were writing a short conference paper. Address the same questions as in your proposal, but in more detail, especially regarding the challenges you had to solve and your experimental results, if any. You should also include your conclusions from the study and point out how your work could be extended (i.e., future work).
There is no strict length requirement, but any reasonable report should probably be around 3-4 pages.