CSCE 489 :: Introduction to Data Science :: Spring 2013

MWF 10:20-11:10am in HECC 103 ETB 1035
Instructor: James Caverlee, HRBB 403
Office Hours: Wednesday 3-5pm, or by appointment
Department of Computer Science and Engineering
Texas A&M University


Course Summary

Introduction to the theoretical foundations, algorithms, and methods of deriving valuable insights from data. Includes foundations in managing and analyzing data at scale (e.g., big data); data mining techniques and algorithms; exploratory data analysis; statistical methods and models; and data visualization.

Researchers across disciplines are excited by the prospect of "data-driven science" as a complement to traditional hypothesis-driven research. As evidence of this excitement, the White House in 2012 announced the first "Big Data Research and Development Initiative" spanning NSF, DoD, NIH, DARPA, DoE, and USGS. Companies like Google, Facebook, LinkedIn, Amazon, and Walmart are already investing in large-scale data analytics to extract information from massive datasets. As a first course in "data science", this course is designed to prepare students with the practical skills and theoretical foundations that span computer science, data engineering, statistics, visualization, and experimental design.

Learning outcomes:


We're going to use Google Groups for all course communication, so you should check our Google Group often. If you've got a homework question, post to the group. If you've found a cool link you want to share, post to the group! If you're looking for a study partner, post to the group!! Basically, the Google Group should be your best, first choice for all class-related concerns. I will monitor the group and provide feedback. But everyone is encouraged to contribute.


Prerequisites: CSCE 315 or approval of instructor.


Course readings will be drawn from the following texts:

Optional readings:


The course grading policy is as follows: 5% Class participation, 15% Practicum, 40% Homework, 20% Final exam, 20% Project. The grading scale is A: 90-100, B: 80-89, C: 70-79, D: 60-69, F: 0-59.
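As a rough illustration of how these weights combine, here is a minimal sketch in Python. The names `WEIGHTS`, `SCALE`, `final_score`, and `letter_grade` are hypothetical, as are the sample scores. It assumes every component is scored on a 0-100 scale and that no rounding is applied before the letter cutoffs are checked; the policy above does not say how fractional scores such as 89.5 are handled.

```python
# Weights from the grading policy, as fractions of the final grade.
WEIGHTS = {
    "participation": 0.05,
    "practicum": 0.15,
    "homework": 0.40,
    "final_exam": 0.20,
    "project": 0.20,
}

# Lower bound of each letter grade, highest first.
# Assumption: no rounding, so 89.5 falls below the A cutoff.
SCALE = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def final_score(scores):
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter using the cutoffs in SCALE."""
    for cutoff, letter in SCALE:
        if score >= cutoff:
            return letter
    return "F"


# Made-up component scores, for illustration only.
example = {
    "participation": 100,
    "practicum": 90,
    "homework": 85,
    "final_exam": 80,
    "project": 95,
}
total = final_score(example)
print(total, letter_grade(total))
```

With these made-up scores the weighted total is 5 + 13.5 + 34 + 16 + 19 = 87.5, which falls in the B range.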

Class participation (5%). Attendance in class and participation in the discussion are both important to your success in the course. As one crude measure of your participation, we will have around 4 to 6 low-stress quick quizzes (less than 5 minutes each) spread across the semester. These quick quizzes will not be graded for correctness. I will use them to gauge which topics we need to devote more time to and as an indicator that you were in class. You are also required to participate in the Google Group.

Practicum (15%). Each week, a team of two students will lead an in-depth hands-on portion of class.

Homework assignments (40%). We will have 4 programming assignments over the course of the semester, each worth 10% of your final grade.

Final (20%). The final will be a take-home, do-on-your-own-time exam. Rather than spend 2 hours together during our regular exam time (May 7th, 8:00am to 10:00am), we are going to make the final doable on your own time, but still due by the end of the exam time. To clarify, I will release the final on Thursday, May 2 by posting it to the Google Group. It will be due on Tuesday, May 7 by 10am (that's the end of our regularly scheduled exam time). You should email me a file or link to your report directly. As you work on the exam, you may access any resources you like (books, notes, the Web, etc.) except for other people. As in, you cannot seek help from me, from your classmates, or from friends on Facebook, nor can you tweet out for help from the masses. Make sense? The total expected time for the final is about 90 minutes to 2 hours. My intent is not for you to spend 24 hours creating a perfect report, and indeed I would expect there to be little to be gained by spending an inordinate amount of time on the final.

Project (20%). For the course project, you will work in teams of up to two students on a problem of your choosing that is interesting, significant, and relevant to Data Science. At the end of the semester, we will hold a two-day Data Science Workshop during our regular class time. Each team will deliver an in-class project presentation and a brief 3-4 page executive summary.

Americans with Disabilities Act (ADA) Policy Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact the Department of Student Life, Services for Students with Disabilities, in Cain Hall or call 845-1637.

Academic Integrity Statements

AGGIE HONOR CODE: "An Aggie does not lie, cheat, or steal or tolerate those who do." Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System. For additional information please visit: