CSCE 489 :: Data Science and Analytics :: Fall 2016

Tuesday/Thursday 3:55-5:10pm in HRBB 113
Instructor: James Caverlee, HRBB 403
Office Hours: Tuesday 3pm until class begins, Wednesday 4-5pm
Department of Computer Science and Engineering
Texas A&M University

TA: Shanshan Li
Office Hours: Monday and Wednesday, 2-3pm in 408D


Final Projects

Each team created a project website (see below) that includes a link to a project video (aimed at a general audience) and a Jupyter notebook. Enjoy!

Course Summary

Introduction to the theoretical foundations, algorithms, and methods of deriving valuable insights from data. Includes foundations in managing and analyzing data at scale (e.g., big data); data mining techniques and algorithms; exploratory data analysis; statistical methods and models; and data visualization.

Researchers across disciplines are excited by the prospect of "data-driven science" as a complement to traditional hypothesis-driven research. As evidence of this excitement, the White House in 2012 announced the first "Big Data Research and Development Initiative" spanning NSF, DoD, NIH, DARPA, DoE, and USGS. Companies like Google, Facebook, LinkedIn, Amazon, and Walmart are already investing in large-scale data analytics to extract information from massive datasets. As a first course in data science, this course is designed to prepare students with the practical skills and theoretical foundations that span computer science, data engineering, statistics, visualization, and experimental design.

Learning outcomes:


We're going to use Piazza for all course communication. If you've got a homework question, post to Piazza. If you've found a cool link you want to share, post to Piazza! If you're looking for a study partner, post to Piazza! Basically, Piazza should be your best, first choice for all class-related concerns. I will monitor and provide feedback, but everyone is encouraged to contribute.


Prerequisite: CSCE 315 or approval of instructor.


Course readings will be drawn from a variety of online textbooks, scholarly papers, and other resources. Refer to the course schedule for details.


The course grading policy is as follows: 5% Class participation, 15% Data Science spotlight, 35% Homework, 20% Quizzes, 25% Project. The grading scale is A: 90-100, B: 80-89, C: 70-79, D: 60-69, F: 0-59.
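As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function names and the sample scores are hypothetical, each component is assumed to be scored on a 0-100 scale, and the handling of fractional scores (no rounding, with cutoffs at 90, 80, 70, and 60) is an assumption, since the scale above lists only whole-number ranges.

```python
# Component weights in percent, taken from the grading policy above.
WEIGHTS = {
    "participation": 5,
    "spotlight": 15,
    "homework": 35,
    "quizzes": 20,
    "project": 25,
}


def final_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a final score to a letter (assumption: no rounding before the cutoffs)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Hypothetical example scores, for illustration only.
example = {
    "participation": 100,
    "spotlight": 90,
    "homework": 85,
    "quizzes": 80,
    "project": 92,
}
total = final_score(example)
print(total, letter_grade(total))
```

With these made-up scores, the homework component contributes the most to the total because it carries the largest weight.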

Class participation (5%). Attendance in class and participation in the discussion are both important to your success in the course. You are expected to come to class, to ask questions and engage with the material, and to be an active participant on Piazza.

Data Science Spotlight (15%). You will be responsible for one data science spotlight over the course of the semester. A spotlight is an opportunity to share a compelling aspect of Data Science -- be it a neat feature or library you want to share via a Jupyter notebook, a discussion and brief exploration of a particular dataset, an in-depth look at a "data science in the news" story, etc. All spotlights will take place during class. You will have 5 to 6 minutes in total to present.

Homework assignments (35%). We will have several programming-based homework assignments over the course of the semester. All will be in Python using Jupyter notebooks.

All homework assignments must be submitted by 11:59pm Central time on the due date. For the homework assignments, you may talk to any other class member or work in groups to discuss the problems in a general way. However, your actual detailed solution must be yours alone. If you do talk to other students, you must write on your assignment who it is that you discussed the problems with. Your submitted work must be written solely by you and not contain work directly copied from others.

Homework Collaboration Clarification: To clarify, your homework is yours alone and you are expected to complete each homework independently. Your solution should be written by you without the direct aid or help of anyone else. However, we believe that collaboration and teamwork are important for facilitating learning, so we encourage you to discuss problems and general problem approaches (but not actual solutions) with your classmates. If you do discuss a homework problem with another student, you must inform us by writing a note on your homework submission (e.g., "Bob pointed me to the relevant section for problem 3"). The basic rule is that no student should explicitly share a solution with another student (and thereby circumvent the basic learning process), but it is okay to share general approaches, directions, and so on. If you feel like you have an issue that needs clarification, feel free to contact either me or the TA.

Homework Plagiarism Policy: We will use the Stanford Moss system to check homework submissions for plagiarism. Students found to have engaged in plagiarism will be punished severely, typically earning an automatic F in the course and being reported to the Aggie Honor System.

Homework Late Days: For the homework assignments, you have a total of 5 late days that you can use during the semester. However, a single assignment can be submitted at most 3 days late, so that we can post solutions in a timely fashion. For the purposes of the class, a late day is an indivisible 24-hour unit: any part of a 24-hour period past the deadline counts as a full late day. Once you exhaust your 5 late days, we will not accept any late submissions.
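The late-day arithmetic can be sketched in Python as follows. This is an illustration under stated assumptions, not an official calculator: the function names and example dates are hypothetical, both timestamps are assumed to be in Central time, and a submission that would need more late days than you have left is assumed to be rejected.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 5      # semester-wide budget, from the policy above
MAX_PER_ASSIGNMENT = 3   # cap for any single assignment, from the policy above


def late_days_needed(due, submitted):
    """Late days consumed: each started 24-hour period counts as a full day."""
    if submitted <= due:
        return 0
    return math.ceil((submitted - due) / timedelta(days=1))


def can_submit(due, submitted, days_used):
    """Assumption: accept only if both the per-assignment cap and the budget hold."""
    needed = late_days_needed(due, submitted)
    return needed <= MAX_PER_ASSIGNMENT and days_used + needed <= TOTAL_LATE_DAYS


# Hypothetical due date at 11:59pm; the dates are for illustration only.
due = datetime(2016, 10, 4, 23, 59)
slightly_late = datetime(2016, 10, 5, 0, 30)   # 31 minutes late
three_days_late = datetime(2016, 10, 7, 23, 59)  # exactly 72 hours late

print(late_days_needed(due, slightly_late))
print(can_submit(due, three_days_late, days_used=2))
print(can_submit(due, three_days_late, days_used=3))
```

Under these assumptions, even a submission a few minutes past the deadline uses one full late day.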

Quizzes (20%). We'll have around 10 fairly quick quizzes over the course of the semester. These will take 5-10 minutes at most and will mainly check that you are keeping up with the readings. Expect 3 or 4 short questions per quiz in a true/false, multiple-choice, or fill-in-the-blank style. Quizzes are closed book, closed notes. If you are keeping up with the readings and participating in class, then I would not expect these quizzes to require special extra study time. Note that these quizzes will cover that day's reading material, so stay on top of the readings ahead of class time!

Project (25%). For the project, you will work in teams of three to four on your own data science problem. At the end of the semester, you will deliver a Jupyter notebook, project website, and two-minute video summarizing your work. We will hold a Project Showcase during the final exam time on December 13th from 1-3pm.

Americans with Disabilities Act (ADA) Policy Statement

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact the Department of Student Life, Services for Students with Disabilities, in Cain Hall or call 845-1637.

Academic Integrity Statements

AGGIE HONOR CODE: "An Aggie does not lie, cheat, or steal or tolerate those who do." Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System. For additional information please visit: