Each team created a project website (see below) that includes a link to a project video (aimed at a general audience) and a Jupyter notebook. Enjoy!
Introduction to the theoretical foundations, algorithms, and methods of deriving valuable insights from data. Includes foundations in managing and analyzing data at scale (e.g., big data); data mining techniques and algorithms; exploratory data analysis; statistical methods and models; and data visualization.
Researchers across disciplines are excited by the prospect of "data-driven science" as a complement to traditional hypothesis-driven research. As evidence of this excitement, the White House in 2012 announced the first "Big Data Research and Development Initiative" spanning NSF, DoD, NIH, DARPA, DoE, and USGS. Companies like Google, Facebook, LinkedIn, Amazon, and Walmart are already investing in large-scale data analytics to extract information from massive datasets. As a first course in data science, this course is designed to prepare students with the practical skills and theoretical foundations that span computer science, data engineering, statistics, visualization, and experimental design.
We're going to use Piazza for all course communication. If you've got a homework question, post to Piazza. If you've found a cool link you want to share, post to Piazza! If you're looking for a study partner, post to Piazza! Basically, Piazza should be your first and best choice for all class-related concerns. I will monitor and provide feedback, but everyone is encouraged to contribute.
Prerequisite: CSCE 315 or approval of instructor.
Course readings will be drawn from a variety of online textbooks, scholarly papers, and other resources. Refer to the course schedule for details.
The course grading policy is as follows: 5% Class participation, 15% Data Science spotlight, 35% Homework, 20% Quizzes, 25% Project. The grading scale is A: 90-100, B: 80-89, C: 70-79, D: 60-69, F: 0-59.
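As a rough illustration of how the weights and scale above combine, here is a minimal Python sketch. It is not an official grade calculator: the names `WEIGHTS`, `course_score`, and `letter_grade` are made up for this example, each component is assumed to be scored from 0 to 100, and because the policy does not say how fractional scores such as 89.5 are handled, the sketch simply assumes cutoffs at 90, 80, 70, and 60 with no rounding.

```python
# Component weights from the grading policy, as fractions of the final score.
WEIGHTS = {
    "participation": 0.05,
    "spotlight": 0.15,
    "homework": 0.35,
    "quizzes": 0.20,
    "project": 0.25,
}


def course_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter grade.

    Assumption: cutoffs at 90/80/70/60 with no rounding; the policy
    itself does not specify how fractional scores are treated.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {
    "participation": 100,
    "spotlight": 90,
    "homework": 85,
    "quizzes": 80,
    "project": 95,
}
print(letter_grade(course_score(example)))
```

Because homework carries the largest weight (35%), a change in your homework average moves the final score more than the same change in any other component.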
Class participation (5%). Attendance in class and participation in the discussion are both important to your success in the course. You are expected to come to class, to ask questions and engage with the material, and to be an active participant on Piazza.
Data Science Spotlight (15%). You will be responsible for one data science spotlight over the course of the semester. A spotlight is an opportunity to share a compelling aspect of Data Science -- be it a neat feature or library you want to share via a Jupyter notebook, a discussion and brief exploration of a particular dataset, an in-depth look at a "data science in the news" story, etc. All spotlights will take place during class. You will have 5 to 6 minutes in total to present.
Homework assignments (35%). We will have several programming-based homework assignments over the course of the semester. All will be in Python using Jupyter notebooks.
All homework assignments must be submitted by 11:59pm Central time on the due date. For the homework assignments, you may talk to any other class member or work in groups to discuss the problems in a general way. However, your actual detailed solution must be yours alone. If you do talk to other students, you must note on your assignment whom you discussed the problems with. Your submitted work must be written solely by you and must not contain work directly copied from others.
Homework Collaboration Clarification: To clarify, your homework is yours alone and you are expected to complete each homework independently. Your solution should be written by you without the direct aid or help of anyone else. However, we believe that collaboration and teamwork are important for facilitating learning, so we encourage you to discuss problems and general problem approaches (but not actual solutions) with your classmates. If you do have a chat with another student about a homework problem, you must inform us by writing a note on your homework submission (e.g., "Bob pointed me to the relevant section for problem 3"). The basic rule is that no student should explicitly share a solution with another student (and thereby circumvent the basic learning process), but it is okay to share general approaches, directions, and so on. If you have an issue that needs clarification, feel free to contact either me or the TA.
Homework Plagiarism Policy: We will use the Stanford Moss system to check homework submissions for plagiarism. Students found to have engaged in plagiarism will be punished severely, typically earning an automatic F in the course and being reported to the Aggie Honor System.
Homework Late Days: For the homework assignments, you have a total of 5 late days that you can use during the semester. However, a single assignment can be submitted at most 3 days late, so that we can post solutions in a timely fashion. For the purposes of the class, a late day is an indivisible 24-hour unit. Once you exhaust your 5 late days, we will not accept any late submissions.
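To make the late-day arithmetic concrete, here is a small Python sketch of one reading of the policy. The function names and the dates are hypothetical, and the sketch assumes that "indivisible" means any part of a 24-hour period past the deadline uses a full late day. If you are unsure how a borderline case would be counted, ask on Piazza rather than rely on this.

```python
import math
from datetime import datetime

TOTAL_LATE_DAYS = 5               # semester-wide budget
MAX_LATE_DAYS_PER_ASSIGNMENT = 3  # cap for any single assignment
SECONDS_PER_DAY = 24 * 60 * 60


def late_days_needed(due, submitted):
    """Whole late days used by a submission.

    Assumption: any fraction of a 24-hour period after the deadline
    counts as one full late day.
    """
    if submitted <= due:
        return 0
    seconds_late = (submitted - due).total_seconds()
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def can_accept(due, submitted, late_days_used):
    """True if the submission fits both the per-assignment cap and the budget."""
    needed = late_days_needed(due, submitted)
    if needed > MAX_LATE_DAYS_PER_ASSIGNMENT:
        return False
    return late_days_used + needed <= TOTAL_LATE_DAYS


# Hypothetical deadline and submission time, for illustration only.
due = datetime(2030, 10, 1, 23, 59)
submitted = datetime(2030, 10, 2, 0, 30)  # 31 minutes past the deadline
print(late_days_needed(due, submitted), can_accept(due, submitted, late_days_used=4))
```

Under this reading, a submission that is only minutes late still uses a whole late day, and a student who has already used 4 late days can submit a later assignment at most 1 day late.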
Quizzes (20%). We'll have around 10 fairly quick quizzes over the course of the semester. These will take 5-10 minutes at most and will mainly check that you are keeping up with the readings. Expect 3 or 4 short questions per quiz in a true/false, multiple-choice, or fill-in-the-blank style. Quizzes are closed book, closed notes. If you are keeping up with the readings and participating in class, then I would not expect these quizzes to require special extra study time. Note that each quiz will cover that day's reading material, so stay on top of the readings ahead of class time!
Project (25%). For the project, you will work in teams of three to four on your own data science problem. At the end of the semester, you will deliver a Jupyter notebook, project website, and two-minute video summarizing your work. We will hold a Project Showcase during the final exam time on December 13th from 1-3pm.