Essential Tools for Data Scientists | Spring 2020

Welcome to Data Science Society at Berkeley’s very own DeCal: Essential Tools for Data Scientists! This course exposes students to essential data science skills that are in demand in industry, and it is meant to be taken as a follow-up to or alongside Data 8. The course covers the parts of data science and machine learning that aren’t traditionally taught in the classroom, such as advanced Pandas, Seaborn, and visualization dashboards. These topics will challenge you, sharpen your skills, and elevate you in the internship game. You will learn from the ground up, starting with an introduction to Python, moving on to software like Excel and Tableau, and building other essential skills through a personalized data science project that includes data cleaning, visualization, statistical analysis, and machine learning.

 

Logistics

  • Time: Mondays, 6:30 - 8:30 PM
  • Location: Moffitt 101
  • Email: decal@dss.berkeley.edu
  • Office Hours: Calendar
  • Piazza: INFO 98

    Prerequisites

    There are no formal prerequisites for this course. Some basic programming experience in Python is recommended but not required. We want you to learn as much as possible and will help you get up to speed quickly!

     

    Texts

    There is no textbook. Content is created by instructors and TAs.

     

    Grading

  • Attendance: 20% (2 Drops)
  • Labs: 20% (2 Drops)
  • Final Project: 50%
  • Reading Quizzes: 10% (2 Drops)

    Late Submission Policy

    10% of the total assignment grade will be deducted for every day the project is turned in late. No late lab submissions are allowed.
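To see how the grading weights and the late policy combine, here is a minimal Python sketch. It rests on several assumptions that the syllabus does not spell out: all scores are on a 0-100 scale, a "drop" removes the lowest scores in a category, the late deduction is 10 points per day on that scale, and a late project cannot score below zero. The function and variable names are hypothetical and exist only for illustration.

```python
# Category weights and drop counts as listed in the Grading section.
WEIGHTS = {"attendance": 0.20, "labs": 0.20, "project": 0.50, "quizzes": 0.10}
DROPS = {"attendance": 2, "labs": 2, "quizzes": 2}


def mean_after_drops(scores, drops):
    """Average the scores after removing the `drops` lowest ones (assumed meaning of a drop)."""
    kept = sorted(scores)[drops:]
    if not kept:
        raise ValueError("need more scores than drops")
    return sum(kept) / len(kept)


def late_adjusted(score, days_late, points_per_day=10.0):
    """Deduct 10 points (10% of a 100-point assignment) per late day, floored at 0 (assumption)."""
    return max(0.0, score - points_per_day * days_late)


def course_grade(attendance, labs, quizzes, project_score, project_days_late=0):
    """Combine the four categories into a single 0-100 course grade."""
    parts = {
        "attendance": mean_after_drops(attendance, DROPS["attendance"]),
        "labs": mean_after_drops(labs, DROPS["labs"]),
        "quizzes": mean_after_drops(quizzes, DROPS["quizzes"]),
        "project": late_adjusted(project_score, project_days_late),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


if __name__ == "__main__":
    grade = course_grade(
        attendance=[100] * 8 + [0, 0],   # two missed lectures are dropped
        labs=[90, 80, 100, 0, 50],       # the two lowest labs are dropped
        quizzes=[100, 100, 60, 70],      # the two lowest quizzes are dropped
        project_score=95,
        project_days_late=1,             # one day late costs 10 points
    )
    print(f"Estimated course grade: {grade:.1f}")
```

Labs are left out of the late adjustment because late lab submissions are not accepted at all.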

     

    Weekly Attendance Form

     

    Deadlines

  • Quizzes are released on bCourses on Friday and are due the following Monday before lecture at 6:30 PM.
  • Labs are usually released on Tuesday on DataCamp or through Piazza and are due Sunday midnight unless explicitly stated otherwise.
  • The attendance form (linked above) will be released before lecture on Mondays (6:30 PM) and will be due before the next lecture on the following Monday.

    Final Project

    The first module of the final project will be due on April 10th at 11:59 PM. There is no submission for the checkpoint, but we expect that you will have cleaned your data set and checked in with a TA through either office hours or email. The attached file can be found below. Please download the file as a Python notebook and import it into Datahub so that you can work on the project in a Jupyter notebook. To import the file into Datahub, go to datahub.berkeley.edu, click "Upload" in the top right corner, and upload the file. Once you are done, please re-download your completed file as a PDF (File --> Download as PDF), then submit it to us (instructions for how to submit the module are at the end of the Jupyter notebook).

    Who's my TA?
    iPython Notebook

     

     

    Schedule

    Week | Date | Lecture | Resources | Assignments | Lecturers
    1 | Monday, 2/10/20 | Welcome, Logistics & Python Bootcamp | Slides | None | All
    2 | Monday, 2/14/20 | Data Manipulation and Wrangling: Pandas Part 1 | Slides | DataCamp (Chapter 1), Quiz | Marta, Luke
    3 | Monday, 3/2/20 | Data Manipulation and Wrangling: Pandas Part 2 | Slides | DataCamp (Chapter 2), Quiz | Avik
    4 | Monday, 3/9/20 | SQL | Slides | DataCamp (Chapter 1) | Naman, Kanu
    5 | Monday, 3/16/20 | Statistical Models: NumPy for Linear and Logistic Regression | Slides, YouTube | None | Nikhil, Elton, Jae
    6 | Monday, 3/30/20 | Data Visualization and Exploratory Data Analysis | Slides, YouTube | DataCamp (Optional) (Chapter 1), Quiz | Dhruv, Uma
    7 | Monday, 4/6/20 | Spreadsheets | Slides, YouTube | DataCamp (Optional) (Chapters 1 and 2) | Avik, Gayatri, Varun, Kate, Naman
    8 | Monday, 4/13/20 | Tableau | Slides, YouTube | DataCamp (Optional) (Chapter 1) | Jae, Naman, Kanu, Kate, Avik, Elton, Varun, Gayatri
    9 | Monday, 4/20/20 | Exploring Seaborn In-Depth | Notebook, YouTube | DataCamp (Optional) (Chapter 1) | Kanu
    10 | Monday, 4/27/20 | Special Topic Guest Lecture | YouTube | | Dhruv

    DeCal Course Staff

    Facilitators


    Avik Sethia
    aviksethia99@berkeley.edu

    Varun Mittal
    varunmittal@berkeley.edu

    Gayatri Babel
    gbabel@berkeley.edu

    Teaching Assistants


    Kanu Grover
    Hi! My name is Kanu Grover and I’m a freshman studying CS and data science! My hobbies are playing table tennis way too much, acting (stand up), and doing really spontaneous things with my friends for the heck of it!

    Jae Hee Koh
    Hi, I'm Jae Hee, a first-year data science major from Korea and Singapore. I love solving puzzles, traveling, and listening to music. Feel free to reach out to me about anything!

    Naman Patel
    Hi! I am a sophomore data science and economics double major from Fremont, CA. I enjoy playing and watching all sports as well as listening to music. Feel free to email me with any questions or comments about Cal, data science, news, etc!

    Uma Krishnaswamy
    Hey! I am a sophomore data science and economics double major from San Francisco. Outside of class, you can catch me playing volleyball, doing a crossword puzzle on the glade, and making matcha lattes at home!

    Dhruv Krishnaswamy
    Hey! I'm a sophomore economics and data science double major with a theater minor. You can always find me on Moffitt’s 4th floor sipping an iced mocha. I’m super excited to meet all of you!

    Elton Chan
    Hi!!! I’m a freshman data science and economics double major. I enjoy basketball, swimming and anime. I love puzzles and card games. Feel free to email me questions or just to chat!

    Luke Liu
    Hey! I am a freshman CS major and I am a proud Canadian from Toronto, Ontario. Outside of DSS, you will find me teaching CSM sections, playing basketball/ping pong at RSF, playing card games or watching YouTube/Netflix. I am open to all food recommendations!

    Kate Miller
    Yo yo yo, I'm a junior MCB major and aspiring graduate student currently curing cancer. Ask me about proteins, or pandas I guess.

    Nikhil Cukkemane
    Hello everyone, my name is Nikhil Cukkemane, and I am a junior from New Jersey studying economics and data science. I love to play basketball, so catch me at RSF. I am also very interested in sports analytics and research in the economics field. Please reach out to me if you are interested in any of these areas.

    Marta Carrizo Vaque
    Hi! I am a senior double majoring in math and physics from Spain. I love fantasy books, skiing, playing the piano and trying any restaurant around Berkeley that I haven't tried. Feel free to reach out and ask me any questions.