Why learn data science?

It is pretty clear that almost every field today needs data-savvy researchers. It has become something as basic as driving a car in our intellectual work
– Saul Perlmutter (2011 Nobel Laureate in Physics)

Data science is a rapidly growing field and is becoming an integral part of nearly all fields of study, including the Earth and environmental sciences. In the past few years, programs specifically focused on data analytics for the Earth and environmental sciences have been created or expanded at many prominent institutions (Stanford, Berkeley, U. Chicago, and UC Boulder, to name a few).

Demand in the academic, public, and private sectors for environmental scientists with skills in data analytics has been growing, and job and research opportunities are strong. The articles below provide a nice overview of the rising importance of data science in the environmental fields.

 

Interactive figures are easy to make

Programming in R

RStudio IDE

We will write and run our code within the RStudio Integrated Development Environment (IDE). RStudio allows us to write, execute (run), and debug code, and to view output and plots, all within a single integrated environment.

Getting help in R

There are lots of resources for getting help in R. In addition to me and your classmates, there is a massive amount of help available freely online (a quick Google search typically yields an answer to just about any R question). There are also many websites devoted to teaching R.

Visit Union College’s very own Center for Data Analytics! It is located in Wold 010 and hosts a data science help desk, staffed by expert students and faculty who can help answer your questions. The Center for Data Analytics also hosts regular training workshops and guest speakers.

You can also get help directly in RStudio by typing ?term_of_interest_here in the R Console or by searching in the Help bar on the right-hand side of your RStudio window.
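As a minimal sketch of what this looks like in practice, the commands below can be typed into the R Console. The base R function mean is used purely as an illustration; substitute whatever function or topic you are curious about.

```r
# Open the help page for a function whose name you already know.
?mean
help("mean")            # does the same thing as ?mean

# Search all installed help pages when you do not know the exact name.
??median
help.search("median")   # does the same thing as ??median

# Run the worked examples found at the bottom of a help page.
example(mean)

# Show the arguments a function accepts without opening the full help page.
args(mean)
```

In RStudio, the help pages opened by ? and help() appear in the Help tab rather than in the Console.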

Your textbooks, R for Data Science and ModernDive, are also amazing resources and are freely available online in a nice searchable format.

R Cheatsheets are also amazing resources that succinctly summarize many different aspects of R. I will hand these out throughout the term, though you can find them freely available here.

Syllabus and course logistics

  • Course logistics
    • You will find all course-related materials (e.g. schedule/syllabus, assignments, notes, course policies,…) at our class webpage.
    • Assignments will be submitted through Nexus. You MUST use the specified naming convention for the files you submit.
    • Attendance and on-time arrival are critical, and your grade will be affected by unexcused absences and late arrivals! You will quickly fall behind if you miss class or arrive late.
    • We will use computers during class and lab; however, you should ONLY use your computer for class-related activities. This means that you should close unrelated material prior to the start of class and that you should not check email, news, Instagram, Twitter,… during class.
    • Carefully read the syllabus. It has important information about grading, course policies, and tips for success in the class.
    • In each class we will work through code examples in R. I will post my R Notebooks (which act as lecture notes/slides) to the course website. DO NOT simply copy and paste the code from these notebooks. You will not learn the material by doing this. You should type out the code, think carefully about what it means, run the code, and examine the output before moving on to the next step.
    • Get a binder to hold class notes/handouts. This binder will act as a course-specific textbook.
    • You need to be an active learner in this class. Learning programming and data analysis cannot be done passively.
    • You should recognize that there will be challenges and bumps as you are learning how to code. You are not alone in this (everyone, even experienced programmers, hits roadblocks when they work). There are TONS of resources out there to help you. A quick Google search for whatever problem you’ve hit, or for something new that you’d like to see an example of, will almost always yield plenty of information directly related to your question.