## rOpenSci Educators Collaborative: What Are The Challenges When Teaching Science With R?

[blog] unconf18 projects 1: mchtoolbox, pkginspector, dataspice, rOpenSciEd, rOpenInterviews https://t.co/9vdwm4o6hh #runconf18 #rstats pic.twitter.com/u2RHJnl3uJ

— rOpenSci (@rOpenSci) June 5, 2018

This first post summarizes the main challenges that educators face, as a tool to help them think through the decisions they make about their course materials. The second post explains what makes a good educational resource that can address these shared challenges. The final post sketches what educators can do in the future to create and share teaching materials and, even more important, to foster a community of practice around teaching science with R.

The primary challenge instructors face is how to keep the focus on their subject matter and not on R. In most courses that use R, some subject like ecology, statistics, or history is the main discipline to be taught, and teaching R is only a means to that end. In other words, we are not teaching courses about R programming or programming in general, as one might in computer science. Rather, we are teaching R so that we can teach specific skills, such as modeling, manipulating, or visualizing data. And we teach those skills so that we can teach students how to think as ecologists, psychologists, statisticians, or historians. Yet much of the day-to-day work of teaching and learning is necessarily focused on R, since in these domains R is not just a programming language but actually a language of thought and expression within one’s discipline. Educators therefore must balance teaching R with teaching their subject.

This challenge leads into the fundamental question of course design: scoping and sequencing. What things should be taught, and in what order? In a course that teaches a discipline with R, questions of scoping necessarily involve which parts of R should be taught. Will the course be primarily focused on the Tidyverse, or will it mostly use base R? If the Tidyverse is taught, at what point are base R conventions taught so that students have access to the whole R ecosystem? Much of this decision depends on the characteristic data structures for the field of study. For example, can tabular data structures (i.e., data frames) be used? Or must students also be taught how to deal with matrices, lists, or some other data structure less amenable to the Tidyverse?

In R-based courses, much of students’ time is spent getting up to speed with R: setting up a development environment, learning the syntax, understanding basic computing concepts (file paths are a perpetual problem), and inevitably dealing with bugs in their code. New students can be easily discouraged by seemingly impenetrable error messages, which can derail their progress in learning the scientific concepts that the course is actually intended to teach. Instructors have to decide how best to spend their limited time in and out of class dealing with such issues. A particular challenge for educators is developing materials and techniques for helping students get past the noise of technical problems to the actual signal of the subject at hand.

Educators must also think about assessment: evaluating student work in progress during the semester, evaluating students’ overall ability to apply the course material by the end of the semester, and evaluating the overall effectiveness of the course so that it can be revised for future iterations. Evaluating code can be difficult, and often requires substantial infrastructure to make sure that instructors can run and comment on students’ code.

Finally, educators must find a way to get students from basic skills (e.g., data manipulation and visualization) to conceptual problems (e.g., statistical modeling, social scientific thinking). Even more, educators must find a way to get students from the controlled environment of the classroom to independent application of the material. In the classroom, students are often given defined data, told which methods to use, and given required outcomes. The format of many tutorials inclines students toward following steps rather than understanding concepts. These classroom recipes may fall short in real life, where students have to use real (and often messy) datasets, choose the correct methods to analyze the data, and define their own outcomes. The ultimate aim of our courses, and so the most pressing problem of our pedagogy, is teaching students how to flexibly and independently apply the concepts taught in our courses.

In the next post we will present some examples of open educational materials that are successful at addressing these common challenges, and distill the desirable characteristics of those materials we find particularly useful and inspirational for teaching science with R.