rOpenSci | Forcing Yourself to Make Your Life Easier

Forcing Yourself to Make Your Life Easier

🔗 The general struggle

Something that will make life easier in the long-run can be the most difficult thing to do today. For coders, prioritising the long term may involve an overhaul of current practice and the learning of a new skill. This can be painful for a number of reasons:

  1. We have to admit to ourselves that we’ve been doing things inefficiently (i.e. wasting time). This makes us feel stupid and fosters a sense of missed opportunity: we could’ve done something cool with the time we’d have saved (e.g. vacation).
  2. We’re fond of our existing methods, probably because we’re used to them and they’ve served us pretty well thus far.
  3. The learning we have to do may seem beneath us: “I’m an expert in R, I’ve even written my own package. Surely ‘Version Control for Beginners’ wasn’t intended for the likes of me.” This kind of thought permits us to dismiss good ideas.
  4. We’re tired. Overhauling workflows and learning new skills takes energy and concentration. Today, we’ll go through the motions just like we did yesterday. And hey, no one complained about the work we did yesterday.
  5. We feel like the task is actually beyond us. This is almost never true (so long as we’re good at asking for help).
  6. In the context of a work environment, when we’re concentrated on learning, our apparent output drops (often to zero) until the learning period ends. A bad manager/supervisor who doesn’t appreciate the worth of taking the time to learn how to do things better may not forgive this short-term drop in apparent output.

Something that will make life easier in the long-run can be the most difficult thing to do today.

🔗 One specific struggle: the ImageJ TIFF problem

I’m doing a PhD in image analysis, working with a lot of microscopy images of cells, all of which are in the TIFF format. ImageJ is a nice software for image viewing and processing. R is a nice software for image processing but is not as good as ImageJ for viewing and playing around. For me, as an R enthusiast, it was ideal to do my image processing in R and my viewing in ImageJ.

On CRAN, the tiff and magick packages can both read TIFF files, and on Bioconductor, there’s EBImage. However, all of these packages sometimes struggle with TIFF files written from ImageJ in that they wrongly perceive some images to have only one channel when in fact they have many (channels encode colour information: colour images have 3 channels - red, green and blue - whereas black and white or greyscale images have one channel - grey). Once the images are read into R, I am able to rejig them (with a combination of aperm() and abind()) into the format I want. However, the mistakes that the packages make vary, and thus the images require different rejigs. Every time I wanted to read an image into R, the process was read, check, rejig, check. This is a lot longer than just read. Nonetheless, this lengthy reading process still wasn’t that long in absolute terms: I could read in an image and have it in the format I wanted in about a minute. I processed thousands of images over the following two years (with various image analysis techniques). For each one, I went through read, check, rejig, check . . .

In 2016, I attended Bioconductor’s CSAMA event in Italy. Jenny Bryan was there advocating the tidyverse, happygitwithr and various other good ideas. At that time, I wasn’t much into the R scene, so I’d never heard of Jenny nor of many of the things she mentioned, but I was very taken by her teaching (she grabbed my attention with the tip that Alt+- in RStudio gets you the assignment operator <-). The advice that resonated most was that NOW is the time to do that thing that you know you ought to do eventually. We all know that this is good advice, but that doesn’t mean we don’t need to be told it frequently.

Jenny advised that every regular R user should write a package containing the functions that they always use (rather than source()ing them from some random place atop every .R file). So I learned git, made a GitHub account and wrote my first R package. I decided to start easy, gathering together all of the functions that I use to rename and organise my files. This became (the terribly named) filesstrings (now on CRAN). The most useful thing it can do is to extract numbers from strings with functions like first_number(). filesstrings used Rcpp, so I got comfortable writing R packages that use C++. Having completed filesstrings, I looked more seriously into writing a package that solved my ImageJ TIFF issue. It was clear that I would require the libtiff C library, and thus would need to use R’s C interface, so I looked for documentation of it. The section in Hadley Wickham’s Advanced R and his GitHub repo r-internals are both good resources, but I still feel that this facet of R is under-documented. As a result, I couldn’t really get to grips with R’s C interface (at least not within a week as I’d hoped), so I gave up and returned to read, check, rejig, check.

🔗 Eventually

In the end, I did write ijtiff—the package I had been longing for—using libtiff and R’s C interface. What happened that pushed me over the edge to learn everything I needed to know about C and R’s interface to it and to toil away for six weeks with clang errors and RStudio crashes and valgrind in Linux virtual machines? Nothing in particular. I was doing yet another read, check, rejig, check and I had had enough. I decided to take another crack at ijtiff. This time, with the help of what I’d learned during all of my previous failed attempts, I made it over the line.

So of course I was able to do it all along, it just took longer than I thought (it’s just reading a TIFF file into an array, right?). Most things take longer than we think, so that shouldn’t necessarily discourage us so immediately. Of course I should’ve done it right at the start of my PhD (rather than at the start of my 4th and final year), but for a combination of the silly reasons outlined above, I hesitated. I hesitated for three years. Anyway, it’s done now and I’m very happy with it. My feeling of delight that it’s now done easily trumps the feeling that I wasted time by waiting so long to do it. My R life is more efficient and less frustrating now. To check out ijtiff for yourself, see the CRAN page, the GitHub or the vignette.

With the help of what I’d learned during all of my previous failed attempts, I made it over the line.

🔗 Immediate future

As mentioned previously, there is a lack of accessible information on R’s C interface. The demand on a few experts like Hadley Wickham, Kevin Ushey, Jim Hester etc. to provide resources for the rest of us is huge. Although these people are very generous and well-appreciated, they only have a finite amount of time. As such, I want to contribute to the documentation on R’s C interface.

🔗 Conclusions

  1. Find and heed all of Jenny Bryan’s advice.
  2. If there’s something that will make your life easier in the long-run but seems like a bit of a hurdle right now, do it. If necessary, take the time to explain to others around you why it’s a good idea and do it. If it takes longer than you thought to learn how to do it, don’t conclude from that that you can’t do it, just conclude that it’s not very easy and get all the help you can find.
  3. If it’s a lack of skills that prevents you from doing what you wish you could do, take the time to learn those skills. They’ll make your life easier again and again in future.
  4. If you figure out how to do something that you wish was made easier in documentation, please—for the likes of me—contribute that new knowledge to existing documentation.

🔗 Thanks

  • Thanks to rOpenSci for their very helpful review of the ijtiff package. This type of review where the reviewers actively help you as well as objectively evaluating your work is a revelation. Getting the advice and help of people like Jon Clayden and Jeroen Ooms was invaluable. Thanks also to editor Scott Chamberlain and to community manager Stefanie Butland who is helping me with this blog.
  • Thanks to Jenny Bryan for great advice.
  • Thanks to Hadley Wickham, Kevin Ushey, Jim Hester and others for making R usable for the rest of us.
  • Thanks to my supervisor Sergi Padilla-Parra and the other members of his group for having great ideas, believing in my ideas and making my PhD fun and stress free.
  • Thanks to my lovely fiancée Naomi for her continued support throughout my PhD and for editing my lacklustre writing.