rOpenSci | Better Code, Without Any Effort, Without Even AI

Better Code, Without Any Effort, Without Even AI

We are experiencing a programming revolution, driven by the democratization of artificial intelligence but also by the creation and refinement of more traditional software tools for better code: local, free, deterministic.

In this post, we will introduce you to:

  • 📦 lintr, by Michael Chirico and many others, an R package that detects many opportunities to improve your code;
  • 💻$ Air, by Lionel Henry and Davis Vaughan, a fast CLI (command-line interface) for formatting R code automatically and almost instantly;
  • 💻$ jarl, by Etienne Bacher, another fast CLI to find lints and automatically fix them;
  • 📦 flir, by Etienne Bacher, an R package to efficiently rewrite patterns of code, either built-in ones or custom ones.

With these four wonderful tools, you can effortlessly improve your code, your colleagues’ code… and even code proposed by AI. With a bit more effort, you might even internalize best practices and write better code from the get-go!

🔗 An example script

Let’s start with a script containing a few problems… Can you spot them?

lleno <-!any(is.na(x))
ok<- !(x[1] == y[1])
if (ok) z<- x +  1
if (z>3) stop("ouch")

🔗 The R console vs the terminal

Note that in this post, some tools are used in the R console and others in the terminal, which you might also know as the command line or the shell.

🔗 Learn what to improve with the {lintr} R package 📦

A first instinct might be to run lintr on the script. Its lint() function performs static analysis and highlights potential problems in your R code, including formatting and programming suggestions. Here, all_linters() enables every available rule rather than only the default ones.

lintr::lint("test.R", linters = lintr::all_linters())
#> test.R:1:7: style: [infix_spaces_linter] Put spaces around all infix operators.
#> lleno <-!any(is.na(x))
#>       ^~
#> test.R:1:10: warning: [any_is_na_linter] anyNA(x) is better than any(is.na(x)).
#> lleno <-!any(is.na(x))
#>          ^~~~~~~~~~~~~
#> test.R:2:3: style: [infix_spaces_linter] Put spaces around all infix operators.
#> ok<- !(x[1] == y[1])
#>   ^~
#> test.R:2:6: warning: [comparison_negation_linter] Use x != y, not !(x == y).
#> ok<- !(x[1] == y[1])
#>      ^~~~~~~~~~~~~~~
#> test.R:2:11: style: [implicit_integer_linter] Use 1L or 1.0 to avoid implicit integers.
#> ok<- !(x[1] == y[1])
#>          ~^
#> test.R:2:19: style: [implicit_integer_linter] Use 1L or 1.0 to avoid implicit integers.
#> ok<- !(x[1] == y[1])
#>                  ~^
#> test.R:3:10: style: [infix_spaces_linter] Put spaces around all infix operators.
#> if (ok) z<- x +  1
#>          ^~
#> test.R:3:19: style: [implicit_integer_linter] Use 1L or 1.0 to avoid implicit integers.
#> if (ok) z<- x +  1
#>                  ~^
#> test.R:4:6: style: [infix_spaces_linter] Put spaces around all infix operators.
#> if (z>3) stop("ouch")
#>      ^
#> test.R:4:8: style: [implicit_integer_linter] Use 3L or 3.0 to avoid implicit integers.
#> if (z>3) stop("ouch")
#>       ~^
#> test.R:4:10: warning: [condition_call_linter] Use stop(., call. = FALSE) not to display the call in an error message.
#> if (z>3) stop("ouch")
#>          ^~~~~~~~~~~~

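We therefore get alerts about:

  • styling: spaces around infix operators and implicit integers, for instance;
  • performance: anyNA(x) is better than any(is.na(x));
  • readability and best practices: x != y rather than !(x == y), and call. = FALSE in stop().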
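None of these suggestions changes what the code computes. A quick base-R check, using made-up values for x and y since the example script never defines them:

```r
# Hypothetical values: the example script leaves x and y undefined
x <- c(2, NA, 5)
y <- c(2, 3, 4)

# any_is_na_linter: anyNA(x) gives the same answer as any(is.na(x)),
# without first allocating the full logical vector is.na(x)
same_na <- identical(anyNA(x), any(is.na(x)))

# comparison_negation_linter: x[1] != y[1] is equivalent to !(x[1] == y[1])
same_neq <- identical(x[1] != y[1], !(x[1] == y[1]))

# implicit_integer_linter: a bare 1 is a double, while 1L is an integer
types <- c(bare = typeof(1), suffixed = typeof(1L))

same_na   # TRUE
same_neq  # TRUE
types     # "double" "integer"
```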

Since lintr has been around for a long time, it has an impressive collection of rules, the “linters”. Even reading their documentation can teach you a lot, especially as the list grows over time!
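When trying out a single rule, it can be handy to lint a snippet rather than a whole file. A minimal sketch, assuming lintr 3.x: lint() treats any string containing a newline as inline code rather than as a file name, and as.data.frame() turns the resulting lints into a table that is easy to filter or tally.

```r
# lint() treats a string containing "\n" as inline code, not a file path
lints <- lintr::lint(
  "lleno <- !any(is.na(x))\n",
  linters = lintr::any_is_na_linter()
)

# Convert the lints to a data frame, e.g. to filter or count them by linter
lints_df <- as.data.frame(lints)
lints_df[, c("line_number", "column_number", "linter", "message")]
```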

Now, based on these alerts, how could we improve the code?

🔗 Format with Air 💻$

Air is software which automatically formats your R code according to a set of rules.

In the terminal:

air format test.R

Air formats the file in place, and test.R now reads:

lleno <- !any(is.na(x))
ok <- !(x[1] == y[1])
if (ok) {
  z <- x + 1
}
if (z > 3) {
  stop("ouch")
}

Now, the spacing in the code is regular! Furthermore, each if statement now spans three lines, with its body wrapped in braces, instead of a single line. Overall, the code is easier to read because it follows popular conventions.

Note that lintr and Air might give conflicting advice on styling: if you use Air, you can deactivate lintr’s styling-related rules.
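One way to do so, sketched here under the assumption that you use lintr 3.1.0 or later (where linters_with_tags() accepts an exclude_tags argument), is to keep every linter except those tagged "style". This is a blunt instrument: the "style" tag also covers rules that Air does not enforce, so removing individual linters from all_linters() may suit you better.

```r
# Every non-deprecated lintr linter except those tagged "style",
# leaving formatting decisions to Air
no_style <- lintr::linters_with_tags(
  tags = NULL,
  exclude_tags = c("style", "deprecated")
)

# Alternatively, start from all linters and drop specific ones by setting them to NULL
no_spacing <- lintr::all_linters(infix_spaces_linter = NULL)

# Either list can then be passed on, e.g. lintr::lint("test.R", linters = no_style)
"infix_spaces_linter" %in% names(no_style)
"any_is_na_linter" %in% names(no_style)
```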

🔗 Improve with the new jarl CLI! 💻$

Like lintr, the jarl CLI identifies potential problems in your code but, unlike lintr, it can also fix them automatically!

In the terminal:

jarl check test.R --fix

After this command, test.R reads:

lleno <- !anyNA(x)
ok <- !(x[1] == y[1])
if (ok) {
  z <- x + 1
}
if (z > 3) {
  stop("ouch")
}

any(is.na(x)) was automatically replaced with anyNA(x)!

The jarl CLI checks and fixes lints as fast as Air styles code. Furthermore, because it is a standalone binary that does not need R to run, it is quicker to install on continuous integration than an R package, which requires R itself to be installed first.

However, since jarl is newer than lintr, it supports fewer rules for now.

🔗 Improve with the {flir} R package 📦

You can complement lintr, Air and jarl with flir, which is better suited to custom rules. For instance, what if you’d prefer your codebase to use rlang::abort() instead of stop()?

We first run

flir::setup_flir(getwd())

We save the file below under flir/rules/custom/stop_abort.yml.

id: stop_abort-1
language: r
severity: warning
rule:
  pattern: stop($$$ELEMS)
fix: rlang::abort(paste0(~~ELEMS~~))
message: Use `rlang::abort()` instead of `stop()`.
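Because the rule is written as a code pattern, stop($$$ELEMS), where $$$ELEMS captures any number of arguments, it is matched against the structure of the code rather than its raw text; flir relies on ast-grep for this. As a rough illustration of the idea only, not of how flir works internally, base R's getParseData() can tell a real stop() call apart from the same characters inside a string:

```r
code <- 'if (z > 3) {
  stop("ouch")
}
message("please stop() here")'

# A naive text search also matches "stop(" inside the string on line 4
lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
text_hits <- which(grepl("stop(", lines, fixed = TRUE))

# The parse data has one row per token; names of called functions
# are tokens of type SYMBOL_FUNCTION_CALL, while strings are STR_CONST
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
calls <- pd[pd$token == "SYMBOL_FUNCTION_CALL" & pd$text == "stop", ]

text_hits                            # lines 2 and 4
calls[, c("line1", "col1", "text")]  # only the real call, line 2, column 3
```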

We then run

flir::fix("test.R", linters = "flir/rules/custom/stop_abort.yml")
#>  Going to check 1 file.
#>  Fixed 1 lint in 1 file.
lleno <- !anyNA(x)
ok <- !(x[1] == y[1])
if (ok) {
  z <- x + 1
}
if (z > 3) {
  rlang::abort(paste0("ouch"))
}

The call to stop() was automatically replaced. We might then want to remove the now useless paste0() by hand, but we are already closer to an ideal script.
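The paste0() is in the fix because stop() pastes all of its arguments together into one message, whereas rlang::abort() takes the message as its first argument and treats its second positional argument as the condition class. A base-R check of the stop() side, with a made-up error message:

```r
# stop() concatenates all of its arguments into a single message
# (hypothetical message, for illustration)
msg_stop <- tryCatch(
  stop("Value ", 3, " is too big"),
  error = conditionMessage
)

# paste0() builds the same string, which the flir fix then passes
# to rlang::abort() as a single message argument
msg_paste <- paste0("Value ", 3, " is too big")

identical(msg_stop, msg_paste)  # TRUE

# With a single string, paste0() changes nothing, so it can be removed by hand
identical(paste0("ouch"), "ouch")  # TRUE
```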

🔗 Integrating these tools into your workflow

Locally, you can use these tools as needed. For instance, when inheriting an older project, the first thing I do is renovate it by applying these tools. A real game changer is integrating them with your IDE: I have Positron set up so that Air runs on my scripts when I save them, and the jarl CLI also provides IDE integrations.

You can also use these tools on continuous integration. For instance, a useful workflow might be to suggest formatting changes on pull requests. Suggesting changes rather than committing them directly gives the contributor a chance to learn about the improvements.

Another aspect to consider is whether you want flir and jarl to make the changes or only to alert you about them. The right choice depends on the context: for example, you might learn more by making the changes yourself. In any case, reviewing modifications carefully before committing them is important!

🔗 What about artificial intelligence?

Artificial intelligence can be useful in coding applications but…

  • The best LLMs are not local;

  • They cost money and are slower than the tools presented here;

  • They are not deterministic, so you don’t necessarily get the same result every time you use one of them;

  • Their usage may entail some ideological and ethical problems which can be concerning or unattractive.

Ultimately, for such alerts and fixes, you just don’t need to use an LLM… Air, flir, jarl, and lintr already do an excellent job, are faster, and are free (not to mention FOSS)!

🔗 Conclusion

You can improve your code without effort, without even AI, using:

  • {lintr}, to signal “bad” patterns, including customizable ones;

  • Air, to efficiently reformat code;

  • jarl, to detect and fix “bad” patterns;

  • {flir}, to efficiently refactor code with custom rules.

As with all tools that modify your code, their usage is best complemented by a human review.