What’s that? You’ve heard of R? You use R? You develop in R? You know someone else who’s mentioned R? Oh, you’re breathing? Well, in that case, welcome! Come join the R community!
We recently had a group discussion at rOpenSci’s #runconf17 in Los Angeles, CA about the R community. I initially opened the issue on GitHub. After this issue was well-received (check out the emoji-love below!), we realized people were keen to talk about this and decided to have an optional and informal discussion in person.
...charlatan makes fake data.
Excited to announce a new package called charlatan. While perusing packages from other programming languages, I saw a neat Python library called faker.
charlatan is inspired by and ports many things from Python’s https://github.com/joke2k/faker library. In turn, faker was inspired by PHP’s faker, Perl’s Faker, and Ruby’s faker. It appears that the PHP library was the original - nice work PHP.
Use cases
What could you do with this package? Here are some use cases:
...Two years ago at #runconf15, there was a great discussion about best practices for organizing R-based analysis projects that yielded a nice guidance document describing research compendia. Compendia, as we described them, were minimal products of reproducible research, using parts of R package structure to organize the inputs, analyses, and outputs of research projects.
Since then, we’ve seen more examples and models of research compendia emerge (the organization of such projects is something of an obsession for some of the community). In parallel, there’s been much progress on a number of fronts with R packages: rOpenSci’s package review process has expanded and we’ve worked out many kinks. Infrastructure for automated testing of package code has been developed and field tested. So at #runconf17, we wanted to see how much of this progress in review, testing, and automation could apply to research compendia.
...Textual data and natural language processing are still a niche domain within the R ecosystem. The NLP task view gives an overview of existing work; however, a lot of basic infrastructure is still missing. At the rOpenSci text workshop in April we discussed many ideas for improving text processing in R, which revealed several core areas that need improvement:
Participants also had many good suggestions for C/C++ libraries that text researchers in R might benefit from. Over the past weeks I was able to look into these suggestions and work on a few packages for reading and analyzing text. Below is an update on new and improved rOpenSci tools for text processing in R!
...And finally, we end our series of unconf project summaries (day 1, day 2, day 3, day 4).
mwparser
Summary: Wikimarkup is the language used on Wikipedia and similar projects, and as such contains a lot of valuable data both for scientists studying collaborative systems and people studying things documented on or in Wikipedia. mwparser parses wikimarkup, allowing a user to filter down to specific types of tags such as links or templates, and then extract components of those tags.
...