Posts with the "data extraction" tag
Parsing Metadata with R - A Package Story
October 9, 2018
Every R package has its story. Some packages are written by experts, some by novices. Some are developed quickly, others are long in the making. This is the story of jstor, a package which I developed during my time as a student of sociology, working on a research project on the scientific elite within sociology. Writing the package has taught me many things (more on that later), and it is deeply gratifying to see that others find the package useful.
Exploring European attitudes and behaviours using the European Social Survey
June 14, 2018
Introduction I never thought that I’d be programming software in my career. I started using R a little over 2 years ago, and it’s been one of the most important decisions of my career. Secluded in a small academic office with no one to discuss my new hobby with, I started searching the web for tutorials and packages. After I got to know how amazing and nurturing the R community is, I wanted to become a data scientist.
Nomisr - Access 'Nomis' UK Labour Market Data
May 8, 2018
I’m excited to announce a new package for accessing official statistics from the UK. nomisr is the R client for the Nomis database. Nomis is run by Durham University on behalf of the UK’s Office for National Statistics (ONS), and contains over a thousand datasets, primarily on the UK labour market, census data, benefit spending and general economic activity. Registration is optional, although registering and using an API key allow for larger queries without the risk of being timed out or rate limited by the API.
Lessons Learned from rtika, a Digital Babel Fish
April 25, 2018
The Apache Tika parser is like the Babel fish in Douglas Adams’s book, “The Hitchhiker’s Guide to the Galaxy”. The Babel fish translates any natural language to any other. Although Tika does not yet translate natural language, it starts to tame the Tower of Babel of digital document formats. As the Babel fish allowed a person to understand Vogon poetry, Tika allows an analyst to extract text and objects from Microsoft Word.
Exploratory Data Analysis of Ancient Texts with rperseus
December 5, 2017
Introduction When I was in grad school at Emory, I had a favorite desk in the library. The desk wasn’t particularly cozy or private, but what it lacked in comfort it made up for in real estate. My books and I needed room to operate. Students of the ancient world require many tools, and when jumping between commentaries, lexicons, and interlinears, additional clutter is additional “friction”, i.e., lapses in thought due to frustration.