Posts with the "text mining" tag
Parsing Metadata with R - A Package Story
October 9, 2018
Every R package has its story. Some packages are written by experts, some by novices. Some are developed quickly, others were long in the making. This is the story of jstor, a package which I developed during my time as a student of sociology, working in a research project on the scientific elite within sociology. Writing the package has taught me many things (more on that later) and it is deeply gratifying to see, that others find the package useful.
Lessons Learned from rtika, a Digital Babel Fish
April 25, 2018
The Apache Tika parser is like the Babel fish in Douglas Adam’s book, “The Hitchhikers’ Guide to the Galaxy” 1. The Babel fish translates any natural language to any other. Although Tika does not yet translate natural language, it starts to tame the tower of babel of digital document formats. As the Babel fish allowed a person to understand Vogon poetry, Tika allows an analyst to extract text and objects from Microsoft Word.
fulltext v1: text-mining scholarly works
January 17, 2018
The problem Text-mining - the art of answering questions by extracting patterns, data, etc. out of the published literature - is not easy.
It’s made incredibly difficult because of publishers. It is a fact that the vast majority of publicly funded research across the globe is published in paywall journals. That is, taxpayers pay twice for research: once for the grant to fund the work, then again to be able to read it.