boilerpipeR (1.2)

Interface to the boilerpipe Java library by Christian Kohlschütter (http://code.google.com/p/boilerpipe/).

https://github.com/mannau/boilerpipeR
http://cran.r-project.org/web/packages/boilerpipeR

Generic extraction of the main text content from HTML files, removing ads, sidebars, and headers using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of web site templates.

Maintainer: Mario Annau
Author(s): Mario Annau [aut, cre]

License: Apache License (== 2.0)

Uses: rJava, RCurl
Reverse depends: tm.plugin.webmining

Released about 6 years ago.