boilerpipeR (1.3)

Interface to the Boilerpipe Java Library.

Generic extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe Java library. The extraction heuristics from boilerpipe perform robustly across a wide range of web site templates.
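
As a rough illustration of the description above, a minimal sketch of calling one of the package's extractors on an HTML string might look like the following. It assumes boilerpipeR, rJava and a working Java runtime are installed; the helper name `extract_main_text` and the sample `html` string are hypothetical and not part of the package.

```r
# Sketch only: assumes boilerpipeR, rJava and a Java runtime are available.
# extract_main_text is a hypothetical wrapper, not a function of the package.
extract_main_text <- function(html) {
  if (!requireNamespace("boilerpipeR", quietly = TRUE)) {
    stop("boilerpipeR is not installed")
  }
  # ArticleExtractor is one of several extractors exported by boilerpipeR;
  # it is tuned for news-style article pages.
  boilerpipeR::ArticleExtractor(html)
}

# Hypothetical input: a small HTML page held in a single character string.
html <- paste0(
  "<html><body><div id='nav'>Home | About</div>",
  "<p>", strrep("This is the main article text. ", 20), "</p>",
  "</body></html>"
)

# Run the extraction only when the package is actually present.
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  cat(extract_main_text(html), "\n")
}
```

Other extractors in the package, such as DefaultExtractor, take the same kind of input, so swapping the call inside the wrapper is one way to compare their results on a given page.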

Maintainer: Mario Annau
Author(s): See AUTHORS file.

License: Apache License (== 2.0)

Uses: rJava, RCurl
Reverse depends: tm.plugin.webmining

Released almost 5 years ago.