SENSEI Web Scraper is a set of specialized web scrapers used to parse data crawled from the Internet. Given the content of an article web page (not the newspaper's main page), the corresponding parser extracts the title, date, text and other fields where they are available.
## Installation
Import the source code as a project into your IDE and reference it where needed.
## Java code example
Given a String holding the content of a web page, use a specialized parser to get a List of SimpleItem objects.
For details about SimpleItem, see the documentation of its fields in the source code.
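
    import java.net.URL;
    import java.util.List;
    // HTMLDocumentSegmenter, CorriereSegmenter and SimpleItem come from this project.

    URL url = new URL("http://www.fake-url.com/page1"); // may throw MalformedURLException
    String content = "A STRING WITH A WEB PAGE CONTENT";
    // Use the parser that corresponds to the site the content comes from.
    HTMLDocumentSegmenter seg = new CorriereSegmenter();
    List<SimpleItem> result = seg.segment(content, url);
    // do whatever you need with result
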
The unit tests in src/test/java can also be used as code examples and show some of the parsers' features.
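
The example above picks the parser by hand. When pages from several sites are processed, one possible approach is to choose the parser from the host name of the page URL. The sketch below is only an illustration: `SegmenterChooser` and its `register` and `choose` methods are hypothetical names and are not part of SENSEI Web Scraper. The class is generic so that it can be compiled and tried without the project on the classpath.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not part of this project): picks a parser by URL host.
final class SegmenterChooser<T> {
    // Host name (lower case, without a leading "www.") -> factory for the parser.
    private final Map<String, Supplier<T>> factoriesByHost = new HashMap<>();

    // Registers the factory to use for pages served from the given host.
    SegmenterChooser<T> register(String host, Supplier<T> factory) {
        factoriesByHost.put(normalize(host), factory);
        return this;
    }

    // Returns a new parser for the page's host, or empty if none is registered.
    Optional<T> choose(URI pageUri) {
        String host = pageUri.getHost();
        if (host == null) {
            return Optional.empty(); // relative or opaque URI, no host to match
        }
        return Optional.ofNullable(factoriesByHost.get(normalize(host)))
                .map(Supplier::get);
    }

    // Lower-cases the host and drops a leading "www." so both forms match.
    private static String normalize(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        return lower.startsWith("www.") ? lower.substring(4) : lower;
    }
}
```

With the project available, the type parameter would be `HTMLDocumentSegmenter`, for instance `new SegmenterChooser<HTMLDocumentSegmenter>().register("fake-url.com", CorriereSegmenter::new)`, where `fake-url.com` is the placeholder host from the example above. An empty result from `choose` means that no parser has been registered for that site.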