Lookout

Lookout (a Crowbar clone) is a separate tool that helps Curiosity scrape DHTML- or Ajax-powered sites when the data cannot be obtained directly from XML or JSON sources.

Lookout is a desktop application that embeds both Internet Explorer and a web server.

Through a RESTful service, you can ask Lookout to load a DHTML-powered page in IE, wait for a given timeout to expire, and then send back the HTML source of the dynamically rendered content.
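A minimal Python client sketch, assuming the service answers a plain HTTP GET with the rendered HTML as the response body (both the HTTP method and the response format are assumptions, not documented here). Because Lookout deliberately waits for the timeout before replying, the client's own socket timeout has to be longer than that wait; the 10-second margin below is an arbitrary choice.

```python
import urllib.request


def fetch_rendered_html(request_url: str, render_timeout_ms: int,
                        margin_s: float = 10.0) -> str:
    """Send a GET to a complete Lookout request URL and return the HTML it sends back.

    Assumptions (not from the Lookout docs): plain GET, HTML in the response body.
    """
    # Lookout waits render_timeout_ms before answering, so give the socket
    # at least that long plus a safety margin before giving up.
    client_timeout_s = render_timeout_ms / 1000.0 + margin_s
    with urllib.request.urlopen(request_url, timeout=client_timeout_s) as resp:
        # Use the declared charset if there is one, else fall back to UTF-8 (assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The function takes the full request URL, parameters included, so it stays independent of how that URL is assembled.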

The server accepts three parameters: the URL of the page, the timeout in milliseconds, and, if needed, the index of the target frame.
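A sketch of how a client might assemble a request URL from these three parameters. The query-parameter names `url`, `timeout` and `frame`, and the base address `http://localhost:8080/`, are hypothetical placeholders; check Lookout itself for the real ones. The page URL is percent-encoded because it often carries its own query string, whose `?` and `&` would otherwise be mistaken for Lookout's parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def build_lookout_url(service_base: str, page_url: str, timeout_ms: int,
                      frame_index: Optional[int] = None) -> str:
    """Build a request URL for a Lookout-style service.

    The parameter names "url", "timeout" and "frame" are HYPOTHETICAL.
    """
    params = {"url": page_url, "timeout": str(timeout_ms)}
    # The frame index is only sent when the content lives in a specific frame.
    if frame_index is not None:
        params["frame"] = str(frame_index)
    # urlencode percent-escapes ":", "/", "?", "=" and "&" inside page_url.
    return service_base + "?" + urlencode(params)


print(build_lookout_url("http://localhost:8080/",
                        "http://example.com/list?page=2&sort=date", 5000))
# prints: http://localhost:8080/?url=http%3A%2F%2Fexample.com%2Flist%3Fpage%3D2%26sort%3Ddate&timeout=5000
```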

The web server port and the default timeout can be set in the configuration file lookout.exe.config.
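The name lookout.exe.config follows the standard .NET application configuration convention, which suggests the settings live in an `appSettings` section of key/value entries. The sketch below reads such entries with Python's standard XML parser; the section layout is an assumption, and the key names `Port` and `DefaultTimeout` are hypothetical, so inspect the actual file for the real names.

```python
import xml.etree.ElementTree as ET


def read_app_settings(config_xml: str) -> dict:
    """Return the key/value pairs of the <appSettings> section as a dict."""
    root = ET.fromstring(config_xml)
    # Each setting is assumed to be an <add key="..." value="..."/> element.
    return {add.get("key"): add.get("value")
            for add in root.iterfind("./appSettings/add")}


# HYPOTHETICAL example content; the real key names may differ.
SAMPLE = """<configuration>
  <appSettings>
    <add key="Port" value="8080" />
    <add key="DefaultTimeout" value="5000" />
  </appSettings>
</configuration>"""

settings = read_app_settings(SAMPLE)
port = int(settings["Port"])                          # 8080
default_timeout_ms = int(settings["DefaultTimeout"])  # 5000
```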

Please be aware that, although it is suitable for everyday use, Lookout is currently in alpha: you may run into bugs, and some important features are still missing (such as handling multiple parallel requests).
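Since parallel requests are not handled yet, a client that runs several scrapes at once may want to send them to Lookout one at a time. A minimal sketch of that workaround, assuming a multithreaded Python client: a lock-protected wrapper around whatever fetch function the client already uses (`fetch` here is a placeholder, not part of Lookout).

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SerializedFetcher(Generic[T]):
    """Wrap a fetch function so that at most one call runs at a time."""

    def __init__(self, fetch: Callable[[str], T]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()

    def get(self, request_url: str) -> T:
        # Other threads block here until the current request has finished.
        with self._lock:
            return self._fetch(request_url)
```

Create one wrapper and share that single instance across all worker threads; separate instances would each have their own lock and would not serialize anything.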

To integrate Lookout with Curiosity, set the urlSource of a webSource to the address of the Lookout service, including the appropriate parameters.

Download Lookout

Requirements:

Download Lookout version 0.2.2 here.

After installing Lookout, please read and agree to the license (license.txt).
