Load a News Page URL

To extract data from a web page, you first have to define a Web Source.

To do this, use the visual tool Curiosity Studio: open the Windows Start Menu, open the Curiosity folder, and choose the CuriosityStudio shortcut.

The Curiosity Studio window will appear.

Then, in the Document address box, type the URL of the sample news page used for this demonstration:

http://www.go-curiosity.com/examples/news.htm

and click the Open button. If you are behind a firewall, you must first define your proxy settings in curiosity.xml, which is located in the installation folder.

Once the document has been loaded, two views are available:

- the DOM tab shows the DOM tree of the document;

- the Browser View tab shows the rendered version of the page.
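
To give a rough idea of what a DOM tree view contains, here is a minimal Python sketch that prints an indented outline of the elements and text nodes of an HTML document. It does not use Curiosity and is not how Curiosity Studio builds its DOM tab; the sample markup, the DomOutline class and the dom_outline function are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a news page (assumption: this is NOT the
# content of the example page loaded in Curiosity Studio).
SAMPLE_HTML = """<html><body>
<h1>Daily News</h1>
<div class="item"><a href="/a1.htm">First headline</a></div>
<div class="item"><a href="/a2.htm">Second headline</a></div>
</body></html>"""


class DomOutline(HTMLParser):
    """Collects an indented outline of elements and text nodes.

    Simplification: every start tag is assumed to have a matching end tag,
    so void elements such as <br> or <img> are not handled.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # An element sits at the current depth; its children sit one deeper.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + '"' + text + '"')


def dom_outline(html):
    """Return the outline of an HTML string as a list of indented lines."""
    parser = DomOutline()
    parser.feed(html)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    print("\n".join(dom_outline(SAMPLE_HTML)))
```

Each level of indentation in the output corresponds to one level of nesting in the tree, which is the same parent/child structure that a DOM view lets you expand and collapse.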
