The Report Collector

To monitor the results of Curiosity executions, you can of course look at the log file log.txt; moreover, you can customize the logging features by editing Curiosity.exe.nlog, which uses NLog syntax.

In addition, for every execution Curiosity can produce a report of the activity of each source, with the following essential information:

  • start time
  • duration
  • whether the data was up to date with respect to the history
  • the number of new items discovered
  • error messages

The reports produced by the Report Collector are handled, as usual, by providers; their options must be set in the report subnode of the options node in the configuration file:

<curiosity>
    <options>
        <report>
            <aProviderTag>
                <aProviderParameter>
                    ...
                </aProviderParameter>
            </aProviderTag>
           
            <activate name="aNamedProvider" />
        </report>
    </options>
</curiosity>

The XSL file used to create the final report is CuriosityReport.xslt.

In the report, which has a tabular format, sources with troubles are highlighted. In particular, a web source is considered to have scraping troubles when its data isn't up to date but no new items have been discovered. In such cases (provided no net errors occurred), it is likely that the page source has changed, so the XPaths no longer work.
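The rule above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not Curiosity's actual code: the SourceReport class and its field names are hypothetical, and the sketch assumes that any recorded error message counts as a net error.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceReport:
    """Hypothetical per-source record; names are illustrative, not Curiosity's."""
    name: str
    up_to_date: bool          # was the data up to date with respect to the history?
    new_items: int            # number of new items discovered
    net_errors: List[str] = field(default_factory=list)  # assumed: net error messages


def has_scraping_trouble(report: SourceReport) -> bool:
    # Stale data, nothing new, and no net error to explain it:
    # the scraping rules (XPaths) probably no longer match the page.
    return (
        not report.up_to_date
        and report.new_items == 0
        and not report.net_errors
    )
```

Under these assumptions, a source that is stale because of a timeout is not flagged as a scraping problem, since the net error already explains the missing items.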

If the reportOnlyErrors attribute is set to true, then the Report Collector output is processed by providers only when scraping (or net) errors occur:

<report reportOnlyErrors="true">
    ...
</report>

Moreover, if you do not want scraping errors to be reported for a given web source, you can set the warnOnBrokeringError attribute to false for that source. This can be useful when you are fairly sure that the related scraping rules won't change, as with standardized XML web sources (such as RSS).
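How the two attributes might interact can be sketched as follows. This is a guess at the decision logic, written in Python for illustration only: the SourceStatus class and both function names are hypothetical, and the sketch assumes that warnOnBrokeringError defaults to true and that it silences scraping errors but not net errors.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SourceStatus:
    """Hypothetical summary of one source after an execution."""
    name: str
    scraping_trouble: bool
    net_error: bool
    warn_on_brokering_error: bool = True  # assumed default, not confirmed by the docs


def is_reportable_error(status: SourceStatus) -> bool:
    # Assumption: net errors always count; scraping errors count only
    # when the source has not opted out via warnOnBrokeringError="false".
    return status.net_error or (
        status.scraping_trouble and status.warn_on_brokering_error
    )


def should_run_providers(statuses: Iterable[SourceStatus],
                         report_only_errors: bool) -> bool:
    # Without reportOnlyErrors, the report is always handed to the providers.
    if not report_only_errors:
        return True
    return any(is_reportable_error(s) for s in statuses)
```

With reportOnlyErrors set to true, an RSS source marked warnOnBrokeringError="false" would then never trigger a report on its own for scraping trouble, while a net error on any source still would.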
