XML Web Source

If a web source is a well-formed XML document, set its content attribute to xml so that Curiosity does not run Tidy on it before scraping.

<webSource name="aSource" content="xml">
    ...
</webSource>

Curiosity then assumes that the documents retrieved during the scraping steps also have XML content.
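
Because Tidy is skipped for XML sources, it can be worth checking that a document really is well-formed before declaring it as xml. The following is a minimal sketch of such a check, assuming Python and its standard xml.etree.ElementTree parser; the helper name suggested_content is hypothetical and not part of Curiosity.

```python
import xml.etree.ElementTree as ET


def suggested_content(document_text: str) -> str:
    """Return "xml" if the text parses as well-formed XML, else "html".

    Hypothetical helper, not part of Curiosity: it only suggests a value
    for the content attribute of a web source or of a step.
    """
    try:
        ET.fromstring(document_text)
    except ET.ParseError:
        # Not well-formed: keep the default so the document still gets tidied.
        return "html"
    return "xml"


if __name__ == "__main__":
    print(suggested_content("<feed><entry>ok</entry></feed>"))
    print(suggested_content("<p>unclosed<br></p>"))
```

Note that a well-formed XHTML page would also be reported as xml by this check, so the result is a hint rather than a decision.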

If a scraping step applies to a document whose content type differs from the one specified for the whole web source, set the content attribute on that step accordingly:

<webSource name="aSource" content="xml">
    <nextstepsList>
        <slot level="1" content="html">
            ...
        </slot>
        ...
    </nextstepsList>
    ...
</webSource>

The opposite case is also possible: the web source has HTML content (the default), but some steps apply to XML content:

<webSource name="aSource">
    <nextstepsList>
        <slot level="1" content="xml">
            ...
        </slot>
        ...
    </nextstepsList>
    ...
</webSource>
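
The rule described on this page can be read as a simple fallback chain: a step uses its own content attribute if it has one, otherwise the one set on the web source, otherwise html. The sketch below illustrates that reading in Python; it is not Curiosity's implementation. The function effective_contents, the sample configuration, and the second slot with level="2" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DEFAULT_CONTENT = "html"  # the default content of a web source

# Sample configuration (assumed for illustration): the source is XML,
# step 1 overrides it with html, step 2 sets nothing and inherits.
CONFIG = """
<webSource name="aSource" content="xml">
    <nextstepsList>
        <slot level="1" content="html"/>
        <slot level="2"/>
    </nextstepsList>
</webSource>
"""


def effective_contents(config_text: str) -> dict:
    """Map each slot level to the content type that would apply to it."""
    source = ET.fromstring(config_text.strip())
    # Content declared on the web source, or the default if absent.
    source_content = source.get("content", DEFAULT_CONTENT)
    return {
        # A slot's own content attribute wins over the source-level value.
        slot.get("level"): slot.get("content", source_content)
        for slot in source.iter("slot")
    }


if __name__ == "__main__":
    print(effective_contents(CONFIG))
```

Under these assumptions, step 1 resolves to html and step 2 to xml, which matches the two configurations shown above.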
