katana.units.web.spider — Spider Webpages

Spider web pages

This unit looks through all of the links on a website and queues each one as a new target to explore.

This unit inherits from katana.units.web.WebUnit, which provides many predefined variables shared across the web units.

Warning

This unit automatically attempts to perform malicious actions on the target. DO NOT use this in any circumstances where you do not have the authority to operate!

class katana.units.web.spider.Unit(*args, **kwargs)

Bases: katana.units.web.WebUnit

BAD_MIME_TYPES = ['application/octet-stream']

Avoid MIME types that indicate downloadable files.
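As a rough illustration of how such a list might be applied, the sketch below compares a response's Content-Type header against it. The helper name is_spiderable and the header handling are assumptions made for this example, not the unit's actual code.

```python
# The list documented for this unit.
BAD_MIME_TYPES = ["application/octet-stream"]


def is_spiderable(content_type: str) -> bool:
    """Hypothetical helper: False when the response looks like a file download."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime not in BAD_MIME_TYPES


print(is_spiderable("text/html; charset=utf-8"))
print(is_spiderable("application/octet-stream"))
```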

PRIORITY = 20

Priority ranges from 0 (highest) to 100 (lowest); 50 is the default priority. At 20, this unit has a somewhat higher priority than the default.
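To make the "lower number runs first" convention concrete, here is a minimal sketch using a heap. How Katana actually schedules units is not described here, so the queue and the unit names other than the spider are purely illustrative.

```python
import heapq

# (priority, name) pairs; the names besides "spider" are made up for the example.
units = [(50, "default unit"), (20, "spider"), (80, "slow unit")]

queue = []
for priority, name in units:
    heapq.heappush(queue, (priority, name))

# Popping from the heap yields the lowest number, i.e. the highest priority, first.
order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)  # ['spider', 'default unit', 'slow unit']
```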

PROTECTED_RECURSE = True

We don’t really want to spider EVERYTHING and start an infinite loop. We can protect against this once we create a target object and start to “keep track” of the links we find in one specific website target.
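The "keep track" idea can be sketched with a set of links that have already been seen. This is only an illustration of the general technique; the crawl function, the get_links callback, and the toy site map are assumptions and not part of Katana.

```python
from collections import deque


def crawl(start, get_links):
    """Visit every reachable link exactly once, even if pages link in a cycle."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Skip anything already queued, which is what breaks the loop.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# A toy site whose pages all link back to each other.
site = {"/": ["/a", "/b"], "/a": ["/", "/b"], "/b": ["/a"]}
visited = crawl("/", lambda url: site.get(url, []))
print(visited)  # ['/', '/a', '/b']
```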

evaluate(case: Any)

Evaluate the target. Look for links inside of the target web page and reach out to each of them, queueing them as a new target.

Parameters: case – A case returned by enumerate. For this unit, the enumerate function is not used.
Returns: None. This function should not return any data.

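The general shape of "find links in a page and turn each into a new target" might look like the sketch below. It uses only the standard library and is not the unit's implementation; LinkCollector, find_links, and the target.local URL are hypothetical names chosen for the example, and only href attributes are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the value of every href attribute in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def find_links(base_url, html):
    """Return absolute URLs for the links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the page they were found on.
    return [urljoin(base_url, link) for link in collector.links]


page = '<a href="/login">Login</a> <a href="about.html">About</a>'
queued = find_links("http://target.local/index.html", page)
print(queued)
```

In a real spider each resulting URL would then be handed to whatever queue the framework provides, rather than printed.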
katana.units.web.spider.bad_starting_links = [b'#', b'javascript:', b'https://', b'http://', b'//']

Avoid inline JavaScript, anchors, and external links

katana.units.web.spider.has_a_bad_start(link)

A convenience function that checks a link against the bad starting prefixes listed above, so that such links can be avoided.
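A prefix check of this kind can be sketched as follows. The list is copied from the module attribute above, but starts_with_bad_prefix is a hypothetical stand-in: the real function's return convention is not documented here, so this version simply returns True when a link begins with a listed prefix.

```python
# Prefixes copied from the documented module attribute.
bad_starting_links = [b"#", b"javascript:", b"https://", b"http://", b"//"]


def starts_with_bad_prefix(link: bytes) -> bool:
    """Hypothetical helper: True when the link begins with a listed prefix."""
    # bytes.startswith accepts a tuple of candidate prefixes.
    return link.startswith(tuple(bad_starting_links))


links = [b"#top", b"javascript:void(0)", b"https://example.com/", b"/admin", b"login.php"]
kept = [link for link in links if not starts_with_bad_prefix(link)]
print(kept)  # [b'/admin', b'login.php']
```

Note that a site-relative link such as b"/admin" is kept, because only the protocol-relative form b"//" is on the list.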