katana.units.web.spider: Spider Webpages
Spider web pages
This unit looks through all of the links on a website and queues each of them as a new target to explore.
This unit inherits from katana.units.web.WebUnit, as that contains lots of predefined variables that can be used throughout multiple web units.
Warning
This unit automatically attempts to perform malicious actions on the target. DO NOT use this in any circumstances where you do not have the authority to operate!
-
class katana.units.web.spider.Unit(*args, **kwargs)
Bases: katana.units.web.WebUnit
-
BAD_MIME_TYPES = ['application/octet-stream']
Avoid MIME types that are downloadable files.
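As a rough illustration of how such a list might be applied, the sketch below checks a Content-Type header value against it. The helper name is_bad_mime_type is hypothetical and is not part of katana; the unit's real check may look different.

```python
# Value copied from the documentation above.
BAD_MIME_TYPES = ["application/octet-stream"]


def is_bad_mime_type(content_type_header: str) -> bool:
    """Hypothetical helper: True if the header names a MIME type to avoid."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    return mime_type in BAD_MIME_TYPES
```

A response whose Content-Type is application/octet-stream would be skipped, while text/html would still be spidered.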
-
PRIORITY = 20
Priority works with 0 being the highest priority and 100 being the lowest priority. 50 is the default priority. This unit has a somewhat higher priority than the default.
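To make the ordering concrete, here is a minimal sketch of "lower number runs first" using Python's heapq module. The unit names and the queue itself are illustrative assumptions; katana's actual scheduler is not described in this section.

```python
import heapq

# Hypothetical (priority, name) pairs; only the value 20 comes from this unit.
queue = []
for priority, name in [(50, "default-unit"), (20, "spider"), (80, "slow-unit")]:
    heapq.heappush(queue, (priority, name))

# Popping yields the smallest number first, i.e. the highest priority.
order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
```

With these assumed values, the spider entry is popped before the entry at the default priority of 50.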
-
PROTECTED_RECURSE = True
We don’t really want to spider on EVERYTHING and start an infinite loop. We can protect against this once we create a target object and start to “keep track” of links we find in one specific website target.
-
evaluate(case: Any)
Evaluate the target. Look for links inside of the target web page and reach out to each of them, queueing them as a new target.
Parameters: case – A case returned by enumerate. For this unit, the enumerate function is not used.
Returns: None. This function should not return any data.
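The link-discovery step described for evaluate can be sketched roughly as follows. This is not the unit's actual implementation: the function find_internal_links, the regular expression, and the seen set are all assumptions made for illustration, and the prefix list is copied from bad_starting_links.

```python
import re
from urllib.parse import urljoin

# Same prefixes as bad_starting_links in this module's documentation.
BAD_PREFIXES = (b"#", b"javascript:", b"https://", b"http://", b"//")

# Simplified href matcher; a real HTML parser would be more robust.
HREF_PATTERN = re.compile(rb'href=["\'](.+?)["\']', re.IGNORECASE)


def find_internal_links(base_url: str, body: bytes, seen: set) -> list:
    """Hypothetical helper: return new same-site links found in body."""
    new_targets = []
    for match in HREF_PATTERN.finditer(body):
        link = match.group(1)
        # Skip anchors, inline JavaScript, and external links.
        if link.startswith(BAD_PREFIXES):
            continue
        absolute = urljoin(base_url, link.decode("utf-8", errors="replace"))
        # Track visited links so the spider does not loop forever.
        if absolute in seen:
            continue
        seen.add(absolute)
        new_targets.append(absolute)
    return new_targets
```

In this sketch, each returned URL would then be queued as a new target, and the shared seen set plays the "keep track" role that PROTECTED_RECURSE alludes to.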
-
-
katana.units.web.spider.bad_starting_links = [b'#', b'javascript:', b'https://', b'http://', b'//']
Avoid inline JavaScript, anchors, and external links.
-
katana.units.web.spider.has_a_bad_start(link)
This is a convenience function for checking whether a link starts with one of the entries in bad_starting_links above, so that such links can be avoided.
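A minimal sketch of what such a check could look like is shown below. It assumes the link is passed as bytes (matching the bytes entries in bad_starting_links) and that a boolean is returned; neither detail is confirmed by the documentation above.

```python
# List copied from the module documentation.
bad_starting_links = [b"#", b"javascript:", b"https://", b"http://", b"//"]


def has_a_bad_start(link: bytes) -> bool:
    """Assumed behaviour: True if link begins with any bad prefix."""
    return any(link.startswith(prefix) for prefix in bad_starting_links)
```

Under these assumptions, a relative link such as b"/login" passes the check, while b"#top" and b"http://example.com/" are rejected.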