katana.units.web.robots — Check robots.txt
This unit looks through all of the entries in a web page's robots.txt file and searches each one for a flag. It passes a User-Agent header to pose as the Googlebot crawler.

This unit inherits from katana.units.web.WebUnit, which contains many predefined variables that are used across multiple web units.
Warning
This unit automatically attempts to perform malicious actions on the target. DO NOT use it under any circumstances where you do not have the authority to operate!
class katana.units.web.robots.Unit(*args, **kwargs)

    Bases: katana.units.web.WebUnit
    GROUPS = ['web', 'robots', 'robots.txt']

        These are “tags” for a unit. Since this is a web unit, “web” is included, as well as the name of the unit, “robots”.
    PRIORITY = 30

        Priority works with 0 being the highest priority and 100 being the lowest; 50 is the default priority. This unit has a somewhat higher priority than the default.
    RECURSE_SELF = False

        This unit should not recurse into itself. That would be silly.
    enumerate()

        Yield cases. This function looks at the robots.txt page and yields each URL it finds, to be examined by the evaluate function.

        Returns: A generator, yielding a string for each URL in robots.txt.
    evaluate(case)

        Evaluate the target. Reach out to every entry in the robots.txt file and look for flags.

        Parameters: case – a case returned by enumerate. For this unit, the enumerate function yields each URL found in the robots.txt file.

        Returns: None. This function should not return any data.
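As an illustration of the enumerate/evaluate flow described above — a hypothetical sketch, not katana's actual implementation — the enumerate step can be thought of as parsing robots.txt and yielding one absolute URL per Allow/Disallow entry, each of which evaluate would then fetch and search for a flag. The example parses a hard-coded robots.txt body rather than fetching one over the network:

```python
from urllib.parse import urljoin

def enumerate_robots(base_url, robots_txt):
    """Yield an absolute URL for each Allow/Disallow path in robots.txt."""
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        directive, _, path = line.partition(":")
        if directive.strip().lower() in ("allow", "disallow"):
            path = path.strip()
            if path:
                # resolve the path against the target's base URL
                yield urljoin(base_url, path)

# sample robots.txt body; a real unit would download this from the target
sample = """User-agent: *
Disallow: /admin/
Disallow: /secret_flag.txt
Allow: /public/
"""

urls = list(enumerate_robots("http://example.com", sample))
# each yielded URL would then be fetched and scanned for a flag
```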
katana.units.web.robots.headers = {'User-Agent': 'Googlebot/2.1'}

    Include these headers in the unit's requests, to simulate acting as the Googlebot crawler.
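To show how such a header dictionary is attached to an outgoing request — a minimal sketch using the standard library, with example.com as a placeholder target rather than anything katana itself uses:

```python
import urllib.request

# the same headers the unit defines, spoofing the Googlebot crawler
headers = {"User-Agent": "Googlebot/2.1"}

# build a request carrying the spoofed User-Agent header
req = urllib.request.Request("http://example.com/robots.txt", headers=headers)

# urllib.request.urlopen(req) would then fetch robots.txt as "Googlebot"
```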