katana.units.pdf.pdfimages — pdfimages - Extract Images

Extract PDF images

This unit retrieves the images included in a PDF document, using the pdfimages command-line tool. The syntax is:

pdfimages -png <target_path> <pdfimages_directory>

The unit inherits from katana.unit.FileUnit to ensure the target is a PDF file.
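As an illustration, the command line above could be driven from Python roughly as follows. This is a minimal sketch, not the unit's actual implementation: the helper names build_pdfimages_command and extract_images and the "img" filename prefix are assumptions made for the example. It also assumes the poppler pdfimages tool, which treats its last argument as a filename prefix rather than a bare directory.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_pdfimages_command(target_path: str, output_root: str) -> List[str]:
    """Return the argument list for: pdfimages -png <target> <root>."""
    return ["pdfimages", "-png", target_path, output_root]


def extract_images(target_path: str, output_dir: str) -> List[Path]:
    """Run pdfimages and return the files found in output_dir afterwards."""
    if shutil.which("pdfimages") is None:
        raise FileNotFoundError("pdfimages is not installed or not on PATH")

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumption: "img" is a hypothetical prefix; pdfimages appends a counter
    # and extension to it for each extracted image.
    root = str(out / "img")
    subprocess.run(build_pdfimages_command(target_path, root), check=True)

    return sorted(p for p in out.iterdir() if p.is_file())
```

Keeping the command construction separate from the subprocess call makes the argument list easy to inspect without having pdfimages installed.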

class katana.units.pdf.pdfimages.Unit(*args, **kwargs)

Bases: katana.unit.FileUnit

BLOCKED_GROUPS = ['pdf']

This unit should not produce PDFs, so there is no reason to run units from the “pdf” group on its output.

GROUPS = ['pdf', 'pdfimages']

These are “tags” for a unit. Because this is a PDF unit, “pdf” is included, along with the name of this unit, “pdfimages”.

PRIORITY = 25

Priorities range from 0 (the highest priority) to 100 (the lowest priority). 50 is the default priority. This unit has a high priority if this is detected…
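The ordering this implies can be sketched with a min-heap, where a smaller number is served first. This is only an illustration of the numbering scheme, not Katana's scheduler; the unit names other than pdfimages are hypothetical placeholders.

```python
import heapq

DEFAULT_PRIORITY = 50  # the documented default

queue = []
# (priority, unit name): lower numbers are popped first by the heap.
heapq.heappush(queue, (DEFAULT_PRIORITY, "some_default_unit"))  # hypothetical
heapq.heappush(queue, (25, "pdfimages"))
heapq.heappush(queue, (100, "lowest_priority_unit"))            # hypothetical

order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)
```

With PRIORITY = 25, pdfimages is served before any unit left at the default of 50.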

RECURSE_SELF = False

Again, this unit does not produce PDFs, so recursing into itself would be pointless.
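One way to picture how BLOCKED_GROUPS, GROUPS and RECURSE_SELF interact is the check below. It is a sketch under the assumption that a follow-up unit is skipped when it is the same unit and self-recursion is off, or when any of its groups is blocked by the parent; the function may_recurse_into and the unit name some_image_unit are hypothetical and not part of Katana.

```python
from typing import Iterable


def may_recurse_into(
    parent_name: str,
    parent_blocked_groups: Iterable[str],
    parent_recurse_self: bool,
    child_name: str,
    child_groups: Iterable[str],
) -> bool:
    """Decide whether a child unit may run on the parent's output."""
    # A unit with RECURSE_SELF = False never runs on its own results.
    if child_name == parent_name and not parent_recurse_self:
        return False
    # Any overlap between the child's groups and the blocked groups rules it out.
    return not (set(parent_blocked_groups) & set(child_groups))


# Values taken from this unit's class attributes.
BLOCKED_GROUPS = ["pdf"]
GROUPS = ["pdf", "pdfimages"]
RECURSE_SELF = False

print(may_recurse_into("pdfimages", BLOCKED_GROUPS, RECURSE_SELF,
                       "pdfimages", GROUPS))
print(may_recurse_into("pdfimages", BLOCKED_GROUPS, RECURSE_SELF,
                       "some_image_unit", ["image"]))
```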

evaluate(case: Any) → None

Evaluate the target. Run pdfimages on the target and recurse on any newly found files.

Parameters: case – A case returned by enumerate. For this unit, the enumerate function is not used.
Returns: None. This function should not return any data.
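The overall flow of evaluate (extract, then recurse on each new file, return nothing) might look like the sketch below. It deliberately avoids Katana's own classes: evaluate_sketch, the extract callable and the recurse callback are all hypothetical stand-ins, and the extractor is injected so that the flow can be exercised without pdfimages installed.

```python
import tempfile
from pathlib import Path
from typing import Callable


def evaluate_sketch(
    target_path: str,
    extract: Callable[[str, Path], None],
    recurse: Callable[[Path], None],
) -> None:
    """Extract images from target_path, then hand each new file to recurse."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        # In the real unit this step would be the pdfimages invocation.
        extract(target_path, out_dir)
        # Recurse while the temporary directory still exists.
        for path in sorted(out_dir.iterdir()):
            if path.is_file():
                recurse(path)
    # Results are reported through the callback; nothing is returned.


def fake_extract(target: str, out_dir: Path) -> None:
    """Stand-in extractor that writes two placeholder files."""
    (out_dir / "img-000.png").write_bytes(b"x")
    (out_dir / "img-001.png").write_bytes(b"y")


seen = []
result = evaluate_sketch("example.pdf", fake_extract, lambda p: seen.append(p.name))
print(result, seen)
```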