KItinerary::ScriptExtractor

KItinerary::ScriptExtractor Class Reference

#include <scriptextractor.h>

Inherits KItinerary::AbstractExtractor.

Public Member Functions

bool canHandle (const ExtractorDocumentNode &node) const override
 
ExtractorResult extract (const ExtractorDocumentNode &node, const ExtractorEngine *engine) const override
 
const std::vector< ExtractorFilter > & filters () const
 
QString mimeType () const
 
QString name () const override
 
QString scriptFileName () const
 
QString scriptFunction () const
 

Detailed Description

A single unstructured data extraction rule set.

These rules are loaded from JSON meta-data files in a compiled-in qrc file, or from $XDG_DATA_DIRS/kitinerary/extractors.

Meta Data Format

The meta-data files either contain a single JSON object or an array of JSON objects with the following content:

  • mimeType: The MIME type of the documents this extractor handles; defaults to text if not specified.
  • filter: An array of filters that are used to select this extractor for a given input file.
  • script: The JavaScript file to execute.
  • function: The name of the entry point function in that script; defaults to main if not specified.
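
As a rough illustration of the defaults described above, the following sketch normalizes the content of one meta-data file into a list of extractor definitions. It is not KItinerary's loader (which is implemented in C++). The helper name normalizeMetaData and the file name example.js are hypothetical, and treating a missing filter as an empty array is an assumption. Only the field names and the text and main defaults come from the list above.

```javascript
// Hypothetical helper, not part of KItinerary: applies the documented
// defaults to the parsed content of one meta-data file.
function normalizeMetaData(jsonText) {
    const parsed = JSON.parse(jsonText);
    // A file holds either a single JSON object or an array of objects.
    const entries = Array.isArray(parsed) ? parsed : [parsed];
    return entries.map((entry) => ({
        mimeType: entry.mimeType ?? "text",  // documented default
        filter: entry.filter ?? [],          // assumption: no filters if absent
        script: entry.script,
        function: entry.function ?? "main",  // documented default
    }));
}

// A minimal meta-data file that only names a script.
const extractors = normalizeMetaData('{ "script": "example.js" }');
console.log(extractors);
```

Normalizing to an array up front means the rest of such a loader would not need to distinguish between the single-object and the array form.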

The following extractor types are supported:

  • text/plain: plain text, the argument to the script function is a single string.
  • text/html: HTML documents, the argument to the script function is a KItinerary::HtmlDocument instance.
  • application/pdf: PDF documents, the argument to the script function is a KItinerary::PdfDocument instance.
  • application/vnd.apple.pkpass: Apple Wallet passes, the argument to the script function is a KPkPass::Pass instance.
  • internal/event: iCalendar events, the argument to the script function is a KCalendarCore::Event instance.
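
For the text/plain case, an extractor script can be as small as a single function that takes a string. The sketch below uses the default entry point name main and returns a schema.org-style JSON-LD object. The input text format, the regular expression, the returned properties and the use of null for "nothing found" are all invented for illustration and do not correspond to any real provider or to a documented KItinerary convention.

```javascript
// Entry point of a hypothetical text/plain extractor; "main" is the
// default function name when the meta-data does not specify one.
function main(text) {
    // Invented input format: "Booking reference: ABC123"
    const match = /Booking reference:\s*([A-Z0-9]{6})/.exec(text);
    if (!match) {
        return null; // assumption: signals that nothing was recognized
    }
    // Assumption: results are expressed as schema.org JSON-LD objects.
    return {
        "@type": "FlightReservation",
        reservationNumber: match[1],
    };
}

console.log(main("Your Booking reference: XY12Z9 is confirmed."));
```

Extractors for the other types would follow the same shape, with the argument being the corresponding document object instead of a string.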

Filter definitions have the following fields:

  • mimeType: The MIME type of the document part this filter can match against.
  • field: The name of the field to match against. This can be a field id in an Apple Wallet pass, a MIME message header name, or a property on a JSON-LD object or an iCal calendar or event. For plain text or binary content, this is ignored.
  • match: A regular expression that is matched against the specified value (see QRegularExpression).
  • scope: Specifies how the filter should be applied relative to the document node that is being extracted. One of Current, Parent, Children, Ancestors, Descendants (Current is the default).

Example:

[
    {
        "mimeType": "application/pdf",
        "filter": [ { "field": "From", "match": "@swiss.com", "mimeType": "message/rfc822", "scope": "Ancestors" } ],
        "script": "swiss.js",
        "function": "parsePdf"
    },
    {
        "mimeType": "application/vnd.apple.pkpass",
        "filter": [ { "field": "passTypeIdentifier", "match": "pass.booking.swiss.com", "mimeType": "application/vnd.apple.pkpass", "scope": "Current" } ],
        "script": "swiss.js",
        "function": "parsePkPass"
    }
]
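
To make the scope values concrete, here is a small sketch of how the first filter in the example above might be evaluated against a tree of document nodes. It is a simplified model, not KItinerary's implementation: the node shape (mimeType, fields, parent, children), the helper names nodesInScope and filterMatches, and the use of JavaScript's RegExp in place of QRegularExpression are all assumptions. Matching against plain text or binary content, where field is ignored, is not modeled.

```javascript
// Collect the nodes a filter should look at for a given scope,
// relative to the node that is being extracted.
function nodesInScope(node, scope) {
    switch (scope ?? "Current") {       // Current is the documented default
    case "Current":
        return [node];
    case "Parent":
        return node.parent ? [node.parent] : [];
    case "Children":
        return node.children;
    case "Ancestors": {
        const result = [];
        for (let n = node.parent; n; n = n.parent) result.push(n);
        return result;
    }
    case "Descendants":
        return node.children.flatMap((c) => [c, ...nodesInScope(c, "Descendants")]);
    default:
        return [];
    }
}

// True if any node in scope has the filter's MIME type and a field
// value that the regular expression matches (unanchored search).
function filterMatches(filter, node) {
    const pattern = new RegExp(filter.match);
    return nodesInScope(node, filter.scope).some((n) =>
        n.mimeType === filter.mimeType &&
        pattern.test(String(n.fields[filter.field] ?? "")));
}

// Hypothetical document tree: an email with one PDF attachment.
const mail = { mimeType: "message/rfc822", fields: { From: "noreply@swiss.com" }, parent: null, children: [] };
const pdf = { mimeType: "application/pdf", fields: {}, parent: mail, children: [] };
mail.children.push(pdf);

const filter = { field: "From", match: "@swiss.com", mimeType: "message/rfc822", scope: "Ancestors" };
console.log(filterMatches(filter, pdf));
```

In this model the PDF node is selected because one of its ancestors is an email whose From header matches; evaluating the same filter on the email node itself finds no ancestors and therefore no match.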

Development

For development it's convenient to symlink the extractors source folder to $XDG_DATA_DIRS/kitinerary/extractors, so you can re-run a changed extractor script without recompiling or restarting the application.

Definition at line 76 of file scriptextractor.h.

Member Function Documentation

bool ScriptExtractor::canHandle ( const ExtractorDocumentNode & node ) const
override virtual

Fast check whether this extractor is applicable for node.

Implements KItinerary::AbstractExtractor.

Definition at line 155 of file scriptextractor.cpp.

ExtractorResult ScriptExtractor::extract ( const ExtractorDocumentNode & node,
const ExtractorEngine * engine
) const
override virtual

Extract data from node.

Implements KItinerary::AbstractExtractor.

Definition at line 171 of file scriptextractor.cpp.

const std::vector< ExtractorFilter > & ScriptExtractor::filters ( ) const

Returns the filters deciding whether this extractor should be applied.

Definition at line 140 of file scriptextractor.cpp.

QString ScriptExtractor::mimeType ( ) const

MIME type this script extractor supports.

Definition at line 105 of file scriptextractor.cpp.

QString ScriptExtractor::name ( ) const
override virtual

Identifier for this extractor.

Mainly used for diagnostics and tooling.

Implements KItinerary::AbstractExtractor.

Definition at line 96 of file scriptextractor.cpp.

QString ScriptExtractor::scriptFileName ( ) const

The JavaScript file containing the code of the extractor.

Definition at line 115 of file scriptextractor.cpp.

QString ScriptExtractor::scriptFunction ( ) const

The JS function entry point for this extractor, main if empty.

Definition at line 125 of file scriptextractor.cpp.


This file is part of the KDE documentation.
Documentation copyright © 1996-2021 The KDE developers.
Generated on Tue Nov 30 2021 23:06:14 by doxygen 1.8.11 written by Dimitri van Heesch, © 1997-2006