KItinerary

Using the extractor engine

C++ API

Using the C++ API is the most flexible and efficient way to use the extractor engine. The process consists of three steps:

  • Extraction: This attempts to find relevant information in the given input documents. Its output can still contain duplicate or invalid results. There are some options to customize this step, e.g. trading more expensive image processing against finding more results, depending on how certain you are that the input contains such data. See KItinerary::ExtractorEngine.
  • Post-processing: This step merges duplicate or split results, but its output can still contain invalid elements. The main way to customize this step is in what you feed into it. For best results this should be all extractor results that can possibly contain information for a specific incident. See KItinerary::ExtractorPostprocessor.
  • Validation: This removes any remaining incomplete or invalid results, as well as results of undesired types. For this step you typically want to set the types your application can handle. Letting incomplete results pass can be useful if you have an existing set of data you want to apply them to. See KItinerary::ExtractorValidator.
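As a rough illustration of the two validation settings described in the last step, the following toy model uses plain standard-library types instead of KItinerary classes. `Element`, `ToyValidator` and `keepValid` are hypothetical names invented for this sketch, and the completeness rule used here (non-empty reference and departure) is an assumption. The real checks are whatever KItinerary::ExtractorValidator implements.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an extracted element; not a KItinerary type.
struct Element {
    std::string type;      // e.g. "TrainReservation"
    std::string reference; // booking reference
    std::string departure; // empty when the extractor could not find it
};

// Hypothetical stand-in for the two settings: accepted types and completeness.
struct ToyValidator {
    std::set<std::string> acceptedTypes;
    bool completeOnly = true;

    bool isValidElement(const Element &e) const
    {
        if (acceptedTypes.count(e.type) == 0) {
            return false; // a type the application cannot handle
        }
        // Assumed completeness rule for this sketch only.
        const bool complete = !e.reference.empty() && !e.departure.empty();
        return complete || !completeOnly;
    }
};

// Returns the elements the validator lets through, preserving their order.
std::vector<Element> keepValid(const std::vector<Element> &in, const ToyValidator &validator)
{
    std::vector<Element> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [&validator](const Element &e) { return validator.isValidElement(e); });
    return out;
}
```

With `completeOnly` set to false, an accepted-type element that lacks a departure time would pass through, which is the situation where you would later merge it into data you already hold.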

Example:

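#include <KItinerary/ExtractorEngine>
#include <KItinerary/ExtractorPostprocessor>
#include <KItinerary/ExtractorValidator>
#include <KItinerary/JsonLdDocument>
#include <KItinerary/Reservation>
#include <QFile>
#include <algorithm>

using namespace KItinerary;
// Create an instance of the extractor engine
ExtractorEngine engine;
// use engine.setHints(...) to control its behavior
// feed raw data into the extractor engine
// passing a file name or MIME type in addition to the data is optional
// but can help with identifying the type of data passed in
// should you already have data in decoded form, see engine.setContent() instead
QFile f("my-document.pdf");
f.open(QFile::ReadOnly); // check the return value in real code
engine.setData(f.readAll(), f.fileName());
// perform the extraction
const auto extractedData = engine.extract();
// post process the extracted result
ExtractorPostprocessor postproc;
// ExtractorPostprocessor::process() can be called multiple times
// to accumulate a single merged result set
// extract() returns JSON-LD, so convert it to typed elements first
postproc.process(JsonLdDocument::fromJson(extractedData));
auto result = postproc.result();
// select the type of data you can consume
ExtractorValidator validator;
validator.setAcceptedTypes<TrainReservation, BusReservation>();
validator.setAcceptOnlyCompleteElements(true);
// remove invalid results
result.erase(std::remove_if(result.begin(), result.end(), [&validator](const auto &r) {
    return !validator.isValidElement(r);
}), result.end());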
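The comment in the example about calling process() several times can be pictured with a small accumulator. This is only a toy model of the call pattern, written with standard-library types: `TicketInfo` and `ToyPostprocessor` are hypothetical names, and merging on an identical booking reference is an assumption made for the sketch, not a description of how KItinerary::ExtractorPostprocessor decides that two results belong together.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an extracted element; not a KItinerary type.
struct TicketInfo {
    std::string reference; // booking reference, used here as the merge key
    std::string seat;      // may be missing in some documents
};

// Toy accumulator mirroring the process()/result() call pattern.
class ToyPostprocessor
{
public:
    // May be called once per input document; later calls merge into earlier ones.
    void process(const std::vector<TicketInfo> &batch)
    {
        for (const auto &e : batch) {
            auto it = std::find_if(m_merged.begin(), m_merged.end(),
                                   [&e](const TicketInfo &m) { return m.reference == e.reference; });
            if (it == m_merged.end()) {
                m_merged.push_back(e); // first time this booking is seen
            } else if (it->seat.empty()) {
                it->seat = e.seat;     // fill a gap from the later document
            }
        }
    }

    // The merged set accumulated over all process() calls so far.
    std::vector<TicketInfo> result() const { return m_merged; }

private:
    std::vector<TicketInfo> m_merged;
};
```

Feeding a booking confirmation and a later seat-assignment mail for the same reference into one instance would yield a single entry carrying the seat, which is why all documents about one trip are best passed to the same post-processor.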
This file is part of the KDE documentation.
Documentation copyright © 1996-2025 The KDE developers.
Generated on Fri May 2 2025 11:54:59 by doxygen 1.13.2 written by Dimitri van Heesch, © 1997-2006
