KItinerary
Using the extractor engine
C++ API
Using the C++ API is the most flexible and efficient way to use the extractor engine. It consists of three steps:
- Extraction: This attempts to find relevant information in the given input documents. Its output, however, can still contain duplicate or invalid results. Some options let you customize this step, e.g. trading more expensive image processing for finding more results, depending on how certain you are that the input data contains such information. See KItinerary::ExtractorEngine.
- Post-processing: This step merges duplicate or split results, but its output can still contain invalid elements. The main way to customize this step is through what you feed into it: for best results, this should be all extractor results that can possibly contain information about a specific incident. See KItinerary::ExtractorPostprocessor.
- Validation: This removes any remaining incomplete or invalid results, as well as results of undesired types. For this step you typically want to specify the set of types your application can handle. Letting incomplete results pass can be useful if you have an existing set of data you want to apply them to. See KItinerary::ExtractorValidator.
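To make the two validation criteria concrete, here is a small standalone model in plain C++. It deliberately does not use KItinerary: the `CandidateElement` struct, the `ToyValidator` class and the string-based type names are hypothetical stand-ins (the real ExtractorValidator works on QVariant-wrapped types identified by their QMetaObject). The sketch only shows the filtering described above: rejecting types the application does not handle and, optionally, rejecting incomplete elements.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extracted element (not KItinerary's data model).
struct CandidateElement {
    std::string type;              // e.g. "FlightReservation"
    std::string reservationNumber; // empty if the extractor did not find it
    std::string departureTime;     // empty if the extractor did not find it
};

// Toy model of the validation step: filters by accepted type and,
// optionally, by completeness.
class ToyValidator {
public:
    void setAcceptedTypes(std::set<std::string> types) { m_types = std::move(types); }
    void setAcceptOnlyCompleteElements(bool completeOnly) { m_completeOnly = completeOnly; }

    bool isValidElement(const CandidateElement &e) const
    {
        // in this sketch, an empty set of accepted types accepts every type
        if (!m_types.empty() && m_types.count(e.type) == 0) {
            return false;
        }
        const bool complete = !e.reservationNumber.empty() && !e.departureTime.empty();
        return complete || !m_completeOnly;
    }

private:
    std::set<std::string> m_types;
    bool m_completeOnly = true; // default chosen for this sketch only
};

// Drops every element the validator rejects, using the same
// erase/remove_if idiom as the KItinerary example.
std::vector<CandidateElement> validate(std::vector<CandidateElement> elements, const ToyValidator &validator)
{
    elements.erase(std::remove_if(elements.begin(), elements.end(),
                                  [&validator](const CandidateElement &e) { return !validator.isValidElement(e); }),
                   elements.end());
    return elements;
}
```

With only complete flights accepted, a flight lacking its departure time is dropped; calling `setAcceptOnlyCompleteElements(false)` lets it through, which is the case where you would apply it to data you already have.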
Example:
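```cpp
#include <KItinerary/ExtractorEngine>
#include <KItinerary/ExtractorPostprocessor>
#include <KItinerary/ExtractorValidator>
#include <KItinerary/JsonLdDocument>
#include <KItinerary/Reservation>

#include <QFile>

#include <algorithm>

using namespace KItinerary;

// Create an instance of the extractor engine
// use engine.setHints(...) to control its behavior
ExtractorEngine engine;

// feed raw data into the extractor engine
// passing a file name or MIME type in addition to the data is optional,
// but can help with identifying the type of data passed in
// should you already have data in decoded form, see engine.setContent() instead
QFile f("my-document.pdf");
if (!f.open(QFile::ReadOnly)) {
    return; // handle the error as appropriate, f.errorString() describes it
}
engine.setData(f.readAll(), f.fileName());

// perform the extraction: extract() returns the JSON-LD data as a QJsonArray,
// JsonLdDocument::fromJson() converts it into a QList<QVariant>
const auto extractedData = JsonLdDocument::fromJson(engine.extract());

// post process the extracted result
ExtractorPostprocessor postproc;
// ExtractorPostprocessor::process() can be called multiple times
// to accumulate a single merged result set
postproc.process(extractedData);
auto result = postproc.result();

// select the type of data you can consume (flights and train trips as an example)
ExtractorValidator validator;
validator.setAcceptedTypes({&FlightReservation::staticMetaObject, &TrainReservation::staticMetaObject});
validator.setAcceptOnlyCompleteElements(true);

// remove invalid results
result.erase(std::remove_if(result.begin(), result.end(), [&validator](const auto &r) {
    return !validator.isValidElement(r);
}), result.end());
```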
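The comment on ExtractorPostprocessor::process() notes that repeated calls accumulate one merged result set. The following plain C++ sketch models that idea in isolation, assuming results can be matched by a reservation number. `PartialResult` and `ToyPostprocessor` are hypothetical; the real post-processor applies its own, more elaborate merge rules to QVariant data.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical partial result; reservationNumber serves as the merge key here.
struct PartialResult {
    std::string reservationNumber;
    std::string departureTime; // may be missing in one partial result...
    std::string seat;          // ...and present in another
};

// Toy model of an accumulating post-processor: each process() call merges
// the new batch into everything seen so far, filling in missing fields.
class ToyPostprocessor {
public:
    void process(const std::vector<PartialResult> &batch)
    {
        for (const auto &r : batch) {
            auto it = m_byKey.find(r.reservationNumber);
            if (it == m_byKey.end()) {
                m_byKey.emplace(r.reservationNumber, r); // first time we see this booking
                continue;
            }
            PartialResult &merged = it->second;
            if (merged.departureTime.empty()) {
                merged.departureTime = r.departureTime;
            }
            if (merged.seat.empty()) {
                merged.seat = r.seat;
            }
        }
    }

    // Returns the merged elements, ordered by reservation number.
    std::vector<PartialResult> result() const
    {
        std::vector<PartialResult> out;
        out.reserve(m_byKey.size());
        for (const auto &entry : m_byKey) {
            out.push_back(entry.second);
        }
        return out;
    }

private:
    std::map<std::string, PartialResult> m_byKey;
};
```

Feeding both halves of a split result (say, a booking confirmation and a later seat assignment for the same booking) into one instance yields a single merged element. That is why all extractor results for one incident should go through the same post-processor.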
This file is part of the KDE documentation.
Documentation copyright © 1996-2025 The KDE developers.
Generated on Fri May 2 2025 11:54:59 by doxygen 1.13.2 written by Dimitri van Heesch, © 1997-2006
KDE's Doxygen guidelines are available online.