Sonnet::GuessLanguage

Sonnet::GuessLanguage Class Reference

#include <Sonnet/GuessLanguage>

Public Member Functions

 GuessLanguage ()
 
 GuessLanguage (const GuessLanguage &)=delete
 
 ~GuessLanguage ()
 
QString identify (const QString &text, const QStringList &suggestions=QStringList()) const
 
GuessLanguage & operator= (const GuessLanguage &)=delete
 
void setLimits (int maxItems, double minConfidence)
 

Detailed Description

GuessLanguage determines the language of a given text.

GuessLanguage can distinguish between about 75 languages for a given string. It is based on Languid, a Perl script originally written by Maciej Ceglowski. His script used a two-part heuristic to determine the language. First, the text is checked for the scripts it contains. Then, for each set of languages using those scripts, an n-gram frequency model of each language is compared to a model of the text. The language with the most similar model is assumed to be the language of the text. If no language is found, an empty string is returned.

Author
Jacob Rideout
Since
4.3

Definition at line 39 of file guesslanguage.h.
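
The n-gram comparison step of the heuristic described above can be illustrated with a small standalone sketch. It is not the Sonnet implementation: the helper names (`buildProfile`, `profileDistance`, `closestLanguage`), the penalty constant, and the "out of place" rank distance are assumptions chosen for illustration, since this page does not specify the similarity measure. The sketch also handles only single-byte text and skips the script-detection step.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A profile lists trigrams from most frequent to least frequent.
typedef std::vector<std::string> Profile;

// Assumed penalty for a trigram that the language model does not contain.
const std::size_t kMissingPenalty = 300;

// Hypothetical helper: build a ranked trigram profile from single-byte text.
Profile buildProfile(const std::string &text)
{
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        ++counts[text.substr(i, 3)];
    }
    std::vector<std::pair<std::string, int> > sorted(counts.begin(), counts.end());
    // Descending frequency; ties keep the alphabetical order of std::map.
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const std::pair<std::string, int> &a,
                        const std::pair<std::string, int> &b) {
                         return a.second > b.second;
                     });
    Profile profile;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        profile.push_back(sorted[i].first);
    }
    return profile;
}

// Sum of rank differences between the sample and the model (lower is closer).
std::size_t profileDistance(const Profile &model, const Profile &sample)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        Profile::const_iterator it = std::find(model.begin(), model.end(), sample[i]);
        if (it == model.end()) {
            total += kMissingPenalty;
        } else {
            const std::size_t rank = static_cast<std::size_t>(it - model.begin());
            total += rank > i ? rank - i : i - rank;
        }
    }
    return total;
}

// Return the key of the closest model, or an empty string if the text is
// too short to yield any trigram.
std::string closestLanguage(const std::map<std::string, Profile> &models,
                            const std::string &text)
{
    const Profile sample = buildProfile(text);
    if (sample.empty()) {
        return std::string();
    }
    std::string best;
    std::size_t bestDistance = 0;
    bool first = true;
    for (std::map<std::string, Profile>::const_iterator it = models.begin();
         it != models.end(); ++it) {
        const std::size_t d = profileDistance(it->second, sample);
        if (first || d < bestDistance) {
            best = it->first;
            bestDistance = d;
            first = false;
        }
    }
    return best;
}
```

In this sketch, models would be built once from representative text for each language and then reused for every call to `closestLanguage`.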

Constructor & Destructor Documentation

◆ GuessLanguage()

Sonnet::GuessLanguage::GuessLanguage ( )

Constructor.

Creates a new GuessLanguage instance. The constructor takes no arguments; the text to be checked is passed to identify().

Definition at line 543 of file guesslanguage.cpp.

◆ ~GuessLanguage()

Sonnet::GuessLanguage::~GuessLanguage ( )

Destructor.

Definition at line 548 of file guesslanguage.cpp.

Member Function Documentation

◆ identify()

QString Sonnet::GuessLanguage::identify ( const QString & text,
const QStringList & suggestions = QStringList() 
) const

Returns the two-letter ISO 639-1 code for the language of the given text.

A three-letter code is returned only when no two-letter code exists.

Parameters
text: the text to be identified
Returns
the code of the presumed language of the text. An empty string means that the language could not be determined with the confidence required by setLimits()

Definition at line 553 of file guesslanguage.cpp.

◆ setLimits()

void Sonnet::GuessLanguage::setLimits ( int  maxItems,
double  minConfidence 
)

Sets limits on the number of languages returned by identify().

The confidence for each language is computed as the difference between it and the next language on the list, normalized to the 0-1 range. A reasonable value for a fairly sure result is 0.1. By default, the best guess is returned regardless of confidence, exactly as after a call to setLimits(1,0).

Parameters
maxItems: the list returned by identify() will never have more than maxItems items
minConfidence: the list will have only enough items for their combined confidence to equal or exceed minConfidence

Definition at line 619 of file guesslanguage.cpp.
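
The interaction of maxItems and minConfidence can be sketched in standalone C++. This is one possible reading of the description above, not the code in guesslanguage.cpp: the `Candidate` type and the `applyLimits` helper are hypothetical. The sketch also assumes that lower scores mean a closer match, that the gap is normalized by the next candidate's score, and that the last candidate contributes no confidence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate: a language code and a distance (lower is closer).
struct Candidate {
    std::string language;
    double distance;
};

// Hypothetical helper. `sorted` must be ordered by ascending distance.
// Returns at most maxItems languages, stopping as soon as the accumulated
// confidence reaches minConfidence. Returns an empty list if the required
// confidence cannot be reached within maxItems.
std::vector<std::string> applyLimits(const std::vector<Candidate> &sorted,
                                     int maxItems, double minConfidence)
{
    std::vector<std::string> result;
    double confidence = 0.0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (static_cast<int>(result.size()) >= maxItems) {
            break;
        }
        result.push_back(sorted[i].language);
        // Gap to the next candidate, normalized to the 0-1 range (assumed formula).
        if (i + 1 < sorted.size() && sorted[i + 1].distance > 0.0) {
            confidence += (sorted[i + 1].distance - sorted[i].distance)
                          / sorted[i + 1].distance;
        }
        if (confidence >= minConfidence) {
            break;
        }
    }
    if (confidence < minConfidence) {
        result.clear();
    }
    return result;
}
```

Under these assumptions, `applyLimits(candidates, 1, 0.0)` always yields the single best guess, which matches the documented default of setLimits(1,0).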


The documentation for this class was generated from the following files:
guesslanguage.h
guesslanguage.cpp

This file is part of the KDE documentation.
Documentation copyright © 1996-2022 The KDE developers.
Generated on Wed Sep 28 2022 04:06:04 by doxygen 1.8.17 written by Dimitri van Heesch, © 1997-2006
