#include <kencodingdetector.h>

Public Types

enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}
enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}

Public Member Functions

 KEncodingDetector ()
 KEncodingDetector (QTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
AutoDetectScript autoDetectLanguage () const
QString decode (const char *data, int len)
QString decode (const QByteArray &data)
bool decodedInvalidCharacters () const
QString decodeWithBuffering (const char *data, int len)
const char * encoding () const
EncodingChoiceSource encodingChoiceSource () const
QString flush ()
void resetDecoder ()
void setAutoDetectLanguage (AutoDetectScript)
bool setEncoding (const char *encoding, EncodingChoiceSource type)
bool visuallyOrdered () const

Static Public Member Functions

static bool hasAutoDetectionForScript (AutoDetectScript)
static QString nameForScript (AutoDetectScript)
static AutoDetectScript scriptForName (const QString &lang)

Protected Member Functions

bool analyze (const char *data, int len)
QTextDecoder * decoder ()
bool errorsIfUtf8 (const char *data, int length)
bool processNull (char *data, int length)

Detailed Description

Provides encoding detection capabilities.

Searches for an encoding declaration inside the raw data (meta and XML tags). If it cannot find one, it falls back to heuristics for the specified language.
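
The declaration search described above can be illustrated with a minimal sketch in standard C++. It is not the class's implementation: findDeclaredCharset() is a hypothetical helper, and the sketch assumes that looking for a charset= or encoding= attribute in the raw bytes is a reasonable approximation of scanning meta and XML tags.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of KEncodingDetector): returns the lowercased
// encoding name declared via charset=... or encoding=..., or an empty string
// when no declaration is found.
std::string findDeclaredCharset(const std::string &data)
{
    // Lowercase a copy so that the keyword search is case-insensitive.
    std::string lower(data);
    for (char &ch : lower)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));

    const std::string keywords[] = {"charset", "encoding"};
    for (const std::string &keyword : keywords) {
        std::size_t pos = lower.find(keyword);
        if (pos == std::string::npos)
            continue;
        pos += keyword.size();
        // Allow whitespace between the keyword and '='.
        while (pos < lower.size() && std::isspace(static_cast<unsigned char>(lower[pos])))
            ++pos;
        if (pos >= lower.size() || lower[pos] != '=')
            continue;
        ++pos;
        // Skip whitespace and an optional opening quote.
        while (pos < lower.size()
               && (std::isspace(static_cast<unsigned char>(lower[pos]))
                   || lower[pos] == '"' || lower[pos] == '\''))
            ++pos;
        // Read the encoding name itself.
        std::size_t end = pos;
        while (end < lower.size()
               && (std::isalnum(static_cast<unsigned char>(lower[end]))
                   || lower[end] == '-' || lower[end] == '_'
                   || lower[end] == '.' || lower[end] == ':'))
            ++end;
        if (end > pos)
            return lower.substr(pos, end - pos);
    }
    return std::string();
}
```

When the helper returns an empty string, a detector built this way would have to fall back to language-specific heuristics.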

If it finds a Unicode BOM (byte order mark), it changes the encoding regardless of what the user has specified.
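
As an illustration of the BOM rule, the following sketch checks for the UTF-8 and UTF-16 byte order marks and lets a detected BOM override the user's choice. Both function names are hypothetical, the sketch ignores UTF-32 marks, and it does not claim to mirror the class's internals.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the encoding implied by a leading byte order
// mark, or an empty string when the data does not start with one.
std::string encodingFromBom(const std::string &data)
{
    const auto byteAt = [&data](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };
    if (data.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return "UTF-8";
    if (data.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return "UTF-16BE";
    if (data.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return "UTF-16LE";
    return std::string();
}

// Hypothetical helper: a BOM, when present, wins over the user's choice.
std::string chooseEncoding(const std::string &data, const std::string &userChoice)
{
    const std::string fromBom = encodingFromBom(data);
    return fromBom.empty() ? userChoice : fromBom;
}
```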

Intended lifetime of the object: one instance per document.

Typical use:

KEncodingDetector detector;
QString out=detector.decode(data);

Do not mix decode() with decodeWithBuffering().

Guess encoding of char array

Definition at line 57 of file kencodingdetector.h.

Constructor & Destructor Documentation

KEncodingDetector::KEncodingDetector ( )

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 646 of file kencodingdetector.cpp.

KEncodingDetector::KEncodingDetector ( QTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Sets the default codec, the EncodingChoiceSource, and the AutoDetectScript.

Definition at line 650 of file kencodingdetector.cpp.

Member Function Documentation

bool KEncodingDetector::analyze ( const char *  data,
int  len 
)

Analyze text data.

Returns
true if there was enough data for accurate detection

Definition at line 859 of file kencodingdetector.cpp.

QString KEncodingDetector::decode ( const char *  data,
int  len 
)

The main class method.

Calls the protected analyze() only the first time it is invoked during the object's lifetime.

Replaces all null chars with spaces.

Definition at line 767 of file kencodingdetector.cpp.

bool KEncodingDetector::decodedInvalidCharacters ( ) const

This method checks whether invalid characters were found during a decoding operation.

Note that this bit is never reset once invalid characters have been found. To force a reset, either change the encoding using setEncoding() or call resetDecoder().

Returns
a boolean reflecting that state.
See also
resetDecoder() setEncoding()

Definition at line 839 of file kencodingdetector.cpp.

QTextDecoder * KEncodingDetector::decoder ( )

Returns
QTextDecoder for the detected encoding

Definition at line 690 of file kencodingdetector.cpp.

QString KEncodingDetector::decodeWithBuffering ( const char *  data,
int  len 
)

Convenience method that uses buffering.

It waits for the full HTML head to be buffered (i.e. it calls analyze() on every invocation until analyze() returns true).

Replaces all null chars with spaces.

Returns
the decoded data, or an empty string if there was not enough data for accurate detection
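
The buffering contract can be sketched in standard C++ as follows. HeadBufferingDecoder is a hypothetical stand-in rather than the real class: its "decoding" is the identity, and its analyze() assumes that seeing </head> or more than 1024 buffered bytes counts as enough data. The real heuristics are not described on this page.

```cpp
#include <string>

// Hypothetical stand-in that mimics the documented contract of
// decodeWithBuffering() and flush(). No real decoding takes place.
class HeadBufferingDecoder
{
public:
    // Returns text once enough data has been seen, an empty string before that.
    std::string feed(const std::string &chunk)
    {
        if (m_analyzed)
            return chunk; // detection is done, pass data straight through
        m_buffer += chunk;
        if (!analyze(m_buffer))
            return std::string(); // keep buffering
        m_analyzed = true;
        std::string out;
        out.swap(m_buffer); // hand over everything buffered so far
        return out;
    }

    // Returns whatever is still buffered, e.g. at the end of a short document.
    std::string flush()
    {
        std::string out;
        out.swap(m_buffer);
        return out;
    }

private:
    // Assumed criterion: the head is complete or the buffer is large enough.
    static bool analyze(const std::string &data)
    {
        return data.find("</head>") != std::string::npos || data.size() > 1024;
    }

    std::string m_buffer;
    bool m_analyzed = false;
};
```

A caller feeds every incoming chunk through feed(), appends the results, and calls flush() once after the last chunk so that a document too short to trigger detection is not lost.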

Definition at line 789 of file kencodingdetector.cpp.

const char * KEncodingDetector::encoding ( ) const

Convenience method.

Returns
the MIME name of the detected encoding

Definition at line 674 of file kencodingdetector.cpp.

bool KEncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
)

Checks whether the data is really UTF-8.

Taken from Kate.

Returns
true if the current encoding is UTF-8 and the text cannot be in this encoding

Please somebody read and check this code...
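
For readers who want to see what such a check involves, here is a minimal, independent UTF-8 validity test in standard C++. It is not the code taken from Kate: isValidUtf8() is a hypothetical helper, and unlike a streaming decoder it treats a multi-byte sequence cut off at the end of the buffer as invalid.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes form well-formed UTF-8.
// A check in the spirit of errorsIfUtf8() would report an error when this
// returns false while the current encoding is UTF-8.
bool isValidUtf8(const std::string &s)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned int minCodePoint[] = {0x0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra; // number of continuation bytes expected
        unsigned int cp;   // code point being assembled
        if (lead < 0x80) {
            extra = 0; cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + extra >= n)
            return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < minCodePoint[extra])
            return false; // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false; // out of range or UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```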

Definition at line 586 of file kencodingdetector.cpp.

QString KEncodingDetector::flush ( )

Convenience method to be used with decodeWithBuffering().

Flushes buffer.

Definition at line 844 of file kencodingdetector.cpp.

bool KEncodingDetector::processNull ( char *  data,
int  length 
)

Eliminates all 0 bytes (or double bytes) from the data and remembers whether the data was binary.
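
Since decode() is documented as replacing null characters with spaces, the single-byte case can be sketched as below. replaceNullBytes() is a hypothetical helper in standard C++; it does not handle the double-byte case and is not the class's implementation.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replaces every 0 byte with a space, in place, and
// reports whether any 0 byte was present (a hint that the data may be binary).
bool replaceNullBytes(std::string &data)
{
    const bool hadNull = data.find('\0') != std::string::npos;
    std::replace(data.begin(), data.end(), '\0', ' ');
    return hadNull;
}
```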

Definition at line 563 of file kencodingdetector.cpp.

void KEncodingDetector::resetDecoder ( )

Resets the decoder.

Any stateful decoding information (such as that resulting from previous calls to decodeWithBuffering()) will be lost. As a side effect, this also resets the state of decodedInvalidCharacters().

See also
decodeWithBuffering() decodedInvalidCharacters()

Definition at line 695 of file kencodingdetector.cpp.

KEncodingDetector::AutoDetectScript KEncodingDetector::scriptForName ( const QString &  lang )

Takes the language name after it has been translated with i18n().

Definition at line 1119 of file kencodingdetector.cpp.

bool KEncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)

Returns
true if the specified encoding was recognized

Definition at line 709 of file kencodingdetector.cpp.

The documentation for this class was generated from the following files:
kencodingdetector.h
kencodingdetector.cpp

This file is part of the KDE documentation.
Documentation copyright © 1996-2021 The KDE developers.
Generated on Wed Jan 20 2021 22:50:01 by doxygen 1.8.11 written by Dimitri van Heesch, © 1997-2006