KDE API Reference

KDECore

KEncodingDetector Class Reference

#include <kencodingdetector.h>

Public Types

enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}
 
enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}
 

Public Member Functions

 KEncodingDetector ()
 
 KEncodingDetector (QTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
 
 ~KEncodingDetector ()
 
AutoDetectScript autoDetectLanguage () const
 
QString decode (const char *data, int len)
 
QString decode (const QByteArray &data)
 
bool decodedInvalidCharacters () const
 
QString decodeWithBuffering (const char *data, int len)
 
const char * encoding () const
 
EncodingChoiceSource encodingChoiceSource () const
 
QString flush ()
 
void resetDecoder ()
 
void setAutoDetectLanguage (AutoDetectScript)
 
bool setEncoding (const char *encoding, EncodingChoiceSource type)
 
bool visuallyOrdered () const
 

Static Public Member Functions

static bool hasAutoDetectionForScript (AutoDetectScript)
 
static QString nameForScript (AutoDetectScript)
 
static AutoDetectScript scriptForName (const QString &lang)
 

Protected Member Functions

bool analyze (const char *data, int len)
 
QTextDecoder * decoder ()
 
bool errorsIfUtf8 (const char *data, int length)
 
bool processNull (char *data, int length)
 

Detailed Description

Provides encoding detection capabilities.

Searches the raw data for an encoding declaration (HTML meta tags and XML headers). If none is found, it falls back to heuristics for the specified language.
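
As a rough illustration of what searching for a declaration can look like, here is a minimal standalone sketch. It is not the KEncodingDetector implementation: the helper name findDeclaredCharset() and its simple substring matching are assumptions made for this example, and a real parser has to handle many more cases.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of kdelibs): returns the charset named in an
// HTML meta tag or XML declaration, or an empty string if none is found.
std::string findDeclaredCharset(const std::string& head)
{
    // Work on a lower-cased copy so the key search is case-insensitive.
    std::string lower(head);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string keys[] = {"charset=", "encoding="};
    for (const std::string& key : keys) {
        std::string::size_type pos = lower.find(key);
        if (pos == std::string::npos)
            continue;
        pos += key.size();
        // Skip an optional opening quote.
        if (pos < head.size() && (head[pos] == '"' || head[pos] == '\''))
            ++pos;
        // Collect the characters that may appear in an encoding name.
        std::string::size_type end = pos;
        while (end < head.size() &&
               (std::isalnum(static_cast<unsigned char>(head[end])) ||
                head[end] == '-' || head[end] == '_' ||
                head[end] == '.' || head[end] == ':'))
            ++end;
        if (end > pos)
            return head.substr(pos, end - pos);
    }
    return std::string();
}
```

The returned name keeps the spelling used in the document; mapping it to a codec is a separate step.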

If it finds a Unicode BOM, it switches to the corresponding encoding regardless of what the user has specified.
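
A BOM check of this kind can be sketched in a few lines of standalone C++. The helper name encodingFromBom() is hypothetical and the code is only an illustration of the idea, not the code used by this class; it covers the UTF-8 and UTF-16 marks only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of kdelibs): names the encoding implied by a
// leading byte order mark, or returns an empty string if there is none.
// Only UTF-8 and UTF-16 marks are checked in this sketch.
std::string encodingFromBom(const std::string& data)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data.data());
    const std::size_t n = data.size();

    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";
    return std::string();
}
```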

Intended lifetime of the object: one instance per document.

Typical use:

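#include <kencodingdetector.h>

QByteArray data;  // raw bytes of the document
...
KEncodingDetector detector;  // one instance per document
// Hint for the heuristics used when no encoding declaration is found.
detector.setAutoDetectLanguage(KEncodingDetector::Cyrillic);
QString out = detector.decode(data);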

Do not mix decode() with decodeWithBuffering().

Guesses the encoding of a char array.

Definition at line 58 of file kencodingdetector.h.

Member Enumeration Documentation

enum KEncodingDetector::AutoDetectScript
Enumerator
None 
SemiautomaticDetection 
Arabic 
Baltic 
CentralEuropean 
ChineseSimplified 
ChineseTraditional 
Cyrillic 
Greek 
Hebrew 
Japanese 
Korean 
NorthernSaami 
SouthEasternEurope 
Thai 
Turkish 
Unicode 
WesternEuropean 

Definition at line 72 of file kencodingdetector.h.

enum KEncodingDetector::EncodingChoiceSource
Enumerator
DefaultEncoding 
AutoDetectedEncoding 
BOM 
EncodingFromXMLHeader 
EncodingFromMetaTag 
EncodingFromHTTPHeader 
UserChosenEncoding 

Definition at line 61 of file kencodingdetector.h.

Constructor & Destructor Documentation

KEncodingDetector::KEncodingDetector ( )

The default codec is Latin-1 (as the HTML spec says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 650 of file kencodingdetector.cpp.

KEncodingDetector::KEncodingDetector ( QTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Allows setting the default codec, EncodingChoiceSource and AutoDetectScript.

Definition at line 654 of file kencodingdetector.cpp.

KEncodingDetector::~KEncodingDetector ( )

Definition at line 659 of file kencodingdetector.cpp.

Member Function Documentation

bool KEncodingDetector::analyze ( const char *  data,
int  len 
)
protected

Analyze text data.

Returns
true if there was enough data for accurate detection

Definition at line 875 of file kencodingdetector.cpp.

KEncodingDetector::AutoDetectScript KEncodingDetector::autoDetectLanguage ( ) const

Definition at line 668 of file kencodingdetector.cpp.

QString KEncodingDetector::decode ( const char *  data,
int  len 
)

The main class method.

Calls the protected analyze() only on the first call in the object's lifetime.

Replaces all null chars with spaces.

Definition at line 772 of file kencodingdetector.cpp.

QString KEncodingDetector::decode ( const QByteArray &  data)

Definition at line 784 of file kencodingdetector.cpp.

bool KEncodingDetector::decodedInvalidCharacters ( ) const

This method checks whether invalid characters were found during a decoding operation.

Note that this bit is never reset once invalid characters have been found. To force a reset, either change the encoding using setEncoding() or call resetDecoder().

Returns
a boolean reflecting said state.
Since
4.3
See also
resetDecoder() setEncoding()

Definition at line 856 of file kencodingdetector.cpp.

QTextDecoder * KEncodingDetector::decoder ( )
protected
Returns
QTextDecoder for detected encoding

Definition at line 694 of file kencodingdetector.cpp.

QString KEncodingDetector::decodeWithBuffering ( const char *  data,
int  len 
)

Convenience method that uses buffering.

It waits until the full HTML head has been buffered (i.e. it calls analyze() on every call until that returns true).

Replaces all null chars with spaces.

Returns
The decoded data, or an empty string if there was not enough data for accurate detection
See also
flush()

Definition at line 796 of file kencodingdetector.cpp.

const char * KEncodingDetector::encoding ( ) const

Convenience method.

Returns
MIME name of the detected encoding

Definition at line 678 of file kencodingdetector.cpp.

KEncodingDetector::EncodingChoiceSource KEncodingDetector::encodingChoiceSource ( ) const

Definition at line 673 of file kencodingdetector.cpp.

bool KEncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
)
protected

Checks whether the data is really UTF-8.

Taken from Kate.

Returns
true if the current encoding is UTF-8 and the text cannot be in this encoding

Please somebody read http://de.wikipedia.org/wiki/UTF-8 and check this code...

Definition at line 585 of file kencodingdetector.cpp.
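
For readers who want to see what such a check involves, the following standalone sketch tests whether a byte sequence is well-formed UTF-8. It is not the code from kencodingdetector.cpp: the name isValidUtf8() is hypothetical, and it reports validity directly rather than mirroring the return convention described above.

```cpp
#include <cstddef>

// Hypothetical helper (not part of kdelibs): returns true if the bytes are
// well-formed UTF-8. Rejects stray continuation bytes, truncated sequences,
// overlong forms, UTF-16 surrogates and code points above U+10FFFF.
bool isValidUtf8(const char* data, std::size_t length)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    std::size_t i = 0;
    while (i < length) {
        const unsigned char lead = p[i];
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point being assembled
        unsigned long min;   // smallest code point allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;   // continuation byte or invalid lead byte

        if (extra > length - i - 1)
            return false;    // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

When data arrives in chunks, a multi-byte sequence may be split across two buffers, so a caller of a check like this would need to treat a truncated tail separately from a genuine error.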

QString KEncodingDetector::flush ( )

Convenience method to be used with decodeWithBuffering().

Flushes the buffer.

See also
decodeWithBuffering()

Definition at line 861 of file kencodingdetector.cpp.

bool KEncodingDetector::hasAutoDetectionForScript ( KEncodingDetector::AutoDetectScript  script)
static

Definition at line 1173 of file kencodingdetector.cpp.

QString KEncodingDetector::nameForScript ( KEncodingDetector::AutoDetectScript  script)
static

Definition at line 1207 of file kencodingdetector.cpp.

bool KEncodingDetector::processNull ( char *  data,
int  length 
)
protected

Gets rid of all 0 bytes (or double bytes) and remembers whether the data was binary.

Definition at line 556 of file kencodingdetector.cpp.
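
The single-byte case can be pictured with the standalone sketch below, which follows the "replaces all null chars with spaces" behaviour documented for decode(). The helper name replaceNullBytes() is hypothetical, the double-byte case is not covered, and the real processNull() may differ in detail.

```cpp
#include <cstddef>

// Hypothetical helper (not part of kdelibs): overwrites every 0 byte with a
// space in place and reports whether any were found, which a caller could
// take as a hint that the input is binary.
bool replaceNullBytes(char* data, std::size_t length)
{
    bool found = false;
    for (std::size_t i = 0; i < length; ++i) {
        if (data[i] == '\0') {
            data[i] = ' ';
            found = true;
        }
    }
    return found;
}
```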

void KEncodingDetector::resetDecoder ( )

Resets the decoder.

Any stateful decoding information (such as that resulting from previous calls to decodeWithBuffering()) will be lost. Also resets the state of decodedInvalidCharacters() as a side effect.

Since
4.3
See also
decodeWithBuffering() decodedInvalidCharacters()

Definition at line 699 of file kencodingdetector.cpp.

KEncodingDetector::AutoDetectScript KEncodingDetector::scriptForName ( const QString &  lang)
static

Takes the language name after it has been passed through i18n().

Definition at line 1145 of file kencodingdetector.cpp.

void KEncodingDetector::setAutoDetectLanguage ( KEncodingDetector::AutoDetectScript  lang)

Definition at line 664 of file kencodingdetector.cpp.

bool KEncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)
Returns
true if specified encoding was recognized

Definition at line 712 of file kencodingdetector.cpp.

bool KEncodingDetector::visuallyOrdered ( ) const

Definition at line 684 of file kencodingdetector.cpp.


The documentation for this class was generated from the following files:
  • kencodingdetector.h
  • kencodingdetector.cpp
This file is part of the KDE documentation.
Documentation copyright © 1996-2020 The KDE developers.
Generated on Mon Jun 22 2020 13:22:13 by doxygen 1.8.7 written by Dimitri van Heesch, © 1997-2006
