KDE 4.7 PyKDE API Reference

KEncodingDetector Class Reference

from PyKDE4.kdecore import *

Detailed Description

Provides encoding detection capabilities.

Searches the raw data for an encoding declaration in meta and XML tags. If none is found, it falls back to heuristics for the specified language.
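The declaration search can be pictured with a small pure-Python sketch. It is not the PyKDE4 API: the function name find_declared_encoding, the 4 KB look-ahead limit and the regular expressions are assumptions for illustration, and the returned source strings only mirror the EncodingChoiceSource enumerator names.

```python
import re

# Hypothetical patterns: an XML declaration and an HTML meta charset.
_XML_DECL = re.compile(rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.I)
_META = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.I)


def find_declared_encoding(data, limit=4096):
    """Return (encoding name, source name), or (None, None) if nothing is declared."""
    head = data[:limit]  # only look at the start of the document
    m = _XML_DECL.search(head)
    if m:
        return m.group(1).decode("ascii"), "EncodingFromXMLHeader"
    m = _META.search(head)
    if m:
        return m.group(1).decode("ascii"), "EncodingFromMetaTag"
    return None, None  # the caller would fall back to language heuristics
```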

If it finds a Unicode byte order mark (BOM), it switches to the corresponding encoding regardless of what the user has specified.
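A minimal sketch of BOM sniffing in plain Python, assuming only the UTF-8 and UTF-16 marks matter; sniff_bom is a hypothetical helper, not a KEncodingDetector method.

```python
import codecs

# Hypothetical table: leading byte order mark -> encoding name.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name  # a BOM wins over any user-chosen encoding
    return None
```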

Intended lifetime of the object: one instance per document.

Typical use:

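 from PyQt4.QtCore import QByteArray
 from PyKDE4.kdecore import KEncodingDetector

 data = QByteArray()
 # ... fill data with the raw bytes of the document
 detector = KEncodingDetector()
 detector.setAutoDetectLanguage(KEncodingDetector.Cyrillic)
 out = detector.decode(data)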

Do not mix calls to decode() and decodeWithBuffering() on the same object.

Guesses the encoding of a char array.


Enumerations

AutoDetectScript { None, SemiautomaticDetection, Arabic, Baltic, CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic, Greek, Hebrew, Japanese, Korean, NorthernSaami, SouthEasternEurope, Thai, Turkish, Unicode, WesternEuropean }
EncodingChoiceSource { DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader, EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding }

Methods

 __init__ (self)
 __init__ (self, QTextCodec codec, KEncodingDetector.EncodingChoiceSource source, KEncodingDetector.AutoDetectScript script=KEncodingDetector.None)
 __init__ (self, KEncodingDetector other)
bool analyze (self, QString data, int len)
KEncodingDetector.AutoDetectScript autoDetectLanguage (self)
QString decode (self, QString data, int len)
QString decode (self, QByteArray data)
QString decodeWithBuffering (self, QString data, int len)
bool decodedInvalidCharacters (self)
QTextDecoder decoder (self)
QString encoding (self)
KEncodingDetector.EncodingChoiceSource encodingChoiceSource (self)
bool errorsIfUtf8 (self, QString data, int length)
QString flush (self)
bool processNull (self, QString data, int length)
 resetDecoder (self)
 setAutoDetectLanguage (self, KEncodingDetector.AutoDetectScript a0)
bool setEncoding (self, QString encoding, KEncodingDetector.EncodingChoiceSource type)
bool visuallyOrdered (self)

Static Methods

bool hasAutoDetectionForScript (KEncodingDetector.AutoDetectScript a0)
QString nameForScript (KEncodingDetector.AutoDetectScript a0)
KEncodingDetector.AutoDetectScript scriptForName (QString lang)

Method Documentation

__init__ (   self )

Creates a detector with Latin-1 as the default codec (as the HTML specification prescribes), EncodingChoiceSource set to DefaultEncoding and AutoDetectScript set to SemiautomaticDetection.

__init__ (  self,
QTextCodec  codec,
KEncodingDetector.EncodingChoiceSource  source,
KEncodingDetector.AutoDetectScript  script=KEncodingDetector.None
)

Creates a detector with the given default codec, EncodingChoiceSource and AutoDetectScript.

__init__ (  self,
KEncodingDetector  other
)
bool analyze (  self,
QString  data,
int  len
)

Analyze text data.

Returns:
true if there was enough data for accurate detection

KEncodingDetector.AutoDetectScript autoDetectLanguage (   self )
QString decode (  self,
QString  data,
int  len
)

The main method of the class.

Calls the protected analyze() only once, on the first call during the lifetime of the object.

Replaces all null characters with spaces.

QString decode (  self,
QByteArray  data
)

The main method of the class.

Calls the protected analyze() only once, on the first call during the lifetime of the object.

Replaces all null characters with spaces.
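The once-per-object analysis and the null replacement can be sketched in plain Python. OneShotDecoder, its BOM-only stand-in heuristic and the Latin-1 fallback are assumptions made for illustration; this is not how PyKDE4 implements decode().

```python
import codecs


class OneShotDecoder:
    """Hypothetical sketch: analysis runs only on the first decode() call."""

    def __init__(self, fallback="latin-1"):
        self._fallback = fallback
        self._decoder = None
        self.analyze_calls = 0

    def _analyze(self, data):
        self.analyze_calls += 1
        # Stand-in heuristic: honour a UTF-8 BOM, otherwise use the fallback.
        return "utf-8-sig" if data.startswith(codecs.BOM_UTF8) else self._fallback

    def decode(self, data):
        if self._decoder is None:  # first call of the object's lifetime
            name = self._analyze(data)
            self._decoder = codecs.getincrementaldecoder(name)(errors="replace")
        # Replace null characters with spaces after decoding.
        return self._decoder.decode(data).replace("\x00", " ")
```

Replacing nulls after decoding, rather than in the raw bytes, keeps multi-byte encodings such as UTF-16 intact, since their encoded characters routinely contain 0 bytes.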

QString decodeWithBuffering (  self,
QString  data,
int  len
)

Convenience method that uses buffering. It waits until the full HTML head has been buffered (that is, it calls analyze() on every call until analyze() returns true).

Replaces all null chars with spaces.

Returns:
Decoded data, or an empty string if there was not enough data for accurate detection.
See also:
flush()

bool decodedInvalidCharacters (   self )

This method checks whether invalid characters were found during a decoding operation.

Note that this flag is never reset once invalid characters have been found. To force a reset, either change the encoding using setEncoding() or call resetDecoder().

Returns:
true if invalid characters were found during decoding.
Since:
4.3
See also:
resetDecoder() setEncoding()
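A plain-Python sketch of a sticky invalid-character flag, assuming invalid input is replaced with U+FFFD. FlaggingDecoder and its per-instance error handler are hypothetical; they only illustrate how the flag survives further decoding until the decoder is reset or the encoding is changed.

```python
import codecs
import itertools

_handler_ids = itertools.count()


class FlaggingDecoder:
    """Hypothetical decoder whose invalid-character flag is sticky."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        # Register a per-instance error handler so the flag belongs to this object.
        self._handler = "flagging-decoder-%d" % next(_handler_ids)
        codecs.register_error(self._handler, self._on_error)
        self.reset_decoder()

    def _on_error(self, exc):
        self._invalid = True  # never cleared here, only by reset_decoder()
        return ("\ufffd", exc.end)

    def reset_decoder(self):
        # Drops any buffered partial sequence and clears the flag.
        self._decoder = codecs.getincrementaldecoder(self.encoding)(errors=self._handler)
        self._invalid = False

    def set_encoding(self, encoding):
        self.encoding = encoding
        self.reset_decoder()  # changing the encoding also clears the flag

    def decoded_invalid_characters(self):
        return self._invalid

    def decode(self, data):
        return self._decoder.decode(data)
```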

QTextDecoder decoder (   self )

Returns:
QTextDecoder for detected encoding

QString encoding (   self )

Convenience method.

Returns:
MIME name of the detected encoding

KEncodingDetector.EncodingChoiceSource encodingChoiceSource (   self )
bool errorsIfUtf8 (  self,
QString  data,
int  length
)

Checks whether the data really is UTF-8. Taken from Kate.

Returns:
true if the current encoding is UTF-8 and the text cannot be valid in this encoding

Note from the original source: someone should read http://de.wikipedia.org/wiki/UTF-8 and verify this code.
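A hedged sketch of the same check, swapping Kate's hand-written byte test for Python's incremental UTF-8 decoder; errors_if_utf8 is a hypothetical name. Passing final=False tolerates a multi-byte sequence that is merely cut off at the end of the buffer.

```python
import codecs


def errors_if_utf8(data, current_encoding):
    """True if current_encoding is UTF-8 but data cannot be valid UTF-8."""
    if codecs.lookup(current_encoding).name != "utf-8":
        return False  # only meaningful when UTF-8 is the current encoding
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)  # a truncated trailing sequence is not an error
    except UnicodeDecodeError:
        return True
    return False
```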

QString flush (   self )

Convenience method to be used with decodeWithBuffering(). Flushes the buffer.

See also:
decodeWithBuffering()

bool processNull (  self,
QString  data,
int  length
)

Eliminates all 0 bytes (or double bytes) and remembers whether the data was binary.
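A minimal sketch of the null-byte pass, assuming (as the decode() documentation says) that 0 bytes become spaces and that the return value reports whether any were found. process_null is hypothetical and handles single bytes only, not the double-byte case.

```python
def process_null(data):
    """Replace 0 bytes in a bytearray with spaces, in place; return True if any were found."""
    found = 0 in data  # any 0 byte suggests binary content
    if found:
        data[:] = data.replace(b"\x00", b" ")
    return found
```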

resetDecoder (   self )

Resets the decoder. Any stateful decoding information (such as that resulting from previous calls to decodeWithBuffering()) is lost. As a side effect, this also resets the state of decodedInvalidCharacters().

Since:
4.3
See also:
decodeWithBuffering() decodedInvalidCharacters()

setAutoDetectLanguage (  self,
KEncodingDetector.AutoDetectScript  a0
)
bool setEncoding (  self,
QString  encoding,
KEncodingDetector.EncodingChoiceSource  type
)

Returns:
true if the specified encoding was recognized
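A sketch of the recognition check, using Python's codec registry in place of Qt's. EncodingState is hypothetical, its defaults only mirror the default constructor described above, and keeping the previous encoding when the name is unknown is an assumption rather than documented KEncodingDetector behaviour.

```python
import codecs


class EncodingState:
    """Hypothetical holder for the current encoding and where it came from."""

    def __init__(self):
        self.encoding = "iso8859-1"  # Latin-1, as in the default constructor
        self.source = "DefaultEncoding"

    def set_encoding(self, name, source):
        try:
            info = codecs.lookup(name)
        except LookupError:
            return False  # unknown name: keep the previous encoding (assumption)
        self.encoding, self.source = info.name, source
        return True
```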

bool visuallyOrdered (   self )

Static Method Documentation

bool hasAutoDetectionForScript ( KEncodingDetector.AutoDetectScript  a0
)
QString nameForScript ( KEncodingDetector.AutoDetectScript  a0
)
KEncodingDetector.AutoDetectScript scriptForName ( QString  lang
)

Takes the language name after it has been translated with i18n(), not the untranslated name.


Enumeration Documentation

AutoDetectScript
Enumerator:
None 
SemiautomaticDetection 
Arabic 
Baltic 
CentralEuropean 
ChineseSimplified 
ChineseTraditional 
Cyrillic 
Greek 
Hebrew 
Japanese 
Korean 
NorthernSaami 
SouthEasternEurope 
Thai 
Turkish 
Unicode 
WesternEuropean 

EncodingChoiceSource
Enumerator:
DefaultEncoding 
AutoDetectedEncoding 
BOM 
EncodingFromXMLHeader 
EncodingFromMetaTag 
EncodingFromHTTPHeader 
UserChosenEncoding 

This documentation is maintained by Simon Edwards.
KDE® and the K Desktop Environment® logo are registered trademarks of KDE e.V.