KEncodingProber Class Reference

from PyKDE4.kdecore import *

Detailed Description

Provides encoding detection (probing) capabilities.

Probes the encoding of raw data only. If it cannot determine the encoding, it returns the most likely encoding it has guessed.

A Unicode probe is always performed, regardless of the ProberType.

Feed data to it repeatedly until the ProberState changes to FoundIt or NotMe, or until the confidence reaches a value you consider acceptable.
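The feed loop described above can be sketched as a small helper. This is a minimal illustration under assumptions, not part of PyKDE4. `probe_chunks` and `StubProber` are hypothetical names. The helper only relies on the methods listed on this page: feed(), confidence() and encodingName(). The caller passes in the FoundIt/NotMe values. `StubProber` is a toy stand-in so the loop can run without KDE. Its "FoundIt after 256 bytes" rule only mimics the note further down and is not the real detection logic.

```python
def probe_chunks(prober, chunks, finished, threshold=0.6):
    """Feed chunks until the prober settles or is confident enough.

    Assumes `prober` behaves like KEncodingProber and `finished` holds the
    FoundIt and NotMe enum values. Returns (encoding_name, confidence, last_state).
    """
    state = None
    for chunk in chunks:
        state = prober.feed(chunk)
        # FoundIt / NotMe: the prober has made up its mind, stop feeding.
        if state in finished:
            break
        # Still Probing, but the guess is already good enough for the caller.
        if prober.confidence() >= threshold:
            break
    return prober.encodingName(), prober.confidence(), state


class StubProber(object):
    """Hypothetical stand-in for KEncodingProber (NOT part of PyKDE4).

    Reports FoundIt once 256 bytes have been fed; confidence grows linearly.
    """
    FoundIt, NotMe, Probing = range(3)

    def __init__(self):
        self.seen = 0

    def feed(self, data):
        self.seen += len(data)
        return self.FoundIt if self.seen >= 256 else self.Probing

    def confidence(self):
        return min(0.99, self.seen / 256.0)

    def encodingName(self):
        return "UTF-8"


stub = StubProber()
print(probe_chunks(stub, [b"a" * 100] * 5, (StubProber.FoundIt, StubProber.NotMe)))
```

With a real prober the call would presumably be `probe_chunks(prober, chunks, (KEncodingProber.FoundIt, KEncodingProber.NotMe))`. This assumes each chunk is something feed() accepts as a QByteArray. The 0.6 default mirrors the threshold in the typical-use example. It is an application choice, not a library constant.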

Intended lifetime of the object: one instance per ProberType.

Typical use:

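 from PyQt4.QtCore import QByteArray, QTextCodec
 from PyKDE4.kdecore import KEncodingProber

 data, moredata = QByteArray(), QByteArray()
 # ... fill data and moredata with the raw bytes to probe ...
 prober = KEncodingProber(KEncodingProber.ChineseSimplified)
 prober.feed(data)
 prober.feed(moredata)
 if prober.confidence() > 0.6:
     # codecForName() returns None for names Qt does not recognise
     codec = QTextCodec.codecForName(str(prober.encodingName()))
     if codec is not None:
         out = codec.toUnicode(data)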

At least 256 characters are needed to change the ProberState from Probing to FoundIt. If you do not have that many characters to probe, use the confidence to decide for yourself whether to accept the encoding guessed so far.
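The decision described above can be written as a small policy function. This is a hedged sketch: `should_accept` is a hypothetical helper, not a PyKDE4 method. Treating NotMe as "reject" is a policy choice made here, and the 0.6 threshold is an application-level assumption. The enum values are passed in, so the function itself does not depend on how PyKDE4 exposes them.

```python
def should_accept(state, confidence, found_it, not_me, threshold=0.6):
    """Decide whether to trust the prober's current guess.

    found_it / not_me are the KEncodingProber.FoundIt / NotMe values;
    threshold is chosen by the application, not defined by KDE.
    """
    if state == found_it:
        return True   # the prober is sure of its answer
    if state == not_me:
        return False  # policy choice: treat NotMe as "no usable guess"
    # Still Probing (e.g. fewer than 256 characters fed): rely on confidence.
    return confidence >= threshold


# Plain integers stand in for the enum values in this standalone demo.
FOUND_IT, NOT_ME, PROBING = 0, 1, 2
print(should_accept(PROBING, 0.75, FOUND_IT, NOT_ME))
```

With a real prober the call would look like `should_accept(prober.state(), prober.confidence(), KEncodingProber.FoundIt, KEncodingProber.NotMe)`.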

Guesses the encoding of a char array.

Enumerations
ProberState	{ FoundIt, NotMe, Probing }
ProberType	{ None, Universal, Arabic, Baltic, CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic, Greek, Hebrew, Japanese, Korean, NorthernSaami, Other, SouthEasternEurope, Thai, Turkish, Unicode, WesternEuropean }
Methods
	__init__ (self, KEncodingProber.ProberType proberType=KEncodingProber.Universal)
float	confidence (self)
QString	encodingName (self)
KEncodingProber.ProberState	feed (self, QByteArray data)
KEncodingProber.ProberState	feed (self, QString data, int len)
KEncodingProber.ProberType	proberType (self)
	reset (self)
	setProberType (self, KEncodingProber.ProberType proberType)
KEncodingProber.ProberState	state (self)
Static Methods
QString	nameForProberType (KEncodingProber.ProberType proberType)
KEncodingProber.ProberType	proberTypeForName (QString lang)

Method Documentation

__init__	(	self,
		KEncodingProber.ProberType	proberType=KEncodingProber.Universal
	)

The default ProberType is Universal (detect all possible encodings).

float confidence ( self )

Returns:: the confidence (sureness) of the encoding guessed so far, in the range 0.0 to 0.99; not very reliable for single-byte encodings

QString encodingName ( self )

Returns:: the name of the best encoding it guessed so far

KEncodingProber.ProberState feed	(	self,
		QByteArray	data
	)

The main class method.

Feeds data to the prober.

Returns:: the ProberState after probing the fed data

KEncodingProber.ProberState feed	(	self,
		QString	data,
		int	len
	)

QString nameForProberType	(	KEncodingProber.ProberType	proberType
	)

Maps a ProberType to a language string.

KEncodingProber.ProberType proberType ( self )

KEncodingProber.ProberType proberTypeForName	(	QString	lang
	)

Returns:: the ProberType for lang (e.g. proberTypeForName("Chinese Simplified") returns KEncodingProber.ChineseSimplified)

reset ( self )

Resets the prober's internal state and data.
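Given the intended lifetime of one instance per ProberType, one prober can presumably be reused for several independent inputs by calling reset() before each one. This is a sketch under that assumption. `detect_each` is a hypothetical helper, and `StubProber` is a toy stand-in, not part of PyKDE4, whose "guess" is deliberately trivial. The stub only exists to show why stale state from the previous input must be cleared.

```python
def detect_each(prober, documents):
    """Return the prober's best-guess encoding name for each document.

    One prober is reused; reset() before every document clears whatever the
    previous document left behind, so guesses do not leak between inputs.
    """
    names = []
    for data in documents:
        prober.reset()
        prober.feed(data)
        names.append(prober.encodingName())
    return names


class StubProber(object):
    """Hypothetical stand-in (NOT part of PyKDE4): 'guesses' US-ASCII
    unless a byte >= 0x80 has been fed since the last reset()."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.high_byte_seen = False

    def feed(self, data):
        if any(b >= 0x80 for b in bytearray(data)):
            self.high_byte_seen = True

    def encodingName(self):
        return "unknown" if self.high_byte_seen else "US-ASCII"


print(detect_each(StubProber(), [b"\xe4\xb8\xad", b"abc"]))
```

Without the reset() call, the high byte from the first document would still affect the guess for the second one.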

setProberType	(	self,
		KEncodingProber.ProberType	proberType
	)

Changes the current prober's ProberType and resets the prober.

KEncodingProber.ProberState state ( self )

Returns:: the prober's current ProberState

Enumeration Documentation

ProberState

Enumerator:

FoundIt
NotMe
Probing: need more data to make a decision

ProberType

Enumerator:

None
Universal
Arabic
Baltic
CentralEuropean
ChineseSimplified
ChineseTraditional
Cyrillic
Greek
Hebrew
Japanese
Korean
NorthernSaami
Other
SouthEasternEurope
Thai
Turkish
Unicode
WesternEuropean
