kmail

EncodingDetector Class Reference

Provides encoding detection capabilities.

#include <encodingdetector.h>



Public Types

enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}
enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}

Public Member Functions

bool analyze (const QByteArray &data)
bool analyze (const char *data, int len)
AutoDetectScript autoDetectLanguage () const
const char * encoding () const
EncodingChoiceSource encodingChoiceSource () const
 EncodingDetector (QTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
 EncodingDetector ()
void setAutoDetectLanguage (AutoDetectScript)
bool setEncoding (const char *encoding, EncodingChoiceSource type)
bool visuallyOrdered () const
 ~EncodingDetector ()

Static Public Member Functions

static bool hasAutoDetectionForScript (AutoDetectScript)
static QString nameForScript (AutoDetectScript)
static AutoDetectScript scriptForLanguageCode (const QString &lang)
static AutoDetectScript scriptForName (const QString &lang)

Protected Member Functions

QTextDecoder * decoder ()
bool errorsIfUtf8 (const char *data, int length)

Detailed Description

Provides encoding detection capabilities.

Searches the raw data for an encoding declaration (in meta and XML tags). If none is found, it falls back to heuristics for the specified language.
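To illustrate what "searching for an encoding declaration" means at the byte level, here is a minimal sketch. It assumes a naive, case-insensitive substring search; `findDeclaredEncoding` is a hypothetical helper, not a member of EncodingDetector, and the real class may tokenize meta and XML tags far more carefully.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of EncodingDetector: returns the
// lowercased name from the first "charset=" (HTML meta tag) or, failing
// that, "encoding=" (XML header) found in the raw bytes, or "" if none.
std::string findDeclaredEncoding(const std::string &raw)
{
    // Lowercase a copy so the keyword search is case-insensitive.
    std::string lower(raw);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const char *const keys[] = {"charset=", "encoding="};
    for (const char *key : keys) {
        std::string::size_type pos = lower.find(key);
        if (pos == std::string::npos)
            continue;
        pos += std::string(key).size();
        // Skip an optional opening quote.
        if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
            ++pos;
        // The name ends at a quote, whitespace, ';', '>' or '/'.
        std::string::size_type end = lower.find_first_of("\"' \t\r\n;>/", pos);
        if (end == std::string::npos)
            end = lower.size();
        if (end > pos)
            return lower.substr(pos, end - pos);
    }
    return std::string();
}
```

When no declaration is found, the sketch returns an empty string; that is the point at which the class description says the language-specific heuristics take over.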

If it finds a Unicode byte order mark (BOM), it changes the encoding regardless of what the user has specified.
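A BOM check only needs to inspect the first few bytes. The sketch below assumes just the three most common marks (UTF-8, UTF-16LE, UTF-16BE); `BomEncoding` and `encodingFromBom` are hypothetical names used for illustration only.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch only.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

// Returns the encoding indicated by a leading byte order mark, or
// BomEncoding::None if the data starts with none of the marks checked here.
BomEncoding encodingFromBom(const unsigned char *data, std::size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BomEncoding::Utf8;      // EF BB BF
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BomEncoding::Utf16LE;   // FF FE
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BomEncoding::Utf16BE;   // FE FF
    return BomEncoding::None;
}
```

Because a BOM is unambiguous evidence of the byte encoding, it makes sense for a detector to let it take precedence over a user's choice, as described above.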

Intended lifetime of the object: one instance per document.

Typical use:

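 QByteArray data;
 ...
 EncodingDetector detector;
 detector.setAutoDetectLanguage(EncodingDetector::Cyrillic);
 detector.analyze(data);
 // encoding() returns the MIME name of the detected encoding
 QTextCodec *codec = QTextCodec::codecForName(detector.encoding());
 QString out = codec ? codec->toUnicode(data) : QString::fromLatin1(data.constData(), data.size());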

Do not mix decode() with decodeWithBuffering()

Guesses the encoding of a char array.

Definition at line 57 of file encodingdetector.h.


Member Enumeration Documentation

enum EncodingDetector::AutoDetectScript

Enumerator:
None 
SemiautomaticDetection 
Arabic 
Baltic 
CentralEuropean 
ChineseSimplified 
ChineseTraditional 
Cyrillic 
Greek 
Hebrew 
Japanese 
Korean 
NorthernSaami 
SouthEasternEurope 
Thai 
Turkish 
Unicode 
WesternEuropean 

Definition at line 71 of file encodingdetector.h.

enum EncodingDetector::EncodingChoiceSource

Enumerator:
DefaultEncoding 
AutoDetectedEncoding 
BOM 
EncodingFromXMLHeader 
EncodingFromMetaTag 
EncodingFromHTTPHeader 
UserChosenEncoding 

Definition at line 60 of file encodingdetector.h.


Constructor & Destructor Documentation

EncodingDetector::EncodingDetector (  ) 

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 877 of file encodingdetector.cpp.

EncodingDetector::EncodingDetector ( QTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Allows setting the default codec, the EncodingChoiceSource, and the AutoDetectScript.

Definition at line 881 of file encodingdetector.cpp.

EncodingDetector::~EncodingDetector (  ) 

Definition at line 886 of file encodingdetector.cpp.


Member Function Documentation

bool EncodingDetector::analyze ( const QByteArray &  data  ) 

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 981 of file encodingdetector.cpp.

bool EncodingDetector::analyze ( const char *  data,
int  len 
)

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 986 of file encodingdetector.cpp.

EncodingDetector::AutoDetectScript EncodingDetector::autoDetectLanguage (  )  const

Definition at line 895 of file encodingdetector.cpp.

QTextDecoder * EncodingDetector::decoder (  )  [protected]

Returns:
the QTextDecoder for the detected encoding

Definition at line 921 of file encodingdetector.cpp.

const char * EncodingDetector::encoding (  )  const

Convenience method.

Returns:
the MIME name of the detected encoding

Definition at line 905 of file encodingdetector.cpp.

EncodingDetector::EncodingChoiceSource EncodingDetector::encodingChoiceSource (  )  const

Definition at line 900 of file encodingdetector.cpp.

bool EncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
) [protected]

Checks whether the data really is UTF-8.

Taken from Kate.

Returns:
true if the current encoding is UTF-8 and the text cannot be in this encoding
Note from the source: this code still needs to be checked against http://de.wikipedia.org/wiki/UTF-8.

Definition at line 813 of file encodingdetector.cpp.
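The kind of validity test errorsIfUtf8() performs can be sketched with the standard UTF-8 well-formedness rules. This is a stand-alone illustration under that assumption, not the Kate-derived code itself; `hasUtf8Errors` is a hypothetical name, and it returns true on errors to mirror the member function's documented return value.

```cpp
// Hypothetical stand-alone check, not the member function itself.
// Returns true if `data` is NOT well-formed UTF-8, mirroring the sense
// of errorsIfUtf8()'s return value. It checks one complete buffer only:
// a multi-byte sequence cut off at the end is reported as an error.
bool hasUtf8Errors(const char *data, int length)
{
    const unsigned char *s = reinterpret_cast<const unsigned char *>(data);
    int i = 0;
    while (i < length) {
        const unsigned char c = s[i];
        if (c < 0x80) {        // ASCII byte, always valid
            ++i;
            continue;
        }
        int extra;             // number of continuation bytes expected
        unsigned int cp;       // code point being assembled
        if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;
            cp = c & 0x1Fu;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;
            cp = c & 0x0Fu;
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;
            cp = c & 0x07u;
        } else {
            return true;       // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        }
        if (i + extra >= length)
            return true;       // sequence truncated by the end of the buffer
        for (int k = 1; k <= extra; ++k) {
            const unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return true;   // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (cc & 0x3Fu);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return true;
        i += extra + 1;
    }
    return false;
}
```

For example, Latin-1 text such as "\xE9t\xE9" fails this check because 0xE9 announces a three-byte sequence that is not followed by continuation bytes, which is exactly the situation where a detector should abandon a UTF-8 guess.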

bool EncodingDetector::hasAutoDetectionForScript ( EncodingDetector::AutoDetectScript  script  )  [static]

Definition at line 1274 of file encodingdetector.cpp.

QString EncodingDetector::nameForScript ( EncodingDetector::AutoDetectScript  script  )  [static]

Definition at line 1308 of file encodingdetector.cpp.

EncodingDetector::AutoDetectScript EncodingDetector::scriptForLanguageCode ( const QString &  lang  )  [static]

Definition at line 1361 of file encodingdetector.cpp.
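scriptForLanguageCode() maps a language code to an AutoDetectScript. The mapping table is not documented here, so the following is only a hypothetical sketch of the idea: it uses a local copy of a few enumerators so it compiles on its own, strips a region suffix such as "_RU", and uses an illustrative table that may not match the one in encodingdetector.cpp.

```cpp
#include <string>

// Local copy of a few AutoDetectScript values so the sketch compiles on its own.
enum AutoDetectScript { None, Arabic, Cyrillic, Greek, Hebrew, Japanese, Korean, Thai, Turkish };

// Hypothetical mapping from a code such as "ru" or "ru_RU" to a script.
// The table below is illustrative, not the library's actual table.
AutoDetectScript scriptForLanguageCodeSketch(const std::string &lang)
{
    // Keep only the language part before any '_' or '-' region suffix.
    const std::string code = lang.substr(0, lang.find_first_of("_-"));
    if (code == "ar") return Arabic;
    if (code == "ru" || code == "uk" || code == "bg") return Cyrillic;
    if (code == "el") return Greek;
    if (code == "he") return Hebrew;
    if (code == "ja") return Japanese;
    if (code == "ko") return Korean;
    if (code == "th") return Thai;
    if (code == "tr") return Turkish;
    return None;   // no specific script: leave detection to other means
}
```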

EncodingDetector::AutoDetectScript EncodingDetector::scriptForName ( const QString &  lang  )  [static]

Takes the language name _after_ it has been translated with i18n().

Definition at line 1246 of file encodingdetector.cpp.

void EncodingDetector::setAutoDetectLanguage ( EncodingDetector::AutoDetectScript  lang  ) 

Definition at line 891 of file encodingdetector.cpp.

bool EncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)

Returns:
true if the specified encoding was recognized

Definition at line 926 of file encodingdetector.cpp.
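setEncoding() reports whether the given name was recognized. How the real method looks names up is not documented here, so this is only a hypothetical sketch of a case-insensitive check against a small, made-up list of names; `isRecognizedEncoding` and its list are assumptions for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical sketch: compares the name case-insensitively against a
// small illustrative list. The real method may use a different lookup.
bool isRecognizedEncoding(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const char *const known[] = {
        "utf-8", "iso-8859-1", "iso-8859-5", "koi8-r", "windows-1251", "shift_jis"
    };
    for (const char *k : known)
        if (name == k)
            return true;
    return false;
}
```

A false result would be the cue for a caller to keep the previous encoding rather than switch to an unknown one.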

bool EncodingDetector::visuallyOrdered (  )  const

Definition at line 911 of file encodingdetector.cpp.


The documentation for this class was generated from the following files: