KDE 4.0 API Reference

KDECore

KEncodingDetector Class Reference

#include <kencodingdetector.h>



Detailed Description

Provides encoding detection capabilities.

Searches the raw data for an encoding declaration in meta and XML tags. If it cannot find one, it uses heuristics for the specified language.
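A rough, Qt-free sketch of the declaration search, assuming only two forms are looked for: the encoding attribute of an XML header and a charset= value inside an HTML meta tag. The function name findDeclaredEncoding is hypothetical and is not part of the KEncodingDetector API; the real implementation's parsing rules may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of KEncodingDetector): returns the charset
// declared in an XML header or an HTML meta tag, or "" if none is found.
std::string findDeclaredEncoding(const std::string& raw) {
    // Work on a lower-cased copy so tag and attribute names match case-insensitively.
    std::string text(raw);
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Reads a value starting at pos, skipping an optional opening quote and
    // stopping at a quote, whitespace, ';', '/' or '>'.
    auto readValue = [&](std::string::size_type pos) {
        if (pos < text.size() && (text[pos] == '"' || text[pos] == '\'')) ++pos;
        std::string::size_type end = pos;
        while (end < text.size() && text[end] != '"' && text[end] != '\'' &&
               text[end] != ';' && text[end] != '>' && text[end] != '/' &&
               !std::isspace(static_cast<unsigned char>(text[end])))
            ++end;
        return text.substr(pos, end - pos);
    };

    // 1. XML declaration: <?xml version="1.0" encoding="..."?>
    if (text.compare(0, 5, "<?xml") == 0) {
        auto declEnd = text.find("?>");
        auto enc = text.find("encoding=");
        if (enc != std::string::npos && enc < declEnd)
            return readValue(enc + 9);
    }

    // 2. HTML meta tag: <meta charset="..."> or content="text/html; charset=..."
    for (auto meta = text.find("<meta"); meta != std::string::npos;
         meta = text.find("<meta", meta + 5)) {
        auto tagEnd = text.find('>', meta);
        auto cs = text.find("charset=", meta);
        if (cs != std::string::npos && cs < tagEnd)
            return readValue(cs + 8);
    }
    return "";  // no declaration: a detector would fall back to heuristics
}
```

Returning the name lower-cased is harmless because encoding names are compared case-insensitively; an empty result is the point at which language-specific heuristics would take over.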

If it finds a Unicode BOM (byte order mark), it switches to the corresponding encoding regardless of what the user has specified.
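The BOM check can be pictured as comparing the leading bytes against the standard Unicode byte order marks. encodingFromBom is a hypothetical helper; which of these BOMs KEncodingDetector itself recognizes is not stated on this page, so the UTF-32 cases are included only for completeness.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper illustrating the idea: inspects the first bytes of
// the raw data and returns the encoding implied by a Unicode byte order
// mark, or "" when there is no BOM.
std::string encodingFromBom(const unsigned char* data, std::size_t len) {
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    // UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with FF FE.
    if (len >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (len >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "";
}
```

A non-empty result would take precedence over any declared or user-chosen encoding, which is the behaviour described above.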

Intended lifetime of the object: one instance per document.

Typical use:

 #include <kencodingdetector.h>

 QByteArray data;
 // ... fill data with the raw bytes of the document ...
 KEncodingDetector detector;
 detector.setAutoDetectLanguage(KEncodingDetector::Cyrillic);
 QString out = detector.decode(data);

Do not mix calls to decode() and decodeWithBuffering().

Guesses the encoding of a char array.

Definition at line 58 of file kencodingdetector.h.


Public Types

enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}
enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}

Public Member Functions

 KEncodingDetector ()
 KEncodingDetector (QTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
 ~KEncodingDetector ()
bool setEncoding (const char *encoding, EncodingChoiceSource type)
const char * encoding () const
bool visuallyOrdered () const
void setAutoDetectLanguage (AutoDetectScript)
AutoDetectScript autoDetectLanguage () const
EncodingChoiceSource encodingChoiceSource () const
QString decode (const char *data, int len)
QString decode (const QByteArray &data)
QString decodeWithBuffering (const char *data, int len)
QString flush ()

Static Public Member Functions

static AutoDetectScript scriptForName (const QString &lang)
static QString nameForScript (AutoDetectScript)
static bool hasAutoDetectionForScript (AutoDetectScript)

Protected Member Functions

bool processNull (char *data, int length)
bool errorsIfUtf8 (const char *data, int length)
bool analyze (const char *data, int len)
QTextDecoder * decoder ()

Member Enumeration Documentation

enum KEncodingDetector::EncodingChoiceSource

Enumerator:
DefaultEncoding 
AutoDetectedEncoding 
BOM 
EncodingFromXMLHeader 
EncodingFromMetaTag 
EncodingFromHTTPHeader 
UserChosenEncoding 

Definition at line 61 of file kencodingdetector.h.

enum KEncodingDetector::AutoDetectScript

Enumerator:
None 
SemiautomaticDetection 
Arabic 
Baltic 
CentralEuropean 
ChineseSimplified 
ChineseTraditional 
Cyrillic 
Greek 
Hebrew 
Japanese 
Korean 
NorthernSaami 
SouthEasternEurope 
Thai 
Turkish 
Unicode 
WesternEuropean 

Definition at line 72 of file kencodingdetector.h.


Constructor & Destructor Documentation

KEncodingDetector::KEncodingDetector (  ) 

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 644 of file kencodingdetector.cpp.

KEncodingDetector::KEncodingDetector ( QTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Allows setting the default codec, the EncodingChoiceSource, and the AutoDetectScript.

Definition at line 648 of file kencodingdetector.cpp.

KEncodingDetector::~KEncodingDetector (  ) 

Definition at line 653 of file kencodingdetector.cpp.


Member Function Documentation

bool KEncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)

Returns:
true if the specified encoding was recognized

Definition at line 693 of file kencodingdetector.cpp.

const char * KEncodingDetector::encoding (  )  const

Convenience method.

Returns:
MIME name of the detected encoding

Definition at line 672 of file kencodingdetector.cpp.

bool KEncodingDetector::visuallyOrdered (  )  const

Definition at line 678 of file kencodingdetector.cpp.

void KEncodingDetector::setAutoDetectLanguage ( KEncodingDetector::AutoDetectScript  lang  ) 

Definition at line 658 of file kencodingdetector.cpp.

KEncodingDetector::AutoDetectScript KEncodingDetector::autoDetectLanguage (  )  const

Definition at line 662 of file kencodingdetector.cpp.

KEncodingDetector::EncodingChoiceSource KEncodingDetector::encodingChoiceSource (  )  const

Definition at line 667 of file kencodingdetector.cpp.

QString KEncodingDetector::decode ( const char *  data,
int  len 
)

The main method of the class.

Calls the protected analyze() only once, on the first call in the object's lifetime.

Replaces all null characters with spaces.

Definition at line 748 of file kencodingdetector.cpp.
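The contract described above (analysis on the first call only, plus null-to-space replacement) can be illustrated with a small hypothetical class. It is Qt-free and therefore returns the bytes as a std::string rather than a decoded QString, and its analyze() is a placeholder that only looks for a UTF-8 BOM; neither reflects the real detection logic.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical, Qt-free illustration of the decode() contract: analysis runs
// only on the first call, and null characters are replaced with spaces.
// A real decoder would also convert the bytes to Unicode (KEncodingDetector
// returns a QString); this sketch returns the bytes unchanged.
class OnceAnalyzingDecoder {
public:
    std::string decode(std::string data) {
        if (!analyzed_) {  // analyze() only on the very first call
            analyze(data);
            analyzed_ = true;
        }
        std::replace(data.begin(), data.end(), '\0', ' ');
        return data;
    }
    const std::string& encoding() const { return encoding_; }
    int analyzeCalls() const { return analyzeCalls_; }

private:
    void analyze(const std::string& data) {
        ++analyzeCalls_;
        // Placeholder heuristic: a UTF-8 BOM selects UTF-8, otherwise keep
        // the Latin-1 default mentioned in the constructor documentation.
        encoding_ = data.compare(0, 3, "\xEF\xBB\xBF") == 0 ? "UTF-8" : "ISO-8859-1";
    }

    bool analyzed_ = false;
    int analyzeCalls_ = 0;
    std::string encoding_ = "ISO-8859-1";
};
```

Because the decision is made once, a BOM that only appears in a later chunk has no effect; this is one reason the documentation asks for one instance per document.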

QString KEncodingDetector::decode ( const QByteArray &  data  ) 

Definition at line 760 of file kencodingdetector.cpp.

QString KEncodingDetector::decodeWithBuffering ( const char *  data,
int  len 
)

Convenience method that uses buffering.

It waits until the full HTML head has been buffered (i.e. it calls analyze() on each call until analyze() returns true).

Replaces all null characters with spaces.

Returns:
Decoded data, or an empty string if there was not enough data for accurate detection
See also:
flush()

Definition at line 772 of file kencodingdetector.cpp.
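A hypothetical sketch of the buffering pattern: chunks are accumulated until a stand-in analyze() reports success (assumed here to mean that a closing </head> tag has arrived), an empty string is returned until then, and flush() hands back whatever is still buffered. The class and its stand-in analyze() are illustrative only; the member names merely mirror the documented methods.

```cpp
#include <cassert>
#include <string>

// Hypothetical, Qt-free sketch of the decodeWithBuffering()/flush() pattern.
class BufferingDecoder {
public:
    std::string decodeWithBuffering(const std::string& chunk) {
        if (ready_) return chunk;          // detection already done: pass through
        buffer_ += chunk;
        if (!analyze(buffer_)) return "";  // not enough data yet
        ready_ = true;
        std::string out;
        out.swap(buffer_);                 // emit everything buffered so far
        return out;
    }

    // Returns (and clears) anything still held back, e.g. at end of input.
    std::string flush() {
        std::string out;
        out.swap(buffer_);
        return out;
    }

private:
    // Stand-in for analyze(): true once the whole HTML head is available.
    static bool analyze(const std::string& data) {
        return data.find("</head>") != std::string::npos;
    }

    std::string buffer_;
    bool ready_ = false;
};
```

Because this stand-in analyze() never gives up, a short document without </head> stays buffered until flush() is called, which shows why the two methods are meant to be used together.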

QString KEncodingDetector::flush (  ) 

Convenience method to be used with decodeWithBuffering().

Flushes the buffer.

See also:
decodeWithBuffering()

Definition at line 839 of file kencodingdetector.cpp.

KEncodingDetector::AutoDetectScript KEncodingDetector::scriptForName ( const QString &  lang  )  [static]

Takes the language name _after_ it has been passed through i18n(), i.e. the translated name.

Definition at line 1112 of file kencodingdetector.cpp.

QString KEncodingDetector::nameForScript ( KEncodingDetector::AutoDetectScript  script  )  [static]

Definition at line 1174 of file kencodingdetector.cpp.

bool KEncodingDetector::hasAutoDetectionForScript ( KEncodingDetector::AutoDetectScript  script  )  [static]

Definition at line 1140 of file kencodingdetector.cpp.
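The lookups done by scriptForName() and nameForScript() can be pictured as a two-way table between enumerators and display strings. In this sketch the enum is a redeclared subset of AutoDetectScript, the English display names are assumptions (the real functions work with i18n()-translated names), and falling back to None for an unknown name is also an assumption. hasAutoDetectionForScript() is not modeled, because this page does not list which scripts have detectors.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Subset of KEncodingDetector::AutoDetectScript, redeclared so the sketch
// compiles without kdelibs.
enum class Script { None, SemiautomaticDetection, Arabic, Cyrillic, Greek, Japanese, Unicode, WesternEuropean };

// Hypothetical, untranslated display names.
const std::array<std::pair<Script, const char*>, 8> kNames{{
    {Script::None, "None"},
    {Script::SemiautomaticDetection, "Semi-automatic"},
    {Script::Arabic, "Arabic"},
    {Script::Cyrillic, "Cyrillic"},
    {Script::Greek, "Greek"},
    {Script::Japanese, "Japanese"},
    {Script::Unicode, "Unicode"},
    {Script::WesternEuropean, "Western European"},
}};

std::string nameForScript(Script s) {
    for (const auto& [script, name] : kNames)
        if (script == s) return name;
    return "";
}

Script scriptForName(const std::string& name) {
    for (const auto& [script, label] : kNames)
        if (label == name) return script;
    return Script::None;  // unknown names fall back to None (an assumption)
}
```

With translated names, the same table would have to be built from the translated strings, which is why scriptForName() expects its argument after i18n().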

bool KEncodingDetector::processNull ( char *  data,
int  length 
) [protected]

Gets rid of all 0 bytes (or double bytes) and remembers whether the data was binary or not.

Definition at line 550 of file kencodingdetector.cpp.
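A minimal stand-in with the same parameter shape as processNull() shows the in-place part of the job. replaceNulls is a hypothetical name, and treating "contains a 0 byte" as the binary signal is an assumption for illustration; the real heuristic, including its handling of double bytes, is not described here.

```cpp
#include <cassert>

// Hypothetical stand-in for the protected processNull(): overwrites every
// 0 byte in place with a space and reports whether any were found.
bool replaceNulls(char* data, int length) {
    bool sawNull = false;
    for (int i = 0; i < length; ++i) {
        if (data[i] == '\0') {
            data[i] = ' ';
            sawNull = true;  // assumed binary signal, for illustration only
        }
    }
    return sawNull;
}
```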

bool KEncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
) [protected]

Checks whether the data really is UTF-8.

Taken from Kate.

Returns:
true if the current encoding is UTF-8 and the text cannot be in this encoding
Please, somebody read http://de.wikipedia.org/wiki/UTF-8 and check this code.

Definition at line 579 of file kencodingdetector.cpp.
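A generic UTF-8 well-formedness check conveys what errorsIfUtf8() is for: Latin-1 text, such as a lone 0xE9 byte for "é", cannot be valid UTF-8. This is a standard validator written for illustration (it rejects truncated sequences, stray continuation bytes, overlong forms, surrogates, and values above U+10FFFF), not the Kate-derived code the class uses; hasUtf8Errors is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if the bytes are NOT well-formed UTF-8.
bool hasUtf8Errors(const unsigned char* s, std::size_t len) {
    std::size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) { ++i; continue; }  // ASCII byte

        std::size_t extra = 0;            // number of continuation bytes
        unsigned int cp = 0;              // decoded code point
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return true;                 // stray continuation or invalid lead byte

        if (extra >= len - i) return true;  // sequence truncated by end of data
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return true;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values beyond U+10FFFF.
        static const unsigned int minForLength[] = {0, 0x80, 0x800, 0x10000};
        if (cp < minForLength[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return true;
        i += extra + 1;
    }
    return false;
}
```

Applied to the bytes 'c', 'a', 'f', 0xE9 it returns true, while 'c', 'a', 'f', 0xC3, 0xA9 (the UTF-8 form of the same word) passes.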

bool KEncodingDetector::analyze ( const char *  data,
int  len 
) [protected]

Analyzes text data.

Returns:
true if there was enough data for accurate detection

Definition at line 853 of file kencodingdetector.cpp.

QTextDecoder * KEncodingDetector::decoder (  )  [protected]

Returns:
QTextDecoder for the detected encoding

Definition at line 688 of file kencodingdetector.cpp.


The documentation for this class was generated from the following files:
  • kencodingdetector.h
  • kencodingdetector.cpp

Generated for kdelibs by doxygen 1.5.4
This website is maintained by Adriaan de Groot and Allen Winter.
KDE® and the K Desktop Environment® logo are registered trademarks of KDE e.V.