kviewshell
DjVuTXT Class Reference
Description of the text contained in a DjVu page.
#include <DjVuText.h>
Classes
class Zone
    Data structure representing document textual components.
Public Types
enum ZoneType { PAGE = 1, COLUMN = 2, REGION = 3, PARAGRAPH = 4, LINE = 5, WORD = 6, CHARACTER = 7 }
Public Member Functions
GP<DjVuTXT> copy(void) const
void decode(const GP<ByteStream> &bs)
void encode(const GP<ByteStream> &bs) const
GList<Zone *> find_text_in_rect(GRect target_rect, GUTF8String &text) const
GList<GRect> find_text_with_rect(const GRect &box, GUTF8String &text, const int padding = 0) const
unsigned int get_memory_usage() const
GUTF8String get_xmlText(const int height) const
void get_zones(int zone_type, const Zone *parent, GList<Zone *> &zone_list) const
int has_valid_zones() const
void normalize_text()
void writeText(ByteStream &bs, const int height) const
Static Public Member Functions
static GP<DjVuTXT> create(void)
Public Attributes
Zone page_zone
GUTF8String textUTF8
Static Public Attributes
static const char end_of_column = 013
static const char end_of_line = 012
static const char end_of_paragraph = 037
static const char end_of_region = 035
Protected Member Functions
DjVuTXT(void)
Detailed Description
This class contains the textual data for a DjVu page. It describes the text as a hierarchy of zones corresponding to the page, columns, regions, paragraphs, lines, words, and so on. The piece of text associated with each zone is represented by an offset and a length describing a segment of a single UTF-8 encoded string for the whole page (textUTF8).
Definition at line 109 of file DjVuText.h.
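To make the offset/length representation concrete, here is a minimal standalone sketch. SketchZone and zone_text are hypothetical names, not part of DjVuLibre; the offsets are assumed to be byte offsets into the UTF-8 page string, and everything else a real zone would carry (such as its position on the page) is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified zone used only for illustration; this is not
// the DjVuTXT::Zone declaration from DjVuLibre.
struct SketchZone {
  int type;                          // a ZoneType value, e.g. 5 = LINE, 6 = WORD
  int text_start;                    // assumed: byte offset into the page string
  int text_length;                   // assumed: length in bytes
  std::vector<SketchZone> children;  // nested, finer-grained zones
};

// Returns the segment of the page-wide UTF-8 string covered by zone z.
std::string zone_text(const std::string &page_text, const SketchZone &z) {
  assert(z.text_start >= 0 && z.text_length >= 0);
  assert(static_cast<std::string::size_type>(z.text_start) +
             static_cast<std::string::size_type>(z.text_length) <=
         page_text.size());
  return page_text.substr(z.text_start, z.text_length);
}
```

For example, with the page string "Hello world", a LINE zone {5, 0, 11} covers the whole string, while its second WORD child {6, 6, 5} yields "world".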
Member Enumeration Documentation
enum DjVuTXT::ZoneType
These constants are used to tell what a zone describes.
This can be useful for a copy/paste application. The deeper a zone type lies in the hierarchy, the larger its constant: PAGE = 1 is the outermost level and CHARACTER = 7 the innermost.
Definition at line 120 of file DjVuText.h.
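Because the constants grow with depth, comparing two zone types reduces to comparing integers. The sketch below copies the enumerator values listed above into a standalone enum; is_finer is a hypothetical helper, not a DjVuTXT member.

```cpp
// Standalone copy of the ZoneType values documented above.
enum ZoneType {
  PAGE = 1, COLUMN = 2, REGION = 3, PARAGRAPH = 4,
  LINE = 5, WORD = 6, CHARACTER = 7
};

// Hypothetical helper: true if zone type a lies deeper in the hierarchy
// (is finer-grained) than zone type b.
constexpr bool is_finer(ZoneType a, ZoneType b) { return a > b; }

// A word sits inside a line; a page encloses its columns.
static_assert(is_finer(WORD, LINE), "words are nested in lines");
static_assert(!is_finer(PAGE, COLUMN), "pages are the outermost zones");
```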
Constructor & Destructor Documentation
DjVuTXT::DjVuTXT(void) [inline, protected]
Definition at line 112 of file DjVuText.h.
Member Function Documentation
void DjVuTXT::decode(const GP<ByteStream> &bs)
void DjVuTXT::encode(const GP<ByteStream> &bs) const
GList<DjVuTXT::Zone *> DjVuTXT::find_text_in_rect(GRect target_rect, GUTF8String &text) const
GList<GRect> DjVuTXT::find_text_with_rect(const GRect &box, GUTF8String &text, const int padding = 0) const
unsigned int DjVuTXT::get_memory_usage(void) const
Returns the number of bytes needed by this data structure.
It is used by caching routines to estimate the size of a DjVuImage.
Definition at line 674 of file DjVuText.cpp.
GUTF8String DjVuTXT::get_xmlText(const int height) const
Definition at line 487 of file DjVuText.cpp.
void DjVuTXT::get_zones(int zone_type, const Zone *parent, GList<Zone *> &zone_list) const
Gets all zones of type zone_type under the node parent.
The zones found are returned in zone_list.
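Collecting all zones of a given type below a parent node can be pictured as a depth-first walk. The sketch below is an illustration under simplifying assumptions, not the library's implementation: Node and get_zones_sketch are hypothetical names, std::vector stands in for DjVuLibre's GList, and child order is assumed to be reading order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified zone; not the DjVuLibre Zone class.
struct Node {
  int type;                    // ZoneType value: 1 = PAGE ... 7 = CHARACTER
  std::vector<Node> children;  // assumed to be stored in reading order
};

// Collects pointers to every zone of type zone_type strictly below parent,
// in depth-first order. A matching zone is not searched further, and zones
// already finer than zone_type are skipped.
void get_zones_sketch(int zone_type, const Node *parent,
                      std::vector<const Node *> &zone_list) {
  assert(parent != nullptr);
  for (const Node &child : parent->children) {
    if (child.type == zone_type)
      zone_list.push_back(&child);
    else if (child.type < zone_type)  // coarser zone: look inside it
      get_zones_sketch(zone_type, &child, zone_list);
  }
}
```

Called with zone_type = 6 (WORD) on a page holding one line of two words and one line of one word, the sketch returns three pointers, in that order.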
int DjVuTXT::has_valid_zones() const
void DjVuTXT::normalize_text()
Normalize textual data.
This function assumes that a zone hierarchy has been built and represents the reading order. It rebuilds the string textUTF8 by gathering the highest-level text available in the zone hierarchy, recomputes the text offsets and lengths of all zones in the hierarchy, and inserts separators where appropriate.
Definition at line 302 of file DjVuText.cpp.
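The normalization described above can be sketched as a recursive pass that copies the coarsest text available, recurses where a zone has none, and appends a separator after each zone. This is a simplified illustration, not DjVuLibre's code: TZone, separator_for, clear_text, and normalize_sketch are hypothetical names; the separator for each zone type is assumed to follow the end_of_* constants documented below, with a space after words; a text length of 0 is taken to mean "no text at this level"; and descendants of a zone whose text is copied are simply cleared.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum { PAGE = 1, COLUMN, REGION, PARAGRAPH, LINE, WORD, CHARACTER };

// Hypothetical simplified zone; text_length == 0 means "no text at this level".
struct TZone {
  int type;
  int text_start;
  int text_length;
  std::vector<TZone> children;
};

// Assumed separator for each zone type (values of the end_of_* constants).
char separator_for(int type) {
  switch (type) {
    case COLUMN:    return '\013';  // end_of_column (VT)
    case REGION:    return '\035';  // end_of_region (GS)
    case PARAGRAPH: return '\037';  // end_of_paragraph (US)
    case LINE:      return '\012';  // end_of_line (LF)
    case WORD:      return ' ';     // assumed: words separated by a space
    default:        return '\0';    // PAGE, CHARACTER: no separator
  }
}

// Drops the offsets of a subtree whose text was taken from an ancestor.
void clear_text(TZone &z) {
  z.text_start = 0;
  z.text_length = 0;
  for (TZone &c : z.children) clear_text(c);
}

// Appends z's text to out (taken from old_text) and recomputes its offsets.
void normalize_sketch(TZone &z, const std::string &old_text, std::string &out) {
  if (z.text_length > 0) {
    // This zone carries text: copy it and discard the descendants' offsets.
    assert(static_cast<std::string::size_type>(z.text_start + z.text_length) <=
           old_text.size());
    const std::string piece = old_text.substr(z.text_start, z.text_length);
    z.text_start = static_cast<int>(out.size());
    out += piece;
    for (TZone &c : z.children) clear_text(c);
  } else {
    // No text at this level: gather the children's text in order.
    z.text_start = static_cast<int>(out.size());
    for (TZone &c : z.children) normalize_sketch(c, old_text, out);
    z.text_length = static_cast<int>(out.size()) - z.text_start;
    if (z.text_length == 0) return;  // empty zone: no separator
  }
  const char sep = separator_for(z.type);
  if (sep != '\0' && out.back() != sep) {  // out is non-empty here
    out += sep;
    z.text_length += 1;
  }
}
```

With two WORD zones "Hello" and "world" inside one LINE inside one PARAGRAPH, this sketch produces "Hello world \n" followed by the end_of_paragraph character, and every offset now points into the new string.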
void DjVuTXT::writeText(ByteStream &bs, const int height) const
Member Data Documentation
const char DjVuTXT::end_of_column = 013 [static]
Definition at line 181 of file DjVuText.h.
const char DjVuTXT::end_of_line = 012 [static]
Definition at line 184 of file DjVuText.h.
const char DjVuTXT::end_of_paragraph = 037 [static]
Definition at line 183 of file DjVuText.h.
const char DjVuTXT::end_of_region = 035 [static]
Definition at line 182 of file DjVuText.h.
GUTF8String DjVuTXT::textUTF8
Textual data for this page.
The content of this string is encoded in UTF-8, which coincides with ASCII for the first 128 characters (codes 0 to 127). Columns, regions, paragraphs, and lines are delimited by the following control characters:
Name                        Octal   ASCII name
DjVuTXT::end_of_column      013     VT, Vertical Tab
DjVuTXT::end_of_region      035     GS, Group Separator
DjVuTXT::end_of_paragraph   037     US, Unit Separator
DjVuTXT::end_of_line        012     LF, Line Feed
Definition at line 180 of file DjVuText.h.
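All four separators are ASCII control characters, and bytes below 0x80 never occur inside a multi-byte UTF-8 sequence, so textUTF8 can be split on them byte by byte without cutting a character in half. A minimal sketch, where split_lines is a hypothetical helper (not a DjVuTXT member) that treats every separator as a line break:

```cpp
#include <string>
#include <vector>

// Separator values from the table above.
constexpr char kEndOfColumn = '\013';     // VT
constexpr char kEndOfRegion = '\035';     // GS
constexpr char kEndOfParagraph = '\037';  // US
constexpr char kEndOfLine = '\012';       // LF

bool is_separator(char c) {
  return c == kEndOfColumn || c == kEndOfRegion ||
         c == kEndOfParagraph || c == kEndOfLine;
}

// Hypothetical helper: splits page text into non-empty lines, treating any
// separator (end of line, paragraph, region, or column) as a line break.
std::vector<std::string> split_lines(const std::string &text) {
  std::vector<std::string> lines;
  std::string current;
  for (char c : text) {
    if (is_separator(c)) {
      if (!current.empty()) lines.push_back(current);
      current.clear();
    } else {
      current += c;  // multi-byte UTF-8 sequences pass through unchanged
    }
  }
  if (!current.empty()) lines.push_back(current);
  return lines;
}
```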