DataPool Class Reference

Thread-safe data storage.

#include <DataPool.h>


Public Member Functions
void	clear_stream (const bool release=true)
void	load_file (void)
void	stop (bool only_blocked=false)
Adding data.
Please note that these functions are for unconnected DataPools only. You cannot add data to a DataPool that is connected to another DataPool or to a file.
void	add_data (const void *buffer, int offset, int size)
void	add_data (const void *buffer, int size)
void	set_eof (void)
Trigger callbacks.
Trigger callbacks are special callbacks that are called when all data for a given range of offsets has become available. Since reading unavailable data may block the calling thread, which is often undesirable, trigger callbacks are a convenient way to signal that data is available. You can add a trigger callback in two ways: (1) by specifying a range, which is the most general case; (2) by providing just one threshold, in which case the range is assumed to start at offset zero and last for threshold+1 bytes.
void	add_trigger (int thresh, void(*callback)(void *), void *cl_data)
void	add_trigger (int start, int length, void(*callback)(void *), void *cl_data)
void	del_trigger (void(*callback)(void *), void *cl_data)
Accessing data.
These functions provide direct and sequential access to the data of the DataPool. If the DataPool is not connected (that is, it contains some real data), it handles the requests itself. Otherwise they are forwarded to the master DataPool or to the file.
int	get_data (void *buffer, int offset, int size)
GP< ByteStream >	get_stream (void)
State querying functions.
int	get_length (void) const
int	get_size (void) const
bool	has_data (int start, int length)
bool	is_connected (void) const
bool	is_eof (void) const
DataPool.h
Files "DataPool.h" and "DataPool.cpp" implement the classes DataPool and DataRange, used by the DjVu decoder to access data. The main goal of class DataPool is to provide concurrent access to the same data from many threads, with the possibility of adding data from yet another thread. This is especially important in the case of the Netscape plugin, where data is not immediately available but decoding should start as soon as possible. In this situation it is vital to provide transparent access to the data from many threads, possibly blocking readers that try to access information that has not been received yet. When the data is local, though, it can be accessed directly using the standard I/O mechanism. To provide a uniform interface for decoding routines, DataPool supports file mode as well. Author: Andrei Erofeev <eaf@geocities.com>. Version: $Id: DataPool.h,v 1.10 2003/11/07 22:08:20 leonb Exp $
bool	simple_compare (DataPool &pool) const
Static Public Member Functions
static void	close_all (void)
static void	load_file (const GURL &url)
Static Public Attributes
static const char *	Stop = ERR_MSG("STOP")
Protected Member Functions
	DataPool (void)
Initialization
static GP< DataPool >	create (const GURL &url, int start=0, int length=-1)
static GP< DataPool >	create (const GP< DataPool > &master_pool, int start=0, int length=-1)
static GP< DataPool >	create (const GP< ByteStream > &str)
static GP< DataPool >	create (void)
void	connect (const GURL &url, int start=0, int length=-1)
void	connect (const GP< DataPool > &master_pool, int start=0, int length=-1)
virtual	~DataPool ()

Detailed Description

Thread-safe data storage.

The purpose of DataPool is to provide a uniform interface for accessing data from decoding routines running in a multi-threaded environment. Depending on the mode of operation, it may contain the actual data, be connected to another DataPool, or be mapped to a file. Regardless of the mode, the class returns data in a thread-safe way, blocking reading threads if the data of interest is not yet available. This blocking is especially useful in a networking environment (the plugin), where a running decoding thread wants to start decoding as soon as a single byte is available, blocking if necessary.

Access to data in a DataPool may be direct (using the get_data() function) or sequential (see the get_stream() function).

If the DataPool is not connected to anything, that is, it contains some real data, this data can be added by means of two add_data() functions. One of them adds data sequentially, maintaining the offset of the last block of data it added. The other can store data anywhere. It is therefore important to realize that there may be "white spots" (gaps) in the data storage.

There is also a way to test whether data is available for a given range (see has_data()). In addition to this mechanism, there are so-called trigger callbacks, which are called when all data for a given range is available.
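To make the "white spots" idea concrete, here is a minimal sketch of how availability could be tracked when blocks arrive at arbitrary offsets. The class RangeMap and its methods add() and has() are hypothetical names invented for this illustration; they are not part of DjVuLibre and do not claim to reflect how DataPool stores its block list. Negative lengths (meaning "up to the end of the pool") are deliberately not modeled.

```cpp
#include <algorithm>
#include <iterator>
#include <map>

// Hypothetical helper, not DjVuLibre code: remembers which byte ranges
// have been filled, stored as half-open intervals [start, end).
class RangeMap {
public:
    // Record that `size` bytes starting at `offset` are now available.
    void add(int offset, int size) {
        if (size <= 0) return;
        int start = offset;
        int end = offset + size;
        // Locate the first stored interval that could touch [start, end).
        auto it = ranges_.lower_bound(start);
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) it = prev;
        }
        // Merge every interval that overlaps or abuts the new one.
        while (it != ranges_.end() && it->first <= end) {
            start = std::min(start, it->first);
            end = std::max(end, it->second);
            it = ranges_.erase(it);
        }
        ranges_[start] = end;
    }

    // True if every byte in [start, start + length) is available.
    // Only positive lengths are handled in this sketch.
    bool has(int start, int length) const {
        if (length <= 0) return false;
        auto it = ranges_.upper_bound(start);
        if (it == ranges_.begin()) return false;
        --it;
        return it->second >= start + length;
    }

private:
    std::map<int, int> ranges_;  // interval start -> interval end (exclusive)
};
```

With blocks stored at offsets 0 and 20 (10 bytes each), has(0, 30) is false because bytes 10 to 19 are a white spot; once a block is stored at offset 10, the same query succeeds.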

Let us consider all modes of operation in detail:

Not connected DataPool. In this mode the DataPool contains some real data. As mentioned above, data may be added by means of two add_data() functions, which operate independently of each other and allow data to be added either sequentially or directly at any place in the data storage. It is important to call set_eof() after all data has been added.

Functions like get_data() or get_stream() can be used to obtain direct or sequential access to the data. As long as is_eof() is FALSE, DataPool will block every reader that tries to read unavailable data until it actually becomes available. But as soon as is_eof() is TRUE, any attempt to read non-existing data will read 0 bytes.
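The blocking behavior described above can be illustrated with a small stand-in class. ToyPool below is a hypothetical model written for this page, not the DjVuLibre implementation: it supports only sequential add_data(), set_eof(), and a get_data() that waits on a condition variable until at least one byte at the requested offset exists or EOF has been set.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical stand-in, not the DjVuLibre class.
class ToyPool {
public:
    // Append a block and wake any reader waiting for these bytes.
    void add_data(const void* buffer, int size) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            const char* p = static_cast<const char*>(buffer);
            data_.insert(data_.end(), p, p + size);
        }
        cond_.notify_all();
    }

    // Declare that nothing more will arrive; releases all blocked readers.
    void set_eof() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            eof_ = true;
        }
        cond_.notify_all();
    }

    // Blocks until some data at `offset` exists or EOF is set.
    // Returns the number of bytes copied, which is 0 when EOF is set
    // and nothing is stored at `offset`.
    int get_data(void* buffer, int offset, int size) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] {
            return eof_ || offset < static_cast<int>(data_.size());
        });
        const int stored = static_cast<int>(data_.size());
        if (offset >= stored) return 0;
        const int n = std::min(size, stored - offset);
        std::memcpy(buffer, data_.data() + offset, n);
        return n;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::vector<char> data_;
    bool eof_ = false;
};
```

In a real program the reader would typically run in its own thread, so that another thread can call add_data() while the reader is waiting.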

Since DataPool was designed to store DjVu files, which are in IFF format, it becomes possible to predict the size of the DataPool as soon as the first 32 bytes have been added. This is invaluable for estimating download progress. See get_length() for details. If this estimate fails (which means that the stored data is not in IFF format), get_length() returns -1.
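As a rough illustration of why a few header bytes are enough, the sketch below reads the size field of a top-level IFF chunk. It assumes the usual IFF layout (a 4-byte "FORM" identifier followed by a 4-byte big-endian payload size) and the optional 4-byte "AT&T" magic that DjVu files place in front of it. The function name estimate_iff_length is hypothetical, and this is not the parsing code DataPool itself uses; IFF padding rules are ignored.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not DjVuLibre code.
// Returns the predicted total length in bytes, or -1 if the buffer does
// not start with a recognizable FORM header.
long long estimate_iff_length(const unsigned char* buf, std::size_t n) {
    std::size_t pos = 0;
    // Skip the optional DjVu magic prefix.
    if (n >= 4 && std::memcmp(buf, "AT&T", 4) == 0) pos = 4;
    // Need the chunk identifier plus the 4-byte size field.
    if (n < pos + 8 || std::memcmp(buf + pos, "FORM", 4) != 0) return -1;
    const unsigned char* s = buf + pos + 4;
    const unsigned long long size =
        (static_cast<unsigned long long>(s[0]) << 24) |
        (static_cast<unsigned long long>(s[1]) << 16) |
        (static_cast<unsigned long long>(s[2]) << 8) |
        static_cast<unsigned long long>(s[3]);
    // Total = prefix + 8-byte chunk header + payload.
    return static_cast<long long>(pos + 8 + size);
}
```

A caller could compare this estimate with the number of bytes received so far to report download progress, falling back to "unknown" when the function returns -1.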

Triggers may be added and removed by means of the add_trigger() and del_trigger() functions. add_trigger() takes a data range. As soon as all data in that range is available, the trigger callback will be called.

All trigger callbacks will be called when the EOF condition has been set.
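The trigger rules can be sketched with a toy registry. ToyTriggers is a hypothetical class made up for this example and simplifies heavily: it assumes data only arrives sequentially, so availability is a single byte count reported through on_data(), and it treats a threshold of zero or more as a real threshold (the text says "positive"). It is not how DataPool implements triggers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch, not DjVuLibre code.
class ToyTriggers {
public:
    using Callback = void (*)(void*);

    // Range form: fire once bytes [start, start + length) exist.
    // A negative length means "fire only when EOF is set".
    void add_trigger(int start, int length, Callback cb, void* cl_data) {
        Trigger t{start, length, cb, cl_data};
        if (ready(t)) cb(cl_data);  // range already satisfied: fire now
        else pending_.push_back(t);
    }

    // Threshold form: the range starts at offset 0 and spans thresh+1 bytes.
    void add_trigger(int thresh, Callback cb, void* cl_data) {
        add_trigger(0, thresh >= 0 ? thresh + 1 : -1, cb, cl_data);
    }

    // Unregister every pending trigger with this callback and argument.
    void del_trigger(Callback cb, void* cl_data) {
        pending_.erase(
            std::remove_if(pending_.begin(), pending_.end(),
                           [&](const Trigger& t) {
                               return t.cb == cb && t.cl_data == cl_data;
                           }),
            pending_.end());
    }

    // Report that the first `available` bytes are now stored.
    void on_data(int available) { available_ = available; fire(); }

    // Report EOF: every remaining trigger fires.
    void on_eof() { eof_ = true; fire(); }

private:
    struct Trigger { int start; int length; Callback cb; void* cl_data; };

    bool ready(const Trigger& t) const {
        if (eof_) return true;
        return t.length >= 0 && t.start + t.length <= available_;
    }

    void fire() {
        // Work on a copy so callbacks may register new triggers safely.
        std::vector<Trigger> current;
        current.swap(pending_);
        for (const Trigger& t : current) {
            if (ready(t)) t.cb(t.cl_data);
            else pending_.push_back(t);
        }
    }

    std::vector<Trigger> pending_;
    int available_ = 0;
    bool eof_ = false;
};
```

A trigger with threshold 9 therefore fires once ten bytes (offsets 0 through 9) are present, and a trigger with a negative threshold fires only at EOF.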

DataPool connected to another DataPool. In this slave mode you can map a given DataPool to any range of offsets inside another DataPool. You can connect the slave DataPool even if there is no data in the master DataPool yet. Any get_data() request will be forwarded to the master DataPool, which is responsible for blocking readers that try to access unavailable data.

The use of add_data() functions is prohibited for connected DataPools.

The range of offsets used to map a slave DataPool can be fully specified (both start offset and length are positive numbers) or partially specified (the length is negative). In the latter case the slave DataPool is assumed to extend up to the end of the master DataPool.

Triggers may be used with slave DataPools as well as with master ones.

Calling the stop() function of a slave will stop only the slave (and any other slave connected to it), but not the master.

The set_eof() function is meaningless for slaves. They obtain the ByteStream::EndOfFile status from their master.

Depending on the range of offsets passed to the constructor, get_length() returns different values. If the length passed to the constructor was positive, it is always returned by get_length(). Otherwise the value returned is either -1, if the master's length is still unknown (it has not managed to parse the IFF data yet), or is calculated as masters_length - slave_start.
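The length rule for slaves fits in a few lines. The function slave_length below is a hypothetical helper that restates the rule from the text; it is not taken from DjVuLibre. One assumption to note: a requested length of zero is treated here as fully specified, whereas the text only speaks of positive lengths.

```cpp
// Hypothetical helper, not DjVuLibre code.
// start:            offset of the slave inside the master
// requested_length: length given when connecting (negative = "to the end")
// master_length:    the master's get_length() result, -1 while unknown
int slave_length(int start, int requested_length, int master_length) {
    if (requested_length >= 0) return requested_length;  // fully specified
    if (master_length < 0) return -1;                    // master not known yet
    return master_length - start;                        // extends to master's end
}
```

For example, a slave mapped at offset 100 with a negative length reports -1 until the master's length is known, and 900 once the master reports 1000.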

DataPool connected to a file. This mode is quite similar to the case where the DataPool is connected to another DataPool. Again, the DataPool stores no data itself. It just forwards all get_data() requests to the underlying source (a file in this case). Thus these requests will never block the reader, but they may return 0 if there is no data available at the requested offset.

The use of add_data() functions is meaningless and is prohibited.

The is_eof() function always returns TRUE. Thus set_eof() is meaningless and does nothing.

The get_length() function always returns the file size.

Calling the stop() function will stop this DataPool and any other slave connected to it.

Trigger callbacks passed through the add_trigger() function are called immediately.

This mode is useful for reading and decoding DjVu files without reading and storing them in full in memory.

Definition at line 225 of file DataPool.h.

Constructor & Destructor Documentation

DataPool::DataPool ( void ) [protected]

Definition at line 741 of file DataPool.cpp.

DataPool::~DataPool ( void ) [virtual]

Definition at line 830 of file DataPool.cpp.

Member Function Documentation

void DataPool::add_data	(	const void *	buffer,
		int	offset,
		int	size
	)

Stores the specified block of data at the specified offset.

Like the sequential add_data(), this function can also unblock readers waiting for data and engage trigger callbacks. The difference is that this function can store data anywhere.

Note: After all the data has been added, it is necessary to call set_eof() to tell the DataPool that nothing else is expected.

Note: This function may not be called if the DataPool has been connected to something.

Parameters:

	buffer	data to store
	offset	where to store the data
	size	length of the buffer

Definition at line 1022 of file DataPool.cpp.

void DataPool::add_data	(	const void *	buffer,
		int	size
	)

Appends the new block of data to the DataPool.

There are two add_data() functions available. One is for adding data sequentially. It keeps track of the last byte position that it has stored and always appends the next block after this position. The other add_data() can store data anywhere.

The function will unblock readers waiting for data if that data arrives with this block. It may also invoke trigger callbacks that have been added by means of the add_trigger() function.

Note: After all the data has been added, it is necessary to call set_eof() to tell the DataPool that nothing else is expected.

Note: This function may not be called if the DataPool has been connected to something.

Parameters:

	buffer	data to append
	size	length of the buffer

Definition at line 1011 of file DataPool.cpp.

void DataPool::add_trigger	(	int	thresh,
		void(*)(void *)	callback,
		void *	cl_data
	)

Associates the specified trigger callback with the specified threshold.

This function is a simplified version of the range-based add_trigger(). The callback will be called when there is data available for every offset from 0 to thresh, if thresh is positive, or when the EOF condition has been set otherwise.

Definition at line 1506 of file DataPool.cpp.

void DataPool::add_trigger	(	int	start,
		int	length,
		void(*)(void *)	callback,
		void *	cl_data
	)

Associates the specified trigger callback with the given data range.

Note: The callback may be called immediately if all data for the given range is already available or EOF is TRUE.

Parameters:

	start	The beginning of the range for which all data should be available
	length	If length is not negative, the callback will be called when there is data available for every offset from start to start+length-1. If length is negative, the callback is called after the EOF condition has been set.
	callback	Function to call
	cl_data	Argument to pass to the callback when it's called.

Definition at line 1515 of file DataPool.cpp.

void DataPool::clear_stream ( const bool release = true )

Definition at line 812 of file DataPool.cpp.

void DataPool::close_all ( void ) [static]

This function will remove the OpenFiles file list.

Definition at line 1788 of file DataPool.cpp.

void DataPool::connect	(	const GURL &	url,
		int	start = `0`,
		int	length = `-1`
	)

Connects the DataPool to the specified range of offsets of the named url.

Parameters:

	url	Name of the file to connect to.
	start	Beginning of the range of offsets to which the DataPool is mapped
	length	Length of the range of offsets. If negative, the range is assumed to extend up to the end of the file.

Definition at line 899 of file DataPool.cpp.

void DataPool::connect	(	const GP< DataPool > &	master_pool,
		int	start = `0`,
		int	length = `-1`
	)

Switches the DataPool to slave mode and connects it to the specified range of offsets of the master DataPool.

Parameters:

	master_pool	Master DataPool providing data for this slave
	start	Beginning of the range of offsets to which the slave is mapped
	length	Length of the range of offsets. If negative, the range is assumed to extend up to the end of the master DataPool.

Definition at line 863 of file DataPool.cpp.

GP< DataPool > DataPool::create	(	const GURL &	url,
		int	start = `0`,
		int	length = `-1`
	)			`[static]`

Initializes the DataPool in slave mode and connects it to the specified range of offsets of the specified file.

It is equivalent to calling the default constructor followed by the connect() function.

Parameters:

	url	Name of the file to connect to.
	start	Beginning of the range of offsets to which the DataPool is mapped
	length	Length of the range of offsets. If negative, the range is assumed to extend up to the end of the file.

Definition at line 795 of file DataPool.cpp.

GP< DataPool > DataPool::create	(	const GP< DataPool > &	master_pool,
		int	start = `0`,
		int	length = `-1`
	)			`[static]`

Initializes the DataPool in slave mode and connects it to the specified range of offsets of the specified master DataPool.

It is equivalent to calling the default constructor followed by the connect() function.

Parameters:

	master_pool	Master DataPool providing data for this slave
	start	Beginning of the range of offsets to which the slave is mapped
	length	Length of the range of offsets. If negative, the range is assumed to extend up to the end of the master DataPool.

Definition at line 782 of file DataPool.cpp.

GP< DataPool > DataPool::create ( const GP< ByteStream > & str ) [static]

Creates and initializes the DataPool with data from stream str.

The constructor will read the stream's contents and add them to the pool using the add_data() function. Afterwards it will call the set_eof() function, and no other data may be added to the pool.

Definition at line 760 of file DataPool.cpp.

GP< DataPool > DataPool::create ( void ) [static]

Default creator.

Prepares the DataPool for accepting data added through the add_data() functions. Use the connect() functions if you want to map this DataPool to another DataPool or to a file.

Definition at line 744 of file DataPool.cpp.

void DataPool::del_trigger	(	void(*)(void *)	callback,
		void *	cl_data
	)

Use this function to unregister callbacks that are no longer needed.

Note: It is important to do this when the client is about to be destroyed.

Definition at line 1556 of file DataPool.cpp.

int DataPool::get_data	(	void *	buffer,
		int	offset,
		int	size
	)

Attempts to return a block of data of the given size at the given offset.

If the DataPool is connected to another DataPool or to a file, the request is simply forwarded to it.

If the DataPool is not connected to anything and some of the requested data is in the internal buffer, the function copies the available data to buffer and returns immediately.

If there is no data available and is_eof() returns FALSE, the reader (and its thread) will be blocked until the data actually arrives. Please note that since the reader is blocked, it should run in a separate thread so that other threads have a chance to call add_data().

If there is no data available but is_eof() is TRUE, the behavior depends on the DataPool's estimate of the file size.

If DataPool learns from the IFF structure of the data that its size should be greater than it really is, then any attempt to read non-existing data in the range of valid offsets will result in a ByteStream::EndOfFile exception. This indicates that there was an error in adding data: the requested data is supposed to be there but has not actually been added.

If DataPool's expectations about the data size coincide with reality, then any attempt to read data beyond the legal range of offsets will return zero bytes.
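The cases above for an unconnected DataPool can be summarized as a small decision function. The names ReadOutcome and classify_read are hypothetical and exist only for this illustration; the sketch restates the documented rules and is not the actual control flow of DataPool::get_data().

```cpp
// Hypothetical summary of the documented cases, not DjVuLibre code.
enum class ReadOutcome {
    CopyAvailable,   // some requested bytes are buffered: copy and return
    Block,           // nothing there yet and more data may still arrive
    ThrowEndOfFile,  // EOF set, but the IFF estimate says data should exist
    ReturnZero       // EOF set and the offset is past the expected end
};

// have_bytes:      true if some of the requested data is in the buffer
// eof:             result of is_eof()
// offset:          requested read offset
// expected_length: size predicted from the IFF structure, -1 if unknown
ReadOutcome classify_read(bool have_bytes, bool eof, int offset,
                          int expected_length) {
    if (have_bytes) return ReadOutcome::CopyAvailable;
    if (!eof) return ReadOutcome::Block;
    if (expected_length >= 0 && offset < expected_length)
        return ReadOutcome::ThrowEndOfFile;
    return ReadOutcome::ReturnZero;
}
```

The distinction in the last two branches is what lets a caller tell a truncated download (an exception) from an ordinary read past the end of the data (zero bytes).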

Parameters:

	buffer	Buffer to be filled with data
	offset	Offset in the DataPool to read data at
	size	Size of the buffer

Returns: The number of bytes actually read

Exceptions:

	STOP	The stream has been stopped
	EOF	The requested data is not there and will not be added, although it should have been.

Definition at line 1098 of file DataPool.cpp.

int DataPool::get_length ( void ) const

Returns the length of data in the DataPool.

The value returned depends on the mode of operation.

If the DataPool is not connected to anything, the length returned is either calculated by interpreting the IFF structure of the stored data (if successful) or by calculating the real size of the data after set_eof() has been called. Otherwise it is -1.

If the DataPool is connected to a file, the length is calculated based on the length passed to the connect() function and the file size.

If the DataPool is connected to a master DataPool, the length is calculated based on the value returned by the master's get_length() function and the length passed to the connect() function.

Definition at line 967 of file DataPool.cpp.

int DataPool::get_size ( void ) const [inline]

Returns the number of bytes of data available in this DataPool.

Unlike the get_length() function, this one does not try to interpret the IFF structure and predict the file length. It just returns the number of bytes of data actually available inside the DataPool, if it contains data, or inside its range, if it is connected to another DataPool or a file.

Definition at line 476 of file DataPool.h.

GP< ByteStream > DataPool::get_stream ( void )

Returns a ByteStream to access the contents of the DataPool sequentially.

Reading from the returned stream essentially calls the get_data() function. Thus, everything said about get_data() holds for the stream too.

Definition at line 1795 of file DataPool.cpp.

bool DataPool::has_data	(	int	start,
		int	length
	)

Returns TRUE if all data is available for offsets from start to start+length-1.

If length is negative, the range is assumed to extend up to the end of the DataPool. This function works for both connected and unconnected DataPools. Once it has returned TRUE for some range of offsets, you can be sure that a subsequent get_data() request for that range will not block.

Definition at line 1087 of file DataPool.cpp.

bool DataPool::is_connected ( void ) const [inline]

Returns TRUE if this DataPool is connected to another DataPool or to a file.

Definition at line 613 of file DataPool.h.

bool DataPool::is_eof ( void ) const [inline]

Definition at line 451 of file DataPool.h.

void DataPool::load_file ( const GURL & url ) [static]

This function makes every DataPool in the program that is connected to the given file load the file contents into main memory and close the file.

This feature is important when you want to do something with the file, such as removing or overwriting it, without affecting the rest of the program.

Definition at line 1445 of file DataPool.cpp.

void DataPool::load_file ( void )

Loads data from the file into memory.

This function is only useful for DataPools getting data from a file. It descends the DataPool hierarchy until it reaches either a file-connected DataPool or a DataPool containing the real data. In the latter case it does nothing; in the former case it makes the DataPool read all data from the file into memory and stop using the file.

This may be useful when you want to overwrite the file and leave existing DataPools with valid data.

Definition at line 1401 of file DataPool.cpp.

void DataPool::set_eof ( void )

Tells the DataPool that all data has been added and nothing else is anticipated.

When EOF is true, any reader attempting to read non-existing data will not be blocked. It will either read zero bytes or get a ByteStream::EndOfFile exception (see get_data()). Calling this function will also activate all registered trigger callbacks.

Note: This function is meaningless and does nothing when the DataPool is connected to another DataPool or to a file.

Definition at line 1324 of file DataPool.cpp.

bool DataPool::simple_compare ( DataPool & pool ) const [inline]

Useful in comparing data pools.

Returns true if both pools are derived from the same URL or bytestream.

Definition at line 603 of file DataPool.h.

void DataPool::stop ( bool only_blocked = false )

Tells the DataPool to stop serving readers.

If the only_blocked flag is TRUE, only those requests that would not block will be processed. Any attempt to get non-existing data will result in a STOP exception (instead of blocking until the data is available).

If the only_blocked flag is FALSE, any further attempt to read from this DataPool (as well as from any DataPool connected to this one) will result in a STOP exception.
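The two stop modes can be laid out as a small table in code. The enum and function names below (StopState, ReadAction, on_read) are hypothetical and chosen only for this sketch; it restates the rules above rather than showing DataPool internals.

```cpp
// Hypothetical summary of the stop() rules, not DjVuLibre code.
enum class StopState {
    Running,      // stop() has not been called
    StopBlocked,  // stop(true): refuse only requests that would block
    StopAll       // stop(false): refuse every request
};

enum class ReadAction { Serve, Block, ThrowStop };

// would_block: true if the requested data is not available yet.
ReadAction on_read(StopState state, bool would_block) {
    if (state == StopState::StopAll) return ReadAction::ThrowStop;
    if (!would_block) return ReadAction::Serve;
    return state == StopState::StopBlocked ? ReadAction::ThrowStop
                                           : ReadAction::Block;
}
```

In other words, stop(true) lets a decoder keep consuming data that has already arrived while guaranteeing that no reader waits for data that is missing.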

Definition at line 1347 of file DataPool.cpp.

Member Data Documentation

const char * DataPool::Stop = ERR_MSG("STOP") [static]

Definition at line 598 of file DataPool.h.

The documentation for this class was generated from the following files: DataPool.h and DataPool.cpp.
