The Strigi API Reference
Strigi is a desktop search engine. It is modular and uses the efficient JStreams classes to read files that are embedded at arbitrary depth inside other documents, for example a file inside an archive that is itself inside another archive.
For more information, have a look at the streams and streamanalyzer sections.
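The idea of reading files embedded at arbitrary depth can be illustrated with a small, self-contained sketch. It does not use the Strigi or JStreams API: the names ByteStream, MemoryStream, SubStream, entry, walk and listEmbedded are hypothetical, and the ".pack" container layout (one length byte, the name, one length byte, the payload) is a toy format invented for this example. The sketch only shows the general technique of wrapping a parent stream in a bounded substream and recursing whenever an entry is itself a container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read-only stream interface; NOT the real JStreams API.
class ByteStream {
public:
    virtual ~ByteStream() = default;
    // Reads up to n bytes into out; returns the number read (0 at end).
    virtual std::size_t read(char* out, std::size_t n) = 0;
};

// Stream over an in-memory buffer.
class MemoryStream : public ByteStream {
public:
    explicit MemoryStream(std::string data) : data_(std::move(data)) {}
    std::size_t read(char* out, std::size_t n) override {
        std::size_t k = std::min(n, data_.size() - pos_);
        std::copy(data_.begin() + pos_, data_.begin() + pos_ + k, out);
        pos_ += k;
        return k;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Bounded window onto a parent stream: exposes only the next `limit` bytes.
class SubStream : public ByteStream {
public:
    SubStream(ByteStream& parent, std::size_t limit)
        : parent_(parent), left_(limit) {}
    std::size_t read(char* out, std::size_t n) override {
        std::size_t k = parent_.read(out, std::min(n, left_));
        left_ -= k;
        return k;
    }
    // Discards unread bytes so the parent is positioned at the next entry.
    void skipRest() {
        char buf[64];
        while (left_ > 0 && read(buf, sizeof buf) > 0) {}
    }
private:
    ByteStream& parent_;
    std::size_t left_;
};

// Reads exactly n bytes; returns false if the stream ends first.
bool readExact(ByteStream& in, char* out, std::size_t n) {
    std::size_t got = 0;
    while (got < n) {
        std::size_t k = in.read(out + got, n - got);
        if (k == 0) return false;
        got += k;
    }
    return true;
}

// Encodes one entry of the toy format: [name length][name][payload length][payload].
std::string entry(const std::string& name, const std::string& payload) {
    assert(name.size() < 256 && payload.size() < 256);
    std::string out;
    out += static_cast<char>(static_cast<unsigned char>(name.size()));
    out += name;
    out += static_cast<char>(static_cast<unsigned char>(payload.size()));
    out += payload;
    return out;
}

// Visits every entry in the stream and recurses into entries named "*.pack".
void walk(ByteStream& in, const std::string& prefix,
          std::vector<std::string>& paths) {
    char len;
    while (readExact(in, &len, 1)) {
        std::string name(static_cast<unsigned char>(len), '\0');
        if (!readExact(in, &name[0], name.size())) return;
        if (!readExact(in, &len, 1)) return;
        SubStream payload(in, static_cast<unsigned char>(len));
        std::string path = prefix + "/" + name;
        paths.push_back(path);
        bool isContainer = name.size() >= 5 &&
            name.compare(name.size() - 5, 5, ".pack") == 0;
        if (isContainer) walk(payload, path, paths);
        payload.skipRest();
    }
}

// Lists the paths of all entries embedded in `data`, at any depth.
std::vector<std::string> listEmbedded(const std::string& rootName,
                                      const std::string& data) {
    MemoryStream root(data);
    std::vector<std::string> paths;
    walk(root, rootName, paths);
    return paths;
}
```

Because each nested container is read through a SubStream rather than copied out first, the walker never needs the whole embedded file in memory and the nesting depth is limited only by the recursion. Calling listEmbedded on a buffer built with entry yields one path per embedded entry, with nested entries listed under their container's path.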
KDE 4.1 API Reference