CAcInvertedFile Class Reference

#include <CAcInvertedFile.h>

Inheritance diagram for CAcInvertedFile:

[Diagram not reproduced. Classes shown: CAcURL2FTS, CAccessor, CAccessorImplementation, CAcIFFileSystem, CAcIFMeta]

Public Member Functions

virtual bool operator() () const =0
virtual string IDToURL (TID inID) const =0
virtual pair< bool, TID > URLToID (const string &inURL) const =0
virtual list< TID > * getAllFeatureIDs () const =0
bool operator() () const
 CAcInvertedFile (const CXMLElement &inCollectionElement)
bool init (bool)
 ~CAcInvertedFile ()
string IDToURL (TID inID) const
TID URLToID (const string &inURL) const
TID getMaximumFeatureID () const
list< TID > * getAllFeatureIDs () const
The proper inverted file access
virtual CDocumentFrequencyList * FeatureToList (TFeatureID inFID) const =0
virtual CDocumentFrequencyList * URLToFeatureList (string inURL) const =0
virtual CDocumentFrequencyList * DIDToFeatureList (TID inDID) const =0
Accessing information about features
virtual double FeatureToCollectionFrequency (TFeatureID) const =0
virtual unsigned int getFeatureDescription (TID inFeatureID) const =0
Accessing additional document information
virtual double DIDToMaxDocumentFrequency (TID) const =0
virtual double DIDToDFSquareSum (TID) const =0
virtual double DIDToSquareDFLogICFSum (TID) const =0
virtual bool generateInvertedFile ()=0
virtual bool checkConsistency ()=0
The proper inverted file access
CDocumentFrequencyList * FeatureToList (TFeatureID) const
CDocumentFrequencyList * URLToFeatureList (string inURL) const
CDocumentFrequencyList * DIDToFeatureList (TID inDID) const
Accessing information about features
double FeatureToCollectionFrequency (TFeatureID) const
unsigned int getFeatureDescription (TID inFeatureID) const
Accessing additional document information
double DIDToMaxDocumentFrequency (TID) const
double DIDToDFSquareSum (TID) const
double DIDToSquareDFLogICFSum (TID) const
bool generateInvertedFile ()
bool newGenerateInvertedFile ()
bool checkConsistency ()
bool findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const

Protected Types

typedef hash_map< TID, unsigned int > CIDToOffset

Protected Member Functions

void writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile)
CDocumentFrequencyList * getFeatureFile (string inFileName) const

Protected Attributes

TID mMaximumFeatureID
CArraySelfDestroyPointer< char > mInvertedFileBuffer
CSelfDestroyPointer< istream > mInvertedFile
ifstream mOffsetFile
ifstream mFeatureDescriptionFile
string mInvertedFileName
string mOffsetFileName
string mFeatureDescriptionFileName
CIDToOffset mIDToOffset
hash_map< TID, double > mFeatureToCollectionFrequency
for fast access...
hash_map< TID, unsigned int > mFeatureDescription
CADIHash mDocumentInformation


Detailed Description

An accessor to an inverted file. At present this access is done "by hand", which is not really efficient; however, we plan to move to memory-mapped files.

The above description is fairly old. A note as of 2007-07-03: memory-mapped files become interesting again now that many people are using 64-bit systems. On 32-bit systems, memory-mapped files place a severe limit on the size of inverted files. As an aside, there is probably more time to be saved if we use the inverted file size.

Definition at line 90 of file CAcInvertedFile.h.


Member Typedef Documentation

typedef hash_map<TID,unsigned int> CAcInvertedFile::CIDToOffset [protected]

map from feature id to the offset for this feature

Reimplemented in CAcIFFileSystem.

Definition at line 110 of file CAcSQLInvertedFile.h.


Constructor & Destructor Documentation

CAcInvertedFile::CAcInvertedFile ( const CXMLElement &  inCollectionElement  ) 

This opens an existing inverted file and then initializes this structure. After that it is fully usable.

As a parameter it takes an XMLElement which contains a "collection" element and its content.

If the attribute vi-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.

The REAL constructor.

CAcInvertedFile::~CAcInvertedFile (  ) 

Destructor


Member Function Documentation

virtual bool CAcInvertedFile::operator() (  )  const [pure virtual]

for testing if the inverted file is correctly constructed

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual string CAcInvertedFile::IDToURL ( TID  inID  )  const [pure virtual]

Translate a DocumentID to a URL (for output)

Implements CAccessor.

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual pair<bool,TID> CAcInvertedFile::URLToID ( const string &  inURL  )  const [pure virtual]

Translate a URL to its document ID

Implements CAccessor.

Implemented in CAcIFFileSystem, and CAcIFMeta.
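
The pair<bool,TID> return type lets a caller tell an unknown URL apart from a valid ID. A minimal sketch of that idiom follows. The UrlTable type, the function name urlToID and the TID typedef are assumptions made for illustration and are not taken from the GIFT sources.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef long TID;  // assumption: the real TID typedef is not shown on this page

// Hypothetical stand-in for whatever table the accessor keeps internally.
typedef std::map<std::string, TID> UrlTable;

// Returns (true, id) if inURL is known, (false, 0) otherwise.
std::pair<bool, TID> urlToID(const UrlTable& inTable, const std::string& inURL) {
  UrlTable::const_iterator found = inTable.find(inURL);
  if (found == inTable.end()) {
    return std::make_pair(false, TID(0));
  }
  return std::make_pair(true, found->second);
}
```

A caller would test the first member before trusting the second, which avoids reserving a magic ID value for "not found".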

virtual CDocumentFrequencyList* CAcInvertedFile::FeatureToList ( TFeatureID  inFID  )  const [pure virtual]

Give the List of documents containing the feature inFID

Returns:
list of ID/frequency pairs struct{ int mID; float mFrequency; }

Implemented in CAcIFFileSystem, and CAcIFMeta.
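
To illustrate how a feature-to-offset map and a list of ID/frequency pairs can work together, here is a minimal sketch of a "by hand" lookup: find the offset for a feature, seek there, and read the list. The record layout (a count followed by that many ID/frequency pairs in text form), the use of std::unordered_map in place of hash_map, and returning a vector by value instead of a CDocumentFrequencyList pointer are all assumptions of this sketch, not the actual file format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <unordered_map>
#include <vector>

typedef unsigned int TID;  // assumption: the real typedef is not shown on this page

struct Posting { int mID; float mFrequency; };

// Modern stand-in for hash_map<TID, unsigned int>.
typedef std::unordered_map<TID, unsigned int> IDToOffset;

// Hypothetical record layout: "<count> <id> <freq> <id> <freq> ..." at each offset.
std::vector<Posting> featureToList(std::istream& inFile,
                                   const IDToOffset& inOffsets,
                                   TID inFID) {
  std::vector<Posting> result;
  IDToOffset::const_iterator found = inOffsets.find(inFID);
  if (found == inOffsets.end()) {
    return result;  // unknown feature: empty list
  }
  inFile.clear();  // drop any eof/fail state left by a previous read
  inFile.seekg(static_cast<std::streamoff>(found->second), std::ios::beg);
  std::size_t count = 0;
  if (!(inFile >> count)) {
    return result;
  }
  for (std::size_t i = 0; i < count; ++i) {
    Posting p;
    if (!(inFile >> p.mID >> p.mFrequency)) {
      break;  // truncated record: return what was read so far
    }
    result.push_back(p);
  }
  return result;
}
```

Any seekable std::istream works here, so a std::istringstream can stand in for the inverted file when experimenting.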

virtual CDocumentFrequencyList* CAcInvertedFile::URLToFeatureList ( string  inURL  )  const [pure virtual]

List of features contained by a document with URL inURL

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual CDocumentFrequencyList* CAcInvertedFile::DIDToFeatureList ( TID  inDID  )  const [pure virtual]

List of features contained by a document with ID inDID

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual double CAcInvertedFile::FeatureToCollectionFrequency ( TFeatureID   )  const [pure virtual]

virtual unsigned int CAcInvertedFile::getFeatureDescription ( TID  inFeatureID  )  const [pure virtual]

What kind of feature is the feature with ID inFeatureID?

Implemented in CAcIFFileSystem, and CAcIFMeta.

Referenced by CWeightingFunction::setID().

virtual double CAcInvertedFile::DIDToMaxDocumentFrequency ( TID   )  const [pure virtual]

returns the maximum document frequency for one document ID

Implemented in CAcIFFileSystem, and CAcIFMeta.

Referenced by CWFBestProbabilistic::apply().

virtual double CAcInvertedFile::DIDToDFSquareSum ( TID   )  const [pure virtual]

Returns the document-frequency square sum for a given document ID

Implemented in CAcIFFileSystem, and CAcIFMeta.

Referenced by CWFStandardTF::apply().
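
The two per-document statistics above can be sketched as follows. The sketch assumes that the "document-frequency square sum" is the sum of the squared frequencies over one document's feature list. That reading, the FeatureFrequency struct and both function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element of one document's feature list.
struct FeatureFrequency { int mID; float mFrequency; };

// Largest document frequency in the list (0 for an empty list).
double maxDocumentFrequency(const std::vector<FeatureFrequency>& inList) {
  double best = 0.0;
  for (std::size_t i = 0; i < inList.size(); ++i) {
    if (inList[i].mFrequency > best) {
      best = inList[i].mFrequency;
    }
  }
  return best;
}

// Sum of squared document frequencies, assuming that is what "DFSquareSum" denotes.
double dfSquareSum(const std::vector<FeatureFrequency>& inList) {
  double sum = 0.0;
  for (std::size_t i = 0; i < inList.size(); ++i) {
    const double f = inList[i].mFrequency;
    sum += f * f;
  }
  return sum;
}
```

Values like these are typically computed once and cached per document, since weighting functions need them for every scored document.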

virtual double CAcInvertedFile::DIDToSquareDFLogICFSum ( TID   )  const [pure virtual]

Returns the SquareDFLogICFSum value for a given document ID (going by the name, a sum of squared DF * log(ICF) terms)

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual bool CAcInvertedFile::generateInvertedFile (  )  [pure virtual]

Generates an inverted file if there is none.

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual bool CAcInvertedFile::checkConsistency (  )  [pure virtual]

Check the consistency of the inverted file system accessed by this accessor.

Implemented in CAcIFFileSystem, and CAcIFMeta.

virtual list<TID>* CAcInvertedFile::getAllFeatureIDs (  )  const [pure virtual]

Returns a list of all features contained in this accessor. This function is necessary because in the present system only about 50 percent of the features are actually used.

A feature is considered used if it arises in at least one image.

Implemented in CAcIFFileSystem, and CAcIFMeta.

void CAcInvertedFile::writeOffsetFileElement ( TID  inFeatureID,
int  inPosition,
ostream &  inOpenOffsetFile 
) [protected]

add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction)

CDocumentFrequencyList* CAcInvertedFile::getFeatureFile ( string  inFileName  )  const [protected]

Loads a *.fts file and returns the feature list.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

bool CAcInvertedFile::operator() (  )  const

for testing if the inverted file is correctly constructed

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

bool CAcInvertedFile::init ( bool   ) 

called by constructors

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

string CAcInvertedFile::IDToURL ( TID  inID  )  const [virtual]

Translate a DocumentID to a URL (for output)

Implements CAccessor.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

TID CAcInvertedFile::URLToID ( const string &  inURL  )  const [virtual]

Translate a URL to its document ID

Implements CAccessor.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

CDocumentFrequencyList* CAcInvertedFile::FeatureToList ( TFeatureID   )  const

List of documents containing the feature

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

CDocumentFrequencyList* CAcInvertedFile::URLToFeatureList ( string  inURL  )  const

List of features contained by a document

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

CDocumentFrequencyList* CAcInvertedFile::DIDToFeatureList ( TID  inDID  )  const

List of features contained by a document with ID inDID

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

double CAcInvertedFile::FeatureToCollectionFrequency ( TFeatureID   )  const

Collection frequency for a given feature

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

unsigned int CAcInvertedFile::getFeatureDescription ( TID  inFeatureID  )  const

What kind of feature is the feature with ID inFeatureID?

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

double CAcInvertedFile::DIDToMaxDocumentFrequency ( TID   )  const

returns the maximum document frequency for one document ID

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

double CAcInvertedFile::DIDToDFSquareSum ( TID   )  const

Returns the document-frequency square sum for a given document ID

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

double CAcInvertedFile::DIDToSquareDFLogICFSum ( TID   )  const

Returns the SquareDFLogICFSum value for a given document ID (going by the name, a sum of squared DF * log(ICF) terms)

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

bool CAcInvertedFile::generateInvertedFile (  ) 

Generates an inverted file if there is none. Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, the result is extensive swapping, which virtually halts the inverted file creation.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.
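
The in-memory approach can be sketched as a single pass that turns document-to-feature lists into feature-to-document lists. The types and the function name invertInMemory are hypothetical; the sketch only shows why everything has to fit in memory at once.

```cpp
#include <cassert>
#include <map>
#include <vector>

typedef unsigned int TID;  // assumption: the real typedef is not shown on this page

struct Posting { TID mID; float mFrequency; };

// Document ID -> list of (feature ID, frequency).
typedef std::map<TID, std::vector<Posting> > FeatureLists;
// Feature ID -> list of (document ID, frequency).
typedef std::map<TID, std::vector<Posting> > InvertedFile;

// Builds the complete inverted structure in memory in one pass.
InvertedFile invertInMemory(const FeatureLists& inDocuments) {
  InvertedFile inverted;
  for (const auto& document : inDocuments) {
    for (const Posting& feature : document.second) {
      Posting entry = {document.first, feature.mFrequency};
      inverted[feature.mID].push_back(entry);
    }
  }
  return inverted;
}
```

Because every list grows in place until the last document is read, memory use is proportional to the full inverted file, which matches the swapping behaviour described above.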

bool CAcInvertedFile::newGenerateInvertedFile (  ) 

Generates an inverted file if there is none.

Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, "Sort-based inversion" (page 181).

Reimplemented in CAcIFFileSystem, and CAcIFMeta.
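
A rough sketch in the spirit of sort-based inversion with two-way merging: split the (feature, document, frequency) triples into bounded runs, sort each run, then merge runs pairwise until one remains. A disk-based version would keep the runs in temporary files; here they are in-memory vectors to keep the example short. The Triple struct and the function names are assumptions, and this is not the GIFT implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct Triple { unsigned int mFeatureID; unsigned int mDocumentID; float mFrequency; };

// Order by feature ID first, then by document ID.
bool tripleLess(const Triple& a, const Triple& b) {
  if (a.mFeatureID != b.mFeatureID) return a.mFeatureID < b.mFeatureID;
  return a.mDocumentID < b.mDocumentID;
}

typedef std::vector<Triple> Run;

// Cuts the input into runs of at most inRunSize triples and sorts each run.
std::vector<Run> makeSortedRuns(const std::vector<Triple>& inTriples, std::size_t inRunSize) {
  assert(inRunSize > 0);
  std::vector<Run> runs;
  for (std::size_t begin = 0; begin < inTriples.size(); begin += inRunSize) {
    const std::size_t end = std::min(begin + inRunSize, inTriples.size());
    Run run(inTriples.begin() + begin, inTriples.begin() + end);
    std::sort(run.begin(), run.end(), tripleLess);
    runs.push_back(run);
  }
  return runs;
}

// Merges neighbouring runs two at a time until a single sorted run is left.
Run mergeAll(std::vector<Run> inRuns) {
  while (inRuns.size() > 1) {
    std::vector<Run> next;
    for (std::size_t i = 0; i + 1 < inRuns.size(); i += 2) {
      Run merged;
      std::merge(inRuns[i].begin(), inRuns[i].end(),
                 inRuns[i + 1].begin(), inRuns[i + 1].end(),
                 std::back_inserter(merged), tripleLess);
      next.push_back(merged);
    }
    if (inRuns.size() % 2 == 1) {
      next.push_back(inRuns.back());  // odd run out is carried to the next pass
    }
    inRuns.swap(next);
  }
  return inRuns.empty() ? Run() : inRuns.front();
}
```

Only one run has to be sorted in memory at a time, and merging reads its inputs sequentially, which is what makes the approach workable when the inverted file does not fit in RAM.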

bool CAcInvertedFile::checkConsistency (  ) 

Check the consistency of the inverted file system accessed by this accessor.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

bool CAcInvertedFile::findWithinStream ( TID  inFeatureID,
TID  inDocumentID,
double  inDocumentFrequency 
) const

Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?

Reimplemented in CAcIFFileSystem, and CAcIFMeta.
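
The check described above can be sketched against an in-memory list rather than a stream. The function name findInList, the Posting struct and the small tolerance used when comparing a stored float with a double argument are assumptions of this sketch; the original may compare differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Posting { int mID; float mFrequency; };

// True if inList holds inDocumentID with (almost) the same frequency.
bool findInList(const std::vector<Posting>& inList,
                int inDocumentID,
                double inDocumentFrequency) {
  const double tolerance = 1e-6;  // assumed: absorbs float/double rounding
  for (std::size_t i = 0; i < inList.size(); ++i) {
    if (inList[i].mID == inDocumentID) {
      return std::fabs(inList[i].mFrequency - inDocumentFrequency) < tolerance;
    }
  }
  return false;  // document not present in this feature's list
}
```

A check like this is useful for consistency testing: every entry of a document's feature list should be found again in the corresponding feature's document list.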

TID CAcInvertedFile::getMaximumFeatureID (  )  const

This is interesting for browsing

Reimplemented in CAcIFFileSystem, and CAcIFMeta.

list<TID>* CAcInvertedFile::getAllFeatureIDs (  )  const

Returns a list of all features contained in this accessor. This function is necessary because in the present system only about 50 percent of the features are actually used.

A feature is considered used if it arises in mIDToOffset.

Reimplemented in CAcIFFileSystem, and CAcIFMeta.
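
Since a feature counts as used when it has an entry in the offset map, collecting all used feature IDs amounts to collecting the keys of that map. A minimal sketch follows, with std::unordered_map standing in for hash_map, a by-value return instead of a list<TID> pointer, and a final sort added only to make the order predictable. All of these are assumptions of the sketch.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>

typedef unsigned int TID;  // assumption: the real typedef is not shown on this page

// Modern stand-in for hash_map<TID, unsigned int>.
typedef std::unordered_map<TID, unsigned int> IDToOffset;

// Collects every feature ID that has an offset entry.
std::list<TID> allFeatureIDs(const IDToOffset& inOffsets) {
  std::list<TID> ids;
  for (const auto& entry : inOffsets) {
    ids.push_back(entry.first);
  }
  ids.sort();  // hash maps iterate in unspecified order
  return ids;
}
```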


Member Data Documentation

TID CAcInvertedFile::mMaximumFeatureID [protected]

the maximum feature ID arising in this file

Reimplemented in CAcIFFileSystem.

Definition at line 87 of file CAcSQLInvertedFile.h.

CArraySelfDestroyPointer<char> CAcInvertedFile::mInvertedFileBuffer [protected]

A buffer, if the inverted file is to be held in RAM

Reimplemented in CAcIFFileSystem.

Definition at line 90 of file CAcSQLInvertedFile.h.

CSelfDestroyPointer<istream> CAcInvertedFile::mInvertedFile [mutable, protected]

The inverted file

Reimplemented in CAcIFFileSystem.

Definition at line 92 of file CAcSQLInvertedFile.h.

ifstream CAcInvertedFile::mOffsetFile [mutable, protected]

Feature -> Offset in inverted file

Reimplemented in CAcIFFileSystem.

Definition at line 95 of file CAcSQLInvertedFile.h.

ifstream CAcInvertedFile::mFeatureDescriptionFile [protected]

File of feature descriptions

Reimplemented in CAcIFFileSystem.

Definition at line 98 of file CAcSQLInvertedFile.h.

string CAcInvertedFile::mInvertedFileName [protected]

Name of the inverted file

Reimplemented in CAcIFFileSystem.

Definition at line 101 of file CAcSQLInvertedFile.h.

string CAcInvertedFile::mOffsetFileName [protected]

Name of the offset file

Reimplemented in CAcIFFileSystem.

Definition at line 104 of file CAcSQLInvertedFile.h.

string CAcInvertedFile::mFeatureDescriptionFileName [protected]

Name of the file with the feature description

Reimplemented in CAcIFFileSystem.

Definition at line 107 of file CAcSQLInvertedFile.h.

CIDToOffset CAcInvertedFile::mIDToOffset [protected]

map from feature id to the offset for this feature

Reimplemented in CAcIFFileSystem.

Definition at line 112 of file CAcSQLInvertedFile.h.

hash_map<TID,double> CAcInvertedFile::mFeatureToCollectionFrequency [mutable, protected]

map from feature to the collection frequency

Reimplemented in CAcIFFileSystem.

Definition at line 115 of file CAcSQLInvertedFile.h.

hash_map<TID,unsigned int> CAcInvertedFile::mFeatureDescription [protected]

map from the feature ID to the feature description

Reimplemented in CAcIFFileSystem.

Definition at line 120 of file CAcSQLInvertedFile.h.

CADIHash CAcInvertedFile::mDocumentInformation [protected]

Additional information about the document, e.g. the Euclidean length of the feature list.

Reimplemented in CAcIFFileSystem.

Definition at line 125 of file CAcSQLInvertedFile.h.


The documentation for this class was generated from the following files:

Generated on Wed Jan 7 00:31:03 2009 for Gift by  doxygen 1.5.6