#include <CAcInvertedFile.h>

Public Member Functions | |
| virtual bool | operator() () const =0 |
| virtual string | IDToURL (TID inID) const =0 |
| virtual pair< bool, TID > | URLToID (const string &inURL) const =0 |
| virtual list< TID > * | getAllFeatureIDs () const =0 |
| bool | operator() () const |
| CAcInvertedFile (const CXMLElement &inCollectionElement) | |
| bool | init (bool) |
| ~CAcInvertedFile () | |
| string | IDToURL (TID inID) const |
| TID | URLToID (const string &inURL) const |
| TID | getMaximumFeatureID () const |
| list< TID > * | getAllFeatureIDs () const |
The proper inverted file access | |
| virtual CDocumentFrequencyList * | FeatureToList (TFeatureID inFID) const =0 |
| virtual CDocumentFrequencyList * | URLToFeatureList (string inURL) const =0 |
| virtual CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const =0 |
Accessing information about features | |
| virtual double | FeatureToCollectionFrequency (TFeatureID) const =0 |
| virtual unsigned int | getFeatureDescription (TID inFeatureID) const =0 |
Accessing additional document information | |
| virtual double | DIDToMaxDocumentFrequency (TID) const =0 |
| virtual double | DIDToDFSquareSum (TID) const =0 |
| virtual double | DIDToSquareDFLogICFSum (TID) const =0 |
| virtual bool | generateInvertedFile ()=0 |
| virtual bool | checkConsistency ()=0 |
The proper inverted file access | |
| CDocumentFrequencyList * | FeatureToList (TFeatureID) const |
| CDocumentFrequencyList * | URLToFeatureList (string inURL) const |
| CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const |
Accessing information about features | |
| double | FeatureToCollectionFrequency (TFeatureID) const |
| unsigned int | getFeatureDescription (TID inFeatureID) const |
Accessing additional document information | |
| double | DIDToMaxDocumentFrequency (TID) const |
| double | DIDToDFSquareSum (TID) const |
| double | DIDToSquareDFLogICFSum (TID) const |
| bool | generateInvertedFile () |
| bool | newGenerateInvertedFile () |
| bool | checkConsistency () |
| bool | findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const |
Protected Types | |
| typedef hash_map< TID, unsigned int > | CIDToOffset |
Protected Member Functions | |
| void | writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile) |
| CDocumentFrequencyList * | getFeatureFile (string inFileName) const |
Protected Attributes | |
| TID | mMaximumFeatureID |
| CArraySelfDestroyPointer< char > | mInvertedFileBuffer |
| CSelfDestroyPointer< istream > | mInvertedFile |
| ifstream | mOffsetFile |
| ifstream | mFeatureDescriptionFile |
| string | mInvertedFileName |
| string | mOffsetFileName |
| string | mFeatureDescriptionFileName |
| CIDToOffset | mIDToOffset |
| hash_map< TID, double > | mFeatureToCollectionFrequency |
for fast access... | |
| hash_map< TID, unsigned int > | mFeatureDescription |
| CADIHash | mDocumentInformation |
An accessor to an inverted file. At present, access is done "by hand", which is not really efficient; however, we plan to move to memory-mapped files.
The above content is pretty old. As of 20070703, memory-mapped files are becoming interesting again now that many people use 64-bit systems; on 32-bit systems, memory-mapped files place a severe limit on the size of inverted files. As an aside, there is probably more time to be saved by reducing the inverted file size.
Definition at line 90 of file CAcInvertedFile.h.
typedef hash_map<TID,unsigned int> CAcInvertedFile::CIDToOffset [protected] |
map from feature id to the offset for this feature
Reimplemented in CAcIFFileSystem.
Definition at line 110 of file CAcSQLInvertedFile.h.
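The offset map is what makes per-feature access cheap: instead of scanning the inverted file, the accessor can seek directly to one feature's list. The following C++ sketch illustrates that idea under stated assumptions and is not GIFT's implementation. It assumes TID is an unsigned integer, substitutes std::unordered_map for the pre-standard hash_map, and invents a simple binary layout for each list (a 32-bit count followed by raw document ID and frequency values). Posting, appendPostings and readPostings are hypothetical names.

```cpp
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for the inverted file
#include <unordered_map>
#include <vector>

// Assumption: TID is an unsigned integer type (GIFT has its own typedef).
using TID = unsigned int;
// Stand-in for CIDToOffset; std::unordered_map replaces the old hash_map.
using CIDToOffset = std::unordered_map<TID, unsigned int>;

// Hypothetical posting: one document containing a feature, with its frequency.
struct Posting {
  TID documentID;
  double frequency;
};

// Appends one feature's list in a made-up binary layout (count, then pairs)
// and returns the offset at which that list starts.
unsigned int appendPostings(std::ostream& ioFile, const std::vector<Posting>& inList) {
  const auto offset =
      static_cast<unsigned int>(static_cast<std::streamoff>(ioFile.tellp()));
  const auto count = static_cast<std::uint32_t>(inList.size());
  ioFile.write(reinterpret_cast<const char*>(&count), sizeof count);
  for (const Posting& p : inList) {
    ioFile.write(reinterpret_cast<const char*>(&p.documentID), sizeof p.documentID);
    ioFile.write(reinterpret_cast<const char*>(&p.frequency), sizeof p.frequency);
  }
  return offset;
}

// Looks up the feature's offset, seeks there and reads only that one list.
std::vector<Posting> readPostings(std::istream& inFile, const CIDToOffset& inIDToOffset,
                                  TID inFeatureID) {
  std::vector<Posting> result;
  const auto it = inIDToOffset.find(inFeatureID);
  if (it == inIDToOffset.end()) return result;  // feature not in the file
  inFile.clear();                               // reset flags from earlier reads
  inFile.seekg(static_cast<std::streamoff>(it->second));
  std::uint32_t count = 0;
  inFile.read(reinterpret_cast<char*>(&count), sizeof count);
  for (std::uint32_t i = 0; i < count && inFile; ++i) {
    Posting p{};
    inFile.read(reinterpret_cast<char*>(&p.documentID), sizeof p.documentID);
    inFile.read(reinterpret_cast<char*>(&p.frequency), sizeof p.frequency);
    if (inFile) result.push_back(p);
  }
  return result;
}
```

A raw binary layout like this one is not portable between platforms with different type sizes or byte orders; it only serves to show the seek-by-offset access pattern.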
| CAcInvertedFile::CAcInvertedFile | ( | const CXMLElement & | inCollectionElement | ) |
This opens an existing inverted file and then initializes this structure. After that, it is fully usable.
As a parameter, it takes a CXMLElement that contains a "collection" element and its content.
If the attribute vi-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.
The REAL constructor.
| CAcInvertedFile::~CAcInvertedFile | ( | ) |
Destructor
| virtual bool CAcInvertedFile::operator() | ( | ) | const [pure virtual] |
for testing if the inverted file is correctly constructed
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual string CAcInvertedFile::IDToURL | ( | TID | inID | ) | const [pure virtual] |
<HERE-IT-GETS-INTERESTING-> Translate a DocumentID to a URL (for output)
Implements CAccessor.
Implemented in CAcIFFileSystem, and CAcIFMeta.
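| virtual pair<bool,TID> CAcInvertedFile::URLToID | ( | const string & | inURL | ) | const [pure virtual] |
Translate a URL to its document ID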
Implements CAccessor.
Implemented in CAcIFFileSystem, and CAcIFMeta.
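A minimal sketch of the two translations, assuming the accessor keeps one hash table per direction. The class URLTable and its add method are hypothetical, and TID is assumed to be an unsigned integer. It mirrors the pure virtual URLToID signature, where the bool reports whether the URL is known at all.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

using TID = unsigned int;  // assumption: GIFT defines its own TID type

// Hypothetical stand-in for the URL <-> document ID tables of an accessor.
class URLTable {
 public:
  // Registers a URL (if new) and returns its document ID; IDs start at 1 here.
  TID add(const std::string& inURL) {
    const auto found = mURLToID.find(inURL);
    if (found != mURLToID.end()) return found->second;
    const TID id = static_cast<TID>(mIDToURL.size()) + 1;
    mURLToID.emplace(inURL, id);
    mIDToURL.emplace(id, inURL);
    return id;
  }

  // Same shape as the pure virtual URLToID: first == false means "unknown URL".
  std::pair<bool, TID> URLToID(const std::string& inURL) const {
    const auto it = mURLToID.find(inURL);
    if (it == mURLToID.end()) return {false, 0};
    return {true, it->second};
  }

  // Returns an empty string for unknown IDs (a choice made for this sketch).
  std::string IDToURL(TID inID) const {
    const auto it = mIDToURL.find(inID);
    return it == mIDToURL.end() ? std::string() : it->second;
  }

 private:
  std::unordered_map<std::string, TID> mURLToID;
  std::unordered_map<TID, std::string> mIDToURL;
};
```

Returning a pair instead of a bare TID lets callers distinguish "unknown URL" from any valid ID without reserving a sentinel value.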
| virtual CDocumentFrequencyList* CAcInvertedFile::FeatureToList | ( | TFeatureID | inFID | ) | const [pure virtual] |
Gives the list of documents containing the feature inFID
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual CDocumentFrequencyList* CAcInvertedFile::URLToFeatureList | ( | string | inURL | ) | const [pure virtual] |
List of features contained by a document with URL inURL
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual CDocumentFrequencyList* CAcInvertedFile::DIDToFeatureList | ( | TID | inDID | ) | const [pure virtual] |
List of features contained by a document with ID inDID
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual double CAcInvertedFile::FeatureToCollectionFrequency | ( | TFeatureID | ) | const [pure virtual] |
Collection frequency for a given feature
Implemented in CAcIFFileSystem, and CAcIFMeta.
Referenced by CQNSquareDFLogICFSum::considerQueryFeature(), CSortByDFTimesLogICF_WF::operator()(), CWFClassicalIDF::preCalculate(), CWFBinaryTerm::subApply(), CWFBestProbabilistic::subApply(), and CWFBestFullyWeighted::subApply().
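The collection frequency can be computed in a single pass over the documents' feature lists. This sketch assumes it is the fraction of documents whose feature list contains the feature, so that log(1/cf) behaves like an inverse document frequency; that definition, the function name collectionFrequencies and the set-based input are assumptions for illustration.

```cpp
#include <map>
#include <set>
#include <vector>

using TID = unsigned int;  // assumption: GIFT defines its own TID type

// Assumed definition: cf(f) = (documents containing f) / (number of documents).
std::map<TID, double> collectionFrequencies(
    const std::vector<std::set<TID>>& inDocumentFeatures) {
  std::map<TID, double> result;
  if (inDocumentFeatures.empty()) return result;
  for (const auto& features : inDocumentFeatures)
    for (TID feature : features) result[feature] += 1.0;  // one count per document
  const double documentCount = static_cast<double>(inDocumentFeatures.size());
  for (auto& entry : result) entry.second /= documentCount;
  return result;
}
```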
| virtual unsigned int CAcInvertedFile::getFeatureDescription | ( | TID | inFeatureID | ) | const [pure virtual] |
What kind of feature is the feature with ID inFeatureID?
Implemented in CAcIFFileSystem, and CAcIFMeta.
Referenced by CWeightingFunction::setID().
| virtual double CAcInvertedFile::DIDToMaxDocumentFrequency | ( | TID | ) | const [pure virtual] |
returns the maximum document frequency for one document ID
Implemented in CAcIFFileSystem, and CAcIFMeta.
Referenced by CWFBestProbabilistic::apply().
| virtual double CAcInvertedFile::DIDToDFSquareSum | ( | TID | ) | const [pure virtual] |
Returns the document-frequency square sum for a given document ID
Implemented in CAcIFFileSystem, and CAcIFMeta.
Referenced by CWFStandardTF::apply().
| virtual double CAcInvertedFile::DIDToSquareDFLogICFSum | ( | TID | ) | const [pure virtual] |
Returns the square DF log ICF sum for a given document ID
Implemented in CAcIFFileSystem, and CAcIFMeta.
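The three per-document quantities above can all be precomputed in one pass over a document's feature list, which is the kind of information mDocumentInformation caches. This sketch is hypothetical: DocumentInformation and computeDocumentInformation are illustrative names, TID is assumed to be an unsigned integer, and the square DF log ICF sum is assumed to be the sum over the document's features of (df * log(1/cf))^2, a formula inferred from the function name only.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

using TID = unsigned int;                     // assumption
using FeatureEntry = std::pair<TID, double>;  // (feature ID, document frequency)

// Hypothetical counterpart of one entry of mDocumentInformation.
struct DocumentInformation {
  double maxDocumentFrequency = 0.0;  // DIDToMaxDocumentFrequency
  double dfSquareSum = 0.0;           // DIDToDFSquareSum: sum of df^2
  double squareDFLogICFSum = 0.0;     // assumed: sum of (df * log(1/cf))^2
};

DocumentInformation computeDocumentInformation(
    const std::vector<FeatureEntry>& inFeatureList,
    const std::map<TID, double>& inCollectionFrequency) {
  DocumentInformation info;
  for (const FeatureEntry& entry : inFeatureList) {
    const double df = entry.second;
    info.maxDocumentFrequency = std::max(info.maxDocumentFrequency, df);
    info.dfSquareSum += df * df;
    const auto cf = inCollectionFrequency.find(entry.first);
    if (cf != inCollectionFrequency.end() && cf->second > 0.0) {  // skip unknown features
      const double weight = df * std::log(1.0 / cf->second);
      info.squareDFLogICFSum += weight * weight;
    }
  }
  return info;
}
```

Precomputing these values once per document means a weighting function only needs a hash lookup at query time instead of a pass over the whole feature list.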
| virtual bool CAcInvertedFile::generateInvertedFile | ( | ) | [pure virtual] |
Generates an inverted file if there is none.
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual bool CAcInvertedFile::checkConsistency | ( | ) | [pure virtual] |
Check the consistency of the inverted file system accessed by this accessor.
Implemented in CAcIFFileSystem, and CAcIFMeta.
| virtual list<TID>* CAcInvertedFile::getAllFeatureIDs | ( | ) | const [pure virtual] |
Returns a list of all features contained in this inverted file. This function is necessary because, in the present system, only about 50 percent of the features are actually used.
A feature is considered used if it occurs in at least one image.
Implemented in CAcIFFileSystem, and CAcIFMeta.
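Restricting the feature list to used features amounts to filtering out empty posting lists. A minimal sketch, assuming a hypothetical in-memory index that maps each feature ID to its (document ID, document frequency) postings and TID being an unsigned integer; unlike the documented interface, which returns list<TID>*, it returns the list by value.

```cpp
#include <list>
#include <map>
#include <utility>
#include <vector>

using TID = unsigned int;  // assumption
// Hypothetical in-memory index: feature ID -> (document ID, document frequency) list.
using InvertedIndex = std::map<TID, std::vector<std::pair<TID, double>>>;

// A feature counts as used if its list names at least one document.
std::list<TID> getAllFeatureIDs(const InvertedIndex& inIndex) {
  std::list<TID> result;
  for (const auto& entry : inIndex)
    if (!entry.second.empty()) result.push_back(entry.first);
  return result;
}
```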
| void CAcInvertedFile::writeOffsetFileElement | ( | TID | inFeatureID, | |
| int | inPosition, | |||
| ostream & | inOpenOffsetFile | |||
| ) | [protected] |
Adds a (feature ID, offset) pair to the open offset file (helper function for inverted file construction).
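The offset file only has to record, for each feature, where its list starts in the inverted file. The record format is not documented here, so this sketch assumes the raw bytes of a TID followed by the raw bytes of an int, with TID assumed to be an unsigned integer; readOffsetFile is a hypothetical counterpart that rebuilds the feature-to-offset map (compare CIDToOffset).

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for the offset file
#include <unordered_map>

using TID = unsigned int;  // assumption

// Assumed record layout: raw TID bytes followed by raw int bytes.
void writeOffsetFileElement(TID inFeatureID, int inPosition, std::ostream& inOpenOffsetFile) {
  inOpenOffsetFile.write(reinterpret_cast<const char*>(&inFeatureID), sizeof inFeatureID);
  inOpenOffsetFile.write(reinterpret_cast<const char*>(&inPosition), sizeof inPosition);
}

// Hypothetical reader: rebuilds the feature -> offset map from all records.
std::unordered_map<TID, unsigned int> readOffsetFile(std::istream& inOffsetFile) {
  std::unordered_map<TID, unsigned int> offsets;
  TID featureID = 0;
  int position = 0;
  while (inOffsetFile.read(reinterpret_cast<char*>(&featureID), sizeof featureID) &&
         inOffsetFile.read(reinterpret_cast<char*>(&position), sizeof position)) {
    offsets[featureID] = static_cast<unsigned int>(position);
  }
  return offsets;
}
```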
| CDocumentFrequencyList* CAcInvertedFile::getFeatureFile | ( | string | inFileName | ) | const [protected] |
Loads a *.fts file and returns its feature list.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| bool CAcInvertedFile::operator() | ( | ) | const |
for testing if the inverted file is correctly constructed
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| bool CAcInvertedFile::init | ( | bool | ) |
Called by constructors.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| string CAcInvertedFile::IDToURL | ( | TID | inID | ) | const [virtual] |
Translate a DocumentID to a URL (for output)
Implements CAccessor.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| TID CAcInvertedFile::URLToID | ( | const string & | inURL | ) | const [virtual] |
Translate a URL to its document ID
Implements CAccessor.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| CDocumentFrequencyList* CAcInvertedFile::FeatureToList | ( | TFeatureID | ) | const |
List of documents containing the feature
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| CDocumentFrequencyList* CAcInvertedFile::URLToFeatureList | ( | string | inURL | ) | const |
List of features contained by a document
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| CDocumentFrequencyList* CAcInvertedFile::DIDToFeatureList | ( | TID | inDID | ) | const |
List of features contained by a document with ID inDID
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| double CAcInvertedFile::FeatureToCollectionFrequency | ( | TFeatureID | ) | const |
Collection frequency for a given feature
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| unsigned int CAcInvertedFile::getFeatureDescription | ( | TID | inFeatureID | ) | const |
What kind of feature is the feature with ID inFeatureID?
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| double CAcInvertedFile::DIDToMaxDocumentFrequency | ( | TID | ) | const |
returns the maximum document frequency for one document ID
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| double CAcInvertedFile::DIDToDFSquareSum | ( | TID | ) | const |
Returns the document-frequency square sum for a given document ID
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| double CAcInvertedFile::DIDToSquareDFLogICFSum | ( | TID | ) | const |
Returns the square DF log ICF sum for a given document ID
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| bool CAcInvertedFile::generateInvertedFile | ( | ) |
Generates an inverted file if there is none, using a fast but naive in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If not, the result is extensive swapping, which virtually halts inverted file creation.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
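The in-memory method boils down to inverting a document-to-features map into a feature-to-documents map while everything stays in RAM, which is why memory use grows with the whole inverted file. A minimal sketch with hypothetical names (invertInMemory, Entry) and TID assumed to be an unsigned integer; because std::map visits documents in ascending ID order, every resulting list comes out sorted by document ID.

```cpp
#include <map>
#include <utility>
#include <vector>

using TID = unsigned int;              // assumption
using Entry = std::pair<TID, double>;  // (ID, document frequency)

// Every (document, feature, df) triple is held in memory at once.
std::map<TID, std::vector<Entry>> invertInMemory(
    const std::map<TID, std::vector<Entry>>& inDocumentToFeatures) {
  std::map<TID, std::vector<Entry>> featureToDocuments;
  for (const auto& document : inDocumentToFeatures)
    for (const Entry& feature : document.second)
      featureToDocuments[feature.first].emplace_back(document.first, feature.second);
  return featureToDocuments;
}
```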
| bool CAcInvertedFile::newGenerateInvertedFile | ( | ) |
Generates an inverted file if there is none.
Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, "Sort-based inversion" (page 181).
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
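Sort-based inversion avoids holding everything in memory: (feature, document, frequency) triples are cut into runs that fit in memory, each run is sorted, and sorted runs are merged two at a time until one run remains, which is then in inverted-file order. The sketch below shows the general technique, not GIFT's code; vectors stand in for the temporary files, TID is assumed to be an unsigned integer, and sortBasedInversion and Triple are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

using TID = unsigned int;  // assumption
// (feature ID, document ID, document frequency): tuples sort by feature first,
// then by document, which is the order of an inverted file.
using Triple = std::tuple<TID, TID, double>;

std::vector<Triple> sortBasedInversion(const std::vector<Triple>& inTriples,
                                       std::size_t inRunLength) {
  if (inRunLength == 0) inRunLength = 1;
  std::vector<std::vector<Triple>> runs;
  // Phase 1: cut the input into runs small enough to sort in memory.
  for (std::size_t begin = 0; begin < inTriples.size(); begin += inRunLength) {
    const std::size_t end = std::min(inTriples.size(), begin + inRunLength);
    std::vector<Triple> run(inTriples.begin() + static_cast<std::ptrdiff_t>(begin),
                            inTriples.begin() + static_cast<std::ptrdiff_t>(end));
    std::sort(run.begin(), run.end());
    runs.push_back(std::move(run));
  }
  // Phase 2: merge runs pairwise until a single sorted run is left.
  while (runs.size() > 1) {
    std::vector<std::vector<Triple>> next;
    for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
      std::vector<Triple> merged;
      merged.reserve(runs[i].size() + runs[i + 1].size());
      std::merge(runs[i].begin(), runs[i].end(), runs[i + 1].begin(), runs[i + 1].end(),
                 std::back_inserter(merged));
      next.push_back(std::move(merged));
    }
    if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));  // odd run out
    runs = std::move(next);
  }
  if (runs.empty()) return {};
  return std::move(runs.front());
}
```

Each merge pass halves the number of runs, so n runs need about log2(n) passes, each reading and writing every triple once.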
| bool CAcInvertedFile::checkConsistency | ( | ) |
Check the consistency of the inverted file system accessed by this accessor.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| bool CAcInvertedFile::findWithinStream | ( | TID | inFeatureID, | |
| TID | inDocumentID, | |||
| double | inDocumentFrequency | |||
| ) | const |
Is the document with ID inDocumentID contained in the document frequency list of the feature inFeatureID, and is the associated document frequency the same?
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
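findWithinStream answers a membership question on one posting list, which is a natural building block for a consistency check. In this sketch the stream is replaced by an in-memory map, TID is assumed to be an unsigned integer, findWithinList and checkConsistency are hypothetical names, and the check (every forward-list entry must reappear in the inverted lists with the same frequency) is one plausible notion of consistency rather than the documented one. Frequencies are compared exactly, as the description asks whether they are "the same"; values that round-trip through a text file might need a tolerance.

```cpp
#include <map>
#include <utility>
#include <vector>

using TID = unsigned int;                           // assumption
using Entry = std::pair<TID, double>;               // (ID, document frequency)
using ListMap = std::map<TID, std::vector<Entry>>;  // ID -> list of entries

// Is inDocumentID in the list of inFeatureID, with exactly this frequency?
bool findWithinList(const ListMap& inFeatureToDocuments, TID inFeatureID,
                    TID inDocumentID, double inDocumentFrequency) {
  const auto list = inFeatureToDocuments.find(inFeatureID);
  if (list == inFeatureToDocuments.end()) return false;
  for (const Entry& entry : list->second)
    if (entry.first == inDocumentID) return entry.second == inDocumentFrequency;
  return false;
}

// One plausible consistency check: every (document, feature, df) entry of the
// forward lists must reappear in the inverted lists.
bool checkConsistency(const ListMap& inDocumentToFeatures,
                      const ListMap& inFeatureToDocuments) {
  for (const auto& document : inDocumentToFeatures)
    for (const Entry& feature : document.second)
      if (!findWithinList(inFeatureToDocuments, feature.first, document.first,
                          feature.second))
        return false;
  return true;
}
```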
| TID CAcInvertedFile::getMaximumFeatureID | ( | ) | const |
Returns the maximum feature ID; this is useful for browsing.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
| list<TID>* CAcInvertedFile::getAllFeatureIDs | ( | ) | const |
Returns a list of all features contained in this inverted file. This function is necessary because, in the present system, only about 50 percent of the features are actually used.
A feature is considered used if it occurs in mIDToOffset.
Reimplemented in CAcIFFileSystem, and CAcIFMeta.
TID CAcInvertedFile::mMaximumFeatureID [protected] |
the maximum feature ID arising in this file
Reimplemented in CAcIFFileSystem.
Definition at line 87 of file CAcSQLInvertedFile.h.
CArraySelfDestroyPointer<char> CAcInvertedFile::mInvertedFileBuffer [protected] |
A buffer used if the inverted file is to be held in RAM
Reimplemented in CAcIFFileSystem.
Definition at line 90 of file CAcSQLInvertedFile.h.
CSelfDestroyPointer<istream> CAcInvertedFile::mInvertedFile [mutable, protected] |
The inverted file
Reimplemented in CAcIFFileSystem.
Definition at line 92 of file CAcSQLInvertedFile.h.
ifstream CAcInvertedFile::mOffsetFile [mutable, protected] |
Feature -> Offset in inverted file
Reimplemented in CAcIFFileSystem.
Definition at line 95 of file CAcSQLInvertedFile.h.
ifstream CAcInvertedFile::mFeatureDescriptionFile [protected] |
File of feature descriptions
Reimplemented in CAcIFFileSystem.
Definition at line 98 of file CAcSQLInvertedFile.h.
string CAcInvertedFile::mInvertedFileName [protected] |
Name of the inverted file
Reimplemented in CAcIFFileSystem.
Definition at line 101 of file CAcSQLInvertedFile.h.
string CAcInvertedFile::mOffsetFileName [protected] |
Name of the offset file
Reimplemented in CAcIFFileSystem.
Definition at line 104 of file CAcSQLInvertedFile.h.
string CAcInvertedFile::mFeatureDescriptionFileName [protected] |
Name for the file with the feature description
Reimplemented in CAcIFFileSystem.
Definition at line 107 of file CAcSQLInvertedFile.h.
CIDToOffset CAcInvertedFile::mIDToOffset [protected] |
map from feature id to the offset for this feature
Reimplemented in CAcIFFileSystem.
Definition at line 112 of file CAcSQLInvertedFile.h.
hash_map<TID,double> CAcInvertedFile::mFeatureToCollectionFrequency [mutable, protected] |
map from feature to the collection frequency
Reimplemented in CAcIFFileSystem.
Definition at line 115 of file CAcSQLInvertedFile.h.
hash_map<TID,unsigned int> CAcInvertedFile::mFeatureDescription [protected] |
map from the feature ID to the feature description
Reimplemented in CAcIFFileSystem.
Definition at line 120 of file CAcSQLInvertedFile.h.
CADIHash CAcInvertedFile::mDocumentInformation [protected] |
Additional information about the document, e.g. the Euclidean length of its feature list.
Reimplemented in CAcIFFileSystem.
Definition at line 125 of file CAcSQLInvertedFile.h.