#include <CQMultiple.h>

Public Member Functions
| | CQMultiple () |
| | ~CQMultiple () |
| | CQMultiple (CAccessorAdminCollection &inAccessorAdminCollection, CAlgorithm &inAlgorithm) |
| virtual CIDRelevanceLevelPairList * | fastQuery (const CXMLElement &inQuery, int inNumberOfInterestingImages, double inDifferenceToBest) |
| virtual CXMLElement * | query (const CXMLElement &inQuery) |
| virtual bool | setAlgorithm (CAlgorithm &inAlgorithm) |
Static Public Member Functions
| static void * | doFastQueryThread (void *) |
| static void * | doQueryThread (void *) |
Protected Member Functions
| void | init () |
Protected Attributes
| bool | mUsesResultURLs |
We will probably put another layer into the class tree, the CQTreeNode, but let's make a start first.
Important: the basic assumption here is that all children operate on the same collections. If this is not the case, we have to be more careful, and above all we have to operate using URLs.
[In the following I am talking about things I WANT to do, so the two-modes feature is not yet implemented.]
CQMultiple has two minor modes: Merge-by-ID and Merge-by-URL.
In the first case, we need information on how to translate image IDs to image URLs. We dispatch a fastQuery() to each child node, and then we merge the results (by image ID). The resulting list of ID-relevance-level pairs is translated back to URLs using a URL2FTS accessor.
Please note that I am aware that this needs refactoring: we should have a URLToID accessor superclass which provides the necessary translation services without being tied to a given data representation.
In the second case, we do not need any additional information: we dispatch a query() (as opposed to a fastQuery()) to the child nodes, and then we merge the results. This means we have to merge plenty of XML.
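The refactoring wished for above, a translation superclass that is not tied to one data representation, might look roughly like the following sketch. The names UrlIdTranslator and InMemoryUrlIdTranslator are hypothetical and are not GIFT classes; the in-memory maps are only one assumed backing store, and the code assumes a C++17 compiler for std::optional.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical interface: translates between image URLs and image IDs
// without committing to any particular storage format.
class UrlIdTranslator {
public:
    virtual ~UrlIdTranslator() = default;
    virtual std::optional<long> urlToId(const std::string& url) const = 0;
    virtual std::optional<std::string> idToUrl(long id) const = 0;
};

// One possible (assumed) implementation: two in-memory maps.
class InMemoryUrlIdTranslator : public UrlIdTranslator {
public:
    void add(long id, const std::string& url) {
        mUrlToId[url] = id;
        mIdToUrl[id] = url;
    }
    std::optional<long> urlToId(const std::string& url) const override {
        auto found = mUrlToId.find(url);
        if (found == mUrlToId.end()) return std::nullopt;
        return found->second;
    }
    std::optional<std::string> idToUrl(long id) const override {
        auto found = mIdToUrl.find(id);
        if (found == mIdToUrl.end()) return std::nullopt;
        return found->second;
    }
private:
    std::map<std::string, long> mUrlToId;
    std::map<long, std::string> mIdToUrl;
};
```

A Merge-by-ID query processor would then depend only on the interface, so the file-backed translation could be swapped out without touching the merging code.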
Author: Wolfgang Müller
Definition at line 118 of file CQMultiple.h.
CQMultiple::CQMultiple ()
CQMultiple::~CQMultiple ()
Destructor. In principle we need to unregister the accessors used here; at present it does nothing except print a debug message.
Definition at line 90 of file CQMultiple.cc.
    {

      cout << "destroying this "
           << __FILE__
           << __LINE__
           << flush
           << endl;

      //i thought i will need this, but at present I do not have this impression
      //it does not hurt, so we leave it in
    };
CQMultiple::CQMultiple (CAccessorAdminCollection & inAccessorAdminCollection, CAlgorithm & inAlgorithm)
Constructor, see CQuery. In addition to what CQuery does, it reads the cui-uses-result-urls attribute of the algorithm and gets itself a url2fts accessor (ACURL2FTS), which is needed to do a proper fastQuery.
Definition at line 68 of file CQMultiple.cc.
References CXMLElement::boolReadAttribute(), CQuery::mAccessor, CQuery::mAccessorAdmin, mUsesResultURLs, and CAccessorAdmin::openAccessor().
      :
      CQuery(inAccessorAdminCollection,
             inAlgorithm){
      {

        pair<bool,bool> lUsesURLs(inAlgorithm.boolReadAttribute("cui-uses-result-urls"));

        mUsesResultURLs=(lUsesURLs.first && lUsesURLs.second);

        // mproxy has been filled in a reasonable way
        // by CQuery::CQuery
        mAccessor=mAccessorAdmin->openAccessor("url2fts");
        assert(mAccessor);
      }
    };
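The constructor treats the attribute as set only when both halves of the returned pair are true: the attribute exists and its value is true. A minimal sketch of that convention follows. The helper readBoolAttribute, the std::map used as an attribute store, and the accepted spellings "yes", "true" and "1" are all assumptions made for illustration; they are not the CXMLElement API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an attribute reader: .first says whether the
// attribute exists, .second carries its boolean value.
std::pair<bool, bool> readBoolAttribute(
    const std::map<std::string, std::string>& attributes,
    const std::string& name) {
    auto found = attributes.find(name);
    if (found == attributes.end()) return std::make_pair(false, false);
    const std::string& value = found->second;
    // Assumed spellings of "true"; the real parser may accept others.
    return std::make_pair(true, value == "yes" || value == "true" || value == "1");
}

// True only if the attribute is present AND set to a true value.
bool usesResultUrls(const std::map<std::string, std::string>& attributes) {
    const std::pair<bool, bool> flag =
        readBoolAttribute(attributes, "cui-uses-result-urls");
    return flag.first && flag.second;
}
```

With this convention a missing attribute and an attribute explicitly set to false behave the same way, which makes Merge-by-ID the default.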
void CQMultiple::init () [protected, virtual]
Do we merge the results by their URL or by their image ID? We read the algorithm to find out how we will dispatch the queries to the child nodes and in which way we will merge the results coming from the child nodes.
Implements CQuery.
Definition at line 49 of file CQMultiple.cc.
void * CQMultiple::doFastQueryThread (void * inParameter) [static]
This function is the inner loop of fastQuery(). If multithreading is possible on the system on which GIFT was compiled, this function will run in a thread, and fastQuery() will wait for it.
Definition at line 344 of file CQMultiple.cc.
References CQuery::fastQuery(), CQMThread::mDifferenceToBest, CQMThread::mFastResult, CQMThread::mQuery, CQMThread::mQueryProcessor, CQMThread::mResult, and CQMThread::mResultSize.
Referenced by CQMThread::runFastThread().
    {
      class CQMThread* lParam((CQMThread*) inParameter);
      lParam->mFastResult=lParam->mQueryProcessor.fastQuery(*(lParam->mQuery),
                                                            lParam->mResultSize,
                                                            lParam->mDifferenceToBest);
      cout << "I AM FINISHED HERE " << lParam << "result" << lParam->mResult << endl;
      return 0;
    }
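The listing above follows a common pattern: a static function with the signature void* (*)(void*), which is what thread libraries such as POSIX threads expect as a start routine, receives a parameter block that carries both the inputs and a slot for the result. The sketch below shows that pattern under simplifying assumptions. WorkerParams, doWork and runAndCollect are hypothetical names, the computation is a placeholder, and the worker is called directly, which is what a build without thread support would do.

```cpp
#include <cassert>

// Hypothetical parameter block: inputs plus an output slot for one child query.
struct WorkerParams {
    int input;    // stands in for the query and its limits
    int* result;  // filled in by the worker, owned by the caller afterwards
    explicit WorkerParams(int inInput) : input(inInput), result(nullptr) {}
};

// Worker with the void* (*)(void*) signature that thread libraries
// expect for a start routine.
void* doWork(void* inParameter) {
    WorkerParams* params = static_cast<WorkerParams*>(inParameter);
    params->result = new int(params->input * 2);  // placeholder computation
    return nullptr;
}

// Fallback for builds without threads: call the worker directly.
// A threaded build would hand doWork and &params to the thread library
// and wait for completion before reading params.result.
int runAndCollect(int input) {
    WorkerParams params(input);
    doWork(&params);
    int value = *params.result;
    delete params.result;  // the caller frees what the worker allocated
    return value;
}
```

Keeping the result inside the parameter block means the joining code needs no return value from the thread itself; it only has to wait and then read the slot.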
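CIDRelevanceLevelPairList * CQMultiple::fastQuery (const CXMLElement & inQuery, int inNumberOfInterestingImages, double inDifferenceToBest) [virtual]
Calls fastQuery() for every child and merges the results by image ID.
Each child's relevance levels are multiplied by the child's weight and summed per image ID. The merged list is sorted in descending order, cut to inNumberOfInterestingImages entries and divided by the sum of the weights. The translation of the resulting IDs back into URLs is described in the class documentation.
New, more efficient version.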
Implements CQuery.
Definition at line 369 of file CQMultiple.cc.
References CQMThread::CQMThread(), CQuery::mAccessor, CQuery::mChildren, and CAccessor::size().
    {

      cout << "CMultiple Number of children:"
           << mChildren.size()
           << endl;

      //list<CWeightedResult> lTemporary;
      double lWeightSum(0);

      map<TID,CIDRelevanceLevelPair> lResultMap;

      // no mutex protection needed as this is not called except by main thread

      //
      // rip this into two parts
      // in order to
      // make it possible to run the querying in one thread
      // and do the merging after having waited for each thread
      //

      list<CQMThread> lListOfThreads;

      lCChildren::const_iterator lLast=mChildren.end();
      lLast--;
      for(lCChildren::const_iterator i=mChildren.begin();
          i!=mChildren.end();
          i++){
        lWeightSum+=i->mWeight;

        //lTemporary.push_back(CWeightedResult());

        cout << "çç---------------------this CMultiple:fastQuery" << this
             << ", i->mQuery:" << i->mQuery
             << ", i->mWeight:" << i->mWeight
             << endl;

        lListOfThreads.push_back(CQMThread(*(i->mQuery), // The Query processor to choose
                                           inQuery, // the query to be processed
                                           i->mWeight, // the weight the result will receive
                                           mAccessor->size(), // the size of the accessor (to get all potential results)
                                           inDifferenceToBest));// the difference to the best which is allowed for a result
        /* EX-LEAK
           the following was a special branch for reducing the
           number of spawned threads by one. Apparently this did
           not work and caused a memory leak. Now it seems to work.

           if(1==0
           && (i==lLast)){
           lListOfThreads.back().callFunction();//something to do for the main thread
           }else*/

        {
          cout << "Running thread"
               << endl;
          lListOfThreads.back().runFastThread();//run the thread
          cout << "loop"
               << endl;
        }
        cout << "endloop"
             << endl;
      }
      // here we would join all threads

      for(list<CQMThread>::iterator lThread=lListOfThreads.begin();
          lThread!=lListOfThreads.end();
          lThread++){

        cout << "joining..." << endl;

        lThread->join();

        cout << "before merging " << endl;

        if(!lThread->mFastResult){
          cout << "THE THE RESULT OF THIS THREAD WAS NIL "
               << endl;
        }
        if(lThread->mFastResult){
          for(CIDRelevanceLevelPairList::iterator i=lThread->mFastResult->begin();
              i!=lThread->mFastResult->end();
              i++){

            map<TID,CIDRelevanceLevelPair>::const_iterator lFound=lResultMap.find(i->getID());

            i->setRelevanceLevel(i->getRelevanceLevel()*lThread->getWeight());

            if(lFound==lResultMap.end()){

              lResultMap.insert(make_pair(i->getID(),
                                          *i));
            }else{

              lResultMap[i->getID()].setRelevanceLevel(lResultMap[i->getID()].getRelevanceLevel()
                                                       +i->getRelevanceLevel()
                                                       );
            }
          }
          delete lThread->mFastResult;
        }

        cout << "after merging " << endl;
      }

      CIDRelevanceLevelPairList* lReturnValue=new CIDRelevanceLevelPairList();

      cout << "<pushing>"
           << endl;

      for(map<TID,CIDRelevanceLevelPair>::const_iterator i=lResultMap.begin();
          i!=lResultMap.end();
          i++){
        lReturnValue->push_back(i->second);
      }

      cout << "</pushing>\n<sorting>"
           << endl;

      lReturnValue->sort();
      lReturnValue->reverse();
      cout << "Size of the result "
           << lReturnValue->size()
           << endl;
      cout << "</sorting>"
           << endl;

      cout << "<cutting>"
           << endl;
      {
        CIDRelevanceLevelPairList::iterator iSkip=lReturnValue->begin();
        for(int i=0;
            i<inNumberOfInterestingImages && i<lReturnValue->size();
            i++){
          iSkip->setRelevanceLevel(iSkip->getRelevanceLevel()/lWeightSum);
          iSkip++;
        }
        lReturnValue->erase(iSkip,lReturnValue->end());
      }
      cout << "</cutting>"
           << endl;
      return lReturnValue;
    };
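Stripped of threading and debug output, the merge performed by the listing above is a weighted sum per image ID, followed by a descending sort, a cut and a normalisation by the sum of the weights. The following sketch restates that arithmetic with standard containers. IdScore, ChildResult and mergeById are hypothetical names, and the guard against a zero weight sum is an addition that the listing does not have.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<long, double> IdScore;  // (image ID, relevance level)

// Hypothetical container for what one child query returned.
struct ChildResult {
    double weight;
    std::vector<IdScore> scores;
};

std::vector<IdScore> mergeById(const std::vector<ChildResult>& children,
                               std::size_t wanted) {
    double weightSum = 0;
    std::map<long, double> merged;
    for (const ChildResult& child : children) {
        weightSum += child.weight;
        for (const IdScore& score : child.scores) {
            // operator[] creates missing IDs with relevance 0.
            merged[score.first] += score.second * child.weight;
        }
    }
    std::vector<IdScore> result(merged.begin(), merged.end());
    // Highest relevance first.
    std::sort(result.begin(), result.end(),
              [](const IdScore& a, const IdScore& b) { return a.second > b.second; });
    if (result.size() > wanted) result.resize(wanted);
    if (weightSum != 0) {  // guard added in this sketch only
        for (IdScore& score : result) score.second /= weightSum;
    }
    return result;
}
```

For example, a child with weight 1 returning (1, 0.8) and (2, 0.4), merged with a child of weight 3 returning (2, 1.0) and (3, 0.2), gives image 2 the top rank with a normalised relevance of 3.4 / 4 = 0.85, followed by image 1 with 0.8 / 4 = 0.2.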
void * CQMultiple::doQueryThread (void * inParameter) [static]
This function is the inner loop of query(). If multithreading is possible on the system on which GIFT was compiled, this function will run in a thread, and query() will wait for it.
It does the same as doFastQueryThread(), but starts query() instead of fastQuery().
Definition at line 355 of file CQMultiple.cc.
References CQMThread::mQuery, CQMThread::mQueryProcessor, CQMThread::mResult, and CQuery::query().
Referenced by CQMThread::runThread().
    {
      class CQMThread* lParam((CQMThread*) inParameter);
      lParam->mResult=lParam->mQueryProcessor.query(*(lParam->mQuery));
      cout << "I AM FINISHED HERE " << lParam << "result" << lParam->mResult << endl;
      return 0;
    }
CXMLElement * CQMultiple::query (const CXMLElement & inQuery) [virtual]
Calls query() for every child and merges the results by URL.
If mUsesResultURLs is not set, it falls back to CQuery::query(), which calls fastQuery() and assembles the result from that.
New, more efficient version.
Reimplemented from CQuery.
Definition at line 592 of file CQMultiple.cc.
References CXMLElement::addAttribute(), CXMLElement::addChild(), mrml_const::calculated_similarity, CXMLElement::child_list_begin(), CXMLElement::child_list_end(), CXMLElement::clone(), mrml_const::error, CQuery::getRandomImages(), mrml_const::image_location, CXMLElement::longReadAttribute(), CQuery::mAccessor, CQuery::mChildren, mrml_const::message, CXMLElement::moveUp(), mUsesResultURLs, CQuery::query(), mrml_const::query_result, mrml_const::query_result_element, mrml_const::query_result_element_list, mrml_const::result_cutoff, mrml_const::result_size, CAccessor::size(), and mrml_const::thumbnail_location.
    {

      if(!mUsesResultURLs){
        // if the mUsesReusltURLs is not set,
        // just call fastquery, and assemble from that a result,
        // as CQuery does.
        return CQuery::query(inQuery);
      }

      pair<bool,long> lNumberOfInterestingImages=
        inQuery.longReadAttribute(mrml_const::result_size);

      int inNumberOfInterestingImages=
        lNumberOfInterestingImages.second;

      pair<bool,long> lCutoff=
        inQuery.longReadAttribute(mrml_const::result_cutoff);

      int inCutoff=
        lCutoff.second;

      // do a deep clone of the query (for const cast)
      CXMLElement* lQuery=inQuery.clone(1);

      if(lQuery->child_list_begin()!=lQuery->child_list_end()){

        //
        // set the result size to a multiple of the
        // number of the images requested
        // to get a higher probability that
        // the combination reflects the real score
        // you want more explanation?
        // you get it at help-gift@gnu.org
        //
        lQuery->addAttribute(mrml_const::result_size,
                             long(inNumberOfInterestingImages*5));

        cout << "CMultiple::query Number of children:"
             << mChildren.size()
             << endl;

        //list<CWeightedResult> lTemporary;
        double lWeightSum(0);

        map<string,CMergeTriplet> lResultMap;

        // no mutex protection needed as this is not called except by main thread

        //
        // rip this into two parts
        // in order to
        // make it possible to run the querying in one thread
        // and do the merging after having waited for each thread
        //

        list<CQMThread> lListOfThreads;

        lCChildren::const_iterator lLast=mChildren.end();
        lLast--;
        for(lCChildren::const_iterator i=mChildren.begin();
            i!=mChildren.end();
            i++){
          lWeightSum+=i->mWeight;

          //lTemporary.push_back(CWeightedResult());

          cout << "**-------------------------------this CMultiple QUERY:" << this
               << ", i->mQuery:" << i->mQuery
               << ", i->mWeight:" << i->mWeight
               << endl;

          lListOfThreads.push_back(CQMThread(*(i->mQuery), // The Query processor to choose
                                             *lQuery, // the query to be processed
                                             i->mWeight, // the weight the result will receive
                                             mAccessor->size(), // the size of the accessor (to get all potential results)
                                             inCutoff));// the difference to the best which is allowed for a result
          /* EX-LEAK
             the following was a special branch for reducing the
             number of spawned threads by one. Apparently this did
             not work and caused a memory leak. Now it seems to work.

             if(1==0
             && (i==lLast)){
             lListOfThreads.back().callFunction();//something to do for the main thread
             }else*/

          {
            cout << "Running thread"
                 << endl;
            lListOfThreads.back().runThread();//run the thread
            cout << "loop"
                 << endl;
          }
          cout << "endloop"
               << endl;
        }
        // here we would join all threads

        for(list<CQMThread>::iterator lThread=lListOfThreads.begin();
            lThread!=lListOfThreads.end();
            lThread++){

          cout << "joining..." << endl;

          lThread->join();

          cout << "before merging " << endl;

          if(!lThread->mResult){
            cout << "THE THE RESULT OF THIS THREAD WAS NIL "
                 << endl;
          }
          if(lThread->mResult){
            /*
              OK. At this point we got back a query-result XML element.
              now we want the result-element-list
            */
            cout << "H" << flush;
            for(CXMLElement::lCChildren::const_iterator i=lThread->mResult->child_list_begin();
                i!=lThread->mResult->child_list_end();
                i++){
              cout << "I" << flush;
              if((*i)->getName() == mrml_const::query_result_element_list){
                for(CXMLElement::lCChildren::const_iterator j=(*i)->child_list_begin();
                    j!=(*i)->child_list_end();
                    j++){
                  cout << "J" << flush;
                  if((*j)->getName() == mrml_const::query_result_element){
                    cout << "K" << flush;
                    /*
                      inside this, *j points now to an XML element which
                      is a query-result-element. from this we will read now
                      the calculated-relevance,
                      as well as the image and thumbnail location.
                    */
                    pair<bool,double> lCalculatedRelevance=
                      (*j)->doubleReadAttribute(mrml_const::calculated_similarity);
                    pair<bool,string> lImageLocation=
                      (*j)->stringReadAttribute(mrml_const::image_location);
                    pair<bool,string> lThumbnailLocation=
                      (*j)->stringReadAttribute(mrml_const::thumbnail_location);

                    cout << "L" << flush;

                    // no relevance corresponds to relevance 0
                    if(!lCalculatedRelevance.first){
                      lCalculatedRelevance=make_pair(bool(0),
                                                     double(0));
                    }

                    cout << "L" << flush;
                    // if there is a thumbnail and no image,
                    // we take the thumbnail location as image location
                    if((lThumbnailLocation.first)
                       && (!lImageLocation.first)){
                      lImageLocation=lThumbnailLocation;
                    }
                    cout << "L" << flush;
                    // if there is an image and no thumbnail,
                    // we take the image location as thumbnail location
                    if((!lThumbnailLocation.first)
                       && (lImageLocation.first)){
                      lThumbnailLocation=lImageLocation;
                    }
                    cout << "L" << flush;

                    // now we are guaranteed to have a well-initialised
                    // image location
                    if(lImageLocation.first){
                      map<string,CMergeTriplet>::iterator lFound(lResultMap.find(lImageLocation.second));

                      if(lFound!=lResultMap.end()){
                        cout << "A" << flush;
                        lFound->second.addToRelevance(lCalculatedRelevance.second);
                        cout << "[" << lFound->second.getCalculatedSimilarity() << "]" << flush;
                      }else{
                        // this result is not yet in the result map
                        lFound=lResultMap.insert(make_pair(lImageLocation.second,CMergeTriplet(lImageLocation.second,
                                                                                               lThumbnailLocation.second))).first;
                        lFound->second.addToRelevance(lCalculatedRelevance.second);
                      }
                      cout << "M" << flush;

                    }
                  }
                }
              }
            }
            delete lThread->mFastResult;
          }

          cout << "after thread " << endl;
        }
        cout << "ALL THREADS FINISHED " << endl;

        {
          // now we build a list of merge triplets
          // that is sorted by score in descending order
          list<CMergeTriplet> lResultList;
          for(map<string,CMergeTriplet>::const_iterator i=lResultMap.begin();
              i!=lResultMap.end();
              i++){
            lResultList.push_back(i->second);
          }
          cout << "DELETING " << endl;
          lResultList.sort(CSortDescendingByRelevance_MT());
          cout << "DELETING " << endl;

          {
            list<CMergeTriplet>::iterator iSkip=lResultList.begin();
            for(int i=0;
                i<inNumberOfInterestingImages && i<lResultList.size();
                i++){
              iSkip->setSimilarity(iSkip->getCalculatedSimilarity()/lWeightSum);
              iSkip++;
            }
            lResultList.erase(iSkip,lResultList.end());
          }

          // now let's build a result element tree
          CXMLElement* lReturnValue(new CXMLElement(mrml_const::query_result,0));
          CXMLElement* lReturnList(new CXMLElement(mrml_const::query_result_element_list,0));
          lReturnValue->addChild(lReturnList);

          assert(mAccessor);

          for(list<CMergeTriplet>::const_iterator i=lResultList.begin();
              i!=lResultList.end();
              i++){

            CXMLElement* lReturnElement(new CXMLElement(mrml_const::query_result_element,
                                                        0));
            {
              double lRelevanceLevel(i->getCalculatedSimilarity());
              string lString(mrml_const::calculated_similarity);
              lReturnElement->addAttribute(lString,
                                           lRelevanceLevel);
            }

            {
              string lURL(i->getImageLocation());
              string lString(mrml_const::image_location);
              lReturnElement->addAttribute(lString,
                                           lURL);
            }

            {
              string lURL(i->getThumbnailLocation());
              string lString(mrml_const::thumbnail_location);
              lReturnElement->addAttribute(lString,
                                           lURL);
            }

            lReturnValue->addChild(lReturnElement);

            lReturnValue->moveUp();

          }
          //gMutex->unlock();//debugging
          return lReturnValue;
        }
      }else{
        //gMutex->unlock();//debugging
        return getRandomImages(inNumberOfInterestingImages);
      }

      //gMutex->unlock();//debugging
      // missing sort and output
      list<pair<string,string> > lAttributes;
      lAttributes.push_back(make_pair(mrml_const::message,
                                      string("empty query result, i seem to have missed all ifs and elses!")));
      return new CXMLElement(mrml_const::error,lAttributes);
    };
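The per-element rules buried in the nested loops above are: a missing similarity counts as 0, a missing image location falls back to the thumbnail location and vice versa, and elements without any location are skipped; everything else is accumulated under the image URL. The sketch below isolates those rules. ResultElement, MergedEntry and addToMerge are hypothetical names, and an empty string is used here to mean that an attribute is missing, which is a simplification of the pair<bool,string> values in the listing. As in the listing, no per-child weight is applied while accumulating.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical flat view of one query-result-element.
struct ResultElement {
    bool hasSimilarity;
    double similarity;
    std::string imageLocation;      // empty means "attribute missing"
    std::string thumbnailLocation;  // empty means "attribute missing"
};

// Hypothetical accumulator stored per image URL.
struct MergedEntry {
    std::string thumbnailLocation;
    double similarity = 0;
};

// Adds one element to the URL-keyed map; returns false if it was skipped.
bool addToMerge(std::map<std::string, MergedEntry>& merged, ResultElement element) {
    if (!element.hasSimilarity) element.similarity = 0;  // no relevance counts as 0
    if (element.imageLocation.empty()) element.imageLocation = element.thumbnailLocation;
    if (element.thumbnailLocation.empty()) element.thumbnailLocation = element.imageLocation;
    if (element.imageLocation.empty()) return false;  // neither location is known
    MergedEntry& entry = merged[element.imageLocation];  // creates the entry if new
    if (entry.thumbnailLocation.empty()) entry.thumbnailLocation = element.thumbnailLocation;
    entry.similarity += element.similarity;
    return true;
}
```

Sorting the accumulated entries in descending order, cutting the list and dividing by the sum of the weights would then proceed as in the ID-based merge.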
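bool CQMultiple::setAlgorithm (CAlgorithm & inAlgorithm) [virtual]
Sets the algorithm, following the same scheme as in setCollection.
If the collection ID is unchanged, nothing is done here. Otherwise the url2fts accessor is closed and reopened for the new collection; everything else happens in the children.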
Reimplemented from CQuery.
Definition at line 113 of file CQMultiple.cc.
References CAccessorAdmin::closeAccessor(), CAlgorithm::getCollectionID(), CAccessorAdminCollection::getProxy(), CQuery::mAccessor, CQuery::mAccessorAdmin, CQuery::mAccessorAdminCollection, CQuery::mAlgorithm, CAccessorAdmin::openAccessor(), and CQuery::setAlgorithm().
    {
      if(mAlgorithm && mAlgorithm->getCollectionID()==inAlgorithm.getCollectionID()){

        return true;

      }else{
        //close the old collection, if exsisting
        if(mAccessorAdmin)
          mAccessorAdmin->closeAccessor("url2fts");
        //
        mAccessorAdmin=&mAccessorAdminCollection->getProxy(inAlgorithm.getCollectionID());
        mAccessor=mAccessorAdmin->openAccessor("url2fts");

        assert(mAccessor);
        //
        return (CQuery::setAlgorithm(inAlgorithm) && mAccessor);
      }
    };
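The control flow of setAlgorithm() is a "rebind only on change" pattern: return early when the collection is the same, otherwise release the old resource before acquiring the new one. The following sketch shows only that pattern. CollectionBinding is a hypothetical class, and the counters merely stand in for the closeAccessor() and openAccessor() calls of the listing.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder that reopens a resource only when the collection changes.
class CollectionBinding {
public:
    // Returns true if a usable binding exists afterwards.
    bool bind(const std::string& collectionId) {
        if (mBound && mCollectionId == collectionId) {
            return true;  // same collection: nothing to do
        }
        if (mBound) {
            ++mCloseCount;  // stands in for closing the old accessor
        }
        mCollectionId = collectionId;
        mBound = true;
        ++mOpenCount;  // stands in for opening the new accessor
        return mBound;
    }
    int openCount() const { return mOpenCount; }
    int closeCount() const { return mCloseCount; }
private:
    bool mBound = false;
    std::string mCollectionId;
    int mOpenCount = 0;
    int mCloseCount = 0;
};
```

Binding twice to the same collection opens the resource once; switching to another collection closes it once and opens it again, so opens and closes stay balanced.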
bool CQMultiple::mUsesResultURLs [protected]
Do we merge result URLs or result IDs?
Definition at line 129 of file CQMultiple.h.
Referenced by CQMultiple(), and query().