    let $x := <msg>Self Improvement</msg>
    return $x contains text "improve" using stemming

returns true because $x contains "Improvement", which has the same stem as "improve".

The initial implementation of the stemming option uses the Snowball stemmers and therefore can stem words in the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, and Turkish.
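To make the matching rule concrete, here is a minimal, self-contained sketch of stem-based matching. It is not Zorba's implementation and does not use Snowball: toyStem() and containsStem() are hypothetical names, and the suffix list is an assumption chosen only so that "Improvement" and "improve" reduce to the same stem.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Toy stemmer (hypothetical, NOT Snowball): lowercases the word, then strips
// at most one occurrence of each suffix below, in the order listed.
std::string toyStem( std::string word ) {
  for ( char &c : word )
    c = static_cast<char>( std::tolower( static_cast<unsigned char>( c ) ) );
  static std::string const suffixes[] = { "s", "ment", "ing", "e" };
  for ( std::string const &suffix : suffixes ) {
    // Keep at least three characters so that short words are left alone.
    if ( word.size() >= suffix.size() + 3 &&
         word.compare( word.size() - suffix.size(), suffix.size(), suffix ) == 0 )
      word.erase( word.size() - suffix.size() );
  }
  return word;
}

// Returns true if any whitespace-separated token of text has the same stem
// as query.
bool containsStem( std::string const &text, std::string const &query ) {
  std::string const wanted = toyStem( query );
  std::istringstream tokens( text );
  std::string token;
  while ( tokens >> token )
    if ( toyStem( token ) == wanted )
      return true;
  return false;
}
```

With this sketch, containsStem( "Self Improvement", "improve" ) is true because both "Improvement" and "improve" reduce to "improv". A real stemmer applies far more careful, language-specific rules.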
    class Stemmer {
    public:
      typedef /* implementation-defined */ ptr;

      struct Properties {
        char const *uri;
      };

      virtual void destroy() const = 0;
      virtual void properties( Properties *result ) const = 0;
      virtual void stem( String const &word, locale::iso639_1::type lang, String *result ) const = 0;

    protected:
      virtual ~Stemmer();
    };

For details about the ptr type, the destroy() function, and why the destructor is protected, see the Memory Management document.

To implement the Stemmer, you need to implement the stem() function where:
word | The word to be stemmed. |
lang | The language of the word. |
result | The stemmed word goes here. |
    class MyStemmer : public Stemmer {
    public:
      void destroy() const;
      void properties( Properties *result ) const;
      void stem( String const &word, locale::iso639_1::type lang, String *result ) const;

    private:
      MyStemmer();
      friend class MyStemmerProvider; // only it can create instances
    };

    void MyStemmer::destroy() const {
      // Do nothing since we statically allocate a singleton instance of our stemmer.
    }

    void MyStemmer::properties( Properties *props ) const {
      props->uri = "http://my.example.com/zorba/full-text/stemmer";
    }

    void MyStemmer::stem( String const &word, locale::iso639_1::type lang, String *result ) const {
      if ( word == "foobar" )
        *result = "foo";
      else
        *result = word; // Don't know how to stem word: set result to word as-is.
    }

A real stemmer would, of course, use either a stemming algorithm or a dictionary look-up to stem many words. Although not used in this simple example, lang can be used to allow a single stemmer instance to stem words in more than one language.
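As a rough illustration of the dictionary look-up approach and of using lang to serve several languages from one instance, the following standalone sketch mirrors the shape of stem() without depending on Zorba. Everything in it is an assumption made for illustration: Lang stands in for locale::iso639_1::type, std::string stands in for String, DictionaryStemmer is a hypothetical class, and the dictionary entries are invented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in for locale::iso639_1::type (hypothetical; illustration only).
enum class Lang { unknown, en, de };

// Hypothetical dictionary-backed stemmer keyed by (language, word).
class DictionaryStemmer {
public:
  DictionaryStemmer() {
    // Invented sample entries; a real dictionary would be loaded from data.
    add( Lang::en, "improvement", "improv" );
    add( Lang::en, "improve", "improv" );
    add( Lang::de, "laufen", "lauf" );
  }

  // Same contract as the simple example: if the word is not known,
  // the result is the word as-is.
  void stem( std::string const &word, Lang lang, std::string *result ) const {
    assert( result != nullptr );
    if ( lang == Lang::unknown )
      lang = Lang::en;  // assumption of this sketch: treat "unknown" as English
    auto const it = dict_.find( std::make_pair( lang, word ) );
    *result = it != dict_.end() ? it->second : word;
  }

private:
  void add( Lang lang, std::string const &word, std::string const &stem ) {
    dict_[ std::make_pair( lang, word ) ] = stem;
  }

  std::map<std::pair<Lang, std::string>, std::string> dict_;
};
```

Keying the dictionary on the language as well as the word means the same spelling can stem differently per language, and a word from one language is left untouched when it is looked up under another.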
    class StemmerProvider {
    public:
      virtual ~StemmerProvider();
      virtual bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const = 0;
    };

The getStemmer() function should return true only if it can provide a Stemmer for the given language, and false otherwise. If the Stemmer::ptr argument is null, the caller wants to check only whether the provider can provide a stemmer for the given language and doesn't want a Stemmer instance created or returned.

A simple StemmerProvider for our simple stemmer can be implemented as:
    class MyStemmerProvider : public StemmerProvider {
    public:
      bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const;
    };

    bool MyStemmerProvider::getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s ) const {
      static MyStemmer stemmer;
      switch ( lang ) {
        case iso639_1::en:
        case iso639_1::unknown:
          // Handle "unknown" language since, in many cases, the language is not known.
          if ( s )
            s->reset( &stemmer ); // Hand out the stemmer only if the caller asked for one.
          return true;
        default:
          //
          // We have no stemmer for the given language: return false.
          // Zorba will then use the built-in stemmer for the given language.
          //
          return false;
      }
    }
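The two ways of calling getStemmer(), probing with a null pointer versus actually fetching a stemmer, can be sketched from the caller's side as follows. This is a standalone illustration under stated assumptions, not Zorba code: Lang, ToyStemmer, ToyStemmerPtr (a plain pointer standing in for Stemmer::ptr), ToyProvider, supports(), and fetch() are all hypothetical.

```cpp
#include <cassert>

// Stand-ins for Zorba's types (hypothetical; illustration only).
enum class Lang { unknown, en, fr };
struct ToyStemmer { };
typedef ToyStemmer const *ToyStemmerPtr;  // plays the role of Stemmer::ptr

// Hypothetical provider that supports English only.
class ToyProvider {
public:
  bool getStemmer( Lang lang, ToyStemmerPtr *s = nullptr ) const {
    if ( lang != Lang::en )
      return false;               // unsupported language: *s is left untouched
    if ( s ) {                    // the caller wants an instance, not just a probe
      static ToyStemmer stemmer;  // statically allocated singleton
      *s = &stemmer;
    }
    return true;
  }
};

// Probe only: no stemmer is requested or returned.
inline bool supports( ToyProvider const &p, Lang lang ) {
  return p.getStemmer( lang );
}

// Fetch: returns the stemmer, or null if the language is not supported.
inline ToyStemmerPtr fetch( ToyProvider const &p, Lang lang ) {
  ToyStemmerPtr s = nullptr;
  bool const ok = p.getStemmer( lang, &s );
  assert( !ok || s != nullptr );  // success implies a stemmer was handed out
  return ok ? s : nullptr;
}
```

In this sketch, supports( p, Lang::en ) is true and supports( p, Lang::fr ) is false, while fetch() returns a non-null pointer only for English.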
To use your stemmer, register an instance of your provider with the XmlDataManager:

    void *const store = StoreManager::getStore();
    Zorba *const zorba = Zorba::getInstance( store );

    MyStemmerProvider provider;
    zorba->getXmlDataManager()->registerStemmerProvider( &provider );