Most functions in this module take a language as a parameter using the xs:language XML schema data type.
The stem() functions return the stem of a word. The stem of a word itself, however, is not guaranteed to be a word. It is best to consider a stem as an opaque byte sequence. All that is guaranteed about a stem is that, for a given word, the stem of that word will always be the same byte sequence. Hence, you should never compare the result of one of the stem() functions against a non-stemmed string, for example:
if ( ft:stem( "apples" ) eq "apple" ) ** WRONG **
Instead do:
if ( ft:stem( "apples" ) eq ft:stem( "apple" ) ) ** CORRECT **
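
The same rule applies when filtering a list of words: stem the search term once and compare only stems with stems. The following is a minimal sketch, not a definitive usage; the module namespace URI and the sample word list are assumptions and should be adapted to your installation.

```xquery
(: Assumption: the namespace URI of the full-text module; adjust as needed. :)
import module namespace ft = "http://zorba.io/modules/full-text";

(: Hypothetical sample data, for illustration only. :)
let $words := ( "apple", "apples", "orange", "oranges" )

(: Stem the search term once; treat the result as an opaque value. :)
let $target := ft:stem( "apple" )

for $w in $words
where ft:stem( $w ) eq $target   (: stem compared against stem, never against a raw string :)
return $w
```

Because a stem is opaque, the query never inspects or prints the stem itself; it only tests two stems for equality.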
The thesaurus-lookup() functions have "levels" and "relationship" parameters. The values for these are implementation-defined. The default implementation uses the WordNet lexical database, version 3.0.
In WordNet, the number of "levels" between two phrases is the number of steps in the hierarchy of meanings that separates them. For example, "canary" is 5 levels away from "vertebrate" (canary > finch > oscine > passerine > bird > vertebrate).
When using the WordNet implementation, all of the relationships (and their abbreviations) specified by ISO 2788 and ANSI/NISO Z39.19-2005 with the exceptions of "HN" (history note) and "X SN" (see scope note for) are supported. These relationships are:
Rel. | Meaning | WordNet Rel. |
---|---|---|
BT | broader term | hypernym |
BTG | broader term generic | hypernym |
BTI | broader term instance | instance hypernym |
BTP | broader term partitive | part meronym |
NT | narrower term | hyponym |
NTG | narrower term generic | hyponym |
NTI | narrower term instance | instance hyponym |
NTP | narrower term partitive | part holonym |
RT | related term | also see |
SN | scope note | n/a |
TT | top term | hypernym |
UF | non-preferred term | n/a |
USE | preferred term | n/a |

The WordNet relationships have the following meanings:
Relationship | Meaning |
---|---|
also see | A word that is related to another, e.g., for "varnished" (furniture) one should also see "finished." |
antonym | A word opposite in meaning to another, e.g., "light" is an antonym for "heavy." |
attribute | A noun for which adjectives express values, e.g., "weight" is an attribute for which the adjectives "light" and "heavy" express values. |
cause | A verb that causes another, e.g., "show" is a cause of "see." |
derivationally related form | A word that is derived from a root word, e.g., "metric" is a derivationally related form of "meter." |
derived from adjective | An adverb that is derived from an adjective, e.g., "correctly" is derived from the adjective "correct." |
entailment | A verb that presupposes another, e.g., "snoring" entails "sleeping." |
hypernym | A word with a broad meaning that more specific words fall under, e.g., "meal" is a hypernym of "breakfast." |
hyponym | A word of more specific meaning than a general term applicable to it, e.g., "breakfast" is a hyponym of "meal." |
instance hypernym | A word that denotes a category of some specific instance, e.g., "author" is an instance hypernym of "Asimov." |
instance hyponym | A term that denotes a specific instance of some general category, e.g., "Asimov" is an instance hyponym of "author." |
member holonym | A word that denotes a collection of individuals, e.g., "faculty" is a member holonym of "professor." |
member meronym | A word that denotes a member of a larger group, e.g., a "person" is a member meronym of a "crowd." |
part holonym | A word that denotes a larger whole comprised of some part, e.g., "car" is a part holonym of "engine." |
part meronym | A word that denotes a part of a larger whole, e.g., an "engine" is a part meronym of a "car." |
participle of verb | An adjective that is the participle of some verb, e.g., "breaking" is the participle of the verb "break." |
pertainym | An adjective that classifies its noun, e.g., "musical" is a pertainym in "musical instrument." |
similar to | Similar, though not necessarily interchangeable, adjectives. For example, "shiny" is similar to "bright", but they have subtle differences. |
substance holonym | A word that denotes a larger whole containing some constituent substance, e.g., "bread" is a substance holonym of "flour." |
substance meronym | A word that denotes a constituent substance of some larger whole, e.g., "flour" is a substance meronym of "bread." |
verb group | A verb that is a member of a group of similar verbs, e.g., "live" is in the verb group of "dwell", "live", "inhabit", etc. |
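
To make the "relationship" and "levels" parameters concrete, here is a hedged sketch of a look-up for broader terms of "canary". The module namespace URI and the thesaurus URI are assumptions (the thesaurus URI is a placeholder that must resolve to a WordNet thesaurus in your setup), and the reading of $level-least and $level-most as lower and upper bounds on the number of levels is inferred from the parameter names.

```xquery
(: Assumption: the namespace URI of the full-text module; adjust as needed. :)
import module namespace ft = "http://zorba.io/modules/full-text";

(: Hypothetical thesaurus URI; it must map to a thesaurus in your installation. :)
let $uri := "http://wordnet.princeton.edu"

return
  if ( ft:is-thesaurus-lang-supported( $uri, $ft:LANG-EN ) )
  then
    (: "BT" = broader term; levels 1 through 5 are assumed to be inclusive bounds. :)
    ft:thesaurus-lookup( $uri, "canary", $ft:LANG-EN, "BT", 1, 5 )
  else
    ()   (: language not supported by this thesaurus: return the empty sequence :)
```

Checking language support first avoids relying on how an implementation reacts to an unsupported language.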

Function | Description |
---|---|
current-compare-options() as object() external | Gets the current compare options. |
current-lang() as xs:language external | Gets the current language: either the language specified by the declare ft-option using language statement (if any) or the one returned by ft:host-lang() (if none). |
host-lang() as xs:language external | Gets the host's current language. |
is-stem-lang-supported($lang as xs:language) as xs:boolean external | Checks whether the given language is supported for stemming. |
is-stop-word-lang-supported($lang as xs:language) as xs:boolean external | Checks whether the given language is supported for stop words. |
is-stop-word($word as xs:string) as xs:boolean external | Checks whether the given word is a stop-word. |
is-stop-word($word as xs:string, $lang as xs:language) as xs:boolean external | Checks whether the given word is a stop-word. |
is-thesaurus-lang-supported($lang as xs:language) as xs:boolean external | Checks whether the given language is supported for look-up using the default thesaurus. |
is-thesaurus-lang-supported($uri as xs:string, $lang as xs:language) as xs:boolean external | Checks whether the given language is supported for look-up using the thesaurus specified by the given URI. |
is-tokenizer-lang-supported($lang as xs:language) as xs:boolean external | Checks whether the given language is supported for tokenization. |
stem($word as xs:string) as xs:string external | Stems the given word. |
stem($word as xs:string, $lang as xs:language) as xs:string external | Stems the given word. |
strip-diacritics($string as xs:string) as xs:string external | Strips all diacritical marks from all characters. |
thesaurus-lookup($phrase as xs:string) as xs:string* external | Looks up the given phrase in the default thesaurus. |
thesaurus-lookup($uri as xs:string, $phrase as xs:string) as xs:string* external | Looks up the given phrase in a thesaurus. |
thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language) as xs:string* external | Looks up the given phrase in the thesaurus specified by the given URI. |
thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string) as xs:string* external | Looks up the given phrase in a thesaurus. |
thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string, $level-least as xs:integer, $level-most as xs:integer) as xs:string* external | Looks up the given phrase in a thesaurus. |
tokenize-node($node as node()) as object()* external | Tokenizes the given node and all of its descendants. |
tokenize-node($node as node(), $lang as xs:language) as object()* external | Tokenizes the given node and all of its descendants. |
tokenize-nodes($includes as node()+, $excludes as node()*) as object()* external | Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any. |
tokenize-nodes($includes as node()+, $excludes as node()*, $lang as xs:language) as object()* external | Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any. |
tokenize-string($string as xs:string) as xs:string* external | Tokenizes the given string. |
tokenize-string($string as xs:string, $lang as xs:language) as xs:string* external | Tokenizes the given string. |
tokenizer-properties() as object() external | Gets properties of the tokenizer for the language returned by ft:current-lang(). |
tokenizer-properties($lang as xs:language) as object() external | Gets properties of the tokenizer for the given language. |

declare function ft:current-compare-options() as object() external
Gets the current compare options.

declare function ft:current-lang() as xs:language external
Gets the current language: either the language specified by the declare ft-option using language statement (if any) or the one returned by ft:host-lang() (if none).

declare function ft:host-lang() as xs:language external
Gets the host's current language. If setlocale(3) returns non-null, the language corresponding to that locale is used; otherwise, if the LANG environment variable is set, that language is used. On Windows, the GetLocaleInfo() function is used.
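
declare function ft:is-stem-lang-supported($lang as xs:language) as xs:boolean external
Checks whether the given language is supported for stemming. Returns true only if the language is supported.

declare function ft:is-stop-word-lang-supported($lang as xs:language) as xs:boolean external
Checks whether the given language is supported for stop words. Returns true only if the language is supported.

declare function ft:is-stop-word($word as xs:string) as xs:boolean external
Checks whether the given word is a stop-word. The language of the word is the one returned by ft:current-lang(). Returns true only if $word is a stop-word.

declare function ft:is-stop-word($word as xs:string, $lang as xs:language) as xs:boolean external
Checks whether the given word is a stop-word. $lang is the language of $word. Returns true only if $word is a stop-word.

declare function ft:is-thesaurus-lang-supported($lang as xs:language) as xs:boolean external
Checks whether the given language is supported for look-up using the default thesaurus. Returns true only if the language is supported.

declare function ft:is-thesaurus-lang-supported($uri as xs:string, $lang as xs:language) as xs:boolean external
Checks whether the given language is supported for look-up using the thesaurus specified by the given URI. Returns true only if the language is supported.

declare function ft:is-tokenizer-lang-supported($lang as xs:language) as xs:boolean external
Checks whether the given language is supported for tokenization. Returns true only if the language is supported.

declare function ft:stem($word as xs:string) as xs:string external
Stems the given word. The language of the word is the one returned by ft:current-lang(). Returns the stem of $word.

declare function ft:stem($word as xs:string, $lang as xs:language) as xs:string external
Stems the given word. $lang is the language of $word. Returns the stem of $word.

declare function ft:strip-diacritics($string as xs:string) as xs:string external
Strips all diacritical marks from all characters. Returns $string with diacritical marks stripped.

declare function ft:thesaurus-lookup($phrase as xs:string) as xs:string* external
Looks up the given phrase in the default thesaurus. The language of the phrase is the one returned by ft:current-lang(). Returns the related phrases if $phrase is found in the thesaurus or the empty sequence if not.

declare function ft:thesaurus-lookup($uri as xs:string, $phrase as xs:string) as xs:string* external
Looks up the given phrase in a thesaurus. The language of the phrase is the one returned by ft:current-lang(). Returns the related phrases if $phrase is found in the thesaurus or the empty sequence if not.

declare function ft:thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language) as xs:string* external
Looks up the given phrase in the thesaurus specified by the given URI. $lang is the language of $phrase. Returns the related phrases if $phrase is found in the thesaurus or the empty sequence if not.

declare function ft:thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string) as xs:string* external
Looks up the given phrase in a thesaurus. $lang is the language of $phrase. $relationship is the relationship the results are to have to $phrase. Returns the related phrases if $phrase is found in the thesaurus or the empty sequence if not.

declare function ft:thesaurus-lookup($uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string, $level-least as xs:integer, $level-most as xs:integer) as xs:string* external
Looks up the given phrase in a thesaurus. $lang is the language of $phrase. $relationship is the relationship the results are to have to $phrase. Returns the related phrases if $phrase is found in the thesaurus or the empty sequence if not.

declare function ft:tokenize-node($node as node()) as object()* external
Tokenizes the given node and all of its descendants. The language is the one returned by ft:current-lang().

declare function ft:tokenize-node($node as node(), $lang as xs:language) as object()* external
Tokenizes the given node and all of its descendants. $lang is the language of $node.

declare function ft:tokenize-nodes($includes as node()+, $excludes as node()*) as object()* external
Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any. The language is the one returned by ft:current-lang().

declare function ft:tokenize-nodes($includes as node()+, $excludes as node()*, $lang as xs:language) as object()* external
Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any.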
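
declare function ft:tokenize-string($string as xs:string) as xs:string* external
Tokenizes the given string. The language of the string is the one returned by ft:current-lang().

declare function ft:tokenize-string($string as xs:string, $lang as xs:language) as xs:string* external
Tokenizes the given string. $lang is the language of $string.

declare function ft:tokenizer-properties() as object() external
Gets properties of the tokenizer for the language returned by ft:current-lang().

declare function ft:tokenizer-properties($lang as xs:language) as object() external
Gets properties of the tokenizer for the given language.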
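
Several of the functions above compose naturally. The sketch below tokenizes a string, drops stop-words, and stems what remains, passing the language explicitly at each step. The module namespace URI and the input text are assumptions; the function signatures are the ones declared above.

```xquery
(: Assumption: the namespace URI of the full-text module; adjust as needed. :)
import module namespace ft = "http://zorba.io/modules/full-text";

(: Hypothetical input text, for illustration only. :)
let $text := "The quick brown fox jumps over the lazy dog"

for $token in ft:tokenize-string( $text, $ft:LANG-EN )
where not( ft:is-stop-word( $token, $ft:LANG-EN ) )   (: keep only non-stop-words :)
return ft:stem( $token, $ft:LANG-EN )                  (: stems are opaque; do not compare them to raw words :)
```

Passing the language explicitly keeps the result independent of the value of ft:current-lang() on the host.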

Variable | Description |
---|---|
$ft:LANG-DA as xs:language | The Danish xs:language. |
$ft:LANG-DE as xs:language | The German xs:language. |
$ft:LANG-EN as xs:language | The English xs:language. |
$ft:LANG-ES as xs:language | The Spanish xs:language. |
$ft:LANG-FI as xs:language | The Finnish xs:language. |
$ft:LANG-FR as xs:language | The French xs:language. |
$ft:LANG-HU as xs:language | The Hungarian xs:language. |
$ft:LANG-IT as xs:language | The Italian xs:language. |
$ft:LANG-NL as xs:language | The Dutch xs:language. |
$ft:LANG-NO as xs:language | The Norwegian xs:language. |
$ft:LANG-PT as xs:language | The Portuguese xs:language. |
$ft:LANG-RO as xs:language | The Romanian xs:language. |
$ft:LANG-RU as xs:language | The Russian xs:language. |
$ft:LANG-SV as xs:language | The Swedish xs:language. |
$ft:LANG-TR as xs:language | The Turkish xs:language. |
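
The language variables can be combined with the is-*-lang-supported() functions to find out what a given installation supports before any stemming or tokenization is attempted. The following is a minimal sketch; the module namespace URI is an assumption, and the choice of three languages is arbitrary.

```xquery
(: Assumption: the namespace URI of the full-text module; adjust as needed. :)
import module namespace ft = "http://zorba.io/modules/full-text";

(: An arbitrary selection of the language variables declared by the module. :)
for $lang in ( $ft:LANG-DA, $ft:LANG-DE, $ft:LANG-EN )
return
  concat(
    string( $lang ), ": ",
    "stemming=",   ft:is-stem-lang-supported( $lang ), ", ",
    "stop-words=", ft:is-stop-word-lang-supported( $lang ), ", ",
    "tokenizer=",  ft:is-tokenizer-lang-supported( $lang )
  )
```

Each result is one line of text per language; the actual true/false values depend on the installation.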