This library module provides similarity functions for comparing sets of XML nodes (e.g., sets of XML elements, attributes or atomic values).
These functions are particularly useful for matching near-duplicate sets of XML nodes.
The logic contained in this module is not specific to any particular XQuery implementation.
deep-intersect($s1, $s2) as item()*
Returns the intersection between two sets, using the deep-equal() function to compare the XML nodes from the sets.
deep-union($s1, $s2) as item()*
Returns the union between two sets, using the deep-equal() function to compare the XML nodes from the sets.
dice($s1, $s2) as xs:double
Returns the Dice similarity coefficient between two sets of XML nodes.
distinct($s) as item()*
Removes exact duplicates from a set, using the deep-equal() function to compare the XML nodes from the set.
jaccard($s1, $s2) as xs:double
Returns the Jaccard similarity coefficient between two sets of XML nodes.
overlap($s1, $s2) as xs:double
Returns the overlap coefficient between two sets of XML nodes.
declare function set:deep-intersect($s1, $s2) as item()*
Returns the intersection between two sets, using the deep-equal() function to compare the XML nodes from the sets.
Example usage :
deep-intersect ( ( "a", "b", "c" ) , ( "a", "a" ) )
The function invocation in the example above returns :
("a")
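To make the comparison semantics concrete, here is a minimal sketch of a deep-equal()-based intersection written with standard XQuery 1.0 only. The function local:deep-intersect is a hypothetical stand-in, not the module's actual source. It keeps every item of $s1 that has at least one deep-equal match in $s2, so duplicates already present in $s1 would be kept.

```xquery
xquery version "1.0";

(: Hypothetical sketch, not the module's source. :)
declare function local:deep-intersect($s1 as item()*, $s2 as item()*) as item()* {
  for $x in $s1
  where some $y in $s2 satisfies deep-equal($x, $y)
  return $x
};

(: Same input as the example above; evaluates to "a". :)
local:deep-intersect(("a", "b", "c"), ("a", "a"))
```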
declare function set:deep-union($s1, $s2) as item()*
Returns the union between two sets, using the deep-equal() function to compare the XML nodes from the sets.
Example usage :
deep-union ( ( "a", "b", "c" ) , ( "a", "a" ) )
The function invocation in the example above returns :
("a", "b", "c")
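One way to obtain such a union is to concatenate both inputs and keep only the first occurrence of each item under deep-equal(). The sketch below assumes that approach; local:deep-union is a hypothetical name and is not taken from the module's source.

```xquery
xquery version "1.0";

(: Hypothetical sketch: concatenate both inputs, then keep an item only
   if no earlier item in the concatenation is deep-equal to it. :)
declare function local:deep-union($s1 as item()*, $s2 as item()*) as item()* {
  let $s := ($s1, $s2)
  for $x at $p in $s
  where every $y in subsequence($s, 1, $p - 1) satisfies not(deep-equal($x, $y))
  return $x
};

(: Same input as the example above; evaluates to ("a", "b", "c"). :)
local:deep-union(("a", "b", "c"), ("a", "a"))
```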
declare function set:dice($s1, $s2) as xs:double
Returns the Dice similarity coefficient between two sets of XML nodes.
The Dice coefficient is defined as twice the shared information between the input sets (i.e., the size of the intersection) over the sum of the cardinalities of the input sets.
Example usage :
dice ( ( "a", "b" ) , ( "a", "a", "d" ) )
The function invocation in the example above returns :
0.4
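The arithmetic behind this result can be sketched in plain XQuery 1.0. The function local:dice below is hypothetical and not the module's source. It assumes that cardinalities are counted on the inputs as given, without removing duplicates first, which is the reading under which the example yields 0.4. It does not guard against two empty inputs.

```xquery
xquery version "1.0";

(: Hypothetical sketch of the definition above, not the module's source.
   Assumption: count() is applied to the inputs as given. :)
declare function local:dice($s1 as item()*, $s2 as item()*) as xs:double {
  let $shared := (
    for $x in $s1
    where some $y in $s2 satisfies deep-equal($x, $y)
    return $x
  )
  return xs:double(2 * count($shared)) div (count($s1) + count($s2))
};

(: Same input as the example above: (2 * 1) div (2 + 3) = 0.4. :)
local:dice(("a", "b"), ("a", "a", "d"))
```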
declare function set:distinct($s) as item()*
Removes exact duplicates from a set, using the deep-equal() function to compare the XML nodes from the set.
Example usage :
distinct ( ( "a", "a" ) )
The function invocation in the example above returns :
("a")
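The effect of comparing with deep-equal() is easier to see on element nodes than on strings. The following sketch uses a hypothetical local:distinct function (not the module's source) that keeps an item only if no earlier item is deep-equal to it.

```xquery
xquery version "1.0";

(: Hypothetical sketch: keep the first occurrence of each item,
   comparing items with deep-equal(). :)
declare function local:distinct($s as item()*) as item()* {
  for $x at $p in $s
  where every $y in subsequence($s, 1, $p - 1) satisfies not(deep-equal($x, $y))
  return $x
};

(: The first two elements are deep-equal, so two elements remain:
   <a>1</a> and <b>1</b>. :)
local:distinct((<a>1</a>, <a>1</a>, <b>1</b>))
```

By contrast, the standard fn:distinct-values() function atomizes its input, so on the same three elements it would return a single value and lose the distinction between the a and b elements.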
declare function set:jaccard($s1, $s2) as xs:double
Returns the Jaccard similarity coefficient between two sets of XML nodes.
The Jaccard coefficient is defined as the size of the intersection divided by the size of the union of the input sets.
Example usage :
jaccard ( ( "a", "b" ) , ( "a", "a", "d" ) )
The function invocation in the example above returns :
0.25
declare function set:overlap($s1, $s2) as xs:double
Returns the overlap coefficient between two sets of XML nodes.
The overlap coefficient is defined as the shared information between the input sets (i.e., the size of the intersection) over the size of the smaller input set.
Example usage :
overlap ( ( "a", "b" ) , ( "a", "a", "b" ) )
The function invocation in the example above returns :
1.0
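As a rough illustration of the definition, the overlap coefficient can be sketched in plain XQuery 1.0 as follows. The function local:overlap is hypothetical and not the module's source. It assumes that sizes are counted on the inputs as given, and it does not guard against an empty input.

```xquery
xquery version "1.0";

(: Hypothetical sketch of the definition above, not the module's source.
   Assumption: count() is applied to the inputs as given. :)
declare function local:overlap($s1 as item()*, $s2 as item()*) as xs:double {
  let $shared := (
    for $x in $s1
    where some $y in $s2 satisfies deep-equal($x, $y)
    return $x
  )
  return xs:double(count($shared)) div min((count($s1), count($s2)))
};

(: Same input as the example above: 2 shared items div min(2, 3) = 1. :)
local:overlap(("a", "b"), ("a", "a", "b"))
```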