The Zorba team is pleased to announce the release of Zorba 2.7, codename Gaia. The release is a substantial step forward in terms of new features, performance improvements, and bug fixes. It is available in our download section.
Features
Simple Map Operator
One of the new expressions in XQuery 3.0 is the Simple Map Operator. It provides a compact way to write simple mappings and can replace simple “for” expressions with something much shorter. For example:
("Here is", "Zorba", "2.7") ! upper-case(.)
In this example, the context item “.” in the argument on the right-hand side of the ! operator is bound to each item in the sequence resulting from evaluating the left-hand side. Unlike the path operator (/), the simple map operator also accepts a left-hand side that evaluates to a sequence of non-node items.
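To make the comparison concrete, here is a minimal sketch that puts the simple map expression next to its equivalent “for” expression and adds a non-node case that the path operator would reject. The numeric sequence in the last expression is an illustration only and does not come from the release itself.

```xquery
(: Simple map: "." is bound to each item of the left-hand side in turn. :)
("Here is", "Zorba", "2.7") ! upper-case(.),

(: Equivalent "for" expression; both yield "HERE IS", "ZORBA", "2.7". :)
for $s in ("Here is", "Zorba", "2.7")
return upper-case($s),

(: The left-hand side may hold atomic values: this yields 2, 4, 6.
   Writing (1, 2, 3)/(. * 2) instead raises a type error, because the
   left-hand side of "/" must evaluate to nodes. :)
(1, 2, 3) ! (. * 2)
```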
Reading PDFs
Zorba 2.7 comes with a new module that reads the text of PDF documents and renders pages of a PDF document to images. It is based on the Apache PDFBox library. Here is an example of the module in action:
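import module namespace read-pdf = "http://www.zorba-xquery.com/modules/read-pdf";
import module namespace fetch = "http://www.zorba-xquery.com/modules/fetch";
declare namespace rpo = "http://www.zorba-xquery.com/modules/read-pdf/read-pdf-options";
let $pdf := fetch:content-binary("some.pdf")
(: The element names in the options below are reconstructed from their values
   and should be checked against the read-pdf module documentation. :)
let $options :=
  <rpo:extract-text-options xmlns:rpo="http://www.zorba-xquery.com/modules/read-pdf/read-pdf-options">
    <rpo:text-kind>simple</rpo:text-kind>
    <rpo:start-page>2</rpo:start-page>
    <rpo:end-page>3</rpo:end-page>
    <rpo:start-page-separator>---start-page-separator---</rpo:start-page-separator>
    <rpo:end-page-separator>---end-page-separator---</rpo:end-page-separator>
  </rpo:extract-text-options>
return
  read-pdf:extract-text($pdf, $options)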
The code snippet above loads the content of the some.pdf file and extracts the text contained in pages two and three.
JSONiq Improvements
In Zorba 2.6.0, parsing JSON text and serializing results to JSON was cumbersome. This release makes both much easier. You can now parse sequences of JSON documents and serialize JSON objects that contain dateTime-typed values. For example:
{
  "now" : fn:current-dateTime()
}
In this example, the current dateTime is cast to a string. If you want to keep the types of values that JSON does not support natively, you can use the function jn:encode-for-roundtrip to get a fully round-trippable version of the JSON object that you can serialize. When you read the object back, the jn:decode-from-roundtrip function ensures that all values have the same types they had before encoding. See the example below.
jn:encode-for-roundtrip(
  {
    "now" : fn:current-dateTime()
  }
)
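The reverse direction can be sketched the same way. This is a minimal illustration that assumes the jn prefix is available without an explicit declaration, as in the snippet above, and that jn:decode-from-roundtrip accepts the encoded object directly. The variable name $encoded is chosen here for illustration.

```xquery
(: Encode values that JSON cannot represent natively, such as xs:dateTime. :)
let $encoded := jn:encode-for-roundtrip({ "now" : fn:current-dateTime() })
(: Decoding is expected to restore the types the values had before encoding. :)
return jn:decode-from-roundtrip($encoded)
```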
The JSONiq language also has two other small changes. The first is the automatic conversion of empty sequences to null. For instance:
{ "null" : () }
will be serialized to:
{ "null" : null }
The second is the automatic casting of key names to strings. For instance:
{ 1 : "foo" }
will be serialized to:
{ "1" : "foo" }
Performance Improvements
The release also contains a number of under-the-hood changes. Specifically, memory management in the compiler was refactored to take advantage of the fact that compiler expressions are small and have a naturally limited lifetime. This allows for much more efficient memory management.
Beyond that, the optimizer has been improved to deal with positional predicates that involve inequality comparisons. With this information, the runtime can use its lazy evaluation strategy to avoid unnecessary work.
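As an illustration of the kind of expression this concerns, consider a positional predicate with an inequality comparison over a large range. Whether the optimizer treats this exact form as described is an assumption. The sketch only shows why such a predicate leaves room to skip work.

```xquery
(: Only the first three items can satisfy the predicate, so a lazy runtime
   does not need to produce the rest of the range. The result is 1, 2, 3. :)
(1 to 1000000)[position() le 3]
```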
Bug Fixes
Alongside the work on new features and performance improvements, we also spent considerable time fixing bugs. Zorba 2.7 comes with approximately 30 bug fixes. The most requested was the correct serialization of XHTML elements with empty content. In particular, only tags such as br or img may use the empty-element shorthand, whereas all others must be written as <elem></elem>.
We hope you like the release and are looking forward to your feedback.