The OME Blog Because metadata is worth a thousand pictures

Update on Bio-Formats Development Work

Current development on Bio-Formats can be grouped into four categories:

Model changes

Through our 4.4.x and 5.0.x releases (July 2012-present), we’ve held the OME Data Model and all interfaces largely constant, primarily to support the large and growing array of users of our software, and to avoid destabilizing any of the tools they use. Stability is great for our user community, but now we’ve reached a point where several extensions of OME software demand updates to the core OME Data Model.

In our upcoming 5.1.x releases, we are instituting a series of model changes that substantially expand OME’s capabilities. Some of these simply reflect corrections or additions that are necessary to keep the software fresh and up-to-date. Others are additions that we intend to be the foundation for work in the next 1-2 years.

Examples include:

  • A new metadata feature that allows users to use annotations which take the form of key-value pairs (e.g. ‘cell line’ and ‘HT-293’). This provides a flexible, semi-structured way for users to add critical components of metadata (e.g. experimental protocol metadata) stored in OME-TIFF or OMERO or read by Bio-Formats.

  • Changing channel wavelength from ‘int’ to ‘float’ - this is a long overdue fix which recognizes the increasing use of illumination systems with special wavelengths that must be represented with better than integer accuracy.

  • Unit support - this is by far the most substantial change in the 5.1.x series. We want as many of the metadata values as possible to be described by correct scientific units. For example, the wavelength of a laser source should be “647.0 nm”. This capability becomes particularly important as OME expands the data types it supports from electron microscopy through light microscopy and mesoscopic imaging. One example of the importance of the units is that recording the pixel size of the images in these different domains is critical for proper measurements and analysis. In the longer term, we aim to record and express the relationship between different images, e.g. in correlative imaging, so we must have this capability.

C++ implementation

For OME 5.1 we are releasing a native C++ implementation for Bio-Formats. The goal here is to make it easier to call Bio-Formats from a non-Java program. The first version of this work will be released with 5.1.0 but the API will be subject to change until the 5.2 release later in 2015. Our goal is again to expand the number of domains in which OME software can be used, in particular into applications such as ITK, OpenCV, and NumPy in medical imaging.

For the moment, our work here is related to basic I/O functionality (reading/writing OME-TIFF etc.). This proves the concept and allows us to look forward to adding specific readers that will benefit the community.

Disclaimer: not all, or even most, of the Java-based Bio-Formats will be ported to C++.

Format fixes

As always we have been working on fixing several formats. This work has been complicated by the number of imaging modalities which are appearing inside in each individual file format (see previous blog post). Our work is focused on ND2, MetaMorph, Prairie, DICOM, Zeiss CZI, and Leica LIF. We also have added support for the i2i and im3 formats.

Performance

Dataset sizes continue to increase by several-fold, so Bio-Formats’ capabilities have to keep up. For the 5.1 release, we’ve reduced the overhead for metadata parsing, improved I/O buffering, and tuned Zeiss CZI and several HCS readers to have faster initialization and plane reading times.

Looking a bit further forward…

After the 5.1 release, we of course have several more things we are considering. To get changes to Bio-Formats out to the community as rapidly as possible, we will begin the process of decoupling Bio-Formats releases from the rest of the OME project’s work. We have released all the different tools together because, in fact, they are closely related, and a release process requires a lot of the same testing and process (release-testing production-grade software is not easy, as any developer will tell you). In short, it’s easier for the OME team to work this way, but clearly it is better for Bio-Formats users if we get updates and fixes out as soon as possible. Of course, we want to accelerate Bio-Formats releases, while maintaining all the testing and QA which helps make the software so useful. This means work on our CI system to make all this happen. Stay tuned.