The OME Blog Because metadata is worth a thousand pictures

Bio-Formats and OME Data Model Development Status

This is an update on what we will be working on in the Bio-Formats codebase over the next few months. As this is where the OME Data Model lives, it covers our current and upcoming work on both the Model and the Bio-Formats project.

Current Bio-Formats development focus

The release of 5.1.7 back in December is likely to be the last regularly planned release of Bio-Formats 5.1.x. Bio-Formats development has now shifted to focus on 5.2.0 in the develop branch. There are two points for Bio-Formats users to note:

  • the primary aim of the Bio-Formats 5.2.0 work is to upgrade our OME Data Model (as discussed below) to provide critical new functionality for many of our users
  • our regular Bio-Formats Java schedule of monthly releases will be suspended and non-critical bug fixes and new format support will have a lower priority until this model work is complete

For developers using Bio-Formats, the develop branch will include development schema versions and should not be used for writing OME files (OME-XML, OME-TIFF) until Bio-Formats 5.2.0 is released.

We hope to release Bio-Formats version 5.2.0 in Spring 2016. You can follow our progress on the public Trello board.

Data Model work

The main effort of the Bio-Formats 5.2.0 development work will be focused on updating the Data Model to include a folder-like structure for storing Regions of Interest (ROIs), as discussed in the most recent OMERO status post.

Regions of Interest are core features of the OME Data Model, currently stored as image components without any ordering or structure. We have identified several use cases, across a wide range of imaging domains from high content screening to digital pathology, where this representation limits the usability of ROIs. For instance, the Image Data Resource [1] built by OME contains several datasets where each image is associated with several hundreds of thousands of ROIs (nice examples are here, here, and here). Similar orders of magnitude of ROIs are commonly generated computationally by analytical tools in high content screening. In other domains, an ROI or set of ROIs needs to be associated with a complex hierarchical representation such as an ontology. Across all these use cases, there is a growing need to organize, browse and filter ROIs at the model level. To address this, we will introduce a folder concept allowing the ROIs within an image to be grouped in a hierarchical manner.
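To make the folder idea concrete, here is a minimal sketch of hierarchical ROI grouping. It is only an illustration: the `Roi` and `RoiFolder` classes and their fields are hypothetical names chosen for this example, not the types that will appear in the OME Data Model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Roi:
    """A region of interest; only an identifier is modelled in this sketch."""
    roi_id: str


@dataclass
class RoiFolder:
    """A named folder holding ROIs and child folders (hypothetical type)."""
    name: str
    rois: List[Roi] = field(default_factory=list)
    children: List["RoiFolder"] = field(default_factory=list)

    def walk(self) -> Iterator[Roi]:
        """Yield every ROI in this folder, then recursively in its sub-folders."""
        yield from self.rois
        for child in self.children:
            yield from child.walk()

    def find(self, path: List[str]) -> "RoiFolder":
        """Return the descendant folder reached by following the names in `path`.

        Raises StopIteration if a name along the path does not exist.
        """
        folder = self
        for name in path:
            folder = next(c for c in folder.children if c.name == name)
        return folder


# Example: ROIs of one image grouped by the kind of object they outline.
nuclei = RoiFolder("nuclei", rois=[Roi("ROI:1"), Roi("ROI:2")])
cells = RoiFolder("cells", rois=[Roi("ROI:3")])
root = RoiFolder("segmentation", children=[nuclei, cells])

all_ids = [roi.roi_id for roi in root.walk()]
nuclei_ids = [roi.roi_id for roi in root.find(["nuclei"]).walk()]
```

Browsing and filtering then become tree operations: listing one folder returns a manageable subset, rather than the flat, unordered set of ROIs attached to the image.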

We aim to update OMERO to include ROI Folders and release this as version 5.3 during Spring 2016.

If you are interested in our design process, you can follow the discussion on the issues in our Design GitHub repository.

We also aim to extend our support of experimental and analytic metadata—more about this in a later entry. In brief, our aim is to package and release all the work we’ve done on the Image Data Repository as tools for the community to use to access a broad range of types of metadata.

New format support

Despite the focus on the Data Model, 5.2.0 will also introduce two new formats. These are scheduled to be Becker & Hickl SPC and Princeton Instruments SPE. We are currently working on the readers for these and would greatly appreciate sample files if you have any to help us with testing (you can submit files via our QA system or get in touch on our mailing lists for details on how to submit larger files).

While the core team won’t be focusing on any other readers for 5.2.0, we continue to encourage community submissions. New readers submitted by external collaborators will be treated on a case-by-case basis. We always aim to review external PRs promptly, but our capacity for reviewing major changes is going to be reduced for the next couple of months, so the release of new readers may be delayed to 5.2.1 or later. We will endeavour to keep you informed of the timeline, including having public Trello boards for future Bio-Formats releases so the whole community can follow what is upcoming (these will be listed on the Getting Started Trello board).

OME-Files

The OME Data Model and Bio-Formats C++ will be decoupled from the main Bio-Formats code repository and renamed as OME-Files. This new API will provide the reference implementation for working with the file formats defined by the Open Microscopy Environment—OME-XML and OME-TIFF—in Java and C++ and the new development cycle will allow us to get updates out to our users as quickly as possible.

  1. At the time of publication, this was referred to as the ‘Image Data Repository’. 

Supporting complex formats - what we will and won't do, and what you can do to help

You may have noticed that a few months ago, we received an email asking us about when we expect to support 3D HISTECH .mrxs files. This sort of request isn’t particularly unusual and the reply gives an insight into one of the key challenges we face.

Just because we don’t have a reader, doesn’t mean we haven’t done any work

3D HISTECH .mrxs is an example of a complex format, the design of which does not make our work any easier. In fact, we can say with some confidence that the 3D HISTECH .mrxs file format is the most complex whole slide imaging file format we have ever encountered. We can say this because although we haven’t delivered a full reader for .mrxs—and there hasn’t been substantial public development—we have spent a great deal of time examining the format and potential solutions, and building test readers. Thanks to the example data the community has generously provided, we have been able to analyse the on-disk layout as well as the compression types, and map out the details of what an implementation would entail.

Unfortunately, the result of all this work has been the conclusion that we simply do not have the resources to prioritize delivering a complete solution for this format. This is not the only format we have reached this conclusion about. For example, support for 3i Slidebook 6 files was only added to Bio-Formats last May when 3i committed to developing the reader themselves. Obviously, we are very grateful for this, but that doesn’t change the fact that we had already spent years working on various versions of this format (our initial single-series Slidebook reader was released back in 2006 and obviously the work to produce it started even further back than that). Nikon ND2 and Zeiss CZI are other examples of formats with a complex design that makes them very difficult for us to support.

We won’t deliver something that doesn’t do the job well enough

One thing to understand about our work, strategy and commitment to supporting all file formats, especially formats used in production-scale facilities that use technology like whole slide imaging, is that we insist on delivering as close to complete support as possible. This is important given the size of community we support, the breadth of applications that use our software, and the need for utility and reliability in the software we deliver to the community.

With 3D HISTECH .mrxs, it is very hard and expensive to meet this goal. To be specific:

  1. The design decisions of 3D HISTECH with respect to image pyramid layout are at odds with what we can reasonably handle within the infrastructure currently in place. Our analysis suggests we will have to re-calculate several of the resolution levels, because of choices 3D HISTECH has made in their tiling strategy. This will create a substantial performance penalty for anyone using Bio-Formats to read this format.
  2. The strategy for storing binary data on disk for brightfield images in 3D HISTECH .mrxs differs substantially from the one used for fluorescence images stored in .mrxs. They are essentially two different file formats, thus doubling the work required.
  3. Based on recent data submissions and information from the community, 3D HISTECH scanners default to JPEG-XR compression when acquiring fluorescence data. This is another doubling of work and complexity, as we would need to support both compressed and uncompressed data, in brightfield and fluorescence.

These points are specific to this format, but similar issues occur with other proprietary formats. As a team, we are not comfortable with releasing a reader implementation that works on a limited set of file format variants, or requires time-consuming and computationally expensive reprocessing and pyramid creation, just because of the implementation choices made by the format designers.

A philosophical point about our funding and resources

The OME Consortium and the wider development community have worked steadily since 2002, funded mostly by grants from non-profit charities and public funders, to build tools for the scientific community.

Building readers for proprietary formats has never been funded, and we don’t think it would ever be funded by any grant funding panel. New readers are created either by diverting our precious resources from other projects, by contributions from the community (most recently by the companies themselves), or by work commissioned by customers of Glencoe Software. We certainly listen to the community and adjust our priorities based on requests, but we can’t do everything with limited resources.

Perhaps we could crowdsource the funding for file formats but that misses the point—the formats we often lack the resources to support are those which are complex, expensive, difficult, proprietary, closed formats, designed to lock their users into a single, proprietary software application. The community’s resources are finite and should be used for things other than reverse engineering this type of format; work that, if subjected to peer review, would be declined as a waste of community resources. Several of those “other things” were discussed at our most recent Annual Users Meeting and represent key technologies that the community needs to achieve its scientific goals.

Over the last few years, we have seen efforts by several commercial imaging companies to support open formats, provide open APIs, and to make it easier for researchers and clinicians to work with the data acquired by their instruments. We have also received specifications and input from several imaging companies, which we have used to improve our own work and output. We applaud this trend; ultimately it means scientists, clinicians, engineers and developers spend less time dealing with data formats and more time doing science, developing new technologies and treating patients.

What you can do

The community has the power to change this situation. You are paying for these proprietary formats. You can condition your purchase, continued payment of support and maintenance fees etc. on:

  • the delivery of a rational, well-designed, efficient, open format
  • use of open compression schemes
  • support for the community’s efforts to deliver open readers for these files

You can of course also commit your own development resources to help solve this problem.

OMERO status update

This is a quick update on the status of various versions of OMERO, and some discussion about our future development plans and aspirations.

Releases, maintenance and deprecation

With the release of OMERO 5.2.0 at the beginning of November 2015, the current situation with OMERO versions is as follows:

  • The last release of the OMERO 5.0.x line was February 2015. This version of OMERO and all earlier ones are now unmaintained - we won’t be updating the 5.0.x line of OMERO; anyone who continues to use them does so at their own risk.

  • OMERO 5.1.x is now in “maintenance” - we will provide updates for major security issues but we won’t be releasing bug fixes or any performance enhancements. To access these, you must upgrade to the 5.2.x line. We expect to continue this level of support until the release of OMERO 5.3 (currently aimed for Spring 2016).

  • OMERO 5.2.x is the new stable line and will be updated with bug fixes and enhancements as point releases at least until OMERO 5.3 is released. We expect to continue to support this line throughout 2016, although this support will drop to maintenance level once OMERO 5.3.0 is released.

The best way to keep up to date with our release plans is to follow our public Trello boards:

  • ‘Getting Started’ acts as a landing board for discovering more specific issue-based boards that may be of interest to you and gives an overview of upcoming releases with their current estimated timelines. You can view these boards without signing up but will need to register for a free Trello account if you want to subscribe to notifications or comments. You only need to supply a valid email address to sign up.

Our next OMERO release will be 5.2.1, which will feature bug fixes but also focus on improving our sysadmin documentation and installation workflows. We recognise that often groups wishing to use OMERO do not have access to a dedicated sysadmin and are trying to alleviate the burden as much as possible. You can follow progress on the OMERO 5.2.1 board.

What’s next?

With OMERO’s use expanding in numbers and in breadth of domains, we are focussing our efforts on making OMERO even more powerful - ensuring that it provides proper facilities for more and more types of image data and metadata, and that it can be deployed in increasingly complex and heterogeneous configurations. Some of our longer term goals:

  • We have applied for funding to develop OMERO’s “federation” capabilities, to make it possible for multiple OMERO servers to be connected at the level of clients or servers.
  • For some time we have wanted to re-architect the framework we use for our OMERO.web and OMERO.insight clients and are seeking funding to do so.
  • Making Bio-Formats and OMERO work effectively with object stores is a major priority.
  • We aim to re-build OMERO’s image rendering engine, to make it faster, more powerful and able to support the creation of increasingly complex data visualisations.

As always, there is a long list of things to do, and we always appreciate any feedback or comments on our work, goals or directions. All of this work is aimed at improving and expanding OME’s support for the heterogeneous metadata which is the foundation of much of modern science (an early vision of the requirements that drive our work was published by Jim Gray and colleagues). Comments welcome on our Trello boards or via our other community channels.

In the shorter term, the bulk of our work involves improvements to metadata handling, especially for regions of interest (ROIs) and analytic metadata. We have been working with several use cases, especially involving super-resolution localisation microscopy (PALM, STORM, dSTORM, etc.), high content screening (HCS) and digital pathology, to define common requirements for handling the metadata generated by manual and automatic processing of image data from these different modalities. A discussion of super-resolution data storage (e.g. this thread on the ome-devel mailing list) has helped drive this, as has our ongoing work on publishing large image datasets on the BBSRC-funded Image Data Resource (IDR) [1] and participation in several ongoing data-focussed projects (e.g. MULTIMOT and CORBEL). We have already built tools for metadata import into the IDR (see the code repository) and will aim to harmonize these with other projects that are collecting metadata on scientific datasets (e.g. BioStudies). Import of metadata from spreadsheets and other text-based tabular formats will be supported, but we will also be supporting more modern, powerful data formats (e.g. HDF5). We will use new resources that provide online services for controlled vocabularies (e.g. EBI’s Ontology Lookup Service).

But next, we want to make ROIs first class citizens in OME’s applications. This means we will add an ‘ROI Folder’ concept, which will allow users and analytic tools to cluster ROIs just like they already do with images and tags. All of these new capabilities will start to appear in OMERO 5.3 (currently scheduled for Spring 2016) and in several point releases through 2016.

  1. At the time of publication, this was referred to as the ‘Image Data Repository’. 

Ending Java 6 support

Following our published roadmap for Java support we are ending support for Java 6 with the release of OMERO and Bio-Formats 5.2 later this year.

We are not alone

This will potentially affect users of the ImageJ plugins for Bio-Formats and OMERO-ImageJ (OMERO.ij). Note that Java 6 has been unsupported since February 2013 and will no longer work with MacOS X after 10.11. This change is also being made by other ImageJ plugin developers.

Version support

Our current and planned support for Java 6 and Java 7 is as follows:

Bio-Formats and OMERO    Minimum Java
5.0 (old)                1.6
5.1 (current)            1.6
5.2 (forthcoming)        1.7

The change for 5.2 will affect ImageJ and Fiji with a bundled version of Java 6, and users of non-bundled ImageJ and Fiji who have Java 6 provided by the operating system (this includes older Linux distributions providing OpenJDK6).
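If you want to check a machine against these minimums, the comparison can be sketched as follows. This is an illustrative helper, not part of Bio-Formats or OMERO: the function names are our own for this example, and it assumes you already have the version string that `java -version` reports (for example `1.7.0_80`).

```python
import re

# Minimum Java version per Bio-Formats/OMERO release series,
# taken from the version table in this post.
MINIMUM_JAVA = {"5.0": (1, 6), "5.1": (1, 6), "5.2": (1, 7)}


def parse_java_version(version):
    """Extract (major, minor) from a version string such as '1.7.0_80'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised Java version: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def is_supported(series, java_version):
    """Return True if `java_version` meets the minimum for the release series."""
    return parse_java_version(java_version) >= MINIMUM_JAVA[series]


# A Java 6 runtime is enough for 5.1 but not for 5.2.
java6_ok_for_51 = is_supported("5.1", "1.6.0_45")
java6_ok_for_52 = is_supported("5.2", "1.6.0_45")
```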

What to do

In all cases, it should be possible to download a 1.7 or 1.8 JRE for your platform from Oracle, or alternatively install OpenJDK 7 or 8 for supported platforms.

One exception is older and unsupported versions of MacOS X (10.6 and earlier); however, support for these versions was already dropped with the 5.1 release.

If you upgrade the system’s version of Java, you can then run a version of ImageJ or Fiji without a bundled JVM.

ImageJ bundle users

Users of ImageJ with a bundled JVM may download a new version: either the “platform independent” version without a bundled JRE, or the MacOS X or Windows versions with a bundled Java 8. The latter are marked as experimental, but our testing has shown them to be perfectly functional with the Bio-Formats and OMERO.ij plugins.

Fiji users

Fiji for “all platforms (no JREs)” will work with the system Java 7 or Java 8. The Fiji downloads for individual platforms currently provide Java 6, but will be bundled with Java 8 in the near future. For now, the “all platforms (no JREs)” download is recommended.

If you can’t upgrade

Users who are unable or unwilling to upgrade their Java to version 7 or later will be able to continue to use the 5.1 and earlier releases of Bio-Formats and OMERO with ImageJ, both of which will retain Java 6 support for their lifetime. However, we do not intend to back-port new features and would always recommend you use a Java version with security support.

The slow death of Java Web Start

For the past few years, we have supported the distribution of the OMERO Java desktop clients as Java Web Start Applications. This feature was requested by several institutions and we are aware that some continue to use it. We acknowledge that Java Web Start is a practical and still active way to distribute the applications. But, due to the steady increase of issues not under our control, continuing to support the use of Java Web Start for distribution of the OMERO Desktop clients is not sustainable and is likely to become impossible in the near future.

Java applets and almost all NPAPI plugins are becoming obsolete and are being replaced by Web-based technologies, probably for the better due to the security risks that plugins bring. NPAPI plugins have now been removed from the latest version of Google Chrome and Chromium. We believe the days of Java Web Start applications are probably numbered.

In recently released versions of OMERO, we made significant effort to unify the OMERO Desktop and Web clients. With new import options available via, for example, OMERO.dropbox or in-place, more features like ROI support being worked on in the Web client and the removal of Java support in Google Chrome, we feel that it is time to deprecate the distribution of our Desktop clients as Java Web Start Applications. We emphasize that this decision is really out of our control, and reflects the current trends in security policies being enforced in web browsers. As such, we will be deprecating Java Web Start in the upcoming 5.1.4 release and will stop providing these applications from OMERO version 5.2. We understand that this will cause issues for some members of our community but we really have no choice in the longer term. We will continue to expand the functionality of the OMERO web client to try to mitigate this as much as we can.

Below we discuss the technical background behind this decision in more detail.

Background

Java Web Start was introduced in 2001 to allow applications to be launched through browsers or directly via the Java Network Launching Protocol (JNLP).

Java Web Start Applications do not run inside the browser like Applets but still run in a restricted environment. Those restrictions can be removed by signing the applications with a trusted certificate. This has been encouraged since Java 7 update 21 to reduce risks for desktop users and it has been reinforced in follow-up updates. Two security changes to enhance authentication and authorization for Java Web Start (and Applets) were introduced back in January 2014 with Java 7 update 51.

As of Java 7 update 51, Rich Internet Applications must contain two things:

  1. code signatures from a trusted authority
  2. Manifest Attributes:
    • Permissions (required): indicates whether the application should run within the sandbox or requires full permissions
    • Codebase (recommended): location of the hosted code
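
As an illustration of the manifest requirement, the sketch below reads the main section of a JAR's `META-INF/MANIFEST.MF` and reports whether the `Permissions` and `Codebase` attributes are present. The helper names and the example host `omero.example.org` are assumptions made for this example. It does not verify code signatures, which needs a dedicated tool such as `jarsigner -verify`.

```python
import io
import zipfile


def read_manifest(jar_bytes):
    """Parse the main-section attributes of META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            attributes[last_key] += line[1:]  # continuation of a wrapped value
            continue
        key, _, value = line.partition(": ")
        attributes[key] = value
        last_key = key
    return attributes


def check_ria_manifest(attributes):
    """Return a list of problems with the security-related attributes."""
    problems = []
    if attributes.get("Permissions") not in ("sandbox", "all-permissions"):
        problems.append("Permissions must be 'sandbox' or 'all-permissions'")
    if "Codebase" not in attributes:
        problems.append("Codebase is recommended but missing")
    return problems


def make_jar(manifest_text):
    """Build an in-memory JAR containing only a manifest (for the demo below)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest_text)
    return buffer.getvalue()


good_jar = make_jar(
    "Manifest-Version: 1.0\n"
    "Permissions: all-permissions\n"
    "Codebase: https://omero.example.org\n\n"
)
bare_jar = make_jar("Manifest-Version: 1.0\n\n")

good_problems = check_ria_manifest(read_manifest(good_jar))
bare_problems = check_ria_manifest(read_manifest(bare_jar))
```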

Due to vulnerabilities affecting Java plugins, security experts frequently recommend users disable Java at least in their browser. Since 2013, Firefox, Google Chrome and other browsers have started to block plugins by default.

What does it mean for desktop developers/administrators?

To deploy Java Web Start, one first needs to get familiar with Deployment Rule Sets. Administrators can then create a list of known-safe applications and manage compatibility between different versions of Java on the system. Each browser will have its own set of dialogs and control mechanisms.
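
To give a feel for what a Deployment Rule Set involves, here is a rough sketch that generates a small `ruleset.xml` allowing one trusted location and blocking everything else. The element and attribute names follow our understanding of Oracle's rule set format and should be checked against Oracle's documentation before use. The `build_ruleset` helper and the host name are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ruleset(trusted_locations, block_message):
    """Return ruleset XML that runs trusted locations and blocks the rest."""
    ruleset = ET.Element("ruleset", version="1.0+")
    for location in trusted_locations:
        rule = ET.SubElement(ruleset, "rule")
        ET.SubElement(rule, "id", location=location)
        ET.SubElement(rule, "action", permission="run")
    # Catch-all rule: an <id> without attributes matches everything else.
    rule = ET.SubElement(ruleset, "rule")
    ET.SubElement(rule, "id")
    action = ET.SubElement(rule, "action", permission="block")
    ET.SubElement(action, "message").text = block_message
    return ET.tostring(ruleset, encoding="unicode")


ruleset_xml = build_ruleset(
    ["https://omero.example.org"],  # hypothetical OMERO server hosting the clients
    "Blocked by site policy",
)
```

Generating the file is only the first step: it then has to be packaged into a signed JAR and installed on every client machine, which is part of why this route is a burden for administrators.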

It is getting harder and harder for developers and/or administrators to distribute Java Web Start applications.

What about Browser support?

The Java plugin for web browsers relies on the cross-platform plugin architecture NPAPI, which has been supported by all major web browsers for the past 10 years. In version 45 (released Sept 2015), Google Chrome dropped support for NPAPI plugins like Java. This means that you can’t enable Java in Google Chrome 45 (or later). Firefox, Internet Explorer and Safari continue to support it, but for how long?

During our testing, we are increasingly encountering unpredictable issues across all platforms, with various combinations of browsers, browser versions, Java versions and Java Web Start.

This means that even if we had the resources to devote to supporting this in the near future, we would only be delaying the inevitable.