The OME Blog Because metadata is worth a thousand pictures

OME-Zarr HCS specification (Dec. 2020)

As discussed in our previous post, we have now extended our NGFF specification to support high-content screening (HCS).

The official specification has been migrated to its own repository, and the current version, 0.1, which includes support for multiscale images, labels and HCS, is published at https://ngff.openmicroscopy.org/0.1/.
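To make the plate layout more concrete, here is a minimal Python sketch that walks a plate's wells and lists every field of view. It assumes the v0.1 HCS layout as we understand it: a plate group whose .zattrs holds a "plate" object listing its "wells" by path (e.g. "A/1"), and well groups whose .zattrs list their "images". The attribute contents are hypothetical example data held in memory rather than read from S3, and the key names should be checked against the published specification.

```python
from typing import Dict, List

# Hypothetical plate-level .zattrs content (key names per our reading of v0.1).
PLATE_ATTRS = {
    "plate": {
        "name": "example",
        "rows": [{"name": "A"}, {"name": "B"}],
        "columns": [{"name": "1"}, {"name": "2"}],
        "wells": [{"path": "A/1"}, {"path": "B/2"}],
    }
}

# Hypothetical well-level .zattrs content, keyed by well path.
WELL_ATTRS = {
    "A/1": {"well": {"images": [{"path": "0"}, {"path": "1"}]}},
    "B/2": {"well": {"images": [{"path": "0"}]}},
}


def image_paths(plate_attrs: Dict, well_attrs: Dict[str, Dict]) -> List[str]:
    """Return plate-relative paths of every field of view, e.g. 'A/1/0'."""
    paths = []
    for well in plate_attrs["plate"]["wells"]:
        well_path = well["path"]
        # Each well lists its fields of view as sub-groups.
        for image in well_attrs[well_path]["well"]["images"]:
            paths.append(f"{well_path}/{image['path']}")
    return paths
```

For this example data, image_paths(PLATE_ATTRS, WELL_ATTRS) yields the three fields of view A/1/0, A/1/1 and B/2/0; each such path would point at a multiscale image group.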

All the open-source projects reading or writing the specification in various languages are listed in an implementations section.

As before, we have converted representative plates from the Image Data Resource (IDR), covering various HCS acquisition modalities, into OME-Zarr. Samples are publicly available and hosted on S3 storage in the Embassy Cloud at the European Bioinformatics Institute (EBI).

Below is an example of a visual representation of an OME-Zarr plate using the vizarr viewer.

The table below lists all plates converted into version 0.1 of the OME-Zarr spec. This list is also available in JSON format.

Plate     Study    Wells  Fields of view (acquisitions)  Image dimensions (XYZCT)
1229801   idr0001  96     6 (6)                          1376x1040x16x2x1
179693    idr0002  96     1 (1)                          1344x1024x1x2x329
692149    idr0004  69     1 (1)                          672x510x10x2x1
3230447   idr0033  384    9 (1)                          1080x1080x1x5x1
10567921  idr0094  96     9 (1)                          1080x1080x1x1x1

Public OME-Zarr data (Nov. 2020)

The OME team is leading a community effort to design a new cloud-friendly “Next Generation” file format (NGFF). See the announcement and other image.sc posts tagged with ome-ngff.

The ome-zarr spec is currently under development and includes a changelog with version numbers. A number of tools are being developed to work with ome-zarr data. The vizarr viewer is used below to view OME-Zarr images.

As part of this work, we are converting sample datasets from the Image Data Resource (IDR) into OME-Zarr, matching each version of the spec. These are publicly available and hosted on S3 storage in the Embassy Cloud at the European Bioinformatics Institute (EBI). Below are listed the images converted into version 0.1 of the OME-Zarr spec. This list is also available in JSON format.

We are currently working on a spec for representing HCS data in OME-Zarr. When this spec is finalised, we will also provide links to sample HCS data in this format.

Image     Study    X       Y       Z    C  T    Notes
1884807   idr0021  256     256     1    3  1
4007801   idr0044  2169    2048    988  2  532
4495402   idr0053  921600  380928  1    1  1
6001237   idr0062  1024    1024    39   4  1
6001238   idr0062  1024    1024    27   4  1
6001239   idr0062  1024    1024    36   4  1
6001240   idr0062  271     275     236  2  1
6001241   idr0062  278     263     236  2  1
6001242   idr0062  481     275     237  2  1
6001243   idr0062  212     218     237  2  1
6001244   idr0062  325     281     239  2  1
6001245   idr0062  220     226     257  2  1
6001246   idr0062  281     284     257  2  1
6001247   idr0062  253     210     257  2  1    with labels
6001248   idr0062  234     269     195  2  1
6001249   idr0062  246     241     195  2  1
6001250   idr0062  246     241     195  2  1
6001251   idr0062  170     163     140  2  1
6001252   idr0062  170     163     297  2  1
6001253   idr0062  803     1024    297  2  1
6001254   idr0062  393     354     51   4  1
6001255   idr0062  551     416     32   4  1
6001256   idr0062  350     372     61   4  1
6001257   idr0062  3763    2860    145  4  1
6001258   idr0062  1282    838     36   3  1
9798462   idr0073  21115   16433   1    3  1
9822151   idr0083  79360   167424  1    1  1
9822152   idr0083  144384  93184   1    1  1
9836831   idr0077  1920    1920    259  4  1
9836832   idr0077  1920    1920    259  4  1
9836833   idr0077  1920    1920    259  4  1
9836834   idr0077  1920    1920    259  4  1
9836835   idr0077  1920    1920    259  4  1
9836836   idr0077  1920    1920    259  4  1
9836837   idr0077  1920    1920    259  4  1
9836838   idr0077  1920    1920    259  4  1
9836839   idr0077  1920    1920    1    4  1
9836840   idr0077  1920    1920    1    4  1
9836841   idr0077  1920    1920    1    4  1
9836842   idr0077  1920    1920    1    4  1
9836843   idr0077  1920    1920    1    4  1
9836844   idr0077  1920    1920    1    4  1
9836845   idr0077  1920    1920    1    4  1
9836846   idr0077  1920    1920    1    4  1
9836950   idr0079  1636    816     156  1  1

OME's position regarding file formats

Below is a statement delivered in November 2018 to the Euro-Bioimaging Industry Board regarding the support of proprietary file formats by Bio-Formats. It was discussed during the From Images to Knowledge with ImageJ & Friends meeting in December, and since then there have been a growing number of conversations about a common format for bioimaging data. We're posting it here to tie these conversations back together and to continue an open discussion of this critical issue.

As many of you know, work on Bio-Formats began in 2006, and over the first 10 years of development, support was added for over 140 file formats. Including the per-format variants that have emerged over the years, the count might be five to ten times higher, but precise numbers are difficult to establish.

Growth of file format readers (2006-2018)

In 2016, we issued a public statement that OME, or more specifically its funding model, was not going to keep up with the accelerated development of new formats. We warned that we would be spending less time on closed formats, and we suggested that format developers either move to open formats or invest their own time or money to support their formats.

Statement about format complexity (2016)

How did that turn out? Well, two years later the growth curve has naturally levelled off as we pursue other priorities. Currently, just over 150 formats are supported. One company, 3i, has taken over support of its own file format (Slidebook6) with a closed-source reader that lives outside of Bio-Formats.

A few other companies have added support for their format either by contributing directly to the library or by commissioning Glencoe Software to do so. Where necessary, the open source team has added support for formats that are needed for their funded priorities like datasets published in the Image Data Resource.1,2,3,4

But paying for the initial cost of a format is not enough. The need for indefinite support carries a larger, longer-lived price tag that leaves data written in a given format constantly at risk. Format variants exacerbate these costs. Even when a format follows a standard like DICOM, multiple implementations must be contended with, as is the case in the radiology domain. The same happened with the Olympus OIR format, added in 2017 in partnership with Olympus Europe: following its public release, the community has periodically reported breakages caused by new variants of the format.5,6,7,8,9

Put simply, the format landscape has scaled beyond a manageable level. As a result, scientists end up blocked from accessing and properly handling their data, and thus blocked in their scientific work. If Bio-Formats were to cease to exist, a large percentage of imaging data would immediately become inaccessible, at least until someone took on the burden of support.

We understand the push to develop new formats. From numerous interactions, we know how crucial it is for data producers to write data quickly and for users to access their data quickly, both across as many platforms as possible. We also know that, ideally, this whole ecosystem should keep working for years to come. But while these requirements need to be fulfilled, something must give.

We think the only scalable way forward is to work together on an ever smaller number of formats. That’s why we’ve been concentrating on open formats instead of adding new proprietary formats. For example, Bio-Formats 6.1 adds support for the open BigDataViewer (BDV) format, a strong candidate for support across the community.

BDV provides a testbed for moving beyond the current single binary format of OME-TIFF. The OME Model will be extended to describe the multiscale, multidimensional data that is currently stored in BDV XML/H5. As a stable container format, HDF5 gives us a quick way to validate these concepts.

At the same time, there is a consensus that HDF5 as currently implemented cannot be the only binary container for our community. We are therefore also collaborating on next-generation, open-source, chunked (or “cloud”) formats suited to the scale of data generated by future acquisition systems. Two candidates, Zarr and N5, were developed independently but overlap in most of their core concepts. Both communities have since begun work on a common storage spec, and other groups, from NetCDF to Pangeo, are getting involved.

We would like to see a community agreement between the various parties on a minimal set of open formats covering a broad range of imaging modalities.

We would like to see the bioimaging community agree on a set of open formats covering a broad range of imaging modalities. We need to reduce the long-term cost of our domain’s file formats and their variants. We want data users and producers to be able to ensure the long-term viability of their data.

OME-TIFF has been available for over a decade and today is in use by software across industry and academia, minimally as an export format, but it still doesn’t have the traction to stop a proliferation of new file formats. As support for this new binary format solidifies, we intend to invest long-term support in a new OME format.

Some of this work is the regular work of supporting the bioimaging community, but we feel this is a larger effort that could use more collaboration and funding. We are considering an application to the CZI’s Essential Open Source Software call and welcome any coordinated efforts. Beyond that, a truly common format will need indefinite support, and we will continue to look for avenues to do so.


You’re invited to discuss this post in the corresponding image.sc forum topic.


OMERO 5.5 Status Update

After the intensive development period of IDR’s first releases, the 5.4 series of OMERO was intended to be a stable platform for the community and the OME team to build on. From its first release in October 2017 to its tenth and final release this year, 5.4 has, we think, served as a reference point for the community.

In trying to maintain that stability, however, it’s become ever more clear that we need the ability to quickly release individual parts of OMERO to the community. Fixes to file formats, performance improvements, security patches, and more should not need to wait on the simultaneous release of the entire OMERO platform.

Enabling such releases has been the focus of the upcoming, largely developer-centric release. However, with production-quality Docker deployments and the fresh-off-the-presses Bio-Formats 6.1, we hope that OMERO 5.5 will provide something for everyone.

Individual Repositories

During the development of OMERO 5.5, all 800,000+ lines of Java and Matlab code were migrated out of the openmicroscopy/openmicroscopy GitHub repository into individual repositories, each with a new Gradle build system. Support for Java 7, Python 2.6 and Ice 3.5 was dropped, and Java 11 support was added. Most of these new repositories started at version 5.5.0, but their versions have already begun to diverge following semver principles. Though initially disruptive, we hope this modernization and modularization will ease participation in the development of OMERO. See the Gradle super project omero-build for more details.

omero-build in IntelliJ

Docker Deployments

Beyond the changes for building OMERO, the distribution of OMERO.server and OMERO.web as Docker images is now considered production quality. Examples for using these images in various configurations are available under omero-deployment-examples. Both images will be updated with every OMERO release, and will also be updated with releases of the embedded components and plugins as necessary.

Other Docker images from the OME team that you may have used over the years have been deprecated and will soon be removed. A next step will be to additionally provide Helm Charts for easing deployment on Kubernetes. If you are interested please get in touch through the forum.

Bio-Formats 6.1

But don’t worry: we also didn’t forget our users. OMERO 5.5 finally makes the jump to Bio-Formats 6, with its support for pyramidal TIFFs (see post) and for new community file formats like BDV; see the Bio-Formats 6.1 announcement for more details. We look forward to helping you create and share these more scalable file formats.

Beyond

In the coming months, we will continue to release fixes for the individual components of OMERO and hope to ease their introduction into your local installation. Feedback on how you find working with the decoupled repositories and installing changes would be much appreciated.

At the same time, we will begin preparing for the next large changes:

  1. With the deprecation of Python 2, all OME code bases will need to be upgraded to work with Python 3. A similar modularization will likely be applied to the Python and Web code, so that pip install -U omero-web should be all that is needed to receive the latest updates to OMERO.web.
  2. A development version of OMERO will begin with a flexible extension mechanism for instrument and eventually experiment metadata. This is likely to become the basis for OMERO 6 which, unlike OMERO 5.5, will require a database upgrade.
  3. And, OMERO will finally enter the age of microservices. Thanks to the substantial work by Glencoe Software, a number of standalone services are already available for integration into OMERO. See omero-ms-pixel-buffer, omero-ms-thumbnail, omero-ms-image-region, omero-ms-core and omero-ms-backbone.

P.S. In case you missed it, the OME forums have been migrated to image.sc.

Initial release of OMERO.downloader

Introducing OMERO.downloader

For OMERO to properly fulfill its role as a useful repository for microscopy images, its users must have easy access to their data. As data sets grow in size, providing access to that data becomes correspondingly more challenging. This motivates the creation of server-side solutions such as the IDR’s Virtual Analysis Environment. For the past couple of years, the OME team has also been investigating ways to improve users’ ability to obtain data from OMERO for client-side storage and processing.

We are now releasing OMERO.downloader v0.1.0, a Java application that acts as a command-line OMERO client. It writes selected data from an OMERO server into a local directory and creates soft links to represent some of the relationships among server objects. This is still an early version that lacks many features, but it can already download some original files and metadata.

OMERO.downloader is designed to handle situations in which not all the specified data can be downloaded in a single session. If a download is interrupted, it can be resumed by repeating the same command-line invocation; files that have already been downloaded will not be fetched again.
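The general idea behind resumable downloads can be illustrated with a minimal Python sketch: skip any target file that already exists, and write new files under a temporary name that is renamed only once complete, so an interrupted transfer is never mistaken for a finished one. This is a generic illustration, not OMERO.downloader's implementation; the fetch callables stand in for server transfers, and the ".part" naming is our own assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def resume_download(files: Dict[str, Callable[[], bytes]], base: Path) -> List[str]:
    """Fetch every file not already present under base; return the names fetched."""
    fetched = []
    for name, fetch in files.items():
        target = base / name
        if target.exists():
            continue  # completed in an earlier session, so skip it
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so that an interrupted transfer
        # never leaves a file that looks complete on the next run.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch())
        os.replace(partial, target)  # atomic rename once the data is written
        fetched.append(name)
    return fetched


def demo_resume():
    base = Path(tempfile.mkdtemp())
    # Hypothetical stand-ins for server-side files.
    files = {
        "Repository/a.fmt": lambda: b"pixel data",
        "Repository/a.xml": lambda: b"<companion/>",
    }
    first = resume_download(files, base)
    second = resume_download(files, base)  # repeating the same invocation
    return first, second
```

Running the same call twice fetches both files the first time and nothing the second time, which mirrors the "repeat the command to resume" behaviour described above.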

Downloading original files

The files that were uploaded for OMERO image ID 1234 are available through:

./download.sh -u my-user -w my-pass -s my.omero.server -f binary,companion Image:1234

These are downloaded within the current directory. The -b option can be used to specify a different preexisting directory to use as a base for the downloads. We recommend using a different base directory for each OMERO server that you use because the directory structure created locally reflects how the server stores your data.

The above command would download image files into the Image/1234/Binary/ directory with any companion files (not containing pixel data) in the Image/1234/Companion/ directory. The files are soft links that, perhaps via a Fileset/ directory, link to files in Repository/. In the repository directory the binary and companion files are located together. On systems with the GNU Core Utilities installed a command like:

showinf `realpath Image/1234/Binary/my-image.fmt`

can be used to conveniently direct Bio-Formats’ command-line tools to the directory that includes the binary and companion files together.
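The same link resolution can be done from Python with os.path.realpath, which follows every soft link in the chain, including any hop through a Fileset/ directory. The sketch below builds a throwaway directory tree to demonstrate; repository_dir is a hypothetical helper, and the Repository/ and Fileset/ subdirectory names in the demo are made up (on a real download the layout mirrors however the server stores your data). Creating symbolic links may need extra permissions on Windows.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def repository_dir(link: str) -> Path:
    """Resolve every soft link in `link` and return the directory holding the target."""
    return Path(os.path.realpath(link)).parent


def demo_repository_listing() -> List[str]:
    base = Path(tempfile.mkdtemp())
    # Hypothetical repository location; real paths depend on the server.
    repo = base / "Repository" / "user_1" / "2019-05"
    repo.mkdir(parents=True)
    (repo / "my-image.fmt").write_bytes(b"pixel data")
    (repo / "my-image.xml").write_text("<companion/>")
    # Image/1234/Binary/my-image.fmt -> Fileset/42/my-image.fmt -> Repository/...
    fileset = base / "Fileset" / "42"
    fileset.mkdir(parents=True)
    (fileset / "my-image.fmt").symlink_to(repo / "my-image.fmt")
    binary = base / "Image" / "1234" / "Binary"
    binary.mkdir(parents=True)
    (binary / "my-image.fmt").symlink_to(fileset / "my-image.fmt")
    # The binary and its companion file sit together in the resolved directory.
    resolved = repository_dir(str(binary / "my-image.fmt"))
    return sorted(p.name for p in resolved.iterdir())
```

Because realpath follows the whole chain, the returned directory contains both the binary and its companion file, which is what Bio-Formats needs to open multi-file formats.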

The original files for multiple images can be downloaded by specifying, e.g., Dataset:123 or Image:1234,1235,1236. However, nothing stored in the base directory indicates which datasets or other containers held the downloaded images. Original files from plates may be downloaded only if the server’s omero.policy.binary_access setting is configured to permit it.
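Targets take the form Class:id[,id...]. A hypothetical helper like the one below shows how such an argument decomposes; it only illustrates the syntax and is not OMERO.downloader's own parser, and it does not check whether the class name is one the tool accepts.

```python
from typing import List, Tuple


def parse_target(spec: str) -> Tuple[str, List[int]]:
    """Split e.g. 'Image:1234,1235,1236' into ('Image', [1234, 1235, 1236])."""
    kind, sep, ids = spec.partition(":")
    if not sep or not kind or not ids:
        raise ValueError(f"expected Class:id[,id...], got {spec!r}")
    # int() also rejects empty items such as a trailing comma.
    return kind, [int(i) for i in ids.split(",")]
```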

Exporting metadata as OME-XML

Metadata representing images, ROIs and some annotations can be fetched from the OMERO server and written locally as OME-XML:

./download.sh -u my-user -w my-pass -s my.omero.server -f ome-xml Image:1234

The OME-XML is stored in two forms: First, each top-level schema object is stored independently in separate files, e.g., in Image/1234/Metadata/image-1234.ome.xml. Soft links exist among related model objects, e.g., Image/1234/Annotation/567 may link to Annotation/567/ which contains Metadata/annotation-567.ome.xml. To use those files and links to list the IDs of the images that are tagged as “anaphase”:

grep -lr '^<Tag.*<Value>anaphase<' Image/*/Annotation/*/Metadata/ | cut -f 2 -d / | sort -nu
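For readers who prefer Python to a shell pipeline, the sketch below is an equivalent of that grep | cut | sort chain. Like the grep pattern, it assumes each tag annotation's Value element sits on the same line as the opening tag. images_tagged and the demo data are hypothetical; the demo builds a small tree of soft links mirroring the Image/<id>/Annotation/<id> layout described above.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Same pattern as the grep command: a line starting with "<Tag" that contains
# "<Value>anaphase<". re.MULTILINE makes "^" match at every line start.
ANAPHASE = re.compile(r"^<Tag.*<Value>anaphase<", re.MULTILINE)


def images_tagged(base: Path, pattern=ANAPHASE) -> List[int]:
    """Return sorted, unique IDs of images whose linked annotation files match."""
    ids = set()
    # Path.glob follows the Annotation/<id> soft links for "*" components.
    for path in base.glob("Image/*/Annotation/*/Metadata/*"):
        if path.is_file() and pattern.search(path.read_text()):
            # Image/<id>/Annotation/...: like `cut -f 2 -d /`
            ids.add(int(path.relative_to(base).parts[1]))
    return sorted(ids)  # like `sort -nu`


def demo_tagged() -> List[int]:
    base = Path(tempfile.mkdtemp())
    # Hypothetical annotations: 567 is "anaphase", 568 is "metaphase".
    for ann_id, value in {567: "anaphase", 568: "metaphase"}.items():
        meta = base / "Annotation" / str(ann_id) / "Metadata"
        meta.mkdir(parents=True)
        (meta / f"annotation-{ann_id}.ome.xml").write_text(
            f'<TagAnnotation ID="Annotation:{ann_id}">'
            f"<Value>{value}</Value></TagAnnotation>\n"
        )
    # Hypothetical links from images to annotations.
    for image_id, ann_id in {1234: 567, 1235: 568, 1236: 567}.items():
        ann_dir = base / "Image" / str(image_id) / "Annotation"
        ann_dir.mkdir(parents=True)
        (ann_dir / str(ann_id)).symlink_to(
            base / "Annotation" / str(ann_id), target_is_directory=True
        )
    return images_tagged(base)
```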

Second, each specified model object is assembled from the various object files into a single OME-XML document, e.g., Image/1234/Export/image-1234.ome.xml. The OME-XML files in Export/ can include multiple top-level schema objects: for example, with ROIRef elements linking an image to its ROIs.

As the pixel data is not included, any Pixels element contains MetadataOnly.
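A quick way to confirm that an exported document carries no pixel data is to check each Pixels element for a MetadataOnly child. The sketch below uses Python's xml.etree.ElementTree and ignores XML namespaces, so it does not depend on a particular OME schema version; is_metadata_only and the abbreviated example document are hypothetical, and the example is not schema-valid as written.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}Pixels' -> 'Pixels'."""
    return tag.rsplit("}", 1)[-1]


def is_metadata_only(ome_xml: str) -> bool:
    """True if the document has Pixels elements and each contains MetadataOnly."""
    root = ET.fromstring(ome_xml)
    pixels = [el for el in root.iter() if local_name(el.tag) == "Pixels"]
    return bool(pixels) and all(
        any(local_name(child.tag) == "MetadataOnly" for child in px) for px in pixels
    )


# Abbreviated document shaped like an export; many required elements omitted.
EXAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:1234" Name="my-image">
    <Pixels ID="Pixels:1234" DimensionOrder="XYZCT" Type="uint16"
            SizeX="1376" SizeY="1040" SizeZ="16" SizeC="2" SizeT="1">
      <MetadataOnly/>
    </Pixels>
  </Image>
</OME>"""
```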

Plans for the future

OMERO.downloader is an early prototype: we have many ideas for how to improve both how it is engineered and what it can do. For instance, it cannot yet fetch map annotations or file attachments but both should be feasible. We have been working toward offering export of pixel data into TIFF or OME-TIFF, even for large images. This could make local image analysis easier for pathology images that are too large for the server to export or for plates where file download is disabled. We intend to benefit from new developments in Bio-Formats such as having large exported OME-TIFFs include pyramids.

There are also more ambitious possibilities. For example, OMERO.downloader’s operation could be parallelized for greater speed, a graphical user interface could be added, and images’ container structure (screens, projects, etc.) could be fetched. Further work depends on what our user community most needs and what best supports our funded deliverables. We would gladly exchange design and implementation ideas with collaborators who wish to assist with OMERO.downloader development. In the meantime, we hope that the present version is already very useful to some scientists. We welcome questions and comments via our forum and mailing lists.