The OME Blog: Because metadata is worth a thousand pictures

Update on the OME-TIFF pyramidal format

Designing an open multi-resolution file format

In 2012, the OME Consortium started working on the support of multi-resolution, or pyramidal, proprietary file formats. The result of this work was the addition of several proprietary whole slide image (WSI) file format readers in Bio-Formats releases during 2013-2017. As often happens, proprietary file format (PFF) readers are only part of the solution required by the scientific and clinical communities - see also our blog post. We have been repeatedly asked to build an open, supported alternative for storing large WSIs. For some time, we have deferred to the work in DICOM Supplement 145 introduced in 2010, but this work has not produced viable, supported, open implementations nor considered the rapidly emerging field of multiplexed WSIs. Therefore, we started working on an extension to the OME-TIFF format specification to support pyramidal resolutions at the beginning of 2018. This scoping exercise resulted in a formal open proposal announced to the community in February 2018.

Extending a widely adopted format specification with hundreds of instruments generating files in production is not without risk. Compatibility between the various formats and software versions is usually a key target of the process. The principal challenges are:

  • maintaining the usability of new files using either older versions of OME software or third-party software
  • minimizing the risks of conflicts of the proposed extension with other specifications

The OME team has reviewed several approaches for representing multi-resolution pyramidal levels in TIFF files. In the end, the solution we chose offered the maximum compatibility with existing reference tools while remaining unambiguous and compliant with the official TIFF specification.

Reading and writing OME-TIFF pyramids

Following the decisions on the format specification, we began working to enable support for reading and writing OME-TIFF pyramids in our Java-based application stack, specifically Bio-Formats and OMERO.

We demonstrated the first part of this work during the 2018 OME Users Meeting in Dundee. We presented example OME-TIFF files derived from public pyramidal files acquired using the Leica software. Furthermore, these files were imported into an OMERO server.

Figure: WSIs stored as pyramidal OME-TIFF, explored visually using the OMERO.iviewer web viewer.

Since then, we have focused our efforts on the generation of pyramidal OME-TIFF files using Bio-Formats. As of today, we are proud to have a milestone release of Bio-Formats available here with full support for reading and writing open OME-TIFF files with pyramids. The software allows both the conversion of existing proprietary file formats and the generation of pyramidal levels from large resolution planes.
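
To give a feel for what generating pyramidal levels from a large plane involves, here is a minimal sketch of how the dimensions of successive levels could be derived. The function name, the downsampling factor of 2 and the stopping rule are assumptions made for illustration; they are not taken from the OME-TIFF specification or from the Bio-Formats implementation.

```python
def pyramid_levels(width, height, scale=2, min_size=256):
    """Return (width, height) for each pyramid level, full resolution first.

    Hypothetical helper: each level is the previous one divided by `scale`,
    and we stop once the longest side is no larger than `min_size`.
    """
    if scale < 2:
        raise ValueError("scale must be at least 2")
    levels = [(width, height)]
    while max(levels[-1]) > min_size:
        w, h = levels[-1]
        # Integer division, but never let a dimension collapse to zero.
        levels.append((max(1, w // scale), max(1, h // scale)))
    return levels


levels = pyramid_levels(4096, 2048, scale=2, min_size=512)
for index, (w, h) in enumerate(levels):
    print(f"level {index}: {w} x {h}")
```

Each extra level adds only a fraction of the full-resolution storage, which is why a viewer can afford to load a small level first and fetch detail on demand.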

Towards Bio-Formats 6.0.0 and beyond

What are our next steps? We first require more testing to ensure that the changes have not affected our existing image reading library and are compatible with Bio-Formats’ existing support for pyramidal file formats. This includes several tens of file formats generated by manufacturers like Leica/Aperio, Hamamatsu, and Zeiss.

We are also working on ongoing performance improvements to ensure consumers can write compliant OME-TIFF rapidly either using the API directly or via our end-user tools like the Bio-Formats command-line utility, the ImageJ plugin or the OMERO exporter.

Finally, we will review the API changes necessary so that Bio-Formats can read and write compliant OME-TIFF pyramids. We have maintained our API fully backwards-compatible for the last two years following our last release of the OME model. At present, we believe a new major version of Bio-Formats 6.0.0 including breaking API changes will be released by the end of 2018. An OMERO release that supports Bio-Formats 6.0.0 will follow, but the exact date is not yet known. For those of you following our develop branch, you will see IDR consuming Bio-Formats 6.0.0 not too long after release.

Once this work is completed, our next aims in terms of formats will be:

  • Continue the work on OME formats using new binary vessels also presented at the 2018 Users Meeting. Bio-Formats 6.0.0-m3 already includes a reader for the Keller Lab Block format and Bio-Formats 6 will include a reader for the BigDataViewer format.
  • Work with the community to keep OME-Files C++ support for pyramidal OME-TIFF in sync with the Java changes - read this ome-users thread to know more about recent improvements driven by the community.

The Image Data Resource is added to the list of Scientific Data recommended data repositories

Nature Research journal Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. These data descriptors provide a path for publishing datasets associated with scientific publications.

Scientific Data requires that all datasets related to a data descriptor, including experimental, computational and curated data, be submitted to an appropriate open, public community repository. Although Scientific Data mandates the release of the datasets that accompany its manuscripts, it does not itself host data. Instead, it encourages submission of data to community-recognized data repositories where possible.

The Image Data Resource (IDR), the world’s largest public bioimage database, is now recommended by Scientific Data as a repository for bioimaging data. Authors wishing to submit a data descriptor that includes imaging data will be asked to deposit the data in one of the image data repositories that now include IDR. IDR accepts reference image datasets (as defined at IDR Submissions). Datasets that are not reference images can be published using data archives (e.g. BioStudies or Data Dryad) or, if appropriate, any of the other imaging data repositories recommended by Scientific Data (e.g. EM DataBank, Cancer Imaging Archive, SICAS Medical Image Repository, or Coherent X-ray Imaging Data Bank). Datasets published in IDR will receive a Data DOI that should then be included during the Scientific Data manuscript submission process. This follows the successful publication in IDR of the imaging data for a Scientific Data data descriptor by Pascual-Vargas et al on the role of Rho GTPases in triple negative breast cancer.

The Scientific Data repository list is mirrored for use by other Springer Nature journals and is therefore also available for use by authors submitting to other Springer Nature journals. As an example, an article published by Kilpinen et al in Nature is supported by imaging data that has been archived in IDR.

We look forward to publishing many more imaging datasets associated with Scientific Data and other Springer Nature articles now that this partnership is officially launched!

Quality figures from OMERO.figure

This is a repost from the figure.openmicroscopy.org blog where its creator Will Moore talks about how OMERO.figure works:

I recently read an excellent guide by Benjamin Nanes on How to Create Publication-Quality Figures.

It describes the goals of scientific figure creation (accuracy, quality, transparency) and a very thorough workflow to achieve these goals. The key is to understand your data and how it is stored, manipulated and displayed. In particular, it is important to minimise the number of steps where data is transformed and perform lossy steps as late as possible in the figure creation process.

Benjamin documents specific tools that he uses for his workflow such as ImageJ for images and Inkscape for figure layout. But much of his workflow can also be applied to OMERO.figure since it was designed with the same principles in mind.

I highly recommend you read the guide above, as it provides a lot of background information on how computers handle vector and raster data. The steps of Benjamin’s guide that can be replicated in OMERO.figure are described below.

Preparing figure components (High-bit-depth images)

The OMERO server stores your original microscope files and can render them as 8-bit images using your chosen rendering settings. Single-color LUTs can be applied to each channel over a specified intensity range and channels can be toggled on and off. None of these changes will alter the original microscope data. OMERO.figure does not require you to save 8-bit images before creating a figure, since all rendering is done ‘live’ within the figure app itself after importing images, as described below.
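
As a rough illustration of what rendering over a specified intensity range means, the sketch below maps raw high-bit-depth values onto 0-255 without touching the original data. The function name and the purely linear mapping are assumptions for this example; they are not a description of how the OMERO rendering engine is implemented.

```python
def render_to_8bit(pixels, window_start, window_end):
    """Linearly map raw intensities in [window_start, window_end] to 0-255.

    Values outside the window are clipped. A new nested list is returned,
    so the original pixel data is left unchanged.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    span = window_end - window_start
    rendered = []
    for row in pixels:
        out_row = []
        for value in row:
            clipped = min(max(value, window_start), window_end)
            out_row.append(round((clipped - window_start) * 255 / span))
        rendered.append(out_row)
    return rendered


# A tiny 12-bit "image" rendered with an intensity window of 500-3000.
raw = [[0, 500, 1000], [2000, 3000, 4095]]
display = render_to_8bit(raw, 500, 3000)
print(display)
```

Changing the window only changes `display`; `raw` can be re-rendered with different settings as often as needed.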

Figure layout

OMERO.figure is similar to Inkscape and Adobe Illustrator in that it defines figures in a vector-based format that embeds linked images. This means that moving and resizing images within a figure does not require resampling of pixel data, avoiding loss of image quality.

Screenshot: Editing layout and rendering settings in OMERO.figure.
Data from Wang et al JCB 2013.

Importing images

To add images to OMERO.figure, you simply specify the ID of the image in the OMERO server. The necessary data such as image size, number of channels, pixel bit-depth, etc. is then imported from the server. You can then edit the image rendering settings while working on the figure layout and these changes are stored in the OMERO.figure file. The file format is a JavaScript object (saved as JSON data) and contains no pixel data. OMERO.figure retrieves rendered 8-bit images from the OMERO.server and assembles them into a figure in the browser as needed.
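
To make the idea of a figure file without pixel data concrete, here is a small sketch of what such a JSON document could look like. Every key name and value below is a hypothetical chosen for illustration and should not be read as the actual OMERO.figure file schema.

```python
import json

# Hypothetical figure description: layout plus rendering settings only.
figure = {
    "paper_width": 595,    # assumed page size in points
    "paper_height": 842,
    "panels": [
        {
            "imageId": 1234,  # hypothetical ID of an image on the server
            "x": 50, "y": 50, "width": 200, "height": 200,
            "channels": [
                {
                    "label": "DAPI",
                    "color": "0000FF",
                    "active": True,
                    "window": {"start": 100, "end": 2000},
                },
            ],
        }
    ],
}

# The figure is saved as JSON text and can be restored without loss.
serialized = json.dumps(figure, indent=2)
restored = json.loads(serialized)
print(serialized)
```

Because the file only records which image to show and how to render it, it stays small, and the pixels are fetched from the server whenever the figure is opened.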

The resolution (dpi) of images in OMERO.figure is calculated from their size on the page and the printed size of the page itself (which can be edited under File > Paper Setup…). The dpi of each image can be seen under the ‘Info’ tab and will change as the image is resized and zoomed.

Journals usually require all images to be at 300 dpi or above in order to avoid a pixelated appearance when figures are displayed at their published size. If you need to increase the dpi for an image, you can set an export dpi and the panel will be resampled as necessary in the exported PDF.
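
The arithmetic behind these numbers is straightforward, and the sketch below works through it. The function names are made up for this example, and rounding up with `math.ceil` is an assumption rather than a statement of what the export script does.

```python
import math


def panel_dpi(visible_pixels_wide, panel_width_inches):
    """Resolution of a panel: image pixels shown per printed inch."""
    return visible_pixels_wide / panel_width_inches


def export_width_pixels(visible_pixels_wide, panel_width_inches, min_dpi):
    """Pixel width needed so the panel reaches at least min_dpi on export.

    If the panel already meets the target, no resampling is required and
    the original pixel width is returned.
    """
    if panel_dpi(visible_pixels_wide, panel_width_inches) >= min_dpi:
        return visible_pixels_wide
    return math.ceil(panel_width_inches * min_dpi)


# 512 image pixels shown across a 2.5 inch wide panel.
current = panel_dpi(512, 2.5)
needed = export_width_pixels(512, 2.5, 300)
print(f"current: {current} dpi, export width for 300 dpi: {needed} px")
```

Zooming in shows fewer image pixels across the same printed width, which is why the dpi drops as you zoom.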

Clipping masks

OMERO.figure allows you to crop images. It uses a ‘clipping mask’ to produce the cropping effect which means you can undo or edit the crop at any time. You can crop by using the zoom slider to zoom the image, then pan to the desired spot, or you can use a standard ‘crop’ tool to draw a crop region on an image.
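
The reason a clipping mask can always be undone is that the crop is stored as a region alongside the image rather than applied to the pixels. Here is a minimal sketch of that idea; the `Panel` class and its fields are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Panel:
    """Hypothetical figure panel: full image plus an optional crop region."""
    pixels: List[List[int]]                          # never modified
    crop: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height

    def visible(self):
        """Return the pixels inside the clipping mask (a copy)."""
        if self.crop is None:
            return [row[:] for row in self.pixels]
        x, y, w, h = self.crop
        return [row[x:x + w] for row in self.pixels[y:y + h]]


panel = Panel(pixels=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
panel.crop = (1, 1, 2, 2)   # apply a crop
cropped = panel.visible()
panel.crop = None           # undo the crop; nothing was lost
uncropped = panel.visible()
print(cropped, uncropped)
```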

Calculating scale bars

Scale bars can be easily added to images in OMERO.figure and the known pixel size will be used to calculate the correct length. Scale bars are vector objects overlaid on the image and will be automatically resized if you resize or zoom the image.
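
As a worked example of the calculation, the sketch below converts a physical scale bar length into a length in image pixels and then into displayed units. The function and parameter names are assumptions for illustration only.

```python
def scale_bar_length_px(bar_length_um, pixel_size_um, display_scale=1.0):
    """Length of a scale bar in displayed units.

    bar_length_um: physical length the bar represents, in micrometres.
    pixel_size_um: physical size of one image pixel, in micrometres.
    display_scale: displayed units per image pixel (changes with resize/zoom).
    """
    if pixel_size_um <= 0:
        raise ValueError("pixel_size_um must be positive")
    return bar_length_um / pixel_size_um * display_scale


# A 10 micrometre bar on an image with 0.25 micrometre pixels.
at_native_size = scale_bar_length_px(10, 0.25)
after_zoom = scale_bar_length_px(10, 0.25, display_scale=2.0)
print(at_native_size, after_zoom)
```

Because the bar is recalculated from the pixel size and the current display scale, it stays correct when the panel is resized or zoomed.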

Exporting final figure files

OMERO.figure offers export in PDF and TIFF formats. Both are generated on the OMERO server using a Python script and the Python Imaging Library (PIL) for image manipulation. Figures are saved on the server and are then available to download.

Creating TIFF Images

TIFF images at 300 dpi are generated by resampling all the embedded images using a bilinear filter. Vector data such as labels and scale bars is rasterized and overlaid on the image.

Creating PDF Files

The Python script uses the ReportLab library to produce PDF files. Images are rotated, cropped and resampled to the chosen dpi as necessary and saved as TIFFs before embedding in the PDF. Labels and scale bars remain as vector objects that can subsequently be manipulated in other vector-based packages if needed.

Export with images

An additional option with TIFF or PDF figure export is to export all the embedded images as TIFFs, saved at each stage of the figure generation process:

  • As 8-bit images at full size as rendered by OMERO
  • After cropping & rotating, but before any resampling
  • Finally, saved as they are embedded in the figure

This option increases the transparency of the image processing steps, and also provides images that can be used for other purposes if needed.

Summary

OMERO.figure is a web application that stores figures in a vector-based file format linked to images. By linking to the original microscope images in the OMERO.server, we have complete control over rendering of high bit-depth images within the figure. Only when the figure is exported do we need to save images as 8-bit TIFFs. This pushes the lossy and file-format specific steps to the very end of the figure creation process, ensuring the highest possible quality of images in the final figure.

Thanks to Benjamin Nanes for his original guide which provided the basis of this article.

(Originally published on April 30th 2015)

OME FAIR

Recently there have been several publications and substantial discussion about the FAIR principles (see for example, Wilkinson et al, 2016 and the FORCE11 FAIR Data Principles). Overall, the goal of the FAIR principles is “to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of … scientific data and their associated algorithms and workflows.”[1] These principles are extremely powerful but, as has been repeatedly noted, the routine implementation of FAIR principles is a significant challenge.

Imaging datasets present a particular challenge for implementing FAIR. The datasets are large, multidimensional and complex. Perhaps most importantly it is probably unrealistic to suggest that a single metadata standard will handle the huge diversity of imaging experiments and datasets. In the best possible case, it is likely that there will be families of metadata standards or flexible APIs, each tuned and designed for accessing specific types of imaging metadata.

OME has been working on the image data publication problem for many years. Our recent work on the Image Data Resource (IDR) is an example of an added-value database that integrates imaging data from many biological imaging datasets and links gene and drug perturbations with cell phenotypes[2]. IDR focuses on reference image datasets, i.e. those datasets that have high levels of biological and molecular annotations and have a strong likelihood of re-use by the scientific community.

Our work on IDR has been well-received and the resource is growing in size and usage. However, IDR doesn’t address more routine data publication: the datasets that are not reference images, but are associated with a scientific publication in the biological sciences. For example, our lab in Dundee has recently published a paper that explores the interaction of a single protein, Bod1, and the Ndc80 complex, a protein complex that mediates the attachment of microtubules to chromosomes during cell division[3]. How should the imaging data associated with this paper be published?

As you might guess, we’ve used OMERO to publish and link these data. We’ve used our institutional OMERO server, and used an institutional DOI as the definitive link to the data. The datasets associated with this paper were imported into OMERO as part of the analysis workflows and then were moved into a public OMERO group for publication. The data can be browsed, searched, viewed and downloaded. We believe we’ve made the datasets “AIR”—Accessible, Interoperable and Reusable. Making these datasets truly “Findable” will take more time as we develop routine landing pages and JSON-LD-based metadata for these images.

In the meantime, we thought it might be useful for the community to see how we have achieved this work. With the latest releases of OMERO (5.4 and beyond), we have made it fairly easy for images to be managed and published online. Documentation describing exactly what we did is available[4].

We hope this work is an important contribution to the movement for making data available online. We believe we’ve made reasonable progress in making data AIR and look forward to fully achieving the goals of the FAIR principles.

OME Project Status Update

The year so far…

2017 has been a very busy year for the OME team so far with:

  • 2 major OMERO releases plus 6 security or patch releases - introducing ROI Folders, a whole new UI layout for HCS data, a configurable restricted admin user role and many other fixes and improvements
  • 12 Bio-Formats releases to date (and another in the pipeline) - featuring 2 new formats and improved support for many more
  • 4 OME Files C++ releases - improving support for reading and writing OME-TIFF in an open, liberally-licensed, native library
  • 4 OMERO.figure releases - adding support for loading ROIs and choosing look-up tables from OMERO, using markdown syntax for italics and bold labels, setting background colors, plus other fixes and improvements
  • various other web app releases - all now available from PyPI, including OMERO.FPBioimage, a volumetric visualization tool
  • the launch of our new OMERO.iviewer - a web browser-based interactive multidimensional image viewing app that includes ROI drawing and measurement functionality. OMERO.iviewer is approaching a full v1.0 release, a sign of our commitment to maintaining and growing the functionality in this app
  • AND a new website!

Plus of course, our usual Annual User Meeting bringing together members of the community from across the globe (if you couldn’t make it, talks and workshops are available from the event page).

The Image Data Resource (IDR) also continues to go from strength to strength. If you missed it, our Nature Methods paper (or the open access PubMedCentral version) discusses how the IDR can be used to obtain new biological insights from existing datasets, plus an in-depth explanation of how the resource is set up. It also includes information on how you can set up your own IDR.

With all that, hopefully you’ll forgive us for neglecting this blog!

What’s still to come…

We’re running two days of Bioinformatics training at Cambridge University in December. In the run-up to this, we’ll be developing new training materials which will all be available via our website for those of you who can’t attend. We’ll also be expanding our collection of Jupyter notebooks providing you with examples of how to carry out image analysis via the OMERO API and likely adding to our collection of how-to movies on our YouTube channel. The IDR will also be updated with several new datasets (our first light sheet fluorescence microscopy dataset is almost ready to go!) and improvements to the Jupyter analysis tools.

In terms of releases, we expect to put out patch releases for OMERO, Bio-Formats and OME Files. As always, the content of these will be driven by both the community and our own projects; for example, the IDR continues to challenge us in terms of format support and display, and the way our tools connect and interact with analysis packages. You can always keep up with the latest upcoming releases and more via our Trello boards (you can sign up here with just an email address).

Beyond this, we are continuing to push forward our functionality and the scale at which our tools operate. As imaging technology takes huge leaps in scale, this is challenging to say the least and we’ll need the support of the community more than ever. If you have resources to contribute, we’d love you to get involved: write to the mailing lists or forums, check out our contributing developer docs, or read our blog posts on helping us support new file formats. Even if you don’t have resources to spare, you can always help us secure grant funding by citing our work in your publications.