The OME Blog Because metadata is worth a thousand pictures


Recently there have been several publications and substantial discussion about the FAIR principles (see for example, Wilkinson et al, 2016 and the Force11 Fair Data Principles). Overall, the goal of the FAIR principles is “to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of …scientific data and their associated algorithms and workflows.”1 These principles are extremely powerful but as has been repeatedly noted, the routine implementation of FAIR principles is a significant challenge.

Imaging datasets present a particular challenge for implementing FAIR. The datasets are large, multidimensional and complex. Perhaps most importantly it is probably unrealistic to suggest that a single metadata standard will handle the huge diversity of imaging experiments and datasets. In the best possible case, it is likely that there will be families of metadata standards or flexible APIs, each tuned and designed for accessing specific types of imaging metadata.

OME has been working on the image data publication problem for many years. Our recent work on the Image Data Resource (IDR) is an example of an added value database that integrates imaging data from many biological imaging datasets and links gene and drug perturbations with cell phenotypes2. IDR focuses on reference image datasets, i.e. those datasets that have high levels of biological and molecular annotations and have a strong likelihood of re-use by the scientific community.

Our work on IDR has been well-received and the resource is growing in size and usage. However, IDR doesn’t address more routine data publication; the datasets that are not reference images, but are associated with a scientific publication in the biological sciences. For example, our lab in Dundee has recently published a paper that explores the interaction of a single protein Bod1 and the Ndc80 complex, a protein complex that mediates the attachment of microtubules to connect to chromosomes during cell division3. How to publish the imaging data associated with this paper?

As you might guess, we’ve used OMERO to publish and link these data. We’ve used our institutional OMERO server, and used an institutional DOI as the definitive link to the data. The datasets associated with this paper were imported into OMERO as part of the analysis workflows and then were moved into a public OMERO group for publication. The data can be browsed, searched, viewed and downloaded. We believe we’ve made the datasets “AIR”—Accessible, Interoperable and Reusable. Making these datasets truly “Findable” will take more time as we develop routine landing pages and JSON-LD-based metadata for these images.

In the meantime, we thought it might be useful for the community to see how we have achieved this work. With the latest releases of OMERO (5.4 and beyond), we have made it fairly easy for images to be managed and published online. Documentation describing exactly what we did is available4.

We hope this work is an important contribution to the movement for making data available online. We believe we’ve made reasonable progress in making data AIR and look forward to fully achieving the goals of the FAIR principles.