Research Data Management and Data Publication with Chemotion Repository

Pei-Chi Huang1, Chia-Lin Lin1, Helena Šimek Tosino1, John Jolliffe3, Nicole Jung1, 2

1. Institute of Biological and Chemical Systems, Karlsruhe Institute of Technology
2. Institute of Organic Chemistry, Karlsruhe Institute of Technology
3. Johannes Gutenberg University Mainz

The Chemotion Repository is a specialized platform established for chemistry and related disciplines, designed to preserve molecular synthesis and characterization data. It offers a comprehensive suite of tools that support data collection, preparation, and reuse through discipline-specific methods and processing tools. Automated procedures for selecting analytical data streamline the organization process, while the platform's data publishing and citation features include automated Digital Object Identifier (DOI) generation, peer review workflows, customizable embargo settings, and registration with databases such as DataCite and PubChem. The Chemotion Repository provides a robust data management solution, preserving domain-specific information in a machine-readable format to address the challenges of contemporary research data management.

The repository supports hybrid authentication: scientists can log in with system-specific credentials or through Federated Single Sign-On (SSO), using either a trusted external identity provider such as ORCID or the credentials of their home institution, provided it participates in the DFN-AAI Federation. Research data can be provided either by direct upload or by transfer from Electronic Laboratory Notebook (ELN) instances, which removes the need to describe the data a second time.

The Chemotion Repository continually enhances its functionality and introduces new technologies to benefit the research community:

  1. Data Exposure and Harvesting: The repository supports data exposure and harvesting via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and offers a set of APIs. These APIs allow scientists to retrieve public data in standardized and verified schemas, available in formats such as XML and JSON-LD. The ontology terminology coverage for metadata in JSON-LD format is continuously expanding, enhancing data interoperability across platforms.

  2. Data Viewer Functions: Tools such as ChemSpectra, NMRium, and Structure Viewer are integrated to visualize and analyze various data types without requiring additional software.

  3. Automatic Metadata Extraction: The repository features automatic metadata extraction from raw files, with seamless mapping to datasets through the use of ChemConverter and Chemotion LabIMotion, improving the overall quality, completeness, and accuracy of the datasets.

  4. Data Transfer Enhancements: Significant improvements have been made to accommodate large volumes of reactions, samples, and other data elements, ensuring a more reliable data transfer process.

  5. Review and Publishing Processes: The review and publishing processes have been extended, including the ability to add further reviewers and to comment on embargoed collections.

  6. Connecting Data: Sample information in the Chemotion Repository is now linked to the physical presence of compounds in the Molecule Archive of the Compound Platform. This ensures that a physically available sample is not only made FAIR through metadata but also becomes accessible and reusable as a material for others. In addition, vendor compounds are published in the Chemotion Repository (StartingMaterial4Chem), which benefits scientists and supports sustainability: resources are saved because fewer NMR spectra have to be measured.
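
The harvesting route described in item 1 can be illustrated with a minimal Python sketch of an OAI-PMH client. It uses only the standard protocol elements (the `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` used for paging). The endpoint URL, the sample response, and the function names are assumptions made for illustration; they are not taken from the Chemotion Repository documentation, and the actual endpoint and supported metadata prefixes should be checked there.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint used as a placeholder; not the repository's documented URL.
BASE_URL = "https://repository.example.org/oai"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, next_token) extracted from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    token = (root.findtext(".//oai:resumptionToken", namespaces=NS) or "").strip()
    # An absent or empty token means the list is complete.
    return records, (token or None)


# Invented response for illustration only; real records will differ.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

records, next_token = parse_list_records(SAMPLE_RESPONSE)
first_url = list_records_url(BASE_URL)
next_url = list_records_url(BASE_URL, resumption_token=next_token)
```

In a real harvest, each URL would be fetched over HTTP and the loop would continue until no resumption token is returned. The network step is left out here so that the sketch runs offline.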

The Chemotion Repository directly addresses the need for accessible, detailed information and original data associated with scientific results. With the growing number of application templates in Chemotion LabIMotion, such as Polymer, (Sur)MOF, cyclic voltammetry (CV), and X-ray diffraction (XRD), the repository now offers more comprehensive support for workflows in related disciplines. As the long-term deposition of data in repositories becomes standard practice alongside the publication of original research, it will, coupled with the repository's review and publishing processes, significantly enhance the transparency and quality of scientific work.