Design and technical challenges of building a data management system in Django for the Norwegian Institute for Nature Research, following the FAIR principles. The solution relies on GDAL and on cloud-native file formats and technologies.
We present the journey of designing and developing the Data Management System (DMS) for the Norwegian Institute for Nature Research (NINA): the challenges we encountered, the existing solutions we evaluated, the decision to develop a new system, the enabling technologies we chose to build upon, and why we released it as open-source software.
https://github.com/NINAnor/dms: NINA Data Management System
The system links information about datasets with existing data sources (scientific data, administration ERP for projects, users, web services), and lets users share datasets via pycsw and pygeoapi, both within the institution and in national data catalogs.
The system is designed to be format-agnostic, letting users keep their own storage backends and protocols, but it provides additional functionality when cloud-native formats such as Apache Parquet and COG (Cloud Optimized GeoTIFF) are used. GDAL is used to query both spatial and non-spatial files. The NINA DMS supports several metadata schemas (ISO 19115, ISO 19139, DataCite) and harvesting from different sources (such as IPT).
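As an illustration of how GDAL can read files directly from remote storage, here is a minimal Python sketch, not the DMS implementation. It maps a storage URL to a GDAL virtual file system path (/vsicurl/ for HTTP(S), /vsis3/ and /vsigs/ for object stores) and opens the result with GDAL's Python bindings. The function names to_vsi_path and summarize, and the choice of supported URL schemes, are assumptions made for this example.

```python
from urllib.parse import urlparse

# URL scheme -> GDAL virtual file system prefix (illustrative subset).
_VSI_PREFIXES = {
    "s3": "/vsis3/",
    "gs": "/vsigs/",
}


def to_vsi_path(url: str) -> str:
    """Translate a storage URL into a path GDAL can open directly."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # /vsicurl/ takes the full URL and reads it with HTTP range requests.
        return "/vsicurl/" + url
    if parsed.scheme in _VSI_PREFIXES:
        # Object stores are addressed as <prefix><bucket>/<key>.
        return _VSI_PREFIXES[parsed.scheme] + parsed.netloc + parsed.path
    # Anything else (local paths, existing /vsi... paths) is passed through.
    return url


def summarize(url: str) -> dict:
    """Open a dataset with GDAL and report basic raster/vector information."""
    # Imported lazily so that to_vsi_path can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    # Without OF_RASTER/OF_VECTOR flags, OpenEx probes both kinds of drivers.
    ds = gdal.OpenEx(to_vsi_path(url), gdal.OF_READONLY)
    return {
        "driver": ds.GetDriver().ShortName,
        "raster_bands": ds.RasterCount,
        "vector_layers": ds.GetLayerCount(),
    }
```

With GDAL installed, summarize could be called on the URL of a COG or a Parquet file. Because cloud-native formats are laid out for partial reads, GDAL only needs to fetch the byte ranges that hold the headers rather than downloading the whole file.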
Integration and synchronization with other services are managed through a set of data pipelines (https://github.com/NINAnor/miljodata-datasync: A set of pipelines to move data from different sources).