
Tips for early career life science researchers in choosing and using data repositories

Created by Kojo Ahiakpa | Dec. 02, 2024 | Resources, Career tips, Researcher Experience

Data repository overview

A key challenge for early-career life scientists in today's data-driven research environment is managing and sharing research data efficiently. Choosing the right data repository goes a long way towards making research data accessible, reusable, and useful to its many stakeholders. This blog post is intended to help early-career researchers navigate the many options for selecting and using appropriate data repositories.

Data repositories are essential to contemporary science and provide much more than a place to store information. They are online storage infrastructures through which research data, code, and other outputs can be safely stored and shared (Fig. 1). By assigning persistent identifiers such as DOIs for citation purposes, these platforms help keep research data findable and available to the wider scientific community. Following the Findable, Accessible, Interoperable, and Reusable (FAIR) principles is becoming more important for academics who need to meet the standards set by journals and funding agencies. Repositories also make it easier to validate and reproduce studies, which is fundamental to scientific progress.

A cylindrical outline with floating text reading 'repositories', 'data sets', 'funding agencies', 'data producers', and 'publishers'. Each text has an arrow pointing out of the cylinder to a cube outline which contains floating text of 'metadata management - mapping - indexing', 'metadata ingestion', 'terminology server - query expansion - result ranking' and 'datamed user interface search engine'. On the side of the cube outline is the word bioCADDIE.
Fig. 1. Data repository architecture. Source: Nature publishing

 

Choosing and utilising data repositories

There are three primary kinds of repositories that researchers can select from (Fig. 2): domain-specific, institutional, and general-purpose. Domain-specific repositories, such as the Protein Data Bank (PDB) archive for protein structures and GenBank for nucleotide sequences, offer specialised metadata standards and tools for storing specific types of data.

Institutional repositories provide spaces for gathering, cataloguing, and retaining an institution's scholarly works. They act as online libraries, carrying out core information management tasks: building collections, organising, cataloguing, curating, preserving, and making digital content accessible. Their value goes beyond storage. They allow researchers to self-archive their works, which increases the impact and visibility of institutional research and supports information management, research evaluation, and the wider open access movement.

Clifford Lynch's 2003 definition remains a standard reference point for understanding institutional repositories. He described them as infrastructure that higher education and research institutions provide to their local communities to house and share digital resources made by faculty and students. Central to this definition is the institution's commitment to digital stewardship, including long-term preservation and accessible distribution. The content of a repository reflects the goals of its institution and, as the MIT Institutional Repository shows, universities commonly host collections that span a wide range of academic fields.

Discipline repositories, in contrast, tend to operate independently of any particular institution and focus on specific areas of study. Two examples are PsyDok, a German-language psychology repository, and the Social Science Open Access Repository (SSOAR). These archives hold both digitised historical documents and born-digital content, so their collections cover both older and current research findings. More broadly, advances in digital content management have changed how organisations store and disseminate information, making institutional repositories a core part of contemporary research and academic institutions.

General-purpose repositories such as Zenodo, Figshare, Dryad, Harvard Dataverse, Open Science Framework (OSF), Mendeley Data, and Science Data Bank accept a variety of data types from different fields. Many offer free basic services, and some have paid premium options.

An image showing a blue shape that reads 'research data repository' in the middle of 7 different coloured circles. Clockwise starting from the top, a light blue circle reads 'General information', an orange circle reads 'Services', a purple circle reads 'Policy', a green circle reads 'Legal Aspects', a red circle reads 'Technical Standards', a dark blue circle reads 'Metadata Standards', and a grey circle reads 'Quality Standards'
Fig. 2. How to choose a data repository. Source: Zenodo and National Institute of Health

 

Criteria for selecting a data repository

There are several important factors to consider when choosing a repository. Start with these three questions:

Checklist 1: Does your data contain personally identifiable information? If so, use a repository that supports restricted access, such as Figshare, Zenodo, or OSF.

Checklist 2: Is there a repository for your specific field of study? If so, deposit your data in that discipline-specific repository.

Checklist 3: Does your university or institution have a repository? If so, consider depositing there, often through your library. Otherwise, use a generalist repository.
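The three checklist questions above can be sketched as a small decision function. This is only an illustration: the function name, the category labels, and the assumption that the questions are applied strictly in the order listed are ours, not a formal rule from any repository or funder.

```python
def suggest_repository_type(has_pii: bool,
                            domain_repo_available: bool,
                            institutional_repo_available: bool) -> str:
    """Return a repository category by asking the checklist questions in order.

    The ordering (PII first, then domain, then institution) is an assumption
    taken from the order of the checklist, not an official policy.
    """
    if has_pii:
        # Checklist 1: sensitive data needs a repository with restricted access.
        return "restricted-access"
    if domain_repo_available:
        # Checklist 2: prefer a repository built for your field.
        return "domain-specific"
    if institutional_repo_available:
        # Checklist 3: fall back to your own institution's repository.
        return "institutional"
    # None of the above applies: use a generalist repository.
    return "generalist"
```

For example, a sequencing dataset with no personal data and an available field repository would give `suggest_repository_type(False, True, True)`, which returns `"domain-specific"`.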

Before submitting any data, early career researchers should verify that the chosen repository supports their data formats and should check its limits on file size and file type. The credibility and stability of the repository also matter. Look for well-established repositories with solid institutional support and recognised certification, such as CoreTrustSeal. The accessibility options, such as licensing choices and public access settings, should match your sharing requirements. Costs, including free storage limits and fees for larger datasets, should be factored into research budgets early on.
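The pre-submission check on size and file type can be scripted before you upload anything. The sketch below is a minimal example under stated assumptions: the function name is hypothetical, and the size limit and allowed file extensions are placeholders that you would replace with the values published by your chosen repository.

```python
from pathlib import Path


def check_against_limits(data_dir: str,
                         max_total_bytes: int,
                         allowed_suffixes: set[str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    max_total_bytes and allowed_suffixes are placeholders: look up the real
    limits in the documentation of the repository you plan to use.
    """
    problems = []
    total = 0
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        total += path.stat().st_size
        # Compare extensions case-insensitively, e.g. ".CSV" matches ".csv".
        if path.suffix.lower() not in allowed_suffixes:
            problems.append(f"unsupported file type: {path.name}")
    if total > max_total_bytes:
        problems.append(
            f"total size {total} bytes exceeds limit of {max_total_bytes} bytes"
        )
    return problems
```

A call such as `check_against_limits("my_dataset", 50 * 10**9, {".csv", ".fasta", ".txt"})` would list any files with other extensions and flag the dataset if it exceeds the example limit of 50 GB.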

 

Tips for selecting data repositories for early career life scientists

Several specialised repositories have become standard choices in the biological sciences. Genomics researchers commonly use GenBank, the Gene Expression Omnibus (GEO), and the European Nucleotide Archive (ENA). The Image Data Resource and the BioImage Archive accept imaging data, while ProteomeXchange and the PRIDE Archive accept proteomics data.

Using data repositories well takes careful preparation and attention to detail. Start by making sure your data is clean and organised, creating detailed metadata, and choosing appropriate file formats (Fig. 3). When submitting, follow the repository's submission guidelines, complete all metadata fields, and select an appropriate license. After submission, confirm that the data is accessible, record the DOI, and check that the dataset is correctly linked to the relevant papers.

Planning for data sharing at the outset of your research will help you avoid common mistakes. Document your data thoroughly, use version control, and budget for any repository costs. Keep up to date with data sharing standards in your field, changing funder requirements, and new repository options.
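One way to start on the documentation and metadata described above is to generate a simple manifest of your dataset before submission. The sketch below records each file's path, size, and SHA-256 checksum alongside a title and license. The function names and the field names (`title`, `license`, `files`) are illustrative assumptions, not the metadata schema of any particular repository, so map them to the fields your repository actually asks for.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: str, title: str, license_id: str) -> dict:
    """Describe every file under data_dir (hypothetical field names)."""
    root = Path(data_dir)
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "size_bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {"title": title, "license": license_id, "files": files}


def manifest_to_json(manifest: dict) -> str:
    """Serialise the manifest as indented JSON, ready to save next to the data."""
    return json.dumps(manifest, indent=2)
```

Keeping the checksums means that you, or anyone who downloads the data later, can confirm that the deposited files match the ones you prepared.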

Image showing an oval shape in the middle of the image, with 11 small circles arranged equidistant around the oval. The text inside the oval reads 'Generalist Repository Ecosystem Initiative (GREI)' and 'Objectives'. Clockwise from the top: orange circle 'Implement Best Practices for Data Repositories', green circle 'Support Discovery of NIH-Funded Data', light green circle 'Adopt Consistent Metadata Models', blue circle 'Facilitate QA/QC', yellow circle 'Connect Digital Objects', orange circle 'Catalog Use Cases Supported', green circle 'Implement Open Metrics', light green circle 'Prepare Training Materials', blue circle 'Commit to "Coopetition"'
Fig. 3. How to choose a data repository. Source: Zenodo and National Institute of Health

 

New repositories, standards, and regulations emerge regularly, changing the research data management landscape. To stay up to date, use resources such as FAIRsharing.org and re3data.org, and keep in touch with the data services staff at your institutional library. Choosing and using data repositories with care will increase the visibility and impact of your research while also ensuring compliance with standards. By making your data discoverable, accessible, and reusable, you help advance science and lay the groundwork for new discoveries in the life sciences.

 

Kojo Ahiakpa, Ph.D. is an agribusiness consultant with Research Desk Consulting Limited in Accra, Ghana. He has published more than 60 research articles, book chapters and conference proceedings in reputable journals and serves as a reviewer for several journals. His research interests include crop genomics, vegetable science, agribusiness and project management. He is an author, independent researcher and startup founder. 

 

Thumbnail image: the image was created using DALL-E (3) on 02/12/24, by Maisie Northing, INASP
