Research Data Management

A comprehensive guide to the best practices for planning, collecting, working with, sharing and reusing research data

Data Collection


Data collection is a systematic process of gathering and measuring information on variables of interest. Collecting quality data that is valid and relevant to your research is the foundation of any successful research project.

Planning for Data Collection


Researchers are encouraged to identify their data needs before starting the collection process. Whatever the type of research, collecting only data that addresses your research questions keeps you from handling large volumes of irrelevant data and makes your research more efficient. When planning your data collection, you should consider:

  • Aim of your research - What data will be needed to answer your research questions?
  • Data sources - What types of data already exist? Does any new data need to be generated?
  • Methods and procedures to collect data - Are the measures used to collect data valid and reliable?
  • Costs of collecting data - Are there enough resources to collect the data? Will data analysis incur additional costs?
  • Any ethical issues that may arise

Finding Existing Data


A vast amount of data is available online, but it is not easy to find what you need among thousands of datasets. Here are some strategies you can use to find existing data for your research:

There are some useful dataset search engines that allow you to easily search data across thousands of data repositories online.

Google Dataset Search

Google Dataset Search is a simple keyword search engine which allows users to discover datasets from multidisciplinary fields such as life sciences, social sciences, machine learning, civic and government data, and more.


DataCite Search

DataCite is a multi-disciplinary dataset search engine which gathers metadata for each DOI assigned to an object. The metadata is used for a large index of research data that can be queried directly to find data, obtain stats and explore connections. All the metadata is free to access and review.
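If you prefer to query the DataCite index from a script rather than the search page, the sketch below shows one possible way to do it in Python. It is an illustration only, not an official client: it assumes the public DataCite REST endpoint at https://api.datacite.org/dois, the `query`, `resource-type-id` and `page[size]` parameters, and a response that lists records under `data` with `doi` and `titles` attributes. The function names are hypothetical, so please check the current DataCite API documentation before relying on any of these details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public DataCite REST API endpoint for DOI metadata.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_dataset_query_url(keywords, page_size=5):
    """Build a search URL restricted to records typed as datasets."""
    params = {
        "query": keywords,
        "resource-type-id": "dataset",  # assumed filter name
        "page[size]": page_size,        # assumed paging parameter
    }
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


def summarize_records(response_json):
    """Return (doi, first title) pairs from a parsed response.

    Assumes records sit under "data", each with an "attributes" object.
    Records without a title yield None in the title position.
    """
    summaries = []
    for record in response_json.get("data", []):
        attributes = record.get("attributes", {})
        titles = attributes.get("titles") or [{}]
        summaries.append((attributes.get("doi"), titles[0].get("title")))
    return summaries


def search_datasets(keywords, page_size=5):
    """Fetch matching dataset records (requires network access)."""
    with urlopen(build_dataset_query_url(keywords, page_size)) as response:
        return summarize_records(json.load(response))
```

Calling `search_datasets("soil moisture")` would return a short list of DOI and title pairs that you can then look up in the usual way. The URL building and the response parsing are kept in separate functions so that each can be checked without a network connection.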


gesisDataSearch

gesisDataSearch is a discovery service operated by the GESIS Leibniz Institute based on up-to-date metadata that are harvested from social science research data collections worldwide.


Bielefeld Academic Search Engine (BASE)

Operated by Bielefeld University Library, BASE is one of the world's most voluminous multi-disciplinary academic search engines. It indexes more than 240 million documents from more than 8,000 content providers, including journal articles, preprints, digital collections, images, videos and research data. You can access the full texts of about 60% of the indexed documents for free (Open Access).

To search only for datasets in BASE, limit your search to "Dataset" under "Document Type" in advanced search.

Alternatively, you can also obtain data directly from various data repositories. Here are some of the most well-known multi-disciplinary data repositories:

Dryad

The Dryad Digital Repository is a curated resource for a wide diversity of data types from any discipline that makes research data discoverable, freely reusable, and citable.


Figshare

Figshare is a multi-disciplinary repository where users can make all of their research outputs, from posters and presentations to datasets and code, available in a citable, shareable and discoverable manner.


Harvard Dataverse

The Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data.


Zenodo

Zenodo is an open dissemination research data repository which enables researchers to easily share the long tail of small research results in a wide variety of formats including text, spreadsheets, datasets, software, and images across all fields of science.

Remember to limit your results to "Dataset" under "Type".


Looking for disciplinary data repositories?

You can use re3data, a global registry of research data repositories across academic disciplines, to browse data repositories by subject or to search for a data repository that matches your research needs.

Don't forget that the Library has a comprehensive statistics collection which can be a great source of data for your research!