Documentation
1 Introduction
“If we are not conscientious documenters, we can easily end up… without the ability to coherently describe our research process up to that point” (Stoudt, Vásquez, and Martinez (2021)).
Quality documentation is critical for ensuring that your work is understandable, reusable, and interpretable over time by external users, your colleagues, and your future self. It reduces errors, facilitates smooth project team transitions, and helps avoid confusion and duplication of efforts.
Without documentation, projects lose their usefulness over time, as illustrated below (see also Michener et al. 1997).

2 Types of documentation
This summary can help guide team conversations around documentation strategies.
2.1 Preliminary
Preliminary documentation refers to early-stage descriptions created during the planning phase. This often includes a data management plan (DMP), which outlines data collection, organization, storage, and sharing strategies.
2.2 Process
Process documentation involves capturing step-by-step procedures and workflows used to collect, process, analyze, and model data, including:
- Documenting raw data sources
- Documenting methods with code comments
- Documenting methods with version control tools
2.3 Product
Product documentation provides information to accompany code and data of completed projects, ensuring usability and transparency. This documentation can be a part of or accompanying written products.
3 Core documentation
project_name/
├── README.txt <<<<<<<
├── data/
│ ├── raw/
│ │ ├── README_raw_dataset.txt <<<<<<<
│ ├── intermediate/
│ ├── clean/
│ │ ├── metadata.txt <<<<<<<
3.1 The project-level README file
At a minimum, the project should have a README file in text or markdown format with the information listed below. This can be an evolving, living document. While it’s technically product documentation, it’s easiest to start it early in the project’s development.
Include the following in the README file:
- Project name and description
- Project PI(s) and contact information
- List of staff responsible for data management and code development
- Associated final product(s), date of release, and DOI (if applicable)
- Link to published data/code (if applicable)
- License(s) associated with final product(s)
- Nature of sensitive or proprietary data (if applicable)
- Any other important notes for navigating folder or using data/code
3.2 Raw data
Best practices:
- The source of all downloaded raw datasets should be documented in a README file.
- Create a README file associated with each raw data file, or each logical “cluster” of related raw data files, in the same folder as the data.
- If there are multiple data files in a folder, name the README so that it is easily associated with the data file(s) it describes (e.g.,
README_PRISM_Daily_Temperature.txt). - Format README files consistently.
- Write the README document in an plain text and open source file format, such as
.txtor.md.
Below is a README template and example. Include in the README file the information shown in the example.
Filename: README_PRISM_Daily_Temperature.txt
Dataset name & format: PRISM_Daily_Temperature_2024.csv
Data source: Downloaded from the PRISM Climate Group website: https://prism.oregonstate.edu/
Date acquired: 2024-02-28.
Acquired by: [Name of researcher who downloaded the data]
Data description: This dataset contains daily minimum and maximum temperature data for Washington State for the year 2024, with a spatial resolution of 4km.
Preprocessing: No modifications were made to the raw dataset. The file is stored exactly as downloaded from the source.
License & Usage Restrictions: This dataset is publicly available under the PRISM Climate Group's data use policy. Refer to https://prism.oregonstate.edu/documents/PRISM_terms_of_use.pdf for more details.
As described in the Organization section, all raw data should be retained in their raw form and not directly modified.
3.3 Methods
There are two critical strategies for documenting methods effectively:
Use version control to track and explain changes to scripts over time (e.g., through clear and consistent commit messages).
Use code comments to document the purpose and logic of scripts and key functions, helping others (and your future self) understand how the code works.
3.4 Data products
Metadata documentation should be included with final data products, and is described in the Publication section.