Publication

Publication is making code/data available to the broader community, often through formal dissemination channels such as data repositories, journal articles, or public databases. Publication ensures that data is discoverable and can be accessed by other researchers, stakeholders, or the public. Documentation and metadata are included to facilitate understanding and reuse. Publication may also involve adherence to specific standards and best practices to enhance the visibility and impact of the data.

While RFF does not mandate the publication of code and data, it is highly encouraged.

Increasingly, journals require code and data to be submitted along with the article. In addition, many funders and stakeholders value open source software and data availability. Planning for this in the early stages of a project facilitates reproducibility, access, and the ability of others to use and cite your work.

1 Licensing

A well-chosen license clarifies permissions, prevents misunderstandings, and encourages responsible use. The next section provides guidance on selecting and attaching appropriate licenses to ensure your data and code remain accessible and properly credited.

1.1 Rights and ownership at RFF

All agreements that involve the creation of RFF research or a similar work product must have terms that define intellectual property (IP) rights.

Before proceeding with choosing licenses, confirm that your team owns IP rights to the work produced and has full discretion over licensing the project’s data and code without restrictions from funders, institutional policies, or legal/data agreements.

In some cases the research team may have joint IP ownership with partners or funders. In this case, licensing options must be agreed upon by both parties.

In other cases, the research team may have licensing rights to software/code developed, but not data products. Data license restrictions generally do not restrict code licensing / publication.

Note

Data and code products are licensed separately from RFF publication products. All work on the RFF.org and Resources.org websites (working papers, reports, issue briefs, explainers, Common Resources blog posts, Resources magazine articles, Resources Radio podcast episodes, graphs, charts, photographs, audio, and video) are listed under the Deed - Attribution-NonCommercial-NoDerivatives 4.0 International - Creative Commons license. This Creative Commons license is not suitable for either software or data.

1.2 Choose appropriate licenses

For software and data, there are three license suites common in the academic space: MIT, GNU General Public License (GPL), and Open Data Commons (ODC). MIT and GNU GPL are separate software licenses, while ODC has two commonly-used data licenses. All four are described below.

Software licenses

Attribute MIT GNU General Public Use (GPL)
Link to license text MIT GNU GPL
Description The more permissive and flexible. Users, including commercial entities, can view, use, modify, and distribute the work freely. Users, including commercial entities, can view, use, modify, and distribute the work freely, but are subject to copyleft restrictions.
Attribution required ✅ Yes ✅ Yes
Enclosure allowed ✅ Yes ❌ No
Copyleft required (viral) ❌ No ✅ Yes
Best uses Libraries and tools where broad reuse is encouraged, including in proprietary or closed-source contexts. Models, scripts, and workflows where preserving long-term openness and share-alike reuse is a priority.

Terms:

  • Enclosure: The act of applying legal or technical restrictions—like proprietary licensing or copyright—to limit access, reuse, or modification of software or data that was previously open.

  • Attribution: The requirement to credit the original creator or source as specified in the license. It ensures recognition while allowing reuse and modification.

  • Copyleft: A license condition that requires derivative works to be distributed under the same terms as the original. It ensures continued openness by mandating that source code remains available and free to reuse.

  • Viral: Describes licenses (like GPL or ODbL) that require any derivatives to adopt the same license. This spreads the original license’s terms to all adaptations, preserving openness but limiting reuse.

Data licenses

Attribute ODC-By ODCbL
Link to license text ODC-By ODCbL
Description A permissive license for databases. Users, including commercial entities, may use, modify, redistribute, and build upon the data. A copyleft license for databases. Users may use, modify, redistribute, but derivative databases must be shared under the same license. Ensures openness through a share-alike (viral) requirement.
Attribution required ✅ Yes ✅ Yes
Enclosure allowed ✅ Yes ❌ No
Share-alike required (viral) ❌ No ✅ Yes
Best uses When the goal is to encourage broad academic and public reuse of a dataset, including by commercial entities, without requiring openness in derived products Collaborative academic projects where ensuring that all derivative datasets remain openly accessible under the same terms is critical.
  • Share-alike: A condition applied mainly to data and content (e.g., under CC BY-SA or ODbL), requiring that any adapted works use the same or a compatible open license (analogous to copyleft).

Other licenses

If there are other requirements or the team would like to review a broader range of software licenses and their specifications, visit ChooseALicense.

1.3 Create and customize license files

  • Once the license has been selected, download or create a .txt license file from the license website. Save the file as LICENSE.txt in the project folder and/or repository.

  • Review the license terms and modify where necessary (most open-source and open-data licenses are designed to be used as-is, but others may require you to fill in specific details, such as name or organization).

  • If adding additional terms, include them in a separate README or license appendix to avoid conflicting with the main license.

2 Publishing

For publishing both code and data, ensure that the project-level README is up to date.

2.1 Code

When your project is ready to publish code stored in a GitHub repository (whether alongside a journal article, report, or other research output), it can be tagged to associate it with a publication and linked to the RFF GitHub organization.

Note that some journals require authors to:

  • Make the codebase publicly accessible
  • Include a link to the GitHub code repository in the manuscript
  • Reference the codebase DOI in the manuscript

Options to share repositories via the RFF GitHub Organization

You can link your GitHub repository to the RFF GitHub organization in one of two ways: by forking it into the organization or by transferring ownership.

Option 1: Fork into the RFF GitHub Organization

Forking creates a new copy under the RFF GitHub organization’s namespace. It can be kept in sync with the original or left unchanged.

This is the most flexible option and is well-suited when:

  • You want to retain full ownership and admin control of the original repository.
  • The original repo belongs to an external collaborator or other GitHub organization.
  • You want to create a static snapshot of a repo to archive a version associated with a publication (e.g., v1.0-paper-release).
  • You intend to continue developing the code for purposes unrelated to RFF.

To connect it to a research output:

  • Tag a release (e.g., v1.0).
  • Add a note in the README describing its relationship to the publication and original source (preferably, including a DOI).

While the fork is housed in the organization, you retain full control of the upstream repo.

Option 2: Transfer Ownership to the RFF Organization

This option is best for long-term institutional or collaborative projects where:

  • The repository should be owned and managed by the organization rather than an individual.
  • Continuity is important (e.g., when researchers move institutions).
  • You’re finished working on the project and no longer want to manage the repository.

Key characteristics:

  • The original repo is moved to the GitHub organization, including its issues, branches, and history.
  • Collaborators retain their roles. Former repository owners can request to be made co-administrators (along with the organization), granting control over collaborator roles.
  • The URL changes but GitHub automatically redirects old links.

2.2 Data

Data-level documentation (metadata)

It is recommended to attach metadata files to published datasets. Metadata (“data about data”) documents the “who, what, when, where, how, and why” of a data resource. Metadata not only allows users (your future self included) to understand and use datasets, but also facilitates search and retrieval of the data when deposited in a data repository.

Below are the key components of metadata. They can be stored in a simple text or markdown file.

  • Title: Descriptive name of the dataset.
  • DOI number: Associated with the final publication, dataset, or both
  • Abstract: Summary of the dataset’s content
  • Keywords: Relevant terms for search and discovery
  • Temporal Extent: Time period covered by the data
  • Data Format: File format
  • Data Source(s): Origin of the data
  • Accuracy and Precision: Information about data quality
  • Access Constraints: Restrictions on data use
  • Attribute / field definitions: Define all abbreviations and coded entries
  • Additional geospatial metadata components, if applicable
  • Spatial Extent: Geographic coverage (bounding coordinates)
  • Projection Information: Coordinate system details

In some contexts, generating machine-readable metadata that adheres to certain disciplinary standards is useful. There are various metadata formats and standards for specific disciplines. Additional guidance and resources for generating machine-readable metadata are here: Metadata and describing data – Cornell Data Services.

Uploading data to Zenodo

Zenodo is an online repository for sharing research data, software, and other scientific outputs. It has a broad disciplinary focus and is safe, citeable (every upload is assigned a DOI), compatible with GitHub, and free for up to 50GB of storage.

Step 1: Prepare the research data

Before uploading, ensure that:

  • The data is well-organized (e.g., structured folders, clear file names).
  • Metadata files are prepared for each data file or sets of data files.
  • A license is attached.
  • Any sensitive or restricted data is removed or anonymized (if applicable).
  • The project-level README is up to date.

For guidance on choosing which files to publish and how to handle API-accessed data, see Finalize data organization.

Step 2: Create a Zenodo account & access the upload dashboard
  • Go to Zenodo and sign in (or create an account). Note that you can create an account using your GitHub profile.
  • Click the “New Upload” button on the Zenodo dashboard.
Step 3: Upload the data files, fill in metadata, set access
  • Upload data, metadata, and the project-level README file.
  • Enter metadata information in applicable fields (contributors, associated journal article or conference presentation, etc.)
  • Include a link to the GitHub code repository.
  • Choose an Access Level
    • Open Access: Publicly available for anyone.
    • Embargoed: Set a release date if the data must remain private for a certain period.
    • Restricted Access: Requires users to request access.
Step 4: Publish & Get a DOI
  • Review all details and make any necessary edits.
  • Click “Publish” to finalize the upload.
  • Zenodo will generate a DOI — use this when citing the dataset in publications.
Versioning & Updates

If the dataset needs to be updated:

  • Use the “New Version” option in Zenodo instead of creating a separate upload.
  • Zenodo will link versions together and maintain persistent DOIs.