Readability
“Good coding style is like correct punctuation: you can manage without it,
butitsuremakesthingseasiertoread.”
— Tidyverse Style Guide
Readable code reduces the time collaborators and future developers spend deciphering complex code and ensures continuity even if the original author is unavailable. To accomplish this, we recommend:
- modularizing code,
- using consistent code style, and
- providing clear in-code documentation.
1 Modularizing code
Modularizing code is the practice of organizing an entire research project into small, self-contained, and logically structured components, each with a clear and focused purpose. This applies at multiple levels: separating the overall codebase into well-defined scripts (e.g., raw data acquisition and import, cleaning, analysis, visualization), breaking scripts into coherent code blocks, and further decomposing repeated or complex operations into functions with clearly defined inputs and outputs. Together, these layers of modularization make the structure of the project transparent and easier for others to understand, review, and reuse.
Each script, code block, or function should be accompanied by descriptive in-line comments or documentation explaining its role, assumptions, and expected behavior. Modularization also improves debugging and maintenance: when something goes wrong, issues can be isolated to a specific component rather than requiring developers to trace through the entire codebase. Over time, this structure supports collaborative development, facilitates testing, and allows individual pieces of the workflow to evolve without disrupting the rest of the project.
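As a minimal sketch of function-level modularization, the Python example below splits a small workflow into import, cleaning, and analysis steps, each with explicit inputs and outputs. The function names, the column names (region, price), and the in-memory sample data are hypothetical, chosen only for illustration.

```python
"""Sketch of a modular workflow. All names and data here are hypothetical."""
import csv
import io
from statistics import mean


def load_prices(csv_file):
    """Import step: read rows from an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def clean_prices(rows):
    """Cleaning step: drop rows with a blank price and convert price to float."""
    return [
        {"region": row["region"], "price": float(row["price"])}
        for row in rows
        if row["price"].strip() != ""
    ]


def calculate_average_price(rows):
    """Analysis step: return the mean price for each region as a dict."""
    prices_by_region = {}
    for row in rows:
        prices_by_region.setdefault(row["region"], []).append(row["price"])
    return {region: mean(prices) for region, prices in prices_by_region.items()}


def run_pipeline(csv_file):
    """Chain the steps so each one can be tested or replaced on its own."""
    return calculate_average_price(clean_prices(load_prices(csv_file)))


# Usage: an in-memory CSV stands in for a raw data file.
example_csv = io.StringIO("region,price\nnorth,10\nnorth,20\nsouth,\nsouth,30\n")
average_prices = run_pipeline(example_csv)
print(average_prices)
```

Because each step is a separate function, an unexpected result can be traced by calling one step at a time (for example, clean_prices on a handful of rows) rather than rerunning the whole workflow.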
2 Consistent code style
Code style refers to the conventions that govern how code is written and formatted. This includes:
- formatting choices (e.g., indentation, spacing, line length),
- naming conventions (for variables, functions, datasets, etc.), and
- documentation and commenting practices.
While personal preferences vary, the key to readability is consistency.
Below are established style guides relevant to programming languages commonly used at RFF:
- R: Tidyverse Style Guide by Hadley Wickham
- Python: Google Python Style Guide
- Stata: Suggestions on Stata programming style by Nicholas J. Cox
- Other languages: Google style guides for other languages
In particular, we recommend using consistent, distinctive, and meaningful names for variables, functions, datasets, and files:
- Variable or object names should be descriptive of their content, avoiding overly vague or generic terms (e.g., discount_rate instead of val).
- Function names should describe their action or output (e.g., calculate_average_price()).
- Datasets and files should follow a systematic naming pattern that includes relevant identifiers (e.g., county_population_2022.csv instead of data_final.csv). See also the Naming folders, files, and scripts subsection under Data Management.
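The naming recommendations above can be illustrated with a short Python sketch that contrasts vague and descriptive names for the same calculation. The present-value formula, the helper build_population_filename, and all values are hypothetical and serve only to show the naming pattern.

```python
# Hypothetical example: the functions and values are illustrative only.

# Vague names: the reader has to guess what x, val, and n represent.
def calc(x, val, n):
    return x / (1 + val) ** n


# Descriptive names: the same calculation, readable without extra context.
def calculate_present_value(future_value, discount_rate, years):
    """Return the present value of a single payment received in `years` years."""
    return future_value / (1 + discount_rate) ** years


discount_rate = 0.03
present_value = calculate_present_value(
    future_value=1000.0, discount_rate=discount_rate, years=10
)


def build_population_filename(geography, year):
    """Build a file name following a <geography>_population_<year>.csv pattern."""
    return f"{geography}_population_{year}.csv"


population_file = build_population_filename("county", 2022)
print(population_file)
```

Generating file names from a single helper is one way to keep a naming pattern consistent across a project; a documented convention that is applied by hand works as well.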
3 In-code documentation
In-code documentation refers to comments written directly within the source code to explain its purpose, functionality, and usage. We recommend incorporating three types of documentation:
- Script headers: Include a header at the top of each script outlining key metadata such as the script’s objective, author, and start date.
- Block-level comments: Use comments to describe the intent and logic of each major code block.
- Inline comments: Add comments on individual lines of code, especially when the functionality is not obvious or when there are potential limitations.
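The three types of documentation might look like the following in a short Python script. This is a sketch only: the script name, the header fields shown as placeholders, and the population figures are hypothetical, and the exact header layout is a matter of team convention.

```python
# ------------------------------------------------------------------
# Script:    summarize_county_population.py  (hypothetical name)
# Objective: Compute population growth between two years by county.
# Author:    <author name>
# Started:   <YYYY-MM-DD>
# ------------------------------------------------------------------

# --- Block 1: Define inputs ---------------------------------------
# Illustrative values only: (start-year population, end-year population).
# A real script would read these from a data file.
population_by_county = {
    "county_a": (1000, 1100),
    "county_b": (0, 250),
}

# --- Block 2: Compute growth rates --------------------------------
# Growth is the change in population relative to the start-year value.
growth_rates = {}
for county, (start_population, end_population) in population_by_county.items():
    if start_population == 0:
        # Limitation: growth is undefined for a zero baseline, so store None.
        growth_rates[county] = None
        continue
    change = end_population - start_population
    growth_rates[county] = change / start_population  # a fraction, not a percent

print(growth_rates)
```

The inline comments here are reserved for the two places where the behavior is not obvious from the code itself: the zero-baseline case and the unit of the result.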