Learner personas:
  • code contributor
  • code-adjacent contributor
  • manager/stakeholder

Pre-requisites:
  • Module 01: Introduction to Open Source
  • Module 02: Participating in Open Source

Chapter 02: Effective upstream and downstream collaboration

Learning Objectives 🧠

  • Understand upstream and downstream project relationships in the open source ecosystem
  • Learn to identify strategic upstream and downstream projects and collaborate with them effectively

Understand Upstream and Downstream Projects ↕️

In the open source ecosystem, "upstream" and "downstream" refer to the relationship between projects that build upon or contribute to each other. The terms describe how code, changes, and contributions flow through a software ecosystem. When one project builds on another project’s functionality, the former becomes downstream of the latter, as shown in the following figure:

[Figure: Upstream and downstream OSS]

Let’s consider pandas. Several PyData libraries like Dask and GeoPandas use pandas data structures (Series and DataFrame) and functions to provide more complex features; hence they are downstream of pandas. pandas is partially written in Python, so the CPython project is upstream of pandas. The upstream/downstream terminology is not isolated to core library features. pandas uses Sphinx to generate its documentation and pytest for its test infrastructure, so these libraries are also upstream of pandas.

Related software development terms are “dependency” and “dependent”: they refer to the same relationship but in more practical terms. A “dependency graph” is a structure used to map this relationship, where each project is a node and nodes are connected by directed arrows that start at the project providing a feature (the dependency) and end at the project using it (the dependent). The above figure depicts a dependency graph.
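As a minimal sketch of the idea, the pandas example above can be written as a small directed graph in Python. The edge list only restates the relationships named in this chapter; the function names `build_graph` and `all_dependents` are illustrative and not part of any library.

```python
from collections import defaultdict

# Each edge points from the project providing a feature (the dependency)
# to the project using it (the dependent), as in the pandas example.
EDGES = [
    ("CPython", "pandas"),
    ("Sphinx", "pandas"),
    ("pytest", "pandas"),
    ("pandas", "Dask"),
    ("pandas", "GeoPandas"),
]


def build_graph(edges):
    """Map each project to the set of projects that directly depend on it."""
    dependents = defaultdict(set)
    for dependency, dependent in edges:
        dependents[dependency].add(dependent)
    return dependents


def all_dependents(graph, project):
    """Return every project that depends on `project`, directly or transitively."""
    seen = set()
    stack = [project]
    while stack:
        current = stack.pop()
        for dependent in graph.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


graph = build_graph(EDGES)
print(sorted(all_dependents(graph, "pandas")))   # ['Dask', 'GeoPandas']
print(sorted(all_dependents(graph, "CPython")))  # ['Dask', 'GeoPandas', 'pandas']
```

Walking the arrows forward from a project lists its dependents; reversing each edge and doing the same walk would list its dependencies.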

However, not all dependencies are upstream projects (and vice versa for dependents and downstream projects). Dependencies can be directly mapped to the project’s codebase and include all the external tools required for the project to be developed, packaged, and executed. Upstream projects are a subset of dependencies that are central to the core value of your project. They are external tools that your project uses heavily and potentially extends. To see the difference, consider a project whose test suite runs against a downstream project to make sure changes do not break it; this makes the downstream project a dependency of your project, even though it is not upstream of it.

Note: GitHub (and some other tools) autogenerate a list of dependencies and dependents based on the project’s packaging-related files like environment.yml, requirements.txt, pyproject.toml, package-lock.json, etc. This is useful but may not tell the full story because it doesn’t capture the full depth of the dependency tree and does not include the packaging tools used to generate the graphs. Make sure to manually vet autogenerated dependency graphs.
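To illustrate why a list derived from packaging files is shallow, here is a minimal sketch that reads package names from requirements.txt-style text. It is not how GitHub builds its graph; the function name `direct_dependencies` and the sample file contents are hypothetical, and the parser deliberately handles only the simplest `name<operator>version` lines.

```python
import re


def direct_dependencies(requirements_text):
    """Extract package names from simple requirements.txt-style lines.

    Assumption: one requirement per line in plain `name<op>version` form.
    Option lines (such as -r or -e) are skipped, and URLs or environment
    markers are not handled.
    """
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


# Hypothetical file contents, for illustration only.
SAMPLE = """
# hypothetical requirements file
pandas>=2.0
numpy
-r dev-requirements.txt
pytest==8.*  # test runner
"""

print(direct_dependencies(SAMPLE))  # ['pandas', 'numpy', 'pytest']
```

The result contains only what the file declares directly. Anything those packages depend on in turn, and any file pulled in through `-r`, is invisible here, which is one reason an autogenerated list needs manual vetting.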

Why Contribute to Upstream Projects? 💭

The flow of features and contributions between projects helps keep our ecosystem healthy. Not only open source projects but also corporate proprietary projects that rely on OSS should aim to contribute back to the upstream projects for the following non-exhaustive set of reasons:

  • Good OSS Citizenship: Collaborating with upstream projects in the ecosystem demonstrates good OSS partnership and improves community trust in your contributions.
  • Evaluate ideas with community experts: Bug fixes or feature implementations contributed to an upstream project (instead of being fixed downstream) can be reviewed and improved by a larger community. By working with the community, you can also catch any unintended consequences in other parts of the upstream codebase or related projects.
  • Avoid duplicated effort: A fix or feature relevant to the upstream project, especially one that benefits more projects, may already be in development/discussion. An “upstream first” approach, where you coordinate with upstream projects before implementing your solution, will help you start or participate in community conversations early and avoid duplicating work.
  • Project stability: Contributing relevant features upstream keeps your project clean. Your codebase will have a clear scope, and you can minimize any cascading effects caused by implementing workarounds for the upstream issue.

Upstream Engagement Strategy 🌳

The dependency graph for a modern open source (or corporate) project can be enormous, reaching depths of the programming language and the operating system, and breadth-wise touching several linters, CI/CD workflow tools, and more. Hence, “upstream contribution” typically corresponds to supporting your project's immediate and core dependencies.

The strategy for your upstream contributions will depend on your specific upstream reliance, community health, and available resources. In this section, we’ll highlight some elements of good upstream participation to guide you in creating this strategy.

Note: To reiterate, the rule of thumb is to think “upstream first” in your open source development journey.

You can define levels of your upstream participation based on how significant the upstream project is to your project and outline contribution activities based on the upstream project's needs and health. You can start with the two levels described below and add more as required.

Critical projects

These are upstream libraries that provide fundamental features for your project, and their stability and sustainability are crucial for the stability of your project. These projects should be the primary focus of your upstream strategy, and your collaboration activities can include the following:

  • Active contribution: As a power user of the upstream project, you are in an excellent position to report issues early, share ideas for and contribute new features, and improve the upstream documentation.
  • Maintenance support: Two areas are particularly valuable. Issue and PR triage involves small, well-scoped tasks in very high volumes, which makes it a straightforward yet impactful way to help maintainers. Answering user questions draws on your insights as an immediate and direct user, which helps fellow users of the upstream project and builds your own expertise in it, aiding your downstream development.
  • Community participation: Join upstream community spaces to keep up with upcoming changes that may influence your project. You can also share your valuable perspective as a downstream user in community discussions like project roadmap and enhancement conversations.

Important: Follow the contribution and community guidelines for upstream projects, which vary between projects.

Ideally, you should test your project against the active development branches and release candidates (RCs) of upstream projects, and regularly upgrade to their latest versions. This way, you can detect any breaking changes with enough time to report back upstream and make necessary updates to your project. You can also evaluate aligning your project’s release cadence with upstream releases.
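One complementary technique, not described in this chapter, is to treat deprecation warnings as errors when running tests against upstream pre-releases, so that an upcoming removal fails loudly before it ships. The sketch below uses only Python's standard `warnings` module; `upstream_function` is a hypothetical stand-in for a real upstream API, and `run_strict` is an illustrative helper name.

```python
import warnings


def upstream_function():
    """Hypothetical stand-in for an upstream API that is scheduled for removal."""
    warnings.warn("upstream_function is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def run_strict(func):
    """Call `func` with DeprecationWarning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func()


try:
    run_strict(upstream_function)
except DeprecationWarning as exc:
    print(f"would break on a future upstream release: {exc}")
```

Because the filter is set inside `catch_warnings()`, the stricter behavior applies only within that call and the previous warning configuration is restored afterwards.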

Supporting projects

These tools enhance your project's user and developer experience, but their instability won’t be catastrophic. These projects should be on your radar but don’t require active contributions [1]. Your upstream strategy can include contributing back when relevant and actively reporting any issues as a user.
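The two levels can be written down as a small data structure so that the strategy is explicit and easy to review. This is a minimal sketch: the names `Level`, `Upstream`, and `engagement_plan` are illustrative, the activity lists paraphrase the two levels above, and the example classification of CPython and Sphinx is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    CRITICAL = "critical"
    SUPPORTING = "supporting"


# Activities paraphrased from the two levels described in this section.
ACTIVITIES = {
    Level.CRITICAL: [
        "active contribution",
        "maintenance support",
        "community participation",
        "test against development branches and release candidates",
    ],
    Level.SUPPORTING: [
        "report issues as a user",
        "contribute back when relevant",
    ],
}


@dataclass(frozen=True)
class Upstream:
    name: str
    level: Level


def engagement_plan(upstreams):
    """Map each upstream project name to the activities for its level."""
    return {upstream.name: ACTIVITIES[upstream.level] for upstream in upstreams}


# Hypothetical classification, for illustration only.
plan = engagement_plan([
    Upstream("CPython", Level.CRITICAL),
    Upstream("Sphinx", Level.SUPPORTING),
])
for name, activities in plan.items():
    print(f"{name}: {', '.join(activities)}")
```

Adding a further level, as the section suggests, means adding one `Level` member and one entry in `ACTIVITIES`.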

Collaborate with Downstream Projects 🌿

Interfacing with downstream projects is an extension of your collaboration strategy. Enthusiastic downstream projects are valuable to your community because they improve your project, advocate for it, and increase its adoption and impact. Therefore, it's your responsibility to provide them with the support that you expect from projects that are upstream to you, including:

  • Answer questions: Downstream projects are first-line users who will have many good questions and identify several issues in your project. You can work with them to build a knowledge base, improve your project documentation, and gather their feedback to enhance the project.
  • Communicate changes: Even minor changes can cause severe challenges downstream, so involve downstream projects in development discussions and proactively share your release notes, release candidates, and more. They can help you thoroughly test your RCs before wide releases.
  • Adopt strict policies: Follow clearly documented policies for releases, handling deprecations, security and hot fixes, and more, to set expectations for downstream projects and guide your interactions.

    Note: Hyrum's Law is an observation in software engineering: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

    In other words, every change is a breaking change if you have a sufficient number of users. Hence, strong policies and communication with downstream projects are critical in OSS.

  • Encourage contributions and community participation: Encourage and support downstream projects to become regular contributors and active community members. You can invite them to special communication channels and meetings and eagerly seek their feedback on major project decisions.
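As one way to make a documented deprecation policy visible in code, the sketch below wraps a function so that every call emits a standard `DeprecationWarning` naming the removal version and the replacement. It uses only Python's standard library; the decorator name `deprecated`, the version numbers, and the functions `read_table` and `load_table` are hypothetical.

```python
import functools
import warnings


def deprecated(since, removal, alternative):
    """Mark a function as deprecated according to a documented policy.

    All three arguments are plain strings used in the warning message.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be "
                f"removed in {removal}; use {alternative} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def load_table(path):
    """Hypothetical replacement API."""
    return f"loaded {path}"


@deprecated(since="1.4", removal="2.0", alternative="load_table")
def read_table(path):
    """Hypothetical old API kept for one deprecation cycle."""
    return load_table(path)
```

Downstream projects that run their tests with warnings enabled will see the message well before the removal, which gives them time to migrate or to raise concerns.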

🙋🏽‍♀️ Learner Question: Identify the "critical upstream" projects for napari

This is for your team to answer, and you can start at napari's dependency graph on GitHub.

Resources 📚

Continue learning 🚥

⬅️ Previous Chapter: 01 Creating a New OSS Project | Next Chapter: 03 Creating an impactful open source strategy ➡️

Footnotes

  1. If some projects are in a risky and low-maintenance state, and you have the resources to contribute, you should consider supporting them more actively.