Issue Triage Guidelines

These guidelines serve as a primary document for triaging incoming issues to Kubernetes. SIGs and projects are encouraged to use this guidance as a starting point, and customize to address specific triaging needs.

Issue Triage: A Primer

Table of Contents

Scope

These guidelines serve as a primary document for triaging incoming issues to Kubernetes. SIGs and projects are encouraged to use this guidance as a starting point, and customize to address specific triaging needs.

Note: These guidelines only apply to the Kubernetes repository. Usage for other Kubernetes-related GitHub repositories is TBD.

What Is Triaging?

Issue triage is a process by which a SIG intakes and reviews new GitHub issues and requests, and organizes them to be actioned—either by its own members, or by other SIGs. Triaging involves categorizing issues and pull requests based on factors such as priority/urgency, SIG ownership of the issue, and the issue kind (bug, feature, etc.)

  • the SIG or SIGs responsible for handling the issue or pull request

Triage can happen asynchronously and continuously, or in regularly scheduled meetings. Several Kubernetes SIGs and projects have adopted their own approaches to triaging.

Why Is Triaging Beneficial?

SIGs who triage regularly say it:

  • speeds up issue management
  • keeps contributors engaged by shortening response times
  • prevents work from lingering endlessly
  • replaces “special requests” and one-offs with a neutral process that acts like a boundary
  • leads to greater transparency, interesting discussions, and more collaborative, informed decision-making
  • it helps build prioritization, negotiation and decision-making skills, which are critical to most tech roles
  • it reinforces SIG community and culture

People who enjoy product management and iterating on processes tend to enjoy triaging because it empowers their SIGs to maintain a steady, continuous flow of work that is assessed and prioritized based on feedback and value.

How to Triage: A Step-by-Step Flow

This aims to walk you through a standard triaging process, first covering tools and tips.

These tools that your SIG can use to make the process simpler, more efficient and faster.

Permissions and the Bot

Opening new issues and leaving comments on other people’s issues are possible for all contributors. However, permission to assign specific labels (e.g. triage), change milestones, or close other contributors issues is only granted to the author of an issue, assignees, and organization members. For this reason, we use a bot to manage labelling and triaging. For a full list of commands and permissions, see the Prow command reference page.

Gubernator

Gubernator offers a dashboard that tells you which pull requests are waiting for your feedback and which PRs are waiting for the contributor to respond. Please note that Gubernator only shows pull requests. You will not see which issues are assigned to you.

Triage Party

Triage Party is a tool for triaging incoming GitHub issues for large open-source projects, built with the GitHub API. Made public in April 2020, it facilitates “massively multi-player GitHub triage” and reduces contributor response latency.

Some of its features:

  • Queries across multiple repositories
  • Queries that are not possible on GitHub:
    • conversation direction (tag: recv, tag: send)
    • duration (updated: +30d)
    • regexp (label: priority/.*)
    • reactions (reactions: >=5)
    • comment popularity (comments-per-month: >0.9)
  • Multiplayer mode: for simultaneous group triage of a pool of issues
  • Button to open issue groups as browser tabs (pop-ups must be disabled)
  • “Shift-Reload” for live data pull

GitHub Project Boards

GitHub offers project boards, set up like kanban boards, to help teams organize and track their workflow in order to get work done. The Release Team has come to depend on their project board for planning new Kubernetes releases; they also use it as an archive to show the work Done for past releases. Other SIGs using project boards:

We encourage more SIGs to use project boards to enhance visibility and tracking. If you’d like some help getting started, visit GitHub’s documentation or reach out to SIG Contributor Experience.

DevStats

The CNCF has created a suite of Grafana dashboards and charts for collecting metrics related to all the CNCF projects. The Kubernetes dashboard can be used to help SIGs view real-time metrics on many aspects of their workflow, including:

  • Issue Velocity: How quickly issues are resolved
  • PR Velocity: Including PR workload per SIG, PR time to approve and merge, and other data

Process Pointers and Advice from SIGs

Several SIGs consistently meet weekly or monthly to triage issues. Here are some details about their processes:

Running a Triage Meeting: Tips from api-machinery

The api-machinery SIG has found that triage meetings offer valuable opportunities for newcomers to listen, learn, and start contributing. api-machinery hold triage meetings every Tuesday and Thursday and archive recordings via their Youtube playlist; here is an example.

In a typical triage meeting, api-machinery members sort through every issue that they haven’t triaged since the previous meeting, using a simple query and issue # to track Open PRs and Issues. They usually then:

  • read through the comments and the code briefly to understand what the issue is about.
  • determine by consensus if it belongs to the api-machinery SIG or not. If not, remove the sig/api-machinery label.
  • label other SIGs, if appropriate
  • discuss briefly the technical implications
  • assign people with expertise in the domain to review, comment, reject, etc.

api-machinery has found that consistently meeting on a regular, fixed schedule is key to the success of a triaging effort. More frequent, small meetings are better than infrequent, large meetings, they’ve found. A few other pointers:

  • We try to balance the load, and ask people if they are okay taking on an issue before assigning it to them
  • We skip issues that are closed
  • We also skip cherrypicks, because we consider that the code change was reviewed in the original PR
  • We ensure participation from the entire SIG and support company diversity.
  • We use this opportunity to mark “help needed”, “good first issue”

Triage Guide by cluster-lifecycle

The SIG has developed a triaging page detailing their process, including the Milestones stage. Here is a March 2020 presentation delivered to the SIG chairs and leads group on their process.

Step One: Review Newly Created Open Issues

Kubernetes issues are listed here. New, untriaged issues come without labels attached. SIG leads should identify at least one SIG member to serve as a first point of contact for new issues.

Labels are the primary tools for triaging. Here’s a comprehensive label list.

Conducting Searches

GitHub allows you to filter out types of issues and pull requests, which helps you discover items in need of triaging. This table includes some predetermined searches for convenience:

Search What it sorts
created-asc Untriaged issues by age
needs-sig Issues that need to be assigned to a SIG
is:open is:issue Newest incoming issues
comments-desc Busiest untriaged issues, sorted by # of comments
comments-asc Issues that need more attention, based on # of comments

We suggest preparing your triage by filtering out the oldest, unlabelled issues and/or pull requests first.

Step Two: Triage Issues by Type

Use these labels to find open issues that can be quickly closed. A triage engineer can add the appropriate labels.

Depending on your permissions, either close or comment on any issues that are identified as support requests, duplicates, or not-reproducible bugs, or that lack enough information from the reporter.

Support Requests

Some people mistakenly use GitHub issues to file support requests. Usually they’re asking for help configuring some aspect of Kubernetes. To handle such an issue, direct the author to use our support request channels. Then apply the kind/support label, which is directed to our support structures, and apply the close label.

Please find more detailed information about Support Requests in the Footnotes section.

Abandoned or Wrongly Placed Issues

Either close or comment on it.

Needs More Information

  • The triage/needs-information label indicates an issue needs more information in order to work on it; comment on or close it.

Bugs

First, validate if the problem is a bug by trying to reproduce it.

If you can reproduce it:

  • Define its priority
  • Do a quick duplicate search to see if the issue has been reported already. If a duplicate is found, let the issue reporter know, reference the original issue, and close the duplicate.

If you can’t reproduce it:

  • Contact the issue reporter with your findings
  • Close the issue if both the parties agree that it could not be reproduced.

If you need more information to further work on the issue:

  • Let the reporter know it by adding an issue comment followed by label lifecycle/needs-information.

In all cases, if you do not get a response in 20 days then close the issue with an appropriate comment. If you have permission to close someone else’s issue, first /assign the issue to yourself, then /close it. If you do not, please leave a comment describing your findings.

Help Wanted/Good First Issues

To identify issues that are specifically groomed for new contributors, we use the help wanted and good first issue labels. To use these labels:

  • Review our specific guidelines for how to use them.
  • If the issue satisfies these guidelines, you can add the help wanted label with the /help command and the good first issue label with the /good-first-issue command. Please note that adding the good first issue label will also automatically add the help wanted label.
  • If an issue has these labels but does not satisfy the guidelines, please ask for more details to be added to the issue or remove the labels using the /remove-help or /remove-good-first-issue commands.

Kind Labels

Usually the kind label is applied by the person submitting the issue. Issues that feature the wrong kind (for example, support requests labelled as bugs) can be corrected by someone triaging; double-checking is a good approach. Our issue templates aim to steer people to the right kind.

Step Three: Define Priority

We use GitHub labels for prioritization. If an issue lacks a priority label, this means it has not been reviewed and prioritized yet.

We aim for consistency across the entire project. However, if you notice an issue that you believe to be incorrectly prioritized, please leave a comment offering your counter-proposal and we will evaluate it.

Priority label What it means Examples
priority/critical-urgent Team leaders are responsible for making sure that these issues (in their area) are being actively worked on—i.e., drop what you’re doing. Stuff is burning. These should be fixed before the next release. user-visible bugs in core features
broken builds
tests and critical security issues
priority/important-soon Must be staffed and worked on either currently or very soon—ideally in time for the next release. Important, but wouldn’t block a release. [XXXX]
priority/important-longterm Important over the long term, but may not be currently staffed and/or may require multiple releases to complete. Wouldn’t block a release. [XXXX]
priority/backlog General agreement that this is a nice-to-have, but no one’s available to work on it anytime soon. Community contributions would be most welcome in the meantime, though it might take a while to get them reviewed if reviewers are fully occupied with higher-priority issues—for example, immediately before a release. [XXXX]
priority/awaiting-more-evidence Possibly useful, but not yet enough support to actually get it done. Mostly placeholders for potentially good ideas, so that they don’t get completely forgotten, and can be referenced or deduped every time they come up

Step Four: Find and Set the Right SIG(s) to Own an Issue

Components are divided among Special Interest Groups (SIGs). The bot assists in finding a proper SIG to own an issue.

  • For example, typing /sig network in a comment should add the sig/network label.
  • Multiword SIGs use dashes: for example, /sig cluster-lifecycle.
  • Keep in mind that these commands must be on their own lines, and at the front of the comment.
  • If you are not sure about who should own an issue, defer to the SIG label only.
  • If you feel an issue should warrant a notification, ping a team with an @ mention, in this format: @kubernetes/sig-<group-name>-<group-suffix>.
    • Here, the <group-suffix> can be one of bugs, feature-requests, pr-reviews, test-failures, proposals. For example, @kubernetes/sig-cluster-lifecycle-bugs, can you have a look at this?

Self-Assigning

If you think you can fix the issue, assign it to yourself with just the /assign label. If you cannot self-assign for permissions-related reasons, leave a comment that you’d like to claim it and work on creating a PR.

Step Five: Follow Up

If no PR is created for an Issue Within the Current Release Cycle

If you see any issue which is owned by a developer but a PR is not created in 30 days, a Triage engineer should contact the issue owner and ask them to either create a PR or release ownership.

If a SIG label Is Assigned, but No Action Is Taken Within 30 Days

If you find an issue with a SIG label assigned, but there’s no evidence of movement or discussion within 30 days, then gently poke the SIG about this pending issue. Also, consider attending one of their meetings to bring up the issue, if you feel this is appropriate.

If an Issue Has No Activity After 90 Days

When this happens, the fejta-bot adds the lifecycle/stale label to that issue. You can block the bot by applying the /lifecycle frozen label preemptively, or remove the label with the /remove-lifecycle stale command. The fejta-bot adds comments in the issue that include additional details. If you take neither step, the issue will eventually be auto-closed.

Footnotes

Support Requests: Channels

These should be directed to the following:

User Support Response: Example

If you see support questions on kubernetes-dev@googlegroups.com or issues asking for support, try to redirect them to Discuss. Example response:

Please re-post your question to our [Discussion Forums](https://discuss.kubernetes.io).

We are trying to consolidate the channels to which questions for help/support
are posted so that we can improve our efficiency in responding to your requests,
and to make it easier for you to find answers to frequently asked questions and
how to address common use cases.

We regularly see messages posted in multiple forums, with the full response
thread only in one place or, worse, spread across multiple forums. Also, the
large volume of support issues on GitHub is making it difficult for us to use
issues to identify real bugs.

Members of the Kubernetes community use Discussion Forums to field
support requests. Before posting a new question, please search these for answers
to similar questions, and also familiarize yourself with:

  * [user documentation](https://kubernetes.io/docs/home/)
  * [troubleshooting guide](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/)

Again, thanks for using Kubernetes.

The Kubernetes Team