The Real Context Switching Cost: What 5 Million GitHub PRs Tell Us
We synthesised data from 5M+ PRs to measure the real context switching cost for developers – and it's not where you think.
By Ellis Keane · 2026-03-29
The context switching cost that most articles quote – 23 minutes to refocus after an interruption, from Gloria Mark's UC Irvine research – is a real finding from a real study. But it measured general knowledge workers in 2008, not software engineers. And the cottage industry of blog posts that multiplies 23 minutes by an assumed interruption count to produce alarming annual dollar figures (always accompanied by a stock photo of someone holding their head) is doing something the original research never intended.
I have a personal stake in this question. At a previous company, I spent – and this isn't hyperbole – 80 to 100 per cent of some days just being a human router. Not writing code, not reviewing it. Routing information between people and tools, because no single system connected them. That experience is part of why we built Sugarbug, but it's also why I'm sceptical of the standard context switching cost calculators. They measure the interruption. They don't measure the days you spend never getting to the thing you were supposed to be interrupted from.
So we wanted to know what context switching actually costs in engineering work – not in abstract developer productivity terms, but measured in the artefact teams produce daily: pull requests. We synthesised findings from three large-scale studies covering over 5 million PRs across thousands of open-source projects, and looked at what actually drives pull request review time.
The main finding: the most expensive context switch isn't the Slack notification that breaks your flow. It's the pull request that sits in a review queue for a day, forcing the author to rebuild an entire mental model when questions finally arrive.
The Datasets We Drew From
We didn't build a custom scraper and analyse 10,000 PRs in isolation. We synthesised findings from two peer-reviewed studies and one large industry dataset, then pressure-tested their conclusions against each other.
3.35M – pull requests analysed by Zhang et al. Source: "Pull Request Latency Explained: An Empirical Overview" (Empirical Software Engineering, 2022)
The three primary datasets:
- Zhang et al. (2022), peer-reviewed: 3,347,937 closed PRs across 11,230 projects. Used mixed-effects linear regression to identify what drives PR review delays. Published in Empirical Software Engineering.
- Adadot (2023), industry dataset: 300,000+ merged PRs from ~30,000 developers. Non-peer-reviewed, but the sample is large and the methodology (Kendall tau correlation) is transparent. Focused on PR size vs. lead time.
- Multi-reviewing study (2019), peer-reviewed: 1,836,280 PRs across 760 GitHub projects. Published in Information and Software Technology. Examines concurrent review behaviour – a direct proxy for context switching in code review.
We cross-referenced these against the 2024 DORA State of DevOps Report and Atlassian's 2024 Developer Experience Report (surveying 2,100+ developers on context switching, developer productivity, and the human side of the equation).
The Queue Is the Real Killer
Zhang et al. found that the time it takes for a PR to receive its first response – first comment, first review, first anything – explains 58.7% of the variance in the PR's total lifetime. That makes it the strongest predictor in the dataset, ahead of PR size, code complexity, and number of files changed – and not by a small margin.
The biggest cost of context switching in code review isn't the switching itself – it's the queue that forms while everyone is busy switching between other things.
Think about what that means in practice. An engineer opens a PR at 10am. The designated reviewer is deep in their own feature branch, or in a meeting, or triaging Slack messages (and honestly, probably all three in sequence). The PR sits. By the time someone picks it up at 3pm, the author has moved on to something else entirely. Now the reviewer has questions, which means the author has to context-switch back to code they wrote five hours ago, rebuild the mental model, and respond. That response lands at 4:30pm, but the reviewer has gone home.
The PR ages another day.
The data suggests this is a queuing problem more than a discipline problem – and the context switching cost of that queue compounds in ways that interruption calculators completely miss.
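If you want to see this queue for your own team, time-to-first-response is easy to compute once you have PR creation and event timestamps (however you fetch them). A minimal sketch on toy data – the timestamps and PRs below are hypothetical, standing in for whatever your GitHub export looks like:

```python
from datetime import datetime, timezone
from statistics import median

def time_to_first_response(created_at, event_times):
    """Hours from PR creation to the earliest comment/review event.
    Returns None if the PR never received a response."""
    responses = [t for t in event_times if t > created_at]
    if not responses:
        return None
    return (min(responses) - created_at).total_seconds() / 3600

# Hypothetical PRs: (created_at, [comment/review timestamps])
prs = [
    (datetime(2026, 3, 2, 10, 0, tzinfo=timezone.utc),
     [datetime(2026, 3, 2, 15, 0, tzinfo=timezone.utc)]),   # 5h wait
    (datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc),
     [datetime(2026, 3, 3, 9, 0, tzinfo=timezone.utc)]),    # 24h wait
    (datetime(2026, 3, 3, 11, 0, tzinfo=timezone.utc),
     [datetime(2026, 3, 3, 11, 30, tzinfo=timezone.utc)]),  # 0.5h wait
]
waits = [w for c, ev in prs if (w := time_to_first_response(c, ev)) is not None]
print(f"median time-to-first-response: {median(waits):.1f}h")
```

The median matters more than the mean here: one abandoned PR that waits a month will swamp an average, but the median tells you what a typical author actually experiences.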
Small PRs Won't Save You
You've heard this one before: smaller PRs get reviewed faster, so keep your PRs small. That's not wrong, exactly, but the effect size is (genuinely) smaller than you'd expect.
Adadot's analysis of 300,000+ PRs found a Kendall tau correlation of just 0.06 between PR size and lead time – a weak association, though the study didn't report confidence intervals for the aggregate figure. PRs under 100 lines do have roughly an 80% probability of completion within a week, which sounds great until you realise that's the same completion rate as a PR that's already sat in someone's review queue for six days.
0.06 – correlation between PR size and lead time. Source: Adadot analysis of 300,000+ PRs from ~30,000 developers (2023)
The more interesting finding: this correlation varied wildly between organisations, ranging from 0.1 to nearly 0.7 depending on the company. Which suggests that PR size isn't inherently the bottleneck – the review culture and process around the PR is. A team with a strong review cadence can handle larger PRs efficiently. A team where reviews are an afterthought will struggle with PRs of any size.
The 400-line threshold from the SmartBear/Cisco code review study holds up as a useful heuristic – Adadot's data also found that review engagement drops beyond that range. But optimising for small PRs without fixing the underlying review cadence is (and I say this with genuine affection for every engineering manager who's tried it) rearranging deck chairs.
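Kendall's tau – the statistic behind Adadot's 0.06 figure – is just a count of how often pairs of observations agree in ordering. A minimal sketch of the untied (tau-a) variant on hypothetical data; the published figure would use the tie-adjusted form, but the intuition is identical:

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Assumes no tied values; ranges from -1 to +1."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1    # pair ordered the same way in both lists
            elif s < 0:
                discordant += 1    # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy sample (hypothetical): PR size in changed lines vs lead time in hours
sizes = [50, 200, 120, 400, 80, 300]
lead_times = [10, 30, 5, 40, 25, 15]
print(round(kendall_tau(sizes, lead_times), 2))
```

A tau of 0.06 means size and lead time agree on ordering barely more often than they disagree – which is why the per-organisation spread (0.1 to 0.7) is the more interesting number.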
Everyone Is Reviewing Everything at Once
The multi-reviewing study found that 62% of pull requests involve developers who are simultaneously reviewing multiple PRs. More importantly, it found a statistically significant correlation: the more reviews a reviewer was juggling concurrently, the longer those PRs took to resolve.
"62% of pull requests involve developers simultaneously reviewing multiple PRs – and multi-reviewing correlates directly with longer resolution times." – Multi-reviewing pull-requests study, 1.8M PRs across 760 projects
The mechanism is intuitive (even if the study, being observational, doesn't prove direction of causation). A reviewer picks up PR #1, reads through the diff, starts forming a mental model of what the code is trying to do. Then a notification arrives – PR #2 needs review because it's blocking a deploy. The reviewer switches. When they come back to PR #1, they have to re-read half the diff because the mental model has decayed.
Scale that across a team of eight engineers, each with two or three PRs open, each reviewing for two or three colleagues, and the coordination overhead starts to explain itself. Separately, the 2024 DORA Report found that the "high performer" cluster shrank from 31% to 22% while the low-performer cluster grew from 17% to 25%. DORA doesn't isolate PR review concurrency as a factor, but increasing coordination overhead is one plausible contributor to that shift.
What the Context Switching Cost Estimates Get Wrong
Let me be direct about the "$50K per developer per year" figure that circulates widely in context switching cost articles. The methodology behind most of these estimates goes: take the 23-minute refocus time, multiply by estimated daily interruptions (usually somewhere between 6 and 15, depending on how dramatic the author is feeling), multiply by an hourly developer rate, and annualise.
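That arithmetic fits in a few lines, which is part of the problem. A sketch of the standard calculator – the hourly rate and working-day count here are illustrative assumptions of mine, not figures from any study:

```python
def annual_cost(minutes_per_refocus, interruptions_per_day,
                hourly_rate, working_days=230):
    """The standard context-switching-cost calculator:
    refocus time x interruption count x hourly rate, annualised."""
    hours_lost_per_day = minutes_per_refocus * interruptions_per_day / 60
    return hours_lost_per_day * hourly_rate * working_days

# Same 23-minute refocus figure, three interruption-count assumptions:
for n in (6, 8, 15):
    print(f"{n} interruptions/day -> ${annual_cost(23, n, hourly_rate=75):,.0f}")
```

Run it and the annual figure swings from roughly $40K to roughly $99K purely on the interruption-count assumption – the headline number is an input choice dressed up as a finding.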
The problem isn't that the maths is wrong. The problem is that it treats all context switches as equivalent. Switching from deep coding to a Slack message asking where the team lunch is – that's a context switch. Switching from reviewing one PR to reviewing a different PR in a completely different codebase – that's also a context switch. But the cognitive cost isn't remotely comparable, and flattening them into a single hourly rate obscures where the real damage happens.
To put it concretely: at my last job, a typical day meant switching between Notion, Linear, Mattermost, Proton Mail, Proton Calendar, Discord, Twitter, Farcaster, and innumerable Telegram and Signal channels – and I'm sure I'm forgetting a half-dozen. Now I use a handful (Signal, Obsidian, Figma, GitHub, email, calendars). The per-switch cost didn't change. What changed was how many contexts were queuing for attention – and which of them actually mattered.
The PR data suggests that the expensive switches are the ones that create queues, not the ones that interrupt flow. A developer who gets pinged to review a PR immediately (within minutes) and does a quick 50-line review – that's a short interruption with a high return. A developer who queues that review request alongside four others and gets to it tomorrow – that's a longer interruption for the reviewer but creates a much larger cost for the author and the team.
What the cost calculators measure
- Individual interruptions – how often someone's flow breaks
- Refocus time – how long to get back to the previous task
- Hourly rate multiplication – big scary annual numbers
What the PR data actually shows
- Queue formation – PRs waiting for first response
- Review concurrency – reviewers juggling multiple PRs
- Cascade delays – author context-switches compounding reviewer delays
What This Means for Your Team
If you're trying to reduce context switching cost for developers on your team, the practical answer is boring – which is probably why it doesn't get written about much. It's not a tool. It's not a process framework with a certification programme. It's review cadence. (I know, I know. Nobody ever got promoted for improving review cadence.)
LinearB's 2025 engineering benchmarks, drawn from 6.1 million PRs across 3,000 organisations, found that teams achieving elite cycle times (under 2.5 days) shared one trait: they reviewed PRs quickly. Not because they had fewer PRs, or because their PRs were smaller (though they often were), but because responding to review requests within hours was a team norm, not an afterthought.
For what it's worth, Ben and I – a two-person team – average minutes on first PR response, not hours. That's not a flex about discipline (we're not). It's a team agreement: review requests are the one notification you don't queue. CI actions and automated tests handle the mechanical checks, which means the human review – the part that requires actual context – is shorter and happens immediately. The agreement came first. The tooling just made it sustainable.
Practically, that means:
- Measure time-to-first-response, not just cycle time. If you're tracking DORA metrics, add this one. It's the single strongest predictor of PR throughput (explaining 58.7% of lifetime variance, per Zhang et al.).
- Limit review concurrency. If a reviewer has three pending review requests, a fourth one isn't going to get a good review anyway. The multi-reviewing data showed a clear association between concurrency and latency. Start with a WIP limit of two concurrent reviews and monitor the impact.
- Stop optimising PR size in isolation. Small PRs are good, but they're not a substitute for a team that actually reviews things. A team producing twenty 50-line PRs a day with a 48-hour review backlog is worse off than a team producing five 200-line PRs with same-day reviews.
- Acknowledge that review is real work. Atlassian's 2024 survey found that 69% of developers lose 8+ hours weekly to inefficiencies. Review doesn't have to be one of those inefficiencies – but only if it's treated as a first-class engineering activity, not an interruption to "real" work.
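The concurrency limit in the second point can be enforced mechanically rather than by willpower. A minimal sketch of one way to route review requests under a WIP cap – the reviewer names and the limit of two are assumptions for illustration, not prescriptions from the studies:

```python
def pick_reviewer(load, wip_limit=2):
    """Return the reviewer with the fewest in-flight reviews who is
    still under the WIP limit, or None if everyone is saturated.
    `load` maps reviewer -> number of open review requests."""
    available = {r: n for r, n in load.items() if n < wip_limit}
    if not available:
        return None  # hold the request in a queue rather than over-assign
    return min(available, key=available.get)

team = {"ana": 2, "ben": 1, "chao": 0}
print(pick_reviewer(team))  # picks the reviewer with the lightest queue
```

The useful property is the None branch: when everyone is at the limit, the request waits visibly instead of silently becoming someone's fourth concurrent review.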
And here's the part nobody in the productivity-tool space (ourselves included, to be fair) wants to say out loud: the most impactful intervention for context switching cost in engineering teams isn't a tool. It's a team agreement about when PRs get reviewed. If your team's implicit norm is "I'll get to reviews when I have a gap," no amount of tooling will prevent the queuing cascade that the PR data reveals.
Tools help – being able to see the full context of a PR without opening four browser tabs reduces the per-switch cognitive load, and surfacing which reviews are blocking other people's work helps prioritise. But the core lever is agreement, and agreements are free. No 23-minute calculator required.
The most expensive context switch isn't the notification that breaks your flow. It's the review request that sits in a queue for a day, forcing the author to rebuild mental context when questions finally arrive.
Frequently Asked Questions
Q: How much does context switching cost per developer per year? A: Estimates vary, but the underlying research is thinner than most articles suggest. Gloria Mark's UC Irvine study found 23 minutes of refocus time per interruption, and Atlassian's 2024 survey of 2,100+ developers found 69% losing 8+ hours weekly to inefficiencies. The dollar figure depends heavily on salary assumptions, interruption frequency, and how you define "switching" – which is why we focused on PR data instead.
Q: Does Sugarbug help reduce context switching for engineering teams? A: Yes. Sugarbug connects tools like Linear, GitHub, Slack, and Figma into a single knowledge graph, so engineers can see the full context of a task – the relevant PR, the Slack discussion, the Figma comment – without opening four tabs. The goal is fewer switches, not fewer tools.
Q: What is the ideal pull request size to minimise review delays? A: Research from Adadot's analysis of 300,000+ PRs found that PRs under 100 lines of code have roughly an 80% probability of being completed within one week. Above 400 lines, review quality and completion speed both drop. Smaller PRs also reduce the reviewer's context-switching burden.
Q: Does Sugarbug integrate with GitHub pull requests? A: Yes. Sugarbug scrapes GitHub activity – PRs, comments, reviews, and status changes – and links them to related signals across your other tools. If a Linear issue spawned the PR, and a Slack thread debated the approach, Sugarbug connects all three automatically.
Q: Where does the "23 minutes to refocus" statistic come from? A: It comes from Gloria Mark's research at UC Irvine, published in "The Cost of Interrupted Work: More Speed and Stress" (CHI 2008). The study found workers took an average of 23 minutes and 15 seconds to return to their original task after an interruption. It's worth noting the study observed general knowledge workers, not software engineers specifically.