You're missing a big piece of the puzzle! The main reason some companies fire the bottom X% of performers every year is to motivate everyone to work hard (if not frantically) to try to stay out of the firable bucket. The system you are measuring is affected by the performance management tactics you use, by design.
I have a gut sense that software work is likely different from other forms of work. The value of a top performer can have a non-linear effect on a team: they can do things others simply won't do as well, and this effect compounds over time.
The same is true for certain leadership roles. They wield decision-making power over dozens or hundreds of people, so being even a few percent better can have an outsized return, since the opportunity cost of a mediocre decision is often so high.
This is not an argument for a Gaussian curve; rather, it is an argument for some kind of truncated exponential or similar, though I am not exactly sure which. My main point is that I would expect the curves to be genuinely different for different roles.
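A minimal sketch of that intuition, assuming two purely hypothetical "contribution" distributions (the parameters are illustrative, not measured data), shows how differently a thin-tailed and a fat-tailed curve reward the top decile:

```python
# Toy comparison of two hypothetical "contribution" distributions.
# Parameters are illustrative only; nothing here is measured data.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

gaussian = rng.normal(loc=100, scale=15, size=n)   # symmetric, thin-tailed
heavy_tail = rng.exponential(scale=100, size=n)    # skewed, fat positive tail

def top_decile_share(x):
    """Fraction of total contribution produced by the top 10% of people."""
    x = np.sort(x)[::-1]
    return x[: len(x) // 10].sum() / x.sum()

print(f"Gaussian:    top 10% contribute {top_decile_share(gaussian):.0%} of the total")
print(f"Exponential: top 10% contribute {top_decile_share(heavy_tail):.0%} of the total")
# Under the Gaussian the top decile contributes barely more than 10%;
# under the exponential it contributes roughly a third of the total.
```

Whether the right shape is an exponential, a log-normal, or something else per role is exactly the open question in the comment above.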
In a previous life, I both took part in and organized annual employee rankings for a group of about 100 staff. What were the goals?
- Identify excellent performers for likely promotions, bonuses, and large raises.
- Provide information for annual raises.
- Identify those staff members who need specific guidance, training, or management to improve.
- Identify those staff members (generally only one or two) who may need to find some other place to work (possibly terminated or possibly moved for some reason to another part of the organization).
There are some features of the practical process.
- The process only works if the ranking is done by a large group of managers close to the staff being ranked, cross-consulting in an open meeting (closed, of course, to those being ranked).
- There must be written evaluations of every member being ranked, prepared in advance by the direct manager. These must be circulated well ahead of time to the group doing the rankings.
- The actual ranking took about two days of solid group meetings, with no manager excused during that time.
- There were no tied ranks. We managed that by opening nominations for slot one and then, by a process of consensus, finally selecting a staff member for that slot; then slot two, slot three, and so on.
Some observations.
- At the top, there were always several candidates for each slot and the discussions took a long time.
- Oddly, someone who might have been a candidate for slot two, say, could fall far down after the discussions, and someone who was not thought to be exceptional could rise a lot. This would sometimes happen when a phrase like the following was heard:
"But didn't X work on that project beside Y?"
- The direct manager led the discussion for each candidate for each slot, but very often it was the comments from other managers who had seen the candidate "from the side" that were the most important.
- Wherever a staff member landed, the manager was expected to amend the original review with knowledge learned during this meeting. Sometimes the meeting specifically directed the manager to include a particular point.
- There were often surprises. One, in particular, involved a staff member who was very good, arrogant, and hard for management to work with. However, the member happened to be, by a large margin, the most experienced person in a large group (maybe 15 members) which had been given the mind-numbing task of bringing a large set of (software) libraries and tools gathered from "every which where" up to release standards. As it happened, many other managers had heard from the junior members of this group that the experienced member had been extremely kind and helpful to all of them, mentoring and teaching them constantly and never taking any credit. All the while, the direct manager had never been shown this behavior. The upshot was that the direct manager learned an important lesson, that people can have more than one facet; the staff member was ranked very highly, got a big bonus and raise, and was sounded out about a possible promotion. It turned out that the staff member was happy to hear about all that but really just wanted to go back to small, hard projects that could be done pretty much alone without much management. And so that happened.
- The distribution of competence clearly bulged in the middle with a long tail towards the better end. It was not a normal distribution. Only near the lower ranks did performance start to drop off noticeably. In almost all cases, the problems were either temporary (say, an accident with lots of medical treatment) or something else that could be remediated fairly easily. It was common for someone, after a year ranked at 30%, to jump up to 60, 70, or 80% just because an appropriate adjustment had been made. There were seldom such dramatic falls.
- Because of this, even the staff at the bottom of the rankings could be treated generously and fairly. During my tenure in that job, there were only one or two tagged for termination, and one had already figured it out and found a job outside before the termination could even be organized. The other that I can remember may have moved on voluntarily as well. However, a few years later I worked somewhere else where that staff member had also worked for a while, and the member was terminated at the new place too. That suggests the mechanism had actually identified someone who was not a good worker in the field.
Finally, a word about consensus as a decision-making mechanism.
The rankings were done by consensus. We did not move on from slot one until the management group (about 25 people) had agreed who deserved that rank. Consensus does not mean a majority vote. Rather, it is a mechanism in which every voter has a veto. The discussion cannot end until every voter agrees that the decision is to stand. So long as at least one voter "vetoes", the discussion continues.
The objection is that discussions can go on for a long time. Apparently forever!
The actual effect is that a decision desired even by a large majority can be held up by a strong enough counter view. That may seem bad. But the meta-effect is that even when the decision is very close (51% to 49%), if there is consensus, even the opponents agree that it is the best decision to be had. The opponents may feel it is wrong, but they will also feel that
- they and their objections have been completely and fairly heard, and
- more time and energy will not create a decision different from this one — or, at least, it would not be worth the effort to try.
[I have also used this at other kinds of meetings to make technical decisions, and it works surprisingly well. Often, if a break is taken (say, coming back a week later for another discussion), the objection will vanish or the two sides will have found a new path to an even better solution. The key seems to be to let everyone feel that they have been heard out and that what they have said has been absorbed. Then the final decision can be accepted.]
The summary.
- Do prior work before the decisions (of any sort).
- A group makes the decision and consensus works. Do not rank (or promote) based on the observations of a single person.
- Rankings are not punishments. They are a way for management to figure out what needs to be rewarded and which staff need help to do better.
- Almost every staff member has positive contributions and even those whose contributions are fairly small can get better once the roadblocks are known and addressed.
- Ranking can identify staff members who should not be part of the organization. There should not be many of those (or perhaps it is time to start ranking those who do the hiring). But the number of them should never be decided by some mechanical rule.
- Finally, fear is not a good motivation for good performance. And those who think it is should be considered for termination.
I have perhaps been lucky in my life and in my places of employment. But I was continually surprised by the good work done by those who were simply treated like human beings.
Sounds like you would have been a great person to have as a manager. Thoughtful, caring, and trying to make your team as strong as possible. I’m struggling with this right now and I’ll take your advice.
There you go taking all the fun out of the process by imposing a rational framework on what is essentially a social organism. Here is my experience with annual reviews/bonuses from a now defunct Fortune 50.
Each year we were graded on a five-point scale across the usual pile of attributes:
1 - outstanding across the board
2 - meets or exceeds expectations
3 - consistently meets expectations
4 - misses some expectations
5 - consistently fails to meet expectations
Nobody’s perfect, so there were no 1s.
Anyone who was heading toward a 5 was put on a performance plan early enough not to be around anymore, so there were none of those, either.
There was a fixed pot of bonus money to be split among the middle. Management would decide who was to receive fattened bonuses. They would be graded as 2s. Anyone who was actually having a good year but whose turn it wasn’t got knocked down to 3s. A few hard luck cases having a poor year were kicked up to 3s. Occasionally, someone had to take one for the team and they would be taken down a notch or even two. Happened to me the year before I made my bones and ascended to an exalted rank.
That’s the problem with economic rationality—there’s very little to go around.
This is correct.
Great post! Appreciate the sports callout in the notes. I didn't call it out explicitly here (and Pareto isn't quite right), but I did notice that sport performance has a fat positive tail and a lot of distribution mass under the median.
https://open.substack.com/pub/achromath/p/net-on-court-win-probability
I wanted to agree with this because we loved Pareto distributions; we used to say 80% of the work was done by 20% of the team. I think, though, that you're kind of saying 80% of the team is essentially at one level of productivity/contribution and then 20% are either hiring mistakes or promotion mistakes.
There's no way around three aspects of performance management:
• At some "n" (~100) there is a normal distribution of contribution. It might cluster into 3 or 5 groups, but it is there. Just as in freshman calculus at MIT, scores cluster this way even though everyone admitted got 1600 on their SATs and a 4.0 GPA. The biggest failures in implementing this distribution are using too small a set of people (lazy management that avoids the process of forming groups) OR mixing job functions and/or expectations when forming the group. The latter means not comparing similar levels of experience, or combining job functions in a way that makes it impossible to compare output.
• There is a budget for dollars/stock, and it is fixed, which means that no matter what, you have to create some distribution and enforce it. You can always just give everyone a gold star and randomly assign amounts of money, but that is always perceived as political. You can give everyone the same amount of money, but that is a different kind of politics. If you presume too much similarity in people, then the 20% doing all the work won't get meaningful rewards (see the sketch after this list).
• Part of having a finite budget is that you can't promote everyone all the time; said another way, you can't promote every individual through the entire system. There's not enough salary or bonus budget. I like what you say about a short-term performance "error". As with giving everyone a gold star, you can also just separate titles/levels/rank from compensation, and then everyone will be a VP, like in a bank, but people will find other ways to know who is senior.
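To make the fixed-pool arithmetic concrete, here is a minimal sketch; the log-normal "contribution" and all dollar figures are made-up assumptions, not anyone's real compensation data:

```python
# Toy illustration of splitting a fixed bonus pool. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
pool = 1_000_000                                             # fixed bonus budget, dollars
contribution = rng.lognormal(mean=0.0, sigma=1.0, size=100)  # hypothetical skewed "true" contribution

def allocate(pool, scores):
    """Split a fixed pool in proportion to scores."""
    return pool * scores / scores.sum()

flat = allocate(pool, np.ones_like(contribution))    # "everyone gets a gold star"
skewed = allocate(pool, contribution)                # pay tracks contribution

top = np.argsort(contribution)[-20:]                 # the 20% doing the most work
print(f"flat scheme:   top 20% receive {flat[top].sum() / pool:.0%} of the pool")
print(f"skewed scheme: top 20% receive {skewed[top].sum() / pool:.0%} of the pool")
# With a fixed pool, treating everyone as interchangeable necessarily caps what the
# heaviest contributors can be paid; differentiating them means someone else gets less.
```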
I was at GE from '94-'02. I imposed the Welch Rule on my team.
A few issues come to mind:
First, no one knows someone's marginal productivity unless the job is extraordinarily routinized. Measurement doesn't work if workers call on different customers, sell different products to different customers in different markets, or if the returns are the result of team efforts, and so on.
Second, competition for employees from competitors, not promotion, drives the greatest changes in individuals' compensation. But changing jobs is almost entirely driven by the ever-changing needs of business; e.g., AI engineers in the bottom 10% are doing pretty well right now.
Finally, the performance of a single employee can change enormously just by moving to a different job (and boss) in the same business.
I found your argument about Pareto distributions compelling, but I wonder if it oversimplifies the role of context in employee performance. For example, wouldn’t team dynamics, resources, and managerial support significantly skew individual outcomes, making it harder to fit performance into any single distribution? It feels like the Pareto assumption might still miss some of the nuance in how performance is actually shaped within organizations.
And here I was expecting a political post.
"Establishing governmentwide limits on rating levels will promote a high-performance culture." OPM Quote
https://chcoc.gov/sites/default/files/New%20Senior%20Executive%20Service%20Performance%20Appraisal%20System%20and%20Performance%20Plan%2C%20and%20Guidance%20on%20Next%20Steps%20for%20Agencies%20to%20Implement%20Restoring%20Accountability%20for%20Career%20Senior%20Executives%20FINAL.pdf
Ah, seems you're writing the substack I wish I could. I envied this and loved it all at the same time, look forward to reading more
Man, don’t cut into my performance bonus
Wonderful article, Tim! It is completely flawed to assume that employee performance can be modeled with a Gaussian distribution. On top of that, they use the top and bottom 5 or 10% for you know what.
I wrote an article on "Flawed logic of bottom 5% layoffs"
https://substack.com/home/post/p-156598569
You force-fit performance into a curve where it does not belong. I agree that when you work with machined, sheet-metal, or plastic parts, the dimensions will follow a Gaussian distribution; employee performance will not.
I don't think anyone who has managed larger teams or even a small/medium business thinks the value of employees to the business is normally distributed. Price's Law also has important relevance (50% of the work is done by the square root of the total number of people who participate in the work). Curious if you had additional thoughts on this point. I would think more time should be spent identifying and keeping Price+Pareto talent.
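Taking the Price's Law statement above at face value, the arithmetic alone is striking; a quick sketch (pure arithmetic, with no claim that any real team follows the law exactly):

```python
# Price's Law as stated above: ~50% of the work comes from sqrt(N) of the N participants.
# Pure arithmetic; no claim that any real team follows the law exactly.
import math

for n in (10, 100, 1_000, 10_000):
    k = math.isqrt(n)  # number of people producing half the work
    print(f"N = {n:>6}: {k:>4} people ({k / n:.1%} of the group) do ~50% of the work")
# The share of people carrying half the load shrinks as the group grows:
# about 30% at N=10, 10% at N=100, and only 1% at N=10,000.
```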
Very interesting article (linked to from Byrne Hobart's newsletter); I subscribed post haste and hope to see more posts from you in future. Incidentally, re log-normal vs. power-law, for the Patreon "creator economy" service the data show that both the (monthly) earnings per project and the number of patrons per project appear to follow a log-normal distribution. See for example https://rpubs.com/frankhecker/993611 and https://rpubs.com/frankhecker/994383 (Patreon is a more extreme example of "pay for performance" than any corporate environment, with a Gini coefficient of 0.84.)
Though this conflates outcomes with performance
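On the Gini figure two comments up: for a log-normal distribution the Gini coefficient has a standard closed form, erf(σ/2), which shows roughly how heavy the tail must be to reach 0.84. A minimal sketch (the Patreon numbers themselves come from the linked rpubs analyses, not from this code):

```python
# Gini coefficient of a log-normal distribution via the standard closed form
# Gini = erf(sigma / 2). This only maps the shape parameter to inequality;
# the Patreon figures cited above come from the linked analyses, not this code.
import math

def lognormal_gini(sigma):
    """Gini coefficient of a log-normal distribution with shape parameter sigma."""
    return math.erf(sigma / 2)

for sigma in (0.5, 1.0, 1.5, 2.0):
    print(f"sigma = {sigma:.1f}: Gini = {lognormal_gini(sigma):.2f}")
# sigma of roughly 2 gives Gini ~0.84, i.e. a log-normal with a wide enough spread
# is every bit as unequal as the "pay for performance" extreme described above.
```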
> On the other hand, looking at the Pareto percentile plot, the bottom 10% aren’t really all that different from the folks in the next 10%. As a matter of fact, there’s not an obvious place to draw a line to identify the “lowest end” employees to expunge. ~65% of employees are performing below the expectations that are associated with the salary midpoint (the green dashed line)!
> Summarizing:
> There is no intrinsic bottom 10% that needs to be expunged annually. Let managers identify any hiring errors if they think their team can do better, but don’t set a target for this number based on faulty statistical notions.
That's funny. The summary I would have put together from this is that managers should target cutting the bottom 65%.
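For what it's worth, the ~65% figure in the quoted passage is what any right-skewed distribution produces: the mean sits above the median, so a majority of people fall below it. A quick sketch with an illustrative log-normal (the parameters are invented, not the post's actual data):

```python
# Why "most employees are below the midpoint" can be a property of skew, not of the people.
# The log-normal parameters here are invented for illustration; they are not the post's data.
import numpy as np

rng = np.random.default_rng(42)
perf = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)  # hypothetical right-skewed performance

below_mean = (perf < perf.mean()).mean()
print(f"fraction below the mean: {below_mean:.0%}")      # roughly 60-65%, not 50%
# In a symmetric (Gaussian) world half would fall below the mean;
# skew alone pushes the majority under it.
```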