Un-breaking the 9-box

In the 1970s, a practice known as the “9-box” was developed as a simple way to identify different levels of talent within an organization. By the late 1980s and early 1990s, it had become de rigueur in succession planning, and it persists even 40 years later (that’s right, it is a nearly half-century-old paradigm). The practice of 9-boxing uses the two spectrums of 1) performance and 2) potential to create a two-axis chart. Each axis is then segmented into thirds representing low, mid, and high on its spectrum. The resulting visualization is 9 “boxes” into which people can be categorized based on their respective performance and potential. The idea is that you promote the high performers with high potential and direct development resources appropriately to the rest. On the surface, it certainly seems clever.
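To make the mechanics concrete, here is a minimal sketch of that categorization in code. The normalized 0-to-1 scores and the box labels are illustrative assumptions on my part; only the one-third cut points come from the classic 9-box convention:

```python
def nine_box(performance: float, potential: float) -> str:
    """Place a person into one of the 9 boxes.

    Assumes both inputs are already normalized to a 0-1 scale;
    the one-third cut points are the classic 9-box convention.
    """
    def band(score: float) -> str:
        if score < 1 / 3:
            return "low"
        if score < 2 / 3:
            return "mid"
        return "high"

    return f"{band(performance)} performance / {band(potential)} potential"


print(nine_box(0.9, 0.8))  # -> "high performance / high potential" (promote)
print(nine_box(0.9, 0.2))  # -> "high performance / low potential" (develop in place)
```

Note that the entire output hinges on the two input scores. Everything that follows is about whether those inputs can be trusted.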

Thirty years later, I’ll ask the Tyler Durden (Chuck Palahniuk’s “Fight Club” vehicle of anarchy and disruption) question: “How’s that working out for you? Being clever?” If you feel you have excellent leadership at all levels, low micromanagement, high engagement, and high trust…then I’d agree with Tyler: “Keep it up, then.” But according to research from the Corporate Executive Board (CEB), 66% of companies invest in programs that aim to identify high-potential employees and help them advance, yet only 24% of senior executives at those firms consider the programs a success. A mere 13% have confidence in the rising leaders at their firms, down from an already-low 17% just three years earlier. If only 13% of execs have confidence in their leadership talent, perhaps it is time to question how clever the 9-box method really is.

Good Method, Bad Data

The madness with this method lies not in the method itself but in the source of the data used. For instance, most individuals, managers, and even HR pros agree…performance reviews do not provide an accurate depiction of someone’s actual performance. In one survey, 45% of HR pros said their performance management reviews were not indicative of actual performance (SHRM/Globoforce, “Employee Recognition Survey,” 2012). If we extrapolate that, we can assign an approximate 50% accuracy rating to the entire performance spectrum within the 9-box. Even if we assumed the measurement of potential was 100% accurate…multiplying the accuracy of the two axes (as placing someone in a specific box requires both judgments to be right) gives us an overall accuracy rating of 50%. So in looking at just ONE axis, we are already 50/50 on how effective the 9-box actually is at identifying top talent. The next axis is even less accurate.
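Spelling that arithmetic out, assuming (as the matrix framing implies) that the two axis judgments are made independently:

```latex
P(\text{correct box}) = P(\text{performance correct}) \times P(\text{potential correct})
                      = 0.5 \times 1.0 = 0.5
```

And if the potential axis turns out to be only ~50% accurate itself, the product falls to 0.5 × 0.5 = 0.25, meaning three out of four people could land in the wrong box.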

How do you judge potential? From a physics perspective, “potential energy” is stored energy that can be objectively measured. But the measurement of potential energy in physics relies on a few known, measurable quantities (gravitational force, mass, slope, density, elasticity, etc.) and usually relates to only one vector (movement in a single direction or for a single purpose). People are anything but constant. So when evaluating the “potential” of people, especially as it relates to leading in dynamic environments with variable teams…well, I think it is safe to say the accuracy is probably less than 50%. In fact, when you look for scholarly articles on assessing leadership potential, first you will be struck by how little research there actually is; and second, when you do find it, the results are far from confidence-inspiring. The large majority of research indicates that assessments of leadership potential are useful only within the construct of a known environment with a known purpose. In other words, you can only reasonably predict how someone will act in a specific situation towards a specific outcome. For all other situations and outcomes, you might as well throw darts at a dartboard…with a blindfold…upside down. Everyone will show up differently than your initial assessment might predict.
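For reference, the formulas the physics analogy leans on are built entirely from measurable quantities, for example:

```latex
U_{\text{gravity}} = mgh \qquad\qquad U_{\text{spring}} = \tfrac{1}{2}kx^2
```

Every symbol there (mass m, gravitational acceleration g, height h, spring constant k, displacement x) can be measured directly. There is no equivalent set of constants for a person leading a variable team in a changing environment.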

To sum it up, a study of high-potential cohorts by Zenger/Folkman indicated that more than 40% of individuals in high-potential development programs…didn’t belong there. It’s relatively clear: the current 9-box method does not work. At least not with the data we are using.

Broken Data = Broken Outcome (aka “Garbage In, Garbage Out”)

Maybe it is the data nerd in me…or maybe it is the philosopher in me…but I continue to chase down the where and why when things don’t add up. (In another life I might have enjoyed a career as an investigative reporter or forensic scientist.) In this case, I wanted to understand where we are getting the bad data and why it is bad. The “where” is relatively easy for most organizations. Performance reviews are the gold (albeit “fool’s gold”) standard, and for the vast majority, the final assessment is made by the person’s direct manager. So the source of bad performance data is the manager. But WHY? That answer gets a little more complex, but suffice it to say, it’s not necessarily the managers’ fault.

In the 70s and early 80s (near the genesis of the 9-box…and performance reviews, for that matter), management by “walking around” was an encouraged practice. The balance between individual task management and team management skewed favorably toward team management. This greater team focus often gave managers a more visible line to how someone was performing in their role. Over the past few decades, the rise of technology, combined with economic downturns and the (misunderstood) practice of “LEAN” operation, has led to a shift in managers’ task/team responsibilities. Today, most managers report that 70% of their job is individual task execution and only 30% is team management. With that shift and the increase in remote working (especially in the age of COVID-19), managers simply do not have the visibility they once had. In addition, fewer than 15% of all managers report ever receiving guidance on accurately evaluating performance. So if the current practice of performance evaluation assumes that managers have accurate visibility into their teams and reasonable knowledge of how to evaluate talent…we have a process based on bad assumptions.

As for evaluating leadership potential, this too most often comes from the manager, making it subject to the same misguided assumptions. Additionally, evaluating leadership ability from a superior position is inherently flawed anyway. It is impossible for someone who has never been led by an individual to appropriately assess that individual’s leadership behavior. Just as you would be unable to evaluate the cooking ability of someone whose food you’ve never eaten, you are unable to evaluate the leadership ability of someone you have never been led by. So along with the inability to accurately judge potential in a dynamic environment, evaluating potential is subject to the same misguided assumptions that make performance reviews inaccurate, PLUS a problem of perspective.

In short, we are looking in the wrong place (managers) when it comes to evaluating performance AND leadership.

Better Data = Better Outcome (aka “Better Ingredients = Better Food”)

If I asked you to pick the option you thought would have the more successful outcome, which of the following two would you choose:

  1. A platform for reference information that is well funded by one of the largest software companies in the world, incorporates articles from some of the world’s most successful encyclopedias, hires the brightest subject matter experts around the globe to contribute, and is sold with annual updates as required.

    or

  2. A similar platform for reference information that is created by volunteers, can be edited by anyone, contains socially reviewed content, and costs nothing for people to use or access.

Most people would pick Option 1. After all, how could something funded by the largest software company of the time (Microsoft) lose to something created by volunteers, edited by anyone, and free to this day? Well, I am sure the creators of Encarta (Option 1) continue to be puzzled by the success of Wikipedia (Option 2). The irony of technology is that while it enables experts to reach more and more people much faster…social advancement and knowledge expansion are happening faster than experts can react. Essentially, while technology allows experts to get their message out faster, it is at the same time diluting their individual credibility. In other words, there are more experts out there than we ever realized.

Technology has opened up a whole new world of information sharing, growth, and access. This can be seen in things like Amazon Reviews, Google Reviews, Yelp, TripAdvisor, and a host of other socially aggregated evaluation sites. They have all supplanted the “expert model” of cultivated encyclopedias, restaurant/hotel reviews, and consumer goods evaluation with a “community intelligence model.” This is not to diminish the contributions of experts; rather, it acknowledges the limited exposure a small group has in comparison to an exponentially larger group. Simply put, more perspectives with direct experience = better information.
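The statistical intuition behind that last claim is worth a quick sketch: if each rater is a noisy but unbiased read on the truth, averaging n of them shrinks the error roughly as 1/√n. A minimal simulation, with all numbers purely illustrative:

```python
import random

def mean_rating_error(true_quality: float, n_raters: int, noise: float = 1.5) -> float:
    """Error of the average of n noisy but unbiased ratings."""
    ratings = [true_quality + random.gauss(0, noise) for _ in range(n_raters)]
    return abs(sum(ratings) / n_raters - true_quality)

random.seed(42)
for n in (1, 10, 100, 1000):
    avg_error = sum(mean_rating_error(4.0, n) for _ in range(500)) / 500
    print(f"{n:>5} raters -> average error {avg_error:.3f}")

# Error shrinks roughly as 1/sqrt(n): a large crowd of noisy raters
# can collectively out-resolve a single expert with the same noise.
```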

Social Data, Competency, and Trust

Performance in the job is important, but in a world where self-promotion seems to reign supreme and little effort is made to verify it, you can be easily led astray by charming sycophants. Anyone can claim responsibility for an outcome. I knew a guy in grad school who always showed up on the last day of a project to present “his” team’s work. From the professor’s perspective, he was skilled and leading a successful team. From the team’s perspective, he was skilled at timing and taking credit. This is where social data comes into play. Would teammates go to that person for expertise? Certainly not. He didn’t do anything other than take credit for other people’s work. Social data from the people who work directly with someone will likely give you a better picture of their performance. The team knows who they go to, who they trust, and who they look up to. You only know who turns in the project. If you are not collecting social data on people’s performance, you are missing a HUGE component of leadership effectiveness.
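What might collecting that social data look like? Here is a minimal sketch, assuming a simple peer-nomination question; the names, responses, and the question itself are hypothetical:

```python
from collections import Counter

# Hypothetical answers to "Who do you go to when you need expertise?"
# Each person may nominate several colleagues; self-nominations are dropped.
responses = {
    "alice": ["carol", "dan"],
    "bob":   ["carol"],
    "carol": ["dan"],
    "dan":   ["carol"],
    "erin":  ["carol", "dan"],
}

nominations = Counter(
    nominee
    for respondent, nominees in responses.items()
    for nominee in nominees
    if nominee != respondent
)

print(nominations.most_common())
# [('carol', 4), ('dan', 3)] -- note the person who "turns in the
# project" may not appear in anyone's answers at all.
```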

Task completion does not imply mastery. Take the Socratic method as evidence: the story of Socrates walking an untutored servant through proving the Pythagorean theorem shows how a master can tease an answer out of anyone. Likewise, if directions are written clearly, anyone can follow them to successful completion. That doesn’t provide any evidence that the person completing the task understands the principles behind it; it just proves they can follow directions. Typical performance management practices focus on task completion, not levels of competency. Competency implies a thorough understanding of the various principles within a given body of knowledge, the ability to analyze information in light of those principles, synthesize disparate information, and creatively adapt novel solutions to generate an outcome. Competency is a measure of how effective someone is when no directions are available. If you are measuring only completion and not competency, you are likely giving good “doers” an advantage over “creative problem solvers.”

Lastly, a leader is often defined as “someone others would follow if given the choice not to.” In most business scenarios, employees are not given a choice of whom to follow. But trust me, there are people they would choose to follow, and, more importantly, many they would not follow out of a burning building. Trust is the foundation of psychological safety, which, as Google’s Project Aristotle found, is the key differentiator between average teams and high-performing teams. Leader trust is the cornerstone of effective teams. When teams do not trust their leader, they stay inwardly focused, always concerning themselves with self-protection: not giving information away, not doing more than necessary, and not speaking up with new ideas or areas of concern. So if you are not measuring trust in your people, you are likely missing some of your best leaders.

An improved way to identify leaders

Using social data (including manager input), competencies, and trust, you can build a MUCH more accurate model for identifying leaders. Great leaders are people whom others trust and respect, and who align with the organizational culture. Traditional performance management puts task performance as the most crucial element, and while that may hold true for getting individual tasks done, it can be the opposite as it relates to leadership performance. Leadership trust is potentially the #1 predictor of new-manager success; therefore, we suggest a new grid, based on social data, that ranks the following areas in order of how well they predict leadership success:

  1. Trust

  2. Core Competency aggregate (Organizational Citizenship/Value Alignment)

  3. Functional Competency aggregate (task completion)

An example of a highly recommended leader for either a function or a team would score high in all three. A potential leader for a functional leadership role might score moderately on trust but high on core values and functional competency. Someone who is skilled in functional competencies and organizational values may justify a Subject Matter Expert designation without a leadership role.
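To make the suggested grid concrete, here is a minimal sketch of how it might be operationalized. The thresholds, field names, and labels below are illustrative assumptions, not validated cut-offs:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    trust: float       # aggregated peer trust score, 0-1
    core: float        # core competency aggregate (citizenship / value alignment), 0-1
    functional: float  # functional competency aggregate (task completion), 0-1

def recommend(c: Candidate, high: float = 0.75, moderate: float = 0.5) -> str:
    """Apply the trust-first grid sketched above (thresholds are illustrative)."""
    if c.trust >= high and c.core >= high and c.functional >= high:
        return "highly recommended leader (function or team)"
    if c.trust >= moderate and c.core >= high and c.functional >= high:
        return "potential functional leader"
    if c.core >= high and c.functional >= high:
        return "subject matter expert, no leadership role"
    return "develop further before considering for leadership"

print(recommend(Candidate("pat", trust=0.9, core=0.8, functional=0.85)))
# -> highly recommended leader (function or team)
```

In practice, the trust and competency inputs would come from the aggregated social data described above, not from a single manager’s rating.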

Fix the filter before you treat

By selecting leaders who are skilled with people over those who are skilled at the job, you reduce the amount of development needed to improve their leadership effectiveness. Often companies will promote a person who is skilled at things but unskilled with people and spend copious amounts of money and time trying to fix their people skills. Since people skills are foundational to leader effectiveness, most companies have their identification process backwards. They continue to promote potentially toxic functional experts at the cost of dedicated and steady team members. If we get better at filtering in those with people skills and filtering out those who lack such ability, the amount of development for leader effectiveness will naturally decrease, retention will increase, and engagement will benefit.

Dave Needham