By Mauro Rebelo

Is it P, M or G? Improving the use of the Fibonacci scale in measuring the effort of Sprint tasks


For the last three years, we have been using agile methodology to develop biotechnology projects. One of the main challenges is assigning a value to the effort required to carry out a task.

We had learned from Jeff Sutherland, in ‘Scrum: The Art of Doing Twice the Work in Half the Time’, that our brains are very bad at assigning an absolute value to a task, such as a number of hours (which was entirely in line with our own previous experience), and work much better with comparative scales. In his words:

“So you have a list of things that need to be done, and you’ve prioritized each item. Now the job is to figure out how much effort, time and money will be needed for the project. As I’ve said before, we human beings are absolutely terrible at this, but apparently we’re good at relative size classification – comparing one thing in relation to another. Consider, for example, estimating the difference between small, medium and large T-shirts.”

He moves on to some ways of assigning values comparatively, until he arrives at the Fibonacci scale. In the book, he gives some extra reasons why we should use this scale.

“But perhaps you’ve noticed a pattern in the numbers I’ve defined: 1, 2, 3, 5, 8, 13. Each number in the series is the sum of the two previous numbers. It’s called the Fibonacci sequence, and there’s a reason we use it. It’s everywhere.”

“Humans are programmed to find proportions attractive. For our purposes, all that is important to know is that our species has a deep understanding of the proportions of the Fibonacci sequence. We know it intuitively.”

“The numbers in the Fibonacci sequence are far enough apart that we can easily feel the difference. It’s easy for someone to lean one way or the other. If one person estimates something as five and another as eight, we can see the difference intuitively. But the difference between five and six? It’s quite subtle, more subtle than our brains can register.”

“In medicine, it’s well known that for a patient to report an improvement in a symptom, the difference has to be greater than 65%. Our minds don’t work in steady increments. We are better at perceiving jumps from one state to another, and not even jumps, but uneven ones.”
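To make the “five versus six” point concrete, here is a minimal Python sketch that builds a Fibonacci planning deck and compares the size of the steps between cards. It assumes a deck that starts 1, 2, 3 (the repeated 1 of the mathematical sequence is dropped) and stops at 89; the function names are hypothetical, chosen only for illustration.

```python
def fibonacci_deck(max_value: int) -> list[int]:
    """Distinct Fibonacci numbers from 1 up to max_value: 1, 2, 3, 5, 8, ..."""
    deck = []
    a, b = 1, 2  # start at 1, 2 so the sequence's repeated 1 appears only once
    while a <= max_value:
        deck.append(a)
        a, b = b, a + b  # each card is the sum of the two before it
    return deck


def relative_step(smaller: int, larger: int) -> float:
    """Percentage increase when an estimate moves from `smaller` to `larger`."""
    return (larger - smaller) / smaller * 100


deck = fibonacci_deck(89)
print(deck)  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(f"5 -> 6: +{relative_step(5, 6):.0f}%")  # +20%
print(f"5 -> 8: +{relative_step(5, 8):.0f}%")  # +60%

# Step size between every pair of neighbouring cards
for smaller, larger in zip(deck, deck[1:]):
    print(f"{smaller} -> {larger}: +{relative_step(smaller, larger):.0f}%")
```

Every step between neighbouring cards is at least 50%, and from 5 onward it settles near 62%, because the ratio of consecutive Fibonacci numbers approaches the golden ratio (about 1.618). A 20% step such as five to six simply does not exist on the deck.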

He ends with the conclusion that Fibonacci is the best scale.

“Using the Fibonacci sequence to calculate the size of a task allows us to make estimates that don’t have to be 100% accurate. Nothing will be exactly a five, an eight or a 13, but by using these numbers we have ways of gathering opinions about the size of a task in which everyone is using more or less the same unit of measurement, and in this way we can form a consensus.

Estimating as a group in this way gives a much more accurate forecast than if we had to do it alone.”

So we went straight to using the Fibonacci scale as our effort metric.

As Jeff anticipates in the book, one of the great challenges of using the scale is making sure everyone is using the same reference. He suggests solving this with “Planning Poker”.

“The idea is simple. Each person has a deck of cards with those super interesting Fibonacci numbers – 1, 2, 3, 5, 8, 13 and so on. Each item that needs to be estimated is brought to the table. Everyone picks the card they believe represents the effort needed to complete it and places it on the table face down. Then everyone turns over their cards at the same time.”

“If everyone is within one card of each other (say, a 5, two 8s and a 13), the team just adds up the cards, takes the average (in this case, 8.5) and moves on to the next item. Remember that we are talking about estimates, not ironclad schedules, and estimates of small parts of the project.”

“If the numbers on the cards differ by more than three, the people with the highest and lowest cards explain why they believe their number is the right one. Then we do another round for the same task. Otherwise, they just average the estimates, which will come closest to the numbers those RAND Corporation statisticians came up with.”

“This incredibly simple method is a way of avoiding any kind of anchoring behavior, such as herd or halo effects, and allows the whole team to share knowledge about a specific task.”

“However, it is crucial that the estimates are made by the team that is actually going to do the work.”
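The round described in these quotes can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: the deck is 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, and “a difference of more than three” is read as more than three card positions apart, matching the “one card away” wording. `play_round` and `RoundResult` are hypothetical names, not part of any Scrum tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Assumed deck; the quotes only say "1, 2, 3, 5, 8, 13 and so on"
DECK = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


@dataclass
class RoundResult:
    consensus: bool
    estimate: Optional[float]  # average of the cards, only when consensus is reached
    outliers: tuple[int, int]  # (lowest, highest) card: these voters explain their choice


def play_round(cards: list[int], max_spread: int = 3) -> RoundResult:
    """Evaluate one round of Planning Poker for a single item."""
    positions = [DECK.index(card) for card in cards]  # ValueError for a card not in the deck
    spread = max(positions) - min(positions)  # distance in card positions, not in points
    outliers = (min(cards), max(cards))
    if spread > max_spread:
        # Too far apart: highest and lowest discuss, then the team votes again
        return RoundResult(consensus=False, estimate=None, outliers=outliers)
    return RoundResult(consensus=True, estimate=mean(cards), outliers=outliers)


print(play_round([5, 8, 8, 13]))  # consensus, estimate 8.5
print(play_round([2, 5, 8, 21]))  # no consensus: the 2 and the 21 talk it over
```

For the book’s example, (5 + 8 + 8 + 13) / 4 = 8.5. Measuring the spread in card positions rather than raw points keeps the rule equally strict at both ends of the deck.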

Planning Poker helps create an intuitive consensus, but as soon as we stray a little from people’s area of expertise, the spread between Fibonacci estimates becomes enormous. This began to turn from a challenge into an impediment to using the scale to estimate effort.

Could effort have an absolute metric? Could we better align the perception of effort between people?

I thought that a coarser metric might help with this alignment: a metric with fewer options, and one that people were more familiar with. After all, how can we measure effort by comparison if we’re not clear about what we are comparing against? The first metric Jeff mentions in the book, T-shirt sizes P, M or G (from the Portuguese pequeno, médio, grande: small, medium, large), meets these criteria.

In addition, this scale has the advantage of familiarity: everyone buys T-shirts, knows their own size and can easily estimate a friend’s size (to buy a birthday present, for example). This metric has another advantage: at least 90% of observations fit it. Yes, there are PP, GG and even GGG sizes, but if we assume that task sizes, like people’s shirt sizes, follow a normal distribution, then these sizes lie more than 2 standard deviations from the mean, a region that holds only about 5% of observations.
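The 2-standard-deviation argument can be checked with a short Python sketch, assuming (as the paragraph does) that task sizes are normally distributed. It uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2); the helper name `fraction_within` is hypothetical.

```python
from math import erf, sqrt


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


print(f"within 1 SD:     {fraction_within(1):.1%}")      # 68.3%
print(f"within 2 SD:     {fraction_within(2):.1%}")      # 95.4%
print(f"within 1.645 SD: {fraction_within(1.645):.1%}")  # 90.0%
```

Under that assumption, P, M and G together cover about 95% of tasks, so a figure of 90% is conservative; exactly 90% would correspond to a cut-off of about 1.645 standard deviations.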

Can we associate the Fibonacci numbers with the PMG scale? I decided to do this exercise.

The easy tasks, P tasks, are Fibonacci 2 to 5. This makes Fibonacci 1 tasks very easy ones, outliers on the curve, or PP. Tasks of 8 and 13 are medium (M) tasks. Tasks of 21 are difficult (G) tasks. Tasks of 34 are very difficult (GG), and tasks of 55 are very, very difficult (GGG). Even though we can imagine tasks of 89 or higher on the Fibonacci scale, they can most likely be broken down into smaller tasks that fit one of the other categories.
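The mapping in this paragraph fits in a small lookup table. The Python sketch below transcribes it and then tallies a to-do list by size, which is how a frequency graph like the one described in this section could be built. The dictionary, the function `to_pmg` and the list of estimates are hypothetical, for illustration only; they are not the author’s data.

```python
from collections import Counter

# Transcription of the Fibonacci -> PMG mapping described in the text
PMG_BY_FIBONACCI = {
    1: "PP",
    2: "P", 3: "P", 5: "P",
    8: "M", 13: "M",
    21: "G",
    34: "GG",
    55: "GGG",
}


def to_pmg(points: int) -> str:
    """Translate a Fibonacci estimate into a PMG size."""
    if points in PMG_BY_FIBONACCI:
        return PMG_BY_FIBONACCI[points]
    if points >= 89:
        # Following the text: 89 or more should be split into smaller tasks
        raise ValueError(f"{points} points: break the task into smaller ones")
    raise ValueError(f"{points} is not a Fibonacci card between 1 and 55")


# Hypothetical estimates for one to-do list, counted per size
estimates = [2, 2, 3, 1, 5, 8, 2, 13, 21]
print(Counter(to_pmg(p) for p in estimates))
```

Raising an error for 89 and above, instead of inventing a “GGGG” size, encodes the author’s advice that such tasks should be broken down.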

The result would look like the graph below, with the frequency of tasks on the Y axis and the estimated effort, on both the Fibonacci and PMG scales, on the X axis. The curve is asymmetrical, with a higher concentration of low-effort tasks and a long tail of high-effort tasks.

[Figure: frequency of tasks (Y axis) versus estimated effort on the Fibonacci and PMG scales (X axis)]

This graph is very representative of my list of tasks and I think it can help others to better allocate effort to theirs. But I need to make three final points:

The first is that I developed an absolute time metric for the easy and very easy tasks which, by comparison, helped me to estimate the medium and difficult ones. Most of my daily activities are Fibonacci 2 tasks. They involve producing content: replying to an e-mail, revising a proposal, analyzing a result. They take about an hour, and usually all the materials I need are at hand. Tasks that require me to travel, or to first look for information in a place I don’t yet know (or information I don’t even know exists), and that can therefore take 3 to 24 hours, are Fibonacci 3 tasks. A blog post like this one, which requires several days of research and writing, can become a Fibonacci 5 task. At the other end, there are very simple tasks, such as a call or an e-mail to clear up a specific question or align expectations. They last about 15 minutes and would be Fibonacci 1. I hardly ever count these activities. Maybe I should, but my feeling is that the productivity gain from recording and tracking them would be negligible. Even smaller tasks, such as looking up a date in the calendar, are definitely not worth the effort of counting.
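The time anchors in this paragraph (about 15 minutes for 1, about an hour for 2, 3 to 24 hours for 3, several days for 5) can be written as a simple lookup. This Python sketch is only an approximation: the cut-offs between the anchors (0.25 h, 3 h, 24 h) are assumptions, the function name is hypothetical, and it captures only clock time, not the nature of the task (travel, searching for unknown information) that the author also uses to classify tasks.

```python
def points_for_hours(hours: float) -> int:
    """Map a clock-time estimate to a Fibonacci point value.

    Anchors come from the text; the boundaries between them are assumptions.
    """
    if hours <= 0:
        raise ValueError("duration must be positive")
    if hours <= 0.25:  # up to ~15 min: a call or a quick e-mail
        return 1
    if hours < 3:      # around an hour of content production
        return 2
    if hours <= 24:    # 3-24 h: tasks that need travel or a search first
        return 3
    return 5           # more than a day: several days of research and writing


for h in (0.25, 1, 8, 40):
    print(f"{h:>5} h -> Fibonacci {points_for_hours(h)}")
```

The function deliberately stops at 5: as the author describes, medium and difficult tasks (8 and above) are estimated by comparison with these calibrated easy tasks, not by clock time.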

Secondly, and very importantly, there may be overlap between the high and low Fibonacci tasks. Some low Fibonacci activities on your to-do list may actually be sub-tasks of a high Fibonacci task. So the total number of Fibonacci points on your to-do list at any given time is more likely to be overestimated than underestimated.
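A minimal Python sketch of this double counting, assuming each item can record the larger task it belongs to and that the parent’s estimate already includes its sub-tasks. The `Task` class, its `parent` field and the example tasks are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    points: int
    parent: Optional[str] = None  # hypothetical: name of the larger task this one belongs to


def naive_total(tasks: list[Task]) -> int:
    """Sum every item on the list, sub-tasks included."""
    return sum(task.points for task in tasks)


def deduplicated_total(tasks: list[Task]) -> int:
    """Skip sub-tasks whose parent is also on the list (its estimate is assumed to cover them)."""
    names_on_list = {task.name for task in tasks}
    return sum(task.points for task in tasks if task.parent not in names_on_list)


todo = [
    Task("write grant proposal", 21),
    Task("draft the budget", 3, parent="write grant proposal"),
    Task("e-mail co-authors", 2, parent="write grant proposal"),
    Task("reply to reviewer", 2),
]
print(naive_total(todo))         # 28
print(deduplicated_total(todo))  # 23
```

The naive sum overstates the load by the points of the nested sub-tasks, which is why a raw total tends to err on the high side.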

The third point, no less important, is the impact of this effort metric on the pricing of your time or service. Price is not always a direct measure of effort, because knowledge tends to reduce the effort required to perform a task while increasing its opportunity cost. That’s why 5 minutes spent on a task that is quick for you, not because it is easy (and could be done by anyone else), but because it is very difficult and you have accumulated the knowledge and experience to do it quickly, cost much more than 5 minutes spent on an easy task. But let’s leave that discussion for another article.