Measure Once, Cut Twice

Estimation Rules of Thumb

Posted in agile, software development by steve on May 31, 2010

Software project estimation is hard. In fact, it is so hard that estimating within the accuracy most people expect is actually impossible. To get as accurate as humanly possible, read McConnell’s software estimation book,[1] collect your own metrics, and then carefully and critically apply the principles. If you just need a quick order-of-magnitude check, here are some heuristics and techniques for a bottom-up approach based on estimating code size.

The basic numbers are:

  • 5-20 LOC per developer per hour
  • 2000 person hours per year
  • 50 LOC/class (Java), 100 LOC/class (C++)

This method uses objects as a proxy for size estimation.[2] You supply the number of objects in the target software and out pops the magic number. The two dominant variables tend to be the number of objects (obviously) and the LOC per developer per hour. The second can often be pulled from historical data. I tend to measure the start when developers are first engaged in serious coding, skipping the early requirements and visioning, and the end when the code is running, unit tested, and lightly functionally tested, i.e. DCUT (Design-Code-Unit Test) code. For some teams this is alpha, for others beta, and for others Running Tested Features. However you do it, try to find reasonably consistent points and make your historical measurements.

If you have no historical data, here is a rough continuum:

  1. 25+ LOC/person/hour — prototypes; small trivial projects
  2. 20 LOC/person/hour — small, 2-3 person team with fast micro-requirement turnaround (e.g. onsite customer, or more commonly, the developers are able to fill in many of the details of the requirements)
  3. 10 LOC/person/hour — regular agile team building a non-trivial app
  4. 5 LOC/person/hour — typical enterprise development pace
  5. 1-3 LOC/person/hour — stringent or archaic, unproductive environments (e.g. banking software); you’ll see this in some historical literature, but they are often taking into account the time beyond DCUT

Pick the rate that seems to fit your team size and environment. Don’t be too optimistic. How big is the team? Is it a prototype? Do you have to worry about localization, security, scalability? How familiar is the team with the languages, frameworks, and tools?

The 2000 person hours per year is just a shortcut to account for holidays, sick days, bathroom breaks, and other daily downtime: 52 weeks at 40 hours is 2080 hours, so 2000 leaves a little slack. These are also known as non-ideal programmer hours.

Now the hard part. How do you figure out the number of objects or lines of code in your future software? The easiest way is by analogy. Find a similar project that either you’ve done or someone else has done. There may be some open source projects that cover some of your project scope. If so, take a look at their code bases. 

Barring that, you’ll need to do some high level design in order to start figuring out how big your code will be. Knowing how many layers your architecture will have and which frameworks you’ll be using is important. More layers tend to add more code. Frameworks often provide design constraints that you can use to start to enumerate the scope of the code: count the number of services, commands, or functions. Database tables and screens are also good proxies for code size estimation. If you already have a database schema, how many objects will be needed to wrap it? Will there be a separation of data objects and domain objects?

Screens tend to map to template files, controllers, views, model proxies, etc. If you have both an existing database schema and requirements that map out screens, you should be in pretty good shape. If you have a pure codebase with no external anchors such as screens, database tables, web services to process, or transactions to fulfill, you may want to take a different approach.
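As a sketch of this proxy counting, here are a few lines of Python. The multipliers (objects per screen, objects per table) are illustrative assumptions, not numbers from this post; calibrate them against a codebase you know before trusting them.

```python
def estimate_object_count(screens: int, tables: int,
                          objects_per_screen: int = 4,  # assumed: template, controller, view, model proxy
                          objects_per_table: int = 2):  # assumed: data object + domain object
    """Rough object count from screen and table counts (proxies for code size)."""
    return screens * objects_per_screen + tables * objects_per_table

# e.g. 30 screens and 25 tables
print(estimate_object_count(30, 25))  # 170 objects
```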

Once you have estimated the number of objects, the rest is arithmetic: Objects × LOC per object gives total LOC; divide by LOC per person per hour to get total person hours; divide that by 2000 to get person years.
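The whole calculation can be written out in a few lines. The defaults below are the rules of thumb from this post (50 LOC/class for Java, 10 LOC/person/hour for a regular agile team); swap in your own historical numbers where you have them.

```python
def estimate_effort(objects: int,
                    loc_per_object: int = 50,        # Java rule of thumb (use ~100 for C++)
                    loc_per_person_hour: float = 10):  # regular agile team on a non-trivial app
    """Bottom-up effort estimate: objects -> LOC -> person hours -> person years."""
    loc = objects * loc_per_object
    person_hours = loc / loc_per_person_hour
    person_years = person_hours / 2000  # 2000 non-ideal person hours per year
    return loc, person_hours, person_years

# e.g. 200 Java classes
loc, hours, years = estimate_effort(200)
print(loc, hours, years)  # 10000 LOC, 1000.0 person hours, 0.5 person years
```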

Now take a look at the software estimation cone of uncertainty and realize your error bars are probably worse than +/-100%.[3] Still, it is better than nothing at this point. Ideally, you should use this technique along with a couple of others, such as a top-down work breakdown structure, gut checks with a few team members, and/or high level epic estimation via planning poker. Multiple techniques done independently (don’t taint each other!) are more powerful than one expert judgement.

Note this number does not take into account non-code related and other project related costs. Designing the database, setting up build machines, project management, and high level requirements definition should be estimated separately.

[1] Software Estimation, Steve McConnell.
[2] A Discipline for Software Engineering, Watts S. Humphrey.