A Day in the Life, Fancy Statistical Tools, Help!!!

I’m not really sure where I’m going with this post. Maybe it’s “a day in the life of an economist” combined with “anyone have any suggestions?” In any case, it gives me an opportunity to put some thoughts on paper, and hopefully you’ll find it interesting.

My day job is as a consultant. I do some strategy work for companies (not as much as I’d like; I have not been that great at marketing myself for these jobs, but if anyone is looking for an economist, drop me a line!), some forecasting (mostly for phone companies), and I build statistical models and/or statistical software (mostly for the military or NASA or the like). The latter projects come via a firm for which I am a freelance employee. I tend to get the oddball projects, and it’s really a lot of fun, because basically it means that every time, I have to invent something from scratch.

The problem is that every so often something comes along that kind of stumps me. Here’s an example… we have a project to estimate the costs of developing satellites that will not be ready for launch for at least 15 years. In other words, satellites for which some of the technology at least doesn’t exist and won’t exist for a while. I have descriptions of a couple of hundred (mostly US) satellites, and an analyst converted these descriptions into some basic information. This includes some things like mass, size, year of launch, design life, etc., but also some measures that are more specialized toward satellites, such as autonomy, method of spin control, size of solar array, etc.

Originally, we built a tool that was almost or just barely (depending on how you looked at it) artificially intelligent to rummage through the data and reach a conclusion. It took us about six months. Sadly, it didn’t do any better at forecasting 15 years out than I was able to do with some simple regressions. As I’m getting older and the jobs are getting more complicated, I’m finding that the complex methods really don’t always buy you all that much, at least when it comes to forecasting. (As an aside… this is one of the reasons I’ve always been suspicious of the unemployment figures. One of the components of those figures comes from an ARIMA model. ARIMAs are not particularly complex, but if you pick the right parameters, you can get just about any answer you want, from unemployment = -5 trillion to unemployment = +5 trillion. Throw on an administration that trusts political hacks more than professionals in any field, and who knows what they’re spitting out.)

And it’s not like we’re using the tools poorly. I’m well aware of GIGO. But we’re not blindly shoving variables into the maw of some piece of software. We’re building the software, we know how it works, and we are using it in the, um, appropriate way. But… at some point, intuition matters more. There’s more mileage from creating the right variables than from using the best tools.

Here’s an example… with a satellite, as a general rule, mass is highly correlated with cost. (The way in which it is correlated changes over time. Not only have the costs of getting a box into the sky changed, but the way those costs have changed, and the rate at which they change, have themselves changed a lot over time.) To some extent, so is the power consumption of the thing. But some satellites are much more (or less) costly than their mass would indicate. Communications satellites, for instance, are large. But there are sometimes reasons why other satellites might be very big or very small. Similarly, the power consumption may be higher (or lower) than one would expect given the cost of the satellite. But while the mass may be misleading, or the power consumption may be misleading, I’ve noticed that rarely are both misleading at the same time. A well-constructed model will take advantage of that fact. Instead of simply being related to mass, cost will be related to mass subject to some information about power.
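To make the mass-and-power idea concrete, here is a minimal sketch of one way it could look. It is not the model from the project. It assumes a log-linear regression, and every number in it is synthetic and made up for illustration (the coefficients 0.6 and 0.3, the noise levels, the sample size). The helper name fit_ols is hypothetical. It simply compares a mass-only fit with a fit that also sees power.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up data for illustration only: NOT real satellite figures.
n = 200
log_mass = rng.normal(6.0, 1.0, n)
# Power tends to move with mass, but with its own independent quirks.
log_power = 0.8 * log_mass + rng.normal(0.0, 0.5, n)
# Assumed "true" relationship: cost depends on both mass and power.
log_cost = 1.0 + 0.6 * log_mass + 0.3 * log_power + rng.normal(0.0, 0.2, n)


def fit_ols(columns, y):
    """Ordinary least squares with an intercept; returns (coefficients, in-sample RMSE)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(np.sqrt(np.mean(resid ** 2)))


# Model 1: cost explained by mass alone.
coef_mass, rmse_mass = fit_ols([log_mass], log_cost)
# Model 2: cost explained by mass, given some information about power.
coef_both, rmse_both = fit_ols([log_mass, log_power], log_cost)

print("mass only      RMSE:", round(rmse_mass, 3))
print("mass and power RMSE:", round(rmse_both, 3))
```

On this invented data the second fit has the smaller in-sample error, because power picks up the cases where mass alone is misleading. Whether that carries over to real satellites 15 years out is exactly the open question.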

Anyway, I’ve been beating my brains against the wall, but I’m just not producing models I’m happy with for this project. (Out-year forecasts with median and average absolute percentage errors of about 50%.) Not bad considering what we have to work with, but I’d like to do better. I have a few more ideas, but not many. Comments, suggestions, contributions and donations are all welcome.
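For anyone who wants to be precise about the error measure mentioned above, this is a small sketch of how median and average absolute percentage errors are usually computed. The actual and forecast numbers are hypothetical placeholders, not project data, and the function name absolute_percentage_errors is my own.

```python
import statistics


def absolute_percentage_errors(actual, forecast):
    """Return |forecast - actual| / |actual| as a percentage for each pair."""
    return [abs(f - a) / abs(a) * 100.0 for a, f in zip(actual, forecast)]


# Hypothetical costs (any consistent unit); not real satellite data.
actual = [100.0, 200.0, 400.0, 50.0]
forecast = [150.0, 180.0, 200.0, 60.0]

apes = absolute_percentage_errors(actual, forecast)
mean_ape = statistics.mean(apes)      # average absolute percentage error
median_ape = statistics.median(apes)  # median absolute percentage error

print("average APE:", mean_ape)
print("median APE:", median_ape)
```

Reporting both is useful because the average gets dragged around by a few badly missed satellites, while the median does not.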