Visualizing the CRU E-Mails

November 29, 2009

Very cool visualization of the Climategate e-mails over here. For more information see the Computational Legal Studies blog post.

Additionally, they have hub and authority scores for the authors of the e-mails. I like.

Thanks to Pankaj Gupta and Drew Conway for pointing me to this.

ClimateGate: Free The Data

November 25, 2009

I wanted to get this out because I’m quickly becoming consumed with other things. But I’ve been following the ClimateGate scandal for coming up on a week now. And every time I turn around it looks worse for anthropogenic global warming.

For those of you who don’t know what I’m talking about, here’s a quick summary:

Someone stole (or possibly leaked) a ton of files and e-mails from the Climate Research Unit

My position on climate change has heretofore been: “I’m not a climate scientist, but there seems to be a pretty significant agreement among those who are that the main points of climate change are solid. The earth is warming and humans are causing it to some degree. The extent to which humans are causing it (do we account for 90% of the change? 50%? 30%?) and what to do about it seems to still be a matter of debate.”

I’ve read a number of the journal articles on the matter just because I’m interested enough in what is going on and my inclination is to get as close to the data as I can.

Because that’s my thing. Data.

Everything about data is vital to the scientific process. How we collect it, how we analyze it, how we compare different sets… these things are desperately important to good scientific work. When data gets too big, we use statistical analysis to understand it and models to predict what will happen next.

Most importantly, for science to work we need people to check our work. The next scientist down the line should be able to work his way to the same conclusion before relying on it to move toward the next one. Verification is the heart and soul of the scientific process.

And the process is more important than the result. If you don’t believe me, go read up on Fermat’s last theorem. Pierre de Fermat made a conjecture in 1637 that turned out to be true, but mathematicians couldn’t prove it for over 300 years. That the conjecture was true is important, but how we know it is true is the key part.

That is why I am so pissed off at the scientists at CRU. If you read their e-mails (a good collection of excerpts has been assembled by Bishop Hill), they spend a ton of energy making sure other people can’t independently verify their data. They attack people who disagree with them, not because those people have bad data or use poor process, but because the results are not consistent with the message the CRU scientists are trying to propagate.

Add to that the fact that the CRU e-mails reveal an almost violent disregard for proper scientific peer review in favor of bullying journals into accepting only appropriate papers. And they make no bones about it: Appropriate is defined in relation to the desired result. If the result is different from what they want to hear, they work tirelessly to politically punish the people who found it.

And we haven’t even started talking about the code.

I have a solution to this, one that I believe is non-partisan and vital to future work:

  • If a paper is going to be referenced in an IPCC report, the authors need to post all the data, an explanation of the process, and the code for the paper where anyone can look at it and verify it.
  • Any grants that are offered with federal money should require public access to the data, the process and the modeling code. If “the people” bought the research, we should be able to look at it, not just at some 10-page summary report.
  • Any paper used for public policy purposes should hold the same requirement.

In short, this is a call to free the data. We can’t make decisions in the dark. If these guys have done good science, anyone with an appropriate expertise will be able to verify it.

Is this unfair to climate scientists? A violation of intellectual property?

Forgive me if I don’t give a sh**. These guys have crapped all over the scientific method and made a mockery of objective science. This kind of bad PR will take years, possibly decades, to overcome. If they want to keep their data to themselves, they can get a private firm to support their research and stop using their findings to push public policy.

Take note: This does not mean that the conclusions the CRU scientists have come to are wrong. They could be 100% right and still be huge assholes who want to hide their data from everyone else. But we have no reason to believe that they are 100% right because we can’t see the data and we don’t know their process. Just because you cheer the deaths of your opponents doesn’t make you wrong. In the future it’s going to take more to convince me than “But the scientists SAID SO!”

Also, given the blatant and horrific way in which these people have manipulated the peer review process, the “But the skeptics aren’t published in peer reviewed journals” argument is a pretty sh***y line of attack from here on out. Just from reading the e-mails, we can see that:

  1. That isn’t even remotely true
  2. Manipulation of the peer review process has been a top priority for these scientists, to the point of intentionally ruining careers and lives.

From here on out, they can have my confidence in their results when I see their data.

This is pretty funny. Or horrifying. Depends on how you want to look at it.

Several days ago, I noted on Twitter that a lot of the “saved” jobs weren’t saved at all but were actually cost-of-living increases. About 24 hours after I noted this, the Associated Press ran an article about that very phenomenon.

Coincidence? Almost certainly. But I’ll flatter myself anyway.

But the laugh riot comes several paragraphs into the article as they look into how the Southwest Georgia Community Action Council was able to save 935 jobs with a cost-of-living increase for only 508 people. The director of the action council said:

“she followed the guidelines the Obama administration provided. She said she multiplied the 508 employees by 1.84 — the percentage pay raise they received — and came up with 935 jobs saved.

“I would say it’s confusing at best,” she said. “But we followed the instructions we were given.”

“Confusing at best”? Multiplying two numbers is “confusing at best”? It seems obvious to me that she should have multiplied the 508 people by the size of the increase (0.0184) and gotten about 9.3. But she forgot that you have to divide the percentage by 100 before you multiply.

The fact that she had “saved” more jobs than there were people in the organization should have been a tip-off. But this is a pretty common problem with people who don’t have a very good grasp on mathematics… they don’t recognize obvious mathematical errors, they just plug in the numbers and go with whatever comes out.
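The slip is easy to see if you lay the two calculations side by side (using the figures from the AP story: 508 employees and a 1.84% raise):

```python
# Figures from the AP story: 508 employees, a 1.84% cost-of-living raise.
employees = 508
raise_pct = 1.84  # percent

# What was reportedly done: multiplying by the raw percentage figure.
jobs_reported = employees * raise_pct          # 934.72 -> "935 jobs saved"

# What the formula presumably intended: convert the percentage to a
# fraction first, yielding the job-equivalent value of the raise.
jobs_intended = employees * (raise_pct / 100)  # about 9.3

print(round(jobs_reported))      # 935
print(round(jobs_intended, 1))   # 9.3
```

Two keystrokes of difference, a hundredfold difference in “jobs saved.”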

And this, children, is why you pay attention at school. So you don’t get in the national news for doing something really stupid and then blame it on the instruction manual.

One of the key talking points for the stimulus that was passed earlier this year was that it would “save or create” jobs. Lots of jobs. Oodles of jobs. Jobs piled so high, we’ll have to hire people to dig us out of all the jobs we will have.

Or, more specifically, the Obama administration stated that they would “save or create” 4 million jobs.

This led to a great deal of mockery over the “save or create” turn of phrase, but the administration set out to actually measure the number of jobs that were saved or created by having recipients of the stimulus funds fill out a form in which they indicate how many jobs that particular chunk of the stimulus created (that form can be found here).

Now, if you look at recovery.gov, you’ll see that the stimulus has “saved or created” 640,000 jobs. That is only 16% of the promised jobs, but it’s still a pretty big number. I was curious how they got it, so I downloaded the raw data and started sifting through it. This is what I found:

  • Over 6,500 of the “created or saved” jobs are cost-of-living adjustments (COLAs), which amount to a raise of about 2% for 6,500 people. That’s not a job saved, no matter how you calculate it.
  • Over 6,000 of the jobs are federal work study jobs, which are part time jobs for needy students. As such, they’re not really “jobs” in the sense that most other federal agencies report job statistics (We don’t count full time college students as “unemployed” in the statistics.)
  • About half of the jobs (over 300,000) fall under the “State Fiscal Stabilization Fund”, which can be described like so: Your state (perhaps it rhymes with Balicornia) can’t afford all the programs it has running, but when the state government tries to raise taxes, people yell and scream and threaten to move. The federal government comes in with stimulus funds and subsidizes the state programs. Consider this a “reach-around” tax in which the state can’t raise taxes on its citizens any further, but the federal government can. So the federal government just gives the state the money to keep running programs it can’t afford on its own.
  • There are, scattered hither and yon, contracts and grants that state in no uncertain terms that “This project has no jobs created or retained” but list dozens, if not hundreds, of jobs that have been “saved or created” by the project. It makes no sense whatsoever.

Finally, there is a statistical problem to the data here that I’ve not heard discussed at all, the problem of job duration.

Because there is no guidance in the forms on the proper way to measure “a job”, recipients are left to themselves to figure out what counts as one. Some of them calculate “man-weeks” and treat one “job-year” as the measurement of a single job. Others fulfill contracts that require only two weeks of work, but count every person hired for every project as a separate job created.

As an illustration: Let’s say you have a highway construction project in the Salt Lake City area that takes one month. A foreman is hired for the project and he brings on 20 guys he likes to work with to fill out his crew. That is 21 jobs “saved or created”. While that job is being completed, the funding is being secured for another highway construction project. By the time that funding goes through, the first project is done and they decide to just move the whole crew over to the next project. That is another 21 jobs “saved or created”.

If this happens four more times, on paper it looks like 126 jobs have been “saved or created” when in reality 21 people have been fully employed for six months. But if you judge jobs through a “man-weeks”/“job-years” lens, you have 10.5 jobs.
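The two counting conventions from the highway example boil down to a few lines of arithmetic: 21 workers moved across six consecutive one-month projects.

```python
workers = 21      # foreman plus his crew of 20
projects = 6      # six consecutive projects
months_each = 1   # each project lasts one month

# Per-project head counting: every project reports its full crew
# as freshly "saved or created" jobs.
jobs_per_hire = workers * projects                    # 126

# Job-years counting: total man-months divided by 12.
job_years = workers * projects * months_each / 12     # 10.5

print(jobs_per_hire, job_years)  # 126 10.5
```

Same 21 people, same six months of work, and the reported figure differs by a factor of twelve depending on which convention the recipient picked.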

This is how the Blooming Grove Housing Authority in San Antonio, Texas can run a project titled “Stemules Grant” to create 450 roofing jobs for only $42 per job. My educated guess is that they hired day laborers, paid them minimum wage or below and only worked them for a single day. Each new day brought new workers, which meant more jobs “created”. Either that or they simply lied on the form. (UPDATE: USA Today interviewed the owner here. He says that he used only 5 people on the roofing jobs but that a federal official told him that his original number wasn’t right, so he adjusted it to count the number of hours worked, not the number of jobs created.)

Rational people can see that this kind of behavior skews the data upward. How much upward? It’s hard to say, although it is a safe bet that any project that manages to create a job for less than $20,000 is probably telling you some kind of fib.
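That $20,000-per-job threshold is easy to apply mechanically to the downloaded data. A minimal sketch, assuming hypothetical record and field names (the actual recovery.gov file layout differs):

```python
# Hypothetical records illustrating the cost-per-job sanity check;
# the award figures and field names here are made up for illustration.
records = [
    {"project": "Highway resurfacing", "award": 2_100_000, "jobs": 21},
    {"project": "Roofing grant",       "award":    18_900, "jobs": 450},
]

THRESHOLD = 20_000  # dollars per claimed job, below which a claim looks fishy

for r in records:
    cost_per_job = r["award"] / r["jobs"]
    if cost_per_job < THRESHOLD:
        print(f"{r['project']}: ${cost_per_job:,.0f} per job -- probably a fib")
```

Run against the real data dump, a filter like this surfaces the $42-per-job roofing claims immediately.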

My ultimate conclusion from looking at the jobs data is that:

  • The jobs numbers reported on recovery.gov are heavily exaggerated
  • The jobs numbers reported are not subjected to any scrutiny or auditing whatsoever; they are a simple data dump and should therefore be viewed with heavy skepticism
  • The jobs numbers are nevertheless a laudable transparency effort. I’m impressed that so much work has gone into trying to measure the results of the stimulus funding. Normally, these kinds of numbers would be shrouded in mystery and a normal Joe like myself would be unable to investigate them. Kudos to the Obama administration for implementing this data gathering and display initiative. However, they put too much faith in the data, and statements like “The stimulus has saved or created 640,000 jobs” are uttered with a profound ignorance of the nitty-gritty details of what the data actually says.

For more interesting stimulus jobs data, you can see Paul Krugman getting angry about it here and Greg Mankiw responding to that anger here and Brad DeLong calling Allan Meltzer a shameless partisan hack about the topic over here and a story of how $900 worth of boots became 9 jobs over here. Or you can just download the jobs data and look through it yourself. There’s lots of interesting stories in there.
