May 29, 2009
OK… now anyone who wants to look at this is more than welcome to it.
This is an Excel file into which I’ve managed to put all the information about the closed and open Chrysler dealerships. Because WordPress is stupid, I can’t upload an Excel document, so I had to rename it so that it looks like a Word document. Just download it and change the extension to “.xls” and you’re set to go.
There are four sheets to the file. Two sheets are the raw data as best I could translate it. The other two sheets are the data cleaned up a little bit to make it more readable.
WARNING: For anyone who hasn’t worked with this kind of data before… data is ugly. Some stuff is missing, some things are misspelled, names are inconsistent and addresses haven’t been parsed. This isn’t meant to be the most perfect data source of all time. It’s just a format for the data that can be more easily organized, sorted, parsed, and analyzed.
So… go at it. From Excel, you should be able to export as a CSV (comma delimited), which is nice and fun to work with from a visualization point of view.
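For anyone who would rather poke at the CSV in code than in Excel, here is a minimal sketch in Python. It assumes the export has a header row. The column names (`name`, `state`, `status`) and the sample rows are made up for illustration and will not match the real sheet. It only does the most basic cleanup (trimming whitespace and ignoring case), which is about the first thing ugly data like this needs.

```python
import csv
import io
from collections import Counter


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, trimming stray whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Skip overflow cells (key None); treat missing cells as "".
        rows.append(
            {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        )
    return rows


def count_by(rows, column):
    """Count rows by a column, ignoring case ('Closed' vs 'CLOSED')."""
    return Counter(row.get(column, "").lower() for row in rows)


# Hypothetical sample standing in for the exported sheet.
SAMPLE = """name,state,status
Example Motors , TX,Closed
Sample Chrysler,WA,open
Another Dealer,GA,CLOSED
"""

if __name__ == "__main__":
    print(count_by(load_rows(SAMPLE), "status"))
```

Swap `SAMPLE` for the contents of the real export and change the column name to whatever the sheet actually uses.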
May 29, 2009
I noticed yesterday that a good number of people are getting worked up because it looks like a large number of the Chrysler dealerships that are being closed are owned by heavy Republican donors. (Michelle Malkin does her usual roundup here)
I’m taking the time to try to do something that still seems somewhat lacking… run an actual statistical analysis of the data. I’ll post more when I get some real data, but I did want to put up a couple thoughts early on.
Thought 1: Megan McArdle says that this is likely a red herring. She points out that “Democratic and Republican dealers are unlikely to be found in the same place, and the rural counties that tend to be red are probably less profitable. I would be less surprised to find out that the administration rescued specific donors from the hit list than to find that they deliberately closed Republican dealerships.”
If there was any behind the scenes work by the Obama administration, saving Obama dealerships seems more likely than spitefully killing Republican ones. And I think that we’ve got a pretty big “if” there to begin with.
Thought 2: All the skeptics of this story are pointing to Nate Silver’s “Car Dealerships are Republican (It’s Called a Control Group, People)”. Unfortunately for them, that post is a load of statistical garbage.
Nate is trying to establish a baseline of Republican-to-Democratic donations against which he can judge the validity of the data coming from the closed dealerships. This is a laudable goal, but I get really frustrated when people use statistical or mathematical terms and they don’t know what those terms mean. I’m starting to understand that people on both sides of the aisle use “science-y” or “math-y” words because it makes it look like they’re using science and can therefore be trusted. That’s exactly what is going on here.
Nate’s investigation does not a control group make for the following reasons:
- There are really three categories here: Republican donor, Democratic donor, and not a donor. He doesn’t even recognize that the last category might exist.
- He doesn’t make any distinction between Chrysler dealerships and other dealerships. Maybe Honda dealerships skew Republican and thereby mess up his “control group”. This is like testing a drug aimed at teenage girls and building a “control group” that includes toddlers, WWII veterans and 40-year-old soccer moms. His data is hopelessly polluted.
- He assumes that everyone who owns a car dealership will list their occupation as car dealer (or some variant). Where I grew up, Hank Aaron owned a couple car dealerships, but I think it was unlikely he listed his occupation as “car dealer”. (If I got a business card from Hank Aaron, I would want it to say “Hank Aaron – Awesomest Person in the World… and Barry Bonds Can Die in a Ditch”)
Take your pick. I got more.
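To make the first objection concrete: a comparison that takes non-donors seriously needs a table with all three categories (Republican donor, Democratic donor, not a donor) for both closed and open Chrysler dealerships, and then a test of whether the two rows differ by more than chance. Below is a minimal sketch of the chi-square statistic for such a table. It illustrates one standard approach, not something any of the posts linked here actually ran, and the function name is invented for this example.

```python
def chi_square_statistic(table):
    """Chi-square statistic for independence on a table of counts.

    `table` is a list of rows, e.g.
    [[closed_rep, closed_dem, closed_none],
     [open_rep,   open_dem,   open_none]]
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if closure and donor category were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

For a 2-by-3 table there are 2 degrees of freedom, so a statistic above roughly 5.99 would be significant at the 5% level. No real counts are plugged in here because that is exactly the data that still has to be gathered.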
Thought 3: The fact that Nate Silver’s “analysis” is a load of crap doesn’t make the other analysis better… it just makes him something of an ass for pretending that he’s better than everyone else.
Example 1: Dan Collins says:
Statistics that are available suggest that Chrysler auto dealers donated 76% Republican and 24% Democratic.
Looks like someone else didn’t control for non-donating dealerships. (UPDATE: Dan Collins comments below that this statement was revised, although I still don’t see anyone taking into account non-donors.)
Example 2: Doug Ross has a post called “Dealergate: Stats demonstrate that Chrysler Dealers likely shuttered on a partisan basis”. Towards the bottom, he has a “What Are The Odds” section in which he notices that one company, RLJ-McLarty-Landers, has six Chrysler dealerships that were not closed and claims that:
The approximate odds of such an occurrence can be calculated
He then proceeds to “calculate” those odds based on the assumption that the dealerships were closed at random.
His odds are meaningless. What if RLJ-McLarty-Landers happens to have remarkable market share? Or excellent customer service?
To posit an imperfect analogy, it’s like me being surprised when all the K-Marts in my area go out of business. So I do a statistical sampling of all local supermarkets and say “Ah-ha! All the Wal-Marts in the area didn’t go out of business… what are the odds of that?” And then I calculate the odds out and claim that there are nefarious plans afoot. (I love that word… afoot. Afoot, afoot, afoot.)
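For reference, a “what are the odds” number of this kind usually boils down to something like the sketch below. It illustrates the random-closure assumption and is not a reproduction of Doug Ross’s actual calculation. The function and parameter names are hypothetical, and no real dealership counts are filled in.

```python
from math import comb


def prob_all_survive(total, closed, owned):
    """Chance that none of `owned` dealerships is among `closed`
    dealerships picked purely at random out of `total`."""
    if not 0 <= closed <= total or not 0 <= owned <= total:
        raise ValueError("closed and owned must be between 0 and total")
    # Ways to pick the closed set entirely from the other dealerships,
    # divided by all the ways to pick the closed set.
    return comb(total - owned, closed) / comb(total, closed)
```

The arithmetic is fine. The trouble is the phrase “picked purely at random” in the docstring: the number only means something if closures really were a lottery, and nobody thinks they were.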
Thought 4: This smells like a conspiracy theory. I hate conspiracy theories. I lean toward believing that people, Republicans and Democrats, conservatives and liberals, are good people who are trying to do what they think is right.
On the other hand, if I had been editor at the Washington Post in the 70’s, I probably would have told Bob Woodward and Carl Bernstein that they were acting like crazy people.
I confess to a heavy skepticism. So I’m running the data as carefully as I can and I’ll post what I find. It might take a couple days, though. I’m not quite ready to quit my job to chase this story full time.
If you’re looking for what seems to be the best work on this so far, it’s probably at the entertainingly named Chrysler Dealership Campaign Donation Information blog. Based off an extremely quick scan of the information, it looks like Joey Smith (the author) is trying to gather data in a meaningful way.
May 28, 2009
I’m currently watching two week old episodes of Red Eye with Greg Gutfeld on Hulu. If you like outrageous, off the wall humor in your news, you really can’t do better than this show. While “The Daily Show” and “The Colbert Report” take familiar cable news concepts and parody them, Gutfeld completely deconstructs those concepts. If he wasn’t so libertarian, media professors would call his show a work of surreal genius. The show may not be as consistently funny as some others, but it is far less safe… you never know where they’re going to go and what they’re going to say when they get there.
Anyway… back to the numbers thing. They were talking about Dick Cheney’s interview with Bob Schieffer in which Cheney (in Greg’s words):
…insisted that enhanced interrogation saved a crapload of lives. That’s right, he said ‘crapload’.
OK, he didn’t, but he should have.
They then show the part where Cheney stated that:
“I am convinced, absolutely convinced, that we saved thousands, perhaps hundreds of thousands of lives.”
Now I don’t want to talk about the morality and ethics of enhanced interrogation, a topic about which I can’t even begin to talk intelligently.
But I do know a little something about numbers, and I remember that on 9/11 we were all terrified (or at least I was) when we heard how many people worked in the World Trade Center buildings. The number “50,000” was tossed around a good bit that morning. I was happily surprised when the final toll was drastically revised downward over the next several weeks.
Near as I can make it, the only way the Bush administration could have saved “possibly hundreds of thousands” of lives is if they stopped a nuclear attack in a major city. And I’m going to go ahead and say that the burden of proof on them is pretty heavy for something like that.
If you bust six guys drinking beer and talking about nuking LA, you probably didn’t save that many people. If, however, you bust six guys drinking beer and talking about nuking LA… and they have a dozen gas centrifuges in the basement enriching uranium, they’re still miles away from nuking LA, but at least you can make the case that you saved a crapload of lives by busting them.
Take note, I’m not at all against going after potential terrorists. I’m just against using numbers so carelessly that they lose their meaning. The “hundred thousand lives saved” is, as Kevin Godlington stated on the show, lunacy.
As a side note, Kevin Godlington is one of Red Eye’s best contributors. He is a British veteran who provides remarkable insight on the show and also works with military charities to help British and American soldiers deal with combat stress. I’ve had a couple people ask if they could donate to help my pro bono work here. If you’ve ever thought of doing so, donate to Kevin’s charity instead.
May 26, 2009
Sadly, not all problems can be solved by the careful application of mathematics.
I’m currently trying to figure out how to appropriately calculate the yearly increases in the GDP over the past 100 years. The reason is that, according to President Obama’s budget estimates, after we get out of the recession, we will have four consecutive years of +5% growth. I’m trying to compare that growth to economic growth we’ve had in the past.
What I need to know is this: when we calculate past growth, is it properly calculated using inflation-adjusted dollars or unadjusted dollars? It seems to me that adjusted is the only way to go, but if there is an economist out there somewhere who can help me answer that question, it would be helpful in getting the statistics right.
Of course, it makes all the difference in the calculations. If we don’t adjust for inflation, then the biggest sustained growth we’ve had in the last 50 years was 1971 – 1981 in which we had 10 years in a row of +8% growth. But inflation was so bad that for a couple of those years, it actually outpaced that growth and then some, turning 8.8% growth in Carter’s last year into -4.2%.
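To show how the adjustment works, here is a minimal sketch of the standard conversion from nominal (unadjusted) growth to real (inflation-adjusted) growth. The function name is invented for this example, and the inflation rate in the usage lines is a placeholder, not a figure from the data.

```python
def real_growth(nominal, inflation):
    """Inflation-adjusted growth, with both rates given as fractions
    (0.05 means 5%)."""
    return (1 + nominal) / (1 + inflation) - 1


if __name__ == "__main__":
    nominal = 0.088   # the 8.8% nominal growth mentioned above
    inflation = 0.13  # placeholder rate, for illustration only
    print(f"real growth: {real_growth(nominal, inflation):.1%}")
```

Any inflation rate above the nominal growth rate pushes real growth below zero, which is how a healthy-looking 8.8% can turn into a shrinking economy.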
Ultimately, if we take Obama’s numbers as adjusted for inflation, he is predicting that his policies will bring the largest sustained growth this nation has seen since the Baby Boomers started entering the workforce in the early-to-mid sixties. This would be quite a trick, since it would be happening while the Baby Boomers are leaving the work force.
If we don’t adjust for inflation, he is predicting about the same kind of economic recovery we saw from 2003 – 2006.
I’d like to know which one it is.
May 21, 2009
Today the Obama administration launched Data.gov, a new website designed to make governmental data easily accessible to normal people (who love looking at data) and in formats that allow software developers to mine the data.
This is an excellent step towards transparency in government. The ultimate utility will depend on how many databases they allow us access to and how often they are updated, but it looks like the new go-to site for government data.
Just at a glance, we’ve got extensive data for:
- USA Spending Contracts and Purchases (searchable database)
- Benefits Data from the Benefits and Earning (Social Security Benefits)
- Patent Application Bibliographic Data (2009)
- Graphical Database of Tornados (1950-2006)
- Rain, Hail and Snow Observations
- Energy Consumption Survey (RECS) Files (1978-2005)
- Migratory Bird Flyways for the Continental United States
Lots of government gathered scientific data and a couple things that look like they might have some actual “responsible government” implications. I’d love to see more of this.
Very well done.
May 21, 2009
I’ve gotten a number of people asking some permutation of the following question:
“Why don’t you give the national debt as a percentage of the GDP as a whole? Isn’t that more meaningful/relevant?”
My answer to the latter question is “Yes and no.”
The answer is “Yes”… in the sense that if you make $50,000 per year and you have $80,000 in debt, you’re more screwed than if you make $100,000 per year and you have $80,000 in debt.
But the answer is “No” for the purposes of making a visualization for the following reasons.
First, I didn’t frame the debt that way because it fundamentally hides some really important things that shouldn’t be hidden. I’ll go ahead and give the game away… I’m in the business of communicating numbers clearly. And using the debt-to-GDP ratio feels too much like trying to hide the real meaning of the numbers.
It feels like a car salesman who refuses to talk about the raw numbers of the car you’re buying because when he talks about monthly payments, it’s easier to screw you. Because, really, what’s the difference between $287.87 per month and $359.60? It’s not that much, is it? And if you’re already spending $300, you might as well spend $350, right?
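Here is what that “not that much” looks like once you multiply it out. This is a minimal sketch: the 60-month loan term is an assumption picked for illustration, and the function name is made up.

```python
from decimal import Decimal


def extra_cost(low_payment, high_payment, months):
    """Total extra paid over the loan when the monthly payment is higher.

    Payments are passed as strings so Decimal keeps the cents exact.
    """
    return (Decimal(high_payment) - Decimal(low_payment)) * months


if __name__ == "__main__":
    # 60 months is an assumed loan term, not a figure from the post.
    print(extra_cost("287.87", "359.60", 60))  # 4303.80
```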
In the same way, talking about the debt in a percentage manner is hiding the true cost. So we increase the debt-to-GDP by 2.2%… big deal, right?
But that 2.2% is the same amount as everyone in the state of Washington makes in a year. Every. Single. Person. Go look at a Google street view of Seattle and try to count how many people live in a high-rise apartment building. Take a stroll down some of the swankier neighborhoods. Look at the obscenely expensive houses that line the bay. Everything every one of those people makes in a year. The more thought you apply to the real meaning of the number, the more you see that, while 2.2% might be an accurate number to describe an increase, it doesn’t even begin to communicate the scope.
That’s the first reason I didn’t use debt-to-GDP… because it violates the core principle of what I’m trying to do: give a clear understanding of the scope of the issue. When people use it, it feels like they’re looking around for the best possible way to represent the problem so that it doesn’t feel as big as it is.
Make no mistake, the problem is huge. Huge in a way almost none of us understand because our brains don’t process that kind of huge very well.
There are other problems with framing the issue this way too. One is that comparing the federal debt to the GDP is something of a misnomer because the government doesn’t own the GDP. The GDP is “owned” in part by everyone in the country. And all those people and businesses have their own debt (mortgages, credit card debt, student loans, business loans).
Quick, off-the-cuff example using very rough numbers: Sam makes $100,000 per year, but he is spending $150,000 per year. As if that weren’t bad enough, he is $500,000 in debt already. But he tells himself it’s not a big deal because his kid is in college and that will only last a couple years and, besides, he has a business protecting houses and mowing yards for a living and if you combine everything his clients make in a year, it comes out to be almost $750,000 per year.
So if you look at how much he owes compared to how much his clients make, it’s only about 67%. And if his clients make $1,000,000 next year, he could owe $666,000 and there would be no change whatsoever in his “how-much-I-owe to how-much-my-clients-make” ratio. No problem!
Except that Sam’s clients are probably a little nervous about Sam comparing the truly absurd scope of his debt to the amount of money they make every year. Shouldn’t he be comparing his debt to the money he makes every year?
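Sam’s numbers are easy to check. The sketch below uses only the figures from the example above; the function and variable names are made up for illustration.

```python
def debt_ratio(debt, income):
    """Debt as a percentage of a yearly income figure."""
    return debt / income * 100


sam_income = 100_000
sam_debt = 500_000
clients_income = 750_000

if __name__ == "__main__":
    # Sam's preferred framing: debt against what his clients make.
    print(round(debt_ratio(sam_debt, clients_income), 1))  # 66.7
    # Next year: more debt, richer clients, same comfortable ratio.
    print(round(debt_ratio(666_000, 1_000_000), 1))        # 66.6
    # The honest framing: debt against what Sam himself makes.
    print(round(debt_ratio(sam_debt, sam_income), 1))      # 500.0
```

Same debt, same Sam. The only thing that changed between 67% and 500% is whose income went in the denominator.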
I could go on at length, and perhaps I’ll make a visualization about this, but right now I’ve got to work the day job.
May 18, 2009
This graph has been going around a good deal in the last week. (Source)
Basically, the light blue line is the unemployment rate the Obama administration predicted would happen if we didn’t pass the stimulus bill. The dark blue line is the unemployment rate the Obama administration predicted would happen if we did pass the stimulus bill. (Here’s the raw document.) And the red triangles are the actual unemployment rate as it has panned out. Not only are they worse than the Obama administration expected, they’re worse than what they expected even if we didn’t pass the stimulus bill.
I think it is fair to say that the stimulus bill has not been as stimulating as they told us it would be. Now, it could certainly be the case that the unemployment rate would be even higher than this if we hadn’t passed the stimulus bill, but that is about as non-falsifiable a statement as you can get.
(UPDATE: The author of this graph explains why he thinks there has been little effect … we’ve spent almost none of the stimulus money yet. I’m trying to figure out where he’s getting his data because I don’t see any infrastructure projects on there. I’m certain that there is infrastructure spending going on right now because there is a stimulus project not 3 miles from my house causing daily traffic jams.
UPDATE 2: Here’s the best I could find on stimulus money currently being spent.)
I don’t really feel like dogpiling on the administration on this particular issue, so I want to hit a broader topic here… the administration’s use of numbers. This graph tells us some simple things that are scary and a complex thing that is scarier.
The simple thing it tells us is that the Obama administration was completely unable to predict the economic conditions four months into the future. They thought we would be at about 8.0% unemployment if the stimulus bill passed and at 8.5% unemployment if we sat on our hands.
As it turns out, we passed the stimulus bill and we’re at 8.9%. The easy lesson is that they didn’t get that one right. But, as Robert Storm Petersen said, “It’s tough making predictions, especially about the future.” And I probably couldn’t have done any better.
But no one is hanging the weight of hundreds of billions of dollars around my neck, which makes it more OK that I can’t project the future economic conditions. It seems fair to demand a slightly higher level of predictive accuracy from an administration that is using their predictions to push trillion dollar policies.
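The size of the miss is simple arithmetic, using only the three rates quoted above. A minimal sketch (the function name is made up):

```python
def forecast_miss(actual, predicted):
    """Percentage points by which the actual rate exceeded the prediction."""
    return round(actual - predicted, 1)


actual_rate = 8.9            # where unemployment actually is
predicted_with_bill = 8.0    # prediction if the stimulus passed
predicted_without_bill = 8.5 # prediction if it did not

if __name__ == "__main__":
    print(forecast_miss(actual_rate, predicted_with_bill))     # 0.9
    print(forecast_miss(actual_rate, predicted_without_bill))  # 0.4
```

The first number is the miss on the scenario that actually happened. The second is the awkward one: reality came in 0.4 points worse than their do-nothing scenario.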
The complex thing that this graph tells us is that the Obama administration is comfortable using graphs that don’t really have a basis in reality in order to bolster support for their decisions. Graphs make us think that something is scientific and studied and therefore more reliable. But reliability is something that has to be earned. The team that put this graph together should be questioned on what they got wrong and what they would do next time to get it right.
Basically, the next time the president uses projected figures to push his policies, I would like to see someone ask the following question:
“Mr President, the last number predictions you threw at us turned out to be pretty far off the mark. What assurances do we have that these new numbers are accurate?”