Helping the House Advance Data Transparency
The House of Representatives is poised to make great strides forward in transparency, and our work over the last year aims to help them do that. Here’s how this spreadsheet (.xls) will do that.
In December, the House Administration Committee announced a plan to improve the publication of House documents. In January, a new site—docs.house.gov—went live. (It’s attractive looking, but still bare-bones.) On Thursday this week, the Committee is hosting a “Legislative Data and Transparency Conference” to examine what data is out there and what data should be out there. Little information is on the Web yet, but you can sign up to attend at the link just above.
I’ll be speaking on the last panel of the day, which deals with measuring transparency success. Likely, they chose me for this panel because I’ve already been grading the government on its publication practices.
Last September, you see, we graded Congress on how well it publishes data that would assist the public in computer-aided oversight. The summary blog post is called “Needs Improvement.” And then in December, we graded the government on publication of budget, appropriations, and spending data. That’s a joint legislative-executive responsibility, but mostly executive. The message was: “‘Needs Improvement’ is Understatement.”
How do you grade Congress and the government on their data publication?
You start out by modeling the data government should publish. We put together a data model for legislative process, for example, and then a data model for budgeting, appropriating, and spending. We got a great deal of help from folks at the Sunlight Foundation, OMB Watch, and others such as the National Priorities Project, as well as data guru Josh Tauberer, whose latest project is PopVox.
Even with all this help, these models won’t be the last word—there is much to learn yet about the data structure that will serve every use the public may want to make of information. But it’s a strong start.
Then we compared the data that’s actually out there to the practices described in my paper, “Publication Practices for Transparent Government,” and out popped the grades! They were pretty bad…
The House of Representatives aims to fix that—for its part, at least.
Now to this spreadsheet: it’s a list of the things that should be identified in congressional documents so that computers can find the most salient information in them. It also indicates the “vocabularies” that already exist for identifying many of them: members of Congress, bills, laws, statutes, committees, agencies, programs, and so on. We’ve talked about how to identify “budget authority” and appropriations (spending) so that computers can capture that information from bills and committee reports. Locations, state and foreign governments, times, meetings—all these things can be put into electronic versions of documents to allow computer-aided public oversight.
Once documents contain data like this in the proper structures, literally thousands of questions about Congress will be answered instantly.
- How much new budget authority has each member of Congress proposed? Voted for? Voted against? Allowed to go through on voice vote or unanimous consent? How about this same information by state? By region? Or by seniority?
- What title of the U.S. code do members of Congress most often propose to amend? What title do they actually amend the most?
- What bills affect my state specifically, such as by naming buildings, creating wilderness areas, changing boundaries on parks, or giving land to localities?
- How often do my member of Congress and senators break with their party?
These are just a few examples. In the hands of varied users, the data will be converted to hundreds or thousands of uses. It will go into studies performed by political scientists and it will supercharge news reporting. But more importantly, it will go into services that inform people directly and quickly about how their own representatives in Congress are acting and what they’re saying.
It will give people insight into where the money goes—from the moment new spending is proposed all the way through to when Congress spends it—or declines to spend.
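To make the first of those questions concrete, here is a minimal sketch of how tagged data could answer it. Everything in it is hypothetical: the record layout, the member identifiers, the bill labels, and the dollar figures are invented for illustration, and real congressional data would need the vocabularies described above.

```python
from collections import defaultdict

# Hypothetical records: each stands for a bill whose sponsor, state, and
# proposed budget authority have been tagged with standard identifiers.
# All identifiers and amounts below are invented for illustration.
BILLS = [
    {"bill": "EXAMPLE-1", "sponsor": "MEMBER-A", "state": "CA", "budget_authority": 5_000_000},
    {"bill": "EXAMPLE-2", "sponsor": "MEMBER-B", "state": "TX", "budget_authority": 2_000_000},
    {"bill": "EXAMPLE-3", "sponsor": "MEMBER-A", "state": "CA", "budget_authority": 1_500_000},
]


def budget_authority_by(records, key):
    """Total the proposed budget authority, grouped by the given field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["budget_authority"]
    return dict(totals)


by_member = budget_authority_by(BILLS, "sponsor")
by_state = budget_authority_by(BILLS, "state")
```

The same function answers the "by state" variant just by changing the grouping field, which is the point: once the data carries consistent identifiers, each new question is a small query rather than hours of reading.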
Credit is due to the leadership in the House of Representatives for starting this work. There is a lot to do before they show clear success. But they are way ahead of President Obama, whose Sunlight Before Signing transparency promise lags badly, and who has yet to put together a machine-readable organization chart for the executive branch of the federal government. He can easily do the latter, and coordination with Congress is essential for transparency success. The sooner that happens the better.
Sunlight Before Signing, Year Three
In last night’s State of the Union speech, President Obama called for tax law reforms he says we need. Cato scholars have their doubts about much of what was in the speech, but my interest was piqued by the fact that he said, “Send me these tax reforms, and I will sign them right away.”
You see, signing them “right away” would again violate his 2008 campaign promise to post the bills sent to him by Congress online for five days before signing them.
That’s a cheeky point, but it is time to focus on campaign promises and their honesty. The beginning of President Obama’s fourth year in office is roughly the beginning of his campaign for another term.
When I first began tracking President Obama’s Sunlight Before Signing promise, I joked with friends that it was career gold because I could write hundreds of blog posts for the next four years without thinking a new thought. Well, it’s not quite that good. This is post thirty-six in the SBS series.
(Each character in that last sentence was a link to a previous post. You can spend a whole day reviewing them!)
Last Thursday, January 19th, was the end of President Obama’s third year, so it’s time to review how he’s been doing with Sunlight Before Signing. It was the president’s first broken promise, and at the mid-point of the term he had popped just above 50% in his compliance.
How has he done in the ensuing year?
Well … meh.
SOPA/PIPA: Harbinger or Aberration?
He’s not unrestrained, but Larry Downes sees the remarkable downfall of legislation to regulate the Internet’s engineering as a harbinger of things to come. Jerry Brito, meanwhile, tells us “Why We Won’t See Many Protests like the SOPA Blackout.”
They’re both right—over different time-horizons. The information environment and economics of political organization today are still quite stacked against public participation in our unwieldy federal government. But in time this will change. Congress and Washington, D.C.’s advocacy and lobbying groups now have some idea what the future will feel like.
There’s No Machine-Readable Government Org Chart
At a recent Cato event on transparency, I emphasized that there is no federal government “organization chart” published in a way computers can use.
Here’s what I mean:
Appendix C of the Office of Management and Budget’s Circular A-11 is the White House’s definitive public listing of agencies and bureaus, along with their OMB and Treasury codes—unique identifiers for the agencies and bureaus of the federal government.
First problem: It’s a PDF document. To be computer-usable this should be represented in digital form as a lookup table.
But beyond that, it doesn’t follow a coherent organization. There’s an agency code (“200”) called “Other Defense Civil Programs,” for example. There’s obviously no agency called “Other Defense Civil Programs.” That’s a catch-all description, not an agency.
With most agencies, the bureau codes refer to bureaus, such as the Bureau of Land Management (bureau code: “04”) in the Department of the Interior (agency code: “010”), but with respect to the Department of Defense (agency code: “007”), the bureau codes become functional descriptions such as “Military Personnel” (“05”). There is no bureau in the Department of Defense called “Military Personnel.”
Even the most basic organizational information is a hash, and it’s published in PDF, unusable for computer-assisted oversight of the government!
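For a sense of what "lookup table" means here, this is a minimal sketch using only the handful of codes quoted above. The dictionary layout and the function name are my own assumptions for illustration, not an official format, and a real table would cover every agency and bureau.

```python
# Codes quoted in this post from Appendix C of OMB Circular A-11.
# The structure below is an illustrative assumption, not an official schema.
AGENCIES = {
    "010": "Department of the Interior",
    "007": "Department of Defense",
    "200": "Other Defense Civil Programs",  # a catch-all, not a real agency
}

# Bureau codes only make sense together with their agency code.
BUREAUS = {
    ("010", "04"): "Bureau of Land Management",
    ("007", "05"): "Military Personnel",  # a functional label, not a real bureau
}


def describe(agency_code, bureau_code=None):
    """Return a readable name for an agency or bureau code, or None if unknown."""
    agency = AGENCIES.get(agency_code)
    if agency is None:
        return None
    if bureau_code is None:
        return agency
    bureau = BUREAUS.get((agency_code, bureau_code))
    if bureau is None:
        return None
    return f"{bureau}, {agency}"
```

Note that the table happily returns "Military Personnel, Department of Defense" even though no such bureau exists. Putting the list in machine-readable form fixes the PDF problem, but the incoherent categories still need fixing at the source.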
The House appears committed to improving its publication practices. If the administration wants to advance the ball on transparency for its part, it will begin to publish coherent information—starting with basic information about the organization of the executive branch—in machine-readable form, using standardized identifiers. An edict from OMB to harmonize on identifiers down to the program level could be implemented in months, if not weeks.
My recent paper “Publication Practices for Transparent Government” talks about what to do. Our data model for budgeting, appropriating, and spending articulates how government agencies, bureaus, programs, and projects—and the relationships among them—should be represented.
Why Data Transparency?
At a recent Capitol Hill briefing on government transparency, I made an effort to describe the importance of getting data from the government reflecting its deliberations, management, and results.
I analogized to the World Wide Web. The structure that allows you to find and then view a blog post as a blog post is called hypertext markup language, or HTML. HTML is what made the Internet into the huge, rollicking information machine you see today. Think of the darkness we lived in before we had it.
Government information is not yet published in useable formats—as data—for the public to use as it sees fit. We need government information published as data, so we can connect it in new ways, the way the World Wide Web allowed connections among documents, images, and sounds.
And when you connect data together, you get power in a way that doesn’t happen with the web, with documents. You get this really huge power out of it.
Tim Berners-Lee was not thinking of wresting power from government when he said that, but the inventor of the World Wide Web does a better job than I could of arguing for getting data and making it available for any use. We’ll look back on today with bemusement and surprise at the paucity of information we had about our government’s activities and expenditures.
House Transparency Slated to Improve
Perhaps my mean grading has contributed to nascent competition between the Republican House and the Democratic administration for the transparency prize. Last Friday, the House Administration Committee adopted standards that “require all House legislative documents be published electronically in an open, searchable format on one centralized website.”
At a September Cato Capitol Hill briefing, I rated Congress on the quality of the data it publishes reflecting its membership, activities, documents, and decisions. Its grades weren’t that good. At a briefing last week, I graded the data about federal budgeting, appropriations, and spending, which is largely an executive branch responsibility. Those grades weren’t very good either.
Able and dogged transparency advocate Daniel Schuman at the Sunlight Foundation has a good write-up of the House’s move to produce good data (he and Sunlight certainly did their part to encourage it), though I’ll quibble with one particular. The adoption of the document, a two-page outline of what should be standardized and not a standards document itself, was not really “a tremendous step into the 21st Century.” It was an outline of a course to improved transparency. 21st-Century transparency.
What is required to produce that transparency? My recent paper “Publication Practices for Transparent Government” sought to establish guideposts for publication of data that will foster public access to meaningful information about what happens in Washington, D.C. The practices, in ascending order of importance and difficulty, are: authority, availability, machine-discoverability, and machine-readability.
Putting all documents on a single site will enhance authority. People will know where to look, and what source to trust. In our rough grading system, we weighted the simple practice of authoritative publishing at 10% of the total grade.
The second practice, availability, means ensuring that the data is complete, that it remains permanently in the same location, that it is not proprietary itself, and that it is not in a proprietary format. This is likely to be fulfilled by adherence to the Committee’s language and basic good practices. Availability we weighted at 20% of the total grade.
Machine-discoverability is when data is identified and located consistent with a variety of good practices going to the naming and locating of Internet resources. It’s weighted at 30% of the total grade in our system for rating data publication. It is likely that the House will develop good practices, but it will be important to watch and see that it does.
Machine-readability is the most important part of transparency. It means publishing data so that the logical relationships among elements are clear, and so that computers can automatically detect the semantic meaning of the documents and data they examine.
This is where the House Administration Committee’s release is least clear. Documents like bills and committee reports could be published so that each reference to existing law, to federal agencies, bureaus, and programs, to newly authorized spending, and to a variety of other items and entities is automatically discoverable in the document.
You should be able to do a quick search, rather than labor for hours, to see what bills affect the Labor Department. You should be able to see every dollar authorized or appropriated in every bill, nearly instantly. The data should be a foundation for dozens of sites and services that disseminate information in different ways to different audiences.
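The rough weighting described above can be sketched in a few lines. The 10, 20, and 30 percent weights come from this post. The 40 percent for machine-readability is an assumption: it is simply what remains to reach 100 percent. The 0-to-100 scoring scale and the function name are likewise assumptions for illustration, not our actual grading procedure.

```python
# Weights in percent. The first three are stated in the post; the fourth is
# assumed to be the remainder needed to total 100.
WEIGHTS = {
    "authority": 10,
    "availability": 20,
    "machine_discoverability": 30,
    "machine_readability": 40,  # assumed, not stated in the post
}


def weighted_score(scores):
    """Combine per-practice scores (each 0 to 100) into one weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[practice] * scores[practice] for practice in WEIGHTS) / 100


# Hypothetical scores for one subject area: strong on the easy practices,
# weak on the hard ones.
example = {
    "authority": 90,
    "availability": 80,
    "machine_discoverability": 50,
    "machine_readability": 20,
}
total = weighted_score(example)
```

Under these assumed weights, a publisher that does well on authority and availability but poorly on machine-readability still lands below the halfway mark, which matches the emphasis on the harder practices.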
Here’s hoping that the House Administration Committee’s standards drive all the way to machine-readability. It will be a step into the 21st century if the House provides data the Internet can use and that the Internet-connected public very much wants to see.
Coming through with robust machine-readability will handily take the transparency mantle from President Obama, who promised transparency as a campaigner, but who has not produced the vibrant, different government people wanted. As I noted in a write-up last week, the administration has some low-hanging transparency fruit that could bring its grades up decisively. House Republicans are first out of the gate.
The DATA Act and Cato’s Transparency Work
In his final “Chairman’s Corner” blog post as head of the White House’s Recovery Act Transparency and Accountability Board, Earl Devaney highlights the need for orderly publication of data about government spending.
There is bi-partisan legislation now in the Congress—it’s called the Digital Accountability and Transparency Act, or DATA Act—that could accomplish this mission. But the reform bill faces an uphill battle, primarily because some in the bureaucracy prefer the status quo—a hodgepodge of data collection and display sites that, frankly, makes no sense at all unless you believe your government should confuse you.
The DATA Act would establish an independent board within the executive branch to track federal spending, and it would require federal agencies and recipients of federal funds to comply with reporting requirements set up by the board.
The board would “designate common data elements, such as codes, identifiers, and fields, for information required to be reported by recipients or agencies” (section 102 of the reported version, adding a new §3611 to title 31 of the U.S. code). The bill’s author, Rep. Darrell Issa (R-CA), spoke at our September Capitol Hill briefing, rolling out our legislative data model.
On Wednesday, another Cato Capitol Hill briefing highlighted the results of our work the last few months to model federal budgeting, appropriating, and spending. Should the DATA Act become law, the model we’ve been working on can illuminate the work of the proposed board. Use of our model will help ensure that the structure of government spending data supports public oversight use cases.
I don’t know that there needs to be a board—certainly not a permanent one. The bill authorizes more money than I think is required for the board, and the Congressional Budget Office’s cost estimate for implementing the requirements of the DATA Act seems wildly high. But the dynamics set in motion by making government spending more transparent may well reduce government spending by well more than even these high estimated costs.
Government Spending Transparency: ‘Needs Improvement’ Is Understatement
Back in September, I rated Congress on how well it is publishing information about its deliberations and decisions. “Needs Improvement” was the understated theme.
Now we’re looking at the government’s publication of data that reflects budgeting, appropriations, and spending. “Needs improvement” isn’t just understated in this area. It’s really, really understated.
On the budgeting, appropriations, and spending transparency report card I’m putting out today, B+ is the best grade—and it goes to just half of one subject area. There are 2.5 Cs, 3 Ds, and 4 incompletes. This area needs improvement.
What is transparency, anyway? In my briefing paper, “Publication Practices for Transparent Government,” I wrote about the publication practices that support transparency. They are: authority, availability, machine-discoverability, and machine-readability. That means putting good data out from a consistent source in sensible ways, and, especially, structuring the data so that computers can interpret it.
You know what the World Wide Web is? It’s a whole bunch of structured data. If you want the kind of breakthrough in transparency for government data that the Web was for communications, you want the data structured right.
Our draft structure for data in this area is in our “Conceptual Data Model of the U.S. Federal Government Budgetary Process.” (HTML version, Word version)
Structured data doesn’t really exist yet in the area of budgeting, appropriating, and spending. The one bright spot is the president’s annual budget submission, which includes some information in a workable structure, but there is much room for improvement even there.
Because I’m so nice, I’ve given a lot of “incompletes” where I could have—and some say should have—given Fs. Believe it or not, there is NO federal government “organization chart” that is published in a way computers can use. That’s one of the building blocks of computerized oversight, and its absence is easily rectified.
When we return to these issues in the summer or fall of next year, and review more formally how Congress and the administration have done on transparency, I expect these things to be fixed. (Fear the blog post!)
In the meantime, here’s a run-down of the grades and why they were given. A Hill briefing today might be available online at the page for the event. (It’s somewhat symbolic that the room we have on Capitol Hill is ill-equipped for live-streaming, but we’re going to try.)
I’ve alternated in this post between “I” and “we” because I’ve gotten so much help on this. People from OMB Watch, the National Priorities Project, and the Sunlight Foundation have helped a great deal with this project, to name a few—and omit many others! The grades, the commentary, the errors, the misstatements, and omissions are all mine. And there are going to be plenty of gaps in this work. That’s why this is a blog post and not a formal Cato publication.
Transparency and Its Discontents
Remember when you had to wait until the end of the month to see your bank statement?
Last week, on the cusp of failing to pass any annual appropriations bills ahead of the October 1 start of the new fiscal year, congressional leaders came up with a short-term government funding bill (or “continuing resolution”) that would fund the government until November 18th. For whatever reason, that deal (H.R. 2608) wasn’t ready to go before the end of the week, so Congress passed an even shorter-term continuing resolution (H.R. 2017) that funds the government until tomorrow, October 4th.
Every weekend, I hunch over my computer and update key records in the database of WashingtonWatch.com, a government transparency website I run as a non-partisan, non-ideological resource (disclosure: it’s my own, not a Cato project). Then I put a summary of what’s going on into an email like this one (subscribe!) that goes out to 7,000 or so of my closest friends.
Last weekend, the Library of Congress’ THOMAS website, which is one of my resources, was down a good chunk of the time for maintenance. Even after it came up again, some materials such as bill text and committee reports weren’t available. (They had come up by the wee hours this morning.) Maintenance is necessary sometimes, though when the service provider I use for the WashingtonWatch.com email does maintenance, it’s usually for an hour or so in the middle of a weekend night.
But when I went to update the database to reflect last week’s passage of H.R. 2017, I could find no record of its public law number. When a bill becomes a law, it gets a public law number starting with the number of the Congress that passed it and then a sequential number, like Public Law No. 112-29. The Government Printing Office’s FDsys system lets you browse public laws. At this writing, it isn’t updated to reflect the passage of new laws last week. When THOMAS came back up, its public laws page also had no data to reflect the passage of that continuing resolution last week (and still doesn’t, also at this writing).
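The public law number format just described is regular enough for a computer to split apart. Here is a minimal sketch, assuming citations are written the way the example above is ("Public Law No. 112-29"). The pattern and function name are my own, and other citation styles would need their own handling.

```python
import re

# Matches citations shaped like "Public Law No. 112-29" or "Public Law 112-29".
# Other forms are not handled in this sketch.
PUBLIC_LAW = re.compile(r"Public Law(?: No\.)? (\d+)-(\d+)")


def parse_public_law(citation):
    """Return (congress, sequence) as integers, or None if no match is found."""
    match = PUBLIC_LAW.search(citation)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


congress, sequence = parse_public_law("Public Law No. 112-29")
```

With the Congress number and the sequence number pulled out as data, a site can sort laws, spot the gaps, and notice right away when a newly signed law has not yet shown up in an official source.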
There is barely any news reporting on humdrum details about governing like the passage of a law expending $40 billion in taxpayer funds. (That’s about what H.R. 2017 spends to operate the government four more days, roughly $400 per U.S. family.) Where can you confirm with an official source that this happened?
The winning data resource this week, if by default, is Whitehouse.gov, which has a page dedicated to laws the president has signed. That page says that President Obama signed four new laws on Friday (Sept. 30). When might FDsys or THOMAS reflect this information? It’ll happen soon, and that data will start to propagate out to society.
But I think that’s not soon enough. A couple of days’ delay is a big deal.
If I were to take $400 in cash out of my bank account at an ATM, I could review that transaction from that instant forward on my bank’s website. If I had a concern or even a passing interest, I could just go look. That is an utterly unremarkable service in this day and age.
But it’s remarkable that such a service doesn’t exist in systems that are as important as our bank accounts. When Congress and the president pass a bill to spend $40 billion, the fact of its passage is pretty much undocumented by any official sources until enough Mon-Fri, 9-to-5 work hours have passed.
In my recently published paper, Publication Practices for Transparent Government, I go through the things the government should do to make itself more transparent (thus improving public oversight and producing lots of felicitous outcomes). A practice I cite is “real-time or near-real-time publication.” Why? Because then any of the 300 million Americans who have an interest, real or passing, can see what is happening with their money as it happens, just like they can with their bank holdings. People like me (and many more) can propagate complete and timely information, making it that much more accessible.
When you’re talking about a potential audience of 300 million people and $40 billion in expense (one of the tiniest spending bills; others are much larger), it is not too much to ask to have the data published in real time.
I don’t expect a lot of people to join me at the barricades with pitchforks and torches on this one. Government transparency is an area ruled by implicit demand. People don’t know what they are missing, so they don’t know to suffer a sense of deprivation. I do that for them—all of them. (Heroic, idn’t it?)
Before too long, though, the government’s opacity will be recognized as a contributor to the public’s general—and strong—distaste for all that goes on in Washington, D.C. The idea of spending $400 per U.S. family without documenting every detail of it on the Internet will seem as absurd as waiting until the end of the month to see what happened in your bank account.
A ‘Soviet-Style Power-Grab,’ to Squelch Bad Press for ObamaCare
The Department of Health and Human Services has released new guidelines on communications between department employees and the media. The guidelines evidently require all communications to be approved by the Assistant Secretary for Public Affairs. Also: no off-the-record communications.
The media are not happy. The editor of FDA Webview & FDA Review writes (via Poynter; more here):
The new formal HHS Guidelines on the Provision of Information to the News Media represent, to this 36-year veteran of reporting FDA news, a Soviet-style power-grab. By requiring all HHS employees to arrange their information-sharing with news media through their agency press office, HHS has formalized a creeping information-control mechanism that informally began during the Clinton Administration and was accelerated by the Bush and Obama administrations. The U.S. now takes a large step toward joining other information-controlling countries like my native Australia, where government employees who talk with the news media without permission commit a federal crime. I came to the U.S. in 1974 to escape this oppression.
The HHS guidelines once again show that the purpose of a public information office is not to disseminate information to the public but to withhold information from the public.
Since this came on the heels of an HHS official announcing that the agency is scuttling ObamaCare‘s long-term care entitlement, a.k.a. the “CLASS Act,” one wonders if there is a connection. Or maybe HHS is just motivated by a general fear that the more the public learns about ObamaCare, the less we will like it.
(Update: Turns out, HHS released their new guidelines the same day that agency official voiced his opinion about the future of the CLASS Act. HT: Chris Jacobs.)
Congress on Transparency: ‘Needs Improvement’
“Needs improvement” is the understated theme of a Capitol Hill briefing this morning entitled “Publication Practices for Transparent Government: Rating the Congress.” (Live-streamed starting at 9:00 am. If timely, check it out—the video will come up before too long also—and join the conversation on Twitter at the #RateCongress hashtag.)
Congress needs to improve its data publication practices if it’s going to be the transparent legislature that it should be.
How did we arrive at this conclusion? We’re doing more than stating the obvious.
A Cato Briefing Paper released today entitled “Publication Practices for Transparent Government” goes through some technically challenging but essential concepts in data publication: authoritative sourcing, availability, machine-discoverability, and machine-readability. Together, these practices will allow computers to automatically generate the myriad stories that the data Congress produces have to tell. Following these practices will allow many different users to put the data to hundreds of new uses in government oversight.
At the event, we’re releasing informal grades that rate how each of the major parts of the legislative process is published as data. To produce the grades, we constructed a “data model” of formal federal legislative processes (HTML version, Word version).
Rating Congress on Transparency
Tomorrow morning, I’ll be officially releasing a paper entitled “Publication Practices for Transparent Government” at a Hill briefing entitled “Publication Practices for Transparent Government: Rating the Congress.”
If you’re a smart and savvy Internet user, you probably noticed that the paper is there at the first link above, unofficially released just for you. This qualifies you to read it and get some of the fascinating and different technical aspects of transparency.
This is all a teaser for our release tomorrow of “grades” on how Congress is doing with publishing data about the essential parts of its legislative work. For that, you’ll have to attend the event or watch it live-streamed (here, commencing at 9:00 Eastern with remarks from House Oversight and Government Reform Committee Chairman Darrell Issa (R-CA)).
If you like transparency—and chances are you do—you can help spur discussion tomorrow (or even today) using the hashtag #RateCongress, along with, of course, #transparency. (Don’t know what a hashtag is? Well, here’s a little help.)
Despite good faith efforts on the part of the Obama administration and congressional leaders, government transparency hasn’t flourished as it could the last few years. The paper, event, and “report card” are intended to spur progress on that front.
Transparency is interesting not only technically and administratively, but ideologically. Libertarians and conservatives believe it will expose waste and corruption, fomenting downward pressure on the size and scope of government. Liberals and progressives believe transparency will expose waste and corruption, validating many government programs and roles.
I say let’s get on with exposing waste and corruption, so we can find out what happens next!

