Yesterday the A2B3 group met for lunch, as we do almost every Thursday (although it is a lucky week when I am able to attend).* The A2B3 group is my model of the most successful kind of social network, one which bridges and blurs the divide between online and offline. A traditional value-added portion of each week’s face-to-face meeting is The Question. At 12:30, either Ed Vielmetti (the official lunch convener) or one of his designated representatives (commonly known as faux-Ed) stands up and poses a question of the day, which then makes the rounds of the table. Yesterday’s question centered around the proposed closure of Data.gov, which I blogged about earlier this week. I asked permission to share my copious notes about the lunch conversation here, and people consented with a surprising amount of enthusiasm.
Faux Ed for the day was Roger Rayle, who introduced The Question in the context of both the potential closing of Data.gov and the Gulf oil spill. Roger had attended that morning’s Risk Science Unplugged event on the Gulf oil spill, where the panelists said there was so much they couldn’t answer and couldn’t say because “We don’t have enough data.” Data is important, and government data especially so, but (as Roger pointed out) a lot of data, even when it’s available, isn’t usable. So let’s talk open data!
Kathy Griswold brought up the open data archive for local governance, A2Docs.org. The conversation slipped over into problems of redaction and privacy and personal safety (especially of children and other vulnerable individuals), especially when documents are contributed by the general public as with this project. The leading suggestion in the conversation was to encourage people to create collections of local governance materials on a topic and link to them from the discussions in ArborWiki.
Gyll Stanford emphasized the view that the Internet is an amazing place, where astounding things are discoverable and once something is available it doesn’t truly disappear. Close Data.gov and the world will step up to the plate and take care of its business. That led to a brief discussion of who is already stepping up to do this, and Google’s Freebase community of people collecting and sharing open data.
Steve “I trust little of the data on the web” Cornell described issues of authenticity, authority, reliability, accuracy, and credibility, with an emphasis on the need for checks and balances through curation and oversight. This was heartily and enthusiastically endorsed by the rest of the table.
I (PF Anderson) mentioned more about Google Freebase and that librarians are actually pretty good at curation (hint, hint), noted the need to be aware of potential bias in curated collections, and urged people to make overt the need for balanced and unbiased collections that show a range of information resources.
Steve Pierce basically said, “Whoa, folks,” expressing the view that when government doesn’t have the resources to manage the data itself, or when the amount of data is beyond the resources of a local government, partnership with local companies and individuals can be of great value to both. He then described a five-year collection of videos of Ypsilanti government meetings that he is working with Archive.org to have archived. The two main observations from the peanut gallery were (1) “If the government isn’t doing it, whose job is it?” and (2) collaboration with the public and industry is fine as long as there is some assurance of PRESERVATION, archiving, and long-term oversight.
Dennis Tokarski was concerned about the balance between how tax cuts are being designed and the cutting of services such as Data.gov that are core to an economic recovery. Questions raised included:
– Is FOIA going to become the main route to government data?
– How are the interests of citizens protected if private concerns become the gatekeepers of public data?
– Is public data a public good? and if so, shouldn’t it be accessible to the public at cost, if not for free?
Another comment from the table was about repeated efforts to privatize NOAA weather data, which was met with a table-wide outcry of disbelief. Dennis’s phrase (which was the popular phrase of the day) was “bogus to the bone.”
Rick Adler of Future Forward is an archivist by training, so these were all topics near and dear to his heart, for which he expressed great passion. He began with the estimated ten-year shelf life of CDs and DVDs used for data storage (disbelief was expressed around the table, so this will be a separate blogpost topic). This shifted into the very practical, real-world limitations and costs associated with creating and supporting data archives, with an emphasis on data migration, backlogs in processing, the need for appropriate metadata, and curation and verification of the authority and reliability of the data being preserved. Basically, you might think of this as data husbandry. He referenced Barbara Hegstrom’s view that because these issues do not yet have a good solution in place as common practice, we are living in the middle of what may very likely end up being a 30-year Dark Ages, in which content was created digitally but lacked the necessary means of archiving. “People will know more about 1950 than they will about 2011, or even 1980.”
Dan Romanchik made the astute observation that, since this is really all about government spending, if you want the biggest impact on how your tax dollars are being spent and used, pay very very close attention to your local politics and local elections.
Jean (last name not caught) began her academic and professional career as a statistician before realizing that a great part of statistics is “cooking the data.” She emphasized the essential role of access to original data sources in verifying what someone is actually saying with data.
Barbara Bergman relayed a curious tale of local government data (check stubs) kept in an online public archive but without the contextual information that would make it meaningful. This led to a discussion about the differences between data and meaning, the impact of costs on these archives, and the need for governments to charge users of the data what they can in order to support the archives. The most interesting observation from the table (or from Barbara?) was the idea of creating an information-path tool that would allow minutes posted online to be linked to the decisions that ultimately came from those discussions and meetings. There was a side conversation about elections, districts, gerrymandering, and a gerrymandering boardgame.
Dan Friedus made a pointed and critical observation that while certain data crucial to the functioning of society really needs to be open, there are other types of data that may be luxuries — nice to have, but not essential. Cost savings in data archiving may be dependent on priority setting.
Ed Vielmetti pointed out that open data that is not delivered to the stakeholders is a process failure. One example was a map showing firefighter response times to different parts of the city that included only data from the city itself, omitting data from partnerships and collaborations with the fire departments of other local townships, thus creating a perception of danger and urgency that does not actually exist. Another example, offered by someone else, was high school crime data used to justify a video camera system, even though the data may or may not show that a surveillance system is actually a deterrent.
Round the table discussion tied up with a general overview of the limitations of open data and data in general — politics, agendas, bias, costs, redaction, and more. An opinion was expressed that the reason Data.gov is being targeted for closure is because it has been too useful and too important, and that government is now seeking to reverse the earlier stance on transparency. Personally, I hope very much that this is not true.