Many of you know that part of the reason for the decline in my online presence is that I’ve been hard at work for several months now on a few book chapters, the most recent pair about searching for information to support evidence-based practice in dentistry. The one I’ve been trying so hard to finish right now is on searching PubMed. As part of this, I am trying to give a little bit of background on where PubMed comes from, to explain why certain features work the way they do now, sort of how evolution and early constraints shape the later versions of the tool. For this, even though I rarely spend more than a sentence or two on any specific piece of history, I am searching for articles and content to validate dates of when I think things happened, and similar sorts of proof to support what I’m saying. God forbid I cite the evidence, eh? (Yes, that’s sarcasm, or irony, or something along those lines.)
At one point last week I was searching for information about the origins of MeSH, and was delighted to discover a link on the MeSH homepage for their online exhibit about the 50th anniversary celebrations for MeSH.
Unfortunately, it was a dead link, which surprised me. When did the history of MeSH and the 50th anniversary celebration become “grey literature”, or rather simply lost? Well, last week. I sent a quick email off to Customer Service at NLM on July 31st, and received a reply the following day. To my complete surprise, the reply stated that the link was to old content that had been deleted from the site, and the link to the content should have also been deleted. “The link was meant to be removed but we have the contents as pdf files.” True to their word, they promptly deleted the link from the page.
I asked why, and was told it is part of their policy to keep web content fresh and lively, as is true of so many other organizations.
Alright, yes, that is a good idea in general, and it is official policy, and there are good reasons for it, but … but … but … how on earth is someone supposed to know that such content ever existed, or that it was preserved as PDFs? How would someone discover that it existed to even ask for a copy? Don’t we want copies of information of interest about the history and origins and evolution of our profession? MeSH is so inextricably intertwined with medical librarianship that it seems to me essential to preserve not only this information but also ready access to it, DISCOVERABLE access.
I understand that the persons involved are simply doing their job the best they can and as they have been instructed to do it. I am not blaming them (which is why I am not giving any names). I see this as a symptom of a broader problem at a higher level. Policies of that sort are usually developed by and for the workflows of “webmasters, IT staff, and those program officials responsible for web content.” Personally, I find it shocking, perhaps even distressing, that a library, especially a library of the caliber of the National Library of Medicine, would choose to honor a well-intended policy that diminishes access to useful public information rather than attempt to inform policy makers of the impact and to try to inject some insight and perspective into the policy reformation process. But that is my perspective, and possibly only mine.
The official guidelines from the National Archives and Records Administration (NARA) do include policies that allow for the retention of information as well as for the disposal of information, or, as they put it, “records that have been destroyed.” Those NARA guidelines focus on trust, risk, mitigating risk, and responsibility. The guidelines include answers to such questions as “Does managing agency web sites as Federal records mean that I must keep all page changes for a long time?” That was a particularly interesting answer, also.
Q: Does managing agency web sites as Federal records mean that I must keep all page changes for a long time?
A: No. As MANAGING WEB RECORDS and SCHEDULING WEB RECORDS discuss in greater detail, your agency business needs, including the risks to the agency programs and mission should the information not be available, are the major factors in determining how long you need to keep those pages. Your web site schedule specifies the length of time you need to keep pages.
There are some very useful thoughts and considerations in these documents, even though they were drafted in January 2005 and have not been thoroughly updated since then. [ASIDE: There was an addendum issued in 2010 on "recent web technologies" including blogs and wikis, which expires October 31, 2013, so hopefully we'll soon see something more in keeping with the current state of web technologies and trends.] The part that most interested me right now was how they archive content (they recommend “spiders” and “web snapshots”), and how they determine what content should be archived.
NARA Guidance on Scheduling Web Records: How are retention periods for web site-related records determined? http://www.archives.gov/records-mgmt/policy/managing-web-records-scheduling.html#retention
“[T]he agency needs to assess how long the information will be needed to satisfy business needs and mitigate risk, taking into account Government accountability and the protection of legal rights. If specific web content is available in places other than the web, consider whether the existence of the information in other records affects the retention needs for the web records. In the case of information unique to the web site, the web version is the only recordkeeping copy.” NARA Guidance on Scheduling Web Records.
Note especially, “the case of information unique to the web site.” The question becomes how valuable and relevant that information is over time, and how worthy of preservation. There is other information about the history of MeSH. There is the valuable but brief introduction from NLM, duplicated in the MeSH Preface, and a 2006 variant of the same text.
NLM: History of MeSH: http://www.nlm.nih.gov/mesh/intro_hist.html
As part of the 50th anniversary celebration, there is an online copy of the first volume of MeSH, which I discovered only through a brief blogpost from the NNLM Southeastern/Atlantic Region.
Regarding web-searchable content of the actual 50th celebration itself, we are primarily reduced to the video from the presentation (lacking the transcript, and not located in YouTube for sharing or embedding); an announcement in the NLM Technical Bulletin; and myriad blogposts referencing the now defunct website.
Robert Braude. MeSH at 50 – 50th Anniversary of Medical Subject Headings (The impact of the Medical Subject Headings (MeSH) vocabulary on access to biomedical information.) http://videocast.nih.gov/Summary.asp?File=16292
50th Anniversary Medical Subject Headings (MeSH®) Event. November 02, 2010 [posted]. NLM Technical Bulletin 2010 NOVEMBER–DECEMBER No. 377.
I did search in Google for the actual title of Dr. Braude’s presentation (“MeSH at 50 or Should It Now Join AARP”), and found one hit, from a chemical industry page evidently created by scraping the web through a spider and still online.
Oh. Dear. He did such a splendid presentation, and now we can’t even find out that he had done it.
“As with other agency records, most web records do not warrant permanent retention and should be scheduled for disposal in accordance with the guidance provided above. In instances where NARA determines that a site or portions of a site has long-term historical value, NARA will work with the creating agency to develop procedures to preserve the records and provide for their transfer to the National Archives.” NARA Guidance on Scheduling Web Records.
Was the MeSH 50th Anniversary content archived with NARA? I don’t know. I don’t know how to find out. I did have an idea for how to find what was missing. If I can’t find the government’s information from the actual government, if I can’t trust the government to keep available the information I need or want from them, I look in the Internet Archive. The Archive is not a government organization. They are “a 501(c)(3) non-profit that was founded to build an Internet library.” What happens when the Archive runs out of money, I don’t know. I will say the idea scares me.
Meanwhile, I was able to find an archived copy of the main page before the link was deleted.
Archive.org: NLM: MeSH: http://web.archive.org/web/20130727172025/http://www.nlm.nih.gov/mesh/
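For anyone who wants to build or take apart links like the one above, here is a minimal sketch in Python. It assumes only the pattern visible in that link: the Archive's prefix, a 14-digit timestamp (year, month, day, hour, minute, second), and then the original address. The function names are my own inventions for illustration, not anything the Archive provides.

```python
from datetime import datetime

# Prefix copied from the archived link above.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_url(timestamp: str, original_url: str) -> str:
    """Join a 14-digit snapshot timestamp and an original URL (hypothetical helper)."""
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def split_wayback_url(url: str) -> tuple[str, str]:
    """Split an archived link back into (timestamp, original URL)."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a web.archive.org snapshot link")
    rest = url[len(WAYBACK_PREFIX):]
    # The first slash after the prefix separates the timestamp from the page address.
    timestamp, _, original = rest.partition("/")
    return timestamp, original


def snapshot_time(timestamp: str) -> datetime:
    """Turn the 14-digit timestamp into a datetime for display."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    link = wayback_url("20130727172025", "http://www.nlm.nih.gov/mesh/")
    print(link)
    stamp, page = split_wayback_url(link)
    print(page, "as captured on", snapshot_time(stamp).date())
```

Read this way, the link above is a capture of the MeSH homepage made on July 27, 2013, a few days before the dead link was removed.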
Why couldn’t I find this in Google? Because the Archive is part of what is known as the Internet’s “Deep Web.” According to Wikipedia, “The Deep Web (also called the Deepnet, the Invisible Web, the Darknet, the Undernet or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.” Most websites that require you to perform a search to get to their content would be considered part of the Deep Web, especially if the search results do not generate a persistent URL. If the results do generate a permanent URL, then it is possible (although challenging) to create a resource that maps those links to the deep content of the site in a space which is searchable by Google.
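One way to picture such a mapping resource is as a plain page of labeled links that a search engine can crawl. The sketch below is only an illustration under that assumption: the function name `link_page` and the page title are hypothetical, and the single entry is the archived MeSH homepage link given above.

```python
from html import escape


def link_page(title: str, links: list[tuple[str, str]]) -> str:
    """Build a simple HTML fragment listing (label, url) pairs as links."""
    items = "\n".join(
        # Escape both the address and the label so odd characters cannot break the markup.
        f'  <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    )
    return f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    page = link_page(
        "Archived MeSH pages",  # hypothetical title
        [("NLM: MeSH", "http://web.archive.org/web/20130727172025/http://www.nlm.nih.gov/mesh/")],
    )
    print(page)
```

Posting the resulting list somewhere public, such as a blog post, puts the persistent links in a place where standard search engines can index them.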
That’s what I’m going to do now, for the web pages for the MeSH 50th anniversary. I’m doing this because I want to be able to find it again, more easily than it was for me this time. I’m doing this because Robert Braude said important things about MeSH and how it got here, because he gave faces and lively personalities to the people behind this famously dull and detailed masterwork, because he (and the rest of the 50th celebration site) gave a context that I have never seen anywhere else. Here are just a few of my favorite quotes from Braude’s presentation.
“When I received the invitation to speak today on the history of MeSH, I was truly shocked. I wondered how the History of Medicine Division dredged up my name but then I realized — I was NOW history.”
* * *
“Rather I choose to focus on the antecedents of MeSH, the fertile soil prepared by so many from which MeSH grew. These antecedents, shrouded in the dim mist of history, are, I think, of more interest. Revealing them, I believe, will give us a stronger sense of how far back the chain of MeSH development goes.”
* * *
(Quoting Janet Doe) “It is, moreover, economically unsound for all of our individual libraries to be trying to do for themselves what can only be adequately done by experts drudging away tirelessly for years on a fully representative collection of material.”
* * *
“Why MeSH; what were the forces shaping the effort to create such a resource?”
* * *
“Stan Jablonski, esteemed author of the Illustrated dictionary of eponymic syndromes and diseases and their synonyms and the Dictionary of medical acronyms & abbreviations was there, towering above us all physically as well as intellectually. Coffee breaks with Stan were a treat and an education. And I will never forget having to turn in my used pencils at the end of the day to Gus Gillespie since funds were just as tight then as they are now.”
* * *
“One of the problems with the constant changes to MeSH was searching backwards in time, for one needed to know what heading had been previously used.”
* * *
“The issue raised by Claudius Mayer was that there was no way a single authority list for cataloging monographs and indexing the periodical literature could be developed. Wrong Claudius, Dr. Rogers did it with MeSH.”
Here are the links to the Archive’s copy of the MeSH 50th Anniversary pages that have been lost to Google search.
Celebrating MeSH: 50 years of Medical Subject Headings
50 Years of Medical Subject Headings:
Past, Present, and Future Impact on Biomedical Information
Robert M. Braude, MLS, PhD, AHIP, FMLA, FACMI
Thursday, November 18, 2010