Livetweets from Sharing Research Data

I had really tried hard to make it to the Sharing Research Data session today, with absolutely no luck rearranging my schedule. Then, completely unexpectedly at the last minute, voilà! There I was. The speakers included Dr. Philip Andrews, Alex Kanous, Felicia LeClere, and my colleague Greg Grossmeier. There was a video camera there and slides and such, so I have every hope that more information from the session will be forthcoming. Also, please remember that I was just an audience member taking abbreviated notes via Twitter, so don’t take anything here as a quote from the speakers. I try hard, but may have misinterpreted something here or there. Given that there were legal folk on the stage, I want to be clear that I’m not quoting them word for word even if it looks like it. Also, since the transcript was collated by the What The Hashtag tool, shift the timestamps back four hours to get the true local time (the event started at around 2:30pm EDT). Those caveats in place, I hope this is useful!
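If you would rather not do the four-hour shift in your head, here is a minimal sketch of the conversion in Python. The function name `to_local` and its `hours_back` parameter are my own inventions for illustration, and it assumes the timestamps look exactly like the ones in the transcript below (for example `6:43 pm`) and that the machine uses English AM/PM markers.

```python
from datetime import datetime, timedelta


def to_local(stamp: str, hours_back: int = 4) -> str:
    """Shift a transcript time such as '6:43 pm' back by hours_back hours.

    Hypothetical helper: the four-hour default comes from the note above,
    not from anything the What The Hashtag tool itself documents.
    """
    # Parse the 12-hour clock time; the date part defaults to 1900-01-01.
    parsed = datetime.strptime(stamp.strip().upper(), "%I:%M %p")
    shifted = parsed - timedelta(hours=hours_back)
    # Format by hand so the hour has no leading zero, matching the transcript.
    hour = shifted.hour % 12 or 12
    suffix = "am" if shifted.hour < 12 else "pm"
    return f"{hour}:{shifted.minute:02d} {suffix}"


if __name__ == "__main__":
    for stamp in ("6:43 pm", "7:59 pm", "8:31 pm"):
        print(stamp, "->", to_local(stamp))
```

Only the time of day is handled here; a shift that crosses midnight would also change the date, which this sketch ignores.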

July 27, 2010
6:43 pm pfanderson: Philip Andrews: Resources for data sharing, tranche & #datashare
6:44 pm WARNING: Live-tweeting: Sharing Research Data: Perspectives from Campus Community #UMich #datashare
6:46 pm PA: Peer-review in our field badly broken, & it is impacting on the profession, data validation imp #datashare Economics push data reuse.
6:47 pm Frustration getting data out to international peers #datashare Systems lacked ease of access, data structures, Heidorn 08 > “dark data”
6:48 pm PA: “Even if you build an archive, it doesn’t mean it will be useful” Like a roach motel – goes in, doesn’t go out #datashare (D.Salo 08)
6:50 pm PA: Researchers: Sensitivity to errors, intellectual rigor, credit, competitive. Need: incentives, guidelines, sharing resources #datashare
6:54 pm PA: Lack of data sharing results: assertions w/o data, breakdown in peer-review, data rarely reused. False positive problems #datashare
6:55 pm PA: Paris Guidelines for Publication for Proteomic Data #datashare
6:57 pm PA: About the Tranche Project #datashare distributed repository NOT a database. Provenance & fidelity inherent
6:58 pm PA: Data licensing – very cool. >12.6 *terabytes* of data in tranche. Integrated w/ ProteomeCommons #datashare
6:59 pm PA: Tranche has captured HALF of the proteomics researchers in the world! #datashare
7:00 pm PA: Uses: develop test new algorithms; peer review; re-analysis; new experiments; aggregate 4 stat signif; testsets 4 software #datashare
7:02 pm PA: #datashare makes reuse much higher (~4K downloads). Upsurge since new MPC requirements. >9K datasets deposit Restricted license REDUCES use
7:04 pm PA: ProteomeXchange #datashare pushes raw data to many locations, web of databases
7:05 pm PA: repositories w/o annotation aren’t good enough. Incentives & annotation are crucial & big problems #datashare
7:07 pm PA: What annotation is needed? What original investigator needs? Subsequent researchers? Bioinformatician? Controlled vocab? #datashare
7:08 pm PA: He makes a joke about investigators cursing when the bioinformatician asks for detailed metadata about their data #datashare
7:09 pm PA: w00t! Built a social network model for project management of #datashare data sets. Annotation manager allows users to nudge researchers
7:14 pm Next up: Alex Kanous – Data Sensitivity Analysis: Data Sharing & Security Framework (DSSF) #datashare
7:16 pm AK: caBIG Cancer Biomedical Informatics Grid #datashare includes tools for analysis
7:18 pm AK: #datashare Tech Tools: caGrid, Clinical Trials Suite, more. Legal constraints: DSSF policies. Great phrase > “Trust Fabric”
7:19 pm AK: SOCIAL CONSTRAINTS (researcher caution/resistance) addressed by education & incentives #datashare
7:19 pm AK: DSSF: #datashare
7:23 pm AK: #datashare audiences include IRBs, Privacy officers, tech transfer folk, etc. Local resource
7:25 pm Now speaking: Felicia LeClere ICPSR new funders very concerned with data sharing issues. Focus on human subjects issues #datashare
7:26 pm FLC: What about legal/ethical issues in #datashare ? Waste of NOT sharing data is huge, normative culture evolving.
7:29 pm FLC: Social & behavioral science have special issues in #datashare >> human issues, privacy, regulatory climate, climate of fear
7:31 pm FLC: “The Common Rule” & HIPAA #datashare
7:32 pm FLC: Deidentification is only part of the picture. Data on folks who’ve died is easier. FERPA has its own issues, changed in ’04 #datashare
7:35 pm FLC: Protection in the research process? Informed consent can restrict #datashare Need 2 plan 4 datasharing BEFORE, reconsent-ugh #datashare
7:36 pm bacigalupe: Social & behavioral science have special issues in #datashare > human issues, privacy, regulatory climate, climate of fear @pfanderson
7:37 pm pfanderson: FLC: IRBs have different standards in different places, often consider #datashare problematic. Imp NOT to be cavalier w/ data, educate IRB
7:38 pm FLC: Disclosure risk: risk of people being re-identified after release of data. Uh oh. #datashare Horror stories of phone #s sent via email
7:38 pm bacigalupe: Live-tweeting by @pfanderson : Sharing Research Data: Perspectives from Campus Community #UMich #datashare //thanks
7:39 pm pfanderson: FLC: Example of disclosure risk: only Vietnamese oncologist in small town in Wyoming. Deductive disclosure. #datashare
7:42 pm FLC: Complicated issues in preventing deductive disclosure, how to identify risk / mask, add noise, convert format, license #datashare
7:44 pm Now speaking: Greg Grossmeier. Reproducible science = good science. Open data + open source/standards = reproducible science #datashare
7:46 pm GG: articles w/ shared data cited WAY more than those w/ restricted data. People trust you more. He shows detailed stat analysis #datashare
7:48 pm GG: Empirical observation is NOT protected by copyright law. Data is typically not protected per se. EG: Feist suit. #datashare
7:49 pm RT @jasonriedy Yup. To add noise, U need a gd model 4 data, but obtaining a good model wants data released. Biting me everyday. #datashare
7:52 pm GG: Attribution does not equal citing. It does not equal social norms. What is socially required may be diff than what is legal #datashare
7:53 pm GG USA: Fact vs collection; collection vs creative choice. IE. a collection of Mother Goose > order protected, not poems #datashare
7:54 pm GG: Aha! Creative Commons Zero (0) means you waive ALL your rights. Effective public domain from day 1. #datashare Doesn’t include privacy
7:57 pm GG: The open database license applies ONLY to database, not to the individual items. IE. Flickr. images yrs, resource theirs #datashare
7:59 pm Q&A starting for #datashare event
8:00 pm Asked FLC @jasonriedy ‘s question. Adding noise needs to be done by original researcher. Is it worth sharing once masked? #datashare
8:02 pm FLC: @jasonriedy She agrees this is real prob, prefers jumping thru hoops to get her own access to the pure data, build own model #datashare
8:05 pm GG: Example: photo of dr dissecting leg. Not choices, designed 2B clear, not art. More accurate is less protectable by copyright #datashare
8:06 pm “Someone will scoop me if I put data out” “someone will infringe on my data” “I will infringe if I use others’ data” #datashare
8:08 pm A: FLC Paranoia exists. Lots of anecdotes. Driven by tolerance for risk, career anxiety. #datashare Need to gather data to address fears
8:09 pm A: FLC: Concerned that fears will drive conservative reverse over-reaction from government. #datashare
8:09 pm PA: PIs concerned that they accidentally release data to public before their article is accepted. Managing that process w/in tool #datashare
8:11 pm PA: Technologies R always changing. New types of data arise, new formats. Problem: this leaves behind info that isn’t in standard #datashare
8:12 pm PA: EG. standards can’t do everything, so perhaps don’t include information on the instrument used to gather data #datashare
8:13 pm FLC: Transformation of how research is done, driven by the research community itself. #datashare
8:14 pm rivenhomewood: RT @pfanderson: GG USA: Fact vs collection; collection vs creative choice. IE. a collection of Mother Goose > order protected, not poems #datashare
8:15 pm pfanderson: FLC: Mentions Harvard MIT Data Center, w/ ICPSR, includes preferred citation format in the data download #datashare
8:18 pm Q: When he heard abt NSF req thought it meant write another 2 paragraphs, now realizes #datashare much bigger issue, needs to know more
8:20 pm FLC: Caution abt putting additional effort into formulating/documenting yr #datashare process until requirements are more clear.
8:23 pm Discussion about whether making your own website to push out your own data is good enough. Probs w/ longterm maintenance/support #datashare
8:24 pm Q: I want to know that I can die and my data will be preserved and still available. A: FLC: That’s why I said ASCII 🙂 #datashare
8:25 pm bacigalupe: Harvard MIT Data Center w/ ICPSR includes preferred citation format in data download #datashare -via @pfanderson
8:25 pm pfanderson: PA: Longterm preservation of data requires institutional backing. Bigger than #datashare for research w/o preservation. Data Darwinism
8:30 pm RT @jasonriedy We deploy a common reverse engineering structure. Person w/access determines reqs, person w/o implements & models. #datashare
8:30 pm @jasonriedy That is fascinating. I think that needs its own article just to document the process! #datashare
8:31 pm Video will be available in DeepBlue sometime soon. I’ll post when I hear about it. #datashare
8:31 pm #Datashare session is OVER! Packing up and headed home.


