See original posting here.

As the National Digital Stewardship Resident at the American Museum of Natural History, I was introduced to the very specific problems facing museum librarians and archivists not only by observing the Research Library, but also by speaking individually with some of the most intensive data creators at the Museum. As part of my larger needs assessment project at the Museum, I created a semi-structured interview guide that I used to open a targeted dialogue with scientific staff members, covering all aspects of their digital research and collections data. Topics included the volume of their data, its rate of growth, format types, necessary software and hardware support, management practices, and opinions on preservation of their data (i.e., which data they believe are important in the long term). I interviewed close to 60 staff members in total, including all the curators in the five Science divisions at the Museum: Anthropology, Invertebrate Zoology, Paleontology, Physical Sciences, and Vertebrate Zoology.

During the course of my analysis, I discovered not only the sheer volume of data (with a substantial number of curators generating many terabytes a day!) but also the diversity of that data, both for research purposes and within collections. This is a big data problem that many research museums are facing. At the AMNH, diversity of data is found not only in the macrocosm of the Museum’s five Science divisions, but also at the level of each curator and research methodology.

The NDSR mascot, Inez the DigiPres Turtle, looking in on a CT scanner scanning a monkey's skull at AMNH.

After gathering this interview data, I was tasked with analyzing it in order to make recommendations in a larger final report on three essential categories: storage, management, and preservation of digital research and collections data. A related deliverable of my project was a report on solutions other museums have developed for curating their in-house research and collections data. This environmental scan showed that few natural history museums in the United States take an institutional approach to solving this challenge, largely due to resource constraints. A popular institutional solution for collections data is Arctos, the community-driven multidisciplinary collection management information system that was developed as a collaboration among multiple institutions and currently holds three million natural history museum records. For research data, however, fewer such solutions exist in the natural sciences, and those that do are still in development. The National Museum of Natural History and the British Natural History Museum are both growing their digital preservation programs by building institutional repositories to house their respective research data.

As I continued to develop my AMNH-specific recommendations for storage, management, and preservation of digital research and collections data, I remained cognizant of the community implications. This final report is still a working document, now totaling over 100 pages. It is my hope that by at least publicly releasing my semi-structured interview guide (which will be in my public NDSR report, to be released in the coming weeks), I can help other natural science museums pursue the same needs assessment procedure to understand the extent and scope of their own digital data and, in doing so, have the opportunity to advocate for and educate about digital preservation in their own institutions. Only when there is institutional support can larger community-driven resources be developed and the risk of data loss minimized.