Dr. Scott Long Keeps Us From Drowning in Data Thanks To The Workflow of Data Analysis

The Center for Research on Families hosted Dr. Scott Long as part of its Methodology Series on March 7, 2013. Dr. Long, a Distinguished Professor of Sociology and Statistics at Indiana University, gave a presentation entitled “Drowning in Data? The Workflow of Data Analysis.” He has published numerous scholarly journal articles and books on how to improve data analysis systems, including his most recent book, The Workflow of Data Analysis Using Stata.

Long’s lecture focused on methods that researchers can use to organize, display, and house their data. But why should we spend time on organization instead of the research itself? Long says that replication is essential to good science and that it is more important to have data that is replicable than it is to get the right answers. Although this seems straightforward, as a consultant Long found himself advising on easy, simple, systematic questions instead of complex questions about the research itself. This led Long to emphasize that every researcher needs a workflow: a way of organizing your data.

He highlighted multiple ways to organize data, such as using consistent file names, including dates, using metadata properly, creating a “library” of files, and maintaining a system for cleaning up files. These suggestions are often especially difficult to implement when collaborating with other researchers or graduate students, which makes a consistent workflow even more essential. If collaborators each follow a different workflow, there are countless ways that data can be labeled, renamed, analyzed, and saved. By taking the time to agree on a system, these inconsistencies can be eliminated and collaboration can be effective instead of problematic.

Although the audience came from a wide variety of backgrounds, from managers of multi-million-dollar grants to those who need to clean up their Word documents, Long provided lessons applicable to everyone. He jokingly brought up “Long’s Law,” which states:

It is faster to document it today than tomorrow.
Addendum 1: Nobody likes to write documentation.
Addendum 2: Nobody regrets having documentation.

Documentation is crucial in every research discipline and is necessary for replicable, organized data. If we’re smart, we will all strive to create a workflow that properly organizes, protects, and backs up our information. If we can create a system that we’ll actually use, Long says it will be well worth the pain and we’ll be able to prevent ourselves from drowning in data.

To see the strategies Scott Long outlined in his presentation, download a .pdf version of his slides.

Scott Long will also be giving a week-long workshop from June 17 to 21, 2013, as part of the CRF Methodology Series. Click here for more information.