This intensive workshop covers the workflow of data analysis. Workflow encompasses the entire process of scientific research: planning, documenting, and organizing your work; creating, labeling, naming, and verifying variables; performing and presenting statistical analyses; preserving your work; and, critically, producing replicable results. Most classes in statistics focus on estimating and interpreting models. In "real world" research, these activities often account for less than 10% of the total work. This workshop is about the other 90%.
Lectures will show you how to develop a workflow guided by the demands of producing replicable and accurate results while working as quickly and efficiently as possible. In the lab sessions, you will be able to apply these principles to your own work. Topics to be covered include:
- General principles that guide your research: replicability, accuracy, and efficiency.
- Efficient methods for planning, organizing, documenting, executing, and preserving your work.
- Tools that enhance and simplify your work: software, simple but powerful methods to automate routine tasks, organizational structures that save time, and cyberinfrastructure to simplify your work and preserve your files.
- Simple programming in Stata to improve accuracy and greatly simplify data management.
- Real-world examples of what works and what does not in each stage of the research process:
  - Planning and organizing research.
  - Preparing data for analysis: importing data; developing consistent names and labels; documenting the sample and variables; and cleaning the data.
  - Conducting sophisticated data analysis that is replicable and efficient.
  - Accurately and quickly incorporating statistical results into your writing and presentations while maintaining the provenance of each result.
  - Methods to speed up the inevitable task of revising your work.
  - Ways to prevent catastrophic loss of files during the project and to ensure long-term preservation of your materials.
While many software tools are illustrated, Stata is the primary package for data management and analysis. The course uses Long (2009), The Workflow of Data Analysis Using Stata.
J. Scott Long
Distinguished Professor of Sociology and Statistics, Indiana University
This workshop is offered in collaboration with the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan. To register, go to the ICPSR Registration Website.