Please note this event occurred in the past.
March 13, 2019 9:00 am - March 14, 2019 4:00 pm ET
Data and Software, Other ISSR Workshops, Research Methodology
University of Massachusetts, Amherst | E20 Machmer Hall

Wednesday, March 13, 2019 - 9:00am to 4:00pm
Thursday, March 14, 2019 - 9:00am to 4:00pm

Description: In this two-day short course, participants will learn how to efficiently automate the collection of data from large numbers of websites and text files using R. We will cover the practical and computational issues involved in scraping large amounts of data in a timely manner, as well as potential legal issues and how to address them. The course also emphasizes the skills needed to work at a low level with HTML and text data once it has been collected. The workshop includes a mini-unit on text processing in R and a mini-unit on scraping Twitter using R.

Goals: By the end of the workshop, participants should possess the basic skills necessary to scrape a large amount of web or text data and extract useful information from that data.

Prerequisites: No previous experience with web scraping is required, but participants are expected to be familiar with data management in R at the level covered in the "Data Management in R" short course. These two courses are designed to work in sequence; however, if a participant has a strong R programming background, they should be prepared to step directly into the course.

 

Instructor: Matt Denny is a PhD student in Political Science and Social Data Analytics, an NSF Big Data Social Science IGERT Fellow, and a Data Scientist at Skopos Labs. He holds Master's degrees in Political Science and Resource Economics from UMass Amherst, where he was a statistical methods consultant for ISSR from 2013 to 2015. He has taught a number of workshops on topics ranging from social network theory to big data analytics, and his research primarily focuses on developing statistical models for text, networks, and text-valued networks. You can check out more of his work at www.mjdenny.com.

Questions? For more information about this short course, please contact ISSR Methodologist Jessica Pearlman (jpearlman@issr.umass.edu).

 


To sign up for this workshop click here.

  • Five College Undergraduate and Graduate Students: $150/person
  • Five College Faculty: $250/person
  • Non-Five College Undergraduate and Graduate Students: $275/person
  • Non-Five College Faculty: $400/person

Registration note: The Five Colleges are UMass Amherst, Amherst College, Hampshire College, Mount Holyoke College, and Smith College. Registration for each workshop closes 2 full business days prior to the start date. Payment of registration does not guarantee a spot in these space-limited workshops. ISSR will offer you the choice of placement on a wait-list or a refund for any workshops that are over-subscribed. If paying with departmental funds or personal checks, contact Karen Mason (@email).

Cancellation note: In cases where enrollment is 5 or fewer, we reserve the right to cancel the workshop. In cases where the registrant cancels prior to the workshop, a full refund will be given with two weeks' notice, and a 50% refund will be given with one week's notice. We will not be able to issue a refund in cases where the registrant does not notify us of cancellation at least one week prior to the beginning date of the workshop.