Monday, July 8, 2024 - 1:30pm to 4:30pm
Tuesday, July 9, 2024 - 1:30pm to 4:30pm
In this two-day, six-hour course, participants will learn how to efficiently automate the collection of data from large numbers of websites and text files using R. Topics include working with HTML, text processing with regular expressions, using APIs, data management, and building a simple web crawler, along with the practical and computational issues involved in scraping large amounts of data in a timely manner. Participants will practice applying these skills to real websites during class time.
By the end of the workshop, participants should possess the basic skills necessary to scrape a large amount of web or text data and extract useful information from that data. No previous experience with web scraping is required, but participants are expected to be familiar with R at an introductory level.
Instructor: Alex Karl
Alexander Karl is a PhD student in the Sociology Department at UMass Amherst and holds a certificate in Computational Data Science from the Manning College of Information & Computer Science. He has previously held positions in industry in software engineering and data science roles. Alex specializes in computational social science methods. His research interests include social networks, text-as-data, and internet culture.
REGISTRATION INFORMATION | 6-HOUR WORKSHOP
Important: If you are registering for more than one workshop, please verify that all workshops are in your cart with the correct institutional and career status selected, for accurate pricing.
Five College Students and Faculty
- Five College Undergraduate and Graduate Students and Postdocs: $75/person
- Five College Faculty & Staff: $135/person
Non-Five College Students and Faculty
- Non-Five College Undergraduate and Graduate Students and Postdocs: $150/person
- Non-Five College Faculty, Staff & Other Professionals: $210/person
Registration note: The Five Colleges are UMass Amherst, Amherst College, Hampshire College, Mount Holyoke College, and Smith College. Faculty, students, and staff from the University of Massachusetts Boston, Dartmouth, and Lowell campuses and UMass Chan Medical School pay the Five College rates. Registration for each workshop closes 2 full business days prior to the start date. If paying with departmental funds, contact Sue Falcetti (sfalcetti@umass.edu).
Cancellation note: In cases where enrollment is 5 or fewer, we reserve the right to cancel the workshop. In cases where the registrant cancels prior to the workshop, a full refund will be given with two weeks' notice, and a 50% refund will be given with one week's notice. We will not be able to issue a refund in cases where the registrant does not notify us of cancellation at least one week prior to the beginning date of the workshop.