The Third Annual Journal of Information Technology & Politics Conference
May 16–17, 2011, University of Washington, Seattle, WA
JITP 2011 Speaker:
John Wilkerson, University of Washington
Title: "Tradeoffs in Accuracy and Efficiency in Supervised Learning Methods"
Abstract:
Text is becoming a central source of data for social science
research. With advances in digitization and open records practices,
the central challenge has in large part shifted away from
availability to usability. Automated text classification
methodologies are becoming increasingly important within political
science because they hold the promise of substantially reducing the
costs of converting text to data for a variety of tasks. In this
paper, we consider several questions of interest to prospective
users of supervised learning methods, which are appropriate for
classification tasks in which documents are assigned to known, predefined categories. For the
right task, supervised learning methods can dramatically lower the
costs associated with labeling large volumes of textual data while
maintaining high reliability and accuracy. Information science
researchers devote considerable attention to comparing the
performance of supervised learning algorithms and different feature
representations, but the questions posed are often less directly
relevant to the practical concerns of social science researchers.
The first question prospective social science users are likely to
ask is: how well do such methods work? The second is likely to be:
how much do they cost in terms of human labeling effort? Relatedly,
how much do marginal improvements in performance cost? We address
these questions in the context of a particular dataset, the
Congressional Bills Project, which includes more than 400,000
labeled bill titles covering 19 policy topics. This corpus also provides
opportunities to experiment with varying sample sizes and sampling
methodologies. We are ultimately able to locate an
accuracy/efficiency sweet spot of sorts for this dataset by
leveraging results generated by an ensemble of supervised learning
algorithms.
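The abstract does not say exactly how the ensemble results are used. One common approach, assumed here purely for illustration and not necessarily the procedure in the paper, is to accept an automated label only when enough classifiers agree on it and to route the remaining items to human coders. In this minimal sketch the function names (`ensemble_triage`, `triage_report`), the `min_agree` threshold, and the topic labels are all hypothetical toy inputs, not data from the Congressional Bills Project.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ensemble_triage(
    predictions: Sequence[Sequence[str]], min_agree: int
) -> Tuple[Dict[int, str], List[int]]:
    """Split items into auto-labeled items and items flagged for human review.

    predictions[i] holds the label each classifier assigned to item i.
    An item is auto-labeled with its most common label when at least
    `min_agree` classifiers voted for that label; otherwise it is flagged.
    """
    auto: Dict[int, str] = {}
    review: List[int] = []
    for i, votes in enumerate(predictions):
        label, count = Counter(votes).most_common(1)[0]
        if count >= min_agree:
            auto[i] = label
        else:
            review.append(i)
    return auto, review


def triage_report(
    auto: Dict[int, str], review: List[int], gold: Sequence[str]
) -> Tuple[float, float]:
    """Return (accuracy on auto-labeled items, share of items sent to humans)."""
    correct = sum(1 for i, label in auto.items() if gold[i] == label)
    accuracy = correct / len(auto) if auto else float("nan")
    review_share = len(review) / (len(auto) + len(review))
    return accuracy, review_share


if __name__ == "__main__":
    # Hypothetical toy votes from three classifiers for four bill titles.
    preds = [
        ["health", "health", "health"],
        ["defense", "defense", "education"],
        ["education", "health", "defense"],
        ["health", "health", "health"],
    ]
    gold = ["health", "defense", "education", "defense"]
    for threshold in (2, 3):
        auto, review = ensemble_triage(preds, threshold)
        acc, share = triage_report(auto, review, gold)
        print(f"min_agree={threshold}: accuracy={acc:.2f}, human share={share:.2f}")
```

In this sketch, `min_agree` is the knob that trades human labeling effort against accuracy on the automatically labeled items. Sweeping it over a held-out labeled sample is one way to look for an accuracy/efficiency balance.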
John Wilkerson (Ph.D., University of
Rochester, 1991) is an associate professor in the Political Science
Department at the University of Washington. His research centers on
legislative organization and decision-making, with related interests in
health politics and comparative legislative studies. He is particularly
interested in how new information technologies can advance political
science research and teaching.