September 30, 2015

Tree-based models for political science data

Political scientists often find themselves analyzing datasets with a large number of observations, a large number of variables, or both. Yet traditional statistical techniques fail to take full advantage of the opportunities inherent in "big data": they are too rigid to recover nonlinearities, and they make it difficult to explore interactions in high-dimensional datasets. In this paper, we introduce a family of tree-based nonparametric techniques that are often more appropriate than traditional methods for confronting these data challenges. In particular, tree models are extremely effective for detecting nonlinearities and interactions in datasets with many (potentially irrelevant) covariates. We introduce the basic logic of tree-based models, provide an overview of the most prominent methods in the literature, and conduct three analyses that illustrate how the methods can be implemented while highlighting both their advantages and limitations.
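The basic logic of a tree model can be sketched in a few lines of Python. What follows is a minimal, hypothetical illustration of CART-style recursive binary partitioning with a squared-error criterion, not the authors' implementation: the function names (`best_split`, `grow`, `predict`, `used_features`), the depth limit, and the toy grid data are all assumptions made for illustration. At each node the algorithm tries every covariate and every observed cut point, keeps the split that most reduces the sum of squared errors, and recurses. Because later splits are made within earlier ones, an interaction emerges without being specified in advance, and a covariate that never improves the fit is simply never used.

```python
# Minimal CART-style regression tree (illustrative sketch only): greedy
# recursive binary splits that minimize the sum of squared errors (SSE).

def sse(ys):
    """Sum of squared deviations from the mean of ys."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(X, y):
    """Try every covariate j and every observed value t as a split x[j] <= t.
    Return (loss, j, t) for the split with the lowest total SSE, or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, j, t)
    return best


def grow(X, y, depth=0, max_depth=2, min_size=5):
    """Recursively partition the data; each leaf predicts its mean outcome."""
    split = None
    if depth < max_depth and len(y) >= min_size:
        split = best_split(X, y)
    # Stop when no split is allowed or when splitting does not reduce the SSE.
    if split is None or split[0] >= sse(y):
        return {"leaf": sum(y) / len(y)}
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left],
                     depth + 1, max_depth, min_size),
        "right": grow([X[i] for i in right], [y[i] for i in right],
                      depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def used_features(tree):
    """Set of covariate indices the fitted tree actually splits on."""
    if "leaf" in tree:
        return set()
    return ({tree["feature"]}
            | used_features(tree["left"])
            | used_features(tree["right"]))


# Hypothetical toy data on a 10 x 10 grid: the outcome is 1 only when BOTH
# x0 > 0.5 and x1 > 0.5 (an interaction); x2 is an irrelevant covariate.
X, y = [], []
for i in range(10):
    for j in range(10):
        x0, x1 = 0.05 + 0.1 * i, 0.05 + 0.1 * j
        x2 = ((7 * i + 3 * j) % 10) / 10
        X.append([x0, x1, x2])
        y.append(1.0 if x0 > 0.5 and x1 > 0.5 else 0.0)

tree = grow(X, y)
print(used_features(tree))             # {0, 1}
print(predict(tree, [0.9, 0.9, 0.3]))  # 1.0
print(predict(tree, [0.9, 0.1, 0.3]))  # 0.0
```

On this toy grid the fitted tree splits on `x0` and then on `x1`, reproducing the interaction exactly, while the irrelevant `x2` is never selected. Practical implementations add pruning or ensembling (for example, random forests or boosting) to control the overfitting that a single greedy tree invites.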

Forthcoming in the American Journal of Political Science. With Santiago Olivella.