Product Partitioned Dirichlet Process Prior Models for Identifying Substantive Clusters and Fitted Subclusters in Mouse Model Data and Terrorism Data. With Andrew Womack, Indiana University Department of Statistics. This work introduces a new model-based clustering design that incorporates two sources of heterogeneity for the modeling of social science data. The first source of heterogeneity is in the residuals from the mean structure and is modeled with Dirichlet Process random effects. The second source of heterogeneity is unobserved grouping in the data, which is modeled using the product partition framework. Incorporating both sources of heterogeneity allows the model to capture both structural differences in response to the covariates and departures from normality in the error structure. The model is applied to the analysis of terrorist groups, which shows how this tool reveals important features in a dataset that are otherwise undetectable.
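As a rough illustration of how a Dirichlet Process prior induces clusters, the following minimal sketch draws one partition from the Chinese restaurant process, a standard representation of the Dirichlet Process. It is not the product partition model described above; the function name `sample_crp` and the concentration parameter `alpha` are assumptions made only for this example.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw one partition of n_items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values tend to
    produce more clusters. Returns (assignments, counts).
    """
    rng = random.Random(seed)
    assignments = []  # cluster index for each item, in arrival order
    counts = []       # current size of each cluster
    for _ in range(n_items):
        # An existing cluster is chosen in proportion to its size;
        # a new cluster is opened in proportion to alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    labels, sizes = sample_crp(n_items=50, alpha=1.0, seed=1)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

The "rich get richer" weighting is what lets the number of clusters be learned from the data rather than fixed in advance.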
Applying a Social Ecological Framework to Examine Behaviors and Structures Linking Obesity and Cancer. Director for the Bioinformatics core of the Washington University Transdisciplinary Research on Energetics and Cancer (TREC) Center, integrating research in biological, environmental, emotional, cultural, and social factors that influence obesity and its linkage to cancers. The statistical component and contributions from the bioinformatics core will be incorporated into all phases of the research on linkages between obesity and cancer, including the design of the procedures that produce data. This critical stage includes the assessment of new technology that can produce anthropometric and physical information. Such devices include: personal digital assistants (PDAs) and smart phones for field study, neuroimaging machinery, survey instrumentation, and real-time data capture tools. Other data collection efforts that apply here are sampling procedures, randomization plans, trial designs, and specification of prior distributions for Bayesian updating.
A Bayesian Kriging Approach to Measuring State Ideology with Spatial Realignment. With James Monogan, University of Georgia Department of Political Science. We develop a new approach for modeling public sentiment by micro-level geographic region based on Bayesian hierarchical spatial modeling using multiple geocoded data sources. This is done with a model-based smooth density blanket for the concept of interest (primarily ideology here) such that arbitrary geographic boundaries are immaterial. We do this with a Bayesian hierarchical model that uses kriging to analyze geocoded survey responses, and we use this analysis to forecast areal units. By exploiting the spatial relationships among observations and units of measurement and point-to-block realignment calls, we extract measurements of ideology as geographically narrow as available covariate measures. Finally, we provide a freely distributed product that allows other researchers to produce aggregation at all practical governmental levels: states, congressional districts, census tracts, MSAs, state legislative districts, cities, and counties.
A Bayesian Type-2 Tobit Model with Stochastic Censoring. With Neal Beck, New York University Department of Government. The Type-2 Tobit model is used to incorporate a prior filtering decision by studied actors that can be parameterized. Classic selection models are a special case. In this work we modify the algorithm of Li (1998) for structural equation models and McCulloch, Polson and Rossi (2000) for multinomial probit models to produce a Gibbs sampler for the unknown regression parameters. All other parameters are subsumed in these definitions, and conditionality on the data is implied.
Missing Interval Imputation for Accelerometer Data. With Jung Ae Lee, Washington University School of Medicine. Accelerometers are important devices worn on the hip or wrist to monitor the physical activity levels of individuals. Unfortunately, the resulting data often have consecutive periods of missing values resulting from non-usage. Such intervals of missingness often occur irregularly throughout the assigned time. Our purpose here is to develop a method to produce a predicted activity count uniquely for each minute in such missing periods rather than specify a single summary statistic across the period of inactivity, as is currently the dominant practice. We apply functional data analysis with an ANOVA structure to incorporate demographic characteristics and other explanatory variables to obtain unbiased imputations using a zero-inflated Poisson regression model for the count response with an autoregressive predictor. We demonstrate that this approach is superior to all known alternatives and provide an application using 2003-2004 National Health and Nutrition Examination Survey data.
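To make the count distribution concrete, here is a minimal sketch of the zero-inflated Poisson probability mass function and its mean. It covers only the response distribution, not the functional data analysis, ANOVA structure, or autoregressive predictor described above; the names `zip_pmf`, `zip_mean`, `pi_zero`, and `lam` are assumptions made for this illustration.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson probability of observing the count k.

    pi_zero is the probability of a structural zero (for example, the
    device is not being worn); lam is the Poisson rate otherwise.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either the structural-zero part or the Poisson part.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_mean(pi_zero, lam):
    """Expected count under the zero-inflated Poisson distribution."""
    return (1.0 - pi_zero) * lam


if __name__ == "__main__":
    for k in range(5):
        print(k, round(zip_pmf(k, pi_zero=0.3, lam=2.0), 4))
    print("mean:", zip_mean(0.3, 2.0))
```

In a regression setting both `pi_zero` and `lam` would depend on covariates through link functions; they are held fixed here only to keep the sketch short.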
Spike and Slab Prior Distributions for Simultaneous Bayesian Hypothesis Testing, Model Selection, and Prediction of Nonlinear Outcomes. With Xun Pang, Tsinghua University. A small body of literature has used the spike and slab prior specification for model selection with strictly continuous outcomes. In this setup a two-component mixture distribution is stipulated for coefficients of interest with one part centered at zero with very high precision (the spike) and the other as a distribution diffusely centered at the research hypothesis (the slab). With selective shrinkage, this setup incorporates the zero coefficient contingency directly into the modeling process to produce posterior probabilities for hypothesized outcomes. We extend the model to qualitative responses by designing a hierarchy of forms over both the parameter and model spaces to achieve variable selection, model averaging, and individual coefficient hypothesis testing. The performance of the models and methods is assessed with empirical applications in political science.
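The two-component mixture can be sketched as follows, assuming normal distributions for both the spike and the slab (the abstract does not specify the component families). The function names and the parameters `w_slab`, `spike_sd`, `slab_mean`, and `slab_sd` are hypothetical and serve only this illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def spike_slab_pdf(beta, w_slab, spike_sd, slab_mean, slab_sd):
    """Mixture prior density: a tight spike at zero plus a diffuse slab."""
    spike = (1.0 - w_slab) * normal_pdf(beta, 0.0, spike_sd)
    slab = w_slab * normal_pdf(beta, slab_mean, slab_sd)
    return spike + slab


def slab_probability(beta, w_slab, spike_sd, slab_mean, slab_sd):
    """Probability that a given coefficient value belongs to the slab component."""
    spike = (1.0 - w_slab) * normal_pdf(beta, 0.0, spike_sd)
    slab = w_slab * normal_pdf(beta, slab_mean, slab_sd)
    return slab / (spike + slab)


if __name__ == "__main__":
    for beta in (0.0, 0.05, 1.0):
        p = slab_probability(beta, w_slab=0.5, spike_sd=0.01,
                             slab_mean=1.0, slab_sd=2.0)
        print(beta, p)
```

Values very close to zero are attributed almost entirely to the spike, while values away from zero are attributed to the slab, which is the selective shrinkage the abstract refers to.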
The Variable Effect of War on Longterm Childhood Mental Health Outcomes. With Enbal Shacham, Saint Louis University Behavioral Science and Health Education. We propose to add to the understanding of the adverse effects of war on children by simultaneously leveraging three areas of expertise and unique access to affected subjects. The two principal investigators provide combined expertise in psychological epidemiology, statistics, and political science. It is important to understand this problem from all of these perspectives since their effects are intermingled and interrelated. Currently there is no study offering this combination of perspectives. At its origin this is a political problem, since malevolent, dysfunctional, or aggressive governments and non-state entities foment violence for bureaucratic, economic, and personal ambitions. It should also be viewed as an epidemiological subject since it distributes adverse health outcomes in a contagious, geographic, and enduring manner. While the physical casualties are often substantial, we focus here on the long-term psychological damage that may follow these children for much of their lives. We intend to execute a truly interdisciplinary study where the political science, the epidemiology, the psychology, and the statistical analysis are all done to the highest standards of each discipline.
Queueing Theory Models for Political Science Data. Queueing theory is widely used in many literatures to describe assembly-line type processes and services in which the completion time is indeterminate. However, there are almost no applications of queueing theory in political science. This is curious because political actors queue up for desired benefits under a number of circumstances. This project looks at the theoretical and practical basis for applying queueing theory to the analysis of institutional politics. Empirical applications include the processing of bills through legislatures, the scheduling and hearing of court cases, and initiatives within international institutions.
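As one simple example of the kind of model involved, the sketch below simulates waiting times in a single-server queue with exponential interarrival and service times (the textbook M/M/1 queue) and compares them with the known steady-state mean wait. The abstract does not name a specific queueing model, so the choice of M/M/1 and all function and parameter names here are assumptions; an "arrival" could stand for a bill reaching a committee, for instance.

```python
import random


def simulate_mm1_waits(arrival_rate, service_rate, n_customers, seed=0):
    """Simulate time spent waiting in line for each arrival in an M/M/1 queue.

    Uses the Lindley recursion: the next arrival waits for whatever work
    is left after the current arrival's wait plus service time, minus the
    gap until the next arrival (never less than zero).
    """
    rng = random.Random(seed)
    waits = []
    wait = 0.0  # the first arrival finds an empty system
    for _ in range(n_customers):
        waits.append(wait)
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - gap)
    return waits


def theoretical_mean_wait(arrival_rate, service_rate):
    """Steady-state mean wait in line for M/M/1 (requires arrival_rate < service_rate)."""
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)


if __name__ == "__main__":
    waits = simulate_mm1_waits(arrival_rate=1.0, service_rate=2.0,
                               n_customers=100_000, seed=42)
    print("simulated mean wait:", sum(waits) / len(waits))
    print("theoretical mean wait:", theoretical_mean_wait(1.0, 2.0))
```

Real institutional processes would likely need richer models (priorities, multiple servers, abandonment), but the same simulate-and-compare pattern applies.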
Political Representation and Economic Change: A Bayesian Temporal Hierarchical Model of Partisan Influence on Deindustrialization in the U.S. States. With Christopher Witko, University of South Carolina Department of Political Science. We argue that politics and government are important determinants of the nature, pace, and timing of major economic changes. Specifically, when groups that are harmed by major emerging economic changes are better represented in government, these changes will be slower. We elaborate and examine such arguments in an analysis of the state-level deindustrialization of the U.S. economy from 1975 to 2007. Due to the variation in state economies and changes in the national Democratic Party coalition over time, however, we expect that there will be some heterogeneity in this effect over time and across states. A new Bayesian hierarchical spatial model is developed to reflect these interrelated effects of time and geography. The results of the analysis provide evidence for our arguments.
Assessing Changes in Rapid Response Protocols for Pediatric Intensive Care. With Mary Hartman, Nikoleta Kolovos, and Allan Doctor, Washington University School of Medicine. We are fundamentally concerned with resource use at the time of PICU admission in the period before and after changes in interventions. To test the efficacy of the rapid response team (RRT) approach we collected data from the Children's Hospital Pediatric Intensive Care Unit (PICU) for the period 2010 to 2013. This produced 2152 patient records: 1097 pre-program implementation cases and 1055 post-program implementation cases, which is nearly balanced. The variable PrePostRRT is coded 0 for pre-program and 1 for post-program. This is the key explanatory variable in our statistical models, and its multidimensional relationship with the other variables is the source of our conclusions below. Our modeling approach includes statistical assessment of divergent measures with generalized Bayesian measurement models.
Overcrowding contributes to murine prostate hyperplasia: a transdisciplinary approach to analyzing murine data with generalized linear regression modeling. With Emily C. Benesh, Laura E. Lamb, Graham Colditz, and Kelle H. Moley, Washington University School of Medicine. The discovery of modifiable risk factors for prostate cancer is gravely needed to improve outcomes for patients. We previously developed a murine pre-cancer model where maternal obesity led to hyperplastic prostates in offspring. Here, in a transdisciplinary effort, statistical and developmental biology approaches were combined to evaluate the influence of additional environmental factors on prostate hyperplasia in offspring with multivariate regression modeling. These factors included maternal diet, body weight, age of subjects at evaluation, maternal exposure length, and cage density. From this modeling effort we found, surprisingly, that cage density was positively associated with both body weight of the offspring and prostate hyperplasia outcomes. Additionally, aging correlated with prostate hyperplasia and was exacerbated by maternal obesity. By contrast, no evidence was revealed that body weight of the offspring or the maternal exposure length independently predicted prostate hyperplasia outcomes in offspring. Taken together, these data suggest the hypothesis that prostate tissue is both patterned during early life by maternal diet and susceptible to alteration by environmental factors like overcrowding during adult life. This transdisciplinary effort emphasizes the need for multivariate regression models to evaluate the context of coordinated variables in developmental biology systems instead of simply comparing pairwise effects.
Linear Algebra for Social Science Statistics. With Sudipto Banerjee, UCLA Department of Biostatistics. Linear (matrix) algebra books can generally be put into two categories: those for mathematics students at the undergraduate level, and those for graduate students in statistics and related fields. Caught between these two categories are social and behavioral science graduate students who do not need detailed proofs and derivations but need intuition and examples that illustrate the major principles for moving from scalar-based model notation to matrix/vector-based model notation. This book fills this gap with examples drawn from political science, sociology, psychology, public policy, anthropology, and more.