Optional Readings for Sociology 73994

Overview of Generalized Linear Models

These readings cover many of the models we will be discussing this semester. You may want to skim through them now and return to them throughout the course (they'll hopefully make more and more sense as we go through the semester).

J. Scott Long's Lab Guide for Stata provides an introduction to Stata and shows how it can be used to estimate many of the models we will be discussing in this course. (Thanks to Scott Long for giving me this!)

This excerpt from the Stata 8 reference manual shows how several models all fall under the heading of generalized linear models -- indeed, if you want, there is even a single command in Stata, glm, that can estimate most of them! (Although the more specialized commands in Stata tend to work better.)

Regression Models for Categorical Data, by J. Scott Long and Simon Cheng, provides a substantive/mathematical overview of most of the models discussed in Long and Freese's book. In the first few weeks of the course, pay particular attention to pp. 1-15, which discuss logistic regression.

As an alternative to (or in addition to) the Long and Cheng piece, Alternative Regression Models: Logistic, Poisson Regression, and the Generalized Linear Model talks about logistic regression, multinomial logit models, ordinal regression, and count models. See especially pp. 479-519 on logistic regression.

Models for Binomial Outcomes

The excerpt from Pampel's Logistic Regression: A Primer (Sage Series on Quantitative Applications in the Social Sciences, paper # 132, 2000) provides a relatively straightforward overview of logistic regression. Read the rest of the book for more advanced topics.

The section from Menard's Applied Logistic Regression Analysis (2nd Edition, Sage Series on Quantitative Applications in the Social Sciences, paper # 106, 2001) discusses the logistic regression analogues to OLS's R² and F statistics.

Raftery provides a superb critique of conventional hypothesis testing and suggests the BIC measure as a powerful alternative. It is a little technical in places but he even tells you the parts you can skip over if you are willing to trust him on the proofs. At a minimum read these excerpts. Robert Hauser's response is also very informative.
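To make the BIC idea concrete, here is a minimal sketch (not taken from the readings themselves) of the BIC' statistic in the form commonly attributed to Raftery: BIC' = -(LR chi-square) + df * ln(N), where more negative values indicate more support for a model relative to the null model. The chi-square values, degrees of freedom, and sample size below are made up for illustration.

```python
import math

def bic_prime(lr_chi2, df, n):
    """BIC' relative to the null model: -LR chi-square + df * ln(N)."""
    return -lr_chi2 + df * math.log(n)

# Hypothetical nested models fit to the same N = 1000 cases
n = 1000
small = bic_prime(lr_chi2=40.0, df=2, n=n)
large = bic_prime(lr_chi2=48.0, df=4, n=n)

print(round(small, 2), round(large, 2))

# The model with the lower (more negative) BIC' is preferred
preferred = "small" if small < large else "large"
print(preferred)
```

In this made-up case the larger model improves the chi-square by 8 with 2 additional degrees of freedom, which a conventional test would call significant at the .05 level, yet BIC' still favors the smaller model because each extra parameter is penalized by ln(N).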

Hamilton's chapter on Logistic Regression (from his Statistics with Stata, Updated for Version 8) provides some excellent examples of using Stata for logistic regression. He also covers multinomial and ordinal models.

Long's (1997) Chapter 3, Binary Outcomes (10 MB), and Chapter 4, Hypothesis Testing (5 MB), provide more in-depth and mathematical coverage of the corresponding topics from Long and Freese. These can be difficult to read but are also very informative. Don't worry too much about the math if you don't understand it. These are high quality scans and hence the files are pretty large, so try to use a high-speed connection when you download these. Also, if possible, you might set your printer to "good quality" rather than "best quality."

In Treat me - The crucial health stat you've never heard of (a short simple article from Slate that is written for the masses) Darshak Sanghavi explains why Risk Ratios can be highly misleading. Risk ratios aren't quite the same as odds ratios, but pretty much the same argument would apply to them.
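As a small worked illustration of why relative measures can mislead, the sketch below computes the risk ratio, the odds ratio, the absolute risk reduction, and the number needed to treat (a statistic often used to put relative risks in context). The counts are hypothetical and are not taken from the article.

```python
def summarize(events_treated, n_treated, events_control, n_control):
    """Return risk ratio, odds ratio, absolute risk reduction, and NNT."""
    p_t = events_treated / n_treated
    p_c = events_control / n_control
    risk_ratio = p_t / p_c
    odds_ratio = (p_t / (1 - p_t)) / (p_c / (1 - p_c))
    risk_reduction = p_c - p_t
    nnt = 1 / risk_reduction
    return risk_ratio, odds_ratio, risk_reduction, nnt

# Hypothetical trial: 10 events among 1,000 treated, 20 among 1,000 controls
rr, odds_r, arr, nnt = summarize(10, 1000, 20, 1000)
print(f"RR={rr:.3f} OR={odds_r:.3f} ARR={arr:.3f} NNT={nnt:.0f}")
```

Here the treatment halves the risk, and the odds ratio is nearly identical to the risk ratio because the outcome is rare, but the absolute difference is only one percentage point: about 100 people would need to be treated to prevent one event.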

Carina Mood's Logistic Regression: Why we cannot do what we think we can do, and what we can do about it explains in more detail than my notes do why it is difficult to compare coefficients across nested models, and notes other ways in which logistic regression is different from OLS.

Applications: Freese and others ask Who are Feminists and what do they believe? and use predicted margins to illustrate their points, as do Ayers et al. when answering the question Can religion help prevent obesity? Johnson & Mollborn also use predicted probabilities to discuss Hardship in Childhood and Adolescence, but their comparisons of nested models might have been better had they used something like Y-standardization. All you Soc of Religion & Networking fans may enjoy ND Grads Brian Miller, Peter Mundey, and Jonathan Hill's logit/ologit analysis of Faith in the Age of Facebook.

Marginal Effects

This 2012 Stata Journal Article by Richard Williams elaborates on Using the Margins Command to Estimate and Interpret Adjusted Predictions and Marginal Effects.
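For readers who want to see the arithmetic behind these quantities, here is a minimal sketch for a logit model with one continuous predictor. The coefficients and the five-case sample are hypothetical; in Stata the corresponding quantities would come from margins after estimation (for example, margins, dydx(x) for the average marginal effect and margins, dydx(x) atmeans for the marginal effect at the means).

```python
import math

def inv_logit(z):
    """Convert a log-odds value into a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical logit estimates and a tiny made-up sample
b0, b1 = -1.0, 0.5
x_values = [0, 1, 2, 3, 4]

probs = [inv_logit(b0 + b1 * x) for x in x_values]

# Average adjusted prediction: mean of the predicted probabilities
avg_prediction = sum(probs) / len(probs)

# Average marginal effect (AME) of x: mean of b1 * p * (1 - p) over the cases
ame = sum(b1 * p * (1 - p) for p in probs) / len(probs)

# Marginal effect at the mean (MEM): evaluate b1 * p * (1 - p) at the mean of x
x_bar = sum(x_values) / len(x_values)
p_at_mean = inv_logit(b0 + b1 * x_bar)
mem = b1 * p_at_mean * (1 - p_at_mean)

print(round(avg_prediction, 4), round(ame, 4), round(mem, 4))
```

The two marginal effects differ because the logistic curve is steepest where the predicted probability is .5, which in this made-up sample happens to be at the mean of x; averaging over all cases includes flatter parts of the curve.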

Bornmann & Williams provide an application of the margins command in How to calculate the practical significance of citation impact differences?

Patrick Royston shows how you can use his MCP (marginscontplot) command for understanding and interpreting the effects of continuous variables.

Models for Ordinal Outcomes I

As part of the Sage Research Methods Foundations Project (SRMF), Williams and Quiroz (2019) provide an overview of Ordinal Regression Models. Both basic and more advanced methods (e.g. interval regression, generalized ordered logit models, heterogeneous choice models) are discussed. Those with an ND.edu account can access it here. For those not at ND, if your library has purchased SRMF (and if it hasn't it should!) the entry can be found at https://methods.sagepub.com/Foundations/ordinal-regression-models.

Ordinal Outcomes, Chapter 5 from Long (1997) provides a more in-depth discussion of this topic.

Menard provides a fairly straightforward discussion of multinomial and ordinal models in this excerpt from Applied Logistic Regression.

The Stata 9 Reference Manual discusses the intreg command for interval regression. It is potentially useful when the intervals used for an ordinal variable are known, e.g. for income, 1 = 0-$5,000, 2 = $5,001-$15,000, ..., 6 = $50,001 or higher.
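Interval regression needs a lower and an upper bound for each case rather than a single category code. The sketch below shows one way such a recoding could look, using only the intervals stated above; the intermediate categories are not spelled out there, so they are deliberately left out. Using None for the open-ended top category is an assumption of this illustration, standing in for a missing upper bound.

```python
# Bounds stated in the text; categories 3-5 are elided there, so they are omitted
bounds = {
    1: (0, 5000),
    2: (5001, 15000),
    6: (50001, None),  # open-ended top category: no upper bound
}

def to_interval(code):
    """Return (lower, upper) for a coded income category, or None if not listed."""
    return bounds.get(code)

for code in (1, 2, 6):
    print(code, to_interval(code))
```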

Models for Count Outcomes

Count Outcomes, Chapter 8 from Long (1997) provides an excellent in-depth discussion.

Survival and Event-Count Models from Hamilton's Statistics with Stata, Updated for Version 8, shows how to use Stata for several types of analyses related to events. It also discusses Generalized Linear Models.

Models for Multinomial Outcomes

Menard provides a fairly straightforward discussion of multinomial and ordinal models in this excerpt from Applied Logistic Regression.

Survey Data Analysis

This excerpt from the Stata 9 Survey Data Manual provides an Introduction to Survey Data Analysis.

Sampling Weights and Regression by Winship and Radbill discusses issues to be aware of when using weighting with complicated survey designs. Since this article was written in 1994, I asked Winship in July 2005 what he thought of how Stata handled weighting. He said "I am very happy with how Stata handles weighting. It does provide a lot of different options so one can do it wrong in any particular case. However, it does calculate the standard errors correctly. It is still the case that if weighted and unweighted differ, this is evidence that you have a misspecified model. If they are the same, one should use the unweighted because they have smaller standard errors."

Models for Ordinal Outcomes II - gologit & other alternatives

Williams (2006) discusses the generalized ordered logit model and shows how it can be estimated with gologit2. Several other articles use different terminology, but they basically show the ways that gologit models can be used and interpreted. Hedeker and Mermelstein present the algebraically equivalent Thresholds of Change Model and how the effects of explanatory variables can differ across ordered stages. Boes & Winkelman ask whether money cannot buy happiness, but can buy-off unhappiness. Lindeboom and Doorslaer claim that people use different frames of reference when self-reporting health, potentially invalidating cross-group comparisons; gologit models can correct for that.

Jones and Westerland (2006) discuss several alternative ordinal models, including the stereotype and gologit models.

Williams (2016) discusses how to interpret gologit models.

Models for Ordinal Outcomes III - Group Comparisons/ Heterogeneous Choice Models

Allison (1999 Sociological Methods and Research) warns that group comparisons/ interaction effects can be problematic in logistic regression, much more so than is the case in OLS regression.

Williams (2009) discusses Allison's proposed solution. He argues that Allison's method involves a special case of the heterogeneous choice model. He further contends that Allison's proposed solution can sometimes have serious problems; but that these problems can often be addressed by turning to the broader class of heterogeneous choice models.

Williams (2010) provides detailed examples using the oglm program and offers additional substantive insights about heterogeneous choice models.

Long (2009) argues for the use of predicted probabilities for comparing groups.

Hoetker (2004 working paper) agrees with Allison and suggests some strategies for dealing with the problem.

Keele and Park (2006) further explain heterogeneous choice models and point out potential problems with them that have been overlooked.

A special type of group comparisons involves transitions from one level to the next, such as in educational transitions. Contrary to previous belief, Mare in the 1980s found that the effects of socioeconomic background variables decline regularly across educational transitions in conditional logistic regression analyses. Hauser and Andrew (2006) propose the logistic response model with partial proportionality constraints (LRPPC) for such analyses and argue that their models confirm Mare's main findings. Time permitting, we'll try some alternative analyses of the Hauser/Andrew data and see if we agree.

Panel Data

Stata has an entire manual (xt.pdf) devoted to panel data. You should have a pdf copy of it if you have Stata 11 or higher. In order, I recommend reading the sections entitled xt, xtreg, and xtlogit, and possibly xtgee. The first two sections provide background that is needed for understanding xtlogit. Allison's book, Fixed Effects Regression Models, is well worth buying and includes lots of Stata examples.

Allison (1982) illustrates how logistic regression can be used in Discrete-Time Methods for the Analysis of Event Histories.

Fractional Response Models/ Rare Events

Jeffrey Wooldridge explains fractional response models in this 1996 paper and in this 2011 presentation at the Chicago Stata Conference. There are now ways to estimate some of the models he said could not be estimated with Stata. These FAQs from StataCorp and UCLA provide additional details.

As Gary King notes, "Rare events are binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents")... popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events... [Also] commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter million dyads, only a few of which are at war... We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed." If this is of interest to you, http://gking.harvard.edu/stats.shtml provides links to his papers as well as to his relogit program for Stata.

Paul Allison and Heinz Leitgöb offer additional thoughts on the analysis of rare events.

Also of Interest

The Tobit Model (also called the censored regression model) is discussed in Long's 1997 book but not in Long and Freese. Tobit models are designed to make improved estimates when there is either left- or right-censoring. For example, income would be right-censored if the highest value recorded was $100,000, where $100,000 meant $100,000 or more. OLS will produce biased estimates in such cases. Here are some Stata examples: 1, 2.
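The bias from right-censoring can be seen in a small simulation. This is only a sketch under made-up assumptions (a hypothetical linear relationship between education and income, normal errors, and a recording ceiling of $100,000); it does not estimate a Tobit model, it only shows what happens when OLS is run on the censored values.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple bivariate OLS regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(73994)
true_slope = 5000   # hypothetical: each year of education adds $5,000
ceiling = 100_000   # incomes at or above this are recorded as 100,000

education = [random.uniform(8, 20) for _ in range(5000)]
income = [20_000 + true_slope * e + random.gauss(0, 15_000) for e in education]
recorded = [min(y, ceiling) for y in income]  # right-censored version

slope_full = ols_slope(education, income)
slope_censored = ols_slope(education, recorded)
share_censored = sum(y >= ceiling for y in income) / len(income)

print(round(slope_full), round(slope_censored), round(share_censored, 3))
```

Because the censored cases are concentrated among the highly educated, the slope estimated from the recorded values is pulled toward zero relative to the slope estimated from the true incomes.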

This excerpt from Long & Freese's Regression Models for Categorical Dependent Variables Using Stata shows how to handle categorical independent variables. Of special interest is the section on pp. 421-422 that discusses how to test whether or not an ordinal independent variable can be treated as though it were interval, and what to do if it can't be.