Readings Packet for Grad Stats II

Readings Packet - Table of Contents

Complete citations are usually included with the reading itself; if not, they are given below. The course notes and the other required texts cover most of what you need for the course, but the materials in the readings packet may help to clarify and expand on some topics for you. Some of the readings are quite technical, so do not be too concerned if you do not understand everything in them. If you are following the course notes without difficulty, most of these readings can be viewed as "recommended," but I think you'll find it helpful to at least skim through many of them. The readings may also be very helpful when doing your own research or if you want to take the Methods and Statistics area exam someday.

Note: Many of the readings are scanned and hence quite large. You probably don't want to access them if you only have a slow Internet connection.

Introduction and Overview

1) McClendon's chapter (from Multiple Regression and Causal Analysis, 1994) on "Nominal independent variables" may help you better understand dummy variable and effect coding. We probably won't go over it in class, but his section on "Contrast Coding" discusses a powerful alternative to dummy variable coding that you may find useful someday.

2) Allison's Multiple Regression: A Primer is required reading for the course, but in case you haven't been able to get a copy yet, some of the chapters (as well as his data) are on the web.
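The difference between dummy and effect coding that McClendon discusses can be seen by coding the same nominal variable both ways. The Python sketch below uses hypothetical data and hypothetical function names (dummy_code, effect_code); it only builds the design columns and does not fit a model. In a model containing only this variable, dummy-coded coefficients contrast each category with the reference category, while effect-coded coefficients are each category's deviation from the unweighted mean of the category means.

```python
def dummy_code(values, reference):
    """Dummy (0/1) coding: one indicator column per non-reference category.

    Cases in the reference category score 0 on every column.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def effect_code(values, reference):
    """Effect (-1/0/1) coding: like dummy coding, except that cases in the
    reference category score -1 on every column.
    """
    levels, rows = dummy_code(values, reference)
    for v, row in zip(values, rows):
        if v == reference:
            row[:] = [-1] * len(levels)
    return levels, rows


region = ["North", "South", "West", "South"]  # hypothetical data
print(dummy_code(region, reference="North"))   # (['South', 'West'], [[0, 0], [1, 0], [0, 1], [1, 0]])
print(effect_code(region, reference="North"))  # (['South', 'West'], [[-1, -1], [1, 0], [0, 1], [1, 0]])
```

The only change between the two schemes is the row for the reference category, which is why switching from one to the other changes the meaning of the intercept and coefficients but not the model's fit.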

Missing Data

1) Cohen and Cohen's chapter on "Missing Data" (from Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences) summarizes many of the commonly used techniques for handling missing data. This is from the 1975 edition of their book; the 2003 edition also covers more advanced techniques.

2) The excerpt from Paul Allison's Sage monograph Missing Data (2002; paper # 136 in the Sage Series on Quantitative Applications in the Social Sciences) also discusses basic techniques and points out some errors that Cohen and Cohen made (and which I made for years too!). The rest of Allison's book is recommended if you want to learn about more advanced techniques.

3) This excerpt from the Stata 11 MI Manual provides a substantive introduction to multiple imputation. If you have Stata 11, all of the manuals are available as PDF files.

4) Newman's excellent paper compares traditional methods, such as listwise and pairwise deletion, with newer methods, such as Full Information Maximum Likelihood (FIML) and Multiple Imputation (MI). The author conducts a Monte Carlo simulation of a 3-wave panel study in which some subjects drop out after each wave. Newman shows that MI and ML approaches tend to work much better than listwise deletion in such a situation, because listwise deletion discards many cases that contain at least some usable information. He also shows that all missing data techniques perform worse when data are missing on a nonrandom basis. (Newman, Daniel A. 2003. "Longitudinal Modeling with Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques." Organizational Research Methods, Vol. 6, No. 3, July 2003, pp. 328-362.)

5) I'd like to talk about multiple imputation methods in class, but time may not permit it. If you want to find out more on your own, Joe Schafer's Multiple Imputation FAQ provides a good, brief general overview. UCLA has a nice tutorial on Multiple Imputation Using ICE; ice is a Stata program written by Patrick Royston. For still more information on ice, see the Stata Journal articles written by Royston. (NOTE: these can be a little confusing because the syntax keeps evolving. In particular, note that the program mvis was renamed ice. Pay particular attention to the middle article's discussion of the passive and substitute options; these are a little tricky, but they are important to understand if you have nominal variables or interaction effects.) Finally, if you're really a fanatic, see http://www.multiple-imputation.com. It has lots of other links, suggested readings, and information on other available software.
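To make concrete why listwise deletion throws away information in a panel with dropout (the situation Newman simulates), here is a small bookkeeping sketch in Python on hypothetical data. It does not implement FIML or MI; it only counts how many cases and observed values each deletion rule keeps. The function names are illustrative, not from any package.

```python
from itertools import combinations

# Hypothetical 3-wave panel; None marks a wave after the respondent dropped out.
panel = [
    (3.1, 3.4, 3.9),
    (2.8, 3.0, None),
    (4.0, None, None),
    (3.5, 3.6, 3.8),
    (2.9, 3.3, None),
]


def listwise_n(data):
    """Cases kept when every case with any missing value is dropped."""
    return sum(all(x is not None for x in case) for case in data)


def pairwise_n(data):
    """Cases usable for each pair of waves (e.g., for one correlation)."""
    n_waves = len(data[0])
    return {
        (i + 1, j + 1): sum(case[i] is not None and case[j] is not None for case in data)
        for i, j in combinations(range(n_waves), 2)
    }


def discarded_values(data):
    """Observed (non-missing) values thrown away by listwise deletion."""
    return sum(
        sum(x is not None for x in case)
        for case in data
        if any(x is None for x in case)
    )


print(listwise_n(panel))        # 2
print(pairwise_n(panel))        # {(1, 2): 4, (1, 3): 2, (2, 3): 2}
print(discarded_values(panel))  # 5
```

Listwise deletion here analyzes 2 of 5 cases and ignores 5 observed values. Pairwise deletion keeps more data, but different statistics then rest on different subsamples; FIML and MI instead try to use every observed value within a single model.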

Measurement Error

1) The excerpts from Reliability and Validity Assessment (Carmines and Zeller, 1979, paper # 17 in the Sage Series on Quantitative Applications in the Social Sciences) provide a conceptual and mathematical overview of many of the issues involved in measurement. We'll cover just a small piece of this in class, but the rest may be useful to you.

Outliers

1) "Outlying and Influential Data" from Fox's Regression Diagnostics (1991, paper # 79 in the Sage Series on Quantitative Applications in the Social Sciences) discusses outlier problems and solutions.

Serial Correlation, Heteroskedasticity, Weighting

1) It isn't easy reading, but the chapter from Pindyck & Rubinfeld's Econometric Models and Economic Forecasts (1991 edition) provides a fairly detailed (and mathematical) discussion of serial correlation and heteroskedasticity if you want it.

2) "Sampling weights and regression analysis" (Winship and Radbill, Sociological Methods and Research, v. 23 # 2, November 1994, pp. 230-257) discusses issues that arise when something other than simple random sampling is used. For example, some groups (e.g. minorities) may be oversampled to ensure that there are enough members of these groups to analyze them adequately. How, then, should you adjust your statistical analyses to reflect the fact that simple random sampling has not been used? The authors point out that some commonly used and seemingly intuitive strategies can actually be harmful (e.g. they can produce heteroskedasticity) and suggest alternatives.
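The basic logic of sampling (design) weights can be shown with a descriptive mean: each case is weighted by the inverse of its probability of selection, so an oversampled group is weighted down. The Python sketch below uses hypothetical population sizes and values and hypothetical function names. It deliberately stops at a weighted mean; whether and how to weight a regression is exactly what Winship and Radbill discuss, and this is not their recommended procedure.

```python
def design_weights(pop_sizes, sample_sizes):
    """Inverse-probability (design) weight for each stratum: N_g / n_g."""
    return {g: pop_sizes[g] / sample_sizes[g] for g in pop_sizes}


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical population of 10,000: 90% majority, 10% minority.
pop = {"majority": 9000, "minority": 1000}
# The minority group is oversampled: two cases drawn from each group.
sample = [("majority", 10.0), ("majority", 12.0), ("minority", 20.0), ("minority", 22.0)]
sizes = {g: sum(1 for grp, _ in sample if grp == g) for g in pop}

w = design_weights(pop, sizes)  # {'majority': 4500.0, 'minority': 500.0}
ys = [y for _, y in sample]
ws = [w[g] for g, _ in sample]
print(sum(ys) / len(ys))        # 16.0 (unweighted: overstates the minority's influence)
print(weighted_mean(ys, ws))    # 12.0 (equals 0.9 * 11 + 0.1 * 21)
```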

The Logic of Causal Modeling

1) "The Logic of Scientific Inference" (from Constructing Social Theories, Arthur L. Stinchcombe, 1968) provides a non-mathematical view of how we go about building models and theories.

2) The Logic of Causal Order (James Davis, 1985, paper # 55 in the Sage Series on Quantitative Applications in the Social Sciences) illustrates many key principles concerning how variables can be causally related to each other.

3) "The educational and early occupational attainment process" (Sewell, Haller & Portes, American Sociological Review, V. 34, Issue 1, Feb. 1969, pp. 82-92) is one of the classic early pieces using path analysis and illustrates many of the principles of causal ordering.

Interaction Effects

1) McClendon's chapter (from Multiple Regression and Causal Analysis, 1994) on "Nonadditive Relationships" provides a relatively straightforward overview of interaction effects.

Also Recommended: Multiple Regression: Testing and Interpreting Interactions (1991) by Leona S. Aiken and Stephen G. West. It is lengthy but not too hard to read. Jaccard and his colleagues have several Sage monographs on interaction effects using both basic and advanced methods, e.g. Interaction Effects in Multiple Regression; LISREL Approaches to Interaction Effects in Multiple Regression; and Interaction Effects in Logistic Regression.
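A key point in these readings is that once a product term is included, the effect of x is no longer a single number. In a model y = b0 + b_x*x + b_z*z + b_xz*x*z, the slope of x at a given z is b_x + b_xz*z, and its variance is Var(b_x) + z²Var(b_xz) + 2z Cov(b_x, b_xz). The Python sketch below computes both from hypothetical estimates; the names are illustrative, and in practice the variances and covariance would come from your regression output.

```python
import math


def conditional_slope(b_x, b_xz, z):
    """Slope of x at a given z in y = b0 + b_x*x + b_z*z + b_xz*x*z."""
    return b_x + b_xz * z


def conditional_slope_se(var_bx, var_bxz, cov_bx_bxz, z):
    """Standard error of b_x + b_xz*z from the coefficient (co)variances."""
    return math.sqrt(var_bx + z ** 2 * var_bxz + 2 * z * cov_bx_bxz)


# Hypothetical estimates, not taken from any of the readings.
b_x, b_xz = 0.5, 0.25
for z in (-2, 0, 2):
    print(z, conditional_slope(b_x, b_xz, z))  # slopes 0.0, 0.5, 1.0
print(conditional_slope_se(0.04, 0.01, -0.005, 2))  # sqrt(0.06), about 0.245
```

Note that b_x by itself is simply the slope of x where z = 0, which is why centering z changes b_x but not the model.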

Nonlinear Relationships

1) McClendon's "Nonlinear relationships" (from Multiple Regression and Causal Analysis, 1994) explains how nonlinear relationships can be handled within a multiple regression framework.

2) The documentation for the SPSS 14 CURVEFIT command describes various nonlinear relationships and how they can be graphed in SPSS.
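One common way to handle a curvilinear relationship within OLS is to add a squared term, y = b0 + b1*x + b2*x². The model is still linear in its coefficients, but the slope of x now changes with x (b1 + 2*b2*x), and the curve turns at x = -b1/(2*b2). The Python sketch below uses hypothetical coefficients and illustrative function names; it is one standard approach, not necessarily the one McClendon emphasizes.

```python
def quadratic_slope(b1, b2, x):
    """Slope of y = b0 + b1*x + b2*x**2 at a given x: b1 + 2*b2*x."""
    return b1 + 2 * b2 * x


def turning_point(b1, b2):
    """x at which the fitted curve peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no turning point: the relationship is linear")
    return -b1 / (2 * b2)


# Hypothetical estimates: b1 = 4, b2 = -0.5
print(quadratic_slope(4, -0.5, 2))  # 2.0
print(turning_point(4, -0.5))       # 4.0
```

Whether the turning point matters substantively depends on whether it falls within the observed range of x.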

Intro to Path Analysis

1) Chapters 3 and 4 of Otis Dudley Duncan's classic book, Introduction to Structural Equation Models (1975), explain the principles behind path analysis with recursive models.

2) "Standardization in Causal Analysis" (Kim and Ferree, Sociological Methods and Research Vol. 10, No. 2, November 1981, pp. 187-210) adds to Duncan's discussion of the issues and problems you need to be aware of with standardized variables in causal models.
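A familiar problem with standardized coefficients, of the kind these readings discuss, is that they depend on the variances of the variables and not just on the causal effect. A standardized coefficient is b*s_x/s_y, so two groups with the same unstandardized slope can have different standardized slopes if, say, one has more error variance in y. The Python sketch below demonstrates this with hypothetical data constructed so that both groups have b = 2; the function names are illustrative.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross-products over sum of squares of x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized(b, x, y):
    """Standardized coefficient: b * s_x / s_y."""
    return b * stdev(x) / stdev(y)


# Hypothetical groups with the same x values and the same slope (2),
# but group B has more error variance in y.
x = [1, 2, 3, 4, 5]
y_a = [2, 4, 6, 8, 10]
y_b = [3, 3, 6, 7, 11]

for y in (y_a, y_b):
    b = ols_slope(x, y)
    print(b, round(standardized(b, x, y), 3))
# 2.0 1.0
# 2.0 0.953
```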

Nonrecursive Models

1) Chapters 5, 6 and 7 of Duncan (Introduction to Structural Equation Models, 1975) extend the discussion of path analysis to include nonrecursive models and issues of identification.

Also Recommended: Nonrecursive Causal Models by William D. Berry. 1984. Paper # 37 in the Sage Series on Quantitative Applications in the Social Sciences. Berry covers more of the math involved in nonrecursive models and also gives additional substantive examples.

Logistic Regression & Other Alternative Regression Models

1) The chapter "Alternative Regression Models: Logistic, Poisson Regression, and the Generalized Linear Model" (from Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Third Edition, by Jacob Cohen, Patricia Cohen, Stephen G. West, and Leona S. Aiken, 2003) discusses logistic regression, multinomial logit models, ordinal regression, and count models.

2) The excerpt from Pampel's Logistic Regression: A Primer (Sage Series on Quantitative Applications in the Social Sciences, paper # 132, 2000) provides a relatively straightforward overview of logistic regression. Read the rest of the book for more advanced topics.

3) The section from Menard's Applied Logistic Regression Analysis (2nd Edition, Sage Series on Quantitative Applications in the Social Sciences, paper # 106, 2001) discusses the logistic regression analogues to OLS's R² and F statistics.

4) The excerpt from Menard (Applied Logistic Regression Analysis, 2nd Edition, Sage Series on Quantitative Applications in the Social Sciences, paper # 106, 2001) on "Polytomous logistic regression and alternatives to logistic regression" provides an overview of multinomial and ordered logit models.

5) Williams discusses the generalized ordered logit model. This model is less restrictive than the ordered logit model (whose parallel-lines, or proportional odds, assumption is often violated) but more parsimonious than non-ordinal alternatives such as mlogit. The Stata program gologit2 can estimate gologit models.

6) This excerpt from Long & Freese's Regression Models for Categorical Dependent Variables Using Stata shows how to handle categorical independent variables. Of special interest is the section on pp. 421-422, which discusses how to test whether an ordinal independent variable can be treated as though it were interval, and what to do if it can't.

7) Sometimes the dependent variable is a proportion, e.g. the proportion of days workers spend off sick. These FAQs from StataCorp and UCLA show what to do in such cases.
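The difference between the ordered logit and the generalized ordered logit can be seen in how predicted category probabilities are built. In the parameterization gologit2 uses (as we understand it), P(Y > j) = exp(α_j + Xβ_j) / [1 + exp(α_j + Xβ_j)] for j = 1, ..., M-1, and each category probability is the difference between adjacent cumulative probabilities. The ordered logit is the special case in which every β_j is the same. The Python sketch below uses one predictor, hypothetical coefficients, and illustrative function names; it computes probabilities only and does not estimate anything.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(x, alphas, betas):
    """P(Y = 1), ..., P(Y = M) for a single predictor x.

    alphas[j] and betas[j] give P(Y > j + 1) = logistic(alphas[j] + betas[j] * x).
    Equal betas give the ordered logit (parallel-lines) model; letting the betas
    differ across cutpoints gives the generalized ordered logit.
    """
    cum = [1.0] + [logistic(a + b * x) for a, b in zip(alphas, betas)] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]


# Hypothetical 3-category outcome with two cumulative equations.
alphas = [1.0, -1.0]
ologit = category_probs(1.5, alphas, [0.5, 0.5])    # parallel lines imposed
gologit = category_probs(1.5, alphas, [0.5, 1.2])   # effect of x differs by cutpoint
print([round(p, 3) for p in ologit])
print([round(p, 3) for p in gologit])
```

Because the gologit betas are unconstrained, the cumulative probabilities can cross at extreme values of x (for example, x = 10 with these coefficients), producing a negative predicted probability; this is one reason partially constrained versions of the model are attractive.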

Brief Overview of Other Advanced Methods

These articles provide a brief introduction to other advanced methods that you may want to learn more about in the future. I will go over much of this in class, but mostly I just want to alert you to the existence of these techniques so you can learn more about them or take additional classes as needed for the kind of research you want to do.

1) "The Stability of Students' Interracial Friendships." This classic (well, anyway, convenient) article by Hallinan and Williams (American Sociological Review, Volume 52 Issue 5, Oct. 1987, pp. 653-664) uses a form of event history analysis (EHA) to examine the factors that affect friendship stability across time.

2) Chapter 1 from Hierarchical Linear Models (Anthony S. Bryk and Stephen W. Raudenbush. Book # 1 in the Sage Series on Advanced Quantitative Techniques in the Social Sciences. 1992) briefly introduces the rationale for the powerful and increasingly popular HLM technique. (A newer edition of the book is out now.)

3) "Beyond Wives' Family Sociology: A Method for Analyzing Couple Data." Yet another one of those classic/convenient pieces, this 1982 article by Thomson and Williams (Journal of Marriage and the Family, vol. 44 issue 4, Nov. 1982, pp. 999-1008) uses LISREL to examine data from married couples.

Some Other Recommendations

In general, the Sage series on Quantitative Applications in the Social Sciences has some good short monographs on many topics. The Sage Series isn't a bad starting place if you want to find out more about some specific topic. For greater depth, see Sage's Advanced Quantitative Techniques in the Social Sciences.

Loglinear analysis provides a powerful means of analyzing cross-classified data in tables; for example, it has often been used in studies of occupational mobility. An early but still-good book on this topic is Stephen Fienberg's The Analysis of Cross-Classified Categorical Data. Sage also has several good books on this, e.g. Michael Hout's Mobility Tables and Alfred DeMaris's Logit Modeling: Practical Applications.

It is a bit more difficult than I might like, but at least until something more basic comes along, J. Scott Long's Regression Models for Categorical and Limited Dependent Variables (Book #7 in the Sage Series on Advanced Quantitative Techniques in the Social Sciences, 1997, Sage Publications: Thousand Oaks, California) provides a good discussion of how regression techniques can be extended to categorical variables. Among other things, Long talks about logistic regression, multinomial logit models (where the dependent variable has more than two categories), Tobit models (which deal with censored or truncated data, where information on the dependent variable is limited or cases are missing altogether), regression with ordinal dependent variables, and "count" outcomes (variables that count the number of times an event has happened). A less technical and more applied piece is J. Scott Long and Jeremy Freese's Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition (Stata Press, 2006, College Station, Texas). This book is best if you already understand the underlying concepts, but you may also find that you understand the concepts a lot better once you see examples. Other advanced books include Powers and Xie's Statistical Methods for Categorical Data Analysis, Agresti's Categorical Data Analysis, and Hosmer and Lemeshow's Applied Logistic Regression.

Event History Analysis by Paul Allison (Sage paper # 46) provides a nice, succinct overview of this advanced method. Among other things, EHA is useful when you want to examine the rate at which things happen. For example, everybody dies, but how quickly someone dies is influenced by many factors. The factors that influence the pace of job changes and other transitions are further examples of topics that can be addressed with EHA. Allison offers (fairly expensive) summer seminars that you may want to attend if you ever get the chance. Another good book, with a practical and applied orientation, is Mario A. Cleves, William W. Gould, and Roberto G. Gutierrez's An Introduction to Survival Analysis Using Stata (Stata Press, 2004, College Station, Texas).
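A simple, descriptive entry point to the "how quickly" question in EHA is the Kaplan-Meier estimate of the survivor function, which handles cases that are censored (still event-free when observation ends). The Python sketch below uses hypothetical data and an illustrative function name; the regression models for rates covered by Allison and by Cleves, Gould, and Gutierrez go well beyond this.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survivor function S(t).

    times:  follow-up time for each case
    events: 1 if the event occurred at that time, 0 if the case was censored
    Returns (t, S(t)) pairs at each distinct time at which an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Gather all cases with this exact time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Cases censored at t still count as at risk at t.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Hypothetical data: six people followed until an event or the end of observation.
print(kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]))
```

Each step multiplies the running survival probability by the share of those still at risk who do not experience the event at that time, so censored cases contribute information up to the point they leave observation instead of being dropped.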

Since the 1970s, the LISREL program has been popular with those who want to estimate structural equation/path models and do confirmatory factor analysis. A key strength of LISREL is that it provides a means of controlling for measurement error in variables. The program itself comes with some pretty good documentation. For more, see Leslie Hayduk's Structural Equation Modeling with LISREL and LISREL Issues, Debates and Strategies. I also like J. Scott Long's Covariance Structure Models: An Introduction to LISREL. Note, however, that LISREL has gotten easier to use in recent years, and many older works refer to its original, more complicated syntax. Also, there are now many alternatives to LISREL that many consider superior, such as AMOS and Mplus.

In the social sciences, we often have hierarchical data: we have variables describing individuals, but the individuals are also grouped into larger units, for which we also have variables. For example, students are grouped in classes, classes are grouped in schools, schools are grouped in school districts, etc. Because of these groupings, the assumption that observations are independent is often violated. Anthony S. Bryk and Stephen W. Raudenbush's book Hierarchical Linear Models discusses how to handle such hierarchical data, and the HLM manual describes how to use their powerful HLM program. Bryk and Raudenbush regularly offer seminars on HLM in Chicago, which you may want to attend if you find your work requires this technique.
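One way to see why clustering matters is the intraclass correlation (ICC), the share of outcome variance that lies between groups, and the design effect it implies for equal-sized clusters, 1 + (m - 1)*ICC. Dividing the sample size by the design effect gives a rough "effective" number of independent observations for estimating a mean. The Python sketch below uses hypothetical variance components and illustrative function names; in practice the components would come from a fitted random-intercept model such as one estimated in HLM.

```python
def icc(tau00, sigma2):
    """Intraclass correlation from a random-intercept model's variance components."""
    return tau00 / (tau00 + sigma2)


def design_effect(rho, cluster_size):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho


# Hypothetical values: between-school variance 0.2, within-school variance 0.8,
# 1,000 students in schools of 25.
rho = icc(0.2, 0.8)            # 0.2
deff = design_effect(rho, 25)  # about 5.8
print(rho, deff, 1000 / deff)  # effective sample size of roughly 172
```

Even a modest ICC can shrink the effective sample size sharply when groups are large, which is why ignoring the grouping tends to understate standard errors.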

If you want a much more mathematical approach to both basic and advanced statistical techniques, William H. Greene's Econometric Analysis is one of the leading textbooks in the field. Greene's Limdep User's Manual explains how to use his Limdep program. Mere mortals who want more math but something easier than Greene may be able to handle Pindyck and Rubinfeld's Econometric Models and Economic Forecasts.

For some other suggested readings, check out my web page at

https://academicweb.nd.edu/~rwilliam/ndonly/area_exams/xsocmeth/index.html

This page includes some of the reading lists that have been prepared by students taking the area exam in Methods and Statistics. Old exams are also there if you want to take a look at them.

For more on categorical data analysis, you can see my course notes and suggested readings at

https://academicweb.nd.edu/~rwilliam/xsoc73994/index.html