The naive estimator of the treatment effect, \(\widehat{\delta}^{\text{*}}_{N}\), is
\[ \widehat{\delta}^{\text{*}}_{N} = \overline{Y}_{1} - \overline{Y}_{0}, \] where \(\overline{Y}_{1}\) and \(\overline{Y}_{0}\) are the sample average outcomes among individuals that received treatment \(A=1\) and those that received treatment \(A=0\), respectively. Specifically,
\[ \overline{Y}_{1} = \frac{\sum\limits_{i=1}^{n} A_i Y_i}{\sum\limits_{i=1}^{n} A_i}~~~~ \text{and}~~~ \overline{Y}_{0} = \frac{\sum\limits_{i=1}^{n} (1-A_i) Y_i}{\sum\limits_{i=1}^{n} (1 - A_i)}. \]
We estimate the standard error of the naive estimator of the treatment effect as
\[ \text{SE}_N = \frac{1}{\sqrt{n}} \left\{\frac{1}{n}\sum_{i = 1}^{n} \left( \widehat{\delta}^{\text{*}}_{N,i} \right) ^2 \right\}^{1/2}, \] where \(\widehat{\delta}^{\text{*}}_{N,i}\) is \[ \widehat{\delta}^{\text{*}}_{N,i} = \frac{ A_i \left( Y_i - \overline{Y}_{1} \right)}{\frac{1}{n}\sum\limits_{i=1}^{n} A_i} - \frac{ (1-A_i) \left(Y_i - \overline{Y}_{0}\right)}{\frac{1}{n}\sum\limits_{i=1}^{n} (1 - A_i)}. \]
This expression follows from the sandwich formula discussed in Section 2.5, Review of M-estimation, where the estimator for \(\delta^{\text{*}}\) solves jointly in \(\delta^{\text{*}}\) and \(\pi\) the stacked estimating equations
\[ \begin{align} \sum_{i = 1}^{n} \left\{ \frac{ A_i Y_i}{\pi} - \frac{ (1-A_i) Y_i}{1 - \pi} - \delta^{\text{*}} \right\} & = 0, \\ \sum_{i = 1}^{n} \left( A_{i} - \pi \right) &= 0. \end{align} \]
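As a quick check of this formulation, the stacked equations have the closed-form solution \(\widehat{\pi} = n^{-1}\sum_{i=1}^{n} A_{i}\) and \(\widehat{\delta}^{\text{*}}_{N} = \overline{Y}_{1} - \overline{Y}_{0}\). The short sketch below verifies this numerically on a simulated toy data set (hypothetical data used only for this check; the names Atoy and ytoy are ours and are unrelated to the dataSBP analysis).
set.seed(1234L)
Atoy <- rbinom(n = 100L, size = 1L, prob = 0.5)
ytoy <- 2.0 * Atoy + rnorm(n = 100L)
# closed-form solution of the stacked estimating equations
piHat <- mean(x = Atoy)
deltaHat <- sum(Atoy * ytoy) / sum(Atoy) - sum({1.0 - Atoy} * ytoy) / sum(1.0 - Atoy)
# both estimating equations evaluate to (numerically) zero at the solution
sum(Atoy * ytoy / piHat - {1.0 - Atoy} * ytoy / {1.0 - piHat} - deltaHat)
sum(Atoy - piHat)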
We have defined two functions to estimate the treatment effect and the standard error.
#------------------------------------------------------------------------------#
# naive estimator of tx effect
#
# ASSUMPTIONS
# tx is binary coded as 0,1
#
# INPUTS
# y : a vector containing the outcome of interest
# A : a vector containing the tx received
#
# RETURNS
# a list containing
# EY : the sample mean of the outcome under each tx option
# deltaHat : the naive estimator for the tx effect
#------------------------------------------------------------------------------#
delta_N <- function(y, A) {
#### Average Responses
# aggregate data for mean in each tx
EY <- stats::aggregate(x = y, by = list(A), FUN = mean)
# convert to named vector
EY <- array(data = EY[,2L], dimnames = list(EY[,1L]))
#### Treatment Effect
delta <- unname(obj = EY[2L] - EY[1L])
return( list("EY" = EY, "deltaHat" = delta) )
}
#------------------------------------------------------------------------------#
# naive estimator of tx effect and its standard error
#
# REQUIRES
# delta_N()
#
# ASSUMPTIONS
# tx is binary coded as {0,1}
#
# INPUTS
# y : a vector containing the outcome of interest
# A : a vector containing the tx received
#
# RETURNS
# a list containing
# deltaHat : the naive estimator for the tx effect
# EY : the sample mean of the outcome under each tx option
# sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_N_se <- function(y, A) {
# tx effect
delta <- delta_N(y = y, A = A)
# mean outcome A = 1
ybar_1 <- sum(y * A) / sum(A)
# mean outcome A = 0
ybar_0 <- sum(y * {1.0 - A}) / sum(1.0 - A)
# estimated tx effect each individual
delta_N_i <- A * {y - ybar_1} / mean(x = A) -
{1.0 - A} * {y - ybar_0} / mean(x = 1.0 - A)
# variance
sigmaSq <- mean(x = {delta_N_i}^2L)
return( c(delta, "sigmaHat" = sqrt(x = sigmaSq / length(x = y))) )
}
For our simulated data set, the naive estimate of the treatment effect and its standard error are
delta_N_se(y = y, A = dataSBP$A)
$EY
0 1
-4.164835 14.222467
$deltaHat
[1] 18.3873
$sigmaHat
[1] 0.7543302
Under the naive estimator, the treatment effect is estimated to be \(\widehat{\delta}^{\text{*}}_N =\) 18.39 (0.75) mmHg, indicating that, on average, patients who received the fictitious medication experienced a larger decrease in systolic blood pressure after six months than patients who did not receive the drug. However, the randomization assumption underpinning this estimator is not appropriate for our simulated data set, and this result may be misleading.
The outcome regression estimator of the treatment effect, \(\widehat{\delta}^{*}_{OR}\), is defined as
\[ \widehat{\delta}^{*}_{OR} = n^{-1} \sum_{i=1}^{n} \left\{Q(X_{i},1; \widehat{\beta}) - Q(X_{i},0;\widehat{\beta})\right\}, \] where \(Q(x, a; \beta)\) is a parametric model for \(Q(x,a) = E\left(Y|X=x, A=a\right)\) and \(\widehat{\beta}\) is a suitable estimator.
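The implementation below uses the modelObj package so that any posited regression model can be supplied. As a point of reference, the following minimal sketch computes the same plug-in quantity directly with base R's lm(); it assumes the dataSBP data set and outcome vector y used throughout this chapter and, for illustration, the correctly specified model \(Q^{3}(x,a;\beta)\) defined below. The object names (fitQ3, d1, d0) are ours.
# minimal sketch of the outcome regression estimator using lm() directly
fitQ3 <- stats::lm(formula = y ~ (Ch + K)*A, data = dataSBP)
# counterfactual data sets with everyone assigned A = 1 and A = 0
d1 <- dataSBP
d1$A <- 1L
d0 <- dataSBP
d0$A <- 0L
Q1hat <- stats::predict(object = fitQ3, newdata = d1)
Q0hat <- stats::predict(object = fitQ3, newdata = d0)
# plug-in estimator of the treatment effect
mean(x = Q1hat - Q0hat)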
For the outcome regression estimator of the treatment effect for which the outcome regression parameters are estimated using ordinary least squares (OLS), the \((1,1)\) component of the sandwich estimator for the variance is defined as
\[ \widehat{\Sigma}_{11} =A_n + (C^{\intercal}_n~D^{-1}_n)~B_n~(C^{\intercal}_n~D^{-1}_n)^{\intercal}, \]
where
\(A_{n} = n^{-1}\sum_{i=1}^{n} A_{ni} A_{ni}\) and \(B_{n} = n^{-1}\sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\) with
\[ \begin{aligned} A_{ni} &= \left\{Q(X_{i},1;\widehat{\beta})- Q(X_{i},0;\widehat{\beta})\right\} - \widehat{\delta}^{*}_{OR}, \\ \\ B_{ni} &= \left\{Y_{i} - Q(X_{i},A_{i};\widehat{\beta})\right\} \frac{\partial Q(X_{i},A_{i};\widehat{\beta})}{\partial \beta}. \end{aligned} \]
And,
\[ \begin{aligned} C_n = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\beta}}, \quad D_n = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\beta}^{\intercal}}. \end{aligned} \]
The estimated standard error reported below is \(\sqrt{\widehat{\Sigma}_{11}/n}\).
Anticipating later methods, we implement the treatment effect and the sandwich estimator of the variance using multiple functions, each function calculating a specific component of the M-estimator and its derivative.
This implementation is not the most efficient implementation of the variance estimator. For example, the regression step is completed twice (once within delta_OR() to estimate the treatment effect and once within swv_OLS() to obtain the OLS sandwich components). We have opted for clarity of exposition over computational efficiency.
#------------------------------------------------------------------------------#
# outcome regression estimator of tx effect
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
# moOR : a modeling object specifying the outcome regression step
# data : a data.frame containing covariates and tx
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# fitOR : a modelObjFit object containing the results of the
# outcome regression step
# EY : the sample mean of the outcome under each tx option
# deltaHat : the outcome regression estimator for the tx effect
#------------------------------------------------------------------------------#
delta_OR <- function(moOR, data, y) {
#### Outcome Regression
fitOR <- modelObj::fit(object = moOR, data = data, response = y)
# Q(X,0;betaHat)
data$A <- 0L
Q0 <- drop(x = modelObj::predict(object = fitOR, newdata = data))
# Q(X,1;betaHat)
data$A <- 1L
Q1 <- drop(x = modelObj::predict(object = fitOR, newdata = data))
#### Tx Effect
EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
EY[2L] <- mean(x = Q1)
EY[1L] <- mean(x = Q0)
delta <- unname(obj = EY[2L] - EY[1L])
return( list( "fitOR" = fitOR,
"EY" = EY,
"deltaHat" = delta) )
}
#------------------------------------------------------------------------------#
# outcome regression estimator of tx effect and its standard error
#
# REQUIRES
# swv_OLS() and delta_OR_swv()
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
# outcome regression model is a linear model and parameters are estimated
# using OLS
#
# INPUTS
# moOR : a modeling object specifying the outcome regression step
# *** must be a linear model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# deltaHat : the outcome regression estimator for the tx effect
# EY : the sample mean of the outcome under each tx option
# fitOR : a modelObjFit object containing the results of the
# outcome regression step
# sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_OR_se <- function(moOR, data, y) {
#### OLS components
OLS <- swv_OLS(mo = moOR, data = data, y = y)
#### estimator components
OR <- delta_OR_swv(moOR = moOR, data = data, y = y)
#### 1,1 Component of Sandwich Estimator
# OLS contribution
temp <- OR$dMdB %*% OLS$invdM
sig11OLS <- temp %*% OLS$MM %*% t(x = temp)
sig11 <- drop(x = OR$MM + sig11OLS)
return( c(OR$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )
}
#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator
# outcome regression estimator
#
# REQUIRES
# delta_OR()
#
# ASSUMPTIONS
# outcome regression model is a linear model
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
# moOR : a modeling object specifying the outcome regression step
# *** must be a linear model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# OUTPUTS:
# a list containing
# delta : the list returned by delta_OR()
# MM : M M^T matrix
# dMdB : matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
delta_OR_swv <- function(moOR, data, y) {
# estimate tx effect
delta <- delta_OR(moOR = moOR, data = data, y = y)
# Q(X,0;betaHat)
data$A <- 0L
Q0 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))
# derivative of Q(X,0;betaHat)
dQ0 <- stats::model.matrix(object = modelObj::model(object = moOR),
data = data)
# Q(X,1;betaHat)
data$A <- 1L
Q1 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))
# derivative of Q(X,1;betaHat)
dQ1 <- stats::model.matrix(object = modelObj::model(object = moOR),
data = data)
# delta component of M MT
mmDelta <- mean(x = {Q1 - Q0 - delta$deltaHat}^2)
# derivative of delta component w.r.t beta
dMdB <- colMeans(x = dQ1 - dQ0)
return( list("delta" = delta, "MM" = mmDelta, "dMdB" = dMdB) )
}
#------------------------------------------------------------------------------#
# components of the sandwich estimator for an ordinary least squares estimation
# of a linear regression model
#
# ASSUMPTIONS
# mo is a linear model with parameters to be estimated using OLS
#
# INPUTS
# mo : a modeling object specifying a regression step
# *** must be a linear model ***
# data : a data.frame containing covariates and tx
# y : a vector containing the outcome of interest
#
#
# RETURNS
# a list containing
# MM : M^T M component from OLS
# invdM : inverse of the matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
swv_OLS <- function(mo, data, y) {
n <- nrow(x = data)
fit <- modelObj::fit(object = mo, data = data, response = y)
# Q(X,A;betaHat)
Qa <- drop(x = modelObj::predict(object = fit, newdata = data))
# derivative of Q(X,A;betaHat)
dQa <- stats::model.matrix(object = modelObj::model(object = mo),
data = data)
# number of parameters in model
p <- ncol(x = dQa)
# OLS component of M
mOLSi <- {y - Qa}*dQa
# OLS component of M MT
mmOLS <- sapply(X = 1L:n,
FUN = function(i){ mOLSi[i,] %o% mOLSi[i,] },
simplify = "array")
# derivative OLS component
dmOLS <- - sapply(X = 1L:n,
FUN = function(i){ dQa[i,] %o% dQa[i,] },
simplify = "array")
# sum over individuals
if (p == 1L) {
mmOLS <- mean(x = mmOLS)
dmOLS <- mean(x = dmOLS)
} else {
mmOLS <- rowMeans(x = mmOLS, dim = 2L)
dmOLS <- rowMeans(x = dmOLS, dim = 2L)
}
# invert derivative OLS component
invD <- solve(a = dmOLS)
return( list("MM" = mmOLS, "invdM" = invD) )
}
Throughout Chapters 2-4, we consider three outcome regression models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.
Click on each tab below to see the model and basic model diagnostic steps.
The first model is a completely misspecified model
\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]
The parameters of this model will be estimated using ordinary least squares via R's lm() function.
The modeling object for this regression step is
q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
solver.method = 'lm',
predict.method = 'predict.lm')
Though ultimately the regression steps will be performed within the implementations of the treatment effect and value estimators, here we make use of modelObj::fit() and modelObj::fitObject() directly so that we can examine each fitted model.
For \(Q^{1}(x,a;\beta)\) the regression can be completed as follows
OR1 <- modelObj::fit(object = q1, data = dataSBP, response = y)
OR1 <- modelObj::fitObject(object = OR1)
OR1
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Coefficients:
(Intercept) W A Cr A:Cr
-6.66893 0.02784 16.46653 0.56324 2.41004
where for convenience we have made use of modelObj's fitObject() function to retrieve the lm object returned by the regression method, so that the standard methods defined for "lm" objects can be applied.
Let’s examine the regression results for the model under consideration. First, the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,2))
graphics::plot(x = OR1)
We see that the diagnostic plots for model \(Q^{1}(x,a;\beta)\) show some unusual behaviors. The two groupings of residuals in the Residuals vs Fitted and Scale-Location plots reflect the fact that the model includes only the covariates W and Cr, neither of which is associated with the outcome in the true regression relationship. Thus, for all practical purposes, the model is fitting the two treatment means.
Now, let’s examine the summary statistics for the regression step
summary(object = OR1)
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Residuals:
Min 1Q Median 3Q Max
-35.911 -7.605 -0.380 7.963 35.437
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.66893 4.67330 -1.427 0.15389
W 0.02784 0.02717 1.025 0.30564
A 16.46653 5.96413 2.761 0.00587 **
Cr 0.56324 5.07604 0.111 0.91167
A:Cr 2.41004 7.22827 0.333 0.73889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.6 on 995 degrees of freedom
Multiple R-squared: 0.3853, Adjusted R-squared: 0.3828
F-statistic: 155.9 on 4 and 995 DF, p-value: < 2.2e-16
We see that the residual standard error is large and that the adjusted R-squared value is small.
Readers familiar with R might have noticed that the response variable specified in the call to the regression method is YinternalY. This is an internal placeholder name used by modelObj when it constructs the call to the regression method; it is not a variable in the original data set.
The second model is an incomplete model having only one of the covariates of the true model,
\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]
As for \(Q^{1}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R's lm() function.
The modeling object for this regression step is
q2 <- modelObj::buildModelObj(model = ~ Ch*A,
solver.method = 'lm',
predict.method = 'predict.lm')
For \(Q^{2}(x,a;\beta)\) the regression can be completed as follows
OR2 <- modelObj::fit(object = q2, data = dataSBP, response = y)
OR2 <- modelObj::fitObject(object = OR2)
OR2
Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)
Coefficients:
(Intercept) Ch A Ch:A
36.5101 -0.2052 -89.5245 0.5074
where for convenience we have again made use of modelObj's fitObject() function.
First, let’s examine the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,4))
graphics::plot(x = OR2)
graphics::plot(x = OR1)
where we have included the diagnostic plots for model \(Q^{1}(x,a;\beta)\) for easy comparison. We see that the residuals from the fit of model \(Q^{2}(x,a;\beta)\) do not exhibit the two groupings, reflecting the fact that \(Q^{2}(x,a;\beta)\) is only partially misspecified in that it includes the important covariate Ch.
Now, let’s examine the summary statistics for the regression step
summary(object = OR2)
Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)
Residuals:
Min 1Q Median 3Q Max
-16.1012 -2.7476 -0.0032 2.6727 15.1825
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.510110 0.933019 39.13 <2e-16 ***
Ch -0.205226 0.004606 -44.56 <2e-16 ***
A -89.524507 1.471905 -60.82 <2e-16 ***
Ch:A 0.507374 0.006818 74.42 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.511 on 996 degrees of freedom
Multiple R-squared: 0.907, Adjusted R-squared: 0.9068
F-statistic: 3239 on 3 and 996 DF, p-value: < 2.2e-16
Compared with the diagnostics for model \(Q^{1}(x,a;\beta)\), the residual standard error is smaller (4.51 vs 11.6) and the adjusted R-squared value is larger (0.91 vs 0.38). Both of these results indicate that model \(Q^{2}(x,a;\beta)\) is a more suitable model for \(E(Y|X=x,A=a)\) than model \(Q^{1}(x,a;\beta)\).
The third model we will consider is the correctly specified model used to generate the dataset,
\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]
As for \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via R's lm() function.
The modeling object for this regression step is
q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
solver.method = 'lm',
predict.method = 'predict.lm')
For \(Q^{3}(x,a;\beta)\) the regression can be completed as follows
OR3 <- modelObj::fit(object = q3, data = dataSBP, response = y)
OR3 <- modelObj::fitObject(object = OR3)
OR3
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Coefficients:
(Intercept) Ch K A Ch:A K:A
-15.6048 -0.2035 12.2849 -61.0979 0.5048 -6.6099
where for convenience we have again made use of modelObj's fitObject() function.
Again, let’s start by examining the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,4))
graphics::plot(x = OR3)
graphics::plot(x = OR2)
where we have included the diagnostic plots for model \(Q^{2}(x,a;\beta)\) for easy comparison. We see that the results for models \(Q^{2}(x,a;\beta)\) and \(Q^{3}(x,a;\beta)\) are very similar, with model \(Q^{3}(x,a;\beta)\) yielding slightly better diagnostics (e.g., smaller residuals); a result in line with the knowledge that \(Q^{3}(x,a;\beta)\) is the model used to generate the data.
Now, let’s examine the summary statistics for the regression step
summary(object = OR3)
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Residuals:
Min 1Q Median 3Q Max
-9.0371 -1.9376 0.0051 2.0127 9.6452
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.604845 1.636349 -9.536 <2e-16 ***
Ch -0.203472 0.002987 -68.116 <2e-16 ***
K 12.284852 0.358393 34.278 <2e-16 ***
A -61.097909 2.456637 -24.871 <2e-16 ***
Ch:A 0.504816 0.004422 114.168 <2e-16 ***
K:A -6.609876 0.538386 -12.277 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.925 on 994 degrees of freedom
Multiple R-squared: 0.961, Adjusted R-squared: 0.9608
F-statistic: 4897 on 5 and 994 DF, p-value: < 2.2e-16
Compared with model \(Q^{2}(x,a;\beta)\), the residual standard error is smaller (2.93 vs 4.51) and the adjusted R-squared value is larger (0.96 vs 0.91). Again, these results indicate that model \(Q^{3}(x,a;\beta)\) is a more suitable model than both \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\).
With the implementations defined, it is straightforward to estimate the treatment effect and the standard error for a variety of linear models. Throughout this chapter, we consider three outcome regression models selected to represent a range of model (mis)specification. Click on each tab below to see the respective analysis. The treatment effects and standard errors under all models are summarized under the heading Comparison.
The first model is a completely misspecified model
\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]
For \(Q^{1}(x,a;\beta)\) the estimated treatment effect and its standard error are obtained as follows:
q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
solver.method = 'lm',
predict.method = 'predict.lm')
delta_OR_se(moOR = q1, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Coefficients:
(Intercept) W A Cr A:Cr
-6.66893 0.02784 16.46653 0.56324 2.41004
$EY
0 1
-4.184763 14.255836
$deltaHat
[1] 18.4406
$sigmaHat
[1] 0.7563737
We see that the estimated treatment effect is 18.44 (0.76) mmHg. Recall, the model diagnostic step indicated that there are issues with this model, and thus these results may be misleading.
The second model is an incomplete model having only one of the covariates of the true model,
\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]
The estimated treatment effect and its standard error are obtained as follows:
q2 <- modelObj::buildModelObj(model = ~ Ch*A,
solver.method = 'lm',
predict.method = 'predict.lm')
delta_OR_se(moOR = q2, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)
Coefficients:
(Intercept) Ch A Ch:A
36.5101 -0.2052 -89.5245 0.5074
$EY
0 1
-6.432054 10.208093
$deltaHat
[1] 16.64015
$sigmaHat
[1] 0.7561243
We see that the estimated treatment effect is 16.64 (0.76) mmHg, a value smaller than that obtained under the completely misspecified model \(Q^{1}(x,a;\beta)\), and with a similar standard error.
The third model we consider is the correctly specified model used to generate the dataset,
\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]
The estimated treatment effect and its standard error are obtained as follows:
q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
solver.method = 'lm',
predict.method = 'predict.lm')
delta_OR_se(moOR = q3, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Coefficients:
(Intercept) Ch K A Ch:A K:A
-15.6048 -0.2035 12.2849 -61.0979 0.5048 -6.6099
$EY
0 1
-6.458372 10.244167
$deltaHat
[1] 16.70254
$sigmaHat
[1] 0.7282849
We see that the estimated treatment effect is 16.70 (0.73) mmHg, a value similar to that obtained under the previously discussed incomplete model \(Q^{2}(x,a;\beta)\), and with a similar standard error.
In the table below, we see that the estimated treatment effect under these three models ranges from 16.64 mmHg to 18.44 mmHg.

| (mmHg) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
|--------|----------------------|----------------------|----------------------|
| \(\widehat{\delta}^{\text{*}}_{OR}\) | 18.44 (0.76) | 16.64 (0.76) | 16.70 (0.73) |
Following the suggestion of Rosenbaum and Rubin (1983, 1984) we can estimate \(\delta^{\text{*}}\) by stratifying individuals into \(K\) groups (\(K>1)\) based on the values of the estimated propensity score \(\pi(x;\widehat{\gamma})\).
Groups are defined using cut-off values such that \(0 = c_0 \lt c_1 \lt \cdots \lt c_K = 1\). Individual \(i\) belongs to group \(j\) if \(c_{j-1} \lt \pi(X_{i}; \widehat{\gamma}) \le c_j\); we denote group membership as \[ g_{ji} = \text{I}\left\{c_{j-1} < \pi(X_{i};\widehat{\gamma}) \le c_{j}\right\}; \] that is, \(g_{ji}\) is 1 if the propensity score for individual \(i\) satisfies \(c_{j-1} < \pi(X_{i};\widehat{\gamma}) \le c_{j}\) and is thereby in group \(j\), and \(g_{ji}\) is 0 otherwise.
Within each group, we can define sample average outcomes among individuals in the group receiving treatment 1 and among those receiving treatment 0 as \[ \overline{Y}_{1j} = \frac{\frac{1}{n_{j}}\sum_{i = 1}^{n} A_{i}~Y_{i}~g_{ji}}{\frac{1}{n_{j}}\sum_{k=1}^{n} A_{k}~g_{jk}} \quad \textrm{and} \quad \overline{Y}_{0j} = \frac{\frac{1}{n_{j}}\sum_{i = 1}^{n}(1-A_{i})~Y_{i}~g_{ji}}{\frac{1}{n_{j}}\sum_{k=1}^{n} (1-A_{k})~g_{jk}}. \]
The treatment effect for group \(j\) is written as \[ \widehat{\delta}^{\text{*}}_{Sj} = \overline{Y}_{1j} - \overline{Y}_{0j}. \]
The stratification estimator for the treatment effect is the weighted sum of the \(\widehat{\delta}^{\text{*}}_{Sj}\),
\[ \widehat{\delta}^{\text{*}}_{S} = \sum_{j=1}^{K} \widehat{\delta}^{\text{*}}_{Sj}~(n_j/n) = \sum_{j=1}^{K}(\overline{Y}_{1j} - \overline{Y}_{0j})~(n_j/n), \]
where \(n_{j}\) is the number of individuals in the \(j^{th}\) group.
Cut-off values are chosen such that an approximately equal number of individuals are in each stratum.
We estimate the standard error using the pooled estimator: \[ \text{SE}_{pooled} = \frac{1}{\sqrt{n}}\left(\frac{1}{K}\sum_{j=1}^{K} \widehat{\sigma}_{j}^2\right)^{1/2}, \] where \(\widehat{\sigma}_{j}^2\) is the estimated variance for group \(j\). For the stratification estimator, this variance is defined as
\[ \widehat{\sigma}^{2}_{j} = \frac{1}{n_{j}} \sum_{i=1}^{n} \left(\widehat{\delta}^{\text{*}}_{Sj,i} - \widehat{\delta}^{\text{*}}_{Sj}\right)^2 g_{ji}, \] where \(g_{ji}\) is the indicator of membership in group \(j\).
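To make the calculation concrete, the following minimal sketch computes \(\widehat{\delta}^{\text{*}}_{S}\) directly with base R. It assumes the dataSBP data set and outcome vector y used throughout this chapter and, for illustration, the correctly specified propensity model \(\pi^{3}(x;\gamma)\) defined below; the object names (nStrata, fitPS3, pHat) are ours. Because cut() handles boundary ties slightly differently than the grouping used in delta_S() below, results may differ negligibly.
# minimal sketch of the stratification estimator using glm() directly
nStrata <- 5L
fitPS3 <- stats::glm(formula = A ~ SBP0 + Ch, data = dataSBP, family = "binomial")
pHat <- stats::predict(object = fitPS3, type = "response")
# strata boundaries at quantiles of the estimated propensity score
grp <- cut(x = pHat,
           breaks = stats::quantile(x = pHat,
                                    probs = seq(from = 0.0, to = 1.0,
                                                length.out = nStrata + 1L)),
           labels = FALSE, include.lowest = TRUE)
deltaS <- 0.0
for (j in 1L:nStrata) {
  Aj <- dataSBP$A[grp == j]
  yj <- y[grp == j]
  # within-stratum difference in sample mean outcomes
  deltaSj <- sum(Aj * yj) / sum(Aj) - sum({1.0 - Aj} * yj) / sum(1.0 - Aj)
  # weight by the proportion of individuals in the stratum
  deltaS <- deltaS + deltaSj * mean(x = grp == j)
}
deltaS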
We have defined two functions to estimate the treatment effect and the standard error.
#------------------------------------------------------------------------------#
# stratification estimator of tx effect
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
# moPS : a modeling object specifying the propensity score regression step
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
# K : the integer number of strata
#
# RETURNS
# a list containing
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# cj : the propensity score boundaries defining the strata
# grp : the number of individuals in each stratum
# EY : the sample mean of the outcome under each tx option
# deltaHatj : the stratification estimator for the tx effect for each strata
# deltaHat : the stratification estimator for the tx effect
#------------------------------------------------------------------------------#
delta_S <- function(moPS, data, y, K) {
#### Propensity Score Regression
fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)
# estimated propensity score
p1 <- modelObj::predict(object = fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
#### Identify Strata
# cutoff points for K groups
probs <- seq(from = 0.0, to = 1.0, length.out = K+1L)
coff <- stats::quantile(x = p1, probs = probs)
coff[1L] <- 0.0
coff[K+1L] <- 1.0
# group ids for each individual
grp <- colSums(x = sapply(X = p1, FUN = function(x){x <= coff}))
#### Treatment Effect
EY <- matrix(data = 0.0, nrow = K, ncol = 2L,
dimnames = list(NULL, c("0","1")))
delta <- 0.0
deltaj <- array(data = 0.0, dim = K, dimnames = list(1L:K))
for (j in 1L:K) {
gji <- grp == j
EY[j,2L] <- mean(x = data$A[gji] * y[gji]) / mean(x = data$A[gji])
EY[j,1L] <- mean({1.0-data$A[gji]} * y[gji]) / mean(x = 1.0 - data$A[gji])
deltaj[j] <- EY[j,2L] - EY[j,1L]
delta <- delta + sum(gji) / nrow(x = data) * deltaj[j]
}
delta <- unname(obj = delta)
return( list( "fitPS" = fitPS,
"cj" = coff,
"grp" = table(bin = grp),
"EY" = EY,
"deltaHatj" = deltaj,
"deltaHat" = delta) )
}
#------------------------------------------------------------------------------#
# stratification estimator of tx effect and its standard error
#
# REQUIRES
# delta_S()
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
# moPS : a modeling object specifying the propensity score regression step
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
# K : the integer number of strata
#
# RETURNS
# a list containing
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# cj : the propensity score boundaries defining the strata
# grp : the number of individuals in each stratum
# EY : the sample mean of the outcome under each tx option
# deltaHatj : the stratification estimator for the tx effect for each strata
# deltaHat : the stratification estimator for the tx effect
# sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_S_se <- function(moPS, data, y, K) {
#### Treatment Effect
delta <- delta_S(moPS = moPS, data = data, y = y, K = K)
#### Standard Error
# estimated propensity score
p1 <- modelObj::predict(object = delta$fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# group ids for each individual
grp <- colSums(x = sapply(X = p1, FUN = function(x){x <= delta$cj}))
sigmaSq <- 0.0
for (j in 1L:K) {
gji <- grp == j
Y1ji <- data$A[gji] * y[gji] / mean(x = data$A[gji])
Y0ji <- {1.0-data$A[gji]} * y[gji] / mean(x = 1.0 - data$A[gji])
deltaji <- {Y1ji - Y0ji}
sigmaSq <- sigmaSq + mean(x = {deltaji - delta$deltaHatj[j]}^2)
}
sigmaSq <- sigmaSq / K
return( c(delta, "sigmaHat" = sqrt(x = sigmaSq / nrow(x = data))) )
}
Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.
Click on each tab below to see the model and basic model diagnostic steps.
The first model is a completely misspecified model having none of the covariates used in the data generating model
\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]
The parameters of this model will be estimated using maximum likelihood via R's glm() function. The modeling object for this regression step is
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() and modelObj::fitObject() to do so.
For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:
PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
where for convenience we have made use of modelObj's fitObject() function to retrieve the glm object returned by the regression method, so that the standard methods defined for "glm" objects can be applied.
Now, let’s examine the regression results for the model under consideration.
summary(object = PS1)
Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.239 -1.104 -1.027 1.248 1.443
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.966434 0.624135 1.548 0.1215
W -0.007919 0.004731 -1.674 0.0942 .
Cr -0.703766 0.627430 -1.122 0.2620
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1373.8 on 997 degrees of freedom
AIC: 1379.8
Number of Fisher Scoring iterations: 4
First, comparing the null deviance (1377.8) with the residual deviance (1373.8), we see that including the independent variables does not significantly reduce the deviance; this model is not significantly better than a constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of the model, comparing it across candidate models provides a measure of relative quality.
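The lack of improvement can be quantified with a likelihood-ratio (deviance) test; the short sketch below uses the deviances and degrees of freedom reported in the summary above.
# change in deviance of 1377.8 - 1373.8 = 4.0 on 999 - 997 = 2 degrees of freedom
stats::pchisq(q = 1377.8 - 1373.8, df = 2L, lower.tail = FALSE)
# approximately 0.14; no significant improvement over the intercept-only model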
The second model is an incomplete model having only one of the covariates of the true data generating model
\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]
As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via R's glm() function.
The modeling object for this regression step is
p2 <- modelObj::buildModelObj(model = ~ Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2
Call: glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Coefficients:
(Intercept) Ch
-3.06279 0.01368
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1378
Residual Deviance: 1298 AIC: 1302
Again, we use summary() to examine the regression results for the model under consideration.
summary(PS2)
Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7497 -1.0573 -0.7433 1.1449 1.9316
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.062786 0.348085 -8.799 <2e-16 ***
Ch 0.013683 0.001617 8.462 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1298.2 on 998 degrees of freedom
AIC: 1302.2
Number of Fisher Scoring iterations: 4
First, comparing the null deviance (1377.8) with the residual deviance (1298.2), we see that including the independent variable leads to a markedly smaller deviance than that of the intercept-only model. The AIC remains large (1302.247), but it is smaller than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.
The third model we will consider is the correctly specified model used to generate the data set,
\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]
The parameters of this model will be estimated using maximum likelihood via R's glm() function.
The modeling object for this regression step is
p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
Again, we use summary() to examine the regression results for the model under consideration.
summary(PS3)
Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3891 -0.9502 -0.4940 0.9939 2.1427
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.941527 1.299952 -12.263 <2e-16 ***
SBP0 0.076687 0.007196 10.657 <2e-16 ***
Ch 0.015892 0.001753 9.066 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1161.6 on 997 degrees of freedom
AIC: 1167.6
Number of Fisher Scoring iterations: 3
First, we see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables substantially reduces the deviance relative to the intercept-only model. The AIC is again large (1167.621), but it is smaller than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.
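For a compact side-by-side comparison of the three candidate models, the AIC values can be collected in a single call (a usage sketch assuming the fitted objects PS1, PS2, and PS3 created in the tabs above are available in the workspace).
# relative comparison of the three propensity score models
stats::AIC(PS1, PS2, PS3)
# smaller values indicate a better relative fit; pi^3 is preferred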
To illustrate, we assume \(K=5\) and evaluate the estimator and standard error for each of the propensity score regression models below.
For model \(\pi^{1}(x;\gamma)\), the completely misspecified model,
delta_S_se(moPS = p1, data = dataSBP, y = y, K = 5)
$fitPS
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
$cj
0% 20% 40% 60% 80% 100%
0.0000000 0.4282948 0.4469531 0.4623706 0.4795282 1.0000000
$grp
bin
1 2 3 4 5
200 198 202 200 200
$EY
0 1
[1,] -4.428571 14.65686
[2,] -5.218487 13.32911
[3,] -4.838384 14.85437
[4,] -3.425926 13.91304
[5,] -3.032787 14.08974
$deltaHatj
1 2 3 4 5
19.08543 18.54760 19.69275 17.33897 17.12253
$deltaHat
[1] 18.35975
$sigmaHat
[1] 0.8430623
The estimated treatment effect is 18.36 (0.84) mmHg, which is larger than the true treatment effect. Notice the tight range of propensity score cutoffs ($cj). This is consistent with the fact that this model is misspecified; at a minimum, it indicates that the \(K=5\) stratification estimator under this model is not appropriate.
For model \(\pi^{2}(x;\gamma)\), the partial model,
delta_S_se(moPS = p2, data = dataSBP, y = y, K = 5)
$fitPS
Call: glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Coefficients:
(Intercept) Ch
-3.06279 0.01368
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1378
Residual Deviance: 1298 AIC: 1302
$cj
0% 20% 40% 60% 80% 100%
0.0000000 0.3296184 0.4120421 0.4907518 0.5792204 1.0000000
$grp
bin
1 2 3 4 5
199 201 200 200 200
$EY
0 1
[1,] -19.085714 29.620155
[2,] -10.212766 17.504673
[3,] -6.640351 10.244186
[4,] -1.420168 3.061728
[5,] 6.362416 -7.176471
$deltaHatj
1 2 3 4 5
48.705869 27.717439 16.884537 4.481896 -13.538887
$deltaHat
[1] 16.82918
$sigmaHat
[1] 0.4056475
The estimated treatment effect is 16.83 (0.41) mmHg, which is smaller than the true treatment effect.
And for model \(\pi^{3}(x;\gamma)\), the correctly specified model,
delta_S_se(moPS = p3, data = dataSBP, y = y, K = 5)
$fitPS
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
$cj
0% 20% 40% 60% 80% 100%
0.0000000 0.2379388 0.3743752 0.5098823 0.6679042 1.0000000
$grp
bin
1 2 3 4 5
200 200 200 200 200
$EY
0 1
[1,] -13.500000 21.5687500
[2,] -9.212766 15.9433962
[3,] -6.278846 10.0625000
[4,] -3.923077 6.5263158
[5,] 2.096970 -0.6285714
$deltaHatj
1 2 3 4 5
35.068750 25.156162 16.341346 10.449393 -2.725541
$deltaHat
[1] 16.85802
$sigmaHat
[1] 0.7607441
The estimated treatment effect is 16.86 (0.76) mmHg, which is smaller than the true treatment effect.
In the table below, we see that the estimated treatment effect under these three models ranges from 16.83 mmHg to 18.36 mmHg.

| (mmHg) | \(\pi^{1}(x;\gamma)\) | \(\pi^{2}(x;\gamma)\) | \(\pi^{3}(x;\gamma)\) |
|--------|-----------------------|-----------------------|-----------------------|
| \(\widehat{\delta}^{\text{*}}_{S}\) | 18.36 (0.84) | 16.83 (0.41) | 16.86 (0.76) |
The inverse probability weighted (IPW) estimator of the treatment effect, \(\widehat{\delta}^{\text{*}}_{IPW}\), is defined as
\[ \widehat{\delta}^{\text{*}}_{IPW} = n^{-1} \sum_{i=1}^{n} \left[ \frac{A_{i} Y_{i}}{\pi(X_i; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{1-\pi(X_i; \widehat{\gamma})} \right], \] where \(\pi(x; \gamma)\) is a parametric model for the propensity score \(\pi(x) = P(A=1|X=x)\).
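To make the estimator concrete before introducing the modelObj-based implementation below, the following minimal sketch computes \(\widehat{\delta}^{\text{*}}_{IPW}\) directly with base R's glm(). It assumes the dataSBP data set and outcome vector y used throughout this chapter and, for illustration, the correctly specified propensity model \(\pi^{3}(x;\gamma)\); the object names fitPS3 and pHat are ours.
# minimal sketch of the IPW estimator using glm() directly
fitPS3 <- stats::glm(formula = A ~ SBP0 + Ch, data = dataSBP, family = "binomial")
pHat <- stats::predict(object = fitPS3, type = "response")
# inverse probability weighted difference in mean outcomes
mean(x = dataSBP$A * y / pHat - {1.0 - dataSBP$A} * y / {1.0 - pHat})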
For treatment coded as \(A \in \{0,1\}\), the estimator \(\widehat{\delta}^{*}_{IPW}\), and the maximum likelihood (ML) estimator \(\widehat{\gamma}\), the \((1,1)\) component of \(\widehat{\Sigma}\) is
\[ \widehat{\Sigma}_{11} = A_n + (C^{\intercal}_n~D^{-1}_n)~B_{n}~(C^{\intercal}_n~D^{-1}_n)^{\intercal}, \]
where \(A_{n} = n^{-1} \sum_{i=1}^{n} A_{ni} A_{ni}\) and \(B_{n} = n^{-1} \sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\) with
\[ \begin{aligned} A_{ni} &= \frac{A_{i} Y_{i}}{\pi(X_i; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{\{1-\pi(X_i; \widehat{\gamma})\}} - \widehat{\delta}^{\text{*}}_{IPW}, \\ ~\\ B_{ni} &= \frac{ A_i - \pi(X_i;\widehat{\gamma})}{\pi(X_i; \widehat{\gamma})\{1-\pi(X_i; \widehat{\gamma})\}} \frac{\partial \pi(X_{i}; \widehat{\gamma})}{\partial \gamma}. \end{aligned} \]
And
\[ C_n = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\gamma}^{\intercal}}, \quad D_n = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\gamma}}. \]
The estimated standard error reported below is \(\sqrt{\widehat{\Sigma}_{11}/n}\).
Anticipating later methods, we break the implementation of the treatment effect and the sandwich estimator of the variance into multiple functions, each function completing a specific component.
This implementation is not the most efficient implementation of the variance estimator. For example, the regression step is completed twice (once within delta_IPW() to estimate the treatment effect and once within swv_ML() to obtain the ML sandwich components). Again, we have opted for clarity of exposition over computational efficiency.
#------------------------------------------------------------------------------#
# IPW estimator of tx effect
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
# moPS : a modeling object specifying the propensity score regression step
# data : a data.frame containing covariates and tx
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# EY : the sample mean of the outcome under each tx option
# deltaHat : the IPW estimator for the tx effect
#------------------------------------------------------------------------------#
delta_IPW <- function(moPS, data, y) {
#### Propensity Score
fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)
# estimated propensity score
p1 <- modelObj::predict(object = fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
#### Treatment Effect
EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
EY[1L] <- mean(x = {1.0 - data$A} * y / {1.0 - p1})
EY[2L] <- mean(x = data$A * y / p1)
delta <- unname(obj = EY[2L] - EY[1L])
return( list( "fitPS" = fitPS,
"EY" = EY,
"deltaHat" = delta) )
}
#------------------------------------------------------------------------------#
# IPW estimator of the tx effect and its standard error
#
# REQUIRES
# swv_ML() and delta_IPW_swv()
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
# propensity score regression model is a logistic model and parameters are
# estimated using ML
#
# INPUTS
# moPS : a modeling object specifying the propensity score regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# EY : the sample mean of the outcome under each tx option
# deltaHat : the IPW estimator for the tx effect
# sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_IPW_se <- function(moPS, data, y){
#### ML components
ML <- swv_ML(mo = moPS, data = data, y = data$A)
#### estimator components
IPW <- delta_IPW_swv(moPS = moPS, data = data, y = y)
#### 1,1 Component of Sandwich Estimator
# ML contribution
temp <- IPW$dMdG %*% ML$invdM
sig11ML <- temp %*% ML$MM %*% t(x = temp)
sig11 <- drop(x = IPW$MM + sig11ML)
return( c(IPW$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )
}
#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator IPW estimator
#
# REQUIRES
# delta_IPW()
#
# ASSUMPTIONS
# propensity score regression model is a logistic model
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
# moPS : a modeling object specifying the propensity score regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# OUTPUTS:
# a list containing
# delta : the list returned by delta_IPW()
# MM : M M^T matrix
# dMdG : matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
delta_IPW_swv <- function(moPS, data, y) {
# estimate treatment effect
delta <- delta_IPW(moPS = moPS, data = data, y = y)
# pi(x; gamma)
p1 <- modelObj::predict(object = delta$fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# model.matrix for propensity model
piMM <- stats::model.matrix(object = modelObj::model(object = moPS),
data = data)
# delta component of M MT
mmDelta <- mean(x = {data$A * y / p1 - {1.0-data$A} * y / {1.0-p1} -
delta$deltaHat}^2)
# derivative w.r.t. gamma
dMdG <- -{ data$A * y * {1.0-p1} / p1 +
{1.0-data$A} * y * p1 / {1.0-p1} } * piMM
dMdG <- colMeans(x = dMdG)
return( list("delta" = delta,
"MM" = mmDelta,
"dMdG" = dMdG) )
}
#------------------------------------------------------------------------------#
# components of the sandwich estimator for a maximum likelihood estimation of
# a logistic regression model
#
# ASSUMPTIONS
# mo is a logistic model with parameters to be estimated using ML
#
# INPUTS
# mo : a modeling object specifying a regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# y : a vector containing the binary outcome of interest
#
#
# RETURNS
# a list containing
# MM : M^T M component from ML
# invdM : inverse of the matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
swv_ML <- function(mo, data, y) {
# regression step
fit <- modelObj::fit(object = mo, data = data, response = y)
# yHat
p1 <- modelObj::predict(object = fit, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# model.matrix for model
piMM <- stats::model.matrix(object = modelObj::model(object = mo),
data = data)
n <- nrow(x = piMM)
p <- ncol(x = piMM)
# ML M-estimator component
mMLi <- {y - p1} * piMM
# ML component of M MT
mmML <- sapply(X = 1L:n,
FUN = function(i){ mMLi[i,] %o% mMLi[i,] },
simplify = "array")
# derivative of ML component
dFun <- function(i){
- piMM[i,] %o% piMM[i,] * p1[i] * {1.0 - p1[i]}
}
dmML <- sapply(X = 1L:n, FUN = dFun, simplify = "array")
if( p == 1L ) {
mmML <- mean(x = mmML)
dmML <- mean(x = dmML)
} else {
mmML <- rowMeans(x = mmML, dim = 2L)
dmML <- rowMeans(x = dmML, dim = 2L)
}
# invert derivative ML component
invD <- solve(a = dmML)
return( list("MM" = mmML, "invdM" = invD) )
}
Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.
Click on each tab below to see the model and basic model diagnostic steps.
The first model is a completely misspecified model having none of the covariates used in the data generating model
\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]
The parameters of this model will be estimated using maximum likelihood via R's glm() function. The modeling object for this regression step is
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() and modelObj::fitObject() to do so.
For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:
PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
where for convenience we have made use of modelObj's fitObject() function to retrieve the glm object returned by the regression method, so that the standard methods defined for "glm" objects can be applied.
Now, let’s examine the regression results for the model under consideration.
summary(object = PS1)
Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.239 -1.104 -1.027 1.248 1.443
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.966434 0.624135 1.548 0.1215
W -0.007919 0.004731 -1.674 0.0942 .
Cr -0.703766 0.627430 -1.122 0.2620
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1373.8 on 997 degrees of freedom
AIC: 1379.8
Number of Fisher Scoring iterations: 4
First, comparing the null deviance (1377.8) with the residual deviance (1373.8), we see that including the independent variables does not significantly reduce the deviance; this model is not significantly better than a constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of the model, comparing it across candidate models provides a measure of relative quality.
The second model is an incomplete model having only one of the covariates of the true data generating model
\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]
As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via R's glm() function.
The modeling object for this regression step is
p2 <- modelObj::buildModelObj(model = ~ Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2
Call: glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Coefficients:
(Intercept) Ch
-3.06279 0.01368
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1378
Residual Deviance: 1298 AIC: 1302
Again, we use summary() to examine the regression results for the model under consideration.
summary(PS2)
Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7497 -1.0573 -0.7433 1.1449 1.9316
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.062786 0.348085 -8.799 <2e-16 ***
Ch 0.013683 0.001617 8.462 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1298.2 on 998 degrees of freedom
AIC: 1302.2
Number of Fisher Scoring iterations: 4
First, comparing the null deviance (1377.8) with the residual deviance (1298.2), we see that including the independent variable leads to a markedly smaller deviance than that of the intercept-only model. The AIC remains large (1302.247), but it is smaller than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.
The third model we will consider is the correctly specified model used to generate the data set,
\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]
The parameters of this model will be estimated using maximum likelihood via R's glm() function.
The modeling object for this regression step is
p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
Again, we use summary() to examine the regression results for the model under consideration.
summary(PS3)
Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3891 -0.9502 -0.4940 0.9939 2.1427
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.941527 1.299952 -12.263 <2e-16 ***
SBP0 0.076687 0.007196 10.657 <2e-16 ***
Ch 0.015892 0.001753 9.066 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1161.6 on 997 degrees of freedom
AIC: 1167.6
Number of Fisher Scoring iterations: 3
First, we see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables substantially reduces the deviance relative to the intercept-only model. The AIC is again large (1167.621), but it is smaller than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.
The IPW treatment effect and estimated standard error for the previously defined propensity score models are obtained as follows.
For model \(\pi^{1}(x;\gamma)\), the completely misspecified model,
delta_IPW_se(moPS = p1, data = dataSBP, y = y)
$fitPS
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
$EY
0 1
-4.202015 14.232216
$deltaHat
[1] 18.43423
$sigmaHat
[1] 0.9261142
The estimated treatment effect is 18.43 (0.93) mmHg, which is larger than the true treatment effect.
For model \(\pi^{2}(x;\gamma)\), the incomplete model,
delta_IPW_se(moPS = p2, data = dataSBP, y = y)
$fitPS
Call: glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Coefficients:
(Intercept) Ch
-3.06279 0.01368
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1378
Residual Deviance: 1298 AIC: 1302
$EY
0 1
-6.447606 10.205682
$deltaHat
[1] 16.65329
$sigmaHat
[1] 0.8205346
The estimated treatment effect is 16.65 (0.82) mmHg, which is smaller than the true treatment effect.
For model \(\pi^{3}(x;\gamma)\), the model used to generate the data,
delta_IPW_se(moPS = p3, data = dataSBP, y = y)
$fitPS
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
$EY
0 1
-6.562869 10.445915
$deltaHat
[1] 17.00878
$sigmaHat
[1] 0.9855199
The estimated treatment effect is 17.01 (0.99) mmHg, which is smaller than the true treatment effect.
In the table below, we see that the estimated treatment effect under these three models ranges from 16.65 mmHg to 18.43 mmHg.

| (mmHg) | \(\pi^{1}(x;\gamma)\) | \(\pi^{2}(x;\gamma)\) | \(\pi^{3}(x;\gamma)\) |
|--------|-----------------------|-----------------------|-----------------------|
| \(\widehat{\delta}^{\text{*}}_{IPW}\) | 18.43 (0.93) | 16.65 (0.82) | 17.01 (0.99) |
The doubly robust estimator for the causal treatment effect, \(\delta^{\text{*}}\), is given by
\[ \widehat{\delta}^{\text{*}}_{DR} = n^{-1} \sum_{i=1}^{n} \left[ \frac{A_{i} Y_{i}}{\pi(X_{i}; \widehat{\gamma})} - \frac{(1-A_{i}) Y_{i}}{1-\pi(X_{i}; \widehat{\gamma})} - \{A_{i} - \pi(X_{i}; \widehat{\gamma})\} \left\{ \frac{Q(X_{i},1;\widehat{\beta})}{\pi(X_{i}; \widehat{\gamma})} + \frac{Q(X_{i},0;\widehat{\beta})}{1-\pi(X_{i}; \widehat{\gamma})} \right\} \right], \]
where \(Q(x,a;\beta)\) is a parametric model for \(Q(x,a)= E(Y|X=x,A=a)\) and \(\pi(x; \gamma)\) is that for the propensity score, \(\pi(x) = P(A=1|X=x)\).
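Before the general implementation below, the following minimal sketch computes \(\widehat{\delta}^{\text{*}}_{DR}\) directly with base R, assuming the dataSBP data set and outcome vector y used throughout this chapter and, for illustration, the correctly specified models \(Q^{3}(x,a;\beta)\) and \(\pi^{3}(x;\gamma)\) from the preceding sections; the object names (fitPS3, fitQ3, pHat, Ai) are ours.
# minimal sketch of the doubly robust estimator using glm() and lm() directly
fitPS3 <- stats::glm(formula = A ~ SBP0 + Ch, data = dataSBP, family = "binomial")
pHat <- stats::predict(object = fitPS3, type = "response")
fitQ3 <- stats::lm(formula = y ~ (Ch + K)*A, data = dataSBP)
# counterfactual data sets with everyone assigned A = 1 and A = 0
d1 <- dataSBP
d1$A <- 1L
d0 <- dataSBP
d0$A <- 0L
Q1hat <- stats::predict(object = fitQ3, newdata = d1)
Q0hat <- stats::predict(object = fitQ3, newdata = d0)
Ai <- dataSBP$A
# IPW contrast augmented by the outcome regression predictions
mean(x = Ai * y / pHat - {1.0 - Ai} * y / {1.0 - pHat} -
       {Ai - pHat} * {Q1hat / pHat + Q0hat / {1.0 - pHat}})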
The \((1,1)\) component of the sandwich estimator for the variance of the doubly robust estimator is defined as
\[ \widehat{\Sigma}_{11} = A_{n} + ( D_{n}^{\intercal}~E_{n}^{-1})~B_{n}~( D_{n}^{\intercal}~E_{n}^{-1})^{\intercal} + ( G_{n}^{\intercal}~J_{n}^{-1})~C_{n}~( G_{n}^{\intercal}~J_{n}^{-1})^{\intercal}, \]
where \(A_{n} = n^{-1} \sum_{i=1}^{n} A_{ni} A_{ni}^{\intercal}\), \(B_{n} = n^{-1} \sum_{i=1}^{n} B_{ni} B_{ni}^{\intercal}\), \(C_{n} = n^{-1} \sum_{i=1}^{n} C_{ni} C_{ni}^{\intercal}\), and
\[ \begin{aligned} A_{ni} &= \frac{A_{i}~Y_{i}}{\pi(X_{i};\widehat{\gamma})} - \frac{(1-A_{i})Y_{i}}{1-\pi(X_{i};\widehat{\gamma})} - \left\{A_{i}-\pi(X_{i};\widehat{\gamma})\right\} \left\{\frac{Q(X_{i},1;\widehat{\beta})}{\pi(X_{i};\widehat{\gamma})} + \frac{Q(X_{i},0;\widehat{\beta})}{1-\pi(X_{i};\widehat{\gamma})}\right\} - \widehat{\delta}^{\text{*}}_{DR}, \\ \\ B_{ni} &= \left\{ Y_{i} - Q(X_{i},A_{i};\widehat{\beta}) \right\} \frac{ \partial Q(X_{i},A_{i};\widehat{\beta})}{\partial \beta}, \\ \\ C_{ni} &= \frac{ A_i - \pi(X_i;\widehat{\gamma})}{\pi(X_i;\widehat{\gamma})\{1-\pi(X_i;\widehat{\gamma})\}} \frac{\partial \pi(X_i;\widehat{\gamma})}{\partial \gamma}. \end{aligned} \]
And,
\[ D_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\beta}}, \quad G_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial {\gamma}}, \quad E_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial B_{ni}}{\partial {\beta}^{\intercal}}, \quad \text{and} \quad J_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial C_{ni}}{\partial {\gamma}^{\intercal}}. \]
We break the implementation of the treatment effect estimator and the sandwich estimator of its variance into multiple functions and make use of previously defined functions.
#------------------------------------------------------------------------------#
# doubly robust estimator of tx effect
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS
# moOR : a modeling object specifying the outcome regression step
# moPS : a modeling object specifying the propensity score regression step
# data : a data.frame containing covariates and tx
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# fitOR : a modelObjFit object containing the results of the
# outcome regression step
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# EY : the sample mean of the outcome under each tx option
# deltaHat : the doubly robust estimator for the tx effect
#------------------------------------------------------------------------------#
delta_DR <- function(moOR, moPS, data, y) {
#### Propensity Score
fitPS <- modelObj::fit(object = moPS, data = data, response = data$A)
# estimated propensity score
p1 <- modelObj::predict(object = fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
#### Outcome Regression
fitOR <- modelObj::fit(object = moOR, data = data, response = y)
# store tx variable
A <- data$A
# estimated Q-function when all A=0
data$A <- 0L
Q0 <- drop(x = modelObj::predict(object = fitOR, newdata = data))
# estimated Q-function when all A=1
data$A <- 1L
Q1 <- drop(x = modelObj::predict(object = fitOR, newdata = data))
#### Treatment Effect
EY <- array(data = 0.0, dim = 2L, dimnames = list(c("0","1")))
aug <- {A - p1} * {Q1 / p1 + Q0 / {1.0 - p1}}
EY[2L] <- mean(x = {A == 1L} * {y / p1 - aug})
EY[1L] <- mean(x = {A == 0L} * {y / {1.0 - p1} + aug})
delta <- unname(obj = EY[2L] - EY[1L])
return( list( "fitOR" = fitOR,
"fitPS" = fitPS,
"deltaHat" = delta,
"EY" = EY) )
}
#------------------------------------------------------------------------------#
# doubly robust estimator of tx effect and its standard error
#
# REQUIRES
# swv_ML(), swv_OLS(), and delta_DR_swv()
#
# ASSUMPTIONS
# tx is denoted by A and is binary coded as {0,1}
# outcome regression model is a linear model and parameters are estimated
# using OLS
# propensity score regression model is a logistic model and parameters are
# estimated using ML
#
# INPUTS
# moOR : a modeling object specifying the outcome regression step
# *** must be a linear model ***
# moPS : a modeling object specifying the propensity score regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# RETURNS
# a list containing
# fitOR : a modelObjFit object containing the results of the
# outcome regression step
# fitPS : a modelObjFit object containing the results of the
# propensity score regression step
# EY : the sample mean of the outcome under each tx option
# deltaHat : the doubly robust estimator for the tx effect
# sigmaHat : the estimated standard error
#------------------------------------------------------------------------------#
delta_DR_se <- function(moOR, moPS, data, y) {
#### ML components
ML <- swv_ML(mo = moPS, data = data, y = data$A)
#### OLS components
OLS <- swv_OLS(mo = moOR, data = data, y = y)
#### estimator components
DR <- delta_DR_swv(moOR = moOR, moPS = moPS, data = data, y = y)
#### 1,1 Component of Sandwich Estimator
# ML contribution
temp <- DR$dMdG %*% ML$invdM
sig11ML <- temp %*% ML$MM %*% t(x = temp)
# OLS contribution
temp <- DR$dMdB %*% OLS$invdM
sig11OLS <- temp %*% OLS$MM %*% t(x = temp)
sig11 <- drop(x = DR$MM + sig11ML + sig11OLS)
return( c(DR$delta, "sigmaHat" = sqrt(x = sig11 / nrow(x = data))) )
}
#------------------------------------------------------------------------------#
# tx effect estimator component of the sandwich estimator
# doubly robust estimator
#
# REQUIRES
# delta_DR()
#
# ASSUMPTIONS
# outcome regression model is a linear model
# propensity score regression model is a logistic model
# tx is denoted by A and is binary coded as {0,1}
#
# INPUTS:
# moOR : a modeling object specifying the outcome regression step
# *** must be a linear model ***
# moPS : a modeling object specifying the propensity score regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# *** tx must be named 'A' and coded 0/1 ***
# y : a vector containing the outcome of interest
#
# OUTPUTS:
# a list containing
# delta : the list returned by delta_DR()
# MM : M M^T matrix
# dMdB : matrix of derivatives of M wrt beta
# dMdG : matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
delta_DR_swv <- function(moOR, moPS, data, y) {
# estimate treatment effect
delta <- delta_DR(moOR = moOR, moPS = moPS, data = data, y = y)
# pi(x;gammaHat)
p1 <- modelObj::predict(object = delta$fitPS, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# model.matrix for propensity model
piMM <- stats::model.matrix(object = modelObj::model(object = moPS),
data = data)
A <- data$A
# Q(x,A=0;betaHat)
data$A <- 0L
Q0 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))
# dQ(x,A=0;betaHat)
# derivative of a linear model is the model.matrix
dQ0 <- stats::model.matrix(object = modelObj::model(object = moOR),
data = data)
# Q(x,A=1;betaHat)
data$A <- 1L
Q1 <- drop(modelObj::predict(object = delta$fitOR, newdata = data))
# dQ(x,A=1;betaHat)
# derivative of a linear model is the model.matrix
dQ1 <- stats::model.matrix(object = modelObj::model(object = moOR),
data = data)
data$A <- A
# delta component of M-equation
mDeltai <- data$A*y/p1 - {1.0 - data$A}*y/{1.0 - p1} -
{data$A - p1}*{Q1/p1 + Q0/{1.0 - p1}} - delta$deltaHat
mmDelta <- mean(x = mDeltai^2)
# derivative of delta component w.r.t. beta
dMdB <- colMeans(x = {data$A - p1}*{dQ1/p1 + dQ0/{1.0 - p1}})
# derivative of delta component w.r.t. gamma
dMdG <- - data$A/p1^2*{y - Q1} - {1.0 - data$A}/{1.0 - p1}^2*{y - Q0}
dMdG <- colMeans(x = dMdG*p1*{1.0 - p1}*piMM)
return( list("delta" = delta,
"MM" = mmDelta,
"dMdB" = dMdB,
"dMdG" = dMdG) )
}
#------------------------------------------------------------------------------#
# components of the sandwich estimator for a maximum likelihood estimation of
# a logistic regression model
#
# ASSUMPTIONS
# mo is a logistic model with parameters to be estimated using ML
#
# INPUTS
# mo : a modeling object specifying a regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# y : a vector containing the binary outcome of interest
#
#
# RETURNS
# a list containing
# MM : M^T M component from ML
# invdM : inverse of the matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
swv_ML <- function(mo, data, y) {
# regression step
fit <- modelObj::fit(object = mo, data = data, response = y)
# yHat
p1 <- modelObj::predict(object = fit, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# model.matrix for model
piMM <- stats::model.matrix(object = modelObj::model(object = mo),
data = data)
n <- nrow(x = piMM)
p <- ncol(x = piMM)
# ML M-estimator component
mMLi <- {y - p1} * piMM
# ML component of M MT
mmML <- sapply(X = 1L:n,
FUN = function(i){ mMLi[i,] %o% mMLi[i,] },
simplify = "array")
# derivative of ML component
dFun <- function(i){
- piMM[i,] %o% piMM[i,] * p1[i] * {1.0 - p1[i]}
}
dmML <- sapply(X = 1L:n, FUN = dFun, simplify = "array")
if( p == 1L ) {
mmML <- mean(x = mmML)
dmML <- mean(x = dmML)
} else {
mmML <- rowMeans(x = mmML, dim = 2L)
dmML <- rowMeans(x = dmML, dim = 2L)
}
# invert derivative ML component
invD <- solve(a = dmML)
return( list("MM" = mmML, "invdM" = invD) )
}
#------------------------------------------------------------------------------#
# components of the sandwich estimator for an ordinary least squares estimation
# of a linear regression model
#
# ASSUMPTIONS
# mo is a linear model with parameters to be estimated using OLS
#
# INPUTS
# mo : a modeling object specifying a regression step
# *** must be a linear model ***
# data : a data.frame containing covariates and tx
# y : a vector containing the outcome of interest
#
#
# RETURNS
# a list containing
# MM : M^T M component from OLS
# invdM : inverse of the matrix of derivatives of M wrt beta
#------------------------------------------------------------------------------#
swv_OLS <- function(mo, data, y) {
n <- nrow(x = data)
fit <- modelObj::fit(object = mo, data = data, response = y)
# Q(X,A;betaHat)
Qa <- drop(x = modelObj::predict(object = fit, newdata = data))
# derivative of Q(X,A;betaHat)
dQa <- stats::model.matrix(object = modelObj::model(object = mo),
data = data)
# number of parameters in model
p <- ncol(x = dQa)
# OLS component of M
mOLSi <- {y - Qa}*dQa
# OLS component of M MT
mmOLS <- sapply(X = 1L:n,
FUN = function(i){ mOLSi[i,] %o% mOLSi[i,] },
simplify = "array")
# derivative OLS component
dmOLS <- - sapply(X = 1L:n,
FUN = function(i){ dQa[i,] %o% dQa[i,] },
simplify = "array")
# sum over individuals
if (p == 1L) {
mmOLS <- mean(x = mmOLS)
dmOLS <- mean(x = dmOLS)
} else {
mmOLS <- rowMeans(x = mmOLS, dim = 2L)
dmOLS <- rowMeans(x = dmOLS, dim = 2L)
}
# invert derivative OLS component
invD <- solve(a = dmOLS)
return( list("MM" = mmOLS, "invdM" = invD) )
}
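As a quick sanity check of these helper functions, the sandwich covariance matrices formed from their output should approximately reproduce the coefficient standard errors reported by glm() and lm() when the posited models are reasonable. A minimal sketch, assuming the modeling objects p3 and q3 and the data used throughout this chapter, is
# sanity-check sketch: sandwich covariance matrices for gammaHat and betaHat
# built from the swv_ML() and swv_OLS() components
n <- nrow(x = dataSBP)
ml <- swv_ML(mo = p3, data = dataSBP, y = dataSBP$A)
ols <- swv_OLS(mo = q3, data = dataSBP, y = y)
covGamma <- ml$invdM %*% ml$MM %*% t(x = ml$invdM) / n
covBeta <- ols$invdM %*% ols$MM %*% t(x = ols$invdM) / n
# compare with the standard errors from summary() of the corresponding fits
sqrt(x = diag(x = covGamma))
sqrt(x = diag(x = covBeta))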
Throughout Chapters 2-4, we consider three outcome regression models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.
Each model and its basic model diagnostic steps are presented in turn below.
The first model is a completely misspecified model
\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}). \]
The parameters of this model will be estimated using ordinary least squares via lm().
The modeling object for this regression step is
q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
solver.method = 'lm',
predict.method = 'predict.lm')
Though ultimately the regression steps will be performed within the implementations of the treatment effect and value estimators, here we make use of modelObj::fit() to carry out the regression and modelObj::fitObject() to retrieve the fitted object for examination.
For \(Q^{1}(x,a;\beta)\) the regression can be completed as follows
OR1 <- modelObj::fit(object = q1, data = dataSBP, response = y)
OR1 <- modelObj::fitObject(object = OR1)
OR1
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Coefficients:
(Intercept) W A Cr A:Cr
-6.66893 0.02784 16.46653 0.56324 2.41004
where for convenience we have made use of modelObj’s fitObject() function, which extracts the fitted regression object (here an “lm” object) so that standard diagnostic tools can be applied directly.
Let’s examine the regression results for the model under consideration. First, the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,2))
graphics::plot(x = OR1)
We see that the diagnostic plots for model \(Q^{1}(x,a;\beta)\) show some unusual behaviors. The two groupings of residuals in the Residuals vs Fitted and the Scale-Location plots reflect the fact that the model includes only the covariates W and Cr, neither of which is associated with the outcome in the true regression relationship. Thus, for all practical purposes, the model is fitting the two treatment means.
Now, let’s examine the summary statistics for the regression step
summary(object = OR1)
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Residuals:
Min 1Q Median 3Q Max
-35.911 -7.605 -0.380 7.963 35.437
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.66893 4.67330 -1.427 0.15389
W 0.02784 0.02717 1.025 0.30564
A 16.46653 5.96413 2.761 0.00587 **
Cr 0.56324 5.07604 0.111 0.91167
A:Cr 2.41004 7.22827 0.333 0.73889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.6 on 995 degrees of freedom
Multiple R-squared: 0.3853, Adjusted R-squared: 0.3828
F-statistic: 155.9 on 4 and 995 DF, p-value: < 2.2e-16
We see that the residual standard error is large and that the adjusted R-squared value is small.
Readers familiar with R might have noticed that the response variable specified in the call to the regression method is YinternalY rather than y; this internal name is assigned by modelObj when it passes the response to the solver method.
The second model is an incomplete model having only one of the covariates of the true model,
\[ Q^{2}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + a~(\beta_{2} + \beta_{3} \text{Ch}). \]
As for \(Q^{1}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via lm().
The modeling object for this regression step is
q2 <- modelObj::buildModelObj(model = ~ Ch*A,
solver.method = 'lm',
predict.method = 'predict.lm')
For \(Q^{2}(x,a;\beta)\) the regression can be completed as follows
OR2 <- modelObj::fit(object = q2, data = dataSBP, response = y)
OR2 <- modelObj::fitObject(object = OR2)
OR2
Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)
Coefficients:
(Intercept) Ch A Ch:A
36.5101 -0.2052 -89.5245 0.5074
where for convenience we have again made use of modelObj’s fitObject() function.
First, let’s examine the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,4))
graphics::plot(x = OR2)
graphics::plot(x = OR1)
where we have included the diagnostic plots for model \(Q^{1}(x,a;\beta)\) for easy comparison. We see that the residuals from the fit of model \(Q^{2}(x,a;\beta)\) do not exhibit the two groupings, reflecting the fact that \(Q^{2}(x,a;\beta)\) is only partially misspecified in that it includes the important covariate Ch.
Now, let’s examine the summary statistics for the regression step
summary(object = OR2)
Call:
lm(formula = YinternalY ~ Ch + A + Ch:A, data = data)
Residuals:
Min 1Q Median 3Q Max
-16.1012 -2.7476 -0.0032 2.6727 15.1825
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.510110 0.933019 39.13 <2e-16 ***
Ch -0.205226 0.004606 -44.56 <2e-16 ***
A -89.524507 1.471905 -60.82 <2e-16 ***
Ch:A 0.507374 0.006818 74.42 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.511 on 996 degrees of freedom
Multiple R-squared: 0.907, Adjusted R-squared: 0.9068
F-statistic: 3239 on 3 and 996 DF, p-value: < 2.2e-16
Compared with the diagnostics for model \(Q^{1}(x,a;\beta)\), we see that the residual standard error is smaller (4.51 vs 11.6) and that the adjusted R-squared value is larger (0.91 vs 0.38). Both of these results indicate that model \(Q^{2}(x,a;\beta)\) is a more suitable model for \(E(Y|X=x,A=a)\) than model \(Q^{1}(x,a;\beta)\).
The third model we will consider is the correctly specified model used to generate the dataset,
\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}). \]
As for \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\), the parameters of this model will be estimated using ordinary least squares via lm().
The modeling object for this regression step is
q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
solver.method = 'lm',
predict.method = 'predict.lm')
For \(Q^{3}(x,a;\beta)\) the regression can be completed as follows
OR3 <- modelObj::fit(object = q3, data = dataSBP, response = y)
OR3 <- modelObj::fitObject(object = OR3)
OR3
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Coefficients:
(Intercept) Ch K A Ch:A K:A
-15.6048 -0.2035 12.2849 -61.0979 0.5048 -6.6099
where for convenience we have again made use of modelObj’s fitObject() function.
Again, let’s start by examining the diagnostic plots defined for “lm” objects.
par(mfrow = c(2,4))
graphics::plot(x = OR3)
graphics::plot(x = OR2)
where we have included the diagnostic plots for model \(Q^{2}(x,a;\beta)\) for easy comparison. We see that the results for models \(Q^{2}(x,a;\beta)\) and \(Q^{3}(x,a;\beta)\) are very similar, with model \(Q^{3}(x,a;\beta)\) yielding slightly better diagnostics (e.g., smaller residuals); a result in line with the knowledge that \(Q^{3}(x,a;\beta)\) is the model used to generate the data.
Now, let’s examine the summary statistics for the regression step
summary(object = OR3)
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Residuals:
Min 1Q Median 3Q Max
-9.0371 -1.9376 0.0051 2.0127 9.6452
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.604845 1.636349 -9.536 <2e-16 ***
Ch -0.203472 0.002987 -68.116 <2e-16 ***
K 12.284852 0.358393 34.278 <2e-16 ***
A -61.097909 2.456637 -24.871 <2e-16 ***
Ch:A 0.504816 0.004422 114.168 <2e-16 ***
K:A -6.609876 0.538386 -12.277 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.925 on 994 degrees of freedom
Multiple R-squared: 0.961, Adjusted R-squared: 0.9608
F-statistic: 4897 on 5 and 994 DF, p-value: < 2.2e-16
Compared with model \(Q^{2}(x,a;\beta)\), the residual standard error is smaller (2.93 vs 4.51) and the adjusted R-squared value is larger (0.96 vs 0.91). Again, these results indicate that model \(Q^{3}(x,a;\beta)\) is a more suitable model than both \(Q^{1}(x,a;\beta)\) and \(Q^{2}(x,a;\beta)\).
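The fit statistics compared across the three posited outcome models can also be collected in a single call. A minimal sketch, assuming the fitted “lm” objects OR1, OR2, and OR3 extracted above, is
# residual standard error and adjusted R-squared for each posited outcome model
sapply(X = list("Q1" = OR1, "Q2" = OR2, "Q3" = OR3),
       FUN = function(fit) {
               s <- summary(object = fit)
               c("sigma" = s$sigma, "adjR2" = s$adj.r.squared)
             })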
Throughout Chapters 2-4, we consider three propensity score models selected to represent a range of model (mis)specification. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how the methods work under a range of model (mis)specification.
Each model and its basic model diagnostic steps are presented in turn below.
The first model is a completely misspecified model having none of the covariates used in the data generating model
\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]
The parameters of this model will be estimated using maximum likelihood via glm(). The modeling object for this regression step is
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though we will implement our treatment effect and value estimators in such a way that the regression step is performed internally, it is convenient to do model diagnostics separately here. We will make use of modelObj::fit() to carry out the regression and modelObj::fitObject() to retrieve the fitted object.
For \(\pi^{1}(x;\gamma)\) the regression can be completed as follows:
PS1 <- modelObj::fit(object = p1, data = dataSBP, response = dataSBP$A)
PS1 <- modelObj::fitObject(object = PS1)
PS1
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
where for convenience we have made use of modelObj’s fitObject() function.
Now, let’s examine the regression results for the model under consideration.
summary(object = PS1)
Call:
glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.239 -1.104 -1.027 1.248 1.443
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.966434 0.624135 1.548 0.1215
W -0.007919 0.004731 -1.674 0.0942 .
Cr -0.703766 0.627430 -1.122 0.2620
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1373.8 on 997 degrees of freedom
AIC: 1379.8
Number of Fisher Scoring iterations: 4
First, comparing the null deviance (1377.8) and the residual deviance (1373.8), we see that including the independent variables does not significantly reduce the deviance. Thus, this model is not significantly better than the constant propensity score model. However, we know that the data mimic an observational study for which the propensity score is neither constant nor known. Note also that the Akaike Information Criterion (AIC) is large (1379.772). Though the AIC value alone does not tell us much about the quality of the model, it can be compared with that of other models to assess relative quality.
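These two comparisons can be carried out directly in R. A minimal sketch, assuming PS1 is the fitted “glm” object extracted above, is
# likelihood ratio test of pi^1(x;gamma) against the intercept-only model,
# and the model AIC
devDrop <- PS1$null.deviance - PS1$deviance
dfDrop <- PS1$df.null - PS1$df.residual
stats::pchisq(q = devDrop, df = dfDrop, lower.tail = FALSE)
stats::AIC(object = PS1)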
The second model is an incomplete model having only one of the covariates of the true data generating model
\[ \pi^{2}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1} \text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1} \text{Ch})}. \]
As for \(\pi^{1}(x;\gamma)\), the parameters of this model will be estimated using maximum likelihood via glm().
The modeling object for this regression step is
p2 <- modelObj::buildModelObj(model = ~ Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS2 <- modelObj::fit(object = p2, data = dataSBP, response = dataSBP$A)
PS2 <- modelObj::fitObject(object = PS2)
PS2
Call: glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Coefficients:
(Intercept) Ch
-3.06279 0.01368
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1378
Residual Deviance: 1298 AIC: 1302
Again, we use summary() to examine the regression results.
summary(PS2)
Call:
glm(formula = YinternalY ~ Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7497 -1.0573 -0.7433 1.1449 1.9316
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.062786 0.348085 -8.799 <2e-16 ***
Ch 0.013683 0.001617 8.462 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1298.2 on 998 degrees of freedom
AIC: 1302.2
Number of Fisher Scoring iterations: 4
Comparing the null deviance (1377.8) and the residual deviance (1298.2), we see that including the independent variable leads to a smaller deviance than that obtained from the intercept-only model. Further, the AIC (1302.247) remains large but is smaller than that obtained for \(\pi^{1}(x;\gamma)\) (1379.772). This result is not unexpected, as we know that the model is only partially misspecified.
The third model we will consider is the correctly specified model used to generate the data set,
\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]
The parameters of this model will be estimated using maximum likelihood via glm().
The modeling object for this regression step is
p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The regression is completed as follows:
PS3 <- modelObj::fit(object = p3, data = dataSBP, response = dataSBP$A)
PS3 <- modelObj::fitObject(object = PS3)
PS3
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
Again, we use summary() to examine the regression results.
summary(PS3)
Call:
glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3891 -0.9502 -0.4940 0.9939 2.1427
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.941527 1.299952 -12.263 <2e-16 ***
SBP0 0.076687 0.007196 10.657 <2e-16 ***
Ch 0.015892 0.001753 9.066 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1377.8 on 999 degrees of freedom
Residual deviance: 1161.6 on 997 degrees of freedom
AIC: 1167.6
Number of Fisher Scoring iterations: 3
We see from the null deviance (1377.8) and the residual deviance (1161.6) that including the independent variables reduces the deviance relative to the intercept-only model. Further, the AIC (1167.621) remains large but is smaller than that obtained for both \(\pi^{1}(x;\gamma)\) (1379.772) and \(\pi^{2}(x;\gamma)\) (1302.247). This result is in line with the knowledge that this is the data generating model.
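As a compact summary of the three propensity score fits, the AIC values can be collected in one call. A minimal sketch, assuming the fitted “glm” objects PS1, PS2, and PS3 extracted above, is
# AIC for each posited propensity score model
sapply(X = list("pi1" = PS1, "pi2" = PS2, "pi3" = PS3), FUN = stats::AIC)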
It is now straightforward to estimate the treatment effect and the standard error for a variety of linear outcome models and logistic regression propensity score models. Throughout this chapter, we have considered three outcome regression models and three propensity score models selected to represent a range of model (mis)specification. Below, we show the analysis for only a few of these combinations. The treatment effects and standard errors for all combinations of outcome and propensity score models are summarized under the heading Comparison.
First we consider the two completely misspecified models
\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}), \]
and
\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]
The modeling objects for the regression steps are
q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
solver.method = 'lm',
predict.method = 'predict.lm')
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The estimated treatment effect and its standard error are obtained as follows:
delta_DR_se(moOR = q1, moPS = p1, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Coefficients:
(Intercept) W A Cr A:Cr
-6.66893 0.02784 16.46653 0.56324 2.41004
$fitPS
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
$deltaHat
[1] 18.43236
$EY
0 1
-10.090516 8.341846
$sigmaHat
[1] 0.7559916
We see that the estimated treatment effect is 18.43 (0.76) mmHg.
Next we consider the scenario where one model is correctly specified and the other is not. Here, the outcome regression model is that used to generate the data
\[ Q^{3}(x,a;\beta) = \beta_{0} + \beta_{1} \text{Ch} + \beta_{2} \text{K} + a~(\beta_{3} + \beta_{4} \text{Ch} + \beta_{5} \text{K}), \] and the propensity score model is completely misspecified
\[ \pi^{1}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{W} + \gamma_{2}~\text{Cr})}. \]
The modeling objects for the regression steps are
q3 <- modelObj::buildModelObj(model = ~ (Ch + K)*A,
solver.method = 'lm',
predict.method = 'predict.lm')
p1 <- modelObj::buildModelObj(model = ~ W + Cr,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The estimated treatment effect and its standard error are obtained as follows:
delta_DR_se(moOR = q3, moPS = p1, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ Ch + K + A + Ch:A + K:A, data = data)
Coefficients:
(Intercept) Ch K A Ch:A K:A
-15.6048 -0.2035 12.2849 -61.0979 0.5048 -6.6099
$fitPS
Call: glm(formula = YinternalY ~ W + Cr, family = "binomial", data = data)
Coefficients:
(Intercept) W Cr
0.966434 -0.007919 -0.703766
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1374 AIC: 1380
$deltaHat
[1] 16.71185
$EY
0 1
-6.075114 10.636739
$sigmaHat
[1] 0.7286743
We see that the estimated treatment effect is 16.71 (0.73) mmHg, which is less than that obtained previously when both models were completely misspecified. And, the standard error is similar to that obtained previously.
Next we consider the second scenario under which one model is correctly specified and the other is not. That is, the outcome regression model is completely misspecified
\[ Q^{1}(x,a;\beta) = \beta_{0} + \beta_{1} \text{W} + \beta_{2} \text{Cr} + a~(\beta_{3} + \beta_{4} \text{Cr}), \]
and the propensity score model is that used to generate the data
\[ \pi^{3}(x;\gamma) = \frac{\exp(\gamma_{0} + \gamma_{1}~\text{SBP0} + \gamma_{2}~\text{Ch})}{1+\exp(\gamma_{0} + \gamma_{1}~\text{SBP0}+ \gamma_{2}~\text{Ch})}. \]
The modeling objects for the regression steps are
q1 <- modelObj::buildModelObj(model = ~ W + A*Cr,
solver.method = 'lm',
predict.method = 'predict.lm')
p3 <- modelObj::buildModelObj(model = ~ SBP0 + Ch,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
The estimated treatment effect is obtained as follows:
delta_DR_se(moOR = q1, moPS = p3, data = dataSBP, y = y)
$fitOR
Call:
lm(formula = YinternalY ~ W + A + Cr + A:Cr, data = data)
Coefficients:
(Intercept) W A Cr A:Cr
-6.66893 0.02784 16.46653 0.56324 2.41004
$fitPS
Call: glm(formula = YinternalY ~ SBP0 + Ch, family = "binomial", data = data)
Coefficients:
(Intercept) SBP0 Ch
-15.94153 0.07669 0.01589
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 1378
Residual Deviance: 1162 AIC: 1168
$deltaHat
[1] 17.23598
$EY
0 1
-12.416919 4.819056
$sigmaHat
[1] 1.111939
We see that the estimated treatment effect is 17.24 (1.11) mmHg, which is similar to that obtained previously under the reverse combination (correctly specified outcome regression model, misspecified propensity score model). However, the standard error is larger than that obtained under that combination.
In the table below, we see that the estimated treatment effect under these model combinations ranges from 16.65 mmHg to 18.43 mmHg.
| (mmHg) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
| --- | --- | --- | --- |
| \(\pi^{1}(x;\gamma)\) | 18.43 (0.76) | 16.65 (0.76) | 16.71 (0.73) |
| \(\pi^{2}(x;\gamma)\) | 16.70 (1.08) | 16.70 (0.76) | 16.74 (0.73) |
| \(\pi^{3}(x;\gamma)\) | 17.24 (1.11) | 16.72 (0.78) | 16.77 (0.74) |
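The full grid of estimates in the table above can be generated by looping over the posited models. A minimal sketch, assuming the modeling objects q1-q3 and p1-p3 defined earlier in this chapter, is
# doubly robust estimate (standard error) for each combination of the posited
# outcome regression and propensity score models
qList <- list("Q1" = q1, "Q2" = q2, "Q3" = q3)
pList <- list("pi1" = p1, "pi2" = p2, "pi3" = p3)
sapply(X = qList,
       FUN = function(q) {
               sapply(X = pList,
                      FUN = function(p) {
                              res <- delta_DR_se(moOR = q, moPS = p,
                                                 data = dataSBP, y = y)
                              sprintf(fmt = "%.2f (%.2f)",
                                      res$deltaHat, res$sigmaHat)
                            })
             })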
The table below compares the estimated treatment effects and standard errors for all of the estimators discussed here and under all the models considered in this chapter.
|  |  | \(\widehat{\delta}^{\text{*}}_{S}\) | \(\widehat{\delta}^{\text{*}}_{IPW}\) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
| --- | --- | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{N}\) | 18.39 (0.75) | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{OR}\) | --- | --- | --- | 18.44 (0.76) | 16.64 (0.76) | 16.70 (0.73) |
| \(\pi^{1}(x;\gamma)\) | --- | 18.36 (0.84) | 18.43 (0.93) | 18.43 (0.76) | 16.65 (0.76) | 16.71 (0.73) |
| \(\pi^{2}(x;\gamma)\) | --- | 16.83 (0.41) | 16.65 (0.82) | 16.70 (1.08) | 16.70 (0.76) | 16.74 (0.73) |
| \(\pi^{3}(x;\gamma)\) | --- | 16.86 (0.76) | 17.01 (0.99) | 17.24 (1.11) | 16.72 (0.78) | 16.77 (0.74) |
The estimated standard errors are shown in parentheses. The first row contains the naive estimator, which does not depend on a regression model. The second row contains \(\widehat{\delta}^{\text{*}}_{OR}\) estimated under each outcome regression model. The \(\widehat{\delta}^{\text{*}}_{S}\) and \(\widehat{\delta}^{\text{*}}_{IPW}\) columns contain those estimators under each propensity score regression model. The entries in the \(\pi^{1}\)-\(\pi^{3}\) rows under the \(Q^{1}\)-\(Q^{3}\) columns show \(\widehat{\delta}^{\text{*}}_{DR}\) for each combination of outcome and propensity score models.
We see that the estimated treatment effect tends to be smaller with less variance for methods that use the correctly specified outcome regression model, \(Q^{3}(x,a;\widehat{\beta})\).
We have carried out a simulation study to evaluate the performance of the presented methods. We generated 1000 Monte Carlo data sets, each with \(n=1000\). The table below compares the Monte Carlo average of the estimated treatment effects and the Monte Carlo standard deviation of the estimates (in parentheses). In addition, the Monte Carlo average of the standard errors, as defined previously for each estimator, is provided in square brackets.
|  |  | \(\widehat{\delta}^{\text{*}}_{S}\) | \(\widehat{\delta}^{\text{*}}_{IPW}\) | \(Q^{1}(x,a;\beta)\) | \(Q^{2}(x,a;\beta)\) | \(Q^{3}(x,a;\beta)\) |
| --- | --- | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{N}\) | 19.26 (0.76); [0.85] | --- | --- | --- | --- | --- |
| \(\widehat{\delta}^{\text{*}}_{OR}\) | --- | --- | --- | 19.26 (0.76); [0.76] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{1}(x;\gamma)\) | --- | 19.26 (0.76); [0.85] | 19.26 (0.76); [0.94] | 19.26 (0.76); [0.76] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{2}(x;\gamma)\) | --- | 17.57 (0.78); [0.43] | 17.37 (0.86); [0.86] | 17.38 (0.89); [1.16] | 17.39 (0.78); [0.77] | 17.39 (0.75); [0.74] |
| \(\pi^{3}(x;\gamma)\) | --- | 17.61 (0.86); [0.78] | 17.40 (1.03); [1.07] | 17.41 (1.13); [1.37] | 17.39 (0.80); [0.79] | 17.39 (0.76); [0.75] |
Recall that the true treatment effect is \(\delta^{\text{*}} = 17.4\) mmHg. We see that the naive estimator overestimates the treatment effect. Further, when either the propensity or outcome model is correctly specified, the estimated treatment effect agrees well with the true value.
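For reference, a sketch of the Monte Carlo loop is given below; generateSBPData() is a hypothetical stand-in for the data generating mechanism described earlier in the chapter, and the outcome is assumed to be returned as a column named y.
# hypothetical sketch of the Monte Carlo study; generateSBPData() stands in for
# the data generating mechanism of the simulation scenario (not shown here)
nMC <- 1000L
est <- numeric(length = nMC)
se <- numeric(length = nMC)
for (b in 1L:nMC) {
  db <- generateSBPData(n = 1000L)
  res <- delta_DR_se(moOR = q3, moPS = p3, data = db, y = db$y)
  est[b] <- res$deltaHat
  se[b] <- res$sigmaHat
}
c("mean" = mean(x = est), "MC sd" = stats::sd(x = est), "avg SE" = mean(x = se))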
This simulation demonstrates some results and subtleties that are not discussed in Chapter 2 of the book but are well-known in the literature on causal inference. To allow a general discussion of these, define
$$ X_{1} = \text{Ch}, ~~ X_{2} = \text{K}, ~~ X_{3} = \text{SBP0}, ~~\text{and}~~ X_{4} = \text{all other variables}. $$
A review of the simulation scenario under which the data were generated and on which this simulation is based shows that the conditional distribution of \(Y\) given \(A, X_{1}, X_{2}, X_{3}, X_{4}\) does not depend on \(X_{3}, X_{4}\) and thus is the same as the conditional distribution of \(Y\) given \(A, X_{1}, X_{2}\). Similarly, the conditional distribution of \(A\) given \(X_{1}, X_{2}, X_{3}, X_{4}\) does not depend on \(X_{2}\) and \(X_{4}\) and thus is the same as the conditional distribution of \(A\) given \(X_{1}, X_{3}\). We summarize as
$$ \begin{align} Y \,|\, A, X_1, X_2, X_3, X_4 &\sim Y \,|\, A, X_1, X_2, \\ A \,|\, X_1, X_2, X_3, X_4 &\sim A \,|\, X_1, X_3; \end{align} $$
that is, \(X_1, X_2\) are the relevant individual characteristics (covariates) for the true outcome regression and \(X_1, X_3\) are the relevant covariates for the true propensity score.
Underlying this situation are the assumptions of SUTVA (consistency),
$$ Y = Y^{\text{*}}(1) A + Y^{\text{*}}(0) (1-A), $$
where \(Y^{\text{*}}(a)\) is the potential outcome if an individual were to receive treatment \(a\); and no unmeasured confounders,
$$ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A \,|\, X_1, X_2, X_3, X_4. $$
Because of the way the data were generated, it can be shown that, under SUTVA and the no unmeasured confounders assumption, it suffices to consider \(X_1, X_2\) (covariates associated with outcome) or \(X_1, X_3\) (covariates associated with propensity) as confounders for which an adjustment must be made to obtain valid inference on \(\delta^\text{*}\). That is, we can show that in fact
$$ \begin{align} \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} &\perp\!\!\!\perp A \,|\, X_1, X_2, \\ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} &\perp\!\!\!\perp A \,|\, X_1, X_3. \end{align} $$
Moreover, because \(X_1, X_2, X_3\) were generated independently of each other in this scenario, it suffices to consider only \(X_1\) as a confounder for which an adjustment must be made; that is, it can be shown that
$$ \{ Y^{\text{*}}(1), Y^{\text{*}}(0) \} \perp\!\!\!\perp A \,|\, X_1. $$
Consequently, if we use a correctly specified model for \(E(Y | A, X_1)\) with either the outcome regression estimator or the doubly-robust (AIPW) estimator for \(\delta^{\text{*}}\), then we should obtain a consistent estimator for \(\delta^{\text{*}}\).
Similarly, if we use a correctly specified model for \(E(A | X_1)\) as the model for the propensity score in either of the IPW or doubly-robust (AIPW) estimators, then, again, we should obtain a consistent estimator for \(\delta^{\text{*}}\).
The incomplete models \(Q^2\) and \(\pi^2\) for the outcome regression and propensity score, respectively, both include \(X_1\) (Ch). Thus, the above implies that we should obtain consistent estimators for \(\delta^{\text{*}}\) using \(Q^2\) with the outcome regression estimator or the doubly-robust (AIPW) estimator (in the latter case even if the propensity model is misspecified, as in the case of \(\pi^1\)). The above also implies that we should obtain consistent estimators for \(\delta^{\text{*}}\) using \(\pi^2\) with either of the IPW or doubly-robust (AIPW) estimators (in the latter case even if the outcome regression model is misspecified, as in the case of \(Q^1\)). The simulation results in the table above are consistent with these observations.
Theory also suggests that using all of the covariates associated with outcome in either of the outcome regression model or the propensity score model should result in more efficient estimation of \(\delta^{\text{*}}\). That is, including \(X_1, X_2\) in the outcome regression model or the propensity score model should improve the performance of any of the outcome regression, IPW, or doubly-robust (AIPW) estimators for \(\delta^{\text{*}}\) even though \(X_2\) is not needed to adjust for confounding. Again, this theoretical result is consistent with the simulations above.
It is also known that adjusting for covariates that are related to the propensity score but not to the outcome regression actually yields estimators that are less precise asymptotically. Thus, including \(X_3\) in the propensity model, that is, using the correctly specified model \(\pi^3\) in the IPW estimator, actually results in a less precise IPW estimator than that obtained using \(\pi^2\). Again, this is borne out in the simulations above.
Note, however, that, because the logistic model is nonlinear, if \(\pi^3\) is a correctly specified model for \(E(A | X_1, X_3)\), a correctly specified model for \(E(A | X_1)\) will not have the logistic form. Thus, in the simulations, \(\pi^1\) is not a correctly specified model for the true relationship \(E(A | X_1)\) implied by the simulation scenario, so, technically, the IPW estimator using \(\pi^1\) need not be consistent. However, as seen in the simulations, there is very little effect of this misspecification on the performance of the estimator.
Finally, although not demonstrated in these simulations, the theory implies that modeling the propensity score using \(X_1, X_2\) (so including the covariates in the propensity model that are associated with response) should result in an IPW estimator that is consistent and more efficient. That is, adding \(X_2\), a covariate that is related to outcome but not to treatment assignment, to the propensity score model including the confounder \(X_1\) results in improved precision. This is shown explicitly in Lunceford and Davidian (2004).