\[ \widehat{\mathcal{V}}_{IPW}(d) = n^{-1} \sum_{i=1}^{n} \frac{\mathcal{C}_{d,i}Y_{i}}{\left\{\prod_{k=2}^{K}\pi_{d,k}(\overline{X}_{ki});\widehat{\gamma}_{k})\right\}\pi_{d,1}(X_{1i};\widehat{\gamma}_{1})}, \] where
\[ \mathcal{C}_{d,i} = \text{I}\{\overline{A}_{i} = \overline{d}(\overline{X}_{i})\} \] is the indicator of whether or not all treatments actually received coincide with the options dictated by \(d\).
For binary treatments \(\mathcal{A}_{k} = \{0,1\}\), \(k=1, \dots, K\),
\[ \pi_{d,k}(h_{k}; \gamma_{k}) = \pi_{k}(h_{k}; \gamma_{k}) ~ \text{I}\{d_{k}(h_{k}) = 1\} + \{1-\pi_{k}(h_{k}; \gamma_{k})\} ~ \text{I}\{d_{k}(h_{k}) = 0\}. \]
Recall that \(\pi_{k}(h_{k}; \gamma_{k})\) is a model for
\[ \pi_{k}(h_{k}) = P(A_{k} = 1 | H_{k} = h_{k}), \quad k = 1, \ldots, K. \]
To streamline the presentation of methods, we consider only one propensity score model for each decision point. Note that we are not demonstrating a definitive analysis that one might do in practice, in which the analyst would use all sorts of variable selection techniques, etc., to arrive at a posited model. Our objective is to illustrate how to apply the methods; we have chosen to use the data generating models, which are a class of models that are likely to be familiar to many readers and will not obfuscate the objective.
Click on each tab below to see each model and basic model diagnostic steps.
The posited model for the first decision point is the model used to generate the data
\[ \pi_{1}(h_{1};\gamma_{1}) = \frac{\exp(\gamma_{10} + \gamma_{11}~\text{CD4_0})}{1+\exp(\gamma_{10} + \gamma_{11}~\text{CD4_0})}. \]
The parameters of this model will be estimated using maximum likelihood via
p1 <- modelObj::buildModelObj(model = ~ CD4_0,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though the estimators are implemented in DynTxRegime such that the regression step is performed internally and model diagnostics can be performed post-analysis, it is convenient to do model diagnostics separately here. We will make use of modelObj::
For \(\pi_{1}(h_{1};\gamma)\) the regression can be completed as follows:
PS1 <- modelObj::fit(object = p1, data = dataMDP, response = dataMDP$A1)
PS1 <- modelObj::fitObject(object = PS1)
PS1
Call: glm(formula = YinternalY ~ CD4_0, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_0
2.356590 -0.006604
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1315
Residual Deviance: 1228 AIC: 1232
where for convenience we have made use of modelObj’s
Though we know this model to be correctly specified, let’s examine the regression results.
summary(object = PS1)
Call:
glm(formula = YinternalY ~ CD4_0, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9010 -0.9414 -0.7107 1.2178 2.1054
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3565904 0.3353980 7.026 2.12e-12 ***
CD4_0 -0.0066038 0.0007588 -8.703 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1314.7 on 999 degrees of freedom
Residual deviance: 1227.6 on 998 degrees of freedom
AIC: 1231.6
Number of Fisher Scoring iterations: 4
In comparing the null deviance (1314.7) and the residual deviance (1227.6), we see that including the independent variable, \(\text{CD4_0}\), reduces the deviance relative to the simpler constant propensity score model. This result is consistent with the fact that it is the model used to generate our data.
The posited model for the second decision point is the model used to generate the data
\[ \pi_{2}(h_{2};\gamma_{2}) = \frac{\exp(\gamma_{20} + \gamma_{21}~\text{CD4_6})}{1+\exp(\gamma_{20} + \gamma_{21}~\text{CD4_6})}. \]
The parameters of this model will be estimated using maximum likelihood via
p2 <- modelObj::buildModelObj(model = ~ CD4_6,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though the estimators are implemented in DynTxRegime such that the regression step is performed internally and model diagnostics can be performed post-analysis, it is convenient to do model diagnostics separately here. We will make use of modelObj::
For \(\pi_{2}(h_{2};\gamma_{2})\) the regression can be completed as follows:
PS2 <- modelObj::fit(object = p2, data = dataMDP, response = dataMDP$A2)
PS2 <- modelObj::fitObject(object = PS2)
PS2
Call: glm(formula = YinternalY ~ CD4_6, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_6
0.818685 -0.004294
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 951.8
Residual Deviance: 911.4 AIC: 915.4
where for convenience we have made use of modelObj’s
Though we know this model to be correctly specified, let’s examine the regression results.
summary(object = PS2)
Call:
glm(formula = YinternalY ~ CD4_6, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2416 -0.6751 -0.5610 -0.4157 2.2830
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8186852 0.3733067 2.193 0.0283 *
CD4_6 -0.0042941 0.0006989 -6.144 8.05e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 951.82 on 999 degrees of freedom
Residual deviance: 911.44 on 998 degrees of freedom
AIC: 915.44
Number of Fisher Scoring iterations: 4
In comparing the null deviance (951.8) and the residual deviance (911.4), we see that including the independent variable, \(\text{CD4_6}\) reduces the deviance relative to the simpler constant propensity score model. This result is consistent with the fact that it is the model used to generate our data.
The posited model for the final decision point is the model used to generate the data
\[ \pi_{3}(h_{3};\gamma_{3}) = \frac{\exp(\gamma_{30} + \gamma_{31}~\text{CD4_12})}{1+\exp(\gamma_{30} + \gamma_{31}~\text{CD4_12})}. \]
The parameters of this model will be estimated using maximum likelihood via
p3 <- modelObj::buildModelObj(model = ~ CD4_12,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
Though the estimators are implemented in DynTxRegime such that the regression step is performed internally and model diagnostics can be performed post-analysis, it is convenient to do model diagnostics separately here. We will make use of modelObj::
For \(\pi_{3}(h_{3};\gamma_{3})\) the regression can be completed as follows:
PS3 <- modelObj::fit(object = p3, data = dataMDP, response = dataMDP$A3)
PS3 <- modelObj::fitObject(object = PS3)
PS3
Call: glm(formula = YinternalY ~ CD4_12, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_12
1.411752 -0.004945
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1252
Residual Deviance: 1203 AIC: 1207
where for convenience we have made use of modelObj’s
Though we know this model to be correctly specified, let’s examine the regression results.
summary(object = PS3)
Call:
glm(formula = YinternalY ~ CD4_12, family = "binomial", data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4176 -0.9026 -0.7373 1.2970 2.0555
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.411752 0.324038 4.357 1.32e-05 ***
CD4_12 -0.004945 0.000734 -6.736 1.62e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1252.2 on 999 degrees of freedom
Residual deviance: 1203.0 on 998 degrees of freedom
AIC: 1207
Number of Fisher Scoring iterations: 4
In comparing the null deviance (1252.2) and the residual deviance (1203), we see that including the independent variable, \(\text{CD4_12}\), reduces the deviance relative to the simpler constant propensity score model. This result is consistent with the fact that it is the model used to generate our data.
For the inverse probability weighted estimator of the value of fixed regime \(d\) for which \(\pi_{k}(h_{k})\) are modeled using a logistic regression model and the parameters of which are estimated using maximum likelihood, the \((1,1)\) component of the sandwich estimator for the variance is defined as
\[ \widehat{\Sigma}_{11} = A_{n} + (C_{n}^{\intercal}~D_{n}^{-1})~B_{n}~(C_{n}^{\intercal}~D_{n}^{-1})^{\intercal} , \]
where \(A_{n} = n^{-1} \sum_{i=1}^{n} A_{ni} A_{ni}\) is the value estimator component
\[ A_{ni} = \frac{C_{d,i}~Y_i}{\left\{\prod_{k=2}^{K}\pi_{d,k}(\overline{X}_{ki});\widehat{\gamma}_{k})\right\}\pi_{d,1}(X_{1i};\widehat{\gamma}_{1})} - \widehat{\mathcal{V}}_{IPW}(d). \]
And, the components arising from the maximum likehood estimation of each of the \(k = 1, \dots, K\) logistic regression models are \[ B_{n} = \left( \begin{array}{cc} B_{n1} & \cdots & 0\\ 0 & \ddots & 0\\ 0 & \cdots & B_{nK} \end{array} \right), \] where \(B_{nk} = n^{-1} \sum_{i=1}^{n} B_{nki} B_{nki}^{\intercal}\), \(k=1,\dots,K\) and
\[ B_{nki} = \frac{ A_{ki} - \pi_{k}(\overline{X}_{ki};\widehat{\gamma}_{k})}{\pi_{k}(\overline{X}_{ki}; \widehat{\gamma}_{k})\{1-\pi_{k}(\overline{X}_{ki}; \widehat{\gamma}_{k})\}} \frac{\partial \pi_{k}(\overline{X}_{ki}; \widehat{\gamma}_{k})}{\partial \gamma_{k}}. \]
Terms \(C_{n}\) and \(D_{n}\) are the derivatives with respect to the parameters of the propensity score models:
\[ C_{n} = n^{-1}\sum_{i=1}^{n}\frac{\partial A_{ni}}{\partial \gamma} \quad \text{and} \quad D_{n} = n^{-1}\sum_{i=1}^{n} \frac{\partial B_{ni}}{\partial \gamma^{\intercal}}. \]
\(\gamma = (\gamma_{1}^{\intercal}, \dots, \gamma_{K}^{\intercal})^{\intercal}\).
As was done in Chapters 2 and 3, we break the implementation of the value estimator and the sandwich estimator of the variance into multiple functions, each function calculating a specific component. In addition, we reuse function
This implementation is not the most efficient implementation of the variance estimator. For example, the regression step is completed twice (once in
#------------------------------------------------------------------------------#
# IPW estimator for value of a fixed tx regime with binary tx at each of the
# K>1 decision points
#
# ASSUMPTIONS
# each of the tx is denoted by Ak k=(1,...,K) and is binary coded as {0,1}
#
# INPUTS
# moPS : list of modeling objects for propensity score regressions
# must be logistic models; kth element corresponds to kth decision
# point
# data : data.frame containing all covariates and txs
# txs must be named 'Ak' (k=1,...,K) and coded 0/1
# y : outcome of interest
# regime : matrix of 0/1 containing recommended txs
# kth column corresponds to kth decision point
#
# RETURNS
# a list containing
# fitPS : list of value objects returned by propensity score regression
# valueHat : estimated value
#------------------------------------------------------------------------------#
value_IPW_md <- function(moPS, data, y, regime) {
# number of decision points
K <- length(x = moPS)
# number of patients in data set
n <- nrow(x = data)
# tx names in data set
nms <- paste0('A',1L:K)
#### Propensity Score Regression
fitPS <- list()
pid <- rep(x = 1.0, times = n)
for (k in 1L:K) {
# fit propensity score
fitPS[[ k ]] <- modelObj::fit(object = moPS[[ k ]],
data = data,
response = data[,nms[k]])
# product of estimated propensity scores
pk <- modelObj::predict(object = fitPS[[ k ]], newdata = data)
if (is.matrix(x = pk)) pk <- pk[,ncol(x = pk), drop = TRUE]
pid <- pid * {pk*{regime[,k] == 1L} + {1.0 - pk}*{regime[,k] == 0L}}
}
#### Value Estimate
# indicator of tx received was that dictated by the regime
Cd <- rowSums(x = regime == data[,nms]) == K
value <- mean(x = Cd * y / pid)
return( list( "fitPS" = fitPS,
"valueHat" = value) )
}
#------------------------------------------------------------------------------#
# standard error for the IPW estimator of the value of a fixed regime
# with K > 1 decision points estiamted using the sandwich estimator of variance
#
# ASSUMPTIONS
# all K propensity score models are logistic models
# txs are denoted by Ak (k=1,...,K) and are binary coded as {0,1}
#
# INPUTS
# moPS : list of modeling objects for propensity score regressions
# must be logistic models; kth element corresponds to kth decision
# point
# data : data.frame containing all covariates and txs
# txs must be named 'Ak' (k=1,...,K) and coded 0/1
# y : outcome of interest
# regime : matrix of 0/1 containing recommended txs
# kth column corresponds to kth decision point
#
# RETURNS
# a list containing
# fitPS : list of value objects returned by propensity score regression
# valueHat : estimated value
# sigmaHat : estimated standard error
#------------------------------------------------------------------------------#
value_IPW_se_md <- function(moPS, data, y, regime) {
# number of decision points
K <- length(x = moPS)
# number of patients in data set
n <- nrow(x = data)
# tx names in data set
nms <- paste0('A',1L:K)
#### ML components
invdM <- matrix(data = 0.0, nrow = 0L, ncol = 0L)
MM <- matrix(data = 0.0, nrow = 0L, ncol = 0L)
for (k in 1L:K) {
ML <- swv_ML(mo = moPS[[ k ]], data = data, y = data[,nms[k]])
invdM <- rbind(cbind(invdM,
matrix(data = 0.0,
nrow = nrow(x = invdM),
ncol = ncol(x = ML$invdM))),
cbind(matrix(data = 0.0,
nrow = nrow(x = ML$invdM),
ncol = ncol(x = invdM)),
ML$invdM))
MM <- rbind(cbind(MM,
matrix(data = 0.0,
nrow = nrow(x = MM),
ncol = ncol(x = ML$MM))),
cbind(matrix(data = 0.0,
nrow = nrow(x = ML$MM),
ncol = ncol(x = MM)),
ML$MM))
}
#### IPW value components
IPW <- value_IPW_swv_md(moPS = moPS, data = data, y = y, regime = regime)
#### variance estimator
temp <- IPW$dMdG %*% invdM
sig11 <- drop(x = IPW$MM + temp %*% MM %*% t(x = temp))
sig11 <- sqrt(x = sig11 / n)
return( c(IPW$value, "sigmaHat" = sig11) )
}
#------------------------------------------------------------------------------#
# component of sandwich estimator for IPW estimator of value of a fixed regime
# with K > 1
#
# ASSUMPTIONS
# all K propensity score models are logistic models
# txs are denoted by Ak (k=1,...,K) and are binary coded as {0,1}
#
# INPUTS
# moPS : list of modeling objects for propensity score regressions
# must be logistic models; kth element corresponds to kth decision
# point
# data : data.frame containing all covariates and txs
# txs must be named 'Ak' (k=1,...,K) and coded 0/1
# y : outcome of interest
# regime : matrix of 0/1 containing recommended txs
# kth column corresponds to kth decision point
#
# RETURNS
# a list containing
# value : list returned by value_IPW()
# MM : M M^T matrix
# dMdG : matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
value_IPW_swv_md <- function(moPS, data, y, regime) {
# estimated value
value <- value_IPW_md(moPS = moPS, data = data, y = y, regime = regime)
# number of decision points
K <- length(x = moPS)
# number of patients in data set
n <- nrow(x = data)
# tx names in data set
nms <- paste0('A',1L:K)
piMM <- list()
pid <- rep(x = 1.0, times = n)
for (k in 1L:K) {
# pi(x_k; gamma_k)
pk <- modelObj::predict(object = value$fitPS[[ k ]], newdata = data)
if (is.matrix(x = pk)) pk <- pk[,ncol(x = pk), drop = TRUE]
# propensity for receiving recommended tx
pid <- pid * {pk*{regime[,k] == 1L} + {1.0 - pk}*{regime[,k] == 0L}}
# derivative
piMM[[ k ]] <- stats::model.matrix(object = modelObj::model(object = moPS[[ k ]]),
data = data)*{-regime[,k] + pk}
}
# indicator of tx received was that dictated by the regime
Cd <- as.integer(x = {rowSums(x = regime == data[,nms]) == K})
# value component of M MT
mmValue <- mean(x = {Cd * y / pid - value$valueHat}^2)
#### derivative w.r.t. gamma
dMdG <- NULL
for (k in 1L:K) {
dMdG <- c(dMdG, colMeans(x = Cd * y / pid * piMM[[ k ]]))
}
return( list("value" = value,
"MM" = mmValue,
"dMdG" = dMdG) )
}
#------------------------------------------------------------------------------#
# components of the sandwich estimator for a maximum likelihood estimation of
# a logistic regression model
#
# ASSUMPTIONS
# mo is a logistic model with parameters to be estimated using ML
#
# INPUTS
# mo : a modeling object specifying a regression step
# *** must be a logistic model ***
# data : a data.frame containing covariates and tx
# y : a vector containing the binary outcome of interest
#
#
# RETURNS
# a list containing
# MM : M^T M component from ML
# invdM : inverse of the matrix of derivatives of M wrt gamma
#------------------------------------------------------------------------------#
swv_ML <- function(mo, data, y) {
# regression step
fit <- modelObj::fit(object = mo, data = data, response = y)
# yHat
p1 <- modelObj::predict(object = fit, newdata = data)
if (is.matrix(x = p1)) p1 <- p1[,ncol(x = p1), drop = TRUE]
# model.matrix for model
piMM <- stats::model.matrix(object = modelObj::model(object = mo),
data = data)
n <- nrow(x = piMM)
p <- ncol(x = piMM)
# ML M-estimator component
mMLi <- {y - p1} * piMM
# ML component of M MT
mmML <- sapply(X = 1L:n,
FUN = function(i){ mMLi[i,] %o% mMLi[i,] },
simplify = "array")
# derivative of ML component
dFun <- function(i){
- piMM[i,] %o% piMM[i,] * p1[i] * {1.0 - p1[i]}
}
dmML <- sapply(X = 1L:n, FUN = dFun, simplify = "array")
if( p == 1L ) {
mmML <- mean(x = mmML)
dmML <- mean(x = dmML)
} else {
mmML <- rowMeans(x = mmML, dim = 2L)
dmML <- rowMeans(x = dmML, dim = 2L)
}
# invert derivative ML component
invD <- solve(a = dmML)
return( list("MM" = mmML, "invdM" = invD) )
}
With this implementation, it is straightforward to estimate the value and the standard error when logistic regression models are posited for each of the \(K\) decision points. The propensity score models considered are described and reviewed under tab \(\pi_{k}(h_{k};\gamma_{k})\).
As was done for the single decision point estimator, we compare the estimated value under a variety of fixed regimes. Each tab below illustrates how to define the fixed regime under consideration and complete an analysis using the posited propensity score models.
First, we consider static regime \(d = \{d_{1}(x_{1}) \equiv 1, d_{2}(\overline{x}_{2}) \equiv 1, d_{3}(\overline{x}_{3}) \equiv 1\}\). That is, all individuals are recommended to receive treatment \(A_{k} = 1\), \(k = 1, 2, 3\).
The estimated value of this fixed regime and its standard error are obtained as follows:
p1 <- modelObj::buildModelObj(model = ~ CD4_0,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p2 <- modelObj::buildModelObj(model = ~ CD4_6,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p3 <- modelObj::buildModelObj(model = ~ CD4_12,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
regime <- matrix(data = 1L, nrow = nrow(x = dataMDP), ncol = 3L)
value_IPW_se_md(moPS = list(p1, p2, p3),
data = dataMDP,
y = dataMDP$Y,
regime = regime)
$fitPS
$fitPS[[1]]
Call: glm(formula = YinternalY ~ CD4_0, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_0
2.356590 -0.006604
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1315
Residual Deviance: 1228 AIC: 1232
$fitPS[[2]]
Call: glm(formula = YinternalY ~ CD4_6, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_6
0.818685 -0.004294
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 951.8
Residual Deviance: 911.4 AIC: 915.4
$fitPS[[3]]
Call: glm(formula = YinternalY ~ CD4_12, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_12
1.411752 -0.004945
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1252
Residual Deviance: 1203 AIC: 1207
$valueHat
[1] 160.1139
$sigmaHat
[1] 48.43737
We see that the inverse probability weighted estimator of the value under \(d = \{d_{1}(x_{1}) \equiv 1, d_{2}(\overline{x}_{2}\} \equiv 1, d_{3}(\overline{x}_{3}) \equiv 1)\) is 160.11 (48.44) cells/mm\(^3\) for this data set. Note that only 28 individuals in our data set follow this regime and thus contribute to the estimated value, which represents a very small percentage of the data.
We have carried out a simulation study to evaluate the performance of the presented method. We generated 2 Monte Carlo data sets, each with \(n=1000\). The Monte Carlo average estimated value and standard deviation of the estimated value are 13.99 (221.38). In addition, the Monte Carlo average of the standard errors is 145.48.
Next, we consider static regime \(d = \{d_{1}(x_{1}) \equiv 0, d_{2}(\overline{x}_{2}) \equiv 0, d_{3}(\overline{x}_{3}) \equiv 0\}\). That is, all individuals are recommended to receive treatment \(A_{k} = 0\), \(k = 1, 2, 3\).
The estimated value of this fixed regime and its standard error are obtained as follows:
p1 <- modelObj::buildModelObj(model = ~ CD4_0,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p2 <- modelObj::buildModelObj(model = ~ CD4_6,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p3 <- modelObj::buildModelObj(model = ~ CD4_12,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
regime <- matrix(data = 0L, nrow = nrow(x = dataMDP), ncol = 3L)
value_IPW_se_md(moPS = list(p1, p2, p3),
data = dataMDP,
y = dataMDP$Y,
regime = regime)
$fitPS
$fitPS[[1]]
Call: glm(formula = YinternalY ~ CD4_0, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_0
2.356590 -0.006604
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1315
Residual Deviance: 1228 AIC: 1232
$fitPS[[2]]
Call: glm(formula = YinternalY ~ CD4_6, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_6
0.818685 -0.004294
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 951.8
Residual Deviance: 911.4 AIC: 915.4
$fitPS[[3]]
Call: glm(formula = YinternalY ~ CD4_12, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_12
1.411752 -0.004945
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1252
Residual Deviance: 1203 AIC: 1207
$valueHat
[1] 1095.729
$sigmaHat
[1] 60.60388
We see that the inverse probability weighted estimator of the value under \(d = \{d_{1}(x_{1}) \equiv 0, d_{2}(\overline{x}_{2}) \equiv 0, d_{3}(\overline{x}_{3}) \equiv 0\}\) is 1095.73 (60.60) cells/mm\(^3\). In the current data set, 376 individuals follow this regime; a small percentage of the total data, but significantly more than regime \(d = \{d_{1}(x_{1}) \equiv 1, d_{2}(\overline{x}_{2}) \equiv 1, d_{3}(\overline{x}_{3}) \equiv 1\}\).
We have carried out a simulation study to evaluate the performance of the presented method. We generated 2 Monte Carlo data sets, each with \(n=1000\). The Monte Carlo average estimated value and standard deviation of the estimated value are 1106.47 (28.75). In addition, the Monte Carlo average of the standard errors is 60.65.
The final fixed regime we consider is a covariate dependent regime \(d = \{d_{1}(x_{1}) = \text{I}(\text{CD4_0} < 250 ~\text{cells}/\text{mm}^{3}),\)\(d_{2}(\overline{x}_{2}) = \text{I}(\text{CD4_6} < 360~\text{cells}/\text{mm}^{3}),\)\(d_{3}(\overline{x}_{3}) = \text{I}(\text{CD4_12} < 300~\text{cells}/\text{mm}^{3})\}\).
The estimated value of this fixed regime and its standard error are obtained as follows:
p1 <- modelObj::buildModelObj(model = ~ CD4_0,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p2 <- modelObj::buildModelObj(model = ~ CD4_6,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
p3 <- modelObj::buildModelObj(model = ~ CD4_12,
solver.method = 'glm',
solver.args = list(family='binomial'),
predict.method = 'predict.glm',
predict.args = list(type='response'))
regime <- cbind(as.integer(x = {dataMDP$CD4_0 < 250.0}),
as.integer(x = {dataMDP$CD4_6 < 360.0}),
as.integer(x = {dataMDP$CD4_12 < 300.0}))
value_IPW_se_md(moPS = list(p1, p2, p3),
data = dataMDP,
y = dataMDP$Y,
regime = regime)
$fitPS
$fitPS[[1]]
Call: glm(formula = YinternalY ~ CD4_0, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_0
2.356590 -0.006604
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1315
Residual Deviance: 1228 AIC: 1232
$fitPS[[2]]
Call: glm(formula = YinternalY ~ CD4_6, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_6
0.818685 -0.004294
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 951.8
Residual Deviance: 911.4 AIC: 915.4
$fitPS[[3]]
Call: glm(formula = YinternalY ~ CD4_12, family = "binomial", data = data)
Coefficients:
(Intercept) CD4_12
1.411752 -0.004945
Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
Null Deviance: 1252
Residual Deviance: 1203 AIC: 1207
$valueHat
[1] 1118.004
$sigmaHat
[1] 61.41719
We see that the IPW estimator of the value under \(d = \{d_{1}(x_{1}) = \text{I}(\text{CD4_0} < 250 ~\text{cells}/\text{mm}^{3}),\)\(d_{2}(\overline{x}_{2}) = \text{I}(\text{CD4_6} < 360~\text{cells}/\text{mm}^{3}),\)\(d_{3}(\overline{x}_{3}) = \text{I}(\text{CD4_12} < 300~\text{cells}/\text{mm}^{3})\}\) is 1118.00 (61.42) cells/mm\(^3\). As we saw in the second example, 379 individuals follow this regime.
We have carried out a simulation study to evaluate the performance of the presented method. We generated 2 Monte Carlo data sets, each with \(n=1000\). The Monte Carlo average estimated value and standard deviation of the estimated value are 1060.69 (41.98). In addition, the Monte Carlo average of the standard errors is 62.22.