Chapter 5 Data

Description

We compare the estimators introduced in Chapter 5 using a simulated data set based on the \(K=3\) scenario in Zhang et al. (2013). Specifically, the scenario mimics a multiple decision point study in which HIV-infected patients receive one of two treatment options at each decision point, coded as \(\{0,1\}\). The discussion presented in Chapter 5 of the book made several simplifying assumptions, including that there are only two treatment options at each decision point and that both are feasible for all individuals regardless of their histories. A more complex design will be considered in subsequent chapters.

The data set depicts a three-decision-point study. The primary end point of interest is a continuous variable, the 18-month CD4 count, for which larger values are preferred.

The data set comprises one thousand patients (\(n=1000\)). The baseline patient characteristic is the baseline CD4 count \(\text{CD4_0}\) (cells/mm^3). At Decision 1, each patient received one of two treatments (\(A_{1} \in \{0,1\}\)). Six months after the first treatment decision, the CD4 count was reassessed: covariate \(\text{CD4_6}\) (cells/mm^3) was measured and one of two treatments (\(A_{2} \in \{0,1\}\)) was assigned. Six months later, 12 months after the first treatment, the patient’s CD4 count was remeasured, \(\text{CD4_12}\) (cells/mm^3), and a final treatment decision was made (\(A_{3} \in \{0,1\}\)). Finally, 18 months after the initial treatment, the patient’s CD4 count was measured once more; this is the outcome of interest, \(Y\). The observed data are independent and identically distributed

\[ \{\text{CD4_0}_{i}, A_{1i}, \text{CD4_6}_{i}, A_{2i}, \text{CD4_12}_{i}, A_{3i}, Y_i\},~ \text{for}~ i = 1, \dots, n. \]

The primary outcome of interest is \(Y\), where larger values are considered better. We assume that all participants adhered to the treatment plan to which they were assigned and that no participants dropped out of the study.

For the simulation scenario used to generate the data, the true value under the optimal regime is \(\mathcal{V}(d^{opt}) = 1120\) cells/mm\(^3\) and the optimal regime is

\[ \begin{align} d^{opt}_{1}(h_{1}) &= \text{I} (\text{CD4_0} < 250 ~ \text{cells/mm}^3) \\ d^{opt}_{2}(h_{2}) &= \text{I} (\text{CD4_6} < 360 ~ \text{cells/mm}^3) \\ d^{opt}_{3}(h_{3}) &= \text{I} (\text{CD4_12} < 300 ~ \text{cells/mm}^3) \end{align} \]
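The three rules above can be written as a small R helper. This is a minimal sketch: the function name `dOpt` and the input values are hypothetical and chosen only for illustration; the thresholds are those stated in the regime above.

```r
# Hypothetical helper: recommended treatments under the optimal regime,
# one column per decision point (1 = treatment 1, 0 = treatment 0)
dOpt <- function(CD4_0, CD4_6, CD4_12) {
  data.frame(d1 = as.integer(CD4_0 < 250.0),
             d2 = as.integer(CD4_6 < 360.0),
             d3 = as.integer(CD4_12 < 300.0))
}

# Illustrative CD4 counts (cells/mm^3); not taken from the data set
rec <- dOpt(CD4_0 = c(200.0, 450.0),
            CD4_6 = c(350.0, 560.0),
            CD4_12 = c(310.0, 450.0))
print(rec)
```

Each rule depends only on the most recent CD4 count, so the helper needs no treatment history as input.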

This scenario implies the following true outcome regression relationship

\[ \begin{align} Q_{3}(h_{3},a_{3}) =& 400 + 1.6~\text{CD4_0} \\ &- |500 - 2~\text{CD4_0}| \{a_{1} - \text{I}(250 - \text{CD4_0} > 0.0)\}^2 \\ &- |720 - 2~\text{CD4_6}|\{a_{2} - \text{I}(360 - \text{CD4_6} > 0.0)\}^2 \\ &- |600 - 2~\text{CD4_12}|\{a_{3} - \text{I}(300 - \text{CD4_12} > 0.0)\}^2 \end{align} \]

and propensity models

\[ \pi_1(h_{1}) = \frac{\exp(2 - 0.006~\text{CD4_0})}{1+\exp(2 - 0.006~\text{CD4_0})}, \]

\[ \pi_2(h_{2}) = \frac{\exp(0.8 - 0.004~\text{CD4_6})}{1+\exp(0.8 - 0.004~\text{CD4_6})}, \]

and

\[ \pi_3(h_{3}) = \frac{\exp(1.0 - 0.004~\text{CD4_12})}{1+\exp(1.0 - 0.004~\text{CD4_12})}. \]
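Each propensity model is a logistic function of the most recent CD4 count, so it can be evaluated with `stats::plogis()`, which computes \(e^{u}/(1+e^{u})\). The sketch below assumes hypothetical function names `pi1`, `pi2`, and `pi3` and illustrative CD4 values; it also checks the closed form against `plogis()`.

```r
# Explicit logistic (expit) function, for comparison with stats::plogis()
expit <- function(u) exp(u) / (1.0 + exp(u))

# Hypothetical helpers: probability of receiving treatment 1 at each decision
pi1 <- function(CD4_0)  stats::plogis(q = 2.0 - 0.006 * CD4_0)
pi2 <- function(CD4_6)  stats::plogis(q = 0.8 - 0.004 * CD4_6)
pi3 <- function(CD4_12) stats::plogis(q = 1.0 - 0.004 * CD4_12)

# Illustrative CD4 counts (cells/mm^3); not taken from the data set
cd4 <- c(200.0, 450.0, 700.0)
p1 <- pi1(CD4_0 = cd4)
p2 <- pi2(CD4_6 = cd4)
p3 <- pi3(CD4_12 = cd4)
print(cbind(cd4, p1, p2, p3))

# The closed form and plogis() agree up to floating-point error
sameAsExpit <- isTRUE(all.equal(p1, expit(2.0 - 0.006 * cd4)))
```

Because every slope is negative, the probability of receiving treatment 1 decreases as the CD4 count increases.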

The data generating models and the R implementation are provided under the Simulation Scenario heading in the sidebar.
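The outcome regression \(Q_{3}\) given above can be evaluated directly. The following is a minimal sketch; the function name `Q3` and the covariate and treatment values are assumptions made for illustration. It compares the mean outcome for one treatment sequence with the mean outcome when the first treatment instead follows the optimal rule.

```r
# Hypothetical helper: true mean outcome given the full history and a3.
# The comparison (CD4_0 < 250.0) is the indicator I(250 - CD4_0 > 0).
Q3 <- function(CD4_0, CD4_6, CD4_12, a1, a2, a3) {
  400.0 + 1.6 * CD4_0 -
    abs(500.0 - 2.0 * CD4_0) * (a1 - (CD4_0 < 250.0))^2 -
    abs(720.0 - 2.0 * CD4_6) * (a2 - (CD4_6 < 360.0))^2 -
    abs(600.0 - 2.0 * CD4_12) * (a3 - (CD4_12 < 300.0))^2
}

# Illustrative patient: a1 = 1 disagrees with the rule I(CD4_0 < 250) = 0
qObs <- Q3(CD4_0 = 329.3, CD4_6 = 426.0, CD4_12 = 337.7,
           a1 = 1, a2 = 0, a3 = 0)

# Same patient with the first treatment switched to the optimal choice
qOpt <- Q3(CD4_0 = 329.3, CD4_6 = 426.0, CD4_12 = 337.7,
           a1 = 0, a2 = 0, a3 = 0)

print(c(observed = qObs, optimal = qOpt))
```

The difference between the two values is the first-stage penalty \(|500 - 2 \times 329.3| = 158.6\) cells/mm\(^3\).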

Download Data Set

R Environment

Once downloaded, the data set can be loaded into R using

dataMDP <- utils::read.csv(file = 'path_to_file/dataMDP.txt', header = TRUE)

where ‘path_to_file’ is the full path to the downloaded data set file.

Examine the first few records of the data set

utils::head(x = dataMDP)

  CD4_0 A1 CD4_6 A2 CD4_12 A3    Y
1 329.3  1 426.0  0  337.7  0  686
2 477.7  0 586.2  0  473.1  0 1056
3 558.4  0 692.3  0  556.3  0 1239
4 215.4  1 264.8  1  226.6  0  516
5 492.9  0 613.6  0  503.6  0 1281
6 500.6  0 622.7  0  495.3  1  954

to ensure that the data set contains the expected covariates.
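The same check can be made programmatically. The sketch below is self-contained, so it builds a small stand-in for `dataMDP` from the first two records printed above; with the downloaded file, the stand-in would be replaced by the object returned from `utils::read.csv()`. The names `expected`, `missingCols`, and `treatmentsBinary` are hypothetical.

```r
# Stand-in for the downloaded data: the first two records shown above
dataMDP <- data.frame(CD4_0 = c(329.3, 477.7), A1 = c(1L, 0L),
                      CD4_6 = c(426.0, 586.2), A2 = c(0L, 0L),
                      CD4_12 = c(337.7, 473.1), A3 = c(0L, 0L),
                      Y = c(686, 1056))

# Column names expected from the data description
expected <- c("CD4_0", "A1", "CD4_6", "A2", "CD4_12", "A3", "Y")
missingCols <- setdiff(x = expected, y = names(dataMDP))

# Treatments should be coded as 0/1 at every decision point
treatmentsBinary <- all(unlist(dataMDP[, c("A1", "A2", "A3")]) %in% c(0L, 1L))

# Stop with an error if a column is missing or a treatment code is unexpected
stopifnot(length(missingCols) == 0L, treatmentsBinary)
```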

Simulation Scenario

The data for this simulated trial were generated from the following models:

set.seed(seed = 1234L)
n <- 1000L

Baseline CD4 Count \(\text{CD4_0}\)

\[ \text{CD4_0} \sim \mathcal{N}(\mu=450, \sigma^2=100^2). \]

  CD4_0 <- round(stats::rnorm(n = n, mean = 450.0, sd = 100.0), digits = 1L)

First Stage Treatment Received

\[ A_{1} \sim B\{n=1, p= \pi_{1}(h_{1})\} \]

where

\[ \pi_{1}(h_{1}) = \text{expit}(2 - 0.006~\text{CD4_0}) \]

and \(\text{expit}(u) = e^{u}/(1+e^{u})\).

  xb <- 2.0 - 0.006*CD4_0
  probA1 <- exp(x = xb) / {1.0 + exp(x = xb)}
  A1 <- stats::rbinom(n = n, size = 1L, prob = probA1)

Six-Month CD4 Count \(\text{CD4_6}\)

\[ \text{CD4_6} \sim \mathcal{N}(\mu=1.25~\text{CD4_0}, \sigma^2=8^2). \]

  CD4_6 <- round(stats::rnorm(n = n, mean = 1.25*CD4_0, sd = 8.0), digits = 1L)

Second Stage Treatment Received

\[ A_{2} \sim B\{n=1, p= \pi_2(h_{2})\} \]

where

\[ \pi_{2}(h_{2}) = \text{expit}(0.8 - 0.004~\text{CD4_6}) \]

and \(\text{expit}(u) = e^{u}/(1+e^{u})\).

  xb <- 0.8 - 0.004*CD4_6
  probA2 <- exp(x = xb) / {1.0 + exp(x = xb)}
  A2 <- stats::rbinom(n = n, size = 1L, prob = probA2)

Twelve-Month CD4 Count \(\text{CD4_12}\)

\[ \text{CD4_12} \sim \mathcal{N}(\mu=0.8~\text{CD4_6}, \sigma^2=8^2). \]

  CD4_12 <- round(stats::rnorm(n = n, mean = 0.8*CD4_6, sd = 8.0), digits = 1L)

Third Stage Treatment Received

\[ A_{3} \sim B\{n=1, p= \pi_{3}(h_{3})\} \]

where

\[ \pi_{3}(h_{3}) = \text{expit}(1.0 - 0.004~\text{CD4_12}) \]

and \(\text{expit}(u) = e^{u}/(1+e^{u})\).

  xb <- 1.0 - 0.004*CD4_12
  probA3 <- exp(x = xb) / {1.0 + exp(x = xb)}
  A3 <- stats::rbinom(n = n, size = 1L, prob = probA3)

Outcome of Interest, Eighteen-Month CD4 Count

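\[ Y \sim \mathcal{N}(\mu, \sigma^2=60^2), \]

where \(Y\) is drawn by inverse-CDF sampling between lower and upper tail probabilities (computed in the code below from the points 0 and 2000), then rounded to the nearest integer, and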

\[ \begin{align} \mu = & 400 + 1.6~\text{CD4_0} \\ & - |500 - 2~\text{CD4_0}|\{A_{1} - \text{I}(250-\text{CD4_0}>0)\}^2 \\ & - |720 - 2~\text{CD4_6}|\{A_{2} - \text{I}(360-\text{CD4_6}>0)\}^2 \\ & - |600 - 2~\text{CD4_12}|\{A_{3} - \text{I}(300-\text{CD4_12}>0)\}^2. \end{align} \]

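  # Mean outcome: each penalty term is zero when the treatment received
  # agrees with the corresponding optimal rule
  mu <- 400.0 + 1.6*CD4_0 - 
        abs(x = {500.0 - 2.0*CD4_0})*{A1 - {250.0 - CD4_0 > 0.0}}^2 - 
        abs(x = {720.0 - 2.0*CD4_6})*{A2 - {360.0 - CD4_6 > 0.0}}^2 - 
        abs(x = {600.0 - 2.0*CD4_12})*{A3 - {300.0 - CD4_12 > 0.0}}^2

  # Lower tail probability at 0; min() returns a single value shared by all
  # patients, so the lower point is not enforced patient by patient
  pY_L <- min(stats::pnorm(q = 0, mean = mu, sd = 60), 0.999)
  # Upper tail probability at 2000, computed separately for each patient
  pY_U <- stats::pnorm(q = 2000, mean = mu, sd = 60) 
  # Inverse-CDF sampling: uniform draw on (pY_L, pY_U) mapped through qnorm()
  prob <- stats::runif(n = n, min = pY_L, max = pY_U) 
  Y <- stats::qnorm(p = prob, mean = mu, sd = 60)
  # Round to whole cells/mm^3
  Y <- round(x = Y, digits = 0L)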
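For comparison, inverse-CDF sampling from a normal distribution with bounds applied separately to each observation can be sketched with `pmin()`, which works elementwise, whereas `min()` returns a single value. This is an illustrative variant under assumed truncation points and made-up means; it is not the code that generated the data set and would not reproduce it.

```r
set.seed(seed = 42L)  # arbitrary seed, used only for this illustration

mu <- c(-50.0, 300.0, 900.0, 1600.0)  # illustrative means, not from the data
lower <- 0.0                          # assumed lower truncation point
upper <- 2000.0                       # assumed upper truncation point

# Tail probabilities at the truncation points, one per observation;
# the 0.999 cap keeps the sampling interval from collapsing
pL <- pmin(stats::pnorm(q = lower, mean = mu, sd = 60.0), 0.999)
pU <- stats::pnorm(q = upper, mean = mu, sd = 60.0)

# Uniform draw between the two tail probabilities, mapped back through qnorm()
u <- stats::runif(n = length(mu), min = pL, max = pU)
Y <- stats::qnorm(p = u, mean = mu, sd = 60.0)
print(Y)
```

With these inputs every draw lies between the two truncation points, including the draw whose mean is negative.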

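From the expression for \(\mu\), we see that the true optimal treatment regime is

\[ \begin{align} d^{opt}_{1}(h_{1}) &= \text{I} (\text{CD4_0} < 250 ~ \text{cells/mm}^3) \\ d^{opt}_{2}(h_{2}) &= \text{I} (\text{CD4_6} < 360 ~ \text{cells/mm}^3) \\ d^{opt}_{3}(h_{3}) &= \text{I} (\text{CD4_12} < 300 ~ \text{cells/mm}^3) \end{align} \]

because each of the three penalty terms in \(\mu\) is nonnegative and equals zero when the treatment received agrees with the corresponding rule. Under this regime \(\mu = 400 + 1.6~\text{CD4_0}\), and because the mean of \(\text{CD4_0}\) is 450, \(\mathcal{V}(d^{opt}) = 400 + 1.6 \times 450 = 1120\) cells/mm\(^3\).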
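The value of 1120 cells/mm\(^3\) can be checked by Monte Carlo. The sketch below draws covariates from the models in this section, assigns treatment by the optimal rules, and averages the mean outcome \(\mu\). It makes simplifying assumptions: the rounding of covariates and the truncation of \(Y\) are ignored, and the seed and sample size are arbitrary.

```r
set.seed(seed = 2024L)  # arbitrary seed for this illustration
n <- 200000L            # large sample so the Monte Carlo error is small

# Covariates from the generative models; treatments from the optimal rules
CD4_0 <- stats::rnorm(n = n, mean = 450.0, sd = 100.0)
A1 <- as.integer(CD4_0 < 250.0)
CD4_6 <- stats::rnorm(n = n, mean = 1.25 * CD4_0, sd = 8.0)
A2 <- as.integer(CD4_6 < 360.0)
CD4_12 <- stats::rnorm(n = n, mean = 0.8 * CD4_6, sd = 8.0)
A3 <- as.integer(CD4_12 < 300.0)

# Mean outcome; all penalty terms vanish under the optimal regime
mu <- 400.0 + 1.6 * CD4_0 -
  abs(500.0 - 2.0 * CD4_0) * (A1 - (CD4_0 < 250.0))^2 -
  abs(720.0 - 2.0 * CD4_6) * (A2 - (CD4_6 < 360.0))^2 -
  abs(600.0 - 2.0 * CD4_12) * (A3 - (CD4_12 < 300.0))^2

# Monte Carlo estimate of the value of the optimal regime
valueHat <- mean(mu)
print(valueHat)
```

The estimate should be close to 1120, differing only by Monte Carlo error in the sample mean of \(\text{CD4_0}\).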

Summary

The outcome of interest, the 18-month CD4 count, is plotted below. Recall that we define the outcome of interest such that larger values are more desirable. The colors indicate the combination of treatments received across the three decision points.

The outcome of interest.

Summary statistics for each variable are given below.

summary(object = dataMDP)

     CD4_0             A1            CD4_6             A2            CD4_12            A3              Y         
 Min.   :110.4   Min.   :0.000   Min.   :140.9   Min.   :0.000   Min.   :116.1   Min.   :0.000   Min.   :-262.0  
 1st Qu.:382.7   1st Qu.:0.000   1st Qu.:476.7   1st Qu.:0.000   1st Qu.:381.0   1st Qu.:0.000   1st Qu.: 677.8  
 Median :446.0   Median :0.000   Median :556.6   Median :0.000   Median :445.4   Median :0.000   Median : 807.5  
 Mean   :447.3   Mean   :0.367   Mean   :559.4   Mean   :0.183   Mean   :447.4   Mean   :0.319   Mean   : 854.3  
 3rd Qu.:511.6   3rd Qu.:1.000   3rd Qu.:637.4   3rd Qu.:0.000   3rd Qu.:510.2   3rd Qu.:1.000   3rd Qu.:1114.0  
 Max.   :769.6   Max.   :1.000   Max.   :955.7   Max.   :1.000   Max.   :767.2   Max.   :1.000   Max.   :1652.0

In the figure below, we show the outcome of interest, \(Y\), plotted against each covariate. The colors indicate the combination of treatments received across the three treatment decisions and are the same as those used in the figure above.

Outcome of interest plotted against each covariate.