Chapter Objectives



The objective of this chapter is to estimate an optimal \(\Psi\)-specific treatment regime in the setting where there are \(K \gt 1\) decision points at which treatment is selected. In the nomenclature of potential outcomes, the value of a regime \(d\) is defined as

\[ \mathcal{V}(d) = E\left\{Y^{\text{*}}(d)\right\}, \]

where \(Y^{\text{*}}(d)\) is the potential outcome that an individual would achieve if all \(K\) rules in \(d = (d_1, \ldots, d_K)\) were followed to select treatment. A regime \(d^{opt}\) in the class \(\mathcal{D}\) of regimes under consideration that satisfies

\[ E\left\{Y^{\text{*}}(d^{opt})\right\} \ge E\left\{Y^{\text{*}}(d)\right\} \quad \text{for all } d \in \mathcal{D} \]

is termed an optimal regime.
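
To make the definition concrete, the following is a minimal sketch in base R that approximates \(\mathcal{V}(d)\) by Monte Carlo for \(K = 2\) and picks the best regime within a small candidate class. The generative model, its coefficients, and the names `value`, `rule1`, `rule2`, and `regimes` are hypothetical and chosen only for illustration; they are not taken from the chapter or from any package. Because the potential outcomes are simulated directly, this illustrates the definition of the value rather than any estimator applied to observed data.

```r
# Hypothetical K = 2 generative model; every coefficient below is an
# illustrative assumption, not taken from the chapter.
set.seed(1)
n  <- 1e5
x1 <- rnorm(n)   # baseline covariate
e2 <- rnorm(n)   # noise in the intermediate covariate
ey <- rnorm(n)   # noise in the outcome

# Monte Carlo approximation of V(d) = E{Y*(d)} for a regime d = (d1, d2).
# The same random draws are reused for every regime so that differences in
# value reflect the rules rather than simulation noise.
value <- function(d1, d2) {
  a1 <- d1(x1)                      # treatment chosen at decision 1 (0/1)
  x2 <- 0.5 * x1 + 0.3 * a1 + e2    # covariate observed before decision 2
  a2 <- d2(x1, a1, x2)              # treatment chosen at decision 2 (0/1)
  y  <- x1 + a1 * (1 - x1) + a2 * (x2 - 0.5) + ey   # potential outcome Y*(d)
  mean(y)
}

# Static rules that ignore the accrued information.
never  <- function(...) 0
always <- function(...) 1

# Rules that tailor treatment to the information available at each decision.
rule1 <- function(x1) as.numeric(x1 < 1)
rule2 <- function(x1, a1, x2) as.numeric(x2 > 0.5)

# A small candidate class D of regimes.
regimes <- list(
  never_treat  = list(never,  never),
  always_treat = list(always, always),
  tailored     = list(rule1,  rule2)
)

values <- sapply(regimes, function(d) value(d[[1]], d[[2]]))
best   <- names(which.max(values))   # regime with the largest value in D

print(round(values, 3))
print(best)
```

Under these assumptions the regime selected by `which.max` is optimal only relative to the three candidates listed; enlarging the class \(\mathcal{D}\) can change which regime attains the maximum.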

Here, we provide example analyses using the Q-learning, classification, value search, and backward outcome weighted learning approaches discussed in Chapter 7. These estimators for \(d^{opt}\) are implemented in the R package DynTxRegime, which is freely available from the Comprehensive R Archive Network (CRAN).
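
As a convenience, the package can be installed and loaded with the standard R tooling. This is a generic sketch, assuming a working internet connection and a configured CRAN mirror; it is not specific to this chapter.

```r
# Install DynTxRegime from CRAN if it is not already available,
# then attach it for the current session.
if (!requireNamespace("DynTxRegime", quietly = TRUE)) {
  install.packages("DynTxRegime")
}
library(DynTxRegime)

# Open the package index to see the functions it provides.
help(package = "DynTxRegime")
```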