We assume a basic level of knowledge about the
We use a variety of font decorations when referring in text passages to packages, functions, and input arguments. Package names are printed in bold packageName. Functions that are defined by a package in
If a function exists in
As mentioned previously, function argument names in text sections are bold text-type, argumentName. In addition, for code blocks, we always use the named argument input convention (name = object). Many functions allow for unnamed, positional inputs; however, we believe that named inputs provide more clarity.
For example, we will write the code to generate 100 random numbers from a normal distribution with mean 10 and standard deviation 2 as
stats::rnorm(n = 100, mean = 10, sd = 2)
[1] 8.167082 8.265135 11.750101 12.823050 9.817039 12.971156 13.097068
[8] 10.344266 15.192547 8.951312 10.192172 5.241290 10.286082 11.255429
[15] 11.721025 11.896537 10.158942 12.745745 10.044947 8.355021 8.532626
[22] 10.446595 7.957810 12.723588 9.979115 10.219141 8.244518 8.395776
[29] 10.250264 9.798035 8.170835 11.931790 8.900916 9.411897 8.484375
[36] 9.605591 12.386412 11.770323 12.505528 7.199994 8.534601 9.412743
[43] 14.017033 8.626459 10.049519 8.859831 11.460698 9.245987 11.783126
[50] 7.390956 11.622964 7.443781 8.515167 12.142947 9.646388 9.809504
[57] 11.954840 11.039984 12.012906 14.171457 8.724991 8.176391 9.993004
[64] 12.387336 9.283920 6.868521 7.508926 11.393521 9.833981 9.347148
[71] 9.549741 9.084681 9.240683 11.131544 7.019987 6.579202 7.602609
[78] 8.941636 12.273428 12.592533 8.403796 9.295288 15.314072 5.445649
[85] 8.530123 10.659008 11.296296 9.689286 6.439194 8.956625 8.219575
[92] 7.575232 7.842680 13.619029 10.693222 9.233547 10.463085 11.726034
[99] 9.177021 13.536567
though
stats::rnorm(100, 10, 2)
[1] 9.271698 9.192661 10.725390 14.811142 9.716182 8.752896 10.753793
[8] 10.535138 12.410409 12.764792 10.232825 6.620746 10.030879 10.607405
[15] 6.595075 13.265181 11.810778 9.331417 10.994086 10.058339 10.090701
[22] 9.453874 6.953572 10.934352 7.375328 12.758642 12.978439 10.125703
[29] 8.011926 7.081048 12.185786 8.443431 12.499837 8.714348 11.413399
[36] 9.589082 13.171304 11.300317 12.272204 8.447132 11.390216 11.753112
[43] 7.644975 7.183854 10.797007 7.292401 8.593925 10.174816 11.748587
[50] 13.721499 10.902068 11.424707 8.163498 11.138128 9.065724 8.807123
[57] 11.632177 8.277435 9.414094 11.121433 13.071885 12.599860 7.926980
[64] 9.349403 13.106993 8.584079 11.247641 10.744198 8.330467 8.933934
[71] 11.793538 9.596610 15.001984 9.435056 6.873102 7.789240 8.378898
[78] 8.422043 9.515749 5.353567 11.533838 9.003131 9.745201 10.446663
[85] 10.383406 9.747556 9.739211 8.205746 11.155214 7.327233 11.173334
[92] 11.451706 9.147778 8.114784 7.471612 13.408158 10.665739 10.517753
[99] 10.384110 8.325173
to be equivalent. (Though the calls to stats::
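As a quick check of this equivalence, the sketch below resets the random seed before each call so that the named and positional forms start from the same generator state. The seed value and the variable names named and positional are illustrative choices, not part of the original example.

```r
# Reset the seed before each call so both calls draw the same numbers.
set.seed(seed = 1234)
named <- stats::rnorm(n = 100, mean = 10, sd = 2)

set.seed(seed = 1234)
positional <- stats::rnorm(100, 10, 2)

# Compare the two vectors element by element.
identical(x = named, y = positional)
```

Without the calls to set.seed, the two vectors would differ only because each call consumes fresh random numbers, not because the argument styles differ.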
All numeric values include a decimal. All integers are indicated using
We group expressions using curly brackets. For example, our convention is to write
res <- {a + b} * {c + d}
in contrast to
res <- (a + b) * (c + d)
Both are acceptable in
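A small check, using arbitrary illustrative values for a, b, c, and d, suggests that the two grouping styles evaluate to the same result:

```r
# Illustrative values only; any numeric inputs would do.
a <- 1.0
b <- 2.0
c <- 3.0
d <- 4.0

# Curly-bracket grouping (the convention used here).
res_curly <- {a + b} * {c + d}

# Parenthesis grouping.
res_paren <- (a + b) * (c + d)

# Compare the two results.
identical(x = res_curly, y = res_paren)
```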
If you are not familiar with a function used in our implementations (for illustration assume the method is stats::
The methods discussed on this website rely on regressions. We make use of
For some of the early chapters, the modelObj framework may seem a bit heavy-handed. However, in later chapters we discuss the methods for dynamic treatment regimes as implemented in package DynTxRegime, which was built on the modelObj framework. We hope that introducing the framework early in our discussion will make it easier to use in those more complex settings.
A modeling object can be thought of as a class, such as those encountered in high-level languages like C++ and Java. In the traditional language of classes, a modeling object has ‘state variables’ that include the postulated model, the method to be used to estimate parameters, and the method to be used to make predictions. The object also has behaviors such as ‘obtain parameter estimates’ and ‘make predictions.’
This framework essentially separates the implementation of a statistical method that requires a regression step from the details of the regression step. Specifically, a developer does not need to specify a specific regression method (and its inputs!) such as stats::
Users of methods developed on the modeling object framework provide the details of a regression step as a compact input variable. The defined object contains all control parameters for regression and prediction.
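The separation described above can be sketched in base R without the modelObj package. The list toy_obj and the helpers toy_fit and toy_predict below are hypothetical stand-ins invented for illustration; they are not the modelObj implementation. The point is only that the method code never names a specific regression function: it looks up whatever the user supplied.

```r
# Toy stand-in for a modeling object: a plain list, NOT the modelObj class.
toy_obj <- list(model = ~ x1 + x2,
                solver.method = "lm",
                solver.args = list(),
                predict.method = "predict",
                predict.args = list())

# Hypothetical method code: it never refers to a specific regression function.
toy_fit <- function(obj, data, response) {
  data[["toy_response"]] <- response
  # Attach the response to the user's right-hand-side-only model.
  full_model <- stats::update.formula(old = obj$model, new = toy_response ~ .)
  # Call the user's solver by name, adding the user's control arguments.
  do.call(what = obj$solver.method,
          args = c(list(formula = full_model, data = data), obj$solver.args))
}

toy_predict <- function(obj, fit_result) {
  # Call the user's prediction function by name.
  do.call(what = obj$predict.method,
          args = c(list(object = fit_result), obj$predict.args))
}

# Simulated data for illustration only.
set.seed(seed = 42)
dat <- data.frame(x1 = stats::rnorm(n = 50), x2 = stats::rnorm(n = 50))
y <- 1.0 + 2.0 * dat$x1 - 0.5 * dat$x2 + stats::rnorm(n = 50)

fit_result <- toy_fit(obj = toy_obj, data = dat, response = y)
pred <- toy_predict(obj = toy_obj, fit_result = fit_result)
```

Swapping solver.method to "glm" and adding solver.args = list(family = "binomial") would change the regression without touching toy_fit or toy_predict, which is the design goal the framework is aiming for.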
This modeling object framework has been implemented in
modelObj::buildModelObj(model,
                        solver.method = NULL,
                        solver.args = NULL,
                        predict.method = NULL,
                        predict.args = NULL)
Input model is a standard
Inputs solver.method and solver.args specify the method to be used to obtain parameter estimates. solver.method is a character string specifying the
Similarly, inputs predict.method and predict.args specify the method to be used to obtain predictions. predict.method is a character string specifying the
To illustrate, assume that our data set, data, contains covariates \(x1\) and \(x2\) and a continuous outcome variable, \(y\). We postulate a model \(y \sim \beta_0 + \beta_1~x1 + \beta_2~x2\) and want to obtain parameter estimates using ordinary least squares through
fit <- stats::lm(formula = y ~ x1 + x2, data = data)
Predictions from that analysis would be obtained as
pred <- stats::predict.lm(object = fit)
The modeling object defining these regression and prediction steps is specified as
mo <- modelObj::buildModelObj(model = ~ x1 + x2,
                              solver.method = "lm",
                              predict.method = "predict.lm")
Notice that we did not explicitly include the package name stats when specifying solver.method or predict.method. Because we specify the function names using character strings, we cannot include the package name. With this choice of input style, the stats package must be loaded into your
For a binary outcome, \(y\), we postulate the logistic regression model \(\text{logit}(y) \sim \beta_0 + \beta_1~x1 + \beta_2~x2\). Parameter estimates are obtained using maximum likelihood through
fit <- stats::glm(formula = y ~ x1 + x2,
                  data = data,
                  family = binomial)
Notice that an additional input is required; namely, family = binomial. This input specifies the family of the model and the link function; the default for stats::
Predictions from the above analysis would be obtained as
pred <- stats::predict.glm(object = fit, type = "response")
Again there is a change to the default inputs; i.e., type = "response". The default input for stats::
The modeling object defining these regression and prediction steps is specified as
mo <- modelObj::buildModelObj(model = ~ x1 + x2,
                              solver.method = "glm",
                              solver.args = list(family = "binomial"),
                              predict.method = "predict.glm",
                              predict.args = list(type = "response"))
There are two important limitations to this framework that must be kept in mind whenever using packages or functions that are built on the modelObj framework.
There is no built-in model checking. The user is responsible for specifying sensible models.
Care must be taken in specifying the scale of the predictions, as in the binary example above.
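To make the second limitation concrete, the sketch below fits a logistic regression to simulated data and compares the two prediction scales. The data-generating values and the seed are arbitrary assumptions chosen for illustration. In stats::predict.glm the default is type = "link", which for a binomial model with the logit link returns values on the log-odds scale, whereas type = "response" returns probabilities.

```r
# Simulated binary-outcome data; all numeric settings are illustrative.
set.seed(seed = 2024)
n <- 200
dat <- data.frame(x1 = stats::rnorm(n = n), x2 = stats::rnorm(n = n))
prob <- stats::plogis(q = -0.5 + 1.0 * dat$x1 - 1.0 * dat$x2)
dat$y <- stats::rbinom(n = n, size = 1, prob = prob)

fit <- stats::glm(formula = y ~ x1 + x2, data = dat, family = binomial)

# Default scale: the linear predictor (log-odds), unbounded.
pred_link <- stats::predict.glm(object = fit)

# Response scale: fitted probabilities, bounded between 0 and 1.
pred_resp <- stats::predict.glm(object = fit, type = "response")

range(pred_link)
range(pred_resp)

# The inverse logit maps the link-scale predictions to the response scale.
all.equal(target = unname(stats::plogis(q = pred_link)),
          current = unname(pred_resp))
```

A method that expects probabilities but receives log-odds will run without error and return wrong answers, which is why predict.args should state the scale explicitly.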
Package modelObj has several methods available. In this chapter, we will make extensive use of only three of them: modelObj::
modelObj::fit(object, data, response, ...)
Function modelObj::
modelObj::predict(object, ...)
Function modelObj::
modelObj::fitObject(object, ...)
Function modelObj::
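The three methods can be chained as in the following sketch, which uses simulated data and the linear modeling object defined earlier. It assumes package modelObj is installed (the block is skipped otherwise) and that newdata can be passed to modelObj::predict through its additional arguments; the data-generating values and the names fit_obj, pred, and lm_obj are illustrative.

```r
if (requireNamespace("modelObj", quietly = TRUE)) {
  # Simulated covariates and continuous outcome; values are illustrative.
  set.seed(seed = 101)
  dat <- data.frame(x1 = stats::rnorm(n = 100), x2 = stats::rnorm(n = 100))
  y <- 1.0 + 0.5 * dat$x1 - 0.25 * dat$x2 + stats::rnorm(n = 100)

  # Define the regression and prediction steps.
  mo <- modelObj::buildModelObj(model = ~ x1 + x2,
                                solver.method = "lm",
                                predict.method = "predict.lm")

  # Obtain parameter estimates; the response is supplied separately.
  fit_obj <- modelObj::fit(object = mo, data = dat, response = y)

  # Obtain predictions (newdata is an assumed pass-through argument).
  pred <- modelObj::predict(object = fit_obj, newdata = dat)

  # Retrieve the underlying regression object, here the value returned by lm.
  lm_obj <- modelObj::fitObject(object = fit_obj)
  stats::coef(object = lm_obj)
}
```

Because fitObject returns the regression object itself, the usual tools for that object (for example stats::coef or summary for an lm fit) should remain available for model diagnostics.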