Clinical predictive and diagnostic models are now ubiquitous, facilitating personalized medicine and guiding therapy decisions. Developing such models often requires variable selection or shrinkage techniques. However, classical methods of statistical inference are not applicable in such settings, because they assume that the set of model variables is fixed in advance. Valid post-selection inference, in contrast, must account for the fact that the variables were selected. The selective inference framework addresses this by providing valid inference when the statistical hypotheses to be tested are both explored and analysed using the same data. In recent years, this methodology has been developed for the widely used Lasso, i.e. L1-penalized regression, but approaches that are agnostic to the model selection procedure also exist.
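To make the problem with classical inference after selection concrete, here is a small Monte Carlo sketch in Python (NumPy only), under assumptions of our own choosing: a linear model with no true effects (all coefficients zero, known noise standard deviation 1), ten independent Gaussian predictors, and a Lasso with a fixed penalty fitted by a simple coordinate-descent routine. It compares naive 95% Wald intervals, computed for the Lasso-selected variables on the same data used for selection, with intervals obtained by sample splitting (select on one half, infer on the other). Sample splitting is a simple baseline substituted here for illustration; it is neither the Lasso-specific selective intervals nor the selection-agnostic methods studied in this work. All function names and settings (`lasso_cd`, `z_intervals`, `simulate`, `lam=0.15`, the seed) are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Lasso via cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lam*||b||_1.

    No intercept is fitted (hypothetical simplification: data are mean-zero).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            # Soft-thresholding yields exact zeros, i.e. variable selection
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
            max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def z_intervals(X, y, sigma=1.0, z=1.959964):
    """Classical OLS Wald intervals, assuming the noise sd sigma is known."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    se = sigma * np.sqrt(np.diag(XtX_inv))
    return coef - z * se, coef + z * se


def simulate(n=100, p=10, lam=0.15, reps=300, seed=1):
    """Share of 95% intervals covering the true (zero) submodel coefficient."""
    rng = np.random.default_rng(seed)
    hits = {"naive": [], "split": []}
    half = n // 2
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # assumed global null: beta = 0, sigma = 1
        # Naive: select and compute intervals on the same data
        sel = np.flatnonzero(lasso_cd(X, y, lam))
        if sel.size:
            lo, hi = z_intervals(X[:, sel], y)
            hits["naive"].extend((lo <= 0) & (hi >= 0))
        # Sample splitting: select on the first half, infer on the second half
        sel = np.flatnonzero(lasso_cd(X[:half], y[:half], lam))
        if sel.size:
            lo, hi = z_intervals(X[half:, sel], y[half:])
            hits["split"].extend((lo <= 0) & (hi >= 0))
    return {k: float(np.mean(v)) for k, v in hits.items()}


print(simulate())
```

Under these settings the naive intervals should cover zero well below the nominal 95%, because the Lasso tends to select precisely the variables whose estimates happen to be large, while the split-sample intervals should be close to nominal; exact figures depend on the seed. Sample splitting buys validity by spending only half of the data on each step, which is one reason methods that reuse the full data for both selection and inference are of interest.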
We report our experience of working within the selective inference framework. In a systematic simulation study in linear regression, including settings based on real clinical data, we applied selective inference techniques to obtain confidence intervals for Lasso regression models and studied their properties, such as selective coverage, power to exclude zero and stability. To illustrate the practical applicability of selective inference, we provide a real-data example using the freely available Johnson’s body fat dataset, which concerns the estimation of body fat in men from correlated anthropometric body measurements.
We found the available software for selective inference challenging to work with. Lasso-specific confidence intervals tended to be very wide and quite variable, but could potentially improve model selection properties, in particular by reducing false positive findings. Selection-agnostic methods, which are so far only available in linear regression, were more conservative and computationally demanding, limiting their practical usability. In conclusion, selective inference using the Lasso remains a challenging problem in practice, as interpreting the results requires a proper understanding of the use case, and corresponding user-friendly software is still in its infancy.
(Michael Kammer, Daniela Dunkler, Stefan Michiels, Georg Heinze)