In modern applications, the formal statistical problem is typically selected only after some interaction with the data. Usually, an initial exploratory analysis is used to identify interesting aspects of the population under study, and then the same dataset is used to learn about them. Such “data snooping” invalidates classical inferential procedures, and many approaches have been proposed to restore inferential validity in these settings. In this talk, I will present a randomization-based alternative to data splitting that allows for higher selection and inferential power. I will describe the theoretical and empirical advantages of this method and discuss some related problems of current interest.
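The abstract does not spell out the construction, so the following is only a minimal sketch of one randomization scheme of this general kind, not necessarily the one presented in the talk. It assumes independent Gaussian observations with known standard deviation `sigma`; the function names, the tuning parameter `gamma`, and the "pick the largest" selection rule are all hypothetical choices made for illustration. Adding noise `W` to the data for selection (`U = Y + gamma * W`) and subtracting a rescaled copy for inference (`V = Y - W / gamma`) gives two Gaussian vectors with zero covariance, so an interval built from `V` stays valid after selecting on `U`.

```python
import numpy as np
from statistics import NormalDist


def randomized_split(y, sigma, gamma, rng):
    """Split y into a selection copy u and an inference copy v.

    Assumes y ~ N(mu, sigma^2 I) with sigma known. Then
    Cov(u, v) = sigma^2 - sigma^2 = 0, so u and v are independent.
    """
    w = rng.normal(0.0, sigma, size=y.shape)  # artificial noise
    u = y + gamma * w        # used only to choose what to study
    v = y - w / gamma        # used only for inference
    return u, v


def interval_for_selected(y, sigma, gamma, rng, level=0.95):
    """Select the coordinate with the largest u, then build a CI from v."""
    u, v = randomized_split(y, sigma, gamma, rng)
    i = int(np.argmax(u))                         # hypothetical selection rule
    se = sigma * np.sqrt(1.0 + 1.0 / gamma**2)    # sd of each v[i]
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return i, v[i] - z * se, v[i] + z * se


def coverage(mu, sigma=1.0, gamma=1.0, reps=2000, seed=0):
    """Monte Carlo estimate of how often the interval covers mu[selected]."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = rng.normal(mu, sigma)
        i, lo, hi = interval_for_selected(y, sigma, gamma, rng)
        hits += int(lo <= mu[i] <= hi)
    return hits / reps


if __name__ == "__main__":
    # Coverage of the nominal 95% interval when all 20 true means are zero.
    print(coverage(np.zeros(20)))
```

In this sketch, a larger `gamma` makes the selection copy noisier and the inference copy more precise, which plays a role similar to the split fraction in ordinary data splitting.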
Underlying paper: "Splitting strategies for post-selection inference"
Personal website of Daniel Garcia Rasines