Talk from Archives

Breiman’s Samplers or Models? There is a little but important difference: Models, including priors, can be wrong!

16.10.2023 16:45 - 17:45

 

Breiman (2001) urged statisticians to provide tools for the case where the data are X = s(θ, Y): the sampler s is available as a black box, the parameter θ ∈ Θ, and Y is random, either observed or latent. The paper’s discussants, D. R. Cox and B. Efron, looked at the problem as X-prediction, surprisingly neglecting the statistical inference for θ, and disagreed with the main thrust of the paper. Consequently, mathematical statisticians ignored Breiman’s suggestion! However, computer scientists work with X = s(θ, Y), calling s a learning machine.

In this talk, following Breiman, statistical inference tools are presented for θ:
a) The Empirical Discrimination Index (EDI), to detect θ-discrimination and identifiability.
b) Matching estimates of θ, with upper bounds on the errors that depend on the “massiveness” of Θ.
c) For known stochastic models of X, Laplace’s 1774 Principle for inverse probability is proved without Bayes rule; for unknown X-models, an Approximate Inverse/Fiducial distribution for θ is obtained. The approach can also be used in Approximate Bayesian Computation (ABC), yielding F-ABC, which includes all θ* drawn from a Θ-sampler, unlike the ABC-rejection method of Rubin (1984) followed until now.

The results in a) are unique in the literature (YY, 2023). Mild assumptions are needed in b) and c), unlike existing results that need strong and often unverifiable assumptions. The upper bounds on the errors in b) have the same rate regardless of the data dimension. When Θ is a subset of R^m with m unknown, the rate can be [m_n (log n)/n]^{1/2} in probability, with m_n increasing to infinity as slowly as we wish; when m is known, m_n = m. Approximate Fiducial distributions and F-ABC posteriors in c) are obtained for any data dimension. Thus, when X = s(θ, Y) and a cdf F_θ is assumed for X, it seems logical to use the sampler s and a)-c) instead, since F_θ and an assumed θ-prior may be wrong.
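For readers unfamiliar with the setting, the classical ABC-rejection scheme that the abstract contrasts with F-ABC can be sketched as follows. This is only an illustration and not the speaker's method: the Gaussian location sampler, the uniform Θ-sampler on [-5, 5], the sample-mean summary, the tolerance eps, and the names sampler and abc_rejection are all assumptions made for the example. F-ABC itself is not shown, since the abstract does not specify it.

```python
import random
import statistics

def sampler(theta, rng, n):
    """Hypothetical black-box sampler X = s(theta, Y), here with Y ~ N(0, 1)."""
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n)]

def abc_rejection(x_obs, n_draws, eps, rng):
    """Classical ABC rejection: keep theta* only if simulated data match x_obs.

    Matching is judged by the sample mean (an assumed summary statistic).
    """
    n = len(x_obs)
    target = statistics.fmean(x_obs)
    accepted = []
    for _ in range(n_draws):
        theta_star = rng.uniform(-5.0, 5.0)   # draw from the assumed Theta-sampler
        x_sim = sampler(theta_star, rng, n)   # simulate from the black box
        if abs(statistics.fmean(x_sim) - target) <= eps:
            accepted.append(theta_star)       # retained; all other draws are discarded
    return accepted

rng = random.Random(0)                        # fixed seed for reproducibility
x_obs = sampler(2.0, rng, 50)                 # "observed" data, true theta = 2
accepted = abc_rejection(x_obs, 5000, 0.1, rng)
print(len(accepted), statistics.fmean(accepted) if accepted else None)
```

Only the accepted θ* are used to approximate the posterior, so most draws from the Θ-sampler are thrown away; this is the feature the abstract says F-ABC avoids.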

References

Breiman, L. (2001) Statistical Modeling: The Two Cultures. Statist. Sci. 16(3), 199-231.

Rubin, D. B. (1984) Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. Ann. Statist. 12(4), 1151-1172.

YY (2023) EDI-Graphic: A Tool to Study Parameter Discrimination and Confirm Identifiability in Black-Box Models, and to Select Data-Generating Machines.

Currently in the “Latest articles”, June 2023, Journal of Computational and Graphical Statistics. https://www.tandfonline.com/doi/full/10.1080/10618600.2023.2205483

Location:
HS 7 OMP1 (#1.303)