Statistics Theory
- [1] arXiv:2405.08907 [pdf, ps, html, other]
Title: Properties of stationary cyclical processes
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
The paper investigates the theoretical properties of zero-mean stationary time series with cyclical components, admitting the representation $y_t=\alpha_t \cos \lambda t + \beta_t \sin \lambda t$, with $\lambda \in (0,\pi]$ and $[\alpha_t\,\, \beta_t]$ following some bivariate process. We find that, in the extant literature on cyclic time series, the prevalent assumption of Gaussianity for $[\alpha_t\,\, \beta_t]$ inadvertently imposes a severe restriction on the amplitude of the process. Moreover, it is shown that other common distributions may either suffer from similar defects or fail to guarantee the stationarity of $y_t$. To address both issues, we propose to introduce a direct stochastic modulation of the amplitude and phase shift in an almost periodic function. We prove that this novel approach may lead, in general, to a stationary (up to any order) time series, and specifically, to a zero-mean stationary time series featuring cyclicity, with a pseudo-cyclical autocovariance function that may even decay at a very slow rate. The proposed process fills an important gap in this type of model and allows for flexible modeling of amplitude and phase shift.
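As a minimal sketch of the representation in the abstract above, the following Python code simulates $y_t$ in the conventional Gaussian setup that the abstract critiques, not the paper's proposed construction. The AR(1) dynamics for $\alpha_t$ and $\beta_t$, the parameters `phi` and `sigma`, and the function names are assumptions made for illustration. If $\alpha_t$ and $\beta_t$ are independent, zero-mean, and share an autocovariance $\gamma(h)$, then $\mathrm{Cov}(y_t, y_{t+h}) = \gamma(h)\cos(\lambda h)$, which does not depend on $t$.

```python
import math
import random


def simulate_cyclical(n, lam, phi=0.9, sigma=1.0, seed=0):
    """Simulate y_t = a_t*cos(lam*t) + b_t*sin(lam*t) for t = 0, ..., n-1.

    Assumption (not from the abstract): a_t and b_t are independent
    Gaussian AR(1) processes with coefficient |phi| < 1 and innovation
    standard deviation sigma.
    """
    rng = random.Random(seed)
    # Draw the initial values from the stationary AR(1) distribution.
    sd0 = sigma / math.sqrt(1.0 - phi ** 2)
    a = rng.gauss(0.0, sd0)
    b = rng.gauss(0.0, sd0)
    ys = []
    for t in range(n):
        ys.append(a * math.cos(lam * t) + b * math.sin(lam * t))
        # Advance both coefficient processes by one step.
        a = phi * a + rng.gauss(0.0, sigma)
        b = phi * b + rng.gauss(0.0, sigma)
    return ys


def theoretical_autocov(h, lam, phi=0.9, sigma=1.0):
    """Autocovariance gamma(h)*cos(lam*h) under the AR(1) assumption above."""
    gamma_h = sigma ** 2 / (1.0 - phi ** 2) * phi ** abs(h)
    return gamma_h * math.cos(lam * h)


if __name__ == "__main__":
    series = simulate_cyclical(200, math.pi / 6)
    print(len(series), theoretical_autocov(1, math.pi / 6))
```

The damped cosine shape of `theoretical_autocov` is one simple instance of a pseudo-cyclical autocovariance function; here the decay is geometric because of the assumed AR(1) coefficients.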
- [2] arXiv:2405.09196 [pdf, ps, other]
Title: Harnessing pattern-by-pattern linear classifiers for prediction with missing data
Subjects: Statistics Theory (math.ST)
Missing values have been thoroughly analyzed in the context of linear models, where the final aim is to build coefficient estimates. However, estimating coefficients does not directly solve the problem of prediction with missing entries: a way to handle the missing components must be designed. Major approaches to prediction with missing values are empirically driven and can be decomposed into two families: imputation (filling in empty fields) and pattern-by-pattern prediction, where a predictor is built on each missing pattern. Unfortunately, most simple imputation techniques used in practice (such as constant imputation) are not consistent when combined with linear models. In this paper, we focus on the more flexible pattern-by-pattern approaches and study their predictive performance on Missing Completely At Random (MCAR) data. We first show that a pattern-by-pattern logistic regression model is intrinsically ill-defined, implying that even classical logistic regression is impossible to apply to missing data. We then analyze the perceptron model and show how the linear separability property extends to partially-observed inputs. Finally, we use Linear Discriminant Analysis to prove that pattern-by-pattern LDA is consistent in a high-dimensional regime. We extend our analysis to more complex Missing Not At Random (MNAR) data.
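The pattern-by-pattern idea described in the abstract above can be sketched as follows: group the training rows by which entries are observed, then fit a separate classifier on each group using only the observed coordinates. This is an illustrative sketch, not the paper's estimator. The nearest-class-mean rule (LDA with an identity covariance), the use of `None` to mark missing entries, and all function names are assumptions.

```python
from collections import defaultdict


def pattern_of(x):
    """Missingness pattern: True where the entry is observed."""
    return tuple(v is not None for v in x)


def fit_pattern_by_pattern(X, y):
    """Fit one nearest-class-mean classifier per missingness pattern."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[pattern_of(x)].append((x, label))
    models = {}
    for pattern, rows in groups.items():
        observed = [j for j, seen in enumerate(pattern) if seen]
        sums = {}
        counts = defaultdict(int)
        for x, label in rows:
            acc = sums.setdefault(label, [0.0] * len(observed))
            for k, j in enumerate(observed):
                acc[k] += x[j]
            counts[label] += 1
        # Class means restricted to the coordinates observed in this pattern.
        means = {c: [s / counts[c] for s in sums[c]] for c in sums}
        models[pattern] = (observed, means)
    return models


def predict(models, x):
    """Predict with the classifier fitted on the pattern of x."""
    pattern = pattern_of(x)
    if pattern not in models:
        raise KeyError("no classifier was fitted for this missingness pattern")
    observed, means = models[pattern]

    def sqdist(mu):
        return sum((x[j] - m) ** 2 for j, m in zip(observed, mu))

    return min(means, key=lambda c: sqdist(means[c]))


if __name__ == "__main__":
    X = [[0.0, 0.1], [1.0, 0.9], [0.1, None], [0.9, None]]
    y = [0, 1, 0, 1]
    models = fit_pattern_by_pattern(X, y)
    print(predict(models, [0.8, None]))
```

With $d$ features there can be up to $2^d$ patterns, so each per-pattern classifier may see few samples; the sketch simply raises an error for patterns absent from the training set.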
- [3] arXiv:2405.09510 [pdf, ps, html, other]
Title: The Instrumental Variable Model with Categorical Instrument, Treatment and Outcome
Subjects: Statistics Theory (math.ST)
Instrumental variable models are central to the inference of causal effects in many settings. We consider the instrumental variable model with discrete variables where the instrument (Z), exposure (X) and outcome (Y) take Q, K, and M levels respectively. We assume that the instrument is randomized and that there is no direct effect of Z on Y so that Y(x,z) = Y(x). We first provide a simple characterization of the set of joint distributions of the potential outcomes P(Y(x=1), ..., Y(x=K)) compatible with a given observed distribution P(X, Y | Z). We then discuss the variation (in)dependence property of the marginal probability distribution of the potential outcomes P(Y(x=1)), ..., P(Y(x=K)) which has direct implications for partial identification of average causal effect contrasts such as E[Y(x=i) - Y(x=j)]. We also include simulation results on the volume of the observed distributions not compatible with the IV model as K and Q change.
- [4] arXiv:2405.09511 [pdf, ps, html, other]
Title: Stability via resampling: statistical problems beyond the real line
Subjects: Statistics Theory (math.ST)
Model averaging techniques based on resampling methods (such as bootstrapping or subsampling) have been utilized across many areas of statistics, often with the explicit goal of promoting stability in the resulting output. We provide a general, finite-sample theoretical result guaranteeing the stability of bagging when applied to algorithms that return outputs in a general space, so that the output is not necessarily real-valued -- for example, an algorithm that estimates a vector of weights or a density function. We empirically assess the stability of bagging on synthetic and real-world data for a range of problem settings, including causal inference, nonparametric regression, and Bayesian model selection.
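To illustrate bagging for an algorithm whose output is a vector rather than a real number, the sketch below averages the outputs of a base algorithm over bootstrap resamples. The base algorithm `argmax_weights` (a one-hot weight vector on the most frequent label) is a hypothetical example of an unstable procedure, and the function names and the default `n_bags` are assumptions; this is not the paper's experimental setup.

```python
import random


def bag(algorithm, data, n_bags=200, seed=0):
    """Average a vector-valued algorithm over bootstrap resamples of data."""
    rng = random.Random(seed)
    n = len(data)
    total = None
    for _ in range(n_bags):
        # Bootstrap: draw n points from data with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        out = algorithm(resample)
        if total is None:
            total = list(out)
        else:
            total = [a + b for a, b in zip(total, out)]
    return [a / n_bags for a in total]


def argmax_weights(sample, n_labels=3):
    """Hypothetical unstable base algorithm: one-hot vector on the modal label."""
    counts = [0] * n_labels
    for v in sample:
        counts[v] += 1
    best = counts.index(max(counts))
    return [1.0 if k == best else 0.0 for k in range(n_labels)]


if __name__ == "__main__":
    data = [0] * 10 + [1] * 10 + [2] * 5
    print(argmax_weights(data))       # jumps if a single point changes
    print(bag(argmax_weights, data))  # averaged weights vary more smoothly
```

Removing one data point can flip the one-hot output of the base algorithm entirely, whereas the bagged weight vector changes only through the fraction of resamples whose modal label changes.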
- [5] arXiv:2405.09523 [pdf, ps, html, other]
Title: On Semi-supervised Estimation of Discrete Distributions under f-divergences
Comments: Full version. Presented in ISIT-24. arXiv admin note: text overlap with arXiv:2305.07955
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $l^p_p$ loss functions. Recent work established that univariate minimax estimator combinations achieve minimax risk with the optimal first-order constant for $p \ge 2$ in the regime $m = o(n)$; questions remained for $p \le 2$ and various $f$-divergences. In our study, we affirm that these composite estimators are indeed minimax optimal for $l^p_p$ loss functions, specifically for the range $1 \le p \le 2$, including the critical $l_1$ loss. Additionally, we ascertain their optimality for a suite of $f$-divergences, such as KL, $\chi^2$, Squared Hellinger, and Le Cam divergences.
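For reference, the loss functions named in the abstract above can be computed for two pmfs on a common finite support as in the sketch below. Normalization and argument-order conventions vary across the literature (for example, some authors include a factor of 1/2 in the squared Hellinger distance), so the specific forms and the function names here are assumptions rather than the paper's definitions.

```python
import math


def lp_loss(p, q, power):
    """l_p^p loss: sum_i |p_i - q_i|^power."""
    return sum(abs(a - b) ** power for a, b in zip(p, q))


def kl(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i); needs q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def chi2(p, q):
    """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i; needs q_i > 0 for all i."""
    return sum((a - b) ** 2 / b for a, b in zip(p, q))


def sq_hellinger(p, q):
    """Squared Hellinger distance without the 1/2 factor (convention assumed)."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def le_cam(p, q):
    """Le Cam divergence: (1/2) sum_i (p_i - q_i)^2 / (p_i + q_i)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.25, 0.75]
    print(lp_loss(p, q, 1), kl(p, q), chi2(p, q), sq_hellinger(p, q), le_cam(p, q))
```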
New submissions for Thursday, 16 May 2024 (showing 5 of 5 entries)
- [6] arXiv:2405.08975 (cross-list from stat.ML) [pdf, ps, html, other]
Title: A distribution-free valid p-value for finite samples of bounded random variables
Comments: -
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We build a valid p-value based on a concentration inequality for bounded random variables introduced by Pelekis, Ramon and Wang. The motivation behind this work is the calibration of predictive algorithms in a distribution-free setting. The super-uniform p-value is tighter than Hoeffding and Bentkus alternatives in certain regions. Even though we are motivated by a calibration setting in a machine learning context, the ideas presented in this work are also relevant in classical statistical inference. Furthermore, we compare the power of a collection of valid p-values for bounded losses presented in previous literature.
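As a point of comparison, the Hoeffding alternative mentioned in the abstract above can be sketched in a few lines. This is the baseline, not the Pelekis-Ramon-Wang-based p-value the paper constructs. The setup is an assumption: i.i.d. losses in $[0,1]$ and the null hypothesis $H_0: \mathbb{E}[X] \ge \mu_0$, for which Hoeffding's inequality $P(\bar{X} \le \mu - t) \le e^{-2nt^2}$ makes $p = \exp(-2n(\mu_0 - \bar{X})_+^2)$ super-uniform under $H_0$. The function name is hypothetical.

```python
import math


def hoeffding_p_value(losses, mu0):
    """Hoeffding p-value for H0: E[X] >= mu0, with i.i.d. losses in [0, 1].

    Returns exp(-2 * n * max(mu0 - mean, 0)^2), which is valid
    (super-uniform) under H0 by Hoeffding's inequality.
    """
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one observation")
    if any(x < 0.0 or x > 1.0 for x in losses):
        raise ValueError("losses must lie in [0, 1]")
    mean = sum(losses) / n
    # Only a sample mean below mu0 counts as evidence against H0.
    gap = max(mu0 - mean, 0.0)
    return math.exp(-2.0 * n * gap * gap)


if __name__ == "__main__":
    # 100 zero losses against the null "expected loss is at least 0.1".
    print(hoeffding_p_value([0.0] * 100, 0.1))
```

Rejecting when the p-value falls below a level $\delta$ then certifies, with probability at least $1-\delta$, that the expected loss is below $\mu_0$.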
- [7] arXiv:2405.09003 (cross-list from stat.ME) [pdf, ps, other]
Title: Nonparametric Inference on Dose-Response Curves Without the Positivity Condition
Comments: 74 pages (23 pages for the main paper), 4 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
Existing statistical methods in causal inference often rely on the assumption that every individual has some chance of receiving any treatment level regardless of their associated covariates, which is known as the positivity condition. This assumption could be violated in observational studies with continuous treatments. In this paper, we present a novel integral estimator of the causal effects with continuous treatments (i.e., dose-response curves) without requiring the positivity condition. Our approach involves estimating the derivative function of the treatment effect on each observed data sample and integrating it to the treatment level of interest so as to address the bias resulting from the lack of positivity. The validity of our approach relies on an alternative weaker assumption that can be satisfied by additive confounding models. We provide a fast and reliable numerical recipe for computing our estimator in practice and derive its related asymptotic theory. To conduct valid inference on the dose-response curve and its derivative, we propose using the nonparametric bootstrap and establish its consistency. The practical performance of our proposed estimators is validated through simulation studies and an analysis of the effect of air pollution exposure (PM$_{2.5}$) on cardiovascular mortality rates.
- [8] arXiv:2405.09500 (cross-list from econ.TH) [pdf, ps, html, other]
Title: Identifying Heterogeneous Decision Rules From Choices When Menus Are Unobserved
Subjects: Theoretical Economics (econ.TH); Econometrics (econ.EM); Statistics Theory (math.ST)
Given only aggregate choice data and limited information about how menus are distributed across the population, we describe what can be inferred robustly about the distribution of preferences (or more general decision rules). We strengthen and generalize existing results on such identification and provide an alternative analytical approach to study the problem. We show further that our model and results are applicable, after suitable reinterpretation, to other contexts. One application is to the robust identification of the distribution of updating rules given only the population distribution of beliefs and limited information about heterogeneous information sources.
Cross submissions for Thursday, 16 May 2024 (showing 3 of 3 entries)
- [9] arXiv:2212.01792 (replaced) [pdf, ps, html, other]
Title: Classification by sparse generalized additive models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME)
We consider (nonparametric) sparse (generalized) additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with sparse group Lasso/Slope-type penalties on the coefficients of univariate additive components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness. We show that under a certain sparse group restricted eigenvalue condition it is nearly-minimax (up to log-factors) simultaneously across the entire range of analytic, Sobolev and Besov classes. The performance of the proposed classifier is illustrated on simulated and real-data examples.
- [10] arXiv:2305.13152 (replaced) [pdf, ps, html, other]
Title: Covariate-informed reconstruction of partially observed functional data via factor models
Subjects: Statistics Theory (math.ST)
This paper studies linear reconstruction of partially observed functional data which are recorded on a discrete grid. We propose a novel estimation approach based on approximate factor models with increasing rank taking into account potential covariate information. Whereas alternative reconstruction procedures commonly involve some preliminary smoothing, our method separates the signal from noise and reconstructs missing fragments at once. We establish uniform convergence rates of our estimator and introduce a new method for constructing simultaneous prediction bands for the missing trajectories. A simulation study examines the performance of the proposed methods in finite samples. Finally, a real data application of temperature curves demonstrates that our theory provides a simple and effective method to recover missing fragments.
- [11] arXiv:2308.01156 (replaced) [pdf, ps, html, other]
Title: A new adaptive local polynomial density estimation procedure on complicated domains
Comments: 43 pages, 4 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR); Applications (stat.AP); Methodology (stat.ME)
This paper presents a novel approach for pointwise estimation of multivariate density functions on known domains of arbitrary dimensions using nonparametric local polynomial estimators. Our method is highly flexible, as it applies to both simple domains, such as open connected sets, and more complicated domains that are not star-shaped around the point of estimation. This enables us to handle domains with sharp concavities, holes, and local pinches, such as polynomial sectors. Additionally, we introduce a data-driven selection rule based on the general ideas of Goldenshluger and Lepski. Our results demonstrate that the local polynomial estimators are minimax under an $L^2$ risk across a wide range of Hölder-type functional classes. In the adaptive case, we provide oracle inequalities and explicitly determine the convergence rate of our statistical procedure. Simulations on polynomial sectors show that our oracle estimates outperform those of the most popular alternative method, found in the sparr package for the R software. Our statistical procedure is implemented in an online R package which is readily accessible.
- [12] arXiv:2404.17222 (replaced) [pdf, ps, other]
Title: Asymptotic analysis for covariance parameter estimation of Gaussian processes with functional inputs
Subjects: Statistics Theory (math.ST)
We consider covariance parameter estimation for Gaussian processes with functional inputs. From an increasing-domain asymptotics perspective, we prove the asymptotic consistency and normality of the maximum likelihood estimator. We extend these theoretical guarantees to encompass scenarios accounting for approximation errors in the inputs, which makes practical implementations relying on conventional sampling methods or projections onto a functional basis robust. Loosely speaking, both consistency and normality hold when the approximation error becomes negligible, a condition that is often achieved as the number of samples or basis functions becomes large. These latter asymptotic properties are illustrated through analytical examples, including one that covers the case of non-randomly perturbed grids, as well as several numerical illustrations.
- [13] arXiv:2209.07295 (replaced) [pdf, ps, html, other]
Title: A new set of tools for goodness-of-fit validation
Comments: 35 pages, 10 figures, submitted to the Electronic Journal of Statistics
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
We introduce two new tools to assess the validity of statistical distributions. These tools are based on components derived from a new statistical quantity, the $comparison$ $curve$. The first tool is a graphical representation of these components on a $bar$ $plot$ (B plot), which can provide a detailed appraisal of the validity of the statistical model, in particular when supplemented by acceptance regions related to the model. The knowledge gained from this representation can sometimes suggest an existing $goodness$-$of$-$fit$ test to supplement this visual assessment with a control of the type I error. Otherwise, an adaptive test may be preferable and the second tool is the combination of these components to produce a powerful $\chi^2$-type goodness-of-fit test. Because the number of these components can be large, we introduce a new selection rule to decide, in a data driven fashion, on their proper number to take into consideration. In a simulation, our goodness-of-fit tests are seen to be powerwise competitive with the best solutions that have been recommended in the context of a fully specified model as well as when some parameters must be estimated. Practical examples show how to use these tools to derive principled information about where the model departs from the data.
- [14] arXiv:2306.08321 (replaced) [pdf, ps, html, other]
Title: Nonparametric regression using over-parameterized shallow ReLU neural networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
- [15] arXiv:2404.17358 (replaced) [pdf, ps, html, other]
Title: Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier
Comments: 18 pages, v2: fixed typos
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Adversarial training is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context -- or in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers of the adversarial classification risk, known as \emph{adversarial Bayes classifiers}. Specifically, under reasonable distributional assumptions, a convex loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.