Statistics Theory
- [1] arXiv:2405.08225 [pdf, ps, other]
Title: Linear Operator Approximate Message Passing (OpAMP)
Comments: 31 pages, 5 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR)
This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well-approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.
- [2] arXiv:2405.08332 [pdf, ps, html, other]
Title: Parameter estimation and long-range dependence of the fractional binomial process
Comments: 18 pages, 4 figures
Subjects: Statistics Theory (math.ST)
In 1990, Jakeman defined the binomial process as a special case of the classical birth-death process, where the probability of birth is proportional to the difference between a fixed number and the number of individuals present. Later, Cahoy and Polito (2012) studied a fractional generalization of the binomial process and called it the fractional binomial process (FBP). In this paper, we study second-order properties of the FBP and the long-range behavior of the FBP and its noise process. We also estimate the parameters of the FBP using the method of moments. Finally, we present simulated sample paths of the FBP and the algorithm used to generate them.
- [3] arXiv:2405.08524 [pdf, ps, other]
Title: The Asymptotic Properties of the Extreme Eigenvectors of High-dimensional Generalized Spiked Covariance Model
Subjects: Statistics Theory (math.ST)
In this paper, we investigate the asymptotic behaviors of the extreme eigenvectors in a general spiked covariance matrix, where the dimension and sample size increase proportionally. We eliminate the restrictive assumption of the block diagonal structure in the population covariance matrix. Moreover, there is no requirement for the spiked eigenvalues and the 4th moment to be bounded. Specifically, we apply random matrix theory to derive the convergence and limiting distributions of certain projections of the extreme eigenvectors in a large sample covariance matrix within a generalized spiked population model. Furthermore, our techniques are robust and effective, even when spiked eigenvalues differ significantly in magnitude from nonspiked ones. Finally, we propose a powerful statistic for hypothesis testing for the eigenspaces of covariance matrices.
- [4] arXiv:2405.08640 [pdf, ps, other]
Title: A sparsity test for multivariate Hawkes processes
Subjects: Statistics Theory (math.ST)
Multivariate Hawkes processes (MHP) are a class of point processes in which events at different coordinates interact through mutual excitation. The weighted adjacency matrix of the MHP encodes the strength of the relations, and shares its support with the causal graph of interactions of the process. We consider the problem of testing for causal relationships across the dimensions of a marked MHP. The null hypothesis is that a joint group of adjacency coefficients are null, corresponding to the absence of interactions. The alternative is that they are positive, and the associated interactions do exist. To this end, we introduce a novel estimation procedure in the context of a large sample of independent event sequences. We construct the associated likelihood ratio test and derive the asymptotic distribution of the test statistic as a mixture of chi squared laws. We offer two applications on financial datasets to illustrate the performance of our method. In the first one, our test reveals a deviation from a static equilibrium in bidders' strategies on retail online auctions. In the second one, we uncover some factors at play in the dynamics of German intraday power prices.
- [5] arXiv:2405.08747 [pdf, ps, other]
Title: Minimax optimal seriation in polynomial time
Subjects: Statistics Theory (math.ST)
We consider the statistical seriation problem, where the statistician seeks to recover a hidden ordering from a noisy observation of a permuted Robinson matrix. In this paper, we tightly characterize the minimax rate for this problem of matrix reordering when the Robinson matrix is bi-Lipschitz, and we also provide a polynomial time algorithm achieving this rate, thereby answering two open questions of [Giraud et al., 2021]. Our analysis further extends to broader classes of similarity matrices.
- [6] arXiv:2405.08806 [pdf, ps, html, other]
Title: Bounds on the Distribution of a Sum of Two Random Variables: Revisiting a problem of Kolmogorov with application to Individual Treatment Effects
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM); Probability (math.PR)
We revisit the following problem, proposed by Kolmogorov: given prescribed marginal distributions $F$ and $G$ for random variables $X,Y$ respectively, characterize the set of compatible distribution functions for the sum $Z=X+Y$. Bounds on the distribution function for $Z$ were given by Makarov (1982) and by Frank et al. (1987), the latter using copula theory. However, though they obtain the same bounds, they make different assertions concerning their sharpness. In addition, their solutions leave some open problems in the case when the given marginal distribution functions are discontinuous. These issues have led to some confusion and erroneous statements in subsequent literature, which we correct.
Kolmogorov's problem is closely related to inferring possible distributions for individual treatment effects $Y_1 - Y_0$ given the marginal distributions of $Y_1$ and $Y_0$; the latter being identified from a randomized experiment. We use our new insights to sharpen and correct results due to Fan and Park (2010) concerning individual treatment effects, and to fill some other logical gaps.
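For orientation, the pointwise bounds usually associated with this problem can be written as below. This is a sketch in the abstract's notation ($F$, $G$, $Z=X+Y$), not a statement taken from the paper. Both inequalities follow from elementary set inclusions and hold for any joint law with the given marginals. Whether they are attained, particularly for discontinuous $F$ and $G$ and for $P(Z<z)$ versus $P(Z\le z)$, is the sharpness question the abstract refers to and is not settled by this display.

```latex
\[
  \sup_{x+y=z} \max\bigl\{F(x)+G(y)-1,\;0\bigr\}
  \;\le\; P(X+Y\le z) \;\le\;
  \inf_{x+y=z} \min\bigl\{F(x)+G(y),\;1\bigr\}.
\]
% Lower bound: \{X\le x\}\cap\{Y\le y\}\subseteq\{X+Y\le x+y\}.
% Upper bound: \{X+Y\le x+y\}\subseteq\{X\le x\}\cup\{Y\le y\}.
```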
New submissions for Wednesday, 15 May 2024 (showing 6 of 6 entries)
- [7] arXiv:2405.08421 (cross-list from cond-mat.dis-nn) [pdf, ps, other]
Title: Faster algorithms for the alignment of sparse correlated Erdős-Rényi random graphs
Comments: 31 pages
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Data Structures and Algorithms (cs.DS); Probability (math.PR); Statistics Theory (math.ST)
The correlated Erdős-Rényi random graph ensemble is a probability law on pairs of graphs with $n$ vertices, parametrized by their average degree $\lambda$ and their correlation coefficient $s$. It can be used as a benchmark for the graph alignment problem, in which the labels of the vertices of one of the graphs are reshuffled by an unknown permutation; the goal is to infer this permutation and thus properly match the pairs of vertices in both graphs. A series of recent works has unveiled the role of Otter's constant $\alpha$ (that controls the exponential rate of growth of the number of unlabeled rooted trees as a function of their sizes) in this problem: for $s>\sqrt{\alpha}$ and $\lambda$ large enough it is possible to recover in a time polynomial in $n$ a positive fraction of the hidden permutation. The exponent of this polynomial growth is however quite large and depends on the other parameters, which limits the range of applications of the algorithm. In this work we present a family of faster algorithms for this task, show through numerical simulations that their accuracy is only slightly reduced with respect to the original one, and conjecture that they undergo, in the large $\lambda$ limit, phase transitions at modified Otter's thresholds $\sqrt{\widehat{\alpha}}>\sqrt{\alpha}$, with $\widehat{\alpha}$ related to the enumeration of a restricted family of trees.
- [8] arXiv:2405.08787 (cross-list from cs.DS) [pdf, ps, other]
Title: Explicit Orthogonal Arrays and Universal Hashing with Arbitrary Parameters
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO); Statistics Theory (math.ST)
Orthogonal arrays are a type of combinatorial design that were developed in the 1940s in the design of statistical experiments. In 1947, Rao proved a lower bound on the size of any orthogonal array, and raised the problem of constructing arrays of minimum size. Kuperberg, Lovett and Peled (2017) gave a non-constructive existence proof of orthogonal arrays whose size is near-optimal (i.e., within a polynomial of Rao's lower bound), leaving open the question of an algorithmic construction. We give the first explicit, deterministic, algorithmic construction of orthogonal arrays achieving near-optimal size for all parameters. Our construction uses algebraic geometry codes.
In pseudorandomness, the notions of $t$-independent generators or $t$-independent hash functions are equivalent to orthogonal arrays. Classical constructions of $t$-independent hash functions are known when the size of the codomain is a prime power, but very few constructions are known for an arbitrary codomain. Our construction yields algorithmically efficient $t$-independent hash functions for arbitrary domain and codomain.
Cross submissions for Wednesday, 15 May 2024 (showing 2 of 2 entries)
- [9] arXiv:2310.03722 (replaced) [pdf, ps, html, other]
Title: Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance
Comments: Substantive revision in v3 (Apr 23 2024)
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious polynomial dependence on the error probability $\alpha$ that we prove to be not only unavoidable, but (for universal inference) even better than the classical fixed-sample t-test. Numerical experiments are provided along the way to compare and contrast the various approaches, including some recent suboptimal ones.
- [10] arXiv:2403.19396 (replaced) [pdf, ps, other]
Title: Persistence Diagram Estimation of Multivariate Piecewise Hölder-continuous Signals
Comments: 33 pages
Subjects: Statistics Theory (math.ST); Algebraic Topology (math.AT)
To our knowledge, the analysis of convergence rates for persistence diagram estimation from noisy signals has predominantly relied on lifting signal estimation results through sup-norm (or other functional norm) stability theorems. We believe that moving beyond this approach can lead to considerable gains. We illustrate this in the setting of the Gaussian white noise model. We examine, from a minimax perspective, the inference of persistence diagrams (for the sublevel sets filtration). We show that for piecewise Hölder-continuous functions, with control over the reach of the discontinuity set, the persistence diagram of a simple histogram estimator of the signal achieves the minimax rates known for Hölder-continuous functions.
- [11] arXiv:2405.07910 (replaced) [pdf, ps, other]
Title: A Unification of Exchangeability and Continuous Exposure and Confounder Measurement Errors: Probabilistic Exchangeability
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Exchangeability concerning a continuous exposure, X, implies no confounding bias when identifying average exposure effects of X, AEE(X). When X is measured with error (Xep), two challenges arise in identifying AEE(X). Firstly, exchangeability regarding Xep does not equal exchangeability regarding X. Secondly, the necessity of the non-differential error assumption (NDEA), overly stringent in practice, remains uncertain. To address them, this article proposes unifying exchangeability and exposure and confounder measurement errors with three novel concepts. The first, Probabilistic Exchangeability (PE), states that the outcomes of those with Xep=e are probabilistically exchangeable with the outcomes of those truly exposed to X=eT. The relationship between AEE(Xep) and AEE(X) in risk difference and ratio scales is mathematically expressed as a probabilistic certainty, termed exchangeability probability (Pe). Squared Pe (Pe.sq) quantifies the extent to which AEE(Xep) differs from AEE(X) due to exposure measurement error through mechanisms not akin to confounding mechanisms. The coefficient of determination (R.sq) in the regression of X against Xep may sometimes be sufficient to measure Pe.sq. The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error through mechanisms akin to confounding mechanisms. PE can hold when EPC is controlled for, which is weaker than NDEA. The third, Emergent Confounding, describes when bias due to confounder measurement error arises. Adjustment for E(P)C can be performed like confounding adjustment to ensure PE. This paper provides formal justifications for using AEE(Xep) and maximum insight into potential divergence of AEE(Xep) from AEE(X) and how to measure it.
- [12] arXiv:2104.13753 (replaced) [pdf, ps, other]
Title: Sum-of-norms clustering does not separate nearby balls
Comments: 40 pages, 17 figures, published version
Journal-ref: Journal of Machine Learning Research, volume 25 (2024), no. 143, pp. 1-40
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Sum-of-norms clustering is a popular convexification of $K$-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this happens even when the distance between the centers of the two balls is taken to be as large as $2\sqrt{2}$. In order to show this, we introduce and analyze a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure. In particular, we state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
- [13] arXiv:2107.10955 (replaced) [pdf, ps, other]
Title: Learning Linear Polytree Structural Equation Models
Comments: 35 pages, 5 figures, 4 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. We also consider an extension of group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
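As a rough illustration of the Chow-Liu step named in this abstract, the following minimal sketch recovers a tree skeleton from Gaussian data. It is not the authors' implementation. It relies on the standard fact that, for jointly Gaussian pairs, mutual information is increasing in the absolute correlation, so a maximum-weight spanning tree on absolute sample correlations coincides with the Chow-Liu tree. The function name `chow_liu_skeleton`, the chain model, and its coefficients are hypothetical choices made for the example, and orienting edges into a CPDAG is not shown.

```python
import numpy as np

def chow_liu_skeleton(data):
    """Edges of a maximum-weight spanning tree on |sample correlation| (Prim's algorithm)."""
    weights = np.abs(np.corrcoef(data, rowvar=False))
    p = weights.shape[0]
    in_tree = [0]          # start growing the tree from node 0
    edges = set()
    while len(in_tree) < p:
        best = None
        # Pick the heaviest edge joining the current tree to a new node.
        for i in in_tree:
            for j in range(p):
                if j in in_tree:
                    continue
                if best is None or weights[i, j] > best[0]:
                    best = (weights[i, j], i, j)
        _, i, j = best
        edges.add((min(i, j), max(i, j)))
        in_tree.append(j)
    return edges

# Hypothetical linear SEM on a chain X0 -> X1 -> X2 -> X3 (a simple polytree).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = -0.7 * x1 + rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])

skeleton = chow_liu_skeleton(data)
print(sorted(skeleton))
```

With this sample size the adjacent-pair correlations are well separated from the non-adjacent ones, so the recovered edges should match the chain. The sample-size conditions under which this holds in general are what the abstract describes.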
- [14] arXiv:2405.07979 (replaced) [pdf, ps, other]
Title: Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.