Applications
- [1] arXiv:2405.09003 (cross-list from stat.ME)
Title: Nonparametric Inference on Dose-Response Curves Without the Positivity Condition
Comments: 74 pages (23 pages for the main paper), 4 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
Existing statistical methods in causal inference often rely on the assumption that every individual has some chance of receiving any treatment level regardless of their associated covariates, which is known as the positivity condition. This assumption can be violated in observational studies with continuous treatments. In this paper, we present a novel integral estimator of the causal effects of continuous treatments (i.e., dose-response curves) that does not require the positivity condition. Our approach involves estimating the derivative function of the treatment effect at each observed data sample and integrating it to the treatment level of interest so as to address the bias resulting from the lack of the positivity condition. The validity of our approach relies on an alternative weaker assumption that can be satisfied by additive confounding models. We provide a fast and reliable numerical recipe for computing our estimator in practice and derive its related asymptotic theory. To conduct valid inference on the dose-response curve and its derivative, we propose using the nonparametric bootstrap and establish its consistency. The practical performance of our proposed estimators is validated through simulation studies and an analysis of the effect of air pollution exposure (PM$_{2.5}$) on cardiovascular mortality rates.
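The integral construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's estimator is nonparametric, whereas this sketch swaps in a least-squares polynomial fit of an assumed additive model Y = p(T) + c*S + noise. The function name `integral_estimator`, the simulated data, and the polynomial degree are all hypothetical choices made for illustration. The one idea taken from the abstract is the last step: for each sample, the estimated derivative is integrated from the observed treatment T_i to the target level t, added to Y_i, and the results are averaged.

```python
import numpy as np

def integral_estimator(t_grid, T, S, Y, degree=2):
    """Sketch of an integral estimator of a dose-response curve m(t).

    Assumption (not from the paper): Y = p(T) + c*S + noise, with p a
    polynomial, fitted by ordinary least squares.
    """
    # Design matrix: 1, T, ..., T^degree, and the confounder S.
    X = np.column_stack([T**k for k in range(degree + 1)] + [S])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Antiderivative of the fitted derivative d/dt p(t), so that the
    # integral from T_i to t equals antiderivative(t) - antiderivative(T_i).
    def antiderivative(t):
        return sum(coef[k] * t**k for k in range(1, degree + 1))

    # m_hat(t) = mean over i of [ Y_i + integral from T_i to t of the derivative ].
    return np.array(
        [np.mean(Y + antiderivative(t) - antiderivative(T)) for t in t_grid]
    )

# Simulated data in which positivity fails: T always lies within 0.3 of S.
rng = np.random.default_rng(0)
n = 2000
S = rng.uniform(-1.0, 1.0, n)
T = S + rng.uniform(-0.3, 0.3, n)
# Additive confounding; since E[S] = 0 the true curve is m(t) = t^2 + t.
Y = T**2 + T + 2.0 * S + rng.normal(0.0, 0.1, n)

t_grid = np.array([-0.5, 0.0, 0.5])
m_hat = integral_estimator(t_grid, T, S, Y)
print(np.column_stack([t_grid, m_hat, t_grid**2 + t_grid]))
```

The printed columns are the treatment level, the estimate, and the true curve value of the simulation. Because only the derivative in the treatment direction is extrapolated, the sketch never needs an outcome-regression estimate at (t, S_i) pairs that lie outside the support of the data.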
Cross submissions for Thursday, 16 May 2024 (showing 1 of 1 entries)
- [2] arXiv:2307.09546 (replaced)
Title: Spatio-temporal quasi-experimental methods for rare disease outcomes: The impact of reformulated gasoline on childhood hematologic cancer
Subjects: Applications (stat.AP)
Although some pollutants emitted in vehicle exhaust, such as benzene, are known to cause leukemia in adults with high exposure levels, less is known about the relationship between traffic-related air pollution (TRAP) and childhood hematologic cancer. In the 1990s, the US EPA enacted the reformulated gasoline program in select areas of the US, which drastically reduced ambient TRAP in affected areas. This created an ideal quasi-experiment to study the effects of TRAP on childhood hematologic cancers. However, existing methods for quasi-experimental analyses can perform poorly when outcomes are rare and unstable, as with childhood cancer incidence. We develop Bayesian spatio-temporal matrix completion methods to conduct causal inference in quasi-experimental settings with rare outcomes. Selective information sharing across space and time enables stable estimation, and the Bayesian approach facilitates uncertainty quantification. We evaluate the methods through simulations and apply them to estimate the causal effects of TRAP on childhood leukemia and lymphoma.
- [3] arXiv:2209.03935 (replaced)
Title: Generative Adversarial Networks Applied to Synthetic Financial Scenarios Generation
Journal-ref: Physica A: Statistical Mechanics and its Applications, 2023, 623, pp.128899
Subjects: Computational Finance (q-fin.CP); Applications (stat.AP)
The finance industry is producing an increasing number of datasets that investment professionals can consider to be influential on the price of financial assets. These datasets were initially mainly limited to exchange data, namely price, capitalization and volume. Their coverage has now considerably expanded to include, for example, macroeconomic data, supply and demand of commodities, balance sheet data and, more recently, extra-financial data such as ESG scores. This broadening of the factors retained as influential constitutes a serious challenge for statistical modeling. Indeed, the instability of the correlations between these factors makes it practically impossible to identify the joint laws needed to construct scenarios. Fortunately, spectacular advances in the field of deep learning in recent years have given rise to GANs. GANs are a type of generative machine learning model that produces new data samples with the same characteristics as a training data distribution in an unsupervised way, avoiding data assumptions and human-induced biases. In this work, we explore the use of GANs for synthetic financial scenario generation. This pilot study is the result of a collaboration between Fujitsu and Advestis, and it will be followed by a thorough exploration of the use cases that can benefit from the proposed solution. We propose a GAN-based algorithm that allows the replication of multivariate data representing several properties (including, but not limited to, price, market capitalization, ESG score, controversy score, ...) of a set of stocks. This approach differs from examples in the financial literature, which are mainly focused on the reproduction of temporal asset price scenarios. We also propose several metrics to evaluate the quality of the data generated by the GANs.
This approach is well suited to the generation of scenarios, with the time direction simply arising as a subsequent (possibly conditioned) generation of data points drawn from the learned distribution. Our method will make it possible to simulate high-dimensional scenarios (compared to the $\lesssim10$ features currently employed in most recent use cases), with network complexity reduced through careful feature engineering and selection. Complete results will be presented in a forthcoming study.
- [4] arXiv:2308.01156 (replaced)
Title: A new adaptive local polynomial density estimation procedure on complicated domains
Comments: 43 pages, 4 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR); Applications (stat.AP); Methodology (stat.ME)
This paper presents a novel approach for pointwise estimation of multivariate density functions on known domains of arbitrary dimensions using nonparametric local polynomial estimators. Our method is highly flexible, as it applies to both simple domains, such as open connected sets, and more complicated domains that are not star-shaped around the point of estimation. This enables us to handle domains with sharp concavities, holes, and local pinches, such as polynomial sectors. Additionally, we introduce a data-driven selection rule based on the general ideas of Goldenshluger and Lepski. Our results demonstrate that the local polynomial estimators are minimax under an $L^2$ risk across a wide range of Hölder-type functional classes. In the adaptive case, we provide oracle inequalities and explicitly determine the convergence rate of our statistical procedure. Simulations on polynomial sectors show that our oracle estimates outperform those of the most popular alternative method, found in the sparr package for the R software. Our statistical procedure is implemented in a readily accessible online R package.
- [5] arXiv:2403.08493 (replaced)
Title: Rumor Forwarding Prediction Model Based on Uncertain Time Series
Comments: 11 pages, 3 figures
Subjects: Social and Information Networks (cs.SI); Applications (stat.AP)
The rapid spread of rumors on social media is mainly caused by individual retweets. This paper applies uncertain time series analysis (UTSA) to analyze rumor retweeting behavior on Weibo. First, rumor forwarding is modeled using uncertain time series, including order selection, parameter estimation, residual analysis, uncertainty hypothesis testing and forecasting. The validity of using uncertain time series analysis is further supported by analyzing the characteristics of the residual plot. The experimental results show that the uncertain time series can better predict the next stage of rumor forwarding. The results of the study have important practical significance for rumor management and the management of social media information dissemination.