Quantitative Methods

Cross-lists
Replacements

See recent articles

Total of 6 entries

Showing up to 1000 entries per page: fewer | more | all

[1] arXiv:2406.06767 (cross-list from stat.ME) [pdf, other]: Title: ULV: A robust statistical method for clustered data, with applications to multisubject, single-cell omics data

Mingyu Du, Kevin Johnston, Veronica Berrocal, Wei Li, Xiangmin Xu, Zhaoxia Yu

Subjects: Methodology (stat.ME); Quantitative Methods (q-bio.QM); Computation (stat.CO)

Molecular and genomic technological advancements have greatly enhanced our understanding of biological processes by allowing us to quantify key biological variables such as gene expression, protein levels, and microbiome compositions. These breakthroughs have enabled us to achieve increasingly higher levels of resolution in our measurements, exemplified by our ability to comprehensively profile biological information at the single-cell level. However, the analysis of such data faces several critical challenges: limited number of individuals, non-normality, potential dropouts, outliers, and repeated measurements from the same individual. In this article, we propose a novel method, which we call U-statistic based latent variable (ULV). Our proposed method takes advantage of the robustness of rank-based statistics and exploits the statistical efficiency of parametric methods for small sample sizes. It is a computationally feasible framework that addresses all the issues mentioned above simultaneously. An additional advantage of ULV is its flexibility in modeling various types of single-cell data, including both RNA and protein abundance. The usefulness of our method is demonstrated in two studies: a single-cell proteomics study of acute myelogenous leukemia (AML) and a single-cell RNA study of COVID-19 symptoms. In the AML study, ULV successfully identified differentially expressed proteins that would have been missed by the pseudobulk version of the Wilcoxon rank-sum test. In the COVID-19 study, ULV identified genes associated with covariates such as age and gender, and genes that would be missed without adjusting for covariates. The differentially expressed genes identified by our method are less biased toward genes with high expression levels. Furthermore, ULV identified additional gene pathways likely contributing to the mechanisms of COVID-19 severity.
[2] arXiv:2406.06768 (cross-list from stat.ME) [pdf, other]: Title: Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs

Ruoxuan Xiong, Alex Chin, Sean J. Taylor

Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Quantitative Methods (q-bio.QM)

We study the design and analysis of switchback experiments conducted on a single aggregate unit. The design problem is to partition the continuous time space into intervals and switch treatments between intervals, in order to minimize the estimation error of the treatment effect. We show that the estimation error depends on four factors: carryover effects, periodicity, serially correlated outcomes, and impacts from simultaneous experiments. We derive a rigorous bias-variance decomposition and show the tradeoffs of the estimation error from these factors. The decomposition provides three new insights in choosing a design: First, balancing the periodicity between treated and control intervals reduces the variance; second, switching less frequently reduces the bias from carryover effects while increasing the variance from correlated outcomes, and vice versa; third, randomizing interval start and end points reduces both bias and variance from simultaneous experiments. Combining these insights, we propose a new empirical Bayes design approach. This approach uses prior data and experiments for designing future experiments. We illustrate this approach using real data from a ride-sharing platform, yielding a design that reduces MSE by 33% compared to the status quo design used on the platform.
[3] arXiv:2406.06841 (cross-list from cs.LG) [pdf, html, other]: Title: Compass: A Comprehensive Tool for Accurate and Efficient Molecular Docking in Inference and Fine-Tuning

Ahmet Sarigun, Vedran Franke, Altuna Akalin

Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

While there has been discussion about noise levels in molecular docking datasets such as PDBBind, a thorough analysis of their physical/chemical and bioactivity noise characteristics is still lacking. PoseCheck addresses this issue by examining molecular strain energy, molecular-protein clashes, and interactions, but it is primarily created for $de$ $novo$ drug design. Another important metric in molecular docking, Binding Affinity Energy, is better assessed by the new empirical score function, AA-Score, which has demonstrated improved performance over existing methods. To tackle these challenges, we propose the COMPASS method, which integrates the PoseCheck and AA-Score modules. This approach evaluates dataset noise levels and the physical/chemical and bioactivity feasibility of docked molecules. Our analysis of the PDBBind dataset using COMPASS reveals significant noise in the ground truth data. Additionally, we incorporate COMPASS with the state-of-the-art molecular docking method, DiffDock, in inference mode to achieve efficient and accurate assessments of docked ligands. Finally, we propose a new paradigm to enhance model performance for molecular docking through fine-tuning and discuss the potential benefits of this approach. The source code is available publicly at this https URL.
[4] arXiv:2406.07025 (cross-list from cs.LG) [pdf, other]: Title: Entropy-Reinforced Planning with Large Language Models for Drug Discovery

Xuefeng Liu, Chih-chan Tien, Peng Ding, Songhao Jiang, Rick L. Stevens

Comments: Published in ICML2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMS) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLMs prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to achieve improvements in multiple properties compared to direct sampling from the Transformer. We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, in both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent, and baselines by 5-10 percent, respectively. Moreover, such improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code generation benchmarks and outperformed the current state-of-the-art approach as well. Our code is publicly available at: this https URL.
[5] arXiv:2406.07263 (cross-list from cs.LG) [pdf, html, other]: Title: Active learning for affinity prediction of antibodies

Alexandra Gessner, Sebastian W. Ober, Owen Vickery, Dino Oglić, Talip Uçar

Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

The primary objective of most lead optimization campaigns is to enhance the binding affinity of ligands. For large molecules such as antibodies, identifying mutations that enhance antibody affinity is particularly challenging due to the combinatorial explosion of potential mutations. When the structure of the antibody-antigen complex is available, relative binding free energy (RBFE) methods can offer valuable insights into how different mutations will impact the potency and selectivity of a drug candidate, thereby reducing the reliance on costly and time-consuming wet-lab experiments. However, accurately simulating the physics of large molecules is computationally intensive. We present an active learning framework that iteratively proposes promising sequences for simulators to evaluate, thereby accelerating the search for improved binders. We explore different modeling approaches to identify the most effective surrogate model for this task, and evaluate our framework both using pre-computed pools of data and in a realistic full-loop setting.

[6] arXiv:2402.16747 (replaced) [pdf, other]: Title: Recent progress in the physical principles of dynamic ground self-righting

Chen Li

Comments: In review at Integrative & Comparative Biology

Subjects: Biological Physics (physics.bio-ph); Systems and Control (eess.SY); Quantitative Methods (q-bio.QM)

Animals and robots must self-right on the ground after overturning. Biology research described various strategies and motor patterns in many species. Robotics research devised many strategies. However, we do not well understand how the physical principles of how the need to generate mechanical energy to overcome the potential energy barrier governs behavioral strategies and 3-D body rotations given the morphology. Here I review progress on this which I led studying cockroaches self-righting on level, flat, solid, low-friction ground, by integrating biology experiments, robotic modeling, and physics modeling.

Total of 6 entries

Showing up to 1000 entries per page: fewer | more | all

Quantitative Methods

Cross submissions for Wednesday, 12 June 2024 (showing 5 of 5 entries )

Replacement submissions for Wednesday, 12 June 2024 (showing 1 of 1 entries )