Electrical Engineering and Systems Science

New submissions

Submissions received from Thu 25 Apr 24 to Fri 26 Apr 24, announced Mon, 29 Apr 24



New submissions for Mon, 29 Apr 24

[1] arXiv:2404.16883 [pdf, other]: Title: Myopically Verifiable Probabilistic Certificates for Safe Control and Learning

Authors: Zhuoyuan Wang, Haoming Jing, Christian Kurniawan, Albert Chern, Yorie Nakahira

Comments: arXiv admin note: substantial text overlap with arXiv:2110.13380

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed `probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
[2] arXiv:2404.16900 [pdf, other]: Title: Space-Variant Total Variation boosted by learning techniques in few-view tomographic imaging

Authors: Elena Morotti, Davide Evangelista, Andrea Sebastiani, Elena Loli Piccolomini

Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

This paper focuses on the development of a space-variant regularization model for solving an under-determined linear inverse problem. The case study is a medical image reconstruction from few-view tomographic noisy data. The primary objective of the proposed optimization model is to achieve a good balance between denoising and the preservation of fine details and edges, overcoming the performance of the popular and largely used Total Variation (TV) regularization through the application of appropriate pixel-dependent weights. The proposed strategy leverages the role of gradient approximations for the computation of the space-variant TV weights. For this reason, a convolutional neural network is designed, to approximate both the ground truth image and its gradient using an elastic loss function in its training. Additionally, the paper provides a theoretical analysis of the proposed model, showing the uniqueness of its solution, and illustrates a Chambolle-Pock algorithm tailored to address the specific problem at hand. This comprehensive framework integrates innovative regularization techniques with advanced neural network capabilities, demonstrating promising results in achieving high-quality reconstructions from low-sampled tomographic data.
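
The pixel-weighted TV term described in the abstract above can be illustrated with a minimal sketch. It assumes an isotropic TV with forward differences and takes the weight map as a given input; the function names are hypothetical, and the way the paper derives its weights from the network's gradient estimate is not reproduced here.

```python
import numpy as np


def forward_gradient(u):
    """Forward differences; the last column/row gets a zero difference."""
    gx = np.zeros_like(u, dtype=float)
    gy = np.zeros_like(u, dtype=float)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]   # horizontal differences
    gy[:-1, :] = u[1:, :] - u[:-1, :]   # vertical differences
    return gx, gy


def space_variant_tv(u, weights):
    """Sum over pixels of weights[i, j] * |grad u|[i, j] (isotropic TV).

    `weights` is an assumed, user-supplied map; uniform weights give
    the standard (space-invariant) TV value.
    """
    gx, gy = forward_gradient(np.asarray(u, dtype=float))
    return float(np.sum(weights * np.sqrt(gx**2 + gy**2)))


# Image with one vertical edge between columns 1 and 2.
image = np.zeros((4, 4))
image[:, 2:] = 1.0

uniform = np.ones_like(image)
relaxed = uniform.copy()
relaxed[:, 1] = 0.5  # penalise the edge pixels less, so the edge is kept

print(space_variant_tv(image, uniform), space_variant_tv(image, relaxed))
```

Lowering the weight where an edge is expected reduces the penalty there, which is the mechanism that lets a space-variant model denoise flat regions while keeping edges.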
[3] arXiv:2404.16953 [pdf, other]: Title: An unsupervised learning-based shear wave tracking method for ultrasound elastography

Authors: Remi Delaunay, Yipeng Hu, Tom Vercauteren

Comments: Accepted to SPIE Medical Imaging 2022

Subjects: Image and Video Processing (eess.IV)

Shear wave elastography involves applying a non-invasive acoustic radiation force to the tissue and imaging the induced deformation to infer its mechanical properties. This work investigates the use of convolutional neural networks to improve displacement estimation accuracy in shear wave imaging. Our training approach is completely unsupervised, which allows the network to learn to estimate the induced micro-scale deformations without ground truth labels. We also present an ultrasound simulation dataset in which the shear wave propagation has been simulated via the finite element method. Our dataset is made publicly available along with this paper and consists of 150 shear wave propagation simulations in both homogeneous and heterogeneous media, representing a total of 20,000 ultrasound images. We assessed the ability of our learning-based approach to characterise tissue elastic properties (i.e., Young's modulus) on our dataset and compared our results with a classical normalised cross-correlation approach.
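
For readers unfamiliar with the classical baseline named in the abstract above, here is a minimal one-dimensional sketch of displacement estimation by normalised cross-correlation. It assumes integer-sample shifts and a single reference window; the function names and parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def ncc(a, b):
    """Normalised cross-correlation of two equal-length windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a**2) * np.sum(b**2))
    return 0.0 if denom == 0 else float(np.sum(a * b) / denom)


def estimate_shift(reference, moved, start, length, max_lag):
    """Return the integer lag (and its score) that best aligns a window.

    The window reference[start:start+length] is compared against
    moved[start+lag : start+lag+length] for every lag in
    [-max_lag, max_lag] that stays inside the signal.
    """
    window = reference[start:start + length]
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        lo = start + lag
        if lo < 0 or lo + length > len(moved):
            continue  # candidate window would fall outside the signal
        score = ncc(window, moved[lo:lo + length])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic check: a random signal displaced by three samples.
rng = np.random.default_rng(0)
reference = rng.standard_normal(200)
moved = np.roll(reference, 3)
lag, score = estimate_shift(reference, moved, start=80, length=32, max_lag=8)
```

Practical trackers add sub-sample interpolation of the correlation peak, which this sketch omits.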
[4] arXiv:2404.16974 [pdf, ps, other]: Title: DRL2FC: An Attack-Resilient Controller for Automatic Generation Control Based on Deep Reinforcement Learning

Authors: Vasileios Dimitropoulos, Andreas D. Syrmakesis, Nikos Hatziargyriou

Comments: 2 pages, 2 figures, submitted to the 14th Mediterranean Conference on Power Generation, Transmission, Distribution and Energy Conversion

Subjects: Systems and Control (eess.SY)

Power grids heavily rely on Automatic Generation Control (AGC) systems to maintain grid stability by balancing generation and demand. However, the increasing digitization and interconnection of power grid infrastructure expose AGC systems to new vulnerabilities, particularly from cyberattacks such as false data injection attacks (FDIAs). These attacks aim at manipulating sensor measurements and control signals by injecting tampered data into the communication media. As such, it is necessary to develop innovative approaches that enhance the resilience of AGC systems. This paper addresses this challenge by exploring the potential of deep reinforcement learning (DRL) to enhance the resilience of AGC systems against FDIAs. To this end, a DRL-based controller is proposed that dynamically adjusts generator setpoints in response to both load fluctuations and potential cyber threats. The controller learns these optimal control policies by interacting with a simulated power system environment that incorporates the AGC dynamics under cyberattacks. Extensive experiments on test power systems subjected to various FDIAs demonstrate the effectiveness of the presented approach in mitigating the impact of cyberattacks.
[5] arXiv:2404.17015 [pdf, other]: Title: Defect Localization Using Region of Interest and Histogram-Based Enhancement Approaches in 3D-Printing

Authors: Md Manjurul Ahsan, Shivakumar Raman, Zahed Siddique

Subjects: Image and Video Processing (eess.IV)

Additive manufacturing (AM), particularly 3D printing, has revolutionized the production of complex structures across various industries. However, ensuring quality and detecting defects in 3D-printed objects remain significant challenges. This study focuses on improving defect detection in 3D-printed cylinders by integrating novel pre-processing techniques such as Region of Interest (ROI) selection, Histogram Equalization (HE), and Details Enhancer (DE) with Convolutional Neural Networks (CNNs), specifically the modified VGG16 model. The approaches, ROIN, ROIHEN, and ROIHEDEN, demonstrated promising results, with the best model achieving an accuracy of 1.00 and an F1-score of 1.00 on the test set. The study also explored the models' interpretability through Local Interpretable Model-Agnostic Explanations and Gradient-weighted Class Activation Mapping, enhancing the understanding of the decision-making process. Furthermore, the modified VGG16 model showed superior computational efficiency with 30713M FLOPs and 15M parameters, the lowest among the compared models. These findings underscore the significance of tailored pre-processing and CNNs in enhancing defect detection in AM, offering a pathway to improve manufacturing precision and efficiency. This research not only contributes to the advancement of 3D printing technology but also highlights the potential of integrating machine learning with AM for superior quality control.
[6] arXiv:2404.17045 [pdf, other]: Title: Toward Automated Formation of Composite Micro-Structures Using Holographic Optical Tweezers

Authors: Tommy Zhang, Nicole Werner, Ashis G. Banerjee

Comments: To appear in the Proceedings of the 2024 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS)

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

Holographic Optical Tweezers (HOT) are powerful tools that can manipulate micro and nano-scale objects with high accuracy and precision. They are most commonly used for biological applications, such as cellular studies, and more recently, micro-structure assemblies. Automation has been of significant interest in the HOT field, since human-run experiments are time-consuming and require skilled operator(s). Automated HOTs, however, commonly use point traps, which focus high intensity laser light at specific spots in fluid media to attract and move micro-objects. In this paper, we develop a novel automated system of tweezing multiple micro-objects more efficiently using multiplexed optical traps. Multiplexed traps enable the simultaneous trapping of multiple beads in various alternate multiplexing formations, such as annular rings and line patterns. Our automated system is realized by augmenting the capabilities of a commercially available HOT with real-time bead detection and tracking, and wavefront-based path planning. We demonstrate the usefulness of the system by assembling two different composite micro-structures, comprising 5 $\mu m$ polystyrene beads, using both annular and line shaped traps in obstacle-rich environments.
[7] arXiv:2404.17064 [pdf, other]: Title: Detection of Peri-Pancreatic Edema using Deep Learning and Radiomics Techniques

Authors: Ziliang Hong, Debesh Jha, Koushik Biswas, Zheyuan Zhang, Yury Velichko, Cemal Yazici, Temel Tirkes, Amir Borhani, Baris Turkbey, Alpay Medetalibeyoglu, Gorkem Durak, Ulas Bagci

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Peri-pancreatic edema is a pivotal indicator of disease progression and prognosis, emphasizing the critical need for accurate detection and assessment in pancreatitis diagnosis and management. This study introduces a novel CT dataset sourced from 255 patients with pancreatic diseases, featuring annotated pancreas segmentation masks and corresponding diagnostic labels for peri-pancreatic edema condition. With the novel dataset, we first evaluate the efficacy of the LinTransUNet model, a linear Transformer based segmentation algorithm, to segment the pancreas accurately from CT imaging data. Then, we use segmented pancreas regions with two distinctive machine learning classifiers to identify the existence of peri-pancreatic edema: deep learning-based models and a radiomics-based eXtreme Gradient Boosting (XGBoost). The LinTransUNet achieved promising results, with a dice coefficient of 80.85% and mIoU of 68.73%. Among the nine benchmarked classification models for peri-pancreatic edema detection, the Swin-Tiny transformer model demonstrated the highest recall of $98.85 \pm 0.42$ and precision of $98.38 \pm 0.17$. Comparatively, the radiomics-based XGBoost model achieved an accuracy of $79.61 \pm 4.04$ and recall of $91.05 \pm 3.28$, showcasing its potential as a supplementary diagnostic tool given its rapid processing speed and reduced training time. Our code is available at https://github.com/NUBagciLab/Peri-Pancreatic-Edema-Detection
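
The two segmentation metrics quoted in the abstract above can be computed for a single pair of binary masks as in the following sketch. The function name is hypothetical, and treating two empty masks as a perfect match is an assumption of this sketch; the paper's exact averaging protocol is not described in the abstract.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Dice coefficient and intersection-over-union for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    dice = 1.0 if total == 0 else 2.0 * intersection / total
    iou = 1.0 if union == 0 else intersection / union
    return float(dice), float(iou)


# One overlapping pixel out of two predicted and two annotated pixels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask pair the two values are linked by IoU = Dice / (2 - Dice). Scores averaged over many cases, such as the ones reported above, need not satisfy that identity exactly.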
[8] arXiv:2404.17083 [pdf, other]: Title: Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation

Authors: Deepak Bhatia, Muhammad Abdullah, Anne Querfurth, Mahdi Mantash

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

This paper investigates the use of deep learning approaches to estimate the femur caput-collum-diaphyseal (CCD) angle from X-ray images. The CCD angle is an important measurement in the diagnosis of hip problems, and correct prediction can help in the planning of surgical procedures. Manual measurement of this angle, on the other hand, can be time-intensive and vulnerable to inter-observer variability. In this paper, we present a deep-learning algorithm that can reliably estimate the femur CCD angle from X-ray images. To train and test the performance of our model, we employed an X-ray image dataset with associated femur CCD angle measurements. Furthermore, we built a prototype to display the resulting predictions and to allow the user to interact with them. Because the prototype is intended for a sterile setting during surgery, we extended the interface so that it can be operated by voice commands alone.
Our results show that our deep learning model predicts the femur CCD angle on X-ray images with great accuracy, with a mean absolute error of 4.3 degrees on the left femur and 4.9 degrees on the right femur on the test dataset. Our results suggest that deep learning has the potential to give a more efficient and accurate technique for predicting the femur CCD angle, which might have substantial therapeutic implications for the diagnosis and management of hip problems.
[9] arXiv:2404.17089 [pdf, other]: Title: Auto-Calibration and 2D-DOA Estimation in UCAs via an Integrated Wideband Dictionary

Authors: Zavareh Bozorgasl, Hao Chen, Mohammad J. Dehghani

Comments: This is a completed version of a work which will be sent to 2024 Asilomar Conference on Signals, Systems, and Computers

Subjects: Signal Processing (eess.SP)

In this paper, we present a novel auto-calibration scheme for the joint estimation of the two-dimensional (2-D) direction-of-arrival (DOA) and the mutual coupling matrix (MCM) for a signal measured using uniform circular arrays. The method employs an integrated wideband dictionary to mitigate the detrimental effects of the discretization of the continuous parameter space over the considered azimuth and elevation angles. This reduces the computational complexity and yields more accurate DOA estimates. Given the more reliable DOA estimates, the method also allows for the estimation of more accurate mutual coupling coefficients. The method utilizes an integrated dictionary in order to iteratively refine the active parameter space, thereby reducing the required computational complexity without reducing the overall performance. The complexity is further reduced by employing only the dominant subspace of the measured signal. Furthermore, the proposed method does not require a constraint on the prior knowledge of the number of nonzero coupling coefficients, nor does it suffer from ambiguity problems. Moreover, a simple formulation for 2-D non-numerical integration is presented. Simulation results show the effectiveness of the proposed method.
[10] arXiv:2404.17107 [pdf, other]: Title: Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

To reduce the need for skilled clinicians in heart sound interpretation, recent studies on automating cardiac auscultation have explored deep learning approaches. However, despite the demands for large data for deep learning, the size of the heart sound datasets is limited, and no pre-trained model is available. In contrast, many pre-trained models for general audio tasks are available as general-purpose audio representations. This study explores the potential of general-purpose audio representations pre-trained on large-scale datasets for transfer learning in heart murmur detection. Experiments on the CirCor DigiScope heart sound dataset show that the recent self-supervised learning Masked Modeling Duo (M2D) outperforms previous methods, achieving a weighted accuracy of 0.832 and an unweighted average recall of 0.713. Experiments further confirm improved performance by ensembling M2D with other models. These results demonstrate the effectiveness of general-purpose audio representation in processing heart sounds and open the way for further applications. Our code, which runs on a 24 GB consumer GPU, is available online at https://github.com/nttcslab/m2d/tree/master/app/circor
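
Of the two metrics reported in the abstract above, unweighted average recall (UAR) is the mean of the per-class recalls, so every class counts equally regardless of how many recordings it contains. The sketch below uses a hypothetical function name and made-up integer labels; the class weights behind the weighted accuracy are not given in the abstract and are therefore not reproduced.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for label in np.unique(y_true):
        mask = y_true == label
        # Recall for this class: fraction of its samples predicted correctly.
        recalls.append(np.mean(y_pred[mask] == label))
    return float(np.mean(recalls))


# Imbalanced toy labels (illustrative only): three of class 0, one of class 1.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
uar = unweighted_average_recall(y_true, y_pred)
plain_accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

On imbalanced data the two numbers differ, which is why UAR is often reported alongside accuracy-style scores.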
[11] arXiv:2404.17138 [pdf, other]: Title: Sub-6GHz Assisted mmWave Hybrid Beamforming with Heterogeneous Graph Neural Network

Authors: Zhaohui Huang, Zhaocheng Wang, Sheng Chen

Comments: This paper has been submitted to IEEE Transactions on Communications (IEEE TCOM)

Subjects: Signal Processing (eess.SP)

In next-generation communications, sub-6GHz and millimeter-wave (mmWave) links typically coexist, with the sub-6GHz link always active and the mmWave link active when high-rate transmission is required. Due to the spatial similarities between sub-6GHz and mmWave channels, sub-6GHz channel information can be utilized to support hybrid beamforming in mmWave communications to reduce overhead costs. We consider a multi-cell heterogeneous communication network where both sub-6GHz and mmWave communications co-exist. Multiple mmWave base stations (BSs) in the heterogeneous network simultaneously transmit signals to multiple users in their own mmWave cells while interfering with each other. The challenging problem is to design hybrid beamformers in the mmWave band that can maximize the system spectral efficiency. To address this highly complex programming problem using sub-6GHz information, a novel heterogeneous graph neural network (HGNN) architecture is proposed to learn the intrinsic relationship between sub-6GHz and mmWave and design the hybrid beamformers for mmWave BSs. The proposed HGNN consists of two different node types, namely, BS nodes and user equipment (UE) nodes, and two different edge types, namely, desired link edge and interfering link edge. In addition, the attention mechanism and the residual structure are utilized in the HGNN architecture to improve performance. Simulation results show that the proposed HGNN achieves better performance with sub-6GHz information than traditional learning methods. The results also demonstrate that the attention mechanism and residual structure improve the performance of the HGNN compared to its unmodified counterparts.
[12] arXiv:2404.17235 [pdf, other]: Title: Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment

Authors: Kazi Shahriar Sanjid, Md. Tanzim Hossain, Md. Shakib Shahariar Junayed, M. Monir Uddin

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning has revolutionized medical imaging by providing innovative solutions to complex healthcare challenges. Traditional models often struggle to dynamically adjust feature importance, resulting in suboptimal representation, particularly in tasks like semantic segmentation crucial for accurate structure delineation. Moreover, their static nature incurs high computational costs. To tackle these issues, we introduce Mamba-Ahnet, a novel integration of State Space Model (SSM) and Advanced Hierarchical Network (AHNet) within the MAMBA framework, specifically tailored for semantic segmentation in medical imaging. Mamba-Ahnet combines SSM's feature extraction and comprehension with AHNet's attention mechanisms and image reconstruction, aiming to enhance segmentation accuracy and robustness. By dissecting images into patches and refining feature comprehension through self-attention mechanisms, the approach significantly improves feature resolution. Integration of AHNet into the MAMBA framework further enhances segmentation performance by selectively amplifying informative regions and facilitating the learning of rich hierarchical representations. Evaluation on the Universal Lesion Segmentation dataset demonstrates superior performance compared to state-of-the-art techniques, with notable metrics such as a Dice similarity coefficient of approximately 98% and an Intersection over Union of about 83%. These results underscore the potential of our methodology to enhance diagnostic accuracy, treatment planning, and ultimately, patient outcomes in clinical practice. By addressing the limitations of traditional models and leveraging the power of deep learning, our approach represents a significant step forward in advancing medical imaging technology.
[13] arXiv:2404.17331 [pdf, ps, other]: Title: Finite Sample Analysis for a Class of Subspace Identification Methods

Authors: Jiabao He, Ingvar Ziemann, Cristian R. Rojas, Håkan Hjalmarsson

Subjects: Systems and Control (eess.SY)

While subspace identification methods (SIMs) are appealing due to their simple parameterization for MIMO systems and robust numerical realizations, a comprehensive statistical analysis of SIMs remains an open problem, especially in the non-asymptotic regime. In this work, we provide a finite sample analysis for a class of SIMs, which reveals that the convergence rates for estimating Markov parameters and system matrices are $\mathcal{O}(1/\sqrt{N})$, in line with classical asymptotic results. Based on the observation that the model format in classical SIMs becomes non-causal because of a projection step, we choose a parsimonious SIM that bypasses the projection step and strictly enforces a causal model to facilitate the analysis, where a bank of ARX models is estimated in parallel. Leveraging recent results from finite sample analysis of an individual ARX model, we obtain an overall error bound for an array of ARX models and proceed to derive error bounds for system matrices via robustness results for the singular value decomposition.
[14] arXiv:2404.17357 [pdf, other]: Title: Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model

Authors: Yushen Xu, Xiaosong Li, Yuchan Jie, Haishu Tan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In clinical practice, tri-modal medical image fusion, compared to the existing dual-modal technique, can provide a more comprehensive view of the lesions, aiding physicians in evaluating the disease's shape, location, and biological activity. However, due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited, leading to sub-optimal fusion performance and affecting the depth of image analysis by the physician. Thus, there is an urgent need for a technology that can both enhance image resolution and integrate multi-modal information. Although current image processing methods can effectively address image fusion and super-resolution individually, solving both problems synchronously remains extremely challenging. In this paper, we propose TFS-Diff, a model that simultaneously realizes tri-modal medical image fusion and super-resolution. Specifically, TFS-Diff is based on the diffusion model generation of a random iterative denoising process. We also develop a simple objective function and the proposed fusion super-resolution loss, which effectively evaluate the uncertainty in the fusion and ensure the stability of the optimization process. In addition, a channel attention module is proposed to effectively integrate key information from different modalities for clinical diagnosis, avoiding information loss caused by multiple image processing. Extensive experiments on public Harvard datasets show that TFS-Diff significantly surpasses the existing state-of-the-art methods in both quantitative and visual evaluations. The source code will be available at GitHub.
[15] arXiv:2404.17411 [pdf, ps, other]: Title: Low-Complexity Near-Field Channel Estimation for Hybrid RIS Assisted Systems

Authors: Rafaela Schroeder, Jiguang He, Hamza Djelouat, Markku Juntti

Comments: 5 pages, 5 figures

Subjects: Signal Processing (eess.SP)

We investigate the channel estimation (CE) problem for hybrid RIS assisted systems and focus on the near-field (NF) regime. Different from their far-field counterparts, NF channels possess a block-sparsity property, which is leveraged in the two developed CE algorithms: (i) boundary estimation and sub-vector recovery (BESVR) and (ii) linear total variation regularization (TVR). In addition, we adopt the alternating direction method of multipliers to reduce their computational complexity. Numerical results show that the linear TVR algorithm outperforms the chosen baseline schemes in terms of normalized mean square error in the high signal-to-noise ratio regime while the BESVR algorithm achieves comparable performance to the baseline schemes but with the added advantage of minimal CPU time.
[16] arXiv:2404.17426 [pdf, ps, other]: Title: One-Shot Image Restoration

Authors: Deborah Pereg

Comments: arXiv admin note: text overlap with arXiv:2209.14267

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image restoration, or inverse problems in image processing, has long been an extensively studied topic. In recent years supervised learning approaches have become a popular strategy attempting to tackle this task. Unfortunately, most supervised learning-based methods are highly demanding in terms of computational resources and training data (sample complexity). In addition, trained models are sensitive to domain changes, such as varying acquisition systems, signal sampling rates, resolution and contrast. In this work, we try to answer a fundamental question: Can supervised learning models generalize well solely by learning from one image or even part of an image? If so, then what is the minimal amount of patches required to achieve acceptable generalization? To this end, we focus on an efficient patch-based learning framework that requires a single image input-output pair for training. Experimental results demonstrate the applicability, robustness and computational efficiency of the proposed approach for supervised image deblurring and super-resolution. Our results showcase significant improvement of learning models' sample efficiency, generalization and time complexity, that can hopefully be leveraged for future real-time applications, and applied to other signals and modalities.
[17] arXiv:2404.17474 [pdf, other]: Title: Establishing best practices for modeling long duration energy storage in deeply decarbonized energy systems

Authors: Gabriel Mantegna, Wilson Ricks, Aneesha Manocha, Neha Patankar, Dharik Mallapragada, Jesse Jenkins

Comments: Working paper

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Long duration energy storage (LDES) may become a critical technology for the decarbonization of the power sector, as current commercially available Li-ion battery storage technologies cannot cost-effectively shift energy to address multi-day or seasonal variability in demand and renewable energy availability. LDES is difficult to model in existing energy system planning models (such as electricity system capacity expansion models), as it is much more dependent on an accurate representation of chronology than other resources. Techniques exist for modeling LDES in these planning models; however, it is not known how spatial and temporal resolution affect the performance of these techniques, creating a research gap. In this study we examine what spatial and temporal resolution is necessary to accurately capture the full value of LDES, in the context of a continent-scale capacity expansion model. We use the results to draw conclusions and present best practices for modelers seeking to accurately model LDES in a macro-energy systems planning context. Our key findings are: 1) modeling LDES with linked representative periods is crucial to capturing its full value, 2) LDES value is highly sensitive to the cost and availability of other resources, and 3) temporal resolution is more important than spatial resolution for capturing the full value of LDES, although how much temporal resolution is needed will depend on the specific model context.
[18] arXiv:2404.17479 [pdf, other]: Title: Scalable Adaptive Traffic Light Control Over a Traffic Network Including Turns, Transit Delays, and Blocking

Authors: Yingqing Chen, Christos G. Cassandras

Comments: arXiv admin note: substantial text overlap with arXiv:2305.09024

Subjects: Systems and Control (eess.SY)

We develop adaptive data-driven traffic light controllers for a grid-like traffic network considering straight, left-turn, and right-turn traffic flows. The analysis incorporates transit delays and blocking effects on vehicle movements between neighboring intersections. Using a stochastic hybrid system model with parametric traffic light controllers, we use Infinitesimal Perturbation Analysis (IPA) to derive a data-driven cost gradient estimator with respect to controllable parameters. We then iteratively adjust them through an online gradient-based algorithm to improve performance metrics. By integrating a flexible modeling framework to represent diverse intersection and traffic network configurations with event-driven IPA-based adaptive controllers, we develop a general scalable, adaptive framework for real-time traffic light control in multi-intersection traffic networks.
[19] arXiv:2404.17490 [pdf, other]: Title: The CARFAC v2 Cochlear Model in Matlab, NumPy, and JAX

Authors: Richard F. Lyon, Rob Schonberger, Malcolm Slaney, Mihajlo Velimirović, Honglin Yu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)

The open-source CARFAC (Cascade of Asymmetric Resonators with Fast-Acting Compression) cochlear model is upgraded to version 2, with improvements to the Matlab implementation, and with new Python/NumPy and JAX implementations -- but C++ version changes are still pending. One change addresses the DC (direct current, or zero frequency) quadratic distortion anomaly previously reported; another reduces the neural synchrony at high frequencies; the others have little or no noticeable effect in the default configuration. A new feature allows modeling a reduction of cochlear amplifier function, as a step toward a differentiable parameterized model of hearing impairment. In addition, the integration into the Auditory Model Toolbox (AMT) has been extensively improved, as the prior integration had bugs that made it unsuitable for including CARFAC in multi-model comparisons.
[20] arXiv:2404.17552 [pdf, other]: Title: A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Authors: Rémi Uro, David Doukhan, Albert Rilliard, Laëtitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

Comments: Keywords: semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony

Journal-ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG); Sound (cs.SD)

This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, background music and overlapped speech removal, and speaker diarization, used to present clean speaker segments to human annotators identifying target speakers. This pipeline proved highly effective, cutting down manual processing by a factor of ten. Evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing is comparable to up-to-date processes, and that the output provides high quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.
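
The category arithmetic in the abstract above (2 x 4 x 4 = 32 cells, 30 speakers each) can be checked with a short sketch. The category labels below are placeholders invented for illustration; only the counts 2, 4, 4, 30 and 874 come from the abstract.

```python
from itertools import product

# Placeholder labels (assumptions): the abstract gives only the counts.
GENDERS = ("gender_1", "gender_2")
AGE_RANGES = ("age_1", "age_2", "age_3", "age_4")
PERIODS = ("period_1", "period_2", "period_3", "period_4")

SPEAKERS_PER_CATEGORY = 30   # minimum target per category, from the abstract
SPEAKERS_FOUND = 874         # speakers identified so far, from the abstract

# Every combination of gender, age range and recording period is one cell.
categories = list(product(GENDERS, AGE_RANGES, PERIODS))
target_total = len(categories) * SPEAKERS_PER_CATEGORY
missing = target_total - SPEAKERS_FOUND
coverage = SPEAKERS_FOUND / target_total

print(len(categories), target_total, missing, round(100 * coverage, 1))
```

The gap between the target and the number found corresponds to roughly 91% coverage of the planned corpus.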

Cross-lists for Mon, 29 Apr 24

[21] arXiv:2404.16844 (cross-list from cs.CV) [pdf, other]: Title: Sugarcane Health Monitoring With Satellite Spectroscopy and Machine Learning: A Review

Authors: Ethan Kane Waters, Carla Chia-Ming Chen, Mostafa Rahimi Azghadi

Comments: 22 pages, 6 figures, 3 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)

Research into large-scale crop monitoring has flourished due to increased accessibility to satellite imagery. This review delves into previously unexplored and under-explored areas in sugarcane health monitoring and disease/pest detection using satellite-based spectroscopy and Machine Learning (ML). It discusses key considerations in system development, including relevant satellites, vegetation indices, ML methods, factors influencing sugarcane reflectance, optimal growth conditions, common diseases, and traditional detection methods. Many studies highlight how factors like crop age, soil type, viewing angle, water content, recent weather patterns, and sugarcane variety can impact spectral reflectance, affecting the accuracy of health assessments via spectroscopy. However, these variables have not been fully considered in the literature. In addition, the current literature lacks comprehensive comparisons between ML techniques and vegetation indices. We address these gaps in this review. We discuss that, while current findings suggest the potential for an ML-driven satellite spectroscopy system for monitoring sugarcane health, further research is essential. This paper offers a comprehensive analysis of previous research to aid in unlocking this potential and advancing the development of an effective sugarcane health monitoring system using satellite technology.
[22] arXiv:2404.16848 (cross-list from cs.CR) [pdf, ps, other]: Title: Cyber Security issues and Blockchain-Deep Learning based solutions for UAV and Internet of Drones (FANETs)

Authors: Partha Protim Datta

Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)

Safety-critical systems such as automated embedded or industrial systems have a strong dependency on the trustworthiness of data collection. As sensors are the critical component for those systems, it is imperative to address the attack resilience of sensors
[23] arXiv:2404.16849 (cross-list from cs.CR) [pdf, ps, other]: Title: Smart Grids Secured By Dynamic Watermarking: How Secure?

Authors: Kate Davis, Laszlo B. Kish, Chanan Singh

Comments: Accepted for publication in Fluct. Noise Lett

Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)

Unconditional security for smart grids is defined. Cryptanalyses of the watermarked security of smart grids indicate that watermarking cannot guarantee unconditional security unless the communication within the grid system is unconditionally secure. The successful attack against the dynamically watermarked smart grid remains valid even with the presence of internal noise from the grid. An open question arises: whether unconditionally authenticated secure communications within the grid, together with tamper resistance of the critical elements, are sufficient conditions to provide unconditional security for the grid operation.
[24] arXiv:2404.16852 (cross-list from cs.LG) [pdf, other]: Title: A Disease Labeler for Chinese Chest X-Ray Report Generation

Authors: Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Image and Video Processing (eess.IV)

In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metric is commonly used to evaluate the similarity between generated and ground-truth reports, while the clinical accuracy and effectiveness of the generated reports rely on an accurate disease labeler (classifier). To address the issues, this study proposes a disease labeler tailored for the generation of Chinese chest X-ray reports. This labeler leverages a dual BERT architecture to handle diagnostic reports and clinical information separately and constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to enhance text classification performance. Utilizing this disease labeler, a Chinese chest X-ray report dataset comprising 51,262 report samples was established. Finally, experiments and analyses were conducted on a subset of expert-annotated Chinese chest X-ray reports, validating the effectiveness of the proposed disease labeler.
[25] arXiv:2404.16879 (cross-list from cs.LG) [pdf, ps, other]: Title: Learning Control Barrier Functions and their application in Reinforcement Learning: A Survey

Authors: Maeva Guerrier, Hassan Fouad, Giovanni Beltrame

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)

Reinforcement learning is a powerful technique for developing new robot behaviors. However, the typical lack of safety guarantees constitutes a hurdle for its practical application on real robots. To address this issue, safe reinforcement learning aims to incorporate safety considerations, enabling faster transfer to real robots and facilitating lifelong learning. One promising approach within safe reinforcement learning is the use of control barrier functions. These functions provide a framework to ensure that the system remains in a safe state during the learning process. However, synthesizing control barrier functions is not straightforward and often requires ample domain knowledge. This challenge motivates the exploration of data-driven methods for automatically defining control barrier functions. We conduct a comprehensive review of the existing literature on safe reinforcement learning using control barrier functions. Additionally, we investigate various techniques for automatically learning control barrier functions, aiming to enhance the safety and efficacy of reinforcement learning in practical robot applications.
[26] arXiv:2404.16905 (cross-list from cs.CL) [pdf, other]: Title: Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

Authors: Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions. Unraveling the causes of emotions is more challenging. A new task named Multimodal Emotion-Cause Pair Extraction in Conversations is responsible for recognizing emotion and identifying causal expressions. In this study, we propose a multi-stage framework to generate emotion and extract the emotion causal pairs given the target emotion. In the first stage, Llama-2-based InstructERC is utilized to extract the emotion category of each utterance in a conversation. After emotion recognition, a two-stream attention model is employed to extract the emotion causal pairs given the target emotion for subtask 2, while MuTEC is employed to extract the causal span for subtask 1. Our approach achieved first place in both subtasks of the competition.
[27] arXiv:2404.16913 (cross-list from cs.LG) [pdf, other]: Title: DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks

Authors: Matthew Squires, Xiaohui Tao, Soman Elangovan, Raj Gururajan, Haoran Xie, Xujuan Zhou, Yuefeng Li, U Rajendra Acharya

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Repetitive Transcranial Magnetic Stimulation (rTMS) is a well-supported, evidence-based treatment for depression. However, patterns of response to this treatment are inconsistent. Emerging evidence suggests that artificial intelligence can predict rTMS treatment outcomes for most patients using fMRI connectivity features. While these models can reliably predict treatment outcomes for many patients, for some underrepresented fMRI connectivity measures DNN models are unable to reliably predict treatment outcomes. As such, we propose a novel method, Diversity Enhancing Conditional Generative Adversarial Network (DE-CGAN), for oversampling these underrepresented examples. DE-CGAN creates synthetic examples in difficult-to-classify regions by first identifying these data points and then creating conditioned synthetic examples to enhance data diversity. Through empirical experiments we show that a classification model trained using a diversity-enhanced training set outperforms traditional data augmentation techniques and existing benchmark results. This work shows that increasing the diversity of a training dataset can improve classification model performance. Furthermore, this work provides evidence for the utility of synthetic patients in providing larger, more robust datasets for both AI researchers and psychiatrists to explore variable relationships.
[28] arXiv:2404.16920 (cross-list from cs.NI) [pdf, other]: Title: Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

Authors: Shufan Wang, Guojun Xiong, Shichen Zhang, Huacheng Zeng, Jian Li, Shivendra Panwar

Comments: IEEE Transactions on Wireless Communications

Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

We study the data packet transmission problem (mmDPT) in dense cell-free millimeter wave (mmWave) networks, i.e., users sending data packet requests to access points (APs) via uplinks and APs transmitting requested data packets to users via downlinks. Our objective is to minimize the average delay in the system due to APs' limited service capacity and unreliable wireless channels between APs and users. This problem can be formulated as a restless multi-armed bandits problem with fairness constraint (RMAB-F). Since finding the optimal policy for RMAB-F is intractable, existing learning algorithms are computationally expensive and not suitable for practical dynamic dense mmWave networks. In this paper, we propose a structured reinforcement learning (RL) solution for mmDPT by exploiting the inherent structure encoded in RMAB-F. To achieve this, we first design a low-complexity and provably asymptotically optimal index policy for RMAB-F. Then, we leverage this structure information to develop a structured RL algorithm called mmDPT-TS, which provably achieves an $\tilde{O}(\sqrt{T})$ Bayesian regret. More importantly, mmDPT-TS is computation-efficient and thus amenable to practical implementation, as it fully exploits the structure of the index policy for making decisions. Extensive emulation based on data collected in realistic mmWave networks demonstrates significant gains of mmDPT-TS over existing approaches.
[29] arXiv:2404.16969 (cross-list from cs.SD) [pdf, other]: Title: COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

Authors: Ruben Ciranni, Emilian Postolache, Giorgio Mariani, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

Comments: Demo page: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of stems (or their combinations) composing music tracks and allows the objective evaluation of compositional models for music in the task of accompaniment generation. We also introduce a new baseline for compositional music generation called CompoNet, based on ControlNet (Zhang et al., 2023), generalizing the tasks of MSDM, and quantify it against the latter using COCOLA. We release all models trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
[30] arXiv:2404.17022 (cross-list from cs.SD) [pdf, ps, other]: Title: Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Increasingly, phonetic research utilizes data collected from participants who record themselves on readily available devices. Though such recordings are convenient, their suitability for acoustic analysis remains an open question, especially regarding how the individual methods affect acoustic measures over time. We used Quantile Generalized Additive Mixed Models (QGAMMs) to analyze measures of F0, intensity, and the first and second formants, comparing files recorded using a laboratory-standard recording method (Zoom H6 Recorder with an external microphone) to three remote recording methods: (1) the Awesome Voice Recorder application on a smartphone (AVR), (2) the Zoom meeting application with default settings (Zoom-default), and (3) the Zoom meeting application with the "Turn on Original Sound" setting (Zoom-raw). A linear temporal alignment issue was observed for the Zoom methods over the course of the long, session-length recording files. However, the difference was not significant for utterance-length files. F0 was reliably measured using all methods. Intensity and formants presented non-linear differences across methods that could not be corrected for simply. Overall, the AVR files were most similar to the H6's, and so AVR is deemed to be a more reliable recording method than either Zoom-default or Zoom-raw.
[31] arXiv:2404.17029 (cross-list from cs.CV) [pdf, other]: Title: Dr-SAM: An End-to-End Framework for Vascular Segmentation, Diameter Estimation, and Anomaly Detection on Angiography Images

Authors: Vazgen Zohranyan, Vagner Navasardyan, Hayk Navasardyan, Jan Borggrefe, Shant Navasardyan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Recent advancements in AI have significantly transformed medical imaging, particularly in angiography, by enhancing diagnostic precision and patient care. However, existing works are limited in analyzing the aorta and iliac arteries, above all for vascular anomaly detection and characterization. To close this gap, we propose Dr-SAM, a comprehensive multi-stage framework for vessel segmentation, diameter estimation, and anomaly analysis, aiming to examine the peripheral vessels through angiography images. For segmentation, we introduce a customized positive/negative point selection mechanism applied on top of the Segment Anything Model (SAM), specifically for medical (angiography) images. Then we propose a morphological approach to determine the vessel diameters, followed by our histogram-driven anomaly detection approach. Moreover, we introduce a new benchmark dataset for the comprehensive analysis of peripheral vessel angiography images, which we hope can boost upcoming research in this direction, leading to enhanced diagnostic precision and ultimately better health outcomes for individuals facing vascular issues.
[32] arXiv:2404.17044 (cross-list from cs.RO) [pdf, other]: Title: A new Taxonomy for Automated Driving: Structuring Applications based on their Operational Design Domain, Level of Automation and Automation Readiness

Authors: Johannes Betz, Melina Lutwitzi, Steven Peters

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The aim of this paper is to investigate the relationship between operational design domains (ODD), automated driving SAE Levels, and Technology Readiness Level (TRL). The first highly automated vehicles, like robotaxis, are in commercial use, and the first vehicles with highway pilot systems have been delivered to private customers. It has emerged as a crucial issue that these automated driving systems differ significantly in their ODD and in their technical maturity. Consequently, any approach to compare these systems is difficult and requires a deep dive into defined ODDs, specifications, and technologies used. Therefore, this paper challenges current state-of-the-art taxonomies and develops a new and integrated taxonomy that can structure automated vehicle systems more efficiently. We use the well-known SAE Levels 0-5 as the "level of responsibility", and link and describe the ODD at an intermediate level of abstraction. Finally, a new maturity model is explicitly proposed to improve the comparability of automated vehicles and driving functions. This method is then used to analyze today's existing automated vehicle applications, which are structured into the new taxonomy and rated by the new maturity levels. Our results indicate that this new taxonomy and maturity level model will help to differentiate automated vehicle systems in discussions more clearly and to discover white fields more systematically and upfront, e.g. for research but also for regulatory purposes.
[33] arXiv:2404.17069 (cross-list from cs.IT) [pdf, other]: Title: Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

The upper mid-band (FR3) has recently been attracting interest for the new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub-6 GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples, and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of GAN-based FR3 channel modeling. In our comparison, the model closely matches the ray-tracing simulated data.
[34] arXiv:2404.17094 (cross-list from cs.LO) [pdf, other]: Title: TIUP: Effective Processor Verification with Tautology-Induced Universal Properties

Authors: Yufeng Li, Yiwei Ci, Qiusong Yang

Comments: Accepted by ASP-DAC 2024, please note that this is not the final camera-ready version

Subjects: Logic in Computer Science (cs.LO); Hardware Architecture (cs.AR); Systems and Control (eess.SY)

Design verification is a complex and costly task, especially for large and intricate processor projects. Formal verification techniques provide advantages by thoroughly examining design behaviors, but they require extensive labor and expertise in property formulation. Recent research focuses on verifying designs using the self-consistency universal property, reducing verification difficulty as it is design-independent. However, the single self-consistency property faces false positives and scalability issues due to exponential state space growth. To tackle these challenges, this paper introduces TIUP, a technique using tautologies as universal properties. We show how TIUP effectively uses tautologies as abstract specifications, covering processor data and control paths. TIUP simplifies and streamlines verification for engineers, enabling efficient formal processor verification.
[35] arXiv:2404.17125 (cross-list from cs.RO) [pdf, other]: Title: Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation

Authors: Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan

Journal-ref: 2020 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia)

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)

In this paper, we present Misaka, a visualized swarm testbed for smart grid algorithm evaluation, which is also an extendable open-source, open-hardware platform for developing tabletop tangible swarm interfaces. The platform consists of a collection of custom-designed robots with three omni-directional wheels, each 10 cm in diameter, high-accuracy localization through a microdot pattern overlaid on top of the activity sheets, and a software framework for application development and control, while remaining affordable (per-unit cost of about 30 USD at the prototype stage). We illustrate the potential of tabletop swarm user interfaces through a set of smart grid algorithm application scenarios developed with Misaka.
[36] arXiv:2404.17126 (cross-list from cs.LG) [pdf, other]: Title: Deep Evidential Learning for Dose Prediction

Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

Comments: 24 pages, 8 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable to or stronger than those for Monte-Carlo Dropout and Deep Ensemble methods, (ii) the median error varied with uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning relative to these other two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors, and (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach that can endow deep-learning models in radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how we can use such a model to construct the predicted Dose-Volume-Histograms' confidence intervals.
[37] arXiv:2404.17144 (cross-list from cs.LG) [pdf, ps, other]: Title: Sensor Response-Time Reduction using Long-Short Term Memory Network Forecasting

Authors: Simon J. Ward, Muhamed Baljevic, Sharon M. Weiss

Comments: 9 pages, 3 figures

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

The response time of a biosensor is a crucial metric in safety-critical applications such as medical diagnostics where an earlier diagnosis can markedly improve patient outcomes. However, the speed at which a biosensor reaches a final equilibrium state can be limited by poor mass transport and long molecular diffusion times that increase the time it takes target molecules to reach the active sensing region of a biosensor. While optimization of system and sensor design can promote molecules reaching the sensing element faster, a simpler and complementary approach for response time reduction that is widely applicable across all sensor platforms is to use time-series forecasting to predict the ultimate steady-state sensor response. In this work, we show that ensembles of long short-term memory (LSTM) networks can accurately predict equilibrium biosensor response from a small quantity of initial time-dependent biosensor measurements, allowing for significant reduction in response time by a mean and median factor of improvement of 18.6 and 5.1, respectively. The ensemble of models also provides simultaneous estimation of uncertainty, which is vital to provide confidence in the predictions and subsequent safety-related decisions that are made. This approach is demonstrated on real-time experimental data collected by exposing porous silicon biosensors to buffered protein solutions using a multi-channel fluidic cell that enables the automated measurement of 100 porous silicon biosensors in parallel. The dramatic improvement in sensor response time achieved using LSTM network ensembles and associated uncertainty quantification opens the door to trustworthy and faster responding biosensors, enabling more rapid medical diagnostics for improved patient outcomes and healthcare access, as well as quicker identification of toxins in food and the environment.
[38] arXiv:2404.17161 (cross-list from cs.SD) [pdf, other]: Title: An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu

Comments: arXiv admin note: text overlap with arXiv:2311.14957

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in the Short-Time Fourier Transform (STFT), which has a constant Time-Frequency (TF) resolution, linearly scaled center frequencies, and a fixed decomposition basis, making it incompatible with signals like singing voices that require dynamic attention for different frequency bands and different time intervals. Motivated by that, we propose a Multi-Scale Sub-Band Constant-Q Transform (MS-SB-CQT) discriminator and a Multi-Scale Temporal-Compressed Continuous Wavelet Transform (MS-TC-CWT) discriminator. Both the CQT and the CWT have a dynamic TF resolution for different frequency bands. In contrast, CQT has a better modeling ability in pitch information, and CWT has a better modeling ability in short-time transients. Experiments conducted on both speech and singing voices confirm the effectiveness of our proposed discriminators. Moreover, the STFT, CQT, and CWT-based discriminators can be used jointly for better performance. The proposed discriminators can boost the synthesis quality of various state-of-the-art GAN-based vocoders, including HiFi-GAN, BigVGAN, and APNet.
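The contrast drawn above between the STFT's linearly scaled center frequencies and the CQT's frequency-dependent resolution can be illustrated with a minimal sketch. It uses the textbook definitions of the two transforms, not the authors' discriminator code; the function names and all parameter values (sample rate, FFT size, minimum frequency, bins per octave) are assumptions chosen for illustration.

```python
def stft_center_frequencies(sample_rate, n_fft):
    """Center frequencies of STFT bins: evenly spaced by sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Center frequencies of CQT bins: geometrically spaced from f_min."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Ratio of center frequency to bin spacing, identical for every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative values only (not taken from the paper).
stft_freqs = stft_center_frequencies(sample_rate=16000, n_fft=8)
cqt_freqs = cqt_center_frequencies(f_min=32.7, n_bins=24, bins_per_octave=12)

# STFT: constant spacing in Hz. CQT: constant ratio between neighbouring bins.
stft_spacing = [b - a for a, b in zip(stft_freqs, stft_freqs[1:])]
cqt_ratios = [b / a for a, b in zip(cqt_freqs, cqt_freqs[1:])]

print(stft_spacing[:3], cqt_ratios[:3], cqt_q_factor(12))
```

With 12 bins per octave, each CQT bin maps to one semitone, which is one common explanation for why a CQT is well suited to pitch information.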
[39] arXiv:2404.17170 (cross-list from cs.CV) [pdf, other]: Title: S-IQA Image Quality Assessment With Compressive Sampling

Authors: Ronghua Liao, Chen Hui, Lang Yuan, Feng Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

No-Reference Image Quality Assessment (NR-IQA) aims at estimating image quality in accordance with subjective human perception. However, most existing NR-IQA methods focus on exploring increasingly complex networks or components to improve the final performance. Such practice imposes great limitations and complexity on IQA methods, especially when they are applied to high-resolution (HR) images in the real world. In fact, most images have high spatial redundancy, especially HR images. To further exploit this characteristic and alleviate the issue above, we propose a new framework for Image Quality Assessment with compressive Sampling (dubbed S-IQA), which consists of three components: (1) The Flexible Sampling Module (FSM) samples the image to obtain measurements at an arbitrary ratio. (2) A Vision Transformer with the Adaptive Embedding Module (AEM) makes measurements of uniform size and extracts deep features. (3) The Dual Branch (DB) allocates a weight to every patch and predicts the final quality score. Experiments show that our proposed S-IQA achieves state-of-the-art results on various datasets with less data usage.
[40] arXiv:2404.17175 (cross-list from cs.IT) [pdf, ps, other]: Title: Over-the-Air Modulation for RIS-assisted Symbiotic Radios: Design, Analysis, and Optimization

Authors: Hu Zhou, Ying-Chang Liang, Chau Yuen

Comments: 13 pages, 9 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR), an RIS is exploited to assist the primary system and to simultaneously operate as a secondary transmitter by modulating its own information over the incident primary signal from the air. Such an operation is called over-the-air modulation. The existing modulation schemes such as on-off keying and binary phase-shift keying suffer from two problems for joint detection of the primary and secondary signals in RIS-assisted SR, i.e., one is the detection ambiguity problem when the direct link is blocked, and the other is the bit error rate (BER) error-floor problem when the direct link is weak. To address the two problems, we propose a novel modulation scheme by dividing the phase-shift matrix into two parts: one is the assistance beamforming matrix for assisting the primary system and the other is the transmission beamforming matrix for delivering the secondary signal. To optimize the assistance and transmission beamforming matrices, we first introduce an assistance factor that describes the performance requirement of the primary system and then formulate a problem to minimize the BER of the secondary system, while guaranteeing the BER requirement of the primary system controlled by the assistance factor. To solve this non-convex problem, we resort to the successive convex approximation technique to obtain a suboptimal solution. Furthermore, to draw more insights, we propose a low-complexity assistance-transmission beamforming structure by borrowing the idea from the classical maximum ratio transmission and zero forcing techniques. Finally, simulation results reveal an interesting tradeoff between the BER performance of the primary and secondary systems by adjusting the assistance factor.
[41] arXiv:2404.17263 (cross-list from cs.IT) [pdf, ps, other]: Title: Multiple-Target Detection in Cell-Free Massive MIMO-Assisted ISAC

Authors: Mohamed Elfiatoure, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou

Comments: The manuscript has been submitted to IEEE TWC

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a distributed implementation for integrated sensing and communication (ISAC) backed by a cell-free massive multiple-input multiple-output (CF-mMIMO) architecture. Distributed multi-antenna access points (APs) simultaneously serve communication users (UEs) and emit probing signals towards multiple specified zones for sensing. The APs can switch between communication and sensing modes, and adjust their transmit power based on the network settings and the requirements of the sensing and communication operations. By considering local partial zero-forcing and maximum-ratio-transmit precoding at the APs for communication and sensing, respectively, we first derive closed-form expressions for the spectral efficiency (SE) of the UEs and the mainlobe-to-average-sidelobe ratio (MASR) of the sensing zones. Then, a joint operation mode selection and power control design problem is formulated to maximize the SE fairness among the UEs, while ensuring specific levels of MASR for sensing zones. The complicated mixed-integer problem is relaxed and solved via a successive convex approximation approach. We further propose a low-complexity design, where AP mode selection is designed through a greedy algorithm and then power control is designed based on this chosen mode. Our findings reveal that the proposed scheme can consistently ensure a sensing success rate of $100\%$ for different network setups with satisfactory fairness among all UEs.
[42] arXiv:2404.17270 (cross-list from cs.IT) [pdf, other]: Title: Empirical Studies of Propagation Characteristics and Modeling Based on XL-MIMO Channel Measurement: From Far-Field to Near-Field

Authors: Haiyang Miao, Jianhua Zhang, Pan Tang, Lei Tian, Weirang Zuo, Qi Wei, Guangyi Liu

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

In sixth-generation (6G) systems, extremely large-scale multiple-input multiple-output (XL-MIMO) is considered a promising enabling technology. With the further expansion of array element numbers and frequency bands, near-field effects will be more likely to occur in 6G communication systems, and near-field radio communications (NFRC) will become crucial. Channel research is very important for the development and performance evaluation of communication systems. In this paper, we systematically investigate channel measurements and modeling for the emerging NFRC. First, the design principles of the massive MIMO channel measurement platform are addressed. Second, an indoor XL-MIMO channel measurement campaign with 1600 array elements is conducted, and the channel characteristics are extracted and validated in the near-field region. Then, an outdoor XL-MIMO channel measurement campaign with 320 array elements is conducted, and the channel characteristics are extracted and modeled from the near-field to the far-field (NF-FF) region. The spatial non-stationary characteristics of angular spread at the transmitting end are more important in modeling. We hope that this work will give some reference to the near-field and far-field research for 6G.
[43] arXiv:2404.17280 (cross-list from cs.SD) [pdf, other]: Title: Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.
[44] arXiv:2404.17317 (cross-list from cs.NI) [pdf, other]: Title: Colosseum: The Open RAN Digital Twin

Authors: Michele Polese, Leonardo Bonati, Salvatore D'Oro, Pedram Johari, Davide Villa, Sakthivel Velumani, Rajeev Gangula, Maria Tsampazi, Clifton Paul Robinson, Gabriele Gemmi, Andrea Lacava, Stefano Maxenti, Hai Cheng, Tommaso Melodia

Comments: 13 pages, 8 figures, 1 table, submitted to IEEE for publication

Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

Recent years have witnessed the Open Radio Access Network (RAN) paradigm transforming the fundamental ways cellular systems are deployed, managed, and optimized. This shift is led by concepts such as openness, softwarization, programmability, interoperability, and intelligence of the network, all of which had never been applied to the cellular ecosystem before. The realization of the Open RAN vision into practical architectures, intelligent data-driven control loops, and efficient software implementations, however, is a multifaceted challenge, which requires (i) datasets to train Artificial Intelligence (AI) and Machine Learning (ML) models; (ii) facilities to test models without disrupting production networks; (iii) continuous and automated validation of the RAN software; and (iv) significant testing and integration efforts. This paper serves as a tutorial on how Colosseum - the world's largest wireless network emulator with hardware in the loop - can provide the research infrastructure and tools to fill the gap between the Open RAN vision and the deployment and commercialization of open and programmable networks. We describe how Colosseum implements an Open RAN digital twin through a high-fidelity Radio Frequency (RF) channel emulator and end-to-end softwarized O-RAN and 5G-compliant protocol stacks, thus allowing users to reproduce and experiment upon topologies representative of real-world cellular deployments. Then, we detail the twinning infrastructure of Colosseum, as well as the automation pipelines for RF and protocol stack twinning. Finally, we showcase a broad range of Open RAN use cases implemented on Colosseum, including the real-time connection between the digital twin and real-world networks, and the development, prototyping, and testing of AI/ML solutions for Open RAN.
[45] arXiv:2404.17318 (cross-list from cs.IT) [pdf, other]: Title: Performance Bounds of Near-Field Sensing with Circular Arrays

Authors: Zhaolin Wang, Xidong Mu, Yuanwei Liu

Comments: 6 pages, 6 figures. arXiv admin note: text overlap with arXiv:2404.05076

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The performance bounds of near-field sensing are studied for circular arrays, focusing on the impact of bandwidth and array size. Closed-form Cramér-Rao bounds (CRBs) for angle and distance estimation are derived, revealing the scaling laws of the CRBs with bandwidth and array size. Contrary to expectations, enlarging array size does not always enhance sensing performance. Furthermore, the asymptotic CRBs are analyzed under different conditions, unveiling that the derived expressions include the existing results as special cases. Finally, the derived expressions are validated through numerical results.
[46] arXiv:2404.17325 (cross-list from cs.ET) [pdf, ps, other]: Title: Towards Scalable Multi-Chip Wireless Networks with Near-Field Time Reversal

Authors: Ama Bandara, Fátima Rodríguez-Galán, Pau Talarn, Elana Pereira de Santana, Peter Haring Bolívar, Eduard Alarcón, Sergi Abadal

Subjects: Emerging Technologies (cs.ET); Signal Processing (eess.SP)

The concept of Wireless Network-on-Chip (WNoC) has emerged as a potential solution to address the escalating communication demands of modern computing systems due to its low latency, versatility, and reconfigurability. However, for WNoC to fulfill its potential, it is essential to establish multiple high-speed wireless links across chips. Unfortunately, the compact and enclosed nature of computing packages introduces significant challenges in the form of Co-Channel Interference (CCI) and Inter-Symbol Interference (ISI), which not only hinder the deployment of multiple spatial channels but also severely restrict the symbol rate of each individual channel. In this paper, we posit that Time Reversal (TR) could be effective in addressing both impairments in this static scenario thanks to its spatiotemporal focusing capabilities even in the near field. Through comprehensive full-wave simulations and bit error rate analysis in multiple scenarios and at multiple frequency bands, we provide evidence that TR can increase the symbol rate by an order of magnitude, enabling the deployment of multiple concurrent links and achieving aggregate speeds exceeding 100 Gb/s. Finally, we evaluate the impact of reducing the sampling rate of the TR filter on the achievable speeds, paving the way to practical TR-based wireless communications at the chip scale.
[47] arXiv:2404.17367 (cross-list from cs.RO) [pdf, ps, other]: Title: An Optimised Brushless DC Motor Control Scheme for Robotics Applications

Authors: Nilabha Das, Laxman Rao S. Paragond, Balkrushna H. Waghmare

Comments: 6 Pages, 8 figures, 1 table

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This work aims to develop an integrated control strategy for Brushless Direct Current Motors for a wide range of applications in robotics systems. The controller is suited for both high torque - low speed and high-speed control of the motors. Hardware validation is done by developing a custom BLDC drive system, and the circuit elements are optimised for power efficiency.
[48] arXiv:2404.17400 (cross-list from cs.CV) [pdf, other]: Title: Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Authors: Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C.L. Philip Chen

Comments: 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)

Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN.
[49] arXiv:2404.17484 (cross-list from cs.CV) [pdf, other]: Title: Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

Comments: 19 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-processing. To obtain a high-resolution B-scan with a precise flow map, densely sampled A-scans are required in current methods, causing both computational and storage burdens. To address this issue, in this paper we propose a novel sparse reconstruction framework with four main sequential steps: 1) early magnitude-phase fusion that encourages rich interaction of the complementary information in magnitude and phase, 2) State Space Model (SSM)-based representation learning, inspired by recent successes in Mamba and VMamba, to naturally capture both the intra-A-scan sequential information and between-A-scan interactions, 3) an Inception-based Feedforward Network module (IncFFN) to further boost the SSM-module, and 4) a B-line Pixel Shuffle (BPS) layer to effectively reconstruct the final results. In the experiments on real-world animal data, our method shows clear effectiveness in reconstruction accuracy. As the first application of SSM for image reconstruction tasks, we expect our work to inspire related explorations in not only efficient ODT imaging techniques but also generic image enhancement.
[50] arXiv:2404.17519 (cross-list from cs.IT) [pdf, other]: Title: Interpreting Deepcode, a learned feedback code

Authors: Yingyao Zhou, Natasha Devroye, Gyorgy Turan, Milos Zefran

Comments: Accepted to the 2024 ISIT conference

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Deep learning methods have recently been used to construct non-linear codes for the additive white Gaussian noise (AWGN) channel with feedback. However, there is limited understanding of how these black-box-like codes with many learned parameters use feedback. This study aims to uncover the fundamental principles underlying the first deep-learned feedback code, known as Deepcode, which is based on an RNN architecture. Our interpretable model based on Deepcode is built by analyzing the influence length of inputs and approximating the non-linear dynamics of the original black-box RNN encoder. Numerical experiments demonstrate that our interpretable model -- which includes both an encoder and a decoder -- achieves comparable performance to Deepcode while offering an interpretation of how it employs feedback for error correction.
[51] arXiv:2404.17532 (cross-list from cs.NI) [pdf, ps, other]: Title: Mitigating Collisions in Sidelink NR V2X: A Study on Cooperative Resource Allocation

Authors: Mohammadsaleh Nikooroo, Juan Estrada-Jimenez, Aurel Machalek, Jerome Harri, Thomas Engel, Ion Turcanu

Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)

New Radio (NR) Vehicle-to-Everything (V2X) Sidelink (SL), an integral part of the 5G NR standard, is expected to revolutionize the automotive and rail industries by enabling direct and low-latency exchange of critical information between traffic participants independently of cellular networks. However, this advancement depends primarily on efficient SL resource allocation. Mode 2(a) is a well-known method for this purpose, where each node autonomously selects resources. However, this method is prone to packet collisions due to the hidden-node problem. In this paper, we propose a cooperative scheduling method that could potentially address this issue. We describe an extension of Mode 2(a) that allows nodes to share resource allocation information at two hops. Initial simulation results show a promising improvement over Mode 2(a).
[52] arXiv:2404.17541 (cross-list from math.OC) [pdf, ps, other]: Title: Applications of Lifted Nonlinear Cuts to Convex Relaxations of the AC Power Flow Equations

Authors: Sergio I. Bugosen, Robert B. Parker, Carleton Coffrin

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

We demonstrate that valid inequalities, or lifted nonlinear cuts (LNC), can be projected to tighten the Second Order Cone (SOC), Convex DistFlow (CDF), and Network Flow (NF) relaxations of the AC Optimal Power Flow (AC-OPF) problem. We conduct experiments on 36 cases from the PGLib-OPF library for two objective functions, (1) power generation maximization and (2) generation cost minimization. Significant optimality gap improvements are shown for the maximization problem, where the LNC strengthen the SOC and CDF relaxations in 100% of the test cases, with average and maximum differences in the optimality gaps of 23.1% and 93.5% respectively. The NF relaxation is strengthened in 79.2% of test cases, with average and maximum differences in the optimality gaps of 3.45% and 21.2% respectively. We also study the trade-off between relaxation quality and solve time, demonstrating that the strengthened CDF relaxation outperforms the strengthened SOC formulation in terms of runtime and number of iterations needed, while the strengthened NF formulation is the most scalable with the lowest relaxation quality provided by these LNC.
[53] arXiv:2404.17554 (cross-list from cs.HC) [pdf, ps, other]: Title: A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

Authors: Jing Lin, Nina Mylly, Per Olof Hedekvist, Jingchun Shen

Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP); Systems and Control (eess.SY); Applications (stat.AP)

This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporating a newly defined metric, Mean Time of Exposure (MTOE), the approach quantifies user-light interaction, enabling tailored lighting strategies that respond to diverse activities and needs in library spaces. Case studies demonstrate how the CIL matrix can be practically applied, offering significant improvements over conventional methods by focusing on optimized user experiences from both visual impacts and non-visual effects.

Replacements for Mon, 29 Apr 24

[54] arXiv:2110.03309 (replaced) [pdf, other]: Title: Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations

Authors: Wanying Ge, Jose Patino, Massimiliano Todisco, Nicholas Evans

Comments: Accepted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS)
[55] arXiv:2211.16879 (replaced) [pdf, other]: Title: Robust, fast and accurate mapping of diffusional mean kurtosis

Authors: Megan E. Farquhar, Qianqian Yang, Viktor Vegh

Subjects: Image and Video Processing (eess.IV); Biological Physics (physics.bio-ph); Medical Physics (physics.med-ph)
[56] arXiv:2302.05309 (replaced) [pdf, other]: Title: The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization

Authors: Ilayda Yaman, Guoda Tian, Martin Larsson, Patrik Persson, Michiel Sandra, Alexander Dürr, Erik Tegler, Nikhil Challa, Henrik Garde, Fredrik Tufvesson, Kalle Åström, Ove Edfors, Steffen Malkowsky, Liang Liu

Comments: 7 pages, 7 figures, Accepted to ICRA 2024

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2304.11671 (replaced) [pdf, other]: Title: Battery Capacity Knee-Onset Identification and Early Prediction Using Degradation Curvature

Authors: Huang Zhang, Faisal Altaf, Torsten Wik

Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
[58] arXiv:2306.04242 (replaced) [pdf, other]: Title: 4D Millimeter-Wave Radar in Autonomous Driving: A Survey

Authors: Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

Subjects: Signal Processing (eess.SP); Robotics (cs.RO)
[59] arXiv:2306.15945 (replaced) [pdf, ps, other]: Title: Permutation Polynomial Interleaved Zadoff-Chu Sequences

Authors: Fredrik Berggren, Branislav M. Popovic

Comments: Submitted to IEEE Transactions on Information Theory

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[60] arXiv:2308.00130 (replaced) [pdf, other]: Title: Kinodynamic Motion Planning via Funnel Control for Underactuated Unmanned Surface Vehicles

Authors: Dženan Lapandić, Christos K. Verginis, Dimos V. Dimarogonas, Bo Wahlberg

Comments: 12 pages, 10 figures, submitted to IEEE T-CST

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[61] arXiv:2308.09375 (replaced) [pdf, other]: Title: Image Processing and Machine Learning for Hyperspectral Unmixing: An Overview and the HySUPP Python Package

Authors: Behnood Rasti (HZDR), Alexandre Zouaoui (Thoth), Julien Mairal (Thoth), Jocelyn Chanussot (Thoth)

Comments: IEEE Transactions on Geoscience and Remote Sensing, 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[62] arXiv:2309.05855 (replaced) [pdf, other]: Title: Instabilities in Convnets for Raw Audio

Authors: Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs

Comments: 4 pages, 5 figures, 1 page appendix with mathematical proofs

Journal-ref: IEEE Signal Processing Letters 31 (2024) 1084-1088

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2311.01167 (replaced) [pdf, ps, other]: Title: Modulation Design and Optimization for RIS-Assisted Symbiotic Radios

Authors: Hu Zhou, Bowen Cai, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang

Comments: 16 pages, 16 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[64] arXiv:2311.06890 (replaced) [pdf, ps, other]: Title: Distributed Sequential Receding Horizon Control of Multi-Agent Systems under Recurring Signal Temporal Logic

Authors: Eleftherios E. Vlahakis, Lars Lindemann, Dimos V. Dimarogonas

Comments: Accepted for presentation at ECC24

Subjects: Systems and Control (eess.SY)
[65] arXiv:2311.08496 (replaced) [pdf, other]: Title: A Robust, Efficient Predictive Safety Filter

Authors: Wenceslao Shaw Cortez, Jan Drgona, Draguna Vrabie, Mahantesh Halappanavar

Subjects: Systems and Control (eess.SY)
[66] arXiv:2311.08880 (replaced) [pdf, other]: Title: Motion Control of Two Mobile Robots under Allowable Collisions

Authors: Li Tan, Wei Ren, Xi-Ming Sun, Junlin Xiong

Comments: 8 pages, 5 figures

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[67] arXiv:2311.16568 (replaced) [pdf, ps, other]: Title: Active Reconfigurable Intelligent Surface Enhanced Spectrum Sensing for Cognitive Radio Networks

Authors: Jungang Ge, Ying-Chang Liang, Sumei Sun, Yonghong Zeng, Zhidong Bai

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[68] arXiv:2312.06349 (replaced) [pdf, ps, other]: Title: DoA-Aided MMSE Channel Estimation for Wireless Communication Systems

Authors: Franz Weißer, Nurettin Turan, Wolfgang Utschick

Comments: Submitted to IEEE for possible publication

Subjects: Signal Processing (eess.SP)
[69] arXiv:2312.15416 (replaced) [pdf, other]: Title: On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains

Authors: Hao Wu, Shenghua Feng, Ting Gan, Jie Wang, Bican Xia, Naijun Zhan

Comments: 18 pages, 1 figure

Subjects: Systems and Control (eess.SY)
[70] arXiv:2401.05217 (replaced) [pdf, other]: Title: Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[71] arXiv:2401.06076 (replaced) [pdf, ps, other]: Title: Wireless Ear EEG to Monitor Drowsiness

Authors: Ryan Kaveh, Carolyn Schwendeman, Leslie Pu, Ana C. Arias, Rikky Muller

Subjects: Quantitative Methods (q-bio.QM); Signal Processing (eess.SP)
[72] arXiv:2401.06966 (replaced) [pdf, other]: Title: Near-Field Channel Estimation for XL-RIS Assisted Multi-User XL-MIMO Systems: Hybrid Beamforming Architectures

Authors: Jeongjae Lee, Hyeongjin Chung, Yunseong Cho, Sunwoo Kim, Songnam Hong

Comments: submitted to IEEE Transactions on Communications

Subjects: Signal Processing (eess.SP)
[73] arXiv:2403.03314 (replaced) [pdf, other]: Title: Collision Avoidance Verification of Multiagent Systems with Learned Policies

Authors: Zihao Dong, Shayegan Omidshafiei, Michael Everett

Comments: 6 pages, 6 figures

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
[74] arXiv:2403.03611 (replaced) [pdf, ps, other]: Title: Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2403.04654 (replaced) [pdf, other]: Title: Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention

Authors: R. Gnana Praveen, Jahangir Alam

Comments: Accepted to FG2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76] arXiv:2403.06756 (replaced) [pdf, other]: Title: One-Bit Target Detection in Collocated MIMO Radar with Colored Background Noise

Authors: Yu-Hang Xiao, David Ramírez, Lei Huang, Xiao Peng Li, Hing Cheung So

Subjects: Signal Processing (eess.SP)
[77] arXiv:2403.08718 (replaced) [pdf, ps, other]: Title: Probabilistic Metaplasticity for Continual Learning with Memristors

Authors: Fatima Tuz Zohora, Vedant Karia, Nicholas Soures, Dhireesha Kudithipudi

Subjects: Systems and Control (eess.SY)
[78] arXiv:2403.10962 (replaced) [pdf, other]: Title: Exploiting Topological Priors for Boosting Point Cloud Generation

Authors: Baiyuan Chen

Comments: 7 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[79] arXiv:2404.03436 (replaced) [pdf, other]: Title: Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation

Authors: Luca Comanducci, Fabio Antonacci, Augusto Sarti

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80] arXiv:2404.12604 (replaced) [pdf, ps, other]: Title: Transmitter Side Beyond-Diagonal RIS for mmWave Integrated Sensing and Communications

Authors: Kexin Chen, Yijie Mao

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
[81] arXiv:2404.13108 (replaced) [pdf, other]: Title: RegWSI: Whole Slide Image Registration using Combined Deep Feature- and Intensity-Based Methods: Winner of the ACROBAT 2023 Challenge

Authors: Marek Wodzinski, Niccolò Marini, Manfredo Atzori, Henning Müller

Journal-ref: Computer Methods and Programs in Biomedicine, Vol. 250, 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[82] arXiv:2404.13330 (replaced) [pdf, other]: Title: SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation

Authors: Mansoor Hayat, Supavadee Aramvith, Titipat Achakulvisut

Comments: Paper accepted for Presentation in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, Florida, USA (Camera Ready Version)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[83] arXiv:2404.15620 (replaced) [pdf, other]: Title: A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

Authors: Zhixiong Yang, Jingyuan Xia, Shengxi Li, Xinghua Huang, Shuanghui Zhang, Zhen Liu, Yaowen Fu, Yongxiang Liu

Comments: Accepted for publication in CVPR 2024

Subjects: Image and Video Processing (eess.IV)
[84] arXiv:2404.15939 (replaced) [pdf, other]: Title: Telco-RAG: Navigating the Challenges of Retrieval-Augmented Language Models for Telecommunications

Authors: Andrei-Laurentiu Bornea, Fadhel Ayed, Antonio De Domenico, Nicola Piovesan, Ali Maatouk

Comments: 6 pages, 5 figures, 4 tables, submitted to IEEE Globecom 2024

Subjects: Information Retrieval (cs.IR); Signal Processing (eess.SP)
[85] arXiv:2404.16305 (replaced) [pdf, other]: Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Authors: Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2404.16743 (replaced) [pdf, other]: Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation

Authors: Chanho Park, Mingjie Chen, Thomas Hain

Comments: Accepted to LREC-COLING 2024 (long)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
