Electrical Engineering and Systems Science

New submissions
Cross-lists
Replacements

Total of 94 entries

Showing up to 1000 entries per page: fewer | more | all

[1] arXiv:2405.07994 [pdf, ps, other]: Title: BubbleID: A Deep Learning Framework for Bubble Interface Dynamics Analysis

Christy Dunlap, Changgen Li, Hari Pandey, Ngan Le, Han Hu

Comments: 16 pages, 4 figures

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetime, and capturing dynamic events such as bubble departure. BubbleID is trained and tested on boiling images across diverse heater surfaces and operational settings. This paper also offers a comparative analysis of bubble interface dynamics prior to and post-critical heat flux (CHF) conditions.
[2] arXiv:2405.08049 [pdf, ps, html, other]: Title: Optimizing Synthetic Correlated Diffusion Imaging for Breast Cancer Tumour Delineation

Chi-en Amy Tai, Alexander Wong

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Breast cancer is a significant cause of death from cancer in women globally, highlighting the need for improved diagnostic imaging to enhance patient outcomes. Accurate tumour identification is essential for diagnosis, treatment, and monitoring, emphasizing the importance of advanced imaging technologies that provide detailed views of tumour characteristics and disease. Synthetic correlated diffusion imaging (CDI$^s$) is a recent method that has shown promise for prostate cancer delineation compared to current MRI images. In this paper, we explore tuning the coefficients in the computation of CDI$^s$ for breast cancer tumour delineation by maximizing the area under the receiver operating characteristic curve (AUC) using a Nelder-Mead simplex optimization strategy. We show that the best AUC is achieved by the CDI$^s$ - Optimized modality, outperforming the best gold-standard modality by 0.0044. Notably, the optimized CDI$^s$ modality also achieves AUC values over 0.02 higher than the Unoptimized CDI$^s$ value, demonstrating the importance of optimizing the CDI$^s$ exponents for the specific cancer application.
[3] arXiv:2405.08053 [pdf, ps, html, other]: Title: Radio Resource Management and Path Planning in Intelligent Transportation Systems via Reinforcement Learning for Environmental Sustainability

S. Norouzi, N. Azarasa, M.R. Abedi, N. Mokari, S.E. Seyedabrishami, H. Saeedi, E.A. Jorswieck

Comments: 6 pages, 5 figures, accepted and presented in International Conference on Innovation and Technological Advances for Sustainability

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Efficient and dynamic path planning has become an important topic for urban areas with larger density of connected vehicles (CV) which results in reduction of travel time and directly contributes to environmental sustainability through reducing energy consumption. CVs exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the vehicle-to-infrastructure (V2I) messages to the Base-station (BS) to improve situation awareness on urban roads. In this paper, we investigate radio resource management (RRM) in such a framework to minimize the age of information (AoI) so as to enhance path planning results. We use the fact that V2I messages with lower AoI value result in less error in estimating the road capacity and more accurate path planning. Through simulations, we compare road travel times and volume over capacity (V/C) against different levels of AoI and demonstrate the promising performance of the proposed framework.
[4] arXiv:2405.08076 [pdf, ps, html, other]: Title: Show Me the Way: Real-Time Tracking of Wireless Mobile Users with UWB-Enabled RIS

Kevin Weinberger, Simon Tewes, Aydin Sezgin

Comments: 6 pages, 12 figures, submitted to 19th International Symposium on Wireless Communication Systems (ISWCS 2024)

Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)

The integration of Reconfigurable Intelligent Surfaces (RIS) in 6G wireless networks offers unprecedented control over communication environments. However, identifying optimal configurations within practical constraints remains a significant challenge. This becomes especially pronounced, when the user is mobile and the configurations need to be deployed in real time. Leveraging Ultra-Wideband (UWB) as localization technique, we capture and analyze real-time movements of a user within the RIS-enabled indoor environment. Given this information about the system's geometry, a model-based optimization is utilized, which enables real-time beam steering of the RIS towards the user. However, practical limitations of UWB modules lead to fluctuating UWB estimates, causing the RIS beam to occasionally miss the tracked user. The methodologies proposed in this work aim to increase the compatibility between these two systems. To this end, we provide two key solutions: beam splitting for obtaining more robust RIS configurations and UWB estimation correction for reducing the variations in the UWB data. Through comprehensive theoretical and experimental evaluations in both stationary and mobile scenarios, the effectiveness of the proposed techniques is demonstrated. When combined, the proposed methods improve worst-case tracking performance by a significant 17.5dB compared to the conventional approach.
[5] arXiv:2405.08078 [pdf, ps, html, other]: Title: Dynamic Rate Splitting Grouping for Antifragile Responses to Wireless Network Disruptions

Kevin Weinberger, Aydin Sezgin

Comments: 6 pages, 2 figures, submitted to 19th International Symposium on Wireless Communication Systems (ISWCS 2024)

Subjects: Signal Processing (eess.SP)

The reliance on wireless network architectures for applications demanding high reliability and fault tolerance is growing. These architectures heavily depend on wireless channels, making them susceptible to impairments and blockages. Ensuring functionality, particularly for safety-critical applications, demands robust countermeasures at the physical layer. In response, this work proposes the utilization of a dynamic Rate Splitting (RS) grouping approach as a resilience mechanism during blockages. RS effectively manages interference within networks but faces challenges during outages and blockages, where system performance can deteriorate due to the lowest decoding rate dictating the common rate and increased interference from fewer available channel links. As a strategic countermeasure, RS is leveraged to mitigate the impact of blockages, maintaining system efficiency and performance amidst disruptions. In fact, the introduction of new RS groups enables the exploration of novel solutions to the resource allocation problem, potentially outperforming those adopted before the occurrence of a blockage. As it turns out, by employing the dynamic RS grouping, the network exhibits an antifragile recovery response, showcasing the network's ability to not only recover from disruptions but also surpass its initial performance.
[6] arXiv:2405.08096 [pdf, ps, html, other]: Title: Semantic MIMO Systems for Speech-to-Text Transmission

Zhenzi Weng, Zhijin Qin, Huiqiang Xie, Xiaoming Tao, Khaled B. Letaief

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate the transmission with high semantic fidelity to identify the critical semantic information and guarantee it is recovered accurately. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information.
[7] arXiv:2405.08119 [pdf, ps, html, other]: Title: GPS-IMU Sensor Fusion for Reliable Autonomous Vehicle Position Estimation

Simegnew Yihunie Alaba

Comments: 6 pages, 4 figures, and conference

Subjects: Systems and Control (eess.SY); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Global Positioning System (GPS) navigation provides accurate positioning with global coverage, making it a reliable option in open areas with unobstructed sky views. However, signal degradation may occur in indoor spaces and urban canyons. In contrast, Inertial Measurement Units (IMUs) consist of gyroscopes and accelerometers that offer relative motion information such as acceleration and rotational changes. Unlike GPS, IMUs do not rely on external signals, making them useful in GPS-denied environments. Nonetheless, IMUs suffer from drift over time due to the accumulation of errors while integrating acceleration to determine velocity and position. Therefore, fusing the GPS and IMU is crucial for enhancing the reliability and precision of navigation systems in autonomous vehicles, especially in environments where GPS signals are compromised. To ensure smooth navigation and overcome the limitations of each sensor, the proposed method fuses GPS and IMU data. This sensor fusion uses the Unscented Kalman Filter (UKF) Bayesian filtering technique. The proposed navigation system is designed to be robust, delivering continuous and accurate positioning critical for the safe operation of autonomous vehicles, particularly in GPS-denied environments. This project uses KITTI GNSS and IMU datasets for experimental validation, showing that the GNSS-IMU fusion technique reduces GNSS-only data's RMSE. The RMSE decreased from 13.214, 13.284, and 13.363 to 4.271, 5.275, and 0.224 for the x-axis, y-axis, and z-axis, respectively. The experimental result using UKF shows promising direction in improving autonomous vehicle navigation using GPS and IMU sensor fusion using the best of two sensors in GPS-denied environments.
[8] arXiv:2405.08169 [pdf, ps, html, other]: Title: Rethinking Histology Slide Digitization Workflows for Low-Resource Settings

Talat Zehra, Joseph Marino, Wendy Wang, Grigoriy Frantsuzov, Saad Nadeem

Comments: MICCAI 2024 Early Accept. First four authors contributed equally

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Histology slide digitization is becoming essential for telepathology (remote consultation), knowledge sharing (education), and using the state-of-the-art artificial intelligence algorithms (augmented/automated end-to-end clinical workflows). However, the cumulative costs of digital multi-slide high-speed brightfield scanners, cloud/on-premises storage, and personnel (IT and technicians) make the current slide digitization workflows out-of-reach for limited-resource settings, further widening the health equity gap; even single-slide manual scanning commercial solutions are costly due to hardware requirements (high-resolution cameras, high-spec PC/workstation, and support for only high-end microscopes). In this work, we present a new cloud slide digitization workflow for creating scanner-quality whole-slide images (WSIs) from uploaded low-quality videos, acquired from cheap and inexpensive microscopes with built-in cameras. Specifically, we present a pipeline to create stitched WSIs while automatically deblurring out-of-focus regions, upsampling input 10X images to 40X resolution, and reducing brightness/contrast and light-source illumination variations. We demonstrate the WSI creation efficacy from our workflow on World Health Organization-declared neglected tropical disease, Cutaneous Leishmaniasis (prevalent only in the poorest regions of the world and only diagnosed by sub-specialist dermatopathologists, rare in poor countries), as well as other common pathologies on core biopsies of breast, liver, duodenum, stomach and lymph node. The code and pretrained models will be accessible via our GitHub (this https URL), and the cloud platform will be available at this https URL for uploading microscope videos and downloading/viewing WSIs with shareable links (no sign-in required) for telepathology and knowledge sharing.
[9] arXiv:2405.08179 [pdf, ps, html, other]: Title: Do Bayesian imaging methods report trustworthy probabilities?

David Y. W. Thong, Charlesquin Kemajou Mbakam, Marcelo Pereyra

Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)

Bayesian statistics is a cornerstone of imaging sciences, underpinning many and varied approaches from Markov random fields to score-based denoising diffusion models. In addition to powerful image estimation methods, the Bayesian paradigm also provides a framework for uncertainty quantification and for using image data as quantitative evidence. These probabilistic capabilities are important for the rigorous interpretation of experimental results and for robust interfacing of quantitative imaging pipelines with scientific and decision-making processes. However, are the probabilities delivered by existing Bayesian imaging methods meaningful under replication of an experiment, or are they only meaningful as subjective measures of belief? This paper presents a Monte Carlo method to explore this question. We then leverage the proposed Monte Carlo method and run a large experiment requiring 1,000 GPU-hours to probe the accuracy of five canonical Bayesian imaging methods that are representative of some of the main Bayesian imaging strategies from the past decades (a score-based denoising diffusion technique, a plug-and-play Langevin algorithm utilising a Lipschitz-regularised DnCNN denoiser, a Bayesian method with a dictionary-based prior trained subject to a log-concavity constraint, an empirical Bayesian method with a total-variation prior, and a hierarchical Bayesian Gibbs sampler based on a Gaussian Markov random field model). We find that, a few cases, the probabilities reported by modern Bayesian imaging techniques are in broad agreement with long-term averages as observed over a large number of replication of an experiment, but existing Bayesian imaging methods are generally not able to deliver reliable uncertainty quantification results.
[10] arXiv:2405.08228 [pdf, ps, html, other]: Title: Slow Inter-area Electro-mechanical Oscillations Revisited: Structural Property of Complex Multi-area Electric Power Systems

Hiya Akhil Gada, Marija D. Ilic

Subjects: Systems and Control (eess.SY)

This paper introduces a physically-intuitive notion of inter-area dynamics in systems comprising multiple interconnected energy conversion modules. The ideas build on an earlier general approach to setting their structural properties by modeling first stand-alone modular dynamics starting from the fundamental relations between energy stored in modules (components, areas), and constraining explicitly their Tellegen's quantities, power and rate of change of power, in particular. In this paper we derive, by following the same principles, a transformed state-space model for a general nonlinear system. Using this model we show the existence of an area-level interaction variable, intVar, whose rate of change depends solely on the area internal power imbalance. Given these structural properties of stand-alone modules, we define in this paper for the first time an inter-area variable as the difference of power wave incident to tie-line from Area I and the power reflected into tie-lie from Area II. Notably, these power waves represent the rate of change of intVars associated with the two interconnected areas. We illustrate these notions using a linearized case of two lossless inter-connected areas, and show the existence of a new inter-area mode when the areas get connected. We suggest that lessons learned in this paper open possibilities for computationally-efficient modeling and control of inter-area oscillations, and offer further the basis for modeling and control of dynamics in changing systems comprising faster energy conversion processes.
[11] arXiv:2405.08247 [pdf, ps, other]: Title: Automated classification of multi-parametric body MRI series

Boah Kim, Tejas Sudharshan Mathai, Kimberly Helm, Ronald M. Summers

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

Multi-parametric MRI (mpMRI) studies are widely available in clinical practice for the diagnosis of various diseases. As the volume of mpMRI exams increases yearly, there are concomitant inaccuracies that exist within the DICOM header fields of these exams. This precludes the use of the header information for the arrangement of the different series as part of the radiologist's hanging protocol, and clinician oversight is needed for correction. In this pilot work, we propose an automated framework to classify the type of 8 different series in mpMRI studies. We used 1,363 studies acquired by three Siemens scanners to train a DenseNet-121 model with 5-fold cross-validation. Then, we evaluated the performance of the DenseNet-121 ensemble on a held-out test set of 313 mpMRI studies. Our method achieved an average precision of 96.6%, sensitivity of 96.6%, specificity of 99.6%, and F1 score of 96.6% for the MRI series classification task. To the best of our knowledge, we are the first to develop a method to classify the series type in mpMRI studies acquired at the level of the chest, abdomen, and pelvis. Our method has the capability for robust automation of hanging protocols in modern radiology practice.
[12] arXiv:2405.08277 [pdf, ps, other]: Title: AI-driven, Model-Free Current Control: A Deep Symbolic Approach for Optimal Induction Machine Performance

Muhammad Usama, Yunkyung Hwang, Jaehong Kim

Comments: This work has been accepted for potential publication at the IEEE ECCE Asia 2024 International Power Electronics and Motion Control Conference. Please note that copyright may be transferred without prior notice

Subjects: Systems and Control (eess.SY)

This paper proposed a straightforward and efficient current control solution for induction machines employing deep symbolic regression (DSR). The proposed DSR-based control design offers a simple yet highly effective approach by creating an optimal control model through training and fitting, resulting in an analytical dynamic numerical expression that characterizes the data. Notably, this approach not only produces an understandable model but also demonstrates the capacity to extrapolate and estimate data points outside its training dataset, showcasing its adaptability and resilience. In contrast to conventional state-of-the-art proportional-integral (PI) current controllers, which heavily rely on specific system models, the proposed DSR-based approach stands out for its model independence. Simulation and experimental tests validate its effectiveness, highlighting its superior extrapolation capabilities compared to conventional methods. These findings pave the way for the integration of deep learning methods in power conversion applications, promising improved performance and adaptability in the control of induction machines. The simulation and experimental test results are provided with a 3.7 kw induction machine to verify the efficacy of the proposed control solution.
[13] arXiv:2405.08282 [pdf, ps, other]: Title: Automatic Segmentation of the Kidneys and Cystic Renal Lesions on Non-Contrast CT Using a Convolutional Neural Network

Lucas Aronson (1), Ruben Ngnitewe Massaa (1), Syed Jamal Safdar Gardezi (1), Andrew L. Wentland (1,2,3) ((1) Department of Radiology, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (2) Department of Medical Physics, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (3) Department of Biomedical Engineering, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Objective: Automated segmentation tools are useful for calculating kidney volumes rapidly and accurately. Furthermore, these tools have the power to facilitate large-scale image-based artificial intelligence projects by generating input labels, such as for image registration algorithms. Prior automated segmentation models have largely ignored non-contrast computed tomography (CT) imaging. This work aims to implement and train a deep learning (DL) model to segment the kidneys and cystic renal lesions (CRLs) from non-contrast CT scans.
Methods: Manual segmentation of the kidneys and CRLs was performed on 150 non-contrast abdominal CT scans. The data were divided into an 80/20 train/test split and a deep learning (DL) model was trained to segment the kidneys and CRLs. Various scoring metrics were used to assess model performance, including the Dice Similarity Coefficient (DSC), Jaccard Index (JI), and absolute and percent error kidney volume and lesion volume. Bland-Altman (B-A) analysis was performed to compare manual versus DL-based kidney volumes.
Results: The DL model achieved a median kidney DSC of 0.934, median CRL DSC of 0.711, and total median study DSC of 0.823. Average volume errors were 0.9% for renal parenchyma, 37.0% for CRLs, and 2.2% overall. B-A analysis demonstrated that DL-based volumes tended to be greater than manual volumes, with a mean bias of +3.0 ml (+/- 2 SD of +/- 50.2 ml).
Conclusion: A deep learning model trained to segment kidneys and cystic renal lesions on non-contrast CT examinations was able to provide highly accurate segmentations, with a median kidney Dice Similarity Coefficient of 0.934.
Keywords: deep learning; kidney segmentation; artificial intelligence; convolutional neural networks.
[14] arXiv:2405.08288 [pdf, ps, other]: Title: Orthogonal Delay-Doppler Division Multiplexing Modulation with Tomlinson-Harashima Precoding

Yiyan Ma, Akram Shafie, Jinhong Yuan, Guoyu Ma, Zhangdui Zhong, Bo Ai

Subjects: Signal Processing (eess.SP)

The orthogonal delay-Doppler (DD) division multiplexing(ODDM) modulation has been recently proposed as a promising modulation scheme for next-generation communication systems with high mobility. Despite its benefits, ODDM modulation and other DD domain modulation schemes face the challenge of excessive equalization complexity. To address this challenge, we propose time domain Tomlinson-Harashima precoding (THP) for the ODDM transmitter, to make the DD domain single-tap equalizer feasible, thereby reducing the equalization complexity. In our design, we first pre-cancel the inter-symbolinterference (ISI) using the linear time-varying (LTV) channel information. Second, different from classical THP designs, we introduce a modified modulo operation with an adaptive modulus, by which the joint DD domain data multiplexing and timedomain ISI pre-cancellation can be realized without excessively increasing the bit errors. We then analytically study the losses encountered in this design, namely the power loss, the modulo noise loss, and the modulo signal loss. Based on this analysis, BER lower bounds of the ODDM system with time domain THP are derived when 4-QAM or 16-QAM modulations are adopted for symbol mapping in the DD domain. Finally, through numerical results, we validate our analysis and then demonstrate that the ODDM system with time domain THP is a promising solution to realize better BER performance over LTV channels compared to orthogonal frequency division multiplexing systems with single-tap equalizer and ODDM systems with maximum ratio combining.
[15] arXiv:2405.08292 [pdf, ps, other]: Title: Hybrid Event-Frame Neural Spike Detector for Neuromorphic Implantable BMI

Vivek Mohan, Wee Peng Tay, Arindam Basu

Comments: This paper has been accepted for 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

Subjects: Signal Processing (eess.SP)

This work introduces two novel neural spike detection schemes intended for use in next-generation neuromorphic brain-machine interfaces (iBMIs). The first, an Event-based Spike Detector (Ev-SPD) which examines the temporal neighborhood of a neural event for spike detection, is designed for in-vivo processing and offers high sensitivity and decent accuracy (94-97%). The second, Neural Network-based Spike Detector (NN-SPD) which operates on hybrid temporal event frames, provides an off-implant solution using shallow neural networks with impressive detection accuracy (96-99%) and minimal false detections. These methods are evaluated using a synthetic dataset with varying noise levels and validated through comparison with ground truth data. The results highlight their potential in next-gen neuromorphic iBMI systems and emphasize the need to explore this direction further to understand their resource-efficient and high-performance capabilities for practical iBMI settings.
[16] arXiv:2405.08349 [pdf, ps, other]: Title: Model-Free Unsupervised Anomaly detection framework in multivariate time-series of industrial dynamical systems

Mazen Alamir, Raphaël Dion

Comments: 21 pages, 2 tables, 10 figures

Subjects: Systems and Control (eess.SY)

In this paper, a new model-free anomaly detection framework is proposed for time-series induced by industrial dynamical systems. The framework lies in the category of conventional approaches which enable appealing features such as, a fast learning with reduced amount of learning data, a reduced memory, a high potential for explainability as well as easiness of incremental learning mechanism to incorporate operator feedback after an alarm is raised an analyzed. All these are crucial features towards acceptance of data-driven solution by industry but they are rarely considered in the comparisons between competing methods which generally exclusively focus on performance metrics. Moreover, the features engineering step involved in the proposed framework is inspired by the time-series being implicitly governed by physical laws as it is generally the case in industrial time-series. Two examples are given to assess the efficiency of the proposed approach.
[17] arXiv:2405.08353 [pdf, ps, other]: Title: Data-driven memory-dependent abstractions of dynamical systems via a Cantor-Kantorovich metric

Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers

Comments: Submitted to IEEE Transactions on Automatic Control

Subjects: Systems and Control (eess.SY)

Abstractions of dynamical systems enable their verification and the design of feedback controllers using simpler, usually discrete, models. In this paper, we propose a data-driven abstraction mechanism based on a novel metric between Markov models. Our approach is based purely on observing output labels of the underlying dynamics, thus opening the road for a fully data-driven approach to construct abstractions. Another feature of the proposed approach is the use of memory to better represent the dynamics in a given region of the state space. We show through numerical examples the usefulness of the proposed methodology.
[18] arXiv:2405.08392 [pdf, ps, other]: Title: Neuromorphic Robust Estimation of Nonlinear Dynamical Systems Applied to Satellite Rendezvous

Reza Ahmadvand, Sarah Safura Sharif, Yaser Mike Banad

Comments: 11 figures, 7 tables, 37 pages. arXiv admin note: text overlap with arXiv:2307.07963

Subjects: Systems and Control (eess.SY); Earth and Planetary Astrophysics (astro-ph.EP); Neural and Evolutionary Computing (cs.NE)

State estimation of nonlinear dynamical systems has long aimed to balance accuracy, computational efficiency, robustness, and reliability. The rapid evolution of various industries has amplified the demand for estimation frameworks that satisfy all these factors. This study introduces a neuromorphic approach for robust filtering of nonlinear dynamical systems: SNN-EMSIF (spiking neural network-extended modified sliding innovation filter). SNN-EMSIF combines the computational efficiency and scalability of SNNs with the robustness of EMSIF, an estimation framework designed for nonlinear systems with zero-mean Gaussian noise. Notably, the weight matrices are designed according to the system model, eliminating the need for a learning process. The framework's efficacy is evaluated through comprehensive Monte Carlo simulations, comparing SNN-EMSIF with EKF and EMSIF. Additionally, it is compared with SNN-EKF in the presence of modeling uncertainties and neuron loss, using RMSEs as a metric. The results demonstrate the superior accuracy and robustness of SNN-EMSIF. Further analysis of runtimes and spiking patterns reveals an impressive reduction of 85% in emitted spikes compared to possible spikes, highlighting the computational efficiency of SNN-EMSIF. This framework offers a promising solution for robust estimation in nonlinear dynamical systems, opening new avenues for efficient and reliable estimation in various industries that can benefit from neuromorphic computing.
[19] arXiv:2405.08417 [pdf, ps, other]: Title: Simple and Efficient Quantization Techniques for Neural Speech Coding

Andreas Brendel, Nicola Pia, Kishan Gupta, Guillaume Fuchs, Markus Multrus

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder has to be learned that allows for efficient transmission of the input audio signal. This discrete representation is typically generated by applying a quantizer to the output of the neural encoder. In almost all state-of-the-art neural audio coding approaches, this quantizer is realized as a Vector Quantizer (VQ) and a lot of effort has been spent to alleviate drawbacks of this quantization technique when used together with a neural audio coder. In this paper, we propose simple alternatives to VQ, which are based on projected Scalar Quantization (SQ). These quantization techniques do not need any additional losses, scheduling parameters or codebook storage thereby simplifying the training of neural audio codecs. Furthermore, we propose a new causal network architecture for neural speech coding that shows good performance at very low computational complexity.
[20] arXiv:2405.08423 [pdf, ps, html, other]: Title: NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and lightweighting the constituent modules. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces number of parameters by replacing 1x1 pointwise convolution in SCAM with weight-shared 3x3 depthwise convolution. Besides, we propose to incorporate trainable edge detection operator into NAFRSSR to further improve the model performance. Four variants of NAFRSSR with different sizes, namely, NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S) and NAFRSSR-Base (NAFRSSR-B) are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at this https URL.
[21] arXiv:2405.08428 [pdf, ps, other]: Title: A Low-Power Spike Detector Using In-Memory Computing for Event-based Neural Frontend

Ye Ke, Arindam Basu

Comments: Originally submitted at IEEE ISCAS 2024

Subjects: Signal Processing (eess.SP)

With the sensor scaling of next-generation Brain-Machine Interface (BMI) systems, the massive A/D conversion and analog multiplexing at the neural frontend poses a challenge in terms of power and data rates for wireless and implantable BMIs. While previous works have reported the neuromorphic compression of neural signal, further compression requires integration of spike detectors on chip. In this work, we propose an efficient HRAM-based spike detector using In-memory computing for compressive event-based neural frontend. Our proposed method involves detecting spikes from event pulses without reconstructing the signal and uses a 10T hybrid in-memory computing bitcell for the accumulation and thresholding operations. We show that our method ensures a spike detection accuracy of 92-99% for neural signal inputs while consuming only 13.8 nW per channel in 65 nm CMOS.
[22] arXiv:2405.08431 [pdf, ps, other]: Title: Similarity Metrics for MR Image-To-Image Translation

Melanie Dohmen (1), Mark Klemens (1), Ivo Baltruschat (1), Tuan Truong (1), Matthias Lenga (1) ((1) Bayer AG, Berlin, Germany)

Comments: 29 pages, 6 figures, appendix with 5 figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Image-to-image translation can create large impact in medical imaging, i.e. if images of a patient can be translated to another modality, type or sequence for better diagnosis. However, these methods must be validated by human reader studies, which are costly and restricted to small samples. Automatic evaluation of large samples to pre-evaluate and continuously improve methods before human validation is needed. In this study, we give an overview of reference and non-reference metrics for image synthesis assessment and investigate the ability of nine metrics, that need a reference (SSIM, MS-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, NMI and PCC) and three non-reference metrics (BLUR, MSN, MNG) to detect 11 kinds of distortions in MR images from the BraSyn dataset. In addition we test a downstream segmentation metric and the effect of three normalization methods (Minmax, cMinMax and Zscore). Although PSNR and SSIM are frequently used to evaluate generative models for image-to-image-translation tasks in the medical domain, they show very specific shortcomings. SSIM ignores blurring but is very sensitive to intensity shifts in unnormalized MR images. PSNR is even more sensitive to different normalization methods and hardly measures the degree of distortions. Further metrics, such as LPIPS, NMI and DICE can be very useful to evaluate other similarity aspects. If the images to be compared are misaligned, most metrics are flawed. By carefully selecting and reasonably combining image similarity metrics, the training and selection of generative models for MR image synthesis can be improved. Many aspects of their output can be validated before final and costly evaluation by trained radiologists is conducted.
[23] arXiv:2405.08512 [pdf, ps, other]: Title: CFM6, a closed-form NLI EGN model supporting multiband transmission with arbitrary Raman amplification

Yanchao Jiang, Pierluigi Poggiolini

Subjects: Signal Processing (eess.SP)

We formulated a closed-form EGN model for nonlinear interference in ultra-wideband optical systems with arbitrary Raman amplification. This model enhanced the CISCO-POLITO-CFM5 performance by introducing a novel contribution attributed to the backward Raman amplification. It can handle the frequency-dependent fiber parameters and inter-channel stimulated Raman scattering.
[24] arXiv:2405.08521 [pdf, ps, other]: Title: Cooperative Sensing of Side Lobes Interference for mmWave Blockages Localization and Mapping

Hiba Dakdouk, Mohamed Sana, Benoit Denis

Comments: This work has been accepted for publication in 2024 European Conference on Networks and Communications (EuCNC) & 6G Summit. arXiv admin note: text overlap with arXiv:2306.17650

Subjects: Signal Processing (eess.SP)

Radio localization and sensing are anticipated to play a crucial role in enhancing radio resource management in future networks. In this work, we focus on millimeter-wave communications, which are highly vulnerable to blockages, leading to severe attenuation and performance degradation. In a previous work, we proposed a novel mechanism that senses the radio environment to estimate the angular position of a moving blocker with respect to the sensing node. Building upon this foundation, this paper investigates the benefits of cooperation between different entities in the network by sharing sensed data to jointly locate the moving blocker while mapping the interference profile to probe the radio environment. Numerical evaluations demonstrate that cooperative sensing can achieve a more precise location estimation of the blocker as it further allows accurate estimation of its distance rather than its relative angular position only, leading to effective assessment of the blocker direction, trajectory and possibly, its speed, and size.
[25] arXiv:2405.08530 [pdf, ps, other]: Title: Parameter-Efficient Instance-Adaptive Neural Video Compression

Hyunmo Yang, Seungjun Oh, Eunbyung Park

Comments: 23 pages, 13 figures

Subjects: Image and Video Processing (eess.IV)

Learning-based Neural Video Codecs (NVCs) have emerged as a compelling alternative to the standard video codecs, demonstrating promising performance, and simple and easily maintainable pipelines. However, NVCs often fall short of compression performance and occasionally exhibit poor generalization capability due to inference-only compression scheme and their dependence on training data. The instance-adaptive video compression techniques have recently been suggested as a viable solution, fine-tuning the encoder or decoder networks for a particular test instance video. However, fine-tuning all the model parameters incurs high computational costs, increases the bitrates, and often leads to unstable training. In this work, we propose a parameter-efficient instance-adaptive video compression framework. Inspired by the remarkable success of parameter-efficient fine-tuning on large-scale neural network models, we propose to use a lightweight adapter module that can be easily attached to the pretrained NVCs and fine-tuned for test video sequences. The resulting algorithm significantly improves compression performance and reduces the encoding time compared to the existing instant-adaptive video compression algorithms. Furthermore, the suggested fine-tuning method enhances the robustness of the training process, allowing for the proposed method to be widely used in many practical settings. We conducted extensive experiments on various standard benchmark datasets, including UVG, MCL-JVC, and HEVC sequences, and the experimental results have shown a significant improvement in rate-distortion (RD) curves (up to 5 dB PSNR improvements) and BD rates compared to the baselines NVC.
[26] arXiv:2405.08556 [pdf, ps, other]: Title: Shape-aware synthesis of pathological lung CT scans using CycleGAN for enhanced semi-supervised lung segmentation

Rezkellah Noureddine Khiati, Pierre-Yves Brillet, Aurélien Justet, Radu Ispa, Catalin Fetita

Comments: 14 pages, 7 figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper addresses the problem of pathological lung segmentation, a significant challenge in medical image analysis, particularly pronounced in cases of peripheral opacities (severe fibrosis and consolidation) because of the textural similarity between lung tissue and surrounding areas. To overcome these challenges, this paper emphasizes the use of CycleGAN for unpaired image-to-image translation, in order to provide an augmentation method able to generate fake pathological images matching an existing ground truth. Although previous studies have employed CycleGAN, they often neglect the challenge of shape deformation, which is crucial for accurate medical image segmentation. Our work introduces an innovative strategy that incorporates additional loss functions. Specifically, it proposes an L1 loss based on the lung surrounding which shape is constrained to remain unchanged at the transition from the healthy to pathological domains. The lung surrounding is derived based on ground truth lung masks available in the healthy domain. Furthermore, preprocessing steps, such as cropping based on ribs/vertebra locations, are applied to refine the input for the CycleGAN, ensuring that the network focus on the lung region. This is essential to avoid extraneous biases, such as the zoom effect bias, which can divert attention from the main task. The method is applied to enhance in semi-supervised manner the lung segmentation process by employing a U-Net model trained with on-the-fly data augmentation incorporating synthetic pathological tissues generated by the CycleGAN model. Preliminary results from this research demonstrate significant qualitative and quantitative improvements, setting a new benchmark in the field of pathological lung segmentation. Our code is available at this https URL
[27] arXiv:2405.08585 [pdf, ps, other]: Title: Design of a Multi-User RIS-Aided System with Statistical Channel Knowledge

Sadaf Syed, Dominik Semmler, Donia Ben Amor, Michael Joham, Wolfgang Utschick

Subjects: Signal Processing (eess.SP)

Reconfigurable intelligent surface (RIS) is a promising technology to enhance the spectral and energy efficiency in a wireless communication system. The design of the phase shifts of an RIS in every channel coherence interval demands a huge training overhead, making its deployment practically infeasible. The design complexity can be significantly reduced by exploiting the second-order statistics of the channels. This paper is the extension of our previous work to the design of an RIS for the multi-user setup, where we employ maximisation of the lower bound of the achievable sum-rate of the users. Unlike for the single-user case, obtaining a closed-form expression for the update of the filters and phase shifts is more challenging in the multi-user case. We resort to the fractional programming (FP) approach and the non-convex block coordinate descent (BCD) method to solve the optimisation problem. As the phase shifts of the RIS obtained by the proposed algorithms are based on the statistical channel knowledge, they do not need to be updated in every channel coherence interval.
[28] arXiv:2405.08599 [pdf, ps, other]: Title: The distributed biased min-consensus protocol revisited: pre-specified finite time control strategies and small-gain based analysis

Yuanqiu Mo, He Wang

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Unlike the classical distributed consensus protocols enabling the group of agents as a whole to reach an agreement regarding a certain quantity of interest in a distributed fashion, the distributed biased min-consensus protocol (DBMC) has been proven to generate advanced complexity pertaining to solving the shortest path problem. As such a protocol is commonly incorporated as the first step of a hierarchical architecture in real applications, e.g., robots path planning, management of dispersed computing services, an impedance limiting the application potential of DBMC lies in, the lack of results regarding to its convergence within a user-assigned time. In this paper, we first propose two control strategies ensuring the state error of DBMC decrease exactly to zero or a desired level manipulated by the user, respectively. To compensate the high feedback gains incurred by these two control strategies, this paper further investigates the nominal DBMC itself. By leveraging small gain based stability tools, this paper also proves the global exponential input-to-state stability of DBMC, outperforming its current stability results. Simulations have been provided to validate the efficacy of our theoretical result.
[29] arXiv:2405.08614 [pdf, ps, other]: Title: FDD Massive MIMO: How to Optimally Combine UL Pilot and Limited DL CSI Feedback?

Jungyeon Kim, Jinseok Choi, Jeonghun Park, Ahmed Alkhateeb, Namyoon Lee

Comments: 13 pages, 10 figures

Subjects: Signal Processing (eess.SP)

In frequency-division duplexing (FDD) multiple-input multiple-output (MIMO) systems, obtaining accurate downlink channel state information (CSI) for precoding is vastly challenging due to the tremendous feedback overhead with the growing number of antennas. Utilizing uplink pilots for downlink CSI estimation is a promising approach that can eliminate CSI feedback. However, the downlink CSI estimation accuracy diminishes significantly as the number of channel paths increases, resulting in reduced spectral efficiency. In this paper, we demonstrate that achieving downlink spectral efficiency comparable to perfect CSI is feasible by combining uplink CSI with limited downlink CSI feedback information. Our proposed downlink CSI feedback strategy transmits quantized phase information of downlink channel paths, deviating from conventional limited methods. We put forth a mean square error (MSE)-optimal downlink channel reconstruction method by jointly exploiting the uplink CSI and the limited downlink CSI. Armed with the MSE-optimal estimator, we derive the MSE as a function of the number of feedback bits for phase quantization. Subsequently, we present an optimal feedback bit allocation method for minimizing the MSE in the reconstructed channel through phase quantization. Utilizing a robust downlink precoding technique, we establish that the proposed downlink channel reconstruction method is sufficient for attaining a sum-spectral efficiency comparable to perfect CSI.
[30] arXiv:2405.08621 [pdf, ps, other]: Title: RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, David Bull

Comments: 8pages, 2figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.
[31] arXiv:2405.08632 [pdf, ps, other]: Title: Instantaneous Bandwidth Estimation from Level-Crossing Samples via LSTM-based Encoder-Decoder Architecture

Johannes Königs, Carsten Bockelmann, Armin Dekorsy

Comments: Submitted to the 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2024)

Subjects: Signal Processing (eess.SP)

This paper presents an approach for instantaneous bandwidth estimation from level-crossing samples using a long short-term memory (LSTM) encoder-decoder architecture. Level-crossing sampling is a nonuniform sampling technique that is particularly useful for energy-efficient acquisition of signals with sparse spectra. Especially in combination with fully analog wireless sensor nodes, level-crossing sampling offers a viable alternative to traditional sampling methods. However, due to the nonuniform distribution of samples, reconstructing the original signal is a challenging task. One promising reconstruction approach is time-warping, where the local signal spectrum is taken into account. However, this requires an accurate estimate of the instantaneous bandwidth of the signal. In this paper, we show that applying neural networks (NNs) to the problem of estimating instantaneous bandwidth from level-crossing samples can improve the overall reconstruction accuracy. We conduct a comprehensive numerical analysis of the proposed approach and compare it to an intensity-based bandwidth estimation method from literature.
[32] arXiv:2405.08657 [pdf, ps, other]: Title: Self-supervised learning improves robustness of deep learning lung tumor segmentation to CT imaging differences

Jue Jiang, Aneesh Rangnekar, Harini Veeraraghavan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Self-supervised learning (SSL) is an approach to extract useful feature representations from unlabeled data, and enable fine-tuning on downstream tasks with limited labeled examples. Self-pretraining is a SSL approach that uses the curated task dataset for both pretraining the networks and fine-tuning them. Availability of large, diverse, and uncurated public medical image sets provides the opportunity to apply SSL in the "wild" and potentially extract features robust to imaging variations. However, the benefit of wild- vs self-pretraining has not been studied for medical image analysis. In this paper, we compare robustness of wild versus self-pretrained transformer (vision transformer [ViT] and hierarchical shifted window [Swin]) models to computed tomography (CT) imaging differences for non-small cell lung cancer (NSCLC) segmentation. Wild-pretrained Swin models outperformed self-pretrained Swin for the various imaging acquisitions. ViT resulted in similar accuracy for both wild- and self-pretrained models. Masked image prediction pretext task that forces networks to learn the local structure resulted in higher accuracy compared to contrastive task that models global image information. Wild-pretrained models resulted in higher feature reuse at the lower level layers and feature differentiation close to output layer after fine-tuning. Hence, we conclude: Wild-pretrained networks were more robust to analyzed CT imaging differences for lung tumor segmentation than self-pretrained methods. Swin architecture benefited from such pretraining more than ViT.
[33] arXiv:2405.08658 [pdf, ps, other]: Title: Beyond the Black Box: Do More Complex Models Provide Superior XAI Explanations?

Mateusz Cedro, Marcin Chlebus

Comments: 15 pages, 9 figures, 5 tables

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The increasing complexity of Artificial Intelligence models poses challenges to interpretability, particularly in the healthcare sector. This study investigates the impact of deep learning model complexity and Explainable AI (XAI) efficacy, utilizing four ResNet architectures (ResNet-18, 34, 50, 101). Through methodical experimentation on 4,369 lung X-ray images of COVID-19-infected and healthy patients, the research evaluates models' classification performance and the relevance of corresponding XAI explanations with respect to the ground-truth disease masks. Results indicate that the increase in model complexity is associated with a decrease in classification accuracy and AUC-ROC scores (ResNet-18: 98.4%, 0.997; ResNet-101: 95.9%, 0.988). Notably, in eleven out of twelve statistical tests performed, no statistically significant differences occurred between XAI quantitative metrics - Relevance Rank Accuracy and the proposed Positive Attribution Ratio - across trained models. These results suggest that increased model complexity does not consistently lead to higher performance or relevance of explanations for models' decision-making processes.
[34] arXiv:2405.08672 [pdf, ps, other]: Title: EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Comments: early accepted by MICCAI 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adaptation methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC) which is an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundational model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and unaware of the ground truth camera intrinsics. Code is available at this https URL.
[35] arXiv:2405.08706 [pdf, ps, other]: Title: Design and Analysis of Resilient Vehicular Platoon Systems over Wireless Networks

Tingyu Shui, Walid Saad

Comments: 6 pages, 4 figures, in submission of Globecom 2024

Subjects: Systems and Control (eess.SY)

Connected vehicular platoons provide a promising solution to improve traffic efficiency and ensure road safety. Vehicles in a platoon utilize on-board sensors and wireless vehicle-to-vehicle (V2V) links to share traffic information for cooperative adaptive cruise control. To process real-time control and alert information, there is a need to ensure clock synchronization among the platoon's vehicles. However, adversaries can jeopardize the operation of the platoon by attacking the local clocks of vehicles, leading to clock offsets with the platoon's reference clock. In this paper, a novel framework is proposed for analyzing the resilience of vehicular platoons that are connected using V2V links. In particular, a resilient design based on a diffusion protocol is proposed to re-synchronize the attacked vehicle through wireless V2V links thereby mitigating the impact of variance of the transmission delay during recovery. Then, a novel metric named temporal conditional mean exceedance is defined and analyzed in order to characterize the resilience of the platoon. Subsequently, the conditions pertaining to the V2V links and recovery time needed for a resilient design are derived. Numerical results show that the proposed resilient design is feasible in face of a nine-fold increase in the variance of transmission delay compared to a baseline designed for reliability. Moreover, the proposed approach improves the reliability, defined as the probability of meeting a desired clock offset error requirement, by 45% compared to the baseline.
[36] arXiv:2405.08742 [pdf, ps, other]: Title: A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

Yicheng Hsu, Mingsian R. Bai

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) and Enhanced BAT (E-BAT). With I-BAT, our goal is to preserve the full ambience as if we were at the far end, while with E-BAT, our goal is to enhance the far-end conversation with significantly improved speech quality and intelligibility. To this end, this paper presents a tunable BAT system to vary between these two AT modes with a desired application-specific balance. Microphone signals are converted into binaural signals with prescribed ambience factor. A novel Spatial COherence REpresentation (SCORE) is proposed as an input feature for model training so that the network remains robust to different array setups. Experimental results demonstrated the superior performance of the proposed BAT, even when the array configurations were not included in the training phase.
[37] arXiv:2405.08745 [pdf, ps, other]: Title: Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model to handle complex distortions and diverse content of social media videos. Specifically, we use SimpleVQA, a BVQA model that consists of a trainable Swin Transformer-B and a fixed SlowFast, as our base model. The Swin Transformer-B and SlowFast components are responsible for extracting spatial and motion features, respectively. Then, we extract three kinds of features from Q-Align, LIQE, and FAST-VQA to capture frame-level quality-aware features, frame-level quality-aware along with scene-specific features, and spatiotemporal quality-aware features, respectively. Through concatenating these features, we employ a multi-layer perceptron (MLP) network to regress them into quality scores. Experimental results demonstrate that the proposed model achieves the best performance on three public social media VQA datasets. Moreover, the proposed model won first place in the CVPR NTIRE 2024 Short-form UGC Video Quality Assessment Challenge. The code is available at \url{this https URL}.
[38] arXiv:2405.08756 [pdf, ps, other]: Title: Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes

Samuel Tesfazgi, Leonhard Sprandl, Armin Lederer, Sandra Hirche

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, it is also computationally demanding and generally lacks convergence guarantees. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world data.
[39] arXiv:2405.08783 [pdf, ps, html, other]: Title: The Developing Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction

Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

Subjects: Image and Video Processing (eess.IV)

The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical mapping approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 seconds on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. Manual quality control demonstrates that for 82.5% of the test samples, our DL-based pipeline produces superior (54.2%) or equal quality (28.3%) cortical surfaces compared to the original dHCP pipeline.
[40] arXiv:2405.08790 [pdf, ps, other]: Title: Kolmogorov-Arnold Networks (KANs) for Time Series Analysis

Cristian J. Vaca-Rubio, Luis Blanco, Roberto Pereira, Màrius Caus

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper introduces a novel application of Kolmogorov-Arnold Networks (KANs) to time series forecasting, leveraging their adaptive activation functions for enhanced predictive modeling. Inspired by the Kolmogorov-Arnold representation theorem, KANs replace traditional linear weights with spline-parametrized univariate functions, allowing them to learn activation patterns dynamically. We demonstrate that KANs outperforms conventional Multi-Layer Perceptrons (MLPs) in a real-world satellite traffic forecasting task, providing more accurate results with considerably fewer number of learnable parameters. We also provide an ablation study of KAN-specific parameters impact on performance. The proposed approach opens new avenues for adaptive forecasting models, emphasizing the potential of KANs as a powerful tool in predictive analytics.
[41] arXiv:2405.08800 [pdf, ps, other]: Title: Estimation of Participation Factors for Power System Oscillation from Measurements

Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Kaiyang Huang

Subjects: Systems and Control (eess.SY)

In a power system, when the participation factors of generators are computed to rank their participations into an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.

[42] arXiv:2403.01512 (cross-list from cs.RO) [pdf, ps, html, other]: Title: Cooperative Automated Driving for Bottleneck Scenarios in Mixed Traffic

M.V. Baumann, J. Beyerer, H.S. Buck, B. Deml, S. Ehrhardt, Ch. Frese, D. Kleiser, M. Lauer, M. Roschani, M. Ruf, Ch. Stiller, P. Vortisch, J.R. Ziehn

Comments: 8 pages, 7 figures

Journal-ref: 35th IEEE Intelligent Vehicles Symposium (IV 2023)

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Connected automated vehicles (CAV), which incorporate vehicle-to-vehicle (V2V) communication into their motion planning, are expected to provide a wide range of benefits for individual and overall traffic flow. A frequent constraint or required precondition is that compatible CAVs must already be available in traffic at high penetration rates. Achieving such penetration rates incrementally before providing ample benefits for users presents a chicken-and-egg problem that is common in connected driving development. Based on the example of a cooperative driving function for bottleneck traffic flows (e.g. at a roadblock), we illustrate how such an evolutionary, incremental introduction can be achieved under transparent assumptions and objectives. To this end, we analyze the challenge from the perspectives of automation technology, traffic flow, human factors and market, and present a principle that 1) accounts for individual requirements from each domain; 2) provides benefits for any penetration rate of compatible CAVs between 0 % and 100 % as well as upward-compatibility for expected future developments in traffic; 3) can strictly limit the negative effects of cooperation for any participant and 4) can be implemented with close-to-market technology. We discuss the technical implementation as well as the effect on traffic flow over a wide parameter spectrum for human and technical aspects.
[43] arXiv:2405.08021 (cross-list from cs.SD) [pdf, ps, html, other]: Title: Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand, Tanja Schultz

Comments: Accepted by EMBC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness. In this work, we propose Diff-ETS, an ETS model which uses a score-based diffusion probabilistic model to enhance the naturalness of synthesised speech. The diffusion model is applied to improve the quality of the acoustic features predicted by an EMG encoder. In our experiments, we evaluated fine-tuning the diffusion model on predictions of a pre-trained EMG encoder, and training both models in an end-to-end fashion. We compared Diff-ETS with a baseline ETS model without diffusion using objective metrics and a listening test. The results indicated the proposed Diff-ETS significantly improved speech naturalness over the baseline.
[44] arXiv:2405.08122 (cross-list from cs.RO) [pdf, ps, html, other]: Title: Equivariant Deep Learning of Mixed-Integer Optimal Control Solutions for Vehicle Decision Making and Motion Planning

Rudolf Reiter, Rien Quirynen, Moritz Diehl, Stefano Di Cairano

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Mixed-integer quadratic programs (MIQPs) are a versatile way of formulating vehicle decision making and motion planning problems, where the prediction model is a hybrid dynamical system that involves both discrete and continuous decision variables. However, even the most advanced MIQP solvers can hardly account for the challenging requirements of automotive embedded platforms. Thus, we use machine learning to simplify and hence speed up optimization. Our work builds on recent ideas for solving MIQPs in real-time by training a neural network to predict the optimal values of integer variables and solving the remaining problem by online quadratic programming. Specifically, we propose a recurrent permutation equivariant deep set that is particularly suited for imitating MIQPs that involve many obstacles, which is often the major source of computational burden in motion planning problems. Our framework comprises also a feasibility projector that corrects infeasible predictions of integer variables and considerably increases the likelihood of computing a collision-free trajectory. We evaluate the performance, safety and real-time feasibility of decision-making for autonomous driving using the proposed approach on realistic multi-lane traffic scenarios with interactive agents in SUMO simulations.
[45] arXiv:2405.08199 (cross-list from cs.LG) [pdf, ps, other]: Title: Modeling of Time-varying Wireless Communication Channel with Fading and Shadowing

Lee Youngmin, Ma Xiaomin, Lang S.I.D Andrew

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

The real-time quantification of the effect of a wireless channel on the transmitting signal is crucial for the analysis and the intelligent design of wireless communication systems for various services. Recent mechanisms to model channel characteristics independent of coding, modulation, signal processing, etc., using deep learning neural networks are promising solutions. However, the current approaches are neither statistically accurate nor able to adapt to the changing environment. In this paper, we propose a new approach that combines a deep learning neural network with a mixture density network model to derive the conditional probability density function (PDF) of receiving power given a communication distance in general wireless communication systems. Furthermore, a deep transfer learning scheme is designed and implemented to allow the channel model to dynamically adapt to changes in communication environments. Extensive experiments on Nakagami fading channel model and Log-normal shadowing channel model with path loss and noise show that the new approach is more statistically accurate, faster, and more robust than the previous deep learning-based channel models.
[46] arXiv:2405.08237 (cross-list from cs.CL) [pdf, ps, html, other]: Title: A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater

Comments: Accepted to CogSci 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech with the learning objective of predicting upcoming acoustics. Our simulations revealed temporal dynamics similar to those in brain signals, implying that these properties can arise without linguistic knowledge. Another property shared between brains and the model is that the encoding patterns of phonemes support some degree of cross-context generalization. However, we found evidence that the effectiveness of these generalizations depends on the specific contexts, which suggests that this analysis alone is insufficient to support the presence of context-invariant encoding.
[47] arXiv:2405.08295 (cross-list from cs.CL) [pdf, ps, other]: Title: SpeechVerse: A Large-scale Generalizable Audio Language Model

Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: Single Column, 13 page

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
[48] arXiv:2405.08306 (cross-list from math.OC) [pdf, ps, other]: Title: Flight Path Optimization with Optimal Control Method

Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control(CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we will identify a precise objective in terms of optimization and implement an optimization program to solve it under the circumstances of simulated real flight situation. Finally, the optimization result is validated and discussed by different scenarios.
[49] arXiv:2405.08317 (cross-list from cs.CL) [pdf, ps, other]: Title: SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: 9+6 pages, Submitted to ACL 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce the attack success significantly.
[50] arXiv:2405.08342 (cross-list from cs.SD) [pdf, ps, other]: Title: Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer

Whenty Ariyanti, Kai-Chun Liu, Kuan-Yu Chen, Yu Tsao

Comments: Published in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Journal-ref: 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (2023) 1-4

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Respiratory disease, the third leading cause of deaths globally, is considered a high-priority ailment requiring significant research on identification and treatment. Stethoscope-recorded lung sounds and artificial intelligence-powered devices have been used to identify lung disorders and aid specialists in making accurate diagnoses. In this study, audio-spectrogram vision transformer (AS-ViT), a new approach for identifying abnormal respiration sounds, was developed. The sounds of the lungs are converted into visual representations called spectrograms using a technique called short-time Fourier transform (STFT). These images are then analyzed using a model called vision transformer to identify different types of respiratory sounds. The classification was carried out using the ICBHI 2017 database, which includes various types of lung sounds with different frequencies, noise levels, and backgrounds. The proposed AS-ViT method was evaluated using three metrics and achieved 79.1% and 59.8% for 60:40 split ratio and 86.4% and 69.3% for 80:20 split ratio in terms of unweighted average recall and overall scores respectively for respiratory sound detection, surpassing previous state-of-the-art results.
[51] arXiv:2405.08343 (cross-list from cs.RO) [pdf, ps, other]: Title: Accuracy Evaluation of a Lightweight Analytic Vehicle Dynamics Model for Maneuver Planning

J.R. Ziehn, M. Ruf, M. Roschani, J. Beyerer

Comments: 9 pages, 13 figures

Journal-ref: 2020 5th International Conference on Robotics and Automation Engineering (ICRAE)

Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Numerical Analysis (math.NA)

Models for vehicle dynamics play an important role in maneuver planning for automated driving. They are used to derive trajectories from given control inputs, or to evaluate a given trajectory in terms of constraint violation or optimality criteria such as safety, comfort or ecology. Depending on the computation process, models with different assumptions and levels of detail are used; since maneuver planning usually has strong requirements for computation speed at a potentially high number of trajectory evaluations per planning cycle, most of the applied models aim to reduce complexity by implicitly or explicitly introducing simplifying assumptions. While evaluations show that these assumptions may be sufficiently valid under typical conditions, their effect has yet to be studied conclusively.
We propose a model for vehicle dynamics that is convenient for maneuver planning by supporting both an analytic approach of extracting parameters from a given trajectory, and a generative approach of establishing a trajectory from given control inputs. Both applications of the model are evaluated in real-world test drives under dynamic conditions, both on a closed-off test track and on public roads, and effects arising from the simplifying assumptions are analyzed.
[52] arXiv:2405.08396 (cross-list from cs.IT) [pdf, ps, other]: Title: Coupled-Band ESSFM for Low-Complexity DBP

Stella Civelli, Debi Pada Jana, Enrico Forestieri, Marco Secondini

Comments: The paper has been submitted for publication to ECOC 2024

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We propose a novel digital backpropagation (DBP) technique that combines perturbation theory, subband processing, and splitting ratio optimization. We obtain 0.23 dB, 0.47 dB, or 0.91 dB gains w.r.t. dispersion compensation with only 74, 161, or 681 real multiplications/2D-symbol, improving significantly on existing DBP techniques.
[53] arXiv:2405.08401 (cross-list from cs.RO) [pdf, ps, other]: Title: Realtime Global Optimization of a Fail-Safe Emergency Stop Maneuver for Arbitrary Electrical / Electronical Failures in Automated Driving

F. Duerr, J. Ziehn, R. Kohlhaas, M. Roschani, M. Ruf, J. Beyerer

Comments: 8 pages, 7 figures

Journal-ref: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)

Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Numerical Analysis (math.NA)

In the event of a critical system failures in auto-mated vehicles, fail-operational or fail-safe measures provide minimum guarantees for the vehicle's performance, depending on which of its subsystems remain operational. Various such methods have been proposed which, upon failure, use different remaining sets of operational subsystems to execute maneuvers that bring the vehicle into a safe state under different environmental conditions. One particular such method proposes a fail-safe emergency stop system that requires no particular electric or electronic subsystem to be available after failure, and still provides a basic situation-dependent emergency stop maneuver. This is achieved by preemptively setting parameters to a hydraulic / mechanical system prior to failure, which after failure executes the preset maneuver "blindly". The focus of this paper is the particular challenge of implementing a lightweight planning algorithm that can cope with the complex uncertainties of the given task while still providing a globally optimal solution at regular intervals, based on the perceived and predicted environment of the automated vehicle.
[54] arXiv:2405.08527 (cross-list from cs.LG) [pdf, ps, other]: Title: EEG-Features for Generalized Deepfake Detection

Arian Beckmann, Tilman Stephani, Felix Klotzsche, Yonghao Chen, Simon M. Hofmann, Arno Villringer, Michael Gaebler, Vadim Nikulin, Sebastian Bosse, Peter Eisert, Anna Hilsmann

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

Since the advent of Deepfakes in digital media, the development of robust and reliable detection mechanism is urgently called for. In this study, we explore a novel approach to Deepfake detection by utilizing electroencephalography (EEG) measured from the neural processing of a human participant who viewed and categorized Deepfake stimuli from the FaceForensics++ datset. These measurements serve as input features to a binary support vector classifier, trained to discriminate between real and manipulated facial images. We examine whether EEG data can inform Deepfake detection and also if it can provide a generalized representation capable of identifying Deepfakes beyond the training domain. Our preliminary results indicate that human neural processing signals can be successfully integrated into Deepfake detection frameworks and hint at the potential for a generalized neural representation of artifacts in computer generated faces. Moreover, our study provides next steps towards the understanding of how digital realism is embedded in the human cognitive system, possibly enabling the development of more realistic digital avatars in the future.
[55] arXiv:2405.08567 (cross-list from cs.LG) [pdf, ps, other]: Title: Python-Based Reinforcement Learning on Simulink Models

Georg Schäfer, Max Schirl, Jakob Rehrl, Stefan Huber, Simon Hirlaender

Comments: Accepted at SMPS2024

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

This paper proposes a framework for training Reinforcement Learning agents using Python in conjunction with Simulink models. Leveraging Python's superior customization options and popular libraries like Stable Baselines3, we aim to bridge the gap between the established Simulink environment and the flexibility of Python for training bleeding edge agents. Our approach is demonstrated on the Quanser Aero 2, a versatile dual-rotor helicopter. We show that policies trained on Simulink models can be seamlessly transferred to the real system, enabling efficient development and deployment of Reinforcement Learning agents for control tasks. Through systematic integration steps, including C-code generation from Simulink, DLL compilation, and Python interface development, we establish a robust framework for training agents on Simulink models. Experimental results demonstrate the effectiveness of our approach, surpassing previous efforts and highlighting the potential of combining Simulink with Python for Reinforcement Learning research and applications.
[56] arXiv:2405.08569 (cross-list from cs.NI) [pdf, ps, other]: Title: Can 3GPP New Radio Non-Terrestrial Networks Meet the IMT-2020 Requirements for Satellite Radio Interface Technology?

Mikko Majamaa, Lauri Sormunen, Verneri Rönty, Henrik Martikainen, Jani Puttonen, Timo Hämäläinen

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

The International Telecommunication Union defined the requirements for 5G in the International Mobile Telecommunications 2020 (IMT-2020) standard in 2017. Since then, advances in technology and standardization have made the ubiquitous deployment of 5G via satellite a practical possibility, for example, in locations where terrestrial networks (TNs) are not available. However, it may be difficult for satellite networks to achieve the same performance as TNs. To address this, the IMT-2020 requirements for satellite radio interface technology have recently been established. In this paper, these requirements are evaluated through system simulations for the 3rd Generation Partnership Project New Radio non-terrestrial networks with a low Earth orbit satellite. The focus is on the throughput, area traffic capacity, and spectral efficiency requirements. It is observed that the downlink (DL) requirements can be met for user equipment with 2 receive antenna elements. The results also reveal that frequency reuse factor 1 (FRF1) may outperform FRF3 in DL with a dual-antenna setup, which is a surprising finding since FRF3 is typically considered to outperform FRF1 due to better interference reduction. For uplink (UL), 1 transmit antenna is sufficient to meet the requirements by a relatively large margin - a promising result given that UL is generally more demanding.
[57] arXiv:2405.08577 (cross-list from cs.NI) [pdf, ps, other]: Title: Intelligent Control in 6G Open RAN: Security Risk or Opportunity?

Sanaz Soltani, Mohammad Shojafar, Ali Amanlou, Rahim Tafazolli

Comments: 36 pages, 14 figures, IEEE COMST (in review)

Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Performance (cs.PF); Systems and Control (eess.SY)

The Open Radio Access Network (Open RAN) framework, emerging as the cornerstone for Artificial Intelligence (AI)-enabled Sixth-Generation (6G) mobile networks, heralds a transformative shift in radio access network architecture. As the adoption of Open RAN accelerates, ensuring its security becomes critical. The RAN Intelligent Controller (RIC) plays a central role in Open RAN by improving network efficiency and flexibility. Nevertheless, it also brings about potential security risks that need careful scrutiny. Therefore, it is imperative to evaluate the current state of RIC security comprehensively. This assessment is essential to gain a profound understanding of the security considerations associated with RIC. This survey combines a comprehensive analysis of RAN security, tracing its evolution from 2G to 5G, with an in-depth exploration of RIC security, marking the first comprehensive examination of its kind in the literature. Real-world security incidents involving RIC are vividly illustrated, providing practical insights. The study evaluates the security implications of the RIC within the 6G Open RAN context, addressing security vulnerabilities, mitigation strategies, and potential enhancements. It aims to guide stakeholders in the telecom industry toward a secure and dependable telecommunications infrastructure. The article serves as a valuable reference, shedding light on the RIC's crucial role within the broader network infrastructure and emphasizing security's paramount importance. This survey also explores the promising security opportunities that the RIC presents for enhancing network security and resilience in the context of 6G mobile networks. It outlines open issues, lessons learned, and future research directions in the domain of intelligent control in 6G open RAN, facilitating a comprehensive understanding of this dynamic landscape.
[58] arXiv:2405.08590 (cross-list from math.OC) [pdf, ps, other]: Title: Accelerated Alternating Direction Method of Multipliers Gradient Tracking for Distributed Optimization

Eduardo Sebastián, Mauro Franceschelli, Andrea Gasparri, Eduardo Montijano, Carlos Sagüés

Comments: This paper has been accepted for publication at IEEE Control Systems Letters

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a novel accelerated distributed algorithm for unconstrained consensus optimization over static undirected networks. The proposed algorithm combines the benefits of acceleration from momentum, the robustness of the alternating direction method of multipliers, and the computational efficiency of gradient tracking to surpass existing state-of-the-art methods in convergence speed, while preserving their computational and communication cost. First, we prove that, by applying momentum on the average dynamic consensus protocol over the estimates and gradient, we can study the algorithm as an interconnection of two singularly perturbed systems: the outer system connects the consensus variables and the optimization variables, and the inner system connects the estimates of the optimum and the auxiliary optimization variables. Next, we prove that, by adding momentum to the auxiliary dynamics, our algorithm always achieves faster convergence than the achievable linear convergence rate for the non-accelerated alternating direction method of multipliers gradient tracking algorithm case. Through simulations, we numerically show that our accelerated algorithm surpasses the existing accelerated and non-accelerated distributed consensus first-order optimization protocols in convergence speed.
[59] arXiv:2405.08596 (cross-list from cs.SD) [pdf, ps, other]: Title: EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark

Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts as an effective tool for detecting newly emerged deepfake audio while maintaining performance on older types, lacks a well-constructed and user-friendly evaluation framework. To address this gap, we introduce EVDA, a benchmark for evaluating continual learning methods in deepfake audio detection. EVDA includes classic datasets from the Anti-Spoofing Voice series, Chinese fake audio detection series, and newly generated deepfake audio from models like GPT-4 and GPT-4o. It supports various continual learning techniques, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and recent methods like Regularized Adaptive Weight Modification (RAWM) and Radian Weight Modification (RWM). Additionally, EVDA facilitates the development of robust algorithms by providing an open interface for integrating new continual learning methods
[60] arXiv:2405.08654 (cross-list from cs.LG) [pdf, ps, other]: Title: Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network Monitoring

Khoi Tran Dang, Kevin Delmas, Jérémie Guiochet, Joris Guérin

Comments: 13 pages, 5 figures, 6 tables. To appear in the proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

With the increasing use of neural networks in critical systems, runtime monitoring becomes essential to reject unsafe predictions during inference. Various techniques have emerged to establish rejection scores that maximize the separability between the distributions of safe and unsafe predictions. The efficacy of these approaches is mostly evaluated using threshold-agnostic metrics, such as the area under the receiver operating characteristic curve. However, in real-world applications, an effective monitor also requires identifying a good threshold to transform these scores into meaningful binary decisions. Despite the pivotal importance of threshold optimization, this problem has received little attention. A few studies touch upon this question, but they typically assume that the runtime data distribution mirrors the training distribution, which is a strong assumption as monitors are supposed to safeguard a system against potentially unforeseen threats. In this work, we present rigorous experiments on various image datasets to investigate: 1. The effectiveness of monitors in handling unforeseen threats, which are not available during threshold adjustments. 2. Whether integrating generic threats into the threshold optimization scheme can enhance the robustness of monitors.
[61] arXiv:2405.08661 (cross-list from cs.LG) [pdf, ps, other]: Title: Gradient Estimation and Variance Reduction in Stochastic and Deterministic Models

Ronan Keane

Comments: cornell university dissertation

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)

It seems that in the current age, computers, computation, and data have an increasingly important role to play in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence, which have become great areas of interest not just for computer science but also for many other fields of study. More generally, there have been trends moving towards the use of bigger, more complex and higher capacity models. It also seems that stochastic models, and stochastic variants of existing deterministic models, have become important research directions in various fields. For all of these types of models, gradient-based optimization remains as the dominant paradigm for model fitting, control, and more. This dissertation considers unconstrained, nonlinear optimization problems, with a focus on the gradient itself, that key quantity which enables the solution of such problems.
In chapter 1, we introduce the notion of reverse differentiation, a term which describes the body of techniques which enables the efficient computation of gradients. We cover relevant techniques both in the deterministic and stochastic cases. We present a new framework for calculating the gradient of problems which involve both deterministic and stochastic elements. In chapter 2, we analyze the properties of the gradient estimator, with a focus on those properties which are typically assumed in convergence proofs of optimization algorithms. Chapter 3 gives various examples of applying our new gradient estimator. We further explore the idea of working with piecewise continuous models, that is, models with distinct branches and if statements which define what specific branch to use.
[62] arXiv:2405.08679 (cross-list from cs.SD) [pdf, ps, other]: Title: Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, which consists of splitting an input mel-spectrogram into two parts (context and target), computing neural representations for each, and training the neural network to predict the target representations from the context representations. We investigate several design choices within this framework and study their influence through extensive experiments by evaluating our models on various audio classification benchmarks, including environmental sounds, speech and music downstream tasks. We focus notably on which part of the input data is used as context or target and show experimentally that it significantly impacts the model's quality. In particular, we notice that some effective design choices in the image domain lead to poor performance on audio, thus highlighting major differences between these two modalities.
[63] arXiv:2405.08691 (cross-list from cs.RO) [pdf, ps, other]: Title: Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning

Jan-Hendrik Ewers, David Anderson, Douglas Thomson

Comments: Submitted to IROS 2024

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This study investigates the computational speed and accuracy of two numerical integration methods, cubature and sampling-based, for integrating an integrand over a 2D polygon. Using a group of rovers searching the Martian surface with a limited sensor footprint as a test bed, the relative error and computational time are compared as the area was subdivided to improve accuracy in the sampling-based approach. The results show that the sampling-based approach exhibits a $14.75\%$ deviation in relative error compared to cubature when it matches the computational performance at $100\%$. Furthermore, achieving a relative error below $1\%$ necessitates a $10000\%$ increase in relative time to calculate due to the $\mathcal{O}(N^2)$ complexity of the sampling-based method. It is concluded that for enhancing reinforcement learning capabilities and other high iteration algorithms, the cubature method is preferred over the sampling-based method.
[64] arXiv:2405.08711 (cross-list from cs.RO) [pdf, ps, html, other]: Title: Data-driven Force Observer for Human-Robot Interaction with Series Elastic Actuators using Gaussian Processes

Samuel Tesfazgi, Markus Keßler, Emilio Trigili, Armin Lederer, Sandra Hirche

Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Ensuring safety and adapting to the user's behavior are of paramount importance in physical human-robot interaction. Thus, incorporating elastic actuators in the robot's mechanical design has become popular, since it offers intrinsic compliance and additionally provide a coarse estimate for the interaction force by measuring the deformation of the elastic components. While observer-based methods have been shown to improve these estimates, they rely on accurate models of the system, which are challenging to obtain in complex operating environments. In this work, we overcome this issue by learning the unknown dynamics components using Gaussian process (GP) regression. By employing the learned model in a Bayesian filtering framework, we improve the estimation accuracy and additionally obtain an observer that explicitly considers local model uncertainty in the confidence measure of the state estimate. Furthermore, we derive guaranteed estimation error bounds, thus, facilitating the use in safety-critical applications. We demonstrate the effectiveness of the proposed approach experimentally in a human-exoskeleton interaction scenario.

[65] arXiv:2209.07548 (replaced) [pdf, ps, other]: Title: Open Set Recognition For Music Genre Classification

Kevin Liu, Julien DeMori, Kobi Abayomi

Comments: 9 pages, 5 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)

We explore segmentation of known and unknown genre classes using the open source GTZAN and FMA datasets. For each, we begin with best-case closed set genre classification, then we apply open set recognition methods. We offer an algorithm for the music genre classification task using OSR. We demonstrate the ability to retrieve known genres and as well identification of aural patterns for novel genres (not appearing in a training set). We conduct four experiments, each containing a different set of known and unknown classes, using the GTZAN and the FMA datasets to establish a baseline capacity for novel genre detection. We employ grid search on both OpenMax and softmax to determine the optimal total classification accuracy for each experimental setup, and illustrate interaction between genre labelling and open set recognition accuracy.
[66] arXiv:2307.03293 (replaced) [pdf, ps, other]: Title: CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray images

Nicolás Gaggion, Candelaria Mosquera, Lucas Mansilla, Julia Mariel Saidman, Martina Aineseder, Diego H. Milone, Enzo Ferrante

Comments: The CheXmask dataset is publicly available at this https URL

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

The development of successful artificial intelligence models for chest X-ray analysis relies on large, diverse datasets with high-quality annotations. While several databases of chest X-ray images have been released, most include disease diagnosis labels but lack detailed pixel-level anatomical segmentation labels. To address this gap, we introduce an extensive chest X-ray multi-center segmentation dataset with uniform and fine-grain anatomical annotations for images coming from five well-known publicly available databases: ChestX-ray8, Chexpert, MIMIC-CXR-JPG, Padchest, and VinDr-CXR, resulting in 657,566 segmentation masks. Our methodology utilizes the HybridGNet model to ensure consistent and high-quality segmentations across all datasets. Rigorous validation, including expert physician evaluation and automatic quality control, was conducted to validate the resulting masks. Additionally, we provide individualized quality indices per mask and an overall quality estimation per dataset. This dataset serves as a valuable resource for the broader scientific community, streamlining the development and assessment of innovative methodologies in chest X-ray analysis. The CheXmask dataset is publicly available at: this https URL
[67] arXiv:2308.00253 (replaced) [pdf, ps, other]: Title: Privacy and Security in Ubiquitous Integrated Sensing and Communication: Threats, Challenges and Future Directions

Kaiqian Qu, Jia Ye, Xuran Li, Shuaishuai Guo

Comments: to appear in IOTMAG

Subjects: Signal Processing (eess.SP)

Integrated sensing and communication (ISAC) technology is one of the featuring technologies of the next-generation communication systems. When sensing capability becomes ubiquitous, more information can be collected, which can facilitate many applications in intelligent transportation, unmanned aerial vehicle (UAV) surveillance and healthcare. However, it also faces many information privacy leakage and security issues. This article highlights the potential threats to privacy and security and the technical challenges to realizing private and secure ISAC. Three promising combating solutions including artificial intelligence (AI)-enabled schemes, friendly jamming and reconfigurable intelligent surface (RIS)-assisted design are provided to maintain user privacy and ensure information security. Case studies demonstrate their effectiveness.
[68] arXiv:2310.02700 (replaced) [pdf, ps, html, other]: Title: Insights of using Control Theory for minimizing Induced Seismicity in Underground Reservoirs

Diego Gutierrez-Oribio, Ioannis Stefanou

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Deep Geothermal Energy, Carbon Capture, and Storage and Hydrogen Storage have significant potential to meet the large-scale needs of the energy sector and reduce the CO$_2$ emissions. However, the injection of fluids into the earth's crust, upon which these activities rely, can lead to the formation of new seismogenic faults or the reactivation of existing ones, thereby causing earthquakes. In this study, we propose a novel approach based on control theory to address this issue. First, we obtain a simplified model of induced seismicity due to fluid injections in an underground reservoir using a diffusion equation in three dimensions. Then, we design a robust tracking control approach to force the seismicity rate to follow desired references. In this way, the induced seismicity is minimized while ensuring fluid circulation for the needs of renewable energy production and storage. The designed control guarantees the achievement of the control objectives even in the presence of system uncertainties and unknown dynamics. Finally, we present simulations of a simplified geothermal reservoir under different scenarios of energy demand to show the reliability and performance of the control approach, opening new perspectives for field experiments based on real-time regulators.
[69] arXiv:2311.08707 (replaced) [pdf, ps, other]: Title: K-BMPC: Derivative-based Koopman Bilinear Model Predictive Control for Tractor-Trailer Trajectory Tracking with Unknown Parameters

Zehao Wang, Han Zhang, Jingchuan Wang

Subjects: Systems and Control (eess.SY)

Nonlinear dynamics bring difficulties to controller design for control-affine systems such as tractor-trailer vehicles, especially when the parameters in the dynamics are unknown. To address this constraint, we propose a derivative-based lifting function construction method, show that the corresponding infinite dimensional Koopman bilinear model over the lifting function is equivalent to the original control-affine system. Further, we analyze the propagation and bounds of state prediction errors caused by the truncation in derivative order. The identified finite dimensional Koopman bilinear model would serve as predictive model in the next step. Koopman Bilinear Model Predictive control (K-BMPC) is proposed to solve the trajectory tracking problem. We linearize the bilinear model around the estimation of the lifted state and control input. Then the bilinear Model Predictive Control problem is approximated by a quadratic programming problem. Further, the estimation is updated at each iteration until the convergence is reached. Moreover, we implement our algorithm on a tractor-trailer system, taking into account the longitudinal and side slip effects. The open-loop simulation shows the proposed Koopman bilinear model captures the dynamics with unknown parameters and has good prediction performance. Closed-loop tracking results show the proposed K-BMPC exhibits elevated tracking precision with the commendable computational efficiency. The experimental results demonstrate the feasibility of K-BMPC.
[70] arXiv:2311.13080 (replaced) [pdf, ps, other]: Title: High-Speed Voltage Control in Active Distribution Systems with Smart Inverter Coordination and Deep Reinforcement Learning

Mohammad Golgol, Anamitra Pal

Subjects: Systems and Control (eess.SY)

The increasing penetration of renewable energy resources in distribution systems necessitates high-speed monitoring and control of voltage for ensuring reliable system operation. However, existing voltage control algorithms often make simplifying assumptions in their formulation, such as real-time availability of smart meter measurements (for monitoring), or real-time knowledge of every power injection information(for control).This paper leverages the recent advances made in highspeed state estimation for real-time unobservable distribution systems to formulate a deep reinforcement learning-based control algorithm that utilizes the state estimates alone to control the voltage of the entire system. The results obtained for a modified (renewable-rich) IEEE34-nodedistributionfeeder indicate that the proposed approach excels in monitoring and controlling voltage of active distribution systems.
[71] arXiv:2312.06472 (replaced) [pdf, ps, other]: Title: Dissipativity-Based Decentralized Co-Design of Distributed Controllers and Communication Topologies for Vehicular Platoons

Shirantha Welikala, Zihao Song, Panos J. Antsaklis, Hai Lin

Comments: 16 pages, 14 figures, one manuscript has been submitted to Automatica

Subjects: Systems and Control (eess.SY)

Vehicular platoons provide an appealing option for future transportation systems. Most of the existing work on platoons separated the design of the controller and its communication topologies. However, it is beneficial to design both the platooning controller and the communication topology simultaneously, i.e., controller and topology co-design, especially in the cases of platoon splitting and merging. We are, therefore, motivated to propose a co-design framework for vehicular platoons that maintains both the compositionality of the controller and the string stability of the platoon, which enables the merging and splitting of the vehicles in a platoon. To this end, we first formulate the co-design problem as a centralized linear matrix inequality (LMI) problem and then decompose it using Sylvester's criterion to obtain a set of smaller decentralized LMI problems that can be solved sequentially at individual vehicles in the platoon. Moreover, in the formulated decentralized LMI problems, we encode a specifically derived local LMI to enforce the $L_2$ stability of the closed-loop platooning system, further implying the $L_2$ weak string stability of the vehicular platoon. Finally, to validate the proposed co-design method and its features in terms of merging/splitting, we provide an extensive collection of simulation results generated from a specifically developed simulation framework. Available in GitHub: this http URL that we have made publicly available.
[72] arXiv:2401.02902 (replaced) [pdf, ps, other]: Title: State Derivative Normalization for Continuous-Time Deep Neural Networks

Jonas Weigand, Gerben I. Beintema, Jonas Ulmen, Daniel Görges, Roland Tóth, Maarten Schoukens, Martin Ruskowski

Comments: This work has been accepted for presentation at the 20th IFAC Symposium on System Identification 2024

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

The importance of proper data normalization for deep neural networks is well known. However, in continuous-time state-space model estimation, it has been observed that improper normalization of either the hidden state or hidden state derivative of the model estimate, or even of the time interval can lead to numerical and optimization challenges with deep learning based methods. This results in a reduced model quality. In this contribution, we show that these three normalization tasks are inherently coupled. Due to the existence of this coupling, we propose a solution to all three normalization challenges by introducing a normalization constant at the state derivative level. We show that the appropriate choice of the normalization constant is related to the dynamics of the to-be-identified system and we derive multiple methods of obtaining an effective normalization constant. We compare and discuss all the normalization strategies on a benchmark problem based on experimental data from a cascaded tanks system and compare our results with other methods of the identification literature.
[73] arXiv:2401.05819 (replaced) [pdf, ps, other]: Title: TAnet: A New Temporal Attention Network for EEG-based Auditory Spatial Attention Decoding with a Short Decision Window

Yuting Ding, Fei Chen

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Auditory spatial attention detection (ASAD) is used to determine the direction of a listener's attention to a speaker by analyzing her/his electroencephalographic (EEG) signals. This study aimed to further improve the performance of ASAD with a short decision window (i.e., <1 s) rather than with long decision windows ranging from 1 to 5 seconds in previous studies. An end-to-end temporal attention network (i.e., TAnet) was introduced in this work. TAnet employs a multi-head attention (MHA) mechanism, which can more effectively capture the interactions among time steps in collected EEG signals and efficiently assign corresponding weights to those EEG time steps. Experiments demonstrated that, compared with the CNN-based method and recent ASAD methods, TAnet provided improved decoding performance in the KUL dataset, with decoding accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1% (0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s) with short decision windows (i.e., <1 s). As a new ASAD model with a short decision window, TAnet can potentially facilitate the design of EEG-controlled intelligent hearing aids and sound recognition systems.
[74] arXiv:2401.10389 (replaced) [pdf, ps, other]: Title: Inverse Problem Approach to Aberration Correction for in vivo Transcranial Imaging Based on a Sparse Representation of Contrast-enhanced Ultrasound Data

Paul Xing, Antoine Malescot, Eric Martineau, Ravi Rungta, Jean Provost

Subjects: Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

Transcranial ultrasound imaging is currently limited by attenuation and aberration induced by the skull. First used in contrast-enhanced ultrasound (CEUS), highly echoic microbubbles allowed for the development of novel imaging modalities such as ultrasound localization microscopy (ULM). Herein, we develop an inverse problem approach to aberration correction (IPAC) that leverages the sparsity of microbubble signals. We propose to use the \textit{a priori} knowledge of the medium based upon microbubble localization and wave propagation to build a forward model to link the measured signals directly to the aberration function. A standard least-squares inversion is then used to retrieve the aberration function. We first validated IPAC on simulated data of a vascular network using plane wave as well as divergent wave emissions. We then evaluated the reproducibility of IPAC \textit{in vivo} in 5 mouse brains. We showed that aberration correction improved the contrast of CEUS images by 4.6 dB. For ULM images, IPAC yielded sharper vessels, reduced vessel duplications, and improved the resolution from 21.1 $\mu$m to 18.3 $\mu$m. Aberration correction also improved hemodynamic quantification for velocity magnitude and flow direction.
[75] arXiv:2402.01130 (replaced) [pdf, ps, other]: Title: convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data

Roman Koshkin, Tomoki Fukai

Comments: This paper has been accepted to ICML 2024

Subjects: Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)

Spontaneous neural activity, crucial in memory, learning, and spatial navigation, often manifests itself as repetitive spatiotemporal patterns. Despite their importance, analyzing these patterns in large neural recordings remains challenging due to a lack of efficient and scalable detection methods. Addressing this gap, we introduce convSeq, an unsupervised method that employs backpropagation for optimizing spatiotemporal filters that effectively identify these neural patterns. Our method's performance is validated on various synthetic data and real neural recordings, revealing spike sequences with unprecedented scalability and efficiency. Significantly surpassing existing methods in speed, convSeq sets a new standard for analyzing spontaneous neural activity, potentially advancing our understanding of information processing in neural circuits.
[76] arXiv:2402.17050 (replaced) [pdf, ps, other]: Title: Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

Kathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. Bayen

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with developing RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their application in the context of self-driving cars, discussing the developmental process from simulation to deployment in detail, from designing simulators to reward function shaping. We present the results in both simulation and deployment, discussing the flow-smoothing benefits of the RL controller. From understanding the basics of Markov decision processes to exploring advanced techniques such as deep RL, our article offers a comprehensive overview and deep dive of the theoretical foundations and practical implementations driving this rapidly evolving field. We also showcase real-world case studies and alternative research projects that highlight the impact of RL controllers in revolutionizing autonomous driving. From tackling complex urban environments to dealing with unpredictable traffic scenarios, these intelligent controllers are pushing the boundaries of what automated vehicles can achieve. Furthermore, we examine the safety considerations and hardware-focused technical details surrounding deployment of RL controllers into automated vehicles. As these algorithms learn and evolve through interactions with the environment, ensuring their behavior aligns with safety standards becomes crucial. We explore the methodologies and frameworks being developed to address these challenges, emphasizing the importance of building reliable control systems for automated vehicles.
[77] arXiv:2403.15528 (replaced) [pdf, ps, other]: Title: Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs

Yiliang Zhou, Hanley Ong, Patrick Kennedy, Carol Wu, Jacob Kazam, Keith Hentel, Adam Flanders, George Shih, Yifan Peng

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The study examines the application of GPT-4V, a multi-modal large language model equipped with visual recognition, in detecting radiological findings from a set of 100 chest radiographs and suggests that GPT-4V is currently not ready for real-world diagnostic usage in interpreting chest radiographs.
[78] arXiv:2404.08212 (replaced) [pdf, ps, other]: Title: Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography

Hika Barki, Wan-Young Chung

Comments: The paper is being withdrawn because we have identified substantial issues with the data analysis process. To ensure the integrity and accuracy of our findings, we are re-evaluating the data and will resubmit the paper after thorough revisions

Subjects: Signal Processing (eess.SP)

Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks such as memory and math tests. The raw photoplethysmography data was then processed by filtering, segmenting, and transforming it into scalograms using a continuous wavelet transform (CWT) which are based on two different mother wavelets, namely, a generalized Morse wavelet and the analytic Morlet (Gabor) wavelet. The scalograms were then passed through a convolutional neural network classifier, GoogLeNet, to classify the signals as stressed or non-stressed. The method achieved an outstanding result using the generalized Morse wavelet with an accuracy of 91.02% and an F1-score of 90.95%. This method demonstrates promise as a reliable tool for early detection and treatment of mental stress by providing real-time monitoring and allowing for preventive measures to be taken before it becomes a serious issue.
[79] arXiv:2404.17742 (replaced) [pdf, ps, other]: Title: Segmentation Quality and Volumetric Accuracy in Medical Imaging

Zheyuan Zhang, Ulas Bagci

Comments: Data used in the paper contains some privacy issue in medical image. Some proper citations are also missing

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Current medical image segmentation relies on the region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de-facto standard. While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement. Clinicians often lack clear benchmarks to gauge the "goodness" of segmentation results based on these metrics. Recognizing the clinical relevance of volumetry, we utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks. Our work integrates theoretical analysis and empirical validation across diverse datasets. We delve into the often-ambiguous relationship between segmentation quality (measured by Dice) and volumetric accuracy in clinical practice. Our findings highlight the critical role of incorporating volumetric prediction accuracy into segmentation evaluation. This approach empowers clinicians with a more nuanced understanding of segmentation performance, ultimately improving the interpretation and utility of these metrics in real-world healthcare settings.
[80] arXiv:2404.19463 (replaced) [pdf, ps, other]: Title: Enhancing Physical Layer Security with Deep SIMO Auto-Encoder and RF Impairments Modeling

Abdullahi Mohammad, Mahmoud Tukur Kabir, Mikko Valkama, Bo Tan

Comments: 6 pages, 5 figures, conference

Subjects: Signal Processing (eess.SP)

This paper presents a novel approach to achieving secure wireless communication by leveraging the inherent characteristics of wireless channels through end-to-end learning using a single-input-multiple-output (SIMO) autoencoder (AE). To ensure a more realistic signal transmission, we derive the signal model that captures all radio frequency (RF) hardware impairments to provide reliable and secure communication. Performance evaluations against traditional linear decoders, such as zero-forcing (ZR) and linear minimum mean square error (LMMSE), and the optimal nonlinear decoder, maximum likelihood (ML), demonstrate that the AE-based SIMO model exhibits superior bit error rate (BER) performance, but with a substantial gap even in the presence of RF hardware impairments. Additionally, the proposed model offers enhanced security features, preventing potential eavesdroppers from intercepting transmitted information and leveraging RF impairments for augmented physical layer security and device identification. These findings underscore the efficacy of the proposed end-to-end learning approach in achieving secure and robust wireless communication.
[81] arXiv:2405.02126 (replaced) [pdf, ps, other]: Title: Multipath-based SLAM with Cooperation and Map Fusion in MIMO Systems

Erik Leitinger, Lukas Wielandner, Alexander Venus, Klaus Witrisal

Comments: 8 pages

Subjects: Signal Processing (eess.SP)

Multipath-based simultaneous localization and mapping (MP-SLAM) is a promising approach in wireless networks for obtaining position information of transmitters and receivers as well as information on the propagation environment. MP-SLAM models specular reflections of radio frequency (RF) signals at flat surfaces as virtual anchors (VAs), the mirror images of base stations (BSs). Conventional methods for MP-SLAM consider a single mobile terminal (MT) which has to be localized. The availability of additional MTs paves the way for utilizing additional information in the scenario. Specifically enabling MTs to exchange information allows for data fusion over different observations of VAs made by different MTs. Furthermore, cooperative localization becomes possible in addition to multipathbased localization. Utilizing this additional information enables more robust mapping and higher localization accuracy.
[82] arXiv:2405.04610 (replaced) [pdf, ps, other]: Title: Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification

Mukaffi Bin Moin, Fatema Tuj Johora Faria, Swarnajit Saha, Busra Kamal Rafa, Mohammad Shafiul Alam

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.
[83] arXiv:2405.05236 (replaced) [pdf, ps, html, other]: Title: Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural Networks

Sahel Vahedi Noori, Bin Hu, Geir Dullerud, Peter Seiler

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)

This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated RELUs using known properties for the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by a scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability / performance condition and study the effect of the lifting horizon.
[84] arXiv:2405.07443 (replaced) [pdf, ps, other]: Title: Minimum-Variance Recursive State Estimation for 2-D Systems: When Asynchronous Multi-Channel Delays meet Energy Harvesting Constraints

Yu Chen, Wei Wang

Subjects: Systems and Control (eess.SY)

This paper is concerned with the state estimation problem for two-dimensional systems with asynchronous multichannel delays and energy harvesting constraints. In the system, each smart sensor has a certain probability of harvesting energy from the external environment, the authorized transmission between the sensor and the remote filter is contingent upon the current energy level of the sensor, which results in intermittent transmission of observation information. Addressing the issue of incomplete observation information due to asynchronous multi-channel delays, a novel approach for observation partition reconstruction is proposed to convert the delayed activated observation sequences into equivalent delay-free activated observation sequences. Through generating spatial equivalency validation, it is found that the reconstructed delay-free activated observation sequences contain the same information as the original delayed activated observation sequences. Based on the reconstructed activated observation sequence and activated probability, a novel unbiased h+1-step recursive estimator is constructed. Then, the evolution of the probability distribution of the energy level is discussed. The estimation gains are obtained by minimizing the filtering error covariance. Subsequently, through parameter assumptions, a uniform lower bound and a recursive upper bound for the filtering error covariance are presented. And the monotonicity analysis of activated probability on estimation performance is given. Finally, the effectiveness of the proposed estimation scheme is verified through a numerical simulation example.
[85] arXiv:2405.07890 (replaced) [pdf, ps, other]: Title: Subspace-Informed Matrix Completion

Hamideh.Sadat Fazael Ardakani, Sajad Daei, Arash Amini, Mikael Skoglund, Gabor Fodor

Comments: arXiv admin note: text overlap with arXiv:2111.00235

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportional to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios, additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations. Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.
[86] arXiv:2209.07618 (replaced) [pdf, ps, other]: Title: Differentiable Bilevel Programming for Stackelberg Congestion Games

Jiayang Li, Jing Yu, Qianni Wang, Boyi Liu, Zhaoran Wang, Yu Marco Nie

Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game. Often formulated as bilevel programs, large-scale SCGs are well known for their intractability and complexity. Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning. The core idea centers on replacing the lower-level equilibrium problem with a smooth evolution trajectory defined by the imitative logit dynamic (ILD), which we prove converges to the equilibrium of the congestion game under mild conditions. Building upon this theoretical foundation, we propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming. Thanks to the smoothness of ILD, the algorithm promises both efficiency and scalability. The second algorithm adds a heuristic twist by cutting short the followers' evolution trajectory. Behaviorally, this means that, instead of anticipating the followers' best response at equilibrium, the leader seeks to approximate that response by only looking ahead a limited number of steps. Our numerical experiments are carried out over various instances of classic SCG applications, ranging from toy benchmarks to large-scale real-world examples. The results show the proposed algorithms are reliable and scalable local solvers that deliver high-quality solutions with greater regularity and significantly less computational effort compared to the many incumbents included in our study.
[87] arXiv:2302.01924 (replaced) [pdf, ps, html, other]: Title: CLASH: Contrastive learning through alignment shifting to extract stimulus information from EEG

Bernd Accou, Hugo Van hamme, Tom Francart

Subjects: Neurons and Cognition (q-bio.NC); Signal Processing (eess.SP)

Stimulus-evoked EEG data has a notoriously low signal-to-noise ratio and high inter-subject variability. We propose a novel paradigm for the self-supervised extraction of stimulus-related brain response data: a model is trained to extract similar information between two time-aligned segments of EEG in response to the same stimulus. The extracted information can subsequently be used to obtain better results in downstream tasks that utilize the response to the stimulus. We show the efficacy of our method for a downstream task of decoding the speech envelope from auditory EEG. Our method outperforms other state-of-the-art denoising techniques, improving reconstruction scores by 45\%. Additionally, we show that in contrast to the baseline denoising techniques, our method can be used with data of unseen subjects and stimuli without retraining, improving decoding performance by 19\% and 34\% over raw EEG for two holdout datasets. Finally, the last experiment reveals that the accuracies obtained in the CLASH paradigm are significantly correlated with the percentile of obtained reconstruction correlation on the null distribution. In general, we showed that the proposed paradigm is suitable to train deep learning models to extract stimulus information from EEG while being stimulus feature agnostic.
[88] arXiv:2303.05240 (replaced) [pdf, ps, other]: Title: Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image Generation

Xingzhe Su, Wenwen Qiang, Jie Hu, Fengge Wu, Changwen Zheng, Fuchun Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field. However, when applying GANs in the remote sensing (RS) image generation task, an extraordinary phenomenon is observed: the GAN model is more sensitive to the size of training data for RS image generation than for natural image generation. In other words, the generation quality of RS images will change significantly with the number of training categories or samples per category. In this paper, we first analyze this phenomenon from two kinds of toy experiments and conclude that the amount of feature information contained in the GAN model decreases with reduced training data. Then we establish a structural causal model (SCM) of the data generation process and interpret the generated data as the counterfactuals. Based on this SCM, we theoretically prove that the quality of generated images is positively correlated with the amount of feature information. This provides insights for enriching the feature information learned by the GAN model during training. Consequently, we propose two innovative adjustment schemes, namely Uniformity Regularization (UR) and Entropy Regularization (ER), to increase the information learned by the GAN model at the distributional and sample levels, respectively. We theoretically and empirically demonstrate the effectiveness and versatility of our methods. Extensive experiments on three RS datasets and two natural datasets show that our methods outperform the well-established models on RS image generation tasks. The source code is available at this https URL.
[89] arXiv:2304.00999 (replaced) [pdf, ps, other]: Title: Bandits for Sponsored Search Auctions under Unknown Valuation Model: Case Study in E-Commerce Advertising

Danil Provodin, Jérémie Joudioux, Eduard Duryev

Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

This paper presents a bidding system for sponsored search auctions under an unknown valuation model. This formulation assumes that the bidder's value is unknown, evolving arbitrarily, and observed only upon winning an auction. Unlike previous studies, we do not impose any assumptions on the nature of feedback and consider the problem of bidding in sponsored search auctions in its full generality. Our system is based on a bandit framework that is resilient to the black-box auction structure and delayed and batched feedback. To validate our proposed solution, we conducted a case study at Zalando, a leading fashion e-commerce company. We outline the development process and describe the promising outcomes of our bandits-based approach to increase profitability in sponsored search auctions. We discuss in detail the technical challenges that were overcome during the implementation, shedding light on the mechanisms that led to increased profitability.
[90] arXiv:2309.02340 (replaced) [pdf, ps, other]: Title: Local Padding in Patch-Based GANs for Seamless Infinite-Sized Texture Synthesis

Alhasan Abdellatif, Ahmed H. Elsheikh, Hannah Menke

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Texture models based on Generative Adversarial Networks (GANs) use zero-padding to implicitly encode positional information of the image features. However, when extending the spatial input to generate images at large sizes, zero-padding can often lead to degradation of quality due to the incorrect positional information at the center of the image and limit the diversity within the generated images. In this paper, we propose a novel approach for generating stochastic texture images at large arbitrary sizes using GANs model that is based on patch-by-patch generation. Instead of zero-padding, the model uses \textit{local padding} in the generator that shares border features between the generated patches; providing positional context and ensuring consistency at the boundaries. The proposed models are trainable on a single texture image and have a constant GPU scalability with respect to the output image size, and hence can generate images of infinite sizes. We show in the experiments that our method has a significant advancement beyond existing texture models in terms of the quality and diversity of the generated textures. Furthermore, the implementation of local padding in the state-of-the-art super-resolution models effectively eliminates tiling artifacts enabling large-scale super-resolution. Our code is available at \url{this https URL
[91] arXiv:2402.02807 (replaced) [pdf, ps, other]: Title: Are Sounds Sound for Phylogenetic Reconstruction?

Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis

Comments: Paper accepted for SIGTYP (2024): Häuser, Luise; Jäger, Gerhard; List, Johann-Mattis; Rama, Taraka; and Stamatakis, Alexandros (2024): Are sounds sound for phylogenetic reconstruction? In: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (SIGTYP 2024)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
[92] arXiv:2402.15151 (replaced) [pdf, ps, other]: Title: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

Comments: An Erratum was added on the last page of this paper

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize the context modeling ability by bringing the overwhelming power of LLMs. Specifically, VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation, where the given instructions control the type of task. The input video is mapped to the input latent space of an LLM by employing a self-supervised visual speech model. Focused on the fact that there is redundant information in input frames, we propose a novel deduplication method that reduces the embedded visual features by employing visual speech units. Through the proposed deduplication and Low Rank Adaptation (LoRA), VSP-LLM can be trained in a computationally efficient manner. In the translation dataset, the MuAViC benchmark, we demonstrate that VSP-LLM trained on just 30 hours of labeled data can more effectively translate lip movements compared to the recent model trained with 433 hours of data.
[93] arXiv:2403.03104 (replaced) [pdf, ps, other]: Title: Low-rank approximated Kalman-Bucy filters using Oja's principal component flow for linear time-invariant systems

Daiki Tsuzuki, Kentaro Ohki

Comments: 6 pages, fixed typographical errors and clarified some unclear statements

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

The Kalman-Bucy filter is extensively utilized across various applications. However, its computational complexity increases significantly in large-scale systems. To mitigate this challenge, a low-rank approximated Kalman--Bucy filter was proposed, comprising Oja's principal component flow and a low-dimensional Riccati differential equation. Previously, the estimation error was confirmed solely for linear time-invariant systems with a symmetric system matrix. This study extends the application by eliminating the constraint on the symmetricity of the system matrix and describes the equilibrium points of the Oja flow along with their stability for general matrices. In addition, the domain of attraction for a set of stable equilibrium points is estimated. Based on these findings, we demonstrate that the low-rank approximated Kalman--Bucy filter with a suitable rank maintains a bounded estimation error covariance matrix if the system is controllable and observable.
[94] arXiv:2403.18296 (replaced) [pdf, ps, other]: Title: GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm

Chunhang Zheng, Kechao Cai

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. Moreover, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.

Total of 94 entries

Showing up to 1000 entries per page: fewer | more | all

Electrical Engineering and Systems Science

New submissions for Wednesday, 15 May 2024 (showing 41 of 41 entries )

Cross submissions for Wednesday, 15 May 2024 (showing 23 of 23 entries )

Replacement submissions for Wednesday, 15 May 2024 (showing 30 of 30 entries )