Electrical Engineering and Systems Science
- [1] arXiv:2405.08919 [pdf, ps, html, other]
Title: Joint Instantaneous Amplitude-Frequency Analysis of Vibration Signals for Vibration-Based Condition Monitoring of Rolling Bearings
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Vibrations of damaged bearings manifest as modulations in the amplitude of the generated vibration signal, making envelope analysis an effective approach for discriminating between healthy and abnormal vibration patterns. Motivated by this, we introduce a low-complexity method for vibration-based condition monitoring (VBCM) of rolling bearings based on envelope analysis. In the proposed method, the instantaneous amplitude (envelope) and instantaneous frequency of the vibration signal are jointly utilized to facilitate three novel envelope representations: instantaneous amplitude-frequency mapping (IAFM), instantaneous amplitude-frequency correlation (IAFC), and instantaneous energy-frequency distribution (IEFD). Maintaining temporal information, these representations effectively capture energy-frequency variations that are unique to the condition of the bearing, thereby enabling the extraction of discriminative features with high sensitivity to variations in operational conditions. Accordingly, six new highly discriminative features are engineered from these representations, capturing and characterizing their shapes. The experimental results show outstanding performance in detecting and diagnosing various fault types, demonstrating the effectiveness of the proposed method in capturing unique variations in energy and frequency between healthy and faulty bearings. Moreover, the proposed method has moderate computational complexity, meeting the requirements of real-time applications. Further, the Python code of the proposed method is made public to support collaborative research efforts and ensure the reproducibility of the presented work.
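The instantaneous amplitude and instantaneous frequency named in this abstract are commonly obtained from the analytic signal (Hilbert transform). The following is a minimal sketch of that standard step only, not the authors' published code and not the IAFM, IAFC, or IEFD representations. The function names and the O(n²) stdlib DFT are illustrative assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct DFT (O(n^2), short signals only)."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    shaped = [spectrum[k] * gain[k] for k in range(n)]
    # Inverse DFT gives the complex analytic signal.
    return [
        sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_amplitude_frequency(x, fs):
    """Return (envelope, instantaneous frequency in Hz) for signal x sampled at fs Hz."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase increment between consecutive samples, converted from rad/sample to Hz.
    frequency = [
        cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
        for i in range(len(z) - 1)
    ]
    return amplitude, frequency
```

For an amplitude-modulated tone such as `(1 + 0.5*cos(2*pi*2*t)) * cos(2*pi*16*t)`, the envelope recovers the slow modulation and the instantaneous frequency stays at the 16 Hz carrier.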
- [2] arXiv:2405.08949 [pdf, ps, html, other]
Title: Task-Oriented Mulsemedia Communication using Unified Perceiver and Conformal Prediction in 6G Metaverse Systems
Subjects: Signal Processing (eess.SP)
The growing prominence of extended reality (XR), holographic-type communications, and the metaverse demands truly immersive user experiences that engage many sensory modalities, such as sight, hearing, touch, smell, and taste. Additionally, the widespread deployment of sensors in areas such as agriculture, manufacturing, and smart homes is generating a diverse array of sensory data. A new media format known as multisensory media (mulsemedia) has emerged, which incorporates a wide range of sensory modalities beyond the traditional visual and auditory media. 6G wireless systems are envisioned to support the internet of senses, making it crucial to explore effective data fusion and communication strategies for mulsemedia. In this paper, we introduce a task-oriented multi-task mulsemedia communication system named MuSeCo, which is developed using unified Perceiver models and Conformal Prediction. This unified model can accept any sensory input and efficiently extract latent semantic features, making it adaptable for deployment across various Artificial Intelligence of Things (AIoT) devices. Conformal Prediction is employed for modality selection and combination, enhancing task accuracy while minimizing data communication overhead. The model has been trained using six sensory modalities across four classification tasks. Simulations and experiments demonstrate that MuSeCo can effectively select and combine sensory modalities, significantly reduce end-to-end communication latency and energy consumption, and maintain high accuracy in communication-constrained systems.
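The abstract does not say how MuSeCo applies Conformal Prediction to modality selection, so the sketch below shows only the generic split conformal procedure for classification. The nonconformity score (one minus the probability of the true class) and the function names are assumptions made for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Calibrated score threshold for target miscoverage alpha.

    cal_scores: nonconformity scores on a held-out calibration set,
    here assumed to be 1 - (predicted probability of the true class).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected quantile rank
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(class_probs) if 1 - p <= threshold]
```

A small prediction set signals a confident input. One plausible use, stated here as an assumption, is to transmit an additional modality only when the set produced from the current modalities is still large.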
- [3] arXiv:2405.09004 [pdf, ps, html, other]
Title: Improving Sequential Market Clearing via Value-oriented Renewable Energy Forecasting
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Large penetration of renewable energy sources (RESs) brings huge uncertainty into electricity markets. While existing deterministic market clearing fails to accommodate the uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RES generation that enters the day-ahead market. With such a forecast, the existing deterministic market clearing framework can be maintained, and the overall day-ahead and real-time operation cost is reduced. In the training phase, the forecast model parameters are estimated to minimize the expected overall day-ahead and real-time operation cost, instead of minimizing forecast errors in a statistical sense. Theoretically, we derive the exact form of the loss function for training the forecast model that aligns with this goal. For market clearing modeled by linear programs, this loss function is piecewise linear. Additionally, we derive the analytical gradient of the loss function with respect to the forecast, which inspires an efficient training strategy. A numerical study shows that, compared with a quality-oriented forecasting approach, our forecasts significantly reduce the overall cost of deterministic market clearing.
- [4] arXiv:2405.09016 [pdf, ps, other]
Title: IoT-enabled Stability Chamber for the Pharmaceutical Industry
Subjects: Systems and Control (eess.SY)
A stability chamber is a critical piece of equipment in any pharmaceutical facility: it retains manufactured products under different sets of environmental conditions so that their stability and quality can be tested over a certain period of time. In this paper, we propose an IoT-enabled stability chamber for the pharmaceutical industry. We developed four stability chambers by using the existing utilities of a manufacturing facility. The state-of-the-art automatic PID control system of the Siemens S7-1200 PLC was used to control each chamber. Seven precise temperature and humidity sensors were used to monitor the environment of each chamber. The PC-based Siemens WinCC Runtime Advanced visualization platform was used to visualize the data of the chamber, which is FDA 21 CFR Part 11 compliant. Sensor data of the chamber are stored periodically in the database, and report generation features are available. The chamber also has an alarm management system. Critical alarms are automatically emailed to the user so that action can be taken. Additionally, an Internet of Things-based (IoT-based) application was developed to monitor the sensor data remotely using any client application.
- [5] arXiv:2405.09022 [pdf, ps, html, other]
Title: Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems
Subjects: Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasible problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.
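The bisection step described in this abstract can be illustrated with a generic max-min search. In the paper each feasibility check is a convex problem; in this sketch a closed-form toy check stands in for it, and the function name, the toy utilities, and the tolerance are all assumptions.

```python
def bisect_max_feasible(is_feasible, lo, hi, tol=1e-6):
    """Largest level t in [lo, hi] for which is_feasible(t) holds.

    Assumes feasibility is monotone: feasible at lo, and once a level is
    infeasible every higher level is infeasible too.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid  # mid is achievable, so search above it
        else:
            hi = mid  # mid is not achievable, so search below it
    return lo


# Toy trade-off (hypothetical): a resource share p in [0, 1] gives a
# communication utility p and a sensing utility 1 - p. A common level t is
# achievable if some p satisfies p >= t and 1 - p >= t, i.e. if t <= 1 - t.
best_level = bisect_max_feasible(lambda t: t <= 1 - t, 0.0, 1.0)
```

Here `best_level` converges to 0.5, the fair max-min point of the toy problem. Changing the relative weights of the two utilities would trace out other points on the trade-off curve.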
- [6] arXiv:2405.09044 [pdf, ps, html, other]
Title: Modeling and Design Optimization of Looped Water Distribution Networks using MS Excel: Developing the Open-Source X-WHAT Model
Marcus Nóbrega Gomes Jr., Igor Matheus Benites, Salma M. Elsherif, Ahmad F. Taha, Marcio H. Giacomoni
Subjects: Systems and Control (eess.SY)
Cost-effective water distribution network (WDN) design with acceptable pressure performance is crucial for the management of drinking water in cities. This paper presents a Microsoft Excel tool to model, simulate, and optimize WDNs with looped pipelines under steady-state incompressible flow simulations. Typically, the Hardy-Cross method is applied using spreadsheet calculations to estimate discharges. This method requires mass-conservative initial estimates and successive iterations to converge. In this paper, however, we develop an alternative method that uses the built-in solver capabilities of Excel, does not require initial mass-conservative estimation, and is free of flow corrections. The main objective of this paper is to develop an accessible open-source tool for simulating hydraulic networks that is also adapted for teaching and learning purposes. The governing equations and the mathematical basis for the hydraulic modeling of the system are described, considering the topology of the network, mass and energy conservation, cost of tank material, foundation, and cost of pumping energy to fill the tank. The use of this tool is encouraged at the undergraduate and graduate engineering levels, as it offers the opportunity to address complex concepts in a comprehensive way using a spreadsheet that does not require coding expertise. Hence, users can debug all cells and understand all equations used in the hydraulic model, as well as modify them. To demonstrate the model capabilities, three practical examples are presented, with the first one solved step by step, and the results are compared with EPANET and with results reported in the literature. Using the optimization method presented in this paper, it was possible to achieve a cost reduction of 151,790 USD (9.8% of the total cost) in a network that supplies a population of 44,416.
- [7] arXiv:2405.09053 [pdf, ps, html, other]
Title: Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain
Subjects: Signal Processing (eess.SP)
In this paper, we consider an extremely large-scale massive multiple-input-multiple-output (XL-MIMO) system. As the scale of antenna arrays increases, the range of near-field communications also expands. In this case, the signals in the near-field channel no longer exhibit planar wave characteristics but spherical wave characteristics, which makes the channel state information (CSI) highly complex. Additionally, the larger antenna array significantly increases the size of the CSI matrix. Therefore, CSI feedback in the near-field channel becomes highly challenging. To solve this issue, we propose a deep-learning (DL)-based ExtendNLNet that can compress the CSI and further reduce the overhead of CSI feedback. In addition, we introduce the Non-Local block to capture CSI features over a larger area. Simulation results show that the proposed ExtendNLNet can significantly improve the CSI recovery quality compared to other DL-based methods.
- [8] arXiv:2405.09073 [pdf, ps, html, other]
Title: Interpretable attributed scattering center extracted via deep unfolding
Comments: This paper has been accepted by IGARSS2024
Subjects: Signal Processing (eess.SP)
Most existing sparse representation-based approaches for attributed scattering center (ASC) extraction adopt traditional iterative optimization algorithms, which suffer from lengthy computation times and limited precision. This paper presents a solution by introducing an interpretable network that can effectively and rapidly extract ASC via deep unfolding. Initially, we create a dictionary containing reliable prior knowledge and apply it to the iterative shrinkage-thresholding algorithm (ISTA). Then, we unfold ISTA into a neural network, employing it to autonomously and precisely optimize the hyperparameters. The interpretability of physics is retained by applying a dictionary with physical meaning. The experiments are conducted on multiple test sets with diverse data distributions and demonstrate the superior performance and generalizability of our method.
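For readers unfamiliar with ISTA, the iteration that this abstract unfolds into a network is sketched below in its textbook form. This is not the authors' network or dictionary: the function names, the plain list-of-rows matrix format, and the fixed step size are assumptions. In a deep-unfolded version, each loop iteration would become one layer whose step size and threshold are learned rather than fixed.

```python
import math


def soft_threshold(values, tau):
    """Shrink each entry toward zero by tau (proximal operator of the L1 norm)."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in values]


def ista(dictionary, y, lam, step, iterations=200):
    """Approximately minimize 0.5 * ||y - D x||^2 + lam * ||x||_1.

    dictionary: matrix D as a list of rows; y: observation vector.
    step should not exceed 1 / (largest eigenvalue of D^T D).
    """
    rows, cols = len(dictionary), len(dictionary[0])
    x = [0.0] * cols
    for _ in range(iterations):
        # Residual D x - y.
        residual = [
            sum(dictionary[i][j] * x[j] for j in range(cols)) - y[i]
            for i in range(rows)
        ]
        # Gradient of the quadratic term: D^T (D x - y).
        gradient = [
            sum(dictionary[i][j] * residual[i] for i in range(rows))
            for j in range(cols)
        ]
        # Gradient step followed by shrinkage.
        x = soft_threshold(
            [x[j] - step * gradient[j] for j in range(cols)], step * lam
        )
    return x
```

With an identity dictionary the minimizer is simply the soft-thresholded observation, which makes a convenient sanity check.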
- [9] arXiv:2405.09077 [pdf, ps, html, other]
Title: Compressive Feature Selection for Remote Visual Multi-Task Inference
Comments: 6 pages, 8 figures, IEEE ICME Workshop on Coding for Machines
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Deep models produce a number of features in each internal layer. A key problem in applications such as feature compression for remote inference is determining how important each feature is for the task(s) performed by the model. The problem is especially challenging in the case of multi-task inference, where the same feature may carry different importance for different tasks. In this paper, we examine how effective mutual information (MI) between a feature and a model's task output is as a measure of the feature's importance for that task. Experiments involving hard selection and soft selection (unequal compression) based on MI are carried out to compare the MI-based method with alternative approaches. Multi-objective analysis is provided to offer further insight.
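As a concrete reference for the importance measure discussed here, the sketch below estimates MI from paired discrete samples using empirical frequencies. It assumes the feature has already been quantized into a small number of levels, which is an illustrative simplification; the abstract does not state how the authors estimate MI.

```python
import math
from collections import Counter


def mutual_information(feature_values, task_outputs):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(feature_values)
    joint = Counter(zip(feature_values, task_outputs))
    marginal_x = Counter(feature_values)
    marginal_y = Counter(task_outputs)
    # Sum p(x, y) * log2( p(x, y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (count / n) * math.log2(count * n / (marginal_x[x] * marginal_y[y]))
        for (x, y), count in joint.items()
    )
```

A feature that determines a balanced binary output yields 1 bit, and a feature independent of the output yields 0 bits. Ranking features by this score gives one possible form of hard selection.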
- [10] arXiv:2405.09079 [pdf, ps, html, other]
Title: Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems
Comments: 13 pages, 7 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation of self-interference (SI) caused by simultaneous transmission and reception at the FD base station (BS). Additionally, MU interference needs to be handled by the precoder/combiner design. The resulting optimization problem involves non-convex constraints since hybrid analog/digital architectures utilize networks of phase shifters. To solve the proposed problem, we separate the optimization of each precoder/combiner, and design each one of them while fixing the others. The precoders at the FD BS are designed by reformulating the communication and sensing constraints as signal-to-leakage-plus-noise ratio (SLNR) maximization problems that consider SI and MU interference as leakage. Furthermore, we design the frequency-flat analog combiner such that the residual SI at the FD BS is minimized under communication and sensing gain constraints. Finally, we design an interference-aware digital combining stage that separates MU signals and target reflections. The communication performance and sensing results show that the proposed framework efficiently supports both functionalities simultaneously.
- [11] arXiv:2405.09142 [pdf, ps, html, other]
Title: Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Current speaker diarization systems rely on an external voice activity detection (VAD) model prior to speaker embedding extraction on the detected speech segments. In this paper, we establish that the attention system of a speaker embedding extractor acts as a weakly supervised internal VAD model and performs as well as or better than comparable supervised VAD systems. Consequently, speaker diarization can be performed efficiently by extracting the VAD logits and the corresponding speaker embedding simultaneously, removing the need for, and the computational overhead of, an external VAD model. We provide an extensive analysis of the behavior of the frame-level attention system in current speaker verification models and propose a novel speaker diarization pipeline using ECAPA2 speaker embeddings for both VAD and embedding extraction. The proposed strategy achieves state-of-the-art performance on the AMI, VoxConverse and DIHARD III diarization benchmarks.
- [12] arXiv:2405.09163 [pdf, ps, html, other]
Title: DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation
Subjects: Systems and Control (eess.SY)
Variable speed limit (VSL) control is an established yet challenging approach to improving freeway traffic mobility and alleviating bottlenecks by customizing speed limits at proper locations based on traffic conditions. Recent advances in deep reinforcement learning (DRL) have shown promising results in solving VSL control problems by interacting with sophisticated environments. However, the modeling of these methods ignores the inherent graph structure of the traffic state, which can be a key factor for more efficient VSL control. A graph structure can capture not only the static spatial features but also the dynamic temporal features of traffic. Therefore, we propose DVS-RG, a DRL-based differential variable speed limit controller with graph state representation. DVS-RG dynamically provides distinct speed limits per lane at different locations. The road network topology and traffic information (e.g., occupancy, speed) are integrated as the state space of DVS-RG so that the spatial features can be learned. A normalized reward that combines efficiency and safety is used to train the VSL controller to avoid excessive inefficiencies or low safety. The results obtained from the simulation study on SUMO show that DVS-RG achieves higher traffic efficiency (the average waiting time is reduced to 68.44%) and improves the safety measures (the number of potential collisions is reduced by 15.93%) compared to state-of-the-art DRL methods.
- [13] arXiv:2405.09179 [pdf, ps, html, other]
Title: Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System
Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing
Subjects: Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which equips communication base stations (BSs) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing: cooperative active sensing, cooperative passive sensing, and cooperative active and passive sensing. Among these, multi-BS cooperative passive sensing has the advantages of low hardware modification cost and large sensing coverage. However, multi-BS cooperative passive sensing faces the challenges of synchronization offset mitigation and sensing information fusion. To address these challenges, a non-line-of-sight (NLoS) and line-of-sight (LoS) signal cross-correlation (NLCC) method is proposed to mitigate carrier frequency offset (CFO) and time offset (TO). In addition, a symbol-level multi-BS sensing information fusion method is proposed. The discrete samples of echo signals from multiple BSs are matched independently and coherently accumulated to improve sensing accuracy. Moreover, a low-complexity joint angle-of-arrival (AoA) and angle-of-departure (AoD) estimation method is proposed to reduce the computational complexity. Simulation results show that the symbol-level multi-BS cooperative passive sensing scheme has an order of magnitude higher sensing accuracy than single-BS passive sensing. This work provides a reference for research on multi-BS cooperative passive sensing.
- [14] arXiv:2405.09193 [pdf, ps, html, other]
Title: Autonomous Cooperative Levels of Multiple-Heterogeneous Unmanned Vehicle Systems
Subjects: Systems and Control (eess.SY)
As multiple and heterogeneous unmanned vehicle systems continue to play an increasingly important role in addressing complex missions in the real world, the need for effective cooperation among unmanned vehicles becomes paramount. The concept of autonomous cooperation, wherein unmanned vehicles cooperate without human intervention or human control, offers promising avenues for enhancing the efficiency and adaptability of intelligence of multiple-heterogeneous unmanned vehicle systems. Despite the growing interest in this domain, to the best of the authors' knowledge, there is a notable lack of comprehensive literature that explicitly defines the concept and classifies the levels of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. In this regard, this article aims to define the explicit concept of autonomous cooperation of multiple-heterogeneous unmanned vehicle systems. Furthermore, we provide a novel criterion to assess the technical maturity of the developed unmanned vehicle systems by classifying the autonomous cooperative levels of multiple-heterogeneous unmanned vehicle systems.
- [15] arXiv:2405.09222 [pdf, ps, html, other]
Title: Anchor Layout Optimization for Ultrasonic Indoor Positioning Using Swarm Intelligence
Journal-ref: 2023 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN)
Subjects: Signal Processing (eess.SP)
Indoor positioning applications demand ever higher precision and accuracy across the entire coverage zone. Optimal anchor placement and the deployment of multiple distributed anchor nodes could have a major impact in this regard. This paper examines the influence of these two difficult-to-approach hypotheses by means of a straightforward ultrasonic 3D indoor positioning system deployed in a real-life scenario via a geometry-based simulation framework. To obtain an optimal anchor placement, a particle swarm optimization (PSO) algorithm is introduced and subsequently run for setups ranging from 4 to 10 anchors. In this way, besides the optimal anchor placement layout, the influence of deploying several distributed anchor nodes is investigated. In order to theoretically compare the optimization progress, a system model and the Cramér-Rao lower bound (CRLB) are established and the results are quantified based on the simulation data. With limited anchors, the placement is crucial to obtain a high-precision, high-reliability (HPHR) indoor positioning system (IPS), while the addition of anchors gives a supplementary improvement to a lesser extent.
- [16] arXiv:2405.09234 [pdf, ps, html, other]
Title: Enhancing Image Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy
Subjects: Image and Video Processing (eess.IV)
Compared to traditional methods, semantic communication (SemCom) enhances transmission efficiency by sending only task-relevant information. However, transmitting semantic-rich data over insecure or public channels poses security and privacy risks. This paper addresses the privacy problem of transmitting images over wiretap channels and proposes a novel SemCom approach ensuring privacy through a differential privacy (DP)-based image protection and deprotection mechanism. The method utilizes the GAN inversion technique to extract disentangled semantic features and applies a DP mechanism to protect sensitive features within the extracted semantic information. To address the non-invertibility of DP, we introduce two neural networks to approximate the DP application and removal processes, offering a privacy protection level close to that of the original DP process. Simulation results validate the effectiveness of our method in preventing eavesdroppers from obtaining sensitive information while maintaining high-fidelity image reconstruction at the legitimate receiver.
- [17] arXiv:2405.09245 [pdf, ps, html, other]
Title: A Robust UAV-Based Approach for Power-Modulated Jammer Localization Using DoA
Comments: Submitted to the 2024 IEEE 100th Vehicular Technology Conference (VTC2024-Fall)
Subjects: Signal Processing (eess.SP)
Unmanned aerial vehicles (UAVs) are well-suited to localize jammers, particularly when jammers are at non-terrestrial locations, where conventional detection methods face challenges. In this work, we propose a novel localization method, sample pruning gradient descent (SPGD), which offers robust performance against multiple power-modulated jammers with low computational complexity.
- [18] arXiv:2405.09283 [pdf, ps, html, other]
Title: Bounds and Approximations for the Distribution of a Sum of Lognormal Random Variables
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
A sum of lognormal random variables (RVs) appears in many problems of science and engineering. For example, it is involved in computing the distribution of received signal and interference powers for radio channels subject to lognormal shadow fading. Its distribution has no closed-form expression and it is typically characterized by approximations, asymptotes or bounds. We give a novel upper bound on the cumulative distribution function (CDF) of a sum of $N$ lognormal RVs. The bound is derived from the tangential mean-arithmetic mean inequality. By using the tangential mean, our method replaces the sum of $N$ lognormal RVs with a product of $N$ shifted lognormal RVs. It is shown that the bound can be made arbitrarily close to the desired CDF, and thus it becomes more accurate than any other bound or approximation, as the shift approaches infinity. The bound is computed by numerical integration, for which we introduce the Mellin transform, which is applicable to products of RVs. At the left tail of the CDF, the bound can be expressed by a single Q-function. Moreover, we derive simple new approximations to the CDF, expressed as a product of $N$ Q-functions, which are more accurate than the previous method of Farley.
- [19] arXiv:2405.09298 [pdf, ps, html, other]
Title: Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
AI-based analysis of histopathology whole slide images (WSIs) is central in computational pathology. However, image quality can impact model performance. Here, we investigate to what extent unsharp areas of WSIs impact deep convolutional neural network classification performance. We propose a multi-model approach, i.e. DeepBlurMM, to alleviate the impact of unsharp image areas and improve the model performance. DeepBlurMM uses the sigma cut-offs to determine the most suitable model for predicting tiles with various levels of blurring within a single WSI, where sigma is the standard deviation of the Gaussian distribution. Specifically, the cut-offs categorise the tiles into sharp or slight blur, moderate blur, and high blur. Each blur level has a corresponding model to be selected for tile-level predictions. Throughout the simulation study, we demonstrated the application of DeepBlurMM in a binary classification task for breast cancer Nottingham Histological Grade 1 vs 3. Performance, evaluated over 5-fold cross-validation, showed that DeepBlurMM outperformed the base model under moderate blur and mixed blur conditions. Unsharp image tiles (local blurriness) at prediction time reduced model performance. The proposed multi-model approach improved performance under some conditions, with the potential to improve quality in both research and clinical applications.
- [20] arXiv:2405.09317 [pdf, ps, html, other]
Title: Controllability Test for Nonlinear Datatic Systems
Subjects: Systems and Control (eess.SY)
Controllability is a fundamental property of control systems, serving as the prerequisite for controller design. While controllability testing is well established in modelic (i.e., model-driven) control systems, extending it to datatic (i.e., data-driven) control systems is still a challenging task due to the absence of system models. In this study, we propose a general controllability test method for nonlinear systems with datatic description, where the system behaviors are merely described by data. In this situation, the state transition information of a dynamic system is available only at a limited number of data points, leaving the behaviors beyond these points unknown. Different from traditional exact controllability, we introduce a new concept called $\epsilon$-controllability, which extends the definition from point-to-point form to point-to-region form. Accordingly, our focus shifts to checking whether the system state can be steered to a closed state ball centered on the target state, rather than exactly to that target state. On this basis, we propose a tree search algorithm called maximum expansion of controllable subset (MECS) to identify controllable states in the dataset. Starting with a specific target state, our algorithm can iteratively propagate controllability from a known state ball to a new one. This iterative process gradually enlarges the $\epsilon$-controllable subset by incorporating new controllable balls until all $\epsilon$-controllable states are found. In addition, a simplified version of MECS, called Floyd expansion with radius fixed (FERF), is obtained by solving a special shortest path problem. FERF maintains a fixed radius for all controllable balls based on a mutual controllability assumption of neighboring states. The effectiveness of our method is validated in three datatic control systems whose dynamic behaviors are described by sampled data.
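The idea of propagating controllability outward from a target ball can be illustrated with a heavily simplified sketch. This is neither MECS nor FERF. It assumes scalar states, a fixed radius `eps`, and that any state within `eps` of an already controllable state can be treated as controllable (a stand-in for the mutual controllability assumption mentioned in the abstract). The data format, a list of `(state, next_state)` pairs, is likewise hypothetical.

```python
def controllable_states(transitions, target, eps):
    """Collect sampled states that can reach the eps-ball around the target.

    transitions: list of (state, next_state) pairs observed in the data.
    A sampled state is marked controllable when its recorded successor falls
    within eps of the target or of a state already marked controllable.
    """
    controllable = set()
    ball_centers = [target]
    changed = True
    while changed:  # repeat until no new state can be added
        changed = False
        for state, next_state in transitions:
            if state in controllable:
                continue
            if any(abs(next_state - center) <= eps for center in ball_centers):
                controllable.add(state)
                ball_centers.append(state)
                changed = True
    return controllable
```

For example, with samples `(3.0, 2.0)`, `(2.0, 1.1)`, `(1.0, 0.05)`, and `(5.0, 9.0)`, target `0.0`, and `eps = 0.2`, controllability spreads from 1.0 to 2.0 to 3.0, while 5.0 is never reached.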
- [21] arXiv:2405.09346 [pdf, ps, html, other]
Title: Full-wave EM simulation analysis of human body blockage by dense 2D antenna arrays
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Recently, proposals of human-sensing-based services for cellular and local area networks have brought indoor localization to the attention of several research groups. In response, various Device-Free Localization (DFL) techniques, also known as Passive Localization methods, have emerged that exploit ambient signals to locate and track individuals who do not carry any electronic device. This study delves into human passive indoor localization through full-wave electromagnetic simulations. For this purpose, we exploit simulations from the commercial FEKO software, which employs the Method of Moments (MoM). In particular, we collect and analyze the electric field values in a scenario consisting of a dense 2D/3D deployment of receivers in the presence of an anthropomorphic mobile target. The paper describes in detail the collected dataset and provides a first analysis based on a statistical approach. Possible use cases are also investigated through examples in the context of passive localization, sensing, and imaging.
- [22] arXiv:2405.09352 [pdf, ps, html, other]
Title: On the impact of the antenna radiation patterns in passive radio sensing
Journal-ref: IEEE Antennas and Wireless Propagation Letters (Volume: 23, Issue: 2, February 2024)
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
Electromagnetic (EM) body models based on the scalar diffraction theory make it possible to predict the impact of subject motions on the radio propagation channel without requiring a time-consuming full-wave approach. On the other hand, they are less effective in complex environments characterized by significant multipath effects. Recently, emerging radio sensing applications have proposed the adoption of smart antennas with non-isotropic radiation characteristics to improve coverage. This letter investigates the impact of antenna radiation patterns in passive radio sensing applications. Adaptations of diffraction-based EM models are proposed to account for antenna non-uniform angular filtering. Next, we experimentally quantify the impact of diffraction and multipath disturbance components on radio sensing accuracy in environments with smart antennas.
- [23] arXiv:2405.09353 [pdf, ps, html, other]
-
Title: Large coordinate kernel attention network for lightweight image super-resolutionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building blocks with multi-scale receptive fields for local modeling, and their LKA modules face a quadratic increase in computational and memory footprints as the convolutional kernel size increases. To address the first issue, we propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field; it can focus on learning multi-scale information, which is a vital component of discriminative representation. As for the second issue, we revisit the key properties of LKA and find that the adjacent direct interaction of local information and long-distance dependencies is crucial to providing remarkable performance. Taking this into account, and in order to mitigate the complexity of LKA, we propose a large coordinate kernel attention (LCKA) module, which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1D kernels. LCKA enables the adjacent direct interaction of local information and long-distance dependencies not only in the horizontal direction but also in the vertical direction. Besides, LCKA allows for the direct use of extremely large kernels in the depth-wise convolutional layers to capture more contextual information, which helps to significantly improve the reconstruction performance while incurring lower computational complexity and memory footprints. Integrating MBSConv and LCKA, we propose a large coordinate kernel attention network (LCAN).
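As a rough illustration of why replacing a 2D depth-wise kernel with a horizontal and a vertical 1D kernel helps, the sketch below counts weights per channel for both layouts. It is only a back-of-the-envelope calculation: the function names are hypothetical, biases and dilation are ignored, and it does not reproduce the actual LCKA layer configuration, which the abstract does not specify.

```python
def depthwise_weights_2d(k: int) -> int:
    """Weights per channel of one k x k depth-wise kernel (bias ignored)."""
    return k * k


def depthwise_weights_1d_pair(k: int) -> int:
    """Weights per channel of a 1 x k kernel followed by a k x 1 kernel."""
    return 2 * k


# Hypothetical kernel sizes, chosen only to show the growth trend.
for k in (7, 15, 31):
    full = depthwise_weights_2d(k)
    pair = depthwise_weights_1d_pair(k)
    print(f"k={k}: 2D kernel {full} weights, 1D pair {pair} weights")
```

The 2D count grows quadratically in the kernel size while the 1D pair grows linearly, which is consistent with the abstract's remark that very large kernels become affordable after the decomposition.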
- [24] arXiv:2405.09390 [pdf, ps, html, other]
-
Title: Continuous Multi-Link Operation: A Contention-Free Mechanism for the Unlicensed SpectrumComments: 5 pages, 6 figures, conference paperSubjects: Signal Processing (eess.SP)
This paper proposes a novel mechanism to enforce contention-free channel access in the unlicensed spectrum, as opposed to the traditional contention-based approach. To achieve this objective, we build on the Wi-Fi 7 multi-link operation (MLO) and define the means whereby independent channel access attempts are performed in all the addressable links to ensure one available channel/link is ready for transmission at all times, such that a sequence of continuously acquired channels can be maintained. We call this method continuous multi-link operation (ConMLO). In this work, we aim to verify the applicability of ConMLO, its ability to retain spectrum resources for a given duration of time, and its fairness with respect to existing approaches, namely legacy single-link operation (SLO) and MLO. To this end, we use realistic data traffic measurements acquired in a crowded football stadium as an exemplary case of challenging spectrum occupation. Our results show that the proposed ConMLO can effectively guarantee continuous channel acquisition under different occupancy scenarios without compromising fairness of channel access compared to existing legacy modes.
- [25] arXiv:2405.09405 [pdf, ps, html, other]
-
Title: On identifying the non-linear dynamics of a hovercraft using an end-to-end deep learning approachSubjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We present the identification of the non-linear dynamics of a novel hovercraft design, employing end-to-end deep learning techniques. Our experimental setup consists of a hovercraft propelled by racing drone propellers mounted on a lightweight foam base, allowing it to float and be controlled freely on an air hockey table. We learn parametrized physics-inspired non-linear models directly from data trajectories, leveraging gradient-based optimization techniques prevalent in machine learning research. The chosen model structure allows us to control the position of the hovercraft precisely on the air hockey table. We then analyze the prediction performance and demonstrate the closed-loop control performance on the real system.
- [26] arXiv:2405.09430 [pdf, ps, html, other]
-
Title: Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of BanditsSubjects: Systems and Control (eess.SY)
In recent years, the integration of communication and control systems has gained significant traction in various domains, ranging from autonomous vehicles to industrial automation and beyond. Multi-armed bandit (MAB) algorithms have proven their effectiveness as a robust framework for solving control problems. In this work, we investigate the use of MAB algorithms to control remote devices, a setting in which latency and reliability pose considerable challenges. We analyze the effectiveness of MABs operating in environments where the action feedback from controlled devices is transmitted over an unreliable communication channel and stored in a Geo/Geo/1 queue. We investigate the impact of queue sampling strategies on the MAB performance, and introduce a new stochastic approach. Its performance in terms of regret is evaluated against established algorithms in the literature for both upper confidence bound (UCB) and Thompson Sampling (TS) algorithms. Additionally, we study the trade-off between maximizing rewards and minimizing energy consumption.
- [27] arXiv:2405.09436 [pdf, ps, html, other]
-
Title: Outlier-resilient model fitting via percentile losses: Methods for general and convex residualsComments: 5 pages, 1 figure. Submitted to IEEE Signal Processing Letters on 21-Apr-2024Subjects: Signal Processing (eess.SP)
We consider the problem of robustly fitting a model to data that includes outliers by formulating a percentile optimization problem. This problem is non-smooth and non-convex, hence hard to solve. We derive properties that the minimizers of such problems must satisfy. These properties lead to methods that solve the percentile formulation both for general residuals and for convex residuals. The methods fit the model to subsets of the data, and then extract the solution of the percentile formulation from these partial fits. As illustrative simulations show, such methods endure higher outlier percentages, when compared with standard robust estimates. Additionally, the derived properties provide a broader and alternative theoretical validation for existing robust methods, whose validity was previously limited to specific forms of the residuals.
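To make the idea of a percentile loss concrete, here is a minimal sketch for the simplest possible case: estimating a single location parameter from scalar data with absolute residuals. In that special case, the value minimizing the k-th smallest absolute residual is the midpoint of the shortest window covering k sorted points, so the problem can be solved by fitting each window of k points and keeping the best fit. This is a classical one-dimensional illustration, not the method of the paper; the function name, the data, and the chosen fraction are all hypothetical.

```python
import math


def percentile_location_fit(data, fraction):
    """Minimize the k-th smallest |x - theta|, with k = ceil(fraction * n).

    Each window of k consecutive sorted points is fitted by its midpoint;
    the window with the smallest half-width gives the minimizer.
    """
    xs = sorted(data)
    n = len(xs)
    k = max(1, math.ceil(fraction * n))
    best_theta, best_loss = xs[0], float("inf")
    for start in range(n - k + 1):
        lo, hi = xs[start], xs[start + k - 1]
        loss = (hi - lo) / 2  # k-th smallest residual at the window midpoint
        if loss < best_loss:
            best_theta, best_loss = (lo + hi) / 2, loss
    return best_theta, best_loss


# Hypothetical data: five inliers near 1.0 and three gross outliers.
data = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0, 60.0, -40.0]
theta, loss = percentile_location_fit(data, 0.6)
mean = sum(data) / len(data)
print(f"percentile fit: {theta:.3f}, sample mean: {mean:.3f}")
```

On this toy data the percentile fit stays near the inlier cluster while the sample mean is pulled far away by the outliers, which is the kind of behavior the abstract attributes to percentile formulations.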
- [28] arXiv:2405.09438 [pdf, ps, html, other]
-
Title: Perturbed Integrators Chain Control via Barrier Function Adaptation and Lyapunov RedesignComments: 12 pages, 9 figuresSubjects: Systems and Control (eess.SY)
Lyapunov redesign is a classical technique that uses a nominal control and its corresponding nominal Lyapunov function to design a discontinuous control, such that it compensates for the uncertainties and disturbances. In this paper, the idea of Lyapunov redesign is used to propose an adaptive time-varying gain controller to stabilize a class of perturbed chain of integrators with an unknown control coefficient. It is assumed that the upper bound of the perturbation exists but is unknown. A proportional navigation feedback type gain is used to drive the system's trajectories into a prescribed vicinity of the origin in a predefined time, measured using a quadratic Lyapunov function. Once this neighborhood is reached, a barrier function-based gain is used, ensuring that the system's trajectories never leave this neighborhood despite uncertainties and perturbations. Experimental validation of the proposed controller on Furuta's pendulum is presented.
- [29] arXiv:2405.09446 [pdf, ps, html, other]
-
Title: M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of ExpertsSubjects: Image and Video Processing (eess.IV)
Medical imaging data is inherently heterogeneous across different modalities and clinical centers, posing unique challenges for developing generalizable foundation models. The conventional approach entails training distinct models per dataset or using a shared encoder with modality-specific decoders. However, these approaches incur heavy computational overheads and suffer from poor scalability. To address these limitations, we propose the Medical Multimodal Mixture of Experts (M$^4$oE) framework, leveraging the SwinUNet architecture. Specifically, M$^4$oE comprises modality-specific experts, each separately initialized to learn features encoding domain knowledge. Subsequently, a gating network is integrated during fine-tuning to dynamically modulate each expert's contribution to the collective predictions. This enhances model interpretability and generalization ability while retaining expertise specialization. Simultaneously, the M$^4$oE architecture amplifies the model's parallel processing capabilities, and it also allows the model to adapt to new modalities with ease. Experiments across three modalities reveal that M$^4$oE can achieve improvements of 3.45% over STU-Net-L, 5.11% over MED3D, and 11.93% over SAM-Med2D across the MICCAI FLARE22, AMOS2022, and ATLAS2023 datasets. Moreover, M$^4$oE reduces training duration by 7 hours while maintaining a parameter count that is only 30% of that of the compared methods. The code is available at this https URL.
- [30] arXiv:2405.09458 [pdf, ps, html, other]
-
Title: Non-contact Lung Disease Classification via OFDM-based Passive 6G ISAC SensingHasan Mujtaba Buttar, Muhammad Mahboob Ur Rahman, Muhammad Wasim Nawaz, Adnan Noor Mian, Adnan Zahid, Qammer H. AbbasiComments: submitted to a journal, 12 pages, 5 figures, 5 tablesSubjects: Signal Processing (eess.SP)
This paper is the first to present a novel, non-contact method that utilizes orthogonal frequency division multiplexing (OFDM) signals (of frequency 5.23 GHz, emitted by a software defined radio) to radio-expose pulmonary patients in order to differentiate between five prevalent respiratory diseases, i.e., Asthma, Chronic obstructive pulmonary disease (COPD), Interstitial lung disease (ILD), Pneumonia (PN), and Tuberculosis (TB). The fact that each pulmonary disease leads to a distinct breathing pattern, and thus modulates the OFDM signal in a different way, motivates us to acquire the OFDM-Breathe dataset, the first of its kind. It consists of 13,920 seconds of raw RF data (at 64 distinct OFDM frequencies) that we have acquired from a total of 116 subjects in a hospital setting (25 healthy control subjects, and 91 pulmonary patients). Among the 91 patients, 25 have Asthma, 25 have COPD, 25 have TB, 5 have ILD, and 11 have PN. We implement a number of machine and deep learning models in order to do lung disease classification using the OFDM-Breathe dataset. The vanilla convolutional neural network outperforms all the models with an accuracy of 97%, and stands out in terms of precision, recall, and F1-score. The ablation study reveals that it is sufficient to radio-observe the human chest on only seven different microwave frequencies in order to make a reliable diagnosis (with 96% accuracy) of the underlying lung disease. This corresponds to a sensing overhead that is merely 10.93% of the allocated bandwidth. This points to the feasibility of future 6G integrated sensing and communication (ISAC) systems where 89.07% of bandwidth still remains available for information exchange amidst on-demand health sensing. Through 6G ISAC, this work provides a tool for mass screening for respiratory diseases (e.g., COVID-19) at public places.
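The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the sensing overhead is simply the ratio of the seven selected frequencies to the 64 available ones, and that recording time is split evenly across subjects; neither assumption is stated in the abstract, so the derived values are indicative only.

```python
# Figures quoted in the abstract.
patients = {"Asthma": 25, "COPD": 25, "TB": 25, "ILD": 5, "PN": 11}
healthy_controls = 25
total_seconds = 13_920
total_frequencies = 64
selected_frequencies = 7

n_patients = sum(patients.values())
n_subjects = n_patients + healthy_controls

# Assumption: every subject contributes the same recording length.
seconds_per_subject = total_seconds / n_subjects

# Assumption: overhead is the fraction of OFDM frequencies used for sensing.
overhead_pct = 100 * selected_frequencies / total_frequencies
remaining_pct = 100 - overhead_pct

print(n_patients, n_subjects, seconds_per_subject)
print(f"{overhead_pct:.4f} {remaining_pct:.4f}")
```

Under these assumptions the patient and subject totals match the stated 91 and 116, the average recording length comes out at 120 seconds per subject, and 7/64 gives 10.9375%, which agrees with the quoted 10.93% up to truncation.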
- [31] arXiv:2405.09472 [pdf, ps, html, other]
-
Title: Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality AssessmentComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-resolution (LR) images and the scale factors, is a promising way to enhance assessment performance for SR-IQA without HR for reference. In this letter, we attempt to evaluate the perceptual quality and reconstruction fidelity of SR images considering LR images and scale factors. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, i.e., Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by leveraging the merits of global modeling of Vision Transformer (ViT) and local relation of ResNet, and incorporating the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches substantially aligns with the human visual system, enabling a comprehensive SR image evaluation. Experimental results indicate that our PFIQA outperforms current state-of-the-art models across three widely-used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images.
- [32] arXiv:2405.09514 [pdf, ps, html, other]
-
Title: Tackling Distribution Shifts in Task-Oriented Communication with Information BottleneckComments: 13 pages, 8 figures, submitted to IEEE for potential publicationSubjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semantic-shifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domain-shift generalization, which aims to find the causal relationship between the input data and task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with the label-dependent feature encoding approach for semantic-shift detection which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.
- [33] arXiv:2405.09528 [pdf, ps, html, other]
-
Title: Energy-Efficient Sleep Mode Optimization of 5G mmWave Networks Using Deep Contextual MABSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)
Millimeter-wave (mmWave) networks, integral to 5G communication, offer a vast spectrum that addresses the issue of spectrum scarcity and enhances peak rate and capacity. However, their dense deployment, necessary to counteract propagation losses, leads to high power consumption. An effective strategy to reduce this energy consumption in mobile networks is the sleep mode optimization (SMO) of base stations (BSs). In this paper, we propose a novel SMO approach for mmWave BSs in a 3D urban environment. This approach, which incorporates a neural network (NN) based contextual multi-armed bandit (C-MAB) with an epsilon decay algorithm, accommodates the dynamic and diverse traffic of user equipment (UE) by clustering the UEs in their respective tracking areas (TAs). Our strategy includes beamforming, which helps reduce energy consumption from the UE side, while SMO minimizes energy use from the BS perspective. We extended our investigation to include Random, Epsilon Greedy, Upper Confidence Bound (UCB), and Load Based sleep mode (SM) strategies. We compared the performance of our proposed C-MAB based SM algorithm with those of All On and other alternative approaches. Simulation results show that our proposed method outperforms all other SM strategies in terms of the $10^{th}$ percentile of user rate and average throughput while demonstrating comparable average throughput to the All On approach. Importantly, it outperforms all approaches in terms of energy efficiency (EE).
- [34] arXiv:2405.09539 [pdf, ps, html, other]
-
Title: MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal CancerComments: Early accepted to MICCAI 2024 (6/6/5)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians effectively customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients' conditions. However, multi-modal based methods may introduce information redundancy, leading to underperformance. In addition, efficient and effective interactions between multi-modal representations need further exploration, and the prognostic correlations in multi-modality features have not been explored in depth. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy, aiming to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Various experimental results validate the effectiveness of our proposed method. The code is available at this https URL.
New submissions for Thursday, 16 May 2024 (showing 34 of 34 entries )
- [35] arXiv:2405.08838 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: PolyGlotFake: A Novel Multilingual and Multimodal DeepFake DatasetComments: 13 pages, 4 figuresSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, although datasets are a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modality, and the few that are multimodal employ outdated techniques and limit their audio content to a single language, thereby failing to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on the PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.
- [36] arXiv:2405.08976 (cross-list from cs.NI) [pdf, ps, html, other]
-
Title: Slice-aware Resource Allocation and Admission Control for Smart Factory Wireless NetworksComments: 7 pages, submitted to VTCfall for reviewSubjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
The 5th generation (5G) and beyond network offers substantial promise as the ideal wireless technology to replace the existing inflexible wired connections in traditional factories of today. 5G network slicing allows for tailored allocation of resources to different network services, each with unique Quality of Service (QoS) requirements. This paper presents a novel solution for slice-aware radio resource allocation based on a convex optimisation control framework for applications in smart factory wireless networks. The proposed framework dynamically allocates minimum power and sub-channels to downlink mixed service type industrial users categorised into three slices: Capacity Limited (CL), Ultra Reliable Low Latency Communication (URLLC), and Time Sensitive (TS) slices. Given that the base station (BS) has limited transmission power, we enforce admission control by effectively relaxing the target rate constraints for current connections in the CL slice. This rate readjustment occurs whenever power consumption exceeds manageable levels. Simulation results show that our approach minimises power, allocates sub-channels to users, maintains slice isolation, and delivers QoS-specific communications to users in all the slices despite a time-varying number of users and changing network conditions.
- [37] arXiv:2405.09062 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion ModelsEmilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo AkamaSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving high-quality general music reconstruction using non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation, proposing neural embedding-based metrics. We additionally perform song classification based on the generated tracks. Our work contributes to the ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
- [38] arXiv:2405.09081 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Explainable AI for Ship Collision Avoidance: Decoding Decision-Making Processes and Behavioral IntentionsComments: 24 pages and 15 figures. If the program is needed, please contact usSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
This study developed an explainable AI for ship collision avoidance. Initially, a critic network composed of sub-task critic networks was proposed to individually evaluate each sub-task in collision avoidance to clarify the AI decision-making processes involved. Additionally, an attempt was made to discern behavioral intentions through a Q-value analysis and an Attention mechanism. The former focused on interpreting intentions by examining the increment of the Q-value resulting from AI actions, while the latter incorporated the significance of other ships in the decision-making process for collision avoidance into the learning objective. AI's behavioral intentions in collision avoidance were visualized by combining the perceived collision danger with the degree of attention to other ships. The proposed method was evaluated through a numerical experiment. The developed AI was confirmed to be able to safely avoid collisions under various congestion levels, and AI's decision-making process was rendered comprehensible to humans. The proposed method not only facilitates the understanding of DRL-based controllers/systems in the ship collision avoidance task but also extends to any task comprising sub-tasks.
- [39] arXiv:2405.09101 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: Adaptive Koopman Embedding for Robust Control of Complex Dynamical SystemsSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to changes in the nominal system dynamics induced by intrinsic or environmental factors. To overcome these limitations, this study presents an adaptive Koopman architecture capable of responding to the changes in system dynamics online. The proposed framework initially employs an autoencoder-based neural network that utilizes input-output information from the nominal system to learn the corresponding Koopman embedding offline. Subsequently, we augment this nominal Koopman architecture with a feed-forward neural network that learns to modify the nominal dynamics in response to any deviation between the predicted and observed lifted states, leading to improved generalization and robustness to a wide range of uncertainties and disturbances compared to contemporary methods. Extensive tracking control simulations, which are undertaken by integrating the proposed scheme within a Model Predictive Control framework, are used to highlight its robustness against measurement noise, disturbances, and parametric variations in system dynamics.
- [40] arXiv:2405.09108 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Linear Test for Global Nonlinear ControllabilitySubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
It is known that if a nonlinear control affine system without drift is bracket generating, then its associated sub-Laplacian is invertible under some conditions on the domain. In this note, we investigate the converse. We show how invertibility of the sub-Laplacian operator implies a weaker form of controllability, where the reachable sets of a neighborhood of a point have full measure. From a computational point of view, one can then use the spectral gap of the (infinite-dimensional) self-adjoint operator to define a notion of degree of controllability.
An essential tool to establish the converse result is the relation between invertibility of the sub-Laplacian and the controllability of the corresponding continuity equation using possibly non-smooth controls. Then, using Ambrosio-Gigli-Savare's superposition principle from optimal transport theory, we relate it to controllability properties of the control system. While the proof can be considered of the Perron-Frobenius type, we also provide a second dual Koopman point of view.
- [41] arXiv:2405.09137 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: On Convergence of the Iteratively Preconditioned Gradient-Descent (IPG) ObserverComments: 7 pagesSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper considers the observer design problem for discrete-time nonlinear dynamical systems with sampled measurement data. The recently proposed Iteratively Preconditioned Gradient-Descent (IPG) observer, a Newton-type observer, has been empirically shown to be more robust against measurement noise than the prominent nonlinear observers, a property that other Newton-type observers lack. However, no theoretical guarantees on the convergence of the IPG observer were provided. This paper presents a rigorous convergence analysis of the IPG observer for a class of nonlinear systems in deterministic settings, proving its local linear convergence to the actual trajectory. Our assumptions are standard in the existing literature of Newton-type observers, and the analysis further confirms the relation of the IPG observer with the Newton observer, which was only hypothesized earlier.
- [42] arXiv:2405.09171 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: Hierarchical Emotion Prediction and Control in Text-to-Speech SynthesisComments: This is accepted to IEEE ICASSP 2024Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various levels of granularity, encompassing phonemes, words, and utterances. During TTS training, the hierarchical ED is extracted from the ground-truth audio and guides the predictor to establish a connection between emotional and linguistic prosody. At run-time inference, the TTS model generates emotional speech and, at the same time, provides quantitative control of emotion over the speech constituents. Both objective and subjective evaluations validate the effectiveness of the proposed framework in terms of emotion prediction and control.
- [43] arXiv:2405.09207 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: An Exact Theory of Causal Emergence for Linear Stochastic Iteration SystemsSubjects: Information Theory (cs.IT); Systems and Control (eess.SY)
After coarse-graining a complex system, the dynamics of its macro-state may exhibit more pronounced causal effects than those of its micro-state. This phenomenon, known as causal emergence, is quantified by the indicator of effective information. However, two challenges confront this theory: the absence of well-developed frameworks in continuous stochastic dynamical systems and the reliance on coarse-graining methodologies. In this study, we introduce an exact theoretical framework for causal emergence within linear stochastic iteration systems featuring continuous state spaces and Gaussian noise. Building upon this foundation, we derive an analytical expression for effective information across general dynamics and identify optimal linear coarse-graining strategies that maximize the degree of causal emergence when the dimension-averaged uncertainty eliminated by coarse-graining has an upper bound. Our investigation reveals that the maximal causal emergence and the optimal coarse-graining methods are primarily determined by the principal eigenvalues and eigenvectors of the dynamic system's parameter matrix, with the latter not being unique. To validate our propositions, we apply our analytical models to three simplified physical systems, comparing the outcomes with numerical simulations, and consistently achieve congruent results.
- [44] arXiv:2405.09224 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: Perception-Inspired Graph Convolution for Music Understanding Tasks
Comments: Accepted at the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
We propose a new graph convolutional block, called MusGConv, specifically designed for the efficient processing of musical score data and motivated by general perceptual principles. It focuses on two fundamental dimensions of music, pitch and rhythm, and considers both relative and absolute representations of these components. We evaluate our approach on four different musical understanding problems: monophonic voice separation, harmonic analysis, cadence detection, and composer identification which, in abstract terms, translate to different graph learning problems, namely, node classification, link prediction, and graph classification. Our experiments demonstrate that MusGConv improves the performance on three of the aforementioned tasks while being conceptually very simple and efficient. We interpret this as evidence that it is beneficial to include perception-informed processing of fundamental musical concepts when developing graph network applications on musical score data.
- [45] arXiv:2405.09241 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: SMUG-Explain: A Framework for Symbolic Music Graph Explanations
Comments: In Proceedings of the Sound and Music Computing Conference 2024 (SMC2024), Porto, Portugal
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this work, we present Score MUsic Graph (SMUG)-Explain, a framework for generating and visualizing explanations of graph neural networks applied to arbitrary prediction tasks on musical scores. Our system allows the user to visualize the contribution of input notes (and note features) to the network output, directly in the context of the musical score. We provide an interactive interface based on the music notation engraving library Verovio. We showcase the usage of SMUG-Explain on the task of cadence detection in classical music. All code is available on this https URL.
- [46] arXiv:2405.09266 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Comments: 11 pages, 6 figures, demo page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizing conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in 2D-MM Align score and established metrics. Video results can be found on our project page: this https URL.
- [47] arXiv:2405.09282 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: Three-Dimensional Path Planning: Navigating through Rough Mereology
Comments: number of pages: 16, number of figures: 10
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
In this paper, we present an innovative technique for path planning of flying robots in a 3D environment in terms of rough mereology. The main goal was to construct an algorithm that generates mereological potential fields in three-dimensional space. To avoid falling into a local minimum, we use a weighted Euclidean distance. Moreover, a path search from the start point to the target that avoids obstacles was applied. The environment was created by connecting two cameras working in real time. The Python library OpenCV [1], which recognized shapes and colors, was responsible for determining the gate and the elements of the world inside the map. The main purpose of this paper is to apply the given results to drones.
- [48] arXiv:2405.09291 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Sensitivity Decouple Learning for Image Compression Artifacts Reduction
Comments: Accepted by Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction, i.e., the compression-insensitive features to regularize the high-level semantic representations during training and the compression-sensitive features to be aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features for retaining high-level semantics, and we then develop the compression quality-aware feature encoder for compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) to utilize these awareness features as transformation guidance during the decoding phase. In our proposed DAGN, we develop a cross-feature fusion module to maintain the consistency of compression-insensitive features by fusing compression-insensitive features into the artifacts reduction baseline. Our method achieves an average PSNR gain of 2.06 dB on BSD500, outperforming state-of-the-art methods, and only requires 29.7 ms to process one image on BSD500. Besides, the experimental results on LIVE1 and LIU4K also demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
- [49] arXiv:2405.09336 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Analytical Characterization of the Operational Diversity Order in Fading Channels
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We introduce and characterize the operational diversity order (ODO) in fading channels, as a proxy to the classical notion of diversity order at any arbitrary operational signal-to-noise ratio (SNR). Thanks to this definition, relevant insights are brought up in a number of cases: (i) We quantify that in line-of-sight scenarios an increased diversity order is attainable compared to that achieved asymptotically; (ii) this effect is attenuated, but still visible, in the presence of an additional dominant specular component; (iii) we confirm that the decay slope in Rayleigh product channels increases very slowly and never fully achieves unitary slope for finite values of SNR.
- [50] arXiv:2405.09425 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Robust Covariance-Based Activity Detection for Massive Access
Comments: 5 pages, 11 figures. Asilomar SSC 2023 Conference
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The wireless channel is undergoing continuous changes, and the block-fading assumption, despite its popularity in theoretical contexts, never holds true in practical scenarios. This discrepancy is particularly critical for user activity detection in grant-free random access, where joint processing across multiple resource blocks is usually undesirable. In this paper, we propose employing a low-dimensional approximation of the channel to capture variations over time and frequency and robustify activity detection algorithms. This approximation entails projecting channel fading vectors onto their principal directions to minimize the approximation order. Through numerical examples, we demonstrate a substantial performance improvement achieved by the resulting activity detection algorithm.
- [51] arXiv:2405.09443 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM Waveform
Comments: 16 pages, 12 figures, submitted to IEEE journal
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with orthogonal frequency division multiplexing (OFDM) waveform, in which a base station receives the echoes of its transmitted cellular OFDM signals to sense multiple targets. The Cramér-Rao bounds are first derived for JARVE. A low-complexity algorithm is further designed for super-resolution JARVE, which utilizes the proposed iterative subspace update scheme and Levenberg-Marquardt optimization method to replace the exhaustive search of spatial spectrum in the multiple-signal-classification (MUSIC) algorithm. Finally, with the practical parameters of 5G New Radio, simulation results verify that the proposed algorithm can reduce the computational complexity by three orders of magnitude and two orders of magnitude compared to the existing three-dimensional MUSIC algorithm and estimation-of-signal-parameters-using-rotational-invariance-techniques (ESPRIT) algorithm, respectively, and also improve the estimation performance.
- [52] arXiv:2405.09470 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Comments: Accepted to SecTL (AsiaCCS Workshop) 2024
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while preserving sound naturalness according to our user study.
- [53] arXiv:2405.09497 (cross-list from cs.IT) [pdf, ps, html, other]
-
Title: Towards the limits: Sensing Capability Measurement for ISAC Through Channel Encoder
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Integrated Sensing and Communication (ISAC) is gradually becoming a reality due to the significant increase in frequency and bandwidth of next-generation wireless communication technologies. It therefore becomes crucial to evaluate communication and sensing performance using appropriate channel models in order to address the competition for resources between the two. Existing work models the sensing capability only through the mutual information between the channel response and the received signal; its theoretical resolution struggles to support the high-precision requirements of ISAC sensing tasks, and may even compromise communication optimality.
In this paper, we propose a sensing channel encoder model to measure the sensing capacity with higher resolution by discrete task mutual information. For the first time, we derive upper and lower bounds on the sensing accuracy for a given channel. This model not only provides the possibility of optimizing ISAC systems at a finer granularity and balancing communication and sensing resources, but also provides theoretical explanations for classical intuitions (e.g., that more modalities yield higher accuracy) in wireless sensing. Furthermore, we validate the effectiveness of the proposed channel model through real-case studies, including person identification, displacement detection, direction estimation, and device recognition. The evaluation results indicate a Pearson correlation coefficient exceeding 0.9 between our task mutual information and conventional experimental metrics (e.g., accuracy).
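The Pearson correlation coefficient quoted in this abstract is the usual sample statistic. As a minimal sketch of how such a figure could be computed from paired measurements, the function below implements the textbook formula; the function name and the numbers in the example are placeholders invented for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products and squares (the 1/(n-1) factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values only: hypothetical task mutual information (bits)
    # and hypothetical task accuracy for four imaginary configurations.
    task_mi = [0.2, 0.5, 0.9, 1.4]
    accuracy = [0.55, 0.68, 0.80, 0.93]
    print(f"Pearson r = {pearson(task_mi, accuracy):.3f}")
```

A value close to 1 indicates that the two quantities rise together almost linearly, which is the sense in which the abstract reports agreement between task mutual information and accuracy.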
Cross submissions for Thursday, 16 May 2024 (showing 19 of 19 entries )
- [54] arXiv:2307.16178 (replaced) [pdf, ps, html, other]
-
Title: On Updating Static Output Feedback Controllers Under State-Space Perturbation
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
In this paper, we propose a novel update of a nominal stabilizing static output feedback (SOF) controller for a perturbed linear system. In almost every classical feedback controller design problem, a stabilizing feedback controller is designed given a stabilizable unstable system. In realistic scenarios, the system model is usually imperfect and subject to perturbations. A typical approach to attenuate the impacts of such perturbations on the system stability is repeating the whole controller design procedure to find an updated stabilizing SOF controller. Such an approach can be inefficient and occasionally infeasible. Using the notion of minimum destabilizing real perturbation (MDRP), we construct a simple norm minimization problem (a least-squares problem) to propose an efficient update of a nominal stabilizing SOF controller that can be applied to various control engineering applications in the case of perturbed scenarios like abrupt changes or inaccurate system models. In particular, considering norm-bounded known or unknown perturbations, this paper presents updated stabilizing SOF controllers and derives sufficient stability conditions. Geometric metrics to quantitatively measure the approach's robustness are defined. Moreover, we characterize the corresponding guaranteed stability regions, and specifically, for the case of norm-bounded unknown perturbations, we propose non-fragility-based robust updated stabilizing SOF controllers. Through extensive numerical simulations, we assess the effectiveness of the theoretical results.
- [55] arXiv:2309.07927 (replaced) [pdf, ps, html, other]
-
Title: Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn't readily extend to ASR for children due to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech. They were able to demonstrate some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium and show that this improvement can be generalized to unseen datasets. We also highlight important challenges towards improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.
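For readers who want to relate the WER figures in this abstract to a relative improvement, the sketch below may help. It is illustrative only: the function names are hypothetical, and `word_error_rate` uses the standard definition (word-level edit distance divided by the number of reference words), not code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction of an error rate, e.g. 0.25 for a 25% cut."""
    return (before - after) / before


if __name__ == "__main__":
    # WER values (in percent) as quoted in the abstract.
    print(f"Whisper-Small:  {relative_reduction(13.93, 9.11):.1%}")
    print(f"Whisper-Medium: {relative_reduction(13.23, 8.61):.1%}")
```

Both quoted results correspond to a relative WER reduction of roughly 35%.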
- [56] arXiv:2312.06344 (replaced) [pdf, ps, html, other]
-
Title: Learning Robust Policies for Uncertain Parametric Markov Decision Processes
Comments: 10 pages, accepted for oral presentation at L4DC
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)
Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
- [57] arXiv:2312.12625 (replaced) [pdf, ps, html, other]
-
Title: Calibrating Wireless Ray Tracing for Digital Twinning using Local Phase Error Estimates
Comments: Revised 10 May 2024: additional FDTD experiments
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Embodying the principle of simulation intelligence, digital twin (DT) systems construct and maintain a high-fidelity virtual model of a physical system. This paper focuses on ray tracing (RT), which is widely seen as an enabling technology for DTs of the radio access network (RAN) segment of next-generation disaggregated wireless systems. RT makes it possible to simulate channel conditions, enabling data augmentation and prediction-based transmission. However, the effectiveness of RT hinges on the adaptation of the electromagnetic properties assumed by the RT to actual channel conditions, a process known as calibration. The main challenge of RT calibration is the fact that small discrepancies in the geometric model fed to the RT software hinder the accuracy of the predicted phases of the simulated propagation paths. Existing solutions to this problem either rely on the channel power profile, hence disregarding phase information, or they operate on the channel responses by assuming the simulated phases to be sufficiently accurate for calibration. This paper proposes a novel channel response-based scheme that, unlike the state of the art, estimates and compensates for the phase errors in the RT-generated channel responses. The proposed approach builds on the variational expectation maximization algorithm with a flexible choice of the prior phase-error distribution that bridges between a deterministic model with no phase errors and a stochastic model with uniform phase errors. The algorithm is computationally efficient, and is demonstrated, by leveraging the open-source differentiable RT software available within the Sionna library, to outperform existing methods in terms of the accuracy of RT predictions.
- [58] arXiv:2402.03897 (replaced) [pdf, ps, html, other]
-
Title: Robust Data-EnablEd Predictive Leading Cruise Control via Reachability Analysis
Comments: 8 pages, 4 figures
Subjects: Systems and Control (eess.SY)
Data-driven predictive control promises model-free wave-dampening strategies for Connected and Autonomous Vehicles (CAVs) in mixed traffic flow. However, its performance relies on data quality, which suffers from unknown noise and disturbances. This paper introduces a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) method based on reachability analysis, aiming to achieve safe and optimal CAV control under bounded process noise and external disturbances. Specifically, the matrix zonotope set technique and Willems' Fundamental Lemma are employed to derive the over-approximated system dynamics directly from data, and a data-driven feedback control technique is utilized to obtain an additional feedback input for stability. We decouple the mixed platoon into an error system and a nominal system, where the error system provides data-driven reachability sets for the enhanced safety constraints in the nominal system. Finally, a data-driven predictive control framework is formulated in a tube-based control manner for robustness guarantees. Nonlinear simulations with noise-corrupted data demonstrate that the proposed method outperforms baseline methods in mitigating traffic waves.
- [59] arXiv:2402.04157 (replaced) [pdf, ps, html, other]
-
Title: Controller synthesis for input-state data with measurement errors
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We consider the problem of designing a state-feedback controller for a linear system, based only on noisy input-state data. We focus on input-state data corrupted by measurement errors, which, albeit less investigated, are as relevant as process disturbances in applications. For energy and instantaneous bounds on these measurement errors, we derive linear matrix inequalities for controller design where the one for the energy bound is equivalent to robust stabilization of all systems consistent with the noisy data points via a common Lyapunov function.
- [60] arXiv:2402.14832 (replaced) [pdf, ps, other]
-
Title: Integrating Simulation Budget Management into Drum-Buffer-Rope: A Study on Parametrization and Reducing Computational Effort
Subjects: Systems and Control (eess.SY)
In manufacturing, a bottleneck workstation frequently emerges, complicating production planning and escalating costs. To address this, Drum-Buffer-Rope (DBR) is a widely recognized production planning and control method that focuses on centralizing the bottleneck workstation, thereby improving production system performance. Although DBR is primarily focused on creating a bottleneck schedule, the selection of planning parameters is crucial, as they significantly influence the scheduling process. Conducting a comprehensive full factorial enumeration to identify the ideal planning parameters requires substantial computational effort. Simulation Budget Management (SBM) offers an effective concept to reduce this effort by skipping less promising parameter combinations. This publication introduces a method for integrating SBM into a multi-stage, multi-item DBR planned and controlled production system with limited capacity, aimed at determining the optimal planning parameters. Furthermore, we conduct a simulation study to analyze the effects of different production system environments, i.e., varying levels of shop load and process uncertainty, on both the performance and parameterization of DBR and the efficacy of SBM. Our results show a significant reduction in simulation budget for identifying optimal planning parameters compared to traditional full factorial enumeration.
- [61] arXiv:2403.06573 (replaced) [pdf, ps, other]
-
Title: Electrical Consumption Flexibility in the Cement Industry
Sebastián Rojas-Innocenti, Enrique Baeyens, Alejandro Martín-Crespo, Sergio Saludes-Rodil, Fernando Frechoso-Escudero
Subjects: Systems and Control (eess.SY)
A method for identifying and quantifying the flexibility of electricity demand in a production plant is reported. The plant is equipped with electric machines, product storage silos, distributed generation, and electrical storage systems. The method aims to minimize production costs. To achieve this, the plant is mathematically modeled, and an economic optimization problem is formulated by managing this plant equipment. From this optimal schedule (base schedule), the feasibility of modifying it to sell or buy energy in the electricity balancing regulation markets is evaluated, thus obtaining the so-called flexibility schedule. Finally, this method was successfully applied to a real case using data from a Spanish cement production plant.
- [62] arXiv:2403.17324 (replaced) [pdf, ps, html, other]
-
Title: Unsupervised Learning for Joint Beamforming Design in RIS-aided ISAC Systems
Comments: Accepted by IEEE Wireless Communications Letters
Subjects: Signal Processing (eess.SP)
It is critical to design efficient beamforming in reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) systems for enhancing spectrum utilization. However, conventional methods often have limitations, either incurring high computational complexity due to iterative algorithms or sacrificing performance when using heuristic methods. To achieve both low complexity and high spectrum efficiency, an unsupervised learning-based beamforming design is proposed in this work. We tailor image-shaped channel samples and develop an ISAC beamforming neural network (IBF-Net) model for beamforming. By leveraging unsupervised learning, the loss function incorporates key performance metrics like sensing and communication channel correlation and sensing channel gain, eliminating the need for labeling. Simulations show that the proposed method achieves competitive performance compared to benchmarks while significantly reducing computational complexity.
- [63] arXiv:2404.00252 (replaced) [pdf, ps, html, other]
-
Title: Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Comments: Accepted to CVPR 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to various and uncertain user viewing behaviors, which poses significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.
- [64] arXiv:2404.19463 (replaced) [pdf, ps, html, other]
-
Title: Enhancing Physical Layer Security with Deep SIMO Auto-Encoder and RF Impairments Modeling
Comments: 6 pages, 5 figures, conference
Subjects: Signal Processing (eess.SP)
This paper presents a novel approach to achieving secure wireless communication by leveraging the inherent characteristics of wireless channels through end-to-end learning using a single-input-multiple-output (SIMO) autoencoder (AE). To ensure a more realistic signal transmission, we derive the signal model that captures all radio frequency (RF) hardware impairments to provide reliable and secure communication. Performance evaluations against traditional linear decoders, such as zero-forcing (ZF) and linear minimum mean square error (LMMSE), and the optimal nonlinear decoder, maximum likelihood (ML), demonstrate that the AE-based SIMO model exhibits superior bit error rate (BER) performance, but with a substantial gap even in the presence of RF hardware impairments. Additionally, the proposed model offers enhanced security features, preventing potential eavesdroppers from intercepting transmitted information and leveraging RF impairments for augmented physical layer security and device identification. These findings underscore the efficacy of the proposed end-to-end learning approach in achieving secure and robust wireless communication.
- [65] arXiv:2405.00725 (replaced) [pdf, ps, html, other]
-
Title: Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data
Vikhyat Agrawal, Sunil Vasu Kalmady, Venkataseetharam Manoj Malipeddi, Manisimha Varma Manthena, Weijie Sun, Saiful Islam, Abram Hindle, Padma Kaul, Russell Greiner
Comments: Accepted for ICMHI 2024
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate diagnosis while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained on the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at this https URL.
- [66] arXiv:2405.01726 (replaced) [pdf, ps, html, other]
-
Title: SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more underlying long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSIs, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms compared methods. The source code is available at this https URL.
- [67] arXiv:2405.02131 (replaced) [pdf, ps, html, other]
-
Title: Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming, which prevents their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have been recently proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and incorporate EM body diffraction principles. The proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
- [68] arXiv:2405.06445 (replaced) [pdf, ps, html, other]
-
Title: Systematic interval observer design for linear systems
Comments: 5 pages, 2 figures
Subjects: Systems and Control (eess.SY)
We first propose systematic and comprehensive interval observer designs for linear time-invariant systems, under standard assumptions involving observability and interval bounds on the initial condition and disturbances. Historically, such designs rely on transformations with certain limitations into a form that is Metzler (for continuous time) or non-negative (for discrete time). We show that they can be effectively replaced with a linear time-invariant transformation that can be easily computed offline. Then, we propose the extension to the time-varying setting, where conventional transformations lack guaranteed outcomes. Academic examples are presented to illustrate our methods.
- [69] arXiv:2405.07578 (replaced) [pdf, ps, html, other]
-
Title: PRANK: a singular value based noise filtering approach
Subjects: Signal Processing (eess.SP)
High quality measurements are paramount to a successful application of experimental techniques in structural dynamics. The presence of noise and disturbances can significantly distort the information stored in the data and, if not adequately treated, may result in erroneous findings and misleading predictions. A common technique to filter out noise relies on decomposing the dataset into singular components sorted by their degree of significance. Discarding low-value contributions helps to clean the data and remove spuriousness. This paper presents PRANK, a novel singular value-based reconstruction approach for multiple-response vibration datasets. PRANK integrates the effect of Principal Response Functions and Hankel filtering actions, resulting in an improved data reconstruction for both system poles and zeros. The mixed formulation, incorporating the e-15 algorithm for automatic truncation, is tested on both analytical and numerical examples, showcasing its robustness, efficiency and versatility. PRANK operates with both time- and frequency-based data. Applied to noisy full-field camera measurements, the filter delivered excellent performance, indicating its potential for various identification tasks and applications in vibration analysis.
- [70] arXiv:2405.08431 (replaced) [pdf, ps, html, other]
-
Title: Similarity Metrics for MR Image-To-Image Translation
Comments: 29 pages, 6 figures, appendix with 5 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Image-to-image translation can have a large impact in medical imaging, e.g., if images of a patient can be translated to another modality, type or sequence for better diagnosis. However, these methods must be validated by human reader studies, which are costly and restricted to small samples. Automatic evaluation of large samples to pre-evaluate and continuously improve methods before human validation is needed. In this study, we give an overview of reference and non-reference metrics for image synthesis assessment and investigate the ability of nine reference-based metrics (SSIM, MS-SSIM, PSNR, MSE, NMSE, MAE, LPIPS, NMI and PCC) and three non-reference metrics (BLUR, MSN, MNG) to detect 11 kinds of distortions in MR images from the BraSyn dataset. In addition, we test a downstream segmentation metric and the effect of three normalization methods (Minmax, cMinMax and Zscore). Although PSNR and SSIM are frequently used to evaluate generative models for image-to-image translation tasks in the medical domain, they show very specific shortcomings. SSIM ignores blurring but is very sensitive to intensity shifts in unnormalized MR images. PSNR is even more sensitive to different normalization methods and hardly measures the degree of distortion. Further metrics, such as LPIPS, NMI and DICE can be very useful to evaluate other similarity aspects. If the images to be compared are misaligned, most metrics are flawed. By carefully selecting and reasonably combining image similarity metrics, the training and selection of generative models for MR image synthesis can be improved. Many aspects of their output can be validated before final and costly evaluation by trained radiologists is conducted.
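To make the normalization sensitivity mentioned in the abstract above concrete, the sketch below computes MSE, NMSE and PSNR on flattened intensity lists. These are the common textbook definitions and may differ in detail from the variants evaluated in the paper (NMSE in particular has several conventions); the toy "images" and the `minmax` helper are illustrative assumptions, not data or code from the study.

```python
import math


def mse(ref, img):
    """Mean squared error between two equally sized intensity lists."""
    return sum((r - x) ** 2 for r, x in zip(ref, img)) / len(ref)


def nmse(ref, img):
    """Squared error normalized by the energy of the reference (one common convention)."""
    return sum((r - x) ** 2 for r, x in zip(ref, img)) / sum(r * r for r in ref)


def psnr(ref, img, data_range):
    """Peak signal-to-noise ratio in dB; depends on the assumed data_range."""
    err = mse(ref, img)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


def minmax(values):
    """Rescale values to [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    reference = [0.0, 50.0, 100.0, 200.0]
    shifted = [v + 10.0 for v in reference]  # a constant intensity shift
    # The same error yields different PSNR values for different assumed ranges.
    print(psnr(reference, shifted, data_range=200.0))
    print(psnr(reference, shifted, data_range=255.0))
    # Normalizing each image independently removes the shift entirely.
    print(psnr(minmax(reference), minmax(shifted), data_range=1.0))
```

In this toy case the intensity shift is fully visible to PSNR on raw values but vanishes once each image is min-max normalized on its own, which is one way a normalization choice can dominate the reported score.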
- [71] arXiv:2405.08621 (replaced) [pdf, ps, html, other]
-
Title: RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content
Comments: 8 pages, 2 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimised through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics.
- [72] arXiv:2305.03568 (replaced) [pdf, ps, html, other]
-
Title: A vector quantized masked autoencoder for audiovisual speech emotion recognition
Comments: 15 pages, 5 figures, this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
The limited availability of labeled data is a major challenge in audiovisual speech emotion recognition (SER). Self-supervised learning approaches have recently been proposed to mitigate the need for labeled data in various applications. This paper proposes the VQ-MAE-AV model, a vector quantized masked autoencoder (MAE) designed for audiovisual speech self-supervised representation learning and applied to SER. Unlike previous approaches, the proposed method employs a self-supervised paradigm based on discrete audio and visual speech representations learned by vector quantized variational autoencoders. A multimodal MAE with self- or cross-attention mechanisms is proposed to fuse the audio and visual speech modalities and to learn local and global representations of the audiovisual speech sequence, which are then used for an SER downstream task. Experimental results show that the proposed approach, which is pre-trained on the VoxCeleb2 database and fine-tuned on standard emotional audiovisual speech datasets, outperforms the state-of-the-art audiovisual SER methods. Extensive ablation experiments are also provided to assess the contribution of the different model components.
- [73] arXiv:2306.05597 (replaced) [pdf, ps, html, other]
-
Title: On the implementation of zero-determinant strategies in repeated games
Comments: 19 pages
Subjects: Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
Zero-determinant strategies are a class of strategies in repeated games which unilaterally control payoffs. Zero-determinant strategies have attracted much attention in studies of social dilemmas, particularly in the context of evolution of cooperation. So far, not only have general properties of zero-determinant strategies been investigated, but they have also been applied to control in the fields of information and communications technology and to the analysis of imitation. Here, we further deepen our understanding of the general mathematical properties of zero-determinant strategies. We first prove that zero-determinant strategies, if they exist, can be implemented by some one-dimensional transition probability. Next, we prove that, if a two-player game has a non-trivial potential function, a zero-determinant strategy exists in its repeated version. These results help us implement zero-determinant strategies in broader situations.
- [74] arXiv:2307.16580 (replaced) [pdf, ps, other]
-
Title: A multiscale and multicriteria Generative Adversarial Network to synthesize 1-dimensional turbulent fields
Authors: Carlos Granero-Belinchon (ODYSSEY, IMT Atlantique - MEE, Lab-STICC_OSE), Manuel Cabeza Gallucci (IMT Atlantique - MEE, UBA, IMT Atlantique)
Journal-ref: Machine Learning: Science and Technology, 2024, 5 (2), pp.025032.
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Fluid Dynamics (physics.flu-dyn)
This article introduces a new Neural Network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics. Both the model architecture and training procedure are grounded in the Kolmogorov and Obukhov statistical theories of fully developed turbulence, thereby guaranteeing descriptions of 1) energy distribution, 2) energy cascade and 3) intermittency across scales in agreement with experimental observations. The model is a Generative Adversarial Network with multiple multiscale optimization criteria. First, we use three physics-based criteria: the variance, skewness and flatness of the increments of the generated field, which respectively retrieve the turbulent energy distribution, energy cascade and intermittency across scales. Second, the Generative Adversarial Network criterion, based on reproducing statistical distributions, is used on segments of different lengths of the generated field. Furthermore, to mimic multiscale decompositions frequently used in turbulence studies, the model architecture is fully convolutional with kernel sizes varying along the multiple layers of the model. To train our model, we use turbulent velocity signals from grid turbulence in the Modane wind tunnel.
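The three physics-based criteria named in the abstract above (variance, skewness and flatness of the increments across scales) can be sketched as follows. This is a plain statistical illustration, not the paper's loss functions: the function names are hypothetical, and the use of centered, biased moment estimators is an assumption made here for simplicity.

```python
import random


def increments(signal, scale):
    """Return the increments signal[i + scale] - signal[i] at a given scale."""
    return [signal[i + scale] - signal[i] for i in range(len(signal) - scale)]


def increment_moments(signal, scale):
    """Variance, skewness and flatness of the increments at one scale.

    Centered, biased estimators are used here; this is an assumption, and
    other conventions (e.g. uncentered structure functions) are also common.
    """
    d = increments(signal, scale)
    n = len(d)
    mean = sum(d) / n
    centered = [x - mean for x in d]
    variance = sum(c ** 2 for c in centered) / n
    skewness = (sum(c ** 3 for c in centered) / n) / variance ** 1.5
    flatness = (sum(c ** 4 for c in centered) / n) / variance ** 2
    return variance, skewness, flatness


if __name__ == "__main__":
    # A Gaussian random walk as a stand-in signal (not turbulence data).
    rng = random.Random(0)
    walk = [0.0]
    for _ in range(20000):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))
    for scale in (1, 4, 16, 64):
        print(scale, increment_moments(walk, scale))
```

For the random-walk stand-in, the increment variance grows roughly linearly with scale while the flatness stays near the Gaussian value of 3; a scale-dependent flatness is the kind of signature associated with intermittency.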
- [75] arXiv:2308.02324 (replaced) [pdf, ps, html, other]
-
Title: Robust mmWave/sub-THz multi-connectivity using minimal coordination and coarse synchronization
Comments: Major revision: added ray-tracing simulation to validate the theoretical analysis, and refactored the presentation to avoid misleading connections with the canonical cell-free massive MIMO literature
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This study investigates simpler alternatives to coherent joint transmission for supporting robust connectivity against signal blockage in mmWave/sub-THz access networks. By taking an information-theoretic viewpoint, we demonstrate analytically that with a careful design, full macrodiversity gains and significant SNR gains can be achieved through canonical receivers and minimal coordination and synchronization requirements at the infrastructure side. Our proposed scheme extends non-coherent joint transmission by employing a special form of diversity to counteract artificially induced deep fades that would otherwise make this technique often compare unfavorably against standard transmitter selection schemes. Additionally, the inclusion of an Alamouti-like space-time coding layer is shown to recover a significant fraction of the optimal performance. Our conclusions are based on an insightful multi-point intermittent block fading channel model that enables rigorous ergodic and outage rate analysis, while also considering timing offsets due to imperfect delay compensation. Although simplified, our approach captures the essential features of modern mmWave/sub-THz communications, thereby providing practical design guidelines for realistic systems.
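For readers unfamiliar with the space-time coding layer mentioned in the abstract above, the sketch below shows the classical two-transmitter, one-receiver Alamouti code over a flat channel that stays constant for two symbol slots. It is the standard textbook scheme, shown only for orientation; the paper's "Alamouti-like" layer, its channel model and its timing offsets are not reproduced here, and the function names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Map two symbols onto two slots for two transmitters.

    Slot 1 sends (s1, s2); slot 2 sends (-conj(s2), conj(s1)).
    """
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(slots, h1, h2, noise=(0j, 0j)):
    """Single-antenna receiver: sum of both transmit signals through h1 and h2."""
    return [h1 * a + h2 * b + n for (a, b), n in zip(slots, noise)]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that decouples the two symbols when h1, h2 are known."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


if __name__ == "__main__":
    h1, h2 = 0.3 - 0.8j, -1.1 + 0.2j      # illustrative channel coefficients
    s1, s2 = 1 + 1j, -1 + 1j              # two QPSK-like symbols
    r1, r2 = receive(alamouti_encode(s1, s2), h1, h2)
    print(alamouti_combine(r1, r2, h1, h2))
```

Because the combined signal scales with |h1|^2 + |h2|^2, a deep fade on one link alone does not wipe out the symbol, which is the diversity effect such a coding layer is meant to provide.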
- [76] arXiv:2309.16967 (replaced) [pdf, ps, html, other]
-
Title: nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Automatic segmentation of medical images is crucial in modern clinical workflows. The Segment Anything Model (SAM) has emerged as a versatile tool for image segmentation without specific domain training, but it requires human prompts and may have limitations in specific domains. Traditional models like nnUNet perform automatic segmentation during inference and are effective in specific domains but need extensive domain-specific training. To combine the strengths of foundational and domain-specific models, we propose nnSAM, integrating SAM's robust feature extraction with nnUNet's automatic configuration to enhance segmentation accuracy on small datasets. Our nnSAM model optimizes two main approaches: leveraging SAM's feature extraction and nnUNet's domain-specific adaptation, and incorporating a boundary shape supervision loss function based on level set functions and curvature calculations to learn anatomical shape priors from limited data. We evaluated nnSAM on four segmentation tasks: brain white matter, liver, lung, and heart segmentation. Our method outperformed others, achieving the highest DICE score of 82.77% and the lowest ASD of 1.14 mm in brain white matter segmentation with 20 training samples, compared to nnUNet's DICE score of 79.25% and ASD of 1.36 mm. A sample size study highlighted nnSAM's advantage with fewer training samples. Our results demonstrate significant improvements in segmentation performance with nnSAM, showcasing its potential for small-sample learning in medical image segmentation.
- [77] arXiv:2312.04140 (replaced) [pdf, ps, html, other]
-
Title: Polarimetric Light Transport Analysis for Specular Inter-reflection
Comments: Accepted to IEEE Transactions on Computational Imaging (TCI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Polarization is well known for its ability to decompose diffuse and specular reflections. However, the existing decomposition methods only focus on direct reflection and overlook multiple reflections, especially specular inter-reflection. In this paper, we propose a novel decomposition method for handling specular inter-reflection of metal objects by using a unique polarimetric feature: the rotation direction of linear polarization. This rotation direction serves as a discriminative factor between direct and inter-reflection on specular surfaces. To decompose the reflectance components, we actively rotate the linear polarization of incident light and analyze the rotation direction of the reflected light. We evaluate our method using both synthetic and real data, demonstrating its effectiveness in decomposing specular inter-reflections of metal objects. Furthermore, we demonstrate that our method can be combined with other decomposition methods for a detailed analysis of light transport. As a practical application, we show its effectiveness in improving the accuracy of 3D measurement against strong specular inter-reflection.
- [78] arXiv:2402.01708 (replaced) [pdf, ps, other]
-
Title: Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Comments: 17 pages, 4 tables, 4 figures. Accepted at the 2024 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT '24)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States, where anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens' homes. Incidents like this demonstrate that multimodal generative AI risks and harms do not exist in isolation, but arise from the interactions of multiple stakeholders and technical AI systems. In this paper we analyse speech generation incidents to study how patterns of specific harms arise. We find that specific harms can be categorised according to the exposure of affected individuals, that is to say whether they are a subject of, interact with, suffer due to, or are excluded from speech generation systems. Similarly, specific harms are also a consequence of the motives of the creators and deployers of the systems. Based on these insights we propose a conceptual framework for modelling pathways to ethical and safety harms of AI, which we use to develop a taxonomy of harms of speech generators. Our relational approach captures the complexity of risks and harms in sociotechnical AI systems, and yields a taxonomy that can support appropriate policy interventions and decision making for the responsible development and release of speech generation models.
- [79] arXiv:2402.02551 (replaced) [pdf, ps, html, other]
-
Title: Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks
Comments: This paper has been accepted at the International Conference on Mechatronics and Automation (ICMA 2024), sponsored by the IEEE
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability, which may pose challenges in ensuring stability and safety. To address these issues, we propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy, all while actively engaging in the learning phase through interactions with the environment. This approach circumvents the control performance and complexities associated with computations while addressing nonrepetitive reaching tasks in the presence of obstacles. First, a model-free DRL agent is employed to plan velocity-bounded motion for a manipulator with 'n' degrees of freedom (DoF), ensuring collision avoidance for the end-effector through joint-level reasoning. The generated reference motion is then input into a robust subsystem-based adaptive controller, which produces the necessary torques, while the cuckoo search optimization (CSO) algorithm enhances control gains to minimize the stabilization and tracking error in the steady state. This approach guarantees robustness and uniform exponential convergence in an unfamiliar environment, despite the presence of uncertainties and disturbances. Theoretical assertions are validated through the presentation of simulation outcomes.
- [80] arXiv:2402.06178 (replaced) [pdf, ps, html, other]
-
Title: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Authors: Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon
Comments: Accepted to IJCAI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to latent space manipulation while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
- [81] arXiv:2403.18621 (replaced) [pdf, ps, html, other]
-
Title: Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects
Comments: Submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle, this paper constructs a comprehensive framework considering building blockage and employs a distance-correlated blockage model to analyze interference from line of sight (LoS), non-line of sight (NLoS), and target reflection cascading (TRC) links. Using stochastic geometric theory, expressions for signal-to-interference-plus-noise ratio (SINR) and coverage probability for communication and sensing in the presence of blockage are derived, allowing for a comprehensive comparison under the same parameters. The research findings indicate that blockage can positively impact coverage, especially in enhancing communication performance. The analysis also suggests that there exists an optimal base station (BS) density when blockage is of the same order of magnitude as the BS density, maximizing communication or sensing coverage probability.
- [82] arXiv:2405.04471 (replaced) [pdf, ps, html, other]
-
Title: Universal Spatial Audio Transcoder
Comments: 12 pages, 8 figures. Accepted for presentation at the AES 156th Convention, Madrid, Spain (June 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper addresses the challenges associated with both the conversion between different spatial audio formats and the decoding of a spatial audio format to a specific loudspeaker layout. Existing approaches often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To overcome these challenges, we present the Universal Spatial Audio Transcoder (USAT) method and its corresponding open source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format, adapting it to any output format or 2D/3D loudspeaker configuration. Drawing upon optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information. We present examples of the decoding and transcoding of several audio formats, and show that the USAT approach is advantageous compared to the most common methods in the field.
- [83] arXiv:2405.04880 (replaced) [pdf, ps, html, other]
-
Title: The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset that includes 2 languages, over 1M audio samples, and various test conditions, and that focuses on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minimum. In our experiments, we first demonstrate that training ADD models with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.
- [84] arXiv:2405.07536 (replaced) [pdf, ps, html, other]
-
Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs or similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When kinematic constraints or obstacles exist that may cause trajectory planning to fail, task re-assignment is implemented by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. A computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for multi-AUV systems.
- [85] arXiv:2405.08596 (replaced) [pdf, ps, other]
-
Title: EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark
Comments: This paper need more modification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts as an effective tool for detecting newly emerged deepfake audio while maintaining performance on older types, lacks a well-constructed and user-friendly evaluation framework. To address this gap, we introduce EVDA, a benchmark for evaluating continual learning methods in deepfake audio detection. EVDA includes classic datasets from the Anti-Spoofing Voice series, Chinese fake audio detection series, and newly generated deepfake audio from models like GPT-4 and GPT-4o. It supports various continual learning techniques, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and recent methods like Regularized Adaptive Weight Modification (RAWM) and Radian Weight Modification (RWM). Additionally, EVDA facilitates the development of robust algorithms by providing an open interface for integrating new continual learning methods.
- [86] arXiv:2405.08691 (replaced) [pdf, ps, html, other]
-
Title: Enhancing Reinforcement Learning in Sensor Fusion: A Comparative Analysis of Cubature and Sampling-based Integration Methods for Rover Search Planning
Comments: Submitted to IROS 2024
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This study investigates the computational speed and accuracy of two numerical integration methods, cubature and sampling-based, for integrating an integrand over a 2D polygon. Using a group of rovers searching the Martian surface with a limited sensor footprint as a test bed, the relative error and computational time are compared as the area is subdivided to improve accuracy in the sampling-based approach. The results show that the sampling-based approach exhibits a $14.75\%$ deviation in relative error compared to cubature when it matches the computational performance at $100\%$. Furthermore, achieving a relative error below $1\%$ necessitates a $10000\%$ increase in relative computation time due to the $\mathcal{O}(N^2)$ complexity of the sampling-based method. It is concluded that for enhancing reinforcement learning capabilities and other high-iteration algorithms, the cubature method is preferred over the sampling-based method.
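The comparison described in the abstract above can be illustrated on the simplest polygon, an axis-aligned rectangle. The sketch below contrasts a fixed 3x3 tensor-product Gauss-Legendre cubature rule with a sampling scheme that subdivides the area into an N x N grid and evaluates the integrand at each cell center, which costs O(N^2) evaluations. The choice of rule, the test integrand and all function names are assumptions for illustration; the paper's actual cubature rule, polygon handling and sensor model are not reproduced.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1].
GL3_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
GL3_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def cubature_rect(f, ax, bx, ay, by):
    """Integrate f over [ax, bx] x [ay, by] with a 3x3 Gauss-Legendre rule (9 evaluations)."""
    hx, hy = (bx - ax) / 2.0, (by - ay) / 2.0
    cx, cy = (ax + bx) / 2.0, (ay + by) / 2.0
    total = 0.0
    for tx, wx in zip(GL3_NODES, GL3_WEIGHTS):
        for ty, wy in zip(GL3_NODES, GL3_WEIGHTS):
            total += wx * wy * f(cx + hx * tx, cy + hy * ty)
    return hx * hy * total


def grid_sampling_rect(f, ax, bx, ay, by, n):
    """Integrate f by sampling the centers of an n x n subdivision (n * n evaluations)."""
    dx, dy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(ax + (i + 0.5) * dx, ay + (j + 0.5) * dy)
    return total * dx * dy


def gaussian_bump(x, y):
    """Smooth test integrand standing in for a sensor-footprint model (assumption)."""
    return math.exp(-(x * x + y * y))


# Closed-form integral of gaussian_bump over the unit square.
EXACT = (math.sqrt(math.pi) / 2.0 * math.erf(1.0)) ** 2

if __name__ == "__main__":
    print("cubature error:", abs(cubature_rect(gaussian_bump, 0, 1, 0, 1) - EXACT))
    for n in (3, 10, 30, 100):
        err = abs(grid_sampling_rect(gaussian_bump, 0, 1, 0, 1, n) - EXACT)
        print("grid", n, "evaluations", n * n, "error", err)
```

For a smooth integrand like this one, the fixed-cost cubature rule is typically far more accurate than grid sampling with the same number of evaluations, while the grid needs many more cells, and quadratically more evaluations, to catch up.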