Sound

Authors and titles for recent submissions

Mon, 29 Apr 2024

[1] arXiv:2404.17280 [pdf, other]: Title: Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2404.17161 [pdf, other]: Title: An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu

Comments: arXiv admin note: text overlap with arXiv:2311.14957

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[3] arXiv:2404.17022 [pdf, ps, other]: Title: Investigating differences in lab-quality and remote recording methods with dynamic acoustic measures

Authors: Cong Zhang, Kathleen Jepson, Yu-Ying Chuang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2404.16969 [pdf, other]: Title: COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

Authors: Ruben Ciranni, Emilian Postolache, Giorgio Mariani, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

Comments: Demo page: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2404.17552 (cross-list from eess.AS) [pdf, other]: Title: A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Authors: Rémi Uro, David Doukhan, Albert Rilliard, Laëtitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

Comments: Keywords: semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony

Journal-ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG); Sound (cs.SD)
[6] arXiv:2404.17490 (cross-list from eess.AS) [pdf, other]: Title: The CARFAC v2 Cochlear Model in Matlab, NumPy, and JAX

Authors: Richard F. Lyon, Rob Schonberger, Malcolm Slaney, Mihajlo Velimirović, Honglin Yu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[7] arXiv:2404.17252 (cross-list from cs.LG) [pdf, ps, other]: Title: Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition

Authors: Houtan Ghaffari, Paul Devos

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[8] arXiv:2404.17107 (cross-list from eess.AS) [pdf, other]: Title: Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection

Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 4 pages, 1 figure, and 4 tables. Accepted by IEEE EMBC 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2404.16905 (cross-list from cs.CL) [pdf, other]: Title: Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations

Authors: Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 26 Apr 2024

[10] arXiv:2404.16619 [pdf, other]: Title: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

Comments: Accepted in Grand Challenge of ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.16436 [pdf, ps, other]: Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics

Authors: Ben Williams, Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Eleni Triantafillou, Abram B. Fleishman, Matthew McKown, Jill E. Munger, Aaron N. Rice, Ashlee Lillis, Clemency E. White, Catherine A. D. Hobbs, Tries B. Razak, Kate E. Jones, Tom Denton

Comments: 18 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:2404.16259 [pdf, other]: Title: An Experiment with Electric Guitar Signals for Exploring the Virtuosity based on the Entropy of Music

Authors: Igor Lugo, Martha G. Alatriste-Contreras

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:2404.16743 (cross-list from cs.CL) [pdf, other]: Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation

Authors: Chanho Park, Mingjie Chen, Thomas Hain

Comments: Accepted to LREC-COLING 2024 (long)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2404.16547 (cross-list from eess.AS) [pdf, other]: Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish

Authors: Giampiero Salvi

Comments: 16 pages, 7 figures

Journal-ref: European Student Journal of Language and Speech, 1999

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[15] arXiv:2404.16305 (cross-list from cs.MM) [pdf, other]: Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model

Authors: Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2404.16216 (cross-list from cs.CV) [pdf, other]: Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling

Authors: Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2404.16104 (cross-list from eess.AS) [pdf, other]: Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

Authors: Albert Rilliard, David Doukhan, Rémi Uro, Simon Devauchelle

Comments: 5 pages, 2 figures, keywords: Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech

Journal-ref: Radek Skarnitzl & Jan Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908 114-2-3

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[18] arXiv:2404.13101 (cross-list from eess.IV) [pdf, ps, other]: Title: DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data

Authors: Hesam Hakimnejad, Zohreh Azimifar, Narjes Goshtasbi

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)

Thu, 25 Apr 2024

[19] arXiv:2404.15637 [pdf, other]: Title: HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts

Authors: Xinlei Niu, Jing Zhang, Charles Patrick Martin

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20] arXiv:2404.15854 (cross-list from cs.CR) [pdf, other]: Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

Comments: Submitted to IEEE TDSC

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2404.15704 (cross-list from cs.LG) [pdf, other]: Title: Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2404.15321 (cross-list from eess.SP) [pdf, other]: Title: Characteristics-Based Design of Multi-Exponent Bandpass Filters

Authors: Samiya A Alkhairy

Comments: 14 pages, 5 figures, 2 tables, 62 equations. Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 24 Apr 2024 (showing first 3 of 15 entries)

[23] arXiv:2404.15181 [pdf, ps, other]: Title: Tailors: New Music Timbre Visualizer to Entertain Music Through Imagery

Authors: ChungHa Lee

Comments: 47 pages, 9 figures, 5 tables

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.15160 [pdf, ps, other]: Title: Vector Signal Reconstruction Sparse and Parametric Approach of direction of arrival Using Single Vector Hydrophone

Authors: Jiabin Guo

Comments: 22 pages. arXiv admin note: substantial text overlap with arXiv:2404.13568

Subjects: Sound (cs.SD)
[25] arXiv:2404.15143 [pdf, other]: Title: Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Authors: Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Carrie Gates, Kevin Butler, Patrick Traynor

Comments: Submitted to ACM journal -- Digital Threats: Research and Practice

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
