Information Theory

Total of 21 entries

[1] arXiv:2405.09746 [pdf, ps, html, other]: Title: Algebraic Geometric Rook Codes for Coded Distributed Computing

Gretchen L. Matthews, Pedro Soto

Comments: 6 pages

Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Algebraic Geometry (math.AG)

We extend coded distributed computing over finite fields to allow the number of workers to be larger than the field size. We give codes that work for fully general matrix multiplication and show that, in this case, we serendipitously obtain that all functions can be computed in a distributed, fault-tolerant fashion over finite fields. This generalizes previous results on the topic. We prove that the associated codes achieve a recovery threshold similar to those for characteristic-zero fields, but now with a factor that is proportional to the genus of the underlying function field. In particular, the recovery threshold of these codes is proportional to the classical complexity of matrix multiplication, with a proportionality factor of at most the genus.
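For context, the sketch below shows the classical polynomial-code baseline for distributed matrix multiplication over a prime field GF(P). It is not the algebraic geometric rook construction from the abstract. Each worker evaluates encoded blocks at a distinct field point, so the number of workers cannot exceed P, which is the limitation the paper lifts. The field size `P = 97`, the block split, and all function names are illustrative assumptions. Any m*n worker results recover the product by Lagrange interpolation.

```python
# Baseline polynomial code for distributed A @ B over GF(P); NOT the AG rook construction.
P = 97  # hypothetical prime field size; N workers need N <= P distinct evaluation points


def matmul(X, Y):
    """Matrix product over GF(P) for list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*Y)] for row in X]


def combine(blocks, x, exponents):
    """Evaluate sum_i blocks[i] * x**exponents[i] entrywise over GF(P)."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(b[r][c] * pow(x, e, P) for b, e in zip(blocks, exponents)) % P
             for c in range(cols)] for r in range(rows)]


def worker_task(A_blocks, B_blocks, x):
    """Worker at point x multiplies A~(x) = sum_j A_j x^j by B~(x) = sum_k B_k x^(k*m)."""
    m = len(A_blocks)
    A_x = combine(A_blocks, x, list(range(m)))
    B_x = combine(B_blocks, x, [k * m for k in range(len(B_blocks))])
    return matmul(A_x, B_x)


def lagrange_basis(xs):
    """Coefficient lists (low to high degree) of the Lagrange basis polynomials for xs."""
    basis = []
    for i, xi in enumerate(xs):
        poly, denom = [1], 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            shifted = [0] + poly                                   # poly * x
            poly = [(s - xj * c) % P for s, c in zip(shifted, poly + [0])]  # minus xj * poly
            denom = denom * (xi - xj) % P
        inv = pow(denom, P - 2, P)                                 # Fermat inverse; P is prime
        basis.append([c * inv % P for c in poly])
    return basis


def decode(xs, results, m, n):
    """Interpolate C(x) = sum_{j,k} A_j B_k x^(j + k*m) from exactly m*n worker results."""
    assert len(xs) == m * n, "the recovery threshold of this baseline is m*n"
    basis = lagrange_basis(xs)
    rows, cols = len(results[0]), len(results[0][0])

    def coefficient(t):
        return [[sum(basis[i][t] * results[i][r][c] for i in range(len(xs))) % P
                 for c in range(cols)] for r in range(rows)]

    # A_j B_k is the coefficient of x^(j + k*m)
    return {(j, k): coefficient(j + k * m) for j in range(m) for k in range(n)}


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
m, n = 2, 2
A_blocks = [A[0:1], A[1:2]]                                       # row blocks A_0, A_1
B_blocks = [[row[0:1] for row in B], [row[1:2] for row in B]]     # column blocks B_0, B_1
outputs = {x: worker_task(A_blocks, B_blocks, x) for x in range(1, 7)}  # N = 6 workers
alive = [2, 3, 5, 6]                                              # any m*n = 4 responses
blocks = decode(alive, [outputs[x] for x in alive], m, n)
C = [[blocks[(j, k)][0][0] for k in range(n)] for j in range(m)]
print(C)  # [[19, 22], [43, 50]]
```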
[2] arXiv:2405.09753 [pdf, ps, html, other]: Title: Stacked Intelligent Metasurfaces for Holographic MIMO Aided Cell-Free Networks

Qingchao Li, Mohammed El-Hajjar, Chao Xu, Jiancheng An, Chau Yuen, Lajos Hanzo

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Large-scale multiple-input multiple-output (MIMO) systems are capable of achieving high data rates. However, given the high hardware cost and excessive power consumption of massive MIMO systems, intelligent metasurfaces have been designed as a remedy for efficient holographic MIMO (HMIMO) systems. In this paper, we propose an HMIMO architecture based on stacked intelligent metasurfaces (SIM) for the uplink of cell-free systems, where the SIM is employed at the access points (APs) to improve spectral and energy efficiency. Specifically, we conceive distributed beamforming for SIM-assisted cell-free networks, where both the SIM coefficients and the local receiver combining vectors of each AP are optimized based on local channel state information (CSI) for locally detecting the information of each user equipment (UE). Afterward, the central processing unit (CPU) fuses the local detections gleaned from all APs to detect the aggregate multi-user signal. To design the SIM coefficients and the combining vectors of the APs, a low-complexity layer-by-layer iterative optimization algorithm is proposed for maximizing the equivalent gain of the channel spanning from the UEs to the APs. At the CPU, the weight vector used for combining the local detections from all APs is designed based on the minimum mean square error (MMSE) criterion, where the hardware impairments (HWIs) are also taken into consideration based on their statistics. The simulation results show that the SIM-based HMIMO outperforms the conventional single-layer HMIMO in terms of achievable rate. We demonstrate that the HWIs of the radio frequency (RF) chains at both the APs and the UEs limit the achievable rate in the high signal-to-noise ratio (SNR) region.
[3] arXiv:2405.09905 [pdf, ps, html, other]: Title: Cell-Free Terahertz Massive MIMO: A Novel Paradigm Beyond Ultra-Massive MIMO

Wei Jiang, Hans D. Schotten

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024), July 2024, Madrid, Spain

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Terahertz (THz) frequencies have recently garnered considerable attention due to their potential to offer abundant spectral resources for communication, as well as distinct advantages in sensing, positioning, and imaging. Nevertheless, practical implementation faces challenges stemming from the limited signal transmission distance, primarily due to notable propagation, absorption, and blockage losses. To address this issue, the current strategy employs ultra-massive multi-input multi-output (UMMIMO) to generate high beamforming gains, thereby extending the transmission range. This paper introduces an alternative solution based on the cell-free massive MIMO (CFmMIMO) architecture, wherein the closest access point is actively chosen to reduce the distance, rather than relying solely on a large number of antennas. We compare these two techniques through simulations, and the numerical results show that CFmMIMO is superior to UMMIMO in both spectral and energy efficiency at THz frequencies.
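The following back-of-the-envelope sketch illustrates why shortening the link can substitute for antennas. It uses the Friis free-space path loss only and ignores molecular absorption and blockage. It also assumes that array gain grows as 10*log10(N) dB. The 300 GHz carrier and the distances are hypothetical. This is not the paper's simulation model.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss in dB; molecular absorption and blockage are ignored."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def antenna_factor_to_offset(d_far_m, d_near_m):
    """Factor by which N must grow so that an array gain of 10*log10(N) dB offsets
    the extra free-space loss of d_far versus d_near (this equals (d_far/d_near)**2)."""
    extra_loss_db = 20 * math.log10(d_far_m / d_near_m)
    return 10 ** (extra_loss_db / 10)


f = 300e9  # hypothetical 300 GHz carrier
print(round(fspl_db(10, f), 1), round(fspl_db(100, f), 1))  # about 102.0 and 122.0 dB
print(antenna_factor_to_offset(100, 10))                    # about 100
```

Under these assumptions, moving the serving access point ten times closer saves as much as a hundredfold increase in antenna count.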
[4] arXiv:2405.09914 [pdf, ps, html, other]: Title: Distributed Joint User Activity Detection, Channel Estimation, and Data Detection via Expectation Propagation in Cell-Free Massive MIMO

Christian Forsch, Alexander Karataev, Laura Cottatellucci

Comments: 39 pages, 4 figures, submitted for possible conference publication (shortened version)

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We consider the uplink of a grant-free cell-free massive multiple-input multiple-output (GF-CF-MaMIMO) system. We propose an algorithm for distributed joint activity detection, channel estimation, and data detection (JACD) based on expectation propagation (EP), called JACD-EP. We develop the algorithm by factorizing the a posteriori probability (APP) of activities, channels, and transmitted data, then mapping functions and variables onto a factor graph, and finally performing message passing on the resulting factor graph. If users with the same pilot sequence are sufficiently distant from each other, the JACD-EP algorithm is able to mitigate the effects of pilot contamination, which naturally occurs in grant-free systems due to the large number of potential users and limited signaling resources. Furthermore, it outperforms state-of-the-art algorithms for JACD in GF-CF-MaMIMO systems.
[5] arXiv:2405.09928 [pdf, ps, html, other]: Title: Unified Modeling and Performance Comparison for Cellular and Cell-Free Massive MIMO

Wei Jiang, Hans D. Schotten

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024)

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Cell-free massive multi-input multi-output (MIMO) has recently gained a lot of attention due to its high potential in sixth-generation (6G) wireless systems. The goal of this paper is first to present a unified model for massive MIMO, encompassing both cellular and cell-free architectures with a variable number of antennas per access point. We derive signal transmission models and the achievable spectral efficiency in both the downlink and uplink using zero-forcing and maximal-ratio schemes. We also provide performance comparisons in terms of per-user and sum spectral efficiency.
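The sketch below shows how uplink spectral efficiency is typically computed as log2(1 + SINR) with maximal-ratio (MR) and zero-forcing (ZF) combining. It is not the paper's unified model. It assumes perfect CSI, unit noise variance, and i.i.d. Rayleigh fading with all antennas treated alike. Per-AP large-scale fading for a cell-free layout could be added by scaling rows of `H`, but that extension is our assumption. The dimensions and SNR are hypothetical.

```python
import numpy as np


def uplink_se(H, snr, combiner="zf"):
    """Per-user uplink SE log2(1 + SINR) with perfect CSI and unit noise variance.
    H has shape (M antennas, K users); snr is the per-user transmit SNR (linear)."""
    if combiner == "mr":
        V = H                                         # maximal-ratio combining
    elif combiner == "zf":
        V = H @ np.linalg.inv(H.conj().T @ H)         # zero-forcing combining
    else:
        raise ValueError(f"unknown combiner: {combiner}")
    G = V.conj().T @ H                                # G[k, i] = v_k^H h_i
    desired = snr * np.abs(np.diag(G)) ** 2
    interference = snr * np.sum(np.abs(G) ** 2, axis=1) - desired
    noise = np.sum(np.abs(V) ** 2, axis=0)            # ||v_k||^2 scales the unit noise
    return np.log2(1 + desired / (interference + noise))


rng = np.random.default_rng(0)
M, K = 64, 8
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
for name in ("mr", "zf"):
    print(name, uplink_se(H, snr=10.0, combiner=name).sum())
```

ZF removes inter-user interference at the cost of a larger effective noise, so it tends to win at higher SNR, while MR is simpler and competitive at low SNR.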
[6] arXiv:2405.09944 [pdf, ps, other]: Title: Reed-Muller codes in the sum-rank metric

Elena Berardini (UB, IMB, CANARI), Xavier Caruso (IMB, CANARI, UB)

Subjects: Information Theory (cs.IT)

We introduce the sum-rank metric analogue of Reed-Muller codes, which we call linearized Reed-Muller codes, using multivariate Ore polynomials. We study the parameters of these codes, compute their dimension, and give a lower bound for their minimum distance. Our codes exhibit quite good parameters, respecting a bound similar to that of Reed-Muller codes in the Hamming metric. Finally, we also show that many of the newly introduced linearized Reed-Muller codes can be embedded in some linearized Algebraic Geometry codes, a property which could turn out to be useful for decoding.
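As a reference point for the Hamming-metric comparison, the sketch below builds classical binary Reed-Muller codes RM(r, m). It evaluates all monomials of degree at most r on F_2^m and then brute-forces the minimum distance. This recovers the known parameters n = 2^m, k = sum_{i<=r} C(m, i), and d = 2^(m-r). It does not implement the linearized (Ore polynomial) construction from the abstract.

```python
from itertools import combinations, product


def rm_generator(r, m):
    """Generator rows of RM(r, m): evaluations of every monomial of degree <= r on F_2^m."""
    points = list(product([0, 1], repeat=m))
    rows = []
    for degree in range(r + 1):
        for variables in combinations(range(m), degree):
            # all() over an empty tuple is True, giving the all-ones row for degree 0
            rows.append([int(all(p[v] for v in variables)) for p in points])
    return rows


def min_distance(G):
    """Brute-force minimum weight of a nonzero codeword (only feasible for tiny codes)."""
    k, n = len(G), len(G[0])
    best = n
    for msg in product([0, 1], repeat=k):
        if any(msg):
            word = [sum(msg[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            best = min(best, sum(word))
    return best


for r, m in [(1, 3), (2, 4), (1, 4)]:
    G = rm_generator(r, m)
    print(f"RM({r},{m}): n={len(G[0])}, k={len(G)}, d={min_distance(G)}")
# RM(1,3): n=8, k=4, d=4
# RM(2,4): n=16, k=11, d=4
# RM(1,4): n=16, k=5, d=8
```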
[7] arXiv:2405.09951 [pdf, ps, html, other]: Title: A Review of Multiple Access Techniques for Intelligent Reflecting Surface-Assisted Systems

Wei Jiang, Hans Schotten

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024)

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Intelligent Reflecting Surface (IRS) is envisioned as a technical enabler for the sixth-generation (6G) wireless system. Its potential lies in delivering high performance while maintaining both power efficiency and cost-effectiveness. Previous studies have primarily focused on point-to-point IRS communications involving a single user. Nevertheless, a practical system must serve multiple users simultaneously. The unique characteristics of IRS, such as non-frequency-selective reflection and the necessity for joint active/passive beamforming, create obstacles to the use of conventional multiple access (MA) techniques. This motivates us to review various MA techniques and clarify how they function in the presence of IRS. Through this paper, we aim to provide researchers with a comprehensive understanding of the challenges and available solutions, offering insights to foster the design of efficient multiple access for IRS-aided systems.
[8] arXiv:2405.10007 [pdf, ps, html, other]: Title: Sampling Theorem and interpolation formula for non-vanishing signals

Nikolai Dokuchaev

Comments: arXiv admin note: substantial text overlap with arXiv:2405.05566

Subjects: Information Theory (cs.IT)

The paper establishes frequency predictability criteria and presents predictors for two-sided non-vanishing bounded continuous-time signals, i.e., for signals from $L_{\infty}(\mathbb{R})$ that do not necessarily vanish at $\pm\infty$. The notions of transfer functions, spectrum gaps, bandlimitedness, and high-pass filters are introduced for these signals. This allows us to obtain an analog of the Whittaker-Shannon-Kotelnikov sampling theorem and a new modification of the corresponding interpolation formula that makes it applicable to non-vanishing signals.
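For orientation, the sketch below implements the classical Whittaker-Shannon-Kotelnikov interpolation formula, x(t) = sum_k x(kT) sinc((t - kT)/T), truncated to a finite window. This is the version for conventional bandlimited signals that the paper modifies. It does not implement the paper's modification for non-vanishing signals. The test signal, sampling period, and truncation window are illustrative assumptions.

```python
import numpy as np


def wsk_interpolate(samples, ks, T, t):
    """Truncated classical series x(t) ~ sum_k x(kT) sinc((t - kT) / T).
    np.sinc is the normalized sinc, sin(pi u) / (pi u)."""
    return np.sum(samples * np.sinc((t - ks * T) / T))


def signal(t):
    """Test signal bandlimited to |f| <= 1/2 (a shifted normalized sinc)."""
    return np.sinc(t - 0.3)


T = 1.0                        # sampling period; Nyquist rate for |f| <= 1/(2T)
ks = np.arange(-2000, 2001)    # truncation window (an assumption)
samples = signal(ks * T)
t0 = 0.5
print(wsk_interpolate(samples, ks, T, t0), signal(t0))  # nearly equal
```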
[9] arXiv:2405.10124 [pdf, ps, html, other]: Title: Smoothing Linear Codes by Rényi Divergence and Applications to Security Reduction

Hao Yan, Cong Ling

Subjects: Information Theory (cs.IT)

The concept of the smoothing parameter plays a crucial role in both lattice-based and code-based cryptography, primarily due to its effectiveness in achieving nearly uniform distributions through the addition of noise. Recent research by Pathegama and Barg has determined the optimal smoothing bound for random codes under Rényi divergence for any order $\alpha \in (1, \infty)$ \cite{pathegama2024r}. Considering the inherent complexity of encoding/decoding algorithms for random codes, our research introduces additional structure into these coding schemes. Specifically, this paper presents a novel derivation of the smoothing bound for random linear codes, maintaining the same order of Rényi divergence and achieving optimality for any $\alpha\in (1,\infty)$. We extend this framework under KL divergence by transitioning from random linear codes to random self-dual codes, and subsequently to random quasi-cyclic codes, incorporating progressively more structure. As an application, we derive an average-case to average-case reduction from the Learning Parity with Noise (LPN) problem to the average-case decoding problem. This reduction aligns with the parameter regime in \cite{debris2022worst}, but uniquely employs Rényi divergence and directly considers Bernoulli noise, instead of combining ball noise and Bernoulli noise.
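To make the smoothing idea concrete, the sketch below computes, by brute force, the Rényi divergence of order alpha between two distributions. The first is the distribution of c + e, with c uniform over a small random binary linear code and e ~ Bernoulli(p)^n. The second is the uniform distribution on F_2^n. As p grows toward 1/2, the noisy codeword becomes closer to uniform. The code sizes, seed, and alpha = 2 are illustrative assumptions, and this enumeration is not the paper's bounding technique.

```python
import numpy as np
from itertools import product


def all_vectors(n):
    return np.array(list(product([0, 1], repeat=n)))


def random_linear_code(k, n, rng):
    """Codewords of the span of a random k x n binary generator matrix (duplicates removed)."""
    G = rng.integers(0, 2, size=(k, n))
    return np.unique(all_vectors(k) @ G % 2, axis=0)


def renyi_to_uniform(code, p, alpha):
    """D_alpha(P || U) in nats, where P is the law of c + e with c uniform on `code`,
    e ~ Bernoulli(p)^n, and U is uniform on F_2^n. Brute force over all 2^n vectors."""
    n = code.shape[1]
    vectors = all_vectors(n)
    P = np.zeros(len(vectors))
    for c in code:
        w = np.sum((vectors + c) % 2, axis=1)          # weight of e = v - c over F_2
        P += p ** w * (1 - p) ** (n - w)
    P /= len(code)
    # With U(v) = 2^-n: D_alpha = n ln 2 + ln(sum_v P(v)^alpha) / (alpha - 1)
    return n * np.log(2) + np.log(np.sum(P ** alpha)) / (alpha - 1)


rng = np.random.default_rng(0)
code = random_linear_code(4, 10, rng)
for p in (0.05, 0.15, 0.3, 0.45):
    print(p, renyi_to_uniform(code, p, alpha=2.0))  # decreases as p approaches 1/2
```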

[10] arXiv:2405.07393 (cross-list from cs.LG) [pdf, ps, html, other]: Title: Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds

Meiyu Zhong, Ravi Tandon

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier) as a function of the fairness budget. In addition, our bounds depend on the underlying statistics of the data, labels, and sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to low bias could be fundamentally limited by the statistical disparity across the groups.
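Equalized odds requires equal true-positive and false-positive rates across groups. One common way to quantify a violation, which we assume here only for illustration, is the pair of absolute TPR and FPR gaps between two groups. The paper's exact definition of the fairness budget may differ. The toy labels below are hypothetical.

```python
import numpy as np


def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute TPR and FPR differences between groups 0 and 1 (binary labels/predictions).
    Equalized odds holds exactly when both gaps are zero."""
    rates = {}
    for g in (0, 1):
        in_group = group == g
        tpr = np.mean(y_pred[in_group & (y_true == 1)])
        fpr = np.mean(y_pred[in_group & (y_true == 0)])
        rates[g] = (tpr, fpr)
    return abs(rates[0][0] - rates[1][0]), abs(rates[0][1] - rates[1][1])


y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(equalized_odds_gaps(y_true, y_pred, group))  # both gaps are 0.5 for this toy data
```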
[11] arXiv:2405.09554 (cross-list from eess.SP) [pdf, ps, other]: Title: Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

In this letter, we investigate a new off-grid sparse Bayesian learning approach based on the generalized double Pareto prior (GDPOGSBL) to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of the source signals by utilizing the generalized double Pareto (GDP) prior. First, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Second, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the sources and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.
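The first-order Taylor (off-grid) model approximates a steering vector at an off-grid angle as a(theta) ≈ a(theta_g) + b(theta_g)(theta - theta_g), where b = da/dtheta. The sketch below checks numerically that this approximation leaves a much smaller residual than snapping to the nearest grid point. The sketch assumes a half-wavelength uniform linear array with 8 elements, a 1-degree grid, and a 10.4-degree source. The array geometry used by the authors is not stated here.

```python
import numpy as np

M = 8  # hypothetical number of half-wavelength-spaced ULA elements


def steering(theta):
    """ULA steering vector a(theta)_m = exp(j*pi*m*sin(theta)), theta in radians."""
    return np.exp(1j * np.pi * np.arange(M) * np.sin(theta))


def steering_derivative(theta):
    """Derivative of a(theta) with respect to theta."""
    m = np.arange(M)
    return 1j * np.pi * m * np.cos(theta) * steering(theta)


theta_grid = np.radians(10.0)   # nearest point of a hypothetical 1-degree grid
theta_true = np.radians(10.4)   # off-grid source direction
on_grid = steering(theta_grid)
taylor = on_grid + steering_derivative(theta_grid) * (theta_true - theta_grid)
err_grid = np.linalg.norm(steering(theta_true) - on_grid)
err_taylor = np.linalg.norm(steering(theta_true) - taylor)
print(err_grid, err_taylor)     # the first-order model leaves a much smaller residual
```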
[12] arXiv:2405.09556 (cross-list from eess.SP) [pdf, ps, html, other]: Title: Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver

Jiatong Bai, Feng Shu, Qinghe Zheng, Bo Xu, Baihua Shi, Yiwen Chen, Weibin Zhang, Xianpeng Wang

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

Due to their excellent performance in rate and resolution, fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna arrays have been widely applied in data transmission, direction of arrival (DOA) measurement, etc. However, they face two main challenges: high computational complexity and high circuit cost. These two problems may be addressed well by the hybrid analog-digital (HAD) structure. However, HAD suffers from phase ambiguity, which leads to low efficiency or high latency. Is there a MIMO structure that offers low cost, low complexity, and high time efficiency at the same time? To satisfy these three properties, a novel heterogeneous hybrid MIMO receiver structure integrating FD and heterogeneous HAD ($\rm{H}^2$AD-FD) is proposed, and a corresponding multi-modal (MD) learning framework is developed. The framework includes three major stages: 1) generate the candidate sets via root multiple signal classification (Root-MUSIC) or deep learning (DL); 2) infer the class of true solutions from the candidate sets using machine learning (ML) methods; 3) fuse the two-part true solutions to achieve a better DOA estimate. The above process forms two methods named MD-RootMUSIC and MDDL. To improve DOA estimation accuracy and reduce the clustering complexity, a co-learning-aided MD framework is proposed to form two enhanced methods named CoMDDL and CoMD-RootMUSIC. Moreover, the Cramer-Rao lower bound (CRLB) for the proposed $\rm{H}^2$AD-FD structure is also derived. Experimental results demonstrate that our four proposed methods could approach the CRLB for signal-to-noise ratio (SNR) > 0 dB and that the proposed CoMDDL and MDDL perform better than CoMD-RootMUSIC and MD-RootMUSIC, particularly in the extremely low SNR region.
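Stage 1 of the framework can use Root-MUSIC to generate candidate directions. The sketch below is a conventional Root-MUSIC estimator for a fully digital, half-wavelength uniform linear array. It does not model the $\rm{H}^2$AD-FD structure, HAD phase ambiguity, or the learning stages. The noise-subspace polynomial is formed from the diagonal sums of En En^H, and the roots inside the unit circle that lie closest to it give the estimates. The array size, source angles, SNR, and snapshot count are illustrative assumptions.

```python
import numpy as np


def root_music(X, num_sources):
    """Root-MUSIC DOA estimates in degrees for a half-wavelength ULA; X is (M, snapshots)."""
    M, N = X.shape
    R = X @ X.conj().T / N                       # sample covariance
    _, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    En = eigvecs[:, : M - num_sources]           # noise subspace
    C = En @ En.conj().T
    # On |z| = 1, a(z)^H C a(z) = sum_m z^m * trace(C, offset=m); list highest power first
    coeffs = [np.trace(C, offset=m) for m in range(M - 1, -M, -1)]
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1]             # keep one root of each reciprocal pair
    closest = roots[np.argsort(np.abs(np.abs(roots) - 1))[:num_sources]]
    return np.sort(np.degrees(np.arcsin(np.angle(closest) / np.pi)))


rng = np.random.default_rng(1)
M, N = 8, 200
angles = np.radians([-20.0, 30.0])
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(angles)))
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
est = root_music(A @ S + noise, num_sources=2)   # 20 dB per-element SNR
print(est)                                       # close to [-20, 30]
```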
[13] arXiv:2405.09909 (cross-list from cs.LG) [pdf, ps, html, other]: Title: A Machine Learning Approach for Simultaneous Demapping of QAM and APSK Constellations

Arwin Gansekoele, Alexios Balatsoukas-Stimming, Tom Brusse, Mark Hoogendoorn, Sandjai Bhulai, Rob van der Mei

Comments: To appear in the ICMLCN 2024 proceedings

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

As telecommunication systems evolve to meet increasing demands, integrating deep neural networks (DNNs) has shown promise in enhancing performance. However, the trade-off between accuracy and flexibility remains challenging when replacing traditional receivers with DNNs. This paper introduces a novel probabilistic framework that allows a single DNN demapper to demap multiple QAM and APSK constellations simultaneously. We also demonstrate that our framework allows exploiting hierarchical relationships in families of constellations. As a consequence, fewer neural network outputs are needed to encode the same function without an increase in Bit Error Rate (BER). Our simulation results confirm that our approach comes close to the optimal demodulation error bound under an Additive White Gaussian Noise (AWGN) channel for multiple constellations. Thereby, we address multiple important issues in making DNNs flexible enough for practical use as receivers.
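For comparison with a learned demapper, the sketch below shows the classical exact per-bit log-likelihood ratio demapper for an AWGN channel with equiprobable symbols. It computes ln of the sum of exp(-|y-s|^2/N0) over symbols with bit b = 0, minus the same sum over symbols with bit b = 1. The Gray-labeled 16-QAM mapping is a hypothetical choice. Whether this matches the exact error bound used by the authors is not stated in the abstract.

```python
import numpy as np

# Hypothetical Gray labeling of 4-PAM per axis: bits -> amplitude level
PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def qam16():
    """Gray-mapped 16-QAM: the first two bits pick the I level, the last two the Q level."""
    points, labels = [], []
    for bits_i, level_i in PAM4.items():
        for bits_q, level_q in PAM4.items():
            points.append(level_i + 1j * level_q)
            labels.append(bits_i + bits_q)
    return np.array(points), np.array(labels)


def exact_llrs(y, points, labels, noise_var):
    """Per-bit LLR ln P(b=0|y) - ln P(b=1|y) for equiprobable symbols in complex AWGN."""
    metric = -np.abs(y - points) ** 2 / noise_var
    return np.array([
        np.logaddexp.reduce(metric[labels[:, b] == 0])
        - np.logaddexp.reduce(metric[labels[:, b] == 1])
        for b in range(labels.shape[1])
    ])


points, labels = qam16()
y = points[5] + (0.3 - 0.2j)                 # a noisy observation near symbol 5
llr = exact_llrs(y, points, labels, noise_var=0.5)
print(labels[5], (llr < 0).astype(int))      # hard decisions from the LLR signs
```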
[14] arXiv:2405.09915 (cross-list from eess.SP) [pdf, ps, html, other]: Title: Sparse Regression Codes for Non-Coherent SIMO channels

Sai Dinesh Kancharana, Madhusudan Kumar Sinha, Arun Pachai Kannu

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

We study sparse regression codes (SPARCs) over flat-fading channels with multiple receive antennas. We consider a practical scenario where the channel state information is available at neither the transmitter nor the receiver. In this setting, we study the maximum likelihood (ML) detector for SPARCs, which has a prohibitively high search complexity. We propose a novel practical decoder, named maximum likelihood matching pursuit (MLMP), which incorporates a greedy search mechanism along with the ML metric. We also introduce a parallel search mechanism for MLMP. Compared with existing block-orthogonal matching pursuit based decoders, we show that MLMP achieves significant gains in block error rate (BLER) performance. We also show that the proposed approach has significant gains over polar codes employing pilot-aided channel estimation.
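To show why exhaustive ML is prohibitive, the sketch below enumerates all M**L SPARC messages for a non-coherent SIMO model Y = x h^T + W. It scores each candidate with the Gaussian log-likelihood under the assumption h ~ CN(0, I). This is the brute-force detector, not the authors' MLMP decoder. The dictionary, section sizes, unit nonzero values, and noise level are hypothetical.

```python
import numpy as np
from itertools import product


def sparc_encode(A, msg, M):
    """Hypothetical SPARC encoder: section l contributes column l*M + msg[l] with value 1."""
    beta = np.zeros(A.shape[1])
    for section, index in enumerate(msg):
        beta[section * M + index] = 1.0
    return A @ beta


def noncoherent_ml(Y, A, L, M, noise_var):
    """Exhaustive ML over all M**L messages for Y = x h^T + W, assuming h ~ CN(0, I_Nr) and
    white noise of variance noise_var. Log-likelihood, up to terms independent of x:
        -Nr*ln(s2 + ||x||^2) + sum_r |x^H y_r|^2 / (s2 * (s2 + ||x||^2))."""
    Nr = Y.shape[1]
    best_score, best_msg = -np.inf, None
    for msg in product(range(M), repeat=L):   # M**L candidates: the cost that explodes
        x = sparc_encode(A, msg, M)
        energy = np.vdot(x, x).real
        corr = np.sum(np.abs(x.conj() @ Y) ** 2)
        score = -Nr * np.log(noise_var + energy) + corr / (noise_var * (noise_var + energy))
        if score > best_score:
            best_score, best_msg = score, msg
    return best_msg


rng = np.random.default_rng(3)
n, L, M, Nr, noise_var = 32, 2, 4, 4, 1e-3
A = rng.standard_normal((n, L * M)) / np.sqrt(n)     # hypothetical Gaussian dictionary
msg = (2, 1)
x = sparc_encode(A, msg, M)
h = (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr)) / np.sqrt(2)  # unknown fading
W = np.sqrt(noise_var / 2) * (rng.standard_normal((n, Nr)) + 1j * rng.standard_normal((n, Nr)))
Y = np.outer(x, h) + W
print(noncoherent_ml(Y, A, L, M, noise_var))
```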

[15] arXiv:2311.08386 (replaced) [pdf, ps, html, other]: Title: Capacity of Summation over a Symmetric Quantum Erasure MAC with Partially Replicated Inputs

Yuhang Yao, Syed A. Jafar

Subjects: Information Theory (cs.IT)

The optimal quantum communication cost of computing a classical sum of distributed sources is studied over a quantum erasure multiple access channel (QEMAC). $K$ classical messages consisting of finite-field symbols are distributed across $S$ servers, which also share quantum entanglement in advance. Each server $s\in[S]$ manipulates its quantum subsystem $\mathcal{Q}_s$ according to its own available classical messages and sends $\mathcal{Q}_s$ to the receiver, who then computes the sum of the messages based on a joint quantum measurement. The download cost from Server $s\in [S]$ is the logarithm of the dimension of $\mathcal{Q}_s$. The rate $R$ is defined as the number of instances of the sum computed at the receiver, divided by the total download cost from all the servers. The main focus is on the symmetric setting with $K=\binom{S}{\alpha}$ messages, where each message is replicated among a unique subset of $\alpha$ servers, and the answers from any $\beta$ servers may be erased. If no entanglement is initially available to the receiver, then we show that the capacity (maximal rate) is precisely $C= \max\left\{ \min \left\{ \frac{2(\alpha-\beta)}{S}, \frac{S-2\beta}{S} \right\}, \frac{\alpha-\beta}{S} \right\}$. The capacity with arbitrary levels of prior entanglement $(\Delta_0)$ between the $S$ data-servers and the receiver is also characterized by including an auxiliary server (Server $0$) that has no classical data, so that the communication cost from Server $0$ is a proxy for the amount of receiver-side entanglement that is available in advance. The challenge on the converse side resides in the optimal application of the weak monotonicity property, while the achievability combines ideas from classical network coding and treating qudits as classical dits, as well as new constructions based on the $N$-sum box abstraction that rely on absolutely maximally entangled quantum states.
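The capacity expression above can be evaluated exactly with rational arithmetic. The sketch below does so together with the message count $K=\binom{S}{\alpha}$. The parameter range check 0 <= beta < alpha <= S is our assumption and is not taken from the paper. The (5, 4, 2) example shows a case where the outer max picks the (alpha - beta)/S branch.

```python
from fractions import Fraction
from math import comb


def qemac_sum_capacity(S, alpha, beta):
    """C = max{ min{2(alpha-beta)/S, (S-2*beta)/S}, (alpha-beta)/S } as stated in the abstract
    (no receiver-side entanglement). The range check below is an assumption."""
    if not 0 <= beta < alpha <= S:
        raise ValueError("assumed parameter range: 0 <= beta < alpha <= S")
    return max(min(Fraction(2 * (alpha - beta), S), Fraction(S - 2 * beta, S)),
               Fraction(alpha - beta, S))


for S, alpha, beta in [(4, 2, 1), (5, 4, 2), (3, 1, 0)]:
    C = qemac_sum_capacity(S, alpha, beta)
    print(f"S={S}, alpha={alpha}, beta={beta}: K={comb(S, alpha)}, C={C}")
# S=4: K=6, C=1/2;  S=5: K=5, C=2/5;  S=3: K=3, C=2/3
```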
[16] arXiv:2404.17946 (replaced) [pdf, ps, html, other]: Title: Geometric Characteristic in Phaseless Operator and Structured Matrix Recovery

Gao Huang, Song Li

Subjects: Information Theory (cs.IT)

In this paper, we first propose a simple and unified approach to the stability of phaseless operators under both amplitude and intensity measurements, in both the complex and real cases, on arbitrary geometric sets, thus characterizing the robust performance of phase retrieval via the empirical minimization method. The unified analysis involves the random embedding of the concave lifting operator on the tangent space. Similarly, we investigate the structured matrix recovery problem through the robust injectivity of linear rank-one measurement operators on arbitrary matrix sets. The core of our analysis lies in bounding the empirical chaos process. We introduce Talagrand's $\gamma_{\alpha}$ functionals to characterize the relationship between the required number of measurements and the geometric constraints. Additionally, adversarial noise is generated to illustrate that the recovery bounds are sharp in the above situations.
[17] arXiv:2404.18705 (replaced) [pdf, ps, html, other]: Title: Wireless Information and Energy Transfer in the Era of 6G Communications

Constantinos Psomas, Konstantinos Ntougias, Nikita Shanin, Dongfang Xu, Kenneth MacSporran Mayer, Nguyen Minh Tran, Laura Cottatellucci, Kae Won Choi, Dong In Kim, Robert Schober, Ioannis Krikidis

Comments: Proceedings of the IEEE, 36 pages, 33 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Wireless information and energy transfer (WIET) represents an emerging paradigm that employs controllable transmission of radio-frequency signals for the dual purpose of data communication and wireless charging. As such, WIET is widely regarded as an enabler of envisioned 6G use cases that rely on energy-sustainable Internet-of-Things (IoT) networks, such as smart cities and smart grids. Meeting the quality-of-service demands of WIET, in terms of both data transfer and power delivery, requires effective co-design of the information and energy signals. In this article, we present the main principles and design aspects of WIET, focusing on its integration in 6G networks. First, we discuss how conventional communication notions such as resource allocation and waveform design need to be revisited in the context of WIET. Next, we consider various candidate 6G technologies that can boost WIET efficiency, namely, holographic multiple-input multiple-output, near-field beamforming, terahertz communication, intelligent reflecting surfaces (IRSs), and reconfigurable (fluid) antenna arrays. We introduce respective WIET design methods, analyze the promising performance gains of these WIET systems, and discuss challenges, open issues, and future research directions. Finally, a near-field energy beamforming scheme and a power-based IRS beamforming algorithm are experimentally validated using a wireless energy transfer testbed. The vision of WIET in communication systems has been gaining momentum in recent years, with steady progress on both theoretical and practical aspects. The comprehensive overview of the state of the art of WIET presented in this paper highlights the potential of WIET systems as well as their overall benefits in 6G networks.
[18] arXiv:2405.00391 (replaced) [pdf, ps, html, other]: Title: Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Beamforming with large holographic antenna arrays is one of the key enablers for next-generation wireless systems, as it can significantly improve spectral efficiency. However, the deployment of large antenna arrays implies high algorithmic complexity and resource overhead at both the receiver and transmitter ends. To address this issue, advanced technologies such as artificial intelligence have been developed to reduce beamforming overhead. Intuitively, if we can implement near-optimal beamforming using only a tiny subset of the full channel information, the overhead for channel estimation and beamforming would be reduced significantly compared with traditional beamforming methods, which usually need full channel information and the inversion of large-dimensional matrices. In light of this idea, we propose a novel scheme that utilizes a Wasserstein generative adversarial network with gradient penalty to infer the full beamforming matrices based on very little channel information. Simulation results confirm that it can achieve performance comparable to the weighted minimum mean-square error algorithm, while reducing the overhead by over 50%.
[19] arXiv:2310.15092 (replaced) [pdf, ps, html, other]: Title: Dihedral Quantum Codes

Nadja Willenborg, Martino Borello, Anna-Lena Horlemann, Habibul Islam

Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

We establish dihedral quantum codes of short block length, a class of CSS codes obtained by the lifted product construction. We present the code construction and give a formula for the code dimension, depending on the two classical codes on which the CSS code is based. We also give a lower bound on the code distance and construct examples of short dihedral quantum codes.
[20] arXiv:2403.17868 (replaced) [pdf, ps, html, other]: Title: An invitation to the sample complexity of quantum hypothesis testing

Hao-Chung Cheng, Nilanjana Datta, Nana Liu, Theshani Nuradha, Robert Salzmann, Mark M. Wilde

Comments: v3: 58 pages, 1 figure, correction to Corollary 10; see independent and concurrent work of Pensia, Jog, Loh at arXiv:2403.16981

Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)

Quantum hypothesis testing (QHT) has been traditionally studied from the information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of samples of an unknown state. In this paper, we study the sample complexity of QHT, wherein the goal is to determine the minimum number of samples needed to reach a desired error probability. By making use of the wealth of knowledge that already exists in the literature on QHT, we characterize the sample complexity of binary QHT in the symmetric and asymmetric settings, and we provide bounds on the sample complexity of multiple QHT. In more detail, we prove that the sample complexity of symmetric binary QHT depends logarithmically on the inverse error probability and inversely on the negative logarithm of the fidelity. As a counterpart of the quantum Stein's lemma, we also find that the sample complexity of asymmetric binary QHT depends logarithmically on the inverse type II error probability and inversely on the quantum relative entropy, provided that the type II error probability is sufficiently small. We then provide lower and upper bounds on the sample complexity of multiple QHT; improving these bounds remains an intriguing open question. The final part of our paper outlines and reviews how the sample complexity of QHT is relevant to a broad swathe of research areas and can enhance understanding of many fundamental concepts, including quantum algorithms for simulation and search, quantum learning and classification, and foundations of quantum mechanics. As such, we view our paper as an invitation to researchers coming from different communities to study and contribute to the problem of sample complexity of QHT, and we outline a number of open directions for future research.
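The symmetric binary case can be checked on a simple instance: two equiprobable pure states with n copies each. For this case the exact (Helstrom) minimum error is (1 - sqrt(1 - |<psi|phi>|^(2n)))/2. The sketch below searches for the smallest n that reaches a target error and compares it with log(1/eps)/(-ln F). Here F is taken as the squared overlap, which is one fidelity convention and our assumption. The abstract states only the scaling, so no constant is implied.

```python
import math


def helstrom_error(overlap_sq, n):
    """Minimum error probability for equiprobable pure states psi^{(x)n} vs phi^{(x)n},
    where overlap_sq = |<psi|phi>|^2 (so the n-copy overlap is overlap_sq ** n)."""
    return 0.5 * (1.0 - math.sqrt(1.0 - overlap_sq ** n))


def sample_complexity(overlap_sq, eps):
    """Smallest n with helstrom_error(overlap_sq, n) <= eps; assumes overlap_sq < 1."""
    n = 1
    while helstrom_error(overlap_sq, n) > eps:
        n += 1
    return n


F, eps = 0.9, 1e-3     # F taken as the squared overlap (one fidelity convention)
print(sample_complexity(F, eps), math.log(1 / eps) / -math.log(F))  # 53 and about 65.6
```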
[21] arXiv:2404.07009 (replaced) [pdf, ps, html, other]: Title: A Mathematical Theory for Learning Semantic Languages by Abstract Learners

Kuo-Yu Liao, Cheng-Shang Chang, Y.-W. Peter Hong

Comments: V1 was submitted to ISIT 2024 on Jan. 28, 2024. V2 was uploaded to arXiv on April 13, 2024. V3 was uploaded to arXiv on May 16, 2024

Subjects: Computation and Language (cs.CL); Information Theory (cs.IT); Machine Learning (cs.LG)

Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to this ratio. Upon completion of the training, the association of learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis can also be extended to the setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model. It is also applicable to the setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
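The threshold behavior that density evolution predicts can be seen in the textbook IRSA recursion. This sketch assumes every user sends exactly two replicas (Lambda(x) = x^2) and Poisson slot degrees with load G users per slot. The updates are p = 1 - exp(-q G Lambda'(1)) and q = lambda(p). Resolution succeeds asymptotically only for G below 1/2, that is, when slots outnumber users by more than a factor of two. Reading skills as users and texts as slots is our hypothetical illustration of the abstract's ratio threshold, not necessarily the authors' mapping.

```python
import math


def irsa_density_evolution(load, degree=2, iters=2000):
    """Asymptotic density evolution for IRSA where every user transmits `degree` replicas
    (node-perspective Lambda(x) = x**degree, edge-perspective lambda(x) = x**(degree - 1)).
    q: probability a user-to-slot edge is still unresolved; load: users per slot.
        p = 1 - exp(-q * load * degree)   (slot side, Poisson slot degrees)
        q = p ** (degree - 1)             (user side)"""
    q = 1.0
    for _ in range(iters):
        p = 1.0 - math.exp(-q * load * degree)
        q = p ** (degree - 1)
    return q


for load in (0.45, 0.55):
    print(load, irsa_density_evolution(load))  # near 0 below the threshold, about 0.18 above
```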

New submissions for Friday, 17 May 2024 (showing 9 of 9 entries)

Cross submissions for Friday, 17 May 2024 (showing 5 of 5 entries)

Replacement submissions for Friday, 17 May 2024 (showing 7 of 7 entries)