Social and Information Networks
- [1] arXiv:2405.17571
Title: Bluesky: Network Topology, Polarisation, and Algorithmic Curation
Subjects: Social and Information Networks (cs.SI)
Bluesky is a nascent "Twitter-like" and decentralized social media network with novel features and unprecedented data access. This paper provides a characterization of the network, studying the political leaning, polarization, network structure, and algorithmic curation mechanisms of five million users. The dataset spans from the website's first release in February of 2023. Users of the new social media site are predominantly left-center leaning and share little to no links associated with questionable sources. In contrast to the homogeneous political stance, we find significant issues-based divergence by studying opinions related to the Israel-Palestine conflict. Two clear homophilic clusters emerge: Pro-Palestinian voices make up the plurality of messages related to the conflict and the proportion has increased with a lessening of interest. We investigate multiple layers of the multi-scale Bluesky network based on replies, likes, reposts, and follows, highlighting differences and similarities between the layers. We differentiate between persistent and non-persistent interactions and measure metrics of network topology over time. All networks are heavy-tailed, clustered, and connected by short paths. We showcase all feeds - algorithmic content recommenders - created for and by users. A large number of custom feeds have been created but their uptake by users is limited. Multiple popular feeds aim to provide similar feeds that are neither topical nor chronological. We conclude by claiming that Bluesky - for all its novel features - is very similar in terms of its network structure to existing and larger social media sites and provides unprecedented research opportunities for social scientists, network scientists, and political scientists alike.
- [2] arXiv:2405.17710
Title: Does Geo-co-location Matter? A Case Study of Public Health Conversations during COVID-19
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
Social media platforms like Twitter (now X) have been pivotal in information dissemination and public engagement, especially during COVID-19. A key goal for public health experts was to encourage prosocial behavior that could impact local outcomes such as masking and social distancing. Given the importance of local news and guidance during COVID-19, the objective of our research is to analyze the effect of localized engagement on social media conversations. This study examines the impact of geographic co-location, as a proxy for localized engagement between public health experts (PHEs) and the public, on social media. We analyze a Twitter conversation dataset from January 2020 to November 2021, comprising over 19 K tweets from nearly five hundred PHEs, along with approximately 800 K replies from 350 K participants. Our findings reveal that geo-co-location is associated with higher engagement rates, especially in conversations on topics including masking, lockdowns, and education, and in conversations with academic and medical professionals. Lexical features associated with emotion and personal experiences were more common in geo-co-located contexts. This research provides insights into how geographic co-location influences social media engagement and can inform strategies to improve public health messaging.
- [3] arXiv:2405.18059
Title: Rank-Refining Seed Selection Methods for Budget Constrained Influence Maximisation in Multilayer Networks under Linear Threshold Model
Comments: Submitted to Network Science (this https URL)
Subjects: Social and Information Networks (cs.SI)
The problem of selecting an optimal seed set to maximise influence in networks has been a subject of intense research in recent years. However, despite numerous works addressing this area, it remains a topic that requires further elaboration. Most often, it is considered within the scope of classically defined graphs with a spreading model in the form of Independent Cascades. In this work, we focus on the problem of budget-constrained influence maximisation in multilayer networks using a Linear Threshold Model. Both the graph model and the spreading process we employ are less prevalent in the literature, even though their application allows for a more precise representation of the opinion dynamics in social networks. This paper aims to answer which of the sixteen evaluated seed selection methods is the most effective and how similar they are. Additionally, we focus our analysis on the impact of spreading model parameters, network characteristics, a budget, and the seed selection methods on the diffusion effectiveness in multilayer networks. Our contribution also includes extending several centrality measures and heuristics to the case of such graphs. The results indicate that all the factors mentioned above collectively contribute to the effectiveness of influence maximisation. Moreover, there is no seed selection method which always provides the best results. However, the seeds chosen with VoteRank-based methods (especially with the $v-rnk-m$ variant we propose) usually provide the most extensive diffusion.
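For readers unfamiliar with the spreading process named in this abstract, the following is a minimal sketch of a Linear Threshold spread. It is not the authors' method: it assumes a single-layer undirected graph, equal weight for every neighbour, and one shared threshold, whereas the paper works with multilayer networks. The names `linear_threshold` and `demo_graph` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected graph stored as adjacency sets (an assumption of this sketch).
Graph = Dict[Hashable, Set[Hashable]]


def linear_threshold(graph: Graph, seeds: Iterable[Hashable],
                     threshold: float) -> Set[Hashable]:
    """Return the nodes that are active once the spread has stopped.

    A node activates when the share of its neighbours that are already
    active reaches `threshold`. Activation is permanent, so the final
    set does not depend on the order in which nodes are visited.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbours in graph.items():
            if node in active or not neighbours:
                continue
            share = len(neighbours & active) / len(neighbours)
            if share >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical toy example: a path a - b - c - d.
demo_graph: Graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
```

With the seed set `{"a"}`, a threshold of 0.5 lets the activation travel down the whole path, while a threshold of 0.6 stops it at the seed, which illustrates how sensitive the diffusion is to the model parameters.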
- [4] arXiv:2405.18085
Title: Network Diffusion -- Framework to Simulate Spreading Processes in Complex Networks
Authors: Michał Czuba, Mateusz Nurek, Damian Serwata, Yu-Xuan Qiu, Mingshan Jia, Katarzyna Musial, Radosław Michalski, Piotr Bródka
Comments: To be published in: Big Data Mining and Analytics (this https URL)
Subjects: Social and Information Networks (cs.SI); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
With the advancement of computational network science, its research scope has significantly expanded beyond static graphs to encompass more complex structures. The introduction of streaming, temporal, multilayer, and hypernetwork approaches has brought new possibilities and imposed additional requirements. For instance, by utilising these advancements, one can model structures such as social networks in a much more refined manner, which is particularly relevant in simulations of the spreading processes. Unfortunately, the pace of advancement is often too rapid for existing computational packages to keep up with the functionality updates. This results in a significant proliferation of tools used by researchers and, consequently, a lack of a universally accepted technological stack that would standardise experimental methods (as seen, e.g. in machine learning). This article addresses that issue by presenting an extended version of the Network Diffusion library. First, we survey the existing approaches and toolkits for simulating spreading phenomena, and then give an overview of the framework's functionalities. Finally, we report four case studies conducted with the package to demonstrate its usefulness: the impact of sanitary measures on the spread of COVID-19, the comparison of information diffusion on two temporal network models, and the effectiveness of seed selection methods in the task of influence maximisation in multilayer networks. We conclude the paper with a critical assessment of the library and an outline of the challenges that remain in standardising research environments in computational network science.
New submissions for Wednesday, 29 May 2024 (showing 4 of 4 entries)
- [5] arXiv:2405.17473 (cross-list from cs.LG)
Title: Repeat-Aware Neighbor Sampling for Dynamic Graph Learning
Comments: Accepted by KDD 2024, Research Track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Dynamic graph learning equips the edges with time attributes and allows multiple links between two nodes, which is a crucial technology for understanding evolving data scenarios like traffic prediction and recommendation systems. Existing works obtain the evolving patterns mainly depending on the most recent neighbor sequences. However, we argue that whether two nodes will have interaction with each other in the future is highly correlated with the same interaction that happened in the past. Only considering the recent neighbors overlooks the phenomenon of repeat behavior and fails to accurately capture the temporal evolution of interactions. To fill this gap, this paper presents RepeatMixer, which considers evolving patterns of first and high-order repeat behavior in the neighbor sampling strategy and temporal information learning. Firstly, we define the first-order repeat-aware nodes of the source node as the destination nodes that have interacted historically and extend this concept to high orders as nodes in the destination node's high-order neighbors. Then, we extract neighbors of the source node that interacted before the appearance of repeat-aware nodes with a slide window strategy as its neighbor sequence. Next, we leverage both the first and high-order neighbor sequences of source and destination nodes to learn temporal patterns of interactions via an MLP-based encoder. Furthermore, considering the varying temporal patterns on different orders, we introduce a time-aware aggregation mechanism that adaptively aggregates the temporal representations from different orders based on the significance of their interaction time sequences. Experimental results demonstrate the superiority of RepeatMixer over state-of-the-art models in link prediction tasks, underscoring the effectiveness of the proposed repeat-aware neighbor sampling strategy.
- [6] arXiv:2405.17768 (cross-list from cs.LG)
Title: Revisiting the Message Passing in Heterophilous Graph Neural Networks
Authors: Zhuonan Zheng, Yuanchen Bei, Sheng Zhou, Yao Ma, Ming Gu, HongJia XU, Chengyu Lai, Jiawei Chen, Jiajun Bu
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Graph Neural Networks (GNNs) have demonstrated strong performance in graph mining tasks due to their message-passing mechanism, which is aligned with the homophily assumption that adjacent nodes exhibit similar behaviors. However, in many real-world graphs, connected nodes may display contrasting behaviors, termed heterophilous patterns, which has attracted increased interest in heterophilous GNNs (HTGNNs). Although the message-passing mechanism seems unsuitable for heterophilous graphs due to the propagation of class-irrelevant information, it is still widely used in many existing HTGNNs and consistently achieves notable success. This raises the question: why does message passing remain effective on heterophilous graphs? To answer this question, in this paper, we revisit the message-passing mechanisms in heterophilous graph neural networks and reformulate them into a unified heterophilous message-passing (HTMP) mechanism. Based on HTMP and empirical analysis, we reveal that the success of message passing in existing HTGNNs is attributed to implicitly enhancing the compatibility matrix among classes. Moreover, we argue that the full potential of the compatibility matrix is not completely achieved due to the existence of incomplete and noisy semantic neighborhoods in real-world heterophilous graphs. To bridge this gap, we introduce a new approach named CMGNN, which operates within the HTMP mechanism to explicitly leverage and improve the compatibility matrix. A thorough evaluation involving 10 benchmark datasets and comparative analysis against 13 well-established baselines highlights the superior performance of the HTMP mechanism and CMGNN method.
- [7] arXiv:2405.18255 (cross-list from cs.CR)
Title: Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder
Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI); Signal Processing (eess.SP)
A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder with the capability of data compression and feature extraction. Taking Ghost Peak attack as an example, this paper demonstrates the effectiveness, feasibility and generalizability of the proposed attack detection scheme through simulation and experimental validation. The proposed scheme achieves an attack detection success rate of over 99% and can be implemented in current systems at low cost.
- [8] arXiv:2405.18414 (cross-list from cs.CL)
Title: Don't Forget to Connect! Improving RAG with Graph-based Reranking
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connections between documents? In this work, we seek to answer these two core questions about RAG generation. We introduce G-RAG, a reranker based on graph neural networks (GNNs) between the retriever and reader in RAG. Our method combines both connections between documents and semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG. G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint. Additionally, we assess the performance of PaLM 2 as a reranker and find it to significantly underperform G-RAG. This result emphasizes the importance of reranking for RAG even when using Large Language Models.
Cross submissions for Wednesday, 29 May 2024 (showing 4 of 4 entries)
- [9] arXiv:2401.06872 (replaced)
Title: Disease Transmission on Random Graphs Using Edge-Based Percolation
Subjects: Social and Information Networks (cs.SI); Dynamical Systems (math.DS); Populations and Evolution (q-bio.PE)
Edge-based percolation methods can be used to analyze disease transmission on complex social networks. This allows us to include complex social heterogeneity in our models while maintaining tractability. Here we review the seminal works in this field by Newman et al. (2001), Newman (2002, 2003), and Miller et al. (2012). We present a systematic discussion of the theoretical background behind these models, including an extensive derivation of the major results. We also relate these results back to the classical literature in random graph theory (Molloy and Reed, 1995, 1998). Finally, we present an accompanying R package that takes epidemic and network parameters as input and generates estimates of the epidemic trajectory and final size. This manuscript and the R package were developed to help researchers easily understand and use network models to investigate the interaction between different community structures and disease transmission.
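As a small illustration of the kind of final-size estimate this abstract mentions, the sketch below solves the standard bond-percolation self-consistency equation for a random graph with a Poisson degree distribution. It is written in Python rather than R, it is not the accompanying package, and the function name `poisson_final_size` and its parameters are hypothetical. For mean degree $z$ and transmissibility $T$, the probability $u$ that an edge does not lead into the outbreak satisfies $u = e^{zT(u-1)}$, and the expected final size of a large outbreak is $S = 1 - u$.

```python
import math


def poisson_final_size(mean_degree: float, transmissibility: float,
                       tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Expected fraction of nodes infected in a large outbreak.

    Assumes a configuration-model network with Poisson degrees, so both
    degree generating functions equal exp(z * (x - 1)).
    """
    rate = mean_degree * transmissibility
    u = 0.5  # start below 1 so the iteration can reach the non-trivial root
    for _ in range(max_iter):
        u_next = math.exp(rate * (u - 1.0))
        if abs(u_next - u) < tol:
            u = u_next
            break
        u = u_next
    return 1.0 - u
```

When `mean_degree * transmissibility` is below 1 the iteration settles at `u = 1` and the returned size is essentially zero; above 1 it converges to the root that gives a positive final size.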
- [10] arXiv:2401.11254 (replaced)
Title: The Great Ban: Efficacy and Unintended Consequences of a Massive Deplatforming Operation on Reddit
Journal-ref: DHOW - 16th ACM Web Science Conference (Websci Companion 24), 2024
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
In the current landscape of online abuses and harms, effective content moderation is necessary to cultivate safe and inclusive online spaces. Yet, the effectiveness of many moderation interventions is still unclear. Here, we assess the effectiveness of The Great Ban, a massive deplatforming operation that affected nearly 2,000 communities on Reddit. By analyzing 16M comments posted by 17K users during 14 months, we provide nuanced results on the effects, both desired and otherwise, of the ban. Among our main findings is that 15.6% of the affected users left Reddit and that those who remained reduced their toxicity by 6.6% on average. The ban also caused 5% of users to increase their toxicity by more than 70% of their pre-ban level. Overall, our multifaceted results provide new insights into the efficacy of deplatforming. As such, our findings can inform the development of future moderation interventions and the policing of online platforms.
- [11] arXiv:2405.16059 (replaced)
Title: Interpretable Transformer Hawkes Processes: Unveiling Complex Interactions in Social Networks
Subjects: Social and Information Networks (cs.SI)
Social networks represent complex ecosystems where the interactions between users or groups play a pivotal role in information dissemination, opinion formation, and social interactions. Effectively harnessing event sequence data within social networks to unearth interactions among users or groups has persistently posed a challenging frontier within the realm of point processes. Current deep point process models face inherent limitations within the context of social networks, constraining both their interpretability and expressive power. These models encounter challenges in capturing interactions among users or groups and often rely on parameterized extrapolation methods when modelling intensity over non-event intervals, limiting their capacity to capture intricate intensity patterns, particularly beyond observed events. To address these challenges, this study proposes modifications to Transformer Hawkes processes (THP), leading to the development of interpretable Transformer Hawkes processes (ITHP). ITHP inherits the strengths of THP while aligning with statistical nonlinear Hawkes processes, thereby enhancing its interpretability and providing valuable insights into interactions between users or groups. Additionally, ITHP enhances the flexibility of the intensity function over non-event intervals, making it better suited to capture complex event propagation patterns in social networks. Experimental results, both on synthetic and real data, demonstrate the effectiveness of ITHP in overcoming the identified limitations. Moreover, they highlight ITHP's applicability in the context of exploring the complex impact of users or groups within social networks.
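For context on the intensity function this abstract refers to, here is a minimal sketch of the conditional intensity of a classical univariate linear Hawkes process with an exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. This is the textbook statistical model, not THP or ITHP, and the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t given the past event times.

    mu    : baseline rate of spontaneous events
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    """
    excitation = sum(alpha * math.exp(-beta * (t - s))
                     for s in events if s < t)
    return mu + excitation
```

Between events the excitation term decays smoothly towards the baseline `mu`, which is the behaviour over non-event intervals that the abstract says parameterised extrapolation struggles to capture flexibly.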
- [12] arXiv:2301.10856 (replaced)
Title: Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram
Comments: Accepted to ICWSM 2024 (ICWSM version)
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
In response to disinformation and propaganda from Russian online media following the invasion of Ukraine, Russian media outlets such as Russia Today and Sputnik News were banned throughout Europe. To maintain viewership, many of these Russian outlets began to heavily promote their content on messaging services like Telegram. In this work, we study how 16 Russian media outlets interacted with and utilized 732 Telegram channels throughout 2022. Leveraging the foundational model MPNet, DP-means clustering, and Hawkes processes, we trace how narratives spread between news sites and Telegram channels. We show that news outlets not only propagate existing narratives through Telegram but that they source material from the messaging platform. For example, across the websites in our study, between 2.3% (ura.news) and 26.7% (this http URL) of articles discussed content that originated/resulted from activity on Telegram. Finally, tracking the spread of individual topics, we measure the rate at which news outlets and Telegram channels disseminate content within the Russian media ecosystem, finding that websites like ura.news and Telegram channels such as @genshab are the most effective at disseminating their content.
- [13] arXiv:2308.08012 (replaced)
Title: Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling
Comments: 25 pages, 8 figures, 7 tables, journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural networks (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate the thoroughness of the proposed CNN framework in addressing the challenges of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and further investigation.
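The simulation-based evaluation that this abstract describes as time-consuming can be sketched as follows. This is an illustrative baseline under stated assumptions, not the paper's CNN model: the attack curve is taken to be the fraction of nodes in the largest connected component after each node removal, and the robustness value is taken to be the mean of that curve, which is one common definition. All function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def largest_component_size(graph: Graph, removed: Iterable[Hashable]) -> int:
    """Size of the largest connected component once `removed` is deleted."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over the surviving nodes
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def attack_curve(graph: Graph, order: Sequence[Hashable]) -> List[float]:
    """Largest-component fraction after each removal in `order`."""
    removed: Set[Hashable] = set()
    curve = []
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(graph, removed) / len(graph))
    return curve


def robustness(curve: Sequence[float]) -> float:
    """Mean of the attack curve; higher means more robust."""
    return sum(curve) / len(curve)


# Hypothetical toy example: a star with hub "h" and three leaves.
star: Graph = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
```

On the toy star, removing the hub first collapses the network immediately, while removing leaves first degrades it gradually, so the same graph gets a lower robustness value under the targeted order. Each point on the curve needs a full component search, which is why the cost grows quickly on large networks.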
- [14] arXiv:2310.19697 (replaced)
Title: A nonlinear spectral core-periphery detection method for multiplex networks
Subjects: Numerical Analysis (math.NA); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Core-periphery detection aims to separate the nodes of a complex network into two subsets: a core that is densely connected to the entire network and a periphery that is densely connected to the core but sparsely connected internally. The definition of core-periphery structure in multiplex networks that record different types of interactions between the same set of nodes on different layers is nontrivial since a node may belong to the core in some layers and to the periphery in others. We propose a nonlinear spectral method for multiplex networks that simultaneously optimises a node and a layer coreness vector by maximising a suitable nonconvex homogeneous objective function by a provably convergent alternating fixed point iteration. We derive a quantitative measure for the quality of a given multiplex core-periphery structure that allows the determination of the optimal core size. Numerical experiments on synthetic and real-world networks illustrate that our approach is robust against noisy layers and significantly outperforms baseline methods while improving the latter with our novel optimised layer coreness weights. As the runtime of our method depends linearly on the number of edges of the network it is scalable to large-scale multiplex networks.
- [15] arXiv:2402.00447 (replaced)
Title: A Survey of Data-Efficient Graph Learning
Comments: Accepted by Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We begin by highlighting the challenges inherent in training models with large labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. We also state promising directions for future research, contributing to the evolution of graph machine learning.
- [16] arXiv:2405.04773 (replaced)
Title: Hypergraph-enhanced Dual Semi-supervised Graph Classification
Authors: Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Zhiping Xiao, Yifan Wang, Xiao Luo, Ming Zhang
Comments: Accepted by Proceedings of the 41st International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreover, GNNs are inherently limited to encoding local neighborhood information using message-passing mechanisms, thus lacking the ability to model higher-order dependencies among nodes. To tackle these challenges, we propose a Hypergraph-Enhanced DuAL framework named HEAL for semi-supervised graph classification, which captures graph semantics from the perspective of the hypergraph and the line graph, respectively. Specifically, to better explore the higher-order relationships among nodes, we design a hypergraph structure learning to adaptively learn complex node dependencies beyond pairwise relations. Meanwhile, based on the learned hypergraph, we introduce a line graph to capture the interaction between hyperedges, thereby better mining the underlying semantic structures. Finally, we develop a relational consistency learning to facilitate knowledge transfer between the two branches and provide better mutual guidance. Extensive experiments on real-world graph datasets verify the effectiveness of the proposed method against existing state-of-the-art methods.