Computers and Society
- [1] arXiv:2405.17728 [pdf, ps, other]
-
Title: Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based ExperimentsSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Workshop courses designed to foster creativity are gaining popularity. However, achieving a holistic evaluation that accommodates diverse perspectives is challenging, even for experienced faculty teams. Adequate discussion is essential to integrate varied assessments, but faculty often lack the time for such deliberations. Deriving an average score without discussion undermines the purpose of a holistic evaluation. This paper explores the use of a Large Language Model (LLM) as a facilitator to integrate diverse faculty assessments. Scenario-based experiments were conducted to determine if the LLM could synthesize diverse evaluations and explain the underlying theories to faculty. The results were noteworthy, showing that the LLM effectively facilitated faculty discussions. Additionally, the LLM demonstrated the capability to generalize and create evaluation criteria from a single scenario based on its learned domain knowledge.
- [2] arXiv:2405.17971 [pdf, ps, html, other]
-
Title: A Qualitative Analysis Framework for mHealth Privacy PracticesComments: 8 pages, 6 figures, accepted at the 2024 International Workshop on Privacy Engineering (IWPE'24)Subjects: Computers and Society (cs.CY)
Mobile Health (mHealth) applications have become a crucial part of health monitoring and management. However, the proliferation of these applications has also raised concerns over the privacy and security of Personally Identifiable Information and Protected Health Information. Addressing these concerns, this paper introduces a novel framework for the qualitative evaluation of privacy practices in mHealth apps, particularly focusing on the handling and transmission of sensitive user data. Our investigation encompasses an analysis of 152 leading mHealth apps on the Android platform, leveraging the proposed framework to provide a multifaceted view of their data processing activities. Despite stringent regulations like the General Data Protection Regulation in the European Union and the Health Insurance Portability and Accountability Act in the United States, our findings indicate persistent issues with negligence and misuse of sensitive user information. We uncover significant instances of health information leakage to third-party trackers and a widespread neglect of privacy-by-design and transparency principles. Our research underscores the critical need for stricter enforcement of data protection laws and sets a foundation for future efforts aimed at enhancing user privacy within the mHealth ecosystem.
- [3] arXiv:2405.18179 [pdf, ps, other]
-
Title: Rethinking the A in STEAM: Insights from and for AI Literacy EducationSubjects: Computers and Society (cs.CY)
This article rethinks the role of arts in STEAM education, emphasizing its importance in AI literacy within K-12 contexts. Arguing against the marginalization of arts, the paper is structured around four key domains: language studies, philosophy, social studies, and visual arts. Each section addresses critical AI-related phenomena and provides pedagogical strate-gies for effective integration into STEAM education. Language studies focus on media representations and the probabilistic nature of AI language models. The philosophy section examines anthropomorphism, ethics, and the misconstrued human-like capabilities of AI. Social studies discuss AI's societal impacts, biases, and ethical considerations in data prac-tices. Visual arts explore the implications of generative AI on artistic processes and intellec-tual property. The article concludes by advocating for a robust inclusion of arts in STEAM to foster a holistic, equitable, and sustainable understanding of AI, ultimately inspiring technologies that promote fairness and creativity.
- [4] arXiv:2405.18339 [pdf, ps, other]
-
Title: What characteristics define disinformation and fake news?: review of taxonomies and definitionsSubjects: Computers and Society (cs.CY); Information Theory (cs.IT)
What characteristics define disinformation and fake news? To address this research question, this Technical Note provides a comprehensive analysis of disinformation and fake news, synthesizing 46 definitions and highlighting four key points addressing their fundamental characteristics. Adopting the Prisma 2020 method, five search sets with the Boolean operator AND were selected in both Portuguese and English, which were applied across four databases, resulting in 237 reviewed articles. Following a meticulous analysis, relevant articles were identified and included, while duplicates and inaccessible documents were excluded. It points to disinformation as information that is totally or partially false, crafted by a sender with the aim of misleading, with opportunistic content designed to manipulate reality, being amplified by individual characteristics of the receiver in their interpretation and by contextual factors in which they are embedded. This Technical Note seeks to contribute to an understanding of the phenomenon of disinformation that includes the contextual dimension, obtaining as fundamental elements of analysis: I.) Sender; II.) Content; III.) Receiver; and IV.) Environment.
- [5] arXiv:2405.18374 [pdf, ps, html, other]
-
Title: Hostile Counterspeech Drives Users From Hate SubredditsDaniel Hickey, Matheus Schmitz, Daniel M.T. Fessler, Paul E. Smaldino, Kristina Lerman, Goran Murić, Keith BurghardtComments: 19 pages, 11 figures. arXiv admin note: text overlap with arXiv:2303.13641Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Counterspeech -- speech that opposes hate speech -- has gained significant attention recently as a strategy to reduce hate on social media. While previous studies suggest that counterspeech can somewhat reduce hate speech, little is known about its effects on participation in online hate communities, nor which counterspeech tactics reduce harmful behavior. We begin to address these gaps by identifying 25 large hate communities ("subreddits") within Reddit and analyzing the effect of counterspeech on newcomers within these communities. We first construct a new public dataset of carefully annotated counterspeech and non-counterspeech comments within these subreddits. We use this dataset to train a state-of-the-art counterspeech detection model. Next, we use matching to evaluate the causal effects of hostile and non-hostile counterspeech on the engagement of newcomers in hate subreddits. We find that, while non-hostile counterspeech is ineffective at keeping users from fully disengaging from these hate subreddits, a single hostile counterspeech comment substantially reduces both future likelihood of engagement. While offering nuance to the understanding of counterspeech efficacy, these results a) leave unanswered the question of whether hostile counterspeech dissuades newcomers from participation in online hate writ large, or merely drives them into less-moderated and more extreme hate communities, and b) raises ethical considerations about hostile counterspeech, which is both comparatively common and might exacerbate rather than mitigate the net level of antagonism in society. These findings underscore the importance of future work to improve counterspeech tactics and minimize unintended harm.
New submissions for Wednesday, 29 May 2024 (showing 5 of 5 entries )
- [6] arXiv:2405.17451 (cross-list from cs.LG) [pdf, ps, other]
-
Title: Green AI in Action: Strategic Model Selection for Ensembles in ProductionComments: 9 pages. Accepted at the 1st ACM International Conference on AI-powered Software (AIware), 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Software Engineering (cs.SE)
Integrating Artificial Intelligence (AI) into software systems has significantly enhanced their capabilities while escalating energy demands. Ensemble learning, combining predictions from multiple models to form a single prediction, intensifies this problem due to cumulative energy consumption. This paper presents a novel approach to model selection that addresses the challenge of balancing the accuracy of AI models with their energy consumption in a live AI ensemble system. We explore how reducing the number of models or improving the efficiency of model usage within an ensemble during inference can reduce energy demands without substantially sacrificing accuracy. This study introduces and evaluates two model selection strategies, Static and Dynamic, for optimizing ensemble learning systems performance while minimizing energy usage. Our results demonstrate that the Static strategy improves the F1 score beyond the baseline, reducing average energy usage from 100\% from the full ensemble to 6\2%. The Dynamic strategy further enhances F1 scores, using on average 76\% compared to 100% of the full ensemble. Moreover, we propose an approach that balances accuracy with resource consumption, significantly reducing energy usage without substantially impacting accuracy. This method decreased the average energy usage of the Static strategy from approximately 62\% to 14\%, and for the Dynamic strategy, from around 76\% to 57\%. Our field study of Green AI using an operational AI system developed by a large professional services provider shows the practical applicability of adopting energy-conscious model selection strategies in live production environments.
- [7] arXiv:2405.17469 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: A Dataset for Research on Water SustainabilityComments: Accepted by ACM e-Energy 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To break this barrier into research in water sustainability, we build a dataset for operation direct water usage in the cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2019 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it. Our dataset is publicly available at Open Science Framework (OSF)
- [8] arXiv:2405.17512 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: On Fairness of Low-Rank Adaptation of Large ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Low-rank adaptation of large models, particularly LoRA, has gained traction due to its computational efficiency. This efficiency, contrasted with the prohibitive costs of full-model fine-tuning, means that practitioners often turn to LoRA and sometimes without a complete understanding of its ramifications. In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility, calibration, and resistance to membership inference across different subgroups (e.g., genders, races, religions) compared to a full-model fine-tuning baseline. We present extensive experiments across vision and language domains and across classification and generation tasks using ViT-Base, Swin-v2-Large, Llama-2 7B, and Mistral 7B. Intriguingly, experiments suggest that while one can isolate cases where LoRA exacerbates model bias across subgroups, the pattern is inconsistent -- in many cases, LoRA has equivalent or even improved fairness compared to the base model or its full fine-tuning baseline. We also examine the complications of evaluating fine-tuning fairness relating to task design and model token bias, calling for more careful fairness evaluations in future work.
- [9] arXiv:2405.17544 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Towards Human-AI Complementarity with Predictions SetsSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Decision support systems based on prediction sets have proven to be effective at helping human experts solve classification tasks. Rather than providing single-label predictions, these systems provide sets of label predictions constructed using conformal prediction, namely prediction sets, and ask human experts to predict label values from these sets. In this paper, we first show that the prediction sets constructed using conformal prediction are, in general, suboptimal in terms of average accuracy. Then, we show that the problem of finding the optimal prediction sets under which the human experts achieve the highest average accuracy is NP-hard. More strongly, unless P = NP, we show that the problem is hard to approximate to any factor less than the size of the label set. However, we introduce a simple and efficient greedy algorithm that, for a large class of expert models and non-conformity scores, is guaranteed to find prediction sets that provably offer equal or greater performance than those constructed using conformal prediction. Further, using a simulation study with both synthetic and real expert predictions, we demonstrate that, in practice, our greedy algorithm finds near-optimal prediction sets offering greater performance than conformal prediction.
- [10] arXiv:2405.17782 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Post-Fair Federated Learning: Achieving Group and Community Fairness in Federated Learning via Post-processingSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Federated Learning (FL) is a distributed machine learning framework in which a set of local communities collaboratively learn a shared global model while retaining all training data locally within each community. Two notions of fairness have recently emerged as important issues for federated learning: group fairness and community fairness. Group fairness requires that a model's decisions do not favor any particular group based on a set of legally protected attributes such as race or gender. Community fairness requires that global models exhibit similar levels of performance (accuracy) across all collaborating communities. Both fairness concepts can coexist within an FL framework, but the existing literature has focused on either one concept or the other. This paper proposes and analyzes a post-processing fair federated learning (FFL) framework called post-FFL. Post-FFL uses a linear program to simultaneously enforce group and community fairness while maximizing the utility of the global model. Because Post-FFL is a post-processing approach, it can be used with existing FL training pipelines whose convergence properties are well understood. This paper uses post-FFL on real-world datasets to mimic how hospital networks, for example, use federated learning to deliver community health care. Theoretical results bound the accuracy lost when post-FFL enforces both notion of fairness. Experimental results illustrate that post-FFL simultaneously improves both group and community fairness in FL. Moreover, post-FFL outperforms the existing in-processing fair federated learning in terms of improving both notions of fairness, communication efficiency and computation cost.
- [11] arXiv:2405.17898 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic PredictionComments: This paper has been accepted by ICML 2024 (poster)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
The objective of traffic prediction is to accurately forecast and analyze the dynamics of transportation patterns, considering both space and time. However, the presence of distribution shift poses a significant challenge in this field, as existing models struggle to generalize well when faced with test data that significantly differs from the training distribution. To tackle this issue, this paper introduces a simple and universal spatio-temporal prompt-tuning framework-FlashST, which adapts pre-trained models to the specific characteristics of diverse downstream datasets, improving generalization in diverse traffic prediction scenarios. Specifically, the FlashST framework employs a lightweight spatio-temporal prompt network for in-context learning, capturing spatio-temporal invariant knowledge and facilitating effective adaptation to diverse scenarios. Additionally, we incorporate a distribution mapping mechanism to align the data distributions of pre-training and downstream data, facilitating effective knowledge transfer in spatio-temporal forecasting. Empirical evaluations demonstrate the effectiveness of our FlashST across different spatio-temporal prediction tasks using diverse urban datasets. Code is available at this https URL.
- [12] arXiv:2405.17905 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
With the development of modern society, traffic volume continues to increase in most countries worldwide, leading to an increase in the rate of pavement damage Therefore, the real-time and highly accurate pavement damage detection and maintenance have become the current need. In this paper, an enhanced pavement damage detection method with CycleGAN and improved YOLOv5 algorithm is presented. We selected 7644 self-collected images of pavement damage samples as the initial dataset and augmented it by CycleGAN. Due to a substantial difference between the images generated by CycleGAN and real road images, we proposed a data enhancement method based on an improved Scharr filter, CycleGAN, and Laplacian pyramid. To improve the target recognition effect on a complex background and solve the problem that the spatial pyramid pooling-fast module in the YOLOv5 network cannot handle multiscale targets, we introduced the convolutional block attention module attention mechanism and proposed the atrous spatial pyramid pooling with squeeze-and-excitation structure. In addition, we optimized the loss function of YOLOv5 by replacing the CIoU with EIoU. The experimental results showed that our algorithm achieved a precision of 0.872, recall of 0.854, and mean average precision@0.5 of 0.882 in detecting three main types of pavement damage: cracks, potholes, and patching. On the GPU, its frames per second reached 68, meeting the requirements for real-time detection. Its overall performance even exceeded the current more advanced YOLOv7 and achieved good results in practical applications, providing a basis for decision-making in pavement damage detection and prevention.
- [13] arXiv:2405.17921 (cross-list from cs.AI) [pdf, ps, other]
-
Title: Towards Clinical AI Fairness: Filling Gaps in the PuzzleMingxuan Liu, Yilin Ning, Salinelat Teixayavong, Xiaoxuan Liu, Mayli Mertens, Yuqing Shang, Xin Li, Di Miao, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Narrendar RaviChandran, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan LiuSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness-a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.
Cross submissions for Wednesday, 29 May 2024 (showing 8 of 8 entries )
- [14] arXiv:2301.10856 (replaced) [pdf, ps, html, other]
-
Title: Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and TelegramComments: Accepted to ICWSM 2024 (ICWSM version)Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
In response to disinformation and propaganda from Russian online media following the invasion of Ukraine, Russian media outlets such as Russia Today and Sputnik News were banned throughout Europe. To maintain viewership, many of these Russian outlets began to heavily promote their content on messaging services like Telegram. In this work, we study how 16 Russian media outlets interacted with and utilized 732 Telegram channels throughout 2022. Leveraging the foundational model MPNet, DP-means clustering, and Hawkes processes, we trace how narratives spread between news sites and Telegram channels. We show that news outlets not only propagate existing narratives through Telegram but that they source material from the messaging platform. For example, across the websites in our study, between 2.3% (ura.news) and 26.7% (this http URL) of articles discussed content that originated/resulted from activity on Telegram. Finally, tracking the spread of individual topics, we measure the rate at which news outlets and Telegram channels disseminate content within the Russian media ecosystem, finding that websites like ura.news and Telegram channels such as @genshab are the most effective at disseminating their content.
- [15] arXiv:2304.00427 (replaced) [pdf, ps, html, other]
-
Title: Finding Pareto Efficient Redistricting Plans with Short BurstsComments: 7 pages, 3 figures, plus references and appendices Updated to correct mis-ordering of maps in Figure 2 and typo in compactness formulaSubjects: Computers and Society (cs.CY); Optimization and Control (math.OC); Applications (stat.AP)
Redistricting practitioners must balance many competing constraints and criteria when drawing district boundaries. To aid in this process, researchers have developed many methods for optimizing districting plans according to one or more criteria. This research note extends a recently-proposed single-criterion optimization method, short bursts (Cannon et al., 2023), to handle the multi-criterion case, and in doing so approximate the Pareto frontier for any set of constraints. We study the empirical performance of the method in a realistic setting and find it behaves as expected and is not very sensitive to algorithmic parameters. The proposed approach, which is implemented in open-source software, should allow researchers and practitioners to better understand the tradeoffs inherent to the redistricting process.
- [16] arXiv:2401.14908 (replaced) [pdf, ps, html, other]
-
Title: A Framework for Assurance Audits of Algorithmic SystemsJournal-ref: The 2024 ACM Conference on Fairness, Accountability, and TransparencySubjects: Computers and Society (cs.CY)
An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits can be performed for in-scope hiring algorithms, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering a critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion -- informed by experiences in performing these audits in practice -- highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of audits.
- [17] arXiv:2402.01804 (replaced) [pdf, ps, html, other]
-
Title: Analysis of Internet of Things Implementation Barriers in the Cold Supply Chain: An Integrated ISM-MICMAC and DEMATEL ApproachSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Integrating Internet of Things (IoT) technology inside the cold supply chain can enhance transparency, efficiency, and quality, optimizing operating procedures and increasing productivity. The integration of IoT in this complicated setting is hindered by specific barriers that need a thorough examination. Prominent barriers to IoT implementation in the cold supply chain are identified using a two-stage model. After reviewing the available literature on the topic of IoT implementation, a total of 13 barriers were found. The survey data was cross-validated for quality, and Cronbach's alpha test was employed to ensure validity. This research applies the interpretative structural modeling technique in the first phase to identify the main barriers. Among those barriers, "regularity compliance" and "cold chain networks" are key drivers for IoT adoption strategies. MICMAC's driving and dependence power element categorization helps evaluate the barrier interactions. In the second phase of this research, a decision-making trial and evaluation laboratory methodology was employed to identify causal relationships between barriers and evaluate them according to their relative importance. Each cause is a potential drive, and if its efficiency can be enhanced, the system as a whole benefits. The research findings provide industry stakeholders, governments, and organizations with significant drivers of IoT adoption to overcome these barriers and optimize the utilization of IoT technology to improve the effectiveness and reliability of the cold supply chain.
- [18] arXiv:2403.02118 (replaced) [pdf, ps, html, other]
-
Title: Position: Towards Implicit Prompt For Text-To-Image ModelsYue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping LuoSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Recent text-to-image (T2I) models have had great success, and many benchmarks have been proposed to evaluate their performance and safety. However, they only consider explicit prompts while neglecting implicit prompts (hint at a target without explicitly mentioning it). These prompts may get rid of safety constraints and pose potential threats to the applications of these models. This position paper highlights the current state of T2I models toward implicit prompts. We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts with popular T2I models. Specifically, we design and collect more than 2,000 implicit prompts of three aspects: General Symbols, Celebrity Privacy, and Not-Safe-For-Work (NSFW) Issues, and evaluate six well-known T2I models' capabilities under these implicit prompts. Experiment results show that (1) T2I models are able to accurately create various target symbols indicated by implicit prompts; (2) Implicit prompts bring potential risks of privacy leakage for T2I models. (3) Constraints of NSFW in most of the evaluated T2I models can be bypassed with implicit prompts. We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.
- [19] arXiv:2212.02985 (replaced) [pdf, ps, html, other]
-
Title: Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive AnalyticsYun-Wei Chu, Seyyedali Hosseinalipour, Elizabeth Tenorio, Laura Cruz, Kerrie Douglas, Andrew Lan, Christopher BrintonComments: IEEE Transactions on Emerging Topics in Computing, 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Conventional methods for student modeling, which involve predicting grades based on measured activities, struggle to provide accurate results for minority/underrepresented student groups due to data availability biases. In this paper, we propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology that optimizes inference accuracy over different layers of student grouping criteria, such as by course and by demographic subgroups within each course. In our approach, personalized models for individual student subgroups are derived from a global model, which is trained in a distributed fashion via meta-gradient updates that account for subgroup heterogeneity while preserving modeling commonalities that exist across the full dataset. The evaluation of the proposed methodology considers case studies of two popular downstream student modeling tasks, knowledge tracing and outcome prediction, which leverage multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums) in model training. Experiments on three real-world online course datasets show significant improvements achieved by our approach over existing student modeling benchmarks, as evidenced by an increased average prediction quality and decreased variance across different student subgroups. Visual analysis of the resulting students' knowledge state embeddings confirm that our personalization methodology extracts activity patterns clustered into different student subgroups, consistent with the performance enhancements we obtain over the baselines.
- [20] arXiv:2307.00364 (replaced) [pdf, ps, html, other]
-
Title: The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanationsComments: Viewpoint paper, under review at JAIRSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single post-hoc explainer, whereas recent work has identified systematic disagreement between post-hoc explainers when applied to the same instances of underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose a shift from post-hoc explainability to designing interpretable neural network architectures. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing with InterpretCC and temporal diagnostics with I2MD). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.
- [21] arXiv:2309.03426 (replaced) [pdf, ps, other]
-
Title: Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit RateComments: Published at the Forty-first International Conference on Machine Learning (ICML 2024)Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Decisions made by machine learning models can have lasting impacts, making long-term fairness a critical consideration. It has been observed that ignoring the long-term effect and directly applying fairness criterion in static settings can actually worsen bias over time. To address biases in sequential decision-making, we introduce a long-term fairness concept named Equal Long-term Benefit Rate (ELBERT). This concept is seamlessly integrated into a Markov Decision Process (MDP) to consider the future effects of actions on long-term fairness, thus providing a unified framework for fair sequential decision-making problems. ELBERT effectively addresses the temporal discrimination issues found in previous long-term fairness notions. Additionally, we demonstrate that the policy gradient of Long-term Benefit Rate can be analytically simplified to standard policy gradients. This simplification makes conventional policy optimization methods viable for reducing bias, leading to our bias mitigation approach ELBERT-PO. Extensive experiments across various diverse sequential decision-making environments consistently reveal that ELBERT-PO significantly diminishes bias while maintaining high utility. Code is available at this https URL.
- [22] arXiv:2310.02932 (replaced) [pdf, ps, html, other]
-
Title: Assessing Large Language Models on Climate InformationJannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Hübscher, Christian Buck, Niels G. Mede, Markus Leippold, Nadine StraußJournal-ref: Proceedings of the 41st International Conference on Machine Learning (ICML), 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.
- [23] arXiv:2311.02790 (replaced) [pdf, ps, html, other]
-
Title: CausalCite: A Causal Formulation of Paper CitationsIshan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard SchölkopfComments: ACL 2024 FindingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at this https URL.
- [24] arXiv:2401.11254 (replaced) [pdf, ps, html, other]
-
Title: The Great Ban: Efficacy and Unintended Consequences of a Massive Deplatforming Operation on RedditJournal-ref: DHOW - 16th ACM Web Science Conference (Websci Companion 24), 2024Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
In the current landscape of online abuses and harms, effective content moderation is necessary to cultivate safe and inclusive online spaces. Yet, the effectiveness of many moderation interventions is still unclear. Here, we assess the effectiveness of The Great Ban, a massive deplatforming operation that affected nearly 2,000 communities on Reddit. By analyzing 16M comments posted by 17K users during 14 months, we provide nuanced results on the effects, both desired and otherwise, of the ban. Among our main findings is that 15.6% of the affected users left Reddit and that those who remained reduced their toxicity by 6.6% on average. The ban also caused 5% users to increase their toxicity by more than 70% of their pre-ban level. Overall, our multifaceted results provide new insights into the efficacy of deplatforming. As such, our findings can inform the development of future moderation interventions and the policing of online platforms.
- [25] arXiv:2402.02933 (replaced) [pdf, ps, html, other]
-
Title: InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of ExpertsSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Interpretability for neural networks is a trade-off between three key requirements: 1) faithfulness of the explanation (i.e., how perfectly it explains the prediction), 2) understandability of the explanation by humans, and 3) model performance. Most existing methods compromise one or more of these requirements; e.g., post-hoc approaches provide limited faithfulness, automatically identified feature masks compromise understandability, and intrinsically interpretable methods such as decision trees limit model performance. These shortcomings are unacceptable for sensitive applications such as education and healthcare, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability, while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply variations of the InterpretCC architecture for text, time series and tabular data across several real-world benchmarks, demonstrating comparable performance with non-interpretable baselines, outperforming interpretable-by-design baselines, and showing higher actionability and usefulness according to a user study.
- [26] arXiv:2404.18308 (replaced) [pdf, ps, other]
-
Title: Near-Term Enforcement of AI Chip Export Controls Using A Firmware-Based Design for Offline LicensingSubjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Offline Licensing is a mechanism for compute governance that could be used to prevent unregulated training of potentially dangerous frontier AI models. The mechanism works by disabling AI chips unless they have an unused license from a regulator. In this report, we present a design for a minimal version of Offline Licensing that could be delivered via a firmware update. Existing AI chips could potentially support Offline Licensing within a year if they have the following (relatively common) hardware security features: firmware verification, firmware rollback protection, and secure non-volatile memory. Public documentation suggests that NVIDIA's H100 AI chip already has these security features. Without additional hardware modifications, the system is susceptible to physical hardware attacks. However, these attacks might require expensive equipment and could be difficult to reliably apply to thousands of AI chips. A firmware-based Offline Licensing design shares the same legal requirements and license approval mechanism as a hardware-based solution. Implementing a firmware-based solution now could accelerate the eventual deployment of a more secure hardware-based solution in the future. For AI chip manufacturers, implementing this security mechanism might allow chips to be sold to customers that would otherwise be prohibited by export restrictions. For governments, it may be important to be able to prevent unsafe or malicious actors from training frontier AI models in the next few years. Based on this initial analysis, firmware-based Offline Licensing could partially solve urgent security and trade problems and is technically feasible for AI chips that have common hardware security features.
- [27] arXiv:2405.14555 (replaced) [pdf, ps, other]
-
Title: Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language ModelsComments: 9 pages (excluding references), accepted to ACL 2024 Main ConferenceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models' outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting the models' evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to `bias fingerprints'. This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.