Information Retrieval

New submissions

Submissions received from Fri 3 May 24 to Mon 6 May 24, announced Tue, 7 May 24

[ total of 35 entries: 1-35 ]

New submissions for Tue, 7 May 24

[1] arXiv:2405.02429 [pdf, other]: Title: CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation

Authors: Yaoyiran Li, Xiang Zhai, Moustafa Alzantot, Keyi Yu, Ivan Vulić, Anna Korhonen, Mohamed Hammad

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Traditional recommender systems such as matrix factorization methods rely on learning a shared dense embedding space to represent both items and user preferences. Sequence models such as RNNs, GRUs, and, more recently, Transformers have also excelled in the task of sequential recommendation. This task requires understanding the sequential structure present in users' historical interactions to predict the next item they may like. Building upon the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs that are pretrained on vast corpora of text for sequential recommendation. To use LLMs in sequential recommendation, both the history of user interactions and the model's prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains, followed by another round of target domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10), and systematic ablation studies reveal that (i) both stages of finetuning are crucial and, when combined, yield improved performance, and (ii) contrastive alignment is effective among the target domains explored in our experiments.
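For readers less familiar with the reported metrics, the sketch below computes Recall@k and NDCG@k in a leave-one-out setting, where each user has exactly one held-out next item. That protocol is an assumption for illustration; the abstract does not describe the evaluation setup, and the function names are hypothetical.

```python
import math


def recall_at_k(ranked_items, target, k):
    """Recall@k with one held-out next item: 1.0 if the target is in the top k."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item.

    The ideal DCG is 1 (target at rank 1), so NDCG reduces to 1 / log2(rank + 1)
    when the target appears within the top k, and 0 otherwise.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


def mean_metric(metric, predictions, targets, k):
    """Average a per-user metric over all evaluated users."""
    scores = [metric(p, t, k) for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)
```

For two users with rankings `["a", "b", "c"]` and `["d", "e", "f"]` and held-out items `"a"` and `"f"`, Recall@1 is 0.5 and NDCG@10 is (1.0 + 0.5) / 2 = 0.75. The percentages in the abstract are relative improvements of such averages over baselines.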
[2] arXiv:2405.02503 [pdf, other]: Title: Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models

Authors: Catherine Chen, Jack Merullo, Carsten Eickhoff

Comments: 10 pages, 10 figures, accepted at SIGIR 2024 as perspective paper

Subjects: Information Retrieval (cs.IR)

Neural models have demonstrated remarkable performance across diverse ranking tasks. However, the processes and internal mechanisms by which they determine relevance are still largely unknown. Existing approaches for analyzing neural ranker behavior with respect to IR properties rely either on assessing overall model behavior or on probing methods that may offer an incomplete understanding of causal mechanisms. To provide a more granular understanding of internal model decision-making processes, we propose the use of causal interventions to reverse engineer neural rankers, and demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms within a ranking model. We identify a group of attention heads that detect duplicate tokens in earlier layers of the model, then communicate with downstream heads to compute overall document relevance. More generally, we propose that this style of mechanistic analysis opens up avenues for reverse engineering the processes neural retrieval models use to compute relevance. This work aims to initiate granular interpretability efforts that will not only benefit retrieval model development and training but also, ultimately, ensure safer deployment of these models.
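Term-frequency axioms such as TFC1 state that, for documents of equal length, the one with more occurrences of a query term should score higher. The sketch below builds such a diagnostic pair and checks a black-box scorer against it. It only illustrates the behavioral constraint; it does not perform the causal interventions on model internals described above, and `toy_score` is a hypothetical BM25-style stand-in for a neural ranker.

```python
from collections import Counter


def toy_score(query_terms, doc_terms, k1=1.2, b=0.75, avg_len=10.0):
    """Hypothetical stand-in for a ranker: BM25-style term-frequency saturation.

    IDF is omitted because this toy scores a single document in isolation.
    """
    tf = Counter(doc_terms)
    norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
    return sum(tf[q] * (k1 + 1) / (tf[q] + norm) for q in query_terms)


def tfc1_pair(doc_terms, query_term, filler_index):
    """Build a TFC1 diagnostic pair: same length, one extra query-term occurrence.

    The term at filler_index (assumed not to be a query term) is replaced by query_term.
    """
    if doc_terms[filler_index] == query_term:
        raise ValueError("filler position already holds the query term")
    perturbed = list(doc_terms)
    perturbed[filler_index] = query_term
    return doc_terms, perturbed


def satisfies_tfc1(score_fn, query_term, doc_terms, filler_index):
    """True if the scorer ranks the perturbed document strictly higher."""
    original, perturbed = tfc1_pair(doc_terms, query_term, filler_index)
    return score_fn([query_term], perturbed) > score_fn([query_term], original)
```

Replacing the word "sat" in "the cat sat on the mat" with the query term "cat" keeps the length fixed while raising the term frequency, so a TF-sensitive scorer should satisfy the check, whereas a binary term-match scorer should not.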
[3] arXiv:2405.02525 [pdf, other]: Title: RLStop: A Reinforcement Learning Stopping Method for TAR

Authors: Reem Bin-Hezam, Mark Stevenson

Comments: Accepted at SIGIR 2024

Subjects: Information Retrieval (cs.IR)

We present RLStop, a novel Technology Assisted Review (TAR) stopping rule based on reinforcement learning that helps minimise the number of documents that need to be manually reviewed within TAR applications. RLStop is trained on example rankings using a reward function to identify the optimal point to stop examining documents. Experiments at a range of target recall levels on multiple benchmark datasets (CLEF e-Health, TREC Total Recall, and Reuters RCV1) demonstrated that RLStop substantially reduces the workload required to screen a document collection for relevance. RLStop outperforms a wide range of alternative approaches, achieving performance close to the maximum possible for the task under some circumstances.
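A stopping rule trades recall against reviewing effort. The sketch below computes the oracle stopping point for a target recall, that is, the smallest number of top-ranked documents that reaches the target, together with the recall and workload at any chosen cutoff. It assumes binary relevance labels for the whole ranking are known, which holds only in offline evaluation. It is a reference point for comparison, not RLStop itself, and the function names are hypothetical.

```python
import math


def oracle_stop(relevance, target_recall):
    """Smallest number of top-ranked documents that reaches target_recall.

    relevance: list of 0/1 labels in ranked order (1 = relevant).
    Returns the cutoff depth (count of documents examined).
    """
    total_relevant = sum(relevance)
    # Subtract a small epsilon so float products like 0.7 * 10 do not round up.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    if total_relevant == 0 or needed <= 0:
        return 0  # nothing needs to be found
    found = 0
    for depth, label in enumerate(relevance, start=1):
        found += label
        if found >= needed:
            return depth
    return len(relevance)


def evaluate_stop(relevance, stop_depth):
    """Recall achieved and workload (fraction of collection examined) at stop_depth."""
    total_relevant = sum(relevance)
    found = sum(relevance[:stop_depth])
    recall = found / total_relevant if total_relevant else 1.0
    return recall, stop_depth / len(relevance)
```

For the ranking `[1, 0, 1, 1, 0, 0, 1, 0, 0, 0]` and a target recall of 0.75, the oracle stops after 4 documents, examining 40% of the collection; a learned rule is judged by how closely it approaches this depth without falling short of the target.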
[4] arXiv:2405.02637 [pdf, other]: Title: TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants

Authors: Mohammad Aliannejadi, Zahra Abbasiantaeb, Shubham Chatterjee, Jeffery Dalton, Leif Azzopardi

Comments: To appear in SIGIR 2024. arXiv admin note: substantial text overlap with arXiv:2401.01330

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Conversational information seeking has evolved rapidly in the last few years with the development of Large Language Models (LLMs), which provide the basis for interpreting and responding to user requests in a naturalistic manner. The extended TREC Interactive Knowledge Assistance Track (iKAT) collection aims to enable researchers to test and evaluate their Conversational Search Agents (CSAs). The collection contains a set of 36 personalized dialogues over 20 different topics, each coupled with a Personal Text Knowledge Base (PTKB) that defines the bespoke user personas. Relevance assessments are provided for a total of 344 turns with approximately 26,000 passages, as well as additional assessments of generated responses along four key dimensions: relevance, completeness, groundedness, and naturalness. The collection challenges CSAs to efficiently navigate diverse personal contexts, elicit pertinent persona information, and employ context for relevant conversations. The integration of a PTKB and the emphasis on decisional search tasks contribute to the uniqueness of this test collection, making it an essential benchmark for advancing research in conversational and interactive knowledge assistants.
[5] arXiv:2405.02714 [pdf, other]: Title: Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

Authors: Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected not only to rely on the semantic relevance between documents and queries but also to recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting and contradicting perspectives so that the downstream system can make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries: beyond finding relevant documents for a claim, can retrievers distinguish supporting from opposing documents? We reformulate and extend six existing tasks to create a retrieval benchmark in which diverse perspectives are described in free-form text, in addition to the root, neutral queries. We show that the retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by this observation, we further explore the potential of leveraging geometric features of the retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.
[6] arXiv:2405.02716 [pdf, other]: Title: Sign-Guided Bipartite Graph Hashing for Hamming Space Search

Authors: Xueyi Wu

Subjects: Information Retrieval (cs.IR)

Bipartite graph hashing (BGH) is extensively used for Top-K search in Hamming space at low storage and inference costs. Recent research adopts graph convolutional hashing for BGH and has achieved state-of-the-art performance. However, the contributions of the various factors influencing hashing performance have not been explored in depth, including the count of same/different signs between two binary embeddings during Hamming space search (sign property), the contribution of sub-embeddings at each layer (model property), the contribution of different node types in the bipartite graph (node property), and the combination of augmentation methods. In this work, we build a lightweight graph convolutional hashing model named LightGCH, mainly by removing the augmentation methods of the state-of-the-art model BGCH. By analyzing the contributions of each layer and node type to performance, as well as the Hamming similarity statistics at each layer, we find that actual neighbors in the bipartite graph tend to have low Hamming similarity at the shallow layers, and all nodes tend to have high Hamming similarity at the deep layers in LightGCH. To tackle these problems, we propose a novel sign-guided framework, SGBGH, which uses sign-guided negative sampling to improve the Hamming similarity of neighbors and sign-aware contrastive learning to help nodes learn more uniform representations. Experimental results show that SGBGH significantly outperforms BGCH and LightGCH in embedding quality.
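The "sign property" above refers to how many signs two binary embeddings share. The sketch below shows the basic mechanics of Hamming space search: sign-binarize real-valued embeddings into bit masks, take the Hamming similarity as the number of matching bits (bit count minus the popcount of the XOR), and rank items by brute force. The mapping of zero to the negative bit and the function names are assumptions; this is not LightGCH or SGBGH, and production systems would use packed arrays and hardware popcount.

```python
def to_binary_code(embedding):
    """Sign-binarize an embedding into an int bit mask (bit i = 1 if x_i > 0)."""
    code = 0
    for i, x in enumerate(embedding):
        if x > 0:
            code |= 1 << i
    return code


def hamming_similarity(code_a, code_b, num_bits):
    """Number of positions where the two codes share the same sign."""
    differing = bin(code_a ^ code_b).count("1")  # popcount of XOR = Hamming distance
    return num_bits - differing


def top_k_hamming(query_code, item_codes, num_bits, k):
    """Indices of the k items with the highest Hamming similarity (ties by index)."""
    scored = [(-hamming_similarity(query_code, c, num_bits), i) for i, c in enumerate(item_codes)]
    scored.sort()
    return [i for _, i in scored[:k]]
```

A user embedding `[0.3, -1.2, 0.8, 0.1]` becomes the 4-bit code `0b1101`; an item whose signs all agree scores 4, and an item with every sign flipped scores 0.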
[7] arXiv:2405.02778 [pdf, other]: Title: Improve Temporal Awareness of LLMs for Sequential Recommendation

Authors: Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun

Comments: 10 pages

Subjects: Information Retrieval (cs.IR)

Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. However, LLMs have been found empirically to fall short in recognizing and utilizing temporal information, resulting in poor performance on tasks that require an understanding of sequential data, such as sequential recommendation. In this paper, we aim to improve the temporal awareness of LLMs by designing a principled prompting framework inspired by human cognitive processes. Specifically, we propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation. In addition, we emulate divergent thinking by aggregating the LLM ranking results derived from these strategies. Evaluations on the MovieLens-1M and Amazon Review datasets indicate that our proposed method significantly enhances the zero-shot capabilities of LLMs in sequential recommendation tasks.
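The abstract does not say how the rankings from the three prompting strategies are aggregated. As one common option, the sketch below fuses ranked lists with reciprocal rank fusion (RRF), where each item scores the sum of 1 / (k + rank) over the lists it appears in. RRF, the constant `k=60`, and the item IDs are assumptions used only to illustrate rank aggregation in general.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item IDs into one ranking.

    Each item receives sum(1 / (k + rank)) over the lists it appears in
    (rank is 1-based). Ties are broken by item ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With three hypothetical strategy outputs `["m1", "m2", "m3"]`, `["m2", "m1", "m4"]`, and `["m2", "m3", "m1"]`, the fused order is `["m2", "m1", "m3", "m4"]`: items ranked highly by several strategies rise to the top, and items seen by only one strategy sink.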
[8] arXiv:2405.03110 [pdf, other]: Title: Vector Quantization for Recommender Systems: A Review and Outlook

Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

Subjects: Information Retrieval (cs.IR)

Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts with a comprehensive review of vector quantization techniques. It then explores systematic taxonomies of vector quantization methods for recommender systems (VQ4Rec), examining their applications from multiple perspectives. Further, it provides a thorough introduction to research efforts in diverse recommendation scenarios, including efficiency-oriented approaches and quality-oriented approaches. Finally, the survey analyzes the remaining challenges and anticipates future trends in VQ4Rec, including the challenges associated with the training of vector quantization, the opportunities presented by large language models, and emerging trends in multimodal recommender systems. We hope this survey can pave the way for future researchers in the recommendation community and accelerate their exploration in this promising field.
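As a concrete reference for the techniques the survey reviews, the sketch below shows plain vector quantization (map a vector to its nearest codeword) and residual quantization, a multi-level variant that quantizes the leftover error at each level and yields a tuple of codes usable as a compact discrete item ID. The codebooks here are hand-picked toy values (an assumption); in practice they are learned, for example with k-means or end-to-end training. The example does not reproduce the survey's taxonomy.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Index of the nearest codeword (plain vector quantization)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def residual_quantize(vector, codebooks):
    """Multi-level residual quantization: quantize, subtract the chosen codeword, repeat.

    Returns one code index per level.
    """
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = quantize(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def reconstruct(codes, codebooks):
    """Sum the selected codewords to approximate the original vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out
```

With a coarse codebook `[[0, 0], [1, 1], [-1, 1]]` and a fine codebook `[[0, 0], [0.2, -0.2], [-0.2, 0.2]]`, the vector `[1.2, 0.7]` receives the codes `[1, 1]` and is reconstructed as `[1.2, 0.8]`, so two small integers stand in for two floats at the cost of a small approximation error.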
[9] arXiv:2405.03167 [pdf, other]: Title: TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation

Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang, Yun Yang

Subjects: Information Retrieval (cs.IR)

Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are undifferentiated, which may cause the model to focus on the larger number of easy samples while ignoring the smaller number of hard samples, thus reducing its generalization ability; and (2) differentiated feature interaction encoders are designed to capture different interaction information but receive consistent supervision signals, thereby limiting the effectiveness of the encoders. To bridge these gaps, this paper introduces a novel CTR prediction framework, the Twin Focus Framework for CTR (TF4CTR), which integrates the plug-and-play Twin Focus (TF) Loss, the Sample Selection Embedding Module (SSEM), and the Dynamic Fusion Module (DFM). Specifically, the framework employs the SSEM at the bottom of the model to differentiate between samples, thereby assigning a more suitable encoder to each sample. Meanwhile, the TF Loss provides tailored supervision signals to both the simple and complex encoders. Moreover, the DFM dynamically fuses the feature interaction information captured by the encoders, resulting in more accurate predictions. Experiments on five real-world datasets confirm the effectiveness and compatibility of the framework, demonstrating its capacity to enhance various representative baselines in a model-agnostic manner. To facilitate reproducible research, our open-source code and detailed running logs will be made available at: https://github.com/salmon1802/TF4CTR.
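The abstract does not define the TF Loss. To illustrate the general idea of shifting supervision toward hard samples, the sketch below uses the well-known binary focal loss, originally proposed for dense object detection, which multiplies the log loss by (1 - p_t)^gamma so that confident, correct predictions contribute little. This is a swapped-in technique for illustration, not the TF Loss; `gamma=2.0` is a conventional default, not a value from the paper.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Standard log loss for a predicted click probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Binary focal loss: down-weights easy samples by (1 - p_t) ** gamma.

    p_t is the probability assigned to the true label; confident, correct (easy)
    predictions get a small weight, so hard samples dominate the total loss.
    With gamma = 0 this reduces to binary cross-entropy.
    """
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p
    return -((1 - p_t) ** gamma) * math.log(p_t)
```

For a clicked sample, an easy prediction (p = 0.9) keeps only 1% of its log loss under gamma = 2, while a hard prediction (p = 0.3) keeps 49%, so the hard sample's share of the training signal grows sharply.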
[10] arXiv:2405.03303 [pdf, other]: Title: Explainability for Transparent Conversational Information-Seeking

Authors: Weronika Łajewska, Damiano Spina, Johanne Trippas, Krisztian Balog

Comments: This is the author's version of the work. The definitive version is published in: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains of generating responses that are useful from a user's perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning the impact of (1) the quality of explanations enhancing the response on its usefulness and (2) ways of presenting explanations to users. The analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the explanation presentation format suggest that it may not be a critical factor in this setting.
[11] arXiv:2405.03382 [pdf, ps, other]: Title: Improving (Re-)Usability of Musical Datasets: An Overview of the DOREMUS Project

Authors: Pasquale Lisena, Manel Achichi (WEB3), Pierre Choffé (BnF), Cécile Cecconi, Konstantin Todorov (WEB3), Bernard Jacquemin (GERIICO), Raphaël Troncy

Journal-ref: Bibliothek Forschung und Praxis, 2018, 42 (2), pp.194-205.

Subjects: Information Retrieval (cs.IR)

DOREMUS works on a better description of music by building new tools to link and explore the data of three French institutions. This paper gives an overview of the data model based on FRBRoo, explains the conversion and linking processes using linked data technologies and presents the prototypes created to consume the data according to the web users' needs.
[12] arXiv:2405.03480 [pdf, other]: Title: Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search

Authors: Hideaki Joko, Shubham Chatterjee, Andrew Ramsay, Arjen P. de Vries, Jeff Dalton, Faegheh Hasibi

Comments: Accepted at SIGIR 2024 (Full Paper)

Subjects: Information Retrieval (cs.IR)

Future conversational agents will provide users with personalized information responses. However, a significant challenge in developing such models is the lack of large-scale dialogue datasets that span multiple sessions and reflect real-world user preferences. Previous approaches rely on experts in a wizard-of-oz setup that is difficult to scale, particularly for personalized tasks. Our method, LAPS, addresses this by using large language models (LLMs) to guide a single human worker in generating personalized dialogues. This method has been shown to speed up the creation process and improve quality. LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations, including extracting user preferences. When compared to existing datasets, LAPS-produced conversations are as natural and diverse as expert-created ones, in contrast to fully synthetic methods. The collected dataset is suited to training preference extraction and personalized response generation. Our results show that responses generated explicitly using extracted preferences better match users' actual preferences, highlighting the value of using extracted preferences over simple dialogue history. Overall, LAPS introduces a new method to leverage LLMs to create realistic personalized conversational data more efficiently and effectively than previous methods.
[13] arXiv:2405.03562 [pdf, other]: Title: ID-centric Pre-training for Recommendation

Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

Subjects: Information Retrieval (cs.IR)

Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and to represent items. However, these unique IDs are difficult to transfer to new domains. With the rise of pre-trained language models (PLMs), some pioneering works adopt PLMs for pre-trained recommendation, where modality information (e.g., text) is considered universal across domains via the PLM. Unfortunately, the behavioral information in ID embeddings has still been shown to dominate modality information in PLM-based recommendation models, which limits their performance. In this work, we propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains. Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned from both behavioral and modality information. In the tuning stage, the modality information of new-domain items is regarded as a cross-domain bridge built by CDIM. We first leverage the textual information of downstream-domain items to retrieve behaviorally and semantically similar items from the pre-training domains using CDIM. Next, these retrieved pre-trained ID embeddings, rather than certain textual embeddings, are directly adopted to generate the embeddings of new downstream items. Through extensive experiments on real-world datasets, in both cold and warm settings, we demonstrate that our proposed model significantly outperforms all baselines. Code will be released upon acceptance.
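The tuning-stage idea, retrieving pre-training-domain items that resemble a new item and reusing their ID embeddings, can be sketched as follows. Two assumptions stand in for details the abstract leaves open: plain cosine similarity over text embeddings replaces the learned CDIM matcher, and the retrieved ID embeddings are combined by a similarity-weighted average. `init_new_item_embedding` is a hypothetical name.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def init_new_item_embedding(new_text_emb, source_text_embs, source_id_embs, top_k=2):
    """Build an ID embedding for a new-domain item from pre-training-domain items.

    1. Rank source items by cosine similarity of text embeddings
       (a hypothetical stand-in for the learned CDIM matcher).
    2. Return the similarity-weighted average of the top-k items' ID embeddings
       (an assumed aggregation rule).
    """
    sims = sorted(((cosine(new_text_emb, t), i) for i, t in enumerate(source_text_embs)), reverse=True)
    top = [(s, i) for s, i in sims[:top_k] if s > 0]
    if not top:
        raise ValueError("no positively similar source item found")
    total = sum(s for s, _ in top)
    dim = len(source_id_embs[0])
    return [sum(s * source_id_embs[i][d] for s, i in top) / total for d in range(dim)]
```

The design point is that the new item inherits behavioral knowledge stored in existing ID embeddings, with text used only as the bridge for finding which IDs to borrow.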
[14] arXiv:2405.03651 [pdf, other]: Title: Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders

Authors: Nishant Yadav, Nicholas Monath, Manzil Zaheer, Rob Fergus, Andrew McCallum

Comments: ICLR 2024

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Cross-encoder (CE) models, which compute similarity by jointly encoding a query-item pair, perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches suffer from poor recall on new domains, and retrieval with DE is decoupled from the CE. While CUR-based approaches can be more accurate than the DE-based approach, they require a prohibitively large number of CE calls to compute item embeddings, making them impractical for deployment at scale. In this paper, we address these shortcomings with our proposed sparse-matrix factorization based method, which efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity. We compute item embeddings offline by factorizing a sparse matrix containing query-item CE scores for a set of train queries. Our method produces a high-quality approximation while requiring only a fraction of the CE calls needed by CUR-based methods, and allows for leveraging DE to initialize the embedding space while avoiding compute- and resource-intensive finetuning of DE via distillation. At test time, the item embeddings remain fixed and retrieval occurs over rounds, alternating between (a) estimating the test query embedding by minimizing the error in approximating the CE scores of items retrieved thus far, and (b) using the updated test query embedding to retrieve more items. Our k-NN search method improves recall by up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our indexing approach achieves a speedup of up to 100x over CUR-based and 5x over DE distillation methods, while matching or improving k-NN search recall over baselines.
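The test-time loop described above, fitting the query embedding to the CE scores seen so far and then retrieving more items with it, can be sketched in a few dozen lines. Assumptions: item embeddings are already computed offline (the sparse factorization itself is not shown), `ce_score` is a hypothetical black-box callable, the fit is ridge regression (the `reg` term keeps it well posed when few items have been scored), and the final answer ranks retrieved items by their exact CE scores.

```python
def solve_linear(a, b):
    """Solve a x = b for a small dense square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_query(item_embs, ce_scores, reg=1e-3):
    """Ridge regression: q = argmin sum_i (q . e_i - s_i)^2 + reg * |q|^2."""
    d = len(item_embs[0])
    gram = [[sum(e[i] * e[j] for e in item_embs) + (reg if i == j else 0.0) for j in range(d)]
            for i in range(d)]
    rhs = [sum(e[i] * s for e, s in zip(item_embs, ce_scores)) for i in range(d)]
    return solve_linear(gram, rhs)


def adaptive_retrieve(ce_score, item_embs, seed_ids, rounds=3, per_round=2, k=1):
    """Alternate between (a) fitting the query embedding and (b) retrieving more items."""
    retrieved = list(seed_ids)
    scores = {i: ce_score(i) for i in retrieved}  # exact, expensive CE calls
    for _ in range(rounds):
        q = fit_query([item_embs[i] for i in retrieved], [scores[i] for i in retrieved])
        candidates = [i for i in range(len(item_embs)) if i not in scores]
        candidates.sort(key=lambda i: -sum(a * b for a, b in zip(q, item_embs[i])))
        for i in candidates[:per_round]:
            retrieved.append(i)
            scores[i] = ce_score(i)
    return sorted(retrieved, key=lambda i: -scores[i])[:k]
```

The CE budget is `len(seed_ids) + rounds * per_round` calls, independent of collection size; in this toy, pure-Python brute-force scoring of candidates stands in for an approximate nearest-neighbor index over the fixed item embeddings.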

Cross-lists for Tue, 7 May 24

[15] arXiv:2405.02321 (cross-list from cs.AI) [pdf, other]: Title: Accelerating Medical Knowledge Discovery through Automated Knowledge Graph Generation and Enrichment

Authors: Mutahira Khalid, Raihana Rahman, Asim Abbas, Sushama Kumari, Iram Wajahat, Syed Ahmad Chan Bukhari

Comments: 18 pages, 5 figures

Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Knowledge graphs (KGs) serve as powerful tools for organizing and representing structured knowledge. While their utility is widely recognized, challenges persist in their automation and completeness. Despite efforts in automation and the utilization of expert-created ontologies, gaps in connectivity remain prevalent within KGs. In response to these challenges, we propose an innovative approach termed "Medical Knowledge Graph Automation (M-KGA)". M-KGA leverages user-provided medical concepts and enriches them semantically using BioPortal ontologies, thereby enhancing the completeness of knowledge graphs through the integration of pre-trained embeddings. Our approach introduces two distinct methodologies for uncovering hidden connections within the knowledge graph: a cluster-based approach and a node-based approach. Through rigorous testing involving 100 frequently occurring medical concepts in Electronic Health Records (EHRs), our M-KGA framework demonstrates promising results, indicating its potential to address the limitations of existing knowledge graph automation techniques.
[16] arXiv:2405.02664 (cross-list from cs.AI) [pdf, other]: Title: MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering

Authors: Roomani Srivastava, Suraj Prasad, Lipika Bhat, Sarvesh Deshpande, Barnali Das, Kshitij Jadhav

Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting the relevant medical information required for further treatment planning or even research is a time-consuming, labour-intensive task that takes up much of doctors' valuable time. In this demo paper, we present MedPromptExtract, an automated tool that uses a combination of semi-supervised learning, large language models, natural language processing, and prompt engineering to convert unstructured medical records into structured data amenable to further analysis.
[17] arXiv:2405.02677 (cross-list from cs.CL) [pdf, other]: Title: Evaluating the Ability of Computationally Extracted Narrative Maps to Encode Media Framing

Authors: Sebastián Concha Macías, Brian Keith Norambuena

Comments: Text2Story Workshop 2024

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Narratives serve as fundamental frameworks in our understanding of the world and provide a versatile foundation for collaborative sensemaking, in which they play a crucial role. Framing is a subtle yet potent mechanism that influences public perception through specific word choices, shaping interpretations of reported news events. Despite the recognized importance of narratives and framing, a significant gap exists in the literature regarding the explicit consideration of framing in computational narrative extraction and representation. This article explores the capability of a specific narrative extraction and representation approach, narrative maps, to capture framing information from news data. The research addresses two key questions: (1) Does the narrative extraction method capture the framing distribution of the dataset? (2) Does it produce a representation with consistent framing? Our results indicate that while the algorithm captures framing distributions, achieving consistent framing across various starting and ending events poses challenges. These results highlight the potential of narrative maps to provide users with insights into the intricate framing dynamics within news narratives. However, we note that directly leveraging framing information in the computational narrative extraction process remains an open challenge.
[18] arXiv:2405.02732 (cross-list from cs.CL) [pdf, other]: Title: Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents

Authors: Sneha Singhania, Simon Razniewski, Gerhard Weikum

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Methods for relation extraction from text mostly focus on high precision, at the cost of limited recall. High recall is crucial, though, to populate long lists of object entities that stand in a specific relation with a given subject. Cues for relevant objects can be spread across many passages in long texts. This poses the challenge of extracting long lists from long texts. We present the L3X method which tackles the problem in two stages: (1) recall-oriented generation using a large language model (LLM) with judicious techniques for retrieval augmentation, and (2) precision-oriented scrutinization to validate or prune candidates. Our L3X method outperforms LLM-only generations by a substantial margin.
[19] arXiv:2405.02816 (cross-list from cs.CL) [pdf, other]: Title: Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

Authors: Hamed Zamani, Michael Bendersky

Comments: To appear in the proceedings of SIGIR 2024

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

This paper introduces Stochastic RAG, a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence made in most prior work. Stochastic RAG casts the retrieval process in RAG as stochastic sampling without replacement. Through this formulation, we employ straight-through Gumbel-top-k, which provides a differentiable approximation of sampling without replacement and enables effective end-to-end optimization for RAG. We conduct extensive experiments on seven diverse datasets covering a wide range of tasks, from open-domain question answering to fact verification to slot-filling for relation extraction and dialogue systems. By applying this optimization method to a recent and effective RAG model, we advance state-of-the-art results on six out of seven datasets.
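The sampling step behind this formulation is the Gumbel-top-k trick: perturbing each logit with independent Gumbel(0, 1) noise and keeping the k largest values yields a sample of k distinct indices, distributed as sequential sampling without replacement from softmax(logits). The sketch shows only this forward sampling in plain Python; treating retrieval scores as the logits is an assumption, and the straight-through gradient estimator used for end-to-end training needs an autodiff framework and is not shown.

```python
import math
import random


def gumbel_top_k(logits, k, rng=random):
    """Sample k distinct indices without replacement via the Gumbel-top-k trick.

    Each logit is perturbed with independent Gumbel(0, 1) noise; the indices of
    the k largest perturbed values form the sample, in sampled order.
    """
    perturbed = []
    for i, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]
```

With k = 1 this reduces to the Gumbel-max trick, so the first sampled index follows softmax(logits); for logits `[2.0, 1.0, 0.5, -1.0]`, index 0 should be drawn first roughly 61% of the time.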
[20] arXiv:2405.02951 (cross-list from cs.CV) [pdf, other]: Title: iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

Authors: Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto Del Bimbo

Comments: Extended version of the ICCV2023 paper arXiv:2303.15247

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The reliance of supervised methods on labor-intensive manually labeled datasets hinders their broad applicability. In this work, we introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset. We propose an approach named iSEARLE (improved zero-Shot composEd imAge Retrieval with textuaL invErsion) that involves mapping the visual information of the reference image into a pseudo-word token in CLIP token embedding space and combining it with the relative caption. To foster research on ZS-CIR, we present an open-domain benchmarking dataset named CIRCO (Composed Image Retrieval on Common Objects in context), the first CIR dataset where each query is labeled with multiple ground truths and a semantic categorization. The experimental results illustrate that iSEARLE obtains state-of-the-art performance on three different CIR datasets -- FashionIQ, CIRR, and the proposed CIRCO -- and two additional evaluation settings, namely domain conversion and object composition. The dataset, the code, and the model are publicly available at https://github.com/miccunifi/SEARLE.
[21] arXiv:2405.03267 (cross-list from cs.DC) [pdf, other]: Title: Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Authors: Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Information Retrieval (cs.IR)

Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessity storing the datasets and their index on the secondary storage like SSD. In this paper, we are the first to characterize the trade-off of performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by {5.7\,$\times$} and {1.7\,$\times$}, these indexes have to pay a {5.8\,$\times$} storage amplification and {7.7\,$\times$} with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSD mismatches the fine-grained random read required by vector indexes with small amplification.
This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting existing indexes -- primarily designed for SSD -- directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph and cluster index that centers around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification, on a variety of second-tier memory devices.
Based on our improved graph and cluster indexes on second-tier memory, we further conduct a systematic comparison of the two to help developers choose the right index for their workloads. Interestingly, the findings on second-tier memory contradict those on SSDs.
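For readers unfamiliar with the cluster-index family compared here, the following is a minimal in-memory sketch of an IVF-style cluster index: vectors are bucketed by their nearest centroid, and a query scans only the nprobe closest buckets. It is a generic illustration, not the authors' second-tier-memory design. The centroids are assumed to be precomputed (for example by k-means), and names such as build_cluster_index and nprobe are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_cluster_index(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector id to its nearest centroid (one inverted list per cluster)."""
    lists: Dict[int, List[int]] = defaultdict(list)
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists


def search(query: Vector, vectors: List[Vector], centroids: List[Vector],
           lists: Dict[int, List[int]], nprobe: int, k: int) -> List[int]:
    """Scan only the `nprobe` clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in lists.get(c, [])]
    # Exact re-ranking of the candidates gathered from the probed clusters.
    candidates.sort(key=lambda vid: l2(query, vectors[vid]))
    return candidates[:k]


vectors = [(0, 0), (0, 1), (10, 10), (10, 11), (4, 4)]
centroids = [(0, 0), (10, 10)]
lists = build_cluster_index(vectors, centroids)
```

Each probe reads an entire inverted list, a large and mostly contiguous read, whereas a graph index hops between individual neighbors with small random reads. That difference in access pattern is why device granularity can affect the two index families differently.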
[22] arXiv:2405.03359 (cross-list from cs.CL) [pdf, ps, other]: Title: MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline

Authors: Mohamed Yaseen Jabarulla, Steffen Oeltze-Jafra, Philipp Beerbaum, Theodor Uden

Comments: © 2024 IEEE. This work has been accepted for publication and presentation at the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, to be held in Orlando, Florida, USA, July 15-19, 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

This research focuses on evaluating the non-commercial open-source large language models (LLMs) Meditron, MedAlpaca, Mistral, and Llama-2 for their efficacy in interpreting medical guidelines saved in PDF format. As a specific test scenario, we applied these models to the guidelines for hypertension in children and adolescents provided by the European Society of Cardiology (ESC). Leveraging Streamlit, a Python library, we developed a user-friendly medical document chatbot tool (MedDoc-Bot). This tool enables authorized users to upload PDF files and pose questions, generating interpretive responses from four locally stored LLMs. A pediatric expert provides a benchmark for evaluation by formulating questions and responses extracted from the ESC guidelines. The expert rates the model-generated responses based on their fidelity and relevance. Additionally, we evaluated the METEOR and chrF metric scores to assess the similarity of model responses to reference answers. Our study found that Llama-2 and Mistral performed well in metrics evaluation. However, Llama-2 was slower when dealing with text and tabular data. In our human evaluation, we observed that responses created by Mistral, Meditron, and Llama-2 exhibited reasonable fidelity and relevance. This study provides valuable insights into the strengths and limitations of LLMs for future developments in medical document interpretation. Open-Source Code: https://github.com/yaseen28/MedDoc-Bot
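The chrF score mentioned above can be sketched in a few lines. This is a simplified character n-gram F-score following the common definition (n-gram orders 1 to 6, whitespace removed, recall weighted by β = 2). Reference implementations such as sacrebleu handle edge cases differently and may give slightly different values, so this is not the study's evaluation code; METEOR is not sketched here.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-gram counts, with whitespace removed as chrF commonly does."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over n-gram orders, then combine them.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

Because β = 2 weights recall more heavily than precision, a short model answer that omits parts of the reference is penalized more than a longer answer containing some extra text.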

Replacements for Tue, 7 May 24

[23] arXiv:1905.10951 (replaced) [pdf, other]: Title: On the Evaluation Metric for Hashing

Authors: Qing-Yuan Jiang, Ming-Wei Li, Wu-Jun Li

Subjects: Information Retrieval (cs.IR)
[24] arXiv:2208.05716 (replaced) [pdf, other]: Title: Task Aligned Meta-learning based Augmented Graph for Cold-Start Recommendation

Authors: Yuxiang Shi, Yue Ding, Bo Chen, Yuyang Huang, Yule Wang, Ruiming Tang, Dong Wang

Subjects: Information Retrieval (cs.IR)
[25] arXiv:2210.09430 (replaced) [pdf, other]: Title: Evaluating Search System Explainability with Psychometrics and Crowdsourcing

Authors: Catherine Chen, Carsten Eickhoff

Comments: 11 pages, 4 figures, accepted at SIGIR 2024 as full paper

Subjects: Information Retrieval (cs.IR)
[26] arXiv:2305.14685 (replaced) [pdf, other]: Title: Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval

Authors: Shi Yu, Chenghao Fan, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu

Comments: COLING 2024

Subjects: Information Retrieval (cs.IR)
[27] arXiv:2312.02445 (replaced) [pdf, other]: Title: LLaRA: Large Language-Recommendation Assistant

Authors: Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He

Comments: 11 pages, 5 figures

Subjects: Information Retrieval (cs.IR)
[28] arXiv:2403.15246 (replaced) [pdf, other]: Title: FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Authors: Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[29] arXiv:2403.18604 (replaced) [pdf, other]: Title: Modeling Sustainable City Trips: Integrating CO2e Emissions, Popularity, and Seasonality into Tourism Recommender Systems

Authors: Ashmi Banerjee, Tunar Mahmudov, Emil Adler, Fitri Nur Aisyah, Wolfgang Wörndl

Subjects: Information Retrieval (cs.IR)
[30] arXiv:2404.11982 (replaced) [pdf, other]: Title: SIGformer: Sign-aware Graph Transformer for Recommendation

Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

Comments: Accepted by SIGIR2024

Subjects: Information Retrieval (cs.IR)
[31] arXiv:2404.17723 (replaced) [pdf, other]: Title: Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering

Authors: Zhentao Xu, Mark Jerome Cruz, Matthew Guevara, Tie Wang, Manasi Deshpande, Xiaofeng Wang, Zheng Li

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[32] arXiv:2201.02797 (replaced) [pdf, other]: Title: A Unified Review of Deep Learning for Automated Medical Coding

Authors: Shaoxiong Ji, Wei Sun, Xiaobo Li, Hang Dong, Ara Taalas, Yijia Zhang, Honghan Wu, Esa Pitkänen, Pekka Marttinen

Comments: ACM Computing Surveys

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[33] arXiv:2402.17152 (replaced) [pdf, other]: Title: Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Authors: Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

Comments: 26 pages, 13 figures. ICML'24. Code available at this https URL

Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[34] arXiv:2404.14066 (replaced) [pdf, other]: Title: SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

Authors: Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
[35] arXiv:2405.01972 (replaced) [pdf, other]: Title: A quantitative and typological study of Early Slavic participle clauses and their competition

Authors: Nilo Pedrazzini

Comments: 259 pages, 138 figures. DPhil Thesis in Linguistics submitted and defended at the University of Oxford (December 2023). This manuscript is a version formatted for improved readability and broader dissemination

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
