Large language models (LLMs) and Retrieval-Augmented Generation (RAG) are increasingly integrated into software systems to realize intelligent features. However, this integration poses significant challenges due to undefined interface specifications, diverse software context requirements, and complex system management.

In this talk, we first present a comprehensive empirical study on the correctness of LLM integration. By analyzing 100 open-source LLM-enabled applications, we identified 18 distinct defect patterns located across the LLM agent, vector database, software components, and system management. Our study reveals that integration defects are widespread: 77% of these applications contain more than three types of defects that degrade functionality, efficiency, and security. To facilitate future research, we constructed Hydrangea, a defect library containing 546 identified defects.

Guided by the findings of this empirical study, we then introduce Comfrey, a runtime framework designed to prevent integration failures in LLM-enabled software. Serving as a middle layer between the AI and software components, Comfrey automatically detects and resolves potential integration failures through a three-stage workflow targeting format, syntax, and repetition errors. Our evaluation demonstrates that Comfrey detects 75.1% and prevents 63.3% of potential integration failures with only 8.4% overhead, significantly outperforming existing baselines.
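As a rough illustration of the format-level integration defects discussed above (this sketch is not taken from the talk materials, and all names in it are hypothetical): an LLM may return prose or markdown-fenced text where a downstream software component expects bare JSON, so a small guard layer validates and extracts the structured payload before it crosses the integration boundary.

```python
import json
import re


def guard_llm_output(raw_response: str, required_keys: set[str]) -> dict:
    """Hypothetical guard for one common integration defect: the LLM wraps
    JSON in markdown fences or prepends prose, breaking downstream parsing."""
    # Extract the first JSON-looking object from the raw model output.
    match = re.search(r"\{.*\}", raw_response, re.DOTALL)
    if match is None:
        raise ValueError("LLM response contains no JSON object")

    payload = json.loads(match.group(0))

    # Fail fast if fields the software component relies on are missing.
    missing = required_keys - payload.keys()
    if missing:
        raise ValueError(f"LLM response missing required fields: {missing}")
    return payload


if __name__ == "__main__":
    # A typical defective response: explanatory prose plus fenced JSON.
    raw = ('Sure! Here is the result:\n'
           '```json\n{"intent": "search", "query": "flights to Tokyo"}\n```')
    print(guard_llm_output(raw, {"intent", "query"}))
```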
Speaker Info:
Yuchen Shao is a third-year Ph.D. student at the Software Engineering Institute, East China Normal University (ECNU), and the Shanghai Innovation Institute, co-advised by Prof. Chengcheng Wan and Prof. Ting Su. Her research interests lie in SE/Sys for AI and software testing. Her recent work centers on the correctness and reliability of Large Language Model (LLM) integration in software systems, including analyzing integration defect patterns and mitigating runtime failures in LLM-enabled software.