ACME: Automated Clause Mapping Engine for Testing Emerging Database Systems

Abstract

A growing number of emerging database management systems, such as time-series and streaming databases, have been developed to support specialized workloads with enhanced performance and functionality. However, these systems are often less mature than traditional relational databases, making them more prone to logic bugs and internal errors that affect correctness and reliability. To address this, we propose an enhanced differential testing framework designed for emerging SQL-like databases. Our key insight is that many of these systems are conceptually extensions of relational databases, allowing us to uncover bugs by comparing query results with those from more robust relational systems. To bridge the differences in syntax and semantics between emerging and relational databases, we leverage Large Language Models (LLMs) to automate the discovery of supported clauses and generate clause mappings that translate system-specific features into equivalent expressions in SQL. Our approach proceeds in three steps: (i) collecting and analyzing the syntax of clauses supported by both the emerging database system and a relational reference system, (ii) constructing clause mappings via LLMs, validating them through testing queries, and formalizing them into Abstract Syntax Tree (AST) transformations or mapping functions, and (iii) generating semantically equivalent but syntactically varied queries to expand the scope of differential testing. To ensure the reliability of LLM-generated clause mappings, we introduce a testing query mechanism that re-prompts incorrect mappings after runtime verification. We implemented this approach in a tool called ACME and applied it to four widely used emerging database systems, uncovering 57 previously unknown bugs, including 17 logic bugs and 40 internal errors. Of these, 50 have been fixed and 5 confirmed by vendors. Our results demonstrate the practicality and effectiveness of ACME in improving the reliability of emerging database systems through scalable, LLM-assisted differential testing.
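
The core idea of clause mapping plus differential testing can be illustrated with a small sketch. This is not ACME's implementation: the `LAST n` clause, the toy in-memory engine standing in for an emerging database, the `readings` table, and all function names are hypothetical, and the mapping is written by hand rather than generated by an LLM. SQLite plays the role of the relational reference system.

```python
import sqlite3

# Hypothetical sample data: (timestamp, value) pairs.
ROWS = [(1, 20.5), (2, 21.0), (3, 19.8), (4, 22.1), (5, 20.0)]


def toy_emerging_engine(rows, min_value, last_n, buggy=False):
    """Stand-in for an emerging DBMS running a hypothetical query:
    keep rows with value >= min_value, then return the LAST n by timestamp."""
    kept = sorted(r for r in rows if r[1] >= min_value)
    if buggy:
        # Injected off-by-one logic bug, only to show that a mismatch is detected.
        return kept[-(last_n - 1):] if last_n > 1 else []
    return kept[-last_n:] if last_n > 0 else []


def map_to_sql(min_value, last_n):
    """Hand-written clause mapping (assumption): the hypothetical 'LAST n'
    becomes ORDER BY ts DESC LIMIT n, re-sorted into ascending order."""
    sql = (
        "SELECT ts, value FROM ("
        " SELECT ts, value FROM readings"
        " WHERE value >= ? ORDER BY ts DESC LIMIT ?"
        ") ORDER BY ts ASC"
    )
    return sql, (min_value, last_n)


def reference_result(rows, min_value, last_n):
    """Run the mapped query on SQLite, the relational reference system here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    sql, params = map_to_sql(min_value, last_n)
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


def differential_test(rows, min_value, last_n, buggy=False):
    """Return True when the toy engine and the reference system agree."""
    actual = toy_emerging_engine(rows, min_value, last_n, buggy)
    expected = reference_result(rows, min_value, last_n)
    return actual == expected
```

Calling `differential_test(ROWS, 20.0, 2)` compares the two result sets for one query; passing `buggy=True` switches on the injected defect so that the comparison fails. In a real setting the disagreement would need triage, since it could stem from an incorrect mapping rather than from a bug in the system under test, which is the problem the abstract's testing-query validation step addresses.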

Publication
The ACM International Conference on the Foundations of Software Engineering (FSE)
Yuancheng Jiang
Student Collaborators

Yuancheng Jiang is a Ph.D. student at the National University of Singapore.

Manuel Rigger
Faculty

Manuel is an Assistant Professor and leads the group.