Database management systems (DBMSs) are notoriously hard to test because testing requires a test oracle: a way to decide whether a query's output is correct. Prior work designs these oracles by hand, which turns oracle design into a never-ending cycle of manual effort. Argus breaks this cycle in three steps: it uses LLMs to discover test oracles automatically, formally verifies each one with a SQL equivalence prover to ensure soundness, and efficiently instantiates the verified oracles into thousands of concrete test cases. Evaluated on five heavily tested DBMSs, Argus found 41 previously unknown bugs, 36 of them logic bugs, outperforming state-of-the-art manually designed oracles. In practice, about $10 of LLM calls yields millions of reliable SQL tests, each capable of catching logic bugs, in which a query silently returns wrong results instead of raising an error.
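To make the idea of a test oracle concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not Argus's implementation. It assumes an oracle can be written as a pair of query templates that should return the same rows for any predicate, and it uses ternary logic partitioning, a hand-written oracle from prior work, as the example. The table `t`, the predicate list, and the helper names `make_db`, `run`, and `check_oracle` are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical oracle: two query templates expected to be equivalent for any
# predicate substituted for {p}. Every row makes p TRUE, FALSE, or NULL, so
# the three partitions together should reproduce the whole table.
ORACLE = (
    "SELECT a, b FROM t",
    "SELECT a, b FROM t WHERE {p} "
    "UNION ALL SELECT a, b FROM t WHERE NOT ({p}) "
    "UNION ALL SELECT a, b FROM t WHERE ({p}) IS NULL",
)

# Concrete predicates used to instantiate the oracle (illustrative only).
PREDICATES = ["a > 1", "b = 'x'", "a IS NULL", "a + 1 < 3 OR b <> 'y'"]


def make_db():
    """Create a small in-memory database that includes NULL values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1, "x"), (2, "y"), (None, "x"), (3, None)],
    )
    return conn


def run(conn, sql):
    """Return the result as a multiset so that row order does not matter."""
    return Counter(conn.execute(sql).fetchall())


def check_oracle(conn, oracle, predicates):
    """Instantiate the oracle per predicate; return predicates that disagree."""
    lhs_template, rhs_template = oracle
    failures = []
    for p in predicates:
        lhs = lhs_template.format(p=p)
        rhs = rhs_template.format(p=p)
        if run(conn, lhs) != run(conn, rhs):
            failures.append(p)  # the two sides differ: a candidate logic bug
    return failures


conn = make_db()
failures = check_oracle(conn, ORACLE, PREDICATES)
print(failures)
```

A non-empty `failures` list would mean the two sides of the oracle disagreed on the same data, which is exactly the silent wrong-result symptom of a logic bug. The sketch also shows why soundness matters: if the two templates were not truly equivalent, every mismatch would be a false alarm, which is the problem the equivalence-proving step described above is meant to rule out.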
For more information about the paper, see Link.