Weekly Talk

Static bug detection in the era of LLMs

The rapid advancement of Large Language Models (LLMs) has opened new opportunities for static bug and vulnerability detection, offering complementary insights to traditional static analysis. In this talk, I will present our recent work on LLM-based …

Why the Proof Fails in Different Versions of Theorem Provers: An Empirical Study of Compatibility Issues in Isabelle

Proof assistants are software tools for formal modeling and verification of software, hardware, design, and mathematical proofs. Due to the growing complexity and scale of formal proofs, compatibility issues frequently arise when using different …

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended …

CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models

Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic …

Towards Dialect-Agnostic Query Parsing and Rewriting

SQL query rewriting is a core technique used in applications such as optimization, dialect translation, and testing, yet most existing tools remain restricted by dialect-specific parsers. We present SQLFlex, a novel framework that combines …

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations

Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction for its ability to solve complex environments like …

Quantifying Bug Reporting Agreement -- Towards More Verifiable and Valid Software Testing Research

Many software testing papers claim to discover bugs without providing sufficient evidence. My analysis of 39 papers from top conferences revealed that only 51% provide complete bug identifiers and 28% have inaccessible artifacts. Their …

Siloso: Finding Logic Bugs in RDBMS via Dialect-Adaptable Reference Engine Construction

Relational DBMSs (RDBMSs) are ubiquitous, so any bugs or inconsistencies within RDBMSs are highly consequential. In particular, logic bugs, which can cause a query evaluation to return an incorrect result, are critical because they are …

Paper Share - DOVE: Diagnosis-driven SLO Violation Detection

Service-level objectives (SLOs), typically network performance requirements for delay and packet loss, should be guaranteed for a growing number of high-performance applications, e.g., telesurgery and cloud gaming. However, SLO violations are common and …

NeurBench: Benchmarking Learned Database Components with Data and Workload Drift Modeling

Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Given the dynamism of databases, where data and workloads continuously drift, it is crucial for learned database …