Weekly Talk

NeurBench: Benchmarking Learned Database Components with Data and Workload Drift Modeling

Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Given the dynamism of databases, where data and workloads continuously drift, it is crucial for learned database …

Translating C To Rust: Lessons from a User Study

Rust aims to offer full memory safety for programs, a guarantee that untamed C programs do not enjoy. How difficult is it to translate existing C code to Rust? To get a complementary view from that of automatic C to Rust translators, we report on a …

Automatic Differential Testing of the PHP Interpreter

The PHP interpreter, powering over 70% of websites on the internet, plays a crucial role in web development. Existing approaches to finding bugs in PHP primarily focus on detecting explicit security issues through crashes or sanitizer-based oracles, …

Fuzzing the PHP Interpreter via Dataflow Fusion

PHP, a dominant scripting language in web development, powers a vast range of websites, from personal blogs to major platforms. While existing research primarily focuses on PHP application-level security issues like code injection, memory errors …

A Benchmark Harness for Query Execution Correctness Verification and Query Optimizer Evaluation of Database Systems

Query engines, including query optimizers and query executors, are the cornerstone of any relational database. It is imperative for database developers to be equipped with a tool to detect query execution bugs and evaluate the query optimizer …

Efficient and Scalable Distributed LLM Training: Hiding Communication Overhead

Training Large Language Models (LLMs) is often inefficient due to high communication overhead, resulting in sub-50% Model FLOPS Utilization (MFU). In this talk, I will discuss how to build a cost-efficient and scalable machine learning system, using …

Type Systems for Query Languages

In this talk, I will introduce type systems for query languages, with a focus on SQL and GQL. Practical SQL engines exhibit subtle differences in their handling of typing constraints and implicit type casts, often overlooked in formal accounts of …

SGL: Deriving Test Case Generators using Domain-Specific Language to Test Database Engines

Various automated testing approaches have been proposed for Database Management Systems (DBMS), which can automatically detect different kinds of bugs such as logic and performance bugs. Such approaches typically compare the results of executing two …

Automated test case reduction in query specific language(s)

Database testing tools like SQLsmith and SQLancer generate lengthy test cases to identify several categories of database bugs. While these tools are effective in identifying issues, the resulting test cases are usually large and complex, making it difficult …

Improving the Extensibility of SQLancer

SQLancer, an open-source tool for testing database management systems (DBMS), is instrumental in uncovering bugs within real-world applications. However, maintaining SQLancer has become increasingly challenging due to tightly coupled components, …