Evaluation Tasks – CCKS-IJCKG 2024

Task 1: Archer: Bilingual Text-to-SQL evaluation [Task Guideline Download]

Introduction:

Interacting with databases in natural language is friendlier and more intuitive than writing queries by hand, but it is challenging: the task is to translate natural language questions into executable SQL statements. Recent work performs well on existing datasets but handles complex reasoning poorly, in particular mathematical, commonsense, and hypothetical reasoning. To this end, we propose Archer, a dataset that incorporates these three types of reasoning to pose more complex and subtle queries. We evaluated both large language models and fine-tuned models on it. Even methods that achieve state-of-the-art (SOTA) results on existing datasets reach only 6.73% execution accuracy on Archer, indicating that it remains challenging for current models and techniques.
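Figures such as the 6.73% above are usually obtained by running each predicted query and its reference query against the same database and comparing the results. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not the official Archer scorer: the function names `execution_match` and `execution_accuracy`, and the choice to compare results as order-insensitive multisets of rows, are assumptions made for illustration. The task guideline defines the actual metric.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if the predicted query runs and yields the same rows as the gold query.

    Rows are compared as a multiset, so row order is ignored (an assumption;
    an official scorer may treat ORDER BY queries differently).
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        # A prediction that cannot be executed counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    matches = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return matches / len(pairs)
```

A prediction that fails to execute is scored as wrong rather than skipped, so the score reflects both the validity and the correctness of the generated SQL.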

Archer covers three types of reasoning: mathematical, commonsense, and hypothetical. Mathematical reasoning accounts for a significant share of practical SQL use cases. Commonsense reasoning is the ability to reason from implicit commonsense knowledge; Archer contains questions that require understanding the database in order to infer missing details. Hypothetical reasoning requires counterfactual thinking, that is, the ability to imagine and reason about unseen situations based on visible facts and counterfactual assumptions.
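To make the three reasoning types concrete, here is a small sketch on a toy SQLite database. Nothing in it comes from the Archer dataset: the `employee` table, its rows, and the three questions are invented for illustration only.

```python
import sqlite3

# Hypothetical toy database; not taken from Archer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, department TEXT, monthly_salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Sales", 4000.0), ("Bob", "Sales", 6000.0), ("Cid", "R&D", 5000.0)],
)

queries = {
    # Mathematical: "How much higher is the highest monthly salary than the lowest?"
    "mathematical": "SELECT MAX(monthly_salary) - MIN(monthly_salary) FROM employee",
    # Commonsense: "What is the total annual payroll?"
    # The schema stores monthly figures; that a year has 12 months is implicit knowledge.
    "commonsense": "SELECT SUM(monthly_salary) * 12 FROM employee",
    # Hypothetical: "If Sales salaries were doubled, what would the total monthly payroll be?"
    # The counterfactual is applied inside the query; stored data is left unchanged.
    "hypothetical": (
        "SELECT SUM(CASE WHEN department = 'Sales' "
        "THEN monthly_salary * 2 ELSE monthly_salary END) FROM employee"
    ),
}

# Each query returns a single value; collect it per reasoning type.
results = {kind: conn.execute(sql).fetchone()[0] for kind, sql in queries.items()}
print(results)
```

In the hypothetical case the counterfactual assumption is expressed with a CASE expression rather than an UPDATE, so the query answers "what would happen if" without modifying the database.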

Task organizers:

Jeff Pan, University of Edinburgh (j.z.pan@ed.ac.uk)
Zhichao Yan, Shanxi University (zhichaoyan@foxmail.com)
Wenyu Huang, University of Edinburgh

Academic Guidance Group:

Jeff Pan, University of Edinburgh
Ru Li, Shanxi University
Mirella Lapata, University of Edinburgh

Contacts:

Zhichao Yan (zhichaoyan@foxmail.com)