Invited Talks – CCKS2016: China Conference on Knowledge Graph and Semantic Computing

Invited Talk 1: Using Semantic Technology to Tackle Industry’s Data Variety Challenge
Invited Speaker: Prof. Ian Horrocks (Oxford University)

Abstract: Big Data technologies have made significant progress in addressing problems related to the volume and velocity of data, but they are less effective at dealing with data variety and heterogeneity; this so-called “variety challenge” is the main barrier to effective data access in many industry applications. Semantic Technologies offer a potential solution to the variety challenge, and in the Ontology Based Data Access (OBDA) approach they do so in a way that layers on top of existing infrastructure and exploits its scalability. In this talk I will explain the OBDA approach, and show how it is being used to address the variety challenge in two large companies: Siemens and Statoil. I will also highlight some of the problems and limitations of OBDA, discuss how these can be mitigated, and present some recent research that shows how semantic data access can go beyond what is possible with OBDA.

Speaker Bio: Ian Robert Horrocks is a Professor of Computer Science at the University of Oxford in the UK and a Fellow of Oriel College, Oxford. His research focuses on knowledge representation and reasoning, particularly ontology languages, description logic and optimised tableaux decision procedures. Professor Horrocks was jointly responsible for the development of the OIL and DAML+OIL ontology languages, and he played a central role in the development of the Web Ontology Language OWL. These languages and associated tools have been used by the Open Biomedical Ontologies Consortium, the National Cancer Institute in the USA, the United Nations Food and Agriculture Organization, the World Wide Web Consortium and a whole range of major corporations and government agencies. Horrocks was elected a Fellow of the Royal Society in 2011 and received the BCS Roger Needham Award in 2005.

Invited Talk 2: What Computers Should Know
Invited Speaker: Prof. Gerhard Weikum, Max-Planck-Institut für Informatik (Max Planck Institute for Informatics)

Abstract: Equipping machines with comprehensive knowledge of the world’s entities and their relationships has been a long-standing vision and challenge of AI. In the last decade, huge knowledge bases (a.k.a. knowledge graphs) have been automatically constructed from web data and text sources, and have become a key asset for search, analytics, recommendations and data integration. This digital knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, contributing to natural language processing and data analytics. This talk reviews these advances, discusses recent directions such as acquiring commonsense knowledge, and identifies new opportunities and future challenges.

Speaker Bio: Gerhard Weikum is one of the key creators of the YAGO knowledge base. He is a Research Director at the Max Planck Institute for Informatics (MPII) in Saarbruecken, Germany, where he leads the department of databases and information systems. He is also a principal investigator of the Cluster of Excellence on Multimodal Computing and Interaction. Earlier he held positions at Saarland University in Saarbruecken, Germany, at ETH Zurich, Switzerland, and at MCC in Austin, Texas, and he was a visiting senior researcher at Microsoft Research in Redmond, Washington. His research spans transactional and distributed systems, self-tuning database systems, data and text integration, and the automatic construction of knowledge bases. He co-authored a comprehensive textbook on transactional systems and received the VLDB 10-Year Award for his work on automatic database tuning. Weikum is an ACM Fellow and a member of several academies. He has served on various editorial boards and as PC chair of conferences such as SIGMOD, ICDE and CIDR. He received a Google Focused Research Award in 2010, the ACM SIGMOD Contributions Award in 2011, an ERC Synergy Grant in 2013, and the ACM SIGMOD Edgar F. Codd Innovations Award in 2016.

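Invited Talk 3: A Big-Data-Based Human-Like Intelligent Question-Answering System for Basic Education: Overall Vision, Difficulties and Challenges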
Invited Speaker: Prof. Heyan Huang, Beijing Institute of Technology

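Abstract: In the big data era, making full use of massive knowledge resources and the capacity to process them is of great strategic importance for raising the level of intelligence in national informatization. At the same time, with the rapid growth of online information, the Internet is evolving from an information network into a knowledge network. Massive knowledge acquisition and intelligent knowledge question answering are therefore expected to bring a qualitative leap in intelligent information services. They have become both a hot topic and a hard problem in big data research. Human-like question answering, as an effective way to validate intelligent question answering, is attracting increasing attention from academia and industry. The project “Key Technologies and Systems for Big-Data-Based Human-Like Intelligence” is funded by the National Key Research and Development Program of China. It studies key technologies of human-like intelligence: massive knowledge acquisition and deep learning, content understanding and reasoning, question analysis and solving, and interactive question answering. The project builds massive knowledge resources and knowledge graphs for basic education in a big data environment. It also develops an intelligent question-answering system with human-like knowledge-processing capabilities, drawing on several core AI technologies such as natural language processing, information retrieval, machine learning and knowledge engineering. This talk will present the overall vision and design of the system and its research progress. It will focus on the difficulties and challenges the system faces and outline future research priorities and directions.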

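Speaker Bio: Prof. Heyan Huang received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 1989. She remained at the institute, serving successively as assistant research fellow, associate research fellow and research fellow. From 1997 to 2009 she was deputy director and research fellow at the Computer Language Information Engineering Research Center of the Chinese Academy of Sciences. Since 2009 she has been Dean and Professor of the School of Computer Science at Beijing Institute of Technology. She is also Director of the Beijing Engineering Research Center of Massive Language Information Processing and Cloud Computing Application. She currently serves as a member of a theme expert group of the National High-Tech R&D Program (“863 Program”). She is vice president of both the Chinese Association for Artificial Intelligence and the Chinese Information Processing Society of China. She is also a member of the Ministry of Education’s Computer Science Teaching Steering Committee and of the Beijing Academic Degrees Committee. She has led more than 20 national research projects, including key projects of the National Natural Science Foundation of China, “973 Program” projects and “863 Program” projects. She has received 8 national and provincial/ministerial awards, including a First Prize of the National Science and Technology Progress Award. She has received the State Council Special Government Allowance since 1995, and was named a National Outstanding Science and Technology Worker in 2014.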

Invited Talk 4: Short Text Understanding
Invited Speaker: Dr. Haixun Wang, Facebook

Abstract: Billions of short texts are produced every day, in the form of search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Unlike documents, short texts have some unique characteristics that make them difficult to handle. First, short texts, especially search queries, do not always observe the syntax of a written language. This means traditional NLP techniques, such as syntactic parsing, do not always apply to short texts. Second, short texts contain limited context: the majority of search queries contain fewer than five words, and tweets can have no more than 140 characters. For these reasons, short texts give rise to a significant amount of ambiguity, which makes them extremely hard to interpret.

On the other hand, many applications rely on short text understanding, including search engines, online advertising, automatic question answering and recommendation systems. In this talk, I will go over various techniques in knowledge acquisition, representation, and inference that have been proposed for text understanding. I will also describe the massive structured and semi-structured data made available in the recent decade that directly or indirectly encode human knowledge. This data turns knowledge representation problems into a computational grand challenge with feasible solutions in sight.

Speaker Bio: Haixun Wang is a research scientist / engineering manager at Facebook. Before Facebook, he was with Google Research, working on natural language processing. He led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009. He was Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received the Ph.D. degree in computer science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He served as PC Chair of conferences such as CIKM’12. He is on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST). He won the Best Paper Award at ICDE 2015, the 10-Year Best Paper Award at ICDM 2013, and the Best Paper Award at ER 2009.