Talk 1: Mining Structured Knowledge from Massive Unstructured Text
Jiawei Han (University of Illinois at Urbana-Champaign)
Bio: Jiawei Han is Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and Japan’s Funai Achievement Award (2018). He is a Fellow of ACM and a Fellow of IEEE. He served as Director of the Information Network Academic Research Center (INARC) (2009-2016), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab, and as co-Director of KnowEnG, a Center of Excellence in Big Data Computing (2014-2019), funded by the NIH Big Data to Knowledge (BD2K) Initiative.
Abstract: Real-world big data is largely dynamic, interconnected, and unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, but these approaches do not scale. We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with pretrained language models and text embedding methods, it is promising to transform unstructured data into structured knowledge. In this talk, we introduce a set of methods recently developed in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach could be promising for transforming massive text data into structured knowledge.
Talk 2: Progress and Challenges in Complex Reasoning
Ming Zhou (Sinovation Ventures)
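Bio: Ming Zhou is Chief Scientist at Sinovation Ventures, founder and CEO of Langboat Technology, Vice President of the China Computer Federation (CCF), and an executive council member of the Chinese Information Processing Society of China. He previously served as Assistant Managing Director of Microsoft Research Asia and as President of the Association for Computational Linguistics (ACL). He is also a doctoral advisor at Harbin Institute of Technology, Tianjin University, Nankai University, Beihang University, the University of Science and Technology of China, and other universities. He received the Capital Labor Medal in 2018.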
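Abstract: In recent years, research on complex reasoning has attracted growing attention. Complex reasoning requires understanding the relevant information and applying complex rules to reach correct inferences. As a key capability in human intelligent decision-making, it plays a role in many complex real-world scenarios, such as math word problems, debate and negotiation, and medical diagnosis. We explore the current progress and challenges of complex reasoning research through three tasks from the Law School Admission Test (LSAT) in the United States: analytical reasoning, logical reasoning, and reading comprehension. We compare the strengths and weaknesses of symbolic, neural, and neuro-symbolic methods, and propose a model for each of the three tasks to probe complex reasoning ability, especially the highly challenging logical reasoning. We experimented with several approaches, including combining large-scale pretrained models with task-specific reasoning modules, and combining symbolic knowledge with discrete reasoning steps. We have seen interesting progress and have also run into major difficulties. This talk presents our progress in these areas, analyzes the open challenges in complex reasoning, and explores possible directions for future research.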
Talk 3: Graph Neural Networks (GNNs) and Self-Supervised Learning
Jie Tang (Department of Computer Science, Tsinghua University)
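Bio: Jie Tang is a Professor and Associate Chair of the Department of Computer Science at Tsinghua University and a recipient of the National Science Fund for Distinguished Young Scholars. His research covers artificial intelligence, cognitive graphs, data mining, social networks, and machine learning. He has published more than 300 papers, which have been cited more than 18,000 times, and received the ACM SIGKDD Test-of-Time Award (best paper of the decade). He led the development of AMiner, a system for mining researcher social networks. He serves as Editor-in-Chief of IEEE T. on Big Data and AI OPEN, and as PC Chair of WWW’21, CIKM’16, and WSDM’15. His honors include the First Prize of the Beijing Science and Technology Progress Award, a First Prize from the Association for Artificial Intelligence, and the KDD Outstanding Contribution Award.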
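Abstract: Graph neural networks extend deep learning methods to non-Euclidean graph data and have greatly improved accuracy in graph data applications. In this talk, I will briefly review graph neural networks (GNNs) and discuss how to improve their representation learning ability on graph data while effectively avoiding the over-smoothing, over-fitting, and poor robustness of conventional GNNs. I will also discuss the importance of negative sampling in GNN representation learning. Next, I will introduce some of our recent work on self-supervised learning for GNNs. Finally, I will briefly describe how graph neural networks can be applied to decision-making.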