Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


The conference program, with its three parallel tracks - the Research Track, the Applied Data Science Track and the Applied Invited Speakers Track - brings the two groups together.




This year's conference continues its tradition of a strong tutorial and workshop program on leading-edge issues in data mining during the first two days of the program. The last three days are devoted to contributed technical papers describing both novel, important research contributions and deployed, innovative solutions.




Paper list:

Three keynote talks, by Cynthia Dwork, Bin Yu, and Renée J. Miller, touch on some of the hard, emerging issues before the field of data mining.

1. Three keynote talks on hard, emerging issues facing the field of data mining

(1) What’s Fair? - Cynthia Dwork (Microsoft Research & Harvard University)

(2) The Future of Data Integration - Renée J. Miller (University of Toronto)

(3) Three Principles of Data Science: Predictability, Stability and Computability - Bin Yu (University of California, Berkeley)

2. 12 Applied Invited Talks

(1) Foreword to the Applied Data Science – Invited Talks Track at KDD-2017

(2) More than the Sum of its Parts: Building Domino Data Lab - Eduardo Ariño de la Rubia (Domino Data Lab)

(3) Mining Big Data in Neuro Genetics to Understand Muscular Dystrophy - Andy Berglund (University of Florida)

(4) Industrial Machine Learning - Josh Bloom (GE)

(5) Behavior Informatics to Discover Behavior Insight for Active and Tailored Client Management - Longbing Cao (University of Technology Sydney)

(6) It Takes More than Math and Engineering to Hit the Bullseye with Data - Paritosh Desai (Target)

(7) Planning and Learning under Uncertainty: Theory and Practice - Jonathan P. How (Massachusetts Institute of Technology)

(8) Big Data in Climate: Opportunities and Challenges for Machine Learning - Anuj Karpatne, Vipin Kumar (University of Minnesota)

(9) Addressing Challenges with Big Data for Media Measurement - Mainak Mazumdar (Nielsen)

(10) Machine Learning Software in Practice: Quo Vadis? - Szilárd Pafka (Epoch)

(11) Designing AI at Scale to Power Everyday Life - Rajesh Parekh (Facebook)

(12) Spaceborne Data Enters the Mainstream - David Potere (Tellus Laboratories)

3. KDD 2017 Panels (AI-related)

(1) Benchmarks and Process Management in Data Science: Will We Ever Get Over the Mess? - Usama M. Fayyad (Open Insights), Arno Candel (, Inc.), Eduardo Ariño de la Rubia (Domino Data Lab), Szilárd Pafka (Epoch), Anthony Chong (IKASI), Jeong-Yoon Lee (Microsoft)

(2) The Future of Artificially Intelligent Assistants - Muthu Muthukrishnan (Rutgers University), Andrew Tomkins, Larry Heck (Google), Alborz Geramifard (Amazon), Deepak Agarwal (LinkedIn)

4. KDD 2017 Research Papers (Oral Papers)

(1) Learning Certifiably Optimal Rule Lists - Elaine Angelino (University of California, Berkeley), Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer (Harvard University), Cynthia Rudin (Duke University)

(2) Improved Degree Bounds and Full Spectrum Power Laws in Preferential Attachment Networks - Chen Avin, Zvi Lotker (Ben Gurion University of the Negev), Yinon Nahum, David Peleg (Weizmann Institute of Science)

(3) Unsupervised Network Discovery for Brain Imaging Data - Zilong Bai (University of California, Davis), Peter Walker, Anna Tschiffely (Naval Medical Research Center), Fei Wang (Cornell University), Ian Davidson (University of California, Davis)

(4) Patient Subtyping via Time-Aware LSTM Networks - Inci M. Baytas (Michigan State University), Cao Xiao (IBM T. J. Watson Research Center), Xi Zhang, Fei Wang (Cornell University), Anil K. Jain, Jiayu Zhou (Michigan State University)

(5) Robust Top-k Multiclass SVM for Visual Category Recognition - Xiaojun Chang (Carnegie Mellon University), Yao-Liang Yu (University of Waterloo), Yi Yang (University of Technology Sydney)

(6) KATE: K-Competitive Autoencoder for Text - Yu Chen, Mohammed J. Zaki (Rensselaer Polytechnic Institute)

(7) A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection - Reuven Cohen, Liran Katzir, Aviv Yehezkel (Technion)

(8) HyperLogLog Hyperextended: Sketches for Concave Sublinear Frequency Statistics - Edith Cohen (Google Research)

(9) Fast Enumeration of Large k-Plexes - Alessio Conte (University of Pisa), Donatella Firmani (Roma Tre University), Caterina Mordente (Be Think Solve Execute), Maurizio Patrignani, Riccardo Torlone (Roma Tre University)

(10) Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery

(11) metapath2vec: Scalable Representation Learning for Heterogeneous Networks

(12) Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters

(13) Contextual Motifs: Increasing the Utility of Motifs using Contextual Data

(14) Unsupervised P2P Rental Recommendations via Integer Programming

(15) The Co-Evolution Model for Social Network Evolving and Opinion Migration

(16) Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping

(17) Clustering Individual Transactional Data for Masses of Users

(18) Network Inference via the Time-Varying Graphical Lasso

(19) Efficient Correlated Topic Modeling with Topic Embedding

(20) Accelerating Innovation Through Analogy Mining

(21) Communication-Efficient Distributed Block Minimization for Nonlinear Kernel Machines

(22) A Hierarchical Algorithm for Extreme Clustering

(23) Estimating Treatment Effect in the Wild via Differentiated Confounder Balancing

(24) The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables

(25) Constructivism Learning: A Learning Paradigm for Transparent Predictive Analytics

(26) Is the Whole Greater Than the Sum of Its Parts?

(27) Collaborative Variational Autoencoder for Recommender Systems

(28) Linearized GMM Kernels and Normalized Random Fourier Features

(29) Discrete Content-aware Matrix Factorization

(30) Effective and Real-time In-App Activity Analysis in Encrypted Internet Traffic Streams

(31) Functional Annotation of Human Protein Coding Isoforms via Non-convex Multi-Instance Learning

(32) Discovering Reliable Approximate Functional Dependencies

(33) Towards an Optimal Subspace for K-Means

(34) SPARTan: Scalable PARAFAC2 for Large & Sparse Data

(35) struc2vec: Learning Node Representations from Structural Identity

(36) Similarity Forests

(37) Structural Deep Brain Network Mining

(38) On Finding Socially Tenuous Groups for Online Social Networks

(39) A Local Algorithm for Structure-Preserving Graph Cut

