KDD 2011 Tutorial on Scaling Up Machine Learning

Scaling up machine learning is arguably more pressing in practical applications than in research. Real-world applications involve enormous amounts of data, which creates serious efficiency problems. How to solve a problem within an acceptable amount of time, while making the best possible use of the resources at hand, is an urgent question.

Here is a tutorial from KDD 2011; take a look.

Scaling Up Machine Learning, the Tutorial, KDD 2011

Ron Bekkerman, Misha Bilenko, and John Langford

Part I slides (Powerpoint) Introduction

Part II.a slides (Powerpoint) Use of Trees

Part II.b slides (Powerpoint) Graphical models

Part III slides (Summary + GPU learning + Terascale linear learning)

This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms. Demand for scaling up machine learning is task-specific: for some tasks it is driven by the enormous dataset sizes, for others by model complexity or by the requirement for real-time prediction. Selecting a task-appropriate parallelization platform and algorithm requires understanding their benefits, trade-offs and constraints. This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., speech recognition and object recognition in vision).
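As a rough illustration of the data-parallel strategies the abstract refers to, here is a minimal parameter-averaging sketch (not taken from the tutorial; the shard count, learning rate, and toy dataset are illustrative assumptions): train an independent model on each data shard, then average the learned parameters, mirroring a map step followed by a reduce step.

```python
# Sketch of data-parallel learning via parameter averaging.
# Illustrative only: shard count, learning rate, epoch count, and the
# toy dataset are assumptions, not material from the KDD tutorial.

def sgd_on_shard(shard, lr=0.1, epochs=50):
    """Fit y ~ w*x on one shard with plain SGD; return the local weight."""
    w = 0.0
    for _ in range(epochs):
        for x, y in shard:
            w -= lr * (w * x - y) * x  # gradient of 0.5 * (w*x - y)^2
    return w

def parallel_fit(data, num_shards=4):
    """Split the data, train independently per shard, average the weights."""
    shards = [data[i::num_shards] for i in range(num_shards)]
    local = [sgd_on_shard(s) for s in shards]  # "map": run in parallel in practice
    return sum(local) / len(local)             # "reduce": combine local models

# Toy data generated from y = 2x, so the averaged weight should approach 2.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)]
w = parallel_fit(data)
```

In a real deployment the per-shard training would run on separate workers (e.g., mappers in a MapReduce job), with only the small parameter vectors shipped to the reducer for averaging.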

The tutorial is based on (but not limited to) the material from our upcoming Cambridge U. Press edited book, which is currently in production and will be available in December 2011.

Presenters

Ron Bekkerman is a senior research scientist at LinkedIn where he develops machine learning and data mining algorithms to enhance LinkedIn products. Prior to LinkedIn, he was a researcher at HP Labs. Ron completed his PhD in Computer Science at the University of Massachusetts Amherst in 2007. He holds BSc and MSc degrees from the Technion---Israel Institute of Technology. Ron has published on various aspects of clustering, including multimodal clustering, semi-supervised clustering, interactive clustering, consensus clustering, one-class clustering, and clustering parallelization.

Misha Bilenko is a researcher in Machine Learning and Intelligence group at Microsoft Research, which he joined in 2006 after receiving his PhD from the University of Texas at Austin. His current research interests include large-scale machine learning methods, adaptive similarity functions and personalized advertising.

John Langford is a senior researcher at Yahoo! Research. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his PhD from Carnegie Mellon University in 2002. Previously, he was affiliated with the Toyota Technological Institute and IBM's Watson Research Center. He is the author of the popular Machine Learning weblog, hunch.net. John's research focuses on the fundamentals of learning, including sample complexity, learning reductions, active learning, learning with exploration, and the limits of efficient optimization.

=========

There is also Blei's tutorial on LDA and related topics:

Many of you have asked for the slides from the tutorial at KDD. I
posted them on this page:

http://www.cs.princeton.edu/~blei/topicmodeling.html

the link to the PDF of the slides is

http://www.cs.princeton.edu/~blei/kdd-tutorial.pdf

If any of you have comments or suggestions, please email me. I hope
to use these slides again.
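For readers who want to see the model behind those slides in code, below is a toy collapsed Gibbs sampler for LDA. This is purely an illustration and is not taken from Blei's tutorial: the hyperparameters (alpha, beta), iteration count, and four-document corpus are all illustrative assumptions, and Gibbs sampling is just one of several inference methods for LDA.

```python
import random

def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized docs.

    Returns per-token topic assignments and the sorted vocabulary.
    All hyperparameter defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    nkw = [[0] * V for _ in range(K)]        # topic-word counts
    ndk = [[0] * K for _ in range(len(docs))]  # doc-topic counts
    nk = [0] * K                             # tokens per topic
    z = []                                   # topic assignment of every token
    for d, doc in enumerate(docs):           # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            nkw[k][wid[w]] += 1; ndk[d][k] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment, resample, add back.
                nkw[k][wid[w]] -= 1; ndk[d][k] -= 1; nk[k] -= 1
                probs = [(ndk[d][t] + alpha) * (nkw[t][wid[w]] + beta)
                         / (nk[t] + V * beta) for t in range(K)]
                r = rng.random() * sum(probs)
                k, acc = 0, probs[0]
                while r > acc:
                    k += 1; acc += probs[k]
                z[d][i] = k
                nkw[k][wid[w]] += 1; ndk[d][k] += 1; nk[k] += 1
    return z, vocab

# Hypothetical toy corpus: two "sports" documents and two "finance" documents.
docs = [["ball", "game", "team"], ["game", "team", "ball"],
        ["stock", "bank", "money"], ["money", "bank", "stock"]]
z, vocab = lda_gibbs(docs)
```

Production systems would use an optimized implementation (and, at scale, the parallelized inference schemes the KDD tutorial above surveys), but the sampler's structure is the same: repeatedly resample each token's topic conditioned on all other assignments.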
