by Shailesh Kumar (@shkumar) on Tuesday, June 3, 2014

Status: Confirmed & Scheduled
Section
Full talk

Technical level
Intermediate

Objective

Machine learning and data mining are part SCIENCE (ML algorithms, optimization), part ENGINEERING (large-scale modelling, real-time decisions), part PROCESS (data understanding, feature engineering, modelling, evaluation, and deployment), and part ART. In this talk we will focus on the "ART of data mining" - the little things that make a big difference in the quality and sophistication of the machine learning models we build. Using real-world analytics problems from a variety of domains, we will share a number of practical learnings in:

(1) The art of understanding the data better - (e.g. visualization of text data in a semantic space)

(2) The art of feature engineering - (e.g. converting raw inputs into meaningful and discriminative features)

(3) The art of dealing with nuances in class labels - (e.g. creating, sampling, and cleaning up class labels)

(4) The art of combining labeled and unlabeled data - (e.g. semi-supervised and active learning)

(5) The art of decomposing a complex modelling problem into simpler ones - (e.g. divide and conquer)

(6) The art of using textual features with structured features to build models, etc.

The key objective of the talk is to share some of the learnings that might come in handy while "designing" and "debugging" machine learning solutions and to give a fresh perspective on why data mining is still mostly an ART.

Description

The role of a data scientist has evolved in the last few years from someone who can "put together" a "modelling pipeline" to someone who can: (a) "understand" the data beyond basic statistics and simple visualizations, (b) extract "deep" and "novel" insights from the data, (c) engineer "better features" to distribute complexity fairly between features and models, (d) visualize and make sense of complex data types such as networks and unstructured text corpora, and (e) create innovative ways of harnessing data to make smarter decisions.

To create "magic from data", a data scientist must go beyond the SCIENCE, ENGINEERING, and PROCESS and delve into the ART of data mining. In this talk I will share a number of "mistakes" and "innovations" that helped me build better models in domains as diverse as remote sensing, text classification, text clustering, fraud detection, information retrieval, bioinformatics, retail data mining, and image understanding.

These practical insights might help the audience pay attention to the right details in the modelling process, look for model improvements in the right places, be more creative with their data and use its full potential, and even overcome the limitations of their modelling tools.

Requirements

This talk is more about modelling methodology insights than tools and algorithms. Some prior experience with building machine learning models (in any domain, using any technique) is helpful but not required.

Speaker bio

http://www.linkedin.com/in/shaileshk

Comments

  • 3
    [-] Inder Singh (@indersingh) 2 years ago

    Looking forward to this.

  • 1
    [-] Sudhindra (@arsudhindra) 2 years ago

    Great to attend this lecture from Shailesh!

  • 1
    [-] Veenus A V (@veenusav) 2 years ago

    Nice to know that Shailesh speaks this time too, and about very practical experiences…

  • 1
    [-] sai krishna bala (@saikrishnab) 2 years ago

    Is this talk posted in hasgeek tv or in hasgeek youtube channel?

    • 1
      [-] Shailesh Kumar (@shkumar) 2 years ago

      Yup video will be released soon.

  • 1
    [-] Kunal Patel (@kppatel) 2 years ago

    please post video link of this talk.

    • 1
      [-] Shailesh Kumar (@shkumar) 2 years ago

      Yup video will be released soon.

  • 1
    [-] Shailesh Kumar (@shkumar) 2 years ago

    Hi Kunal/Sai, HasGeek will release the videos of this talk soon.
