Survey of Machine Learning tools as the data grows (Weka, R, Octave, Mahout)
Submitted by Vivek Mehta (@vivekmehta) on Sunday, 24 June 2012
To review the machine learning tools available for data of different sizes and scales.
The size and scale of an organization's data change as it grows, and so do the tools needed for machine learning (ML). It is not necessary to build a huge team or collect GBs of data for ML techniques to be useful and relevant. ML can be applied to a small amount of data with appropriate tools at an early stage of an organization. As the organization grows and its data grows with it, the tools need to change, and eventually one needs to look at distributed ML systems.
In this talk we will explore, with examples, the use of specific tools based on the requirement. We will cover various practical requirements and the intelligent use of tools like Weka, R, Octave, Hadoop, and Mahout. We will also compare the advantages and limitations of these tools based on the nature of the ML algorithm (clustering, regression, etc.), the type of data, and the modeling of the problem.
Vivek Mehta is a Senior Research Engineer at Flipkart and works on various ML-related projects. Vivek has several years of experience in machine learning, statistical modeling, probability models, NLP, and big data analytics. After completing an MS at CMU, Vivek worked at Read-Ink, PubMatic, and TouchMagix before joining Flipkart. Vivek's experience spans domains such as handwriting recognition, optimization of online ad revenue, and e-commerce.