The Fifth Elephant 2014

A conference on big data and analytics

How to build a Data Stack from scratch

Submitted by Vinayak Hegde (@vin) on Wednesday, 21 May 2014

Technical level: Intermediate

Section: Full talk

Status: Confirmed & Scheduled

Objective

This talk will present a framework for thinking about the analytics data stack: what to consider when building a data stack from scratch, and how to choose the right software for each part of it, whether for visualisation, analytics or storage. It will also cover the relationships between different techniques for extracting insights out of raw data. I will draw on my experience of building three data stacks in three different industry verticals (Networks, Advertising and Customer Support) and what I learnt from each.

Description

In this talk I will share my experience of building a data stack from scratch. I have built big data analytics stacks at Akamai and Inmobi and am currently building one at Helpshift. These span three different domains: Content Delivery Networks (Akamai), Mobile Advertising (Inmobi) and now Customer Service (Helpshift).

More specifically, the talk will try to answer these questions, among others:

  • What are the different components of an analytics stack, and what function does each layer serve?
  • How do you choose the right software for the different layers of your analytics data stack?
  • Is real-time analytics or batch processing right for you? What are the costs and benefits of each?
  • What is the relation between statistical and probabilistic techniques? When should you choose which?
  • How do you decide on the right structure and storage for your data, and how do they influence your analytics stack?
  • How do you decide on the right metrics for your business, and how do they influence your analytics stack?

I will use specific industry examples to show how each of these questions was answered differently in different contexts. I will also talk about the factors that influenced these decisions and how they shaped the final output and architecture.
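
As a rough illustration of the real-time vs. batch question listed above (not taken from the talk or its slides), the Python sketch below computes the same per-metric totals in two ways: a batch job that rescans the full stored event history, and a streaming aggregator that updates running totals as each event arrives. The event names and the `batch_totals` / `StreamingTotals` helpers are hypothetical, chosen only for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (metric_name, value)
Event = Tuple[str, int]


def batch_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch style: recompute every total from the full stored dataset."""
    totals: Dict[str, int] = defaultdict(int)
    for name, value in events:
        totals[name] += value
    return dict(totals)


class StreamingTotals:
    """Real-time style: keep running state and update it per event."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = defaultdict(int)

    def update(self, event: Event) -> None:
        name, value = event
        self._totals[name] += value

    def snapshot(self) -> Dict[str, int]:
        # Return a copy so callers cannot mutate the internal state
        return dict(self._totals)


# Hypothetical sample events
events = [("tickets_opened", 3), ("tickets_closed", 1), ("tickets_opened", 2)]

print(batch_totals(events))  # {'tickets_opened': 5, 'tickets_closed': 1}

stream = StreamingTotals()
for e in events:
    stream.update(e)
print(stream.snapshot())  # {'tickets_opened': 5, 'tickets_closed': 1}
```

Both paths give the same answer here. The batch version is simple and easy to correct by rerunning it, but its results are only as fresh as the last run; the streaming version answers immediately but must keep (and protect) state between events, which is one side of the cost/benefit trade-off the talk raises.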

Requirements

An open mind and some understanding of mathematics and computer science.

Speaker bio

Vinayak is an early adopter of technology, having worked across diverse and complex computer systems, including embedded systems, networking, large-scale distributed systems and data-processing systems. He has more than a decade of experience in hardcore product development and software/deployment architecture.

He has led engineering teams at Akamai, Inmobi and Helpshift to build big data stacks from scratch. He organised one of the first Cloudcamps and Barcamps in India. He co-founded Headstart, a grass-roots, volunteer-driven community that helps startups. Beyond tech and startups, he is an avid traveller and amateur photographer.

Slides

https://speakerdeck.com/helpshift/how-to-build-a-data-stack-from-scratch

Comments

  • Dinesh Rathi (@dinrat), 4 years ago

    I think data storage is a concern that applies to every other layer of the data stack independently; different layers of the data stack have different storage needs.