23rd-26th July 2014
Status: Awaiting jury selection

In 2014, infrastructure components such as Hadoop, the Berkeley Data Analytics Stack and other commercial tools have stabilized and are thriving. The challenges have moved higher up the stack, from data collection and storage to data analysis and its presentation to users. The focus of this year’s conference is analytics: the infrastructure that powers analytics and how analytics is done.

Talks will cover various forms of analytics including real-time and opportunity analytics, and technologies and models used for analyzing data.

Proposals will be reviewed using 5 criteria:
Domain diversity – proposals will be selected from different domains – medical, insurance, banking, online transactions, retail. If there is more than one proposal from a domain, the one which meets the editorial criteria will be chosen.
Novelty – what has been done beyond the obvious.
Insights – what insights does the proposal share with the audience that they did not know earlier.
Practical versus theoretical – we are looking for applied knowledge. If the proposal covers material that can be looked up online, it will not be considered.
Conceptual versus tools-centric – tell us why, not how. Tell the audience what was the philosophy underlying your use of an application, not how an application was used.
Presentation skills – proposer’s presentation skills will be reviewed carefully and assistance provided to ensure that the material is communicated in the most precise and effective manner to the audience.

Tickets: http://fifthel.doattend.com

Website: https://fifthelephant.in/2014

For queries about proposals / submissions, write to info@hasgeek.com


  1. Data Collection and Transport – e.g., Opendatatoolkit, Scribe, Kafka, RabbitMQ, etc.

  2. Data Storage, Caching and Management – distributed storage (such as Gluster, HDFS), hardware-specific storage (such as SSD or memory), databases (PostgreSQL, MySQL, Infobright) or caching/storage (Memcache, Cassandra, Redis, etc.).

  3. Data Processing, Querying and Analysis – Oozie, Azkaban, scikit-learn, Mahout, Impala, Hive, Tez, etc.

  4. Real-time analytics

  5. Opportunity analytics

  6. Big data and security

  7. Big data and internet of things

  8. Data Usage and BI (Business Intelligence) in different sectors.

Please note: the technology stacks mentioned above indicate the latest technologies that will be of interest to the community. Talks should not be on the technologies per se, but on how these have been used and implemented in various sectors, enterprises and contexts.

Confirmed sessions

# Speaker Section Level +1 Submitted
1 Large Scale Modelling and Analytics Challenges at a Payments Company
subhajit sanyal (@subhajit) Full talk Intermediate 1 0 Fri, 4 Jul
2 Real Time User-Scoring for Bidding in Display Retargeting
Ambuj Singh (@ambujs) Crisp talk Beginner 1 0 Thu, 3 Jul
3 Getting your hands dirty with Aerospike
Sunil Sayyaparaju (@sunils) Sponsored workshop Intermediate 4 0 Tue, 1 Jul
4 Using Data for Art
Rasagy Sharma (@rasagy) Crisp talk Beginner 24 0 Sun, 15 Jun
5 Scaling SolrCloud to a large number of collections
Shalin Mangar (@shalinmangar) Full talk Advanced 7 0 Sun, 15 Jun
6 Big data in finance  
Chirag Anand (@chiraganand) Full talk Intermediate 8 1 Fri, 13 Jun
7 Dr. Hadoop – Diagnose your Hadoop Jobs
Chandraprakash Bhagtani (@cpbhagtani) Crisp talk Intermediate 15 4 Fri, 13 Jun
8 De-dup on Hadoop  
Neeta Pande (@neetapande) Crisp talk Beginner 4 0 Thu, 12 Jun
9 Real world machine learning
Harshad Saykhedkar (@harshss) Workshops Intermediate 6 0 Wed, 11 Jun
10 The state of Julia - a fast language for technical computing
Viral B. Shah (@viralbshah) Crisp talk Intermediate 9 0 Wed, 11 Jun
11 Lessons from Elasticsearch in production
Swaroop (@swaroopch) Full talk Intermediate 32 1 Wed, 11 Jun
12 Data sciences (is) in fashion @ Myntra
Divya Alok (@divyaalok) Full talk Intermediate 8 2 Tue, 10 Jun
13 Analytics on Large Scale, Unstructured, Dynamic Data using Lambda Architecture
Rajesh Muppalla (@codingnirvana) Full talk Intermediate 16 4 Mon, 9 Jun
14 Using Cascalog and Clojure to make the elephant move!
Harshad Saykhedkar (@harshss) Crisp talk Intermediate 3 1 Sun, 8 Jun
15 The ART of Data Mining - Practical Learnings from Real-world Data Mining applications
Shailesh Kumar (@shkumar) Full talk Intermediate 18 8 Tue, 3 Jun
16 Scaling Spatial Data - OpenStreetMap as Infrastructure.
Sajjad Anwar (@geohacker) Full talk Intermediate 13 0 Tue, 3 Jun
17 Machine Learning using R : Crash course in Classification Methods
Bargava Subramanian (@barsubra) Workshops Beginner 8 2 Sun, 1 Jun
18 Machine learning at scale with Spark
madhukara phatak (@madhukaraphatak) Full talk Beginner 5 0 Sat, 31 May
19 Live analytical dashboards at scale - SQL style
Shashwat Agarwal (@shashwatag) Full talk Intermediate 10 7 Mon, 26 May
20 Apache Tez: Accelerating Hadoop Data Pipelines
t3rmin4t0r (@t3rmin4t0r) Full talk Beginner 13 4 Fri, 23 May
21 How to build a Data Stack from scratch  
Vinayak Hegde (@vin) Full talk Intermediate 32 1 Wed, 21 May
22 Scaling real time visualisations for Elections 2014
Anand S (@sanand0) Full talk Intermediate 18 0 Mon, 19 May
23 Experimentation to Productization : developing a Dynamic Bidding system for a location aware Mobile landscape  
Ekta Grover (@ekta1007) Full talk Intermediate 20 0 Mon, 12 May
24 Unified analytics platform for Bigdata
Amareshwari Sriramadasu (@amareshwari) Full talk Intermediate 12 3 Mon, 12 May
25 Storing relationships in large data-sets using Graphs  
Inder Singh (@indersingh) Crisp talk Advanced 9 3 Sun, 11 May
26 Realizing Large-scale Distributed Deep Learning Networks over GraphLab
Dr. Vijay Srinivas A (@avijaysrinivas) Full talk Intermediate 19 1 Wed, 7 May
27 Building distributed search applications using Apache SOLR    
Saumitra Srivastav (@saumitra) Workshops Beginner 8 6 Mon, 28 Apr
28 Why we built the most adopted Polyglot Object Mapper for NoSQL?
Vivek Shrivastava (@vishri) Full talk Intermediate 27 1 Fri, 25 Apr
29 'Know Your Customer!' - Advanced Data Science for Audience Segmentation
prabhakar srinivasan (@prabhacar7) Full talk Advanced 17 3 Mon, 21 Apr
30 Crafting Visual Stories with Data  
Amit Kapoor (@amitkaps) Full talk Beginner 10 0 Fri, 28 Mar
31 Circuitscape - A Case Study on Scientific Computing
Viral B. Shah (@viralbshah) via Tanmay K. Mohapatra (@tanmaykm) Full talk Intermediate 10 0 Mon, 3 Mar
32 Serving user intent : Facebook style notifications using HBase and Event streams
Regunath Balasubramanian (@regunathb) Full talk Intermediate 23 2 Fri, 31 Jan

Unconfirmed proposals

# Speaker Section Level +1 Submitted
1 How to deploy a 50 node SolrCloud cluster on AWS in 15 minutes
Shalin Mangar (@shalinmangar) Crisp talk Beginner 6 0 Sun, 15 Jun
2 Overcoming problems that you will face when trying to break speed limit
Sunil Sayyaparaju (@sunils) Full talk Intermediate 16 1 Sun, 15 Jun
3 Supercharge Application I/O Performance with SSD caching
Sumit Kumar (@sumitk) Full talk Intermediate 15 0 Sun, 15 Jun
4 big data analytics with machine learning
Swapnil Birla (@swapnilbirla) Crisp talk Beginner 2 0 Thu, 12 Jun
5 Interactive analytics on event streams with complexly nested schemas
Abishek Baskaran (@abishekbaskaran) Full talk Intermediate 47 0 Thu, 12 Jun
6 Twitter data collection framework for dummies.
Nischal HP (@nischalhp) Full talk Beginner 9 0 Wed, 11 Jun
7 Latest trends in Market Mix Modeling & a unique way of making measurement & optimization more effective  
rhebbar (@rhebbar) (proposing) Crisp talk Advanced 17 1 Mon, 9 Jun
8 Ten things to consider for Interactive Analytics on high volume, write-once workloads  
Abinasha Karana (@abhinashak) Full talk Advanced 5 0 Mon, 9 Jun
9 Filtering the noise from an avalanche of Google Analytics Metrics : Anomaly Detection
Kushan Shah (@shahkushan17) Crisp talk Intermediate 4 0 Sat, 7 Jun
10 Real Time Secure API delivering data @ scale
Akash Mishra Crisp talk Beginner 4 1 Wed, 4 Jun
11 Migrating traditional warehouse and its applications to a Big-data platform
Manish Shukla (@manishshukla) Full talk Intermediate 6 0 Wed, 4 Jun
12 Fast Elephant - the Cheeliphant (Cheetah-Elephant)!
Ashok Banerjee (@ashokbanerjee) Full talk Beginner 24 0 Tue, 3 Jun
13 Run Predictive Machine Learning algorithms on Hadoop without even knowing Mapreduce.
GaganDeep Juneja (@gagandeepjuneja) Full talk Intermediate 6 1 Tue, 3 Jun
14 Advanced Big Data Analytics using Apache Mahout and Giraph  
swapnil dubey (@swapnildubey1984) Workshops Advanced 25 6 Mon, 2 Jun
15 Machine learning + Interactive visualization: A pragmatic approach to fixing knowledge bases
Viraj Paripatyadar (@virajparipatyadar) Full talk Beginner 2 0 Sun, 1 Jun
16 Tailor made stores at myntra or how to personalize your search results
Apoorva Gaurav (@apoorvagaurav) Crisp talk Intermediate 7 2 Sat, 31 May
17 Lambda Architecture
Nitin Supekar (@nsupekar) Full talk Intermediate 3 1 Fri, 23 May
18 De-dup @ Scale : Experiments with DynamoDB
Hemanth Yamijala (@yhemanth) Full talk Intermediate 27 3 Thu, 22 May
19 Hive and Presto for Big Data Analytics in the Cloud  
Vikram Agrawal (@vikram) Full talk Intermediate 19 2 Tue, 20 May
20 Using Elasticsearch for Analytics
Vaidik Kapoor (@vaidik) Full talk Intermediate 13 4 Sun, 18 May
21 Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)
Satnam Singh, PhD (@satnam-datageek) Full talk Beginner 9 0 Tue, 13 May
22 Extending Vega - A visualisation grammar to create interactive visualisations
anupamme (@anupamme) Crisp talk Beginner 6 4 Sat, 3 May
23 Spot the model hiding in the Big Data
Ashok Banerjee (@ashokbanerjee) Full talk Beginner 34 7 Wed, 30 Apr
24 Apache Pig Power tools  
visuthemoon (@vissuthedatascientist) Workshops Intermediate 20 8 Mon, 28 Apr
25 BDAS, the Berkeley Data Analytics Stack
Mukesh Gangadhar (@mukgbv) Crisp talk Beginner 4 0 Tue, 15 Apr
26 What would you recommend?
Anand (@anandk) Workshops Intermediate 5 1 Fri, 11 Apr
27 Curating A Hundred Thousand Online Stores Using Storm, ElasticSearch and Etcd
Suman Karthik (@mrphoebs) Full talk Intermediate 16 1 Wed, 9 Apr
28 Scaling with Queues
Rohit Yadav (@bhaisaab) Full talk Intermediate 11 2 Tue, 1 Apr
29 What chemistry can teach us about designing better NLP algorithms
Siva Prakash Kollana (@sivaprakash) Crisp talk Beginner 17 4 Thu, 27 Mar
30 Big Data in Telecom - Case studies
Siddharth Vijayvergiya (@vijayvergiya) Full talk Intermediate 2 0 Thu, 27 Mar
31 Developing Real-Time Data Pipelines with Apache Kafka
Manisha Sethi (@manishasethi) Full talk Advanced 4 1 Thu, 27 Mar
32 How to Make Big Data Real and Valuable ...
Mayur Shah (@ssmayur) Crisp talk Intermediate -1 1 Thu, 27 Mar
Arvind Gopinath (@arvindo) Full talk Intermediate 24 3 Sun, 2 Mar
34 Engineering custom visualisations with advanced d3.js
Chirag Gehlot (@chiraggehlot) Workshops Advanced 12 0 Mon, 3 Feb
35 Visualizing large data sets  
Puneet Mohan Sangal (@pmsangal) Full talk Intermediate 17 1 Thu, 30 Jan