23rd-26th July 2014
Status: Submissions and voting closed, awaiting jury selection

In 2014, infrastructure components such as Hadoop, Berkeley Data Stack and other commercial tools have stabilized and are thriving. The challenges have moved higher up the stack, from data collection and storage to data analysis and its presentation to users. The focus of this year’s conference is on analytics – the infrastructure that powers analytics and how analytics is done.

Talks will cover various forms of analytics including real-time and opportunity analytics, and technologies and models used for analyzing data.

Proposals will be reviewed using the following criteria:
Domain diversity – proposals will be selected from different domains – medical, insurance, banking, online transactions, retail. If there is more than one proposal from a domain, the one which meets the editorial criteria will be chosen.
Novelty – what has been done beyond the obvious.
Insights – what insights does the proposal share with the audience that they did not know earlier.
Practical versus theoretical – we are looking for applied knowledge. If the proposal covers material that can be looked up online, it will not be considered.
Conceptual versus tools-centric – tell us why, not how. Tell the audience what was the philosophy underlying your use of an application, not how an application was used.
Presentation skills – proposer’s presentation skills will be reviewed carefully and assistance provided to ensure that the material is communicated in the most precise and effective manner to the audience.

  1. Data Collection and Transport – e.g., Opendatatoolkit, Scribe, Kafka, RabbitMQ, etc.

  2. Data Storage, Caching and Management – distributed storage (such as Gluster, HDFS), hardware-specific storage (such as SSD or memory), databases (PostgreSQL, MySQL, Infobright) or caching/storage (Memcache, Cassandra, Redis, etc.).

  3. Data Processing, Querying and Analysis – Oozie, Azkaban, scikit-learn, Mahout, Impala, Hive, Tez, etc.

  4. Real-time analytics

  5. Opportunity analytics

  6. Big data and security

  7. Big data and internet of things

  8. Data Usage and BI (Business Intelligence) in different sectors.

Please note: the technology stacks mentioned above indicate the latest technologies that will be of interest to the community. Talks should not be on the technologies per se, but on how these have been used and implemented in various sectors, enterprises and contexts.

Confirmed sessions

# Speaker Section Level +1 Submitted
1 Large Scale Modelling and Analytics Challenges at a Payments Company
subhajit sanyal (@subhajit) Full talk Intermediate 1 0 Fri, Jul 4
2 Real Time User-Scoring for Bidding in Display Retargeting
Ambuj Singh (@ambujs) Crisp talk Beginner 1 0 Thu, Jul 3
3 Getting your hands dirty with Aerospike
Sunil Sayyaparaju (@sunils) Sponsored workshop Intermediate 4 0 Tue, Jul 1
4 Using Data for Art
Rasagy Sharma (@rasagy) Crisp talk Beginner 24 0 Sun, Jun 15
5 Scaling SolrCloud to a large number of collections
Shalin Mangar (@shalinmangar) Full talk Advanced 7 0 Sun, Jun 15
6 Big data in finance
Chirag Anand (@chiraganand) Full talk Intermediate 8 1 Fri, Jun 13
7 Dr. Hadoop – Diagnose your Hadoop Jobs
Chandraprakash Bhagtani (@cpbhagtani) Crisp talk Intermediate 15 4 Fri, Jun 13
8 De-dup on Hadoop
Neeta Pande (@neetapande) Crisp talk Beginner 4 0 Thu, Jun 12
9 Real world machine learning
Harshad Saykhedkar (@harshss) Workshops Intermediate 6 0 Wed, Jun 11
10 The state of Julia - a fast language for technical computing
Viral B. Shah (@viralbshah) Crisp talk Intermediate 9 0 Wed, Jun 11
11 Lessons from Elasticsearch in production
Swaroop (@swaroopch) Full talk Intermediate 32 1 Wed, Jun 11
12 Data sciences (is) in fashion @ Myntra
Divya Alok (@divyaalok) Full talk Intermediate 8 2 Tue, Jun 10
13 Analytics on Large Scale, Unstructured, Dynamic Data using Lambda Architecture
Rajesh Muppalla (@codingnirvana) Full talk Intermediate 16 4 Mon, Jun 9
14 Using Cascalog and Clojure to make the elephant move!
Harshad Saykhedkar (@harshss) Crisp talk Intermediate 3 1 Sun, Jun 8
15 The ART of Data Mining - Practical Learnings from Real-world Data Mining applications
Shailesh Kumar (@shkumar) Full talk Intermediate 18 8 Tue, Jun 3
16 Scaling Spatial Data - OpenStreetMap as Infrastructure.
Sajjad Anwar (@geohacker) Full talk Intermediate 13 0 Tue, Jun 3
17 Machine Learning using R : Crash course in Classification Methods
Bargava Subramanian (@barsubra) Workshops Beginner 8 2 Sun, Jun 1
18 Machine learning at scale with Spark
madhukara phatak (@madhukaraphatak) Full talk Beginner 5 0 Sat, May 31
19 Live analytical dashboards at scale - SQL style
Shashwat Agarwal (@shashwatag) Full talk Intermediate 10 7 Mon, May 26
20 Apache Tez: Accelerating Hadoop Data Pipelines
t3rmin4t0r (@t3rmin4t0r) Full talk Beginner 13 4 Fri, May 23
21 How to build a Data Stack from scratch
Vinayak Hegde (@vin) Full talk Intermediate 32 1 Wed, May 21
22 Scaling real time visualisations for Elections 2014
Anand S (@sanand0) Full talk Intermediate 18 0 Mon, May 19
23 Experimentation to Productization : developing a Dynamic Bidding system for a location aware Mobile landscape
Ekta Grover (@ekta1007) Full talk Intermediate 20 0 Mon, May 12
24 Unified analytics platform for Bigdata
Amareshwari Sriramadasu (@amareshwari) Full talk Intermediate 12 3 Mon, May 12
25 Storing relationships in large data-sets using Graphs
Inder Singh (@indersingh) Crisp talk Advanced 9 3 Sun, May 11
26 Realizing Large-scale Distributed Deep Learning Networks over GraphLab
Dr. Vijay Srinivas A (@avijaysrinivas) Full talk Intermediate 19 1 Wed, May 7
27 Building distributed search applications using Apache SOLR
Saumitra Srivastav (@saumitra) Workshops Beginner 8 6 Mon, Apr 28
28 Why we built the most adopted Polyglot Object Mapper for NoSQL?
Vivek Shrivastava (@vishri) Full talk Intermediate 27 1 Fri, Apr 25
29 'Know Your Customer!' - Advanced Data Science for Audience Segmentation
prabhakar srinivasan (@prabhacar7) Full talk Advanced 17 3 Mon, Apr 21
30 Crafting Visual Stories with Data
Amit Kapoor (@amitkaps) Full talk Beginner 10 0 Fri, Mar 28
31 Circuitscape - A Case Study on Scientific Computing
Viral B. Shah (@viralbshah) via Tanmay K. Mohapatra (@tanmaykm) Full talk Intermediate 10 0 Mon, Mar 3
32 Serving user intent : Facebook style notifications using HBase and Event streams
Regunath Balasubramanian (@regunathb) Full talk Intermediate 23 2 Fri, Jan 31

Unconfirmed proposals

# Speaker Section Level +1 Submitted
1 How to deploy a 50 node SolrCloud cluster on AWS in 15 minutes
Shalin Mangar (@shalinmangar) Crisp talk Beginner 5 0 Sun, Jun 15
2 Overcoming problems that you will face when trying to break speed limit
Sunil Sayyaparaju (@sunils) Full talk Intermediate 16 1 Sun, Jun 15
3 Supercharge Application I/O Performance with SSD caching
Sumit Kumar (@sumitk) Full talk Intermediate 15 0 Sun, Jun 15
4 big data analytics with machine learning
Swapnil Birla (@swapnilbirla) Crisp talk Beginner 2 0 Thu, Jun 12
5 Interactive analytics on event streams with complexly nested schemas
Abishek Baskaran (@abishekbaskaran) Full talk Intermediate 47 0 Thu, Jun 12
6 Twitter data collection framework for dummies.
Nischal HP (@nischalhp) Full talk Beginner 9 0 Wed, Jun 11
7 Latest trends in Market Mix Modeling & a unique way of making measurement & optimization more effective
rhebbar (@rhebbar) (proposing) Crisp talk Advanced 17 1 Mon, Jun 9
8 Ten things to consider for Interactive Analytics on high volume, write-once workloads
Abinasha Karana (@abhinashak) Full talk Advanced 5 0 Mon, Jun 9
9 Filtering the noise from an avalanche of Google Analytics Metrics : Anomaly Detection
Kushan Shah Crisp talk Intermediate 4 0 Sat, Jun 7
10 Real Time Secure API delivering data @ scale
Akash Mishra Crisp talk Beginner 4 1 Wed, Jun 4
11 Migrating traditional warehouse and its applications to a Big-data platform
Manish Shukla (@manishshukla) Full talk Intermediate 6 0 Wed, Jun 4
12 Fast Elephant - the Cheeliphant (Cheetah-Elephant)!
Ashok Banerjee (@ashokbanerjee) Full talk Beginner 24 0 Tue, Jun 3
13 Run Predictive Machine Learning algorithms on Hadoop without even knowing Mapreduce.
GaganDeep Juneja (@gagandeepjuneja) Full talk Intermediate 6 1 Tue, Jun 3
14 Advanced Big Data Analytics using Apache Mahout and Giraph
swapnil dubey (@swapnildubey1984) Workshops Advanced 25 6 Mon, Jun 2
15 Machine learning + Interactive visualization: A pragmatic approach to fixing knowledge bases
Viraj Paripatyadar (@virajparipatyadar) Full talk Beginner 2 0 Sun, Jun 1
16 Tailor made stores at myntra or how to personalize your search results
Apoorva Gaurav (@apoorvagaurav) Crisp talk Intermediate 7 2 Sat, May 31
17 Lambda Architecture
Nitin Supekar (@nsupekar) Full talk Intermediate 3 1 Fri, May 23
18 De-dup @ Scale : Experiments with DynamoDB
Hemanth Yamijala (@yhemanth) Full talk Intermediate 27 3 Thu, May 22
19 Hive and Presto for Big Data Analytics in the Cloud
Vikram Agrawal (@vikram) Full talk Intermediate 19 2 Tue, May 20
20 Using Elasticsearch for Analytics
Vaidik Kapoor (@vaidik) Full talk Intermediate 13 4 Sun, May 18
21 Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)
Satnam Singh, PhD (@satnam-datageek) Full talk Beginner 8 0 Tue, May 13
22 Extending Vega - A visualisation grammar to create interactive visualisations
anupamme (@anupamme) Crisp talk Beginner 6 4 Sat, May 3
23 Spot the model hiding in the Big Data
Ashok Banerjee (@ashokbanerjee) Full talk Beginner 34 7 Wed, Apr 30
24 Apache Pig Power tools
visuthemoon (@vissuthedatascientist) Workshops Intermediate 20 8 Mon, Apr 28
25 BDAS, the Berkeley Data Analytics Stack
Mukesh Gangadhar (@mukgbv) Crisp talk Beginner 4 0 Tue, Apr 15
26 What would you recommend?
Anand (@anandk) Workshops Intermediate 5 1 Fri, Apr 11
27 Curating A Hundred Thousand Online Stores Using Storm, ElasticSearch and Etcd
Suman Karthik (@mrphoebs) Full talk Intermediate 16 1 Wed, Apr 9
28 Scaling with Queues
Rohit Yadav (@bhaisaab) Full talk Intermediate 11 2 Tue, Apr 1
29 What chemistry can teach us about designing better NLP algorithms
Siva Prakash Kollana (@sivaprakash) Crisp talk Beginner 17 4 Thu, Mar 27
30 Big Data in Telecom - Case studies
Siddharth Vijayvergiya (@vijayvergiya) Full talk Intermediate 2 0 Thu, Mar 27
31 Developing Real-Time Data Pipelines with Apache Kafka
Manisha Sethi (@manishasethi) Full talk Advanced 4 1 Thu, Mar 27
32 How to Make Big Data Real and Valuable ...
Mayur Shah (@ssmayur) Crisp talk Intermediate -1 1 Thu, Mar 27
Arvind Gopinath (@arvindo) Full talk Intermediate 24 3 Sun, Mar 2
34 Engineering custom visualisations with advanced d3.js
Chirag Gehlot (@chiraggehlot) Workshops Advanced 12 0 Mon, Feb 3
35 Visualizing large data sets
Puneet Mohan Sangal (@pmsangal) Full talk Intermediate 17 1 Thu, Jan 30