by Puneet (@puneetkrojha) on Tuesday, 27 March 2018

Status: Confirmed & Scheduled
Sponsored talk


About the Product: XStream

XStream is a unified self-service analytics, ETL & ML platform built on top of Apache Spark that lets you create scalable, fault-tolerant pipelines. You can express your Big Data Spark computation logic in a simpler, more intuitive way and have complex pipelines ready in minutes.
XStream can also run Big Data batch jobs as streaming computations on static data, so a job can be switched from batch processing to stream processing and vice versa. XStream provides ready-to-use I/O connectors, an interface for using static datasets in joins and lookups, and connectors for fast real-time lookups on Redis, HBase and Bigtable.
XStream also handles several complex but important concerns out of the box for all your pipelines: recovering gracefully from job failures, handling bad data, providing real-time descriptive and prescriptive metrics dashboards for running jobs, and defining and scheduling workflows over your batch and streaming jobs.
XStream also provides the key data featurization operators needed to build Machine Learning models. It lets you embed online Machine Learning models into XStream pipelines and create Machine Learning models with its drag-and-drop constructs.

Use Case: Real-Time Market Propensity Modeling

A Fortune 500 online retailer had batch Market Propensity models that took around 24 hours to regenerate for use in their Machine Learning pipelines. Because of the high infrastructure cost, they built these models on sampled data. Their business use cases required the models to be updated in real time, and they also had trouble maintaining real-time customer segment profiles and customer product profiles.

XStream not only moved the existing Market Propensity pipeline from batch to real time. Its efficient feature generation operators also reduced processing time and infrastructure cost. The complete input data, rather than a sample, was used to generate the Market Propensity models, the real-time customer segment profiles and the customer product profiles.
The customer could run the same pipeline on batch or streaming inputs at the click of a button. This avoided the re-engineering needed to develop two separate workflows.
We will explain the existing model logic. We will also show how an ETL developer mapped it into XStream and ran it without much hassle, even though this developer never imagined building workflows like those of skilled Big Data developers. There is no need to focus on job tuning, because XStream handles the important aspects itself: connection tuning, metrics on input rate, memory usage and shuffle, alerts on badly configured job parameters, and routing bad data records into a separate sink.


Introducing XStream
Features of the Product
Machine Learning Use Case (Real-Time Market Propensity Modeling) using XStream

Speaker bio

Puneet Kumar Ojha
VP Data Engineering and Analytics

Proven experience in building scalable Big Data, Machine Learning, Data Quality and Analytics products. He has delivered solutions in the Online Retail, AdTech and Healthcare domains. He has experience architecting solutions that scale to petabytes of data with low latency and high velocity.

Experienced data modeler for relational and NoSQL databases. He has solved use cases in several areas:
- data convergence (Customer 360)
- Market Propensity
- enterprise platform migration from a data center to Google Cloud and AWS
- customer segmentation
- a conversational bot platform
- a real-time decision platform for the retail industry and connected devices