The Fifth Elephant 2014

A conference on big data and analytics

Storing relationships in large data-sets using Graphs

Submitted by Inder Singh (@indersingh) on Sunday, 11 May 2014

Technical level

Advanced

Section

Crisp talk

Status

Confirmed & Scheduled

Objective

Problem statement: fast programmatic, self-serve analytics on linked data in an ad system, achieved by indexing the data across all cuts, especially for traversals such as:

  • Find all users who came from 'iphone' and 'SFO' with 10k or more clicks within the last two days.
  • Find all users who played 'Subway Surfers' from the U.S. more than 10 times in the last week.

As the examples above show, this class of queries differs from a typical pointed query such as "find my friends who have been to the Golden Gate Bridge in the last year and have liked hiking articles". A pointed query starts with a point lookup and then runs a BFS traversal with appropriate filtering criteria; databases such as Neo4j and Titan address that class in a generic fashion.
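
The difference between the two query styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the attribute names (`device`, `city`, `clicks`) and every function name are hypothetical, the time window is left out, and nothing here is the data structure presented in the talk.

```python
from collections import defaultdict, deque

# Hypothetical toy data: per-user attributes and aggregated click counts.
USERS = {
    "u1": {"device": "iphone", "city": "SFO", "clicks": 12000},
    "u2": {"device": "iphone", "city": "SFO", "clicks": 300},
    "u3": {"device": "android", "city": "SFO", "clicks": 50000},
}

# Hypothetical friendship edges, used only by the pointed query.
FRIENDS = {"u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2"]}


def build_index(users):
    """Map each (attribute, value) pair to the set of users that have it."""
    index = defaultdict(set)
    for user_id, attrs in users.items():
        for key in ("device", "city"):
            index[(key, attrs[key])].add(user_id)
    return index


def non_pointed_query(users, index, device, city, min_clicks):
    """No start vertex: intersect attribute postings, then filter on clicks."""
    candidates = index[("device", device)] & index[("city", city)]
    return sorted(u for u in candidates if users[u]["clicks"] >= min_clicks)


def pointed_query(friends, users, start, max_depth, predicate):
    """Point lookup on `start`, then BFS outwards, keeping vertices that pass `predicate`."""
    seen = {start}
    queue = deque([(start, 0)])
    matches = []
    while queue:
        vertex, depth = queue.popleft()
        if vertex != start and predicate(users[vertex]):
            matches.append(vertex)
        if depth < max_depth:
            for neighbour in friends.get(vertex, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return matches
```

The non-pointed query never names a starting user, so it has to be answered from indexes over attribute values, while the pointed query touches only the neighbourhood of one known vertex.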

Scope of the talk:

  • Highlight the internals of what it takes to solve non-pointed queries in a generic fashion.
  • Extend the solution to support the TinkerPop API specification, which Neo4j and Titan also support, so that users can easily flip from one backend to another.
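
The idea of flipping backends behind a common API can be illustrated with a small sketch. The interface below is hypothetical and deliberately tiny; it is not the TinkerPop API, and the class and method names (`GraphBackend`, `vertices_where`, `neighbours`) are assumptions made for this example only.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical minimal graph interface; NOT the TinkerPop API itself."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties): ...

    @abstractmethod
    def add_edge(self, src, dst, label): ...

    @abstractmethod
    def vertices_where(self, key, value): ...

    @abstractmethod
    def neighbours(self, vertex_id, label): ...


class InMemoryBackend(GraphBackend):
    """Toy backend; a real one would delegate to a graph database."""

    def __init__(self):
        self._props = {}
        self._edges = {}

    def add_vertex(self, vertex_id, **properties):
        self._props[vertex_id] = dict(properties)

    def add_edge(self, src, dst, label):
        self._edges.setdefault((src, label), []).append(dst)

    def vertices_where(self, key, value):
        return sorted(v for v, p in self._props.items() if p.get(key) == value)

    def neighbours(self, vertex_id, label):
        return list(self._edges.get((vertex_id, label), []))


def users_on_device(backend: GraphBackend, device: str):
    """Application code depends only on the interface, so the backend can be swapped."""
    return backend.vertices_where("device", device)
```

Because `users_on_device` only sees `GraphBackend`, a second implementation backed by a different store could be substituted without changing the calling code.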

Description

This work was motivated by the need to store large amounts of linked data in an ad system and make it available for programmatic and analytics consumption.

This talk outlines our journey: researching existing graph databases and processing frameworks, understanding why they didn't work for us at our scale, and then building something of our own.
We will explain in depth the data structures used and how we supported the TinkerPop graph API specification (supported by graph databases such as Neo4j and Titan). We will also touch upon how our ad system's unique data model allowed us to come up with a fairly simple technique to shard the entire store and query over it.
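
The proposal does not describe the sharding technique itself. As a generic illustration only, and explicitly not the method from the talk, the sketch below assumes that data can be partitioned by a hash of the user id and that a query is answered by scanning every shard and merging the results (scatter-gather). `NUM_SHARDS` and all function names are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Pick a shard with a stable hash so the same user always lands on the same shard."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


def build_shards(events, num_shards=NUM_SHARDS):
    """Aggregate (user_id, clicks) events into per-shard click totals."""
    shards = [dict() for _ in range(num_shards)]
    for user_id, clicks in events:
        shard = shards[shard_for(user_id, num_shards)]
        shard[user_id] = shard.get(user_id, 0) + clicks
    return shards


def scatter_gather(shards, min_clicks):
    """Run the same filter on every shard, then merge the partial results."""
    results = []
    for shard in shards:
        results.extend(u for u, c in shard.items() if c >= min_clicks)
    return sorted(results)
```

Since each user's data lives on exactly one shard under this assumption, per-user filters such as "10k or more clicks" can be evaluated locally and the merge step is a simple union.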

Takeaways from this talk:

  • What graph databases are and when you should choose one.
  • Different use cases require different stores.
  • What it takes to build a graph store for allocentric (OLAP-like) graph traversals.

Requirements

  • An inclination towards linked data; everything else will be covered.

Speaker bio

Inder Singh has been working on data-related problems at InMobi (the world's largest independent ad network) for the past ~3 years.

Links

Slides

http://www.slideshare.net/InderSingh10/graph-store

Comments

  • 1
    Vinayak Hegde (@vin) 4 years ago

    Can you add slides to this talk? More detail about what you used to build the final solution would also be useful. Please describe the exact problem you were trying to solve using graphs.

  • 1
    Inder Singh (@indersingh) Proposer 4 years ago (edited 4 years ago)

    Vinayak, thanks for your feedback. I will add the slides in a day or two.

    Problem statement: fast programmatic, self-serve analytics on linked data in an ad system, achieved by indexing the data across all cuts, especially for traversals such as:

    • Find all users who came from 'iphone' and 'SFO' with 10k or more clicks within the last two days.
    • Find all users who played 'Subway Surfers' from the U.S. more than 10 times in the last week.

    As the examples above show, this class of queries differs from a typical pointed query such as "find my friends who have been to the Golden Gate Bridge in the last year and have liked hiking articles". A pointed query starts with a point lookup and then runs a BFS traversal with appropriate filtering criteria; databases such as Neo4j and Titan address that class in a generic fashion.

    Scope of the talk:

    • Highlight the internals of what it takes to solve non-pointed queries in a generic fashion.
    • Extend the solution to support the TinkerPop API specification, which Neo4j and Titan also support, so that users can easily flip from one backend to another.
  • 1
    Inder Singh (@indersingh) Proposer 4 years ago

    Vinayak,

    I have uploaded the outline slides as per your suggestion. They will be tightened and adjusted to suit the audience and the duration of the talk.

    Thanks,
    - inder
