The Fifth Elephant 2012

Finding the elephant in the data.

GlusterFS "Big Data" Interface

Submitted by Venky Shankar (@efault) on Tuesday, 17 July 2012

Technical level

Intermediate

Section

Big Data Infrastructure & Processing

Session type

Demo

Status

Submitted

Total votes:  +20

Objective

Present GlusterFS as infrastructure for big-data processing, usable as a drop-in replacement for the Hadoop Distributed File System (HDFS).

Description

GlusterFS is an open source, distributed file system capable of scaling to several petabytes and handling thousands of clients. GlusterFS clusters together storage building blocks over InfiniBand RDMA or TCP/IP interconnects.

GlusterFS can also be used as a replacement for HDFS, so that Map/Reduce jobs run directly on data residing on it. The GlusterFS Hadoop plugin allows existing Map/Reduce jobs to work seamlessly without any changes. It does this by plugging into Hadoop's FileSystem interface and communicating with GlusterFS via its native protocol (using FUSE).
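To make the idea concrete, here is a minimal sketch of how a file system layer can translate job-facing URIs into ordinary file operations on a FUSE mount point. It is not the plugin's code (the real plugin targets Hadoop's Java FileSystem interface); the class name `FuseBackedFileSystem`, its methods, the `glusterfs://` scheme handling, and the temporary directory standing in for a mounted volume are all assumptions made for illustration.

```python
import os
import tempfile
from urllib.parse import urlparse


class FuseBackedFileSystem:
    """Hypothetical stand-in for a Hadoop-style file system over a local mount.

    Every operation is translated into plain POSIX file I/O under
    `mount_point`, which is assumed to be where the volume is FUSE-mounted.
    """

    def __init__(self, mount_point, scheme="glusterfs"):
        self.mount_point = os.path.abspath(mount_point)
        self.scheme = scheme

    def _to_local(self, uri):
        # Map scheme://volume/some/path to <mount_point>/some/path.
        parsed = urlparse(uri)
        if parsed.scheme != self.scheme:
            raise ValueError("unsupported scheme: %r" % parsed.scheme)
        local = os.path.normpath(
            os.path.join(self.mount_point, parsed.path.lstrip("/"))
        )
        # Refuse paths that would escape the mount point (e.g. via "..").
        if os.path.commonpath([self.mount_point, local]) != self.mount_point:
            raise ValueError("path escapes mount point: %r" % uri)
        return local

    def create(self, uri, data):
        # Write bytes to a file, creating parent directories as needed.
        local = self._to_local(uri)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as handle:
            handle.write(data)

    def open(self, uri):
        # Read the whole file back as bytes.
        with open(self._to_local(uri), "rb") as handle:
            return handle.read()

    def exists(self, uri):
        return os.path.exists(self._to_local(uri))

    def list_status(self, uri):
        # Return the sorted entry names of a directory.
        return sorted(os.listdir(self._to_local(uri)))


if __name__ == "__main__":
    # A temporary directory plays the role of the FUSE mount in this sketch.
    mount = tempfile.mkdtemp()
    fs = FuseBackedFileSystem(mount)
    fs.create("glusterfs://volume/jobs/input/part-0", b"one two three\n")
    print(fs.list_status("glusterfs://volume/jobs/input"))
    print(fs.open("glusterfs://volume/jobs/input/part-0"))
```

Because the caller only ever sees URIs and the methods of the interface, code written against that interface does not need to know that the bytes end up on a mounted volume rather than in HDFS, which is the property that lets existing jobs run unchanged.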

Requirements

Basic know-how of the GlusterFS distributed file system
Working knowledge of Hadoop
UNIX

Speaker bio

Venky Shankar works on GlusterFS at Red Hat. He is a team lead for the Replication team and is also responsible for designing and implementing the Hadoop compatibility plugin in GlusterFS. He has about six years of experience in the industry. His interests include systems programming, distributed systems, and big data.
