by Anand Chitipothu (@anandology) on Wednesday, 20 June 2012

Status: Confirmed
Big Data Infrastructure & Processing


Using the Internet Archive as a case study, this talk presents aspects of big data in the context of long-term preservation.


The Internet Archive has been archiving the internet since 1996. It also archives and makes available a vast collection of data including films, audio and books.

The Internet Archive is one of the earliest organizations to work with petabytes of data. It built its own infrastructure to store, process and manage its data reliably, long before the cloud. Because it is an archive, preservation of data is its primary concern, and that concern shapes its engineering decisions.

This talk is an introduction to the Internet Archive and its infrastructure.

Speaker bio

This talk will be presented by Anand Chitipothu and Noufal Ibrahim. Both are employees of the Archive, working remotely from Bangalore.

Anand is a software consultant and trainer. He has been working with the Archive since 2007. He is the co-ordinator of the PyCon India 2012 conference.

Noufal is a freelance trainer and consultant based in Bangalore. He is the founder of PyCon India and the organiser of the first two conferences in India.