Data Serialization: from pickle to databases and HDF5

Executive summary

When scientists have finished their data reduction tasks, they need a way to consolidate the results in persistent storage so that the data can easily be recovered later. I'll talk about the basic tools that come with the Python standard library for this task, and introduce relational databases and general, numerically oriented formats (NPY and HDF5).

The talk will be given in a tutorial style, so that people can see directly how things are done. During the tutorial, emphasis will be put on comparing serialized sizes and performance.
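As a taste of the kind of size comparison meant here, the following is a minimal sketch using only the standard library. It is not taken from the talk materials: the sample data and the variable names are hypothetical, and the raw buffer built with the `array` module merely stands in for a numerical binary format.

```python
import pickle
from array import array

# Hypothetical stand-in for reduced data: 10,000 floating-point values.
values = [i / 7 for i in range(10_000)]

# Serialize the same list with the text-based protocol 0 and with the
# newest binary protocol available in this interpreter.
text_pickle = pickle.dumps(values, protocol=0)
binary_pickle = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)

# Raw binary buffer of C doubles: no per-object framing, only the values.
raw_bytes = array("d", values).tobytes()

# Round trip: unpickling must give back an equal list.
restored = pickle.loads(binary_pickle)

sizes = {
    "pickle, protocol 0": len(text_pickle),
    "pickle, highest protocol": len(binary_pickle),
    "raw array of doubles": len(raw_bytes),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The raw buffer is smaller than the binary pickle because pickle stores an opcode alongside every float, whereas the buffer holds nothing but the values. The price is that the buffer carries no type or shape information of its own, which is the gap that formats such as NPY and HDF5 fill.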

Contents

1. The Basics

  • Introduction
  • Pickling our objects
  • Relational databases

2. Numerical Binary Formats

  • Why do we need them?
  • The NPY format
  • The HDF5 format

3. Adding Compression

  • Why compression?
  • Solid compression
  • Chunked compression

Preliminary slides here

 
materials/serialization.txt · Last modified: 2010/09/29 18:33 by python-faculty
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported