GlusterFS on EC2

GlusterFS is a POSIX-compliant distributed file system. It is very flexible thanks to its modular "translators" and is well suited to cloud computing. GlusterFS can replicate files, parallelize volume access, stripe data, distribute files (a server that stores a file holds the whole file) and much more. The actual configuration of the storage cluster depends on the desired specs. For example, if data integrity and durability are a primary concern then striping is probably not an option, while replication can improve read performance and durability but hurts write performance.
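To make the trade-off between distributing and replicating more concrete, here is a minimal Python sketch. It is not how GlusterFS itself places files: the MD5-modulo placement, the function names and the brick names are all assumptions chosen for illustration. It only shows the idea that a distributed file lives whole on one brick, a replicated file lives whole on every brick of its replica set, and replication divides the usable capacity by the replica count.

```python
import hashlib


def distribute(filename, bricks):
    """Return the single brick that holds the whole file.

    Illustrative placement only: a hash of the file name modulo the
    number of bricks. GlusterFS uses its own hashing scheme.
    """
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(bricks)
    return [bricks[index]]


def replicate(filename, bricks):
    """Return every brick in the replica set; each holds a full copy."""
    return list(bricks)


def usable_capacity_gb(brick_gb, brick_count, replica_count=1):
    """Usable space for equally sized bricks.

    Assumes brick_count is a multiple of replica_count.
    """
    if brick_count % replica_count != 0:
        raise ValueError("brick_count must be a multiple of replica_count")
    return brick_gb * brick_count // replica_count


if __name__ == "__main__":
    # Hypothetical brick names, for illustration only.
    bricks = ["server1:/export", "server2:/export"]
    print(distribute("index.php", bricks))
    print(replicate("index.php", bricks))
    print(usable_capacity_gb(100, 4, replica_count=2))
```

With replication every write has to reach all copies, which is where the write penalty comes from, while a read can be served from any one copy.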

Unlike in many other distributed file systems, the filesystem structure is defined on the client side. The translator stack should therefore be consistent between all clients of the Gluster file system, or at least architecturally compatible.

Using GlusterFS for small files that are accessed very frequently is infeasible. A good example is sharing the PHP scripts of a busy website: CPU usage and latency are so bad that it's practically unusable.

Software installation and configuration

The binary packages on gluster.com are for 64-bit platforms only. There are no packages for version 3.1+ in the Debian repositories yet, so I had to compile and package it myself to use it on small and medium instances. Compilation and packaging are a snap: Gluster uses autotools, so building a Debian package with cdbs is easy.

Unlike in version 3.0, the packages are no longer split into client, server and lib packages (maybe the Debian guys will split them in the official deb, beats me). There is one package containing everything, so be gentle with the postinst script.

Server configuration

Client configuration

On Gluster, almost everything works client side. It is very important to keep things synchronized between clients: time, user IDs and Gluster client configurations. Like NFSv3, Gluster does not translate POSIX uid numbers, so the clients must use the same ID numbers if you want permissions to work. This also implies the same security assumption as NFSv3: you have total control of all the systems that are capable of accessing the file system. See security below. NTP time synchronization is a must if you use IO caching, or else you will end up with a stale cache.
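One way to catch uid drift between clients is to compare their passwd files. The following Python sketch assumes you have already collected the contents of /etc/passwd from each client (how you collect them is up to you); the function names and the sample host names are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> numeric uid from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a valid passwd entry
        users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(passwd_by_host):
    """Return {user: {host: uid}} for users whose uid differs between hosts.

    Users that exist on only one host are not reported.
    """
    seen = {}
    for host, text in passwd_by_host.items():
        for user, uid in parse_passwd(text).items():
            seen.setdefault(user, {})[host] = uid
    return {
        user: by_host
        for user, by_host in seen.items()
        if len(set(by_host.values())) > 1
    }


if __name__ == "__main__":
    # Sample data for two hypothetical clients.
    hosts = {
        "client-a": "www-data:x:33:33::/var/www:/bin/sh\ndeploy:x:1001:1001::/home/deploy:/bin/bash\n",
        "client-b": "www-data:x:33:33::/var/www:/bin/sh\ndeploy:x:1002:1002::/home/deploy:/bin/bash\n",
    }
    print(uid_mismatches(hosts))
```

Any user reported here would see different ownership and permissions on the shared volume depending on which client it logs in to.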

Disasters and recovery

Performance

Performance varies greatly between different server and client configurations. As a general rule, native Gluster clients can utilize the cluster more efficiently, but this does not mean they will always perform better. For example, the Gluster FUSE mount client is horrible with many small files/blocks because of the many context switches between user space and kernel space. If your use case requires many small files, use heavy caching or switch to a different client like NFS or Booster.
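To see how a given mount behaves with many small files, a rough micro-benchmark is usually enough. The Python sketch below is an assumption about how you might measure it, not a rigorous benchmark: it writes and then reads back a number of small files in a directory and reports the elapsed times. The function name and defaults are made up for illustration. Note that reads right after writes may be served from a cache, so the read numbers can look better than a cold read would.

```python
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=512):
    """Write then read `count` files of `size` bytes; return timings in seconds."""
    payload = b"x" * size
    paths = [os.path.join(directory, "f%05d" % i) for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as handle:
            handle.write(payload)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as handle:
            bytes_read += len(handle.read())
    read_s = time.perf_counter() - start

    return {
        "files": count,
        "bytes_read": bytes_read,
        "write_s": write_s,
        "read_s": read_s,
    }


if __name__ == "__main__":
    # Uses a local temporary directory; point it at a directory on the
    # mount you want to test instead.
    with tempfile.TemporaryDirectory() as scratch:
        print(time_small_files(scratch, count=200))
```

Running it once against a local disk and once against each client type (FUSE mount, NFS) gives a quick feel for the per-file overhead of each.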

Caching

Security

-- AvishaiIshShalom - 31 Dec 2010
Topic revision: r6 - 10 Sep 2011 - 07:42:52 - AvishaiIshShalom
 
