GlusterFS vs. Ceph: two storage systems head to head

Distributed storage systems are a solution for storing and managing data that no longer fits on a conventional server. Size is not the only problem, however: classic file systems, with their folder structure, do not handle unstructured data well either.

Contents
  1. Saving large data sets: GlusterFS and Ceph make it possible
  2. High availability is key
  3. Brief presentation of GlusterFS
  4. How does GlusterFS work?
  5. Brief presentation of Ceph
  6. How does Ceph work?
  7. Comparison: GlusterFS vs. Ceph
  8. When should you use each system?

Saving large data sets: GlusterFS and Ceph make it possible

When dealing with big data, the amount of data that will have to be managed is usually not known at the start of a project. Systems must therefore be easy to expand while they keep running, with additional servers that integrate seamlessly into the existing storage system. A distributed file system appears to the user as an ordinary folder in a traditional file system, so the user does not notice that individual files, or even parts of them, may reside on different servers, possibly in geographically distant locations. Since GlusterFS and Ceph are software layers running on Linux, they do not require any special hardware: Linux runs on any standard server and supports all common hard drives on the market.
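
To make the idea of one folder backed by several servers concrete, here is a toy sketch. Everything in it (`Server`, `DistributedFolder`, the least-loaded placement rule) is hypothetical and in-memory; neither GlusterFS nor Ceph places files this way, and asking every server on each lookup would not scale. It only shows how a user-facing folder can hide where files live while servers are added on the fly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """One storage node (hypothetical); its files live in a dict of name -> bytes."""
    name: str
    files: dict[str, bytes] = field(default_factory=dict)


class DistributedFolder:
    """Toy sketch: several servers presented to the user as a single folder."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = list(servers)

    def add_server(self, server: Server) -> None:
        # Scale out while running: existing files stay where they are and
        # remain readable; later writes can land on the new node.
        self.servers.append(server)

    def _find(self, name: str) -> Optional[Server]:
        # No central index here: ask each server whether it holds the file.
        return next((s for s in self.servers if name in s.files), None)

    def write(self, name: str, data: bytes) -> None:
        server = self._find(name)
        if server is None:
            # New file: place it on the node that currently holds the fewest files.
            server = min(self.servers, key=lambda s: len(s.files))
        server.files[name] = data

    def read(self, name: str) -> bytes:
        server = self._find(name)
        if server is None:
            raise FileNotFoundError(name)
        return server.files[name]

    def listdir(self) -> list[str]:
        # The user sees one merged listing, not per-server contents.
        return sorted(name for s in self.servers for name in s.files)


if __name__ == "__main__":
    folder = DistributedFolder([Server("srv1"), Server("srv2")])
    for i in range(4):
        folder.write(f"sensor-{i}.csv", b"...")
    folder.add_server(Server("srv3"))     # added without downtime
    folder.write("sensor-4.csv", b"...")  # lands on the empty srv3
    print(folder.listdir())
```

The caller never names a server: reads, writes, and the listing all go through the folder object, which is the transparency the paragraph describes.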

High availability is key

High availability is a central concern for distributed storage solutions: hardware failures should be avoided as far as possible, and the software that runs the system must not be interrupted when new components are added or when maintenance work is needed. Important metadata must not be stored in a single central location; it has to be accessible in a decentralized way, and no component should be left without redundancy. A single server failure must never compromise the entire system. GlusterFS and Ceph are both designed to hold the data of big data projects in one system and to filter it from there. Both can be expanded almost without limit, but they are based on different approaches.
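
The redundancy requirement can be illustrated with a small replication sketch. The class, the two-copy default, and the "consecutive nodes" placement rule are assumptions chosen for brevity, not how GlusterFS replica sets or Ceph placement groups are configured. The point is only that when every file is stored on at least two nodes, and any client can compute those nodes from the file name without a central metadata server, a single node failure does not make the data unreachable.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str varies per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ReplicatedStore:
    """Toy sketch: every file is written to `replicas` distinct nodes."""

    def __init__(self, node_count: int, replicas: int = 2) -> None:
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes: list[dict[str, bytes]] = [{} for _ in range(node_count)]
        self.online = [True] * node_count
        self.replicas = replicas

    def targets(self, name: str) -> list[int]:
        # Consecutive nodes from a hash-derived start, so the copies of one
        # file always land on different nodes. Any client can compute this
        # from the file name alone; no central metadata lookup is needed.
        start = stable_hash(name) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def write(self, name: str, data: bytes) -> None:
        # Simplification: assumes all target nodes are reachable during writes.
        for idx in self.targets(name):
            self.nodes[idx][name] = data

    def read(self, name: str) -> bytes:
        # Try each replica in turn, so one failed node is not fatal.
        for idx in self.targets(name):
            if self.online[idx] and name in self.nodes[idx]:
                return self.nodes[idx][name]
        raise OSError(f"all replicas of {name!r} are unavailable")


if __name__ == "__main__":
    store = ReplicatedStore(node_count=4, replicas=2)
    store.write("results.parquet", b"data")
    primary, secondary = store.targets("results.parquet")
    store.online[primary] = False         # simulate a server failure
    print(store.read("results.parquet"))  # still served by the other replica
```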

Note

The term big data (also called massive data) refers to very large, complex, and weakly structured masses of data, such as those collected by sensors for scientific purposes (GPS satellites, for example) or by meteorological or statistical systems. In the big data field, efficient search and systematic organization of the data play a key role alongside storage.

Brief presentation of GlusterFS

GlusterFS is a modular distributed file system in which several servers are connected over a TCP/IP network. Because it is POSIX (Portable Operating System Interface) compliant, GlusterFS integrates easily into Linux server environments, and likewise into FreeBSD, OpenSolaris, and macOS, which are also POSIX compliant. In Windows environments, however, integration is currently only possible indirectly, through a Linux server acting as a gateway.

How does GlusterFS work?

GlusterFS started out as a classic file-based storage system. It later became object-oriented, and in making that change particular emphasis was placed on integrating it properly with the well-known open source solution OpenStack. Behind the scenes, GlusterFS still works with files: each file is assigned an object, and the link between the two is established through hard links in the file system. Users never see a dedicated server; they use their own interfaces to store data in GlusterFS, which appears to them as a single system.
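
The file-to-object link via hard links can be sketched with standard POSIX calls. The function name and the `.objects/<aa>/<bb>/<id>` directory layout are hypothetical; they are loosely modeled on the `.glusterfs` directory that GlusterFS keeps on each brick, but this is not its exact on-disk format. A hard link gives the same file a second, ID-based name without copying any data.

```python
import os
import tempfile
import uuid


def store_with_object_id(brick: str, rel_path: str, data: bytes) -> str:
    """Write a file at its normal path and hard-link it under an object ID.

    Hypothetical sketch: the `.objects/<aa>/<bb>/<id>` layout is an assumption
    for illustration, not GlusterFS's exact on-disk format.
    """
    file_path = os.path.join(brick, rel_path)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as f:
        f.write(data)

    object_id = uuid.uuid4().hex  # random ID standing in for the file's object ID
    object_dir = os.path.join(brick, ".objects", object_id[:2], object_id[2:4])
    os.makedirs(object_dir, exist_ok=True)
    # A hard link is a second directory entry for the same inode: no data is copied.
    os.link(file_path, os.path.join(object_dir, object_id))
    return object_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as brick:
        oid = store_with_object_id(brick, "data/run1.csv", b"t,v\n0,1\n")
        by_path = os.path.join(brick, "data", "run1.csv")
        by_id = os.path.join(brick, ".objects", oid[:2], oid[2:4], oid)
        print(os.path.samefile(by_path, by_id))  # True: two names, one file
```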

Advantages:
  - Easy to integrate into Linux systems
  - POSIX compatible
  - Compatible with FUSE (Filesystem in Userspace)
Drawbacks:
  - Integration into Windows systems only possible indirectly

Brief presentation of Ceph

Ceph, an open source distributed storage solution, is an object store built on binary objects, which lets it avoid the rigid block structures of conventional storage media. On the hardware side, Ceph also uses ordinary hard drives, but an algorithm manages the binary objects: they are split into many parts, distributed across many servers, and reassembled when accessed.
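
The splitting and reassembly described above can be sketched as follows. Ceph decides where each piece goes with its CRUSH algorithm; this sketch deliberately swaps in a plain hash of the chunk name instead, and the function names, chunk size, and naming scheme are all hypothetical.

```python
import hashlib

# Hypothetical layout: server index -> {chunk name -> chunk bytes}
Placement = dict[int, dict[str, bytes]]


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def split_object(object_id: str, data: bytes, chunk_size: int, servers: int) -> Placement:
    """Split a binary object into fixed-size chunks and spread them over servers."""
    placement: Placement = {s: {} for s in range(servers)}
    for offset in range(0, len(data), chunk_size):
        # Zero-padded index so chunk names sort in their original order.
        chunk_name = f"{object_id}.{offset // chunk_size:08d}"
        server = stable_hash(chunk_name) % servers  # plain hash, NOT Ceph's CRUSH
        placement[server][chunk_name] = data[offset:offset + chunk_size]
    return placement


def reassemble(object_id: str, placement: Placement) -> bytes:
    """Collect an object's chunks from all servers and join them in order."""
    chunks = {
        name: part
        for parts in placement.values()
        for name, part in parts.items()
        if name.startswith(object_id + ".")
    }
    return b"".join(chunks[name] for name in sorted(chunks))


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10,240 bytes of sample data
    placement = split_object("img01", data, chunk_size=1024, servers=3)
    print({server: len(parts) for server, parts in placement.items()})
    assert reassemble("img01", placement) == data
```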

How does Ceph work?

All components work in a decentralized manner, and all OSDs (object storage devices) have equal status. As a result, any number of servers, each with its own hard drives, can be combined into a single unified storage system. Ceph offers three main interfaces for integrating it into an existing environment: CephFS, a Linux file system driver; RADOS Block Devices (RBD), Linux block devices that can be integrated directly; and the RADOS Gateway, which is compatible with Swift and Amazon S3.
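
The equal standing of OSDs can be illustrated with rendezvous (highest-random-weight) hashing. Ceph's real placement algorithm is CRUSH, which additionally takes weights and failure domains from a cluster map into account. The sketch below uses a simpler, related technique, and the OSD names and replica count are made up. It shows two properties the paragraph relies on: every client computes placement the same way without a central directory, and adding an OSD moves only the objects that the new OSD now wins.

```python
import hashlib


def score(osd: str, object_name: str) -> int:
    """Pseudo-random but deterministic weight for an (OSD, object) pair."""
    digest = hashlib.sha256(f"{osd}:{object_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list[str], copies: int = 3) -> list[str]:
    """Rendezvous hashing: rank all OSDs for this object, keep the top `copies`.

    All OSDs are treated alike, and any client holding the OSD list computes
    the same answer locally, without asking a central server.
    """
    ranked = sorted(osds, key=lambda osd: score(osd, object_name), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    osds = [f"osd.{i}" for i in range(4)]
    objects = [f"obj-{i}" for i in range(1000)]
    before = {o: place(o, osds, copies=1)[0] for o in objects}
    after = {o: place(o, osds + ["osd.4"], copies=1)[0] for o in objects}
    moved = [o for o in objects if before[o] != after[o]]
    # Only objects now won by the new OSD move; everything else stays put.
    print(len(moved), all(after[o] == "osd.4" for o in moved))
```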

Advantages:
  - Easy to integrate into any system, regardless of the operating system
  - Block device (RBD) for Linux
  - CephFS file system for Linux
  - Amazon S3-compatible interface
  - Seamless integration with Keystone authentication
  - FUSE (Filesystem in Userspace) module for systems without a CephFS client
Drawbacks:
  - Comparatively weak file system functionality
  - Greater familiarization effort, since the storage structures are entirely new

Comparison: GlusterFS vs. Ceph

GlusterFS and Ceph differ in several technical respects, so there is no clear winner. Ceph is fundamentally an object-based storage system for unstructured data, whereas GlusterFS uses tree-structured file systems on block-based devices. GlusterFS originated as a highly efficient file-based storage system but is increasingly moving toward an object-oriented design. Ceph, by contrast, was originally developed as a binary object store, not as a classic file system, which can make it weaker at operations typical of traditional file systems.
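
The difference between a flat object namespace and a tree-shaped file system can be made concrete with a directory rename. This is a conceptual illustration with hypothetical helper names, not how either system is implemented (CephFS, for instance, manages a real directory hierarchy through its metadata servers). It only shows why an operation that is cheap in a traditional file system can become expensive when "directories" are merely key prefixes.

```python
def rename_dir_flat(store: dict[str, bytes], old: str, new: str) -> int:
    """Flat object namespace: a 'directory' is just a shared key prefix.

    Renaming it means rewriting every key under the prefix.
    Returns the number of objects touched.
    """
    old_prefix, new_prefix = old.rstrip("/") + "/", new.rstrip("/") + "/"
    keys = [key for key in store if key.startswith(old_prefix)]  # full scan
    for key in keys:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
    return len(keys)


def rename_dir_tree(tree: dict, old: str, new: str) -> int:
    """Hierarchical file system (nested dicts, top-level names only here):
    a directory is one node, so renaming it relinks a single entry no matter
    how many files sit below it. Returns the number of entries touched."""
    tree[new] = tree.pop(old)
    return 1


if __name__ == "__main__":
    flat = {f"logs/2024/{i}.txt": b"" for i in range(500)}
    tree = {"logs": {"2024": {f"{i}.txt": b"" for i in range(500)}}}
    print(rename_dir_flat(flat, "logs", "archive"))  # 500
    print(rename_dir_tree(tree, "logs", "archive"))  # 1
```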

Strengths of GlusterFS:
  - Better at file system workloads
  - Faster storage algorithm
  - Does not require a central metadata server
  - Less complexity
  - Best suited to storing large files (from about 4 MB per file)
  - Better suited to files with sequential access
Strengths of Ceph:
  - Better at object storage
  - Better performance on simple hardware
  - Easy to integrate into any system, regardless of the operating system
  - Block device for Linux
  - Easier to adapt to customer needs
  - Based on RADOS

When should you use each system?

Thanks to its varied interfaces, Ceph works well in heterogeneous networks that run other operating systems besides Linux. The strength of GlusterFS, on the other hand, lies in storing large amounts of data in traditional file formats, as well as large files. Because Ceph was developed as an open source solution from the outset, it was for a long time the easier choice in many cases, until GlusterFS became open source as well. Cloud services are a particularly important application area for distributed storage systems, and OpenStack is one of the most important software projects providing architectures for cloud computing. GlusterFS and Ceph both work equally well with OpenStack.


...