Ceph? A practical storage solution for businesses of any size



Ceph is a complete storage solution with its own file system, the Ceph File System (CephFS). With Ceph, data can be distributed across various components of the network and saved on different physical storage media. Ceph offers great flexibility in the choice of storage media, as well as high scalability.

Index
  1. What to know about Ceph and its main characteristics
  2. How Ceph works
  3. Access to saved data
  4. Alternatives to Ceph
  5. Advantages and disadvantages of Ceph
    1. Advantages of Ceph
    2. Disadvantages of Ceph

What to know about Ceph and its main characteristics

Ceph was the brainchild of Sage A. Weil, who developed it as part of his PhD project and published it in 2006. He later continued to lead the project through his company, Inktank Storage. In 2014 the company was acquired by Red Hat, after which Weil continued to oversee the system architecture and thus remained primarily responsible for developing the concept.

Ceph runs only on Linux systems and is compatible with, for example, CentOS, Debian, Fedora, Red Hat Enterprise Linux (RHEL), openSUSE, and Ubuntu. Windows systems cannot access Ceph directly, but they can reach it through iSCSI (Internet Small Computer System Interface). For this reason, Ceph is especially suitable for data centers that make their storage available to users through servers, as well as for all kinds of cloud solutions that use software to manage storage.

Here we summarize the most important features of Ceph:

How Ceph works

Ceph requires several computers to be connected to each other in what is called a cluster (literally a group or heap, that is, a set of several computers). Each connected computer is called a node.

In a cluster there are different types of nodes, depending on the tasks they perform:

  • Monitor nodes: They track the status of each node in the cluster and in particular monitor the manager service, object storage service, and metadata server (MDS) components. For reliability, at least three monitor nodes are recommended.
  • Managers: They keep track of storage utilization, system load, and how heavily the nodes are used.
  • Ceph OSDs (Object Storage Daemons, also expanded as Object Storage Devices): These are the back-end services that actually manage the files: they are responsible for storing, replicating, and restoring the data. At least three OSDs per cluster are recommended.
  • Metadata servers (MDSs): For performance reasons, they store metadata such as storage paths, timestamps, and names of the files saved in CephFS. They follow the POSIX standard and can be queried with Unix command-line tools such as ls, find, and the like.
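
The recommendation of at least three monitor nodes can be motivated with a little arithmetic. The sketch below assumes that monitors need a strict majority (a quorum) to agree on the cluster state; the function names are hypothetical and only illustrate the counting.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors may fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for n in (1, 2, 3, 5):
    print(f"{n} monitors: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption, one or two monitors tolerate no failure at all, while three monitors survive the loss of one, which is why three is the smallest sensible number.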

The key component of data storage is an algorithm called CRUSH (Controlled Replication Under Scalable Hashing). Using an assignment table, this algorithm can locate an OSD that holds the requested file.

Files are distributed in Ceph in a pseudo-random way, that is, in such a way that their locations appear arbitrary. In reality, however, CRUSH calculates the most suitable place to store them based on criteria defined by the network administrator. In doing so, it also replicates the files and stores the copies on separate physical media.

The files are assigned to so-called placement groups by processing the file name into a hash value. Another factor that determines the location is, for example, the number of replicas of the file.
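
The two steps described here (hash the name into a placement group, then pick storage locations deterministically) can be sketched as follows. This is not the real CRUSH algorithm: it swaps in rendezvous hashing as a simple stand-in, and the OSD names, host names, `pg_num`, and replica count are all made-up values for illustration.

```python
import hashlib

# Hypothetical cluster layout: OSD name -> host it lives on.
OSD_HOSTS = {
    "osd.0": "host-a", "osd.1": "host-a",
    "osd.2": "host-b", "osd.3": "host-b",
    "osd.4": "host-c", "osd.5": "host-c",
}


def stable_hash(text: str) -> int:
    """Deterministic integer hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def placement_group(object_name: str, pg_num: int) -> int:
    """Map an object name to one of pg_num placement groups."""
    return stable_hash(object_name) % pg_num


def osds_for_pg(pg: int, replicas: int) -> list[str]:
    """Pick `replicas` OSDs on distinct hosts by ranking OSDs with a per-PG score."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    ranked = sorted(OSD_HOSTS, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    chosen, used_hosts = [], set()
    for osd in ranked:
        host = OSD_HOSTS[osd]
        if host not in used_hosts:          # keep copies on separate hosts
            chosen.append(osd)
            used_hosts.add(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct hosts for the requested replica count")


pg = placement_group("report.pdf", pg_num=128)
print(pg, osds_for_pg(pg, replicas=3))
```

The point of the sketch is that the result looks random but is fully reproducible: anyone who knows the layout computes the same locations without asking a central directory.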

Note

A hash value is a sequence of characters that results from processing input data through computational operations. A simple illustration would be adding up the digits that make up the input. In practice, of course, highly complex algorithms are used that generate a practically unique fingerprint from long data streams. The result always has the same compact length and contains no unwanted characters, so it is also suitable for processing file names.
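
To make the note concrete, the following sketch contrasts the toy "add up the digits" idea with a real hash function from Python's standard library. SHA-256 is used only as a familiar example; the note does not say which hash function Ceph applies to names.

```python
import hashlib


def digit_sum_hash(number_text: str) -> int:
    """Toy hash: add up the decimal digits. Simple, but many inputs collide."""
    return sum(int(ch) for ch in number_text if ch in "0123456789")


def fingerprint(data: bytes) -> str:
    """SHA-256 digest as 64 hexadecimal characters, whatever the input length."""
    return hashlib.sha256(data).hexdigest()


# Two different inputs, one toy hash value: a collision.
print(digit_sum_hash("1234"), digit_sum_hash("4321"))

# Short and long names both yield a digest of the same length.
for name in (b"a.txt", b"a very long file name with spaces and symbols !?.txt"):
    digest = fingerprint(name)
    print(len(digest), digest)
```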

To protect the data, journaling is used at the OSD level. Files to be stored are held there temporarily until they have been written correctly to all the intended OSDs.
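
A minimal in-memory sketch of this journaling idea follows, under the assumption that a write is first recorded in a journal and only removed from it once every target OSD holds the data. The class and method names are hypothetical and do not mirror Ceph's internal implementation.

```python
class JournaledStore:
    """Toy model: journal first, then copy to every OSD, then clear the journal."""

    def __init__(self, osd_names):
        self.osds = {name: {} for name in osd_names}  # OSD name -> stored objects
        self.journal = {}                              # writes not yet on all OSDs

    def write(self, key, data, fail_on=None):
        """Store data on all OSDs; `fail_on` simulates one OSD being unreachable."""
        self.journal[key] = data                       # 1. record the pending write
        for name, objects in self.osds.items():        # 2. copy to each OSD in turn
            if name == fail_on:
                return False                           # entry stays in the journal
            objects[key] = data
        del self.journal[key]                          # 3. all copies are in place
        return True

    def replay(self):
        """Retry every pending journal entry, e.g. once a failed OSD is back."""
        for key in list(self.journal):
            self.write(key, self.journal[key])


store = JournaledStore(["osd.0", "osd.1", "osd.2"])
store.write("a.txt", b"data", fail_on="osd.1")   # interrupted write
print(sorted(store.journal))                      # the write is still pending
store.replay()
print(sorted(store.journal))                      # nothing pending any more
```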

Access to saved data

The foundation of data storage in Ceph is called RADOS (Reliable Autonomic Distributed Object Store): a reliable, distributed object store comprised of self-healing, self-mapping, intelligent storage nodes.

Saved files can be accessed using different methods:

  • librados: Data can be accessed natively with the librados library through programming interfaces (APIs) for programming and scripting languages such as C/C++, Python, Java, or PHP.
  • radosgw: Through this gateway, data can be read or written over HTTP.
  • CephFS: This is Ceph's own file system, which conforms to the POSIX standard, offers a kernel module for client computers, and also supports FUSE (Filesystem in Userspace, which lets file systems be mounted without administrator rights).
  • RADOS Block Device: Integrates as block storage through kernel modules or virtualization systems such as QEMU/KVM.
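
As an illustration of the native route, here is a hedged sketch using the `rados` Python binding for librados. The configuration path `/etc/ceph/ceph.conf`, the pool name `mypool`, and the helper names are assumptions; `open_pool` only works with the binding installed, a reachable cluster, and suitable credentials.

```python
def open_pool(pool_name="mypool", conffile="/etc/ceph/ceph.conf"):
    """Connect to a cluster and return (cluster, ioctx). Both defaults are assumptions."""
    import rados  # librados binding; imported lazily so the helpers below load without it

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    return cluster, cluster.open_ioctx(pool_name)


def put_object(ioctx, name, data):
    """Write `data` (bytes) as the complete content of object `name`."""
    ioctx.write_full(name, data)


def get_object(ioctx, name):
    """Read a whole object; stat() supplies the size because read() has a default length limit."""
    size, _mtime = ioctx.stat(name)
    return ioctx.read(name, length=size)
```

With a live cluster, calling `cluster, ioctx = open_pool()`, then `put_object(ioctx, "greeting", b"hello ceph")` and `get_object(ioctx, "greeting")` would round-trip the data; `ioctx.close()` and `cluster.shutdown()` release the connection afterwards.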

Alternatives to Ceph

The best-known alternative is GlusterFS, which also belongs to the Linux distributor Red Hat and is available for free. Gluster takes a similar approach: it likewise pools distributed storage into a single storage space on the network. As you might expect, both GlusterFS and Ceph have their advantages and disadvantages.

There are also other free alternatives such as XtreemFS and BeeGFS. For Windows servers, Microsoft offers commercial software-based storage solutions, including Storage Spaces Direct (S2D).

Advantages and disadvantages of Ceph

While Ceph is the best option in many situations, it also has drawbacks.

Advantages of Ceph

Ceph is free and already well established, despite being a relatively young project. Many guides to installation and maintenance can therefore be found online, and Ceph also has very good official documentation. The acquisition of the project by Red Hat suggests that it will likely continue to be developed for some time. Its scalability and built-in redundancy provide data security and network flexibility. Furthermore, the CRUSH algorithm ensures the availability of data.

Note

In this context, redundancy means excess. In computer science, data that is stored more than once is redundant. Redundancy is usually created on purpose to guarantee the survival of the data in the event of a failure, which can be achieved at both the software and hardware level: on the one hand, data or the information needed to restore it can be stored repeatedly; on the other, several physical storage media can be kept available to compensate for the possible breakdown of one of the computers.
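
The cost of this deliberate redundancy is easy to quantify. The sketch below assumes simple replication, where every object is stored a fixed number of times; the capacity figures are invented for the example.

```python
def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Capacity left for unique data when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def copies_surviving(replicas: int, failed_media: int) -> int:
    """Worst case: each failed medium held one copy of the same object."""
    return max(replicas - failed_media, 0)


print(usable_capacity_tb(12.0, 3))   # 4.0
print(copies_surviving(3, 2))        # 1
```

With three copies, 12 TB of raw space holds 4 TB of unique data, and an object still survives the loss of two of the media that stored it.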

Disadvantages of Ceph

Because of the wide variety of components involved, an extensive network is required to take advantage of Ceph's strengths. Furthermore, it is relatively laborious to set up, and it is not always clear to the user exactly where the data is being stored.


...