Ceph requires several computers to be connected to each other in what is called a cluster (literally "group" or "heap", that is, a set of several computers). Each connected computer is called a node.
In a cluster there are different types of nodes, depending on the tasks they perform (a short sketch after this list illustrates the roles):
- Monitor nodes: They track the status of every node in the cluster and, in particular, monitor the manager service, the object storage service and the metadata server (MDS) components. To provide redundancy, it is recommended to run at least three monitor nodes.
- Manager nodes: They keep track of space utilization, system load and the utilization level of the individual nodes.
- Ceph OSDs (Object Storage Devices): These are the back-end services that actually manage the files: they are responsible for storing, replicating and recovering the data. It is recommended to have at least three OSDs in the cluster.
- Metadata servers (MDSs): For performance reasons, they store metadata such as storage paths, timestamps and names of the files saved in CephFS. Because CephFS follows the POSIX standard, this metadata can be queried with Unix command-line tools such as ls, find and the like.
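To make these roles more concrete, here is a minimal Python sketch (not Ceph code; the node names and layout are purely illustrative) that models a small cluster and checks the minimums recommended above: at least three monitors and at least three OSDs.

```python
from collections import Counter

# Hypothetical description of a small cluster: each node is listed
# with the role(s) it runs. Names and layout are illustrative only.
cluster = {
    "node1": ["mon", "mgr", "osd"],
    "node2": ["mon", "mgr", "osd"],
    "node3": ["mon", "osd"],
    "node4": ["mds"],
}

# Count how many nodes run each role.
role_counts = Counter(role for roles in cluster.values() for role in roles)

# The recommendation above: at least three monitors and three OSDs,
# so the cluster stays available if a node fails.
for role, minimum in (("mon", 3), ("osd", 3)):
    status = "ok" if role_counts[role] >= minimum else "too few"
    print(f"{role}: {role_counts[role]} (recommended >= {minimum}) -> {status}")
```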
The key component of data storage is an algorithm called CRUSH (Controlled Replication Under Scalable Hashing). Using an allocation table known as the CRUSH map, this algorithm can locate an OSD holding the requested file.
Ceph distributes files in a pseudo-random way, meaning their placement can appear arbitrary. In reality, however, CRUSH calculates the most suitable location based on criteria defined by the network administrator. In the process, files are also replicated, and the copies are stored on separate physical media.
Files are assigned to so-called placement groups by treating the file name as a hash value. The placement also depends on other characteristics, for example the number of replicas to be kept of each file.
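The following Python sketch illustrates this idea in a heavily simplified form (it is not the real CRUSH algorithm and the parameters are invented): the file name is hashed to select a placement group, and the placement group is then mapped deterministically to a set of distinct OSDs, one per replica.

```python
import hashlib

# Illustrative parameters only; real clusters configure these per pool.
NUM_PGS = 128                           # placement groups in the pool
OSDS = [f"osd.{i}" for i in range(6)]   # available OSDs
REPLICAS = 3                            # copies kept of each object

def placement_group(object_name: str) -> int:
    """Map an object name to a placement group by hashing the name."""
    digest = hashlib.md5(object_name.encode()).hexdigest()
    return int(digest, 16) % NUM_PGS

def osds_for_pg(pg: int, replicas: int = REPLICAS) -> list[str]:
    """Pick `replicas` distinct OSDs for a placement group.

    The real CRUSH algorithm walks a weighted hierarchy (hosts, racks, ...)
    so that copies land on separate physical media; here we simply start
    at a position derived from the placement group and take the next OSDs.
    """
    start = pg % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(replicas)]

# The same name always yields the same placement, so no central
# lookup table of file locations is needed.
name = "photo-2021-06-01.jpg"
pg = placement_group(name)
print(f"{name} -> pg {pg} -> {osds_for_pg(pg)}")
```

Because the placement is computed rather than looked up, any client that knows the cluster layout can find the responsible OSDs on its own, which is what makes this approach scale.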