+4 votes
296 views
in Tools by (242k points)
Tar archive program: creating backups and archiving files


+5 votes
by (1.6m points)

Tar archive program: creating backups and archiving files

The tar program, whose name stands for Tape ARchiver, dates back to the time when backups were written to tape drives, and it is still going strong. Although home users rarely use it today, tar is still regarded as the preferred archiving tool on UNIX systems. It also lets you create regular incremental backups of servers. This article explains how the program works and lists the commands you need to perform backups with tar.

Index
  1. How does tar work?
    1. How to install tar?
    2. How to use tar?
  2. Backups with tar
    1. Creating backups with tar
    2. Incremental backup with tar
    3. Creating an incremental backup with tar
    4. Restore a system with a backup

How does tar work?

The tar archive program is used to archive files and folders on Linux and related systems. Strange as it may seem, it does not compress data in its standard form. Even so, the program is popular because it lets you combine entire folders into a single file. This design goes back to the program's history: on a tape drive, all data is written consecutively to magnetic tape, which explains the sequential, linear layout of the tar format, in which new files are appended to the end of the archive. The file that results from concatenating the files is also known as a tarball, since its contents are literally stuck together.

If you want to compress such an archive, you need to combine tar with a compression program such as gzip. The two complement each other perfectly, since gzip can only compress single files. For this reason, tar is always run first, followed by gzip or another compression program, which produces files ending in .tar.gz (often shortened to .tgz).
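
As a minimal sketch of the two-step process described above, the following commands bundle two files with tar and then compress the resulting tarball with gzip. The scratch directory and file names are made up for illustration.

```shell
#!/bin/bash
# Scratch directory so the example does not touch real data (hypothetical names).
demo=$(mktemp -d)
printf 'one\n' > "$demo/a.txt"
printf 'two\n' > "$demo/b.txt"

# Step 1: tar only bundles the files into one uncompressed archive.
tar -cf "$demo/bundle.tar" -C "$demo" a.txt b.txt

# Step 2: gzip compresses that single file and replaces it with bundle.tar.gz.
gzip "$demo/bundle.tar"

# -z lets tar read the gzip-compressed archive directly; -t lists its members.
members=$(tar -tzf "$demo/bundle.tar.gz")
echo "$members"
```

Running tar with -czf in a single step produces the same kind of file; the two-step form only makes the division of labour visible.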

How to install tar?

On Ubuntu, tar is installed by default. If it is missing on another Debian-based distribution, you can install it with the command below; other Linux or Unix distributions use their own package manager instead.

  sudo apt-get install tar tar-doc  

The tar-doc package is optional and contains documentation about the archiving program.

How to use tar?

tar is used with the following syntax:

  tar option file  

The most important options are:

Option | Description | Note
------ | ----------- | ----
--help | Shows all options. |
--version | Shows the installed version of tar. |
-c | Creates a new archive. (create) |
-d | Compares the files in the archive with those in the filesystem. (diff) |
-f | Writes the archive to the specified file or reads it from the given file. (file) | Must come last in a group of options, since the entry that follows is interpreted as the archive file.
-z | Compresses or decompresses the archive directly with gzip. | gzip must already be installed.
-Z | Compresses or decompresses the archive directly with compress. | compress must already be installed. Note the capital letter.
-j | Compresses or decompresses the archive directly with bzip2. | bzip2 must already be installed.
-J | Compresses or decompresses the archive directly with xz. | xz must already be installed. Note the capital letter.
-k | Prevents existing files from being overwritten during extraction. |
-p | Preserves access rights on extraction. |
-r | Appends a file to an existing archive. (append) | The file is attached to the end of the archive, which works only if the archive is not compressed.
-t | Lists the contents of an archive. (table) |
-u | Adds only files that are newer than their versions in the archive. (update) |
-v | Shows the archiving process. (verbose) |
-vv | Shows more detailed information about the archiving process. (very verbose) |
-w | Asks for confirmation of every action. |
-x | Extracts files from the archive. (extract) | The files remain in the archive.
-A | Appends the contents of an existing archive to another archive. | Note the capital letter.
-C | Sets the folder into which the files are extracted. | Note the capital letter.
-M | Creates, lists or extracts a multi-volume archive. | Note the capital letter.
-L | Changes the medium once the archive reaches a certain size. | The size is measured in kilobytes. Note the capital letter.
-W | Verifies the archive after it has been written. | Note the capital letter.
-P | Keeps absolute paths, that is, the leading / of file names is not removed. | Note the capital letter.
--exclude | Excludes files or folders. | Specify after the create option as --exclude=<file/folder>.
-X | Reads the files to exclude from a list. | Requires a previously created list: -X <list>.list. Note the capital letter.
-g | Records the state of all archived folders in a snapshot file, which is the basis for incremental backups. |

When creating tar archives you can also use wildcards such as the asterisk. Write the options first, then the name of the archive to create, and finally the files and folders it should contain. The following example creates an archive (-c) from two text files, compresses it with gzip (-z) and writes it to the file archiv.tar.gz (-f):

  tar -czf archiv.tar.gz ejemplo_1.txt ejemplo_2.txt  

If you want to group all the text files in a folder in a tar file, use the corresponding wildcard:

  tar -cf text_archiv.tar *.txt  

You can also store entire folders and their subfolders in a tar archive. In this example, the folder /carpeta_1 is archived with all its subfolders and the files they contain, except for the subfolder /carpeta_1/subcarpeta_x:

  tar -cf archiv.tar --exclude="/carpeta_1/subcarpeta_x" /carpeta_1  
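
The effect of --exclude can be checked on a small scratch tree. This is only a sketch: the temporary directory and the file names skip.txt and keep.txt are invented for illustration, and relative paths are used (via -C) so that nothing outside the scratch directory is touched.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir -p "$demo/carpeta_1/subcarpeta_x" "$demo/carpeta_1/subcarpeta_y"
printf 'skip\n' > "$demo/carpeta_1/subcarpeta_x/skip.txt"
printf 'keep\n' > "$demo/carpeta_1/subcarpeta_y/keep.txt"

# -C changes into the scratch directory first, so relative names are stored.
# --exclude is given before the folder name it should apply to.
tar -cf "$demo/archiv.tar" -C "$demo" --exclude="carpeta_1/subcarpeta_x" carpeta_1

# List the archive: subcarpeta_x and its contents should be missing.
listing=$(tar -tf "$demo/archiv.tar")
echo "$listing"
```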

The following example extracts (-x) the compressed archive (-z) created in the first example into another folder (-C):

  tar -xzf archiv.tar.gz -C /home/carpeta1/archiv_carpeta  

To append another file to a tar archive, which must not be compressed, use the following command:

  tar -rf archiv.tar ejemplo_extra.txt  
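
A short sketch of the append step, assuming GNU tar and a scratch directory created only for this example. It also shows that appending to a compressed copy of the archive is refused, as noted for -r in the table above.

```shell
#!/bin/bash
demo=$(mktemp -d)
printf '1\n' > "$demo/ejemplo_1.txt"
printf 'extra\n' > "$demo/ejemplo_extra.txt"

# Create an uncompressed archive, then append a second file with -r.
tar -cf "$demo/archiv.tar" -C "$demo" ejemplo_1.txt
tar -rf "$demo/archiv.tar" -C "$demo" ejemplo_extra.txt
count=$(tar -tf "$demo/archiv.tar" | wc -l)
echo "members after append: $count"

# Appending to a gzip-compressed copy is expected to fail (GNU tar behaviour).
gzip -c "$demo/archiv.tar" > "$demo/archiv.tar.gz"
if tar -rzf "$demo/archiv.tar.gz" -C "$demo" ejemplo_1.txt 2>/dev/null; then
  appended=yes
else
  appended=no
fi
echo "append to compressed archive succeeded: $appended"
```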

Backups with tar

Webmasters like to use tar for backups for two reasons: the folder structure is preserved, and the program's range of functions allows a great deal of fine-tuning, as the many options listed in the previous section show. The following sections explain how to use tar to create full backups and incremental backups.

Creating backups with tar

In any backup strategy it is advisable to write a backup script instead of creating archives manually. Automating the process lets you archive several folders, compress them, and transfer them to an external storage system. For this, you need read and write permission on the folders involved. First, create a bin folder in your home directory if you don't already have one, and create the script there. Below is an example script that you should adapt to your needs and folder structure:

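  #!/bin/bash
  # Timestamp that makes every backup file name unique
  DATE=$(date +%Y-%m-%d-%H%M%S)
  BACKUP_DIR="/carpetadestino/backup"
  SOURCE="$HOME/carpetaorigen"
  # $SOURCE stays unquoted so that several space-separated folders are passed as separate arguments
  tar -cvzpf "$BACKUP_DIR/backup-$DATE.tar.gz" $SOURCE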

The script works as follows, line by line:

  1. The first line is called the shebang. It tells the operating system which interpreter to use, in this case bash.

  2. Every backup created with tar gets a timestamp so that the backups can be told apart. The format specifiers produce year-month-day-hourminutesecond, for example 2017-09-07-152833.

  3. Next, define the folder in which the backup is to be created, without a trailing "/" after the last subfolder, because the tar command on the last line adds the separator itself.

  4. This line defines which folder or folders go into the archive. There can be more than one, as long as they are separated by a space: SOURCE="$HOME/carpetaorigen1 $HOME/carpetaorigen2 ". Here, too, no "/" is placed at the end of a folder. The example also leaves a space before the closing quote, which is harmless because the variable is expanded unquoted.

  5. The last line of the script contains the tar command:
    1. -cvzpf creates an archive (-c), shows the archiving process (-v), compresses with gzip (-z), preserves access rights (-p) and writes everything to the file that follows (-f). The options -v and -p in particular are optional, and you can add other options to adapt the backup.

    2. $BACKUP_DIR/backup-$DATE.tar.gz names the folder ($BACKUP_DIR) and the file in which the backup is saved. In our example, the folder name comes first, followed by the name of the backup file, the current timestamp and the file extension. Remember that if you choose a different compression method, you must change both the file extension and the command option.

    3. Finally, the $SOURCE variable tells tar what to archive. Note that with --exclude or -X you can exclude files or folders that you don't want in the backup.
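
As a hedged sketch of how an --exclude option fits into the backup command from the steps above, the following variant runs against a scratch directory. The cache subfolder and the file names are assumptions made for this example, and -v is left out to keep the output short.

```shell
#!/bin/bash
# Hypothetical scratch layout standing in for the real source and target folders.
demo=$(mktemp -d)
SOURCE="$demo/carpetaorigen"
BACKUP_DIR="$demo/backup"
mkdir -p "$SOURCE/cache" "$BACKUP_DIR"
printf 'keep\n' > "$SOURCE/notes.txt"
printf 'tmp\n'  > "$SOURCE/cache/junk.tmp"

DATE=$(date +%Y-%m-%d-%H%M%S)
archive="$BACKUP_DIR/backup-$DATE.tar.gz"

# Same idea as the script, plus an --exclude pattern for the cache folder.
# The warning about the leading "/" goes to stderr and is discarded here.
tar -czpf "$archive" --exclude="$SOURCE/cache" "$SOURCE" 2>/dev/null

# List what actually ended up in the backup.
tar -tzf "$archive"
```
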
Tip

In principle, the extension of a script file is irrelevant on Linux and Unix, since the system determines the file type by comparing the file's structure against a magic file, a database usually found in /etc/magic. Even so, it has become common practice to add an extension, since it helps the user keep an overview.

Now save the script as backup in the bin folder and add the folder to the PATH variable:

  PATH=$PATH:$HOME/bin  

The backup script you just created still has to be made executable:

  chmod u+x $HOME/bin/backup  

With this command only you, the owner (u), may execute the file, although the right can also be granted to the group (g), to others (o) or to all (a). This completes the setup, and the script can be run. If sudo reports that the command cannot be found (it often replaces PATH with its own secure path), call the script by its full path instead, for example sudo "$HOME/bin/backup".

  sudo backup  

If you want to restore the backup, that is, extract the tar archive, enter the following command:

  tar -xzf backup.tar.gz -C /  

The script creates a full backup. That is not always the best choice if you want to back up an entire server, so it is worth considering whether an incremental backup with tar would suit you better.

Note

When you create an archive using an absolute file path, tar displays a warning such as "tar: Removing leading `/' from member names". This is not an error message but a safety measure for restoring: tar turns /home/subfolder into home/subfolder. If you are not in the root folder when you extract the archive, tar creates a new folder structure, for example /home/subfolder/home/subfolder, which reduces the chance of overwriting an entire system by mistake. Keep in mind that Unix does not ask before overwriting. If there really is content you want to replace, either change to the root folder before extracting or use the -P option.
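
The behaviour described in this note can be observed safely in a scratch directory. The paths below are hypothetical and exist only for this sketch.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/restore"
printf 'data\n' > "$demo/src/file.txt"

# Archive a file by its absolute path; the warning on stderr is discarded here.
tar -cf "$demo/abs.tar" "$demo/src/file.txt" 2>/dev/null

# The member name is stored without the leading slash.
first_member=$(tar -tf "$demo/abs.tar" | head -n 1)
echo "$first_member"

# Extracting below another folder recreates the whole path there
# instead of overwriting the original file.
tar -xf "$demo/abs.tar" -C "$demo/restore"
ls "$demo/restore$demo/src"
```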

Incremental backup with tar

Webmasters commonly create backups at regular intervals to avoid data loss. If the live system fails, is compromised, or is deleted, a working version can be restored from the backup. The more frequently backups are made, the less data is lost if something happens to the system. However, making a full backup every time, that is, archiving all the data on the system, takes not only a long time but also a lot of storage space. Instead, you can create incremental backups with tar.

Every incremental backup requires a full backup as its basis. You therefore first archive the entire system, or the part of it you want to save, and each incremental backup then adds only the files that are new or have been modified since. To restore, you need the last full backup together with every incremental backup created after it. The data volume is much smaller, but restoring takes more effort. If one of the archives is lost, which is far less likely today than in the days of magnetic tape, the backup is incomplete.
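
The difference between the full and the incremental archive can be seen in a small experiment. This sketch assumes GNU tar, because the -g option used for incremental backups is a GNU extension; the folder and file names are invented.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/backup"
printf 'v1\n' > "$demo/data/a.txt"
printf 'v1\n' > "$demo/data/b.txt"
sleep 1   # keep file times clearly before the first backup

# First run: the snapshot file does not exist yet, so tar writes a full backup.
tar -czf "$demo/backup/backup-01.tar.gz" -g "$demo/backup/timestamp.dat" -C "$demo" data

sleep 1   # make sure the following changes get a later modification time
printf 'v2\n'  > "$demo/data/b.txt"
printf 'new\n' > "$demo/data/c.txt"

# Second run: only files changed since the snapshot are archived.
tar -czf "$demo/backup/backup-02.tar.gz" -g "$demo/backup/timestamp.dat" -C "$demo" data

incremental=$(tar -tzf "$demo/backup/backup-02.tar.gz")
echo "$incremental"
```

The unchanged a.txt appears only in the first archive, which is why the full backup is always needed for a restore.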

Creating an incremental backup with tar

With tar you can create incremental backups on a regular basis. For this you again write your own backup script, which can specify, for example, that a full backup is made once a month and an incremental backup every day. The script below also moves old backups regularly to folders named by date. Besides tar it relies on cron, a daemon (a program that runs in the background) that starts other processes at scheduled times. cron is installed by default on Ubuntu. Open a text editor and create the script:

  #!/bin/bash
  BACKUP_DIR="/carpetadestino/backup"
  ROTATE_DIR="/carpetadestino/backup/rotate"
  TIMESTAMP="timestamp.dat"
  SOURCE="$HOME/carpetaorigen"
  DATE=$(date +%Y-%m-%d-%H%M%S)
  EXCLUDE="--exclude=/mnt/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/*"

  cd /
  mkdir -p ${BACKUP_DIR}

  # Find the number of the most recent backup (00 if none exists yet)
  set -- ${BACKUP_DIR}/backup-??.tar.gz
  lastname=${!#}
  backupnr=${lastname##*backup-}
  backupnr=${backupnr%%.*}
  backupnr=${backupnr//\?/0}
  backupnr=$((10#${backupnr}))

  # After 30 backups, move the archives and the timestamp file to the rotation folder
  if [ "$((backupnr++))" -ge 30 ]; then
    mkdir -p ${ROTATE_DIR}/${DATE}
    mv ${BACKUP_DIR}/b* ${ROTATE_DIR}/${DATE}
    mv ${BACKUP_DIR}/t* ${ROTATE_DIR}/${DATE}
    backupnr=1
  fi

  # Rebuild the two-digit file name
  backupnr=0${backupnr}
  backupnr=${backupnr: -2}
  filename=backup-${backupnr}.tar.gz

  # $EXCLUDE already holds complete --exclude options, so it is passed without -X
  tar -cpzf ${BACKUP_DIR}/${filename} -g ${BACKUP_DIR}/${TIMESTAMP} $EXCLUDE ${SOURCE}

The script works as follows, step by step:

  • First, the interpreter is defined again.
  • Then the variables are set. New here are a folder for rotating the backups, that is, a kind of archive of old backups, and a file for the timestamp.
  • The example shows that you don't always have to include every folder in the tar archive. Here the contents of the mnt, proc, sys and tmp folders are excluded, but not the folders themselves, hence the "*". The data in these folders is either temporary or recreated by every system.
  • So that all paths are interpreted correctly, the script changes to the root folder with cd /.
  • mkdir creates the backup folder if it does not exist yet.
  • Next, the existing backups are read. Because the backups are numbered consecutively, this block of code finds the number of the last one by stripping all other components from the file name.
  • The script keeps only 30 backups at a time. Once that number is reached, it creates the rotation folder and moves all files starting with b or t into it with mv. This restriction works because the folder contains only files with those initials, namely the backup archives and the timestamp file. Finally, the script resets the backup number to 1. If fewer than 30 backups exist, the number is simply increased by 1 (++).
  • The script then more or less reverses what it did at the start: it builds a complete file name again, this time with the new number.
  • Finally, the script runs the tar command. Unlike the command for a full backup, it includes the -g option, which enables incremental backups. tar reads the timestamp of each file and compares it with the data stored in timestamp.dat to determine what has changed since the last backup; only those changes become part of the new archive.
Note

The script moves the daily backups to a new folder roughly once a month, so that only the most recent archives remain in the original backup folder. However, there is no built-in function that limits the number of archive folders, so you have to delete old ones manually.
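
The numbering logic of the script can be tried out in isolation. The helper below is a simplified stand-in, not the script's own code: the function name next_backup_name is hypothetical, and it uses printf for zero padding instead of the script's string slicing.

```shell
#!/bin/bash
# Derive the next backup file name from the last existing one.
next_backup_name() {
  last=$1                      # e.g. /carpetadestino/backup/backup-07.tar.gz
  nr=${last##*backup-}         # strip everything up to the last "backup-" -> 07.tar.gz
  nr=${nr%%.*}                 # strip the extensions                      -> 07
  case $nr in
    *\?*) nr=00 ;;             # unmatched glob "??" means no backup exists yet
  esac
  nr=${nr#0}                   # drop one leading zero so 08 and 09 are not read as octal
  nr=$((nr + 1))
  printf 'backup-%02d.tar.gz\n' "$nr"
}

first=$(next_backup_name '/carpetadestino/backup/backup-??.tar.gz')
ninth=$(next_backup_name '/carpetadestino/backup/backup-08.tar.gz')
echo "$first $ninth"
```

Rotation after 30 backups is deliberately left out here; the sketch only covers extracting and incrementing the number.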

With that, the script for creating incremental backups with tar is ready. Save the file as backup in the bin folder. Here, too, you have to add the folder to the path and make the script executable:

  PATH=$PATH:$HOME/bin
  chmod u+x $HOME/bin/backup

In theory, it should now be possible to start the backup script with sudo backup. The idea behind incremental backups, however, is to create daily backups in an automated process. This is done with cron through its crontab, a table with six fields:

Minutes | Hours | Days | Month | Weekday | Command
------- | ----- | ---- | ----- | ------- | -------
(0-59) | (0-23) | (1-31) | (1-12) | (0-7) |

In each field you can enter either a number within the range given in parentheses or an asterisk, which stands for all possible values. The weekday field has a special feature: it lets you specify that a command runs, for example, only on Mondays (1) or on working days (1-5). Sunday has two values, 0 and 7, since some people start the week on that day and others end it there.
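
To make the six fields concrete, the following sketch splits a crontab line into its parts with the shell's read builtin. The function name describe_cron and the sample entry are made up for illustration; cron itself does not provide such a command.

```shell
#!/bin/bash
describe_cron() {
  # read splits on whitespace; the last variable receives the rest of the line,
  # so a command containing spaces stays intact.
  read -r minute hour day month weekday command <<EOF
$1
EOF
  printf 'minute=%s hour=%s day=%s month=%s weekday=%s command=%s\n' \
    "$minute" "$hour" "$day" "$month" "$weekday" "$command"
}

# Hypothetical entry: 7:30 am on working days only.
summary=$(describe_cron '30 7 * * 1-5 /home/bin/backup')
echo "$summary"
```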

In the command line console open the cron editor mode with:

  sudo crontab -e  

Add the following line:

  30 7 * * * /home/bin/backup  

This entry runs the backup every day at 7:30 am. Because the script rotates the archives after 30 backups, a new full backup is created roughly once a month. As soon as you save the changes, the incremental backup is ready to use.

Note

Cron, like a web server, only works while the system is running. If you install the script on a desktop computer or laptop, you must make sure that the device is switched on every day at the scheduled time. Otherwise no backup is created, unless you use the anacron program, which postpones a task that was scheduled for a time when the system was off until the device is running again.

Restore a system with a backup

Although nobody wants it to happen, you may one day have to restore the system. Fortunately, this is relatively easy with tar and requires no additional script. It does take more than the single command used for a full backup, though, because by the nature of incremental backups several archives have to be extracted. Enter the following commands in the console:

  BACKUP_DIR=/carpetadestino/backup
  cd /
  for archiv in ${BACKUP_DIR}/backup-*.tar.gz; do
    tar -xpzf $archiv -C /
  done
Note

When you restore a system from a backup, all folders, and with them all important files, are overwritten.

So that you don't have to extract each tar archive individually, the commands use a for loop:

  1. First set the folder where the backups are located.
  2. Switch to the root folder with cd / so that the archives are extracted in the correct place.
  3. Now start a for loop. It repeats all the commands between do and done once for every matching file. To define the matches, give the path of your backups with an asterisk as a wildcard, since you want to extract all the archives in the folder.
  4. The tar command reads as follows: extract (-x), preserving access rights (-p), and decompress (-z) the archive (-f $archiv) into the root folder (-C /).
  5. done marks the end of the for loop.

Because the backups were numbered consecutively when they were created, they are also extracted one after the other, starting with the oldest. Keep in mind that the archives created after a full backup contain newer versions of the files in it. In the for loop, the old versions are therefore extracted first and then overwritten by the newer ones, so that the restored system ends up with the latest archived version of each file.
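
The effect of the extraction order can be verified on a scratch copy. This is a sketch under assumptions: it requires GNU tar for -g, and all folders and file names are invented so that nothing is restored into the real root folder.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/backup" "$demo/restore"
printf 'old\n' > "$demo/data/config.txt"
sleep 1

# Full backup followed by one incremental backup of the changed file.
tar -czf "$demo/backup/backup-01.tar.gz" -g "$demo/backup/timestamp.dat" -C "$demo" data
sleep 1
printf 'new\n' > "$demo/data/config.txt"
tar -czf "$demo/backup/backup-02.tar.gz" -g "$demo/backup/timestamp.dat" -C "$demo" data

# The glob expands in sorted order, so backup-01 is extracted before backup-02.
for archiv in "$demo"/backup/backup-*.tar.gz; do
  tar -xpzf "$archiv" -C "$demo/restore"
done

# The restored file carries the content of the most recent backup.
restored=$(cat "$demo/restore/data/config.txt")
echo "$restored"
```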

The main purpose of incremental backups is to allow a complete system restore. With a small detour, however, you can also recover individual files or a version older than the most recent one:

  BACKUP_DIR=/carpetadestino/backup
  ls -l ${BACKUP_DIR}
  for archiv in ${BACKUP_DIR}/backup-*.tar.gz; do
    tar -tzf $archiv | grep fichero-buscado
  done

As in the previous case, a for loop is used, but this time mainly for searching rather than extracting:

  1. Set the backup folder again.
  2. The ls command lists all files and folders in the backup folder. The -l option provides detailed information.
  3. Open a for loop, as you did when restoring the complete system.
  4. The difference from the previous process lies in the options of the tar command. Instead of creating (c) or extracting (x) an archive, the contents of the archive are listed (t), and the output is passed through a pipe (vertical bar) to the grep command, so that you don't have to look through the listing yourself. grep searches the listing for the file you are looking for.
  5. The for loop ends.
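
The search steps above can be sketched on two small test archives. The scratch directory and the file otro are assumptions for this example; fichero-buscado is the placeholder name used in the command above.

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/backup"
printf 'x\n' > "$demo/data/fichero-buscado"
printf 'y\n' > "$demo/data/otro"

# Two archives: only the first one contains the file being searched for.
tar -czf "$demo/backup/backup-01.tar.gz" -C "$demo" data
tar -czf "$demo/backup/backup-02.tar.gz" -C "$demo" data/otro

matches=""
for archiv in "$demo"/backup/backup-*.tar.gz; do
  # grep -q only reports whether the member list contains the name.
  if tar -tzf "$archiv" | grep -q 'fichero-buscado'; then
    matches="$matches $(basename "$archiv")"
  fi
done
echo "found in:$matches"
```

Collecting the archive names, rather than only printing the matching paths, makes it easier to see which backup holds which version.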

The terminal now shows the file you are looking for, probably several times if you have worked on it regularly and it appears in many incremental backups. Note the file path and write a new loop that restores the last saved version:

  for archiv in ${BACKUP_DIR}/backup-*.tar.gz; do
    # the last argument is the path of the file inside the archive, without a leading /
    tar -xzf $archiv -C / carpetadestino/backup/archivo-buscado
  done

The file is now restored to its original location, overwriting any newer version that may exist there.


...