The assembler stalled when it could no longer write its temporary files.
I was running the CLC Genomics Workbench for my de novo assembly work. CLC Genomics Workbench is licensed software that bundles the packages for de novo assembly of DNA sequence reads and for downstream analysis of contigs. In the morning, when I checked the progress, there was a warning message on the screen that said: “There is not enough space in the root directory”. At that point the assembly was in the “contig generation” stage, and the process seemed stuck at 14%; it had taken about 24 hours to progress by 1%, from 13%.
The root directory ('/') on my system lives on /dev/sda2. When I checked, the free space was 0%. I found that the /var/log/cups/error_log file had consumed the majority of the disk space, and ~600 GB was freed upon deleting the file (but I noticed the same file being regenerated and refilled at an even faster pace).
Reading about the CLC server, I learned that it needs a lot of disk space for temporary files. These temporary files are required while performing the local assembly, and they are written to the system’s default temporary directory, by which I believe they mean '/tmp'. When I checked '/tmp', I found a FASTA file of ~185 GB.
When I checked the size of the /tmp folder as a whole, the result said /tmp held 174 GB.
The disk-usage analysis shows this output: almost 77% of the total used space in the root directory is consumed by the /tmp directory, and of this, 99% is claimed by the FASTA file, apparently generated by the CLC server. This cleared up my doubt about whether there was a specific size allocation for /tmp.
The output also lists several 'tmpfs' filesystems, but they don’t look anywhere near full (not more than 1% used). That said, I was not clear at first what exactly those 'tmpfs' entries do and how they differ from the /tmp directory. I later figured out that tmpfs mounts are something I should ignore in this situation: they are just RAM disks, and their total size is tied to the RAM available in the computer. In my case there is no separate mount for the /tmp directory; it is simply a folder within the root filesystem, so the space available to the root directory is all that matters.
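For anyone who wants to double-check this on their own machine, a couple of standard commands will show whether /tmp is its own mount and how much it holds (nothing here is specific to my setup):

findmnt /tmp       # prints an entry only if /tmp is a separate mount point
df -h /tmp         # shows which device backs /tmp (here, /dev/sda2)
sudo du -sh /tmp   # total size of everything under /tmp

If findmnt prints nothing, /tmp is just a directory on the root filesystem, which is exactly my situation.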
As of writing this, /dev/sda2 (where the root directory is mounted) has about 500 GB free. I had been confused about whether the /tmp directory was “full”, that is, whether there was a specific size allocation for /tmp. But the fact is there is no size allocation for /tmp; the whole capacity of the device on which it lives is available to it. At the same time, the error_log file located at /var/log/cups/error_log raises a challenge of its own, because it consumes disk space much faster than the assembler writes temporary files to /tmp. I don’t understand what the error is or why the error_log keeps being generated. So, what can I do to keep the root directory from filling up so fast, and leave room for the temporary files from the CLC server? Is it possible to stop the error_log? Or should I dedicate a separate partition to the /tmp directory?
A few suggestions from the Linux/Ubuntu experts
One of the responses I received said that, though it is not a recommended solution, you could purchase a 2-4 TB SSD and mount it at /tmp. After installing the SSD, format it, create a filesystem, create a mount point at /tmp, and edit /etc/fstab (a sketch of the full procedure follows the fstab entry below).
It is also possible to mount a partition from a different hard disk at /tmp. For example, if you have a spare partition (say /dev/sdb1; it cannot be /dev/sda2 here, since that already holds the root filesystem), it can be mounted at the /tmp mount point. The advice I received was to boot Linux from a live USB stick, format the partition with an ext4 filesystem, and mount it at /tmp; then open the nano editor and add the following entry to /etc/fstab.
UUID=abc1234-whatever /tmp ext4 defaults 0 0
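Putting the two suggestions together, here is a minimal sketch of the whole procedure. The device name /dev/sdb1 is an assumption for illustration; substitute whatever partition you actually dedicate to /tmp:

sudo mkfs.ext4 /dev/sdb1   # format the spare partition as ext4 (this destroys its contents)
sudo blkid /dev/sdb1       # print the partition's UUID for use in /etc/fstab
sudo nano /etc/fstab       # add: UUID=<uuid-from-blkid> /tmp ext4 defaults 0 0
sudo mount -a              # mount everything in fstab, including the new /tmp
sudo chmod 1777 /tmp       # restore the sticky, world-writable permissions /tmp requires

The last step matters: a freshly created ext4 filesystem mounts with restrictive permissions, and /tmp must be world-writable with the sticky bit set for ordinary programs to use it.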
The major reason the root directory filled up so quickly was the error_log located at /var/log/cups/error_log, which was consuming disk space at an alarming pace. Here is a screenshot of the contents of the error_log file.
The error_log file was deleted using the following command.
find . -type f -iname error_log -delete
Here we use the find command. The dot (.) after find tells it to start searching from the present directory, so the command has to be run from a directory above the log file (for example /var/log). The -type f option restricts the search to regular files. -iname error_log matches files named error_log, ignoring case. Finally, -delete removes each match.
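Because the dot ties the command to whatever directory you happen to be in, a slightly safer variant (assuming the log really lives under /var/log/cups, as it does here) gives find an absolute starting path and runs it with sudo, since that directory is owned by root:

sudo find /var/log/cups -type f -iname error_log -delete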
But this becomes a tedious process, because the error_log is regenerated at an alarming pace. One way to address the problem is to set up a timer that periodically truncates the error_log file.
The following command truncates the error_log file every 60 seconds.
while sleep 60; do : >|/var/log/cups/error_log; done
But when I ran this command, I hit an error as you see below.
I didn’t quite get why it raised a permission issue at first, since I thought the ownership was with the user. The most likely explanation is that the log lives in the root-owned /var/log/cups directory, and the truncation ( : >| file ) is a shell redirection performed by my own non-root shell; even prefixing the loop with sudo would not help, because sudo applies to the command, not to the redirection.
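A version that runs the entire loop, redirection included, inside a root shell should get around this. This is just a sketch, assuming sudo access:

sudo sh -c 'while sleep 60; do : > /var/log/cups/error_log; done'   # the redirection now runs as root

The truncate utility from coreutils does the same job and reads a little more clearly:

sudo truncate -s 0 /var/log/cups/error_log   # shrink the file to zero bytes without deleting it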
Another method is to disable logging in the application altogether, or to enable settings that produce less verbose logs.
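Since the runaway log here comes from CUPS, the relevant knob is the LogLevel directive in /etc/cups/cupsd.conf (this assumes a standard Ubuntu CUPS installation). Something like the following quiets the log down to warnings and errors only:

sudo nano /etc/cups/cupsd.conf   # change (or add) the line: LogLevel warn
sudo systemctl restart cups      # restart CUPS so the new level takes effect

If printing is not needed on this machine at all, stopping and disabling the cups service would silence the log entirely.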
Or, as another important workaround, one can change the default temporary directory that CLC Genomics Workbench uses to write its temporary files. In the CLC Server installation folder, there is a file called CLCServer.vmoptions.
This file can be opened in a text editor, and one needs to add a new line -Djava.io.tmpdir=/path/to/tmp, where the path to the new temporary directory is provided. Restart the CLC server for the change to take effect.
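As a sketch, the edit would look like this. The install path /opt/CLCGenomicsServer and the target directory /data/clc_tmp are assumptions for illustration; use the paths from your own installation:

sudo mkdir -p /data/clc_tmp                            # hypothetical directory on a disk with plenty of free space
sudo nano /opt/CLCGenomicsServer/CLCServer.vmoptions   # hypothetical install path; add the line below to the file

-Djava.io.tmpdir=/data/clc_tmp

-Djava.io.tmpdir is a standard JVM option, so any directory the CLC server process can write to should work; just make sure it exists before restarting the server.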
Finally
Finally, I ran the following command to periodically delete the error_log, so that it does not eat up the root directory and leave the assembler unable to write its temporary files.
watch -n 60 find . -type f -iname error_log -delete
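Because the dot again ties find to the current directory, and because the log sits in a root-owned location, a slightly more robust form of the same command (using the /var/log/cups/error_log path from earlier) is:

sudo watch -n 60 'find /var/log/cups -type f -iname error_log -delete'

This is still a stopgap; lowering the CUPS LogLevel, or pointing the CLC server's temporary directory elsewhere, remains the cleaner long-term fix.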