Vijithkumar V

Article: The Chromium de novo assembly solution.

Updated: Oct 16, 2023

The de novo assembly solution

The Chromium de novo solution generates a diploid assembly of a diploid genome; that is, it assembles both alleles. 10x Linked-Reads are generated from a single DNA library using Chromium’s de novo whole-genome library preparation process.


Is 10x Genomics an easy way to get your de novo genome assembly done?

Getting a genome assembled, especially that of a higher animal or one that contains many repeats, is not easy. It is a long and arduous process; sometimes it involves an additional upstream step, such as inbreeding, to obtain a good DNA sample. And once the reads are ready, a bioinformatician still has to assemble them.

This is where 10x Genomics comes in: its laboratory and bioinformatics processes are turnkey and cookbook-style. The company provides easy-to-follow instructions for DNA extraction, library preparation, and sequencing. Once sequencing is complete, 10x Genomics’ assembly software, Supernova 2.1.0, can be run to assemble the genome. Supernova 2.1.0 does not expect the user to be very proficient in the fundamentals of computer programming; all the software requires is the batch of FASTQ files associated with your library.


Chromium de novo assembly preparation does not require much DNA sample.

For Chromium de novo assembly preparation, approximately 1 ng of DNA is required. This means you do not need inbreeding or clonal selection to obtain enough material, so you avoid introducing extra heterozygosity and the complications associated with mixing wild samples. For library preparation, the 10x Genomics protocol recommends a loading mass of 0.6 ng of DNA for organisms with a genome size below 1.6 Gb. If the genome size is in the range 1.6 - 3.2 Gb, the loading mass should be interpolated between 0.6 and 1.2 ng. For genome sizes between 3.2 Gb and 4.0 Gb, the DNA to be loaded is 1.2 ng. Genomes larger than 4.0 Gb are not recommended.
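The loading-mass rule can be written down as a small helper. This is only a sketch: the function name is hypothetical, and it assumes the interpolation between 0.6 ng and 1.2 ng is a straight line over the 1.6 - 3.2 Gb range, which the protocol summary above does not spell out.

```python
def loading_mass_ng(genome_size_gb: float) -> float:
    """Suggested DNA loading mass (ng) for a genome size given in Gb.

    Hypothetical helper based on the ranges quoted in this article.
    """
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    if genome_size_gb > 4.0:
        raise ValueError("genomes larger than 4.0 Gb are not recommended")
    if genome_size_gb < 1.6:
        return 0.6
    if genome_size_gb <= 3.2:
        # Assumption: linear interpolation between (1.6 Gb, 0.6 ng)
        # and (3.2 Gb, 1.2 ng).
        return 0.6 + (genome_size_gb - 1.6) * (1.2 - 0.6) / (3.2 - 1.6)
    return 1.2


if __name__ == "__main__":
    for size in (0.14, 2.4, 3.6):
        print(f"{size} Gb -> {loading_mass_ng(size):.2f} ng")
```

Under the linear assumption, a 2.4 Gb genome sits halfway through the range and lands on 0.9 ng.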


Reduced cost of sequencing.

The Chromium de novo assembly solution costs much less than other sequencing technologies, especially long-read sequencing. Because it requires very little DNA (nanogram quantities) and involves no upstream process, the cost is lower. Just as importantly, the Chromium de novo library is sequenced on Illumina’s lower-cost platforms such as NovaSeq, HiSeq X, and HiSeq 2500, so the sequencing cost is not exorbitant. Beyond the monetary investment, 10x Genomics’ Linked-Reads can be assembled using the Supernova 2.1.0 software package, which demands less programming effort.

How to make Supernova work for your genome of interest?

Supernova has been tested on a wide repertoire of organisms. Its scope of applicability has been characterized by testing on genomes that vary in many features. The smallest genome tested with Supernova is 140 Mb, and the largest is 3.2 Gb. A genome larger than 4.0 Gb should be considered experimental because it has not yet been tested with Supernova. When sequencing a genome, a coverage in the range 38 - 56X is recommended. If you know the genome size, you can manually calculate the number of reads that corresponds to a recommended coverage. It is also recommended to have no more than 2.14 billion reads. Optimally, the read length should be 150 bp.
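The manual calculation can be sketched in a few lines of Python. The sketch assumes coverage is simply number of reads × read length / genome size, using raw read length; the function and constant names are illustrative and are not part of Supernova.

```python
import math

MAX_READS = 2_140_000_000        # upper limit on reads quoted above
RECOMMENDED_COVERAGE = (38, 56)  # recommended coverage range (X)


def reads_for_coverage(genome_size_bp: int, coverage: float,
                       read_length_bp: int = 150) -> int:
    """Number of reads needed to reach a target coverage.

    Assumes coverage = reads * read_length / genome_size.
    """
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def recommended_read_range(genome_size_bp: int, read_length_bp: int = 150):
    """(min, max) read counts for the recommended coverage range."""
    low, high = RECOMMENDED_COVERAGE
    return (reads_for_coverage(genome_size_bp, low, read_length_bp),
            reads_for_coverage(genome_size_bp, high, read_length_bp))


if __name__ == "__main__":
    lo_reads, hi_reads = recommended_read_range(2_600_000_000)
    print(f"reads needed: {lo_reads:,} to {hi_reads:,}")
    print("within read limit:", hi_reads <= MAX_READS)
```

For a 2.6 Gb genome and 150 bp reads, this works out to roughly 659 to 971 million reads, comfortably under the 2.14 billion limit.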

For instance, in my case, the total number of reads generated was 378 million, for a genome of 2.6 Gb, as estimated by Supernova in a preliminary run. The TELL-Seq library was sequenced on Illumina’s NovaSeq 6000, and the linked reads are 150 bp long. From this information, I calculated the genome coverage (number of reads × read length / genome size), and it was around 21X. This is far lower than the recommended minimum coverage of 38X, so Supernova is not recommended for this assembly.
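The same arithmetic can be checked quickly in Python. This is a rough sketch using the rounded figures quoted in this example; the function names are hypothetical, and Supernova's own coverage estimate may differ because it accounts for details such as trimming that are ignored here.

```python
MIN_COVERAGE = 38  # recommended lower bound (X)
MAX_COVERAGE = 56  # recommended upper bound (X)


def estimated_coverage(n_reads: int, read_length_bp: int,
                       genome_size_bp: int) -> float:
    """Raw coverage estimate: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def in_recommended_range(coverage: float) -> bool:
    """True if coverage falls inside the recommended 38 - 56X window."""
    return MIN_COVERAGE <= coverage <= MAX_COVERAGE


if __name__ == "__main__":
    cov = estimated_coverage(378_000_000, 150, 2_600_000_000)
    print(f"estimated coverage: {cov:.1f}X")
    print("within recommended range:", in_recommended_range(cov))
```

With the rounded inputs this comes to roughly 21.8X, close to the figure stated in the text and well short of the 38X minimum.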

Preparation of long, undamaged DNA is important for a good assembly.

We need DNA from a single individual; DNA from a clonal population can also be used. When isolating DNA from a clonal population, take care that DNA from wild individuals does not get mixed in.

An upstream inbreeding step is not required because only a few nanograms of DNA are needed. Normally, intact long DNA molecules can be easily obtained from sample types such as cell lines and blood. The long genomic DNA molecules resulting from the fragmentation process are used to generate barcoded linked reads. These molecules are partitioned into Gel Bead-in-Emulsions (GEMs), and the short reads generated from the molecules in a given GEM carry the same barcode. The length of the genomic DNA molecules is key here: it dictates the quality of the assembly. If the molecule length is less than 50 kb, it is problematic; if it is less than 20 kb, it is highly problematic. For example, in our TELL-Seq library preparation, the genomic DNA molecules had an average length of 16 kb, with only about 3 - 4 reads per DNA molecule. This is a seriously problematic situation and can result in a poor-quality assembly.
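The two length thresholds can be turned into a simple pre-assembly check. This is a minimal sketch with a hypothetical function name; the cut-offs are the 50 kb and 20 kb values given in this article, and the labels are informal.

```python
def molecule_length_flag(mean_length_kb: float) -> str:
    """Classify mean input DNA molecule length using the article's cut-offs."""
    if mean_length_kb <= 0:
        raise ValueError("molecule length must be positive")
    if mean_length_kb < 20:
        return "highly problematic"
    if mean_length_kb < 50:
        return "problematic"
    return "acceptable"


if __name__ == "__main__":
    # The TELL-Seq library described in this article averaged 16 kb.
    print(molecule_length_flag(16))
```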
