Genome Assembly Workflow (version 0)

Genome assembly of Araucaria araucana was based on an initial pool of 139,482,438 reads comprising approximately 16 TB. These reads were generated with an Oxford Nanopore Technologies PromethION instrument using genomic DNA libraries prepared with the SQK-LSK114 ligation sequencing kit and sequenced on fourteen FLO-PRO114M flow cells. Basecalling was performed independently with Dorado version 0.3.3 using the model dna_r10.4.1_e8.2_400bps_hac@v3.5.2 and default options.

Porechop version 0.2.4 (Wick et al., 2017) was used to identify and remove reads containing middle adapters, with the option --middle_threshold set to 80. Because shorter reads were abundant, the reads identified as non-chimeric were filtered further by taking the subset of the longest reads that cumulatively provides an estimated 25X coverage of an expected genome size of 20 Gb. This resulted in 29,877,344 input reads (approximately 500 Gb) with an N50/N90 of 16,153/12,679 nucleotides.
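The length-based subsetting described above can be sketched as follows. This is a minimal illustration and not the script used for this assembly: the function name select_longest_reads and the toy read lengths are hypothetical. It assumes the read lengths fit in memory and that reads are taken in descending length order until the cumulative length first reaches genome size times target coverage.

```python
from typing import Iterable, List, Tuple


def select_longest_reads(
    read_lengths: Iterable[Tuple[str, int]],
    genome_size: int,
    target_coverage: float,
) -> List[Tuple[str, int]]:
    """Keep the longest reads until their cumulative length reaches the target.

    Hypothetical helper: the target number of bases is
    genome_size * target_coverage.
    """
    target_bases = genome_size * target_coverage
    selected: List[Tuple[str, int]] = []
    total = 0
    # Walk through the reads from longest to shortest.
    for read_id, length in sorted(read_lengths, key=lambda r: r[1], reverse=True):
        if total >= target_bases:
            break
        selected.append((read_id, length))
        total += length
    return selected


# Toy data (invented for illustration): a 1,000 nt "genome" at 2.5X coverage.
toy_reads = [("r1", 500), ("r2", 1200), ("r3", 800), ("r4", 300), ("r5", 1000)]
subset = select_longest_reads(toy_reads, genome_size=1000, target_coverage=2.5)
subset_ids = [read_id for read_id, _ in subset]

# Target for the values in the text: 20 Gb genome at 25X coverage.
target_for_text = 20_000_000_000 * 25
```

With the values given in the text, the target is 20,000,000,000 x 25 = 500,000,000,000 bases, which matches the approximately 500 Gb of retained reads.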

Reads were assembled and polished (one iteration) with Flye version 2.9.3-b1797 using the option --nano-hq and a minimum overlap of 10,000 nucleotides. The de novo assembly comprised 91,326 contigs (N50: 913,876 nucleotides) with a total length of 20,202,725,727 nucleotides. It is a haploid assembly in which alleles are expected to be collapsed. Contig sizes ranged from 160 to 9,118,124 nucleotides, averaging approximately 221 kb.
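For readers unfamiliar with these summary statistics, the sketch below shows one common way to compute them from a list of sequence lengths. The names nx and summarize and the toy lengths are hypothetical. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% of the total length; other tools may differ slightly in edge cases.

```python
from typing import Dict, Sequence


def nx(lengths: Sequence[int], fraction: float) -> int:
    """Return the Nx value, e.g. fraction=0.5 for N50 and 0.9 for N90."""
    threshold = sum(lengths) * fraction
    running = 0
    # Accumulate lengths from longest to shortest until the threshold is met.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


def summarize(lengths: Sequence[int]) -> Dict[str, float]:
    """Collect the statistics reported for reads or contigs."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "N50": nx(lengths, 0.5),
        "N90": nx(lengths, 0.9),
    }


# Toy lengths invented for illustration.
stats = summarize([100, 200, 300, 400, 500])

# Mean contig length from the totals reported in the text, in kb.
mean_contig_kb = 20_202_725_727 / 91_326 / 1000
```

Dividing the reported total length by the reported number of contigs gives a mean of roughly 221 kb, consistent with the average stated above.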

To assess the de novo assembly, Benchmarking Universal Single-Copy Orthologs (BUSCO, version 5.7.0; Manni et al., 2021) was run with the embryophyta_odb10 lineage dataset on the complete set of contigs, yielding 1,339 complete BUSCOs (83.0%).
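The reported percentage can be checked with a short calculation. This sketch assumes that the embryophyta_odb10 dataset contains 1,614 BUSCO groups; that figure is not stated in the text above. The function name busco_percent is hypothetical.

```python
def busco_percent(count: int, total: int) -> float:
    """Percentage of BUSCO groups in a category, rounded to one decimal."""
    return round(100 * count / total, 1)


# Assumption: embryophyta_odb10 comprises 1,614 BUSCO groups.
EMBRYOPHYTA_ODB10_TOTAL = 1614

complete_pct = busco_percent(1339, EMBRYOPHYTA_ODB10_TOTAL)
```

Under that assumption, 1,339 complete BUSCOs correspond to 83.0% of the dataset, in agreement with the value reported above.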


An annotation transfer from Pinus tabuliformis was performed with Liftoff. We are currently working on a version 1 annotation of the A. araucana genome.


Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol. 2021;38(10):4647-4654. https://doi.org/10.1093/molbev/msab199

Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3(10):e000132. Published 2017 Sep 14. https://doi.org/10.1099/mgen.0.000132
