INVESTIGATION

Untangling Heteroplasmy, Structure, and Evolution of an Atypical Mitochondrial Genome by PacBio Sequencing

Jean Peccoud,*,1 Mohamed Amine Chebbi,* Alexandre Cormier,* Bouziane Moumen,* Clément Gilbert,* Isabelle Marcadé,* Christopher Chandler,† and Richard Cordaux*

*Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Unité Mixte de Recherche (UMR) Centre National de la Recherche Scientifique (CNRS) 7267, Université de Poitiers, 86000 France and †Department of Biological Sciences, State University of New York at Oswego, New York 13126

ORCID IDs: 0000-0002-3356-7869 (J.P.); 0000-0002-2131-7467 (C.G.)

Genetics, Vol. 207, 269–280, September 2017
Copyright © 2017 by the Genetics Society of America
doi: https://doi.org/10.1534/genetics.117.203380
Manuscript received April 28, 2017; accepted for publication July 1, 2017; published Early Online July 5, 2017.
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.117.203380/-/DC1.
1 Corresponding author: Laboratoire Ecologie et Biologie des Interactions (EBI), UMR CNRS 7267, Bâtiment B8-B35, 5 rue Albert Turpain, TSA 51106, 86073 Poitiers Cedex 9, France. E-mail: [email protected]
2 Present address: Laboratoire Evolution, Génomes, Comportement, Écologie, UMR 9191 CNRS, UMR 247 IRD, Université Paris-Sud, 91198 Gif-sur-Yvette, France.

ABSTRACT The highly compact mitochondrial (mt) genome of terrestrial isopods (Oniscidae) presents two unusual features. First, several loci can individually encode two tRNAs, thanks to single nucleotide polymorphisms at anticodon sites. Within-individual variation (heteroplasmy) at these loci is thought to have been maintained for millions of years because individuals that do not carry all tRNA genes die, resulting in strong balancing selection. Second, the oniscid mtDNA genome comes in two conformations: a ~14 kb linear monomer and a ~28 kb circular dimer comprising two monomer units fused in palindrome. We hypothesized that heteroplasmy actually results from two genome units of the same dimeric molecule carrying different tRNA genes at mirrored loci. This hypothesis, however, contradicts the earlier proposition that dimeric molecules result from the replication of linear monomers, a process that should yield totally identical genome units within a dimer. To solve this contradiction, we used the SMRT (PacBio) technology to sequence mirrored tRNA loci in single dimeric molecules. We show that dimers do present different tRNA genes at mirrored loci; thus covalent linkage, rather than balancing selection, maintains vital variation at anticodons. We also leveraged unique features of the SMRT technology to detect linear monomers closed by hairpins and carrying noncomplementary bases at anticodons. These molecules contain the necessary information to encode two tRNAs at the same locus, and suggest new mechanisms of transition between linear and circular mtDNA. Overall, our analyses clarify the evolution of an atypical mt genome where dimerization counterintuitively enabled further mtDNA compaction.

KEYWORDS mtDNA; concerted evolution; crustacean isopods; telomeres; third-generation sequencing

The typical bilaterian mitochondrial (mt) genome is described as a single circular molecule ranging from 15 to 20 kb in length, which contains 37 genes, including 13 protein-coding genes, two rRNA genes, and 22 tRNA genes (Boore 1999). While the majority of bilaterian mt genomes conform to this description, several notable exceptions have been uncovered. Unusual bilaterian mt genomes include multipartite (e.g., Suga et al. 2008; Dickey et al. 2015) and linear (Raimond et al. 1999) structures, atypical size (e.g., Helfenbein et al. 2004; Liu et al. 2013), changes in gene content (e.g., Okimoto et al. 1992; Helfenbein et al. 2004), plasticity in gene order (e.g., Singh et al. 2009; Gissi et al. 2010), and additional genetic codes (e.g., Watanabe and Yokobori 2011; Abascal et al. 2012). Because they deviate from the standard model, these mt genomes may constitute ideal systems to further our understanding of mt biology and evolution in animals, as they can help to address questions of recombination, concerted evolution of mt loci, and nonstandard inheritance.

The mt genome of terrestrial isopods (Isopoda: Oniscidea) is one example of such atypical genomes. It is notable for its compaction. In particular, genes coding transfer RNAs (tRNAs) can partially or fully overlap with protein coding


    genes (Doublet et al. 2015). But one truly unique feature of this genome is the capacity of three tRNA loci to each encode two alternative tRNAs with distinct anticodons, thanks to single nucleotide polymorphisms (SNPs) occurring within the same individual (Marcadé et al. 2007; Doublet et al. 2008; Chandler et al. 2015). At all three loci, mtDNA shows two different bases at one position of the anticodon, thus making individuals heteroplasmic at these nucleotide positions. This variation appears as a double peak on chromatograms generated by direct Sanger sequencing of PCR amplicons (Marcadé et al. 2007; Doublet et al. 2008; Chandler et al. 2015), cut and uncut amplicons on electrophoresis gels after mtDNA digestion by appropriate enzymes (Doublet et al. 2008), or SNPs among sequences obtained from next-generation technologies (Chandler et al. 2015).

    The same three heteroplasmic anticodon sites have been detected in individuals of two oniscid species, Trachelipus rathkei and Cylisticus convexus (Chandler et al. 2015), each site allowing the encoding of two tRNAs per locus and saving one dedicated tRNA locus. One of these heteroplasmic sites is shared with Armadillidium vulgare (Marcadé et al. 2007) and a diverse array of terrestrial isopod species (Doublet et al. 2008). The presence of these heteroplasmic sites in divergent oniscid lineages suggests that at least some of them have been maintained for millions of years (Doublet et al. 2008). Bottlenecks resulting from the transmission of relatively few organelles to zygotes usually remove heteroplasmy in few generations (Wolff et al. 2011; Breton and Stewart 2015; Stewart and Chinnery 2015). In these oniscids, however, it is believed that “constitutive” heteroplasmy is maintained by the requirement of all tRNA variants within an animal, and possibly even within an individual mitochondrion. This case of balancing selection (the evolutionary maintenance of polymorphism) represents the only suspected example of vital heteroplasmy in eukaryotes (Doublet et al. 2008).

    The hypothesis of constitutive heteroplasmy maintained by balancing selection must, however, consider another unique feature of the mt genome of terrestrial isopods. This genome is remarkable for presenting two conformations: one linear monomer of ~14 kb that represents one unit of mt genome containing the standard bilaterian mt genes; and a circular ~28-kb dimer that is a palindrome composed of two monomers, each representing one genome unit, arranged in a mirrored fashion (Raimond et al. 1999; Marcadé et al. 2007). The presence of dimers, which constitute about half of the mtDNA molecules in A. vulgare (Raimond et al. 1999), leaves open the possibility that both tRNAs of a heteroplasmic site can be encoded by the two genome units of a dimeric molecule, such that a single dimer may encode all tRNAs. The transmission of such dimers would allow faithful inheritance of all essential tRNA genes to daughter mitochondria and to the progeny, and would ensure good balance of the tRNAs within organelles. This hypothesis implies that the two genome units within a dimer are not completely identical. It therefore conflicts with another formulated hypothesis: that dimers arise from the replication of linear monomers. The extremities of linear monomers contain inverted terminal repeats that are thought to be telomeric hairpins covalently linking the two DNA strands (Doublet et al. 2013). DNA polymerase would be able to navigate the hairpin and then replicate the other strand, circularizing the linear monomer into a dimer in the process (Figure 1A). If so, this dimer would be expected to present totally identical genome units.

    Figure 1 (A) Hypothesized replication of a linear monomeric mtDNA molecule into a circular dimer in oniscids. A gray arrow represents a genome unit or a monomer. Its “head” is close to the 16S rRNA gene, and its “tail” is close to the cytochrome b gene. Tick marks represent the locations of known heteroplasmic tRNA loci and indicate the two tRNAs that each can encode. Upon replication, the telomeric hairpin of a monomer (shown in red) becomes the junction between palindromic genome units of the circular dimer, each resulting from the replication of a monomer strand. (B) Replication of a linear molecule carrying a pair of noncomplementary bases leads to an asymmetric dimer carrying different bases at the mirrored positions.


    Therefore, the nature of heteroplasmy, the adaptive benefits of dimeric mtDNA molecules, and the possible conversions between the unusual conformations of the oniscid mt genome are entangled issues that must be investigated together. To take on this task, we used long reads generated by the Single-Molecule Real Time (SMRT) sequencing technology from Pacific Biosciences (Eid et al. 2009). These reads allowed us to reconstruct the haplotypes, hence the combination of tRNAs encoded by individual dimeric molecules, in four oniscid lineages. We specifically investigated whether mt molecules can encode all required tRNAs. In addition to long reads, we took advantage of unique features of the SMRT sequencing technology to identify the conformation of molecules and clarify the conversions between dimeric and monomeric forms of this atypical mitochondrial genome.

    Materials and Methods

    We examined four terrestrial isopod matrilines (Table 1): two from A. vulgare (named BF and WXf), one from A. nasatum, and one from T. rathkei. For each matriline, short sequencing reads (Illumina) and long reads (SMRT) have been obtained from the genomic DNA of one or several related individuals (full siblings or first cousins) as part of full-genome assembly projects.

    Generation of mitochondrial genome sequences

    We aimed at building the dimeric mt genome sequence of each lineage with both units placed head to head. This configuration was chosen to facilitate the use of long reads spanning mirrored anticodon sites, which are much closer to the head-to-head junction than they are to the tail-to-tail junction (Figure 1).

    The mt genomes of A. vulgare BF and A. nasatum were reconstructed from contigs generated for other full-genome assembly projects. For each of these lineages, we first retrieved contigs comprising mitochondrial sequences by performing a blastn (Camacho et al. 2009) homology search against the nearly complete mitochondrial genome of A. vulgare (GenBank accession number EF643519.3). These searches returned several contigs having large portions of the mitochondrial genome sequence in a head-to-head configuration, as expected if these contigs comprised the dimeric form. The contig encompassing the longest homology in such configuration, and the lowest divergence with the reference genome, was selected and trimmed if needed.

    For consensus sequence polishing, Illumina reads were aligned to the retained contig using Bowtie2 version 2.2.9 (Langmead and Salzberg 2012), which was set to the default low sensitivity (“fast” search strategy), and configured to retain only alignments including both reads of a pair. The alignment file was processed with Pilon version 1.18 (Walker et al. 2014) to correct potential errors in the mapped reference contig. These two steps were repeated, and the alignment file was inspected for remaining errors with Integrated Genome Viewer version 2.3.92 (Robinson et al. 2011). Any error in the consensus sequence was manually corrected with Geneious Pro version 5.4 (Drummond et al. 2010).

    For A. vulgare WXf, we used the BF mt genome as a reference. We corrected differences with WXf using the same mapping strategy as described above. For T. rathkei, we used the reference genome available in GenBank (accession number KR013001.1). As this genome contains a complete unit flanked by short palindromic parts representing the ends of the other genome unit, we generated the expected dimeric form and used it as a reference. We corrected potential errors and differences with our studied lineage as described for the other genomes.
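    The construction of an expected dimeric reference from a single genome unit can be illustrated with a minimal Python sketch. It is not the procedure actually used here: the function names, the convention that the unit is written from tail to head, and the optional junction argument are all assumptions made for illustration. The sketch only relies on the stated property that the two units flanking the head-to-head junction are reverse complements of each other.

```python
# Illustrative sketch only: names and orientation conventions are assumptions.
_COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")  # includes IUPAC codes


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def head_to_head_dimer(unit: str, junction: str = "") -> str:
    """Build a palindromic dimer: unit (tail -> head), junction, mirrored unit."""
    return unit + junction + reverse_complement(unit)


def mirrored_position(pos: int, unit_len: int, junction_len: int = 0) -> int:
    """0-based position in the second unit that mirrors `pos` in the first unit."""
    return unit_len + junction_len + (unit_len - 1 - pos)


dimer = head_to_head_dimer("AACG", junction="T")
```

    In this toy case the dimer is AACG, the junction T, then CGTT, and `mirrored_position` locates the base of the second unit that faces a given base of the first unit, which is how mirrored anticodon sites can be paired.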

    From each of the mapping files generated above, alignments of reads originating from the same DNA fragment were removed with SAMtools rmdup version 1.3.1 (Li et al. 2009). Bases at each position were called by SAMtools mpileup. A custom C program was used to convert the pileup file into base counts, discarding bases with quality score <25. Sites where the rarer base was carried by >20% of the reads were considered possibly heteroplasmic. At each such site, the reference sequence of each lineage was modified to show ambiguities following IUPAC conventions. This was necessary to avoid any bias in the alignment of long reads, which minimizes mismatches at the risk of creating spurious indels.
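    The site-calling rule can be sketched in a few lines of Python. This is a hedged illustration, not the custom C program mentioned above: the function name, the input layout (one base and one quality score per read), and the choice to compute the 20% fraction over retained bases only are assumptions.

```python
from collections import Counter

# IUPAC codes for the six possible two-base ambiguities.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_site(bases, quals, min_qual=25, min_minor_fraction=0.20):
    """Return the reference symbol for one site.

    Bases with quality below `min_qual` are discarded. If the second most
    frequent base is carried by more than `min_minor_fraction` of the
    retained bases, the site is reported as an IUPAC ambiguity.
    """
    counts = Counter(
        b for b, q in zip(bases, quals) if q >= min_qual and b in "ACGT"
    )
    total = sum(counts.values())
    if total == 0:
        return "N"  # no usable base at this site
    ranked = counts.most_common(2)
    major = ranked[0][0]
    if len(ranked) == 2 and ranked[1][1] / total > min_minor_fraction:
        return IUPAC_PAIRS[frozenset((major, ranked[1][0]))]
    return major
```

    With base counts such as 11,611 A, 11,322 G, and 7 T at a site, this rule yields the ambiguity code R, whereas a site where the rarer base reaches only 10% of reads keeps its majority base.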

    Alignment of long reads

    We aligned long reads on the corresponding reference genome of each lineage using BLASR version 1.3.1 (Chaisson and Tesler 2012) with default settings. Long reads consisted of reads of inserts (Figure 2A) and circular consensuses (CCS), both being generated by the sequencing centers. A CCS is the consensus among reads from the same polymerase read (DNA fragment), and may not be called if not enough reads are present.

    Table 1 Summary information about the four oniscid lineages used in this study

                                 A. vulgare BF            A. vulgare WXf (a)       A. nasatum               T. rathkei
    Matriline source location    Nice, France             Helsingør, Denmark       Thuré, France            Oswego, NY
    Illumina data
      Individuals sequenced      1 female                 1 female                 2 males                  5 siblings
      Technology                 HiSeq 2000, 2 × 100 bp   HiSeq 2000, 2 × 100 bp   HiSeq 2000, 2 × 100 bp   HiSeq 2500, 2 × 250 bp
    SMRT data
      Individuals sequenced      13 females               7 females                12 males                 9 siblings
      Technology                 PacBio RS II, P6C4 chemistry

    (a) Illumina sequence data were obtained by Leclercq et al. (2016). Other sequence data were generated for ongoing genome assembly projects.

    For subsequent analyses, it was crucial to ascertain the alignment orientation (sequenced DNA strand) of each read mapping across the palindromic genome units. This orientation could be determined as we found that genome units were separated by a short nonsymmetrical junction (see Figure 1 and Results). We thus retained up to two alignments per read, one per orientation, which we compared to determine the most likely sequenced strand of the junction (see below). We did not simply retain the alignment with best overall score, as sequencing errors may prevent reliable inference of the most likely mapping orientation.

    All following steps were executed in R 3.3 (R Core Team 2014), with the help of functions from packages GenomicAlignments (Lawrence et al. 2013) and Biostrings (Pagès et al. 2017). Our script was based on the splitting of the BLASR alignment (.bam) file into a matrix of individual bases, in which columns correspond to successive positions of the reference sequence and rows to aligned reads. For all positions of the junction, we counted the frequency of mismatches (including deletions) between the sequence of the reference and that of a read, in each alignment orientation. We did not count insertions in the read as these could not be retained in the matrix. If mismatch frequencies between mapping orientations of a read differed by >10%, and if the lowest mismatch frequency was ≤25%, we considered the mapping orientation corresponding to that frequency as the correct one. Otherwise, the alignment orientation of a read on the junction was considered undetermined.
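    The orientation rule can be sketched as follows. The analysis itself was run in R; Python is used here for illustration only. The function names, the encoding of a matrix row as a string (with "-" for a deletion and a space for an uncovered position), and the reading of "differed by >10%" as an absolute difference of 0.10 between frequencies are assumptions.

```python
def mismatch_frequency(ref_junction, read_row):
    """Fraction of covered junction positions where the read differs.

    `read_row` is aligned position by position to `ref_junction`;
    '-' marks a deletion (counted as a mismatch) and ' ' an uncovered
    position (ignored). Returns None if no position is covered.
    """
    covered = [(r, b) for r, b in zip(ref_junction, read_row) if b != " "]
    if not covered:
        return None
    return sum(r != b for r, b in covered) / len(covered)


def infer_orientation(mm_forward, mm_reverse, min_diff=0.10, max_best=0.25):
    """Return '+' or '-' for the better orientation, or None if undetermined."""
    if mm_forward is None or mm_reverse is None:
        return None
    if abs(mm_forward - mm_reverse) <= min_diff:
        return None  # the two orientations are not distinguishable
    if min(mm_forward, mm_reverse) > max_best:
        return None  # even the better orientation fits poorly
    return "+" if mm_forward < mm_reverse else "-"
```

    A read with 5% mismatches in one orientation and 40% in the other is assigned to the first; a read with 20% and 25% stays undetermined because the two values are too close.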

    For each read, we retained the alignment corresponding to the inferred mapping orientation on the junction. If the more likely orientation could not be inferred, we retained the alignment with best mapping score, or selected alignments at random if scores were identical (which was the case for reads not covering the junction).

    Determination of mtDNA molecule conformations

    To establish the combinations of tRNAs encoded by single dimeric molecules, we used SMRT reads covering mirrored anticodon sites. Counterintuitively, a read mapping across genome units may still come from a linear monomer, as explained in Figure 2. This was inferred if all reads sequenced from the molecule mapped on the junction between genome units in the same orientation (Figure 2B). If successive reads of the molecule mapped in alternate orientations, the molecule was instead classified as a dimer (Figure 2C). If solely the first read of the molecule spanned the junction, we inferred the most likely mapping orientation of the second read, if present, based on the expectation that the start of a read should map closely to the edge of the region mapped by the previous read (Supplemental Material, Figure S1 in File S1).
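    The orientation-based distinction between the two molecule types can be sketched as below. This is an illustration under assumptions, not the script used in this study: the function name, the labels, and the choice to compare only adjacent reads whose orientations are both determined are hypothetical. A fragment flagged as a monomer candidate would still have to pass the additional requirements described for linear monomers.

```python
def classify_by_orientation(orientations):
    """Classify a fragment from the orientations of its successive reads.

    `orientations` lists '+', '-' or None (undetermined) for the successive
    reads of one polymerase read. Only adjacent pairs with two determined
    orientations are informative.
    """
    pairs = [
        (a, b)
        for a, b in zip(orientations, orientations[1:])
        if a is not None and b is not None
    ]
    if not pairs:
        return "unassigned"
    if all(a == b for a, b in pairs):
        return "linear_monomer_candidate"  # reads all map in the same orientation
    if all(a != b for a, b in pairs):
        return "dimer"  # successive reads alternate
    return "unassigned"  # conflicting evidence
```

    Adjacent pairs are used because two reads separated by an undetermined read are expected to share their orientation in both molecule types, and therefore carry no information.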

    Figure 2 (A) Process of SMRT sequencing. A DNA fragment is bluntly ligated to two SMRTbell adapters (blue) forming hairpins and carrying a DNA polymerase (Travers et al. 2010). During sequencing, the newly formed fragment (striped with white arrow heads pointing toward the 3′ end) leads to a polymerase read, which is composed of reads of insert (simply called “reads” for short) corresponding to alternative strands of the original fragment and separated by SMRTbell sequences. Reads are oriented as they would align to the green strand in the original fragment. (B) SMRT sequencing of a molecule whose telomeric hairpin acts as a SMRTbell. All resulting reads align on the reference [which is the dimeric mitochondrial genome containing the junction between units (shown in red)] on the same orientation, and their middle (the convergence of equally sized black arrows) corresponds to the center of the junction. (C) Sequencing of a dimeric molecule covering the junction between genome units produces reads that align in alternate orientations. The middle of these reads is unlikely to correspond to the middle of the junction. Some drawings are inspired by Fichot and Norman (2013).

    For a DNA fragment to be classified as linear monomer, we further required that the middle of the region mapped by at least one read was ≤30 bp from the center of the junction, since the hairpin telomere is sequenced at the middle of the read (Figure 2B). This can only be ascertained in “complete” reads, i.e., those starting and ending at a SMRTbell adapter. We defined a complete read as one having its start and end coordinates in the original polymerase read (Figure 2A) <70 bp away from those of the previous and next reads, respectively. This length corresponds to that of a SMRTbell adapter (45 bp) plus a safety margin. If the read was the first of the polymerase read, we imposed that its start coordinate in the polymerase read was at most 70 and that at least one of its first 50 bases aligned on the reference genome. CCS were considered as complete reads. The position of the middle of a complete read on the reference genome was designated as the midpoint between start and end positions of its alignment. These were adjusted by adding or subtracting, as appropriate, the lengths of the unaligned “clipped” read parts (which are often zero).
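    The completeness and midpoint criteria can be sketched as below. Function and parameter names are hypothetical, the special rule for the first read of a polymerase read is left out, and clipped lengths are assumed to be expressed on the left and right sides of the alignment in reference coordinates.

```python
def is_complete_interior_read(prev_end, start, end, next_start, max_gap=70):
    """True if a read is flanked on both sides by SMRTbell-sized gaps.

    All coordinates are positions in the polymerase read. The read is
    complete if it starts and ends less than `max_gap` bp away from the
    previous and next reads, respectively.
    """
    return (start - prev_end) < max_gap and (next_start - end) < max_gap


def read_midpoint(ref_start, ref_end, clip_left=0, clip_right=0):
    """Midpoint of a read on the reference, extended by its clipped parts."""
    return ((ref_start - clip_left) + (ref_end + clip_right)) / 2


def is_centered_on_junction(midpoint, junction_center, max_offset=30):
    """True if the read middle lies within `max_offset` bp of the junction center."""
    return abs(midpoint - junction_center) <= max_offset
```

    For instance, a read aligned from position 100 to 300 with 20 clipped bases on its left has its middle at 190, which is within 30 bp of a junction centered at 200.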

    Fragments whose reads mapped in the same orientation but failed to meet this requirement were not assigned to a molecule type. Likewise, we did not assign fragments for which fewer than two reads could be oriented with certainty, except in the following case. We reasoned that during the sequencing of a linear monomer, the polymerase, after going through the hairpin telomere, either returns to the SMRTbell or ends its polymerization. Either event terminates the read at a position that cannot be further from the center of the telomere than where the start of the read is, assuming that the read starts at the SMRTbell. We classified the parent fragment as a dimer if the end of the read mapped at a distance from the junction that was at least 100 bp longer than the distance between the junction and the mapping position of the read start. This requirement must be fulfilled by actual mapping positions and by those considering clipped read parts. To exclude first reads (of a polymerase read) that may not start at the SMRTbell, we imposed that such a read started at coordinate <70 bp in its parent polymerase read, and that its left clipped part was <50 bp.
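    The single-read rule for dimers can be sketched as follows. The function and argument names are assumptions; positions are those at which the start and end of the read map on the reference, given once from the alignment itself and once after extension by the clipped read parts.

```python
def single_read_supports_dimer(
    start, end, start_clipped, end_clipped, junction_center, min_excess=100
):
    """True if one read alone indicates a dimeric molecule.

    The read end must map at least `min_excess` bp further from the junction
    than the read start, both for the aligned positions and for the positions
    extended by the clipped read parts.
    """

    def excess(read_start, read_end):
        # How much further from the junction the read ends than it starts.
        return abs(read_end - junction_center) - abs(read_start - junction_center)

    return (
        excess(start, end) >= min_excess
        and excess(start_clipped, end_clipped) >= min_excess
    )
```

    A read starting 100 bp before the junction and ending 500 bp after it passes this test, whereas a read ending closer to the junction than it started, as expected for a hairpin-closed monomer, does not.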

    Establishment of haplotypes carried by mtDNA dimers

    We then established haplotypes, hence the tRNA genes carried by mtDNA molecules assigned to dimers. Haplotypes were established by concatenating the bases at heteroplasmic sites in the matrix we generated. Prior to that, we slightly corrected alignments as we found frequent 1-bp deletions in reads at these sites, associated with mismatches at the immediate flanking positions. Most mismatched bases corresponded to one of the two possible bases carried by short reads at a heteroplasmic position. We thus swapped the deletion and the mismatched base in the base matrix, which reduced mismatches without altering the original read sequence. We believe that BLASR improperly managed the alignment of reads on the ambiguous nucleotides that we placed in the reference sequences at heteroplasmic positions.

    Figure 3 Alignment between dimeric mtDNA sequences of three Armadillidium lineages at the region of the junction between “heads” of genome units (top), and homologous region in T. rathkei (bottom). Bases shown in bold font over a gray background constitute the junctions that separate the heads of genome units. The sequences flanking a junction are the reverse complement of each other. Sequences were aligned by the muscle algorithm (Edgar 2004). The T. rathkei region was not aligned as its divergence with the other lineages would have reduced legibility.

    Figure 4 Sequencing depth of short reads on the mitochondrial dimeric genomes of four oniscid lineages. Colored segments indicate the presence of SNPs, each presenting two bases at very similar relative frequencies (green: adenine, blue: cytosine, orange: guanine, red: thymine). Only SNPs for which the rarer bases are carried by ≥20% of the mapped reads, and whose sequencing depth is >20% of the mean depth are shown. Converging red triangles represent the location of the head-to-head junctions.

    Despite these corrections, the frequency of bases that were not supported by Illumina data at these sites was ~7.5% on average, not counting deletions. Because of such a high rate of base substitution errors, all reads of a parent polymerase read may not support the same haplotype. In such cases, we selected the haplotype according to four successive criteria: (i) higher number of sites having the bases supported by Illumina data, (ii) presence in the CCS, (iii) higher frequency of the haplotype among reads of the polymerase read, and (iv) fewer mismatches with the most frequent haplotype found across all reads.
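    The four successive criteria amount to a lexicographic ranking of candidate haplotypes, which can be sketched as below. The function name, the representation of a haplotype as a string with one character per heteroplasmic site, and the way ties remaining after the fourth criterion are resolved (first candidate encountered) are assumptions.

```python
from collections import Counter


def select_haplotype(read_haplotypes, supported_bases, ccs_haplotype, overall_top):
    """Pick one haplotype among those carried by reads of a polymerase read.

    read_haplotypes: haplotype string of each read (one character per site).
    supported_bases: for each site, the set of bases supported by Illumina data.
    ccs_haplotype:   haplotype of the CCS, or None if no CCS was called.
    overall_top:     most frequent haplotype found across all reads.
    """
    frequency = Counter(read_haplotypes)

    def rank(haplotype):
        n_supported = sum(
            base in allowed for base, allowed in zip(haplotype, supported_bases)
        )
        in_ccs = haplotype == ccs_haplotype
        mismatches = sum(a != b for a, b in zip(haplotype, overall_top))
        # Tuples compare element by element, i.e., criteria (i) to (iv) in order.
        return (n_supported, in_ccs, frequency[haplotype], -mismatches)

    return max(frequency, key=rank)
```

    Because tuples are compared element by element, a later criterion only matters when all earlier ones are tied, which matches the successive application described above.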

    Identification of fragments with noncomplementary bases

    If a dimer resulted from the replication of a linear monomer, it should present identical genome units, and encode a single tRNA type per pair of mirrored loci. As haplotypes clearly contradicted this prediction (see Results), we reasoned that the bases forming the two DNA strands of a linear monomer converted into a dimer may not be complementary at the anticodon sites (Figure 1B).

    To assess base complementarity within linear monomers, whose sequencing reads unite both strands of a molecule (Figure 2B), we compared bases between mirrored heteroplasmic sites covered by the same read. We also looked for base complementarity in fragments that were sequenced with two SMRTbells, by comparing bases carried by reads mapping on different mtDNA strands. These fragments include those classified as dimers, and those that do not span the junction between genome units, which hence could not be classified (hereafter called “unclassified” fragments). Unclassified fragments were defined as molecules whose reads all aligned ≥100 bp away from the junction between genome units. Alignment positions took clipped read parts into account.

    Rather than a binary value, we developed an index to quantify the complementarity of bases between DNA strands of a molecule (polymerase read) at a given position, as each strand may be sequenced several times. This index ignores all reads carrying rare bases or deletions at this position. We define B_f and B_r as the most frequent bases among forward-aligned reads and reverse-aligned reads, respectively. If the two possible bases have equal counts among reads of a given orientation, the most frequent one is chosen at random. We let f be the fraction of forward-aligned reads carrying B_f and r be the fraction of reverse-aligned reads carrying B_r. We define our index of complementarity as:

    $$
    i =
    \begin{cases}
    \dfrac{f + r}{2}, & \text{if } B_r = B_f \\[1ex]
    -\dfrac{f + r}{2}, & \text{otherwise.}
    \end{cases}
    $$

    Index i varies from −1, if all bases between reads mapped in opposite orientation are noncomplementary, to 1 if all are complementary. Intermediate values represent conflicting results between reads mapping in the same orientation. We defined a per-fragment index I that averages i over sites covered by the fragment. To minimize the influence of sequencing errors, values of I that were not obtained from at least two bases per strand were ignored. These two bases may either be sequenced at the same site in two reads from the same strand, or sequenced at two sites in the same read. We considered that a fragment carried noncomplementary bases or complementary bases if I was < −0.9 or > 0.9, respectively. Fragments whose indices fell between these values were not considered.
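    The index can be computed as in the following sketch. It assumes that bases of both forward- and reverse-aligned reads are expressed on the reference strand, as in the base matrix, so that identical majority bases indicate complementary strands. Function names and the `alleles` argument (the two bases expected at the site) are hypothetical, and the requirement of at least two bases per strand is left out for brevity.

```python
import random
from collections import Counter


def complementarity_index(fwd_bases, rev_bases, alleles, rng=random):
    """Index i at one site, or None if a strand has no usable base.

    Bases outside `alleles` (rare bases, deletions) are ignored.
    """

    def majority(bases):
        counts = Counter(b for b in bases if b in alleles)
        if not counts:
            return None, 0.0
        top = max(counts.values())
        # Ties between the two possible bases are broken at random.
        base = rng.choice(sorted(b for b, c in counts.items() if c == top))
        return base, counts[base] / sum(counts.values())

    b_f, f = majority(fwd_bases)
    b_r, r = majority(rev_bases)
    if b_f is None or b_r is None:
        return None
    value = (f + r) / 2
    return value if b_f == b_r else -value


def classify_fragment(site_indices, threshold=0.9):
    """Average i over sites (index I) and apply the -0.9 / 0.9 cutoffs."""
    values = [i for i in site_indices if i is not None]
    if not values:
        return None
    mean = sum(values) / len(values)
    if mean > threshold:
        return "complementary"
    if mean < -threshold:
        return "noncomplementary"
    return None  # fragment not considered
```

    Since f and r are each at least 0.5 when only two bases are possible, the absolute value of i lies between 0.5 and 1, and only fragments with nearly unanimous reads on each strand pass the cutoffs.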

    Data availability

    Annotated mitochondrial genome sequences are available from GenBank under accession numbers MF187611-MF187614. Sequencing reads that mapped on mitochondrial genomes are available at the National Center for Biotechnology Information short read archive under accession number SRP108987. All in-house scripts and programs are available upon request. File S1 contains Supplemental Text, Figure S1, Figure S2, Figure S3, and Table S1.

    Results

    Mitochondrial genome sequences and polymorphic sites

    Dimeric genome sequences of all four lineages were successfully reconstructed, including junctions between the heads of genome units. These junctions are 34- to 42-bp long in Armadillidium lineages (Figure 3), and their sequences correspond to the “inverted repeats” that have been located near the 12S rRNA gene of mtDNA monomers in A. vulgare (Doublet et al. 2013). These sequences are predicted to form secondary hairpin structures that were suspected to constitute the telomeres of linear monomers (Doublet et al. 2013). The location of these sequences at the junctions between genome units in dimers corroborates this hypothesis, under the model of monomer replication shown in Figure 1. In T. rathkei, only one base separates the heads of genome units (Figure 3). The opposite junction located between the cytochrome b genes of genome units is 0–3 bp long (data not shown), depending on the lineage.

    Table 2 Location and composition of heteroplasmic sites found in the mtDNA of four oniscid lineages

    Location                      Matriline         Nucleotide position   Base counts (A / C / G / T)
    tRNA Leu2 (TAA)/Leu1 (TAG)    A. vulgare BF     9,171                 11,611 / 0 / 11,322 / 7
                                  A. vulgare WXf    9,168                 7,010 / 1 / 6,977 / 0
                                  A. nasatum        9,176                 4,211 / 0 / 4,192 / 1
                                  T. rathkei        9,279                 3,079 / 0 / 3,260 / 4
    tRNA Gly (TCC)/Arg (TCG)      A. vulgare BF     11,601                5 / 13,516 / 12,951 / 7
                                  A. vulgare WXf    11,604                4 / 7,659 / 7,562 / 4
                                  A. nasatum        11,605                6 / 4,352 / 4,377 / 1
                                  T. rathkei        11,718                1 / 2,272 / 2,473 / 0
    tRNA Val (TAC)/Ala (TGC)      A. vulgare BF     12,004                12,277 / 0 / 12,073 / 1
                                  A. vulgare WXf    12,007                7,014 / 1 / 7,247 / 5
                                  A. nasatum        12,008                4,042 / 1 / 4,282 / 3
                                  T. rathkei        12,121                2,260 / 1 / 2,202 / 1
    tRNA Gly/Arg                  A. vulgare WXf    11,606                4 / 7,375 / 2 / 7,859
    nad3 gene (a)                 A. vulgare WXf    11,784                7,397 / 7,400 / 1 / 3
    12S rRNA                      A. vulgare WXf    13,474                0 / 6,755 / 1 / 6,542

    Sites that are shared across lineages are designated after the tRNAs they encode depending on the anticodon (shown in parentheses in 5′ to 3′ orientation). Base counts refer to the number of mapped Illumina reads carrying a given base. Positions are given in coordinates of the first genome unit.
    (a) Variation at position 11,784 involves a change in the nad3 protein sequence.

    The high sequencing depth of Illumina reads aligned to their respective dimeric genomes clearly outlined heteroplasmic sites as SNPs (Figure 4). Three pairs of mirrored SNPs are shared by all lineages (Table 2), and correspond to variation at the three tRNA sites previously identified in T. rathkei and C. convexus (Chandler et al. 2015). Only one of these sites (in tRNA Ala/Val, Table 2) was previously known to be heteroplasmic in A. vulgare and A. nasatum (Marcadé et al. 2007; Doublet et al. 2008). All lineages present the same two expected bases at very similar frequencies (~50%) at the shared SNPs. The corresponding tRNA genes hence present roughly equal frequencies among sequenced individuals of a lineage, or within the only sequenced animal in the case of A. vulgare lineages (Table 1). Within-individual variation was systematically observed in previous studies (Doublet et al. 2008; Chandler et al. 2015), and can safely be extrapolated to all four lineages. Other bases may be found at the shared SNPs (Table 2) but their extremely low frequencies, measured in thousandths, can be explained by sequencing errors. The A. vulgare WXf lineage shows three additional SNPs that present the same pattern of variation as shared heteroplasmic sites, but those are not located in anticodons of tRNA genes (Table 2).

    Figure 5 Haplotypes found in dimeric mtDNA molecules at pairs of heteroplasmic anticodon sites in four oniscid lineages. Dimers are shown as converging curved gray arrows as in Figure 1A. Bases of the dominant haplotype are shown on the coding strand of tRNAs for each genome unit, and corresponding anticodons are indicated by the name of tRNAs in front of these bases. For each pair of mirrored sites, ratios represent the number of sequenced molecules carrying the dominant two-base haplotype over the number of successfully sequenced molecules at these sites. Error bars represent 95% confidence intervals estimated by the Clopper-Pearson method of R package binom (Dorai-Raj 2014). Bar plots represent the fraction of sequenced molecules carrying a six-site haplotype, among dimeric molecules that could be successfully sequenced at all sites. To minimize the influence of sequencing errors, molecules that showed deletions or rare bases (Table 2) at any of these sites were ignored. Bases in red represent differences from the dominant haplotype of the same lineage. Symmetrical haplotypes, which mirror the bases found in one genome unit, are shown in bold. Numbers in parentheses represent the number of sequenced molecules carrying a haplotype. For T. rathkei, we merged counts for each haplotype and its mirrored counterpart, as the mapping orientation of reads across the palindromic genome units could not be determined with certainty (see Results section).
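
    The near-equal base frequencies and the rarity of other bases can be checked directly from the counts of Table 2. The short Python sketch below does so for the three shared sites of A. vulgare BF; the dictionary layout and the function name site_summary are illustrative choices, and only the counts themselves come from Table 2.

```python
# Base counts (A, C, G, T) copied from Table 2 for A. vulgare BF.
counts = {
    "tRNA Leu2/Leu1 (9,171)": (11611, 0, 11322, 7),
    "tRNA Gly/Arg (11,601)": (5, 13516, 12951, 7),
    "tRNA Val/Ala (12,004)": (12277, 0, 12073, 1),
}


def site_summary(acgt):
    """Return the fractions of the two main bases and of all other bases."""
    total = sum(acgt)
    ranked = sorted(acgt, reverse=True)
    major, minor = ranked[0], ranked[1]
    rare = total - major - minor
    return major / total, minor / total, rare / total


for site, acgt in counts.items():
    major, minor, rare = site_summary(acgt)
    print(f"{site}: {major:.3f} / {minor:.3f}, other bases {rare:.5f}")
```

    At each of these sites the two main bases are each close to one half of the reads, and the remaining bases together account for less than one read in a thousand.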

    Asymmetry at anticodon sites of dimeric mtDNA molecules

    In the three Armadillidium lineages, the mapping positions and orientations of long reads on the junction between genome units (Figure S2 in File S1) matched our predictions (Figure 2, B and C), which allowed classifying DNA fragments as monomers or dimers without ambiguity. In T. rathkei, the 1-bp-long junction (Figure 3) was too short to determine the most likely mapping orientation of reads, given the high error rate of SMRT sequences. In this lineage, we could classify some molecules as dimers on the sole basis of mapping coordinates of reads, and we could not technically identify monomers. The other lineages vary considerably with respect to the frequency of dimeric molecules (Table S1 in File S1). This variation may simply reflect differences in the size selection of fragments to be sequenced during library preparations, as sequenced fragments classified as linear monomers are considerably shorter than dimers (Figure S3 in File S1).

    In each Armadillidium lineage, reads that spanned all six sites of dimeric molecules indicated the presence of a dominant haplotype (Figure 5). This haplotype (“GCAGGA”) is the same in A. vulgare BF and A. nasatum. In A. vulgare WXf, the dominant haplotype (“AGGACG”) is the reverse of the aforementioned one. We double-checked that the head-to-head junction of the A. vulgare WXf reference genome was in the same orientation (strand) as that of the other two lineages. This 42-bp junction between genome units (Figure 3) allows unambiguous orientation of reads, hence of haplotypes. A different dominant haplotype (“ACAGGG”/“GGGACA”) is found in T. rathkei. We cannot establish the orientation of this haplotype because reads have almost equal probability of mapping on either strand of the reference genome, as explained previously.
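
    The relationships between these haplotypes can be verified with a few lines of Python. The sketch assumes that a six-base haplotype lists the sites in their order along the dimer, so that site k of the string mirrors site 5 - k; this reading is consistent with the haplotypes quoted above but is an assumption of the sketch, as are the function names.

```python
def mirrored_pairs(haplotype):
    """Pair each site of the first genome unit with its mirrored site."""
    n = len(haplotype)
    return [(haplotype[k], haplotype[n - 1 - k]) for k in range(n // 2)]


def is_asymmetric(haplotype):
    """True if every pair of mirrored sites carries two different bases."""
    return all(a != b for a, b in mirrored_pairs(haplotype))


def reverse(haplotype):
    """Haplotype read from the other end of the dimer."""
    return haplotype[::-1]


# Dominant haplotypes quoted in the text.
dominant = {
    "A. vulgare BF": "GCAGGA",
    "A. nasatum": "GCAGGA",
    "A. vulgare WXf": "AGGACG",
    "T. rathkei": "ACAGGG",
}

for lineage, haplotype in dominant.items():
    print(lineage, haplotype, "asymmetric:", is_asymmetric(haplotype))
```

    Under this reading, reversing "GCAGGA" gives "AGGACG", reversing "ACAGGG" gives "GGGACA", and all four dominant haplotypes carry different bases at each of the three mirrored pairs.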

    Importantly, in each lineage, the prevalent haplotype carries different bases at each pair of mirrored anticodons, and thus represents molecules that encode all six possible tRNAs at these loci. While relatively few sequencing reads spanned all six sites without any apparent sequencing error, asymmetry between genome units of a dimer is confirmed by the more numerous sequenced molecules that covered at least one pair of mirrored anticodons: ~90% of them carry different bases at any pair of sites (Figure 5), a result that extends to the three private SNPs of A. vulgare WXf (haplotype counts not shown). Nevertheless, three six-base haplotypes (shown in bold in Figure 5) are symmetric, mirroring the bases found in one of the two genomic units in the dominant haplotype. These symmetric haplotypes are supported by a single sequenced DNA molecule each, all of which were classified as dimers by the mapping coordinates of reads, rather than orientations. As coordinate-based classification may be affected by incorrect in silico delineation of reads in raw polymerase reads (Figure 2A), it is not strictly excluded that these haplotypes are in fact carried by linear monomers. Most of the other minor six-site haplotypes differ from the dominant one by just one base (shown in red in Figure 5).

    Figure 6 Top: frequencies of molecules having noncomplementary bases among different types of mtDNA molecules (see text) in three Armadillidium lineages. Red parts of molecules represent hairpin telomeres or junctions between genome units, and blue parts represent ligated SMRTbell adapters (see Figure 2). Ratios above points indicate the numbers of molecules with noncomplementary bases over molecules that could be characterized for base complementarity. Error bars represent 95% confidence intervals. Bottom: example of an “unclassified” mtDNA molecule from A. vulgare WXf having noncomplementary bases at the four variable sites it covers. These sites are named after their genomic positions (Table 2). Rows represent successive reads sequenced from complementary strands (see Figure 2A). Each strand has been sequenced six times, and reads from the reverse strand (with respect to the reference genome) have been reverse-complemented. Sequencing errors are shown in gray.

    Linear monomers with noncomplementary bases

    Estimates of the fraction of linear monomers carrying noncomplementary bases varied across lineages (Figure 6), from ~28% in A. vulgare BF to ~82% in A. vulgare WXf. No dimeric molecule was found to carry noncomplementary bases, and we had no reason to expect any. By contrast, 11–35% of unclassified mtDNA fragments (those of undetermined conformation, see Materials and Methods) did present noncomplementary bases at heteroplasmic positions (Figure 6).
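
    Confidence intervals of such fractions were obtained with the Clopper-Pearson method of the R package binom (Figure 5). For readers who want to see what this interval computes, a minimal pure-Python sketch follows. It solves the two binomial tail equations by bisection rather than calling binom, and the counts in the example (28 molecules out of 100) are hypothetical, chosen only to resemble the ~28% reported for A. vulgare BF.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def _solve(func, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function of p that decreases from lo to hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, level=0.95):
    """Exact (Clopper-Pearson) interval for k successes out of n trials.

    Suited to the modest sample sizes considered here (a few hundred
    molecules); very large n would overflow the float conversion.
    """
    alpha = 1 - level
    # Lower bound: P(X >= k | p) = alpha / 2, i.e. cdf(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical counts: 28 molecules with noncomplementary bases out of 100.
low, high = clopper_pearson(28, 100)
print(f"28/100: 95% CI [{low:.3f}, {high:.3f}]")
```

    The interval is exact in the sense that it is derived from the binomial distribution itself rather than from a normal approximation, which matters for the small molecule counts involved.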

    Discussion

    Source of apparent heteroplasmy in oniscids

    We were able to sequence individual mtDNA dimeric molecules of four oniscid lineages, using long reads that covered mirrored tRNA loci showing apparent heteroplasmy. Almost all sequenced dimers present asymmetric haplotypes at the three pairs of anticodon sites. Consequently, vital sequence variation between mt genome units is distributed within molecules. Asymmetric dimers that covalently link genome units encoding different tRNAs avoid the fitness costs of balancing selection. They ensure good balance among tRNA genes in an organelle, and minimize the risk of transmitting molecules that do not encode certain tRNAs to organelles or cells. Consequently, the mt genome of these oniscids should be regarded as the ~28 kb dimer, as only it carries all essential tRNA genes. Under this view, variation at tRNA loci occurs within a mt genome, not between mt genomes, and may not be defined as heteroplasmy sensu stricto.

    Some of the sequence variation between homologous tRNA loci is still distributed between molecules and corresponds to true heteroplasmy: linear monomers with fully complementary strands may present different haplotypes within a lineage, and, most likely, within an individual. Between-monomer variation is however unlikely to be maintained by balancing selection, as results suggest that monomers do not replicate.

    Evidence argues against monomer replication

    Indeed, monomer replication into dimers should eventually equalize the frequencies of asymmetric haplotypes among dimers and of molecules with noncomplementary bases among monomers (Supplemental Text in File S1). Yet, these frequencies differ, especially in A. vulgare BF, whose dimeric haplotypes are almost all asymmetric (Figure 5), whereas <30% of monomers have mismatched bases (Figure 6). While this discrepancy could be explained by the rapid death of mitochondria inheriting symmetric dimers lacking vital tRNAs, the rate of organelle death required to explain the almost complete absence of symmetric haplotypes in lineage BF seems unbearable. We therefore reasonably conclude that most, if not all, dimers derive from the replication of other circular dimers rather than monomers in the BF lineage, and, by parsimony, in the other lineages as well.

    Our results also argue against the replication of monomers into other monomers. Replication produces complementary strands, and should therefore quickly eliminate all molecules carrying noncomplementary bases. The successful SMRT sequencing of both strands of monomers (Figure 2B) also suggests that such molecules should not be able to replicate, due to their hairpins, and may at best become dimers (Figure 1). It has been suggested that monomers may replicate via a rare circular form (Doublet et al. 2013), but we found no evidence for such molecules (Supplemental Text in File S1).

    A mechanism to generate linear monomers and noncomplementary strands

    If linear monomers do not replicate, their existence and maintenance must be explained by another mechanism. We propose that these molecules arise from the self-renaturation of single-stranded dimers. Palindromic genome units would become strands that are fully complementary, except at sites where the molecule is asymmetric, thereby explaining the existence of monomers with noncomplementary bases. SMRT sequencing should not have produced single-stranded DNA molecules, since the whole processing of DNA has been (and must be) performed at, or below, room temperature without denaturing agents. In living cells however, a single-stranded dimer may be produced by DNA replication, during which one strand serves as template while the other strand is lagging, as observed in Drosophila (Goddard and Wolstenholme 1980; Joers and Jacobs 2013). Assuming mtDNA replication in oniscids proceeds similarly, we suggest that monomers with noncomplementary bases are formed by the annealing of lagging strands of asymmetric dimers before these strands had a chance to serve as replication templates. As virtually all dimers present asymmetric haplotypes, this mechanism should yield monomers that all carry noncomplementary bases. Monomers with complementary bases may arise from these monomers in which mismatched bases have been replaced by DNA repair enzymes (Li 2008), or from the hypothesized cleavage of a dimer in two monomers (Doublet et al. 2013). Our data do not reveal which mechanism is more likely. However, they indicate that at least one may have occurred at a higher rate in the sequenced individuals of lineage A. vulgare BF, which present a much higher fraction of monomers with fully complementary strands than the other lineages. Irrespective of the nature of these mechanisms, between-monomer variation would result from the continuous generation of monomers with complementary bases within individuals, rather than from the recurrent death of zygotes that do not inherit such variation (balancing selection).
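
    The proposed self-renaturation can be illustrated with a toy example in Python. The nine-base genome unit, its variant, and the function names below are hypothetical and serve only to show the principle: when one strand of a palindromic dimer folds back on itself, every base pairs with its mirrored counterpart, and the only mismatches fall at sites where the two genome units differ. The junction between units, which would form the hairpin loop, is omitted for simplicity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fold_mismatches(strand):
    """Fold a single strand onto itself and list noncomplementary pairs.

    Base k is paired with base n - 1 - k, as in a hairpin closed at the
    middle of the strand. Returns (position, base, facing base) tuples.
    """
    n = len(strand)
    return [
        (k, strand[k], strand[n - 1 - k])
        for k in range(n // 2)
        if strand[k].translate(COMPLEMENT) != strand[n - 1 - k]
    ]


# Toy genome unit (hypothetical sequence) and a variant differing at index 4.
unit = "ACGTTAGCA"
unit_variant = unit[:4] + "C" + unit[5:]

# One strand of a dimer: unit 1 followed by the reverse complement of unit 2.
symmetric_strand = unit + revcomp(unit)
asymmetric_strand = unit + revcomp(unit_variant)

print(fold_mismatches(symmetric_strand))   # no mismatch expected
print(fold_mismatches(asymmetric_strand))  # one mismatch, at the variant site
```

    In this toy case the strand of the symmetric dimer folds into a fully complementary hairpin, whereas the strand of the asymmetric dimer carries a single T facing G at the variant site, which is the kind of noncomplementary position observed in monomers.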


    Remarkably, a monomer with noncomplementary bases contains the information needed to produce two tRNAs at just one locus, using both DNA strands. To our knowledge, this way of compressing vital information has not been reported to date. As a gene cannot be transcribed in both strands, such information can only be used after conversion of a monomer into a dimer (Figure 1B), a process that we suggest may not occur. This raises the question of the adaptive benefits of linear mtDNA monomers in general. Linear monomers could simply be byproducts of the replication of dimers, during which self-renaturation may happen at various rates. Further studies assessing the frequencies and production rates of monomers may help to determine the functional importance of these molecules.

    Dimerization and mitochondrial genome compaction

    Our observations corroborate previous suspicions that apparent heteroplasmy was permitted by dimerization (Doublet et al. 2012; Chandler et al. 2015), and allow a scenario for the origin of asymmetric dimers to be drawn.

    The ancestral mtDNA dimeric genome of oniscids probably duplicated all genes, and was therefore totally palindromic. This is supported by the absence of apparent heteroplasmy at the tRNA-Ala/Val locus (Table 2) in certain species that show dimeric mtDNA (Doublet et al. 2008, 2012). These species (which have not been investigated at the other two heteroplasmic loci) carry tRNA-Ala at this locus, as do isopods with “standard” mtDNA (Kilpert et al. 2012). In an ancestral dimeric genome, one of the mirrored tRNA-Ala genes would have become a new tRNA-Val gene by “anticodon shift” (from TGC to TAC via base substitution), a type of event that is suspected to have occurred in diverse eukaryotic lineages (Rogers and Griffiths-Jones 2014). Then, as long as this newly created asymmetry subsisted, the two mirrored tRNA-Val genes that were initially present at another locus could be deleted without compromising viability. The other two heteroplasmic tRNA loci may have evolved in a similar fashion in the shared ancestor of the species we studied.

    The evolution toward shorter, asymmetric dimers contributed to the extreme level of compaction of mt genome units in oniscids (Doublet et al. 2015), and may have been adaptive if it saved energy for mtDNA replication. Similar mtDNA compaction and tRNA production rates could have been achieved by simply deleting one of the mirrored tRNA genes at several pairs of loci. The evolutionary path taken clearly minimized the asymmetry between genome units, possibly to sustain transition between mtDNA conformations and recombination between units (discussed in next section).

    Interestingly, dimerization appears to have permitted further genome compaction. While this evolution seems counterproductive in terms of space saving, the net increase in molecule size should not be seen as inefficient compaction. An asymmetric dimer may be slightly more efficient in storing genetic content than two monomeric mtDNA molecules carrying all tRNA genes, and should not require more energy for replication. Since returning to a standard monomeric genome without losing several tRNA genes now requires an improbable chain of mutational events, the maintenance of dimeric genomes tells little about the potential initial benefits of dimerization. These benefits may be revealed by investigating lineages with fully palindromic dimeric genomes (i.e., without apparent heteroplasmy), if any exist.

    Recombination and concerted evolution in a dimeric genome

    Conservatively, each individual can be considered as carrying a single asymmetric haplotype at dimeric molecules. We indeed cannot exclude that rare haplotypes found in dimers (Figure 5) simply result from sequencing errors, which we estimated at ~7.5% at these sites. The haplotype that is shared by A. vulgare BF and A. nasatum (Figure 5) may represent an ancestral state that has been maintained since the last common ancestor of both species, ~20 MYA (Becking et al. 2017). Alternatively, this haplotype may have evolved independently in these two Armadillidium lineages. Evolutionary convergence is less parsimonious, considering that eight different haplotypes (2^3, considering their orientation with respect to the head-to-head junction) can encode all required tRNAs, and all should be equivalent with respect to fitness. Long-term maintenance of a given haplotype is expected, since mutation at one of the asymmetric anticodon sites produces a variant that does not encode all tRNAs and that should be counter-selected. This also applies to a crossing-over between different genome units of two dimers. A crossing-over between genome units within a dimer would however lead to a new haplotype encoding all required tRNAs. Such an event may have occurred in A. vulgare WXf, causing an inversion of the region encompassing the head-to-head junction between the two genome units and effectively reversing the haplotype found in the other Armadillidium lineages. Another crossing-over may have occurred between the tRNA Leu1/Leu2 locus and the two other loci, explaining the haplotype found in T. rathkei (Figure 5), a species that diverged from Armadillidium ~40 MYA (Becking et al. 2017).
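
    The count of eight haplotypes and the effect of the proposed crossing-overs can be reproduced with a short Python sketch. It assumes, for illustration, that the six sites are listed in their order along the dimer (Leu2/Leu1, Gly/Arg, Val/Ala, then the mirrored sites in reverse order) and models a crossing-over within a dimer as the inversion of the segment that spans the head-to-head junction. Both the model and the function names are assumptions of this sketch.

```python
from itertools import product

# The two bases found at each anticodon site (Table 2), in genomic order:
# tRNA Leu2/Leu1, tRNA Gly/Arg, tRNA Val/Ala.
ALLELES = ["AG", "CG", "AG"]


def asymmetric_haplotypes():
    """Six-site haplotypes in which each mirrored pair carries both bases."""
    haplotypes = []
    for unit1 in product(*ALLELES):
        # The second unit carries the other base at each site, and its
        # sites are read in mirrored order along the dimer.
        unit2 = [pair.replace(base, "") for pair, base in zip(ALLELES, unit1)]
        haplotypes.append("".join(unit1) + "".join(reversed(unit2)))
    return haplotypes


def invert(haplotype, k):
    """Invert the segment spanning the junction, from site k to its mirror."""
    n = len(haplotype)
    return haplotype[:k] + haplotype[k:n - k][::-1] + haplotype[n - k:]


all_asymmetric = asymmetric_haplotypes()
print(len(all_asymmetric), all_asymmetric)
print(invert("GCAGGA", 0))  # inversion encompassing all six sites
print(invert("GCAGGA", 1))  # breakpoints between Leu and the other two loci
```

    Under these assumptions, inverting the whole region turns "GCAGGA" into the A. vulgare WXf haplotype "AGGACG", and an inversion whose breakpoints lie between the tRNA Leu locus and the other two loci yields "GGGACA", one of the two orientations reported for T. rathkei. Any such inversion maps an asymmetric haplotype onto another asymmetric haplotype, so all required tRNAs remain encoded.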

    Recombination of mtDNA has been reported in diverse lineages such as scorpions (Gantenbein et al. 2005), bivalves (Burzynski et al. 2003), teleost fishes (Hoarau et al. 2002; Tatarenkov and Avise 2007), lizards (Ujvari et al. 2007), and humans (Slate and Gemmell 2004). In oniscids, recombination of mtDNA can explain not only the different haplotypes we established, but also the almost perfect identity of genome units within highly divergent species. The concerted evolution of genome units should benefit the peculiar oniscid mt genome in at least two ways. First, adaptive evolution of dimeric mtDNA molecules would be severely constrained without recombination or any other mechanism able to homogenize genome units. In the absence of such mechanisms, an adaptive mutation would indeed remain at a “heterozygous” state until the equivalent mutation occurs at the mirrored site of the other genome unit. Second, recombination restricts divergence of mirrored mitochondrial genes that are bound to fulfill the same fundamental function (cellular respiration or mt protein synthesis). As an alternative to recombination, replication of monomers with complementary bases into dimers can homogenize genome units, thereby offering an adaptive explanation for the existence of linear monomers. We, however, view this process as deleterious, since it should predominantly yield totally symmetrical dimers lacking tRNA genes.

    Regardless of the underlying mechanisms, homogenization of genome units of a dimer proceeds at a moderate pace. Indeed, the genome units of A. vulgare WXf differ at three private sites (Table 2), and other similar sites have been reported in lineages from C. convexus and T. rathkei (Chandler et al. 2015). None of the three private WXf mutations are involved in the encoding of alternative tRNAs, and no evidence suggests that variation at these positions is selected. Variation at these sites is simply maintained through the inheritance of the asymmetric dimers carrying it. The accumulation of three asymmetric mutations in A. vulgare WXf must have taken thousands of generations. Relatively long maintenance of asymmetric mutations may have left more time for the loss of tRNA loci, under the evolutionary scenario we described previously. Once these tRNA loci have been lost, variation at mirrored anticodons must have been maintained for millions of generations by the selection of asymmetric molecules in the face of homogenization of genome units.

    Acknowledgments

    We thank Isabelle Giraud, Thomas Becking, and Lise Ernenwein for animal rearing and preparation of DNA samples used for sequencing. We also thank Matthew Hahn and two anonymous reviewers for their recommendations and comments on the manuscript. This work was funded by European Research Council Starting Grant 260729 (EndoSexDet) and Agence Nationale de la Recherche Grant ANR-15-CE32-0006-01 (CytoSexDet) to R.C., the 2015–2020 State-Region Planning Contract and European Regional Development Fund, and intramural funds from the Centre National de la Recherche Scientifique and the University of Poitiers. C.C. was funded by the National Science Foundation (grant NSF-DEB1453298).

    Literature Cited

    Abascal, F., D. Posada, and R. Zardoya, 2012 The evolution of the mitochondrial genetic code in arthropods revisited. MDN 23: 84–91.

    Becking, T., I. Giraud, M. Raimond, B. Moumen, C. Chandler et al., 2017 Diversity and evolution of sex determination systems in terrestrial isopods. Sci. Rep. 7: 1084.

    Boore, J. L., 1999 Animal mitochondrial genomes. Nucleic Acids Res. 27: 1767–1780.

    Breton, S., and D. T. Stewart, 2015 Atypical mitochondrial inheritance patterns in eukaryotes. Genome 58: 423–431.

    Burzynski, A., M. Zbawicka, D. O. F. Skibinski, and R. Wenne, 2003 Evidence for recombination of mtDNA in the marine mussel Mytilus trossulus from the Baltic. Mol. Biol. Evol. 20: 388–392.

    Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos et al., 2009 BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

    Chaisson, M. J., and G. Tesler, 2012 Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13: 238.

    Chandler, C. H., M. Badawi, B. Moumen, P. Greve, and R. Cordaux, 2015 Multiple conserved heteroplasmic sites in tRNA genes in the mitochondrial genomes of terrestrial isopods (Oniscidea). G3 5: 1317–1322.

    Dickey, A. M., V. Kumar, J. K. Morgan, A. Jara-Cavieres, R. G. Shatters et al., 2015 A novel mitochondrial genome architecture in thrips (Insecta: Thysanoptera): extreme size asymmetry among chromosomes and possible recent control region duplication. BMC Genomics 16: 439.

    Dorai-Raj, S., 2014 Binom: Binomial Confidence Intervals For Several Parameterizations. Available at: https://cran.r-project.org/package=binom. Accessed: January 9, 2015.

    Doublet, V., C. Souty-Grosset, D. Bouchon, R. Cordaux, and I. Marcadé, 2008 A thirty million year-old inherited heteroplasmy. PLoS One 3: e2938.

    Doublet, V., R. Raimond, F. Grandjean, A. Lafitte, C. Souty-Grosset et al., 2012 Widespread atypical mitochondrial DNA structure in isopods (crustacea, Peracarida) related to a constitutive heteroplasmy in terrestrial species. Genome 55: 234–244.

    Doublet, V., Q. Helleu, R. Raimond, C. Souty-Grosset, and I. Marcadé, 2013 Inverted repeats and genome architecture conversions of terrestrial isopods mitochondrial DNA. J. Mol. Evol. 77: 107–118.

    Doublet, V., E. Ubrig, A. Alioua, D. Bouchon, I. Marcadé et al., 2015 Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare. RNA Biol. 12: 1159–1168.

    Drummond, A. J., B. Ashton, S. Buxton, M. Cheung, A. Coope et al., 2010 Geneious v5. Available at: http://www.geneious.com/. Accessed: July 19, 2013.

    Edgar, R. C., 2004 MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.

    Eid, J., A. Fehr, J. Gray, K. Luong, J. Lyle et al., 2009 Real-time DNA sequencing from single polymerase molecules. Science 323: 133–138.

    Fichot, E. B., and R. S. Norman, 2013 Microbial phylogenetic profiling with the Pacific Biosciences sequencing platform. Microbiome 1: 10.

    Gantenbein, B., V. Fet, I. A. Gantenbein-Ritter, and F. Balloux, 2005 Evidence for recombination in scorpion mitochondrial DNA (Scorpiones: Buthidae). Proc. R. Soc. Lond., Ser. B: Biol. Sci. 272: 697–704.

    Gissi, C., G. Pesole, F. Mastrototaro, F. Iannelli, V. Guida et al., 2010 Hypervariability of ascidian mitochondrial gene order: exposing the myth of deuterostome organelle genome stability. Mol. Biol. Evol. 27: 211–215.

    Goddard, J. M., and D. R. Wolstenholme, 1980 Origin and direction of replication in mitochondrial-dna molecules from the genus Drosophila. Nucleic Acids Res. 8: 741–757.

    Helfenbein, K. G., H. M. Fourcade, R. G. Vanjani, and J. L. Boore, 2004 The mitochondrial genome of Paraspadella gotoi is highly reduced and reveals that chaetognaths are a sister group to protostomes. Proc. Natl. Acad. Sci. USA 101: 10639–10643.

    Hoarau, G., S. Holla, R. Lescasse, W. T. Stam, and J. L. Olsen, 2002 Heteroplasmy and evidence for recombination in the mitochondrial control region of the flatfish Platichthys flesus. Mol. Biol. Evol. 19: 2261–2264.


    Joers, P., and H. T. Jacobs, 2013 Analysis of replication intermediates indicates that Drosophila melanogaster mitochondrial DNA replicates by a strand-coupled theta mechanism. PLoS One 8: e53249.

    Kilpert, F., C. Held, and L. Podsiadlowski, 2012 Multiple rearrangements in mitochondrial genomes of isopoda and phylogenetic implications. Mol. Phylogenet. Evol. 64: 106–117.

    Langmead, B., and S. L. Salzberg, 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357–359.

    Lawrence, M., W. Huber, H. Pages, P. Aboyoun, M. Carlson et al., 2013 Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9: e1003118.

    Leclercq, S., J. Thézé, M. A. Chebbi, I. Giraud, B. Moumen et al., 2016 Birth of a W sex chromosome by horizontal transfer of Wolbachia bacterial symbiont genome. Proc. Natl. Acad. Sci. USA 113: 15036–15041.

    Li, G. M., 2008 Mechanisms and functions of DNA mismatch repair. Cell Res. 18: 85–98.

    Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

    Liu, Y. G., T. Kurokawa, M. Sekino, T. Tanabe, and K. Watanabe, 2013 Complete mitochondrial DNA sequence of the ark shell Scapharca broughtonii: an ultra-large metazoan mitochondrial genome. Comp. Biochem. Physiol. Part D Genomics Proteomics 8: 72–81.

    Marcadé, I., R. Cordaux, V. Doublet, C. Debenest, D. Bouchon et al., 2007 Structure and evolution of the atypical mitochondrial genome of Armadillidium vulgare (Isopoda, crustacea). J. Mol. Evol. 65: 651–659.

    Okimoto, R., J. L. Macfarlane, D. O. Clary, and D. R. Wolstenholme, 1992 The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum. Genetics 130: 471–498.

    Pagès, H., P. Aboyoun, R. Gentleman, and S. Debroy, 2017 Biostrings: String Objects Representing Biological Sequences, and Matching Algorithms. R package version 2.44.1.

    Raimond, R., I. Marcadé, D. Bouchon, T. Rigaud, J. P. Bossy et al., 1999 Organization of the large mitochondrial genome in the isopod Armadillidium vulgare. Genetics 151: 203–210.

    R Core Team, 2014 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.

    Robinson, J. T., H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander et al., 2011 Integrative genomics viewer. Nat. Biotechnol. 29: 24–26.

    Rogers, H. H., and S. Griffiths-Jones, 2014 tRNA anticodon shifts in eukaryotic genomes. RNA 20: 269–281.

    Singh, T. R., G. Tsagkogeorga, F. Delsuc, S. Blanquart, N. Shenkar et al., 2009 Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny. BMC Genomics 10: 534.

    Slate, J., and N. J. Gemmell, 2004 Eve ‘n’ Steve: recombination of human mitochondrial DNA. Trends Ecol. Evol. 19: 561–563.

    Stewart, J. B., and P. F. Chinnery, 2015 The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16: 530–542.

    Suga, K., D. B. M. Welch, Y. Tanaka, Y. Sakakura, and A. Hagiwarak, 2008 Two circular chromosomes of unequal copy number make up the mitochondrial genome of the rotifer Brachionus plicatilis. Mol. Biol. Evol. 25: 1129–1137.

    Tatarenkov, A., and J. C. Avise, 2007 Rapid concerted evolution in animal mitochondrial DNA. Proc. Biol. Sci. 274: 1795–1798.

    Travers, K. J., C. S. Chin, D. R. Rank, J. S. Eid, and S. W. Turner, 2010 A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38: 8.

    Ujvari, B., M. Dowton, and T. Madsen, 2007 Mitochondrial DNA recombination in a free-ranging Australian lizard. Biol. Lett. 3: 189–192.

    Walker, B. J., T. Abeel, T. Shea, M. Priest, A. Abouelliel et al., 2014 Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9: e112963.

    Watanabe, K., and S.-i. Yokobori, 2011 tRNA modification and genetic code variations in animal mitochondria. J. Nucleic Acids 2011: 623095.

    Wolff, J. N., D. J. White, M. Woodhams, H. E. White, and N. J. Gemmell, 2011 The strength and timing of the mitochondrial Bottleneck in salmon suggests a conserved mechanism in vertebrates. PLoS One 6: e20522.

    Communicating editor: M. W. Hahn
