What is mutation? A chapter in the series: How microbes “jeopardize” the modern synthesis

Affiliations Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America, Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America, The Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, United States of America

* E-mail: [email protected]

  • Devon M. Fitzgerald, 
  • Susan M. Rosenberg

PLOS

Published: April 1, 2019

  • https://doi.org/10.1371/journal.pgen.1007995

Mutations drive evolution and were assumed to occur by chance: constantly, gradually, roughly uniformly in genomes, and without regard to environmental inputs, but this view is being revised by discoveries of molecular mechanisms of mutation in bacteria, now translated across the tree of life. These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress responses and activated when cells/organisms are maladapted to their environments—when stressed—potentially accelerating adaptation. Mutation is also nonrandom in genomic space, with multiple simultaneous mutations falling in local clusters, which may allow concerted evolution—the multiple changes needed to adapt protein functions and protein machines encoded by linked genes. Molecular mechanisms of stress-inducible mutation change ideas about evolution and suggest different ways to model and address cancer development, infectious disease, and evolution generally.

Citation: Fitzgerald DM, Rosenberg SM (2019) What is mutation? A chapter in the series: How microbes “jeopardize” the modern synthesis. PLoS Genet 15(4): e1007995. https://doi.org/10.1371/journal.pgen.1007995

Editor: W. Ford Doolittle, Dalhousie University, CANADA

Copyright: © 2019 Fitzgerald, Rosenberg. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the American Cancer Society Postdoctoral Fellowship 132206-PF-18-035-01-DMC (DMF) and NIH grant R35-GM122598. The funders had no role in the preparation of the article.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Mutation is any change in the sequence of an organism’s genome or the process by which the changes occur. Mutations range from single-basepair alterations to megabasepair deletions, insertions, duplications, and inversions. Though seemingly simple, ideas about mutation became entangled with the initially simplifying assumptions of both Darwin himself and the “Modern Synthesis”—the geneticists who embraced Darwin in the pre-DNA early 20th century, beginning evolutionary biology. The assumptions of purely “chance” mutations that occur constantly, gradually, and uniformly in genomes have underpinned biology for almost a century but began as a “wait-and-see”–based acknowledgment by early evolutionary biologists that they did not know the chemical nature of genes or how mutations in genes might occur.

Darwin considered generation of variation by chance to be a simplifying assumption, given that the origins of variation (and genes!) were unknown in his time, but he appears to have thought chance variation to be unlikely: “I have hitherto sometimes spoken as if the variations—so common and multiform in organic beings under domestication, and in a lesser degree in those in a state of nature—had been due to chance. This, of course, is a wholly incorrect expression, but it serves to acknowledge plainly our ignorance of the cause of particular variation [Chapter 5, 1].”

He also described multiple instances in which the degree and types of observable variation change in response to environmental exposures, thus seeming open to the possibility that the generation of variation might be environmentally responsive [ 1 ]. However, even once mutations were described on a molecular level, many continued to treat spontaneous mutations as necessarily chance occurrences—typically as mistakes occurring during DNA replication or repair. Darwinian evolution, however, requires only two things: heritable variation (usually genetic changes) and selection imposed by the environment. Any of many possible modes of mutation—purely “chance” or highly biased, regulated mechanisms—are compatible with evolution by variation and selection.

Here, we review some of the wealth of evidence, much of which originated in microbes, that reframes mutagenesis as dynamic and highly regulated processes. Mutation is regulated temporally by stress responses, occurring when organisms are poorly adapted to their environments, and occurs nonrandomly in genomes. Both biases may accelerate adaptation.

Bacteria teach biologists about evolution

Microbes were initially held as proof of the independence of mutational processes and selective environments. The Luria–Delbrück experiment (1943) demonstrated that bacterial mutations to phage resistance can occur prior to phage exposure [2], and the Lederbergs showed similar results for resistance to many antibiotics [3]. However, discovery of the SOS DNA-damage response and its accompanying mutagenesis [4–7] in the post-DNA world of molecular genetics began to erode the random-mutation zeitgeist. Harrison Echols thought that the SOS response conferred “inducible evolution” [8], echoing Barbara McClintock’s similar SOS-inspired suggestion of adaptation by regulated bursts of genome instability [9]. But many argued that SOS mutagenesis might be an unavoidable byproduct of DNA repair and that high-fidelity repair might be difficult to evolve. John Cairns’ later proposal of “directed” or “adaptive” mutagenesis in starvation-stressed Escherichia coli [10, 11] reframed the supposed randomness of mutation as an exciting problem not yet solved. The mutagenesis that Cairns and colleagues studied under the nonlethal environment of starvation is now known to reflect stress-induced mutagenesis, that is, mutation up-regulated by stress responses. Its molecular mechanism(s), reviewed here, demonstrate regulation of mutagenesis. Similar mechanisms are now described from bacteria to humans, suggesting that regulated mutagenesis may be the rule, not the exception (discussed here and reviewed more extensively in [12]).

Stress-induced mutagenic DNA break repair in E. coli

DNA double-strand breaks (DSBs) occur spontaneously in approximately 1% of proliferating E. coli [13, 14]. In unstressed E. coli, DSB repair by homologous recombination (HR) is relatively high fidelity. However, activation of the general stress response, for example, by starvation, flips a switch, causing DSB repair to become mutagenic [15, 16]. This process of mutagenic break repair (MBR) causes mutations preferentially when cells are poorly adapted to their environment (that is, when stressed) and, as modeling indicates [17–20], may accelerate adaptation.

At least three stress responses cooperate to increase mutagenesis in starving E. coli. The membrane stress response contributes to DSB formation at some loci [21]; the SOS response up-regulates error-prone DNA polymerases used in one of two MBR mechanisms [22–24]; and the general stress response licenses the use of, or persistence of errors made by, those DNA polymerases in DSB repair [15, 16]. The requirement for multiple stress responses indicates that cells check a few environmental conditions before flipping the switch to mutation [25]. E. coli MBR is a model of general principles in mutation from bacteria to humans: the regulation of mutation in time, by stress responses, and its restriction in genomic space to small genomic regions, which in the case of MBR lie near DNA breaks. We look at MBR, then at other mutation mechanisms in microbes and multicellular organisms that share these common features.

MBR mechanisms

Two distinct but related MBR mechanisms occur in starving E. coli, and both require activation of the general (starvation) stress response. Moreover, both occur without the starvation stress if the general stress response is artificially up-regulated [15, 16], indicating that the stress response itself, without actual stress, is sufficient. Homologous-recombinational (homology-directed) MBR (HR-MBR) generates base substitutions and small indels via DNA-polymerase errors during DSB-repair synthesis (Fig 1A–1F). Microhomologous MBR causes amplifications and other gross chromosomal rearrangements (GCRs) [26–28], most probably by microhomology-mediated break-induced replication (MMBIR) [28, 29] (Fig 1A–1C, 1G and 1H). Both MBR pathways challenge traditional assumptions about the “chance” nature of mutations.

(a–c) RecBCD nuclease loads RecA HR protein onto ssDNA, similarly to human BRCA2 loading RAD51; basepairing with a strand of identical duplex DNA (gray, e.g., a sister chromosome). Parallel lines, basepaired DNA strands. Repair synthesis (dashed lines) is switched to a mutagenic mode by the general stress response (sigma S). DNA polymerase errors (d, purple X) generate indels (e, purple XX) and base substitutions (f, purple XX). Microhomologous MBR requires DNA Pol I for template switching to regions containing microhomology (g), of as little as a few basepairs, and initiates replication, creating genome rearrangements; (h) a duplicated chromosome segment (blue arrows) is shown here. Circled numbers and shading indicate the three main events in HR-MBR: ① a DSB and its repair by HR, ② the SOS response (pink), and ③ the general stress response (blue). Note that HR-MBR (d–f, purple) requires both the SOS response (②, pink, which up-regulates error-prone DNA Pol IV, necessary for HR-MBR) and general stress response (③, blue), but microhomologous MBR (g–h, blue) requires the general stress response but not SOS (③, blue). Figure modified from [ 12 ]. HR, homologous recombination; MBR, mutagenic break repair; ssDNA, single-stranded DNA.

https://doi.org/10.1371/journal.pgen.1007995.g001

Both MBR mechanisms are initiated by a DSB and require HR DSB-repair proteins (Fig 1, ①) [15, 28, 30–33]. The first steps mirror standard HR DSB repair: RecBCD nuclease processes DSB ends and loads RecA HR protein (Fig 1A and 1B). Next, the RecA–DNA nucleoprotein filament can activate the SOS response (Fig 1, ② pink), which is required for HR-MBR but not microhomologous MBR. RecA also facilitates strand invasion, the initial contact between the broken DNA molecule and an identical sister chromosome from which repair is templated (Fig 1C). In unstressed cells, this intermediate leads to high-fidelity HR repair; however, if the general stress response is activated, repair proceeds via one of two mutagenic pathways (Fig 1D–1H, ③). In HR-MBR, errors generated by error-prone SOS-up-regulated DNA polymerases IV (DinB), V (UmuDC), and II (PolB) accumulate in the tracts of repair synthesis during HR repair (Fig 1D) [22, 23, 34]. Activation of the general stress response licenses the use of these polymerases and/or prevents the removal of errors they generate: base substitutions and small indels (Fig 1E and 1F) [35, 36] that are located mostly in clusters/hotspots of about 100 kb around the original DSB location [30]. Microhomologous MBR requires DNA Pol I, which is proposed to promote microhomology-dependent template switching during repair synthesis to generate GCRs (Fig 1G and 1H) [28]. Similar MMBIR mechanisms are proposed to underlie many DSB-driven GCRs in human genetic diseases and cancers [28, 29, 37].
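The pathway requirements described above can be summarized as a small truth table. The Python sketch below is only an illustrative simplification, not a quantitative model: the class `CellState`, its field names, and the function `licensed_mutagenic_pathways` are hypothetical names introduced here, and each stress response is reduced to an on/off flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    """Boolean summary of one cell (illustrative field names)."""
    has_dsb: bool            # a DSB is being repaired by HR
    sos_on: bool             # SOS DNA-damage response is induced
    general_stress_on: bool  # sigma-S general stress response is induced


def licensed_mutagenic_pathways(cell: CellState) -> set:
    """Return the MBR pathways permitted in this simplified model.

    An empty set stands for high-fidelity HR repair, or no break at all.
    """
    pathways = set()
    # Both MBR mechanisms need a DSB and the general stress response.
    if not cell.has_dsb or not cell.general_stress_on:
        return pathways
    # Microhomologous MBR does not require the SOS response.
    pathways.add("microhomologous MBR")
    # HR-MBR additionally requires the SOS response.
    if cell.sos_on:
        pathways.add("HR-MBR")
    return pathways
```

In this sketch, a cell with a DSB and an active general stress response but no SOS induction returns only microhomologous MBR, which matches the SOS independence of that pathway as described in the text. A cell without the general stress response returns an empty set, standing for high-fidelity repair.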

Stress response regulation of E. coli MBR

Environmentally responsive and temporally regulated MBR mechanisms challenge long-held assumptions about the constant, gradual nature of mutagenesis and its blindness to an organism’s environmental suitability, or the lack of it, showing that mutagenesis is regulated tightly via environmental inputs. The general stress response controls the switch between high-fidelity and mutagenic DSB repair [15, 16]. This stress response, controlled by the alternative sigma factor σS, is activated by starvation, cold, acid, antibiotic, oxidative, and osmotic stresses, among others. During a general stress response, the σS transcriptional activator increases the transcription of hundreds of genes (approximately 10% of all E. coli genes) that provide a range of protective functions (reviewed in [38]). We do not know exactly how the general stress response promotes mutagenesis. Two possibilities are as follows. First, the general stress response modestly up-regulates error-prone Pol IV above SOS-induced levels [39], which might be the rate-limiting step. Second, the general stress response down-regulates mismatch repair (MMR) enzymes MutS and MutH [40, 41]. The HR-MBR mutation spectrum is similar to that of unstressed MMR-deficient strains [35, 36, 42], suggesting that MMR becomes limiting transiently during HR-MBR [36, 43, 44]. Other σS targets are also plausible, including down-regulation of the high-fidelity replicative DNA Pol III. Together, these observations suggest a model in which the general stress response enables error-prone polymerases to participate in DSB repair and/or allows the errors introduced by these polymerases to escape mismatch repair.

At least two other stress responses also contribute to one or both MBR mechanisms. The SOS DNA-damage response is required for HR-MBR [ 45 ] but not microhomologous MBR [ 22 ]. The SOS response is detected in about 25% of cells with a reparable DSB [ 13 ] and so comes automatically with the DSB that initiates MBR. (The 75% without SOS may repair fast enough to avoid SOS [ 13 ].) The SOS response halts cell division and activates DNA-damage tolerance and repair pathways. The primary role of the SOS response in HR-MBR is the upregulation of the error-prone DNA polymerases IV and V and possibly II. In some assays, production of Pol IV completely restores mutagenesis in SOS-defective cells [ 23 ]. In others, Pols II and V also contribute to mutagenesis [ 16 , 34 , 46 ]. Finally, the membrane stress response, regulated by σ E , promotes MBR at some loci by playing a role in spontaneous DSB formation through an unknown mechanism (see “Localization of MBR-dependent mutations”) [ 21 ]. The membrane stress response is triggered by an accumulation of unfolded envelope proteins caused by heat and other stressors [ 47 ] and therefore appears to couple these stressors to mutagenesis.

A genome-wide screen revealed a network of 93 genes required for starvation stress–induced MBR [ 25 ]. Strikingly, over half participate in sensing or signaling various types of stress and act upstream of activation of the key stress response regulators, which are hubs in the MBR network. During starvation stress, at least 31 genes function upstream of (in activation of) the general stress response. Most encode proteins used in electron transfer and other metabolic pathways, suggesting that these may be the primary sensors of starvation stress. Additionally, at least six genes are required for activation of the SOS response during MBR, and at least 33 MBR-network genes are required for activation of the membrane stress response. The 93 MBR genes form a highly connected network based on protein–protein interactions with the three stress response regulators (σ S , RecA/LexA, and σ E ) as nonredundant network hubs [ 25 ]. The MBR network highlights the importance of tight, combinatorial stress response regulation of mutagenesis in response to multiple inputs.

Generality of general stress response–promoted mutation

In E. coli, σS-dependent mutagenesis has a mutational signature that is distinct from that seen in low-stress mutation accumulation (MA) studies and generation-dependent mutagenesis [34, 35, 42, 48]. Importantly, the nucleotide diversity in genomes of extant E. coli and other bacteria is described better by the σS-dependent signature than by the signature seen in MA studies [48]. Specifically, both σS-dependent mutations and those seen in extant species have much higher ratios of transitions to transversions than are seen in MA experiments or expected by chance. This suggests that a significant portion of adaptive mutations in bacteria arise from σS-dependent stress-induced mutation mechanisms such as MBR [48]. Furthermore, mathematical modeling suggests that stress response–regulated mutagenesis, such as MBR, promotes adaptation in changing environments [17–20]. Organisms that encode regulated mutagenesis mechanisms may have an increased ability to evolve, which would promote the evolution and maintenance of such mechanisms by second-order selection [17, 19, 20].
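To make the transition-to-transversion comparison concrete, the following Python sketch computes the ratio from a list of base substitutions. It is a minimal illustration under stated assumptions: the function names are hypothetical, and the "chance" baseline assumes that all 12 possible single-base changes are equally likely, which is a simplification rather than the method of the cited studies.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(substitutions) -> float:
    """Transitions divided by transversions for a list of (ref, alt) bases."""
    ts = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    tv = len(substitutions) - ts
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ts / tv


# Assumed baseline: every one of the 12 possible base changes is equally likely.
ALL_CHANGES = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(ts_tv_ratio(ALL_CHANGES))  # 4 transitions / 8 transversions = 0.5
```

Under this equal-likelihood assumption the baseline ratio is 0.5, so a mutation set with a ratio well above that value is enriched for transitions. A made-up example such as `[("G", "A"), ("C", "T"), ("A", "G"), ("T", "C"), ("G", "T")]` gives a ratio of 4.0.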

Localization of MBR-dependent mutations

MBR generates mutations in hotspots close to the site of the instigating DSB, not at random locations in the genome [30, 49]. Hotspotting near DSBs is best described for HR-MBR initiated by engineered DSBs at various sites in the bacterial chromosome [30]. Mutations are most frequent within the first kilobase pair (kb) on either side of the DSB and then fall off to near-background levels approximately 60 kb from the break, with a weak long-distance hot zone around 1 megabase (Mb) from the DSB site. This pattern of mutations supports the model that most MBR-dependent mutations arise from DNA polymerase errors during HR repair synthesis, and the remainder arise during more processive error-prone break-induced replication. The observation that mutations occur near DSBs does not, in itself, suggest that mutations are more likely to occur in certain genomic regions or in locations related to an organism’s adaptive “need.” However, it does suggest that the distribution of mutations is likely to mirror the distribution of DSBs, and the following lines of evidence suggest that DSB distributions may be nonrandom and reflect potential utility of genes in particular environments.
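One way to examine such a distance profile is to bin mutation positions by their distance from the break on a circular chromosome. The Python sketch below is illustrative only: the function names, the example positions, and the rounded genome length of 4,600,000 bp are assumptions introduced here, not data from the cited work.

```python
def circular_distance(pos: int, dsb: int, genome_len: int) -> int:
    """Shortest distance in bp between two positions on a circular chromosome."""
    d = abs(pos - dsb) % genome_len
    return min(d, genome_len - d)


def count_by_distance(mutations, dsb, genome_len, bin_edges):
    """Count mutations in half-open distance bins [edge_i, edge_{i+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for pos in mutations:
        d = circular_distance(pos, dsb, genome_len)
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical inputs for illustration only.
GENOME_LEN = 4_600_000   # assumed, rounded chromosome length in bp
DSB_SITE = 1_000_000     # assumed break position
MUTATIONS = [1_000_200, 999_500, 1_030_000, 950_000, 2_000_000, 4_599_000]
BIN_EDGES = [0, 1_000, 60_000, GENOME_LEN // 2 + 1]  # <1 kb, 1-60 kb, beyond

counts = count_by_distance(MUTATIONS, DSB_SITE, GENOME_LEN, BIN_EDGES)
```

Because the bins differ greatly in width, raw counts should be divided by bin width before comparing mutation density near and far from the break.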

The sources and distributions of spontaneous DSBs are poorly understood in all organisms (reviewed, [ 14 ]), but we have some clues about the origins of DSBs that lead to MBR. First, transcriptional RNA–DNA hybrids (R-loops) are one source of MBR-promoting DSBs [ 50 ]. R-loops have been implicated in DSB formation in many experimental systems, although the exact mechanism(s) of DNA breakage is unresolved (reviewed, [ 51 ]). Though the distribution of R-loops has not been thoroughly assessed in starving E . coli , R-loops tend to be biased toward highly transcribed genes, promoters, and noncoding-RNA genes [ 52 – 54 ] and might, therefore, target DSBs and mutations to those sites. Also, activation of the σ E membrane stress response is required for DSB formation in some assays and might target DSBs in genomic space [ 21 ]. The mechanism by which the σ E stress response causes DSBs is unknown, but one possibility is that σ E -activated transcription causes DSBs directly (rather than via gene products’ up- or down-regulation), via an R-loop–dependent or other transcription-dependent mechanism. R-loops and the σ E stress response might direct DSBs, and thus mutations, to regions of the genome with more adaptive potential for a given environment: transcribed genes and regulatory elements (promoters and regulatory small RNAs).

Additionally, MBR-dependent mutations can occur in clusters [55]. When an MBR-induced mutation occurs, the probability of finding another mutation at neighboring sites 10 kb away is approximately 1,000-fold higher than if the first mutation did not occur [55], and this is not true for a distant unlinked site in the genome [43], indicating that nearby mutations are not independent events. That is, linked mutations appear to occur simultaneously, in single MBR events. Such clusters are predicted to promote concerted evolution by simultaneously introducing changes to multiple domains of a protein or subunits of a complex protein machine [15, 20, 55]. Because multiple mutations are often needed for new functions to emerge, and the intermediate mutated states are often less fit and counterselected, how complex protein machines evolve has been a long-standing problem [56]. Similar clusters have been identified in many organisms [57] and in cancer genomes, in which mutation clusters are called kataegis, Greek for (mutation) storms [58–60]. The mechanisms of mutation localization and co-occurrence revealed by MBR in E. coli have guided more mechanistic understanding of how mutation clusters occur across the tree of life.
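The non-independence argument can be expressed as a ratio of two conditional probabilities estimated from a 2×2 table of counts. The Python sketch below is a hedged illustration: the function name and all counts are invented for the example, chosen only so that the ratio comes out near the order of magnitude quoted in the text.

```python
def cooccurrence_enrichment(both, first_only, second_only, neither):
    """Ratio P(second | first) / P(second | no first) from a 2x2 count table.

    A ratio near 1 is what independent mutations would give; a ratio far
    above 1 indicates that the two mutations tend to arise together.
    """
    with_first = both + first_only
    without_first = second_only + neither
    if with_first == 0 or without_first == 0 or second_only == 0:
        raise ValueError("ratio undefined for these counts")
    p_given_first = both / with_first
    p_given_no_first = second_only / without_first
    return p_given_first / p_given_no_first


# Hypothetical counts of cells scored for two linked mutations.
linked = cooccurrence_enrichment(both=10, first_only=990,
                                 second_only=10, neither=999_990)

# Hypothetical counts consistent with independent events.
unlinked = cooccurrence_enrichment(both=1, first_only=999,
                                   second_only=1_000, neither=999_000)
```

With these made-up counts, `linked` is about 1,000 and `unlinked` is about 1, which mirrors the contrast between neighboring and distant unlinked sites described above.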

Analyses of E . coli mutation accumulation lines and natural isolates indicate that local mutation rates vary by about one order of magnitude on the scale of approximately 10–100 kb [ 61 , 62 ]. It is possible, even likely, that the DSB-dependent mutation localization and co-occurring mutation clusters characteristic of MBR are important contributors to this nonuniformity in mutation rate. Similar degrees of variation in local mutation rates have been reported for other bacteria [ 63 ], yeast [ 64 ], and mammals (mouse, human, and other primates [ 65 , 66 ]) and could also result from MBR-like mutation mechanisms. Further analysis of natural isolates, with a specific focus on identifying clusters of cosegregating single-nucleotide variants, could indicate how frequent MBR-dependent mutation clusters are and how they shape genomes.

The molecular mechanisms of MBR reveal many ways by which mutations do not occur uniformly or independently from one another in genomic space. More work is needed to assess fully whether the MBR mechanism or genomes themselves have evolved to bias mutations to locations where they are most likely to be beneficial, such as genes actively transcribed in response to the experienced stressor.

Other regulated mutagenesis mechanisms in microbes

In addition to starvation-induced MBR in E . coli , diverse bacteria and single-celled eukaryotes display examples of stress response–up-regulated mutagenesis. Some of these mutation mechanisms provide additional insight into how mutation rates vary across genomes in ways that may accelerate adaptive evolution. Many share characteristics with E . coli MBR but differ enough to suggest that regulated mutagenesis has evolved independently multiple times, thus highlighting the importance of regulated mutagenesis to evolution-driven problems, such as combatting infectious disease and antimicrobial resistance. Potential strategies to counteract pathogen evolution require understanding of how genetic variation is generated in these organisms. Continued study of regulated-mutagenesis mechanisms may reveal potential new drug targets to block mutagenesis and thus evolution [ 12 , 25 , 67 ].

Other mechanisms of starvation stress–induced mutagenesis in bacteria

Diverse wild E. coli isolates show increased mutation rates during extended incubation on solid medium compared with vegetative growth, a phenomenon known as mutagenesis in aging colonies (MAC) [68]. In the one isolate tested for genetic requirements, MAC required σS, decreased MMR capacity, and error-prone Pol II but not DSB-repair proteins or SOS activation [68], which makes it like, but not identical to, MBR in E. coli. Bacillus subtilis undergoes starvation-induced mutagenesis that is up-regulated by the ComK starvation-stress response and requires the SOS-induced Pol IV homolog YqjH but does not require DSB repair [69, 70]. In B. subtilis, starvation-induced mutation of reporter genes increases with increased levels of transcription of those genes, dependent on the transcription-coupled repair factor Mfd [71], similarly to E. coli MBR [50]. This suggests that transcription directs starvation-induced mutations to transcribed regions of the B. subtilis genome, where they are more likely to be adaptive. This is similar to the hypothesized targeting of E. coli MBR but occurs through a DSB-independent mechanism.

Antibiotic-induced mutagenesis in bacteria

Many antibiotics, especially at subinhibitory concentrations, increase mutation rate and generate de novo resistance and cross-resistance in a variety of bacteria, including important pathogens. The β-lactam antibiotic ampicillin induces mutagenesis in E. coli, Pseudomonas aeruginosa, and Vibrio cholerae via a mechanism requiring σS, Pol IV, and limiting mismatch repair [41]. Whether DSBs are involved remains untested. The topoisomerase-inhibiting antibiotic ciprofloxacin (cipro) induces cipro resistance rapidly in E. coli, requiring HR proteins, SOS induction, and error-prone Pols II, IV, and V [72]. A requirement for σS has only very recently been demonstrated, along with the demonstration that cipro-induced mutagenesis is σS-dependent MBR, similar to that induced by starvation [73]. In fact, diverse antibiotics both create DSBs [74] and activate the general stress response in E. coli [41], suggesting that these antibiotics may increase mutagenesis both by increasing DNA damage and by triggering a switch to low-fidelity repair of that damage.

Stress response regulation of mobile DNA elements in bacteria

Environmental stress up-regulates the activity of mobile DNA elements in many organisms, and this inducible genome instability is likely to be an important driver of evolution (reviewed in [75]). Although the mechanisms of regulation are poorly understood, stress response regulators have been implicated in a few cases. The general stress response promotes excision of an E. coli transposable prophage [76] and a Pseudomonas transposon [77]. Starvation increases the retromobility of the Lactococcus lactis LtrB group II intron through signaling by the small-molecule regulators guanosine tetraphosphate (ppGpp) and cyclic adenosine monophosphate (cAMP) [78]. Mobility of an E. coli transposon is increased by metabolic disruptions and negatively regulated by the σE membrane stress response [79]. Also, stress can directly regulate mobile element activity without an intervening stress response: movement of the T4 td intron becomes promiscuous during oxidative stress through ROS-induced oxidation of an amino acid in the intron-encoded homing endonuclease, which makes it a transposase [80].

Regulated mutagenesis in eukaryotic microbes

Many examples of stress-associated mutagenesis and MBR have been reported in yeast, but stress response regulation has been demonstrated in only two cases. First, in the budding yeast Saccharomyces cerevisiae , the proteotoxic drug canavanine induces mutagenesis dependently on the MSN environmental stress response [ 81 ]. MSN-dependent mutagenesis requires the nonhomologous end-joining (NHEJ) protein Ku and two error-prone polymerases, Rev1 and Pol zeta (ζ) [ 81 ]. NHEJ is a relatively genome-destabilizing DSB-repair pathway, so MSN-dependent mutagenesis represents a stress-induced switch to MBR. NHEJ proteins are required for starvation-induced mutations in yeast as well [ 82 ]. Others have reported yeast MBR dependent on the error-prone DNA polymerase Rev3 [ 83 ] and spontaneous mutations dependent on error-prone polymerases Rev1 and Pol ζ [ 84 ]. Yeast also form mutation clusters by MBR [ 85 ] and undergo MMBIR similar to E . coli microhomologous MBR [ 86 ]. It is unknown whether these observations represent one or more mechanisms of mutation and whether MSN or other stress responses regulate mutagenesis in these cases. In all cases of yeast MBR, mutations are likely to occur near DSBs and, therefore, may be localized within genomes, as discussed for E . coli MBR.

Second, a heat shock response, activated by heat shock or protein denaturation, induces aneuploidy in S . cerevisiae by titration of the chaperone heat shock protein 90 (HSP90) [ 87 ]. Inhibitors of HSP90, such as radicicol, also induce aneuploidy. HSP90 is required for proper folding of kinetochore proteins in unstressed cells, so HSP90 titration or inhibition probably triggers aneuploidy through the disruption of kinetochore assembly [ 87 ]. The resulting yeast cell populations show high karyotypic and phenotypic variation and harbor cells resistant to radicicol and other drugs [ 87 ]. Aneuploidy in the form of extra chromosome copies may also facilitate adaptive evolution by providing a larger mutational target. Extra chromosomes may also buffer otherwise deleterious mutations through the sharing of gene products. Similar heat- and other stress-induced aneuploidy has been reported in Candida albicans and other yeast species, and can cause resistance to a variety of compounds, including clinically relevant antifungal drugs (reviewed, [ 88 ]). Some of these examples are likely to result from HSP90 titration, but other stress responses may be involved also.

Regulated mutagenesis in multicellular organisms

Although microbes led the way in revealing mechanisms of stress response–up-regulated mutagenesis, many microbial mutation mechanisms are mirrored throughout the tree of life, including in multicellular organisms. Stress response–up-regulated mutation mechanisms have been discovered in plants, flies, and human cells (reviewed in [12]). The potential adaptive roles of these mutation mechanisms are less clear in multicellular organisms than in microbes. Do these mechanisms contribute to germline variation (and thus organismal evolution), mosaicism and somatic cell evolution, or both? Or are they simply byproducts of other required cellular functions or stress-induced dysfunctions?

In the Drosophila germline, the HSP90 heat shock response increases transposon-mediated mutagenesis and can drive organismal adaptation [89]. Most other regulated mutation mechanisms characterized to date have been in somatic cells, in which they might contribute to mosaicism. Somatic diversity may be important during development and contribute to organismal fitness, as is the case with antibody diversification during B-cell maturation. For example, neural development might require genetic complexity and plasticity as organisms get differently “wired” during development, based on their experiences. However, up-regulated mutagenesis is also likely to drive pathogenic somatic evolution, such as during cancer development. For example, hypoxic stress responses down-regulate mismatch repair and the HR DSB-repair proteins RAD51 and BRCA1, leaving only chromosome-rearranging nonhomologous or microhomologous DSB-repair mechanisms (reviewed in [90]). Hypoxic stress response–induced mutagenesis occurs in mouse and human, suggesting an adaptive function in addition to its probable relevance to tumor biology. Tumors become hypoxic and induce hypoxic stress responses, which promote angiogenesis. Hypoxic stress responses may also promote tumor evolution via mutagenesis. The transforming growth factor β (TGF-β) signaling pathway also induces genome rearrangement by reduction of HR DSB repair in human cancer cell lines, leading to increased copy number alterations and resistance to multiple chemotherapeutic drugs [91, 92]. Stress-induced and localized mutagenesis in multicellular organisms and the relevance of these mechanisms to cancer are reviewed in more detail elsewhere [12].

Evolution and applications of stress-induced mutation

Mutations provide the raw material for evolution but can also decrease the fitness of an organism. Therefore, mutation rates have, presumably, been finely tuned, apparently through second-order selection. Constitutively high mutation rates are advantageous in rapidly changing environments but decrease fitness in more stable (or periodically changing) environments. By biasing mutation to times of stress and to particular genomic regions, perhaps regions relevant to a specific stress, stress-induced mutagenesis mechanisms provide the benefits of a high mutation rate while mitigating the risks. The ubiquity of these mechanisms throughout the tree of life supports their crucial role in evolution.

Stress-induced mutation mechanisms, first discovered in bacteria, challenge historical assumptions about the constancy and uniformity of mutation but do not violate strict interpretations of the Modern Synthesis. Mutation is still viewed as probabilistic, not deterministic, but we argue that regulated mutagenesis mechanisms greatly increase the probability that useful mutations will occur at the right time and, possibly, in the right places, thus increasing an organism’s ability to evolve. Assumptions about the constant, gradual, clock-like, and environmentally blind nature of mutation are ready for retirement.

Stress-induced mutation mechanisms are likely to play important roles in human disease by promoting pathogen and tumor evolution and may drive evolution more generally. Mutation mechanisms may also be attractive drug targets for combatting infectious disease, cancer, and drug-resistance evolution in both [ 73 ]. Although many mechanisms of stress-inducible mutation have been identified in the past two decades [ 12 ], these are likely to be the tip of the iceberg. Some current pressing questions are highlighted below.

Open questions in mutation research

  • What fraction of total “spontaneous” mutagenesis results from mutagenesis up-regulated by stress responses? Do stress response–regulated mutation programs drive much of adaptive evolution in microbes? Multicellular organisms?
  • Are DSBs and the mutations they cause randomly distributed in genomic space? Or is DSB formation regulated, biased, or directed? By what mechanisms? Is this targeting adaptive?
  • Can stress response–regulated mutation mechanisms be targeted by anti-evolvability drugs that limit the generation of heritable diversity? Can these drugs prevent pathogens and cancers from out-evolving host responses and drugs?

Acknowledgments

We thank P.J. Hastings for comments on the manuscript and our colleagues in this bundle for extreme patience.

  • 1. Darwin CR. On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray; 1859.
  • 6. Radman M. Phenomenology of an inducible mutagenic DNA repair pathway in Escherichia coli : SOS hypothesis. In: Prakash L, Sherman F, Miller M, Lawrence C, Tabor H, editors. Molecular and Environmental Aspects of Mutagenesis. Springfield, Illinois: Charles C. Thomas; 1974. p. 128–42.
  • 7. Radman M. SOS Repair Hypothesis: Phenomenology of an Inducible DNA Repair Which Is Accompanied by Mutagenesis. In: Hanawalt P, editor. Molecular Mechanisms for Repair of DNA. New York: Plenum Press; 1975.
  • 73. Pribis JP, García-Villada L, Zhai Y, Lewin-Epstein O, Wang A, Liu J, et al. Gamblers: an antibiotic-induced evolvable cell subpopulation differentiated by reactive-oxygen-induced general stress response. Molecular Cell. 2019;74 (in press). https://doi.org/10.1016/j.molcel.2019.02.037

Mutation Research / Fundamental and Molecular Mechanisms of Mutagenesis Special Issue: DNA Repair and Genetic Instability

Kandace Williams

1 University of Toledo College of Medicine and Life Sciences, Department of Biochemistry & Cancer Biology, 3000 Transverse Dr., Toledo, OH 43614, USA

Robert W. Sobol

2 Department of Pharmacology & Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA.

3 University of Pittsburgh Cancer Institute, Hillman Cancer Center, Pittsburgh, PA 15213, USA.

4 Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA 15213, USA.

DNA repair, DNA damage tolerance and the DNA damage response are complex biochemical events within the cell that are of critical importance with regard to genome stability and the onset of carcinogenesis, as well as chemotherapeutic efficacy during cancer treatment. DNA repair, similar to DNA replication, exhibits high fidelity in normal circumstances. However, in the presence of excessive DNA damage, or dysfunctional DNA repair mechanisms, error-prone DNA repair and DNA damage tolerance can lead to genetic instability and carcinogenesis, as well as increased sensitivity or resistance to chemotherapeutics. A detailed knowledge of the sequential molecular and biochemical steps within these cellular pathways is essential for successful clinical treatment of cancer and inherited DNA replication or repair syndromes. This special issue contains articles that illustrate a wide range of topics to better understand the overarching theme of DNA repair and genetic instability in the context of human disease.

DNA glycosylases are the enzymes that initiate base excision repair (BER), a process in all living organisms that protects against a specific subset of base lesions produced by endogenous events, environmental chemicals and ionizing radiation. DNA glycosylases are exceptionally diverse and numerous, accounting for the extensive list of lesions repaired by BER. Susan Wallace and coauthors describe the most current biochemical and structural data for Neil3, a specific glycosylase recently discovered in vertebrate cells. Neil3 has homology to bacterial Fpg/Nei proteins, but with diverse substrate specificity, different structural features and an uncertain biological role. Another pair of DNA glycosylases, Methyl-CpG Binding Domain Protein 4 (MBD4) and Thymine DNA Glycosylase (TDG), remove uracil or thymine from G:U or G:T mispairs, respectively, generated by deamination of cytosine or 5-methylcytosine at CpG sites within the genome. New evidence presented by Joann Sweasy indicates that these two glycosylases, previously believed to be redundant, have diverse functions within the cell. Tumor-associated mutations of both glycosylases that generate altered protein function are also discussed in this comprehensive review.

5-Fluorouracil (5-FU) causes the introduction of uracil into DNA by the inhibition of TTP synthesis, and is itself incorporated into DNA. Both base substitutions are recognized by uracil DNA glycosylase (UDG) and by single-strand selective monofunctional uracil DNA glycosylase (SMUG1). In the study reported by Michael Wyatt and colleagues, the individual roles of UDG and SMUG1 were examined during the cellular response to 5-FU. Using exposure protocols that mimic clinical treatment, they discovered that loss of SMUG1, but not UDG, prolongs S phase and Chk1 phosphorylation after 5-FU treatment, implicating a specific role for SMUG1 in DNA damage recognition and repair after cellular exposure to this chemical.

MUTYH is the human ortholog of MutY, a highly conserved glycosylase, and is integral to one of the most complicated BER pathways. This BER enzyme recognizes and removes adenine incorrectly inserted during DNA replication opposite a template containing the oxidative 8-oxodG lesion. Subsequent activity by the BER pathway recreates the 8-oxodG:C base pair, which is then recognized by the OGG1 DNA glycosylase and removed by further BER activity to reconstitute the correctly paired G:C base pair. Successful, coordinated activity of these two glycosylases, along with a multitude of other BER proteins, prevents G:C to T:A transversions induced by oxidative damage. Importantly, MUTYH-associated polyposis (MAP) is associated with germline mutations in this BER glycosylase. The review of this BER pathway by Margherita Bignami and associates is a timely and thorough characterization of protein structure, molecular mechanism, and functional studies of MUTYH variants, including relevance of inactivation and of variants in colorectal cancer risk.

The repair of interstrand crosslinks (ICLs) produced by bifunctional alkylating agents and other anticancer drugs involves coordinated activity of specific proteins from multiple DNA repair pathways. Key roles have been identified that involve proteins from Nucleotide Excision Repair (NER) and Homologous Recombination (HR). Recent studies have also indicated a role for the BER pathway in mediating the cytotoxicity of ICLs. Steve Patrick and Anbarasi Kothandapani have contributed an in-depth review that examines potential mechanisms and consequences of BER-mediated modulation of ICL repair.

The DNA mismatch repair (MMR) pathway in yeast and human cells has expanded significantly from the post-replication mutation avoidance system first characterized in bacteria. Dysfunctional human MMR is now firmly established as a causative mechanism for Hereditary Nonpolyposis Colon Cancer (HNPCC), and also for acquired resistance to monofunctional alkylating agents. The comprehensive review by Kandace Williams and colleagues focuses primarily on the DNA binding heterodimer MutSα (MSH2 + MSH6). This review explores current knowledge regarding heterodimer structure and function, molecular mechanisms of mismatch detection and binding, and cellular regulation. These authors also include a discussion of several non-canonical roles of this MMR heterodimer, such as potential functions of the N-terminal disordered domain of MSH6, and the MMR-dependent DNA damage response induced by O6-methylguanine. Interactions with other DNA repair pathways such as BER, double-strand break (DSB) repair and ICL repair are reviewed, as well as roles for MMR during antibody diversification and trinucleotide repeat expansion.

Microsatellite instability (MSI) is observed as varying lengths of repeating microsatellite sequences throughout the genome. This phenomenon commonly develops because DNA replication occurs without coordinated proofreading by the MMR pathway, a fundamental requirement for high fidelity DNA replication. The degree of MSI at specific locations is used clinically for diagnosis of HNPCC and MMR deficient tumors. It has been discovered, however, that mutation rates of individual microsatellite regions vary greatly, depending on motif size, sequence, and length, as well as the involvement of cellular pathways or events other than dysfunctional MMR. Kristin Eckert and colleagues present compelling evidence that specific di- and tetra-nucleotide forms of MSI found in sporadic cancers arise by distinct genetic mechanisms. The origins and pathological significance of these unique MSI events are explored in depth in this review.

Fanconi anemia (FA) is a rare recessive genetic disorder that derives from mutations in one of fifteen genes coding for proteins that are within the FA pathway. FA cells are highly sensitive to DNA cross-linking agents and reactive oxygen species. The most common clinical events associated with this disease are acute myeloid leukemia and bone marrow failure with squamous cell carcinoma (SCC) as an additional, extreme risk. Susanne Wells and colleagues review the mechanisms of action of the FA pathway as it contributes to cellular stress response, DNA repair and SCC sensitivity.

Bloom syndrome (BS) is another rare recessive disorder but results from loss of function of the recQ-like BLM helicase. This genetic disorder is characterized by short stature (amongst several other physical features), and predisposition to the development of cancer. Strikingly, cells from patients with BS exhibit genomic instability because of increased chromosomal recombination, including hyper-recombination of rDNA repeats. Samir Acharya and colleagues within Joanna Groden’s laboratory have identified a direct interaction of DNA topoisomerase I with the C-terminus of BLM in the nucleolus. The results of this study suggest that BLM and DNA topoisomerase I coordinately modulate RNA:DNA hybrid formation.

The integrity of chromosomal DNA is maintained not only by high fidelity DNA replication and repair, but also by a complex cellular network that surveys for base lesions, broken strands, or blocked replication forks, known collectively as the DNA damage response (DDR). The exact biochemical nature of the DDR is dependent on several variables and may involve activation of cell cycle checkpoints, the initiation of DNA repair, as well as the onset of cellular senescence, cell death, and/or DNA damage tolerance (DDT). DDT involves recruitment of translesion DNA synthesis (TLS) polymerases, or lesion bypass polymerases, after DNA replication. The putative purpose of this response is to increase genetic stability in the presence of DNA damage in replicating cells, but the response may instead result in increased mutagenesis. Christine Canman and co-authors review specialized functions of several lesion bypass polymerases with regard to specific DNA repair pathways, mutagenesis, and genetic stability. Complementary to the Canman review, Motoshi Suzuki and colleagues describe the potential involvement of core DNA replication proteins and replication steps that have been implicated in the process of carcinogenesis, including the RAD6-RAD18 TLS pathway. These authors also discuss specific mutations in several associated genes that have been identified in human cancer.

When considering the mechanisms of DNA repair, DNA replication and lesion bypass, the predominant DNA substrate for all of these processes is Watson-Crick B-form DNA. However, there are more than ten different types of ‘non-B’ DNA conformations, and many play important roles in regulating gene expression, gene rearrangements and in dictating chromatin status. In their review, Karen Vasquez and Guliang Wang focus on the interactions of DNA repair proteins with non-B DNA and their roles in genetic instability, and advance the possibility that proteins and DNA involved in such interactions may represent plausible targets for selective therapeutic intervention.

Fidelity of DNA repair is critical, as DNA sequence alterations can contribute to an increase in gene mutations, gene dysfunction and cellular abnormalities. Homologous recombination (HR) is a largely error-free pathway involved in the repair of DNA double-strand breaks (DSBs). Unfortunately, familial breast cancers, among others, present with defects in HR-mediated repair, which force the cancer cell to use more error-prone pathways for DSB repair such as microhomology-mediated non-homologous end joining (mmNHEJ). Error-prone DSB repair mediated by the mmNHEJ pathway can promote the formation of somatic copy number variations (CNVs) that in turn can promote cancer formation and progression. In their review, Lisa Wiesmueller and colleagues define CNVs, describe mechanisms of CNV formation and detail the latest technologies for CNV detection and analysis. Further, they summarize the latest data on CNVs with regard to breast cancer susceptibility genes and breast cancer biology.

Another critical aspect of genome stability involves the maintenance of telomere length. Telomeres are DNA-protein structures composed of (TTAGGG)n sequence repeats in vertebrates that protect chromosome ends and prevent the eventual loss of coding DNA due to the end replication problem. Maintenance of telomere length occurs via expression of the enzyme telomerase or in a telomerase-independent manner termed alternative lengthening of telomeres (ALT). This review, by Joanna Groden and colleagues, summarizes recent clinical data and findings in mammalian cells that identify the genetic mutations permissive to ALT, the DNA repair proteins involved in ALT mechanisms and the importance of telomere maintenance mechanisms for tumor progression.

Model systems such as bacteria ( Escherichia coli ), yeast ( Saccharomyces cerevisiae , Schizosaccharomyces pombe ), the fruit fly ( Drosophila melanogaster ) and the nematode ( Caenorhabditis elegans ) have provided significant advances in our understanding of DNA repair, DNA lesion bypass and DDR mechanisms. Recently, zebrafish ( Danio rerio ) have emerged as a versatile model organism for the study of DNA repair and the DDR, both for gene identification and discovery and for investigating the role of individual DNA repair genes in development, cancer formation and chemotherapy response. In this timely and thorough review, Phyllis Strauss discusses this model system, emphasizing that the zebrafish genome contains nearly all of the genes involved in the different DNA repair pathways in eukaryotes, including direct reversal (DR), MMR, nucleotide excision repair (NER), BER, HR, nonhomologous end joining (NHEJ) and TLS. Further, the review introduces recent work on DNA damage and DNA repair studies in zebrafish, with special emphasis on the role of BER in zebrafish during early embryological development.

Pluripotent cells, organ-specific stem cells, somatic proliferating cells and post-mitotic cells are all subject to genotoxic insult. The resulting genomic DNA damage triggers a set of signaling events known collectively as the DDR, initiating DNA repair processes, facilitating cell cycle arrest and, depending on the extent of damage, triggering the onset of cell death or senescence. However, emerging evidence indicates that DNA repair and the DDR function differently in different cellular contexts, with the expectation that stem cells are likely to address DNA damage differently from their somatic counterparts. In this extensive review, Eugenia Dogliotti and colleagues detail information on the common and distinct mechanisms controlling genome integrity that are utilized by different cell types along the self-renewal/differentiation program, with special emphasis on their roles in the prevention of aging and disease.

The goal of this themed issue has been to provide a broad view of the many ways that genomic stability is controlled within the cell. Human DNA repair disorders are just beginning to be defined in depth, and already many patients with inherited and sporadic cancers involving DNA repair defects are now in clinical trials using targeted agents that can exploit these defects for selective or preferential response. What is very clear, however, is that an enormous amount of basic research remains to be accomplished to better understand the biochemistry and molecular biology of our DNA replication, DNA repair, and genome maintenance functions before fully realizing the many clinical opportunities on the horizon.

Mutation Research

Subject Area and Category

  • Molecular Biology
  • Health, Toxicology and Mutagenesis

Publisher: Elsevier B.V.

ISSN: 00275107, 18792871

The set of journals have been ranked according to their SJR and divided into four equal groups, four quartiles. Q1 (green) comprises the quarter of the journals with the highest values, Q2 (yellow) the second highest values, Q3 (orange) the third highest values and Q4 (red) the lowest values.

Year | Genetics | Health, Toxicology and Mutagenesis | Molecular Biology
1999 | Q4 | Q4 | Q4
2000 | Q3 | Q2 | Q4
2001 | Q3 | Q2 | Q3
2002 | Q4 | Q3 | Q4
2004 | Q2 | Q1 | Q2
2005 | Q2 | Q1 | Q2
2006 | Q2 | Q1 | Q2
2007 | Q4 | Q3 | Q4
2008 | Q4 | Q4 | Q4
2009 | Q4 | Q4 | Q4
2010 | Q4 | Q4 | Q4
2011 | Q4 | Q4 | Q4
2012 | Q4 | Q3 | Q4
2013 | Q4 | Q4 | Q4
2014 | Q4 | Q3 | Q4
2015 | Q3 | Q2 | Q3
2016 | Q3 | Q3 | Q4
2017 | Q4 | Q4 | Q4
2023 | Q3 | Q2 | Q3
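
The quartile grouping described above can be sketched in a few lines of Python. This is only an illustration of splitting a ranked list into four equal groups: the function name `assign_quartiles`, the journal names, and the SJR values are hypothetical, and how ties or category sizes not divisible by four are actually handled is an assumption here, not something stated in the text.

```python
def assign_quartiles(sjr_by_journal):
    """Rank journals by SJR (highest first) and split the ranking into four groups."""
    ranked = sorted(sjr_by_journal, key=sjr_by_journal.get, reverse=True)
    total = len(ranked)
    quartiles = {}
    for position, journal in enumerate(ranked):
        # Integer arithmetic maps the top quarter of positions to Q1, the next to Q2, etc.
        quartiles[journal] = f"Q{position * 4 // total + 1}"
    return quartiles


# Hypothetical journals and SJR values, for illustration only.
example_scores = {
    "Journal A": 1.57, "Journal B": 1.49, "Journal C": 0.70, "Journal D": 0.56,
    "Journal E": 0.46, "Journal F": 0.35, "Journal G": 0.14, "Journal H": 0.10,
}

for journal, quartile in assign_quartiles(example_scores).items():
    print(journal, quartile)
```

With eight journals, each quartile receives two of them; a journal's quartile can therefore change from year to year even if its own SJR is stable, because the grouping depends on the whole ranked set.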

The SJR is a size-independent prestige indicator that ranks journals by their 'average prestige per article'. It is based on the idea that 'all citations are not created equal'. SJR is a measure of the scientific influence of journals that accounts for both the number of citations received by a journal and the importance or prestige of the journals where such citations come from. It measures the scientific influence of the average article in a journal; that is, it expresses how central to the global scientific discussion an average article of the journal is.

Year | SJR
1999 | 0.142
2000 | 0.461
2001 | 0.564
2002 | 0.346
2004 | 1.507
2005 | 1.572
2006 | 1.494
2007 | 0.320
2008 | 0.105
2009 | 0.105
2010 | 0.104
2011 | 0.203
2012 | 0.397
2013 | 0.248
2014 | 0.384
2015 | 0.705
2016 | 0.561
2017 | 0.111
2023 | 0.699

Evolution of the number of published documents. All types of documents are considered, including citable and non citable documents.

Year | Documents
1999 | 268
2000 | 174
2001 | 211
2002 | 175
2003 | 162
2004 | 238
2005 | 254
2006 | 189
2007 | 176
2008 | 167
2009 | 162
2010 | 155
2011 | 125
2012 | 124
2013 | 68
2014 | 95
2015 | 121
2016 | 54
2017 | 43
2018 | 41
2019 | 22
2020 | 33
2021 | 19
2022 | 24
2023 | 21

This indicator counts the number of citations received by documents from a journal and divides them by the total number of documents published in that journal. The chart shows the evolution of the average number of times documents published in a journal in the past two, three and four years have been cited in the current year. The two years line is equivalent to journal impact factor ™ (Thomson Reuters) metric.

Year | Cites / Doc. (4 years) | Cites / Doc. (3 years) | Cites / Doc. (2 years)
1999 | 2.237 | 2.237 | 2.117
2000 | 2.503 | 2.565 | 2.403
2001 | 2.558 | 2.592 | 2.566
2002 | 2.886 | 3.064 | 3.294
2003 | 3.240 | 3.611 | 3.526
2004 | 3.938 | 4.086 | 3.852
2005 | 4.191 | 4.103 | 4.045
2006 | 4.821 | 5.054 | 4.524
2007 | 4.665 | 4.216 | 4.454
2008 | 4.356 | 4.549 | 3.378
2009 | 4.228 | 3.547 | 3.773
2010 | 3.788 | 4.135 | 3.626
2011 | 3.865 | 3.655 | 3.233
2012 | 4.182 | 4.224 | 4.443
2013 | 4.035 | 4.260 | 5.036
2014 | 4.278 | 4.874 | 3.911
2015 | 4.260 | 3.512 | 2.871
2016 | 3.140 | 2.725 | 2.407
2017 | 2.843 | 2.748 | 2.566
2018 | 2.706 | 2.592 | 2.186
2019 | 2.587 | 2.558 | 2.512
2020 | 2.419 | 2.462 | 2.206
2021 | 2.964 | 3.188 | 3.200
2022 | 2.174 | 2.122 | 2.154
2023 | 2.020 | 2.145 | 1.442

Evolution of the total number of citations and journal's self-citations received by a journal's published documents during the three previous years. Journal self-citation is defined as the number of citations from a journal's citing articles to articles published by the same journal.

Year | Self Cites | Total Cites
1999 | 218 | 1966
2000 | 151 | 2216
2001 | 146 | 2027
2002 | 101 | 2001
2003 | 92 | 2022
2004 | 125 | 2239
2005 | 112 | 2359
2006 | 130 | 3305
2007 | 103 | 2871
2008 | 104 | 2816
2009 | 94 | 1887
2010 | 77 | 2088
2011 | 84 | 1769
2012 | 64 | 1867
2013 | 30 | 1721
2014 | 32 | 1545
2015 | 34 | 1008
2016 | 15 | 774
2017 | 18 | 742
2018 | 8 | 565
2019 | 3 | 353
2020 | 2 | 261
2021 | 2 | 306
2022 | 3 | 157
2023 | 1 | 163

Evolution of the number of total citation per document and external citation per document (i.e. journal self-citations removed) received by a journal's published documents during the three previous years. External citations are calculated by subtracting the number of self-citations from the total number of citations received by the journal’s documents.

Year | External Cites per document | Cites per document
1999 | 1.989 | 2.237
2000 | 2.390 | 2.565
2001 | 2.405 | 2.592
2002 | 2.910 | 3.064
2003 | 3.446 | 3.611
2004 | 3.858 | 4.086
2005 | 3.908 | 4.103
2006 | 4.855 | 5.054
2007 | 4.065 | 4.216
2008 | 4.381 | 4.549
2009 | 3.370 | 3.547
2010 | 3.982 | 4.135
2011 | 3.481 | 3.655
2012 | 4.079 | 4.224
2013 | 4.186 | 4.260
2014 | 4.773 | 4.874
2015 | 3.394 | 3.512
2016 | 2.673 | 2.725
2017 | 2.681 | 2.748
2018 | 2.555 | 2.592
2019 | 2.536 | 2.558
2020 | 2.443 | 2.462
2021 | 3.167 | 3.188
2022 | 2.081 | 2.122
2023 | 2.132 | 2.145
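
As a worked check of the definition above, the short Python sketch below recomputes the 1999 values from the totals reported on this page. The function names are hypothetical, and the choice of denominator (citable plus non-citable documents in the three-year window, 860 + 19 for 1999, taken from the citable/non-citable series on this page) is an assumption that happens to reproduce the published figures rather than a documented formula.

```python
def cites_per_document(total_cites, documents):
    """Total citations divided by the number of documents in the window."""
    return round(total_cites / documents, 3)


def external_cites_per_document(total_cites, self_cites, documents):
    """Same ratio after subtracting the journal's self-citations."""
    return round((total_cites - self_cites) / documents, 3)


# 1999 values as reported on this page.
total_cites_1999 = 1966
self_cites_1999 = 218
# Assumed denominator: citable (860) plus non-citable (19) documents.
documents_1999 = 860 + 19

print(cites_per_document(total_cites_1999, documents_1999))
# 2.237
print(external_cites_per_document(total_cites_1999, self_cites_1999, documents_1999))
# 1.989
```

Both results match the 1999 row of the table, which suggests that the two series differ only by the removal of self-citations from the numerator.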

International Collaboration accounts for the articles that have been produced by researchers from several countries. The chart shows the ratio of a journal's documents signed by researchers from more than one country, that is, including more than one country address.

Year | International Collaboration
1999 | 16.42
2000 | 18.97
2001 | 18.96
2002 | 13.71
2003 | 21.60
2004 | 20.59
2005 | 20.08
2006 | 23.28
2007 | 28.98
2008 | 22.16
2009 | 22.22
2010 | 26.45
2011 | 36.80
2012 | 27.42
2013 | 29.41
2014 | 28.42
2015 | 33.06
2016 | 16.67
2017 | 20.93
2018 | 21.95
2019 | 13.64
2020 | 27.27
2021 | 21.05
2022 | 16.67
2023 | 4.76

Not every article in a journal is considered primary research and therefore "citable". This chart shows the ratio of a journal's articles including substantial research (research articles, conference papers and reviews) in three-year windows vs. those documents other than research articles, reviews and conference papers.

Year | Non-citable documents | Citable documents
1999 | 19 | 860
2000 | 22 | 842
2001 | 24 | 758
2002 | 19 | 634
2003 | 19 | 541
2004 | 19 | 529
2005 | 21 | 554
2006 | 27 | 627
2007 | 27 | 654
2008 | 24 | 595
2009 | 13 | 519
2010 | 11 | 494
2011 | 14 | 470
2012 | 14 | 428
2013 | 17 | 387
2014 | 11 | 306
2015 | 10 | 277
2016 | 6 | 278
2017 | 5 | 265
2018 | 5 | 213
2019 | 4 | 134
2020 | 3 | 103
2021 | 4 | 92
2022 | 3 | 71
2023 | 3 | 73

Ratio of a journal's items, grouped in three years windows, that have been cited at least once vs. those not cited during the following year.

Year | Uncited documents | Cited documents
1999 | 267 | 612
2000 | 246 | 618
2001 | 221 | 561
2002 | 191 | 462
2003 | 147 | 413
2004 | 123 | 425
2005 | 112 | 463
2006 | 122 | 532
2007 | 166 | 515
2008 | 121 | 498
2009 | 116 | 416
2010 | 73 | 432
2011 | 88 | 396
2012 | 79 | 363
2013 | 84 | 320
2014 | 52 | 265
2015 | 46 | 241
2016 | 61 | 223
2017 | 52 | 218
2018 | 55 | 163
2019 | 34 | 104
2020 | 35 | 71
2021 | 25 | 71
2022 | 24 | 50
2023 | 23 | 53

Evolution of the percentage of female authors.

Year | Female Percent
1999 | 35.47
2000 | 30.94
2001 | 34.88
2002 | 36.72
2003 | 38.82
2004 | 41.47
2005 | 37.95
2006 | 41.31
2007 | 41.50
2008 | 42.69
2009 | 45.82
2010 | 45.31
2011 | 43.13
2012 | 48.12
2013 | 47.81
2014 | 43.86
2015 | 45.27
2016 | 52.36
2017 | 44.83
2018 | 46.50
2019 | 43.88
2020 | 43.22
2021 | 42.22
2022 | 38.84
2023 | 32.76

Evolution of the number of documents cited by public policy documents according to Overton database.

Year | Overton
1999 | 0
2000 | 0
2001 | 0
2002 | 0
2003 | 12
2004 | 51
2005 | 50
2006 | 40
2007 | 51
2008 | 26
2009 | 25
2010 | 20
2011 | 10
2012 | 10
2013 | 11
2014 | 12
2015 | 11
2016 | 4
2017 | 5
2018 | 2
2019 | 0
2020 | 0
2021 | 1
2022 | 0
2023 | 0

Evolution of the number of documents related to the Sustainable Development Goals defined by the United Nations. Available from 2018 onwards.

Documents | Year | Value
SDG | 2018 | 11
SDG | 2019 | 15
SDG | 2020 | 16
SDG | 2021 | 8
SDG | 2022 | 16
SDG | 2023 | 14

Scimago Lab, Copyright 2007-2024. Data Source: Scopus®

Mutation Research: Reviews in Mutation Research

Volume 2 • Issue 2

  • ISSN: 1383-5742
  • 5 Year impact factor: 6.4
  • Impact factor: 6.4
  • Journal metrics

The subject areas of Mutation Research - Reviews in Mutation Research (MRR) encompass the entire spectrum of the science of mutation research and its applications, with particular emphasis on the relationship between mutation and disease. Thus, this section will cover:

Advances in human genome research (including evolving technologies for mutation detection and functional genomics) with applications in clinical genetics, gene therapy and health risk assessment for environmental agents of concern

Genetic toxicology and environmental mutagenesis (including the factors that modulate the genetic activity of environmental agents) will continue to be prominent topics in this section.

MRR supports and follows the general direction proposed by all major societies in the field that are part of the International Association of Environmental Mutagenesis and Genomics Societies (IAEMGS):

Asociacion Latinoamericana de Mutagenesis, Carcinogenesis y Teratogenesis Ambiental (ALAMCTA)

Brazilian Association of Mutagenesis and Environmental Genomics (MutaGen-Brasil)

Chinese Environmental Mutagen Society (CEMS)

European Environmental Mutagenesis and Genomics Society (EEMGS)

Environmental Mutagenesis and Genomics Society (EMGS)

Environmental Mutagen Society of India (EMS India)

Iranian Environmental Mutagen Society (IrEMS)

The Japanese Environmental Mutagen Society (JEMS)

Korean Environmental Mutagen Society (KEMS)

Molecular and Experimental Pathology Society of Australasia (MEPSA)

Pan-African Environmental Mutagen Society (PAEMS)

Philippines Environmental Mutagen Society (PEMS)

Thai Environmental Mutagen Society (TEMS)

Other Mutation Research sections: DNA Repair; Mutation Research - Fundamental and Molecular Mechanisms of Mutagenesis (MR); Mutation Research - Genetic Toxicology and Environmental Mutagenesis (MRGTEM)

Bayesian phylogenetic analysis of pitch-accent systems based on accentual class merger: a new method applied to Japanese dialects

Takuya Takahashi, Ayaka Onohara, Yasuo Ihara, Bayesian phylogenetic analysis of pitch-accent systems based on accentual class merger: a new method applied to Japanese dialects, Journal of Language Evolution, 2024, lzae004, https://doi.org/10.1093/jole/lzae004

Unlike studies of the evolutionary relationship between languages, the dialect-level variation within a language has seldom been studied within the framework of a phylogenetic tree, because frequent lexical borrowing muddles the evidence of shared ancestry. The phonological history of Japanese is an exceptional case study where the phenomenon called accentual class merger enables the phylogenetic analysis of dialectal pitch-accent systems in a way that is not subject to borrowing. However, previous studies have lacked statistical analysis and failed to evaluate the relative credence of alternative hypotheses. Here we developed a novel substitution model that describes the mutation of pitch-accent systems driven by accentual class merger and integrated the model into the framework of Bayesian phylogenetic inference with geographical diffusion. Applying the method to data on the pitch-accent variation in modern Japanese dialects and historical documents collected from literature, we reconstructed the evolutionary history and spatial diffusion of pitch-accent systems. Our result supports the monophyly of each of three groups of pitch-accent systems in conventional categorization, namely Tokyo type, Keihan type, and N-kei (N-pattern) type of Kyushu, whereas the monophyly of the Tokyo type has been highly controversial in previous studies. The divergence time of the mainland pitch-accent systems was estimated to be from mid-Kofun to early Heian period. Also, it is suggested that the modern Kyoto dialect did not inherit its accent patterns from Bumoki but from an unrecorded lineage which survived from the Muromachi period. Analyses on geographical diffusion suggest that the most recent common ancestor (MRCA) of all the taxa and that of Keihan type were located in or around the Kinki region, whereas the MRCA of N-kei type was located in northern to central Kyushu. 
The geographical location of the MRCA of Tokyo type remains unclear, but the Kinki and Kanto regions are the most plausible candidates.

Languages display an enormous diversity in terms of many linguistic features, such as lexical, morphological, and syntactic elements. Based on the variation of these features in modern languages, historical linguists have attempted to trace the evolutionary history of languages. Assuming a tree structure to represent the evolutionary history, many attempts have been made to infer the phylogenetic trees of language families, such as Indo-European (Bouckaert et al., 2012), Bantu (Currie et al., 2013; Koile et al., 2022), Austronesian (Gray et al., 2009), and Japonic (Lee and Hasegawa 2011; Saitou and Jinam 2017).

Unlike the variation between languages, the dialect-level variation within a single language has been studied without the concept of the phylogenetic tree. Dialectology is known for the slogan ‘every word has its own history’, because dialect words are quite frequently borrowed, driven by human contacts, between different dialects. The frequent contact makes it impossible to assume a single phylogenetic tree of dialects that is shared by multiple different words or other linguistic features. Hence, dialectologists do not usually accept tree-thinking and alternatively perform admixture analysis ( Romano et al., 2022 ), distance-based approach ( Huisman et al., 2019 ; Jeszenvszky et al., 2019 ; Nerbonne 2010 ), or phylogenetic network analysis ( List et al., 2014 ). Under such models, however, it is impossible to define the divergence time.

A possible exception is the pitch-accent system in Japanese dialects, whose evolutionary history has been studied on the basis of a tree model (Kindaichi 1975a; Tokugawa 1962). Generally, phonological features such as pitch accents are thought to be relatively unaffected by contact between populations, because they are acquired at early stages of development (Sato et al., 2010). Moreover, since even a small change in a phonological feature could result in a simultaneous change in the pronunciation of multiple words, borrowing is not expected to occur frequently. On top of that, in the Japanese language, as well as its close relative Ryukyuan, words are distinguished by their accent patterns, that is, the voice pitch given to each of their morae or syllables. As argued below, this offers a unique opportunity to detect a strong signal of descent with modification, making phylogenetic reconstruction of Japanese pitch accents an intriguing case study where we can retrace the past of dialects with a timed phylogenetic tree.

To illustrate, consider two Japanese words that are distinguished only by their accent patterns, one of which means ‘rain’ and the other ‘candy’ with the same bimoraic (two-mora) sound ‘ame’ ( a-me ). In the Tokyo dialect, the one for rain is pronounced in the high-low (HL) manner, that is, the first and second morae are pronounced in high and low pitches, respectively. On the other hand, the one for candy is pronounced in the low-high (LH) manner. Japanese dialects vary in the pitch-accent system , which defines the mapping between accent patterns and words within each dialect, so that there is enormous variety over mainland Japan in how words are pronounced. Indeed, by contrast with the Tokyo dialect, the Kyoto dialect associates ‘ame’ for rain and ‘ame’ for candy with the LF and HH accent patterns, respectively, where F denotes a falling pitch ( Shibatani 1990 ).

This article aims to elucidate the evolutionary history of the pitch-accent systems of Japanese dialects, so we first review how linguists have conventionally inferred the phylogenetic relationship among different pitch-accent systems. First of all, a comparison of accent patterns among modern dialects and historical documents has identified accentual classes of words. Accentual classes are groups of words that are inferred to have had distinct accent patterns in the common ancestor of mainland Japanese dialects. Accentual classes are defined for each subset of words with the same number of morae and the same part of speech; for example, for bimoraic nouns, five accentual classes are distinguished. In many modern dialects, a majority of the words belonging to the same accentual class have the same accent pattern, conserving the grouping of the common ancestor, even though the specific accent patterns associated with a given accentual class may vary from dialect to dialect (Table 1). Meanwhile, some dialects have lost the distinction of two or more accentual classes and pronounce them with the same accent pattern. In such cases, the accentual classes are said to be merged. As for bimoraic nouns followed by the particle ‘ga’, for instance, the Tokyo dialect merges the second and third classes shown in Table 1, so that both are pronounced with the LHL accent pattern; similarly, the fourth and fifth classes are merged into HLL. In this way, the pitch-accent system of each dialect is characterized by the way it merges accentual classes (hereafter the merger state). In this article, the merger state of a pitch-accent system is represented using slash marks; for the case of bimoraic nouns, the pitch-accent system of the Tokyo dialect is represented by 1/23/45. Also, a set of merged classes is referred to as either a merged block or simply a block, so that the Tokyo dialect has three merged blocks ‘1’, ‘23’, and ‘45’ with regard to bimoraic nouns. Fig. 1 shows the geographic distribution of the merger states over mainland Japan for bimoraic and monomoraic nouns.
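The slash notation can be made concrete with a small sketch. The function below is an illustration only, not the authors' code: it assumes that two classes count as merged exactly when their surface accent patterns are identical, and the function name `merger_state` is hypothetical. The Tokyo and Oita patterns are the ones listed in Table 1.

```python
def merger_state(patterns):
    """Return the merger state (e.g. '1/23/45') for a list of accent
    patterns, one per accentual class, with class 1 first.

    Assumption for illustration: classes sharing an identical pattern
    are treated as one merged block.
    """
    blocks = {}  # accent pattern -> class numbers that carry it
    for class_number, pattern in enumerate(patterns, start=1):
        blocks.setdefault(pattern, []).append(class_number)
    # Order blocks by their smallest class number, then join with slashes.
    ordered = sorted(blocks.values(), key=lambda block: block[0])
    return "/".join("".join(str(c) for c in block) for block in ordered)


# Bimoraic nouns followed by 'ga', classes 1 to 5 (Table 1).
tokyo = ["LHH", "LHL", "LHL", "HLL", "HLL"]
oita = ["LHH", "LHH", "LHL", "HLL", "HLL"]

print(merger_state(tokyo))  # 1/23/45
print(merger_state(oita))   # 12/3/45
```

Because blocks are keyed by pattern rather than by adjacency, non-contiguous mergers such as 14/23/5 are written out in the same way.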

Examples of accent patterns for the five accentual classes of bimoraic (two-mora) nouns in four dialects ( Hirayama 1960 ). The accent patterns of bimoraic nouns followed by a particle ‘ga’ are shown, because some accent patterns are distinguished by the pitch drop right after the noun.

Class | Examples | Tokyo | Kyoto | Kagoshima | Oita
Class 1 | ka-ze-ga / to-ri-ga | LHH | HHH | LHL | LHH
Class 2 | u-ta-ga / o-to-ga | LHL | HLL | LHL | LHH
Class 3 | a-shi-ga / i-ke-ga | LHL | HLL | LLH | LHL
Class 4 | i-ki-ga / i-to-ga | HLL | LLH | LLH | HLL
Class 5 | a-ki-ga / ko-e-ga | HLL | LHL | LLH | HLL
Merger state | | 1/23/45 | 1/23/4/5 | 12/345 | 12/3/45

Geographical distribution of accent systems over mainland Japan. The legend shows the merger state of the five accentual classes of bimoraic nouns and the three accentual classes of monomoraic nouns. ‘OCV’: accent pattern differs for open and close vowels. ‘K’: similar to the Keihan-type accent. ‘up’: accent patterns are distinguished by the position of pitch-rise. ‘NT’: sub-central-type accent without tone. This map was created based on the geographical distribution of accent systems described in literature (Hirayama 1951, 1957, 1969; Kindaichi 1966a, 2001; Long et al., 2008; Matsukura 2014; Matsukura and Nitta 2016; Nitta 2012; Okumura 1976; Sato 1983, 1988; Uwano 1985a, b, 1987; Yamaguchi 1984, 2003) and the base map in Database of Global Administrative Areas (2015) and Ministry of Land, Infrastructure, Transport and Tourism (2020).

By definition of accentual class, the common ancestor of the pitch-accent systems of mainland Japan is believed to have had the merger state 1/2/3/4/5. Indeed, this merger state is found in the historical document Ruiju Myogisho, indicating that all the accentual classes were distinguished in Kyoto in the late Heian period (794–1185). Dialectologists have argued that the present pitch-accent systems were formed after each lineage independently underwent the merger of accentual classes. In contrast with the merger of multiple classes, some modern dialects split a single class and assign two different accent patterns to its component words, depending on whether the word has an open or close vowel. However, it must be noted that a merged block will not be separated into the original classes once they are merged. This is because, recalling that the set of words included in each accentual class is constant over mainland Japan, a split of once-merged accentual classes would imply the independent emergence of identical groupings of words, which is highly unlikely. This irreversibility of the merger event helps phylogenetic inference.

The Japanese linguist Haruhiko Kindaichi proposed a phylogenetic relationship between the pitch-accent systems of Japanese dialects based on the variation of the merger state. Japanese pitch-accent systems had been conventionally categorized into Tokyo type, Keihan type, N-kei type, and accentless type ( Table 2 ) ( Shibatani 1990 ). A puzzling observation was that while the Keihan type occupied the Kyoto-Osaka area of central mainland Japan, the Tokyo type was distributed in both east and west sides. Thus, it was unclear whether or not the two geographical clusters of Tokyo type had independent origins. Kindaichi considered the pitch-accent system of the Kyoto dialect in the Muromachi period (1336–1573) as 1/23/4/5 (for bimoraic nouns), based on another historical material Bumoki, and hypothesized that Churin type (1/23/45), a subtype of Tokyo type seen in Tokyo (east) and Hiroshima (west), had been derived from the pitch-accent system of ancient Kyoto (center) as recorded in Bumoki, through a merger event of classes 4 and 5 ( Kindaichi 1942 , 1975a ) ( Fig. 2a ). On the other hand, Gairin type (12/3/45), another subtype of Tokyo type, was posited not to be derived from the ancient Kyoto dialect recorded in Bumoki because the split of classes 2 and 3 is unlikely ( Kindaichi 1978 ).
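Kindaichi's argument rests on the irreversibility of mergers: one merger state can descend from another only if every ancestral block survives intact inside a single descendant block. A minimal sketch of that check follows. The helper names are hypothetical, class numbers are assumed to be single digits, and both states are assumed to cover the same classes.

```python
def parse_state(state):
    """Turn a merger state such as '1/23/45' into a list of sets of class numbers."""
    return [set(int(ch) for ch in block) for block in state.split("/")]


def reachable_by_mergers(ancestor, descendant):
    """Return True if `descendant` can be obtained from `ancestor` by
    mergers alone, i.e. each ancestral block is contained in one
    descendant block (no merged block is ever split)."""
    descendant_blocks = parse_state(descendant)
    return all(
        any(anc_block <= desc_block for desc_block in descendant_blocks)
        for anc_block in parse_state(ancestor)
    )


# Bumoki (1/23/4/5) to Churin type (1/23/45): a merger of classes 4 and 5.
print(reachable_by_mergers("1/23/4/5", "1/23/45"))  # True
# Bumoki (1/23/4/5) to Gairin type (12/3/45): would require splitting 2 and 3.
print(reachable_by_mergers("1/23/4/5", "12/3/45"))  # False
```

The check covers merger states only; it says nothing about whether the accent patterns assigned to the blocks are compatible.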

Tree taxon dialects. The ‘type’ and ‘subtype’ columns display the conventional classification used in linguistics. Although Keihan type is a somewhat ambiguous term encompassing different pitch-accent systems depending on the literature, here we use it in the broader sense, including the Tarui and Sanuki types (Matsumori et al., 2012; Uwano 1985b, 1987).

Dialect (tree taxon) | Type | Subtype | Merger state (bimoraic nouns) | Data source
Tokyo | Tokyo | Churin | |
Hiroshima | Tokyo | Churin | |
Totsukawa | Tokyo | Nairin | |
Hirosaki | Tokyo | Gairin | |
Oita | Tokyo | Gairin | |
Kyoto | Keihan | Central | |
Tarui | Keihan | Tarui (C type) | |
Kan’onji | Keihan | Sanuki | |
Ibukijima | Keihan | Ibukijima | |
Bumoki | Keihan | Bumoki | |
Myogisho | Keihan | Myogisho | |
Miyakonojo | N-kei | 1-kei | |
Kagoshima | N-kei | 2-kei | |
Nagasaki | N-kei | 2-kei | |
Gonza’s materials | N-kei | 2-kei | |

Phylogenetic relationship of pitch-accent systems hypothesized in previous studies. (a) Kindaichi’s hypothesis (Kindaichi 1964, 1966a, 1967, 1975a, b, 1978). (b) Tokugawa’s hypothesis (Tokugawa 1962). Some modern dialects are assigned to internal nodes instead of leaves because the two linguists both posit, unlike standard comparative linguistics, that one pitch-accent is derived from another, instead of hypothesizing a common ancestral pitch-accent. Also, note that both panels assign 14/235 (called B type) as the merger state of Tarui, instead of 14/23/5 (called C type) used in our dataset.

Although Kindaichi’s hypothesis is famous and deemed prevailing, it has been exposed to various criticisms from different perspectives. First, Hattori (1985) points out that the hypothesis does not follow the standard comparative methods, as it claimed that Churin type had evolved from Keihan (central) type, instead of reconstructing an ancient pitch-accent system which is ancestral to both Churin and central types. Tokugawa (1962) posited two lineages with the merger state 1/2/3/45, each including both Churin and Gairin type dialects ( Fig. 2b ), which sharply contradicts Kindaichi’s hypothesis. Moreover, Uwano (2006) even challenged the assumption that the common ancestor of the mainland pitch-accent systems had the same number of accentual classes that are seen in Ruiju Myogisho. Uwano hypothesized that the common ancestor had a more complicated pitch-accent system and that more accentual classes, which are no longer found in the modern dialects or written records, were present. So far, no consensus has been established as to the phylogenetic tree of Japanese pitch-accent systems.

Nevertheless, the merger of accentual classes is an indispensable cue in reconstructing the phonetic phylogeny of Japanese dialects. An advantage of analyzing accentual classes is that we can infer the phylogenetic relationship of different pitch-accent systems in a way that is not subject to borrowing. While the accent pattern for individual words may be borrowed among dialects, substituting the representative accent pattern of a class means that all its component words simultaneously change their pronunciation, which is not likely to happen by borrowing.

However, previous arguments about the phylogeny of Japanese pitch-accent systems based on the merger state have been made with little quantitative scrutiny. In other words, while linguists have been able to propose plausible hypotheses, there has been little statistical evaluation of their relative credence. Perhaps that is a reason why the conventional studies of phylogenetic reconstruction have not reached a consensus. In hope of improving the situation, this study presents a statistically grounded method to infer the phylogenetic tree of the pitch-accent systems of modern Japanese dialects based on the principle of the accentual class merger.

Statistical methods for the inference of phylogenetic trees were originally developed in the field of evolutionary biology. These methods are based on a dataset of the variation of morphological characters or nucleotide sequences in present-day species (i.e. terminal taxa) and take into account how the characters or sequences have evolved on the tree branches. Subsequently, linguists applied the same approach to infer phylogenetic trees of languages under the premise that linguistic traits are inherited from generation to generation, like the genetic traits transmitted from parents to offspring. The statistical models have been applied to infer the phylogenetic relationship within a language family (Bouckaert et al., 2012; Currie et al., 2013; Gray et al., 2009; Koile et al., 2022; Lee and Hasegawa 2011; Saitou and Jinam 2017), replacement rates among cognate groups (Pagel and Meade 2017; Pagel et al., 2007), and unknown locations of ancestral languages (Bouckaert et al., 2012). On the technical side, besides the many algorithms for phylogenetic analysis, such as maximum parsimony, neighbor joining (Saitou and Nei 1987), and maximum likelihood analysis, a growing body of literature employs Bayesian statistics for the reconstruction of linguistic phylogeny (Hoffmann et al., 2021). One of the advantages of Bayesian phylogenetic analysis is the ease with which the model can be extended; previous studies have presented models that take into account various phenomena such as spatial diffusion (Lemey et al., 2008, 2010; Takahashi and Ihara 2023) and borrowing (Neureiter et al., 2022).

In this article, to infer the evolutionary history of pitch-accent systems in modern Japanese dialects, we will perform a Bayesian phylogenetic analysis on data of accent patterns assigned to accentual classes. In performing Bayesian phylogenetic analysis, we compute the probability of observing the known data given a phylogenetic tree (likelihood), based on a substitution model representing how the linguistic features change. A major obstacle to be overcome is that most of the substitution models previously proposed for linguistic phylogeny were designed for lexical data (reviewed in Hoffmann et al., 2021 ), and that there is no standard method for the phylogenetic analysis of pitch accent. In particular, although the merger of accentual classes appears to be the key to resolve the phylogeny of Japanese pitch-accent systems, to our knowledge, no previous models have taken this process into account. Most previous studies in linguistic phylogeny convert features of languages into binary characters, namely the presence/absence of cognates or grammatical features, and represent the transition among the feature states as a Markov model. Obviously, if we should extract binary features of pitch-accent systems and assume they evolve independently, the model would miss the evolutionary signal of accentual classes.

We will thus develop a novel model, in which pitch-accent systems evolve on tree branches via the merger of accentual classes and the substitution of accent patterns assigned to each class. The model we will use in this study considers two phenomena for statistical inference: geographical (spatial) diffusion and mutation. On the one hand, to represent the diffusion, our model will be based on the general Bayesian framework described in Takahashi and Ihara (2023), which treats the evolutionary dynamics in a network through a discrete-time model. This framework computes the tree prior based on a weighted adjacency matrix, representing the rate of cultural transmission among multiple populations. On the other hand, to represent the mutation of pitch-accent systems, we will model the merger of accentual classes and the replacement of the accent patterns of each class.

2.1. Mesh data with demographic information

We use the Bayesian framework described in Takahashi and Ihara (2023) , which regards the space as a network of n discrete nodes that are connected with each other by weighted edges representing the spatial transmission between them.

To apply this framework to the geography of mainland Japan, we assign lattice sites every 0.25 degrees of longitude and every 0.167 degrees of latitude, which gives a grid network consisting of roughly square cells with sides of approximately 20 km. Note that we exclude from our analysis the Ryukyuan language, whose pitch-accent systems display no clear correspondence with the accentual classes of the mainland dialects, and Hokkaido, which experienced extensive immigration from the mainland toward the end of the 19th century. The grid is composed of 691 cells denoted by $P_1 \cdots P_n$ (i.e. $n = 691$).
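The "approximately 20 km" figure can be checked with a rough calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km and evaluates the cell extent at three sample latitudes chosen here for illustration (they are not taken from the article); the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)
DLON_DEG = 0.25           # lattice spacing in longitude, from the text
DLAT_DEG = 0.167          # lattice spacing in latitude, from the text


def cell_size_km(lat_deg):
    """Approximate (east-west, north-south) extent in km of one grid cell
    centred at latitude `lat_deg`."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = DLAT_DEG * km_per_degree
    # Meridians converge toward the poles, so the east-west extent
    # shrinks with the cosine of the latitude.
    east_west = DLON_DEG * km_per_degree * math.cos(math.radians(lat_deg))
    return east_west, north_south


# Sample latitudes spanning roughly southern Kyushu to northern Honshu.
for lat in (31.0, 35.0, 40.0):
    ew, ns = cell_size_km(lat)
    print(f"lat {lat:4.1f}: {ew:5.1f} km east-west x {ns:5.1f} km north-south")
```

Under these assumptions the north-south side stays near 18.6 km while the east-west side shrinks from about 24 km to about 21 km going north, so the cells are only approximately square.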

As we will see in later subsections, the transmission of pitch-accent systems in space is modeled with a function of the geographical distance and population size. To assign a population estimate of the Heian period to each of the 691 cells, we use the population size of each province (ryoseikoku), the ancient administrative unit established in the Asuka period (592–710), estimated by Hattori (1959) as the source data. Since a province occupied a far wider area than a 20 km × 20 km cell, we calibrate the estimates based on the variation of the present-day population size, surveyed at the level of 1 × 1 km squares in 1995 (e-Stat. Portal Site of Official Statistics of Japan 2016), to improve the resolution. See electronic supplementary material for further details about the calibration of population sizes.

2.2. Data of accent patterns

2.2.1. Locations of taxon dialects and studied word categories

We investigate the phylogenetic relationship of the pitch-accent systems of 15 dialects spoken at 12 out of 691 cells in mainland Japan (Figure 3) based on the accent patterns for the following six word categories: monomoraic nouns (denoted as $C_1$), bimoraic nouns ($C_2$), bimoraic verbs ($C_3$), trimoraic godan verbs ($C_4$), trimoraic ichidan verbs ($C_5$), and trimoraic adjectives ($C_6$). Note that ‘godan’ and ‘ichidan’ refer to conjugation types of verbs. Thus, in our framework, each dialect is characterized by a pitch-accent system that specifies the mappings between accent patterns (i.e. sounds) and accentual classes (i.e. words) for each of the six word categories. Let $l(i)$ denote the number of accentual classes of $C_i$ ($1 \le i \le 6$). Previous studies have identified accentual classes for these word categories, which gives $l(1) = 3$, $l(2) = 5$ (i.e. the five classes of bimoraic nouns mentioned in section 1), $l(3) = 2$, $l(4) = 3$, $l(5) = 2$, and $l(6) = 2$.
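To get a feel for the size of the state space, one can count how many ways $l(i)$ classes can in principle be grouped into merged blocks. This count is the Bell number of $l(i)$, a standard combinatorial fact. The sketch below is an illustration only: it ignores accent patterns, class splits by vowel type, and any restriction the authors' model may place on admissible states, and the names `NUM_CLASSES` and `bell_number` are hypothetical.

```python
# Number of accentual classes l(i) for each word category C_i, from the text.
NUM_CLASSES = {1: 3, 2: 5, 3: 2, 4: 3, 5: 2, 6: 2}


def bell_number(n):
    """Number of ways to partition a set of n items into non-empty blocks,
    computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


for category, n_classes in NUM_CLASSES.items():
    print(f"C_{category}: l = {n_classes}, "
          f"conceivable merger states = {bell_number(n_classes)}")
```

For bimoraic nouns ($l(2) = 5$) this gives 52 conceivable merger states, of which only a handful are attested in the dialects discussed here.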

Locations of the modern dialects analyzed in this study. This figure was created based on the base map in Geospatial Information Authority of Japan (2006).

2.2.2. Assigning accent patterns from literature

We collect from the existing literature the data on the accent patterns of different accentual classes for each word category in each dialect. Although most of the words belonging to the same accentual class have the same accent pattern, the literature indicates that some words do not have the representative accent pattern of the class, probably because their accent patterns have either mutated or been borrowed individually. Ignoring such exceptions, we regard the accent pattern assigned to the majority of a class’s component words as the accent pattern of the class. Also, some sources already give a representative accent pattern for each class, in which case we simply use that accent pattern. The accent patterns are coded using five characters H, L, M, F, and R, which respectively represent high, low, middle, falling, and rising pitches assigned to each mora. For nouns, since some accent patterns are distinguished by the presence/absence of the pitch drop placed right after the word, we code the accent pattern of words as they are pronounced with a monomoraic postpositional particle, such as ‘ga’ or ‘mo’. From these, the merger state of each accent system for each word category is determined. Table 2 summarizes the conventional types and subtypes proposed by dialectologists, the merger state for bimoraic nouns, and the data sources for the 15 dialects. The whole dataset is shown in the appendix (see Supplementary Tables SA1 and SA2).
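The majority rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, ties are not treated specially, and the word list in the usage example is toy data in which the third entry is an invented exception.

```python
from collections import Counter


def representative_pattern(word_patterns):
    """Return the accent pattern carried by the largest number of words
    in one accentual class.

    `word_patterns` maps each word to its coded accent pattern
    (characters H, L, M, F, R, one per mora). Ties are not resolved in
    any principled way here.
    """
    counts = Counter(word_patterns.values())
    pattern, _count = counts.most_common(1)[0]
    return pattern


# Toy data: two class-2 bimoraic nouns with 'ga' in the Tokyo dialect
# (LHL, as in Table 1) plus one invented exceptional word.
class_2_tokyo = {
    "u-ta-ga": "LHL",
    "o-to-ga": "LHL",
    "hypothetical-word-ga": "HLL",  # invented exception, for illustration only
}
print(representative_pattern(class_2_tokyo))  # LHL
```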

2.2.3. Pitch-accent systems recorded in historical documents

Note that three out of the fifteen dialects are ancestral dialects reconstructed from historical documents: Ruiju Myogisho (hereafter Myogisho), Bumoki, and Gonza’s materials. First, Myogisho and Bumoki are historical documents which are considered to reflect the pitch-accent system of Kyoto in the Heian and Muromachi periods, respectively (Kindaichi 1975a). On the other hand, Gonza’s materials, which show the pitch-accent system of the old Kagoshima dialect (Kibe 1997), are bilingual manuscripts written by Gonza, a Japanese castaway who arrived in Russia from Kagoshima in the early 18th century. The geographical locations of both Bumoki and Myogisho are assumed to be identical to that of the modern Kyoto dialect, whereas the geographical location of Gonza’s materials is the same as that of the modern Kagoshima dialect.

2.2.4. Remarks on the dataset of the pitch-accent systems

In Hirosaki and Kagoshima, the unit of the accent is not mora but syllable, that is, each syllable is assigned a pitch marked by H, L, and so on. In order to make the number of accent units consistent among every tree taxon dialect, we coded the accent patterns of words whose mora breaks concur with syllable breaks. Exceptionally, the mora break cannot concur with the syllable break for adjectives because the common ending -i does not constitute a syllable, so the number of syllables is always one less than the number of morae. In order to analyze the accent pattern of three accent units, we coded the accent pattern of the quadrimoraic (trisyllabic) adjectives instead of trimoraic (bisyllabic) adjectives for Hirosaki and Kagoshima. For Hirosaki, we used the accent pattern of yasashii (kind) for the first class and kibishii (strict) for the second class, as they were picked up as typical accent patterns ( Uwano 1990 ). For Kagoshima, we assigned the accent pattern of kanashii (sad) and munashii (futile) for the first class and the accent pattern of kuwashii (detailed), shitashii (close), suzushii (cool), tadashii (correct), and hitoshii (equal) for the second class (Hirayama).

As Ibukijima is a small island off the coast of Kan’onji city, the two taxa Ibukijima and Kan’onji are assigned to the same cell due to the relatively coarse resolution (i.e. around 20 km × 20 km) of the grid. However, the Bayesian framework of Takahashi and Ihara (2023) cannot be applied when multiple taxa share the same network node. As a proxy for the actual location of the island, we assigned the pitch-accent system of the Ibukijima dialect to a cell in Shikoku, which is about 33 km away from the island.

In Kyoto, we did not consider the automatic word lengthening of monomoraic nouns.

In Hirosaki, the merger state of bimoraic nouns is 12 / 3 / 45 for nouns whose second mora has a close vowel and 12 / 345 otherwise. We coded the accent patterns of the former, because it is thought to conserve old characteristics in view of the distinction of accentual classes. Also, words in the Hirosaki dialect, which distinguishes accent patterns by the position of pitch-rise, have different accent patterns when ending and when continuing a sentence; the former were coded in this study.

In Tarui, the pitch-accent system has relatively recently undergone a merger, and the merger state is either 14 / 23 / 5 (called C type) or 14 / 235 (B type) depending on regions and literature. We assigned the accent patterns of the C type.

Accent data on Nagasaki were placed on the Shimabara Peninsula of Nagasaki prefecture instead of Nagasaki city, because one of our data sources (Kibe 2000) mainly discusses the pitch-accent system of Shimabara. Nevertheless, the pitch-accent system of Nagasaki city is basically the same as that of Shimabara (Hirayama 1951).

2.2.5. Terms and symbols

In order to distinguish observed variables from the latent variables introduced later, the accent patterns and merger state observed in the dataset are referred to as the surface accent pattern and the surface merger state, respectively. The surface accent pattern for the $j$th accentual class of the word category $C_i$ is denoted by $D_{ij}$ ($1 \le i \le 6$, $1 \le j \le l(i)$), and we define the vector $D_i = (D_{i1}, \cdots, D_{i\,l(i)})$. Also, the surface merger state of the word category $C_i$ is denoted by $M_i$.

2.3. Latent variables assigned to every cell

To perform a Bayesian Markov Chain Monte Carlo (MCMC) simulation, we develop a generative model that assigns a pitch-accent system to every one of the 691 cells in the grid. In this model, the pitch-accent system of a dialect is specified by the set of accent patterns for all accentual classes of the six word categories. With regard to the $i$th word category $C_i$, every cell has two latent variables, $N_i$ and $b_i$.

First, $N_i$ of a given cell represents its extended merger state for word category $C_i$. Unlike the surface merger state $M_i$, which is represented using slash marks, the extended merger state $N_i$ contains information about which of the classes in a merged block has overridden the accent pattern of the other classes in the block. The extended merger state is represented as a set of strings, such as {'1', '234', '5'}, where each string lists the indexes of the classes belonging to one merged block. Importantly, the first character in each string is the index of the accentual class which overrode the other classes. The extended merger state {'1', '234', '5'} signifies not only that classes 2, 3, and 4 are merged into a single block but also that the accent pattern of class 2 overrode those of classes 3 and 4, whose former accent patterns were discarded. For each merged block, the class which overrode the others (i.e. the class indexed by the first character of the string) is referred to as the leading class. Extended merger states are distinguished both by the way they merge classes and by the identity of the leading classes, so {'1', '234', '5'} and {'1', '324', '5'} are distinct, although they both correspond to the merger state 1 / 234 / 5. On the other hand, {'1', '234', '5'} and {'1', '243', '5'} represent the same extended merger state because the leading class of each merged block is identical. The extended merger state is a latent variable because we do not know which class overrode the other classes in the past.
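The equivalence rules in this paragraph can be made concrete with a minimal sketch. The helper names `canonical` and `surface_merger_state` are hypothetical and introduced only for illustration, and the sketch assumes each class index is a single character, which holds here because no word category has more than five classes.

```python
def canonical(state):
    """Canonical form of an extended merger state (hypothetical helper).

    Each block keeps its leading class first; the remaining classes are
    sorted, because their order carries no information.
    """
    return frozenset(block[0] + "".join(sorted(block[1:])) for block in state)


def surface_merger_state(state):
    """Slash notation, which forgets which class is the leading one."""
    blocks = sorted("".join(sorted(block)) for block in state)
    return " / ".join(blocks)


a = {"1", "234", "5"}
b = {"1", "324", "5"}   # same blocks as a, but class 3 leads
c = {"1", "243", "5"}   # same blocks and same leading classes as a

same_as_c = canonical(a) == canonical(c)
same_as_b = canonical(a) == canonical(b)
print(same_as_c, same_as_b, surface_merger_state(b))
```

Comparing canonical forms separates `a` from `b` while treating `a` and `c` as one state, and all three map to the same slash notation.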

Second, $b_i = (b_{i1}, \cdots, b_{i\,l(i)})$ is a vector representing the latent accent patterns for word category $C_i$, each of whose elements $b_{ij}$ takes one of the accent patterns empirically observed for some accentual class of $C_i$ ($1 \le j \le l(i)$). For example, there are seven distinct accent patterns documented for monomoraic nouns $C_1$ ($l(1) = 3$), namely HL, LH, HM, HH, LL, LM, and MH, constituting the set of values that $b_{11}$, $b_{12}$, and $b_{13}$ can take.

One might find it odd to have different variables for the merger state and the accent patterns, as these two are mutually dependent in the empirical data. Indeed, $b_i$ is a latent variable and may differ from the surface accent patterns $D_i$ used for computing the likelihood in the MCMC algorithm. As we will see later, this model setting enables us to reduce the number of possible values for each parameter and to compute the tree likelihood efficiently. Hence, in the present analysis, we define a pitch-accent system by the set of parameters $\{(N_1, b_1), (N_2, b_2), \cdots, (N_6, b_6)\}$.

Considering a discrete-time model, each of the square cells $P_1, \cdots, P_n$ is assumed to have one pitch-accent system at a given timestep. The evolution of pitch-accent systems is driven by two events that happen at every timestep: transmission (diffusion) and mutation, the details of which are described in the following two subsections.

2.4. Spatial transmission of pitch-accent systems

At the beginning of each timestep, every cell inherits the pitch-accent system from one of the $n$ cells. Note that the whole pitch-accent system $\{(N_1, b_1), (N_2, b_2), \cdots, (N_6, b_6)\}$ is transmitted from the chosen cell. We do not allow a cell to copy some of the variables from one cell and the remaining variables from another, which ensures that the evolutionary history of the pitch-accent system is represented as a phylogenetic tree rather than a phylogenetic network. We will compute the tree prior based on the transmission (diffusion) of the lineages in space.

To represent the spatially structured interaction among human populations, we make the following assumptions. The probability that cell $P_i$ inherits the pitch-accent system from cell $P_j$ is denoted by $a_{ij}$ and is modeled as follows:

where $\pi_j$ represents the population size of $P_j$, and $d_{ij}$ represents the geographic distance between the two cells, measured as great-circle distance. $K_i$ is a normalizing factor given by

The factor $\exp(\cdot)$ in equation (2.1) is a Gaussian interaction kernel, so cells tend to learn the pitch-accent system from nearby populations (see Burridge 2017 for a similar model). Hence, phylogenetic trees tend to score a high prior probability if geographically close taxa form clusters. It should also be noted that cells with a large population size exert a large influence on other cells. The parameter $\sigma$ is the standard deviation of the Gaussian kernel and determines how far the pitch-accent system diffuses among cells; a large value of $\sigma$ allows transmission among distant cells. As we have found that the MCMC algorithm does not converge unless the parameter $\sigma$ is fixed, we assume $\sigma = 70$ km in this study; the result with a different value of $\sigma$ is included in the electronic supplementary material (see subsection 2.9).
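As an illustration of how such transmission probabilities could be computed, the sketch below assumes the kernel takes the form $a_{ij} = \pi_j \exp(-d_{ij}^2 / 2\sigma^2) / K_i$, with $K_i$ the sum of the numerators over $j$. This form is inferred from the verbal description (population weight, Gaussian kernel with standard deviation $\sigma$, normalizing factor) and is an assumption, not a quotation of equations (2.1) and (2.2). The coordinates and population sizes are made up, and the function names are hypothetical.

```python
import math


def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def transmission_matrix(coords, pop, sigma_km=70.0):
    """Row i holds a_ij for all j, normalised by K_i (assumed kernel form)."""
    n = len(coords)
    rows = []
    for i in range(n):
        weights = [
            pop[j] * math.exp(-great_circle_km(coords[i], coords[j]) ** 2
                              / (2 * sigma_km ** 2))
            for j in range(n)
        ]
        k_i = sum(weights)                      # normalising factor K_i
        rows.append([w / k_i for w in weights])
    return rows


# Toy input: three cells with invented coordinates and population sizes.
coords = [(35.0, 135.75), (34.7, 135.5), (35.7, 139.7)]
pop = [1.0, 2.0, 5.0]
A = transmission_matrix(coords, pop)
```

With $\sigma = 70$ km, the first cell in this toy input is far more likely to copy its neighbor about 40 km away than the much larger cell several hundred kilometers away, which is the behavior the paragraph describes.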

Following the method in Takahashi and Ihara (2023), we store the information about where the cells inherited the pitch-accent system over the past $\tau$ timesteps, where $\tau$ represents the maximum possible height of the tree root in units of timesteps. Let $G$ denote a matrix of dimension $\tau \times n$, whose element in the $t$th row and $i$th column is the index of the cell from which the cell $P_i$ inherited the pitch-accent system $t - 1$ timesteps ago. Based on the matrix $G$, we can extract a phylogenetic tree $T$ by tracing the lineages back from the leaf nodes (see Fig. 4). As the prior probability of $G$ can be computed by using equation (2.1), we can obtain a tree prior reflecting the spatial interaction pattern of the cells.
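The lineage tracing described here can be sketched in a few lines. The sketch makes simplifying assumptions that are not in the text: cell indexes are 0-based, row `t` of `G` stores the source cells for the inheritance event `t` timesteps ago, and all taxa are sampled at the present (the historical documents would start their lineages at an earlier row). The function names and the toy matrix are hypothetical.

```python
def trace_lineage(G, cell):
    """Cells occupied by the lineage of `cell`, from the present backwards."""
    path = [cell]
    for row in G:                    # one row per timestep into the past
        path.append(row[path[-1]])   # where the current cell copied from
    return path


def coalescence_time(G, cell_a, cell_b):
    """Timesteps back to the first shared cell, or None if none within len(G)."""
    path_a = trace_lineage(G, cell_a)
    path_b = trace_lineage(G, cell_b)
    for t, (x, y) in enumerate(zip(path_a, path_b)):
        if x == y:
            return t
    return None


# Toy matrix with n = 4 cells and tau = 3 timesteps.
G = [
    [0, 0, 2, 2],
    [1, 1, 1, 3],
    [0, 0, 0, 0],
]
print(trace_lineage(G, 3), coalescence_time(G, 0, 1), coalescence_time(G, 0, 3))
```

Pairwise coalescence times for all taxa are enough to assemble the tree $T$; lineages that meet at time 0 would correspond to the zero-length branches mentioned in the caption of Fig. 4.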

(a) Coalescent process based on matrix G. Circles represent network nodes representing the space (n=5, τ=5 in this example). The lineages starting from the five taxa A, B, C, D, and E are retraced. (b) Resulting phylogenetic tree T. Tree branches are labeled with their lengths in the unit of timestep. For convenience, our model allows a branch with zero length, in order that the output of the coalescent process is always a binary tree. Hence, two taxa C and E, where the former is the descendant of the latter, are regarded as sister taxa.

2.5. Mutation of the pitch-accent system

After all cells inherit a pitch-accent system at each timestep, they may modify the inherited accent via a mutation event. First, we assume that the evolution of $(N_i, b_i)$ is independent of $(N_j, b_j)$ if $i \ne j$. That is to say, mutations of the merger states and accent patterns occur independently among different word categories. We thus describe our model of mutation focusing on a single word category $C_i$ and consider the mutation of the two variables $N_i$ and $b_i$, which also mutate independently of each other.

As for the mutation of the extended merger state $N_i$, we assume that every pair of merged blocks merges with the same probability $q$, resulting in a larger block containing all the accentual classes included in the two blocks. When two blocks merge, the strings representing the two blocks are concatenated in a random order, meaning that the leading class of one block overrides the classes of the other block. For example, if $N_i$ = {'1', '23', '45'}, it may mutate into {'123', '45'}, {'231', '45'}, {'145', '23'}, {'451', '23'}, {'1', '2345'}, or {'1', '4523'}, each with probability $q/2$ (see Fig. 5a for another example). Letting $u_i$ denote the number of merged blocks in $N_i$, a merger is possible for $\binom{u_i}{2} = \frac{1}{2} u_i (u_i - 1)$ pairs, so the total probability that $N_i$ mutates is $\frac{1}{2} u_i (u_i - 1) q$.
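The enumeration in this example can be reproduced with a short sketch. The function name `merger_mutations` and the value of `q` are illustrative assumptions; the sketch simply lists both concatenation orders for every pair of blocks and assigns each outcome probability $q/2$, as described in the text.

```python
from itertools import combinations


def merger_mutations(state, q):
    """Map every possible successor state to its one-step probability.

    Successors are stored as sorted tuples of block strings so that they can
    be used as dictionary keys.
    """
    blocks = sorted(state)
    successors = {}
    for x, y in combinations(blocks, 2):
        rest = [b for b in blocks if b not in (x, y)]
        for merged in (x + y, y + x):        # either leading class may win
            successors[tuple(sorted(rest + [merged]))] = q / 2
    return successors


q = 0.01                                      # illustrative value only
succ = merger_mutations({"1", "23", "45"}, q)
total = sum(succ.values())                    # equals u(u - 1) / 2 * q, u = 3
for state, prob in sorted(succ.items()):
    print(state, prob)
```

For three blocks this yields six successors whose probabilities sum to $3q$, matching $\frac{1}{2} u_i (u_i - 1) q$ with $u_i = 3$.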

(a) Mutation of the extended merger state for the case with three accentual classes. The rectangles show extended merger states and the arrows indicate merger of two accentual classes. (b) Replacement of the latent accent pattern for bimoraic words with five possible accent patterns as an example. The rectangles show accent patterns and the arrows indicate transitions among different accent patterns. (c) Relation between the latent and surface accent patterns in the case of three accentual classes. In this example, from the latent accent patterns shown in the rectangles on the left, the surface accent patterns shown in the rectangles on the right are generated. For panels (a) and (b), self-loops representing the absence of mutation are omitted.

On the other hand, we assume that the latent accent pattern $b_i = (b_{i1}, \cdots, b_{i\,l(i)})$ mutates in an elementwise manner. The latent accent pattern $b_{ij}$ may mutate into any other accent pattern empirically documented for the word category $C_i$, each with equal probability $\alpha$ (see Fig. 5b). This is the discrete-time version of the Mk model (Lewis 2001), which represents the evolution of a trait with $k$ possible states.
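A minimal sketch of the discrete-time Mk transition may help here. It builds the one-timestep matrix (probability $\alpha$ of moving to each other pattern, $1 - (k - 1)\alpha$ of staying) and compares its $t$th power with the closed form that follows from writing the matrix as $(1 - k\alpha) I + \alpha J$, where $J$ is the all-ones matrix. The function names and the numerical values are illustrative assumptions.

```python
def mk_step(k, alpha):
    """One-timestep transition matrix of the discrete-time Mk model."""
    stay = 1.0 - (k - 1) * alpha
    return [[stay if i == j else alpha for j in range(k)] for i in range(k)]


def mk_over_branch(k, alpha, t):
    """Closed form over t timesteps: (P[stay], P[move to one given other state])."""
    decay = (1.0 - k * alpha) ** t
    return (1.0 + (k - 1) * decay) / k, (1.0 - decay) / k


def mat_pow(matrix, t):
    """Plain repeated multiplication, used only to check the closed form."""
    n = len(matrix)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = [[sum(result[i][m] * matrix[m][j] for m in range(n))
                   for j in range(n)] for i in range(n)]
    return result


k, alpha, t = 5, 0.01, 3                 # illustrative values only
P_t = mat_pow(mk_step(k, alpha), t)
stay, move = mk_over_branch(k, alpha, t)
print(P_t[0][0], stay, P_t[0][1], move)
```

The closed form avoids matrix powers when branch lengths are long, which is convenient inside a likelihood computation that is evaluated at every MCMC iteration.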

2.6. Generating the surface accent pattern from the latent accent pattern

Focusing again on one word category $C_i$, we describe how the surface accent pattern $D_i$ is generated from the latent accent pattern $b_i$ and the merger state $N_i$ in each cell. For each merged block, the latent accent pattern of the leading class is observed as the surface accent pattern in every class of the block (see Fig. 5c). For example, consider the bimoraic noun $C_2$ with five accentual classes. If a cell has the extended merger state $N_2$ = {'1', '23', '54'} and the latent accent patterns $b_2$ = (HHM, HLL, HHL, LHL, LLH), the surface accent pattern will be (HHM, HLL, HLL, LLH, LLH).
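The mapping from latent to surface patterns is simple enough to state as code. The function name `surface_pattern` is hypothetical, and the sketch assumes single-character class indexes as in the string notation above; the input is the example from the text.

```python
def surface_pattern(state, latent):
    """Generate the surface patterns D_i from N_i (`state`) and b_i (`latent`).

    latent[j - 1] holds the latent pattern of class j; every class in a block
    shows the latent pattern of the block's leading class.
    """
    surface = [None] * len(latent)
    for block in state:
        leader = int(block[0])
        for cls in block:
            surface[int(cls) - 1] = latent[leader - 1]
    return tuple(surface)


N2 = {"1", "23", "54"}
b2 = ("HHM", "HLL", "HHL", "LHL", "LLH")
D2 = surface_pattern(N2, b2)
print(D2)
```

Note that the latent patterns of the non-leading classes 3 and 4 (HHL and LHL) never reach the surface, which is why they can be treated as unobserved in the likelihood.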

Under these model assumptions, we can properly incorporate two features of the principle of accentual class merger. First, the merged classes mutate simultaneously. For example, considering a word category with three accentual classes, it takes a single mutation event, rather than two, for the transition of surface accent patterns from (LHH, HLL, HLL) to (LHH, LHL, LHL) to occur if the relevant merger state is 1 / 23. Second, as a corollary, we always observe the same accent pattern for merged accentual classes.

2.7. Bayesian inference by MCMC

The probabilistic dependency of the model parameters is depicted in Fig. 6. The observed data with regard to the word category $C_i$ is given by $Y_i = D_i \cap M_i$. Here we define $Y = \{Y_1, \cdots, Y_6\}$, $D = \{D_1, \cdots, D_6\}$, $B = \{b_1, \cdots, b_6\}$, $N = \{N_1, \cdots, N_6\}$, and $M = \{M_1, \cdots, M_6\}$. Note that the symbols $B$, $N$, $M$, $D$, and $Y$ in the following expressions and discussion represent the variables assigned to every tree taxon rather than those in a single pitch-accent system, but the indexes of taxa are omitted for notational simplicity. The joint posterior distribution is given by

Graphical representation of the probabilistic dependencies between parameters concerning the word category $C_i$. In particular, functional dependency is shown by bold arrows, that is, $G$, $N_i$, and $\{N_i, b_i\}$ functionally determine $T$, $M_i$, and $D_i$, respectively. The filled and open circles represent observed and latent parameters, respectively. Here, the symbols $N_i$, $b_i$, $M_i$, and $D_i$ represent variables assigned at taxa (tree leaves).

where $B^{A} = \{b_1^{A}, \cdots, b_6^{A}\}$ denotes the latent accent pattern at the tree root. The latent accent patterns $B$ are integrated out in the above equation. We assume that the prior probability of $B^{A}$ factorizes over word categories, which gives $P(B^{A}) = \prod_{i=1}^{6} P(b_i^{A})$. Note also that the latent accent pattern always concurs with the surface accent pattern at the tree root since all the accentual classes are unmerged. The latent variables $N$ and $B$ assigned at tree taxa depend on $G$ only through the tree $T$ derived from it, so we have $P(N \mid G, q) = P(N \mid T, q)$ and $P(D \mid G, \alpha, B^{A}, N) = P(D \mid T, \alpha, B^{A}, N)$. Hence,

The MCMC algorithm allows us to draw a sample from the joint posterior distribution using expression (2.3). Focusing on the right-hand side, we use uniform distributions as the two priors $P(q)$ and $P(\alpha)$. The prior $P(G)$ can be computed as the product of the values $a_{ij}$ using equations (2.1) and (2.2) (see Takahashi and Ihara 2023). As for the latent accent patterns at the tree root, $b_i^{A} = (b_{i1}^{A}, \cdots, b_{i\,l(i)}^{A})$, we assume that the prior probability $P(b_i^{A})$ is uniform over all $b_i^{A}$ in which every class has a different latent accent pattern. For example, the latent accent patterns of monomoraic nouns $b_{11}$, $b_{12}$, and $b_{13}$ can take seven possible values which are empirically documented, so $b_1^{A} = (b_{11}^{A}, b_{12}^{A}, b_{13}^{A})$ may take $7 \times 6 \times 5 = 210$ possible values, each of which is given the prior probability $1/210$. The factor $P(N_i \mid T, q)$ is computed by Felsenstein’s pruning algorithm (Felsenstein 1973, 1981). The conditional probability $P(M_i \mid N_i)$ is one if and only if the combination of classes constituting the merged blocks is the same in $M_i$ and $N_i$; otherwise, $P(M_i \mid N_i) = 0$. Thus, in running the MCMC algorithm, we only have to explore the values of $N_i$ such that $P(M_i \mid N_i) = 1$ holds.
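Collecting the factors named in this paragraph and in the text around expression (2.3), the posterior plausibly factorizes as follows. This is a reconstruction from the verbal description, offered as an assumption rather than a quotation of the authors' equation; in particular, the grouping of terms and the proportionality sign are choices made here.

```latex
P(G, q, \alpha, B^{A}, N \mid Y) \;\propto\;
P(q)\, P(\alpha)\, P(G)
\prod_{i=1}^{6}
P(b_i^{A})\;
P(N_i \mid T, q)\;
P(M_i \mid N_i)\;
P(D_i \mid T, \alpha, b_i^{A}, N_i)
```

Here $T$ is the tree determined by $G$, $P(M_i \mid N_i)$ acts as a 0/1 indicator restricting the search to compatible extended merger states, and, for monomoraic nouns, $P(b_1^{A}) = 1/210$.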

Finally, the conditional probability $P(D_i \mid T, \alpha, b_i^{A}, N_i)$ on the right-hand side of expression (2.3) can be computed by a modified form of Felsenstein’s pruning algorithm (Felsenstein 1973, 1981). As discussed above, when our MCMC algorithm explores the parameter space of $N_i$, it is guaranteed that the observed surface accent patterns $D_{ij}$ are identical for all $j$ ($1 \le j \le l(i)$) whose classes belong to the same merged block in $N_i$. Hence, focusing on one leaf node (i.e. taxon), the observed accent patterns $D_i$ are generated by the model if and only if $b_{ij} = D_{ij}$ holds for every $j$ such that the $j$th class is a leading class in $N_i$ at the focal leaf. Focusing on the $j$th class of the word category $C_i$, we define a function $L_{ij}(p, v)$ for a possible latent accent pattern $p$ and a tree node $v$. $L_{ij}(p, v)$ represents the probability that $b_{ij} = D_{ij}$ holds at all the leaves that are $v$ itself or descendants of $v$ and at which the $j$th class is a leading class in $N_i$, given that the node $v$ has the latent accent pattern $p$. We have

To compute this, we recursively compute $L_{ij}(p, v)$ from the leaves to the root. If $v$ is a leaf, we initialize $L_{ij}(p, v)$ by

If $v$ is an internal node or the root, whose child nodes are denoted by $s_1$ and $s_2$, we can compute $L_{ij}(p, v)$ by

This is practically the pruning algorithm with missing state values at some of the taxa.
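The recursion can be sketched as the standard pruning algorithm in which a leaf carries no information whenever the focal class is not a leading class there. The sketch below follows the verbal definition of $L_{ij}(p, v)$ and is an assumption about the form of the omitted equations, not a transcription of them. The tree encoding (tuples tagged `"leaf"` or `"node"`), the function names, and the numerical values are hypothetical, and branch transition probabilities use the closed form of the discrete-time Mk model.

```python
def branch_prob(k, alpha, t):
    """Mk transition over t timesteps: (P[stay], P[move to one given other state])."""
    decay = (1.0 - k * alpha) ** t
    return (1.0 + (k - 1) * decay) / k, (1.0 - decay) / k


def partial_likelihood(node, patterns, alpha):
    """Return {p: L(p, node)} for one accentual class of one word category.

    A leaf is ("leaf", observed); observed is None when the class is not a
    leading class at that taxon, which is treated as a missing state.
    An internal node is ("node", (child1, t1), (child2, t2)).
    """
    if node[0] == "leaf":
        observed = node[1]
        return {p: 1.0 if observed is None or observed == p else 0.0
                for p in patterns}
    k = len(patterns)
    result = {p: 1.0 for p in patterns}
    for child, t in node[1:]:
        child_l = partial_likelihood(child, patterns, alpha)
        stay, move = branch_prob(k, alpha, t)
        total = sum(child_l.values())
        for p in patterns:
            # Sum over child states p2 of P(p -> p2) * L(p2, child).
            result[p] *= stay * child_l[p] + move * (total - child_l[p])
    return result


patterns = ["HL", "LH", "HH"]
tree = ("node", (("leaf", "HL"), 4), (("leaf", None), 4))
L = partial_likelihood(tree, patterns, alpha=0.01)
stay4, move4 = branch_prob(3, 0.01, 4)
print(L)
```

In this toy tree the second leaf is uninformative, so its factor is 1 for every root pattern and the result depends only on the first leaf. Evaluating the returned dictionary at the root pattern $b_{ij}^{A}$ would give the contribution of class $j$ to the likelihood.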

MCMC is conducted with three independent chains. For each chain, we run MCMC for $4 \times 10^{6}$ iterations, and the first $10^{6}$ iterations are discarded as burn-in. We draw a sample at intervals of $10^{3}$ iterations, which gives a sample size of 9000 in total. After sampling from the joint posterior distribution (2.3), we readily obtain the posterior distributions of the model parameters and the latent accent pattern at the root (i.e. $P(q \mid Y)$, $P(\alpha \mid Y)$, and $P(B^{A} \mid Y)$). The posterior distribution of the phylogenetic tree $P(T \mid Y)$ can be obtained by tracing the lineages based on the sampled values of $G$ (Fig. 4). Moreover, as the matrix $G$ contains information as to where the lineages existed at each timestep in the past, we can obtain the posterior distribution of the geographical location of the tree root and of the most recent common ancestor (MRCA) of a subset of the tree taxa. We thus infer the location of the root and the MRCA of the three types of pitch-accent systems: Tokyo type, Keihan type, and N-kei type.
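The stated sample size follows from the run settings, assuming that the total of 9000 pools the post-burn-in samples of the three chains:

```latex
3 \times \frac{4 \times 10^{6} - 10^{6}}{10^{3}} = 3 \times 3000 = 9000
```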

To assess convergence, we conducted two additional runs of MCMC with the same model configuration with the exception of the initial value of G (i.e. different starting tree), and obtained confirmatory results (data not shown).

2.8. Time calibration

We establish the correspondence between the tree length, originally given in units of timesteps, and the real unit of time (year), taking into consideration the time at which the ancestral pitch-accent systems existed. Considering the survey dates or publication dates of the data sources for the accent patterns of modern dialects, we set the observation year of the tree taxa (i.e. terminal nodes) to be 1950 AD. The dates of Ruiju Myogisho, Bumoki, and Gonza’s materials (old Kagoshima) are respectively assumed to be 850, 450, and 225 years before the present, which means 1100, 1500, and 1725 AD in calendar years. The prior probability of the date of the common ancestor (i.e. root node) in years before present follows the uniform distribution $U(850, 1500)$, ranging from 450 to 1100 AD. The upper limit (oldest limit) is set to 450 AD considering the divergence time of the Ryukyuan and mainland Japanese languages, which is debatable but hypothesized to be around the third to seventh century in recent research (Pellard 2016). In Ryukyuan dialects, accentual classes that are not seen in mainland dialects are present, suggesting that the common ancestor of the mainland pitch-accent systems is more recent than the divergence of Ryukyuan and mainland Japanese.

Also, to relate the timestep of our discrete-time model to a real unit of time, we assume that one timestep represents 25 years, roughly one human generation. We also ran MCMC with the assumption that one timestep corresponds to 10 years, but the resulting phylogeny stayed mostly unchanged (see subsection 2.9. and online supplementary material ).

2.9. Sensitivity analysis

We have data on the six different word categories $C_1, \cdots, C_6$, which differ in the number of morae, part of speech, and conjugation. While our assumption is that the evolution of accent patterns occurs independently among the word categories, this may not be the case in reality. For example, our dataset shows that the accent patterns for the first accentual classes of bimoraic nouns and verbs are the same for most of the modern dialects, suggesting the possibility that a single mutation may have affected accent patterns in more than one word category. Since our model would require double mutations for this sort of correlated change, it may bias the inferred phylogenetic tree. Hence, we perform an additional Bayesian inference using a subset of the data, namely those on nouns (i.e. $C_1$ and $C_2$).

We also ran MCMC while varying the value of $\sigma$, the prior on the time to the most recent common ancestor (MRCA), the number of years to which one timestep of the model corresponds, and the influence of the population size on the transmission of pitch-accent systems.

The sensitivity analysis showed that the resulting phylogenetic trees were mostly similar, particularly in terms of the tree topology, across different assumptions. We thus only present the main result in this section, and the results of the sensitivity analysis (subsection 2.9.) are included in the electronic supplementary material .

3.1. Phylogenetic reconstruction

Fig. 7 shows the maximum clade credibility (MCC) tree derived from the sample of posterior trees. The MCC trees in this article were generated with the TreeAnnotator application distributed with BEAST v2.7.6 (Bouckaert et al., 2014) and visualized with FigTree v1.4.4. Note again that the date of the tree taxa is 1950. The figure indicates that the phylogenetic tree consists of three clades, which concurs with the conventional classification of pitch-accent systems: Tokyo type, Keihan type, and N-kei type. It is thus suggested that the Tokyo-type accents, distributed on both the west and east sides of the archipelago, share a common ancestor. Focusing on the phylogenetic relationship of the three clades, the Keihan type and N-kei type are most plausibly sister groups, although the support for this clade is far from decisive in view of its posterior probability. The 95% credibility interval of the date of the common ancestor of all the modern dialects ranges from AD 450 to 825, corresponding to the mid-Kofun to the early Heian period. Similarly, the 95% credibility intervals of the time to the most recent common ancestor of the Tokyo type, Keihan type (including Ruiju Myogisho), and N-kei type are AD 1000–1550, 825–1100, and 1050–1700, respectively. However, it must be noted that the posterior distribution of the date of the common ancestor is heavily dependent on the prior distribution (see electronic supplementary material).

Maximum clade credibility (MCC) tree generated from the posterior sample of phylogenetic trees. Horizontal axis represents the time before present in year, but note that the modern pitch-accent systems are assumed to be dated as 1950. The branches are labeled with posterior probabilities representing the proportion of posterior trees supporting each clade. The bars covering the root and internal nodes represent the 95% credibility interval of divergence time. Taxa and clades are colored according to the conventional classification of pitch-accent systems (see Table 2): Keihan type (Bumoki, Tarui, Ibukijima, Kyoto, Kan’onji and Myogisho), N-kei type (Kagoshima, Gonza’s materials, Nagasaki and Miyakonojo), and Tokyo type (Hirosaki, Oita, Tokyo, Hiroshima and Totsukawa).

We first focus on the phylogenetic relationship among the Tokyo-type pitch-accent systems, particularly on Churin (Tokyo and Hiroshima) and Gairin (Hirosaki and Oita) subtypes, which respectively display the merger states 1 / 23 / 45 and 12 / 3 / 45 with respect to the bimoraic nouns. As the clade of Tokyo and Oita is strongly supported in the MCC tree, our result suggests that both Gairin and Churin subtypes are either paraphyletic or polyphyletic groups which do not share an immediate common ancestor. It is also suggested that there existed a pitch-accent system with the merger state of 1 / 2 / 3 / 45 ⁠ .

Here we focus on the phylogenetic relationship among the Keihan-type accents. The MCC tree ( Fig. 7 ) suggests that the pitch-accent system of Ibukijima, which has the merger state 1 / 2 / 3 / 4 / 5 ⁠ , is the sister taxon of the Kyoto accent, indicating that the common ancestor of the Ibukijima and the modern Kyoto accents had the merger state 1 / 2 / 3 / 4 / 5 ⁠ . Therefore, although the pitch-accent system of Muromachi-period Kyoto, recorded in Bumoki, and the modern Kyoto dialect have the merger state 1 / 23 / 4 / 5 in common, the MCC tree indicates that the merger of classes 2 and 3 happened independently on the lineages of these two taxa.

Considering the phylogenetic relationship among Tokyo, Keihan, and N-kei-types, the Keihan + N-kei clade appeared in 66.6% of the sampled trees ( Fig. 7 ). In contrast, the Tokyo + N-kei clade and Tokyo + Keihan clade were supported by 21.5% and 8.3% of the sampled trees respectively, although these clades do not appear in the MCC tree.

3.2. Homelands of common ancestors

Fig. 8 plots the location of the most recent common ancestor (MRCA) of all fifteen taxa (Fig. 8a), the Keihan type (including Ruiju Myogisho) (Fig. 8b), the Tokyo type (Fig. 8c), and the N-kei type (Fig. 8d) in each of the 9000 trees sampled by MCMC. Hence, the values displayed in this figure are proportional to the posterior probability that the MRCA is located in each cell. Not surprisingly, the MRCA tends to appear in cells with a large population size because equation (2.1) indicates that the pitch-accent system is more likely to be inherited from such cells. The figure suggests that the common ancestor of all the taxa was most likely located around contemporary Kyoto, Osaka, Nara, or Kobe, although other parts of the Kinki region and the eastern part of the Chugoku and Shikoku regions are also possible locations of the linguistic homeland (Fig. 8a). The MRCA of the Keihan-type accent is also likely to have been located in these regions (Fig. 8b). The highest posterior probability is scored by Shijo, Kyoto, probably because this cell has one of the largest population sizes in the Kinki area and because the accent pattern of Ruiju Myogisho, which is closest to the root, was assigned to this cell. Indeed, this area was a political and cultural center from the Heian period. As for the Tokyo type, the locations of the MRCA sampled by MCMC are spread relatively widely, but we can observe the highest peak around the Kinki region and the second highest peak around Tokyo (Fig. 8c). On the other hand, the Tohoku and Kyushu regions are unlikely to be the homeland of the Tokyo-type accent, although the Tokyo type is currently observed in part of these regions. Finally, the MRCA of the N-kei type was most likely located in the northern or central part of Kyushu (Fig. 8d).

Geographical distribution of the locations of the MRCA for every sampled phylogenetic tree. The figure shows the number of sampled trees whose MRCA occupies each cell (logarithmic scale). (a) MRCA of all the taxon dialects. (b) MRCA of the dialects with Keihan-type accent. (c) MRCA of the dialects with Tokyo-type accent. (d) MRCA of the dialects with N-kei type accent.

3.3. Accent patterns of the common ancestor

Table 3 shows the posterior probability of the accent patterns at the tree root (i.e. the common ancestor of all the modern dialects) computed from the sample from the distribution $P(B^{A} \mid Y)$. Since the surface accent pattern always concurs with the latent accent pattern at the root, $P(B^{A} \mid Y)$ is regarded as the posterior probability of the surface accent pattern. Not surprisingly, the table shows that, for all accentual classes, the accent pattern of Ruiju Myogisho scores the highest posterior probability as the accent pattern at the root.

Posterior probability of accent patterns for every accentual class at the tree root. Three accent patterns which scored the three highest posterior probabilities are shown for each class. Numbers represent posterior probabilities. For comparison, the accent patterns of Ruiju Myogisho ( Kindaichi 1975a ) are shown.

| Word category | Class | Rank 1 | Rank 2 | Rank 3 | Myogisho |
|---|---|---|---|---|---|
| Monomoraic noun | 1 | HH (0.69) | MH (0.13) | LM (0.07) | HH |
| | 2 | HL (0.83) | LL (0.05) | LM (0.05) | HL |
| | 3 | LH (0.86) | HL (0.06) | LL (0.02) | LH |
| Bimoraic noun | 1 | HHH (0.61) | MHH (0.08) | LLL (0.07) | HHH |
| | 2 | HLH (0.54) | LLM (0.10) | HLL (0.10) | HLH |
| | 3 | LLH (0.72) | LHL (0.09) | HHM (0.04) | LLH |
| | 4 | LHH (0.60) | HLL (0.14) | HHL (0.04) | LHH |
| | 5 | LHL (0.55) | HLL (0.17) | HHL (0.05) | LHL |
| Bimoraic verb | 1 | HH (0.62) | HL (0.15) | LH (0.09) | HH |
| | 2 | LH (0.73) | HL (0.21) | MH (0.02) | LH |
| Trimoraic godan verb | 1 | HHH (0.59) | LHL (0.08) | MHH (0.08) | HHH |
| | 2 | LLH (0.74) | LHL (0.12) | HLL (0.03) | LLH |
| | 3 | LHH (0.61) | LHL (0.18) | MHH (0.04) | LHH |
| Trimoraic ichidan verb | 1 | HHH (0.53) | LHH (0.12) | LHL (0.09) | HHH |
| | 2 | LLH (0.70) | LHL (0.20) | HLL (0.02) | LLH |
| Trimoraic adjective | 1 | HHL (0.64) | LHH (0.09) | LHL (0.07) | HHL |
| | 2 | LLH (0.71) | LHL (0.17) | HLL (0.03) | LLH |
Word CategoryClassRank of posterior probabilityMyogisho
123
Monomoraic noun1HH (0.69)MH (0.13)LM (0.07)HH
2HL (0.83)LL (0.05)LM (0.05)HL
3LH (0.86)HL (0.06)LL (0.02)LH
Bimoraic noun1HHH (0.61)MHH (0.08)LLL (0.07)HHH
2HLH (0.54)LLM (0.10)HLL (0.10)HLH
3LLH (0.72)LHL (0.09)HHM (0.04)LLH
4LHH (0.60)HLL (0.14)HHL (0.04)LHH
5LHL (0.55)HLL (0.17)HHL (0.05)LHL
Bimoraic verb1HH (0.62)HL (0.15)LH (0.09)HH
2LH (0.73)HL (0.21)MH (0.02)LH
Trimoraic godan verb1HHH (0.59)LHL (0.08)MHH (0.08)HHH
2LLH (0.74)LHL (0.12)HLL (0.03)LLH
3LHH (0.61)LHL (0.18)MHH (0.04)LHH
Trimoraic ichidan verb1HHH (0.53)LHH (0.12)LHL (0.09)HHH
2LLH (0.70)LHL (0.20)HLL (0.02)LLH
Trimoraic adjective1HHL (0.64)LHH (0.09)LHL (0.07)HHL
2LLH (0.71)LHL (0.17)HLL (0.03)LLH

3.4 Inference of parameters regarding mutation

We also obtained the posterior distributions of the two model parameters $q$ and $\alpha$, which represent the rates of accentual class merger and of substitution between accent patterns, respectively (Fig. 9). For $q$, the 95% credibility interval ranges from $2.25 \times 10^{-4}$ to $4.86 \times 10^{-4}$ per annum. Its posterior median is $3.37 \times 10^{-4}$, at which a given pair of unmerged blocks is expected to merge once every 2967 years. Since a pitch-accent system with the merger state 1/2/3/4/5 experiences a merger event with probability $\binom{5}{2} q = 10q$ per annum, the posterior median implies that the expected time to the first merger event is 296.7 years. At this pace, the merger state 1/2/3/4/5 stays unchanged for over 1000 years with probability 0.034. Intuitively, this small value is in line with the fact that a modern pitch-accent system with 1/2/3/4/5 is found only in the dialect of a small island, Ibukijima. On the other hand, the 95% credibility interval of $\alpha$ ranges from $6.97 \times 10^{-5}$ to $1.39 \times 10^{-4}$ per annum, with a posterior median of $9.89 \times 10^{-5}$. For example, as monomoraic nouns have seven possible accent patterns, the probability per annum with which the latent accent pattern of an accentual class changes is $6\alpha$. Based on the posterior median of $\alpha$, the latent accent pattern of a given accentual class of monomoraic nouns changes once every 1685 years.
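The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that waiting times are exponentially distributed, as in a continuous-time Markov process; this is a reading of the text and not necessarily the exact computation used for the paper. The variable names are illustrative.

```python
import math

q_median = 3.37e-4      # posterior median of q (per annum), from the text
alpha_median = 9.89e-5  # posterior median of alpha (per annum), from the text

# Expected waiting time for one specific pair of unmerged blocks to merge.
years_per_pair_merger = 1 / q_median

# With five unmerged classes there are C(5, 2) = 10 pairs that could merge.
n_pairs = math.comb(5, 2)
first_merger_rate = n_pairs * q_median
expected_years_to_first_merger = 1 / first_merger_rate

# Probability that no merger happens within 1000 years
# (assumes an exponential waiting time).
p_unmerged_1000_years = math.exp(-first_merger_rate * 1000)

# Seven accent patterns for monomoraic nouns: six possible targets of change.
n_patterns = 7
years_per_pattern_change = 1 / ((n_patterns - 1) * alpha_median)
```

Under these assumptions the four quantities evaluate to roughly 2967 years, 296.7 years, 0.034, and 1685 years, in agreement with the values reported in the text.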

Posterior distributions of mutation rates per annum. (a) Posterior distribution of q, the rate at which accentual classes merge. (b) Posterior distribution of α, the rate at which the latent accent pattern is replaced with another specific accent pattern.


In this paper, we developed a mutation model representing the transition of pitch-accent systems driven by accentual class merger and integrated it into the Bayesian framework for phylogenetic reconstruction with spatial diffusion (Takahashi and Ihara 2023). On the basis of documented accent patterns in multiple modern dialects, we inferred the phylogenetic tree of their pitch-accent systems. The resulting phylogenetic tree (Fig. 5) supports the clades of the Tokyo type, the Keihan type, and the N-kei type (in Kyushu), and this holds under all the conditions of the sensitivity analysis (see electronic supplementary material). Although we incorporated the diffusion of pitch-accent systems into the model to generate the tree prior, the resulting tree still supports the monophyly of the Tokyo-type accents, which are distributed far apart in the west and east.

The combination of the resulting phylogenetic tree (Fig. 7) and the posterior distributions of the geographical location of the MRCAs (Fig. 8) suggests the following scenario for the evolutionary history of pitch-accent systems. First, the common ancestor of the modern pitch-accent systems dates back to the mid-Kofun to early Heian period and was located in the contemporary Kinki region or its perimeter. The pitch-accent system then split into three branches, which are ancestral to the modern Keihan type, Tokyo type, and N-kei type, respectively. The Keihan-type branch stayed around the Kinki region until the Heian period and subsequently split into the branches leading to the historical documents and the modern dialects. On the other hand, the lineage of the N-kei type of the Kyushu region moved from Kinki to northern or central Kyushu and subsequently split into the lineages leading to individual modern dialects between AD 1050 and 1700. The Tokyo-type branch most plausibly stayed in Kinki and started splitting around the mid-Heian to late Muromachi period, diffusing both eastward and westward from the center of the archipelago. Since Tokyo and Oita diverged relatively recently in our result, the diffusion of the lineages to Tokyo and Oita is expected to have occurred recently. The second most plausible scenario is that, after splitting from the common ancestor of the mainland pitch-accent systems, the Tokyo-type branch moved from Kinki to Tokyo, after which it split and diffused to the north (Hirosaki) and to the west (Hiroshima, Oita). Again, the diffusion from Tokyo to Oita is suggested to have happened recently, although the cause of this long-distance jump is unknown. The latter scenario implies that the Tokyo-type accent in the west (Hiroshima, Oita) has made something like a round trip (eastward and then westward) since the common ancestor of the mainland dialects. However, the sensitivity analyses showed that the divergence times depend heavily on the prior of the root age (see electronic supplementary material), so the discussion above rests on the assumption that the MRCA of the mainland pitch-accent systems does not date back before AD 450.

Our results sharply contradict the conventional hypotheses proposed by linguists. First, although the Gairin type, a subtype of the Tokyo-type accent seen in Oita and Hirosaki, was posited by Kindaichi to be phylogenetically distant from the Churin-type accents (Fig. 2a), Fig. 7 suggests that the Tokyo-type accents form a monophyletic group. Thus, our results indicate that the common ancestor of the Tokyo-type accents had the merger state 1/2/3/45 for bimoraic nouns, which, to the best of our knowledge, has not been observed in any modern dialect. For this reason, our result does not support the prevailing hypothesis that the Churin-type accents (Tokyo and Hiroshima) derived from the pitch-accent system recorded in Bumoki (Fig. 2a) (Kindaichi 1942, 1975a). One hypothesis that is partially congruent with our result was proposed by Tokugawa (1962), who posits that the Gairin and Churin types have a common ancestor with the merger state 1/2/3/45, although, unlike us, he argues against the monophyly of the Tokyo-type accents (Fig. 2b).

Besides the phylogeny of the pitch-accent systems, we compare our result with the evolutionary histories of Japonic languages and dialects that have been inferred from lexical features. First, Igarashi's (2021) analysis suggested a phylogenetic tree of Japonic languages in which two geographically continuous groups of dialects, the ‘Macro-Eastern Japanese branch’ and the ‘Southern Japanese branch’, each form a cluster. While the former branch extends over the eastern part of the Japanese mainland, including our taxon dialects Hirosaki, Tokyo, and Tarui, the latter consists of Ryukyuan and the dialects of Kyushu, including Nagasaki, Oita, Miyakonojo, and Kagoshima. However, our results do not support the monophyly of either group, because the Tokyo-type accent, whose geographical distribution overlaps with both of the branches proposed by Igarashi, forms a cluster. Our results also contradict the phylogenetic tree of Lee and Hasegawa (2011), inferred through Bayesian phylogenetic analysis. In their result, mainland dialects tend to form geographically continuous clades (east-west division), and the split between the Tokyo type and the Kyoto type (center-periphery division) was not observed. In view of the discrepancies between our results and previous studies, the pitch-accent systems and the lexicons of mainland dialects may have different histories.

The phylogenetic relationship within the clade of the Keihan-type accent is also quite different from the conventional hypotheses, in that the pitch-accent system of the Ibukijima dialect is suggested to share its immediate ancestor with that of the Kyoto dialect. Since Ibukijima is known for a complex pitch-accent system in which all the accentual classes of bimoraic nouns are unmerged (i.e. 1/2/3/4/5), our result indicates that the ancestor of the Kyoto dialect had the merger state 1/2/3/4/5 until recently. This result may seem inconsistent with the fact that Kyoto had the pitch-accent system 1/23/4/5 in the late Muromachi period (Kindaichi 1975a). However, as dialects can diffuse in space, it is not impossible that the ancestor of the pitch-accent system of the modern Kyoto dialect was located elsewhere in the Muromachi period. Our results suggest that the pitch-accent system of modern Kyoto is derived not from that of Bumoki but from an undocumented pitch-accent system with the merger state 1/2/3/4/5 that survived from the Muromachi period. Focusing on the Kyoto region, our result implies that the two lineages leading to the modern Kyoto dialect and to Bumoki independently underwent the merger of classes 2 and 3.

As for the methodology used in this study, we developed a mutation model that represents accentual class merger and that can be integrated into the framework of phylogenetic analysis with practical algorithmic efficiency. The novelty of this method lies in its explicit representation of merger, in contrast to the binary-feature models often employed in Bayesian phylogenetics. The method is somewhat difficult to interpret in that we assume a latent accent pattern whose mutation follows a Markov process. However, this model setting enables the efficient computation of the tree likelihood through Felsenstein’s tree-pruning algorithm (Felsenstein 1973, 1981) by reducing the number of possible states of each variable. On the technical side, although the pruning algorithm requires computing powers of the matrix of mutation rates, our model setting reduces the computational complexity by assuming that the variables $N_i$ and $b_{ij}$ follow independent Markov processes. Since there are only 196 possible extended merger states for the five accentual classes of bimoraic nouns, the number of possible values of $N_i$ is relatively small, which significantly reduces the computation time for matrix multiplication. If we used a model in which the surface accent patterns $D_i$ followed a Markov process, there would be thousands or tens of thousands of possible states, rendering the algorithm impractically slow.
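To make the role of the pruning algorithm concrete, the following minimal sketch computes the likelihood of a single character on a toy tree under a k-state Mk model, using the closed-form Mk transition probabilities. It is a simplified illustration, not the model of this study: it handles one character only, omits the merger process, and assumes a uniform prior at the root. The tree encoding (nested dictionaries), the function names, and the toy branch lengths are all hypothetical.

```python
import math


def mk_transition(k, alpha, t):
    """Mk model: matrix of P(i -> j after time t) for k states, where
    alpha is the rate of change to each other specific state."""
    decay = math.exp(-k * alpha * t)
    same = (1 + (k - 1) * decay) / k
    diff = (1 - decay) / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def partial_likelihood(node, k, alpha):
    """Return L[s] = P(tip states below node | node is in state s)."""
    if "state" in node:  # tip with an observed state
        return [1.0 if s == node["state"] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, branch_length in node["children"]:
        P = mk_transition(k, alpha, branch_length)
        child_L = partial_likelihood(child, k, alpha)
        for s in range(k):
            result[s] *= sum(P[s][c] * child_L[c] for c in range(k))
    return result


def tree_likelihood(root, k, alpha):
    """Sum over root states, assuming a uniform root prior (an assumption)."""
    return sum(partial_likelihood(root, k, alpha)) / k


# Toy tree with three tips; branch lengths in years are made up.
toy_tree = {"children": [
    ({"children": [({"state": 0}, 300.0), ({"state": 0}, 300.0)]}, 400.0),
    ({"state": 2}, 700.0),
]}
likelihood = tree_likelihood(toy_tree, k=7, alpha=9.89e-5)
```

Each internal node costs on the order of k squared operations per child, which is why keeping the number of states small matters for the running time.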

Our model may be applied to linguistic features other than Japanese pitch accent, or to cultural traits in general, to infer a phylogenetic tree from merger phenomena. Merger is not limited to pitch accent: it commonly occurs in the sound system of any language when two phonemes lose their distinction and one sound is replaced by another existing sound. The manual comparative method has traditionally relied on mergers in sound systems, but few statistical models have been built to treat this class of data in phylogenetic analysis. For instance, starting from the vowel system of Proto-Japonic as the common ancestor, the pattern of vowel merger differs between the lineages leading to Old Japanese and to Proto-Ryukyuan, which has been used to argue for the phylogeny and classification of Japonic languages (Igarashi 2021). Although in reality the evolution of sound systems is driven not only by mergers but also by splits, our model can still describe the nature of language evolution, which tends to follow simplifying rather than complicating processes. If relevant data are available, our model may pave the way to statistically inferring linguistic phylogeny from a dataset describing the sound systems of languages.

We now consider the limitations of this study with respect to the dataset. We collected the accent patterns from multiple research sources, owing to the absence of publicly available databases that exhaustively record the accent patterns of Japanese dialects. Conflicts in the recorded accent patterns are thus inevitable, because not every author follows the same criteria in judging the accent patterns recorded in their fieldwork. It is possible that our result was biased by the data sources that we selected. We hope that a database of Japanese pitch-accent systems, recording their features on the basis of consistent criteria, will become available in the future. Moreover, we omitted trimoraic nouns from our analysis, both for computational tractability and because of words with exceptional accent patterns. As the number of possible extended merger states soars with the number of accentual classes, heavily slowing down the computation of the likelihood, it was difficult to include trimoraic nouns, which have six (or seven) classes. In addition, a non-negligible number of trimoraic nouns within an accentual class have different accent patterns, which made it difficult to assign a representative accent pattern to each class. Given that a previous study reconstructed a phylogenetic tree based on trimoraic nouns (Hirako 2017), our analysis may have missed the phylogenetic signal that trimoraic nouns offer. Nevertheless, the resulting phylogenetic tree is unlikely to change drastically even if trimoraic nouns are included, because the accent patterns of trimoraic nouns are not independent of those of bimoraic nouns in many dialects. If two dialects have similar accent patterns for the classes of bimoraic nouns, they tend to have similar accent patterns for trimoraic nouns as well.
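The growth in the number of states can be illustrated with a short enumeration. The sketch below assumes that an extended merger state is a partition of the accentual classes into merged blocks together with one marked class per block (for example, the class whose accent pattern the block retains). This definition is an assumption made here for illustration; it is adopted because it reproduces the count of 196 states for five classes, and the formal definition in the methods section takes precedence. The function names are hypothetical.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of a list of items as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def count_marked_partitions(n_classes):
    """Count partitions of n classes with one marked class per block
    (assumed reading of 'extended merger state')."""
    classes = list(range(1, n_classes + 1))
    return sum(prod(len(block) for block in partition)
               for partition in set_partitions(classes))


counts = {n: count_marked_partitions(n) for n in (5, 6, 7)}
# counts == {5: 196, 6: 1057, 7: 6322}
```

Under this reading, moving from five to six or seven classes multiplies the state space by roughly 5 and 32, respectively, and the cost of each matrix multiplication grows faster still.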

In a different vein, we did not include the Ryukyuan dialects, because their accent patterns do not completely correspond to the accentual classes of Japanese (Matsumori 1998), as the common ancestor of Japanese and Ryukyuan dates back earlier than the Heian period. It is known that Proto-Ryukyuan had at least three distinct accent patterns, and that classes 4 and 5 of bimoraic nouns are split into subclasses 4a and 4b, and 5a and 5b, respectively, while 4a and 5a, and 4b and 5b, are merged. Accentless regions, epitomized by Fukushima and Miyazaki, were also excluded from our study because it is not certain whether these pitch-accent systems, which do not assign a distinct accent pattern to each word, were formed through accentual class merger. This limitation is inevitable, since our method is based on the assumption that all tree taxa are descendants of a pitch-accent system with the distinct accentual classes seen in Ruiju Myogisho. Thus, elucidating the phylogenetic relationships of Ryukyuan dialects and accentless regions would require a different model. Nevertheless, a phylogenetic analysis including Ryukyuan languages could potentially be performed by setting the merger state of the tree root to 1/2/3/4a/4b/5a/5b, which is a promising extension of our study. However, challenges concerning data curation and increasing computation time are expected.

Other limitations concern the assumptions regarding the mutation of accent patterns. First, we employed the Mk model (Lewis 2001), in which every accent pattern may mutate into every other accent pattern with the same probability, which may be an oversimplification. This assumption may have affected the inference of the accent patterns at the tree root and may also have led to an overestimate of the divergence time of dialects whose accent patterns can easily mutate into each other. A possible solution would be to assign different mutation rates to different pairs of accent patterns, but our relatively small dataset (i.e. accent patterns of as few as seventeen classes) seems insufficient for inferring many different mutation-rate parameters. One possible direction for future research is to pre-classify accent patterns into a few groups, so that mutation within a group is more likely than mutation between groups. In this way, the variation in mutation rates could be captured by two model parameters representing the replacement rates within and between groups of accent patterns. However, judging which accent patterns are likely to replace one another would require experimental work or expertise in phonology.
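The two-parameter extension suggested above can be sketched as a rate matrix in which the rate between two patterns depends only on whether they belong to the same group. The grouping, the rate values, and the function name below are hypothetical placeholders chosen for illustration; no such grouping has been proposed or estimated in this study.

```python
def two_rate_matrix(groups, rate_within, rate_between):
    """Build a rate matrix Q for accent patterns 0..k-1, where groups[i]
    is the (hypothetical) group label of pattern i."""
    k = len(groups)
    Q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                same_group = groups[i] == groups[j]
                Q[i][j] = rate_within if same_group else rate_between
        # Diagonal entry makes each row sum to zero, as required for a
        # continuous-time Markov process.
        Q[i][i] = -sum(Q[i])
    return Q


# Seven patterns split into two made-up groups; rates are placeholders.
Q_two_rate = two_rate_matrix([0, 0, 0, 1, 1, 1, 1],
                             rate_within=2e-4, rate_between=5e-5)

# Setting both rates equal recovers the Mk model, whose total rate of
# leaving a state is (k - 1) * alpha = 6 * alpha for k = 7.
Q_mk = two_rate_matrix([0] * 7, rate_within=9.89e-5, rate_between=9.89e-5)
```

The Mk model is thus the special case with a single group, which makes the extension straightforward to compare against the present analysis.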

The lack of phonemic analysis is also a limitation of our merger model. In general, every dialect has a small number of features that distinguish accent patterns, such as the position of the pitch drop in the Tokyo dialect or the two tonal registers seen in the 2-kei-type accents of Kyushu. The evolution of such distinguishing features is often related to non-tonal contrasts in other parts of the sound system, which are not modeled in this study. The loss of such a distinguishing feature results in a large-scale merger event that affects the pitch accent of multiple word categories, but our model assumes that the accent of each word category (i.e. part of speech, number of morae, conjugation) evolves independently. For example, the Kagoshima dialect has two tonal registers, and either the last or the second-to-last syllable is pronounced with a high pitch regardless of the number of morae and the part of speech. On the other hand, in the Miyakonojo dialect, every word is assigned the same tonal register: the last mora is pronounced with a high pitch. It has been posited that the 1-kei-type pitch-accent system of Miyakonojo was formed through the loss of one of the tonal registers seen in the Kagoshima dialect. Thus, the difference between the two pitch-accent systems can be explained by a single mutation event (the loss of a tonal register), but our model regards it as multiple mutation events that happened independently in each word category. Nevertheless, we performed a sensitivity analysis with a reduced dataset in an attempt to avoid this bias, so this limitation is somewhat mitigated.

In our model, we included geographical information in computing the prior probability of the phylogenetic tree, by modeling the rate of dialect transmission as a function of geographical distance and population density. Although we used the great circle distance (the shortest distance along the surface of a sphere approximating the Earth) as the measure of geographical distance, previous research has shown that the travel distance or travel time between locations better explains the variation in linguistic distance (Jeszenvszky et al. 2019; Szmrecsanyi 2012). The presence of sea routes may also have affected the diffusion of dialects. Another limitation concerning geographical information is the calibration of population sizes in the Heian period. Although we used demographic data from 1995 for calibration, it must be noted that the Japanese population distribution changed after the development of the alluvial plains. We did not include these factors in order to keep the model simple, but future studies may consider such geographical factors.
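For reference, the great circle distance between two points given by latitude and longitude can be computed with the haversine formula, as in the minimal sketch below. The formula choice, the Earth radius of 6371 km, and the approximate city coordinates are assumptions made for illustration; the text does not state which formula or radius was used in the analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in degrees,
    computed with the haversine formula on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates of Kyoto and Tokyo (illustrative values).
kyoto_tokyo_km = great_circle_km(35.01, 135.77, 35.68, 139.69)
```

A travel-distance or travel-time measure, as suggested by the studies cited above, would replace this function with a lookup in a route network while leaving the rest of the transmission model unchanged.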

Unlike lexical features, which are subject to borrowing between dialects, pitch accent provides a signal of the evolutionary process described by the tree structure. Moreover, accentual class merger provides evidence that pitch-accent systems have split from a shared ancestor, which is highly compatible with tree-thinking. Analyzing the phylogeny of pitch accent is a promising way to shed light on the evolutionary history of the modern dialects.

We thank Peter Ranacher and Nico Neureiter for valuable discussions and feedback. We also appreciate the valuable comments from four anonymous reviewers (including one secondary reviewer). This research was funded by JSPS KAKENHI, Grant nos. 17H06381, 18J00484 and 24K09627, Meiji Institute for Advanced Study of Mathematical Sciences (MIMS) Joint Research Project, and the Swiss NSF Sinergia Project No. CRSII5_183578.

Conflict of interest statement. We declare no conflicts of interest.

The supplementary document is available on the journal’s website. Data on the accent patterns of the tree taxa are shown in the Appendix. Other data and code associated with this paper are available at https://zenodo.org/records/11154180.

Bouckaert, R., et al. (2014). ‘BEAST 2: A Software Platform for Bayesian Evolutionary Analysis’, PLoS Computational Biology, 10(4): e1003537. https://doi.org/10.1371/journal.pcbi.1003537

Bouckaert, R., et al. (2012). ‘Mapping the Origins and Expansion of the Indo-European Language Family’, Science, 337(6097): 957–960. https://doi.org/10.1126/science.1219669

Burridge, J. (2017). ‘Spatial Evolution of Human Dialects’, Physical Review X, 7(3): 031008.

Currie, T. E., Meade, A., Guillon, M., and Mace, R. (2013). ‘Cultural Phylogeography of the Bantu languages of sub-Saharan Africa’, Proceedings Biological Sciences, 280(1762): 20130695. https://doi.org/10.1098/rspb.2013.0695

Database of Global Administrative Areas. 2015. GADM Database (www.gadm.org), version 2.8. https://gadm.org/data.html (Downloaded on 7/3/2017).

e-Stat. Portal Site of Official Statistics of Japan. 2016. https://www.e-stat.go.jp/gis/statmap-search?page=1&type=1&toukeiCode=00200521&toukeiYear=1995&aggregateUnit=S&serveyId=S002005111995&statsId=T000751

Felsenstein, J. (1973). ‘Maximum Likelihood and Minimum-Steps Methods for Estimating Evolutionary Trees from Data on Discrete Characters’, Systematic Biology, 22(3): 240–249. https://doi.org/10.1093/sysbio/22.3.240

Felsenstein, J. (1981). ‘Evolutionary Trees From DNA Sequences: A Maximum Likelihood Approach’, Journal of Molecular Evolution, 17(6): 368–376. https://doi.org/10.1007/BF01734359

Geospatial Information Authority of Japan. 2006. Global Map Japan Version 1.1 Raster Data. https://www.gsi.go.jp/kankyochiri/gm_japan_e.html (Downloaded on 4/11/2014).

Gray, R., Drummond, A. J., and Greenhill, S. J. (2009). ‘Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement’, Science, 323: 479–483.

Hattori S. 1959 (reprint 1999). Nihongo no Keitō, pp. 130. Japan: Iwanami bunko.

Hattori S. 1985 (reprint 2018). ‘Nihongo Shohōgen no Akusento no Kenkyū to Hikaku Hōhō’, in Z. Uwano (ed) Nihon sogo no saiken, pp. 597–610. Japan: Iwanami Shoin.

Hirako, T. (2017). ‘On the Historical Position of the Gairin Type Accent’, Journal of Asian and African Studies, 94: 259–276.

Hirayama T. (1951) Kyūshū Hōgen On-chō no Kenkyū: Kyōtsū-go Keihan-go tono Hikaku Kōsatsu. Japan: Gakkaino shishin sha.

Hirayama T. (1957) Nihongo On-chō no Kenkyū. Japan: Meiji shoin.

Hirayama T. ed. (1960) Zenkoku akusento jiten. Japan: Tokyo-dō Shuppan.

Hirayama T. (1969) Satsunan Shotō no Sōgō teki Kenkyū. Japan: Meiji Shoin.

Hirayama, T. (1979). Gengotō Nara-ken Totsukawa hōgen no Seikaku. Gengo Kenkyu, 76, 29–73. https://doi.org/10.11435/gengo1939.1979.76_29

Hoffmann, K., Bouckaert, R., Greenhill, S. J., and Kühnert, D. (2021). ‘Bayesian Phylogenetic Analysis of Linguistic Data Using BEAST’, Journal of Language Evolution, 6(2), 119–135. https://doi.org/10.1093/jole/lzab005

Huisman, J. L. A., Majid, A., and van Hout, R. (2019). ‘The Geographical Configuration of a Language Area Influences Linguistic Diversity’, PLoS One, 14(6), e0217363. https://doi.org/10.1371/journal.pone.0217363

Igarashi Y. (2021) ‘Bunki-gaku-teki shuhō ni motozuita Nichiryū shogo no keitō bunrui no kokoromi’, in Y. Hayashi, T. Kinuhata, and N. Kibe (eds) Fīrudo to bunken kara miru Nichiryū shogo no keitō to rekishi, pp. 17–51. Japan: Kaitakusha.

Jeszenvszky, P., Hikosaka, Y., Imamura, S., and Yano, K. (2019). ‘Japanese Lexical Variation Explained by Spatial Contact Patterns’, Geo-Inf, 8(9): 400. https://doi.org/10.3390/ijgi8090400

Kibe, N. (1997). ‘18-Seiki Satsuma no hyōryū-min Gonza no Akusento ni Tsuite: Joshi no Akusento to Gonza Akusento no Ichizuke’, Kokugogaku, 191: 84–97.

Kibe N. (2000) Seinanbu Kyūshū 2-kei akusento no Kenkyū. Japan: Bunsei Shuppan.

Kindaichi H. (1942) (reprint 2005) ‘Bumoki no Kenkyu Zokuchō. Nihongo no Akusento, Nihon Hōgen Gakkai’, in Kindaichi Haruhiko Chosaku Shū 9, pp. 9–38. Japan: Tamakawa Daigaku Shuppanbu.

Kindaichi H. (1964) (reprint 1977). ‘Watashi no hōgen kukaku’, in Nihon-go hōgen no kenkyū, pp. 54–80. Japan: Tokyodō.

Kindaichi H. (1966b) (reprint 2005) ‘Sanuki akusento hen’i seiritsu kou’, in H. Kindaichi (ed) Kindaichi Haruhiko Chosaku shū 7, pp. 531–568. Japan: Tamakawa Daigaku Shuppanbu.

Kindaichi H. (1966a) (reprint 2005). ‘Tsushima Iki no Akusento no Chii’, in H. Kindaichi (ed) Kindaichi Haruhiko Chosaku Shū 7, pp. 347–373. Japan: Tamakawa Daigaku Shuppanbu.

Kindaichi, H. (1967). ‘Tōgoku hōgen no rekishi o kangaeru’, Kokugo-gaku, 69: 40–50.

Kindaichi H. (1975b) (reprint 2005). ‘On’in henka kara akusento no henka he’, in H. Kindaichi (ed) Kindaichi Haruhiko Chosaku shū 7, pp. 630–657. Japan: Tamakawa Daigaku Shuppanbu.

Kindaichi H. (1975a) (reprint 2005). ‘Tōzai ryo akusento no chigai ga dekiru made’, in H. Kindaichi (ed) Kindaichi Haruhiko Chosaku Shū 7, pp. 374–414. Japan: Tamakawa Daigaku Shuppanbu.

Kindaichi, H. (1978). ‘Aichi ken Akusento no Keifu’, Kokugo Gaku Ronshū. Kasama-Shoin, 1: 1–19.

Kindaichi H. ed (2001). A concise Tone Dictionary of the Japanese Language [Meikai Nihongo Akusento Jiten]. Japan: Sanseidō.

Koile, E., et al. (2022). ‘Phylogeographic Analysis of the Bantu Language Expansion Supports A Rainforest Route’, Proceedings of the National Academy of Sciences of the United States of America, 119(32): e2112853119. https://doi.org/10.1073/pnas.2112853119

Lee, S., and Hasegawa, T. (2011) ‘Bayesian Phylogenetic Analysis Supports an Agricultural Origin of Japonic Languages’, Proceedings of the Royal Society B, 278(1725): 3662–3669. https://doi.org/10.1098/rspb.2011.0518

Lemey, P., Rambaut, A., Drummond, A. J., and Suchard, M. A. (2008). ‘Bayesian Phylogeography Finds Its Roots’, PLoS Computational Biology, 5(9): e1000520. https://doi.org/10.1371/journal.pcbi.1000520

Lemey, P., Rambaut, A., Welch, J. J., and Suchard, M. A. (2010). ‘Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time’, Molecular Biology and Evolution, 27(8): 1877–1885. https://doi.org/10.1093/molbev/msq067

Lewis, P. O. (2001). ‘A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data’, Systematic Biology, 50(6): 913–925. https://doi.org/10.1080/106351501753462876

List J. M., Shijulal N. S., Martin W., and Geisler H. (2014) ‘Using Phylogenetic Networks to Model Chinese Dialect History’, Language Dynamics and Change, 4: 222–252. https://doi.org/10.1163/22105832-00402008

Long, D., Isono, E., and Tsukahara, Y. (2008). ‘Ogasawara Shotō no Ōbei-kei Tōmin ni Mirareru Go-akusento no Kata Oyobi Sono Sedaisa’, Ogasawara Kenkyū Nenpō, 31, 31–40.

Mase Y. (1994) Hiroshima-shi hōgen akusento jiten. Japan: Nakano Shuppan Kikaku.

Matsukura, K. (2014). ‘Distribution of Accent Systems in Awara City. Fukui Pref’, Tokyo University Linguistic Papers, 35: 141–154. https://doi.org/10.15083/00027471

Matsukura, K., and Nitta, T. (2016). ‘Comparison of the Three-pattern Accent Systems in Fukui Prefecture’, Journal of the Phonetic Society of Japan, 20(3): 81–94. https://doi.org/10.24467/onseikenkyu.20.3_81

Matsumori, A. (1998). ‘Ryūkyū Akusento no Rekishi Teki Keisei Katei: Ruibetsu Gorui Nihakugo no Tokuina Gōryū no Shikata O Tegakari ni’, Gengo Kenkyū, 114: 85–114.

Matsumori, A., Nitta, T., Kibe, N., and Nakai, Y. (2012) Nihongo Akusento Nyūmon. Japan: Sanseido.

Ministry of Land, Infrastructure, Transport and Tourism (2020) Digital national land information (Administrative Area Data), ver. 2.2, https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-N03-v2_3.html (downloaded in March 2022).

Nerbonne, J. (2010). ‘Measuring the Diffusion of Linguistic Change’, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 365(1559): 3821–3828. https://doi.org/10.1098/rstb.2010.0048

Neureiter, N., et al. (2022). ‘Detecting Contact in Language Trees: A Bayesian Phylogenetic Model with Horizontal Transfer’, Humanities and Social Sciences Communications, 9(1): 205.

Nitta, T. (2012). ‘Accent of the Kokonogi Dialect in Echizen Town, Fukui Prefecture’, Journal of the Phonetic Society of Japan, 16(1): 63–79. https://doi.org/10.24467/onseikenkyu.16.1_63

Okumura M. ed. (1976) Gifu-ken hōgen no kenkyū. Japan: Taishushobō.

Pagel, M., Atkinson, Q. D., and Meade, A. (2007). ‘Frequency of Word-Use Predicts Rates of Lexical Evolution throughout Indo-European History’, Nature, 449(7163): 717–720. https://doi.org/10.1038/nature06176

Pagel, M., and Meade, A. (2017). ‘The Deep History of the Number Words’, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 373(1740): 20160517. https://doi.org/10.1098/rstb.2016.0517

Pellard T. (2016) ‘Nichiryū Sogo no Bunki Nendai’, in Y. Takubo, J. Whitman, and T. Hirako (eds) Ryūkyū shogo to Kodai Nihongo: Nichiryū sogo no saiken ni mukete, pp. 99–124. Japan: Kuroshio Shuppan.

Romano, N., Ranacher, P., Bachmann, S., and Joost, S. (2022). ‘Linguistic Traits as Heritable Units? Spatial Bayesian Clustering Reveals Swiss German Dialect Regions’, Journal of Linguistic Geography, 10(1): 11–22. https://doi.org/10.1017/jlg.2021.12

Saitou, N., and Jinam, T. A. (2017). ‘Language Diversity of the Japanese Archipelago and its Relationship with Human DNA Diversity’, Man in India, 95(4): 205–228.

Saitou, N., and Nei, M. (1987). ‘The Neighbor-Joining Method: A New Method For Reconstructing Phylogenetic Trees’, Molecular Biology and Evolution, 4(4): 406–425. https://doi.org/10.1093/oxfordjournals.molbev.a040454

Sato, R. (1983). ‘Fukui-Shi Oyobi Sono Shūhen Chiiki no Akusento-Chōsahō to Kata no Kubetsu no Arawarekata tono Kanren o Chūshin ni’, Kokugogaku Kenkyū, 23: 1–19.

Sato R. (1988) ‘The Accent System of Fukui City and Its Suburbs—With Special Reference to the Survey Methods, Age and Individual Differences’, in National Institute for Japanese Language (ed.) Hōgen Kenkyū hō no Tankyū, pp. 123–219. Japan: Shūei Shuppan.

Sato, Y., Sogabe, Y., and Mazuka, R. (2010). ‘Development of hemispheric specialization for lexical pitch-accent in Japanese infants’, Journal of Cognitive Neuroscience, 22(11): 2503–2513. https://doi.org/10.1162/jocn.2009.21377

Shibata T. (1942) (reprint 1950) ‘Ibigawa Jōryū no Akusento’, in T. Shibata (ed) Moji to Kotoba, pp. 231–266. Japan: Toue Shoin.

Shibatani M., ed (1990) (8th ed: 2005). The Languages of Japan. Cambridge Language Survey. UK: Cambridge University Press.

Szmrecsanyi B. (2012) ‘Geography is Overrated’, in Hansen S., Schwarz C., Stoeckle P., and Streck T. (eds) Dialectological and Folk Dialectological Concepts of Space—Current Methods and Perspectives in Sociolinguistic Research on Dialect Change, pp. 215–231. Berlin, Germany: De Gruyter.

Takahashi, T., and Ihara, Y. (2023). ‘Spatial Evolution of Human Cultures Inferred Through Bayesian Phylogenetic Analysis’, Journal of the Royal Society Interface, 20(198): 20220543. https://doi.org/10.1098/rsif.2022.0543

Tokugawa, M. (1962). ‘Nihongo sho-hōgen akusento no keifu’ shiron: ‘rui no tōgō’ to ‘chiri-teki bumpu’ kara miru. Gakushuin Daigaku Kokugo Kokubungaku Kaishi, 6: 1–19.

Uwano, Z. (1977) ‘Nihongo no akusento’, in S. Ohno and T. Shibata (eds) Iwanami kōza Nihongo 5 – On’in, pp. 281–322. Japan: Iwanami Shoten.

Uwano, Z. (1985a). ‘The Accent System of Ibukijima Dialect’, Transactions of the Japan Academy, 40(2): 75–179. https://doi.org/10.2183/tja1948.40.75

Uwano, Z. (1985b). ‘Genealogical Relationships and the Geographical Distribution of the Accents in Mainland Japan’, Transactions of the Japan Academy, 40(3): 215–250. https://doi.org/10.2183/tja1948.40.215

Uwano, Z. (1987). ‘Genealogical Relationships and the Geographical Distribution of the Accents in Mainland Japan’, Transactions of the Japan Academy, 42(1): 15–70. https://doi.org/10.2183/tja1948.42.15

Uwano, Z. (1990). ‘Accentual System of the Adjective in the Aomori Dialect’, Asia & African linguistics, 19: 45–81.

Uwano, Z. (2006). ‘Nihongo Akusento no Saiken’, Gengo Kenkyū, 130: 1–42.

Uwano, Z. (2019). ‘Accent Data of Verbs in the Northern Tōhoku Dialects: Part 1’, NINJAL Research Papers, 17: 101–130. https://doi.org/10.15084/00002226

Yamaguchi Y. (1984). ‘Fukui-shi Kougai no Ni-kei Akusento’, Hōgen Kenkyū Nempō, 27: 207–229.

Yamaguchi Y. (2003) ‘Akusento Taikei ga Shashō shita mono – Jun Ni-kei Akusento Shizuoka-ken Maisaka-machi Hōgen no Rei’, in Nihongo Tokyo Akusento no Seiritsu, pp. 326–349. Japan: Minatono hito.


  • Correspondence
  • Open access
  • Published: 22 July 2024

Co-mutation landscape and its prognostic impact on newly diagnosed adult patients with NPM1-mutated de novo acute myeloid leukemia

  • Yiyi Yao 1,2 (contributed equally),
  • Yile Zhou 1,2 (contributed equally),
  • Nanfang Zhuo (ORCID: orcid.org/0009-0008-9802-0814) 1,2 (contributed equally),
  • Wanzhuo Xie 1,2,
  • Haitao Meng 1,2,
  • Yinjun Lou 1,2,
  • Liping Mao 1,2,
  • Hongyan Tong (ORCID: orcid.org/0000-0001-5603-4160) 1,2,3,4,
  • Jiejing Qian 1,2,
  • Min Yang 1,2,
  • Wenjuan Yu 1,2,
  • De Zhou 1,2,
  • Jie Jin (ORCID: orcid.org/0000-0002-8166-9915) 1,2,3,4 &
  • Huafeng Wang (ORCID: orcid.org/0000-0002-8360-6395) 1,2,3,4

Blood Cancer Journal volume 14, Article number: 118 (2024)

  • Cancer genomics
  • Genetics research

Dear Editor,

Approximately 25–35% of adult patients with acute myeloid leukemia (AML) carry NPM1 mutations, which generally indicate a favorable outcome in the absence of FLT3-ITD mutation [1]. NPM1 mutations are absent in clonal hematopoiesis and have been considered AML-initiating lesions [2]. Research on the co-mutation characteristics of NPM1-mutated patients has concentrated on FLT3-ITD, which several large retrospective clinical studies suggest has a negative prognostic impact on NPM1-mutated patients [3, 4]. Besides FLT3-ITD, and although controversy remains, other high-frequency co-mutations such as DNMT3A, IDH1, IDH2, FLT3-TKD, NRAS, and WT1 have also been reported to affect the prognosis of NPM1-mutated patients [3, 5, 6, 7, 8, 9]. Identifying specific co-mutation combinations other than FLT3-ITD is therefore essential for precise risk stratification and treatment optimization in NPM1-mutated AML. Because allogeneic hematopoietic stem cell transplantation (allo-HSCT) is generally considered to improve the long-term outcome of most adverse-risk and suitable intermediate-risk AML patients, it is imperative to revisit the co-mutation profiles of NPM1-mutated AML patients to determine the population most likely to benefit from allo-HSCT.

In this study, we retrospectively analyzed newly diagnosed adult AML patients with NPM1 mutations (acute promyelocytic leukemia excluded) diagnosed at our center from October 2018 to December 2022, focusing on the therapeutic and prognostic significance of co-mutation characteristics. Patients who received at least one complete course of induction therapy were included in the outcome analysis. Table S1 provides details of induction chemotherapy. We evaluated efficacy after two induction cycles, unless patients achieved CR/CRi after only one induction cycle or discontinued treatment. Response was evaluated according to the NCCN guidelines for AML (version 3.2023) and categorized as CR/CRi or non-CR/CRi (including PR and NR) [10]. Overall survival (OS) was defined as the time from treatment initiation until death from any cause. Event-free survival (EFS) was defined as the time from treatment initiation to induction failure, relapse, or death, whichever came first. Disease-free survival (DFS) was defined as the time from disease remission to relapse or death, whichever came first. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang University College of Medicine (Hangzhou, China, Ethics Approval Number: IIT20240304A). All statistical analyses were performed using GraphPad Prism 7.0 software (GraphPad Software, CA, USA) and SPSS 23.0 (SPSS Inc., Chicago, IL).
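The three endpoint definitions above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in GraphPad Prism and SPSS). The `Patient` record, its field names, and the rule of censoring at last follow-up when no event has occurred are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative."""
    treatment_start: date
    last_follow_up: date
    induction_failure: Optional[date] = None
    remission: Optional[date] = None
    relapse: Optional[date] = None
    death: Optional[date] = None


def _interval(start: date, events: List[Optional[date]], censor: date) -> Tuple[int, bool]:
    """Days from start to the earliest observed event, else to the censoring date.

    Returns (days, event_observed). Censoring at last follow-up is an assumption.
    """
    observed = [d for d in events if d is not None]
    end = min(observed) if observed else censor
    return (end - start).days, bool(observed)


def overall_survival(p: Patient) -> Tuple[int, bool]:
    # OS: treatment initiation until death from any cause.
    return _interval(p.treatment_start, [p.death], p.last_follow_up)


def event_free_survival(p: Patient) -> Tuple[int, bool]:
    # EFS: treatment initiation until induction failure, relapse, or death.
    return _interval(p.treatment_start,
                     [p.induction_failure, p.relapse, p.death],
                     p.last_follow_up)


def disease_free_survival(p: Patient) -> Optional[Tuple[int, bool]]:
    # DFS: remission until relapse or death; undefined without remission.
    if p.remission is None:
        return None
    return _interval(p.remission, [p.relapse, p.death], p.last_follow_up)
```

A patient who relapses but is still alive at last follow-up contributes an observed event to EFS and DFS but a censored observation to OS, which is why the three endpoints can diverge for the same cohort.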

One hundred ninety-two newly diagnosed NPM1-mutated AML patients identified through next-generation sequencing (NGS) were analyzed (Tables S2–S4). Twenty NPM1 mutants were identified, most of which were located in exon 12 and manifested as 4-base-pair duplication/insertion alterations. Seven non-exon 12 mutants were located in exons 5, 8, 9, and 11 (Fig. 1A and Table S5). A total of 56 co-mutated genes were detected in the cohort (Fig. 1B). Co-mutated genes with a detection rate of ≥10% included FLT3 (56.77%), DNMT3A (48.44%), TET2 (29.69%), IDH2 (23.96%), IDH1 (14.58%), PTPN11 (11.46%), and NRAS (11.46%). By functional classification, co-mutated genes related to epigenetics and signal transduction were the most common (Table S6).

figure 1

A Protein domain structure and location of amino acids affected by mutations in NPM1. Several nuclear import and export signals of NPM1 assist its nucleocytoplasmic shuttling and cytological localization. The conserved N-terminal domain of NPM1 contains a leucine-rich nuclear export signal (NES). The middle domain contains two nuclear localization signals (NLS) that drive NPM1 to move from the cytoplasm to the nucleus. The C-terminus contains a nucleolar localization signal (NoLS), in which two highly conserved tryptophan residues (W288 and W290) are responsible for the correct folding of the helix to stabilize the hydrophobic core of NoLS. Most of the insertion mutations in exon 12 led to the loss of the original NoLS signal and generated a new NES signal, leading to aberrant cytoplasmic dislocation of NPM1 protein. B Co-mutation distribution map of NPM1-mutated AML patients.

One hundred seventy-eight patients (92.71%) received at least one complete course of intensive induction chemotherapy and underwent efficacy assessment, of whom 133 (74.72%) achieved CR/CRi within two courses of induction chemotherapy. The median follow-up of the 178 patients was 26.23 months (95% confidence interval [CI], 23.31–29.16). Median OS and DFS were not reached, and the median EFS was 15.03 months (95% CI, 8.25–21.82). The 3-year expected OS, EFS, and DFS were 51.5%, 40.3%, and 53.7%, respectively.
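To show what a median that is "not reached" means, here is a small Python sketch of a Kaplan-Meier product-limit estimate. The source does not state which estimator was used, so Kaplan-Meier is an assumption. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(time)) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) <= 0.5, or None when the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data in months (hypothetical): two events, three censored observations.
toy_curve = kaplan_meier([5, 8, 12, 20, 24], [1, 0, 1, 0, 0])
toy_median = median_survival(toy_curve)  # None: the curve never falls to 0.5
```

In the toy data the estimated survival stays above 50% through the end of follow-up, so the median is reported as not reached. The reported medians for OS and DFS in the cohort above have that same form.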

Regardless of the cut-off value chosen for variant allele frequency (VAF), there was no significant difference in OS, EFS, or DFS between the NPM1 low-VAF and high-VAF groups (Fig. S1). We then focused on the impact of co-mutations on response and outcome. Among the 178 NPM1-mutated patients included in the follow-up, patients with either FLT3-ITD or DNMT3A mutations showed significantly lower CR/CRi rates and worse prognosis trends than the corresponding wild-type groups (FLT3-ITD: CR/CRi rates, 63.41% vs. 84.38%, p = 0.001; median OS, 14.3 months vs. not reached (NR), p < 0.001; median EFS, 7.3 months vs. NR, p < 0.001; median DFS, 21.6 months vs. NR, p = 0.044; DNMT3A: CR/CRi rates, 67.44% vs. 81.53%, p = 0.013; median OS, 15.3 months vs. NR, p < 0.001; median EFS, 11.6 months vs. 27.7 months, p = 0.031; median DFS, p = 0.337) (Table S7 and Fig. S2). We further divided patients into four subgroups according to FLT3-ITD and DNMT3A mutation status. NPM1/FLT3-ITD/DNMT3A triple mutants showed extremely poor OS and EFS trends among the four groups (Fig. 2A, B). Among DNMT3A-mutated patients, those with FLT3-ITD mutations had significantly worse OS than FLT3-ITD wild-type patients (p = 0.003), and a similar result was found among DNMT3A wild-type patients (p = 0.002). Likewise, among FLT3-ITD-mutated patients, those with DNMT3A mutations had significantly worse OS than DNMT3A wild-type patients (p = 0.045), and a similar result was found among FLT3-ITD wild-type patients (p = 0.020) (Fig. 2A).
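The four-subgroup split by FLT3-ITD and DNMT3A status described above can be sketched as a simple stratification step. This is an illustrative Python sketch, not the authors' procedure. The record layout (`id`, `FLT3_ITD`, `DNMT3A` keys) and the group labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def co_mutation_group(flt3_itd: bool, dnmt3a: bool) -> str:
    """Label one of the four FLT3-ITD / DNMT3A combinations."""
    flt3_label = "FLT3-ITD mut" if flt3_itd else "FLT3-ITD wt"
    dnmt3a_label = "DNMT3A mut" if dnmt3a else "DNMT3A wt"
    return f"{flt3_label} / {dnmt3a_label}"


def stratify(patients: List[dict]) -> Dict[str, List[str]]:
    """Map each subgroup label to the IDs of the patients it contains."""
    groups = defaultdict(list)
    for p in patients:
        groups[co_mutation_group(p["FLT3_ITD"], p["DNMT3A"])].append(p["id"])
    return dict(groups)


# Hypothetical records; every patient here is assumed to be NPM1-mutated.
example = [
    {"id": "P1", "FLT3_ITD": True, "DNMT3A": True},    # triple mutant
    {"id": "P2", "FLT3_ITD": True, "DNMT3A": False},
    {"id": "P3", "FLT3_ITD": False, "DNMT3A": True},
    {"id": "P4", "FLT3_ITD": False, "DNMT3A": False},
    {"id": "P5", "FLT3_ITD": True, "DNMT3A": True},    # triple mutant
]
groups = stratify(example)
```

Each subgroup's survival times would then be compared separately, which is how the pairwise comparisons within DNMT3A-mutated or FLT3-ITD-mutated patients arise.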

figure 2

A OS and B EFS of NPM1-mutated AML patients with different combination patterns of FLT3-ITD and DNMT3A mutations. C OS, D EFS, and E DFS of NPM1-mutated AML patients with IDH1/2 mutation. F OS, G EFS, and H DFS of NPM1-mutated AML patients with PTPN11-PTP mutation. I OS and J EFS of NPM1mut FLT3-ITDmut AML patients with IDH mutations. K OS and L EFS of NPM1mut DNMT3Amut AML patients with PTPN11 mutations. M OS, N EFS, and O DFS by allo-HSCT status in NPM1-mutated AML patients harboring both FLT3-ITD and DNMT3A mutations.

Patients with IDH1/2 mutations had significantly better OS, EFS, and DFS than the wild-type group (median OS, NR vs. 18.6 months, p < 0.001; median EFS, NR vs. 10.2 months, p = 0.003; median DFS, NR vs. 18.3 months, p = 0.012) (Figs. 2C–E and S3). Although patients with PTPN11 mutations showed a trend toward improved outcome compared with PTPN11 wild-type patients, the difference was not significant (Fig. S4). PTPN11 mutations have been reported to cluster mainly in the N-terminal Src homology region 2 (N-SH2) and phosphatase (PTP) domains. Because mutations in both domains are involved in attenuating the autoinhibition of SHP2, the protein encoded by PTPN11 [11], we further investigated whether mutations in the two domains led to comparable outcomes. The OS and EFS of patients with PTPN11-PTP domain mutations were significantly better than those of PTPN11 wild-type patients (median OS, NR vs. 26.0 months, p = 0.014; median EFS, NR vs. 13.5 months, p = 0.016). A similar trend was found for DFS, whereas patients with PTPN11-N-SH2 domain mutations showed no significant improvement in outcome (Figs. 2F–H and S4). In addition, Fig. S5 shows the prognostic impact of other co-mutated genes with a detection rate of ≥10% in the followed-up patients, including TET2, FLT3-TKD, NRAS, and WT1; none of the trends was significant.

Further, we considered the presence of IDH or PTPN11 mutations in NPM1-mutated patients who also carried FLT3-ITD or DNMT3A mutations, to explore the prognostic impact of specific co-mutation interaction patterns. IDH mutations significantly improved OS and showed a trend toward improved EFS in patients with NPM1/FLT3-ITD dual mutations (median OS, 30.8 vs. 12.8 months, p = 0.015; median EFS, 22.6 vs. 6.1 months, p = 0.099), but had no significant impact on the outcome of patients with NPM1/DNMT3A mutations (Figs. 2I, J and S6). Similarly, PTPN11 mutations significantly improved OS and EFS in patients with NPM1/DNMT3A dual mutations (median OS, NR vs. 14.6 months, p = 0.026; median EFS, NR vs. 10.2 months, p = 0.033), but had no significant impact on the outcome of patients with NPM1/FLT3-ITD mutations (Figs. 2K, L and S6).

Previous research has generally acknowledged that allo-HSCT is beneficial for FLT3-ITD-mutated AML patients without NPM1 mutations. To identify the subgroup of NPM1-mutated AML patients likely to benefit from allo-HSCT, we examined the prognosis of patients who underwent allo-HSCT during post-remission after achieving CR/CRi within two courses of induction. A total of 32 patients received allo-HSCT, and another four patients relapsed and received salvage-HSCT during post-remission. Among patients with NPM1 mutation overall, allo-HSCT or salvage-HSCT did not significantly improve outcome compared with no transplantation (Fig. S7). For patients with NPM1 mutations combined with either FLT3-ITD or DNMT3A mutation, allo-HSCT showed a trend toward improved outcome, but the difference was not significant. When we focused on patients with NPM1/FLT3-ITD/DNMT3A triple mutations, who are characterized by poor prognosis, allo-HSCT significantly improved the OS, EFS, and DFS of this subgroup (median OS, NR vs. 14.0 months, p = 0.037; median EFS, NR vs. 9.1 months, p = 0.014; median DFS, NR vs. 7.4 months, p = 0.012) (Fig. 2M–O). By contrast, for NPM1-mutated patients with wild-type FLT3-ITD and DNMT3A, allo-HSCT did not improve outcome (Fig. S7).

Our results indicate that in NPM1-mutated AML, co-mutations of IDH1/2 and the PTPN11-PTP domain correlate with favorable prognosis, whereas FLT3-ITD and DNMT3A co-mutations indicate poor prognosis. Notably, the presence of NPM1/FLT3-ITD/DNMT3A triple mutations is associated with exceptionally adverse OS and EFS trends. Several studies have reported that NPM1/FLT3-ITD/DNMT3A, the most common triple-mutation pattern in NPM1-mutated patients, defines an AML subgroup with extremely poor prognosis [7, 12], which aligns with our findings. Further, our results on specific co-mutation combinations indicate that IDH and PTPN11 co-mutations ameliorate the adverse prognosis of patients with NPM1/FLT3-ITD and NPM1/DNMT3A dual mutations, respectively, so that two subsets with improved prognosis can be separated from the original adverse-prognosis subset of NPM1-mutated AML. In addition, for patients with NPM1/FLT3-ITD dual mutations, allo-HSCT after first remission has been shown to significantly improve both OS and DFS compared with continued chemotherapy alone [13, 14]. However, another large cohort study of pediatric AML reported opposite results [15]. Our study sought to identify the population most likely to benefit from allo-HSCT. The findings underscore the therapeutic potential of allo-HSCT during post-remission, particularly for AML patients with NPM1/FLT3-ITD/DNMT3A triple mutations.

In summary, these findings underscore the importance of co-mutation analysis in NPM1-mutated AML for risk stratification and therapeutic decision-making, and suggest that allo-HSCT may be a recommended strategy for NPM1-mutated patients with specific adverse co-mutation profiles. Nevertheless, further research is needed to confirm these findings and to explore how these co-mutations interact to diversify the outcome of NPM1-mutated AML patients.

Data availability

The data are not publicly available, owing to ethics considerations and privacy restrictions, but can be requested from the corresponding author.

References

1. Grimwade D, Ivey A, Huntly BJ. Molecular landscape of acute myeloid leukemia in younger adults and its clinical relevance. Blood. 2016;127:29–41. https://doi.org/10.1182/blood-2015-07-604496

2. McKerrell T, Park N, Moreno T, Grove CS, Ponstingl H, Stephens J, et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep. 2015;10:1239–45. https://doi.org/10.1016/j.celrep.2015.02.005

3. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med. 2016;374:2209–21. https://doi.org/10.1056/NEJMoa1516192

4. Boddu PC, Kadia TM, Garcia-Manero G, Cortes J, Alfayez M, Borthakur G, et al. Validation of the 2017 European LeukemiaNet classification for acute myeloid leukemia with NPM1 and FLT3-internal tandem duplication genotypes. Cancer. 2019;125:1091–100. https://doi.org/10.1002/cncr.31885

5. Gaidzik VI, Weber D, Paschka P, Kaumanns A, Krieger S, Corbacioglu A, et al. DNMT3A mutant transcript levels persist in remission and do not predict outcome in patients with acute myeloid leukemia. Leukemia. 2018;32:30–7. https://doi.org/10.1038/leu.2017.200

6. Boddu P, Kantarjian H, Borthakur G, Kadia T, Daver N, Pierce S, et al. Co-occurrence of FLT3-TKD and NPM1 mutations defines a highly favorable prognostic AML group. Blood Adv. 2017;1:1546–50. https://doi.org/10.1182/bloodadvances.2017009019

7. Bezerra MF, Lima AS, Pique-Borras MR, Silveira DR, Coelho-Silva JL, Pereira-Martins DA, et al. Co-occurrence of DNMT3A, NPM1, FLT3 mutations identifies a subset of acute myeloid leukemia with adverse prognosis. Blood. 2020;135:870–5. https://doi.org/10.1182/blood.2019003339

8. Eisfeld AK, Kohlschmidt J, Mims A, Nicolet D, Walker CJ, Blachly JS, et al. Additional gene mutations may refine the 2017 European LeukemiaNet classification in adult patients with de novo acute myeloid leukemia aged <60 years. Leukemia. 2020;34:3215–27. https://doi.org/10.1038/s41375-020-0872-3

9. Patel JP, Gonen M, Figueroa ME, Fernandez H, Sun Z, Racevskis J, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366:1079–89. https://doi.org/10.1056/NEJMoa1112304

10. Benson AB, Venook AP, Al-Hawary MM, Arain MA, Chen YJ, Ciombor KK, et al. Colon Cancer, Version 2.2021, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2021;19:329–59. https://doi.org/10.6004/jnccn.2021.0012

11. Alfayez M, Issa GC, Patel KP, Wang F, Wang X, Short NJ, et al. The Clinical impact of PTPN11 mutations in adults with acute myeloid leukemia. Leukemia. 2021;35:691–700. https://doi.org/10.1038/s41375-020-0920-z

12. Heiblig M, Duployez N, Marceau A, Lebon D, Goursaud L, Plantier I, et al. The impact of DNMT3A status on NPM1 MRD predictive value and survival in elderly AML patients treated intensively. Cancers. 2021;13. https://doi.org/10.3390/cancers13092156

13. Pratcorona M, Brunet S, Nomdedeu J, Ribera JM, Tormo M, Duarte R, et al. Favorable outcome of patients with acute myeloid leukemia harboring a low-allelic burden FLT3-ITD mutation and concomitant NPM1 mutation: relevance to post-remission therapy. Blood. 2013;121:2734–8. https://doi.org/10.1182/blood-2012-06-431122

14. Sakaguchi M, Yamaguchi H, Najima Y, Usuki K, Ueki T, Oh I, et al. Prognostic impact of low allelic ratio FLT3-ITD and NPM1 mutation in acute myeloid leukemia. Blood Adv. 2018;2:2744–54. https://doi.org/10.1182/bloodadvances.2018020305

15. Xu LH, Fang JP, Liu YC, Jones AI, Chai L. Nucleophosmin mutations confer an independent favorable prognostic impact in 869 pediatric patients with acute myeloid leukemia. Blood Cancer J. 2020;10:1. https://doi.org/10.1038/s41408-019-0268-7

This work was supported in part by the National Natural Science Foundation of China (82370162); the Natural Science Foundation of Zhejiang Province, China (LY23H080005); the Key R&D Program of Zhejiang (2024C03162); and the Fundamental Research Funds for the Central Universities (226-2022-00003).

Author information

These authors contributed equally: Yiyi Yao, Yile Zhou, Nanfang Zhuo.

Authors and Affiliations

Department of Hematology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, Zhejiang, PR China

Yiyi Yao, Yile Zhou, Nanfang Zhuo, Wanzhuo Xie, Haitao Meng, Yinjun Lou, Liping Mao, Hongyan Tong, Jiejing Qian, Min Yang, Wenjuan Yu, De Zhou, Jie Jin & Huafeng Wang

Zhejiang Provincial Key Laboratory of Hematopoietic Malignancy, Zhejiang University, Hangzhou, 310000, Zhejiang, PR China

Zhejiang Provincial Clinical Research Center for Hematological disorders, Hangzhou, 310000, Zhejiang, PR China

Hongyan Tong, Jie Jin & Huafeng Wang

Zhejiang University Cancer Center, Hangzhou, 310000, Zhejiang, PR China


Contributions

YY and HW designed the study, collected and analyzed the data, and wrote the first draft of the manuscript. YZ, NZ, WX, HM, YL, LM, HT, JQ, MY, WY, and DZ collected and analyzed the data, and reviewed the manuscript. JJ and HW read and reviewed the manuscript. HW accessed and verified the data, and provided administrative support. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Corresponding author

Correspondence to Huafeng Wang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

This study was approved by local ethics committees and was conducted in accordance with the Declaration of Helsinki. All patients signed written informed consent.

Consent for publication

All patients signed informed consent and also consented to the publication of these data.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary materials, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Yao, Y., Zhou, Y., Zhuo, N. et al. Co-mutation landscape and its prognostic impact on newly diagnosed adult patients with NPM1-mutated de novo acute myeloid leukemia. Blood Cancer J. 14, 118 (2024). https://doi.org/10.1038/s41408-024-01103-w


Received: 18 April 2024

Revised: 08 July 2024

Accepted: 11 July 2024

Published: 22 July 2024

DOI: https://doi.org/10.1038/s41408-024-01103-w


mutation research journal

IMAGES

  1. Human Mutation: Vol 41, No 1

    mutation research journal

  2. martin2010

    mutation research journal

  3. Subscribe to Mutation Research

    mutation research journal

  4. Study of mutation from DNA to biological evolution: International

    mutation research journal

  5. Front cover

    mutation research journal

  6. Two examples of drug-gene-mutation relations from a biomedical journal

    mutation research journal

VIDEO

  1. H63D Mutation Research Consortium’s Counseling Team

  2. New Drug Shows Promise for Treating Rare Brain Tumors

  3. Human Mutation

  4. What is Mutation?

  5. Watch Live: Breakthrough study partially restores eyesight

  6. TP53 mutations are linked to unfavourable prognosis in chronic lymphocytic leukaemia

COMMENTS

  1. Mutation Research

    Mutation Research: Genetic Toxicology and Environmental Mutagenesis (MRGTEM) publishes papers advancing knowledge in the field of genetic toxicology. Papers are welcomed in the following areas: ... The evaluation of contrasting or opposing viewpoints is welcomed as long as the presentation is in accordance with the journal's aims, scope, and ...

  2. Mutation Research

    A section of Mutation Research. Mutation Research (MR) provides a platform for publishing all aspects of DNA mutations and epimutations, from basic evolutionary aspects to translational applications in genetic and epigenetic diagnostics and therapy.Mutations are defined as all possible alterations in DNA sequence and sequence organization, from point mutations to genome structural variation ...

  3. Mutation Research

    About the journal. The subject areas of Mutation Research - Reviews in Mutation Research (MRR) encompass the entire spectrum of the science of mutation research and its applications, with particular emphasis on the relationship between mutation and disease. Thus, this section will cover: Advances in human genome …. View full aims & scope ...

  4. Mutation

    Read the latest Research articles in Mutation from Nature Reviews Genetics. ... Journal Club | 19 October 2022. The mutation rate as an evolving trait ... Mutation is the source of genetic ...

  5. Mutation

    Mutation articles from across Nature Portfolio. ... Research 16 Jul 2024 Journal of Human Genetics. P: 1-7 ... Research Highlights 11 Oct 2023 Nature Reviews Genetics.

  6. Mutation Research: Fundamental and Molecular Mechanisms of ...

    A section of Mutation Research. Mutation Research (MR) provides a platform for publishing all aspects of DNA mutations and epimutations, from basic evolutionary aspects to translational applications in genetic and epigenetic diagnostics and therapy. Mutations are defined as all possible alterations in DNA sequence and sequence organization, from point mutations to genome structural variation ...

  7. Mutation

    The mutational landscape of normal human endometrial epithelium. Whole-genome sequencing of normal human endometrial glands shows that most are clonal cell populations and frequently carry cancer ...

  8. Mutation Research (journal)

    Mutation Research is a peer-reviewed scientific journal that publishes research papers in the area of mutation research which focus on fundamental mechanisms underlying the phenotypic and genotypic expression of genetic damage. There are currently three sections: Two previous sections. are now continued as DNA Repair .

  9. Mutation Research

    Scope. Mutation Research (MR) provides a platform for publishing all aspects of DNA mutations and epimutations, from basic evolutionary aspects to translational applications in genetic and epigenetic diagnostics and therapy. Mutations are defined as all possible alterations in DNA sequence and sequence organization, from point mutations to ...

  10. Mutation—The Engine of Evolution: Studying Mutation and Its Role in the

    Abstract. Mutation is the engine of evolution in that it generates the genetic variation on which the evolutionary process depends. To understand the evolutionary process we must therefore characterize the rates and patterns of mutation. Starting with the seminal Luria and Delbruck fluctuation experiments in 1943, studies utilizing a variety of ...

  11. The origins, determinants, and consequences of human mutations

    Advances in DNA sequencing have enabled the identification of human germline and somatic mutations at a genome-wide scale.These studies have confirmed, refined, and extended our understanding on the origins, mechanistic basis, and empirical characteristics of human mutations, including both replicative and nonreplicative errors (), heterogeneity in the rates and spectrum of mutations within ...

  12. Human Mutation

    Human Mutation provides a unique forum for the exchange of ideas, methods, and applications of interest to molecular, human, and medical geneticists in academic, industrial, and clinical research settings worldwide.

  13. Mutation Research

    Incorporating Mutation Research Letters, Mutation Research/Environmental Mutagenesis and Related Subjects and Mutation Research/Genetic Toxicology ; 2024 — Volumes 893-898

  14. Mutation Research

    Scope. Mutation Research - Genetic Toxicology and Environmental Mutagenesis (MRGTEM) publishes papers advancing knowledge in the field of genetic toxicology. Papers are welcomed in the following areas: New developments in genotoxicity testing of chemical agents (e.g. improvements in methodology of assay systems and interpretation of results ...

  15. Subscribe to Mutation Research: Genetic Toxicology and ...

    Mutation Research: Genetic Toxicology and Environmental Mutagenesis (MRGTEM) publishes papers advancing knowledge in the field of genetic toxicology. Papers are welcomed in the following areas: ... The evaluation of contrasting or opposing viewpoints is welcomed as long as the presentation is in accordance with the journal's aims, scope, and ...

  16. What is mutation? A chapter in the series: How microbes ...

    Mutations drive evolution and were assumed to occur by chance: constantly, gradually, roughly uniformly in genomes, and without regard to environmental inputs, but this view is being revised by discoveries of molecular mechanisms of mutation in bacteria, now translated across the tree of life. These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress ...

  17. Mutation Research / Fundamental and Molecular Mechanisms of Mutagenesis

    PMC3909961 ... Mutation Research / Fundamental and Molecular Mechanisms of Mutagenesis Special Issue: DNA ...

  18. Mutation Research

    Scope. Mutation Research (MR) provides a platform for publishing all aspects of DNA mutations and epimutations, from basic evolutionary aspects to translational applications in genetic and epigenetic diagnostics and therapy. Mutations are defined as all possible alterations in DNA sequence and sequence organization, from point mutations to ...

  19. Mutation Research: Reviews in Mutation Research

    The subject areas of Mutation Research - Reviews in Mutation Research (MRR) encompass the entire spectrum of the science of mutation research and its applications, with particular emphasis on the relationship between mutation and disease. Thus, this section will cover: Advances in human genome research (including evolving technologies for mutation detection and functional genomics) with ...

  20. Mutation Research

    Mutation Research - Fundamental and Molecular Mechanisms of Mutagenesis. Supports open access. 4.9 CiteScore. 1.5 Impact Factor.

  21. Performance evaluation of predictive models for detecting MMR gene

    The mutation risk score distributions in the cohort calculated by each model are shown in Figure 1. The majority of the risk scores were below 10%, consistent with the actual mutation carrier rate of 6.0%. The median risk scores by the PREMM 5, MMRPro, MMRPredict, and Myriad models were 3.4%, 0.45%, 4.0%, and 7.2%, respectively. Notably, most ...

  22. A biology-aware mutation rate model for human germline

    However, research in the past decade has shown that a substantial portion of mutation rate variation has a scale of dozens of kilobases, is DNA strand dependent, and is correlated with gene ...

  23. Bayesian phylogenetic analysis of pitch-accent systems based on

    One possible direction for future research is to pre-classify accent patterns into a few groups, so that the mutation within a group is more likely than mutation between groups. In this way, we may reflect the variation in the mutation rates by introducing two model parameters representing replacement rate within and between groups of accent ...

  24. History of Mutation Research

    The journal Mutation Research was founded in 1964 by Frits H. Sobels. Over the years, adapting to the evolving field, the journal has been divided into several sections and has seen a number of title changes, generating a complex publication history.

  25. Exome Sequencing Identifies Carriers of the Autosomal Dominant Cancer

    PURPOSE: The autosomal dominant cancer predisposition disorders hereditary breast and ovarian cancer (HBOC) and Lynch syndrome (LS) are genetic conditions for which early identification and intervention have a positive effect on the individual and public health. The goals of this study were to determine whether germline genetic screening using exome sequencing could be used to efficiently ...

  26. OMRF scientist receives $2.4 million to study genetic mutations

    The National Institutes of Health has awarded a $2.4 million grant to an Oklahoma Medical Research Foundation scientist whose lab identifies disease-causing genetic mutations. ... that will further expedite research on individual genetic mutations. Qin's discovery was recently published in the journal Nature Communications. Filed Under: News ...

  27. Co-mutation landscape and its prognostic impact on newly ...

    Research on co-mutation characteristics of NPM1-mutated patients concentrated on FLT3-ITD, ... Blood Cancer Journal (Blood Cancer J.) ISSN 2044-5385 (online) ...

  28. Mutation Research/Mutation Research Genomics

    Jianzhong Wu. June 2001. Read the latest articles of Mutation Research/Mutation Research Genomics at ScienceDirect.com, Elsevier's leading platform of peer-reviewed scholarly literature.

  29. Thousands of high-risk cancer gene variants identified

    Sep. 7, 2020: Recent cancer studies have shown that genomic mutations leading to cancer can occur years, or even decades, before a patient is diagnosed. Researchers have developed a statistical ...