Title:
DNA-METHYLATION-BASED QUALITY CONTROL OF THE ORIGIN OF ORGANISMS
Document Type and Number:
WIPO Patent Application WO/2022/023208
Kind Code:
A1
Abstract:
The invention pertains to a method for the identification of the geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

Inventors:
TÖNGES SINA (DE)
LYKO FRANK (DE)
VENKATESH GEETHA (DE)
ANDRIANTSOA RANJA (DE)
GATZMANN FANNY (DE)
BÖHL FLORIAN (DE)
KAPPEL ANDREAS (DE)
IGWE EMEKA IGNATIUS (DE)
THIEMANN FRANK (DE)
Application Number:
PCT/EP2021/070683
Publication Date:
February 03, 2022
Filing Date:
July 23, 2021
Assignee:
EVONIK OPERATIONS GMBH (DE)
DEUTSCHES KREBSFORSCHUNGSZENTRUM STIFTUNG DES OEFFENTLICHEN RECHTS (DE)
International Classes:
C12Q1/6827; C12Q1/6888; G16B50/50
Foreign References:
US20040122857A1 2004-06-24
CN108319984A 2018-07-24
US20100112595A1 2010-05-06
Other References:
J. JIAO ET AL: "Methylation-sensitive amplified polymorphism-based genome-wide analysis of cytosine methylation profiles in Nicotiana tabacum cultivars", GENETICS AND MOLECULAR RESEARCH, vol. 14, no. 4, 1 January 2015 (2015-01-01), pages 15177 - 15187, XP055764753, DOI: 10.4238/2015.November.25.6
LE LUYER J ET AL., PNAS, vol. 114, no. 49, 2017
GATZMANN, F.: "PhD thesis", 2018, FACULTY OF BIOSCIENCES, UNIVERSITY OF HEIDELBERG, article "DNA methylation in the marbled crayfish Procambarus virginalis"
YAMADA ET AL., GENOME RESEARCH, vol. 14, 2004, pages 247 - 266
BORMANN F, TUORTO F, CIRZI C, LYKO F, LEGRAND C: "BisAMP: A web-based pipeline for targeted RNA cytosine-5 methylation analysis", METHODS, vol. 156, 1 March 2019 (2019-03-01), pages 121 - 127, XP085614416, DOI: 10.1016/j.ymeth.2018.10.013
AKALIN ET AL., GENOME BIOLOGY, vol. 13, no. 10, 2012, pages R87
Attorney, Agent or Firm:
EVONIK PATENT ASSOCIATION (DE)
Claims:
CLAIMS

1. A method for the identification of the geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

2. The method of claim 1, comprising the steps of: a. determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects; b. determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and c. comparing the test methylation profile determined in (b) with one or more predetermined reference methylation profiles, wherein each of the one or more predetermined reference methylation profiles is specific for a distinct geographic origin of subjects or group of subjects which are of the same biological taxon of the individual test subject or individual group of test subjects; wherein if the test methylation profile is significantly similar to one of the one or more predetermined reference methylation profiles, the individual test subject or the individual group of test subjects has a geographical origin similar to the subjects or group of subjects of the one or more predetermined reference methylation profiles.

3. The method of claim 1 or of claim 2, wherein the individual test subject or individual group of test subjects is any biological entity having a DNA genome and DNA genome methylation, preferably the methylation site being a CpG site.

4. The method of any one of the preceding claims, wherein the individual test subject or individual group of test subjects are selected from a prokaryote, or a eukaryote.

5. The method of any one of the claims 2 to 4, wherein the one or more pre-selected methylation sites in (a) are methylation sites associated with tissue specific gene expression, preferably wherein the pre-selected methylation sites are associated with gene expression of one distinct tissue.

6. The method of claim 5, wherein the tissue is selected from the group consisting of (i) metabolic tissue preferably being gut tissue, (ii) muscular tissue, (iii) skin or feather tissue, and (iv) organ tissue, said organ tissue preferably being hepatic and/or pancreatic tissue.

7. The method of any one of the preceding claims, wherein the individual test subject, or the individual group of test subjects, are animals.

8. The method of any one of the preceding claims, wherein the distinct geographic origin is a geographic location that is considered to be the habitat, wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

9. The method according to any one of the preceding claims, wherein the one or more preselected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.

10. A method for quality controlling a suspected geographic origin of an individual test subject, or of an individual group of test subjects, the method comprising the steps of a. determining the methylation status of one or more pre-selected methylation sites within genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects; b. determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and c. comparing the test methylation profile determined in (b) with a predetermined reference methylation profile, wherein the predetermined reference methylation profile is specific for individual subjects, or individual groups of subjects, of the same biological taxon of the individual test subject or individual group of test subjects, and which were obtained from the suspected geographic origin; wherein if the test methylation profile is significantly similar to the predetermined reference methylation profile, the individual test subject or the individual group of test subjects passes the quality control and the suspected geographical origin is indicated as true geographical origin.

11. A method for assessing one or more environmental parameters of a habitat of an individual test subject, or of an individual group of test subjects, the method comprising the steps of a. determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects; b. determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or individual group of test subjects; and c. comparing the test methylation profile determined in (b) with one or more predetermined reference methylation profiles, wherein the one or more predetermined reference methylation profiles are each specific for individual subjects, or individual groups of subjects, of the same biological taxon of the individual test subject or individual group of test subjects, and which were each obtained from distinct geographic origins; and wherein the distinct geographic origin is distinguished from other distinct geographic origins by one or more environmental parameters; wherein if the test methylation profile is significantly similar to one of the one or more predetermined reference methylation profiles, the individual test subject or the individual group of test subjects is derived from a geographical origin having similar, or preferably equal, environmental parameters to the geographical origin of the individual test subjects or individual group of test subjects of the one of the one or more predetermined reference methylation profiles.

12. A method for confirming or declining an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

13. A method for developing a test system for confirming an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the steps of: a. determining the methylation status of one or more methylation sites within genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects; b. selecting from the one or more methylation sites a reference panel of methylation sites which is characterized by a specific and distinct differential methylation profile for each of the known geographic origins; c. obtaining a test system by assigning a reference methylation profile for each of the known geographic origins; and wherein a comparison of a test methylation profile obtained from a test sample with the reference methylation profiles obtained in (c) allows for confirming the assumed geographic origin of the individual test subject or of the individual group of test subjects from which the test sample was obtained.

14. The method of any one of the preceding claims, wherein the individual test subject, or the individual group of test subjects is marbled crayfish and/or wherein the distinct geographic origins are geographically distinct waters, these waters preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms.

15. The method of claim 14, wherein the geographically distinct waters are made distinct by one or more environmental parameters selected from the group consisting of pH, water hardness, manganese content, iron content, and aluminum content.

16. The method of any one of claim 14 or claim 15, wherein the method comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites, the pre-selected panel of methylation sites preferably containing methylation sites within about 500 to 1000, and preferably about 700 genes.

17. The method of claim 16, wherein the panel of methylation sites does not comprise consistently methylated or unmethylated methylation sites.

Description:
DNA-METHYLATION-BASED QUALITY CONTROL OF THE ORIGIN OF ORGANISMS

Field of the Invention

The invention is based on the finding that specific panels of genes provide a source for the generation of DNA methylation profiles which are specific for a geographic origin of organisms. In particular, DNA methylation profiling may be used to identify the genetic origins of animals, that include rearing animals also known as livestock, such as crabs, fish or chicken. The methods of the invention can be applied to identify the geographic origin of organisms including rearing animals, to control assumed geographic origins of a sample of the organisms including rearing animals, and for assessing environmental parameters of habitats of organisms including rearing animals. Further, the invention provides quality control methods and processes for developing new test systems for various organisms including rearing animals.

Background of the Invention

Sustainable food production is presently considered among the globally most important societal needs. As the value chains of the agriculture and aquaculture industries are highly complex, certificates have been established to reinforce consumer relationships and trust. However, certificates are based on audits at specific farms and can be easily tampered with by moving livestock from non-certified farms to certified farms. Furthermore, surveillance of sustainable farming practices is spotty and largely limited to audits. As "bad" farming practices are widespread in the industry, there is an urgent need for a tampering-resistant certificate.

The livestock and food process industries have been heavily involved in developing strategies of identifying, tracing and managing the risks in the area of food safety, and in developing strategies for consumer information (transparent value chains). Health, safety and also animal welfare considerations demand that the origins of animal products, and in particular meat products, should be traceable, so that quality assurance audits, and monitoring procedures can be effectively and reliably carried out.

A comparison of genome-wide patterns of methylation and variation at the DNA level revealed that a highly significant proportion of epigenetic variation could be associated with fitness differences and rearing conditions such as captivity in salmon (Le Luyer J et al. 2017 PNAS vol 114, no 49).

A study of genome wide methylation in the marbled crayfish (Procambarus virginalis) observed stable methylation of most parts of the genome between animals and tissues while a subset of about 700 genes were demonstrated to be highly variable in their methylation (Gatzmann, F. DNA methylation in the marbled crayfish Procambarus virginalis. PhD thesis, Faculty of Biosciences, University of Heidelberg, 2018).

In view of the above, there is an urgent need to provide means for identifying and quality controlling the geographic origin of organisms, in particular food and more particularly animal material derived from rearing stock.

Summary of the Invention

The aforementioned objective is solved by the different aspects of the present invention. The invention is based on the finding that resilience to environmental exposures such as stress, climate, light or diet is a fundamental concept of biology and results in the adaptation of an organism to its environment. The capability to adapt to the environment and maintain the adapted biological pattern depends on epigenetic mechanisms, including DNA methylation.

The inventors have unexpectedly found that this property can be utilized to identify environment-specific "epigenetic fingerprints" on the genome and to assign organisms to the ecosystem they originate from. Based on these findings, the present invention provides methods to identify the geographic origin of organisms including rearing animals also known as livestock, methods to control assumed geographic origins of a sample of organisms including rearing animals, and methods for assessing environmental parameters of habitats of organisms including rearing animals. Further, the invention provides quality control methods and processes for developing new test systems for various organisms including rearing animals.

Generally, and by way of brief description, the main aspects of the present invention can be described as follows:

In a first aspect, the invention pertains to a method for the identification of the geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profile(s) each being specific for a distinct geographic origin.

In a second aspect, the invention pertains to a method for quality controlling a suspected geographic origin of an individual test subject or individual group of test subjects, the method comprising the steps of a. determining the methylation status of one or more pre-selected methylation sites within genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects; b. determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and c. comparing the test methylation profile determined in (b) with a predetermined reference methylation profile, wherein the predetermined reference methylation profile is specific for individual subjects, or individual groups of subjects, of the same biological taxon (preferably species) of the individual test subject or of the individual group of test subjects, and which were obtained from the suspected geographic origin; wherein if the test methylation profile is significantly similar to the predetermined reference methylation profile, the individual test subject or individual group of test subjects passes the quality control and the suspected geographical origin is indicated as true geographical origin.

In a third aspect, the invention pertains to a method for assessing one or more environmental parameters of a habitat of an individual test subject, or of an individual group of test subjects, the method comprising the steps of

(a) determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or individual group of test subjects; and

(c) comparing the test methylation profile determined in (b) with one or more predetermined reference methylation profiles, wherein the one or more predetermined reference methylation profiles are each specific for individual subjects, or individual groups of subjects, of the same biological taxon (preferably species) of the individual test subject or individual group of test subjects, and which were each obtained from distinct geographic origins; and wherein the distinct geographic origin is distinguished from other distinct geographic origins by one or more environmental parameters; wherein if the test methylation profile is significantly similar to one of the one or more predetermined reference methylation profiles, the individual test subject or the individual group of test subjects is derived from a geographical origin having similar, or preferably equal, environmental parameters to the geographical origin of the subjects or group of subjects of the one of the one or more predetermined reference methylation profiles.
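The comparison in step (c) can be illustrated with a minimal sketch. The text does not define how "significantly similar" is measured, so the Pearson correlation, the 0.9 cut-off, the function names and the example values below are all assumptions chosen for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long methylation profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 sites)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile without variation cannot be correlated")
    return sxy / sqrt(sxx * syy)


def best_matching_origin(test_profile, reference_profiles, min_similarity=0.9):
    """Return (origin, scores); origin is None when no reference is similar enough.

    min_similarity is a hypothetical threshold standing in for
    "significantly similar"; it is not taken from the source text.
    """
    scores = {
        origin: pearson(test_profile, reference)
        for origin, reference in reference_profiles.items()
    }
    origin, score = max(scores.items(), key=lambda item: item[1])
    return (origin if score >= min_similarity else None), scores


# Invented per-site methylation levels (fraction methylated, 0..1).
references = {
    "lake_A": [0.20, 0.80, 0.50, 0.30, 0.70],
    "river_B": [0.70, 0.30, 0.40, 0.80, 0.20],
}
test = [0.25, 0.75, 0.50, 0.35, 0.65]
origin, scores = best_matching_origin(test, references)
```

With a single reference profile the same function covers the quality-control case: the suspected origin is confirmed when it is returned and declined when the result is None.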

In a fourth aspect, the invention pertains to a method for confirming or declining an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

In a fifth aspect, the invention pertains to a method for developing a test system for confirming an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the steps of:

(a) determining the methylation status of one or more methylation sites within genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) selecting from the one or more methylation sites a reference panel of methylation sites which is characterized by a specific and distinct differential methylation profile for each of the known geographic origins;

(c) obtaining a test system by assigning a reference methylation profile for each of the known geographic origins (or locations); and wherein a comparison of a test methylation profile obtained from a test sample with the reference methylation profiles obtained in (c) allows for confirming the assumed geographic origin of the individual test subject from which the test sample was obtained.
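Steps (b) and (c) of the fifth aspect can be sketched as follows. This is only one possible reading: the panel is taken to be every site whose mean methylation differs between origins by at least a hypothetical `min_delta`, and each reference profile is taken to be the per-origin mean over that panel. Neither choice is prescribed by the text, and the training values are invented.

```python
from statistics import mean


def build_test_system(training, min_delta=0.2):
    """Select a site panel and assign one reference profile per known origin.

    training: origin -> list of samples, each a list of per-site methylation
    levels (0..1) in the same site order.
    min_delta: assumed minimum spread of origin means for a site to count
    as differentially methylated.
    """
    origins = sorted(training)
    n_sites = len(training[origins[0]][0])

    # Mean methylation of every site within each origin.
    site_means = {
        origin: [mean(sample[i] for sample in training[origin]) for i in range(n_sites)]
        for origin in origins
    }

    # Step (b): keep sites whose origin means are far enough apart.
    panel = []
    for i in range(n_sites):
        values = [site_means[origin][i] for origin in origins]
        if max(values) - min(values) >= min_delta:
            panel.append(i)

    # Step (c): the reference profile of an origin is its mean over the panel.
    references = {
        origin: [site_means[origin][i] for i in panel] for origin in origins
    }
    return panel, references


training = {
    "farm_X": [[0.10, 0.50, 0.90], [0.20, 0.52, 0.80]],
    "lake_Y": [[0.60, 0.50, 0.30], [0.70, 0.48, 0.40]],
}
panel, references = build_test_system(training)
```

In this toy data the middle site is nearly identical in both origins and is therefore dropped from the panel.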

Detailed Description of the Invention

In the following, the elements of the invention will be described. These elements are listed with specific embodiments and/or examples; however, it should be understood that these elements may be combined in any manner and in any number to create additional embodiments and/or examples. The variously described examples and preferred embodiments should not be construed to limit the present invention to only the explicitly described embodiments or examples. This description should be understood to support and encompass embodiments and examples which combine two or more of the explicitly described embodiments or which combine the one or more of the explicitly described embodiments or examples with any number of the disclosed and/or preferred elements. Furthermore, any permutations and combinations of all described elements in this application should be considered disclosed by the description of the present application unless the context indicates otherwise.

The terms “of the present invention”, “in accordance with the present invention”, “according to the present invention” and the like, as used herein are intended to refer to all aspects, embodiments and examples of the invention described and/or claimed herein.

As used herein, the term “comprising” is to be construed as encompassing both “including” and “consisting of”, both meanings being specifically intended, and hence individually disclosed embodiments in accordance with the present invention. Where used herein, “and/or” is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example, “A and/or B” is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein. In the context of the present invention, the terms “about” and “approximately” denote an interval of accuracy that the person skilled in the art will understand to still ensure the technical effect of the feature in question. The term typically indicates deviation from the indicated numerical value by ±20%, ±15%, ±10%, and for example ±5%. As will be appreciated by the person of ordinary skill, the specific deviation for a numerical value for a given technical effect will depend on the nature of the technical effect. For example, a natural or biological technical effect may generally have a larger such deviation than one for a man-made or engineering technical effect. Where an indefinite or definite article is used when referring to a singular noun, e.g. "a", "an" or "the", this includes a plural of that noun unless something else is specifically stated.

It is to be understood that the application of the teachings according to any aspect of the present invention to a specific problem or environment, and the inclusion of variations according to any aspect of the present invention or additional features thereto (such as further aspects and embodiments or examples), will be within the capabilities of one having ordinary skill in the art in light of the teachings contained herein.

Unless context dictates otherwise, the descriptions and definitions of the features set out within this description are not limited to any particular aspect or embodiment of the invention and apply equally to all aspects and embodiments which are described.

All references, patents, and publications cited herein are hereby incorporated by reference in their entirety.

The term “geographic origin” in context of the herein defined invention shall pertain to a geographic location which is distinguished from other geographic locations by one or more environmental parameters of the subject or group of subjects. Such environmental parameters depend on the habitat of the subject or group of subjects and may be different in case the subject or group of subjects lives or is cultured in water, on or in soil, or may be selected from a food or air parameter etc. As non-limiting examples of the present invention, for freshwater crabs (such as the marbled crayfish), environmental parameters may be selected from pH, water hardness, manganese content, iron content, and aluminum content; as mentioned, these parameters, although preferred, shall be understood as non-limiting illustrative examples and may vary greatly depending on the taxon or species of the subject or group of subjects. As such, for a subject or group of subjects that live in water, the habitat can be selected from standing or flowing waters such as lakes, rivers, aqua farms, ponds, or other pools or bodies of water. A geographic origin shall be understood to be the geographic location that is considered to be a habitat wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

The term “test” used in conjunction with the term subject in the present disclosure refers to an entity or a living organism that is subjected to the method according to any aspect of the present invention and is the basis for an analysis application of the present invention. An “(individual) test subject”, an “(individual) group of test subjects” or a “test profile” is therefore an (individual) subject or group of subjects being tested according to the invention or a profile being obtained or generated in this context. Conversely, the term “reference” shall denote, mostly predetermined, entities which are used for a comparison with the test entity. A subject or group of subjects in context of the present invention may be any living organism. For example, a subject according to any aspect of the present invention may be a plant or animal of any kind, preferably a rearing animal (or rearing stock) or livestock, which may be vertebrates or invertebrates. Typical examples of invertebrates that may be useful for being a subject according to any aspect of the present invention may be prawns or crabs such as the marbled crayfish. Typical examples of vertebrates that may be useful for being a subject according to any aspect of the present invention may be fish or land animals such as chicken or other livestock that may be cultured.

The term “genomic material” shall refer to nucleic acid molecules or fragments of the genome of the subject or group of subjects. Preferably such nucleic acid molecules or fragments are DNA or RNA or hybrids thereof, and most preferably are molecules of the DNA genome of a subject or group of subjects.

In context of the present invention, the terms “methylation profile”, “methylation pattern”, “methylation state” or “methylation status,” are used herein to describe the state, situation or condition of methylation of a genomic sequence, and such terms refer to the characteristics of a DNA segment at a particular genomic locus in relation to methylation. Such characteristics include, but are not limited to, whether any of the cytosine (C) residues within this DNA sequence are methylated, location of methylated C residue(s), percentage of methylated C at any particular stretch of residues, and allelic differences in methylation due to, e.g., difference in the origin of the alleles.

The term "methylation status" refers to the status of a specific methylation site (i.e. methylated vs. non-methylated) which means a residue or methylation site is methylated or not methylated. Then, based on the methylation status of one or more methylation sites, a methylation profile may be determined. Accordingly, the term "methylation profile" or also “methylation pattern” refers to the relative or absolute concentration of methylated C residues or unmethylated C residues at any particular stretch of residues in the genomic material of a biological sample. For example, if cytosine (C) residue(s) not typically methylated within a DNA sequence are methylated, it may be referred to as "hypermethylated"; whereas if cytosine (C) residue(s) typically methylated within a DNA sequence are not methylated, it may be referred to as "hypomethylated". Likewise, if the cytosine (C) residue(s) within a DNA sequence (e.g., the DNA from a sample nucleic acid from a test subject) are methylated as compared to another sequence from a different region or from a different individual (e.g., relative to normal nucleic acid or to the standard nucleic acid of the reference sequence), that sequence is considered hypermethylated compared to the other sequence. Alternatively, if the cytosine (C) residue(s) within a DNA sequence are not methylated as compared to another sequence from a different region or from a different individual, that sequence is considered hypomethylated compared to the other sequence. These sequences are said to be "differentially methylated". Measurement of the levels of differential methylation may be done by a variety of ways known to those skilled in the art. One method is to measure the methylation level of individual interrogated CpG sites determined by the bisulfite sequencing method, as a non-limiting example.
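As a minimal sketch of how the definitions above could translate into numbers, the following assumes that bisulfite sequencing yields, for each interrogated CpG site, a count of reads supporting the methylated and the unmethylated state. The function names and the 0.25 difference threshold are illustrative assumptions, not values from the text.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of reads reporting a methylated cytosine at one CpG site."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated_reads / total


def classify_site(test_level, reference_level, min_delta=0.25):
    """Label a site relative to a reference; min_delta is an assumed cut-off."""
    delta = test_level - reference_level
    if delta >= min_delta:
        return "hypermethylated"
    if delta <= -min_delta:
        return "hypomethylated"
    return "similar"


# 18 methylated and 2 unmethylated reads at one site of the test sample.
level = methylation_level(18, 2)
label = classify_site(level, reference_level=0.30)
```

A methylation profile in the sense used here is then simply the ordered list of such levels over the interrogated sites.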

As used herein, a “methylated nucleotide” or a “methylated nucleotide base” refers to the presence of a methyl moiety on a nucleotide base, where the methyl moiety is usually not present in a recognized typical nucleotide base. For example, cytosine in its usual form does not contain a methyl moiety on its pyrimidine ring, but 5-methylcytosine contains a methyl moiety at position 5 of its pyrimidine ring. Therefore, cytosine in its usual form may not be considered a methylated nucleotide and 5-methylcytosine may be considered a methylated nucleotide. In another example, thymine may contain a methyl moiety at position 5 of its pyrimidine ring, however, for purposes herein, thymine may not be considered a methylated nucleotide when present in DNA. Typical nucleotide bases for DNA are thymine, adenine, cytosine and guanine. Typical bases for RNA are uracil, adenine, cytosine and guanine. Correspondingly a "methylation site" is the location in the target gene nucleic acid region where methylation has the possibility of occurring. For example, a location containing CpG is a methylation site wherein the cytosine may or may not be methylated. In particular, the term “methylated nucleotide” refers to nucleotides that carry a methyl group attached to a position of a nucleotide that is accessible for methylation. These methylated nucleotides are usually found in nature and to date, methylated cytosine that occurs mostly in the context of the dinucleotide CpG, but also in the context of CpNpG- and CpNpN-sequences may be considered the most common. In principle, other naturally occurring nucleotides may also be methylated but they will not be taken into consideration with regard to any aspect of the present invention.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid (DNA or RNA) that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more nucleotides that is/are methylated.

A “CpG island” as used herein describes a segment of DNA sequence that comprises a functionally or structurally deviated CpG density. For example, Yamada et al. have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a greater than 50% GC content, and have an OCF/ECF ratio greater than 0.6 (Yamada et al., 2004, Genome Research, 14, 247-266). Others have defined a CpG island less stringently as a sequence at least 200 nucleotides in length, having a greater than 50% GC content, and an OCF/ECF ratio greater than 0.6 (Takai et al., 2002, Proc. Natl. Acad. Sci. USA, 99, 3740-3745).

The term “bisulfite” as used herein encompasses any suitable type of bisulfite, such as sodium bisulfite, or another chemical agent that is capable of chemically converting a cytosine (C) to a uracil (U) without chemically modifying a methylated cytosine and therefore can be used to differentially modify a DNA sequence based on the methylation status of the DNA, e.g., U.S. Pat. Pub. US 2010/0112595 (Menchen et al.).

As used herein, a reagent that "differentially modifies" methylated or non-methylated DNA encompasses any reagent that modifies methylated and/or unmethylated DNA in a process through which distinguishable products result from methylated and non-methylated DNA, thereby allowing the identification of the DNA methylation status. Such processes may include, but are not limited to, chemical reactions (such as a C to U conversion by bisulfite) and enzymatic treatment (such as cleavage by a methylation-dependent endonuclease). Thus, an enzyme that preferentially cleaves or digests methylated DNA is one capable of cleaving or digesting a DNA molecule at a much higher efficiency when the DNA is methylated, whereas an enzyme that preferentially cleaves or digests unmethylated DNA exhibits a significantly higher efficiency when the DNA is not methylated.
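The CpG island criteria quoted above (minimum length, GC content above 50%, OCF/ECF ratio above 0.6) can be checked with a short sketch. It assumes that OCF/ECF denotes the commonly used observed-to-expected CpG ratio, computed as (number of CpG x sequence length) / (number of C x number of G); the text itself does not spell out the formula, and the function names are hypothetical.

```python
def cpg_island_stats(seq):
    """Return (length, GC content, observed/expected CpG ratio) of a sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c_count + g_count) / length if length else 0.0
    # Assumed formula for the OCF/ECF ratio; zero when C or G is absent.
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return length, gc_content, obs_exp


def is_cpg_island(seq, min_length=400, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds; defaults follow the 400-nucleotide definition."""
    length, gc_content, obs_exp = cpg_island_stats(seq)
    return length >= min_length and gc_content > min_gc and obs_exp > min_obs_exp


dense = "CG" * 200       # artificial 400-nt sequence made only of CpG
at_rich = "AT" * 200     # artificial 400-nt sequence without any C or G
short_dense = "CG" * 100
```

Passing `min_length=200` switches the check to the less stringent definition mentioned in the same paragraph.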

In the context of the present invention, any “non-bisulfite-based method” or “non-bisulfite-based quantitative method” may also be used to test for the methylation status at any given methylation site to be tested. Such terms refer to any method for quantifying methylated or non-methylated nucleic acid that does not require the use of bisulfite. The terms also refer to methods for preparing a nucleic acid to be quantified that do not require bisulfite treatment. Examples of non-bisulfite-based methods include, but are not limited to, methods for digesting nucleic acid using one or more methylation sensitive enzymes and methods for separating nucleic acid using agents that bind nucleic acid based on methylation status. The terms "methyl-sensitive enzymes" and "methylation sensitive restriction enzymes" refer to DNA restriction endonucleases that are dependent on the methylation state of their DNA recognition site for activity. For example, there are methyl-sensitive enzymes that cleave or digest at their DNA recognition sequence only if it is not methylated. Thus, an unmethylated DNA sample will be cut into smaller fragments than a methylated DNA sample. Similarly, a hypermethylated DNA sample will not be cleaved. In contrast, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is methylated. As used herein, the terms "cleave", "cut" and "digest" are used interchangeably.
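The effect of a methyl-sensitive enzyme on fragment sizes may be illustrated by the following minimal sketch. The recognition sequence, the cut position and the rule that a site counts as methylated if any methylated position falls within it are hypothetical choices for this illustration and do not describe any particular enzyme.

```python
def digest(seq, methylated_positions, site="CCGG", cut_offset=1,
           cuts_if_methylated=False):
    """Return the fragments produced by a hypothetical methyl-sensitive enzyme.

    methylated_positions: 0-based indices of methylated cytosines in seq.
    cuts_if_methylated=False models an enzyme blocked by methylation;
    cuts_if_methylated=True models an enzyme that requires methylation.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        # Assumption: a site is methylated if any methylated position lies in it.
        site_methylated = any(pos <= m < pos + len(site)
                              for m in methylated_positions)
        if site_methylated == cuts_if_methylated:
            cut = pos + cut_offset
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


dna = "AAACCGGTTTCCGGAAA"  # two recognition sites, illustrative sequence
unmethylated = digest(dna, methylated_positions=set())
partly_methylated = digest(dna, methylated_positions={4})
fully_methylated = digest(dna, methylated_positions={4, 11})
```

In this sketch the unmethylated sample yields three fragments, the partly methylated sample two, and the fully methylated sample remains uncut, mirroring the behaviour described above.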

A “biological sample” in context of the invention may comprise any biological material obtained from the subject or group of subjects that contains genomic material, and may be liquid, solid or both, may be tissue or bone, or a body fluid such as blood, lymph, etc. In particular the biological sample useful for the present invention may comprise biological cells or fragments thereof.

As used herein, the term “pre-selected methylation sites” refers to methylation sites that were selected from genes or regions that showed the highest degree of methylation variation during the training of the method and that fulfil certain quality criteria, such as a minimum sequencing coverage of >5x for each CpG site considered and >5 qualified CpG sites per gene. Additionally, genes that have an average methylation level <0.1 or an average methylation level >0.9 can be excluded due to their limited dynamic range. “Reference methylation profiles” may be defined on the basis of multiple training samples using multivariate statistical methods, such as Principal Component analysis or Multi-Dimensional Scaling.
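The quality criteria named above may, for example, be applied as in the following sketch. The data layout (one dictionary per CpG site holding per-sample coverage and methylation ratios), the requirement that the coverage threshold is met in every sample, and all function names are assumptions made for this illustration.

```python
from statistics import mean


def qualified_sites(sites, min_coverage=5):
    """Keep CpG sites whose coverage exceeds min_coverage in every sample."""
    return [s for s in sites if all(c > min_coverage for c in s["coverage"])]


def select_genes(genes, min_coverage=5, min_sites=5, low=0.1, high=0.9):
    """Return {gene: average methylation} for genes passing all criteria."""
    selected = {}
    for name, sites in genes.items():
        kept = qualified_sites(sites, min_coverage)
        if len(kept) <= min_sites:  # more than 5 qualified CpG sites required
            continue
        level = mean(mean(s["ratios"]) for s in kept)
        if level < low or level > high:  # limited dynamic range
            continue
        selected[name] = level
    return selected


# Illustrative input: two samples per site, values are made up.
good = {"coverage": [10, 10], "ratios": [0.5, 0.5]}
shallow = {"coverage": [5, 10], "ratios": [0.5, 0.5]}
saturated = {"coverage": [10, 10], "ratios": [0.95, 0.95]}
genes = {
    "geneA": [good] * 6,
    "geneB": [good] * 5 + [shallow],  # only 5 qualified sites remain
    "geneC": [saturated] * 6,         # average methylation above 0.9
}
selected = select_genes(genes)
```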

The term “significantly similar” in context of the present disclosure, and in particular in context with the comparison of methylation profiles (such as the comparison between test profiles (from test subject(s)) and reference profiles), shall mean a similarity observed by statistical means (i.e. by using bioinformatics) and/or also by observation using the eye. A significant similarity is observed for example if a test profile overlaps with a reference profile that is defined by multiple training samples through multivariate statistical methods, such as Principal Component analysis or Multi-Dimensional Scaling. In particular, a test profile is significantly similar to the pre-determined reference profile if more than 50, 55, 60, 65, 70, 75, 80, 85, 90 or 95 % of the methylation pattern/profile overlaps with that of the reference profile. A similarity of a test profile to more than one, such as two, three or even all reference profiles reduces the significance of the similarity.

The term “pre-determined reference profile” used in the context of the present invention refers to a typical or standard methylation profile of the genomic material of a living organism with a specific geographical origin. The pre-determined reference profile may be obtained from a control subject. For example, the control subject may be a living organism of the same species as the test subject which has a known geographical origin. Alternatively, the pre-determined reference profile may be obtained from a variety of organisms living in the specific geographical origin. The methylation profile of different organisms of a specific geographical origin may be identical. There may be a compilation of several pre-determined reference profiles. Comparing the methylation profile of the test subject with the pre-determined reference profiles in the compilation may enable identifying the specific pre-determined reference profile that is similar to the methylation profile of the test subject; the geographical origin of the test subject may then be deduced to be that of the pre-determined reference profile.

The term “similar” used in relation to the geographical origin refers to the habitat or geographical origin of the test subject(s) based on the habitat or geographical origin of the organism from which the pre-determined reference profile was obtained. The term ‘similar’ may refer to the type of habitat, the environmental parameters of the habitat, the country where the habitat is located and the like. The geographical origin of the test subject may be 50, 55, 60, 65, 70, 75, 80, 85, 90 or 95 % similar to the geographical origin of the pre-determined reference profile based on one or more environmental parameters as defined above under ‘geographical origin’.

In a first aspect, the invention pertains to a method for the identification of the geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

The present invention is predicated on the surprising identification of methylation profiles in a subset of genes of living organisms, including animals, which are, within one species, characteristic for a distinct geographic origin of an individual of said species. Other individuals of the species which originate from a different geographic location are distinguishable by a different methylation profile for the same subset of genes - or methylation sites therein.

In one example of any aspect of the present invention, the method may preferably comprise the following method steps:

(a) determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and

(c) comparing the test methylation profile determined in (b) with one or more predetermined reference methylation profiles, wherein each of the one or more predetermined reference methylation profiles is specific for a distinct geographic origin of subjects or group of subjects which are of the same biological taxon of the individual test subject or individual group of test subjects; wherein if the test methylation profile is significantly similar to one of the one or more predetermined reference methylation profiles, the individual test subject or the individual group of test subjects has a geographical origin similar to the subjects or group of subjects of the one or more predetermined reference methylation profiles.
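Steps (a) to (c) may be sketched computationally as follows. The description leaves the similarity measure open; purely as an assumption for this illustration, a reference profile is represented by the per-site range observed in training samples, the overlap is the fraction of sites at which the test value falls within that range (plus a tolerance), and an origin is assigned only if exactly one reference profile exceeds the threshold. All names, the tolerance and the threshold are hypothetical.

```python
def build_reference(training_samples):
    """Per-site (min, max) envelope over the training samples of one origin."""
    return [(min(col), max(col)) for col in zip(*training_samples)]


def overlap_fraction(test_profile, reference, tolerance=0.05):
    """Fraction of sites at which the test value lies within the envelope."""
    hits = sum(lo - tolerance <= v <= hi + tolerance
               for v, (lo, hi) in zip(test_profile, reference))
    return hits / len(reference)


def assign_origin(test_profile, references, threshold=0.8):
    """Return (origin or None, overlap per origin).

    None is returned when no reference, or more than one reference,
    exceeds the threshold, since a match to several reference profiles
    reduces the significance of the similarity.
    """
    scores = {origin: overlap_fraction(test_profile, ref)
              for origin, ref in references.items()}
    matches = [o for o, s in scores.items() if s > threshold]
    return (matches[0] if len(matches) == 1 else None), scores


# Made-up methylation ratios at four pre-selected sites.
references = {
    "lake": build_reference([[0.20, 0.40, 0.60, 0.80], [0.25, 0.45, 0.65, 0.85]]),
    "river": build_reference([[0.70, 0.10, 0.30, 0.50], [0.75, 0.15, 0.35, 0.55]]),
}
origin, scores = assign_origin([0.22, 0.43, 0.61, 0.83], references)
```

A multivariate approach such as principal component analysis, as mentioned in the definitions, could replace the per-site envelope; the envelope is used here only because it keeps the sketch self-contained.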

The individual test subject or individual group of test subjects may be any biological entity having a DNA genome and DNA genome methylation. Preferably the methylation site is a CpG site. The individual test subject or individual group of test subjects may be selected from a prokaryote, or a eukaryote, such as a unicellular or multicellular plant, a fungus or an animal.

In one aspect of the invention, the one or more pre-selected methylation sites in (a) are methylation sites associated with tissue specific gene expression. Preferably, the pre-selected methylation sites are associated with gene expression of one distinct tissue.

The tissue may be selected from

(i) metabolic tissue such as gut tissue, said gut tissue preferably being ileum or jejunum,

(ii) muscular tissue,

(iii) skin or feather tissue, and

(iv) organ tissue, said organ tissue preferably being hepatic and/or pancreatic tissue.

The individual test subject, or the individual group of test subjects, are preferably animals, such as invertebrates such as crabs. Alternatively, the individual test subject, or the individual group of test subjects may be vertebrates such as birds or mammals; and preferably are chicken, prawn or crayfish.

The distinct geographic origin may be a geographic location that is considered to be the habitat (including agricultural environments such as a culture farm) wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

Preferably, the one or more pre-selected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.
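A selection of the 20% most differentially methylated genes may be sketched as follows. How differential methylation is scored is not prescribed here; the sketch assumes, for illustration only, that each gene is scored by the spread between its highest and lowest average methylation level across origins. The function names and the example values are hypothetical.

```python
import math


def differential_score(means_by_origin):
    """Assumed score: spread of the per-origin average methylation levels."""
    return max(means_by_origin) - min(means_by_origin)


def top_fraction(scores, fraction=0.2):
    """Return the genes in the top fraction when ranked by descending score."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Ten genes with made-up average methylation levels in three origins.
per_origin_means = {
    "g1": [0.50, 0.52, 0.51], "g2": [0.20, 0.80, 0.50],
    "g3": [0.40, 0.45, 0.42], "g4": [0.30, 0.31, 0.30],
    "g5": [0.10, 0.60, 0.35], "g6": [0.70, 0.72, 0.71],
    "g7": [0.55, 0.60, 0.58], "g8": [0.25, 0.28, 0.26],
    "g9": [0.65, 0.66, 0.64], "g10": [0.45, 0.48, 0.47],
}
scores = {gene: differential_score(m) for gene, m in per_origin_means.items()}
panel_genes = top_fraction(scores)
```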

In a particular example of the first aspect of the present invention, the individual test subject, or the individual group of test subjects is marbled crayfish. Therein, the distinct geographic origins are geographically distinct waters, preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms. These geographically distinct waters may be made distinct from other bodies of water by one or more environmental parameters selected from pH, water hardness, manganese content, iron content, and aluminum content.

The aforementioned method for marbled crayfish advantageously comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites. This pre-selected panel of methylation sites preferably contains methylation sites within about 500 to 1000, and preferably about 700 genes. The genes or genetic regions according to Table 2 are particularly preferred.

In a particular example of the first aspect of the present invention, the individual test subject, or the individual group of test subjects is chicken. Therein, the distinct geographic origins are geographically distinct chicken farms. These geographically distinct chicken farms may be considered distinct from other chicken farms by one or more environmental parameters, such as feeding parameters or air parameters (e.g. temperature, humidity, ventilation).

Preferably, the panel of methylation sites in the methods according to the first aspect of the present invention does not comprise consistently methylated or unmethylated methylation sites.

In a second aspect, the invention pertains to a method for quality controlling a suspected geographic origin of an individual test subject or individual group of test subjects, the method comprising the steps of

(a) determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and

(c) comparing the test methylation profile determined in (b) with a predetermined reference methylation profile, wherein the predetermined reference methylation profile is specific for individual subjects, or individual groups of subjects, of the same biological taxon of the individual test subject or individual group of test subjects, and which were obtained from the suspected geographic origin; wherein if the test methylation profile is significantly similar to the predetermined reference methylation profile, the individual test subject or the individual group of test subjects passes the quality control and the suspected geographical origin is indicated as true geographical origin.

The biological sample containing genomic material may be as defined above.

Also, for this aspect of the present invention, the individual test subject or individual group of test subjects may be any biological entity having a DNA genome and DNA genome methylation. Preferably the methylation site is a CpG site. The individual test subject or individual group of test subjects may be selected from a prokaryote, or a eukaryote, such as a unicellular or multicellular plant, a fungus or an animal. The one or more pre-selected methylation sites in (a) may be methylation sites associated with tissue specific gene expression. Preferably, the pre-selected methylation sites are associated with gene expression of one distinct tissue. Suitable tissues are as defined above for the first aspect of the invention.

The individual test subject, or the individual group of test subjects, may be plants or animals, and are preferably animals, such as invertebrates such as crabs. Alternatively, the individual test subject, or the individual group of test subjects may be vertebrates such as birds or mammals; and preferably are chicken, prawn or crayfish.

The distinct geographic origin may be a geographic location that is considered to be the habitat (including agricultural environments such as a culture farm) wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

Preferably, the one or more pre-selected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.

In a particular example of the second aspect of the present invention, the individual test subject, or the individual group of test subjects is marbled crayfish. Therein, the distinct geographic origins are geographically distinct waters, preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms. These geographically distinct waters may be considered distinct from other waters by one or more environmental parameters selected from pH, water hardness, manganese content, iron content, and aluminum content.

The aforementioned method for marbled crayfish advantageously comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites. This pre-selected panel of methylation sites preferably contains methylation sites within about 500 to 1000, and preferably about 700 genes. The genes or genetic regions according to Table 2 are particularly preferred.

In a particular example of the second aspect of the present invention, the individual test subject, or the individual group of test subjects is chicken. Therein, the distinct geographic origins are geographically distinct chicken farms. These geographically distinct chicken farms may be considered distinct from other chicken farms by one or more environmental parameters, such as feeding parameters or air parameters (e.g. temperature, humidity, ventilation).

Preferably, the panel of methylation sites in the methods according to the second aspect of the present invention does not comprise consistently methylated or unmethylated methylation sites.

In a third aspect, the invention pertains to a method for assessing one or more environmental parameters of a habitat of an individual test subject, or of an individual group of test subjects, the method comprising the steps of

(a) determining the methylation status of one or more pre-selected methylation sites within the genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) determining from the methylation status determined in (a) a test methylation profile of the individual test subject, or of the individual group of test subjects; and

(c) comparing the test methylation profile determined in (b) with one or more predetermined reference methylation profiles, wherein the one or more predetermined reference methylation profiles are each specific for individual subjects, or individual groups of subjects, of the same biological taxon (preferably species) of the individual test subject or the individual group of test subjects, and which were each obtained from distinct geographic origins; and wherein the distinct geographic origin is distinguished from other distinct geographic origins by one or more environmental parameters; wherein if the test methylation profile is significantly similar to one of the one or more predetermined reference methylation profiles, the individual test subject or individual group of test subjects is derived from a geographical origin having similar, or preferably equal, environmental parameters to the geographical origin of the subjects or group of subjects of the one of the one or more predetermined reference methylation profiles.

The biological sample containing genomic material may be as defined above.

Also, for this aspect of the present invention, the individual test subject or individual group of test subjects may be any biological entity having a DNA genome and DNA genome methylation. Preferably the methylation site is a CpG site. The individual test subject or individual group of test subjects may be selected from a prokaryote, or a eukaryote, such as a unicellular or multicellular plant, a fungus or an animal. The one or more pre-selected methylation sites in (a) may be methylation sites associated with tissue specific gene expression. Preferably, the pre-selected methylation sites are associated with gene expression of one distinct tissue. Suitable tissues are as defined above for the first aspect of the invention.

The individual test subject, or the individual group of test subjects, may be plants or animals, and are preferably animals, such as invertebrates such as crabs. Alternatively, the individual test subject, or the individual group of test subjects may be vertebrates such as birds or mammals; and preferably are chicken, prawn or crayfish.

The distinct geographic origin may be a geographic location that is considered to be the habitat (including agricultural environments such as a culture farm) wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

Preferably, the one or more pre-selected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.

In a particular example of the third aspect of the present invention, the individual test subject, or the individual group of test subjects is marbled crayfish. Therein, the distinct geographic origins are geographically distinct waters, preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms. These geographically distinct waters may be considered distinct from other bodies of water by one or more environmental parameters selected from pH, water hardness, manganese content, iron content, and aluminum content.

The aforementioned method for marbled crayfish advantageously comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites. This pre-selected panel of methylation sites preferably contains methylation sites within about 500 to 1000, and preferably about 700 genes. The genes or genetic regions according to Table 2 are particularly preferred.

In a particular example of the third aspect of the present invention, the individual test subject, or the individual group of test subjects is chicken. Therein, the distinct geographic origins are geographically distinct chicken farms. These geographically distinct chicken farms may be considered distinct from other chicken farms by one or more environmental parameters, such as feeding parameters or air parameters (e.g. temperature, humidity, ventilation).

Preferably, the panel of methylation sites in the methods according to the third aspect of the present invention does not comprise consistently methylated or unmethylated methylation sites.

In a fourth aspect, the invention pertains to a method for confirming or declining an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the comparison of a test methylation profile obtained from genomic material of the individual test subject or of the individual group of test subjects with one or more predetermined reference methylation profiles each being specific for a distinct geographic origin.

The biological sample containing genomic material may be as defined above.

Also, for this aspect of the present invention, the individual test subject or individual group of test subjects may be any biological entity having a DNA genome and DNA genome methylation. Preferably the methylation site is a CpG site. The individual test subject or individual group of test subjects may be selected from a prokaryote, or a eukaryote, such as a unicellular or multicellular plant, a fungus or an animal. The one or more pre-selected methylation sites in (b) may be methylation sites associated with tissue specific gene expression. Preferably, the pre-selected methylation sites are associated with gene expression of one distinct tissue. Suitable tissues are as defined above for the first aspect of the invention.

The individual test subject, or the individual group of test subjects, may be plants or animals, and are preferably animals, such as invertebrates such as crabs. Alternatively, the individual test subject, or the individual group of test subjects may be vertebrates such as birds or mammals; and preferably are chicken, prawn or crayfish.

The distinct geographic origin may be a geographic location that is considered to be the habitat (including agricultural environments such as a culture farm) wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

Preferably, the one or more pre-selected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.

In a particular example of the fourth aspect of the present invention, the individual test subject, or the individual group of test subjects is marbled crayfish. Therein, the distinct geographic origins are geographically distinct waters, preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms. These geographically distinct waters may be considered distinct from other bodies of water by one or more environmental parameters selected from pH, water hardness, manganese content, iron content, and aluminum content.

The aforementioned method for marbled crayfish advantageously comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites. This pre-selected panel of methylation sites preferably contains methylation sites within about 500 to 1000, and preferably about 700 genes. The genes or genetic regions according to Table 2 are particularly preferred.

In a particular example of the fourth aspect of the present invention, the individual test subject, or the individual group of test subjects is chicken. Therein, the distinct geographic origins are geographically distinct chicken farms. These geographically distinct chicken farms may be considered distinct from other chicken farms by one or more environmental parameters, such as feeding parameters or air parameters (e.g. temperature, humidity, ventilation).

Preferably, the panel of methylation sites in the methods according to the fourth aspect of the present invention does not comprise consistently methylated or unmethylated methylation sites.

In a fifth aspect, the invention pertains to a method for developing a test system for confirming an assumed geographic origin of an individual test subject or of an individual group of test subjects, the method comprising the steps of:

(a) determining the methylation status of one or more methylation sites within genomic material contained in a biological sample obtained from the individual test subject, or of the individual group of test subjects;

(b) selecting from the one or more methylation sites a reference panel of methylation sites which is characterized by a specific and distinct differential methylation profile for each of the known geographic origins;

(c) obtaining a test system by assigning a reference methylation profile for each of the known geographic origins (or locations); and wherein a comparison of a test methylation profile obtained from a test sample with the reference methylation profiles obtained in (c) allows for confirming the assumed geographic origin of the individual test subject or of the individual group of test subjects from which the test sample was obtained.
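The development of such a test system may be sketched as follows. The selection rule is an assumption made for this illustration: a methylation site enters the reference panel if the average methylation levels of all known origins differ from one another by at least a minimum amount at that site, and the reference profile of each origin is its average methylation at the panel sites. The function name, the threshold and the example values are hypothetical.

```python
from statistics import mean


def build_test_system(training, min_difference=0.1):
    """training: {origin: [profile, ...]}, each profile a list of per-site ratios.

    Returns (panel, references): the indices of the selected sites and the
    per-origin reference profile restricted to those sites.
    """
    origin_means = {o: [mean(col) for col in zip(*samples)]
                    for o, samples in training.items()}
    n_sites = len(next(iter(origin_means.values())))
    panel = []
    for i in range(n_sites):
        values = sorted(m[i] for m in origin_means.values())
        # Adjacent sorted values differing by min_difference implies that
        # every pair of origins differs by at least that amount.
        if all(b - a >= min_difference for a, b in zip(values, values[1:])):
            panel.append(i)
    references = {o: [m[i] for i in panel] for o, m in origin_means.items()}
    return panel, references


# Made-up training data: two samples per known origin, three candidate sites.
training = {
    "origin_A": [[0.2, 0.5, 0.9], [0.3, 0.5, 0.8]],
    "origin_B": [[0.6, 0.5, 0.4], [0.7, 0.5, 0.5]],
}
panel, references = build_test_system(training)
```

In this example the second site is dropped because both origins show the same average methylation there, so it carries no information about origin.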

The biological sample containing genomic material may be as defined above.

Also, for this aspect of the present invention, the individual test subject or individual group of test subjects may be any biological entity having a DNA genome and DNA genome methylation. Preferably the methylation site is a CpG site. The individual test subject or individual group of test subjects may be selected from a prokaryote, or a eukaryote, such as a unicellular or multicellular plant, a fungus or an animal. The one or more pre-selected methylation sites may be methylation sites associated with tissue specific gene expression. Preferably, the pre-selected methylation sites are associated with gene expression of one distinct tissue. Suitable tissues are as defined above for the first aspect of the invention.

The individual test subject, or the individual group of test subjects, are preferably animals, such as invertebrates such as crabs. Alternatively, the individual test subject, or the individual group of test subjects may be vertebrates such as birds or mammals; and preferably are chicken, prawn or crayfish.

The distinct geographic origin may be a geographic location that is considered to be the habitat (including agricultural environments such as a culture farm) wherein the individual test subject, or individual group of test subjects, were spawned and/or cultured, or at least cultured for a significant time during their lifetime.

Preferably, the one or more pre-selected methylation sites are within the 20% most differentially methylated genes of the genome of the individual test subject, or individual group of test subjects.

In a particular example of the fifth aspect of the present invention, the individual test subject, or the individual group of test subjects is marbled crayfish. Therein, the distinct geographic origins are geographically distinct waters, preferably being selected from the group consisting of lake(s), river(s) and aquaculture farms. These geographically distinct waters may be considered distinct from other bodies of water by one or more environmental parameters selected from pH, water hardness, manganese content, iron content, and aluminum content.

The aforementioned method for marbled crayfish advantageously comprises a genome wide methylation analysis or a methylation analysis of a pre-selected panel of methylation sites. This pre-selected panel of methylation sites preferably contains methylation sites within about 500 to 1000, and preferably about 700 genes. The genes or genetic regions according to Table 2 are particularly preferred.

In a particular example of the fifth aspect of the present invention, the individual test subject, or the individual group of test subjects is chicken. Therein, the distinct geographic origins are geographically distinct chicken farms. These geographically distinct chicken farms may be considered to be distinct from other chicken farms by one or more environmental parameters, such as feeding parameters or air parameters (e.g. temperature, humidity, ventilation).

Preferably, the panel of methylation sites in the methods according to the fifth aspect of the present invention does not comprise consistently methylated or unmethylated methylation sites.

Brief description of the Figures

Figure 1 shows specific water parameters of four marbled crayfish population habitats.

Figure 2 shows context-specific differential methylation in marbled crayfish populations. (A) Principal component analysis of abdominal muscle (mus., square symbols) and hepatopancreas (hep., circular symbols) samples from Singlis, based on the methylation levels of 56 genes with tissue-specific methylation differences. (B) Principal component analysis of abdominal muscle (mus., square symbols) and hepatopancreas (hep., circular symbols) samples from Reilingen, based on the methylation levels of 35 genes with tissue-specific methylation differences. (C) Principal component analysis of hepatopancreas samples from all locations, based on the methylation levels of 122 genes with location-specific methylation differences. (D) Principal component analysis of abdominal muscle samples from all locations, based on the methylation levels of 22 genes with location-specific methylation differences.

Figure 3 shows the validation of context-dependent differential methylation in marbled crayfish. Results are shown for capture-based sequencing and for the corresponding validation experiment with amplicon sequencing, for 4 different genomic regions. Unfilled shapes: abdominal muscle; filled shapes: hepatopancreas; squares: Reilingen; stars: Singlis; circles: Andragnaroa; triangles: Ihosy.

Figure 4 shows the results of the analysis of differentially methylated CpG sites in chicken using the function “calculateDiffMeth” from the R package methylKit on reduced representation bisulfite sequencing (RRBS) data. The identified differentially methylated CpG sites allowed a robust separation of the three locations in a principal component analysis. After filtering for SNPs: 2.3 - 3.6 million CpG sites. CpG sites with min coverage 10 in all the samples: 623,657. Differentially methylated CpGs: 1274 (p-value <0.05).

Figure 5 shows the results of the analysis of differentially methylated CpG sites in coho salmon using the function “calculateDiffMeth” from the R package methylKit on reduced representation bisulfite sequencing (RRBS) data. The identified differentially methylated CpG sites allowed a robust separation of the two locations in a principal component analysis. CpG sites with min coverage 10 in all the samples after SNP filtering: 610,397. Significant DMRs: 440 (p-value <0.05, diff in methylation >=10%).

Examples

Certain aspects and embodiments of the invention will now be illustrated by way of example and with reference to the description, figures and tables set out herein. Such examples of the methods, uses and other aspects of the present invention are representative only, and should not be taken to limit the scope of the present invention to only such representative examples.

Example 1

Habitat profiles of four independent marbled crayfish populations

To explore the possibility of context-dependent DNA methylation in marbled crayfish, animals from four diverse stable populations were collected. Reilingen (Germany) represents the type locality, a small eutrophic lake in an environmentally protected area. The Singlis (Germany) population is from a larger oligotrophic lake within a former brown coal mining area. The Andragnaroa (Madagascar) population is located in a river flowing through a forest area at relatively high altitude (1156 m) with soft mountain water. Finally, the Ihosy (Madagascar) population is found in highly turbid water, with high levels of pollution from nearby mining activities. The analysis of physicochemical water parameters showed clean, slightly basic (pH 8.4) water in Reilingen and rather acidic (pH 5.2) water with high levels of manganese (4792 µg/l) in Singlis. The water in Andragnaroa showed particularly low hardness (0.3 °dH), while the water in Ihosy was characterized by high levels of aluminium (2967 µg/l) and iron (2249 µg/l). Altogether, our study thus covered populations that inhabit four diverse habitats from different climatic zones and with different water parameters. These results are shown in Figure 1.

Table 1 : Overview of marbled crayfish populations analyzed.

Example 2

Identification of a variably methylated gene set

It was previously shown that DNA methylation in the marbled crayfish is targeted to gene bodies, relatively stable and largely tissue-invariant (Gatzmann et al., 2018). However, a comparison of 8 whole-genome bisulfite sequencing datasets from different animals, different tissues and different developmental stages also indicated the possibility for a smaller group of genes that showed more variable methylation levels (Gatzmann et al., 2018). This was confirmed by systematic analyses of methylation variance. A variance cutoff of >0.006 identified 846 genes, 149 of which were consistently methylated or unmethylated (mean ratio >0.8 or <0.2, respectively) and excluded from further analysis, thus defining a core set of 697 variably methylated genes. Metric multidimensional analysis based on the methylation levels of these genes separated the hepatopancreas samples from the abdominal muscle samples, which suggested the presence of previously unrecognized tissue-specific methylation patterns.
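By way of illustration only, the variance-based selection described above can be sketched in Python as follows. The function name, the input layout (one list of methylation ratios per gene) and the use of the population variance are assumptions made for this sketch; the source states only the cutoffs (variance >0.006, mean ratio >0.8 or <0.2).

```python
from statistics import mean, pvariance


def select_variable_genes(levels, var_cutoff=0.006, low=0.2, high=0.8):
    """Return gene IDs whose methylation varies across datasets.

    levels: dict mapping gene ID -> list of methylation ratios (0..1),
            one value per dataset (hypothetical input layout).
    """
    selected = []
    for gene, ratios in levels.items():
        # Assumption: population variance; the source does not say which
        # variance estimator was used.
        if pvariance(ratios) <= var_cutoff:
            continue  # not variable enough
        m = mean(ratios)
        if m > high or m < low:
            continue  # consistently methylated or unmethylated
        selected.append(gene)
    return selected


example = {
    "geneA": [0.30, 0.55, 0.70, 0.45],  # variable, intermediate mean
    "geneB": [0.95, 0.97, 0.96, 0.98],  # nearly invariant
    "geneC": [0.99, 0.60, 0.99, 0.99],  # variable, but mean above 0.8
}
print(select_variable_genes(example))  # ['geneA']
```

In this toy input only geneA passes both criteria: geneB fails the variance cutoff and geneC is excluded as consistently methylated.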

In order to analyze the methylation patterns of these genes in a larger number of samples and at higher coverage, a bead-based capture assay was developed. For this assay, DNA samples from 2 different tissues were prepared: hepatopancreas, which represents the main metabolic organ of crayfish, and abdominal muscle, the main muscle tissue forming the abdominal tail. Hepatopancreas DNA was prepared from N=47 animals (11-12 per location), while abdominal muscle DNA was prepared from a subset of the same animals (N=26, 4-12 per location). Subgenome capture was found to be both efficient and specific, providing a minimum of 10 million mapped reads per sample under stringent conditions.

In subsequent steps, genes with more than 50% Ns in their sequence were excluded, which left 623 genes in the analysis. Furthermore, only those CpG sites that were present in all the samples with a sequencing coverage of >5x were considered, and average methylation levels were calculated only if a gene had >5 qualified CpG sites. These criteria were fulfilled for 463 genes. The inventors also excluded invariant genes, i.e., genes that were in the bottom 10% for methylation variance, as well as genes with an average methylation level <0.1 or >0.9, resulting in a core set of 361 variably methylated genes (Tab. 2).

Table 2: Genomic regions suitable as methylation markers in marbled crayfish

Importantly, gene ontology analysis was performed to better understand the mechanisms underlying the set of variably methylated genes. A significant enrichment of genes with functional characteristics related to GTP-binding proteins (also named G proteins) was observed. G proteins regulate a wide variety of cellular activities and, among others, variably methylated genes playing a role in transcription/translation regulation, response to stress, RNA metabolism, and immune response to pathogens were detected. Together, the functional heterogeneity observed within those 321 variably methylated genes could potentially confer plasticity on marbled crayfish living under different environmental pressures.

Example 3

Context-dependent methylation patterns in marbled crayfish populations

In additional steps, we sought to identify specific context-dependent methylation patterns in our core set of 361 variably methylated genes. To identify tissue-specific methylation differences, we applied a Wilcoxon rank sum test for differential (p<0.05 after Benjamini-Hochberg correction) methylation between hepatopancreas and abdominal muscle. For our largest dataset from a single location (Singlis, N=24) this identified 56 genes that allowed a robust separation of the two tissues in a principal component analysis. When the same approach was applied to the second-largest dataset (Reilingen, N=19), it identified 35 differentially methylated genes (28 overlapping with Singlis) that again allowed a robust separation of the two tissues in a principal component analysis. Tissue-specific methylation differences appeared rather moderate for average gene methylation levels, but more pronounced at the CpG level. Of note, tissue-specific methylation differences were highly stable between different populations. Taken together, these findings suggest the existence of localized tissue-specific methylation patterns in marbled crayfish.
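The Benjamini-Hochberg correction applied to the per-gene p-values can be sketched as follows. This is a minimal pure-Python illustration of the standard step-up procedure, not the code used for the analysis; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(significant)  # [0, 1]
```

In this toy input, the third and fourth genes have raw p-values below 0.05 but adjusted values of about 0.051, so they would not be counted as differentially methylated after correction.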

To identify location-specific methylation differences, we applied a Kruskal-Wallis test for differential (p<0.05 after Benjamini-Hochberg correction) methylation between the four locations. For the larger hepatopancreas dataset (N=47), this identified 122 genes that allowed a robust separation of the four locations in a principal component analysis. When the same approach was applied to the smaller abdominal muscle dataset (N=26), it identified 22 differentially methylated genes (21 overlapping with hepatopancreas) that again allowed a robust separation of the four locations in a principal component analysis. Similar to our findings for tissue-specific methylation, location-specific methylation differences appeared moderate for average gene methylation levels, but more pronounced at the CpG level. Also, location-specific methylation differences were highly stable between different locations. These findings suggest the existence of defined location-specific methylation differences among marbled crayfish populations.
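For a single gene, the Kruskal-Wallis comparison of methylation levels between locations can be sketched as below. This is a self-contained illustration using only the Python standard library, with tie correction and a chi-square approximation for the p-value; the function names and the example values are hypothetical, and an established statistics package would normally be used instead.

```python
import math
from itertools import chain


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution (x >= 0)."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(
            half**i / math.factorial(i) for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups; not defined if all values tie."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # correction for tied values
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical methylation levels of one gene at four locations.
locations = [
    [0.10, 0.12, 0.15],
    [0.30, 0.32, 0.35],
    [0.50, 0.52, 0.55],
    [0.70, 0.72, 0.75],
]
h, p = kruskal_wallis(locations)
print(round(h, 3))  # 10.385
```

The resulting p-values, one per gene, would then be corrected for multiple testing before applying the p<0.05 threshold.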

Example 4

Validation of context dependent methylation patterns

To validate the results for the tissue- and location-specific methylation patterns, markers were designed based on differentially methylated regions (DMRs) within the identified genes that led to the separation of the samples. Both tissue-specific markers (n=2) and location-specific markers (n=2) were tested with samples from the same two tissues (hepatopancreas and abdominal muscle) and the same four locations (Reilingen, Singlis, Andragnaroa and Ihosy), but from new samples, collected one to two years after the first sampling. The samples were analysed by PCR-based deep sequencing of amplicons. The results confirmed the findings from the capture-based subgenome sequencing. With the chosen markers, a separation between the tissues as well as between the locations, based on mean methylation ratios per CpG, was possible. The mean CpG ratios for the sequenced amplicons were additionally comparable to the mean CpG ratios of the bead-based capture results. Notably, this also confirms that location-specific methylation is stable over time among marbled crayfish populations, which makes it possible to define location-specific markers to identify the origin of a population and to use methylation patterns as a fingerprint for those populations. These results are shown in Figures 2 and 3.

Materials and Methods

Sampling for the bead-based capture assay was carried out in August 2017 for Reilingen, in October 2017 for Singlis and, as mentioned in Andriantsoa et al., 2019, from October 2017 to March 2018 in Madagascar. Sampling for the validation experiment was carried out from March to May 2019 in Germany and Madagascar. Samples were preserved in 100% ethanol and stored at -80 °C until DNA was extracted.

Genomic DNA was isolated and purified from abdominal muscle and hepatopancreas tissue using a TissueRuptor (Qiagen), followed by proteinase K digestion and isopropanol precipitation. The quality of isolated genomic DNA was assessed on a 2200 TapeStation (Agilent).

Library preparation was carried out as described in the SureSelectXT Methyl-Seq Target Enrichment System for Illumina Multiplexed Sequencing Protocol, Version D0, July 2015. Quality controls were performed, and sample concentrations were measured on a 2200 TapeStation (Agilent). Multiplexed samples were sequenced on a HiSeq X Ten system (Illumina).

Read pairs were quality trimmed and mapped to the 697 genes that showed variable methylation in the whole-genome bisulfite sequencing datasets (Gatzmann et al., 2018) using BSMAP (Xi and Li, 2009). Subsequently, the methylation ratio for each CpG site was calculated using the Python script provided with BSMAP. Only those CpG sites that were present in all the samples with a coverage of >5x were considered for further analysis. The average methylation level for each gene was calculated only if a gene had at least 5 CpG sites with >5x coverage. Furthermore, genes meeting any of the following criteria were excluded from subsequent analysis: i) genes that were in the bottom 10% in terms of methylation variance, ii) genes with an average methylation level of < 0.1 or > 0.9, and iii) genes with more than 50% Ns in their sequence.
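The CpG-level and gene-level filters described in this paragraph can be sketched in Python as follows. The data layout, function names and toy values are assumptions made for illustration; the thresholds follow the paragraph above (coverage >5x in all samples, at least 5 qualifying CpG sites per gene, bottom 10% of variance removed, average level between 0.1 and 0.9). The population variance is again an assumption, and the exclusion of genes with more than 50% Ns is not shown.

```python
from statistics import mean, pvariance

MIN_COV = 5    # coverage must exceed this value in every sample
MIN_SITES = 5  # qualifying CpG sites required per gene


def gene_levels(sites):
    """Average methylation per gene and sample.

    sites: {gene: {cpg_position: [(ratio, coverage), ...one per sample]}}
    Returns {gene: [mean ratio per sample]} for genes with enough CpGs.
    """
    levels = {}
    for gene, cpgs in sites.items():
        kept = [obs for obs in cpgs.values()
                if all(cov > MIN_COV for _, cov in obs)]
        if len(kept) < MIN_SITES:
            continue
        n_samples = len(kept[0])
        levels[gene] = [mean(obs[s][0] for obs in kept)
                        for s in range(n_samples)]
    return levels


def drop_invariant(levels, low=0.1, high=0.9, bottom_fraction=0.10):
    """Remove the least variable genes and those outside [low, high]."""
    variances = {g: pvariance(v) for g, v in levels.items()}
    ranked = sorted(variances, key=variances.get)
    invariant = set(ranked[:int(len(ranked) * bottom_fraction)])
    return [g for g, v in levels.items()
            if g not in invariant and low <= mean(v) <= high]


# Toy input with two samples: g2 has low coverage in the second sample.
toy = {
    "g1": {pos: [(0.2, 12), (0.6, 9)] for pos in range(5)},
    "g2": {pos: [(0.5, 12), (0.5, 3)] for pos in range(5)},
}
print(sorted(gene_levels(toy)))  # ['g1']
```

Applying `drop_invariant` to the output of `gene_levels` would yield the list of genes retained for the subsequent statistical tests.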

In order to identify tissue-specific methylation differences, a Wilcoxon rank sum test was applied (hepatopancreas vs. abdominal muscle samples from Singlis and Reilingen) and the p-values were corrected for multiple testing using the Benjamini-Hochberg method. Likewise, to identify location-specific methylation differences, a Kruskal-Wallis test was used, and the p-values were corrected for multiple testing using the Benjamini-Hochberg method. Additionally, dmrseq (Korthauer et al., 2018) was used to identify tissue-specific and location-specific differentially methylated regions within the respective gene sets.

Genomic DNA was bisulfite converted using the EZ DNA Methylation-Gold Kit (Zymo Research) following the manufacturer's instructions. Target regions were PCR amplified using region-specific primers (Tab. 3). PCR products were gel-purified using the QIAquick Gel Extraction Kit (Qiagen). Subsequently, samples were indexed using the Nextera XT Index Kit v2 Set A (Illumina). The pooled library was sequenced on a MiSeq V2 system using a paired-end 150 bp nano protocol. Sequencing data was analyzed using BisAMP (Bormann F, Tuorto F, Cirzi C, Lyko F, Legrand C. BisAMP: A web-based pipeline for targeted RNA cytosine-5 methylation analysis. Methods. 2019 Mar 1;156:121-127).

Table 3: Primers for Validation

Example 5

Identification of differentially methylated CpG sites in chicken

In order to identify differentially methylated CpG sites in the chicken, the function “calculateDiffMeth” from the R package methylKit was applied to the reduced representation bisulfite sequencing (RRBS) data. 1274 differentially methylated CpGs were identified (p-value < 0.05). Prior to this analysis, the data was filtered for SNPs and a minimum coverage cutoff of 10 per CpG site was applied. The identified differentially methylated CpG sites allowed a robust separation of the three locations in a principal component analysis as shown in Figure 4.

Material and Methods

Isolated and purified genomic DNA from breast muscle tissue was provided by different service laboratories in the respective countries of origin of the samples. Quality was checked using a 2200 TapeStation (Agilent).

RRBS library preparation was carried out as described in the Zymo-Seq RRBS™ Library Kit Instruction Manual Ver. 1.0.0. Quality controls were performed, and sample concentrations were measured on a 2200 TapeStation (Agilent). Multiplexed samples were sequenced on a HiSeq 4000 system (Illumina).

Reads were quality trimmed using Trimmomatic version 0.38 and mapped with BSMAP 2.90 to the Gallus gallus genome assembly version 5.0. Methylation ratios were calculated using a Python script (methratio.py) distributed with the BSMAP package. All CpG sites located on sex chromosomes and all CpG sites that overlapped with known SNPs for the Gallus gallus genome were excluded from further analysis. Differential methylation analysis was performed using the R package methylKit (Akalin et al. (2012), Genome Biology, 13(10), R87).
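The site-level filtering applied before the differential methylation analysis (minimum coverage of 10, no sex-chromosome sites, no SNP-overlapping sites) can be sketched as follows. The tuple layout, the function name and the chromosome labels "Z" and "W" are assumptions for this illustration; the actual labels depend on the reference assembly used, and the sketch handles a single sample only.

```python
# Assumed labels for the chicken sex chromosomes; the names used in a
# given reference assembly may differ.
SEX_CHROMOSOMES = {"Z", "W"}


def filter_sites(sites, snp_positions, min_cov=10):
    """Keep CpG sites suitable for differential methylation analysis.

    sites: iterable of (chrom, pos, ratio, coverage) tuples for one sample.
    snp_positions: set of (chrom, pos) pairs that overlap known SNPs.
    """
    return [
        site for site in sites
        if site[3] >= min_cov                      # coverage cutoff
        and site[0] not in SEX_CHROMOSOMES         # autosomal sites only
        and (site[0], site[1]) not in snp_positions  # no SNP overlap
    ]


# Hypothetical input records.
sites = [
    ("1", 100, 0.8, 25),  # kept
    ("1", 200, 0.4, 8),   # coverage below 10
    ("Z", 300, 0.9, 40),  # sex chromosome
    ("2", 400, 0.1, 10),  # overlaps a SNP
    ("3", 500, 0.5, 10),  # kept (coverage exactly 10)
]
snps = {("2", 400)}
print(filter_sites(sites, snps))
```

Only the first and last records pass all three filters in this toy input.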

Example 6

Identification of differentially methylated CpG sites in coho salmon

In order to identify differentially methylated regions in the coho salmon RRBS data, the function “calculateDiffMeth” from the R package methylKit was used. 440 differentially methylated regions were identified (p-value < 0.05, difference in methylation >= 10%). Prior to this analysis, the data was filtered for SNPs and a minimum coverage cutoff of 10 per CpG site was applied. The identified differentially methylated regions allowed a robust separation of the two locations in a principal component analysis as shown in Figure 5.

Material and Methods

RRBS data published by Le Luyer et al., 2017 was downloaded from the National Center for Biotechnology Information Sequence Read Archive. Reads were mapped with BSMAP 2.90 to Okis_V2 (GCF_002021735.2) and methylation ratios were determined using a Python script (methratio.py) distributed with the BSMAP package. All CpG sites that overlapped with SNPs were excluded from further analysis. Differential methylation analysis, with the breeding environment and sex as covariates, was performed using the R package methylKit (Akalin et al. (2012), Genome Biology, 13(10), R87).
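As a simple illustration of the quantities involved, the methylation ratio at a CpG site and the difference in methylation between two groups (compared against the 10% threshold used in this example) can be sketched as below. This is not the methratio.py script or the methylKit model, which additionally accounts for covariates; the function names, the pooling of read counts across samples and the example counts are assumptions.

```python
def methylation_ratio(c_count, t_count):
    """Fraction of reads supporting methylation at one CpG site.

    After bisulfite conversion, reads showing C are counted as methylated
    and reads showing T as unmethylated.
    """
    total = c_count + t_count
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_count / total


def percent_difference(counts_a, counts_b):
    """Difference in pooled methylation between two groups.

    counts_a, counts_b: lists of (c_count, t_count), one pair per sample.
    Returns the difference in percentage points (group a minus group b).
    """
    def pooled(counts):
        c = sum(c for c, _ in counts)
        t = sum(t for _, t in counts)
        return 100 * methylation_ratio(c, t)

    return pooled(counts_a) - pooled(counts_b)


# Hypothetical read counts at one CpG site for two locations.
location_a = [(8, 2), (9, 1)]  # pooled: 17 of 20 reads methylated
location_b = [(6, 4), (7, 3)]  # pooled: 13 of 20 reads methylated
diff = percent_difference(location_a, location_b)
print(abs(diff) >= 10)  # True
```

A site would be reported only if it also met the p-value criterion from the statistical test.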