


Title:
COMPUTER-IMPLEMENTED METHOD FOR PARENTERAL ASSIGNMENT
Document Type and Number:
WIPO Patent Application WO/2019/063466
Kind Code:
A1
Abstract:
The present invention relates to a method, computer program product and computing system for parentage assignment through relationship-logic.

Inventors:
GRASHEI KIM (NO)
ØDEGÅRD JØRGEN (NO)
MEUWISSEN THEODORUS (NO)
Application Number:
PCT/EP2018/075756
Publication Date:
April 04, 2019
Filing Date:
September 24, 2018
Assignee:
AQUAGEN AS (NO)
International Classes:
G06F19/18
Other References:
HILL, W. G. ET AL.: "Parentage identification using single nucleotide polymorphism genotypes: Application to product tracing", J ANIM SCI, vol. 86, no. 10, 2008, pages 2508 - 2517, XP055459874, ISSN: 0021-8812, DOI: 10.2527/jas.2007-0276
CLARKE, S. M. ET AL.: "A High Throughput Single Nucleotide Polymorphism Multiplex Assay for Parentage Assignment in New Zealand Sheep", PLOS ONE, vol. 9, no. 4, 2014, pages e93392, XP055459868, DOI: 10.1371/journal.pone.0093392
CAMPBELL, D.; DUCHESNE, P.; BERNATCHEZ, L.: "AFLP utility for population assignment studies: analytical investigation and empirical comparison with microsatellites", MOL ECOL, vol. 12, no. 7, 2003, pages 1979 - 1991
GODDARD, M. E.; HAYES, B. J.; MEUWISSEN, T. H.: "Using the genomic relationship matrix to predict the accuracy of genomic selection", J ANIM BREED GENET, vol. 128, no. 6, 2011, pages 409 - 421, XP055459812, DOI: 10.1111/j.1439-0388.2011.00964.x
HAYES, B. J.: "Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data", J DAIRY SCI, vol. 94, no. 4, 2011, pages 2114 - 2117
HEATON, M. P.; LEYMASTER, K. A.; KALBFLEISCH, T. S.; KIJAS, J. W.; CLARKE, S. M.; MCEWAN, J., ...; INTERNATIONAL SHEEP GENOMICS CONSORTIUM: "SNPs for parentage testing and traceability in globally diverse breeds of sheep", PLOS ONE, vol. 9, no. 4, 2014, pages e94851
JONES, A. G.; SMALL, C. M.; PACZOLT, K. A.; RATTERMAN, N. L.: "A practical guide to methods of parentage analysis", MOL ECOL RESOUR, vol. 10, no. 1, 2010, pages 6 - 30
JONES, O. R.; WANG, J.: "COLONY: a program for parentage and sibship inference from multilocus genotype data", MOL ECOL RESOUR, vol. 10, no. 3, 2010, pages 551 - 555
KALINOWSKI, S. T.; TAPER, M. L.; MARSHALL, T. C.: "Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment", MOL ECOL, vol. 16, no. 5, 2007, pages 1099 - 1106
KALINOWSKI, S. T.; TAPER, M. L.; MARSHALL, T. C.: "Corrigendum", MOL ECOL, vol. 19, no. 7, 2010, pages 1
MARSHALL, T. C.; SLATE, J.; KRUUK, L. E.; PEMBERTON, J. M.: "Statistical confidence for likelihood-based paternity inference in natural populations", MOL ECOL, vol. 7, no. 5, 1998, pages 639 - 655, XP055460020, DOI: 10.1046/j.1365-294x.1998.00374.x
MORRISSEY, M. B.; WILSON, A. J.: "The potential costs of accounting for genotypic errors in molecular parentage analyses", MOL ECOL, vol. 14, no. 13, 2005, pages 4111 - 4121
PURFIELD, D. C.; MCCLURE, M.; BERRY, D. P.: "Justification for setting the individual animal genotype call rate threshold at eighty-five percent", J ANIM SCI, vol. 94, no. 11, 2016, pages 4558 - 4569
SARGOLZAEI, M.; SCHENKEL, F. S.: "QMSim: a large-scale genome simulator for livestock", BIOINFORMATICS, vol. 25, no. 5, 2009, pages 680 - 681
STRUCKEN, E. M.; LEE, S. H.; LEE, H. K.; SONG, K. D.; GIBSON, J. P.; GONDRO, C.: "How many markers are enough? Factors influencing parentage testing in different livestock populations", J ANIM BREED GENET, vol. 133, no. 1, 2016, pages 13 - 23
VANRADEN, P. M.: "Efficient methods to compute genomic predictions", J DAIRY SCI, vol. 91, no. 11, 2008, pages 4414 - 4423, XP026955125
WALDBIESER, G. C.; BOSWORTH, B. G.: "A standardized microsatellite marker panel for parentage and kinship analyses in channel catfish, Ictalurus punctatus", ANIM GENET, vol. 44, no. 4, 2013, pages 476 - 479
Attorney, Agent or Firm:
ONSAGERS AS (NO)
Claims:

1. A computer-implemented method for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship, the method comprising:

a) receiving experimental data comprising data related to alleles present within the genome of:

i) the offspring (O);

ii) a first potential parent (P1) of the offspring; and

iii) a second potential parent (P2) of the offspring;

for selected loci;

b) processing the experimental data to generate at least one set of computerized genomic relationship parameters;

c) receiving control data comprising at least one set of predetermined genomic relationship parameters; and

d) comparing the at least one set of computerized genomic relationship parameters with the at least one set of predetermined genomic relationship parameters in order to determine whether the first potential parent of the offspring and the second potential parent of the offspring are the true parents of the offspring; wherein

- the experimental data comprises data relating to number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus;

- the at least one set of computerized genomic relationship parameters is calculated based upon genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$);

the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and

- the genomic relationship does not assume independent loci.

2. The method of claim 1, wherein the offspring is an animal, the animal preferably being a fish such as a farmed fish.

3. The method of claim 1, wherein the first potential parent (P1) of the offspring, the second potential parent (P2) of the offspring and the offspring (O) are organisms that reproduce sexually.

4. The method of claim 1, wherein the experimental data further comprises data relating to allele frequency of the reference allele for each of the selected loci.

5. The method of claim 1, wherein the experimental data further comprises data relating to total number of loci.

6. The method according to claim 1, wherein genomic relationship between two organisms is calculated at least on the basis of

- number of copies of a reference allele present for each of the selected loci;

- allele frequency of the reference allele for each of the selected loci; and

- total number of loci.

7. The method according to claim 6, wherein genomic relationship between two organisms i and j is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it} - 2p_t)(x_{jt} - 2p_t)}{2p_t(1 - p_t)} \qquad (1)$$

wherein

$r_{ij}$ represents genomic relationship between organisms i and j; $x_{it}$ and $x_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t;

$p_t$ represents the allele frequency at locus t; and c represents the total number of loci;

wherein

the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the offspring at locus t; ii) alleles present within the genome of the first potential parent of the offspring at locus t; and iii) alleles present within the genome of the second potential parent of the offspring at locus t.

8. The method according to claim 1, wherein the at least one set of computerized genomic relationship parameters comprises at least three computerized genomic relationship parameters; the at least three computerized genomic relationship parameters being defined as follows:

- a first computerized genomic relationship parameter is calculated according to formula (2)

$$r_{O,P1} - \frac{r_{P1,P2} + r_{P1,P1}}{2} \qquad (2)$$

$r_{O,P1}$ representing the genomic relationship between the offspring and the first potential parent of the offspring; $\frac{r_{P1,P2} + r_{P1,P1}}{2}$ representing the average of x and y, wherein x represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring; and y represents the genomic relationship between the first potential parent of the offspring and itself;

- a second computerized genomic relationship parameter is calculated according to formula (3)

$$r_{O,P2} - \frac{r_{P1,P2} + r_{P2,P2}}{2} \qquad (3)$$

$r_{O,P2}$ representing the genomic relationship between the offspring and the second potential parent of the offspring; $\frac{r_{P1,P2} + r_{P2,P2}}{2}$ representing the average of x and y, wherein x represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring; and y represents the genomic relationship between the second potential parent of the offspring and itself; and

- a third computerized genomic relationship parameter is calculated according to formula (4)

$$r_{O,O} - \left(1 + \frac{r_{P1,P2}}{2}\right) \qquad (4)$$

$r_{O,O}$ representing the genomic relationship between the offspring and itself; $r_{P1,P2}$ representing the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring.

9. The method according to claim 1, wherein each set of the predetermined genomic relationship parameters has been calculated based upon at least alleles present within the genome of:

i) a control offspring (o);

ii) a first true parent (TP1) of the control offspring; and

iii) a second true parent (TP2) of the control offspring;

for selected loci;

wherein at least one of i), ii) and iii) is different for each set of the predetermined genomic relationship parameters.

10. The method according to claim 1, wherein the selected loci are polymorphic.

11. The method according to claim 1, wherein the number of selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters is at least 100 and/or the number of selected loci that form the basis for calculating the at least one set of the predetermined genomic relationship parameters is at least 100.

12. The method according to claim 1, wherein the at least one set of computerized genomic relationship parameters and the at least one set of predetermined genomic relationship parameters are compared using a mathematical equation which is proportional to a multivariate normal density function.

13. The method according to claim 1, wherein the at least one set of predetermined genomic relationship parameters is a set of predetermined threshold values.

14. A computer program product residing on a computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 13.

15. A computing system comprising a processor and a memory configured to perform the method according to any one of claims 1 to 13.

Description:
COMPUTER-IMPLEMENTED METHOD FOR PARENTERAL ASSIGNMENT

Aquaculture represents an expanding use of the world's marine ecosystem. A major challenge with aquaculture is containment, and escapes have been reported in all regions where fish are reared in marine cages. Efforts to prevent fish from escaping are given high priority, and both the industry and the authorities are working on a broad front to minimize losses caused by escapees. A prerequisite for escape prevention is knowledge of, inter alia, the origin of the escaped fish. By ensuring that all fish at a fish farm facility have a dedicated pool of unique parents, the origin of the fish may be determined by parentage assignment. The present invention relates to a method, computer program product and computing system for parentage assignment through relationship-logic.

BACKGROUND OF THE INVENTION

Aquaculture represents an expanding use of the world's marine ecosystem. In Norway, one of the largest global seafood producers, the combined aquaculture production of Atlantic salmon Salmo salar, rainbow trout Oncorhynchus mykiss and Atlantic cod Gadus morhua is worth more than the total catch from wild fisheries, and represents the country's second most economically significant export product.

A major challenge with aquaculture is containment, and escapes have been reported in all regions where fish are reared in marine cages. These escapes mean economic loss for the industry and, even more importantly, the escaped fish may threaten the survival of wild fish. In addition to competing with wild fish for food and habitat, escaped fish may breed with the wild stock, which may dilute the natural gene pool and thereby threaten the long-term survival and evolution of wild species.

Genetic changes in wild Atlantic salmon populations resulting from interbreeding between wild and farmed conspecifics have been documented. Farmed salmon strains typically display lower levels of genetic variation than wild salmon populations, and differ genetically from them in traits such as growth, physiology, behavior and gene transcription. Furthermore, offspring of farmed fish, and hybrids, display lower fitness in natural habitats than their wild counterparts. Consequently, it is generally accepted that farm escapes represent a significant threat to the genetic integrity and long-term evolutionary capacity of recipient wild populations.

Efforts to prevent fish from escaping are given high priority, and both the industry and the authorities are working on a broad front to minimize losses caused by escapees. A prerequisite for escape prevention is knowledge of why, when, and from where the fish escape. Such information is needed to identify relationships between culture technologies, techniques, site locations and escapes. When this information is combined with knowledge of the survival and distribution of escaped fish at different life stages, times of year and locations, the most critical escape periods can be identified, risk analysis can be performed and high-priority areas for rapid improvement in containment can be identified.

To determine the origin of escaped fish, there must be a link between the fish and its place of origin, and methodologies for identifying that link must be available. Having this in place would be an important contribution to the continuing effort to prevent fish from escaping, and such technology may also find use in other fields, such as crime prevention.

The increasing volume of high-value seafood products in transit around the globe may be a tempting target for criminals, and there have already been incidents in which trucks loaded with salmon were stolen. If the seafood products carried a label that cannot be removed, it would be possible to register them as stolen goods. Further, once a crime is solved, knowledge of the origin of the stolen fish would be a significant contribution to the continuing effort to prevent crime.

Thus, there is a need in the art for methodologies suitable for tracing aquaculture seafood, in particular farmed salmon, back to its farm of origin. One way of meeting this need would be to i) ensure that all fish at a fish farm facility have a dedicated pool of unique parents; and ii) develop methodologies for parentage assignment.

In the field of animal genetics, low-density single nucleotide polymorphisms (SNPs), microsatellites or amplified fragment length polymorphisms (AFLPs) have long been the preferred genomic data types for parentage assignment owing to their low cost (e.g. Heaton et al., 2014; Waldbieser & Bosworth, 2013; Campbell, Duchesne, & Bernatchez, 2003). In practice, today's parentage assignment rests on exclusion- and likelihood-based methods (A. G. Jones, Small, Paczolt, & Ratterman, 2010). Exclusion-based methods rely on the ability to exclude false parent-offspring combinations in which the genotypes of the offspring's putative parents are inconsistent with Mendel's laws of inheritance. These methods are often used because they are easy to interpret, but the number of expected exclusions depends on the population's allele frequencies as well as on genotype error rates (Morrissey & Wilson, 2005). Exclusion-based methods also require more loci than likelihood-based methods, since only genotypes with Mendelian inconsistencies are used (Strucken et al., 2016).
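As a minimal sketch of the exclusion principle described above (prior art, not the claimed method), the Python function below counts the loci at which an offspring genotype cannot arise from any pair of gametes of the two putative parents. It assumes biallelic loci coded as the number of copies (0, 1 or 2) of one counted allele; the function names are hypothetical.

```python
from typing import Sequence


def transmissible_alleles(genotype: int) -> set[int]:
    """Allele counts (0 or 1) a diploid parent with 0, 1 or 2 copies of the counted allele can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def count_trio_exclusions(offspring: Sequence[int], parent1: Sequence[int],
                          parent2: Sequence[int]) -> int:
    """Number of loci at which the offspring genotype is Mendelian-inconsistent with both putative parents."""
    if not len(offspring) == len(parent1) == len(parent2):
        raise ValueError("all three genotype vectors must cover the same loci")
    exclusions = 0
    for o, a, b in zip(offspring, parent1, parent2):
        # The offspring receives one gamete from each parent.
        possible = {ga + gb for ga in transmissible_alleles(a) for gb in transmissible_alleles(b)}
        if o not in possible:
            exclusions += 1
    return exclusions


# Locus 1 is an exclusion: parents with 0 and 2 copies can only produce an offspring with 1 copy.
print(count_trio_exclusions([0, 1, 2], [0, 0, 2], [2, 2, 2]))  # 1
```

A single genotyping error can create a spurious exclusion in this scheme, which is why the expected number of exclusions depends on genotype error rates.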

Likelihood-based methods typically calculate the likelihood ratio (LR) of the child genotype, i.e., the probability of the child genotype given the genotypes of the putative parents relative to the probability of observing that genotype in the population by chance, effectively giving more weight to rare alleles. The loci are typically assumed to be independent, so the total LR is obtained by multiplying the per-locus LRs over all loci. Likelihood-based methods have higher power than exclusion-based methods, but their interpretation is more complicated. Both likelihood- and exclusion-based models usually assume a known and homogeneous genotype error rate and independent loci, and do not account for variation in genotype call rate (Marshall, Slate, Kruuk, & Pemberton, 1998; Morrissey & Wilson, 2005; Purfield, McClure, & Berry, 2016); all of these assumptions are important when working with high-density SNP data. For dense SNP-chip data, the assumption of independent inheritance among loci is not realistic (alleles are inherited in large segments), which may lead to inflated LR values with conventional likelihood-based methods.
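To make the independence assumption concrete, the following sketch computes a simple trio likelihood ratio under Hardy-Weinberg proportions and without genotyping error: the probability of the offspring genotype given both putative parents, divided by its population frequency, multiplied over loci (summed on the log scale). The error-free model, the known allele frequency per locus and all function names are simplifying assumptions for illustration; programs such as CERVUS additionally model genotyping error.

```python
import math
from typing import Sequence


def offspring_genotype_prob(o: int, a: int, b: int) -> float:
    """P(offspring carries o copies | parents carry a and b copies), error-free Mendelian inheritance.

    A parent with g copies (0, 1 or 2) of the counted allele transmits it with probability g / 2.
    """
    ta, tb = a / 2, b / 2
    probs = {
        0: (1 - ta) * (1 - tb),
        1: ta * (1 - tb) + (1 - ta) * tb,
        2: ta * tb,
    }
    return probs[o]


def hwe_genotype_prob(o: int, p: float) -> float:
    """Population frequency of genotype o under Hardy-Weinberg proportions (p = counted-allele frequency, 0 < p < 1)."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[o]


def trio_log_lr(offspring: Sequence[int], parent1: Sequence[int],
                parent2: Sequence[int], freqs: Sequence[float]) -> float:
    """Log of the trio LR multiplied over loci, i.e. the sum of per-locus log LRs (assumes independent loci)."""
    if not len(offspring) == len(parent1) == len(parent2) == len(freqs):
        raise ValueError("all inputs must cover the same loci")
    total = 0.0
    for o, a, b, p in zip(offspring, parent1, parent2, freqs):
        numerator = offspring_genotype_prob(o, a, b)
        if numerator == 0.0:
            return -math.inf  # Mendelian inconsistency: impossible under an error-free model
        total += math.log(numerator / hwe_genotype_prob(o, p))
    return total


# One informative locus versus the same locus counted ten times
# (a crude stand-in for ten tightly linked SNPs that carry the same information).
one = trio_log_lr([1], [0], [2], [0.5])
ten = trio_log_lr([1] * 10, [0] * 10, [2] * 10, [0.5] * 10)
print(one, ten)  # the log LR grows tenfold although no new information was added
```

Counting the same inherited segment once per marker is exactly how the independence assumption inflates the LR for dense, linked markers.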

In contrast to the above methodologies, the present invention is based on parentage assignment through relationship-logic using realized genomic relationships. The interrelationship between the parents governs the expected inbreeding of the offspring as well as the expected parent-offspring relationships. Genomic relationships measure the average genomic similarity across loci and do not assume independent loci. Hence, increasing the number of markers used in the calculations increases the precision of the genomic relationships, which makes the method particularly suitable for high-density SNP data.
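The statement that precision improves with marker number can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the patented procedure: it uses a VanRaden (2008)-type relationship averaged over c loci as an assumed relationship measure, a fixed allele frequency of 0.5, two unrelated non-inbred parents and independently simulated loci, and reports the spread of the realized offspring-parent relationship around its relationship-logic expectation of 0.5. All names are hypothetical.

```python
import numpy as np


def genomic_relationship(xi: np.ndarray, xj: np.ndarray, p: np.ndarray) -> float:
    """Assumed VanRaden-type relationship: mean over loci of (x_i - 2p)(x_j - 2p) / (2p(1 - p))."""
    return float(np.mean((xi - 2 * p) * (xj - 2 * p) / (2 * p * (1 - p))))


def simulate_r_offspring_parent(n_loci: int, n_reps: int, rng: np.random.Generator) -> np.ndarray:
    """Realized offspring-parent relationships for true trios with unrelated, non-inbred parents."""
    p = np.full(n_loci, 0.5)  # fixed allele frequency of 0.5
    rs = np.empty(n_reps)
    for k in range(n_reps):
        # Parents drawn under Hardy-Weinberg proportions; loci simulated independently (a simplification).
        parent1 = rng.binomial(2, p)
        parent2 = rng.binomial(2, p)
        # Each parent transmits the counted allele with probability genotype / 2.
        offspring = rng.binomial(1, parent1 / 2) + rng.binomial(1, parent2 / 2)
        rs[k] = genomic_relationship(offspring, parent1, p)
    return rs


rng = np.random.default_rng(2018)
for c in (100, 5000):
    rs = simulate_r_offspring_parent(c, 200, rng)
    print(f"c={c}: mean r_O,P1 = {rs.mean():.3f}, SD = {rs.std():.3f}")
```

Under these assumptions the standard deviation shrinks roughly in proportion to one over the square root of the number of loci, while the mean stays near 0.5.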

The claimed computer-implemented method is well suited to performing parentage assignment with high accuracy, in particular on high-density SNP datasets. It can be applied successfully to datasets with high and/or unknown genotype error rates, highly dependent markers, closely related animals, inbreeding and, in some cases, clones. In addition, the assignment can be done without a reference dataset of known parent-offspring trio combinations.

It is to be understood that even though the above disclosure focuses on parentage assignment of fish, the computer-implemented method of the present invention is well suited for parentage assignment of any organism that reproduces sexually. The person skilled in the art will know which plants, insects, birds, mammals, fish, reptiles, amphibians, mollusks and other organisms reproduce sexually and may therefore be subject to parentage assignment by the claimed method.

SUMMARY OF THE INVENTION

A first aspect of the present invention relates to a computer-implemented method for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship, the method comprising:

a) receiving experimental data comprising data related to alleles present within the genome of:

i) the offspring (O);

ii) a first potential parent (P1) of the offspring; and

iii) a second potential parent (P2) of the offspring;

for selected loci;

b) processing the experimental data to generate at least one set of computerized genomic relationship parameters;

c) receiving control data comprising at least one set of predetermined genomic relationship parameters; and

d) comparing the at least one set of computerized genomic relationship parameters with the at least one set of predetermined genomic relationship parameters.

In one preferred embodiment according to the first aspect of the present invention, the genomic relationship does not assume independent loci.
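Claim 12 states only that the comparison in step d) may use an equation proportional to a multivariate normal density. A minimal sketch of such a comparison, assuming that the control data supply a mean vector and covariance matrix for the computerized parameters of true parent-offspring trios, scores each candidate parent pair by its log density and keeps the best one. The control values, candidate labels and function names below are hypothetical.

```python
import numpy as np


def mvn_log_density(x, mean, cov) -> float:
    """Log of the multivariate normal density of vector x given a mean vector and covariance matrix."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (x.size * np.log(2 * np.pi) + logdet + mahalanobis_sq))


def best_candidate(candidates: dict[str, list[float]], mean, cov) -> str:
    """Key of the candidate parameter set with the highest density under the control distribution."""
    return max(candidates, key=lambda name: mvn_log_density(candidates[name], mean, cov))


# Hypothetical control distribution and two candidate parent pairs for one offspring.
control_mean = [0.0, 0.0, 0.0]
control_cov = np.diag([0.01, 0.01, 0.02])
candidates = {
    "P1=A, P2=B": [0.01, -0.02, 0.03],
    "P1=A, P2=C": [0.02, -0.45, 0.05],
}
print(best_candidate(candidates, control_mean, control_cov))  # P1=A, P2=B
```

Because the normalizing constant is shared by all candidates, ranking by log density is equivalent to ranking by any quantity proportional to the density.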

In an even more preferred embodiment according to the first aspect of the present invention,

- the experimental data comprises data relating to number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus;

- the at least one set of computerized genomic relationship parameters is calculated based upon genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$);

the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and

- the genomic relationship does not assume independent loci.

General aspects

In one embodiment according to the first aspect of the present invention, the offspring is an animal such as a fish, in particular a farmed fish. The fish may, e.g., be selected from the group consisting of Atlantic salmon Salmo salar, rainbow trout Oncorhynchus mykiss and Atlantic cod Gadus morhua. In one preferred embodiment, the fish is Atlantic salmon Salmo salar.

In another embodiment according to the first aspect of the present invention, the offspring is diploid.

In another embodiment according to the first aspect of the present invention, the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are of different sexes.

In another embodiment according to the first aspect of the present invention, the first potential parent (P1) of the offspring, the second potential parent (P2) of the offspring and the offspring (O) reproduce sexually, as opposed to asexually.

In a further embodiment according to the present invention, at least one of the selected loci is polymorphic.

In a further embodiment according to the present invention, at least half of the selected loci are polymorphic.

In a further embodiment according to the present invention, all of the selected loci are polymorphic.

Experimental data

In one embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters is at least 100, such as at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900 or at least 1000.

In a further embodiment according to the present invention, at least one of the selected loci is polymorphic. In a further embodiment according to the present invention, at least half of the selected loci are polymorphic. In a further embodiment according to the present invention, all of the selected loci are polymorphic.

In a further embodiment according to the first aspect of the present invention, the experimental data comprises data relating to the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In another embodiment according to the first aspect of the present invention, the experimental data further comprises data relating to the allele frequency of the reference allele for each of the selected loci and, optionally, the total number of loci.

In one embodiment according to the present invention, the allele frequency is set at a fixed value in the range 0.1-0.9, such as in the range 0.2-0.8, 0.3-0.7 or 0.4-0.6. In one embodiment, the allele frequency is set at a fixed value of 0.5.
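In practice the allele frequency can either be estimated from the genotyped individuals or fixed, as in the embodiment above. The sketch below, an illustration rather than the claimed procedure, supports both and flags monomorphic loci. It assumes genotypes coded 0/1/2 as the number of copies of one counted allele, with p referring to that same counted allele. For relationship measures that divide each locus by 2p(1 - p), such as the VanRaden (2008) measure among the cited references, a locus with an estimated p of 0 or 1 would give a zero denominator, which is one practical reason to work with polymorphic loci. Function names are hypothetical.

```python
from typing import Optional

import numpy as np


def allele_frequencies(genotypes: np.ndarray, fixed: Optional[float] = None) -> np.ndarray:
    """Per-locus frequency of the counted allele (rows = individuals, columns = loci, values 0/1/2)."""
    if fixed is not None:
        if not 0.0 < fixed < 1.0:
            raise ValueError("a fixed allele frequency must lie strictly between 0 and 1")
        return np.full(genotypes.shape[1], fixed)
    return genotypes.mean(axis=0) / 2  # each individual carries two alleles per locus


def polymorphic_mask(p: np.ndarray) -> np.ndarray:
    """True for loci with 0 < p < 1; monomorphic loci would give a zero denominator 2p(1 - p)."""
    return (p > 0.0) & (p < 1.0)


genotypes = np.array([[0, 1, 2], [0, 2, 2], [0, 1, 2]])
p = allele_frequencies(genotypes)
print(p, polymorphic_mask(p))  # only the middle locus is polymorphic
print(allele_frequencies(genotypes, fixed=0.5))
```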

In one embodiment according to the present invention, the experimental data comprises data relating to

- the number of copies of a reference allele present for each of the selected loci;

- allele frequency of the reference allele for each of the selected loci; and, optionally,

- total number of loci;

wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In one embodiment according to the present invention, the experimental data related to the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.
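As a small illustration of this coding, the hypothetical helper below converts a diploid genotype call, represented as a pair of allele labels, into the 0/1/2 value; the allele labels and the tuple representation are assumptions.

```python
def code_genotype(call: tuple[str, str], reference: str) -> int:
    """Code a diploid genotype call as in this embodiment.

    0 = two copies of the reference allele, 1 = one copy, 2 = no copies;
    i.e. the value is the number of non-reference alleles in the call.
    """
    if len(call) != 2:
        raise ValueError("expected a diploid genotype call with two alleles")
    return sum(allele != reference for allele in call)


# Hypothetical SNP with reference allele "A" and alternative allele "G".
print([code_genotype(c, "A") for c in [("A", "A"), ("A", "G"), ("G", "G")]])  # [0, 1, 2]
```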

In one embodiment according to the present invention, the experimental data related to the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele. A non-reference allele is herein meant to refer to an allele that is different from the reference allele.

In another embodiment according to the first aspect of the present invention, the at least one set of computerized genomic relationship parameters is calculated based upon genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$).

In one embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and the allele frequency of the reference allele for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; the allele frequency of the reference allele for each of the selected loci; and the total number of loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In one embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In another embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele.

In a further embodiment according to the present invention, genomic relationship between two organisms i and j is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it} - 2p_t)(x_{jt} - 2p_t)}{2p_t(1 - p_t)} \qquad (1)$$

wherein

$r_{ij}$ represents genomic relationship between organisms i and j; $x_{it}$ and $x_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t;

$p_t$ represents the allele frequency at locus t; and c represents the total number of loci;

wherein

the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the offspring at locus t; ii) alleles present within the genome of the first potential parent of the offspring at locus t; and iii) alleles present within the genome of the second potential parent of the offspring at locus t.
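A minimal NumPy sketch of the six relationships for one candidate trio, computed from a 3 × c matrix of 0/1/2 codes, is given below. It assumes the VanRaden-type form r_ij = (1/c) Σ_t (x_it - 2p_t)(x_jt - 2p_t) / (2p_t(1 - p_t)), with p_t the frequency of the allele counted in x and every locus polymorphic; function names and dictionary keys are hypothetical. Centring and scaling each locus once and taking a matrix product yields all pairwise terms, including the self-relationships on the diagonal, in a single step.

```python
import numpy as np


def relationship_matrix(x: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Relationships among the rows of x (individuals x loci, 0/1/2 codes) given allele frequencies p."""
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("all loci must be polymorphic (0 < p < 1)")
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))  # centre each locus and scale by sqrt(2p(1 - p))
    return z @ z.T / x.shape[1]  # sum over loci divided by c, the number of loci


def trio_relationships(offspring, parent1, parent2, p) -> dict[str, float]:
    """The six relationships r_O,P1, r_O,P2, r_O,O, r_P1,P2, r_P1,P1 and r_P2,P2 for one candidate trio."""
    x = np.vstack([offspring, parent1, parent2]).astype(float)
    g = relationship_matrix(x, np.asarray(p, dtype=float))
    return {
        "r_O,P1": float(g[0, 1]),
        "r_O,P2": float(g[0, 2]),
        "r_O,O": float(g[0, 0]),
        "r_P1,P2": float(g[1, 2]),
        "r_P1,P1": float(g[1, 1]),
        "r_P2,P2": float(g[2, 2]),
    }


# Toy two-locus example with a fixed allele frequency of 0.5 (far too few loci for real use).
r = trio_relationships([1, 1], [0, 2], [2, 0], p=[0.5, 0.5])
print(r)
```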

In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from the group consisting of "0", "1" and "2", depending on the number of copies of a reference allele that is present for organisms i and j at locus t.

In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from

- "0" for locus t if that locus has two copies of the reference allele;

- "1" for locus t if that locus has one copy of the reference allele; or

- "2" for locus t if that locus has no copies of the reference allele.

In one embodiment according to the present invention, each set of computerized genomic relationship parameters comprises at least three computerized genomic relationship parameters.

In a further embodiment according to the present invention, each set of computerized genomic relationship parameters comprises at least three computerized genomic relationship parameters, the at least three computerized genomic relationship parameters being defined as follows:

- a first computerized genomic relationship parameter is calculated according to formula (2):

$$
r_{O,P1} - \tfrac{1}{2}\left(r_{P1,P2} + r_{P1,P1}\right)
$$

wherein $r_{O,P1}$ represents the genomic relationship between the offspring and the first potential parent of the offspring, and $\tfrac{1}{2}(r_{P1,P2} + r_{P1,P1})$ represents the average of x and y, wherein x ($r_{P1,P2}$) represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring, and y ($r_{P1,P1}$) represents the genomic relationship between the first potential parent of the offspring and itself;

- a second computerized genomic relationship parameter is calculated according to formula (3):

$$
r_{O,P2} - \tfrac{1}{2}\left(r_{P1,P2} + r_{P2,P2}\right)
$$

wherein $r_{O,P2}$ represents the genomic relationship between the offspring and the second potential parent of the offspring, and $\tfrac{1}{2}(r_{P1,P2} + r_{P2,P2})$ represents the average of x and y, wherein x ($r_{P1,P2}$) represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring, and y ($r_{P2,P2}$) represents the genomic relationship between the second potential parent of the offspring and itself; and

- a third computerized genomic relationship parameter is calculated according to formula (4):

$$
r_{O,O} - \left(1 + \tfrac{1}{2}\, r_{P1,P2}\right)
$$

wherein $r_{O,O}$ represents the genomic relationship between the offspring and itself, and $r_{P1,P2}$ represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring.
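Given the six relationships of a candidate trio, the three parameters can be computed as below. This sketch assumes formulas (2)-(4) express deviations from the relationship-logic expectations: $r_{O,P1}-\tfrac12(r_{P1,P2}+r_{P1,P1})$, $r_{O,P2}-\tfrac12(r_{P1,P2}+r_{P2,P2})$ and $r_{O,O}-(1+\tfrac12 r_{P1,P2})$. The constant 1 in the third parameter (the expected self-relationship of a non-inbred individual) is an assumption of this sketch; because the parameters are later compared with the means of the same parameters computed for true trios, a constant offset cancels in that comparison. `TrioRelationships` and `trio_parameters` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class TrioRelationships(NamedTuple):
    """The six genomic relationships of a trio (hypothetical container)."""
    r_o_p1: float   # offspring and first (potential) parent
    r_o_p2: float   # offspring and second (potential) parent
    r_o_o: float    # offspring and itself
    r_p1_p2: float  # first and second (potential) parents
    r_p1_p1: float  # first (potential) parent and itself
    r_p2_p2: float  # second (potential) parent and itself


def trio_parameters(r: TrioRelationships) -> Tuple[float, float, float]:
    """First, second and third parameters (assumed forms of formulas (2)-(4))."""
    first = r.r_o_p1 - 0.5 * (r.r_p1_p2 + r.r_p1_p1)   # deviation from expected parent-offspring relationship
    second = r.r_o_p2 - 0.5 * (r.r_p1_p2 + r.r_p2_p2)  # same, for the second parent
    third = r.r_o_o - (1.0 + 0.5 * r.r_p1_p2)          # deviation from expected 1 + F_O, F_O = r_p1_p2 / 2
    return first, second, third


# Unrelated, non-inbred parents and an offspring matching the expectations exactly
print(trio_parameters(TrioRelationships(0.5, 0.5, 1.0, 0.0, 1.0, 1.0)))  # (0.0, 0.0, 0.0)
```

For a true trio all three values scatter around zero; a wrong candidate parent typically pushes the corresponding parameter away from zero.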

Control data

In a further embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon alleles present within the genome of:

i) a control offspring (o);

ii) a first true parent (TP1) of the control offspring; and

iii) a second true parent (TP2) of the control offspring;

for selected loci;

wherein at least one of i), ii) and iii) is different for each set of the predetermined genomic relationship parameters.

In a further embodiment according to the present invention, at least one of the selected loci is polymorphic. In a further embodiment according to the present invention, at least half of the selected loci are polymorphic. In a further embodiment according to the present invention, all of the selected loci are polymorphic.

In case of more than one set of predetermined genomic relationship parameters, each set has been calculated based upon alleles present within the genome of a unique combination of animals, i.e. at least one of i), ii) and iii) is unique to each set of the predetermined genomic relationship parameters.

In one embodiment according to the present invention, the number of selected loci that forms basis for calculating the at least one set of predetermined genomic relationship parameters is at least 100, such as at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900 or at least 1000.

In one embodiment according to the first aspect of the present invention, the control offspring is an animal such as a fish, in particular farmed fish. The fish may e.g. be selected from the group consisting of Atlantic salmon (Salmo salar), rainbow trout (Oncorhynchus mykiss) and Atlantic cod (Gadus morhua). In one preferred embodiment, the fish is Atlantic salmon (Salmo salar).

In another embodiment according to the first aspect of the present invention, the control offspring is diploid. In another embodiment according to the first aspect of the present invention, the first true parent (TP1) of the control offspring and the second true parent (TP2) of the control offspring are of different sexes.

In another embodiment according to the first aspect of the present invention, the first true parent (TP1) of the control offspring, the second true parent (TP2) of the control offspring and the control offspring (o) reproduce sexually, as opposed to asexually.

In a further embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In a further embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon the number of copies of a reference allele present for each of the selected loci; the allele frequency of the reference allele for each of the selected loci; and optionally the total number of loci;

wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In one embodiment according to the present invention, the allele frequency is set at a fixed value in the range 0.1-0.9, such as in the range 0.2-0.8, in the range 0.3-0.7 or in the range 0.4-0.6. In one embodiment, the allele frequency is set at a fixed value of about 0.5.
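With a fixed allele frequency of 0.5, and assuming formula (1) has the form $r_{ij} = \frac{1}{c}\sum_{t}(x_{it}-2p_t)(x_{jt}-2p_t)/[2p_t(1-p_t)]$, the relationship reduces to a scaled sum of products of centred codes. A short worked derivation under that assumption:

```latex
p_t = 0.5 \;\Rightarrow\; 2p_t = 1, \qquad 2p_t(1-p_t) = 0.5

r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it}-1)(x_{jt}-1)}{0.5}
       = \frac{2}{c}\sum_{t=1}^{c}(x_{it}-1)(x_{jt}-1)

r_{ii} = \frac{2}{c}\sum_{t=1}^{c}(x_{it}-1)^2
       = 2 \cdot \frac{\#\{t : x_{it} \in \{0, 2\}\}}{c}
```

A self-relationship of about 1 therefore corresponds to half of the loci being homozygous, which is the Hardy-Weinberg expectation when $p_t = 0.5$. Since $(x-1)$ only changes sign when the reference allele is swapped, the choice of reference allele does not affect these values.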

In one embodiment, the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In one embodiment, the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele. A non-reference allele is herein meant to refer to an allele that is different from the reference allele.

In another embodiment according to the first aspect of the present invention, the at least one set of predetermined genomic relationship parameters is calculated based upon genomic relationship between:

i) the control offspring and the first true parent of the control offspring;

ii) the control offspring and the second true parent of the control offspring;

iii) the control offspring and itself;

iv) the first true parent of the control offspring and the second true parent of the control offspring;

v) the first true parent of the control offspring and itself; and

vi) the second true parent of the control offspring and itself.

In one embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and the allele frequency of the reference allele for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; the allele frequency of the reference allele for each of the selected loci; and the total number of loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In one embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In another embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele.

In a further embodiment according to the present invention, the genomic relationship between two organisms i and j is calculated according to formula (1):

$$
r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it}-2p_t)(x_{jt}-2p_t)}{2p_t(1-p_t)}
$$

wherein $r_{ij}$ represents the genomic relationship between organisms i and j; $x_{it}$ and $x_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the allele frequency at locus t; and c represents the total number of loci;

wherein

the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the control offspring at locus t; ii) alleles present within the genome of the first true parent of the control offspring at locus t; and iii) alleles present within the genome of the second true parent of the control offspring at locus t.

In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from the group consisting of "0", "1" and "2", depending on the number of copies of the reference allele present for organisms i and j at locus t.

In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from:

- "0" for locus t if that locus has two copies of the reference allele;

- "1" for locus t if that locus has one copy of the reference allele; or

- "2" for locus t if that locus has no copies of the reference allele.

In one embodiment according to the present invention, each set of predetermined genomic relationship parameters comprises at least three predetermined genomic relationship parameters.

In a further embodiment according to the present invention, each set of predetermined genomic relationship parameters comprises at least three predetermined genomic relationship parameters, the at least three predetermined genomic relationship parameters being defined as follows:

- a first predetermined genomic relationship parameter is calculated according to formula (2):

$$
r_{O,P1} - \tfrac{1}{2}\left(r_{P1,P2} + r_{P1,P1}\right)
$$

wherein $r_{O,P1}$ represents the genomic relationship between the control offspring and the first true parent of the control offspring, and $\tfrac{1}{2}(r_{P1,P2} + r_{P1,P1})$ represents the average of x and y, wherein x ($r_{P1,P2}$) represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring, and y ($r_{P1,P1}$) represents the genomic relationship between the first true parent of the control offspring and itself;

- a second predetermined genomic relationship parameter is calculated according to formula (3):

$$
r_{O,P2} - \tfrac{1}{2}\left(r_{P1,P2} + r_{P2,P2}\right)
$$

wherein $r_{O,P2}$ represents the genomic relationship between the control offspring and the second true parent of the control offspring, and $\tfrac{1}{2}(r_{P1,P2} + r_{P2,P2})$ represents the average of x and y, wherein x ($r_{P1,P2}$) represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring, and y ($r_{P2,P2}$) represents the genomic relationship between the second true parent of the control offspring and itself; and

- a third predetermined genomic relationship parameter is calculated according to formula (4):

$$
r_{O,O} - \left(1 + \tfrac{1}{2}\, r_{P1,P2}\right)
$$

wherein $r_{O,O}$ represents the genomic relationship between the control offspring and itself, and $r_{P1,P2}$ represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring.

In one embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters comprises more than 10 sets of predetermined genomic relationship parameters, such as at least 100 sets of predetermined genomic relationship parameters, each set being calculated based on the genomic interrelationships between a control offspring and its parents.
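Summarizing many control trios into overall means and a 3x3 (co)variance matrix, the two quantities used when the computerized parameters are compared with the control data, might look as follows. This is a sketch: the sample (co)variance with divisor n - 1 and the name `control_statistics` are choices of this example, not taken from the description.

```python
from typing import List, Sequence, Tuple

ParameterSet = Tuple[float, float, float]  # (first, second, third) parameter of one control trio


def control_statistics(sets: Sequence[ParameterSet]) -> Tuple[List[float], List[List[float]]]:
    """Overall means and 3x3 sample (co)variance matrix of predetermined parameter sets."""
    n = len(sets)
    if n < 2:
        raise ValueError("at least two control trios are needed to estimate a (co)variance")
    means = [sum(s[k] for s in sets) / n for k in range(3)]
    cov = [
        [sum((s[a] - means[a]) * (s[b] - means[b]) for s in sets) / (n - 1) for b in range(3)]
        for a in range(3)
    ]
    return means, cov


means, cov = control_statistics([(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)])
print(means)  # [1.0, 1.0, 1.0]
```

With the 100 or more control trios mentioned above, the covariances between the three parameters are estimated well enough for the matrix to be inverted reliably.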

In one embodiment according to the present invention, the values of the at least one set of predetermined genomic relationship parameters are fixed values, i.e. predetermined threshold values.

Comparison

In one embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is compared with more than one set of predetermined genomic relationship parameters.

In one embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is compared with overall means for more than one set of predetermined genomic relationship parameters.

In another embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is compared with overall means for more than one set of predetermined genomic relationship parameters using a mathematical equation which is proportional to a multivariate normal density function. In one embodiment according to the present invention, the mathematical equation which is proportional to a multivariate normal density function is given by formula (5):

$$
\mathrm{GRL} = \exp\!\left[-\tfrac{1}{2}\,(\mathbf{r}-\boldsymbol{\mu})^{\mathsf{T}}\,\Sigma^{-1}\,(\mathbf{r}-\boldsymbol{\mu})\right]
$$

where $\mathbf{r}$ is a vector of the first, second and third computerized genomic relationship parameters, respectively; $\boldsymbol{\mu}$ is a vector of the first, second and third predetermined genomic relationship parameters, respectively, even more preferably a vector of the overall means of the first, second and third predetermined genomic relationship parameters, respectively; and

Σ is a 3x3 (co)variance matrix of the three predetermined genomic relationship parameters.
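A minimal sketch of formula (5), assuming it is the kernel of a trivariate normal density, $\exp[-\tfrac12(\mathbf{r}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{r}-\boldsymbol{\mu})]$. The normalizing constant is dropped because only the relative size of GRL values matters when trios are ranked and thresholds applied. The 3x3 inverse is written out via the adjugate to keep the example dependency-free; `grl` and `inverse_3x3` are hypothetical names.

```python
import math
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]


def inverse_3x3(m: Matrix3) -> List[List[float]]:
    """Inverse of a 3x3 matrix via its adjugate (transposed cofactor matrix) and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-15:
        raise ValueError("(co)variance matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def grl(r: Sequence[float], mu: Sequence[float], sigma: Matrix3) -> float:
    """Value proportional to a trivariate normal density at r (assumed form of formula (5)).

    r: computerized parameters; mu: (overall means of) predetermined parameters;
    sigma: 3x3 (co)variance matrix of the predetermined parameters.
    """
    diff = [r[k] - mu[k] for k in range(3)]
    inv = inverse_3x3(sigma)
    quad = sum(diff[x] * inv[x][y] * diff[y] for x in range(3) for y in range(3))
    return math.exp(-0.5 * quad)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(grl([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], identity))  # 1.0
```

GRL falls as a trio's parameters move away from what true trios typically show, measured relative to the spread and correlation captured in Σ, so the best-matching parent pair is the one with the highest GRL.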

In a further embodiment according to the present invention, the at least one set of computerized genomic relationship parameters must lie within the highest 80%, such as within the highest 85%, highest 90%, highest 95% or highest 99%, of the at least one set of the predetermined genomic relationship parameters in order to confirm that the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are the true parents (TP1 and TP2, respectively) of the offspring (O).
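One way to turn a percentage such as "the highest 99%" into a number is to rank the GRL values of known true trios and take the lowest value that still falls within the chosen top fraction, which is how the GRL threshold is described for the preferred embodiment. The sketch below assumes the control GRL values have already been computed; the name `grl_threshold` and the rounding tolerance are choices of this example.

```python
import math
from typing import Sequence


def grl_threshold(true_trio_grls: Sequence[float], keep_fraction: float = 0.99) -> float:
    """Lowest GRL value among the highest `keep_fraction` of GRL values from true (control) trios."""
    if not true_trio_grls or not 0.0 < keep_fraction <= 1.0:
        raise ValueError("need at least one GRL value and 0 < keep_fraction <= 1")
    ranked = sorted(true_trio_grls, reverse=True)
    # Subtract a tiny tolerance so floating-point round-off cannot push the count up by one
    n_keep = max(1, math.ceil(keep_fraction * len(ranked) - 1e-9))
    return ranked[n_keep - 1]


# Ten hypothetical control GRL values; the highest 80% are 10.0 down to 3.0
print(grl_threshold([float(v) for v in range(1, 11)], keep_fraction=0.8))  # 3.0
```

By construction, about `keep_fraction` of true trios reach this threshold, so a higher fraction lowers the threshold and accepts more assignments.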

In a further embodiment according to the present invention, the following two requirements need to be fulfilled in order to confirm that the potential parents of the offspring are the true parents of the offspring:

$$
\mathrm{GRL}_1 > \mathrm{GRL}_{\text{threshold}}
$$

and

$$
\Delta\mathrm{GRL} > \Delta\mathrm{GRL}_{\text{threshold}}
$$

The GRL value is calculated using formula (5). The highest GRL value (best trio) is referred to as $\mathrm{GRL}_1$, the second-highest GRL value (second-best trio) is referred to as $\mathrm{GRL}_2$, and $\Delta\mathrm{GRL} = \mathrm{GRL}_1 - \mathrm{GRL}_2$. The determination of $\mathrm{GRL}_{\text{threshold}}$ and $\Delta\mathrm{GRL}_{\text{threshold}}$ is defined in Example 1a. Preferably, the $\mathrm{GRL}_{\text{threshold}}$ value is set to the lowest GRL value among the highest 80% of GRL values of true trios, such as among the highest 85%, 90% or 95%, or preferably among the highest 99% of GRL values of true trios.

A second aspect of the present invention relates to a computer program product residing on a computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to the first aspect of the present invention.

A preferred embodiment according to the second aspect of the present invention relates to a computer program product residing on a computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising:

a) receiving experimental data comprising data related to alleles present within the genome of:

i) the offspring (O);

ii) a first potential parent (P1) of the offspring; and

iii) a second potential parent (P2) of the offspring;

for selected loci;

b) processing the experimental data to generate at least one set of computerized genomic relationship parameters;

c) receiving control data comprising at least one set of predetermined genomic relationship parameters; and

d) comparing the at least one set of computerized genomic relationship parameters with the at least one set of predetermined genomic relationship parameters in order to determine whether the first potential parent of the offspring and the second potential parent of the offspring are the true parents of the offspring; wherein

- the experimental data comprises data relating to number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus;

- the at least one set of computerized genomic relationship parameters is calculated based upon the genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$);

the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and

- the genomic relationship does not assume independent loci.

A third aspect of the present invention relates to a computing system comprising a processor and a memory configured to perform the method according to the first aspect of the present invention.

In one embodiment according to the third aspect of the present invention, the computing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A fourth aspect of the present invention relates to a computing system comprising a processor configured to perform the method according to the first aspect of the present invention.

In one embodiment according to the fourth aspect of the present invention, the computing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A fifth aspect of the present invention relates to a data processing system comprising means for carrying out the method according to the first aspect of the present invention.

In one embodiment according to the fifth aspect of the present invention, the data processing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A sixth aspect of the present invention relates to a data processing system comprising means for carrying out step a), means for carrying out step b), means for carrying out step c) and/or means for carrying out step d) of the method according to the first aspect of the present invention.

In one embodiment according to the sixth aspect of the present invention, the data processing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

BRIEF DESCRIPTION OF DRAWINGS

The present invention is described in detail by reference to the following drawings:

Figure 1 illustrates a trio of offspring (O), first parent (P1) and second parent (P2). The variables near the arrows indicate the genomic relationships between organisms, while the variables over P1 and P2, and below O, are the organisms' genomic relationships to themselves, respectively.

DETAILED DESCRIPTION OF THE INVENTION

Unless specifically defined herein, all technical and scientific terms used have the same meaning as commonly understood by a skilled artisan in the fields of genetics, biochemistry and bioinformatics.

All methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, with suitable methods and materials being described herein. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will prevail.

Where a numerical limit or range is stated herein, the endpoints are included. Also, all values and subranges within a numerical limit or range are specifically included as if explicitly written out.

The term "trio" as used herein refers to a group consisting of an offspring and its parents, i.e. a group of three organisms/individuals.

To the best of our knowledge, prior-art methods for assigning parents to an offspring are based on exclusion and/or likelihood methodologies. Both likelihood- and exclusion-based models usually assume a known and homogeneous genotype error rate and independent loci, and do not account for variation in genotype call rate, all of which are important considerations when working with high-density SNP data. For dense SNP-chip data, the assumption of independent inheritance among loci is not realistic (i.e., alleles are inherited in large segments), which may lead to inflated likelihood-ratio (LR) values when conventional likelihood-based methods are used.

In contrast to the prior-art methodologies, the present computer-implemented method takes a fundamentally different approach by looking at parentage assignment through relationship-logic, using realized genomic relationships. The interrelationship among parents governs the expected inbreeding in offspring as well as parent-offspring relationships. Genomic relationships assess the average genomic similarity across loci and do not assume independent loci. Hence, increasing the number of markers in the calculations increases the precision of the genomic relationships, making the method suitable for high-density SNP data.

A first aspect of the present invention relates to a computer-implemented method for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship, the method comprising:

a) receiving experimental data comprising data related to alleles present within the genome of:

i) the offspring (O);

ii) a first potential parent (P1) of the offspring; and

iii) a second potential parent (P2) of the offspring;

for selected loci;

b) processing the experimental data to generate at least one set of computerized genomic relationship parameters;

c) receiving control data comprising at least one set of predetermined genomic relationship parameters; and

d) comparing the at least one set of computerized genomic relationship parameters with the at least one set of predetermined genomic relationship parameters.
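Steps a) to d) can be combined into one assignment routine. The sketch below is illustrative only and rests on several assumptions: formula (1) has the form $\frac{1}{c}\sum_t (x_{it}-2p_t)(x_{jt}-2p_t)/[2p_t(1-p_t)]$; formulas (2)-(4) are deviations from the relationship-logic expectations $\tfrac12(r_{P1,P2}+r_{P1,P1})$, $\tfrac12(r_{P1,P2}+r_{P2,P2})$ and $1+\tfrac12 r_{P1,P2}$; formula (5) is the kernel of a trivariate normal density; and the two requirements are strict inequalities. It scores every pairing of a first-parent candidate with a second-parent candidate (assuming the candidate pools are separated, e.g. by sex), ranks pairings by GRL and applies $\mathrm{GRL}_1 > \mathrm{GRL}_{\text{threshold}}$ and $\Delta\mathrm{GRL} > \Delta\mathrm{GRL}_{\text{threshold}}$. All names, genotypes and the placeholder control values (zero means, identity inverse covariance) are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Genotypes = Sequence[int]  # codes 0/1/2 per selected locus


def relationship(x: Genotypes, y: Genotypes, p: Sequence[float]) -> float:
    """Assumed form of formula (1); frequencies must lie strictly between 0 and 1."""
    return sum((a - 2 * q) * (b - 2 * q) / (2 * q * (1 - q)) for a, b, q in zip(x, y, p)) / len(p)


def trio_vector(o: Genotypes, p1: Genotypes, p2: Genotypes, p: Sequence[float]) -> List[float]:
    """First, second and third parameters (assumed forms of formulas (2)-(4))."""
    r12 = relationship(p1, p2, p)
    return [
        relationship(o, p1, p) - 0.5 * (r12 + relationship(p1, p1, p)),
        relationship(o, p2, p) - 0.5 * (r12 + relationship(p2, p2, p)),
        relationship(o, o, p) - (1.0 + 0.5 * r12),
    ]


def grl(v: Sequence[float], mu: Sequence[float], sigma_inv: Sequence[Sequence[float]]) -> float:
    """Assumed form of formula (5), given a precomputed inverse (co)variance matrix."""
    d = [v[k] - mu[k] for k in range(3)]
    return math.exp(-0.5 * sum(d[a] * sigma_inv[a][b] * d[b] for a in range(3) for b in range(3)))


def assign_parents(
    offspring: Genotypes,
    first_candidates: Dict[str, Genotypes],
    second_candidates: Dict[str, Genotypes],
    p: Sequence[float],
    mu: Sequence[float],
    sigma_inv: Sequence[Sequence[float]],
    grl_threshold: float,
    delta_threshold: float,
) -> Optional[Tuple[str, str]]:
    """Return the best (first, second) candidate pair if both requirements hold, else None."""
    scored = sorted(
        (
            (grl(trio_vector(offspring, first_candidates[a], second_candidates[b], p), mu, sigma_inv), a, b)
            for a, b in itertools.product(first_candidates, second_candidates)
        ),
        reverse=True,
    )
    if not scored:
        return None
    grl_1, best_a, best_b = scored[0]
    grl_2 = scored[1][0] if len(scored) > 1 else 0.0  # GRL values are positive, so 0.0 is a safe floor
    if grl_1 > grl_threshold and grl_1 - grl_2 > delta_threshold:
        return best_a, best_b
    return None


p = [0.5] * 4
sires = {"S1": [2, 2, 0, 0], "S2": [0, 0, 2, 2]}
dams = {"D1": [0, 2, 2, 0]}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(assign_parents([1, 2, 1, 0], sires, dams, p, [0.0, 0.0, 0.0], identity, 0.5, 0.5))  # ('S1', 'D1')
```

In this toy case the offspring genotype [1, 2, 1, 0] is exactly what S1 and D1 transmit, so that trio scores GRL = 1, while pairing D1 with S2 gives GRL = exp(-2); the margin between the two satisfies both requirements.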

General aspects

In the context of the present invention, the term "genomic relationship" refers to a cumulative measure of DNA similarity over the genome.

The term "allele" refers to a variant form of a given gene. Sometimes, different alleles can result in different observable phenotypic traits, such as different pigmentation. However, most genetic variations result in little or no observable variation.

The term "genome" refers to the genetic material of an organism. Except for some RNA viruses, which have a genome consisting of RNA, most organisms have a genome consisting of DNA. The genome includes the genes (the coding regions), the noncoding DNA and the genetic material of the mitochondria and chloroplasts.

The terms "first potential parent (P1) of the offspring" and "second potential parent (P2) of the offspring" refer to two candidate organisms which may be the true parents of the offspring. Using the above-mentioned computer-implemented method, it can be determined whether these two candidates are in fact the true parents of the offspring.

The term "locus" (plural loci) refers to a position on a chromosome. A variant of the DNA sequence located at a given locus is called an allele. The ordered list of loci known for a particular genome is called a gene map. Gene mapping is the process of determining the locus for a particular biological trait.

The computer-implemented method of the present invention is well suited for parentage assignment in a wide range of organisms, the only requirement being that the organism reproduces sexually.

Thus, in one embodiment according to the first aspect of the present invention, the offspring is a plant, an insect, a bird, a mammal, a fish, a reptile, an amphibian or a mollusk. More preferably, the offspring is an animal such as a fish, in particular farmed fish. The fish may e.g. be selected from the group consisting of Atlantic salmon (Salmo salar), rainbow trout (Oncorhynchus mykiss) and Atlantic cod (Gadus morhua). In an even more preferred embodiment, the offspring is Atlantic salmon, in particular farmed Atlantic salmon. It is preferred that the first potential parent (P1) of the offspring, the second potential parent (P2) of the offspring and the offspring (O) are organisms which reproduce sexually.

The term "sexual reproduction" refers to a form of reproduction in which two morphologically distinct types of specialized reproductive cells fuse together. Each reproductive cell contains half the number of chromosomes of normal cells. The reproductive cells are created by a specialized type of cell division, known as meiosis, which only occurs in eukaryotic cells. The two reproductive cells fuse during fertilization to create a single-celled zygote that includes genetic material from both reproductive cells.

If all fish at a fish farm facility descend from a dedicated pool of unique parents, the origin of a fish that has escaped from the facility may be determined by parentage assignment using the above-mentioned computer-implemented method. Being able to determine the origin of escaped fish is an important contribution to the continuing effort to prevent fish from escaping, and such technology may also find other applications, such as in crime prevention.

In another embodiment according to the first aspect of the present invention, the offspring is diploid. In one embodiment, the offspring and its parents are diploid. The term "diploid organism" refers to organisms which have two homologous copies of each chromosome, usually one from the mother and one from the father.

In another embodiment according to the present invention, the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are of different sexes.

Experimental data

Genomic relationship estimates usually require large numbers of loci (Goddard, Hayes, & Meuwissen, 2011), and their expectation is proportional to the genetic covariance between individuals. Thus, in one embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters is at least 100, such as at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900 or at least 1000. In another embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters is in the range 100 to 100 000, such as in the range 500 to 80 000 or in the range 1000 to 70 000.

Further, genomic relationships assess the average genomic similarity across loci, and it is therefore preferred that the selected loci are polymorphic. Thus, in one embodiment according to the present invention, at least one of the selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters is polymorphic. In another embodiment according to the present invention, at least 10% of the selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters are polymorphic, such as at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80% or at least 90%, or all of the selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters are polymorphic.

In the context of the present invention, the term "polymorphic locus" refers to a genetic locus with two or more different alleles, at which the most common form has a frequency not exceeding 0.95 in a given population, i.e. that the most common form has a frequency not exceeding 95% in a given population. A detailed definition of allele frequency is provided below.
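The polymorphism criterion can be checked per locus from the alleles observed in a population sample. A minimal sketch, assuming the input lists both alleles of every genotyped diploid individual; `is_polymorphic` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence


def is_polymorphic(alleles: Sequence[str], max_major_frequency: float = 0.95) -> bool:
    """True if the locus carries two or more alleles and the most common one does not
    exceed `max_major_frequency` in the population sample.

    alleles: every allele observed at the locus, two entries per diploid individual.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return False
    major_frequency = counts.most_common(1)[0][1] / len(alleles)
    return major_frequency <= max_major_frequency


print(is_polymorphic("AAAB"))          # True  (A at 0.75)
print(is_polymorphic("A" * 19 + "B"))  # True  (A at exactly 0.95, which does not exceed the limit)
print(is_polymorphic("A" * 39 + "B"))  # False (A at 0.975)
```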

In a further embodiment according to the first aspect of the present invention, the experimental data comprises data relating to the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

Which allele is selected as the reference allele for a given locus is of no importance, as long as the same allele is used as the reference allele for the offspring, the first potential parent of the offspring and the second potential parent of the offspring. To illustrate this further, reference is made to the following example. A diploid offspring has the alleles AB at locus t. A first potential parent of the offspring has the alleles AA at locus t and a second potential parent of the offspring has the alleles BB at locus t. The reference allele may be either A or B. If A is selected as the reference allele, A represents the reference allele for the offspring, the first potential parent and the second potential parent alike. It is of no importance whether A or B is selected as the reference allele.

If A is selected as the reference allele, the offspring will have 1 copy of the reference allele at locus t, the first potential parent of the offspring will have 2 copies of the reference allele at locus t and the second potential parent of the offspring will have no copies of the reference allele at locus t.

In one embodiment according to the present invention, the experimental data related to the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In one embodiment according to the present invention, the experimental data related to the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele. A non-reference allele is herein meant to refer to an allele that is different from the reference allele.
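The 0/1/2 coding can be sketched as follows, using the AB/AA/BB trio from the example above. Representing genotypes as two-character strings (or two-element tuples) and the name `code_genotype` are illustrative assumptions. The sketch also shows that switching the reference allele from A to B swaps the codes 0 and 2 consistently for all three organisms, in line with the statement that the choice of reference allele is of no importance as long as the same one is used for the offspring and both potential parents.

```python
def code_genotype(genotype, reference_allele):
    """Return the 0/1/2 code of a diploid genotype such as "AB".

    0 = two copies of the reference allele (homozygous for it),
    1 = one copy (heterozygous),
    2 = no copies (homozygous for a non-reference allele).
    """
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype with exactly two alleles")
    return 2 - list(genotype).count(reference_allele)


trio = {"offspring": "AB", "parent_1": "AA", "parent_2": "BB"}
for ref in ("A", "B"):
    print(ref, {name: code_genotype(g, ref) for name, g in trio.items()})
# A {'offspring': 1, 'parent_1': 0, 'parent_2': 2}
# B {'offspring': 1, 'parent_1': 2, 'parent_2': 0}
```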

A diploid organism is "heterozygous" at a gene locus when its cells contain two different alleles of a gene. A cell is said to be "homozygous" for a particular gene when identical alleles of the gene are present on both homologous chromosomes.

In another embodiment according to the present invention, the experimental data further comprises data relating to the allele frequency of the reference allele for each of the selected loci and, optionally, the total number of loci.

The term "allele frequency" refers to the relative frequency of an allele (variant of a gene) at a particular locus in a population, expressed as a fraction or percentage. In order to illustrate this further, reference is made to the following example.

Consider a locus that carries two alleles, A and B. In a diploid population there are three possible genotypes: two homozygous genotypes (AA and BB) and one heterozygous genotype (AB). If 10 individuals are sampled from a population and the genotype counts n(AA) = 6, n(AB) = 3 and n(BB) = 1 are observed, then there are 15 observed copies of the A allele (2 × 6 + 3) and 5 copies of the B allele (2 × 1 + 3) out of 20 chromosome copies in total. The allele frequency of the A allele is then 0.75 (75%), and the allele frequency of the B allele is 0.25 (25%).
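The worked example above translates directly into code: each homozygote contributes two copies of its allele and each heterozygote one copy of each allele. This is a minimal sketch for a single biallelic locus; the function name is illustrative.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Allele frequencies of A and B at one biallelic locus from genotype counts."""
    total_copies = 2 * (n_aa + n_ab + n_bb)  # each diploid individual carries two copies
    if total_copies == 0:
        raise ValueError("no genotyped individuals")
    copies_a = 2 * n_aa + n_ab  # AA contributes two A copies, AB contributes one
    copies_b = 2 * n_bb + n_ab  # BB contributes two B copies, AB contributes one
    return copies_a / total_copies, copies_b / total_copies


print(allele_frequencies(6, 3, 1))  # (0.75, 0.25)
```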

In one embodiment according to the present invention, the allele frequency is set at a fixed value in the range 0.1 to 0.95, such as in the range 0.1 to 0.8, in the range 0.1 to 0.7, in the range 0.1 to 0.6, or in the range 0.1 to 0.5. In one embodiment, the allele frequency is set at a fixed value of about 0.5 such as 0.5.

It is to be understood that if the allele frequency is not set at a fixed value, the allele frequency may be determined by determining the allele frequency at a given locus in a population of organisms. In one embodiment according to the present invention, the number of organisms in said population is preferably higher than 10, such as more than 15, more than 20 or more than 50 organisms.

The term "total number of loci" refers to the total number of loci that have been selected.

In another embodiment according to the first aspect of the present invention, the at least one set of computerized genomic relationship parameters is calculated based upon genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P_1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P_2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P_1,P_2}$);

v) the first potential parent of the offspring and itself ($r_{P_1,P_1}$); and

vi) the second potential parent of the offspring and itself ($r_{P_2,P_2}$).

In one embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and the allele frequency of the reference allele for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; the allele frequency of the reference allele for each of the selected loci; and total number of loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus.

In one embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In another embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele.

The term "non-reference allele" refers to an allele that is different from the reference allele.

In a most preferred embodiment of the present invention, the genomic relationship between two organisms i and j is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it} - 2p_t)(x_{jt} - 2p_t)}{2p_t(1 - p_t)} \qquad (1)$$

wherein $r_{ij}$ represents the genomic relationship between organisms i and j; $x_{it}$ and $x_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the allele frequency at locus t; and c represents the total number of loci;

wherein the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the offspring at locus t; ii) alleles present within the genome of the first potential parent of the offspring at locus t; and iii) alleles present within the genome of the second potential parent of the offspring at locus t. In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from the group consisting of "0", "1" and "2", depending on the number of copies of the reference allele present for organisms i and j at locus t.

In one embodiment according to the present invention, the genotype codes of organisms i and j at locus t ($x_{it}$ and $x_{jt}$) are independently selected from:

- "0" for locus t if that locus has two copies of the reference allele;

- "1" for locus t if that locus has one copy of the reference allele; or

- "2" for locus t if that locus has no copies of the reference allele.
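A minimal sketch of the genomic relationship calculation is given below. It assumes that formula (1) has the standardized form r_ij = (1/c) Σ_t (x_it − 2p_t)(x_jt − 2p_t) / (2p_t(1 − p_t)), where x are the 0/1/2 genotype codes; this form, and the function name, are assumptions for illustration. The centring term 2p_t is exact only when p_t is the frequency of the allele that the codes count; with the allele frequency fixed at 0.5, as in one embodiment above, this distinction does not arise.

```python
def genomic_relationship(x_i, x_j, p):
    """Genomic relationship r_ij between organisms i and j over c loci.

    Assumed form of formula (1):
        r_ij = (1/c) * sum_t (x_it - 2 p_t)(x_jt - 2 p_t) / (2 p_t (1 - p_t))

    x_i, x_j: genotype codes (0, 1 or 2) at each selected locus.
    p: allele frequency per locus, taken as the frequency of the allele that
       the codes count, so that 2 * p[t] is the expected code at locus t.
    """
    c = len(p)
    if not (len(x_i) == len(x_j) == c) or c == 0:
        raise ValueError("genotypes and frequencies must cover the same, non-empty set of loci")
    total = 0.0
    for x_it, x_jt, p_t in zip(x_i, x_j, p):
        if not 0.0 < p_t < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
        total += (x_it - 2 * p_t) * (x_jt - 2 * p_t) / (2 * p_t * (1 - p_t))
    return total / c  # average over the c loci


# Toy example with the allele frequency fixed at 0.5 at every locus
offspring = [1, 0, 2, 1]
parent = [0, 0, 2, 2]
freqs = [0.5] * 4
print(genomic_relationship(offspring, parent, freqs))  # 1.0
print(genomic_relationship(parent, parent, freqs))     # 2.0
```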

Genomic relationships calculated in accordance with formula (1) can be based on extremely dense genomic data (even up to full sequence data) and do not assume independence of the loci. As the genomic relationship is a cumulative measure of DNA similarity across the genome, the method is expected to be robust to genotyping errors, provided that correct genotypes are much more common than erroneous ones. In contrast, with likelihood-based methods that multiply likelihoods over loci, a limited number of genotyping errors may dramatically reduce the likelihood of a true parent pair.

Formula (1) may be used to estimate the genomic interrelationships between the parent candidates and the offspring, i.e., the relationship of the offspring with itself ($r_{O,O}$), the relationships of the parent candidates with themselves ($r_{P_1,P_1}$ and $r_{P_2,P_2}$), the relationships between the offspring and both parent candidates ($r_{O,P_1}$ and $r_{O,P_2}$) and the relationship between the parent candidates ($r_{P_1,P_2}$); see figure 1.

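The expected genomic relationships between two true parents (TP1 and TP2) and their offspring (O) are:

$$E(r_{O,TP_1}) = \frac{r_{TP_1,TP_1} + r_{TP_1,TP_2}}{2}, \qquad E(r_{O,TP_2}) = \frac{r_{TP_2,TP_2} + r_{TP_1,TP_2}}{2}$$

In other words, the expected relationship between a child and one of its parents is the average of that parent's genomic relationship with itself and the relationship between the two parents. The expected relationship between the offspring and itself is:

$$E(r_{O,O}) = 1 + F_O, \qquad F_O = \frac{r_{TP_1,TP_2}}{2}$$

where $F_O$ is the expected inbreeding coefficient of the offspring. For a candidate trio consisting of an offspring O and potential parents P1 and P2, three residual relationships are defined as the differences between the actual genomic relationships and their expectations:

$$e_1 = r_{O,P_1} - \frac{r_{P_1,P_1} + r_{P_1,P_2}}{2}$$

$$e_2 = r_{O,P_2} - \frac{r_{P_2,P_2} + r_{P_1,P_2}}{2}$$

$$e_3 = r_{O,O} - \left(1 + \frac{r_{P_1,P_2}}{2}\right)$$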

Using the above residuals accounts for inbreeding as well as for the direction of the relationships (a parent cannot be swapped with the offspring without changing the result).

Thus, in one embodiment according to the present invention, each set of the computerized genomic relationship parameters comprises at least three computerized genomic relationship parameters.

In another embodiment according to the present invention, each set of computerized genomic relationship parameters comprises at least three computerized genomic relationship parameters; the at least three computerized genomic relationship parameters being defined as follows:

- a first computerized genomic relationship parameter is calculated according to formula (2):

$$e_1 = r_{O,P_1} - \frac{r_{P_1,P_2} + r_{P_1,P_1}}{2} \qquad (2)$$

with $r_{O,P_1}$ representing the genomic relationship between the offspring and the first potential parent of the offspring, and $\frac{r_{P_1,P_2} + r_{P_1,P_1}}{2}$ representing the average of x and y, wherein x represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring, and y represents the genomic relationship between the first potential parent of the offspring and itself;

- a second computerized genomic relationship parameter is calculated according to formula (3):

$$e_2 = r_{O,P_2} - \frac{r_{P_1,P_2} + r_{P_2,P_2}}{2} \qquad (3)$$

with $r_{O,P_2}$ representing the genomic relationship between the offspring and the second potential parent of the offspring, and $\frac{r_{P_1,P_2} + r_{P_2,P_2}}{2}$ representing the average of x and y, wherein x represents the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring, and y represents the genomic relationship between the second potential parent of the offspring and itself; and

- a third computerized genomic relationship parameter is calculated according to formula (4):

$$e_3 = r_{O,O} - \left(1 + \frac{r_{P_1,P_2}}{2}\right) \qquad (4)$$

with $r_{O,O}$ representing the genomic relationship between the offspring and itself, and $r_{P_1,P_2}$ representing the genomic relationship between the first potential parent of the offspring and the second potential parent of the offspring.
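The three computerized genomic relationship parameters can be computed from the six relationships i) to vi) as sketched below. The sketch assumes that formulas (2) to (4) take the residual forms e1 = r(O,P1) − (r(P1,P2) + r(P1,P1))/2, e2 = r(O,P2) − (r(P1,P2) + r(P2,P2))/2 and e3 = r(O,O) − (1 + r(P1,P2)/2), i.e. that the expected self-relationship of the offspring is 1 + F with F = r(P1,P2)/2; the function name is illustrative. For a trio whose relationships exactly match their expectations, all three residuals are zero.

```python
def relationship_residuals(r_o_p1, r_o_p2, r_o_o, r_p1_p2, r_p1_p1, r_p2_p2):
    """Return (e1, e2, e3) for one candidate trio (assumed forms of formulas (2)-(4))."""
    # e1: offspring-P1 relationship minus the mean of r(P1,P2) and r(P1,P1)
    e1 = r_o_p1 - (r_p1_p2 + r_p1_p1) / 2
    # e2: offspring-P2 relationship minus the mean of r(P1,P2) and r(P2,P2)
    e2 = r_o_p2 - (r_p1_p2 + r_p2_p2) / 2
    # e3: offspring self-relationship minus 1 + F_O, with F_O = r(P1,P2) / 2
    e3 = r_o_o - (1 + r_p1_p2 / 2)
    return e1, e2, e3


# Unrelated, non-inbred parents (r_p1_p2 = 0, self-relationships 1) and their child
print(relationship_residuals(0.5, 0.5, 1.0, 0.0, 1.0, 1.0))  # (0.0, 0.0, 0.0)
```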

Control data

During the assigning step, the at least one set of computerized genomic relationship parameters is compared with the at least one set of predetermined genomic relationship parameters.

In one embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters are fixed values, i.e. predetermined threshold values. These threshold values may be compared with the at least one set of computerized genomic relationship parameters and based on this comparison it may be decided whether the first potential parent of the offspring and the second potential parent of the offspring are the true parents of the offspring.

In another embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon alleles present, for selected loci, within the genome of:

i) a control offspring (o);

ii) a first true parent (TP1) of the control offspring; and

iii) a second true parent (TP2) of the control offspring.

The computer-implemented method of the present invention is well suited for parentage assignment in a large number of different organisms, the only requirement being that the organism reproduces sexually.

Thus, in one embodiment according to the first aspect of the present invention, the control offspring is a plant, an insect, a bird, a mammal, a fish, a reptile, an amphibian or a mollusk. More preferably, the control offspring is an animal such as a fish, in particular a farmed fish. The fish may e.g. be selected from the group consisting of Atlantic salmon (Salmo salar), rainbow trout (Oncorhynchus mykiss) and Atlantic cod (Gadus morhua). In an even more preferred embodiment, the control offspring is an Atlantic salmon, in particular a farmed Atlantic salmon. It is preferred that the first true parent (TP1) of the control offspring, the second true parent (TP2) of the control offspring and the control offspring (o) are organisms which reproduce sexually. In one preferred embodiment according to the present invention, the control offspring (o) and the offspring (O) are of the same species, so that, e.g., if the control offspring (o) is a fish, the offspring (O) is also a fish.

In another embodiment according to the first aspect of the present invention, the control offspring is diploid. In a further embodiment, the offspring and its parents are diploid. In a preferred embodiment according to the present invention, the control offspring (o) is diploid if the offspring (O) is diploid.

In another embodiment according to the first aspect of the present invention, the first true parent (TP1) of the control offspring and the second true parent (TP2) of the control offspring are of different sexes.

In another embodiment according to the first aspect of the present invention, the first true parent (TP1) of the control offspring, the second true parent (TP2) of the control offspring and the control offspring (o) reproduce sexually, as opposed to asexually.

Each set of predetermined genomic relationship parameters is preferably calculated based upon a unique combination of organisms. Thus, where there is more than one set of predetermined genomic relationship parameters, each set has been calculated based upon alleles present within the genome of a unique combination of organisms, i.e. at least one of i), ii) and iii) is different for each set of the predetermined genomic relationship parameters.

In one embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters is at least 10 sets of predetermined genomic relationship parameters, such as at least 50 sets, at least 100 sets or at least 500 sets of predetermined genomic relationship parameters.

Genomic relationship estimates usually require large numbers of loci (Goddard, Hayes, & Meuwissen, 2011), and their expectation is proportional to the genetic covariance between individuals. Thus, in one embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters is at least 100, such as at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900 or at least 1000. In another embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters is in the range 100 to 100 000, such as in the range 500 to 80 000 or in the range 1000 to 70 000. In another embodiment according to the present invention, the number of selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters is similar or identical to the number of selected loci that form the basis for calculating the at least one set of computerized genomic relationship parameters.

Further, genomic relationships measure the average genomic similarity across loci, and it is therefore preferred that the selected loci are polymorphic. Thus, in one embodiment according to the present invention, at least one of the selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters is polymorphic. In another embodiment according to the present invention, at least 10% of the selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters are polymorphic, such as at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80% or at least 90%, or all of the selected loci that form the basis for calculating the at least one set of predetermined genomic relationship parameters are polymorphic.

In a further embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

Which allele is selected as the reference allele for a given locus is of no importance, as long as the same allele is used as the reference allele for the control offspring, the first true parent of the control offspring and the second true parent of the control offspring.

In one embodiment, the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In one embodiment, the number of copies of the reference allele present for each of the selected loci is: "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele. A non-reference allele is herein meant to refer to an allele that is different from the reference allele.

In a further embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters has been calculated based upon the number of copies of a reference allele present for each of the selected loci, the allele frequency of the reference allele for each of the selected loci and, optionally, the total number of loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In one embodiment according to the present invention, the allele frequency is set at a fixed value in the range 0.1 to 0.95, such as in the range 0.1 to 0.8, in the range 0.1 to 0.7, in the range 0.1 to 0.6, or in the range 0.1 to 0.5. In one embodiment, the allele frequency is set at a fixed value of about 0.5 such as 0.5.

It is to be understood that if the allele frequency is not set at a fixed value, the allele frequency may be determined by determining the allele frequency at a given locus in a population of organisms. In one embodiment according to the present invention, the number of organisms in said population is preferably higher than 10, such as more than 15, more than 20 or more than 50 organisms. The term "total number of loci" refers to the total number of loci that have been selected.

In another embodiment according to the first aspect of the present invention, the at least one set of predetermined genomic relationship parameters is calculated based upon genomic relationship between:

i) the control offspring and the first true parent of the control offspring;

ii) the control offspring and the second true parent of the control offspring;

iii) the control offspring and itself;

iv) the first true parent of the control offspring and the second true parent of the control offspring;

v) the first true parent of the control offspring and itself; and

vi) the second true parent of the control offspring and itself.

In one embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and the allele frequency of the reference allele for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In another embodiment according to the present invention, the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; the allele frequency of the reference allele for each of the selected loci; and total number of loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the control offspring at the given locus; ii) alleles present within the genome of the first true parent of the control offspring at the given locus; and iii) alleles present within the genome of the second true parent of the control offspring at the given locus.

In one embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that has two copies of the reference allele; "1" for each of the selected loci that has only one copy of the reference allele; and "2" for each of the selected loci that has no copies of the reference allele.

In another embodiment according to the present invention, the number of copies of the reference allele present for each of the selected loci is "0" for each of the selected loci that is homozygous for the reference allele; "1" for each of the selected loci that is heterozygous for the reference allele; and "2" for each of the selected loci that is homozygous for a non-reference allele, i.e. homozygous for an alternative allele.

In a most preferred embodiment according to the present invention, the genomic relationship between two organisms i and j is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(x_{it} - 2p_t)(x_{jt} - 2p_t)}{2p_t(1 - p_t)} \qquad (1)$$

wherein $r_{ij}$ represents the genomic relationship between organisms i and j; $x_{it}$ and $x_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the allele frequency at locus t; and c represents the total number of loci;

wherein the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the control offspring at locus t; ii) alleles present within the genome of the first true parent of the control offspring at locus t; and iii) alleles present within the genome of the second true parent of the control offspring at locus t.

In one embodiment according to the present invention, the genotype codes of organisms i and j at locus t ($x_{it}$ and $x_{jt}$) are independently selected from the group consisting of "0", "1" and "2", depending on the number of copies of the reference allele present for organisms i and j at locus t. In one embodiment according to the present invention, $x_{it}$ and $x_{jt}$ are independently selected from:

- "0" for locus t if that locus has two copies of the reference allele;

- "1" for locus t if that locus has one copy of the reference allele; or

- "2" for locus t if that locus has no copies of the reference allele.

Formula (1) may be used to estimate the genomic interrelationships between true parents and a control offspring, i.e., the relationship of the control offspring with itself ($r_{O,O}$), the relationships of the parents with themselves ($r_{TP_1,TP_1}$ and $r_{TP_2,TP_2}$), the relationships between the control offspring and both parents ($r_{O,TP_1}$ and $r_{O,TP_2}$) and the relationship between the parents ($r_{TP_1,TP_2}$); see figure 1.

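The expected genomic relationships between two true parents (TP1 and TP2) and their offspring (O) are:

$$E(r_{O,TP_1}) = \frac{r_{TP_1,TP_1} + r_{TP_1,TP_2}}{2}, \qquad E(r_{O,TP_2}) = \frac{r_{TP_2,TP_2} + r_{TP_1,TP_2}}{2}$$

In other words, the expected relationship between a child and one of its parents is the average of that parent's genomic relationship with itself and the relationship between the two parents. The expected relationship between the offspring and itself is:

$$E(r_{O,O}) = 1 + F_O, \qquad F_O = \frac{r_{TP_1,TP_2}}{2}$$

where $F_O$ is the expected inbreeding coefficient of the offspring. For a trio consisting of an offspring O and parents P1 and P2, three residual relationships are defined as the differences between the actual genomic relationships and their expectations:

$$e_1 = r_{O,P_1} - \frac{r_{P_1,P_1} + r_{P_1,P_2}}{2}$$

$$e_2 = r_{O,P_2} - \frac{r_{P_2,P_2} + r_{P_1,P_2}}{2}$$

$$e_3 = r_{O,O} - \left(1 + \frac{r_{P_1,P_2}}{2}\right)$$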

Using the above residuals accounts for inbreeding as well as for the direction of the relationships (a parent cannot be swapped with the offspring without changing the result). In one embodiment according to the present invention, each set of predetermined genomic relationship parameters comprises at least three predetermined genomic relationship parameters.

In a further embodiment according to the present invention, each set of predetermined genomic relationship parameters comprises at least three predetermined genomic relationship parameters; the at least three predetermined genomic relationship parameters being defined as follows:

- a first predetermined genomic relationship parameter is calculated according to formula (2):

$$e_1 = r_{O,P_1} - \frac{r_{P_1,P_2} + r_{P_1,P_1}}{2} \qquad (2)$$

where, for the control trio, $r_{O,P_1}$ represents the genomic relationship between the control offspring and the first true parent of the control offspring, and $\frac{r_{P_1,P_2} + r_{P_1,P_1}}{2}$ represents the average of x and y, wherein x represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring, and y represents the genomic relationship between the first true parent of the control offspring and itself;

- a second predetermined genomic relationship parameter is calculated according to formula (3):

$$e_2 = r_{O,P_2} - \frac{r_{P_1,P_2} + r_{P_2,P_2}}{2} \qquad (3)$$

where $r_{O,P_2}$ represents the genomic relationship between the control offspring and the second true parent of the control offspring, and $\frac{r_{P_1,P_2} + r_{P_2,P_2}}{2}$ represents the average of x and y, wherein x represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring, and y represents the genomic relationship between the second true parent of the control offspring and itself; and

- a third predetermined genomic relationship parameter is calculated according to formula (4):

$$e_3 = r_{O,O} - \left(1 + \frac{r_{P_1,P_2}}{2}\right) \qquad (4)$$

where $r_{O,O}$ represents the genomic relationship between the control offspring and itself, and $r_{P_1,P_2}$ represents the genomic relationship between the first true parent of the control offspring and the second true parent of the control offspring.

In one embodiment according to the present invention, the at least one set of predetermined genomic relationship parameters is more than 10 sets of predetermined genomic relationship parameters, such as at least 100 sets of predetermined genomic relationship parameters, each set being calculated based on the genomic interrelationships between a control offspring and its true parents.
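The overall means and the 3x3 (co)variance matrix used in the comparison step can be estimated from the predetermined parameter sets of many control trios, as sketched below. Using the sample (co)variance with an n − 1 denominator is an assumption, as is the function name; each control set is taken to be an (e1, e2, e3) tuple.

```python
def control_summary(control_sets):
    """Overall means and 3x3 sample (co)variance matrix of the predetermined
    genomic relationship parameters.

    control_sets: one (e1, e2, e3) tuple per control trio, i.e. per control
    offspring together with its two true parents.
    """
    n = len(control_sets)
    if n < 2:
        raise ValueError("need at least two control trios to estimate a covariance")
    means = [sum(s[k] for s in control_sets) / n for k in range(3)]
    cov = [
        [sum((s[a] - means[a]) * (s[b] - means[b]) for s in control_sets) / (n - 1)
         for b in range(3)]
        for a in range(3)
    ]
    return means, cov


# Toy control sets, for illustration only
toy_sets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
means, cov = control_summary(toy_sets)
print(means)  # [0.0, 0.0, 0.0]
```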

Comparison

Once i) the experimental data has been received, ii) the experimental data has been processed to provide at least one set of computerized genomic relationship parameters, and iii) control data comprising at least one set of predetermined genomic relationship parameters has been received, the at least one set of computerized genomic relationship parameters is compared with the at least one set of predetermined genomic relationship parameters in order to determine whether the first potential parent of the offspring and the second potential parent of the offspring are the true parents of the offspring.

It is an advantage that the at least one set of computerized genomic relationship parameters is compared with threshold values that are representative of true parent-offspring trios. Thus, in one embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is compared with more than one set of predetermined genomic relationship parameters.

Preferably, the at least one set of computerized genomic relationship parameters is compared with overall means for more than one set of predetermined genomic relationship parameters.

The at least one set of computerized genomic relationship parameters and the at least one set of predetermined genomic relationship parameters may be compared by a number of alternative methods; preferably, they are compared using a mathematical equation which is proportional to a multivariate normal density function. Thus, in one embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is compared with overall means for more than one set of predetermined genomic relationship parameters using a mathematical equation which is proportional to the natural logarithm of a multivariate normal density function.

One example of a mathematical equation which is proportional to the natural logarithm of a multivariate normal density function is given by formula (5):

$$GRL = -\frac{1}{2}\,(\mathbf{e} - \boldsymbol{\mu})^{\top}\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{e} - \boldsymbol{\mu}) \qquad (5)$$

where $\mathbf{e}$ is a vector of the first, second and third computerized genomic relationship parameters respectively; $\boldsymbol{\mu}$ is a vector of the first, second and third predetermined genomic relationship parameters respectively, even more preferably a vector of overall means for the first, second and third predetermined genomic relationship parameters respectively; $\boldsymbol{\Sigma}$ is a 3x3 (co)variance matrix of the three predetermined genomic relationship parameters; and GRL is an abbreviation of Genomic Relationship Likelihood.
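A sketch of the GRL calculation follows. It assumes GRL = −½ (e − μ)ᵀ Σ⁻¹ (e − μ), i.e. the natural logarithm of a multivariate normal density with its constant terms dropped; the exact scaling used in formula (5) is an assumption. To stay dependency-free, the sketch solves Σ w = (e − μ) by Gaussian elimination with partial pivoting instead of explicitly inverting Σ; the function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular (co)variance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def grl(e, mu, sigma):
    """GRL = -1/2 (e - mu)' Sigma^-1 (e - mu) (assumed form of formula (5))."""
    d = [ei - mi for ei, mi in zip(e, mu)]
    w = solve_linear(sigma, d)  # w = Sigma^-1 (e - mu)
    return -0.5 * sum(di * wi for di, wi in zip(d, w))


sigma = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
print(grl([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], sigma))  # -1.0
```

The closer a candidate trio's parameters lie to the control means, relative to the control (co)variances, the closer its GRL value is to zero.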

In a further embodiment according to the present invention, the at least one set of computerized genomic relationship parameters must lie within the highest 80%, such as within the highest 85%, within the highest 90%, within the highest 95% or within the highest 99%, of known parent-offspring GRL values in order to confirm that the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are the true parents (TP1 and TP2 respectively) of the offspring (O). In one preferred embodiment according to the present invention, the at least one set of computerized genomic relationship parameters must lie within the highest 90% of known parent-offspring GRL values in order to confirm that the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are the true parents (TP1 and TP2 respectively) of the offspring (O). In another preferred embodiment according to the present invention, the at least one set of computerized genomic relationship parameters must lie within the highest 99% of known parent-offspring GRL values in order to confirm that the first potential parent (P1) of the offspring and the second potential parent (P2) of the offspring are the true parents (TP1 and TP2 respectively) of the offspring (O), thus accepting a false negative rate of 1%.
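One way to apply the "highest X%" criterion is to take, from the GRL values of known parent-offspring trios, the smallest value that still lies within the highest share, and to accept a candidate trio only if its GRL value is at least that large. The sketch below does this with a simple empirical quantile without interpolation; the quantile rule and the function names are assumptions. With a coverage of 0.99, about 1% of true trios would fall below the threshold, matching the false negative rate of 1% mentioned above.

```python
import math


def grl_threshold(known_grls, coverage=0.99):
    """Smallest GRL value that still belongs to the highest `coverage` share of
    GRL values from known parent-offspring trios (empirical, no interpolation)."""
    if not known_grls:
        raise ValueError("need at least one known parent-offspring GRL value")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(known_grls, reverse=True)   # highest GRL first
    n_kept = math.ceil(coverage * len(ranked))  # size of the top share
    return ranked[n_kept - 1]


def within_top_share(grl_value, known_grls, coverage=0.99):
    """True if grl_value lies within the highest `coverage` share of known GRL values."""
    return grl_value >= grl_threshold(known_grls, coverage)


# Toy GRL values from ten known trios: -9.0, -8.0, ..., 0.0
known = [float(v) for v in range(-9, 1)]
print(grl_threshold(known, coverage=0.9))  # -8.0
```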

Preferred embodiments, ΔGRL

After more than one set of computerized genomic relationship parameters has been calculated and each of these sets has been compared with the at least one set of predetermined genomic relationship parameters, e.g. using formula (5), the number of GRL values obtained equals the number of sets of computerized genomic relationship parameters.

To reduce the false positive rate and increase the true negative rate, a ΔGRL value may be calculated according to formula (6):

$$\Delta GRL = GRL_1 - GRL_2 \qquad (6)$$

where $GRL_1$ is the highest GRL value obtained and $GRL_2$ is the second highest GRL value.

In datasets where both parents of a child are present while no other relatives are available, the ΔGRL value will be very high, as no other realistic trio exists. In cases where close relatives of the parents are included among the candidate parents, such as uncles, aunts, grandparents, siblings or offspring of the child, the ΔGRL value will be lower due to genomic similarity between some false parent candidates and the child. High relatedness to the child alone (e.g., a full-sib relationship) is not enough to give a high GRL value, since the method also accounts for the interrelationships within the whole trio. For example, if the parent candidates consist of one true parent and one full-sib of the child, the interrelationships of the trio will typically be illogical: there will be an elevated relationship between the two parent candidates, but "normal" relationships of the child with itself and between the parent candidates and the child, all of which should be elevated given the higher relationship between the two parent candidates. In cases where a parent is missing but many other relatives exist (e.g., multiple uncles/aunts, grandparents or siblings), the $GRL_1$ value may in rare cases exceed the threshold, but the ΔGRL value will then typically be low, as many "similar" candidates exist. Threshold values for assignment may therefore be set for both $GRL_1$ and ΔGRL. The ΔGRL value is analogous to the Δ statistic used by Marshall et al. (1998).

Thus, in one embodiment according to the first aspect of the present invention, step a) is repeated for each parents-offspring trio to be tested, and the experimental data are processed to generate a number of sets of computerized genomic relationship parameters equal to the number of parents-offspring trios to be tested. In other words, if 10 parents-offspring trios are to be tested, 10 sets of computerized genomic relationship parameters are generated. These sets are then compared with the at least one set of predetermined genomic relationship parameters. If formula (5) is used for this comparison, one GRL value is obtained for each set of computerized genomic relationship parameters. To reduce the false positive rate and increase the true negative rate, a ΔGRL value may be calculated according to formula (6):

$$\Delta GRL = GRL_1 - GRL_2 \qquad (6)$$

where $GRL_1$ is the highest GRL value obtained and $GRL_2$ is the second highest GRL value.

The obtained $GRL_1$ and ΔGRL values may then be compared with threshold values for $GRL_1$ and ΔGRL, respectively.

In one preferred embodiment according to the present invention, the ΔGRL threshold is set at ln(x), wherein x is greater than 100, preferably greater than 500, and even more preferably 1000 or greater. If x is 1000, this threshold implies that the best trio must be at least 1000 times more likely than the second-best trio.
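The two-threshold decision (on $GRL_1$ and on ΔGRL = $GRL_1$ - $GRL_2$) can be sketched as below. Assumptions: candidate parent pairs are keyed by hypothetical identifiers, the ΔGRL threshold defaults to ln(1000), and a lone candidate trio (no runner-up) is treated as having an infinite ΔGRL, which the source does not specify.

```python
import math
from typing import Mapping, Optional, Tuple


def assign_parents(
    grl_by_trio: Mapping[Tuple[str, str], float],
    grl_threshold: float,
    delta_threshold: float = math.log(1000),
) -> Optional[Tuple[str, str]]:
    """Return the best parent pair if both GRL_1 and Delta-GRL pass, else None."""
    if not grl_by_trio:
        return None
    ranked = sorted(grl_by_trio.items(), key=lambda kv: kv[1], reverse=True)
    best_pair, grl1 = ranked[0]
    # Assumption: with a single candidate trio there is no GRL_2, so Delta-GRL is infinite.
    grl2 = ranked[1][1] if len(ranked) > 1 else -math.inf
    delta = grl1 - grl2
    if grl1 >= grl_threshold and delta >= delta_threshold:
        return best_pair
    return None
```

For instance, `assign_parents({("S1", "D1"): -1.319273381, ("S1", "D2"): -43.10243837}, -7.54823)` returns `("S1", "D1")`, whereas two near-equal top trios return `None` because ΔGRL stays below ln(1000) ≈ 6.9.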

It is to be understood that the more than one set of computerized genomic relationship parameters may be compared with the at least one set of predetermined genomic relationship parameters by means other than formula (5). In one embodiment, they are compared using a mathematical equation which is proportional to a multivariate normal density function. Thus, in one embodiment according to the present invention, the more than one set of computerized genomic relationship parameters is compared with overall means for more than one set of predetermined genomic relationship parameters using a mathematical equation which is proportional to the natural logarithm of a multivariate normal density function. One example of such an equation is formula (5).

To reduce the false positive rate and increase the true negative rate, a delta value may be calculated for each set of computerized genomic relationship parameters, the delta value representing the difference in genomic relationship likelihood between two offspring-parents trios, namely the two alternatives with the highest probability of representing true offspring-parents trios. The obtained delta value may then be compared with a delta threshold value.

Preferred embodiments, removal of obviously incorrect parent candidates

As an offspring inherits half of its alleles from each parent, there is necessarily a substantial relationship between true parents and their offspring. To speed up the assignment process, obviously incorrect parent candidates (with a relationship to the child lower than a given threshold) may be removed, which considerably reduces the number of trios to be tested per child. The threshold should preferably be set conservatively, so that true parents are unlikely to be removed.

In one embodiment according to the present invention, the at least one set of computerized genomic relationship parameters is calculated based upon genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$);

wherein

the genomic relationship between two organisms i and j is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(m_{it}-2p_t)(m_{jt}-2p_t)}{2p_t(1-p_t)} \qquad (1)$$

wherein

$r_{ij}$ represents the genomic relationship between organisms i and j; $m_{it}$ and $m_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the frequency of the reference allele at locus t; and c represents the total number of loci;

wherein

the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the offspring at locus t; ii) alleles present within the genome of the first potential parent of the offspring at locus t; and iii) alleles present within the genome of the second potential parent of the offspring at locus t. If the genomic relationship, calculated according to formula (1), between the offspring and the first potential parent of the offspring is < 0.25, the first potential parent is considered not to be a true parent of the offspring. Similarly, if the genomic relationship, calculated according to formula (1), between the offspring and the second potential parent of the offspring is < 0.25, the second potential parent is considered not to be a true parent of the offspring. Thus, a conservative threshold value of 0.25 may be used to exclude obviously incorrect parent candidates.
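The relationship calculation and the 0.25 pre-filter can be sketched together. This assumes formula (1) is the per-locus standardised average (1/c) Σ (m_it - 2p_t)(m_jt - 2p_t) / (2p_t(1 - p_t)), that genotypes are coded as 0/1/2 copies of the reference allele, and that all loci are polymorphic (0 < p_t < 1). Candidate pairs are unordered pairs of distinct candidates; `genomic_relationship` and `candidate_trios` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def genomic_relationship(m_i: Sequence[int], m_j: Sequence[int],
                         p: Sequence[float]) -> float:
    """Assumed formula (1): average over c loci of the standardised cross-products."""
    c = len(p)
    total = 0.0
    for mi, mj, pt in zip(m_i, m_j, p):
        total += (mi - 2 * pt) * (mj - 2 * pt) / (2 * pt * (1 - pt))
    return total / c


def candidate_trios(
    offspring: Sequence[int],
    candidates: Dict[str, Sequence[int]],
    p: Sequence[float],
    min_r: float = 0.25,
) -> List[Tuple[str, str]]:
    """Drop candidates with r(O, P) < min_r, then pair the remaining candidates."""
    kept = [name for name, g in candidates.items()
            if genomic_relationship(offspring, g, p) >= min_r]
    return list(combinations(sorted(kept), 2))
```

Filtering before pairing matters because the number of candidate pairs grows quadratically with the number of candidates, so each removed candidate saves many GRL evaluations.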

Alternative embodiment, training predetermined genomic relationship parameters

As previously disclosed, the at least one set of predetermined genomic relationship parameters may be set at fixed threshold value(s) or the at least one set of predetermined genomic relationship parameters may be calculated based upon alleles present within the genome of a control offspring (o), a first true parent (TP1) of the control offspring and a second true parent (TP2) of the control offspring for selected loci.

As an alternative to the above, the at least one set of predetermined genomic relationship parameters may be determined by an iterative two-step method, which is briefly described below.

Step 1, allele dropping: Random matings between organisms in a dataset are performed in silico to produce simulated offspring. The dataset contains a sizeable number of true trios (i.e. multiple true child-parent trios). Not all children (unknown individuals) are required to have parents present in the data, as long as some true (albeit unknown) trios are present. For simplicity, all loci are assumed to be inherited independently. For dense marker data, this means that the in silico parent-offspring relationships will have the same expectation as true parent-offspring relationships, but a much smaller variance. In reality, the effective number of segments inherited from parent to offspring is rather limited (due to the limited recombination rate), and more realistic parent-offspring relationships may be achieved by using a subsample of the loci (e.g., 500 loci). The simulated trios are then used for initial estimates of the predetermined genomic relationship parameters. Randomly mating two individuals might produce highly inbred children, and in some cases an individual may even be mated with itself (i.e. selfing), but the method accounts for this. The assignment threshold value is then chosen by ordering the in silico generated GRL values in ascending order and picking the value at the 0.1% position; e.g. if there are 50 000 GRL values ordered from lowest to highest, the threshold will be the value at position 50 (50 000 × 0.001).

GRL values below this threshold are considered to indicate false trios. Initial estimates for μ and Σ are obtained from the simulated parent-offspring trios.

Step 2, assignment iteration: Now, real data are analyzed. Trios are initially assigned using the GRL method with the predetermined genomic relationship parameters estimated in step 1. Because these parameters are estimated from idealized simulated trios, the initial assignments will be highly conservative due to the low variance estimates obtained from allele dropping. The parameters (except the GRL threshold) are then re-estimated using the newly assigned trios from the real data and used as the basis of the next assignment iteration. When all (or most) true trios have been included in the iteration assignment, there are two possibilities: (1) no further trios are assigned, due to the difference in GRL values between the worst true trio and the best false trio, and the iteration stops; or (2) false trios (i.e. trios where the parents are incorrect) are assigned and included in the parameter estimation for the next iteration, drastically increasing the residual variances. Due to the increased variance estimates in (2), false trios will start to seem plausible and the ΔGRL values will decrease, effectively reducing the number of assignments. Because of this behavior, the iteration process stops if the number of assignments in iteration n is equal to or less than that of iteration n-1. Final parameter estimates from the iteration process are always taken from iteration n-1 (the iteration with the most assignments).
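The step 2 stopping rule can be sketched as a generic loop. The `assign` and `estimate` callables are hypothetical placeholders for "assign trios with the current parameters" and "re-estimate μ and Σ from the assigned trios"; the GRL threshold is assumed to live inside `assign` and stay fixed, as the text describes.

```python
from typing import Callable, List, Tuple, TypeVar

Params = TypeVar("Params")
Trio = Tuple[str, str, str]  # (offspring, parent 1, parent 2)


def iterate_assignments(
    initial_params: Params,
    assign: Callable[[Params], List[Trio]],
    estimate: Callable[[List[Trio]], Params],
    max_iter: int = 50,
) -> Tuple[Params, List[Trio]]:
    """Re-estimate parameters until the number of assignments stops increasing.

    Returns the parameters and assignments of the iteration with the most
    assignments, i.e. iteration n-1 when the loop stops at iteration n.
    """
    best = (initial_params, assign(initial_params))
    for _ in range(max_iter):
        params = estimate(best[1])          # re-estimate from the current assignments
        assigned = assign(params)
        if len(assigned) <= len(best[1]):   # no gain over the previous iteration: stop
            break
        best = (params, assigned)
    return best
```

Keeping the previous iteration's result, rather than the last one computed, implements the rule that final estimates come from iteration n-1. `max_iter` is an added safety bound not mentioned in the source.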

A potential problem arises when there are few or no true trios with which to estimate the GRL parameters in the initial round of step 2. In this case, GRL trio assignments can be performed using the parameter estimates from step 1, but if many genetic markers were used for allele dropping, this will result in very restrictive assignments, with few or no false positives and many false negatives. This is because the allele dropping in step 1 assumes independent markers, which results in a very low residual variance and hence restrictive assignments when applied to real organisms, in which the markers are clearly not independent; this underlines the necessity of running the step 2 iterations to re-estimate the parameters. One possible solution is to use fewer markers when performing allele dropping in step 1. Another solution is to use predefined parameter estimates from a previous training session in which the genotype error rates and call rates were similar to those of the dataset for which training cannot be performed.
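Step 1 (allele dropping on a subsample of loci, then the 0.1% threshold) can be sketched as follows. Assumptions: genotypes are 0/1/2 copies of the reference allele, each locus is transmitted independently as step 1 states, and the threshold is the value at 1-based position round(n × 0.001) of the ascending GRL list, which reproduces the 50 000 → position 50 example. Computing GRL for the simulated trios is not shown, and all names are hypothetical.

```python
import random
from typing import List, Sequence


def transmit(m: int, rng: random.Random) -> int:
    """Reference-allele copies (0 or 1) passed on by a parent carrying m copies."""
    if m == 1:                      # heterozygote: each allele with probability 1/2
        return 1 if rng.random() < 0.5 else 0
    return m // 2                   # homozygote: 0 -> 0, 2 -> 1


def drop_alleles(parent1: Sequence[int], parent2: Sequence[int],
                 rng: random.Random) -> List[int]:
    """Simulate one offspring, treating every locus as inherited independently."""
    return [transmit(a, rng) + transmit(b, rng) for a, b in zip(parent1, parent2)]


def subsample_loci(n_loci: int, n_keep: int, rng: random.Random) -> List[int]:
    """Indices of a random subset of loci (e.g. 500), mimicking fewer inherited segments."""
    return sorted(rng.sample(range(n_loci), n_keep))


def threshold_at_fraction(grl_values: Sequence[float], fraction: float = 0.001) -> float:
    """Value at 1-based position round(n * fraction) after sorting ascending."""
    ordered = sorted(grl_values)
    position = max(1, round(len(ordered) * fraction))
    return ordered[position - 1]
```

Passing the same genotype twice, `drop_alleles(g, g, rng)`, simulates selfing. Restricting the parents' genotypes to `subsample_loci(...)` before dropping alleles is one way to inflate the simulated parent-offspring variance towards realistic levels.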

A second aspect of the present invention relates to a computer program product residing on a computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to the first aspect of the present invention.

A preferred embodiment according to the second aspect of the present invention relates to a computer program product residing on a computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising: a) receiving experimental data comprising data related to alleles present within the genome of:

i) the offspring (O);

ii) a first potential parent (P1) of the offspring; and

iii) a second potential parent (P2) of the offspring;

for selected loci;

b) processing the experimental data to generate at least one set of computerized genomic relationship parameters;

c) receiving control data comprising at least one set of predetermined genomic relationship parameters; and

d) comparing the at least one set of computerized genomic relationship parameters with the at least one set of predetermined genomic relationship parameters in order to determine whether the first potential parent of the offspring and the second potential parent of the offspring are the true parents of the offspring; wherein

- the experimental data comprises data relating to number of copies of a reference allele present for each of the selected loci; wherein the reference allele for a given locus is selected from the group consisting of i) alleles present within the genome of the offspring at the given locus; ii) alleles present within the genome of the first potential parent of the offspring at the given locus; and iii) alleles present within the genome of the second potential parent of the offspring at the given locus;

- the at least one set of computerized genomic relationship parameters is calculated based upon the genomic relationship between:

i) the offspring and the first potential parent of the offspring ($r_{O,P1}$);

ii) the offspring and the second potential parent of the offspring ($r_{O,P2}$);

iii) the offspring and itself ($r_{O,O}$);

iv) the first potential parent of the offspring and the second potential parent of the offspring ($r_{P1,P2}$);

v) the first potential parent of the offspring and itself ($r_{P1,P1}$); and

vi) the second potential parent of the offspring and itself ($r_{P2,P2}$);

the genomic relationship between the organisms referred to in i), ii), iii), iv), v), and/or vi) is calculated on the basis of at least the number of copies of a reference allele present for each of the selected loci; and

- the genomic relationship does not assume independent loci.

A third aspect of the present invention relates to a computing system comprising a processor and a memory configured to perform the method according to the first aspect of the present invention.

In one embodiment according to the third aspect of the present invention, the computing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A fourth aspect of the present invention relates to a computing system comprising a processor configured to perform the method according to the first aspect of the present invention.

In one embodiment according to the fourth aspect of the present invention, the computing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A fifth aspect of the present invention relates to a data processing system comprising means for carrying out the method according to the first aspect of the present invention.

In one embodiment according to the fifth aspect of the present invention, the data processing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

A sixth aspect of the present invention relates to a data processing system comprising means for carrying out step a), means for carrying out step b), means for carrying out step c) and/or means for carrying out step d) of the method according to the first aspect of the present invention.

In one embodiment according to the sixth aspect of the present invention, the data processing system is for assigning both parents (P1 and P2) to an offspring (O) based on genomic relationship.

General

As will be appreciated by one skilled in the art, the present disclosure may be embodied as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, the present disclosure may take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer-usable or computer-readable medium may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. The computer-usable or computer-readable medium may also be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer-usable program code may be transmitted using any appropriate medium, including but not limited to the Internet, wireline, optical fiber cable, RF, etc.

Computer program code for carrying out operations of the present disclosure may be written in an object-oriented programming language such as Java, Smalltalk, C++ or the like. However, the computer program code for carrying out operations of the present disclosure may also be written in conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through a local area network, a wide area network or the Internet.

CERTAIN REFERENCES CITED IN THE APPLICATION

Campbell, D., Duchesne, P., & Bernatchez, L. (2003). AFLP utility for population assignment studies: analytical investigation and empirical comparison with microsatellites. Mol Ecol, 12(7), 1979-1991.

Goddard, M. E., Hayes, B. J., & Meuwissen, T. H. (2011). Using the genomic relationship matrix to predict the accuracy of genomic selection. J Anim Breed Genet, 128(6), 409-421. doi: 10.1111/j.1439-0388.2011.00964.x

Hayes, B. J. (2011). Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data. J Dairy Sci, 94(4), 2114-2117. doi: 10.3168/jds.2010-3896

Heaton, M. P., Leymaster, K. A., Kalbfleisch, T. S., Kijas, J. W., Clarke, S. M., McEwan, J., . . . International Sheep Genomics Consortium (2014). SNPs for parentage testing and traceability in globally diverse breeds of sheep. PLoS One, 9(4), e94851. doi: 10.1371/journal.pone.0094851

Jones, A. G., Small, C. M., Paczolt, K. A., & Ratterman, N. L. (2010). A practical guide to methods of parentage analysis. Mol Ecol Resour, 10(1), 6-30. doi: 10.1111/j.1755-0998.2009.02778.x

Jones, O. R., & Wang, J. (2010). COLONY: a program for parentage and sibship inference from multilocus genotype data. Mol Ecol Resour, 10(3), 551-555. doi: 10.1111/j.1755-0998.2009.02787.x

Kalinowski, S. T., Taper, M. L., & Marshall, T. C. (2007). Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol, 16(5), 1099-1106. doi: 10.1111/j.1365-294X.2007.03089.x

Kalinowski, S. T., Taper, M. L., & Marshall, T. C. (2010). Corrigendum. Mol Ecol, 19(7), 1. doi: 10.1111/j.1365-294X.2010.04544.x

Marshall, T. C., Slate, J., Kruuk, L. E., & Pemberton, J. M. (1998). Statistical confidence for likelihood-based paternity inference in natural populations. Mol Ecol, 7(5), 639-655.

Morrissey, M. B., & Wilson, A. J. (2005). The potential costs of accounting for genotypic errors in molecular parentage analyses. Mol Ecol, 14(13), 4111-4121. doi: 10.1111/j.1365-294X.2005.02708.x

Purfield, D. C., McClure, M., & Berry, D. P. (2016). Justification for setting the individual animal genotype call rate threshold at eighty-five percent. J Anim Sci, 94(11), 4558-4569. doi: 10.2527/jas.2016-0802

Sargolzaei, M., & Schenkel, F. S. (2009). QMSim: a large-scale genome simulator for livestock. Bioinformatics, 25(5), 680-681. doi: 10.1093/bioinformatics/btp045

Strucken, E. M., Lee, S. H., Lee, H. K., Song, K. D., Gibson, J. P., & Gondro, C. (2016). How many markers are enough? Factors influencing parentage testing in different livestock populations. J Anim Breed Genet, 133(1), 13-23. doi: 10.1111/jbg.12179

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. J Dairy Sci, 91(11), 4414-4423. doi: 10.3168/jds.2007-0980

Waldbieser, G. C., & Bosworth, B. G. (2013). A standardized microsatellite marker panel for parentage and kinship analyses in channel catfish, Ictalurus punctatus. Anim Genet, 44(4), 476-479. doi: 10.1111/age.12017

Having generally described this invention, a further understanding can be obtained by reference to certain specific examples, which are provided herein for purposes of illustration only, and are not intended to be limiting unless otherwise specified.

EXAMPLES

Example 1a-d:

Example 1a describes how control data, i.e. the predetermined genomic relationship parameters, are determined. Example 1b describes the determination of computerized genomic relationship parameters for a trio consisting of an offspring and its true parents, and example 1c describes the determination of computerized genomic relationship parameters for a trio consisting of an offspring, one true parent and one false parent. Example 1d illustrates how the computer-implemented method may be applied to differentiate between true and false trios.

Example 1a: Predetermined genomic relationship parameters

Predetermined genomic relationship parameters were determined on the basis of data from 12175 trios. A total of 36578 loci were analyzed for each organism in these trios. The allele frequency at each locus was calculated by summing the number of copies of the reference allele over all genotyped organisms and dividing this sum by twice the number of organisms. All of the selected loci were polymorphic.
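The allele frequency calculation and the polymorphism check can be sketched as below, assuming one row of 0/1/2 reference-allele counts per organism; the function names are hypothetical.

```python
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Reference-allele frequency per locus: total copies / (2 * number of organisms)."""
    n_organisms = len(genotypes)
    n_loci = len(genotypes[0])
    return [sum(row[t] for row in genotypes) / (2 * n_organisms) for t in range(n_loci)]


def polymorphic_loci(freqs: Sequence[float]) -> List[int]:
    """Indices of loci whose frequency is strictly between 0 and 1."""
    return [t for t, p in enumerate(freqs) if 0.0 < p < 1.0]
```

Only polymorphic loci can enter formula (1), since its denominator 2p_t(1 - p_t) is zero when p_t is 0 or 1.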

Each trio consists of a control offspring, a first parent of the control offspring and a second parent of the control offspring.

Actual genomic relationship

Actual genomic relationship between:

i) the control offspring and the first parent of the control offspring;

ii) the control offspring and the second parent of the control offspring;

iii) the control offspring and itself;

iv) the first parent of the control offspring and the second parent of the control offspring;

v) the first parent of the control offspring and itself; and

vi) the second parent of the control offspring and itself;

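is calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(m_{it}-2p_t)(m_{jt}-2p_t)}{2p_t(1-p_t)} \qquad (1)$$

wherein

$r_{ij}$ represents the genomic relationship between organisms i and j; $m_{it}$ and $m_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the frequency of the reference allele at locus t; and c represents the total number of loci;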

wherein

the reference allele at locus t is selected from the group consisting of i) alleles present within the genome of the control offspring at locus t; ii) alleles present within the genome of the first true parent of the control offspring at locus t; and iii) alleles present within the genome of the second true parent of the control offspring at locus t.

$m_{it}$ and $m_{jt}$ are independently selected from:

- "2" for locus t if the organism has two copies of the reference allele at that locus;

- "1" for locus t if the organism has one copy of the reference allele at that locus; or

- "0" for locus t if the organism has no copies of the reference allele at that locus.

Expected genomic relationship

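The expected genomic relationships between two true parents (TP), here denoted P1 and P2, and an offspring (O) are:

$$\mathrm{E}(r_{O,P1}) = \tfrac{1}{2}\left(r_{P1,P1} + r_{P1,P2}\right), \qquad \mathrm{E}(r_{O,P2}) = \tfrac{1}{2}\left(r_{P2,P2} + r_{P1,P2}\right)$$

and the expected genomic relationship between the offspring and itself is:

$$\mathrm{E}(r_{O,O}) = 1 + \tfrac{1}{2}\,r_{P1,P2}$$

Residual relationship

Three residual relationships are defined as the differences between actual and expected genomic relationships:

$$e_{O,P1} = r_{O,P1} - \mathrm{E}(r_{O,P1}), \qquad e_{O,P2} = r_{O,P2} - \mathrm{E}(r_{O,P2}), \qquad e_{O,O} = r_{O,O} - \mathrm{E}(r_{O,O})$$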
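The residual computation can be sketched from the six relationships of a trio. This assumes the standard tabular-method expectations E(r_O,P1) = ½(r_P1,P1 + r_P1,P2), E(r_O,P2) = ½(r_P2,P2 + r_P1,P2) and E(r_O,O) = 1 + ½ r_P1,P2; the dictionary keys and the function name `residuals` are hypothetical.

```python
from typing import Dict


def residuals(r: Dict[str, float]) -> Dict[str, float]:
    """Residuals e = actual - expected for one candidate trio (assumed expectations)."""
    exp_o_p1 = 0.5 * (r["P1,P1"] + r["P1,P2"])   # offspring with parent 1
    exp_o_p2 = 0.5 * (r["P2,P2"] + r["P1,P2"])   # offspring with parent 2
    exp_o_o = 1.0 + 0.5 * r["P1,P2"]             # offspring with itself (1 + inbreeding)
    return {
        "O,P1": r["O,P1"] - exp_o_p1,
        "O,P2": r["O,P2"] - exp_o_p2,
        "O,O": r["O,O"] - exp_o_o,
    }
```

For unrelated, non-inbred true parents (r_P1,P1 = r_P2,P2 = 1, r_P1,P2 = 0), an offspring with r_O,P1 = r_O,P2 = 0.5 and r_O,O = 1 has all residuals equal to zero; replacing P2 with an unrelated animal (r_O,P2 = 0) drives e_O,P2 to -0.5.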

GRL threshold

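The vector of residual relationships for each tested trio ($\mathbf{e}$) is then compared with the overall means of the three residual relationships across all tested trios ($\boldsymbol{\mu}$) using formula (5):

$$GRL = -\frac{1}{2}\,(\mathbf{e}-\boldsymbol{\mu})^{\top}\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{e}-\boldsymbol{\mu}) \qquad (5)$$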

Each tested trio is thereby assigned a calculated GRL value. The threshold is set at 99%, which means that $GRL_{threshold}$ is set at the lowest GRL value among the highest 99% of GRL values. In the present example, $GRL_{threshold}$ is set at a value of -7.54823.

The predetermined genomic relationship parameters, $\boldsymbol{\mu}$, form a vector of overall means for the three residual relationships.

$\boldsymbol{\Sigma}$ is the 3x3 covariance matrix of the three residual variates. In this example, the best trio is required to be at least 1000 times more likely than the second-best trio in order to assign the parents to an offspring. Thus, $\Delta GRL_{threshold}$ is set at a fixed value of ln(1000) ≈ 6.9.

Table 1:

Example 1b: Computerized genomic relationship parameters for trio 1

In this example, the first and second potential parents of the offspring are the true parents of the offspring.

The actual genomic relationships among the first and second potential parents and the offspring are estimated according to formula (1). The results are presented in table 2.

Table 2:

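Assuming that the two potential parents of the offspring are its true parents, the expected genomic relationships between the two true parents (TP) and the offspring may be estimated according to the following formulas:

$$\mathrm{E}(r_{O,P1}) = \tfrac{1}{2}\left(r_{P1,P1} + r_{P1,P2}\right), \qquad \mathrm{E}(r_{O,P2}) = \tfrac{1}{2}\left(r_{P2,P2} + r_{P1,P2}\right)$$

The expected relationship between the offspring and itself may be estimated according to the following formula:

$$\mathrm{E}(r_{O,O}) = 1 + \tfrac{1}{2}\,r_{P1,P2}$$

The results are presented in table 3.

Table 3:

Three residual relationships are defined as the differences between actual and expected genomic relationships:

$$e_{O,P1} = r_{O,P1} - \mathrm{E}(r_{O,P1}), \qquad e_{O,P2} = r_{O,P2} - \mathrm{E}(r_{O,P2}), \qquad e_{O,O} = r_{O,O} - \mathrm{E}(r_{O,O})$$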

The residual relationships are presented in table 4.

Table 4:

The residual relationships would be expected to be close to "0" if the potential parents are the true parents of the offspring.

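The residual relationships, i.e. the at least one set of computerized genomic relationship parameters, may then be compared with the at least one set of predetermined genomic relationship parameters using formula (5), where $\mathbf{e}$ is the vector of residual relationships in table 4 and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the predetermined parameters from example 1a.

The result of this comparison is GRL = -1.319273381.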

Example 1c: Computerized genomic relationship parameters for trio 2

In this example the first potential parent of the offspring is a true parent of the offspring while the second potential parent of the offspring is a false parent of the offspring.

The actual genomic relationships among the first and second potential parents and the offspring are estimated according to formula (1). The results are presented in table 5.

Table 5:

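Assuming that the two potential parents of the offspring are its true parents, the expected genomic relationships between the two true parents (TP) and the offspring may be estimated according to the following formulas:

$$\mathrm{E}(r_{O,P1}) = \tfrac{1}{2}\left(r_{P1,P1} + r_{P1,P2}\right), \qquad \mathrm{E}(r_{O,P2}) = \tfrac{1}{2}\left(r_{P2,P2} + r_{P1,P2}\right)$$

The expected relationship between the offspring and itself may be estimated according to the following formula:

$$\mathrm{E}(r_{O,O}) = 1 + \tfrac{1}{2}\,r_{P1,P2}$$

The results are presented in table 6.

Table 6:

Three residual relationships are defined as the differences between actual and expected genomic relationships:

$$e_{O,P1} = r_{O,P1} - \mathrm{E}(r_{O,P1}), \qquad e_{O,P2} = r_{O,P2} - \mathrm{E}(r_{O,P2}), \qquad e_{O,O} = r_{O,O} - \mathrm{E}(r_{O,O})$$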

The residual relationships are presented in table 7.

Table 7:

The residual relationships would be expected to be close to "0" if the potential parents are the true parents of the offspring. The residual relationship $e_{O,P2}$ is not close to "0", clearly indicating that this potential parent is not a true parent of the offspring.

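The residual relationships, i.e. the at least one set of computerized genomic relationship parameters, may then be compared with the at least one set of predetermined genomic relationship parameters using formula (5), where $\mathbf{e}$ is the vector of residual relationships in table 7 and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the predetermined parameters from example 1a.

The result of this comparison is GRL = -43.10243837.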

Example 1d: Assignment

One trio was tested in example 1b and another trio in example 1c. The results of these two tests are two GRL values, one for each trio.

With reference to example 1b, the GRL value for trio 1 was found to be -1.319273381. With reference to example 1c, the GRL value for trio 2 was found to be -43.10243837.

The highest GRL value is referred to as $GRL_1$, the second highest GRL value is referred to as $GRL_2$, and $\Delta GRL = GRL_1 - GRL_2$.

The following two requirements need to be fulfilled in order to confirm that the potential parents of the offspring are the true parents of the offspring:

- $GRL_1 \geq GRL_{threshold}$; and

- $\Delta GRL \geq \Delta GRL_{threshold}$.

In view of the above data, the trio forming the basis for $GRL_1$ clearly fulfills both of the above requirements. The potential parents in trio 1 (example 1b) are therefore assigned as the true parents of the offspring.
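Written out with the GRL values from examples 1b and 1c and the thresholds from example 1a, the check is:

$$GRL_1 = -1.319273381 \;\geq\; GRL_{threshold} = -7.54823$$

$$\Delta GRL = GRL_1 - GRL_2 = -1.319273381 - (-43.10243837) = 41.783164989 \;\geq\; \ln(1000) \approx 6.908$$

Both inequalities hold by a wide margin, which is why trio 1 is assigned.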

The above examples 1a-1d clearly show that the computer-implemented method may be used to assign both parents to an offspring.

Example 2: Genomic relationship between two organisms.

The present example illustrates how the genomic relationship between two organisms may be calculated. The number of loci to be analyzed is set at 1. At locus t, animal 1 has the alleles A/A while animal 2 has the alleles A/T.

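The genomic relationship between two organisms may be calculated according to formula (1):

$$r_{ij} = \frac{1}{c}\sum_{t=1}^{c}\frac{(m_{it}-2p_t)(m_{jt}-2p_t)}{2p_t(1-p_t)} \qquad (1)$$

wherein

$r_{ij}$ represents the genomic relationship between organisms i and j; $m_{it}$ and $m_{jt}$ relate to the number of copies of a reference allele present for organisms i and j at locus t; $p_t$ represents the frequency of the reference allele at locus t; and c represents the total number of loci.

Since only the alleles at one locus are considered (c = 1), the formula reduces to:

$$r_{ij} = \frac{(m_{it}-2p_t)(m_{jt}-2p_t)}{2p_t(1-p_t)}$$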

The reference allele may be either A or T. In this example, A is selected as the reference allele. Thus, animal 1 (A/A) has 2 copies of the reference allele at locus t, while animal 2 (A/T) has 1 copy of the reference allele at locus t ($m_{it}$ = 2 and $m_{jt}$ = 1). Further, the reference allele frequency is set at 0.45.

In view of the above, the genomic relationship between animal 1 and animal 2 is:
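The single-locus arithmetic can be checked with a short sketch. It assumes the one-locus form of formula (1), (m_it - 2p_t)(m_jt - 2p_t) / (2p_t(1 - p_t)); `locus_relationship` and `copies` are hypothetical helper names.

```python
def locus_relationship(m_i: int, m_j: int, p: float) -> float:
    """Single-locus (c = 1) genomic relationship under the assumed formula (1)."""
    return (m_i - 2 * p) * (m_j - 2 * p) / (2 * p * (1 - p))


def copies(genotype: str, reference: str) -> int:
    """Count copies of the reference allele in a genotype written like 'A/T'."""
    return genotype.split("/").count(reference)


m1 = copies("A/A", "A")                 # animal 1: 2 copies
m2 = copies("A/T", "A")                 # animal 2: 1 copy
r12 = locus_relationship(m1, m2, 0.45)  # (1.1 * 0.1) / 0.495
print(round(r12, 4))                    # 0.2222
```

Under this form, (2 - 0.9)(1 - 0.9) / (2 × 0.45 × 0.55) = 0.11 / 0.495 = 2/9 ≈ 0.222. Choosing T as the reference allele instead (m = 0 and 1, p = 0.55) flips the sign of both factors and gives the same value, so the result does not depend on which allele is counted.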