Title:
SAMPLE PREPARATION WITH OPPOSITELY ORIENTED GUIDE POLYNUCLEOTIDES
Document Type and Number:
WIPO Patent Application WO/2022/243437
Kind Code:
A1
Abstract:
The invention relates to a method for preparing a sample (400) comprising sample polynucleotides (602) for sequencing, wherein at least one of the sample polynucleotides comprises a known sequence (605), the method comprising: protecting (102) the ends of the sample polynucleotides; contacting (104) at least first and second nucleoprotein particles (608, 702, 704) with the protected sample polynucleotides, wherein each first nucleoprotein particle comprises an effector protein and a first guide polynucleotide (706) and each second nucleoprotein particle comprises an effector protein and a second guide polynucleotide (708), wherein the sequences of the first and second guide polynucleotides are different and are selected such that

o the first guide polynucleotides cause the effector proteins to cut the known sequence at a first cleavage site (722);

o the second guide polynucleotides cause the effector proteins to cut the known sequence at a second cleavage site (724);

o wherein the first and second cleavage sites define an in-between sequence (614) as the part of the known sequence between the first and second cleavage sites (722, 724);

o whereby the first and second guide polynucleotides respectively comprise a binding sequence (715, 713) whose position and orientation within the known sequence is selected such that when the sample is contacted with a sequencing adapter (616), the sequencing adapter selectively binds to the ends of the sample polynucleotide fragments created by the cutting which do not comprise the in-between sequence;

and adding (106) sequencing adapters to the sample.

Inventors:
JANTO BENJAMIN (DE)
Application Number:
PCT/EP2022/063585
Publication Date:
November 24, 2022
Filing Date:
May 19, 2022
Assignee:
KWS SAAT SE & CO KGAA (DE)
KEYGENE NV (NL)
International Classes:
C12Q1/6806
Domestic Patent References:
WO2019224560A1 2019-11-28
WO2018236548A1 2018-12-27
Foreign References:
EP2771468A1 2014-09-03
EP3009511B1 2017-05-31
Other References:
FARBOUD BEHNOM ET AL: "Strategies for Efficient Genome Editing Using CRISPR-Cas9", vol. 211, 30 November 2018 (2018-11-30), pages 431 - 457, XP055857227, Retrieved from the Internet
GILPATRICK TIMOTHY ET AL: "Targeted nanopore sequencing with Cas9-guided adapter ligation", NATURE BIOTECHNOLOGY, GALE GROUP INC, NEW YORK, vol. 38, no. 4, 10 February 2020 (2020-02-10), pages 433 - 438, XP037086856, ISSN: 1087-0156, [retrieved on 20200210], DOI: 10.1038/S41587-020-0407-5
PAVLOPOULOS ANASTASIOS ED - BUGERT ET AL: "Identification of DNA sequences that flank a known region by inverse PCR", METHODS IN MOLECULAR BIOLOGY; [METHODS IN MOLECULAR BIOLOGY; ISSN 1064-3745; VOL. 1310], HUMANA PR, US, vol. 772, 2011, pages 267 - 275, XP009180772, ISBN: 978-1-61779-291-5, DOI: 10.1007/978-1-61779-228-1_16
ZENG TINGRU ET AL: "Identification of genomic insertion and flanking sequences of the transgenic drought-tolerant maize line "SbSNAC1-382" using the single-molecule real-time (SMRT) sequencing method", PLOS ONE, vol. 15, no. 4, 10 April 2020 (2020-04-10), pages e0226455, XP055856970, DOI: 10.1371/journal.pone.0226455
GILPATRICK, T.LEE, I.GRAHAM, J.E. ET AL.: "Targeted nanopore sequencing with Cas9-guided adapter ligation", NAT BIOTECHNOL, vol. 38, 2020, pages 433 - 438, XP037086856, Retrieved from the Internet DOI: 10.1038/s41587-020-0407-5
ZHANG ET AL.: "Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research", NAT PROTOC, vol. 7, no. 3, 16 February 2012 (2012-02-16), pages 467 - 78, XP055699592, DOI: 10.1038/nprot.2011.455
HENG LI: "Minimap2: pairwise alignment for nucleotide sequences", BIOINFORMATICS, vol. 34, 15 September 2018 (2018-09-15), pages 3094 - 3100, Retrieved from the Internet
Attorney, Agent or Firm:
NEDERLANDSCH OCTROOIBUREAU (NL)
Claims:
Claims

1. A method for preparing a sample (400) comprising sample polynucleotides (602) for sequencing, wherein at least one of the sample polynucleotides comprises a known sequence (605) of nucleotides, the method comprising:

- protecting (102) the ends of the sample polynucleotides,

- contacting (104) at least first and second nucleoprotein particles (608, 702, 704) with the protected sample polynucleotides, wherein each first nucleoprotein particle is a complex comprising a polynucleotide-guided effector protein and a first guide polynucleotide (706), wherein each second nucleoprotein particle is a complex comprising a polynucleotide-guided effector protein and a second guide polynucleotide (708), wherein the sequences of the first and second guide polynucleotides are different and are selected such that

o the first guide polynucleotides cause the effector proteins in the first nucleoprotein particles to cut the known sequence at a first cleavage site (722);

o the second guide polynucleotides cause the effector proteins in the second nucleoprotein particles to cut the known sequence at a second cleavage site (724);

o wherein the first and second cleavage sites define an in-between sequence (614) as the part of the known sequence between the first and second cleavage sites (722, 724);

o whereby the first and second guide polynucleotides respectively comprise a binding sequence (715, 713) whose position and orientation within the known sequence is selected such that when the sample is contacted with a sequencing adapter (616), the sequencing adapter selectively binds to the ends of the sample polynucleotide fragments created by the cutting which do not comprise the in-between sequence;

- adding (106) sequencing adapters to the sample to selectively bind directly or via intermediate adapters to the cut ends of the sample polynucleotide fragments which do not comprise the in-between sequence for enabling sequencing of the arms starting from the two cleavage sites in opposite direction towards sample polynucleotide regions outside of the known sequence.

2. The method according to claim 1, wherein the effector protein is a nuclease adapted to generate sticky ends and wherein the method further comprises contacting (202) the sample after the cutting and prior to the adding of the sequencing adapters with a sticky-end-filling polymerase and a mixture of dATP, dCTP, dGTP and dTTP nucleotides, the sticky-end-filling polymerase being in particular T4 polymerase or DNA Polymerase I Large Klenow Fragment or Taq polymerase.

3. The method according to any one of the previous claims, wherein the method further comprises contacting the sample prior to the adding of the adapter with a non-proofreading polymerase that has terminal transferase activity and dATP to add a single-A-tail or a polyA-tail (613) to all ends of the sample polynucleotides in the sample including the cut ends but not to the cut ends of the ones of the sample polynucleotide fragments to which one of the nucleoprotein particles is bound.

4. The method according to any one of the preceding claims, wherein the sequencing adapter or an intermediate adapter linking the sequencing adapter to the cut ends of the sample polynucleotide fragments which do not comprise the in-between sequence comprises a single T or a polyT tail.

5. The method according to any one of the preceding claims, wherein the binding sequences (715, 713) of the first and second guide polynucleotides are selected such that the first and second nucleoprotein particles remain bound after the cutting to the end of the one of the two sample polynucleotide fragments generated by the cut which comprises the in-between sequence (614), thereby preventing the sequencing adapter from binding to the cut ends to which one of the nucleoprotein particles is bound.

6. The method according to any of the preceding claims, wherein the sample preparation method is free of any procedure which would remove the nucleoprotein particles from the cut ends after the cutting and before the adding of the sequencing adapter, wherein the sample preparation method is in particular free of a deproteinization step for removing the nucleoprotein particles from the cut ends.

7. The method according to any of the preceding claims, wherein the sample polynucleotide is contacted with the first and the second nucleoprotein particles in the same reaction vessel and wherein the cutting of the known sequence in the first and second cleavage sites by the first and second nucleoprotein particles generates a central fragment comprising the in-between sequence and two remaining arms, wherein the nucleoprotein particles remain bound to the cut ends of the central fragment (614) after the cutting.

8. The method according to any of the preceding claims, wherein the sample polynucleotide is a double-stranded polynucleotide, wherein the sequencing adapter or the intermediate adapter comprises a binding moiety with a T-tail which is complementary to the A-tail; and wherein the sequencing adapter or the intermediate adapter is added to the sample in combination with a ligase, the ligase being adapted to create the binding between sample polynucleotide ends and the T-tails of the adapters selectively for double-strand sample polynucleotide ends comprising a non-protected 5’ strand end and an A-tailed 3’ strand end.

9. The method according to any one of the previous claims, wherein the protecting of the ends of the polynucleotides in the sample comprises a method selected from a group of methods consisting of:

- dephosphorylating the 5’ ends of the polynucleotides;

- contacting the sample with a non-proofreading polymerase that has terminal transferase activity, in particular Taq polymerase or terminal deoxynucleotidyl transferase, and chain-elongation-inhibition nucleotides, in particular dideoxynucleotides.

10. The method according to any one of the preceding claims, wherein the guide polynucleotides are guide RNAs, in particular synthetic CRISPR RNAs (crRNAs), single guide RNAs (sgRNAs), or combinations of a trans-activating CRISPR RNA (tracrRNA) and a synthetic CRISPR RNA (crRNA).

11. The method according to any one of the previous claims, wherein the polynucleotide-guided effector protein is a nuclease of a CRISPR/Cas system, in particular a nuclease selected from a group consisting of: Cas3, Cas4, Cas8a, Cas8b, Cas8c, Cas9, Cas10, Cas10d, Cas12a (synonym for Cpf1), Cas12b, Cas13, Csn2, Csf1, Cmr5, Csm2, Csy1, Cse1, C2c2, CasX, CasY or CasZ or Csm or a MADzyme such as MAD2 or MAD7, or any combination, variant, or catalytically active fragment of one or more of the aforementioned effector proteins.

12. The method according to any of the previous claims further comprising performing (108) the sequencing of at least parts of the sample polynucleotide arms starting from the bound sequencing adapters.

13. A method for designing guide polynucleotides adapted for enabling the sequencing of double-strand sample polynucleotide regions (604) surrounding a region with known sequence referred to as “known sequence” (605), the method comprising:

- identifying (302) a first protospacer adjacent motif (PAM) (716) within a maximum distance from a first end of the known sequence;

- identifying (304) a second protospacer adjacent motif (PAM) (714) within a maximum distance from a second end of the known sequence;

- identifying (306) a first protospacer sequence (720) within a predefined distance upstream or downstream of the first PAM;

- identifying (308) a second protospacer sequence (718) within a predefined distance upstream or downstream of the second PAM;

- wherein the identification of the first and second protospacers and PAMs comprises checking that the positions and orientations of the first and second PAMs and protospacers within the known sequence and relative to each other ensure that:

- a first guide polynucleotide comprising a binding sequence (713) complementary to the first protospacer causes an effector protein comprised in a first nucleoprotein particle to bind to the first protospacer and to cut the known sequence at a first cleavage site (722);

- a second guide polynucleotide comprising a binding sequence (715) complementary to the second protospacer causes an effector protein comprised in a second nucleoprotein particle to bind to the second protospacer and to cut the known sequence at a second cleavage site (724);

- wherein the first and second cleavage sites define an in-between sequence (614) as the part of the known sequence between the first and second cleavage sites (722, 724);

- whereby the binding sequences (715, 713) are oriented relative to each other such that the first and second nucleoprotein particles remain bound after the cutting to the end of the one of the two sample polynucleotide fragments generated by the cut which comprises the in-between sequence (614);

- creating (310) the first guide polynucleotide; and

- creating (312) the second guide polynucleotide.

14. The method according to claim 13, further comprising:

- contacting (314) the first guide polynucleotides with effector proteins to provide the first nucleoprotein particles;

- contacting (316) the second guide polynucleotides with effector proteins to provide the second nucleoprotein particles; and

- using (318) the first and second nucleoprotein particles in a method for preparing a sample according to any one of claims 1-12.

15. A method for characterizing the integration of an insertion sequence into a genomic polynucleotide of an organism, the method comprising:

- preparing a sample (400) of the genomic polynucleotide of the organism in accordance with any one of the previous claims 1-12, wherein the integrated insertion sequence comprises a region with known nucleotide sequence referred to as “known sequence” (605), wherein the first and second guide polynucleotides are selected such that they guide the effector proteins to first and second cleavage sites (722, 724) which lie within the known region, wherein the region between the first and second cleavage site is referred to as in-between sequence (614); and

- performing the adapter-based sequencing for obtaining sequence information selectively for the ones of the sample polynucleotide fragments generated by the cutting which do not comprise the in-between sequence.

16. The method of claim 15, further comprising:

- analyzing the sequence information of the sequenced fragments for determining if sequence information was obtained for at least parts of at least two different sample polynucleotide fragments (610, 612), wherein the absence of sequence information for at least one of the at least two fragments indicates that only a part of the insertion sequence was integrated in the genomic polynucleotide or that the insertion sequence was not integrated at all; and/or

- analyzing the sequence information of the sequenced fragments for determining if the obtained sequence information can be clustered into three or more different clusters of sequences, wherein the presence of three or more different sequence clusters indicates that the insertion sequence was integrated in two or more copies at multiple different genomic locations; and/or

- analyzing the sequence information of the sequenced fragments for determining the orientation of the 5’ and 3’ ends of the insertion sequence within the sample polynucleotide.

17. The method of claim 15 or 16, the insertion sequence being a DNA strand having been incorporated into the genome of an organism by means of genetic engineering, the organism being in particular a plant and/or the DNA strand being in particular the T-DNA portion of a Ti plasmid or a gene cassette.

Description:
SAMPLE PREPARATION WITH OPPOSITELY ORIENTED GUIDE POLYNUCLEOTIDES

TECHNICAL FIELD

The invention relates to the technical field of preparing polynucleotide samples, e.g. genomic DNA samples, for performing a sequencing reaction, in particular a targeted DNA sequencing reaction.

BACKGROUND

Conventional PCR amplicon-based sequencing techniques are restricted to relatively short target lengths compatible with amplification and do not perform well in genomic regions with a repetitive nature or high GC content. Using a CRISPR/Cas-based targeted sequencing approach, it is possible to achieve a significant increase in coverage of the targeted genomic region and an improved robustness against repetitive structures because amplification is not necessary. Coupled with a long read sequencing technology such as Oxford Nanopore Technologies (ONT) nanopore sequencing, Cas-targeted sequencing allows for the analysis of extremely large target regions and can even support the identification of epigenetic modifications.

The use of CRISPR-Cas nucleases for targeted sequencing based on an enrichment of a region of interest surrounded by known sequences has been described already, e.g. by Oxford Nanopore Technologies in their "Cas9 targeted sequencing" protocol (Version: ENR_9084_v109_revP_04Dec2018) or in the nCATS (nanopore Cas9-targeted sequencing) method by Gilpatrick et al. (Gilpatrick, T., Lee, I., Graham, J.E. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol 38, 433-438 (2020). https://doi.org/10.1038/s41587-020-0407-5). These methods describe the CRISPR-Cas9 enrichment of regions of interest using pairs of guide RNAs that face each other, which leads to sequencing of the region in between the two guide RNAs. This strategy is similar to PCR amplicon-based sequencing techniques, which rely on a PCR reaction using PCR primers that need to be oriented towards each other. It allows for targeted sequencing of regions that are surrounded by known sequences, which are required for the design of the guide RNAs.

SUMMARY

The invention provides for an improved method of preparing a sample for performing targeted sequencing, and a corresponding method for designing guide polynucleotides to be used for the sample preparation as specified in the independent patent claims. Embodiments of the invention are given in the dependent claims. Embodiments of the present invention can be freely combined with each other if they are not mutually exclusive.

In one aspect, the invention relates to a method for preparing a sample comprising sample polynucleotides for sequencing. At least one of the sample polynucleotides comprises a known polynucleotide sequence referred to as known sequence. The method comprises:

- protecting the ends of the sample polynucleotides,

- contacting at least first and second nucleoprotein particles with the protected sample polynucleotides, wherein each first nucleoprotein particle is a complex comprising a polynucleotide-guided effector protein and a first guide polynucleotide, wherein each second nucleoprotein particle is a complex comprising a polynucleotide-guided effector protein and a second guide polynucleotide, wherein the sequences of the first and second guide polynucleotides are different and are selected such that

o the first guide polynucleotides cause the effector proteins in the first nucleoprotein particles to cut the known sequence at a first cleavage site;

o the second guide polynucleotides cause the effector proteins in the second nucleoprotein particles to cut the known sequence at a second cleavage site;

o wherein the first and second cleavage sites define an in-between sequence as the part of the known sequence between the first and second cleavage sites;

o whereby the first and second guide polynucleotides respectively comprise a binding sequence whose position and orientation within the known sequence is selected such that when the sample is contacted with a sequencing adapter (616), the sequencing adapter selectively binds to the ends of the sample polynucleotide fragments created by the cutting which do not comprise the in-between sequence;

- adding sequencing adapters to the sample to selectively bind directly or via intermediate adapters to the cut ends of the sample polynucleotide fragments which do not comprise the in-between sequence for enabling sequencing of the arms starting from the two cleavage sites in opposite direction towards sample polynucleotide regions outside of the known sequence.

Using at least two different guide polynucleotides which are designed with orientations pointing away from each other may have the advantage that it is possible to sequence and characterize unknown regions of interest (that are not framed on both sides by known sequence). This was not possible with state-of-the-art approaches, which are based on guide RNAs facing each other.

In addition, by using two different guide polynucleotides targeting different, known target sequences within the sample polynucleotide, whereby the two guide polynucleotides “point away” from each other, it is possible to obtain information about the orientation of the in-between sequence relative to the regions of interest surrounding this in-between sequence.

Embodiments of the method may be used for the discovery and characterization of unknown sequences surrounding a short piece of known sequence. The use of an adapter-based sequencing approach such as nanopore sequencing allows obtaining long reads and speeding up sequence analysis. The obtained long-read sequencing data is less ambiguous than the sequencing data obtained by other, in particular undirected, sequencing approaches which typically generate shorter reads. As the described method uses a targeted sequencing approach, far fewer raw reads are needed than with a random shotgun approach (WGS), resulting in a reduced cost. It is important to note that this approach also does not require a previous PCR amplification step. Hence, all the limitations associated with PCR (length, amplification issues etc.) can be avoided. The sample preparation method described for embodiments of the invention is well suited for use in the field of gene editing in any organism, for identifying or confirming the location of any type of artificial/directed insertion or other modification.

This includes Agrobacterium-mediated T-DNA insertions in plants, insertions by bombardment, or Cas-mediated targeted modifications. The sample preparation and sequencing approach may also be applied to investigate "natural" insertions, such as viral integration analyses, transposon analyses, or in any scenario as a replacement for primer walking.

In a further beneficial aspect, the consumption of chemicals is reduced, and sequencing data noise can be avoided: allowing the nucleoprotein particles to remain bound to the ends of the in-between sequence fragment (which comprises a known polynucleotide sequence) saves time and resources as this region may not be sequenced. For example, the two different guide polynucleotides may be chosen such that the in-between sequence is at least 500, e.g., at least 1000 or even more than 10,000 nucleotides long. For example, this may allow sequencing the environment of large genomic insert elements such as viruses or transposons within multiple different genomic integration sites.

The applicant has observed that even though the sequencing reaction is performed without any amplification reaction, the obtained sequencing data has a high quality.

According to embodiments, the guide polynucleotides are designed to guide the effector protein to generate cuts within the known region such that the known region is split into two or more sample polynucleotide fragments (depending on the number of occurrences of the known sequence in the sample polynucleotide and depending on whether the cutting by the two different nucleoprotein particles is performed in the same or in different reaction vessels). Each of the sample polynucleotide fragments generated by the cutting will typically comprise at least a short portion of the known sequence, whereby in case the in-between sequence is cut out completely, this central fragment (which may not be sequenced) completely consists of the known sequence. Each of the other sample polynucleotide fragments will typically comprise unknown or insufficiently characterized sequence regions which are referred to as region-of-interest (ROI).

Embodiments of the method can be applied in various use-case scenarios. For example, the two types of guide polynucleotides can be adapted to guide the effector protein such that the two cuts are located within a sample polynucleotide region comprising a known nucleotide sequence (referred to herein as “known sequence”). The one or more sample polynucleotide regions comprising the known sequence can belong to the original (native, non-genetically modified) genome of an organism or can be part of an insertion polynucleotide integrated into the genome by means of a genetic engineering technique. The sample preparation method can be performed for preparing a sample of genomic DNA of said organism. Preparing the sample for adapter-based targeted sequencing can be performed to obtain accurate sequence information of a genomic DNA sequence for which no or only unreliable/inaccurate sequencing information is available.

For example, repetitive sequences such as transposon sequences often cannot be sequenced accurately with conventional sequencing approaches; in some cases, e.g. for statistical reasons, for some regions of the genome only a small number of reads is obtained.

According to some examples, one or more of the sample polynucleotide fragments generated by the cutting and having bound a sequencing adapter comprises a ROI within the genomic DNA of an organism which comprises:

-a repetitive sequence pattern and/or

-a genomic region which was sequenced previously and for which an insufficient number of reads was obtained in the previous sequencing; and/or

-a genomic region of unknown sequence.

According to some embodiments, the effector protein is a nuclease adapted to generate blunt ends. For example, the effector protein can be a Cas9.

According to other embodiments, the effector protein is a nuclease adapted to generate sticky ends. For example, the effector protein can be Cas12a, also known as “Cpf1”.

The method further comprises, when using a nuclease generating sticky ends, contacting the sample after the cutting and prior to the adding of the sequencing adapters with a sticky-end-filling polymerase and a mixture of dATP, dCTP, dGTP and dTTP nucleotides. The sticky-end-filling polymerase can be any polymerase but is in particular T4 polymerase, DNA Polymerase I Large Klenow Fragment, or a Taq polymerase.

The use of the endonuclease Cpf1 may be advantageous, since Cpf1 is a comparatively small enzyme whose guide polynucleotide consists only of a single RNA (crRNA), so no tracrRNA is needed to cut double-stranded DNA at a targeted site. Cpf1 has a different cleavage site than other nucleases such as Cas9 and creates a base overhang on one of the two DNA strands (sticky end).
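The sequence logic of the sticky-end filling step can be illustrated with a minimal Python sketch. It assumes a 5’ overhang (commonly reported for Cas12a, but not stated in the text above) and models only which bases a polymerase would add to the recessed 3’ end, namely the reverse complement of the overhang. The function name and the example overhang are hypothetical.

```python
# Minimal sketch (assumption: the staggered cut leaves a 5' overhang).
# A fill-in polymerase extends the recessed 3' end using the overhang as
# template, so the added bases are the reverse complement of the overhang.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fill_in(overhang_5p: str) -> str:
    """Return the bases (5'->3') added to the recessed 3' end."""
    return overhang_5p.upper().translate(COMPLEMENT)[::-1]


# Hypothetical 4-nt overhang, read 5'->3' on the protruding strand.
added = fill_in("AAGC")
print(added)
```

After the fill-in, both strands end at the same position, so the end can be treated like a blunt end in the subsequent A-tailing step.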

According to embodiments, the method further comprises contacting the sample prior to the adding of the adapter with a non-proofreading polymerase that has terminal transferase activity such as Taq polymerase or terminal deoxynucleotidyl transferase (TdT) and dATP to add an A-tail, i.e., a single A tail or a polyA tail, to all ends of the sample polynucleotides in the sample including the cut ends but not to the cut ends of the ones of the sample polynucleotide fragments to which one of the nucleoprotein particles is bound. This step may enable or facilitate adapter ligation. For example, the sequencing adapter may have a “T tail”, such as a single T tail or polyT tail. By adding A-tails to the ends of the sample polynucleotides, the sequencing adapter will be enabled to selectively bind to the ends of the sample polynucleotide fragments generated by the cutting by the effector proteins, because the ends of the sample polynucleotides which have already existed before the cutting have been protected.

According to embodiments, the sequencing adapter or an intermediate adapter used for ligating the sequencing adapter to specific sample polynucleotide ends selectively binds to polynucleotides which fulfil two requirements: a) the sample polynucleotide end must have a 5’ phosphate, and b) it must in addition have a 3’ overhang complementary to an overhang (tail) of the adapter.

As a consequence, the adapter will only bind to sample polynucleotide ends generated by the cutting and which do not lie adjacent to the in-between sequence, because the cut ends adjacent to the in-between sequence are blocked by the still-bound effector protein complex. In other words, the only sample polynucleotide ends that are not blocked and that have both a 5’ phosphate and a complementary 3’ tail are those (arms) liberated from the RNP cut.

This may be beneficial as these features may ensure that only the two sample polynucleotide fragments generated by the cutting which respectively comprise one of the two surrounding ROIs are (partially or completely) sequenced. Neither the cut ends adjacent to the in-between sequence nor the ends of the sample polynucleotide existing already before the cutting step fulfill both criteria.
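The selection logic described above can be summarized in a short Python sketch. It is only a bookkeeping model under stated assumptions: the pre-existing ends are taken to be protected by dephosphorylation (one of the protection options named in the claims), and the class, field and end names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DuplexEnd:
    name: str
    five_prime_phosphate: bool   # False for ends protected by dephosphorylation
    rnp_bound: bool = False      # True if the nucleoprotein particle stays bound
    a_tailed: bool = False


def a_tail(ends):
    """A-tail every end except those blocked by a bound nucleoprotein particle."""
    for end in ends:
        if not end.rnp_bound:
            end.a_tailed = True


def ligatable(end):
    """An adapter binds only to unblocked ends with a 5' phosphate and an A-tail."""
    return end.five_prime_phosphate and end.a_tailed and not end.rnp_bound


ends = [
    DuplexEnd("pre-existing end, first arm", five_prime_phosphate=False),
    DuplexEnd("cut end, first arm", five_prime_phosphate=True),
    DuplexEnd("cut end, central fragment (first site)", True, rnp_bound=True),
    DuplexEnd("cut end, central fragment (second site)", True, rnp_bound=True),
    DuplexEnd("cut end, second arm", five_prime_phosphate=True),
    DuplexEnd("pre-existing end, second arm", five_prime_phosphate=False),
]

a_tail(ends)
adapter_targets = [end.name for end in ends if ligatable(end)]
print(adapter_targets)
```

In this model only the two cut ends of the arms pass both checks, which mirrors the statement that neither the blocked ends nor the pre-existing ends fulfil both criteria.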

According to embodiments, also the sequencing adapter or the intermediate adapter fulfils the requirements a) and b), meaning that the adapter comprises a polynucleotide end with a 5’ phosphate and a 3’ polynucleotide tail complementary to the 3’ polynucleotide tail of the cut ends. Corresponding sequencing adapters are commercially available, e.g. from ONT.

According to embodiments, the sequencing adapter or an intermediate adapter linking the sequencing adapter to the cut ends of the sample polynucleotide fragments which do not comprise the in-between sequence comprises a single T or a polyT tail.

According to embodiments, the binding sequences of the first and second guide polynucleotides are selected such that the first and second nucleoprotein particles remain bound after the cutting to the end of the one of the two sample polynucleotide fragments generated by the cut which comprises the in-between sequence, thereby preventing the sequencing adapter from binding to the cut ends to which one of the nucleoprotein particles is bound.

According to embodiments, the position and orientation of the binding sequence of the guide polynucleotides are selected such that as a consequence of the cutting by an RNP comprising this guide polynucleotide, all or the majority of the protospacer nucleotides (being complementary to the binding sequence of the guide polynucleotide) are part of the one of the two fragments generated by this cut which comprises the in-between sequence (the sequence of the sample polynucleotide between the two cleavage sites of the two different types of RNPs). This may ensure that the RNPs/guide RNA will remain bound to the fragment which comprises the majority of the bases of the protospacer.

According to embodiments, the sample preparation method is free of any procedure which would remove the nucleoprotein particles from the cut ends after the cutting and before the adding of the sequencing adapter. In particular, the sample preparation method is free of a deproteinization step for removing the nucleoprotein particles from the cut ends.

According to embodiments, the two types of guide polynucleotide sequences are designed such that they cause the nucleoprotein particles to bind to the known sequence in the one sample polynucleotide in an orientation that the nucleoprotein particles remain bound to the cut ends of the sample polynucleotide fragments comprising the in-between sequence after the cutting.

This may be advantageous as the nucleoprotein complex prevents the sequencing adapter from binding to the ends of the in-between sequence which is already known. Applicant has observed that sequencing reactions mostly start from the cut ends of the fragments not comprising the in-between sequence and not having bound the nucleoprotein particles.
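How such an orientation check might look in practice is sketched below in Python. The sketch assumes a Cas9-like nuclease with an NGG PAM located 3’ of a 20-nt protospacer and a cut 3 bp upstream of the PAM; these parameters, the function names and the demo sequence are illustrative assumptions and are not taken from the text. A candidate is accepted for the first (left) cleavage site only if the majority of its protospacer lies to the right of the cut, and for the second (right) site only if the majority lies to the left, so that both particles would stay on the fragment comprising the in-between sequence.

```python
from dataclasses import dataclass

PAM_LEN = 3       # assumed NGG PAM
SPACER_LEN = 20   # assumed protospacer length
CUT_OFFSET = 3    # assumed distance (bp) between cut and PAM


@dataclass(frozen=True)
class Candidate:
    strand: str   # "+": PAM right of protospacer; "-": PAM left (top-strand view)
    start: int    # 0-based top-strand start of the protospacer
    cut: int      # cut lies between top-strand positions cut-1 and cut

    @property
    def bound_side(self) -> str:
        """Side of the cut holding the majority of protospacer nucleotides."""
        left = self.cut - self.start
        right = self.start + SPACER_LEN - self.cut
        return "left" if left > right else "right"


def find_candidates(seq: str):
    seq = seq.upper()
    found = []
    for i in range(len(seq) - PAM_LEN + 1):
        pam = seq[i:i + PAM_LEN]
        # NGG on the top strand: protospacer is the 20 nt to its left.
        if pam[1:] == "GG" and i >= SPACER_LEN:
            found.append(Candidate("+", i - SPACER_LEN, i - CUT_OFFSET))
        # CCN on the top strand (NGG on the bottom strand): protospacer to its right.
        if pam[:2] == "CC" and i + PAM_LEN + SPACER_LEN <= len(seq):
            found.append(Candidate("-", i + PAM_LEN, i + PAM_LEN + CUT_OFFSET))
    return found


def pick_outward_pair(known_seq: str):
    """Return the (first, second) candidates with the longest in-between sequence."""
    cands = find_candidates(known_seq)
    firsts = [c for c in cands if c.bound_side == "right"]
    seconds = [c for c in cands if c.bound_side == "left"]
    pairs = [(f, s) for f in firsts for s in seconds if f.cut < s.cut]
    return max(pairs, key=lambda p: p[1].cut - p[0].cut) if pairs else None


# Hypothetical known sequence: CCA + protospacer + filler + protospacer + AGG.
demo = "CCA" + "AT" * 10 + "T" * 10 + "AT" * 10 + "AGG"
pair = pick_outward_pair(demo)
print(pair)
```

A real design would add further filters (off-target checks, distance of the PAM from the ends of the known sequence); the sketch only covers the orientation criterion.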

According to embodiments, the nucleoprotein particles bound to the cut ends of the fragments comprising the in-between sequence prevent the adding of the single A tail or polyA tail to the ends of the central fragment.

According to embodiments, the sample polynucleotide is contacted with the first and the second nucleoprotein particles in the same reaction vessel. The cutting of the known sequence in the first and second cleavage sites by the first and second nucleoprotein particles generates a central fragment comprising the in-between sequence and two remaining arms, wherein the nucleoprotein particles remain bound to the two cut ends of the central fragment after the cutting. This embodiment is illustrated for example in figures 5 and 7.

According to other embodiments, the known sequence is cut in different reaction vessels into two fragments at different cleavage sites as illustrated in figure 6.

According to embodiments, the sample preparation method is free of any step of removing the nucleoprotein complexes from the sample polynucleotide. In particular, no deproteinization step is performed. Thereby, it is ensured that the RNP particles remain bound to the ends of the ones of the sample polynucleotide fragments generated by the cutting which comprise the in-between sequence (the border of the in-between sequence is determined by the two cleavage sites of the two different types of RNP complexes). Thereby, it is ensured that the adapter-based sequencing is performed in the two directions pointing away from this in-between sequence outwards to the ROI(s) and not towards the in-between region whose sequence is already known.

According to embodiments, the sample polynucleotide is a double-stranded polynucleotide. The sequencing adapter or the intermediate adapter comprises a binding moiety with a T-tail which is complementary to the A-tail. The sequencing adapter or the intermediate adapter is added to the sample in combination with a ligase; the ligase is adapted to create the binding between sample polynucleotide ends and the T-tails of the adapters selectively for double-strand sample polynucleotide ends comprising a non-protected 5’ strand end and an A-tailed 3’ strand end. For example, the ligase can be T4 ligase.

According to embodiments, the protecting of the ends of the polynucleotides in the sample comprises a method selected from a group of methods consisting of:

- dephosphorylating the 5’ ends of the polynucleotides; and

- contacting the sample with a non-proofreading polymerase that has terminal transferase activity and chain-elongation-inhibition nucleotides, in particular dideoxynucleotides; the “non-proofreading polymerase that has terminal transferase activity” can in particular be a Taq polymerase or a terminal transferase.

Accordingly, a non-protected end of a sample polynucleotide is:

- (if the protection is based on dephosphorylation): an end of a double-strand polynucleotide whose 5’strand end comprises a reactive phosphate group; or

- (if the protection is based on an elongation inhibition nucleotide): an end of a double-strand polynucleotide whose 5’ strand ends with an elongation-inhibition nucleotide, in particular a dideoxynucleotide; dideoxynucleotides are chain-elongation inhibitors of DNA polymerase. They are also known as 2',3'-dideoxynucleotides because both the 2' and 3' positions on the ribose lack hydroxyl groups, and are abbreviated as ddNTPs (ddGTP, ddATP, ddTTP and ddCTP).

Hence, multiple different ways of chemically altering the ends of the polynucleotides in the sample for protecting the ends exist.

The above-mentioned first option of protecting the ends may be based on the fact that the 5' ends of a polynucleotide are normally phosphorylated. By dephosphorylating the ends of the polynucleotides and when the sample polynucleotide is cut after the dephosphorylation step, the cut ends will be the only polynucleotide ends in the sample still comprising the phosphate at the 5’ ends. As a consequence, an adapter may be attached (e.g. ligated) to the cut ends but not to the dephosphorylated ends. This enables an adapter to be selectively covalently attached to the cut ends of the target polynucleotide. Dephosphorylation of the ends can be achieved by adding a dephosphorylase to the sample and then inactivating the dephosphorylase, e.g. by heating the sample prior to addition of the cutting enzyme.

The second option of protecting the ends may rely on the idea of attaching at least one elongation-inhibition nucleotide to each end of the sample polynucleotide. Thereby, the 5’ ends will become chemically blocked such that chain elongation at the blocked 5’ ends will no longer be possible.

According to embodiments, the guide polynucleotides are guide RNAs, in particular crRNAs, single guide RNAs (sgRNAs), or combinations of a trans-activating CRISPR RNA (tracrRNA) and a synthetic CRISPR RNA (crRNA). The combination of the tracrRNA and the crRNA may be based on an annealing step.

According to embodiments, the polynucleotide-guided effector protein is an RNA-guided nuclease or a DNA-guided nuclease. According to embodiments, the polynucleotide-guided effector protein is a nuclease of a CRISPR/Cas system, and in particular a nuclease selected from a group comprising the following: Cas3, Cas4, Cas8a, Cas8b, Cas8c, Cas9, Cas10, Cas10d, Cas12a (synonym for Cpf1), Cas12b, Cas13, Csn2, Csf1, Cmr5, Csm2, Csy1, Cse1, C2c2, CasX, CasY or CasZ or Csm or a MADzyme such as MAD2 or MAD7, or any combination, variant, or catalytically active fragment of one or more of the aforementioned effector proteins.

According to embodiments, the method further comprises performing the sequencing of at least parts of the sample polynucleotides starting from the bound sequencing adapters. The sequencing length may depend on the sequencing device and technique used.

For example, the sequencing can be performed using a nanopore sequencing technique in a sequencing device. Before performing the sequencing, additional steps may be performed for cleaning the polynucleotide fragments and/or for selectively extracting those sample polynucleotide fragments which have bound a sequencing adapter at one of their ends, and only the extracted complexes of sample polynucleotide fragment and sequencing adapter may be put into the sequencing device.

In a further aspect, the invention relates to a method for designing guide polynucleotides adapted for enabling the sequencing of double-strand sample polynucleotide regions surrounding a region with a “known sequence”. The method comprises:

- identifying a first protospacer adjacent motif (PAM) within a maximum distance from a first end of the known sequence;

- identifying a second protospacer adjacent motif (PAM) within a maximum distance from a second end of the known sequence;

- identifying a first protospacer sequence within a predefined distance upstream or downstream of the first PAM;

- identifying a second protospacer sequence within a predefined distance upstream or downstream of the second PAM;

wherein the identification of the first and second protospacers and PAMs comprises checking that the positions and orientations of the first and second PAMs and protospacers within the known sequence and relative to each other ensure that:

- a first guide polynucleotide comprising a binding sequence complementary to the first protospacer causes an effector protein comprised in a first nucleoprotein particle to bind to the first protospacer and to cut the known sequence at a first cleavage site;

- the second guide polynucleotide comprising a binding sequence complementary to the second protospacer causes an effector protein comprised in a second nucleoprotein particle to bind to the second protospacer and to cut the known sequence at a second cleavage site;

- wherein the first and second cleavage sites define an in-between sequence as the part of the known sequence between the first and second cleavage sites; and

- whereby the binding sequences are oriented relative to each other such that the first and second nucleoprotein particles remain bound after the cutting to the end of the one of the two sample polynucleotide fragments generated by the cut which comprises the in-between sequence;

- creating the first guide polynucleotide; and

- creating the second guide polynucleotide.

For example, the step of creating a guide polynucleotide can comprise synthesizing a crRNA, or synthesizing a single guide RNA (sgRNA), or synthesizing a combination of a crRNA and a tracrRNA to be annealed to form the guide RNA. The sgRNA or crRNA is synthesized such that it comprises a binding sequence which is homologous to the respectively identified “sequence of homology” (and hence is able to bind to a protospacer complementary to the binding sequence).
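The design steps above can be sketched for one concrete case. The following minimal example assumes a Cas9-like effector with an NGG PAM located 3' of a 20-nt target and a blunt cut 3 bp from the PAM; these parameters, the function names and the dictionary layout are assumptions for illustration, and other effector proteins (e.g. Cpf1) have different PAMs and cut geometries. The sketch searches the top strand of the known sequence for sites whose PAM points outwards, so that the larger part of each protospacer lies on the in-between side of its cut.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_outward_facing_sites(known_seq, spacer_len=20, cut_offset=3):
    """List candidate first (left) and second (right) cleavage sites.

    Left sites read CCN + target on the top strand (PAM on the bottom
    strand, pointing left); right sites read target + NGG (PAM pointing
    right). `cut` is the index of the first base right of the cut.
    """
    seq = known_seq.upper()
    site_len = spacer_len + 3
    left_sites, right_sites = [], []
    for i in range(len(seq) - site_len + 1):
        window = seq[i:i + site_len]
        if window[:2] == "CC":
            # Guide spacer matches the bottom strand next to its NGG PAM.
            left_sites.append({"cut": i + 3 + cut_offset,
                               "spacer": reverse_complement(window[3:])})
        if window[-2:] == "GG":
            right_sites.append({"cut": i + spacer_len - cut_offset,
                                "spacer": window[:spacer_len]})
    return left_sites, right_sites


def pick_pair(left_sites, right_sites, spacer_len=20, cut_offset=3):
    """Pick the outermost left/right pair, or None if none is usable.

    The two cuts must be far enough apart that the PAM-distal parts of
    both protospacers fit inside the in-between sequence.
    """
    if not left_sites or not right_sites:
        return None
    first = min(left_sites, key=lambda s: s["cut"])
    second = max(right_sites, key=lambda s: s["cut"])
    if second["cut"] - first["cut"] < 2 * (spacer_len - cut_offset):
        return None
    return first, second
```

A real design would additionally screen the candidate spacers against the wild-type genome for off-target homology, which this sketch does not attempt.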

According to embodiments, the method further comprises:

- contacting the first guide polynucleotides with effector proteins to provide the first nucleoprotein particles;

- contacting the second guide polynucleotides with effector proteins to provide the second nucleoprotein particles; and

- using the first and second nucleoprotein particles in a method for preparing a sample according to any one of the embodiments and examples described herein.

In a further aspect, the invention relates to a method for characterizing the integration of an insertion sequence into a genomic polynucleotide of an organism. The method comprises preparing a sample of the genomic polynucleotide of the organism in accordance with any one of the embodiments and examples described herein. The integrated insertion sequence comprises a region with known nucleotide sequence referred to as “known sequence”. The first and second guide polynucleotides are selected such that they guide the effector proteins to first and second cleavage sites which lie within the known sequence. The region between the first and second cleavage site is referred to as in-between sequence. Then, the adapter-based sequencing is performed for obtaining sequence information selectively for the ones of the sample polynucleotide fragments generated by the cutting which do not comprise the in-between sequence.

These features may be highly advantageous in many use case scenarios where an insertion polynucleotide of a known sequence, typically a piece of DNA, is integrated into the genome of an organism at an unknown or insufficiently characterized locus. For example, there exist a plurality of techniques for randomly integrating a particular piece of DNA into the genome of an organism, e.g., a plant. The piece of DNA to be integrated may comprise a particular gene which may encode favorable traits such as disease resistance, drought resistance, increased nutritional value, increased growth rates, etc. Embodiments of the method may be used in order to determine and characterize the genomic environment of the insertion site which may have an impact on the expression of the integrated gene(s) and on other factors. For example, it may happen that a desired gene was successfully integrated but is never expressed due to inhibitory sequences in spatial proximity to the integration site. It may happen that the desired gene was integrated at multiple sites in the genome, which may or may not be considered desirable.

The technique used for integrating the DNA sequence into the genome may in some cases not be based on a random insertion but on a technique for targeted genomic integration. In this case, the adapter-based sequencing starting from the cut ends which were ligated directly or indirectly to a sequencing adapter may allow verifying that the integration polynucleotide was indeed inserted in the locus of choice and did not integrate into another genomic region.

According to some embodiments, the method further comprises analyzing the sequence information of the sequenced fragments for determining if sequence information was obtained for at least parts of at least two different sample polynucleotide fragments. The absence of sequence information for at least one of the at least two fragments indicates that only a part of the insertion sequence was integrated in the genomic polynucleotide or that the insertion sequence was not integrated at all.

For example, a chromosome may have been cut at the two cleavage sites of an integrated T-DNA into a central fragment comprising the in-between sequence and two remaining arms. Each of the two arms may comprise a portion of the known sequence, e.g., a sequence portion at one end of integrated T-DNA or of an integrated gene cassette which is known. This known portion will be part of the sequencing reads obtained during the adapter-based sequencing of at least parts of the two generated arms. In case the insertion polynucleotide, e.g., the T-DNA, was completely integrated into a single genomic site, there will be basically two groups of sequencing reads: one group of sequencing reads will start from the adapter-binding end of one of the two arms, and the other group of sequencing reads will start from the adapter-binding end of the other one of the two arms. The length of the sequencing reads in each group may vary and the sequencing reads obtained will basically fall into one of two groups of reads covering either the region upstream or downstream of the integration site. In case only a single group of sequencing reads is obtained, this may indicate that the cutting was performed only on a single cleavage site rather than two cleavage sites. This may indicate that the insertion polynucleotide was not integrated completely and may have lost the part of the integration polynucleotide comprising the missing cleavage site. In case no group of sequencing reads was obtained, this may indicate that the integration polynucleotide was not integrated into the genome at all.

In addition, or alternatively, the method comprises analyzing the sequence information of the sequenced fragments for determining if the obtained sequence information can be clustered into three or more different clusters of sequences, wherein the presence of three or more different sequence clusters indicates that the insertion sequence was integrated in two or more copies at multiple different genomic locations.

For example, it may happen that the insertion polynucleotide was inserted in two or more copies into the same or into different chromosomes of the organism. Embodiments of the invention will allow characterizing the genomic environment of each of the multiple integrated copies in a single sequencing reaction. So, it is not necessary to perform a whole-genome sequencing in order to characterize the genomic context of multiple integrated copies of a particular insertion polynucleotide.
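The interpretation of the number of observed read groups described above can be summarized in a short sketch. The function name and the returned wording are hypothetical; how reads are grouped into clusters (e.g. by alignment position) is not specified here and is assumed to have been done beforehand.

```python
def interpret_read_clusters(n_clusters):
    """Map the number of distinct read clusters to the interpretation
    given in the text (a simplified, illustrative decision rule)."""
    if n_clusters < 0:
        raise ValueError("n_clusters must be non-negative")
    if n_clusters == 0:
        return "insertion sequence may not be integrated at all"
    if n_clusters == 1:
        return "insertion may be incomplete (one cleavage site missing)"
    if n_clusters == 2:
        return "consistent with one complete integration at a single site"
    return "consistent with two or more integrated copies"
```

Such a rule only gives a first indication; clusters arising from off-target cutting would have to be excluded before counting.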

The same approach can be performed for characterizing highly repetitive genomic elements such as transposons which are part of the original genome of an organism and have not been artificially (by a human) integrated into the genome. This may increase accuracy and reduce costs, because a large number of sequencing reads are obtained selectively for the genomic region surrounding the repetitive element of interest.

According to preferred embodiments, in case more than two clusters of genomic sequencing reads are observed per pair of opposite-directed guide polynucleotides, the sequencing data may be filtered such that all sequencing reads which do not comprise a short portion of the known sequence of the insertion polynucleotide are removed from the data set or ignored. For example, sequencing reads which do not comprise a short portion of the known sequence of the T-DNA or gene cassette which was integrated into the genome may be the consequence of the genome comprising additional sites of homology in respect to the binding sequence of one of the guide polynucleotides. In this case, additional sequencing read clusters may be generated also for genomic regions lacking the insertion polynucleotide.
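The filtering step described above may be sketched as follows. The example keeps only reads that contain a given short portion of the known sequence in either orientation. Exact substring matching is a simplifying assumption made for this illustration; real long-read data contain sequencing errors and would typically require an alignment-based test. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def keep_reads_with_known_portion(reads, known_portion):
    """Return the reads containing the known portion on either strand."""
    forward = known_portion.upper()
    reverse = reverse_complement(forward)
    kept = []
    for read in reads:
        read_upper = read.upper()
        if forward in read_upper or reverse in read_upper:
            kept.append(read)
    return kept
```

Reads discarded by this filter would correspond to fragments cut at genomic sites of homology outside the insertion polynucleotide.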

In addition, or alternatively, the method comprises analyzing the sequence information of the sequenced fragments for determining the orientation of the 5’ and 3’ ends of the insertion sequence within the sample polynucleotide.

Determining the orientation of the insertion polynucleotide within its genomic context may be advantageous as it allows determining if the insertion polynucleotide is under control of an inhibitory or promoting influence of a regulatory element, e.g., a promoter, or may itself have such an effect on genes in proximal vicinity to the integration site.

According to embodiments, the insertion sequence is a DNA strand having been incorporated into the genome of an organism by means of genetic engineering, e.g., based on a random integration approach or based on a targeted integration approach. The organism can in particular be a plant. The DNA strand can in particular be the T-DNA portion of a Ti plasmid or a gene cassette.

Preferably, in case the guide polynucleotides are created for cutting an insertion polynucleotide which is to be integrated into a genome by means of genetic engineering and which is normally not part of the genome of the organism, the guide polynucleotides are chosen and designed such that they selectively bind to a respective target sequence within the insertion polynucleotide and not to any sequence comprised in the wild-type genome of the organism.

In a further aspect, the invention relates to a method for preparing a sample comprising one or more sample polynucleotides for sequencing. The method comprises:

- providing a guide polynucleotide solution comprising at least one first guide polynucleotide and optionally one or more further guide polynucleotides, wherein the guide polynucleotides are in particular crRNAs; for example, a pool of two different oppositely oriented crRNAs can be provided which have been selected and designed as described herein for embodiments of the invention; however, it is possible that the guide polynucleotide solution merely comprises a single type of guide polynucleotide or a mixture of more than two, e.g. four different guide polynucleotides, e.g. two pairs of oppositely oriented crRNAs;

- mixing the guide polynucleotide solution with effector proteins, the effector proteins being an endonuclease adapted to create sticky cut ends, the effector proteins being in particular Cpf1, for allowing nucleoprotein complexes respectively comprising one guide polynucleotide and one effector protein to form; for example, Cpf1 nucleoprotein complexes (RNPs) were formed by mixing the guide polynucleotide solution with a Cpf1 solution and incubating the mixture at 16°C-42°C, in particular 25°C, for at least 5 minutes, e.g. about 10 minutes. No annealing step with a tracrRNA is performed as the crRNA can directly bind to Cpf1 nucleases;

- protecting the sample polynucleotides in the sample e.g. by dephosphorylating sample DNA ends and then heat deactivating the phosphatase;

- adding the solution with the formed nucleoprotein particles (crRNA-Cpf1-RNP), a dNTP mixture (comprising dATP, dCTP, dGTP and dTTP), a polymerase for creating polyA tails (e.g. Taq polymerase) and a sticky-end-filling polymerase (e.g. T4 DNA polymerase, Klenow fragment) to the sample comprising the sample polynucleotides with the protected (e.g. dephosphorylated) ends. The sticky-end-filling polymerase is added for generating blunt ends from the sticky ends generated by Cpf1 and similar types of endonucleases. Adding dNTPs instead of dATPs ensures that the sticky ends can be filled up with any nucleotide required. For example, the cleavage, fill-in and A-tailing reaction can be incubated at over 30°C, in particular at about 37°C for preferably about 10 minutes, and then at over 68°C, e.g. about 72°C for 5 minutes and then placed on ice; while the mixture is incubated at about 37°C, the RNPs will cut the sample polynucleotides at cleavage sites defined by their respective guide polynucleotide such that sticky cut ends are generated; the sticky ends are filled by the T4 polymerase (or a functionally equivalent sticky-end-filling polymerase) using the dNTP mixture, and a polyA tail is attached by the Taq polymerase to all ends of the sample polynucleotides which are not blocked by an RNP; and

- contacting the above-mentioned solution with sequencing adapters for allowing the sequencing adapters to selectively bind directly or indirectly to the ones of the cut sample polynucleotide fragments not blocked by an RNP. For example, this step can be executed in a buffer in which a ligase which ligates the adapters to the non-protected cut ends comprising the A-tail can be active. For example, the ligase can be a T4 ligase and an example for a suitable ligation buffer would be a buffer comprising 50 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP and 10 mM DTT having a pH of 7.5 at 25°C.

This may have the advantage that the sample preparation process can be accelerated: Cpf1 and other nucleases which generate sticky ends do not require a tracrRNA, so no extra annealing step for annealing the crRNA and the tracrRNA is necessary. It is sufficient to identify a suitable crRNA sequence and synthesize the crRNA based on the identified sequence. For example, the sequence and in particular the DNA-binding sequence of the crRNA can be chosen such that it is possible to perform a targeted sequencing selectively in one or more directions starting from a known sequence outwards to unknown or insufficiently characterized adjacent regions in the sample polynucleotides. However, Cpf1 has the downside of producing sticky ends which would normally require an additional step for filling up the overhangs to create blunt ends. Applicant has observed that by replacing the dATP mixture which is used in the A-tailing step with a dNTP mixture comprising all four nucleotides dATP, dGTP, dCTP and dTTP, and by adding both a sticky-end-filling polymerase and an A-tailing polymerase to the reaction mixture, it is possible to perform the cleaving, the sticky-end-filling and the A-tailing reaction in a single step in a single reaction vessel. Hence, the time for sample preparation including the time for creating the guide RNAs and RNPs can be reduced.

Taq polymerase has terminal transferase activity and preferentially adds adenine to 3' ends (even in the presence of other nucleotides). This feature of the Taq polymerase is sometimes exploited in vector cloning. This feature of the Taq polymerase can be used in the context of targeted sequencing for increasing the speed of sample preparation.

According to embodiments, the binding of the sequencing adapter to the sample polynucleotide ends can be performed as follows: a ligation mastermix can be made using components from an SQK-LSK109 library prep kit from ONT. The mastermix contains 20 µL of Ligation Buffer (LNB) (ONT), 3 µL of nuclease-free water, 10 µL of NEBNext Quick T4 DNA Ligase (NEB), and 5 µL of AMX (ONT). This mastermix is added to the cleaved and A-tailed reaction and mixed by gently flicking the tube. Once thoroughly mixed, the remaining 18 µL are added and mixed in the same way. The reaction is incubated at room temperature for 10 minutes.

According to embodiments, the sample preparation method further comprises purifying the adapter-ligated sample polynucleotide fragments. For example, the purification step can be performed in accordance with any DNA purification method which is compatible with the adapter-based sequencing technique to be applied, e.g., with the nanopore sequencing technique. For example, the purification technique can be based on a solid phase reversible immobilization (SPRI) bead cleanup. However, alternative approaches which use centrifugation instead of bead-binding can likewise be used.
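The volumes of the ligation mastermix described above can be checked with a few lines of arithmetic. The sketch below takes the listed volumes to be microlitres and uses hypothetical names; the optional overage factor for pipetting loss is an assumption added for illustration and is not mentioned in the text.

```python
# Component volumes of one ligation mastermix, in microlitres (as listed).
LIGATION_MASTERMIX_UL = {
    "Ligation Buffer (LNB)": 20.0,
    "nuclease-free water": 3.0,
    "NEBNext Quick T4 DNA Ligase": 10.0,
    "AMX": 5.0,
}


def total_volume_ul(mix):
    """Sum of all component volumes in microlitres."""
    return sum(mix.values())


def scale_mix(mix, n_reactions, overage=0.0):
    """Scale every component for n reactions plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: volume * factor for name, volume in mix.items()}


total = total_volume_ul(LIGATION_MASTERMIX_UL)  # 38.0 µL per reaction
first_addition = total - 18.0                   # 20.0 µL
```

Given a total of 38 µL, the "remaining 18 µL" mentioned in the text would be consistent with a first addition of 20 µL, although the text does not state the size of the first addition explicitly.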

According to one example, the adapter-ligated fragments can be purified by adding TE (Tris-EDTA) buffer (pH 8) and 0.3x volume (48 µL) of AMPure XP magnetic beads (Beckman Coulter) followed by gentle inversion to mix. The reaction can then be incubated at room temperature for 10 minutes. The magnetic beads can then be pelleted on a magnet and washed twice with a wash buffer, preferably a non-alcohol-based wash buffer, e.g. 250 µL of LFB (ONT). Beads are then resuspended in 13 µL of elution buffer EB (ONT) and incubated at room temperature for 10 minutes. Beads can be again pelleted on a magnet and the supernatant containing adapter-ligated fragments removed to a new tube.

According to embodiments, the adapter-based sequencing is performed using a nanopore-sequencing device.

For example, a FLO-MIN106 flowcell (MinION/GridION from ONT) can be primed by adding 30 µL of FLT (ONT) to a tube of FB (ONT) and then loading 800 µL of this mixture into the priming port of the flowcell. The mixture is allowed to sit for 5 minutes. After the 5-minute incubation, the SpotON port is opened and then another 200 µL of FB/FLT is added to the priming port. Then, 12 µL of a cleaned sequencing-adapter-sample-polynucleotide-complex solution (also referred to as “sequencing library” obtained by eluting the beads) are mixed with 37.5 µL of SQB (ONT) and 25.5 µL of LB (ONT). This mixture is then loaded into the SpotON port of the primed flowcell. The sequencing can be carried out for 24-72 hours on a MinION sequencing device.

A “terminal transferase” (also referred to as “terminal deoxynucleotidyl transferase (TdT)”) is a template-independent polymerase that catalyzes the addition of nucleotides to the 3' hydroxyl terminus of DNA molecules.

A “sample” as used herein is liquid or solid matter comprising polynucleotides. The sample may be a biological sample. The sample may be obtained from or extracted from any organism or microorganism, e.g. from plants, animals (including humans), fungi, bacteria or archaea. The sample may be a non-biological sample, e.g. water such as drinking water or river water. The sample may be processed prior to carrying out the method, for example by centrifugation or by passage through a membrane that filters out unwanted molecules or cells, such as red blood cells.

A “sample polynucleotide” as used herein is a polynucleotide or mixture of multiple different polynucleotides comprised in a sample. The polynucleotide may be from any organism. The polynucleotide can be a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The polynucleotide can comprise one strand of RNA hybridized to one strand of DNA. The polynucleotide may comprise one or more synthetic nucleotides. Synthetic nucleotides known in the art include peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains. The sample polynucleotide is preferably DNA, RNA or a DNA/RNA hybrid, most preferably DNA, in particular double-stranded DNA, e.g. genomic DNA. Preferably the genomic DNA is not fragmented.

A “guide polynucleotide” as used herein is a polynucleotide or an annealed combination of two or more polynucleotides that guides the effector protein to its target sequence. A guide polynucleotide comprises at least two functional moieties: an effector protein binding site adapted to bind to and form a complex with an effector protein, and a binding sequence adapted to hybridize with a complementary sequence comprised in the sample polynucleotide, the so-called protospacer. The guide polynucleotide may have any structure that enables it to bind to the target polynucleotide and to an effector protein.

A guide polynucleotide targets the complementary sequences by simple Watson-Crick base pairing. The guide polynucleotide may comprise a crRNA that binds to a sequence in the target polynucleotide and a tracrRNA that typically binds to the effector protein. Typical structures of guide polynucleotides are known in the art.

For example, the crRNA is typically a single-stranded RNA. A guide polynucleotide can be a guide RNA, e.g. a crRNA, a single guide RNA or a combination of a crRNA and a tracrRNA. A single guide RNA is an artificially designed combination of two RNA molecules: one component (tracrRNA) is responsible for binding to the effector protein (e.g. Cas9 endonuclease), and the other component (crRNA) is responsible for binding to a specific sequence within the sample polynucleotide. Therefore, the trans-activating RNA (tracrRNA) and the crRNA are two key components and can be joined, e.g. by a tetraloop, which results in the formation of an sgRNA. TracrRNAs are polynucleotides having a stem-loop structure that can interact with the endonuclease enzyme.

CRISPR RNAs (crRNA) identify the specific complementary target region in the sample polynucleotide, which is cleaved by effector proteins after its binding with the guide RNA (a combination of a crRNA and a tracrRNA or an sgRNA or only the crRNA), which all together are known as an “effector complex” or “nucleoprotein particle”. For example, Cas9 can form a complex with a combination of a crRNA and a tracrRNA or with an sgRNA while Cpf1 forms a complex with a crRNA. By modifying the binding sequence in the guide RNA, the binding location can be changed, thereby guiding the effector protein to cut the sample polynucleotide at basically any sequence location of choice (provided the location meets some further criteria, e.g. comprises a PAM sequence of the chosen effector protein).

A “guide polynucleotide” may be composed of a single molecule (a sgRNA), or it may comprise two separate molecules. Some CRISPR systems only need a crRNA to be active, whereas other CRISPR systems require the presence of a crRNA and a tracrRNA. These relevant RNA portions have to be incorporated into a guide molecule accordingly to guarantee a functional RNA-guided CRISPR system.

An “A-tail” as used herein is a single-strand tail created by enzymatically adding a single adenine nucleotide (dATP) or a plurality of adenine nucleotides to the 3' end of a polynucleotide.

Analogously, a “T-tail” is a single-strand tail created by enzymatically adding a single thymine nucleotide (dTTP) or a plurality of thymine nucleotides to the 3' end of a polynucleotide.

A “sequencing adapter” as used herein is a molecule which is adapted to directly or indirectly (via one or more further, “intermediate” adapter molecules) bind to an end of a polynucleotide and enable the sequencing of the bound polynucleotide starting from the end where the sequencing adapter is bound. For example, sequencing adapters are commonly used for performing nanopore sequencing. Using nanopore sequencing, a single molecule of e.g. DNA or RNA can be sequenced without the need for PCR amplification or chemical labeling of the sample. Nanopore sequencing has the potential to offer relatively low-cost genotyping, high mobility for testing, and rapid processing of samples with the ability to display results in real-time.

An adapter may be designed such that it has a single stranded region, such as a single stranded overhang, on the opposite strand to the overhang added e.g. via the A-tailing to the cut end of the sample polynucleotide to which the adapter is wished to bind. In case an intermediate adapter is used, the intermediate adapter may have an overhang complementary to the A-tail added to the ends of the sample polynucleotides and may have a further overhang complementary to a polydNTP tail of the actual sequencing adapter.

According to embodiments, the sequencing adapters or the intermediate adapters are added to the sample in combination with a ligase such as the T4 ligase which is adapted to ligate the adapter to an end of the sample polynucleotide only in case this end comprises a non-protected (e.g. not dephosphorylated) 5’ end and if this end in addition comprises a 3’ A-tail. Thereby, it is ensured that only the cut ends where no RNP sits 1) are complementary to the T-tail of the adapter and 2) allow the chemical ligation reaction for connecting the adapter to the sample polynucleotide end.
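The ligation conditions stated above can be expressed as a simple predicate. The following sketch is an illustrative model only; the class and field names are hypothetical, and the three properties are modelled as independent flags although, according to the description, a bound RNP may itself prevent A-tailing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentEnd:
    """State of one end of a double-stranded sample polynucleotide fragment."""
    has_5prime_phosphate: bool   # False after dephosphorylation (protected)
    has_3prime_a_tail: bool      # True if an A-tail was added
    rnp_bound: bool              # True if a nucleoprotein particle remains bound


def adapter_can_ligate(end):
    """An adapter is ligated only to non-protected, A-tailed, RNP-free ends."""
    return (end.has_5prime_phosphate
            and end.has_3prime_a_tail
            and not end.rnp_bound)


# Original (dephosphorylated) end of the sample polynucleotide.
original_end = FragmentEnd(False, True, False)
# Cut end of an arm fragment: phosphate present, A-tailed, no RNP.
free_cut_end = FragmentEnd(True, True, False)
# Cut end of the central fragment: RNP still bound, no A-tail added.
blocked_cut_end = FragmentEnd(True, False, True)
```

In this model only `free_cut_end` satisfies the predicate, which corresponds to sequencing starting from the cut ends that point away from the in-between sequence.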

According to embodiments, the overhang of the cut end of the sample polynucleotide generated in the A-tailing step (adding a single dATP or a sequence of multiple dATPs) is capable of hybridizing to a single stranded region, such as the overhang, of the adapter. It is possible that the A-tail is only one or two dATP nucleotides long, but in some embodiments, there will be at least 3, such as from 4 to 20 matched bases between the two overhang sequences.

In one embodiment the adapter may be missing a 5' phosphate. This can help prevent the adapters self-ligating. According to some embodiments, the adapter is used as a tag to separate the sample polynucleotide fragments having been ligated to an adapter, e.g. by using the adapter to attach biotin to the target polynucleotide, allowing the target polynucleotide to be attached to beads. According to embodiments, the sequencing adapter is adapted to binding to a helicase protein and to a transmembrane pore. The adapter adapted to bind to a transmembrane pore preferably comprises a polynucleotide binding protein and/or a membrane or pore anchor. The helicase protein is capable of threading the sample polynucleotide to be sequenced through the pore. The adapter may optionally comprise a tag for binding to a bead.

A “sticky end” as used herein is a non-blunt end of a double-stranded polynucleotide. A sticky end comprises an overhang, i.e., a stretch of unpaired nucleotides at the end of the polynucleotide.

A “blunt end” as used herein is an end of a double-stranded polynucleotide which does not comprise an overhang.

An “effector protein” as used herein is a polynucleotide-targeting enzyme, in particular a nuclease, in particular an RNA- or DNA-targeting nuclease.

The effector protein can in particular be an “RNA-guided nuclease”, i.e., a site-specific nuclease, which requires an RNA molecule, i.e. a guide RNA, to recognize and cleave a specific target site, e.g. in genomic DNA or in RNA as target. The RNA-guided nuclease forms a nuclease complex together with the guide RNA and then recognizes and cleaves the target site in a sequence-dependent manner. RNA-guided nucleases can therefore be programmed to target a specific site by the design of the guide RNA sequence.

According to some embodiments, the effector protein is a "CRISPR nuclease". A CRISPR nuclease is a specific form of a site-directed nuclease and refers to any nucleic acid guided nuclease which has been identified in a naturally occurring CRISPR system, which has subsequently been isolated from its natural context, and which preferably has been modified or combined into a recombinant construct of interest to be suitable as tool for targeted genome engineering. Any CRISPR nuclease can be used and optionally reprogrammed or additionally mutated to be suitable for the various embodiments according to the present invention as long as the original wild-type CRISPR nuclease provides for DNA recognition, i.e., binding properties. CRISPR nucleases also comprise mutants or catalytically active fragments or fusions of naturally occurring CRISPR effector sequences, or the respective sequences encoding the same.

A “nucleoprotein particle” or “nucleoprotein complex” (RNP) as used herein is a complex comprising at least one effector protein and a guide polynucleotide.

According to embodiments, the nucleoprotein particle is a "CRISPR system". A CRISPR system is a combination of a CRISPR nuclease or CRISPR effector, or a functional active fragment or variant thereof together with the cognate guide polynucleotide guiding the relevant CRISPR nuclease. Such nucleases may leave blunt or sticky ends. Further included are variants of an RNA-guided nuclease, which may be used in combination with a fusion protein, or protein complex, to alter and modify the functionality of such a fusion protein, for example, in a base editor or prime editor.

A variety of different CRISPR nucleases/systems and variants thereof are by now known to the skilled person and can be used in embodiments of the invention. These variants include, inter alia, CRISPR/Cas9 systems (EP2771468), CRISPR/Cpf1 systems (EP3009511 B1), CRISPR/C2C2 systems, CRISPR/CasX systems, CRISPR/CasY systems, CRISPR/Cmr systems, CRISPR/MAD systems, including, for example, CRISPR/MAD7 systems (WO2018236548A1) and CRISPR/MAD2 systems, CRISPR/CasZ systems and/or any combination, variant, or catalytically active fragment thereof. A nuclease may be a DNAse and/or an RNAse, in particular taking into consideration that certain CRISPR effector nucleases have RNA cleavage activity alone, or in addition to the DNA cleavage activity.

CRISPR nucleases, in comparison to TALEN or ZFN systems, may have the advantage that the RNA-guided CRISPR nuclease can be easily programmed or re-programmed simply by exchanging the at least one guiding RNA to target a new genomic site of interest, without the need to modify the CRISPR nuclease or variant as such.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, only exemplary forms of the invention are explained in more detail, whereby reference is made to the drawings in which they are contained. They show:

Figure 1 a flowchart of a method for preparing a sample comprising sample polynucleotides for adapter-based sequencing;

Figure 2 a flowchart of a method for preparing a sample comprising sample polynucleotides for adapter-based sequencing using a nuclease which generates sticky ends;

Figure 3 a flowchart of a method for identifying, synthesizing and using a pair of guide polynucleotides facing away from each other;

Figure 4 an illustration of the location and orientation of the target sequence of two oppositely oriented guide polynucleotides within a known sequence region;

Figure 5 an illustration of the sample preparation method where the sample is contacted with a mixture of two different types of RNPs;

Figure 6 an illustration of the sample preparation method where two fractions of the sample are contacted separately with a respective type of RNP, and wherein after adapter ligation the different fractions are pooled for sequencing;

Figure 7 an illustration of sample preparation and adapter-based sequencing as described already with reference to figure 5;

Figure 8 illustrates the cleavage of a double stranded polynucleotide using two different guide polynucleotides oriented in opposite directions;

Figure 9 shows the binding sites and PAMs of two pairs of guide polynucleotides oriented in opposite directions within the blaTEM gene (TEM-1, NG_050145.1);

Figure 10 illustrates the location and orientation of the binding sequences of two further pairs of guide polynucleotides oriented in opposite directions within a target gene;

Figure 11 shows sugar beet sequencing reads starting from two cleavage sites within an integration polynucleotide;

Figure 12 shows a zoomed view of the sugar beet sequencing reads;

Figure 13 shows sugar beet sequencing reads starting from two cleavage sites (introduced based on the crRNAs whose positions are indicated in figures 10 and 11) within an insertion polynucleotide having been integrated into chromosome 8;

Figure 14 shows the binding sites of four guide polynucleotides designed for an insertion polynucleotide to be integrated into the genome of corn;

Figure 15 shows the corn sequencing reads obtained by an adapter-based sequencing reaction based on the four guide polynucleotides illustrated in figure 14; and

Figure 16 shows the corn sequencing reads obtained for an insertion polynucleotide having been integrated into chromosome 3 of the corn genome.

DETAILED DESCRIPTION

Figure 1 shows a flow chart of a method for preparing a sample comprising sample polynucleotides for adapter-based sequencing.

The method can be used for targeted sequencing in many different use case scenarios some of which will be described below in the form of “examples”. However, the method may likewise be used for a plurality of other examples, e.g. for a plurality of other genes, regions of interests, and species.

Example 1: Cas9- or Cpf1-mediated targeted sequencing of sugar beet DNA

a) Generating genetically modified plants and DNA sample preparation

According to example 1, a gene or piece of DNA of known sequence, e.g. the blaTEM gene (TEM-1, NG_050145.1), is inserted into the genome of Beta vulgaris subsp. vulgaris (Sugar Beet, RefBeet-1.2.2) by means of Agrobacterium-mediated transformation. In this process a Ti plasmid (e.g. pBIN19) is constructed containing the coding sequence of TEM-1. This plasmid is established in a culture of Agrobacterium tumefaciens. This bacterium culture is allowed to interact with plant cells, thereby enabling the bacteria to inject the T-DNA (transferred DNA) portion of the Ti plasmid (which contains the TEM-1 sequence) into the plant cells where the T-DNA is randomly incorporated into the genome. Then, the genomic DNA of the transfected plant cells is extracted and a sample comprising the extracted genomic DNA is provided. Thereby, a sample comprising “sample polynucleotides” is provided, whereby the chromosomes and chromosome fragments (if any) of the genome of the genetically modified plant are provided as the “sample polynucleotides”.

In this example it is desired to determine where the T-DNA has been inserted in the genome and to determine whether it has been inserted as a fully intact single copy.

b) Guide RNA design and synthesis

Since the sequence of TEM-1 is known, guide RNAs (or at least the guide RNA parts comprising the sequence to selectively bind to a target sequence, i.e., the crRNAs) are designed and created.

The guide RNAs are created to bind to a protospacer of the marker gene on the antisense strand (called reverse guides or "R") and to a protospacer on the sense strand (called forward guides or "F") of a marker gene (see Figure 4). The marker gene is an example for the region of known sequence.

It is expected that two different types of RNPs containing "R" crRNA or the "F" crRNA cut the DNA at different cleavage sites within a DNA region of known sequence, wherein each of the RNPs after the cutting remains bound to the end of one of the two fragments generated by this cutting which comprises the sample polynucleotide region located in-between the two cleavage sites of the two different types of RNPs (see Figures 4 and 8).

For example, the two guide RNAs can be identified using a software configured to search a selected polynucleotide of known sequence for suitable binding sequences of guide polynucleotides adjacent to a PAM. In addition to the position and orientation of the binding sequence of the guide polynucleotides, additional criteria such as GC content or hairpin formation may be considered. For example, the identification of the sequence of suitable guide polynucleotides may be performed based on a software application. After having designed suitable guide polynucleotide sequences, the guide polynucleotide sequence or at least the crRNAs may be synthesized or ordered from a supplier, e.g. from Integrated DNA Technologies, Inc. (IDT).
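A minimal sketch of such a search is given below in Python. It is not the software referred to above; it merely assumes a Cas9-type nuclease with a 3' NGG PAM, a 20-nucleotide protospacer, and an arbitrarily chosen GC window of 40 to 70 %. The function names and thresholds are hypothetical, and hairpin formation is not evaluated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_cas9_protospacers(known_seq, length=20, gc_min=0.4, gc_max=0.7):
    """List candidate protospacers followed by an NGG PAM on either strand.

    'start' is the 0-based position of the protospacer on the given (+) strand.
    """
    known_seq = known_seq.upper()
    n = len(known_seq)
    hits = []
    for strand, seq in (("+", known_seq), ("-", reverse_complement(known_seq))):
        for i in range(n - length - 2):
            protospacer = seq[i:i + length]
            pam = seq[i + length:i + length + 3]
            if pam[1:] != "GG":
                continue  # no NGG PAM directly 3' of the candidate
            if not gc_min <= gc_fraction(protospacer) <= gc_max:
                continue  # GC content outside the assumed window
            start = i if strand == "+" else n - i - length
            hits.append({"strand": strand, "start": start,
                         "protospacer": protospacer, "pam": pam})
    return hits
```

From such a candidate list, one guide on each strand would then be chosen according to the position and orientation criteria described above; that selection step is deliberately left out of the sketch.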

The table below shows two alternative pairs of crRNAs designed for different PAM sites in the TEM-1 gene. The suffix F/R in the name indicates the orientation, the binding sequence indicates the sequence which is to be ordered or synthesized as part of the guide RNA and which is complementary/adapted to bind to a region within the TEM-1 gene referred to as “protospacer”, and the PAM site adjacent to the respective protospacer is also provided.

In case the Cas9 nuclease is to be used as effector protein, the TEM-1_Cas9_F and TEM-1_Cas9_R crRNAs have to be synthesized and have to be annealed with a tracrRNA.

In case the Cpf1 nuclease is to be used as effector protein, the TEM-1_Cas12a_F and TEM-1_Cas12a_R crRNAs have to be synthesized. No tracrRNA is needed. In case a known sequence whose sequence environment is to be examined should not comprise a PAM of a particular nuclease, e.g. of Cas9, another type of nuclease, e.g. Cpf1, may be used (provided a corresponding PAM exists in the known sequence) and the sample preparation protocol may be adapted accordingly.

Table 1: two alternative sets of crRNAs designed for PAM sites in the TEM-1 gene.

In case Cas9 is used as the effector protein, the sample preparation protocol may include the following steps for synthesizing the guide RNAs:

- A crRNA pool is made by mixing at least two designed crRNAs facing in opposite directions as described above in equal volumes (e.g. 100 µM each); in the “example 1” described here, the TEM-1_Cas9_F and TEM-1_Cas9_R crRNAs are synthesized and pooled;

- An annealed guide RNA pool is made by mixing 1 µL of the crRNA pool with 1 µL of tracrRNA (100 µM, IDT), and 8 µL of Nuclease-free Duplex Buffer (from IDT). The mixture is heated to 95°C for 5 minutes and then allowed to cool to room temperature; thereby, two different types of guide polynucleotides are created which all comprise the same type of tracrRNA but different types of crRNA (TEM-1_Cas9_F and TEM-1_Cas9_R);

- Cas9 RNPs are formed by mixing 3 µL of the annealed guide RNA mixture with 3 µL of 10X CutSmart buffer (provided e.g. by New England Biolabs - NEB), 23.7 µL of nuclease-free water, and 0.3 µL of HiFi Cas9 (62 µM). The mixture is incubated at room temperature for 30 minutes. CutSmart Buffer (1x) comprises 50 mM potassium acetate, 20 mM Tris-acetate and 10 mM magnesium acetate.

c) Sample preparation for targeted sequencing

After the extraction of the genomic DNA from the transgenic plant for providing the sample, a sample preparation protocol for adapter-based sequencing is performed. For example, the sample preparation protocol of Oxford Nanopore Technologies (ONT) published as “ENR_9084_v109_revP_04Dec2018” can be followed. However, other sample preparation protocols for adapter-based, targeted sequencing may likewise be used. In step 102 of the sample preparation method, the ends of the chromosomes and chromosome fragments (if any) in the genomic DNA sample are protected; for example, this is achieved by dephosphorylating the ends of the DNA in the sample: 5 µg of the (genomic) sample DNA in a volume of 24 µL is dephosphorylated by adding 3 µL of 10X CutSmart Buffer (NEB) and 3 µL of calf intestinal phosphatase (CIP). For example, the 10X CutSmart buffer of NEB comprises 500 mM potassium acetate, 200 mM Tris-acetate, 100 mM magnesium acetate, and 1000 µg/ml BSA. Similar buffers of other suppliers may be used as well. The dephosphorylation reaction is incubated at 37°C for 10 minutes and then at 80°C for 2 minutes.

Next in step 104, the DNA with the protected ends is contacted with at least two different types of nucleoprotein particles (here: two different types of RNPs comprising the same type of tracrRNA but different types of crRNA, namely TEM-1_Cas9_F and TEM-1_Cas9_R), thereby letting the nucleoprotein particles cut the sample DNA at a cleavage site determined by the binding sequence of the respective guide RNA; to achieve this, 10 µL of Cas9 RNPs are added to the DNA sample with the protected ends. The cleavage reaction is incubated at 37°C for 15 minutes and then at 72°C for 5 minutes.

In addition, the ends generated by the nuclease in step 104 which are not blocked by the nucleoprotein particles are polyA-tailed to enable the sequencing adapter (or an intermediate adapter) to selectively bind to the cut ends comprising this polyA-tail. This reaction can be performed during step 104 simply by adding 1 µL of Taq polymerase (NEB) and 1 µL of 10 mM dATP to the reaction mixture such that the cleavage reaction and the polyA-tailing reaction can be performed in the same sample in a single step.
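For orientation, the amounts of guide RNA and Cas9 entering the cleavage reaction of step 104 can be estimated from the dilution steps of the RNP preparation. The following Python sketch reads all volumes as microlitres and all stock concentrations as micromolar, and it assumes that the crRNA pool has a total crRNA concentration of 100 µM and that annealing is complete; the variable names are illustrative only.

```python
def dilute(stock_conc_um: float, stock_vol_ul: float, total_vol_ul: float) -> float:
    """Concentration (µM) after diluting a stock into a given total volume."""
    return stock_conc_um * stock_vol_ul / total_vol_ul


# Annealing: 1 µL crRNA pool + 1 µL tracrRNA + 8 µL duplex buffer
annealing_volume_ul = 1 + 1 + 8
guide_in_annealing_um = dilute(100, 1, annealing_volume_ul)

# RNP formation: 3 µL guide + 3 µL 10X buffer + 23.7 µL water + 0.3 µL Cas9
rnp_volume_ul = 3 + 3 + 23.7 + 0.3
guide_in_rnp_um = dilute(guide_in_annealing_um, 3, rnp_volume_ul)
cas9_in_rnp_um = dilute(62, 0.3, rnp_volume_ul)

# 10 µL of the RNP mix are added in step 104 (µM x µL = pmol)
cas9_added_pmol = cas9_in_rnp_um * 10

print(f"guide RNA in RNP mix: {guide_in_rnp_um:.2f} µM")
print(f"Cas9 in RNP mix:      {cas9_in_rnp_um:.2f} µM")
print(f"Cas9 added per reaction: {cas9_added_pmol:.1f} pmol")
```

Under these assumptions the guide RNA is present in a modest molar excess over Cas9 in the RNP mix.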

Next in step 106, sequencing adapters are added to the cleaved and polyA-tailed sample DNAs to enable the adapters to selectively bind to the sample DNA cut ends comprising a polyA-tail. To achieve this, 20 µL of a ligation mastermix is added to the cleaved and A-tailed reaction and mixed by gently flicking the tube. Once thoroughly mixed the remaining 18 µL of the mastermix is added and mixed in the same way. The reaction is incubated at room temperature for 10 minutes. The ligation mastermix can be made using components from an SQK-LSK109 library prep kit from ONT. This contains 20 µL of LNB (ONT), 3 µL of nuclease-free water, 10 µL of NEBNext Quick T4 DNA Ligase (NEB), and 5 µL of a buffer comprising the sequencing adapters (called “AMX” of ONT).

d) Performing targeted, adapter-based sequencing

Then, the sample may be further processed and used for targeted sequencing.

The adapter-ligated fragments obtained in step 106 are purified. The purification can be performed e.g., by adding 1 volume (80 µL) of 1X Tris-EDTA (“TE”) buffer (pH 8) and 0.3x volume (48 µL) of AMPure XP Beads (Beckman Coulter) followed by gentle inversion to mix. The reaction is incubated at room temperature for 10 minutes. The magnetic beads are then pelleted on a magnet and washed twice with 250 µL of LFB (ONT). Beads are then resuspended in 13 µL of elution buffer EB (ONT) and incubated at room temperature for 10 minutes. Beads are again pelleted on a magnet and the supernatant containing adapter-ligated fragments originating within the TEM-1-gene is removed to a new tube.
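The volumes stated for steps 102 to 106 and for the purification can be cross-checked with simple bookkeeping. The Python sketch below reads all volumes as microlitres; it also assumes that the 0.3x bead volume refers to the combined volume of the ligation reaction and the added TE buffer, which is an interpretation consistent with the stated 48 µL rather than something the protocol text spells out.

```python
# Step 102: DNA + 10X CutSmart buffer + CIP
dephosphorylation_ul = 24 + 3 + 3

# Step 104: + Cas9 RNPs + Taq polymerase + 10 mM dATP
cleavage_ul = dephosphorylation_ul + 10 + 1 + 1

# Step 106: ligation mastermix = LNB + water + T4 ligase + AMX,
# added in two portions of 20 µL and 18 µL
ligation_mastermix_ul = 20 + 3 + 10 + 5
ligation_ul = cleavage_ul + ligation_mastermix_ul

# Purification: 1 volume of TE, then 0.3 volumes of beads on the combined volume
te_ul = 1.0 * ligation_ul
bead_ul = 0.3 * (ligation_ul + te_ul)

print(f"ligation reaction: {ligation_ul} µL, TE: {te_ul:.0f} µL, beads: {bead_ul:.0f} µL")
```

The totals reproduce the 80 µL and 48 µL figures given for the purification, which suggests that the individual volumes are mutually consistent.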

Next in step 108, the targeted sequencing is performed: A flowcell (e.g. a FLO-MIN106 flowcell MinION/GridION from ONT) is primed by adding 30 µL of FLT (ONT) to a tube of flush buffer (e.g. “FB” of ONT) and then loading 800 µL of this mixture into the priming port of the flowcell. FB is a buffer in which the nanopores remain stable while an electrical current is applied. FLT contains a substance that tethers DNA to surfaces. This is allowed to sit for 5 minutes. After the 5-minute incubation the SpotON port is opened and then another 200 µL of FB/FLT is added to the priming port. 12 µL of the eluted sequencing library is mixed with 37.5 µL SQB (ONT) and 25.5 µL of LB (ONT). LB contains microbeads that help drag DNA molecules down to the membrane where nanopores lie. SQB contains the "fuel" that allows the motor protein to push the DNA molecule through the nanopore (this is dATP). This mixture is then loaded into the SpotON port of the primed flowcell. The SpotON port and priming ports are closed and the sequencing run is started.

As shown in Figure 7, Cas9 RNPs are expected to cut and remain bound to DNA fragments in the TEM-1 gene representing the “known sequence” 605. Fragments extending off the 5' and 3' ends of the gene will be released with 5' phosphate groups and will be available for A-tailing by Taq polymerase. Only these fragment ends (containing 5' phosphate groups and 3' A-overhangs) will be able to be ligated to sequencing adapters in the next step. Thus, all sequencing reads are expected to originate within the TEM-1 gene and extend outwards towards the borders of the T-DNA and eventually out into the genome of sugar beet. This simultaneously evaluates the intactness of the T-DNA and reveals the location of the insertion in the sugar beet genome.

Alternatively, instead of Cas9, Cpf1 could be used as the nuclease and in this case the two crRNAs for the Cpf1 nuclease would have to be used. The sample preparation protocol would have to be modified by adding a sticky-end-filling polymerase to the buffer mixture where the cutting and A-tailing takes place. In addition, the dATP monomers are replaced by a mixture of dATP, dGTP, dTTP and dCTP nucleotides, thereby allowing both the “filling in” of the sticky ends and the generation of the A-tails as described above.

Example 2: Cas9-mediated targeted sequencing of sugar beet DNA

a) Generating genetically modified plants

In this experiment, an integration site analysis based on a targeted sequencing approach was performed on genomic DNA of another genetically modified line of sugar beet plants.

The sugar beet line was generated by Agrobacterium-mediated transformation. By this method, a T-DNA containing a selective marker gene was inserted into the genome.

The targeted sequencing in example 2 was performed to a) identify the location of an Agrobacterium-mediated random T-DNA integration and b) determine the intactness of the T-DNA in a sugar beet genome.

The method may allow analyzing the previously unknown sequence surrounding the inserted T-DNA by using the guide RNAs that were specifically designed in opposite orientations within one of the inserted genes (selective marker gene) (Figure 10). Thus, the resulting sequencing reads originate within the selective marker gene and continue on past the integration points and into the sugar beet genome (Figures 11 and 12). Because long-read sequencing was performed, tens of kilobases of flanking information is obtained (Table 2) making it trivial to locate the integration points (Figure 13).

Table 2: Overview of sequence information obtained using the guide RNA design and targeted sequencing method described above.

In this experiment, over 100 reads were obtained at each insertion point (at the left border of the T-DNA and at the right border) providing significant support for integration in the genome between 47,887,187 bp and 47,887,191 bp on chromosome 8 (Figure 13). While this targeted sequencing approach described in this example was not able to produce reads that fully span junctions and full integrations, it can still provide strong evidence as to whether single or multiple complete, partial or multicopy integrations are present. In this case, the reads indicate a single copy of the full T-DNA is integrated.

According to some examples, the guide RNA design, sample preparation and targeted sequencing method described herein can be performed iteratively, whereby the sequence information obtained in the previous sequencing step is used as the new “known sequence” and the guide polynucleotides are designed such that they bind to the one of the two ends of the new known sequence which lies next to the still unknown/insufficiently characterized regions of the sample polynucleotide. By performing multiple repeats of creating new guide polynucleotides at the ends of regions of sample polynucleotides whose sequence was obtained in the previous sequencing step, it is possible to sequence very long sample polynucleotides and even whole chromosomes. In particular, the approach using two different tubes as outlined in figure 6 for the “chromosome walk” in opposite directions may be used for this iterative sequencing approach.

b) DNA sample preparation

DNA material of the above-mentioned sugar beet line was obtained as follows: Sugar beet leaf tissue from this line was snap frozen in liquid nitrogen. Nuclei were isolated using the method outlined by Zhang M et al.: “Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research”, Nat Protoc. 2012 Feb 16;7(3):467-78 (doi: 10.1038/nprot.2011.455; PMID: 22343429). DNA was extracted from the isolated nuclei using a Nanobind Big Plant DNA kit (Circulomics Inc, Baltimore, USA) as per the manufacturer's protocol (www.circulomics.com/support-nanobind).

c) Guide RNA design and synthesis

Four guide RNAs (crRNA) were designed with homology to known sequence contained within the T-DNA (the inserted selective marker gene). Two crRNAs were designed at the 5' end of the gene on the antisense strand. The other two crRNAs were designed at the 3' end of the gene on the sense strand (Figure 10).

d) Preparation of the nucleoprotein particles

Cas9 crRNAs, tracrRNA, and Alt-R® S. pyogenes HiFi Cas9 nuclease V3 were obtained from Integrated DNA Technologies (IDT). The "Cas9 targeted sequencing" protocol from ONT (Nanopore Protocol: "Cas9 targeted sequencing - Introduction to the protocol" Version: ENR_9084_v109_revP 04 Dec 2018) was followed:

- A crRNA pool was made by mixing all four crRNAs in equal volume (100 µM each).

- An annealed guide RNA pool was made by mixing 1 µL of the said crRNA pool with 1 µL of tracrRNA (100 µM, IDT), and 8 µL of Nuclease-free Duplex Buffer (IDT). The mixture was heated to 95°C for 5 minutes and then allowed to cool to room temperature.

- Cas9 RNPs were formed by mixing 3 µL of the annealed guide RNA mixture with 3 µL of 10X CutSmart buffer (NEB), 23.7 µL of nuclease-free water, and 0.3 µL of HiFi Cas9 (62 µM). The mixture was incubated at room temperature for 30 minutes.

e) Sample preparation and adapter-based sequencing

- Step 102: 5 µg of DNA in a volume of 24 µL was dephosphorylated by adding 3 µL of 10X CutSmart Buffer (NEB) and 3 µL of calf intestinal phosphatase (CIP). The dephosphorylation reaction was incubated at 37°C for 10 minutes and then at 80°C for 2 minutes.

- Step 104: To this reaction 10 µL of Cas9 RNPs, 1 µL of 10 mM dATP, and 1 µL of Taq polymerase (5000 units/mL, NEB) were added. The cleavage and A-tailing reaction was incubated at 37°C for 15 minutes and then at 72°C for 5 minutes.

- Step 106: 20 µL of a ligation mastermix was added to the cleaved and A-tailed reaction and mixed by gently flicking the tube. Once thoroughly mixed the remaining 18 µL was added and mixed in the same way. The reaction was incubated at room temperature for 10 minutes. The ligation mastermix was made using components from an SQK-LSK109 library prep kit from ONT. This contained 20 µL of LNB (ONT), 3 µL of nuclease-free water, 10 µL of NEBNext Quick T4 DNA Ligase (NEB), and 5 µL of AMX (ONT).

The adapter-ligated fragments were purified by adding 1 volume (80 µL) of 1X TE (pH 8) and 0.3x volume (48 µL) of AMPure XP Beads (Beckman Coulter) followed by gentle inversion to mix. The reaction was incubated at room temperature for 10 minutes. The magnetic beads were then pelleted on a magnet and washed twice with 250 µL of LFB (ONT). Beads were then resuspended in 13 µL of elution buffer EB (ONT) and incubated at room temperature for 10 minutes. Beads were again pelleted on a magnet and the supernatant containing adapter-ligated fragments was removed to a new tube.

A FLO-MIN106 flowcell (MinION/GridION from ONT) was primed by adding 30 µL of FLT (ONT) to a tube of flush buffer FB (ONT) and then loading 800 µL of this mixture into the priming port of the flowcell. This was allowed to sit for 5 minutes. After the 5-minute incubation, the SpotON port was opened and then another 200 µL of FB/FLT was added to the priming port.

Then, step 108 was executed by mixing 12 µL of the eluted sequencing library (i.e., the cut DNA strands, some of which are ligated to a sequencing adapter) with 37.5 µL SQB (ONT) and 25.5 µL of LB (ONT). This mixture was then loaded into the SpotON port of the primed flowcell. The SpotON port and priming ports were closed and sequencing was carried out for 24 hours on a MinION sequencing device.

f) Data analysis

The sequencing data obtained from the flowcell-based sequencing (“Raw ONT sequencing data”) was basecalled using the Guppy software, version 3.4.3+f4fc735. ONT reads were sequentially mapped against the construct sequence used for transformation and then to the parent sugar beet genome to identify reads that contained both T-DNA and genomic sequence (i.e. chimeric junction reads). Mapping was performed with the software minimap2 (version 2.17-r941) using the parameters -a -x map-ont and -r 10000. Alignments were visualized in CLC Genomics Workbench 11.0.
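One way to picture the sequential mapping is sketched below in Python. This is not the analysis pipeline used in the example: the file names are placeholders, the helper names are hypothetical, and the sketch assumes that a chimeric junction read is simply a read with a mapped alignment record in both SAM outputs. The command line uses only the minimap2 parameters named above.

```python
def minimap2_command(reference: str, reads: str) -> list:
    """Build a minimap2 call with the parameters named in the text (SAM on stdout)."""
    return ["minimap2", "-a", "-x", "map-ont", "-r", "10000", reference, reads]


def mapped_read_names(sam_text: str) -> set:
    """Collect names of reads with at least one mapped record in SAM text."""
    names = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # SAM flag bit 0x4 marks an unmapped read
        names.add(fields[0])
    return names


def chimeric_read_names(construct_sam: str, genome_sam: str) -> set:
    """Reads that align to both the construct and the parent genome."""
    return mapped_read_names(construct_sam) & mapped_read_names(genome_sam)
```

In practice one would additionally inspect which portion of each read aligns to which reference, since the set intersection alone does not locate the junction within the read.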

Example 3: Cas9-mediated targeted sequencing of corn DNA

a) Generating genetically modified plants

In this experiment, a corn line was generated by Agrobacterium-mediated transformation. By this method, a T-DNA containing a selective marker gene was inserted into the corn genome.

In experiment 3, an integration site analysis using pairs of guide RNAs that were designed in opposite orientations within one of the inserted elements (selective marker gene and its promoter) was performed to a) identify the location of an Agrobacterium-mediated T-DNA integration and b) determine the intactness of the T-DNA in a corn genome. Thereby, it was possible to discover the previously unknown sequence surrounding the inserted T-DNA (Figure 14). In this particular sample, reads originating in the selective marker gene that extend towards the left border loop back on themselves, creating palindromic sequences (Figure 15). This indicates that the T-DNA is inserted as an inverted tandem repeat with the LB site in the middle and with two RB sites at the genomic insertion points. Reads originating in the promoter of the selective marker extend outwards beyond the RB and into the corn genome. Approximately half of these reads extend in the forward orientation on chromosome 3 and half extend in the reverse orientation on chromosome 3 (Figure 16), also supporting the model of an inverted tandem repeat insertion.
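The notion of a read that loops back on itself can be illustrated with an idealized check. The Python sketch below is not how the reads in this example were analyzed; it assumes an error-free read and a known junction position, whereas real nanopore reads contain errors and would be assessed by alignment. The function name is hypothetical.

```python
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def inverted_repeat_arm(read: str, junction: int) -> int:
    """Count how many bases left of the junction are mirrored, as their
    complement, by the bases to the right of the junction."""
    read = read.upper()
    k = 0
    while junction - 1 - k >= 0 and junction + k < len(read):
        if PAIR.get(read[junction - 1 - k]) != read[junction + k]:
            break  # the reverse-complement symmetry ends here
        k += 1
    return k


# A toy read: 2 unrelated bases, a 7-base arm, then its reverse complement.
toy_read = "GG" + "GATTACA" + "TGTAATC"
arm_length = inverted_repeat_arm(toy_read, junction=9)
```

A long arm around a junction near the left border would be the pattern expected for an inverted tandem repeat, while an arm length near zero would argue against it.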

Because long-read sequencing is performed, more than a hundred kilobases of flanking information is obtained (123 kb), making it trivial to locate the integration site on chromosome 3 from 216,967,972 - 216,967,987 bp within exon 2 of the Isocitrate dehydrogenase gene (indicating that it disrupts this gene). This experiment demonstrates how adapter-based sequencing using pairs of opposite-directed guide RNAs is able to not only identify T-DNA insertion sites, but also to characterize the structural integrity of insertions.

b) Nuclei isolation and DNA extraction:

Material was used from the above-mentioned corn line that had been generated by Agrobacterium-mediated transformation. Nuclei were isolated as described for example 2. DNA was extracted from the isolated nuclei using a Nanobind Big Plant DNA kit (Circulomics Inc, Baltimore, USA) as per the manufacturer's protocol (www.circulomics.com/support-nanobind).

c) Guide RNA design and creation of nucleoprotein particles

Four guide RNAs (crRNA) were designed with homology to known sequence contained within the T-DNA (the inserted selective marker gene). Two crRNAs were designed at the 5' end of the promoter of the gene and both were oriented in the reverse direction. The other two crRNAs were designed at the 3' end of the gene and were oriented in the forward direction (Figure 13).

The crRNAs, tracrRNA and Alt-R® S. pyogenes HiFi Cas9 nuclease V3 were obtained from IDT. The "Cas9 targeted sequencing" protocol from ONT was followed. Briefly, the ribonucleoprotein particles (RNP) were generated by 1) annealing the crRNAs with the tracrRNA in equimolar amounts and 2) adding HiFi Cas9 nuclease.

d) Sample preparation:

5 µg of DNA was dephosphorylated using Quick Calf Intestinal Phosphatase (NEB) as outlined by the ONT Cas9-targeted sequencing protocol and as described already for the example 2. The dephosphorylated DNA was then cleaved and A-tailed by adding the previously prepared RNPs, dATP (NEB), and Taq polymerase (NEB). This mixture was incubated at 37°C for 15 minutes, 72°C for 5 minutes and then placed on ice. The rest of the ONT Cas9-targeted sequencing protocol was followed using an SQK-LSK109 library preparation kit (ONT) to ligate sequencing adapters, bead clean, and load the cleaned library. The final sequencing library was loaded onto an R9.4.1 FLO-MIN106 flow cell and sequenced for 24 hours on a MinION device.

e) Preparation of the nucleoprotein particles

Cas9 crRNAs, tracrRNA, and Alt-R® S. pyogenes HiFi Cas9 nuclease V3 were obtained from IDT.

The "Cas9 targeted sequencing" protocol from ONT was followed and the preparation of the nucleoprotein particles was performed as described already for example 2. f) Data analysis

Raw sequencing data was basecalled using Guppy 3.6.0+98ff765. The ONT sequencing reads were sequentially mapped against the construct sequence used for transformation and then to the parent corn genome to identify reads that contained both the T-DNA and genomic sequence (i.e. chimeric junction reads). Mapping was performed with minimap2.17-r941 using the parameters -a -x map-ont and -r 10000. Alignments were visualized with the Integrative Genomics Viewer - IGV 2.8.0. The minimap2 software is described in Heng Li, Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094-3100, https://doi.org/10.1093/bioinformatics/btv191 .

Figure 2 shows a flowchart of a method for preparing a sample comprising sample polynucleotides for adapter-based sequencing using a nuclease which generates sticky ends. For example, the nuclease generating sticky ends can be Cpf1, also known as Cas12a. An example for adapter-based sequencing using guide RNAs bound to Cpf1 is described below as “example 4”.

Example 4: Cpf1-mediated targeted sequencing in corn

The goal of this experiment was to repeat example 3 and to discover the previously unknown sequence surrounding the inserted T-DNA. However, instead of the nuclease Cas9 which was used in example 3, the guide RNAs in example 4 were designed such that they can form a nucleoprotein complex with the nuclease Cpf1 and can guide Cpf1 to a corresponding protospacer and PAM at two opposing ends of a known sequence. Again, the guide RNAs were designed in opposite orientations within the promoter of the selective marker gene (figure 14).

a) Generating genetically modified plants, nuclei isolation and DNA extraction

These steps were performed as described for example 3.

b) Guide RNA design:

Four guide RNAs (Alt-R L.b. Cas12a crRNA, IDT) were designed with homology to known sequence contained within the T-DNA (the promoter of the selective marker gene). Two crRNAs were designed at the 5' end of the promoter of the gene on the antisense strand. The other two crRNAs were designed at the 3' end of the promoter of the gene on the sense strand (Figure 14). Several modifications to the sample preparation protocol were made to make the sample protocol compatible with Cpf1.

c) Sample preparation and sequencing:

The "Cas9 targeted sequencing" protocol from ONT was modified to make it compatible with Cpf1. When Cas9 cuts DNA, it creates blunt ends which are then directly A-tailed. When Cpf1 cuts DNA, it creates 5' overhangs that must be filled in and then A-tailed. In order to make the sample preparation protocol compatible with Cpf1, the following protocol modifications were made: the buffer system used for the cutting reaction was switched such that dATP was replaced with all four dNTPs, and T4 polymerase was added in addition to Taq polymerase.
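The difference between the two end types can be illustrated with a small model. The sketch below is a toy under stated assumptions and not part of the protocol: it represents the duplex by its top strand only, takes hypothetical cut coordinates `top_cut` and `bottom_cut`, and reports which nucleotides a polymerase would need in order to fill in the recessed 3' ends before A-tailing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def fill_in_requirements(top, top_cut, bottom_cut):
    """Model a cut of a duplex that is given by its top strand (5'->3').

    top_cut and bottom_cut are cut positions in top-strand coordinates
    (hypothetical inputs). Equal positions give blunt ends; bottom_cut >
    top_cut gives a 5' overhang on both resulting fragments.
    """
    if not 0 < top_cut <= bottom_cut < len(top):
        raise ValueError("expected 0 < top_cut <= bottom_cut < len(top)")
    overhang = top[top_cut:bottom_cut]
    # Left fragment: the recessed 3' end of the top strand is extended.
    left_fill = overhang
    # Right fragment: the recessed 3' end of the bottom strand is extended;
    # the added bases are the reverse complement of the overhang.
    right_fill = reverse_complement(overhang)
    return {
        "overhang_length": len(overhang),
        "left_fill": left_fill,
        "right_fill": right_fill,
        "nucleotides_needed": set(left_fill + right_fill),
        # After fill-in both fragments are blunt; A-tailing then adds one A
        # to the 3' end at the cut (top strand on the left fragment,
        # bottom strand on the right fragment, both written 5'->3').
        "left_top_after_tailing": top[:bottom_cut] + "A",
        "right_bottom_after_tailing": reverse_complement(top[top_cut:]) + "A",
    }
```

For a blunt cut (`top_cut == bottom_cut`) no fill-in nucleotides are needed, so dATP for the A-tailing is sufficient. For a staggered cut the overhang can contain any base, which is consistent with replacing dATP by all four dNTPs.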

The sample preparation method for Cpf1 comprises:

- A crRNA pool was made by mixing the two pairs of oppositely oriented crRNAs in equal volume (100 µM each)

- Cpf1 RNPs were formed by mixing 0.3 µL of the pooled crRNAs with 3 µL of 10X NEBuffer 2.1 (NEB), 26.4 µL of nuclease-free water, and 0.3 µL of EnGen Lba Cas12a that had been diluted to 62 µM in 1X NEBuffer 2.1. 1X NEBuffer 2.1 comprises 50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2 and 100 µg/ml BSA and has a pH value of 7.9 at 25°C. The mixture was incubated at 25°C for 10 minutes. No annealing step with a tracrRNA was performed as the crRNA can directly bind to Cpf1 nucleases;

- Step 102 was performed by dephosphorylating sample DNA ends: 5 µg of DNA in a volume of 24 µL was dephosphorylated by adding 3 µL of 10X NEBuffer 2.1 (NEB) and 3 µL of calf intestinal phosphatase (CIP). The dephosphorylation reaction was incubated at 37°C for 10 minutes and then 80°C for 2 minutes.

- To this reaction 10 µL of Cpf1 RNPs, 1 µL of 10 mM dNTPs, 1 µL of Taq polymerase (5000 units/mL, NEB), and 1 µL of T4 DNA polymerase (3,000 units/mL) were added. The T4 polymerase is added for generating blunt ends from the sticky ends generated by Cpf1. Adding dNTPs instead of dATP ensures that the sticky ends can be filled up with any nucleotide required. Taq polymerase has terminal transferase activity and preferentially adds adenine to 3' ends (even in the presence of other nucleotides). The cleavage, fill-in and A-tailing reaction was incubated at 37°C for 10 minutes, 72°C for 5 minutes and then placed on ice. This step represents the reaction steps 104 and 202 of figure 2, whereby both reactions are performed in the same buffer system in a single step;

- A ligation mastermix is made using components from an SQK-LSK109 library prep kit from ONT. This contained 20 µL of LNB (ONT), 3 µL of nuclease-free water, 10 µL of NEBNext Quick T4 DNA Ligase (NEB), and 5 µL of AMX (ONT). T4 ligase requires both 5' phosphate and 3' A-tail to attach a T-tailed adapter. Hence, only the cut ends where no RNP sits are ligated to an adapter.

- In step 106, 20 µL of the above mastermix is added to the cleaved and A-tailed reaction and mixed by gently flicking the tube. Once thoroughly mixed the remaining 18 µL are added and mixed in the same way. The reaction was incubated at room temperature for 10 minutes.

- The adapter-ligated fragments were purified by adding 1 volume (80 µL) of 1X TE (pH 8) and 0.3x volume (48 µL) of AMPure XP Beads (Beckman Coulter) followed by gentle inversion to mix. The reaction was incubated at room temperature for 10 minutes. The magnetic beads were then pelleted on a magnet and washed twice with 250 µL of LFB (ONT). Beads were then resuspended in 13 µL of elution buffer EB (ONT) and incubated at room temperature for 10 minutes. Beads were again pelleted on a magnet and the supernatant containing adapter-ligated fragments was removed to a new tube.

- A FLO-MIN106 flowcell (MinION/GridION from ONT) was primed by adding 30 µL of FLT (ONT) to a tube of FB (ONT) and then loading 800 µL of this mixture into the priming port of the flowcell. This was allowed to sit for 5 minutes.

- After the 5-minute incubation, the SpotON port was opened and then another 200 µL of FB/FLT was added to the priming port.

- In step 108, 12 µL of the eluted sequencing library are mixed with 37.5 µL SQB (ONT) and 25.5 µL of LB (ONT). This mixture is then loaded into the SpotON port of the primed flowcell. The SpotON port and priming ports were closed and sequencing was carried out for 24 hours on a MinION sequencing device.

d) Data analysis:

Raw ONT sequencing data was basecalled using Guppy 4.2.2+effbaf8. ONT reads were sequentially mapped against the construct sequence used for transformation and then to the parent corn genome to identify reads that contained both the T-DNA and genomic sequence (i.e. chimeric junction reads). Mapping was performed with minimap2 2.17-r941 using the parameters -a -x map-ont and -r 10000. Alignments were visualized with IGV 2.8.0.

The analysis of the sequencing reads supported the findings from example 3 that the sample contained an inverted repeat of the T-DNA inserted within exon 2 of the Isocitrate dehydrogenase gene on chromosome 3 (3:216,967,972 - 3:216,967,987). In this experiment ~33 kb of flanking information was obtained. This experiment demonstrates that the sample preparation protocol using two or more oppositely oriented guide RNAs is compatible with multiple Cas nucleases.
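The volumes listed in the sample preparation steps of example 4 can be cross-checked with a few sums. The following sketch is only a bookkeeping aid: it reads all listed volumes as microliters (an assumption about the units), and the variable names are illustrative. `Decimal` is used so that sums such as 0.3 + 3 + 26.4 + 0.3 are exact.

```python
from decimal import Decimal as D

# All volumes in microliters, taken from the steps of example 4.
rnp_mix = [D("0.3"), D("3"), D("26.4"), D("0.3")]       # crRNA pool, 10X buffer, water, Cas12a
dephos = [D("24"), D("3"), D("3")]                      # DNA, 10X buffer, CIP
cleavage_additions = [D("10"), D("1"), D("1"), D("1")]  # RNPs, dNTPs, Taq, T4 polymerase
ligation_mix = [D("20"), D("3"), D("10"), D("5")]       # LNB, water, ligase, AMX
loading_mix = [D("12"), D("37.5"), D("25.5")]           # library, SQB, LB

rnp_total = sum(rnp_mix)                                # 30 in total
cleavage_total = sum(dephos) + sum(cleavage_additions)  # 43 in total
ligation_total = sum(ligation_mix)                      # 38, added as 20 + 18
reaction_total = cleavage_total + ligation_total        # 81 in total
loading_total = sum(loading_mix)                        # 75 in total

# 3 uL of 10X buffer in a 30 uL RNP mix corresponds to a 1X final buffer.
buffer_fold = D("10") * D("3") / rnp_total

# The stated bead volume is 0.3x of (reaction + 1 volume TE), using the
# stated 80 uL for each of the two.
bead_volume = D("0.3") * (D("80") + D("80"))            # 48
```

The ligation mastermix total of 38 µL matches the two portions of 20 µL and 18 µL, and the computed bead volume matches the stated 48 µL. The summed reaction volume of 81 µL is within 1 µL of the stated TE volume of 80 µL.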

Figure 3 shows a flowchart of a method for identifying, synthesizing, and using a pair of guide polynucleotides facing away from each other.

First, in steps 302-304, a known polynucleotide sequence (e.g. the “known sequence” 605 in Figure 7, e.g. a marker gene) is examined to determine whether and, if so, where it contains so-called “protospacers” and "PAM" sequences. The protospacer is a sequence motif in the sample polynucleotide which is complementary to the binding sequence of a guide RNA part that is responsible for targeted binding. The PAM motif (Protospacer adjacent Motif) is a DNA sequence motif that is required for a strand break mediated by a Cas-type endonuclease. The PAM motif being searched for depends on the endonuclease to be used. It may be possible to search for the PAM motifs of multiple different Cas endonucleases and select the endonuclease depending on the PAM motifs identified in the known sequence region. The two PAMs identified in these steps should each be contained at different ends of the known sequence 605. The closer the two identified PAMs are to the center of the known sequence, the greater the proportion of the already known sequence region that is unnecessarily co-sequenced.

In steps 306-308, the immediate sequence environment of the two PAMs identified in steps 302 and 304 is then analyzed to determine which DNA nucleotides are present in a “homologous sequence region” (see sequence regions 718 and 720 in figure 8). The “homologous sequence region” is a region within the known sequence 605 of predefined length at a predefined distance upstream or downstream of the PAM. The length and distance of the homologous sequence region, as well as whether the region to be analyzed is upstream or downstream of the identified PAM, also depends on the type of endonuclease to be used.

Steps 306 and 308 may include various analyses of the DNA sequence contained in said predefined region. The analysis may include, for example, determining whether the DNA sequence found in said region tends to form hairpin formations, whether said sequence is unique within the genome of the organism under investigation or occurs multiple times, whether the GC content is in a proportion range favorable for binding guide RNA, etc. For example, the GC content of the guide sequence should be 20-80% and preferably between 40% and 60%. High GC content stabilizes the RNA-DNA duplex while destabilizing off-target hybridization. The length of the guide sequence should be between 17 and 24 bp, noting that a shorter sequence minimizes off-target effects. Guide sequences shorter than 17 bp have a chance of targeting multiple loci.

The analysis may include further checks which are essentially identical or similar to the checks performed during DNA primer design. For example, the guide polynucleotides are designed to avoid self-complementarity, and the guide polynucleotides are further designed such that their binding sequence is not homologous to any further sequence in the known sequence or in the sample polynucleotides.
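The checks named above can be sketched in a few lines. This is a minimal, hedged example: the thresholds (17-24 nt, 20-80% GC, preferably 40-60%) are taken from the text, whereas the function names and the simple self-complementarity heuristic (a 4-nt stretch whose reverse complement occurs further downstream in the same guide) are assumptions made for illustration. Uniqueness within the genome is not covered here, since it requires a genome-wide search.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_self_complementarity(seq, stem=4):
    """Heuristic (assumption): True if a stem-length k-mer and its reverse
    complement both occur in seq at non-overlapping positions."""
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        kmer = seq[i:i + stem]
        if seq.find(reverse_complement(kmer), i + stem) != -1:
            return True
    return False


def evaluate_guide(seq, min_len=17, max_len=24,
                   gc_range=(0.20, 0.80), preferred_gc=(0.40, 0.60)):
    """Apply the length, GC content and self-complementarity checks."""
    gc = gc_fraction(seq)
    result = {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_fraction": gc,
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
        "gc_preferred": preferred_gc[0] <= gc <= preferred_gc[1],
        "self_complementary": has_self_complementarity(seq),
    }
    result["suitable"] = (result["length_ok"] and result["gc_ok"]
                          and not result["self_complementary"])
    return result
```

A candidate that fails `evaluate_guide` would be discarded together with its PAM, and the next candidate would be examined.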

If the analysis shows that the sequence within the region in question is unsuitable for binding a guide RNA, this PAM and its sequence environment are discarded as a potential binding site of a guide polynucleotide to be synthesized, and the analysis is repeated at the nearest PAM. This is continued until a pair of first and second PAMs, together with their adjacent "sequence of homology" regions, has been identified which allows specific binding of a pair of two different RNPs in the desired spatial orientation relative to each other, e.g. as described with reference to figures 4 and/or 8.

After having identified a pair of homologous sequences 718, 720 in appropriate orientation and position in the known sequence 605 next to a PAM 714, 716, a pair of oppositely oriented guide polynucleotides 706, 708 is synthesized which comprise a binding sequence 715, 713 which is homologous to the identified homologous sequences 718, 720. For some types of Cas nucleases, at first a pair of oppositely oriented crRNAs is synthesized which comprise the said binding sequences, and this pair of crRNAs is, if needed, annealed with a tracrRNA to provide the pair of guide polynucleotides. For other types of Cas nucleases, no tracrRNA and no respective annealing step is required.

In steps 314 and 316, the two different types of guide polynucleotides are contacted with an effector protein, e.g. a particular Cas nuclease such as Cas9 or Cas12a (Cpf1), thereby providing two types of RNPs adapted to perform a cutting reaction at a specific cleavage site and in a specific orientation relative to each other and to the sample polynucleotide to be cut, such that the RNPs remain bound to specific ends generated by the cutting, as described e.g. with reference to figure 4.

Finally, in step 318, the two types of RNPs are used for preparing a sample comprising multiple sample polynucleotides, e.g. a sample comprising genomic DNA of a wild-type organism or of a genetically modified organism, for targeted sequencing of the unknown/insufficiently characterized regions of interest 604 surrounding the known sequence 605.

Figure 4 shows the creation of a pair of oppositely oriented guide polynucleotides (e.g. guide RNAs consisting of a combination of a crRNA and a tracrRNA or consisting of a single crRNA).

The guide RNAs are created at the ends of a known sequence region, e.g. of a marker gene. The guide RNA 336 binding near the 5’ end of the antisense strand 332 of the known sequence to said 5’ antisense strand is called forward guide or "F". The guide RNA 334 binding near the 5’ end of the sense strand of the known sequence to the 5' sense strand 330 is called reverse guide or "R”.

RNPs containing an "F" guide polynucleotide cut both strands of the sample polynucleotide (sample DNA comprising the known sequence) and remain bound to the 5' end of the antisense strand 332 fragment generated by the cutting, thereby releasing the 3' cut end of the antisense strand. RNPs containing an "R" guide polynucleotide cut the sample polynucleotide and remain bound to the 5' end of the sense strand 330 fragment generated by the cutting, thereby releasing the 3' cut end of the sense strand.

Figure 5 shows an illustration of the sample preparation method where the sample is contacted with a mixture of two different types of RNPs.

A DNA sample 400 is provided and the ends of the DNA strands comprised in the sample (“sample DNA”, “sample polynucleotides”) are protected in a dephosphorylation step.

In addition, at least two different guide RNAs are synthesized. Each of the guide RNAs comprises a binding sequence adapted to bind to a corresponding target sequence in one of the sample polynucleotides, whereby the position and orientation of the binding sequences of the guide RNAs are selected such that the two different guide RNAs will bind to and guide their bound effector proteins to different target sequences within a known sequence within the sample DNA and are selected such that the effector proteins will remain bound to the one of the two cut ends generated by said effector protein which belongs to the portion of known sequence lying in-between the two binding sites. The two different guide RNAs are pooled and contacted with effector proteins such that a pool of two different types of nucleoprotein particles is formed in a separate sample 402. The RNP complex pool 402 may comprise more than two different guide RNAs and hence more than two different RNP complexes.

The DNA sample with the blocked ends 400 and the RNP complex mixture 402 are mixed together in a buffer whose composition will depend on the type of nuclease used. Assuming the nuclease is e.g. Cas9, the buffer will comprise a Taq Polymerase and dATPs and the Taq Polymerase will add a polyA tail to the ends generated by the targeted cleavage reaction performed by the Cas9 nuclease.

Finally, a sequencing adapter is ligated selectively to those sample polynucleotide ends which were generated by the cutting by the RNP complexes and which do not belong to the portion of known sequence lying in-between the two binding sites, because the ends belonging to that portion remain blocked by the bound RNP complexes. The sample polynucleotide fragments are then washed, e.g. with an alcohol-free, bead-based washing technology such as AMPure XP.

Figure 6 shows an illustration of a sample preparation method which is an alternative to the approach depicted in figure 5. Each of the at least two different guide polynucleotides is provided in a separate sample tube and mixed with an effector protein, e.g. a Cas9 nuclease. Hence, for n different guide RNAs, n different tubes 502, 504 are provided and the same amount of effector protein is added to each of said tubes so n different types of RNP complexes can form. In addition, the DNA sample is blocked as described before and is split into as many DNA sample tubes as there exist different RNP complex sample tubes. Then, each RNP complex sample tube is mixed with a respective one of the blocked DNA sample tubes. In each of the said n different mixtures, the RNP complexes perform a cleavage reaction and a polyA tail is attached to the ends generated by the cutting. The adapter ligation may also be performed in n different tubes. Finally, the cleaved and adapter-ligated n samples are pooled, purified and input into a sequencing device. According to a further variant, the pooling of the n different RNP complex samples with the sample polynucleotide into a single sample can already be performed before or when performing the cleavage reactions. This would further reduce costs and the time required for sample preparation.

Figure 7 is a further illustration of sample preparation and adapter-based sequencing as described already with reference to figure 5.

A polynucleotide sample 400, e.g. a genomic DNA sample, is provided. The sample comprises a plurality of sample polynucleotides 602, e.g. chromosomes. One of the sample polynucleotides comprises a region with a known sequence referred to herein as “known sequence” 605. The region with the known sequence may be a natural part of the wild-type genome of an organism or may be an insertion sequence introduced into the sample polynucleotide via a genetic engineering technique. It may be desirable to characterize the sequence of the two regions 604 referred to as “regions of interest” adjacent to the known sequence 605. In order to achieve this, the ends of the sample polynucleotides are protected such that the sequencing adapter is blocked from binding to said polynucleotide ends. For example, the protection can be achieved by a phosphatase which removes a phosphate group from all 5’ ends of the sample polynucleotides.

Then a mixture of at least two different types of RNPs is added. The first type of RNP complexes comprises a guide polynucleotide having a first binding sequence adapted to specifically bind to a first target sequence in the known sequence, and the second type of RNP complexes comprises a guide polynucleotide having a second binding sequence adapted to specifically bind to a second target sequence in the known sequence. The first and the second binding sequences are selected such that the first and second guide polynucleotides will bind to different target sequences within a known sequence within the sample polynucleotide and are selected such that the effector proteins will remain bound to the one of the two cut ends generated by said effector protein which belongs to the portion of known sequence lying in-between the two binding sites.

The added RNPs will cut the sample polynucleotide comprising the known sequence at the cleavage site determined by the binding site of the guide polynucleotide which has guided the effector protein to the sample polynucleotide. Hence, if the RNP complex pool comprises two different RNP complexes with the said two different guide polynucleotides, a central fragment 614 and two remaining arms 610, 612 will be generated.

However, in case of the alternative sample preparation approach depicted in figure 6, it is possible that the cutting reaction is performed in separate tubes for each of the different guide polynucleotides/RNP complexes. In this case, no central fragment is generated. Nevertheless, the sequencing result is the same, as the RNP complex will remain bound to the cut end belonging to the portion of the known sequence lying in-between the two binding sites and will thereby prohibit the sequencing in the direction towards the center of the region in-between the two guide polynucleotide binding sites.

The buffer for performing the cleavage step may in addition comprise a Taq polymerase and dATPs and the Taq polymerase will add a polyA tail 608 to all polynucleotide ends which are not blocked by the RNPs which remain bound to the cut ends belonging to the region lying within the two binding sequences/cut sites.

Finally, a sequencing adapter 616 is added which will selectively bind to polynucleotide ends having a polyA tail and which have not been protected by the dephosphorylation reaction.

The adapter-sample-polynucleotide-arm-complexes are cleaned and added into a flow cell for performing a targeted sequencing reaction. For example, the targeted sequencing reaction can be a nanopore sequencing reaction wherein a polynucleotide strand is passed through a pore complex 618 integrated in a membrane. The sequencing adapter 616 is adapted to guide the adapter-sample-polynucleotide-arm-complex to a pore for initializing the sequencing reaction.

Figure 8 illustrates the cleavage of a double stranded polynucleotide (e.g. a DNA strand) at two cleavage sites 722, 724 within a known sequence region 605 into a central fragment 614 and two remaining arms 610, 612 using two different types of ribonucleoprotein complexes (RNPs) 702, 704. The known sequence 605 can be the sequence of a marker gene or another known sequence.

Figure 8 illustrates the positions of the PAMs 714, 716 relative to their respective sequence of homology 718, 720 and complementary protospacers 710, 712 for effector proteins of the Cas9 type: the PAM sequence 714, 716 for Cas9 is located next to the 3’ end of the sequence of homology 718, 720, in contrast to the PAM of Cpf1 and other sgRNA nucleases which is located next to the 5’ end of the sequence of homology. The sequence of homology 718, 720 is homologous to a “polynucleotide binding sequence” within a guide polynucleotide and complementary to a region 710, 712 referred to as protospacer.

In order to achieve the desired opposite orientation of the two types of guide polynucleotides and the respectively created RNPs, the identification and design of the guide polynucleotides comprises a step of analyzing the known sequence of the sample polynucleotide for identifying the following patterns:

For creating guide polynucleotides for Cas9 and other endonucleases requiring the same relative position of the sequence of homology 718, 720 and the PAM 714, 716, the sample polynucleotide is searched for the following pattern:

For a “forward” (F) guide RNA, the search is carried out on the sense strand in the 5’ to 3’ direction (left to right in figure 8).

If a PAM is identified then the “sequence of homology” is evaluated to determine if it will make a suitable “binding sequence”. The evaluation may comprise determining if the GC content of the candidate binding sequence is within a predefined allowable/suitable range, if there is self-complementarity, or if there exist other homologous sequences elsewhere in the genome, etc. In the case of Cas9 this sequence of homology is upstream of this PAM and in the case of Cpf1 the sequence of homology is downstream of this PAM.

Therefore, the pattern search for Cas9 comprises searching for the following pattern:

5’-[sample polynuc.]-[sequence of homology]-[PAM]-[sample polynuc.]-3’

If a match was found for this pattern in the sense strand of the known sequence next to the 3’ end of the sense strand of the known sequence, the sequence of homology within this match can be used as the binding sequence in an F-guide polynucleotide to be synthesized.

For a “reverse” (R) guide RNA, the search is carried out on the antisense strand in the 5’ to 3’ direction (right to left in figure 8).

If a match was found for this pattern in the antisense strand of the known sequence next to the 3’ end of the antisense strand of the known sequence, the sequence of homology within this match can be used as the binding sequence in an R-guide polynucleotide to be synthesized. The expression “next to” may mean within a predefined maximum distance.

For creating guide polynucleotides for Cpf1 and other endonucleases requiring the same relative position of the sequence of homology and the PAM, the sample polynucleotide is searched for the following pattern:

5’-[sample polynuc.]-[PAM]-[sequence of homology]-[sample polynuc.]-3’

The sequence of homology becomes (is exactly the same as) the binding sequence when it is synthesized and this binding sequence is complementary to the protospacer.

If a match was found for this pattern in the sense strand of the known sequence next to the 3’ end of the sense strand of the known sequence, the sequence of homology within this match can be used as the binding sequence in an F-guide polynucleotide to be synthesized (the F guide polynucleotide binds to the antisense strand).

If a match was found for this pattern in the antisense strand of the known sequence next to the 3’ end of the antisense strand of the known sequence, the sequence of homology within this match can be used as the binding sequence in an R-guide polynucleotide to be synthesized (the R guide polynucleotide binds to the sense strand).

The pairs of F- and R- guide polynucleotides synthesized in this way are pointing away from each other.
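The two pattern searches can be sketched with regular expressions. The following is a minimal illustration under stated assumptions: the PAM motifs (NGG for S. pyogenes Cas9, TTTV for Lb Cas12a/Cpf1), the fixed 20-nt sequence of homology, the `max_distance` parameter standing in for “next to”, and all function names are choices made for this example and are not prescribed by the text. The antisense strand is obtained as the reverse complement of the sense strand and is searched in its own 5’ to 3’ direction.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Lookaheads are used so that overlapping candidate sites are all reported.
PATTERNS = {
    # 5'-[sequence of homology]-[PAM]-3'
    "Cas9": re.compile(r"(?=(?P<hom>[ACGT]{20})(?P<pam>[ACGT]GG))"),
    # 5'-[PAM]-[sequence of homology]-3'
    "Cpf1": re.compile(r"(?=(?P<pam>TTT[ACG])(?P<hom>[ACGT]{20}))"),
}


def find_sites(strand, nuclease, max_distance):
    """Return pattern matches lying within max_distance of the 3' end of
    `strand`, which must be given in 5'->3' direction."""
    sites = []
    for m in PATTERNS[nuclease].finditer(strand):
        site_end = max(m.end("hom"), m.end("pam"))
        if len(strand) - site_end <= max_distance:
            sites.append({
                "homology": m.group("hom"),
                "pam": m.group("pam"),
                "homology_start": m.start("hom"),
            })
    return sites


def design_guide_pair(known_sense, nuclease, max_distance=10):
    """F candidates come from the sense strand, R candidates from the
    antisense strand, each searched near its own 3' end."""
    return {
        "F": find_sites(known_sense, nuclease, max_distance),
        "R": find_sites(reverse_complement(known_sense), nuclease, max_distance),
    }
```

Each returned sequence of homology would still have to pass the suitability checks (GC content, self-complementarity, uniqueness) before being used as a binding sequence.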

After the cutting, the effector proteins will remain bound to the end of the one of the two fragments generated by the cutting at a given cleavage site which comprises the sample polynucleotide region 614 between the two cleavage sites 722, 724. This region is a separate central fragment in the case of the sample preparation protocol variant depicted in figure 5 or can be part of the other one of only two fragments generated in case of the sample preparation protocol variant depicted in figure 6 (a mixture comprising the sample DNA and two different RNPs will cut the sample DNA comprising the known sequence into a central fragment and two remaining arms; two separate mixtures respectively comprising the DNA and only one of the two different RNPs will not provide a central fragment and two arms, but only two fragments whereby the guide polynucleotides are designed such that the RNPs will always remain bound to the end of the one of the two cut fragments which comprises the “in-between” sequence 614, thereby preventing sequencing of the known sequence region 614).

In many effector proteins of the Cas family, e.g. Cas9, the sequence of homology is about 20 nt long and is located at the 5' end of the gRNA to be synthesized. The PAM is a short DNA sequence, usually 2-6 base pairs in length, that follows the DNA region targeted for cleavage. The PAM is required for a Cas nuclease to cut and is generally found 3-4 nucleotides downstream from the cut site. After base pairing of the gRNA to the target sequence 710, 712, the nuclease mediates a double strand break, in the case of Cas9 about 3 nt upstream of the PAM.
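The position of the Cas9 cut relative to the PAM can be expressed as simple index arithmetic. The sketch below is illustrative only: the function name, the 0-based coordinate convention and the default `offset=3` (taken from the "about 3 nt upstream of the PAM" figure above) are assumptions of this example.

```python
def cas9_cut(strand, pam_start, offset=3):
    """Split `strand` (5'->3', the strand carrying the PAM) at the blunt
    Cas9 cut site.

    pam_start is the 0-based index of the first PAM base; the cut is placed
    `offset` nucleotides upstream of the PAM.
    """
    cut = pam_start - offset
    if cut <= 0 or pam_start > len(strand):
        raise ValueError("cut site falls outside the strand")
    return strand[:cut], strand[cut:]
```

With a 20-nt sequence of homology directly upstream of the PAM, the cut therefore falls 17 nt after the 5' end of the sequence of homology, leaving its last 3 nt attached to the PAM-containing fragment.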

According to embodiments, the sample polynucleotides are double-strand polynucleotides, wherein the cutting is performed at a first and a second cleaving site. The first and second cleavage sites both lie within the known sequence of one of the sample polynucleotides, preferably close to (e.g. within a predefined maximum distance from) the two ends of the known sequence, e.g. the ends of a marker gene of known sequence. The two cleavage sites define the ends of an “in-between” polynucleotide region within the two cleavage sites.

In case the two different types of RNPs are added to the sample polynucleotides in the same reaction tube, the sample polynucleotide comprising the known sequence will be cut into a central fragment and two remaining arms, whereby the central fragment corresponds to the “in-between” polynucleotide region.

In case the two different types of RNPs are contacted with the sample polynucleotides in two separate reaction tubes, the sample polynucleotide comprising the known sequence will be cut only once into a left part and a right part, whereby in each reaction tube/reaction the RNP complex will remain bound to the one of the two parts comprising the “in-between” polynucleotide region.

Thereby, it is ensured that in both alternative approaches not the “in-between” region but rather the sample polynucleotide regions surrounding this “in-between” region of the two cleavage sites are sequenced.
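The two variants can be compared with a small coordinate model. The sketch below is a simplification under stated assumptions: the sample polynucleotide is treated as a linear molecule with 0-based coordinates, the original ends are taken as protected, the RNPs are taken to stay bound to the cut ends facing the “in-between” region, and all names and parameters are hypothetical.

```python
def adapter_ligatable_ends(length, known_start, known_end, cut1, cut2, pooled=True):
    """Model which cut ends can receive a sequencing adapter.

    The known sequence spans [known_start, known_end) on a polynucleotide of
    `length` bases; cut1 < cut2 are the two cleavage sites inside it.
    pooled=True models both RNP types in one tube, pooled=False models one
    RNP type per tube.
    """
    if not 0 <= known_start <= cut1 < cut2 <= known_end <= length:
        raise ValueError("cleavage sites must lie inside the known sequence")
    if pooled:
        # One tube: two arms and a central fragment.
        fragments = [(0, cut1), (cut1, cut2), (cut2, length)]
    else:
        # Separate tubes: each tube yields only two fragments.
        fragments = {
            "tube_1": [(0, cut1), (cut1, length)],
            "tube_2": [(0, cut2), (cut2, length)],
        }
    # In both variants only the ends facing away from the in-between region
    # are free, so reads start at the cut sites and run outwards.
    reads = [
        {"start": cut1, "direction": "towards position 0",
         "known_bases_read": cut1 - known_start},
        {"start": cut2, "direction": "towards the end",
         "known_bases_read": known_end - cut2},
    ]
    return {"fragments": fragments, "reads": reads,
            "in_between_length": cut2 - cut1}
```

The `known_bases_read` values show how much of the already known sequence is co-sequenced: the closer the cleavage sites lie to the ends of the known sequence, the smaller these values become, while the read start points and directions are the same for the pooled and the separate-tube variant.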

Figure 9 shows the binding sites (target sequences) and PAMs of the two pairs of guide polynucleotides oriented in opposite directions within the blaTEM gene (TEM-1, NG_050145.1). The pair (TEM-1_Cas9_R, TEM-1_Cas9_F) is adapted to work with the Cas9 nuclease. The pair (TEM-1_Cas12a_R, TEM-1_Cas12a_F) is designed to work with the Cpf1 nuclease. As can be inferred from figure 9, the relative position of a target sequence and a PAM of a guide RNA may depend on the type of guide RNA and effector protein to be used. For example, the relative positions of target sequence and PAM of Cas9 and Cas12a are different from each other.

Figure 10 illustrates the location and orientation of the binding sequences of two further pairs of guide polynucleotides oriented in opposite directions within a target gene. Figure 10 shows the sugar beet guide RNAs designed as described in example 2.

Figure 10A shows the T-DNA of a Ti-Plasmid bounded by the left border (LB) and right border (RB) within the cloning construct used to generate the sugar beet line used in the example. Shown are the locations of the CRISPR RNA (crRNA) cut sites as four small bars above one of the marker genes. Figure 10B shows a zoomed view of the selective marker gene within the T-DNA and the location and orientation of the crRNA guides (F1, F2, R1, and R2) used for the sample preparation and sequencing method of example 2.

Figure 11 shows sugar beet sequencing reads starting from two cleavage sites within a known sequence in a marker gene comprised in an insert polynucleotide. The insert polynucleotide is the T-DNA of a Ti-Plasmid described in figure 10. Figure 11 shows the sugar beet sequencing reads generated in example 2. The reads comprise portions of the integration polynucleotide (here: the T-DNA), but most of the reads in addition comprise sequence information of the genomic regions adjacent to the T-DNA. The alignment of sequencing reads against the T-DNA shows that almost all reads originate at the crRNA sites and extend outwards toward the left border (LB) and right border (RB) of the T-DNA. Most reads also extend past the borders into the chromosome where the T-DNA was integrated. Most reads do not match the Ti Plasmid sequence outside of the T-DNA region indicating that they cover the genomic sites of insertion. Dark grey lines indicate read orientation in the reverse direction. Grey lines indicate read orientation in the forward direction. Light grey lines indicate portions of reads that do not match the construct sequence (i.e., the Ti-plasmid sequence outside of the T-DNA portion).

Figure 12 shows a zoomed view of the sugar beet sequencing reads originating at crRNA sites, a zoomed view of the selective marker gene and the location and orientation of the crRNA guides (F1, F2, R1, and R2) used in example 2. The alignment shown in figure 12 was generated with minimap2 (whereby all reads were matched against the cloning construct used). The alignment depicted in figure 12 shows that almost all reads originate at the crRNA sites and extend outwards toward the left border (LB) and right border (RB) of the T-DNA. Dark grey lines indicate read orientation in the reverse direction. Grey lines indicate read orientation in the forward direction. Light grey lines indicate portions of reads that do not match the construct sequence.

Figure 13 shows sugar beet sequencing reads starting from two cleavage sites within an insertion polynucleotide having been integrated into chromosome 8. The location of the T-DNA insertion is revealed on chromosome 8 when the reads mapping to the cloning construct are then mapped to the sugar beet parent genome. The orientation of the reads shows that they originate elsewhere (within the inserted T-DNA at the crRNA sites in the selective marker gene) and then begin to match the reference genome at the site of insertion (8:47,887,187 - 8:47,887,191). They then extend outwards into the genomic sequence. The consensus sequence obtained surrounding the insertion site is 61 kb. Dark grey lines indicate read orientation in the reverse direction. Grey lines indicate read orientation in the forward direction. Light grey lines indicate portions of reads that do not match the parent genomic sequence.

Figure 14 shows the position of the guide RNAs designed in example 3. Figure 14A shows the T-DNA bounded by the left border (LB) and right border (RB) within the cloning construct used to generate the sugar beet line used in the study. The locations of the crRNA cut sites are marked. Figure 14B shows a zoomed view of the selective marker gene and the location and orientation of the crRNA sites (F1, F2, R1, and R2) used for the targeted sequencing described in example 3.

Figure 15 shows the corn sequencing reads obtained by an adapter-based sequencing reaction based on the four guide polynucleotides illustrated in figure 14. The corn sequencing reads contain parts of the construct sequence (T-DNA). The alignment of sequencing reads against the cloning construct shows that almost all reads originate at the crRNA sites (arrows) and extend outwards toward the left border (LB) and right border (RB) of the T-DNA. Grey lines indicate read orientation in the forward direction. Black lines indicate read orientation in the reverse direction. A significant number of reads aligning in reverse direction of the LB region also have supplementary alignments mapping in the forward direction. This suggests that the insertion has been duplicated and that the inserted DNA exists as an inverted duplication with the LB site in the middle and two RB sites at the genomic sites of insertion.

Figure 16 shows the corn sequencing reads obtained for an insertion polynucleotide having been integrated into chromosome 3 of the corn genome. The location of the T-DNA insertion is revealed on chromosome 3 when the reads mapping to the cloning construct are mapped to the corn parent genome. The orientation of the reads shows that they originate elsewhere (within the inserted T-DNA at the crRNA sites in the selective marker gene) and then begin to match the reference genome at the site of insertion (3:216,967,972 - 3:216,967,987). They then extend outwards into the genomic sequence. The consensus sequence obtained surrounding the insertion site is 122 kb. Grey arrows indicate read orientation in the forward direction. Black arrows indicate read orientation in the reverse direction.