NUCLEIC ACID AMPLIFICATION AND IDENTIFICATION METHOD

Title:

NUCLEIC ACID AMPLIFICATION AND IDENTIFICATION METHOD

Document Type and Number:

WIPO Patent Application WO/2020/120747

Abstract:

The present invention provides a method for generating labelled amplification fragments of a nucleic acid template comprising the steps of providing said template nucleic acid, annealing at least one oligonucleotide primer to said template nucleic acid, elongating the at least one oligonucleotide primer in a template specific manner thereby creating an elongation product, wherein said elongating reaction stops when the elongation product reaches the 5' end of the template nucleic acid or a nucleic acid elongation stopper that is annealed to the template nucleic acid downstream of the elongation product, providing an adaptor nucleic acid that comprises an identification sequence on its 5' end, wherein said identification sequence does not hybridize to the elongation stopper when in contact thereto, ligating the adaptor nucleic acid at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment.

Inventors:

GÖPEL YVONNE (AT)
MOLL PAMELA (AT)
REDA TORSTEN (AT)
SEITZ ALEXANDER (AT)

Application Number:

PCT/EP2019/085095

Publication Date:

June 18, 2020

Filing Date:

December 13, 2019

Assignee:

LEXOGEN GMBH (AT)

International Classes:

C12Q1/6855

Domestic Patent References:

WO2013038010A2	2013-03-21
WO2012134884A1	2012-10-04
WO2014071361A1	2014-05-08
WO2016138500A1	2016-09-01
WO2016005524A1	2016-01-14

Foreign References:

EP3119886A1	2017-01-25
US20180163201A1	2018-06-14
US20100273219A1	2010-10-28
US20140274729A1	2014-09-18
EP3119886B1	2017-08-02
US5554730A	1996-09-10
US8017339B2	2011-09-13

Other References:

SENA ET AL., SCIENTIFIC REPORTS, vol. 8, 2018, pages 13121
DEANGELIS ET AL., NUCLEIC ACIDS RESEARCH, vol. 23, no. 22, 1995, pages 4742 - 4743
BMC GENOMICS, vol. 6, 2005, pages 150
JIANG ET AL., GENOME RES., vol. 21, no. 9, 2011, pages 1543 - 1551
CHEN ET AL., BIOTECHNIQUES, vol. 30, no. 3, 2001, pages 574 - 582
ZIMMERMAN ET AL., PROC NATL ACAD SCI U S A., vol. 80, no. 19, 1983, pages 5852 - 6

Attorney, Agent or Firm:

SONN & PARTNER PATENTANWÄLTE (AT)

Claims:

1. A method for generating labelled amplification fragments of a nucleic acid template comprising the steps of

providing said template nucleic acid,

annealing at least one oligonucleotide primer to said template nucleic acid,

elongating the at least one oligonucleotide primer in a template specific manner thereby creating an elongation product, wherein said elongating reaction stops when the elongation product reaches the 5' end of the template nucleic acid or a nucleic acid elongation stopper that is annealed to the template nucleic acid downstream of the elongation product,

providing an adaptor nucleic acid that comprises an identification sequence on its 5' end, wherein said identification sequence does not hybridize to the elongation stopper or to the template,

ligating the adaptor nucleic acid at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment.

2. The method of claim 1, wherein in case the elongation product reaches the 5' end of the template nucleic acid, a nucleotide polymerase is allowed to add untemplated nucleotides to the elongation product, preferably by a terminal transferase activity of the polymerase, and/or preferably wherein 1 to 15 untemplated nucleotides are added in at least 70% of the extension products.

3. The method of claim 1 or 2, wherein a plurality of adaptor nucleic acids is provided and used in the ligation step, wherein said adaptors of the plurality have different identification sequences, preferably wherein at least 10, more preferred at least 50, adaptor nucleic acids with different identification sequences are provided and used in the ligation step.

4. The method of any one of claims 1 to 3, wherein the identification sequence is a random sequence.

5. The method of any one of claims 1 to 4, wherein the elongation stopper has primer activity and is also elongated during the elongating step, preferably wherein at least 9, more preferred at least 49, elongation stoppers are used that have different annealing sequences for annealing to the template and thereby potentially anneal to different locations on the template nucleic acid.

6. The method of claim 5, wherein the annealing sequence is a random sequence.

7. The method of any one of claims 1 to 6, wherein the adaptor nucleic acid(s) is/are bound to, hybridized to or is/are not bound or is/are not hybridized to the elongation stopper(s), preferably when the adaptor nucleic acids are bound to or hybridized to the elongation stoppers then the identification sequence is independent of an annealing sequence of the elongation stopper for annealing the elongation stopper to the template.

8. The method of any one of claims 1 to 7, wherein the template is RNA, preferably wherein a reverse transcriptase is used for elongation.

9. The method of any one of claims 1 to 8, wherein the oligonucleotide primer and preferably also the elongation stopper comprises a universal amplification sequence and/or wherein the adaptor nucleic acid comprises a universal adaptor amplification sequence.

10. The method of any one of claims 1 to 9, wherein the oligonucleotide primer comprises an annealing sequence for annealing to the template, which comprises an oligo(T) sequence for annealing to an oligo(A) sequence in the template, preferably wherein said oligo(T) sequence comprises one or more 3' anchoring nucleotides different from the oligo(T) sequence.

11. The method of any one of claims 1 to 10, wherein the ligation reaction is in the presence of a crowding agent, preferably a polymer or polymer comprising compound, like a polyalkyl glycol, preferably PEG, Octoxinol or Triton X, or a polysorbate, preferably Tween; and/or wherein the elongation stopper and preferably also the oligonucleotide primer comprise(s) one or more modified nucleotide(s) that increase the melting temperature in an annealing sequence for annealing to the template.

12. The method of any one of claims 1 to 11, wherein at least one, preferably at least 9, elongation stopper has primer activity and is also elongated during the elongating step and at least two, preferably at least 10, adaptor nucleic acids that comprise different identification sequences are used, whereby at least two, preferably at least 10, different labelled amplification fragments are generated, optionally amplifying the labelled amplification fragments, further comprising assembling the sequences of amplification fragments which are unique, wherein the labels are used to identify unique amplification fragments.

13. A kit for performing a method of any one of claims 1 to 12, comprising

at least one oligonucleotide primer capable of hybridizing to a template nucleic acid and priming an elongation reaction on its 3' end,

one or more elongation stoppers capable of hybridizing to a template nucleic acid, preferably capable of priming an elongation reaction on its 3' end,

one or more adaptor nucleic acids that comprise an identification sequence on its 5' end, wherein said identification sequence does not hybridize to the elongation stopper, preferably wherein the adaptor nucleic acid is bound to, hybridized to or is not bound or hybridized to an elongation stopper,

a reverse transcriptase, and an oligonucleotide ligase.

14. The kit of claim 13 comprising at least 10, more preferred at least 50, adaptor nucleic acids with different identification sequences.

15. The kit of claim 13 or 14, wherein at least one oligonucleotide primer comprises an annealing sequence for annealing to the template, which comprises an oligo(T) sequence for annealing to an oligo(A) sequence in the template, preferably wherein said oligo(T) sequence comprises one or more 3' anchoring nucleotides different from the oligo(T) sequence.

Description:

Nucleic acid amplification and identification method

The present invention relates to the field of nucleic acid analysis and amplification.

Background

US 2010/0273219 A1 describes a method for multi-primer amplification for barcoding target nucleic acids.

WO 2012/134884 A1 describes barcoding template nucleic acids in a multiplex amplification reaction.

WO 2013/038010 A2 describes a method for generating an amplified nucleic acid part of a template nucleic acid using oligonucleotide primers and stoppers to prevent strand displacement and read-through by a polymerase that is used for generating the nucleic acid parts for sequencing. This method will remove biases during nucleic acid amplification.

WO 2014/071361 A1 describes a method for making dual barcoded nucleic acids using barcoded adaptor nucleic acids.

US 2014/0274729 A1 describes a method for generating cDNA libraries using DNA polymerases with strand displacement activity.

EP 3 119 886 B1 describes a quantitative method to generate nucleic acid products from template RNA.

US 2018/0163201 A1 relates to a reverse transcription method wherein a C-tail is added to the 3' end of the cDNA strand.

WO 2016/138500 A1 describes a method for barcoding nucleic acids for sequencing. Stochastic, i.e. random, barcodes are used as molecular labels.

Molecular labels, or unique molecular identifiers (UMIs), also called molecular barcodes, have been developed to identify PCR duplicates for reducing sequence specific PCR biases and for detecting rare mutations. Attaching unique molecular identifiers to RNA molecules before any PCR amplification during sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of a subsequent PCR amplification bias, which is particularly important where many PCR cycles are required, for example, when generating sequencing libraries from low template input amounts as in single cell studies. After PCR, molecules sharing the same sequence and also the same UMI are assumed to be identical copies derived from the same input molecule (Sena et al., Scientific Reports (2018) 8:13121).
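
The duplicate-collapsing rule just described (same sequence and same UMI means same input molecule) can be illustrated with a minimal Python sketch. The function name, the (UMI, sequence) tuple layout and the example reads are assumptions made for illustration only; they are not taken from the cited reference, and real pipelines usually also tolerate sequencing errors in the UMI, which this sketch does not.

```python
from collections import Counter

def collapse_pcr_duplicates(reads):
    """Count input molecules per sequence by collapsing PCR duplicates.

    `reads` is an iterable of (umi, sequence) string tuples (hypothetical layout).
    Reads sharing both UMI and sequence are treated as copies of one molecule.
    Returns a Counter mapping each sequence to its number of distinct UMIs.
    """
    unique_molecules = set(reads)  # identical (umi, sequence) pairs collapse to one
    return Counter(seq for _umi, seq in unique_molecules)

reads = [
    ("ACGT", "GATTACA"),
    ("ACGT", "GATTACA"),  # PCR duplicate of the first read
    ("TTAG", "GATTACA"),  # same sequence, different UMI: a second input molecule
    ("ACGT", "CCCGGG"),
]
molecule_counts = collapse_pcr_duplicates(reads)
```

In this toy input, the four reads collapse to three molecules: two for the first sequence and one for the second.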

Summary of the invention

A goal of the invention is to provide an improved method of generating sequence fragments of a template nucleic acid which eases the allocation and assembly of said sequence fragments to a joined sequence that corresponds to the sequence of the template nucleic acid. A desired improvement would also reduce sequence bias during fragment generation and increase coverage of sequence fragments over the whole length of the template to increase confidence in the generated joined sequence.

Accordingly, the invention provides a method for generating labelled amplification fragments of a nucleic acid template comprising the steps of providing said template nucleic acid, annealing at least one oligonucleotide primer to said template nucleic acid, elongating the at least one oligonucleotide primer in a template specific manner thereby creating an elongation product, wherein said elongating reaction stops when the elongation product reaches the 5' end of the template nucleic acid or a nucleic acid elongation stopper that is annealed to the template nucleic acid downstream of the elongation product, providing an adaptor nucleic acid that comprises an identification sequence on its 5' end, wherein said identification sequence does not hybridize to the elongation stopper when in contact thereto and preferably also not to the template, ligating the adaptor nucleic acid at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment.

The invention also provides a method for generating labelled amplification fragments of a nucleic acid template comprising the steps of providing said template nucleic acid, annealing at least one oligonucleotide primer to said template nucleic acid, elongating the at least one oligonucleotide primer in a template specific manner thereby creating an elongation product, providing an adaptor nucleic acid that comprises an identification sequence, wherein said identification sequence does not hybridize to the template, ligating the adaptor nucleic acid preferably at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment.

The invention further provides a kit suitable for performing the method. A kit of the invention may comprise at least one oligonucleotide primer capable of hybridizing to a template nucleic acid and priming an elongation reaction on its 3' end, one or more elongation stoppers capable of hybridizing to a template nucleic acid, preferably capable of priming an elongation reaction on its 3' end, one or more adaptor nucleic acids that comprise an identification sequence on its 5' end, wherein said identification sequence does not hybridize to the elongation stopper, preferably wherein the adaptor nucleic acid is bound to, hybridized to or is not bound to an elongation stopper, a reverse transcriptase, and an oligonucleotide ligase. The different components of the kit may be provided in different containers, such as vials.

The following detailed disclosure reads on all aspects, including methods and kits, and embodiments of the present invention. I.e. descriptions of methods may describe a suitability of the kit. Any components described in the methods may be part of the kits. Components of the kit may be used in the inventive methods.

Detailed description of the invention

The present invention provides a method for generating labelled amplification fragments of a nucleic acid template wherein an identification sequence is introduced as label before amplifying these fragments. A template nucleic acid can be present in multiple copies. According to the invention, fragmentation is usually a process that occurs during amplification, i.e. from a template of a given length, one or more (usually more) fragments are generated during amplification of parts of the template. The sequences of generated fragments may overlap when copies of the template generate fragments at the same time and the primers for synthesizing these complementary nucleic acid fragments anneal at different locations on different template copies. Although the inventive concepts work for a single fragment per template, preferably many fragments are generated from one template molecule, usually by using multiple primers that bind at different locations to the template.

The invention improves prior methods by binding an identification sequence to a generated fragment. Identification sequences can be introduced with the primer or after elongation, i.e. after the synthesis of the complementary nucleic acid fragment. In the latter case, the identification sequence is introduced by ligation of the elongation product with an adaptor nucleic acid. Surprisingly, the ligation reaction occurs with single stranded identification sequences, i.e. the parts of the identification sequences that have a non-hybridized (or "free") 5' end can ligate to the elongated product's 3' end. The ligation reaction usually involves a phosphate residue that is preferably provided on the 5' end of the identification sequence. Surprisingly, no vicinity of the adaptor nucleic acid to the 3' end of the elongation product that depends on the template or stopper sequence and is supported by hybridization is needed (as shown in the examples). Such a vicinity can be supported by providing the adaptor nucleic acid with a complementary sequence part (downstream, i.e. 3' direction, of the identification sequence) for hybridization with an oligonucleotide that is bound to the template (also referred to herein as elongation stoppers or just stoppers, which may also be further primers in case more than one fragment per template is generated). However, a directed vicinity is not needed; vicinity can be the result of an undirected simple diffusion process. In particular, it has been shown that the adaptor nucleic acid can be ligated to an elongation product that has reached the 5' end of the template nucleic acid, and where no further downstream elongation stopper is present. Such a ligation reaction can occur to this end of the elongation product directly or after a polymerase has added one or more untemplated nucleotides based on the terminal transferase activity that some polymerases possess. This ligation to the elongation product that corresponds to the 5' end of the template has some surprising and beneficial advantages: It increases the occurrence of fragments at the 5' end of the template and therefore the sequence coverage increases fundamentally, which prior art methods lacked. In previous methods, the fragment start site distribution is constant, which leads to a high coverage distribution by fragments in the middle of templates with much lower coverage, approaching zero, at its 3' and 5' ends (which is a result of the number of template copies, the average fragment size, and the sequencing read length). This effect on the 5' end is mitigated by the inventive method. Furthermore, the invention also provides embodiments to increase coverage on the 3' end of the template too.

The amplification fragments (generated as one fragment molecule per elongation reaction) are usually further amplified, i.e. copied. This means that the ligated identification sequence is amplified, hence copied, as well. Usually, the identification sequences are so manifold that a random selection process is capable of uniquely identifying single fragments which carry the same sequence but result from different copies of one template. In all embodiments of the invention, the identification sequence helps to determine if fragment copies after sequencing are coming from different copies of the template because they have different identification sequences or if they are coming from the same template molecule and are just copies made during said further amplification.
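
The end-coverage effect described above for a constant fragment start site distribution can be illustrated with a small Python sketch. It is a deliberately simplified model under stated assumptions: exactly one fragment of a fixed length starts at every possible position of the template, and read length and template copy number are ignored. The function name and the numbers are hypothetical and do not come from the examples of this document.

```python
def coverage_uniform_starts(template_len, fragment_len):
    """Per-position coverage when one fixed-length fragment starts at every
    possible site of the template (simplifying assumption)."""
    coverage = [0] * template_len
    for start in range(template_len - fragment_len + 1):
        for pos in range(start, start + fragment_len):
            coverage[pos] += 1
    return coverage

# Hypothetical toy numbers: 20 nt template, 5 nt fragments.
cov = coverage_uniform_starts(template_len=20, fragment_len=5)
```

In this toy model the first and last positions are covered by a single fragment each, while positions in the middle are covered by as many fragments as the fragment length, which is the kind of drop towards the ends that the text attributes to previous methods.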

A further method provides for generating labelled amplification fragments of a nucleic acid template comprising the steps of providing said template nucleic acid, annealing at least one oligonucleotide primer to said template nucleic acid, elongating the at least one oligonucleotide primer in a template specific manner thereby creating an elongation product, providing an adaptor nucleic acid that comprises an identification sequence, wherein said identification sequence does not hybridize to the template, ligating the adaptor nucleic acid preferably at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment. This method is essentially the same as above and all preferred embodiments described herein apply as well, save that a stopper is not used. Multiple primers, possibly without stopper function, can be used. Adaptor nucleic acids can still be ligated to the elongation products after a diffusion process. For ligation, the elongation products can still be hybridized to the template or be present as single strands. However, preferably stoppers are used.

The inventive method starts with the step of providing said template nucleic acids. The template molecule is made accessible to a skilled practitioner for use in the inventive method. Usually the template is provided in a sample of nucleic acid molecules. Such template nucleic acids may be isolated from a cell, such as eukaryotic or prokaryotic cells. In particularly preferred embodiments, the template is RNA. Total RNA or a fraction of RNA, such as mRNA or rRNA-depleted RNA of a cell, can be provided. RNA amounts that are easy to handle are e.g. 0.1 pg to 500 ng, 1 pg to 200 ng, 10 pg to 100 ng, or 0.1 ng to 100 ng rRNA-depleted RNA or 0.1 ng to 1000 ng total RNA. In some embodiments, the amount of total RNA can be e.g. 10 pg, and the amount of non-rRNA RNA can be below 1 pg. Primers, stoppers and adaptors are preferably DNA.

The method further comprises annealing at least one oligonucleotide primer to said template nucleic acid. An oligonucleotide primer is an oligonucleotide molecule, preferably DNA, that anneals to the template and is capable of priming an elongation reaction, as is standard practice in the art. The oligonucleotide primer (or simply "primer") preferably anneals to the template in at least one part of its length of e.g. 4 nucleotides to 30 nucleotides (nt) in length. Annealing is by hybridization. The primer may have a part that does not anneal to the template. Such further parts may be used to anneal to other oligonucleotides and/or be used for the further amplification mentioned above when amplification fragments are further amplified to produce copies thereof. Such further parts or portions may thus have a sequence to which other primers bind for this amplification/copying reaction. Such a part is also referred to as primer linker sequence. A primer linker sequence preferably has 4 nt to 30 nt in length.

Returning to the main inventive method, the at least one oligonucleotide primer is elongated in a template specific manner thereby creating an elongation product (complementary sequence). Such reactions are standard in the art and usually make use of a polymerase. If the template is RNA, then an RNA-dependent polymerase is used, such as a reverse transcriptase. If the template is DNA, then a DNA-dependent polymerase is used. The elongating reaction stops when it reaches a nucleic acid elongation stopper that is annealed to the template nucleic acid downstream of the elongation product or when the elongation product reaches the 5' end of the template nucleic acid. Obviously, when the elongation reaction reaches the 5' end of the template and thus runs out of template, it stops. Some polymerases may add one or more non-templated nucleotides at this point to the elongation product, which is acceptable or may even be beneficial when selecting for 5' coverage product in the sequence analysis of the produced labelled amplified fragments. This addition of non-templated nucleotides is however not necessary. Elongation reactions also stop when the elongation reaction reaches a nucleic acid elongation stopper that is annealed to the template nucleic acid downstream of the elongation product. Such a stopped reaction is described at length in WO 2013/038010 A2 (incorporated herein by reference). In this WO-document, the elongation stopper is referred to as "oligonucleotide stopper" or "further oligonucleotide primer". According to the present invention one term is used, i.e. nucleic acid elongation stopper or just "elongation stopper" or just "stopper". This inventive stopper can also be a primer and then corresponds to the "further oligonucleotide primer" of WO 2013/038010 A2. In essence, such a stopper stops the elongation reaction of an upstream elongation reaction (hence, the stopper is downstream of the elongation product) by presenting an obstacle on the template. The stopper is annealed or hybridized to the template and the elongation reaction does not displace the stopper and thus aborts. Read-through, i.e. displacement of a stopper, would be a side reaction. Measures to prevent displacement of the stopper are described at length in WO 2013/038010 A2 and these can be used according to the invention. Briefly, preferred methods and means to prevent displacement of the stopper (due to strand displacement activity) are using an elongation stopper that comprises one or more modified nucleotides that increase the melting temperature in an annealing sequence for annealing to the template (a part of the stopper that anneals/hybridizes to the template). The increase in melting temperature is relative to an unmodified, natural nucleic acid, such as DNA or RNA. Such modifications are e.g. LNA (locked nucleic acid), ZNA (zip nucleic acids), 2'-fluoro nucleosides / 2'-fluoro nucleotides or PNA (peptidic or peptide nucleic acid). Other measures are using a polymerase that does not have strand displacement activity or using intercalators. Preferably 1, 2, 3, 4, 5 or 6 nucleotides are modified. Preferably the modified nucleic acids are on the 5' side of the sequence part of the stopper that hybridizes to the template. There may be further parts of the stopper in 5' direction that do not hybridize - such as amplification sequences that act the same as described for the oligonucleotide primer described above for amplification/copying in a further amplification reaction ("primer linker sequence") - in fact, such a further part is preferred for binding/hybridizing to the adaptor nucleic acid - see below. The adaptor may bind/hybridize to the "primer linker sequence" or to another part of the oligonucleotide stopper. In preferred embodiments, the elongation stopper and preferably also the oligonucleotide primer comprise(s) one or more modified nucleotide(s) that increase the melting temperature in an annealing sequence (linker) for annealing to the template.

Preferably, after the elongation reaction, the primers and stoppers that are not bound to the template are removed in a purification step. I.e. the elongation products hybridized to the template are purified and retained for further processing. Other embodiments of the invention are done in a single volume without purification. Such a purification can be done by methods known in the art, e.g. immobilisation of the template or elongation products to a solid phase (e.g. beads) and washing to remove any unbound primers and stoppers. An example method is solid phase reversible immobilization (SPRI; DeAngelis et al., Nucleic Acids Research, 1995, 23(22): 4742-4743).

The inventive method comprises the step of providing an adaptor nucleic acid that comprises an identification sequence on its 5' end. Further sequence tags, such as sequences for amplification (amplification sequences), may also be part of the adaptor nucleic acid. The 5' end is the end that is intended for ligation to the 3' end of the elongation product for the latter's labelling by the identification sequence. The identification sequence shall not hybridize to the elongation stopper nor to the template. Thus, it is usually single stranded and not hybridized. Herein the term "identification sequence" is used for the 5' terminal part of the adaptor nucleic acid that is not hybridizing or annealing - even if only parts of the identification sequence would later be used for identification. Other parts of the adaptor nucleic acid may form a hybrid with, or anneal to, the elongation stopper. The adaptor nucleic acid may also comprise a complementary primer sequence, which is the target for a further amplification reaction of the labelled amplification fragments as mentioned above (called adaptor linker sequence). The identification sequence can be prevented from hybridization to the elongation stopper or to the template by selecting a sequence for the identification sequence that has no complement on the elongation stopper. It is also possible to select the identification sequence so that it has no complement on the template. This can be easily done if the sequence of the template is known. If it is unknown but from a biological source, then the identification sequence can be selected from sequences that do not or rarely occur in biological nucleic acids. Such sequences are known from "spike-in" nucleic acids, such as ERCC (External RNA Control Consortium) sequences or SIRV (Spike-In RNA Variants) sequences (see e.g. ERCC, BMC Genomics 2005 6: 150; Jiang et al., Genome Res. 2011, 21(9): 1543-1551; WO 2016/005524 A1, all incorporated herein by reference). If the identification sequence would anneal to the template in a side reaction, then this situation would usually prevent ligation in the next step and thus not lead to a labelled fragment and is thus not seen as result. Such side reactions can be tolerated but are not preferred. The easiest and most preferred method to prevent annealing of the identification sequence (and preferably the entire adaptor nucleic acid) to the template is by simply providing the adaptor nucleic acid after the elongation reaction. After the elongation reaction, the template is in form of a double strand with the elongation products (and the primer and stoppers). In this form the adaptor nucleic acid cannot bind to the template anymore since the template is already covered by hybridization partners. In this preferred method, the identification sequence may even have a sequence that is a complement to the template and may be capable of hybridizing to the template but is hindered from doing so by the succession of method steps. So, no consideration of template sequences is needed in this embodiment.
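
Selecting an identification sequence that "has no complement" on the stopper or on a known template can be sketched as a simple screening step. The Python sketch below is only an illustration under stated assumptions: all sequences are DNA written 5' to 3', only perfect base pairing is considered, and the minimum stretch length `min_len` is an arbitrary, hypothetical threshold. Mismatch-tolerant hybridization and thermodynamics are not modelled, and the example sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence given 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def has_complement(identification_seq, target, min_len=6):
    """True if any stretch of at least `min_len` nt of the identification
    sequence could base-pair perfectly with `target` (both written 5' to 3').
    `min_len` is an illustrative assumption, not a value from the text."""
    for i in range(len(identification_seq) - min_len + 1):
        window = identification_seq[i:i + min_len]
        if reverse_complement(window) in target:
            return True
    return False

stopper = "GGATCCTTAGCA"   # hypothetical stopper sequence
candidate_a = "TGCTAAGG"   # pairs with the 3' part of the stopper
candidate_b = "AAAAAAAA"   # no complementary stretch on the stopper
a_rejected = has_complement(candidate_a, stopper)
b_rejected = has_complement(candidate_b, stopper)
```

Under these assumptions, the first candidate would be rejected and the second kept. The same check could be run against a known template sequence.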

The most preferred option to prevent annealing of the identification sequence to the stopper is that parts of the stopper and parts of the adaptor carry sequences complementary to each other. When the adaptor approaches the stopper, these complementary sequences hybridize first and the identification sequence remains single stranded.

The inventive method further comprises ligating the adaptor nucleic acid at its 5' end to the 3' end of the elongation product, thereby generating a labelled amplification fragment. Ligation is usually performed using a ligase enzyme. The type of ligase depends on the nature of the oligonucleotides to be ligated and can be selected by a skilled practitioner. Example ligases include a DNA ligase or an RNA ligase. The ligase may in particular be an RNA ligase that has DNA ligating activity, such as T4 RNA ligase 2. Further ligases are T4 DNA ligase, T4 RNA ligase 1, DNA ligase I, DNA ligase III, DNA ligase IV, E. coli DNA ligase, ampligase DNA ligase, truncated Rnl2, Rnl2 truncated K227Q, Thermus scotoductus ligase, Methanobacterium thermoautotrophicum RNA ligase, thermostable App-ligase (NEB), Chlorella virus DNA ligase or SplintR ligase. The ligase may be a single strand ligase or a double strand ligase. Also possible are combinations of ligases for different reactions in one reaction volume to be performed in parallel, e.g. when different elongation products and/or adaptor nucleic acid molecules are present and shall be ligated at the same time. Preferred combinations are DNA ligase and RNA ligase or a single strand ligase and a double strand ligase. The ligase reaction usually involves a phosphate residue that is preferably provided on the 5' end of the identification sequence of the adaptor nucleic acid. Other 5' moieties can also be used for ligation, e.g. ligation of adenylated ends. Such ends can be ligated with truncated ligases or App-ligases.

The generated labelled amplification fragments will have the following structure from 5' to 3' after the ligation: primer sequence - elongation product sequence - adaptor sequence, with the identification sequence bordering the elongation product sequence. The primer sequence may have a "primer linker sequence" and/or the adaptor sequence may have an "adaptor linker sequence". The products of the inventive method, i.e. the generated labelled amplification fragments, are preferably further amplified. Such a further amplification produces copies of the generated labelled amplification fragments by methods known in the art, such as PCR (polymerase chain reaction) or linear amplification. Such a further amplification usually involves the use of further primers that bind to the labelled amplification fragments, preferably on linker sequences, especially linker sequences located on the fragments' ends, i.e. within the parts of the primer sequence and the adaptor sequence, particularly preferred on the 5' end of the primer sequence and the 3' end of the adaptor sequence. As mentioned above with regard to these primers and adaptors, they may have regions of known sequence to bind such primers of the further amplification ("primer linker sequence" and "adaptor linker sequence"). These regions (or "parts") may be long and specific enough not to bind to the template; they may be universal primer binding sites, i.e. not selective between different adaptors/primers - contrary to the identification sequence, which is preferably unique.

The identification sequence provides a unique label for an amplification fragment and is therefore also referred to as unique molecular identifier (UMI) herein. The identification sequences can identify replicates of the further amplification (e.g. PCR) and reduce the effects of sequence dependent amplification bias. In preferred embodiments, the identification sequences are oligonucleotides with, mostly, random nucleotide distribution at each position which are ligated to extension products (fragments) prior to further amplification. If identification sequences are evenly distributed and their number is considerably larger than the number of identical extension products, then it is unlikely that the same identification sequence is ligated to two identical extension products (different copies). In this case, the number of distinct identification sequences after further amplification is the same as the number before further amplification. Identification sequences of the invention can also be used as described for UMIs in Sena et al. (Scientific Reports (2018) 8:13121). The entire sequence or parts of the entire sequence of the labelled fragment may be considered as a "read" in next generation sequencing methods and further sequence analysis. One or more reads are assembled during data analysis to obtain a joined sequence of the template. Subsequently, data analysis may also become a quantitative analysis of template molecules and fragments, which may provide insights into whether particular template copies are over- or underrepresented, which e.g. hints at different expression rates of RNA splice variants. In preferred embodiments, the present invention further comprises the step of assembling the sequences of amplification fragments which are unique, wherein the labels are used to identify unique amplification fragments. Different identification sequences in the amplified labelled amplification fragments identify unique amplification fragments. The identification sequences enable duplicate and replicate identification and removal in the assembly or any other data analysis step.
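The duplicate removal described above can be illustrated with a minimal sketch. It assumes that each read has already been reduced to a mapping position and an identification sequence; the function name, the tuple layout and the example values are hypothetical and not part of the described method.

```python
from collections import defaultdict

def count_unique_fragments(reads):
    """Collapse reads that share mapping position and identification sequence.

    `reads` is an iterable of (position, umi) tuples; both fields are
    hypothetical stand-ins for what an aligner and a UMI extractor report.
    Returns {position: number of distinct UMIs observed at that position}.
    """
    umis_by_position = defaultdict(set)
    for position, umi in reads:
        umis_by_position[position].add(umi)
    return {pos: len(umis) for pos, umis in umis_by_position.items()}

reads = [
    (100, "ACGTAC"),  # original fragment
    (100, "ACGTAC"),  # amplification duplicate: same position, same UMI
    (100, "TTGACA"),  # second template copy: same position, different UMI
    (250, "ACGTAC"),  # different position, counted separately
]
unique_counts = count_unique_fragments(reads)
print(unique_counts)
```

Counting distinct identification sequences per position, rather than reads, is what keeps the estimate independent of how strongly each fragment was amplified.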

In preferred embodiments, the identification sequence is 3 nt (nucleotides) or more in length, preferably 3 nt to 20 nt, especially preferred 4 nt to 15 nt or 5 nt to 10 nt, such as 3 nt, 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt or more in length. Such lengths are sufficiently small for easy handling and efficient ligation reactions but still provide a sufficiently large number of different identification sequences, due to the permutations of their nucleotides, to provide the desired identification of single amplified fragments, preferably to provide unique labels thereto.

In preferred embodiments, in case the elongation product reaches the 5' end of the template nucleic acid, a nucleotide polymerase is allowed to add non-templated nucleotides to the elongation product, preferably by a terminal transferase activity of the polymerase, and/or preferably wherein 1 to 15 untemplated nucleotides are added in at least 70% of the extension products. As said above, such non-templated nucleotide addition is a property of some polymerases (see Chen et al. Biotechniques 2001, 30(3): 574-582). This activity is most prominent in reverse transcriptases, such as M-MLV (Moloney murine leukemia virus) reverse transcriptase or AMV (avian myeloblastosis virus) reverse transcriptase. These non-templated nucleotides are usually of any nucleotide type (A, T (U), G, C) and may appear random. This means that elongation products of 5' ends of different templates may share the same sequence corresponding to the 5' end but then may continue by different, seemingly random further nucleotides that are the product of such non-templated addition. These different additions may be used to identify the exact position of the 5' end of the template sequence at the transition between the templated repeating sequence and the non-templated random additions. After the non-templated nucleotides, the labelled fragment continues with the identification sequence, which may be used as described above. In case the identification sequence is (also) random, the non-templated random nucleotides may be treated like a part of the identification sequence. The position of the identification sequence relative to the constant part of the adaptor sequence identifies unambiguously the identification sequence.
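How the constant part of the adaptor locates the identification sequence can be sketched as follows. The sketch assumes a read oriented 5' to 3' as primer - elongation product - identification sequence - adaptor, a known identification sequence length, and the constant adaptor start AGATCGGAAGAGC taken from the example oligos of Example 1; the function name and the example read are hypothetical.

```python
def extract_umi(read, adaptor_constant, umi_length):
    """Split a read into (insert, umi) using the constant adaptor part.

    The identification sequence is taken as the `umi_length` nucleotides
    immediately 5' of the first occurrence of `adaptor_constant`.
    Returns None if the adaptor is absent or too close to the read start.
    Any non-templated nucleotides remain at the 3' end of the insert.
    """
    idx = read.find(adaptor_constant)
    if idx < umi_length:  # also covers idx == -1 (adaptor not found)
        return None
    return read[:idx - umi_length], read[idx - umi_length:idx]

ADAPTOR_CONSTANT = "AGATCGGAAGAGC"  # start of the constant part in the example adaptors
example_read = "GGCATTCA" + "ACGTAC" + ADAPTOR_CONSTANT  # insert + 6 nt UMI + adaptor
parsed = extract_umi(example_read, ADAPTOR_CONSTANT, 6)
print(parsed)
```

A real pipeline would additionally have to tolerate sequencing errors in the constant part; exact matching is used here only to keep the sketch short.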

In particularly preferred embodiments, a plurality of adaptor nucleic acids is provided and used in the ligation step. These adaptors of the plurality may have different identification sequences. This allows unique identification of the adaptors and the generated fragments to which they are ligated.

Preferably at least 10, more preferred at least 50, or even 100 or more or 200 or more, adaptor nucleic acids with different identification sequences are provided and used in the ligation step. In particular preferred, as many adaptors with different identification sequences are used as different generated fragments with the same sequence are expected - or preferably more adaptors with different identification sequences. The expectation of the number of template copies can be based on the sample type, e.g. whole cell RNA, whole cell mRNA (transcriptome), amount of the RNA, and the complexity of the sample (how many different transcript variants are targeted, which can be either the whole transcriptome or just selected genes or transcripts as is the case in gene panels), etc.

In particular preferred, the identification sequence is a random sequence. "Random sequence" is to be understood as a mixture of different sequences with a high variance due to a random synthesis of at least a part of the identification sequence. Random sequences potentially cover the entire combinatory area for said sequence for 4 naturally occurring nucleotides (A, T (U), G, C). The random sequence may cover 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides which are randomly selected from A, G, C or T (U). In terms of hybridizing capability of sequences of nucleotides, T and U are used interchangeably herein. The full combinatory possible area for a random sequence portion is mⁿ, wherein m is the number of nucleotide types used (preferably all four of A, G, C, T (U)) and n is the number of the random nucleotides. Therefore, a random hexamer, wherein each possible sequence is represented, consists of 4⁶ = 4096 different sequences. The identification sequence shall not bind to the template. In all cases, but especially for random identification sequences, it is preferred to add the adaptor nucleic acid(s) after the elongation reaction. When the elongation product has reached the stopper (or end of the template) and essentially the entire template is then in form of a double strand with the elongation products, then the adaptor nucleic acid(s) are prevented from binding to the template.
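The size of the combinatory area, and why it should considerably exceed the number of identical elongation products, can be made concrete with a short calculation. The sketch assumes that identification sequences are drawn uniformly and independently, which real oligonucleotide synthesis and ligation only approximate; the function names are illustrative.

```python
def umi_space(n, m=4):
    """Number of possible random sequences of length n over m nucleotide types."""
    return m ** n

def prob_all_distinct(copies, n, m=4):
    """Probability that `copies` identical fragments all receive different
    identification sequences, assuming uniform and independent ligation."""
    space = umi_space(n, m)
    if copies > space:
        return 0.0
    p = 1.0
    for i in range(copies):
        p *= (space - i) / space
    return p

print(umi_space(6))  # 4096, the random hexamer case from the text
for copies in (10, 100, 1000):
    print(copies, round(prob_all_distinct(copies, 6), 3))
```

Under these assumptions the chance of a repeated label grows quickly with the number of identical copies, which is consistent with the preference for more identification sequences than expected fragments of the same sequence.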

In further embodiments of the invention, the primers and stoppers are selected to bind to one or more particular target sequences of interest in a template nucleic acid (with the stopper being downstream for an elongation product) so that an elongation sequence of a particular template part is obtained. Such targeting of specific regions is preferably used for transcripts (RNA) or genes (gDNA) as templates. Identification sequences are especially helpful when used in gene panels, for example for the analysis of sequence variants of different species of templates, such as splice variants or other varying template sequences.

In especially preferred embodiments of the invention for all its embodiments and aspects, the elongation stopper has primer activity and is also elongated during the elongating step. This means that more than one primer is used and most have stopper function (i.e. prevent displacement - see above). Using several primers means that a template yields many generated fragments, i.e. coverage is improved. Although primers bind to one template each, they will provide comprehensive coverage when different primers bind to different locations on the template. The inventive method using a plurality of primers (that preferably also are stoppers) will increase coverage since a new extension product will start at a position on the template where an upstream elongation product has just stopped. This yields many fragments that cover the entire template. Further, it also means that stoppers/primers (in this embodiment used synonymously) are used that bind to different parts of a template molecule. In general, binding to the template molecule is determined by the annealing sequences of the primers and stoppers. This sequence hybridizes with the template and may be varied to bind to different locations on the template. Preferably at least 9, at least 10, more preferred at least 49, at least 50, e.g. 100 or more or 200 or more, elongation stoppers are used that have different annealing sequences for annealing to the template. Thereby they will potentially anneal to different locations on the template nucleic acid. Preferably the annealing sequence is a random sequence. Random sequences are described above with regard to the identification sequence and the same applies to the annealing sequence of the primer, stopper and stoppers with primer function as well. Preferably, the random sequence of the annealing sequence may cover 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more nucleotides which are randomly selected from A, G, C or T (U).

Preferably the adaptor nucleic acid(s) is/are bound to, hybridized to, or is/are not bound to the elongation stopper(s). Such a binding reaction, e.g. by chemical reaction, complex formation or hybridization, facilitates positioning of the adaptor nucleic acid near the 3' end of the upstream elongation product, to which its identification sequence is ligated; the identification sequence itself is not hybridized to the stopper or template, which surprisingly is not required for the ligation reaction to work. Preferably when the adaptor nucleic acids are bound to or hybridized to the elongation stoppers, then the identification sequence is selected independent of an annealing sequence of the elongation stopper for annealing the elongation stopper to the template. Both the annealing sequence and the identification sequence can be random sequences, preferably independently selected from each other.

This is usually guaranteed when the nucleic acid parts of the stopper and of the adaptor are universal sequences, i.e. any adaptor can bind to any stopper (which is preferred for all embodiments of the invention), and further when an adaptor nucleic acid is not provided bound to the stopper, e.g. when the adaptor is provided only after the elongation reaction. In other embodiments or other parts of the reaction, they are not bound, such as when the elongation reaction reaches the 5' end of the template, where usually no stopper is hybridized because the stopper needs at least a minimal annealing sequence on the template, which moves the most downstream stopping position several nucleotides upstream from the 5' end. The adaptor is also able to be ligated to the elongation product without binding or hybridization to the elongation stopper. However, it is preferred in all embodiments that when the adaptor nucleic acid is ligated to the elongation product, said elongation stopper and/or the elongation product, in particular preferred its 3' end, is still hybridized to the template. It is also preferred that the adaptor nucleic acids are hybridized to the elongation stopper, especially preferred after the elongation reaction and/or - in particular preferred - for ligation.

In preferred embodiments of the inventive method and kit, the oligonucleotide primer - and preferably but not necessarily also the elongation stopper - comprises a universal amplification sequence ("primer linker sequence", see above) and/or the adaptor nucleic acid comprises a universal adaptor amplification sequence ("adaptor linker sequence", see above). Such an amplification sequence or "linker" can be used to bind primers for a further amplification as already mentioned above. A universal sequence means that it is the same for all primers, stoppers or adaptors, respectively. This allows binding of the same primer type to these oligonucleotides. In especially preferred embodiments, the universal amplification sequence (linker sequence) is also the same for the primers, stoppers and adaptors, i.e. a further amplification primer may bind to oligonucleotide primers, elongation stoppers and adaptor nucleic acids likewise. This facilitates easy handling since only one type of primer is necessary for further amplification. In other embodiments, the primers, stoppers and adaptors have different universal amplification sequences (linker sequences), i.e. a further amplification primer may only bind to oligonucleotide primers, another further amplification primer may only bind to elongation stoppers and a further amplification primer may only bind to adaptor nucleic acids. Within these groups, the primers are preferably universal. This still allows easy handling but gives better control since primers for both ends of the labelled fragment will be different and can be selected specifically.

In preferred embodiments, a special oligonucleotide primer is used to select for and anneal to a selected sequence of the template, preferably on the 3' end of the template. In case of mRNA, or any other type of RNA that comprises an oligo(A) tail, such a 3' end can be annealed to with a complementary oligonucleotide primer, e.g. one which comprises an oligo(dT) annealing sequence that is complementary to said oligo(A) tail. Preferably at least one oligonucleotide primer comprises an annealing sequence for annealing to a selected sequence of the template, which may be at or near the 3' end of the template. Such a selected sequence is any known sequence of the template, like an oligo(A) tail, but any other sequence when known can be used as well. Preferably, the oligonucleotide primer for the selected sequence comprises an oligo(dT) sequence for annealing to an oligo(A) sequence in the template. Preferably said oligo(dT) sequence comprises one or more 3' anchoring nucleotides different from the oligo(dT) sequence. This allows proper localization and binding to the 5' end of the oligo(A) template sequence. The anchoring nucleotide will anneal to the next non-A (e.g. T, G, C) on the template next to the oligo(A) part. If the next non-A nucleotide is unknown, it is possible to use a mixture of oligonucleotide primers with different anchoring nucleotides, e.g. using three oligonucleotide primers, one with each non-T (e.g. A, G, C) nucleotide (complementary to the next non-A (e.g. T, G, C) on the template). In preferred embodiments, two anchoring nucleotides are used. The anchoring nucleotide next to said non-T nucleotide may be selected from any nucleotide type (A, T (U), G, C) as it is not bordering the oligo(dT). Said special oligonucleotide primer need not be a stopper and need not comprise a sequence for hybridization to an adaptor since these are not needed if the special oligonucleotide primer anneals to or near the 3' end of the template - this means that no upstream elongation product will arrive at its position. Of course, for ease or unity in primer/stopper manufacturing such a sequence and/or stopper function may be present.
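The mixture of anchored oligo(dT) primers described above can be enumerated with a short sketch. It writes primers 5' to 3', uses one non-T anchoring nucleotide (A, C or G) and optionally a second anchoring nucleotide of any type; the oligo(dT) length of 18 and the function name are assumptions made for illustration only.

```python
from itertools import product

def anchored_oligo_dt(t_length, second_anchor=False):
    """Return oligo(dT) primers (5'->3') with one or two 3' anchoring nucleotides.

    The first anchor is non-T so that it pairs with the non-A template
    nucleotide bordering the oligo(A) stretch; the optional second anchor
    may be any nucleotide because it does not border the oligo(dT).
    """
    first_anchors = "ACG"
    second_anchors = "ACGT" if second_anchor else [""]
    return ["T" * t_length + a + b for a, b in product(first_anchors, second_anchors)]

one_anchor = anchored_oligo_dt(18)                      # 3 primers
two_anchors = anchored_oligo_dt(18, second_anchor=True)  # 12 primers
print(len(one_anchor), len(two_anchors))
print(one_anchor[0])
```

In practice such mixtures are often ordered as a single degenerate oligonucleotide rather than as individual primers; the explicit list is shown only to make the composition of the mixture visible.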

Preferably, the ligation reaction is performed in the presence of a crowding agent. A crowding agent increases the likelihood of the adaptor and elongation product interacting with each other by decreasing the effective reactive volume, see Zimmerman et al., Proc Natl Acad Sci U S A. 1983; 80(19): 5852-6. Further crowding agents are e.g. disclosed in US 5,554,730, US 8,017,339 and WO 2013/038010 A2. Preferably, the crowding agent is a macromolecule, polymer or polymer comprising compound, like a polyalkyl glycol, preferably PEG, Octoxinol or Triton X, or a polysorbate, preferably Tween. In preferred embodiments, the crowding agent is used in concentrations of 5% to 35% (v/v), especially preferred 10% to 25% (v/v). Preferably, the crowding agent has a molecular weight of 200 to 35000 g/mol, preferably 1000 to 10000 g/mol. In particular preferred is a polyalkyl glycol, like PEG, especially with said molecular weight. A crowding agent is preferably provided in the inventive kit, preferably in a ligation buffer.

Other ingredients for the kit, in any component, are buffers, salts, enzymatic cofactors and metals, such as Mn²⁺ and Mg²⁺ for polymerases and ligases, solvents, containers.

The present invention provides a kit for performing the inventive method. Such a kit may comprise any of the compounds and means described so far. Preferably the kit comprises

(i) at least one oligonucleotide primer capable of hybridizing to a template nucleic acid and priming an elongation reaction on its 3' end,
(ii) one or more elongation stoppers capable of hybridizing to a template nucleic acid, preferably capable of priming an elongation reaction on its 3' end,
(iii) one or more adaptor nucleic acids that comprise an identification sequence on their 5' end, wherein said identification sequence does not hybridize to the elongation stopper, preferably wherein the adaptor nucleic acid is bound to, hybridized to or is not bound to an elongation stopper,
(iv) a reverse transcriptase, and
(v) an oligonucleotide ligase.

Components (iv) and (v) may be optional since they might be available to many laboratories independent of the present invention. The important parts are the adaptor/stopper designs, in particular the identification sequences on the adaptors. Preferably a plurality of adaptors with different identification sequences are provided in the kit - as described above. All these components of the kit have been described above and any preferred embodiment thereof also applies to the kit. Preferably the kit comprises at least 10, more preferred at least 50, adaptor nucleic acids with different identification sequences. The reasons for such a preferred embodiment have been given above. Preferably the oligonucleotide primer comprises an annealing sequence for annealing to the template, which comprises an oligo(dT) sequence for annealing to an oligo(A) sequence in the template, preferably wherein said oligo(dT) sequence comprises one or more 3' anchoring nucleotides different from the oligo(dT) sequence. The kit may also comprise a solid phase for purification, such as beads, preferably magnetic beads (see method details above, which also read on the kit components' suitability and embodiments).

All preferred embodiments as described above can be combined. Such a method uses a random primer (with a linker sequence) that is also a stopper (also called "Strand Displacement Stop Primer"). After the elongation reaction, preferably a purification of the elongation products (hybridized to the template) is done to remove unbound primers and stoppers. Then adaptors with their linkers and identification sequences are ligated to the elongation product. The identification sequence has a random sequence with a length of preferably between 4 and 12 nt. One preferred option is to use mixtures of identification sequences of different lengths because ligases tend to impose ligation biases by favouring certain 5' located nucleotides in the ultimate and penultimate position. Because such biases can affect read quality in sequencing, such mixtures equalize the nucleotide distribution when sequencing across the region of ligation junctions. However, the variable identification sequence provides a much more unbiased ligation than any other determined sequence and serves at the same time also as a UMI (Unique Molecular Index).

An identification sequence, such as a UMI, allows determining whether sequencing reads which possess an identical sequence, or which map to an identical position in a reference annotation which accounts for minor sequencing errors, are coming from different template molecules or from one template molecule and are just the result of further amplification (PCR duplication). The adaptor is hybridized to the primer when present.

Identification sequences, like UMIs, can also distinguish between real SNPs (single nucleotide polymorphisms) between individuals and errors (mutations) introduced during reverse transcription or in early PCR cycles, which are amplified later. All copies of such a randomly occurring and amplified error should have the same identifier, whereas "real SNPs" in a sample have various different identifiers. Likewise, RNA-editing events that introduce modified bases, leading to mis-incorporation and thus errors during RT, could be more reliably quantified.

Identification sequences, like UMIs, could also be used to reliably determine and quantify allele frequencies in populations, molecular markers and causative mutations in hereditary diseases. Preferably DNA templates are used for this embodiment.
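The distinction between real SNPs and amplified errors can be sketched as a count of distinct identification sequences per observed variant. The input layout, the variant labels and the threshold of two distinct identifiers are hypothetical choices for this illustration and are not taken from the description.

```python
from collections import defaultdict

def classify_variants(observations, min_umis=2):
    """Classify variants by the number of distinct UMIs supporting them.

    `observations` is an iterable of (variant, umi) tuples, one per read.
    A variant seen with at least `min_umis` different UMIs is reported as a
    candidate SNP; otherwise it is reported as a likely amplified error.
    """
    umis_by_variant = defaultdict(set)
    for variant, umi in observations:
        umis_by_variant[variant].add(umi)
    return {
        variant: "candidate SNP" if len(umis) >= min_umis else "likely amplified error"
        for variant, umis in umis_by_variant.items()
    }

observations = [
    ("chr1:1000 A>G", "ACGTAC"),
    ("chr1:1000 A>G", "TTGACA"),
    ("chr1:1000 A>G", "GGCATT"),  # three independent molecules carry this variant
    ("chr1:2000 C>T", "CATGCA"),
    ("chr1:2000 C>T", "CATGCA"),  # same UMI twice: copies of a single molecule
]
calls = classify_variants(observations)
print(calls)
```

A fixed threshold is the simplest possible rule; a real analysis would also weigh sequencing depth and the possibility of errors within the identification sequence itself.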

A further preferred combination is a method of the invention wherein at least one, preferably at least 9, elongation stopper has primer activity and is also elongated during the elongating step and at least two, preferably at least 10, adaptor nucleic acids that comprise different identification sequences are used, whereby at least two, preferably at least 10, different labelled fragments are generated, optionally amplifying the labelled fragments, further comprising assembling the sequences of amplification fragments which are unique, wherein the labels are used to identify unique amplification fragments. The different labels in the amplified labelled fragments can be used for identifying unique amplification fragments.

A further preferred method uses stoppers with primer functions. Preferably a plurality of such primers is used. In such a method, without differentiating between stoppers and primers, an embodiment of the invention can be defined as follows: A method for generating labelled amplification fragments of a nucleic acid template comprising the steps of providing said template nucleic acid, annealing a plurality of oligonucleotide primers to said template nucleic acid, elongating the oligonucleotide primers in a template specific manner thereby creating a plurality of elongation products, wherein said elongating reactions stop when the elongation products reach the 5' end of the template nucleic acid or an oligonucleotide primer that is annealed to the template nucleic acid downstream of such an elongation product, providing a plurality of adaptor nucleic acids that comprise an identification sequence on their 5' ends, wherein said identification sequences do not hybridize to the oligonucleotide primer or to the template, ligating the adaptor nucleic acids of the plurality at their respective 5' ends to the 3' end of the elongation products, thereby generating a plurality of labelled amplification fragments. This is a preferred embodiment that can be combined with any particularly described aspects in the claims and described above. Everything described above for stoppers applies to the primers in this embodiment since these primers are stoppers with primer function. The term "plurality" is used for oligonucleotide primers, elongation products (which are the result of the primers' elongation), adaptor nucleic acids and labelled amplification fragments (which are the result of elongation and adaptor ligation). As indicated, the amount of some of these pluralities is a result of the method. The amounts of oligonucleotide primers and adaptor nucleic acids can be selected - as described above. Their amounts can be selected independently but are preferably approximately the same for pairwise association with a given elongation product. Preferably the plurality is e.g. 10 or more, 50 or more, 100 or more, 200 or more, etc. Many different oligonucleotide primers and adaptor nucleic acids can be used: for the oligonucleotide primers, to bind to multiple different locations on the template; for the adaptor nucleic acids, to have different identification sequences, preferably unique identification sequences for the labelled amplification fragments. Although in this embodiment primers and stoppers are the same, a special primer that does not need (but may have) stopper function may also be added, such as a 3' end specific primer, like an oligo(A) targeting primer as described above.

The present invention is further described in the following figures and examples, without being limited to these embodiments of the present invention.

Figures:

Figure 1: Schematic representation of creating a UMI-linker tagged short cDNA library using a primer with SDS properties and a partially complementary UMI-containing linker oligo within the body of the RNA.

a) General strand displacement stop primers Pn are hybridized to an RNA transcript, with primer Pn+1 hybridized to a more upstream (5') position of the template RNA than primer Pn. When the reverse transcriptase while extending Pn reaches a primer Pn+1, the polymerase reaction will be stopped by the strand displacement stop technology described in WO 2013/038010 A2. A UMI-containing linker oligo encompassing L2, which is complementary to L1, is hybridized to primers Pn and Pn+1. b) During ligation the extension product is now ligated to the UMI preceding the L2 strand of the linker. In this manner again, a cDNA library is created that has two linker sequences (L1, L2) present on its ends and contains unique molecular identifiers. c) Finally, a PCR is performed to amplify these libraries.

Figure 2: Generation of UMI-containing libraries

Figure 2 a) shows libraries generated by the SDS + ligation approach. Ligation of the UMI-containing partially complementary L2 adaptor (see Fig. 1 for reference) can be performed using either a ss ligase or a ds ligase (lane 2, 3). No libraries are generated when ligase is omitted (lane 1). After ligation, cDNA fragments containing L1 and L2 linkers are amplified by PCR and analysed. Shown are gel images from an HS DNA Assay run on a Bioanalyzer (Agilent Technologies, Inc.). b) Schematic illustration of the generation of UMI-containing libraries using the SDS + ligation approach with non-hybridizing starter and adaptor oligos. In this case, the adaptor oligo L2' does not contain sequences complementary to the elongation starter Pn. c) Gel image and electropherogram of replicate libraries generated using non-hybridizing elongation starter and UMI-containing adaptor oligos (SEQ ID No. 10). Images are obtained from an HS DNA Assay run on a Bioanalyzer (Agilent Technologies, Inc.).

Figure 3: Improved 5' end coverage of transcripts achieved by ligation of L2 linkers to cDNA at the 5' end of the RNA template.

a) Schematic representation of the RT reaction at the 5' end of transcripts. Without SDS by downstream primers Pn+1, the terminal deoxynucleotidyl transferase activity (TdT) of the RT adds untemplated nucleotides to the 3' end of the cDNA, generating an overhang. b) The non-templated nts can either serve as hybridisation site for L1 containing primer Pn+1. In conjunction with partially hybridized L2, the ligation of the UMI-L2 linker can occur in a double strand. c) Alternatively, in the absence of priming the UMI-L2 linker can be ligated as a single strand. d) Libraries generated as shown schematically in Fig. 3 a-c) were sequenced on an Illumina NextSeq 500 (single read, 75bp). Shown are reads mapping to the 5' end of ERCC-0130 (as present in SIRV set 3, Lexogen Catalog #051. ON). Reads were analyzed without trimming of additional and mis-matching bases. Nucleotides marked in gray correspond to the annotation of ERCC-0130, and nucleotides shown in black are derived from non-templated addition by TdT activity of the RT. Thirty representative sequences of the reads obtained for the 5' end of ERCC-0130 are shown below. Read sequences are SEQ ID NO: 12 to 42, from top to bottom. e) Improved 5' end coverage of the SDS/ligation approach as compared to conventional protocols. Libraries were prepared using a conventional protocol (NEBNext® Ultra™ II directional RNA Library Prep Kit for Illumina®, New England Biolabs, Catalog # E7760S) or the SDS/ligation approach and sequenced on an Illumina NextSeq 500 (paired end read, 150bp). Reads mapping to ERCC-0130 were superimposed and compared with expected coverage shown as rectangles. Left: conventional RNA library preparation protocol; right: coverage obtained by the novel SDS/ligation technology.

Figure 4: Schematic representation of the reaction used to improve 3' end coverage by the SDS/ligation approach and a combination of general (Pn) and oligo-dT primers (PdT).

a) General primer Pn is hybridized to the RNA template within the body of the RNA. Additionally present oligo-dT primers (PdT) hybridize to the poly(A) tail at the 3' end of polyadenylated transcripts. RT will extend PdT until a downstream primer Pn is reached, which stops strand displacement. b) During ligation, the UMI-containing L2 linker will be ligated to cDNA fragments spanning the 3' end, generating L1 and L2-linked, UMI-containing cDNA libraries covering the 3' ends of transcripts. c) Gene body coverage plot showing enhanced coverage of the 3' end of transcripts over the whole transcriptome. Libraries were prepared using the SDS + ligation protocol using a mixture of random priming and oligo-dT first strand synthesis primer as described in example 3. Libraries were sequenced on a NextSeq 500 machine and gene body coverage over the transcriptome was plotted in comparison to the previously described SDS + ligation protocol. d) Exemplary coverage over an endogenous housekeeping gene (HSP90) for a conventional library preparation method (upper panel) and the SDS + ligation protocol with oligo-dT titration (lower panel), which results in improved 3' end coverage.

Figure 5: Global improvement of 5' and 3' coverage of transcripts. Transcription start sites, i.e. genuine 5' ends of transcripts, and transcript end sites, i.e. genuine 3' ends of transcripts, are resolved using the SDS + ligation protocol, but not resolved when using two exemplary conventional library preparation methods. Libraries generated using the SDS + ligation protocol as shown schematically in Fig. 3 a-c) were sequenced on an Illumina NextSeq 500 (paired end, 150bp). Conventional libraries were prepared according to manufacturer's instructions using either the TruSeq Stranded Total RNA Library Prep Human/Mouse/Rat, Illumina Catalog # 20020596 or 20020597 (= Conventional 1) or the NEBNext® Ultra™ II directional RNA Library Prep Kit for Illumina®, New England Biolabs, Catalog # E7760S (= Conventional 2). a) Shown are reads mapping to genuine 5' and 3' ends of detected ERCCs (as present in SIRV set 3, Lexogen Catalog #051. ON). Reads were mapped to ERCC spike in RNAs with known sequence. Normalized coverage of accumulated mapped reads for all detected ERCCs is plotted for absolute nucleotide positions relative to the transcription start (TSS) and transcript end sites (TES) marked by dotted lines. b) Extended 5' coverage reveals generic TSS. Upper panel: Coverage profile for gapdh with condensed intron visualization as generated using the SDS + ligation protocol or conventional library preps as described above. b) Reads mapping to gapdh were analyzed without trimming of additional and mis-matching bases. Read sequences are SEQ IDs No. 43 to No. 67 from top to bottom. Nucleotides marked in black correspond to the annotation of gapdh, and nucleotides shown in gray are mis-matches or derived from non-templated addition by TdT activity of the RT. Start site clusters generated by stacking of reads at the 5' end of transcripts may be used to re-annotate TSS. The annotated and manually determined TSS are indicated by arrows at the annotated consensus sequence shown in bold.

Examples:

Example 1: Ligation of unique molecular identifiers (UMI) to first strand cDNA fragments.

Libraries were prepared from universal human reference RNA (Agilent Technologies, Catalog # 740000) containing SIRV Set 3 spike-in control mix (Lexogen, Catalog # 051.0N) according to the manufacturer's instruction.

After cDNA synthesis, downstream primers (Pn+1 (L2)) containing a unique molecular identifier of a length between 2 and 24 nucleotides, preferably between 6 and 12 nucleotides, can be ligated to the newly transcribed cDNA strand in the hybrid with the template RNA. Reverse transcription was performed using oligos, template and conditions as described in WO 2013/038010 A2. Various ligases and combinations thereof can be used to ligate oligos like:

SEQ ID No: 1: (Phos)(5'-NNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3'(3InvdT)),

SEQ ID No: 2: (Phos)(5'-NNNNNNNNNNAGATCGGAAGAGCACACGTCTGAA-3'(3InvdT)),

SEQ ID No: 3: (Phos)(5'-NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3'(3InvdT)),

SEQ ID No: 4: (Phos)(5'-NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3'(3InvdT)),

SEQ ID No: 5: (Phos)(5'-NNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3'(3InvdT)),

SEQ ID No: 6: (Phos)(5'-NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG-3'(3InvdT)),

SEQ ID No: 7: (Phos)(5'-NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3'(3InvdT)),

SEQ ID No: 8: (Phos)(5'-+NNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3'(3InvdT)),

SEQ ID No: 9: (Phos)(5'-+NNNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAGG-3'(3InvdT)).
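The listed oligos share one layout: a leading run of random positions (N) forming the UMI, followed by a fixed linker sequence. The following Python sketch is an illustration only and not part of the described method; it splits such a sequence into UMI and linker and reports the theoretical number of distinct UMIs (4 to the power of the UMI length). The function name `parse_adaptor` is hypothetical, and because the meaning of the leading "+" in SEQ ID No: 8 and 9 is not stated in this section, the sketch simply strips it.

```python
import re


def parse_adaptor(seq: str) -> dict:
    """Split an adaptor into its leading N-run (the UMI) and the fixed linker.

    Assumption: the UMI is the uninterrupted run of 'N' at the 5' end and
    everything after it is the fixed linker. A leading '+' is ignored.
    """
    cleaned = seq.strip().lstrip("+")
    match = re.fullmatch(r"(N*)([ACGT]*)", cleaned)
    if match is None:
        raise ValueError(f"unexpected characters in adaptor: {seq!r}")
    umi, linker = match.groups()
    return {
        "umi_length": len(umi),
        "linker": linker,
        # Each random position can be A, C, G or T.
        "umi_diversity": 4 ** len(umi),
    }


# UMI lengths written out explicitly to avoid miscounting the N-runs.
adaptors = {
    "SEQ ID No: 1": "N" * 6 + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "SEQ ID No: 4": "N" * 10 + "AGATCGGAAGAGCGTCGTGTAGG",
    "SEQ ID No: 9": "+" + "N" * 12 + "AGATCGGAAGAGCGTCGTGTAGG",
}

for name, seq in adaptors.items():
    info = parse_adaptor(seq)
    print(name, info["umi_length"], info["umi_diversity"])
```

A 6-nucleotide UMI gives 4,096 possible identifiers and a 12-nucleotide UMI about 16.8 million, which is one way to weigh the preferred range of 6 to 12 nucleotides against the expected number of template molecules.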

After reverse transcription (RT), the samples were purified by solid phase reversible immobilization (SPRI) with magnetic purification beads (AMPure Beads; Agencourt) according to the manufacturer's instruction. The cDNA:RNA hybrids were eluted in 20 µl water or 10 mM Tris, pH 8.0, before 17 µl of the supernatants were transferred into a new PCR plate. Then, ligation reactions were performed in 60 µl with 20% PEG-8000, 50 mM Tris-HCl (pH 7.5 at 25°C), 10 mM MgCl₂, 5 mM DTT, 0.4 mM ATP, 0.01% Triton X-100, 50 µg/ml BSA and 20 units ligase, which can be a single-strand-specific ligase and/or a double-strand-specific ligase. Un-ligated small fragments and remaining oligos were removed by SPRI purification. All remaining primary cDNA libraries were amplified in a PCR reaction using a high-fidelity polymerase and the following program: 98°C for 30 seconds followed by 10-25 PCR cycles of 98°C for 10 seconds, 65°C for 20 seconds and 72°C for 30 seconds. Final extension was performed at 72°C for 60 seconds. Figure 1 b) shows the general principle underlying the ligation of the extended cDNA to the UMI-containing linker oligo (L2), which has a sequence complementary to the strand displacement stop primer (L1).
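As a rough planning aid, the PCR program above can be written down as data and its total hold time computed for a chosen cycle number. This Python sketch is illustrative only: the function name is hypothetical, and ramp times between temperatures are instrument-dependent and therefore not included.

```python
def pcr_hold_time_seconds(cycles: int) -> int:
    """Sum of the programmed hold times for the described PCR program.

    Assumption: only hold times are counted; ramping between temperatures
    is ignored because it depends on the thermocycler.
    """
    if not 10 <= cycles <= 25:
        raise ValueError("the described program uses 10 to 25 cycles")
    initial_denaturation = 30      # 98°C for 30 seconds
    per_cycle = 10 + 20 + 30       # 98°C, 65°C and 72°C steps
    final_extension = 60           # 72°C for 60 seconds
    return initial_denaturation + cycles * per_cycle + final_extension


for n in (10, 25):
    seconds = pcr_hold_time_seconds(n)
    print(f"{n} cycles: {seconds} s ({seconds / 60:.1f} min)")
```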

The example in Figure 2 shows that various ligases can perform the ligation reaction of a UMI-containing oligonucleotide, and thus produce cDNA fragments that contain both PCR linkers and are amplifiable by PCR (Fig. 2 a), lanes 2-3). In contrast, the control experiment omitting any ligase shows that no libraries can be amplified, emphasizing the specificity of the reaction (Fig. 2 a), lane 1).

Example 2: Library generation using non-hybridizing elongation starter and adaptor oligonucleotides.

Libraries were prepared from universal human reference RNA (Agilent Technologies, Catalog # 740000) containing SIRV Set 3 spike-in control mix (Lexogen, Catalog # 051.0N) according to the manufacturer's instruction.

Reverse transcription (RT) was performed as described in Example 1. Following RT, the samples were purified by solid phase reversible immobilization (SPRI) with magnetic purification beads (AMPure Beads; Agencourt) according to the manufacturer's instruction, and the purified cDNA:RNA hybrids were eluted in 20 µl 10 mM Tris, pH 8.0, before 17 µl of the supernatants were transferred into a new PCR plate. Ligation was performed using the conditions described in Example 1, but providing an adaptor oligonucleotide that has no sequence complementarity to the elongation starter used for priming the reverse transcription reaction. Hence, the adaptor oligonucleotide cannot hybridize and thus is not brought into the vicinity of the newly generated 3' ends of the elongation products by recruitment (Fig. 2 b)). Oligos such as SEQ ID No. 10: (Phos)(5'-NNNNNNNNNNNNTGGAATTCTCGGGTGCCAAGG-3'(SpcC3)) do not possess sequence complementarity to elongation starters. Fragments containing both linker sequences were amplified following clean-up as described in Example 1. Fig. 2 c) shows gel images and electropherograms of library traces for two replicate SDS + ligation libraries generated with non-hybridizing elongation starters and adaptor oligos.

Example 3: Improved 5' end coverage as a consequence of terminal transferase activity and ss-ligation of a UMI-linker to first strand cDNA fragments.

Libraries were prepared from universal human reference RNA (Agilent Technologies, Catalog # 740000) containing SIRV Set 3 spike-in control mix (Lexogen, Catalog # 051.0N) according to the manufacturer's instruction.

First strand cDNA synthesis stops at the 5' ends of template RNA molecules. Terminal transferase activity of reverse transcriptases catalyzes non-templated addition of nucleotides at the 3' end of the cDNA strand (Fig. 3 a).

Ligation of UMI-linker oligos (e.g., SEQ IDs 1-9) after reverse transcription can occur in double strand formation (Fig. 3 b) and at single-stranded overhangs (Fig. 3 c). Following SPRI purification and PCR amplification, libraries were sequenced on a NextSeq 500, either in single-read or paired-end mode. Reads mapping to the 5' end of ERCC-0130 were analysed without prior clipping of mismatched nucleotides. Reads covering the 5' end of ERCC-0130 are shown exemplarily in Fig. 3 d. Addition of terminal nucleotides and the UMI ligation at extended single strands result in improved 5' coverage. The comparison of coverage profiles between common RNA-seq library preparation and the present invention is shown in Fig. 3 e. Coverage is shown as the superposition of all aligned reads (trace shown in grey) and compared to the expected uniform coverage shown as a rectangle. Whereas in sequencing data derived from conventional protocols the 5' and 3' ends are covered less efficiently, apparent as a slope towards either end (Fig. 3 e, left), the novel protocol generates more reads mapping to the extreme 5' end of transcripts (Fig. 3 e, right).
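The idea of coverage as a superposition of aligned reads, judged against a uniform expectation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the analysis used for Fig. 3 e: alignments are given as hypothetical 0-based, half-open (start, end) intervals on one transcript, and `end_ratio` is an invented helper that compares mean coverage in a window at the 5' end with the mean over the whole transcript (a value near 1 would correspond to the uniform rectangle).

```python
def coverage_profile(transcript_length, alignments):
    """Per-position read depth from (start, end) intervals (0-based, half-open)."""
    coverage = [0] * transcript_length
    for start, end in alignments:
        # Clip each alignment to the transcript boundaries.
        for pos in range(max(start, 0), min(end, transcript_length)):
            coverage[pos] += 1
    return coverage


def end_ratio(coverage, window):
    """Mean depth in the first `window` positions divided by the overall mean."""
    overall = sum(coverage) / len(coverage)
    if overall == 0:
        return 0.0
    five_prime = sum(coverage[:window]) / window
    return five_prime / overall


# Toy example: three reads on a 10-nt transcript.
reads = [(0, 4), (2, 8), (5, 10)]
profile = coverage_profile(10, reads)
print(profile)                 # [1, 1, 2, 2, 1, 2, 2, 2, 1, 1]
print(round(end_ratio(profile, 2), 3))
```

In this toy profile the first two positions are covered at about two thirds of the mean depth, which is the kind of slope towards the end that the text describes for conventional protocols.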

Example 4: Improvement of 3' end coverage by titration of oligo-dT first strand synthesis primers.

The coverage of transcript 3' ends can be modified, preferably increased, by using oligo-dT containing first strand primers (Pn containing L1). To boost the coverage at the 3' ends, these are added to the mixture of random priming SDS oligos, which already contains a portion of T-rich and T-only priming sequences (such as SEQ ID No: 11 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT +TTT TTT TTT TTT TTT TTT+ V-3') according to the normal distribution of random nucleotides. Depending on the chosen ratio between random and poly-dT L1 primers, the change in sequencing depth at the 3'-end sites can be foregrounded (Fig. 4). The ratios of random SDS primers and specific oligo-dT primers, as well as the primer length and LNA content, can vary and will determine the degree of over-representation of the 3' ends.

Libraries were prepared by SDS + ligation using either random priming displacement stop primers only or a mixture with various amounts of oligo-dT first strand primers (SEQ ID No: 11). The resulting libraries were sequenced on a NextSeq 500, the data were analysed, and gene body coverage plots over the whole transcriptome were generated from mapped reads using the geneBody_coverage Python script available from RSeQC (Fig. 4 c). The coverage of 3' ends can be significantly increased upon addition of oligo-dT primers during reverse transcription.
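A transcriptome-wide gene body coverage plot requires transcripts of different lengths to be placed on a common 5'-to-3' axis. The Python sketch below shows one simple way to do this, by averaging per-base depth into a fixed number of bins per transcript, summing the bins over transcripts and scaling to the maximum. It is an assumption-laden illustration of the general idea and is not claimed to reproduce the RSeQC script; the function names and the binning scheme are hypothetical.

```python
def to_bins(coverage, bins=100):
    """Average per-base depth of one transcript into `bins` equal segments (5' to 3')."""
    n = len(coverage)
    if n < bins:
        raise ValueError("transcript shorter than the number of bins")
    binned = []
    for i in range(bins):
        lo = i * n // bins
        hi = (i + 1) * n // bins
        segment = coverage[lo:hi]
        binned.append(sum(segment) / len(segment))
    return binned


def gene_body_profile(coverages, bins=100):
    """Sum binned profiles over transcripts and scale so the maximum is 1."""
    total = [0.0] * bins
    for coverage in coverages:
        for i, value in enumerate(to_bins(coverage, bins)):
            total[i] += value
    peak = max(total)
    return [value / peak if peak else 0.0 for value in total]


# Toy example with two short "transcripts" and two bins.
print(gene_body_profile([[1, 1, 3, 3], [2, 2, 2, 2]], bins=2))
```

With real data, a profile that stays close to 1 towards the last bins would indicate well-covered 3' ends, which is the effect the oligo-dT titration aims for.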

Further, gene coverages were visualized exemplarily for endogenous genes using a custom script to evaluate coverage on individual genes. Figure 4 d shows the coverage of housekeeping gene HSP90 obtained by a conventional RNA library preparation protocol (upper panel) with notoriously under-represented 5' and 3' ends. In contrast, the SDS-ligation protocol with oligo-dT titration shows an improved 5' and 3' coverage (lower panel).

Example 5: Improvement of 5' and 3' coverage facilitates determination of true transcript start and end sites.

SDS + Ligation libraries were prepared on ribo-depleted universal human reference RNA (Agilent Technologies, Catalog # 740000) containing SIRV Set 3 spike-in control mix (Lexogen, Catalog # 051.0N) as described in Examples 3 and 4. Removal of ribosomal RNA was achieved by using RiboCop (Lexogen, Catalog # 037.96) according to the manufacturer's instructions. As a comparison, two conventional library preparation methods were used on the same ribo-depleted universal human reference RNA: the TruSeq Stranded Total RNA Library Prep Human/Mouse/Rat, Illumina Catalog # 20020596 or 20020597 (= Conventional 1) or the NEBNext® Ultra™ II directional RNA Library Prep Kit for Illumina®, New England Biolabs, Catalog # E7760S (= Conventional 2), following the manufacturer's instructions. The resulting libraries were sequenced on a NextSeq 500, and the data were analysed. Gene body coverage plots were generated for all detected ERCCs present in SIRV Set 3. Figure 5 a) shows the normalized coverage of accumulated mapped reads over ERCCs for absolute nucleotide positions relative to the known transcription start sites (TSS) and transcript end sites (TES), both indicated by dotted lines. The coverage at 5' and 3' ends is significantly increased for samples derived from SDS + Ligation libraries as compared to both conventional library preparations, which show reduced coverage of the 3' end and lack resolution of the exact 5' end.

Further, gene coverages were visualized exemplarily for an endogenous housekeeping gene, gapdh, using a custom script to evaluate coverage on individual genes. Figure 5 b) shows the coverage profile for gapdh with condensed intron visualization. Reads mapping to gapdh (SEQ IDs No. 43 to No. 67) were analyzed without trimming of additional and mismatched bases. Nucleotides matching the consensus sequence (top row) are marked in black, and nucleotides deviating from the annotated consensus sequence or derived from non-templated addition are marked in gray. Based on the stacking of the reads observed for samples derived from SDS + Ligation library preparations, authentic transcription start sites can be determined and re-annotated for transcripts of interest. In the example shown in Figure 5 b), the TSS was manually adjusted to position -15 (with respect to the annotated +1 position). Similarly, genuine transcription start and end sites can be re-assessed for other transcripts of interest, allowing comprehensive analysis of the complete transcript, including single-nucleotide resolution at genuine TSS, in high-throughput NGS experiments. This can be achieved simply by using the SDS + Ligation library preparation method, as opposed to specialized and more complicated approaches such as 5' capture sequencing techniques (CAGE-Seq) or low-throughput methodologies such as 5' RACE (rapid amplification of cDNA ends).
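The TSS in this example was adjusted manually. Purely as an illustration of how read stacking could be turned into a candidate TSS, the following Python sketch counts read 5' start positions (given relative to the annotated +1 position) and returns the position with the largest stack. The function name, the `min_reads` threshold and the tie-breaking rule (prefer the more upstream position) are assumptions introduced here and are not taken from the source.

```python
from collections import Counter


def call_tss(read_start_positions, min_reads=2):
    """Return the start position supported by the most reads, or None.

    Assumptions: positions are integers relative to the annotated +1 site;
    ties are resolved in favour of the more upstream (smaller) position;
    a stack needs at least `min_reads` reads to be reported.
    """
    counts = Counter(read_start_positions)
    if not counts:
        return None
    position, support = max(counts.items(), key=lambda item: (item[1], -item[0]))
    return position if support >= min_reads else None


# Toy input: most reads start 15 nt upstream of the annotated +1 position.
starts = [-15, -15, -15, -14, 1, 1]
print(call_tss(starts))
```

Before trusting such a call, one would still inspect the aligned reads, as done in Figure 5 b), because non-templated nucleotides added by the reverse transcriptase can shift apparent read starts.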
