Title:
RAPID AND PARALLEL STORAGE-ORIENTED DNA SEQUENCE MATCHING METHOD AND SYSTEM THEREOF
Document Type and Number:
WIPO Patent Application WO/2018/000174
Kind Code:
A1
Abstract:
A rapid, parallel, storage-oriented DNA sequence matching method, applied to the storage compression of DNA sequences. The method comprises: a hash index building step, in which a prefix-based hash index is built for a reference genome in FASTA format by finding all k-mers that begin with a designated prefix and using them as the keys of a hash index table, each entry storing the positions at which the corresponding k-mer appears; a file segmenting step, in which a DNA sequence file in FASTQ format is input and divided into sub-blocks; and a multithread processing step, in which a plurality of threads are started to process a number of tasks determined by the number of threads, the sub-blocks simultaneously call a matching function that locates sequences rapidly using the k-mer hash index, the sub-blocks are matched in parallel against the target reference genome in FASTA format, and the original DNA sequences are replaced by the stored matching results to achieve storage compression.
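The three steps in the abstract (prefix k-mer hash index, file segmentation, multithreaded matching with replacement of reads by match results) can be illustrated with a minimal Python sketch. It is not the patented implementation. The k-mer length (k=4), the prefix ("A"), the mismatch limit, the tuple encoding of matched reads, the "fewest mismatches wins" placement rule, and every function name (build_prefix_index, segment, match_read, match_block, compress, decode) are hypothetical choices made for illustration. FASTA/FASTQ parsing is omitted: the reference is assumed to be a plain string and the reads a list of sequence strings, without headers or quality lines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def build_prefix_index(reference, k, prefix):
    """Hash index: each k-mer of `reference` that begins with `prefix` -> list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        kmer = reference[pos:pos + k]
        if kmer.startswith(prefix):
            index[kmer].append(pos)
    return dict(index)


def segment(reads, n_blocks):
    """Split the read list into at most n_blocks contiguous sub-blocks, preserving order."""
    size = max(1, -(-len(reads) // max(1, n_blocks)))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def match_read(read, reference, index, k, prefix, max_mismatches):
    """Anchor the read via its prefix k-mers; keep the placement with the fewest mismatches."""
    best = None
    for offset in range(len(read) - k + 1):
        kmer = read[offset:offset + k]
        if not kmer.startswith(prefix):
            continue  # only prefix k-mers were indexed, so other k-mers cannot be looked up
        for hit in index.get(kmer, ()):
            start = hit - offset  # candidate start of the whole read on the reference
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
                     if ref_base != base]
            if len(diffs) <= max_mismatches and (best is None or len(diffs) < len(best[1])):
                best = (start, diffs)
    return best


def match_block(block, reference, index, k, prefix, max_mismatches):
    """Encode one sub-block. Hypothetical encoding: ('M', pos, length, diffs) or raw ('R', read)."""
    out = []
    for read in block:
        hit = match_read(read, reference, index, k, prefix, max_mismatches)
        out.append(("M", hit[0], len(read), hit[1]) if hit is not None else ("R", read))
    return out


def compress(reads, reference, k=4, prefix="A", n_threads=4, max_mismatches=2):
    """Build the index, segment the reads, and match the sub-blocks on a thread pool."""
    index = build_prefix_index(reference, k, prefix)
    worker = partial(match_block, reference=reference, index=index, k=k,
                     prefix=prefix, max_mismatches=max_mismatches)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in block order, so output order matches input order.
        blocks = pool.map(worker, segment(reads, n_threads))
        return [entry for block in blocks for entry in block]


def decode(entry, reference):
    """Rebuild the original read from its encoded form."""
    if entry[0] == "R":
        return entry[1]
    _, start, length, diffs = entry
    bases = list(reference[start:start + length])
    for i, base in diffs:
        bases[i] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATCGATCCGTAGGCTA"
    reads = [reference[2:14], reference[10:21] + "T", "TTTTTTTT"]
    encoded = compress(reads, reference, k=4, prefix="A", n_threads=2)
    print(encoded)
    # [('M', 2, 12, []), ('M', 10, 12, [(11, 'T')]), ('R', 'TTTTTTTT')]
    assert [decode(e, reference) for e in encoded] == reads
```

An exact read is stored as a position and length only, a read with one substitution adds a single (offset, base) pair, and a read with no indexed anchor is kept raw. The thread pool mirrors the multithread step, but in CPython the global interpreter lock prevents this pure-Python loop from using several cores at once; concurrent.futures.ProcessPoolExecutor is one drop-in way to obtain real CPU parallelism, since the worker is a functools.partial of a module-level function and can be pickled.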

Inventors:
ZHU ZEXUAN (CN)
DENG QINGJIN (CN)
CHU YING (CN)
SUN YIWEN (CN)
Application Number:
PCT/CN2016/087407
Publication Date:
January 04, 2018
Filing Date:
June 28, 2016
Assignee:
UNIV SHENZHEN (CN)
International Classes:
H03M7/30; G06F17/30; G16B50/50
Foreign References:
CN106096332A 2016-11-09
CN103546160A 2014-01-29
CN104951672A 2015-09-30
CN104239750A 2014-12-24
CN102081707A 2011-06-01
CN103995988A 2014-08-20
Other References:
MARCAIS, G. ET AL.: "A Fast, Lock-Free Approach for Efficient Parallel Counting of Occurrences of K-mers", BIOINFORMATICS, vol. 27, no. 6, 1 July 2011 (2011-07-01), pages 764 - 770, XP055450695
MEHTA, A. ET AL.: "DNA Compression Using Hash Based Data Structure", INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND KNOWLEDGE MANAGEMENT, vol. 2, no. 2, 1 July 2010 (2010-07-01), pages 383 - 386, XP055450698
ZHU, Z.X. ET AL.: "High-Throughput DNA Sequence Data Compression", BRIEFINGS IN BIOINFORMATICS, vol. 16, no. 1, 3 December 2013 (2013-12-03), pages 1 - 15, XP055372867
ZHU, ZEXUAN ET AL.: "Advances in the Compression of High-Throughput DNA Sequencing Data", JOURNAL OF SHENZHEN UNIVERSITY (SCIENCE & ENGINEERING), vol. 30, no. 4, 31 July 2013 (2013-07-31), pages 409 - 415
ZHANG, Y.P. ET AL.: "Light-Weight Reference-Based Compression of FASTQ Data", BIOINFORMATICS, vol. 16, 9 June 2015 (2015-06-09), XP021223710
ZHANG, LIXIA ET AL.: "DNA Compressed Pattern Matching Algorithms Based on Character and 0/1 Coding", APPLICATION RESEARCH OF COMPUTERS, vol. 24, no. 9, 30 September 2007 (2007-09-30), pages 22 - 24
ZHOU, JIARUI ET AL.: "Intelligent DNA Sequence Data Compression Using Memetic Algorithm", ACTA ELECTRONICA SINICA, vol. 41, no. 3, 31 March 2013 (2013-03-31), pages 513 - 518
Attorney, Agent or Firm:
HENSEN INTELLECTUAL PROPERTY FIRM (CN)