Title:
OFFLINE DOWNLOAD METHOD AND SYSTEM
Document Type and Number:
WIPO Patent Application WO/2013/135203
Kind Code:
A1
Abstract:
An offline download method used with an offline download system is developed. The offline download system pre-downloads a requested file and stores the requested file in a cloud storage in response to an offline download request from a client. The offline download method includes steps of calculating similarity values between any two files stored in the cloud storage; grouping files with the similarity values greater than a threshold value; sorting the grouped files according to the similarity values. Thus, at least one file stored in the cloud storage can be deleted or suggested to substitute the requested file according to the sorting result.

Inventors:
LIU GANG (CN)
WU YUNSHENG (CN)
Application Number:
PCT/CN2013/072738
Publication Date:
September 19, 2013
Filing Date:
March 15, 2013
Assignee:
TENCENT TECH SHENZHEN CO LTD (CN)
International Classes:
H04N21/231
Foreign References:
CN101184091A 2008-05-21
CN101166190A 2008-04-23
CN102387220A 2012-03-21
US20040122909A1 2004-06-24
Attorney, Agent or Firm:
PSHIP FIRM, LLC (No.5 Land 1135 Middle Yanan Road,Jingan District, Shanghai 0, CN)
Claims:
WHAT IS CLAIMED IS:

1. An offline download method adapted to be used with an offline download system including a cloud storage storing pre-downloaded files, the offline download method comprising steps of:

receiving an offline download request for a requested file from the client;

selecting at least one substitute file from the pre-downloaded files stored in the cloud storage when the requested file is not found in the cloud storage, wherein the at least one substitute file is determined to be similar to the requested file according to the similarity between the at least one substitute file and the requested file;

sending first information comprising at least one filename of the at least one substitute file and at least one associated download link to the cloud storage; and

the client downloading one of the at least one substitute file from the cloud storage by connecting the associated download link.

2. The offline download method according to claim 1, wherein the requested file and the pre-downloaded files are video files and the offline download method further comprises steps of:

calculating similarity values between any two video files stored in the cloud storage;

grouping video files with the similarity values greater than a first threshold value; and

sorting the grouped video files according to the similarity values in descending order.

3. The offline download method according to claim 2, wherein the video files are grouped by recording recognition codes corresponding to the video files in the same archive in a database, the recognition codes being hash values mapped with the video files and sorted according to the similarity values in descending order.

4. The offline download method according to claim 3, further comprising steps of:

selecting one video file mapped to the recognition code at the head of the archive;

deleting the selected video file in the cloud storage to release space of the cloud storage; and

retaining the recognition code corresponding to the deleted video file in the archive.

5. The offline download method according to claim 2, wherein the similarity value between the at least one substitute file and the requested file is greater than a second threshold value, the second threshold value being greater than the first threshold value.

6. The offline download method according to claim 2, wherein the calculating step further comprises steps of:

using hash algorithm to index representative frames and computing frame match sequence between the two video files;

using random sample consensus (RANSAC) method to fit the frame match sequence to obtain matching values of the frame match sequence between the two video files; and

calculating the similarity value between the two video files according to the matching values.

7. The offline download method according to claim 6, wherein after the frame match sequence is computed, the offline download method further comprises a step of removing noise-affected frames of the video files to obtain valid frame match sequence between the two video files.

8. The offline download method according to claim 6, wherein before the similarity value is calculated, the offline download method further comprises a step of adjusting a frame rate of at least one of the two video files to synchronize the two video files.

9. The offline download method according to claim 1, wherein after the step of receiving the offline download request, the offline download method further comprises steps of:

extracting a filename of the requested file from the offline download request; and

searching the extracted filename in a database recording a plurality of filenames of the pre-downloaded files to find at least one filename corresponding to the at least one substitute file.

10. The offline download method according to claim 9, further comprising steps of:

dismissing words and symbols in the extracted filename, the words and symbols being irrelevant to contents of the requested file;

cutting the extracted filename into sections, each of which is a separate phrase;

classifying the designated filename into a target cluster by analyzing the separate phrases;

searching the target cluster in the database; and

determining filenames in the target cluster as the at least one filename corresponding to the at least one substitute file when the target cluster is found in the database.

11. The offline download method according to claim 10, further comprising steps of:

starting a pre-download process of the requested file when the target cluster is not found in the database and storing the pre-downloaded requested file in the cloud storage;

creating the target cluster in the database;

recording the extracted filename and an associated download link to the cloud storage in the target cluster in the database; and

the client downloading the requested file from the cloud storage by connecting the associated download link after the pre-download process of the requested file completes.

12. The offline download method according to claim 10, further comprising steps of:

accessing each cluster in the database periodically;

retaining one file and deleting the other file stored in the cloud storage when filenames of the two files are recorded in the same cluster; and

associating the filename of the deleted file with the associated download link corresponding to the retained file.

13. An offline download system adapted to be used for pre-downloading a requested video file from internet in response to an offline download request from a client, the offline download system comprising:

a cloud storage storing a plurality of video files;

a similarity-calculating device for calculating similarity values between any two video files stored in the cloud storage;

a video matching database for grouping video files with the similarity values greater than a first threshold value and sorting the grouped video files according to the similarity values in descending order; and

a task manager in communication with the client, for receiving the offline download request for the requested video file from the client, and sending first information to the client when the requested video file is not found in the cloud storage,

wherein the first information comprises at least one filename corresponding to at least one substitute file selected from the video files stored in the cloud storage, so that the client selects one of the at least one substitute file to download, the at least one substitute file being determined to be similar to the requested video file according to the similarity value between the at least one substitute file and the requested video file.

14. The offline download system according to claim 13, wherein the video files are grouped by recording recognition codes corresponding to the video files in the same archive in the video matching database, the recognition codes being hash values mapped with the grouped video files and sorted according to the similarity values in descending order.

15. The offline download system according to claim 14, further comprising a file-eliminating device for deleting one video file in the cloud storage wherein the deleted video file is mapped to the recognition code at the head of the archive.

16. The offline download system according to claim 13, wherein the similarity value between the at least one substitute file and the requested file is greater than a second threshold value, the second threshold value being greater than the first threshold value.

17. The offline download system according to claim 13, wherein the similarity-calculating device comprises:

a frame match sequence-computing unit for using hash algorithm to index representative frames and computing frame match sequence between the two video files;

a matching value-calculating unit for using random sample consensus (RANSAC) method to fit the frame match sequence to obtain matching values of the frame match sequence between the two video files; and

a similarity value-calculating unit for calculating the similarity value between the two video files according to the matching values.

18. The offline download system according to claim 17, wherein the similarity-calculating device further comprises a valid frame match sequence-computing unit for removing noise-affected frames of the video files to obtain valid frame match sequence between the two video files.

19. The offline download system according to claim 17, wherein the similarity-calculating device further comprises a frame rate-adjusting unit for adjusting a frame rate of at least one of the two video files to synchronize the two video files.

20. The offline download system according to claim 13, wherein the video matching database comprises:

a recognition code-recording unit for grouping the video files with the similarity values greater than the first threshold value; and

a sorting unit for sorting the grouped video files according to the similarity values in descending order.

21. An offline download system adapted to be used for pre-downloading a requested file from internet in response to an offline download request from a client, the offline download system comprising:

a cloud storage storing a plurality of video files;

a task manager in communication with the client for receiving the offline download request from the client, extracting a filename of the requested file from the offline download request, and sending first information to the client when the requested file is not found in the cloud storage wherein the first information comprises at least one filename corresponding to at least one substitute file and at least one associated download link to the cloud storage;

a cluster database recording a plurality of filenames of the video files stored in the cloud storage; and

a cluster-analyzing device in communication with the task manager and the cluster database for receiving the extracted filename and searching the extracted filename in the cluster database, the cluster-analyzing device sending the first information to the task manager when the extracted filename is not found in the cluster database.

22. The offline download system according to claim 21, wherein the cluster-analyzing device dismisses words and symbols in the extracted filename, the words and symbols being irrelevant to contents of the requested file; cuts the extracted filename into sections, each of which is a separate phrase; classifies the extracted filename into a target cluster by analyzing the separate phrases; and determines filenames in the target cluster as the at least one filename corresponding to the at least one substitute file when the target cluster is found in the cluster database.

23. The offline download system according to claim 21, wherein the cluster-analyzing device sends second information to the task manager to start a pre-download process of the requested file when the target cluster is not found in the cluster database; creates the target cluster in the cluster database; and records the extracted filename and an associated download link to the cloud storage in the target cluster in the cluster database.

24. The offline download system according to claim 23, further comprising an offline download server group including at least one offline download server wherein the task manager selects one offline download server among the offline download server group to pre-download the requested file in the internet after receiving the second information, and stores the requested file in the cloud storage.

25. The offline download system according to claim 24, wherein the selected offline download server sends the associated download link to the cluster-analyzing device after the pre-download process of the requested file completes.

26. The offline download system according to claim 25, further comprising a record-eliminating device for accessing each cluster in the cluster database periodically; retaining one file and deleting the other file stored in the cloud storage when filenames of the two files are recorded in the same cluster; and associating the filename of the deleted file with the associated download link corresponding to the retained file.

27. The offline download system according to claim 25, further comprising a task organizer and a task database,

wherein the task organizer receives the offline download request from the client, generates a recognition code corresponding to the requested file, and searches the recognition code in the task database,

wherein the task organizer informs the client to download the requested file from the cloud storage when the recognition code is found in the task database and an associated status corresponding to the recognition code is labeled pre-download completion,

wherein the task organizer sends the offline download request to the task manager and records the recognition code in the task database when the recognition code is not found in the task database,

wherein the task organizer labels the associated status as pre-download completion after the pre-download process completes.

Description:
OFFLINE DOWNLOAD METHOD AND SYSTEM

FIELD OF THE INVENTION

[0001] The present invention relates to an offline download method and an offline download system, and more particularly to a method and a system for offline downloading video files.

BACKGROUND OF THE INVENTION

[0002] In a traditional file-sharing system, files are downloaded from a single server to a client. In this condition, the most effective way to accelerate downloading is to allow data to be transmitted at full rate, which is always restricted by the bandwidth between the server and the client. Later, peer-to-peer (P2P) distributed technologies, for example BitTorrent, eMule, and the like, were developed to allow shared access to various sources. In other words, data are transmitted among different peers at the same time. The download speed is significantly affected by the popularity of the downloaded file. Popular files can be downloaded in an effective manner, but files with lower popularity are usually downloaded at a poor speed. Moreover, unstable peers usually slow down the download progress, and the download may even make no progress for a long time when no peers are available for connection.

[0003] To overcome the unstable download problem, cloud pre-download technology, also called the offline download method, was further developed. In brief, the desired file is pre-downloaded to a server of the service provider (offline download server) in response to a request of the client. After the pre-download process completes, the entire pre-downloaded file is transmitted to the client's local computer at high speed when the local computer is connected to the offline download server. Therefore, it is not necessary for the local computer to be connected to the file source all day long, so the bandwidth of the local computer is available for other actions or applications. According to the offline download method, if the desired file is unpopular and the source peers for the desired file are fewer than expected, which causes slow download speed, the client can request the offline download server to pre-download the desired file rather than directly download the file from the poor source peers to the local computer. It is much more effective to download the file from the offline download server than from the original file source.

[0004] The offline download method has several advantages: it is rapid, stable and unrestricted. In detail, the offline download server is coupled to a network system with much greater bandwidth than the network system coupled to any client computer, such as an asymmetric digital subscriber line (ADSL) network. Hence, the download speed for files from/to the offline download server is much higher than the download speed for the files from the file source to the client computer. In summary, the total time required for the offline download server to receive a file from the file source and then transmit the file to the client computer is still less than the time required for direct transmission between the file source and the client computer. Further, since the download speed to the client computer is limited by the available bandwidth, rather than the constant bandwidth announced by the network service provider, the download speed fluctuates depending on the unstable network environment. In contrast, the bandwidth between the offline download server and the client computer is great enough to stabilize the download speed. Moreover, the offline download server works 24 hours a day and 7 days a week at full rate. Even though the download speed is restricted by BitTorrent, eMule, or other P2P distributed services, the client does not have to keep the computer running for a long time for the download action. The client only has to issue a request, and then download the files from the offline download server at high speed after the pre-download process completes. The offline download technology breaks the download restriction, and it is time-saving and cost-effective.

[0005] For the sake of clarity, in the specification, "pre-download process" is defined as data transmission from the file sources to the offline download server, while "download process" is defined as data transmission from the offline download server to the client computer. It is not intended to limit the transmission mechanism. To start the offline download process, an offline download request is sent to a task manager of the offline download server. The offline download request includes a uniform resource locator (URL) link, a hypertext transfer protocol (HTTP) request link, an eMule identifier, a torrent file, or a Magnet link to indicate the desired file. Upon receiving the offline download request, the task manager searches a task database for a hash value corresponding to the file indication. For example, if the offline download request includes a URL link, the hash value is obtained from a hash function of the URL link. If the offline download request includes an eMule identifier, the hash value may be an MD4 or MD5 hash of the eMule identifier. If the file is downloaded via BitTorrent service, the hash value is obtained by a hash function of the characteristic code and serial number of the torrent file.

[0006] If the corresponding hash value is found in the task database, it means that the requested file has been pre-downloaded and stored in the cloud storage. Then, the offline download server informs the client computer that the requested file is ready for download. The hash value corresponding to the requested file is also included in the information. Thus, the client computer may download the file from the cloud storage at high speed by peer to server and peer (P2SP) technology according to the received hash value. On the contrary, if the hash value is not found in the task database, it means that the requested file has never been pre-downloaded. Therefore, the pre-download process of the requested file is arranged into a download schedule of the offline download server.
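The hash-based lookup of paragraphs [0005] and [0006] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular service: the text only says "a hash function", so SHA-1 is an assumption, and the names `task_key`, `handle_request`, `task_db` and `schedule` are hypothetical.

```python
import hashlib


def task_key(file_indication: str) -> str:
    """Map a file indication (URL, Magnet link, ...) to a hex digest.

    SHA-1 is an assumption; the text does not name the hash function.
    """
    return hashlib.sha1(file_indication.encode("utf-8")).hexdigest()


def handle_request(file_indication, task_db, schedule):
    """Return ("ready", key) if the file was already pre-downloaded,
    otherwise queue a pre-download and return ("scheduled", key).

    task_db:  set of hash values of files already in the cloud storage
    schedule: list of hash values waiting to be pre-downloaded
    """
    key = task_key(file_indication)
    if key in task_db:
        return ("ready", key)
    if key not in schedule:
        schedule.append(key)
    return ("scheduled", key)
```

Because the key is derived only from the file indication, two links to the same movie in different formats produce different keys, so the second request is scheduled even though a near-identical file is already stored.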

[0007] According to the above-described offline download method, the offline download server determines whether to start the pre-download process according to the result of searching the task database for the hash value corresponding to the requested file. If exactly the same hash value is found in the task database, the client computer can directly download the file already stored in the cloud storage. Otherwise, the offline download server should pre-download the requested file from the designated file source. However, many similar but not identical files, corresponding to different hash values, may be requested by different clients. In particular, a movie may be stored as movie files with different formats and definitions, but the contents are substantially the same. The similar movie files (sometimes called near-duplicate video files) are pre-downloaded in response to the different offline download requests, so the bandwidth of the network and the space of the cloud storage are used improperly and uneconomically. For example, there are many movie files of "Transformers: Dark of the Moon", involving formats of rmvb, mp4, mkv, 3gp, etc., stored in the QQDownload server and Thunder server (provided by two social networking companies in China). Also, the stored movie files may involve different video definitions. However, although so many similar movie files have been pre-downloaded, the client still cannot directly download any of the similar movie files if the hash values of the pre-downloaded files and the requested file are unmatched.

[0008] Since the space of the cloud storage is limited, it is impossible to keep storing pre-downloaded files without limitation. Therefore, several files are removed at intervals to release space for incoming files. Usually, the least requested files are selected and deleted first. This is an improper selection because it is more difficult and takes more time to pre-download the least requested files again due to the few file sources. After the deletion, when the file is requested again, it is possible that the file cannot be successfully pre-downloaded because of insufficient file sources.

[0009] In general, regardless of the difference in format, definition, codec type or audio volume, two video files that show similar video contents are called similar video files. A "similarity value" is defined to show the similarity between two video files. If the two video files are the same with regard to the video contents, the similarity value between the two video files is 1. Two video files are considered similar video files under several conditions, as described in the following paragraph.

[0010] (1) Video files having the same video contents and different definitions are considered similar video files. For example, a video file of the English movie "Transformers" at 1280*720 definition and another video file of the English movie "Transformers" at 1024*576 definition are considered similar video files. (2) Video files obtained by converting similar video files into different video types or versions for various media players (software) or platforms are considered similar video files. For example, files provided by cloud-based converters or individual converter software on personal computers are not changed in video contents. (3) Video files with different formats obtained from encryption or compression of similar video files are considered similar video files. Sometimes, the encryption or compression related information is shown in the filenames. For example, DVDRip means a compressed copy of a final released DVD; DVDscr means a copy of a release preview DVD with high quality through MPEG-4 compression, which is available prior to the DVDRip version; HDRip (HD-DVD Rip) means a compressed copy of a high definition DVD with excellent quality. Furthermore, a video file of the English movie "Transformers" in rmvb format and another video file of the English movie "Transformers" in AVI format are considered similar video files. Concretely, the video file of the English movie "Transformers" in rmvb format according to the RV40 standard and another video file of the English movie "Transformers" in AVI format according to the H.264 standard are considered similar video files.

[0011] There are many near-duplicate video files on the internet that have been modified in size, length or compression, or by including or excluding borders, banners and logos. Hence, many reproduced video files corresponding to substantially the same video contents are stored in the cloud storage of the offline download server. Since the conventional offline download system or method has to provide the client with the video file corresponding to exactly the same hash value as requested, the other reproduced video files having substantially the same video contents but unmatched hash values are left aside. The various versions of video files retained for possible future requests occupy more and more space of the cloud storage.

SUMMARY OF THE INVENTION

[0012] The present invention provides an offline download method used with an offline download system. First, the offline download system receives an offline download request for a requested file from the client. If the requested file is not found in a cloud storage of the offline download system, at least one substitute file is selected from the files stored in the cloud storage. The substitute file is determined by the similarity between the requested file and the substitute file. The similarity is calculated based on the filename or the file contents. Finally, the client can decide whether to download the substitute file from the cloud storage without waiting for the pre-download process of the requested file.

[0013] The present invention further provides an offline download system for pre-downloading a requested video file from internet in response to an offline download request from a client. The offline download system includes a cloud storage, a similarity-calculating device, a video matching database, and a task manager. The similarity-calculating device calculates similarity values between any two video files stored in the cloud storage. The video matching database groups video files with the similarity values greater than a threshold value and sorts the grouped video files according to the similarity values in descending order. The task manager receives the offline download request, and suggests at least one substitute file selected from the video files stored in the cloud storage if the requested video file is not stored in the cloud storage. The substitute file is determined to be similar to the requested video file according to the similarity value between the substitute file and the requested video file.

[0014] The present invention further provides an offline download system for pre-downloading a requested file from internet in response to an offline download request from a client. The offline download system includes a cloud storage, a task manager, a cluster database and a cluster-analyzing device. The task manager receives the offline download request from the client, and extracts a filename of the requested file from the offline download request. The cluster-analyzing device searches the extracted filename in the cluster database. If the extracted filename is not recorded in the cluster database, the task manager suggests at least one substitute file whose filename is similar to the extracted filename from the cloud storage.
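The grouping and sorting attributed to the video matching database can be sketched as below. This is only an illustration under assumptions the text does not state: pairwise similarity values are taken as given, files are grouped transitively (union-find) whenever a pair exceeds the threshold, and each file is ranked by its highest similarity to another file in the group. The function name `group_similar` is hypothetical.

```python
from collections import defaultdict


def group_similar(pairs, threshold):
    """Group files whose pairwise similarity exceeds `threshold`.

    pairs: dict mapping (file_a, file_b) -> similarity value in [0, 1]
    Returns a list of groups; each group lists its files in descending
    order of their best similarity value (an assumed ranking rule).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    best = defaultdict(float)  # highest similarity seen for each file
    for (a, b), similarity in pairs.items():
        if similarity > threshold:
            parent[find(a)] = find(b)  # merge the two groups
            best[a] = max(best[a], similarity)
            best[b] = max(best[b], similarity)

    groups = defaultdict(list)
    for name in parent:
        groups[find(name)].append(name)
    # Ties are broken by filename so the order is deterministic.
    return [sorted(g, key=lambda f: (-best[f], f)) for g in groups.values()]
```

Files that never exceed the threshold with any other file do not appear in any group, which matches the idea that only near-duplicates are archived together.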

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] The present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, in which:

[0016] FIG. 1 is a schematic diagram illustrating an embodiment of an offline download system according to the present invention;

[0017] FIGs. 2A&B are flow diagrams showing steps of an offline download method according to the present invention;

[0018] FIG. 3 is a schematic diagram illustrating another embodiment of an offline download system according to the present invention;

[0019] FIG. 4 is a flow diagram showing steps of matching-relative portion of the offline download method according to the present invention; and

[0020] FIG. 5 is a schematic diagram illustrating a matching-relative portion of the offline download system according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0021] The present invention will now be described more specifically with reference to the following embodiments. It is to be noted that the following descriptions of preferred embodiments of this invention are presented herein for purpose of illustration and description only. It is not intended to be exhaustive or to be limited to the precise form disclosed.

[0022] Please refer to FIG. 1, a schematic diagram illustrating an embodiment of an offline download system according to the present invention. The offline download system 1 includes a client 11, an offline download server group 12 having at least one offline download server, a cloud storage 13, a task manager 14, a cluster-analyzing device 15 and a cluster database 16 recording therein filenames and associated download links to the cloud storage 13. The task manager 14 receives an offline download request from the client 11, wherein the offline download request indicates the requested filename. The requested filename is extracted from the offline download request and sent to the cluster-analyzing device 15. Then, the cluster-analyzing device 15 searches the requested filename in the cluster database 16. If there is at least one similar filename recorded in the cluster database 16, the cluster-analyzing device 15 informs the task manager 14 of the similar filename(s) and the associated download link(s) to the cloud storage 13. The similar filename(s) and the associated download link(s) are further provided to the client for choice. Thus, the client may select one of the associated download link(s) and download the corresponding file similar to the requested file.

[0023] In the above-described offline download system, the task manager 14 does not direct a server of the offline download server group 12 to start the pre-download process immediately after receiving the offline download request from the client 11. Instead, the requested filename is sent to the cluster-analyzing device 15 to check whether a similar file has been pre-downloaded and stored in the cloud storage 13. This step is performed by comparing the requested filename with the filenames recorded in the cluster database 16. If any similar filename is found, the information including the similar filename(s) and the associated download link(s) to the cloud storage 13 is sent to the client 11 through the task manager 14 so that the client 11 can decide whether to download one of the similar file(s) via the associated download link(s). By means of this offline download system, the pre-download load of the offline download server is expected to be reduced. A requested file which is similar to a file stored in the cloud storage 13 does not have to be pre-downloaded repeatedly, and the cloud storage 13 no longer stores numerous similar files.

[0024] The offline download method performed by the offline download system of FIG. 1 is shown in the flow chart of FIGs. 2A&B. In step 201, the task manager 14 receives an offline download request from the client 11, and extracts the requested filename from the offline download request. The requested filename is sent to the cluster-analyzing device 15 for analysis. In step 202, the cluster-analyzing device 15 checks whether any similar filename is recorded in the cluster database 16. If any similar filename is found in the cluster database 16, the cluster-analyzing device 15 sends first information to the task manager 14. For example, the first information may include the similar filename(s) recorded in the cluster database 16 and the associated download link(s) to the cloud storage 13. In addition, the first information may include other properties of the similar file(s) found which help the client 11 to recognize the similar file(s). Optionally, the cluster-analyzing device 15 puts the requested filename into the cluster recording the similar filename(s) in the cluster database 16. In step 203, the task manager 14 sends the recorded similar filename(s) and the associated download link(s) included in the first information to the client 11. The client 11 may then select one of the associated download link(s) to download one of the similar file(s) from the cloud storage 13. If no similar filename is found in the cluster database 16, the cluster-analyzing device 15 sends second information to the task manager 14 in step 205. Then, a new cluster is created in the cluster database 16 and the requested filename is recorded in the new cluster by the cluster-analyzing device 15. Upon receiving the second information, the task manager 14 selects one offline download server among the offline download server group 12 according to a predefined load balancing principle to start the pre-download process for the requested file. The pre-downloaded file will be stored in the cloud storage 13, and the download link indicating the location where the pre-downloaded file is stored is recorded in the cluster database 16 and associated with the requested filename. The download link will be provided to the client 11 after the pre-download process so that the client 11 can access the pre-downloaded file in the cloud storage 13 through the download link.
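The flow of steps 201 to 205 can be sketched roughly as follows. This is only an illustrative sketch: the `ClusterDatabase` class, the `handle_request` function, and the `cluster_key` and `pre_download` callables are hypothetical names that do not appear in the specification, and the cluster database is modeled as a plain in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterDatabase:
    """Hypothetical in-memory stand-in for the cluster database 16."""
    # cluster key -> {filename: download link, or None while no link is known}
    clusters: dict = field(default_factory=dict)


def handle_request(filename, db, cluster_key, pre_download):
    """Suggest similar files if a cluster exists; otherwise pre-download."""
    key = cluster_key(filename)                  # step 202: classify the filename
    cluster = db.clusters.get(key)
    if cluster:
        # Similar filename(s) found: build the "first information".
        first_info = {name: link for name, link in cluster.items() if link}
        cluster.setdefault(filename, None)       # optionally record the request
        return {"similar": first_info}           # step 203: links go to the client
    # Step 205: no similar filename, so create a new cluster and pre-download.
    link = pre_download(filename)
    db.clusters[key] = {filename: link}
    return {"download_link": link}
```

In this sketch the cluster analysis itself is delegated to `cluster_key`, so any of the clustering approaches mentioned in the description could be plugged in.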

[0025] According to the offline download method, by performing cluster analysis on the requested filename prior to pre-downloading the requested file, the pre-downloading of similar files is reduced or even avoided. This results in fewer similar files stored in the cloud storage 13. Under the best condition, only one copy of the video file associated with a specific cluster is stored in the cloud storage 13. In other words, for the specific cluster, no similar file is pre-downloaded, so that the load of the offline download servers and the used space of the cloud storage 13 are significantly reduced.

[0026] Please refer to FIG. 3, a schematic diagram illustrating another embodiment of an offline download system according to the present invention. Compared with the offline download system 1, the offline download system 3 further includes a task organizer 37, a statistical device 38, a task database 39 and a record-eliminating device 40 in this embodiment. The function and communication of each element are described in detail in the following paragraphs.

Element-Client

[0027] The client 31 is in communication with the task organizer 37, the cloud storage 33, and the statistical device 38. It is to be noted that the client 31 hereinafter means any device which can send a request to another device, for example, a personal computer, a computer workstation or an electronic apparatus running client software. In this embodiment, the client 31 sends an offline download request to the task organizer 37, wherein the offline download request may include an original file source, for example, a URL link, an HTTP request link, an eMule identifier, a torrent file, or a Magnet link indicating the requested file. In addition, the client 31 can log in to a portal of the task organizer 37 to inquire about the pre-download progress of the requested file. After the pre-download process is completed and the requested file is stored in the cloud storage 33, the client 31 is allowed to download the requested file from the cloud storage 33 by peer to server and peer (P2SP) technology, which combines peer to peer (P2P) technology and peer to server (P2S) technology. The client 31 may learn of the completion of the pre-download process through a notification sent by the task organizer 37 or by actively inquiring about the pre-download progress through the task organizer 37. In an embodiment, the client 31 sends an error report to the statistical device 38 if any error is found in the received data segments of the file downloaded from the cloud storage 33. Optionally, the client 31 may send a download report including the download period, download speed, download result, file size, file category, original file source or other download information to the statistical device 38. As described above, the original file source is also included in the offline download request.

Element-Task organizer

[0028] The task organizer 37 is in communication with the client 31, the task database 39 and the task manager 34. The task organizer 37 receives the offline download request from the client 31 and generates a unique serial number associated with the offline download request. The original file source is sent to the task database 39 to be recorded. In addition, a recognition code corresponding to the requested file, for example, a hash value, is also generated. Although the hash value is used in the embodiment for illustration, it is to be noted that other recognition codes are applicable to the present invention. If the offline download request includes a URL link, the hash value is obtained from a hash function of the URL link. If the offline download request includes an eMule identifier, the hash value may be an MD4 or MD5 hash of the eMule identifier. If the file is downloaded via BitTorrent service, the hash value is obtained by a hash function of the characteristic code and serial number of the torrent file.

[0029] If the hash value corresponding to the requested file has been recorded in the task database 39 and the associated status is labeled pre-download completion, it means that the requested file has been pre-downloaded and stored in the cloud storage 33. Then, the task organizer 37 informs the client 31 that the requested file is ready for download in the cloud storage 33. In another situation, if the hash value corresponding to the requested file has been recorded in the task database 39 and the associated status is not labeled pre-download completion, it means that the pre-download process of the requested file is now in progress. The task organizer 37 will inform the client 31 after the completion of the pre-download process. In a different situation, the hash value corresponding to the requested file is not recorded in the task database 39, and it means that the requested file has not ever been pre-downloaded. Then, the hash value is recorded in the task database 39 and associated with the original file source, and the offline download request is sent to the task manager 34. During the pre-download process, the progress provided by the task manager 34 can be shown on the task organizer 37 to respond to the client's inquiry.
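The three situations handled by the task organizer can be illustrated with a small sketch. The names `recognition_code` and `organize`, the status strings, and the dictionary used as the task database are assumptions made for illustration; MD5 over the original file source is used only as one possible hash function.

```python
import hashlib

COMPLETED = "pre-download completion"
IN_PROGRESS = "in progress"   # hypothetical label for an unfinished pre-download


def recognition_code(source: str) -> str:
    """Hash of the original file source (MD5 chosen only for illustration)."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


def organize(source: str, task_db: dict) -> str:
    """Return the action the task organizer takes for one request."""
    code = recognition_code(source)
    record = task_db.get(code)
    if record is None:
        # Never pre-downloaded: record the hash with its source and forward.
        task_db[code] = {"source": source, "status": IN_PROGRESS}
        return "forward to task manager"
    if record["status"] == COMPLETED:
        # Already pre-downloaded and stored in the cloud storage.
        return "file ready in cloud storage"
    # Recorded but not completed: the pre-download is still running.
    return "notify client on completion"
```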

Element-Task manager

[0030] The task manager 34 is in communication with the task organizer 37, the offline download server group 32 and the cluster-analyzing device 35. The task manager 34 periodically receives the loading condition of each offline download server from the offline download server group 32. In addition, the task manager 34 can filter out repeated offline download requests to make sure that the offline download server group 32 does not receive an offline download request more than once. In other words, when different clients make the same offline download request, only one offline download request is received by the offline download server group 32 and the different clients will receive the same pre-download progress when inquiring. In detail, the task manager 34 determines whether the current offline download request is identical to any previous offline download request. If yes, the information associated with the previous offline download request, e.g. the pre-download progress, is sent to the current client as well as to the previous client. Therefore, after the pre-download process of the requested file completes, each of the clients is informed and directed to download the same requested file from the cloud storage 33.

[0031] On the contrary, if the offline download request is received for the first time, the task manager 34 extracts the filename from the offline download request and sends the filename to the cluster-analyzing device 35. When the task manager 34 receives the first information (indicating that a similar file has been pre-downloaded before the offline download request) from the cluster-analyzing device 35, the task manager 34 does not send the offline download request to the offline download server group 32, but provides the similar filename(s) recorded in the cluster database 36 and the associated download link(s) included in the first information to the client 31. Therefore, the client 31 selects one of the associated download link(s) to download one of the similar file(s), also called substitute files, from the cloud storage 33. On the other hand, when the task manager 34 receives the second information (indicating that there is no similar filename recorded in the cluster database 36) from the cluster-analyzing device 35, the task manager 34 selects one offline download server among the offline download server group 32 according to the received loading report and a predefined load balancing principle to start the pre-download process for the requested file. In response, the selected offline download server starts to pre-download and store the requested file in the cloud storage 33. During the pre-download process, the task manager 34 receives the pre-download progress, pre-download speed and other pre-download information from the offline download server group 32 and synchronously updates the pre-download information in the task organizer 37.

Element-Cluster-analyzing device

[0032] The cluster-analyzing device 35 is in communication with the task manager 34, the offline download server group 32 and the cluster database 36. The cluster-analyzing device 35 receives the extracted filename from the task manager 34 and analyzes whether any filename recorded in the cluster database 36 is similar to the extracted filename.

[0033] The analyzing step is performed based on cluster analysis. At first, meaningless symbols and words in the extracted filename that are irrelevant to the file material are ignored. In general, a filename follows a simple naming rule, e.g. a movie filename consisting of movie name + publisher + version + resolution + video coding + audio coding + format, or a TV series filename consisting of series name + season number + episode number + episode title + image source + video coding (default is MP3) + audio coding + release group. Sometimes, the season number and episode number are not defined in Chinese TV series. Some filenames also contain inserted words or symbols that are irrelevant to the video material. The words may include premiere, exclusive, hot, online, must, dubbing, TV, TV series and the like; the symbols may include []() ( ) [ ] &, etc. These irrelevant words or symbols should be deleted or dismissed. Afterwards, the extracted filename is cut into sections, each of which is a separate phrase. By cluster analysis based on the separate phrases, the extracted filename is classified into a target cluster. In brief, a cluster is a group of data objects, and the data objects in the same cluster are more similar to each other than to those in other clusters. The cluster-analyzing device 35 compares the target cluster with the clusters in the cluster database 36. If the target cluster is found in the cluster database 36, it means that at least one filename in the matched cluster is similar to the extracted filename. In detail, the cluster analysis determines the similarity between the phrases of the requested filename and the phrases of filenames in each cluster recorded in the cluster database 36. After the cluster analysis, the cluster including the similar filename, if any, can be found. Known clustering algorithms are suitable for the present invention, e.g. the partitioning method, hierarchical method, density-based method, grid-based method and model-based method, and the present invention is not limited by the following description. The filenames in one cluster are mapped to the cluster.
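The cleaning and segmenting of a filename described in this paragraph might look like the sketch below. The stop list `IRRELEVANT_WORDS` contains only the single-word examples named in the text, and the function name `split_phrases` is hypothetical; treating every run of punctuation as a separator is an assumption, since the specification does not fix a segmentation rule.

```python
import re

# Hypothetical stop list built from the example words given in the text.
IRRELEVANT_WORDS = {"premiere", "exclusive", "hot", "online", "must", "dubbing", "tv"}


def split_phrases(filename: str) -> list:
    """Drop irrelevant symbols and words, then cut the name into phrases."""
    # Replace every run of non-word characters or underscores with one space,
    # which removes brackets, ampersands, dots and similar symbols.
    cleaned = re.sub(r"[\W_]+", " ", filename.lower())
    # Keep only the phrases that are not on the stop list.
    return [p for p in cleaned.split() if p not in IRRELEVANT_WORDS]
```

For example, `split_phrases("[HOT] Some.Movie (Exclusive) 720p.x264.mkv")` yields `['some', 'movie', '720p', 'x264', 'mkv']`. Multi-word entries such as "TV series" would need phrase-level matching, which this sketch does not attempt.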

[0034] Besides, the similarity between two filenames can be obtained by a cosine operation on the two filenames. For example, each of the two filenames is segmented and expressed as a vector: $F_a = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ and $F_b = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$. The similarity value is defined as:

$$\mathrm{sim}(F_a, F_b) = \cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

A larger similarity value represents greater similarity between the two filenames.
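A cosine operation on two segmented filenames can be sketched as follows. The function name `cosine_similarity` is hypothetical, and using phrase counts as the vector coefficients is an assumption, since the specification does not state how the coefficients are weighted.

```python
import math
from collections import Counter


def cosine_similarity(phrases_a, phrases_b) -> float:
    """Cosine of the angle between two phrase-count vectors."""
    a, b = Counter(phrases_a), Counter(phrases_b)
    # Only phrases present in both filenames contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0   # an empty filename is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

With count-based coefficients the value lies between 0 (no shared phrases) and 1 (identical phrase counts).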

[0035] If any similar filename is found in the cluster database 36, the cluster-analyzing device 35 sends the first information to the task manager 34 wherein the first information includes at least one similar filename and the associated download link. Which similar filename should be selected to be included in the first information is determined according to a predefined strategy. For example, the cluster-analyzing device 35 may randomly select several similar filenames in the designated cluster; may select the similar filenames which are requested within a predetermined period, e.g. 10 minutes; or may select several filenames corresponding to the latest requests in the designated cluster. Afterwards, the requested filename is also recorded in the designated cluster.

[0036] If no similar filename is found in the cluster database 36, the cluster-analyzing device 35 sends the second information to the task manager 34. Then, a new cluster is created in the cluster database 36 and the requested filename is recorded in the new cluster. After the pre-download process completes, the cluster-analyzing device 35 receives the download link indicating the location of the pre-downloaded file, and records the download link associated with the requested filename in the new cluster in the cluster database 36. The download link is provided to the client 31 after the pre-download process so that the client 31 can access the pre-downloaded file in the cloud storage 33 through the download link.

Element-Cluster database

[0037] The cluster database 36 is in communication with the cluster-analyzing device 35 and the record-eliminating device 40. The cluster database 36 records clusters of filenames. In addition, the cluster database 36 can provide filenames in the designated cluster and the associated download links in response to the inquiry from the cluster-analyzing device 35. Furthermore, the recorded cluster information is accessible to the record-eliminating device 40.

Element-Record eliminating device

[0038] The record-eliminating device 40 is in communication with the cluster database 36 and the offline download server group 32. The record-eliminating device 40 periodically accesses the cluster information and the associated download links from the cluster database 36. For similar files, it is assumed that only one file should be retained in the cloud storage 33 and the other files should be removed from the cloud storage 33 to release the space in the cloud storage 33. Thus, the download links recorded in the cluster database 36 and associated with the deleted files are marked to indicate that the download links for the deleted files should be directed to the download link for the retained file. Therefore, the download link included in the first information should be associated with the retained file rather than the deleted file.

Element-Offline download server group

[0039] The offline download server group 32 is in communication with the record-eliminating device 40, the cloud storage 33, the task database 39, the task manager 34 and the cluster-analyzing device 35. The offline download server group 32 includes at least one offline download server. It periodically reports the loading condition and the available space of each offline download server to the task manager 34 so that the task manager 34 can designate a proper offline download server for each pre-download process according to the report. In response to the command from the task manager 34, the designated offline download server continuously pre-downloads the requested file, and the offline download server group 32 regularly sends the pre-download progress, pre-download speed and other pre-download information to the task manager 34. After the pre-download process completes, the pre-downloaded file is stored in the cloud storage 33 to be downloaded by the client 31 later. In the task database 39, the associated status of the recognition code such as the hash value corresponding to the pre-downloaded file is marked pre-download completion in response to the notice from the offline download server group 32. The associated download link to the cloud storage is sent to the cluster-analyzing device 35.
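How the task manager might designate a server from the periodic reports can be sketched as below. The specification leaves the balancing principle open, so the rule used here (lowest load among servers with enough free space) is purely an assumption, as are the function name `pick_server` and the report fields `name`, `load` and `free_space`.

```python
def pick_server(reports, required_space):
    """Choose the least-loaded server that has enough available space.

    reports: list of dicts with hypothetical keys "name", "load", "free_space".
    Returns the chosen server name, or None if no server qualifies.
    """
    eligible = [r for r in reports if r["free_space"] >= required_space]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["load"])["name"]
```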

Element-Statistical device

[0040] The statistical device 38 is in communication with the client 31. The statistical device 38 receives the error report from the client 31 when any error is found in the received data segments involved in the downloaded file from the cloud storage 33. The statistical device 38 further receives feedback from the client 31 such as the download reports including the download period, download speed, download result, file size, file category, original file source or other download information. Basically the statistical device 38 generates log files according to the download reports for further statistical analysis.

Element-Cloud storage

[0041] The cloud storage 33 is in communication with the client 31 and the offline download server group 32. The cloud storage 33 mainly stores the pre-downloaded files from the offline download server group 32. The client 31 can download the files from the cloud storage 33 at high speed because certain bandwidth is reserved for the client 31.

[0042] The function and communication of each element of the offline download system as shown in FIG. 3 have been described in detail. According to the present invention, the task manager 34 does not command the offline download server group 32 to start a pre-download process immediately after receiving an offline download request. Instead, the task manager 34 extracts the filename from the offline download request and sends the requested filename to the cluster-analyzing device 35. The cluster-analyzing device 35 checks whether any similar filename has been recorded in the cluster database 36. If yes, the cluster-analyzing device 35 sends the first information to the task manager 34, wherein the first information includes the similar filename(s) and the associated download link(s) to the cloud storage 33. Then, the task manager 34 provides the similar filename(s) and the associated download link(s) to the client 31, so that the client 31 may select one of the associated download links to download a similar file, i.e. a substitute file, from the cloud storage 33. Hence, the load of the offline download server group 32 and the bandwidth it consumes are significantly reduced because the pre-download processes of similar files are largely avoided. Another advantage is that the cloud storage 33 no longer stores a huge number of similar files with the same contents, resulting in optimization of storage space. Moreover, if the suggested similar file is acceptable to the client 31, the client 31 can download a ready file from the cloud storage 33 without waiting long for the pre-download process of exactly the requested file. The proposed offline download method and system thus achieve higher efficiency.

[0043] In the above-described embodiments, the similar files are determined according to the filenames. For video files, the similar files can be determined according to the video contents of the video files, and the related offline download methods are described as follows.

[0044] Please refer to FIG. 4, a flow diagram showing steps of a matching-related portion of the offline download method according to the present invention. In step 401, the similarity value between two video files is calculated based on video matching. Since the video contents of video files are too complicated to analyze or classify directly, it is necessary to find a representative characteristic of the video files for analysis. For a video file, the video contents consist of video frames. At least one video frame may be extracted from the video file and considered as the representative characteristic to be analyzed. In detail, the video contents consist of many scenes, each of which is constituted of many video frames. Each scene may be characterized by a few video frames. Thus, these video frames are selected to represent the scene. In principle, similar video files should have the same representative video frames. Therefore, the essential issue is how to select the representative video frames.

[0045] This example illustrates the steps of calculating the similarity value between two video files A and B. At first, a video frame at time point A is extracted from a scene A of the video file A and decoded by a compliant video decoder to generate a representative decoded frame A'. Then, video frames during a period (e.g. from time point A-d to time point A+d; d is about 1 minute) are extracted from a scene B of the video file B and decoded to generate several decoded frames B1~Bn. If the representative decoded frame A' matches one of the decoded frames B1~Bn, it is assumed that the two scenes A and B are similar. Then, another video frame at a different time point is further extracted from the video file A and the comparison step is repeated. If the video files A and B have more similar scenes than the video files A and C, the similarity value between the video files A and B is greater than that between the video files A and C. When two video files are the same, the similarity value is defined as 1.

[0046] In practice, hash algorithm may be used to index the representative frames and compute the frame match sequence between the two video files. Then, random sample consensus (RANSAC) method is used to fit the frame match sequence to obtain the matching values of the frame match sequence between the two video files. At last, the similarity value between the two video files is calculated according to the matching values.

[0047] Concretely, the feature of the scene can be represented by the change on average brightness between video frames. When the current scene is switched to the next scene, it is believed that the average brightness will change obviously. Therefore, after determining the sampling period, we can calculate the average brightness of each video frame during the sampling period and obtain an average brightness curve. Thus, the similarity value between the two video files can be calculated by comparing the average brightness curves of the two video files.

[0048] Furthermore, since the video frames may be affected by noise signals, the noise-affected video frames or their influence should be removed to get the valid match sequence before calculating the similarity value. The de-noise step increases the precision of the calculated similarity value. Concretely, a Gaussian filter or another proper filter can be applied to the average brightness curve to remove the meaningless extreme values and smooth the average brightness curve. Therefore, the similarity value of two video files can be calculated by comparing the two smoothed average brightness curves derived from the two video files.
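The smoothing and comparison of average brightness curves can be illustrated with a short sketch. The specification names a Gaussian filter but does not say how two curves are compared, so the Pearson correlation clipped to the range 0 to 1 used in `curve_similarity` is an assumption; the function names, `sigma` and `radius` are likewise hypothetical, and both curves are assumed to be sampled at the same time points.

```python
import math


def gaussian_smooth(curve, sigma=1.0, radius=2):
    """Smooth a brightness curve with a truncated, renormalized Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(curve)):
        total = weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(curve):      # skip taps that fall off either end
                total += w * curve[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed


def curve_similarity(curve_a, curve_b):
    """Pearson correlation of two equal-length curves, clipped to [0, 1]."""
    n = len(curve_a)
    mean_a, mean_b = sum(curve_a) / n, sum(curve_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(curve_a, curve_b))
    var_a = sum((x - mean_a) ** 2 for x in curve_a)
    var_b = sum((y - mean_b) ** 2 for y in curve_b)
    if var_a == 0 or var_b == 0:
        return 0.0                       # a flat curve carries no scene changes
    return max(0.0, cov / math.sqrt(var_a * var_b))
```

A typical use would be `curve_similarity(gaussian_smooth(curve_a), gaussian_smooth(curve_b))`, where each curve holds the average brightness of the frames in the sampling period.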

[0049] If the video file to be compared has been modified or edited by the publisher to change the frame rate, e.g. by a fast forward or slow motion effect, or the two video files have different frame rates, a frame rate-adjusting preprocess is required to synchronize the two video files before calculating the similarity value between the two video files. Otherwise, the calculation based on the asynchronous frames will reduce the precision of the similarity value.

[0050] In the embodiment, the video files to be compared are downloaded from the cloud storage of the offline download system and the similarity values therebetween are calculated by a similarity-calculating device. Since the filename can be extracted from the offline download request, to calculate the similarity values between the requested video file and other video files downloaded from the cloud storage, it is suggested to select the video file having a filename similar to the requested video file as the first priority to be compared.

[0051] After the calculation of the similarity values, the video files are classified into groups according to the similarity values. In step 402, if the similarity value between any two video files is greater than a first threshold value, it means that the two video files are considered as similar video files and they are grouped together. The video files are grouped together by recording the recognition codes such as hash values of the two video files in the same archive in a video matching database. In other words, similar video files are mapped to the hash values recorded in the same archive. In the embodiment, the similarity values are also recorded in the video matching database.
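The grouping of step 402 can be sketched as follows. The function name `group_by_similarity` and the input format are hypothetical. Merging pairs transitively with a union-find structure is also an assumption: the specification only says that two files whose similarity exceeds the first threshold are recorded in the same archive.

```python
def group_by_similarity(similarities, threshold):
    """Group hashes whose pairwise similarity exceeds the threshold.

    similarities: {(hash_a, hash_b): similarity value}
    Returns a list of archives, each a set of at least two hashes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), value in similarities.items():
        root_a, root_b = find(a), find(b)    # registers both hashes
        if value > threshold:
            parent[root_a] = root_b          # record both in the same archive

    archives = {}
    for h in list(parent):
        archives.setdefault(find(h), set()).add(h)
    return [members for members in archives.values() if len(members) > 1]
```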

[0052] In step 403, the grouped video files are sorted by sorting the recognition codes according to the similarity values in descending order or ascending order. From the sorted records, the similarity between video files can be easily judged.

[0053] In order to release space in the cloud storage for newly arriving files, the cloud storage should be regularly cleaned up to remove out-of-date files. In the embodiment, deleting the near-duplicate files, but not hard-to-get files, can increase the overall efficiency of the offline download system. In step 404, for the video files corresponding to the hash values recorded in a specific archive, a video file with a higher similarity value relative to another video file can be deleted first. Therefore, if the hash values in the archive are sorted according to the similarity values in descending order, the video file mapped to the hash value arranged at the head can be deleted first. For example, relative to a specific video file, the similarity values of the first, the second and the third video files are 0.8, 0.7 and 1, respectively. The three hash values corresponding to the three video files are recorded in the same archive and rearranged in the order of the third, the first and the second hash values. Therefore, the third video file mapped to the third hash value is selected and deleted first, if necessary. The next choice is the first video file, and the last choice is the second video file. It is to be noted that the sequence is not absolute. In an alternative embodiment, the acceptable similarity value may be predefined, and the video files with similarity values greater than the acceptable similarity value are randomly selected for deletion. Although the video files are removed from the cloud storage, the hash values and associated similarity values are still recorded in the video matching database.
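The ordering in this example, with similarity values 0.8, 0.7 and 1 for the first, second and third video files, can be reproduced with a short sketch. The function names and the dictionary keys are hypothetical labels standing in for the recorded hash values.

```python
def deletion_order(similarity_by_hash):
    """Sort recognition codes by similarity value, highest first."""
    return sorted(similarity_by_hash, key=similarity_by_hash.get, reverse=True)


def deletion_candidates(similarity_by_hash, acceptable):
    """Alternative embodiment: every file above the acceptable value is a candidate."""
    return {h for h, value in similarity_by_hash.items() if value > acceptable}


archive = {"first": 0.8, "second": 0.7, "third": 1.0}
print(deletion_order(archive))             # ['third', 'first', 'second']
```

In the alternative embodiment, a file would then be picked at random from `deletion_candidates(archive, acceptable)`.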

[0054] In step 405, when the requested file is not found in the cloud storage, the task manager may check the video matching database to determine whether the requested file was once pre-downloaded but deleted later. If yes, the task manager can find the recognition code such as the hash value corresponding to the requested file in the video matching database. Thus, the similar video files mapped to the hash values recorded in the same archive may be suggested as the substitute video files to the client. In a preferred embodiment, only the video files whose similarity values relative to the requested file are greater than a second threshold value are shown in the recommended list, so that the suggested video files are highly similar to the requested file. Since the corresponding hash values are recorded in the same archive, the second threshold value is greater than or equal to the first threshold value. Then, the client can select one of the suggested video files and download it from the cloud storage at high speed without waiting for another pre-download process of the requested file. By this method, the load of the offline download server group is significantly reduced because the pre-download processes of similar files are largely avoided. In addition, it is not necessary for the cloud storage to store a huge number of similar video files or near-duplicate video files. Moreover, the client can download a ready file from the cloud storage without waiting long for the pre-download process of exactly the requested file, and the efficiency is significantly raised.
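The construction of the recommended list in step 405 can be sketched as follows. All names are hypothetical: `archive` stands for the recognition codes recorded together in the video matching database, `similarity` maps an unordered pair of codes to its recorded similarity value, and `in_storage` is the set of codes whose files are still present in the cloud storage.

```python
def recommend_substitutes(requested_hash, archive, similarity, in_storage, second_threshold):
    """List substitutes from the requested file's archive, most similar first."""
    candidates = []
    for h in archive:
        # Skip the requested file itself and files already deleted from storage.
        if h == requested_hash or h not in in_storage:
            continue
        value = similarity.get(frozenset((requested_hash, h)), 0.0)
        if value > second_threshold:
            candidates.append((value, h))
    return [h for value, h in sorted(candidates, reverse=True)]
```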

[0055] It is to be noted that the steps 404 and 405 are independent of each other. That is, the execution sequence of steps 404 and 405 is not limited by the above-described embodiments. Even synchronous execution of these steps is allowed.

[0056] Please refer to FIG. 5, a schematic diagram illustrating a matching-related portion of the offline download system for performing the above-described offline download method. The offline download system basically includes, but is not limited to, the cloud storage 51, the similarity-calculating device 52, the video matching database 53, the task manager 54, and a file-eliminating device 55.

[0057] The similarity-calculating device 52 is in communication with the cloud storage 51 and the video matching database 53. The similarity-calculating device 52 continuously downloads video files to be compared from the cloud storage 51, and calculates similarity values between any two video files. The similarity-calculating device 52 further includes a frame match sequence-computing unit 531, a matching value-calculating unit 532 and a similarity value-calculating unit 533. The frame match sequence-computing unit 531 uses a hash algorithm to index the representative frames and computes the frame match sequence between the two video files. The matching value-calculating unit 532 uses the random sample consensus (RANSAC) method to fit the frame match sequence to obtain the matching values of the frame match sequence between the two video files. The similarity value-calculating unit 533 calculates the similarity value according to the matching values.

[0058] Optionally, the similarity-calculating device 52 further includes a valid frame match sequence-computing unit 534 for removing the noise-affected frames of the video files to obtain the valid frame match sequence between the two video files. Further, the similarity-calculating device 52 may include a frame rate-adjusting unit 535 for adjusting the frame rate of the video frames to synchronize the two video files.

[0059] The video matching database 53 groups the video files with similarity values greater than the first threshold value, and records the recognition codes such as hash values of the grouped video files in the same archive. The recognition codes recorded in the same archive are sorted according to the similarity values between video files. The above-mentioned actions are performed by a recognition code-recording unit 531 and a sorting unit 532 included in the video matching database 53, respectively.

[0060] When the requested file is not found in the cloud storage 51, the task manager 54 searches for the recognition code corresponding to the requested file in the video matching database 53. If the recognition code is found, it means that the requested file has been removed from the cloud storage 51. The task manager 54 provides the client with a recommended list showing the similar video files mapped to the recognition codes recorded in the same archive. In a preferred embodiment, only the video files whose similarity values relative to the requested file are greater than the second threshold value are shown in the recommended list because these video files are highly similar to the requested file. The second threshold value is greater than or equal to the first threshold value. Then, the client can select one of the suggested video files and download it from the cloud storage 51 at high speed without waiting for another pre-download process of the requested file.

[0061] The file-eliminating device 55 is used for deleting near-duplicate video files in the cloud storage 51. The video files to be deleted are determined according to the sorted recognition codes recorded in the archive. If the recognition codes are sorted according to the similarity values in descending order, the video file mapped to the recognition code arranged at the head of the archive is deleted first. In an alternative embodiment, the acceptable similarity value may be predefined, and the video files with similarity values greater than the acceptable similarity value are randomly selected for deletion.

[0062] It is to be noted that in the above-described embodiments, all units, elements and devices are described and named based on their functionality. Other units, elements or devices capable of performing at least one of the steps described above are included in the modifications of the embodiments and encompassed in the invention. In addition, the steps of the methods according to the present invention can be implemented by hardware or software. The scope of the present invention is not limited thereto.

[0063] For those having ordinary skill in the art, it is understood that all or part of the steps in the various embodiments described above can be executed by a program instructing relevant hardware, and the corresponding program may be stored in a computer readable storage medium. The so-called storage medium can be, for example, a ROM / RAM, disk or optical disk, etc.

[0064] While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures.