Title:
METHOD AND DEVICE FOR PACKET ACOUSTIC ECHO CANCELLATION
Document Type and Number:
WIPO Patent Application WO/2015/036858
Kind Code:
A1
Abstract:
An object of the invention is to provide a method and device for packet acoustic echo cancellation. The echo cancelling device acquires source voice packet streams between two calling ends, determines transmission direction information corresponding to each packet, updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, and cancels echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, so as to obtain echo-cancelled packet streams and send them to the corresponding end of the two calling ends. Compared with the prior art, the present invention realizes bidirectional packet acoustic echo cancellation, improves the performance of the PAEC channel several times over, reduces the amount of hardware and the corresponding maintenance costs, and reduces call processing and related signaling overheads; further, it needs no signaling support and provides a transparent PAEC function; moreover, the present invention may use software to implement the signal buffer manager, which improves the flexibility of system processing and enhances the efficiency of the system.

Inventors:
LI ZHOUZHOU (CN)
CAI YIGANG (US)
Application Number:
PCT/IB2014/002005
Publication Date:
March 19, 2015
Filing Date:
September 08, 2014
Assignee:
ALCATEL LUCENT (FR)
International Classes:
H04M9/08
Foreign References:
US20090168673A1 2009-07-02
US20130155924A1 2013-06-20
US7333447B2 2008-02-19
US7852792B2 2010-12-14
US8144862B2 2012-03-27
Attorney, Agent or Firm:
THERIAS, Philippe (148/152 Route de la Reine, Boulogne-billancourt, FR)
Claims:
We claim:

1. A method for packet acoustic echo cancellation, wherein the method comprises the following steps:

a. acquiring source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets;

b. determining transmission direction information corresponding to each packet in the source voice packet streams;

c. updating target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information;

d. cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams;

e. sending the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

2. The method according to claim 1, wherein the step b comprises:

- determining transmission direction information corresponding to the packets based on source address information and target address information included in each packet in the source voice packet streams.

3. The method according to claim 1 or 2, wherein the step d comprises:

- cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information along with the packets corresponding to the transmission direction information in the target packet streams, to obtain echo-cancelled packet streams corresponding to the target packet streams.

4. The method according to claim 1 or 2, wherein the method further comprises:

- establishing or updating reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams;

wherein the step d comprises:

d1. determining whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams;

d2. cancelling echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

5. The method according to claim 4, wherein the step d1 comprises:

- determining, based on a cyclic sliding window, whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

6. The method according to claim 4, wherein the step d1 comprises:

- determining whether the target packet streams include any echo packet based on the reference packet streams along with transmission direction information corresponding to each packet in the target packet streams and the reference packet streams and energy level information corresponding to a corresponding plurality of successive packets in the target packet streams and the reference packet streams.

7. The method according to any of claims 4-6, wherein the step d2 comprises:

- cancelling echo of the target packet streams using a replacement packet when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

8. An echo cancelling device for packet acoustic echo cancellation, wherein the device comprises:

acquiring apparatus for acquiring source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets;

direction determining apparatus for determining transmission direction information corresponding to each packet in the source voice packet streams;

target updating apparatus for updating target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information;

cancelling apparatus for cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams;

sending apparatus for sending the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

9. The echo cancelling device of claim 8, wherein direction determining apparatus is configured to:

- determine transmission direction information corresponding to the packets based on source address information and target address information included in each packet in the source voice packet streams.

10. The echo cancelling device of claim 8 or 9, wherein cancelling apparatus is configured to:

- cancel echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information along with the packets corresponding to the transmission direction information in the target packet streams, to obtain echo-cancelled packet streams corresponding to the target packet streams.

11. The echo cancelling device of claim 8 or 9, wherein the device further comprises: reference updating apparatus for establishing or updating reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams;

wherein the cancelling apparatus comprises:

echo determining unit for determining whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams;

echo cancelling unit for cancelling echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

12. The echo cancelling device of claim 11, wherein the echo determining unit is configured to:

- determine, based on a cyclic sliding window, whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

13. The echo cancelling device of claim 11, wherein the echo determining unit is configured to:

- determine whether the target packet streams include any echo packet based on the reference packet streams along with transmission direction information corresponding to each packet in the target packet streams and the reference packet streams and energy level information corresponding to a corresponding plurality of successive packets in the target packet streams and the reference packet streams.

14. The echo cancelling device of any of claims 11-13, wherein the echo cancelling unit is configured to:

- cancel echo of the target packet streams using a replacement packet when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

Description:
METHOD AND DEVICE FOR PACKET ACOUSTIC ECHO CANCELLATION

FIELD OF THE INVENTION

[0001] The present invention relates to the field of communication, and in particular to the technology of packet acoustic echo cancellation.

BACKGROUND OF THE INVENTION

[0002] Acoustic echo in mobile networks is caused by badly designed handsets and hands-free equipment in which sound from the receiving party's speaker is fed into that party's microphone (and then back to the voice sender). Acoustic echo cancellation (AEC) removes the echo present in a communication signal. AEC is a key capability for guaranteeing voice quality in telecommunications.

[0003] Traditional AEC technology in circuit-switched (CS) networks is well developed; there, acoustic echo is removed in the waveform domain. However, there was no recognized way of performing AEC in packet networks, such as Voice over IP networks. Some vendors (e.g., Broadcom (refer to US7333447), Samsung, 3Com, etc.) invented packet AEC for packet networks, but they had to first decode the packet stream to an analog or digital signal (i.e., in the waveform domain), remove echo from the signal, and then re-encode the echo-cancelled signal back into packets. This degrades voice quality (VQ) because of the multiple encoding/decoding steps, thus negating the benefit of TrFO (Transcoder Free Operation) in eliminating multiple encoding and decoding. In addition, traditional AEC techniques are inefficient for VoIP networks, as they support only a limited tail-length delay due to computational complexity and huge buffering requirements.

[0004] Alcatel-Lucent/Bell Labs invented a true packet acoustic echo cancellation (PAEC) technology which, for instance, can suppress acoustic echo in a packet stream using only the parameters that describe the waveform in EVRC or EVRC-B RTP packets. Bell Labs has three related patents and patent publications on PAEC, as follows:

[0005] - US7852792 Packet Based Echo Cancellation and Suppression (granted on 12/14/2010) by Binshi Cao et al.

[0006] - US8144862 Method and Apparatus for the Detection and Suppression of Echo in Packet based Communication Networks Using Frame Energy Estimation (granted on 3/27/2012) by Binshi Cao et al.

[0007] - US2009/0168673 Method and Apparatus for Detecting and Suppressing Echo in Packet Networks (published on 7/2/2009) by Lampros Kalampoukas and Semyon Sosin.

[0008] In the above patents and patent publication, reference stream packets are compared with target stream packets in a PAEC channel using the parameters that describe the waveform of the packets, and similar packets in the target stream, which are suspected to be echo packets, are removed. This realizes fundamental methods for cancellation/suppression of packet acoustic echo in packet networks.

[0009] However, the methodologies provided in these patents and patent publication address only one-directional packet echo cancellation, not bidirectional PAEC. With two or more parties involved in a single voice call, multiple one-directional PAEC devices, or multiple one-directional PAEC channels in one device, must be employed to remove the echo that could be generated by the equipment of multiple parties during the call. From the standpoint of packet switch performance and capacity, one-directional packet echo cancellation is not efficient, especially in an intra-PS switch scenario, and may not meet industry quality and performance standards. A PAEC product implemented with one-directional packet echo cancellation will not adequately satisfy packet switch customer needs. Therefore, these methods all have shortcomings and limitations for real industry deployment.

[0010] For example, Figure 1 shows the one-directional PAEC architecture as depicted in US2009/0168673. Since a one-directional PAEC channel can only be allocated to one party, it has to distinguish the direction of the voice stream, that is, whether the voice flow is "to" the party or "from" the party. If the voice flow is to the party, it is a reference flow. If the voice flow is from the party, it is a target flow. The voice flow is handled either by reference packet processing or by target packet processing. The key point is that the reference part and the target part do not run in parallel.

[0011] The apparent shortcoming of one-directional packet echo cancellation is that, even for a single direction, the channel has to buffer and manage the reference stream, which means additional unnecessary buffering and computational complexity. If two-directional echo cancellation is required, two PAEC channels have to be applied, and additional reference buffering/managing plus normal/error packet processing must be considered in each PAEC channel. This is a waste of resources in an intra-PS switch scenario.

SUMMARY OF THE INVENTION

[0012] An object of the invention is to provide a method and device for packet acoustic echo cancellation.

[0013] According to one aspect of the invention, a method for packet acoustic echo cancellation is provided, wherein the method comprises the following steps:

[0014] a. acquiring source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets;

[0015] b. determining transmission direction information corresponding to each packet in the source voice packet streams;

[0016] c. updating target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information;

[0017] d. cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams;

[0018] e. sending the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[0019] According to another aspect of the invention, an echo cancelling device for packet acoustic echo cancellation is further provided, wherein the device comprises:

[0020] acquiring apparatus for acquiring source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets;

[0021] direction determining apparatus for determining transmission direction information corresponding to each packet in the source voice packet streams;

[0022] target updating apparatus for updating target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information;

[0023] cancelling apparatus for cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams;

[0024] sending apparatus for sending the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[0025] Compared with the prior art, the present invention realizes bidirectional packet acoustic echo cancellation in a packet echo cancelling device, by acquiring source voice packet streams to be cancelled packet acoustic echo between two calling ends, determining transmission direction information corresponding to each packet in the source voice packet streams, updating target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, cancelling echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams, and finally sending the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams. The present invention improves the performance of the PAEC channel several times over, reduces the amount of hardware and the corresponding maintenance costs, and reduces call processing and related signaling overheads; further, it needs no signaling support and provides a transparent PAEC function; moreover, the present invention may use software to implement the signal buffer manager, which greatly shortens the echo cancelling process and saves buffer storage space. It also improves the flexibility of system processing and enhances the efficiency of packet acoustic echo cancellation in packet switching.

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] Other features, purposes and advantages of the invention will become more explicit by means of reading the detailed statement of the non-restrictive embodiments made with reference to the accompanying drawings.

[0027] Fig. 1 shows an architecture diagram of one directional packet acoustic echo cancellation as depicted in US2009/0168673 according to one aspect of the present invention;

[0028] Fig. 2 shows a schematic diagram of an echo cancelling device for packet acoustic echo cancellation according to one aspect of the present invention;

[0029] Fig. 3 shows a schematic diagram of an echo cancelling device for packet acoustic echo cancellation according to a preferred embodiment of the present invention;

[0030] Fig. 4 shows a flow diagram of a method for packet acoustic echo cancellation according to another aspect of the present invention;

[0031] Fig. 5 shows a flow diagram of a method for packet acoustic echo cancellation according to a preferred embodiment of the present invention;

[0032] Fig. 6 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein a packet in each direction acts as a reference for a packet in the other direction;

[0033] Fig. 7 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein an echo-cancelled packet in each direction acts as a reference for a packet in the other direction;

[0034] Fig. 8 shows a buffer and comparison diagram of a bidirectional packet acoustic echo cancellation using echo-cancelled packets as reference according to a preferred embodiment of the present invention;

[0035] Fig. 9 shows a diagram of cyclic sliding window for packet acoustic echo cancellation according to a preferred embodiment of the present invention.

[0036] The same or similar reference signs in the drawings represent the same or similar component parts.

DETAILED DESCRIPTION OF THE INVENTION

[0037] Below, details of the invention will be further provided in combination with the accompanying drawings.

[0038] Fig. 2 shows a schematic diagram of an echo cancelling device for packet acoustic echo cancellation according to one aspect of the present invention; wherein the echo cancelling device comprises: an acquiring apparatus 1, a direction determining apparatus 2, a target updating apparatus 3, a cancelling apparatus 4, and a sending apparatus 5. Specifically, the acquiring apparatus 1 acquires source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets; the direction determining apparatus 2 determines transmission direction information corresponding to each packet in the source voice packet streams; the target updating apparatus 3 updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information; the cancelling apparatus 4 cancels echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams; the sending apparatus 5 sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[0039] Here, the echo cancelling device includes, but is not limited to, an electronic hardware device or software device that automatically performs numerical computation and information processing according to pre-set or pre-stored instructions, wherein the hardware device includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc. Those skilled in the art should understand that other echo cancellation devices, if applicable to the present invention, should also be included within the protection scope of the present invention and are incorporated here by reference.

[0040] The echo cancelling device may be applied in any VoIP network, real-time communication network (RTC), and LTE/EPC network. Currently, no effective and well-known packet acoustic echo cancellation apparatus has been proposed in those networks.

[0041] The above apparatuses work continuously therebetween. Here, those skilled in the art should understand that "continuously" means each of the above apparatuses acquires source voice packet streams between two calling ends, determines transmission direction information, updates target packet streams, acquires echo-cancelled packet streams, sends the echo-cancelled packet streams and the like between two calling ends in real time or according to a preset or real-time adjusted working pattern requirements, until the echo cancelling device stops acquiring source voice packet streams to be cancelled packet acoustic echo between the two calling ends.

[0042] Acquiring apparatus 1 acquires source voice packet streams to be cancelled packet acoustic echo, between two calling ends, wherein the source voice packet streams comprise one or more packets.

[0043] Specifically, the acquiring apparatus 1 acquires, from two calling ends which are performing a call (with calling end A and calling end B as an example), source voice packet streams to be cancelled packet acoustic echo between the two calling ends, wherein the source voice packet streams include a source voice packet stream from calling end A to calling end B, as well as a source voice packet stream from calling end B to calling end A, wherein the source voice packet streams include one or more packets, and the packets of the source voice packet streams might include echo packets.

[0044] The direction determining apparatus 2 determines transmission direction information corresponding to each packet in the source voice packet streams.

[0045] Specifically, the direction determining apparatus 2 may parse out direction information corresponding to the packet based on header information of the packet, wherein the direction information in the header information may be identified by, for example, a device named Flow Information Access and the like based on source address information and target address information included in each packet in the source voice packet streams, and a direction tag is applied to the packet in a header of the packet, so as to be available for parsing by the direction determining apparatus 2.

[0046] Or, preferably, the direction determining apparatus 2 may directly determine the transmission direction information corresponding to the packet based on source address information and target address information included in each packet in the source voice packet streams.

[0047] For example, calling end A and calling end B are taken as examples to illustrate the two calling ends. The transmission direction information includes from A to B or from B to A. Suppose the address of calling end A and/or the address of calling end B are known, the transmission direction information corresponding to the packet may be determined directly based on the source address and the target address in the header information of the packet.

[0048] Or, for example, the source address and the target address in the header information of the packet are compared using a predetermined computing function. If the source address is greater than the target address, it is determined that the transmission direction of the packet is from A to B; otherwise, if the source address is smaller than the target address, it is determined that the transmission direction of the packet is from B to A. In any other case, an error has occurred and the packet is discarded.
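The two direction rules described in [0047] and [0048] can be illustrated with a minimal Python sketch. The function names, the direction labels "A->B" and "B->A", and the use of the standard `ipaddress` module as the "predetermined computing function" are assumptions made for illustration; the source does not specify how addresses are represented or compared.

```python
from ipaddress import ip_address


def direction_by_known_ends(src: str, addr_a: str, addr_b: str) -> str:
    """Known-address variant ([0047]): the source address identifies the sender."""
    s = ip_address(src)
    if s == ip_address(addr_a):
        return "A->B"
    if s == ip_address(addr_b):
        return "B->A"
    raise ValueError("packet does not belong to this call")


def direction_by_comparison(src: str, dst: str) -> str:
    """Comparison variant ([0048]): greater source address means A->B."""
    s, d = ip_address(src), ip_address(dst)
    if s > d:
        return "A->B"
    if s < d:
        return "B->A"
    # Any other case is treated as an error; the caller discards the packet.
    raise ValueError("source and target addresses are equal")
```

With the comparison variant, neither address has to be configured in advance: whichever end has the greater address is simply treated as end A for the duration of the call.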

[0049] The target updating apparatus 3 updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information.

[0050] Specifically, the target updating apparatus 3 based on the source voice packet streams, sends the source voice packet streams to a signal buffer manager, so as to update the target packet streams in the signal buffer manager using the source voice packet streams, wherein since the source voice packet streams are voice packet streams to be subjected to packet acoustic echo cancellation between two calling ends, the target packet streams also include voice packet streams corresponding to the two calling ends. Herein, since the direction determining apparatus 2 determines the transmission direction information corresponding to each packet in the source voice packet stream, each packet in the target packet streams also includes the corresponding transmission direction information.

[0051] Here, those skilled in the art should understand that the signal buffer manager may be implemented by hardware such as FPGA or by software.
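As a rough software illustration of the buffer manager described in [0050], the following Python sketch keeps the packets of both directions in one bounded buffer, each tagged with its transmission direction. The class name, the `capacity` parameter, the direction labels and the `Packet` fields are hypothetical; the source does not prescribe a data layout.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    direction: str  # "A->B" or "B->A" (hypothetical labels)
    payload: bytes  # encoded voice frame, left opaque here


class SingleBufferManager:
    """One bounded buffer shared by both directions of a call."""

    def __init__(self, capacity: int = 64) -> None:
        # The oldest packet is dropped automatically once capacity is reached.
        self._buf = deque(maxlen=capacity)

    def update(self, packet: Packet) -> None:
        """Append a direction-tagged source packet to the target streams."""
        self._buf.append(packet)

    def stream(self, direction: str) -> List[Packet]:
        """Return the buffered packets travelling in one direction, in order."""
        return [p for p in self._buf if p.direction == direction]

    def __len__(self) -> int:
        return len(self._buf)
```

Because every packet carries its direction, a single structure can serve both directions of the call, and the stream for either direction is recovered by filtering.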

[0052] The cancelling apparatus 4 cancels echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[0053] Specifically, the cancelling apparatus 4 acquires the target packet streams corresponding to the two calling ends in the single buffer manager, acquires the corresponding voice packet streams that have already been echo-cancelled and do not include any echo packet, and takes those voice packet streams as reference packet streams. The cancelling apparatus 4 compares the target packet streams and the reference packet streams in different directions based on the transmission direction information corresponding to each packet in the target packet streams and the reference packet streams. For example, the target packet stream from end A to end B is compared with the reference packet stream from end B to end A, or the target packet stream from end B to end A is compared with the reference packet stream from end A to end B. Whether the target packet stream includes any echo packet is detected based on a packet acoustic echo cancellation algorithm (PAEC algorithm). If it includes an echo packet, the target packet stream is subjected to echo cancellation by deleting the echo packet, by substituting a replacement packet for the detected echo packet, or in another manner.

[0054] Here, the detected echo packet is substituted with a replacement packet to acquire an echo-cancelled packet stream corresponding to the target packet stream, wherein the replacement packet includes, but is not limited to, a noise packet (for example, a packet including a certain type of noise, such as white noise or comfort noise), a silent packet (for example, an empty packet), the 1/8 rate packet last buffered in the target packet stream, and the like, or a combination thereof.
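The substitution step can be sketched as follows in Python, under stated assumptions: the echo packets have already been detected and are given by index, the `rate` labels ("full", "eighth", "silent") are hypothetical, and only two of the listed replacement types are shown (a silent packet and the most recent 1/8 rate packet of the same direction). Generating a noise packet would require codec-specific encoding and is left out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Packet:
    direction: str  # "A->B" or "B->A" (hypothetical labels)
    rate: str       # "full", "eighth" or "silent" (hypothetical labels)
    payload: bytes


def make_replacement(echo: Packet, earlier: List[Packet], mode: str) -> Packet:
    """Build the packet that takes the place of a detected echo packet."""
    if mode == "last_eighth":
        # Reuse the most recent 1/8 rate packet seen in the same direction.
        for p in reversed(earlier):
            if p.direction == echo.direction and p.rate == "eighth":
                return p
        mode = "silent"  # nothing suitable buffered yet, fall back
    if mode == "silent":
        return Packet(echo.direction, "silent", b"")
    raise ValueError(f"unknown replacement mode: {mode}")


def substitute_echo(stream: List[Packet], echo_indices: Set[int],
                    mode: str = "silent") -> List[Packet]:
    """Return a copy of the stream with flagged packets replaced."""
    out: List[Packet] = []
    for i, pkt in enumerate(stream):
        out.append(make_replacement(pkt, out, mode) if i in echo_indices else pkt)
    return out
```

Substituting rather than deleting keeps the number of packets in the stream unchanged, so the receiving end still gets one packet per frame interval.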

[0055] The method of determining transmission direction information corresponding to each packet in the reference packet stream is identical or similar to the method of determining transmission direction information corresponding to each packet in the source voice packet stream, which is thus not detailed here but incorporated here by reference.

[0056] Or, preferably, the cancelling apparatus 4 could cancel echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information along with the packets corresponding to the transmission direction information in the target packet streams, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[0057] Specifically, the cancelling apparatus 4 acquires target packet streams corresponding to the two calling ends in the signal buffer manager, and determines two target packet streams corresponding to different transmission direction information based on each packet in the target packet streams and its corresponding transmission direction information. Here, the directions from end A to end B and from end B to end A are taken as examples of the different transmission direction information. The cancelling apparatus 4 compares the target packet streams in the different directions to cancel echo of the target packet streams. For example, when the cancelling apparatus 4 cancels echo of the target packet stream from end A to end B, the target packet stream from end B to end A acts as the reference packet stream for the target packet stream from end A to end B, and vice versa, such that the target packet streams and the reference packet streams are compared based on a packet acoustic echo cancellation algorithm (PAEC algorithm) to detect whether the target packet streams include any echo packet. If they include an echo packet, echo cancellation is performed on the target packet streams by deleting the echo packet, by substituting a replacement packet for the detected echo packet, or in another manner.

[0058] For example, Fig. 6 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein a packet in each direction acts as a reference for a packet in the other direction.

[0059] Specifically, the RTP parser sends both source voice packet streams coming from end A and/or end B into the packet processing, and buffers the source voice packet streams in the signal buffer manager, and takes the source voice packet streams as the target packet streams. Herein, the source voice packet streams as sent by the RTP parser include payloads and headers of the packets of the source voice packet streams; the header of the packet includes transmission direction information corresponding to each packet in the source voice packet streams. Here, the source voice packet stream sent from end A either has end B's echo or does not have any echo, and the source voice packet stream sent from end B either has end A's echo or does not have any echo. Since the target packet stream is determined through buffering the source voice packet stream, if the source voice packet stream includes an echo, the target packet stream also includes a corresponding echo; if the source voice packet stream does not include any echo, then the target packet stream does not include any corresponding echo.

[0060] In the PAEC algorithm module, a target packet stream in one direction in the signal buffer manager is compared with a target packet stream, which is used as a reference packet stream, in the other direction as pre-stored in the signal buffer manager, so as to determine whether the target packet streams in different directions have any echo packet.

[0061] If the target packet streams have an echo packet, then the PAEC algorithm module performs packet acoustic echo cancellation computation thereto, and sends the echo-cancelled packet streams after echo cancelling to end A and end B, respectively.

[0062] The sending apparatus 5 sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[0063] Specifically, the sending apparatus 5 sends the echo-cancelled packet streams to a corresponding end corresponding to a source end of the echo-cancelled packet streams based on the transmission direction information corresponding to the echo-cancelled packet streams, for example, based on the target address information of the echo-cancelled packet stream, or the corresponding calling end information in the transmission direction information.

[0064] For example, if the transmission direction information corresponding to the echo-cancelled packet stream is from end A to end B, then the echo-cancelled packet stream is sent to end B. Here, end B is the corresponding end for end A.

[0065] Therefore, the present invention implements a bidirectional packet acoustic echo cancellation method, the method:

[0066] - reduces the amount of hardware and the corresponding maintenance cost: compared with a single-directional PAEC, the hardware demand of the bidirectional PAEC is reduced by half and the corresponding maintenance is saved; and because the present invention may further use software to implement the signal buffer manager, it further reduces the amount of hardware and is easy to manage and control;

[0067] - reduces call processing and signaling overheads: for a basic call, it is only required to allocate one PAEC channel;

[0068] - realizes an implicit/ transparent PAEC without any signaling support: a gateway in a packet voice (transmit) path can integrate bidirectional PAEC, so as to provide an implicit/ transparent PAEC to end A and end B.

[0069] Fig. 3 shows a schematic diagram of an echo cancelling device for packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein the echo cancelling device comprises an acquiring apparatus 1', a direction determining apparatus 2', a target updating apparatus 3', a cancelling apparatus 4', a sending apparatus 5' and a reference updating apparatus 6', wherein the cancelling apparatus 4' comprises an echo determining unit 41' and an echo cancelling unit 42'. Specifically, the acquiring apparatus 1' acquires source voice packet streams to be subjected to packet acoustic echo cancellation between two calling ends, wherein the source voice packet streams comprise one or more packets; the direction determining apparatus 2' determines transmission direction information corresponding to each packet in the source voice packet streams; the target updating apparatus 3' updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information; the reference updating apparatus 6' establishes or updates reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams; the echo determining unit 41' determines whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams; the echo cancelling unit 42' cancels echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams; the sending apparatus 5' sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[0070] Herein, the acquiring apparatus 1', the direction determining apparatus 2', the target updating apparatus 3' and the sending apparatus 5' as comprised in the echo cancelling device are identical or substantially identical to the corresponding apparatuses shown in Fig. 2, which are thus not detailed here, but incorporated here by reference.

[0071] The above apparatuses work continuously with one another. Here, those skilled in the art should understand that "continuously" means that the above apparatuses respectively acquire source voice packet streams between two calling ends, determine transmission direction information, update target packet streams, establish or update reference packet streams, determine whether the target packet streams include any echo packet, acquire echo-cancelled packet streams, send the echo-cancelled packet streams and the like between two calling ends, in real time or according to preset or real-time adjusted working pattern requirements, until the echo cancelling device stops acquiring source voice packet streams to be subjected to packet acoustic echo cancellation between the two calling ends.

[0072] The reference updating apparatus 6' establishes or updates reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams.

[0073] Specifically, the reference updating apparatus 6' may interact with the cancelling apparatus 4' so as to obtain the echo-cancelled packet streams; then, the reference updating apparatus 6' establishes or updates reference packet streams corresponding to the target packet streams in the signal buffer manager based on the echo-cancelled packet streams. Namely, if the signal buffer manager does not yet include a reference packet stream, a reference packet stream corresponding to the target packet stream is established based on the echo-cancelled packet stream; if the signal buffer manager already includes a reference packet stream, the reference packet stream corresponding to the target packet stream is updated based on the echo-cancelled packet stream.
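
The establish-or-update behaviour described above may be sketched as follows. This is only an illustration: the dictionary standing in for the signal buffer manager, the function name, and the capacity of 9 packets (borrowed from the Fig. 9 example) are assumptions.

```python
from collections import deque

REFERENCE_WINDOW = 9  # assumed capacity, borrowed from the Fig. 9 example


def update_reference(buffer_manager, direction, cancelled_packets):
    """Establish the reference stream for `direction` if absent, else extend it."""
    if direction not in buffer_manager:
        # No reference packet stream yet: establish one.
        buffer_manager[direction] = deque(maxlen=REFERENCE_WINDOW)
    # Update: a deque with maxlen drops its oldest entries automatically.
    buffer_manager[direction].extend(cancelled_packets)
    return buffer_manager[direction]
```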

[0074] The echo determining unit 41' determines whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

[0075] Specifically, the echo determining unit 41' compares the target packet streams and the reference packet streams in different directions based on the transmission direction information corresponding to each packet in the target packet streams and the reference packet streams. For example, the target packet stream with echo from end A to end B is compared with the echo-cancelled reference packet stream from end B to end A, or the target packet stream with echo from end B to end A is compared with the echo-cancelled reference packet stream from end A to end B. Whether the target packet stream includes any echo packet is detected based on the packet acoustic echo cancellation algorithm (PAEC algorithm).

[0076] Here, the method of determining transmission direction information corresponding to each packet in the reference packet stream is identical or similar to the method of determining transmission direction information corresponding to each packet in the source voice packet stream, which is thus not detailed here but incorporated here by reference.

[0077] Preferably, the echo determining unit 41' could determine whether the target packet streams include any echo packet based on the reference packet streams along with transmission direction information corresponding to each packet in the target packet streams and the reference packet streams and energy level information corresponding to a corresponding plurality of successive packets in the target packet streams and the reference packet streams.

[0078] For example, the echo determining unit 41' compares the target packet stream with the reference packet stream pre-stored for the other direction, based on the transmission direction information corresponding to each packet in the target packet stream and the reference packet stream, and detects whether the target packet stream and the reference packet stream have similar packets among the corresponding plurality of successive packets. The echo determining unit 41' may then further judge whether the similar packets in the target packet stream are attenuated, taking into account the energy level information (i.e., various kinds of gain information) corresponding to the plurality of successive packets in the target packet stream and the reference packet stream, i.e., whether their energy level is lower than that of the corresponding reference packet stream. If so, the similar packets are echo packets, and the target packet streams therefore include echo packets.

[0079] The reason is that in an echo, the echo energy typically has a certain degree of attenuation compared with the original voice; therefore, the energy level is compared as an ancillary condition for detecting an echo packet.
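
The two-part test described above (similarity first, attenuation as an ancillary condition) may be sketched as follows. The mean absolute difference used as the similarity measure, the threshold value, and all names are assumptions chosen for illustration; the embodiment does not prescribe them here.

```python
def mean_abs_diff(a, b):
    """Assumed similarity measure: mean absolute difference of feature values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_echo(target_feats, ref_feats, target_energy, ref_energy,
            sim_threshold=0.1):
    """Flag an echo only when the windows are similar AND the target is attenuated."""
    similar = mean_abs_diff(target_feats, ref_feats) <= sim_threshold
    # Ancillary condition: an echo typically carries less energy than the original.
    attenuated = sum(target_energy) < sum(ref_energy)
    return similar and attenuated
```

Requiring both conditions avoids flagging a similar-sounding but louder packet, which is unlikely to be an attenuated echo.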

[0080] The echo cancelling unit 42' cancels echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[0081] Specifically, if the target packet stream comprises an echo packet, the echo cancelling unit 42' cancels echo through deleting the echo packet or by substituting the detected echo packet with a replacement packet, or through other manners.

[0082] Here, preferably, the detected echo packet is substituted with a replacement packet so as to obtain an echo-cancelled packet stream corresponding to the target packet stream, wherein the replacement packet includes, but is not limited to, a noise packet (for example, a packet including a certain type of noise, such as white noise or comfort noise), a silent packet (for example, an empty packet), a 1/8 rate packet as finally buffered in the target packet stream, and the like, or a combination thereof.

[0083] Herein, when using a replacement packet with a given payload, it is required to correspondingly modify the RTP headers and other length-related fields and checks, for example, modifying a platform-specific header, an IP header, a UDP header, and an RTP header.
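
The header adjustment described above may be sketched as follows for the IP and UDP layers. This is an illustration under several assumptions: IPv4 transport, a UDP checksum set to zero (which IPv4 permits), an RTP header reused unchanged, and no platform-specific header. The function names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words, as used by the IPv4 header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def replace_payload(ip_hdr: bytes, udp_hdr: bytes, rtp_hdr: bytes,
                    new_payload: bytes) -> bytes:
    """Rebuild an IPv4/UDP/RTP packet around a replacement payload."""
    udp_len = len(udp_hdr) + len(rtp_hdr) + len(new_payload)
    # UDP: keep the ports, write the new length, zero the checksum.
    udp = udp_hdr[:4] + struct.pack("!HH", udp_len, 0)
    # IPv4: write the new total length, clear the checksum, then recompute it.
    ip = bytearray(ip_hdr)
    struct.pack_into("!H", ip, 2, len(ip_hdr) + udp_len)
    struct.pack_into("!H", ip, 10, 0)
    struct.pack_into("!H", ip, 10, ipv4_checksum(bytes(ip)))
    return bytes(ip) + udp + rtp_hdr + new_payload
```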

[0084] For example, Fig. 7 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein an echo-cancelled packet in each direction acts as a reference for a packet in the other direction.

[0085] Specifically, the RTP parser sends both source voice packet streams coming from end A and/or end B into the packet processing, and buffers the source voice packet streams in the signal buffer manager, and takes the source voice packet streams as the target packet streams. Herein, the source voice packet streams as sent by the RTP parser include payloads and headers of the packets of the source voice packet streams; the header of the packet includes transmission direction information corresponding to each packet in the source voice packet streams. Here, the source voice packet stream sent from end A either has end B's echo or does not have any echo, and the source voice packet stream sent from end B either has end A's echo or does not have any echo. Since the target packet stream is determined through buffering the source voice packet stream, if the source voice packet stream includes an echo, the target packet stream also includes a corresponding echo; if the source voice packet stream does not include any echo, then the target packet stream does not include any corresponding echo.

[0086] The signal buffer manager interacts with the PAEC algorithm module so as to acquire the echo-cancelled packet stream as determined by the PAEC algorithm module, and buffers the echo-cancelled packet stream to the signal buffer manager as the reference packet stream.

[0087] Herein, each packet in the target packet stream and the reference packet stream includes its corresponding transmission direction information.

[0088] In the PAEC algorithm module, a target packet stream in one direction in the signal buffer manager is compared with a reference packet stream in the other direction as pre-stored in the signal buffer manager. As shown in Fig. 8, the packet set in the target packet stream from end B to end A (packet j to packet j+M) is compared with the corresponding set 1, set 2, ..., set K in the reference packet stream (i.e., the voice packet stream from end A to end B, serving as the reference for the direction from end B to end A), respectively; and the packet set in the target packet stream from end A to end B (packet i to packet i+N) is compared with the corresponding set 1, set 2, ..., set Q in the reference packet stream (i.e., the voice packet stream from end B to end A, serving as the reference for the direction from end A to end B), respectively, to determine whether the target packet streams in the different directions have any echo packet. Here, the reference packet streams no longer include any corresponding echo packet and thus are echo-cancelled packet streams.

[0089] If the target packet streams have an echo packet, then the PAEC algorithm module performs packet acoustic echo cancellation computation thereto, so as to send the echo-cancelled packet streams to end A and end B, respectively.

[0090] Herein, a comparison and removal algorithm with respect to echo frames at end A and end B is described as follows:

[0091] Specifically, in Fig. 8, "N+1" denotes the target window size for the direction from A to B, and "N+Q" denotes the corresponding reference window size. "Q" is determined by the echo path delay at end B. $\hat{e}_{i,q}$ denotes the comparison result from comparing the N+1 packets (i, i+1, ..., i+N) in the target buffer from A to B with the N+1 packets (q, q+1, ..., q+N) in the reference buffer from A to B. Herein, those skilled in the art should understand that the transmission direction information of the reference packet stream for the target packet stream from A to B should be from B to A. The minimum value of $\hat{e}_{i,q}$ over the Q reference offsets (q, q+1, ..., q+Q-1) will be compared with the minimum threshold $e_{th}$ to determine whether any echo exists.

[0092] (equation 1)

[0093] (equation 2)

[0094] The minimum value of the result in equation (2) indicates a similarity between a target stream in the direction from end A to end B and a corresponding reference stream; if the result of equation 2 satisfies the following expression:

[0095] $\min_{q} \hat{e}_{i,q} \le e_{th}$ (equation 3)

[0096] it indicates that similarity exists between the target packet stream and the reference packet stream. Therefore, the target packet stream includes an echo.
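
The windowed comparison described above may be sketched as follows: a target window is slid across the reference buffer, a comparison result is computed at each offset, and the minimum is compared with a threshold. The comparison measure used here, a mean squared difference of per-packet LSP values, is an assumption for illustration and may differ from equations 1 and 2; the function names are likewise hypothetical.

```python
def window_distance(target, reference):
    """Assumed measure: mean squared difference of per-packet LSP vectors."""
    total, count = 0.0, 0
    for t_vec, r_vec in zip(target, reference):
        for t, r in zip(t_vec, r_vec):
            total += (t - r) ** 2
            count += 1
    return total / count


def detect_echo(target, reference, threshold):
    """Slide the target window over the reference buffer; test the minimum result."""
    n = len(target)
    # With N+1 target packets and N+Q reference packets this yields Q offsets.
    offsets = range(len(reference) - n + 1)
    distances = [window_distance(target, reference[q:q + n]) for q in offsets]
    best = min(distances)
    return best <= threshold, distances.index(best)
```

The same routine serves both directions; only the buffers passed in differ.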

[0097] In Fig. 8, "M+1" denotes the target window size for the direction from B to A, and "M+K" denotes the corresponding reference window size. "K" is determined by the echo path delay at end A. $\hat{e}_{j,k}$ denotes the comparison result from comparing the M+1 packets (j, j+1, ..., j+M) in the target buffer from B to A with the M+1 packets (k, k+1, ..., k+M) in the reference buffer from B to A. Herein, those skilled in the art should understand that the transmission direction information of the reference packet stream for the target packet stream from B to A should be from A to B. The minimum value of $\hat{e}_{j,k}$ over the K reference offsets (k, k+1, ..., k+K-1) will be compared with the minimum threshold $e_{th}$ to determine whether any echo exists.

[0098] (equation 4)

[0099] (equation 5)

[00100] The minimum value of the result in equation (5) indicates a similarity between a target stream in the direction from end B to end A and a corresponding reference stream; if the result of equation 5 satisfies the following expression:

[00101] $\min_{k} \hat{e}_{j,k} \le e_{th}$ (equation 6)

[00102] it indicates that similarity exists between the target packet stream and the reference packet stream. Therefore, the target packet stream includes an echo.

[00103] Here, P denotes the value of LSP (Line Spectral Pair).

[00104] Preferably, the echo determining unit 41' could determine, based on a cyclic sliding window, whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

[00105] When the header of each packet carries transmission direction information, the packets corresponding to the end A and end B directions may be buffered for PAEC processing. The cyclic sliding window may include target packets with echo and reference packets with echo cancelled. After the echo in a packet is cancelled, the cyclic sliding window moves forward by one packet. A new target packet and a new echo-cancelled packet are filled into the cyclic sliding window, thereby realizing cyclic PAEC management, which greatly reduces PAEC channel resources and space and enhances the speed and efficiency of packet acoustic echo cancellation.

[00106] Specifically, for example, Fig. 9 shows a diagram of cyclic sliding window for packet acoustic echo cancellation according to a preferred embodiment of the present invention.

[00107] Fig. 9 shows two voice packet streams: a first voice packet stream buffered in the counterclockwise direction and represented by p1 to p7, and a second voice packet stream buffered in the clockwise direction and represented by p1' to p7', wherein the first voice packet stream is sent from end B to end A, and the second voice packet stream is sent from end A to end B.

[00108] In Fig. 9, set the size of the target window to 4, i.e., performing determination of whether echo is included in a voice packet stream using 4 packets as one group; set the size of the reference window to 9, i.e., retaining 9 echo-cancelled packets as a reference packet stream for a voice packet stream in the other direction.

[00109] In the first voice packet stream sent from end B to end A, p1 is the start point of the target window, p4 is the end point of the target window, and p1 to p4 compose the target window of the first voice packet stream; p5 is the start point of the reference window, p7 is the end point of the reference window, and 9 packets from the packet indicated by p5 to the packet indicated by p7 compose the reference window for end A to end B.

[00110] If a new target packet sent from end B to end A enters, then the idle p8 will be used, and the end point of the target window will move to p8. Therefore, the size of the target window will exceed 4, thereby triggering an echo detection and cancellation operation. Here, since p1 is at the start point position of the target window, the echo detection and cancellation operation is performed with respect to packet p1.

[00111] If p1 is detected as an echo, p1 is substituted with a replacement packet so as to perform echo cancellation; if p1 is non-echo normal voice, substitution is not needed. After p1 is sent, the start point of the target window moves to p2 (the size of the target window returns to 4); the end point of the reference window moves to p1, and the size of the reference window becomes 10. Since the reference window is enlarged beyond 9, reference buffer release is triggered, such that the start point of the reference window moves to p6, and p5 is released.

[00112] Here, those skilled in the art should understand that in practical application, the sizes of the reference window and the target window may be online configured and calibrated, and multiple idle buffers (i.e., multiple p8) may be reserved.
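
The window movement walked through above may be sketched, for one direction, as follows. The class name, the echo test passed in as a callback, and the use of two bounded queues in place of a single physical ring are assumptions made for illustration; in the described device, the echo decision would come from comparison with the reference of the other direction.

```python
from collections import deque


class DirectionWindow:
    """Target and reference windows for one direction (sizes as in Fig. 9)."""

    def __init__(self, target_size=4, reference_size=9):
        self.target_size = target_size
        self.target = deque()
        # A bounded deque releases its oldest entry when it grows past maxlen.
        self.reference = deque(maxlen=reference_size)

    def push(self, packet, is_echo, replacement=b""):
        """Add a new target packet; return the packet to send, if any."""
        self.target.append(packet)
        if len(self.target) <= self.target_size:
            return None  # window size not exceeded, nothing is triggered
        head = self.target.popleft()  # start point of the target window
        out = replacement if is_echo(head) else head
        self.reference.append(out)  # the sent packet joins the reference window
        return out
```

Making both sizes constructor parameters reflects the remark that the window sizes may be configured and calibrated online.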

[00113] In the second voice packet stream sent from end A to end B, p1' is the start point of the target window, p4' is the end point of the target window, and p1' to p4' compose the target window of the second voice packet stream; p5' is the start point of the reference window, p7' is the end point of the reference window, and 9 packets from the packet indicated by p5' to the packet indicated by p7' compose the reference window for end B to end A.

[00114] If a new target packet sent from end A to end B enters, then the idle p8' will be used, and the end point of the target window will move to p8'. Therefore, the size of the target window will exceed 4, thereby triggering an echo detection and cancellation operation. Here, since p1' is at the start point position of the target window, the echo detection and cancellation operation is performed with respect to packet p1'.

[00115] If p1' is detected as an echo, p1' is substituted with a replacement packet so as to perform echo cancellation; if p1' is non-echo normal voice, substitution is not needed. After p1' is sent, the start point of the target window moves to p2' (the size of the target window returns to 4); the end point of the reference window moves to p1', and the size of the reference window becomes 10. Since the reference window is enlarged beyond 9, reference buffer release is triggered, such that the start point of the reference window moves to p6', and p5' is released.

[00116] Here, those skilled in the art should understand that in practical application, the sizes of the reference window and the target window may be online configured and calibrated, and multiple idle buffers (i.e., multiple p8') may be reserved.

[00117] Fig. 4 shows a flow diagram of a method for packet acoustic echo cancellation according to another aspect of the present invention. Specifically, in the step s1, the echo cancelling device acquires source voice packet streams to be subjected to packet acoustic echo cancellation between two calling ends, wherein the source voice packet streams comprise one or more packets; in the step s2, the echo cancelling device determines transmission direction information corresponding to each packet in the source voice packet streams; in the step s3, the echo cancelling device updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information; in the step s4, the echo cancelling device cancels echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams; in the step s5, the echo cancelling device sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[00118] The above steps work continuously with one another. Here, those skilled in the art should understand that "continuously" means that the above steps respectively acquire source voice packet streams between two calling ends, determine transmission direction information, update target packet streams, acquire echo-cancelled packet streams, send the echo-cancelled packet streams and the like between two calling ends, in real time or according to preset or real-time adjusted working pattern requirements, until the echo cancelling device stops acquiring source voice packet streams to be subjected to packet acoustic echo cancellation between the two calling ends.
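
The flow of steps s1 to s5 for a single packet may be sketched as follows. The packet representation (a dictionary with a pre-applied direction tag), the plain dictionary used as the buffer, and the pluggable detection callback are all assumptions made for illustration.

```python
def process_packet(packet, buffers, detect, replacement=b""):
    """Run one packet through steps s2 to s5; s1 is the caller handing it in."""
    # s2: read the direction from a (hypothetical) pre-applied header tag.
    direction = packet["direction"]
    other = "B->A" if direction == "A->B" else "A->B"
    # s3: update the target stream of this direction in the shared buffer.
    buffers.setdefault(direction, []).append(packet["payload"])
    # s4: compare against the opposite direction; substitute if it is an echo.
    reference = buffers.get(other, [])
    is_echo = detect(packet["payload"], reference)
    payload = replacement if is_echo else packet["payload"]
    # s5: packets travelling A->B go to end B, and B->A packets go to end A.
    destination = direction[-1]
    return destination, payload
```

Calling this function once per arriving packet, for as long as packets arrive, corresponds to the continuous operation described above.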

[00119] In the step s1, the echo cancelling device acquires source voice packet streams to be subjected to packet acoustic echo cancellation between two calling ends, wherein the source voice packet streams comprise one or more packets.

[00120] Specifically, in the step s1, the echo cancelling device acquires, from two calling ends which are performing a call (with calling end A and calling end B as an example), source voice packet streams to be subjected to packet acoustic echo cancellation between the two calling ends, wherein the source voice packet streams include a source voice packet stream from calling end A to calling end B as well as a source voice packet stream from calling end B to calling end A, wherein the source voice packet streams include one or more packets, and the packets of the source voice packet streams might include echo packets.

[00121] In the step s2, the echo cancelling device determines transmission direction information corresponding to each packet in the source voice packet streams.

[00122] Specifically, in the step s2, the echo cancelling device may parse out direction information corresponding to the packet based on header information of the packet, wherein the direction information in the header information may be identified by for example a device named Flow Information Access and the like based on source address information and target address information included in each packet in the source voice packet streams, and a direction tag is applied to the packet in a header of the packet, so as to be available for parsing by the echo cancelling device.

[00123] Or, preferably, in the step s2, the echo cancelling device may directly determine the transmission direction information corresponding to the packet based on source address information and target address information included in each packet in the source voice packet streams.

[00124] For example, calling end A and calling end B are taken as examples to illustrate the two calling ends. The transmission direction information includes from A to B or from B to A. If the address of calling end A and/or the address of calling end B is known, the transmission direction information corresponding to the packet may be determined directly based on the source address and the target address in the header information of the packet.

[00125] Or, for example, the source address and the target address in the header information of the packet are compared using a predetermined computing function. If the source address is greater than the target address, it is determined that the transmission direction of the packet is from A to B; otherwise, if the source address is smaller than the target address, it is determined that the transmission direction of the packet is from B to A. In any other case, an error has occurred and the packet is discarded.
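
The address comparison described above may be sketched as follows. Comparing addresses numerically through Python's standard `ipaddress` module is one possible choice of the predetermined computing function and is an assumption here, as are the direction labels and the function name.

```python
import ipaddress


def direction_from_addresses(src, dst):
    """Return 'A->B' or 'B->A', or None when the packet should be discarded."""
    try:
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    except ValueError:
        return None  # unparsable address: treat as an error case
    if s.version != d.version or s == d:
        return None  # neither greater nor smaller: error case
    # Greater source address means A->B; smaller means B->A.
    return "A->B" if s > d else "B->A"
```

Under this convention, end A is simply whichever calling end has the greater address, so both directions of one call are labelled consistently without any signaling.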

[00126] In the step s3, the echo cancelling device updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information.

[00127] Specifically, in the step s3, the echo cancelling device sends the source voice packet streams to a signal buffer manager, so as to update the target packet streams in the signal buffer manager using the source voice packet streams. Since the source voice packet streams are voice packet streams to be subjected to packet acoustic echo cancellation between two calling ends, the target packet streams also include voice packet streams corresponding to the two calling ends. Herein, since in the step s2 the echo cancelling device determines the transmission direction information corresponding to each packet in the source voice packet stream, each packet in the target packet streams also includes the corresponding transmission direction information.

[00128] Here, those skilled in the art should understand that the signal buffer manager may be implemented by hardware such as FPGA or by software.
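
A software realization of the buffer manager could, for example, look like the following sketch, in which one object holds the target streams of both directions and each stored packet keeps its direction tag. The class name, method names, and the capacity bound are assumptions made for illustration.

```python
from collections import defaultdict, deque


class BufferManager:
    """One manager holding the target streams of both directions."""

    def __init__(self, capacity=64):  # capacity is an assumed bound
        self._targets = defaultdict(lambda: deque(maxlen=capacity))

    def update_target(self, direction, packet):
        # Each stored packet keeps its direction tag alongside the payload.
        self._targets[direction].append((direction, packet))

    def target_stream(self, direction):
        """Return the buffered payloads for one direction, oldest first."""
        return [payload for _, payload in self._targets[direction]]
```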

[00129] In the step s4, the echo cancelling device cancels echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[00130] Specifically, in the step s4, the echo cancelling device acquires the target packet streams corresponding to the two calling ends in the single buffer manager, acquires voice packet streams corresponding to the target packet streams that have already been echo-cancelled and do not include any echo packet, and takes these voice packet streams as reference packet streams. In the step s4, the echo cancelling device compares the target packet streams and the reference packet streams in different directions based on the transmission direction information corresponding to each packet in the target packet streams and the reference packet streams. For example, the target packet stream from end A to end B is compared with the reference packet stream from end B to end A, or the target packet stream from end B to end A is compared with the reference packet stream from end A to end B. Whether the target packet stream includes any echo packet is detected based on the packet acoustic echo cancellation algorithm (PAEC algorithm). If it includes an echo packet, the target packet stream is subjected to echo cancellation by deleting the echo packet, by substituting the detected echo packet with a replacement packet, or in another manner.

[00131] Here, the detected echo packet is substituted with a replacement packet to acquire an echo-cancelled packet stream corresponding to the target packet stream, wherein the replacement packet includes, but is not limited to, a noise packet (for example, a packet including a certain type of noise, such as white noise or comfort noise), a silent packet (for example, an empty packet), a 1/8 rate packet as finally buffered in the target packet stream, and the like, or a combination thereof.
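
The choice among replacement packets may be sketched as follows. The kind names and the function are hypothetical, and the random bytes merely stand in for a noise payload; a real noise or comfort noise packet would have to be encoded for the codec in use.

```python
import random


def make_replacement(kind, length, last_eighth_rate=None, rng=None):
    """Build a replacement payload of the requested (hypothetical) kind."""
    if kind == "silent":
        return b""  # an empty packet
    if kind == "noise":
        # Placeholder only: random bytes stand in for codec-encoded noise.
        rng = rng or random.Random()
        return bytes(rng.randrange(256) for _ in range(length))
    if kind == "eighth_rate" and last_eighth_rate is not None:
        return last_eighth_rate  # reuse the last buffered 1/8 rate packet
    raise ValueError(f"unsupported replacement kind: {kind}")
```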

[00132] The method of determining transmission direction information corresponding to each packet in the reference packet stream is identical or similar to the method of determining transmission direction information corresponding to each packet in the source voice packet stream, which is thus not detailed here but incorporated here by reference.

[00133] Or, preferably, in the step s4, the echo cancelling device could cancel echo of the target packet streams based on each packet in the target packet streams and its corresponding transmission direction information along with the packets corresponding to the transmission direction information in the target packet streams, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[00134] Specifically, in the step s4, the echo cancelling device acquires target packet streams corresponding to the two calling ends in the signal buffer manager, and determines two target packet streams corresponding to different transmission direction information based on each packet in the target packet streams and its corresponding transmission direction information. Herein, the two directions, from end A to end B and from end B to end A, are taken as examples. In the step s4, the echo cancelling device compares the target packet streams in the two directions to cancel echo of the target packet streams. For example, when the echo cancelling device cancels echo in the target packet stream from end A to end B, the target packet stream from end B to end A acts as the reference packet stream for the target packet stream from end A to end B, and vice versa, such that the target packet streams and the reference packet streams are compared based on a packet acoustic echo cancellation algorithm (PAEC algorithm), to thereby detect whether the target packet streams include any echo packet. If they include an echo packet, echo cancelling is performed on the target packet streams by deleting the echo packet, by substituting a replacement packet for the detected echo packet, or in another manner.

[00135] For example, Fig. 6 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein a packet in each direction acts as a reference for a packet in the other direction.

[00136] Specifically, the RTP parser sends the source voice packet streams coming from end A and/or end B into the packet processing, buffers the source voice packet streams in the signal buffer manager, and takes the source voice packet streams as the target packet streams. Herein, the source voice packet streams as sent by the RTP parser include the payloads and headers of the packets of the source voice packet streams; the header of each packet includes the transmission direction information corresponding to that packet. Here, the source voice packet stream sent from end A either contains end B's echo or does not contain any echo, and the source voice packet stream sent from end B either contains end A's echo or does not contain any echo. Since the target packet stream is determined by buffering the source voice packet stream, if the source voice packet stream includes an echo, the target packet stream also includes a corresponding echo; if the source voice packet stream does not include any echo, then the target packet stream does not include any corresponding echo.

[00137] In the PAEC algorithm module, a target packet stream in one direction in the signal buffer manager is compared with a target packet stream, which is used as a reference packet stream, in the other direction as pre-stored in the signal buffer manager, so as to determine whether the target packet streams in different directions have any echo packet.

[00138] If the target packet streams have an echo packet, then the PAEC algorithm module performs packet acoustic echo cancellation computation thereto, and sends the echo-cancelled packet streams after echo cancelling to end A and end B, respectively.
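The cross-direction comparison of paragraphs [00134] to [00138] can be sketched as follows. This is a minimal illustration under stated assumptions, not the PAEC algorithm itself: the `Packet` type, the direction labels, and the exact-match `is_similar` test are hypothetical stand-ins, and an empty payload is used as the silent replacement packet.

```python
from dataclasses import dataclass
from typing import List

A_TO_B = "A->B"  # hypothetical direction labels
B_TO_A = "B->A"
SILENCE = b""    # assumed silent replacement payload


@dataclass(frozen=True)
class Packet:
    direction: str
    payload: bytes


def opposite(direction: str) -> str:
    """Return the other transmission direction."""
    return B_TO_A if direction == A_TO_B else A_TO_B


def is_similar(target: Packet, reference: Packet) -> bool:
    # Assumption: exact payload match stands in for the real similarity measure.
    return target.payload == reference.payload


def cancel_echo(buffered: List[Packet]) -> List[Packet]:
    """Use earlier packets of the opposite direction as the reference stream."""
    result = []
    for index, packet in enumerate(buffered):
        references = [p for p in buffered[:index]
                      if p.direction == opposite(packet.direction)]
        if any(is_similar(packet, ref) for ref in references):
            # Echo detected: substitute a replacement packet, keep the direction.
            result.append(Packet(packet.direction, SILENCE))
        else:
            result.append(packet)
    return result
```

Each direction serves as the reference for the other, so a single buffer holding both directions is enough for the sketch.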

[00139] In the step s5, the echo cancelling device sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[00140] Specifically, in the step s5, the echo cancelling device sends each echo-cancelled packet stream to the end that corresponds to the source end of that stream, based on the transmission direction information corresponding to the echo-cancelled packet stream, for example, based on the target address information of the echo-cancelled packet stream, or on the corresponding calling end information in the transmission direction information.

[00141] For example, if the transmission direction information corresponding to the echo-cancelled packet stream is from end A to end B, then the echo-cancelled packet stream is sent to end B. Here, end B is the corresponding end for end A.
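The forwarding rule of step s5 can be sketched as a lookup keyed on the transmission direction. The `(source, destination)` tuple encoding and the endpoint table are assumptions made for illustration; the addresses are taken from the documentation range and are not part of the original disclosure.

```python
from typing import Dict, Tuple

Direction = Tuple[str, str]   # assumed encoding: (source end, destination end)
Address = Tuple[str, int]     # (IP address, UDP port)

# Hypothetical endpoint table for the two calling ends.
ENDPOINTS: Dict[str, Address] = {
    "A": ("192.0.2.10", 5004),
    "B": ("192.0.2.20", 5004),
}


def corresponding_end(direction: Direction) -> str:
    """Return the end that receives a stream travelling in `direction`."""
    _source, destination = direction
    return destination


def route(direction: Direction, endpoints: Dict[str, Address]) -> Address:
    """Resolve the address an echo-cancelled packet should be sent to."""
    return endpoints[corresponding_end(direction)]
```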

[00142] Therefore, the present invention implements a bidirectional packet acoustic echo cancellation method, the method:

[00143] - reduces the amount of hardware and the corresponding maintenance cost: compared with a single-directional PAEC, the hardware demand of the bidirectional PAEC is reduced by half, and the corresponding maintenance is saved; as the present invention may further use software to implement the signal buffer manager, the amount of hardware is further reduced, which makes the system easy to manage and control;

[00144] - reduces call processing and signaling overheads: for a basic call, it is only required to allocate one PAEC channel;

[00145] - realizes an implicit/transparent PAEC without any signaling support: a gateway in a packet voice (transmit) path can integrate bidirectional PAEC, so as to provide an implicit/transparent PAEC to end A and end B.

[00146] Fig. 5 shows a flow diagram of a method for packet acoustic echo cancellation according to a preferred embodiment of the present invention. Specifically, in the step s1', the echo cancelling device acquires source voice packet streams, on which packet acoustic echo cancellation is to be performed, between two calling ends, wherein the source voice packet streams comprise one or more packets; in the step s2', the echo cancelling device determines transmission direction information corresponding to each packet in the source voice packet streams; in the step s3', the echo cancelling device updates target packet streams corresponding to the two calling ends in a single buffer manager based on the source voice packet streams, wherein each packet in the target packet streams includes the transmission direction information; in the step s6', the echo cancelling device establishes or updates reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams; in the step s41', the echo cancelling device determines whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams; in the step s42', the echo cancelling device cancels echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams; in the step s5', the echo cancelling device sends the echo-cancelled packet streams to a corresponding end in the two calling ends based on transmission direction information corresponding to the echo-cancelled packet streams.

[00147] Herein, the step s1', step s2', step s3' and step s5' of the method are identical or substantially identical to the corresponding steps shown in Fig. 4, which are thus not detailed here, but incorporated here by reference.

[00148] The above steps operate continuously with one another. Here, those skilled in the art should understand that "continuously" means that the above steps respectively acquire source voice packet streams between the two calling ends, determine transmission direction information, update target packet streams, establish or update reference packet streams, determine whether the target packet streams include any echo packet, obtain echo-cancelled packet streams, and send the echo-cancelled packet streams, in real time or according to the requirements of a preset or real-time adjusted working pattern, until the echo cancelling device stops acquiring source voice packet streams, on which packet acoustic echo cancellation is to be performed, between the two calling ends.

[00149] In the step s6', the echo cancelling device establishes or updates reference packet streams corresponding to the target packet streams based on the echo-cancelled packet streams.

[00150] Specifically, in the step s6', the echo cancelling device may interact with the step s4' so as to obtain the echo-cancelled packet streams; then, in the step s6', the echo cancelling device establishes or updates reference packet streams corresponding to the target packet streams in the signal buffer manager based on the echo-cancelled packet streams. Namely, if the signal buffer manager does not yet include a reference packet stream, a reference packet stream corresponding to the target packet stream is established based on the echo-cancelled packet stream; if the signal buffer manager already includes a reference packet stream, the reference packet stream corresponding to the target packet stream is updated based on the echo-cancelled packet stream.
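The establish-or-update behaviour of step s6' can be sketched as follows. The class name, the method name, and the bounded per-direction `deque` are hypothetical choices; the default bound of 9 packets merely mirrors the reference window size used in the Fig. 9 example.

```python
from collections import deque
from typing import Deque, Dict


class SignalBufferManager:
    """Minimal sketch: keeps one reference stream per transmission direction."""

    def __init__(self, max_reference: int = 9) -> None:
        self.max_reference = max_reference
        self.reference: Dict[str, Deque[bytes]] = {}

    def store_echo_cancelled(self, direction: str, payload: bytes) -> bool:
        """Store an echo-cancelled packet; return True if a stream was established."""
        established = direction not in self.reference
        if established:
            # No reference stream yet for this direction: establish one.
            self.reference[direction] = deque(maxlen=self.max_reference)
        # Otherwise this is an update; the oldest packet drops out when full.
        self.reference[direction].append(payload)
        return established
```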

[00151] In the step s41', the echo cancelling device determines whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

[00152] Specifically, in the step s41', the echo cancelling device compares the target packet streams and the reference packet streams in different directions based on the transmission direction information corresponding to each packet in the target packet streams and the reference packet streams. For example, the target packet stream with echo from end A to end B is compared with the echo-cancelled reference packet stream from end B to end A, or the target packet stream with echo from end B to end A is compared with the echo-cancelled reference packet stream from end A to end B. Whether the target packet stream includes any echo packet is detected based on the packet acoustic echo cancellation algorithm (PAEC algorithm).

[00153] Here, the method of determining transmission direction information corresponding to each packet in the reference packet stream is identical or similar to the method of determining transmission direction information corresponding to each packet in the source voice packet stream, which is thus not detailed here but incorporated here by reference.

[00154] Preferably, in the step s41', the echo cancelling device could determine whether the target packet streams include any echo packet based on the reference packet streams, the transmission direction information corresponding to each packet in the target packet streams and the reference packet streams, and the energy level information corresponding to the respective pluralities of successive packets in the target packet streams and the reference packet streams.

[00155] For example, in the step s41', the echo cancelling device compares the target packet stream with the reference packet stream pre-stored for the other direction, based on the transmission direction information corresponding to each packet in the target packet stream and the reference packet stream, and detects whether the corresponding pluralities of successive packets in the target packet stream and the reference packet stream contain similar packets. Then, in the step s41', the echo cancelling device may further judge whether the similar packets in the target packet stream are attenuated, taking into account the energy level information (i.e., various kinds of gain information) corresponding to the plurality of successive packets in the target packet stream and the reference packet stream, i.e., whether the energy level of the target packet stream is lower than the energy level of the corresponding reference packet stream; if so, the similar packets are echo packets, and the target packet streams therefore include echo packets.

[00156] The reason is that in an echo, the echo energy typically has a certain degree of attenuation compared with the original voice; therefore, the energy level is compared as an ancillary condition for detecting an echo packet.
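The ancillary energy check can be sketched as follows. It assumes that a per-packet energy (gain) level has already been extracted as a number for each of the successive packets, and that comparing mean levels is an acceptable test; both the function names and the mean-based comparison are illustrative assumptions.

```python
from typing import Sequence


def is_attenuated(target_levels: Sequence[float],
                  reference_levels: Sequence[float]) -> bool:
    """True if the target window has a lower mean energy level than the reference."""
    target_mean = sum(target_levels) / len(target_levels)
    reference_mean = sum(reference_levels) / len(reference_levels)
    return target_mean < reference_mean


def is_echo(similar: bool,
            target_levels: Sequence[float],
            reference_levels: Sequence[float]) -> bool:
    """Similarity is the primary condition; attenuation is the ancillary one."""
    return similar and is_attenuated(target_levels, reference_levels)
```

With this rule, similar packets whose energy is not lower than the reference are treated as ordinary voice rather than echo.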

[00157] In the step s42', the echo cancelling device cancels echo of the target packet streams when the target packet streams include an echo packet, to obtain echo-cancelled packet streams corresponding to the target packet streams.

[00158] Specifically, if the target packet stream comprises an echo packet, in the step s42', the echo cancelling device cancels echo through deleting the echo packet or by substituting the detected echo packet with a replacement packet, or through other manners.

[00159] Here, preferably, the detected echo packet is substituted with a replacement packet so as to obtain an echo-cancelled packet stream corresponding to the target packet stream, wherein the replacement packet includes, but is not limited to, a noise packet (for example, a packet including a certain type of noise, such as white noise or comfort noise), a silent packet (for example, an empty packet), a 1/8 rate packet as most recently buffered in the target packet stream, and the like, or a combination thereof.

[00160] Herein, when using a replacement packet with a given payload, it is required to correspondingly modify the RTP headers and other length-related fields and checks, for example, modifying a platform-specific header, an IP header, a UDP header, and an RTP header.
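The length bookkeeping that follows a payload substitution can be sketched as below. The sketch assumes a plain 12-byte RTP header (no padding, extension, or CSRC entries), an 8-byte UDP header, and a 20-byte IPv4 header without options; platform-specific headers and the recomputation of checksums are left out, and the function names are hypothetical.

```python
RTP_HEADER_LEN = 12   # fixed RTP header, assuming no CSRC list or extension
UDP_HEADER_LEN = 8
IPV4_HEADER_LEN = 20  # assuming an IPv4 header without options


def replace_payload(rtp_packet: bytes, new_payload: bytes) -> bytes:
    """Keep the RTP header of the echo packet and swap in the replacement payload."""
    if len(rtp_packet) < RTP_HEADER_LEN:
        raise ValueError("packet is shorter than an RTP header")
    # Bits 0x20 (padding), 0x10 (extension) and 0x0F (CSRC count) must be clear.
    if rtp_packet[0] & 0x3F:
        raise ValueError("sketch handles only plain RTP headers")
    return rtp_packet[:RTP_HEADER_LEN] + new_payload


def length_fields(rtp_packet: bytes) -> dict:
    """Length values to write back into the UDP and IPv4 headers."""
    udp_length = UDP_HEADER_LEN + len(rtp_packet)
    return {
        "udp_length": udp_length,
        "ipv4_total_length": IPV4_HEADER_LEN + udp_length,
    }
```

In a real gateway the IPv4 header checksum and the UDP checksum would also have to be recomputed after these fields change.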

[00161] For example, Fig. 7 shows a reference diagram of a bidirectional packet acoustic echo cancellation according to a preferred embodiment of the present invention, wherein an echo-cancelled packet in each direction acts as a reference for a packet in the other direction.

[00162] Specifically, the RTP parser sends the source voice packet streams coming from end A and/or end B into the packet processing, buffers the source voice packet streams in the signal buffer manager, and takes the source voice packet streams as the target packet streams. Herein, the source voice packet streams as sent by the RTP parser include the payloads and headers of the packets of the source voice packet streams; the header of each packet includes the transmission direction information corresponding to that packet. Here, the source voice packet stream sent from end A either contains end B's echo or does not contain any echo, and the source voice packet stream sent from end B either contains end A's echo or does not contain any echo. Since the target packet stream is determined by buffering the source voice packet stream, if the source voice packet stream includes an echo, the target packet stream also includes a corresponding echo; if the source voice packet stream does not include any echo, then the target packet stream does not include any corresponding echo.

[00163] The signal buffer manager interacts with the PAEC algorithm module so as to acquire the echo-cancelled packet stream as determined by the PAEC algorithm module, and buffers the echo-cancelled packet stream to the signal buffer manager as the reference packet stream.

[00164] Herein, each packet in the target packet stream and the reference packet stream includes its corresponding transmission direction information.

[00165] In the PAEC algorithm module, a target packet stream in one direction in the signal buffer manager is compared with a reference packet stream in the other direction as pre-stored in the signal buffer manager. As shown in Fig. 8, the packet set in the target packet stream (packet j to packet j+M, i.e., the target packet stream from end B to end A) is compared with the corresponding set 1, set 2, ..., set K in the reference packet stream (i.e., the voice packet stream from end A to end B, serving as the reference for the direction from end B to end A) respectively, and the packet set in the target packet stream (packet i to packet i+N, i.e., the target packet stream in the direction from end A to end B) is compared with the corresponding set 1, set 2, ..., set Q in the reference packet stream (i.e., the voice packet stream from end B to end A, serving as the reference for the direction from end A to end B) respectively, to determine whether the target packet streams in the different directions have any echo packet. Herein, the reference packet streams no longer include any corresponding echo packet, and thus are echo-cancelled packet streams.

[00166] If the target packet streams have an echo packet, then the PAEC algorithm module performs packet acoustic echo cancellation computation thereto, so as to send the echo-cancelled packet streams to end A and end B, respectively.

[00167] Herein, a comparison and removal algorithm with respect to echo frames at end A and end B is described as follows:

[00168] Specifically, in Fig. 8, "N+1" denotes the target window size for direction A to B, and "N+Q" denotes the corresponding reference window size. "Q" is determined by the echo path delay at end B. $C^A_{i,q}$ denotes the comparison result from comparing the N+1 packets (i, i+1, ..., i+N) in the target buffer from A to B with the N+1 packets (q, q+1, ..., q+N) in the reference buffer from A to B. Herein, those skilled in the art should understand that the transmission direction information of the reference packet stream for the target packet stream from A to B should be from B to A. The minimum value of $C^A_{i,q}$ (q = q, q+1, ..., q+Q-1) will be compared with the minimum threshold $C_{th}$ to determine whether an echo exists.

[00169] $$E^A_i = \left[ C^A_{i,q},\ C^A_{i,q+1},\ \ldots,\ C^A_{i,q+Q-1} \right]$$ (equation 7)

[00170] (equation 8)

[00171] The minimum value of the result in equation (8) indicates a similarity between a target stream in the direction from end A to end B and a corresponding reference stream; if the result of equation 8 satisfies the following expression:

[00172] $$\min C^A_{i,q} < C_{th}$$ (equation 9)

[00173] it indicates that similarity exists between the target packet stream and the reference packet stream. Therefore, the target packet stream includes an echo.

[00174] In Fig. 8, "M+1" denotes the target window size for direction B to A, and "M+K" denotes the corresponding reference window size. "K" is determined by the echo path delay at end A. $C^B_{j,k}$ denotes the comparison result from comparing the M+1 packets (j, j+1, ..., j+M) in the target buffer from B to A with the M+1 packets (k, k+1, ..., k+M) in the reference buffer from B to A. Herein, those skilled in the art should understand that the transmission direction information of the reference packet stream for the target packet stream from B to A should be from A to B. The minimum value of $C^B_{j,k}$ (k = k, k+1, ..., k+K-1) will be compared with the minimum threshold $C_{th}$ to determine whether any echo exists.

[00175] $$E^B_j = \left[ C^B_{j,k},\ C^B_{j,k+1},\ \ldots,\ C^B_{j,k+K-1} \right]$$ (equation 10)

[00176] (equation 11)

[00177] The minimum value of the result in equation (11) indicates a similarity between a target stream in the direction from end B to end A and a corresponding reference stream; if the result of equation 11 satisfies the following expression:

[00178] $$\min C^B_{j,k} < C_{th}$$ (equation 12)

[00179] it indicates that similarity exists between the target packet stream and the reference packet stream. Therefore, the target packet stream includes an echo.
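The window comparison and threshold test described above can be sketched as follows for either direction. The per-window distance used here, a sum of absolute differences between per-packet LSP vectors, is an assumption chosen only for illustration and is not taken from the equations; the function names and the threshold argument `c_th` are likewise hypothetical. Sliding a target window of N+1 packets over a reference window of N+Q packets yields Q candidate offsets, which matches the range q, q+1, ..., q+Q-1.

```python
from typing import List, Sequence

Lsp = Sequence[float]  # one LSP vector per packet (assumed representation)


def window_distance(target: Sequence[Lsp], reference: Sequence[Lsp]) -> float:
    # Assumption: sum of absolute LSP differences over aligned packets.
    return sum(abs(t - r)
               for t_packet, r_packet in zip(target, reference)
               for t, r in zip(t_packet, r_packet))


def comparison_vector(target_window: Sequence[Lsp],
                      reference_window: Sequence[Lsp]) -> List[float]:
    """One comparison result per offset of the target window in the reference window."""
    size = len(target_window)
    offsets = len(reference_window) - size + 1
    return [window_distance(target_window, reference_window[q:q + size])
            for q in range(offsets)]


def contains_echo(target_window: Sequence[Lsp],
                  reference_window: Sequence[Lsp],
                  c_th: float) -> bool:
    """Echo is assumed present when the minimum comparison result is below c_th."""
    results = comparison_vector(target_window, reference_window)
    if not results:  # reference window shorter than target window
        return False
    return min(results) < c_th
```

The offset at which the minimum occurs also gives a rough estimate of the echo path delay in packets.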

[00180] Here, P denotes the value of LSP (Line Spectral Pair).

[00181] Preferably, in the step s41 ', the echo cancelling device could determine, based on a cyclic sliding window, whether the target packet streams include any echo packet based on each packet in the target packet streams and its corresponding transmission direction information along with the packets in the reference packet streams.

[00182] When the header of each packet carries transmission direction information, the packets corresponding to the end A and end B directions may be buffered for PAEC processing. The cyclic sliding window may include target packets with echo and reference packets with echo cancelled. After the echo in a packet is cancelled, the cyclic sliding window moves forward by one packet. A new target packet and a new echo-cancelled packet will be filled into the cyclic sliding window, thereby realizing cyclic management of PAEC, so as to greatly reduce PAEC channel resources and space and enhance the speed and efficiency of packet acoustic echo cancellation.

[00183] Specifically, for example, Fig. 9 shows a diagram of cyclic sliding window for packet acoustic echo cancellation according to a preferred embodiment of the present invention.

[00184] Fig. 9 shows 2 voice packet streams, including a first voice packet stream buffered in the counterclockwise direction and represented by p1 to p7, and a second voice packet stream buffered in the clockwise direction and represented by p1' to p7', wherein the first voice packet stream is sent from end B to end A, and the second voice packet stream is sent from end A to end B.

[00185] In Fig. 9, the size of the target window is set to 4, i.e., whether echo is included in a voice packet stream is determined using 4 packets as one group; the size of the reference window is set to 9, i.e., 9 echo-cancelled packets are retained as a reference packet stream for the voice packet stream in the other direction.

[00186] In the first voice packet stream sent from end B to end A, p1 is the start point of the target window, p4 is the end point of the target window, and p1 to p4 compose the target window of the first voice packet stream; p5 is the start point of the reference window, p7 is the end point of the reference window, and the 9 packets from the packet indicated by p5 to the packet indicated by p7 compose the reference window for end A to end B.

[00187] If a new target packet sent from end B to end A enters, then the idle p8 will be used, and the end point of the target window will move to p8. Therefore, the size of the target window will exceed 4, thereby triggering an echo detection and cancellation operation. Here, since p1 is at the start point position of the target window, the echo detection and cancellation operation is performed with respect to packet p1.

[00188] If p1 is detected as an echo, p1 is substituted with a replacement packet so as to perform echo cancellation; if p1 is non-echo normal voice, substitution is not needed. After p1 is sent, the start point of the target window moves to p2 (the size of the target window returns to 4); the end point of the reference window moves to p1, and the size of the reference window becomes 10. Since the reference window is enlarged beyond 9, reference buffer release is triggered, such that the start point of the reference window moves to p6, and p5 is released.

[00189] Here, those skilled in the art should understand that in practical application, the sizes of the reference window and the target window may be configured and calibrated online, and multiple idle buffers (i.e., multiple p8) may be reserved.

[00190] In the second voice packet stream sent from end A to end B, p1' is the start point of the target window, p4' is the end point of the target window, and p1' to p4' compose the target window of the second voice packet stream; p5' is the start point of the reference window, p7' is the end point of the reference window, and the 9 packets from the packet indicated by p5' to the packet indicated by p7' compose the reference window for end B to end A.

[00191] If a new target packet sent from end A to end B enters, then the idle p8' will be used, and the end point of the target window will move to p8'. Therefore, the size of the target window will exceed 4, thereby triggering an echo detection and cancellation operation. Here, since p1' is at the start point position of the target window, the echo detection and cancellation operation is performed with respect to packet p1'.

[00192] If p1' is detected as an echo, p1' is substituted with a replacement packet so as to perform echo cancellation; if p1' is non-echo normal voice, substitution is not needed. After p1' is sent, the start point of the target window moves to p2' (the size of the target window returns to 4); the end point of the reference window moves to p1', and the size of the reference window becomes 10. Since the reference window is enlarged beyond 9, reference buffer release is triggered, such that the start point of the reference window moves to p6', and p5' is released.

[00193] Here, those skilled in the art should understand that in practical application, the sizes of the reference window and the target window may be configured and calibrated online, and multiple idle buffers (i.e., multiple p8') may be reserved.
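The window movement described for Fig. 9 can be sketched for one direction as follows. Two `deque` objects stand in for the pointer-based cyclic buffer, and the `is_echo` callback is a hypothetical hook that would compare the oldest target packet against the reference window of the other direction; the class name and the empty replacement payload are also assumptions.

```python
from collections import deque
from typing import Callable, Deque, Optional


class DirectionWindow:
    """Target and reference windows for a single transmission direction."""

    def __init__(self, target_size: int = 4, reference_size: int = 9) -> None:
        self.target_size = target_size
        self.reference_size = reference_size
        self.target: Deque[bytes] = deque()
        self.reference: Deque[bytes] = deque()

    def push(self, packet: bytes,
             is_echo: Callable[[bytes], bool],
             replacement: bytes = b"") -> Optional[bytes]:
        """Buffer a new target packet; return the packet sent onward, if any."""
        self.target.append(packet)
        if len(self.target) <= self.target_size:
            return None  # window not exceeded, nothing triggered yet
        # Target window exceeded: process the packet at its start point.
        oldest = self.target.popleft()
        sent = replacement if is_echo(oldest) else oldest
        # The echo-cancelled packet extends the reference window.
        self.reference.append(sent)
        if len(self.reference) > self.reference_size:
            self.reference.popleft()  # reference buffer release
        return sent
```

A bidirectional arrangement would hold one such object per direction, each consulting the other's `reference` inside its `is_echo` callback.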

[00194] To those skilled in the art, it is apparent that the present invention is not limited to the details of the aforementioned exemplary embodiments; moreover, without departing from the spirit or fundamental characteristics of the invention, the invention can be implemented in other specific forms. Therefore, the embodiments should be considered exemplary and non-restrictive in every respect; the scope of the invention is defined by the appended claims rather than by the above description, and is intended to cover all changes falling within the meaning and scope of equivalents of the claims. No reference sign in the claims shall be deemed as limiting the claim concerned. Besides, the word "comprise/include" does not exclude other components or steps, the singular does not exclude the plural, and a plurality of components or means mentioned in the device claims may also be implemented by one component or means through software or hardware. Wording such as "first" and "second" is only used to represent names rather than any specific order.