


Title:
SEQUENCING THE SPEECH SIGNAL
Document Type and Number:
WIPO Patent Application WO/2017/025108
Kind Code:
A2
Abstract:
A method of operating an audio processing device to improve a user's perception of speech sound. The method comprises sequencing the speech signal by splitting an audio signal into a plurality of frequency bands and presenting these speech frequency bands in sequence (non-simultaneously), from the high-frequency bands to the low-frequency bands.

Inventors:
AL-SHALASH TAHA KAIS TAHA (IQ)
Application Number:
PCT/EG2016/000029
Publication Date:
February 16, 2017
Filing Date:
October 04, 2016
Assignee:
AL-SHALASH TAHA KAIS TAHA (EG)
International Classes:
G10L19/04
Claims:
CLAIMS :-

1- A METHOD OF OPERATING AN AUDIO PROCESSING DEVICE TO IMPROVE A USER'S PERCEPTION OF A SPEECH SOUND, THE METHOD COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN SAID FREQUENCY BANDS ARE PRESENTED IN SEQUENCE (NON-SIMULTANEOUS).

2- A METHOD OF OPERATING AN AUDIO PROCESSING DEVICE TO IMPROVE A USER'S PERCEPTION OF A SPEECH SOUND, THE METHOD COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN THE LOWER FREQUENCY BANDS ARE PRESENTED TOGETHER WITH THE HIGHER FREQUENCY BANDS IN SEQUENCE.

3- A HEARING ASSISTANCE APPARATUS, COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN SAID FREQUENCY BANDS ARE PRESENTED IN SEQUENCE (NON-SIMULTANEOUS).

4- A HEARING ASSISTANCE APPARATUS, COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN THE LOWER FREQUENCY BANDS ARE PRESENTED TOGETHER WITH THE HIGHER FREQUENCY BANDS IN SEQUENCE.

Description:
SEQUENCING THE SPEECH SIGNAL

TECHNICAL FIELD

THE PRESENT APPLICATION RELATES TO IMPROVING SPEECH PERCEPTION, E.G. SPEECH INTELLIGIBILITY, IN PARTICULAR TO IMPROVING SOUND PERCEPTION FOR A PERSON, E.G. A HEARING-IMPAIRED PERSON.

THE APPLICATION RELATES TO AN AUDIO PROCESSING DEVICE AND ITS USE IN ALL KINDS OF HEARING AIDS AND COCHLEAR IMPLANTS.

THE APPLICATION FURTHER RELATES TO A DATA PROCESSING SYSTEM COMPRISING A PROCESSOR PERFORMING THE METHOD.

THE DISCLOSURE MAY BE USEFUL IN APPLICATIONS SUCH AS COMMUNICATION DEVICES, E.G. TELEPHONES, OR LISTENING DEVICES, E.G. HEARING INSTRUMENTS, HEADSETS, HEADPHONES AND ACTIVE EAR PROTECTION DEVICES.

BACKGROUND ART

THE FOLLOWING METHODS HAVE BEEN DESCRIBED FOR MODIFYING SPEECH TO MAKE IT MORE INTELLIGIBLE FOR PEOPLE WITH SENSORINEURAL HEARING LOSS.

1- ENHANCEMENT OF SPECTRAL SHAPE:-

IT INCREASES THE GAIN FOR FREQUENCIES WITH A HIGH CONCENTRATION OF ACOUSTIC ENERGY IN THE SPEECH WAVE (E.G. FORMANTS) TO MAKE THESE PEAKS MORE PROMINENT IN THE SPEECH SPECTRUM. UNFORTUNATELY, IMPROVEMENTS IN INTELLIGIBILITY HAVE BEEN SMALL OR NON-EXISTENT.

2- ENHANCEMENT OF CONSONANT TO VOWEL RATIO :-

IT INCREASES THE GAIN FOR CONSONANT SOUNDS BUT NOT FOR VOWEL SOUNDS, BOTH BECAUSE CONSONANTS ARE IMPORTANT FOR INTELLIGIBILITY AND TO PREVENT VOWELS FROM MASKING CONSONANTS.
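
As an illustration of this prior-art approach (not of the invention), consonant-to-vowel ratio enhancement can be sketched as a label-dependent gain. The sketch assumes that a consonant/vowel label is already available for every sample, e.g. from a classifier that is outside its scope; the function name `enhance_cv_ratio` and the 6 dB default gain are hypothetical, illustrative choices.

```python
def enhance_cv_ratio(samples, labels, consonant_gain_db=6.0):
    """Raise the consonant-to-vowel ratio by amplifying consonant samples only.

    labels: one hypothetical tag per sample, "C" for consonant, "V" for vowel.
    Vowel samples pass through unchanged.
    """
    if len(samples) != len(labels):
        raise ValueError("need exactly one label per sample")
    gain = 10 ** (consonant_gain_db / 20)  # dB -> linear amplitude factor
    return [s * gain if lab == "C" else s for s, lab in zip(samples, labels)]


# Usage: a 20 dB boost multiplies consonant samples by 10.
boosted = enhance_cv_ratio([0.5, 0.5], ["V", "C"], consonant_gain_db=20.0)
```

A real hearing aid would smooth the gain across label transitions to avoid audible steps; that refinement is omitted here.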

3- TRANSIENT ENHANCEMENT:-

IT INCREASES THE RATE OF INTENSITY CHANGE OF SOUNDS, BECAUSE MANY CONSONANT SOUNDS HAVE RAPID INTENSITY CHANGES THAT MIGHT BE IMPORTANT FOR RECOGNIZING THEM, BUT ITS USEFULNESS IN REAL LIFE MIGHT BE DISAPPOINTING.

4- ENHANCEMENT OF DURATION:-

VOWELS PRECEDING A VOICED CONSONANT ARE LONGER IN DURATION THAN VOWELS PRECEDING AN UNVOICED CONSONANT, SO ENHANCING THE DURATION OF VOWELS MIGHT BE USEFUL FOR RECOGNIZING THE FOLLOWING CONSONANT.

UNFORTUNATELY, THE INTELLIGIBILITY ACHIEVED WITH THIS METHOD IS NOT HIGH.

5- SPEECH SIMPLIFICATION:-

THE INTERACTION OF THE MANY CUES IN A SPEECH STIMULUS MAY BE DIFFICULT FOR A HEARING-IMPAIRED PERSON WHOSE LIMITED HEARING ABILITY MAKES IT HARD TO SEPARATE THESE CUES. REPLACING THE SPEECH SIGNAL, AS AN EXTREME CASE, WITH PURE TONES REDUCES THE NUMBER OF CUES THAT MUST BE RECOGNIZED AND SEPARATED, WHICH MIGHT IMPROVE INTELLIGIBILITY.

THIS METHOD APPEARS TO BE BENEFICIAL ONLY FOR SEVERELY HEARING-IMPAIRED PEOPLE.

6- ENHANCEMENT BY RE-SYNTHESIS:-

IT CONSISTS OF RECOGNIZING THE SPEECH SIGNAL IN THE HEARING AID PROCESSOR AND THEN RESYNTHESIZING IT IN A CLEAR, NOISE-FREE WAY.

THIS METHOD IS HIGHLY AFFECTED BY NOISE, AND ACCENTS AND EMOTION ARE NOT CONVEYED.

DISCLOSURE OF INVENTION

SEQUENCING SPEECH SIGNAL:

MASKING THAT OBSCURES A SOUND IMMEDIATELY FOLLOWING THE MASKER IS CALLED FORWARD MASKING. THIS MEANS THAT THE MASKING EFFECT OF A SIGNAL PERSISTS FOR SOME TIME AFTER THE SIGNAL IS TURNED OFF.

UPWARD SPREAD OF MASKING IS THE MASKING OF HIGH-FREQUENCY SOUNDS BY LOW-FREQUENCY SOUNDS.

WE CAN TAKE ADVANTAGE OF THESE TWO FACTS BY SEQUENCING (NON-SIMULTANEOUSLY) THE SPEECH SIGNAL FOR EACH SPEECH PHONEME, SO THAT THE HIGH-FREQUENCY INFORMATION IS PRESENTED FIRST AND THE LOW-FREQUENCY INFORMATION IS PRESENTED LATER. WITH THIS MECHANISM, UPWARD SPREAD OF MASKING WILL NOT OCCUR, BECAUSE THE HIGH-FREQUENCY PART IS PRESENTED WITHOUT THE LOW-FREQUENCY PART OF THE SPEECH SIGNAL; AND BECAUSE THE LOW-FREQUENCY PART COMES AFTER THE HIGH-FREQUENCY PART RATHER THAN BEFORE IT, IT CANNOT FORWARD-MASK THE HIGH-FREQUENCY PART.

THERE ARE TWO SUGGESTED METHODS FOR SEQUENCING THE SPEECH SIGNAL:

a. EACH FREQUENCY BAND IS PRESENTED ALONE, FROM THE HIGH-FREQUENCY BANDS TO THE LOW-FREQUENCY BANDS, SEE FIG (1). FOR EXAMPLE, THE HIGH-FREQUENCY BAND IS PRESENTED FIRST, THEN THE MIDDLE-FREQUENCY BAND, AND THE LOW-FREQUENCY BAND LAST.

b. THE HIGH-FREQUENCY BANDS ARE PRESENTED FIRST, THEN THE LOWER-FREQUENCY BANDS ARE ADDED TO THE HIGHER BANDS AND PRESENTED SIMULTANEOUSLY WITH THEM, SEE FIG (2). FOR EXAMPLE, THE HIGH-FREQUENCY BAND IS PRESENTED FIRST, THEN THE HIGH- AND MIDDLE-FREQUENCY BANDS TOGETHER, AND FINALLY ALL FREQUENCY BANDS.
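
The two sequencing methods can be sketched in Python for a single phoneme. The application does not specify the filter bank, the band edges, or how a band is "presented" within its time slot, so this sketch makes several labelled assumptions: the bands are separated with brick-wall FFT masks, the band edges (1 kHz and 3 kHz) are hypothetical, the phoneme boundaries are already known, and presenting a band in a segment is modelled as gating that band's own samples into the segment. `split_bands`, `sequence_phoneme` and `BAND_EDGES_HZ` are illustrative names, not part of the disclosure.

```python
import numpy as np

# Hypothetical band edges in Hz, ordered low -> high; the application gives no values.
BAND_EDGES_HZ = [(0.0, 1000.0), (1000.0, 3000.0), (3000.0, None)]


def split_bands(x, fs, edges=BAND_EDGES_HZ):
    """Split x into frequency bands using brick-wall FFT masks.

    The masks partition the spectrum, so the returned bands (ordered
    low -> high) sum back exactly to x.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bands = []
    for lo, hi in edges:
        upper = np.inf if hi is None else hi
        mask = (freqs >= lo) & (freqs < upper)
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return bands


def sequence_phoneme(bands, segment_lengths, cumulative=False):
    """Present the bands of one phoneme in sequence, high -> low frequency.

    bands: equal-length arrays ordered low -> high (as from split_bands).
    segment_lengths: samples per time segment; segment k belongs to the
        k-th highest band.
    cumulative=False: method (a), each band alone in its segment.
    cumulative=True:  method (b), lower bands are added to the higher ones.
    """
    n = len(bands[0])
    if len(segment_lengths) != len(bands) or sum(segment_lengths) != n:
        raise ValueError("need one segment per band, covering the phoneme exactly")
    high_to_low = bands[::-1]
    out = np.zeros(n)
    start = 0
    for k, length in enumerate(segment_lengths):
        stop = start + length
        # Bands audible in this segment: only band k, or bands 0..k (highest first).
        active = high_to_low[: k + 1] if cumulative else [high_to_low[k]]
        for band in active:
            out[start:stop] += band[start:stop]
        start = stop
    return out


# Usage: a 90 msec test "phoneme" at 16 kHz with one tone in each band.
fs = 16000
t = np.arange(1440) / fs
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 5000 * t)
bands = split_bands(x, fs)
method_a = sequence_phoneme(bands, [400, 400, 640])
method_b = sequence_phoneme(bands, [400, 400, 640], cumulative=True)
```

At 16 kHz, segments of 400, 400 and 640 samples correspond to the 25, 25 and 40 msec example of FIG (3). The brick-wall masks are chosen only because the bands then sum back exactly to the input; a practical device would more likely use a filter bank with smooth crossovers and short ramps at segment boundaries to avoid audible clicks.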

DURATION OF PRESENTATION OF EACH FREQUENCY BAND:-

MANY METHODS COULD BE USED TO DETERMINE THE DURATION OF EACH FREQUENCY BAND PRESENTATION; I WILL DISCUSS TWO OF THEM AS EXAMPLES:-

1- THE DURATION OF EACH FREQUENCY BAND OF EACH PHONEME COULD BE CONSTANT, I.E. THE DURATION IS FIXED FOR ANY PHONEME, BUT WE MUST ENSURE THAT THE SUM OF ALL FREQUENCY BAND DURATIONS DOES NOT EXCEED THE DURATION OF ANY PHONEME. THIS COULD BE DONE BY GIVING A RELATIVELY SHORT DURATION TO ALL FREQUENCY BANDS EXCEPT THE LAST ONE, WHICH COULD BE PRESENTED FOR AS LONG AS THE PHONEME LASTS. FOR EXAMPLE, IF THE DURATION OF THE PHONEME IS 90 MSEC, THE HIGH-FREQUENCY BAND LASTS FOR (E.G.) 25 MSEC, THE MIDDLE-FREQUENCY BAND FOR (E.G.) 25 MSEC, AND THE LOW-FREQUENCY BAND FOR THE REMAINING 40 MSEC, SEE FIG (3).
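
Method 1 reduces to simple arithmetic: the higher bands get fixed durations and the lowest band takes whatever remains of the phoneme. The sketch below assumes the phoneme duration is already known (e.g. from a phoneme segmenter, which the application does not describe); the function name `fixed_band_durations` is hypothetical, and the 25 msec defaults are taken from the example in the text.

```python
def fixed_band_durations(phoneme_ms, leading_ms=(25.0, 25.0)):
    """Method 1: fixed durations for every band except the last (lowest) one.

    leading_ms holds the fixed durations of the higher bands in presentation
    order (high first); the last band is presented for the rest of the phoneme.
    """
    used = sum(leading_ms)
    if used >= phoneme_ms:
        # The fixed part must be shorter than the phoneme, or the last band gets no time.
        raise ValueError("fixed band durations exceed the phoneme duration")
    return list(leading_ms) + [phoneme_ms - used]


# Usage: the 90 msec example of FIG (3).
durations = fixed_band_durations(90.0)  # [25.0, 25.0, 40.0]
```

Multiplying each duration by the sampling rate (e.g. 16 samples per msec at 16 kHz) gives the per-band segment lengths in samples.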

2- THE DURATION OF EACH FREQUENCY BAND OF EACH PHONEME COULD BE CORRELATED WITH THE DISTRIBUTION OF ACOUSTIC ENERGY ACROSS FREQUENCIES. FOR EXAMPLE, MORE ACOUSTIC ENERGY IN THE HIGH-FREQUENCY REGION COULD INDICATE A LONGER DURATION FOR THE HIGH-FREQUENCY BANDS, AND VICE VERSA.
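
The application only states that band duration correlates with the energy distribution; one simple way to realize this, assumed here rather than taken from the disclosure, is to share the phoneme duration among the bands in proportion to each band's energy. The helper names `band_energy` and `energy_weighted_durations` are hypothetical, and the equal-share fallback for a silent phoneme is likewise an assumption.

```python
def band_energy(samples):
    """Acoustic energy of one band signal: the sum of squared samples."""
    return sum(s * s for s in samples)


def energy_weighted_durations(phoneme_ms, band_energies):
    """Method 2 (one possible reading): durations proportional to band energy.

    band_energies is in presentation order (high -> low frequency), so a
    phoneme with more high-frequency energy gets a longer high band.
    """
    total = sum(band_energies)
    if total <= 0:
        # No energy to go on (e.g. silence): fall back to equal shares.
        return [phoneme_ms / len(band_energies)] * len(band_energies)
    return [phoneme_ms * e / total for e in band_energies]


# Usage: a 90 msec phoneme whose high band carries twice the energy of the middle band.
durations = energy_weighted_durations(90.0, [2.0, 1.0, 0.0])  # [60.0, 30.0, 0.0]
```

A band with no energy receives no time in this sketch; a device could add a minimum duration per band if every band must remain audible.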

BRIEF DESCRIPTION OF FIGURES

FIG (1): FIRST SUGGESTED METHOD FOR SEQUENCING A SINGLE SPEECH PHONEME SIGNAL. 1 = HIGH-FREQUENCY BAND PRESENTED 1ST, 2 = MIDDLE-FREQUENCY BAND PRESENTED 2ND, 3 = LOW-FREQUENCY BAND PRESENTED LAST.

FIG (2): SECOND SUGGESTED METHOD FOR SEQUENCING A SINGLE SPEECH PHONEME SIGNAL. 1 = HIGH-FREQUENCY BAND PRESENTED 1ST, 2 = HIGH- AND MIDDLE-FREQUENCY BANDS PRESENTED 2ND, 3 = ALL FREQUENCY BANDS PRESENTED LAST.

FIG (3): EXAMPLE OF FREQUENCY BAND DURATIONS. 1 = DURATION OF A SINGLE PHONEME (90 MSEC), 2 = HIGH-FREQUENCY BAND PRESENTED FOR 25 MSEC, 3 = MIDDLE-FREQUENCY BAND PRESENTED FOR 25 MSEC, 4 = LOW-FREQUENCY BAND PRESENTED FOR 40 MSEC.