United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 16/782,111
Filing Date: 5 Feb 2020
Appellant(s): FROEHLICH, MATTHIAS



__________________
Jacob Stemer
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed February 4, 2022.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 10/12/2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
 
The following ground(s) of rejection are applicable to the appealed claims:

Claims 1-4 and 11-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ranieri et al. (2017/0188173).
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al. (2017/0188173) in view of Visser et al. (2015/0301796).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al. (2017/0188173) in view of Claussen et al. (2011/0261983).
Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al. (2017/0188173) in view of Heigold et al. (2017/0069327).

(2) Response to Argument

Applicant filed an Appeal Brief on 2/4/2022 in response to the art rejections over prior art Ranieri.  Applicant’s arguments have been fully considered but are not persuasive.

The application at hand teaches signal processing of an audio signal.  In a recognition phase, the application recognizes and stores image and speaker identification parameters of a preferred conversation partner.  These parameters can then be used in an application phase to detect the preferred conversation partner and adjust the partner's audio signal (abstract).

Claim 1 recites:
A method for individualized signal processing of an audio signal of a hearing device, the method comprising: 
in a recognition phase: 
generating a first image capture with an auxiliary device; 
inferring a presence of a preferred conversation partner from the first image capture, and based thereon, analyzing a first audio sequence of the audio signal and/or an auxiliary audio signal of the auxiliary device for characteristic speaker identification parameters; and 
storing the speaker identification parameters ascertained in the first audio sequence in a database; and 
in an application phase: 
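analyzing the audio signal with respect to the stored speaker identification parameters, and thus evaluating the audio signal with respect to a presence of the preferred conversation partner; and 
if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal.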



Cited prior art, Ranieri, teaches audio signal processing that can classify and store audio and corresponding image information, and subsequently use this information to recognize and modify received audio to create a customized sound experience for a user.

Regarding claim 1, Ranieri teaches A method for individualized signal processing of an audio signal of a hearing device (abstract method; 114 apparatus may be used as intelligent hearing aid for automatically or manually selecting audio sources to be enhanced), the method comprising: 
in a recognition phase (92): 
generating a first image capture with an auxiliary device (86 image captured; 89; 92); 
inferring a presence of a preferred conversation partner from the first image capture, and based thereon, analyzing a first audio sequence of the audio signal and/or an auxiliary audio signal of the auxiliary device for characteristic speaker identification parameters; (89 A correspondence could be established between features or elements in the video image and audio sources. For example, a correspondence could be established between faces of persons recognised in the image, and audio sources classified as voices.  the correspondence may be user-dependent, and based for example on a face recognition and speaker identification for associating the face of a known person with his voice.); and 
storing the speaker identification parameters ascertained in the first audio sequence in a database (86: feature recognition module 1312 may be arranged for visually detecting and recognizing elements, such as for example human faces, parts of a machine, etc. Some of those elements may be associated with corresponding audio sources; 88 classify the audio source to detect the type of audio source; 89; 92 a classifier may be trained for recognizing speakers in the circle of acquaintances of a specific user, and for augmenting); and 
in an application phase: 
analyzing the audio signal with respect to the stored speaker identification parameters, and thus evaluating the audio signal with respect to a presence of the preferred conversation partner (111; 114); and 
if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal
([0111] The module 1322 may also remove some audio sources, for example all the unselected audio sources, or a specific audio source. For example, in a bar, the module 1322 may remove all the music and speech, with the exception of a specifically selected person whose voice in enhanced; 114; 116).  


	Applicant argues that cited prior art, Ranieri, does not teach the limitations as claimed.  
	Applicant argues on pages 4-6 of the brief that Ranieri fails to teach or disclose the recognition phase and separate application phase (page 6: fails to disclose the separation into a recognition and application phase).  Examiner respectfully disagrees.
	Ranieri teaches analyzing and classifying audio, and modifying the audio based on the recognition and the classification (44; 52).  In order to perform classification, a classifier must first be trained, which Ranieri teaches (and the training can incorporate image information as well).  Ranieri teaches the creation, storage for which audio sources should be classified (44) and 
A neural network system, a hidden Markov system or a hybrid system could be used for classifying an audio source. The classifier may be a self-learning system and trained with feedbacks from the user or from other users. The classifier may receive the processed audio signals from a source, possibly associated visual features, and possibly other information that could help for the classification, such as the location of the audio source. The classification may be user-independent. The classification may be user-dependent. For example, a classifier may be trained for recognizing speakers in the circle of acquaintances of a specific user, and for augmenting (92).
The reference additionally teaches A correspondence could be established between features or elements in the video image and audio sources. For example, a correspondence could be established between faces of persons recognised in the image, and audio sources classified as voices (89).  
Ranieri teaches the training, and once the system has been trained (recognition phase), the stored information can be used for identifying and modifying received audio (application phase).  Paragraph 52 teaches modifying an audio source, with additional paragraphs teaching classification and modification (44-48; 104; 111; 114; 116).  These paragraphs describe capturing audio and comparing it to the stored information for identification (of animals, scenarios (for example, forest), and specific individuals).
Ranieri therefore teaches a recognition phase for obtaining and storing image and speaker identification parameters, and an application phase for recognizing a received audio signal using the parameters.


Applicant argues on pages 5-7 that the prior art does not teach the limitations of the application phase.  Specifically, it appears Applicant is trying to convey that the reference does not teach where the emphasizing is triggered (only) by detecting the conversation partner identification parameters.  Applicant argues that in the application phase the preferred conversation partner is recognized solely from the audio signal and does not require an image capture (page 5 brief), and (only, based on) the detection of 

Examiner respectfully disagrees.  Applicant appears to be reading the claim more narrowly than what is actually recited, and presents a plethora of arguments that are based on an exaggerated analysis of the claim language and an improper analysis of the reference.  The claim contains no limitation reciting that image data is captured only during the recognition phase, no limitation reciting the interplay between the hearing device and auxiliary device (and how this affects image data, when it is captured, and when it is used), and no limitation reciting any selection (or lack thereof) regarding the emphasis.  
The claim recites
in an application phase: 
analyzing the audio signal with respect to the stored speaker identification parameters, and thus evaluating the audio signal with respect to a presence of the preferred conversation partner; and 
if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal.
Thus, according to the claims, in the application phase an audio signal is received and analyzed to determine whether it matches stored parameters, and, if it matches, the signal can be emphasized.
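For illustration only, the two phases as read above can be sketched as follows.  This is a minimal sketch under stated assumptions and is taken neither from the application nor from Ranieri: the names SpeakerDatabase, recognition_phase and application_phase, the cosine-similarity match, the threshold, and the gain are all hypothetical, and the emphasis is simplified to a uniform gain on the whole signal.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SpeakerDatabase:
    # Hypothetical store: partner name -> speaker identification parameters.
    params: dict = field(default_factory=dict)


def recognition_phase(db, partner_in_image, name, audio_features):
    """Store parameters only if the image capture indicates the partner."""
    if partner_in_image:
        db.params[name] = list(audio_features)


def application_phase(db, audio_features, samples, threshold=0.9, gain=2.0):
    """Compare the audio against stored parameters; emphasize on a match.

    Assumption: a match is a cosine similarity at or above the threshold,
    and emphasis is a simple gain applied to every sample.
    """
    for name, stored in db.params.items():
        if cosine(stored, audio_features) >= threshold:
            return name, [s * gain for s in samples]
    # No stored partner detected: the signal is passed through unchanged.
    return None, list(samples)
```

In this sketch the application phase takes only audio-derived inputs, which mirrors the reading that the claim language itself places no requirement on whether an image capture is or is not used in that phase.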

Paragraph [0065] of the Application teaches: In addition, the user of a hearing device 2 may initiate the analysis himself on an ad hoc basis by means of user input, if necessary via the auxiliary device 4, in particular via a dedicated application 15 for the method. In addition, the analysis of the audio signal 12 may also be triggered by a new image capture, in particular in a manner analogous to triggering the analysis in the recognition phase 1, i.e. by facial recognition taking place immediately when the image capture is generated and triggering the analysis in the event that the preferred conversation partner is recognized in a generated image capture.

The arguments also mischaracterize the prior art and confuse the steps and methods.  As mentioned above, according to the claims, in the application phase an audio signal is received and analyzed to determine whether it matches stored parameters, and, if it matches, the signal can be emphasized.  Ranieri, at the very least, explicitly reads on these limitations.
Ranieri (as already discussed above) first performs a training (recognition phase) for classifying audio sources.  During run-time (application phase), audio signals are captured (abstract).  The received audio signals are analyzed and compared against the stored information (44-48; 104; 111; 114; 116 – corresponding to Applicant’s limitation “analyzing the audio signal with respect to the stored speaker identification parameters”).  Once a match is made or an audio source is classified, the module can modify the signal based on specific rules for that source or scenario (52; 104; 111; 114; 116). 
While not required (by claim language), the received audio signals can be solely audio (47-48; 83; 111), allowing for modification based on recognition and classification of the audio source.  These sections also address the case in which the received audio is from a specific person, allowing for recognition of that person based on stored identification parameters (as taught in para 89), and further emphasis of that person's audio based on the identification (48: teacher as audio source when the user is in a classroom.; 111: specifically selected person whose voice in enhanced; corresponding to Applicant’s “if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal”).  Applicant argues (brief pages 6-7), in an attempt to highlight differences from the claimed application, that in Ranieri a user selection must first be obtained in order to perform the modification or enhancement of the sources.  However, Ranieri explicitly teaches that this is not required (there is also no claim language further clarifying how the emphasizing is or is not to be performed).  Ranieri teaches that upon receipt and classification of audio, the selection of the recognized audio source for modification can happen automatically (99: selection of audio source may also be automatic.  The module 1323 may for example determine the most interesting audio sources, or the audio sources which the user most likely wants to augmente or otherwise annotate.; 114 for automatically… selected audio sources to be enhanced).
Therefore, when a particular sound is heard and captured, the sound can be classified and modified based on stored parameters and adjustments.  The sound can be from animals or from a specific individual (a preferred conversation partner, recognized based on previously stored identification parameters).  Thus, the reference can receive and recognize audio alone, and modify it based on the identification parameters.  
While additional sources (location, gaze) can be incorporated to help with determining which audio sources should be modified, they are not required.  Further, these teachings do not take away from the analyzing, evaluating, and emphasizing of the audio signal.  These teachings, when in a particular embodiment or mode, are for determining which audio signal should get additional modification.
	To be clear, Ranieri explicitly teaches receiving, analyzing, evaluating, and emphasizing audio signals.  Ranieri creates scenarios which incorporate rules for customizing audio for a particular speaker.  The additional location and gaze information are enhancements that can be further applied to determine when a user is in a scenario and assist in determining which source to further select for modification based on scenario rules.  Thus, the reference reads on the limitations, and the additional modifications could serve as enhancements, while not taking away from what is already taught.

	Therefore, cited prior art Ranieri reads on all of the limitations of claim 1 as currently recited.

	Regarding claim 17, Applicant argues that the cited prior art, Ranieri, fails to disclose the limitations based on the arguments presented above regarding claim 1.  Examiner respectfully disagrees.  The rejection of claim 17 is maintained based on the arguments presented above and the art rejections of the final action. 

	The rejections of the additional claims are also maintained based on art rejections of the final action.

For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,
/SHAUN ROBERTS/Primary Examiner, Art Unit 2655                                                                                                                                                                                                        
Conferees:
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655                                                                                                                                                                                                 
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657                                                                                                                                                                                                        
Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.