Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1.	This action is responsive to remarks filed 9/24/2021.
Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
	NPL documents submitted 2/5/2020 and 8/28/2020 have not been included on any IDS.
Response to Amendment
3.	The drawings submitted are accepted and the objection is overcome.
Response to Arguments
4.	Applicant's arguments filed 9/24/2021 have been considered but are not persuasive.
Applicant appears to argue on pages 10-11 of remarks that the cited art of record, Ranieri, does not teach the limitations of claim 1.
Applicant argues on page 11 that Ranieri fails to disclose the separation into a recognition phase and an application phase such that the presence of a preferred conversation partner is inferred only in the recognition phase, and on page 12 that, in Ranieri, there is no separation between the features on which this training is to be based and the features of a recognition during normal operation.

	
Regarding claim 1 Ranieri teaches A method for individualized signal processing of an audio signal of a hearing device (abstract method; 114 apparatus may be used as intelligent hearing aid for automatically or manually selecting audio sources to be enhanced), the method comprising: 
in a recognition phase (92): 
generating a first image capture with an auxiliary device (86 image captured; 89; 92); 
inferring a presence of a preferred conversation partner from the first image capture, and based thereon, analyzing a first audio sequence of the audio signal and/or an auxiliary audio signal of the auxiliary device for characteristic speaker identification parameters; (89 A correspondence could be established between features or elements in the video image and audio sources. For example, a correspondence could be established between faces of persons recognised in the image, and audio sources classified as voices.  the correspondence may be user-dependent, and based for example on a face recognition and speaker identification for associating the face of a known person with his voice.); and 
storing the speaker identification parameters ascertained in the first audio sequence in a database (86: feature recognition module 1312 may be arranged for visually detecting and recognizing elements, such as for example human faces, parts of a machine, etc. Some of those elements may be associated with corresponding audio sources; 88 classify the audio source to detect the type of audio source; 89; 92 a classifier may be trained for recognizing speakers in the circle of acquaintances of a specific user, and for augmenting); and 
in an application phase: 
analyzing the audio signal with respect to the stored speaker identification parameters, and thus evaluating the audio signal with respect to a presence of the preferred conversation partner (111; 114); and 
if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal
([0111] The module 1322 may also remove some audio sources, for example all the unselected audio sources, or a specific audio source. For example, in a bar, the module 1322 may remove all the music and speech, with the exception of a specifically selected person whose voice is enhanced; 114; 116).  

	Ranieri teaches recognizing and classifying audio, and modifying the audio based on the classification (44; 52).  Paragraph 89 teaches that a correspondence could be established between features or elements in the video image and audio sources; for example, between faces of persons recognised in the image and audio sources classified as voices.  Paragraph 44 teaches the creation and storage of classified audio sources.  Paragraph 52 teaches modifying an audio source.  The reference teaches recognizing/classifying audio (92) and presenting modifications based on the classification, as discussed in 111 and 114.
	Therefore, the claims as currently recited do not yet overcome the current art of record, and the rejection is maintained.
	The additional claims are rejected based on arguments presented above and art rejections below.


Claim Rejections - 35 USC § 102
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


7.	Claims 1-4, 11-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ranieri et al (2017/0188173).

Regarding claim 1 Ranieri teaches A method for individualized signal processing of an audio signal of a hearing device (abstract method; 114 apparatus may be used as intelligent hearing aid for automatically or manually selecting audio sources to be enhanced), the method comprising: 
in a recognition phase (92): 
generating a first image capture with an auxiliary device (86 image captured; 89; 92); 
inferring a presence of a preferred conversation partner from the first image capture, and based thereon, analyzing a first audio sequence of the audio signal and/or an auxiliary audio signal of the auxiliary device for characteristic speaker identification parameters; (89 A correspondence could be established between features or elements in the video image and audio sources. For example, a correspondence could be established between faces of persons recognised in the image, and audio sources classified as voices.  the correspondence may be user-dependent, and based for example on a face recognition and speaker identification for associating the face of a known person with his voice.); and 
storing the speaker identification parameters ascertained in the first audio sequence in a database (86: feature recognition module 1312 may be arranged for visually detecting and recognizing elements, such as for example human faces, parts of a machine, etc. Some of those elements may be associated with corresponding audio sources; 88 classify the audio source to detect the type of audio source; 89; 92 a classifier may be trained for recognizing speakers in the circle of acquaintances of a specific user, and for augmenting); and 
in an application phase: 
analyzing the audio signal with respect to the stored speaker identification parameters, and thus evaluating the audio signal with respect to a presence of the preferred conversation partner (111; 114); and 
if the presence of the preferred conversation partner is detected, emphasizing the preferred conversation partner's signal contributions in the audio signal
([0111] The module 1322 may also remove some audio sources, for example all the unselected audio sources, or a specific audio source. For example, in a bar, the module 1322 may remove all the music and speech, with the exception of a specifically selected person whose voice is enhanced; 114; 116).  

Regarding claim 2 Ranieri teaches The method according to claim 1, which comprises recognizing the preferred conversation partner in the first image capture by way of facial recognition (89 face recognition).  

Regarding claim 3 Ranieri teaches The method according to claim 1, which comprises using a mobile telephone as the auxiliary device (fig 1 smartglasses; 68; 120 smartphones).  

68; 73-74; 89; 92 – device for analyzing audio signal).  

Regarding claim 11 Ranieri teaches The method according to claim 1, which comprises, in the application phase, initiating the step of analyzing the audio signal based on an additional image capture of the auxiliary device (86; 89).  

Regarding claim 12 Ranieri teaches The method according to claim 1, which comprises: 
in the first image capture, determining a number of persons present (88; 92; 111; 114); and 
analyzing the first audio sequence of the audio signal, or of the auxiliary audio signal of the auxiliary device, as a function of the number of persons present (89; 92; 111; 114  multiple people/audio in an area).  

Regarding claim 13 Ranieri teaches The method according to claim 1, which comprises: 
generating the first image capture as part of a first image sequence (89; 92); 
in the first image sequence, detecting a speech activity of the preferred conversation partner (89; 92); and 
111; 114 The apparatus may also be used for selectively selecting whose talkers need to be amplified in a crowd. For example, a user may select who he wants to listen in a noisy café, simply by looking them or by selecting the person with any audio source selection means.).  

Regarding claim 14 Ranieri teaches The method according to claim 1, wherein the step of emphasizing the signal contributions of the preferred conversation partner is based on directional signal processing and/or blind source separation (81).  

Regarding claim 15 Ranieri teaches A system, comprising: 
a hearing device; 
an auxiliary device configured to generate an image capture; and said hearing device and said auxiliary device being commonly configured to perform the method according to claim 1.  
Rejected for similar rationale and reasoning as claims 1 and 3.

Regarding claim 16 Ranieri teaches The system according to claim 15, wherein said auxiliary device is a mobile telephone (73 smartphone; 120).  

73; 120), for: 
generating and/or detecting at least one image capture (89; 92); 
automatically recognizing a person in the at least one image capture who has been predefined as a preferred person (89; 92); and 
generating a start command for recording a first audio sequence of an audio signal and/or a start command for analyzing an audio sequence or the first audio sequence for characteristic speaker identification parameters in order to recognize the preferred person (39; 86; 89)
	Also rejected for similar rationale and reasoning as claim 1.

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9.	Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al (2017/0188173) in view of Visser et al (2015/0301796).


Regarding claim 5 Ranieri does not specifically teach where Visser teaches The method according to claim 1, which comprises analyzing at least one speaker identification parameter selected from the group consisting of: 
a number of pitches; a number of formant frequencies; a number of phonospectra; a distribution of stresses; a chronological sequence of phones; and a chronological sequence of speech pauses (Visser 110: characteristics, pitch related feature; energy related features; 114-115: MFCC; phoneme; 120).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the speaker identification parameters of Visser into the method of Ranieri, with a reasonable expectation of success in still allowing for speaker identification.

Regarding claim 6 Ranieri does not specifically teach where Visser teaches The method according to claim 1, which comprises: 
decomposing the first audio sequence into a plurality of sub-sequences (105 frames); 
ascertaining for each of the respective sub-sequences a speech intelligibility parameter and/or a signal-to-noise ratio and comparing with an associated criterion (121; 123 the validation module may determine a signal to noise ratio (SNR); may determine audio corresponds to non-standard speech in response to determining SNR fails to satisfy an SNR threshold); 
121; 123).  
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Visser for an improved system to only process audio of appropriate quality for accurate identification.


10.	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al (2017/0188173) in view of Claussen et al (2011/0261983).

Regarding claim 7 Ranieri does not specifically teach where Claussen teaches The method according to claim 1, which comprises: 
decomposing the first audio sequence into a plurality of sub-sequences (46: each time instance); 
monitoring in the hearing device a user's own speech activity (abstract own voice recognition for hearing aids); and 
for the analysis with regard to the characteristic speaker identification parameters, using only those sub-sequences having a proportion of the user's own speech activity that does not exceed a predetermined upper limit (Claussen 47 electronic control in the hearing aid that controls the beamforming property is adapted to own voice recognition or lessened or even cancelled when own voice recognition occurs. This may improve the user's experience.).


11.	Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ranieri et al (2017/0188173) in view of Heigold et al (2017/0069327).

Regarding claim 8 Ranieri teaches The method according to claim 1, which comprises: 
generating a second image capture with the auxiliary device and, in response to the second image capture, analyzing a second audio sequence of the audio signal and/or of an auxiliary audio signal of the auxiliary device with regard to characteristic speaker identification parameters (86; 89; 92 – can obtain multiple image and audio information);
but does not specifically teach
adapting the speaker identification parameters that are stored in the database by way of the speaker identification parameters ascertained from the second audio sequence.  
Ranieri teaches [0092] A neural network system, a hidden Markov system or a hybrid system could be used for classifying an audio source. The classifier may be a self-learning system and trained with feedbacks from the user or from other users. The classifier may receive the processed audio signals from a source, possibly associated visual features, and possibly other information that could help for the classification, such as the location of the audio source. The classification may be user-independent. The classification may be user-dependent. For example, a classifier may be trained for recognizing speakers in the circle of acquaintances of a specific user, and for augmenting
Heigold teaches adapting the speaker identification parameters that are stored in the database by way of the speaker identification parameters ascertained from the second audio sequence (99 if the speaker verification model determines with sufficient confidence that the verification utterance was spoken by the enrolled speaker, the speaker model for the enrolled user may then be updated based on the verification utterance.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Heigold for an improved system allowing for the updating of speaker profiles or models to continuously improve speaker identification.

Regarding claim 9 Heigold teaches The method according to claim 8, wherein the step of adapting the speaker identification parameters stored in the database using the speaker identification parameters that were ascertained from the second audio sequence comprises using averaging and/or an artificial neural network (99 The speaker model may be updated by combining (e.g., averaging) the speaker representation generated by the neural network for the third verification utterance with other speaker representations from enrollment utterances of the user that were used to create the speaker model in the first instance.).  
Rejected for similar rationale and reasoning as claim 8 above

Regarding claim 10 Heigold teaches The method according to claim 8, which comprises terminating the recognition phase when a deviation of the speaker identification parameters that were ascertained from the second audio sequence, from among the speaker identification parameters stored in the database, falls below a threshold value (99 the similarity score for the second verification utterance is not sufficiently high for the enrolled user's speaker model to be updated based on the second verification utterance.).  
Rejected for similar rationale and reasoning as claim 8 above, and further allowing for updating to not continue or be performed if received speaker parameters are no longer close enough.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571)270-7541.  The examiner can normally be reached Monday-Friday 9-5 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAUN ROBERTS/
Primary Examiner, Art Unit 2657