DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 7/9/21 have been fully considered but they are not persuasive. 
            Regarding amended claims 1, 8 and 15, which now incorporate some of the limitations from claims 5, 12 and 19, Applicant argues that the references, including Aviles-Casco, fail to disclose or are silent regarding the limitations “determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information” or “obtaining an object recognition result of the target object according to the object recognition information” (Amendment, pg. 13, second para. – pg. 15, fourth para.). Examiner respectfully disagrees.
Aviles-Casco discloses performing speaker recognition on the voiceprint of received test audio by comparing the voiceprint to speaker models to obtain an observed score, where the observed score is compared to a speaker recognition threshold to determine a target user/speaker and is also compared to a reliability threshold to confirm the target user/speaker (para. [0017]; para. [0058]; para. [0065]). This corresponds to the claimed “determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information” and “obtaining an object recognition result of the target object according to the object recognition information”.
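For illustration only, the two comparisons attributed to Aviles-Casco above can be sketched in Python as follows. This is a minimal sketch and not code from the reference: the function name, the parameter names, and the treatment of a non-target trial as not reliable are assumptions made here. The quoted passages describe only the case in which the observed score exceeds the speaker recognition threshold.

```python
from typing import NamedTuple


class TrialDecision(NamedTuple):
    is_target: bool    # observed score exceeds the speaker recognition threshold
    is_reliable: bool  # target decision also passes the reliability threshold


def evaluate_trial(observed_score: float,
                   p_hidden_above_threshold: float,
                   recognition_threshold: float,
                   reliability_threshold: float) -> TrialDecision:
    """Hypothetical helper mirroring the comparisons quoted from para. [0058] and [0065].

    p_hidden_above_threshold stands for the probability that the hidden score
    is above the speaker recognition threshold, given the quality measures and
    the observed score. How that probability is computed is outside this sketch.
    """
    is_target = observed_score > recognition_threshold
    # Assumption: a trial that is not decided as target is reported as not reliable.
    is_reliable = is_target and p_hidden_above_threshold > reliability_threshold
    return TrialDecision(is_target, is_reliable)
```

A caller would pass the observed score together with both thresholds and read the two flags from the returned tuple.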
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.       Claims 1-4, 8-11 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Tomlin et al US PGPUB 2015/0302869 A1 (“Tomlin”) in view of Aviles-Casco et al US 2016/0307572 A1 (“Aviles-Casco”)
       Per Claim 1, Tomlin discloses an object recognition method performed by a computer, comprising: 
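          obtaining speech information of a target object in a current speech environment and position information of the target object (a person's location may be determined in a variety of ways, including for example using speech source locators…, para. [0058]; if the device 104 detects audio voice data…, para. [0059]);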
          extracting voiceprint feature from the speech information based on a trained voiceprint matching model, to obtain voiceprint feature information (For example, a controller (explained below) associated with an HMD device 104 may store a known voice list correlating audio voice data to certain family, friends, associates…if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]; para. [0067], correlating voice data as involving feature extraction); 
         obtaining a voice confidence value corresponding to the voiceprint feature information (While engaged with content, for example wearing HMD device 104, if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]); and 
           Tomlin discloses obtaining an object recognition result of the target object (para. [0049]; para. [0058]-[0059]; para. [0062]).
          Tomlin does not explicitly disclose determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information, or obtaining an object recognition result of the target object according to the object recognition information.
           However, these features are taught by Aviles-Casco:
          determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information (para. [0017]; it may be decided that a person is the target if the observed score is larger than the speaker recognition threshold…, para. [0058]; a trial may be considered reliable if the observed score is considered target (e.g. the observed score of the speaker recognition system is higher the speaker recognition threshold) and the probability of the hidden score to be higher than (and optionally equal to) the speaker recognition threshold given the quality measures and the observed score is higher than a given reliability threshold…, para. [0065], observed score as compared to speaker recognition threshold and reliability threshold, comparison as determining target speaker/object recognition as the provider of voiceprint feature information); and
          obtaining an object recognition result of the target object according to the object recognition information (it may decide that the person is the target (meaning that the testing audio was spoken by the assumed person) …, para. [0058]; para. [0065]).
          It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Aviles-Casco with the method of Tomlin, because such a combination would have resulted in reliably determining which of a number of persons whose voice prints are known to the system is speaking (Aviles-Casco, Abstract; para. [0002]; para. [0103]).
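For illustration only, the claimed step of selecting object recognition information by comparing a voice confidence value against two thresholds can be sketched in Python as below. The claim language recited above does not state which outcome corresponds to which range, so the mapping used here (high confidence selects the voiceprint, middle confidence selects the combination, low confidence selects the position) and the threshold values are assumptions, as are all names.

```python
from enum import Enum


class RecognitionInfo(Enum):
    POSITION = "position information"
    VOICEPRINT = "voiceprint feature information"
    COMBINED = "position information and voiceprint feature information"


def select_recognition_info(confidence: float,
                            high_threshold: float = 0.9,
                            low_threshold: float = 0.5) -> RecognitionInfo:
    """Pick the object recognition information from two threshold comparisons.

    The range-to-outcome mapping and the default thresholds are hypothetical.
    """
    if low_threshold > high_threshold:
        raise ValueError("low_threshold must not exceed high_threshold")
    if confidence >= high_threshold:
        return RecognitionInfo.VOICEPRINT   # voiceprint alone is trusted
    if confidence > low_threshold:
        return RecognitionInfo.COMBINED     # voiceprint backed up by position
    return RecognitionInfo.POSITION         # voiceprint too weak to use
```

The object recognition result would then be obtained from whichever information the function selects.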
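        Per Claim 2, Tomlin in view of Aviles-Casco discloses the method according to claim 1, 
          Tomlin discloses: obtaining a speech information set in the current speech environment based on a microphone array (para. [0043]);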
          performing screening processing on the speech information set, to obtain the speech information of the target object after the screening processing (para. [0058]-[0060]); 
         obtaining phase information of the microphone array during acquiring of the speech information set (para. [0043]); and 
         determining the position information of the target object based on a speech position indicated by the phase information (para. [0043]). 
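For illustration only, one common way to turn microphone-array phase information into a speech position is a far-field direction-of-arrival estimate for a two-microphone pair. The Python sketch below uses that technique as an assumption. Neither the claim language nor the cited paragraph of Tomlin, as quoted in this action, specifies this formula, and the function name, parameters, and speed-of-sound constant are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly room temperature


def arrival_angle_deg(phase_diff_rad: float,
                      freq_hz: float,
                      mic_spacing_m: float) -> float:
    """Estimate the arrival angle, in degrees from broadside, for a two-microphone pair.

    Far-field, single-frequency model:
        sin(theta) = c * delta_phi / (2 * pi * f * d)
    """
    sin_theta = (SPEED_OF_SOUND_M_S * phase_diff_rad
                 / (2.0 * math.pi * freq_hz * mic_spacing_m))
    # Clamp so measurement noise cannot push the value outside asin's domain.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))
```

A zero phase difference corresponds to a source directly in front of the pair. Larger spacings or higher frequencies make the phase ambiguous, which this sketch does not handle.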
        Per Claim 3, Tomlin in view of Aviles-Casco discloses the method according to claim 1, 
             Aviles-Casco discloses:
             wherein before the obtaining speech information of a target object in a current speech environment and position information of the target object, the method further comprises: obtaining a voiceprint training speech set (para. [0002]; para. [0020]); and 
            training a voiceprint matching model based on voiceprint training speeches in the voiceprint training speech set and sample feature information corresponding to the voiceprint training speeches, to generate the trained voiceprint matching model (para. [0002]; para. [0020])
          Per Claim 4, Tomlin in view of Aviles-Casco discloses the method according to claim 3, 

           determining the voice confidence value corresponding to the voiceprint feature information according to the matching degree value (para. [0031]; para. [0054]-[0062]). 
         Per Claim 8, Tomlin discloses a computer device, comprising a processor and a memory, the memory storing a computer-readable instruction, and when executed by the processor, the computer-readable instruction causing the processor to perform the operations (para. [0094]-[0098]) including:
           obtaining speech information of a target object in a current speech environment and position information of the target object (a person's location may be determined in a variety of ways, including for example using speech source locators…, para. [0058]; if the device 104 detects audio voice data…, para. [0059]);
           extracting voiceprint feature from the speech information based on a trained voiceprint matching model, to obtain voiceprint feature information (For example, a controller (explained below) associated with an HMD device 104 may store a known voice list correlating audio voice data to certain family, friends, associates… if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]; para. [0067], correlating voice data as involving feature extraction);
          obtaining a voice confidence value corresponding to the voiceprint feature information (While engaged with content, for example wearing HMD device 104, if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]); and 
           Tomlin discloses obtaining an object recognition result of the target object (para. [0049]; para. [0058]-[0059]; para. [0062]).
          Tomlin does not explicitly disclose determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information, or obtaining an object recognition result of the target object according to the object recognition information.
           However, these features are taught by Aviles-Casco:
          determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information (para. [0017]; it may be decided that a person is the target if the observed score is larger than the speaker recognition threshold…, para. [0058]; a trial may be considered reliable if the observed score is considered target (e.g. the observed score of the speaker recognition system is higher the speaker recognition threshold) and the probability of the hidden score to be higher than (and optionally equal to) the speaker recognition threshold given the quality measures and the observed score is higher than a given reliability threshold…, para. [0065], observed score as compared to speaker recognition threshold and reliability threshold, comparison as determining target speaker/object recognition as the provider of voiceprint feature information); and
          obtaining an object recognition result of the target object according to the object recognition information (it may decide that the person is the target (meaning that the testing audio was spoken by the assumed person) …, para. [0058]; para. [0065]).
          It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Aviles-Casco with the device of Tomlin, because such a combination would have resulted in reliably determining which of a number of persons whose voice prints are known to the system is speaking (Aviles-Casco, Abstract; para. [0002]; para. [0103]).
           Per Claim 9, Tomlin in view of Aviles-Casco discloses the computer device according to claim 8, 
               Tomlin discloses wherein when executed by the processor, the computer-readable instruction causes the processor to perform: obtaining a speech information set in the current speech environment based on a microphone array (para. [0043]);
              performing screening processing on the speech information set, to obtain the speech information of the target object after the screening processing (para. [0058]-[0060]);
              obtaining phase information of the microphone array during acquiring of the speech information set (para. [0043]); and 
              determining the position information of the target object based on a speech position indicated by the phase information (para. [0043]). 
           Per Claim 10, Tomlin in view of Aviles-Casco discloses the computer device according to claim 8, wherein when executed by the processor, the computer-readable instruction causes the processor to further perform (Tomlin, para. [0094]-[0098]): 
             Device Claim 10 and method claim 3 are related as Device and the Method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly claim 10 is similarly rejected under the same rationale as applied above with respect to claim 3.
           Per Claim 11, Tomlin in view of Aviles-Casco discloses the computer device according to claim 10, 
             Tomlin discloses wherein when executed by the processor, the computer-readable instruction causes the processor to perform (para. [0094]-[0098]): 
           Device Claim 11 and method claim 4 are related as Device and the Method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly claim 11 is similarly rejected under the same rationale as applied above with respect to claim 4.
           Per Claim 15, Tomlin discloses a non-transitory computer-readable storage medium, storing a computer-readable instruction, and when executed by one or more processors, the computer-readable instruction causing the one or more processors to perform (para. [0094]-[0098]):
          obtaining speech information of a target object in a current speech environment and position information of the target object (a person's location may be determined in a variety of ways, including for example using speech source locators…, para. [0058]; if the device 104 detects audio voice data…, para. [0059]);
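          extracting voiceprint feature from the speech information based on a trained voiceprint matching model, to obtain voiceprint feature information (For example, a controller (explained below) associated with an HMD device 104 may store a known voice list correlating audio voice data to certain family, friends, associates… if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]; para. [0067], correlating voice data as involving feature extraction);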
         obtaining a voice confidence value corresponding to the voiceprint feature information (While engaged with content, for example wearing HMD device 104, if the device 104 detects audio voice data correlating to some confidence level with a voice on the known voice list…, para. [0059]); and 
         Tomlin discloses obtaining an object recognition result of the target object (para. [0049]; para. [0058]-[0059]; para. [0062]).
         Tomlin does not explicitly disclose determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information, or obtaining an object recognition result of the target object according to the object recognition information.
           However, these features are taught by Aviles-Casco:
          determining, according to comparisons between the voice confidence value and two voice confidence value thresholds respectively, object recognition information as the position information, the voiceprint feature information, or a combination of the position information and the voiceprint feature information (para. [0017]; it may be decided that a person is the target if the observed score is larger than the speaker recognition threshold…, para. [0058]; a trial may be considered reliable if the observed score is considered target (e.g. the observed score of the speaker recognition system is higher the speaker recognition threshold) and the probability of the hidden score to be higher than (and optionally equal to) the speaker recognition threshold given the quality measures and the observed score is higher than a given reliability threshold…, para. [0065], observed score as compared to speaker recognition threshold and reliability threshold, comparison as determining target speaker/object recognition as the provider of voiceprint feature information); and
          obtaining an object recognition result of the target object according to the object recognition information (it may decide that the person is the target (meaning that the testing audio was spoken by the assumed person) …, para. [0058]; para. [0065]).
          It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Aviles-Casco with the medium of Tomlin, because such a combination would have resulted in reliably determining which of a number of persons whose voice prints are known to the system is speaking (Aviles-Casco, Abstract; para. [0002]; para. [0103]).
          Per Claim 16, Tomlin in view of Aviles-Casco discloses the computer-readable storage medium according to claim 15, 
             Tomlin discloses that, when performing the operation of obtaining speech information of a target object in a current speech environment and position information of the target object, the computer-readable instructions further cause the processor to perform: obtaining a speech information set in the current speech environment based on a microphone array (Tomlin, para. [0043]; para. [0098]);
          performing screening processing on the speech information set, to obtain the speech information of the target object after the screening processing (para. [0058]-[0060]);
          obtaining phase information of the microphone array during acquiring of the speech information set (para. [0043]); and 
          determining the position information of the target object based on a speech position indicated by the phase information (para. [0043]).
        Per Claim 17, Tomlin in view of Aviles-Casco discloses the computer-readable storage medium according to claim 15, wherein the computer-readable instructions further cause the processor to perform (Tomlin, para. [0094]-[0098]):
            Medium Claim 17 and method claim 3 are related as Medium and the Method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly claim 17 is similarly rejected under the same rationale as applied above with respect to claim 3.
          Per Claim 18, Tomlin in view of Aviles-Casco discloses the computer-readable storage medium according to claim 17, wherein the computer-readable instructions further cause the processor to perform (Tomlin, para. [0094]-[0098]):
             Tomlin discloses wherein when executed by the processor, the computer-readable instruction causes the processor to perform the following operations (para. [0094]-[0098])
             Medium Claim 18 and method claim 4 are related as Medium and the Method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly claim 18 is similarly rejected under the same rationale as applied above with respect to claim 4.

Allowable Subject Matter
Claims 6, 7, 13, 14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art fails to explicitly disclose the limitations recited in claims 6, 13 and 20.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See Welbourne (PTO 892 form) disclosing obtaining object recognition result using confidence, position and voice print information.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUJIMI A ADESANYA whose telephone number is (571)270-3307.  The examiner can normally be reached on 8:30-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 


/OLUJIMI A ADESANYA/
Primary Examiner, Art Unit 2658