Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
                                                             Introduction
2.   This action is in response to the preliminary amendment filed on 12-02-2020. Claims 1-20 have been canceled and claims 21-40 have been added. Claims 21-40 are pending.

Double Patenting
3.  The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
4.   Claims 21-40 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-13 of U.S. Patent No. 10,887,690. Although the claims at issue are not identical, they are not patentably distinct from each other because the claim limitations of the instant application (17/109,597) are broader than those of claims 1-13 of U.S. Patent No. 10,887,690, as shown in the comparison below.
Instant Application No. 17/109,597, claim 21:
21. (New) A device comprising: a camera configured to obtain a real-time image of a sound object; one or more processors configured to determine a sound source position of the sound object relative to the device based on the real-time image of the sound object, activate a voice interaction process between the sound object and the device, and determine voice content of the sound object is relevant to the device via semantical analysis of the voice content; and a microphone array configured to perform a sound enhancement on sound data of the sound object according to the sound source position.

U.S. Patent No. 10,887,690, claim 1:
1.  A method implemented by an interactive device, the method comprising: determining a sound source position of a sound object relative to the interactive device based on a real-time image of the sound object; activating a voice interaction process between the sound object and the interactive device in response to determining the sound source position of the sound object; obtaining voice content of the sound object via the interactive device; performing semantical analysis on the voice content; determining that voice content of the sound object is relevant to the interactive device based on the semantical analysis; and performing a sound enhancement on sound data of the sound object based on the sound source position.




Claim Rejections - 35 USC § 102
5.  In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6.  The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.



7.  Claims 21, 29 and 36 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by White et al. (US 2022/0215837).
    Consider Claim 21, White teaches a device comprising: a camera (see fig. 2(206) and paragraphs [0056]-[0061]) configured to obtain a real-time image of a sound object (see fig. 1); one or more processors (see fig. 2) configured to determine a sound source position of the sound object relative to the device based on the real-time image of the sound object, activate a voice interaction process between the sound object and the device (see figs. 1-5 and paragraphs [0058]-[0071]), and determine voice content of the sound object is relevant to the device via semantical analysis of the voice content; and a microphone array configured to perform a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).
    Consider Claim 29, White teaches a method implemented by a device (see fig. 1), the method comprising: obtaining a real-time image of a sound object (see fig. 2(206)); determining a sound source position of the sound object relative to the device based on the real-time image of the sound object (see figs. 1-5 and paragraphs [0058]-[0071]); activating a voice interaction process between the sound object and the device; determining voice content of the sound object is relevant to the device via semantical analysis of the voice content; and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).
    Consider Claim 36, White teaches one or more computer readable media storing executable instructions that, when executed by one or more processors (see fig. 2(200)), cause the one or more processors to perform acts comprising: obtaining a real-time image of a sound object (see fig. 2(206)); determining (see fig. 2) a sound source position of the sound object relative to the device based on the real-time image of the sound object (see figs. 1-5 and paragraphs [0058]-[0071]); activating a voice interaction process between the sound object and the device; determining (see fig. 2) voice content of the sound object is relevant to the device via semantical analysis of the voice content (see paragraphs [0113]-[0116]); and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).

 

Claim Rejections - 35 USC § 103
8.   In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
9.   The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

10.  The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action. 
11.  Claims 21-40 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai et al. (US 2009/0030552) in view of White et al. (US 2022/0215837).
    Consider Claim 21, Nakadai teaches a device comprising: a camera (see fig. 3(15) and paragraphs [0045]-[0053]) configured to obtain a real-time image of a sound object (see fig. 1); one or more processors (see fig. 4) configured to determine a sound source position of the sound object relative to the device based on the real-time image of the sound object, activate a voice interaction process between the sound object and the device (see figs. 8-12 and paragraphs [0058]-[0074]), and determine voice content of the sound object is relevant to the device; and a microphone array configured to perform a sound enhancement on sound data of the sound object according to the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); but Nakadai does not explicitly teach determining that voice content of the sound object is relevant to the device via semantical analysis of the voice content.
    However, White teaches a camera (see fig. 2(206) and paragraphs [0056]-[0061]) configured to obtain a real-time image of a sound object (see fig. 1); one or more processors (see fig. 2) configured to determine a sound source position of the sound object relative to the device based on the real-time image of the sound object, activate a voice interaction process between the sound object and the device (see figs. 1-5 and paragraphs [0058]-[0071]), and determine voice content of the sound object is relevant to the device via semantical analysis of the voice content; and a microphone array configured to perform a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).
    Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nakadai with the teaching of White to provide context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
    Consider Claims 22 and 23, Nakadai as modified by White teaches the device wherein to determine the sound source position of the sound object relative to the device based on the real-time image of the sound object, the one or more processors are further configured to: determine whether the sound object is facing the device (see figs. 8-12 and paragraphs [0058]-[0074]); determine a horizontal angle and a vertical angle of a sounding portion of the sound object relative to the device in response to determining that the sound object is facing the device; and set the horizontal angle and the vertical angle of the sounding portion relative to the device as the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the device wherein to determine the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device, the one or more processors are further configured to: form an arc centered at the device covering a viewing angle of the device, a diameter of the arc corresponding to a length of an image frame (see figs. 8-12 and paragraphs [0058]-[0074]); equally divide the arc, and using projections of equal diversion points on an imaging frame as scales; determine a scale in which a sounding portion of a target object is located on the imaging frame; and determine angles corresponding to the determined scale as the horizontal angle and the vertical angle of the sounding portion relative to the device (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]).
    Consider Claims 24 and 25, Nakadai as modified by White teaches the device wherein to determine the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device, the one or more processors are further configured to: determine a size of a marking area of a target object in an imaging frame (see figs. 8-12 and paragraphs [0058]-[0074]), wherein a sounding part is located in the marking area; determine a distance of the target object from a camera according to the size of the marking area in the imaging frame; and calculate the horizontal angle and the vertical angle of the sounding part relative to the device through an inverse trigonometric function based on the determined distance (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the device wherein to perform the sound enhancement on the sound data of the sound object according to the sound source position, the one or more processors are further configured to: perform a directional enhancement on sound from the sound source position; and perform a directional suppression on sound from positions other than the sound source position (see figs. 8-12 and paragraphs [0058]-[0074]).
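For illustration only, the distance and inverse-trigonometric steps recited in claim 24 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Nakadai, White, or the application: it assumes a simple pinhole camera model, a known real-world height for the marking area, and a focal length expressed in pixels. The function and parameter names are hypothetical.

```python
import math


def estimate_distance_m(marker_height_px, real_height_m, focal_length_px):
    """Assumed pinhole model: distance = focal_length * real_height / image_height."""
    if marker_height_px <= 0:
        raise ValueError("marker_height_px must be positive")
    return focal_length_px * real_height_m / marker_height_px


def source_angles_deg(center_px, frame_size_px, distance_m, focal_length_px):
    """Horizontal and vertical angles of the marking-area centre relative to the optical axis."""
    cx, cy = center_px
    width, height = frame_size_px
    # Convert pixel offsets from the frame centre into metres at the estimated distance.
    dx_m = (cx - width / 2) * distance_m / focal_length_px
    dy_m = (height / 2 - cy) * distance_m / focal_length_px
    # Inverse trigonometric step: angle = atan(offset / distance).
    horizontal = math.degrees(math.atan2(dx_m, distance_m))
    vertical = math.degrees(math.atan2(dy_m, distance_m))
    return horizontal, vertical


# Hypothetical numbers: a 0.2 m tall marking area that spans 160 px
# with an 800 px focal length on a 1280 x 720 frame.
distance = estimate_distance_m(160, 0.2, 800)
angles = source_angles_deg((1040, 360), (1280, 720), distance, 800)
```

Under these assumptions, a marking area to the right of the frame centre yields a positive horizontal angle, and one above the centre yields a positive vertical angle.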
    Consider Claims 26-28, Nakadai as modified by White teaches the device wherein to perform the sound enhancement on the sound data of the sound object according to the sound source position, the one or more processors are further configured to: perform directional de-noising on the sound data through the microphone array (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the device wherein to determine the sound source location of the sound object relative to the device based on the real-time image of the sound object, the one or more processors are further configured to: determine the sound object of the sound data according to one of the following rules (see figs. 8-12 and paragraphs [0058]-[0074]): treating an object of a plurality of objects that is at the shortest linear distance from the device as the sound object; or treating the object of the plurality of objects with the largest angle facing towards the device as the sound object (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the device wherein the microphone array comprises at least one of a directional microphone array or an omni-directional microphone array (in White, see figs. 1-5 and paragraphs [0054]-[0070]).
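For illustration only, the two alternative selection rules recited in claim 27 can be sketched as follows. This is a hedged sketch and not an implementation from either reference: the `Candidate` type, its fields, and the reading of "largest angle facing towards the device" as a single numeric score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance_m: float   # linear distance from the device (assumed to be measured already)
    facing_deg: float   # hypothetical facing score: larger means more directly facing the device


def pick_by_shortest_distance(candidates):
    """Rule 1: treat the object at the shortest linear distance as the sound object."""
    return min(candidates, key=lambda c: c.distance_m)


def pick_by_largest_facing_angle(candidates):
    """Rule 2: treat the object with the largest facing angle as the sound object."""
    return max(candidates, key=lambda c: c.facing_deg)


people = [
    Candidate("A", distance_m=2.5, facing_deg=80.0),
    Candidate("B", distance_m=1.2, facing_deg=35.0),
]
nearest = pick_by_shortest_distance(people)        # selects "B"
most_facing = pick_by_largest_facing_angle(people)  # selects "A"
```

The two rules can select different objects for the same scene, which is why the claim recites them as alternatives.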
    Consider Claim 29, Nakadai teaches a method implemented by a device (see fig. 3), the method comprising: obtaining a real-time image of a sound object (see fig. 3(15) and paragraphs [0045]-[0053]); determining a sound source position of the sound object relative to the device based on the real-time image of the sound object (see figs. 8-12 and paragraphs [0058]-[0074]); activating a voice interaction process between the sound object and the device; determining voice content of the sound object is relevant to the device through analysis of the voice content; and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); but Nakadai does not explicitly teach determining that voice content of the sound object is relevant to the device via semantical analysis of the voice content.
    However, White teaches a method comprising: obtaining a real-time image of a sound object (see fig. 2(206)); determining a sound source position of the sound object relative to the device based on the real-time image of the sound object (see figs. 1-5 and paragraphs [0058]-[0071]); activating a voice interaction process between the sound object and the device; determining voice content of the sound object is relevant to the device via semantical analysis of the voice content; and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).
    Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nakadai with the teaching of White to provide context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
    Consider Claims 30 and 31, Nakadai as modified by White teaches the method wherein determining the sound source position of the sound object relative to the device based on the real-time image of the sound object further comprises: determining whether the sound object is facing the device (see figs. 8-12 and paragraphs [0058]-[0074]); determining a horizontal angle and a vertical angle of a sounding portion of the sound object relative to the device in response to determining that the sound object is facing the device; and setting the horizontal angle and the vertical angle of the sounding portion relative to the device as the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the method wherein determining the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device further comprises: forming an arc centered at the device covering a viewing angle of the device, a diameter of the arc corresponding to a length of an image frame (see figs. 8-12 and paragraphs [0058]-[0074]); equally dividing the arc, and using projections of equal diversion points on an imaging frame as scales; determining a scale in which a sounding portion of a target object is located on the imaging frame; and determining angles corresponding to the determined scale as the horizontal angle and the vertical angle of the sounding portion relative to the device (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]).
    Consider Claims 32 and 33, Nakadai as modified by White teaches the method wherein determining the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device further comprises: determining a size of a marking area of a target object in an imaging frame, wherein a sounding part is located in the marking area (see figs. 8-12 and paragraphs [0058]-[0074]); determining a distance of the target object from a camera according to the size of the marking area in the imaging frame; and calculating the horizontal angle and the vertical angle of the sounding part relative to the device through an inverse trigonometric function based on the determined distance (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the method wherein performing the sound enhancement on the sound data of the sound object according to the sound source position further comprises: performing a directional enhancement on sound from the sound source position; and performing a directional suppression on sound from positions other than the sound source position (see figs. 8-12 and paragraphs [0058]-[0074]).
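For illustration only, one common way to obtain the directional enhancement and suppression recited in claim 33 is delay-and-sum beamforming. The sketch below is an assumption chosen for the example and is not asserted to be the technique of Nakadai, White, or the application. It assumes a linear microphone array, a far-field source, integer-sample alignment, and a speed of sound of 343 m/s. All names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def arrival_lags(mic_x_m, angle_deg, sample_rate_hz):
    """Per-microphone arrival lag in samples for a far-field source at angle_deg
    (0 = broadside, positive = toward +x), relative to the earliest microphone."""
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND_M_S * sample_rate_hz for x in mic_x_m]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def delay_and_sum(channels, lags):
    """Advance each channel by its lag and average the aligned channels.
    Sound from the steered direction adds coherently; other directions are attenuated."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, lag in zip(channels, lags):
        for i in range(n - lag):
            out[i] += samples[i + lag]
    return [v / len(channels) for v in out]


# Hypothetical two-microphone example: an impulse reaches the second
# microphone one sample before the first.
lags = arrival_lags([0.0, 0.0343], 90.0, 10000)
far_mic = [0.0] * 10
far_mic[6] = 1.0
near_mic = [0.0] * 10
near_mic[5] = 1.0
steered = delay_and_sum([far_mic, near_mic], lags)
```

When the lags match the true source direction the impulse is preserved at full amplitude, while steering toward a different direction spreads it into smaller peaks.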
    Consider Claims 34 and 35, Nakadai as modified by White teaches the method wherein performing the sound enhancement on the sound data of the sound object according to the sound source position further comprises: performing directional de-noising on the sound data through the microphone array (see figs. 8-12 and paragraphs [0058]-[0074]); and the method wherein determining the sound source location of the sound object relative to the device based on the real-time image of the sound object (see figs. 8-12 and paragraphs [0058]-[0074]) further comprises: determining the sound object of the sound data according to one of the following rules: treating an object of a plurality of objects that is at the shortest linear distance from the device as the sound object; or treating the object of the plurality of objects with the largest angle facing towards the device as the sound object (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]).
    Consider Claim 36, Nakadai teaches one or more computer readable media storing executable instructions that, when executed by one or more processors (see fig. 4), cause the one or more processors to perform acts comprising: obtaining a real-time image of a sound object (see fig. 3); determining (see fig. 4) a sound source position of the sound object relative to the device based on the real-time image of the sound object (see fig. 3(15) and paragraphs [0045]-[0053]); activating a voice interaction process between the sound object and the device; determining (see fig. 4) voice content of the sound object is relevant to the device through analysis of the voice content (see figs. 8-12 and paragraphs [0058]-[0074]); and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); but Nakadai does not explicitly teach determining that voice content of the sound object is relevant to the device via semantical analysis of the voice content.
    However, White teaches one or more computer readable media storing executable instructions that, when executed by one or more processors (see fig. 2(200)), cause the one or more processors to perform acts comprising: obtaining a real-time image of a sound object (see fig. 2(206)); determining (see fig. 2) a sound source position of the sound object relative to the device based on the real-time image of the sound object (see figs. 1-5 and paragraphs [0058]-[0071]); activating a voice interaction process between the sound object and the device; determining (see fig. 2) voice content of the sound object is relevant to the device via semantical analysis of the voice content (see paragraphs [0113]-[0116]); and performing a sound enhancement on sound data of the sound object according to the sound source position (see figs. 2-6B and paragraphs [0113]-[0123]).
    Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nakadai with the teaching of White to provide context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
    Consider Claims 37 and 38, Nakadai as modified by White teaches the one or more computer readable media wherein determining the sound source position of the sound object relative to the device based on the real-time image of the sound object further comprises (see figs. 1-5 and paragraphs [0058]-[0071]): determining whether the sound object is facing the device; determining a horizontal angle and a vertical angle of a sounding portion of the sound object relative to the device in response to determining that the sound object is facing the device; and setting the horizontal angle and the vertical angle of the sounding portion relative to the device as the sound source position (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the one or more computer readable media wherein determining the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device further comprises: forming an arc centered at the device covering a viewing angle of the device, a diameter of the arc corresponding to a length of an image frame; equally dividing the arc (see figs. 1-5 and paragraphs [0058]-[0071]), and using projections of equal diversion points on an imaging frame as scales; determining a scale in which a sounding portion of a target object is located on the imaging frame; and determining angles corresponding to the determined scale as the horizontal angle and the vertical angle of the sounding portion relative to the device (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]).
    Consider Claims 39 and 40, Nakadai as modified by White teaches the one or more computer readable media wherein determining the horizontal angle and the vertical angle of the sounding portion of the sound object relative to the device further comprises: determining a size of a marking area of a target object in an imaging frame, wherein a sounding part is located in the marking area (see figs. 1-5 and paragraphs [0058]-[0071]); determining a distance of the target object from a camera according to the size of the marking area in the imaging frame; and calculating the horizontal angle and the vertical angle of the sounding part relative to the device through an inverse trigonometric function based on the determined distance (see figs. 3, 4, 8-12 and paragraphs [0075]-[0116]); and the one or more computer readable media wherein performing the sound enhancement on the sound data of the sound object according to the sound source position further comprises: performing a directional enhancement on sound from the sound source position; and performing a directional suppression on sound from positions other than the sound source position (see figs. 1-5 and paragraphs [0058]-[0071]).
    
                                                                 Conclusion
12.  The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Cheiky et al. (US PAT. 6,919,892) is cited to show other related SOUND PROCESSING METHOD AND INTERACTIVE DEVICE.

13.  Any response to this action should be mailed to:

Mail Stop ____ (explanation, e.g., Amendment or After-final, etc.)
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

Facsimile responses should be faxed to:
(571) 273-8300

Hand-delivered responses should be brought to:
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Lao, Lun-See whose telephone number is (571) 272-7501. The examiner can normally be reached on Monday-Friday from 8:00 to 5:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Nguyen Duc M (SPE), can be reached on (571) 272-7503.

Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the Technology Center 2600 whose telephone number is (571) 272-2600.

/LUN-SEE LAO/
Primary Examiner, Art Unit 2651
Patent Examiner
US Patent and Trademark Office
Knox
571-272-7501
Date 07-15-2022