Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to RCE/Amendments/Remarks
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/08/2022 has been entered. Claims 1, 5, 10, 12-13, and 18 have been amended.
The IDS filed on 08/08/2022 has been received and considered.
Claims 1-18 remain pending.
Please refer to the action below.

Examiner Notes
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. However, the claimed subject matter, not the specification, is the measure of the invention. 

Responses to Arguments/Remarks
Applicants’ arguments at pages 7-10, regarding the currently amended independent claims and the prior art of Goetz in view of Huang, have been considered. Applicants state: “Applicants respectfully traverse these rejections and the assertions and holdings therein, because Goetz and Huang, whether alone or in combination, have not been shown to teach or suggest each and every element of the amended claims as required by law. Independent claim 1, as amended, recites the following claim elements: determine, based on at least one feature of the target user comprised in the user image data, that the target user is speaking; and send a wakeup instruction to the application processor in response to determining that the target user is speaking based on the at least one feature of the target user comprised in the user image data”. These arguments are moot in light of the new ground of rejection over Goetz in view of Hart, because Hart reads on the amended claim limitations quoted above. Please refer to the action below.



  Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Goetz et al. (US 9973732, previously cited) in view of Hart et al. (US 2012/0062729 A1).
    Regarding claim 1, Goetz teaches a voice interaction processing apparatus (at least Figs. 3-6, Col. 2, lines 25-38, and Col. 12, lines 1-25 teach a voice interaction processing apparatus for interacting with users via voice and video), wherein the apparatus comprises a microprocessor and an application processor (apparatus 700 comprises a microprocessor 706 and media 716 comprising said application processor);
wherein the microprocessor is configured to: receive voice data of a first user (the received voice data of at least Col. 12, lines 1-25 and Col. 15, lines 15-24 comprises at least a user wake word instruction and second user voice data);
determine, based on the voice data of the first user, that the first user is a target user (Col. 2, lines 25-38 further teaches monitoring received voice data of a plurality of users and ascertaining, based on at least a level of engagement in the voice data of said first user, that said first user is a target user to follow);
in response to determining that the first user is the target user, receive user image data (after determining the target user to follow in Col. 2, lines 25-38, the apparatus receives user image data from at least one of a plurality of imaging devices);
send a wakeup instruction to the application processor (the system is understood to send the received wake words or wakeup instructions of Col. 12, lines 1-25 and Col. 15, lines 15-24 to the application processor, at least for activating target devices according to a user request as noted in Col. 2, lines 25-38);
and the application processor is configured to receive the wakeup instruction and wake up voice interaction software to provide a voice interaction function for the target user (one of ordinary skill in the art would appreciate that said apparatus of Col. 12, lines 1-25 and Col. 15, lines 15-24 is adapted to receive said transmitted wake word or wakeup instruction and to wake up said voice interaction software to provide a voice interaction function for the target user).
    However, Goetz is silent regarding specifically: determine, based on at least one feature of the target user comprised in the user image data, that the target user is speaking; and send a wakeup instruction to the application processor in response to determining that the target user is speaking based on the at least one feature of the target user comprised in the user image data.
    Hart teaches, in at least para. 0049 and 0055, a voice processing system configured to receive first voice data of at least one user and to determine an active user based on at least a posture of the user and/or the device, such as whether the device is facing the user or the user is facing the device. The system further supplements the captured voice with image data to determine from the image capture whether the person is actually speaking, based on at least one feature of the target user comprised in the user image data, namely moving lips, and thereby determines a target device to activate to further process the user's instructions in response to determining that the target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include said: determine, based on at least one feature of the target user comprised in the user image data, that the target user is speaking; and send a wakeup instruction to the application processor in response to determining that the target user is speaking based on the at least one feature of the target user comprised in the user image data, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wake words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wake words and voice instructions. Hart complements Goetz in that the user's voice is supplemented with image captures, with user and device postures additionally ascertained, and with a determination of whether the target user is the one speaking based on his or her lips moving in the obtained image. Based on the combination of the image and the audio, a downstream device is ascertained to activate to further process the target user's instructions. Such target devices may be ones that require a wakeup instruction for activation. When combined with Hart, said activation command or wakeup instruction is sent only when it is affirmed that the obtained audio and image combination of the target user is intended for the device and that the user is actively speaking; when further combined with Goetz, this activates the voice interaction system for interacting with said target user once it is ensured that said user is indeed the active user. The combination proceeds according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).
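
For illustration only, the sequence of operations recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and does not characterize the actual implementation of Goetz, Hart, or the present application: the names ImageFrame, ApplicationProcessor, is_speaking, and microprocessor_step, the lip_opening value, and the min_change threshold are all hypothetical and appear in none of the cited documents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImageFrame:
    """Hypothetical per-frame summary: how open the target user's lips are."""
    lip_opening: float


class ApplicationProcessor:
    """Hypothetical stand-in for the claimed application processor."""

    def __init__(self) -> None:
        self.voice_interaction_awake = False

    def receive_wakeup(self) -> None:
        # Wake the voice interaction software on receipt of the wakeup instruction.
        self.voice_interaction_awake = True


def is_speaking(frames: List[ImageFrame], min_change: float = 0.1) -> bool:
    """Assumed heuristic: lips count as moving when the opening varies across frames."""
    if len(frames) < 2:
        return False
    openings = [frame.lip_opening for frame in frames]
    return max(openings) - min(openings) >= min_change


def microprocessor_step(
    voice_is_target_user: bool,
    capture_frames: Callable[[], List[ImageFrame]],
    app: ApplicationProcessor,
) -> bool:
    """Return True only when a wakeup instruction was sent."""
    # Step 1: the voice data must first identify the first user as the target user.
    if not voice_is_target_user:
        return False
    # Step 2: image data is received only after the target user is determined.
    frames = capture_frames()
    # Step 3: the wakeup instruction is conditioned on the image-based speaking check.
    if not is_speaking(frames):
        return False
    app.receive_wakeup()
    return True
```

In this sketch the wakeup instruction is never sent on voice data alone, which mirrors the ordering of the claimed determinations: voice-based target user determination, then image capture, then the image-based speaking determination.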

    Regarding claim 2 (according to claim 1), Goetz further teaches wherein the microprocessor is configured to: determine, based on the user image data and by using a living detection method, said target user (the apparatus of Figs. 3-4 and Col. 12, lines 20-30 is further configured to activate a video camera to identify the user, and is thus understood to be capable of determining said target user based on the user image data by using said camera and additional sensors, which comprise a living detection method, such that said target user may, in a given case, be speaking).
    Goetz is silent regarding further determining that the target user is speaking.
    Hart further teaches, in at least para. 0049 and 0055, determining and validating whether a person speaking is relevant to the device as an active or target user and, based on at least one feature such as moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include determining that the target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wake words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wake words and voice instructions. Hart complements Goetz in that the voice interaction device is awakened only after validation of both the obtained voice data and the image data, including analysis of the image data to ascertain that a user facing the device and speaking to the device is actively moving his or her lips, indicating that said target user is speaking and intends to invoke the system or the voice interactive device of Goetz. This would then obviously trigger the system to activate the applicable downstream component, or to wake up a device, to perform the voice interaction and output responses corresponding to the received instructions once it is ensured that said user is indeed the target user. This may be realized according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

    Regarding claim 3 (according to claim 1), Goetz further teaches wherein the apparatus further comprises: a posture sensor, configured to detect a posture parameter of the apparatus and transmit the posture parameter to the microprocessor (Fig. 1 teaches detecting and determining a posture and location of a user relative to a device, and a posture parameter of the device. Goetz states: “As illustrated in FIG. 1, the imaging device 114 is positioned in the environment 112 to capture image data of the user 104 represented as the image data 122. For example, because the user 104 is facing towards the imaging device 114, the image data 122 represents the front of the user 104. As the user 104 is facing away from the smart appliance 116, image data 124 captured by the smart appliance 116 represents the back of the user 104”. This is understood to indicate an acquired posture parameter of the apparatus, in a given case a front facing posture or a rear facing posture, and that said posture parameter would obviously be transmitted to the microprocessor, which in turn activates a front facing camera or a rear facing camera based on the posture parameter so that the apparatus captures or collects image data of the user); and
the microprocessor is further configured to: in response to determining, based on the posture parameter, that the apparatus is in a front placement posture, send a first enabling instruction that instructs to collect a front-facing image (in Fig. 1 cited above, a case exists in which, in response to determining based on the posture parameter of a front facing status that the device is in a front placement posture, a first enabling instruction is sent that instructs the camera to collect a front-facing image); or in response to determining, based on the posture parameter, that the apparatus is in a back placement posture, send a second enabling instruction that instructs to collect a back-facing image (in Fig. 1 cited above, a case further exists in which, in response to determining based on the posture parameter of a rear facing status that said device is in a rear facing posture, another enabling instruction is sent to a rear facing camera that instructs the camera to collect a rear facing image).
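
For illustration only, the posture-dependent selection recited in claim 3 can be sketched as below. This is a minimal sketch under stated assumptions: it assumes, hypothetically, that the posture parameter is a single gravity reading along the axis normal to the display, with a non-negative value meaning a front placement posture. The names Posture, classify_posture, and enabling_instruction and the instruction strings are invented for this sketch and are not taken from Goetz, Hart, or the present application.

```python
from enum import Enum


class Posture(Enum):
    FRONT = "front placement"
    BACK = "back placement"


def classify_posture(z_axis_g: float) -> Posture:
    """Assumed convention: non-negative reading means the display side faces up."""
    return Posture.FRONT if z_axis_g >= 0.0 else Posture.BACK


def enabling_instruction(posture: Posture) -> str:
    """Pick the first or second enabling instruction from the detected posture."""
    if posture is Posture.FRONT:
        # First enabling instruction: collect a front-facing image.
        return "ENABLE_FRONT_CAMERA"
    # Second enabling instruction: collect a back-facing image.
    return "ENABLE_BACK_CAMERA"
```

Exactly one of the two enabling instructions is produced for any posture reading, matching the "or" structure of the claim language.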
 
    Regarding claim 4 (according to claim 1), Goetz further teaches wherein the apparatus further comprises: a distance sensor, configured to detect a distance between the first user and the apparatus and transmit the distance to the microprocessor (the system of Col. 1, lines 60-65 and Col. 3, lines 60-65 employs a plurality of means, such as locating users by received RF data, configured to detect a distance between the first user and the apparatus and transmit said distance to the microprocessor); and wherein the microprocessor is further configured to: in response to determining that the distance is less than or equal to a preset distance, send a third enabling instruction (it is implied in at least Col. 3, lines 60-65 that, based on the user being substantially at or near a respective device, said device is enabled for video communication as requested by the user, which location is understood to obviously comprise a distance equal to a preset distance).

    Regarding claim 5, Goetz teaches a voice interaction processing method (the system of at least Fig. 7, as further illustrated in Figs. 3-4, comprises said method and at least a voice interaction processing apparatus 700 comprising a plurality of application processors including a requested video communication application processing function),
wherein the method is applied to an apparatus comprising a microprocessor and an application processor (apparatus 700 comprises a microprocessor 706 and media 716 comprising said application processor); and wherein the method comprises: receiving, by the microprocessor, voice data of a first user (the received voice data of at least Col. 12, lines 1-25 and Col. 15, lines 15-24 comprises at least a user wake word instruction and second user voice data);
determining, by the microprocessor and based on the voice data of the first user, that the first user is a target user (Col. 2, lines 25-38 further teaches monitoring received voice data of a plurality of users and ascertaining, based on at least a level of engagement in the voice data of said first user, that said first user is a target user to follow);
in response to determining that the first user is the target user, receiving, by the microprocessor, user image data (after determining the target user to follow in Col. 2, lines 25-38, the apparatus receives user image data from at least one of a plurality of imaging devices);
sending, by the microprocessor, a wakeup instruction to the application processor (the system is understood to send the received wake words or wakeup instructions of Col. 12, lines 1-25 and Col. 15, lines 15-24 to the application processor, at least for activating target devices according to a user request as noted in Col. 2, lines 25-38); and receiving, by the application processor, the wakeup instruction, and waking up voice interaction software to provide a voice interaction function for the target user (one of ordinary skill in the art would appreciate that said apparatus of Col. 12, lines 1-25 and Col. 15, lines 15-24 is adapted to receive said transmitted wake word or wakeup instruction and to wake up said voice interaction software to provide a voice interaction function for the target user).
    However, Goetz is silent regarding specifically: determining, based on at least one feature of the target user comprised in the user image data, that the target user is speaking; and sending said wakeup instruction to the application processor in response to determining that said target user is speaking based on the at least one feature of the target user comprised in the user image data.
    Hart teaches, in at least para. 0049 and 0055, a voice processing system configured to receive first voice data of at least one user and to determine an active user based on at least a posture of the user and/or the device, such as whether the device is facing the user or the user is facing the device. The system further supplements the captured voice with image data to determine from the image capture whether the person is actually speaking, based on at least one feature of the target user comprised in the user image data, namely moving lips, and thereby determines a target device to activate to further process the user's instructions in response to determining that the target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include said determining and said sending of the wakeup instruction to the application processor, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wake words and/or voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the instructed words and voice instructions. Hart complements Goetz in that the user's voice is supplemented with image captures, with user and device postures additionally ascertained, and with a determination of whether the target user is the one speaking based on his or her lips moving in the obtained image. Based on the combination of the image and the audio, a downstream device is ascertained to activate to further process the target user's instructions. Such target devices may be ones that require a wakeup instruction for activation. When combined with Hart, said activation command or wakeup instruction is sent only when it is affirmed that the obtained audio and image combination of the target user is intended for the device and that the user is actively speaking; when further combined with Goetz, this activates the voice interaction system for interacting with said target user once it is ensured that said user is indeed the active user. The combination proceeds according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

First Named Inventor: Bailin WEN; Attorney Docket No.: 43968-1165001 / 85807653US03; Application No.: 16/840,753; Filed: April 6, 2020
    Regarding claim 6 (according to claim 5), Goetz further teaches wherein the determining that the user image data indicates that the target user is speaking comprises: determining, based on the user image data and by using a living detection method, the target user (the apparatus of Figs. 3-4 and Col. 12, lines 20-30 is further configured to activate a video camera to identify the user, and is thus understood to be capable of determining said target user based on the user image data by using said camera and additional sensors, which comprise a living detection method, such that said target user may, in a given case, be speaking).
    Goetz is silent regarding further determining, based on said user image data and by using a living detection method, that said target user is speaking.
    Hart further teaches, in at least para. 0049 and 0055, determining the presence of a user using known methods, and validating whether a person speaking is relevant to the device as an active or target user and, based on at least one feature such as moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include determining, based on said user image data and by using a living detection method, that said target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified instructed words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wake words and/or voice instructions. Hart complements Goetz in that a voice interaction device or target device is awakened only after validation of both the obtained voice data and the image data, including analysis of the image data to ascertain that a user facing the device and speaking to the device is actively moving his or her lips, indicating that said target user is speaking and intends to invoke the voice interactive device. This then triggers the system to wake up said device to perform the voice interaction and output responses corresponding to the received instructions, whereby the system employs said voice interaction only when it is ensured that said user is indeed the target user. This may be realized according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

    Regarding claim 7 (according to claim 5), Goetz further teaches wherein the apparatus further comprises a posture sensor (Goetz states: “As illustrated in FIG. 1, the imaging device 114 is positioned in the environment 112 to capture image data of the user 104 represented as the image data 122. For example, because the user 104 is facing towards the imaging device 114, the image data 122 represents the front of the user 104. As the user 104 is facing away from the smart appliance 116, image data 124 captured by the smart appliance 116 represents the back of the user 104”. This is understood to indicate that the apparatus is configured for detecting and determining a posture and location of a user relative to a device, and a posture parameter of the device indicating, in a given case, a front facing posture or a rear facing posture, and that said posture parameter would obviously be transmitted to the microprocessor, which in turn activates a front facing camera or a rear facing camera based on the posture parameter so that the apparatus captures or collects image data of the user); and wherein the method further comprises: detecting, by the posture sensor, a posture parameter of the apparatus, and transmitting the posture parameter to the microprocessor (as implied in Fig. 1 above); and performing one of the following operations: in response to determining, based on the posture parameter, that the apparatus is in a front placement posture, sending, by the microprocessor, a first enabling instruction that instructs to collect a front-facing image (in Fig. 1 cited above, a case exists in which, in response to determining based on the posture parameter of a front facing status that the device is in a front placement posture, a first enabling instruction is sent that instructs the camera to collect a front-facing image); or in response to determining, based on the posture parameter, that the apparatus is in a back placement posture, sending, by the microprocessor, a second enabling instruction that instructs to collect a back-facing image (in Fig. 1 cited above, a case further exists in which, in response to determining based on the posture parameter of a rear facing status that said device is in a rear facing posture, another enabling instruction is sent to a rear facing camera that instructs the camera to collect a rear facing image).

    Regarding claim 8 (according to claim 5), Goetz further teaches wherein the apparatus further comprises a distance sensor, and the method further comprises: detecting, by the distance sensor, a distance between the first user and the apparatus, and transmitting the distance to the microprocessor (the system of Col. 1, lines 60-65 and Col. 3, lines 60-65 employs a plurality of means, such as locating users by received RF data, configured to detect a distance between the first user and the apparatus and transmit said distance to the microprocessor); and in response to determining that the distance is less than or equal to a preset distance, sending, by the microprocessor, a third enabling instruction (it is implied in at least Col. 3, lines 60-65 that, based on the user being substantially at or near a respective device, said device is enabled for video communication as requested by the user, which location is understood to obviously comprise a distance equal to a preset distance).

    Regarding claim 9 (according to claim 3), Goetz further teaches wherein the first enabling instruction is sent to a front-facing camera, or the second enabling instruction is sent to a back-facing camera (the enabled camera instruction of Figs. 3-4 and Col. 12, lines 20-30 is understood to indicate that one of said enabling instructions is sent to a front-facing user camera).

    Regarding claim 10 (according to claim 1), Goetz is silent regarding wherein the at least one feature comprises a lip feature indicating that the target user is speaking.
    Hart further teaches, in at least para. 0049 and 0055, determining and validating whether a person speaking is relevant to the device as an active or target user and, based on at least one feature such as moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include wherein the at least one feature comprises a lip feature indicating that the target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wake words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wake words and voice instructions. Hart complements Goetz in that the voice interaction device is awakened only after validation of both the obtained voice data and the image data, including analysis of the image data to ascertain that a user facing the device and speaking to the device is actively moving his or her lips, indicating that said target user is speaking and intends to invoke the system or the voice interactive device of Goetz. This would then obviously trigger the system to activate the applicable downstream component, or to wake up a device, to perform the voice interaction and output responses corresponding to the received instructions once it is ensured that said user is indeed the target user. This may be realized according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

    Regarding claim 11 (according to claim 7), Goetz further teaches wherein the first enabling instruction is sent to a front-facing camera, or the second enabling instruction is sent to a back-facing camera (the enabled camera instruction of Figs. 3-4 and Col. 12, lines 20-30 is understood to indicate that one of said enabling instructions is sent to a front-facing user camera).

    Regarding claim 12 (according to claim 5), Goetz is silent regarding wherein the at least one feature comprises a lip feature indicating that the target user is speaking.
    Hart further teaches, in at least para. 0049 and 0055, determining and validating whether a person speaking is relevant to the device as an active or target user and, based on at least one feature such as moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include wherein the at least one feature comprises a lip feature indicating that the target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified instructed words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wake words and voice instructions. Hart complements Goetz in that a target voice interaction device is awakened only after validation of both the obtained voice data and the image data, including analysis of the image data to ascertain that a user facing the device and speaking to the device is actively moving his or her lips, indicating that said target user is speaking and intends to invoke the voice interactive device. This then triggers the system to wake up said device to perform the voice interaction and output responses corresponding to the received instructions, whereby the system employs said voice interaction only when it is ensured that said user is indeed the target user. This may be realized according to known methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

    Regarding claim 13, Goetz teaches in at least Col. 22, lines 10-30 a non-transitory computer-readable storage medium storing programming instructions for execution by a microprocessor and an application processor comprised in an apparatus (the system of at least Fig. 7 and Figs. 3-4, being at least a voice interaction processing apparatus comprising a plurality of application processors, including a requested video communication application processing function), 
that when executed by the microprocessor and the application processor, cause the apparatus to perform operations comprising: 
receiving, by the microprocessor, voice data of a first user (the received voice data of at least Col. 12, lines 1-25 and Col. 15, lines 15-24 comprises at least a user wake word instruction and second user voice data);
determining, by the microprocessor and based on the voice data of the first user, that the first user is a target user (Col. 2, lines 25-38 further teaches monitoring received voice data of a plurality of users and ascertaining a first target user to follow based on at least a level of engagement in the voice data, that is, determining based on the voice data of said first user that said first user is a target user to follow);
in response to determining that the first user is the target user, receiving, by the microprocessor, user image data (after determining the target user to follow in further Col. 2, lines 25-38, the apparatus is further configured for receiving user image data from at least one of a plurality of imaging devices);
sending, by the microprocessor, a wakeup instruction to the application processor (the system is further understood to be configured to send the received wake words or wakeup instructions of Col. 12, lines 1-25 and Col. 15, lines 15-24 to the application processor, at least as further noted in Col. 2, lines 25-38, for activating target devices according to at least a user request); and receiving, by the application processor, the wakeup instruction, and waking up voice interaction software to provide a voice interaction function for the target user (one skilled in the art would further appreciate that said apparatus of Col. 12, lines 1-25 and Col. 15, lines 15-24 is further adapted to receive said transmitted wake word or wakeup instruction and to wake up said voice interaction software to provide a voice interaction function for the target user).
    However, Goetz is silent regarding specifically determining, based on at least one feature of the target user comprised in the user image data, that the target user is speaking; and sending said wakeup instruction to the application processor in response to determining that said target user is speaking based on the at least one feature of the target user comprised in the user image data.
     Hart teaches, at least in paras. 0049 and 0055, a voice processing system configured for receiving first voice data of at least one user and for determining an active user based on at least a posture of the user and/or the device, such as ascertaining whether the device is facing the user or the user is facing the device. The system further supplements the captured voice with image data to determine from the image capture whether the person is actually speaking, in order to at least determine a target device to activate for further processing of user instructions in response to determining that the target user is speaking, based on at least the cited image information comprising at least one feature of moving lips of the target user in the user image data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include said determining and said sending of the wakeup instruction to the application processor, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wakeup words and/or voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wakeup words and voice instructions. Hart further complements Goetz in that the user's voice is supplemented with image captures, with the user and device postures additionally ascertained, along with whether the target user is the one speaking based on his or her lips moving in the obtained image, and, based on the combination of the image and the audio, a downstream device is ascertained for activation to further process the target user's instructions. Because the target devices to be activated may, in the art, be ones that require a transmitted wakeup instruction for activation, said activation command or wakeup instruction, when combined with Hart, is sent only when it is affirmed that the obtained audio and image combination of the target user is intended for the device and that the user is actively speaking. When further combined with Goetz, this will activate the voice interaction system for interacting with said target user when it is ensured that said user is indeed the active user. The combination may be realized according to known means and methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and is thereby a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).
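     For illustration only, the sequence of operations recited in claim 13 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Goetz, Hart, or the application: the class names, the callables `is_target_voice`, `capture_image`, and `lips_moving`, and the dictionary used as image data are all hypothetical stand-ins for whatever voice recognition and image analysis an implementation would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ApplicationProcessor:
    """Hypothetical application processor that hosts the voice interaction software."""
    voice_software_awake: bool = False

    def receive_wakeup(self) -> None:
        # Waking the voice interaction software is modeled as setting a flag.
        self.voice_software_awake = True


@dataclass
class Microprocessor:
    """Hypothetical low-power processor that gates the wakeup instruction."""
    app: ApplicationProcessor
    is_target_voice: Callable[[bytes], bool]   # assumed voice-based target check
    capture_image: Callable[[], dict]          # assumed image source
    lips_moving: Callable[[dict], bool]        # assumed lip-feature check

    def on_voice(self, voice: bytes) -> bool:
        # Step 1: decide from the voice data whether the speaker is the target user.
        if not self.is_target_voice(voice):
            return False
        # Step 2: image data is requested only after the target user is determined.
        image = self.capture_image()
        # Step 3: decide from an image feature (here, lips) whether the user is speaking.
        if not self.lips_moving(image):
            return False
        # Step 4: send the wakeup instruction to the application processor.
        self.app.receive_wakeup()
        return True
```

     In this sketch the wakeup instruction is sent only when both the voice check and the image check succeed, which mirrors the order of the recited operations.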

    Regarding claim 14 (according to claim 13), Goetz further teaches wherein the determining that the user image data indicates that the target user is speaking comprises: determining, based on the user image data and by using a living detection method, the target user (the apparatus of Figs. 3-4 and Col. 12, lines 20-30 is further configured to activate a video camera to identify the user and is understood to be capable of determining, based on the user image data and by using said camera and additional sensors further comprising a living detection method, that said target user may in some case be speaking).
   Goetz is silent regarding further determining, based on said user image data and by using a living detection method, that said target user is speaking.
     Hart further teaches, at least in paras. 0049 and 0055, determining the presence of a user using known methods, validating whether the person speaking is relevant to the device as an active or target user, and determining, based on at least one feature such as the moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include determining, based on said user image data and by using a living detection method, that said target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wakeup words and/or voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wakeup words and/or voice instructions. Hart further complements Goetz in that a voice interaction device or target device is awakened only after both the obtained voice data and the image data are validated, including analyzing the image data to ascertain that a user facing the device and speaking to it is actively moving their lips, indicating that said target user is speaking and intends to invoke the voice interaction device. That determination then triggers the system to wake up said device to perform the voice interaction and output responses corresponding to the received instructions, so that the system employs said voice interaction only when it is ensured that said user is indeed the target user. The combination may be realized according to known means and methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and is thereby a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

   Regarding claim 15 (according to claim 13), Goetz further teaches wherein the apparatus further comprises a posture sensor (Goetz cited “As illustrated in FIG. 1, the imaging device 114 is positioned in the environment 112 to capture image data of the user 104 represented as the image data 122. For example, because the user 104 is facing towards the imaging device 114, the image data 122 represents the front of the user 104. As the user 104 is facing away from the smart appliance 116, image data 124 captured by the smart appliance 116 represents the back of the user 104”, which is understood to indicate that the apparatus is configured for detecting and determining a posture and location of a user relative to a device, and a posture parameter of the device indicating, in a given case, a front-facing posture or a rear-facing posture, and for transmitting said posture parameter to the microprocessor, which in turn activates a front-facing camera or a rear-facing camera based on the posture parameter, with the apparatus in that case capturing or collecting image data of the user); and wherein the method further comprises: detecting, by the posture sensor, a posture parameter of the apparatus, and transmitting the posture parameter to the microprocessor (as implied in Fig. 1 above); and performing one of the following operations: in response to determining, based on the posture parameter, that the apparatus is in a front placement posture, sending, by the microprocessor, a first enabling instruction that instructs to collect a front-facing image (a case exists in the cited Fig. 1 above in which, in response to determining, based on the posture parameter of a front-facing status, that the device is in a front placement posture, a first enabling instruction is sent that instructs the camera to collect a front-facing image); or in response to determining, based on the posture parameter, that the apparatus is in a back placement posture, sending, by the microprocessor, a second enabling instruction that instructs to collect a back-facing image (a case further exists in the cited Fig. 1 above in which, in response to determining, based on the posture parameter of a rear-facing status, that said device is in a rear-facing posture, another enabling instruction is sent to a rear-facing camera that instructs the camera to collect a rear-facing image).
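     For illustration only, the alternative operations recited in claim 15 amount to a selection between two enabling instructions based on the posture parameter. The sketch below is a minimal example under stated assumptions: the `Posture` values and the instruction strings are hypothetical and do not appear in Goetz, Hart, or the application.

```python
from enum import Enum
from typing import Optional


class Posture(Enum):
    """Hypothetical posture parameter reported by the posture sensor."""
    FRONT = "front"
    BACK = "back"
    UNKNOWN = "unknown"


def enabling_instruction(posture: Posture) -> Optional[str]:
    """Map a posture parameter to the enabling instruction the microprocessor sends."""
    if posture is Posture.FRONT:
        # Front placement posture: first enabling instruction (front-facing image).
        return "ENABLE_FRONT_CAMERA"
    if posture is Posture.BACK:
        # Back placement posture: second enabling instruction (back-facing image).
        return "ENABLE_BACK_CAMERA"
    # Any other posture: no enabling instruction is sent in this sketch.
    return None
```

     Returning no instruction for an undetermined posture is a design choice of the sketch; the claim recites only the front and back placement cases.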

    Regarding claim 16 (according to claim 13), Goetz further teaches wherein the apparatus further comprises a distance sensor, and the method further comprises: detecting, by the distance sensor, a distance between the first user and the apparatus, and transmitting the distance to the microprocessor (the system of Col. 1, lines 60-65 and Col. 3, lines 60-65 employs a plurality of means, such as locating users by received RF data, configured to detect a distance between the first user and the apparatus and transmit said distance to the microprocessor); and in response to determining that the distance is less than or equal to a preset distance, sending, by the microprocessor, a third enabling instruction (it is further implied in at least Col. 3, lines 60-65 that, based on the user being substantially at or near the respective devices, said device is enabled for video communication as requested by the user, which location is understood to encompass a distance equal to a preset distance).

    Regarding claim 17 (according to claim 15), Goetz further teaches wherein the first enabling instruction is sent to a front-facing camera, or the second enabling instruction is sent to a back-facing camera (the camera enabling instruction of Figs. 3-4 and Col. 12, lines 20-30 is understood to indicate that one of said enabling instructions is sent to a front-facing user camera).

    Regarding claim 18 (according to claim 13), Goetz is silent regarding the at least one feature comprises a lip feature indicating that the target user is speaking.  
     Hart further teaches, at least in paras. 0049 and 0055, determining and validating whether the person speaking is relevant to the device as an active or target user, and determining, based on at least one feature such as the moving lips of the target user in the obtained user image data, that said target user is speaking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goetz and Hart to include determining that the target user is speaking, as discussed above. Goetz and Hart are in the same field of endeavor of executing user commands from identified wakeup words and voice instructions, and both references use mixed user detection, including voice recognition and collected user image data, to ascertain whether to execute the wakeup words and voice instructions. Hart further complements Goetz in that the voice interaction device is awakened only after both the obtained voice data and the image data are validated, including analyzing the image data to ascertain that a user facing the device and speaking to it is actively moving their lips, indicating that said target user is speaking and, when combined, intends to invoke the system or the voice interaction device of Goetz. That determination would then obviously trigger the system to activate an applicable downstream component or to wake up a device to perform the voice interaction and output responses corresponding to the received instructions, when it is ensured that said user is indeed the target user. The combination may be realized according to known means and methods to yield predictable results, since known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. Said combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and is thereby a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCELLUS AUGUSTIN whose telephone number is (571)270-3384. The examiner can normally be reached 9 AM- 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BENNY TIEU can be reached on 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARCELLUS J AUGUSTIN/
Primary Examiner, Art Unit 2674
08/27/2022