DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Sweden on December 20th, 2019. It is noted, however, that applicant has not filed a certified copy of the SE1951518-8 application as required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on December 21st, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
However, the information disclosure statement filed December 21st, 2020 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.
Drawings
The drawings are objected to because they appear to show only a portion of each of the full drawings. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-6, 8, 12, 16, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 3 and 8, the phrase "preferably" renders the claim indefinite because it is unclear whether the limitation(s) following the phrase are part of the claimed invention.  See MPEP § 2173.05(d). For the purposes of expedited examination, “preferably” will be ignored, and it will be assumed that the limitations following the word “preferably” are instead required.
Claims 4-6 are dependent upon claim 3 and are thus rejected on similar grounds.
Claim 12 recites the limitation "...with said head posture detection step..." in line 4. There is insufficient antecedent basis for this limitation in the claim. For the purposes of expedited examination, it is assumed that the cited attention detection step is equivalent to the head posture detection step cited in claim 11; that is, claim 12 includes the following language:
Wherein the attention detection step further comprises a head posture detection step, in which the digital assistant, as a necessary condition for detecting said user attention, additionally detects a current head posture of the user and determines that a face of the user faces towards said predetermined object, within a predetermined head posture margin.
Claim 18 recites the limitation “…wherein the current speaker tracking step comprises…” in line 1.  There is insufficient antecedent basis for this limitation in the claim. For the purposes of expedited examination, it is assumed that claim 18 is dependent upon claim 2, rather than claim 1.
Regarding claim 16, lines 4-6 recite:
A facial expression step, in which a facial expression of the user is detected based on a set of predetermined parameters, which facial expression is indicative of the user not having finished speaking;
	It is unclear as to what is meant by “which facial expression is indicative of the user not having finished speaking”. Turning towards the specification, applicant discloses:
Thereafter, the endpoint detection step may comprise a facial expression detection step 423, in which a facial expression of the user 110, 210 is detected based on a set of predetermined facial expression parameters. Moreover, detected facial expressions are defined, in terms of said facial expression parameters, to be indicative of the user 110, 210 not yet having finished speaking.
	Thus, for the purposes of expedited examination, claim 16 will be assumed to read: 
A facial expression step, in which a facial expression of the user which is indicative of the user not having finished speaking is detected based on a set of predetermined parameters.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a method for voice-based interactive communication, and a digital assistant arranged to perform the same method, and are drawn to a series of elements that can be broadly construed as performance in the mind with the aid of pen and paper. The method of claim 1, for example, may be performed by an individual who, upon noticing that they have the attention of a customer, may begin focusing on listening; this individual may then identify that the customer is speaking, write down what the customer is saying, confirm that the customer is the one speaking by seeing that the customer’s lips are moving, re-read the transcription of the customer’s words, and then respond accordingly. This judicial exception is not integrated into a practical application because the claims merely recite the use of a “digital assistant” as a tool to carry out the claimed method. It should additionally be noted that the step of activating a listening mode in a digital assistant could be considered an insignificant extra-solution activity in the form of activating data gathering (see MPEP 2106.05(g)). The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, while it is true that the claims recite the use of a “digital assistant” to perform the claimed processes, the “digital assistant” is no more than a computer executing a program as a tool to carry out a mental process (see MPEP 2106.05(f)).
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because it is directed towards a “computer software product” without any structural recitations; that is, it is simply “software per se” (see MPEP 2106.03).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 7-8, 13-15, and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Konzelmann et al. (U.S. Patent Application Publication 2020/0349966 A1, hereinafter “Konzelmann”).
In regards to claim 1, Konzelmann teaches:
Method for voice-based interactive communication using a digital assistant (Paragraph 4), wherein the method comprises 
an attention detection step, in which the digital assistant detects a user attention (Paragraph 7, lines 1-3: the device determines (i.e. detects) whether the user is intending to interact (an intention to interact is construed as attention) with the automated assistant (i.e. digital assistant)) and as a result is set into a listening mode (Paragraph 7, lines 4-13 and Paragraph 8, lines 1-5: upon detecting that the user is intending to interact with the device, the device may initiate processing of certain sensor data, such as audio data (that is, setting into a listening mode)); 
a speaker detection step, in which the digital assistant detects the user as a current speaker (Paragraph 9, lines 9-18: the device may verify a spoken utterance is from a certain speaker; Paragraph 6: the spoken utterance may be from the user); 
a speech sound detection step, in which the digital assistant detects and records speech uttered by the current speaker (Paragraph 9, lines 1-5: the assistant may monitor (i.e. detect) and locally buffer (i.e. record) audio data; also see Paragraph 6: voice activity is one of several utilized attributes), which speech sound detection step further comprises a lip movement detection step, in which the digital assistant detects a lip movement of the current speaker (Paragraph 6: the device may utilize mouth movement of the user); 
a speech analysis step, in which the digital assistant parses said recorded speech and extracts speech-based verbal informational content from said recorded speech (Paragraph 58, lines 1-3: User may, e.g. speak “what’s today’s forecast” in order to obtain today’s forecast; Also Paragraph 9: the automated assistant can include speech-to-text processors and natural language understanding components); and 
a subsequent response step, in which the digital assistant provides feedback to the user based on said recorded speech (Paragraph 58, lines 1-3: User may, e.g. speak “what’s today’s forecast” in order to obtain today’s forecast from the automated assistant).
In regards to claim 2, Konzelmann further teaches:
Method according to claim 1, wherein the method further comprises a current speaker tracking step, in which the digital assistant tracks the current speaker using visual and/or voice tracking of the current speaker (Paragraph 9, lines 9-18: the device may verify a spoken utterance is from (i.e. track) a certain speaker, such as one that matches a speaker visually).
In regards to claim 3, Konzelmann further teaches:
Method according to claim 1, wherein the method further comprises a lip movement synchronisation step, in which lip movement synchronisation step the digital assistant temporally correlates detected lip movement to detected speech sound, and in which lip movement synchronisation step particular detected discreet lip movements are temporally correlated to particular detected discreet speech sounds (Paragraph 52: the device may process a stream of image frames and audio frames to determine whether a mouth (i.e. lip) movement and a voice activity (i.e. speech sound) are co-occurring (i.e. temporally correlated)).
In regards to claim 7, Konzelmann further teaches:
Method according to claim 1, wherein the speaker detection step comprises temporally correlating detected lip movements of the user to detected speech and detecting the user as the current speaker based on said temporal correlation (Paragraph 49: the device has a mouth/voice module that determines co-occurrence of mouth movement and voice activity (i.e. temporally correlates detected lip movements of the user to detected speech) in order to generate a probability of whether the user captured in the vision data is actually speaking (i.e. detecting the user as the current speaker)).
In regards to claim 8, Konzelmann further teaches:
Method according to claim 1, wherein the method further comprises 
a speaker identification step comprising a visual user authentication step and/or a voice sound characteristics based user authentication step (Paragraph 54: the face recognition module may compare visually captured images to user accounts/profiles (i.e. visual user authentication); the account identifier associated with the recognized face can be compared to an account identifier determined based on text-independent speaker identification that is based on processing of audio data that captures a spoken utterance (i.e. voice sound characteristics based user authentication); see also Paragraph 78, where this voice sound characteristics based user authentication is further described), 
and wherein the method preferably further comprises selecting a set of user interaction parameters based on an identity identified in said speaker identification step (Paragraph 54, lines 11-15: adapting one or more functions (i.e. selecting a set of user interaction parameters) can be contingent on the user being a recognized user (i.e. based on an identity identified)).
In regards to claim 13, Konzelmann further teaches:
Method according to claim 1, wherein the method further comprises an endpoint detection step after the speech sound detection step, in which the digital assistant detects an endpoint of said speech and as a result exits said listening mode (Paragraph 44: the engine will only process audio data (i.e. remain in a listening mode) if e.g. a directed gaze of the user is determined, and continues until an endpoint is determined. See also Paragraph 16: the system is capable of detecting an end of voice activity (i.e. an endpoint of said speech) and optionally halt the processing of certain sensor data (i.e. exit said listening mode)).
In regards to claim 14, Konzelmann further teaches:
Method according to claim 13, wherein the speech analysis step comprises the digital assistant detecting a current head posture or head posture change and updates an endpoint detection parameter used in the detection of said endpoint in said endpoint detection step (Paragraph 44: the engine will only process audio data (i.e. remain in a listening mode) if e.g. a directed gaze of the user is determined, and continues until an endpoint is determined; also Paragraph 13, lines 17-20: the user may redirect their gaze to halt assistant functions. Also Fig. 5, element 510 and Paragraphs 108-109: the system determines whether occurrence of a directed gaze has been determined, and whether it continues; this determination results in a “yes” or “no” decision which is construed as a parameter; also Paragraph 111, which notes that endpointing may utilize analysis of vision data, which may only occur after a “yes” determination in the directed gaze occurrence detection).
In regards to claim 15, Konzelmann further teaches:
Method according to claim 14, wherein the speech analysis step further comprises the digital assistant detecting that the current speaker either turns its face away from the predetermined object (Paragraph 6, lines 1-4: the assistant device is a predetermined object) or has its face turned away from the predetermined object during at least a predetermined time period, and interprets this event as an endpoint (Paragraph 44: the engine will only process audio data (i.e. remain in a listening mode) if e.g. a directed gaze of the user is determined, and continues until an endpoint is determined; also Paragraph 13, lines 17-20: the user may redirect their gaze to halt assistant functions (i.e. the system will interpret a user looking away as an endpoint); see also Paragraph 111, which notes that endpointing may utilize analysis of vision data).
In regards to claim 17, Konzelmann further teaches:
Method according to claim 1, wherein the speech sound detection step comprises the digital assistant detecting a physical non-presence of the current speaker and as a result stays in the speech sound detection step for a predetermined amount of time, after which the digital assistant ends the speech sound detection step in case neither a physical presence of the current speaker nor a speech sound associated with the current speaker has been detected (Paragraph 58: Konzelmann describes how a user may skip using an “explicit invocation phrase” by directing a gaze at the client device and speaking during or temporally near the directed gaze. That is, the device begins listening after a gaze is directed at it, and stops listening after a threshold of time is over. Paragraph 57 notes that such behavior may be based on, generally, the occurrence of any of the attributes. One such attribute may be the physical presence of one or more users, as noted in Paragraph 22).
In regards to claim 18, Konzelmann further teaches:
Method according to claim 1, wherein the current speaker tracking step comprises the digital assistant determining a physical distance to the current speaker (Paragraph 8: the device performs adaptations based on the attributes; Paragraph 6, lines 4-5: one such attribute may be physical distance to the current speaker), and wherein the response step comprises the digital assistant modifying a response sound volume based on said physical distance (Paragraph 1: the assistant may adapt by reducing the volume of the audible user interface output (i.e. modifying a response sound volume) in response to occurrence of one or more other attributes (e.g. physical distance)).
In regards to claim 19, claim 19 is a device claim corresponding to the method of claim 1. Thus, it is rejected on similar grounds.
In regards to claim 20, claim 20 is a software claim corresponding to the method of claim 1. Thus, it is rejected on similar grounds. In addition, Konzelmann further teaches:
Computer software product (Paragraph 125) arranged to, when executed on one or several CPUs (Paragraph 126), cause a digital assistant to perform the method of claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Konzelmann as applied to claim 3 above, and further in view of Kim et al. (U.S. Patent Application Publication 2018/0268812 A1, hereinafter “Kim”).
In regards to claim 4, Konzelmann does not explicitly teach a lip movement interpretation step, in which the digital assistant interprets a lip-based verbal informational content corresponding to said detected lip movement, wherein, in the lip movement synchronisation step, the digital assistant temporally correlates said speech-based verbal informational content to said lip-based verbal informational content; nor does Konzelmann teach a speech parsing step, in which the digital assistant uses said lip-based verbal informational content to parse said speech-based verbal informational content.
In a related art, Kim teaches a system for improving endpoint detection of a voice query by utilizing images corresponding to lip movement on a face (Abstract). Notably, Kim teaches that their system can be used to independently verify speech recognition by an ASR system to reduce the likelihood of generating incorrect transcription hypotheses (Paragraph 5, lines 9-17), specifically discussing how lip movement data may indicate terms and/or phrases spoken by the user (i.e. interpreting a lip-based verbal information content corresponding to said detected lip movement) and be used to identify and/or correct misrecognized terms (i.e. parsing speech-based verbal informational content by using said lip-based verbal informational content). Kim also teaches that their system may reduce false positive voice query detection, reduce the missed detection of specified personal assistant system activation terms or phrases, or identify the occurrence of multiple voice commands within audio data encoding a received query (Paragraph 5, lines 1-9).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Konzelmann to incorporate the endpointing techniques and teachings of Kim. Doing so may have helped reduce false positive voice query detection, reduce the missed detection of specified personal assistant system activation terms or phrases, or identify the occurrence of multiple voice commands within audio data encoding a received query, as taught by Kim.
Thus, the combination of Konzelmann and Kim teaches:
Method according to claim 3, wherein the method further comprises: 
a lip movement interpretation step, in which the digital assistant interprets a lip-based verbal informational content corresponding to said detected lip movement (Kim, Paragraph 5, lines 9-17), wherein, in the lip movement synchronisation step, the digital assistant temporally correlates said speech-based verbal informational content to said lip-based verbal informational content (Konzelmann, Paragraph 52: the device may process a stream of image frames and audio frames to determine whether a mouth (i.e. lip) movement and a voice activity (i.e. speech sound) are co-occurring (i.e. temporally correlated)); and wherein the method further comprises 
a speech parsing step, in which the digital assistant uses said lip-based verbal informational content to parse said speech-based verbal informational content (Kim, Paragraph 5, lines 9-17).
In regards to claim 5, Kim further teaches:
Method according to claim 4, wherein said speech parsing step comprises the digital assistant purging detected speech sound and/or extracted speech-based verbal informational content not corresponding to detected lip movement (Paragraph 5, lines 9-17: lip movement data may indicate terms and/or phrases spoken by the user and may be used to identify and/or correct (i.e. purge) misrecognized terms (i.e. detected speech sound and/or extracted speech-based verbal informational content)).
In regards to claim 6, Konzelmann further teaches:
Method according to claim 5, wherein the method further comprises providing to the digital assistant a predetermined attention phrase (Paragraph 9, lines 1-4: the automated system may monitor for occurrence of an explicit invocation phrase (i.e. predetermined attention phrase)) and wherein said speech parsing step comprises the digital assistant purging extracted speech-based verbal information content starting with said attention phrase uttered by a different person than the current speaker (Paragraph 9, lines 9-18: the device may verify a spoken utterance is from a certain (i.e. current) speaker and may adapt (e.g. process the data) in response to this occurrence. Paragraph 9, lines 1-9: Prior to this adaptation, such as when the spoken utterance is not verified to be from a certain speaker, the device will “discard” (i.e. purge) locally buffered audio data (i.e. speech-based verbal information content) after minimally processing it).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Konzelmann as applied to claim 1 above, and further in view of Nair et al. (U.S. Patent Application Publication 2021/0224031 A1, hereinafter “Nair”).
In regards to claim 9, Konzelmann further teaches:
a subsequent follow-up interaction step, in which additional speech is detected and analysed and in which the digital assistant provides feedback to the user based on said additional speech (Paragraph 136: the device may detect voice activity occurring at a second time that is subsequent to the first time, and may process the audio data, or render one or more human perceptible cues in response), 
wherein, in said follow-up interaction step, the digital assistant maintains the identification of the current speaker (Paragraph 151: the method may include determining that the user is a recognized user that is the same recognized user that initiated providing of the content being rendered by the device).
	While Konzelmann does teach maintaining the identification of the current speaker based on processing of one or more of the image frames locally at the client device (Paragraph 151), and also discusses how the device may utilize facial recognition in identifying the user (Paragraph 6), Konzelmann does not explicitly teach that, in said follow-up interaction step, a follow-up feedback provided to the current speaker is determined based in part on a facial expression detected for the current speaker using a visual depiction of the current speaker and image analysis.
	In a related art, Nair teaches a system for providing audio information responsive to received input with a digital assistant (Abstract). Notably, Nair teaches, after stopping the provision of the audio information, detecting one or more visual characteristics and determining whether the one or more visual characteristics indicate that further speech input is expected; and responding or forgoing a response based on whether that further speech input is expected (Paragraph 10). In addition, the one or more visual characteristics may include a facial expression (Paragraph 11). That is to say, a follow-up feedback (the absence or presence of a response) provided to the current speaker is determined based in part on a facial expression detected for the speaker. Nair teaches that their system provides a more natural and efficient interaction with the digital assistant by allowing a user to speak without being interrupted or distracted by audio information from the digital assistant.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Konzelmann to incorporate the teachings of Nair to detect a facial expression to determine how to respond to the user. Doing so would have allowed the system to provide a more natural and efficient interaction with the digital assistant, as taught by Nair.
	Thus, the combination of Konzelmann and Nair teaches:
Method according to claim 1, wherein the method further comprises 
a subsequent follow-up interaction step, in which additional speech is detected and analysed and in which the digital assistant provides feedback to the user based on said additional speech (Konzelmann, Paragraph 136: the device may detect voice activity occurring at a second time that is subsequent to the first time, and may process the audio data, or render one or more human perceptible cues in response), 
wherein, in said follow-up interaction step, the digital assistant maintains the identification of the current speaker (Konzelmann, Paragraph 151: the method may include determining that the user is a recognized user that is the same recognized user that initiated providing of the content being rendered by the device),
and wherein, in said follow-up interaction step, a follow-up feedback provided to the current speaker is determined based in part on a facial expression detected for the current speaker (Nair, Paragraph 10: a response is provided or forgone based on whether further speech input is expected, which is determined based on visual characteristics; Paragraph 11: visual characteristics may include a facial expression) using a visual depiction of the current speaker and image analysis (Nair, Paragraph 26: system includes image (i.e. visual depiction) sensors).
Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Konzelmann as applied to claim 1 above, and further in view of Alameh et al. (U.S. Patent 10,713,002 B1, hereinafter “Alameh”).
In regards to claim 10, Konzelmann further teaches:
Method according to claim 1, wherein the attention detection step further comprises 
a gaze detection step, in which the digital assistant, as a necessary condition for detecting a user attention, detects a current gaze of the user and determines that said gaze is directed towards a predetermined object (Paragraph 6, lines 1-4: the assistant device is a predetermined object; see also Paragraph 45, which notes that a user can divert his/her eyes to halt certain processing; thus, gaze may refer to direction of the eyes).
	However, Konzelmann fails to teach a predetermined margin for the gaze direction.
	In a related art, Alameh teaches an electronic device that receives an audio input and adjusts audio output based on the audio input (Abstract) and may perform the functions of a voice assistant (Col. 8, lines 7-13). Notably, Alameh teaches a gaze detector component and a face analyzer component that may help confirm a user’s intent to be authenticated (Col. 19, lines 9-47). The gaze detector may notably detect a “gaze cone”, which is a field of view within which the user may easily see without diverting their eyes or head (Col. 19, lines 16-20; i.e. predetermined margin for gaze direction).
	It would have been obvious to one of ordinary skill in the art at the time of filing to modify Konzelmann to incorporate the teachings of Alameh to include the gaze detection and face analyzer components. Doing so may have allowed the system to better confirm a user’s intent to be authenticated, as taught by Alameh.
	Thus, the combination of Konzelmann and Alameh teaches:
Method according to claim 1, wherein the attention detection step further comprises 
a gaze detection step, in which the digital assistant, as a necessary condition for detecting a user attention, detects a current gaze of the user and determines that said gaze is directed towards a predetermined object (Konzelmann, Paragraph 6, lines 1-4: the assistant device is a predetermined object; see also Paragraph 45, which notes that a user can divert his/her eyes to halt certain processing; thus, gaze may refer to direction of the eyes), within a predetermined gaze direction margin (Alameh, Col. 19, lines 9-47).
In regards to claim 11, Konzelmann further teaches:
Method according to claim 1, wherein the attention detection step further comprises a 
head posture detection step, in which the digital assistant, as a necessary condition for detecting said user attention, additionally detects a current head posture of the user and determines that a face of the user faces towards said predetermined object (Paragraph 6, lines 9-11: user attention may be detected by a pose of the body, which may include the head; alternatively, Paragraph 23 notes that gaze detection includes detection of the face, which is necessarily connected to the head posture).
	However, Konzelmann fails to teach a predetermined head posture margin.
	In a related art, Alameh teaches an electronic device that receives an audio input and adjusts audio output based on the audio input (Abstract) and may perform the functions of a voice assistant (Col. 8, lines 7-13). Notably, Alameh teaches a gaze detector component and a face analyzer component that may help confirm a user’s intent to be authenticated (Col. 19, lines 9-47). The gaze detector may detect a “gaze cone”, which is a field of view within which the user may easily see without diverting their eyes or head (Col. 19, lines 16-20; i.e. predetermined head posture margin).
	It would have been obvious to one of ordinary skill in the art at the time of filing to modify Konzelmann to incorporate the teachings of Alameh to include the gaze detection and face analyzer components. Doing so may have allowed the system to better confirm a user’s intent to be authenticated, as taught by Alameh.
	Thus, the combination of Konzelmann and Alameh teaches:
Method according to claim 1, wherein the attention detection step further comprises a 
head posture detection step, in which the digital assistant, as a necessary condition for detecting said user attention, additionally detects a current head posture of the user and determines that a face of the user faces towards said predetermined object (Konzelmann, Paragraph 6, lines 9-11: user attention may be detected by a pose of the body, which may include the head; alternatively, Paragraph 23 notes that gaze detection includes detection of the face, which is necessarily connected to the head posture), within a predetermined head posture margin (Alameh, Col. 19, lines 9-47: note that the gaze cone takes into account what a user may easily see without diverting their head as well as their eyes; thus, the system is considering the posture of the head and what direction is being faced).
In regards to claim 12, Konzelmann and Alameh further teach:
Method according to claim 10, 
wherein the attention detection step further comprises a head posture detection step, in which the digital assistant, as a necessary condition for detecting said user attention, additionally detects a current head posture of the user and determines that a face of the user faces towards said predetermined object (Konzelmann, Paragraph 6, lines 9-11: user attention may be detected by a pose of the body, which may include the head; alternatively, Paragraph 23 notes that gaze detection includes detection of the face, which is necessarily connected to the head posture), within a predetermined head posture margin (Alameh, Col. 19, lines 9-47: note that the gaze cone takes into account what a user may easily see without diverting their head as well as their eyes; thus, the system is considering the posture of the head and what direction is being faced; note that these limitations are imported from claim 11 in order to remedy the issue of indefiniteness described in the rejection under 35 U.S.C. 112(b) above); and
wherein the method further comprises providing to the digital assistant a predetermined attention phrase (Konzelmann, Paragraph 9, lines 1-4: the automated assistant may monitor for occurrence of an explicit invocation phrase (i.e. predetermined attention phrase)), and wherein the attention detection step comprises the digital assistant detecting a user attention based on said gaze detection step together with said head posture detection step (Alameh, Col. 19, lines 9-47) but in the absence of a detection of said predetermined attention phrase (Konzelmann, Paragraph 57: the need for the user to speak an explicit invocation phrase (i.e. predetermined attention phrase) may be obviated by one or more attributes by modules of the adaptations; Paragraph 58: one example of such adaptations may be detection that the user is looking at the client device).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Konzelmann as applied to claim 13 above, and further in view of Nair and Rodenas et al. (U.S. Patent Application Publication 2018/0300557 A1, hereinafter “Rodenas”).
In regards to claim 16, Konzelmann further teaches:
Method according to claim 13, wherein the endpoint detection step further comprises
a user depiction step, in which a face of the user is depicted and stored as a digital image (Paragraph 53: vision data (i.e. digital image) can include (i.e. depict) a region that is detected to only contain a face of a user);
	However, Konzelmann fails to explicitly teach a facial expression detection step, in which a facial expression of the user is detected based on a set of predetermined parameters, which facial expression is indicative of the user not having finished speaking; and as a result of said facial expression detection, postponing the endpoint detection despite no speech currently being detected, wherein, preferably, the facial expression is detected based on any one or several of the following parameter sets: a first parameter set defining the facial expression based on a detected eyeball position; a second parameter set defining the facial expression based on a detected lip openness; and a third parameter set defining the facial expression based on a detected eyebrow configuration.
In a related art, Nair teaches a system for providing audio information responsive to received input with a digital assistant (Abstract). Notably, Nair teaches, after stopping the provision of the audio information, detecting one or more visual characteristics and determining whether the one or more visual characteristics indicate that further speech input is expected; and responding or forgoing a response based on whether that further speech input is expected (Paragraph 10; i.e. postponing the endpoint detection despite no speech currently being detected). In addition, the one or more visual characteristics may include a facial expression or an eye gaze (Paragraph 11: eye gaze may be construed as “eyeball position”).
In another related art, Rodenas teaches a system for analyzing video data to determine relevant behaviors and determining whether and what action should be taken (Abstract). Notably, Rodenas teaches that the relative locations and shapes of things like a person’s lips, eyebrows, eyelids, and other such features can be indicative of the mood or sentiment of a user (Paragraph 23). Rodenas then goes further to note how the mood or sentiment of a user may suggest the oncoming behavior of the user (Paragraph 24). Rodenas notes that the ability to recognize behavioral patterns may be useful for security or advertisement purposes (Paragraphs 21-22).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Konzelmann to incorporate the teachings of Nair to detect a facial expression to determine how to respond to the user. Doing so would have allowed the system to provide a more natural and efficient interaction with the digital assistant, as taught by Nair. Furthermore, it would have been obvious to further improve the teachings of Nair with the teachings of Rodenas to include analysis of a person’s lips, eyebrows, eyelids and other such features in order to determine the oncoming behavior of the user; the system of Rodenas may be adapted to determine, based on the mood or sentiment of a user, that they may be inputting further speech, and the combination may also benefit from the security or advertisement utilities taught by Rodenas.
	Thus, the combination of Konzelmann, Nair, and Rodenas teaches:
Method according to claim 13, wherein the endpoint detection step further comprises 
a user depiction step, in which a face of the user is depicted and stored as a digital image (Konzelmann, Paragraph 53: vision data (i.e. digital image) can include (i.e. depict) a region that is detected to only contain a face of a user);
a facial expression detection step, in which a facial expression of the user is detected based on a set of predetermined parameters, which facial expression is indicative of the user not having finished speaking (Nair, Paragraph 10); and 
as a result of said facial expression detection, postponing the endpoint detection despite no speech currently being detected (Nair, Paragraph 10), wherein, preferably, the facial expression is detected based on any one or several of the following parameter sets:
a first parameter set defining the facial expression based on a detected eyeball position (Nair, Paragraph 11: eye gaze may be construed as “eyeball position”);
a second parameter set defining the facial expression based on a detected lip openness (Rodenas, Paragraph 24, lines 35-38: Rodenas notes that an open mouth may be a facial feature of note); and
a third parameter set defining the facial expression based on a detected eyebrow configuration (Rodenas, Paragraph 23: eyebrows are an example feature that may indicate a mood or sentiment).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yuan et al. (U.S. Patent Application Publication 2020/0341546 A1) teaches an automated system that may respond to a user’s gaze.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER J KIM whose telephone number is (571) 272-4442. The examiner can normally be reached M-F 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER JOONGIE KIM/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655