DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on August 29, 2022 has been entered.   Claims 1, 9, 11, 19, and 20 have been amended.  Claims 1-20 are pending.
 


Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-6, 8-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fox et al. (US Patent Application Publication No. 2020/0125321) in view of Divakaran (US Patent Application Publication No. 2017/0160813), Bathiche (US Patent Application Publication No. 2018/0232571), and Merler (US Patent Application Publication No. 2019/0289327).
Fox discloses digital assistant user interface amalgamation. Regarding claims 1, 11, and 20, Fox discloses a computer-implemented method [Fig. 3/4/5; para 0014]; a non-transitory computer readable medium including instructions that are executed by one or more processors [para 0014-0015]; and a system [Fig. 3] comprising memories and processors for executing instructions [para 0014] for interpreting spoken user input, comprising: determining that a first prediction is relevant to a first text input that has been derived from a first spoken input received from a user [para 0034; 0044 -- the process couples the word action with the gesture provided by the user]; generating a first predicted context based on the first prediction [para 0035 -- determines a meaning of the gesture that was performed by the user and uses natural language processing (NLP) to derive a meaning from the words spoken by the user]; and transmitting the first text input and the first predicted context to at least one software application that subsequently performs one or more additional actions based on the first text input and the first predicted context [para 0035 -- the QA system determines the appropriate response (action) and returns the action that the digital assistant is to perform back to the digital assistant via the computer network – where, for the input “system, turn on that light”, the system determines that the words+gesture meaning is to turn device lamp0234 on, and the action is performed; para 0040-0041]. Fox fails to specifically teach using a first and a second type of non-verbal cue in determining prediction/context information.
In a similar field of endeavor, Divakaran teaches a virtual personal assistant that enables a person to interact with a computer-driven device using multi-modal inputs such as spoken, written, or typed natural language, verbal cues, emotional cues, and/or visual cues [para 0002]. The assistant can respond not only to natural language input, entered as spoken or typed words, but also to non-verbal cues derived from audio, visual, and/or tactile input. The system can use behavioral cues such as gestures, facial emotions, voice volume or pace, and so on to understand a person's emotional and/or cognitive state in order to adapt to human modes of communication, and is also able to combine modalities, including both explicit and behavioral cues, to fully understand a person's context [para 0156]. The system uses combined or fused processing of the plurality of analyzers [para 0272-0276] such that combinations of behavior cues [para 0283-0286] can be processed to determine and analyze user context, where head pose and facial expressions represent a first and a second non-verbal cue. Divakaran teaches the system allows the personal assistant to engage the user in an interactive dialog to accomplish a task for the user [para 0002]. One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the combination of multiple behavior cues processing suggested by Divakaran, in the system of Fox, to improve interactive dialog with the user and the execution of the user’s tasks, as suggested by Divakaran [para 0002].
Fox fails to teach implementing a time marker with relevant time windows associated with base time markers. Merler [para 0025-0029; 0034; 0036-0039; 0040-0043; 0047-0048; 0064-0067] provides a set of markers that are identified for media content, where each marker corresponds to one of a plurality of visible and audible cues in the content. Segments are identified based on the identified set of markers, and an excitement score is computed for each segment based on the identified markers that fall within the segment (“relevant time window”). Merler teaches the system allows for determining the important moments within the content [para 0003]. One having ordinary skill in the art would have recognized the advantages of implementing the visual and audible cue processing and classification techniques suggested by Merler, for the purpose of ensuring important or necessary visual or audible cues are correctly identified and classified, as suggested by Merler, thereby improving system performance and user interactions with the system.
Fox fails to teach the context is based on a combination of features and first, second and third weights. Bathiche teaches an intelligent assistant device communicating non-verbal cues, where data indicating context information of the human is received from one or more sensors of the device and is used to determine the context of the user, where data from the various sensors are weighted [para 0063-0065], and where a Kalman filter is utilized to combine the multiple weighted data inputs to output a more confident prediction [para 0094]. One having ordinary skill in the art would have recognized the advantages of implementing the context feature weighting/combination processing suggested by Bathiche in the Fox/Divakaran system, for the purpose of making possible more informative and rich interactions between the user and the intelligent device, as suggested by Bathiche [para 0026].

Regarding claim 2, the combination of Fox, Divakaran, Merler and Bathiche teaches the first prediction is based on one or more user actions [Fox at para 0034 – user performs gestures with various parts of the user's body, such as by moving the user's hands, arms, legs, etc].  
Regarding claims 3 and 13, the combination of Fox, Divakaran, Merler and Bathiche teaches the one or more additional actions performed by the at least one software application comprise generating a first text output based on the first text input and the first predicted context [Fox at para 0024 – QA system 100 may provide a response to users in a ranked list of answers].  
Regarding claims 4 and 14, the combination of Fox, Divakaran, Merler and Bathiche teaches generating the first predicted context comprises inputting the first prediction and the second prediction that is relevant to the first text input into a trained machine learning model that, in response, outputs a composite prediction that is included in the first predicted context [Fox at para 0036 – machine learning system; Divakaran at para 0272-0276 – fusion techniques; 0278 – interaction modeler; 0283-0286 – combinations of behavioral cues].
Regarding claim 5, the combination of Fox, Divakaran, Merler and Bathiche teaches generating the first predicted context comprises applying one or more rules to the first prediction and the second prediction that is relevant to the first text input to compute a composite prediction that is included in the first predicted context [Fox at para 0037; 0040-0041; 0044 -- the process retrieves previous trained responses that most closely match the word/gesture amalgamation that was received from the user, where retrieving trained responses that closely match provides a form of a rule that is applied; Divakaran at para 0272-0276 – fusion techniques; 0278 – interaction modeler; 0283-0286 – combinations of behavioral cues].
Regarding claims 6 and 15, the combination of Fox, Divakaran, Merler and Bathiche teaches the first predicted context relates to at least one of an intention, an emotion, a personality trait, a user identification, a level of attentiveness, or a user action [Fox at para 0034 – user performs gestures with various parts of the user's body, such as by moving the user's hands, arms, legs, etc; para 0046 – facial gestures].    
Regarding claims 8 and 18, the combination of Fox, Divakaran, Merler and Bathiche teaches determining that a third prediction is relevant to a second text input that has been derived from a second spoken input received from the user; generating a third predicted context based on the second prediction and the first predicted context; and transmitting the second text input and the second predicted context to the at least one software application [Fox at para 0042 – the looping processing waits for the user to issue further commands; Divakaran uses combined or fused processing of the plurality of analyzers (para 0272-0276) such that combinations of behavior cues (para 0283-0286) can be processed – where implementing a third prediction from amongst the plurality of obtained behavioral cues is an obvious step requiring only routine skill in the art, so as to improve the interactive dialog with the user and ensure the system successfully executes the user’s requested task].
Regarding claims 9 and 19, the combination of Fox, Divakaran, Merler and Bathiche teaches determining the relevance time window based on a prediction type associated with the first prediction [Fox at para 0037-0038; 0040-0041; 0044; 0055 – deep learning model; Merler at para 0064-0068 -- identified markers may include multimodal excitement features based on detections of different visible or audible cues of excitement such as crowd cheering, players celebrating, and excited commentator tone or expressions. The identified markers may also include markers based on on-screen overlay information such as TV graphics, texts, statistics, and scene changes ... overall excitement score for each segment based on the identified markers that fall within the segment. The system may aggregate and normalize positive scores for markers that fall within a timing window of a particular excitement marker as the excitement score of the segment].
Regarding claim 10, the combination of Fox, Divakaran, Merler and Bathiche teaches the first prediction is generated by inputting at least one of an audible input associated with the user or a visual input associated with the user into a trained machine-learning model [Fox at para 0037-0038; 0045; 0055 – deep learning model].  
Regarding claim 12, the combination of Fox, Divakaran, Merler and Bathiche teaches the first type of non-verbal cue comprises at least one of a non-verbal sound, a gesture or a facial expression [Fox at para 0034 – user performs gestures with various parts of the user's body, such as by moving the user's hands, arms, legs, etc; para 0046 – facial gestures; Divakaran at para 0002; 0156 -- gestures, facial emotions, voice volume or pace].



Claims 7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Fox in view of Divakaran, Merler and Bathiche and further in view of Eldeeb et al (US Patent Application Publication No. 2020/0380389).
Regarding claims 7 and 16, Fox fails to teach the prediction is at least one of personality or a user identification. In a similar field of endeavor, Eldeeb teaches sentiment and intent analysis by predicting user intent based on impressions of collected user data, where the collected data items are associated with one or more user activities (e.g., an email message composed by the user, a search phrase the user provided to a search engine, a speech input uttered by the user) and likely indicate or reflect, to a certain degree, the user's social statuses, interests, characteristics, preferences, or traits [para 0257]. Eldeeb teaches the system provides for more intelligent suggestions to the user based on sentiment analysis, user intent prediction, and contextual information. One having ordinary skill in the art at the time of the invention would have recognized the advantages of utilizing user traits/characteristics/preferences for prediction/collected data processing, as suggested by Eldeeb, for the purpose of improving the system by providing more intelligent responses to the users, as taught by Eldeeb.
Regarding claim 17, the combination of Fox, Divakaran, Merler, Bathiche and Eldeeb teaches the second prediction is generated by inputting at least one of an audible input associated with the user or a visual input associated with the user into a trained machine-learning model [Fox at para 0037-0038; 0045; 0055 – deep learning model; Divakaran at para 0272-0276 – fusion techniques; 0278 – interaction modeler; 0283-0286 – combinations of behavioral cues].


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598. The examiner can normally be reached M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659