DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Acknowledgment is made that claims 8 and 14-20 are amended.  Claims 7 and 12 are cancelled.   Claim 22 is new.  Claims 1-6, 8-11 and 13-22 are pending in the instant application.

Response to Arguments

Applicant’s arguments, see Remarks, filed on 1/25/2021 have been fully considered.  

Claim Rejections under 35 U.S.C. 103
Claims 1-6, 8-9, 13, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta et al. (US 9,641,681 B2), hereinafter Nuta, further in view of Breazeal et al. (US 2018/0133900), hereinafter Breazeal.

“predicting, by a computing device, a subsequent spoken word of a user based on a monitoring of a contextual data.”  Applicant refers to previously cited columns and line numbers and asserts that Nuta discusses predicting whether a word (that has been spoken) is in a class of words.  The examiner respectfully disagrees and finds the argument unpersuasive.

Nuta discloses a technique of training a speaker-specific language model using transcripts from a suitable group of conversations involving a speaker or a member of a group of speakers, where the transcripts include at least a specified number of words spoken by the speaker or members of the group of speakers (see col. 13, lines 7-18). For example, sales representatives from different companies tend to utter different names of their respective company’s employees, brand products and customers (see col. 13, lines 31-34 and 54-57); the names spoken by the sales representative can be control data for the language model.  When the language model predicts that the next word or phrase in the sequence is a name, it can determine which name is present using a class-specific language model trained to recognize names.  Stated another way, the language model is fed with a user’s conversations and trained to recognize when a word in a word sequence is an instance of a recognized class, e.g. a name; the language model inputs the user’s speech into a machine-learned speech recognition model and provides an expected name as an additional input to the machine-learned speech recognition model.  Therefore, contrary to applicant’s assertion, the machine-learned speech recognition model predicts a subsequent spoken word of a user.  Applicant’s 

Claims 2-6, 8, 9 and 13
Applicant argues these claims conditionally based on the arguments presented to their parent claim.  Applicant’s argument is unpersuasive for the reason set forth above.

Claim 9 is taught by Moncomble, and the same ground of rejection is made; therefore, claim 9 is not allowable.

Claim 19 is amended with the following features:
“program instructions to monitor second contextual data associated with a second user during the conference;
program instructions to determine a second user interest level based on the monitoring the second contextual data;
program instructions to determine a second control instruction to provide to a second user device associated with the second user based on the second user’s interest level, wherein the second control instruction causes the second user device to modulate a tone and adjust a tempo of the voice of the speaker in the conference, wherein the control instruction and the second control instruction are different instructions; and
program instructions to output the second control instruction to cause the second user device to execute the second control instruction.” (Emphasis added)
On page 15 of the Remarks, applicant argues the combined system of Moncomble, Nuta and Breazeal does not teach the limitation “program instruction to predict a subsequent spoken word of user based on the contextual data.”  Applicant’s argument is unpersuasive for the reason set forth above.

Regarding the amended features, a new ground of rejection is made.

Claim 20
Applicant argues the claim conditionally based on the argument presented for claim 19. Claim 19 is rejected under a new ground, and so is claim 20.

Claims 14-18 and 21
On page 17 of the Remarks, applicant argues that the combination of Moncomble, Nuta and Breazeal does not teach the limitation “predict a subsequent spoken word of the user based on the monitoring the contextual data.” Applicant’s argument is unpersuasive for the reason set forth above.

Claims 15-18 and 21 
These claims depend from claim 14; therefore, the cited references still teach these claims.  Regarding the amendment made to claim 15, a new ground of rejection is made to the claim.

Claims 10 and 11 
On page 18 of the Remarks, applicant argues that the combination of Moncomble, Nuta and Breazeal does not teach the limitation “predict a subsequent spoken word of the user based on the monitoring the contextual data.” Applicant’s argument is unpersuasive for the reason set forth above.

Claim 12
On page 18 of the Remarks, applicant argues that the combination of Moncomble, Nuta, Breazeal and Kothuri does not teach the limitation “predict a subsequent spoken word of the user based on the monitoring the contextual data.” Applicant’s argument is unpersuasive for the reason set forth above.

Claim 22 is a new claim with additional features; therefore, a new ground of rejection is made to the claim.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-6, 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta et al. (US 9,641,681 B2), hereinafter Nuta, further in view of Breazeal et al. (US 2018/0133900), hereinafter Breazeal.

As for claim 1, Moncomble teaches a computer-implemented method comprising:
monitoring, by a computing device, contextual data associated with a user during a conference (paragraphs [0072]-[0073] and [0076]-[0077] describe a learning device that measures the attention level of an audience attending a conference by capturing a viewer’s facial movements, eye movements, and frequency of yawning.  Note: paragraph [0170] describes that the learning device and the prediction device are contained in a network server; therefore, actions performed by either of the two devices or the server are construed as actions performed by a computing device);
determining, by the computing device, a user interest level based on the monitoring the contextual data (paragraphs [0072]-[0073] describe the learning device measuring the attention level of the audience by detecting the audience’s behaviors; paragraph [0082] describes the determination of the attention level of the audience);
wherein a user interest level is a user’s context (paragraph [0076] describes that the attention level is measured based on analysis of the faces in the audience);
wherein a voice is of a speaker at a conference (paragraph [0072] describes a speaker giving an audio presentation to an audience).
Moncomble fails to teach predicting, by a computing device, a subsequent spoken word of a user based on a monitoring of a contextual data;

outputting, by the computing device, the control instruction to cause the user device to execute the control instruction.
However, it is well known in the art to apply a predictive model to a user’s conversation, as evidenced by Nuta.
Nuta discloses 
predicting, by a computing device, a subsequent spoken word of a user based on a monitoring of a contextual data (col. 13, lines 41-45 and 51-54 and col. 14, lines 1-5 describe that a language model of a speech recognition engine (see Fig. 2) can predict that a next word or phrase in a sequence is a name);
determining, by the computing device, a control instruction to provide to a user device associated with a user based on a user’s context and a predicted subsequent spoken word (col. 18, lines 29-41 describe that speech quality metrics, i.e. user’s context, are evaluated; col. 20, lines 30-33 describe that tone and cadence metrics indicate the conversation participant’s interest, i.e. user’s context; col. 24, lines 1-14 describe that the speech quality metrics are used to provide suggestions to the speaker; col. 18, lines 10-15 describe that speech data collected by the speech recognition engine is used to provide assessments of the conversation represented by the speech data; col. 27, lines 34-44 describe that the assessment includes a recommendation for a participant in the conversation, thus the speech data, i.e. prediction of a word, is used to provide 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Nuta for implementing a predictive model to obtain conversation metric data and conversation assessment data included in a plurality of conversations. The teachings of Nuta, when implemented in the Moncomble system, will allow one of ordinary skill in the art to evaluate the quality of a participant’s contributions to a conversation. One of ordinary skill in the art would be motivated to utilize the teachings of Nuta in the Moncomble system in order to improve the quality of conversations and guide a speaker to speak in ways that are more likely to induce participants’ cooperation and interest.
The combined system of Moncomble and Nuta fails to teach wherein a control instruction is directed to a user device;
outputting, by a computing device, a control instruction to cause the user device to execute the control instruction.
However, it is well known in the art to adjust speech based on context associated with individuals during a conversation, as evidenced by Breazeal.
Breazeal discloses wherein a control instruction is directed to a user device (paragraphs [0123]-[0125] describe a diction engine that provides instructions for addressing issues, e.g. generates tags in the form of text, voice, and vision to direct a robot to adjust 
outputting, by the computing device, the control instruction to cause the user device to execute the control instruction (paragraphs [0125] and [0127] describe tags are provided to the robot to execute).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Breazeal for adjusting speech based on data associated with parties participating in a conversation. The teachings of Breazeal, when implemented in the Moncomble and Nuta system, will allow one of ordinary skill in the art to enhance user experience in a collaborative event. One of ordinary skill in the art would be motivated to utilize the teachings of Breazeal in the Moncomble and Nuta system in order to enhance a user experience during a conversation.

As for claim 2, the combined system of Moncomble and Nuta teaches all the limitations set forth above except wherein a control instruction further includes at least one selected from the group consisting of:
an instruction to modulate a volume, or accent of the speaker in the conference; an instruction to pause the speaker’s voice during conversation; an instruction to provide haptic feedback to the user via the user device; and an instruction to provide a visual alert or animation on the user device.
However, it is well known in the art to provide a mode of operation of a device during a conversation, as evidenced by Breazeal.
Breazeal discloses
an instruction to modulate a volume, or accent of the speaker in the conference; an instruction to pause the speaker’s voice during conversation; an instruction to provide haptic feedback to the user via the user device; and an instruction to provide a visual alert or animation on the user device (paragraph [0125] describes that the ESML tags that are generated include text, voice, and vision).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Breazeal for providing instructions to adjust speech.  The teachings of Breazeal, when implemented in the Moncomble and Nuta system, will allow one of ordinary skill in the art to enhance user experience in a collaborative event. One of ordinary skill in the art would be motivated to utilize the teachings of Breazeal in the Moncomble and Nuta system in order to enhance a user experience during a conversation.

As for claim 3, the combined system of Moncomble and Nuta teaches all the limitations set forth above except wherein a control instruction is a first control instruction, the method further comprising determining a second control instruction for a different user device associated with a different user, wherein the first control instruction and the second control instruction are different.
However, it is well known in the art to provide instructions to adjust speech for different users, as evidenced by Breazeal.

One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Breazeal for providing instructions to adjust speech.  The teachings of Breazeal, when implemented in the Moncomble and Nuta system, will allow one of ordinary skill in the art to enhance user experience in a collaborative event. One of ordinary skill in the art would be motivated to utilize the teachings of Breazeal in the Moncomble and Nuta system in order to enhance a user experience during a conversation.

As for claim 4, the combined system of Moncomble, Nuta and Breazeal teaches wherein the first control instruction and the second control instruction are provided by different communications channels (Moncomble: paragraph [0126] describes the 

As for claim 5, the combined system of Moncomble, Nuta and Breazeal teaches wherein the control instruction is determined based on criteria that maps the control instruction to the user interest level (Moncomble: paragraphs [0122]-[0123] and [0126] describe that a search is performed in a database of characteristic elements of the speaker and/or of the presentation and of the associated parameters so as to find corresponding information in relation to the change in a probable attention level; the prediction of the attention level is associated with determination of recommendations for actions to be performed on the presentation so as to change the attention level in a desired direction).

As for claim 6, the combined system of Moncomble, Nuta and Breazeal teaches the method further comprising determining an effectiveness of the control instruction and updating the criteria based on the effectiveness of the control instruction (Moncomble: paragraphs [0097] and [0103] describe that characteristics of the presentation and the attention level measurements are used by an analysis module to determine the change in the attention level; this module determines probabilities or correlations between a decrease or an increase observed with regard to the attention measurement and various groups of characteristic elements of the presentation and/or of the speaker.  If a correspondence between characteristic elements or groups of characteristic elements and attention level decrease or attention level increase rate is found in several 

As for claim 9, the combined system of Moncomble, Nuta and Breazeal teaches wherein the contextual data is received from one or more sensors (Moncomble: paragraph [0072] describes an audio measurement sensor to measure the speaker’s sound level), wherein the contextual data comprises at least one selected from the group consisting of: tempo of the spoken words (Moncomble: paragraph [0072] describes the speaker’s sound level); and
user biometrics data (Moncomble: paragraph [0072] describes the sensor measures the speaker’s gestures).

As for claim 13, the combined system of Moncomble, Nuta and Breazeal teaches deploying a system comprising providing a computer infrastructure operable to perform the monitoring the contextual data (Moncomble: paragraphs [0145]-[0146] describe a learning device that implements a learning method, the device comprising software and hardware components capable of implementing the function of collecting the measurements captured by sensors), the determining the user interest level (Moncomble: paragraphs [0074]-[0075] describe that the attention level measurements are determined), the determining the control instruction (Moncomble: paragraph [0126] .

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta (US 9,641,681 B2) and Breazeal (US 2018/0133900), further in view of Peters et al. (US 2021/0076002), hereinafter Peters, and further in view of Bangalore et al. (US 2015/0317974), hereinafter Bangalore.

As for claim 8, the combined system of Moncomble, Nuta and Breazeal teaches all the limitations set forth above except wherein a conference includes:
a teleconference;
a webcast;
a web/video conference; and 
an audiobook.
However, it is well known in the art to manage different categories of a conference, as evidenced by Peters.
Peters discloses wherein a conference includes:
a teleconference (paragraph [0003] describes a video conference which involves participants located remotely from each other);
a web/video conference (paragraph [0064] describes a video conference).
 One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Peters for 
The combined system of Moncomble, Nuta, Breazeal and Peters fails to teach wherein a conference includes a webcast and an audiobook.
However, it is well known in the art to utilize different resources on the internet to create voice profiles, as evidenced by Bangalore.
Bangalore discloses wherein a conference includes a webcast and an audiobook (paragraphs [0002] and [0036] describe webcasts and audiobooks).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Bangalore for managing different types of web resources that contain human speech data.  The teachings of Bangalore, when implemented in the Moncomble, Nuta, Breazeal and Peters system, will allow one of ordinary skill in the art to acquire voice parameterizations from human speech data. One of ordinary skill in the art would be motivated to utilize the teachings of Bangalore in the Moncomble, Nuta, Breazeal and Peters system in order to generate speech specific to a user (Bangalore: abstract).

Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta (US 9,641,681 B2) and Breazeal (US 2018/0133900), further in view of Peterson (US 2015/0156529).

As for claim 10, the combined system of Moncomble, Nuta and Breazeal teaches the computing device (Moncomble: Fig. 3, paragraph [0145] describes a learning device).
The combined system of Moncomble, Nuta and Breazeal fails to teach wherein a service provider at least one of creates, maintains, deploys and supports a computing device.
However, it is well known in the art to implement an application in a user equipment device to collect users’ biometric data, as evidenced by Peterson.
Peterson discloses wherein a service provider at least one of creates, maintains, deploys and supports a computing device (paragraph [0057] describes user equipment devices owned by subscribers to services associated with media content sources; paragraph [0055] describes a system in which a media guidance application is implemented at user equipment devices which function as standalone devices; paragraph [0036] describes that the media guidance application obtains data associated with a user, including content access and biometric data).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Peterson for implementing an application at user equipment devices. The teachings of Peterson, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of 

As for claim 11, the combined system of Moncomble, Nuta and Breazeal teaches the monitoring the contextual data (Moncomble: paragraphs [0072]-[0073] describe that the speaker’s and audience’s interactions are captured by various sensors), the determining the user interest level (Moncomble: paragraphs [0074]-[0075] describe that the attention level measurements are determined), the determining the control instruction (Moncomble: paragraph [0126] describes that recommendations are determined), and the outputting the control instruction are provided by an entity (Moncomble: paragraph [0128] describes an interface that determines the recommendations).
The combined system of Moncomble, Nuta and Breazeal fails to teach wherein an entity is a service provider on a subscription, advertising, and/or fee basis.
However, it is well known in the art to provide services on an advertising basis, as evidenced by Peterson.
Peterson discloses wherein an entity is a service provider on a subscription, advertising, and/or fee basis (paragraphs [0001]-[0002] describe systems for selectively transmitting user interaction information based on biometric information to serve advertisers and media content providers; paragraph [0032] describes advertisement is 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Peterson for implementing an application at user equipment devices. The teachings of Peterson, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of ordinary skill in the art to collect users’ interactions and biometric data. One of ordinary skill in the art would be motivated to utilize the teachings of Peterson in the Moncomble, Nuta and Breazeal system in order to provide an interactive program guide which captures user interactions and biometric characteristics to create user profiles, which enable a multimedia service provider to generate content that is customized based on the received user profiles.

Claims 14, 16-17 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta (US 9,641,681 B2) and Breazeal (US 2018/0133900), further in view of Osotio et al. (US 2018/0197066), hereinafter Osotio.

As for claim 14, Moncomble teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith (paragraph [0148] describes a computer program is stored in a memory), the program instructions executable by a computing device to cause the computing device to 
receive identification information for a participant in a conference (paragraph [0065] describes a presentation or an online training course which is provided by a speaker to multiple users, each user being in front of their respective terminal; paragraph [0083] describes that an individual measurement is performed for each of the people in the audience);
monitor contextual data associated with the identified participant (paragraphs [0076] and [0080] describe that a person’s behaviors are captured);
determine an interest level of the participant based on the monitoring the contextual data (paragraphs [0083]-[0084] describe that an individual measurement of the attention level of each person in the audience is performed);
determine a custom control instruction based on the participant’s interest level (paragraph [0116] describes a server that performs the method of prediction of attention level; paragraph [0126] describes that the prediction of the attention level is associated with determination of recommendations for actions to be performed on a presentation in order to change the attention level in a desired direction);
wherein a participant’s interest level is a user’s context (paragraph [0076] describes that the attention level is measured based on analysis of the faces in the audience);
wherein a voice is of a speaker at a conference (paragraph [0072] describes a speaker giving an audio presentation to an audience).
Moncomble fails to teach predict a subsequent spoken word of a user based on a monitoring a contextual data;

output the custom control instruction to cause the user device to execute the custom control instruction;
determine an updated participant interest level after the outputting the control instruction to cause the user device to execute the control instruction, and
in response to the updated participant interest level being below a threshold, modify the control instruction criteria and output a different control instruction to cause the user device to execute the different control instruction.
However, it is well known in the art to predict a word spoken by a user based on environment data, as evidenced by Nuta.
Nuta discloses predict a subsequent spoken word of a user based on a monitoring a contextual data (col. 13, lines 41-45 and 51-54 and col. 14, lines 1-5 describe that a language model of a speech recognition engine (see Fig. 2) can predict that a next word or phrase in a sequence is a name);
determining, using control instruction criteria, a custom control instruction to provide to a user device associated with a participant based on the participant’s interest level and the predicted subsequent spoken word (col. 18, lines 29-41 describe speech quality metrics i.e. user’s context are evaluated; col. 20, lines 30-33 describe tone and 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Nuta for implementing a predictive model to obtain conversation metric data and conversation assessment data included in a plurality of conversations. The teachings of Nuta, when implemented in the Moncomble system, will allow one of ordinary skill in the art to evaluate the quality of a participant’s contributions to a conversation. One of ordinary skill in the art would be motivated to utilize the teachings of Nuta in the Moncomble system in order to improve the quality of conversations and guide a speaker to speak in ways that are more likely to induce participants’ cooperation and interest.
The combined system of Moncomble and Nuta fails to teach wherein a control instruction is directed to a user device;

output the custom control instruction to cause the user device to execute the custom control instruction;
determine an updated participant interest level after the outputting the control instruction to cause the user device to execute the control instruction, and
in response to the updated participant interest level being below a threshold, modify the control instruction criteria and output a different control instruction to cause the user device to execute the different control instruction.
However, it is well known in the art to adjust speech based on context associated with individuals during a conversation, as evidenced by Breazeal.
Breazeal discloses wherein a control instruction is directed to a user device (paragraphs [0123]-[0125] describe a diction engine that provides instructions for addressing issues, e.g. generates tags in the form of text, voice, and vision to direct a robot to adjust inflection i.e. a tempo on words and adjust a pitch i.e. a tone of speed when talking to a child);
wherein a custom control instruction causes a user device to provide a visual alert or animation on a user device (paragraph [0125] describes that tags in the form of text, voice, and vision are provided to direct the robot to a mode of operation); and
output the custom control instruction to cause the user device to execute the custom control instruction (paragraphs [0125] and [0127] describe tags are provided to the robot to execute).

The combined system of Moncomble, Nuta and Breazeal fails to teach provide haptic feedback to a user via a user device;
determine an updated participant’s context after an outputting a control instruction to cause a user device to execute the control instruction, and
in response to the updated participant’s context being below a threshold, modify the control instruction criteria and output a different control instruction to cause the user device to execute the different control instruction.
However, it is well known in the art to provide options to update a presentation based on user data, as evidenced by Osotio.
Osotio discloses provide haptic feedback to a user via a user device (paragraph [0098] describes that a mobile device, i.e. a client computing device of a user (see paragraph [0051]), provides tactile feedback);
determine an updated participant’s context after an outputting a control instruction to cause a user device to execute the control instruction (paragraphs [0058]-[0061] describe user contexts, emotions etc. are collected after an initial visual presentation of 
in response to the updated participant’s context being below a threshold, modify the control instruction criteria and output a different control instruction to cause the user device to execute the different control instruction (paragraphs [0062]-[0063] describe that, if the adjustment threshold has been reached, instructions are sent so that the visual personification of the AI interface is adjusted; paragraph [0081] describes that user contexts are compared with an adjustment threshold and, if the user contexts do not meet the threshold, the generated visual personification of AI is provided to a user).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Osotio for updating a visual presentation based on data and inputs collected from a user. The teachings of Osotio, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of ordinary skill in the art to adjust and evolve a presentation as additional data is collected from a user. One of ordinary skill in the art would be motivated to utilize the teachings of Osotio in the Moncomble, Nuta and Breazeal system in order to provide an AI user interface that is generated, adjusted and evolves over time based on the user to increase engagements, trust and emotional connection with the user (Osotio: paragraph [0049]).



As for claim 17, the combined system of Moncomble, Nuta, Breazeal and Osotio teaches wherein the program instructions further cause the computing device to determine an effectiveness of the control instruction, and updating the criteria based on the effectiveness of the control instruction (Moncomble: paragraphs [0097] and [0103] describe characteristics of the presentation and the attention level measurements are used by an analysis module to determine the change in the attention level, this module determines probabilities or correlation between a decrease or an increase observed with regard to the attention measurement and various groups of characteristic elements of the presentation and/or of the speaker.  If a correspondence between characteristics elements or groups of characteristic elements and attention level decrease or attention level increase rate is found in several presentations of the set, then this correspondence is recorded in a learning database; paragraph [0113] further describes at the end of the learning phase, the database is enriched by a set of information in relation to the 
wherein a participant’s interest level is a participant’s context (Moncomble: paragraph [0076] describes the attention level is measured based analysis of the faces in the audience);
wherein an outputting a control instruction is an adjustment (Moncomble: paragraph [0127] describes a recommendation that asks to increase the sound level of the speaker’s voice).
The combined system of Moncomble, Nuta and Breazeal fails to teach wherein an effectiveness is determined based on an updated participant context determined after an adjustment.
However, it is well known in the art to rely on current user context to evaluate an offer for a change, as evidenced by Osotio.
Osotio discloses wherein an effectiveness is determined based on an updated participant context determined after an adjustment (paragraph [0069] describes that implicit and explicit feedback is used to evaluate a user’s likes/dislikes of the change made to the presentation).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Osotio for updating a visual presentation based on data and inputs collected from a user. The teachings of Osotio, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of ordinary skill in the art to adjust and evolve a presentation as additional data is collected from a user. One of ordinary skill in the art would be motivated to utilize 

As for claim 21, the combined system of Moncomble, Nuta and Breazeal teaches wherein the control instruction further includes an instruction to provide a visual alert or animation on the user device (Breazeal: paragraph [0125] describes that tags in the form of text, voice, and vision are provided to direct the robot to a mode of operation);
wherein a user interest level is a user context (Moncomble: paragraph [0076] describes that the attention level is measured based on an analysis of the faces in the audience).
The combined system of Moncomble, Nuta and Breazeal fails to teach instruction to provide haptic feedback to a user via a user device, and further comprising:
determining, by a computing device, an updated user interest level after an outputting a control instruction to cause the user device to execute a control instruction; and
in response to an updated user context being below a threshold, the computing device outputting a different control instruction to cause the user device to execute the control instruction.
However, it is well known in the art to provide options to update a presentation based on user data, as evidenced by Osotio.
Osotio discloses an instruction to provide haptic feedback to a user via a user device (paragraph [0098] describes that a mobile device, i.e., a client computing device of a user (see paragraph [0051]), provides tactile feedback);
determining, by a computing device, an updated user interest level after an outputting a control instruction to cause the user device to execute the control instruction (paragraphs [0058]-[0061] describe that user contexts, emotions, etc. are collected after an initial visual presentation of an AI interface is provided to a user; the collected data is compared to an adjustment threshold to determine that an adjustment to the generated personification of the AI interface would be appropriate for the user; paragraph [0067] describes that instructions are provided to make the change), and
in response to the updated user context being below a threshold, the computing device outputting a different control instruction to cause the user device to execute the different control instruction (paragraphs [0062]-[0063] describe that, if the adjustment threshold has been reached, instructions are sent so that the visual personification of the AI interface is adjusted; paragraph [0081] describes that user contexts are compared with an adjustment threshold and, if the user contexts do not meet the threshold, the generated visual personification of the AI is provided to a user).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Osotio for updating a visual presentation based on data and inputs collected from a user. The teachings of Osotio, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of ordinary skill in the art to adjust and evolve a presentation as additional data is collected from a user. One of ordinary skill in the art would be motivated to utilize .

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta (US 9,641,681 B2) and Breazeal (US 2018/0133900) further in view of Osotio et al. (US 2018/0197066) and further in view of Peters (US 2021/0076002).

As for claim 15, the combined system of Moncomble, Nuta, Breazeal and Osotio teaches all the limitations set forth above except wherein a custom control instruction includes at least one selected from the group consisting of:
an instruction to pause the speaker’s voice during conversation; and
an instruction to provide haptic feedback to the user device.
However, it is well known in the art to provide actions to participants’ endpoints, as evidenced by Peters.
Peters discloses wherein a custom control instruction includes at least one selected from the group consisting of:
an instruction to pause the speaker’s voice during conversation (paragraph [0079] describes actions performed to alter the transmission and presentation of the video conference at various endpoints, e.g., adjusting audio properties for different endpoints, such as muting the audio of participants).
.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta (US 9,641,681 B2) and Breazeal (US 2018/0133900) further in view of Osotio (US 2018/0197066) and further in view of Mossoba et al. (US 10,388,286 B1).

As for claim 18, the combined system of Moncomble, Nuta, Breazeal and Osotio teaches wherein the contextual data is received from one or more sensors (Moncomble: paragraphs [0072]-[0073] describe that audio and video sensors are used to capture the behaviors and biometric data of the speaker and the audience).
The combined system of Moncomble, Nuta, Breazeal and Osotio fails to teach wherein a contextual data comprises at least one selected from a group consisting of: tempo of the spoken words.

Mossoba discloses wherein a contextual data comprises at least one selected from a group consisting of: tempo of the spoken words (col. 7, lines 13-15 describe that a user’s voice samples include tempo).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Mossoba for selecting a human’s voice sample.  The teachings of Mossoba, when implemented in the Moncomble, Nuta, Breazeal and Osotio system, will allow one of ordinary skill in the art to provide speech samples to a speech analysis. One of ordinary skill in the art would be motivated to utilize the teachings of Mossoba in the Moncomble, Nuta, Breazeal and Osotio system in order to create a speech pattern model for a user based on the user’s extracted speech from a multi-party conversation.

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Peters et al. (US 2021/0076002), hereinafter Peters, further in view of Nuta (US 9,641,681 B2).

As for claim 19, Moncomble teaches a system comprising:
a processor (paragraph [0148] describes a processor), a computer readable memory and a computer readable storage medium associated with a computing device (paragraph [0148] describes a computer program stored in a memory);

program instructions to monitor contextual data associated with the identified participant (paragraph [0149] describes code instructions; paragraphs [0076] and [0080] describe that a person’s behaviors are captured);
program instructions to determine an interest level of the participant based on the monitoring the contextual data (paragraph [0149] describes code instructions; paragraphs [0083]-[0084] describe that an individual measurement of the attention level of each person in the audience is performed);
program instructions to determine a custom control instruction to provide to a user device associated with the participant based on the participant’s interest level and a characteristic element (paragraph [0149] describes code instructions; paragraphs [0126]-[0127] describe that the prediction of the attention level and the decreased voice level, which leads to the decrease of the attention level, are associated with the determination of recommendations for actions to be performed on a presentation in order to change the attention level in a desired direction); and
wherein the program instructions are stored on the computer readable storage medium for execution by the processor via the computer readable memory (paragraph 
Moncomble fails to teach program instructions to predict a subsequent spoken word of a user based on a contextual data;
program instructions to determine a custom control instruction to provide to a user device associated with a participant based on the participant’s context and the subsequent spoken word, wherein a custom control instruction is customized for the participant and causes a user device to modulate a tone and adjust a tempo of a voice of a speaker in a conference; 
program instructions to output the custom control instruction to cause the user device to execute the custom control instruction;
program instructions to monitor second contextual data associated with a second user during the conference;
program instructions to determine a second user interest level based on the monitoring the second contextual data;
program instructions to determine a second control instruction to provide to a second user device associated with the second user based on the second user’s interest level, wherein the second control instruction causes the second user device to modulate a tone and adjust a tempo of the voice of the speaker in the conference, wherein the control instruction and the second control instruction are different instructions; and
program instructions to output the second control instruction to cause the second user device to execute the second control instruction.

Peters discloses 
program instructions to determine a custom control instruction to provide to a user device associated with a participant based on a participant’s interest level and a subsequent spoken word (paragraph [0395] describes program instructions executed by a processing apparatus, which applies to all limitations reciting program instructions hereinafter; paragraphs [0071]-[0073] and [0079] describe that a moderator module receives a media stream which includes audio, i.e., speech of a conference participant, words spoken by the participant, and video data from a particular endpoint; the moderator module processes the video stream to assess the conditions of collaboration in the video conference, calculates collaboration factor scores representing how well the participant has been participating, and performs a number of management actions at various endpoints), wherein a custom control instruction is customized for the participant and causes a user device to modify a voice of a speaker in the conference (paragraph [0079] describes that the management actions include adjusting audio properties for the different endpoints, e.g., changing a volume level or muting the audio of one or more participants); and
program instructions to output the custom control instruction to cause the user device to execute the custom control instruction (paragraph [0079] describes the moderator module alters the transmission of data of the video conference at the endpoints);

program instructions to determine a second user interest level based on the monitoring the second contextual data (paragraph [0066] describes that measurement values are calculated based on the characteristics and are used to represent the quality and extent to which the participants have participated);
program instructions to determine a second control instruction to provide to a second user device associated with the second user based on the second user’s interest level (paragraph [0066] describes providing active feedback on the level and quality of the conference participants based on the monitored characteristics; furthermore, if certain thresholds are achieved or maintained, certain actions may be triggered), wherein the second control instruction causes the second user device to modify a voice of the speaker in the conference (paragraph [0079] describes actions performed to alter the transmission and presentation of the video conference at various endpoints, e.g., adjusting audio properties for different endpoints, such as changing a volume level or muting the audio of participants), wherein the control instruction and the second control instruction are different instructions (paragraphs [0066]-[0069] and [0080] describe that the moderator system monitors the speaking time of each participant and tracks the emotional status and response of each participant in order to help measure and determine the level and quality of each participant and provide active feedback based on the quality and extent 
program instructions to output the second control instruction to cause the second user device to execute the second control instruction (paragraph [0067] describes a score or other indicator of the level and quality of participant is output, in real time, to the conference participants which is integrated with a media stream or a representation of an endpoint or the corresponding participant).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Peters for providing instructions to a participant’s device. The teachings of Peters, when implemented in the Moncomble system, will allow one of ordinary skill in the art to improve the quality of a conference that will mutually benefit both the speaker and the participants. One of ordinary skill in the art would be motivated to utilize the teachings of Peters in the Moncomble system in order to help participants overcome challenges that they usually face in their remote environment when participating in a video conference.
The combined system of Moncomble and Peters fails to teach program instructions to predict a subsequent spoken word of a user based on a contextual data;
wherein modifying a voice includes modulate a tone and adjust a tempo of the voice.
However, it is well known in the art to adjust characteristics of a voice of a user, as evidenced by Nuta.
Nuta discloses 

wherein modifying a voice includes modulate a tone and adjust a tempo of the voice (col. 24, lines 10-14 describe that the suggestions include a change in cadence, use of rapport building phrases, and other measures).
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Nuta for implementing a predictive model to obtain conversation metric data and conversation assessment data included in a plurality of conversations. The teachings of Nuta, when implemented in the Moncomble and Peters system, will allow one of ordinary skill in the art to evaluate the quality of a participant’s contributions to a conversation. One of ordinary skill in the art would be motivated to utilize the teachings of Nuta in the Moncomble and Peters system in order to improve the quality of conversations and guide a speaker to speak in ways that are more likely to induce participants’ cooperation and interest.

As for claim 20, the combined system of Moncomble, Peters and Nuta teaches wherein the custom control instruction includes at least one selected from the group consisting of:
.

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Moncomble (US 2019/0212811) in view of Nuta et al. (US 9,641,681 B2), hereinafter Nuta, further in view of Breazeal et al. (US 2018/0133900), hereinafter Breazeal, and Peters (US 2021/0076002).

As for claim 22, the combined system of Moncomble, Nuta and Breazeal teaches
wherein modifying a voice includes modulate a tone and adjust a tempo of the voice (Nuta: col. 24, lines 10-14 describe that the suggestions include a change in cadence, use of rapport building phrases, and other measures).
The combined system of Moncomble, Nuta and Breazeal fails to teach
monitoring, by a computing device, second contextual data associated with a second user during the conference;
determining, by the computing device, a second user interest level based on the monitoring the second contextual data;
determining, by the computing device, a second control instruction to provide to a second user device associated with the second user based on the second user’s interest level, wherein the second control instruction causes the second user device to modify a voice of a speaker in a conference, wherein a control instruction and the second control instruction are different instructions;

determining, by the computing device, a group control instruction to provide to a plurality of group user devices associated with a plurality of group users based on the second user’s interest level, wherein the group control instruction causes the group user devices to modulate a tone and adjust a tempo of the voice of the speaker in the conference;
outputting, by the computing device, the user control instruction to cause the group user devices to execute the group control instruction; and
generating, by the computing device, a report that provides feedback to the speaker in the conference, the report including the second user’s interest level at a plurality of times, and the second control instructions that were outputted.
However, it is well known in the art to collect data associated with a participant’s characteristics in a video conference, as evidenced by Peters.
Peters discloses
monitoring, by the computing device, second contextual data associated with a second user during the conference (paragraph [0064] describes a moderator system that monitors characteristics of participants of a video conference);
determining, by the computing device, a second user interest level based on the monitoring the second contextual data (paragraph [0066] describes the moderator module calculates a measurement value based on the characteristics and the measurement values are used to represent the quality and extent the participants have participated);

outputting, by the computing device, the second control instruction to cause the second user device to execute the second control instruction (paragraph [0067] describes the moderator module outputs a score or other indicator of the level and quality of participant, in real time, to the conference participants which is integrated with a media stream or a representation of an endpoint or the corresponding participant);

outputting, by the computing device, the user control instruction to cause the group user devices to execute the group control instruction (paragraph [0303] describes that the system performs management actions, i.e., muting or unmuting audio, and that the actions can be performed for a group of participants or for all participants); and
generating, by the computing device, a report that provides feedback to the speaker in the conference (paragraph [0113] describes that the system provides feedback to a meeting host about the level of interest among participants and the characteristics used that make the instructor’s performance effective), the report including the second user’s interest level at a plurality of times (paragraph [0113] describes that the feedback includes the level of interest among participants), and the second control instructions that 
One of ordinary skill in the art before the effective filing date of the claimed invention would have recognized the ability to utilize the teachings of Peters for providing feedback to participants of a video conference. The teachings of Peters, when implemented in the Moncomble, Nuta and Breazeal system, will allow one of ordinary skill in the art to improve the quality of a conference that will mutually benefit both the speaker and the participants. One of ordinary skill in the art would be motivated to utilize the teachings of Peters in the Moncomble, Nuta and Breazeal system in order to help participants overcome challenges that they usually face in their remote environment when participating in a video conference.

Conclusion



The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hammersley et al. (US 2019/0189019) teach a method for integrating special effects with a text source.
Lopes et al. (US 9,632,647 B1) teach selecting a presentation position in dynamic content.
Kim et al. (US 2020/0027456) teach a method for providing artificial intelligence services based on pre-gathered conversations.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to L. T N. whose telephone number is (571)272-1013.  The examiner can normally be reached on M & Th 5:30 am - 2:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TONIA DOLLINGER can be reached on 571-272-4170.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 


/L. T. N/
Examiner, Art Unit 2459



/Backhean Tiv/Primary Examiner, Art Unit 2459