DETAILED ACTION

Acknowledgement


This action is in response to the request for continued examination (RCE) filed on 11/28/2022.


Status of Claims


Claims 1, 11, and 16 have been amended. 
Claims 1-20 are now pending.


Continued Examination Under 37 CFR 1.114


A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/28/2022 has been entered.


Response to Arguments

The claim objection is withdrawn in light of the amendments.
The 35 U.S.C. 112(a) rejection is withdrawn in light of amendments.
The 35 U.S.C. 101 rejection of claims 1-20 is reinstated in light of the amendments. Claims 1, 11, and 16 no longer recite an improvement to machine learning technology (i.e., a practical application). The claims no longer recite that the machine learning algorithm is updated based on feedback from the initial output of the evaluation score for purposes of correction and/or continuous improvement. Updating the scoring algorithm weights based on user input of importance does not improve the machine learning technology itself.
Applicant's arguments filed on 11/28/2022 regarding the 35 U.S.C. 103 rejection of claims 1-20 have been fully considered. The Applicant argues that (1) the collective teachings of Moudy, Xu, and Hanks fail to teach or suggest the limitation of “determining one or more context-based attributes associated with the instruction event by generating and processing audience responses to one or more dynamic context-based audience queries…” (claim 1d) and (2) neither Moudy, nor Xu, nor Hanks discloses the amended limitation of “determining, using one or more regression techniques…, a recommended instructor for the given event” (claim 1h).
As per argument (1), the Examiner respectfully disagrees. Moudy in view of Xu and Hanks teaches the claim 1d limitations. Moudy teaches determining one or more context-based attributes associated with the instruction event by generating and processing audience responses to one or more dynamic context-based audience queries in connection with the instruction event (Moudy e.g. Fig. 10 multimodal feedback analyzer 1000 includes a feedback input data parser 1010 that may receive and parse different modes of user content feedback data (e.g., text, voice/audio, image/video, etc.) [0124]. Text feedback data, such as online discussion posts and responses, emails, instant messaging and online chat content, text-based reviews or evaluations, and the like, may be routed to the text feedback analyzer 1020 [0125]. Within the text feedback analyzer 1020, the text feedback data may be parsed and analyzed to determine sentiment using various techniques and processes. For example, a trained sentiment NLP neural network 660 may be used to determine a raw sentiment score for each text feedback data [0133].). Moudy in view of Xu does not explicitly teach, but Hanks teaches, wherein the one or more dynamic context-based audience queries are generated via applying one or more artificial intelligence techniques to at least a portion of one or more of the audio data, the video data, and the image data (Hanks e.g. Figs. 1, 2, and 6, Hanks teaches systems, methods, and apparatus configured to intelligently analyze digital content and generate study-aid questions based upon the analyzed content. Systems and methods comprise receiving class notes, class slides, class audio recordings, class video recording, and any other known digital media. The received content can then be automatically analyzed using natural language processing to identify potential questions and answers that can be presented to a user. As such, novel uses of artificial intelligence can assist a user in creating study aids based upon user provided content (Fig. 1, 6, and [0007]). Fig. 2 illustrates example questions and answers generated by the computer system of Fig. 1 [0014]. Generated question-and-answer pairs can be loaded into an artificial intelligence component of the present invention for the purposes of training the artificial component. The artificial intelligence component may comprise IBM WATSON [0031].)
As per argument (2), the Examiner finds the Applicant’s arguments persuasive. Therefore, the previous 103 rejection has been withdrawn. However, upon further consideration, a new ground of 103 rejection for claims 1-20 is made. See details below.


Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention, “Providing Feedback by Evaluating Multi-Modal Data Using Machine Learning Techniques”, is directed to an abstract idea, specifically Mental Processes and Certain Methods of Organizing Human Activity, without significantly more. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements individually or in combination provide mere instructions to implement the abstract idea on a computer.
Step 1:  Claims 1-20 are directed to a statutory category, namely a process (claims 1-10), a manufacture (claims 11-15) and a machine (claims 16-20).
Step 2A (1): Independent claims 1, 11, and 16 are directed to an abstract idea of Mental Processes, based on the following claim limitations: “determining one or more audio attributes associated with an instruction event; determining one or more video attributes associated with the instruction event; determining one or more image attributes associated with the instruction event by processing image data captured in connection with the instruction event; creating at least one time frame of images, captured during the instruction event, of at least a portion of an audience of the instruction event and at least one instructor of the instruction event; and calculating one or more values attributed to the at least one instructor and one or more values attributed to at least a portion of the audience, at one or more instances of time during the instruction event, to provide at least a portion of the one or more image attributes; determining one or more context-based attributes associated with the instruction event by generating and processing audience responses to one or more dynamic context-based audience queries in connection with the instruction event, wherein the one or more dynamic context-based audience queries are generated; generating, using at least one evaluation score algorithm, an evaluation score attributed to at least one instructor of the instruction event based at least in part on the one or more audio attributes, the one or more video attributes, the one or more image attributes, and the one or more context-based attributes; outputting the evaluation score to at least one of one or more users; updating, based at least in part on user input related to the evaluation score, at least a portion of the at least one evaluation score algorithm, and determining, using one or more regression techniques and based at least in part on the evaluation score and one or more parameters associated with a given event, a recommended instructor for the given event.”
These claims describe a process of analyzing, evaluating, and scoring data associated with an instruction event and recommending an instructor based on the results. Dependent claims 2-10, 12-15, and 17-20 further describe the characteristics of the data being analyzed and the evaluation and scoring process. The claimed invention could encompass a human person mentally observing, analyzing, and evaluating/scoring data associated with an instruction event using pen and paper and providing a recommendation based on the results. Recommending an instructor for a given event is considered managing personal behavior. Therefore, these limitations, under the broadest reasonable interpretation, fall within the abstract groupings of “Mental Processes”, which includes concepts performed in the human mind such as observations, evaluations, judgments, and opinions, and “Certain Methods of Organizing Human Activity”, which encompasses managing personal behavior or relationships or interactions between people including social activities, teaching, and following rules or instructions. As per the October 2019 Patent Subject Matter Eligibility Guidance, Mental Processes include claims directed to collecting information, analyzing it, and displaying certain results of the collection and analysis even if they are claimed as being performed on a computer. Certain Methods of Organizing Human Activity can encompass the activity of a single person (e.g. a person following a set of instructions), activity that involves multiple people (e.g. a commercial interaction), and certain activity between a person and a computer (e.g. a method of anonymous loan shopping). Therefore, claims 1-20 are directed to an abstract idea and are not patent eligible.
Step 2A (2): This judicial exception is not integrated into a practical application. In particular, claims 1, 3, 5, 7, 11-14, and 16-19 recite additional elements of a computer-implemented method, applying one or more machine learning techniques and artificial intelligence techniques to audio, video, and image data, processing image data using one or more convolutional neural network-based image classification algorithms, machine-learning-based score algorithm, centralized platforms, one processing device comprising a processor coupled to a memory, a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device, an apparatus comprising at least one processing device comprising a processor coupled to a memory, the at least one processing device. These additional elements do not integrate the abstract idea into a practical application because the claims do not recite (a) an improvement to another technology or technical field, (b) an improvement to the functioning of the computer itself, (c) implementing the abstract idea with or by use of a particular machine, (d) effecting a particular transformation or reduction of an article, or (e) applying the judicial exception in some other meaningful way beyond generally linking the use of an abstract idea to a particular technological environment. These additional elements are viewed as computer algorithms and devices that are used to automate the abstract elements of the data analysis, evaluation/scoring, and recommendation process. Limitations that recite mere instructions to implement an abstract idea on a computer or merely use a computer as a tool to perform an abstract idea are not indicative of integration into a practical application (see MPEP 2106.05(f)).
Therefore, claims 1-20 do not integrate the judicial exception into a practical application and thus are not patent eligible. 
Step 2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Claims 1, 3, 5, 7, 11-14, and 16-19 recite additional elements as stated above. As per the Applicant’s PG PUB disclosure, machine learning techniques include logistic regression [0037], Dual Ask-Answer Network (DAANET) [0038], convolutional neural network (CNN) [0041]; artificial intelligence techniques include a two-stage synthesis network [0043]; processing platform comprising one or more computers, servers, storage devices or other processing devices [0093]; processor comprises a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other type of processing circuitry [0027]; memory comprises random access memory (RAM), read-only memory (ROM) or other types of memory, in any combination and may be viewed as examples of what are more generally referred to as “processor-readable storage media” storing executable computer program code or other types of software programs [0027]; and processing devices are referred to as computers [0018]. These additional elements are viewed as mere instructions to apply or implement the abstract idea on a computer. Applying an abstract idea on a computer does not integrate a judicial exception into a practical application or provide an inventive concept (see MPEP 2106.05(f)). Therefore, claims 1-20 do not include additional elements that are sufficient to amount to significantly more than the judicial exception and thus are not patent eligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Moudy et al. (US 2016/0300135 A1) in view of Xu et al. (US 2020/0065612 A1), in further view of Hanks et al. (US 2016/0133148 A1), and in further view of Shen et al. (US 2019/0138614 A1).
As per claim 1 (Currently Amended), Moudy teaches a computer-implemented method comprising (Moudy e.g. Fig. 11 process of analyzing and calculating raw sentiment scores for content feedback data including both uni-modal and multimodal content feedback data. Steps in this process may be performed by a multimodal feedback analyzer 1000 (Fig. 10) operating within or in collaboration with the various component servers and devices of a sentiment analyzer system 600 or 700 (Figs. 6-7) [0129].): 
Moudy teaches determining one or more audio attributes associated with an instruction event by applying one or more machine learning techniques to audio data captured in connection with the instruction event; (Moudy e.g. Fig. 10 multimodal feedback analyzer 1000 includes a feedback input data parser 1010 that may receive and parse different modes of user content feedback data (e.g., text, voice/audio, image/video, etc.) [0124]. Voice data and other audio feedback data, such as live or recorded audio discussion posts, and the like may be routed to the audio feedback analyzer 1030 [0126]. Various natural language processing (NLP) engines, such as sentiment neural networks (i.e. machine learning), may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165].)
Moudy teaches determining one or more video attributes associated with the instruction event by applying one or more machine learning techniques to video data captured in connection with the instruction event; (Moudy e.g. Fig. 10 multimodal feedback analyzer 1000 includes a feedback input data parser 1010 that may receive and parse different modes of user content feedback data (e.g., text, voice/audio, image/video, etc.) [0124]. Image and/or video feedback data, such as video chat data, video conference, video blog content, video recordings of questions and responses, image or video of user expressions or gestures in responses to various content, and the like, may be routed to the image/video feedback analyzer 1040 [0127]. Various natural language processing (NLP) engines, such as sentiment neural networks (i.e. machine learning), may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165].)
Moudy teaches determining one or more image attributes associated with the instruction event by processing image data captured in connection with the instruction event (Moudy e.g. Fig. 10 multimodal feedback analyzer 1000 includes a feedback input data parser 1010 that may receive and parse different modes of user content feedback data (e.g., text, voice/audio, image/video, etc.) [0124]. Voice data and other audio feedback data, such as live or recorded audio discussion posts, and the like may be routed to the audio feedback analyzer 1030 [0126]. Image and/or video feedback data, such as video chat data, video conference, video blog content, video recordings of questions and responses, image or video of user expressions or gestures in responses to various content, and the like, may be routed to the image/video feedback analyzer 1040 [0127]. Various natural language processing (NLP) engines, such as sentiment neural networks, may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165].) Moudy does not explicitly teach, but Xu teaches, using one or more convolutional neural network-based image classification algorithms (Xu e.g. Xu teaches a system and techniques for providing interactive feedback to a trainee based on a computational analysis of a video/audio stream or recording of said trainee [0018]. The input video feed, consisting of individual video frames and time-ordered sequences of video frames, acts as inputs to a set of neural networks (e.g., convolutional neural network 202, convolutional neural network 204, ... convolutional neural network 206). “Convolutional neural network” refers to a class of deep neural networks applied to analyzing images and video (Fig. 2 and [0041]).), comprising:
Moudy in view of Xu teaches creating at least one time frame of images, captured during the instruction event using one or more imaging devices, of at least a portion of an audience of the instruction event and at least one instructor of the instruction event; and
Moudy teaches images captured of an audience and instructor during the instruction event using one or more imaging devices (Moudy e.g. Moudy teaches systems and methods for capturing feedback data from various client devices in a content distribution network [0003]. Cameras and microphones may be used to capture video, image, and audio feedback from one or more users (Fig. 9B and [0122]). A set of user devices may be displaying live content, or time-synchronized pre-recorded content (e.g. a live presentation, lecture, or television program, a prerecorded lecture or program, live interactive gaming content, etc.) or may be physically located in the same location (e.g. a conference room, lecture hall, class room, etc.) in which users are viewing or interacting with the same content (e.g. a live lecture, a prerecorded presentation or program, etc.). The user devices 1310 may capture feedback associated with the presentation and/or presentation device [0158]. Presenters of live content (e.g., lecturers, instructors, performers, live television program producers, etc.) may use the real-time (or near real-time) feedback data from the presentation device to alter and customize the live presentation content [0155].)
Moudy does not explicitly teach, but Xu teaches, creating at least one time frame of images captured during an event (Xu e.g. Xu teaches a system and techniques for providing interactive feedback to a trainee based on a computational analysis of a video/audio stream or recording of said trainee [0018]. The video signal is captured either in real time or from a recording by a video analyzer 102. The video analyzer 102 processes video frames, individually and in time-ordered sequence, of the video signal to convert the video signal into a plurality of human morphology feature predictions, e.g., eye contact, expression, movement, gestures, and so on [0029]. Referring to the video analyzer 102 in Fig. 2, the input video feed, consisting of individual video frames and time-ordered sequences of video frames, acts as inputs to a set of neural networks (e.g., convolutional neural network 202, convolutional neural network 204, ... convolutional neural network 206) [0041].)
Moudy in view of Xu teaches calculating, by processing at least a portion of the at least one time frame of images using the one or more convolutional neural network-based image classification algorithms, one or more values attributed to the at least one instructor and one or more values attributed to at least a portion of the audience, at one or more instances of time during the instruction event, to provide at least a portion of the one or more image attributes;
Moudy teaches calculating one or more values attributed to the at least one instructor and one or more values attributed to at least a portion of the audience, at one or more instances of time during the instruction event, to provide at least a portion of the one or more image attributes (Moudy e.g. The raw sentiment score calculator 1060 may receive the synchronized outputs of the multiple individual mode feedback analyzers 1020-1040 (i.e. text feedback analyzer, audio feedback analyzer, and image/video feedback analyzer) and calculate one or more raw sentiment scores for specific users and/or specific feedback content data (Fig. 10 and [0128]). A sentiment score calculated in step 1109 may correspond to individual user's sentiment (e.g. user transmitting a video message), or the sentiment of a group of users (e.g. users listening to a talk or lecture), at a particular time and/or associated with a particular action (e.g. attending a class lecture) (Fig. 11 and  [0145]).)
Moudy does not explicitly teach, but Xu teaches, by processing at least a portion of the at least one time frame of images using the one or more convolutional neural network-based image classification algorithms (Xu e.g. Xu teaches a system that provides analysis and personalized feedback based upon video and audio information gathered from a trainee [0018]. The system comprises modules that process the video and audio information to create recommendations or scores, which can be evaluated in aggregate form or by individual communication attributes (e.g., enthusiasm, confidence, engagement, etc.) [0019]. The input video feed, consisting of individual video frames and time-ordered sequences of video frames, acts as inputs to a set of neural networks (e.g., convolutional neural network 202, convolutional neural network 204, ... convolutional neural network 206). Each frame is applied in parallel to the convolutional neural networks to extract key point features at time or frame interval t (Fig. 2 and [0041]). Fig. 6 convolutional neural network 600 includes the initial convolution layer 602 which stores the raw image pixels of the video frames and the final pooling layer 620 which determines the performance scores/predictions ([0063]-[0064]).)
The Examiner submits that before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify Moudy’s sentiment analyzer system to include other machine learning techniques/algorithms such as convolutional neural network image classification algorithm to determine image attributes as taught by Xu in order to improve image feature classification (Xu e.g. [0065]).
Moudy teaches determining one or more context-based attributes associated with the instruction event by generating and processing audience responses to one or more dynamic context-based audience queries in connection with the instruction event (Moudy e.g. Fig. 10 multimodal feedback analyzer 1000 includes a feedback input data parser 1010 that may receive and parse different modes of user content feedback data (e.g., text, voice/audio, image/video, etc.) [0124]. Text feedback data, such as online discussion posts and responses, emails, instant messaging and online chat content, text-based reviews or evaluations, and the like, may be routed to the text feedback analyzer 1020 [0125]. Within the text feedback analyzer 1020, the text feedback data may be parsed and analyzed to determine sentiment using various techniques and processes. For example, a trained sentiment NLP neural network 660 may be used to determine a raw sentiment score for each text feedback data [0133].). Moudy in view of Xu does not explicitly teach, but Hanks teaches, wherein the one or more dynamic context-based audience queries are generated via applying one or more artificial intelligence techniques to at least a portion of one or more of the audio data, the video data, and the image data (Hanks e.g. Figs. 1, 2, and 6, Hanks teaches systems, methods, and apparatus configured to intelligently analyze digital content and generate study-aid questions based upon the analyzed content. Systems and methods comprise receiving class notes, class slides, class audio recordings, class video recording, and any other known digital media. The received content can then be automatically analyzed using natural language processing to identify potential questions and answers that can be presented to a user. As such, novel uses of artificial intelligence can assist a user in creating study aids based upon user provided content (Fig. 1, 6, and [0007]). Fig. 2 illustrates example questions and answers generated by the computer system of Fig. 1 [0014]. Generated question-and-answer pairs can be loaded into an artificial intelligence component of the present invention for the purposes of training the artificial component. The artificial intelligence component may comprise IBM WATSON [0031].);
The Examiner submits that before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify Moudy in view of Xu’s sentiment analyzer system to include generating queries based on digital content (i.e. feedback data) as taught by Hanks in order to improve the quality of questions (Hanks e.g. [0049]) and evaluations (i.e. feedback).
Moudy teaches generating, using at least one machine learning-based evaluation score algorithm, an evaluation score attributed to the at least one instructor of the instruction event based at least in part on the one or more audio attributes, the one or more video attributes, the one or more image attributes, and the one or more context-based attributes (Moudy e.g. Fig. 10 Multimodal feedback analyzer 1000 also includes a feedback input data synchronizer 1050 and a raw sentiment score calculator 1060. The raw sentiment score calculator 1060 may receive the synchronized outputs of the multiple individual mode feedback analyzers 1020-1040 and calculate one or more raw sentiment scores for specific users and/or specific feedback content data [0128]. A sentiment score calculated in step 1109 may correspond to individual user's sentiment (e.g. user transmitting a video message), or the sentiment of a group of users (e.g. users listening to a talk or lecture), at a particular time and/or associated with a particular action (e.g. attending a class lecture) (Fig. 11 and  [0145]). Various natural language processing (NLP) engines, such as sentiment neural networks, may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165].); 
Moudy teaches outputting the evaluation score to at least one of one or more users and one or more centralized platforms; (Moudy e.g. A sentiment analyzer system may determine sentiment scores in real-time or near real-time and transmit sentiment analyzer outputs to one or more presentation computing devices associated with a presenter of live content [0006]. Fig. 15 is an example user interface with a provided group sentiment data chart [0163].)
Moudy does not explicitly teach, however, Xu teaches automatically updating, based at least in part on user input related to the evaluation score, at least a portion of the at least one machine learning-based evaluation score algorithm; and (Xu e.g. Xu teaches a system that provides analysis and personalized feedback based upon video and audio information gathered from a trainee (i.e. input) [0018]. The system comprises modules that process the video and audio information to create recommendations or scores [0019]. Feedback is provided for the purpose of refining learning models to improve results over time [0004]. Feedback from back propagation logic is applied to the video analyzer and audio analyzer as a closed-loop control system to further refine the quality of recommendations. “Backpropagation” refers to an algorithm used in neural networks to calculate a gradient for updating the weights in the neural network. Backpropagation algorithms are commonly used to train neural networks [0027].)
The Examiner submits that before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify Moudy’s sentiment analyzer system to include updating machine learning techniques based on feedback as taught by Xu in order to improve model results over time (Xu e.g. [0004]).
Neither Moudy, Xu, nor Hanks explicitly teaches, but Shen teaches, determining, using one or more regression techniques and based at least in part on the evaluation score and one or more parameters associated with a given event, a recommended instructor for the given event; (Shen e.g. Shen teaches a method for recommending a teacher to a target student in a network teaching system [0002]. Candidate teachers can be recommended based on the characteristic information of the target student, thereby facilitating the target student to decide whether or not to reserve a course provided by the recommended candidate teachers [0008]. Fig. 1 network teaching system 10 may be used to provide teaching services between students and teachers. The teaching server 16 is capable of proactively recommending teachers whom the student may be interested in [0020]. A logistic regression method commonly used in the field of machine learning may be used to predict for the target student the probability of reserving a course from the candidate teachers. The features used for constructing a logistic regression model may include student features, teacher features, and/or student-teacher correlation features. The teacher features include at least one item selected from a group consisting of age, region, teaching seniority, graduation school, the number of reviews, rating, browsing crowd, days of induction, and the number of followers. The student-teacher correlation features include at least one item selected from a group consisting of student browsing, student course reservation, student following, student evaluation, and teacher evaluation [0041]. The candidate teachers in the candidate teacher list are ranked based on the calculated probability and the top 20, more or less, teachers who have the highest probability of course-reservation are recommended to the target student ([0045]-[0046]).)
The Examiner submits that before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify Moudy in view of Xu and Hanks’ sentiment analyzer system to include using regression analysis to recommend instructors for events based on the evaluation score and one or more parameters as taught by Shen in order to improve the success rate of course-reservation and improve the relevance of candidate teachers (Shen e.g. [0008] and [0035]).
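For illustration only, the kind of logistic-regression ranking described in Shen ([0041], [0045]-[0046]) can be sketched as follows. The feature names (evaluation score, topic match), the training data, the candidate names, and the function names are hypothetical assumptions that appear in neither Shen nor the claims; the sketch fits a logistic model by plain gradient descent and ranks candidates by predicted probability, and is not the implementation of any cited reference.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def rank_instructors(candidates, weights, bias):
    """Return (name, probability) pairs sorted from most to least likely."""
    scored = [
        (name, sigmoid(sum(w * xi for w, xi in zip(weights, feats)) + bias))
        for name, feats in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training rows: [evaluation_score, topic_match], both in [0, 1].
# Label 1 marks a past assignment assumed to have been successful.
TRAIN_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.3, 0.4], [0.2, 0.7], [0.4, 0.1]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = fit_logistic(TRAIN_X, TRAIN_Y)

# Hypothetical candidates for a given event.
CANDIDATES = {
    "instructor_a": [0.85, 0.9],
    "instructor_b": [0.35, 0.5],
    "instructor_c": [0.6, 0.3],
}
RANKING = rank_instructors(CANDIDATES, WEIGHTS, BIAS)
```

In this sketch the first entry of `RANKING` plays the role of the recommended instructor; a top-N cut of the list would correspond to the ranked recommendation Shen describes.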
Moudy teaches wherein the method is performed by at least one processing device comprising a processor coupled to a memory (Moudy e.g. Steps in the Fig. 11 process may be performed by a multimodal feedback analyzer 1000 (Fig. 10) operating within or in collaboration with the various component servers and devices of a sentiment analyzer system 600 or 700 (Figs. 6-7) [0129]. The multimodal feedback analyzer 1000 (Fig. 10) may be implemented as one or more separate computing systems associated with sentiment analyzer systems 600 (Fig. 6) and/or content distribution networks 100 (Fig. 1) [0123]. Fig. 5 is an example computing system with processing units 504 and storage subsystem 510 [0072].).
As per claim 11 (Currently Amended), Moudy teaches a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device (Moudy e.g. The multimodal feedback analyzer 1000 (Fig. 10) may be implemented as one or more separate computing systems associated with sentiment analyzer systems 600 (Fig. 6) and/or content distribution networks 100 (Fig. 1) [0123]. Fig. 5 is an example computing system with processing units 504 and storage subsystem 510 [0072]. The system memory 518 and/or computer readable storage media 516 within the storage subsystem 510 may store program instructions that are loadable and executable on processing units 504 as well as data generated during the execution of these programs [0079].): 
Moudy teaches to determine one or more audio attributes associated with an instruction event by applying one or more machine learning techniques to audio data captured in connection with the instruction event; (See claim 1a for response.)
Moudy teaches to determine one or more video attributes associated with the instruction event by applying one or more machine learning techniques to video data captured in connection with the instruction event; (See claim 1b for response.)
Moudy in view of Xu teach to determine one or more image attributes associated with the instruction event by processing image data captured in connection with the instruction event using one or more convolutional neural network-based image classification algorithms, comprising: (See claim 1c for response.)
Moudy in view of Xu teach creating at least one time frame of images, captured during the instruction event using one or more imaging devices, of at least a portion of an audience of the instruction event and at least one instructor of the instruction event; and (See claim 1c(i) for response.)
Moudy in view of Xu teach calculating, by processing at least a portion of the at least one time frame of images using the one or more convolutional neural network-based image classification algorithms, one or more values attributed to the at least one instructor and one or more values attributed to at least a portion of the audience, at one or more instances of time during the instruction event, to provide at least a portion of the one or more image attributes; (See claim 1c(ii) for response.)
Moudy in view of Xu and Hanks teach to determine one or more context-based attributes associated with the instruction event by generating and processing one or more dynamic context-based audience queries in connection with the instruction event, wherein the one or more dynamic context-based audience queries are generated via applying one or more artificial intelligence techniques to at least a portion of one or more of the audio data, the video data, and the image data; (See claim 1d for response.)
Moudy teaches to generate, using at least one machine learning-based evaluation score algorithm, an evaluation score attributed to the at least one instructor of the instruction event based at least in part on the one or more audio attributes, the one or more video attributes, the one or more image attributes, and the one or more context-based attributes; (See claim 1e for response.)
Moudy teaches to output the evaluation score to at least one of one or more users and one or more centralized platforms; and (See claim 1f for response.)
Moudy in view of Xu teach to automatically update, based at least in part on user input related to the evaluation score, at least a portion of the at least one machine learning-based evaluation score algorithm, and (See claim 1g for response.)
Moudy in view of Xu, Hanks, and Shen teach to determine, using one or more regression techniques and based at least in part on the evaluation score and one or more parameters associated with a given event, a recommended instructor for the given event. (See claim 1h for response.)
As per claim 16 (Currently Amended), Moudy teaches an apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured (Moudy e.g. Fig. 10 shows a multimodal feedback analyzer for receiving and analyzing multimodal user feedback data and calculating raw sentiment scores for the feedback. The multimodal feedback analyzer 1000 may be implemented as one or more separate computing systems associated with sentiment analyzer systems 600 and/or content distribution networks 100 [0123]. Fig. 5 is an example computing system with processing units 504 and storage subsystem 510 [0072]. Processing units 504 execute program code that may reside in storage subsystem 510 [0075].):
Moudy teaches to determine one or more audio attributes associated with an instruction event by applying one or more machine learning techniques to audio data captured in connection with the instruction event; (See claim 1a for response.)
Moudy teaches to determine one or more video attributes associated with the instruction event by applying one or more machine learning techniques to video data captured in connection with the instruction event; (See claim 1b for response.)
Moudy in view of Xu teach to determine one or more image attributes associated with the instruction event by processing image data captured in connection with the instruction event using one or more convolutional neural network-based image classification algorithms, comprising: (See claim 1c for response.)
Moudy in view of Xu teach creating at least one time frame of images, captured during the instruction event using one or more imaging devices, of at least a portion of an audience of the instruction event and at least one instructor of the instruction event; and (See claim 1c(i) for response.)
Moudy in view of Xu teach calculating, by processing at least a portion of the at least one time frame of images using the one or more convolutional neural network-based image classification algorithms, one or more values attributed to the at least one instructor and one or more values attributed to at least a portion of the audience, at one or more instances of time during the instruction event, to provide at least a portion of the one or more image attributes; (See claim 1c(ii) for response.)
Moudy in view of Xu and Hanks teach to determine one or more context-based attributes associated with the instruction event by generating and processing one or more dynamic context-based audience queries in connection with the instruction event, wherein the one or more dynamic context-based audience queries are generated via applying one or more artificial intelligence techniques to at least a portion of one or more of the audio data, the video data, and the image data; (See claim 1d for response.)
Moudy teaches to generate, using at least one machine learning-based evaluation score algorithm, an evaluation score attributed to the at least one instructor of the instruction event based at least in part on the one or more audio attributes, the one or more video attributes, the one or more image attributes, and the one or more context-based attributes; (See claim 1e for response.)
Moudy teaches to output the evaluation score to at least one of one or more users and one or more centralized platforms; and (See claim 1f for response.)
Moudy in view of Xu teach to automatically update, based at least in part on user input related to the evaluation score, at least a portion of the at least one machine learning-based evaluation score algorithm; and (See claim 1g for response.)
Moudy in view of Xu, Hanks, and Shen teach to determine, using one or more regression techniques and based at least in part on the evaluation score and one or more parameters associated with a given event, a recommended instructor for the given event. (See claim 1h for response.)
As per claim 2 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, Moudy teaches wherein the one or more audio attributes comprise at least one of tone of the at least one instructor, pitch of the at least one instructor, sentiment of the at least one instructor, and correctness of the at least one instructor in answering one or more audience queries. (Moudy e.g. Figs. 10-11, When multiple voices are contained within the same voice/audio feedback data (e.g., a two person conversation, or multi-person chat or discussion, etc.), the audio feedback analyzer 1030 may isolate individual user voices for separate sentiment analyses. For each individual voice pattern isolated within the voice/audio feedback data, the audio feedback analyzer 1030 may determine one or more sentiment and/or sentiment levels (or magnitudes) based on voice characteristics such as speech volume, speed, tone, pitch, inflection, and emphasis [0135].)
As per claims 3, 12, and 17 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, non-transitory processor-readable storage medium of claim 11, and apparatus of claim 16, Moudy teaches wherein determining the one or more audio attributes comprises training at least one model associated with the one or more machine learning techniques using labeled historical audio data. (Moudy e.g. Various natural language processing (NLP) engines, such as sentiment neural networks, may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165]. Such sentiment neural networks may be trained using content feedback data as input and corresponding sentiment-related results data as output, in order to construct a neural network data structure capable of determining associations between user feedback data and user sentiment with a high degree of accuracy ([0165]-[0167]). Feedback data include text, voice/audio, image/video, etc. [0124]. Fig. 19 shows a process of generating and training an eLearning sentiment neural network for use by an eLearning CDN 100 [0179].)
As per claim 4 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, Moudy teaches wherein the one or more video attributes comprise at least one of sentiment of the at least one instructor and sentiment of the audience. (Moudy e.g. Figs. 10-11, Image or video feedback data may be transmitted to an image/video feedback analyzer in step 1107 (Fig. 11 and [0139]). Within the image/video feedback analyzer 1040, the image/video feedback data may be parsed and analyzed to determine sentiment using various techniques and processes. For each individual identified within the image/video feedback data, the image/video feedback analyzer 1040 may determine one or more sentiment and/or sentiment levels (or magnitudes) based on characteristics such as facial expressions, gestures, posture, eye position, eye movement, and the like [0140]. A sentiment score calculated in step 1109 may correspond to an individual user's sentiment (e.g. user transmitting a video message), or the sentiment of a group of users (e.g. users listening to a talk or lecture), at a particular time and/or associated with a particular action (e.g. attending a class lecture) (Fig. 11 and [0145]).)
As per claims 5, 13, and 18 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, non-transitory processor-readable storage medium of claim 11, and the apparatus of claim 16, Moudy teaches wherein determining the one or more video attributes comprises training at least one model associated with the one or more machine learning techniques using labeled historical video data. (Moudy e.g. Various natural language processing (NLP) engines, such as sentiment neural networks, may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165]. Such sentiment neural networks may be trained using content feedback data as input and corresponding sentiment-related results data as output, in order to construct a neural network data structure capable of determining associations between user feedback data and user sentiment with a high degree of accuracy ([0165]-[0167]). Feedback data include text, voice/audio, image/video, etc. [0124]. Fig. 19 shows a process of generating and training an eLearning sentiment neural network for use by an eLearning CDN 100 [0179].)
As per claim 6 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, Moudy teaches wherein the one or more image attributes comprise at least one of sentiment of the at least one instructor and sentiment of the audience. (Moudy e.g. Figs. 10-11, Image or video feedback data may be transmitted to an image/video feedback analyzer in step 1107 (Fig. 11 and [0139]). Within the image/video feedback analyzer 1040, the image/video feedback data may be parsed and analyzed to determine sentiment using various techniques and processes. For each individual identified within the image/video feedback data, the image/video feedback analyzer 1040 may determine one or more sentiment and/or sentiment levels (or magnitudes) based on characteristics such as facial expressions, gestures, posture, eye position, eye movement, and the like [0140]. A sentiment score calculated in step 1109 may correspond to an individual user's sentiment (e.g. user transmitting a video message), or the sentiment of a group of users (e.g. users listening to a talk or lecture), at a particular time and/or associated with a particular action (e.g. attending a class lecture) (Fig. 11 and [0145]).)
As per claims 7, 14, and 19 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, non-transitory processor-readable storage medium of claim 11, and apparatus of claim 16, Moudy teaches wherein determining the one or more image attributes comprises training at least one model associated with the one or more machine learning techniques using labeled historical image data. (Moudy e.g. Various natural language processing (NLP) engines, such as sentiment neural networks, may be used to calculate sentiment and analyze individual and group sentiment based on received feedback data [0165]. Such sentiment neural networks may be trained using content feedback data as input and corresponding sentiment-related results data as output, in order to construct a neural network data structure capable of determining associations between user feedback data and user sentiment with a high degree of accuracy ([0165]-[0167]). Feedback data include text, voice/audio, image/video, etc. [0124]. Fig. 19 shows a process of generating and training an eLearning sentiment neural network for use by an eLearning CDN 100 [0179].)
As per claim 8 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, Moudy teaches wherein the one or more context-based attributes comprises attentiveness of the audience. (Moudy e.g. Captured feedback data and user actions (i.e. text feedback, voice feedback, gestures, facial expressions, etc.) can correspond to the user’s sentiment output (e.g. high comprehension and engagement, etc.) [0182].)
As per claim 9 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, Moudy teaches wherein the one or more context-based attributes comprises comprehension of instruction event content by the audience. (Moudy e.g. Fig. 13 illustrates a real-time sentiment feedback system 1300 for receiving and analyzing user feedback data, and providing feedback to presenters of live content in real-time or near real-time [0150]. Live content can include lectures or presentations in a training course or eLearning system [0151]. Real-time aggregate group sentiment data may be provided to presentation device 1330 as in Fig. 15. The Fig. 15 user interface screen shows an alert and sentiment data chart 1520 that indicates a drop in group sentiment. The presenter may conclude that a subset of the group (e.g. a set of trainees or students) has failed to comprehend or appreciate the material ([0162]-[0164]).)
As per claims 10, 15, and 20 (Original), Moudy in view of Xu, Hanks, and Shen teach the computer-implemented method of claim 1, non-transitory processor-readable storage medium of claim 11, and apparatus of claim 16, Moudy teaches wherein generating the evaluation score comprises applying a predefined weight to each of the one or more audio attributes, the one or more video attributes, the one or more image attributes, and the one or more context-based attributes. (Moudy e.g. Fig. 10, When calculating a raw sentiment score for an item of content feedback data (e.g., a discussion post, a text, audio, or video message, a reaction to or review of media content or a live talk or lecture, etc.), the raw sentiment score calculator 1060 may use the outputs of one or more multimodal feedback analyzers 1020-1040. If each feedback analyzer 1020-1040 outputs a modal numeric sentiment score, the calculator 1060 may combine (e.g., by weighting and/or averaging) the separate sentiment scores (Fig. 10 and [0146]-[0147]).)
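For illustration only, the combination of modal sentiment scores "by weighting and/or averaging" cited from Moudy above can be sketched as a weighted average with predefined per-modality weights. This is a minimal sketch under stated assumptions: the function name, modality keys, scores, and weight values are hypothetical and are not taken from Moudy or from the claims.

```python
def combine_modal_scores(modal_scores, weights):
    """Weighted average of per-modality sentiment scores.

    `modal_scores` maps a modality name (hypothetical keys such as "audio"
    or "video") to a numeric score; `weights` maps the same names to
    predefined weights. Only modalities present in `modal_scores` are used.
    """
    total_weight = sum(weights[m] for m in modal_scores)
    if total_weight == 0:
        raise ValueError("weights for the supplied modalities sum to zero")
    weighted_sum = sum(weights[m] * score for m, score in modal_scores.items())
    return weighted_sum / total_weight


# Hypothetical example values (assumed, for illustration only).
example_score = combine_modal_scores(
    {"audio": 0.5, "video": 1.0},
    {"audio": 1.0, "video": 3.0},
)
```

Dividing by the total weight keeps the combined score on the same scale as the individual modal scores regardless of how the weights are chosen.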

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ayanna Minor, whose telephone number is (571) 272-3605. The examiner can normally be reached M-F, 9 am-5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor, can be reached at 571-272-6787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.M./Examiner, Art Unit 3624                                                                                                                                                                                                        



/MEHMET YESILDAG/Primary Examiner, Art Unit 3624