DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Arguments
Applicant's arguments filed 8/22/2022 regarding claims 1, 8, and 16 have been fully considered. The highlighted arguments are listed below and will be addressed accordingly.


Argument #1 (REMARKS, pages 7-8): Applicant asserts that claims 1, 8, and 16 overcome the rejection under 35 U.S.C. 103 because:
“In addition, Sharma describes that "[t]he output ... of the [Visual-]LSTM ... can ... be computed ... as yt'' (a liveliness prediction label for a visual modality), and "[t]he output ... of the [Audio-]LSTM ... can ... be computed ... as y/' (a liveliness prediction label for an audio modality), where each of the respective liveliness prediction labels output from the Visual LSTM and the Audio-LSTM indicates whether the video clip is lively. Sharma also appears to describe that depending on "the setup corresponding to [the video] clip," one of the respective liveliness prediction labels from the Visual-LSTM or the "[A]udio-LSTM ... [may be] used as the final [liveliness prediction] label for the 10-second [video] clip."
In formulating the rejection, the Office Action seems to suggest that Sharma's Visual LSTM and Audio-LSTM, collectively, could correspond to a "gate expert," as recited in claim 1. However, Sharma describes that "[t]he output ... of the [Visual-]LSTM ... can ... be computed ... as yt" (a liveliness prediction label for a visual modality), and that "[t]he output ... of the [Audio-]LSTM ... can ... be computed ... as yt'' (a liveliness prediction label for an audio modality), where each of the respective liveliness prediction labels indicates whether a 10 second video clip is lively. Sharma further describes that one of the respective liveliness prediction labels may be "used as [a] final [liveliness prediction] label for [a] 10-second clip." In contrast, Applicant's claim 1 recites, "a gate expert configured to ... output a set of weights ... [used] to generate a weighted prediction." In other words, each weight in the set of weights contributes to the same weighted prediction. Respectfully, a liveliness prediction label used as a final liveliness predication label is not the same as a set of weights used to generate a weighted prediction. Moreover, one label that is repurposed as another label is not the same as a set of weights used to generate something other than a set of weights, that is, a weighted prediction.  Indeed, nowhere does Sharma disclose or suggest that a liveliness prediction label is used to generate a "weighted prediction." Therefore, even assuming arguendo that one (or both) of Sharma's respective liveliness prediction labels output from the Visual-LSTM or the Audio-LSTM could somehow correspond to a "set of weights" (a proposition not accepted by Applicant), Sharma would still fail to disclose or suggest "a gate expert configured to ... output a set of weights ... [used] to generate a weighted prediction," as recited in claim 1.
For at least the foregoing reasons, Applicant submits that Sharma fails to disclose or suggest the above-recited features of Applicant's claim 1. None of the other cited references cure the critical deficiencies of Sharma discussed above. Therefore, Applicant's claim 1 is patentable over the cited references.”

Examiner respectfully disagrees and indicates that the cited prior art reasonably addresses the limitations of the claimed invention. Applicant is reminded that Examiner will interpret each claim under the broadest reasonable interpretation; as such, the claims, and only the claims, form the metes and bounds of the invention.
The applicant has failed to produce evidence from the prior art that precludes the combination of Sharma and Wu from carrying out the weighted prediction of claims 1, 8, and 16.  It is well known in image analysis that Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks perform weighted predictions by implementing weights in layers and adjusting those weights to extract data and output image data predictions (see figure 1, section 3.2.3, section 3.5), using output prediction functions such as a softmax function (see section 3.5).  The examiner therefore notes that CNNs and LSTMs perform weighted predictions, and the applicant must further define the claim language “weighted prediction” to distinguish the claims from these well-known neural network algorithms.

	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (Non-Patent Literature titled “LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos”) in view of Wu et al. (Non-Patent Literature titled “Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification”).

Regarding claim 1, Sharma teaches a system comprising: a plurality of expert modules, each expert module configured to receive data corresponding to one of a plurality of input modalities and extract associated features (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame.  Each neural network extracts features);
a plurality of class prediction modules, each class prediction module configured to receive extracted features from a corresponding one of the expert modules and predict an associated class (see figure 1, section 3.3, where Sharma discusses Convolutional Neural Network and Long Short Term Memory (LSTM) networks that contain expert layers producing prediction values);
a gate expert configured to receive the extracted features from the plurality of expert modules and output a set of weights for the input modalities (see figure 1, section 3.6, where Sharma discusses Convolutional Neural Network and Long Short Term Memory networks that contain gates and produce prediction weights).
Wu teaches a fusion module configured to generate a weighted prediction based on the class predictions and the set of weights (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion of Convolutional Neural Network and Long Short Term Memory (LSTM) networks).
Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 1.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.
The determination of obviousness is predicated upon the following:  One skilled in the art would have been motivated to modify Sharma in this manner in order to improve class prediction using Neural Networks with weighted neuron parameters that adjust the network and ultimately produce fused weighted predictions for the different modalities (optical flow, image frame, and audio spectrogram).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained using known engineering design, interface and/or programming techniques, without changing a fundamental operating principle of Sharma, while the teaching of Wu continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of combining prediction values from different modalities to optimize the classification performance of the network.  The Sharma and Wu systems both perform Neural Network classification; therefore, one of ordinary skill in the art would have had a reasonable expectation of success in the combination.  It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.
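For illustration only, the arrangement recited in claim 1 (per-modality class predictions combined using a set of per-modality weights) can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from Sharma, Wu, or the application: the function names, the softmax normalization of the gate output, and all numeric values are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(class_probs, gate_scores):
    """Combine per-modality class predictions using per-modality weights.

    class_probs: one class-probability vector per modality (hypothetical values).
    gate_scores: one raw score per modality, standing in for a gate output.
    """
    weights = softmax(gate_scores)  # assumption: gate output is softmax-normalized
    num_classes = len(class_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, class_probs))
        for c in range(num_classes)
    ]
    return weights, fused

# Hypothetical predictions for three modalities (image, video, audio), two classes.
class_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
gate_scores = [2.0, 1.0, 0.0]  # hypothetical gate scores, one per modality
weights, fused = fuse(class_probs, gate_scores)
print("weights:", weights)
print("weighted prediction:", fused)
```

In this sketch the weights are applied across modalities to the class predictions, which is distinct from the internal layer weights of any individual network.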

Regarding claim 2, Sharma teaches wherein the input modalities include images, video and audio (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The motivation set forth for claim 1 applies equally to claim 2.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 2.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 3, Sharma teaches wherein the expert modules comprise an image expert, a video expert and an audio expert (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The motivation set forth for claim 1 applies equally to claim 3.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 3.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 4, Sharma teaches wherein each expert module comprises a neural network (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality).
The motivation set forth for claim 1 applies equally to claim 4.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 4.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 5, Sharma and Wu teach wherein at least one of the class prediction modules comprises a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The motivation set forth for claim 1 applies equally to claim 5.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 5.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 6, Sharma and Wu teach wherein the gate expert comprises a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The motivation set forth for claim 1 applies equally to claim 6.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 6.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 7, Wu teaches further comprising a co-learning framework (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion).
The motivation set forth for claim 1 applies equally to claim 7.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 7.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 8, Sharma teaches a method comprising: receiving a plurality of data streams, each data stream having a corresponding input modality (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame.  Each neural network extracts features);
for each data stream, extracting features associated with the corresponding input modality, and determining an event classification using the corresponding extracted features (see figure 1, section 3.6, where Sharma discusses a neural network for each input modality that extracts features and classifies visual features and audio features);
generating a set of weights from the extracted features (see figure 1, section 3.6, where Sharma discusses convolutional neural network and LSTM networks that contain weights for data feature extraction);
performing at least one action in accordance with the weighted classification prediction (see figure 1, section 3.6, where Sharma discusses performing qualitative analysis).
Wu teaches fusing the event classifications and the set of weights to produce a weighted classification prediction for the plurality of data streams (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion for audio, optical flow, and image frame data streams).
Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 8.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.
The determination of obviousness is predicated upon the following:  One skilled in the art would have been motivated to modify Sharma in this manner in order to improve class prediction using Neural Networks with weighted neuron parameters that adjust the network and ultimately produce fused weighted predictions for the different modalities (optical flow, image frame, and audio spectrogram).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained using known engineering design, interface and/or programming techniques, without changing a fundamental operating principle of Sharma, while the teaching of Wu continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of combining prediction values from different modalities to optimize the classification performance of the network.  The Sharma and Wu systems both perform Neural Network classification; therefore, one of ordinary skill in the art would have had a reasonable expectation of success in the combination.  It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.



Regarding claim 9, Sharma teaches wherein receiving the plurality of data streams comprises sensing, using a plurality of sensor types, one or more conditions in an environment, and wherein each of the plurality of sensor types has a corresponding input modality (see figure 1, figure 2, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, optical flow, and individual image frame).
The motivation set forth for claim 1 applies equally to claim 9.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 9.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 10, Sharma teaches wherein each of the plurality of data streams contributes to the weighted classification prediction with a different degree of confidence depending on sensed conditions (see figure 2, section 3.5, where Sharma discusses the prediction output results of the LSTM networks for each sensed visual condition). 
The motivation set forth for claim 1 applies equally to claim 10.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 10.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 11, Wu teaches wherein the set of weights is generated dynamically to combine information from the plurality of sensor types in accordance with sensed conditions (see figure 1, section 4.2.2, where Wu discusses weighted fusion of data from each modality network from different sensor data).
The motivation set forth for claim 1 applies equally to claim 11.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 11.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 12, Sharma and Wu teach wherein for each data stream, extracting features associated with the corresponding input modality and determining an event classification using the corresponding extracted features, further comprises: providing a plurality of trained neural networks, each trained neural network configured to classify one or more events for an input modality (see figure 1, section 3.6, where Sharma discusses a plurality of CNN and long short-term memory networks that contain training layers that output classification predictions; see figure 1, where Wu discusses a plurality of CNN and long short-term memory networks).
The motivation set forth for claim 1 applies equally to claim 12.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 12.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 13, Wu teaches wherein generating a set of weights from the extracted features further comprises providing a gating modular neural network trained to dynamically generate a set of weights for the outputs of a plurality of sensor networks, by balancing a utility of each data stream (see figure 1, section 3.2, section 3.3, where Wu discusses a set of weights for each LSTM network output).
The motivation set forth for claim 1 applies equally to claim 13.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 13.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 14, Sharma and Wu teach wherein fusing the class predictions and the set of weights to produce a weighted classification prediction for the plurality of data streams, further comprises: detecting voice activity in the plurality of data streams; and providing corresponding input frames to one or more applications for further processing (see figure 1, section 1, where Sharma discusses a neural network for voice audio; see figure 1, section 3.1, where Wu discusses detecting voice activity in audio spectrogram input).
The motivation set forth for claim 1 applies equally to claim 14.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 14.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 15, Sharma teaches wherein the input modalities include images, video and audio (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The motivation set forth for claim 1 applies equally to claim 15.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 15.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Claim 16 is rejected as applied to claim 8 as pertaining to a corresponding system.
Claim 17 is rejected as applied to claim 15 as pertaining to a corresponding system.
Claim 18 is rejected as applied to claim 12 as pertaining to a corresponding system.

Regarding claim 19, Sharma and Wu teach wherein predicting a class using the corresponding extracted features comprises inputting the extracted features to a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The motivation set forth for claim 1 applies equally to claim 19.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 19.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Regarding claim 20, Sharma and Wu teach wherein generating the set of weights from the extracted features comprises inputting the extracted features to a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The motivation set forth for claim 1 applies equally to claim 20.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 20.  The result would have been expected, routine, and predictable in order to perform class prediction using Neural Networks.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNY A CESE whose telephone number is (571) 270-1896.  The examiner can normally be reached on Monday – Friday, 9am – 4pm.
If attempts to reach the primary examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached on (571) 270-1051.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Kenny A Cese/
Primary Examiner, Art Unit 2663