DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
The information disclosure statements (IDS) filed on 8/23/2019 and 10/11/2019 were considered and placed on the file of record by the examiner.

	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (Non-Patent Literature titled “LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos”) in view of Wu et al. (Non-Patent Literature titled “Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification”).

Regarding claim 1, Sharma teaches a system comprising: a plurality of expert modules, each expert module configured to receive data corresponding to one of a plurality of input modalities and extract associated features (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame.  Each neural network extracts features);
a plurality of class prediction modules, each class prediction module configured to receive extracted features from a corresponding one of the expert modules and predict an associated class (see figure 1, section 3.3, where Sharma discusses Convolutional Neural Network and Long Short Term Memory (LSTM) networks that contain expert layers producing prediction values);
a gate expert configured to receive the extracted features from the plurality of expert modules and output a set of weights for the input modalities (see figure 1, section 3.6, where Sharma discusses Long Short Term Memory (LSTM) networks that contain gates and produce prediction weights).
Wu teaches a fusion module configured to generate a weighted prediction based on the class predictions and the set of weights (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion of Convolutional Neural Network and Long Short Term Memory (LSTM) networks).
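For illustration only, the fusion step recited in claim 1 (a weighted prediction generated from per-modality class predictions and a set of modality weights) may be sketched as follows. The sketch is not taken from Sharma, Wu, or the application. The function names, the use of raw scalar gate scores, and the softmax normalization of those scores are assumptions made solely for the example.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(class_probs, gate_scores):
    """Combine per-modality class probabilities into one weighted prediction.

    class_probs: dict mapping modality name -> list of class probabilities
    gate_scores: dict mapping modality name -> raw gate score (hypothetical)
    Returns (weights per modality, fused class probabilities).
    """
    modalities = sorted(class_probs)
    weights = softmax([gate_scores[m] for m in modalities])
    n_classes = len(class_probs[modalities[0]])
    fused = [0.0] * n_classes
    for weight, modality in zip(weights, modalities):
        for k, prob in enumerate(class_probs[modality]):
            fused[k] += weight * prob
    return dict(zip(modalities, weights)), fused


# Hypothetical outputs of three class prediction modules for two classes.
example_probs = {
    "image": [0.6, 0.4],
    "video": [0.7, 0.3],
    "audio": [0.2, 0.8],
}
# Hypothetical gate scores; a larger score gives that modality more weight.
example_scores = {"image": 0.5, "video": 1.0, "audio": -0.5}
example_weights, example_fused = fuse_predictions(example_probs, example_scores)
```

In this sketch each modality contributes to the fused prediction in proportion to its weight, so a modality with a low gate score has correspondingly less influence on the final class.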
Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 1.  The result, class prediction using neural networks, would have been expected, routine, and predictable.
The determination of obviousness is predicated upon the following:  One skilled in the art would have been motivated to modify Sharma in this manner in order to improve class prediction using neural networks with weighted neuron parameters that adjust the network and ultimately produce fused weight predictions for the different modalities (optical flow, image frame, and audio spectrogram).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained using known engineering design, interface and/or programming techniques, without changing a fundamental operating principle of Sharma, while the teaching of Wu continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of combining prediction values from different modalities to optimize the classification performance of the network.  Because the Sharma and Wu systems both perform neural network classification, one of ordinary skill in the art would have had a reasonable expectation of success in the combination.  It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.

Regarding claim 2, Sharma teaches wherein the input modalities include images, video and audio (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The same motivation as for claim 1 applies to claim 2.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 2.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 3, Sharma teaches wherein the expert modules comprise an image expert, a video expert and an audio expert (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The same motivation as for claim 1 applies to claim 3.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 3.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 4, Sharma teaches wherein each expert module comprises a neural network (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality).
The same motivation as for claim 1 applies to claim 4.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 4.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 5, Sharma and Wu teach wherein at least one of the class prediction modules comprises a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The same motivation as for claim 1 applies to claim 5.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 5.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 6, Sharma and Wu teach wherein the gate expert comprises a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The same motivation as for claim 1 applies to claim 6.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 6.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 7, Wu teaches further comprising a co-learning framework (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion).
The same motivation as for claim 1 applies to claim 7.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 7.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 8, Sharma teaches a method comprising: receiving a plurality of data streams, each data stream having a corresponding input modality (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame.  Each neural network extracts features);
for each data stream, extracting features associated with the corresponding input modality, and determining an event classification using the corresponding extracted features (see figure 1, section 3.6, where Sharma discusses a neural network for each input modality that extracts features and classifies visual features and audio features);
generating a set of weights from the extracted features (see figure 1, section 3.6, where Sharma discusses convolutional neural network and LSTM networks that contain weights for data feature extraction);
performing at least one action in accordance with the weighted classification prediction (see figure 1, section 3.6, where Sharma discusses performing qualitative analysis).
Wu teaches fusing the event classifications and the set of weights to produce a weighted classification prediction for the plurality of data streams (see figure 1, section 3.3, section 4.2.2, where Wu discusses multi-class prediction fusion for audio, optical flow, and image frame data streams).
Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 8.  The result, class prediction using neural networks, would have been expected, routine, and predictable.
The determination of obviousness is predicated upon the following:  One skilled in the art would have been motivated to modify Sharma in this manner in order to improve class prediction using neural networks with weighted neuron parameters that adjust the network and ultimately produce fused weight predictions for the different modalities (optical flow, image frame, and audio spectrogram).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained using known engineering design, interface and/or programming techniques, without changing a fundamental operating principle of Sharma, while the teaching of Wu continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of combining prediction values from different modalities to optimize the classification performance of the network.  Because the Sharma and Wu systems both perform neural network classification, one of ordinary skill in the art would have had a reasonable expectation of success in the combination.  It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.



Regarding claim 9, Sharma teaches wherein receiving the plurality of data streams comprises sensing, using a plurality of sensor types, one or more conditions in an environment, and wherein each of the plurality of sensor types has a corresponding input modality (see figure 1, figure 2, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, optical flow, and individual image frame).
The same motivation as for claim 1 applies to claim 9.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 9.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 10, Sharma teaches wherein each of the plurality of data streams contributes to the weighted classification prediction with a different degree of confidence depending on sensed conditions (see figure 2, section 3.5, where Sharma discusses the prediction output results of the LSTM networks for each sensed visual condition). 
The same motivation as for claim 1 applies to claim 10.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 10.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 11, Wu teaches wherein the set of weights is generated dynamically to combine information from the plurality of sensor types in accordance with sensed conditions (see figure 1, section 4.2.2, where Wu discusses weighted fusion of data from each modality network from different sensor data).
The same motivation as for claim 1 applies to claim 11.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 11.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 12, Sharma and Wu teach wherein for each data stream, extracting features associated with the corresponding input modality and determining an event classification using the corresponding extracted features, further comprises: providing a plurality of trained neural networks, each trained neural network configured to classify one or more events for an input modality (see figure 1, section 3.6, where Sharma discusses a plurality of CNN and long short-term memory networks that contain training layers that output classification predictions; see figure 1, where Wu discusses a plurality of CNN and long short-term memory networks).
The same motivation as for claim 1 applies to claim 12.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 12.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 13, Wu teaches wherein generating a set of weights from the extracted features further comprises providing a gating modular neural network trained to dynamically generate a set of weights for the outputs of a plurality of sensor networks, by balancing a utility of each data stream (see figure 1, section 3.2, section 3.3, where Wu discusses a set of weights for each LSTM network output).
The same motivation as for claim 1 applies to claim 13.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 13.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 14, Sharma and Wu teach wherein fusing the class predictions and the set of weights to produce a weighted classification prediction for the plurality of data streams, further comprises: detecting voice activity in the plurality of data streams; and providing corresponding input frames to one or more applications for further processing (see figure 1, section 1, where Sharma discusses a neural network for voice audio; see figure 1, section 3.1, where Wu discusses detecting voice activity in audio spectrogram input).
The same motivation as for claim 1 applies to claim 14.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 14.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 15, Sharma teaches wherein the input modalities include images, video and audio (see figure 1, section 3.3, where Sharma discusses a neural network for each input modality comprising audio, video, and individual image frame).
The same motivation as for claim 1 applies to claim 15.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 15.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Claim 16 is rejected as applied to claim 8 as pertaining to a corresponding system.
Claim 17 is rejected as applied to claim 15 as pertaining to a corresponding system.
Claim 18 is rejected as applied to claim 12 as pertaining to a corresponding system.

Regarding claim 19, Sharma and Wu teach wherein predicting a class using the corresponding extracted features comprises inputting the extracted features to a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The same motivation as for claim 1 applies to claim 19.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 19.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Regarding claim 20, Sharma and Wu teach wherein generating the set of weights from the extracted features comprises inputting the extracted features to a long short-term memory network (see figure 1, section 3.6, where Sharma discusses long short-term memory networks to form predictions; see figure 1, where Wu discusses long short-term memory networks).
The same motivation as for claim 1 applies to claim 20.  Motivation to combine may be gleaned from the prior art considered.  It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify the invention of Sharma with Wu to arrive at the invention of claim 20.  The result, class prediction using neural networks, would have been expected, routine, and predictable.

Conclusion

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNY A CESE whose telephone number is (571) 270-1896.  The examiner can normally be reached on Monday – Friday, 9am – 4pm.
If attempts to reach the primary examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571) 270-1051.  The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Kenny A Cese/
Primary Examiner, Art Unit 2663