DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-9, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation.") and Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”).
Regarding Claim 1,
Liu teaches a computer-implemented method of determining patterns within a time-series social-media data set, the computer-implemented method comprising: 
receiving, by a social-media-data pattern-identification system, the time-series social-media data set (pg. 512 section 4.3; “The attributes of tweets from a user’s weekly tweet postings in timeline form a time-series…They focus on learning stationary local attributes for series like images (pixel series), speeches and other time-series. We can learn user-scope content attributes from a series of single tweet in time-series to describe one’s stress state in a week.”); 
applying, by the social-media-data pattern-identification system, a deep-learning algorithm to the time-series social-media data set (pg. 511 Section 4.1; 1) First we design a convolutional neural network with cross autoencoders to generate user-scope content attributes from low-level content attributes, thus the tweet-scope content attributes can be combined with the userscope statistical attributes; 2) We propose a deep neural network model to incorporate the two types of user-scope attributes for user-level psychological stress detection.), wherein the deep-learning algorithm is designed and configured to analyze the time-series social-media data set for patterns (pg. 508; Micro-blog is one of the most popular social media that can be publicly accessed. People can post text with no more than 140 words, upload images or have social interactions with others. Employing real online micro-blog data, we first investigate the correlations between users’ stress and their tweeting content, behavior patterns and social engagement.) across multiple time scales… and to output pattern-identification data containing information on patterns in a plurality of the multiple time scales and across a plurality of the multiple time scales (pg. 513; Especially, to avoid the noise in data ground truth, we establish a small scale dataset DB2 from Sina Weibo. And pg. 514 In the following experiments, we first train and test our model on the large-scale Sina Weibo dataset DB1.); and 
providing the output pattern-identification data to an output-interface of the social-media-data pattern-identification system (Figure 1 and Figure 4; Examiner note: The outputs are displayed.); 
wherein the deep-learning algorithm comprising a …convolutional Bayesian model (CBM)… (pg. 511 Section 4.1; 1) First we design a convolutional neural network with cross autoencoders to generate user-scope content attributes from low-level content attributes, thus the tweet-scope content attributes can be combined with the userscope statistical attributes; 2) We propose a deep neural network model to incorporate the two types of user-scope attributes for user-level psychological stress detection. And pg. 514; Naive Bayes (NB) is a simple probabilistic classifier based on Bayes’ theorem that calculates the posterior probability by calculating prior probability of attributes.) having a hierarchical structure that includes multiple Convolutional Bayesian model (CBM) levels stacked with one another (Figure 5-8; The following figures depict a hierarchical structure of the neural network with stacked layers.) and configured to capture compositional structure of social dynamics across multiple time scales (pg. 511; The network is trained to reconstruct input pattern from activation of the hidden layer, which is actually stimulated by the input itself.).
Liu does not explicitly disclose
wherein the deep-learning algorithm is designed and configured to analyze the time-series social-media data set for patterns across multiple time scales of differing lengths...
wherein the deep-learning algorithm comprising a recursive convolutional Bayesian model… (RCBM)…
However, Papadimitriou teaches
wherein the deep-learning algorithm is designed and configured to analyze the time-series social-media data set for patterns across multiple time scales (Pg. 647, Abs. We introduce a method to discover optimal local patterns, which concisely describe the main trends in a time series. Our approach examines the time series at multiple time scales (i.e., window sizes) and efficiently discovers the key patterns in each.) of differing lengths... (pg. 649, section 3; Multi-scale: We do not want to restrict examination to a finite, predetermined maximum window size, or we will miss long range trends that occur at time scales longer than the window size).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of extracting patterns from time series data of Liu (pg. 507, Abs. Employing real online micro-blog data, we first investigate the correlations between users’ stress and their tweeting content, social engagement and behavior patterns. Then we define two types of stress-related attributes: 1) low-level content attributes from a single tweet, including text, images and social interactions; 2) user-scope statistical attributes through their weekly micro-blog postings, leveraging information of tweeting time, tweeting types and linguistic styles.) with the method of extracting patterns from time series data of Papadimitriou.
Doing so would allow for capturing optimal patterns using orders of magnitude less time and space (pg. 656; In summary, the streaming approach’s patterns capture all the essential information, while requiring 1–4 orders of magnitude less time and 1–2 orders of magnitude less space.).
Furukawa teaches
wherein the deep-learning algorithm comprising a recursive convolutional Bayesian model… (RCBM)… (pg. 316; This paper presents the parallelization of grid-based recursive Bayesian estimation (RBE) using a graphics processing unit (GPU) for real-time control of autonomous vehicles. And pg. 318; However, this equation also shows that the computation time for prediction is largely dominated by the size of the convolution kernel.)
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the predictive model of Liu with the method of implementing a predictive model of Furukawa.
Doing so would allow for a faster computational speed (pg. 317; The RBE, described in Section II, has multiple processes which lend themselves to computational speed-up by parallelization.).

Regarding Claim 5,
pg. 316; This paper presents the parallelization of grid-based recursive Bayesian estimation (RBE) using a graphics processing unit (GPU) for real-time control of autonomous vehicles. And pg. 318; However, this equation also shows that the computation time for prediction is largely dominated by the size of the convolution kernel.).

Regarding Claim 6,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1. Liu further teaches wherein the deep-learning algorithm comprises a set of filter matrices (Pg. 513 The CAE units are used as filters in the 1D CNN (Fig.7) and convolute over the sequence of tweets to form one feature map. Thus the latent user-scope attributes can be generated from the low-level attributes from the single tweets.) and a corresponding set of activation vectors applied to the time-series social-media data set (pg. 510; Thus, we get a 3-dimensional vector to represent the social attributes of a tweet.), wherein solution for the filter matrices results in the output-identification data (pg. 508 Our solution: Inspired by previous research [19], we have built a stressed-twitter-posting database using the “I feel stressed” sentence pattern as the ground-truth label for detecting stress from micro-blog data. With a small set of psychological stress scale score labeled dataset as test, it is proved that our ground truth labeling method is reliable).
Regarding Claim 7,

setting an initial temporal resolution to be a finest resolution for the particular application (pg. 513 Especially, to avoid the noise in data ground truth, we establish a small scale dataset DB2 from Sina Weibo. DB2 is collected from the users that have shared the score of a psychological stress scale2 with 50 items via Sina Weibo. If the resulted score is over 80, then the test subject is claimed to be stressed.); and 
using updating rules (pg. 512; CAE can be trained with standard gradient descent algorithms, but with a special designed data set.), learning relevant patterns (pg. 512; To train the autoencoder to reconstruct input pattern and learn distinctive attributes on the hidden layer, we minimize the following performance function by updating the parameter set with gradient descent) starting with the initial temporal resolution and iterating the learning using at least one increased temporal resolution (pg. 515 Though the training of model can be done offline, efficiency is still a considerable factor for evaluating an algorithm. For DNN model, we sum up both pre-training phase and finetuning phase.).
Regarding Claim 8,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 7. Liu further teaches wherein each of the at least one increased temporal resolution is determined by max-pooling (pg. 513 There are two commonly used pooling operations: max-pooling and mean-pooling. When max pooling is used, the pooled attribute unit is assigned with the maximally activation among all units in the attribute map. When mean-pooling is applied, the mean of activations of all units in the attribute map is assigned to the pooled attribute unit.) activation strength vectors (pg. 511; We use a 4-dimensional vector of the numbers of tweets in the above 4 types respectively to represent the tweeting type attribute.).
Regarding Claim 9,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 7. Liu further teaches the method further comprising: 
determining whether a most recent one of the at least one temporal resolution is still relevant to the particular application (Table 5; pg. 515; As shown in Table 5, we report predict performance of using content attributes (composed with only the named attributes in Table 5) alone as well as combining with statistical attributes. Using just text attribute gains rather high performance. Simply combining visual or social attributes even reduces the result, especially the social attributes. This trend is even more obvious when both types of attributes (content and statistical) are used. Nevertheless, using all attributes together outperforms using only text attributes. Highest detection performance is observed when using all attribute and working with both types of attributes.); 
terminating the iterating when the most recent one of the at least one temporal resolution is not relevant to the application (pg. 515; The results show that training DNN takes around 5 hours which is still reasonable while it get the best detection performance results.); and 
pg. 515; We use a matured model trained with large scale Sina Weibo dataset, and then test it against another set of subject independently sampled from Sina Weibo. For the test set, we collect weekly tweets from the users that have shared the score of a psychological stress scale with 50 items via Sina Weibo. Detection result shows that the test accuracy is 74.13% and f1-score is 0.7778, which approves that the overall model is consistent and the sentence pattern based ground truth labeling method is reliable.) and the at least one increased temporal resolution (pg. 516; We test on data collected from another major Chinese Micro-blog platform. For this test, we use the attribute extractor trained with large scale Sina Weibo dataset and only finetune the network with Twitter dataset in 5-fold. The accuracy is 76.78% and f1-score is 0.7915 which demonstrate the capability of the proposed model.).
Regarding Claim 21,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1. Liu further teaches wherein all of the multiple CBM levels share a common structure (pg. 511; w1 and w2 are the connection weights while b1 and b2 are bias to the postsynaptic units. Each layer have connection weights, which are the shared common structure.).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation."), Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”), and Strigl et al. ("Performance and scalability of GPU-based convolutional neural networks.").
Regarding Claim 3,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1.
Liu, Papadimitriou, and Furukawa do not explicitly disclose wherein the RCBM uses a convolutional operator that carries out a scale-and-copy task.
	However, Strigl et al. (“Performance and Scalability of GPU-based Convolutional Neural Networks”) teaches
wherein the RCBM uses a convolutional operator that carries out a scale-and-copy task (pg. 6; The input is copied to a matrix where the elements of each convolutional kernel form one row (in [19]) or one column (in our implementation [25]). And pg. 6; In this benchmark we scaled the input size of the training patterns fed to a LeNet5. Increasing the input size automatically increases the number of neurons in the convolutional and subsampling layers and the number of trainable parameters (weights, biases).).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the CNN of Liu with the CNN of Strigl et al.
Doing so would allow for a shorter training time (pg. 2; Therefore, we implemented a high performance library in CUDA to perform fast training and classification of CNNs on the GPU.).
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation."), Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”), and O’Shea et al. ("An introduction to convolutional neural networks.").
Regarding Claim 22,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 21.
	Liu, Papadimitriou, and Furukawa do not explicitly disclose
	wherein each of the multiple CBM levels has a set of a number, K, of activation vectors, with K remaining roughly the same across the multiple CBM levels.
	However, O’Shea (“An Introduction to Convolutional Neural Networks”) teaches 
 	wherein each of the multiple CBM levels has a set of a number, K, of activation vectors, with K remaining roughly the same across the multiple CBM levels (pg. 9; An example of this problem could be in filtering a large image (anything over 128 × 128 could be considered large), so if the input is 227 × 227 (as seen with ImageNet) and we’re filtering with 64 kernels each with a zero padding of then the result will be three activation vectors of size 227 × 227 × 64 - which calculates to roughly 10 million activations - or an enormous 70 megabytes of memory per image.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the CNN of Liu with the CNN of O’Shea.
pg. 392; This scheme extracts CNN activations for local patches at multiple scale levels, performs orderless VLAD pooling of these activations at each level separately, and concatenates the result. The resulting MOP-CNN representation can be used as a generic feature for either supervised or unsupervised recognition tasks, from image classification to instance-level retrieval;).
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation."), Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”), and Sharma et al. (US-20160206250-A1).
Regarding Claim 23,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1.
	Liu, Papadimitriou, and Furukawa do not explicitly disclose
 	wherein the RCBM has a joint probability, and the deep-learning algorithm further comprises decomposing the joint probability using Bayes' rule.
	However, Sharma (US 20160206250 A1) teaches
wherein the RCBM has a joint probability, and the deep-learning algorithm further comprises decomposing the joint probability using Bayes' rule (para [0204] When there are multiple variables involved, Bayes' rule may be expanded. The posterior probability computations will then involve computing joint probability distributions and defining multiple combinations of conditional probabilities.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the Bayesian model of Liu with the Bayesian model of Sharma.
	Doing so would allow for handling multiple variables (para [0204] When there are multiple variables involved, Bayes' rule may be expanded.). 
Claims 24, 25, 27, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation."), Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”), and Taylor et al. ("Convolutional learning of spatio-temporal features.").
Regarding Claim 24, 
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1.
	Liu, Papadimitriou, and Furukawa do not explicitly disclose
wherein the RCBM has been trained with external data using only a first one of the multiple CBM levels.
However, Taylor (“Convolutional Learning of Spatio-temporal Features”) teaches
wherein the RCBM has been trained with external data using only a first one of the multiple CBM levels (pg. 148; We approach the problem with a multi-stage architecture (see Figure 3) that combines convolutional and fully-connected layers. At the lowest layer, a convolutional GRBM extracts features from every successive pair of frames. And Pg. 148; The convGRBM is trained unsupervised using CD, while the upper layers are trained by backpropagation. We do not backpropagate through the first layer following unsupervised training, though this could be done to make the low-level features more discriminative.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the CNN of Liu with the CNN of Taylor.
Doing so would allow for extracting useful features from training data (pg. 141; We then use it to extract useful features for human activity recognition in a multi-stage architecture that achieves state-of-the-art performance on the KTH actions dataset.).
Regarding Claim 25,
Liu, Papadimitriou, Furukawa and Taylor teach the computer-implemented method according to claim 24. Taylor further teaches wherein each of the multiple CBM levels beyond the first CBM level has been trained using a set of dynamics obtained by down-sampling a set of activation vectors from an immediately prior one of the multiple CBM levels (pg. 148; The output of the second convolutional layer is a series of 3D feature maps. To cope with variable-length sequences, we perform an additional max pooling in the temporal dimension. This ensures that the mid-level features can be reduced to a vector of consistent size. And pg. 149; The nonlinearities we used were identical to those in [7] with the exception of extending contrast normalization and downsampling to 3D: LCN was performed using a 9×9×9 smoothing filter, followed by 4×4×4 average downsampling.).
Regarding Claim 27,

	Liu, Papadimitriou, and Furukawa do not explicitly disclose
wherein the multiple CBM levels have been trained for differing levels of abstraction.
However, Taylor teaches
wherein the multiple CBM levels have been trained for differing levels of abstraction (pg. 140; In recent years, feature-learning methods have focused on learning multiple layers of feature hierarchies to extract increasingly abstract representations at each stage.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the CNN of Liu with the CNN of Taylor.
Doing so would allow for extracting useful features from training data (pg. 141; We then use it to extract useful features for human activity recognition in a multi-stage architecture that achieves state-of-the-art performance on the KTH actions dataset.).
Regarding Claim 28,
Liu, Papadimitriou, Furukawa and Taylor teach the computer-implemented method according to claim 27. Taylor further teaches wherein the multiple CBM levels include a first CBM level, and training for differing levels of abstraction involves training each of the multiple CBM levels beyond the first CBM level using a set of dynamics obtained by down-sampling a set of activation vectors from an immediately prior one of the multiple CBM levels (Pg. 148; The convGRBM is trained unsupervised using CD, while the upper layers are trained by backpropagation. We do not backpropagate through the first layer following unsupervised training, though this could be done to make the low-level features more discriminative. And pg. 148; The output of the second convolutional layer is a series of 3D feature maps. To cope with variable-length sequences, we perform an additional max pooling in the temporal dimension. This ensures that the mid-level features can be reduced to a vector of consistent size. And pg. 149; The nonlinearities we used were identical to those in [7] with the exception of extending contrast normalization and downsampling to 3D: LCN was performed using a 9×9×9 smoothing filter, followed by 4×4×4 average downsampling.).
Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. ("User-level psychological stress detection from social media using deep neural network.") in view of Furukawa et al. ("Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation."), Papadimitriou et al. (“Optimal multi-scale patterns in time series streams”), and He et al. (SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection).
Regarding Claim 26,
Liu, Papadimitriou, and Furukawa teach the computer-implemented method according to claim 1.
	Liu, Papadimitriou, and Furukawa do not explicitly disclose
	However, He (“SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection”) teaches 
pg. 335; Filters Wu,l and bias vectors bu,l are the trainable parameters of the network. The filter banks perform a 1D convolution operation on the input to produce multiple feature maps, each of which describes local information of the input. And pg. 336; The networks for all the other scales are copies of the finest scale networks, sharing all parameter values.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the CNN of Liu with the CNN of He.
Doing so would allow for improved performance (pg. 331; It takes only 0.45s to produce a saliency map for a 400 ×300 image… While SuperCNN produces comparable results to the state-of-the-art methods on the simple MSRA1000 dataset, it achieves much better performances than all other methods on the other two datasets that contain complex scenarios.). 

Relevant Prior Art not cited in the Rejection
Brochu et al. (“A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning”)
The prior art discloses a Bayesian model comprising a hierarchical structure, Gaussian convolutions, and recursions.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121