Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in response to the amendments filed 06/17/2022. Claims 1, 3, 8, 10, 15, and 17 have been amended; claims 2, 4, 9, and 16 have been cancelled. Claims 1, 3, 5-8, 10-15, and 17-20 are currently pending.
	
Response to Arguments
Claims 2, 4, 9, and 16 have been cancelled; therefore, the rejections of claims 2, 4, 9, and 16 no longer stand.
In light of Applicant’s amendment, the 112(b) rejection of claim 3 has been withdrawn.
Applicant’s arguments regarding the prior art rejection have been fully considered but are moot because of the new grounds of rejection. Applicant argues on pages 8-9 that “One skilled in the art would not find motivation in Hasan or Xu to combine convolutional and deconvolutional layers of Hasan used in a classification task with factor analysis process taught be Xu”, and that the prior art has been improperly applied with hindsight. Examiner notes that both convolution and factor analysis are well-known, popular mathematical operations and, as per MPEP 2143, prior art references may be combined when one known technique is applied to a known device to obtain predictable results. Therefore, one of ordinary skill may substitute the known element of an inner product of a factor analyzer with the known element of convolution in order to obtain a predictable result. The prior art rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 8, 11-12, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hasan et al* (“Learning Temporal Regularity in Video Sequences”, herein Hasan), in view of Chen et al* (“Deep Learning with Hierarchical Convolutional Factor Analysis”, herein Chen).
*this document was listed in the IDS from 02/11/2019, therefore a copy has not been attached to this office action.
Regarding claim 1, Hasan teaches a method (the abstract recites “we propose two methods that are built upon the autoencoders for their ability to work with little to no supervision”), comprising: 
receiving training data as input (section 1 para. 6 recites “We train our models using multiple datasets including CUHK Avenue [8], Subway (Enter and Exit) [11], and UCSD Pedestrian datasets (Ped1 and Ped2) [12], without compensating the dataset bias [13].” (i.e. receiving training data)); 
training a machine learning model using the training data (fig. 2 shows the training process on the left side of the figure), wherein the machine learning model comprises multiple layers and utilizes convolution (section 3.2.1 para 1-2 recites “Figure 4 illustrates the architecture of our fully convolutional autoencoder. The encoder consists of convolutional layers [31] and the decoder consists of deconvolutional layers that are the reverse of the encoder with padding removal at the boundary of images. We use three convolutional layers and two pooling layers on the encoder side and three deconvolutional layers and two unpooling layers on the decoder side by considering the size of input cuboid and training data” (i.e. the machine learning model has multiple layers and uses convolution)); 
receiving sensor data as input (section 1 para. 6 and figs. 8-12 recite the use of camera videos as input data to the model; the broadest reasonable interpretation of receiving sensor data includes receiving video input from a series of cameras); 
and detecting an anomaly in the sensor data using the trained machine learning model (section 4.5 para. 1 and Table 1 recite “As our model learns the temporal regularity, it can be used for detecting anomalous events in a weakly supervised manner.”).
However, Hasan does not teach wherein the machine learning model utilizes a deep convolutional factor analyzer having linear Gaussian nodes, each of the Gaussian nodes representing a variable at a particular time in a particular layer, wherein variables at a bottom layer are independent and variables at higher layers gain temporal and spatial dependency through up-sampling and convoluting variables at each layer with variables of the next higher layer.
Chen teaches wherein the machine learning model utilizes a deep convolutional factor analyzer (the abstract recites “The model is represented using a hierarchical convolutional factor-analysis construction, with sparse factor loadings and scores. The computation of layer-dependent model parameters is implemented within a Bayesian setting, employing a Gibbs sampler and variational Bayesian (VB) analysis, that explicitly exploit the convolutional nature of the expansion”) having linear Gaussian nodes, each of the Gaussian nodes representing a variable at a particular time in a particular layer (Section III A para. 1 recites “We employ a model of the form:

    [Equation (5) of Chen, reproduced as an image in the original action: media_image1.png]

where J denotes the number of pixels in the dictionary elements dk and dkj is the j-th component of dk. Since the same basic model is used at each layer of the hierarchy, in (5) we do not employ model-layer superscripts, for generality. The integer P denotes the number of pixels in Xn, and IP represents a P × P identity matrix. The hyperparameters (e, f) and (g,h) are set to favor large αnki and βj, thereby imposing that the set of wnki will be compressible or approximately sparse, with the same found useful for the dictionary elements dk (which yields dictionary elements that look like sparse “sketches” of images, as shown in Figure 1).” Section III D para. 1 recites “In (5), recall that upon marginalizing out the precisions αnki we are imposing a Student-t prior on the weights wnki [26]. Hence, with appropriate settings of hyperparameters (e, f), the Student-t imposes that the factor scores wnki should be (nearly) sparse. This is closely connected to the model in [1], [8], [9], [10], in which an ℓ1 regularization is imposed on wnki, and a single (point) estimate is inferred on all model parameters. In [1] the authors also effectively imposed a Gaussian prior on ϵn, as we have in (5)” (i.e. using a Gaussian prior distribution, or Gaussian nodes, to represent a variable in a given layer)), wherein variables at a bottom layer are independent and variables at higher layers gain temporal and spatial dependency through up-sampling and convoluting variables at each layer with variables of the next higher layer (Section II A para. 1 recites “The nth image to be analyzed is Xn ∈ R^(ny × nx × Kc), where Kc is the number of color channels (e.g., for gray-scale images Kc = 1, while for RGB images Kc = 3). We consider N images {Xn}, n = 1, ..., N, and each image Xn is expanded in terms of a dictionary, with the dictionary defined by compact canonical elements dk ∈ R^(n’y × n’x × Kc), with n’x << nx and n’y << ny. 
The dictionary elements are designed to capture local structure within Xn, and all possible two-dimensional (spatial) shifts of the dictionary elements are considered for representation of Xn. For K canonical dictionary elements the cumulative dictionary is {dk}, k = 1, ..., K. In practice the number of dictionary elements K is made large, and we wish to infer the subset of dictionary elements needed to sparsely render Xn as 

    [Equation reproduced as an image in the original action: media_image2.png]

where ∗ is the convolution operator and bnk ϵ {0, 1} indicates whether dk is used to represent Xn, and ϵn represents the residual. The matrix Wnk represents the weights of dictionary k for image Xn, and the support of Wnk is [expression reproduced as an image in the original action: media_image3.png]
, allowing for all possible shifts, as in a typical convolutional model”. Section II B para. 1 recites “For the nth image Xn and dictionary element dk, we have a set of coefficients {Wnki}iϵℓ corresponding to all possible shifts in the set ℓ. A “max-pooling” step is applied to each Wnk , with this employed previously in deep models and in recent related image processing analysis”. Section II B para. 2 recites “After fitting the model at the second layer, to move to layer three, max-pooling is again performed, yielding a level-three tensor for each image, with which factor analysis is again performed. Note that, because of the max-pooling step, the number of spatial positions in such images decreases as one moves to higher levels. Therefore, the basic computational complexity decreases with increasing layer within the hierarchy. This process may be continued for additional layers; in the experiments we consider up to three layers” (i.e. variables gain less complexity or greater dependency as up-sampling/expansion and convolution are applied at each layer)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the hierarchical convolutional factor analysis from Chen to replace the convolutional operations from the autoencoder in Hasan. Hasan and Chen are both directed to applying convolutional methods to image analysis, but Hasan does not teach a model that utilizes Gaussian nodes or up-sampling variables at every layer. One of ordinary skill would be motivated to apply the known technique of convolutional factor analysis from Chen to the known anomaly detection system utilizing convolution from Hasan in order to yield a predictable result and improve performance of the system from Hasan.
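For illustration only, the convolutional expansion and max-pooling steps quoted from Chen above can be sketched as follows. This is a minimal sketch under stated assumptions, not Chen's implementation: it assumes a single gray-scale channel (Kc = 1), a "full" 2-D convolution, and a non-overlapping 2 × 2 pooling window, and it omits the residual term and all Bayesian priors. The function names `conv2d_full`, `reconstruct`, and `max_pool` are hypothetical.

```python
# Illustrative sketch only; names, sizes, and the pooling window are
# assumptions and are not taken from Chen.

def conv2d_full(w, d):
    """Full 2-D convolution of a weight map w with a dictionary element d."""
    wh, ww = len(w), len(w[0])
    dh, dw = len(d), len(d[0])
    out = [[0.0] * (ww + dw - 1) for _ in range(wh + dh - 1)]
    for i in range(wh):
        for j in range(ww):
            if w[i][j] == 0:
                continue  # sparse weights: most shifts are unused
            for a in range(dh):
                for b in range(dw):
                    out[i + a][j + b] += w[i][j] * d[a][b]
    return out


def reconstruct(weights, dictionary, used):
    """Sum over k of b_k * (W_k * d_k), with b_k in {0, 1}; residual omitted."""
    total = None
    for w, d, b in zip(weights, dictionary, used):
        if not b:
            continue  # dictionary element k is not used for this image
        term = conv2d_full(w, d)
        if total is None:
            total = term
        else:
            total = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(total, term)]
    return total  # None if no dictionary element is selected


def max_pool(w, size=2):
    """Non-overlapping max-pooling; fewer spatial positions remain afterwards."""
    return [
        [max(w[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(w[0]) - size + 1, size)]
        for i in range(0, len(w) - size + 1, size)
    ]
```

In this sketch a 3 × 3 weight map convolved with a 2 × 2 dictionary element yields a 4 × 4 image, and pooling a 4 × 4 weight map yields a 2 × 2 map, which mirrors the quoted observation that the number of spatial positions decreases at higher layers.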
Regarding claim 5, the combination of Hasan and Chen teaches the method of claim 1, wherein the anomaly is indicative of a sensed operational parameter of a machine having a value outside of a normal operating range (Hasan section 2 para. 3 recites “One of the applications of our model is abnormal or anomalous event detection. The survey paper [6] contains a comprehensive review of this topic. Most video-based anomaly detection approaches involve a local feature extraction step followed by learning a model on training video. Any event that is an outlier with respect to the learned model is regarded as the anomaly” (i.e. a value outside of a normal operating range). Hasan section 1 para. 6 and figs. 8-12 recite the use of camera videos as input data to the model; the broadest reasonable interpretation of receiving sensor data includes receiving video input from a series of cameras).
Claim 8 is a system claim and its limitation is included in claim 1. The only difference is that claim 8 requires a system (Hasan section 2 para. 5 recites “For an end-to-end learning system for regularity in videos, we employ the convolutional autoencoder”). Therefore, claim 8 is rejected for the same reasons as claim 1.
Claim 11 is a system claim and its limitation is included in claim 1. Claim 11 is rejected for the same reasons as claim 1.
Claim 12 is a system claim and its limitation is included in claim 5. Claim 12 is rejected for the same reasons as claim 5.
Claim 15 is a computer program product claim and its limitation is included in claim 1. The only difference is that claim 15 requires a computer program product (Hasan section 4 para. 1 recites “We learn the model using multiple video datasets, totaling 1 hour 50 minutes, and evaluate our method both qualitatively and quantitatively. We modify1 and use Caffe [59] for all of our experiments on NVIDIA Tesla K80 GPUs”). Therefore, claim 15 is rejected for the same reasons as claim 1.
Claim 18 is a computer program product claim and its limitation is included in claim 1. Claim 18 is rejected for the same reasons as claim 1.

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Hasan et al* (“Learning Temporal Regularity in Video Sequences”, herein Hasan) in view of Chen et al* (“Deep Learning with Hierarchical Convolutional Factor Analysis”, herein Chen), in further view of Ahmad et al* (“Real-Time Anomaly Detection for Streaming Analytics”, herein Ahmad).
*this document was listed in the IDS from 02/11/2019, therefore a copy has not been attached to this office action.
Regarding claim 3, the combination of Hasan and Chen teaches the method of claim 1.
However, the combination of Hasan and Chen does not explicitly teach wherein detecting the anomaly comprises: determining, using the trained machine learning model, a probability of a time series value at a t-th temporal location and an i-th spatial dimension of a time series of the sensor data; and performing a comparison of the probability to a threshold value, classifying the time series value as normal or anomalous data based on the comparison.
Ahmad teaches wherein detecting the anomaly comprises: determining, using the trained machine learning model, a probability of a time series value at a t-th temporal location and an i-th spatial dimension of a time series of the sensor data (section 2 para. 8 recites “In this paper we focus on using Hierarchical Temporal Memory (HTM) for anomaly detection. HTM is a machine learning algorithm derived from neuroscience that models spatial and temporal patterns in streaming data” (i.e. spatial and temporal dimensions of the time series values are considered). Section 3.2 para. 2 recites “Rather than thresholding the raw score directly, we model the distribution of anomaly scores and use this distribution to check for the likelihood that the current state is anomalous. The anomaly likelihood is thus a metric defining how anomalous the current state is based on the prediction history of the HTM model” (i.e. determining the probability of the time series data)); 
and performing a comparison of the probability to a threshold value, classifying the time series value as normal or anomalous data based on the comparison (section 3.2 para. 3 recites “We then compute a recent short-term average of anomaly scores, and apply a threshold to the Gaussian tail probability (Q-function, (Karagiannidis & Lioumpas, 2007)) to decide whether or not to declare an anomaly. We define the anomaly likelihood (Lt) as the complement of the tail probability. We threshold Lt and report an anomaly if it is very close to 1” (i.e. if the probability fails to satisfy a threshold, the data is considered anomalous)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by adding the methods of determining the time series probability and the threshold comparisons from Ahmad to the anomaly detection methods from Hasan (as modified by Chen), as Hasan and Ahmad are both directed to detecting anomalies in time series data. Ahmad section 3.2 para. 1 recites “The raw anomaly score described above represents an instantaneous measure of the predictability of the current input stream. This works well for predictable scenarios but in many practical applications, the underlying system is inherently noisy and unpredictable. In these situations it is often the change in predictability that is indicative of anomalous behavior.” Therefore, one of ordinary skill would benefit from adding the anomaly likelihood metric analysis from Ahmad, as it would improve the performance of the methods from Hasan by making them more robust in less predictable scenarios.
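For illustration only, the thresholding described in the quoted passages of Ahmad section 3.2 can be sketched as follows. This is a minimal sketch, not Ahmad's implementation: it assumes the raw anomaly scores are modeled as a rolling normal distribution, and the window lengths (`long_window`, `short_window`), the threshold margin `epsilon`, and all function names are hypothetical values chosen for the example.

```python
import math
from collections import deque
from statistics import mean, pstdev


def gaussian_tail(x):
    """Q-function: P(Z > x) for a standard normal variable Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def anomaly_likelihoods(scores, long_window=50, short_window=5, min_sigma=1e-6):
    """Yield an anomaly likelihood L_t for each raw anomaly score.

    L_t is the complement of the Gaussian tail probability of the
    short-term average, measured against the longer score history.
    Window sizes are assumptions for this sketch.
    """
    history = deque(maxlen=long_window)
    for s in scores:
        history.append(s)
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)  # guard against zero spread
        short_mean = mean(list(history)[-short_window:])
        yield 1.0 - gaussian_tail((short_mean - mu) / sigma)


def classify(scores, epsilon=0.01, **kwargs):
    """Label each time step anomalous when L_t >= 1 - epsilon (assumed margin)."""
    return ["anomalous" if lt >= 1.0 - epsilon else "normal"
            for lt in anomaly_likelihoods(scores, **kwargs)]
```

With a flat score history the likelihood stays at 0.5 and every step is labeled normal; a sustained jump in the scores pushes the likelihood toward 1 and the step is labeled anomalous once it crosses the assumed threshold.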
Claim 10 is a system claim and its limitation is included in claim 3. Claim 10 is rejected for the same reasons as claim 3.
Claim 17 is a computer program product claim and its limitation is included in claim 3. Claim 17 is rejected for the same reasons as claim 3.

Claims 6-7, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hasan et al* (“Learning Temporal Regularity in Video Sequences”, herein Hasan) in view of Chen et al* (“Deep Learning with Hierarchical Convolutional Factor Analysis”, herein Chen), in further view of Xu et al* (“Bayesian Wavelet PCA Methodology for Turbomachinery Damage Diagnosis Under Uncertainty”, herein Xu).
*this document was listed in the IDS from 02/11/2019, therefore a copy has not been attached to this office action.
Regarding claim 6, the combination of Hasan and Chen teaches the method of claim 1, wherein the training data comprises training time series data (Hasan section 1 para. 6 recites “We train our models using multiple datasets including CUHK Avenue [8], Subway (Enter and Exit) [11], and UCSD Pedestrian datasets (Ped1 and Ped2) [12], without compensating the dataset bias [13].” (i.e. time series training data));
iterating through the multiple layers of the machine learning model to recompute the training time series data and to estimate output parameters, determining that output criteria are satisfied, and outputting the output parameters associated with a most recent iteration (Hasan fig. 5 shows the relationship between the length of the input time series and the number of iterations required to reach convergence. The description of fig. 5 recites “Effect of temporal length (T) of input video cuboid. (Left) X-axis is the increasing number of iterations, Y-axis is the training loss, and three plots correspond to three different values of T. (Right) X-axis is the increasing number of video frames and Y-axis is the regularity score. As T increases, the training loss takes more iterations to converge as it is more likely that the inputs with more channels have more irregularity to hamper learning regularity. On the other hand, once the model is learned, the regularity score (i.e. the output) is more distinguishable for higher values of T between regular and irregular regions.” Examiner’s Note: one of ordinary skill would understand that convergence is another manner of determining that output criteria are satisfied). 
However, the combination of Hasan and Chen does not explicitly teach wherein training the machine learning model comprises: initializing the training time series data using principal component analysis to obtain multiple layers of the machine learning model; iterating through the multiple layers of the machine learning model to recompute the training time series data and to estimate output parameters; determining that output criteria are satisfied, and outputting the output parameters associated with a most recent iteration.
Xu teaches wherein training the machine learning model comprises: initializing the training time series data using principal component analysis to obtain multiple layers of the machine learning model (section 3.3 para. 1 recites “After the multivariate time series data are cleaned, the probabilistic principal component analysis (PPCA) approach is developed in this section to (1) reduce data dimensionality, (2) address the multivariate correlation, and (3) consider data uncertainty. Principal component analysis (PCA) [26] is a well-established statistical method for dimensionality reduction and has been widely applied in data compression, image processing, exploratory data analysis, pattern recognition, and time series prediction” (i.e. using principal component analysis on the training data)). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the probabilistic principal component analysis methods from Xu to reduce the dimensionality of the multivariate input data from Hasan (as modified by Chen). Xu and Hasan are both directed to detecting anomalies in time series data. One of ordinary skill would benefit from using the probabilistic principal component analysis methods from Xu to simplify the multivariate input data from Hasan, which would improve the performance of the machine learning model from Hasan by allowing the model to process more data in a faster manner.
Regarding claim 7, the combination of Hasan, Chen, and Xu teaches the method of claim 6, wherein determining that the output criteria are satisfied comprises: 
determining a current lower bound of the training time series data responsive, at least in part, to completion of the most recent iteration; and determining that a difference between the current lower bound of the training time series data and a lower bound associated with a previous iteration satisfies a threshold value (Hasan section 4.5 para. 2 recites “We find the local minimas in the time series of regularity scores to detect abnormal events. However, these local minima are very noisy and not all of them are meaningful local minima. We use the persistence1D [61] algorithm to identify meaningful local minima and span the region with a fixed temporal window (50 frames) and group nearby expanded local minimal regions when they overlap to obtain the final abnormal temporal regions. Specifically, if two local minima are within fifty frames of one another, they are considered to be a part of same abnormal event. We consider a detected abnormal region as a correct detection if it has at least fifty percent overlap with the ground truth” (i.e. determining a lower bound of the training time series data and comparing the difference between lower bounds to satisfy a threshold value)).
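For illustration only, the grouping of local minima and the overlap check described in the quoted passage of Hasan section 4.5 can be sketched as follows. This is a minimal sketch, not Hasan's implementation: it assumes the meaningful local minima have already been selected (the persistence1D step is not reproduced), it skips the fixed-window expansion of each minimum, and it assumes the fifty percent overlap is measured as a fraction of the detected region. All function names are hypothetical.

```python
def group_minima(minima_frames, window=50):
    """Group local minima lying within `window` frames of the previous minimum.

    Returns a list of (first_frame, last_frame) tuples, one per grouped
    abnormal region. The 50-frame default follows the quoted passage.
    """
    regions = []
    for frame in sorted(minima_frames):
        if regions and frame - regions[-1][1] <= window:
            regions[-1][1] = frame  # extend the current region
        else:
            regions.append([frame, frame])  # start a new region
    return [tuple(r) for r in regions]


def overlap_fraction(detected, truth):
    """Fraction of the detected region (inclusive frames) inside the truth region.

    Using the detected region as the denominator is an assumption.
    """
    start = max(detected[0], truth[0])
    end = min(detected[1], truth[1])
    overlap = max(0, end - start + 1)
    return overlap / (detected[1] - detected[0] + 1)


def is_correct_detection(detected, truth, min_overlap=0.5):
    """Count a detection as correct when the overlap reaches `min_overlap`."""
    return overlap_fraction(detected, truth) >= min_overlap
```

For example, minima at frames 10, 40, 200, 230, and 500 group into three regions under the 50-frame rule, and each region can then be compared against a ground-truth interval.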
Claim 13 is a system claim and its limitation is included in claim 6. Claim 13 is rejected for the same reasons as claim 6.
Claim 14 is a system claim and its limitation is included in claim 7. Claim 14 is rejected for the same reasons as claim 7.
Claim 19 is a computer program product claim and its limitation is included in claim 6. Claim 19 is rejected for the same reasons as claim 6.
Claim 20 is a computer program product claim and its limitation is included in claim 7. Claim 20 is rejected for the same reasons as claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Multi-Task Learning for Bayesian Matrix Factorization (Yuan) teaches utilizing Bayesian matrix factorization with Dirichlet process mixtures in a multi-task setting for collaborative filtering.
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations (Lee et al) teaches a translation invariant hierarchical generative model which supports both top-down and bottom-up probabilistic inference by stacking convolutional restricted Boltzmann machines into a multilayer architecture analogous to a deep belief network using probabilistic max-pooling.
Tang et al (Deep Mixtures of Factor Analysers) teaches a greedy layer-wise learning algorithm to share each lower-level factor loading matrix by many different higher level MFAs and prevent overfitting.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./             Examiner, Art Unit 2121                    



	/Li B. Zhen/             Supervisory Patent Examiner, Art Unit 2121