Detailed Action
This action is in response to Applicant's communications filed 24 January 2019.  
Claims 1-25 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 24 January 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 3, 5, 7-9, 13, 21, and 23 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown").

Regarding Claim 1,
Brown teaches one or more non-transitory computer-readable storage mediums having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
monitoring information ("Our methods are generally applicable to any computer system and logging source" Abstract, p. 1; "By modeling the normal distribution of events in system logs, the anomaly detection approach can discover complex relationships buried in these logs." sec. 1, p. 1) relating to one or more factors ("attention mechanisms allow an immediate view into what factors are affecting model decisions." sec. 2, p. 2) of an artificial intelligence (AI) network during operation of the network ("unsupervised recurrent neural network (RNN) language models for system log anomaly detection. By modeling the normal distribution of events in system logs, the anomaly detection approach can discover complex relationships buried in these logs." sec. 1, p. 1), the network to receive input data and output a decision based at least in part on the input data ("the language models consume a sequence of log-line tokens and output log-line-level anomaly scores." sec. 3, p. 2);
determining attention received by the one or more factors of the network during the operation of the network based at least in part on the monitored information (Figure 7, Figure 8, Attention weights for each token; "Figure 7 shows the mean weights for the Fixed attention which has a single fixed query that does not change with the context at the current time step. The source user, destination domain and source PC dominate the weight vectors, suggesting that they are the most important fields to this model." sec. 5.1.1, p. 6);
determining one or more relationships between the attention received by the one or more factors and a decision of the network ("the attention mechanisms provide information on both feature importance and relational mapping between features. Additionally, architectural insights can be gleaned from the attention applied, which may in the future lead to designing more effective models." sec. 6, p. 8; "By modeling the normal distribution of events in system logs, the anomaly detection approach can discover complex relationships buried in these logs." sec. 1, p. 1); and
generating an analysis of the operation of the network based at least in part on the one or more relationships between attention received by the one or more factors and the decision of the network ("the attention mechanisms provide information on both feature importance and relational mapping between features. Additionally, architectural insights can be gleaned from the attention applied, which may in the future lead to designing more effective models." sec. 6, p. 8; As an example: "Figure 7 shows the mean weights for the Fixed attention which has a single fixed query that does not change with the context at the current time step. The source user, destination domain and source PC dominate the weight vectors, suggesting that they are the most important fields to this model." sec. 5.1.1, p. 6).
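
For illustration only, the kind of per-field summary relied on above (Brown, Figure 7, mean attention weights per log-line field) can be sketched as follows. This is a minimal sketch assuming each log line yields one attention distribution over its fields; the field labels, helper names, and weight values are hypothetical and are not taken from Brown. Averaging the distributions and sorting the result ranks the factors by the attention they receive.

```python
# Hypothetical field labels and weights; not data from Brown.
FIELDS = ["src_user", "dst_domain", "src_pc", "auth_type", "outcome"]

def mean_attention_by_field(weights, fields=FIELDS):
    """Average per-line attention distributions over many log lines.

    weights: list of rows, one per log line; row[i] is the attention
    weight on fields[i], and each row sums to 1.
    Returns a dict mapping field name -> mean attention weight.
    """
    n = len(weights)
    return {name: sum(row[i] for row in weights) / n
            for i, name in enumerate(fields)}

def rank_fields(mean_weights):
    """Order fields from most to least attended."""
    return sorted(mean_weights, key=mean_weights.get, reverse=True)

example_weights = [
    [0.40, 0.30, 0.20, 0.06, 0.04],
    [0.35, 0.35, 0.20, 0.06, 0.04],
    [0.45, 0.25, 0.20, 0.06, 0.04],
]
means = mean_attention_by_field(example_weights)
print(rank_fields(means))
# ['src_user', 'dst_domain', 'src_pc', 'auth_type', 'outcome']
```

In this hypothetical data the three leading fields dominate the averaged weights, mirroring the type of observation Brown draws from Figure 7.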

Regarding Claim 3,
Brown teaches the one or more mediums of claim 1. Brown further teaches wherein determining the one or more relationships includes generating one or more factor vectors, a factor vector indicating a grade or measure of attention that is received by a factor of one or more factors in generating the decision of the network with a corresponding set of input data ("In this work we use dot product attention (Figure 3), wherein an “attention vector” a is generated from three values: 1) a key matrix K, 2) a value matrix V, and 3) a query vector q." sec. 3.3, p. 3).
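
The dot product attention cited above (an attention vector a generated from a key matrix K, a value matrix V, and a query vector q) can be sketched as below. This is a minimal sketch assuming the common unscaled softmax form; Brown's exact scaling or parameterization may differ, and the function names and example values are hypothetical. The returned weights vector holds one value per token, i.e., a per-factor measure of attention of the kind mapped to the claimed factor vector.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(K, V, q):
    """Unscaled dot-product attention (an assumed, common formulation).

    K: key vectors, one per token; V: value vectors, one per token;
    q: query vector. Returns (weights, a): weights[i] is the attention
    received by token i, and a is the weighted sum of the value vectors.
    """
    scores = [sum(k_j * q_j for k_j, q_j in zip(k, q)) for k in K]
    weights = softmax(scores)
    dim = len(V[0])
    a = [sum(w * v[d] for w, v in zip(weights, V)) for d in range(dim)]
    return weights, a

K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q = [2.0, 0.0]
weights, a = dot_product_attention(K, V, q)
print([round(w, 3) for w in weights])
```

Here the query aligns most closely with the first key, so the first token receives the largest share of attention.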

Regarding Claim 5,
Brown teaches the one or more mediums of claim 1. Brown further teaches wherein the monitoring of information includes one or more of monitoring a data store ("Our methods are generally applicable to any computer system and logging source" Abstract, p. 1; "By modeling the normal distribution of events in system logs, the anomaly detection approach can discover complex relationships buried in these logs." sec. 1, p. 1; it is noted that the claim requires only one of the recited alternatives).

Regarding Claim 7,
Brown teaches the one or more mediums of claim 1. Brown further teaches wherein operation of the network includes one or both of training ("We employ a simple online algorithm for training and prediction, which both allows our model to continually adapt to changing distributions of activities on a network and to be deployed on high throughput streaming data sources." sec. 3.4, p. 4) and inference or other decision-making of the network ("we illustrate two approaches to analysis of attention-equipped LSTM language models: 1) Analysis of global model behavior from summary statistics of attention weights, and 2) analysis of particular model decisions from case studies of attention weights and language model predictions." sec. 5, p. 6).

Regarding Claim 8,
Brown teaches the one or more mediums of claim 7. Brown further teaches wherein the network is a neural network ("unsupervised recurrent neural network (RNN) language models for system log anomaly detection. By modeling the normal distribution of events in system logs, the anomaly detection approach can discover complex relationships buried in these logs." sec. 1, p. 1).

Regarding Claim 9,
Brown teaches the one or more mediums of claim 7. Brown further teaches upon determining that one or more factors are not receiving enough attention during training of the network, augmenting the input data with additional examples of the one or more factors to address the attention deficiency ("Attention provides a mechanism for predictions to be directly, selectively conditioned on a subset of the relevant tokens. In practice, this is accomplished by making p(t) a function of the concatenation of h(t−1) with an attention vector a(t−1) that is a weighted sum over hidden states" sec. 3.3, p. 3; "This attention mechanism not only introduces shortcuts in the flow of information over time, allowing the model to more readily access the relevant information for any given prediction, but the weights on the weighted sum also yield insights into the model’s decision process, aiding interpretability." sec. 3.3, p. 3).

Regarding Claim 13,
Brown teaches the one or more mediums of claim 1. Brown further teaches wherein monitoring variables in a data storage includes compact indication to capture reduced data, the reduced data including less than all data relating to an address ("Figure 1 illustrates two methods to partition log lines into sequences of tokens: word and character tokenization. For word based language modeling, the tokens are the fields of the CSV format log file. The user fields are further split on the “@” character to generate user name and domain tokens. A frequency threshold is applied to replace infrequent words with an “out of vocabulary” (OOV) token; a value must occur in a field at least 40 times to be added to the vocabulary. The OOV token ensures that our models will have non-zero probabilities when encountering previously unseen words during evaluation." sec. 3.1.2, p. 2; tokenization teaches reduced data).
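
The frequency-threshold tokenization quoted above can be sketched as follows to show how infrequent field values collapse into a single out-of-vocabulary token, yielding reduced data. This is a minimal sketch assuming per-field vocabularies and the threshold of 40 stated by Brown; the token spelling, helper names, and log contents are hypothetical, and Brown's splitting of user fields on "@" is omitted for brevity.

```python
from collections import Counter

OOV_TOKEN = "<oov>"  # hypothetical spelling of the out-of-vocabulary token

def build_field_vocab(lines, min_count=40):
    """Build one vocabulary per field position.

    lines: list of log lines, each a list of field values.
    A value joins a field's vocabulary only if it occurs in that field
    at least min_count times (Brown reports a threshold of 40).
    """
    counts = [Counter() for _ in lines[0]]
    for fields in lines:
        for i, value in enumerate(fields):
            counts[i][value] += 1
    return [{value for value, c in cnt.items() if c >= min_count}
            for cnt in counts]

def tokenize_line(fields, vocab):
    """Replace each field value absent from its vocabulary with OOV_TOKEN."""
    return [value if value in vocab[i] else OOV_TOKEN
            for i, value in enumerate(fields)]

# Hypothetical log: one common user/host pair and one rare user.
log = [["U1", "C1"]] * 40 + [["U2", "C1"]]
vocab = build_field_vocab(log)
print(tokenize_line(["U2", "C1"], vocab))  # ['<oov>', 'C1']
```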

Regarding Claims 21 and 23,
Claim(s) 21 and 23 recite(s) a system including a processor and memory storing instructions for performing functions corresponding to the processor and non-transitory computer-readable storage medium recited in claim(s) 1 and 3, respectively.  Brown teaches the limitations of claim(s) 21 and 23 as set forth above in connection with claim(s) 1 and 3.  Therefore, claim(s) 21 and 23 is/are rejected under the same rationale as respective claim(s) 1 and 3.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 2, 4, 6, 14, 16-18, 22, and 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown") in view of Graves et al. (Symbolic Reasoning with Differentiable Neural Computers, hereinafter "Graves").

Regarding Claim 2,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach wherein the attention for a factor includes measurement of a level of access to the factor during the operation of the network.
Graves teaches wherein the attention for a factor includes measurement of a level of access to the factor during the operation of the network ("While conventional computers use unique addresses to access memory contents, DNC uses differentiable attention mechanisms to define distributions over the rows, or locations, in the memory matrix. These distributions, which we call weightings, represent the degree to which each location is involved in a read or write operation, and are typically very sparse in a trained system." p.3).
Brown and Graves are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the memory access factors of Graves. The modification would have been obvious because one of ordinary skill in the art would be motivated to combine the advantages of neural and computational processing by providing a neural network with read-write access, as suggested by Graves (p. 1, para. 1).

Regarding Claim 4,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach generating access statistics for the monitored information.
Graves teaches generating access statistics for the monitored information ("While conventional computers use unique addresses to access memory contents, DNC uses differentiable attention mechanisms to define distributions over the rows, or locations, in the memory matrix. These distributions, which we call weightings, represent the degree to which each location is involved in a read or write operation, and are typically very sparse in a trained system." p.3).
Brown and Graves are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the memory access factors of Graves. The modification would have been obvious because one of ordinary skill in the art would be motivated to combine the advantages of neural and computational processing by providing a neural network with read-write access, as suggested by Graves (p. 1, para. 1).

Regarding Claim 6,
Brown teaches the one or more mediums of claim 4. Brown does not explicitly teach wherein the monitored information includes data in a data storage, and wherein the access statistics include read statistics and write statistics for variables in the data storage.
Graves teaches wherein the monitored information includes data in a data storage, and wherein the access statistics include read statistics and write statistics for variables in the data storage ("While conventional computers use unique addresses to access memory contents, DNC uses differentiable attention mechanisms to define distributions over the rows, or locations, in the memory matrix. These distributions, which we call weightings, represent the degree to which each location is involved in a read or write operation, and are typically very sparse in a trained system." p.3).
Brown and Graves are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the memory access factors of Graves. The modification would have been obvious because one of ordinary skill in the art would be motivated to combine the advantages of neural and computational processing by providing a neural network with read-write access, as suggested by Graves (p. 1, para. 1).
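
As an illustration of how the weightings described by Graves (the degree to which each memory location is involved in a read or write operation) can yield read statistics and write statistics per location, consider the sketch below. It is a minimal sketch under stated assumptions: summing weightings over time steps is an aggregation chosen here for illustration and is not described in the cited passage, and the function name and weighting values are hypothetical.

```python
def access_statistics(read_weightings, write_weightings):
    """Aggregate per-location read and write weightings over time.

    Each argument is a list over time steps; each element is a list over
    memory locations (rows) giving the degree to which that location was
    involved in the read or write at that step. Summation over time is an
    assumed aggregation chosen for illustration.
    Returns (reads, writes): cumulative involvement per location.
    """
    num_rows = len(read_weightings[0])
    reads = [sum(step[i] for step in read_weightings) for i in range(num_rows)]
    writes = [sum(step[i] for step in write_weightings) for i in range(num_rows)]
    return reads, writes

# Hypothetical weightings for a 3-row memory over two time steps.
reads, writes = access_statistics(
    read_weightings=[[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]],
    write_weightings=[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
)
```

In this hypothetical trace, location 0 is read heavily but never written, which is the kind of distinction separate read and write statistics expose.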

Regarding Claim 14,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach directing data regarding analysis of the operation of the network to an output device.
Graves teaches directing data regarding analysis of the operation of the network to an output device ("Figure 1: DNC Architecture. a: A recurrent controller network receives input from an external data source and produces output.").
Brown and Graves are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the memory access factors of Graves. The modification would have been obvious because one of ordinary skill in the art would be motivated to combine the advantages of neural and computational processing by providing a neural network with read-write access, as suggested by Graves (p. 1, para. 1).

Regarding Claims 16-18,
Claim(s) 16-18 recite(s) a method, including directing data regarding analysis of the operation of the neural network to an output device, with steps corresponding to the functions performed by the processor and non-transitory computer-readable storage medium recited in claim(s) 14, 2, and 4, respectively. The Brown/Graves combination teaches the limitations of claim(s) 16-18 as set forth above in connection with claim(s) 14, 2, and 4. Therefore, claim(s) 16-18 is/are rejected under the same rationale as respective claim(s) 14, 2, and 4.

Regarding Claims 22 and 25,
Claim(s) 22 and 25 recite(s) a system including a processor and memory storing instructions for performing functions corresponding to the processor and non-transitory computer-readable storage medium recited in claim(s) 2 and 11, respectively.  Brown teaches the limitations of claim(s) 22 and 25 as set forth above in connection with claim(s) 2 and 11.  Therefore, claim(s) 22 and 25 is/are rejected under the same rationale as respective claim(s) 2 and 11.

Claim(s) 10-12 and 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown") in view of Yang et al. (Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning, hereinafter "Yang").

Regarding Claim 10,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach wherein the monitoring of the variables in the data storage is performed by a performance monitoring unit (PMU).
Yang teaches wherein the monitoring of the variables in the data storage is performed by a performance monitoring unit (PMU) ("In order to obtain realistic estimate of energy consumption at design time of the CNN, we use the framework proposed in [21] that models the two sources of energy consumption in a CNN (computation and memory accesses), and use energy numbers extrapolated from actual hardware measurements [22]." sec. 1, p. 5688).
Brown and Yang are analogous art because both are directed to fine-tuning neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the energy regulation of Yang. The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce energy consumption in neural network models, as suggested by Yang (Abstract, p. 5687).

Regarding Claim 11,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach measuring energy required to generate the decision, wherein the analysis of the operation of the network is further based on the measured energy.
Yang teaches measuring energy required to generate the decision, wherein the analysis of the operation of the network is further based on the measured energy ("The key to closing the gap between CNN design and energy efficiency optimization is to directly use energy, instead of the number of weights or operations, as a metric to guide the design. In order to obtain realistic estimate of energy consumption at design time of the CNN, we use the framework proposed in [21] that models the two sources of energy consumption in a CNN (computation and memory accesses), and use energy numbers extrapolated from actual hardware measurements [22]. We then extend it to further model the impact of data sparsity and bitwidth reduction. The setup targets battery-powered platforms, such as smartphones and wearable devices, where hardware resources (i.e., computation and memory) are limited and energy efficiency is of utmost importance." sec. 1, p. 5688).
Brown and Yang are analogous art because both are directed to fine-tuning neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the energy regulation of Yang. The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce energy consumption in neural network models, as suggested by Yang (Abstract, p. 5687).
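
The energy model described in the Yang passage cited for claim 11 (two sources of energy consumption, computation and memory accesses) can be sketched as a sum of per-operation costs. This is a minimal sketch under stated assumptions: the memory levels, the cost ratios, the function name, and the example counts are hypothetical placeholders, not values taken from Yang or its reference [21]. Because costs are normalized to one multiply-accumulate (MAC), the result is a relative rather than absolute energy figure.

```python
# Placeholder per-operation costs, normalized so that one MAC = 1.0.
# These ratios are illustrative assumptions, not values taken from Yang.
RELATIVE_COST = {"mac": 1.0, "register": 1.0, "buffer": 6.0, "dram": 200.0}

def estimate_relative_energy(num_macs, accesses, cost=RELATIVE_COST):
    """Estimate energy as computation plus memory-access terms.

    num_macs: number of multiply-accumulate operations.
    accesses: dict mapping memory level -> number of accesses.
    Returns energy in units of one MAC, i.e. a relative figure.
    """
    compute = num_macs * cost["mac"]
    memory = sum(n * cost[level] for level, n in accesses.items())
    return compute + memory

layer_a = estimate_relative_energy(1000, {"register": 3000, "buffer": 100, "dram": 10})
layer_b = estimate_relative_energy(1000, {"register": 3000, "buffer": 100, "dram": 2})
print(layer_a, layer_b)  # 6600.0 5000.0
```

With identical computation, the hypothetical layer with fewer off-chip accesses is estimated to consume less energy, which is the kind of comparison a relative measure supports.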

Regarding Claim 12,
Brown teaches the one or more mediums of claim 11. Brown does not explicitly teach wherein the measured energy is a relative energy measurement.
Yang teaches wherein the measured energy is a relative energy measurement ("data reuse serves as a good metric for comparing relative energy impact of data" sec. 2.1, p. 5688-5689).
Brown and Yang are analogous art because both are directed to fine-tuning neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the energy regulation of Yang. The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce energy consumption in neural network models, as suggested by Yang (Abstract, p. 5687).

Claim(s) 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown") in view of Chorowski et al. (Attention-Based Models for Speech Recognition, hereinafter "Chorowski").

Regarding Claim 15,
Brown teaches the one or more mediums of claim 1. Brown does not explicitly teach adding input noise to the input data and determining how the attention received by the one or more factors and the decision of the network are affected by the input noise.
Chorowski teaches adding input noise to the input data ("As TIMIT is a relatively small dataset, proper regularization is crucial. We used the adaptive weight noise as a main regularizer [22]. We first trained our models with a column norm constraint [23] with the maximum norm 1 until the lowest development negative log-likelihood is achieved.³ During this time, ε and ρ are set to 10^−8 and 0.95, respectively. At this point, we began using the adaptive weight noise, with the model complexity cost L_C divided by 10, while disabling the column norm constraints. Once the new lowest development log-likelihood was reached, we fine-tuned the model with a smaller ε = 10^−10, until we did not observe the improvement in the development phoneme error rate (PER) for 100K weight updates." sec. 4, pp. 5-6); and 
determining how the attention received by the one or more factors and the decision of the network are affected by the input noise ("Evaluated Models We evaluated the ARSGs with different attention mechanisms. The encoder was a 3-layer BiRNN with 256 GRU units in each direction, and the activations of the 512 top-layer units were used as the representation h. The generator had a single recurrent layer of 256 GRU units. Generate in Eq. (3) had a hidden layer of 64 maxout units. The initial states of both the encoder and generator were treated as additional parameters. Our baseline model is the one with a purely content-based attention mechanism (See Eqs. (5)–(7).) The scoring network in Eq. (7) had 512 hidden units. The other two models use the convolutional features in Eq. (8) with k = 10 and r = 201. One of them uses the smoothing from Sec. 2.3." sec. 4, p. 6).
Brown and Chorowski are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of Brown with the noise of Chorowski.  The modification would have been obvious because one of ordinary skill in the art would be motivated to regularize for small data sets, as suggested by Chorowski (sec. 4, p. 5).

Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown") in view of Graves et al. (Symbolic Reasoning with Differentiable Neural Computers, hereinafter "Graves") and Yang et al. (Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning, hereinafter "Yang").

Regarding Claim 19,
The Brown/Graves combination teaches the method of claim 16. The Brown/Graves combination does not explicitly teach measuring energy required to generate the decision, wherein the analysis of the operation of the network is further based on the measured energy.
Yang teaches measuring energy required to generate the decision, wherein the analysis of the operation of the network is further based on the measured energy ("The key to closing the gap between CNN design and energy efficiency optimization is to directly use energy, instead of the number of weights or operations, as a metric to guide the design. In order to obtain realistic estimate of energy consumption at design time of the CNN, we use the framework proposed in [21] that models the two sources of energy consumption in a CNN (computation and memory accesses), and use energy numbers extrapolated from actual hardware measurements [22]. We then extend it to further model the impact of data sparsity and bitwidth reduction. The setup targets battery-powered platforms, such as smartphones and wearable devices, where hardware resources (i.e., computation and memory) are limited and energy efficiency is of utmost importance." sec. 1, p. 5688).
Brown and Yang are analogous art because both are directed to fine-tuning neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of the Brown/Graves combination with the energy regulation of Yang. The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce energy consumption in neural network models, as suggested by Yang (Abstract, p. 5687).

Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection, hereinafter "Brown") in view of Graves et al. (Symbolic Reasoning with Differentiable Neural Computers, hereinafter "Graves") and Chorowski et al. (Attention-Based Models for Speech Recognition, hereinafter "Chorowski").

Regarding Claim 20,
The Brown/Graves combination teaches the method of claim 16. The Brown/Graves combination does not explicitly teach adding input noise to the input data and determining how the attention received by the one or more factors and the decision of the network are affected by the input noise.
Chorowski teaches adding input noise to the input data ("As TIMIT is a relatively small dataset, proper regularization is crucial. We used the adaptive weight noise as a main regularizer [22]. We first trained our models with a column norm constraint [23] with the maximum norm 1 until the lowest development negative log-likelihood is achieved.³ During this time, ε and ρ are set to 10^−8 and 0.95, respectively. At this point, we began using the adaptive weight noise, with the model complexity cost L_C divided by 10, while disabling the column norm constraints. Once the new lowest development log-likelihood was reached, we fine-tuned the model with a smaller ε = 10^−10, until we did not observe the improvement in the development phoneme error rate (PER) for 100K weight updates." sec. 4, pp. 5-6); and 
determining how the attention received by the one or more factors and the decision of the network are affected by the input noise ("Evaluated Models We evaluated the ARSGs with different attention mechanisms. The encoder was a 3-layer BiRNN with 256 GRU units in each direction, and the activations of the 512 top-layer units were used as the representation h. The generator had a single recurrent layer of 256 GRU units. Generate in Eq. (3) had a hidden layer of 64 maxout units. The initial states of both the encoder and generator were treated as additional parameters. Our baseline model is the one with a purely content-based attention mechanism (See Eqs. (5)–(7).) The scoring network in Eq. (7) had 512 hidden units. The other two models use the convolutional features in Eq. (8) with k = 10 and r = 201. One of them uses the smoothing from Sec. 2.3." sec. 4, p. 6).
Brown and Chorowski are analogous art because both are directed to neural networks with attention mechanisms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network of the Brown/Graves combination with the noise of Chorowski.  The modification would have been obvious because one of ordinary skill in the art would be motivated to regularize for small data sets, as suggested by Chorowski (sec. 4, p. 5).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477. The examiner can normally be reached M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES C KUO/Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126