DETAILED ACTION
This action is in response to the amendment filed 04/26/2022. Claims 1, 8, 19, and 23 have been amended. Claims 1-25 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments filed 04/26/2022 have been fully considered and are persuasive, specifically regarding the use of Hsu et al. for the 35 U.S.C. 103 rejection. In response, the examiner has updated the rejection.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/26/2022 has been entered.

Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the feature “determines at least two decoded RNNs based on at least two temporal context vectors” must be shown or the feature(s) canceled from the claim(s). No new matter should be entered. Examiner notes that paragraph 0065 of the specification describes elements 618 and 620: “Similarly, at 618, a temporal context vector can be determined by the computing component for TS2, and so on, such that at 620”. Figure 6A, however, depicts elements 618 and 620 each as the “Temporal Context vector for Ts1”. As described in the specification, the drawing should indicate a “Temporal context vector for Ts2” for 618 and a “Temporal context vector for Tsd” for 620.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3-6, 8, 10-13, 15, and 17-25 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Celikyilmaz et al., US 20190287012 A1, hereinafter Celikyilmaz.

Regarding claims 1 and 8
	Celikyilmaz teaches, A system, comprising: a memory that stores computer-executable components; a processor, operably coupled to the memory, and that executes the computer-executable components stored in the memory, wherein the computer-executable components comprise: (¶0068 “Hardware-implemented components can provide information to, and receive information from, other hardware-implemented components. Accordingly, the described hardware-implemented components can be regarded as being communicatively coupled…for example, through the storage and retrieval of information in memory structures to which the multiple hardware-implemented components have access”) a computing component that encodes time series data with at least two recurrent neural networks (RNNs)… wherein a first RNN of the at least two RNNs encodes a first time series of the at least two time series and a second RNN of the at least two RNNs encodes a second time series; (¶002 Background “The encoder may be an RNN like the encoder (as is usually the case in language-related applications)” ¶0026 “FIG. 1 is a block diagram schematically illustrating an encoder-decoder neural network architecture… a plurality of multi-layer encoder agents 104, 105, 106, each taking a portion of the input as an input sequence…” ¶0035 “In some embodiments, as mentioned above, all encoder agents 200 within the encoder-decoder network 100 share the same structure and, in particular, the same number of layers.” The multiple encoders, in some embodiments, are recurrent neural networks, each encoding its own portion of a time series. As shown in Figure 1, each agent operates on a different time series or input sequence.)
and determines at least two decoded RNNs based on at least two temporal context vectors, (¶0028 “The output of the encoder agents 104, 105, 106 is fed, via a hierarchical attention mechanism 110, into the decoder 112.” ¶0041 “where v2, W5, W6 and b1 are shared learned parameters of the token attention networks” The output of each encoder depicted in Fig. 1 is a temporal context vector fed into the hierarchical attention mechanism. Each of the token attention networks is a decoded RNN because it operates on input vectors to be processed by a decoder stage. These networks are determined via learning. The outputs of the encoders are temporal context vectors because the encoders are LSTMs whose outputs are based on hidden temporal vectors and a message vector, which provides context for the output of the contextual layer shown in Figure 3. ¶0039 “In some embodiments, the function ƒ projects the message vector $z_a^{(k)}$ with the agent's previous encoding $h_i^{(k)}$ of the input sequence… The function ƒ combines the information sent by the other agents with the context of the current token from the paragraph processed by agent a, yielding different features about the current context in relation to other topics in the document d.”) to determine temporal dependencies and lagged dependencies in at least two time series data, (¶0039 “Herein, v1, W3, and W4, are learned network parameters that may (but need not) be shared across all agents” and Figure 3. As shown, an encoder model, which is an LSTM network, learns to determine the parameters h and z, which are temporal dependencies and lagged dependencies because they are latent values that depend on the future and past states.) and determines an inter-time series dependence context vector (¶0041 “First, for each encoder agent a, the associated token-attention network (114, 115, or 116) computes a token attention distribution… Using the token attention distributions… a new token context vector $c_a^t$
… can be computed at each time step t for each agent a as a weighted sum of the top-layer hidden-state output vectors” “the agent context vector $c_t^*$ is computed using a hierarchical attention mechanism 110” Each agent determines a context vector, corresponding to the inter-time series dependence context vector, because it is generated based on the inter-time series dependencies discussed above and is used to compute an ‘agent context vector’.) and an RNN dependence decoder; and an analysis component that determines inter-time series dependencies in the at least two time series data and forecast values for one or more time series data based on an RNN encoder and the RNN dependence decoder with an attention mechanism based neural network. (Figure 1 and ¶0040 “With reference to FIG. 4… The decoder 112 may be an RNN, such as, for instance, a single-layer LSTM… At each time step t, the decoder 112 predicts a new token $y_t$ in the output sequence… and further attending to relevant input context provided by the agents, as reflected in the agent context vector $c_t^*$” The decoder predicts a forecast value for the time series data at each time step. This prediction is based on the determined agent context vector or inter-time series dependencies in the input data. As discussed above and depicted in Figure 1, the context vector is generated using a set of attention networks or attention mechanism based neural network.) a combining component that combines and transposes the at least two decoded RNNs (Examiner notes that para. 0065 of the specification describes that the decoded dependencies are the outputs that are combined and transposed. Additionally, Fig. 6A and 6B of the specification depict context vectors combined in the transformation stage. ¶0042 “The context vectors $c_a^t$ for the plurality of agents are fed as input into the agent-attention network… at the second level of the hierarchical attention mechanism 110, which decides, conceptually speaking, which encoder's information is more relevant to the current decoding time step t.
This is accomplished by weighting the token context vectors $c_a^t$ with an agent attention distribution… The agent attention distribution $g^t$ may be computed, for example, according to … [equation image: media_image1.png] … the overall agent context vector $c_t^*$ can be computed as … [equation image: media_image2.png]” As noted by the authors, the agent attention distribution effectively decides which encoder is most important at a given time step; in this way, the encoder of interest is transposed or rearranged at each time step. Finally, the overall context vector is a combination of the output of the decoded RNNs.)
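
For illustration only, the two-level (token, then agent) attention described in the passages cited above can be sketched as follows. This is a minimal sketch and not the implementation of Celikyilmaz: the dot-product scoring stands in for the learned attention networks of the reference, and the function names (`softmax`, `weighted_sum`, `hierarchical_context`) and the toy inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a list."""
    dim = len(vectors[0])
    return [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]

def hierarchical_context(agent_hidden_states, decoder_state):
    """agent_hidden_states holds one list of hidden vectors per encoder agent."""
    token_contexts = []
    for hidden_states in agent_hidden_states:
        # Level 1: attention over the tokens of one agent.
        # Assumption: a dot-product score replaces the learned scoring network.
        token_weights = softmax([dot(h, decoder_state) for h in hidden_states])
        token_contexts.append(weighted_sum(token_weights, hidden_states))
    # Level 2: attention over the per-agent token context vectors.
    agent_weights = softmax([dot(c, decoder_state) for c in token_contexts])
    overall_context = weighted_sum(agent_weights, token_contexts)
    return token_contexts, agent_weights, overall_context

# Toy example: two encoder agents, two hidden vectors each.
agents = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0]],
]
token_contexts, agent_weights, overall_context = hierarchical_context(agents, [1.0, 0.0])
```

In this sketch the agent weights form a distribution over encoders, so the overall context vector is a weighted combination of the per-agent context vectors at the given decoding step.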

Regarding claims 15 and 19
	Celikyilmaz teaches, A computer-implemented method, a computing component operatively coupled to a processor, determining by a combining component operatively coupled to the processor, an analysis component operatively coupled to the processor (¶0068 “Hardware-implemented components can provide information to, and receive information from, other hardware-implemented components. Accordingly, the described hardware-implemented components can be regarded as being communicatively coupled…for example, through the storage and retrieval of information in memory structures to which the multiple hardware-implemented components have access”) encoding time series data with at least two recurrent neural networks (RNNs)… wherein a first RNN of the at least two RNNs encodes a first time series of the at least two time series and a second RNN of the at least two RNNs encodes a second time series (¶002 Background “The encoder may be an RNN like the encoder (as is usually the case in language-related applications)” ¶0026 “FIG. 1 is a block diagram schematically illustrating an encoder-decoder neural network architecture… a plurality of multi-layer encoder agents 104, 105, 106, each taking a portion of the input as an input sequence…” ¶0035 “In some embodiments, as mentioned above, all encoder agents 200 within the encoder-decoder network 100 share the same structure and, in particular, the same number of layers.” The multiple encoders, in some embodiments, are recurrent neural networks, each encoding its own portion of a time series. As shown in Figure 1, each agent operates on a different time series or input sequence.) determining at least two decoded RNNs based on at least two temporal context vectors (¶0028 “The output of the encoder agents 104, 105, 106 is fed, via a hierarchical attention mechanism 110, into the decoder 112.” ¶0041 “where v2, W5, W6 and b1 are shared learned parameters of the token attention networks” The output of each encoder depicted in Fig. 1 is a temporal context vector fed into the hierarchical attention mechanism. Each of the token attention networks is a decoded RNN because it operates on input vectors to be processed by a decoder stage. These networks are determined via learning. The outputs of the encoders are temporal context vectors because the encoders are LSTMs whose outputs are based on hidden temporal vectors and a message vector, which provides context for the output of the contextual layer shown in Figure 3. ¶0039 “In some embodiments, the function ƒ projects the message vector $z_a^{(k)}$ with the agent's previous encoding $h_i^{(k)}$ of the input sequence… The function ƒ combines the information sent by the other agents with the context of the current token from the paragraph processed by agent a, yielding different features about the current context in relation to other topics in the document d.”) to determine temporal dependencies and lagged dependencies in the at least two time series data, (¶0039 “Herein, v1, W3, and W4, are learned network parameters that may (but need not) be shared across all agents” and Figure 3. As shown, an encoder model, which is an LSTM network, learns to determine the parameters h and z, which are temporal dependencies and lagged dependencies because they are latent values that depend on the future and past states.) and determining, by the combining component, an inter-time series dependence context vector (¶0041 “First, for each encoder agent a, the associated token-attention network (114, 115, or 116) computes a token attention distribution… Using the token attention distributions… a new token context vector $c_a^t$
… can be computed at each time step t for each agent a as a weighted sum of the top-layer hidden-state output vectors” “the agent context vector $c_t^*$ is computed using a hierarchical attention mechanism 110” Each agent determines a context vector, corresponding to the inter-time series dependence context vector, because it is generated based on the inter-time series dependencies discussed above and is used to compute an ‘agent context vector’.) and an RNN dependence decoder; and determining…inter-time series dependencies in the at least two time series data and forecast values for one or more time series data based on an RNN encoder and the RNN dependence decoder with an attention mechanism based neural network (Figure 1 and ¶0040 “With reference to FIG. 4… The decoder 112 may be an RNN, such as, for instance, a single-layer LSTM… At each time step t, the decoder 112 predicts a new token $y_t$ in the output sequence… and further attending to relevant input context provided by the agents, as reflected in the agent context vector $c_t^*$” The decoder predicts a forecast value for the time series data at each time step. This prediction is based on the determined agent context vector or inter-time series dependencies in the input data. As discussed above and depicted in Figure 1, the context vector is generated using a set of attention networks or attention mechanism based neural network.) combining, by a combining component operatively coupled to the processor, the at least two decoded RNNs, transposing by the combining component, the at least two decoded RNNs (Examiner notes that para. 0065 of the specification describes that the decoded dependencies are the outputs that are combined and transposed. Additionally, Fig. 6A and 6B of the specification depict context vectors combined in the transformation stage. ¶0042 “The context vectors $c_a^t$ for the plurality of agents are fed as input into the agent-attention network… at the second level of the hierarchical attention mechanism 110, which decides, conceptually speaking, which encoder's information is more relevant to the current decoding time step t. This is accomplished by weighting the token context vectors $c_a^t$ with an agent attention distribution… The agent attention distribution $g^t$ may be computed, for example, according to …
[equation image: media_image1.png] … the overall agent context vector $c_t^*$ can be computed as … [equation image: media_image2.png]” As noted by the authors, the agent attention distribution effectively decides which encoder is most important at a given time step; in this way, the encoder of interest is transposed or rearranged at each time step. Finally, the overall context vector is a combination of the output of the decoded RNNs.)

Regarding claim 3/10
	Celikyilmaz teaches claims 1/8 
Further Celikyilmaz teaches, wherein the computing component further determines converged RNNs by iteratively encoding the at least two RNNs with the respective time series data. (¶0062 “FIG. 7 is a flow chart of an example method 700 for training an encoder-decoder neural network 100, 502. In accordance with various embodiments, the neural network 100, 502 is trained end-to-end, i.e., the network parameters of the encoder agents, decoder, attention networks, and, if applicable, pointer network are all optimized jointly based on a shared loss function… This pre-training stage involves iteratively computing the probabilities of the ground-truth output sequences from the respective input sequence for all training examples”)

Regarding claim 4/11/17/20/24
	Celikyilmaz teaches claim 1/8/15/19/23
Further Celikyilmaz teaches, encoding the at least two RNNs and the combining component combining the at least two decoded RNNs are performed jointly and concurrently. (¶0062 “FIG. 7 is a flow chart of an example method 700 for training an encoder-decoder neural network 100, 502. In accordance with various embodiments, the neural network 100, 502 is trained end-to-end, i.e., the network parameters of the encoder agents, decoder, attention networks, and, if applicable, pointer network are all optimized jointly based on a shared loss function…” training jointly end-to-end is interpreted to mean gradient terms are propagated from the encoder input to the decoded output of the composite RNN, meaning that learning is done concurrently for all parts of the system on each backpropagation.)

Regarding claim 5/12/18/21/25
	Celikyilmaz teaches claim 1/8/15/19/23
Further Celikyilmaz teaches, wherein the at least two RNNs comprise a long-short term memory neural network. (¶0010 “Each encoder agent includes, in some embodiments, a local encoder and a multi-layer contextual encoder…. The local encoders and the layers of the contextual encoders of the plurality of encoder agents may each be or comprise a bi-directional long short-term memory (LSTM) network” )

Regarding claim 6/13/22
	Celikyilmaz teaches claim 1/8/19
Further Celikyilmaz teaches, the at least two RNNs comprise gated recurrent units as gating mechanisms for the at least two RNNs. (¶0033 “Each of the layers of the local encoder 202 and the contextual encoder 204 may be an RNN built, for example, from long short-term memory (LSTM) units or gated recurrent units (GRUs), or from other types of neural-network units”)

Regarding claim 23
Celikyilmaz teaches the claimed functions as set forth in connection with claim 15, further Celikyilmaz teaches, A computer program product for determining temporal dependencies in time series data using neural networks, comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: ( ¶0008 “One aspect, in accordance with various embodiments, is directed to a computer-implemented method using one or more hardware processors executing instructions stored in one or more machine-readable media to perform the following operations”)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 7, 9, 14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Celikyilmaz, further in view of Song et al., US 20180060665 A1, hereinafter Song.

Regarding claim 2/9/16
	Celikyilmaz does not explicitly teach a data collection component that collects the at least two time series data, wherein the at least two time series data comprise multivariate time series data, and wherein the lagged dependencies comprise dependencies of a time series value on historical values of one or more time series.
	Song, however, when addressing an encoder/decoder attention network for prediction, teaches a data collection component that collects the at least two time series data, wherein the at least two time series data comprise multivariate time series data (¶0059 “The monitoring system [data collection] is operated by the DA-RNN and captures one or more video sequences in a workplace environment, such as a boiler room …The DA-RNN is trained to generate a plurality of driving series based on a plurality of observations [multivariate time series data], wherein the observations include (i) a workplace personnel, (ii) an action taken by the personnel, and (iii) an object on which the personnel is taking the action on”) and wherein the lagged dependencies comprise dependencies of a time series value on historical values of one or more time series. (Figure 3 ¶0042 “To determine the attention weights 314a-c, the input attention mechanism 302 may reference the input features of the driving series 307 and an encoded hidden state from a previous timestamp (t−1) 309a” The encoded states, which correspond to lagged dependencies because they depend on the previous timestamp, are also based on the previous input features because those features are used to generate the “encoded hidden state from a previous timestamp”. The input features are extracted from one or more time series.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an attention network which collects multivariate input data as taught by Song to the disclosed invention of Celikyilmaz.
One of ordinary skill in the art would have been motivated to make this modification in order to implement the encoder/decoder attention network to monitor real-time activity. For example, “The DA-RNN is deployed within an industrial facility and is trained to monitor hazardous conditions… The DA-RNN is deployed to monitor the boiler room personnel and to take immediate action, based on observed personnel actions, to prevent harm to human life or significant property damage. The monitoring system is operated by the DA-RNN and captures one or more video sequences in a workplace environment” (Song ¶0059).
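
For illustration only, the notion of a lagged dependency recited above (a time series value depending on historical values of one or more time series) can be sketched as a windowing step over multivariate data. This is a minimal sketch under stated assumptions and is not taken from Song or Celikyilmaz: the function name `lagged_windows`, the series names `ts1` and `ts2`, and the sample values are hypothetical.

```python
def lagged_windows(series, target, max_lag):
    """Pair each target value with the previous max_lag values of every series.

    series: dict mapping a series name to a list of values (all equal length).
    Returns a list of (features, label) tuples, where features maps
    (series_name, lag) to the value observed `lag` steps before the label.
    """
    names = sorted(series)
    length = len(series[target])
    samples = []
    for t in range(max_lag, length):
        features = {
            (name, lag): series[name][t - lag]
            for name in names
            for lag in range(1, max_lag + 1)
        }
        samples.append((features, series[target][t]))
    return samples

# Toy multivariate data: two time series of equal length.
data = {"ts1": [1, 2, 3, 4], "ts2": [10, 20, 30, 40]}
samples = lagged_windows(data, target="ts1", max_lag=2)
```

Each sample relates a value of `ts1` to historical values of both `ts1` and `ts2`, which is the kind of input an encoder would consume when modeling lagged and inter-series dependencies.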

Regarding claim 7/14
	Celikyilmaz teaches claim 1/8
	Celikyilmaz does not explicitly teach wherein the system further comprises the attention mechanism-based neural network for the determination of the at least two temporal context vectors.
Song, however, when addressing an encoder/decoder attention network for prediction, teaches wherein the system further comprises the attention mechanism-based neural network for the determination of the at least two temporal context vectors. (¶0046 “The temporal attention mechanism 405 extracts relevant encoded hidden states 409a-c by computing temporal attention weights 414a, 414b and 414c corresponding to importance of each encoded hidden state 409a-c respectively. The decoder 405 applies the temporal attention weights 414a-c to each encoded hidden state 409a-c respectively, thus generating an adaptively extracted context vector 417” The attention mechanism computes weights that are used by the decoder to produce the context vector for a given time step; at the next time step, another temporal context vector is generated, corresponding to “at least two temporal context vectors.”)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an attention network which utilized encoder input attention for generating encoder context as taught by Song to the disclosed invention of Celikyilmaz.
One of ordinary skill in the art would have been motivated to make this modification in order to implement an encoder/decoder attention network that prioritizes particular input features. Song notes: “By including an input attention mechanism in the encoder stage of the DA-RNN of the present embodiment, the DA-RNN can adaptively prioritize particular relevant input features at each timestamp that have greater effect on accurate prediction, rather than giving the same importance to all of the input features.” (Song ¶0017)
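
For illustration only, the reasoning above (temporal attention weights applied to encoded hidden states yield one context vector per decoding time step, so two time steps yield at least two temporal context vectors) can be sketched as follows. This is a minimal sketch and not Song's implementation: the dot-product score is an assumed stand-in for the learned temporal attention, and the name `temporal_context` and all numeric values are hypothetical.

```python
import math

def temporal_context(encoder_states, decoder_state):
    """Return (weights, context) for one decoding time step."""
    # Assumption: score each encoded hidden state by a dot product with the decoder state.
    scores = [sum(h_j * d_j for h_j, d_j in zip(h, decoder_state)) for h in encoder_states]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of the encoded hidden states.
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context

# Three encoded hidden states and two successive decoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]
contexts = [temporal_context(encoder_states, d)[1] for d in decoder_states]
```

Because the attention weights are recomputed from the decoder state at each step, the two entries of `contexts` differ even though the encoded hidden states are the same.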
Conclusion
Prior art
Yao Qin et al., “A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction”, discusses in greater detail the dual-stage recurrent network discussed above in Song.
Xian Zhou et al. “Predicting Multi-step Citywide Passenger Demands Using Attention-based Neural Networks” discusses an encoder decoder attention model for predicting spatial movement over time. The hidden state of the encoder is jointly trained with a decoder attention model.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached on Monday-Friday 7:30 am – 4:00 pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at telephone number (571)272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
	
/J.R.G./Examiner, Art Unit 2122   

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122