Detailed Action
This action is in response to Applicant's communications filed 28 October 2021.
Claims 1, 8-9, and 11-14 were amended.  No claims were cancelled, withdrawn, or added.  Therefore, claims 1-6 and 8-15 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
Applicant's arguments, filed 28 October 2021, regarding the rejections of claims 1-6 and 8-15 under 35 USC 103 have been fully considered but are not persuasive.
Applicant argues (Remarks, p. 7) that Pan is a "survey" of transfer learning that discusses a variety of different learning techniques and applications, and that it cannot be applied as prior art because it does not teach the first neural model component that is trained for a plurality of systems and the second neural model component that is trained for specific systems of the joint neural model.  Applicant further argues that Pan, in section 3.3 on p. 1351, states that transfer learning aims only at boosting the performance of the target domain by utilizing the source domain data, which is in contrast to learning both source and target tasks simultaneously.  Examiner notes that this is not a contradiction.  In the same paragraph, Pan discusses the inductive transfer learning setting and that most approaches described in that section are designed to work under multitask learning 
Applicant argues (Remarks, p. 8) that the prior art is silent as to the amended claim language of "a joint neural model."  However, Pan discusses several examples of learning a joint neural model that teach the first neural model component and the second neural model component.  For example, in section 3.3 on p. 1351, Pan discusses learning the generalized terms and task-specific terms that teach the first and second neural model components:  "The proposed method assumed that the parameter, w, in SVMs for each task can be separated into two terms.  One is a common term over tasks and the other is a task-specific term.  In inductive transfer learning, ws = w0 + vs and wt = w0 + vt, where ws and wt are parameters of the SVMs for the source task and the target learning task, respectively.  w0 is a common parameter while vs and vt are specific parameters for the source task and the target task, respectively."  Thus, Pan teaches the limitations of the amended claims.
The rejection of the dependent claims for depending from rejected claims is maintained.
For the aforementioned reasons, claims 1-6 and 8-15 are rejected under 35 USC 103.
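For illustration only, the shared-plus-specific parameter split quoted from Pan sec. 3.3 above (ws = w0 + vs and wt = w0 + vt) can be sketched in a few lines of Python.  The sketch is not taken from Pan or from the record: the function names and all numeric values are hypothetical, and a plain linear decision value stands in for a trained SVM.

```python
# Illustrative sketch only; names and numbers are hypothetical, not from Pan.

def task_parameters(w0, v_task):
    """Return w = w0 + v_task, the parameter vector used for one task."""
    if len(w0) != len(v_task):
        raise ValueError("shared and task-specific terms must have equal length")
    return [shared + specific for shared, specific in zip(w0, v_task)]


def decision_value(w, x):
    """Linear decision value <w, x> (bias term omitted for brevity)."""
    return sum(wi * xi for wi, xi in zip(w, x))


w0 = [0.5, -1.0, 2.0]     # common term, shared by source and target tasks
v_s = [0.25, 0.0, -0.5]   # source-task-specific term
v_t = [-0.25, 0.5, 0.0]   # target-task-specific term

w_s = task_parameters(w0, v_s)   # parameters for the source task
w_t = task_parameters(w0, v_t)   # parameters for the target task
```

In this reading, w0 plays the role of the component trained on properties shared across systems, while vs and vt play the role of the component trained on properties that vary between systems.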

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly 

Claim(s) 1-5, 8-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pan et al. (A Survey on Transfer Learning, hereinafter "Pan") in view of Jaeger (US 2004/0015459).

Regarding Claim 1,
Pan teaches a method for controlling a target system on the basis of operational data of a plurality of source systems, comprising:
a) receiving operational data of the plurality of source systems ("collected labeled or unlabeled training data" sec. 2.1, p. 1346), the operational data being distinguished by a source system specific identifier for each respective source system of the plurality of source systems ("TrAdaBoost assumes that, due to the difference in distributions between the source and the target domains, some of the source domain data may be useful in learning for the target domain but some of them may not and could even be harmful. It attempts to iteratively reweight the source domain data to reduce the effect of the “bad” source data while encourage the “good” source data to contribute more for the target domain. For each round of iteration, TrAdaBoost trains the base classifier on the weighted source and target data. The error is only calculated on the target data." sec. 3.1, p. 1350; weighting the source teaches a source specific identifier),
("The proposed method assumed that the parameter, w, in SVMs for each task can be separated into two terms.  One is a common term over tasks and the other is a task-specific term.  In inductive transfer learning, ws = w0 + vs and wt = w0 + vt, where ws and wt are parameters of the SVMs for the source task and the target learning task, respectively.  w0 is a common parameter while vs and vt are specific parameters for the source task and the target task, respectively." sec. 3.3, p. 1351) on the basis of the received operational data of the plurality of source systems ("Traditional data mining and machine learning algorithms make predictions on the future data using statistical models that are trained on previously collected labeled or unlabeled training data" sec. 2.1, p. 1346)
taking into account a respective source system specific identifier for each respective source system ("TrAdaBoost assumes that, due to the difference in distributions between the source and the target domains, some of the source domain data may be useful in learning for the target domain but some of them may not and could even be harmful. It attempts to iteratively reweight the source domain data to reduce the effect of the “bad” source data while encourage the “good” source data to contribute more for the target domain. For each round of iteration, TrAdaBoost trains the base classifier on the weighted source and target data. The error is only calculated on the target data." sec. 3.1, p. 1350; the weight of each source teaches taking into account the respective source system specific identifier), 
("multitask learning tried to learn both the source and target tasks simultaneously and perfectly." sec. 3.3, p. 1351; "The proposed method assumed that the parameter, w, in SVMs for each task can be separated into two terms.  One is a common term over tasks and the other is a task-specific term.  In inductive transfer learning, ws = w0 + vs and wt = w0 + vt, where ws and wt are parameters of the SVMs for the source task and the target learning task, respectively.  w0 is a common parameter while vs and vt are specific parameters for the source task and the target task, respectively." sec. 3.3, p. 1351; "A third case can be referred to as parameter-transfer approach [45], [46], [47], [48], [49], which assumes that the source tasks and the target tasks share some parameters or prior distributions of the hyperparameters of the models. The transferred knowledge is encoded into the shared parameters or priors. Thus, by discovering the shared parameters or priors, knowledge can be transferred across tasks." p. 1348) and 
a second neural model component of the joint neural model is trained on properties varying between each respective source system of the plurality of source systems ("The proposed method assumed that the parameter, w, in SVMs for each task can be separated into two terms.  One is a common term over tasks and the other is a task-specific term.  In inductive transfer learning, ws = w0 + vs and wt = w0 + vt, where ws and wt are parameters of the SVMs for the source task and the target learning task, respectively.  w0 is a common parameter while vs and vt are specific parameters for the source task and the target task, respectively." sec. 3.3, p. 1351; "“What to transfer” asks which part of knowledge can be transferred across domains or tasks. Some knowledge is specific for individual domains or tasks, and some knowledge may be common between different domains such that they may help improve performance for the target domain or task. After discovering which knowledge can be transferred, learning algorithms need to be developed to transfer the knowledge, which corresponds to the “how to transfer” issue." sec. 2.3, p. 1347),
c) receiving operational data of the target system ("Given a source domain DS with a learning task TS, a target domain DT and a corresponding learning task TT, unsupervised transfer learning aims to help improve the learning of the target predictive function" sec. 5, p. 1354),
d) further training both the first neural model component ("The basic idea is to learn a low-dimensional representation that is shared across related tasks." sec. 3.2.1, p. 1350) and the second neural model component of the trained joint neural model on the basis of the operational data of the target system ("an unsupervised feature construction method, for learning higher level features for transfer learning. The basic idea of this approach consists of two steps. In the first step, higher level basis vectors b = {b1, b2, ..., bs} are learned on the source domain data... After learning the basis vectors b, in the second step, an optimization algorithm (3) is applied on the target-domain data to learn higher level features based on the basis vectors b." sec. 3.2.2, p. 1350), where a further training of the second neural model component is given preference over a further training of the first neural model component ("in transfer learning, weights in the loss functions for different domains can be different. Intuitively, we may assign a larger weight to the loss function of the target domain to make sure that we can achieve better performance in the target domain." sec. 3.3, p. 1351), and 

Pan does not explicitly teach e) controlling the target system by the further trained joint neural network.  
Jaeger teaches e) controlling the target system by the further trained joint neural network ("The approach to obtain such a controller through training of a recurrent neural network is novel and represents a dependent claim of the invention. More specifically, the claim is a method to obtain closed-loop tracking controllers by training of a recurrent neural network according to the method of the invention." [0103]).  
Pan and Jaeger are analogous art because both are directed to increasing the efficiency of machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the transfer learning methods of Pan with the training method for RNNs of Jaeger.  The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce the overall cost of setting up and training a neural network, as suggested by Jaeger ("the overall cost of using an RNN set up and trained according to the present invention is greatly reduced in cases where many different tasks have to be carried out on the same input data." [0014]).
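For illustration only, the iterative reweighting of source data quoted from Pan sec. 3.1 in the mapping of claim 1 above can be sketched as a single simplified update.  This is not the full TrAdaBoost procedure described in Pan: the fixed shrink factor beta, the function name, and the sample values are assumptions made for the example.

```python
# Illustrative sketch only; a simplified reweighting step, not full TrAdaBoost.

def reweight_source(weights, misclassified, beta=0.5):
    """Shrink the weight of each misclassified source instance, then normalise.

    weights       -- current weight of each source instance
    misclassified -- True where the base classifier got that instance wrong
    beta          -- hypothetical shrink factor, strictly between 0 and 1
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    if len(weights) != len(misclassified):
        raise ValueError("one misclassification flag is needed per weight")
    updated = [w * beta if bad else w for w, bad in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated]


source_weights = [0.25, 0.25, 0.25, 0.25]
wrong = [False, True, False, True]      # hypothetical classifier mistakes
new_weights = reweight_source(source_weights, wrong)
```

After the update, the source instances flagged as misclassified carry half the weight of the others, so per-source weights act as a handle that distinguishes one source's data from another's in later training rounds.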

Regarding Claim 2,
(“A single instantiation of the "reservoir" network can be reused for many tasks, by adding new output units and separately teaching their respective hidden-to-output weights for each task.” [0014], where the hidden-to-output weights are the second adaptive weights associated with the second (varying, smaller) model components; “By contrast, the method disclosed in the present invention utilizes a large recurrent network, whose internal weights (i.e. on hidden-to-hidden, input-to-hidden, or output-to-hidden connections) are not changed at all.” [0013], where the remainder of the weights of the network are associated with the first (larger, invariant after initial training) model component).
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 3,
The Pan/Jaeger combination teaches the method of claim 2.  Jaeger further teaches wherein the number of the first adaptive weights is several times greater than the number of the second adaptive weights (“Preferably, the DR is large, i.e. has in the order of 50 or more (no upper limit) units.” [0019]; where it is understood that the reservoir / first component contains several types of weights, and functions as a large overcomplete basis. In comparison, the second weights are not large, and being hidden-to-output weights are proportional to the number of outputs. Fig. 1 also illustrates an example having a much larger reservoir relative to the number of output units.).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  
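For illustration only, the arrangement relied on for claims 2 and 3 (a large reservoir whose internal weights stay fixed, with only the hidden-to-output weights adjusted) can be sketched as follows.  This is not Jaeger's implementation: the reservoir size, random seed, tanh update, learning rate, and the simple delta rule used to fit the readout are all assumptions swapped in for the example.

```python
# Illustrative sketch only; sizes, seed and the delta-rule readout are assumptions.
import copy
import math
import random


def make_reservoir(n_units, seed=0, scale=0.1):
    """Create fixed random hidden-to-hidden and input-to-hidden weights."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-scale, scale) for _ in range(n_units)]
                for _ in range(n_units)]
    w_input = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    return w_hidden, w_input


def step(state, u, w_hidden, w_input):
    """One reservoir update: x(t+1) = tanh(W x(t) + w_in * u(t))."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + w_in * u)
            for row, w_in in zip(w_hidden, w_input)]


def train_readout(inputs, targets, w_hidden, w_input, lr=0.02, epochs=50):
    """Adjust only the hidden-to-output weights; the reservoir is read, never written."""
    n_units = len(w_input)
    w_out = [0.0] * n_units
    for _ in range(epochs):
        state = [0.0] * n_units
        for u, y in zip(inputs, targets):
            state = step(state, u, w_hidden, w_input)
            error = y - sum(w * s for w, s in zip(w_out, state))
            w_out = [w + lr * error * s for w, s in zip(w_out, state)]
    return w_out


n_units = 20
w_hidden, w_input = make_reservoir(n_units)
frozen = copy.deepcopy((w_hidden, w_input))      # snapshot before training

inputs = [math.sin(0.3 * t) for t in range(1, 31)]
targets = [0.5 * u for u in inputs]              # hypothetical teacher signal
w_out = train_readout(inputs, targets, w_hidden, w_input)

n_fixed = n_units * n_units + n_units            # reservoir weights, untouched
n_trained = len(w_out)                           # readout weights, adjusted
```

With 20 reservoir units and a single output, the fixed weights outnumber the trained readout weights by 420 to 20, which is the kind of size relationship the rejection of claim 3 points to.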

Regarding Claim 4,
The Pan/Jaeger combination teaches the method of claim 2.  Jaeger further teaches wherein the first adaptive weights comprise a first weight matrix and the second adaptive weights comprise a second weight matrix ("FIG. 1 provides an overview of a preferred embodiment of the invention, with extra input and output units. In this figure, the DR [1] is receiving input by means of extra input units [2] which feed input into the DR through input-to-DR connections [4]. Output is read out of the network by means of extra output units [3], which in the example of FIG. 1 also have output-to-DR feedback connections [7]. Input-to-DR connections [4] and output-to-DR feedback connections [7] are fixed and not changed by training. Finally, there are DR-to-output connections [5] and [possibly, but not necessarily] input-to-output connections [6]. The weights of these connections [5], [6] are adjusted during training." [0032], where the DR-to-output weights [5] are elements of the second weight matrix; the second weight matrix may be a vector, or may be a matrix due to e.g. spatial structure [0021]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 5,
The Pan/Jaeger combination teaches the method of claim 4.  Jaeger further teaches wherein for determining adaptive weights of the neural model the first weight matrix is multiplied by the second weight matrix ([0037]-[0038], where the training of the second weight matrix involves taking an inner product with the DR).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 8,
The Pan/Jaeger combination teaches the method of claim 2.  Jaeger further teaches wherein when further training the trained joint neural model a first subset of the first adaptive weights is substantially kept constant while a second subset of the first adaptive weights is further trained (“Only the hidden-to-output connection weights are adjusted in the teaching process. By this adjustment, the hidden-to-output connections acquire the functionality of a filter which distills and re-combines from the "reservoir" dynamical patterns in a way that realizes the desired learning objective.” [0013]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 9,
The Pan/Jaeger combination teaches the method of claim 1.  Jaeger further teaches wherein the joint neural model is a reinforcement learning model (“The training sequences u(t), y(t) are presented to the network for t=1,2, . . . ,N. At every time step, the DR is updated (according to the chosen update law, e.g., Equation (1)), and the activations of the output units are set to the teacher signal y(t) (teacher forcing).” [0034]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 10,
The Pan/Jaeger combination teaches the method of claim 1.  Jaeger further teaches wherein the neural network operates as a recurrent neural network ("recurrent neural network" [0018]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 11,
The Pan/Jaeger combination teaches the method of claim 1.  Jaeger further teaches wherein during training of the joint neural model determining whether the joint neural model reflects a distinction between the properties shared by the plurality of source systems and the properties varying between the respective source systems of the plurality of source systems, and affecting the training of the joint neural model in dependence of that determination (“This example is an instance of another dependent claim of the invention, namely, to use the method of the invention to train an RNN on the dynamic relationship between several signals. More specifically, the claim is (1) to present training data ... to n extra units of a DR architecture according to the invention, where these extra units have feedback connections to the DR, (2) train the network such that the mean square error from Eq. (4) is minimized, and then (3) exploit the network in any "direction" by arbitrarily declaring some of the units as input units and the remaining ones as output units.” [0106], see also [0104]-[0114]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 12,
The Pan/Jaeger combination teaches the method of claim 1.  Jaeger further teaches wherein policies resulting from the trained joint neural model are run in a closed learning loop with the technical target system ("The approach to obtain such a controller through training of a recurrent neural network is novel and represents a dependent claim of the invention. More specifically, the claim is a method to obtain closed-loop tracking controllers by training of a recurrent neural network according to the method of the invention" [0103]).  
The motivation to combine Pan and Jaeger is the same as the motivation for claim 1.  

Regarding Claim 15,
The Pan/Jaeger combination teaches the method of claim 1.  Pan further teaches wherein the plurality of source systems are systems similar to the target system ("some relationship among the data in the source and target domains is similar.  Thus, the knowledge to be transferred is the relationship among the data." sec. 2.3, p. 1349).

Regarding Claim(s) 13,
Claim(s) 13 recite(s) a controller for performing a method corresponding to the method steps recited in claim(s) 1.  The Pan/Jaeger combination teaches the limitations of claim(s) 13 as set forth above in connection with claim(s) 1.  Therefore, claim(s) 13 is/are rejected under the same rationale as respective claim(s) 1.

Regarding Claim(s) 14,
Claim(s) 14 recite(s) a computer program product (Jaeger: "computer", sec. 3.6, p. 133) executing a method corresponding to the method steps recited in claim(s) 1, respectively.  The Pan/Jaeger combination teaches the limitations of claim(s) 14 as set forth above in connection with claim(s) 1.  Therefore, claim(s) 14 is/are rejected under the same rationale as respective claim(s) 1.

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pan et al. (A Survey on Transfer Learning, hereinafter "Pan") in view of Jaeger (US 2004/0015459), and further in view of Habtom et al. (Estimation of Unmeasured Inputs Using Recurrent Neural Networks and the Extended Kalman Filter, hereinafter "Habtom").

Regarding Claim 6,

Habtom teaches wherein the second weight matrix is a diagonal matrix (“The DRNN can be represented mathematically as follows, [by eqs (1) – (3)] where x(k) and xy(k) specify the output vector of the hidden units and the output vector of the network at time k respectively; Wd represents a diagonal weight matrix connecting the vector x(k) back to the inputs of the hidden units.” page 2068 column 1 paragraph 3; The DRNN [diagonal recurrent neural network] is a single hidden layer RNN, where each neuron manages its own history, and a diagonal weight matrix provides the output of the hidden layer at the prior time step for feedback and history tracking. Here the DRNN is used by Habtom to adapt the outputs of a more general control RNN to a specific use.).  
Pan and Habtom are analogous art because both are directed to training machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the RNN training methods of the Pan/Jaeger combination with the diagonal matrices of Habtom.  The modification would have been obvious because one of ordinary skill in the art would be motivated to employ a structure that has better performance, as suggested by Habtom ("The neural network having such a structure is designated as a diagonal recurrent neural network (DRNN) in [9].  This network... has shown a better performance as compared to an RBF and an MLP both with TDL." sec. 2, p. 2068).
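For illustration only, the effect of a diagonal recurrent weight matrix such as the Wd cited from Habtom can be sketched as follows: each hidden unit receives feedback only from its own previous output, so the matrix product reduces to an elementwise product.  Habtom's equations (1)-(3) are not reproduced in this action, so the tanh activation, the function names, and all numeric values below are assumptions.

```python
# Illustrative sketch only; activation, names and values are assumptions.
import math


def drnn_step(x_prev, u, w_in, w_diag):
    """Diagonal recurrence: x_i(k) = tanh(w_in_i * u(k) + w_diag_i * x_i(k-1))."""
    return [math.tanh(wi * u + wd * xp)
            for wi, wd, xp in zip(w_in, w_diag, x_prev)]


def full_matrix_step(x_prev, u, w_in, w_rec):
    """General recurrence with a full hidden-to-hidden matrix, for comparison."""
    return [math.tanh(wi * u + sum(w * xp for w, xp in zip(row, x_prev)))
            for wi, row in zip(w_in, w_rec)]


def as_full_matrix(w_diag):
    """Expand the diagonal entries into a square matrix with zeros elsewhere."""
    return [[wd if i == j else 0.0 for j in range(len(w_diag))]
            for i, wd in enumerate(w_diag)]


w_in = [0.5, -0.3, 0.8]      # input-to-hidden weights (hypothetical)
w_diag = [0.9, 0.1, -0.4]    # one self-feedback weight per hidden unit
x_prev = [0.2, -0.1, 0.05]   # hidden outputs at the previous time step
u = 1.0                      # current scalar input

x_diag = drnn_step(x_prev, u, w_in, w_diag)
x_full = full_matrix_step(x_prev, u, w_in, as_full_matrix(w_diag))
```

Both steps produce the same hidden outputs, while the diagonal form stores three recurrent weights instead of nine.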

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/CHARLES C KUO/ Examiner, Art Unit 2126
/ANN J LO/ Supervisory Patent Examiner, Art Unit 2126