DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments have been fully considered but are moot in light of a new rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 8-9, 12, 15-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen, Duong, et al., "A multi-task deep learning architecture for maritime surveillance using AIS data streams," in view of Shaked et al., US 2017/0300814.
	Regarding claims 1, 8, and 15, Nguyen teaches “a method for trainings of learning models, comprising: receiving current streaming sample data” (pg. 331 right col. ¶1 “Recurrent Neural Networks (RNNs) to develop an automatic system that can process and detect, extract and characterize useful information in AIS data streams for maritime surveillance”); and 
(pg. 332 ¶1 “VRNNs model the historical information (x_{1:t−1}, z_{1:t−1}) by the hidden state of its RNN h_t = h_t(x_{t−1}, z_{t−1}, h_{t−1}).”); and 
“initializing parameters of the current deep learning model as the parameters of the shallow learning model” (previous citation, “The conditional distribution p(x_t|x_{1:t−1}, z_{1:t}) = p(x_t|z_t, h_t), the prior distribution p(z_t|x_{1:t−1}, z_{1:t−1}) = p(z_t|h_t) and the variational posterior distribution q(z_t|x_{1:t}, z_{1:t−1}) = p(z_t|x_t, h_t) are parameterized by fully connected networks.”).
	Nguyen, however, does not explicitly teach that the models are separate. Shaked teaches “a shallow learning model that is a separate machine learning model from the current deep learning model” (see figure 1, which shows the shallow model 106 and the deep learning model 102 as separate models), “and the shallow learning model has a structure that is simpler than the current deep learning model” ([0037] “The wide machine learning model 106 is a wide and shallow model, e.g., a generalized linear model 138, that is configured to process a second set of features (e.g., features 116-122) included in the model input of the wide and deep learning model 102 and to generate a wide model intermediate predicted output”).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nguyen with that of Shaked since a combination of known methods would yield predictable results. As shown in Shaked, it is known in the art to have a simpler model for the purposes of initializing a larger, more complex model. This allows a model to start smaller and gradually become more complex, resulting in better and more efficient learning.
	Note that independent claims 8 and 15 recite substantially the same subject matter as independent claim 1, differing only in embodiment. The differences in embodiment, namely a non-transitory computer-readable medium and a processor and storage medium, would be inherent to any computing system such as that of Nguyen, since they are obvious variations of one another.
	Regarding claims 5, 12, and 19, Nguyen teaches “wherein the parameters of the shallow learning model are used as initialization parameters of the deep learning model with increased quantity of layers” (pg. 332 ¶1 “The conditional distribution p(x_t|x_{1:t−1}, z_{1:t}) = p(x_t|z_t, h_t), the prior distribution p(z_t|x_{1:t−1}, z_{1:t−1}) = p(z_t|h_t) and the variational posterior distribution q(z_t|x_{1:t}, z_{1:t−1}) = p(z_t|x_t, h_t) are parameterized by fully connected networks.”).
Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen, Duong, et al., "A multi-task deep learning architecture for maritime surveillance using AIS data streams," in view of Shaked et al., US 2017/0300814, further in view of Zhao, US 2020/0287814.
Regarding claims 3, 10, and 17, the Nguyen and Shaked references have been addressed above. Neither, however, explicitly teaches offline learning. Zhao teaches “wherein the historical sample data comprises offline sample data” (Zhao [0077] “a process of training the deep neural network model may include an offline training process”), “and wherein the shallow learning model is obtained through offline training based on the offline sample data” ([0077] “The offline training process may include the following. History data accumulated over a long period of time may be used to establish the training dataset and the testing dataset”).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nguyen and Shaked with that of Zhao since a combination of known methods would yield predictable results. As shown in Zhao, using offline and online data to train a network is known in the art and thus using this data for model training would operate in a normal and predictable manner.
Claims 4, 6-7, 11, 13-14, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen, Duong, et al., "A multi-task deep learning architecture for maritime surveillance using AIS data streams," in view of Shaked et al., US 2017/0300814, further in view of Ward et al., US 2020/0035219.
Regarding claims 4, 11, and 18, the Nguyen and Shaked references have been addressed above. Neither, however, explicitly teaches the limitations of these claims. Ward teaches “wherein a trained deep learning model is obtained after training the current deep learning model, and wherein the method comprises: in response to determining that performance of the trained deep learning model is improved compared to performance of the current deep learning model, using the trained (Ward [0109] “The backpropagation is performed through the expert neural network layer inserted into gap 1120 just as if the expert neural network layer was a permanent part of neural network 1100 and adjusts the weights of each of the nodes of the expert neural network layer through training. After backpropagation, the updated expert neural network layer is stored back in the expert knowledge store, overwriting the prior version”); “or in response to determining that performance of the trained deep learning model is not improved compared to performance of the current deep learning model: increasing a quantity of hidden layers of the current deep learning model to obtain a deep learning model with increased quantity of layers” ([0113] “When the counter of training examples exceeds the threshold, one or more new rows are added to the expert knowledge store. Each row includes a selector and an associated expert neural network layer”); 
“obtaining a new deep learning model by training the deep learning model with increased quantity of layers based on the current streaming sample data” ([0109] “The backpropagation is performed through the expert neural network layer inserted into gap 1120 just as if the expert neural network layer was a permanent part of neural network 1100 and adjusts the weights of each of the nodes of the expert neural network layer through training. After backpropagation, the updated expert neural network layer is stored back in the expert knowledge store, overwriting the prior version”); and 
“determining the latest deep learning model based on a result of performance comparison between the new deep learning model and the current deep learning model” ([0109] “After backpropagation, the updated expert neural network layer is stored back in the expert knowledge store, overwriting the prior version. The backpropagation trains the expert neural network layer to become more accurate, for those conditions where it is inserted in the network, and allows it to become specialized for particular use cases”)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nguyen and Shaked with that of Ward since using the training techniques as described in Ward would allow for more optimal training and classification using neural networks.
	Regarding claims 6, 13, and 20, the Nguyen, Shaked, and Ward references have been addressed above. Ward further teaches “wherein determining the latest deep learning model comprises: in response to determining that performance of the new deep learning model is improved compared to the performance of the current deep learning model, using the new deep learning model as the latest deep learning model” ([0109] “The backpropagation is performed through the expert neural network layer inserted into gap 1120 just as if the expert neural network layer was a permanent part of neural network 1100 and adjusts the weights of each of the nodes of the expert neural network layer through training. After backpropagation, the updated expert neural network layer is stored back in the expert knowledge store, overwriting the prior version”);
“or in response to determining that performance of the new deep learning model is not improved compared to the performance of the current deep learning model, using the current deep learning model as the latest deep learning model” ([0109] “If the neural network 1100 is performing inference, then after neural network 1100 produces its output, the expert neural network layer may be deleted from portion 1120 so that portion 1120 is once again empty and ready to be filled in at the next iteration”).
	Regarding claims 7 and 14, the Nguyen, Shaked, and Ward references have been addressed above. Ward further teaches “comprising: weighting the latest deep learning model and the shallow learning ([0043] “The inputs to the node are combined through a linear combination with weights and the activation function is applied to the result to produce the output.”)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Sahoo, Doyen, et al. "Online deep learning: Learning deep neural networks on the fly." arXiv preprint arXiv:1711.03705 (2017).
The Sahoo reference above additionally teaches using a shallow model that gradually progresses into a deep learning model. See pg. 2 left col. ¶1, “We aim to devise an online learning algorithm that is able to start with a shallow network that enjoys fast convergence; then gradually switch to a deeper model (meanwhile sharing certain knowledge with the shallow ones) automatically when more data has been received to learn more complex hypotheses, and effectively improve online predictive performance by adapting the capacity of DNNs.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG, can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Examiner
Art Unit 2124



/Kevin W Figueroa/
Examiner, Art Unit 2124