DETAILED ACTION

This action is made FINAL in response to the amendments filed on 4/11/2022.


Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1, line 7, the limitation “reconstructing the input trained model dynamically expanding the size” is grammatically incorrect. The examiner suggests amending the limitation to -- reconstructing the input trained model by dynamically expanding the size --.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 3, 7, 13, 14 and 15 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Muddu et al (US 2011/0320767).
As to claim 1, Muddu et al teaches a method for re-learning a trained model (paragraph [0005]...training process includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and learning on the batch of incoming streaming data. The steps of reading, retrieving, and learning (which includes updating local Bayesian parameters) are repeated until the measured difference in states exceeds a set threshold level), comprising:
receiving an input trained model (paragraph [0021]...a local copy of an original partial model state is received into each of a plurality of processors) consisting of a plurality of neurons (paragraph [0033]...several nodes can be used in parallel and are contemplated within the scope of the invention. Each node loads a partition of an original model state into its model container. The first node 510 is assigned a partition of the original model state, which is loaded into the model container of the first node 510. Likewise, the second node 520 is assigned a different partition of the original model state, which is loaded into the model container of the second node 520. Each node is assigned a partitioned batch of data impressions or incoming data stream, wherein the data impressions or the incoming data stream is partitioned according to a uniformly distributed hash function) and a data set including a new task (paragraph [0034]...another batch of parsed impressions stream 530 is read, and training of the new batch occurs on the first node 510);
identifying a neuron associated with the new task among the plurality of neurons (paragraph [0034]...another batch of parsed impressions stream 530 is read, and training of the new batch occurs on the first node 510) and selectively (paragraph [0033]...the first node 510 is assigned a partition of the original model state, which is loaded into the model container of the first node 510. Likewise, the second node 520 is assigned a different partition of the original model state, which is loaded into the model container of the second node 520. Each node is assigned a partitioned batch of data impressions or incoming data stream, wherein the data impressions or the incoming data stream is partitioned according to a uniformly distributed hash function) re-learning a parameter (paragraph [0038]... a partial updated model state according to individual attributes present) associated with the new task for the identified neuron, wherein the selectively re-learning only re-learns the neuron associated with the new task (paragraph [0022]...each of the plurality of nodes are trained individually according to a partitioned parsed impressions stream to obtain a plurality of partitioned current model states); and
reconstructing the input trained model (paragraph [0021]...the merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state) dynamically expanding a size of a selectively re-learned trained model on which the selective re-learning is performed to a second size (paragraph [0021]...updated global model state) greater than a first size (paragraph [0021]...partial (local) model state) of the input trained model if a loss (paragraph [0021]...a difference between the original partial model state and its respective current partial model state) of the selectively re-learned trained model exceeds a preset loss value (paragraph [0021]...a difference between the original partial model state and its respective current partial model state is serially determined for each of the plurality of processors according to a divergence function. The determined differences which exceed a threshold level are merged for each of the plurality of processors according to the attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state).
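For illustration only, the threshold-and-merge step quoted from paragraph [0021] can be sketched as below. This is a minimal sketch under stated assumptions, not the reference's implementation: model states are simplified to dictionaries of scalar weights keyed by attribute, the divergence function is assumed to be an absolute difference, and the name `merge_above_threshold` is hypothetical.

```python
from typing import Dict, List


def merge_above_threshold(
    original: Dict[str, float],
    current_states: List[Dict[str, float]],
    threshold: float,
) -> Dict[str, float]:
    """Combine per-processor differences that exceed the threshold with the original state."""
    updated = dict(original)
    for current in current_states:  # processors are handled one after another
        for attr, value in current.items():
            # Assumption: divergence is the absolute difference of scalar weights.
            diff = value - original.get(attr, 0.0)
            if abs(diff) > threshold:
                updated[attr] = updated.get(attr, 0.0) + diff
    return updated


original_state = {"a": 1.0, "b": 2.0}
partial_states = [{"a": 1.5, "b": 2.01}, {"a": 1.0, "c": 0.7}]
global_state = merge_above_threshold(original_state, partial_states, threshold=0.1)
```

Differences at or below the threshold are treated as negligible and leave the corresponding entry of the original state unchanged.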

As to claim 3, Muddu et al teaches the method, wherein in the selective (paragraph [0033]...the first node 510 is assigned a partition of the original model state, which is loaded into the model container of the first node 510. Likewise, the second node 520 is assigned a different partition of the original model state, which is loaded into the model container of the second node 520. Each node is assigned a partitioned batch of data impressions or incoming data stream, wherein the data impressions or the incoming data stream is partitioned according to a uniformly distributed hash function) re-learning, a new parameter matrix (paragraph [0034]...current model state 560 for each attribute 550) is calculated using the data set for a network parameter (paragraph [0034]...calculated according to a divergence function) consisting of only the identified neuron (paragraph [0034]...first node 510), and the calculated new parameter matrix is reflected to the identified neuron of the trained model to perform the selective re-learning (paragraph [0034]...When the calculated difference is below a set threshold level, it is assumed that the difference between the partitioned original model state and the partitioned current model state 560 for that particular attribute 550 is negligible and the partitioned current model state 560 is essentially unchanged from the partitioned original model state. At this point, another batch of parsed impressions stream 530 is read, and training of the new batch occurs on the first node 510. A difference is then calculated between the partitioned original model state and the second partition of a current model state 560 for each attribute 550. If this calculated difference is still below the set threshold level, then the process is repeated until a difference between the partitioned original model state and the partitioned current model state 560 is above the set threshold level.
When the difference between the partitioned original model state and the partitioned current model state 560 exceeds the set threshold level, then the calculated change in state for the first node 510 is applied to the partition of the first node 510. This same process is independently run on the second node 520 to obtain a partitioned current model state 570 for each attribute 550 until the difference between the partitioned original model state and the partitioned current model state 570 of the second node 520 is above the set threshold level. The calculated change in state above the set threshold level for the second node 520 is applied to the partition of the second node 520).
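For illustration only, the per-node loop quoted from paragraph [0034] (read a batch, train, compare against the original state, repeat until the difference exceeds the threshold) can be sketched as below. This is a minimal sketch under stated assumptions: the model state is reduced to a single scalar, the divergence is assumed to be an absolute difference, and `train_until_diverged` and `learn` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def train_until_diverged(
    original: float,
    batches: Iterable[List[float]],
    learn: Callable[[float, List[float]], float],
    threshold: float,
) -> Tuple[float, int]:
    """Train batch by batch until the state differs from the original by more than threshold."""
    current = original
    batches_read = 0
    for batch in batches:
        current = learn(current, batch)  # training on the newly read batch
        batches_read += 1
        # Assumption: divergence is the absolute difference of scalar states.
        if abs(current - original) > threshold:
            break  # the change in state is now large enough to be applied
    return current, batches_read


# Hypothetical learning rule used only to exercise the loop.
def step(state: float, batch: List[float]) -> float:
    return state + 0.25 * sum(batch)


final_state, n_batches = train_until_diverged(0.0, [[1.0], [1.0], [1.0]], step, threshold=0.3)
```

In this run the first batch moves the state to 0.25, which is still below the threshold, so a second batch is read before the loop stops.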

Claim 7 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 13 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

As to claim 14, Muddu et al teaches the method, further comprising: limiting the size of the input trained model (paragraph [0021]...a local copy of an original partial model state is received into each of a plurality of processors) based on a cumulative knowledge accounting of the new task (paragraph [0001]...online machine learning algorithms are a class of algorithms which make decisions using historical data up to the present moment. Online machine learning algorithms are also known as streaming algorithms. Incremental training is then applied by each machine to learn one instance at a time. As new data becomes available, the algorithms do not require retraining on all data, since they continue to incrementally improve an existing model. Online algorithms have recently achieved improved efficiency over batch algorithms).

As to claim 15, Muddu et al teaches a method of learning a new concept using already-learned knowledge (paragraph [0001]...online machine learning algorithms are a class of algorithms which make decisions using historical data up to the present moment. Online machine learning algorithms are also known as streaming algorithms. Incremental training is then applied by each machine to learn one instance at a time. As new data becomes available, the algorithms do not require retraining on all data, since they continue to incrementally improve an existing model. Online algorithms have recently achieved improved efficiency over batch algorithms), the method comprising:
training a model to provide a first trained model, the first trained model then being the already-learned knowledge (paragraph [0022]...each of the plurality of nodes are trained individually according to a partitioned parsed impressions stream to obtain a plurality of partitioned current model states);
classifying first data using the first trained model to provide a first classification (paragraph [0021]...a model state is partitioned into partial models according to a partitioning scheme. A local copy of an original partial model state is received into each of a plurality of processors. An incoming data stream is partitioned between the plurality of processors according to their partial distribution model. Each of the plurality of processors serially processes the partitioned incoming streaming data according to attributes to achieve a plurality of current partial model states);
re-learning the first trained model using the method of claim 1 to provide a second trained model (paragraph [0038]...a partial updated model state according to individual attributes present); and
classifying second data using the second trained model to provide a second classification, wherein the second classification corresponds to the new concept (paragraph [0021]...a model state is partitioned into partial models according to a partitioning scheme. A local copy of an original partial model state is received into each of a plurality of processors. An incoming data stream is partitioned between the plurality of processors according to their partial distribution model. Each of the plurality of processors serially processes the partitioned incoming streaming data according to attributes to achieve a plurality of current partial model states).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 2 – 5 and 8 - 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Muddu et al (US 2011/0320767) in view of Li et al (US 2018/0046914).
As to claim 2, Muddu et al teaches a method, with selectively (paragraph [0033]...the first node 510 is assigned a partition of the original model state, which is loaded into the model container of the first node 510. Likewise, the second node 520 is assigned a different partition of the original model state, which is loaded into the model container of the second node 520. Each node is assigned a partitioned batch of data impressions or incoming data stream, wherein the data impressions or the incoming data stream is partitioned according to a uniformly distributed hash function) re-learning a parameter (paragraph [0038]... a partial updated model state according to individual attributes present).
Muddu et al fails to explicitly show/teach wherein in the selective re-learning, a new parameter matrix is computed to minimize an objective function having a loss function for the input trained model and a regularization term for sparsity, and the neuron associated with the new task is identified using the computed new parameter matrix.
However, Li et al teaches selective re-learning (paragraph [0014]...retraining the sparse network to learn the final weights), a new parameter matrix (paragraph [0052]...determine an initial compression ratio for each of said plurality of matrices; compression step, for compressing the plurality of submatrices of respective matrix according to its corresponding initial compression ratio, so as to obtain a compressed neural network; fine-tuning step, for fine-tuning said compressed neural network, so as to obtain a final neural network) is computed to minimize an objective function having a loss function (paragraph [0142]...neural network training is a process for optimizing loss function. Loss function refers to the difference between the ideal result and the actual result of a neural network model under predetermined input. It is therefore desirable to minimize the value of) for the input trained model (paragraph [0166]...initial network model) and a regularization term for sparsity (paragraph [0053]...pruning step for pruning the submatrices into sparse submatrices), and the neuron associated with the new task is identified using the computed new parameter matrix (paragraph [0051]... a method for compressing a neural network is proposed, wherein the connection relations between the neurons of the neural network are characterized by a plurality of matrices. The method comprises: dividing step, for dividing at least one of said plurality of matrices into a plurality of submatrices; compression step, for compressing the submatrices into sparse submatrices; and encoding step, for encoding the compressed sparse submatrices).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Muddu et al such that, in the selective re-learning, a new parameter matrix is computed to minimize an objective function having a loss function for the input trained model and a regularization term for sparsity, and the neuron associated with the new task is identified using the computed new parameter matrix, as in Li et al, for the purpose of providing an efficient way to improve utilization of resources of the hardware platform.
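For illustration only, an objective function of the kind recited in claim 2 (a loss function plus a regularization term for sparsity) can be written in one common form as below. Neither the claim language nor the cited passages specify the norm; the L1 norm, the parameter matrix W, the data set D, and the weighting factor lambda are assumptions chosen for this sketch.

```latex
\[
  \min_{W} \; \mathcal{L}(W; \mathcal{D}) + \lambda \, \lVert W \rVert_{1}
\]
```

Under this assumed form, entries of W driven to zero by the penalty drop out, and the remaining nonzero entries indicate which connections are retained.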

As to claim 3, modified Muddu et al teaches a method, wherein in the selective (paragraph [0033]...the first node 510 is assigned a partition of the original model state, which is loaded into the model container of the first node 510. Likewise, the second node 520 is assigned a different partition of the original model state, which is loaded into the model container of the second node 520. Each node is assigned a partitioned batch of data impressions or incoming data stream, wherein the data impressions or the incoming data stream is partitioned according to a uniformly distributed hash function) re-learning, a new parameter matrix is calculated (paragraph [0171]...input parameters related to learning rate modification and training termination includes: start_halving_impr, end_halving_impr, halving_factor, etc. After each iteration, calculating the improvement (referred to as real_impr) based on (loss_prev-loss)/loss_prev, wherein real_impr refers to the relative improvement of the loss of the present iteration compared to that of the previous iteration) using the data set for a network parameter consisting of only the identified neuron, and the calculated new parameter matrix is reflected to the identified neuron of the trained model to perform the selective re-learning (Li et al paragraph [0052]...determine an initial compression ratio for each of said plurality of matrices; compression step, for compressing the plurality of submatrices of respective matrix according to its corresponding initial compression ratio, so as to obtain a compressed neural network; fine-tuning step, for fine-tuning said compressed neural network, so as to obtain a final neural network).
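For illustration only, the relative improvement quoted from paragraph [0171], (loss_prev-loss)/loss_prev, can be computed as below. The names `loss_prev`, `loss`, and `real_impr` come from the quoted passage; the function name and the zero check are assumptions added for this sketch, and the halving parameters named in the passage are not modeled because their use is not described in the quoted text.

```python
def relative_improvement(loss_prev: float, loss: float) -> float:
    """Return (loss_prev - loss) / loss_prev for one training iteration."""
    if loss_prev == 0:
        # Assumption: a zero previous loss is rejected rather than dividing by zero.
        raise ValueError("loss_prev must be non-zero")
    return (loss_prev - loss) / loss_prev


real_impr = relative_improvement(loss_prev=2.0, loss=1.5)
```

A positive value means the loss decreased relative to the previous iteration; a negative value means it increased.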

As to claim 4, Muddu et al teaches the method, wherein in the reconstructing of the input trained model (paragraph [0021]...the merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state), when the loss exceeds the preset loss value (paragraph [0021]...a difference between the original partial model state and its respective current partial model state is serially determined for each of the plurality of processors according to a divergence function. The determined differences which exceed a threshold level are merged for each of the plurality of processors according to the attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state), a fixed number of neurons for each layer is added to the selectively re-learned trained model (paragraph [0025]...the local cache data for each node is deleted and each node is updated with a new partitioned local model state, which is loaded into their respective model containers) and group sparsity is used to eliminate unnecessary neurons from the added neurons, thereby reconstructing the input trained model (Li et al paragraph [0012]...the compression method comprises learning, pruning, and training the neural network. In the first step, it learns which connection is important by training connectivity. The second step is to prune the low-weight connections. In the third step, it retrains the neural networks by fine-tuning the weights of neural network. In recent years, studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero). Thus, low-weight connections are pruned, converting a dense network into a sparse network).

As to claim 5, Muddu et al teaches the method, wherein in the reconstructing of the input trained model (paragraph [0166]...initial network model), an unnecessary neuron is identified from the added neurons using an objective function having a loss function (paragraph [0142]...neural network training is a process for optimizing loss function. Loss function refers to the difference between the ideal result and the actual result of a neural network model under predetermined input. It is therefore desirable to minimize the value of) for the input trained model, a regularization term for sparsity, and a group regularization term for group sparsity (Li et al paragraph [0012]...the compression method comprises learning, pruning, and training the neural network. In the first step, it learns which connection is important by training connectivity. The second step is to prune the low-weight connections. In the third step, it retrains the neural networks by fine-tuning the weights of neural network. In recent years, studies show that in the matrix of a trained neural network model, elements with larger weights represent important connections, while other elements with smaller weights have relatively small impact and can be removed (e.g., set to zero). Thus, low-weight connections are pruned, converting a dense network into a sparse network).
Therefore, it would have been obvious that, in the reconstructing of the input trained model, an unnecessary neuron is identified from the added neurons using an objective function having a loss function for the input trained model, a regularization term for sparsity, and a group regularization term for group sparsity, for the same reasons as above.
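For illustration only, the pruning step quoted from Li et al paragraph [0012] (elements with smaller weights are removed, e.g., set to zero) can be sketched as below. This is a minimal sketch under stated assumptions: the weight matrix is a plain list of rows, the cutoff is a fixed magnitude, and the name `prune_low_weights` is hypothetical. It shows element-wise pruning only and does not model the group sparsity recited in the claim.

```python
from typing import List


def prune_low_weights(matrix: List[List[float]], threshold: float) -> List[List[float]]:
    """Set every weight whose magnitude is below threshold to zero."""
    return [
        [w if abs(w) >= threshold else 0.0 for w in row]  # keep only important connections
        for row in matrix
    ]


dense = [[0.9, -0.05], [0.02, -0.7]]
sparse = prune_low_weights(dense, threshold=0.1)
```

The input matrix is left unmodified; a new matrix with the low-weight entries zeroed is returned.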

Claim 8 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 9 has similar limitations as claim 3. Therefore, the claim is rejected for the same reasons as above. 

Claim 10 has similar limitations as claim 4. Therefore, the claim is rejected for the same reasons as above. 

Claim 11 has similar limitations as claim 5. Therefore, the claim is rejected for the same reasons as above. 

Claim(s) 6 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Muddu et al (US 2011/0320767) in view of Reece (US 10,360,581).
As to claim 6, Muddu et al teaches the method, wherein in the reconstructing of the input trained model (paragraph [0021]...the merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state).
Muddu et al fails to explicitly show/teach the reconstructing of the input trained model, if a change in the identified neuron has a preset value, the identified neuron is duplicated to expand the input trained model, and the identified neuron has an existing value to reconstruct the input trained model.
However, Reece teaches the reconstructing of the input trained model, if a change in the identified neuron has a preset value, the identified neuron is duplicated to expand the input trained model, and the identified neuron has an existing value to reconstruct the input trained model (column 18, lines 10 – 65...matching a feature of the feature tree with the degraded feature; automatically updating the model, at the discovery system, comprising: discovering nodes of increased specificity over the matching feature's node by discovering nodes from the feature tree which are farther from the root node of the feature tree than the matching feature's node; and adding a feature of a node selected from the discovered nodes, to the model; and deleting the degraded feature from the model).
Therefore, it would have been obvious that, in the reconstructing of the input trained model, if a change in the identified neuron has a preset value, the identified neuron is duplicated to expand the input trained model, and the identified neuron has an existing value to reconstruct the input trained model, as in Reece, for the same reasons as above.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1 - 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRANDON S COLE/           Primary Examiner, Art Unit 2128