DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/24/2021 has been entered.

Response to Arguments
Applicant’s arguments, see pages 12-13 of Applicant’s reply, filed 05/24/2021, with respect to the rejection(s) of claim(s) 1, 2, 4, 6, 8-10, 20-24, 26, 28, 30-32, and 38-40 under 35 U.S.C. 103 as being unpatentable over Choi, U.S. Patent Application Publication No. 2016/0155049 in view of Stanley and Miikkulainen, “Efficient Evolution of Neural Network Topologies” (hereinafter Stanley) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Sharma and Chandra, “Constructive Neural Networks: A Review”.

Allowable Subject Matter
Claims 13-16 and 88-91 are allowed.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 2, 4, 6, 8-10, 20-24, 26, 28, 30-32, 38-40, and 92 is/are rejected under 35 U.S.C. 103 as being unpatentable over Choi, U.S. Patent Application Publication No. 2016/0155049 in view of Stanley and Miikkulainen, “Efficient Evolution of Neural Network Topologies” (hereinafter Stanley) and, further, in view of Sharma and Chandra, “Constructive Neural Networks: A Review” (hereinafter Sharma).
Regarding claims 1 and 23, taking claim 23 as exemplary, Choi teaches a computer system for improving a deep neural network, the computer system comprising: 
a first set of one or more processors [Paragraph 83] for training a base deep neural network with training data to a desired performance criterion [The initial neural network is trained and performance determined. Paragraphs 82; 125-126; FIGS. 10, 11A, and 11B], wherein: 
the base deep neural network comprises an input layer, an output layer, a plurality of nodes, and a first hidden layer that is between the input and output layers, wherein the plurality of nodes comprises a first node and a second node [The initial neural network comprises an input layer, a hidden layer, and an output layer, with each layer comprising nodes (i.e. a plurality of nodes with a first node and second node). Paragraphs 76-77; FIG. 1]; 
the first hidden layer comprises a first node [Each of the layers comprises nodes. Paragraph 77; FIG. 1]; and 
the first node comprises a first incoming arc and a first outgoing arc [Hidden layer nodes have incoming and outgoing edges (i.e. arcs) connecting them to other nodes in the previous and subsequent layers. Paragraphs 78 and 80; FIG. 1], and wherein there is no arc between the first node and the second node in the base deep neural network [The hidden layer comprises at least two nodes, one of which is the “first node” and the other of which is the “second node”, wherein the nodes in a single hidden layer are not connected to each other. See FIG. 1]; 
a second set of one or more processors [Paragraph 149] for: 
structurally changing the base deep neural network to create an updated deep neural network [The structure of the neural network is extended when necessary. Paragraphs 82, 128; FIGS. 10, 11A, and 11B], wherein the updated deep neural network, prior to subsequent training, is guaranteed to have no degradation in performance relative to the base deep neural network on the training data [The neural network extension only increases performance and, therefore, is guaranteed to have no degradation in performance. Paragraph 123]; and 
subsequently training the updated deep neural network [The extended neural network is trained. Paragraph 128; FIGS. 10, 11A, and 11B].
Choi doesn’t teach structurally changing the base deep neural network to create an updated deep neural network by adding a new arc between the first node and the second node. In the same field of changing neural network structure, Stanley teaches: structurally changing a base deep neural network to create an updated deep neural network by adding a new arc between a first node, in a first hidden layer, and a second node [The add connection mutation adds a connection/arc between two existing, but previously unconnected, nodes. Stanley at Section II(A), 2nd paragraph; Fig. 2]. Stanley teaches that evolving neural networks by adding new connections/arcs increases the efficiency of neural network learning [Stanley at Abstract and Section 1, 4th paragraph]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Choi’s extension of neural networks so that the base deep neural network is structurally changed to create an updated deep neural network by adding a new arc between the first node, in the first hidden layer, and a second node in a different layer, as taught by Stanley, because it would increase the learning efficiency.
Choi and Stanley don’t teach that training the base deep neural network comprises training the base deep neural network using gradient descent and that the subsequent training uses gradient descent. In the same field of changing neural network structure [Sharma at Abstract], Sharma teaches training the base deep neural network using gradient descent and subsequently training using gradient descent [Base and subsequent training is performed using gradient descent. Sharma at Section 5.1]. Sharma teaches that any standard training algorithm can be used in combination with constructive neural network methods [Sharma at Section 4, 5th paragraph] and that gradient descent training minimizes error [See Sharma at Section 5.1]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to modify the neural network training of Choi and Stanley to use gradient descent training as taught by Sharma, such that training the base deep neural network comprises training the base deep neural network using gradient descent and the subsequent training uses gradient descent, in order to minimize error and, thereby, increase neural network performance.
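
For illustration only, the kind of structural change discussed above (adding an arc between two existing but previously unconnected nodes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Choi, Stanley, or Sharma: the names `forward` and `add_arc`, the dictionary representation of arcs, the tanh activation, and all example weights are hypothetical.

```python
import math

def forward(arcs, order, inputs):
    """Evaluate a feed-forward network stored as {(src, dst): weight}.

    `order` lists the non-input nodes in topological order (assumed given).
    """
    values = dict(inputs)
    for node in order:
        total = sum(w * values[src] for (src, dst), w in arcs.items() if dst == node)
        values[node] = math.tanh(total)  # hypothetical choice of activation
    return values

def add_arc(arcs, src, dst, weight):
    """Return a copy of `arcs` with a new arc src -> dst; reject existing arcs."""
    if (src, dst) in arcs:
        raise ValueError("nodes are already connected")
    updated = dict(arcs)
    updated[(src, dst)] = weight
    return updated

# Hypothetical base network: inputs x1, x2; hidden nodes h1, h2; output y.
# There is no arc between h1 and h2 in the base network.
base_arcs = {("x1", "h1"): 0.5, ("x2", "h2"): -0.3, ("h1", "y"): 1.0, ("h2", "y"): 0.7}
order = ["h1", "h2", "y"]
inputs = {"x1": 1.0, "x2": 2.0}

# Structural change: connect the two previously unconnected nodes.
updated_arcs = add_arc(base_arcs, "h1", "h2", 0.2)
```

The base dictionary is copied rather than modified in place so that the base network and the updated network can be evaluated side by side.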

Regarding claims 2 and 24, taking claim 24 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the new arc comprises a new incoming arc to the first node from the second node [The new connection is an incoming connection/arc to node 4 from node 3. Stanley at Section II(A), 2nd paragraph; Fig. 2].
	
Regarding claims 4 and 26, taking claim 26 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the new arc comprises a new outgoing arc from the first node to the second node [Although the new connection is illustrated as going to the first hidden layer, the new connection is between any previously existing, unconnected nodes (See Stanley at Section II(A), 1st-2nd paragraphs) and Choi’s neural network includes other layers subsequent to the first hidden layer (e.g. another hidden layer or output layer), such that the added connection/arc would be from the first node in a first hidden layer to a node in a subsequent layer. See Choi at paragraphs 78-80].

Regarding claims 6 and 28, taking claim 28 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the second set of one or more processors structurally changes the base network by adding a second node to the first hidden layer without degrading the performance of the updated neural network relative to the base neural network [Extending the neural network includes adding a new node to a hidden layer. Choi at Paragraph 90].

Regarding claims 8 and 30, taking claim 30 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 28, wherein an activation of the second node is not determined by other nodes in the base deep neural network [The training apparatus determines activation characteristics (e.g. based on the activation frequency of a selected node). Choi at Paragraph 106].

Regarding claims 9 and 31, taking claim 31 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 28, wherein the second node has a specified target value for each vector of input data values to the second node [The neural network, including the extended neural network comprising the second node, has an expected output value based on the input (i.e. vector of input data values). Choi at Paragraph 84; See also paragraph 73. Therefore, the second node has an expected output value for each input (i.e. vector of input data values).].

Regarding claims 10 and 32, taking claim 32 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the second set of one or more processors structurally changes the base network by adding a second hidden layer to the base neural network, wherein the second hidden layer is between the input and output layers and is different from the first hidden layer, without degrading the performance of the updated neural network relative to the base neural network [A new hidden layer is added to increase performance. Choi at Paragraphs 82 and 85].

Regarding claims 20 and 38, taking claim 38 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the base deep neural network comprises a deep feed forward neural network [Choi at Paragraphs 79-80; FIG. 1].

Regarding claims 21 and 39, taking claim 39 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the base deep neural network comprises a deep recurrent neural network [Choi at Paragraphs 79-81].

Regarding claims 22 and 40, taking claim 40 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23, wherein the desired performance criterion is a stationary point in a back-propagation training algorithm for the base deep neural network [Choi at Paragraph 125].

Regarding claim 92, Choi, Stanley, and Sharma teach the method of claim 1, wherein: training the base deep neural network using gradient descent comprises training the base deep neural network using stochastic gradient descent; and subsequently training the updated deep neural network using gradient descent comprises training the updated deep neural network using stochastic gradient descent [As noted in the rejection of claim 1 above, Sharma teaches that any standard training algorithm can be used for base and subsequent iteration training (updating weights) (Sharma at Section 4, 5th paragraph), including the stochastic gradient descent method (Sharma at Section 5.4, 1st paragraph)].
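
For illustration only, stochastic gradient descent, in which each weight update is computed from a single randomly selected training example rather than from the whole training set, can be sketched as follows. This is a minimal sketch and is not taken from Sharma or any other cited reference: the one-weight linear model, the data, the learning rate, the step count, and the name `sgd_train` are all hypothetical.

```python
import random

def sgd_train(weight, data, learning_rate, steps, rng):
    """Stochastic gradient descent on squared error for the model y = weight * x."""
    for _ in range(steps):
        x, target = rng.choice(data)            # one random example per update
        error = weight * x - target
        gradient = 2.0 * error * x              # d/dweight of (weight * x - target) ** 2
        weight -= learning_rate * gradient
    return weight

# Hypothetical training data generated by target = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
rng = random.Random(0)                          # fixed seed so the run is repeatable
trained_weight = sgd_train(0.0, data, learning_rate=0.05, steps=50, rng=rng)
```

The same routine could be called once for the base training and again for the subsequent training of an updated network; only the starting weights would differ.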


Claims 3, 5, 7, 17-19, 25, 27, 29, and 35-37 is/are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Stanley, and Sharma and, further, in view of Wierzynski et al., U.S. Patent Application Publication No. 2015/0324689 (hereinafter Wierzynski).

Regarding claims 3 and 25, taking claim 25 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 24, wherein a weight of the new incoming arc is initially set to a predetermined value prior to subsequently training the updated deep neural network [The weight of the edges is set to an initial predetermined value. Choi at Paragraphs 93-95]. Choi, Stanley, and Sharma don’t teach that the predetermined initial value is zero. In the same field of extending neural networks [Wierzynski at paragraphs 97 and 119], Wierzynski teaches adding a new node to a neural network wherein a weight of a new incoming arc is initially set to zero prior to subsequently training the updated deep neural network [New nodes are added to the neural network and configured with a zero weight initially. Wierzynski at paragraphs 121-122]. Setting the initial weight of new nodes to zero enables the new nodes to have no immediate impact on the model and to learn refined weights over time [Wierzynski at paragraphs 121-122]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Choi’s predetermined initial weight value to be zero as taught by Wierzynski because it would enable the added node to have no immediate impact but learn an appropriate weight over time [Wierzynski at paragraphs 121-122].

Regarding claims 5 and 27, taking claim 27 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 26, wherein a weight of the new outgoing arc is initially set to a predetermined value prior to subsequently training the updated deep neural network [The weight of the edges is set to an initial predetermined value. Choi at Paragraphs 93-95]. Choi doesn’t teach that the predetermined initial value is zero. In the same field of extending neural networks [Wierzynski at paragraphs 97 and 119], Wierzynski teaches adding a new node to a neural network wherein a weight of a new outgoing arc is initially set to zero prior to subsequently training the updated deep neural network [New nodes are added to the neural network and configured with a zero weight initially. Wierzynski at paragraphs 121-122]. Setting the initial weight of new nodes to zero enables the new nodes to have no immediate impact on the model and to learn refined weights over time [Wierzynski at paragraphs 121-122]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Choi’s predetermined initial weight value to be zero as taught by Wierzynski because it would enable the added node to have no immediate impact but learn an appropriate weight over time [Wierzynski at paragraphs 121-122].

Regarding claims 7 and 29, taking claim 29 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 28, wherein: the second node has at least one incoming arc and at least one outgoing arc [The new node is connected to a node in a preceding layer with an edge (i.e. incoming arc) and is connected to a node in a subsequent layer with an edge (i.e. outgoing arc). Choi at Paragraphs 91-92]; and a weight for each of the at least one outgoing arcs of the second node is initially set to a predetermined value prior to subsequently training the updated deep neural network [The weight of the edges is set to an initial predetermined value. Choi at Paragraphs 93-95]. Choi doesn’t teach that the predetermined initial value is zero. In the same field of extending neural networks [Wierzynski at paragraphs 97 and 119], Wierzynski teaches adding a new node to a neural network wherein a weight for each of the at least one outgoing arcs of the second node is initially set to zero prior to subsequently training the updated deep neural network [New nodes are added to the neural network and configured with a zero weight initially. Wierzynski at paragraphs 121-122]. Setting the initial weight of new nodes to zero enables the new nodes to have no immediate impact on the model and to learn refined weights over time [Wierzynski at paragraphs 121-122]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Choi’s predetermined initial weight value to be zero as taught by Wierzynski because it would enable the added node to have no immediate impact but learn an appropriate weight over time [Wierzynski at paragraphs 121-122].
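
For illustration only, the reasoning that a zero initial weight has no immediate impact on the model, yet can be refined by later training, can be sketched as follows. This is a minimal sketch and is not taken from Choi or Wierzynski: the linear model, the data, the learning rate, and the names `predict`, `mse`, and `gradient` are all hypothetical, and full-batch gradient descent is used here as one possible training method.

```python
def predict(weights, x):
    """Linear model: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def mse(weights, data):
    """Mean squared error over (inputs, target) pairs."""
    return sum((predict(weights, x) - t) ** 2 for x, t in data) / len(data)

def gradient(weights, data):
    """Gradient of the mean squared error with respect to each weight."""
    grads = [0.0] * len(weights)
    for x, t in data:
        err = predict(weights, x) - t
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / len(data)
    return grads

# Hypothetical data generated by target = x1 + 2 * x2.
data = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0), ((3.0, 3.0), 9.0)]

# Base model: a single arc from x1 only.
base_weights = [1.5]
base_loss = mse(base_weights, [((x[0],), t) for x, t in data])

# Updated model: a new arc from x2 is added with an initial weight of zero.
updated_weights = base_weights + [0.0]
loss_before_training = mse(updated_weights, data)   # identical to base_loss

# Subsequent training by gradient descent moves the new weight away from zero.
for _ in range(200):
    grads = gradient(updated_weights, data)
    updated_weights = [w - 0.01 * g for w, g in zip(updated_weights, grads)]
loss_after_training = mse(updated_weights, data)
```

Because the new weight is zero, the new arc contributes nothing to any prediction until training changes it, so the loss of the updated model before training equals the loss of the base model on the same data.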

Regarding claims 17 and 35, taking claim 35 as exemplary, Choi, Stanley, and Sharma teach the computer system of claim 23. Choi doesn’t teach a machine-learning learning coach, implemented by a third set of one or more processors, wherein the machine-learning learning coach is for, upon detection of a degradation in performance of the updated deep neural network on validation data relative to the base neural network: learning a feature change for the updated deep neural network to remedy the degradation in performance of the updated neural network on the validation data relative to the base neural network; and implementing the feature change in the updated deep neural network. In the same field of structurally updating neural network models [Wierzynski at paragraphs 97 and 119], Wierzynski teaches a machine-learning learning coach, implemented by a third set of one or more processors, wherein the machine-learning learning coach is for [Wierzynski at paragraph 90], upon detection of a degradation in performance of the updated deep neural network on validation data relative to a base neural network: learning a feature change for the updated deep neural network to remedy the degradation in performance of the updated neural network on the validation data relative to the base neural network; and implementing the feature change in the updated deep neural network [When validation indicates that the updated model results in degraded performance relative to a prior model, corrective action is performed (Wierzynski at paragraphs 108-109), including learning and updating feature changes to the model (Wierzynski at paragraphs 109-111, 125-126)].
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Choi’s computer system to include a machine-learning learning coach, implemented by a third set of one or more processors, wherein the machine-learning learning coach is for, upon detection of a degradation in performance of the updated deep neural network on validation data relative to the base neural network: learning a feature change for the updated deep neural network to remedy the degradation in performance of the updated neural network on the validation data relative to the base neural network; and implementing the feature change in the updated deep neural network, as taught by Wierzynski, because correcting a degradation in performance would implicitly result in improved performance [Wierzynski at paragraphs 108-109].

Regarding claims 18 and 36, taking claim 36 as exemplary, Choi, Stanley, Sharma, and Wierzynski teach the computer system of claim 35, wherein the base deep neural network comprises a deep feed forward neural network [Choi at paragraphs 79-80; FIG. 1].

Regarding claims 19 and 37, taking claim 37 as exemplary, Choi, Stanley, Sharma, and Wierzynski teach the computer system of claim 35, wherein the base deep neural network comprises a deep recurrent neural network [Choi at paragraphs 79-81].

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628.  The examiner can normally be reached on Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/BENJAMIN P GEIB/Primary Examiner, Art Unit 2123