DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
This Office Action is in response to applicant’s communication filed 15 January 2021, in response to the Office Action mailed 15 October 2020.  The applicant’s remarks and any amendments to the claims or specification have been considered, with the results that follow.

The objection to claim 116 has been withdrawn due to the amendment filed.


Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant's submission of the Information Disclosure Statement, dated 25 January 2021, is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending.  As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 102, 103, 105-112, 115-119, and 121-128 are rejected under 35 U.S.C. 103 as being unpatentable over Deng et al. (Ensemble Deep Learning for Speech .

As per claim 102, Deng teaches a computer-implemented method comprising: training, by a computer system, at least partially, n=1 to N nodal networks, wherein: N is greater than or equal to two [the system can combine multiple networks, including CNNs, DNNs, and RNNs (pg. 1915, section 1; etc.)]; each of the n=1 to N nodal networks comprises a nth network input layer, a nth network output layer, and one or more nth network middle layers, wherein the each of the nth network input layer, the nth network output layer and the one or more nth network middle layers comprise at least one node [the stacked networks, including CNNs, DNNs, and/or RNNs each comprise a number of layers, including input, hidden and output layers (pg. 1917, sections 4.1-4.3, fig. 1, etc.)], wherein the one or more nth network middle layers are between the nth network input layer and the nth network output layer, wherein the nth network input layer is below the one or more nth network middle layers, and the one or more nth network middle layers is below the nth network output layer [the stacked networks, including CNNs, DNNs, and/or RNNs each comprise a number of layers, including input, hidden and output layers (pg. 1917, sections 4.1-4.3, fig. 1, etc.)]; merging, by a computer system, the N nodal networks into a stacked network, such that the nth nodal network is below the (n+1)th nodal network, such that the (N-1)th nodal network is below the Nth nodal network [the networks are stacked, such that some networks may be below others in the stack, or may be connected to the same inputs (pg. 1917, fig. 1, etc.)], and after adding the plurality of cross-[the individual networks and then stacked network are trained with the training data (pg. 1915, abstract, sections 1-2, etc.)].
While Deng teaches merging/stacking multiple nodal networks (see above) it does not teach wherein merging the N nodal networks comprises adding, by the computer system, to the stacked network, a plurality of cross-connections between nodes of the N nodal networks, such that a node of the (n+1)th network covers a node of the nth network.
Adams teaches wherein merging the N nodal networks comprises adding, by the computer system, to the stacked network, a plurality of cross-connections between nodes of the N nodal networks, such that a node of the (n+1)th network covers a node of the nth network [new connections may be formed in the feedforward network(s) after determining that the new connection will improve the performance of the network (paras. 0020-29, 0043-47, etc.); where the node will cover the other in the feedforward network (paras. 0010-18, 0033-36, etc.)]; and after adding the plurality of cross-connections, training, by the computer system, the stacked network [several methods of training may be applied before and after the new connections are added (abstract, etc.)].
Deng and Adams are analogous art, as they are within the same field of endeavor, namely machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize the creation of new connections between 
Adams provides motivation [a system and method according to the present invention allows a combination of the advantages of fully connected networks and of sparse networks (abstract, paras. 0008-9, 0028-30, etc.)].
Alternatively, it would also have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize a combination of multiple stacked networks, as taught by Deng, in the network creation in the system of Adams.  Deng provides motivation as [combining multiple stacked networks greatly improves the accuracy of the network (abstract, etc.)].

As per claim 103, Deng/Adams teaches wherein adding the plurality of cross connections comprises: evaluating, by the computer system, potential cross-connections between two of the N nodal networks, wherein each potential cross-connection is an arc between a first node in a first of the N nodal networks and a second node in a second of the N nodal networks [new connections may be formed between nodes in the feedforward network(s) after determining that the new connection will improve the performance of the network (Adams: abstract; paras. 0020-29, 0043-47; etc.) for stacked nodal networks forming a combined network (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.)], and wherein the evaluation of the potential cross-connection is based on an estimated improvement in an objective of the stacked network that includes the evaluated potential cross-connection [the new connections are preferably formed randomly in the vicinity of the existing connections, and are tested to evaluate if they are "good" or "bad", which can be determined, for example, by the conventional network training rule used. Preferably, good connections are retained, and bad ones are eliminated and/or replaced by good ones (Adams: abstract; paras. 0020-29, 0043-47; etc.)]; and adding, by the computer system, a plurality of the potential cross-connection to the stacked network based, at least in part, on the evaluation such that, after the plurality of cross-connections are added to the stacked network, a node in the second of the N nodal networks covers a node in the first of the N nodal networks [new connections may be formed between nodes in the feedforward network(s) after determining that the new connection will improve the performance of the network (Adams: abstract; paras. 0020-29, 0043-47; etc.) where the node will cover the other in the feedforward network (Adams: paras. 0010-18, 0033-36, etc.) for stacked nodal networks forming a combined network (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.)].

As per claim 105, Deng/Adams teaches wherein merging the N nodal networks comprises merging, by the computer system, the N nodal networks such that each of the n=1 to N network input layers receives input data for the stacked network [multiple of the nodal networks may be connected to receive the inputs, and may be selected from multiple configurations (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.)].

As per claim 106, Deng/Adams teaches wherein adding the plurality of cross-connections comprises adding, by the computer system, cross-connections from, for n=1 to N-1, a nth network middle layer that is adjacent to and below the nth network output layer to a (n+1)th network middle layer that is adjacent to and above the (n+1)th network input layer [new connections may be formed between nodes in the feedforward network(s) after determining that the new connection will improve the performance of the network (Adams: abstract; paras. 0020-29, 0043-47; etc.) where the nodes are connected in a feedforward network (Adams: paras. 0010-18, 0033-36, etc.) for stacked nodal networks forming a combined network (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.)].

As per claim 107, Deng/Adams teaches wherein: adding the plurality of cross-connections comprises initializing, by the computer system, a connection weight for each of the plurality of cross-connections to a value of zero; and evaluating the plurality of potential cross-connection comprises, for each potential cross-connection, estimating an improvement to the stacked network through addition of the potential cross-connection with connection weight updates for the potential cross-connection through iterative training [the new connections are preferably formed randomly in the vicinity of the existing connections, and are tested to evaluate if they are "good" or "bad", which can be determined, for example, by the conventional network training rule used. Preferably, good connections are retained, and bad ones are eliminated and/or replaced by good ones (Adams: abstract; paras. 0020-29, 0043-47; etc.)].
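
For illustration only, and not drawn from Deng, Adams, or the applicant's specification: the limitation of claim 107 that each cross-connection weight is initialized to zero can be pictured with the minimal sketch below. It assumes a single sigmoid node in the upper network; the function names and numeric values are hypothetical. With a weight of zero, the added connection contributes nothing to the node's input, so the node's output is unchanged until later training updates that weight.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def upper_node_output(own_inputs, own_weights, lower_activation, cross_weight):
    """Activation of an upper-network node that also receives one
    cross-connection from a node in the network below it (hypothetical)."""
    total = sum(w * x for w, x in zip(own_weights, own_inputs))
    total += cross_weight * lower_activation  # contribution of the new arc
    return sigmoid(total)


# Output of the node before any cross-connection exists.
before = sigmoid(0.5 * 1.0 + (-0.25) * 2.0)

# Output after adding a cross-connection whose weight starts at zero.
after = upper_node_output([1.0, 2.0], [0.5, -0.25],
                          lower_activation=0.9, cross_weight=0.0)

print(before == after)
```

The comparison holds for any activation of the lower node, because that activation is multiplied by a weight of zero.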

As per claim 108, Deng/Adams teaches adding, by the computer system, a combining network to the stacked network, such that each of the n=1 to N output layers are input to the combining network [a combined layer is formed at the outputs of the stacked networks taking their outputs (Deng: pg. 1915, section 1; pg. 1917, sections 4.1-4.3 and fig. 1; etc.)].

As per claim 109, Deng/Adams teaches adding, by the computer system, a combining network to the stacked network, such that each of the n=1 to N output layers are input to the combining network [a combined layer is formed at the outputs of the stacked networks taking their outputs (Deng: pg. 1915, section 1; pg. 1917, sections 4.1-4.3 and fig. 1; etc.)].

As per claim 110, Deng/Adams teaches wherein training the N nodal networks comprises training, at least partially, each of the N nodal networks to perform a same classification [the system can combine multiple networks, including CNNs, DNNs, and RNNs into an ensemble to perform classification for speech recognition (Deng: pg. 1915, abstract and section 1; etc.)].

As per claim 111, Deng/Adams teaches wherein training the N nodal networks comprises training each of the N nodal networks to be part of an ensemble [the system can combine multiple networks, including CNNs, DNNs, and RNNs into an ensemble to perform classification for speech recognition (Deng: pg. 1915, abstract and section 1; etc.)].

As per claim 112, Deng/Adams teaches wherein merging the N nodal networks comprises merging, by the computer system, the N nodal networks such that, for n=1 to N, the nth network output layer is directly connected to an output of the stacked network [a combined layer is formed at the outputs of the stacked networks taking their outputs, becoming the new network output layer (Deng: pg. 1915, section 1; pg. 1917, sections 4.1-4.3 and fig. 1; etc.)].

As per claim 115, Deng/Adams teaches wherein adding the plurality of potential cross-connections to the stacked network comprises adding the plurality of potential cross-connections to the stack network such that the knowledge of each of the N nodal networks is preserved [the new connections are preferably formed randomly in the vicinity of the existing connections, and are tested to evaluate if they are "good" or "bad", which can be determined, for example, by the conventional network training rule used. Preferably, good connections are retained, and bad ones are eliminated and/or replaced by good ones (Adams: abstract; paras. 0020-29, 0043-47; etc.)].

As per claim 116, Deng/Adams teaches wherein adding the plurality of potential cross-connections to the stacked network comprises adding the plurality of potential cross-connections to the stacked network such that performance of the stacked network is improved with each added potential cross-connection [the new connections are preferably formed randomly in the vicinity of the existing connections, and are tested to evaluate if they are "good" or "bad", which can be determined, for example, by the conventional network training rule used. Preferably, good connections are retained, and bad ones are eliminated and/or replaced by good ones (Adams: abstract; paras. 0020-29, 0043-47; etc.)].

As per claim 117, Deng/Adams teaches wherein there is a quota on the plurality of cross-connections added to the stacked network [new connections are only added if the change is favorably compared to a specified threshold (Adams: abstract, etc.)].

As per claim 118, see the rejection of claim 102, above, wherein Deng/Adams also teaches a computer system comprising one or more processor cores and a memory in communication with the one or more processor cores storing software that, when executed by the one or more processor cores, cause the cores to perform the method [It is to be understood that the exemplary system modules and method steps described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Preferably, the present invention is implemented in software as an application program tangibly embodied on one or more program storage devices (Adams: para. 0017, etc.)].

As per claim 119, see the rejection of claim 103, above.

As per claim 121, see the rejection of claim 105, above.

As per claim 122, see the rejection of claim 106, above.

As per claim 123, see the rejection of claim 107, above.

As per claim 124, see the rejection of claim 108, above.

As per claim 125, see the rejection of claim 109, above.

As per claim 126, see the rejection of claim 110, above.

As per claim 127, see the rejection of claim 111, above.

As per claim 128, see the rejection of claim 112, above.


Claims 104, 113, 114, 120, 129, and 130 are rejected under 35 U.S.C. 103 as being unpatentable over Deng and Adams as applied to claims 103 and 119 above, and further in view of Yates (US 2017/0105791 – cited in an IDS).

As per claim 104, Deng/Adams teach the computer-implemented method of claim 103, as described above.
While Deng/Adams teaches estimating improvement in the stacked network to evaluate potential cross-connections (see above), it does not explicitly teach wherein the estimated improvement in the objective of the stacked network for the evaluated potential cross-connection is determined based on, at least in part, activation values of the first node and estimates of partial derivatives of the objective with respect to activation of the second node.
Yates teaches wherein the estimated improvement in the objective of the stacked network for the evaluated potential cross-connection is determined based on, at least in part, activation values of the first node and estimates of partial derivatives of the objective with respect to activation of the second node [not all of the input nodes may be connected to each hidden neuron 1404. Values for the hidden nodes 1404 may be determined according to an activation function. In various forms, the outputs of the activation function range from 0 to 1. For example, the output function may be selected to generate outputs between 0 and 1 or, in some forms, results of the output function may be scaled (paras. 0092-95, etc.)].
Deng/Adams and Yates are analogous art, as they are within the same field of endeavor, namely machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the estimations for creating hidden nodes/connections taught by Yates, for the evaluation of potential cross-connections in the system of Deng/Adams.
Yates provides motivation as [In some forms, it is advantageous to select functions that are continuous and differentiable. This may facilitate training of the neural network. For example, back-propagation training utilizing a gradient method may require computing partial derivatives of the output function, which may be simplified when the optimization functions are continuous and differentiable. One example of such a function that may be utilized as the activation functions is the sigmoid function, as indicated by Equation (1) below (para. 0093, etc.)].
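
For illustration only, and not drawn from the cited references: one way to read the claim 104 limitation is as the ordinary back-propagation gradient for a weight that does not yet exist. The sketch below assumes that, for a candidate arc from a first node to a second node, the estimate is the sum over training examples of the first node's activation multiplied by the partial derivative of the objective with respect to the second node's input. The function name and the sample values are hypothetical, and the claim language ("with respect to activation of the second node") may be intended differently.

```python
def gradient_cross_product(first_activations, second_node_grads):
    """Estimate dJ/dw for a candidate arc whose weight w is currently zero.

    first_activations[m]  : activation of the first node on example m
    second_node_grads[m]  : assumed dJ/dz of the second node on example m,
                            where z is the second node's summed input
    """
    if len(first_activations) != len(second_node_grads):
        raise ValueError("one value per training example is required")
    return sum(a * g for a, g in zip(first_activations, second_node_grads))


# Three hypothetical training examples.
activations = [0.5, 1.0, 0.0]
grads = [-0.2, -0.4, 0.3]

estimate = gradient_cross_product(activations, grads)
print(estimate)
```

A large magnitude indicates that a small change in the new weight would change the objective appreciably; a value near zero indicates that the candidate arc would have little first-order effect.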

As per claim 113, Deng/Adams/Yates teaches wherein estimating the improvement to the objective of the stacked network through addition of a potential cross-connection comprises evaluating a gradient cross product for the potential cross-connection [not all of the input nodes may be connected to each hidden neuron 1404. Values for the hidden nodes 1404 may be determined according to an activation function. In various forms, the outputs of the activation function range from 0 to 1. For example, the output function may be selected to generate outputs between 0 and 1 or, in some forms, results of the output function may be scaled; and trained using back-propagation training utilizing a gradient method that may require computing partial derivatives of the output function (Yates: paras. 0092-95, etc.)].
Examiner’s Note: the reasoning and motivation for the combination is the same as that provided, above, in the rejection of claim 104.

As per claim 114, Deng/Adams/Yates teaches wherein adding the potential cross-connections comprises adding, by the computer a potential cross-connection when the gradient cross-product for the potential cross-connection exceeds a threshold value [not all of the input nodes may be connected to each hidden neuron 1404. Values for the hidden nodes 1404 may be determined according to an activation function. In various forms, the outputs of the activation function range from 0 to 1. For example, the output function may be selected to generate outputs between 0 and 1 or, in some forms, results of the output function may be scaled; and trained using back-propagation training utilizing a gradient method that may require computing partial derivatives of the output function (Yates: paras. 0092-95, etc.); where the new connections are preferably formed randomly in the vicinity of the existing connections, and are tested to evaluate if they are "good" or "bad", which can be determined, for example, by the conventional network training rule used. Preferably, good connections are retained, and bad ones are eliminated and/or replaced by good ones (Adams: abstract; paras. 0020-29, 0043-47; etc.)].
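
For illustration only, and not drawn from the cited references: the threshold test of claim 114, together with the quota of claim 117, can be pictured as a simple filter over scored candidate arcs. The sketch assumes each candidate already has a score (for example, an estimated gradient magnitude), compares the absolute value of the score against the threshold, and keeps the strongest candidates first. The function name, the node labels, and the use of the absolute value are assumptions.

```python
def select_cross_connections(scores, threshold, quota=None):
    """Return candidate arcs whose score magnitude exceeds the threshold.

    scores    : dict mapping (first_node, second_node) -> score
    threshold : minimum magnitude a score must exceed to be kept
    quota     : optional cap on how many arcs are added (hypothetical)
    """
    kept = [(arc, s) for arc, s in scores.items() if abs(s) > threshold]
    kept.sort(key=lambda item: abs(item[1]), reverse=True)  # strongest first
    if quota is not None:
        kept = kept[:quota]
    return [arc for arc, _ in kept]


# Hypothetical scores for three candidate arcs between two networks.
candidate_scores = {
    ("a1", "b1"): -0.5,
    ("a2", "b1"): 0.05,
    ("a1", "b2"): 0.8,
}

selected = select_cross_connections(candidate_scores, threshold=0.1)
capped = select_cross_connections(candidate_scores, threshold=0.1, quota=1)
print(selected, capped)
```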

As per claim 120, see the rejection of claim 104, above.

As per claim 129, see the rejection of claim 113, above.

As per claim 130, see the rejection of claim 114, above.



Response to Arguments
Applicant's arguments filed 15 January 2021 have been fully considered but they are not persuasive.

Applicant argues that the cited art does not teach the same use of the word “stacked” as in the independent claims, arguing that Deng uses stacked to mean taking a linear or log-linear weighted average.
However, as admitted by applicant (in pgs. 10-11 of the remarks), Deng does teach stacking at least some of the networks one-over-the-other.

In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., that there are no existing connections between nodes of different networks when adding any new connections) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In this case, Deng teaches stacking multiple nodal networks, including connections between the nodes of different networks (see, e.g., Deng: fig. 1), and Adams teaches adding new connections between network nodes in the “neighborhood” of existing connections (see, e.g., Adams: paras. 0010-18, 0020-29, etc.).

Applicant also argues that the cited art does not teach evaluating whether to add a cross-connection “based on an estimated improvement in an objective of the stacked network. . .”
However, Deng/Adams teaches that new connections may be formed between nodes in the feedforward network(s) after determining that the new connection will improve the performance of the network (Adams: abstract; paras. 0020-29, 0043-47; etc.) for stacked nodal networks forming a combined network (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.).

Applicant also argues that the cited art does not teach determining the potential improvement from a cross-connection “based on, at least in part, activation values of the first node and estimates of partial derivatives of the objective with respect to activation of the second node”.


Applicant also argues that the cited art does not teach that each of the input layers of the stacked networks receive the input data.
However, Deng teaches that multiple of the nodal networks may be connected to receive the inputs, and may be selected from multiple configurations (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.).

Applicant also argues that the cited art does not teach adding cross-connections from a middle layer of one network to a middle layer of another network.
However, Deng/Adams teaches that new connections may be formed between nodes in the feedforward network(s) after determining that the new connection will improve the performance of the network (Adams: abstract; paras. 0020-29, 0043-47; etc.) where the nodes are connected in a feedforward network (Adams: paras. 0010-18, 0033-36, etc.) for stacked nodal networks forming a combined network (Deng: pg. 1917, sections 4.1-4.3, fig. 1, etc.).

Applicant also argues that the cited art does not teach adding a combining network, where the output layer of each of the stacked networks is input to the combining network.
However, Deng teaches that a combined layer is formed at the outputs of the stacked networks taking their outputs (Deng: pg. 1915, section 1; pg. 1917, sections 4.1-4.3 and fig. 1; etc.).

Applicant also argues that the cited art does not teach directly connecting the output layer of each of the stacked networks to the output of the stacked network.
However, Deng teaches that a combined layer is formed at the outputs of the stacked networks taking their outputs, becoming the new network output layer (Deng: pg. 1915, section 1; pg. 1917, sections 4.1-4.3 and fig. 1; etc.).

Applicant also argues that the cited art does not teach the techniques for evaluating potential cross-connections as recited in claims 113-115.
However, Deng/Adams/Yates teaches that not all of the input nodes may be connected to each hidden neuron 1404. Values for the hidden nodes 1404 may be determined according to an activation function. In various forms, the outputs of the activation function range from 0 to 1. For example, the output function may be selected to generate outputs between 0 and 1 or, in some forms, results of the output function may be scaled; and trained using back-propagation training utilizing a gradient method that may require computing partial derivatives of the output function (Yates: paras. 0092-95, etc.); where the new connections are preferably formed randomly in the 


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-101 are cancelled; claims 102-130 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fukuda (US 2019/0012594) and Georgescu (US 2016/0174902) – disclose methods of stacking layers to form nodal networks.
Commons (US 9,015,093), Beddo (US 2014/0108094), and Liu (US 2019/0138860) – disclose methods of forming combined networks by stacking nodal networks.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and 

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769.  The examiner can normally be reached on M-F 10am-6pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/GEORGE GIROUX/Primary Examiner, Art Unit 2125