Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


DETAILED ACTION


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-26 are rejected under 35 U.S.C. 103 as being unpatentable over US 10210860 B1, Ward; Jeff et al. (hereinafter Ward).
Re claim 1, 
1. A processor comprising: two or more processing cores to train portions of a neural network separately in parallel (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ward to incorporate embodiments for layered neural networks which process single or multiple neural networks in parallel, such as the parallel actions shown in at least fig. 17, thereby improving the neural network training prior to recombination by isolating contexts, such as contexts for speech or words.

Re claim 2, Ward teaches 
2. A processor of claim 1, wherein the two or more processing cores apply one or more gradients to different sets of nodes of the neural network. (gradients within neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 3, Ward teaches 
3. The processor of claim 1, wherein the neural network is trained, at least in part, by generating a weight update by combining a plurality of partial weight updates produced in parallel by the two or more processing cores. (weights are combined to produce a predicted weight, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 4, Ward teaches 
4. The processor of claim 1, wherein the processor further divides a weight update operation into a plurality of partial weight update operations and distributes individual partial weight update operations to the two or more processing cores. (partial weights are updated, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 5, Ward teaches 
5. The processor of claim 4, wherein the partial weight update operations are produced by dividing an initial weight and gradient update/descent pushed into a plurality of distinct portions. (inherent layers using gradient updates neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 6, Ward teaches 
6. The processor of claim 4, wherein each partial weight update is executed using a different thread. (thread = process such that in a neural network with inherent nodes and branches, portions are trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 7, Ward teaches 
7. The processor of claim 4, wherein the processor further gathers the partial weight updates to produce the weight update. (partial derivatives and weights combined, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
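For illustration only, the divide, distribute, and gather sequence recited in claims 4-7 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Ward or from the application: the function names (split_indices, partial_update, parallel_weight_update), the plain gradient-descent rule w - lr * g, and the use of a thread pool are all hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_indices(n, parts):
    """Split range(n) into `parts` contiguous, non-overlapping (lo, hi) slices."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partial_update(weights, grads, lr, lo, hi):
    """Compute the updated weights for indices [lo, hi) only (hypothetical helper)."""
    return [w - lr * g for w, g in zip(weights[lo:hi], grads[lo:hi])]


def parallel_weight_update(weights, grads, lr=0.1, workers=2):
    """Divide one weight update into partial updates, run each on its own
    thread, then gather the partial results into the full updated weights."""
    bounds = split_indices(len(weights), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(partial_update, weights, grads, lr, lo, hi)
            for lo, hi in bounds
        ]
        # Results are collected in submission order, so slices stay in place.
        parts = [f.result() for f in futures]
    return [w for part in parts for w in part]
```

In this sketch each slice is handled by a different thread and the slices do not overlap, so the gathered result equals the result of a single serial update.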

Re claim 8, Ward teaches 
8. A system, comprising: one or more processors to train portions of a neural network separately in parallel; and one or more memories to store the neural network. (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 9, Ward teaches 
9. The system of claim 8, wherein the neural network is trained, at least in part, by: generating a plurality of partial weight updates in parallel using the one or more processors; and combining the plurality of partial weight updates to produce a weight gradient update. (weights are combined to produce a predicted weight, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 10, Ward teaches 
10. The system of claim 9, wherein the plurality of partial weight updates is produced in parallel using a plurality of worker threads. (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claims 11 and 18, Ward teaches 
11. The system of claim 9, wherein the neural network is trained at least in part by: forward propagating an input through the neural network to produce an output; (expressly forward prop, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
determining an error based at least in part on a difference between the output and an expected value; and (difference in output vs training expressly forward prop, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
backpropagating the error to determine a gradient, the plurality of partial weight updates based at least in part on the gradient. (expressly back prop, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
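For illustration only, the three steps recited in claim 11 (forward propagation, error determination, and backpropagation to a gradient) can be sketched for a single linear layer. This is a simplified sketch under stated assumptions, not the method of Ward or of the application: the names forward and error_and_gradient, the single-layer model, and the squared-error loss are hypothetical choices made for the example.

```python
def forward(weights, x):
    """Forward propagate input x through one linear unit to produce an output."""
    return sum(w * xi for w, xi in zip(weights, x))


def error_and_gradient(weights, x, expected):
    """Return the error (output minus expected value) and the gradient of the
    assumed loss 0.5 * error**2 with respect to each weight."""
    output = forward(weights, x)
    error = output - expected
    # Chain rule for one linear unit: dL/dw_i = error * x_i.
    grads = [error * xi for xi in x]
    return error, grads
```

The gradient returned here is the quantity that a set of partial weight updates would each consume a slice of.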

Re claim 12, Ward teaches 
12. The system of claim 9, wherein the plurality of partial weight updates are produced at least by: identifying a plurality of subsets of network nodes of the neural network; and (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
producing a partial weight update for each subset in the plurality of subsets. (gradient descent driven as partial derivative, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 13, Ward teaches 
13. The system of claim 12, wherein the plurality of subsets are non-overlapping subsets of weights of the neural network. (as in fig. 11a isolated weights predicted, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 14, Ward teaches 
14. The system of claim 12, wherein: an individual subset of the plurality of subsets includes a quantity of node weights; and (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
the quantity of node weights is determined based at least in part on an amount of processing power available to a worker assigned to process the individual subset relative to other workers assigned to process other subsets. (inherent but processing power is used for workers/processes/threads, col 32 lines 50-57, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
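For illustration only, sizing each subset of node weights by the relative processing power of its worker, as recited in claim 14, can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from Ward or from the application: the function name allocate_by_power, the integer "power" scores, and the rule for handing out leftover weights are hypothetical.

```python
def allocate_by_power(num_weights, powers):
    """Assign each worker a contiguous, non-overlapping (lo, hi) slice of the
    weights whose size is proportional to that worker's relative power.

    `powers` is a list of positive integers, one per worker (assumed scores).
    """
    total = sum(powers)
    # Proportional share, rounded down.
    sizes = [num_weights * p // total for p in powers]
    # Hand any leftover weights to the most powerful workers first.
    remainder = num_weights - sum(sizes)
    order = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds
```

With equal power scores the same routine yields substantially equal groups, which corresponds to the equal-partition case.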

Re claim 15, Ward teaches 
15. The system of claim 8, wherein: the system determines a set of gradients for each input of a set of input values; and the set of gradients is distributed to each of the one or more processors. (gradient descent driven as partial derivative in neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 16, Ward teaches 
16. A method, comprising training a neural network by, at least in part, training different portions of a neural network separately in parallel using a plurality of processors. (neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 17, Ward teaches 
17. The method of claim 16 wherein: the neural network is trained at least in part by distributing gradient information to the plurality of workers; (process=worker, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
the workers calculate different portions of a weight update in parallel; (weights are updated, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
and the different portions are aggregated to produce new weight values of the neural network. (subtracting or altering from weights, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 19, Ward teaches 
19. The method of claim 18, wherein: the gradient is distributed to each of the plurality of workers; and the plurality of workers calculate the different portions of the weight update. (inherent but processing power is used for workers/processes/threads, col 32 lines 50-57, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 20, Ward teaches 
20. The method of claim 16, wherein each worker of the plurality of workers executes on a different processor. (worker = process, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 21, Ward teaches 
21. The method of claim 16, wherein each worker of the plurality of workers executes in parallel on a graphical processing unit. (GPU chunking as in fig. 17 with related paragraphs, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 22, Ward teaches 
22. The method of claim 17, wherein a number of different portions of the weight update matches a number of available processors of a computer system. (processors update therefore the update matches i.e. is tied to that process or processor per se, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 23, Ward teaches 
23. The method of claim 17, wherein the weight update is divided into substantially equal nonoverlapping groups of node weights to produce the different portions. (nodes and weights as illustrated, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 24, Ward teaches 
24. A speech processing system comprising a neural network that takes a digital representation of sound as input and identifies elements of human speech, portions of the neural network trained to recognize human speech separately in parallel by a plurality of processes. (end to end speech recognition using neural networks with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 25, Ward teaches 
25. The speech processing system of claim 24 wherein the neural network: generates a plurality of weight updates in parallel; and combines the plurality of weight updates into a single weight update. (as in fig. 13, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Re claim 26, Ward teaches
26. The speech processing system of claim 24, wherein the speech processing system further comprises one or more processors and memory to store executable instructions that, as a result of being executed by the one or more processors, cause the speech processing system to at least:
obtain data representing audio from a microphone; (speech recognition requires microphone col 5 lines 17-32, col 26 line 50 – col 27 line 35, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
process the data using the neural network to identify a spoken word represented in the data; and (words processed and learned to reduce error rate, speech recognition requires microphone col 5 lines 17-32, col 26 line 50 – col 27 line 35, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)
perform an action based at least in part on the identity of the spoken word. (learning expressly is an action, and ASR is used for request/commands, processed and learned to reduce error rate, speech recognition requires microphone col 5 lines 17-32, col 26 line 50 – col 27 line 35, neural network with inherent nodes and branches, portions trained in parallel that are distinct, weights are updated, difference in output vs training utilizing workers/processes/threads analogous to parallel or any group of processes, forward and back propagation, gradient descent driven as partial derivative, subtracting or altering from weights, col 17 lines 35-55, col 20 line 7 to col 21 line 22, col 23 line 54 to col 24 line 29, col 30 line 54 to col 55 line 34, col 32 lines 9-31, fig. 11a, 11b, and 13)

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over US 10210860 B1, Ward; Jeff et al. (hereinafter Ward) in view of US 20200327884 A1, Bui; Trung Huu et al. (hereinafter Bui).
Re claim 27, Ward fails to teach the following limitation, which Bui teaches:
27. The speech processing system of claim 26, wherein the action is a navigation request to be processed by a navigation system of a vehicle. (Bui, paragraphs 0036 and 0113)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Ward to incorporate the above claim limitations as taught by Bui, to allow for a broad application of parallel neural network processing in the context of GPS/navigation, wherein input user speech is learned in the context of directions/navigation, thereby providing another network context when the user's speech is open-ended.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 10325200, Yu; Dong et al., Deep neural networks

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C COLUCCI, whose telephone number is (571)270-1847. The examiner can normally be reached M-F, 9 AM - 5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/
Primary Examiner, Art Unit 2655
(571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov