Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/09/2022 has been entered. 

DETAILED ACTION

Claims 1-20 and 22-23 are currently pending and have been examined.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4-6, 10-11, 14-15, 20 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (U.S. Pub. No. 20210200993 A1) in view of Cipar et al. (U.S. Pub. No. 20120291041 A1), and further in view of Dean et al. (US 20100122065 A1), and further in view of Kang et al. “Partially Connected Feedforward Neural Networks Structured by Input Types”.
Chen, Cipar and Dean were cited in a previous Office Action.

As per claim 1, Chen teaches the invention substantially as claimed including a method of performing an operation in a neural network apparatus, the method comprising:
acquiring, at a first queue unit of a plurality of queue units, an operation group comprising: at least one input feature map and at least one kernel, where a convolution operation between one input feature map of the at least one feature map and one kernel of the at least one kernel creates an intermediate feature map (par. 0058 CNN stage 800 receives, via input 801 [queue], for example from a previous CNN stage, input feature maps 811 such that input feature maps 811 have ‘n’ channels. Furthermore, input feature maps 811 may have any suitable size such that input feature maps 811 provide an input volume to CNN stage. For example, input feature maps 811 may each have H×W elements and input feature maps 811 may have ‘n’ channels as discussed herein.); and
creating, by the operating circuit in the idle state, the intermediate feature map from the convolution operation between the one input feature map and the one kernel (par. 0058 … Depth-wise convolution module 802 receives input feature maps 811 and applies a depth-wise separable convolution to input feature maps 811 to generate multiple separate 2D feature maps 812. Depth-wise convolution module 802 applies a per-channel 2D convolution that outputs ‘n’ separate 2D feature maps 812 using ‘n’ convolution kernels of size k×k×1); and
creating, by the post-processing circuit, an output feature map using the intermediate feature map (par. 0059 Depth-wise convolution module 802 receives intermediate feature maps 815 (or combined feature maps 814) and applies a depth-wise separable convolution to intermediate feature maps 815 (or combined feature maps 814) to generate multiple separate 2D feature maps 816. Depth-wise convolution module 806 applies a per-channel 2D convolution that outputs ‘n’ separate 2D feature maps 816 using ‘n’ convolution kernels of size k×k×1).
Chen does not expressly disclose: determining, from among corresponding operation circuits that are connected to the first queue unit, an operating circuit in an idle state.
However, Cipar teaches: determining, from among corresponding operation circuits that are connected to the first queue unit, an operating circuit in an idle state (par. 0048, When a unit of data has reached a particular processing stage (processing stage i), the coordinator 203 determines if there is an idle resource available to execute the respective task for the unit of data (where this task is provided in the queue 510). The coordinator 203 first determines if there is an idle resource available [equiv. to operating unit in idle state]  from the respective stage-specific idle list 504 for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task to the idle resource; page 6, claim 5 wherein a particular one of the processing stages is associated with a set of resources dedicated to the particular processing stage, the method further comprising: if a resource from the dedicated set is available, using the resource from the dedicated set to process at least one of the tasks at the particular processing stage).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen to incorporate the method of assigning a task to an idle resource of a set of resources dedicated to each respective stage as set forth by Cipar, because it would provide for dynamically assigning available resources for performing operations on input feature maps based at least on the availability of a resource at a particular stage, in order to improve workload performance in such an apparatus.
Chen and Cipar do not expressly teach: tag information, corresponding to the operation group, including a position of a post-processing circuit, from among a plurality of post-processing circuits, to which the intermediate feature map is to be transferred.
However, Dean teaches: tag information, corresponding to the operation group, including a position of a post-processing circuit, from among a plurality of post-processing circuits, to which the intermediate feature map is to be transferred (par. 0028 … work queue master 214 uses input file information received from a file system to determine the appropriate processor [post processing unit] or process for executing a task; par. 0033 … For reduce tasks, the work queue master 214 may defer assigning any particular reduce task to an idle process [post processing unit]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen and Cipar by incorporating the method of determining a processor unit to process a reduce task as set forth by Dean, because it would provide for efficiently determining post-processing units based on profile information associated with a workload/job, with predictable results.
Chen, Cipar and Dean do not teach: wherein the corresponding operating circuits are among a plurality of operating circuits, wherein only some queue units, of the plurality of queue units, are connected in a fully-connected configuration to the corresponding operating circuits, from among a plurality of operating circuits, and remaining queue units of the plurality of queue units are connected to only one respective operating circuit; and wherein the some queue units include the first queue unit.
However, Kang teaches: wherein the corresponding operating circuits are among a plurality of operating circuits, wherein only some queue units, of the plurality of queue units, are connected in a fully-connected configuration to the corresponding operating circuits, from among a plurality of operating circuits, and remaining queue units of the plurality of queue units are connected to only one respective operating circuit; and wherein the some queue units include the first queue unit (pg. 182, Fig. 5 and 6 describe partially connected feedforward neural networks PCFNN, wherein some input units [queues] are fully connected to multiple nodes and remaining input units are connected to only one respective node).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen, Cipar and Dean by incorporating the technique of connecting inputs to corresponding computing nodes using a partially-connected configuration as set forth by Kang, because such a partially-connected configuration reduces the network complexity that causes slow training times, especially for large networks. This would provide redundancy that allows balancing of the workload (e.g., in case of failure) for certain types of inputs, while providing a dedicated computing node for other types of inputs.
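For illustration only, the combination relied upon for claim 1 can be sketched in Python. All names below (OperationGroup, conv2d_valid, dispatch, connections, idle, post_inboxes) are hypothetical and are not taken from the application or from Chen, Cipar, Dean or Kang. The sketch assumes a "valid" 2-D convolution, a first-idle selection policy, and tag information that holds the index of the destination post-processing circuit. Queue unit 0 is connected to both operating circuits (the "some queue units" in a fully-connected configuration), while queue units 1 and 2 are each connected to only one operating circuit.

```python
from dataclasses import dataclass


@dataclass
class OperationGroup:
    input_fmap: list  # 2-D input feature map (list of rows)
    kernel: list      # 2-D kernel (list of rows)
    tag: int          # assumed: index of the post-processing circuit for the result


def conv2d_valid(fmap, kernel):
    """Plain 2-D 'valid' convolution (cross-correlation, as in most CNN code)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    return [[sum(fmap[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def dispatch(queue_id, group, connections, idle, post_inboxes):
    """Send one operation group from a queue unit to an idle connected circuit.

    connections: queue-unit id -> list of connected operating-circuit ids
    idle: set of operating-circuit ids currently in the idle state
    post_inboxes: post-processing circuit id -> list of received intermediate maps
    Returns the chosen circuit id, or None if no connected circuit is idle.
    """
    for cid in connections[queue_id]:
        if cid in idle:
            idle.remove(cid)  # the chosen circuit is now busy
            intermediate = conv2d_valid(group.input_fmap, group.kernel)
            post_inboxes[group.tag].append(intermediate)  # route by tag information
            return cid
    return None  # no idle connected circuit; the group stays in the queue unit


connections = {0: [0, 1], 1: [0], 2: [1]}  # queue 0 fully connected; 1 and 2 dedicated
idle = {0, 1}
post_inboxes = {0: [], 1: []}
group = OperationGroup([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]], tag=1)
print(dispatch(0, group, connections, idle, post_inboxes))  # 0
print(dispatch(1, group, connections, idle, post_inboxes))  # None
```

In this sketch, dispatching from queue unit 1 while circuit 0 is busy returns None, reflecting that a queue unit with a single connection must wait for its dedicated circuit, whereas queue unit 0 can still be served by circuit 1.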

As per claim 4, Chen further teaches wherein the creating of the output feature map comprises determining, by the post-processing circuit, a partial sum of the intermediate feature map to create the output feature map (par. 0054 adder 405 receives combined feature maps 414 and sums combined feature maps 414 with input feature maps 411 to generate output feature maps 415).

As per claim 5, Chen further teaches wherein the creating of the output feature map includes performing, by the post-processing circuit, at least one of pooling or an activation function operation, after the use of the intermediate feature map by the post-processing circuit, to create the output feature map (par. 0034 Stage 301 (s1) generates feature maps 311 using any suitable convolutional technique or techniques. … Stage 301 (s1) may also include pooling).
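As an illustration of the post-processing steps recited in claims 4 and 5, a post-processing circuit can be sketched as accumulating a partial sum of the intermediate feature maps it receives and then applying an activation function. This is a sketch under stated assumptions: the partial sum is assumed to be an element-wise sum over the intermediate feature maps routed to the same post-processing circuit, ReLU is used as one example of an activation function, and the function names are hypothetical.

```python
def accumulate_partial_sum(partial, intermediate):
    """Add one intermediate feature map element-wise into the running partial sum."""
    if partial is None:
        return [row[:] for row in intermediate]  # the first map starts the sum
    return [[p + q for p, q in zip(p_row, q_row)]
            for p_row, q_row in zip(partial, intermediate)]


def relu(fmap):
    """Example activation function: max(0, x) applied element-wise."""
    return [[max(0, v) for v in row] for row in fmap]


def post_process(intermediates):
    """Hypothetical post-processing circuit: partial sum (claim 4), then activation (claim 5)."""
    if not intermediates:
        raise ValueError("at least one intermediate feature map is required")
    partial = None
    for fmap in intermediates:
        partial = accumulate_partial_sum(partial, fmap)
    return relu(partial)


print(post_process([[[1, -2], [3, 4]], [[-5, 1], [0, 2]]]))  # [[0, 0], [3, 6]]
```

Pooling, the other alternative recited in claim 5, could be applied to the accumulated map in place of or after the activation step.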

As per claim 6, Chen teaches the invention substantially as claimed including a method of implementing a neural network in a neural network apparatus, the method comprising:
acquiring a plurality of operation groups, with each operation group including at least one input feature map, at least one kernel (par. 0058 CNN stage 800 receives, via input 801 [queue], for example from a previous CNN stage, input feature maps 811 such that input feature maps 811 have ‘n’ channels. Furthermore, input feature maps 811 may have any suitable size such that input feature maps 811 provide an input volume to CNN stage. For example, input feature maps 811 may each have H×W elements and input feature maps 811 may have ‘n’ channels as discussed herein.); and
performing, at the one convolution node in the idle state, a convolution operation between an input feature map and a kernel included of the one operation group to create an intermediate feature map (par. 0058 … Depth-wise convolution module 802 receives input feature maps 811 and applies a depth-wise separable convolution to input feature maps 811 to generate multiple separate 2D feature maps 812. Depth-wise convolution module 802 applies a per-channel 2D convolution that outputs ‘n’ separate 2D feature maps 812 using ‘n’ convolution kernels of size k×k×1);
determining, from tag information of the one operation group, a post-processing node, from among the plurality of post-processing nodes; and creating, at the post-processing node, an output feature map from the intermediate feature map (par. 0059 Depth-wise convolution module 802 receives intermediate feature maps 815 (or combined feature maps 814) and applies a depth-wise separable convolution to intermediate feature maps 815 (or combined feature maps 814) to generate multiple separate 2D feature maps 816. Depth-wise convolution module 806 applies a per-channel 2D convolution that outputs ‘n’ separate 2D feature maps 816 using ‘n’ convolution kernels of size k×k×1).
Chen does not expressly disclose: for each of the plurality of operation groups: determining one convolution node in an idle state from among a plurality of convolution nodes of the neural network; transferring from one queue unit, one operation group among the plurality of operation groups, to the one convolution node in the idle state.
However, Cipar teaches: for each of the plurality of operation groups:
determining one convolution node in an idle state from among a plurality of convolution nodes of the neural network (par. 0048, When a unit of data has reached a particular processing stage (processing stage i), the coordinator 203 determines if there is an idle resource available to execute the respective task for the unit of data (where this task is provided in the queue 510). The coordinator 203 first determines if there is an idle resource available [equiv. to operating unit in idle state]  from the respective stage-specific idle list 504 for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task to the idle resource; page 6, claim 5 wherein a particular one of the processing stages is associated with a set of resources dedicated to the particular processing stage, the method further comprising: if a resource from the dedicated set is available, using the resource from the dedicated set to process at least one of the tasks at the particular processing stage).
transferring from one queue unit, one operation group among the plurality of operation groups, to the one convolution node in the idle state (par. 0048 … determines if there is an idle resource available … for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task [assign implies sending task] to the idle resource from the stage-specific idle list 504; par. 0025 In an example operation, input data can be submitted [transferred] to the first processing stage 1. After processing of the input data, the processing stage 1 provides [transfers] processed data to the next processing stage).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen to incorporate the method of assigning a task to an idle resource of a set of resources dedicated to each respective stage as set forth by Cipar, because it would provide for dynamically assigning available resources for performing operations on input feature maps based at least on the availability of a resource at a particular stage, in order to improve workload performance in such an apparatus.
Chen and Cipar do not expressly disclose: at least one tag information identifying a position of a corresponding post-processing node among a plurality of post-processing nodes of the neural network.
However, Dean teaches: at least one tag information identifying a position of a corresponding post-processing node among a plurality of post-processing nodes of the neural network (par. 0028 … work queue master 214 uses input file information received from a file system to determine the appropriate processor [post processing unit] or process for executing a task; par. 0033 … For reduce tasks, the work queue master 214 may defer assigning any particular reduce task to an idle process [post processing unit]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen and Cipar by incorporating the method of determining a processor unit to process a reduce task as set forth by Dean, because it would provide for efficiently determining post-processing units based on profile information associated with a workload/job, with predictable results.
Chen, Cipar and Dean do not teach: wherein some queue units, of a plurality of queue units of the neural network apparatus, are connected in a fully-connected configuration to some convolution nodes of the plurality of convolution nodes, and the some convolution nodes are connected in a corresponding fully-connected configuration to some post-processing nodes of the plurality of post-processing nodes, and wherein remaining queue units of the plurality of queue units are connected to only one corresponding convolution node and the remaining queue unit is connected to only one post-processing node.
However, Kang teaches: wherein some queue units, of a plurality of queue units of the neural network apparatus, are connected in a fully-connected configuration to some convolution nodes of the plurality of convolution nodes, and the some convolution nodes are connected in a corresponding fully-connected configuration to some post-processing nodes of the plurality of post-processing nodes, and wherein remaining queue units of the plurality of queue units are connected to only one corresponding convolution node and the remaining queue unit is connected to only one post-processing node (pg. 182, Fig. 5 and 6 describe partially connected feedforward neural networks PCFNN, wherein some input units [queues] are fully connected to multiple nodes and remaining input units are connected to only one respective node).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen, Cipar and Dean by incorporating the technique of connecting inputs to corresponding computing nodes using a partially-connected configuration as set forth by Kang, because such a partially-connected configuration reduces the network complexity that causes slow training times, especially for large networks. This would provide redundancy that allows balancing of the workload (e.g., in case of failure) for certain types of inputs, while providing a dedicated computing node for other types of inputs.

As per claim 10, it is a non-transitory computer-readable storage medium having similar limitations as claim 1. Thus, claim 10 is rejected for the same rationale as claim 1.

As per claim 11, it is a neural network apparatus having similar limitations as claim 1. Thus, claim 11 is rejected for the same rationale as applied to claim 1. Chen further teaches a processor (Fig. 12, Central Processor 1201).

As per claim 14, it is a neural network apparatus having similar limitations as claim 4. Thus, claim 14 is rejected for the same rationale as applied to claim 4.

As per claim 15, it is a neural network apparatus having similar limitations as claim 5. Thus, claim 15 is rejected for the same rationale as applied to claim 5.

As per claim 20, Chen further teaches a memory storing instructions that, when executed, configure the processor to acquire the operation group (par. 0033 … stage 301 (s1) receives normalized input image data 112, which is illustrated as 128×128 pixels of a single channel (e.g., grayscale). However, normalized input image data 112 may include any suitable input image data). Cipar further teaches configuring the processor to determine whether the operating circuit is in the idle state, to control the operating circuit in the idle state, to determine the post-processing circuit, and to control the post-processing circuit to create the output feature map (par. 0048, When a unit of data has reached a particular processing stage (processing stage i), the coordinator 203 determines if there is an idle resource available to execute the respective task for the unit of data (where this task is provided in the queue 510). The coordinator 203 first determines if there is an idle resource available from the respective stage-specific idle list 504 for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task to the idle resource).

As per claim 23, Kang further teaches wherein the first queue unit is connected to a first corresponding operating circuit and second corresponding operating circuit, and wherein a second queue unit of the plurality of queue units is connected to only the first corresponding operating circuit (pg. 182, Fig. 5 and 6 describe partially connected feedforward neural networks PCFNN, wherein some input units [queues] are fully connected to multiple nodes and remaining input units are connected to only one respective node).

Claims 2, 8, 12, 16, 18 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Cipar, Dean and Kang, as applied to claims 1 and 11, and further in view of Sinha (U.S. Pub. No. 20150317169 A1).
Sinha was cited in a previous Office Action.

As per claim 2, Chen, Cipar, Dean and Kang teach the limitations of claim 1. Cipar further teaches wherein the determining of the operating circuit in the idle state (par. 0048 a queue 510 of tasks that are waiting to be processed by the processing subsystem 200 … it is noted that there can be multiple queues 510, one for each processing stage) comprises: receiving, at a … [coordinator], a signal from the first queue unit indicating that a workload is to be processed, and a signal from the operating circuit in the idle state indicating that the operating circuit is in the idle state; matching, at the … [coordinator], the first queue unit with the operating circuit in the idle state (par. 0048 … When a unit of data has reached a particular processing stage (processing stage i), the coordinator 203 determines if there is an idle resource available to execute the respective task for the unit of data (where this task is provided in the queue 510). The coordinator 203 first determines if there is an idle resource available from the respective stage-specific idle list 504 for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task to the idle resource from the stage-specific idle list 504 [matches the queue]); and
transferring the one input feature map, the one kernel, and the tag information from the first queue unit to the operating circuit in the idle state (par. 0048 … determines if there is an idle resource available … for the particular processing stage i. If so, the coordinator 203 assigns work to perform the respective task [assign implies sending task] to the idle resource from the stage-specific idle list 504; par. 0025 In an example operation, input data can be submitted [transferred] to the first processing stage 1. After processing of the input data, the processing stage 1 provides [transfers] processed data to the next processing stage).
Chen, Cipar, Dean and Kang do not expressly disclose a load balancing unit.
However, Sinha teaches a load balancing unit (par. 0041 … In today's public-cloud environments, this is typically done through a service with a front-end load balancer acting as the arbiter and decision-maker on where a specific request should go, i.e., which physical server will receive the request). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen, Cipar, Dean and Kang to incorporate a load balancer unit as set forth by Sinha, because it would efficiently distribute the tasks/operations of a workload to be processed among the resources of the system, with predictable results.
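As an illustration of the receiving and matching steps recited in claim 2, a load-balancing arbiter can be sketched as a greedy, fixed-priority matcher between queue units that signal a pending workload and operating circuits that signal an idle state. The function name arbitrate, the priority ordering, and the connection map are assumptions made for this sketch and are not taken from the application, Cipar, or Sinha.

```python
def arbitrate(pending_queues, idle_circuits, connections):
    """Match each queue unit that signals pending work to one idle connected circuit.

    pending_queues: queue-unit ids that signalled a workload, in priority order
    idle_circuits: set of operating-circuit ids that signalled an idle state
    connections: queue-unit id -> list of connected operating-circuit ids
    Returns a dict mapping queue id -> granted circuit id; each circuit is
    granted at most once per arbitration round.
    """
    free = set(idle_circuits)
    grants = {}
    for q in pending_queues:
        for c in connections[q]:
            if c in free:
                grants[q] = c
                free.discard(c)
                break  # this queue unit is matched; move to the next one
    return grants


connections = {0: [0, 1], 1: [0], 2: [1]}
print(arbitrate([1, 0, 2], {0, 1}, connections))  # {1: 0, 0: 1}
```

Because the sketch is greedy, the grant order depends on priority: with the order [0, 1], queue unit 0 takes circuit 0 and queue unit 1, whose only connection is circuit 0, waits for the next round.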

As per claim 8, Sinha teaches wherein the load balancing circuit comprises an arbiter (par. 0041 … In today's public-cloud environments, this is typically done through a service with a front-end load balancer acting as the arbiter and decision-maker on where a specific request should go, i.e., which physical server will receive the request).

As per claim 12, it is a neural network apparatus having similar limitations as claim 2. Thus, claim 12 is rejected for the same rationale as applied to claim 2.

As per claim 16, it is a neural network apparatus having similar limitations as claim 6. Thus, claim 16 is rejected for the same rationale as applied to claim 6.

As per claim 18, it is a neural network apparatus having similar limitations as claim 8. Thus, claim 18 is rejected for the same rationale as applied to claim 8.

As per claim 22, Kang teaches wherein some operating circuits of the plurality of operating circuits are connected to each of some post-processing circuits of the plurality of post-processing circuits in a corresponding fully-connected configuration (pg. 182, Fig. 5 and 6 describe a partially connected neural network, wherein input units [queues] are connected to a hidden layer comprising a plurality of intermediate and post-processing nodes).

Claims 3, 9, 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Cipar, Dean and Kang as applied to claims 1 and 11 above, further in view of Dhong (U.S. Pub. No. 20020163881 A1).
Dhong was cited in a previous Office Action.

As per claim 3, Chen, Cipar, Dean and Kang teach the limitations of claim 1. Cipar further teaches wherein the plurality of operating circuits are connected to the plurality of post-processing circuits …, and receiving … the intermediate feature map and the tag information from the operating circuit; and transferring … the intermediate feature map to the post-processing circuit … (par. 0020 … a map function processes corresponding segments of input data to produce intermediate results, where each of the multiple map tasks (that are based on the map function) processes corresponding segments of the input data. For example, the map tasks process input key-value pairs to generate a set of intermediate key-value pairs. The reduce tasks (based on the reduce function) produce an output from the intermediate results; Fig. 2 and 0025 In an example operation, input data can be submitted to the first processing stage 1. After processing of the input data, the processing stage 1 provides processed data to the next processing stage, which applies further processing on the data. This flow continues until the processed data reaches the last stage).
Chen, Cipar, Dean and Kang do not expressly describe a connection.
However, Dhong teaches a connection (page 7, claim 8, (a) the communications bus is also connected between a number of additional source nodes and the same number of additional destination nodes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen, Cipar, Dean and Kang to incorporate a communication bus as set forth by Dhong, because it would provide for communicating workload task output between the processing nodes, with predictable results.

As per claim 9, Dhong teaches wherein the connection comprises at least one of a multiplexer or a bus (page 7, claim 8, (a) the communications bus is also connected between a number of additional source nodes and the same number of additional destination nodes).

As per claim 13, it is a neural network apparatus having similar limitations as claim 3. Thus, claim 13 is rejected for the same rationale as applied to claim 3.

As per claim 19, it is a neural network apparatus having similar limitations as claim 9. Thus, claim 19 is rejected for the same rationale as applied to claim 9.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Cipar, Dean and Kang as applied to claims 1 and 11, further in view of Sprangle et al. (U.S. Pub. No. 20140149717 A1).
Sprangle was cited in a previous Office Action.

As per claim 7, Chen, Cipar, Dean and Kang teach the limitations of claim 1. Chen, Cipar, Dean and Kang do not expressly teach: wherein the operating units comprise multiply and accumulate (MAC) units.
However, Sprangle teaches wherein the operating units comprise multiply and accumulate (MAC) units (par. 0058 … As described above, the processor cores may include MAC units or other logic to perform user-level multiply-multiply instructions in accordance with an embodiment of the present invention).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Chen, Cipar, Dean and Kang to include MAC units as disclosed by Sprangle because providing MAC units as operating units would allow for computing complex multiplication operations, with predictable results.

As per claim 17, it is a neural network apparatus having similar limitations as claim 7. Thus, claim 17 is rejected for the same rationale as applied to claim 7.


Response to Arguments
Applicant's arguments with respect to claims 1, 6, 10 and 11 have been considered but are moot in view of the new ground(s) of rejection.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Willy W. Huaracha whose telephone number is (571)270-5510.  The examiner can normally be reached on M-F 8:30-5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WH/
Examiner, Art Unit 2195

/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195