DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 11/6/2021, 10/6/2021, 8/11/2021, 8/9/2021, and 7/26/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 7 and 16 recite the limitation "said NN processor" in "wherein path separation begins within said NN processor at one or more input buffers (IBs) or L3 memory." There is insufficient antecedent basis for this limitation in the claims. For purposes of examination, the Examiner interprets "said NN processor" as "said neural network processor."
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-6, 10, 12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Saxena et al. (US-20210279055-A1) in view of Huynh et al. (US-11232016-B1).
Regarding Claim 1,
Saxena (US 20210279055 A1) teaches a method of end to end failure detection for use in a neural network processor, the method comprising: 
providing a plurality of redundant hardware resources (fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).) in said neural network processor (para [0166] In at least one embodiment, one or more of SoC(s) 1604 may include data store(s) 1616 (e.g., memory). In at least one embodiment, data store(s) 1616 may be on-chip memory of SoC(s) 1604, which may store neural networks to be executed on GPU(s) 1608 and/or DLA.); 
allocating a main computational path (para [0068] FIG. 1 illustrates a flowchart of a technique 100 of determining a transformation result based on at least one bit matrix multiply accumulate (BMMA) operation, according to at least one embodiment. BMMA operation (i.e. main computational path).) from said plurality of redundant hardware resources (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction. And fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).), said main computational path to be protected from end to end failures (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.); 
allocating one or more redundant computational paths (para [0073] In at least one embodiment, determining a transformation result based, at least in part, on performing one or more BMMA operations at block 106 includes generating an encoded output set of bits that represents an encoded set of data based, at least in part on an input set of bits. In at least one embodiment, encoded output set of bits represents a low density parity check (LDPC) encoded set of data. In at least one embodiment, one or more processors generate encoded output set of bits in response to performing one or more sets of BMMA operations. One or more sets of BMMA operations (i.e. one or more redundant computational paths).) from said plurality of redundant hardware resources (para [0110] and para [0119]), said one or more redundant computational paths operative to protect said main computational path from end to end failures (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.); 
tensor stream data output from said main computational path (para [0085] In at least one embodiment, BMMA instruction is a linear tensor instruction) and said one or more redundant computational paths (para [0452] In at least one embodiment, tensor cores are configured to perform deep learning matrix arithmetic, such as convolution operations for neural network training and inferencing. In at least one embodiment, each tensor core operates on a 4×4 matrix and performs a matrix multiply and accumulate operation D=A×B+C, where A, B, C, and D are 4×4 matrices. Tensor data output by BMMA (matrix multiply and accumulate) computational paths.); and 
detecting an error if said calculated CRC checksums do not match (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.).
While Saxena teaches tensor data output from a computational path, Saxena does not explicitly disclose calculating cyclic redundancy code (CRC) checksums on the tensor data output. In other words, Saxena does not explicitly disclose
calculating cyclic redundancy code (CRC) checksums on tensor stream data output from said main computational path and said one or more redundant computational paths;
However, Huynh (US 11232016 B1) teaches
calculating cyclic redundancy code (CRC) checksums on tensor stream data (Col. 3 lines 7-16; Training an artificial neural network or using the trained artificial neural network for inference generally requires a significant amount of computation power to perform, for example, the matrix multiplications or convolutions. Thus, specialized hardware circuits, such as graphic processing units (GPUs), tensor processing units (TPUs), neural network processing units (NPUs), FPGAs, ASICs, or other highly parallel processing circuits may be used for the training and/or inference.) output from said main computational path and said one or more redundant computational paths (Col. 2 lines 16-18; In some embodiments, one or more cyclic redundancy check (CRC) circuits may be added at the input and/or output of each processing engine of a neural network. And Col. 12 lines 21-29; At block 420, the compiler may calculate expected debug outputs, such as error detection codes (e.g., CRC bits), for various operations and instructions described in the neural network model. For example, the compiler may compute the ideal or expected CRC bits for the input data for an operation (e.g., filtering, convolution, activation, pooling, etc.) and the CRC bits for the output data of the operation based on the neural network model described in a high-level programming language, such as a functional C model.);
detecting an error if said calculated CRC checksums do not match (Col. 10 lines 38-43; If the CRC bits for the input data match the expected CRC bits for the input data for an instruction, but the CRC bits for the output data do not match the expected CRC bits for the output data for the instruction, the processing engine may have malfunctioned for at least that instruction.);
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the tensor data of Saxena with the CRC error detection of Huynh.
Doing so would allow CRC signatures to be logged, significantly reducing the total amount of debug data logged and thereby minimizing the impact of the debugging circuit on the performance of the system (Huynh Col. 2 lines 26-32).
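For illustration only, the combination applied above to claim 1 (calculating CRC checksums on tensor stream data output from a main computational path and one or more redundant computational paths, and detecting an error when the checksums do not match) can be sketched in Python. This sketch is not represented as the Applicant's implementation or as Huynh's CRC circuit. Serializing each tensor stream as raw bytes, using the standard CRC-32 polynomial through zlib.crc32, and the names tensor_crc and detect_path_mismatch are all assumptions.

```python
import zlib
from array import array
from typing import Sequence


def tensor_crc(stream: bytes) -> int:
    """CRC-32 (zlib polynomial) over a tensor stream serialized as raw bytes."""
    return zlib.crc32(stream) & 0xFFFFFFFF


def detect_path_mismatch(main_out: bytes, redundant_outs: Sequence[bytes]) -> bool:
    """Return True (error detected) if any redundant path's CRC differs from the main path's CRC."""
    main_crc = tensor_crc(main_out)
    return any(tensor_crc(out) != main_crc for out in redundant_outs)


if __name__ == "__main__":
    # Hypothetical tensor output: three float32 values serialized to bytes.
    good = array("f", [0.5, -1.25, 3.0]).tobytes()
    faulty = bytearray(good)
    faulty[0] ^= 0x01  # inject a single-bit fault into one redundant path's output
    print(detect_path_mismatch(good, [good]))                 # False: paths agree
    print(detect_path_mismatch(good, [good, bytes(faulty)]))  # True: mismatch flagged
```

Because CRC-32 detects every single-bit error, the injected fault in the second call is always flagged.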
Regarding Claim 3,
Saxena and Huynh teach the method according to claim 1. Saxena further teaches wherein said main computational path and said one or more redundant computational paths are functionally identical to each other (para [0073] In at least one embodiment, determining a transformation result based, at least in part, on performing one or more BMMA operations at block 106 includes generating an encoded output set of bits that represents an encoded set of data based, at least in part on an input set of bits. In at least one embodiment, encoded output set of bits represents a low density parity check (LDPC) encoded set of data. In at least one embodiment, one or more processors generate encoded output set of bits in response to performing one or more sets of BMMA operations. BMMA operations are functionally identical.).
Regarding Claim 5,
Saxena and Huynh teach the method according to claim 1. Saxena further teaches wherein said main computational path and said one or more redundant computational paths use different control resources selected from a group consisting of layer controller units (LCUs) and memory management units (MMUs) (para [0111] In at least one embodiment, separate groupings of node C.R.s within grouped computing resources 1514 may include grouped compute, network, memory or storage resources that may be configured or allocated to support one or more workloads. Memory or storage computing resources (i.e. memory management units).).
Regarding Claim 6,
Saxena and Huynh teach the method according to claim 1. Saxena further teaches wherein hardware resources for said main computational path and said one or more redundant computational paths are allocated in different clusters for each path (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction. Grouped (i.e. clustered).).
Regarding Claim 10,
Saxena teaches an apparatus for end to end failure detection for use in a neural network processor, comprising: 
a plurality of redundant hardware resources (fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).) within said neural network processor (para [0166] In at least one embodiment, one or more of SoC(s) 1604 may include data store(s) 1616 (e.g., memory). In at least one embodiment, data store(s) 1616 may be on-chip memory of SoC(s) 1604, which may store neural networks to be executed on GPU(s) 1608 and/or DLA.); 
a main computational path (para [0068] FIG. 1 illustrates a flowchart of a technique 100 of determining a transformation result based on at least one bit matrix multiply accumulate (BMMA) operation, according to at least one embodiment. BMMA operation (i.e. main computational path).) allocated from said plurality of redundant hardware resources (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction. And fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).), said main computational path to be protected from end to end failures (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.); 
one or more redundant computational paths (para [0073] In at least one embodiment, determining a transformation result based, at least in part, on performing one or more BMMA operations at block 106 includes generating an encoded output set of bits that represents an encoded set of data based, at least in part on an input set of bits. In at least one embodiment, encoded output set of bits represents a low density parity check (LDPC) encoded set of data. In at least one embodiment, one or more processors generate encoded output set of bits in response to performing one or more sets of BMMA operations. One or more sets of BMMA operations (i.e. one or more redundant computational paths).) allocated from said plurality of redundant hardware resources (para [0110] and para [0119]), said one or more redundant computational paths operative to protect said main computational path from end to end failures (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.); and 
tensor stream data (para [0085] In at least one embodiment, BMMA instruction is a linear tensor instruction) output from said main computational path (para [0452] In at least one embodiment, tensor cores are configured to perform deep learning matrix arithmetic, such as convolution operations for neural network training and inferencing. In at least one embodiment, each tensor core operates on a 4×4 matrix and performs a matrix multiply and accumulate operation D=A×B+C, where A, B, C, and D are 4×4 matrices. Tensor data output by BMMA (matrix multiply and accumulate) computational paths.) and said one or more redundant computational paths and to detect an error if said calculated CRC checksums do not match (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.).
While Saxena teaches tensor data output from a computational path, Saxena does not explicitly disclose calculating cyclic redundancy code (CRC) checksums on the tensor data output. In other words, Saxena does not explicitly disclose
one or more cyclic redundancy code (CRC) engines operative to generate CRC checksums on tensor stream data output from said main computational path and said one or more redundant computational paths…
However, Huynh (US 11232016 B1) teaches
one or more cyclic redundancy code (CRC) engines operative to generate CRC checksums on tensor stream data (Col. 3 lines 7-16; Training an artificial neural network or using the trained artificial neural network for inference generally requires a significant amount of computation power to perform, for example, the matrix multiplications or convolutions. Thus, specialized hardware circuits, such as graphic processing units (GPUs), tensor processing units (TPUs), neural network processing units (NPUs), FPGAs, ASICs, or other highly parallel processing circuits may be used for the training and/or inference.) output from said main computational path (Col. 2 lines 16-18; In some embodiments, one or more cyclic redundancy check (CRC) circuits may be added at the input and/or output of each processing engine of a neural network. And Col. 12 lines 21-29; At block 420, the compiler may calculate expected debug outputs, such as error detection codes (e.g., CRC bits), for various operations and instructions described in the neural network model. For example, the compiler may compute the ideal or expected CRC bits for the input data for an operation (e.g., filtering, convolution, activation, pooling, etc.) and the CRC bits for the output data of the operation based on the neural network model described in a high-level programming language, such as a functional C model.) and said one or more redundant computational paths and to detect an error if said calculated CRC checksums do not match (Col. 10 lines 38-43; If the CRC bits for the input data match the expected CRC bits for the input data for an instruction, but the CRC bits for the output data do not match the expected CRC bits for the output data for the instruction, the processing engine may have malfunctioned for at least that instruction.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the tensor data of Saxena with the CRC error detection of Huynh.
Doing so would allow CRC signatures to be logged, significantly reducing the total amount of debug data logged and thereby minimizing the impact of the debugging circuit on the performance of the system (Huynh Col. 2 lines 26-32).
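Huynh's diagnostic logic quoted above (Col. 10 lines 38-43), in which compiler-computed expected CRC bits for an instruction's input and output data are compared with the CRC bits observed at run time, can be sketched as follows. This is an illustrative reading only. The record type ExpectedCrc, the function check_instruction, and the handling of an input-side mismatch (reported here as an upstream mismatch) are hypothetical and are not taken from Huynh.

```python
import zlib
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    ENGINE_FAULT = "engine_fault"            # input CRC matched, output CRC did not
    UPSTREAM_MISMATCH = "upstream_mismatch"  # input CRC already differed (assumed handling)


@dataclass(frozen=True)
class ExpectedCrc:
    """Hypothetical record of compiler-precomputed CRCs for one instruction."""
    input_crc: int
    output_crc: int


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def check_instruction(expected: ExpectedCrc, observed_in: bytes, observed_out: bytes) -> Verdict:
    """Compare observed input/output CRCs against the expected values for one instruction."""
    if crc32(observed_in) != expected.input_crc:
        return Verdict.UPSTREAM_MISMATCH
    if crc32(observed_out) != expected.output_crc:
        # Matching input but mismatching output: the processing engine may have
        # malfunctioned for at least this instruction.
        return Verdict.ENGINE_FAULT
    return Verdict.OK
```

Checking the input CRC first separates a fault inside the processing engine from bad data that arrived at its input.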
Regarding Claim 12,
Claim 12 is the apparatus claim corresponding to method claim 3. Claim 12 is substantially similar to claim 3 and is rejected on the same grounds.
Regarding Claim 14,
Claim 14 is the apparatus claim corresponding to method claim 5. Claim 14 is substantially similar to claim 5 and is rejected on the same grounds.
Regarding Claim 15,
Claim 15 is the apparatus claim corresponding to method claim 6. Claim 15 is substantially similar to claim 6 and is rejected on the same grounds.

Claims 2, 11, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Saxena et al. (US-20210279055-A1) in view of Huynh et al. (US-11232016-B1) and Brady et al. (US 20190392296 A1).
Regarding Claim 2,
Saxena and Huynh teach the method according to claim 1. 
	Saxena and Huynh do not explicitly disclose
wherein said allocation is determined a priori by a compiler in accordance with desired performance goals for a target neural network.
However, Brady (US 20190392296 A1) teaches
wherein said allocation is determined a priori by a compiler in accordance with desired performance goals for a target neural network (para [0057] In some implementations, a composition API may be provided, which is configured to generate an intermediate representation, or “computation model” 140, for the particular neural network. In some instances, an operation registry 1212 may be provided to define, within the compiler, a number of operations of which the compiler 105 is familiar and that may correspond to nodes in example neural network graphs. The operation registry 1212 may be used to define how the compiler is to handle allocation of hardware resources in order to enable performance of the particular operation. And para [0059] The entries on such a list and their order may be specific for both target platform and compilation objective, for instance to optimize for performance or optimize for size.); 
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network resources of Saxena and Huynh with the resource allocation of Brady.
Doing so would allow for optimizing performance and size of the neural network for different platforms and devices (Brady para [0059]).
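To illustrate the a priori allocation addressed in claim 2, together with the per-path cluster separation of claims 6, 15, and 20, the following hypothetical compile-time allocator assumes a performance goal expressed simply as the number of redundant paths to build, and places the main path and each redundant path in a different cluster. The names Goal and allocate_paths and the string cluster identifiers are illustrative assumptions. Neither Brady nor the present application is represented as using this code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Goal:
    """Hypothetical compile-time goal: number of redundant paths beside the main path."""
    redundant_paths: int


def allocate_paths(clusters: List[str], goal: Goal) -> Dict[str, str]:
    """Map each path name to a distinct cluster; decided before run time."""
    needed = 1 + goal.redundant_paths
    if needed > len(clusters):
        raise ValueError(f"goal needs {needed} clusters, only {len(clusters)} available")
    names = ["main"] + [f"redundant{i}" for i in range(1, needed)]
    # zip pairs each path with a different cluster, so no two paths share one.
    return dict(zip(names, clusters))
```

Raising an error when too few clusters are available reflects the assumption that the allocation must be fully resolved at compile time rather than at run time.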
Regarding Claim 11,
Claim 11 is the apparatus claim corresponding to method claim 2. Claim 11 is substantially similar to claim 2 and is rejected on the same grounds.
Regarding Claim 17,
Saxena (US 20210279055 A1) teaches a method of end to end failure detection for use in a neural network processor, the method comprising: 
providing a plurality of redundant hardware resources in said neural network processor (fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).); 
configuring a plurality of redundant computational tensor data flow paths (para [0073] In at least one embodiment, determining a transformation result based, at least in part, on performing one or more BMMA operations at block 106 includes generating an encoded output set of bits that represents an encoded set of data based, at least in part on an input set of bits. In at least one embodiment, encoded output set of bits represents a low density parity check (LDPC) encoded set of data. In at least one embodiment, one or more processors generate encoded output set of bits in response to performing one or more sets of BMMA operations. One or more sets of BMMA operations (i.e. one or more redundant computational paths).) from said plurality of redundant hardware resources (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction. And fig. 15; para [0110] In at least one embodiment, as shown in FIG. 15, data center infrastructure layer 1510 may include a resource orchestrator 1512, grouped computing resources 1514, and node computing resources (“node C.R.s”) 1516(1)-1516(N), where “N” represents any whole, positive integer. Resources 1516(1)-1516(N) (i.e. redundant resources).), said plurality of redundant computational tensor data flow paths (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction.) functionally identical to each other (para [0073] In at least one embodiment, determining a transformation result based, at least in part, on performing one or more BMMA operations at block 106 includes generating an encoded output set of bits that represents an encoded set of data based, at least in part on an input set of bits. In at least one embodiment, encoded output set of bits represents a low density parity check (LDPC) encoded set of data. In at least one embodiment, one or more processors generate encoded output set of bits in response to performing one or more sets of BMMA operations. BMMA operations are functionally identical.) and operative to provide protection from end to end failures by way of said redundancy (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.); and 
redundant computation tensor data (para [0085] In at least one embodiment, BMMA instruction is a linear tensor instruction) flow path (para [0452] In at least one embodiment, tensor cores are configured to perform deep learning matrix arithmetic, such as convolution operations for neural network training and inferencing. In at least one embodiment, each tensor core operates on a 4×4 matrix and performs a matrix multiply and accumulate operation D=A×B+C, where A, B, C, and D are 4×4 matrices. Tensor data output by BMMA (matrix multiply and accumulate) computational paths.) and detecting an error if a mismatch is detected (para [0566] In at least one embodiment, a transport block may be data that is intended to be transmitted. In at least one embodiment, a transmission in a physical layer starts with grouped resource data, which may be referred to as transport blocks. In at least one embodiment, a transport block is received by a cyclic redundancy check (CRC) 4902. In at least one embodiment, a cyclic redundancy check is appended to each transport block for error detection.).
Saxena does not explicitly disclose
determining a resource allocation scheme in accordance with desired performance goals for a target neural network; 
comparing cyclic redundancy code (CRC) checksums generated for each redundant computation tensor data flow path and detecting an error if a mismatch is detected;
However, Brady (US 20190392296 A1) teaches
determining a resource allocation scheme in accordance with desired performance goals for a target neural network (para [0057] In some implementations, a composition API may be provided, which is configured to generate an intermediate representation, or “computation model” 140, for the particular neural network. In some instances, an operation registry 1212 may be provided to define, within the compiler, a number of operations of which the compiler 105 is familiar and that may correspond to nodes in example neural network graphs. The operation registry 1212 may be used to define how the compiler is to handle allocation of hardware resources in order to enable performance of the particular operation. And para [0059] The entries on such a list and their order may be specific for both target platform and compilation objective, for instance to optimize for performance or optimize for size.); 
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network resources of Saxena with the resource allocation of Brady.
Doing so would allow for optimizing performance and size of the neural network for different platforms and devices (Brady para [0059]).
Further, Huynh (US 11232016 B1) teaches
comparing cyclic redundancy code (CRC) checksums generated for each redundant computation tensor data (Col. 3 lines 7-16; Training an artificial neural network or using the trained artificial neural network for inference generally requires a significant amount of computation power to perform, for example, the matrix multiplications or convolutions. Thus, specialized hardware circuits, such as graphic processing units (GPUs), tensor processing units (TPUs), neural network processing units (NPUs), FPGAs, ASICs, or other highly parallel processing circuits may be used for the training and/or inference.) flow path (Col. 2 lines 16-18; In some embodiments, one or more cyclic redundancy check (CRC) circuits may be added at the input and/or output of each processing engine of a neural network. And Col. 12 lines 21-29; At block 420, the compiler may calculate expected debug outputs, such as error detection codes (e.g., CRC bits), for various operations and instructions described in the neural network model. For example, the compiler may compute the ideal or expected CRC bits for the input data for an operation (e.g., filtering, convolution, activation, pooling, etc.) and the CRC bits for the output data of the operation based on the neural network model described in a high-level programming language, such as a functional C model.) and detecting an error if a mismatch is detected (Col. 10 lines 38-43; If the CRC bits for the input data match the expected CRC bits for the input data for an instruction, but the CRC bits for the output data do not match the expected CRC bits for the output data for the instruction, the processing engine may have malfunctioned for at least that instruction.);
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the tensor data of Saxena with the CRC error detection of Huynh.
Doing so would allow CRC signatures to be logged, significantly reducing the total amount of debug data logged and thereby minimizing the impact of the debugging circuit on the performance of the system (Huynh Col. 2 lines 26-32).
Regarding Claim 19,
Saxena, Brady, and Huynh teach the method according to claim 17. Saxena further teaches wherein said plurality of redundant computational tensor data flow paths use different control resources selected from a group consisting of layer controller units (LCUs) and memory management units (MMUs) (para [0111] In at least one embodiment, separate groupings of node C.R.s within grouped computing resources 1514 may include grouped compute, network, memory or storage resources that may be configured or allocated to support one or more workloads. Memory or storage computing resources (i.e. memory management units).).
Regarding Claim 20,
Saxena, Brady, and Huynh teach the method according to claim 17. Saxena further teaches wherein hardware resources for said plurality of redundant computational tensor data flow paths are allocated in different clusters for each path (para [0119] In at least one embodiment, at least one of grouped computing resources 1514 and node C.R. 1516 are used to determine a transformation result based, at least in part, on executing at least one BMMA instruction. Grouped (i.e. clustered).).

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saxena/Huynh, as applied above, and further in view of Banyai et al. (US 20200136943 A1).
Regarding Claim 4,
Saxena and Huynh teach the method according to claim 1.
	Saxena and Huynh do not explicitly disclose 
wherein said main computational path and said one or more redundant computational paths use different data resources selected from a group consisting of stream managers (SMs), portions of L4 memory allocated to said stream managers, input buffers (IBs), portions of L3 memory, input aligners (IAs), subsclusters (SCs), activation processing units (APUs), and output buffers (OBs).
However, Banyai (US 20200136943 A1) teaches
wherein said main computational path and said one or more redundant computational paths use different data resources selected from a group consisting of stream managers (SMs), portions of L4 memory allocated to said stream managers, input buffers (IBs), portions of L3 memory, input aligners (IAs), subsclusters (SCs), activation processing units (APUs), and output buffers (OBs) (para [0063] At block 800, a fixed number of cache ways (for example, second subset of cache ways 706) are allocated in the L3 cache 606 to store data shared by the solid-state state drive 404 and the network interface controller 302. Portions of L3 memory.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network resources of Saxena and Huynh with the memory resource allocation of Banyai.
Doing so would reduce workload variability, providing a more precise and predictable allocation of storage resources, which enables more accurate service level predictability (Banyai para [0059]).
Regarding Claim 13,
Claim 13 is an apparatus claim substantially similar to method claim 4 and is rejected on the same grounds.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saxena/Huynh, as applied above, and further in view of Shao et al. (US 20200174829 A1).
Regarding Claim 7,
Saxena and Huynh teach the method according to claim 1. 
	Saxena and Huynh do not explicitly disclose
wherein path separation begins within said NN processor at one or more input buffers (IBs) or L3 memory.
However, Shao (US 20200174829 A1) teaches
wherein path separation begins within said NN processor at one or more input buffers (IBs) or L3 memory (para [0065] The demultiplexer may allocate tasks to a processing unit by marking or otherwise identifying tasks of the workload as being for processing at a particular processing unit—for example, the demultiplexer may cause tasks to be allocated to a processing unit by allocating the task to an input buffer of that processing unit from which the processing unit retrieves its tasks.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the resource allocation of Saxena and Huynh with the task allocation of Shao.
Doing so would allow for increased computational performance, reduced latency, and increased throughput (Shao para [0132]).
Regarding Claim 16,
Claim 16 is an apparatus claim substantially similar to method claim 7 and is rejected on the same grounds.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saxena/Huynh, as applied above, and further in view of Morrison et al. (US 9361104 B2).
Regarding Claim 8,
Saxena and Huynh teach the method according to claim 1. 
	Saxena and Huynh do not explicitly disclose
further comprising providing built in self-test (BIST) ability where one of said calculated CRC checksums is intentionally altered so as to force generation of an error signal.
However, Morrison (US 9361104 B2) teaches
further comprising providing built in self-test (BIST) ability where one of said calculated CRC checksums is intentionally altered so as to force generation of an error signal (col. 2 lines 15-18; By using a subsequent cross-check instruction that is different than the reference instruction to check execution of the reference instruction, hard (permanent) failures can be detected that could not be detected by re-executing the same instruction. The use of temporal redundancy in conjunction with built-in self testing (BIST) and cyclic redundancy check (CRC) mechanisms can reduce the need for lockstep processing.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CRC of Saxena and Huynh with the built-in self testing of Morrison.
Doing so would reduce the need for lockstep processing (Morrison col. 2 lines 15-18).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saxena/Huynh, as applied above, and further in view of Paek et al. (US 20200379740 A1).
Regarding Claim 9,
Saxena and Huynh teach the method according to claim 1. 
	Saxena and Huynh do not explicitly disclose
wherein hardware resources for said main computational path and said one or more redundant computational paths are allocated on a per layer basis.
However, Paek (US 20200379740 A1) teaches
wherein hardware resources for said main computational path and said one or more redundant computational paths are allocated on a per layer basis (para [0078] In an example, the code includes particular code (e.g., C code) corresponding to allocations of memory for each layer of the set of layers. Moreover, determining the respective memory allocation for each respective layer is based at least in part on a resource constraint (e.g., a total amount of memory and/or an amount of available memory) of a target device (e.g., the wireless audio output device 104).).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network of Saxena and Huynh with the resource allocation of Paek.
Doing so would improve the performance of neural networks by avoiding dynamic memory allocation techniques, which may not be feasible for some processors to perform (Paek para [0016]). 

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Saxena/Brady/Huynh, as applied above, and further in view of Banyai et al. (US 20200136943 A1).
Regarding Claim 18,
Saxena, Brady, and Huynh teach the method according to claim 17. 
	Saxena, Brady, and Huynh do not explicitly disclose
wherein said plurality of redundant computational tensor data flow paths use different data resources selected from a group consisting of stream managers (SMs), portions of L4 memory allocated to said stream managers, input buffers (IBs), portions of L3 memory, input aligners (IAs), subsclusters (SCs), activation processing units (APUs), and output buffers (OBs).
However, Banyai teaches 
wherein said plurality of redundant computational tensor data flow paths use different data resources selected from a group consisting of stream managers (SMs), portions of L4 memory allocated to said stream managers, input buffers (IBs), portions of L3 memory, input aligners (IAs), subsclusters (SCs), activation processing units (APUs), and output buffers (OBs) (para [0063] At block 800, a fixed number of cache ways (for example, second subset of cache ways 706) are allocated in the L3 cache 606 to store data shared by the solid-state state drive 404 and the network interface controller 302.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the neural network resources of Saxena, Brady, and Huynh with the memory resource allocation of Banyai.
Doing so would reduce workload variability, providing a more precise and predictable allocation of storage resources, which enables more accurate service level predictability (Banyai para [0059]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Park et al. (US 20220248387 A1) discloses resource allocation and cyclic redundancy check.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.N./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121