DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant, via amendment, has overcome the rejections under 35 U.S.C. 112(a) and 112(b) set forth in the previous Office action. Specifically, claim 1 has been amended so that it is no longer interpreted under 35 U.S.C. 112(f), which overcomes the rejections under 35 U.S.C. 112(a) and 112(b). Therefore, the rejections have been withdrawn.
Applicant’s arguments, see page 8 of reply, filed 12/01/2021, with respect to the rejection(s) of claim(s) 1, 5, 6, 8, 9, 14, 18, and 19 under 35 U.S.C. 102(a)(1) as being anticipated by Korthikanti et al., U.S. Patent No. 10,169,296 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Or Sharir and Amnon Shashua, “On the Expressive Power of Overlapping Operations of Deep Networks”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
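3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.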

Claims 1, 5, 8, 9, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Korthikanti et al., U.S. Patent No. 10,169,296 (herein Korthikanti) in view of Or Sharir and Amnon Shashua, “On the Expressive Power of Overlapping Operations of Deep Networks” (herein Sharir).

Regarding claims 1 and 14, taking claim 1 as exemplary, Korthikanti teaches a neural network processor [The processing system is used in implementing artificial neural networks. Korthikanti at column 4, lines 10-22], comprising: 
a plurality of neurons [Processing resources 210 (e.g. matrix processing units 234). Korthikanti at column 9, lines 20-46; FIGS. 2A-2C]; and 
a group partitioner and scheduler unit [Master control CPU (MCC) 232. Korthikanti at column 11, lines 31-36] configured to 
divide a workload for the neural network processor into a plurality of partitions based on a quantity of the plurality of neurons [MCC 232 uses the slicing engine 236 to partition the matrix operands (i.e. input and weight volumes) based on the number of available processing resources 210 (i.e. neurons). Korthikanti at column 12, lines 11-16; column 14, lines 1-7; column 24, line 57 – column 25, line 11; FIG. 9], and 
assign a group of the neurons to each of the plurality of partitions [MCC 232 distributes the matrix operands across the processing resources, such as across MPUs in a cluster 230 (i.e. groups of neurons) or across logical processing nodes (i.e. groups of neurons). Korthikanti at column 10, lines 16-20; column 13, line 65 – column 14, line 7; column 14, lines 26-46; column 25, lines 12-25; FIG. 9] to maximize a total number of the plurality of neurons that simultaneously process the workload while reducing power consumption [The operations are scaled/distributed across the processing resources achieving 100% processing efficiency, thereby maximizing processing and reducing power consumption. Korthikanti at column 5, lines 25-31; column 12, lines 52-60; column 13, lines 5-12]; and 
wherein the neurons within each group of neurons are configured to: 
process a workload defined by an assigned partition to generate a partial output value [The processing resources 210 (e.g. MPU 234) process the matrix portion to generate partial results. Korthikanti at column 11, lines 39-52; column 14, lines 7-14; column 25, lines 36-40; FIG. 9] by
performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume [The processing includes performing convolution operations on operands/smaller matrices of input/activations matrices and weight matrices (i.e. a portion of an input volume and a portion of a weight volume). Korthikanti at column 9, lines 20-26; column 11, lines 39-52; column 12, lines 11-2; column 18, lines 24-55], and
sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload [The processing resources 210 receive partial matrix data (i.e. partial output values) from neighboring processing resources 210 (i.e. in each group of neurons) and continue matrix operations, which includes summing of the received partial matrix data. Korthikanti at column 14, lines 10-25; column 22, lines 37-47; column 25, line 41 – column 26, line 3; FIG. 9].
Korthikanti does not teach that the convolution operation is performed on overlapping intervals defined by strides in two dimensions. In the same field of neural network processing, Sharir teaches performing a convolution operation on overlapping intervals defined by strides in two dimensions [The convolution operation is performed on overlapping sections and is defined by strides S x S (i.e. strides in two dimensions). Sharir at Abstract; Section 3, pages 3-4; Fig. 2]. Having overlapping intervals defined by strides in two dimensions increases the expressive capacity of the neural network [Sharir at Abstract]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the convolution operation of Korthikanti to be performed on overlapping intervals defined by strides in two dimensions, as taught by Sharir, because it would increase the expressive capacity of the implemented neural network.
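For illustration only, the following minimal sketch shows what is meant by a convolution performed on overlapping intervals defined by strides in two dimensions: when the stride along a dimension is smaller than the kernel extent along that dimension, adjacent windows share input elements. The function name `conv2d_strided`, the 4 x 4 input, and the 2 x 2 kernel are hypothetical choices made for this example and are not taken from Korthikanti, Sharir, or the claims.

```python
def conv2d_strided(image, kernel, stride_h, stride_w):
    """Valid (no padding) 2-D cross-correlation with separate row and column strides.

    Hypothetical helper for illustration; not code from any cited reference.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - k_h) // stride_h + 1
    out_w = (in_w - k_w) // stride_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Top-left corner of the current window is set by the two strides.
            top, left = i * stride_h, j * stride_w
            acc = 0
            for di in range(k_h):
                for dj in range(k_w):
                    acc += image[top + di][left + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# 4 x 4 input holding the values 0..15, and a 2 x 2 kernel of ones.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
kernel = [[1, 1], [1, 1]]

# Stride 1 < kernel size 2 in both dimensions: neighboring windows overlap.
overlapping = conv2d_strided(image, kernel, 1, 1)

# Stride 2 == kernel size 2 in both dimensions: windows tile without overlap.
non_overlapping = conv2d_strided(image, kernel, 2, 2)

print(overlapping)      # 3 x 3 output
print(non_overlapping)  # 2 x 2 output
```

With overlapping windows, each interior input element contributes to more than one output value, which is the property the rejection relies on Sharir to supply.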

Regarding claim 8, Korthikanti teaches a neural network processor [The processing system is used in implementing artificial neural networks. Korthikanti at column 4, lines 10-22], comprising: 
[High bandwidth memory (HBM) 240 storing matrix operands, wherein the matrix operands include input matrices and weight matrices (i.e. input and weight volumes). Korthikanti at column 10, line 55 – column 11, line 8; column 18, lines 24-28]; 
a plurality of neurons [Processing resources 210 (e.g. matrix processing units 234). Korthikanti at column 9, lines 20-46; FIGS. 2A-2C]; and 
a group partitioner and scheduler [Master control CPU (MCC) 232. Korthikanti at column 11, lines 31-36] configured to 
partition the input volume and the weight volume into a plurality of partitions based on a quantity of the plurality of neurons [MCC 232 uses the slicing engine 236 to partition the matrix operands (i.e. input and weight volumes) based on the number of available processing resources 210 (i.e. neurons). Korthikanti at column 12, lines 11-16; column 14, lines 1-7; column 24, line 57 – column 25, line 11; FIG. 9], and 
assign a group of the neurons to each of the plurality of partitions [MCC 232 distributes the matrix operands across the processing resources, such as across MPUs in a cluster 230 (i.e. groups of neurons) or across logical processing nodes (i.e. groups of neurons). Korthikanti at column 10, lines 16-20; column 13, line 65 – column 14, line 7; column 14, lines 26-46; column 25, lines 12-25; FIG. 9], to maximize a total number of the plurality of neurons that simultaneously process a workload while reducing power consumption [The operations are scaled/distributed across the processing resources achieving 100% processing efficiency, thereby maximizing processing and reducing power consumption. Korthikanti at column 5, lines 25-31; column 12, lines 52-60; column 13, lines 5-12]; and 
wherein the neurons within each group of neurons are configured to 
process a workload defined by an assigned partition to generate a partial output value [The processing resources 210 (e.g. MPU 234) process the matrix portion to generate partial results. Korthikanti at column 11, lines 39-52; column 14, lines 7-14; column 25, lines 36-40; FIG. 9] by
performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume [The processing includes performing convolution operations on operands/smaller matrices of input/activations matrices and weight matrices (i.e. a portion of an input volume and a portion of a weight volume). Korthikanti at column 9, lines 20-26; column 11, lines 39-52; column 12, lines 11-2; column 18, lines 24-55], and
sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload [The processing resources 210 receive partial matrix data (i.e. partial output values) from neighboring processing resources 210 (i.e. in each group of neurons) and continue matrix operations, which includes summing of the received partial matrix data. Korthikanti at column 14, lines 10-25; column 22, lines 37-47; column 25, line 41 – column 26, line 3; FIG. 9].
Korthikanti does not teach that the convolution operation is performed on overlapping intervals defined by strides in two dimensions. In the same field of neural network processing, Sharir teaches performing a convolution operation on overlapping intervals defined by strides in two dimensions [The convolution operation is performed on overlapping sections and is defined by strides S x S (i.e. strides in two dimensions). Sharir at Abstract; Section 3, pages 3-4; Fig. 2]. Having overlapping intervals defined by strides in two dimensions increases the expressive capacity of the neural network [Sharir at Abstract]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the convolution operation of Korthikanti to be performed on overlapping intervals defined by strides in two dimensions, as taught by Sharir, because it would increase the expressive capacity of the implemented neural network.

Regarding claims 5, 9, and 18, taking claim 5 as exemplary, Korthikanti and Sharir teach the neural network processor of claim 1, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized [The operations/workload are scaled across the processing resources achieving 100% processing efficiency. Korthikanti at column 5, lines 25-31; column 12, lines 52-60; column 13, lines 5-12].


Claims 2-4, 10-12, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Korthikanti and Sharir and, further, in view of Shen et al., “Maximizing CNN Accelerator Efficiency Through Resource Partitioning” (herein Shen).

Regarding claims 2, 10, and 15, taking claim 2 as exemplary, Korthikanti and Sharir teach the neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height and width dimensions [The workload comprises input and weight matrices (i.e. volumes) having height (row) and width (column) dimensions. Korthikanti at column 10, line 55 – column 11, line 8; column 18, lines 24-28; FIG. 5]. Korthikanti and Sharir do not explicitly teach that the input volume and weight volume have a depth dimension, wherein the workload is partitioned along the depth dimension. However, Korthikanti teaches that the stored and retrieved data from memory may comprise any number of dimensions, including three dimensions (i.e. having a depth dimension) and that the workload is partitioned along any dimension [Korthikanti at column 10, line 55 – column 11, line 8; column 14, lines 1-7]. In the same field of neural network processing, Shen teaches partitioning a workload in a convolutional layer processor, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions [Shen at Fig. 3 and Section II], and wherein the workload is partitioned along the depth dimension [The workload is partitioned along the layer dimensions, including the depth dimension. Shen at Section IV, 1st paragraph; Fig. 1]. Shen teaches that partitioning the workload along the layer dimensions increases the computational efficiency and increases overall throughput [Shen at Abstract]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the partitioning of Korthikanti such that the input volume and weight volume are three-dimensional (i.e. have a depth dimension) and the workload (i.e. input and weight volumes) is partitioned along the depth dimension, as taught by Shen, because doing so would increase computational efficiency and overall throughput.
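For illustration only, the following minimal sketch shows why a workload partitioned along the depth dimension can be processed by separate groups whose partial output values are then summed: the multiply-accumulate over a depth x height x width window is a sum over depth, so it splits into per-slice partial sums. The helper names `partition_depth` and `dot_window`, the group count of two, and the sample values are hypothetical and are not taken from Korthikanti, Shen, or the claims.

```python
def dot_window(inputs, weights):
    """Sum of element-wise products over a (depth x height x width) window."""
    return sum(
        inputs[d][r][c] * weights[d][r][c]
        for d in range(len(weights))
        for r in range(len(weights[0]))
        for c in range(len(weights[0][0]))
    )


def partition_depth(depth, num_groups):
    """Split range(depth) into num_groups contiguous (start, stop) slices.

    Hypothetical partitioning rule: slices differ in size by at most one.
    """
    base, extra = divmod(depth, num_groups)
    slices, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


# Sample input and weight volumes: depth 4, height 2, width 2.
inputs = [[[d + 1, d + 2], [d + 3, d + 4]] for d in range(4)]
weights = [[[1, 0], [0, 1]] for _ in range(4)]

# Reference result computed over the whole depth at once.
full_output = dot_window(inputs, weights)

# Each (start, stop) slice stands in for the partition assigned to one group.
slices = partition_depth(depth=4, num_groups=2)
partial_outputs = [dot_window(inputs[a:b], weights[a:b]) for a, b in slices]

# Summing the partial output values reproduces the unpartitioned result.
print(slices, partial_outputs, sum(partial_outputs), full_output)
```

The same decomposition does not hold without extra handling when partitioning along height or width with overlapping windows, because windows near a partition boundary need input elements from both sides.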

Regarding claims 3, 11, and 16, taking claim 3 as exemplary, Korthikanti and Sharir teach the neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height and width dimensions [The workload comprises input and weight matrices (i.e. volumes) having height (row) and width (column) dimensions. Korthikanti at column 18, lines 24-28; FIG. 5], wherein the workload is partitioned along the height dimension [The matrix operand is partitioned across the row/height dimension. Korthikanti at column 14, lines 1-7]. Korthikanti and Sharir do not explicitly teach that the input volume and weight volume have a depth dimension. However, Korthikanti teaches that the stored and retrieved data from memory may comprise any number of dimensions, including three dimensions (i.e. having a depth dimension) [Korthikanti at column 10, line 55 – column 11, line 8]. In the same field of neural network processing, Shen teaches partitioning a workload in a convolutional layer processor, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions [Shen at Fig. 3 and Section II]. Shen teaches that using convolutional neural networks, whose inputs and weights comprise three-dimensional data, provides increased machine learning accuracy/performance across a variety of applications [Shen at Section I, 1st and 2nd paragraphs]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the partitioning of Korthikanti such that the input volume and weight volume are for CNNs and, thereby, are three-dimensional (i.e. have a depth dimension), as taught by Shen, in order to improve machine learning performance for a variety of applications.

Regarding claims 4, 12, and 17, taking claim 4 as exemplary, Korthikanti and Sharir teach the neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height and width dimensions [The workload comprises input and weight matrices (i.e. volumes) having height (row) and width (column) dimensions. Korthikanti at column 18, lines 24-28; FIG. 5]. Korthikanti and Sharir do not explicitly teach that the input volume and weight volume have a depth dimension, wherein the workload is partitioned along the width dimension. However, Korthikanti teaches that the stored and retrieved data from memory may comprise any number of dimensions, including three dimensions (i.e. having a depth dimension) and that the workload is partitioned along any dimension [Korthikanti at column 10, line 55 – column 11, line 8; column 14, lines 1-7]. In the same field of neural network processing, Shen teaches partitioning a workload in a convolutional layer processor, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions [Shen at Fig. 3 and Section II], and wherein the workload is partitioned along the width dimension [The workload is partitioned along the layer dimensions, including the width dimension. Shen at Section IV, 1st paragraph; Fig. 1]. Shen teaches that partitioning the workload along the layer dimensions increases the computational efficiency and increases overall throughput [Shen at Abstract]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the partitioning of Korthikanti such that the input volume and weight volume are three-dimensional (i.e. have a depth dimension) and the workload (i.e. input and weight volumes) is partitioned along the width dimension, as taught by Shen, because doing so would increase computational efficiency and overall throughput.


Claims 7, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Korthikanti and Sharir and, further, in view of Dhong et al., U.S. Patent No. 7,137,021 (hereinafter Dhong).

Regarding claims 7, 13, and 20, taking claim 7 as exemplary, Korthikanti and Sharir teach the neural network processor of claim 1. Korthikanti and Sharir do not teach that the plurality of neurons are powered down following generation of the output values for the workload. In an analogous field of processing, Dhong teaches a processor comprising functional units that are powered down when there are no instructions for the respective functional units [Dhong at column 2, lines 15-38; column 5, lines 11-22; FIGS. 1 and 4]. Dhong teaches that powering down the functional units when not needed reduces power consumption, reduces cooling demand, and improves reliability [Dhong at column 1, lines 24-29]. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Korthikanti’s neural network processor so that the processing resources, which are functional units, are powered down when there are no operations/instructions to execute (i.e. following generation of the output values for the workload) because it would reduce power consumption, reduce cooling demand, and improve reliability.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628. The examiner can normally be reached Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BENJAMIN P GEIB/Primary Examiner, Art Unit 2123