DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”), such as those in claim 20, are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7, 11-12, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Diril et al. (US 2019/0205358 A1) in view of Goyal et al. (US 2017/0316312 A1).

Diril and Goyal were cited in IDS filed on 10/08/2020.

Regarding claim 1, Diril teaches an apparatus comprising: 
a first tensor compute cluster configured to receive first input feature tensors ([0047]: Density-aware logical unit 304… may be coupled to dot-product engine 310; [0038] Dot-product engine 272 may be a logical unit configured to perform dot-product multiplication on matrices and may include one or more processing elements, such as processing element 274(1), processing element 274(2), and processing element 274(3));
a second tensor compute cluster configured to receive second input feature tensors more sparse than the first input feature tensors ([0047]: Density-aware logical unit 304… may be coupled to dot-product engine 310; [0038] Dot-product engine 272 may be a logical unit configured to perform dot-product multiplication on matrices and may include one or more processing elements, such as processing element 274(1), processing element 274(2), and processing element 274(3)); and 
circuitry ([0050]: Input subsystem 302) configured to: 
partition an input feature map into a plurality of input feature tensors including the first input feature tensors and the second input feature tensors based on a compression criteria ([0004] In some embodiments, a hardware accelerator may include logic (e.g., a multiplexer) that identifies zero-value elements in dot-product input matrices and removes these elements from a dot-product processing stream. [0050]: Input subsystem 302 may also receive the vectors in any suitable manner. For example, input subsystem 302 may receive the vectors as two input streams of data. As another example, input subsystem 302 may receive two matrices and extract the vectors from columns and/or rows of the matrices. Furthermore, input subsystem 302 may receive the vectors as inputs, or operands for a dot-product operation. For example, input subsystem may receive (e.g., from another device or subsystem) preprocessed vectors of equal length that are intended for use in a dot-product operation. Alternatively, input subsystem may receive data in matrices or other formats and may process the data (e.g., by selecting two equal-length strings of values) for use in a dot-product operation.); and
assign each of the plurality of input feature tensors to one of the first tensor compute cluster, the second tensor compute cluster, or the vector accelerator based upon a respective sparsity parameter of each of the plurality of input feature tensors ([0004] In other words, the hardware accelerator may be configured to send only non-zero matrix elements to a dot-product engine for processing. Since the computational power involved in identifying zero-value elements may be less than the computational power needed to perform multiply-and-accumulate operations, dropping zero-value elements from dot-product computations may increase processing efficiency and provide a variety of other advantages.; [0010]; [0053]: Sparsity-aware logical unit 306 may be disabled or bypassed in some embodiments (e.g., embodiments where processing overhead of sparsity-aware logical unit 306 may be close to, or greater than, any savings gained by skipping zero-value elements). For example, density-aware logical-unit 304 may process incoming vectors or matrices to determine their density (e.g., how many non-zero elements they have, the percentage of non-zero elements they have, etc.). Density-aware logical unit 304 may determine density in a variety of ways. For example, density-aware logical unit 304 may evaluate all or a portion of an input matrix or vector, may read metadata or receive other information indicative of the density of an input matrix or vector, etc. If density-aware logical unit 304 determines that at least one of the first and second vectors or matrices are dense (e.g., have less than predefined number or threshold of zero-value elements), density-aware logical unit 304 may cause sparsity-aware logical unit 306 to be bypassed or disabled; [0038]).

Diril does not expressly teach a vector accelerator, and therefore does not expressly teach assigning each of the plurality of input feature tensors to one of the first tensor compute cluster, the second tensor compute cluster, or the vector accelerator based upon a respective sparsity parameter of each of the plurality of input feature tensors.

However, Goyal teaches a vector accelerator and assigning each of the plurality of input feature tensors to one of the first tensor compute cluster, the second tensor compute cluster, or the vector accelerator based upon a respective sparsity parameter of each of the plurality of input feature tensors (Abstract: A hardware-based programmable deep learning processor (DLP) is proposed, wherein the DLP comprises with a plurality of accelerators dedicated for deep learning processing. Specifically, the DLP includes a plurality of tensor engines configured to perform operations for pattern recognition and classification based on a neural network. Each tensor engine includes one or more matrix multiplier (MatrixMul) engines each configured to perform a plurality of dense and/or sparse vector-matrix and matrix-matrix multiplication operations, one or more convolutional network (ConvNet) engines each configured to perform a plurality of efficient convolution operations on sparse or dense matrices, one or more vector floating point units (VectorFPUs) each configured to perform floating point vector operations).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Goyal with the teachings of Diril to provide a plurality of engines, including vector accelerators (e.g., Goyal’s VectorFPUs), to handle different types of tensors. The modification would have been no more than the combination of prior art elements according to known methods to yield predictable results.
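For illustration only, the claim 1 assignment limitation, as mapped above to the combination of Diril and Goyal, may be sketched as follows. This is a hypothetical sketch and is not code from the application or from either reference: the target names, the use of the zero-element fraction as the “sparsity parameter,” the 0.5 threshold, and the operation labels are all assumptions. Routing operations that the tensor clusters do not support to the vector accelerator reflects the reading of claim 4 below.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

# Hypothetical names for the three assignment targets recited in claim 1.
DENSE_CLUSTER = "first_tensor_compute_cluster"
SPARSE_CLUSTER = "second_tensor_compute_cluster"
VECTOR_ACCELERATOR = "vector_accelerator"


@dataclass
class FeatureTensor:
    values: List[float]   # flattened tensor elements
    op: str = "conv"      # hypothetical label for the operation to be performed


def sparsity(values: Sequence[float]) -> float:
    """Fraction of zero-valued elements, one possible 'sparsity parameter'."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0) / len(values)


def assign(tensor: FeatureTensor,
           sparse_threshold: float = 0.5,
           cluster_ops: FrozenSet[str] = frozenset({"conv", "matmul"})) -> str:
    # Operations the tensor clusters cannot run fall back to the vector accelerator.
    if tensor.op not in cluster_ops:
        return VECTOR_ACCELERATOR
    # Otherwise the sparsity parameter alone selects the dense or sparse cluster.
    if sparsity(tensor.values) >= sparse_threshold:
        return SPARSE_CLUSTER
    return DENSE_CLUSTER
```

Under these assumptions, a tensor with three of its four elements equal to zero would be routed to the second (sparse) cluster, while a tensor with one zero in four would be routed to the first (dense) cluster.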

Regarding claim 2, Diril teaches wherein the circuitry is further configured to classify each of the plurality of input feature tensors as either a sparse tensor or a dense tensor such that the sparse tensor comprises a greater number or percentage of zeroes than the dense tensor ([0004] sparsity (e.g., having zero-value elements) within input vectors and matrices.; [0010] determine that at least one of first and second matrices are dense (e.g., are matrices with fewer than a predetermined number of zero-value elements). The density-aware logical unit may, in response to identify one or both of the input matrices as being dense; [0047] Density-aware logical unit 304 and sparsity-aware logical unit 306 may be coupled to input subsystem 302 and may include various components and/or logical constructs configured to detect density and sparsity within vectors and/or matrices.).
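The density-aware bypass and zero-skipping behavior that Diril describes ([0004], [0010], [0053]) may be sketched as follows, for illustration only. Diril describes hardware logical units, not this code; the zero-count threshold and the function names are hypothetical. Consistent with [0053], if at least one input is dense, zero detection is bypassed and a full dot product is performed; otherwise only pairs in which both operands are non-zero reach the multiply-accumulate step.

```python
from typing import Sequence, Tuple


def count_zeros(vec: Sequence[float]) -> int:
    return sum(1 for v in vec if v == 0)


def dot_product(a: Sequence[float], b: Sequence[float],
                zero_threshold: int = 2) -> Tuple[float, int]:
    """Return (result, number of multiply-accumulate operations performed)."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    # Density-aware check: a vector with fewer zeros than the (hypothetical)
    # threshold is treated as dense, and the sparsity-aware path is bypassed.
    if count_zeros(a) < zero_threshold or count_zeros(b) < zero_threshold:
        return sum(x * y for x, y in zip(a, b)), len(a)
    # Sparsity-aware path: operations with a zero-value operand are skipped.
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    return sum(x * y for x, y in pairs), len(pairs)
```

In this sketch, two sparse four-element vectors sharing a single non-zero position need only one multiply-accumulate operation instead of four.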

Regarding claim 3, Goyal teaches wherein the circuitry is further configured to assign the sparse tensor to the second tensor compute cluster and assign the dense tensor to the first tensor compute cluster (Claim 1: one or more matrix multiplier engines each configured to perform a plurality of dense and/or sparse vector-matrix and matrix-matrix multiplication operations; one or more convolutional network engines each configured to perform a plurality of convolution operations by exploring sparsity of the vectors and/or matrices).

Regarding claim 4, Goyal teaches wherein the circuitry is further configured to assign the sparse tensor or the dense tensor to the vector accelerator that are not supported by the first tensor compute cluster or the second tensor compute cluster ([0017]: Each tensor engine includes one or more matrix multiplier (MatrixMul) engines each configured to perform a plurality of dense and/or sparse vector-matrix and matrix-matrix multiplication operations, one or more convolutional network (ConvNet) engines each configured to perform a plurality of efficient convolution operations on sparse or dense matrices, one or more vector floating point units (VectorFPUs) each configured to perform floating point vector operations, and a data engine configured to retrieve and store multi-dimensional (e.g., 2D) data to both on-chip and external memories).

Regarding claim 5, Diril teaches wherein the circuitry is further configured to assign each of the plurality of input feature tensors based upon the respective sparsity parameter in combination with an optimization parameter that comprises an execution time or a power consumption ([0010] The computing system may also include a density-aware logical subsystem configured to determine that at least one of first and second matrices are dense (e.g., are matrices with fewer than a predetermined number of zero-value elements). The density-aware logical unit may, in response to identify one or both of the input matrices as being dense, disable (e.g., bypass) the sparsity-aware logical unit to save the time and energy involved in evaluating sparsity since the dot-product engine may only realize minimal benefits from skipping a relatively small number of sparse matrices.).

Regarding claim 7, Diril teaches wherein the circuitry is further configured to: for each of the plurality of input feature tensors, determine a power consumption for executing that one of the plurality of input feature tensors on the first tensor compute cluster or the second tensor compute cluster ([0010] The computing system may also include a density-aware logical subsystem configured to determine that at least one of first and second matrices are dense (e.g., are matrices with fewer than a predetermined number of zero-value elements). The density-aware logical unit may, in response to identify one or both of the input matrices as being dense, disable (e.g., bypass) the sparsity-aware logical unit to save the time and energy involved in evaluating sparsity since the dot-product engine may only realize minimal benefits from skipping a relatively small number of sparse matrices.); and 
assign that one of the plurality of input feature tensors to the first tensor compute cluster or the second tensor compute cluster based upon the respective power consumption ([0004] In other words, the hardware accelerator may be configured to send only non-zero matrix elements to a dot-product engine for processing. Since the computational power involved in identifying zero-value elements may be less than the computational power needed to perform multiply-and-accumulate operations, dropping zero-value elements from dot-product computations may increase processing efficiency and provide a variety of other advantages.; [0010]; [0053]: Sparsity-aware logical unit 306 may be disabled or bypassed in some embodiments (e.g., embodiments where processing overhead of sparsity-aware logical unit 306 may be close to, or greater than, any savings gained by skipping zero-value elements). For example, density-aware logical-unit 304 may process incoming vectors or matrices to determine their density (e.g., how many non-zero elements they have, the percentage of non-zero elements they have, etc.). Density-aware logical unit 304 may determine density in a variety of ways. For example, density-aware logical unit 304 may evaluate all or a portion of an input matrix or vector, may read metadata or receive other information indicative of the density of an input matrix or vector, etc. If density-aware logical unit 304 determines that at least one of the first and second vectors or matrices are dense (e.g., have less than predefined number or threshold of zero-value elements), density-aware logical unit 304 may cause sparsity-aware logical unit 306 to be bypassed or disabled; [0038]). 
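The cost-based assignment of claims 5 and 7 (the sparsity parameter combined with an execution time or power consumption estimate) may be illustrated with a simple analytic cost model. The coefficients below are placeholders assumed for illustration; they are not taken from Diril, the application, or any hardware characterization. The same structure applies whether the cost represents time or energy.

```python
from typing import Dict, Sequence

# Placeholder per-element and per-non-zero cost coefficients (time or energy units).
# The sparse cluster is modeled as paying an indexing overhead on every element
# plus a higher cost per non-zero element; the dense cluster pays per element.
COST_MODEL: Dict[str, Dict[str, float]] = {
    "dense": {"per_element": 1.0, "per_nonzero": 0.0},
    "sparse": {"per_element": 0.2, "per_nonzero": 1.5},
}


def estimated_cost(values: Sequence[float], cluster: str) -> float:
    n = len(values)
    nnz = sum(1 for v in values if v != 0)
    coeffs = COST_MODEL[cluster]
    return coeffs["per_element"] * n + coeffs["per_nonzero"] * nnz


def assign_by_cost(values: Sequence[float]) -> str:
    # Estimate the cost on each cluster and choose the cheaper one.
    costs = {name: estimated_cost(values, name) for name in COST_MODEL}
    return min(costs, key=costs.get)
```

With these placeholder coefficients, the sparse cluster is estimated to be cheaper whenever fewer than about 53% of the elements are non-zero, so the sparsity parameter and the cost estimate jointly determine the assignment.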

Regarding claim 11, Diril teaches wherein the first tensor compute cluster comprises a plurality of first tensor compute units ([0038] Dot-product engine 272 may be a logical unit configured to perform dot-product multiplication on matrices and may include one or more processing elements, such as processing element 274(1), processing element 274(2), and processing element 274(3); [0047]: Both density-aware logical unit 304 and sparsity-aware logical unit 306 may be coupled to dot-product engine 310).

Regarding claim 12, Diril teaches wherein the second tensor compute cluster comprises a plurality of second tensor compute units ([0038] Dot-product engine 272 may be a logical unit configured to perform dot-product multiplication on matrices and may include one or more processing elements, such as processing element 274(1), processing element 274(2), and processing element 274(3); [0047]: Both density-aware logical unit 304 and sparsity-aware logical unit 306 may be coupled to dot-product engine 310).

Regarding claim 19, it is a media/product claim having limitations similar to those of claim 1 above. Therefore, it is rejected under the same rationale.

Regarding claim 20, it is a system claim having limitations similar to those of claim 1 above. Therefore, it is rejected under the same rationale.

Claims 6 and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Diril et al. (US 2019/0205358 A1) in view of Goyal et al. (US 2017/0316312 A1), and further in view of Feng et al. (US 2019/0377606 A1).

Feng was cited in IDS filed on 10/08/2020.

Regarding claim 6, Diril and Goyal do not expressly teach wherein the circuitry is further configured to: for each of the plurality of input feature tensors, determine an execution time for executing that one of the plurality of input feature tensors on the first tensor compute cluster or the second tensor compute cluster; and assign that one of the plurality of input feature tensors to the first tensor compute cluster or the second tensor compute cluster based upon the respective execution times.
	However, Feng teaches wherein the circuitry is further configured to: for each of the plurality of input feature tensors, determine an execution time for executing that one of the plurality of input feature tensors on the first tensor compute cluster or the second tensor compute cluster; and assign that one of the plurality of input feature tensors to the first tensor compute cluster or the second tensor compute cluster based upon the respective execution times ([0062] In some embodiments, the mechanisms of the present invention use the following three components to implement the various disclosed functionality: (a) A Metrics Collector to collect runtime information associated with each deep learning job; (b) A Cost Calculator to evaluate the performance impact of topology changes of the accelerators within the cluster; and (c) A Decision Maker to identify and implement an actual plan to allocate and/or reclaim accelerators from respective deep learning jobs to gain an optimal accelerator topology across the cluster. It should be noted that components within the disclosure may be implemented within a session scheduler which schedules resource allocations to each deep learning job, inside the deep learning job itself (i.e., within the scheduler client), or a combination thereof. For example, the computations as discussed herein may be performed within the deep learning job itself, which then may send hints to the session scheduler as to the pre-screening of resource scheduling for the instant and/or future jobs. [0065] In various embodiments, a job cost is calculated for a remaining time period of the deep learning job).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Feng with the teachings of Diril and Goyal to use performance metrics such as execution time to determine how to schedule jobs. The modification would have been motivated by the desire to improve deep learning job scheduling.

Regarding claim 8, Feng teaches wherein the circuitry is further configured to: for a first input feature tensor of the plurality of input feature tensors, determine an execution time for a similar input feature tensor that previously executed on the first tensor compute cluster and the second tensor compute cluster; and assign the first input feature tensor to the first tensor compute cluster or the second tensor compute cluster based upon the respective execution time for the similar input feature tensor ([0062] In some embodiments, the mechanisms of the present invention use the following three components to implement the various disclosed functionality: (a) A Metrics Collector to collect runtime information associated with each deep learning job; (b) A Cost Calculator to evaluate the performance impact of topology changes of the accelerators within the cluster; and (c) A Decision Maker to identify and implement an actual plan to allocate and/or reclaim accelerators from respective deep learning jobs to gain an optimal accelerator topology across the cluster. It should be noted that components within the disclosure may be implemented within a session scheduler which schedules resource allocations to each deep learning job, inside the deep learning job itself (i.e., within the scheduler client), or a combination thereof. For example, the computations as discussed herein may be performed within the deep learning job itself, which then may send hints to the session scheduler as to the pre-screening of resource scheduling for the instant and/or future jobs. [0065] In various embodiments, a job cost is calculated for a remaining time period of the deep learning job).

Regarding claim 9, Feng teaches wherein the circuitry is further configured to: for a first input feature tensor of the plurality of input feature tensors, determine a power consumption for a similar input feature tensor that previously executed on the first tensor compute cluster and the second tensor compute cluster; and assign the first input feature tensor to the first tensor compute cluster or the second tensor compute cluster based upon the respective power consumption for the similar input feature tensor ([0061]; [0062] In some embodiments, the mechanisms of the present invention use the following three components to implement the various disclosed functionality: (a) A Metrics Collector to collect runtime information associated with each deep learning job; (b) A Cost Calculator to evaluate the performance impact of topology changes of the accelerators within the cluster; and (c) A Decision Maker to identify and implement an actual plan to allocate and/or reclaim accelerators from respective deep learning jobs to gain an optimal accelerator topology across the cluster. It should be noted that components within the disclosure may be implemented within a session scheduler which schedules resource allocations to each deep learning job, inside the deep learning job itself (i.e., within the scheduler client), or a combination thereof. For example, the computations as discussed herein may be performed within the deep learning job itself, which then may send hints to the session scheduler as to the pre-screening of resource scheduling for the instant and/or future jobs.).

Regarding claim 10, Feng teaches wherein the circuitry is further configured to: 
for a first input feature tensor of the plurality of input feature tensors, identify a second input feature tensor that previously executed on the first tensor compute cluster or the second tensor compute cluster and that has a similar metric as the first input feature tensor; and assign the first input feature tensor to the first tensor compute cluster or the second tensor compute cluster based upon which tensor compute cluster executed the second input feature tensor ([0062] In some embodiments, the mechanisms of the present invention use the following three components to implement the various disclosed functionality: (a) A Metrics Collector to collect runtime information associated with each deep learning job; (b) A Cost Calculator to evaluate the performance impact of topology changes of the accelerators within the cluster; and (c) A Decision Maker to identify and implement an actual plan to allocate and/or reclaim accelerators from respective deep learning jobs to gain an optimal accelerator topology across the cluster. It should be noted that components within the disclosure may be implemented within a session scheduler which schedules resource allocations to each deep learning job, inside the deep learning job itself (i.e., within the scheduler client), or a combination thereof. For example, the computations as discussed herein may be performed within the deep learning job itself, which then may send hints to the session scheduler as to the pre-screening of resource scheduling for the instant and/or future jobs.).
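The history-based assignment of claims 8-10, read against Feng’s Metrics Collector and Decision Maker ([0062]), may be sketched as follows. Feng describes allocating accelerators to deep learning jobs, not this code; using sparsity as the similarity metric, the 0.05 tolerance, and the record fields are assumptions made for illustration. A power-consumption field could replace execution time to illustrate claim 9 in the same way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    sparsity: float    # metric of a previously executed input feature tensor
    cluster: str       # cluster that executed it
    exec_time: float   # measured execution time (hypothetical units)


@dataclass
class History:
    records: List[Record] = field(default_factory=list)

    def collect(self, sparsity: float, cluster: str, exec_time: float) -> None:
        """Metrics-collector role: record runtime information after execution."""
        self.records.append(Record(sparsity, cluster, exec_time))

    def decide(self, sparsity: float, tolerance: float = 0.05) -> Optional[str]:
        """Decision-maker role: among similar past tensors, pick the fastest cluster."""
        similar = [r for r in self.records if abs(r.sparsity - sparsity) <= tolerance]
        if not similar:
            return None  # no similar history; caller would fall back to another rule
        return min(similar, key=lambda r: r.exec_time).cluster
```

For example, after recording a tensor of sparsity 0.80 that ran in 2.0 units on the sparse cluster and one of sparsity 0.82 that ran in 5.0 units on the dense cluster, a new tensor of sparsity 0.81 would be assigned to the sparse cluster.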

Claims 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Diril et al. (US 2019/0205358 A1) in view of Goyal et al. (US 2017/0316312 A1), and further in view of Diamant et al. (US 10,579,591).

Regarding claim 13, Goyal teaches wherein to partition the input feature map into the plurality of input feature tensors, the circuitry is further configured to: 
divide the input feature map into a plurality of portions, each of the plurality of portions having a first cell size ([0027]: the DLP 102 is configured to partition a large dense or sparse matrix into smaller portions and distribute the portions of the matrix across multiple tensor engines 104.). Diril and Goyal do not expressly teach recursively dividing each of the plurality of portions into a plurality of sub-portions until the compression criteria is met.

However, Diamant et al. teaches recursively dividing each of the plurality of portions into a plurality of sub-portions until the compression criteria is met (Col. 10, lines 53-59: In some embodiments, the co-processor can divide the input data into a plurality of portions. This division may include, for example, determining a size of a first portion, and treating the remaining data as a second portion, which can be further divided later as part of an iterative process. The division into portions may be based, at least in part, on the compression target block size.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Diamant with the teachings of Diril and Goyal to further fragment tensor data to meet a compression target. The modification would have been motivated by the desire to size input portions appropriately for optimized processing.

Regarding claim 14, Diamant teaches wherein the compression criteria comprises a number of compression levels or a threshold minimum cell size of each of the plurality of sub-portions (Col. 10, lines 59-61: the compression target block size is 1 MB).
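The recursive partitioning of claims 13 and 14 may be sketched as a quadtree-style split, for illustration only. This is not Diamant’s technique (which iteratively splits off a first portion sized from a compression target block size) and is not taken from the application. The assumptions are: a square feature map, a first cell size that is a power of two dividing the map side, quartering as the subdivision step, a compression criterion expressed as either a maximum number of levels or a minimum cell size, and early stopping on all-zero cells, consistent with the application’s description of portions having either all zero values or at least one non-zero value.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Cell = Tuple[int, int, int]  # (top row, left column, side length) of a square cell


def is_all_zero(fmap: FeatureMap, r: int, c: int, size: int) -> bool:
    return all(fmap[i][j] == 0 for i in range(r, r + size) for j in range(c, c + size))


def subdivide(fmap: FeatureMap, r: int, c: int, size: int,
              level: int, max_levels: int, min_cell: int) -> List[Cell]:
    # Stop when the hypothetical compression criterion is met (level limit reached,
    # or the next split would fall below the minimum cell size), or the cell is all zero.
    if level >= max_levels or size // 2 < min_cell or is_all_zero(fmap, r, c, size):
        return [(r, c, size)]
    half = size // 2
    cells: List[Cell] = []
    for dr in (0, half):
        for dc in (0, half):
            cells += subdivide(fmap, r + dr, c + dc, half, level + 1, max_levels, min_cell)
    return cells


def partition(fmap: FeatureMap, first_cell: int,
              max_levels: int = 2, min_cell: int = 1) -> List[Cell]:
    # First divide the map into portions of the first cell size, then recurse on each.
    # Assumes a square map whose side is a multiple of first_cell, a power of two.
    n = len(fmap)
    cells: List[Cell] = []
    for r in range(0, n, first_cell):
        for c in range(0, n, first_cell):
            cells += subdivide(fmap, r, c, first_cell, 0, max_levels, min_cell)
    return cells
```

For a 4x4 map whose only non-zero value is in the top-left corner, partitioning with a first cell size of 4 yields four 1x1 cells around that value and three all-zero 2x2 cells, which together still cover the whole map.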

Regarding claim 15, Diamant teaches wherein the circuitry is further configured to create an extended sub-array from the plurality of sub-portions upon meeting the compression criteria (Col. 11, lines 18-30: This process may be repeated any number of times in an iterative manner. With each new compressed portion added to the output data file, the file size of the output data file will continue to increase).

Regarding claim 16, it is a method claim having limitations similar to those of claim 1 in combination with claims 13-15 above. Therefore, it is rejected under the same rationale.

Regarding claim 17, Diril teaches wherein the at least some of the input feature tensors that are assigned to the dense tensor compute cluster, the sparse tensor compute cluster, or the vector accelerator have at least one non-zero value ([0004] In other words, the hardware accelerator may be configured to send only non-zero matrix elements to a dot-product engine for processing. Since the computational power involved in identifying zero-value elements may be less than the computational power needed to perform multiply-and-accumulate operations, dropping zero-value elements from dot-product computations may increase processing efficiency and provide a variety of other advantages.; [0010]; [0053]: Sparsity-aware logical unit 306 may be disabled or bypassed in some embodiments (e.g., embodiments where processing overhead of sparsity-aware logical unit 306 may be close to, or greater than, any savings gained by skipping zero-value elements). For example, density-aware logical-unit 304 may process incoming vectors or matrices to determine their density (e.g., how many non-zero elements they have, the percentage of non-zero elements they have, etc.). Density-aware logical unit 304 may determine density in a variety of ways. For example, density-aware logical unit 304 may evaluate all or a portion of an input matrix or vector, may read metadata or receive other information indicative of the density of an input matrix or vector, etc. If density-aware logical unit 304 determines that at least one of the first and second vectors or matrices are dense (e.g., have less than predefined number or threshold of zero-value elements), density-aware logical unit 304 may cause sparsity-aware logical unit 306 to be bypassed or disabled; [0038]).

Regarding claim 18, Diril teaches wherein the input feature tensors that have all zero values are not assigned to any of the dense tensor compute cluster, the sparse tensor compute cluster, or the vector accelerator (Abstract: (1) identifying, within the first and second vectors, at least one zero-value element and (2) executing, in response to identifying the zero-value element, a reduced dot-product operation that excludes, relative to the full dot-product operation, at least one mathematical operation in which the zero-value element is an operand.).
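The limitations of claims 17 and 18, under which tensors having at least one non-zero value are assigned while all-zero tensors are not assigned to any compute resource, may be sketched as a dispatch filter. The queue names and the 0.5 sparse threshold are hypothetical and are not drawn from Diril or the application.

```python
from typing import Dict, List, Sequence


def dispatch(tensors: Sequence[Sequence[float]],
             sparse_threshold: float = 0.5) -> Dict[str, List[int]]:
    """Map each tensor index to a dense queue, a sparse queue, or a skipped list."""
    queues: Dict[str, List[int]] = {"dense": [], "sparse": [], "skipped": []}
    for idx, t in enumerate(tensors):
        nnz = sum(1 for v in t if v != 0)
        if nnz == 0:
            # All-zero tensor: not assigned to any cluster or accelerator (claim 18).
            queues["skipped"].append(idx)
        elif 1 - nnz / len(t) >= sparse_threshold:
            queues["sparse"].append(idx)
        else:
            queues["dense"].append(idx)
    return queues
```

Only tensors with at least one non-zero value reach the dense or sparse queue, which corresponds to the reading of claim 17 above.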

Response to Arguments
Applicant's arguments filed 07/11/2022 have been fully considered but they are not persuasive.
In Remarks, Applicant argues:

(I) Claim 1 as amended recites: 
circuitry configured to... 
assign each of the plurality of input feature tensors to one of the first tensor compute cluster, the second tensor compute cluster, or the vector accelerator based upon a respective sparsity parameter of each of the plurality of input feature tensors. 

No such features are taught or suggested by the cited references. The present application teaches: 

The accelerator 200 also includes a scheduling engine 235 that is configured to perform a sparsity analysis, and assign, in some embodiments, each of the input feature tensors to one of a dense tensor compute cluster 240, a sparse tensor compute cluster 245, and a vector accelerator 250 based upon the sparsity analysis. (paragraph 0034) 

In contrast, Diril, which was cited as disclosing the above features, does not assign input feature tensors based upon a respective sparsity parameter. While Diril discloses sparsity-aware components, Diril uses sparsity information to skip operations that include zero-value elements, not to assign input feature tensors to different compute clusters. 

Sparsity-aware dot-product systems may take advantage of sparsity within input matrices to improve processing efficiency by skipping operations that include zero-value elements. In other words, sparsity-aware dot-product systems may send only non-zero matrix elements to a dot-product engine for processing, which may reduce the number of operations performed in a dot-product calculation. (Diril’s paragraph 0025) 

Diril does not disclose that any of processing elements 247(1) - 247(3) are configured to receive different input feature tensors or that any sparsity-aware component (e.g., sparsity-aware logical unit 306) is configured to assign input feature tensors to processing elements 247(1) - 247(3) based upon a respective sparsity parameter of the input feature tensors. 
Diril was cited as disclosing the above features and Goyal does not appear to disclose any relevant features. Because at least these features are not taught or suggested by the cited references alone or in combination, claim 1 is submitted to be allowable. Claims 2-15 depend from claim 1 and are submitted to be allowable at least for depending from an allowable base claim. 

(II) Claims 2-20 are submitted to be allowable for at least the reasons discussed above with respect to claim 1.



In view of the above, examiner submits the following:

As to point (I)
	
Upon further review and consideration of the references and Remarks provided, examiner respectfully disagrees with the Applicant for at least the following reason.

Applicant argues that “[Diril] does not assign input feature tensors based upon a respective sparsity parameter. While Diril discloses sparsity-aware components, Diril uses sparsity information to skip operations that include zero-value elements, not to assign input feature tensors to different compute clusters,” citing Diril’s [0025].

Examiner directs attention to paragraph [0031] of the instant Specification, reproduced below:
[0031] “Each of the feature maps of the input image 205 may undergo a compression process within the compression block 210. The compression process may be configured to divide each feature map into a plurality of portions to take advantage of sparsity. For example, the compression process may identify portions of the feature map having zero and non-zero values. Zero values may be indicative of information (e.g., background of an image) that may not be needed for classification, and therefore, does not need to be processed. Non-zero values may be indicative of classifiable information that may need to be processed. Thus, the compression process divides a feature map into portions of a particular cell-size having either all zero values or at least one non-zero value. The portions having all zero values need not be processed, thereby reducing usage of computational resources and increasing speed of computation.”

Regarding the amended claim language “circuitry configured to... assign each of the plurality of input feature tensors to one of the first tensor compute cluster, the second tensor compute cluster, or the vector accelerator based upon a respective sparsity parameter of each of the plurality of input feature tensors,” one of ordinary skill in the art would have understood, under the broadest reasonable interpretation consistent with the specification, that assigning non-zero matrix elements to a dot-product engine reasonably encompasses the claimed limitation. Both the instant application and Diril determine, “based upon sparsity,” to schedule portions having non-zero values for processing while skipping (i.e., not processing) feature map portions having all zero values. As such, Diril reasonably teaches the claimed limitation. Accordingly, Applicant’s argument is not persuasive, and the rejection is maintained.

As to point (II)

Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA whose telephone number is (571)270-0692. The examiner can normally be reached Monday-Friday, 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai T An can be reached on (571)-272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JORGE A CHU JOY-DAVILA/Primary Examiner, Art Unit 2195