DETAILED ACTION
	This application has been examined. Claims 1-20 are pending.
To facilitate communication with the Examiner and expedite prosecution of the instant application, the Applicant is requested to submit written authorization for the USPTO to communicate via electronic mail. The written authorization must comply with the language of MPEP § 502.03.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
	This application claims the benefit of priority from Provisional Application 63/003,883, filed April 1, 2020.
	The effective filing date of the claims in this application is April 1, 2020.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/10/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

	 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hutchins (US Patent 7,079,156) in view of Chalamalasetti (US PGPUB 2020/0042287).

In regard to Claim 1
Hutchins Column 7 Lines 10-15 disclosed that gatekeeper stage 420 performs a data flow control function to the downstream units. In one embodiment, gatekeeper stage 420 has an associated scoreboard 425 for scheduling, load balancing, resource allocation, and hazard avoidance of pixel packets. Scoreboard 425 tracks the entry and retirement of pixels.

Hutchins Column 8 Lines 30-35 disclosed that the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided.
Hutchins Column 8 Lines 50-55 disclosed that the array of interpolators 501-508 is divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508). The division is configured to maintain the flexible assignability of interpolation computations, while simultaneously conserving silicon area dedicated to the interpolator array. The division is configured to take advantage of the fact that some parameters need to be computed in high precision (e.g., texture coordinates) while other parameters do not. For such low precision parameters, the extra precision afforded by a high precision computation provides no significant contribution to the resulting image. Accordingly, low precision parameters can be assigned to the low precision interpolators 505-508.
Hutchins disclosed (re. Claim 1) an apparatus comprising: a machine learning system comprising: a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision; (Hutchins-Column 8 Lines 50-55, array of interpolators 501-508 are divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508). The division is configured to maintain the flexible assignability of interpolation computations, while simultaneously conserving silicon area dedicated to the interpolator array. The division is configured to take advantage of the fact that some parameters need to be computed in high precision (e.g., texture coordinates) while other parameters do not. For such low precision parameters, the extra precision afforded by a high precision computation provides no significant contribution to the resulting image. Accordingly, low precision parameters can be assigned to the low precision interpolators 505-508)
a load balancing circuit configured to (Hutchins-Column 7 Lines 10-15, Gatekeeper stage 420 performs a data flow control function to the downstream units... gatekeeper stage 420 has an associated scoreboard 425 for scheduling, load balancing, resource allocation, and hazard avoidance of pixel packets) select a load balancing technique (Hutchins-Column 8 Lines 30-35, the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided, Column 10 Lines 60-65, By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided)


While Hutchins substantially disclosed the claimed invention, Hutchins does not disclose (re. Claim 1) alternately loading the computation circuit with at least a first weight subdivision combination and a second weight subdivision combination, and load a computation circuit with a selected weight subdivision and a computation circuit configured to compute a partial computation result based, at least in part, upon the weight subdivision.
Chalamalasetti Figure 4A, Paragraph 18 disclosed determination of a weight threshold for each network layer. In the conversion process, weights with an absolute value lower than this weight threshold are typically scaled linearly from the initial floating-point number (for example, from a 32-bit floating-point accuracy) to a fixed precision integer number (for example, an 8-bit integer value). Weights with an absolute value larger than the threshold value saturate at the maximum positive or negative value that may then be represented in the chosen integer precision.
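For illustration only, the threshold-based conversion described in the cited paragraph can be sketched as follows. This is a minimal sketch and not code from Chalamalasetti: the function name `quantize_weights`, the symmetric integer range (for 8 bits, -127 to 127), and the rounding rule are assumptions made for the example.

```python
def quantize_weights(weights, threshold, bits=8):
    """Map float weights to signed integers with saturation at a threshold.

    Hypothetical helper: weights below the threshold in absolute value are
    scaled linearly; weights at or above it saturate at the largest positive
    or negative value of the assumed symmetric integer range.
    """
    qmax = 2 ** (bits - 1) - 1      # 127 for 8 bits (assumed symmetric range)
    scale = qmax / threshold        # linear scale factor for in-range weights
    quantized = []
    for w in weights:
        if abs(w) >= threshold:
            # Saturate at the maximum positive or negative integer value.
            quantized.append(qmax if w > 0 else -qmax)
        else:
            # Linear scaling followed by rounding to the nearest integer.
            quantized.append(int(round(w * scale)))
    return quantized


example = quantize_weights([0.0, 0.25, -0.25, 2.0, -3.0], threshold=1.0)
```

With a threshold of 1.0, the in-range weights 0.25 and -0.25 are scaled linearly, while 2.0 and -3.0 saturate at the ends of the assumed integer range.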

Chalamalasetti disclosed (re. Claim 1) alternately loading the computation circuit with at least a first weight subdivision combination and a second weight subdivision combination, (Chalamalasetti-Figure 4A, Paragraph 18, determination of a weight threshold for each network layer, Paragraph 20, use adjusted quantized weights from training and as a result determine that fewer computing elements may be used for an individual stage.)
Hutchins and Chalamalasetti are analogous art because they present concepts and practices regarding configuration of neural networks. Before the effective filing date of the claimed invention, it would have been obvious to a person having ordinary skill in the art to combine the teachings of Chalamalasetti with Hutchins. The motivation for said combination would have been to process different portions of a compute task using automatically determined, adjustable levels of precision, such that levels of precision may be automatically determined at run-time rather than manually determined and set prior to processing. (Chalamalasetti-Paragraph 14)
Hutchins-Chalamalasetti disclosed (re. Claim 1) load a computation circuit with a selected data subdivision (Hutchins-Column 8 Lines 50-55, array of interpolators 501-508 are divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508)) and a selected weight subdivision (Chalamalasetti-Figure 4A, Paragraph 18, determination of a weight threshold for each network layer, Paragraph 20, use adjusted quantized weights from training and as a result determine that fewer computing elements may be used for an individual stage) based, at least in part, upon the load balancing technique; (Hutchins-Column 7 Lines 10-15, gatekeeper stage 420 has an associated scoreboard 425 for scheduling, load balancing, resource allocation, and hazard avoidance of pixel packets) and a computation circuit configured to compute a partial computation result based (Hutchins-Column 2 Lines 60-65, Column 8 Lines 30-35, scheduling the high precision interpolation computations and the low precision interpolation computations for parallel execution on an array of interpolators by using a software scheduler, Column 10 Lines 60-65, Once all of the rows associated with pixel packet 520 are loaded into pipeline 400, rows associated with the next pixel packet are loaded into pipeline 400. In one embodiment, rows of pixel data for one pixel packet are interleaved with rows of pixel data from the next pixel packet. By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided), at least in part, upon the selected data subdivision and the weight subdivision.

In regard to Claim 9
Claim 9 (re. apparatus) recites claim limitations substantially similar to those of Claims 1 and 2. Claim 9 is rejected on the same basis as Claims 1-2.
In regard to Claim 18
Claim 18 (re. apparatus) recites claim limitations substantially similar to those of Claims 1 and 2. Claim 18 is rejected on the same basis as Claims 1-2.

In regard to Claim 2
Hutchins-Chalamalasetti disclosed (re. Claim 2) a fusion circuit configured to combine a first partial computation result with at least a second partial computation result to form a combined computation result. (Chalamalasetti-Paragraph 53, Results collector 475 represents a correlation point for the single input stream to be provided as a single output stream.)

In regard to Claims 3, 10, and 19
Hutchins-Chalamalasetti disclosed (re. Claims 3, 10, 19) wherein the load balancing circuit is configured to select a load balancing technique that causes the fusion circuit to shift a partial computation result a width no greater than a width of a data subdivision. (Hutchins-Column 11 Lines 5-10, The pixel data in each row 821 is 80 bits in length… the pixel data in each row 821 is represented using four (4) sets of 20-bit values (e.g., 822-825). Each of the sets of 20-bit values may represent one or more instances of pixel data.)

In regard to Claims 4, 11, and 20
Hutchins-Chalamalasetti disclosed (re. Claims 4, 11, 20) wherein the load balancing circuit employs a load balancing technique that comprises: creating the first data/weight subdivision combination from a first data subdivision (Hutchins-Column 8 Lines 50-55, array of interpolators 501-508 are divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508)) and a first weight subdivision; (Chalamalasetti-Figure 4A, Paragraph 18, determination of a weight threshold for each network layer, Paragraph 20, use adjusted quantized weights from training and as a result determine that fewer computing elements may be used for an individual stage.) creating the second data/weight subdivision combination from a second data subdivision and a second weight subdivision; (Chalamalasetti-Figure 4A, Paragraph 18, determination of a weight threshold for each network layer, Paragraph 20, use adjusted quantized weights from training and as a result determine that fewer computing elements may be used for an individual stage.) alternately loading the computation circuit with either the first data/weight subdivision combination and the second data/weight subdivision combination. (Chalamalasetti-Paragraph 48, an image processing process with multiple steps where different steps in the image processing pipeline may utilize different levels of precision… The example ordering of less precision and high precision steps is for illustration only. In real world implementations of neural networks, the precision may show an opposite trend or alternate amounts of precision between steps.)

In regard to Claims 5 and 12
Hutchins-Chalamalasetti disclosed (re. Claims 5, 12) wherein the load balancing circuit employs a load balancing technique that comprises: creating at least the first data/weight subdivision combination and the first data/weight subdivision combination from permutations of combinations data subdivisions and weight subdivisions; and loading, in a round robin fashion, the computation circuit with the data/weight subdivision combinations. (Chalamalasetti-Paragraph 48, an image processing process with multiple steps where different steps in the image processing pipeline may utilize different levels of precision… The example ordering of less precision and high precision steps is for illustration only. In real world implementations of neural networks, the precision may show an opposite trend or alternate amounts of precision between steps.)
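For illustration only, the claimed idea of forming data/weight subdivision combinations, loading them in a fixed cyclic order, and fusing the partial results can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the application or either reference: the 8-bit operands, the 4-bit subdivision width, and the names `split_nibbles` and `round_robin_multiply` are hypothetical.

```python
from itertools import product


def split_nibbles(value):
    """Split an unsigned 8-bit value into (shift, 4-bit subdivision) pairs.

    Hypothetical helper; the 4-bit width is an assumption for the example.
    """
    return [(0, value & 0xF), (4, (value >> 4) & 0xF)]


def round_robin_multiply(data, weight):
    """Multiply two 8-bit values from 4-bit data/weight subdivision combinations."""
    # All pairings of a data subdivision with a weight subdivision.
    combinations = list(product(split_nibbles(data), split_nibbles(weight)))
    total = 0
    # The combinations are visited one after another in the same fixed order.
    for (d_shift, d_part), (w_shift, w_part) in combinations:
        partial = d_part * w_part                   # partial computation result
        total += partial << (d_shift + w_shift)     # fusion: shift and accumulate
    return total


result = round_robin_multiply(0xAB, 0xCD)
```

The fused total equals the full-width product because each operand is the sum of its shifted subdivisions, so the four partial products cover every cross term exactly once.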

In regard to Claims 6 and 13
Hutchins-Chalamalasetti disclosed (re. Claims 6, 13) wherein the first data/weight subdivision combination and the second data/weight subdivision combination are selected from different cells of data. (Hutchins-Column 8 Lines 50-55, the array of interpolators 501-508 are divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508)… The division is configured to take advantage of the fact that some parameters need to be computed in high precision (e.g., texture coordinates) while other parameters do not. For such low precision parameters, the extra precision afforded by a high precision computation provides no significant contribution to the resulting image. Accordingly, low precision parameters can be assigned to the low precision interpolators 505-508.)

In regard to Claims 7 and 14
Hutchins-Chalamalasetti disclosed (re. Claims 7, 14) wherein the load balancing technique (Hutchins-Column 7 Lines 10-15, scheduling, load balancing, resource allocation, and hazard avoidance of pixel packets) includes a static load balancing technique. (Hutchins-Column 8 Lines 30-35, the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided, Column 10 Lines 60-65, By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided)

In regard to Claims 8 and 15
Hutchins-Chalamalasetti disclosed (re. Claims 8, 15) wherein the load balancing circuit is configured to select a load balancing technique that reduces an amount of time the computation circuit is stalled. (Hutchins-Column 8 Lines 30-35, the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided, Column 10 Lines 60-65, By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided)
In regard to Claim 16
Hutchins-Chalamalasetti disclosed (re. Claim 16) wherein the load balancing circuit is configured to: load the first computation circuit with a first data/weight subdivision combination; load the second computation circuit with a second data/weight subdivision combination; (Hutchins-Column 8 Lines 50-55, the array of interpolators 501-508 are divided into high precision interpolators (e.g., interpolators 501-504) and low precision interpolators (e.g., interpolators 505-508)… The division is configured to take advantage of the fact that some parameters need to be computed in high precision (e.g., texture coordinates) while other parameters do not. For such low precision parameters, the extra precision afforded by a high precision computation provides no significant contribution to the resulting image. Accordingly, low precision parameters can be assigned to the low precision interpolators 505-508.)
and in response to the first computation circuit producing the first partial computation result and without regard to the second computation circuit not producing the second partial computation result, load the first computation circuit with a third data/weight subdivision combination. (Hutchins-Column 8 Lines 30-35, the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided, Column 10 Lines 60-65, By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided)

In regard to Claim 17
Hutchins-Chalamalasetti disclosed (re. Claim 17) wherein the first computation circuit is configured to begin computing the third partial computation result without waiting for the second computation circuit to produce the second partial computation result. (Hutchins-Column 8 Lines 30-35, the interpolators 501-508 are programmable and can be flexibly assigned interpolation computations. In other words, the parameter assignment to the interpolators is programmable. The assigned interpolation computations can be software scheduled such that each of the eight interpolators is kept busy as much as possible (e.g., on a per clock basis). In one embodiment, a software scheduler ensures the interpolators 501-508 are kept busy and that latency/idle time is avoided, Column 10 Lines 60-65, By interleaving rows of pixel packets in this fashion, stalls due to functional unit latencies in the pipeline 400 can be avoided)
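For illustration only, the behavior recited in Claims 16 and 17, in which a computation circuit that finishes early is reloaded without waiting for a slower circuit, can be sketched as a small scheduling simulation. This is a minimal sketch under stated assumptions and is not taken from the application or the cited references: the function name `schedule`, the per-item durations, and the rule of always loading the earliest-free circuit are hypothetical.

```python
import heapq


def schedule(durations, num_circuits=2):
    """Assign work items to whichever circuit becomes free first.

    Returns a list of (item index, circuit id, start time) in load order.
    Hypothetical model: each item occupies one circuit for its duration.
    """
    # Min-heap of (time at which the circuit becomes free, circuit id).
    free_at = [(0, circuit) for circuit in range(num_circuits)]
    heapq.heapify(free_at)
    loads = []
    for item, duration in enumerate(durations):
        # Take the earliest-free circuit, regardless of what others are doing.
        start, circuit = heapq.heappop(free_at)
        loads.append((item, circuit, start))
        heapq.heappush(free_at, (start + duration, circuit))
    return loads


# Item 1 is slow, so circuit 0 receives items 2 and 3 while circuit 1 is busy.
loads = schedule([1, 5, 1, 1])
```

In this example, circuit 0 is loaded with the third and fourth items at times 1 and 2, while circuit 1 is still working on the second item.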
 
Conclusion
Examiner’s Note: In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please refer to the enclosed PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GREG C BENGZON, whose telephone number is (571) 272-3944. The examiner can normally be reached Monday through Friday, 8 AM to 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Follansbee, can be reached at (571) 272-3964. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


	/GREG C BENGZON/           Primary Examiner, Art Unit 2444