DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and §103 (or as subject to pre-AIA 35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Priority
Examiner acknowledges Applicant's claim for benefit of 62/465,063 filed 2/28/2017.
Examiner notes that at least original claims 3, 6, 10, 13, and 17 do not appear to be fully supported by 62/465,063, and therefore the effective filing date of these claims is the actual filing date of 6/29/2017.

Title
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. A descriptive title indicative of the invention will help in proper indexing, classifying, searching, etc. See MPEP §606.01. The title should, however, be limited to 500 characters. Examiner suggests including the aspect(s) of the claims which Applicant believes to be novel or nonobvious over the prior art.

Claim Objections
Claims 2, 9, and 16 are objected to because of the following informalities:
Claims 2, 9, and 16: Change “programable” to – programmable –.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §102 and §103 (or as subject to pre-AIA 35 U.S.C. §102 and §103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. §102(b)(2)(C) for any potential 35 U.S.C. §102(a)(2) prior art against the later invention.
Claims 1-6 and 8-19 are rejected under 35 U.S.C. 103 as being unpatentable over
Chernikov (US 6,539,368) in view of
Zhang (“Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency”) further in view of
Biederman (US 2018/0191642).

Claim 1 (Independent)
Chernikov discloses: A method for evaluating a neural network model corresponding to a service in a system comprising a plurality of nodes interconnected via a network, wherein each node comprises a plurality of on-chip memory blocks (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks), the method comprising:
upon service activation, receiving an N by M matrix of coefficients corresponding to the neural network model (e.g. C19L20-60: neural network, matrix of weight coefficients of input data applied to a neural input);
loading the N by M matrix of coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block); and
regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintaining the N by M matrix of coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block).
Chernikov fails to explicitly recite:
a plurality of compute units;
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8.
Zhang discloses: 
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8 (e.g. §3 equation 10 and the associated discussion; EN: It is clear that this T x T weight matrix for the neural network is contemplated by the author as being 8x8 or larger, as even if each ellipsis is only one value (instead of multiple), this results in the matrix being 8x8).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chernikov to incorporate the weight matrix being at least 8x8 as taught by Zhang for the benefit of efficient learning (Zhang especially e.g. §3).
The combination of Chernikov and Zhang fails to explicitly recite:
a plurality of compute units.
Biederman discloses:
a neural network implemented with a plurality of compute units (e.g. ¶182 or Figures 1, 3 and the associated discussion).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang to incorporate multiple processing units as taught by Biederman for the benefit of quick processing (Biederman especially e.g. ¶182).

Claim 8 (Independent)
Chernikov discloses: A method for evaluating a neural network model corresponding to a service in a system comprising a plurality of nodes interconnected via a network, wherein each node comprises a plurality of on-chip memory blocks and a plurality of compute units (e.g. ), the method comprising:
upon service activation, partitioning the neural network model into separate layers, wherein each layer comprises an N by M matrix of coefficients corresponding to the neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8 (e.g. C19L20-60: neural network, matrix of weight coefficients of input data applied to a neural input);
loading the N by M matrix of coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block); and 
regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintaining the N by M matrix of coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block).
Chernikov fails to explicitly recite:
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8.
Zhang discloses: 
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8 (e.g. §3 equation 10 and the associated discussion; EN: It is clear that this T x T weight matrix for the neural network is contemplated by the author as being 8x8 or larger, as even if each ellipsis is only one value (instead of multiple), this results in the matrix being 8x8). 
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chernikov to incorporate the weight matrix being at least 8x8 as taught by Zhang for the benefit of efficient learning (Zhang especially e.g. §3).
The combination of Chernikov and Zhang fails to explicitly recite:
a plurality of compute units.
Biederman discloses:
a neural network implemented with a plurality of compute units (e.g. ¶182 or Figures 1, 3 and the associated discussion).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang to incorporate multiple processing units as taught by Biederman for the benefit of quick processing (Biederman especially e.g. ¶182).

Claim 15 (Independent)
Chernikov teaches: A system comprising a plurality of nodes interconnected via a network for evaluating a neural network model corresponding to a service, wherein each node comprises a plurality of on-chip memory blocks and a plurality of compute units (e.g. ), wherein each node is configured to:
upon service activation, receive an N by M matrix of coefficients corresponding to the neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8 (e.g. C19L20-60: neural network, matrix of weight coefficients of input data applied to a neural input);
load the N by M matrix of coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks for processing by the plurality of compute units (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block); and 
regardless of a utilization of the plurality of the on-chip memory blocks as part of an evaluation of the neural network model, maintain the N by M matrix of coefficients corresponding to the neural network model in the plurality of the on-chip memory blocks until the service is interrupted or the neural network model is modified or replaced (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block).
Chernikov fails to explicitly recite:
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8.
Zhang teaches: 
wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8 (e.g. §3 equation 10 and the associated discussion; EN: It is clear that this T x T weight matrix for the neural network is contemplated by the author as being 8x8 or larger, as even if each ellipsis is only one value (instead of multiple), this results in the matrix being 8x8). 
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Chernikov to incorporate the weight matrix being at least 8x8 as taught by Zhang for the benefit of efficient learning (Zhang especially e.g. §3).
The combination of Chernikov and Zhang fails to explicitly recite:
a plurality of compute units.
Biederman discloses:
a neural network implemented with a plurality of compute units (e.g. ¶182 or Figures 1, 3 and the associated discussion).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang to incorporate multiple processing units as taught by Biederman for the benefit of quick processing (Biederman especially e.g. ¶182).

Claims 2, 9, and 16
In the combination above, Biederman discloses: 
wherein the node comprises a field programable gate array (FPGA) and wherein each of the plurality of the on-chip memory blocks comprises a static random access memory block (e.g. ¶182: cache, reprogrammed, FPGA).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang to incorporate FPGAs as taught by Biederman for the benefit of quick processing (Biederman especially e.g. ¶182).
  
Claims 3, 10, and 17
In the combination above, Biederman discloses:
wherein each of the plurality of compute units comprises a set of pre-configured resources on the FPGA (e.g. ¶182: parameters or weights needed to run the neural network preloaded into the cache).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang to incorporate preconfigured FPGA resources as taught by Biederman for the benefit of quick processing (Biederman especially e.g. ¶182).

Claims 4, 11, and 18
Chernikov discloses: wherein the plurality of the on-chip memory blocks is arranged in rows and wherein each of the plurality of compute units is configured to process at least a subset of at least one of the rows per clock cycle (e.g. C19L1-C20L55).

Claims 5, 12, and 19
Chernikov discloses: wherein the loading the N by M matrix of coefficients corresponding to the neural network model into the plurality of the on-chip memory blocks comprises streaming data corresponding to the N by M matrix of coefficients corresponding to the neural network model via a broadcast block into the plurality of the on-chip memory blocks (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block).

Claims 6 and 13
Chernikov discloses: wherein the streaming does not comprise loading any additional data corresponding to the neural network model from an off-chip memory in response to any operation associated with the N by M matrix of coefficients corresponding to the neural network model (e.g. C2L20-C3L45: neural processor loading neural coefficient matrices into memory blocks or C19L20–C20L30: neural network division into fragments, each block executes operations, W = matrix of weight coefficients of input data applied to a neural input, Matrix W is moved to the first memory block).


Claim Rejections - 35 USC § 103
Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over
Chernikov (US 6,539,368) in view of
Zhang (“Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency”) further in view of
Biederman (US 2018/0191642) further in view of
Lipton (“A Critical Review of Recurrent Neural Networks for Sequence Learning”).

Claims 7 and 20
The combination of Chernikov and Zhang and Biederman fails to explicitly recite:
Long Short Term Memory (LSTM)
Lipton discloses: 
wherein the N by M matrix of coefficients comprises a Long Short Term Memory (LSTM) weights matrix (e.g. §3: Whx and Whh are weight matrices or §4.1: LSTM model, nodes are memory cells, uses weight matrices as shown on page 20).
Rationale:
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chernikov and Zhang and Biederman to incorporate LSTM weights matrices as taught by Lipton for the benefit of overcoming the problem of vanishing gradients and LSTM having shown a superior ability to learn long-range dependencies (Lipton especially e.g. §4.1).

Examiner’s Note
The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention. It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123). The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant. Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.

Conclusion
Any prior art made of record on the attached PTO-892 and not relied upon is considered pertinent to applicant's disclosure.
Applicant is reminded that in amending in response to a rejection of claims, the patentable novelty must be clearly shown in view of the state of the art disclosed by the references cited and the objections made. Applicant must also show how the amendments avoid such references and objections. See 37 CFR §1.111(c). Additionally, when amending, Applicant should cite in the remarks the specific paragraphs of the original disclosure that support the amendments.

Correspondence Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN J BUSS whose telephone number is (571)272-5831.  The examiner can normally be reached on M-F 9A-5P ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
As detailed in MPEP 502.03, communications via Internet e-mail are at the discretion of the applicant.  Without a written authorization by applicant in place, the USPTO will not respond via Internet e-mail to any Internet correspondence which contains information subject to the confidentiality requirement as set forth in 35 U.S.C. 122. A paper copy of such correspondence will be placed in the appropriate patent application. Examiner suggests filing PTO/SB/439 if applicant desires the examiner to be able to communicate by email.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 


/B. B./
Examiner, Art Unit 2125



/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125