DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application and claims filed 1/30/2019.
The petition to defer prosecution filed 1/30/2019 was granted. As such, the examination of the application was deferred for a period of 36 months from the earliest filing date for which benefit is claimed (January 30, 2019), which elapsed as of 1/30/2022. 
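The deferral period stated above can be checked with simple calendar arithmetic. The following is a minimal sketch only; the helper name `add_months` and the variable names are assumptions made for illustration and are not part of the record.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month so that,
    for example, adding one month to January 31 yields the end of February.
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Earliest filing date for which benefit is claimed (from the paragraph above).
benefit_date = date(2019, 1, 30)

# 36-month deferral period granted on the petition.
deferral_end = add_months(benefit_date, 36)
print(deferral_end.isoformat())
```

Under these assumptions the computed end of the deferral period is January 30, 2022, which agrees with the date given above.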
Claims 1-25 are pending and have been examined.

Information Disclosure Statement
Acknowledgment is made of the information disclosure statement filed 7/02/2020, which complies with 37 CFR 1.97. As such, the information disclosure statement has been placed in the application file and the information referred to therein has been considered by the examiner.

Specification
The disclosure is objected to because of the following informalities:
The use of the terms Google Android, Microsoft Windows, Apple OS X and Linux, which are trade names or marks used in commerce, has been noted in this application. They should be capitalized wherever they appear and be accompanied by 
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks. Appropriate correction is required.

Claim Objections
Claims 2-16 and 24 are objected to because of the following informalities: 
In lines 8-9 and 12-13 of claim 2, the recitations of “analog voltage values associated elements of the first memory circuit” and “analog voltage values associated elements of the second memory circuit” are grammatically incorrect and appear to be missing one or more words. If supported by the original specification, the examiner suggests that one way to at least partially address this objection would be to amend lines 8-9 and 12-13 of claim 2 to recite “analog voltage values associated with elements of the first memory circuit” and “analog voltage values associated with elements of the second memory circuit”. Appropriate correction is required. 
In claim 10, “wherein the instruction to perform thresholding” should read “wherein the instructions to perform thresholding” (see intervening claim 9, from which claim 10 depends, which recites, inter alia, “instructions to perform thresholding”). Appropriate correction is required.

Also, claims 3-16, which depend directly or indirectly from claim 2, are objected to based on their respective dependencies from claim 2. 
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 2-17, 20 and 23 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 2 and 20 both recite “the corresponding NN layer” (see line 4 of claim 2 and lines 2-3 of claim 20). There is insufficient antecedent basis for this limitation in these claims. Applicant did not previously introduce any “corresponding NN layer” or corresponding neural network layer in these claims or in their respective base claims, claims 1 and 19.
For examination purposes, the examiner is interpreting the first recitation of the term “the corresponding NN layer” in claims 2 and 20 as “[[the]] a corresponding NN layer”. Appropriate correction is required.
Claims 7 and 23 both recite “a Manhattan (L1) difference” and “a Euclidean (L2) difference” (see, lines 3-4 of claims 7 and 23). Aside from merely repeating the claim language in paragraphs 26, 38, 67 and 85, the specification does not provide any examples or define what is meant by “a Manhattan (L1) difference” and “a Euclidean 
Claims 8 and 23 both recite “an L1 normalization, an L2 normalization” (see line 3 of claim 8 and line 4 of claim 23). Aside from merely repeating the claim language in paragraphs 26, 38, 68 and 86, and mentioning “L2 normalization pooling” in paragraphs 28 and 72, the specification does not provide any examples or define what is meant by “an L1 normalization, an L2 normalization”. As such, one of ordinary skill in the art would not be able to draw a clear boundary between what is and is not covered by claims 8 and 23. For examination purposes, “an L1 normalization, an L2 normalization” are being interpreted as any normalizations. Appropriate correction is required.
Claims 17 and 25 both recite “wherein the CPU is an x86-architecture processor” in line 1. Aside from repeating the claim language in paragraphs 77 and 95 and stating “the CPU 110 may be an x86 architecture processor, which is to say a processor implementing an x86 instructions set or some portion thereof.” and “Processor 110 may be implemented as a complex instruction set computer (CISC) or a reduced instruction set computer (RISC) processor. In some embodiments, the CPU 110 may be an x86 architecture processor, which is to say a processor implementing an x86 instructions set or some portion thereof.” in paragraphs 17 and 43, the specification does not provide examples or specifically define what is meant by “an x86-architecture processor”, much less set forth any clear boundaries for the claimed “x86-architecture processor”. Thus, 
Also, claims 3-16, which depend directly or indirectly from claim 2, are rejected under 35 U.S.C. 112(b) as being indefinite under the same rationale as claim 2.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-2, 9-10 and 15-18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 10, 12, 14 and 19-20 of copending Application No. 16/258,522 (reference application) in view of non-patent literature Chi, et al. ("PRIME: A Novel Processing-In-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 27-39, hereinafter “Chi”). Although the claims at issue are not identical, they are not patentably distinct from each other because the limitations of 
Regarding independent claim 1, claim 1 of the copending application teaches the limitations as shown in the table below; however, the copending application does not explicitly teach a central processing unit (CPU) to execute instructions from a general-purpose instruction set; a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and … an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on … (3) the AI instruction set executed by the NPU.
In the same field, analogous art Chi teaches a central processing unit (CPU) to execute instructions from a general-purpose instruction set (aside from repeating the claim language in paragraphs 15, 61 and 80, the specification does not define “a general-purpose instruction set”. Therefore, “a general-purpose instruction set”, under the broadest reasonable interpretation (BRI), in light of the specification, encompasses any instructions executable by a general-purpose processor or CPU) (see, e.g., pages 29 and 33-34, “When NN applications are running, PRIME can execute them … the first instruction set architecture for NN accelerators has been proposed”, “the PRIME controller that decodes instructions”, “PRIME to support NN programming … hardware execution, … and code execution” [i.e., a processor/CPU to execute instructions]);
a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set (see, e.g., FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 the AI processor to perform analog in-memory computations (see, e.g., pages 30 and 37, Section III, “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs”, “PRIME reduces all the three parts of energy consumption significantly. For computation, ReRAM based analog computing is very energy-efficient” [i.e., AI processor of PRIME performs analog in-memory computing/computations]) based on … (3) the AI instruction set executed by the NPU (see, e.g., page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source selection for computation” [i.e., the AI processor of PRIME performs analog in-memory computations based on the AI instruction set executed by the NPU/PRIME]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Chi with the hybrid artificial intelligence 
Instant Application No. 16/262,583 (as filed on 01/30/2019)
Copending Application No. 16/258,522 (as filed on 01/25/2019)
1. A hybrid artificial intelligence (AI) processing system comprising:
a central processing unit (CPU) to execute instructions from a general-purpose instruction set;
a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and
an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU, and (3) the AI instruction set executed by the NPU.
2. The system of claim 1, wherein the AI processor comprises one or more NN layers, at least one of the one or more NN layers including:
a digital access circuit to receive a subset of the weighting factors, the subset associated with the corresponding NN layer, and to receive data associated with the corresponding NN layer;
a first memory circuit to store the subset of the weighting factors;
a first bit line processor (BLP) associated with the first memory circuit, the first BLP to perform analog calculations based on analog voltage values associated elements of the first memory circuit;




a second memory circuit to store the data associated with the corresponding NN layer;
a second BLP associated with the second memory circuit, the second BLP to perform analog calculations based on analog voltage values associated elements of the second memory circuit; and

a cross bit line processor (CBLP) to perform analog calculations based on results generated by the first BLP and the second BLP.


9. The system of claim 2, wherein the AI instruction set includes instructions to perform thresholding on the results of the CBLP analog calculations.
10. The system of claim 9, wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, Rectified Linear Unit (ReLU) thresholding, hyperbolic tangent thresholding, sign thresholding, minimum thresholding, maximum thresholding, and softmax thresholding.

11. The system of claim 9, wherein the AI instruction set includes instructions to perform pooling on the thresholded results of the CBLP analog calculations.


15. The system of claim 2, wherein at least one of the NN layers is a convolutional NN layer.

16. The system of claim 2, wherein at least one of the NN layers is a fully connected NN layer.
17. The system of claim 1, wherein the CPU is an x86-architecture processor.

18. An integrated circuit or chip set comprising the system of claim 1.
1. A hybrid artificial intelligence (AI) processing system comprising:
a central processing unit (CPU);
Aside from repeating the claim language in paragraphs 15, 61 and 80, the specification does not define “a general-purpose instruction set”. Therefore, “a general-purpose instruction set”, under 
See, e.g., Chi pages 29 and 33-34, “When NN applications are running, PRIME can execute them … the first instruction set architecture for NN accelerators has been proposed”, “the PRIME controller that decodes instructions”, “PRIME to support NN programming … hardware execution, … and code execution [i.e., a processor/CPU to execute instructions]
See, e.g., Chi FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 and 36, Section III, “compared with a state-of-the-art neural processing unit design, PRIME improves the performance” [i.e., PRIME is a neural processing unit/NPU], “PRIME directly leverages ReRAM cells to perform computation”, “the PRIME controller that decodes instructions … including the ; and
an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU and (2) input data provided by the CPU. and 
(3) See, e.g., Chi, page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source selection for computation” [i.e., the AI processor of 
2. The system of claim 1, wherein the AI processor comprises one or more NN layers, at least one of the one or more NN layers including:
a first digital access circuit to receive, from the CPU, a subset of the weighting factors, the subset associated with the corresponding NN layer;

a first memory circuit to store the subset of the weighting factors;
a first bit line processor (BLP) associated with the first memory circuit, the first BLP to generate a first sequence of vectors of analog voltage values, each of the first sequence of vectors associated with a column of the first memory circuit;
a second digital access circuit to receive data associated with the corresponding NN layer;
a second memory circuit to store the data associated with the corresponding NN layer;
a second bit line processor (BLP) associated with the second memory circuit, the second BLP to generate a second sequence of vectors of analog voltage values, each of the second sequence of vectors associated with a column of the second memory circuit; and
a cross bit line processor (CBLP) to calculate a sequence of analog dot products, each of the analog dot products calculated between one of the first sequence of vectors and one of the second sequence of vectors.
10. The system of claim 2, wherein at least one of the NN layers further includes a Rectified Linear Unit (ReLU) to perform thresholding on the sequence of analog dot products.
11. The system of claim 10, wherein at least one of the NN layers further includes a pooling logic circuit to perform maximum pooling on the thresholded sequence of analog dot products.

19. The NN layer of claim 16, wherein the NN layer is a convolutional NN layer.


20. The NN layer of claim 16, wherein the NN layer is a fully connected NN layer.

12. The system of claim 1, wherein the CPU is an x86-architecture processor.

An integrated circuit or chip set comprising the system of claim 1.


Regarding instant claim 1, the instant claim is substantially identical to claim 1 of the co-pending application, except the instant claim 1 recites “a central processing unit (CPU) to execute instructions from a general-purpose instruction set; a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU, and (3) the AI instruction set executed by the NPU” whereas the co-pending application claim 1 recites “a central processing unit (CPU); and an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU and (2) input data provided by the CPU.”
As shown in the table above, claim 1 of the instant application is substantially identical to claim 1 of the co-pending application in view of Chi. Therefore, instant application claim 1 is taught by claim 1 of copending Application No. 16/258,522 in view of Chi. In particular, as discussed above, in the same field, analogous art Chi teaches a 
Regarding each of instant dependent claims 2, 9-10 and 15-18, each of the instant claims obviously encompasses the claimed invention of each of claims 2, 10, 12, 14 and 19-20 in the reference application in view of Chi and the claims differ only in terminology. As such, instant claims 2, 9-10 and 15-18 are rejected under the same rationale as instant claim 1.
As further shown in the table and above, dependent claims 2, 9-10 and 15-18 of the instant application are substantially identical to dependent claims 2, 10, 12, 14 and 19-20 of the co-pending application in view of Chi. Therefore, instant application claims 2, 9-10 and 15-18 are taught by claims 2, 10, 12, 14 and 19-20 of copending Application No. 16/258,522 in view of Chi.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-12, 15-16 and 18-24 are rejected under 35 U.S.C. 103 as being unpatentable over Yakopcic et al. (U.S. Patent No. 10,176,425 B2, hereinafter “Yakopcic”) in view of non-patent literature Chi, et al. ("PRIME: A Novel Processing-In-.
With respect to claim 1, Yakopcic discloses the invention as claimed including a hybrid artificial intelligence (AI) processing system (see, e.g., column 2, lines 57-58, column 17, lines 65-68, column 23, lines 49-53, “The present invention also provides an analog neuromorphic system”, "the analog neuromorphic circuit 400 may be incorporated into digital signal processing applications", “the analog neuromorphic circuit 400 may be incorporated into analog neuromorphic configurations to execute popular neural network algorithms" [i.e., a hybrid analog-digital neuromorphic/neural network/AI system]) comprising:
a central processing unit (CPU) to execute instructions from a general-purpose instruction set (aside from repeating the claim language in paragraphs 15, 61 and 80, the specification does not define “a general-purpose instruction set”. Therefore, “a general-purpose instruction set”, under the broadest reasonable interpretation (BRI), in light of the specification, encompasses any instructions executable by a general-purpose processor or CPU) (see, e.g., column 2, line 67, column 4, lines 60-62 and column 17, lines 65-68, “A controller is configured”, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors.”, "digital signal processing applications" [i.e., a digital/conventional processor/CPU to execute instructions]); … and
an AI processor (see, e.g., FIG. 1 depicting neuromorphic processing device 100 [i.e., an AI processor] that is electrically and communicatively coupled to processor/CPU via lines 140 and 180 and column 6, lines 33-34, “an analog , the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU (see, e.g., column 6, lines 10-14, 35-40 and 50-54, column 17, lines 55-58, column 25, lines 57-59, "simultaneous execution of addition and multiplication operations in an analog circuit” [i.e., perform analog computations], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100 and the analog neuromorphic processing device 100 then generates a plurality of output signals 180”, “resistive memories are also of nano-scale sizes that enable a significant amount of resistive memories to be configured within the analog neuromorphic processing device 100 [i.e., the AI processor/ neuromorphic processing device 100 performs analog in-memory computations based on input data], “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values” [i.e., and based on NN weighting factors], “feature maps after being generated may then be stored in a digital storage layer” [i.e., provided by the CPU]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and …
an AI processor coupled to the CPU, the AI processor to perform analog in-memory computations based on … (3) the AI instruction set executed by the NPU.
In the same field, analogous art Chi teaches:
a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set (see, e.g., FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 and 36, Section III, “compared with a state-of-the-art neural processing unit design, PRIME improves the performance” [i.e., PRIME is a neural processing unit/NPU], “PRIME directly leverages ReRAM cells to perform computation without the need for extra PUs [processing units]. To achieve this, as shown in Figure 3(c), PRIME partitions a ReRAM bank”, “the PRIME controller that decodes instructions … including the function selection of each mat among programming synaptic weights”, “We also evaluate two different NPU solutions: using a complex parallel NPU [17] as a co-processor (pNPU-co), and using the NPU as a PIM-processor” [i.e., PRIME/NPU executes instructions from an AI instruction set for programming AI synaptic weights]); and
an AI processor coupled to the CPU (see, e.g., FIG. 3 – showing PRIME architecture with an AI processor coupled to the CPU and pages 32 and 34, “when PRIME is accelerating NN computation, CPU can still access the memory and work in parallel”, “When LRN layers are applied PRIME requires the help of CPU for LRN computation” [i.e., an AI processor in the PRIME architecture works in parallel with the CPU and is communicatively coupled to the CPU]), the AI processor to perform analog in-memory computations (see, e.g., pages 30 and 37, Section III, “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs”, “PRIME reduces all the three parts of energy consumption significantly. For computation, ReRAM based analog computing is very energy-efficient” [i.e., AI processor of PRIME performs analog in-memory computing/computations]) based on … (3) the AI instruction set executed by the NPU (see, e.g., page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source selection for computation” [i.e., the AI processor of PRIME performs analog in-memory computations based on the AI instruction set executed by the NPU/PRIME]).
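For illustration only, the controller behavior quoted from Chi above (decoding an instruction into a function selection for a mat, chosen among programming synaptic weights, computation, and memory, plus an input source selection for computation) can be sketched as follows. The instruction syntax, the names `MatFunction` and `decode`, and the default source `"buffer"` are hypothetical and are not taken from Chi or from the instant specification.

```python
from enum import Enum


class MatFunction(Enum):
    """The three mat functions named in the quoted passage (labels are hypothetical)."""
    PROGRAM_WEIGHTS = "program_weights"
    COMPUTE = "compute"
    MEMORY = "memory"


def decode(instruction: str) -> dict:
    """Decode a hypothetical 'OPCODE mat=<n> [src=<name>]' instruction.

    Returns the control signals a controller might drive: which mat is
    addressed, which function it is switched to, and (for computation
    only) which input source is selected.
    """
    parts = instruction.split()
    function = MatFunction(parts[0].lower())
    fields = dict(part.split("=", 1) for part in parts[1:])
    signals = {"mat": int(fields["mat"]), "function": function}
    if function is MatFunction.COMPUTE:
        # Input source selection applies only when the mat computes.
        signals["input_source"] = fields.get("src", "buffer")
    return signals


print(decode("compute mat=2 src=buffer"))
print(decode("program_weights mat=1"))
```

The sketch is intended only to show how an instruction set can govern in-memory computation through control signals; it does not represent the actual PRIME instruction format.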
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 2, as discussed above, Yakopcic in view of Chi teaches the system of claim 1.
wherein the AI processor comprises one or more NN layers (see, e.g., column 8, lines 28-29 and column 23, line 48: "layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “in a given layer of a CNN system”), at least one of the one or more NN layers including:
a digital access circuit to receive a subset of the weighting factors, the subset associated with the corresponding NN layer (aside from repeating the claim language and stating, with reference to the high level block diagram of FIG. 2, that “Digital access circuits 210 are configured to receive, from the CPU, weighting factors 120, or a subset of those weights associated with the NN layer.” – see paragraphs 19, 21, 37 and 62, the specification does not define “a digital access circuit”. Therefore, “a digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of receiving weights or weighting factors. Also, as indicated above, “the corresponding NN layer” has been interpreted as “a corresponding NN layer”) (see, e.g., column 9, lines 48-52, column 17, lines 49-57 and 65-67, column 23, lines 54-55 and column 25, lines 57-59, “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 is capable of executing dot product operations in numerous applications such as but not limited to neural applications, image recognition, image processing, digital signal processing … neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of , and to receive data associated with the corresponding NN layer (see, e.g., column 10, lines 17-25, column 18, lines 26-33 and column 25, lines 56-59, “neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. 
For example, a neural network configuration with a significant number of input”, "image is a two-dimensional image depicted by the image matrix" [i.e., the image data is input data associated with the NN layer], “24x24 pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.” [i.e., receive data associated with the NN layer]);
a first memory circuit to store the subset of the weighting factors (see, e.g., column 17, lines 56-57, column 23, lines 54-56 and column 25, lines 57-59 “vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., subset of weighted values/weighting factors to be applied to an image], “In executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n) may be determined” [i.e., subset of weights for each memory circuit 410a-n], “pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.” [i.e., a first memory circuit stores the subset of weights]);
a first … processor … associated with the first memory circuit, the first [processor] … to perform analog calculations based on analog voltage values associated elements of the first memory circuit (see, e.g., column 18, lines 37-42 and 59-61, " the controller 405 may convert the image matrix xex into the vector values included in the vector xex that are applied as the input voltages 440(a-n) and the vector values included in the vector -xex that are applied as the complemented input voltages 460( a-n )”, “controller 405 may then convert the kernel matrix kex into kex + and kex - which are similar to w+ and w- discussed above", “The output configuration 500 to convert the output voltage signal 510 to the non-binary values represented by the dot-product operation value 470a and the complemented dot-product operation value 450a” [i.e., processor/controller 405 converts the kernel, which is a matrix of weights by performing analog calculations based on analog voltage values/input voltages 440a-n associated with the first memory circuit]);
a second memory circuit to store the data associated with the corresponding NN layer (see, e.g., column 18, lines 26-33 and column 25, lines 35-38 and 57-58: "image is a two-dimensional image depicted by the image matrix" [i.e., the image data is data associated with the NN layer], "neuromorphic circuit 1000 includes … resistive memories 410(a-n) … a digital storage layer" [i.e., resistive memories 410a-n include a second memory circuit for storing the data]);
a second [processor] … to perform analog calculations based on analog voltage values associated elements of the second memory circuit (see, e.g., column 19, lines 27-29 and column 20, lines 11-15, “the analog neuromorphic circuit 400 generates an output voltage signal 510. The output voltage signal 510 is generated ; and
a … processor … to perform analog calculations based on results generated by the first … and the second [processors] (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as an activation function and/or a thresholding function.” [i.e., a processor/op-amp configuration of the analog neuromorphic circuit 400 to perform analog calculations based on results/output voltage signal 510 generated by the first and second processors – controller 405 and output configuration 500]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a first bit line processor (BLP) associated with the first memory circuit, the first BLP to perform analog calculations based on analog voltage values associated elements of the first memory circuit; …
a second BLP associated with the second memory circuit, the second BLP to perform analog calculations based on analog voltage values associated elements of the second memory circuit; and
a cross bit line processor (CBLP) to perform analog calculations based on results generated by the first BLP and the second BLP.
In the same field, analogous art Chi teaches a first bit line processor (BLP) associated with the first memory circuit, the first BLP to perform analog calculations based on analog voltage values associated elements of the first memory circuit (aside from repeating the claim language in paragraphs 20, 24 and 62 and stating, with reference to the high level block diagram of FIG. 2, “The first BLP circuit 230, associated with the first memory circuit 220, is configured to generate a first sequence of vectors of analog voltage values.” in paragraph 23, the specification does not define “a first bit line processor (BLP)” or a “bit line processor (BLP)”. Therefore, “a first bit line processor (BLP)”, under the BRI, in light of the specification, is any processor, functional unit, circuitry or circuit that is capable of performing analog calculations based on analog voltage values) (see, e.g., page 29, “execute the neural networks in Figure 2(a). The input data ai is represented by analog input voltages … Then the current flowing to the end of each bitline is viewed … After sensing the current on each bitline, the neural networks adopt a nonlinear function unit to complete the execution. Implementing NNs with ReRAM crossbar arrays requires specialized peripheral circuit design.” [i.e., a bitline unit/circuit performs analog calculations based on analog voltage values associated with a first ReRAM/memory circuit in the crossbar array]).
a second BLP associated with the second memory circuit, the second BLP to perform analog calculations based on analog voltage values associated elements of the second memory circuit (aside from repeating the claim language in ; and
a cross bit line processor (CBLP) to perform analog calculations based on results generated by the first BLP and the second BLP (As indicated above, the first and second BLPs, under the BRI, are any processors, functional units, circuitry or circuits that are capable of performing analog calculations based on analog voltage values. Also, aside from repeating the claim language in paragraphs 20, 24, 62 and 64 and stating, with reference to the high level block diagram of FIG. 2, “The CBLP circuit 240 is configured to calculate a sequence of analog dot products” and “the CBLP circuit 240 performs the analog multiply portion of the dot product operation by timing current integration over a capacitor. Circuit 240 may be configured as a capacitor in series with 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging 

Regarding claim 3, as discussed above, Yakopcic in view of Chi teaches the system of claim 2.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuit to copy the weighting factors from the CPU to the first memory circuit and to copy the input data from the CPU to the second memory circuit.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuit to copy the weighting factors from the CPU to the first memory circuit and to copy the input data from the CPU to the second memory circuit (see, e.g., pages 28-30, 32 and 35, “We propose a ReRAM main memory architecture, which contains a portion of memory arrays (full function subarrays) that can be configured as NN accelerators or as normal memory on demand. It is a novel PIM solution to accelerate NN applications, which enjoys the advantage of in-memory data movement”, “many NN applications require high memory bandwidth to fetch large-size input data and synaptic weights, the data movement between memory and processor”, “for NN computation the FF subarrays enjoy the high bandwidth of in-memory data movement, and can work in parallel with CPU, with the help of the Buffer subarrays”, “The right four commands in Table I control the data movement. They are applied during the whole computation phase.” [i.e., NN/AI 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).


Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuit to store results of the CBLP analog calculations to a third memory circuit associated with the CPU.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuit to store results of the CBLP analog calculations to a third memory circuit associated with the CPU (In line with the BRI indicated above, the “digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of storing results. As also indicated above the “CBLP”, under the BRI, in light of the specification, is any processor, functional unit, capacitor, circuitry or circuit that is capable of performing analog calculations based on generated results or outputs from other processors, units or circuits.) (see, e.g., pages 31 and 34, “we modify the column multiplexers in ReRAM … in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch … After analog processing, the output current is sensed” [i.e., results of analog cross-bitline/CBLP calculations are stored in ReRAM/a memory circuit], “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A ); the output currents are accumulated at the bitlines.” [i.e., cross-bitline unit/circuit/CBLP calculation results/output are stored in a third memory circuit/ReRAM in the crossbar array]). 


Regarding claim 6, as discussed above, Yakopcic in view of Chi teaches the system of claim 5.
Yakopcic further discloses wherein the AI instruction set includes instructions to de-vectorize the results (paragraph 29 of the specification states “Figure 3 illustrates a vectorization process 300, in accordance with certain embodiments of the present disclosure. In this example, the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330 for storage … A complementary de-
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose the CBLP analog calculations in the third memory circuit
In the same field, analogous art Chi teaches the CBLP analog calculations in the third memory circuit (see, e.g., pages 31 and 34, “we modify the column 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).


Yakopcic further discloses wherein the AI instruction set includes instructions to employ the first and second BLPs to perform the analog calculations wherein the analog calculations include at least one of a multiplication, a Manhattan (L1) difference, and a Euclidean (L2) difference (as indicated above, “a Manhattan (L1) difference” and “a Euclidean (L2) difference” have been interpreted as any geometric distances or differences. As also indicated above, the first and second “BLPs” under the BRI, are any processors, functional units, circuitry or circuits that are capable of performing analog calculations based on analog voltage values.) (see, e.g., column 5, line 61-column 6, line 2 and column 7, lines 15-18, “Each resistive memory may apply a resistance to each input voltage so that each input voltage is multiplied by each resistance. … multiplication in parallel enables multiple multiplication operations to be executed simultaneously. … The simultaneous execution of addition and multiplication operations in an analog circuit”, “the resistive memories may simultaneously execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.” [i.e., instructions to employ first and second resistive memories/circuits to perform analog calculations including a multiplication, based on analog input voltage values]).
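For illustration only, the parallel analog multiplication and addition described in the cited Yakopcic passages can be modeled as below. This is a minimal sketch under an idealized Ohm's-law assumption (each cell contributes a current equal to its input voltage times its conductance, and currents sum along each column); the function name `crossbar_column_currents` and the sample values are hypothetical and are not taken from Yakopcic or Chi.

```python
def crossbar_column_currents(voltages, conductances):
    """Idealized resistive crossbar (assumption: ideal cells, no wire resistance).

    voltages:     list of input voltages, one per row.
    conductances: matrix G[i][j] (1/resistance) of the cell at row i, column j.
    Returns the summed current on each column: I_j = sum_i V_i * G[i][j].
    """
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage is required per crossbar row")
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


# Hypothetical 2x2 example: every multiply-and-sum happens "at once" per column.
currents = crossbar_column_currents([1.0, 2.0], [[0.5, 0.0], [0.25, 1.0]])
```

In this model each column output is a dot product between the input voltage vector and that column's stored values, which is the sense in which the multiplications are executed simultaneously.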

Regarding claim 8, as discussed above, Yakopcic in view of Chi teaches the system of claim 2.
Yakopcic further discloses wherein the AI instruction set includes instructions to employ the CBLP to perform the analog calculations wherein the analog calculations include at least one of a dot product, an L1 normalization, an L2 normalization, a maximum operation, and a minimum operation (as indicated above, the “CBLP”, under the BRI, in light of the specification, is any processor, functional unit, capacitor, circuitry or circuit that is capable of performing analog calculations based on generated results or outputs from other processors, units or circuits. As also indicated above, “an L1 normalization, an L2 normalization” are being interpreted as any normalizations.) (see, e.g., column 7, lines 15-18 and column 11, lines 8-9, “The analog neuromorphic circuit 400 may be implemented so that dot-product operations may be executed”, “The controller 405 may then identify a minimum resistance value and a maximum resistance value for the resistance values of resistive memories 410c and 410f and select resistance values for the resistive memories 410c and 410f that are within the minimum and maximum resistance value range” [i.e., instructions to employ analog circuitry/resistive memories to perform analog calculations including a dot product and minimum and maximum operations]).

Regarding claim 9, as discussed above, Yakopcic in view of Chi teaches the system of claim 2.
Yakopcic further discloses wherein the AI instruction set includes instructions to perform thresholding on the results of the CBLP analog calculations (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may 

Regarding claim 10, as discussed above, Yakopcic in view of Chi teaches the system of claim 9.
Yakopcic further discloses wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, … hyperbolic tangent thresholding, … minimum thresholding, maximum thresholding, and softmax thresholding (see, e.g., column 14, lines 8-13 and column 20, lines 5-8 and 24-35, “The controller 405 may then identify a minimum resistance value and a maximum resistance value for the resistance values of resistive memories 410c and 410f and select resistance values for the resistive memories 410c and 410f that are within the minimum and maximum resistance value range” “the output configuration 500 may be incorporated into the analog neuromorphic circuit 400 to simulate a non-linear smooth function configuration 600 in FIG. 6, such as a sigmoid function”, “The output configuration 500 may convert the output voltage signal to model the sigmoid function, the inverse tangent function, and/or any other type of non-linear smooth function … The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value 
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, Rectified Linear Unit (ReLU) thresholding.
In the same field, analogous art Chi teaches wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, Rectified Linear Unit (ReLU) thresholding (see, e.g., page 31, “The modified column multiplexer incorporates … a nonlinear threshold (sigmoid) unit”, “we add a hardware unit to support ReLU function, a function in the convolution layer of CNN.”, “Our circuit design supports two activation functions: sigmoid and ReLU. Sigmoid is implemented by the sigmoid unit in Figure 4 B, and ReLU is implemented by the ReLU unit.” [i.e., instructions/functions to perform thresholding on results of the analog calculations include options to specify sigmoid and ReLU thresholding]).
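For illustration only, an option that selects between the two activation functions named in the cited Chi passage (sigmoid and ReLU) can be sketched as follows. The function name `threshold` and the `kind` parameter are hypothetical and stand in for whatever selection mechanism the hardware uses; they are not taken from Chi.

```python
import math


def threshold(x, kind="sigmoid"):
    """Apply the selected activation ("thresholding") to one sensed value.

    kind is a hypothetical selector: "sigmoid" or "relu".
    """
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))  # smooth squashing into (0, 1)
    if kind == "relu":
        return max(0.0, x)  # negative values are clamped to zero
    raise ValueError(f"unsupported thresholding option: {kind}")
```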
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … 

Regarding claim 11, as discussed above, Yakopcic in view of Chi teaches the system of claim 9.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to perform pooling on the thresholded results of the CBLP analog calculations.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to perform pooling on the thresholded results of the CBLP analog calculations (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C , which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, · · · , 1/n] in ReRAM cells” 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 12, as discussed above, Yakopcic in view of Chi teaches the system of claim 11.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the instruction to perform pooling includes an option to specify a least one of maximum pooling, minimum pooling, average pooling, dropout pooling, and L2 normalization pooling.
In the same field, analogous art Chi teaches wherein the instruction to perform pooling includes an option to specify a least one of maximum pooling, minimum pooling, average pooling, dropout pooling, and L2 normalization pooling (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” [i.e., maximum pooling] and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C , which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, · · · , 1/n] in ReRAM cells” [i.e., option to specify maximum and mean/average pooling]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
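For illustration only, the two pooling options discussed in the cited Chi passages can be sketched as below. The mean-pooling function follows the quoted description of pre-programming the weights [1/n, ..., 1/n], so that pooling reduces to a dot product; the function names `mean_pool` and `max_pool` and the sample window are hypothetical and are not taken from Chi.

```python
def mean_pool(values):
    """n:1 mean pooling expressed as a dot product with weights [1/n, ..., 1/n]."""
    n = len(values)
    if n == 0:
        raise ValueError("pooling window must not be empty")
    weights = [1.0 / n] * n  # the "pre-programmed" weights in this sketch
    return sum(w * v for w, v in zip(weights, values))


def max_pool(values):
    """n:1 max pooling: keep only the largest value in the window."""
    if not values:
        raise ValueError("pooling window must not be empty")
    return max(values)


# Hypothetical 4:1 pooling window.
window = [1.0, 2.0, 3.0, 6.0]
pooled_mean = mean_pool(window)
pooled_max = max_pool(window)
```

Because mean pooling is just a weighted sum, it can reuse the same multiply-and-accumulate path as any other layer, whereas max pooling needs a comparison step; this mirrors the distinction drawn in the quoted passage.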

Regarding claim 15, as discussed above, Yakopcic in view of Chi teaches the system of claim 2.
Yakopcic further discloses wherein at least one of the NN layers is a convolutional NN layer (see, e.g., FIG. 8 – depicting “CONVOLUTION” layers 830 in convolutional neural network/CNN 800 and column 24, lines 13-15 and column 25, lines 56-59, “The feature extractor 810 includes the combination of two different types layers that are the convolution layers 830(a-n)”, “pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.”).

Regarding claim 16, as discussed above, Yakopcic in view of Chi teaches the system of claim 2.
Yakopcic further discloses wherein at least one of the NN layers is a fully connected NN layer (see, e.g., FIG. 8 – depicting “FULLY CONNECTED LAYER” in convolutional neural network/CNN 800 and column 24, lines 31-33, “The outputs of the last layer of the conventional CNN 800 are then input to a fully connected network that is the classifier 820.”).

Regarding claim 18, as discussed above, Yakopcic in view of Chi teaches the system of claim 1.
Examiner’s Note: claim 18, as drafted, depends from claim 1. If applicant intended for claim 18 to be an independent claim, the examiner suggests that one way to do so is to amend the last portion of claim 18 to explicitly recite the limitations of claim 
Yakopcic further discloses an integrated circuit or chip set (see, e.g., column 6, lines 57-61 and column 7, lines 55-57, “the analog neuromorphic processing device 100 has significant computational efficiency while maintaining the size of the analog neuromorphic processing device 100 to a chip that may easily be positioned on a circuit board.”, “The scaling of the resistive memories into additional neurons may be done within the analog neuromorphic processing device 100 such as within a single chip. However, the analog neuromorphic processing device 100 may also be scaled with other analog neuromorphic circuits contained in other chips” [i.e., an integrated circuit or chip set]) comprising the system of claim 1 (as indicated above, Yakopcic in view of Chi teaches the system of claim 1, see above citations to Yakopcic and Chi regarding the limitations of claim 1).

With respect to independent claim 19, Yakopcic discloses the invention as claimed including an artificial intelligence (AI) processing system (see, e.g., column 2, lines 57-58 and column 23, lines 49-53, “The present invention also provides an analog neuromorphic system”, “the analog neuromorphic circuit 400 may be incorporated into analog neuromorphic configurations to execute popular neural network algorithms” [i.e., a neuromorphic/neural network/AI system]) comprising:
a central processing unit (CPU) to execute instructions from a general-purpose instruction set (as indicated above, “a general-purpose instruction set”, under ; … and
(see, e.g., FIG. 1 depicting neuromorphic processing device 100 [i.e., an AI processor] that is electrically and communicatively coupled to processor/CPU via lines 140 and 180 and column 6, lines 33-34, “an analog neuromorphic processing device 100” [i.e., neuromorphic processing device 100/AI processor is coupled to processor/CPU via input lines 140]) … , the AI processor to perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU (see, e.g., column 6, lines 10-14, 35-40 and 50-54, column 17, lines 55-58, column 25, lines 57-59, "simultaneous execution of addition and multiplication operations in an analog circuit” [i.e., perform analog computations], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100 and the analog neuromorphic processing device 100 then generates a plurality of output signals 180”, “resistive memories are also of nano-scale sizes that enable a significant amount of resistive memories to be configured within the analog neuromorphic processing device 100 [i.e., the AI processor/ neuromorphic processing device 100 performs analog in-memory computations based on input data], “the analog neuromorphic circuit 400 may be , … 
wherein the AI processor comprises a NN layer (see, e.g., column 8, lines 28-29 and column 23, line 48: "layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “in a given layer of a CNN system”), the NN layer including analog processing circuitry and memory circuitry, the memory circuitry to store the weighting factors and to store the input data (see, e.g., column 8, lines 28-29, column 10, lines 17-25, column 23, lines 54-55 and column 25, lines 57-58, "layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “the analog neuromorphic circuit 200 may also be scaled to include additional layers of neurons … additional layers of neurons also exponentially increases the computational efficiency of the neural network configuration 300 to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input voltages” [i.e., NN layer includes analog processing circuitry], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)", "a digital storage layer” [i.e., NN layer includes resistive memories/circuitry to store the weights/weighting factors and the input data]), the analog processing circuitry to perform calculations between the stored weighting factors and the stored input data (see, e.g., column 9, lines 48-52, column 10, lines 17-25, column 17, lines 54-57 and column 23, lines 54-55, “combined weight 295 as shown in FIG. 2 as representative j , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “the analog neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. 
For example, a neural network configuration with a significant number of input”, “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image”, “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., the analog processing circuitry executes learning algorithms/image processing/performs calculations using the stored weights/weighting factors and the stored input]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and …
an AI processor coupled to the CPU … the AI processor to perform analog in-memory computations based on … (3) the AI instruction set executed by the NPU.
In the same field, analogous art Chi teaches a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set (see, e.g., FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 and 36, Section III, “compared with a state-of-the-art neural processing unit design, PRIME improves the performance” [i.e., PRIME is a neural processing unit/NPU], “PRIME directly leverages ReRAM cells to perform ; and
an AI processor coupled to the CPU (see, e.g., FIG. 3 – showing PRIME architecture with an AI processor coupled to the CPU and pages 32 and 34, “when PRIME is accelerating NN computation, CPU can still access the memory and work in parallel”, “When LRN layers are applied PRIME requires the help of CPU for LRN computation” [i.e., an AI processor in the PRIME architecture works in parallel with the CPU and is communicatively coupled to the CPU]), the AI processor to perform analog in-memory computations (see, e.g., pages 30 and 37, Section III, “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs”, “PRIME reduces all the three parts of energy consumption significantly. For computation, ReRAM based analog computing is very energy-efficient” [i.e., AI processor of PRIME performs analog in-memory computing/computations]) based on … (3) the AI instruction set executed by the NPU (see, e.g., page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 20, as discussed above, Yakopcic in view of Chi teaches the system of claim 19.
Yakopcic further discloses wherein the AI processor further includes a digital access circuit to receive a subset of the weighting factors, the subset associated with the corresponding NN layer, and to receive data associated with the corresponding NN layer (As indicated above, “a digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of receiving j , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 is capable of executing dot product operations in numerous applications such as but not limited to neural applications, image recognition, image processing, digital signal processing … neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors associated with a neuromorphic layer/NN layer], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., circuitry/resistive memories receive the subset of weights associated with the NN layer], “stored in a digital storage layer as the output of the first convolution layer” [i.e., computing system for digital processing includes a digital circuit]) (see, e.g., column 10, lines 17-25, column 18, lines 26-33 and column 25, lines 56-59, “neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input”, "image is a two-dimensional image depicted by the image matrix" [i.e., the image data is input data associated with the NN layer], “24x24 pixel feature maps after being generated may then be stored in a digital .
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose and the instruction set includes instructions to employ the digital access circuit to copy the weighting factors and the input data from the CPU to the NN layer memory circuitry.
In the same field, analogous art Chi teaches and the instruction set includes instructions to employ the digital access circuit to copy the weighting factors and the input data from the CPU to the NN layer memory circuitry (see, e.g., pages 28-30, 32 and 35, “We propose a ReRAM main memory architecture, which contains a portion of memory arrays (full function subarrays) that can be configured as NN accelerators or as normal memory on demand. It is a novel PIM solution to accelerate NN applications, which enjoys the advantage of in-memory data movement”, “many NN applications require high memory bandwidth to fetch large-size input data and synaptic weights, the data movement between memory and processor”, “for NN computation the FF subarrays enjoy the high bandwidth of in-memory data movement, and can work in parallel with CPU, with the help of the Buffer subarrays”, “The right four commands in Table I control the data movement. They are applied during the whole computation phase.” [i.e., NN/AI instructions/commands to move/copy data between the processor/CPU and memory circuits], “Small-Scale NN: Replication … Our optimization is to replicate the small NN to different independent portions of the mat. For example, to implement a 128 − 1 NN, we duplicate it and map a 256 − 2 NN to the target mat. This optimization can also be applied to convolution layers. Furthermore, if there is another 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claims 4 and 21, as discussed above, Yakopcic in view of Chi teaches the systems of claims 2 and 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to vectorize the weighting factors stored in the first memory circuit and to vectorize the input data stored in the second memory circuit (paragraph 29 of the specification states “Figure 3 illustrates a vectorization process 300, in 

Regarding claim 22, as discussed above, Yakopcic in view of Chi teaches the system of claim 19.
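Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuitry to store results of the analog calculations to memory circuitry associated with the CPU.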
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuitry to store results of the analog calculations to memory circuitry associated with the CPU (In line with the BRI indicated above, the “digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of storing results.) (see, e.g., pages 31 and 34, “we modify the column multiplexers in ReRAM … in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch … After analog processing, the output current is sensed” [i.e., results of analog calculations are stored in ReRAM/memory circuitry], “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A ); the output currents are accumulated at the bitlines.” [i.e., analog calculation results/output are stored in a memory circuit/ReRAM in the crossbar array that is associated with the CPU]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … 

Regarding claim 23, as discussed above, Yakopcic in view of Chi teaches the system of claim 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to employ the analog processing circuitry to perform the analog calculations wherein the analog calculations include at least one of a multiplication, a Manhattan (L1) difference, a Euclidean (L2) difference, a dot product, an L1 normalization, an L2 normalization, a maximum operation, and a minimum operation (as indicated above, “a Manhattan (L1) difference” and “a Euclidean (L2) difference” have been interpreted as any geometric distances or differences and “an L1 normalization, an L2 normalization” are being interpreted as any normalizations.) (see, e.g., column 5, line 61-column 6, line 2, column 7, lines 15-18, column 11, lines 8-9 and column 14, lines 8-13, “Each resistive memory may apply a resistance to each input voltage so that each input voltage is multiplied by each resistance. … multiplication in parallel enables multiple multiplication operations to be 

Regarding claim 24, as discussed above, Yakopcic in view of Chi teaches the system of claim 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to perform thresholding on the results of the analog calculations (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as … a thresholding 
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to … perform pooling on the thresholded results of the analog calculations.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to … perform pooling on the thresholded results of the analog calculations (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C , which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, · · · , 1/n] in ReRAM cells” [i.e., instructions to perform pooling on the thresholded results of the analog calculations]).
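For illustration only, the pooling operations quoted from Chi can be sketched in software as follows. The sketch assumes that n:1 mean pooling is a weighted sum with every weight equal to 1/n, as the quoted passage states, and that n:1 max pooling selects the largest input. The function names dot, mean_pool and max_pool are hypothetical and do not appear in Chi, in Yakopcic or in the claims; the code models the arithmetic only, not the ReRAM hardware.

```python
def dot(weights, inputs):
    """Weighted sum; stands in for the output accumulated on one crossbar bitline."""
    return sum(w * x for w, x in zip(weights, inputs))


def mean_pool(inputs):
    """n:1 mean pooling expressed as a dot product with weights [1/n, ..., 1/n]."""
    n = len(inputs)
    return dot([1.0 / n] * n, inputs)


def max_pool(inputs):
    """n:1 max pooling: keep the largest of the n inputs."""
    return max(inputs)


# Hypothetical thresholded outputs of four neighbouring positions (4:1 pooling).
window = [1.0, 2.0, 3.0, 6.0]
pooled_mean = mean_pool(window)
pooled_max = max_pool(window)
```

In this sketch, mean pooling reuses the same multiply-accumulate operation as any other layer, which is consistent with the quoted statement that mean pooling requires no extra hardware, whereas max pooling is a separate comparison operation.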
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging 

Claim 14 is rejected under 35 U.S.C. 103 as being obvious over Yakopcic in view of Chi as applied to claims 1 and 2 above, and further in view of Kim et al. (U.S. Patent No. 10,726,895 B1, hereinafter “Kim”). Kim was filed on January 7, 2019, and this date is before the effective filing date of this application, i.e., January 30, 2019. Therefore, Kim constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claim 14, as discussed above, Yakopcic in view of Chi teaches the system of claim 2. However, Yakopcic in view of Chi is not relied on to teach wherein the AI instruction set includes instructions to cause the CPU to employ the AI processor to perform backpropagation training.
In the same field, analogous art Kim teaches wherein the AI instruction set includes instructions to cause the CPU to employ the AI processor to perform backpropagation training (see, e.g., FIG. 2 – illustrating a backpropagation algorithm [i.e., an AI instruction set/algorithm including instructions to cause a CPU to employ an AI processor to perform backpropagation] and col. 6, lines 7-19, “FIG. 2 illustrates a backpropagation algorithm (used to train NN) which is composed of three cycles, forward, backward and weight update … There is an algorithm for neural networks called backpropagation algorithm that can be a primary generator of learning in neural networks” [i.e., an AI instruction set/algorithm including instructions to cause the CPU to employ the AI processor to perform backpropagation training/learning]).
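For illustration only, the three cycles named in the quoted passage of Kim (forward, backward and weight update) can be sketched for a single linear neuron as follows. The squared-error loss, the learning rate and the function name train_step are assumptions made for this sketch; they are not taken from Kim, from Yakopcic in view of Chi, or from the claims.

```python
def train_step(weights, inputs, target, learning_rate):
    """One backpropagation iteration for a single linear neuron (illustrative only)."""
    # Forward cycle: compute the neuron output as a dot product.
    output = sum(w * x for w, x in zip(weights, inputs))

    # Backward cycle: gradient of the assumed loss 0.5 * (output - target)**2
    # with respect to each weight.
    error = output - target
    gradients = [error * x for x in inputs]

    # Weight update cycle: step each weight against its gradient.
    new_weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    loss = 0.5 * error * error
    return new_weights, loss


# Hypothetical training example: two inputs, one target value.
w0 = [0.0, 0.0]
w1, loss0 = train_step(w0, [1.0, 2.0], 1.0, 0.1)
w2, loss1 = train_step(w1, [1.0, 2.0], 1.0, 0.1)
```

Repeating the step on the same example reduces the loss in this sketch, which is the sense in which the three cycles together constitute training.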


Claims 13, 17 and 25 are rejected under 35 U.S.C. 103 as being obvious over Yakopcic in view of Chi as applied to claims 1, 2 and 19 above, and further in view of Deisher et al. (U.S. Patent Application Pub. No. 2018/0121796 A1, hereinafter “Deisher”).
Regarding claim 13, as discussed above, Yakopcic in view of Chi teaches the system of claim 2. However, Yakopcic in view of Chi is not relied on to teach wherein the AI instruction set includes instructions to transpose at least one of the first memory circuit and the second memory circuit.
In the same field, analogous art Deisher teaches wherein the AI instruction set includes instructions to transpose at least one of the first memory circuit and the second memory circuit (aside from repeating the claim language in paragraphs 73 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Deisher with Yakopcic in view of Chi to provide an “NN system 200 [that] may be a system on a chip (SoC) that has an NN 
Regarding claims 17 and 25, as discussed above, Yakopcic in view of Chi teaches the systems of claims 1 and 19. However, Yakopcic in view of Chi is not relied on to teach wherein the CPU is an x86-architecture processor.
In the same field, analogous art Deisher teaches wherein the CPU is an x86-architecture processor (as indicated above, “an x86-architecture processor” has been interpreted as any CISC CPU or processor in the x86 family of CPUs and processors or any CPU or processor capable of executing x86 instructions such as x86 assembly language) (see, e.g., paragraph 280, “Processor 1410 may be implemented as a Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Deisher with Yakopcic in view of Chi to provide an “NN system 200 [that] may be a system on a chip (SoC) that has an NN Accelerator (NNA)” and includes a “processor 250 [that] may process instructions and may send data to, and receive data from, a volatile memory 248 which may be on-board, on-die or on-chip relative to the SoC, and may be RAM such as DRAM or SRAM and so forth. The processor 250 may control data flow with the memory” and “The processor 250 may retrieve or transmit data to other external (off-die or off-chip) volatile memory (such as cache and/or RAM) or non-volatile memory whether as memory 248 or another memory” for storing “the layer data within a layer as arranged in the memory” and storing “an input array” and a “weight matrix” (See, e.g., Deisher, paragraphs 46-47, 59 and 62). Doing so would have allowed Yakopcic in view of Chi to use Deisher’s NN system and NN accelerator components to achieve a “substantial reduction in the use of memory transactions and bandwidth to upload the same weight matrix multiple times for different groups”, as suggested by Deisher (See, e.g., Deisher, paragraphs 46-47 and 62).

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Schuyler with Yakopcic in view of Chi to “provide an example MLP neural network accelerator backend called DANA (a Dynamically Allocated Neural Network Accelerator)” where “DANA uses a Processing Element (PE) model” (See, e.g., Schuyler, page 11). Doing so would have allowed Yakopcic in view of Chi to enable “Using additional x86 assembly instructions, we were then [be] able to use DANA … to accelerate neural network computation”, as suggested by Schuyler (See, e.g., Schuyler, page 95).
Also, for example, Liu (U.S. Patent No. 4,601,006, hereinafter “Liu”) discloses “Avoidance of heavy traffic between the main memory and a second memory is a first crucial problem to the implementation of an efficient two dimensional FFT. Another problem is in the matrix transpose. In the literature, most algorithms and/or their implementations require the transpose of a matrix either by Single Instruction Multiple Data (SIMD) or by conventional machine. The efficient storage of data in the secondary storage device such that it can avoid the matrix transpose or minimize the traffic between the main memory and the secondary memory is also a crucial problem.” [i.e., instructions to transpose matrices in a storage device/memory] (see, e.g., Liu, col. 1, lines 30-40).
 line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For 





/R.K.B./Examiner, Art Unit 2125 

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125