DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This action is in response to the amendment filed 6/21/2022. Claims 1-25 are pending, claims 1, 2, 5, 10, 19, 20 and 24 were amended by applicant, and no claims were added or cancelled in the amendment.

Response to Amendment
The amendment filed on 6/21/2022 has been entered. 
The previous objections to the specification are withdrawn in view of the 6/21/2022 amendments to the specification.
The previous objections to claims 2-16 and 24 are withdrawn in view of the 6/21/2022 amendments to the claims. However, as discussed below, objections to claims 1-25 remain.
The previous nonstatutory double patenting rejections of claims 1-2, 9-10 and 15-18 over claims 1-2, 10, 12, 14 and 19-20 of copending Application No. 16/258,522 in view of non-patent literature Chi are withdrawn in view of the 6/21/2022 amendments to the claims.
The previous rejections of claims 2-17, 20 and 23 under 35 U.S.C. 112(b) are withdrawn in view of the 6/21/2022 amendments to the claims and applicant’s remarks.

Response to Arguments
Applicant's arguments filed 6/21/2022 with respect to the objections to the specification have been fully considered and are persuasive.
Applicant's arguments filed 6/21/2022 with respect to the previous objections to claims 2-16 and 24 have been fully considered and are persuasive. However, as discussed below, objections to claims 1-25 remain.
Applicant's arguments filed 6/21/2022 with respect to the rejections of claims 2-12 under 35 U.S.C. 112(b) have been fully considered and are persuasive.
Applicant's remarks filed 6/21/2022 with respect to the nonstatutory double patenting rejections of claims 1-2, 9-10 and 15-18 on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 10, 12, 14 and 19-20 of copending Application No. 16/258,522 have been fully considered and are acknowledged. 
Regarding the double patenting rejections, Applicant states “A Terminal Disclaimer will be sumitted [sic – submitted] to overcome the obviousness-type double patenting rejection when the other rejections in this Applicantion [sic – application] have been resolved.” (applicant’s remarks, page 11). 
Applicant is correct in noting that a terminal disclaimer is one way to overcome obviousness-type double patenting rejections. However, as indicated above, the previous nonstatutory double patenting rejections of claims 1-2, 9-10 and 15-18 over claims 1-2, 10, 12, 14 and 19-20 of copending Application No. 16/258,522 in view of Chi are withdrawn in view of the amendments to the claims filed 6/21/2022. 
Applicant's arguments with respect to the rejections of claims 1-25 under 35 U.S.C. 103 have been fully considered but are moot because the arguments do not apply to the combination of references used in the current rejections. In particular, as discussed in detail below, a new combination of references (i.e., Yakopcic in view of Chi, and further in view of the newly-asserted reference Ren, et al., "On vectorization of deep convolutional neural networks for vision tasks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 29. No. 1. 2015: 1840-1846, hereinafter “Ren”) is applied to reject amended independent claims 1 and 19 as well as amended dependent claims 2, 10, 20 and 24. As also detailed below, this new combination of references (i.e., Yakopcic in view of Chi and further in view of Ren) is also applied to reject dependent claims 3-9, 11-12, 15-16, 18 and 21-23. As further discussed below, a new combination of references, Yakopcic in view of Chi and Ren and further in view of Deisher, is applied to reject dependent claims 13, 17 and 25. As additionally detailed below, a new combination of references, Yakopcic in view of Chi and Ren and further in view of Kim, is applied to reject dependent claim 14. Applicant’s amendments have necessitated the claim objections and rejections under 35 U.S.C. 103 discussed below.
Regarding claim 1, applicant states “claim 1 has been amended to recite that the [‘]artificial intelligence processor’ is configured to:
receive a subset of weighting factors from the CPU;
receive input data from the CPU;
vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors;
vectorize the input data received from the CPU to generate vectorized input data; 
Thus, the ‘arificial [sic – artificial] intelligence processor’ of claim 1 is configured to vectorize the ‘input data’ from the CPU and the ‘weighting factors’ from the CPU to generate ‘vectorized input data.’” (applicant’s remarks, page 12, emphasis in original). 
With continued reference to amended claim 1, applicant alleges, which examiner does not acquiesce to, that “Yakopcic is teaching doing vector dot product calculations on vectors that already exist and never teaches that vectors are generated from the input data and the weighting factors received from the central processing unit” before asserting that “Yakovic [sic - Yakopcic] does not teach an AI processor that is configured to provide vectorization so as to ‘generate the vectorized input data’ and ‘generate the vectorized weighting factors’ as recited in amended in claim 1. Chi does not cure the deficiencies of Yakovic [sic - Yakopcic].” (applicant’s remarks, pages 12-13).
Regarding the remaining claims, applicant asserts “Claim 18 is an independent claim 1 that includes all of the features of claim 1, as amended. Therefore, claim 18 is also allowable. With regard to independent claim 19, claim 19 has been amended to recite features similar to amended claim 1. Therefore, for at least the same reasons discussed above with regard to claim 1, claim 19 is also allowable. Dependent claims 2-1 7 depend from amended claim 1 and dependent claims 20-25 depend from amended claim 19” before generally asserting “Therefore, dependent claims 2-17 and 20-25 are also allowable.” (applicant’s remarks, page 13). 
As a preliminary matter, examiner notes that, contrary to applicant’s above assertion, claim 18 is not an independent claim. In particular, claim 18 depends from claim 1. 
Accordingly, applicant argues that the claim limitations that were added to independent claims 1 and 19 in the amendment filed on 6/21/2022, i.e., “receive a subset of weighting factors from the CPU; receive input data from the CPU; vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors; vectorize the input data received from the CPU to generate vectorized input data”, using respective similar language, are not disclosed or taught in the portions of the Yakopcic and Chi references applied to claims 1 and 19 in the previous Office Action. 
The examiner respectfully disagrees in view of the newly-applied Ren reference, and points applicant to the discussion of Yakopcic and Ren below.
With regard to the limitation “receive a subset of weighting factors from the CPU” added to independent claims 1 and 19, using respective similar language, the examiner points to column 4, lines 60-62, column 5, lines 6-9, column 9, lines 48-52, column 17, lines 65-67 and column 23, lines 54-55 of Yakopcic, which explicitly disclose “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors] and “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., receive the subset of weights from the CPU’s execution of NN algorithms].
Regarding the new limitation “receive input data from the CPU” added to independent claims 1 and 19, using respective similar language, the examiner points to column 4, lines 60-62, column 5, lines 6-9, column 6, lines 35-40 and column 7, lines 16-19 of Yakopcic, which disclose “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100” [i.e., receive inputs/input data] and “execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.” [i.e., receive the inputs/input data from the CPU/in response to/from the CPU’s execution of operations on input voltages 140].
Regarding the newly-added limitations “vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors; vectorize the input data received from the CPU to generate vectorized input data” recited, using respective similar language, in amended claims 1 and 19, paragraph 29 of applicant’s specification states “the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330, for storage in the second memory circuit 250 as a vectorized X 310. A similar process is applied to the weights which are vectorized into linear vectors, or columnized kernels.” Therefore, “vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors” and “vectorize the input data received from the CPU to generate vectorized input data”, under the broadest reasonable interpretation (BRI), in light of the specification, is using any technique, process or method for creating, generating or populating linear vectors or columns of weights, weight values or weighting factors and input data such as an input image.
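By way of illustration only, the patch unrolling described in paragraph 29 of applicant’s specification can be sketched in Python as follows. The helper names extract_patches and vectorize, the stride of one, the 2x2 patch size and the sample values are assumptions made for this sketch; they do not appear in the specification or in any applied reference.

```python
# Illustrative sketch only. All names and values below are hypothetical.

def extract_patches(image, k):
    """Break a 2-D image (list of rows) into every k x k patch, stride 1 (assumed)."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r + i][c:c + k] for i in range(k)]
        for r in range(rows - k + 1)
        for c in range(cols - k + 1)
    ]

def vectorize(block):
    """Unroll a 2-D block (patch or kernel) row by row into a linear vector."""
    return [value for row in block for value in row]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]

# "Vectorized X": one linear vector per image patch.
vectorized_x = [vectorize(patch) for patch in extract_patches(image, 2)]
# Vectorized weights: the kernel unrolled the same way.
vectorized_w = vectorize(kernel)
```

Under these assumptions, each 2x2 patch of the image and the 2x2 kernel become linear vectors of the same length, which is the property that the quoted passage of the specification relies on.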
Regarding the limitation “vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors” added, using respective similar language, to claims 1 and 19, the examiner points to pages 1841-1842 and 1845 of Ren, which explicitly disclose “Scaling up CNN by vectorized GPU implementations … Vectorization refers to the process that transforms the original data structure into a vector representation … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
    [Equation image: media_image1.png]
, where … bl is the bias weight. … Adding bias weight and applying nonlinear mapping are element-wise operations which can be deemed as already fully vectorized … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize a subset of weighting factors/bias weights bl], “Convolution kernels are arranged by rows in another matrix. We can see that the product of these two matrices will put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be can be efficiently implemented by the … GPU1 and CPU” and “inputs can actually share all the weights in a CNN except the ones in the connection between conv layer and fully connected layer” [i.e., matrix including the subset of CNN weights received from the CPU, vectorization operators Ϥc() implemented by the CPU].
Regarding the “vectorize the input data received from the CPU to generate vectorized input data” limitation, the examiner further points to pages 1841-1842 of Ren, which explicitly disclose “Scaling up CNN by vectorized GPU implementations … We mark the places where vectorization plays an important role. ‘a’ is the convolution layer that transforms the input image into feature representations … ‘e’ is the vectorization operation required to simultaneously process multiple input samples … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
    [Equation image: media_image1.png]
, where … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize original input data f/input image] and “put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be can be efficiently implemented by the … GPU1 and CPU” [i.e., matrix including image data f received from a CPU or GPU, vectorization operator Ϥc() implemented by the CPU].
Further, regarding the limitation “perform analog in-memory computations based on (1) vectorized weighting factors, (2) the vectorized input data” recited in amended claims 1 and 19, using respective similar language, the examiner points to column 6, lines 10-14, 35-40 and 50-54, column 17, lines 55-58 and column 25, lines 57-59 of Yakopcic, which disclose “simultaneous execution of addition and multiplication operations in an analog circuit” [i.e., perform analog computations], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100 and the analog neuromorphic processing device 100 then generates a plurality of output signals 180”, “resistive memories are also of nano-scale sizes that enable a significant amount of resistive memories to be configured within the analog neuromorphic processing device 100” [i.e., the AI processor/neuromorphic processing device 100 performs analog in-memory computations based on the inputs/input data] and “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values” [i.e., and based on NN weighting factors].
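By way of illustration only, the parallel multiplication and addition performed when input voltages are applied to an array of resistive memories can be modeled with an idealized crossbar, in which each output current is the sum of the input voltages weighted by the corresponding conductances (Ohm's law and Kirchhoff's current law). The function name crossbar_outputs, the ideal device model and the sample values are assumptions of this sketch and are not taken from Yakopcic.

```python
# Illustrative sketch only. Idealized model; names and values are hypothetical.

def crossbar_outputs(voltages, conductances):
    """Return one output current per column: I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]

# Hypothetical input voltages (one per row of the array).
voltages = [1.0, 0.5, 0.0]
# Hypothetical conductances (the stored weights), rows x columns.
conductances = [[2.0, 1.0],
                [4.0, 0.0],
                [1.0, 3.0]]

currents = crossbar_outputs(voltages, conductances)
```

In this idealized model every column computes its dot product at the same time, which corresponds to a vector-matrix multiplication between the applied inputs and the stored weights.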
With continued reference to the above-noted perform computations limitation, the examiner additionally points to pages 1841 and 1845 of Ren, which explicitly disclose performing a “Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
    [Equation image: media_image1.png]
, where … bl is the bias weight … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. … we seek vectorization operators Ϥc()to map kernel or feature map to its matrix form so that convolution can be conducted by matrix-vector multiplication.” and “vectorization … gives us a chance (perhaps for the first time) to unify both high level vision tasks and low level vision tasks in a single computational framework.” [i.e., perform convolutions and multiplication/computations based on the vectorized weights/bl and vectorized input data/input image].
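By way of illustration only, the following Python sketch compares a sliding-window computation with the same computation carried out on unrolled (vectorized) patches and an unrolled kernel. The function names, the stride of one, the absence of kernel flipping (as is common in CNN implementations), the optional scalar bias and the sample values are assumptions of this sketch and are not taken from Ren.

```python
# Illustrative sketch only. All names and values below are hypothetical.

def sliding_window(image, kernel, bias=0):
    """Apply the kernel at every position directly on the 2-D data."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k)) + bias
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]

def vectorized_product(image, kernel, bias=0):
    """Unroll kernel and patches into vectors, then take dot products."""
    k = len(kernel)
    w = [value for row in kernel for value in row]  # vectorized weights
    out = []
    for r in range(len(image) - k + 1):
        out_row = []
        for c in range(len(image[0]) - k + 1):
            # Vectorized patch, unrolled in the same order as the kernel.
            x = [image[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(sum(a * b for a, b in zip(x, w)) + bias)
        out.append(out_row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]

direct = sliding_window(image, kernel)
unrolled = vectorized_product(image, kernel)
```

Under these assumptions both routines return the same feature map, which shows why operating on vectorized weights and vectorized input data reduces the layer to matrix-vector multiplication.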
As detailed below, the combination of Yakopcic, Chi and Ren (i.e., Yakopcic in view of Chi and further in view of Ren) teaches the limitations of amended independent claims 1 and 19. 
As further detailed below, the combination of Yakopcic, Chi and Ren (i.e., Yakopcic in view of Chi and further in view of Ren) teaches the limitations of dependent claims 2-12, 15-16 and 18-24.
As also discussed below, the combination of Yakopcic, Chi, Ren and Deisher (Yakopcic in view of Chi and Ren, and further in view of Deisher) teaches the limitations of dependent claims 13, 17 and 25. 
As additionally discussed below, the combination of Yakopcic, Chi, Ren and Kim (Yakopcic in view of Chi and Ren, and further in view of Kim) teaches the limitations of dependent claim 14.
Applicant’s amendments have necessitated the claim objections and rejections under 35 U.S.C. 103 discussed below.

Claim Objections
Claims 1-25 are objected to because of the following informalities: 
In claim 1, the word “and” is missing between “generate vectorized input data;” and “perform analog in-memory computations” in the penultimate and last limitations of the claim. Also in claim 1, the word “the” is missing before “vectorized weighting factors” in the last limitation, next-to-last line of the claim. The examiner suggests that one way to address these objections would be to amend the last 4 lines of claim 1 to recite “vectorize the input data received from the CPU to generate vectorized input data; and
perform analog in-memory computations based on (1) the vectorized weighting factors, (2) the vectorized input data, and (3) the AI instruction set executed by the NPU.” Appropriate correction is required.
In the preamble of claim 2 in lines 2-3, the recitation of “one or more neural network (NN) layers that include a corresponding NN layers” is grammatically incorrect and should read “one or more neural network (NN) layers that include a corresponding NN layer”.
In independent claim 19, the word “and” is missing between “generate vectorized input data;” and “perform analog in-memory computations” in the penultimate and last limitations of the claim. The examiner suggests that one way to address this objection would be to amend the last 9 lines of claim 19 to recite “vectorize the input data received from the CPU to generate vectorized input data; and
perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU, and (3) the AI instruction set executed by the NPU, wherein the AI processor comprises a corresponding NN layer, the corresponding NN layer including analog processing circuitry and memory circuitry, the memory circuitry to store the vectorized weighting factors and to store the vectorized input data, the analog processing circuitry to perform calculations between the vectorized weighting factors and the vectorized input data.” Appropriate correction is required.
Claim 19 also recites “receive a subset of weighting factors from the central processing unit; receive input data from the central processing unit” in lines 8-9. Applicant previously introduced “a central processing unit (CPU)” in line 3 and recites “the CPU” elsewhere in the claim (see, e.g., lines 7, 10 and 12). For examination purposes, the recitations of “the central processing unit” are being interpreted as the previously introduced “CPU”. For consistency and clarity, lines 8-9 of the claim should read “receive a subset of weighting factors from the CPU; receive input data from the CPU”.
Claims 2-18, which depend directly or indirectly from claim 1, are objected to under the same rationale as base claim 1. 
Also, claims 3-16, which depend directly or indirectly from claim 2, are objected to under the same rationale as claim 2. 
Lastly, claims 20-25, which depend directly or indirectly from claim 19, are objected to under the same rationale as claim 19.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-12, 15-16 and 18-24 are rejected under 35 U.S.C. 103 as being unpatentable over Yakopcic et al. (U.S. Patent No. 10,176,425 B2, hereinafter “Yakopcic”) in view of non-patent literature Chi, et al. ("PRIME: A Novel Processing-In-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory." ACM SIGARCH Computer Architecture News 44.3 (2016): 27-39, hereinafter “Chi”) and further in view of non-patent literature Ren, et al. ("On vectorization of deep convolutional neural networks for vision tasks." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 29. No. 1. 2015: 1840-1846, hereinafter “Ren”).
With respect to claim 1, Yakopcic discloses the invention as claimed including a hybrid artificial intelligence (AI) processing system (see, e.g., column 2, lines 57-58, column 17, lines 65-68, column 23, lines 49-53, “The present invention also provides an analog neuromorphic system”, “the analog neuromorphic circuit 400 may be incorporated into digital signal processing applications”, “the analog neuromorphic circuit 400 may be incorporated into analog neuromorphic configurations to execute popular neural network algorithms” [i.e., a hybrid analog-digital neuromorphic/neural network/AI system]) comprising:
a central processing unit (CPU) to execute instructions from a general-purpose instruction set (aside from repeating the claim language in paragraphs 15, 61 and 80, applicant’s specification does not define “a general-purpose instruction set”. Therefore, “a general-purpose instruction set”, under the broadest reasonable interpretation (BRI), in light of the specification, is any set of instructions executable by a general-purpose processor or CPU) (see, e.g., column 2, line 67, column 4, lines 60-62 and column 17, lines 65-68, “A controller is configured”, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors.”, “digital signal processing applications” [i.e., a digital/conventional processor/CPU to execute instructions]); … and
an AI processor (see, e.g., FIG. 1 depicting neuromorphic processing device 100 [i.e., an AI processor] that is electrically and communicatively coupled to processor/CPU via lines 140 and 180 and column 6, lines 33-34, “an analog neuromorphic processing device 100” [i.e., neuromorphic processing device 100/AI processor is coupled to processor/CPU via input lines 140]) … , the AI processor to:
 receive a subset of weighting factors from the CPU (see, e.g., column 4, lines 60-62, column 5, lines 6-9, column 9, lines 48-52, column 17, lines 65-67 and column 23, lines 54-55, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., receive the subset of weights from the CPU’s execution of NN algorithms]);
receive input data from the CPU (see, e.g., column 4, lines 60-62, column 5, lines 6-9, column 6, lines 35-40 and column 7, lines 16-19, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100” [i.e., receive inputs/input data], “execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.” [i.e., receive the inputs/input data from the CPU/in response to/from the CPU’s execution of operations on input voltages 140]); …
perform analog in-memory computations based on (1) … weighting factors, (2) the … input data (see, e.g., column 6, lines 10-14, 35-40 and 50-54, column 17, lines 55-58 and column 25, lines 57-59, “simultaneous execution of addition and multiplication operations in an analog circuit” [i.e., perform analog computations], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100 and the analog neuromorphic processing device 100 then generates a plurality of output signals 180”, “resistive memories are also of nano-scale sizes that enable a significant amount of resistive memories to be configured within the analog neuromorphic processing device 100” [i.e., the AI processor/neuromorphic processing device 100 performs analog in-memory computations based on the inputs/input data], “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values” [i.e., and based on NN weighting factors]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and …
an AI processor coupled to the CPU, the AI processor to … perform analog in-memory computations based on … (3) the AI instruction set executed by the NPU.
In the same field, analogous art Chi teaches a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set (see, e.g., FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 and 36, Section III, “compared with a state-of-the-art neural processing unit design, PRIME improves the performance” [i.e., PRIME is a neural processing unit/NPU], “PRIME directly leverages ReRAM cells to perform computation without the need for extra PUs [processing units]. To achieve this, as shown in Figure 3(c), PRIME partitions a ReRAM bank”, “the PRIME controller that decodes instructions … including the function selection of each mat among programming synaptic weights”, “We also evaluate two different NPU solutions: using a complex parallel NPU [17] as a co-processor (pNPU-co), and using the NPU as a PIM-processor” [i.e., PRIME/NPU executes instructions from an AI instruction set for programming AI synaptic weights]); and
an AI processor coupled to the CPU (see, e.g., FIG. 3 – showing PRIME architecture with an AI processor coupled to the CPU and pages 32 and 34, “when PRIME is accelerating NN computation, CPU can still access the memory and work in parallel”, “When LRN layers are applied PRIME requires the help of CPU for LRN computation” [i.e., an AI processor in the PRIME architecture works in parallel with the CPU and is communicatively coupled to the CPU]), the AI processor to … perform analog in-memory computations (see, e.g., pages 30 and 37, Section III, “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs”, “PRIME reduces all the three parts of energy consumption significantly. For computation, ReRAM based analog computing is very energy-efficient” [i.e., AI processor of PRIME performs analog in-memory computing/computations]) based on … (3) the AI instruction set executed by the NPU (see, e.g., page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source selection for computation” [i.e., the AI processor of PRIME performs analog in-memory computations based on the AI instruction set executed by the NPU/PRIME]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
Although Yakopcic in view of Chi substantially teaches the claimed invention, Yakopcic in view of Chi is not relied on to teach vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors;
vectorize the input data received from the CPU to generate vectorized input data; and
perform … computations based on (1) vectorized weighting factors, (2) the vectorized input data.
In the same field, analogous art Ren teaches vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors (paragraph 29 of applicant’s specification states “the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330, for storage in the second memory circuit 250 as a vectorized X 310. A similar process is applied to the weights which are vectorized into linear vectors, or columnized kernels.” Therefore, “vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors”, under the BRI, in light of the specification, is using any technique, process or method for creating, generating or populating linear vectors or columns of weights, weight values or weighting factors) (see, e.g., pages 1841-1842 and 1845, “Scaling up CNN by vectorized GPU implementations … Vectorization refers to the process that transforms the original data structure into a vector representation … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … bl is the bias weight. … Adding bias weight and applying nonlinear mapping are element-wise operations which can be deemed as already fully vectorized … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize a subset of weighting factors/bias weights bl], “Convolution kernels are arranged by rows in another matrix. We can see that the product of these two matrices will put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be can be efficiently implemented by the … GPU1 and CPU”, “inputs can actually share all the weights in a CNN except the ones in the connection between conv layer and fully connected layer” [i.e., matrix including the subset of CNN weights received from the CPU, vectorization operators Ϥc() implemented by the CPU]);
vectorize the input data received from the CPU to generate vectorized input data (paragraph 29 of applicant’s specification states “the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330, for storage in the second memory circuit 250 as a vectorized X 310.” Therefore, “vectorize the input data received from the CPU to generate vectorized input data”, under the BRI, in light of the specification, is using any technique, process or method for creating, generating or populating linear vectors or columns of input data such as data from an input image) (see, e.g., pages 1841-1842, “Scaling up CNN by vectorized GPU implementations … We mark the places where vectorization plays an important role. ‘a’ is the convolution layer that transforms the input image into feature representations … ‘e’ is the vectorization operation required to simultaneously process multiple input samples … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize original input data f/input image], “put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be can be efficiently implemented by the … GPU1 and CPU” [i.e., matrix including image data f received from a CPU or GPU, vectorization operator Ϥc() implemented by the CPU]); 
perform … computations based on (1) vectorized weighting factors, (2) the vectorized input data (see, e.g., pages 1841 and 1845, “Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … bl is the bias weight … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. … we seek vectorization operators Ϥc()to map kernel or feature map to its matrix form so that convolution can be conducted by matrix-vector multiplication.”, “vectorization … gives us a chance (perhaps for the first time) to unify both high level vision tasks and low level vision tasks in a single computational framework.” [i.e., perform convolutions and multiplication/computations based on the vectorized weights/bl and vectorized input data/input image]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Ren with Yakopcic in view of Chi to provide “deep convolutional neural networks (CNN) … implementations with various degrees of vectorization” and “vectorization strategies for different layers in Deep CNNs” (See, e.g., Ren, Abstract and pages 1840-1841). Doing so would have allowed Yakopcic in view of Chi to use Ren’s CNN implementations with vectorization and vectorization strategies to take advantage of the “impact of vectorization on the speed of model training and testing … along with a vectorized Matlab implementation with state-of-the-art speed performance” where “the training and testing speed of our fully vectorized implementation … is competitive, if not faster in all the tested cases”, as suggested by Ren (See, e.g., Ren, Abstract and page 1843).
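For illustration only, the vectorization (unrolling) described in paragraph 29 of applicant's specification and the matrix-form convolution discussed in Ren can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical and are not drawn from Ren, Yakopcic, Chi or the specification; stride 1 with no padding is assumed; and the kernel is applied without flipping, as is conventional for CNN layers.

```python
# Illustrative sketch only. All names are hypothetical; stride 1, no padding,
# and an unflipped kernel are assumptions, not details taken from the record.

def vectorize_patches(image, k):
    """Unroll every k-by-k patch of a 2-D image (list of rows) into a linear vector."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return patches


def vectorize_kernel(kernel):
    """Unroll a 2-D kernel of weights into a linear vector in the same row-major order."""
    return [w for row in kernel for w in row]


def conv_by_dot_products(image, kernel, bias=0.0):
    """Compute a convolution layer output as dot products of vectorized patches and weights."""
    k = len(kernel)
    w = vectorize_kernel(kernel)
    out_cols = len(image[0]) - k + 1
    flat = [sum(a * b for a, b in zip(p, w)) + bias for p in vectorize_patches(image, k)]
    # Roll (de-vectorize) the flat results back into a 2-D feature map.
    return [flat[i:i + out_cols] for i in range(0, len(flat), out_cols)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv_by_dot_products(image, kernel)
```

In this sketch each 2x2 patch of the 3x3 image becomes one four-element vector, so the layer reduces to four dot products against a single vectorized kernel.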

Regarding claim 2, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 1.
Yakopcic further discloses wherein the AI processor comprises one or more neural network (NN) layers (see, e.g., column 8, lines 28-29 and column 23, line 48: "layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “in a given layer of a CNN system”) that include a corresponding NN layers including:
a digital access circuit to receive a subset of the weighting factors, the subset associated with the corresponding NN layer (aside from repeating the claim language and stating, with reference to the high level block diagram of FIG. 2, that “Digital access circuits 210 are configured to receive, from the CPU, weighting factors 120, or a subset of those weights associated with the NN layer.” – see paragraphs 19, 21, 37 and 62, applicant’s specification does not define “a digital access circuit”. Therefore, “a digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of receiving weights or weighting factors.) (see, e.g., column 9, lines 48-52, column 17, lines 49-57 and 65-67, column 23, lines 54-55 and column 25, lines 57-59, “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 is capable of executing dot product operations in numerous applications such as but not limited to neural applications, image recognition, image processing, digital signal processing … neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors associated with a corresponding neuromorphic layer/NN layer], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., circuitry/resistive memories receive the subset of weights associated with the corresponding NN layer], “stored in a digital storage layer as the output of the first convolution layer” [i.e., computing system for digital processing includes a digital circuit]), and to receive the input data associated with the corresponding NN layer (see, e.g., column 10, lines 17-25, column 18, lines 26-33 and column 25, lines 56-59, “neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input”, "image is a two-dimensional image depicted by the image matrix" [i.e., the image data is input data associated with the corresponding NN layer], “24x24 pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.” [i.e., receive data associated with the NN layer]);
a first memory circuit to store the subset of the weighting factors (see, e.g., column 17, lines 56-57, column 23, lines 54-56 and column 25, lines 57-59 “vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., subset of weighted values/weighting factors to be applied to an image], “In executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n) may be determined” [i.e., subset of weights for each memory circuit 410a-n], “pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.” [i.e., a first memory circuit stores the subset of weights]);
a first … processor … associated with the first memory circuit, the first [processor] … to perform analog calculations based on analog voltage values associated with elements of the first memory circuit (see, e.g., column 18, lines 37-42 and 59-61, " the controller 405 may convert the image matrix xex into the vector values included in the vector xex that are applied as the input voltages 440(a-n) and the vector values included in the vector -xex that are applied as the complemented input voltages 460( a-n )”, “controller 405 may then convert the kernel matrix kex into kex + and kex - which are similar to w+ and w- discussed above", “The output configuration 500 to convert the output voltage signal 510 to the non-binary values represented by the dot-product operation value 470a and the complemented dot-product operation value 450a” [i.e., processor/controller 405 converts the kernel, which is a matrix of weights by performing analog calculations based on analog voltage values/input voltages 440a-n associated with the first memory circuit]);
a second memory circuit to store the data associated with the corresponding NN layer (see, e.g., column 18, lines 26-33 and column 25, lines 35-38 and 57-58: "image is a two-dimensional image depicted by the image matrix" [i.e., the image data is data associated with the NN layer], "neuromorphic circuit 1000 includes … resistive memories 410(a-n) … a digital storage layer" [i.e., resistive memories 410a-n include a second memory circuit for storing the data]);
a second [processor] … to perform analog calculations based on analog voltage values associated with elements of the second memory circuit (see, e.g., column 19, lines 27-29 and column 20, lines 11-15, “the analog neuromorphic circuit 400 generates an output voltage signal 510. The output voltage signal 510 is generated from each input voltage 440(a-n)”, “The output configuration 500 to convert the output voltage signal 510 to the non-binary values represented by the dot-product operation value 470a and the complemented dot-product operation value 450a” [i.e., output configuration 500 converts the output voltage signal 510 by performing analog calculations based on analog voltage values/output voltage signal 510 associated with input voltages 440a-n of the first memory circuit]); and
a … processor … to perform analog calculations based on results generated by the first … and the second [processors] (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as an activation function and/or a thresholding function.” [i.e., a processor/op-amp configuration of the analog neuromorphic circuit 400 to perform analog calculations based on results/output voltage signal 510 generated by the first and second processors – controller 405 and output configuration 500]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a first bit line processor (BLP) associated with the first memory circuit, the first BLP to perform analog calculations based on analog voltage values associated with elements of the first memory circuit; …
a second BLP associated with the second memory circuit, the second BLP to perform analog calculations based on analog voltage values associated with elements of the second memory circuit; and
a cross bit line processor (CBLP) to perform analog calculations based on results generated by the first BLP and the second BLP.
In the same field, analogous art Chi teaches a first bit line processor (BLP) associated with the first memory circuit, the first BLP to perform analog calculations based on analog voltage values associated with elements of the first memory circuit (aside from repeating the claim language in paragraphs 20, 24 and 62 and stating, with reference to the high level block diagram of FIG. 2, “The first BLP circuit 230, associated with the first memory circuit 220, is configured to generate a first sequence of vectors of analog voltage values.” in paragraph 23, the specification does not define “a first bit line processor (BLP)” or a “bit line processor (BLP)”. Therefore, “a first bit line processor (BLP)”, under the BRI, in light of the specification, is any processor, functional unit, circuitry or circuit that is capable of performing analog calculations based on analog voltage values) (see, e.g., page 29, “execute the neural networks in Figure 2(a). The input data ai is represented by analog input voltages … Then the current flowing to the end of each bitline is viewed … After sensing the current on each bitline, the neural networks adopt a nonlinear function unit to complete the execution. Implementing NNs with ReRAM crossbar arrays requires specialized peripheral circuit design.” [i.e., a bitline unit/circuit performs analog calculations based on analog voltage values associated with a first ReRAM/memory circuit in the crossbar array]);
a second BLP associated with the second memory circuit, the second BLP to perform analog calculations based on analog voltage values associated with elements of the second memory circuit (aside from repeating the claim language in paragraphs 20, 24 and 62 and stating, with reference to the high level block diagram of FIG. 2, “The second BLP circuit 230, associated with the second memory circuit 250, is configured to generate a second sequence of vectors of analog voltage values.” in paragraph 24, the specification does not define “a second BLP” [or a second bit line processor]. Therefore, “a second BLP”, under the BRI, in light of the specification, is any second processor, functional unit, circuitry or circuit that is capable of performing analog calculations based on analog voltage values) (see, e.g., pages 29 and 31, “current flowing to the end of each bitline is viewed … After sensing the current on each bitline, the neural networks adopt a nonlinear function unit to complete the execution. Implementing NNs with ReRAM crossbar arrays requires specialized peripheral circuit design.”, “in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch” [i.e., a second bitline unit/circuit performs analog calculations based on analog voltage values associated with a second memory circuit/ReRAM in the crossbar array]); and
a cross bit line processor (CBLP) to perform analog calculations based on results generated by the first BLP and the second BLP (As indicated above, the first and second BLPs, under the BRI, in light of the specification, are any processors, functional units, circuitry or circuits that are capable of performing analog calculations based on analog voltage values. Also, aside from repeating the claim language in paragraphs 20, 24, 62 and 64 and stating, with reference to the high level block diagram of FIG. 2, “The CBLP circuit 240 is configured to calculate a sequence of analog dot products” and “the CBLP circuit 240 performs the analog multiply portion of the dot product operation by timing current integration over a capacitor. Circuit 240 may be configured as a capacitor in series with a switch.” in paragraphs 25-26, applicant’s specification does not define “a cross bit line processor (CBLP)”. Therefore, “cross bit line processor (CBLP)”, under the BRI, in light of the specification, is any processor, functional unit, capacitor, circuitry or circuit that is capable of performing analog calculations based on generated results or outputs from other processors, units or circuits.) (see, e.g., pages 31 and 34, “in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch”, “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A ); the output currents are accumulated at the bitlines.” [i.e., a cross-bitline unit/circuit performs analog calculations based on results/output generated by the first and second bitline units/circuits]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
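For illustration only, the dot-product operation in the Yakopcic passages cited for claim 2 (input voltages and complemented input voltages applied to weights separated into positive and negative parts) can be modeled numerically as follows. This is an idealized sketch, not a circuit model: the function names are hypothetical, and the particular differential encoding shown is an assumption rather than a detail confirmed by Yakopcic or Chi.

```python
# Illustrative numerical model only. The encoding below (w = w_plus - w_minus,
# with complemented inputs driving the negative part) is an assumption.

def split_signed_weights(weights):
    """Split signed weights into non-negative positive and negative parts."""
    w_plus = [max(w, 0.0) for w in weights]
    w_minus = [max(-w, 0.0) for w in weights]
    return w_plus, w_minus


def differential_dot(inputs, weights):
    """Dot product formed from inputs on w_plus and complemented inputs on w_minus."""
    w_plus, w_minus = split_signed_weights(weights)
    complemented = [-x for x in inputs]
    positive_part = sum(x * g for x, g in zip(inputs, w_plus))
    negative_part = sum(x * g for x, g in zip(complemented, w_minus))
    # Summing both contributions recovers the ordinary signed dot product.
    return positive_part + negative_part


result = differential_dot([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
```

Under these assumptions the sum of the two contributions equals the ordinary signed dot product of the inputs and the weights.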

Regarding claim 3, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuit to copy the weighting factors from the CPU to the first memory circuit and to copy the input data from the CPU to the second memory circuit.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuit to copy the weighting factors from the CPU to the first memory circuit and to copy the input data from the CPU to the second memory circuit (see, e.g., pages 28-30, 32 and 35, “We propose a ReRAM main memory architecture, which contains a portion of memory arrays (full function subarrays) that can be configured as NN accelerators or as normal memory on demand. It is a novel PIM solution to accelerate NN applications, which enjoys the advantage of in-memory data movement”, “many NN applications require high memory bandwidth to fetch large-size input data and synaptic weights, the data movement between memory and processor”, “for NN computation the FF subarrays enjoy the high bandwidth of in-memory data movement, and can work in parallel with CPU, with the help of the Buffer subarrays”, “The right four commands in Table I control the data movement. They are applied during the whole computation phase.” [i.e., NN/AI instructions/commands to move/copy data between the processor/CPU and memory circuits], “Small-Scale NN: Replication … Our optimization is to replicate the small NN to different independent portions of the mat. For example, to implement a 128 − 1 NN, we duplicate it and map a 256 − 2 NN to the target mat. This optimization can also be applied to convolution layers. Furthermore, if there is another FF mat available, we can also duplicate the mapping to the second mat” [i.e., instructions to move/replicate/duplicate/copy the weights/weighting factors of the NN and the NN input data from the CPU to the first and second memory circuits]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 5, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuit to store results of the CBLP analog calculations to a third memory circuit associated with the CPU.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuit to store results of the CBLP analog calculations to a third memory circuit associated with the CPU (In line with the BRI indicated above, the “digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of storing results. As also indicated above the “CBLP”, under the BRI, in light of the specification, is any processor, functional unit, capacitor, circuitry or circuit that is capable of performing analog calculations based on generated results or outputs from other processors, units or circuits.) (see, e.g., pages 31 and 34, “we modify the column multiplexers in ReRAM … in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch … After analog processing, the output current is sensed” [i.e., results of analog cross-bitline/CBLP calculations are stored in ReRAM/a memory circuit], “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A ); the output currents are accumulated at the bitlines.” [i.e., cross-bitline unit/circuit/CBLP calculation results/output are stored in a third memory circuit/ReRAM in the crossbar array]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 6, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 5.
Yakopcic further discloses wherein the AI instruction set includes instructions to de-vectorize the results (paragraph 29 of the specification states “Figure 3 illustrates a vectorization process 300, in accordance with certain embodiments of the present disclosure. In this example, the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330 for storage … A complementary de-vectorization (or rolling) process may also be performed, for example to convert results back to a patch format.” Therefore, “de-vectorize the results”, under the BRI, in light of the specification, is converting a results or output vector back to another format) (see, e.g., column 15, lines 16-17, 23-25, and 33-36 and col. 17, lines 26-31, “each of the vector values included in the vector are applied as input voltages 440(a-n).”, “Each complement of the vector value may also be applied to the analog neuromorphic circuit 400 as complemented input voltages 460(a-n).”, “both the input voltages 440(a-n) representing the vector values as well as the complemented input voltages 460(a-n) representing the complemented vector values applied to the analog neuromorphic circuit 400, the controller 405 may generate a co-linear first relationship”, “neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image”, “The dot-product operation with a vector, such as the example vector in Equation 1, and a matrix, such as the example matrix in Equation 2, may then be executed incorporating the analog neuromorphic circuit 400. 
As noted above, each of the values in the vector may be applied as input voltages 440(a-n)” [i.e., instructions to de-vectorize results of applying weights and input voltages via dot-product operation]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose the CBLP analog calculations in the third memory circuit.
In the same field, analogous art Chi teaches the CBLP analog calculations in the third memory circuit (see, e.g., pages 31 and 34, “we modify the column multiplexers in ReRAM … in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch … After analog processing, the output current is sensed” [i.e., results of analog cross-bitline/CBLP calculations are stored in ReRAM/a memory circuit], “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A ); the output currents are accumulated at the bitlines.” [i.e., cross-bitline unit/circuit/CBLP calculation results/output are in the third memory circuit/ReRAM in the crossbar array]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 7, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Yakopcic further discloses wherein the AI instruction set includes instructions to employ the first and second BLPs to perform the analog calculations wherein the analog calculations include at least one of a multiplication, a Manhattan (L1) difference, and a Euclidean (L2) difference (paragraphs 38, 67 and 85 of applicant’s specification state “computations include multiplication, Manhattan (L1) difference, Euclidean (L2) difference” and “the analog calculations include at least one of a multiplication, a Manhattan (L1) difference, and a Euclidean (L2) difference.” Therefore, “a Manhattan (L1) difference, and a Euclidean (L2) difference”, under the BRI, in light of the specification, are any computed or calculated geometric distances or differences. As indicated above, the first and second “BLPs” under the BRI, in light of the specification, are any processors, functional units, circuitry or circuits that are capable of performing analog calculations based on analog voltage values.) (see, e.g., column 5, line 61-column 6, line 2 and column 7, lines 15-18, “Each resistive memory may apply a resistance to each input voltage so that each input voltage is multiplied by each resistance. … multiplication in parallel enables multiple multiplication operations to be executed simultaneously. … The simultaneous execution of addition and multiplication operations in an analog circuit”, “the resistive memories may simultaneously execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.” [i.e., instructions to employ first and second resistive memories/circuits to perform analog calculations including a multiplication, based on analog input voltage values]).
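For illustration only, the three calculations recited in claim 7 can be sketched using their conventional mathematical definitions. The specification passages quoted above name these calculations without defining them, so the definitions below (element-wise product, sum of absolute differences, and square root of the sum of squared differences) are assumptions, and the function names are hypothetical.

```python
import math

# Illustrative sketch only. The definitions are the conventional ones and are
# assumed here; they are not taken from the specification or the cited art.

def elementwise_product(a, b):
    """Multiply corresponding elements of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def l1_difference(a, b):
    """Manhattan (L1) difference: sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_difference(a, b):
    """Euclidean (L2) difference: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
products = elementwise_product(a, b)
manhattan = l1_difference(a, b)
euclidean = l2_difference(a, b)
```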

Regarding claim 8, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Yakopcic further discloses wherein the AI instruction set includes instructions to employ the CBLP to perform the analog calculations wherein the analog calculations include at least one of a dot product, an L1 normalization, an L2 normalization, a maximum operation, and a minimum operation (as indicated above, the “CBLP”, under the BRI, in light of the specification, is any processor, functional unit, capacitor, circuitry or circuit that is capable of performing analog calculations based on generated results or outputs from other processors, units or circuits. As also indicated above, “an L1 normalization, an L2 normalization” are being interpreted as any normalizations.) (see, e.g., column 7, lines 15-18 and column 11, lines 8-9, “The analog neuromorphic circuit 400 may be implemented so that dot-product operations may be executed”, “The controller 405 may then identify a minimum resistance value and a maximum resistance value for the resistance values of resistive memories 410c and 410f and select resistance values for the resistive memories 410c and 410f that are within the minimum and maximum resistance value range” [i.e., instructions to employ analog circuitry/resistive memories to perform analog calculations including a dot product and minimum and maximum operations]).

Regarding claim 9, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Yakopcic further discloses wherein the AI instruction set includes instructions to perform thresholding on the results of the CBLP analog calculations (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as … a thresholding function.” [i.e., instructions to perform thresholding on results of the analog calculations]).

Regarding claim 10, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 9.
Yakopcic further discloses wherein the instructions to perform thresholding includes an option to specify at least one of sigmoid thresholding, … hyperbolic tangent thresholding, … minimum thresholding, maximum thresholding, and softmax thresholding (see, e.g., column 14, lines 8-13 and column 20, lines 5-8 and 24-35, “The controller 405 may then identify a minimum resistance value and a maximum resistance value for the resistance values of resistive memories 410c and 410f and select resistance values for the resistive memories 410c and 410f that are within the minimum and maximum resistance value range” “the output configuration 500 may be incorporated into the analog neuromorphic circuit 400 to simulate a non-linear smooth function configuration 600 in FIG. 6, such as a sigmoid function”, “The output configuration 500 may convert the output voltage signal to model the sigmoid function, the inverse tangent function, and/or any other type of non-linear smooth function … The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as … a thresholding function.” [i.e., instructions/functions to perform thresholding on results of the analog calculations including options to specify sigmoid thresholding, tangent thresholding, minimum and maximum thresholding]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, Rectified Linear Unit (ReLU) thresholding.
In the same field, analogous art Chi teaches wherein the instruction to perform thresholding includes an option to specify at least one of sigmoid thresholding, Rectified Linear Unit (ReLU) thresholding (see, e.g., page 31, “The modified column multiplexer incorporates … a nonlinear threshold (sigmoid) unit”, “we add a hardware unit to support ReLU function, a function in the convolution layer of CNN.”, “Our circuit design supports two activation functions: sigmoid and ReLU. Sigmoid is implemented by the sigmoid unit in Figure 4 B, and ReLU is implemented by the ReLU unit.” [i.e., instructions/functions to perform thresholding on results of the analog calculations include options to specify sigmoid and ReLU thresholding]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
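
For illustration only, an instruction carrying an option that selects among thresholding functions can be sketched as follows. This is a minimal Python sketch, not the op-amp configuration of Yakopcic or the sigmoid/ReLU units of Chi; the `threshold` function and its `mode` strings are hypothetical, and minimum and maximum thresholding are omitted because the record does not define their arithmetic.

```python
import math

def softmax(values):
    """Exponentiate and normalize so the outputs sum to 1 (shifted for stability)."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def threshold(values, mode):
    """Apply the thresholding function selected by the hypothetical `mode` option."""
    if mode == "sigmoid":
        return [1.0 / (1.0 + math.exp(-v)) for v in values]
    if mode == "relu":
        return [max(0.0, v) for v in values]
    if mode == "tanh":
        return [math.tanh(v) for v in values]
    if mode == "softmax":
        return softmax(values)
    raise ValueError(f"unsupported thresholding mode: {mode}")

results = [-1.0, 0.0, 2.0]
print(threshold(results, "relu"))     # [0.0, 0.0, 2.0]
print(threshold(results, "sigmoid"))  # each value mapped into (0, 1)
print(threshold(results, "softmax"))  # values sum to 1
```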

Regarding claim 11, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 9.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to perform pooling on the thresholded results of the CBLP analog calculations.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to perform pooling on the thresholded results of the CBLP analog calculations (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C , which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, · · · , 1/n] in ReRAM cells” [i.e., instructions to perform pooling on the thresholded results of the analog calculations]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 12, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 11.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the instruction to perform pooling includes an option to specify at least one of maximum pooling, minimum pooling, average pooling, dropout pooling, and L2 normalization pooling.
In the same field, analogous art Chi teaches wherein the instruction to perform pooling includes an option to specify at least one of maximum pooling, minimum pooling, average pooling, dropout pooling, and L2 normalization pooling (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” [i.e., maximum pooling] and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C, which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, · · · , 1/n] in ReRAM cells” [i.e., option to specify maximum and mean/average pooling]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
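
For illustration only, the two pooling options quoted from Chi can be sketched as follows. This is a minimal Python sketch, not Chi's hardware: n:1 max pooling keeps the largest value in each group of n, and n:1 mean pooling is written as a dot product with the weights [1/n, ..., 1/n], mirroring the quoted passage. The function names are hypothetical, and the sketch assumes the input length is a multiple of n.

```python
def max_pool(values, n):
    """n:1 max pooling: keep the largest value in each consecutive group of n."""
    if len(values) % n:
        raise ValueError("input length must be a multiple of n")
    return [max(values[i:i + n]) for i in range(0, len(values), n)]

def mean_pool(values, n):
    """n:1 mean pooling as a dot product with pre-set weights [1/n, ..., 1/n]."""
    if len(values) % n:
        raise ValueError("input length must be a multiple of n")
    weights = [1.0 / n] * n
    return [
        sum(w * v for w, v in zip(weights, values[i:i + n]))
        for i in range(0, len(values), n)
    ]

# Hypothetical thresholded results, pooled 4:1.
thresholded = [1.0, 3.0, 2.0, 6.0, 0.0, 4.0, 8.0, 4.0]
print(max_pool(thresholded, 4))   # [6.0, 8.0]
print(mean_pool(thresholded, 4))  # [3.0, 4.0]
```

Writing mean pooling as a weighted sum shows why the quoted passage describes it as needing no extra hardware: it reuses the same multiply-accumulate operation as any other weight vector.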

Regarding claim 15, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Yakopcic further discloses wherein at least one of the NN layers is a convolutional NN layer (see, e.g., FIG. 8 – depicting “CONVOLUTION” layers 830 in convolutional neural network/CNN 800 and column 24, lines 13-15 and column 25, lines 56-59, “The feature extractor 810 includes the combination of two different types layers that are the convolution layers 830(a-n)”, “pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.”).

Regarding claim 16, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2.
Yakopcic further discloses wherein at least one of the NN layers is a fully connected NN layer (see, e.g., FIG. 8 – depicting “FULLY CONNECTED LAYER” in convolutional neural network/CNN 800 and column 24, lines 31-33, “The outputs of the last layer of the conventional CNN 800 are then input to a fully connected network that is the classifier 820.”).

Regarding claim 18, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 1.
Examiner’s Note: claim 18, as drafted, depends from claim 1. If applicant intended for claim 18 to be an independent claim, the examiner suggests that one way to do so is to amend the last portion of claim 18 to explicitly recite the limitations of claim 1 instead of the current recitation of an “integrated circuit or chip set comprising the system of claim 1”.
Yakopcic further discloses an integrated circuit or chip set (see, e.g., column 6, lines 57-61 and column 7, lines 55-57, “the analog neuromorphic processing device 100 has significant computational efficiency while maintaining the size of the analog neuromorphic processing device 100 to a chip that may easily be positioned on a circuit board.”, “The scaling of the resistive memories into additional neurons may be done within the analog neuromorphic processing device 100 such as within a single chip. However, the analog neuromorphic processing device 100 may also be scaled with other analog neuromorphic circuits contained in other chips” [i.e., an integrated circuit or chip set]) comprising the system of claim 1 (as indicated above, Yakopcic in view of Chi and Ren teaches the system of claim 1, see above citations to Yakopcic, Chi and Ren regarding the limitations of claim 1).

With respect to independent claim 19, Yakopcic discloses the invention as claimed including an artificial intelligence (AI) processing system (see, e.g., column 2, lines 57-58 and column 23, lines 49-53, “The present invention also provides an analog neuromorphic system”, “the analog neuromorphic circuit 400 may be incorporated into analog neuromorphic configurations to execute popular neural network algorithms” [i.e., a neuromorphic/neural network/AI system]) comprising:
a central processing unit (CPU) to execute instructions from a general-purpose instruction set (as indicated above, “a general-purpose instruction set”, under the BRI, in light of the specification, are any instructions executable by a general-purpose processor or CPU) (see, e.g., column 2, line 67, column 4, lines 60-62 and column 17, lines 65-68, “A controller is configured”, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors.”, “digital signal processing applications” [i.e., a digital/conventional processor/CPU to execute instructions]); … and
(see, e.g., FIG. 1 depicting neuromorphic processing device 100 [i.e., an AI processor] that is electrically and communicatively coupled to processor/CPU via lines 140 and 180 and column 6, lines 33-34, “an analog neuromorphic processing device 100” [i.e., neuromorphic processing device 100/AI processor is coupled to processor/CPU via input lines 140]) … , the AI processor to:
receive a subset of weighting factors from the central processing unit (as indicated above, “the central processing unit” has been interpreted as the previously-introduced “CPU”) (see, e.g., column 4, lines 60-62, column 5, lines 6-9, column 9, lines 48-52, column 17, lines 65-67 and column 23, lines 54-55, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)" [i.e., receive the subset of weights from the CPU’s execution of NN algorithms]);
receive input data from the central processing unit (as indicated above, “the central processing unit” has been interpreted as the previously-introduced “CPU”) (see, e.g., column 4, lines 60-62, column 5, lines 6-9, column 6, lines 35-40 and column 7, lines 16-19, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is received from the processor/CPU as a result of instructions executed by the CPU], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100” [i.e., receive inputs/input data], “execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.” [i.e., receive the inputs/input data from the CPU/in response to/from the CPU’s execution of operations on input voltages 140]); …
perform analog in-memory computations based on (1) neural network (NN) weighting factors provided by the CPU, (2) input data provided by the CPU (see, e.g., column 4, lines 60-62, column 5, lines 6-9, column 6, lines 6-14, 35-40 and 50-54, column 17, lines 55-58, column 25, lines 57-59, “instructions stored on a machine-readable medium, which may be read and executed by one or more processors”, “processors, controllers, or other devices executing the firmware, software, routines, instructions” [i.e., data is provided by the processor/CPU as a result of instructions executed by the CPU], “simultaneous execution of addition and multiplication operations in an analog circuit” [i.e., perform analog computations], “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100 and the analog neuromorphic processing device 100 then generates a plurality of output signals 180”, “resistive memories are also of nano-scale sizes that enable a significant amount of resistive memories to be configured within the analog neuromorphic processing device 100” [i.e., the AI processor/neuromorphic processing device 100 performs analog in-memory computations based on input data], “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values” [i.e., and based on NN weighting factors], “feature maps after being generated may then be stored in a digital storage layer” [i.e., feature maps are generated based on input and weights provided by the CPU]), … 
wherein the AI processor comprises a corresponding NN layer (see, e.g., column 8, lines 28-29 and column 23, line 48: “layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “in a given layer of a CNN system”), the corresponding NN layer including analog processing circuitry and memory circuitry, the memory circuitry to store the … weighting factors and to store the … input data (see, e.g., column 8, lines 28-29, column 10, lines 17-25, column 23, lines 54-55 and column 25, lines 57-58, “layering of the analog neuromorphic processing device 100 with other similar analog neuromorphic circuits”, “the analog neuromorphic circuit 200 may also be scaled to include additional layers of neurons … additional layers of neurons also exponentially increases the computational efficiency of the neural network configuration 300 to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input voltages” [i.e., NN layer includes analog processing circuitry], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)”, “a digital storage layer” [i.e., NN layer includes resistive memories/circuitry to store the weights/weighting factors and the input data]), the analog processing circuitry to perform calculations between the … weighting factors and the … input data (see, e.g., column 6, lines 35-40, column 9, lines 48-52, column 10, lines 17-25, column 17, lines 54-57 and column 23, lines 54-55, “The analog neuromorphic processing device 100 includes a plurality of input voltages 140(a-n) that are applied to a plurality of respective inputs of the analog neuromorphic processing device 100”, “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “the analog neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input”, “the analog neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image”, “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)” [i.e., the analog processing circuitry executes learning algorithms/image processing/performs calculations using the stored weights/weighting factors and the inputs/input data]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set; and …
an AI processor coupled to the CPU … the AI processor to perform analog in-memory computations based on … (3) the AI instruction set executed by the NPU.
In the same field, analogous art Chi teaches a neural processing unit (NPU), integrated with the CPU, the NPU to execute instructions from an AI instruction set (see, e.g., FIG. 3 – showing the PRIME NPU architecture including an integrated “CPU”, and Abstract and pages 30, 32 and 36, Section III, “compared with a state-of-the-art neural processing unit design, PRIME improves the performance” [i.e., PRIME is a neural processing unit/NPU], “PRIME directly leverages ReRAM cells to perform computation without the need for extra PUs [processing units]. To achieve this, as shown in Figure 3(c), PRIME partitions a ReRAM bank”, “the PRIME controller that decodes instructions … including the function selection of each mat among programming synaptic weights”, “We also evaluate two different NPU solutions: using a complex parallel NPU [17] as a co-processor (pNPU-co), and using the NPU as a PIM-processor” [i.e., PRIME/NPU executes instructions from an AI instruction set for programming AI synaptic weights]); and
an AI processor coupled to the CPU (see, e.g., FIG. 3 – showing PRIME architecture with an AI processor coupled to the CPU and pages 32 and 34, “when PRIME is accelerating NN computation, CPU can still access the memory and work in parallel”, “When LRN layers are applied PRIME requires the help of CPU for LRN computation” [i.e., an AI processor in the PRIME architecture works in parallel with the CPU and is communicatively coupled to the CPU]), the AI processor to perform analog in-memory computations (see, e.g., pages 30 and 37, Section III, “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs”, “PRIME reduces all the three parts of energy consumption significantly. For computation, ReRAM based analog computing is very energy-efficient” [i.e., AI processor of PRIME performs analog in-memory computing/computations]) based on … (3) the AI instruction set executed by the NPU (see, e.g., page 32, Section III C, “PRIME controller that decodes instructions and provides control signals … including the function selection of each mat among programming synaptic weights, computation, and memory, and also the input source selection for computation” [i.e., the AI processor of PRIME performs analog in-memory computations based on the AI instruction set executed by the NPU/PRIME]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
Although Yakopcic in view of Chi substantially teaches the claimed invention, Yakopcic in view of Chi is not relied on to teach vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors;
vectorize the input data received from the CPU to generate vectorized input data; and
perform calculations between the vectorized weighting factors and the vectorized input data.
In the same field, analogous art Ren teaches vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors (as indicated above, “vectorize the subset of weighting factors received from the CPU to generate vectorized weighting factors”, under the BRI, in light of the specification, is using any technique, process or method for creating, generating or populating linear vectors or columns of weights, weight values or weighting factors) (see, e.g., pages 1841-1842 and 1845, “Scaling up CNN by vectorized GPU implementations … Vectorization refers to the process that transforms the original data structure into a vector representation … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … bl is the bias weight. … Adding bias weight and applying nonlinear mapping are element-wise operations which can be deemed as already fully vectorized … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize a subset of weighting factors/bias weights bl], “Convolution kernels are arranged by rows in another matrix. We can see that the product of these two matrices will put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be efficiently implemented by the … GPU1 and CPU”, “inputs can actually share all the weights in a CNN except the ones in the connection between conv layer and fully connected layer” [i.e., matrix including the subset of CNN weights received from the CPU, vectorization operators Ϥc() implemented by the CPU]);
vectorize the input data received from the CPU to generate vectorized input data (as indicated above, “vectorize the input data received from the CPU to generate vectorized input data”, under the BRI, in light of the specification, is using any technique, process or method for creating, generating or populating linear vectors or columns of input data such as data from an input image) (see, e.g., pages 1841-1842, “Scaling up CNN by vectorized GPU implementations … We mark the places where vectorization plays an important role. ‘a’ is the convolution layer that transforms the input image into feature representations … ‘e’ is the vectorization operation required to simultaneously process multiple input samples … Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. Specifically, we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form” [i.e., vectorize original input data f/input image], “put all the convolved feature maps in the resulting matrix, one feature map per row. Ϥc() here can be efficiently implemented by the … GPU1 and CPU” [i.e., matrix including image data f received from a CPU or GPU, vectorization operator Ϥc() implemented by the CPU]); 
perform calculations between the vectorized weighting factors and the vectorized input data (see, e.g., pages 1841 and 1845, “Vectorizing Convolution We refer to the image and intermediate feature maps as f, one of the convolution kernels as wi, the convolution layer can be typically expressed as 
[equation image: media_image1.png]
, where … bl is the bias weight … For vision tasks, f can be 2- or 3-dimension. … can be deemed as one single input fl. … all the original data f, b and w can be viewed as data vectors. … we seek vectorization operators Ϥc() to map kernel or feature map to its matrix form so that convolution can be conducted by matrix-vector multiplication.”, “vectorization … gives us a chance (perhaps for the first time) to unify both high level vision tasks and low level vision tasks in a single computational framework.” [i.e., perform convolutions and multiplication/calculations/computations between the vectorized weights/bl and vectorized input data/input image]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Ren with Yakopcic in view of Chi to provide “deep convolutional neural networks (CNN) … implementations with various degrees of vectorization” and “vectorization strategies for different layers in Deep CNNs” (See, e.g., Ren, Abstract and pages 1840-1841). Doing so would have allowed Yakopcic in view of Chi to use Ren’s CNN implementations with vectorization and vectorization strategies to take advantage of the “impact of vectorization on the speed of model training and testing … along with a vectorized Matlab implementation with state-of-the-art speed performance” where “the training and testing speed of our fully vectorized implementation … is competitive, if not faster in all the tested cases”, as suggested by Ren (See, e.g., Ren, Abstract and page 1843).
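
For illustration only, the vectorization relied on from Ren (unrolling the input and the kernels so that convolution becomes a matrix product, one feature map per row) can be sketched as follows. This is a minimal Python sketch, not Ren's Matlab implementation. The function names are hypothetical, and the sketch assumes a single-channel 2-D input, stride 1, no padding, and no kernel flip (cross-correlation, as is common in CNN practice).

```python
def vectorize_input(image, k):
    """Unroll every k x k patch of a 2-D image into one flat vector."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(k) for j in range(k)]
        for r in range(rows - k + 1)
        for c in range(cols - k + 1)
    ]

def vectorize_kernel(kernel):
    """Flatten a k x k kernel into one row of weights."""
    return [w for row in kernel for w in row]

def conv_as_matmul(image, kernels, biases):
    """Compute each feature map as (kernel row) x (patch vectors) plus a bias."""
    patches = vectorize_input(image, len(kernels[0]))
    kernel_rows = [vectorize_kernel(k) for k in kernels]
    return [
        [sum(w * x for w, x in zip(row, patch)) + b for patch in patches]
        for row, b in zip(kernel_rows, biases)
    ]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
print(conv_as_matmul(image, kernels, biases=[0, 1]))
# [[6, 8, 12, 14], [13, 17, 25, 29]]  (one feature map per row)
```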

Regarding claim 20, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 19.
Yakopcic further discloses wherein the AI processor further includes a digital access circuit to receive the subset of the weighting factors, the subset associated with the corresponding NN layer, and to receive data associated with the corresponding NN layer (As indicated above, “a digital access circuit”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of receiving weights or weighting factors. As also indicated above, “the corresponding NN layer” has been interpreted as “a corresponding NN layer”) (see, e.g., column 9, lines 48-52, column 17, lines 49-57 and 65-67, column 23, lines 54-55 and column 25, lines 57-59, “combined weight 295 as shown in FIG. 2 as representative of the combined weight for the input voltage 240a is shown as Wj , in FIG. 3. Similar combined weights for the input voltage 240b and the input voltage 240n”, “neuromorphic circuit 400 is capable of executing dot product operations in numerous applications such as but not limited to neural applications, image recognition, image processing, digital signal processing … neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., a subset of weights/weighting factors associated with a neuromorphic layer/NN layer], “executing each of the neural network algorithms, the weights of each of the resistive memories 410(a-n)” [i.e., circuitry/resistive memories receive the subset of weights associated with the NN layer], “stored in a digital storage layer as the output of the first convolution layer” [i.e., computing system for digital processing includes a digital circuit]) (see, e.g., column 10, lines 17-25, column 18, lines 26-33 and column 25, lines 56-59, “neuromorphic circuit 200 may also be scaled to include additional layers of neurons … to the extent that the neural network configuration 300 can execute learning algorithms. For example, a neural network configuration with a significant number of input”, “image is a two-dimensional image depicted by the image matrix” [i.e., the image data is input data associated with the NN layer], “24x24 pixel feature maps after being generated may then be stored in a digital storage layer as the output of the first convolution layer.” [i.e., receive data associated with the NN layer]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose and the instruction set includes instructions to employ the digital access circuit to copy the weighting factors and the input data from the CPU to the NN layer memory circuitry.
In the same field, analogous art Chi teaches and the instruction set includes instructions to employ the digital access circuit to copy the weighting factors and the input data from the CPU to the NN layer memory circuitry (see, e.g., pages 28-30, 32 and 35, “We propose a ReRAM main memory architecture, which contains a portion of memory arrays (full function subarrays) that can be configured as NN accelerators or as normal memory on demand. It is a novel PIM solution to accelerate NN applications, which enjoys the advantage of in-memory data movement”, “many NN applications require high memory bandwidth to fetch large-size input data and synaptic weights, the data movement between memory and processor”, “for NN computation the FF subarrays enjoy the high bandwidth of in-memory data movement, and can work in parallel with CPU, with the help of the Buffer subarrays”, “The right four commands in Table I control the data movement. They are applied during the whole computation phase.” [i.e., NN/AI instructions/commands to move/copy data between the processor/CPU and memory circuits], “Small-Scale NN: Replication … Our optimization is to replicate the small NN to different independent portions of the mat. For example, to implement a 128 − 1 NN, we duplicate it and map a 256 − 2 NN to the target mat. This optimization can also be applied to convolution layers. Furthermore, if there is another FF mat available, we can also duplicate the mapping to the second mat” [i.e., instructions to move/replicate/duplicate/copy the weights/weighting factors of the NN and the NN input data from the CPU to the first and second memory circuits]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
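For illustration only, the “Small-Scale NN: Replication” passage quoted from Chi may be read as placing two copies of a small weight vector in independent portions of one array so that two inputs are evaluated together. The following is a minimal sketch under that reading. The function names `replicate_weights` and `crossbar_outputs`, the block-diagonal layout and the example values are assumptions and are not taken from Chi.

```python
def replicate_weights(w):
    """Map an n-1 network (one weight vector) to a 2n-2 array.

    Assumed layout: column 0 holds [w; 0] and column 1 holds [0; w],
    so the two copies occupy independent portions of the array.
    """
    n = len(w)
    top = [[w[i], 0.0] for i in range(n)]
    bottom = [[0.0, w[i]] for i in range(n)]
    return top + bottom


def crossbar_outputs(array, inputs):
    """Sum input * weight down each column, as a digital stand-in for a crossbar."""
    cols = len(array[0])
    return [
        sum(array[r][c] * inputs[r] for r in range(len(array)))
        for c in range(cols)
    ]


w = [0.5, -1.0, 2.0]    # weights of the small network
x1 = [1.0, 1.0, 1.0]    # first input vector
x2 = [2.0, 0.0, 1.0]    # second input vector

array = replicate_weights(w)
outputs = crossbar_outputs(array, x1 + x2)  # both inputs applied at once
```

Under these assumptions, each column output equals the dot product of `w` with one of the two inputs, so a single pass through the array yields both results.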

Regarding claims 4 and 21, as discussed above, Yakopcic in view of Chi and Ren teaches the systems of claims 2 and 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to vectorize the weighting factors stored in the first memory circuit and to vectorize the input data stored in the second memory circuit (paragraph 29 of the specification states “Figure 3 illustrates a vectorization process 300, in accordance with certain embodiments of the present disclosure. In this example, the input data 125 is in the form of a two-dimensional input image X. The image X is broken up into smaller two-dimensional patches 320. The patches are then vectorized (also referred to as unrolling) into linear vectors, or columnized patches 330 for storage in the second memory circuit 250 as a vectorized X 310.” Therefore, “vectorize the weighting factors” and “vectorize the input data”, under the BRI, in light of the specification, is converting weights and input data such as image data into vectors or columns) (see, e.g., column 12, lines 50-55, column 15, lines 16-17 and 33-36, “neuromorphic circuit 400 may execute dot-product operations that involve positive and/or negative floating point numbers included in the vector and/or matrix and is also able to accurately generate the dot-product operation values resulting from the dot-product operation of the vector and matrix”, “each of the vector values included in the vector are applied as input voltages 440(a-n).”, “both the input voltages 440(a-n) representing the vector values as well as the complemented input voltages 460(a-n) representing the complemented vector values applied to the analog neuromorphic circuit 400, the controller 405 may generate a co-linear first relationship”, “neuromorphic circuit 400 may be incorporated into image processing applications where the vector represents an image and the matrix includes a set of weighted values that are to be applied to the image” [i.e., map input image data and weights to a linear vector/vectorization]).
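For illustration only, the vectorization (unrolling) described in paragraph 29 of the specification may be sketched as follows. This is a minimal sketch and is not code from the specification or from Yakopcic. The helper names, the stride of one with no padding, and the example image and kernel are assumptions.

```python
def vectorize_patches(image, patch_h, patch_w):
    """Unroll each patch of a 2-D image (a list of rows) into a linear vector.

    Assumption: patches are taken at stride 1 with no padding.
    """
    rows, cols = len(image), len(image[0])
    vectors = []
    for r in range(rows - patch_h + 1):
        for c in range(cols - patch_w + 1):
            patch = [image[r + i][c + j]
                     for i in range(patch_h)
                     for j in range(patch_w)]
            vectors.append(patch)
    return vectors


def vectorize_kernel(kernel):
    """Unroll a 2-D weight kernel into a linear vector in row-major order."""
    return [w for row in kernel for w in row]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

X = vectorize_patches(image, 2, 2)   # vectorized input data
W = vectorize_kernel(kernel)         # vectorized weighting factors
outputs = [dot(x, W) for x in X]     # one dot product per patch
```

Once both the patches and the kernel are linear vectors, each output reduces to a single dot product of a vector and the weights.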

Regarding claim 22, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 19.
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to employ the digital access circuitry to store results of the analog calculations to memory circuitry associated with the CPU.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to employ the digital access circuitry to store results of the analog calculations to memory circuitry associated with the CPU (In line with the interpretation indicated above, “digital access circuitry”, under the BRI, in light of the specification, is any digital circuit or circuitry that is capable of storing results.) (see, e.g., pages 31 and 34, “we modify the column multiplexers in ReRAM … in order to allow FF subarrays to switch bitlines between memory and computation modes, we attach a multiplexer to each bitline to control the switch … After analog processing, the output current is sensed” [i.e., results of analog calculations are stored in ReRAM/memory circuitry], “To implement synapse composing, the high-bit and low-bit parts of the synaptic weights are stored in adjacent bitlines of the corresponding crossbar array … (as shown in Figure 4 A); the output currents are accumulated at the bitlines.” [i.e., analog calculation results/output are stored in a memory circuit/ReRAM in the crossbar array that is associated with the CPU]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).

Regarding claim 23, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to employ the analog processing circuitry to perform the analog calculations wherein the analog calculations include at least one of a multiplication, a Manhattan (L1) difference, a Euclidean (L2) difference, a dot product, an L1 normalization, an L2 normalization, a maximum operation, and a minimum operation (as indicated above, “a Manhattan (L1) difference” and “a Euclidean (L2) difference” have been interpreted as any geometric distances or differences and “an L1 normalization, an L2 normalization” are being interpreted as any normalizations.) (see, e.g., column 5, line 61-column 6, line 2, column 7, lines 15-18, column 11, lines 8-9 and column 14, lines 8-13, “Each resistive memory may apply a resistance to each input voltage so that each input voltage is multiplied by each resistance. … multiplication in parallel enables multiple multiplication operations to be executed simultaneously. … The simultaneous execution of addition and multiplication operations in an analog circuit”, “the resistive memories may simultaneously execute multiple addition and multiplication operations in parallel in response to the input voltages 140(a-n) being applied to the inputs of the analog neuromorphic processing device 100.”, “The analog neuromorphic circuit 400 may be implemented so that dot-product operations may be executed”, “The controller 405 may then identify a minimum resistance value and a maximum resistance value for the resistance values of resistive memories 410c and 410f and select resistance values for the resistive memories 410c and 410f that are within the minimum and maximum resistance value range” [i.e., instructions to employ analog processing circuitry/resistive memories to perform analog calculations including a multiplication, a dot product and minimum and maximum operations]).
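For illustration only, several of the calculations recited in claim 23 may be sketched as digital reference computations. This is a minimal sketch. It does not model the analog circuitry of Yakopcic, and the function names, the handling of a zero vector and the example values are assumptions.

```python
import math


def l1_difference(u, v):
    """Manhattan (L1) difference: sum of absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_difference(u, v):
    """Euclidean (L2) difference: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dot_product(u, v):
    """Dot product: sum of element-wise multiplications."""
    return sum(a * b for a, b in zip(u, v))


def l1_normalize(u):
    """Scale u so its absolute values sum to 1 (zero vector returned as-is)."""
    norm = sum(abs(a) for a in u)
    return list(u) if norm == 0 else [a / norm for a in u]


def l2_normalize(u):
    """Scale u to unit Euclidean length (zero vector returned as-is)."""
    norm = math.sqrt(sum(a * a for a in u))
    return list(u) if norm == 0 else [a / norm for a in u]


u = [3.0, 4.0]
v = [0.0, 0.0]

d1 = l1_difference(u, v)         # 7.0
d2 = l2_difference(u, v)         # 5.0
dp = dot_product(u, [1.0, 2.0])  # 11.0
hi, lo = max(u), min(u)          # maximum and minimum operations
```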

Regarding claim 24, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 19.
Yakopcic further discloses wherein the AI instruction set includes instructions to perform thresholding on the results of the analog calculations (see, e.g., column 20, lines 28-35, “The output configuration 500 includes the first op-amp configuration 520 and the second op-amp configuration 530 that may be positioned at the output of each column of the analog neuromorphic circuit 400 to both scale the output voltage signal 510 to a value on the non-linear smooth function 610 between "0" and "1" and does so by incorporating a neuron function such as … a thresholding function.” [i.e., instructions to perform thresholding on results of the analog calculations]).
Although Yakopcic substantially discloses the claimed invention, Yakopcic is not relied on to explicitly disclose wherein the AI instruction set includes instructions to … perform pooling on the thresholded results of the analog calculations.
In the same field, analogous art Chi teaches wherein the AI instruction set includes instructions to … perform pooling on the thresholded results of the analog calculations (see, e.g., FIG. 4 C – showing “4-1 max pooling function units” and pages 31 and 34, “a circuit to support 4-1 max pooling is included”, “To implement max pooling function, we adopt 4:1 max pooling hardware in Figure 4 C, which is able to support n:1 max pooling”, “Mean pooling is easier to implement than max pooling, because it can be done with ReRAM and does not require extra hardware. To perform n:1 mean pooling, we simply pre-program the weights [1/n, …, 1/n] in ReRAM cells” [i.e., instructions to perform pooling on the thresholded results of the analog calculations]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yakopcic to incorporate the teachings of Chi to provide “a novel PIM [Processing-in-memory] architecture, called PRIME, to accelerate NN [neural network] applications in ReRAM based main memory” where the architecture for “processing in ReRAM-based main memory, PRIME … directly leverages ReRAM cells to perform computation without the need for extra PUs.” [processing units] (See, e.g., Chi, Abstract and page 30, section III). Doing so would have allowed Yakopcic to use Chi’s architecture to achieve “significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance” because the architecture “efficiently accelerates NN computation by leveraging ReRAM’s computation capability and the PIM architecture”, as suggested by Chi (See, e.g., Chi, Abstract and page 30, section III).
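For illustration only, thresholding followed by n:1 pooling may be sketched as follows. This is a minimal sketch and is not code from Yakopcic or Chi. The hard step threshold, the function names and the example values are assumptions, and the input length is assumed to be a multiple of n. The mean pooling follows the weights [1/n, …, 1/n] described in the passage quoted from Chi.

```python
def threshold(values, theta=0.0):
    """Assumed hard threshold: 1.0 where a value exceeds theta, else 0.0."""
    return [1.0 if v > theta else 0.0 for v in values]


def max_pool(values, n):
    """n:1 max pooling over consecutive groups of n values."""
    return [max(values[i:i + n]) for i in range(0, len(values), n)]


def mean_pool(values, n):
    """n:1 mean pooling as a dot product with the weights [1/n, ..., 1/n]."""
    weights = [1.0 / n] * n
    return [
        sum(w * x for w, x in zip(weights, values[i:i + n]))
        for i in range(0, len(values), n)
    ]


# Stand-in for the results of the analog calculations (assumed values).
results = [0.2, -0.5, 0.9, 0.1, -0.3, -0.7, -0.1, -0.4]

thresholded = threshold(results)           # thresholding comes first
pooled_max = max_pool(thresholded, 4)      # 4:1 max pooling
pooled_mean = mean_pool(thresholded, 4)    # 4:1 mean pooling
```

Expressing mean pooling as a weighted sum is what allows it to reuse the same multiply-and-accumulate operation as the other layers, which is the point made in the quoted passage.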

Claim 14 is rejected under 35 U.S.C. 103 as being obvious over Yakopcic in view of Chi and Ren as applied to claims 1 and 2 above, and further in view of Kim et al. (U.S. Patent No. 10,726,895 B1, hereinafter “Kim”). Kim was filed on January 7, 2019, and this date is before the effective filing date of this application, i.e., January 30, 2019. Therefore, Kim constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claim 14, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2. However, Yakopcic in view of Chi and Ren is not relied on to teach wherein the AI instruction set includes instructions to cause the CPU to employ the AI processor to perform backpropagation training.
In the same field, analogous art Kim teaches wherein the AI instruction set includes instructions to cause the CPU to employ the AI processor to perform backpropagation training (see, e.g., FIG. 2 – illustrating a backpropagation algorithm [i.e., an AI instruction set/algorithm including instructions to cause a CPU to employ an AI processor to perform backpropagation] and col. 6, lines 7-19, “FIG. 2 illustrates a backpropagation algorithm (used to train NN) which is composed of three cycles, forward, backward and weight update … There is an algorithm for neural networks called backpropagation algorithm that can be a primary generator of learning in neural networks” [i.e., an AI instruction set/algorithm including instructions to cause the CPU to employ the AI processor to perform backpropagation training/learning]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim with Yakopcic in view of Chi and Ren to provide “trainable resistive crosspoint devices, (RPUs) and circuit methodology for differential weight reading in resistive processing devices (RPU)” and “method(s) of utilizing current differentials within circuits to generate weight value(s) for weight storage device(s).” (See, e.g., Kim, column 4, lines 14-19). Doing so would have allowed Yakopcic in view of Chi and Ren to use Kim’s trainable devices and methods that use “The resistive processing unit (RPU) device [in order to] accelerate DNN training by orders of magnitude while using much less power than conventional devices. The RPU device can store and update weight values locally thus minimizing data movement during training and allowing to exploit locality and parallelism of training algorithm(s)”, as suggested by Kim (See, e.g., Kim, column 3, lines 27-32).
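For illustration only, the three cycles of backpropagation named in the passage quoted from Kim (forward, backward and weight update) may be sketched for a single linear neuron. This is a minimal sketch and is not Kim's circuit or method. The squared-error loss, the learning rate, the function names and the example values are assumptions, and with one neuron the backward cycle reduces to a single gradient computation.

```python
def forward(weights, x):
    """Forward cycle: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def backward(y, target, x):
    """Backward cycle: gradient of 0.5 * (y - target)**2 for each weight."""
    error = y - target
    return [error * xi for xi in x]


def update(weights, grads, lr):
    """Weight update cycle: one gradient descent step."""
    return [w - lr * g for w, g in zip(weights, grads)]


def train_step(weights, x, target, lr=0.1):
    """Run the three cycles once and return the updated weights."""
    y = forward(weights, x)
    grads = backward(y, target, x)
    return update(weights, grads, lr)


weights = [0.0, 0.0]
x, target = [1.0, 2.0], 1.0

for _ in range(50):
    weights = train_step(weights, x, target)

prediction = forward(weights, x)  # approaches the target as training proceeds
```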

Claims 13, 17 and 25 are rejected under 35 U.S.C. 103 as being obvious over Yakopcic in view of Chi and Ren as applied to claims 1, 2 and 19 above, and further in view of Deisher et al. (U.S. Patent Application Pub. No. 2018/0121796 A1, hereinafter “Deisher”).
Regarding claim 13, as discussed above, Yakopcic in view of Chi and Ren teaches the system of claim 2. However, Yakopcic in view of Chi and Ren is not relied on to teach wherein the AI instruction set includes instructions to transpose at least one of the first memory circuit and the second memory circuit.
In the same field, analogous art Deisher teaches wherein the AI instruction set includes instructions to transpose at least one of the first memory circuit and the second memory circuit (aside from repeating the claim language in paragraphs 73 and 91, the only mentions of any “transpose” operation in the specification are in paragraphs 29, 32 and 40, which state “the weights which are vectorized into linear vectors, or columnized kernels 350, for storage in the first memory circuit 220 as a vectorized W 340, in transposed form relative to the vectorized X 310.” and “Transpose memory region via pool logic or physical implementation” and “the weights and data may be vectorized and/or transposed.” Therefore, “instructions to transpose at least one of the first memory circuit and the second memory circuit”, under the BRI, in light of the specification are any instructions, logic or code to transpose a vector, array or matrix stored in at least one memory) (see, e.g., FIGs. 16 and 16-A depicting transposing arrays stored in memory and paragraphs 62-63, “Referring to FIG. 16-A, an input array held in memory, whether external memory or internal memory, is shown in interleaved form. In this form, the groups in the array 1602 may be arranged in columns (transposed from the row form shown for array 1600) where the input array 1602 is held in external memory … The result is a substantial reduction in the use of memory transactions and bandwidth to upload the same weight matrix multiple times for different groups”, “a transpose layer exists so that when an input array is provided in one arrangement (interleave or de-interleaved), it can be transposed to the other arrangement when needed” [i.e., instructions to transpose an array/contents in at least the first or second memory circuits storing the subset of weighting factors/weight matrix or data associated with an NN layer]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Deisher with Yakopcic in view of Chi and Ren to provide an “NN system 200 [that] may be a system on a chip (SoC) that has an NN Accelerator (NNA)” and includes a “processor 250 [that] may process instructions and may send data to, and receive data from, a volatile memory 248 which may be on-board, on-die or on-chip relative to the SoC, and may be RAM such as DRAM or SRAM and so forth. The processor 250 may control data flow with the memory” and “The processor 250 may retrieve or transmit data to other external (off-die or off-chip) volatile memory (such as cache and/or RAM) or non-volatile memory whether as memory 248 or another memory” for storing “the layer data within a layer as arranged in the memory” and storing “an input array” and a “weight matrix” (See, e.g., Deisher, paragraphs 46-47, 59 and 62). Doing so would have allowed Yakopcic in view of Chi and Ren to use Deisher’s NN system and NN accelerator components to achieve a “substantial reduction in the use of memory transactions and bandwidth to upload the same weight matrix multiple times for different groups”, as suggested by Deisher (See, e.g., Deisher, paragraphs 46-47 and 62).
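For illustration only, transposing an array held in memory from a row arrangement to a column arrangement may be sketched as follows. This is a minimal sketch and is not code from Deisher. The function name and the example array are assumptions.

```python
def transpose(matrix):
    """Return the transpose of a 2-D array given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


# Each row is one group of inputs (assumed example values).
by_rows = [[1, 2, 3],
           [4, 5, 6]]

by_columns = transpose(by_rows)    # groups now arranged in columns
restored = transpose(by_columns)   # transposing again restores the rows
```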
Regarding claims 17 and 25, as discussed above, Yakopcic in view of Chi and Ren teaches the systems of claims 1 and 19. However, Yakopcic in view of Chi and Ren is not relied on to teach wherein the CPU is an x86-architecture processor.
In the same field, analogous art Deisher teaches wherein the CPU is an x86-architecture processor (see, e.g., paragraph 280, “Processor 1410 may be implemented as a Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors; x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Deisher with Yakopcic in view of Chi and Ren to provide an “NN system 200 [that] may be a system on a chip (SoC) that has an NN Accelerator (NNA)” and includes a “processor 250 [that] may process instructions and may send data to, and receive data from, a volatile memory 248 which may be on-board, on-die or on-chip relative to the SoC, and may be RAM such as DRAM or SRAM and so forth. The processor 250 may control data flow with the memory” and “The processor 250 may retrieve or transmit data to other external (off-die or off-chip) volatile memory (such as cache and/or RAM) or non-volatile memory whether as memory 248 or another memory” for storing “the layer data within a layer as arranged in the memory” and storing “an input array” and a “weight matrix” (See, e.g., Deisher, paragraphs 46-47, 59 and 62). Doing so would have allowed Yakopcic in view of Chi and Ren to use Deisher’s NN system and NN accelerator components to achieve a “substantial reduction in the use of memory transactions and bandwidth to upload the same weight matrix multiple times for different groups”, as suggested by Deisher (See, e.g., Deisher, paragraphs 46-47 and 62).

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure.
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./ Examiner, Art Unit 2125

/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125