DETAILED ACTION
Claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The amended title of the invention is not sufficiently descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed.  At this point in time, the examiner recommends --Instruction for Performing Activation Operations and N Convolutions on N Source Matrices Loaded a Single Time--.  This recommendation may change, particularly if the claims change throughout prosecution.
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
The disclosure is objected to because of the following informalities:
In amended paragraph [0278], “a register maps” is grammatically incorrect and must be reworded.  While this issue was corrected in other paragraphs, it appears a similar change to this paragraph was accidentally omitted.
Appropriate correction is required.

Claim Objections
Claim 1 is objected to because of the following informalities:
In line 7, insert --and-- before “perform” (as was done in claims 9 and 17).
Appropriate correction is required.

Claim Interpretation
In claims 1, 9, and 17, due to the use of plural language (e.g. convolutions, matrices, etc.), N is interpreted to be at least 2.  That is, N is not interpreted to encompass only 1. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as obvious over Liu et al., U.S. Patent Application Publication No. 2019/0057063 A1 (herein referred to as Liu), in view of Huang et al., U.S. Patent Application Publication No. 2018/0173571 (herein referred to as Huang), and one or more of Nurvitadhi et al., EP 3396533 A2 (herein referred to as Nurvitadhi) and the examiner’s taking of Official Notice.
Referring to claim 1, Liu has taught a processor (FIG.1, 100) comprising:
a) fetch circuitry to fetch an instruction (FIG.1, 132);
b) decode circuitry (FIG.1, 130) to decode the instruction (paragraph [0033], CONV instruction) having fields to specify an opcode (there is an inherent CONV opcode to indicate the convolution operation) and location of a destination (results are stored in either destination 306 or 112 (paragraphs [0055]-[0056]).  Even if the choice is not indicated explicitly, the instruction, via its encoding, inherently identifies that the results are to be stored in the appropriate destination),
c) With respect to the limitation that the instruction has at least one field to specify N source matrices, this is not patentable for multiple reasons:
c1) Under a first interpretation where matrices may be interpreted as sub-matrices, from paragraph [0033] of Liu, the instruction has a field including a starting address of the submatrix, which in combination with width, height, and/or stride fields, defines locations of multiple submatrices (source matrices) to be loaded.
c2) Under a second interpretation, where the N source matrices are not interpreted as sub-matrices of the same matrix, the instruction of Liu can be said to include a field to identify one matrix (the one identified by the starting submatrix address).  However, Nurvitadhi has taught that a single convolution instruction could indicate operation to be performed on multiple input channels in parallel (e.g. on a red matrix, a green matrix, and a blue matrix, which correspond to pixel color for an image).  See paragraphs [0121] and [0251].  One of ordinary skill in the art would have recognized that the instruction of Liu could be modified to include locations of multiple source matrices (e.g. red, blue, and green source matrices) simply by replicating fields and operation to carry out more operations in parallel, which would speed up processing of color images.  As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the instruction has at least one field to specify N source matrices.
d) Liu, alone or as modified, has further taught the opcode indicating the processor is to load the N source matrices from memory (each identified matrix/submatrix is loaded based on its starting address and, optionally, one or more of the width, height, and stride fields), perform N convolutions on the N source matrices to generate N feature maps (under the first interpretation, a convolution is performed between each of the N submatrices and a kernel (paragraph [0033]) to generate N results (feature maps).  Under the second interpretation, N=3 convolutions would be performed (one for the red, one for the blue, and one for the green), to generate N=3 feature maps), and that results are to be passed to an activation layer (the examiner notes that passing results to an activation layer is an inherent part of a convolutional neural network),
e) With respect to the limitation “store results of the N convolutions in registers”, this is not patentable for multiple reasons:
e1) Under a first interpretation, where a memory that registers results constitutes registers, from paragraphs [0055]-[0056], the results are registered in either neuron cache 306 or matrix cache 112 (either of which includes registers for registering the results).
e2) Under a second interpretation, where “registers” is taken to mean registers in a register file, while this is not taught by Liu, the examiner notes that indicating one or more destination register file registers in an instruction for storage of results (intermediate or final) is well known and accepted in the art.  Registers are known, fast storage and thus, indication thereof for result storage would be advantageous for speed.  As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu to store results of the N convolutions in registers.
f) Liu has not taught the opcode indicating the processor is to perform operations of the activation layer using the results of the N convolutions from the registers.  However, Huang has taught a “matrix-matrix multiplication and elementwise activation instruction” that instructs a convolution and widely-used ReLU activation (see paragraph [0086]).  As stated above, both convolution and activation are known layers in a CNN.  Activation typically follows convolution and helps the neural network learn complex patterns in data.  Having a single instruction do both convolution and activation would allow for the minimal amount of software to perform both functions.  For instance, instead of having two separate instructions (one for convolution and one for activation), one could implement a single instruction that does both (akin to CISC style design where multiple operations are incorporated into a single instruction) so as to reduce the size of programs and their utilization of memory (and, thus, the number of memory accesses to access this code).  Further, integrating multiple operations into a single piece construction (i.e., a single instruction in the software art) would amount to a routine expedient.  See MPEP 2144.04, including section (V)(B).  As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the opcode indicates the processor is to perform operations of the activation layer using the results of the N convolutions from the registers.
g) With respect to the limitation “wherein the processor is to perform the N convolutions and the operations of the activation layer with at most one memory load of each of the N source matrices”, this is not patentable for multiple reasons:
.
g2) Under the second interpretation, Liu and Nurvitadhi are silent as to how many times each red, blue, and green matrix is loaded.  However, the examiner notes that caching data is well known and accepted in the art.  That is, cache is fast memory to store information that is likely to be accessed again in the near future, thereby precluding the processor from making a more expensive access to main memory.  As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Liu such that each matrix is only loaded from main memory once, so as to avoid time-consuming accesses to the same data in the future.
h) Liu has further taught scheduling circuitry to schedule execution of the instruction (from paragraphs [0022]-[0024], the instruction will be queued until its dependencies, if any, are satisfied.  Queuing until ready constitutes scheduling); and
i) Liu has further taught execution circuitry to execute the instruction as per the opcode (FIG.1, 110). 
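The combination discussed for claim 1 can be pictured with a short software analogy. The Python sketch below is illustrative only: the names `CountingMemory`, `conv_valid`, and `fused_conv_relu` are hypothetical and do not appear in Liu, Huang, or Nurvitadhi, and square matrices with stride 1 are assumed. It models one operation that loads each of N source matrices once, convolves each with a kernel, holds the results in a stand-in for registers, and then applies ReLU to those held results.

```python
def conv_valid(source, kernel):
    """Convolve a square kernel over a square source (stride 1, no padding)."""
    k = len(kernel)
    size = len(source) - k + 1
    return [[sum(kernel[a][b] * source[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(size)]
            for i in range(size)]


class CountingMemory:
    """Hypothetical memory model that counts how often each matrix is loaded."""

    def __init__(self, matrices):
        self._matrices = matrices
        self.loads = {name: 0 for name in matrices}

    def load(self, name):
        self.loads[name] += 1
        return self._matrices[name]


def fused_conv_relu(memory, names, kernel):
    """One 'instruction': load each source once, convolve, then activate."""
    registers = {}  # stand-in for the registers holding convolution results
    for name in names:
        source = memory.load(name)  # the only load of this source matrix
        registers[name] = conv_valid(source, kernel)
    # Activation (ReLU) reads the convolution results from the registers.
    return {name: [[max(0, v) for v in row] for row in fmap]
            for name, fmap in registers.items()}


memory = CountingMemory({
    "red":   [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "green": [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    "blue":  [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
})
result = fused_conv_relu(memory, ["red", "green", "blue"], [[1, 0], [0, -1]])
```

Here N = 3 and `memory.loads` records exactly one load per source matrix, which mirrors the "at most one memory load" language only in this simplified model.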
Referring to claim 2, Liu, as modified, has taught the processor of claim 1, wherein the execution circuitry is to perform each of the N convolutions by convolving a feature identifier over a first source matrix one element at a time, each time generating products of each element of the feature identifier and a corresponding element of a receptive field of the first source matrix, and storing a sum of the products into a corresponding element of a first feature map (see paragraphs [0055]-[0056].  This is how a convolutional neural network operates).
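The operation recited in claim 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming stride 1 and no padding; the function name `convolve2d` and the sample values are hypothetical and are not taken from any cited reference. As is conventional for neural networks, the kernel (feature identifier) is not flipped.

```python
def convolve2d(source, kernel):
    """Slide `kernel` over `source` one element at a time (stride 1, no padding)."""
    n_rows, n_cols = len(source), len(source[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows, out_cols = n_rows - k_rows + 1, n_cols - k_cols + 1
    feature_map = [[0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            # Receptive field: the window of `source` whose top-left corner is (i, j).
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    # Product of a kernel element and the corresponding source element.
                    total += kernel[a][b] * source[i + a][j + b]
            # The sum of products becomes one element of the feature map.
            feature_map[i][j] = total
    return feature_map


source = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = convolve2d(source, kernel)  # [[6, 8], [12, 14]]
```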
Referring to claim 3, Liu, as modified, has taught the processor of claim 1, wherein the activation layer comprises one of Rectified Linear Unit (ReLU) (see Huang, paragraph [0086]), tanh, sigmoid, and softmax.
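For reference, the four activation functions named in claim 3 have standard mathematical definitions, sketched below in Python. The helper `apply_elementwise` is a hypothetical name introduced only for illustration; the softmax subtracts the maximum value before exponentiating, a common numerical-stability choice rather than anything drawn from the cited art.

```python
import math


def relu(x):
    """Rectified Linear Unit: negative inputs become zero."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent, mapping any real input into (-1, 1)."""
    return math.tanh(x)


def softmax(values):
    """Normalize a list of scores into values that sum to 1."""
    peak = max(values)  # subtracted for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def apply_elementwise(fn, feature_map):
    """Apply a scalar activation to every element of a feature map."""
    return [[fn(v) for v in row] for row in feature_map]


activated = apply_elementwise(relu, [[-1.0, 2.0], [3.0, -4.0]])
```

ReLU, sigmoid, and tanh act on each element independently, whereas softmax acts on a whole vector of scores.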
Referring to claim 4, Liu, as modified, has taught the processor of claim 1, wherein N equals three, and elements of the N source matrices comprise red, green, and blue pixel values (again, recall the rejection of claim 1, part (c2)).
Referring to claim 5, Liu, as modified, has taught the processor of claim 1, but has not taught wherein the opcode further calls for the processor to perform pooling on the results of the N convolutions in order to down-sample each of the N feature maps.  However, pooling to down-sample results is well known in the art to reduce the amount of data (and, thus, storage requirements) while also generating a summary of features.  This makes the model more robust to variations in the position of the features in the input image.  As such, it would have first been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu to perform pooling on the results of the N convolutions in order to down-sample each of the N feature maps.  The examiner additionally notes that Liu, as modified, has taught all claimed functionality.  Further, it is a design choice to package all of this functionality into a single instruction so as to carry out the claimed functionality with minimal software, i.e., one instruction.  Such an instruction would perform convolution that includes pooling (note that Nurvitadhi even shows that pooling is part of convolution (FIG.9B) and, thus, In re Larson, 340 F.2d 965, 968, 144 USPQ 347, 349 (CCPA 1965).  As a result, to perform pooling with no additional instructions, it would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the opcode further calls for the processor to perform pooling.
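Pooling as a down-sampling step can be illustrated as follows. This is a minimal sketch assuming max pooling over non-overlapping square windows; the function name `max_pool` and the window size of 2 are hypothetical choices, and other pooling variants (e.g. average pooling) exist.

```python
def max_pool(feature_map, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, cols - size + 1, size)]
            for i in range(0, rows - size + 1, size)]


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
pooled = max_pool(feature_map)  # [[6, 8], [14, 16]]
```

The 4 x 4 feature map is reduced to 2 x 2, so a quarter of the data is kept while the strongest response in each region survives.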
Referring to claim 6, Liu, as modified, has taught the processor of claim 1, but has not taught wherein the opcode further calls for the processor to perform a fully connected layer.  However, it is known to carry out a fully connected layer after feature extraction using convolution.  This allows the neural network to learn non-linear combinations of the extracted features and perform classification of the image.  As discussed in the rejection of claim 5, one would be motivated to package as much functionality as possible into a single instruction to minimize software size/execution.  As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the opcode further calls for the processor to perform a fully connected layer.
Referring to claim 7, Liu, as modified, has taught the processor of claim 1, but has not taught wherein the opcode further calls for the processor to pad each of the N source matrices with zeroes, such that each of the N feature maps has the same dimensions as a corresponding one of the N source matrices.  However, padding of source matrices is known in the art to better preserve information on the borders of images, for instance, so that deeper networks can be used.  Note that applying a convolution operation (with an (f x f) filter/kernel) outputs (n + 2p – f + 1) x (n + 2p – f + 1) data.  For example, adding one layer of padding to an [...].  As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the opcode further calls for the processor to pad each of the N source matrices with zeroes, such that each of the N feature maps has the same dimensions as a corresponding one of the N source matrices.
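The output-size relation (n + 2p – f + 1) can be checked with a small Python sketch. The function names `output_size` and `zero_pad` and the sample values n = 5, f = 3, p = 1 are assumptions chosen for illustration, not values taken from this action or the cited references; stride 1 is assumed.

```python
def output_size(n, f, p):
    """Side length of the output for an n x n input, f x f kernel, padding p, stride 1."""
    return n + 2 * p - f + 1


def zero_pad(matrix, p):
    """Surround `matrix` with p layers of zeroes on every side."""
    cols = len(matrix[0])
    blank = [0] * (cols + 2 * p)
    padded = [list(blank) for _ in range(p)]
    for row in matrix:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend(list(blank) for _ in range(p))
    return padded


unpadded = output_size(5, 3, 0)  # 3: the feature map shrinks without padding
same = output_size(5, 3, 1)      # 5: one layer of zeroes preserves the input size
padded = zero_pad([[1, 2], [3, 4]], 1)
```

With a 3 x 3 kernel, one layer of zero padding (p = 1) gives n + 2 – 3 + 1 = n, so the feature map has the same dimensions as the source matrix.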
Referring to claim 8, Liu, as modified, has taught the processor of claim 1, but has not taught wherein the opcode further calls for the processor to generate N additional feature maps by performing an additional convolution on each of the N feature maps, results of the N additional convolutions to be stored in registers to be passed to the activation layer, wherein the processor is to perform the N convolutions, the N additional convolutions, and the activation layer with at most one memory load of each of the N source matrices.  However, the use of multiple convolution layers is known in the art.  While a first convolution would extract basic low-level features (e.g. edges, corners, circles, etc.), a subsequent convolution layer would generate more complex features (combinations of low-level features to form more complex shapes, etc.).  Again, as discussed in the rejection of claim 5, one would be motivated to package as much functionality as possible into a single instruction to minimize software size/execution.  As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liu such that the opcode further calls for the processor to generate N additional feature maps by performing an additional convolution on each of the N feature maps, results of the N additional convolutions to be stored in registers to be passed to the activation layer, wherein the processor is to perform the N convolutions, the N additional convolutions, and the activation layer with at most one memory load of each of the N source matrices.
Claims 9-20 are respectively rejected for similar reasons as claims 1-8 and 1-4.

Response to Arguments
Applicant argues that Liu has not taught the claims as amended.
The examiner agrees.  However, the examiner asserts that it is obvious to perform additional functionality (e.g. activation operations) with a single instruction, in view of Huang, which has taught a combined convolution and activation instruction. 

Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Chen, 2019/0073583, has taught a number of instructions that include convolution parameters and activation parameters (e.g. see TABLE 1 and paragraphs [0052]-[0055]).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
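A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.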
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168.  The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/David J. Huisman/Primary Examiner, Art Unit 2183