Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,459,876. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims herein are broader than the scope of the claims in the patent and only include the addition of the activation engine which is obvious in the context of neural networks and/or the inclusion of a reference like Yan used below.

Claim Rejections - 35 USC § 112
The 112 rejections are withdrawn in light of the amendments on 11/04/2022.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shacham et al., US Pub No 2018/0005075 (herein Shacham), in view of Yoshihara, US Patent No 5,671,337.
As to claim 9, Shacham teaches: A processing element (PE) (FIG. 6, unit w/ execution lane 601 that includes an ALU) comprising:
a first interface configured to receive a first X-in element and a second X-in element (FIG. 11c, [0098] and [0111] inputs corresponding to pixel values.  See FIG. 4 for inputs for the lanes or further details in FIG. 6), and 
a second interface configured to receive a first Y-in element and a second Y-in element (FIG. 11c, [0098] and [0111]. See FIG. 4 for the inputs for the lanes or further details in FIG. 6), 
wherein the PE is configured to:
perform a first computational operation on the first X-in element and a weight value to generate a first intermediate result, and on the second X-in element and the weight value to generate a second intermediate result (FIG. 11c, [0098] “at pixel locations G and J, coefficient a4 is multiplied by G and by J”; a coefficient a4 [weight] is multiplied by two input pixels G and J [first and second X-in]. See also [0111]); and 
perform a second computational operation on the first intermediate result and the first Y-in element to generate a first Y-out element, and on the second intermediate result and the second Y-in element to generate a second Y-out element (FIG. 11c, [0098] “The partial product G×a4 is added to the locally stored value” The product [result] is added to the local values H and K [Y-in]).
	Shacham does not explicitly teach: concurrently receiving a first X-in element and a second X-in element; wherein the first interface includes a first row input port for receiving the first X-in element and a second row input port for receiving the second X-in element. While Shacham makes clear that the operation is carried out concurrently ([0111]), it does not explicitly establish that the inputs are received concurrently or that each input has its own input port. Shacham FIG. 4 shows additional inputs but only a single line for each element/lane, so it cannot be concluded that it provides two distinct input ports/lines per element. However, Yoshihara teaches a delay unit 19 that causes multiple x-in inputs (X1, X2, and X3) to be received simultaneously [concurrently] in a processing unit [element] (FIG. 11, C. 9 L. 44-49). Yoshihara also explicitly shows each input having its own row input port, e.g. x1, x2, xn (FIG. 11 and other figures). The combination would allow Shacham to receive the inputs concurrently on their own input ports, which would in turn make processing more efficient because the processing element would not have to wait for inputs arriving at different times. Dedicated input ports would reduce contention and the need to organize incoming data, further improving processing time in a predictable manner, as parallel communication is well understood in the art. One of ordinary skill would appreciate the ability to improve processing efficiency and timing by ensuring that items arrive at the processing element at the same time; this can also reduce the complexity of the processing element itself by potentially eliminating the need for more complex internal buffering.
	Therefore it would have been obvious at the time of filing to modify Shacham to receive the inputs concurrently on their own ports. One of ordinary skill would have been motivated to improve the efficiency of the processing element by reducing delays while waiting for inputs to arrive that would otherwise lock the processing element.
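For illustration only, the two computational operations mapped to claim 9 above can be sketched as follows. This is a minimal sketch, not taken from the application or from either reference: the class name, method name, and numeric values are hypothetical, and concurrent reception on separate input ports is merely modeled by passing both elements of each input in a single call. The variable names g, j, h, k, and a4 follow the pixel and coefficient labels quoted from Shacham [0098].

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Hypothetical model of a PE holding one weight value (a4)."""
    a4: float

    def step(self, x_in, y_in):
        # Both X-in elements and both Y-in elements arrive together,
        # standing in for reception on separate input ports.
        g, j = x_in          # first and second X-in elements
        h, k = y_in          # first and second Y-in elements
        first_intermediate = g * self.a4    # first computational operation
        second_intermediate = j * self.a4
        # Second computational operation: add each product to its Y-in.
        return (first_intermediate + h, second_intermediate + k)


pe = ProcessingElement(a4=2.0)
y_out = pe.step(x_in=(3.0, 5.0), y_in=(1.0, 10.0))
print(y_out)  # (7.0, 20.0)
```

The sketch returns both Y-out elements from one call to reflect that a single weight is applied to two inputs in the same step.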
As to claim 10, Shacham/Yoshihara teaches: The processing element of claim 9, wherein the first Y-out element and the second Y-out element are provided as Y-in elements to another PE (Shacham FIG. 6 shows to neighbor units, see also FIG. 4).
As to claim 11, Shacham/Yoshihara teaches: The processing element of claim 9, wherein the first interface is configured to receive the first X-in element and the second X-in element from another PE (Shacham FIG. 4).
As to claim 12, Shacham/Yoshihara teaches: The processing element of claim 9, wherein the second interface is configured to receive the first Y-in element and the second Y-in element from another PE (Shacham FIG. 4).
As to claim 13, Shacham/Yoshihara teaches: The processing element of claim 9, further comprising a third interface configured to generate a first X-out element and a second X-out element from the first X-in element and the second X-in element (Shacham FIG. 6, FIG. 11a-11j showing other operations whose outputs could go in other directions).
As to claim 14, Shacham/Yoshihara teaches: The processing element of claim 9, wherein the first interface is coupled to a data path that receives the first Y-out element and the second Y-out element (Shacham FIG. 4).
As to claims 15-18, these claims are the method claims corresponding to the apparatus/PE claims 9-12 and are rejected for the same reasons mutatis mutandis.
As to claim 19, Shacham/Yoshihara teaches: The method of claim 15, further comprising outputting a first X-out element based on the first X-in element, and a second X-out element based on the second X-in element to another PE (Shacham FIG. 6).
As to claim 20, Shacham/Yoshihara teaches: The method of claim 15, further comprising: applying a function to the first Y-out element and the second Y-out element (Shacham FIG. 11c, multiplication); and providing a result of the function to the row datapath (Shacham FIG. 6).

Claims 1 and 3-8 are rejected under 35 U.S.C. 103 as being unpatentable over Shacham/Yoshihara as applied to claims 9-20 above, and further in view of Yan et al., US Pub No. 2018/0218518 (herein Yan).
As to claim 1, Shacham teaches: An integrated circuit device (FIG. 4) comprising: 
a processing element (PE) (FIG. 6, unit w/ execution lane 601 that includes an ALU) including: 
a row input interface configured to receive a first X-in element and a second X-in element of a row input based on the row data (FIG. 11c, [0098] and [0111] inputs corresponding to pixel values.  See FIG. 4 for inputs for the lanes or further details in FIG. 6); 
a column input interface configured to receive a column input (FIG. 11c, [0098] and [0111]. See FIG. 4 for the inputs for the lanes or further details in FIG. 6); and 
a column output interface configured to provide a column output computed from the row input, the column input, and the weight value (FIG. 11c, [0098]);
Shacham does not explicitly teach: concurrently receive a row of inputs; wherein the first interface includes a first row input port for receiving the first X-in element and a second row input port for receiving the second X-in element. While Shacham makes clear that the operation is carried out concurrently ([0111]), it does not explicitly establish that the inputs are received concurrently or that each input has its own input port. Shacham FIG. 4 shows additional inputs but only a single line for each element/lane, so it cannot be concluded that it provides two distinct input ports/lines per element. However, Yoshihara teaches a delay unit 19 that causes multiple x-in inputs (X1, X2, and X3) to be received simultaneously [concurrently] in a processing unit [element] (FIG. 11, C. 9 L. 44-49). Yoshihara also explicitly shows each input having its own row input port, e.g. x1, x2, xn (FIG. 11 and other figures). The combination would allow Shacham to receive the inputs concurrently on their own input ports, which would in turn make processing more efficient because the processing element would not have to wait for inputs arriving at different times. Dedicated input ports would reduce contention and the need to organize incoming data, further improving processing time in a predictable manner, as parallel communication is well understood in the art. One of ordinary skill would appreciate the ability to improve processing efficiency and timing by ensuring that items arrive at the processing element at the same time; this can also reduce the complexity of the processing element itself by potentially eliminating the need for more complex internal buffering.
Therefore it would have been obvious at the time of filing to modify Shacham to receive the inputs concurrently on their own ports. One of ordinary skill would have been motivated to improve the efficiency of the processing element by reducing delays while waiting for inputs to arrive that would otherwise lock the processing element.
Shacham/Yoshihara does not explicitly teach: a state buffer configured to provide row data and a weight value; an output buffer configured to store a computational result derived from the column output; and an activation engine configured to apply a function to the computational result and store an output of the function in the state buffer. Shacham discloses the use of memory (e.g. RAM 407) for storing values provided to the units / lanes as well as for storing the results (e.g. registers internal to FIG. 4), but does not explicitly detail a specific external state buffer or an explicit external output buffer, nor is there any explicit discussion of post-processing work being performed on the results. Shacham arguably provides this in some context through its memory, lane buffers, and registers, but these are not integrated as a single storage external to the array itself as claimed.
Yan, however, teaches buffers for input and weight (FIG. 2A, buffers 230 and 220, fed from memory interface 205) as well as an accumulator (FIG. 2A, accumulator 245) to buffer results from the PE array ([0032] “The accumulator 245 within the DLA 200 accumulates the results generated by the PE array 240”). Yan also discloses a post processor [activation engine] (FIG. 2A, post processor 246) that performs tasks, e.g. ReLU, on the values from the accumulator ([0032]) and stores its results, ultimately, back to memory and the same buffers. The combination of Yan with Shacham/Yoshihara would provide explicit buffers for input and output as well as post processing [activation]. MPEP 2144.04 notes that making elements separate and/or rearranging parts is obvious and not patentably distinct; where the data is stored does not change the functionality of the system, and there is ultimately a finite set of options available, each with its own trade-offs. More specifically, the post processing would make external memory useful because it reduces the I/O load on the elements themselves, which would otherwise need frequent access back to them. The inclusion of post processing, ReLU for example, is known to be useful in neural networks, particularly in improving training of networks, dating back at least to 2011. ReLUs are commonly used in computer vision and speech recognition, and Shacham/Yoshihara is noted for use in vision/image processing ([0115]), a field of commercial importance. This would make it obvious for one of ordinary skill to explore the incorporation of the post processing of Yan into a known neural network system of Shacham/Yoshihara to improve vision / image recognition.
Therefore it would have been obvious, at the time the invention was filed/made, to include external buffers and to include post processing [ReLU / activation] into a specific neural network architecture to improve the performance with deep neural networks for image processing and speech recognition.
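For illustration only, the data flow relied upon in the combination (state buffer to PE array, PE array to output buffer, output buffer through an activation function, and back to the state buffer) can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary layout, the function names, and the numeric values are hypothetical, and the nested loop merely stands in for a PE array; it is not the architecture of Yan, Shacham, or the application.

```python
def relu(value):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, value)


def bypass(value):
    """Pass the value through unchanged."""
    return value


def run_layer(state_buffer, function=relu):
    """Hypothetical single pass: state buffer -> PE array -> output buffer
    -> activation -> state buffer."""
    rows = state_buffer["rows"]
    weights = state_buffer["weights"]      # one weight row per input row
    output_buffer = [0.0] * len(weights[0])
    # Stand-in for the PE array: each column accumulates row_value * weight.
    for x, weight_row in zip(rows, weights):
        for col, w in enumerate(weight_row):
            output_buffer[col] += x * w
    # Activation step applied to the buffered column results.
    activated = [function(v) for v in output_buffer]
    # Result is written back so it can serve as the next row data.
    state_buffer["rows"] = activated
    return output_buffer, activated


state = {"rows": [1.0, -2.0], "weights": [[1.0, 3.0], [2.0, 1.0]]}
raw, activated = run_layer(state, function=relu)
print(raw)        # [-3.0, 1.0]
print(activated)  # [0.0, 1.0]
```

Passing `function=bypass` instead of `relu` leaves the column results unchanged before they are written back.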
As to claim 3, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 1, wherein the column output includes two Y-out elements (Shacham FIG. 11c).
As to claim 4, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 3, wherein the column input includes two Y-in elements (Shacham FIG. 11c H and K).
As to claim 5, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 1, wherein the PE further includes a row output interface configured to provide a row output (Shacham FIG. 11c and FIG. 6).
As to claim 6, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 5, further comprising another PE configured to concurrently receive the two or more elements of the row output from the row output interface (Shacham FIG. 6; Yoshihara FIG. 11, C. 9 L. 44-49).
As to claim 7, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 1, further comprising another PE configured to receive the column output from the column output interface (Shacham FIG. 6).
As to claim 8, Shacham/Yoshihara/Yan teaches: The integrated circuit device of claim 1, wherein the function that the activation engine is configured to apply to the computational result is one of a bypass function or a ReLU function (Yan [0032] ReLU).

Response to Arguments
Applicant’s arguments, filed 11/04/2022, have been considered but are not persuasive. Applicant argues in substance:
Without commenting on the basis for the double patenting rejection, Applicant respectfully requests that the Office hold this rejection in abeyance until this application is otherwise in condition for allowance.
Examiner will hold the double patenting rejection in abeyance until the case is otherwise in condition for allowance.
For example, in reference to Applicant's claim 1, while Yoshihara describes a "neuron unit" that receives multiple input values at multiple inputs ("inputs x=(x1, x2, ..., xn)" in FIG. 1 of Yoshihara), Yoshihara does not disclose "a column output interface configured to provide a column output" in the manner recited in claim 1. In reference to Applicant's claims 9 and 15, while Yoshihara describes a "neuron unit" with multiple inputs, the "neuron unit" computes a single output "y" and therefore fails to disclose a "first Y-out element" and a "second Y-out element" as recited in claims 9 and 15.
This argument is not persuasive. The rejection at hand is a 103 rejection based on a combination of references, and Yoshihara is not relied upon to teach the limitations identified; rather, Shacham teaches the column / Y-out elements (FIG. 11C, [0098]).
Shacham and Yan fail to remedy the above-noted deficiencies of Yoshihara. For example, the disclosure in Shacham, which includes an embodiment of "an array of execution lanes 405 that are logically positioned "above" a two-dimensional shift register array structure 406", appears to lack a relevant description to a "processing element (PE)" that includes the features recited in the independent claims, such as a "row input interface" that includes "a first row input port for receiving the first X-in element and a second row input port for receiving the second X-in element" and a "column output interface".
This argument is not persuasive. While Examiner agrees that Shacham and Yan do not explicitly teach the added limitations, Yoshihara explicitly illustrates the input ports, e.g. x1 and x2, and the output interface, e.g. y (FIG. 1). To the extent the specific orientation of row versus column is narrower, the combination with Shacham (FIG. 4, FIG. 11c) addresses that orientation.
The added limitations appear to have direct support in Yoshihara (FIG. 1) and thus do not resolve the current rejection. Given that the combination of Yoshihara and Shacham has a very similar input method to the current invention (e.g. FIG. 8), Examiner would suggest focusing elsewhere to overcome the rejection. Shacham illustrates row and column inputs and outputs (e.g. FIG. 4) but does not clarify that they are distinct and separate pins for concurrent operations. Yoshihara resolves this concern by showing distinct input pins, which would allow for concurrent input and thus speed up processing and resolve contention. While Yoshihara does not make it explicit that other inputs or outputs are multi-pinned, there would be an argument that extending the plural pins to other I/O would be obvious for similar reasons.
In the interest of compact prosecution, Examiner would suggest that Applicant:
(a) claim both sets of output (Xout and Yout) and clarify how operations are routed; or 
(b) detail the operations performed internally. 
Both aspects appear in FIG. 8. With regard to (a), the plural Xout ports generally appear to function in a pass-through fashion for Xin, while the plural Yout ports appear to output the results of computations between respective Xin and Yin inputs. With regard to (b), it appears that each input is multiplied by a weight and then added to a second respective input. Claiming either of these distinctions would likely overcome the current art and may result in allowance of the claims, pending further search and consideration. Examiner is available for an interview at Applicant’s convenience to discuss clarifications or potential amendments. 
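For illustration only, the routing described in suggestions (a) and (b) can be sketched as follows. This is a minimal sketch of the behavior as Examiner reads it, not a statement of what FIG. 8 discloses: the function name, the two-PE column arrangement, and the numeric values are hypothetical.

```python
def pe_step(weight, x_in, y_in):
    """Hypothetical PE: X-out passes X-in through unchanged, and each
    Y-out is the respective X-in times the weight plus the respective Y-in."""
    x_out = tuple(x_in)
    y_out = tuple(x * weight + y for x, y in zip(x_in, y_in))
    return x_out, y_out


# Two PEs stacked in one column: the Y-out pair of the upper PE becomes
# the Y-in pair of the lower PE, so partial sums accumulate down the column.
x_out_a, y_after_a = pe_step(weight=2.0, x_in=(1.0, 2.0), y_in=(0.0, 0.0))
x_out_b, y_after_b = pe_step(weight=3.0, x_in=(4.0, 5.0), y_in=y_after_a)

print(x_out_a)    # (1.0, 2.0)
print(y_after_a)  # (2.0, 4.0)
print(y_after_b)  # (14.0, 19.0)
```

In this sketch the X-out pair would be available to a neighboring PE in the same row, while the Y-out pair carries the running result down the column.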

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
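For illustration only, the timing rule stated above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and all dates are hypothetical, the six-month limit is measured from the mailing date, and the sketch ignores extensions under 37 CFR 1.136(a) and any rule about periods ending on weekends or holidays. It is not a substitute for the rule itself.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(mailing, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period, before any extension."""
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    expiry = three_months
    replied_within_two = (
        first_reply is not None and first_reply <= add_months(mailing, 2)
    )
    # If a first reply was filed within two months and the advisory action
    # is mailed after the three-month period, the period runs to that date.
    if replied_within_two and advisory_mailed is not None:
        if advisory_mailed > three_months:
            expiry = advisory_mailed
    # In no event later than six months.
    return min(expiry, six_months)


mailed = date(2023, 1, 10)  # hypothetical mailing date
print(shortened_period_expiry(mailed))
# 2023-04-10
print(shortened_period_expiry(mailed, date(2023, 3, 1), date(2023, 4, 20)))
# 2023-04-20
print(shortened_period_expiry(mailed, date(2023, 3, 1), date(2023, 8, 1)))
# 2023-07-10
```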
Any inquiry concerning this communication or earlier communications from the examiner should be directed to William B Partridge whose telephone number is (571)270-1402.  The examiner can normally be reached on Mon-Fri Noon-3 Pacific.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/William B Partridge/Primary Examiner, Art Unit 2183