DETAILED ACTION
Claims 20-23, 25-30, and 32-41 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 11/23/2021.
  
Allowable Subject Matter
Claim 41 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 20, 27, 34, and 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Jackson (U.S. 2016/0011869) in view of Ford (U.S. 2005/0198473), Devaney et al. (U.S. 2004/0193838), and Wentzlaff et al. (U.S. 8,631,205).
As per claim 20:
Jackson, Ford, and Devaney disclosed an image processor, wherein a processing unit comprises: 
four input ports and two output ports (Jackson: Figures 1-2 elements 22, 110, and 124, paragraphs 30-31, 33, and 36)(The functional unit containing the ALU has four inputs and two outputs.); and
a first arithmetic-logic unit (ALU) and a second ALU (Jackson: Figure 2 element 202, paragraph 36), 
wherein each processing unit is configured to operate on instructions having an instruction format having an opcode that specifies whether the first ALU and the second ALU operate in sequence or in parallel (Devaney: Figures 6A-B, paragraphs 18 and 94)(Ford: Figures 3, 5-6, 8, and 10, paragraphs 97, 99, 101, 104, and 106)(Jackson: Figures 1-2 and 4 elements 110, 124, 202, 406, and 412, paragraphs 30, 36, and 41)(Devaney disclosed vector and scalar instruction opcodes. Ford disclosed SIMD operations that include opcodes and 64-bit source registers with 16-bit data elements. This SIMD operation performs four parallel half-width calculations. Jackson generally disclosed the functional units can be SIMD units. In addition, Jackson disclosed two 32-bit ALUs operating in a 64-bit mode that perform a single double-wide ALU operation. The combination allows for the execution of SIMD operations with opcodes on the ALUs of Jackson. The combination allows for execution of scalar instructions with opcodes on the ALUs of Jackson.) and that, when executed, cause the processing unit to switch between (i) operating the first ALU and the second ALU in sequence to perform a double-width ALU operation (Jackson: Figures 2 and 4 elements 202, 406, and 412, 
wherein when the opcode specifies that the first ALU and the second ALU operate in sequence, each processing unit is configured to perform a double-width ALU operation (Devaney: Figures 6A-B, paragraphs 18 and 94)(Jackson: Figures 2 and 4 elements 202, 406, and 412, paragraphs 36 and 41)(Devaney disclosed vector and scalar instruction opcodes. Jackson disclosed two 32-bit ALUs operating in a 64-bit mode that perform double-wide ALU operations on fetched 64-bit instructions from a single thread. The combination allows for such execution upon receiving such 64-bit instructions with opcodes.), during which:
the first ALU is configured to receive data from a first pair of input ports, to perform a first full-width ALU operation to compute (i) a lower half result of the double-width ALU operation and (ii) a carry term, to provide the lower half result of the double-width ALU operation to a first output port of the two output ports, and to provide the carry term to the second ALU (Jackson: Figures 1-2 elements 124 and 
the second ALU is configured to receive data from a second pair of input ports and receive the carry term from the first ALU, to perform a second full-width ALU operation to compute an upper half result of the double-width ALU operation, and to provide the upper half result of the double-width ALU operation to a second output port of the two output ports (Jackson: Figures 1-2 elements 124 and 202, paragraph 36)(The upper ALU receiving operands C and D, as well as the carry bit, to perform the upper 32-bit ALU operation. When the ALU is configured for 64-bit operations, the upper 32-bit adder outputs an upper 32-bit execution result.), and
wherein when the opcode specifies that the first ALU and the second ALU operate in parallel, each processing unit is configured to perform four half-width ALU operations at least partially in parallel (Devaney: Figures 6A-B, paragraphs 18 and 94)(Ford: Figures 3, 5-6, 8, and 10, paragraphs 97, 99, 101, 104, and 106)(Jackson: Figures 1-2 and 4 elements 110, 124, 202, 406, and 412, paragraphs 30, 36, and 41)(Devaney disclosed vector and scalar instruction opcodes. The combination allows for the SIMD operations to include opcodes. The combination allows for the two 32-bit adders of Jackson to be used in parallel in a 64-bit mode to perform the SIMD operations of Ford. This SIMD operation performs four parallel half-width calculations. Jackson generally disclosed the functional units can be SIMD units.), during which:
the first ALU is configured to receive input operands from the first pair of 
the second ALU is configured to receive input operands from the second pair of input ports, to perform a first half-width operation on a lower half of each of the input operands, to perform a second half-width operation on an upper half of each of the input operands, and to write a result to the second output port of the two output ports (Ford: Figures 3, 6, 8, and 10, paragraphs 97, 101, 104, and 106)(Jackson: Figures 1-2 and 4 elements 116, 124, 22, 406, and 412, paragraphs 30, 36, and 41)(The combination allows for the two 32-bit adders to be used in parallel in 64-bit mode to perform the SIMD operations of Ford. The combination allows for a second 32-bit adder to receive 32-bit source operands C and D and to perform two 16-bit additions, one on the upper 16 bits and one on the lower 16 bits, with the overall 32-bit result output to the register file.).
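For illustration only, the two operating modes mapped above can be sketched in Python as follows. This is a minimal behavioral sketch and is not taken from any cited reference: the opcode names ADD64 and ADD16X4, the function names, and the assignment of operands A/B to the first ALU and C/D to the second ALU are assumptions made for the example.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def alu32_add(x, y, carry_in=0):
    """One full-width (32-bit) add. Returns (32-bit result, carry out)."""
    total = (x & MASK32) + (y & MASK32) + carry_in
    return total & MASK32, total >> 32


def alu32_add_halves(x, y):
    """Same ALU in half-width mode: two independent 16-bit adds.

    No carry propagates from the lower lane into the upper lane.
    """
    lower = ((x & MASK16) + (y & MASK16)) & MASK16
    upper = (((x >> 16) & MASK16) + ((y >> 16) & MASK16)) & MASK16
    return (upper << 16) | lower


def execute(opcode, a, b, c, d):
    """Four input ports (a, b, c, d); two output ports (the returned pair)."""
    if opcode == "ADD64":  # hypothetical opcode: ALUs operate in sequence
        low, carry = alu32_add(a, b)        # first ALU: lower half plus carry term
        high, _ = alu32_add(c, d, carry)    # second ALU: upper half, consumes carry
        return low, high
    if opcode == "ADD16X4":  # hypothetical opcode: ALUs operate in parallel
        # Four 16-bit additions in total, two per ALU.
        return alu32_add_halves(a, b), alu32_add_halves(c, d)
    raise ValueError(f"unknown opcode: {opcode}")
```

Under these assumptions, execute("ADD64", 0xFFFFFFFF, 1, 0, 0) carries out of the lower half into the upper half, while execute("ADD16X4", ...) keeps each 16-bit lane independent.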
The advantage of SIMD execution is that multiple parallel operations can be performed on different data sets, which improves processing performance. Thus, it would 
The advantage of using instruction opcodes is that decoders can detect instruction types and produce control signals to correctly execute the received instruction. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to include the opcodes of Devaney within the instructions of Jackson and Ford.
Jackson, Ford, and Devaney failed to teach an array of processing units.
However, Wentzlaff combined with Jackson, Ford, and Devaney disclosed an array of processing units (Wentzlaff: Figures 1 and 2A element 102, column 4 lines 24-35 and column 5 lines 8-27)(Jackson: Figure 1 elements 110 and 116, paragraphs 30-31)(Wentzlaff disclosed an array of interconnected tiles. The combination implements the register file and functional units of Jackson into each tile of Wentzlaff.).
The advantage of processing arrays is that processing of large data sets can be performed in parallel for improved performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the functional units of Jackson into the processing tiles of Wentzlaff for the advantage of increased performance.
As per claim 27:
Claim 27 essentially recites the same limitations of claim 20. Therefore, claim 27 is rejected for the same reasons as claim 20.
As per claim 34:
Claim 34 essentially recites the same limitations of claim 20. Therefore, claim 34 is rejected for the same reasons as claim 20.
As per claim 38:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 20, wherein the instruction causes the first ALU and the second ALU to perform operations including causing the first ALU to compute the lower half result and the carry term from the first pair of input ports (Jackson: Figures 1-2 elements 124 and 202, paragraph 36)(When the ALU is configured for 64-bit operations, the lower 32-bit adder outputs a carry bit and a lower 32-bit execution result.), and causing the second ALU to compute the upper half result from the second pair of input ports and the carry term  (Jackson: Figures 1-2 elements 124 and 202, paragraph 36)(When the ALU is configured for 64-bit operations, the upper 32-bit adder outputs an upper 32-bit execution result using operands C and D, as well as the carry-in bit.).
As per claim 39:
Claim 39 essentially recites the same limitations of claim 38. Therefore, claim 39 is rejected for the same reasons as claim 38.

Claims 21-23, 28-30, and 35-37 are rejected under 35 U.S.C. 103 as being unpatentable over Jackson (U.S. 2016/0011869) in view of Ford (U.S. 2005/0198473), Devaney et al. (U.S. 2004/0193838), Wentzlaff et al. (U.S. 8,631,205), and Official Notice.
As per claim 21:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 20, wherein each processing unit is configured to perform the second full-width ALU 
As per claim 22:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 21, wherein each processing unit has a carry line between the first ALU and the second ALU to provide the carry term to the second ALU (Jackson: Figure 2 element 204, paragraph 36).
As per claim 23:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 22, wherein the second ALU is configured to perform the second full-width ALU operation only upon receiving the carry term on the carry line (Jackson: Figures 1-2 elements 124 and 202, paragraph 36)(The upper ALU receives operands C and D, as well as the carry bit, to perform the upper 32-bit ALU operation. It would have been obvious to one of ordinary skill in the art that the upper 32-bit adder has to wait to receive the carry in bit prior to performing the calculation in order to correctly perform the calculation. This 
As per claim 28:
The additional limitation(s) of claim 28 basically recite the additional limitation(s) of claim 21. Therefore, claim 28 is rejected for the same reason(s) as claim 21.
As per claim 29:
The additional limitation(s) of claim 29 basically recite the additional limitation(s) of claim 22. Therefore, claim 29 is rejected for the same reason(s) as claim 22.
As per claim 30:
The additional limitation(s) of claim 30 basically recite the additional limitation(s) of claim 23. Therefore, claim 30 is rejected for the same reason(s) as claim 23.
As per claim 35:
The additional limitation(s) of claim 35 basically recite the additional limitation(s) of claim 21. Therefore, claim 35 is rejected for the same reason(s) as claim 21.
As per claim 36:
The additional limitation(s) of claim 36 basically recite the additional limitation(s) of claim 22. Therefore, claim 36 is rejected for the same reason(s) as claim 22.
As per claim 37:
The additional limitation(s) of claim 37 basically recite the additional limitation(s) of claim 23. Therefore, claim 37 is rejected for the same reason(s) as claim 23.

Claims 25-26 and 32-33 are rejected under 35 U.S.C. 103 as being unpatentable over Jackson (U.S. 2016/0011869) in view of Ford (U.S. 2005/0198473), Devaney et al. (U.S. 2004/0193838), and Wentzlaff et al. (U.S. 8,631,205), and further in view of Sih et al. (U.S. 6,606,700).
As per claim 25:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 20.
Jackson, Ford, Devaney, and Wentzlaff failed to teach wherein the first ALU and the second ALU of each processing unit are further configured to perform a fused operation comprising a second operation performed serially on the result of a first operation, during which: the first ALU is configured to receive data from the first pair of input ports, to perform the first operation, and to provide a result of the first operation to the second ALU; and the second ALU is configured to receive data from one input port of the second pair of input ports and to receive the result of the first operation from the first ALU, to perform the second operation, and to provide a result of the second operation to one of the two output ports.
However, Sih combined with Jackson, Ford, Devaney, and Wentzlaff disclosed wherein the first ALU and the second ALU of each processing unit are further configured to perform a fused operation comprising a second operation performed serially on the result of a first operation (Sih: Figure 1 elements 104 and 118, column 2 lines 3-26)(Jackson: Figures 1-2 elements 21-22, paragraphs 30 and 36)(Sih disclosed MAC operations that fuse a multiply and an add operation that are performed sequentially. Jackson disclosed add operations with carry bits being performed sequentially. The combination allows for the ALU of Jackson to perform MAC operations.), during which: 

the second ALU is configured to receive data from one input port of the second pair of input ports and to receive the result of the first operation from the first ALU, to perform the second operation, and to provide a result of the second operation to one of the two output ports (Sih: Figure 1 elements 104 and 118, column 2 lines 3-26)(Jackson: Figures 1-2 elements 21-22, paragraphs 30 and 36)(Sih disclosed MAC operations that fuse a multiply and an add operation that are performed sequentially. The combination allows for the ALU of Jackson to perform MAC operations. The adder receives a register file input and the multiplier output. The execution result is written back to the register file.).
The advantage of multiply-accumulation operations is that two operations can be completed as part of a single instruction, which improves processor performance. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the MAC operations of Sih into the processor of Jackson for the advantage of increased performance.
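For illustration only, the fused multiply-accumulate data flow described for claim 25 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not drawn from Sih or Jackson: the function name is hypothetical, and truncating the product to 32 bits before the addition is an assumption made for simplicity.

```python
MASK32 = 0xFFFFFFFF


def fused_multiply_add(a, b, c):
    """Second operation performed serially on the result of the first.

    a, b: operands from the first pair of input ports (first operation).
    c:    operand from one input port of the second pair (second operation).
    """
    product = (a * b) & MASK32      # first operation: multiply
    return (product + c) & MASK32   # second operation: add, using the product
```

Here the addition cannot begin until the product is available, which is the serial dependency the fused operation relies on.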
As per claim 26:
Jackson, Ford, Devaney, Wentzlaff, and Sih disclosed the image processor of 
As per claim 32:
The additional limitation(s) of claim 32 basically recite the additional limitation(s) of claim 25. Therefore, claim 32 is rejected for the same reason(s) as claim 25.
As per claim 33:
The additional limitation(s) of claim 33 basically recite the additional limitation(s) of claim 26. Therefore, claim 33 is rejected for the same reason(s) as claim 26.

Claim 40 is rejected under 35 U.S.C. 103 as being unpatentable over Jackson (U.S. 2016/0011869) in view of Ford (U.S. 2005/0198473), Devaney et al. (U.S. 2004/0193838), and Wentzlaff et al. (U.S. 8,631,205), and further in view of Ronen et al. (U.S. 2002/0087955).
As per claim 40:
Jackson, Ford, Devaney, and Wentzlaff disclosed the image processor of claim 20, wherein the opcode represents one or two ALU operations, and wherein if the opcode represents one ALU operation, the processing unit is configured to cause both the first ALU and the second ALU to perform the same operation (Devaney: Figures 6A-B, paragraphs 18 and 94)(Ford: Figures 3, 5-6, 8, and 10, paragraphs 97, 99, 101, 104, and 106)(Jackson: Figures 1-2 and 4 elements 110, 124, 202, 406, and 412, paragraphs 30, 36, and 41)(Devaney disclosed vector and scalar instruction opcodes. Jackson disclosed two 32-bit ALUs operating in a 64-bit mode that perform a single double-wide 
Jackson, Ford, Devaney, and Wentzlaff failed to teach wherein if the opcode represents two ALU operations, the processing unit is configured to cause the first ALU and the second ALU to perform different operations.
However, Ronen combined with Jackson, Ford, Devaney, and Wentzlaff disclosed wherein if the opcode represents two ALU operations, the processing unit is configured to cause the first ALU and the second ALU to perform different operations (Devaney: Figures 6A-B, paragraphs 18 and 94)(Ronen: Figures 2 and 6 elements 710a-b, paragraphs 18, 26-27, 29, 38, and 41-42)(Jackson: Figures 2 and 4 elements 202 and 422, paragraphs 36 and 41)(Devaney disclosed vector and scalar instruction opcodes. Ronen disclosed fused 32-bit instructions with an opcode that specifies two different operations. The combination allows for Jackson to execute the fused addition-subtraction 32-bit instructions in the second operation mode by detecting the corresponding opcode.).
The advantage of implementing fused instructions is that the code footprint of a program is reduced, which improves performance (Ronen: Paragraphs 5-6 and 8). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the fused instructions of Ronen into the processor of Jackson for the advantage of increased performance.
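For illustration only, an opcode that represents either one or two ALU operations can be sketched in Python as a decode table. This is a minimal sketch and is not taken from Ronen, Devaney, or Jackson: the opcode names ADD, SUB, and ADDSUB, the table, and the function name are hypothetical.

```python
import operator

MASK32 = 0xFFFFFFFF

# Hypothetical decode table: opcode -> (first ALU operation, second ALU operation).
DECODE = {
    "ADD": (operator.add, operator.add),     # one operation: both ALUs do the same
    "SUB": (operator.sub, operator.sub),     # one operation: both ALUs do the same
    "ADDSUB": (operator.add, operator.sub),  # two operations: ALUs differ
}


def execute_decoded(opcode, a, b, c, d):
    """Apply the decoded operation pair to the two pairs of input ports."""
    first_op, second_op = DECODE[opcode]
    return first_op(a, b) & MASK32, second_op(c, d) & MASK32
```

Under these assumptions, a single fused opcode such as ADDSUB replaces two separate instructions, which is the code-footprint reduction noted above.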

Response to Arguments
The arguments presented by Applicant in the response received on 11/23/2021 are considered partially persuasive.
Applicant argues for claims 20, 27, and 34:
“Applicant respectfully submits that the cited portion of Jackson does not disclose or suggest "instructions having an instruction format having an opcode that specifies whether the first ALU and the second ALU operate in sequence or in parallel," as recited by amended claim 20. Additionally, the cited portions of Jackson do not disclose or suggest an instruction format that causes the processing unit to "switch between (i) operating the first ALU and the second ALU in sequence to perform a double-width ALU operation and (ii) operating the first ALU and the second ALU in parallel to perform four half-width ALU operations," as recited by amended claim 20. Rather, Jackson merely recites an "arbitrating logic determining which thread has priority."”

This argument is found to be persuasive for the following reason. The examiner agrees that Jackson alone failed to teach the newly claimed limitations. However, a new ground of rejection has been given due to the amendment.
Applicant argues for claims 20, 27, and 34:
“However, these alleged officially noticed facts do not cure the deficiencies of Jackson. A mere opcode for an ALU does not disclose or suggest "instructions having an instruction format having an opcode that specifies whether the first ALU and the second ALU operate in sequence or in parallel and that, when executed, cause the processing unit to switch between (i) operating the first ALU and the second ALU in sequence to perform a double-width ALU operation and (ii) operating the first ALU and the second ALU in parallel to perform four half-width ALU operations," as recited by amended claim 20.”  

This argument is not found to be persuasive for the following reason. A new secondary reference has replaced the previous official notice to teach instruction opcodes. An instruction opcode detected by a decoder provides control signals to correctly execute the instruction by execution units. Thus, the opcode allows for 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183