DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

RESPONSE TO ARGUMENTS
Applicant's arguments filed 10/21/2021 have been fully considered but they are not persuasive.

In response to applicant’s arguments with regard to the independent claim 1 rejected under 35 U.S.C. 103(a) that the combination of the references does not teach/suggest the claimed feature “… replication circuitry configured to, responsive to a vector instruction and while retaining the data in the vector register, replicate a selected sub-vector value from the vector register and input multiple copies of the selected sub-vector value to vector operation circuitry …” because:
Reid does not teach the above claimed features;
Gschwind fails to disclose inputting the replicated value to vector operation circuitry; and
Eichenberger (2012) does not disclose or suggest that different elements of the same vector register are within different FMA units, and that the load and splat instruction inputs the replicated data to the FMA units, because Eichenberger (2012) describes that the load and splat instruction causes the replicated data to be provided to a vector register of the QRF 330 that is outside the processing execution unit 320 and that an element of the vector register is associated with a particular FMA unit (Fig. 16;
The examiner respectfully disagrees. To further clarify, in relation to applicant’s above discussion of the cited Eichenberger reference, by combining Gschwind’s operations of, in response to a vector instruction and while retaining the data in the vector register, replicating a selected sub-vector value from the vector register and operating with the selected sub-vector value accordingly (e.g. equating to a vector splat instruction that duplicates an element of a vector register into every element of another vector register) (col. 13, l. 48 to col. 14, l. 13) with Eichenberger’s replication circuitry (e.g. circuitry associated with the load and splat operation wherein an element (“a”) is replicated into multiple/different elements (multiple “a”) in a vector register (1640 in Fig. 16) that may be part of the QRF (300) in Fig. 3) and inputting of multiple copies of the value to vector operation circuitry (e.g. associated with inputting the multiple/different elements to the FMA units in the QPU (320) in Figure 3 for processing accordingly) (Fig. 3; Fig. 16; [0006]; [0073]; [0152]; [0159]-[0161]; and [0209]), the resulting combination of the references would teach/suggest the above claimed features.
As applicant appears to be applying the above arguments for independent claim 1 towards independent claims 10, 18 and 21, the examiner will also apply the above response for independent claim 1 towards independent claims 10, 18 and 21.

In response to applicant’s arguments with regard to the dependent claim 2 rejected under 35 U.S.C. 103(a) that the combination of the references does not teach/suggest the claimed feature “… input … to the vector operation circuitry independent of any vector register included within the vector operation circuitry …”  because Eichenberger (2012) does not disclose/suggest the above claimed features; applicant's arguments have fully been considered, but are not found to be persuasive.
The examiner respectfully disagrees, and to further clarify, Eichenberger (2012) does disclose/suggest input to the vector operation circuitry independent of any vector register included within the vector operation circuitry (e.g. associated with inputting via wiring interconnecting the QRF (300) and the QPU (320) wherein the interconnecting wiring for inputting is independent of any vector register) (Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]).

In response to applicant’s arguments with regard to the dependent claim 4 rejected under 35 U.S.C. 103(a) that the combination of the references does not teach/suggest the claimed feature “… the vector operation circuitry is configured to perform, responsive to the vector instruction [that caused the replication circuitry to replicate the selected sub-vector value and input multiple copies of the selected sub-vector value to the vector operation circuitry], a vector multiply-accumulate operation using replicated sub-vector values and using sub-vector values in the second vector register …” because Eichenberger (2012) is silent regarding the FMA unit performing the multiplication responsive to the load and splat instruction or another instruction; applicant's arguments have fully been considered, but are not found to be persuasive.
The examiner respectfully disagrees, and to further clarify, Eichenberger’s load and splat instruction replicates a value and inputs multiple copies of the value to the FMA units, wherein the input multiple copies of the value would then be multiplied accordingly; i.e., responsive to Eichenberger’s load and splat instruction, multiplication operations are performed (Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]).

I. REJECTIONS BASED ON PRIOR ART
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-10, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348).

As per claim 1, Reid teaches/suggests a processor comprising: a vector register configured to load data from a cache memory responsive to a special purpose load instruction (e.g. as the vector load instruction moves data operands from the cache/memory to the vector register) (col. 9, l. 53 to col. 10, l. 26).
Reid does not teach the processor comprising: replication circuitry configured to, responsive to a vector instruction and while retaining the data in the vector register, replicate a selected sub-vector value from the vector register and input multiple copies of the selected sub-vector value to vector operation circuitry.
Gschwind teaches/suggests a processor comprising: circuitry configured to, responsive to a vector instruction and while retaining the data in the vector register, replicate a selected sub-vector value from the vector register, and operate with the selected sub-vector value accordingly (e.g. equating to a vector splat instruction that duplicates an element of a vector register into every element of another vector register) (col. 13, l. 48 to col. 14, l. 13).
Eichenberger (2012) teaches/suggests a processor comprising: replication circuitry (e.g. circuitry associated with load and splat operation wherein an element (“a”) is replicated into multiple/different elements (multiple “a”) in vector register (1640 in Fig. 16) that may be part of QRF (300) in Fig. 3) and input multiple copies of the value to vector operation circuitry (e.g. associated with inputting the multiple/different elements to the FMA units in the QPU (320) in Figure 3 for processing accordingly) (Fig. 3; Fig. 16; [0006]; [0073]; [0152]; [0159]-[0161]; and [0209]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Gschwind’s instruction operations and Eichenberger’s vector operations into Reid’s processor for the benefit of implementing an effective and efficient compiler to easily port code written for Big Endian to a target system that is Little Endian, and vice versa (Gschwind, col. 14, ll. 59-64), and efficient manipulation of vectors by combining them with scalar instructions (Eichenberger (2012), [0043]), to obtain the invention as specified in claim 1.
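For illustration only, the following minimal sketch models the behavior recited in the disputed limitation of claim 1: one selected element of a source register is replicated, the copies are fed to a lane-wise multiply-add, and the source register keeps its data. It is a software analogy under stated assumptions and is not drawn from Reid, Gschwind, Eichenberger (2012), or the application; the names splat and fused_multiply_add and the four-lane width are hypothetical.

```python
def splat(register, index, lanes=None):
    """Return a new list holding copies of register[index].

    The source register is only read, never written, so its data is retained.
    """
    lanes = len(register) if lanes is None else lanes
    return [register[index]] * lanes


def fused_multiply_add(a, b, acc):
    """Lane-wise a * b + acc, standing in for a bank of FMA units."""
    return [x * y + z for x, y, z in zip(a, b, acc)]


# Hypothetical four-lane registers.
source = [1.0, 2.0, 3.0, 4.0]
second = [10.0, 20.0, 30.0, 40.0]
acc = [0.5, 0.5, 0.5, 0.5]

copies = splat(source, 2)                         # replicate element 2
result = fused_multiply_add(copies, second, acc)  # copies go straight to the FMA step
```

In this sketch the replicated copies are passed directly to the arithmetic step rather than being stored back into the source register, which is the distinction argued in the response above.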

As per claim 2, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor comprising wherein the circuitry includes a multiplexor having an input coupled to the vector register and an output coupled to the vector operation circuitry, the multiplexor configured to select any sub-vector value from the vector register and to input the multiple copies of the selected sub-vector value to the vector operation circuitry independent of any vector register included within the vector operation circuitry (e.g. associated with inputting via wiring interconnecting the QRF (300) and the QPU (320) wherein the interconnecting wiring for inputting is independent of any vector register) (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]), wherein it would have been obvious and/or well-known to one of ordinary skill in the art to use a multiplexer to splat data to multiple targets, and wherein Eichenberger (2012)’s Figure 7 also suggests the use of a multiplexer for data communication.

As per claim 4, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor further comprising a second vector register, wherein the vector instruction corresponds to a vector multiply-accumulate instruction, and wherein the vector operation circuitry is configured to perform, responsive to the vector instruction, a vector multiply-accumulate operation (e.g. via the FMA units of Eichenberger (2012)) using replicated sub-vector values and using sub-vector values in the second vector register (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]).

As per claim 5, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 4 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor further comprising a vector register file that includes the second vector register, and wherein the vector register is outside of the vector register file (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 2-3; Fig. 16; [0006]; [0069]-[0071]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]).

As per claim 6, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 4 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor comprising wherein the replication circuitry is further configured to replicate a second sub-vector value from the vector register in parallel with replicating the selected sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0197]; [0209]).

As per claim 7, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 6 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor comprising wherein the vector operation circuitry is configured to perform a second vector operation in parallel with performing the vector multiply-accumulate operation, the second vector operation using the second replicated sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0201]; [0209]).

As per claim 8, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 6 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor comprising wherein the replication circuitry is configured to apply an offset to a position in the vector register of the selected sub-vector value to select a position in the vector register of the second sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0201]; [0209]), which functionally equates to the proper selection of the first/selected sub-vector value and the second sub-vector value.

As per claim 9, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 8 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor comprising wherein the position of the selected sub-vector value is indicated by a loop parameter of a convolutional filter operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0201]; [0209]), which functionally equates to the proper indication/selection of the corresponding sub-vector value in the vector register for vector operation.
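As a hedged illustration of what "a position indicated by a loop parameter of a convolutional filter operation" can look like in software, the sketch below uses the loop index k to pick which filter tap is replicated before a lane-wise multiply-accumulate. It is an assumption-laden analogy, not the mechanism of any cited reference or of the application; the names splat and convolve_with_splat, the tap values, and the lane count are hypothetical.

```python
def splat(register, index, lanes):
    """Return `lanes` copies of register[index]; the register is left unchanged."""
    return [register[index]] * lanes


def convolve_with_splat(taps, signal, lanes):
    """Compute out[i] = sum over k of taps[k] * signal[i + k] for i in range(lanes)."""
    acc = [0] * lanes
    for k in range(len(taps)):            # the loop parameter k selects the tap position
        copies = splat(taps, k, lanes)    # replicate the selected tap across all lanes
        window = signal[k:k + lanes]      # input samples shifted by the same k
        acc = [a + c * s for a, c, s in zip(acc, copies, window)]
    return acc


# Hypothetical data: three filter taps held in one register, six input samples.
taps = [1, 2, 3]
signal = [1, 0, 2, 1, 3, 0]
out = convolve_with_splat(taps, signal, lanes=4)
```

Selecting a second tap at position k plus a fixed offset, as recited in claim 8, would amount to a second splat call with index k + offset inside the same loop iteration.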

As per claim 10, claim 10 is rejected in accordance with the same rationale and reasoning as the above rejection of claim 1, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the method comprising without altering the data in the vector register (e.g. equating to a vector splat instruction that duplicates an element of a vector register into every element of another vector register) (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]).

As per claim 13, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 10 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the method further comprising, responsive to the vector instruction: performing a vector operation using the replicated sub-vector values and sub-vector values in a second vector register; and storing results of the vector operation into a third vector register (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0197]; [0209]).

Claims 3, 12 and 18-23 are rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348) as applied to claims 1 and 10 above, and further in view of Eichenberger et al. (US Pub.: 2009/0307656) and Hui (US Patent 8,108,652).

As per claim 3, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) teach/suggest the processor comprising wherein: the cache memory comprises a higher-level cache (e.g. L2 cache 74 in Fig. 1 of Reid) and a separate lower-level cache (e.g. L1 cache 72 in Fig. 1 of Reid); and the special purpose load instruction is configured to cause loading of multiple values in parallel into the vector register (Reid, Fig. 1; col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]), but Reid, Gschwind and Eichenberger (2012) do not teach the processor comprising: loading scalar values from the higher-level cache without transferring the scalar values through the lower-level cache.
Eichenberger (2009) teaches/suggests a processor comprising: loading scalar values ([0034]-[0040]).
Hui teaches/suggests a processor comprising: loading from the higher-level cache without transferring through the lower-level cache (e.g. transferring data directly from L2 cache: col. 8, ll. 17-18) (col. 8, ll. 13-63; and col. 11, ll. 5-9).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Eichenberger (2009)’s scalar loading operations and Hui’s direct transferring into Reid, Gschwind and Eichenberger (2012)’s processor for the benefit of optimizing scalar code execution on a SIMD engine (Eichenberger (2009), Abstract) and utilizing the L1 cache more efficiently (Hui, col. 8, ll. 13-27) to obtain the invention as specified in claim 3.
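The limitation at issue in claim 3, loading values from the higher-level cache without transferring them through the lower-level cache, can be pictured with the toy model below. It is a simplified sketch under assumed behavior, not a description of Hui's or Reid's hardware; the class TwoLevelCache and its methods load and load_bypassing_l1 are hypothetical names.

```python
class TwoLevelCache:
    """Toy two-level cache: an ordinary load fills L2 and L1, a bypass load fills L2 only."""

    def __init__(self, memory):
        self.memory = dict(memory)  # backing store, address -> value
        self.l1 = {}                # lower-level cache contents
        self.l2 = {}                # higher-level cache contents

    def _fill_l2(self, addr):
        """Return the value at addr, allocating it in L2 on a miss."""
        if addr not in self.l2:
            self.l2[addr] = self.memory[addr]
        return self.l2[addr]

    def load(self, addr):
        """Ordinary load: the value passes through L2 and is allocated in L1."""
        if addr not in self.l1:
            self.l1[addr] = self._fill_l2(addr)
        return self.l1[addr]

    def load_bypassing_l1(self, addrs):
        """Load several values from L2 into a register without touching L1."""
        return [self._fill_l2(a) for a in addrs]


cache = TwoLevelCache({0: 5, 1: 6, 2: 7, 3: 8})
vector_register = cache.load_bypassing_l1([0, 1, 2, 3])
l1_after_bypass = dict(cache.l1)   # snapshot: the bypass load left L1 empty
scalar = cache.load(1)             # an ordinary load does allocate in L1
```

The point of the model is only that the bypass path leaves the lower-level cache untouched, which is one way the L1 cache could be used more efficiently.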

As per claim 12, claim 12 is rejected in accordance with the same rationale and reasoning as the above rejection of claim 3.

As per claim 18, claim 18 is rejected in accordance with the same rationale and reasoning as the above rejection of claims 1 and 3, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus comprising: means for performing a vector operation; means for storing data (e.g. associated with vector register); and the means for performing a vector operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 19, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 18 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus further comprising: second means for storing data, and wherein the means for performing a vector operation is configured to use the replicated sub-vector values and sub-vector values in the second means for storing data (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 20, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 18 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus comprising: wherein the means for replicating and inputting is further configured to replicate a second sub-vector value from the means for storing data in parallel with replicating the selected sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0197]; [0209]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 21, claim 21 is rejected in accordance with the same rationale and reasoning as the above rejection of claims 1, 3 and 18.

As per claim 22, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 21 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the non-transitory computer-readable medium, wherein the operations further comprise: performing a vector operation using the replicated sub-vector values and using sub-vector values in a second vector register; and storing results of the vector operation into a third vector register (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0197]; [0209]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 23, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 21 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the non-transitory computer-readable medium comprising wherein a position of the selected sub-vector value in the vector register is indicated by a loop parameter of a convolutional filter operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 3; Fig. 16; Fig. 27; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0195]-[0201]; [0209]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9), which functionally equates to the proper indication/selection of the corresponding sub-vector value in the vector register for vector operation.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348) as applied to claim 10 above, and further in view of Eichenberger et al. (US Pub.: 2009/0307656).
As per claim 11, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 10 above, where Reid, Gschwind and Eichenberger (2012) teach/suggest the method further comprising accessing a value from a register, the value indicating the selected sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 3; Fig. 7; Fig. 16; [0006]; [0073]; [0098]-[0099]; [0152]; [0159]-[0161]; [0209]), but Reid, Gschwind and Eichenberger (2012) do not teach the processor comprising: having a scalar value from a scalar register, the scalar value operating accordingly.
Eichenberger (2009) teaches/suggests a processor comprising: having a scalar value from a scalar register, the scalar value operating accordingly ([0034]-[0040]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Eichenberger (2009)’s scalar loading operations into Reid, Gschwind and Eichenberger (2012)’s processor for the benefit of optimizing scalar code execution on a SIMD engine (Eichenberger (2009), Abstract) to obtain the invention as specified in claim 11.

II. CLOSING COMMENTS
CONCLUSION
STATUS OF CLAIMS IN THE APPLICATION
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P.  707.07(i):
CLAIMS REJECTED IN THE APPLICATION
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
    
DIRECTION OF FUTURE CORRESPONDENCES
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHUN KUAN LEE whose telephone number is (571) 272-0671.  The examiner can normally be reached on Monday-Friday.
IMPORTANT NOTE

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHUN KUAN LEE/
Primary Examiner, Art Unit 2181
November 10, 2021