DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

RESPONSE TO ARGUMENTS
Applicant's arguments filed 1/11/2021 have been fully considered but they are not persuasive. Applicant’s arguments with respect to claims 3, 12, 18 and 21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

In response to applicant’s arguments with regard to independent claim 1, rejected under 35 U.S.C. 103(a), that a person of ordinary skill in the art would not be motivated to incorporate Gschwind with Reid or Eichenberger because neither Reid nor Eichenberger expresses any concern with Gschwind’s big endian, little endian, or bi-endian architectures: applicant's arguments have been fully considered, but are not found to be persuasive.
The examiner respectfully disagrees. To further clarify, Reid discloses a processing apparatus with a vector processing portion for processing vector instructions, and Gschwind and Eichenberger likewise disclose processing of vector instructions; therefore, a person of ordinary skill in the art would have combined the cited references to teach/suggest the processing apparatus with a vector processing portion for processing vector instructions.


In response to applicant’s arguments with regard to independent claim 1, rejected under 35 U.S.C. 103(a), that the combination of the references does not teach/suggest the claimed feature “… circuitry configured to … replicate a selected sub-vector value from the vector register …” because Gschwind teaches a source code instruction for a vector splat operation, which is not the same as circuitry configured to replicate a selected sub-vector value: applicant's arguments have been fully considered, but are not found to be persuasive.
The examiner respectfully disagrees. To further clarify, Gschwind does teach/suggest “circuitry”, as Gschwind would have included the corresponding hardware/circuit in order to process the software/code/instruction for implementing the vector splat operation.
As applicant appears to be applying the above arguments for independent claim 1 towards independent claims 10, 18 and 21, the examiner will also apply the above response for independent claim 1 towards independent claims 10, 18 and 21.

I. REJECTIONS BASED ON PRIOR ART
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-10, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348).

As per claim 1, Reid teaches/suggests a processor comprising: a vector register configured to load data from a cache memory responsive to a special purpose load instruction (e.g. as the vector load instruction moves data operands from the cache/memory to the vector register) (col. 9, l. 53 to col. 10, l. 26).
Reid does not teach the processor comprising: circuitry configured to, responsive to a vector instruction and while retaining the data in the vector register, replicate a selected sub-vector value from the vector register to inputs of vector operation circuitry.
Gschwind teaches/suggests a processor comprising: circuitry configured to, responsive to a vector instruction and while retaining the data in the vector register, replicate a selected sub-vector value from the vector register (e.g. equate to vector splat 
Eichenberger (2012) teaches/suggests a processor comprising: communicating to inputs of vector operation circuitry (e.g. communicating to the vector operation circuitry at ref. 1640 of Fig. 16) (Fig. 16; and [0159]-[0161]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Gschwind’s instruction operations and Eichenberger’s vector operations into Reid’s processor for the benefit of implementing an effective and efficient compiler to easily port code written for Big Endian to a target system that is Little Endian, and vice versa (Gschwind, col. 14, ll. 59-64), and for the benefit of efficient manipulation of vectors by combining them with scalar instructions while accomplishing more complex functions (Eichenberger, [0043]), to obtain the invention as specified in claim 1.

As per claim 2, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor wherein the circuitry includes a multiplexor having an input coupled to the vector register and an output coupled to the vector operation circuitry, the multiplexor configured to select any sub-vector value from the vector register and to output multiple copies of the selected sub-vector value to the vector operation circuitry (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; [0159]-[0161]), wherein it would have been obvious and/or well-known to one of ordinary skill in the art; Eichenberger (2012)’s Figure 7 also suggests the use of a multiplexer for data communication.

As per claim 4, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor further comprising a second vector register, wherein the vector instruction corresponds to a vector multiply-accumulate instruction, and wherein the vector operation circuitry is configured to perform, responsive to the vector instruction, a vector multiply-accumulate operation (e.g. multiplication addition operations in Fig. 16 of Eichenberger (2012)) using replicated sub-vector values and using sub-vector values in the second vector register (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; [0159]-[0161]).
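
For illustration only, the claimed multiply-accumulate behavior (one selected sub-vector value replicated across all lanes, multiplied lane by lane with a second register, and added to an accumulator) can be sketched in software as follows. This is a hedged model and is not taken from Reid, Gschwind, or Eichenberger (2012); the function name `splat_multiply_accumulate`, the separate accumulator argument, and the use of Python lists as registers are all assumptions.

```python
def splat_multiply_accumulate(accumulator, vector_register, index, second_register):
    """Model of a multiply-accumulate that uses a replicated sub-vector value.

    Hypothetical model: each register is a Python list of equal length.
    vector_register is only read, so its contents are retained.
    """
    if not (len(accumulator) == len(vector_register) == len(second_register)):
        raise ValueError("all registers must have the same number of lanes")
    # Replicate the selected sub-vector value to every lane input.
    replicated = [vector_register[index]] * len(second_register)
    # Lane-wise multiply with the second register, then accumulate.
    return [acc + r * b for acc, r, b in zip(accumulator, replicated, second_register)]


acc = [1, 1, 1, 1]
v1 = [2, 3, 4, 5]
v2 = [10, 20, 30, 40]
print(splat_multiply_accumulate(acc, v1, 1, v2))  # [31, 61, 91, 121]
```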

As per claim 5, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 4 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor further comprising a vector register file that includes the second vector register, and wherein the vector register is outside of the vector register file (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 2; Fig. 16; [0069]-[0071]; [0098]-[0099]; [0159]-[0161]).

As per claim 6, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 4 above, where Reid, Gschwind and Eichenberger (2012) Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0197]).

As per claim 7, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 6 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor wherein the vector operation circuitry is configured to perform a second vector operation in parallel with performing the vector multiply-accumulate operation, the second vector operation using the second replicated sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0201]).

As per claim 8, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 6 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor wherein the circuitry is configured to apply an offset to a position in the vector register of the selected sub-vector value to select a position in the vector register of the second sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0201]), which functionally equates to the proper selection of the first/selected sub-vector value and the second sub-vector value.

As per claim 9, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 8 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the processor wherein the position of the selected sub-vector value is indicated by a loop parameter of a convolutional filter operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0201]), which functionally equates to the proper indication/selection of the corresponding sub-vector value in the vector register for vector operation.
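
For illustration only, the claimed use of a loop parameter to indicate the position of the selected sub-vector value can be sketched as a simple filter loop. This is a hedged model and is not taken from the cited references: the function name `filter_with_splat`, the `lanes` parameter, and the list-based registers are assumptions, and the loop computes a sliding dot product without flipping the taps, which is only one plausible reading of a convolutional filter operation.

```python
def filter_with_splat(signal, taps, lanes):
    """Compute `lanes` filter outputs, selecting one tap per loop iteration.

    Hypothetical model: `taps` plays the role of the vector register holding
    filter coefficients, and the loop parameter k indicates which sub-vector
    value is replicated on each pass.
    """
    if len(signal) < lanes + len(taps) - 1:
        raise ValueError("signal too short for the requested number of lanes")
    accumulator = [0] * lanes
    for k in range(len(taps)):            # loop parameter selects the position
        replicated = [taps[k]] * lanes    # replicate the selected tap to every lane
        window = signal[k:k + lanes]      # input samples aligned with tap k
        accumulator = [a + r * x for a, r, x in zip(accumulator, replicated, window)]
    return accumulator


print(filter_with_splat([1, 2, 3, 4, 5, 6], [1, 0, -1], 4))  # [-2, -2, -2, -2]
```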

As per claim 10, claim 10 is rejected in accordance with the same rationale and reasoning as the above rejection of claim 1, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the method comprising without altering the data in the vector register (e.g. equates to a vector splat instruction that duplicates an element of a vector register into every element of another vector register) (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; [0159]-[0161]).
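
For illustration only, the vector splat behavior referred to above (duplicating one element of a vector register into every element of another vector register, without altering the source) can be modeled as follows. This is a hedged sketch and not Gschwind's instruction or implementation; the function name `vector_splat` and the list-based register model are assumptions.

```python
def vector_splat(source_register, index):
    """Return a new register with source_register[index] copied to every lane.

    Hypothetical model: a vector register is a Python list of lanes.
    The source register is only read, so its data is left unaltered.
    """
    if not 0 <= index < len(source_register):
        raise IndexError("lane index out of range")
    return [source_register[index]] * len(source_register)


v = [10, 20, 30, 40]
print(vector_splat(v, 2))  # [30, 30, 30, 30]
print(v)                   # [10, 20, 30, 40]
```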

As per claim 13, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 10 above, where Reid, Gschwind and Eichenberger (2012) further teach/suggest the method further comprising, responsive to the vector instruction: performing a vector operation using the replicated sub-vector values and sub-vector values in a second vector register; and storing results of the vector operation Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0197]).

As per claims 14-17, claims 14-17 are rejected in accordance with the same rationale and reasoning as the above rejection of claims 6-9.

Claims 3, 12 and 18-23 are rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348) as applied to claims 1 and 10 above, and further in view of Eichenberger et al. (US Pub.: 2009/0307656) and Hui (US Patent 8,108,652).

As per claim 3, Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 1 above, where Reid, Gschwind and Eichenberger (2012) teach/suggest the processor comprising wherein: the cache memory comprises a higher-level cache (e.g. L2 cache 74 in Fig. 1 of Reid) and a separate lower-level cache (e.g. L1 cache 72 in Fig. 1 of Reid); the special purpose load instruction is configured to cause loading of multiple values in parallel into the vector register the multiple values (Reid, Fig. 1; col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; and [0159]-[0161]), but Reid, Gschwind and Eichenberger (2012) do not teach the processor comprising: loading 
Eichenberger (2009) teaches/suggests a processor comprising: loading scalar values ([0034]-[0040]).
Hui teaches/suggests a processor comprising: from the higher-level cache without transferring through the lower-level cache (e.g. transferring data directly from L2 cache: col. 8, ll. 17-18) (col. 8, ll. 13-63; and col. 11, ll. 5-9).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Eichenberger (2009)’s scalar loading operations and Hui’s direct transferring into Reid, Gschwind and Eichenberger (2012)’s processor for the benefit of optimizing scalar code execution on a SIMD engine (Eichenberger (2009), Abstract) and utilizing the L1 cache more efficiently (Hui, col. 8, ll. 13-27) to obtain the invention as specified in claim 3.

As per claim 12, claim 12 is rejected in accordance with the same rationale and reasoning as the above rejection of claim 3.

As per claim 18, claim 18 is rejected in accordance with the same rationale and reasoning as the above rejection of claims 1 and 3, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus comprising: means for performing a vector operation; means for storing data (e.g. associated with vector register); and into multiple inputs of the means for performing a vector operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 19, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 18 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus further comprising: second means for storing data, and wherein the means for performing a vector operation is configured to use the replicated sub-vector values and sub-vector values in the second means for storing data (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; [0159]-[0161]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 20, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 18 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the apparatus comprising: wherein the means for replicating is further configured to replicate a second sub-vector value from the means for storing data in parallel with replicating the selected sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0197]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 21, claim 21 is rejected in accordance with the same rationale and reasoning as the above rejection of claims 1, 3 and 18.

As per claim 22, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 21 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the non-transitory computer-readable medium, wherein the operations further comprise: performing a vector operation using the replicated sub-vector values and using sub-vector values in a second vector register; and storing results of the vector operation into a third vector register (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0197]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9).

As per claim 23, Reid, Gschwind, Eichenberger (2012), and Hui teach/suggest all the claimed features of claim 21 above, where Reid, Gschwind, Eichenberger (2012), and Hui further teach/suggest the non-transitory computer-readable medium wherein a position of the selected sub-vector value in the vector register is indicated by a loop parameter of a convolutional filter operation (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; Eichenberger (2012), Fig. 16; Fig. 27; [0098]-[0099]; [0159]-[0161]; [0195]-[0201]; and Hui, col. 8, ll. 13-63; col. 11, ll. 5-9), which functionally equates to the proper indication/selection of the corresponding sub-vector value in the vector register for vector operation.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Reid et al. (US Patent 10,261,789) in view of Gschwind et al. (US Patent 9,588,746) and Eichenberger et al. (US Pub.: 2012/0011348) as applied to claim 10 above, and further in view of Eichenberger et al. (US Pub.: 2009/0307656).
Reid, Gschwind and Eichenberger (2012) teach/suggest all the claimed features of claim 10 above, where Reid, Gschwind and Eichenberger (2012) teach/suggest the method further comprising accessing a value from a register, the value indicating the selected sub-vector value (Reid, col. 9, l. 53 to col. 10, l. 26; Gschwind, col. 13, l. 48 to col. 14, l. 13; and Eichenberger (2012), Fig. 7; Fig. 16; [0098]-[0099]; and [0159]-[0161]), but Reid, Gschwind and Eichenberger (2012) do not teach the processor comprising: having a scalar value from a scalar register, the scalar value operating accordingly.
Eichenberger (2009) teaches/suggests a processor comprising: having a scalar value from a scalar register, the scalar value operating accordingly ([0034]-[0040]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Eichenberger (2009)’s scalar loading operations into Reid, Gschwind and Eichenberger (2012)’s processor for the benefit of optimizing scalar code execution on a SIMD engine (Eichenberger (2009), Abstract) to obtain the invention as specified in claim 11.
II. CLOSING COMMENTS
CONCLUSION
STATUS OF CLAIMS IN THE APPLICATION
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. § 707.07(i):
CLAIMS REJECTED IN THE APPLICATION
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
    
DIRECTION OF FUTURE CORRESPONDENCES
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHUN KUAN LEE whose telephone number is (571)272-0671. The examiner can normally be reached on Monday-Friday.
IMPORTANT NOTE
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached on (571) 270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHUN KUAN LEE/
Primary Examiner, Art Unit 2181
February 02, 2021