DETAILED ACTION

Status of Application
Claims 1-20 are pending in the present application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 8 is objected to because of the following informalities:  “instruciton” in line 5 is misspelled.  Appropriate correction is required.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
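A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.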

Claims 1, 7, 8, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lichtenau et al (hereinafter Lichtenau), U.S. Publication No. 2020/0264877 A1, in view of Anderson et al (hereinafter Anderson), U.S. Publication No. 2017/0115989 A1, in view of Shippy et al (hereinafter Shippy), U.S. Patent No. 6,226,722 B1, in view of Heishi et al (hereinafter Heishi), U.S. Publication No. 2004/015006 A1.
Referring to claim 1, Lichtenau discloses a method to reverse source data in a processor [paragraph 70, “Another instruction to load and reverse data is the Vector Load Elements Reversed (VLER) instruction”], comprising: 
a source register [fig. 4B, element 420; paragraph 70, “This instruction loads elements of data from memory (or other source location) to another location (e.g., a register or other location)”] including a plurality of lanes [paragraphs 72-76, element size is indicated by field 412 where element size can be a word (4 bytes). Hence each lane is a word (4 bytes) and thus the source register 420 includes four lanes (paragraph 75, 420 includes 16 bytes of data). The size of the data elements (word in this case) is equivalent to the lanes into which the source data is divided]; and
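a destination register [fig. 4B; paragraph 70, “This instruction loads elements of data from memory (or other source location) to another location (e.g., a register or other location)”] including a plurality of lanes corresponding to the plurality of lanes of the source register [paragraphs 72-76, element size is indicated by field 412 where element size can be a word (4 bytes). Hence each lane is a word (4 bytes) and thus the source register 420 includes 4 lanes and destination register 424 includes corresponding 4 lanes];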
executing the vector reverse instruction, wherein executing the vector reverse instruction includes storing the data elements in the destination register in reverse order [paragraph 74, “The order of the elements is reversed when loading into the vector register. For instance, element zero in storage becomes the rightmost element in the vector register, element one in storage becomes the second to last element, and so forth”; paragraph 75, Example resulting byte positions from executing the instruction, based on the element size, are shown in FIG. 4B. As depicted, operand 2 (420) includes data to be loaded from memory (e.g., 16 bytes of data). If M3 is equal to 2 (word), then the result is as shown at 424], wherein each of the data elements is stored in one of the plurality of lanes of the destination register [fig. 4B, storing word size data elements into destination register 424. For example, M3 is equal to 2 and register 420 includes four lanes (four left most bytes can be lane 1, next four left most bytes can be lane 2…, four right most bytes can be lane 4). The order of elements is reversed and the result is shown at 424 where the four right most bytes of 424 (lane 4) store the data elements of lane 1 of 420].
Lichtenau does not explicitly disclose selecting a data path for a vector reverse instruction based on a value of an instruction bit in the vector reverse instruction;
responsive to selecting the data path for the vector reverse instruction.
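However, Anderson discloses selecting a data path for a vector reverse instruction based on a value of an instruction bit in the vector reverse instruction [fig. 1, elements 115-117, paragraph 39, “One part of the dispatch task of instruction dispatch unit 112 is determining whether the instruction is to execute on a functional unit in scalar datapath side A 115 or vector datapath side B 116. An instruction bit within each instruction called the s bit determines which datapath the instruction controls”];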
responsive to selecting the data path for the vector reverse instruction [paragraphs 39-40, Instruction decode unit 113 decodes each instruction in a current execute packet. Decoding includes identification of the functional unit performing the instruction, identification of registers used to supply data for the corresponding data processing operation from among possible register files and identification of the register destination of the results of the corresponding data processing operation], in order to provide greater flexibility in arranging instructions [paragraph 7].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the method of Lichtenau to provide greater flexibility in arranging instructions. It is for this reason one of ordinary skill in the art would have been motivated to implement selecting a data path for a vector reverse instruction based on a value of an instruction bit in the vector reverse instruction; responsive to selecting the data path for the vector reverse instruction.
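For illustration only, and not as part of the grounds of rejection, the datapath selection quoted from Anderson (paragraph 39) can be sketched as a single-bit test on the instruction word. The bit position used below and the mapping of bit values to datapath sides are assumptions made for this sketch; this action does not recite them, and the function name is hypothetical.

```python
# Minimal sketch of selecting a datapath from one instruction bit (the "s bit").
# ASSUMPTIONS (not stated in this action): the s bit sits at bit position 1 of the
# instruction word, 0 selects the scalar side and 1 selects the vector side.
S_BIT_POSITION = 1


def select_datapath(instruction: int, s_bit_position: int = S_BIT_POSITION) -> str:
    """Return the datapath side chosen by the s bit of an instruction word."""
    s_bit = (instruction >> s_bit_position) & 1  # isolate the single selector bit
    if s_bit:
        return "vector datapath side B"
    return "scalar datapath side A"
```

Under these assumptions, an instruction word with the selector bit set is routed to the vector side and one with it cleared is routed to the scalar side; no other instruction field is consulted.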
The modified Lichtenau does not explicitly disclose retrieving the source data from a level two cache and providing the source data to the processor, wherein retrieving and providing the source data includes bypassing a level one cache;
providing a data element to each lane of the source register, each data element including a portion of the source data.
However, Shippy discloses retrieving the source data from a level two cache and providing the source data to the processor, wherein retrieving and providing the source data includes bypassing a level one cache [fig. 4, col. 6, line 63 – col. 7, line 15, “However, the present invention allows information found in a level two cache to be directly provided to the requesting processor”; “If the L2 cache hit was for data information, it would be provided to register 51 and then accessed by either FPU 5 or FXU 7, without being previously stored in L1 D-cache 14”; col. 7, lines 46-56, “Data requested by either FPU 5 or FXU 7 can also be directly provided from L2 cache 15, via reload bus 2. That is, when data is requested and a hit occurs in L2 cache 15 the data is placed into register 51 and then subsequently moved into the requesting processing unit (FPU 5 or FXU 7) and L1 D-cache 14 during the next cycle. Although register 51 is physically located in L1 cache chip 13, no L1 latency is associated with storing data from the L2 cache 15 therein, i.e. there is no address translation or the like. Thus, the data from L2 cache 15 and memory 17 bypasses L1 cache 14”];
providing a data element to each lane of the source register, each data element including a portion of the source data [fig. 4, col. 6, line 63 – col. 7, line 15, “If the L2 cache hit was for data information, it would be provided to register 51”; col. 7, lines 46-56, “the data is placed into register 51”; The examiner notes that element 420 of Lichtenau comprises a data element to each lane and Shippy teaches providing data to be placed into register 51/register 420 of Lichtenau], in order to provide increased speed and efficiency [col. 2, lines 51-63].

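One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the method of the modified Lichtenau to provide increased speed and efficiency. It is for this reason one of ordinary skill in the art would have been motivated to implement retrieving the source data from a level two cache and providing the source data to the processor, wherein retrieving and providing the source data includes bypassing a level one cache; providing a data element to each lane of the source register, each data element including a portion of the source data.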
The modified Lichtenau does not explicitly disclose responsive to a parallel (P) bit in the vector reverse instruction being in a first state, executing the vector reverse instruction in parallel with a subsequent instruction.
However, Heishi discloses responsive to a parallel (P) bit in the vector reverse instruction being in a first state, executing the vector reverse instruction in parallel with a subsequent instruction [paragraph 82; figs. 1A-1D; “The 0th bit of each instruction indicates parallel execution boundary information. When the parallel execution boundary information is ‘1’, there exists a boundary of parallel execution between the instruction and the subsequent instructions. When the parallel execution boundary information is ‘0’, there exists no boundary of parallel execution. How to use the parallel execution boundary information will be described later”; paragraphs 108-109; figs. 4-5C; parallel execution boundary information for instructions 1, 3, 4, 6, 7, 8 is ‘0’; the execution unit 70 executes in parallel the instructions up to the instruction whose parallel execution boundary information is ‘1’, hence an instruction including a 0th bit (parallel bit) in a first state of ‘0’ is executed in parallel with a subsequent instruction. Figures 5A-5C show instructions 1 and 2 being executed in parallel (2 is subsequent to 1), with instruction 1 having a 0th bit of ‘0’ which indicates it is to be executed in parallel with subsequent instruction 2. Instruction 2 has a 0th bit of ‘1’ which indicates a boundary and therefore no parallel execution with the subsequent instruction. Instructions 3, 4, and 5 are executed in parallel, with instruction 3 having a 0th bit of ‘0’ which indicates it is to be executed in parallel with the subsequent instruction 4. Instruction 4 has a 0th bit of ‘0’ which indicates it is to be executed in parallel with instruction 5. Instruction 5 has a 0th bit of ‘1’ which indicates a boundary and therefore no parallel execution with the subsequent instruction], in order to provide parallel processing capability with low power consumption [paragraph 8].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the method of the modified Lichtenau to provide parallel processing capability with low power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement responsive to a parallel (P) bit in the vector reverse instruction being in a first state, executing the vector reverse instruction in parallel with a subsequent instruction.
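For illustration only, and not as part of the grounds of rejection, the reading of Heishi's parallel execution boundary information applied above can be sketched as a grouping rule: instructions accumulate into one parallel group until an instruction whose 0th bit is ‘1’ closes the group. The function name and the list-of-bits input are hypothetical conveniences for this sketch, and the handling of a trailing group that never reaches a boundary is an assumption.

```python
# Minimal sketch of grouping instructions by a per-instruction boundary bit.
# A bit of 0 means "no boundary: execute in parallel with the next instruction";
# a bit of 1 means "boundary: this instruction ends the current parallel group".
from typing import List


def group_parallel(boundary_bits: List[int]) -> List[List[int]]:
    """Return groups of 1-based instruction numbers that execute in parallel."""
    groups: List[List[int]] = []
    current: List[int] = []
    for number, bit in enumerate(boundary_bits, start=1):
        current.append(number)
        if bit == 1:  # boundary reached: close the group
            groups.append(current)
            current = []
    if current:  # ASSUMPTION: a trailing group without a boundary is still issued
        groups.append(current)
    return groups


# Boundary bits for instructions 1 through 5 as characterized in this action.
print(group_parallel([0, 1, 0, 0, 1]))  # [[1, 2], [3, 4, 5]]
```

The printed grouping matches the characterization above: instructions 1 and 2 form one parallel group, and instructions 3, 4, and 5 form the next.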
Referring to claims 7 and 14, taking claim 7 as exemplary, the modified Lichtenau discloses the method of claim 1, wherein each data element remains in-order when creating the reversed source data [Lichtenau, paragraph 74, “The bytes within the elements themselves are not reversed, in this example”].
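For illustration only, and not as part of the grounds of rejection, the element reversal relied upon from Lichtenau (paragraphs 74-75) can be sketched as reversing the order of fixed-size elements while leaving the bytes inside each element in their original order. The function name and the use of a Python bytes object to stand in for a register are assumptions of this sketch.

```python
# Minimal sketch of element-wise reversal: the order of the elements (lanes) is
# reversed, but the bytes within each element keep their original order.


def reverse_elements(source: bytes, element_size: int) -> bytes:
    """Reverse the order of element_size-byte elements in source."""
    if element_size <= 0 or len(source) % element_size != 0:
        raise ValueError("source length must be a positive multiple of element_size")
    elements = [
        source[i:i + element_size] for i in range(0, len(source), element_size)
    ]
    return b"".join(reversed(elements))


# 16 bytes of source data split into four word-size (4-byte) elements.
source = bytes(range(16))
result = reverse_elements(source, 4)
print(list(result))  # [12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3]
```

In the printed result, the element that occupied the first four bytes of the source occupies the last four bytes of the destination, and its bytes 0, 1, 2, 3 remain in order.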
Referring to claim 8, Lichtenau discloses a data processor, comprising:

a destination register [fig. 4B; paragraph 70, “This instruction loads elements of data from memory (or other source location) to another location (e.g., a register or other location)”] including a plurality of lanes corresponding to the plurality of lanes of the source register [paragraphs 72-76, element size is indicated by field 412 where element size can be a word (4 bytes). Hence each lane is a word (4 bytes) and thus the source register 420 includes 4 lanes and destination register 424 includes corresponding 4 lanes];
wherein, responsive to execution of the vector reverse instruction, the data processor is configured to store the data elements in the destination register in reverse order [paragraph 74, “The order of the elements is reversed when loading into the vector register. For instance, element zero in storage becomes the rightmost element in the vector register, element one in storage becomes the second to last element, and so forth”; paragraph 75, Example resulting byte positions from executing the instruction, based on the element size, are shown in FIG. 4B. As depicted, operand 2 (420) includes data to be loaded from memory (e.g., 16 bytes of data). If M3 is equal to 2 (word), then the result is as shown at 424. For example, M3 is equal to 2 and register 420 includes four lanes (four left most bytes can be lane 1, next four left most bytes can be lane 2…, four right most bytes can be lane 4). The order of elements is reversed and the result is shown at 424 where the four right most bytes of 424 (lane 4) store the data elements of lane 1 of 420].
Lichtenau does not explicitly disclose an instruction dispatch unit to select a data path for a vector reverse instruction based on a value of an instruction bit; and
a streaming engine.
However, Anderson discloses an instruction dispatch unit to select a data path for a vector reverse instruction based on a value of an instruction bit [fig. 1, elements 115-117, paragraph 39, “One part of the dispatch task of instruction dispatch unit 112 is determining whether the instruction is to execute on a functional unit in scalar datapath side A 115 or vector datapath side B 116. An instruction bit within each instruction called the s bit determines which datapath the instruction controls”; paragraphs 39-40]; and
a streaming engine [fig. 1, element 125] in order to provide greater flexibility in arranging instructions [paragraph 7].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of Lichtenau to provide greater flexibility in arranging instructions. It is for this reason one of ordinary skill in the art would have been motivated to implement an instruction dispatch unit to select a data path for a vector reverse instruction based on a value of an instruction bit; and a streaming engine.
The modified Lichtenau does not explicitly disclose the streaming engine to retrieve source data from a level two cache and provide the source data to the data processor, wherein retrieving and providing the source data includes bypassing a level one cache.
However, Shippy discloses the streaming engine [fig. 4, element 13] to retrieve source data from a level two cache [fig. 4, element 15] and provide the source data to the data processor [fig. 4, element 5 or 7], wherein retrieving and providing the source data includes bypassing a level one cache [fig. 4, col. 6, line 63 – col. 7, line 15, “However, the present invention allows information found in a level two cache to be directly provided to the requesting processor”; “If the L2 cache hit was for data information, it would be provided to register 51 and then accessed by either FPU 5 or FXU 7, without being previously stored in L1 D-cache 14”; col. 7, lines 46-56, “Data requested by either FPU 5 or FXU 7 can also be directly provided from L2 cache 15, via reload bus 2. That is, when data is requested and a hit occurs in L2 cache 15 the data is placed into register 51 and then subsequently moved into the requesting processing unit (FPU 5 or FXU 7) and L1 D-cache 14 during the next cycle. Although register 51 is physically located in L1 cache chip 13, no L1 latency is associated with storing data from the L2 cache 15 therein, i.e. there is no address translation or the like. Thus, the data from L2 cache 15 and memory 17 bypasses L1 cache 14”], in order to provide increased speed and efficiency [col. 2, lines 51-63].

The modified Lichtenau does not explicitly disclose the vector reverse instruction having a parallel (P) bit that specifies that a subsequent instruction is to be executed in parallel with the vector reverse instruction.
However, Heishi discloses the vector reverse instruction having a parallel (P) bit that specifies that a subsequent instruction is to be executed in parallel with the vector reverse instruction [paragraph 82; figs. 1A-1D; “The 0th bit of each instruction indicates parallel execution boundary information. When the parallel execution boundary information is ‘1’, there exists a boundary of parallel execution between the instruction and the subsequent instructions. When the parallel execution boundary information is ‘0’, there exists no boundary of parallel execution. How to use the parallel execution boundary information will be described later”; paragraphs 108-109; figs. 4-5C; parallel execution boundary information for instructions 1, 3, 4, 6, 7, 8 is ‘0’; the execution unit 70 executes in parallel the instructions up to the instruction whose parallel execution boundary information is ‘1’, hence an instruction including a 0th bit (parallel bit) in a first state of ‘0’ is executed in parallel with a subsequent instruction. Figures 5A-5C show instructions 1 and 2 being executed in parallel (2 is subsequent to 1), with instruction 1 having a 0th bit of ‘0’ which indicates it is to be executed in parallel with subsequent instruction 2. Instruction 2 has a 0th bit of ‘1’ which indicates a boundary and therefore no parallel execution with the subsequent instruction. Instructions 3, 4, and 5 are executed in parallel, with instruction 3 having a 0th bit of ‘0’ which indicates it is to be executed in parallel with the subsequent instruction 4. Instruction 4 has a 0th bit of ‘0’ which indicates it is to be executed in parallel with instruction 5. Instruction 5 has a 0th bit of ‘1’ which indicates a boundary and therefore no parallel execution with the subsequent instruction], in order to provide parallel processing capability with low power consumption [paragraph 8].
One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the processor of the modified Lichtenau to provide parallel processing capability with low power consumption. It is for this reason one of ordinary skill in the art would have been motivated to implement the vector reverse instruction having a parallel (P) bit that specifies that a subsequent instruction is to be executed in parallel with the vector reverse instruction.
Referring to claims 15 and 16, taking claim 15 as exemplary, the modified Lichtenau discloses the method of claim 1, wherein the vector reverse instruction is a fixed-length instruction [Anderson, paragraphs 5, 72, 75, “a fixed word length instruction set architecture requires fewer hardware resources to decode”; “CPU 110 operates on an instruction pipeline. Instructions are fetched in instruction packets of fixed length”; “The preferred embodiment employs a fixed 32-bit instruction length”].
Referring to claims 17 and 18, taking claim 17 as exemplary, the modified Lichtenau discloses the method of claim 1, wherein the vector reverse instruction is a 32-bit instruction [Anderson, paragraph 75, “The preferred embodiment employs a fixed 32-bit instruction length”].
Referring to claims 19 and 20, taking claim 19 as exemplary, the modified Lichtenau discloses the method of claim 1, wherein the processor is a very long instruction word (VLIW) processor [Anderson, paragraph 7, “In a very long instruction word (VLIW) central processing unit instructions are grouped into execute packets that execute in parallel”].
Claims 2-6 and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over Lichtenau, in view of Anderson, in view of Shippy, in view of Heishi, as applied to claims 1 and 8 above, and further in view of Corbal et al (hereinafter Corbal), U.S. Publication No. 2016/0179522 A1.
Referring to claims 2 and 9, taking claim 2 as exemplary, the modified Lichtenau does not explicitly disclose the method of claim 1, wherein the source data comprises a 512-bit vector.
However, Corbal discloses wherein the source data comprises a 512-bit vector [paragraphs 150, 157, SRC1 comprises 512-bit vector register], in order to provide a vector friendly instruction format without being tied to any specific instruction set [paragraph 34].
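One of ordinary skill in the art before the effective filing date of the claimed invention would have clearly recognized that it is quite advantageous for the method of the modified Lichtenau to provide a vector friendly instruction format without being tied to any specific instruction set. It is for this reason one of ordinary skill in the art would have been motivated to implement wherein the source data comprises a 512-bit vector.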
Referring to claims 3 and 10, taking claim 3 as exemplary, the modified Lichtenau discloses the method of claim 2, wherein the plurality of lanes of the source register comprise 8-bit lanes [Lichtenau, paragraphs 35, 73, “For instance, bytes (or other data units) within a data element are reversed or data elements themselves are reversed”; “Other sizes are also possible including, but not limited to”; Corbal, fig. 16, IMM=8].
Referring to claims 4 and 11, taking claim 4 as exemplary, the modified Lichtenau discloses the method of claim 2, wherein the plurality of lanes of the source register comprise 16-bit lanes [Lichtenau, paragraphs 35, 72-73, Halfword].
Referring to claims 5 and 12, taking claim 5 as exemplary, the modified Lichtenau discloses the method of claim 2, wherein the plurality of lanes of the source register comprise 32-bit lanes [Lichtenau, paragraphs 35, 72-73, word].
Referring to claims 6 and 13, taking claim 6 as exemplary, the modified Lichtenau discloses the method of claim 2, wherein the plurality of lanes of the source register comprise 64-bit lanes [Lichtenau, paragraphs 35, 72-73, Doubleword].
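For illustration only, and not as part of the grounds of rejection, the relationship between the 512-bit vector of claim 2 and the lane widths of claims 3-6 reduces to simple division. The function name below is hypothetical, and the sketch assumes lanes tile the vector exactly with no padding.

```python
# Minimal sketch: number of equal-width lanes that fit in a fixed-width vector.
# ASSUMPTION: lanes tile the vector exactly, with no padding or partial lanes.
VECTOR_BITS = 512


def lane_count(lane_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Return how many lane_bits-wide lanes fit in a vector_bits-wide vector."""
    if lane_bits <= 0 or vector_bits % lane_bits != 0:
        raise ValueError("lane width must evenly divide the vector width")
    return vector_bits // lane_bits


for width in (8, 16, 32, 64):
    print(width, lane_count(width))  # 8 64, 16 32, 32 16, 64 8
```

A 512-bit vector therefore holds 64 lanes of 8 bits, 32 lanes of 16 bits, 16 lanes of 32 bits, or 8 lanes of 64 bits.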

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARLEY J ABAD whose telephone number is (571) 270-3425.  The examiner can normally be reached on M-Th 6:30 - 3:00 PM; Fri 7:30 - 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached on (571) 270-1023.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private 






/Farley Abad/Primary Examiner, Art Unit 2181