DETAILED ACTION
Status of Claims 
Claims 1-22 have been considered. It is hereby acknowledged that the following papers have been received and placed of record in the file:
Applicant Remarks 						-Receipt Date 02/10/2021
Amended Claims 						-Receipt Date 02/10/2021

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/10/2021 has been entered.
 
Response to Amendment
This office action is in response to the amendment filed on 02/10/2021. Claims 1-22 are pending. Claims 1-2, 4-5, 7-9, 11-12, 15-20, and 22 are amended. 

Response to Arguments
Applicant's arguments filed 02/10/2021 have been fully considered but they are not persuasive. 

	Applicant submits:
“Adrian merely describes decoding the vector permute instructions to generate the plurality of micro-operations (uops) that are executed to output the permutation results. However, Adrian nowhere teaches or suggests that the data elements for permutation are selected by a set of control signals that are generated by decoding the opcode, where the set of control signals are used to control an execution unit to select data units from the set of source operands using the control signals and permute the data units selected by the set of control signals to generate permutation results, in the manner as claimed in amended independent claim 1.” (Remarks, page 10)
	However, this argument is not persuasive. The current Office Action maps the control register and immediate value specified in the permute instruction as the claimed control signals, see [0153]. In decoding the permute instruction that indicates the control register and immediate values, the control signals are generated. These values are used to control the execution unit to select data elements from the source operand as described in Adrian at [0153]-[0154] and shown in Fig. 13 of Adrian as the permute control and immediate being provided to the vector permute logic. 
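	For illustration only, the kind of index-driven selection discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of Adrian's actual logic: the function name `permute_bytes`, the six-bit index mask, and the example values are hypothetical, and the comparison against the immediate value is not modeled.

```python
def permute_bytes(source: bytes, control: bytes, index_bits: int = 6) -> bytes:
    """Select one source byte per control byte (hypothetical model)."""
    if len(source) != 1 << index_bits:
        raise ValueError("source length must equal 2**index_bits")
    mask = (1 << index_bits) - 1
    # Assumption: the low index_bits of each control byte pick the source
    # byte that lands in the corresponding destination slot.
    return bytes(source[c & mask] for c in control)


source = bytes(range(64))        # 64 one-byte data elements (512 bits)
control = bytes([3, 4, 63, 0])   # indices chosen only for this example
result = permute_bytes(source, control)
```

In this sketch the control bytes alone determine which source bytes appear in the result, which is the sense in which the values are treated as control signals.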

	Applicant submits:
“even if the source registers of 512-bit of Adrian are combined with the shuffle permutation of Knowles, still the combination of Adrian and Knowles will, at best, teach performing the shuffle permutation of the data elements in the source registers to output the permutation results. However, the combination of Adrian and Knowles nowhere teaches or suggests dividing a data element into a set of upper bytes and a set of lower bytes, where the permutation results comprise selected data bytes from the set of upper bytes and the set of lower bytes.” (Remarks, page 12)

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 12 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Claim 12 recites “wherein said set of source operands comprises a single source operand divided into a set of upper bytes and a set of lower bytes”, however, this claim depends from claim 8 which recites “first registers configured to store a set of source operands of said instruction”. There is 
Claim 20 recites similar limitations to claim 12 and depends from claim 16 which recites similar limitations to claim 8 and is rejected for similar reasons. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-22 are rejected under 35 U.S.C. 103 as being unpatentable over San Adrian et al. US 2016/0188530 (hereinafter, Adrian) in view of Knowles et al. US 7,933,405 (hereinafter, Knowles) and Ando US 5,684,983.
Regarding claim 1, Adrian teaches:	
1. A method of executing instructions in a processor, said method comprising: 
fetching an instruction ([0165]: the method fetches a vector permute instruction) comprising an opcode and a set of source operands comprising one or more source operands ([0157]: the vector permute instruction comprises an opcode VPERMBI and a set of source operands), wherein said set of source operands comprises a plurality of data units ([0157]: the set of source operands includes source data elements/units); 
decoding said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops and a set of control signals, i.e. the control register and immediate included in the permute instruction, see also [0157]); 
sending said decoded opcode, said set of control signals, and said plurality of data units to an execution circuit ([0148]: the uops, control register and immediate, and data elements are sent to the execution logic, see also Fig. 13); 
said decoded opcode in combination with said set of control signals controlling said execution circuit to: 
use said control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said data units selected by said set of control signals ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and 
output permutation results to an output register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315)
	Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers in a single cycle. That is, Adrian does not explicitly teach:
output permutation results to output registers in a single execution cycle
	However, Knowles teaches:
output permutation results to output registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including shuffles as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23). 
	Further, Ando teaches:
output results to output registers in a single execution cycle (col 9 lines 6-26: two results are written to two registers in one cycle).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector permute logic of Adrian in view of Knowles to write its output registers in a single execution cycle as taught by Ando. One of ordinary skill in the art would have been motivated to make this modification because writing to destination registers in parallel is a known technique on the known device of a computer processor for outputting results of instructions and would yield the predictable result of reducing execution time with respect to the outputting being done in series. 

	Regarding claim 2, Adrian in view of Knowles and Ando teaches:
2. The method of Claim 1, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

	Regarding claim 3, Adrian in view of Knowles and Ando teaches:
3. The method of Claim 2, wherein said set of control signals further comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).

	Regarding claim 4, Adrian in view of Knowles and Ando teaches: 
4. The method of Claim 1, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles and Ando, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.

	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle include selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers to enable broader application of the permute instruction (Knowles col 6 lines 14-23).
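	For illustration only, a two-source shuffle of the general kind relied upon above can be sketched as follows. This is a minimal sketch, assuming a simple interleave pattern; the function name `shuffle_two_sources` and the interleave itself are hypothetical and are not asserted to be the specific pattern of Knowles Fig. 5B.

```python
def shuffle_two_sources(src_a, src_b):
    """Interleave two equal-length registers into two result registers."""
    if len(src_a) != len(src_b):
        raise ValueError("source registers must have the same length")
    # Assumed pattern: alternate elements from the two sources.
    interleaved = [x for pair in zip(src_a, src_b) for x in pair]
    half = len(src_a)
    # Each returned register mixes elements from both sources.
    return interleaved[:half], interleaved[half:]


first_result, second_result = shuffle_two_sources([0, 1, 2, 3], [4, 5, 6, 7])
```

In this sketch both output registers contain elements drawn from each of the two source registers.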

	Regarding claim 5, Adrian in view of Knowles and Ando teaches: 
5. The method of Claim 1, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers and the permutation results include bytes selected from the upper and lower bytes)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 
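	For illustration only, dividing a single operand into lower and upper byte sets and selecting from both can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `split_halves` and `select_from_halves`, the (use_upper, index) encoding of each selection, and the convention that position 0 is the lowest byte are all hypothetical and are not drawn from Adrian or Knowles.

```python
def split_halves(operand: bytes):
    """Return (lower, upper) byte sets of a single operand."""
    if len(operand) % 2:
        raise ValueError("operand must have an even number of bytes")
    mid = len(operand) // 2
    return operand[:mid], operand[mid:]


def select_from_halves(operand: bytes, picks) -> bytes:
    """Build a result from (use_upper, index) selections (assumed encoding)."""
    lower, upper = split_halves(operand)
    return bytes((upper if use_upper else lower)[i] for use_upper, i in picks)


operand = bytes(range(8))  # bytes 0-3 form the lower set, 4-7 the upper set
picks = [(False, 0), (True, 0), (False, 1), (True, 1)]
result = select_from_halves(operand, picks)
```

In this sketch the result contains bytes taken from both the upper set and the lower set of the one operand.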

	Regarding claim 6, Adrian in view of Knowles and Ando teaches: 
6. The method of Claim 1, wherein said instruction is one of a load instruction, a store instruction, a floating point instruction, a single-instruction-multiple-data (SIMD) vector instruction, and an SIMD permute instruction (Adrian [0165]: the instruction is a vector/SIMD permute instruction).

	Regarding claim 7, Adrian in view of Knowles and Ando teaches:
7. The method of Claim 1, wherein said set of control signals comprises a set of control words generated based on at least one of: said opcode; a size of vector elements; a length of each of said set of source operands; and a length of each of said permutation results (Adrian [0154]: the control bytes are a set of control words generated based on a length of the source operands since the index bits of the control bytes are used to index the source operands).


	Regarding claim 8, Adrian teaches:
8. A processor configured to execute instructions and fetch an instruction comprising an opcode ([0026] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode), said processor comprising: 
first registers configured to store a set of source operands of said instruction, wherein said set of source operands comprises one or more source operands and comprises a plurality of data units ([0156] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode and source vector registers, i.e. first registers to store one or more source operands, the source vector registers stores/comprises data elements/units); 
a second register ([0153]: multiple permuted data elements are output to destination register 1315); 
a decoder configured to decode said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops and a set of control signals, i.e. the control register and immediate included in the permute instruction, see also [0157]); and 
an execution circuit configured to, in response to said decoded opcode and under control of said set of control signals: 
use said control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said data units selected by said set of control signals to generate permutation results ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and 
output said permutation results to said second register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315).
Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers in a single cycle. That is, Adrian does not explicitly teach:
output permutation results to second registers in a single execution cycle
	However, Knowles teaches:
output permutation results to second registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including shuffles as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23).
Further, Ando teaches:
output results to output registers in a single execution cycle (col 9 lines 6-26: two results are written to two registers in one cycle).
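It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector permute logic of Adrian in view of Knowles to write its output registers in a single execution cycle as taught by Ando. One of ordinary skill in the art would have been motivated to make this modification because writing to destination registers in parallel is a known technique on the known device of a computer processor for outputting results of instructions and would yield the predictable result of reducing execution time with respect to the outputting being done in series. 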

	Regarding claim 9, Adrian in view of Knowles and Ando teaches: 
9. The processor of Claim 8, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

	Regarding claim 10, Adrian in view of Knowles and Ando teaches:
10. The processor of Claim 9, wherein said set of control signals comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).

	Regarding claim 11, Adrian in view of Knowles and Ando teaches:
11. The processor of Claim 8, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles and Ando, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.
	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle include selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers to enable broader application of the permute instruction (Knowles col 6 lines 14-23).



	Regarding claim 12, Adrian in view of Knowles and Ando teaches:
12. The processor of Claim 8, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers and the permutation results include bytes selected from the upper and lower bytes)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 

	Regarding claim 13, Adrian in view of Knowles and Ando teaches:
13. The processor of Claim 8, wherein said instruction is one of a load instruction, a store instruction, a floating point instruction, a single-instruction-multiple-data (SIMD) vector instruction, and an SIMD permute instruction (Adrian [0165]: the instruction is a vector/SIMD permute instruction).

	Regarding claim 14, Adrian in view of Knowles and Ando teaches: 
14. The processor of Claim 8, 
	Adrian in view of Knowles and Ando, as currently mapped, does not teach:
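wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric.
	However, Knowles further teaches:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric (Knowles col 11 lines 38-57: column multiplexer stage includes a plurality of multiplexers and operand crossbar switch stage is a permutation switching fabric).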
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to include the multiplexers and crossbar switch taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification because multiplexers and crossbars are known techniques on the known device of a computer processor for selecting data and would yield the predictable result of efficiently enabling the selection of data. 

	Regarding claim 15, Adrian in view of Knowles and Ando teaches: 
15. The processor of Claim 8, wherein said set of control signals comprises a set of control words generated based on at least one of: said opcode; a size of vector elements; a length of each of said set of source operands; and a length of each of said multiple permutation results (Adrian [0154]: the control bytes are a set of control words generated based on a length of the source operands since the index bits of the control bytes are used to index the source operands).

	Regarding claim 16, Adrian teaches:
16. A system comprising: 
a memory (Fig. 4B memory unit 470); and
a processor coupled to said memory (Fig. 4B processor 490 is coupled to memory unit 470) and configured to fetch an instruction comprising an opcode from said memory ([0026] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode), said processor comprising: 
first registers configured to store a set of source operands of said instruction, wherein said set of source operands comprises one or more source operands, said one or more source operands comprising a plurality of data units ([0156] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode and source vector registers, i.e. first registers to store one or more source operands, the source vector registers stores/comprises data elements/units);
a second register ([0153]: multiple permuted data elements are output to destination register 1315);
a decoder configured to decode said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops and a set of control signals, i.e. the control register and immediate included in the permute instruction, see also [0157]); and 
an execution circuit comprising a pair merge unit (Fig. 13, 1300) and configured to, in response to said decoded opcode and under control of said set of control signals: 
use said control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said data units selected by said set of control signals to generate permutation results ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and
output said permutation results to said second register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315), wherein each of said permutation results comprises a combination of said selected data units from said plurality of data units of said set of source operands ([0153]-[0154]: the permutation results include selected data elements from the source register).
	Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers in a single cycle. That is, Adrian does not explicitly teach: 
output permutation results to second registers in a single execution cycle
	However, Knowles teaches:
output permutation results to second registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
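	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including shuffles as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23).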
Further, Ando teaches:
output results to output registers in a single execution cycle (col 9 lines 6-26: two results are written to two registers in one cycle).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the vector permute logic of Adrian in view of Knowles to write its output registers in a single execution cycle as taught by Ando. One of ordinary skill in the art would have been motivated to make this modification because writing to destination registers in parallel is a known technique on the known device of a computer processor for outputting results of instructions and would yield the predictable result of reducing execution time with respect to the outputting being done in series. 

	Regarding claim 17, Adrian in view of Knowles and Ando teaches:
17. The system of Claim 16, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

Regarding claim 18, Adrian in view of Knowles and Ando teaches:
18. The system of Claim 17, wherein said set of control signals comprises a set of control words generated based on said opcode (Adrian [0148] and [0157]: the control register specified in the instruction is generated based on decoding of the instruction), a size of vector elements, a length of each of said set of source operands (Adrian [0153]-[0154]: the number of bits in the control byte to index the source operands is generated based on a size of vector elements and a number/length of the set of source operands, i.e. the index is 6 bits to select from 64 elements based on the size of the vector elements being 8 bits and the length of the source operand being 512 bits), and a length of each of said multiple permutation results (Adrian [0154]: the number of control bytes is generated based on the number of permutation results), and wherein said set of control signals comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).
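	For illustration only, the index-width arithmetic referenced in the mapping above (a 512-bit source operand of 8-bit elements yields 64 elements and therefore a 6-bit index) can be checked with the following minimal sketch. The function name `index_bits` is hypothetical, and the sketch assumes the element count is a power of two.

```python
def index_bits(operand_bits: int, element_bits: int) -> int:
    """Number of index bits needed to address every element of an operand."""
    if operand_bits <= 0 or element_bits <= 0 or operand_bits % element_bits:
        raise ValueError("operand length must be a positive multiple of element size")
    count = operand_bits // element_bits
    if count & (count - 1):
        raise ValueError("element count must be a power of two in this sketch")
    # For a power-of-two count, log2(count) equals bit_length() - 1.
    return count.bit_length() - 1


bits_for_bytes = index_bits(512, 8)    # 64 byte elements
bits_for_qwords = index_bits(512, 64)  # 8 elements of 64 bits each
```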

	Regarding claim 19, Adrian in view of Knowles and Ando teaches:
19. The system of Claim 16, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles and Ando, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.
	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle includes selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers and thereby enable broader application of the permute instruction (Knowles col 6 lines 14-23).
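
For illustration, a permutation result that draws bytes from both of two source operands can be sketched as follows. This is a simplified model and not the instruction encoding of Knowles or Adrian; the function permute_two_sources, the flat index scheme (indices 0-3 select from the first source, 4-7 from the second), and the sample byte values are all assumptions made for the example.

```python
from typing import Sequence


def permute_two_sources(src_a: bytes, src_b: bytes, control: Sequence[int]) -> bytes:
    """Select bytes from the concatenation of two source operands.

    Hypothetical model: each control index addresses src_a first,
    then src_b, as one flat array of data units.
    """
    combined = src_a + src_b
    for index in control:
        if not 0 <= index < len(combined):
            raise IndexError(f"control index {index} is out of range")
    return bytes(combined[index] for index in control)


# Two source operands of four data units each (sample values)
a = bytes([0xA0, 0xA1, 0xA2, 0xA3])
b = bytes([0xB0, 0xB1, 0xB2, 0xB3])

# Interleave the low halves of both sources
result = permute_two_sources(a, b, [0, 4, 1, 5])
print(result.hex())  # a0b0a1b1
```

In this model the result contains data bytes selected from both source operands, which is the property recited in the claim limitation.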

	Regarding claim 20, Adrian in view of Knowles and Ando teaches:
20. The system of Claim 16, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (Knowles col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers, and the permutation results include bytes selected from the upper and lower bytes).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 

Regarding claim 21, Adrian in view of Knowles and Ando teaches: 
21. The system of Claim 16, 
	Adrian in view of Knowles and Ando, as currently mapped, does not teach:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric.
	However, Knowles further teaches:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric (Knowles col 11 lines 38-57: the column multiplexer stage includes a plurality of multiplexers and the operand crossbar switch stage is a permutation switching fabric).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to include the multiplexers and crossbar switch taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification because multiplexers and crossbars are known techniques on the known device of a computer processor for 

	Regarding claim 22, Adrian in view of Knowles and Ando teaches:
22. The system of Claim 16, wherein said set of control signals comprises a set of control words generated based on at least one of: said opcode; a size of vector elements; a length of each of said set of source operands; and a length of each of said multiple permutation results (Adrian [0154]: the control bytes are a set of control words generated based on a length of the source operands since the index bits of the control bytes are used to index the source operands).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI, whose telephone number is (571) 270-1476. The examiner can normally be reached Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/KASIM ALLI/
Examiner, Art Unit 2183