DETAILED ACTION
Status of Claims 
Claims 1-22 have been considered. It is hereby acknowledged that the following papers have been received and placed of record in the file:
Applicant Remarks 						-Receipt Date 06/22/2022
Amended Claims 						-Receipt Date 06/22/2022

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/22/2022 has been entered.
 
Response to Amendment
This office action is in response to the amendment filed on 06/22/2022. Claims 1-22 are pending. Claims 1, 7-8, 15-16, 18, and 22 are amended.

Response to Arguments
Applicant's arguments filed 06/22/2022 have been fully considered but they are not persuasive. 
Applicant submits:
“[…] However, Adrian does not teach or suggest that the control bytes (presumably equated to the claimed "control words") are based on a length of each of the operands (i.e., the data elements). 
In fact, Adrian merely describes that each shorter vector length is half the vector length of the preceding vector length. However, Adrian is silent on the aspect of any relation between the control bytes and the length of each of the operands (i.e., the data elements). 
Further, the Advisory Action (in the Continuation Sheet (PTOL-303)) states that Adrian allegedly discloses "the length of the data used by the vector permute instruction is based on the length specified by the vector length field 159B in the opcode of the instruction." In response, Applicant respectfully submits that Adrian nowhere through its disclosure teaches or suggests that the vector permute instruction is based on a length of each of the operands.” (Remarks, page 10)
	However, the argument that “Adrian is silent on any relation between the control bytes and the length of each of the operands” is not persuasive because Applicant does not appear to consider that Adrian teaches that the vector permute instruction of Fig. 14 has the instruction format shown in Fig. 1B, which includes the vector length field 159B in the opcode, as evidenced by [0026], which discloses that each instruction of the ISA may be expressed by an instruction template such as the template shown in Fig. 1B, see also [0029]. Since field 159B of the opcode indicates the length of the registers used by the instruction, i.e. whether the instruction uses zmm/ymm/xmm having respective lengths, see [0096]-[0097], Adrian teaches a relation between a length (field 159B) and the registers used by the instruction: the registers used by the instruction are of the length specified by field 159B. Since the claimed “control signals” are mapped to the signals indicating the control register (as well as the immediate data) used by the vector permute instruction, which has a length defined by field 159B of the vector permute instruction, the control signals are based on a length; and since 159B defines the length of the registers used by the vector permute instruction, including its source registers zmm1 and zmm31, the length specified by 159B is a length of each source operand.
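	For illustration only, the relation described above between a vector length field and the control signals produced at decode can be sketched as follows. This is a minimal sketch and not Adrian's implementation: the numeric encoding of the length field, the function name decode_control, and the dictionary layout are assumptions made for this example. Only the register sizes (64-byte zmm, 32-byte ymm, 16-byte xmm) are taken from the cited paragraphs.

```python
# Register sizes as cited from Adrian [0096]-[0097].
REGISTER_BYTES = {"xmm": 16, "ymm": 32, "zmm": 64}

# ASSUMED encoding of the vector length field; the actual bit values of
# field 159B are not stated in this Office action.
VECTOR_LENGTH_FIELD = {0: "xmm", 1: "ymm", 2: "zmm"}


def decode_control(vector_length_field, control_reg_index, immediate):
    """Hypothetical decoder step: derive control signals from the opcode.

    The register class, and therefore the operand length, follows
    directly from the vector length field.
    """
    reg_class = VECTOR_LENGTH_FIELD[vector_length_field]
    return {
        "control_register": f"{reg_class}{control_reg_index}",
        "operand_length_bytes": REGISTER_BYTES[reg_class],
        "immediate": immediate,
    }


signals = decode_control(2, 31, 0x3)
narrow = decode_control(0, 31, 0x3)
```

	In this sketch the same control register index yields a 64-byte or a 16-byte operand depending only on the length field, which is the sense in which the control signals are "based on a length."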

	Applicant submits:
“Furthermore, Applicant respectfully submits that one of ordinary skill in the art cannot arrive at the above-mentioned feature of amended independent claim 1 because of the technical advancement provided by the invention utilizing at least the above- mentioned feature. The Applicant's as-filed specification, in paragraph [025], recites: "generate ... a set of control words used to enable provision of multiple results in one execution cycle for supply to the execution circuit ... The controls words control the pair merge circuit 442 that rearranges the data elements from the two inputs into the desired arrangement for the two results" (emphasis added). Thus, Applicant respectfully submits that the control words being based on the length of each of the source operands enable multiple results in one execution cycle, which is not taught or suggested by Adrian.”
	However, this argument is not persuasive because the Office Action does not rely on one of ordinary skill in the art “arriving” at the above-mentioned feature of the control signals being based on a length, since the vector length field 159B is explicitly taught in Adrian. Further, the claims do not require providing multiple results in one execution cycle.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “permute said data units” at line 16. It is unclear whether this refers to the “plurality of data units” introduced at line 5 or the “data units” introduced at line 14. For purposes of examination, this limitation will be interpreted as referring to the data units introduced at line 14.
Claims 8 and 16 recite similar limitations and are rejected for similar reasons.
The dependent claims are rejected based on their dependence from rejected base claims. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-22 are rejected under 35 U.S.C. 103 as being unpatentable over San Adrian et al. US 2016/0188530 (hereinafter, Adrian) in view of Knowles et al. US 7,933,405 (hereinafter, Knowles).
Regarding claim 1, Adrian teaches:	
1. A method of executing instructions in a processor, said method comprising: 
fetching an instruction ([0165]: the method fetches a vector permute instruction) comprising an opcode and a set of source operands comprising one or more source operands ([0157]-[0158]: the vector permute instruction comprises an opcode VPERMBI and a set of source operands zmm1 and zmm31), wherein said set of source operands comprises a plurality of data units ([0157]: the set of source operands includes source data elements/units in zmm1); 
decoding said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops; Fig 1B: the vector permute instruction of the class B template shown in Fig. 1B has vector length field 159B and data element width field 164; [0096]-[0097]: field 159B in the opcode indicates the length of the vector registers used by the instruction such that the instruction will reference 64 byte zmm registers, 32 byte ymm registers, or 16 byte xmm registers based on the vector length field 159B; [0041]: field 164 in the opcode indicates the width of the data elements used by the instruction; when the decoder decodes the vector permute instruction, it will interpret the bits in the opcode which will indicate how to interpret the remaining bits in the instruction, this will include generating control signals that indicate the control register, i.e. as a zmm/ymm/xmm register according to 159B, included in the remaining bits in the vector permute instruction, generating control signals that indicate the width of the data elements in the registers included in the remaining bits in the vector permute instruction, and generating control signals indicating the immediate included in the remaining bits in the vector permute instruction; further, the signals indicating the contents of the control register are also part of the claimed control signals since the control register name must first be generated from the decoding of the opcode, in this way the decoding of the opcode is done in order “to generate” signals indicating the contents of the control register), wherein said set of control signals is based on a length of each of said set of source operands ([0096]-[0097]: the control signals indicating the vector registers, including the control register zmm31 used by the vector permute instruction, are based on vector length field 159B, which indicates whether to use zmm/ymm/xmm registers and which is therefore also an indication of a length since zmm/ymm/xmm have different lengths, and the length indicated by 159B is also a length of the source operand vector zmm1 and control register zmm31 of the vector permute instruction since 159B indicates zmm/ymm/xmm registers for all the registers in the instruction, see also [0028]-[0029] describing that an embodiment of the invention may use only vector operations with a vector friendly instruction format which may be the class B instruction format shown in Fig. 1B having the vector field 159B and in this embodiment the vector permute instruction will have the vector field 159B);
sending said decoded opcode, said set of control signals, and said plurality of data units to an execution circuit ([0148] and Fig. 13: the uops, control register and immediate, and data elements in zmm1 are sent to the vector permute logic 1300 which is an execution circuit); 
said decoded opcode in combination with said set of control signals controlling said execution circuit to: 
use said set of control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said data units to produce permutation results ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and 
output said permutation results to an output register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315)
	Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers. That is, Adrian does not explicitly teach:
output permutation results to output registers 
	However, Knowles teaches:
output permutation results to output registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including a shuffle as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23). 
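	For illustration only, the difference between a permute that writes a single output register and a shuffle that writes two output registers can be sketched as follows. This is a simplified sketch under stated assumptions: the function names permute and shuffle_pair are hypothetical, the control is modeled as a plain list of source indices rather than the control register and immediate match described in Adrian, and the interleave pattern is chosen for the example and is not asserted to be the specific shuffle of Knowles.

```python
def permute(source, control):
    """Single-output permute: output lane k receives source[control[k]]."""
    return [source[i] for i in control]


def shuffle_pair(a, b):
    """Two-output shuffle (ASSUMED interleave pattern).

    Interleaves two equal-length inputs and splits the interleaved
    sequence across two output registers of the same length.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of data units")
    merged = [unit for pair in zip(a, b) for unit in pair]
    half = len(a)
    return merged[:half], merged[half:]


single = permute(["B0", "B1", "B2", "B3"], [3, 2, 1, 0])
out_lo, out_hi = shuffle_pair(["A0", "A1", "A2", "A3"],
                              ["B0", "B1", "B2", "B3"])
```

	Here single holds the reversed source in one register, while out_lo and out_hi together hold the shuffled data units in two registers, each containing units from both inputs.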

	Regarding claim 2, Adrian in view of Knowles teaches:
2. The method of Claim 1, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

	Regarding claim 3, Adrian in view of Knowles teaches:
3. The method of Claim 2, wherein said set of control signals further comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).

	Regarding claim 4, Adrian in view of Knowles teaches: 
4. The method of Claim 1, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.
	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle include selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers to enable broader application of the permute instruction (Knowles col 6 lines 14-23).

	Regarding claim 5, Adrian in view of Knowles teaches: 
5. The method of Claim 1, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers and the permutation results include bytes selected from the upper and lower bytes)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 
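	For illustration only, dividing a single source vector into lower and upper halves and shuffling the halves can be sketched as follows. This is a minimal sketch, not the mechanism of either reference: the function name split_and_shuffle and the interleave pattern are assumptions, and an 8-unit vector stands in for the 512-bit vector discussed above.

```python
def split_and_shuffle(vector):
    """Split one source vector into halves and interleave them.

    ASSUMED pattern: lower and upper halves are interleaved, then the
    interleaved sequence is split into two results, so each result
    contains units from both the lower and the upper half.
    """
    if len(vector) % 2 != 0:
        raise ValueError("vector must hold an even number of data units")
    half = len(vector) // 2
    lower, upper = vector[:half], vector[half:]
    merged = [unit for pair in zip(lower, upper) for unit in pair]
    return merged[:half], merged[half:]


first, second = split_and_shuffle([0, 1, 2, 3, 4, 5, 6, 7])
```

	With units 0-3 as the lower half and 4-7 as the upper half, both first and second contain units drawn from each half.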

	Regarding claim 6, Adrian in view of Knowles teaches: 
6. The method of Claim 1, wherein said instruction is one of a load instruction, a store instruction, a floating point instruction, a single-instruction-multiple-data (SIMD) vector instruction, and an SIMD permute instruction (Adrian [0165]: the instruction is a vector/SIMD permute instruction).

	Regarding claim 7, Adrian in view of Knowles teaches:
7. The method of Claim 1, wherein said set of control signals is further based on at least one of: a size of vector elements and a length of each of said permutation results (Adrian [0041] and [0148]: field 164 of the opcode is a size of vector elements used by the instruction; when the opcode of the vector permute instruction is decoded, control signals are generated based on field 164 to indicate the width/size of the data/vector elements in the registers included in the remaining bits in the vector permute instruction; the width indicated by 164 is also a length of each of the permutation results in destination register 1315 shown in Fig. 13 since 164 indicates the width of data elements used by the vector permute instruction and 1315 includes data elements that are permuted by the vector permute instruction).

	Regarding claim 8, Adrian teaches:
8. A processor configured to execute instructions and fetch an instruction comprising an opcode ([0026] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode), said processor comprising: 
one or more first registers configured to store a set of source operands of said instruction, wherein said set of source operands comprises one or more source operands and comprises a plurality of data units ([0156] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode and source vector registers zmm1 and zmm31, i.e. first registers to store one or more source operands, the source vector registers store/comprise data elements/units); 
a second register ([0153]: multiple permuted data elements are output to destination register 1315); 
a decoder configured to decode said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops; Fig 1B: the vector permute instruction of the class B template shown in Fig. 1B has vector length field 159B and data element width field 164; [0096]-[0097]: field 159B in the opcode indicates the length of the vector registers used by the instruction such that the instruction will reference 64 byte zmm registers, 32 byte ymm registers, or 16 byte xmm registers based on the vector length field 159B; [0041]: field 164 in the opcode indicates the width of the data elements used by the instruction; when the decoder decodes the vector permute instruction, it will interpret the bits in the opcode which will indicate how to interpret the remaining bits in the instruction, this will include generating control signals that indicate the control register, as a zmm/ymm/xmm register according to 159B, included in the remaining bits in the vector permute instruction, generating control signals that indicate the width of the data elements in the registers included in the remaining bits in the vector permute instruction, and generating control signals indicating the immediate included in the remaining bits in the vector permute instruction; further, the signals indicating the contents of the control register are also part of the claimed control signals since the control register name must first be generated from the decoding of the opcode, in this way the decoding of the opcode is done in order “to generate” signals indicating the contents of the control register), wherein said set of control signals is based on a length of each of said set of source operands ([0096]-[0097]: the control signals indicating the vector registers, including the control register zmm31 used by the vector permute instruction, are based on vector length field 159B, which indicates whether to use zmm/ymm/xmm registers and which is therefore also an indication of a length since zmm/ymm/xmm have different lengths, and the length indicated by 159B is also a length of the source operand vector zmm1 and control register zmm31 of the vector permute instruction since 159B indicates zmm/ymm/xmm registers for all the registers in the instruction, see also [0028]-[0029] describing that an embodiment of the invention may use only vector operations with a vector friendly instruction format which may be the class B instruction format shown in Fig. 1B having the vector field 159B and in this embodiment the vector permute instruction will have the vector field 159B); and 
an execution circuit configured to, in response to said decoded opcode and under control of said set of control signals: 
use said set of control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said data units to generate permutation results ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and 
output said permutation results to said second register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315).
Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers. That is, Adrian does not explicitly teach:
output permutation results to second registers
	However, Knowles teaches:
output permutation results to second registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including a shuffle as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23).

	Regarding claim 9, Adrian in view of Knowles teaches: 
9. The processor of Claim 8, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

	Regarding claim 10, Adrian in view of Knowles teaches:
10. The processor of Claim 9, wherein said set of control signals comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).

	Regarding claim 11, Adrian in view of Knowles teaches:
11. The processor of Claim 8, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.
	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle include selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers to enable broader application of the permute instruction (Knowles col 6 lines 14-23).


	Regarding claim 12, Adrian in view of Knowles teaches:
12. The processor of Claim 8, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers and the permutation results include bytes selected from the upper and lower bytes)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 

	Regarding claim 13, Adrian in view of Knowles teaches:
13. The processor of Claim 8, wherein said instruction is one of a load instruction, a store instruction, a floating point instruction, a single-instruction-multiple-data (SIMD) vector instruction, and an SIMD permute instruction (Adrian [0165]: the instruction is a vector/SIMD permute instruction).

	Regarding claim 14, Adrian in view of Knowles teaches: 
14. The processor of Claim 8, 
	Adrian in view of Knowles, as currently mapped, does not teach:
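wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric.
	However, Knowles further teaches:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric (Knowles col 11 lines 38-57: column multiplexer stage includes a plurality of multiplexers and operand crossbar switch stage is a permutation switching fabric).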
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to include the multiplexers and crossbar switch taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification because multiplexers and crossbars are known techniques on the known device of a computer processor for selecting data and would yield the predictable result of efficiently enabling the selection of data. 
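	For illustration only, the general roles of a multiplexer stage and a crossbar in selecting and routing data units can be sketched as follows. This is a behavioral sketch under stated assumptions and does not reproduce the circuit of Knowles: the function names mux, mux_stage, and crossbar, and the select and routing values, are hypothetical.

```python
def mux(inputs, select):
    """A multiplexer: pass through exactly one of its inputs."""
    return inputs[select]


def mux_stage(columns, selects):
    """One multiplexer per column; each picks one candidate data unit."""
    return [mux(col, sel) for col, sel in zip(columns, selects)]


def crossbar(inputs, routing):
    """Crossbar: output lane k is driven by input lane routing[k]."""
    return [inputs[src] for src in routing]


# Each column offers one unit from operand a and one from operand b.
columns = [["a0", "b0"], ["a1", "b1"], ["a2", "b2"], ["a3", "b3"]]
selected = mux_stage(columns, [0, 1, 0, 1])
routed = crossbar(selected, [3, 2, 1, 0])
```

	The multiplexer stage chooses which operand supplies each lane, and the crossbar then places each chosen unit in any output lane, which together give the selection of data referred to above.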

	Regarding claim 15, Adrian in view of Knowles teaches: 
15. The processor of Claim 8, wherein said set of control signals are further based on at least one of: a size of vector elements and a length of each of said permutation results (Adrian [0041] and [0148]: field 164 of the opcode is a size of vector elements used by the instruction; when the opcode of the vector permute instruction is decoded, control signals are generated based on field 164 to indicate the width/size of the data/vector elements in the registers included in the remaining bits in the vector permute instruction; the width indicated by 164 is also a length of each of the permutation results in destination register 1315 shown in Fig. 13 since 164 indicates the width of data elements used by the vector permute instruction and 1315 includes data elements that are permuted by the vector permute instruction).

	Regarding claim 16, Adrian teaches:
16. A system comprising: 
a memory (Fig. 4B memory unit 470); and
a processor coupled to said memory (Fig. 4B processor 490 is coupled to memory unit 470) and configured to fetch an instruction comprising an opcode from said memory ([0026] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode), said processor comprising: 
one or more first registers configured to store a set of source operands of said instruction, wherein said set of source operands comprises one or more source operands, said one or more source operands comprising a plurality of data units ([0156] and [0165]: a vector permute instruction is fetched, the instruction format includes an opcode and source vector registers zmm1 and zmm31, i.e. first registers to store one or more source operands, the source vector registers store/comprise data elements/units), wherein said set of control signals is based on a length of each of said set of source operands ([0096]-[0097]: the control signals indicating the vector registers, including the control register zmm31 used by the vector permute instruction, are based on vector length field 159B, which indicates whether to use zmm/ymm/xmm registers and which is therefore also an indication of a length since zmm/ymm/xmm have different lengths, and the length indicated by 159B is also a length of the source operand vector zmm1 and control register zmm31 of the vector permute instruction since 159B indicates zmm/ymm/xmm registers for all the registers in the instruction, see also [0028]-[0029] describing that an embodiment of the invention may use only vector operations with a vector friendly instruction format which may be the class B instruction format shown in Fig. 1B having the vector field 159B and in this embodiment the vector permute instruction will have the vector field 159B);
a second register ([0153]: multiple permuted data elements are output to destination register 1315);
a decoder configured to decode said opcode to generate a decoded opcode and a set of control signals ([0148]: a decoder decodes the vector permute instruction to generate a decoded opcode/uops; Fig 1B: the vector permute instruction of the class B template shown in Fig. 1B has vector length field 159B and data element width field 164; [0096]-[0097]: field 159B in the opcode indicates the length of the vector registers used by the instruction such that the instruction will reference 64 byte zmm registers, 32 byte ymm registers, or 16 byte xmm registers based on the vector length field 159B; [0041]: field 164 in the opcode indicates the width of the data elements used by the instruction; when the decoder decodes the vector permute instruction, it will interpret the bits in the opcode which will indicate how to interpret the remaining bits in the instruction, this will include generating control signals that indicate the control register, as a zmm/ymm/xmm register according to 159B, included in the remaining bits in the vector permute instruction, generating control signals that indicate the width of the data elements in the registers included in the remaining bits in the vector permute instruction, and generating control signals indicating the immediate included in the remaining bits in the vector permute instruction; further, the signals indicating the contents of the control register are also part of the claimed control signals since the control register name must first be generated from the decoding of the opcode, in this way the decoding of the opcode is done in order “to generate” signals indicating the contents of the control register); and 
an execution circuit comprising a pair merge unit (Fig. 13, 1300) and configured to, in response to said decoded opcode and under control of said set of control signals: 
use said set of control signals to select data units from said set of source operands ([0153]-[0154]: the control register and immediate of the decoded instruction are used to select data elements from the source register);
permute said selected data units to generate permutation results ([0153]-[0154]: the data elements selected by the match between the control register and immediate are permuted into a destination register); and
output said permutation results to said second register ([0153]-[0154]: the output of the permute instruction is stored in the destination register 1315), wherein each of said permutation results comprises a combination of said data units from said plurality of data units of said set of source operands ([0153]-[0154]: the permutation results include selected data elements from the source register).
	Although Adrian teaches outputting the permutation results to an output register ([0153]), Adrian does not teach outputting its permutation results to a plurality of output registers. That is, Adrian does not explicitly teach: 
output permutation results to second registers
	However, Knowles teaches:
output permutation results to second registers (col 6 lines 14-23 and col 9 lines 33-45: during a shuffle, which is a type of permutation, data elements of two input registers are shuffled and the output is contained in two registers).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the permute instructions of Adrian to support different classes of permute operations, including shuffles as taught by Knowles, such that the permute instruction of Adrian would shuffle its input data elements and output them to output registers. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction (Knowles col 6 lines 14-23). 
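
	For illustration only, the selection described in the mapping of claim 16 can be sketched in Python as follows. The sketch assumes plain index-based selection, in which each control value names the source byte placed at the corresponding result position; it does not model the match between the control register and the immediate described at Adrian [0153]-[0154]. The names VECTOR_LENGTH_BYTES, element_count, and permute_bytes are hypothetical and do not appear in Adrian; only the 64/32/16 byte register sizes are taken from the cited paragraphs.

```python
from typing import Sequence

# Register sizes in bytes as cited from Adrian [0096]-[0097].
# The dictionary and function names are hypothetical, for illustration only.
VECTOR_LENGTH_BYTES = {"xmm": 16, "ymm": 32, "zmm": 64}


def element_count(register_class: str, element_width_bytes: int) -> int:
    """Number of data elements that fit in a register of the given class."""
    return VECTOR_LENGTH_BYTES[register_class] // element_width_bytes


def permute_bytes(source: bytes, control: Sequence[int]) -> bytes:
    """Assumed index-based permute: result[i] = source[control[i]]."""
    if any(not 0 <= idx < len(source) for idx in control):
        raise ValueError("control index out of range for source operand")
    return bytes(source[idx] for idx in control)
```

	In this sketch the vector length determines how many elements are available for selection, and the control values determine which elements are selected, which is the relationship relied upon in the mapping above.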

	Regarding claim 17, Adrian in view of Knowles teaches:
17. The system of Claim 16, wherein said permutation results comprise a first permutation result and a second permutation result (Adrian [0154] and Fig. 13: B3 and B4 in destination 1315 are first and second permutation results), and wherein said set of control signals comprises: 
a first signal operable to control said execution circuit to select a first set of data units from said plurality of data units for said first permutation result (Adrian [0154]: the index value selecting permute byte B3 is a first signal that controls the permute logic to select B3 from the plurality of data elements); and 
a second signal operable to control said execution circuit to select a second set of data units from said plurality of data units for said second permutation result (Adrian [0154] and Fig. 13: the index value selecting permute byte B4 is a second signal that controls the permute logic to select B4 from the plurality of data elements).

	Regarding claim 18, Adrian in view of Knowles teaches:
18. The system of Claim 17, wherein said set of control signals are further based on at least one of a size of vector elements and a length of each of said permutation results (Adrian [0041] and [0148]: field 164 of the opcode is a size of vector elements used by the instruction; when the opcode of the vector permute instruction is decoded, control signals are generated based on field 164 to indicate the width/size of the data/vector elements in the registers included in the remaining bits in the vector permute instruction; the width indicated by 164 is also a length of each of the permutation results in destination register 1315 shown in Fig. 13 since 164 indicates the width of data elements used by the vector permute instruction and 1315 includes data elements that are permuted by the vector permute instruction), and wherein said set of control signals comprises a third signal indicating correspondences between each selected data unit comprised in a permutation result and a source operand in said set of source operands (Adrian [0153]-[0154]: the least significant bits of the immediate are a third signal that indicates the correspondence between each selected data unit in the permutation result in destination 1315 and the source 1305).

	Regarding claim 19, Adrian in view of Knowles teaches:
19. The system of Claim 16, wherein each data unit of said plurality of data units is a data byte (Adrian [0147]: the vector instruction permutes vector bytes)
	Adrian in view of Knowles, as currently mapped, does not teach:
wherein said set of source operands comprises two source operands, wherein each source operand comprises at least two data units, and wherein each of said permutation results comprises selected data bytes from both of said two source operands.
	However, Knowles further teaches:
wherein said set of source operands comprises two source operands (Knowles col 9 lines 33-45: the two vector registers storing data elements to be shuffled are two source operands, see also Fig. 5B), wherein each source operand comprises at least two data units (Knowles col 9 lines 33-45: each source vector register includes four data units), and wherein each of said permutation results comprises selected data bytes from both of said two source operands (Knowles col 9 lines 33-45: the permutation results from the shuffle include selected data bytes from both of the source operands).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to compute the permutation result from two source operands as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to support shuffling input registers to enable broader application of the permute instruction (Knowles col 6 lines 14-23).
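
	For illustration only, a two-source shuffle whose results each contain data units from both source operands can be sketched in Python as follows. The interleave pattern shown is an assumption chosen for simplicity and is not asserted to be the specific pattern of Knowles Fig. 5B; the name shuffle_pair is hypothetical.

```python
from typing import List, Sequence, Tuple


def shuffle_pair(src_a: Sequence[int], src_b: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Interleave two equal-length source operands into two result registers.

    Assumed pattern (illustrative only): a0, b0, a1, b1, ... split across
    two outputs, so each output holds data units from both sources.
    """
    if len(src_a) != len(src_b):
        raise ValueError("source operands must have the same number of data units")
    interleaved = [unit for pair in zip(src_a, src_b) for unit in pair]
    width = len(src_a)
    return interleaved[:width], interleaved[width:]
```

	With two source operands of four data units each, as in the cited passage, both returned results contain units drawn from src_a and from src_b.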

	Regarding claim 20, Adrian in view of Knowles teaches:
20. The system of Claim 16, wherein said set of source operands comprises a single source operand (Adrian [0153]: the source operands include source register 1305) 
	Although Adrian further teaches that the source register may comprise 512 bit vector registers capable of supporting data elements of 64 bits (Adrian [0155]), Adrian does not teach:
the single source operand divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes 
	However, Knowles teaches:
an input divided into a set of upper bytes and a set of lower bytes, and wherein each of said permutation results comprises selected data bytes from said set of upper bytes and said set of lower bytes (col 9 lines 33-45: the input in Fig. 5B is divided into upper bytes 7-4 and lower bytes 3-0 stored in two 64-bit registers and the permutation results include bytes selected from the upper and lower bytes)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to divide its 512-bit vector into an upper and lower half to support a permute shuffle as taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification to enable broader application of the permute instruction by supporting select and shuffle permutation (Knowles col 6 lines 14-23) while also efficiently using register space. 

	Regarding claim 21, Adrian in view of Knowles teaches:
21. The system of Claim 16, 
	Adrian in view of Knowles, as currently mapped, does not teach:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric.
	However, Knowles further teaches:
wherein said execution circuit comprises: a plurality of multiplexers; and a permutation switching fabric (Knowles col 11 lines 38-57: column multiplexer stage includes a plurality of multiplexers and operand crossbar switch stage is a permutation switching fabric).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Adrian to include the multiplexers and crossbar switch taught by Knowles. One of ordinary skill in the art would have been motivated to make this modification because multiplexers and crossbars are known techniques on the known device of a computer processor for selecting data and would yield the predictable result of efficiently enabling the selection of data. 
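
	For illustration only, the combination of a multiplexer stage with a crossbar can be modeled in Python as follows. The ordering of the two stages, the per-lane select encoding, and the names mux_stage and crossbar are assumptions made for this sketch and are not taken from Knowles.

```python
from typing import List, Sequence


def mux_stage(candidates: Sequence[Sequence[int]], selects: Sequence[int]) -> List[int]:
    """One multiplexer per lane: lane i outputs candidates[selects[i]][i]."""
    return [candidates[sel][lane] for lane, sel in enumerate(selects)]


def crossbar(lanes: Sequence[int], routing: Sequence[int]) -> List[int]:
    """Crossbar routing: output j is driven by input lane routing[j]."""
    return [lanes[src] for src in routing]
```

	In this model the multiplexers choose, per lane, which operand supplies the data unit, and the crossbar then reorders the chosen data units, which together perform the selection of data relied upon in the rationale above.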

	Regarding claim 22, Adrian in view of Knowles teaches:
22. The system of Claim 16, wherein said set of control signals are further based on at least one of: a size of vector elements and a length of each of said permutation results (Adrian [0041] and [0148]: field 164 of the opcode is a size of vector elements used by the instruction; when the opcode of the vector permute instruction is decoded, control signals are generated based on field 164 to indicate the width/size of the data/vector elements in the registers included in the remaining bits in the vector permute instruction; the width indicated by 164 is also a length of each of the permutation results in destination register 1315 shown in Fig. 13 since 164 indicates the width of data elements used by the vector permute instruction and 1315 includes data elements that are permuted by the vector permute instruction).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KASIM ALLI/Examiner, Art Unit 2183
/JYOTI MEHTA/Supervisory Patent Examiner, Art Unit 2182