DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on June 14, 2022, has been entered.
  
Claims 1-20 are pending and are presented for examination in this Office action. Claims 1, 4, 6, 10, 13, and 15 were amended in the response received June 14, 2022.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zbiciak et al. (Zbiciak) (US 20170168898 A1) in view of Moyer ‘534 (US 20050055534 A1), Ould-Ahmed-Vall et al. (Ould-Ahmed-Vall) (US 20130275730 A1), and Moyer ‘332 (US 20090327332 A1).
Consider claim 1, Zbiciak discloses a method to store source data ([0099], lines 2-3, store instructions send the address and data to memory) in a processor ([0052], line 3, processor 100) in response to a vector store instruction ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits), wherein the vector store instruction specifies: a first source register containing the source data ([0081], line 1, global vector register file 231), wherein the first source register comprises a plurality of lanes ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element) and each lane contains a first respective data element of a set of data elements having an associated index value ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element); a second source register containing address data ([0070], lines 11-16, D1/D2 local register file 214 will generally store base and offset addresses used in address calculations for the corresponding loads and stores. The two operands are each recalled from an instruction specified register in either global scalar register file 211 or D1/D2 local register file 214), wherein the method comprises: executing the vector store instruction ([0099], lines 2-3, store instructions send the address and data to memory) by: storing the source data in contiguous locations in a memory beginning at a location determined based on the address data ([0099], lines 2-3, store instructions send the address and data to memory), wherein the index values of the first respective data elements are determined based on a position of the data elements in the first source register ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element).
However, Zbiciak does not disclose that the vector store instruction is a bit-reversed vector store instruction, wherein the bit-reversed vector store instruction is executed by: creating reordered source data by, for each lane, replacing the first respective data element in the lane with a second respective data element of the set of data elements having a bit-reversed index value relative to the associated index value of the first respective data element; and storing the reordered source data in contiguous locations in a memory beginning at a location determined based on the address data, wherein the index values of the first respective data elements are determined based on a position of the data elements in the first source register and an opcode of the bit-reversed vector store instruction. Zbiciak also does not disclose that the bit-reversed vector store instruction specifies a number of bits in each of the plurality of lanes.
On the other hand, Moyer ‘534 discloses storing elements in a bit-reversed manner by creating reordered source data by replacing a first respective data element with a second respective data element of a set of data elements having a bit-reversed index value relative to an associated index value of the first respective data element; and storing the reordered source data in contiguous locations in a memory ([0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5) beginning at a location determined based on address data ([0059], lines 8-9, the address of the first element to be stored is pointed to by the register rA).
Moyer ‘534’s teaching is useful with fast Fourier transforms (Moyer ‘534, [0127], lines 1-2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Moyer ‘534 with the invention of Zbiciak in view of the usefulness of the teaching of Moyer ‘534 in performing fast Fourier transforms. Alternatively, the aforementioned modification merely entails applying a known technique (Moyer ‘534’s teaching of storing elements in a bit-reversed manner) to a known device (method, or product) ready for improvement (the invention of Zbiciak, as cited above) to yield predictable results (the invention of Zbiciak, supporting the storing of elements in a bit-reversed manner), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Moyer ‘534’s teaching above, when applied to the invention of Zbiciak wherein the source register comprises a plurality of lanes, results in the overall claimed limitation of a bit-reversed vector store instruction, wherein the method comprises: executing the bit-reversed vector store instruction by: creating reordered source data by, for each lane, replacing the first respective data element in the lane with a second respective data element of the set of data elements having a bit-reversed index value relative to the associated index value of the first respective data element; and storing the reordered source data in contiguous locations in a memory beginning at a location determined based on the address data.
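For illustration only, the combined operation described above can be modeled in a few lines of Python. This is a hypothetical behavioral sketch, not code from Zbiciak or Moyer ‘534: the function names, the representation of lane contents as Python integers, and the assumption that lane 0 is written to the lowest memory address with little-endian byte order within each lane are all assumptions made for this example.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the low num_bits bits of index (e.g. 0b011 -> 0b110 for num_bits=3)."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_vector_store(memory: bytearray, address: int,
                              lanes: list[int], lane_bytes: int) -> None:
    """Hypothetical model of a bit-reversed vector store.

    lanes[i] is the data element in lane i (lane 0 = rightmost lane), so its
    index value is its lane position. Lane i of the reordered data receives the
    element whose index is the bit-reversed value of i. The reordered lanes are
    then written to contiguous memory starting at `address` (assumed layout:
    lane 0 first, little-endian within each lane).
    """
    n = len(lanes)
    num_bits = n.bit_length() - 1
    if n == 0 or (1 << num_bits) != n:
        raise ValueError("lane count must be a power of two")
    # Create the reordered source data, lane by lane.
    reordered = [lanes[bit_reverse(i, num_bits)] for i in range(n)]
    # Store the reordered data in contiguous locations beginning at `address`.
    for i, element in enumerate(reordered):
        start = address + i * lane_bytes
        memory[start:start + lane_bytes] = element.to_bytes(lane_bytes, "little")
```

Under these assumptions, storing eight 64-bit lanes that hold elements 0 through 7 writes them to memory in the order 0, 4, 2, 6, 1, 5, 3, 7.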
However, the combination thus far does not entail that the bit-reversed vector store instruction specifies a number of bits in each of the plurality of lanes, or that the index values are determined based on an opcode of the bit-reversed vector store instruction.
On the other hand, Ould-Ahmed-Vall discloses an instruction specifying a number of bits in each of a plurality of lanes, and index values are determined based on an opcode of the instruction ([0065], lines 2-4, The VPERMW instruction accepts a 512 bit input vector as a first input operand 401_I. The 512 bit input vector is viewed as having thirty two 16 bit data values (words); [0066], lines 1-4, According to the logical operation of the VPERMW instruction, each element in the resultant vector 403_I is filled with any one of the thirty two elements in the input vector 401_I; [0071], lines 2-4, The VPERMD instruction accepts a 512 bit input vector as a first input operand 401_J. The 512 bit input vector is viewed as having sixteen 32 bit data values; [0072], lines 1-3, According to the logical operation of the VPERMD instruction, each element in the resultant vector 403_J is filled with any one of the sixteen elements in the input vector 401_J; in other words, a W or D in the instruction specifies whether 16 or 32 bits are in each of a plurality of lanes, and the index values are determined based on whether the opcode reflects a W or a D). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ould-Ahmed-Vall with the combination of Zbiciak and Moyer ‘534 in order to increase the capability and flexibility of the processor by supporting different sized vector elements. Alternatively, the aforementioned modification merely entails combining prior art elements (Ould-Ahmed-Vall’s teaching of specifying vector element size by an opcode, and the combination of Zbiciak and Moyer ‘534 as cited above) according to known methods (Ould-Ahmed-Vall explicitly discloses the well-known method of specifying a vector element size by an opcode, as cited) to yield predictable results (the combination of Zbiciak and Moyer ‘534 as cited above, wherein the bit-reversed vector store instruction supports different vector element sizes), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143.
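The effect of letting the opcode specify the lane width can be sketched as follows. The suffix table below is hypothetical and merely mirrors the W, D, and Q suffixes of the VPERMW, VPERMD, and VPERMQ instructions cited from Ould-Ahmed-Vall; it is not an encoding disclosed by any reference. The sketch shows that, for a 512-bit vector, the number of lanes, and therefore the bit-reversed index assigned to each lane, depend on the element size selected by the opcode.

```python
VECTOR_BITS = 512

# Hypothetical suffix-to-lane-width table modeled on the W/D/Q suffixes of
# VPERMW, VPERMD and VPERMQ; not an encoding taken from any cited reference.
LANE_BITS_BY_SUFFIX = {"W": 16, "D": 32, "Q": 64}


def bit_reversed_permutation(opcode_suffix: str) -> list[int]:
    """Source index placed in each lane by a bit-reversed store of the given element size."""
    lane_bits = LANE_BITS_BY_SUFFIX[opcode_suffix]
    num_lanes = VECTOR_BITS // lane_bits          # 32, 16 or 8 lanes
    index_bits = num_lanes.bit_length() - 1       # 5, 4 or 3 index bits
    # Write each lane position as an index_bits-wide binary string and reverse it.
    return [int(format(i, f"0{index_bits}b")[::-1], 2) for i in range(num_lanes)]
```

Under these assumptions, lane 1 receives element 4 for a Q (64-bit) store, element 8 for a D (32-bit) store, and element 16 for a W (16-bit) store.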
Regarding Moyer ‘534’s teaching in Figures 34-35 and paragraph [0129] (which relates to the bit reversal of cited paragraph [0131]), Examiner submits that one of ordinary skill in the art before the effective filing date of the claimed invention would readily recognize that typographical errors have been made in the explanation of the bit reversal: “X0 X4 X6 X2 X1 X5 X3 X7” and “Y0 Y4 Y6 Y2 Y1 Y5 Y3 Y7” have been disclosed, rather than “X0 X4 X2 X6 X1 X5 X3 X7” and “Y0 Y4 Y2 Y6 Y1 Y5 Y3 Y7”. The errors are evident because Moyer ‘534 presents this sequence of indices in the context of bit reversal, as cited, and reversing the bits of the 3-bit index “010” (index 2) yields “010” (index 2), whereas reversing the bits of “011” (index 3) yields “110” (index 6). As such, the element at index 2 (i.e., the third element of the vector) should be X2 (and Y2), and the element at index 3 (i.e., the fourth element of the vector) should be X6 (and Y6). Nevertheless, Moyer ‘332 explicitly discloses that the bit-reversed order of “0, 1, 2, 3, 4, 5, 6, and 7” is “0, 4, 2, 6, 1, 5, 3, and 7” ([0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7.)
To the extent it may be argued that the aforementioned portions of Moyer ‘534 do not reflect typographical errors, it would have been obvious for the reordered source data to reflect the “0, 4, 2, 6, 1, 5, 3, and 7” order explicitly taught, without such errors, by Moyer ‘332, as this modification merely entails simple substitution of one known element for another to obtain predictable results, which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. (Such a modification also produces a result consistent with the bit-reversal operation being performed.)
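The bit-reversal arithmetic discussed above can be checked mechanically. The short Python sketch below (a hypothetical helper, not taken from either Moyer reference) reverses the three index bits of positions 0 through 7 and compares the result with the order printed in Moyer ‘534 [0129]. The two orders differ only at positions 2 and 3, which is consistent with the transposition noted above.

```python
def reverse_bits(index: int, num_bits: int) -> int:
    """Return index with its low num_bits bits written in reverse order."""
    return int(format(index, f"0{num_bits}b")[::-1], 2)


bit_reversed_order = [reverse_bits(i, 3) for i in range(8)]

# Order as printed in Moyer '534 [0129]: X0 X4 X6 X2 X1 X5 X3 X7
printed_order = [0, 4, 6, 2, 1, 5, 3, 7]

# Positions at which the printed order departs from true 3-bit reversal.
differing_positions = [
    pos for pos, (computed, printed) in enumerate(zip(bit_reversed_order, printed_order))
    if computed != printed
]

print(bit_reversed_order)    # [0, 4, 2, 6, 1, 5, 3, 7]
print(differing_positions)   # [2, 3]
```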

Consider claim 2, the overall combination entails the source data comprises a 512-bit vector (Zbiciak, [0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; Ould-Ahmed-Vall, [0065], lines 2-3, 512 bit input vector; Ould-Ahmed-Vall, [0071], lines 2-3, 512 bit input vector).

Consider claim 3, the overall combination entails the lanes of the first source register comprise 32-bit lanes (Zbiciak, [0141], lines 1-12, FIG. 21 illustrates a second example of lane allocation in a vector. Vector 2100 is divided into 16 32-bit lanes (16×32 bits=512 bits the vector length). Lane 0 includes bits 0 to 31; line 1 includes bits 32 to 63; lane 2 includes bits 64 to 95; lane 3 includes bits 96 to 127; lane 4 includes bits 128 to 159; lane 5 includes bits 160 to 191; lane 6 includes bits 192 to 223; lane 7 includes bits 224 to 255; lane 8 includes bits 256 to 287; line 9 occupied bits 288 to 319; lane 10 includes bits 320 to 351; lane 11 includes bits 352 to 387; lane 12 includes bits 388 to 415; lane 13 includes bits 416 to 447; lane 14 includes bits 448 to 479; and lane 15 includes bits 480 to 511; Ould-Ahmed-Vall, [0071], lines 2-4, The VPERMD instruction accepts a 512 bit input vector as a first input operand 401_J. The 512 bit input vector is viewed as having sixteen 32 bit data values).

Consider claim 5, the overall combination entails the lanes of the first source register comprise 64-bit lanes (Zbiciak, [0140], lines 1-7, FIG. 20 illustrates a first example of lane allocation in a vector. Vector 2000 is divided into 8 64-bit lanes (8×64 bits=512 bits the vector length). Lane 0 includes bits 0 to 63; line 1 includes bits 64 to 125; lane 2 includes bits 128 to 191; lane 3 includes bits 192 to 255, lane 4 includes bits 256 to 319, lane 5 includes bits 320 to 383, lane 6 includes bits 384 to 447 and lane 7 includes bits 448 to 511; Ould-Ahmed-Vall, [0077], lines 2-6, the VPERMQ instruction accepts a first 512 bit input vector as a first input operand 401_K and accepts a second 512 bit input vector as a second input operand (not shown). Both of the 512 bit input vectors are viewed as having eight 64 bit data values).
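The lane boundaries relied on for claims 3 and 5 follow directly from the vector and lane widths. The Python sketch below is a hypothetical helper that assumes lane 0 occupies the least significant bits, as stated in Zbiciak [0139]-[0141], and computes the inclusive bit range of each lane. Computed this way, lane 1 of the 64-bit layout spans bits 64 to 127, and lanes 11 and 12 of the 32-bit layout span bits 352 to 383 and 384 to 415. The figures “64 to 125”, “352 to 387”, and “388 to 415” in the quoted passages therefore appear to be misprints in Zbiciak; the lane counts relied upon (16×32 = 512 and 8×64 = 512) are unaffected.

```python
def lane_bit_ranges(vector_bits: int, lane_bits: int) -> list[tuple[int, int]]:
    """Inclusive (low, high) bit positions of each lane; lane 0 holds the least significant bits."""
    if vector_bits % lane_bits:
        raise ValueError("vector width must be a multiple of the lane width")
    return [(lane * lane_bits, (lane + 1) * lane_bits - 1)
            for lane in range(vector_bits // lane_bits)]


lanes_32 = lane_bit_ranges(512, 32)  # claim 3: sixteen 32-bit lanes
lanes_64 = lane_bit_ranges(512, 64)  # claim 5: eight 64-bit lanes

print(len(lanes_32), lanes_32[11], lanes_32[12])  # 16 (352, 383) (384, 415)
print(len(lanes_64), lanes_64[1])                 # 8 (64, 127)
```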

Consider claim 6, the overall combination entails the index values of the data elements are 0-7 and an order of the first respective data elements in the source data is given by: 0, 1, 2, 3, 4, 5, 6, 7; and wherein an order of the second respective data elements in the reordered source data is given by: 0, 4, 2, 6, 1, 5, 3, 7 (Zbiciak, [0140], lines 1-7, FIG. 20 illustrates a first example of lane allocation in a vector. Vector 2000 is divided into 8 64-bit lanes (8×64 bits=512 bits the vector length). Lane 0 includes bits 0 to 63; line 1 includes bits 64 to 125; lane 2 includes bits 128 to 191; lane 3 includes bits 192 to 255, lane 4 includes bits 256 to 319, lane 5 includes bits 320 to 383, lane 6 includes bits 384 to 447 and lane 7 includes bits 448 to 511; Moyer ‘534, Figures 34-35 and paragraph [0129]; see the explanation regarding the typographical errors in the rejection of the independent claim; Moyer ‘332, [0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7.)

Consider claim 7, the overall combination entails specifying, in a field of the bit-reversed vector store instruction, a third source register containing offset data, wherein the contiguous locations in the memory begin at a location specified by the address data and the offset data (Zbiciak, [0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0070], lines 11-13, D1/D2 local register file 214 will generally store base and offset addresses used in address calculations for the corresponding loads and stores).

Consider claim 8, the overall combination entails the memory comprises a level 1 data cache (Zbiciak, [0052], line 4, level one data cache (L1D) 123).

Consider claim 9, the overall combination entails the source data comprises an output of a fast Fourier transform computation (Moyer ‘534, [0127], lines 1-2, instruction that may be used with Fast Fourier Transforms (FFTs); [0128], lines 1-2, store multiple vector elements FFT (stmvex_fft); [0129], lines 4-7, for example, for FFT, it is known that data in the order of X0, X1, X2, X3, X4, X5, X6, and X7 is "bit reversed" into the order of X0, X4, X6, X2, X1, X5, X3, X7 for certain FFT calculations; [0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5; Moyer ‘332, [0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7).
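Why FFT output calls for a bit-reversed store can be shown with a textbook example. The Python sketch below uses a standard in-place radix-2 decimation-in-frequency FFT, chosen here only for illustration and not taken from Moyer ‘534 or Moyer ‘332. Such an FFT leaves its results in bit-reversed order; writing element bit_reverse(i) to position i restores the natural order, which the sketch checks against a direct DFT. All function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the low num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def dif_fft_bit_reversed(x: list[complex]) -> list[complex]:
    """Radix-2 decimation-in-frequency FFT (len(x) a power of two).

    No reordering step is performed, so the output is left in bit-reversed order.
    """
    a = list(x)
    n = len(a)
    half = n // 2
    while half >= 1:
        span = 2 * half
        for start in range(0, n, span):
            for k in range(half):
                twiddle = cmath.exp(-2j * cmath.pi * k / span)
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * twiddle
        half //= 2
    return a


def dft(x: list[complex]) -> list[complex]:
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


samples = [complex(v) for v in (1, 2, 3, 4, 0, -1, 5, 2)]
fft_out = dif_fft_bit_reversed(samples)                    # X0, X4, X2, X6, X1, X5, X3, X7
natural = [fft_out[bit_reverse(i, 3)] for i in range(8)]   # bit-reversed "store"
assert all(abs(p - q) < 1e-9 for p, q in zip(natural, dft(samples)))
```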

Consider claim 10, Zbiciak discloses a data processor ([0052], line 3, processor 100), comprising: an execution unit ([0096], line 2, execution unit); a first source register coupled to the execution unit ([0081], line 1, global vector register file 231) and configured to contain source data ([0099], lines 2-3, store instructions send the address and data to memory); and a second source register coupled to the execution unit and configured to contain address data ([0070], lines 11-16, D1/D2 local register file 214 will generally store base and offset addresses used in address calculations for the corresponding loads and stores. The two operands are each recalled from an instruction specified register in either global scalar register file 211 or D1/D2 local register file 214); wherein the first source register comprises a plurality of lanes ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element) and each lane is configured to contain a first respective data element of a set of data elements having an associated index value ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element); and wherein, in response to a vector store instruction that specifies the first source register and the second source register ([0099], lines 2-3, store instructions send the address and data to memory), the execution unit is configured to store the source data in contiguous locations in a memory beginning at a location based on the address data of the second source register ([0099], lines 2-3, store instructions send the address and data to memory), wherein the index values of the first respective data elements are determined based on a position of the data elements in the first source register ([0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0139], lines 1-7, vectors consist of equal-sized lanes, each lane containing a sub-element. The central processing unit core 110 designates the rightmost lane of the vector as lane 0, regardless of device's current endian mode. Lane numbers increase right-to-left. The actual number of lanes within a vector varies depending on the length of the vector and the data size of the sub-element).
However, Zbiciak does not disclose that the vector store instruction is a bit-reversed vector store instruction, wherein in response to the bit-reversed vector store instruction, the execution unit is configured to: create reordered source data based on the source data in the first source register by, for each lane of the first source register, replacing the first respective data element in the lane with a second respective data element of the set of data elements having a bit-reversed index value relative to the associated index value of the first respective data element; and store the reordered source data in contiguous locations in a memory beginning at a location based on the address data of the second source register, wherein the index values of the first respective data elements are determined based on a position of the data elements in the first source register and an opcode of the bit-reversed vector store instruction. Zbiciak also does not disclose that the bit-reversed vector store instruction specifies a number of bits in each of the plurality of lanes.
On the other hand, Moyer ‘534 discloses storing elements in a bit-reversed manner by creating reordered source data based on source data in a first source register by replacing a first respective data element with a second respective data element of a set of data elements having a bit-reversed index value relative to an associated index value of the first respective data element; and storing the reordered source data in contiguous locations in a memory ([0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5) beginning at a location based on address data of a second source register ([0059], lines 8-9, the address of the first element to be stored is pointed to by the register rA).
Moyer ‘534’s teaching is useful with fast Fourier transforms (Moyer ‘534, [0127], lines 1-2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Moyer ‘534 with the invention of Zbiciak in view of the usefulness of the teaching of Moyer ‘534 in performing fast Fourier transforms. Alternatively, the aforementioned modification merely entails applying a known technique (Moyer ‘534’s teaching of storing elements in a bit-reversed manner) to a known device (method, or product) ready for improvement (the invention of Zbiciak, as cited above) to yield predictable results (the invention of Zbiciak, supporting the storing of elements in a bit-reversed manner), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Moyer ‘534’s teaching above, when applied to the invention of Zbiciak wherein the source register comprises a plurality of lanes, results in the overall claimed limitation of a bit-reversed vector store instruction, wherein in response to the bit-reversed vector store instruction, the execution unit is configured to: create reordered source data based on the source data in the first source register by, for each lane of the first source register, replacing the first respective data element in the lane with a second respective data element of the set of data elements having a bit-reversed index value relative to the associated index value of the first respective data element; and store the reordered source data in contiguous locations in a memory beginning at a location based on the address data of the second source register.
However, the combination thus far does not entail that the bit-reversed vector store instruction specifies a number of bits in each of the plurality of lanes, or that the index values are determined based on an opcode of the bit-reversed vector store instruction.
On the other hand, Ould-Ahmed-Vall discloses an instruction specifying a number of bits in each of a plurality of lanes, and index values are determined based on an opcode of the instruction ([0065], lines 2-4, The VPERMW instruction accepts a 512 bit input vector as a first input operand 401_I. The 512 bit input vector is viewed as having thirty two 16 bit data values (words); [0066], lines 1-4, According to the logical operation of the VPERMW instruction, each element in the resultant vector 403_I is filled with any one of the thirty two elements in the input vector 401_I; [0071], lines 2-4, The VPERMD instruction accepts a 512 bit input vector as a first input operand 401_J. The 512 bit input vector is viewed as having sixteen 32 bit data values; [0072], lines 1-3, According to the logical operation of the VPERMD instruction, each element in the resultant vector 403_J is filled with any one of the sixteen elements in the input vector 401_J; in other words, a W or D in the instruction specifies whether 16 or 32 bits are in each of a plurality of lanes, and the index values are determined based on whether the opcode reflects a W or a D). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Ould-Ahmed-Vall with the combination of Zbiciak and Moyer ‘534 in order to increase the capability and flexibility of the processor by supporting different sized vector elements. Alternatively, the aforementioned modification merely entails combining prior art elements (Ould-Ahmed-Vall’s teaching of specifying vector element size by an opcode, and the combination of Zbiciak and Moyer ‘534 as cited above) according to known methods (Ould-Ahmed-Vall explicitly discloses the well-known method of specifying a vector element size by an opcode, as cited) to yield predictable results (the combination of Zbiciak and Moyer ‘534 as cited above, wherein the bit-reversed vector store instruction supports different vector element sizes), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143.
Regarding Moyer ‘534’s teaching in Figures 34-35 and paragraph [0129] (which relates to the bit reversal of cited paragraph [0131]), Examiner submits that one of ordinary skill in the art before the effective filing date of the claimed invention would readily recognize that typographical errors have been made in the explanation of the bit reversal: “X0 X4 X6 X2 X1 X5 X3 X7” and “Y0 Y4 Y6 Y2 Y1 Y5 Y3 Y7” have been disclosed, rather than “X0 X4 X2 X6 X1 X5 X3 X7” and “Y0 Y4 Y2 Y6 Y1 Y5 Y3 Y7”. The errors are evident because Moyer ‘534 presents this sequence of indices in the context of bit reversal, as cited, and reversing the bits of the 3-bit index “010” (index 2) yields “010” (index 2), whereas reversing the bits of “011” (index 3) yields “110” (index 6). As such, the element at index 2 (i.e., the third element of the vector) should be X2 (and Y2), and the element at index 3 (i.e., the fourth element of the vector) should be X6 (and Y6). Nevertheless, Moyer ‘332 explicitly discloses that the bit-reversed order of “0, 1, 2, 3, 4, 5, 6, and 7” is “0, 4, 2, 6, 1, 5, 3, and 7” ([0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7.)
To the extent it may be argued that the aforementioned portions of Moyer ‘534 do not reflect typographical errors, it would have been obvious for the reordered source data to reflect the “0, 4, 2, 6, 1, 5, 3, and 7” order explicitly taught, without such errors, by Moyer ‘332, as this modification merely entails simple substitution of one known element for another to obtain predictable results, which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. (Such a modification also produces a result consistent with the bit-reversal operation being performed.)

Consider claim 11, the overall combination entails the source data comprises a 512-bit vector (Zbiciak, [0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; Ould-Ahmed-Vall, [0065], lines 2-3, 512 bit input vector; Ould-Ahmed-Vall, [0071], lines 2-3, 512 bit input vector).

Consider claim 12, the overall combination entails the lanes of the first source register comprise 32-bit lanes (Zbiciak, [0141], lines 1-12, FIG. 21 illustrates a second example of lane allocation in a vector. Vector 2100 is divided into 16 32-bit lanes (16×32 bits=512 bits the vector length). Lane 0 includes bits 0 to 31; line 1 includes bits 32 to 63; lane 2 includes bits 64 to 95; lane 3 includes bits 96 to 127; lane 4 includes bits 128 to 159; lane 5 includes bits 160 to 191; lane 6 includes bits 192 to 223; lane 7 includes bits 224 to 255; lane 8 includes bits 256 to 287; line 9 occupied bits 288 to 319; lane 10 includes bits 320 to 351; lane 11 includes bits 352 to 387; lane 12 includes bits 388 to 415; lane 13 includes bits 416 to 447; lane 14 includes bits 448 to 479; and lane 15 includes bits 480 to 511; Ould-Ahmed-Vall, [0071], lines 2-4, The VPERMD instruction accepts a 512 bit input vector as a first input operand 401_J. The 512 bit input vector is viewed as having sixteen 32 bit data values).

Consider claim 14, the overall combination entails the lanes of the first source register comprise 64-bit lanes (Zbiciak, [0140], lines 1-7, FIG. 20 illustrates a first example of lane allocation in a vector. Vector 2000 is divided into 8 64-bit lanes (8×64 bits=512 bits the vector length). Lane 0 includes bits 0 to 63; line 1 includes bits 64 to 125; lane 2 includes bits 128 to 191; lane 3 includes bits 192 to 255, lane 4 includes bits 256 to 319, lane 5 includes bits 320 to 383, lane 6 includes bits 384 to 447 and lane 7 includes bits 448 to 511; Ould-Ahmed-Vall, [0077], lines 2-6, the VPERMQ instruction accepts a first 512 bit input vector as a first input operand 401_K and accepts a second 512 bit input vector as a second input operand (not shown). Both of the 512 bit input vectors are viewed as having eight 64 bit data values).
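As an arithmetic aid only, the lane allocations quoted from Zbiciak for claims 12 and 14 can be recomputed with a small Python sketch. The function name `lane_bit_ranges` is hypothetical, and lane 0 is assumed to occupy the least significant bits, consistent with Zbiciak’s statement that lane 0 is the rightmost lane. The computed ranges place lane 11 at bits 352-383, lane 12 at bits 384-415 and, for 64-bit lanes, lane 1 at bits 64-127; the figures 387, 388, and 125 in the quoted passages therefore appear to be typographical errors in Zbiciak that do not affect the cited 16×32-bit and 8×64-bit lane allocations.

```python
VECTOR_BITS = 512  # vector length described in Zbiciak [0140]-[0141]


def lane_bit_ranges(lane_bits: int, vector_bits: int = VECTOR_BITS) -> list[tuple[int, int]]:
    """Return (low_bit, high_bit) for each lane; lane 0 holds the least significant bits."""
    if vector_bits % lane_bits:
        raise ValueError("lane width must divide the vector length")
    return [(lane * lane_bits, (lane + 1) * lane_bits - 1)
            for lane in range(vector_bits // lane_bits)]


lanes32 = lane_bit_ranges(32)
lanes64 = lane_bit_ranges(64)
print(len(lanes32), lanes32[11], lanes32[12])  # 16 (352, 383) (384, 415)
print(len(lanes64), lanes64[1])                # 8 (64, 127)
```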

Consider claim 15, the overall combination entails the index values of the data elements are 0-7 and an order of the first respective data elements in the source data is given by: 0, 1, 2, 3, 4, 5, 6, 7; and wherein an order of the second respective data elements in the reordered source data is given by: 0, 4, 2, 6, 1, 5, 3, 7 (Zbiciak, [0140], lines 1-7, FIG. 20 illustrates a first example of lane allocation in a vector. Vector 2000 is divided into 8 64-bit lanes (8×64 bits=512 bits the vector length). Lane 0 includes bits 0 to 63; line 1 includes bits 64 to 125; lane 2 includes bits 128 to 191; lane 3 includes bits 192 to 255, lane 4 includes bits 256 to 319, lane 5 includes bits 320 to 383, lane 6 includes bits 384 to 447 and lane 7 includes bits 448 to 511; Moyer ‘534, Figures 34-35 and paragraph [0129]; see the explanation regarding the typo in the rejection of the independent claim; Moyer '332, [0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7.)

Consider claim 16, the overall combination entails a third source register containing offset data, wherein the contiguous locations in the memory begin at a location specified by the address data and the offset data (Zbiciak, [0070], lines 7-8, D2 unit 237 is used for vector loads and stores of 512 bits; [0070], lines 11-13, D1/D2 local register file 214 will generally store base and offset addresses used in address calculations for the corresponding loads and stores).
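As a rough behavioral model only, and not the implementation of Zbiciak, Moyer ‘534, or the claimed invention, the following Python sketch shows the combination as applied to claims 1, 15, and 16: a 512-bit source register value is split into lanes of a stated width, the lanes are reordered in bit-reversed lane-index order, and the result is written to contiguous memory beginning at the address formed from the base (address data) and the offset (offset data). The function name `bit_reversed_vector_store`, the little-endian byte layout, and the modeling of the register as a Python integer are all assumptions made for illustration.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the low `num_bits` bits of `index`."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the lowest remaining bit of index into result
        index >>= 1
    return result


def bit_reversed_vector_store(memory: bytearray, vector: int, base: int, offset: int,
                              lane_bits: int, vector_bits: int = 512) -> None:
    """Hypothetical model: write the lanes of `vector` to memory[base + offset:] in bit-reversed lane order."""
    num_lanes = vector_bits // lane_bits
    if num_lanes * lane_bits != vector_bits or num_lanes & (num_lanes - 1):
        raise ValueError("lane width must split the vector into a power-of-two number of lanes")
    index_bits = num_lanes.bit_length() - 1      # log2(num_lanes)
    mask = (1 << lane_bits) - 1
    # Lane 0 occupies the least significant bits of the register.
    lanes = [(vector >> (i * lane_bits)) & mask for i in range(num_lanes)]
    lane_bytes = lane_bits // 8
    address = base + offset                       # start of the contiguous destination
    for position in range(num_lanes):
        value = lanes[bit_reverse(position, index_bits)]
        start = address + position * lane_bytes
        memory[start:start + lane_bytes] = value.to_bytes(lane_bytes, "little")


# Usage: eight 64-bit lanes, lane i holding data element i.
vector = sum(i << (i * 64) for i in range(8))
memory = bytearray(256)
bit_reversed_vector_store(memory, vector, base=0x40, offset=0x20, lane_bits=64)
print([int.from_bytes(memory[0x60 + 8 * k:0x68 + 8 * k], "little") for k in range(8)])
# [0, 4, 2, 6, 1, 5, 3, 7]
```

With 32-bit lanes, the same model stores sixteen elements in the order 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, so the lane width alone selects between the orders recited in claims 15 and 13.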

Consider claim 17, the overall combination entails the memory comprises a level 1 data cache (Zbiciak, [0052], line 4, level one data cache (L1D) 123).

Consider claim 18, the overall combination entails the source data comprises an output of a fast Fourier transform computation (Moyer ‘534, [0127], lines 1-2, instruction that may be used with Fast Fourier Transforms (FFTs); [0128], lines 1-2, store multiple vector elements FFT (stmvex_fft); [0129], lines 4-7, for example, for FFT, it is known that data in the order of X0, X1, X2, X3, X4, X5, X6, and X7 is "bit reversed" into the order of X0, X4, X6, X2, X1, X5, X3, X7 for certain FFT calculations; [0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5; Moyer '332, [0004], lines 1-11, Many types of filtering algorithms, such as in digital signal processing (DSP) applications, utilize buffers to hold sets of input samples and computed output samples from a set of filtering operations, such as Fast Fourier Transform (FFT) filters. These filters are typically accessed in a bit-reversed fashion to obtain the data and store outputs in a predetermined order which corresponds to the natural order of computations. For example, for an 8 element FFT buffer having elements 0, 1, 2, 3, 4, 5, 6, and 7 stored in a linear order, the bit-reversed order in which they need to be accessed is elements 0, 4, 2, 6, 1, 5, 3, and 7).
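To illustrate why FFT outputs are commonly handled in bit-reversed order, as Moyer ‘332 describes, the following Python sketch uses a textbook in-place radix-2 decimation-in-frequency FFT that omits the final reordering step. This algorithm is not drawn from any cited reference, and the function names are hypothetical. For eight samples, output position p holds DFT bin bit_reverse(p), so the bins emerge as 0, 4, 2, 6, 1, 5, 3, 7; storing those outputs in bit-reversed order, as the stmvex_fft discussion in Moyer ‘534 contemplates, restores the natural bin order.

```python
import cmath


def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the low `num_bits` bits of `index`."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def dif_fft_bit_reversed(data: list[complex]) -> list[complex]:
    """Radix-2 decimation-in-frequency FFT (len(data) a power of two) with no final reordering.

    Output position p holds DFT bin bit_reverse(p, log2(len(data)))."""
    a = list(data)
    n = len(a)
    span = n // 2
    while span >= 1:
        w_step = cmath.exp(-2j * cmath.pi / (2 * span))  # twiddle factor for blocks of size 2*span
        for start in range(0, n, 2 * span):
            w = 1 + 0j
            for k in range(span):
                top, bottom = a[start + k], a[start + k + span]
                a[start + k] = top + bottom                  # butterfly sum
                a[start + k + span] = (top - bottom) * w     # butterfly difference, twiddled
                w *= w_step
        span //= 2
    return a


def naive_dft(data: list[complex]) -> list[complex]:
    """Direct O(n^2) DFT used only as a reference."""
    n = len(data)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(data))
            for k in range(n)]


samples = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, 5)]
out = dif_fft_bit_reversed(samples)
reference = naive_dft(samples)
bins = [bit_reverse(p, 3) for p in range(8)]
print(bins)  # [0, 4, 2, 6, 1, 5, 3, 7]
print(all(abs(out[p] - reference[b]) < 1e-9 for p, b in enumerate(bins)))  # True

# Reading (or storing) the outputs in bit-reversed order restores the natural bin order.
natural = [out[bit_reverse(k, 3)] for k in range(8)]
print(all(abs(natural[k] - reference[k]) < 1e-9 for k in range(8)))  # True
```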

Consider claim 19, the overall combination entails the first source register is a vector register (Zbiciak, [0081], line 1, global vector register file 231; Moyer ‘534, Figure 35, which shows each register storing multiple elements; Ould-Ahmed-Vall, [0068], line 4, vector register). 

Consider claim 20, the overall combination entails the first source register is a vector register (Zbiciak, [0081], line 1, global vector register file 231; Moyer ‘534, Figure 35, which shows each register storing multiple elements; Ould-Ahmed-Vall, [0068], line 4, vector register). 

Claims 4 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zbiciak, Moyer ‘534, Ould-Ahmed-Vall, and Moyer '332 as applied to claims 3 and 12 above, and further in view of Szedo et al. (Szedo) (US 8572148 B1).
Consider claim 4, the overall combination of Zbiciak, Moyer ‘534, Ould-Ahmed-Vall, and Moyer '332 entails the index values of the data elements are 0-15 and an order of the first respective data elements in the source data is given by: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15; and wherein an order of the second respective data elements in the reordered source data is given by: 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 (Zbiciak, [0141], lines 1-12, FIG. 21 illustrates a second example of lane allocation in a vector. Vector 2100 is divided into 16 32-bit lanes (16×32 bits=512 bits the vector length). Lane 0 includes bits 0 to 31; line 1 includes bits 32 to 63; lane 2 includes bits 64 to 95; lane 3 includes bits 96 to 127; lane 4 includes bits 128 to 159; lane 5 includes bits 160 to 191; lane 6 includes bits 192 to 223; lane 7 includes bits 224 to 255; lane 8 includes bits 256 to 287; line 9 occupied bits 288 to 319; lane 10 includes bits 320 to 351; lane 11 includes bits 352 to 387; lane 12 includes bits 388 to 415; lane 13 includes bits 416 to 447; lane 14 includes bits 448 to 479; and lane 15 includes bits 480 to 511; Moyer ‘534, [0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5; note that the bit-reversal teaching of Moyer ‘534, when applied to Zbiciak’s embodiment of 16 32-bit lanes, results in the claimed limitations.)
Nevertheless, to any extent to which the aforementioned combination does not necessarily teach the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order, Szedo discloses the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order (col. 1, lines 32-40, bit-reversed order is based on a binary number representation of an index. A 16-point FFT block has 16 data values with indices 0, 1, 2, 3 . . . , 14, 15. These indices are represented in binary as 0000, 0001, 0010, . . . , 1110, 1111. A bit-reversed indexing reverses the order of such bits. So a natural index of 0000, 0001, 0010, . . . , 1110, 1111 has a corresponding bit-reversed order of 0000, 1000, 0100, . . . , 0111, 1111 which translates in decimal to indices of 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15.) As such, to any possible extent to which it may be argued that the overall combination of Zbiciak, Moyer ‘534, Ould-Ahmed-Vall, and Moyer '332 does not explicitly teach the aforementioned sequence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the reordered source data to reflect the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order as explicitly taught by Szedo, as this modification merely entails combining prior art elements according to known methods to yield predictable results, which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. (Also, such a modification results in correct execution that dovetails with the bit-reversal operation being performed.)
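As a further illustration, assuming only the binary-index definition of bit reversal that Szedo recites, the 16-point order can be generated from the 8-point order by a doubling step. A hypothetical Python sketch follows. The 16-point order lists every entry of the 8-point order doubled (0, 8, 4, 12, 2, 10, 6, 14), followed by every entry doubled plus one (1, 9, 5, 13, 3, 11, 7, 15). This is consistent with the view above that extending the 8-element bit reversal of Moyer ‘534 to Zbiciak’s 16 32-bit lanes yields the claimed order.

```python
def bit_reversed_order(num_bits: int) -> list[int]:
    """Bit-reversed ordering of 0..2**num_bits - 1, built by repeated doubling.

    Positions whose new most significant bit is 0 map to values whose new least
    significant bit is 0 (entries doubled); the remaining positions map to values
    whose new least significant bit is 1 (entries doubled plus one).
    """
    order = [0]
    for _ in range(num_bits):
        order = [2 * x for x in order] + [2 * x + 1 for x in order]
    return order


order8 = bit_reversed_order(3)
order16 = bit_reversed_order(4)
print(order8)   # [0, 4, 2, 6, 1, 5, 3, 7]
print(order16)  # [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```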

Consider claim 13, the overall combination of Zbiciak, Moyer ‘534, Ould-Ahmed-Vall, and Moyer '332 entails the index values of the data elements are 0-15 and an order of the initial data elements in the source data is given by: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15; and wherein an order of the data elements in the reordered source data is given by: 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 (Zbiciak, [0141], lines 1-12, FIG. 21 illustrates a second example of lane allocation in a vector. Vector 2100 is divided into 16 32-bit lanes (16×32 bits=512 bits the vector length). Lane 0 includes bits 0 to 31; line 1 includes bits 32 to 63; lane 2 includes bits 64 to 95; lane 3 includes bits 96 to 127; lane 4 includes bits 128 to 159; lane 5 includes bits 160 to 191; lane 6 includes bits 192 to 223; lane 7 includes bits 224 to 255; lane 8 includes bits 256 to 287; line 9 occupied bits 288 to 319; lane 10 includes bits 320 to 351; lane 11 includes bits 352 to 387; lane 12 includes bits 388 to 415; lane 13 includes bits 416 to 447; lane 14 includes bits 448 to 479; and lane 15 includes bits 480 to 511; Moyer ‘534, [0131], lines 1-10, similarly, the stmvex_fft instruction can be used to store the elements in a bit reversed fashion to memory. For example, the stmvex_fft instruction, with a radix of 8, can be used to store the bit reversed X elements from R1 and R2 into memory at locations 0x16-0x24, such that the elements in memory are not bit reversed as compared to those in R1 and R2. Similarly, the stmvex_fft instruction can be used to store the sequential Y elements from R4 and R5 into memory at locations 0x44-0x52, such that the elements in memory are bit reversed compared to those in R4 and R5; note that the bit-reversal teaching of Moyer ‘534, when applied to Zbiciak’s embodiment of 16 32-bit lanes, results in the claimed limitations.)
Nevertheless, to any extent to which the aforementioned combination does not necessarily teach the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order, Szedo discloses the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order (col. 1, lines 32-40, bit-reversed order is based on a binary number representation of an index. A 16-point FFT block has 16 data values with indices 0, 1, 2, 3 . . . , 14, 15. These indices are represented in binary as 0000, 0001, 0010, . . . , 1110, 1111. A bit-reversed indexing reverses the order of such bits. So a natural index of 0000, 0001, 0010, . . . , 1110, 1111 has a corresponding bit-reversed order of 0000, 1000, 0100, . . . , 0111, 1111 which translates in decimal to indices of 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15.) As such, to any possible extent to which it may be argued that the overall combination of Zbiciak, Moyer ‘534, Ould-Ahmed-Vall, and Moyer '332 does not explicitly teach the aforementioned sequence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the reordered source data to reflect the “0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15” order as explicitly taught by Szedo, as this modification merely entails combining prior art elements according to known methods to yield predictable results, which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. (Also, such a modification results in correct execution that dovetails with the bit-reversal operation being performed.)

Response to Arguments
Applicant on page 6 argues: ‘Claims 1-20 stand objected to for informalities. In particular, claims 1 and 10 stand objected to for the recitation "with data element." In the interest of clarity, these claims are amended to recite "with a second respective data element of the set of data elements." This is believed to resolve the informality. Claims 2-9 and 11-20 stand objected to only for depending from either claim 1 or claim 10. Accordingly, it is respectfully requested that the objections to claims 1-20 be reconsidered and withdrawn.’
In view of the aforementioned amendments, the previously presented objections are withdrawn. 

Applicant on page 6 argues: ‘Claims 1-20 stand rejected as indefinite under 35 U.S.C. § 112(b). With respect to claims 1 and 10, the rejections appear to be based on the recitation "depending on a field width." Both of these claims are amended to clarify the recited creating of reordered source data in a manner that does not directly recite "depending on a field width." In view of these amendments, it is respectfully submitted that the metes and bounds of amended claims 1 and 10 are clear, and thus the claims comply with 35 U.S.C. § 112(b). Claims 2-9 and 11-20 stand rejected based on their dependence from either claim 1 or claim 10. Accordingly, claims 2-9 and 11-20 comply with 35 U.S.C. § 112(b) as well, and the Applicant respectfully requests notice to this effect.’
In view of the aforementioned amendments, the previously presented rejections under 35 U.S.C. § 112(b) are withdrawn.

Applicant on page 7 argues: “Claim 1 is amended and the combination of Zbiciak, M1, and M2 does not teach or suggest all of the elements of the amended claim.”
In view of the aforementioned amendment, Examiner is newly relying upon the Ould-Ahmed-Vall reference; see the Claim Rejections - 35 USC § 103 section above.

Applicant on page 8 argues: “Claims 2-9 depend from and further limit independent claim 1. These claims are patentable over the combination of Zbiciak, M1, and M2, for at least this reason. Furthermore, the dependent claims recite additional elements that may further distinguish them over the cited references. The additional cited reference of Szedo does not remedy these deficiencies and is not cited for this purpose. Accordingly, claims 1-9 are patentable over the combinations of Zbiciak, M1, M2, and/or Szedo, and the Applicant respectfully requests notice to that effect.”
Applicant on page 8 argues: “Independent claim 10 has been amended and recites elements similar to those that distinguish amended claim 1. For example, claim 10 recites, "in response to a bit-reversed vector store instruction that specifies the first source register, the second source register, and a number of bits in each of the plurality of lanes." Accordingly, claim 10 is patentable over the combination of Zbiciak, M1, and M2, for reasons similar to those presented above. Claims 11-20 depend from and further limit independent claim 10, and are patentable over the cited references for at least this reason. Furthermore, the dependent claims recite additional elements that may further distinguish them over the cited references. The additional cited reference of Szedo does not remedy these deficiencies and is not cited for this purpose. Accordingly, it is respectfully submitted that claims 11-20 are patentably distinct from the cited references and, and the Applicant respectfully requests that they be passed to allowance.”
Examiner’s response above to the arguments directed to claim 1 applies equally to the arguments directed to the aforementioned further claims. Examiner notes, however, that in claim 10, line 10, “based on” may have been intended to be “determined based on”. (See analogous claim 1, line 18.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on (571)270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEITH E VICARY/            Primary Examiner, Art Unit 2182