DETAILED ACTION
It is hereby acknowledged that the following papers have been received and placed of record in the file:
Amended Claims						-Receipt Date 12/14/2020
Applicant Arguments						-Receipt Date 12/14/2020		
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is in response to the amendment filed on 12/14/2020. Claims 1-20 are pending. Claims 1-20 are amended. Applicant's amendments to the claims have overcome the objections previously set forth in the Non-Final Office action mailed 09/14/2020. 

Response to Arguments
Applicant's arguments filed 12/14/2020 have been fully considered but they are not persuasive. 
Applicant submits:
“The prior art reference Scales fails to disclose or suggest applicant's claimed elements. Specifically, Scales fails to disclose the claimed element of the vector processor being configured to generate an output vector using an input vector and the arithmetic logic unit, the input vector having a plurality of elements identified by a vector index register. Applicant's remaining independent claims include similar claimed elements.”

	Applicant appears to argue that Scales does not use an arithmetic logic unit to generate its output vector. However, this argument is not persuasive because the Office Action maps the vector processing unit (VPU 24) as the recited arithmetic logic unit since the vector processing unit performs vector-oriented operations, i.e. arithmetic logic operations on vectors, see col 4 lines 29-32, and the permute-with-replicate vector operation described at col 2 line 58-col 3 line 12 uses the VPU to generate an output vector.
	Examiner suggests that including further details regarding the arithmetic logic units, as disclosed in [0041] of the specification, may help to distinguish the arithmetic logic unit from the VPU taught by Scales. 

Claim Objections
Claims 8, 12, and 16 are objected to because of the following informalities:  
Claims 8 and 16: in claim 8, it is unclear whether “the vector operation” refers to the “a vector operation” introduced in claim 7 or the one introduced in claim 1. A similar correction should be made for “the vector operation” in claim 16.
Claim 12: lines 9-11 appear to be a copy-and-paste error because they repeat the language of lines 7-8. Lines 9-11 should be deleted.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 12 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Scales, III et al. US 6,334,176 (hereinafter, Scales).
Regarding claim 1, Scales teaches:
1. A vector processor (Fig. 1, 10), comprising: 
an arithmetic logic unit (col 4 lines 12-32: VPU 24 is an arithmetic logic unit since it performs vector-oriented/arithmetic logic operations); 
an operand vector register configured to store a list of elements (col 5 lines 43-48 and col 6 lines 45-56: input register 310 is an operand vector register and stores a list of elements); and 
a vector index register, configured to store a plurality of indices identifying respectively a plurality of elements from the list stored in the operand vector register (col 5 lines 43-48, col 6 lines 37-51, and col 7 lines 41-59: the alignment/control vector register stores offsets, i.e. a vector index register configured to store a plurality of indices, that identify respectively elements in the list stored in input register 310), 
wherein during a vector operation, the vector processor is configured to generate an output vector using an input vector and the arithmetic logic unit, the input vector having the plurality of elements identified by the vector index register (col 2 line 58-col 3 line 12, col 7 lines 47-52, and col 8 lines 30-31: during a permute-with-replication PWR operation, i.e. a vector operation, the processor generates output vector VT using an input vector in VA and VB and using the VPU 24, where the input vector in VA and VB has the plurality of elements that are identified by the control vector/vector index register).
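
For illustration only, the claim 1 mapping above can be modeled in software as an index-driven gather followed by an element-wise operation. This is a minimal sketch and not the circuit of Scales or of the application; the function name `indexed_vector_op`, the list-based registers, and the identity operation standing in for the arithmetic logic unit are all assumptions made for the example.

```python
def indexed_vector_op(operand_register, index_register, alu_op):
    """Build the input vector from indexed elements, then apply alu_op.

    operand_register: list of elements (stands in for the operand vector register)
    index_register:   list of indices (stands in for the vector index register)
    alu_op:           function applied per element (stands in for the ALU)
    """
    # The input vector consists of the operand elements identified by the indices.
    input_vector = [operand_register[i] for i in index_register]
    # The stand-in ALU produces the output vector element by element.
    return [alu_op(x) for x in input_vector]


operand_register = [10, 20, 30, 40]
index_register = [3, 3, 0, 1]  # an index may repeat, which replicates an element
output_vector = indexed_vector_op(operand_register, index_register, lambda x: x)
print(output_vector)  # [40, 40, 10, 20]
```

With the identity operation the result is a pure permute with replication; passing a different `alu_op` shows the same indexed input vector feeding an arithmetic operation.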

Regarding claim 12, Scales teaches:
12. A method, comprising: 
storing, in an operand vector register in a vector processor, a list elements (col 5 lines 43-48 and col 6 lines 45-56: input register 310 is an operand vector register and stores a list of elements) to be used as input for a vector operation of an arithmetic logic unit in the vector processor (col 2 line 58-col 3 line 12, col 7 lines 47-52, and col 8 lines 30-31: during a permute-with-replication PWR operation, i.e. a vector operation, the processor generates output vector VT using an input vector in VA and VB and using the VPU 24, where the input vector in VA and VB has the plurality of elements that are identified by the control vector/vector index register)
storing, in a vector index register in the vector processor, indices identifying respectively a plurality of elements from the list stored in the operand vector register (col 5 lines 43-48, col 6 lines 37-51, and col 7 lines 41-59: the alignment/control vector register stores offsets, i.e. a vector index register storing indices, that identify respectively elements in the list stored in input register 310); and 
generating, during a vector operation, an output vector using an input vector and the arithmetic logic unit, the input vector having the plurality of elements identified by the vector index register (col 2 line 58-col 3 line 12, col 7 lines 47-52, and col 8 lines 30-31: during a permute-with-replication PWR operation, i.e. a vector operation, the processor generates output vector VT using an input vector in VA and VB and using the VPU 24, where the input vector in VA and VB has the plurality of elements that are identified by the control vector/vector index register).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 5, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Hall et al. US 5,226,171 (hereinafter, Hall), and Buchty et al. US 2004/0153623 (hereinafter, Buchty).
	Regarding claim 2, Scales teaches:
2. The vector processor of claim 1, 
	Scales only teaches aligning vectors using a control vector of indexes (Abstract) and does not teach aligning using a counter that increments the effective address or the control vector. That is, Scales does not teach:
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register or wherein the count is for iterating on the plurality of address components stored in the vector index register; and 
a 2:1 multiplexor configured to: 
receive, as inputs, the count from the counter and an output from the vector index register, wherein the output from the vector index register comprises one of the plurality of address components of the vector index register corresponding to the count; 
receive, as a selection input, a mode value, the mode value being a value for selection of the count or a value for selection of the output from the vector index register; 
select either the count or the output from the vector index register according to the received mode value; and 
communicate the selected count or the selected output from the vector index register to a requester accessing the operand vector register for the arithmetic logic unit.
	However, Hall teaches:
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register (col 6 lines 43-47 and col 7 lines 3-10: the index counter increments a previous value that was written or incremented, i.e. is configured to store an increment that is added/addable to a previous/effective address) or wherein the count is for iterating on the plurality of address components stored in the vector index register; and 
vector registers to: 
receive, as inputs, the count from the counter and an output from a vector index register (col 7 lines 53-57: the vector registers are addressed either by an incremented address/count or a memory address register), 
select either the count or the output from the vector index register (col 7 lines 53-57: an incremented address/count or a memory address register is selected to address the vector registers); and 
communicate the selected count or the selected output from the vector index register to a requester accessing the operand vector register for the arithmetic logic unit (col 7 lines 53-57, Fig. 4, and Fig. 6: the selected address source is communicated to a vector register 48/operand vector register to retrieve elements to be operated on by multipliers/adders, i.e. for the arithmetic logic unit; the logic of Fig. 6 that accesses the vector registers is a requester).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the alignment logic of Scales to support the addressing techniques taught by Hall such that Scales uses a counter to access successive addresses in its input and control vector registers. One of ordinary skill in the art would have been motivated to modify Scales to support the different addressing techniques of Hall to increase the flexibility of the addressing of Scales by enabling efficient accessing of successive or sparse addresses (Hall col 7 lines 5-16). Further, one of ordinary skill in the art would have been motivated to use the counter of Hall for indexing the address or control vector of Scales because using a counter is a known technique on the known device of a computer processor for generating values and would yield the predictable result of reducing hardware costs since values can be generated as needed. 
	Further, Buchty teaches selecting between addressing modes using multiplexers (Abstract). In particular, Buchty teaches:
a 2:1 multiplexor ([0031]: multiplexer 612) configured to: 
receive, as a selection input, a mode value, the mode value being a value for selection of a first index or a value for selection of a second index ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Hall to use the multiplexer of Buchty for selecting between addressing techniques based on a mode value. One of ordinary skill in the art would have been motivated to make this modification because multiplexing is a known technique on the known device of a computer processor for selecting between inputs and would yield the predictable result of efficiently implementing selection logic. 
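
For illustration only, the selection recited in claim 2 can be pictured as a two-input multiplexer whose mode value chooses between the running count and the index register component that corresponds to that count. This is a minimal sketch under stated assumptions: the mode encodings `MODE_COUNT` and `MODE_INDEX` and the helper `access_sequence` are hypothetical names, and the code does not reproduce the logic of Hall, Buchty, or the application.

```python
MODE_COUNT = 0  # hypothetical encoding: select the counter value
MODE_INDEX = 1  # hypothetical encoding: select the vector index register output


def mux_2to1(count, index_output, mode):
    """Return one of the two inputs according to the mode value."""
    if mode == MODE_COUNT:
        return count
    if mode == MODE_INDEX:
        return index_output
    raise ValueError("unknown mode value")


def access_sequence(operand_register, index_register, mode, length):
    """Collect the operand elements reached over `length` counter steps."""
    elements = []
    for count in range(length):
        # The index register output is the component corresponding to the count.
        index_output = index_register[count]
        position = mux_2to1(count, index_output, mode)
        elements.append(operand_register[position])
    return elements


operand = [10, 20, 30, 40]
index = [2, 0, 3, 1]
print(access_sequence(operand, index, MODE_COUNT, 4))  # [10, 20, 30, 40]
print(access_sequence(operand, index, MODE_INDEX, 4))  # [30, 10, 40, 20]
```

In count mode the operand register is walked sequentially; in index mode the same counter only steps through the index register, and the stored components determine which positions are accessed.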
	
Regarding claim 5, Scales teaches:
5. The vector processor of claim 1, 
	Scales does not teach:
a vector first register, configured to store a vector first address component, the vector first address component being an address component that directs initial access of the operand vector register at an initial position of the operand vector register based on the vector first address component such that the initial position accessed is not the first position of the operand vector register; 
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register, wherein the count is for incrementing the vector first address component, or wherein the count is for iterating on the plurality of address components stored in the vector index register; and 
a 3:1 multiplexor configured to: 
receive, as inputs, the count from the counter, an output derived from the vector first register, and an output from the vector index register, wherein the output from the vector index register comprises one of the plurality of address components of the vector index register corresponding to the count, and wherein the vector first address component corresponds to the count; 
receive, as a selection input, a mode value, the mode value being a value for selection of the count, a value for selection of the output derived from the vector first register, or a value for selection of the output from the vector index register; 
select either the count, the output derived from the vector first register, or the output from the vector index register according to the received mode value; and 
communicate the selected count, the selected output derived from the vector first register, or the selected output from the vector index register to a requester accessing the operand vector register for the arithmetic logic unit.
	However, Hall teaches:
a vector first register, configured to store a vector first address component, the vector first address component being an address component that directs initial access of the operand vector register at an initial position of the operand vector register based on the vector first address component such that the initial position accessed is not the first position of the operand vector register (col 6 lines 43-48 and col 7 lines 3-5: register 78 sets the counter to a starting index to direct initial access of the vector register at an initial position based on the starting index, where the starting index is not required to be the first position of the vector register); 
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register (col 6 lines 43-47 and col 7 lines 3-10: the index counter increments a previous value that was written or incremented, i.e. is configured to store an increment that is added/addable to a previous/effective address); and 
vector registers to: 
receive, as inputs, the count from the counter, an output derived from the vector first register, and an output from a vector index register (col 7 lines 3-16 and lines 53-57: the vector registers are addressed either by an incremented address/count, a count from one of the FIFOs which may be an index incremented on a previous starting address/derived from the vector first register, or a memory address register), 
select either the count, the output derived from the vector first register, or the output from the vector index register (col 7 lines 53-57: an incremented address/count, FIFO value, or a memory address register is selected to address the vector registers); and 
communicate the selected count, the selected output derived from the vector first register, or the selected output from the vector index register to a requester accessing the operand vector register for the arithmetic logic unit (col 7 lines 53-57, Fig. 4, and Fig. 6: the selected address source is communicated to a vector register 48/operand vector register to retrieve elements to be operated on by multipliers/adders, i.e. for the arithmetic logic unit; the logic of Fig. 6 that accesses the vector registers is a requester).

	Further, Buchty teaches selecting between addressing modes using multiplexers (Abstract). In particular, Buchty teaches:
a multiplexor ([0031]: multiplexer 612) configured to: 
receive, as a selection input, a mode value, the mode value being a value for selection of a first index or a value for selection of a second index ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Hall to use the multiplexer of Buchty for selecting between the 3 addressing techniques based on a mode value. One of ordinary skill in the art would have been motivated to make this modification because multiplexing is a known technique on the known device of a computer processor for selecting between inputs and would yield the predictable result of efficiently implementing selection logic. 
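
For illustration only, the three-way selection recited in claim 5 can be sketched as a multiplexer that chooses among the count, a value derived from the vector first register, and the index register component for the current count. The sketch assumes that the "output derived from the vector first register" is the vector first address component plus the count; that reading, the string mode names, and the function names are assumptions for the example and are not taken from the cited references.

```python
def mux_3to1(count, first_derived, index_output, mode):
    """Select one of three address sources according to the mode value."""
    inputs = {"count": count, "first": first_derived, "index": index_output}
    if mode not in inputs:
        raise ValueError("unknown mode value")
    return inputs[mode]


def positions(mode, length, vector_first, index_register):
    """List the operand register positions selected over `length` counter steps."""
    selected = []
    for count in range(length):
        # Assumed derivation: the starting position offset by the running count.
        first_derived = vector_first + count
        # Index register component corresponding to the count.
        index_output = index_register[count]
        selected.append(mux_3to1(count, first_derived, index_output, mode))
    return selected


index_register = [2, 0, 3]
print(positions("count", 3, 1, index_register))  # [0, 1, 2]
print(positions("first", 3, 1, index_register))  # [1, 2, 3]
print(positions("index", 3, 1, index_register))  # [2, 0, 3]
```

The "first" mode shows the feature the claim emphasizes: the initial position accessed is position 1 rather than the first position of the operand vector register.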


Regarding claim 14, Scales in view of Hall teaches:
14. The method of claim 13, further comprising: 
receiving at least the count from the counter and an output from the vector index register (col 7 lines 53-57: the vector registers are addressed either by an incremented address/count or a memory address register); 
selecting at least either the count or the output from the vector index register (col 7 lines 53-57: an incremented address/count or a memory address register is selected to address the vector registers); and 
communicating the selection to a vector load-store unit of the vector processor accessing the operand vector register for the arithmetic logic unit (col 7 lines 53-57, Fig. 4, and Fig. 6: the selected address source is communicated to a vector register 48/operand vector register to retrieve elements to be operated on by multipliers/adders, i.e. for the arithmetic logic unit; the logic of Fig. 6 that accesses the vector registers is a requester).
Scales in view of Hall does not teach:
receiving, by a N:1 multiplexor of the vector processor, at least the count from the counter and an output from the vector index register, wherein the output from the vector index register comprises one of the plurality of address components of the vector index register corresponding to the count; 
receiving, by the N:1 multiplexor, a selection input comprising a mode value, the mode value being at least a value for selection of the count or a value for selection of the output from the vector index register; 
selecting, by the N:1 multiplexor, at least either the count or the output from the vector index register according to the received mode value; and 
communicating the selection to a vector load-store unit of the vector processor accessing the operand vector register for the arithmetic logic unit.
However, Buchty teaches selecting between addressing modes using multiplexers (Abstract). In particular, Buchty teaches:
a N:1 multiplexor ([0031]: multiplexer 612)
receiving, by the N:1 multiplexor, a selection input comprising a mode value, the mode value being at least a value for selection of a first index or a value for selection of a second index ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index); 
selecting, by the N:1 multiplexor, at least either the first index or second index according to the received mode value ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Hall to use the multiplexer of Buchty for selecting between its addressing techniques based on a mode value. One of ordinary skill in the art would have been motivated to make this modification because multiplexing is a known technique on the known device of a computer processor for selecting between inputs and would yield the predictable result of efficiently implementing selection logic. 

Claims 3 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Hall et al. US 5,226,171 (hereinafter, Hall), Buchty et al. US 2004/0153623 (hereinafter, Buchty), and Jha et al. US 2016/0179526 (hereinafter, Jha).
	Regarding claim 3, Scales in view of Hall and Buchty teaches:
3. The vector processor of claim 2, wherein the requester is a vector load-store unit of the vector processor and the vector load-store unit is configured to: 
generate effective addresses of load and store operations of the vector processor (Scales col 4 lines 39-45: LSU 28 executes load and store instructions, i.e. generates effective addresses to load from or store to); 
	Scales in view of Hall and Buchty does not explicitly teach:
		the vector load-store unit is configured to:
for each address component of the vector index register, add the address component of the vector index register to an effective address for accessing a corresponding position in the operand vector register.
	However, Jha teaches vector index load logic (Abstract). In particular, Jha teaches:
vector load-store unit is configured to:
for each address component of the vector index register, add the address component of the vector index register to an effective address for accessing a corresponding position ([0152] and [0155]-[0156]: vector index load/store logic is part of a memory management unit, i.e. a load-store unit, and is configured to add an index/address component to a base/effective address for accessing positions in memory)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to configure its load-store unit as taught by Jha such that the load-store unit of Scales adds the address components in the control vectors to the effective addresses for accessing the input vectors. One of ordinary skill in the art would have been motivated to make this modification because using a load-store unit to perform operations on memory addresses is a known technique on the known device of a computer processor for handling memory 
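
For illustration only, the address computation recited in claim 3 (adding each address component of the vector index register to an effective address) can be sketched as follows. The function name `effective_addresses` and the sample base address are hypothetical, and the code is a simplified model rather than the load/store logic of Jha or Scales.

```python
def effective_addresses(base_address, index_register):
    """Add each index register component to the base (effective) address."""
    return [base_address + component for component in index_register]


addresses = effective_addresses(0x100, [0, 4, 12])
print(addresses)  # [256, 260, 268]
```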

Regarding claim 20, Scales teaches:
20. A system, comprising: 
an arithmetic logic unit (col 4 lines 12-20: VPU 24 is an arithmetic logic unit of multiple execution units, i.e. a plurality of arithmetic logic units);  
an operand vector register, configured to store elements of an operand vector (col 5 lines 43-48: input register 310 is an operand vector register of a plurality of operand vector registers VA and VB, and stores elements of an operand vector) to be used as input for a vector operation of the arithmetic logic unit (col 7 lines 47-52 and col 8 lines 30-31: a vector operation, i.e. using arithmetic logic unit/VPU 24, is performed on elements of VA that were loaded into VT, i.e. the elements in VA are used as input for a vector operation); 
a vector index register, configured to store a plurality of address components corresponding to a plurality of positions in the operand vector register (col 5 lines 43-48, col 6 lines 37-51, and col 7 lines 41-59: the alignment vector stores offsets, i.e. a plurality of address components, that correspond to positions in VA), 
each address component addable to an effective address for accessing a corresponding position in the operand vector register (col 7 lines 41-59: each offset/address component is added to a base address, i.e. an effective address, for accessing a position in VA), and 
each position of the operand vector register comprises an element of the operand vector to be operated upon by the arithmetic logic unit (col 5 lines 43-48, col 7 lines 41-59, and col 8 lines 30-31: each position/indexed location in VA comprises an element that will be loaded into VT and operated on in a vector operation by VPU 24, see also VB filled with elements in each position);
	Scales only teaches aligning vectors using a control vector of indexes (Abstract) and does not teach aligning using a counter that increments the effective address or the control vector. That is, Scales does not teach: 
a counter, configured to store a count, wherein the count is at least addable to an effective address for accessing the operand vector register or for iterating on the plurality of address components stored in the vector index register; and 
a N:1 multiplexor configured to: 
receive, as inputs, at least the count from the counter and an output from the vector index register, wherein the output from the vector index register comprises one of the plurality of address components of the vector index register corresponding to the count; 
receive, as a selection input, a mode value, the mode value being at least a value for selection of the count or a value for selection of the output from the vector index register; 
select at least either the count or the output from the vector index register according to the received mode value; and 
communicate the selection to a vector load-store unit accessing the operand vector register for the arithmetic logic unit.
However, Hall teaches:
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register (col 6 lines 43-47 and col 7 lines 3-10: the index counter increments a previous value that was written or incremented, i.e. is configured to store an increment that is added/addable to a previous/effective address) or wherein the count is for iterating on the plurality of address components stored in the vector index register; and 
vector registers to: 
receive, as inputs, the count from the counter and an output from a vector index register (col 7 lines 53-57: the vector registers are addressed either by an incremented address/count or a memory address register), 
select either the count or the output from the vector index register (col 7 lines 53-57: an incremented address/count or a memory address register is selected to address the vector registers); and 
communicate the selection for accessing the operand vector register for the arithmetic logic unit (col 7 lines 53-57, Fig. 4, and Fig. 6: the selected address source is communicated to a vector register 48/operand vector register to retrieve elements to be operated on by multipliers/adders, i.e. for the arithmetic logic unit; the logic of Fig. 6 that accesses the vector registers is a requester).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the alignment logic of Scales to support the addressing techniques taught by Hall such that Scales uses a counter to access successive addresses in its input and control vector registers. One of ordinary skill in the art would have been motivated to modify Scales to support the different addressing techniques of Hall to increase the flexibility of the addressing of Scales by enabling efficient accessing of successive or sparse addresses (Hall col 7 lines 5-16). Further, one of ordinary skill in the art would have been motivated to use the counter of Hall for indexing the address 
	Further, Buchty teaches selecting between addressing modes using multiplexers (Abstract). In particular, Buchty teaches:
a 2:1 multiplexor ([0031]: multiplexer 612) configured to: 
receive, as a selection input, a mode value, the mode value being a value for selection of a first index or a value for selection of a second index ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Hall to use the multiplexer of Buchty for selecting between addressing techniques based on a mode value. One of ordinary skill in the art would have been motivated to make this modification because multiplexing is a known technique on the known device of a computer processor for selecting between inputs and would yield the predictable result of efficiently implementing selection logic. 
	Further, Jha teaches:
a vector load-store unit accessing operands for the arithmetic logic unit ([0152] and [0155]-[0156]: vector index load/store logic is part of a memory management unit, i.e. a load-store unit, and is configured to add an index/address component to a base/effective address for accessing positions in memory).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to configure its load-store unit as taught by Jha such that the load-store unit of Scales is used for accessing the input vectors. One of ordinary skill in . 


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Ould-Ahmed-Vall et al. US 2017/0177356 (hereinafter, Ould), Hall et al. US 5,226,171 (hereinafter, Hall), and Buchty et al. US 2004/0153623 (hereinafter, Buchty).
	Regarding claim 4, Scales teaches:
4. The vector processor of claim 1, 
	Scales does not teach using a plurality of control vector registers or aligning using a counter that increments the effective address or the control vector. That is, Scales does not teach:
wherein the vector index register is one of a plurality of vector index registers (vector index registers), and wherein the vector processor further comprises: 
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register or wherein the count is for iterating on each respective plurality of address components stored in the vector index registers; and 
a N:1 multiplexor configured to: 
receive, as inputs, the count from the counter and respective outputs from the vector index registers, wherein each output from a given vector index register of the vector index registers comprises one of a plurality of address components of the given vector index register corresponding to the count; 
receive, as a selection input, a mode value, the mode value being a value for selection of the count or a value for selection of one of the respective outputs from the vector index registers; 
select either the count or one of the respective outputs from the vector index registers according to the received mode value; and 
communicate the selected count or selected one of the respective outputs from the vector index registers to a requester accessing the operand vector register for the arithmetic logic unit.
	However, Ould teaches permute instructions using index registers (Abstract). In particular, Ould teaches: 
a plurality of vector index registers (vector index registers) ([0035] and [0041]: multiple index registers may be used for permutation operations)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to use multiple control vectors/index vectors as taught by Ould. One of ordinary skill in the art would have been motivated to make this modification because use of multiple registers is a known technique on the known device of a computer processor for specifying data and would yield the predictable result of enabling more flexibility and capacity for specifying indices. 
Hall teaches:
a counter, configured to store a count, wherein the count is addable to an effective address for accessing the operand vector register (col 6 lines 43-47 and col 7 lines 3-10: the index counter increments a previous value that was written or incremented, i.e. is configured to store an increment that is added/addable to a previous/effective address); and 
to: 
receive, as inputs, the count from the counter and an output from a vector index register (col 7 lines 53-57: the vector registers are addressed either by an incremented address/count or a memory address register), 
select either the count or the output from the vector index register (col 7 lines 53-57: an incremented address/count or a memory address register is selected to address the vector registers); and 
communicate the selected count or the selected output from the vector index register to a requester accessing the operand vector register for the arithmetic logic unit (col 7 lines 53-57, Fig. 4, and Fig. 6: the selected address source is communicated to a vector register 48/operand vector register to retrieve elements to be operated on by multipliers/adders, i.e. for the arithmetic logic unit; the logic of Fig. 6 that accesses the vector registers is a requester).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the alignment logic of Scales to support the addressing techniques taught by Hall such that Scales uses a counter to access successive addresses in its input and control vector registers. One of ordinary skill in the art would have been motivated to modify Scales to support the different addressing techniques of Hall to increase the flexibility of the addressing of Scales by enabling efficient accessing of successive or sparse addresses (Hall col 7 lines 5-16). Further, one of ordinary skill in the art would have been motivated to use the counter of Scales for indexing the address or control vector of Scales because using a counter is a known technique on the known device of a computer processor for generating values and would yield the predictable result of reducing hardware costs since values can be generated as needed. 

Buchty teaches:
a N:1 multiplexor ([0031]: multiplexer 612) configured to: 
receive, as a selection input, a mode value, the mode value being a value for selection of a first index or a value for selection of a second index ([0031]: multiplexer 612 receives mode select signal 616 as a selecting input to select one of the two inputs to use as an index);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Ould and Hall to use the multiplexer of Buchty for selecting between its counter and vector index registers addressing techniques based on a mode value. One of ordinary skill in the art would have been motivated to make this modification because multiplexing is a known technique on the known device of a computer processor for selecting between inputs and would yield the predictable result of efficiently implementing selection logic. 
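For illustration only, the mode-controlled selection discussed above can be sketched in software. This is a minimal sketch under stated assumptions, not the circuit of any cited reference: the names `vir_output` and `select_address_source`, and the encoding in which mode 0 selects the count and mode k selects the k-th vector index register output, are hypothetical.

```python
def vir_output(vir, count):
    """Return the address component of one index register at position `count`."""
    return vir[count]


def select_address_source(mode, count, vir_outputs):
    """N:1 selection (assumed encoding): mode 0 picks the count,
    mode k (k >= 1) picks the output of the k-th index register."""
    inputs = [count] + list(vir_outputs)
    if not 0 <= mode < len(inputs):
        raise ValueError("mode value does not name an input")
    return inputs[mode]


# Two hypothetical index registers, each holding four address components.
virs = [[3, 1, 0, 2], [0, 2, 1, 3]]
count = 2
outputs = [vir_output(v, count) for v in virs]

print(select_address_source(0, count, outputs))  # the count itself
print(select_address_source(1, count, outputs))  # output of the first index register
print(select_address_source(2, count, outputs))  # output of the second index register
```

With N index registers the selector has N+1 inputs, and the value returned is what would be forwarded to the requester accessing the operand vector register.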

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Jha et al. US 2016/0179526 (hereinafter, Jha).
	Regarding claim 6, Scales teaches: 
6. The vector processor of claim 1, further comprising a vector load-store unit configured to: 
generate effective addresses of load and store operations of the vector processor (Scales col 4 lines 39-45: LSU 28 executes load and store instructions, i.e. generates effective addresses to load from or store to); 
	Scales does not explicitly teach:
		the vector load-store unit is configured to:
for each address component of the vector index register, add the address component of the vector index register to an effective address for accessing a corresponding position in the operand vector register.
	However, Jha teaches vector index load logic (Abstract). In particular, Jha teaches:
vector load-store unit is configured to:
for each address component of the vector index register, add the address component of the vector index register to an effective address for accessing a corresponding position ([0152] and [0155]-[0156]: vector index load/store logic is part of a memory management unit, i.e. a load-store unit, and is configured to add an index/address component to a base/effective address for accessing positions in memory)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to configure its load-store unit as taught by Jha such that the load-store unit of Scales adds the address components in the control vectors to the effective addresses for accessing the input vectors. One of ordinary skill in the art would have been motivated to make this modification because using a load-store unit to perform operations on memory addresses is a known technique on the known device of a computer processor for handling memory accesses and would yield the predictable result of improving performance by allowing other units to perform other useful work.
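As an illustrative sketch only, the index-plus-base addressing described above amounts to adding each address component to an effective (base) address. The function names and the flat list standing in for memory are assumptions made for this example and are not taken from Jha or Scales.

```python
def indexed_addresses(effective_address, index_register):
    """Add each address component of the index register to the effective address."""
    return [effective_address + component for component in index_register]


def indexed_load(memory, effective_address, index_register):
    """Load one element per address component (a gather-style access)."""
    return [memory[a] for a in indexed_addresses(effective_address, index_register)]


memory = list(range(100, 116))   # hypothetical storage: memory[i] == 100 + i
addresses = indexed_addresses(4, [0, 3, 1])
values = indexed_load(memory, 4, [0, 3, 1])
print(addresses)
print(values)
```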

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Jha et al. US 2016/0179526 (hereinafter, Jha) and Hall et al. US 5,226,171 (hereinafter, Hall).
	Regarding claim 7, Scales in view of Jha teaches: 
7. The vector processor of claim 6, wherein the vector load-store unit is configured to: 
load an input operand vector stored in the operand vector register (Scales col 4 lines 39-45 and col 6 lines 45-51: an input operand vector is loaded into the operand vector register VA); 
load, from the vector index register, a stored position of an element of the loaded input operand vector (Scales col 5 lines 43-48 and col 7 lines 1-5: the VC/vector index register is loaded with positions of elements in VA); and 
store the element of the loaded input operand vector into an output operand vector register that corresponds to the loaded position from the vector index register, as part of a vector operation (Scales col 6 line 37- col 7 line 5: elements from VA are stored into output register VT that corresponds to positions in VC and is part of a vector operation since it is an operation on vectors).
	Although Scales teaches using its VC/vector index register to load sequential bytes from the input registers (Scales col 6 lines 52-56), Scales in view of Jha does not teach:
		the vector load-store unit is configured to 
load a count from a counter register; and
load, from the vector index register, according to the count
	However, Hall teaches:
load a count from a counter register (col 6 lines 44-47 and col 7 lines 3-8: index counter includes a counter, i.e. a counter register, with a count to increment a previously written or incremented value); 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to use the counter of Hall for incrementing through the VC of Scales. One of ordinary skill in the art would have been motivated to make this modification because using a counter is a known technique on the known device of a computer processor for generating values and would yield the predictable result of reducing hardware costs since values can be generated as needed.

	Regarding claim 8, Scales in view of Jha and Hall teaches:
8. The vector processor of claim 7, wherein the vector operation is a compress operation configured to store the elements of the loaded input operand vector into an output operand vector register that correspond to loaded positions stored in the vector index register (Scales col 6 line 37- col 7 line 5: elements from VA are stored into output register VT that corresponds to positions in VC and is part of a compress operation since smaller pieces of VA1 and VB1 are put together into VT1, see Fig. 5).
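For illustration, an index-controlled selection of the kind discussed above (pieces of two input registers assembled into one output register under control of an index vector) can be sketched as follows. This is a generic permute written under assumptions: the register names mirror the VA/VB/VC/VT labels used above, but the treatment of the two inputs as one concatenated source is a simplification and is not asserted to be the exact semantics of Scales.

```python
def permute(va, vb, vc):
    """Build VT so that VT[i] is the element of the concatenation VA||VB
    found at the position named by VC[i]."""
    source = list(va) + list(vb)
    return [source[position] for position in vc]


va = [10, 11, 12, 13]
vb = [20, 21, 22, 23]
vc = [2, 3, 4, 5]        # last two elements of VA, then first two of VB
vt = permute(va, vb, vc)
print(vt)
```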

Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Jha et al. US 2016/0179526 (hereinafter, Jha), Hall et al. US 5,226,171 (hereinafter, Hall), and Nishikawa et al. US 5,511,210 (hereinafter, Nishikawa).
	Regarding claim 9, Scales in view of Jha and Hall teaches: 
9. The vector processor of claim 8, 
	Scales in view of Jha and Hall does not teach:
wherein the vector load-store unit is configured to: 
	load the stored elements from the output operand vector register; and 
iterate a second vector operation over the stored elements from the output operand vector register according to the loaded positions stored in the vector index register, wherein the second vector operation is an expand operation configured to store the elements from the output operand vector register into a second output operand vector register at positions of the second output operand vector register according to the loaded positions stored in the vector index register.
	However, Nishikawa teaches:
load compressed elements (col 7 lines 14-23: data is loaded from memory and compressed into storage means 21)
iterate a second vector operation over the stored elements from the output operand vector register, wherein the second vector operation is an expand operation configured to store the elements from the output operand vector register into a second output operand vector register (col 7 lines 14-39: an expand operation is performed on the compressed data and stored into a vector register)
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Jha and Hall to perform the compress and expand operations using mask registers as control vectors as taught by Nishikawa. One of ordinary skill in the art would have been motivated to make this modification because compressing data to be stored and expanding the data to be operated on is a known technique on the known device of a computer processor and would yield the predictable result of reducing storage requirements while allowing for the data to be reloaded in the correct positions when ready to be operated on.
 
	Regarding claim 10, Scales in view of Jha, Hall, and Nishikawa teaches:
10. The vector processor of claim 9, wherein expand operation is further configured to store a scalar into the second output operand vector register at other positions of the second output operand vector register (Nishikawa col 7 lines 22-29: an arbitrary value, i.e. a scalar, is stored into the output where the mask is 0, i.e. at other positions of the output).
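As a hedged illustration of the mask-controlled compress and expand operations discussed above, the sketch below packs the elements whose mask bit is 1 and later restores them to their original positions, writing a scalar wherever the mask is 0. The function names and the list representation of the mask are assumptions for this example, not details drawn from Nishikawa.

```python
def compress(vector, mask):
    """Keep only the elements whose mask bit is 1, packed contiguously."""
    return [element for element, bit in zip(vector, mask) if bit]


def expand(packed, mask, scalar):
    """Place packed elements back where the mask is 1; write `scalar` elsewhere."""
    if len(packed) != sum(1 for bit in mask if bit):
        raise ValueError("packed length must equal the number of set mask bits")
    remaining = iter(packed)
    return [next(remaining) if bit else scalar for bit in mask]


mask = [1, 0, 1, 0]
packed = compress([5, 0, 7, 0], mask)
restored = expand(packed, mask, -1)
print(packed)
print(restored)
```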

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Jha et al. US 2016/0179526 (hereinafter, Jha), Hall et al. US 5,226,171 (hereinafter, Hall), and Ould-Ahmed-Vall et al. US 2017/0177356 (hereinafter, Ould).
	Regarding claim 11, Scales in view of Jha and Hall teaches:
11. The vector processor of claim 8, wherein the vector load-store unit is configured to: 
iterate a second compress operation over elements of a second loaded operand vector according to loaded positions stored in a vector index register (Scales col 7 lines 22-27: a second PWR instruction/compress operation is performed on second loaded vector VB/VA according to positions in VC); 
store the elements of the second loaded operand vector into a second output operand vector register that correspond to the loaded positions stored in the vector index register (Scales col 7 lines 22-27: the elements are stored into VT2 in positions corresponding to VC); and 
perform one or more vector operations using the elements from the first output operand vector register and the second output operand vector register (Scales col 8 lines 30-31: vector operations are performed on the elements in output registers VT1 and VT2).
	Scales in view of Jha and Hall does not teach:
		a second vector index register
	However, Ould teaches:
a second vector index register ([0035] and [0041]: multiple index registers may be used for permutation operations).
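It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to use multiple control vectors/index vectors as taught by Ould. One of ordinary skill in the art would have been motivated to make this modification because use of multiple registers is a known technique on the known device of a computer processor for specifying data and would yield the predictable result of enabling more flexibility and capacity for specifying indices.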

Claims 13, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Hall et al. US 5,226,171 (hereinafter, Hall).
	Regarding claim 13, Scales teaches:
13. The method of claim 12, 
	Scales only teaches aligning vectors using a control vector of indexes (Abstract) and does not teach aligning using a counter that increments the effective address or the control vector. That is, Scales does not teach:
adding a count stored in a counter to an effective address for accessing the operand vector register or iterating on the plurality of address components stored in the vector index register according to the count and subsequently adding an output from the vector index register to the effective address for accessing the operand vector register, wherein the output from the vector index register comprises one of the plurality of address components of the vector index register corresponding to the count.
	However, Hall teaches: 
adding a count stored in a counter to an effective address for accessing an operand vector register (col 6 lines 43-47 and col 7 lines 3-10: the index counter increments a previous value that was written or incremented, i.e. is configured to store an increment that is added/addable to a previous/effective address)


	Regarding claim 15, Scales teaches: 
15. The method of claim 12, further comprising: 
loading an input operand vector stored in the operand vector register (Scales col 4 lines 39-45 and col 6 lines 45-51: an input operand vector is loaded into the operand vector register VA); 
loading, from the vector index register, a stored position of an element of the loaded input operand vector (Scales col 5 lines 43-48 and col 7 lines 1-5: the VC/vector index register is loaded with positions of elements in VA); and 
storing the element of the loaded input operand vector into an output operand vector register that corresponds to the loaded position from the vector index register, as part of a vector operation (Scales col 6 line 37- col 7 line 5: elements from VA are stored into output register VT that corresponds to positions in VC and is part of a vector operation since it is an operation on vectors).

	Scales does not teach:
loading a count from a counter register; and
loading, from the vector index register, according to the count
	However, Hall teaches:
loading a count from a counter register (col 6 lines 44-47 and col 7 lines 3-8: index counter includes a counter, i.e. a counter register, with a count to increment a previously written or incremented value); 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to use the counter of Hall for incrementing through the VC of Scales. One of ordinary skill in the art would have been motivated to make this modification because using a counter is a known technique on the known device of a computer processor for generating values and would yield the predictable result of reducing hardware costs since values can be generated as needed.
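For illustration only, the combination discussed above (a counter used to step through the positions held in an index register) can be sketched as a loop. The class and function names are hypothetical, and reading the stored position as the source position of the element is one possible interpretation adopted for this sketch.

```python
class CounterRegister:
    """Hypothetical counter register that holds a count and can be incremented."""

    def __init__(self, start=0):
        self.count = start

    def load(self):
        return self.count

    def increment(self):
        self.count += 1


def move_by_index(operand, index_register):
    """For each count, load a stored position from the index register and
    store the operand element at that position into the output register."""
    counter = CounterRegister()
    output = []
    for _ in range(len(index_register)):
        count = counter.load()
        position = index_register[count]   # load according to the count
        output.append(operand[position])
        counter.increment()
    return output


result = move_by_index([10, 20, 30, 40], [3, 0, 2])
print(result)
```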

	Regarding claim 16, Scales in view of Hall teaches:
16. The method of claim 15, wherein the vector operation is a compress operation, and wherein the method further comprises storing the elements of the loaded operand vector into an output operand vector register that correspond to the loaded positions stored in the vector index register (Scales col 6 line 37- col 7 line 5: elements from VA are stored into output register VT that corresponds to positions in VC and is part of a compress operation since smaller pieces of VA1 and VB1 are put together into VT1, see Fig. 5).
Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Hall et al. US 5,226,171 (hereinafter, Hall), and Nishikawa et al. US 5,511,210 (hereinafter, Nishikawa).
	Regarding claim 17, Scales in view of Hall teaches:
17. The method of claim 16, further comprising: 
	Scales in view of Hall does not teach:
loading the stored elements from the output operand vector register; 
iterating a second vector operation over the stored elements from the output operand vector register according to the loaded positions stored in the vector index register, wherein the second vector operation is an expand operation; and 
storing, according to the expand operation, the elements from the output operand vector register into a second output operand vector register at positions of the second output operand vector register according to the loaded positions stored in the vector index register.
	However, Nishikawa teaches:	
load compressed elements (col 7 lines 14-23: data is loaded from memory and compressed into storage means 21)
iterating a second vector operation over the stored elements from the output operand vector register according to the loaded positions stored in the vector index register, wherein the second vector operation is an expand operation (col 7 lines 14-39: an expand operation is performed on the compressed data and stored into a vector register); and 
storing, according to the expand operation, the elements from the output operand vector register into a second output operand vector register at positions of the second output operand vector register according to the loaded positions stored in the vector index register (col 7 lines 14-39: an expand operation is performed on the compressed data and stored into a vector register).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales in view of Hall to perform the compress and expand operations using mask registers as control vectors as taught by Nishikawa. One of ordinary skill in the art would have been motivated to make this modification because compressing data to be stored and expanding the data to be operated on is a known technique on the known device of a computer processor and would yield the predictable result of reducing storage requirements while allowing for the data to be reloaded in the correct positions when ready to be operated on.

Regarding claim 18, Scales in view of Hall and Nishikawa teaches:
18. The method of claim 17, further comprising storing, according to the expand operation, a scalar into the second output operand vector register at other positions of the second output operand vector register (Nishikawa col 7 lines 22-29: an arbitrary value, i.e. a scalar, is stored into the output where the mask is 0, i.e. at other positions of the output).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Scales, III et al. US 6,334,176 (hereinafter, Scales) in view of Hall et al. US 5,226,171 (hereinafter, Hall), and Ould-Ahmed-Vall et al. US 2017/0177356 (hereinafter, Ould).
	Regarding claim 19, Scales in view of Hall teaches:
19. The method of claim 16, further comprising: 
iterating a second compress operation over elements of a second loaded operand vector according to loaded positions stored in a vector index register (Scales col 7 lines 22-27: a second PWR instruction/compress operation is performed on second loaded vector VB/VA according to positions in VC); 
storing the elements of the second loaded operand vector into a second output operand vector register that correspond to the loaded positions stored in the vector index register (Scales col 7 lines 22-27: the elements are stored into VT2 in positions corresponding to VC); and 
performing one or more vector operations using the elements from the first output operand vector register and the second output operand vector register (Scales col 8 lines 30-31: vector operations are performed on the elements in output registers VT1 and VT2).
	Scales in view of Hall does not teach:
		a second vector index register
	However, Ould teaches:
a second vector index register ([0035] and [0041]: multiple index registers may be used for permutation operations).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Scales to use multiple control vectors/index vectors as taught by Ould. One of ordinary skill in the art would have been motivated to make this modification because use of multiple registers is a known technique on the known device of a computer processor for specifying data and would yield the predictable result of enabling more flexibility and capacity for specifying indices.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476.  The examiner can normally be reached on Monday - Friday, 9am to 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic 




/K.A./Examiner, Art Unit 2183
/William B Partridge/Primary Examiner, Art Unit 2183