DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/2/2021 has been entered.
 	This communication is responsive to the request for continued examination filed on 3/2/2021.  Claims 1-20 are pending.  Claims 1-8, 10-15 and 17-19 have been amended.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
3.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For 
4.	Claims 1, 4-15 and 18 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4-5, 7-8, 10-11, 13-14 and 17-20 of copending Application No. 16/417,526 (reference application), further in view of Bharadwaj, PGPUB No. 2014/0189323. Although the claims at issue are not identical, they are not patentably distinct from each other because claim 1 of the instant application is an obvious variant of claims 1 and 4 of ‘526.  Claims 1 and 4 of ‘526 mostly anticipate claim 1 of the instant application (see the table below wherein only the mapping of independent claim 1 is provided for brevity; also note that claim 4 is dependent upon claim 1 and therefore includes all limitations of claim 1).
However, claim 1 of ‘526 does not teach “performing, by the vector processor, a conditional test operation on each element of at least one of the loaded one or more operand vectors”. Application ‘526 teaches performing a conditional test operation on elements at a first and a second lane index position of at least one of the loaded one or more operand vectors, but does not explicitly teach performing the conditional test operation on all of the elements of the one or more operand vectors.
Bharadwaj teaches “performing, by the vector processor, a conditional test operation on each element of at least one of the loaded one or more operand vectors” ([0063-0064]:  wherein a conditional operation is performed for each element of a vector A[i]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify claim 1 of reference application ‘526 to perform the conditional test operation on all of the elements of the one or more vectors as taught in Bharadwaj.  It would have been obvious to one of ordinary skill in the art because one would have been applying a known technique (performing an iterative operation over all elements of a vector) to a known device (claim 1 of reference application ‘526) ready for improvement to yield predictable results (performing a conditional test operation on all elements of a vector). (MPEP 2143, Example D)
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim 1 of instant application 16/417,508:

1) A method, comprising: loading, by a vector load-store unit of a vector processor, one or more operand vectors, each vector of the one or more operand vectors being stored in a respective operand vector register;

performing, by the vector processor, a conditional test operation on each element of at least one of the loaded one or more operand vectors according to a count stored in a counter register,

storing, by the vector processor, a position, identified by the count for the element in response to a result of the conditional test operation performed using the element in: a first vector index register in response to the result indicating the element meeting a test in the conditional test operation;

or a second vector index register in response to the result indicating the element not meeting the test in the conditional test operation;

performing a first vector operation on first elements in the one or more operand vectors, the first elements identified by positions stored in the first vector index register;

and performing a second vector operation on second elements in the one or more operand vectors, the second elements identified by positions stored in the second vector index register.

Claims 1 and 4 of application 16/417,526:

1) A method, comprising: loading, by a vector load-store unit of a vector processor, one or more operand vectors, each vector of the one or more operand vectors being stored in a respective operand vector register;

performing, by the vector processor, a conditional test operation on each element at a first lane index position of at least one of the loaded one or more operand vectors according to a first lane count stored in a first lane counter register, the conditional test operations providing a vector of test results;

storing, in a first lane index position in a first vector index register for each TRUE result of TRUE results of the conditional test operation (VIR_TRUE), a position of the TRUE result in the vector of test results according to the first lane count;

storing, in a first lane index position in a second vector index register for each FALSE result of FALSE results of the conditional test operation (VIR_FALSE), a position of the FALSE result in the vector of test results according to the first lane count;

performing, by the vector processor, the conditional test operation on each element at a second lane index position of at least one of the loaded one or more operand vectors according to a second lane count stored in a second lane counter register;

storing, in a second lane index position in the VIR_TRUE, a position of the TRUE result in the vector of test results according to the second lane count;

and storing, in a second lane index position in the VIR_FALSE, a position of the FALSE result in the vector of test results according to the second lane count.

4) The method of claim 1, further comprising: performing a first vector operation on first elements in the one or more operand vectors, the first elements identified by positions stored in the VIR_TRUE;

and performing a second vector operation on second elements in the one or more operand vectors, the second elements identified by positions stored in the VIR_FALSE.
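
For orientation only, the steps recited in claim 1 of the instant application can be sketched in software. This is a minimal illustration, assuming that Python lists stand in for the operand vector register and the two vector index registers and that an integer stands in for the counter register. The names `partition_positions` and `apply_by_positions`, the predicate, and the sample values are hypothetical and are not taken from either application.

```python
from typing import Callable, List, Tuple


def partition_positions(operand: List[int],
                        test: Callable[[int], bool]) -> Tuple[List[int], List[int]]:
    """Return (vir_true, vir_false): positions whose elements pass or fail the test."""
    vir_true: List[int] = []    # stands in for the first vector index register
    vir_false: List[int] = []   # stands in for the second vector index register
    count = 0                   # stands in for the counter register
    while count < len(operand):
        if test(operand[count]):
            vir_true.append(count)    # element meets the test
        else:
            vir_false.append(count)   # element does not meet the test
        count += 1
    return vir_true, vir_false


def apply_by_positions(operand: List[int], positions: List[int],
                       op: Callable[[int], int]) -> List[int]:
    """Apply op only to the elements identified by the stored positions."""
    return [op(operand[p]) for p in positions]


operand = [5, -2, 7, 0, -9]
vir_true, vir_false = partition_positions(operand, lambda x: x > 0)
first_result = apply_by_positions(operand, vir_true, lambda x: x * 2)   # first vector operation
second_result = apply_by_positions(operand, vir_false, lambda x: -x)    # second vector operation
```

In this sketch every element is tested in a single pass; the lane-by-lane testing recited in claim 1 of ‘526 is not modeled.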


5.	Claims 1 and 19-18 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 11 and 19 of copending Application No. 16/417,495 (reference application).  Although the claims at issue are not identical, they are not patentably distinct from each other because claim 1 of the instant application is an obvious variant of claims 1-2 of ‘495.  Claims 1-2 of ‘495 mostly anticipate claim 1 of the instant application (see the table below wherein only the mapping of independent claim 1 is provided for brevity; also note that claim 2 is dependent upon claim 1 and therefore includes all limitations of claim 1).
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim 1 of instant application 16/417,508:

1) A method, comprising: loading, by a vector load-store unit of a vector processor, one or more operand vectors, each vector of the one or more operand vectors being stored in a respective operand vector register;

performing, by the vector processor, a conditional test operation on each element of at least one of the loaded one or more operand vectors according to a count stored in a counter register,

storing, by the vector processor, a position, identified by the count for the element in response to a result of the conditional test operation performed using the element in: a first vector index register in response to the result indicating the element meeting a test in the conditional test operation;

or a second vector index register in response to the result indicating the element not meeting the test in the conditional test operation;

performing a first vector operation on first elements in the one or more operand vectors, the first elements identified by positions stored in the first vector index register; and performing a second vector operation on second elements in the one or more operand vectors, the second elements identified by positions stored in the second vector index register.

Claims 1 and 2 of application 16/417,495:

1) A method, comprising: loading, by a vector load-store unit of a vector processor, an operand vector stored in an operand vector register (OVR);

loading, by the vector load-store unit, a scalar stored in a scalar register;

comparing, by the vector processor, an element of the loaded operand vector with the loaded scalar, wherein the element is selected according to a count stored in a counter register;
until the count equals a vector length of the loaded operand vector,

storing, by the vector processor, a position of the element of the loaded operand vector according to the count in:
a first vector index register in response to the element of the loaded operand vector matching the loaded scalar,

or a second vector index register in response to the element of the loaded operand vector not matching the loaded scalar.

2) The method of claim 1, further comprising:
loading from at least one of the first vector index register, or the second vector register, or a combination thereof, by the vector load-store unit, stored positions of the elements of the loaded operand vector; and iterating one or more vector operations over the elements of the loaded operand vector according to the loaded positions.
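
As an illustration only, the scalar comparison recited in claims 1 and 2 of ‘495 can be sketched as a loop that runs until the count equals the vector length. This is a hedged sketch, assuming Python lists stand in for the operand vector register (OVR) and the vector index registers; `compare_with_scalar`, `iterate_over_positions`, and the sample data are hypothetical names and values, not taken from the application.

```python
from typing import Callable, List, Tuple


def compare_with_scalar(ovr: List[int], scalar: int) -> Tuple[List[int], List[int]]:
    """Store each element's position by whether the element matches the scalar."""
    vir_match: List[int] = []      # stands in for the first vector index register
    vir_no_match: List[int] = []   # stands in for the second vector index register
    count = 0                      # stands in for the counter register
    vector_length = len(ovr)
    while count != vector_length:  # "until the count equals a vector length"
        if ovr[count] == scalar:
            vir_match.append(count)
        else:
            vir_no_match.append(count)
        count += 1
    return vir_match, vir_no_match


def iterate_over_positions(ovr: List[int], positions: List[int],
                           op: Callable[[int], int]) -> List[int]:
    """Iterate an operation over the elements identified by the loaded positions."""
    return [op(ovr[p]) for p in positions]


ovr = [4, 9, 4, 1]
vir_match, vir_no_match = compare_with_scalar(ovr, 4)
incremented = iterate_over_positions(ovr, vir_no_match, lambda x: x + 1)
```

Here the equality comparison against a scalar plays the role of the test; any other predicate could be substituted in the same loop structure.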


Claim Rejections - 35 USC § 112
6.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


7.	Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

In regards to claim 1, lines 10-15, each instance of the limitation “the element” is unclear. It is unclear whether each recitation of “the element” refers to “each element” recited in line 5, or to only one of the elements of “each element” recited in line 5.
Claim 10, lines 16-21, is similarly rejected on the same basis as claim 1 above.
In regards to claim 2, lines 10-11, the limitation “the vector of test results” lacks clarity.  The limitation is unclear because there is no prior recitation of “a vector of test results”, and therefore the limitation lacks proper antecedent basis.
In regards to claim 2, lines 13-15, the limitation “until the positions of the elements of the loaded first and second operand vectors are stored in the first operand vector register or the second operand vector register” lacks clarity.  Claim 2 is dependent upon claim 1, which states “storing, by the vector processor, a position…in a first vector index register or a second index vector register”. It is therefore unclear whether the positions of elements in claim 2 are stored in the “first operand vector register or the second operand vector register” or in “a first vector index register or a second index vector register”.  In light of the specification, the examiner believes the positions are stored in a “first vector index or second vector index register”, and for purposes of examination will interpret the claim as such.
In regards to claim 18, lines 16-18 and 20-22, each instance of the limitation “the element” is unclear. It is unclear whether each recitation of “the element” refers to “an element” recited in line 11, or to one of the elements of “each element” recited in line 9 (i.e., whether “the element” refers to a conditional test operation performed between each element of a first and second operand vector register, or to a conditional test operation performed between an element of a scalar operand register and an element of an operand vector register).
Claims 2-9, 11-17 and 19-20 are dependent upon one or more claims above and therefore are similarly rejected for including the deficiencies of one or more claims above.
Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Krueger, PGPUB No. 2006/0184765, in view of Bharadwaj, PGPUB No. 2014/0189323, and further in view of Nickolls, PGPUB No. 2002/0087846.
	In regards to claim 10, Krueger teaches “A system, comprising: a vector processor” ([0012, 0019 and 0022]:  wherein a vector processor is disclosed (combination of circuitry in the system of Fig. 5) (see Fig. 5 for further clarity)) “a first operand vector register of the vector processor configured to store a first operand vector; a second operand vector register of the vector processor configured to store a second operand vector” (See Fig. 5:  wherein two vectors stored in the register file (element 502) are used to perform a comparison at element 504.  Therefore the vectors are stored in vector registers ([0012-0013 and 0022])) “the vector processor also being configured to perform a conditional test operation on each element of the first and the second operand vectors” ([0012, 0019 and 0027]:  wherein the ALU (element 504) performs the vector compare operation on each element of the operand vectors and provides a condition vector (see Fig. 5)) “store, a position identified by a count for the element in response to a result of the conditional test operation performed using the element, in: a first vector index register in response to the result indicating the element meeting a test in the conditional test operation” ([0017-0020 and 0024-0025]:  wherein a position (an index of a true result), identified by a next_cond count for the element in response to a result of the vector compare operation, is stored in a first vector index register in response to the result indicating the elements meet a test condition of the vector compare (i.e. that the result was true) (see Figs. 3-4 for clarity:  wherein Fig. 4 is performing a method which implements element 304 of Fig. 3)) “the vector processor also being configured to:  perform a first vector operation on first elements in the first and second operand vectors, the first elements identified by positions stored in the first vector index register” ([0021-0022]:  wherein a vector permute operation is performed on elements in the first and second vector operands using the first vector index register (See Fig. 3)).
	Krueger does not teach “and a vector load-store unit of the vector processor, configured to: load a first operand vector stored in the first operand vector register; load a second operand vector stored in the second operand vector register”, “perform an operation on each element of the first and second operand vectors according to a count stored in a counter register”, “a second vector index register in response to the result indicating the element not meeting a test in the conditional test operation”, nor “perform a second vector operation on second elements in the first and second operand vectors, the second elements identified by positions stored in the second vector index register”. Krueger discloses generating a true index vector from conditional results and performing an operation on vector operands using the index vector.  However, Krueger does not disclose generating a false vector index from the conditional results and performing a secondary operation using the false vector index.
	Bharadwaj teaches “a second vector index register in response to the result indicating the element not meeting a test in the conditional test operation” ([0063-0065 and 0073]:  wherein a second vector index register (v1) stores positions of false results which correspond to conditional results in a conditional mask register) and “perform a second vector operation on second elements in the first and second operand vectors, the second elements identified by positions stored in the second vector index register” ([0073]:  wherein the false index register is used to perform a permute operation on one or more vector operands using the indices in the index register).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Krueger with the teachings of Bharadwaj which use conditional results to generate a false index vector in order to perform operations on vector elements which produce a false condition.  It would have been obvious to one of ordinary skill in the art because one of ordinary skill would identify that Krueger produces true and false conditional results using a vector compare operation and then generates a vector index for the true results and would see that Bharadwaj generates a vector index for false results based on conditional results.  Therefore, it would have been obvious to one of ordinary skill in the art because it would have expanded the dynamic capabilities of Krueger by allowing a true and false index vector to be generated in order to perform vector operations on vector operands which caused true and false conditions.  It would have been further used for the benefit of vectorising loops with loop carried dependences (See Bharadwaj [0065]).
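
The proposed combination can be pictured with a short sketch: a vector compare produces a condition vector, one index vector collects the positions of true results, another collects the positions of false results, and each index vector drives a permute. This is only an illustrative model under stated assumptions; the functions `vector_compare`, `indices_where`, and `permute` and the sample operands are hypothetical and do not reproduce the circuitry or instructions of Krueger or Bharadwaj.

```python
from typing import List


def vector_compare(a: List[int], b: List[int]) -> List[bool]:
    """Element-wise compare producing a condition vector (True where a > b)."""
    return [x > y for x, y in zip(a, b)]


def indices_where(mask: List[bool], value: bool) -> List[int]:
    """Collect the positions of the condition vector that equal the given result."""
    return [i for i, m in enumerate(mask) if m == value]


def permute(src: List[int], index_vector: List[int]) -> List[int]:
    """Gather the elements of src selected by the index vector."""
    return [src[i] for i in index_vector]


a = [3, 1, 4, 1, 5]
b = [2, 7, 1, 8, 2]
cond = vector_compare(a, b)
true_index = indices_where(cond, True)     # index vector for true results
false_index = indices_where(cond, False)   # index vector for false results
true_elements = permute(a, true_index)
false_elements = permute(b, false_index)
```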
	The overall combination of Krueger and Bharadwaj thus far does not teach “a vector load-store unit of the vector processor, configured to: load a first operand vector stored in the first operand vector register; load a second operand vector stored in the second operand vector register” nor “perform an operation on each element of the first and second operand vectors according to a count stored in a counter register”. Krueger discloses vector processing using vector registers, as well as using counters to iteratively perform the index vector generation.  However, Krueger includes no discussion of loading vector operands nor using a counter to perform the conditional operation between the loaded vector operands.
	Nickolls discloses a vector load-store unit which loads operand vectors to and from a vector register file ([0136 -0139]:  wherein a vector processor (see Fig. 10)  loads vector operands of a vector register file (element 1010) using vector load/store unit (element 1030)) and performing vector operations on vector elements according to a count stored in a count register ([0137 and 0169-0171]:  wherein the vector processor performs an operation on each element of the one or more vector operands according to a vector count register (element 1002) which indicates a count of the number of vector elements to be processed). The combination would have a processor like Krueger which loads operand data to and from a vector register file using a load-store unit, and performs conditional operations iteratively using a vector count register to track the number of vector elements to process as taught in Nickolls.
	It would have first been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Krueger and Bharadwaj to load operand vectors, using a load/store unit, to and from a vector register file as in the vector processor of Nickolls.  It would have been obvious to one of ordinary skill in the art because loading operand vector data from memory, using a load/store unit, when the data is needed, as opposed to storing all of the vector operand data locally on the processor, can be used to save memory space and thereby reduce memory cost in a processor.  
	It would have then been further obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Krueger and Bharadwaj to perform vector operations using a count register as the vector processor of Nickolls.  It would have been obvious to one of ordinary skill in the art because using a vector count register as the reconfigurable vector processor of Nickolls can be used for added flexibility in a vector processor (See Nickolls [0220-0221]).
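
To make the two modifications concrete, the sketch below models a load/store unit that moves operand vectors between memory and a vector register file, and a count register that bounds how many elements an operation processes. It is a toy model under stated assumptions: the class `ToyVectorUnit`, its methods, the register names, and the memory layout are all hypothetical and are not drawn from Nickolls.

```python
from typing import Dict, List


class ToyVectorUnit:
    """Toy model: memory, a vector register file, and a vector count register."""

    def __init__(self, memory: List[int]) -> None:
        self.memory = memory
        self.vregs: Dict[str, List[int]] = {}   # stands in for the vector register file
        self.vcnt = 0                            # stands in for the vector count register

    def load(self, reg: str, base: int, length: int) -> None:
        """Load/store unit: copy `length` elements from memory into a vector register."""
        self.vregs[reg] = self.memory[base:base + length]

    def store(self, reg: str, base: int) -> None:
        """Load/store unit: copy a vector register back to memory."""
        data = self.vregs[reg]
        self.memory[base:base + len(data)] = data

    def compare_gt(self, ra: str, rb: str) -> List[bool]:
        """Compare only the first `vcnt` elements of two vector registers."""
        a, b = self.vregs[ra], self.vregs[rb]
        return [a[i] > b[i] for i in range(self.vcnt)]


unit = ToyVectorUnit([1, 9, 3, 7, 4, 4, 4, 4, 0, 0, 0, 0])
unit.load("v0", base=0, length=4)
unit.load("v1", base=4, length=4)
unit.vcnt = 3                        # process three of the four elements
cond = unit.compare_gt("v0", "v1")
unit.store("v0", base=8)
```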

	Claim 1 is similarly rejected on the same basis as claim 10 above because claim 1 is the method claim which corresponds to the system of claim 10.

	In regards to claim 2, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The method of claim 1” (see rejection of claim 1 above) “wherein the loading of the one or more operand vectors comprises: loading a first operand vector stored in a first operand vector register and loading a second operand vector stored in a second” (Nickolls [0170-0176]:  wherein first and second operand vectors are loaded from a vector register file for processing (See Fig. 10)) “wherein the performing the conditional test operation on each element of at least one of the loaded one or more operand vectors comprises performing the [- 76 - Patent Application, Attorney Docket No. 120426-055000/US] conditional test operation on each element of the loaded first operand vector register and the loaded second operand vector register” (Krueger [0012 and 0019-0022]:  wherein the vector compare (conditional test operation) is performed on each vector element of the vector operands (Note: Nickolls is used for the loading and the overall combination of references teaches this limitation)) “according to the count stored in a counter register” (Nickolls [0010-0011 and 0171]:  wherein vector elements are processed according to a count register (element 1002) (Note:  Krueger is used to teach the conditional test operation (see [0019-0022]) and Nickolls is used to teach vector processing using a count register)) “and wherein the storing of the positions of the results in the vector of test results comprises storing of positions of elements of the loaded first and second operand vectors” (Krueger [0017-0020]:  wherein a first index vector register is used to store a position of the true results in the vector of test results (See Figs. 2B-3 and 5) | Bharadwaj [0063-0065 and 0073]:  wherein a second vector index register (v1) stores positions of false results which correspond to conditional results in a conditional mask register) “according to the count until the positions of the elements of the loaded first and second operand vectors are stored in the first operand vector register or the second operand vector register.” (Nickolls [0010-0011 and 0171]:  wherein vector elements are processed according to a count register (vcnt element 1002), and wherein the elements are processed until all elements indicated in the count have been reached (Note:  Krueger is used to teach the conditional test operation and storing positions according to a next_cond count (see Figs. 3-4) and Nickolls is used to teach vector processing using a count register.  Therefore the overall combination of references teaches the above limitation))
	
	In regards to claim 3, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The method of claim 2” (see rejection of claim 2 above) “further comprising: loading from at least one of the first vector index register, or the second vector index register, or a combination thereof, by the vector load-store unit, stored positions of the elements of the loaded first and second operand vectors” (Krueger [0021-0022]:  wherein vector indices from a true vector index are retrieved (loaded) in order to perform a permute operation | Bharadwaj [0073]:  wherein the false index register is retrieved (loaded) to perform a permute operation | Nickolls [0136-0139]:  wherein the vector load/store unit (element 1030) loads vectors (See Fig. 10) (Note: Nickolls teaches the vector load/store unit and therefore the overall combination of references teaches the above limitation)) “and iterating one or more vector operations over the elements of the loaded first and second operand vectors according to the loaded positions.” (Krueger:  See Fig. 3, element 306, wherein a permute operation is performed over vector operands according to a true index vector | Bharadwaj [0063 and 0073-0074]: wherein a permute operation is iterated over vector elements according to a false index vector | Nickolls [0010-0011 and 0171-0173]:  wherein elements are processed over iterations (Note: Nickolls teaches the iterative vector operations and therefore the overall combination of references teaches the above limitation))

	In regards to claim 11, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 10” (see rejection of claim 10) “wherein the first operand vector is a first input operand vector and the second operand vector is a second input operand vector” (Krueger [0012 and 0019-0022]:  wherein a first and second operand are a first and second input operand (See Fig. 5)) “wherein the vector load-store unit” (Nickolls:  See Fig. 10: vector load/store unit (element 1030)) “is configured to: load a second count from a second counter register” (Nickolls [0150 and 0155]:  wherein a vector register (element 1110) is an address register which is used to access the successive elements of a vector by striding through vector elements. Because the register is used to stride to each element of a vector it is used to access each element of a vector, and by accessing the total number of elements in the vector it is counting or determining the total number of elements in a vector) “and load, from the first vector index register, a stored position of respective elements of the loaded first and second input operand vectors according to the second count” (Krueger:  See Fig. 2C:  wherein the true index is read (loaded) in order to perform a permute operation (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above)) “and wherein the vector processor is configured to run a first operation over the respective elements of the loaded first and second input operand vectors according to the loaded position from the first vector index register.” (Krueger:  See Fig. 2C:  wherein the true index is read (loaded) in order to perform a permute operation over the input vector operands (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above in light of running an operation over the elements sequentially))
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the vector load/store unit of Nickolls to load the count in the count register.  It would have been obvious to one of ordinary skill in the art because it would have been applying a known technique (using a vector load/store unit to load a value) to a known device (vector load/store unit of Nickolls) ready for improvement to yield predictable results (a vector processor which includes a vector load/store unit which loads a count) for the benefit of using a single hardware unit to load vectors and count values in a processor which would reduce the amount of hardware needed in a processor, thereby reducing cost. (MPEP 2143, Example D)

	Claim 4 is similarly rejected on the same basis as claim 11 above because claim 4 is the method claim which corresponds to the system of claim 11.

	In regards to claim 12, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 11” (see rejection of claim 11 above) “wherein the vector load-store unit is configured to store the result of the first operation into an output operand vector register at a position that corresponds to the loaded position from the first vector index register.” (Krueger:  See Figs. 2C and 5:  wherein the result of the first permute operation is output into a vector (element 226), and wherein the results are stored at positions which correspond to the loaded positions from the true index (element 224) (Note:  the combination of references teaches the load/store unit loading values (See Nickolls [0139] and Fig. 10)))

	Claim 5 is similarly rejected on the same basis as claim 12 above because claim 5 is the method claim which corresponds to the system of claim 12.

	In regards to claim 13, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 12” (see rejection of claim 12 above) “wherein the vector processor is configured to:  continue to run the first operation over respective elements of the loaded first and second input operand vectors according to loaded positions from the first vector index register” (Krueger:  See Fig. 2C:  wherein true index is read (loaded) in order to perform a permute operation over the input vector operands (see [0019-0022]) (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above in light of running an operation over the elements sequentially)) “and to store the results of the first operation into the output operand vector register at the corresponding positions that match the loaded positions from the first vector index register” (Krueger:  See Figs. 2C and 5:  wherein the result of the first permute operation is output into a vector (element 226).  Wherein the results are stored at positions which correspond to the loaded positions from the true index (element 224)) “until the second count equals the length of the first index vector register, wherein the count is incremented per loaded position from the first index vector register” (Nickolls [0171-0175]:  wherein processing for vectors continues until vector count register (element 1110) has processed all vector elements which is indicated when the count of vector elements equals zero.  Wherein the vector count register (1110) is incremented per element of the vector (Note:  Krueger is used to teach a vector of true indices, while Nickolls is used to teach iterative vector processing)) “and resetting the second count when the second count equals the length of the first index vector register.” (Nickolls [0175]:  wherein the vector register (element 1110) is written with a new current value (i.e. reset) once the register has reached the length of the vector which is being processed)
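For illustration only, the count-driven iteration recited in claim 13 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not a characterization of what Krueger or Nickolls discloses:  the function name run_indexed_op, the use of Python lists to stand in for vector registers, and the choice of addition as the first operation are all hypothetical.

```python
# Illustrative sketch only; names and data layout are hypothetical.
def run_indexed_op(op, first_input, second_input, index_register, output):
    """Apply op at each position stored in index_register, then reset the count."""
    count = 0  # hypothetical "second count"
    while count != len(index_register):
        position = index_register[count]          # load stored position per the count
        output[position] = op(first_input[position], second_input[position])
        count += 1                                # incremented per loaded position
    count = 0  # reset once the count equals the index register length
    return count


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0, 0, 0, 0]
final_count = run_indexed_op(lambda x, y: x + y, a, b, [0, 2], out)
```

In this sketch only the positions listed in the index register are written, so the remaining output elements keep their prior values.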

	Claim 6 is similarly rejected on the same basis as claim 13 above because claim 6 is the method claim which corresponds to the system of claim 13.

	In regards to claim 14, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 13” (see rejection of claim 13 above) “wherein, subsequent to resetting the second count by the vector processor, the vector load-store unit is configured to load, from the second vector index register, a stored position of respective elements of the loaded first and second input operand vectors according to the second count” (Bharadwaj [0073]:  wherein false index values are read (loaded) in order to perform a permute operation over the vector elements (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above; for example after resetting the register element 1110 a second vector is processed according to element 1110)) “and wherein the vector processor is configured to run a second operation over the respective elements of the loaded first and second input operand vectors according to the loaded position from the second vector index register.” (Bharadwaj [0073]:  wherein false index values are read (loaded) in order to perform a permute operation over the input vector operands (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above in light of running an operation over the elements sequentially))


	Claim 7 is similarly rejected on the same basis as claim 14 above because claim 7 is the method claim which corresponds to the system of claim 14.



	In regards to claim 15, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 14” (see rejection of claim 14 above) “wherein the vector processor is configured to continue to run the second operation over respective elements of the loaded first and second input operand vectors according to loaded positions from the second vector index register” (Bharadwaj [0064 and 0073]:  wherein false index values are read (loaded) in order to perform a permute operation over the vector operands according to the positions of the false index vector (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above in light of running an operation over the elements sequentially)) “and to store the results of the second operation into the output operand vector register at the corresponding positions that match the loaded positions from the second vector index register” (Bharadwaj [0064 and 0073]:  wherein the result of the permute operation is output into a destination register.  Wherein the results are stored at positions which correspond to the loaded positions from the false index) “until the second count equals the length of the second vector index register, wherein the count is incremented per loaded position from the second vector index register” (Nickolls [0171-0175]:  wherein processing for vectors continues until vector register (element 1110) has processed all vector elements which is indicated when the count of vector elements equals zero.  Wherein the vector register (1110) is incremented per element of the vector (Note:  Bharadwaj is used to teach a vector of false indices, while Nickolls is used to teach iterative vector processing))



	Claim 8 is similarly rejected on the same basis as claim 15 above because claim 8 is the method claim which corresponds to the system of claim 15.


	In regards to claim 16, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 15” (see rejection of claim 15).
	The overall combination of Krueger, Bharadwaj and Nickolls thus far does not teach “wherein the first operation comprises addition and the second operation comprises subtraction.”  Krueger and Bharadwaj teach the first and second operations being permutation operations.	
	However, Bharadwaj teaches “wherein the first operation comprises addition and the second operation comprises subtraction.”  ([0028]:  wherein an addition and a subtraction operation are disclosed)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the first and second operations performed by Krueger and Bharadwaj to be an addition and a subtraction operation.  It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (performing an addition and a subtraction operation) for another (performing two permutation operations) for the benefit of added flexibility by allowing various vector operations to be performed. (MPEP 2143, Example B)


	Claim 9 is similarly rejected on the same basis as claim 16 above because claim 9 is the method claim which corresponds to the system of claim 16.

	In regards to claim 17, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The system of claim 15” (see rejection of claim 15) “wherein the first vector index register and the second vector index register” (Krueger:  See Fig. 2B:  wherein true index register (element 208) is disclosed; Bharadwaj:  See Fig. 9:  wherein a false index register (v1) is disclosed) “and wherein the vector load-store unit” (Nickolls:  See Fig. 10, wherein a vector load/store unit is disclosed) “is configured to: store, at a top-most unfilled position in the vector index register for each TRUE result of the TRUE results of the conditional test operation a position of the TRUE result in the vector of test” (Krueger [0016-0018]:  wherein indices of true results are stored at the top most unfilled position because the index is formed using a vector compress to the right function (See Fig. 2B)) “and store, for each FALSE result of the FALSE results of the conditional test operation, a position of the FALSE result in the vector of test results.” (Bharadwaj [0073]:  wherein false index corresponding to a false result is stored in a position of a false result in a register (v1))
	The overall combination of Krueger, Bharadwaj and Nickolls thus far does not teach “wherein the first vector index register and the second vector index register are part of one combined vector index register; and store, at a bottom-most unfilled position in the combined vector index register for each FALSE result of the FALSE results of the conditional test operation, a position of the FALSE result in the vector of test results.”  Krueger does teach storing index results in either a bottom or top of a vector index register for true indices (See Krueger [0017-0022]) and Bharadwaj teaches storing false indices in a vector index register.  
	However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the index registers of Krueger and Bharadwaj to be a combined index register to store true and false indices, by using vector compress operations of Krueger to fill the index register with true and false indices.  It would have been obvious to one of ordinary skill in the art because one would see that in Krueger, if not all results are true, the rest of the empty spaces of the index vector are filled with a fill vector, and would see that the empty spaces could instead be filled with false indices, as taught in Bharadwaj, using a vector compress to the left operation.  Therefore, it would have been obvious to modify the index registers of Krueger and Bharadwaj to be a combined index register to store both true and false indices for the benefit of saving register space by eliminating the need for separate true and false index registers. It would have further been obvious to one of ordinary skill in the art because generating the combined index vector using various compress functions can add flexibility to index generation.
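For illustration only, a combined index register of the kind discussed above can be sketched as follows.  This is a minimal sketch under stated assumptions and does not reproduce any reference's implementation:  the function name build_combined_index is hypothetical, a Python list stands in for the register, and slot 0 is assumed to be the "top-most" position while the last slot is assumed to be the "bottom-most" position.

```python
# Illustrative sketch only; the slot orientation (top = slot 0) is an assumption.
def build_combined_index(test_results):
    """Store TRUE positions from the top down and FALSE positions from the bottom up."""
    length = len(test_results)
    combined = [None] * length
    top = 0               # next unfilled top-most slot (TRUE positions)
    bottom = length - 1   # next unfilled bottom-most slot (FALSE positions)
    for position, result in enumerate(test_results):
        if result:
            combined[top] = position
            top += 1
        else:
            combined[bottom] = position
            bottom -= 1
    return combined, top  # top also equals the number of TRUE results


combined, true_count = build_combined_index([True, False, True, True, False])
```

Because every test result is either true or false, the two fill pointers meet exactly when the register is full, so no fill values are needed in this sketch.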

	In regards to claim 18, Krueger teaches “A vector processor comprising: a first operand vector register and a second operand vector register of a plurality of operand vector registers” (See Fig. 5:  wherein the combination of circuitry in Fig. 5 is a vector processor.  Wherein two vectors stored in the register file and used to perform a comparison at element 504 are disclosed.  Therefore the vectors are stored in vector registers ([0012-0013 and 0022])) “each operand vector register configured to store elements of an operand vector to be used as input for an operation of an ALU” (See Fig. 5:  wherein vector registers store operands to be used as inputs for ALU (element 504)) “the vector processor configured to perform a conditional test operation on each element of the first operand vector register and the second operand vector register, or perform a conditional test operation on an element in the scalar operand register and each element of the first operand vector register” ([0012, 0019 and 0027]:  wherein the ALU (element 504) performs the vector compare operation on each element of the operand vectors and provides a condition vector result (see Fig. 5)) “a first vector index register configured to store, for each result of a conditional test operation performed using the element indicating the element meeting a test in the conditional test operation, a position of the result identified by a count for the element” ([0017-0020 and 0024-0025]:  wherein a position (an index of a true result), identified by a next_cond count for the element in response to a result of the vector compare operation, is stored in a first vector index register in response to the result indicating the elements meet a test condition of the vector compare (i.e. that the result was true) (see Figs. 3-4 for clarity:  wherein Fig. 4 is performing a method which implements element 304 of Fig. 3)) “each of the corresponding positions of each operand vector register comprises an element of the operand vector to be operated upon by an ALU” ([0021-0022]:  wherein a vector permute operation is performed on elements in the first and second vector operands using the vector index.  Wherein the permute unit (element 508) is used to execute the instruction)
	Krueger does not teach “a scalar operand register configured to store an element to be used as input for an operation of an arithmetic logic unit (ALU)”,  “a vector processor configured to perform an operation on each element of the first operand vector register and the second operand vector register according to a count stored in a counter register”, “a second vector index register configured to store, for each result of the conditional test operation  performed using the element indicating the element not meeting the test in the conditional test operation, a position identified by the count for the element” nor “each of the positions addable to an effective address for accessing a corresponding position in each operand vector register”.  Krueger does teach generating a true index vector from conditional results and performing an operation on vector operands using the index vector.  However, Krueger does not teach generating a false vector index from conditional results and performing a secondary operation.
	Bharadwaj teaches “a scalar operand register configured to store an element to be used as input for an operation of an arithmetic logic unit (ALU)” ([0028]:  wherein a scalar register stores scalar elements for use in operations in execution clusters (element 160)) “a second vector index register configured to store, for each result of the conditional test operation  performed using the element indicating the element not meeting the test in the conditional test operation, a position identified by the count for the element” ([0063-0065 and 0073]:  wherein a second vector index register (v1) stores positions of false results which correspond to conditional results in a conditional mask register (Note:  Krueger teaches storing the positions according to a count and the overall combination of references teaches the limitation above)) “and each of the positions addable to an effective address for accessing a corresponding position in each operand vector register” ([0091-0093]:  wherein indexes are used in order to generate addresses by adding the positions (indices/indexes) to base addresses)	
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Krueger with the teachings of Bharadwaj which use conditional results to generate a false index vector in order to perform operations on vector elements which produce a false condition.  It would have been obvious to one of ordinary skill in the art because one of ordinary skill would identify that Krueger produces true and false conditional results using a vector compare operation and then generates a vector index for the true results and would see that Bharadwaj generates a vector index for false results based on conditional results.  Therefore, it would have been obvious to one of ordinary skill in the art because it would have expanded the dynamic capabilities of Krueger by allowing a true and false index vector to be generated in order to perform vector operations on vector operands which caused true and false conditions.  It would have been further used for the benefit of vectorising loops with loop carried dependences (See Bharadwaj [0065]).
	It would have further been obvious to one of ordinary skill in the art before the effective filing date to modify Krueger to include scalar operands and use indexes to calculate addresses as taught in Bharadwaj.  It would have been obvious to one of ordinary skill in the art because using scalar operands in addition to vector operands can add flexibility to a processor by allowing it to execute operations on various datatypes.  It would have also added flexibility to the addressing scheme provided in Krueger by using a base plus indexing addressing scheme. 
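For illustration only, a base plus index addressing scheme of the kind referred to above can be sketched as follows.  This is a minimal sketch under stated assumptions:  the function name effective_addresses and the element_size scaling parameter are hypothetical and are not drawn from Krueger or Bharadwaj.

```python
# Illustrative sketch only; element_size scaling is an assumption, not from the references.
def effective_addresses(base_address, positions, element_size=1):
    """Add each stored position (scaled by element size) to a base address."""
    return [base_address + position * element_size for position in positions]


addresses = effective_addresses(0x1000, [0, 2, 3], element_size=4)
```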
	The overall combination of Krueger and Bharadwaj thus far does not teach “a vector processor configured to perform an operation on each element of the first operand vector register and the second operand vector register according to a count stored in a counter register”. Krueger discloses vector processing using vector registers, as well as using counters to iteratively perform the index vector generation.  However, Krueger includes no discussion of using a counter to perform the conditional test operations between the vector operands.
	Nickolls discloses performing vector operations on vector elements according to a count stored in a count register ([0137 and 0169-0171]:  wherein the vector processor performs an operation on each element of the one or more vector operands according to a vector count register (element 1002) which indicates a count of the number of vector elements to be processed).  The combination would have a processor like Krueger which performs vector compare operations iteratively using a vector count register to track the number of vector elements to process as taught in Nickolls.
	It would have then been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Krueger and Bharadwaj to perform vector operations using a count register as the reconfigurable vector processor of Nickolls.  It would have been obvious to one of ordinary skill in the art because using a vector count register as the reconfigurable vector processor of Nickolls can be used for added flexibility in a vector processor (See Nickolls [0220-0221]).

	In regards to claim 19, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The vector processor of claim 18” (see rejection of claim 18) “wherein the first operand vector is a first input operand vector and the second operand vector is a second input operand vector” (Krueger [0012 and 0019-0022]:  wherein a first and second operand are a first and second input operand (See Fig. 5)) “wherein the vector processor is configured to: load a second count from a second counter register” (Nickolls [0150 and 0155]:  wherein a vector register (element 1110) is an address register which is used to access the successive elements of a vector by striding through vector elements. Because the register is used to stride to each element of a vector it is used to access each element of a vector, and by accessing the total number of elements in the vector it is counting or determining the total number of elements in a vector) “and load, from the first vector index register, a stored position of respective elements of the loaded first and second input operand vectors according to the second count” (Krueger:  See Figs. 2B-C:  wherein true index is read (loaded) in order to perform a permute operation (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above)) “load, from the second vector index register, a stored position of respective elements of the loaded first and second input operand vectors according to the second count” (Bharadwaj [0073]:  wherein false index is read (loaded) in order to perform a permute operation over the vector elements (Note:  Nickolls [0150 and 0155] is used to teach using a count register to iteratively process vector elements and the overall combination teaches the limitation above; for example after resetting the register element 1110 a second vector is processed according to element 1110)) “run a first operation over respective elements of the loaded first and second input operand vectors according to the loaded position from the first vector index register and to store the results of the first operation into an output operand vector register at the corresponding positions that match the loaded positions from the first vector index register” (Krueger:  See Figs. 2B-C:  wherein true index is read (loaded) in order to perform a permute operation over the input vector operands and store results in an output vector register at corresponding positions from index) “and run a second operation over respective elements of the loaded first and second input operand vectors according to loaded positions from the second vector index register” (Bharadwaj [0064 and 0073]:  wherein false index is read (loaded) in order to perform a permute operation over the vector operands according to the positions of the false index vector) “and to store the results of the second operation into the output operand vector register at the corresponding positions that match the loaded positions from the second vector index register.” (Bharadwaj [0064 and 0073]:  wherein the result of the permute operation is output into a destination register.  Wherein the results are stored at positions which correspond to the loaded positions from the false index)
 	The overall combination of Krueger, Bharadwaj and Nickolls thus far does not teach “running the first and second operations in parallel”.
	However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to perform the first and second operations in parallel.  It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (executing two operations in parallel) for another (executing two operations sequentially) for the benefit of reducing execution cycles. (MPEP 2143, Example B)


	In regards to claim 20, the overall combination of Krueger, Bharadwaj and Nickolls teaches “The vector processor of claim 19” (see rejection of claim 19).
	The overall combination of Krueger, Bharadwaj and Nickolls thus far does not teach “wherein the first operation comprises addition and the second operation comprises subtraction.”  Krueger and Bharadwaj teach the first and second operations being permutation operations.	
	However, Bharadwaj teaches “wherein the first operation comprises addition and the second operation comprises subtraction.”  ([0028]:  wherein an addition and a subtraction operation are disclosed)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the first and second operations performed by Krueger and Bharadwaj to be an addition and a subtraction operation.  It would have been obvious to one of ordinary skill in the art because it would have been the simple substitution of one known element (performing an addition and a subtraction operation) for another (performing two permutation operations) for the benefit of added flexibility by allowing various vector operations to be performed. (MPEP 2143, Example B)



Examiner Notes
10.	The examiner notes that claims 1, 10-11, 14 and 17 use the language “loading by a vector load-store unit” or “a vector load-store unit of the vector processor, configured to: load/store”, which some may construe as invoking 35 U.S.C. 112(f). However, the limitations do not invoke 112(f) because the limitations fail the 3-prong test, as the term “vector load/store unit” would be understood by persons of ordinary skill in the art to have a sufficiently definite meaning as a name for structure. (See MPEP 2181 I (A))
Response to Arguments
11.	Applicant argues the double patenting rejections on page 11 of the remarks, in the substance that:
	“Without acquiescing to the propriety of the rejection, Applicant intends to consider a Terminal Disclaimer at such time as allowable subject matter is identified.”

	The examiner respectfully acknowledges that the applicant will consider the double patenting rejections at the time allowable subject matter is identified.  Therefore, the double patenting rejections will remain until such time. 

12.	Applicant's arguments filed on 3/02/2021 have been fully considered but they are not persuasive. Therefore the previous rejection of claim 1 under 35 USC 103 in view of Krueger, Bharadwaj and Nickolls is maintained.
	Claims 10 and 18 are argued for at least the same reasons as claim 1 above and therefore remain rejected for at least the same reasons as rejected claim 1 above.
	The dependent claims 2-9, 11-17 and 19-20 are argued at least based on virtue of their dependencies upon rejected claims 1, 10 and 18 above, and therefore remain rejected for being dependent on the rejected claims 1, 10 and 18 above.


13.	Applicant argues the 35 USC 103 rejection of similar independent claims 1, 10 and 18, on page 12-13 of remarks, in substance that:
	“That is, claim 1 can achieve the combined first and second vector registers in a more efficient manner than Krueger and Bharadwaj collectively. Accordingly, Krueger as modified by Bharadwaj does not teach or suggest every feature from claim 1. Nickolls does not remedy the deficiencies of Krueger and Bharadwaj. 
	For at least the above reasons, claim 1 is patentable over Krueger, Bharadwaj and Nickolls. Claims 10 and 18, though different in scope, are patentable over Krueger, Bharadwaj, and Nickolls for at least the same reasons as claim 1. The remaining claims are dependent on claim 1, 10, or 18 and are therefore patentable at least by virtue of their dependencies. Applicant does not concede the correctness of the rejection for claim features not discussed. Applicant requests favorable reconsideration of the claims and withdrawal of the rejection.”

	The examiner first notes that it appears the applicant is stating that Krueger and Bharadwaj can achieve the combined first and second vector registers as claimed in claim 1, and further states that claim 1 achieves the combined first and second vector registers in a more efficient manner.  However, it does not matter that Krueger and Bharadwaj achieve the claimed combined first and second vector registers in a less efficient manner, if the more efficient manner of achieving the combined first and second vector registers is not claimed.  The applicant argues that claim 1 achieves the vector registers in a more efficient manner, but does not indicate which portion of the claims recites the more efficient manner.  Therefore, it is unclear to the examiner which portion of the claim indicates a more efficient manner of generating vector index registers.  The examiner suggests that, in the next response, the applicant amend the claims to further distinguish the more efficient manner of generating index registers, as well as explain in the remarks the distinguishing features that are more efficient.
	At best the examiner believes the applicant is attempting to claim that storing positions in the first and second index vector registers is done by identifying a position to be stored, in a vector index register, using a count value (see newly amended claim 1 which states “storing, by the vector processor, a position, identified by the count for the element in response to a result of the conditional test operation performed using the element, in: a first vector index register …a second vector index register”), and that somehow using a count value is a more efficient manner.  However, the examiner respectfully disagrees.
	For example, Krueger, Fig. 4 and paragraph [0024], disclose using a next_cond value to identify positions of a condition register (i.e. register which stores results of compare operation which is conditional test operation) which are to be stored in a vector index register.  This next_cond value can be considered a counter used to store values in a first index register because the value counts/tracks the location in the condition register that is currently being used to make a load determination into the first index register.  The combination of Krueger and Bharadwaj would then teach storing values in a first and second vector index register using a variable such as the next_cond value to track which positions are to be stored in respective first and second vector index registers (depending upon if the result meets the condition as taught in Krueger or if it does not meet the condition as taught in Bharadwaj), and the overall combination of references would teach the newly amended claim limitations.  
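For illustration only, the counter-driven storing of positions described above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not a reproduction of Krueger Fig. 4 or of Bharadwaj:  the function name split_indices is hypothetical, Python lists stand in for the condition register and the two vector index registers, and the variable next_cond is used here only as a running position counter.

```python
# Illustrative sketch only; not a reproduction of any reference's figure or algorithm.
def split_indices(condition_register):
    """Walk the condition results with a counter and record each position by outcome."""
    first_index_register = []   # positions whose result met the test (TRUE)
    second_index_register = []  # positions whose result did not meet the test (FALSE)
    next_cond = 0               # counter tracking the current condition position
    while next_cond < len(condition_register):
        if condition_register[next_cond]:
            first_index_register.append(next_cond)
        else:
            second_index_register.append(next_cond)
        next_cond += 1
    return first_index_register, second_index_register


true_positions, false_positions = split_indices([True, False, False, True])
```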

Conclusion
14.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P CARMICHAEL-MOODY whose telephone number is (571)431-0692.  The examiner can normally be reached on M-F, 10am-7pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/COURTNEY P CARMICHAEL-MOODY/           Examiner, Art Unit 2183