Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
1.  Applicant's arguments filed May 4th, 2021, with respect to the 35 USC 102 rejection of claim 9 and the 35 USC 103 rejections of claims 1-8 and 10-20 have been fully considered but they are not persuasive.
The arguments regarding claim 9 are directed toward limitations of the claim added via amendment.  Therefore, they will be addressed in the rejection below.

Regarding claim 1, Applicant argues that Chen and Biscondi fail to teach “memory banks that are dedicated to respective VSPs” as “the VGPRs 110a-d do not exclusively send data to a given VSP in Chen and thus are not ‘dedicated’ as defined”.
In response to the above argument, Examiner respectfully disagrees.  In the Office Action dated January 21st, 2021, the claimed “VSPs” of claims 1 and 9 are equated to the “super-SIMDs 200a-d” disclosed by Chen (see 1/21/21 Office Action rejections of claims 1 and 9).  While Applicant argues that, for example, the crossbar 330 displayed in Figure 2 shows that the VGPRs are not “dedicated” as defined by Applicant, crossbar 330 is contained entirely within a single super-SIMD, not used as a connection between multiple super-SIMDs 200a-d.  Figure 3 of Chen, relied upon in the previous rejection, clearly shows each VSP (super-SIMD 200a, b, c, or d) containing a dedicated set of memory banks (VGPR banks 110a-d, one set of banks per super-SIMD).  Therefore, while each VGPR bank may transmit between multiple components of a single super-SIMD, each super-SIMD (equated to the claimed VSPs) includes a plurality of memory banks which are “dedicated” to that particular super-SIMD (see Chen Figure 3).  Therefore, Applicant’s arguments are not considered persuasive and the rejections are maintained.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


2.  Claim 9 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al (US 2018/0121386, herein Chen).

Regarding claim 9, Chen teaches a processor, comprising:
a plurality of vector sub-processors (VSPs) (Fig 3, [0059], super-SIMDs 200a-d); and
a plurality of memory banks dedicated to respective VSPs of the plurality of VSPs (Figs 1A & 3, [0021], VGPRs 110a-d), wherein a first memory bank dedicated to a first VSP of the plurality of VSPs comprises:
a plurality of operand gathering components configured to be assigned to individual threads and to store operands for the assigned individual threads while the threads are assigned to the first VSP (Chen Figs 1A & 2, [0033], source operand flip-flops storing operands & read crossbar 330, [0034], [0038], per-thread VGPRs).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.  Claims 1-8 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (US 2018/0121386, herein Chen) in view of Biscondi et al (US 2009/0254718, herein Biscondi).

Regarding claim 1, Chen teaches a processor, comprising:
a plurality of vector sub-processors (VSPs) (Fig 3, [0059], super-SIMDs 200a-d); and

a first plurality of vector general purpose register (VGPR) banks (Fig 1A, [0021], VGPR banks 110a, 110b, any number of VGPRs can be utilized); and
a second plurality of VGPR banks corresponding to the first plurality of VGPR banks (Fig 1A, [0021], VGPR banks 110c, 110d, any number of VGPRs can be utilized).
	Chen fails to teach wherein the VGPR banks are partitioned into high and low VGPR banks.
	Biscondi teaches a processor, comprising:
	a plurality of memory banks dedicated to a respective vector processor (Fig 5, [0040], vector memory banks) comprising a first plurality of high vector general purpose (VGPR) banks and a first plurality of low VGPR banks corresponding to the plurality of high VGPR banks (Fig 9, [0057-0059], [0064], concatenated register pairs split high and low order bits between adjacent banks of vector registers).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Chen and Biscondi to utilize register pairs for operands that are wider than a single vector register of the processor.  While Chen teaches each vector sub-processor utilizing four or more adjacent vector register banks, Chen does not explicitly contemplate these register banks being utilized in a paired or concatenated manner.  However, Chen does disclose a vector execution unit reading multiple VGPRs as source operands based on the SIMD width (Chen [0022]).  As wide operands are a routine and conventional aspect of SIMD and vector processors in the art, splitting the register banks between high and low pairs to hold wide operands, as disclosed by Biscondi, would be an obvious means of implementing wide vector operands.  Doing so may increase the functionality of the SIMD processor, and would merely entail a combination of known prior art elements to achieve predictable results.
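	For illustration only, the high/low pairing discussed above can be sketched as follows. This is a minimal sketch and is not taken from Chen or Biscondi: the 32-bit register width, the dictionary-based banks, and the names split_wide_operand, join_wide_operand, write_pair, and read_pair are all assumptions made for the example.

```python
REG_BITS = 32                 # assumed register width (not from either reference)
MASK = (1 << REG_BITS) - 1    # mask selecting one register's worth of bits

def split_wide_operand(value):
    """Split an operand of up to 2*REG_BITS bits into (high, low) halves."""
    if not 0 <= value < (1 << (2 * REG_BITS)):
        raise ValueError("operand does not fit in a register pair")
    return (value >> REG_BITS) & MASK, value & MASK

def join_wide_operand(high, low):
    """Concatenate the high-order and low-order halves into one wide operand."""
    return (high << REG_BITS) | low

# Hypothetical paired banks: entry i of high_bank corresponds to entry i of low_bank.
high_bank = {}
low_bank = {}

def write_pair(index, value):
    """Store the high-order bits in one bank and the low-order bits in its pair."""
    high_bank[index], low_bank[index] = split_wide_operand(value)

def read_pair(index):
    """Read both halves and reassemble the wide operand."""
    return join_wide_operand(high_bank[index], low_bank[index])
```

	Under these assumptions, a 64-bit operand written with write_pair occupies one entry in each bank of the pair, and read_pair returns the original value.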

	Regarding claim 2, the combination of Chen and Biscondi teaches the processor of claim 1, further comprising a second memory bank dedicated to a second VSP of the plurality of VSPs, wherein the second memory bank comprises: a second plurality of high VGPR banks; and a second plurality of low VGPR banks corresponding to the second plurality of high VGPR banks (Chen Figs 1A & 3, VGPRs 110a-d of second super-SIMD 200b & Biscondi Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).

Regarding claim 3, the combination of Chen and Biscondi teaches the processor of claim 1, further comprising a broadcast switch configured to broadcast operands between the plurality of VSPs (Chen [0026-0027], operand delivery network 240).

Regarding claim 4, the combination of Chen and Biscondi teaches the processor of claim 1, wherein the first memory bank further comprises a plurality of operand gathering components corresponding to VGPR banks of the first VSP, wherein a first operand gathering component is configured to store a first plurality of operands from a corresponding high VGPR bank and to store a second plurality of operands from a corresponding low VGPR bank (Chen Figs 1A & 2, input multiplexors 105 & read crossbar 330, Biscondi Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).

Regarding claim 5, the combination of Chen and Biscondi teaches the processor of claim 4, further comprising a phase multiplexer of the first VSP, wherein the phase multiplexer is configured to provide operands from the first operand gathering component to an arithmetic logic unit (ALU) of the first VSP (Chen Fig 1A, input multiplexers 105).

Regarding claim 6, the combination of Chen and Biscondi teaches the processor of claim 1, further comprising a scheduler configured to assign threads to individual VSPs of the plurality of VSPs (Chen [0014], [0050], scheduler).

Regarding claim 7, the combination of Chen and Biscondi teaches the processor of claim 6, wherein dedicating a first thread to the first VSP comprises identifying a first high VGPR bank of the first VSP and a first low VGPR bank of the first VSP to store data of the first thread (Chen [0034], [0038], per-thread VGPRs & Biscondi Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).

Regarding claim 8, the combination of Chen and Biscondi teaches the processor of claim 7, wherein identifying the first high VGPR bank is based on at least a portion of an address of the first thread (Chen [0038], [0064], VGPR addressing & Biscondi [0043], vector memory addressing & Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).
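
For illustration only, identifying a bank pair from a portion of a thread address can be sketched as follows. This is a hypothetical sketch, not the addressing scheme of Chen or Biscondi: the count of four bank pairs, the use of the low-order address bits, and the name select_bank_pair are assumptions made for the example.

```python
NUM_BANK_PAIRS = 4  # assumed number of high/low bank pairs per VSP (illustrative)

def select_bank_pair(thread_address):
    """Pick a high/low bank pair from the low-order bits of a thread address."""
    if thread_address < 0:
        raise ValueError("address must be non-negative")
    pair_index = thread_address % NUM_BANK_PAIRS  # only a portion of the address is used
    return {"high_bank": pair_index, "low_bank": pair_index}
```

Under these assumptions, only the low-order bits of the address determine the pair, so addresses that differ only in their upper bits map to the same high and low banks.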

Regarding claim 10, Chen teaches the processor of claim 9, wherein a first operand gathering component of operand gathering components comprises a first storage component configured to store a first operand and second operand from a VGPR bank of the first memory bank (Chen Figs 1A & 2, input multiplexors 105 & read crossbar 330, [0034], [0038], per-thread VGPRs).
Chen fails to teach wherein the VGPR banks are partitioned into high and low VGPR banks.
	Biscondi teaches a processor, comprising:
	a plurality of memory banks dedicated to a respective vector processor (Fig 5, [0040], vector memory banks) comprising a first plurality of high vector general purpose (VGPR) banks and a first plurality of low VGPR banks corresponding to the plurality of high VGPR banks (Fig 9, [0057-0059], [0064], concatenated register pairs split high and low order bits between adjacent banks of vector registers).
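	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Chen and Biscondi to utilize register pairs for operands that are wider than a single vector register of the processor.  While Chen teaches each vector sub-processor utilizing four or more adjacent vector register banks, Chen does not explicitly contemplate these register banks being utilized in a paired or concatenated manner.  However, Chen does disclose a vector execution unit reading multiple VGPRs as source operands based on the SIMD width (Chen [0022]).  As wide operands are a routine and conventional aspect of SIMD and vector processors in the art, splitting the register banks between high and low pairs to hold wide operands, as disclosed by Biscondi, would be an obvious means of implementing wide vector operands.  Doing so may increase the functionality of the SIMD processor, and would merely entail a combination of known prior art elements to achieve predictable results.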

Regarding claim 11, the combination of Chen and Biscondi teaches the processor of claim 10, wherein the first operand gathering component is configured to receive the first operand and the second operand concurrently (Chen [0022]).

Regarding claim 12, the combination of Chen and Biscondi teaches the processor of claim 10, wherein the first operand gathering component is configured to provide the first operand and the second operand to a phase multiplexer of the first VSP (Chen Figs 1A & 2, input/source muxes).

Regarding claim 13, the combination of Chen and Biscondi teaches the processor of claim 12, wherein the phase multiplexer of the first VSP is configured to provide the first operand and the second operand to an arithmetic logic unit (ALU) of the first VSP (Chen Figs 1A & 2, ALUs).

Regarding claim 14, the combination of Chen and Biscondi teaches the processor of claim 13, further comprising a broadcast switch configured to broadcast operands between the plurality of VSPs (Chen [0026-0027], operand delivery network 240).

Regarding claim 15, the combination of Chen and Biscondi teaches the processor of claim 14, wherein the first VSP is configured to send a matrix multiplication input via the broadcast switch (Biscondi [0030], matrix algebra).

Regarding claim 16, Chen teaches a method, comprising:
receiving, by a scheduler of a processor, an address of a first thread (Figs 1A & 3, [0014], [0050], scheduler);
assigning, by the scheduler, the first thread to a first vector sub-processor (VSP) of a plurality of VSPs of the processor (Figs 1A & 3, [0014], [0034], [0050], scheduling threads); and
sending, by the scheduler, the first thread to a first vector general purpose register (VGPR) bank that is dedicated to the first VSP ([0034], [0038], per-thread VGPRs).
Chen fails to teach wherein the VGPR banks are partitioned into high and low VGPR banks.
	Biscondi teaches a method, comprising:
	partitioning a plurality of memory banks dedicated to a respective vector processor (Fig 5, [0040], vector memory banks) into high vector general purpose (VGPR) banks and low VGPR banks corresponding to the plurality of high VGPR banks (Fig 9, [0057-0059], [0064], concatenated register pairs split high and low order bits between adjacent banks of vector registers).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Chen and Biscondi to utilize register pairs for operands that are wider than a single vector register of the processor.  While Chen teaches each vector sub-processor utilizing four or more adjacent vector register banks, Chen does not explicitly contemplate these register banks being utilized in a paired or concatenated manner.  However, Chen does disclose a vector execution unit reading multiple VGPRs as source operands based on the SIMD width (Chen [0022]).  As wide operands are a routine and conventional aspect of SIMD and vector processors in the art, splitting the register banks between high and low pairs to hold wide operands, as disclosed by Biscondi, would be an obvious means of implementing wide vector operands.  Doing so may increase the functionality of the SIMD processor, and would merely entail a combination of known prior art elements to achieve predictable results.


Regarding claim 17, the combination of Chen and Biscondi teaches the method of claim 16, further comprising: receiving, by the scheduler of the processor, an address of a second thread; assigning, by the scheduler, the second thread to a second VSP of the plurality of VSPs; and sending, by the scheduler, the second thread to a second high VGPR bank and a second low VGPR bank, wherein the second high VGPR bank and the second low VGPR bank are dedicated to the second VSP (Chen [0013], multithreading & [0034], per-thread VGPRs & [0014], [0050], scheduling & Biscondi Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).

Regarding claim 18, the combination of Chen and Biscondi teaches the method of claim 16, further comprising storing, by a first operand gathering component dedicated to the first VSP, a first plurality of operands for the first thread from the first VGPR bank and a second plurality of operands for the first thread from the first low VGPR bank (Chen Figs 1A & 2, input multiplexers & crossbar 330, Biscondi Fig 9, [0057-0059], [0064], paired register banks for high and low order bits).

Regarding claim 19, the combination of Chen and Biscondi teaches the method of claim 18, further comprising selectively sending, by a phase multiplexer of the first VSP, operands from a plurality of operand gathering components dedicated to the first VSP including the first operand gathering component to an arithmetic logic unit (ALU) of the first VSP (Chen Figs 1A, 2, ALUs).

Regarding claim 20, the combination of Chen and Biscondi teaches the method of claim 19, further comprising sending, by the first VSP to a broadcast switch of the processor, double-precision thread data based on the ALU operating on the operands from the phase multiplexer, wherein the first plurality of operands are sent from the first high VGPR bank to the ALU of the VSP without being sent to the broadcast switch (Chen [0027], [0036], operand delivery network 240 & Biscondi Fig 9, [0057-0059], [0064]).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105.  The examiner can normally be reached on Monday-Friday 7:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J METZGER/
Primary Examiner, Art Unit 2182