DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This final office action is responsive to the amendment filed on 10/03/2022.
Claims 1-3, 5, 7-16, 18, 20-26 are pending.

Response to Amendment

Applicant has amended independent claims 1 and 14 and dependent claims 2-3, 5, 7-8, 15-16, 18, and 20-21 to include new/old limitations in a form not previously presented, necessitating new search and consideration. Claims 4, 6, 17, and 19 have been canceled.



Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



Claims 1-3, 5, 7-16, 18, and 20-26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.


The following claim language is not clearly understood:

Claim 1 lines 9-10 recite “compute fabric multiple threads” without clearly reciting whether these threads are associated with the compute nodes, the processor, each or some of the compute nodes, the token buffers, or the entire fabric.

Claim 1 line 12 recites “token buffers” without clearly reciting what is being referred to as “token buffers” (i.e. whether a token buffer is only a buffer/memory or whether the “token” is associated with a certain event/thread).

Claim 1 lines 11-15 recite “operand being assigned in the token buffers to the threads that are to use the operands”. It is unclear whether the operands in the same token buffers are assigned to each thread or whether threads and operands are mapped one-to-one.

Claim 11 line 3 recites “valid range” without clearly reciting what constitutes the valid range (i.e. which range is valid and which range is invalid).

Claim 14 recites elements of claim 1 and has similar deficiencies as claim 1. Therefore, it is rejected for the same rationale. The remaining dependent claims are also rejected due to their dependency on the rejected independent claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claim(s) 1-3, 5, 7, 14-16, 18 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanstein et al. (US 2009/0070552 A1, hereafter Kanstein) in view of Fleming, Jr. et al. (US 2019/0205269 A1, hereafter Fleming).

Kanstein was cited in the last office action.

As per claim 1, Kanstein teaches the invention substantially as claimed including a processor (fig. 1 processor 11), comprising: 
a compute fabric, comprising an array of compute nodes ([0077] fig. 1 functional units FUs 13 Register files RFs 14 configurable array 12 ADRES array) and interconnects that configurably connect the compute nodes ([0080] routing resources 16 interconnection FUs 13 Register files RFs 14 [0088] interconnect arrangement); and 
a controller, configured to configure at least some of the compute nodes in the array and at least some of the interconnects in the compute fabric to execute specified code instructions (fig 2 controllers 26-28 program counters 21-23 instruction fetch 29 partitions 17-19 fig. 1 partitions 17-19 associated FUs Register files RFs 14 [0007] control modules, assigned, one of the processing units for control [0010] control module, perform, operations i.e. increment/change on program counter [0047] code partitions, processing unit, executed [0088] every partition, array mode, control module, interconnect arrangement), and to send to the compute fabric multiple threads, each of the threads executing the specified code instructions ([0032] application, plurality of threads [0033] assignment, one or more threads, processing units [0084] fig. 1 partitions 17-19, threads, executed, parallel, combination of partitions to execute a threads [0085] array mode, plurality of threads, run, parallel, plurality of non-overlapping processing units),

wherein at least some of the compute nodes in the array comprise token buffers configured to receive and store operands for use in executing the code instructions ([0011] data storage, registers, provided for each processing unit [0076] ADRES, computational, storage, routing resources, data storage, register files RFs 14, used to store intermediate data [0013] data content, processing unit, thread, data [0077] fig. 1 functional units FUs 13 Register files RFs 14 configurable array 12 ADRES array [0080] results of the FUs can be written to data storage such as distributed RF, results of the FUs can be routed to other FUs 13, output, buffered, output register, route data from different sources [0081] ADRES instance, functional units 13, data storage, registers [0047] code partitions, processing unit, executed [0088] every partition, array mode, control module, interconnect arrangement), the operands being assigned in the token buffers to the threads that are to use the operands ([0012] process threads, executable, processing units [0013] data content, which processing unit functions of a thread are to be mapped or data, functional units [0020] single threads/multi thread approach [0033] assignment, one or more threads, processing units), 

wherein a compute node among the compute nodes in the array is configured to execute a code instruction for a first thread ([0012] process threads, executable, processing units [0077] fig. 1 functional units FUs 13 Register files RFs 14 configurable array 12 ADRES array [0085] plurality of threads, run, parallel, non-overlapping processing units, control module), to transfer a result of the code instruction to a token buffer, and to store the result in the token buffer including assigning the result in the token buffer to a second thread different from the first thread, for use as an operand by the second thread ([0080] result, FUs 13, written to data storage RFs 14, result of the FUs 13 can be routed to other FUs 13 [0039] distributing outputs to their assigned partition [0071] [0033] assignment, one or more threads, processing units [0084] fig. 1 partitions 17-19, threads, executed, parallel).

Kanstein does not specifically teach token buffers to store operands for executing the code, or assigning the result in the token buffer to a second thread different from the first thread.

Fleming, Jr., however, teaches token buffers to store operands for executing the code ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands), assigning the result in the token buffer to a second thread different from the first thread ([0180] producer element, send output from output buffer to respective input buffer of plurality of consumer e.g. receiving processing elements), for use as an operand by the second thread ([0606] each core, multithreading, each core, each thread).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kanstein with the teachings of Fleming, Jr. of a data path carrying a data input value, a token stored in the input buffer, a producer element sending output from an output buffer to input buffers of consumer processing elements, and each thread executing on each core, in order to improve efficiency and allow token buffers to store operands for executing the code and assigning the result in the token buffer to a second thread different from the first thread, in the method of Kanstein, as in the instant invention. 

As per claim 2, Kanstein teaches wherein transfer of the result is to the buffer of the compute node that executes the code instruction ([0080] result, FUs 13, written to data storage RFs 14, result, routed). 
Fleming, Jr. teaches the remaining claim elements of the token buffer ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands).

 
As per claim 3, Kanstein teaches wherein the compute node that executes the code instruction is configured to transfer the result to the buffer of a different compute node in the fabric ([0080] result, result of the FUs 13 can be routed to other FUs, routed). 
Fleming, Jr. teaches the remaining claim elements of the token buffer ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands).


As per claim 5, Kanstein teaches wherein the buffer is formed of a cascade of two or more buffers of two or more of the compute nodes ([0080] result, FUs 13, written to data storage distributed RFs 14 shared storage register RF’ 15, global data storage shared between a plurality of functional units 13). 
Fleming, Jr. teaches the remaining claim elements of the token buffer ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands).


As per claim 7, Kanstein teaches wherein the compute node is configured to transfer the result by saving the result to a buffer that comprises multiple slots ([0080] result, FUs 13, written to data storage distributed RFs 14 [0116] clustered register file, clustered, each thread has its own register), each slot assigned to a respective thread ([0116] clustered register file, clustered, each thread has its own register), and to associate the result with the second thread by saving the result to a slot assigned to the second thread ([0080] result, FUs 13, written to data storage distributed RFs 14 RF’ 15 data storage shared between plurality of functional units, result of the FUs 13 can be routed to other FUs 13 [0032] [0033] assignment, one or more threads, processing units [0084] [0039] distributing, outputs, assigned partitions [0071] output of device A is directly connected to an input of device B [0116] dual thread execution, shadow register file).
Fleming, Jr. teaches the remaining claim elements of the token buffer ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands).

Claim 14 recites a method for elements of claim 1. Therefore, it is rejected for the same rationale.
Claim 15 recites a method for elements of claim 2. Therefore, it is rejected for the same rationale.
Claim 16 recites a method for elements of claim 3. Therefore, it is rejected for the same rationale.
Claim 18 recites a method for elements of claim 5. Therefore, it is rejected for the same rationale.
Claim 20 recites a method for elements of claim 7. Therefore, it is rejected for the same rationale.

Claim(s) 8, 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanstein in view of Fleming, Jr., as applied to above claims, and further in view of Archer et al. (US 2013/0074097 A1, hereafter Archer).
Archer was cited in the last office action.


As per claim 8, Kanstein teaches wherein the compute node is configured to assign the result to the second thread by transferring to the buffer the result ([0080] result, FUs 13, written to data storage distributed RFs 14 RF’ 15 data storage shared between plurality of functional units, result of the FUs 13 can be routed to other FUs 13 [0032] [0033] assignment, one or more threads, processing units [0084] [0039] distributing, outputs, assigned partitions [0071] output of device A is directly connected to an input of device B).
Kanstein does not specifically teach transferring to the token buffer, in addition to the result, metadata that specifies the second thread.

Fleming, Jr., however, teaches transferring to the token buffer, the result ([0157] fig. 6 array of processing elements 604 input buffer 606 output buffer 608 [0178] data input buffer, data path, carry, data input value, token, stored, input buffer [0179] processing elements 900, execution, operands).

Kanstein and Fleming, Jr., in combination, do not specifically teach transferring metadata that specifies the second thread.
Archer, however, teaches transferring metadata that specifies the second thread ([0050] parallel active messaging interface, data communication, endpoints, parameters for a thread of execution, compute node [0053]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kanstein and Fleming, Jr. with the teachings of Archer of data communications comprising parameters for a thread of execution, in order to improve efficiency and allow associating the result by transferring metadata that specifies the second thread, in the method of Kanstein and Fleming, Jr., as in the instant invention.
Claim 21 recites a method for elements of claim 8. Therefore, it is rejected for the same rationale.


Claim(s) 9-10, 22-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanstein in view of Fleming, Jr., as applied to above claims, and further in view of Surti et al. (US 2014/0306970 A1, hereafter Surti).

Surti was cited in the last office action.

As per claim 9, Kanstein teaches the multiple threads of execution ([0003] simultaneously processing, threads, multi-processing/multi-threading manner).

Kanstein and Fleming, Jr., in combination, do not specifically teach threads have a predefined order of execution and wherein the first thread is earlier than the second thread in the order of execution.

Surti, however, teaches threads have a predefined order of execution ([0004] thread dependency register [0005] previous threads retire, correct order of the dependent thread, dependent thread may wait) and wherein the first thread is earlier than the second thread in the order of execution ([0005] previous thread retire, fully executed, guaranteeing correct order of thread).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kanstein and Fleming, Jr. with the teachings of Surti of executing a dependent thread in the correct order after the previous thread retires, in order to improve efficiency and allow the threads to have a predefined order of execution in which the first thread is earlier than the second thread, in the method of Kanstein and Fleming, Jr., as in the instant invention.

As per claim 10, Kanstein teaches the multiple threads of execution ([0003] simultaneously processing, threads, multi-processing/multi-threading manner).
Surti teaches the remaining claim elements of threads have a predefined order and wherein the first thread is later than the second thread in the order of execution ([0004] thread dependency register [0005] previous threads retire, correct order of the dependent thread, dependent thread may wait [0005] previous thread retire, fully executed, guaranteeing correct order of thread i.e. order is based on dependency).

Claim 22 recites a method for elements of claim 9. Therefore, it is rejected for the same rationale.
Claim 23 recites a method for elements of claim 10. Therefore, it is rejected for the same rationale.

 
Claim(s) 11-12, 24-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanstein in view of Fleming, Jr., as applied to above claims, and further in view of Glauert et al. (US 2016/0259668 A1, hereafter Glauert).
Glauert was cited in the last office action.

As per claim 11, Kanstein teaches wherein the compute node is configured to transfer the result ([0080] result, FUs 13, written to data storage distributed RFs 14 RF’ 15 data storage shared between plurality of functional units, result of the FUs 13 can be routed to other FUs 13).

Kanstein and Fleming, Jr., in combination, do not specifically teach transfer result only when an identifier of the second thread is within a valid range.

Glauert, however, teaches transfer result only when an identifier of the second thread is within a valid range ([0005] execute plurality of threads, threads, perform operations on thread data, thread identifier, value, dependent on thread identifier [0057] output, input to summation elements, value output from the generator i.e. summation element has unique value [0056] [0063] fig. 3 identifier [0068] fig. 5 image region to be processed 260 range of thread identifier top left (1,1) and bottom right (5,4) is within the range of the valid thread identifiers [0032] fig 10 515).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kanstein and Fleming, Jr. with the teachings of Glauert of an output being input to summation elements and uniquely identifying summation elements with a value generated based on a constant, in order to improve efficiency and allow transfer of the result only when an identifier of the second thread is within a valid range, in the method of Kanstein and Fleming, Jr., as in the instant invention.

As per claim 12, Kanstein teaches wherein at least one of the compute nodes, which is assigned to use the result as an operand, is configured to use the transferred result ([0080] result, FUs 13, written to data storage distributed RFs 14 RF’ 15 data storage shared between plurality of functional units, result of the FUs 13 can be routed to other FUs 13).
Glauert teaches remaining claim elements to transfer result only when an identifier of the second thread is within a valid range ([0005] execute plurality of threads, threads, perform operations on thread data, thread identifier, value, dependent on thread identifier [0057] output, input to summation elements, value output from the generator i.e. summation element has unique value [0056] [0063] fig. 3 identifier [0068] fig. 5 image region to be processed 260 range of thread identifier top left (1,1) and bottom right (5,4) i.e. rest of the thread identifier is outside the range [0032] fig 10 515).

Claim 24 recites a method for elements of claim 11. Therefore, it is rejected for the same rationale.
Claim 25 recites a method for elements of claim 12. Therefore, it is rejected for the same rationale.


Claim(s) 13, 26 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanstein in view of Fleming, Jr. and further in view of Glauert, as applied to above claims, and further in view of Ron et al. (US 2016/0246728 A1, hereafter Ron).
Ron was cited in the last office action.

As per claim 13, Kanstein teaches the compute node, which is assigned to use the result as an operand, is configured for execution of the second thread ([0080] result, FUs 13, written to data storage distributed RFs 14 RF’ 15 data storage shared between plurality of functional units, result of the FUs 13 can be routed to other FUs 13).
Kanstein and Fleming, Jr., in combination, do not specifically teach when the identifier of the second thread is outside the valid range, compute node is configured to obtain the operand from an alternative source.

Glauert, however, teaches when the identifier of the second thread is outside the valid range ([0065] dimensionality of range of thread identifier matches the dimensionality of the data image [0068] fig. 5 image region to be processed 260 range of thread identifier top left (1,1) and bottom right (5,4) i.e. rest of the thread identifier is outside the range).

Kanstein, Fleming, Jr. and Glauert, in combination, do not specifically teach compute node is configured to obtain the operand from an alternative source.

Ron, however, teaches compute node is configured to obtain the operand from an alternative source ([0058] thread, attempts, read, register, marked invalid, obtain data from the thread’s backing memory region rather than the data from the register).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kanstein, Fleming, Jr. and Glauert with the teachings of Ron of obtaining data from the thread’s backing memory region if the thread’s register is marked invalid, in order to improve efficiency and allow the compute node to obtain the operand from an alternative source, in the method of Kanstein, Fleming, Jr. and Glauert, as in the instant invention.
Claim 26 recites a method for elements of claim 13. Therefore, it is rejected for the same rationale.

Response to Arguments
The previous claim objection has been withdrawn.
The previous claim interpretation under 35 U.S.C. § 112(f) has been withdrawn.
The previous claim rejections under 35 U.S.C. § 112(b) have been withdrawn. However, some new 35 U.S.C. § 112(b) rejections have been made.
The previous claim rejections under 35 U.S.C. § 101 have been withdrawn. 
Applicant's arguments filed on 10/03/2022 have been fully considered but they are moot in view of new grounds of rejections.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
LASKOWSKI; Krzysztof (US-20170212791-A1) teaches facilitating dynamic thread-safe operations for variable bit-length transactions on computing devices.
LI C (CN-112559163-A) teaches a method and device for optimizing tensor calculation performance.
Vembu; Balaji et al. (US-20190317771-A1) teaches a graphics scheduling mechanism.
	
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABU ZAR GHAFFARI whose telephone number is (571)270-3799. The examiner can normally be reached Monday-Thursday 9:00 - 17:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai AN can be reached on 571-272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ABU ZAR GHAFFARI
Primary Examiner
Art Unit 2195



/ABU ZAR GHAFFARI/Primary Examiner, Art Unit 2195