DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 3, 21, and 22 are objected to because of the following informalities:
a) “coupleable” (claim 1, line 1). The term is potentially indefinite because it is unclear whether the coupled element is included in the claimed processor. Suggestion for correction: use “coupled” in the claim. The same issue appears in the other claims.
Appropriate correction is required.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 3, 5-20, 21, and 22 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 5-20, 21, and 22, respectively, of copending Application No. 16399769 (20190340022) in view of Kailas et al. 20080010413.
As to current claim 1, copending claim 1 does not teach, but Kailas teaches:
to modify an amount of data [s-k] requested (see the selection of number of set within a given partition as specified in the load or store instruction, [0017]) in a memory load access request 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify an amount of data requested in a memory load access request to the memory circuit to correspond to a cache line boundary, as claimed, because one of ordinary skill in the art would be able to recognize the application of a known technique, such as the selection of the number of data sets within a given partition in the cache specified by the load/store instruction of Kailas, to a known device/method, such as the processing core of the copending application, for the purpose of selecting and specifying a maximum number of sets per partition in the cache (see Kailas, [0018]; MPEP 2143, KSR Example D). This could be readily accomplished by configuring the maximum number of data cache sets into the processing core of the copending application such that the selection and the maximum amount of the data set could be recognized by the processing core of the copending application.
The current dependent claims 3, 5-20 correspond to copending dependent claims 4, 5-20, respectively, and are rejected for the same reason as claim 1 above. See claim mapping below.
As to current independent claim 21, current claim 21 corresponds to and includes limitations similar to those of copending claim 21, presents the same issue as current claim 1 (see claim mapping below), and is rejected for the same reason as current claim 1 above. The details of the rejection are not repeated herein.
This is a provisional nonstatutory double patenting rejection.

Claim mapping (the claims of copending Application 16399769 are listed first, followed by the claims of current Application 16399800):
1. A processor coupleable to an interconnection network in a system having a memory circuit and a host processor, comprising: 
a processor core adapted to execute a plurality of instructions; and 
a core control circuit coupled to the processor core, 
the core control circuit comprising: 
an interconnection network interface coupleable to the interconnection network to receive and decode a monitoring request from the host processor; 
a thread control memory comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, 
a program count register storing a received program count, 
a data cache, and 
a general purpose register storing a received argument; 
an execution queue coupled to the thread control memory; 
a command queue; and 
a control logic and thread selection circuit coupled to the execution queue and the command queue, 
the control logic and thread selection circuit adapted to assign an available thread identifier to an execution thread, to automatically place the thread identifier in the execution queue, and 
to periodically select the thread identifier for execution by the processor core of an instruction of the execution thread, of the plurality of instructions, 

the processor core using data stored in the data cache or general purpose register, and in response to the monitoring request from the host processor, to generate a command to the command queue to copy and transmit, to the host processor, all data from the thread control memory corresponding to a selected thread identifier for monitoring thread state.

2. The processor of claim 1, wherein in response to the monitoring request from the host processor, 
the control logic and thread selection circuit is adapted to provide a program count or an instruction to the processor core to generate the command to the command queue to copy and transmit, to the host processor, all data from the thread control memory corresponding to the selected thread identifier for monitoring thread state.

3. The processor of claim 1, wherein in response to the monitoring request from the host processor, the control logic and thread selection circuit is adapted to directly generate the command to the command queue for the interconnection network interface to copy and transmit, to the host processor, all data from the thread control memory corresponding to the selected thread identifier for monitoring thread state.

4. The processor of claim 1, wherein the interconnection network interface is adapted to receive a work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, and to generate a return work descriptor packet in response to the execution of a return instruction by the processor core.

5. The processor of claim 2, wherein the control logic and thread selection circuit is further adapted to automatically schedule an instruction, of the plurality of instructions, for execution by the processor core in response to a received event data packet.

6. The processor of claim 5, wherein the interconnection network interface is further adapted to receive an event data packet, and to decode the received event data packet into an event identifier and any received argument.
7. The processor of claim 2, wherein the interconnection network interface is further adapted to store the execution thread having the initial program count and any received argument in the thread control memory using the thread identifier as an index to the thread control memory.

8. The processor of claim 2, wherein the interconnection network interface is further adapted to generate and to receive a point-to-point event data message and a broadcast event data message.

9. The processor of claim 1, wherein the processor core is adapted to execute a fiber create instruction and wherein the core control circuit is further adapted to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads.
10. The processor of claim 9, wherein the control logic and thread selection circuit is further adapted to reserve a predetermined amount of memory space in a thread control memory to store return arguments.

11. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to determine an event number corresponding to a received event data packet and to use an event mask stored in an event mask register to respond to a received event data packet.

12. The processor of claim 1, wherein the core control circuit further comprises: an interconnection network interface; a network response memory; an instruction cache coupled to the control logic and thread selection circuit; and a command queue.

13. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to assign a valid state to the thread identifier of the execution thread, and for as long as the valid state remains, to periodically select the thread identifier for execution of an instruction of the execution thread by the processor core until completion of the execution thread, and to pause thread execution by not returning the thread identifier to the execution queue when it has a pause state.

14. The processor of claim 1, wherein the thread control memory further comprises a register selected from the group consisting of: a thread state register; a pending fiber return count register; a return argument buffer or register; a return argument link list register; a custom atomic transaction identifier register; an event received mask register; an event state register; and combinations thereof.

15. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to assign a pause state to the execution thread in response to the processor core executing a memory load instruction or a memory store instruction.

16. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to change the status of a thread identifier from pause to valid in response to a received event data packet to resume execution of a corresponding execution thread or in response to an event number of a received event data packet to resume execution of a corresponding execution thread.

17. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to end execution of a selected thread and to return a corresponding thread identifier of the selected thread to the thread identifier pool register in response to the execution of a return instruction by the processor core.

18. The processor of claim 17, wherein the control logic and thread selection circuit is further adapted to clear the registers of the thread control memory indexed by the corresponding thread identifier of the selected thread in response to the execution of a return instruction by the processor core.

19. The processor of claim 1, wherein the execution queue further comprises: a first priority queue; and a second priority queue.

wherein the control logic and thread selection circuit further comprises: thread selection control circuitry coupled to the execution queue, the thread selection control circuitry adapted to select a thread identifier from the first priority queue at a first frequency and to select a thread identifier from the second priority queue at a second frequency, the second frequency lower than the first frequency.

21. A processor coupleable to an interconnection network in a system having a memory circuit and a host processor, comprising: 
a processor core adapted to execute a plurality of instructions; and 
a core control circuit coupled to the processor core, the core control circuit comprising: an interconnection network interface coupleable to an interconnection network to receive a work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, and 
to receive and decode a monitoring request from the host processor; 
a thread control memory coupled to the interconnection network interface and comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, a data cache, and a general purpose register storing the received argument; 
an execution queue coupled to the thread control memory; 
a command queue; 

a control logic and thread selection circuit coupled to the execution queue, to the command queue and to the thread control memory, 
the control logic and thread selection circuit adapted to assign an available thread identifier to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, and 

in response to the monitoring request from the host processor, to generate a command to the command queue to copy and transmit, to the host processor, all data from the thread control memory corresponding to a selected thread identifier for monitoring thread state; and 
an instruction cache coupled to the processor core and to the control logic and thread selection circuit to receive the initial program count and provide to the processor core a corresponding instruction for execution, of the plurality of instructions.

22. A processor coupleable to an interconnection network in a system having a memory circuit and a host processor, comprising: 
a core control circuit comprising:
 an interconnection network interface coupleable to an interconnection network to receive a call work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, to encode a work descriptor packet for transmission to other processing elements, and 


a thread control memory coupled to the interconnection network interface and comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, and a general purpose register storing the received argument; 
an execution queue coupled to the thread control memory; 
a command queue; 
a network response memory coupled to the interconnection network interface; 
a control logic and thread selection circuit coupled to the execution queue, to the command queue, to the thread control memory, and to the instruction cache, the control logic and thread selection circuit adapted to assign an available thread identifier and an initial valid state to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, and in response to the monitoring request from the host processor, to generate a command to the command queue to copy and transmit, to the host processor, all data from the thread control memory corresponding to a selected thread identifier for monitoring thread state; 
an instruction cache coupled to the control logic and thread selection circuit to receive the initial program count and provide a corresponding instruction for execution; and 


a processor core coupled to the instruction cache and to the command queue of the core control circuit, the processor core adapted to execute the corresponding instruction.
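As an illustrative sketch only, the thread identifier pool, the execution queue, and the two-frequency selection recited in the claims listed above can be modeled as follows. The class name ThreadSelector, its method names, and the high_per_low ratio are hypothetical and are not taken from either application; the claims do not specify any particular ratio or data structure.

```python
from collections import deque


class ThreadSelector:
    """Toy model of a thread identifier pool feeding a two-level execution queue."""

    def __init__(self, num_ids, high_per_low=3):
        self.id_pool = deque(range(num_ids))  # available thread identifiers
        self.first_queue = deque()            # first priority queue
        self.second_queue = deque()           # second priority queue
        self.high_per_low = high_per_low      # assumed ratio, not from the claims
        self._high_picks = 0                  # first-queue picks since last second-queue pick

    def assign(self, low_priority=False):
        """Assign an available identifier and place it in the execution queue."""
        if not self.id_pool:
            return None  # no identifier available
        tid = self.id_pool.popleft()
        self.enqueue(tid, low_priority)
        return tid

    def enqueue(self, tid, low_priority=False):
        """Place (or return) an identifier in one of the two priority queues."""
        queue = self.second_queue if low_priority else self.first_queue
        queue.append(tid)

    def select(self):
        """Select the next identifier; the second queue is served less often."""
        serve_second = bool(self.second_queue) and (
            not self.first_queue or self._high_picks >= self.high_per_low
        )
        if serve_second:
            self._high_picks = 0
            return self.second_queue.popleft()
        if self.first_queue:
            self._high_picks += 1
            return self.first_queue.popleft()
        return None  # nothing is ready to run

    def release(self, tid):
        """Return an identifier to the pool when its thread completes."""
        self.id_pool.append(tid)
```

In this sketch a selected identifier is removed from its queue, so a caller modeling periodic selection would call enqueue again after each instruction while the thread remains valid, and would simply not re-enqueue it to model a pause state.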
Current Application 16399800
1. A processor coupleable to an interconnection network in a system having a memory circuit, comprising: 

a processor core adapted to execute a plurality of instructions; and 
a core control circuit coupled to the processor core, 
the core control circuit comprising:
 



a thread control memory comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, 
a program count register storing a received program count, 
a data cache, and 
a general purpose register storing a received argument; 
an execution queue coupled to the thread control memory; and 

a control logic and thread selection circuit coupled to the execution queue, 

the control logic and thread selection circuit adapted to assign an available thread identifier to an execution thread, 
to automatically place the thread identifier in the execution queue, 
to periodically select the thread identifier for execution by the processor core of an instruction of the execution thread, of the plurality of instructions, 

the processor core using data stored in the data cache or general purpose register, and to modify an amount of data requested in a memory load access request to the memory circuit to correspond to a cache line boundary of the data cache.




2. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to increase or decrease the amount of data requested in the memory load access request to correspond to the cache line boundary of the data cache.

3. The processor of claim 1, wherein the core control circuit further comprises: an interconnection network interface coupleable to an interconnection network 
to receive a work descriptor data packet, the interconnection network interface adapted to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, and to generate a return work descriptor packet in response to the execution of a return instruction by the processor core.
4. The processor of claim 3, wherein the control logic and thread selection circuit is further adapted to automatically schedule an instruction, of the plurality of instructions, corresponding to the initial program count for execution by the processor core in response to the received work descriptor data packet.


5. The processor of claim 3, wherein the control logic and thread selection circuit is further adapted to automatically schedule an instruction, of the plurality of instructions, for execution by the processor core in response to a received event data packet.

6. The processor of claim 5, wherein the interconnection network interface is further adapted to receive an event data packet, and to decode the received event data packet into an event identifier and any received argument.
7. The processor of claim 3, wherein the interconnection network interface is further adapted to store the execution thread having the initial program count and any received argument in the thread control memory using the thread identifier as an index to the thread control memory.

8. The processor of claim 3, wherein the interconnection network interface is further adapted to generate and to receive a point-to-point event data message and a broadcast event data message.

9. The processor of claim 1, wherein the processor core is adapted to execute a fiber create instruction and wherein the core control circuit is further adapted to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads.
10. The processor of claim 9, wherein the control logic and thread selection circuit is further adapted to reserve a predetermined amount of memory space in a thread control memory to store return arguments.

11. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to determine an event number corresponding to a received event data packet and to use an event mask stored in an event mask register to respond to a received event data packet.

12. The processor of claim 1, wherein the core control circuit further comprises: an interconnection network interface; a network response memory; an instruction cache coupled to the control logic and thread selection circuit; and a command queue.

13. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to assign a valid state to the thread identifier of the execution thread, and for as long as the valid state remains, to periodically select the thread identifier for execution of an instruction of the execution thread by the processor core until completion of the execution thread, and to pause thread execution by not returning the thread identifier to the execution queue when it has a pause state.

14. The processor of claim 1, wherein the thread control memory further comprises a register selected from the group consisting of: a thread state register; a pending fiber return count register; a return argument buffer or register; a return argument link list register; a custom atomic transaction identifier register; an event received mask register; an event state register; and combinations thereof.

15. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to assign a pause state to the execution thread in response to the processor core executing a memory load instruction or a memory store instruction.

16. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to change the status of a thread identifier from pause to valid in response to a received event data packet to resume execution of a corresponding execution thread or in response to an event number of a received event data packet to resume execution of a corresponding execution thread.

17. The processor of claim 1, wherein the control logic and thread selection circuit is further adapted to end execution of a selected thread and to return a corresponding thread identifier of the selected thread to the thread identifier pool register in response to the execution of a return instruction by the processor core.

18. The processor of claim 17, wherein the control logic and thread selection circuit is further adapted to clear the registers of the thread control memory indexed by the corresponding thread identifier of the selected thread in response to the execution of a return instruction by the processor core.

19. The processor of claim 1, wherein the execution queue further comprises: a first priority queue; and a second priority queue.

wherein the control logic and thread selection circuit further comprises: thread selection control circuitry coupled to the execution queue, the thread selection control circuitry adapted to select a thread identifier from the first priority queue at a first frequency and to select a thread identifier from the second priority queue at a second frequency, the second frequency lower than the first frequency.

21. A processor coupleable to an interconnection network in a system having a memory circuit, comprising: 

a processor core adapted to execute a plurality of instructions; and 
a core control circuit coupled to the processor core, the core control circuit comprising: an interconnection network interface coupleable to the interconnection network to receive a work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument; 


a thread control memory coupled to the interconnection network interface and comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, a data cache, and a general purpose register storing the received argument; 
an execution queue coupled to the thread control memory; 

a control logic and thread selection circuit coupled to the execution queue and to the thread control memory, 

the control logic and thread selection circuit adapted to assign an available thread identifier to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, and 

to modify a data size of a memory load access request to the memory circuit to correspond to a cache line boundary of the data cache; and 




an instruction cache coupled to the processor core and to the control logic and thread selection circuit to receive the initial program count and provide to the processor core a corresponding instruction for execution, of the plurality of instructions.

22. A processor coupleable to an interconnection network in a system having a memory circuit, comprising: 

a core control circuit comprising: 
an interconnection network interface coupleable to the interconnection network to receive a call work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, and to encode a work descriptor packet for transmission to other processing elements; 



a thread control memory coupled to the interconnection network interface and comprising a plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, and a general purpose register storing the received argument; 
an execution queue coupled to the thread control memory; 

a network response memory coupled to the interconnection network interface; 
a control logic and thread selection circuit coupled to the execution queue, to the thread control memory, and to the instruction cache, the control logic and thread selection circuit adapted to assign an available thread identifier and an initial valid state to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, and 
to modify an amount of data requested in a memory load access request to the memory circuit to correspond to a cache line boundary of the data cache; 



an instruction cache coupled to the control logic and thread selection circuit to receive the initial program count and provide a corresponding instruction for execution; and a command queue storing one or more commands for generation of one or more work descriptor packets; and 
a processor core coupled to the instruction cache and to the command queue of the core control circuit, the processor core adapted to execute the corresponding instruction.
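As an illustrative sketch only, the limitation that distinguishes the current claims in the mapping above (modifying the amount of data requested in a memory load access request so that it corresponds to a cache line boundary, by increasing or decreasing it as in current claim 2) can be modeled as follows. The function name adjust_load_size, the 64-byte default line size, and the rule of keeping the original size when shrinking would leave nothing to load are assumptions made for illustration; none of them is stated in the claims or in the cited references.

```python
def adjust_load_size(address, size, line_bytes=64, grow=True):
    """Return a request size whose end address falls on a cache line boundary.

    address    -- starting byte address of the load
    size       -- number of bytes originally requested
    line_bytes -- cache line size in bytes (assumed value, not from the claims)
    grow       -- True to increase the request, False to decrease it
    """
    if size <= 0 or line_bytes <= 0:
        raise ValueError("size and line_bytes must be positive")

    end = address + size
    if grow:
        # Round the end address up to the next line boundary.
        new_end = -(-end // line_bytes) * line_bytes
    else:
        # Round the end address down to the previous line boundary.
        new_end = (end // line_bytes) * line_bytes
        if new_end <= address:
            # Shrinking would request nothing, so leave the request unchanged.
            return size
    return new_end - address
```

For example, under these assumptions a 40-byte load starting at address 0x1010 would be increased to 48 bytes so that it ends at the 64-byte boundary 0x1040.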




Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413.
As to claim 1, Bouchard teaches a processor coupleable to an interconnection network [in a system having a memory circuit (see fig.1 for the overall system; fig.2 shows the details of the network service processor; [0028] teaches the network services processor 100 receives packets from the Ethernet ports (Gig E) through the physical interfaces PHY 104a, 104b, performs L7-L2 network protocol processing on the received packets and forwards processed packets through the physical interfaces 104a, 104b to another hop in the network or 
a processor core [cores 202] adapted to execute a plurality of instructions; and 
a core control circuit [network service processor 100] coupled to the processor core [cores 202], the core control circuit [network service processor 100] comprising: 
a thread control memory [Free Pool Allocator (FPA) 236] comprising a plurality of registers [register pools], the plurality of registers [register pools] comprising a thread identifier pool register [pool] storing a plurality of thread identifiers [pointers] (See [0049],  a Free Pool Allocator (FPA) 236 maintains pools of pointers to free memory in level 2 cache memory 212 and DRAM. See also [0034], the entry created by packet input unit 214 that uses one of the pools of pointers in the FPA 236 to store received packet data in level 2 cache or DRAM and another pool of pointers to allocate work queue entries in [0034][0055]), a program count register storing a received program count (Note 1: register for storing the program count is not explicitly shown, but, see fig.1 instruction cache 206 for storing instructions of the core 202, para [0046]. Examiner holds that although not explicitly shown, the instruction cache must have an instruction pointer, instruction counter, instruction address, or a register for holding the instruction location, for the purpose of reading and writing the instructions from/to the instruction cache. Otherwise, without pointing/locating the instructions in the cache, the core 202 cannot function), 
a data cache [Level 2 cache 212], and a general purpose register [register file 240] storing a received argument [data/operand for read/write] (see fig.2, [0051], the Fetch and Add Unit (FAU) 240 is a 2 KB register file supporting read, write, atomic fetch-and-add, and atomic 
an execution queue [Level 2 Cache/DRAM: work queue] coupled to the thread control memory [Free Pool Allocator 236] (see fig.2); and 
a control logic and thread selection circuit [POW 228] coupled to the execution queue [Level 2 Cache/DRAM: work queue]   (see fig.2 [POW 228] [Level 2 Cache/DRAM: work queue]), the control logic and thread selection circuit [POW 228] adapted to assign an available thread identifier [pointer: entry] to an execution thread [work: task], to automatically place the thread identifier [pointer: entry] in the execution queue [work queue] (See [0038], the packet order/work (POW) module (unit) 228 queues and schedules work (packet processing operations) for the processor cores 202.  Work is defined to be any task to be performed by a core that is identified by an entry on a work queue.  The task can include packet processing operations, for example, packet processing operations for L4-L7 layers to be performed on a received packet identified by a work queue entry on a work queue; [0040], The POW module 228 selects (i.e. schedules) work for a core 202 and returns a pointer to the work queue entry that describes the work to the core 202.  Each piece of work (a packet processing operation) has an associated group identifier and a tag), and 
to periodically select (e.g. by the schedules, [0040][0041] ) the thread identifier [pointer: entry] for execution by the processor core [core 202] of an instruction of the execution thread, of the plurality of instructions ([0040], the POW module 228 selects (i.e. schedules) work for a core 202 and returns a pointer to the work queue entry that describes the work to the core 202.  Each piece of work (a packet processing operation) has an associated group identifier and 
at most one core has a given tag), 
the processor core [core 202] using data stored in the data cache [Level 2 cache 212] or general purpose register (see [0037], the packet input unit 214 writes packet data into buffers in Level 2 cache 212 or DRAM 108 in a format that is convenient to higher-layer software executed in at least one processor core 202 for further processing of higher level network protocols).
	Bouchard does not teach, but Kailas teaches:
to modify (e.g. to select) an amount of data [s-k] requested (see the selection of number of set within a given partition as specified in the load or store instruction, [0017])  in a memory load access request [load/store] to the memory circuit to correspond to a cache line boundary [maximum number of sets per partition] of the data cache [100] ([0018]; Note: a partition includes at least a cache line), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify an amount of data requested in a memory load access request to the memory circuit to correspond to a cache line boundary, as claimed because one 

As to claim 2, Bouchard does not teach, but Kailas teaches:
to increase or decrease (e.g. by the selection of s-k bits, [0017]) the amount of data requested [s-k] in the memory load access request [load instruction] to correspond to the cache line boundary [maximum number of sets per partition] of the data cache [100] (See [0017][0018]; Note: a partition includes at least a cache line), as claimed.
The obviousness rationale for claim 1 also applies to claim 2 and is not repeated herein.
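For illustration only, and not drawn from Kailas or from the application: the limitation discussed for claims 1 and 2, in which the amount of data requested in a memory load access request is increased or decreased to correspond to a cache line boundary, might be sketched as below. The function names, the 64-byte line size, and the flat byte-address model are assumptions made for this sketch.

```python
def _check(size: int, line_size: int) -> None:
    # Hypothetical validation: positive size, power-of-two line size.
    if size <= 0:
        raise ValueError("size must be positive")
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a power of two")


def expand_to_line_boundary(addr: int, size: int, line_size: int = 64):
    """Increase the request so it covers whole cache lines.

    Returns (start_address, byte_count) of the widened request.
    """
    _check(size, line_size)
    start = addr - (addr % line_size)                  # round start down
    end = -(-(addr + size) // line_size) * line_size   # round end up
    return start, end - start


def trim_to_line_boundary(addr: int, size: int, line_size: int = 64):
    """Decrease the request so it stops at the end of the current cache line.

    Returns (start_address, byte_count) of the narrowed request.
    """
    _check(size, line_size)
    line_end = (addr // line_size + 1) * line_size
    return addr, min(size, line_end - addr)
```

Under these assumptions, an 8-byte request at address 100 would be widened to the full line starting at address 64, while a 100-byte request at the same address would be narrowed to the 28 bytes remaining in that line.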
As to claim 12, Bouchard teaches the processor of claim 1, wherein the core control circuit [100] (see fig.2) further comprises: 
an interconnection network interface [packet input 214] [interface unit 210a, 210b]; 
a network response memory [238] (see [0050], the bridge 238 includes buffer queues for storing information to be transferred between the I/O bus, coherent memory bus, the packet input unit 214 and the packet output unit 218); 

a command queue [command queue 560].
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413, as applied to claim 1 above, and in further view of Lin et al. 20160139201.
As to claim 3, Bouchard teaches the processor of claim 1, wherein the core control circuit further comprises: an interconnection network interface [packet input 214] [interface unit 210a, 210b] coupleable to an interconnection network [I/O bus 192] to receive a work descriptor data packet [packet] (see [0034], the packet input unit 214 allocates and creates a work queue entry for each packet; see Fig.4, [0062-0068], shows the details of an entry of a work, such as checksum, pointer, field length, address, tag, packet data, buffer descriptor, for a packet, see [0036], after the interface unit 210a, 210b has performed L2 network protocol processing, the packet is forwarded to the packet input unit 214. The packet input unit 214 performs pre- processing of L3 and L4 network protocol headers included in the received packet),
the interconnection network interface [packet input 214] [interface unit 210a, 210b] adapted to decode [parsing/preprocessing] the received work descriptor data packet [packet] into an execution thread having an initial program count [see Note 1 below] and any received argument [data/operand] (See [0035], interface unit 210a, 210b perform all parsing of received packets and checking of results to offload the cores 202, [0036], a packet is received by any one of the interface units 210a, 210b through a SPI-4.2 or RGM Il interface. The interface unit 210a, 
Neither Bouchard nor Kailas teaches, but Lin teaches, to generate a return work descriptor packet [return packet] in response to the execution of a return instruction [successful completion of write transaction] by the processor core [core 14] (See Lin [0030], the core 14 then proceeds to process block 68 where the core 14 forwards the packet to the next node 16 in the loop. Since the packet proceeds from one node 16 to the next, the packet eventually returns to the debug controller 12. The return of the packet indicates successful completion of the write transaction; see also Lin [0024], which teaches that the packet includes command field 44 that indicates the type of transaction, such as the read/write/pool/broadcast transaction(s), to be executed by the receiving core. Examiner’s Note: the command field 44 specifying the read/write/pool/broadcast transaction(s) is a task indicator, or a work descriptor).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate a return work descriptor packet in response to the execution of a return instruction by the processor core, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the return packet of Lin for indicating the completion of the transaction, to a known device/method, such as the processing core and the interconnection network of Bouchard, for the purpose of indicating the successful completion of the write transaction among the cores.
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413, as applied to claims 1, 2 above, and in further view of Lin et al. 20160139201, as applied to claim 3 above, and in further view of Kelley et al. 20030217189.
As to claim 5, Bouchard teaches the processor of claim 3, wherein the control logic and thread selection circuit [packet order/work (POW) module (unit) 228] is further adapted to automatically schedule an instruction, of the plurality of instructions [work: task], for execution by the processor core [core 202] (See [0038], the packet order/work (POW) module (unit) 228 queues and schedules work (packet processing operations) for the processor cores 202.  Work is defined to be any task to be performed by a core that is identified by an entry on a work queue.  The task can include packet processing operations, for example, packet processing operations for L4-L7 layers to be performed on a received packet identified by a work queue entry on a work queue; [0040], The POW module 228 selects (i.e. schedules) work for a core 202 and returns a pointer to the work queue entry that describes the work to the core 202.  Each piece of work (a packet processing operation) has an associated group identifier and a tag), but neither Bouchard, Kailas, nor Lin teaches that the automatic scheduling by the control logic and thread selection circuit is in response to a received event data packet, as claimed.
However, Kelley teaches a control logic and thread selection circuit [event data packet component 208] for scheduling (e.g. by coordinating) execution of instructions [thread stream 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to adapt the control logic and thread selection circuit to automatically schedule an instruction, of the plurality of instructions [work: task], for execution by the processor core (already taught by Bouchard as in the claim mapping above) in response to a received event data packet (taught by Kelley as cited above), as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as Kelley’s coordinating of the stream of threads and service routines in response to the event data packet, to a known device/method, such as the processing core and the interconnection network of Bouchard, in order for the data packet segment position indicators to uniquely identify a data segment to the event data packet component 208 so that multiple nested thread stream sources and interrupt service routines can be distinguished and coordinated (See Kelley [0047]; MPEP 2143, KSR Example D).
As to claim 6, neither Bouchard, Kailas, nor Lin teaches, but Kelley teaches, to receive (e.g. by reading) an event data packet [event data packet], and to decode the received event data 
Claim 6 depends from claim 5; the obviousness rationale for claim 5 also applies to claim 6 and is not repeated herein.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413, as applied to claims 1, 2 above, and in further view of Lin et al. 20160139201, as applied to claim 3 above, and in view of Moeller et al. 20090254601.
As to claim 8, neither Bouchard nor Kailas teaches, but Lin teaches, to generate and to receive a broadcast event data message, as claimed (Lin [0024] teaches that the packet includes command field 44 that indicates the type of transaction, such as the read/write/pool/broadcast transaction(s), to be executed by the receiving core).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate and to receive a broadcast event data message, as 
Neither Bouchard, Kailas, nor Lin teaches, but Moeller teaches, to receive a point-to-point event data message [point-to-point data message] (See [0252], if no broadcast channels match the data object, the object is compared directly against the subscription records for the given data type as described earlier, and corresponding event messages sent as point-to-point messages to the correct subscribers), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a point-to-point event data message, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as Moeller’s point-to-point data message, to a known device/method, such as the processing core and the interconnection network of Bouchard, for the purpose of allowing the appropriate routing mechanism (point-to-point or broadcast) to be chosen based on properties of the data object being transmitted itself (See Moeller [0252]; MPEP 2143, KSR Example D).
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413, as applied to claim 1 above, and in further view of Jones et al. 20060179274.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to assign a pause state to the execution thread in response to the processor core executing a memory load instruction or a memory store instruction, as claimed (see claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the wait state of Jones for waiting to load or store data by the load/store instruction executed by the thread, to a known device/method, such as the processing core and the interconnection network of Bouchard, in order for the dispatch scheduler 602 to communicate the state for each thread context, such as the wait state, via the respective PM_TC_state 642 input, and this can be accomplished by incorporating the wait time of the load/store instruction of Jones into the processing core of Bouchard (See Jones [0095]; MPEP 2143, KSR Example D).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. 20060056406 in view of Kailas et al. 20080010413, as applied to claim 1 above, and in further view of Chen et al. 20120134449.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include an execution queue that further comprises a first priority queue and a second priority queue, as claimed (see claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the high priority queue and the low priority queue of Chen, to a known device/method, such as the processing core and the interconnection network of Bouchard, in order to execute tasks in the low-priority queue only when the high-priority queue is empty (See Chen [0049]; MPEP 2143, KSR Example D).
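For illustration only: the behavior relied upon from Chen [0049], in which tasks in the low-priority queue are executed only when the high-priority queue is empty, might be sketched as below. The class and method names are hypothetical and do not come from Chen or Bouchard.

```python
from collections import deque


class StrictPriorityQueues:
    """Two FIFO queues; the low-priority queue is served only when
    the high-priority queue is empty (hypothetical sketch)."""

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def push(self, task, high_priority=False):
        # Append the task to the queue matching its priority.
        (self.high if high_priority else self.low).append(task)

    def pop(self):
        # Always drain the high-priority queue first.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None  # both queues are empty
```

Under this sketch, a high-priority task pushed after a low-priority task is still returned first.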
Allowable Subject Matter
Claims 7, 9, 10, 11, 13, 14, 16, 17, 18, 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the prior art of record teaches:
a) The interconnection network interface is further adapted to store the execution thread having the initial program count and any received argument in the thread control memory using the thread identifier as an index to the thread control memory. (Claim 7)

c) The control logic and thread selection circuit is further adapted to determine an event number corresponding to a received event data packet and to use an event mask stored in an event mask register to respond to a received event data packet. (Claim 11)
d) The assignment of a valid state to the thread identifier of the execution thread, and for as long as the valid state remains, to periodically select the thread identifier for execution of an instruction of the execution thread by the processor core until completion of the execution thread, and to pause thread execution by not returning the thread identifier to the execution queue when it has a pause state. (Claim 13)
e) The thread control memory further comprises a register selected from the group consisting of: a thread state register; a pending fiber return count register; a return argument buffer or register; a return argument link list register; a custom atomic transaction identifier register; an event received mask register; an event state register; and combinations thereof. (Claim 14)
f) The change of the status of a thread identifier from pause to valid in response to a received event data packet to resume execution of a corresponding execution thread or in response to an event number of a received event data packet to resume execution of a corresponding execution thread. (Claim 16)

h) The thread selection control circuitry coupled to the execution queue, the thread selection control circuitry adapted to select a thread identifier from the first priority queue at a first frequency and to select a thread identifier from the second priority queue at a second frequency, the second frequency lower than the first frequency. (Claim 20)
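For illustration only: the feature noted for claim 20, in which a thread identifier is selected from the first priority queue at a first frequency and from the second priority queue at a lower second frequency, might be sketched as below. The class name, the `ratio` parameter, and the counter-based policy are assumptions made for this sketch and are not taken from the application or the art of record.

```python
from collections import deque


class FrequencySelector:
    """Selects from `first` up to `ratio` times for each selection from
    `second`, so `second` is served at a lower frequency (hypothetical)."""

    def __init__(self, ratio=3):
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.first = deque()
        self.second = deque()
        self.ratio = ratio
        self._first_picks = 0  # consecutive selections from `first`

    def select(self):
        # Give `second` a turn after `ratio` picks from `first`,
        # or whenever `first` has nothing to offer.
        second_turn = bool(self.second) and (
            self._first_picks >= self.ratio or not self.first
        )
        if second_turn:
            self._first_picks = 0
            return self.second.popleft()
        if self.first:
            self._first_picks += 1
            return self.first.popleft()
        return None  # both queues are empty
```

With `ratio=3`, three identifiers are taken from the first queue for each one taken from the second, as long as both queues are non-empty.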
Claims 21, 22 are allowable over the art of record. None of the prior art of record teaches the combined features of: (Partial features shown, see claims for full details)
1) The interconnection network interface to receive a work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument; a thread control memory to the interconnection network interface, the plurality of registers, the thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, a data cache, general purpose register storing the received argument; the execution queue to the thread control memory; a control logic and thread selection circuit to the execution queue and to the thread control memory, the assignment of available thread identifier to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, modify a data size of a memory load access request to the memory circuit to correspond to a cache line 
2) The interconnection network interface to receive a call work descriptor data packet, to decode the received work descriptor data packet into an execution thread having an initial program count and any received argument, to encode a work descriptor packet for transmission to other processing elements; a thread control memory to the interconnection network interface, the plurality of registers, the thread identifier pool register storing a plurality of thread identifiers, a thread state register, a program count register storing the received program count, and a general purpose register storing the received argument; execution queue to the thread control memory; a network response memory to the interconnection network interface; a control logic and thread selection circuit to the thread control memory, and to the instruction cache, the control logic and thread selection circuit, the assignment of available thread identifier and an initial valid state to the execution thread, to place the thread identifier in the execution queue, to select the thread identifier for execution, to access the thread control memory using the thread identifier as an index to select the initial program count for the execution thread, modify an amount of data requested in a memory load access request to the memory circuit to correspond to a cache line boundary of the data cache; the instruction cache to receive the initial program count and provide a corresponding instruction for execution, the command queue storing one or more commands for generation of one or more work descriptor packets; and a processor core coupled to the instruction cache 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  
a) Morishita et al. 20080109809 is cited for the teaching of a number of highest-to-lowest priority queues 705, where the virtual processor having the highest priority is allocated in the execution cycle (See [0116]).
b) Jones 20140153582 is cited for teaching that if a packet is not dequeued, then a subsequent read command for the same queue will return the same packet, and that if a packet is dequeued, then the memory occupied by the packet will be returned to a free pool for re-use (See [0067]).
c) David Slogsnat is cited for the teaching of a packet-based interconnect protocol, ACM 2007 (see Section 2.1 The HyperTransport Protocol).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL H PAN whose telephone number is (571)272-4172. The examiner can normally be reached M-F 8:30 am -5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DANIEL H. PAN
Examiner
Art Unit 2182



/DANIEL H PAN/Primary Examiner, Art Unit 2182