DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 17, and 20 are objected to because of the following informalities:
a) “coupleable” (claim 1, line 1). The term may raise potential indefiniteness because it is unclear whether the element so described is included in the claimed processor. Suggestion for correction: use “coupled” in the claim. The same issue appears in other claims, such as claims 17 and 20.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. in view of 20060056406 in view of Koker et al. 20150379670.
As to claim 1, Bouchard teaches a processor coupleable to an interconnection network in a system having a plurality of other processors [processing cores 202] or hybrid threading fabric circuits (hybrid threading fabric circuits not taught, but see Koker below), the processor comprising (see fig.2):
a processor core [a current core group in a pipeline] adapted to execute a fiber create instruction [GET_WORK request] to generate one or more commands [request(s)] to generate one or more call work descriptor packets [packet: work] to another processor core [another core group in the pipeline] or to a hybrid threading fabric circuit  (not specifically taught but see the obviousness reasoning in view of  Koker below) for execution of a corresponding plurality of execution threads [tasks]  (see [0038], work is defined to be any task to be performed by a core that is identified by an entry on a work queue.  The task can include packet processing operations, for example, packet processing operations for L4-L7 layers to be performed on a received packet identified by a work queue entry on a work queue.  Each separate packet processing operation is a piece of the work to be performed by a core on the received packet stored in memory; see [0043], a core can de-schedule scheduled work in order to transfer work from one core group to another or to avoid consuming a core for work that requires a long synchronization delay; see [0077], work is scheduled by the POW module from a list of POW entries 606 in an input queue 604 in response to a GET_WORK request from a core; see [0097], de-scheduling of work can be used to implement "work pipelining" by transferring work from one core group to another); and 

a command queue storing at least one command to generate a call work descriptor packet; 
an interconnection network interface [packet input 214][interface unit 210a,210b] coupleable to the interconnection network [I/O bus 192], the interconnection network interface [packet input 214][interface unit 210a,210b] adapted to generate one or more call work descriptor packets [packet] to another processor core [core in a pipeline] (see [0034], the packet input unit 214 allocates and creates a work queue entry for each packet; see Fig.4, [0062-0068], showing the details of an entry of a work, such as checksum, pointer, field length, address, tag, packet data, buffer descriptor, for a packet; see [0036], after the interface unit 210a, 210b has performed L2 network protocol processing, the packet is forwarded to the packet input unit 214. The packet input unit 214 performs pre-processing of L3 and L4 network protocol headers included in the received packet; see [0043], packet processing can be pipelined from one group of cores to another, by defining the groups from which a core will accept work. A core can de-schedule scheduled work in order to transfer work from one core group to another or to avoid consuming a core for work that requires a long synchronization delay) or to a hybrid threading fabric circuit (not specifically taught, but see the obviousness reasoning in view of Koker below) in response to the one or more commands to generate one or more call work descriptor packets [packet: work] (see [0077], work is scheduled by the POW module from a list of POW entries 606 in an input queue 604 in response to a GET_WORK request from a core);

a program count register storing a received program count, (Note 1: register for storing the program count is not explicitly shown, but, see fig.1 instruction cache 206 for storing instructions of the core 202, para [0046]. Examiner holds that although not explicitly shown, the instruction cache must have an instruction pointer, instruction counter, instruction address, or a register for holding the instruction location, for the purpose of reading and writing the instructions. Otherwise, without pointing/locating the instructions in the cache, the core 202 cannot execute the instructions from the instruction cache),
a data cache [Level 2 cache 212], and 
a general purpose register [register file 240] storing a received argument [data/operand for read/write] (see fig.2, [0051], the Fetch and Add Unit (FAU) 240 is a 2 KB register file supporting read, write, atomic fetch-and-add, and atomic update operations. The Fetch and Add Unit (FAU) 240 can be accessed from both the cores 202 and the packet output unit 218); 
an execution queue [Level 2 Cache/DRAM: work queue] coupled to the thread control memory [Free Pool Allocator 236] (see fig.2); and

the control logic and thread selection circuit [POW 228] adapted to schedule the fiber create instruction for execution by the processor core (see [0097], a core can de-schedule the in-flight work and the POW module [POW 228] can re-schedule the work later. The in-flight work also includes scheduled work that is de-scheduled by a core. De-scheduled work remains in-flight, and will be re-scheduled later, but is not currently executing on a core. De-scheduling of work can be used to implement "work pipelining" by transferring work from one core group to another; see also [0038] for the introductory teaching of POW 228: the Packet order/work (POW) module (unit) 228 queues and schedules work (packet processing operations) for the processor cores 202) and to reserve a predetermined amount of memory space (e.g. by allocating) in the thread control memory to store any return arguments [packet data]. (See [0054], if the interface unit 210a, 210b accepts the packet, the Free Pool Allocator (FPA) 236 allocates memory in L2 cache memory or DRAM for the packet and the packet is stored in memory; see [0055], the packet input unit 214 uses one of the pools of pointers in the FPA 236 to store received packet data in level 2 cache memory or DRAM and another pool of pointers to allocate work queue entries).
Bouchard does not teach, but Koker teaches, a hybrid threading fabric circuit [hybrid fabric], as claimed (see Koker [0113], data from a single thread is strictly ordered when transmitted via the hybrid fabric, and per-thread coherency is maintained).
It would have been obvious to one of ordinary skill in the art to include a hybrid threading fabric circuit, as claimed because one of ordinary skill in the art should be able to 
As to claim 2, Bouchard teaches wherein the control logic and thread selection circuit [POW 228] is adapted to assign an available thread identifier to an execution thread, to automatically place the thread identifier [pointer: entry] in the execution queue [work queue] (See [0038], the packet order/work (POW) module (unit) 228 queues and schedules work (packet processing operations) for the processor cores 202. Work is defined to be any task to be performed by a core that is identified by an entry on a work queue. The task can include packet processing operations, for example, packet processing operations for L4-L7 layers to be performed on a received packet identified by a work queue entry on a work queue; [0040], The POW module 228 selects (i.e. schedules) work for a core 202 and returns a pointer to the work queue entry that describes the work to the core 202. Each piece of work (a packet processing operation) has an associated group identifier and a tag), and 
to periodically select (e.g. by the schedules, [0040][0041]) the thread identifier [pointer: entry] for execution by the processor core of an instruction of the execution thread [task], of the plurality of instructions, ([0040], the POW module 228 selects (i.e. schedules) work for a 
the processor core [core 202] using data stored in the data cache [Level 2 cache 212] or general purpose register (see [0037], the packet input unit 214 writes packet data into buffers in Level 2 cache 212 or DRAM 108 in a format that is convenient to higher-layer software executed in at least one processor core 202 for further processing of higher level network protocols).
As to claim 6, Bouchard teaches the processor of claim 1, wherein the processor core [core 202] is adapted to execute a waiting or nonwaiting fiber join instruction or a fiber join all instruction. (See [0108], the tag switch operation has separate switch request and switch completion wait operations.  The POW module 228 completes a requested tag value switch when the required ordering and atomicity constraints for the work are met.  This separated switch transaction allows the core to overlap the latency due to the switch request with other work and to de-schedule the work while a tag switch is pending, thus avoiding long synchronization delays).
Claim 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. in view of 20060056406 in view of  Koker et al. 20150379670, as applied to claim 1 and in further view of Lin et al. 20160139201.
As to claim 7, Bouchard teaches the processor of claim 1, wherein the core control circuit [network service processor 100] further comprises: an interconnection network interface [packet input 214][interface unit 210a,210b] coupleable to an interconnection network [I/O bus 192] to receive a work descriptor data packet [packet] (see [0034], the packet input unit 214 allocates and creates a work queue entry for each packet; see Fig.4, [0062-0068], showing the details of an entry of a work, such as checksum, pointer, field length, address, tag, packet data, buffer descriptor, for a packet; see [0036], after the interface unit 210a, 210b has performed L2 network protocol processing, the packet is forwarded to the packet input unit 214. The packet input unit 214 performs pre-processing of L3 and L4 network protocol headers included in the received packet), the interconnection network interface adapted to decode [parsing/preprocessing] the received work descriptor data packet [packet] into an execution thread having an initial program count (see Note 1 in claim 1 above) and any received argument [data/operand] (see [0035], interface unit 210a,210b perform all parsing of received packets and checking of results to offload the cores 202; see [0036], a packet is received by any one of the interface units 210a, 210b through a SPI-4.2 or RGMII interface. The interface unit 210a, 210b handles L2 network protocol pre-processing of the received packet by checking various fields in the L2 network protocol header included in the received packet; see [0036], after the 
Neither Bouchard nor Koker teaches, but Lin teaches, the interconnection network interface (see fig.1 [10]) further adapted to generate a return work descriptor packet [return packet] in response to the execution of a return instruction [successful completion of write transaction] by the processor core [core 14] (see Lin [0030], the core 14 then proceeds to process block 68 where the core 14 forwards the packet to the next node 16 in the loop. Since the packet proceeds from one node 16 to the next, the packet eventually returns to the debug controller 12. The return of the packet indicates successful completion of the write transaction; see also Lin [0024], which teaches that the packet includes command field 44 that indicates the type of transaction, such as the read/write/pool/broadcast transaction(s), to be executed by the receiving core. Examiner’s Note: the command field 44 specifying the read/write/pool/broadcast transaction(s) is a task indicator, or a work descriptor).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate a return work descriptor packet in response to the execution of a return instruction by the processor core, as claimed, because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the return packet of Lin for indicating the completion of the transaction, to a known device/method, such as the processing core and the interconnection network of Bouchard, for the purpose of indicating the successful completion of the write transaction among the cores, and it could be accomplished by reconfiguring Lin’s return packet into the configuration file.
Claim 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. in view of 20060056406 in view of  Koker et al. 20150379670, as applied to claim 1 and in further view of Lin et al. 20160139201, as applied to claim 7 above, in further view of Kelley et al. 20030217189.
As to claim 8, neither Bouchard, Koker, nor Lin teaches, but Kelley teaches, to receive (e.g. by reading) an event data packet [event data packet], and to decode the received event data packet [event data packet] into an event identifier [identifier tags] and any received argument [data segment timestamps/data packet segment position indicators]. (See Kelley fig.5, para [0047], the event data packet component 208 reads the data segments from the first-in, first-out memory component 202 and generates, or otherwise re-packetizes, event data packets by correlating the data segments. The identifier tags, such as the (A)H tag 426 and the (A)E tag 430, the associated data segment timestamps, such as timestamps 424 and 428, the thread stream indicators (A) and (B), and the data packet segment position indicators uniquely identify a data segment to the event data packet component 208 so that multiple nested thread stream sources and interrupt service routines can be distinguished and coordinated).
Claim 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. in view of 20060056406 in view of Koker et al. 20150379670, as applied to claims 1, 2 above, and in further view of Moeller et al. 20090254601.
As to claim 9, neither Bouchard nor Koker teaches, but Moeller teaches, to generate and to receive a point-to-point event data message [point-to-point message] and a broadcast event 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate and to receive a point-to-point event data message and a broadcast event data message, as claimed, because one of ordinary skill in the art should be able to recognize the application of known techniques, such as the point-to-point message and broadcast message, as taught in Moeller, to a known device/method, such as the processing cores for transferring the packet work of Bouchard, for the purpose of providing application-independent approaches to configure how data can be transmitted among components of the system (e.g., in a point-to-point or broadcast mode, or via delta messages; see Moeller [0030]). See MPEP 2143, KSR Example D.
Claim 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bouchard et al. in view of 20060056406 in view of  Koker et al. 20150379670, as applied to claim 1 above, and in further view of Jones et al. 20060179274.
As to claim 13, neither Bouchard nor Koker teaches, but Jones teaches, wherein the control logic and thread selection circuit [dispatch scheduler 602] is further adapted to assign a pause state [wait on ITC] to the execution thread [thread] in response to the processor core [core] executing a memory load instruction [load] or a memory store instruction [store]. (See Jones [0095], Waiting on ITC: the dispatch scheduler 602 may not issue instructions of the thread context for execution because the thread context is blocked waiting to load/store data from/to a location in inter-thread communication (ITC) space specified by a load/store instruction executed by the thread).
Allowable Subject Matter
Claims 3, 4, 5, 10, 11, 12, 14, 15, 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
a) In response to the generation of one or more call work descriptor packets to another processor core or to a hybrid threading fabric, the control logic and thread selection circuit is adapted to store a thread return count in a thread return register of the thread control memory. (Claim 3)
b) The determination of an event number corresponding to a received event data packet and to use an event mask stored in an event mask register to respond to a received event data packet. (Claim 10)

d) The register selected from the group consisting of: a thread state register; a pending fiber return count register; a return argument buffer or register; a return argument link list register; a custom atomic transaction identifier register; an event received mask register; an event state register; and combinations thereof. (Claim 12)
e) The change of the status of a thread identifier from pause to valid in response to a received event data packet to resume execution of a corresponding execution thread or in response to an event number of a received event data packet to resume execution of a corresponding execution thread. (Claim 14)
f) The end execution of a selected thread and to return a corresponding thread identifier of the selected thread to the thread identifier pool register in response to the execution of a return instruction by the processor core. (Claim 15)
g) The first priority queue and the second priority queue; the thread selection control circuitry coupled to the execution queue, the thread selection control circuitry adapted to select a thread identifier from the first priority queue at a first frequency and to select a thread identifier from the second priority queue at a second frequency, the second frequency lower than the first frequency. (Claim 16)

a) The processor core to execute a fiber create instruction to generate one or more commands to generate one or more call work descriptor packets to another processor core or to a hybrid threading fabric circuit for execution of a corresponding plurality of execution threads; the core control circuit comprising: a command queue storing at least one command to generate a call work descriptor packet; the interconnection network interface generates one or more call work descriptor packets to another processor core or to a hybrid threading fabric circuit in response to the one or more commands to generate one or more call work descriptor packets; the plurality of registers, the plurality of registers comprising a thread identifier pool register storing a plurality of thread identifiers, a thread state register, a thread return register storing a thread return count, a program count register storing the received program count, a data cache, and a general purpose register storing the received argument; an execution queue coupled to the thread control memory; the control logic and thread selection circuit schedules the fiber create instruction for execution by the processor core and to reserve a predetermined amount of memory space in the thread control memory to store any return arguments; and an instruction cache coupled to the processor core and to the control logic and thread selection circuit to receive an initial program count for the fiber create instruction and provide to the processor core a corresponding instruction for execution, of the plurality of instructions. (Claim 17)
b) The processor core to execute a fiber create instruction to generate one or more commands to generate one or more call work descriptor packets to another processor core or 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.


b) Jones 20140153582 is cited for teaching that, if a packet is not dequeued, a subsequent read command for the same queue will return the same packet, and that, if a packet is dequeued, the memory occupied by the packet will be returned to a free pool for re-use (see [0067]).
c) David Slogsnat is cited for the teaching of a packet-based interconnect protocol, ACM 2007 (see Section 2.1 The HyperTransport Protocol).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL H PAN whose telephone number is (571)272-4172. The examiner can normally be reached M-F 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available 

DANIEL H. PAN
Examiner
Art Unit 2182



/DANIEL H PAN/             Primary Examiner, Art Unit 2182