DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10, 11, 12, 14, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ware (patent application publication No. 2020/0326939) in view of Nassif (patent application publication No. 2018/0101502).



Ware taught the invention substantially as claimed, including (as to claim 1) a stacked integrated circuit device (e.g., see paragraphs 0079, 0118 and figs. 13A-13E) comprising: a first die comprising a first plurality of processors and a first switching fabric for exchanging data between the first plurality of processors (e.g., see paragraphs 0080-0081) [note the plural X1 components, each including a plurality of multiply-accumulate pipelines, and the NOC, which includes a switch interconnect network]; and a second die comprising a second plurality of processors and a second switching fabric for exchanging data between the second plurality of processors, wherein the second die and first die are stacked together [note the plural X1 components, each including a plurality of multiply-accumulate pipelines, and the NOC, which includes a switch interconnect network (e.g., see figs. 6A-6C, 10, 13E and paragraphs 0080-0081)], wherein for each of the first die and the second die, each of the processors of a respective die has: an output exchange bus connected to a switching fabric of the respective die for sending first data to another of the processors of the respective die accessible over the switching fabric of the respective die (e.g., see figs. 6A, 6B, 6C); at least one input wire for receiving further data from another of the processors of the respective die accessible over the switching fabric of the respective die (e.g., see figs. 6A, 6B, 6C).
Ware did not expressly detail that the interdie data connection was implemented as a wire.

Nassif, however, taught at least one interdie data connection wire connected to the output exchange bus of a corresponding one of the processors on the other die, wherein the respective processor is operable to send over the at least one interdie data connection wire (DIE TO DIE WIRE; e.g., see figs. 35A, 35B and paragraph 0072).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ware and Nassif because both references were directed toward problems of transmitting data between processors in a system containing multiple dies. One of ordinary skill would have been motivated to incorporate Nassif's teaching of a wire interdie connection at least to provide a connection insulated from signal noise of other processors, thereby transmitting data between processor(s) of different cores efficiently, preventing data loss due to interference, and providing a connection over relatively long distances between dies.

Ware taught sending a second data packet over the switching fabric of the other die to one of the processors of the other die (ring bus R[1], R[2], R[3], R[0]; e.g., see fig. 7A and figs. 6B, 6C). As to the transmitted data being organized in packets, Nassif taught transmitting data, including payload data, to die(s) (e.g., see paragraph 0129). Nassif also stated that certain embodiments include not packetizing and/or serializing the data (e.g., between dies) (e.g., see paragraph 0057), which implies that other embodiments include packetizing and/or serializing the data. One of ordinary skill would have been motivated to packetize the data sent between cores at least to enable the cores to send data in an organized manner that each recipient core would recognize, so that the data could be used for execution. Nassif also taught sending data in data streams generated by decoded instruction(s) (e.g., see paragraph 0092); one of ordinary skill would have been motivated to transmit such data in an organized manner (i.e., in packets) so that the recipient die(s) could recognize which data belonged to each particular instruction for proper execution.
	Due to the similarities among claims 1, 14, and 22, claims 14 and 22 are rejected for the same reasons as claim 1 above. As to the local programs limitation of claim 22, Nassif taught (e.g., see paragraph 0190) “the scheduler unit(s), physical register file(s) unit(s) and execution cluster(s) are shown as being possibly plural because certain embodiments create separate pipelines for certain types of data/operations (e.g., scalar integer pipeline, a scalar floating point/packed integer/packed floating point/vector integer/vector floating point pipeline and/or memory access pipeline that each have their own scheduler unit”. Therefore, one of ordinary skill would have been motivated to implement the processing of the plural different types of pipelines using separate local programs to provide efficient control of parallel processing, such that different pipelines could proceed without the delay of waiting for another pipeline to reach a certain point. This would allow some pipelines to generate result(s) early for use in further processing, which would increase throughput. As to the transmitting and receiving of data packets to/from the second, third, and fourth processor(s) on the first or second die, Ware taught multiple MAC pipelines on each of multiple die(s) (e.g., see figs. 6A, 6B, 6C and paragraph 0007).

As to claim 10, Ware and Nassif taught the stacked integrated circuit device of claim 1. Ware taught wherein each of the first die and the second die has clock distribution wiring for clocking transfer of data along exchange wiring of the respective die, the exchange wiring including at least the switching fabric, output exchange buses, and interdie data connection wires of the respective die, wherein the stacked integrated circuit device comprises a plurality of interdie clock signal connections, each connecting the clock distribution wiring of the first die to the clock distribution wiring of the second die (e.g., see figs. 4, 7, 18A, 35A, 35B).
As to claim 11, Ware and Nassif taught the stacked integrated circuit device of claim 10. Nassif taught wherein for each of the first die and the second die, the respective clock distribution wiring has a plurality of buffer stages (cluster buffers 2150A-2150D) at which a clock signal is buffered, wherein each of the plurality of interdie clock signal connections connects one of the buffer stages of the first die to one of the buffer stages of the second die (e.g., see figs. 18, 21).

As to claim 12, Ware and Nassif taught the stacked integrated circuit device of claim 1. Ware taught wherein each of the plurality of processors on the first die and the second die is configured to run a local program, wherein the local programs are generated as a related set at compile time (e.g., see paragraphs 0055, 0190, 0192, 0193) [note the decoding of macroinstructions in paragraph 0189 and the core support for the x86 instruction set, which provide local programs generated as a related set at compile time].





As to claim 20, Ware and Nassif taught the method of claim 14. Ware taught wherein the first processor and the fourth processor are vertical neighbours (e.g., see fig. 13E) [note Ware taught that the X1 components are stacked].
Claims 4, 6, and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Ware and Nassif as applied to claim 1 above, and further in view of Lacey (patent application publication No. 2020/0012537).
As to claim 4, Ware and Nassif taught the stacked integrated circuit device of claim 1. Lacey taught wherein for each of the first die and the second die: each of at least some of the processors of the respective die has an associated input multiplexer (210) (e.g., see paragraph 0245) configured to receive one of the second data packets from the switching fabric of the respective die (e.g., see fig. 16), each of at least some of the processors of the respective die is operable to function as a receiving processor to receive the respective one of the second data packets from a processor on the other die at a predetermined receive time relative to a predetermined send time of the respective one of the second data packets from the processor of the other die, each receiving processor being operable to receive the respective one of the second data packets at the predetermined receive time by controlling its associated input multiplexer to connect to the switching fabric of the respective die at a predetermined switch time (e.g., see paragraphs 0232, 0243, 0245-0248).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ware and Lacey because both references were directed toward problems of transmitting data between processors in an integrated circuit data processing system. One of ordinary skill would have been motivated to synchronize the processors using multiplexers and an exchange phase so that data could be transferred without conflicting with processing; preventing conflict between data access and processing increases throughput.

As to claim 6, Ware, Nassif, and Lacey taught the stacked integrated circuit device of claim 4. Lacey taught wherein each of the first data packets and second data packets is transmitted by one of the processors of the first die and the second die without an identifier of a destination processor (e.g., see paragraph 0242) [note Lacey taught “the packets do not have headers or any form of destination identifier…”].

As to claim 7, Ware and Nassif taught the stacked integrated circuit device of claim 1. Lacey taught wherein the stacked integrated circuit device is configured to operate in: a compute phase, during which at least some of the processors of the first die and the second die are configured to perform computations on input data to generate results without exchanging data between the processors; and an exchange phase, during which at least some of the processors of the first die and the second die are configured to exchange the first and second data packets with one another, wherein the compute phase is separated from the exchange phase by a predetermined synchronisation barrier (e.g., see paragraphs 0236, 0243, 0247-0248).


Allowable Subject Matter
Claims 2, 3, 5, 8, 9, 13, 15, 16, 17, 18, 19, and 21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Dependent claims 2, 3, 5, 8, 9, 13, 15, 16, 17, 18, 19, and 21 respectively require, among other things, the following:


Claim 2. The stacked integrated circuit device of claim 1, wherein for each of the first die and the second die, each of the processors of the respective die has an associated output multiplexer for outputting data packets onto the output exchange bus of the respective processor, each output multiplexer having: a first input for receiving one of the first data packets from its associated processor for outputting the first data packet onto the output exchange bus of the associated processor; and a second input for receiving one of the second data packets from the corresponding processor of the associated processor via the respective at least one interdie data connection wire of the corresponding processor.

Claim 3. The stacked integrated circuit device of claim 2, wherein for each of the first die and the second die, each of the processors of the respective die has an interdie control connection wire connected to the output multiplexer of the corresponding processor for controlling a selection between the first input and the second input.
Claim 5. The stacked integrated circuit device of claim 4, wherein each of the predetermined receive time, predetermined send time and predetermined switch time are timed with respect to a synchronisation signal issued by a synchronisation controller of the stacked integrated circuit device.

Claim 8. The stacked integrated circuit device of claim 1, wherein the stacked integrated circuit device is configured to operate in: an intradie exchange period, during which at least some of the processors of the first die are each configured to send data to at least one other processor of the first die without sending data to the processors of the second die, and at least some of the processors of the second die are each configured to send data to at least one other of the processors of the second die without sending data to the processors of the first die; and at a different time, an interdie exchange period, during which at least some of the processors of the first die are each configured to send data to at least one of the processors of the second die without sending data to the processors of the first die, and at least some of the processors of the second die are each configured to send data to at least one of the processors of the first die without sending data to the processors of the second die.

Claim 9. The stacked integrated circuit device of claim 8, wherein the intradie die exchange period and interdie exchange period belong to a single instance of an exchange phase during which at least some of the processors of the first die and the second die are configured to exchange the first and second data packets with one another, wherein a compute phase is separated from the exchange phase by a predetermined synchronisation barrier.
Claim 13. The stacked integrated circuit device of claim 1, wherein each of the plurality of processors on the first die and the second die is configured to run a local program, wherein for each of the processors of the first die and the second die, a respective local program is configured to only execute a send instruction to send the respective first data packet over the output exchange bus of the respective processor if a conflict on that output exchange bus would not occur due to sending of data by the respective processor's corresponding processor over that corresponding processor's at least one interdie data connection wire.
Claim 15. The method of claim 14, further comprising: receiving the first data packet at a first input of an output multiplexer associated with the first processor; receiving a third data packet at a second input of the associated output multiplexer from the fourth processor; and outputting the first data packet and the third data packet to the output exchange bus of the first processor.

Claim 16. The method of claim 15, further comprising: at the output multiplexer associated with the first processor, controlling a selection between the first input and the second input in response to a signal received on another interdie connection wire connected to the fourth processor.

Claim 17. The method of claim 14, further comprising: receiving at an input multiplexer associated with the first processor, a third data packet from the first switching fabric, the third data packet being sent from one of the second plurality of processors; receiving the third data packet at the first processor at a predetermined receive time relative to a predetermined send time of the third data packet from the one of the second plurality of processors, wherein the third data packet is received at the predetermined receive time by controlling the input multiplexer to connect to a wire of the first switching fabric at a predetermined switch time.
Claim 18. The method of claim 17, further comprising issuing a synchronisation signal from a synchronisation controller of the stacked integrated circuit device to each of the processors on the first die and the second die, wherein each of the predetermined receive time, the predetermined send time, and the predetermined switch time are timed with respect to the synchronisation signal issued by the synchronisation controller.

Claim 19. The method of claim 17, further comprising transmitting each of the first data packet and the second data packet without an identifier of a destination processor.
Claim 21. The method of claim 14, further comprising: transmitting a third data packet according to a double width transmission in which the first processor borrows outgoing exchange resources of the fifth processor.

	The closest prior art includes Ware, Nassif, and Lacey. The closest prior art taught the limitations of the claims from which claims 2, 3, 5, 8, 9, 13, 15, 16, 17, 18, 19, and 21 respectively depend.
However, the closest prior art does not disclose, among other things, the limitations of claims 2, 3, 5, 8, 9, 13, 15, 16, 17, 18, 19, and 21 as shown above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Wagh (patent No. 8,811,430) disclosed a packetized interface for coupling agents (e.g., see abstract).
	Mishra (patent application publication No. 2015/0178092) disclosed hierarchical and parallel partition networks (e.g., see abstract). 
	Khare (patent application publication No. 2016/0173413) disclosed an architecture for on-die interconnect (e.g., see abstract).
	Wilkinson (patent application publication No. 2019/0121680) disclosed synchronization with a host processor (e.g., see abstract).
	Ooi (patent application publication No. 2019/0227963) disclosed network-on-chip for inter-die and intra-die communication in modularized integrated circuit devices (e.g., see abstract).
	Seo (patent application publication No. 2019/0250985) disclosed semiconductor memory devices, memory systems and methods of operating semiconductor memory devices (e.g., see abstract).
	Noguera Serra (patent application publication No. 2019/0303033) disclosed a data processing engine arrangement in a device (e.g., see abstract).
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC . COLEMAN
Primary Examiner
Art Unit 2183



EC

/ERIC COLEMAN/Primary Examiner, Art Unit 2183