DETAILED ACTION
This communication is responsive to Application 16/844314 filed on 11/11/2021. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims:
		Claims 1-24 are presented for examination.

Examiner’s Note
3.	The claims are verbose and are directed to widely known parallel processing. Applicant should focus on the novel features of the invention. 


Response to Arguments
4.	The Examiner’s statements in the mailed Non-Final Office action regarding obvious limitations, including those based on common knowledge or matter well known in the art, are taken to be admitted prior art because Applicant failed to traverse the Examiner’s assertions; see MPEP 2144.03(C). 

5.	Applicant’s arguments in the amendment filed on 11/11/2021 regarding the claim rejection under 35 USC § 103 with respect to Claims 1-19 have been considered and are found not persuasive. Thus, the rejection under 35 USC § 103 with respect to Claims 1-19 is maintained. 


a.	Applicant argues that the cited art does not teach “receive incoming data from the one or more others of the processors during the first of the exchange stages…count an amount of at least part of the incoming data received …from the one or more others of the processors.” Examiner disagrees because the cited art teaches the claim limitation. 
As an initial matter, Applicant provides no argument as to why the art does not teach the above limitation. Instead, Applicant incorrectly characterizes the cited art and states that the data is outgoing data rather than incoming data. 
However, nothing in the cited paragraphs states Applicant’s interpretation or supports that allegation. Moreover, the plain meaning of an input buffer, as understood in the art, is a location where incoming information is stored for processing. One skilled in the art would understand the exact opposite of Applicant’s position because the cited paragraphs expressly teach a credit counter that corresponds to input buffer space in the receiver tile_a. The cited art does not teach an outgoing buffer or a sender tile_a as alleged. How can a counter count an amount of outgoing data without counting the incoming data of the same buffer?
Thus, Examiner maintains his interpretation and rejection. 
b.	Applicant argues that the cited art does not teach “in response to determining that the amount of the ….transfer the further outgoing data to the one…” Examiner disagrees because the cited art teaches the claim limitation.

Again, Applicant’s characterization of the cited art is rejected because “at least one credit” is a predefined amount. Applicant provides no explanation or reasoning as to why “at least one credit” does not qualify as, or fall within, a “predefined amount.” 
Thus, Examiner maintains his interpretation and rejection. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any 
Claims 1-24 are rejected under 35 U.S.C. 103 as being unpatentable over Wentzlaff et al. (hereinafter Wentzlaff) US 7734894 B1 in view of Sridharan et al. (hereinafter Sridharan) US 2018/0322386 A1.


Regarding Claim 1, Wentzlaff teaches a data processing system comprising a plurality of processors (Fig. 7), wherein each of the processors comprises at least one circuit configured to perform data transfer operations during each of at least some of a plurality of exchange stages to transfer data determined in dependence upon data received at the respective processor in a preceding one of the exchange stages from at least one other of the processors, each of the data transfer operations being for transfer of data to another one of the plurality of processors (Col. 10, lines 22-36; parallel processing environment having interconnected processor cores e.g. tiled integrated circuit. For example, Fig. 21a & Col. 6, lines 31-64 illustrate this limitation), wherein each at least one circuit is configured to: 
perform data transfer operations to transfer outgoing data to one or more others of the processors during a first of the exchange stages (Fig. 2b & Col. 8, lines 49-67; the multiplexers 232C feed the inputs to logic units 240A and 240B.  The output 
receive incoming data from the one or more others of the processors during the first of the exchange stages (Fig. 2b & Col. 8, lines 49-67; the multiplexers 232C feed the inputs to logic units 240A and 240B.  The output buffers 232B and input buffers 232D are mapped to the name space of the register file 236.  When the processor 200 (see FIG. 2A) reads from a register name mapped to a given switch port, data is taken from the corresponding input buffer 232D.  When the processor 200 writes to a register name mapped to a given switch port, data is inserted into the corresponding output buffer 232B);
determine further outgoing data in dependence upon at least part of the incoming data (Fig. 2b & Col. 8, lines 49-67; the multiplexers 232C feed the inputs to logic units 240A and 240B.  The output buffers 232B and input buffers 232D are mapped to the name space of the register file 236.  When the processor 200 (see FIG. 2A) reads from a register name mapped to a given switch port, data is taken from the corresponding input buffer 232D.  When the processor 200 writes to a register name mapped to a given switch port, data is inserted into the corresponding output buffer 232B);
count an amount of at least part of the incoming data received during the first of the exchange stages from the one or more others of the processors (Fig. 7 
and in response to determining that the amount of the at least part of the incoming data received has reached a predefined amount, perform data transfer operations to transfer the further outgoing data to the one or more others of the processors during a second of the exchange stages (Fig. 7 & Col. 21, lines 1-25; Control circuitry 710 counts credits in a credit counter 712 corresponding to input buffer space available in the receiver tile_a. If there is at least one credit and an input buffer has data to be sent, the control circuitry 710 will assert a signal to dequeue data from the appropriate one of the input buffers 704 and enqueue the data to the input buffer 706.  Otherwise the control circuitry 710 will stall, not sending any data to the receiver tile_a);
Wentzlaff does not expressly teach “…during the first of the exchange stages” and “…during a second of the exchange stages”; however, this limitation is suggested by Col. 21, lines 1-25, where data is counted and processed accordingly. Nevertheless, Examiner cites Sridharan to support Wentzlaff. 
However, Sridharan teaches “…during the first of the exchange stages” & “…during a second of the exchange stages” (Fig. 9a-b & ¶0162-¶0170; convolutions stages in parallel to produce a set of linear activations e.g. this step is done in 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sridharan into the system of Wentzlaff in order to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework (abstract). Utilizing such teachings, e.g., parallel processing, enables the system to transfer data from system memory via the I/O unit for processing (¶0060). Also, during processing, the transferred data can be stored to on-chip memory (e.g., parallel processor memory) and then written back to system memory. Id. 
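
For illustration only, the credit-based gating quoted above from Wentzlaff (Col. 21, lines 1-25) can be sketched in Python as follows. This is a minimal sketch and not code from either reference: the class name CreditedSender, its methods, the decrement of one credit per send, and the return of a credit by the receiver are all assumptions made for the example.

```python
from collections import deque


class CreditedSender:
    """Hypothetical sender that forwards data only while it holds credits.

    Each credit stands for one free slot of input buffer space at the
    receiver. Spending a credit per send and regaining one when the
    receiver frees a slot are assumptions, not statements of the cited art.
    """

    def __init__(self, credits):
        self.credits = credits   # credit counter
        self.pending = deque()   # data waiting to be sent

    def enqueue(self, item):
        """Queue an item for later transfer."""
        self.pending.append(item)

    def return_credit(self):
        """Assumed path by which the receiver signals a freed buffer slot."""
        self.credits += 1

    def try_send(self, receiver_buffer):
        """Send one item if there is at least one credit and data to send.

        Returns True when an item was sent and False when the sender stalls.
        """
        if self.credits >= 1 and self.pending:
            receiver_buffer.append(self.pending.popleft())
            self.credits -= 1
            return True
        return False
```

In this sketch, a sender created with two credits and three queued items sends two items, stalls on the third, and resumes only after return_credit() is called.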

Regarding Claim 2, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Wentzlaff further teaches wherein each of the at least one circuits is configured to: 
prior to the determining that the amount of the at least part of the incoming data received has reached the predefined amount, perform only some of the data transfer operations to transfer only part of the outgoing data to one or more others of the processors (implied from Fig. 7 & Col. 21, lines 1-25 where only one data out of three is processed before reaching the limit; Control circuitry 710 counts credits in a credit 
and in response to the determining that the amount of incoming data received has reached the predefined amount: perform remaining data transfer operations to transfer a remaining part of the outgoing data to the one or more others of the processors during the first of the exchange stages (implied from Fig. 7 & Col. 21, lines 1-25 where only one data out of three is processed before reaching the limit; Control circuitry 710 counts credits in a credit counter 712 corresponding to input buffer space available in the receiver tile_a. If there is at least one credit and an input buffer has data to be sent, the control circuitry 710 will assert a signal to dequeue data from the appropriate one of the input buffers 704 and enqueue the data to the input buffer 706.  Otherwise the control circuitry 710 will stall, not sending any data to the receiver tile_a);
and subsequently, perform the data transfer operations to transfer the further outgoing data to the one or more others of the processors during the second of the exchange stages (implied from Fig. 7 & Col. 21, lines 1-25 where only one data out of three is processed before reaching the limit; Control circuitry 710 counts credits in a credit counter 712 corresponding to input buffer space available in the receiver tile_a. If there is at least one credit and an input buffer has data to be sent, the control circuitry 710 will assert a signal to dequeue data from the appropriate one of the input buffers 

Regarding Claim 3, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 2, Wentzlaff further teaches wherein each of the at least one circuits is configured to: count an amount of a further part of the incoming data received during the first of the exchange stages from the one or more others of the processors (implied from Fig. 7 & Col. 21, lines 1-25 where only one data out of three is processed before reaching the limit; Control circuitry 710 counts credits in a credit counter 712 corresponding to input buffer space available in the receiver tile_a. If there is at least one credit and an input buffer has data to be sent, the control circuitry 710 will assert a signal to dequeue data from the appropriate one of the input buffers 704 and enqueue the data to the input buffer 706.  Otherwise the control circuitry 710 will stall, not sending any data to the receiver tile_a);  
and following starting to perform the remaining data transfer operations, determine that the amount of the further part of the incoming data received has reached a predefined amount, wherein the subsequently, perform the data transfer operations to transfer the further outgoing data to the one or more others of the processors during the second of the exchange stages is performed in response to determining that the amount of the further part of the incoming data received has reached a predefined amount (implied from Fig. 7 & Col. 21, lines 1-25 where only one data out of three is processed before reaching the limit; Control circuitry 710 counts credits in a credit counter 712 corresponding to input buffer space available in the receiver tile_a. If there is at least 

Regarding Claim 4, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 3, Wentzlaff further teaches wherein the at least part of the incoming data is addressed to a first location in the processor, wherein the further part of the incoming data is addressed to a second location in the processor (Col. 19, lines 41-67 and Fig. 6; After a packet reaches the destination tile, the packet is then sent to a final destination (which can also be indicated in the packet header). The final destination can direct data to an off-tile location over a network port to the north, east, south, west, or can direct the data to a functional unit within the tile, such as the processor or an on-tile memory unit or functional unit. This final destination routing enables data to be directed off of the network to an I/O device or memory interface, for example. Also see Sridharan in ¶0117; processing engine 431-432, N requires to do its work or it can be a pointer to a memory location where the application has set up a command queue of work to be completed).

Regarding Claim 5, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein, for each of the processors, the one or more others of the processors comprises two or more processors (this limitation is a design choice, see While a single instance of the parallel 

Regarding Claim 6, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 5, Sridharan further teaches wherein, for each of the processors, the two or more processors comprises only two processors (this limitation is a design choice; see ¶0066: while a single instance of the parallel processing unit 202 is illustrated within the parallel processor 200, any number of instances of the parallel processing unit 202 can be included).


Regarding Claim 7, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein each of the processors comprises a plurality of processing units (¶0058), each of at least some of the plurality of processing units being configured to: receive part of the incoming data from the one or more others of the processors (¶0053-¶0066); and send part of the outgoing data to the one or more others of the processors (¶0053-¶0066); wherein the steps of counting the amount of incoming data received and determining that the amount of the incoming data received has reached the predefined amount are performed by one or more of the plurality of processing units of a first type (¶0058-¶0065; cluster array 212 can be configured to perform various types of parallel processing operations).



Regarding Claim 9, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 8, Sridharan further teaches wherein each processor comprises two of the plurality of processing units of the first type (this limitation is a design choice, see While a single instance of the parallel processing unit 202 is illustrated within the parallel processor 200, any number of instances of the parallel processing unit 202 can be included, see ¶0066), wherein for each processor: a first of the plurality of processing units of the first type is configured to perform the steps of counting the amount of incoming data received and determining that the amount of the incoming data received has reached the predefined amount (see claim 1. Also see Sridharan in ¶0053-¶0066), a second of the plurality of processing units of the first type is configured to perform the steps of counting the amount of the further part of the incoming data received and determine that the amount of the further part of the 

Regarding Claim 10, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 7, Sridharan further teaches wherein each of some of the at least some of the plurality of processing units is configured to, subsequent to performing its respective operations to send part of the outgoing data, cause control to pass to another one of the at least some of the plurality of processing units for that another one to perform its respective operations to send part of the outgoing data (see claim 1. Also see Sridharan in ¶0053-¶0066).

Regarding Claim 11, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 9, Sridharan further teaches wherein each of the one or more of the plurality of processing units of the first type is configured to perform the causing of control to pass in response to determining that an amount of a part of the incoming data received has reached a predetermined amount (see claim 1. Also see Sridharan in ¶0053-¶0066).

Regarding Claim 12, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein each of the incoming data, outgoing data, and further outgoing data comprise a set of gradients for weights of a machine learning model (FIG. 14A-14E, input data 1402 is processed by a machine 

Regarding Claim 13, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Wentzlaff further teaches wherein each of the at least one circuit comprises: counting circuitry configured to perform the counting an amount of the incoming data received during the first of the exchange stages (see program counter of Fig. 2A and related paragraphs); and an execution unit configured to execute computer readable instructions to: poll the counting circuitry to determine the amount of the incoming data received (Col. 27, lines 13-35; When a DMA transaction completes, the DMA engine interrupts the main processor 802. Alternatively, instead of receiving an interrupt, the main processor 802 polls a status register to determine when a DMA transaction completes); and determine that the amount of the incoming data received has reached the predefined amount (Col. 27, lines 13-35; When a DMA transaction completes, the DMA engine interrupts the main processor 802. Alternatively, instead of receiving an interrupt, the main processor 802 polls a status register to determine when a DMA transaction completes).
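
For illustration only, the limitation of polling counting circuitry until a predefined amount of incoming data has been received can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names ReceiveCounter and poll_until, the callback that stands in for data arriving between polls, and the poll limit are hypothetical and appear in neither reference nor the application.

```python
class ReceiveCounter:
    """Hypothetical counter that accumulates the amount of data received."""

    def __init__(self):
        self._count = 0

    def record(self, amount):
        """Add newly received data to the running count."""
        self._count += amount

    def read(self):
        """Return the amount received so far (the value that is polled)."""
        return self._count


def poll_until(counter, predefined_amount, between_polls, max_polls=1000):
    """Poll the counter until it reaches the predefined amount.

    between_polls is an assumed callback standing in for whatever happens
    while the poller waits (here, more data arriving). Returns the number
    of polls performed, or raises TimeoutError after max_polls polls.
    """
    for polls in range(1, max_polls + 1):
        if counter.read() >= predefined_amount:
            return polls
        between_polls()
    raise TimeoutError("predefined amount not reached")
```

An interrupt-driven design would avoid the repeated reads; the sketch shows only the polling alternative.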

Regarding Claim 14, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein the at least one circuit comprises a remote direct memory access engine configured to perform the data transfer operations during each of a plurality of exchange stages (¶0223-¶0226; remote transfer from a transmit node).

Regarding Claim 15, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Wentzlaff further teaches wherein the plurality of processors are arranged in a ring topology such that the at least one circuit of each processor is configured to perform the data transfer operations during each of the plurality of exchange stages to transfer data to its two neighbouring processors in the ring (Col. 5, lines 59-67 & Col. 6, lines 1-10; see claim 1. Also, a switch coupled to a processor forwards data to and from the processor or between neighboring processors over data paths of a one-dimensional interconnection network such as a ring network; also see Sridharan in ¶0272), wherein the counting the amount of the incoming data received during the first of the exchange stages from the one or more others of the processors comprises counting an amount of data received from the two neighbouring processors during the first of the exchange stages (see claim 1. Also Col. 5, lines 59-67 & Col. 6, lines 1-10; The example of the integrated circuit 100 shown in FIG. 1 includes a two-dimensional array 101 of rectangular tiles with data paths 104 between neighboring tiles to form a mesh network. The data path 104 between any two tiles can include multiple "wires" (e.g., serial, parallel or fixed serial and parallel signal paths on the IC 100) to support parallel channels in each direction. Optionally, specific subsets of wires between the tiles can be dedicated to different mesh networks that can operate independently).
	 
Regarding Claim 16, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein the determining further 

Regarding Claim 17, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 16, Sridharan further teaches wherein the at least one circuits of the plurality of processors are configured to implement a reduce-scatter collective comprising the steps of each of the at least one circuits: transferring data determined in dependence upon data received at the respective processor in a preceding stage from at least one other of the processors (¶0193-¶0200 & Figs. 14a-14e; multiple types of low-level communication patterns are used to transfer data between nodes.  The low-level communication patterns used are illustrated in Table 5 below including SCATTER distribute data from a single array into multiple segments “reduce.” See Figs. 14a-e that illustrate data transfer using data parallelism); 
and determining further outgoing data in dependence upon at least part of the incoming data (¶0193-¶0200 & Figs. 14a-14e; multiple types of low-level communication patterns are used to transfer data between nodes.  The low-level communication patterns used are illustrated in Table 5 below including SCATTER distribute data from a single array into multiple segments “reduce.” See Figs. 14a-e that illustrate data transfer using data parallelism).
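
For illustration only, a generic ring reduce-scatter of the kind named in claim 17 can be sketched in Python as follows. This is a textbook formulation and is not taken from Sridharan, Wentzlaff, or the application: the function name ring_reduce_scatter, the one-value-per-chunk layout, and the use of addition as the reduction are assumptions made for the example.

```python
def ring_reduce_scatter(data):
    """Simulate a ring reduce-scatter over n processors.

    data[r][c] is the value processor r initially holds for chunk c, with
    exactly n chunks per processor. In each of the n - 1 exchange stages,
    processor r sends one chunk to its neighbour (r + 1) % n, and the
    chunk it sends is the one it accumulated in the preceding stage.
    Returns a dict mapping each processor to its fully reduced chunk.
    """
    n = len(data)
    buf = [list(row) for row in data]  # work on a copy of the inputs
    for stage in range(n - 1):
        # Snapshot all outgoing values first so that every transfer in a
        # stage uses the state from the start of that stage.
        outgoing = []
        for sender in range(n):
            chunk = (sender - stage) % n
            outgoing.append((sender, chunk, buf[sender][chunk]))
        for sender, chunk, value in outgoing:
            receiver = (sender + 1) % n
            buf[receiver][chunk] += value  # reduce incoming data
    # After n - 1 stages, processor r holds the full sum of chunk (r + 1) % n.
    return {r: buf[r][(r + 1) % n] for r in range(n)}
```

In this sketch, the value a processor sends in one stage depends on what it received in the preceding stage, which is the dependency the claim language describes.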

Regarding Claim 18, Wentzlaff in view of Sridharan teach a data processing system as claimed in claim 1, Sridharan further teaches wherein the at least one circuit comprises at least one of a field programmable gate array or application specific integrated circuit configured to performing the counting of an amount of the incoming data received during the first of the exchange stages from the one or more others of the processors (¶0055 & Fig. 2a; FPGA for parallel processing).

Claims 19-24 are substantially similar to the above claims; thus, the same rationale applies. 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emmanuel Moise, can be reached at 571-272-3865. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MAHRAN ABU ROUMI
Primary Examiner
Art Unit 2455



/MAHRAN Y ABU ROUMI/Primary Examiner, Art Unit 2455