DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


This Office Action is in response to the amendment filed on 1/8/2021.  This action is made FINAL.

Claims 1-20 are pending and are presented for examination.

Response to Arguments

Regarding claim 1, Applicant argues (page 9): “Nothing in this portion of Kondo and Kondo generally, teaches any aspect of receiving, at a network device from a set of devices, a set of processing results derived from processing the task by the set of computing devices, the set of computing devices processing at least portions of the task in parallel with one another, the network device being separate from and external to the set of computing devices, operatively coupled to each of the computing devices…”
The examiner respectfully notes that Kondo in view of Archer discloses the above limitation. Kondo discloses a plurality of devices that execute portions of a task in
Therefore, argument is not persuasive.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 7, and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

The term "external to the set of computing devices" in claims 1, 7, and 14 is a relative term which renders the claims indefinite. The term "external to the set of computing devices" is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The specification does not disclose the metes and bounds of the term “external to the set of computing devices”. For example, is the network device physically or logically external? Is the network device external to a network (e.g., the Internet or an intranet)?


Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8, 13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kondo et al. (Pub 20190073247) (hereafter Kondo) in view of Archer et al. (Pub 20080301683) (hereafter Archer).

As per claim 1, Kondo teaches:
A method of processing a task, comprising: ([Paragraph 3], A large-scale computation, such as a scientific computation, that uses a computer system sometimes involves a parallel computation through the use of a plurality of computers. A computer system that can perform a parallel computation is known as a parallel computer. Each of a plurality of computers that perform parallel computations is called a computation node device.)
receiving, at a network device from a set of computing devices, a set of processing results derived from processing the task by the set of computing devices, the set of computing devices processing at least a portion of the task in parallel with one another, the network device being separate from and external to the set of computing devices, operatively coupled to each of the computing devices, and configured for broadcasting of network data packets to the computing devices; ([Paragraph 17] [Fig. 2], FIG. 2 illustrates an example of a reduction operation; [Paragraphs 32-35], FIG. 2 illustrates examples of SUM, which is a result of adding together vectors V1 through V4 respectively of nodes 1 through 4. Vectors V1 through V4 each have eight elements. V1=(1, 2, 3, 4, 5, 6, 7, 8), V2=(7, 8, 9, 10, 11, 12, 13, 14), V3=(13, 14, 15, 16, 17, 18, 19, 20), and V4=(19, 20, 21, 22, 23, 24, 25, 26). Adding vector V1 through V4 together results in SUM=(40, 44, 48, 52, 56, 60, 
in response to receiving the set of processing results from the set of computing devices at the network device, executing a reduction operation on the set of processing results in the network device; and ([Paragraph 7], A reduction operation performed in a computation node device involves a process in which that computation node device receives a packet from a different computation node device, performs an error check on the packet by using the checksum included in the packet, and performs the reduction operation by using the data in the packet when finding no error.  [Paragraph 28], The computation node devices 201-i are computers that perform parallel computations. A computation node device 201-i may be referred to also as node i. The computation node devices 201-i are connected to adjacent computation node 
transmitting a result of the reduction operation from the network device to the set of computing devices via a network data packet broadcasting mechanism of the network device. ([Paragraph 57], The control unit 222 performs a reduction operation by using the data in the received packet. The control unit 222 outputs the received packet to the CPU 231 via the node report unit 224. The control unit 222 outputs a request made by the CPU 231 to the router unit 211-j, the request being received from the node request unit 223.)
Kondo teaches a plurality of computing devices that perform parallel tasks.
However, Kondo does not explicitly disclose that the plurality of devices is a set of computing devices, or transmitting a result via a network data packet broadcasting mechanism of the network device.
Archer teaches that the plurality of devices is a set (i.e., group) of computing devices, and transmitting a result via a network data packet broadcasting mechanism of the network device. ([Paragraph 55], In such a manner, the local reduction results from each compute node (102) are combined and cascade up to the physical root node (202) as the global reduction results. Upon the global reduction result being calculated by the physical root node (202), the physical root (202) sends the global reduction result down the tree (106) to each compute node. Each compute node (102) then receives global reduction results through the network (106) and stores the global reduction results into shared memory.)
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the invention, to combine the teachings of Kondo, wherein a set of processing results is received from computing devices, a reduction operation is performed, and the result of the reduction operation is transmitted, with the teachings of Archer, wherein the results of the reduction operation are transmitted to the computing devices, because transmitting the results of the reduction operation to the computing devices through the network and storing the reduction results into shared memory via a point-to-point adapter would enhance Kondo by minimizing some of the bottlenecks incurred when utilizing a message passing service.
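For illustration only, the SUM reduction in the example quoted from Kondo for claim 1 ([Paragraphs 32-35]) can be checked with the minimal Python sketch below. It assumes V1 = (1, ..., 8), since the quoted passage states that each vector has eight elements and the quoted sums only work out with that V1. The function name `elementwise_sum` is hypothetical. The first six sums match the values quoted from Kondo; the last two are computed here and are not part of the quoted (truncated) text.

```python
# Per-node input vectors from the Kondo example (nodes 1 through 4).
# V1 is assumed to be (1, ..., 8): the quoted text says each vector has
# eight elements, and the quoted SUM values only add up with that V1.
V1 = (1, 2, 3, 4, 5, 6, 7, 8)
V2 = (7, 8, 9, 10, 11, 12, 13, 14)
V3 = (13, 14, 15, 16, 17, 18, 19, 20)
V4 = (19, 20, 21, 22, 23, 24, 25, 26)


def elementwise_sum(*vectors):
    """SUM reduction: add the i-th elements of all vectors together."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all vectors must be non-empty input with equal lengths")
    return tuple(sum(column) for column in zip(*vectors))


SUM = elementwise_sum(V1, V2, V3, V4)
print(SUM)  # (40, 44, 48, 52, 56, 60, 64, 68)
```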


As per claim 2, rejection of claim 1 is incorporated:
Kondo teaches wherein the network device comprises at least one of a programmable switch and a router. ([Paragraph 17] [Fig. 2], FIG. 2 illustrates an 
Archer also teaches ([Paragraph 48], Each processing core (164) includes an ALU (166), and a separate ALU (170) is dedicated to the exclusive use of Global Combining Network Adapter (188) for use in performing the arithmetic and logical functions of reduction operations, including an allreduce operation.)

As per claim 3, rejection of claim 1 is incorporated:
Archer teaches wherein the set of computing devices comprises a first computing device, the set of processing results comprises a first processing result transmitted from the first computing device, and wherein the method further comprises:
after transmitting the result of the reduction operation to the set of computing devices, 
in response to receiving the first processing result from the first computing device again, retransmitting the result of the reduction operation to the set of computing devices. 

Kondo teaches in response to receiving the first processing result from the first computing device again, retransmitting the result of the reduction operation to the set of computing devices. ([Paragraph 47], The router unit 211-1 includes a reception unit 212, a packet check unit 213, a routing unit 214, a routing information generation unit 215, a retransmission request generation unit 216, a retransmission buffer 217, and a transmission unit 218.)


As per claim 4, rejection of claim 1 is incorporated:
Archer teaches before receiving the set of processing results, 
	receiving, from the set of computing devices, a set of requests for executing the reduction operation; and 
	in response to receiving the set of requests, transmitting, to each of the set of computing devices, a response to the set of requests. 
([Paragraph 55], Consider, for example, that each compute node (102) calls an allreduce operation. Each compute node (102) transmits local reduction results to the other compute nodes through the global combining network (106). Each compute node (102) performs the arithmetic operation specified by the allreduce operation in the node's global combining adapter on the local reduction results from that node itself and the local reduction results received from the children nodes. Each compute node (102) then passes the result of the arithmetic operation up to the node's parent. In such a manner, the local reduction results from each compute node (102) are combined and cascade up to the physical root node (202) as the global reduction results. Upon the global reduction result being calculated by the physical root node (202), the physical root (202) sends the global reduction result down the tree (106) to each compute node. Each compute node (102) then receives global reduction results through the network (106) and stores the global reduction results into shared memory.)
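As an illustration only, the reduce-up/broadcast-down flow described in Archer [Paragraph 55] (with ranks assigned in level order as in [Paragraph 54]) can be simulated with the minimal Python sketch below. The binary fan-out (children of rank r at 2r+1 and 2r+2) and the names `children_of` and `tree_allreduce` are assumptions for this sketch; Archer's global combining network adapter is hardware and is not modeled here.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def children_of(rank: int, n: int) -> List[int]:
    """Children in a tree whose ranks are assigned in level order
    (0 = root, 1 and 2 = second layer, ...). The binary fan-out is an
    assumption made for illustration."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < n]


def tree_allreduce(local: List[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Simulate a tree allreduce: combine up to the root, then broadcast down."""
    n = len(local)
    if n == 0:
        return []
    # Reduce phase: every child has a larger rank than its parent, so visiting
    # ranks from last to first guarantees each child's partial result is final
    # before its parent combines it with the parent's own local result.
    partial = list(local)
    for rank in range(n - 1, -1, -1):
        for child in children_of(rank, n):
            partial[rank] = op(partial[rank], partial[child])
    # Broadcast phase: the root holds the global result and sends it down the
    # tree; each node receives it from its parent.
    received: list = [None] * n
    received[0] = partial[0]
    for rank in range(n):
        for child in children_of(rank, n):
            received[child] = received[rank]
    return received
```

For example, `tree_allreduce([1, 2, 3, 4, 5])` leaves every one of the five ranks holding the global sum 15, and passing `max` as `op` gives every rank the global maximum instead.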

As per claim 5, rejection of claim 4 is incorporated:
Archer teaches wherein receiving the set of processing results comprises:
determining, based on the received set of requests, respective network information of the set of computing devices; and
receiving, based on the respective network information of the set of computing devices, the set of processing results from the set of computing devices. ([Paragraph 55], Consider, for example, that each compute node (102) calls an allreduce operation. Each compute node (102) transmits local reduction results to the other compute nodes through the global combining network (106). Each compute node (102) performs the arithmetic operation specified by the allreduce operation in the node's global combining adapter on the local reduction results from that node itself and the local reduction results received from the children nodes. Each compute node (102) then passes the result of the arithmetic operation up to the node's parent. In such a manner, the local reduction results from each compute node (102) are combined and cascade up to the physical root node (202) as the global reduction results. Upon the global reduction result being calculated by the physical root node (202), the physical root (202) sends the global reduction result down the tree (106) to each compute node. Each compute node (102) then receives global reduction results through the network (106) and stores the global reduction results into shared memory.  [Paragraph 54], In the example of FIG. 5, each node in the tree is assigned a unit identifier referred to as a `rank` (250). A node's rank uniquely identifies the node's location in the tree network for use in both point to point and collective operations in the tree network. The ranks in this example are assigned as integers beginning with 0 assigned to the root node (202), 1 assigned to the first node in the second layer of the tree, 2 assigned to the second node in the second layer of the tree, 3 assigned to the first node in the third layer of the tree, 

As per claim 6, rejection of claim 4 is incorporated:
Archer teaches wherein the set of computing devices comprises a second computing device, the set of requests comprises a second request transmitted from the second computing device, and wherein the method further comprises:
after transmitting the response to each of the set of computing devices,
	in response to receiving the second request from the second computing device again, retransmitting the response to the second computing device.	
([Paragraph 55], Consider, for example, that each compute node (102) calls an allreduce operation. Each compute node (102) transmits local reduction results to the other compute nodes through the global combining network (106). Each compute node (102) performs the arithmetic operation specified by the allreduce operation in the node's global combining adapter on the local reduction results from that node itself and the local reduction results received from the children nodes. Each compute node (102) then passes the result of the arithmetic operation up to the node's parent. In such a manner, the local reduction results from each compute node (102) are combined and cascade up to the physical root node (202) as the global reduction results. Upon the global reduction result being calculated by the physical root node (202), the physical root (202) sends the global reduction result down the tree (106) to each compute node. 
Kondo teaches in response to receiving the second request from the second computing device again, retransmitting the response to the second computing device. ([Paragraph 47], The router unit 211-1 includes a reception unit 212, a packet check unit 213, a routing unit 214, a routing information generation unit 215, a retransmission request generation unit 216, a retransmission buffer 217, and a transmission unit 218. [Figs. 4-6] disclose a computation node device (i.e., router/switch) network interface with a reduction operation unit and a reception buffer.)

As per claims 7 and 8, these are method claims corresponding to method claims 1 and 2. Therefore, they are rejected based on similar rationale. Kondo/Archer discloses transmitting the processing results to a network device to execute the reduction operation.

As per claim 13, rejection of claim 7 is incorporated:
Archer teaches wherein transmitting the processing result to the network device comprises:
before transmitting the processing result, transmitting a request for executing the reduction operation to the network device; and 
in response to receiving a response to the request from the network device, transmitting the processing result to the network device. ([Paragraph 14], disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an 

As per claims 15-17, these are network device claims corresponding to method claims 1-3. Therefore, they are rejected based on similar rationale.

As per claim 18, this is a computing device claim corresponding to method claim 7. Therefore, it is rejected based on similar rationale.

As per claims 19 and 20, these are non-transitory computer storage medium claims corresponding to method claims 1 and 7. Therefore, they are rejected based on similar rationale.

Claims 9-12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kondo in view of Archer and further in view of Asaad et al. (Pub 20110219208) (hereafter Asaad).

As per claim 9, rejection of claim 7 is incorporated:
However, Kondo and Archer do not explicitly disclose wherein transmitting the processing result to the network device comprises:
pre-processing the processing result such that the pre-processing result is adapted for the reduction operation to be executed by the network device; and 
transmitting, to the network device, the pre-processed processing result.
Asaad teaches wherein transmitting the processing result to the network device comprises:
pre-processing the processing result such that the pre-processing result is adapted for the reduction operation to be executed by the network device; and 
transmitting, to the network device, the pre-processed processing result. ([Paragraph 1620], FIG. 1 illustrates a flow chart for adding a plurality of floating point numbers in a parallel computing system. The parallel computing system may include a plurality of computing nodes. A computing node may include, without limitation, at least one processor and/or at least one memory device. At step 100 in FIG. 1, the collective logic device 260 receives the inputs 200 which include a plurality of floating point numbers ("first floating point numbers") from computing nodes or network links. At step 105, the FP exponent max unit 220 finds a maximum exponent (i.e., the largest exponent) of the first floating point numbers, e.g., by comparing exponents of the first floating point numbers. The FP exponent max unit 220 broadcast the maximum exponent to the computing nodes. At step 110, the front-end floating point logic device 270 converts the first floating point numbers to integer numbers, e.g., by performing left shifting and/or right shifting the first floating point numbers according to differences between exponents of the first floating point numbers and the maximum exponent. 
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the invention, to combine the teachings of Kondo and Archer, wherein a set of processing results is received from computing devices, a reduction operation is performed, and the result of the reduction operation is transmitted, with the teachings of Asaad, wherein the processing result is pre-processed to be adapted for a reduction operation, because performing the pre-processing to convert the processing result would enhance Kondo and Archer by allowing the reduction operation (i.e., a sum operation using integers) to be performed on floating point numbers without generating different results based on the order of addition.

As per claim 10, rejection of claim 9 is incorporated:
Asaad teaches wherein pre-processing the processing result comprises at least one of the following:
converting the processing result into a predetermined value range;
converting the processing result from a negative value into a positive value; and
converting the processing result from a floating point number into an integer. ([Paragraph 1620], FIG. 1 illustrates a flow chart for adding a plurality of floating point numbers in a parallel computing system. The parallel computing system may include a plurality of computing nodes. A computing node may include, without limitation, at least one processor and/or at least one memory device. At step 100 in FIG. 1, the collective logic device 260 receives the inputs 200 which include a plurality of floating point numbers ("first floating point numbers") from computing nodes or network links. At step 105, the FP exponent max unit 220 finds a maximum exponent (i.e., the largest exponent) of the first floating point numbers, e.g., by comparing exponents of the first floating point numbers. The FP exponent max unit 220 broadcast the maximum exponent to the computing nodes. At step 110, the front-end floating point logic device 270 converts the first floating point numbers to integer numbers, e.g., by performing left shifting and/or right shifting the first floating point numbers according to differences between exponents of the first floating point numbers and the maximum exponent. Then, the front-end floating point logic device 270 sends the integer numbers to the ALU tree 230 which includes integer adders (e.g., an adder 280). When sending the integer numbers, the front-end floating point logic device 270 may also send extra bits representing plus(+) infinity, minus(-) infinity and/or a not-a-number (NAN). NAN indicates an invalid operation and may cause an exception.  [Paragraph 1621], The outputs do not depend on an order of the inputs. Since an addition of integer numbers ( converted from the floating point numbers) does not generate a different output based 

As per claim 11, rejection of claim 9 is incorporated:
Asaad teaches in response to receiving the result of the reduction operation, performing post-processing opposite to the pre-processing on the result of the reduction operation. ([Paragraph 1631], Thus, when the number A is converted to an integer number, it becomes 0x0180000000000000. When the number B is converted, it becomes 0x0500000000000000. Note that the integer numbers comprise only the mantissa field. Also note that the most significant bit of the number B is two binary digits to the left (larger) than the most significant bit of the number A. This is exactly the difference between the two exponents (1 and 3).  III. (corresponding to Step 120 in FIG. 1) The two integer numbers are added. In this example, the result is 0x0680000000000000=0x0180000000000000+0x0500000000000000. IV. corresponding to Step 130 in FIG. 1) This result is then converted back to a floating point representation, taking into account the maximum exponent which has been passed through the collective logic device 260 in parallel with the addition as follows: )
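For illustration only, the general idea in the quoted Asaad passages for claims 9-11 (find the maximum exponent, shift each floating point number into an integer aligned to it, add the integers, then convert the sum back to floating point) can be sketched in Python as below. This is a software sketch under stated assumptions, not Asaad's hardware: `FRAC_BITS`, `pre_process`, `post_process`, and `order_independent_sum` are hypothetical names, inputs are assumed finite (Asaad's extra bits for infinities and NaN are not modeled), and Python's unbounded integers stand in for a fixed-width adder.

```python
import math
from fractions import Fraction
from typing import Iterable, List, Tuple

# Hypothetical integer width: the largest input is aligned so that it
# occupies at most FRAC_BITS bits. Python ints never overflow, so this
# only controls how many low-order bits of small inputs are kept.
FRAC_BITS = 62


def pre_process(values: Iterable[float]) -> Tuple[List[int], int]:
    """Pre-processing: align every value to the maximum exponent and
    convert it to an integer (finite inputs assumed)."""
    values = list(values)
    max_exp = max((math.frexp(v)[1] for v in values), default=0)
    shift = FRAC_BITS - max_exp
    # ldexp scales by an exact power of two; round() drops any bits that
    # fall below the fixed-point resolution set by the largest input.
    return [round(math.ldexp(v, shift)) for v in values], shift


def post_process(total: int, shift: int) -> float:
    """Post-processing (the inverse step): scale the integer sum back."""
    return float(Fraction(total) / Fraction(2) ** shift)


def order_independent_sum(values: Iterable[float]) -> float:
    integers, shift = pre_process(values)
    # Integer addition is exact and associative, so the order of the
    # inputs cannot change the result.
    return post_process(sum(integers), shift)
```

Plain floating point addition is order-dependent: `(1e16 + 1.0) - 1e16` evaluates to `0.0`, while `(1e16 - 1e16) + 1.0` evaluates to `1.0`. With the sketch above, every ordering of `[1e16, 1.0, -1e16]` yields `1.0`.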

As per claim 12, rejection of claim 7 is incorporated:
However, Kondo and Archer do not explicitly disclose in response to failing to receive the result of the reduction operation within a first threshold period after transmitting the processing result, retransmitting the processing result to the network device.
Asaad teaches in response to failing to receive the result of the reduction operation within a first threshold period after transmitting the processing result, retransmitting the processing result to the network device. ([Paragraph 1456], If no acks come back for a predetermined timeout period, packets from the retransmission FIFO are retransmitted in the same order to the next node.)
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the invention, to combine the teachings of Kondo and Archer, wherein a set of processing results is received from computing devices, a reduction operation is performed, and the result of the reduction operation is transmitted, with the teachings of Asaad, wherein a processing result is retransmitted if a response is not received within a threshold period, because retransmitting when no response arrives within a set period of time would enhance Kondo and Archer by ensuring that the result(s) of the reduction operation can be collected/received without delay.

As per claim 14, rejection of claim 13 is incorporated:
However, Kondo and Archer do not explicitly disclose in response to failing to receive the response to the request within a second threshold period after transmitting the request, retransmitting the request to the network device.
Asaad teaches in response to failing to receive the response to the request within a second threshold period after transmitting the request, retransmitting the request to the network device. ([Paragraph 1456], If no acks come back for a predetermined timeout period, packets from the retransmission FIFO are retransmitted in the same order to the next node.)
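As an illustration only, the timeout-driven retransmission recited in claims 12 and 14 and mapped to Asaad [Paragraph 1456] can be sketched in Python as below: the sender waits at most a threshold period for a reply and retransmits if none arrives. The names `send_with_retransmission` and `LossyDevice`, the `max_attempts` cap, and the use of `queue.Queue` as the reply channel are assumptions for this sketch; Asaad's retransmission FIFO and acknowledgment hardware are not modeled.

```python
import queue
from typing import Any, Callable


def send_with_retransmission(
    payload: Any,
    send: Callable[[Any], None],
    replies: "queue.Queue[Any]",
    threshold_s: float,
    max_attempts: int = 5,
) -> Any:
    """Send payload; if no reply arrives within threshold_s, send it again."""
    for _ in range(max_attempts):
        send(payload)
        try:
            # Block for at most the threshold period waiting for a reply.
            return replies.get(timeout=threshold_s)
        except queue.Empty:
            continue  # threshold expired without a reply: retransmit
    raise TimeoutError(f"no reply after {max_attempts} attempts")


class LossyDevice:
    """Hypothetical stand-in for the network device: it silently drops the
    first `drop` transmissions and acknowledges every later one."""

    def __init__(self, drop: int) -> None:
        self.drop = drop
        self.received = 0
        self.replies: "queue.Queue[Any]" = queue.Queue()

    def send(self, payload: Any) -> None:
        self.received += 1
        if self.received > self.drop:
            self.replies.put(("ack", payload))
```

With `LossyDevice(drop=2)` and a 10 ms threshold, the first two transmissions time out and the third is acknowledged, so the call returns `("ack", payload)` after three sends. The same loop applies whether the payload is a processing result (claim 12) or a request (claim 14).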

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente, can be reached at 571-272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DONG U KIM/Primary Examiner, Art Unit 2196