DETAILED ACTION
Status of claims
This action is in response to the amendment filed on 5/2/2022 for application 16/002,636 filed on 6/7/2018. Claims 1, 3 – 8, and 15 – 26 are pending and have been examined.

Claim 15 is amended. 


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/7/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment
Applicant's remarks filed on 5/2/2021 have been fully considered but they are not persuasive.

Regarding the prior art rejection, applicant states that Das does not teach or suggest “that a processing element is in any way associated with a portion of a deep learning model”. Examiner respectfully disagrees. The primary reference, Copjak, discloses a method of processing a neural network in a distributed system in which each processing element handles a portion of the deep learning model. Das also discloses a similar setup for processing a deep neural network, as illustrated in at least fig. 14. Das further discloses using registers to manage a distributed processing system, including handles that reference processing elements stored in memory, with offsets, as known in the memory management field, that point to a specific location in the memory window and thus point to the portion of the deep learning model when combined with Copjak’s disclosure.
Applicant further states that the process handle of Das does not teach or suggest the first memory handle of Claim 1. Examiner respectfully disagrees. Das, column 16, lines 61 – 64, discloses selecting a process element using a process handle. In the context of this paragraph, process elements are stored in system memory; one skilled in the art would recognize that, under the broadest reasonable interpretation, the process handle of Das falls within the scope of a memory handle that points to a memory segment (window) of the referenced process, which is analogous to the memory handle of Claim 1.
Applicant further states that the offset of Das does not teach or suggest the first memory offset of Claim 1. Examiner respectfully disagrees. Offset is a well-known technical term in memory addressing models, especially when used with a handle. While the memory segment contains the data of a program, the offset refers to a location within the segment. Knowledge of memory offsets is readily available, for example through an online search. Lecture notes from Philadelphia University in Jordan are included with this rejection as an evidentiary reference (Lecture Notes, Microprocessors, Philadelphia University in Jordan, 2010-2011).
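For illustration only, the general relationship between a memory segment and an offset described above can be sketched as follows. The names Segment and resolve, and the example addresses, are hypothetical and are not taken from Das, Copjak, or the cited lecture notes; the sketch assumes a flat address space in which a segment is identified by a base address and a size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A contiguous memory region (hypothetical stand-in for a 'window')."""
    base: int  # starting address of the segment
    size: int  # number of addressable bytes in the segment


def resolve(segment: Segment, offset: int) -> int:
    """Return the absolute address of `offset` within `segment`."""
    if not 0 <= offset < segment.size:
        raise ValueError("offset falls outside the segment")
    return segment.base + offset


# Example values are assumptions chosen only to show the arithmetic.
window = Segment(base=0x4000, size=0x1000)
print(hex(resolve(window, 0x0020)))  # 0x4020
```

In this sketch the segment identifies where the data of a program resides, and the offset selects one location inside it, which is the sense in which the term is used in this action.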
Applicant further states that Das does not teach or suggest transferring aspects of a deep learning model between nodes. Examiner respectfully disagrees. Das discloses various interconnections between processing elements, including a message passing interface and remote memory access (Das, col. 12, ln. 28 – 43), and Copjak discloses transferring portions of a deep model between processing elements (Copjak, sec. IV B, ln. 11 – 13). Copjak combined with Das discloses that the deep learning model is transferred between processing nodes using a message passing interface and remote memory access. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3 – 5, 7 – 8, 15 – 20 and 23 – 25 are rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation.

Regarding Claim 1, Copjak teaches a computer-implemented method (Copjak, intro. ln. 5, where operating computer systems) comprising:
generating a model mapping table (MMT) storing information regarding respective portions of a deep learning model distributed amongst a plurality of interconnected host nodes (Copjak, fig. 11 & sec. IV. C, para. 2, where layer [portion of deep learning model], computing_node [interconnected host node] in the file format), wherein the MMT comprises a first entry associated with a first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where LAYER [portion of deep learning model])
wherein respective host nodes comprise at least one central processing unit (CPU), at least one CPU memory, at least one graphics processing unit (GPU), and at least one GPU memory (Copjak, sec. V, para. 1, each node has CPU, GPU; sec. II, para. 8, ln. 14, CPU and GPU memory), wherein the deep learning model comprises an amount of data larger than an amount of memory in any respective host node of the plurality of interconnected host nodes (Copjak, Abs. 8, the system is intended to solve large neural networks);
and training the deep learning model by training the respective portions of the deep learning model on the plurality of interconnected host nodes (Copjak, sec. V. para. 1, where each GPGPU solves a partial computing problem), the training comprising:
receiving a request from a requesting GPU for a first portion of the deep learning model, wherein the requesting GPU is associated with a requesting GPU memory and a requesting host node (Copjak, sec. IV B, ln. 5 – 8, where the GPGPU on the node responds to the master);
identifying the first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on … the MMT (Copjak, sec. IV B, ln. 11 – 13, where the master identifies and sends the relevant distributed data [identify the data at first host]);
transferring the first portion of the deep learning model from the first host node to the requesting host node (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]);
providing a first copy of the first portion of the deep learning model from the requesting host node to the requesting GPU memory (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory);
performing processing, by the requesting GPU, on the first copy of the first portion of the deep learning model stored in the requesting GPU memory (Copjak, sec. IV B, ln. 20 – 21, where, after receiving data, the node processes it; sec. III, para. 2, ln. 54, where data is processed in GPU memory);
synchronizing the first copy of the first portion of the deep learning model with the first portion of the deep learning model in response to performing processing (Copjak, sec. II, para. 5, ln. 29 – 39, where the parent-child processing block guarantees synchronization of parallel computing);
and updating the MMT based on synchronizing the first copy of the first portion of the deep learning model (Copjak, sec. IV B, ln. 23 – 24, where the processed node sends a message to the master and waits for the next processing; sec. IV, ln. 15 – 18, where the system keeps the file updated with node/neuron information).
Copjak does not explicitly disclose: 
wherein the first entry comprises a first memory handle and a first memory offset wherein the first memory handle indicates a location of a window associated with the first portion of the deep learning model in a first host node and wherein the first memory offset indicates a location of the first portion of the deep learning model in the window of the first host node
identifying the first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on the first memory handle and the first memory offset from the first entry of the MMT
transferring the first portion of the deep learning model from the first host node to the requesting host node using a message passing interface (MPI) remote memory access (RMA) protocol
Das discloses: 
wherein the first entry comprises a first memory handle and a first memory offset wherein the first memory handle indicates a location of a window associated with the first portion of the deep learning model in a first host node and wherein the first memory offset indicates a location of the first portion of the deep learning model in the window of the first host node (Das, col. 16, ln. 61 – col 17, ln. 10, where for the shared programming model … processing elements using a process handle [first memory handle], in one embodiment, process elements are stored in system memory and are addressable using the effective address … lower 16 bits of the process handle may be the offset [first memory offset] of the process element within the process element link [location of the first portion of the deep learning model in the window of the first host node] … application effective address space [location of a window] within system memory stores process elements [first portion of the deep learning model in a first host node]; tbl 4 is an embodiment of process element [first portion of the deep learning model] register information which include the effective address of the process element);
identifying the first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on the first memory handle and the first memory offset from the first entry of the MMT (Das, tbl 4, where using the process element information to identify the effective address of the processing element; Col. 16, ln. 61 – col 17, ln. 10, where the process handle [first memory handle] of the effective address include offset [first memory offset]);
transferring the first portion of the deep learning model from the first host node to the requesting host node using a message passing interface (MPI) remote memory access (RMA) protocol (Das, col 12, ln. 37 – 40, where communicate with remote components via interconnect fabric [MPI] and share memory [RMA])
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to ensure bandwidth between components (Das, col. 12, ln. 40 – 42) and to provide paths for the functional units (Das, col. 11, ln. 9 – 13).
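For illustration only, the cited passage of Das stating that the lower 16 bits of the process handle may be the offset can be sketched as a simple bit-field split. The function name split_handle, the 32-bit example value, and the treatment of the remaining upper bits are assumptions made for this sketch and are not taken from Das.

```python
from typing import Tuple

OFFSET_BITS = 16                       # per the cited passage: lower 16 bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # 0xFFFF


def split_handle(handle: int) -> Tuple[int, int]:
    """Split a handle into (upper_bits, offset).

    The meaning of the upper bits is not specified here; only the
    lower 16 bits are treated as the offset.
    """
    return handle >> OFFSET_BITS, handle & OFFSET_MASK


# Hypothetical handle value chosen only to show the arithmetic.
upper_bits, offset = split_handle(0x0003_0040)
print(upper_bits, offset)  # 3 64
```

The sketch shows only that a single handle value can carry an offset in a fixed bit range; it does not represent the register layout of any particular embodiment.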

Regarding Claim 3, depending on Claim 1, Copjak in view of Das teaches the computer-implemented method of Claim 1. Copjak in view of Das further teaches:
wherein the first entry further comprises a first pointer, a first layer identifier (Copjak, sec. IV. C, para. 2, where layer [portion of deep learning model & layer identifier], computing_node [first pointer] in the file format) and a first process rank (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 4, depending on Claim 3, Copjak in view of Das teaches the computer-implemented method of Claim 3. Copjak in view of Das further teaches:
wherein the first pointer points to a location of the first portion of the deep learning model in the plurality of interconnected host nodes (Copjak, sec. IV. C, para. 2, where computing_node [first pointer, location] in the file format); 
wherein the first layer identifier indicates a layer of the deep learning model associated with the first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where layer in the file format);
wherein the first process rank comprises a rank of a process associated with the requesting GPU (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 5, depending on Claim 4, Copjak in view of Das teaches the computer-implemented method of Claim 4. Copjak in view of Das further teaches:
wherein the first entry is further associated with metadata indicating a data type of the first portion of the deep learning model (Das, col. 17, ln. 42 – 45, where work context [data type] stores data from WD [a data element in first entry]).

Regarding Claim 7, depending on Claim 1, Copjak in view of Das teaches the computer-implemented method of Claim 1. Copjak further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing forward propagation for a portion of a layer of the deep learning model (Copjak, Fig. 10, where, in computer processor Node 1, model data [first portion of the deep learning model] is processed from the input layer to the first hidden layer in a forward propagation manner).

Regarding Claim 8, depending on Claim 1, Copjak in view of Das teaches the computer-implemented method of Claim 1. Copjak in view of Das further discloses:
wherein the first portion of the deep learning model comprises a portion of a first operation for training the deep learning model, wherein the first operation is associated with a first amount of data that is larger than a memory capacity of the first host node (Das, col. 42, ln. 12 – 16, where the parallel processor enables training with significantly larger datasets than previously feasible).

Regarding Claim 15, Copjak teaches:
A computer-implemented method (Copjak, sec. II, para. 5, ln. 50 – 52, where the system uses the CUDA and Thrust programming languages, the code of which is stored in a computer-readable storage medium in the server) comprising:
generating a model mapping table (MMT) storing … respective portions of a deep learning model distributed amongst the plurality of interconnected host nodes (Copjak, fig. 11 & sec. IV. C, para. 2, where layer [portion of deep learning model], computing_node [interconnected host node] in the file format), wherein respective host nodes comprise at least one central processing unit (CPU), at least one CPU memory, at least one graphics processing unit (GPU), and at least one GPU memory (Copjak, sec. V, para. 1, each node has CPU, GPU; sec. II, para. 8, ln. 14, CPU and GPU memory), wherein the deep learning model comprises an amount of data larger than an amount of memory in any respective host node of the plurality of interconnected host nodes (Copjak, Abs. 8, the system is intended to solve large neural networks)
and outputting a trained deep learning model by distributing training amongst respective portions of the deep learning model on the plurality of interconnected host nodes (Copjak, sec. V. para. 1, where each GPGPU solves a partial computing problem), wherein training respective portions of the deep learning model comprises transferring respective portions of the deep learning model between respective host nodes of the plurality of interconnected host nodes (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]) and providing respective copies of the respective portions of the deep learning model to respective GPU memories for processing by respective GPUs (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory).
Copjak does not explicitly teach:
initializing a large model pooler (LMP) by registering with a handle of a window region of each host node of a plurality of interconnected host nodes
storing respective memory handle and respective memory offset for respective portions of a deep learning model distributed amongst the plurality of interconnected host nodes
distributing training amongst respective portions of the deep learning model using the LMP
using a message passing interface (MPI) remote memory access (RMA) protocol
Das explicitly teaches:
initializing a large model pooler (LMP) by registering with a handle of a window region of each host node of a plurality of interconnected host nodes (Das, col. 16, ln. 61 – col. 17, ln. 11, where for the shared programming model, … the process handle … registering its context … application effective address space within system memory [window region of each host node] stores process elements; fig. 4D & col. 17, ln. 38 – 45, where accelerator integration slice 490 [large model pooler] stores work descriptors and manages memory access)
storing respective memory handle and respective memory offset for respective portions of a deep learning model distributed amongst the plurality of interconnected host nodes (Das, col. 16, ln. 61 – col 17, ln. 10, where for the shared programming model … processing elements using a process handle [memory handle], in one embodiment, process elements are stored in system memory and are addressable using the effective address … lower 16 bits of the process handle may be the offset [memory offset] of the process element within the process element link [location of the portion of the deep learning model amongst the plurality of interconnected host node]; tbl 4 is an embodiment of process element [portion of the deep learning model] register information which include the effective address of the process element)
distributing training amongst respective portions of the deep learning model using the LMP (Das, fig. 4 & col. 17, ln. 38 – 42, where accelerator integration slice 490 [LMP] manages the next work [respective portion of the deep learning model] to be done by one of the graphics processing engines)
using a message passing interface (MPI) remote memory access (RMA) protocol  (Das, col 12, ln. 37 – 40, where communicate with remote components via interconnect fabric [MPI] and share memory [RMA])
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to ensure bandwidth between components (Das, col. 12, ln. 40 – 42).

Regarding Claim 16, depending on Claim 15, Copjak in view of Das teaches the computer-implemented method of Claim 15. Copjak in view of Das further teaches:
receiving a request from a requesting GPU for a first portion of the deep learning model, wherein the requesting GPU is associated with a requesting GPU memory and a requesting host node (Copjak, sec. IV B, ln. 5 – 8, where the GPGPU on the node responds to the master);
identifying a first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on information in the MMT (Copjak, sec. IV B, ln. 11 – 13, where the master identifies and sends the relevant distributed data [identify the data at first host]);
transferring the first portion of the deep learning model from the first host node to the requesting host node (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]);
providing a first copy of the first portion of the deep learning model from the requesting host node to the requesting GPU memory (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory);
performing processing, by the requesting GPU, on the first copy of the first portion of the deep learning model stored in the requesting GPU memory (Copjak, sec. IV B, ln. 20 – 21, where, after receiving data, the node processes it; sec. III, para. 2, ln. 54, where data is processed in GPU memory);
synchronizing the first copy of the first portion of the deep learning model with the first portion of the deep learning model in response to performing processing (Copjak, sec. II, para. 5, ln. 29 – 39, where the parent-child processing block guarantees synchronization of parallel computing);
and updating the MMT based on synchronizing the first copy of the first portion of the deep learning model (Copjak, sec. IV B, ln. 23 – 24, where the processed node sends a message to the master and waits for the next processing; sec. IV, ln. 15 – 18, where the system keeps the file updated with node/neuron information).

Regarding Claim 17, depending on Claim 16, Copjak in view of Das teaches the computer-implemented method of Claim 16. Copjak in view of Das further teaches:
wherein the MMT comprises a first entry associated with the first portion of the deep learning model, wherein the first entry comprises a first pointer, a first layer identifier (Copjak, sec. IV. C, para. 2, where layer [portion of deep learning model & layer identifier], computing_node [first pointer] in the file format), a first memory handle, a first memory offset (Das, col. 16, ln. 61 – col. 17, ln. 10, where for the shared programming model … processing elements using a process handle [first memory handle], in one embodiment, process elements are stored in system memory and are addressable using the effective address … lower 16 bits of the process handle may be the offset [first memory offset] of the process element within the process element link) and a first process rank (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).
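For illustration only, an entry having the five fields recited above (pointer, layer identifier, memory handle, memory offset, process rank) and a lookup over a table of such entries can be sketched as follows. The names MMTEntry and locate, the use of a host name string as the pointer, and all example values are hypothetical; the sketch is not drawn from the application, Copjak, or Das.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MMTEntry:
    pointer: str         # hypothetical: name of the host node holding the portion
    layer_id: int        # layer of the model the portion belongs to
    memory_handle: int   # identifies the window on the host node
    memory_offset: int   # position of the portion inside that window
    process_rank: int    # rank of the process associated with the requester


def locate(mmt: Dict[int, MMTEntry], layer_id: int) -> Tuple[str, int, int]:
    """Return (host, handle, offset) for the portion stored under `layer_id`."""
    entry = mmt[layer_id]
    return entry.pointer, entry.memory_handle, entry.memory_offset


# Example table; every value here is an assumption for illustration.
mmt = {
    0: MMTEntry("node-a", 0, memory_handle=7, memory_offset=0, process_rank=0),
    1: MMTEntry("node-b", 1, memory_handle=9, memory_offset=4096, process_rank=1),
}
print(locate(mmt, 1))  # ('node-b', 9, 4096)
```

The lookup returns the host, handle, and offset together, which is the information a requester would need before any transfer of the corresponding portion could take place.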

Regarding Claim 18, depending on Claim 17, Copjak in view of Das teaches the computer-implemented method of Claim 17. Copjak in view of Das further teaches:
wherein the first pointer points to a location of the first portion of the deep learning model in the plurality of interconnected host nodes (Copjak, sec. IV. C, para. 2, where computing_node [first pointer, location] in the file format); 
wherein the first layer identifier indicates a layer of the deep learning model associated with the first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where layer in the file format);
wherein the first memory handle indicates a location of a window associated with the first portion of the deep learning model in the first host node; wherein the first memory offset indicates a location of the first portion of the deep learning model in the window of the first host node (Das, col. 16, ln. 61 – col 17, ln. 10, where for the shared programming model … processing elements using a process handle [first memory handle], in one embodiment, process elements are stored in system memory and are addressable using the effective address … lower 16 bits of the process handle may be the offset [first memory offset] of the process element within the process element link [location of the first portion of the deep learning model in the window of the first host node] … application effective address space [location of a window] within system memory stores process elements [first portion of the deep learning model in a first host node]);
and wherein the first process rank comprises a rank of a process associated with the requesting GPU (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 19, depending on Claim 18, Copjak in view of Das teaches the computer-implemented method of Claim 18. Copjak in view of Das further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing forward propagation for a portion of a layer of the deep learning model (Copjak, Fig. 10, where forward propagation of deep learning model)

Regarding Claim 20, depending on Claim 18, Copjak in view of Das teaches the computer-implemented method of Claim 18. Copjak in view of Das further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing backpropagation for a portion of a layer of the deep learning model (Das, Fig. 12, where back propagation 1205 of neural network)

Regarding Claim 23, Copjak teaches a computer-implemented method (Copjak, intro. ln. 5, where operating computer systems) comprising:
training the deep learning model by training the respective portions of the deep learning model on a plurality of interconnected host nodes (Copjak, sec. V. para. 1, where each GPGPU solves a partial computing problem), the training comprising:
receiving a request from a requesting GPU for a first portion of the deep learning model, wherein the requesting GPU is associated with a requesting GPU memory and a requesting host node (Copjak, sec. IV B, ln. 5 – 8, where the GPGPU on the node responds to the master);
identifying a first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model … from a first entry of a Model Mapping Table (MMT) (Copjak, sec. IV B, ln. 11 – 13, where the master identifies and sends the relevant distributed data [identify the data at first host]; fig. 11 & sec. IV. C, para. 2, where an XML-format table [Model Mapping Table] of nodes);
transferring a first portion of the deep learning model from the first host node to the requesting host node (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]);
providing a first copy of the first portion of the deep learning model from the requesting host node to the requesting GPU memory (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory);
performing processing, by the requesting GPU, on the first copy of the first portion of the deep learning model stored in the requesting GPU memory (Copjak, sec. IV B, ln. 20 – 21, where, after receiving data, the node processes it; sec. III, para. 2, ln. 54, where data is processed in GPU memory);
synchronizing the first copy of the first portion of the deep learning model with the first portion of the deep learning model in response to performing processing (Copjak, sec. II, para. 5, ln. 29 – 39, where the parent-child processing block guarantees synchronization of parallel computing);
and updating the MMT based on synchronizing the first copy of the first portion of the deep learning model (Copjak, sec. IV B, ln. 23 – 24, where the processed node sends a message to the master and waits for the next processing; sec. IV, ln. 15 – 18, where the system keeps the file updated with node/neuron information).
Copjak does not explicitly disclose: 
identifying the first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on the first memory handle and the first memory offset from the first entry of a Model Mapping Table (MMT)
transferring the first portion of the deep learning model from the first host node to the requesting host node using a message passing interface (MPI) remote memory access (RMA) protocol
Das discloses: 
identifying the first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on the first memory handle and the first memory offset from the first entry of a Model Mapping Table (MMT) (Das, tbl 4, where using the process element information to identify the effective address of the processing element; Col. 16, ln. 61 – col 17, ln. 10, where the processing handle [first memory handle] of the effective address includes offset [first memory offset]);
transferring the first portion of the deep learning model from the first host node to the requesting host node using a message passing interface (MPI) remote memory access (RMA) protocol (Das, col 12, ln. 37 – 40, where communicate with remote components via interconnect fabric [MPI] and share memory [RMA])
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to ensure bandwidth between components (Das, col. 12, ln. 40 – 42) and to provide paths for the functional units (Das, col. 11, ln. 9 – 13).

Regarding Claim 24, depending on Claim 23, Copjak in view of Das teaches the computer-implemented method of Claim 23. Copjak in view of Das further teaches:
wherein the first entry further comprises a first pointer, a first layer identifier (Copjak, sec. IV. C, para. 2, where layer [portion of deep learning model & layer identifier], computing_node [first pointer] in the file format) and a first process rank (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 25, depending on Claim 24, Copjak in view of Das teaches the computer-implemented method of Claim 24. Copjak in view of Das further teaches:
wherein the first pointer points to a location of the first portion of the deep learning model in the plurality of interconnected host nodes (Copjak, sec. IV. C, para. 2, where computing_node [first pointer, location] in the file format); 
wherein the first layer identifier indicates a layer of the deep learning model associated with the first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where layer in the file format);
wherein the first process rank comprises a rank of a process associated with the requesting GPU (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed in January 2018, further in view of Alwani et al., Fused-layer CNN accelerators, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.

Regarding Claim 6, which depends on Claim 5, Copjak in view of Das teaches the computer-implemented method of Claim 5. Copjak in view of Das does not explicitly disclose:
wherein the first entry is further associated with a flag indicating a first function that is associated with the first portion of the deep learning model, wherein the first function is selected from the group consisting of: a reuse data function, and a recompute function.
Alwani explicitly teaches:
wherein the first entry is further associated with a flag indicating a first function that is associated with the first portion of the deep learning model, wherein the first function is selected from the group consisting of: a reuse data function, and a recompute function (Alwani, sec. III A, para. 7, ln. 6 – 8, where there are two ways to handle overlapping values: recompute or reuse).
Copjak (in view of Das) and Alwani both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Copjak (in view of Das) of a distributed neural network processing system with Alwani’s teaching of handling the overlapping data to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification for the simplicity of the operation (Alwani, sec. III A, para. 7, ln. 10).
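As a rough illustration of the two options cited from Alwani, the sketch below attaches a flag to a computation and either reuses a stored intermediate value or recomputes it. It is not Alwani's implementation; the names OverlapFunction and overlap_value are hypothetical, and squaring a number merely stands in for an intermediate layer computation.

```python
from enum import Enum
from typing import Dict, List


class OverlapFunction(Enum):
    REUSE = "reuse"          # keep the intermediate value and read it back later
    RECOMPUTE = "recompute"  # discard it and compute it again when needed


def overlap_value(x: int, flag: OverlapFunction,
                  cache: Dict[int, int], counter: List[int]) -> int:
    """Return the value for an overlapping position, honoring the flag.

    counter[0] counts how many times the value is actually computed.
    """
    if flag is OverlapFunction.REUSE and x in cache:
        return cache[x]          # reuse path: no new computation
    counter[0] += 1
    value = x * x                # stand-in for an intermediate layer computation
    if flag is OverlapFunction.REUSE:
        cache[x] = value         # store so the next request can reuse it
    return value
```

With the REUSE flag, a second request for the same position is served from the cache; with RECOMPUTE, the value is computed on every request and nothing is stored, which trades extra arithmetic for less storage.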

Claims 21 – 22 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed Jan. 2018, further in view of Nguyen, MPI One-sided Communication, Intel, 2014.

Regarding Claim 21, which depends on Claim 15, Copjak in view of Das teaches the computer-implemented method of Claim 15. Copjak in view of Das does not explicitly disclose:
wherein the message passing interface (MPI) remote memory access (RMA) protocol is selected from a group consisting of: MPI-1, MPI-2, and MPI-3.
Nguyen explicitly discloses:
wherein the message passing interface (MPI) remote memory access (RMA) protocol is selected from a group consisting of: MPI-1, MPI-2, and MPI-3 (Nguyen, page 1, para. 4, where the MPI-2 standard introduces Remote Memory Access (RMA) communication, … MPI-3 … adding new functionality to improve the performance of MPI-2 RMA; i.e., one can choose from different MPI versions to implement MPI RMA).
Copjak (in view of Das) and Nguyen both teach message passing interface and remote memory access communication in parallel processing and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Copjak (in view of Das) of a distributed neural network processing system using the MPI RMA communication protocol with Nguyen’s teaching of the details of the MPI RMA communication protocol to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to overcome the drawback of communication delay (Nguyen, page 1, para. 3, ln. 4 and para. 4, ln. 1).

Regarding Claim 22, which depends on Claim 15, Copjak in view of Das teaches the computer-implemented method of Claim 15. Copjak in view of Das does not explicitly disclose:
wherein the message passing interface (MPI) remote memory access (RMA) protocol comprises a one-way messaging protocol configured to read and write to respective window regions of respective host nodes of the plurality of interconnected host nodes without involvement of other host nodes of the plurality of interconnected host nodes.
Nguyen explicitly discloses:
wherein the message passing interface (MPI) remote memory access (RMA) protocol comprises a one-way messaging protocol (Nguyen, page 1, para. 4, ln. 1 – 2, where one-sided communication [one-way messaging]) configured to read and write to respective window regions of respective host nodes (Nguyen, page 1, para. 5, ln. 1 – 3, where shared memory region, also called a window, synchronization [read and write] between host nodes) of the plurality of interconnected host nodes without involvement of other host nodes of the plurality of interconnected host nodes (Nguyen, page 1, para. 4, ln. 5 – 6, where the process can have direct access to the memory address space of a remote process, i.e., the hosts can directly access the memory of the target host without involving other processors).

Regarding Claim 26, which depends on Claim 23, Copjak in view of Das teaches the computer-implemented method of Claim 23. Copjak in view of Das does not explicitly disclose:
wherein the message passing interface (MPI) remote memory access (RMA) protocol comprises a one-way messaging protocol configured to read and write to respective window regions of respective host nodes of the plurality of interconnected host nodes without involvement of other host nodes of the plurality of interconnected host nodes.
Nguyen explicitly discloses:
wherein the message passing interface (MPI) remote memory access (RMA) protocol comprises a one-way messaging protocol (Nguyen, page 1, para. 4, ln. 1 – 2, where one-sided communication [one-way messaging]) configured to read and write to respective window regions of respective host nodes (Nguyen, page 1, para. 5, ln. 1 – 3, where shared memory region, also called a window, synchronization [read and write] between host nodes) of the plurality of interconnected host nodes without involvement of other host nodes of the plurality of interconnected host nodes (Nguyen, page 1, para. 4, ln. 5 – 6, where the process can have direct access to the memory address space of a remote process, i.e., the hosts can directly access the memory of the target host without involving other processors).
Copjak (in view of Das) and Nguyen both teach message passing interface and remote memory access communication in parallel processing and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Copjak (in view of Das) of a distributed neural network processing system using the MPI RMA communication protocol with Nguyen’s teaching of the details of the MPI RMA communication protocol to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification to overcome the drawback of communication delay (Nguyen, page 1, para. 3, ln. 4 and para. 4, ln. 1).
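The one-sided access pattern discussed for Claims 22 and 26 can be pictured with a toy, single-process model of a window. This is only an illustrative sketch and does not use MPI: the class ToyWindow and its put and get methods are hypothetical stand-ins for the idea behind operations such as MPI_Put and MPI_Get on a window, and no synchronization is modeled.

```python
class ToyWindow:
    """In-process stand-in for an RMA window: one exposed buffer per rank."""

    def __init__(self, num_ranks: int, size: int) -> None:
        # Each rank exposes a fixed-size region of memory.
        self.buffers = [bytearray(size) for _ in range(num_ranks)]

    def put(self, target_rank: int, offset: int, data: bytes) -> None:
        """Write data directly into the target rank's exposed region."""
        buf = self.buffers[target_rank]
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("write falls outside the window")
        buf[offset:offset + len(data)] = data

    def get(self, target_rank: int, offset: int, length: int) -> bytes:
        """Read bytes directly from the target rank's exposed region."""
        buf = self.buffers[target_rank]
        if offset < 0 or offset + length > len(buf):
            raise ValueError("read falls outside the window")
        return bytes(buf[offset:offset + length])
```

In this model, a put to rank 2 changes only rank 2's buffer; the regions of the other ranks are neither read nor written, which mirrors the "without involvement of other host nodes" limitation at a conceptual level.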

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached Monday – Friday, 9 am – 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.C./Examiner, Art Unit 2122

/BRIAN M SMITH/Primary Examiner, Art Unit 2122