DETAILED ACTION
Status of claims
This action is in response to application 16/002,636, filed on 6/7/2018. Claims 1 – 20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 6/7/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1, 7 and 9 – 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014.

Regarding Claim 1, Copjak teaches a computer-implemented method (Copjak, intro. ln. 5, where computer systems are operated) comprising:
generating a model mapping table (MMT) storing information regarding respective portions of a deep learning model distributed amongst a plurality of interconnected host nodes (Copjak, fig. 11 & sec. IV. C, para. 2, where the file format includes layer [portion of deep learning model] and computing_node [interconnected host node]), wherein respective host nodes comprise at least one central processing unit (CPU), at least one CPU memory, at least one graphics processing unit (GPU), and at least one GPU memory (Copjak, sec. V, para. 1, where each node has a CPU and a GPU; sec. II, para. 8, ln. 14, CPU and GPU memory), wherein the deep learning model comprises an amount of data larger than an amount of memory in any respective host node of the plurality of interconnected host nodes (Copjak, Abs. 8, where the system is intended to solve large neural networks);
and training the deep learning model by training the respective portions of the deep learning model on the plurality of interconnected host nodes (Copjak, sec. V. para. 1, where each GPGPU solves a partial computing problem), the training comprising:
receiving a request from a requesting GPU for a first portion of the deep learning model, wherein the requesting GPU is associated with a requesting GPU memory and a requesting host node (Copjak, sec. IV B, ln. 5 – 8, where the GPGPU on the node responds to the master);
identifying a first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on information in the MMT (Copjak, sec. IV B, ln. 11 – 13, where the master identifies and sends the relevant distributed data [identifying the data at the first host]);
transferring the first portion of the deep learning model from the first host node to the requesting host node (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]);
providing a first copy of the first portion of the deep learning model from the requesting host node to the requesting GPU memory (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory);
performing processing, by the requesting GPU, on the first copy of the first portion of the deep learning model stored in the requesting GPU memory (Copjak, sec. IV B, ln. 20 – 21, where, after receiving data, the node processes it; sec. III, para. 2, ln. 54, where data is processed in GPU memory);
synchronizing the first copy of the first portion of the deep learning model with the first portion of the deep learning model in response to performing processing (Copjak, sec. II, para. 5, ln. 29 – 39, where the parent-child processing block guarantees synchronization of parallel computing);
and updating the MMT based on synchronizing the first copy of the first portion of the deep learning model (Copjak, sec. IV B, ln. 23 – 24, where the node, after processing, sends a message to the master and waits for the next processing; sec. IV, ln. 15 – 18, where the system keeps the file updated with node/neuron information).
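
For illustration only, the sequence of training steps recited in Claim 1 can be sketched as follows. This is a minimal single-process sketch and is not drawn from the application or from Copjak. The names HostNode, ModelMappingTable, and fetch_and_process are hypothetical, plain dictionaries stand in for CPU and GPU memory, and the "processing" step is a placeholder rather than an actual training computation.

```python
from dataclasses import dataclass, field


@dataclass
class HostNode:
    """Hypothetical host node; dicts stand in for CPU and GPU memory."""
    name: str
    cpu_memory: dict = field(default_factory=dict)
    gpu_memory: dict = field(default_factory=dict)


class ModelMappingTable:
    """Hypothetical MMT: maps a model-portion id to the node storing it."""

    def __init__(self):
        self.location = {}  # portion id -> host node name
        self.version = {}   # portion id -> number of synchronizations

    def register(self, portion_id, node):
        self.location[portion_id] = node.name
        self.version[portion_id] = 0


def fetch_and_process(mmt, nodes, requester, portion_id):
    # Identify the host node storing the portion, using the MMT.
    owner = nodes[mmt.location[portion_id]]
    # Transfer the portion to the requesting host node, then copy it
    # into the requesting GPU memory.
    requester.cpu_memory[portion_id] = list(owner.cpu_memory[portion_id])
    requester.gpu_memory[portion_id] = list(requester.cpu_memory[portion_id])
    # Placeholder "processing" on the GPU copy (assumption: halve each value).
    requester.gpu_memory[portion_id] = [
        w * 0.5 for w in requester.gpu_memory[portion_id]
    ]
    # Synchronize the processed copy back to the stored portion.
    owner.cpu_memory[portion_id] = list(requester.gpu_memory[portion_id])
    # Update the MMT to reflect the synchronization.
    mmt.version[portion_id] += 1
    return owner.cpu_memory[portion_id]


nodes = {"node0": HostNode("node0"), "node1": HostNode("node1")}
nodes["node0"].cpu_memory["layer1"] = [2.0, 4.0]
mmt = ModelMappingTable()
mmt.register("layer1", nodes["node0"])
result = fetch_and_process(mmt, nodes, nodes["node1"], "layer1")
print(result)  # [1.0, 2.0]
```

In this sketch the request comes from node1, the MMT resolves the portion to node0, and the table's version counter records the synchronization.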

Regarding Claim 7, Copjak teaches the computer-implemented method of Claim 1. Copjak further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing forward propagation for a portion of a layer of the deep learning model (Copjak, Fig. 10, where, in computer processor Node 1, model data [first portion of the deep learning model] is processed from the input layer to the first hidden layer in a forward propagation manner).

Regarding Claim 9, Claim 9 is the system claim corresponding to Claim 1. Copjak further teaches: a processor and a computer-readable storage medium storing program instructions for deep learning model training which, when executed by the processor, are configured to cause the processor to perform a method (Copjak, sec. II, para. 5, ln. 50 – 52, where CUDA and Thrust are program instructions stored in a computer-readable storage medium which, when executed by a processor, cause the processor to perform the method).
Claim 9 is rejected for the same reasons as Claim 1.

Regarding Claim 10, depending on Claim 9, Copjak teaches the system of Claim 9. Copjak further teaches: wherein the program instructions were downloaded over a network from a remote data processing system (Copjak, sec. IV B, where the master [remote data processing system] sends a request [instruction] to the computing node; sec. IV B, through an interconnected Ethernet link).

Regarding Claim 11, depending on Claim 9, Copjak teaches the system of Claim 9. Copjak further teaches: wherein the program instructions are stored in a computer-readable storage medium in a server data processing system, and wherein the instructions were downloaded over a network to the system to provide deep learning model training functionality to the system (Copjak, sec. II, para. 5, ln. 50 – 52, where the system uses CUDA and Thrust code, which is stored in the computer-readable storage medium in the server; sec. IV B, where the master [remote data processing system] sends a request [instruction] to the computing node; sec. IV B, through an interconnected Ethernet link to perform partial learning functionality).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2 – 5, 8 and 13 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation.

Regarding Claim 2, depending on Claim 1, Copjak teaches the computer-implemented method of Claim 1. Copjak does not explicitly disclose:
wherein transferring the first portion of the deep learning model comprises using a message passing interface (MPI) remote memory access (RMA) protocol.
Das discloses:
wherein transferring the first portion of the deep learning model comprises using a message passing interface (MPI) remote memory access (RMA) protocol (Das, col. 12, ln. 37 – 40, where components communicate with remote components via an interconnect fabric [MPI] and share memory [RMA]).
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to ensure bandwidth between components (Das, col. 12, ln. 40 – 42).

Regarding Claim 3, depending on Claim 1, Copjak teaches the computer-implemented method of Claim 1. Copjak further teaches:
wherein the MMT comprises a first entry associated with the first portion of the deep learning model, wherein the first entry comprises a first pointer, a first layer identifier (Copjak, sec. IV. C, para. 2, where the file format includes layer [portion of deep learning model & layer identifier] and computing_node [first pointer]).
Copjak does not explicitly teach:
a first memory handle, a first memory offset, and a first process rank.
Das explicitly teaches:
a first memory handle, a first memory offset (Das, tbl. 4, AMR [memory handle & memory offset]), and a first process rank (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to provide paths for the functional units (Das, col. 11, ln. 9 – 13).

Regarding Claim 4, depending on Claim 3, Copjak in view of Das teaches the computer-implemented method of Claim 3. Copjak in view of Das further teaches:
wherein the first pointer points to a location of the first portion of the deep learning model in the plurality of interconnected host nodes (Copjak, sec. IV. C, para. 2, where the file format includes computing_node [first pointer, location]);
wherein the first layer identifier indicates a layer of the deep learning model associated with the first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where the file format includes layer);
wherein the first memory handle indicates a location of a window associated with the first portion of the deep learning model in the first host node (Das, tbl. 4, where the Authority Mask Register provides the memory location and protection information of the process);
wherein the first memory offset indicates a location of the first portion of the deep learning model in the window of the first host node (Das, tbl. 4, where the Authority Mask Register provides the memory location and protection information of the process);
and wherein the first process rank comprises a rank of a process associated with the requesting GPU (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).
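
For illustration only, the entry fields recited in Claims 3 and 4 can be sketched as a simple record. This sketch is not drawn from the application, Copjak, or Das. The names MMTEntry and read_portion are hypothetical, and a Python list stands in for the memory window located by the handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMTEntry:
    """Hypothetical MMT entry carrying the five recited fields."""
    pointer: str        # host node where the portion is located
    layer_id: int       # layer associated with the portion
    memory_handle: str  # identifies a window on that host node
    memory_offset: int  # position of the portion inside the window
    process_rank: int   # rank of the process tied to the requesting GPU


def read_portion(entry, windows, length):
    # Locate the window from (node, handle), then slice at the offset.
    window = windows[(entry.pointer, entry.memory_handle)]
    return window[entry.memory_offset:entry.memory_offset + length]


# A list stands in for a memory window on node0 (assumption).
windows = {("node0", "win_a"): [0.1, 0.2, 0.3, 0.4, 0.5]}
entry = MMTEntry(pointer="node0", layer_id=2, memory_handle="win_a",
                 memory_offset=3, process_rank=1)
portion = read_portion(entry, windows, 2)
print(portion)  # [0.4, 0.5]
```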

Regarding Claim 5, depending on Claim 4, Copjak in view of Das teaches the computer-implemented method of Claim 4. Copjak in view of Das further teaches:
wherein the first entry is further associated with metadata indicating a data type of the first portion of the deep learning model (Das, col. 17, ln. 42 – 45, where the work context [data type] stores data from the WD [a data element in the first entry]).

Regarding Claim 8, depending on Claim 1, Copjak teaches the computer-implemented method of Claim 1. Copjak does not explicitly disclose:
wherein the first portion of the deep learning model comprises a portion of a first operation for training the deep learning model, wherein the first operation is associated with a first amount of data that is larger than a memory capacity of the first host node.
Das explicitly discloses:
wherein the first portion of the deep learning model comprises a portion of a first operation for training the deep learning model, wherein the first operation is associated with a first amount of data that is larger than a memory capacity of the first host node (Das, col. 42, ln. 12 – 16, where the parallel processor enables training with significantly larger datasets than previously feasible).

Regarding Claim 13, Claim 13 is the system claim corresponding to Claim 2. Claim 13 is rejected for the same reasons as Claim 2.

Regarding Claim 14, Claim 14 is the system claim corresponding to Claim 3. Claim 14 is rejected for the same reasons as Claim 3.

Regarding Claim 15, Copjak teaches:
A computer program product comprising a computer readable storage medium, wherein the computer readable storage medium does not comprise a transitory signal per se, wherein the computer readable storage medium stores instructions executable by a processor to cause the processor to perform a method (Copjak, sec. II, para. 5, ln. 50 – 52, where the system uses CUDA and Thrust code, which is stored in the computer-readable storage medium in the server) comprising:
generating a model mapping table (MMT) storing information regarding respective portions of a deep learning model distributed amongst a plurality of interconnected host nodes (Copjak, fig. 11 & sec. IV. C, para. 2, where the file format includes layer [portion of deep learning model] and computing_node [interconnected host node]), wherein respective host nodes comprise at least one central processing unit (CPU), at least one CPU memory, at least one graphics processing unit (GPU), and at least one GPU memory (Copjak, sec. V, para. 1, where each node has a CPU and a GPU; sec. II, para. 8, ln. 14, CPU and GPU memory), wherein the deep learning model comprises an amount of data larger than an amount of memory in any respective host node of the plurality of interconnected host nodes (Copjak, Abs. 8, where the system is intended to solve large neural networks);
and outputting a trained deep learning model by training the respective portions of the deep learning model on the plurality of interconnected host nodes (Copjak, sec. V. para. 1, where each GPGPU solves a partial computing problem), wherein training respective portions of the deep learning model comprises transferring respective portions of the deep learning model between respective host nodes of the plurality of interconnected host nodes (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]) and providing respective copies of the respective portions of the deep learning model to respective GPU memories for processing by respective GPUs (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory).
Copjak does not explicitly teach:
using a message passing interface (MPI) remote memory access (RMA) protocol.
Das explicitly teaches:
using a message passing interface (MPI) remote memory access (RMA) protocol (Das, col. 12, ln. 37 – 40, where components communicate with remote components via an interconnect fabric [MPI] and share memory [RMA]).
Copjak and Das both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Das’s teaching of an optimized computer system for parallel processing to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to ensure bandwidth between components (Das, col. 12, ln. 40 – 42).

Regarding Claim 16, depending on Claim 15, Copjak in view of Das teaches the computer program product of Claim 15. Copjak in view of Das further teaches:
receiving a request from a requesting GPU for a first portion of the deep learning model, wherein the requesting GPU is associated with a requesting GPU memory and a requesting host node (Copjak, sec. IV B, ln. 5 – 8, where the GPGPU on the node responds to the master);
identifying a first host node of the plurality of interconnected host nodes storing the first portion of the deep learning model based on information in the MMT (Copjak, sec. IV B, ln. 11 – 13, where the master identifies and sends the relevant distributed data [identifying the data at the first host]);
transferring the first portion of the deep learning model from the first host node to the requesting host node (Copjak, sec. IV B, ln. 11 – 13, where the master sends the relevant data [first portion of model] to the node [requesting host node]);
providing a first copy of the first portion of the deep learning model from the requesting host node to the requesting GPU memory (Copjak, sec. III, para. 2, ln. 54, where data is processed in GPU memory);
performing processing, by the requesting GPU, on the first copy of the first portion of the deep learning model stored in the requesting GPU memory (Copjak, sec. IV B, ln. 20 – 21, where, after receiving data, the node processes it; sec. III, para. 2, ln. 54, where data is processed in GPU memory);
synchronizing the first copy of the first portion of the deep learning model with the first portion of the deep learning model in response to performing processing (Copjak, sec. II, para. 5, ln. 29 – 39, where the parent-child processing block guarantees synchronization of parallel computing);
and updating the MMT based on synchronizing the first copy of the first portion of the deep learning model (Copjak, sec. IV B, ln. 23 – 24, where the node, after processing, sends a message to the master and waits for the next processing; sec. IV, ln. 15 – 18, where the system keeps the file updated with node/neuron information).

Regarding Claim 17, depending on Claim 16, Copjak in view of Das teaches the computer program product of Claim 16. Copjak in view of Das further teaches:
wherein the MMT comprises a first entry associated with the first portion of the deep learning model, wherein the first entry comprises a first pointer, a first layer identifier (Copjak, sec. IV. C, para. 2, where the file format includes layer [portion of deep learning model & layer identifier] and computing_node [first pointer]), a first memory handle, a first memory offset (Das, tbl. 4, AMR [memory handle & memory offset]), and a first process rank (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 18, depending on Claim 17, Copjak in view of Das teaches the computer program product of Claim 17. Copjak in view of Das further teaches:
wherein the first pointer points to a location of the first portion of the deep learning model in the plurality of interconnected host nodes (Copjak, sec. IV. C, para. 2, where the file format includes computing_node [first pointer, location]);
wherein the first layer identifier indicates a layer of the deep learning model associated with the first portion of the deep learning model (Copjak, sec. IV. C, para. 2, where the file format includes layer);
wherein the first memory handle indicates a location of a window associated with the first portion of the deep learning model in the first host node (Das, tbl. 4, where the Authority Mask Register provides the memory location and protection information of the process);
wherein the first memory offset indicates a location of the first portion of the deep learning model in the window of the first host node (Das, tbl. 4, where the Authority Mask Register provides the memory location and protection information of the process);
and wherein the first process rank comprises a rank of a process associated with the requesting GPU (Das, col. 15, ln. 57, where processing priority [process rank] is associated with the VM where the GPU resides).

Regarding Claim 19, depending on Claim 18, Copjak in view of Das teaches the computer program product of Claim 18. Copjak in view of Das further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing forward propagation for a portion of a layer of the deep learning model (Copjak, Fig. 10, showing forward propagation of the deep learning model).

Regarding Claim 20, depending on Claim 18, Copjak in view of Das teaches the computer program product of Claim 18. Copjak in view of Das further teaches:
wherein performing processing on the first copy of the first portion of the deep learning model comprises performing backpropagation for a portion of a layer of the deep learning model (Das, Fig. 12, showing back propagation 1205 of the neural network).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed on Jan, 2018, further in view of Alwani et al., Fused-layer CNN accelerators, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.

Regarding Claim 6, depending on Claim 5, Copjak in view of Das teaches the computer-implemented method of Claim 5. Copjak in view of Das does not explicitly disclose:
wherein the first entry is further associated with a flag indicating a first function that is associated with the first portion of the deep learning model, wherein the first function is selected from the group consisting of: a reuse data function, and a recompute function.
Alwani explicitly teaches:
wherein the first entry is further associated with a flag indicating a first function that is associated with the first portion of the deep learning model, wherein the first function is selected from the group consisting of: a reuse data function, and a recompute function (Alwani, sec. III A, para. 7, ln. 6 – 8, where there are two ways to handle overlapping values: recompute or reuse).
Copjak (in view of Das) and Alwani both teach computer-implemented methods for parallel processing of large neural network systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Copjak (in view of Das) of a distributed neural network processing system with Alwani’s teaching of handling overlapping data to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification for simplicity of operation (Alwani, sec. III A, para. 7, ln. 10).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Copjak et al., Advanced architectures distributed systems for the implementation of neural networks, 12th IEEE International Conference on Emerging eLearning Technologies and Applications, 2014, in view of Dirac et al., US20150379424, Machine learning service.

Regarding Claim 12, depending on Claim 11, Copjak teaches the system of Claim 11. Copjak does not explicitly disclose:
wherein the program instructions are configured to cause the processor to perform a method further comprising: metering use of the deep learning model training functionality in the system; and generating an invoice in response to metering use of the deep learning model training functionality.
Dirac explicitly discloses:
wherein the program instructions are configured to cause the processor to perform a method further comprising: metering use of the deep learning model training functionality in the system; and generating an invoice in response to metering use of the deep learning model training functionality (Dirac, para. 0077, where a monitoring agent collects metrics from the resources used; para. 0025, generating bills).
Copjak and Dirac both teach computer-implemented methods for machine learning systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Copjak’s teaching of a distributed neural network processing system with Dirac’s teaching of machine learning as a service to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification to lower the barrier to utilizing machine learning in business (Dirac, para. 0002, ln. 3 – 7).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571) 272-9354. The examiner can normally be reached on Monday – Friday, 9 am – 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.C./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122