DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 7-8 and 13-20 have been amended by Applicant. No claims have been added or cancelled. Claims 1-20 are currently pending.

Response to Arguments 
Claim Objections
The objection to claim 17 has been withdrawn in view of Applicant’s amendment to said claim.
Claim Rejections under 35 U.S.C. 103
The rejection of claims 1-4, 9-12, and 17 under 35 U.S.C. 103 has been maintained herein.
The rejection of claims 7, 8, 13, 14, 15, and 16 has been withdrawn in view of Applicant’s amendment to said claims. However, upon further consideration, new grounds of rejection have been made under 35 U.S.C. 103.
The rejection of claim 18 under 35 U.S.C. 103 has been maintained.
The rejection of claims 19 and 20 has been withdrawn. Claims 19 and 20 (as amended) have been objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Applicant's arguments filed 07/29/2022 have been fully considered but they are not persuasive. 
Applicants argue that Gupta does not teach the limitation “merging, by a processor, a first operation group in a first neural network and second operation group in a second neural network, including identical operations as a shared group.” In support, Applicants contend that Gupta does not teach clusters of jobs and that the term “cluster” appears to be used in Gupta in conjunction with a type of operation performed on data points, i.e., a k-means clustering operation. Applicants further argue that no portion of Gupta, or any other document cited by the Office Action, appears to teach or suggest “identical operations” as claimed in claim 1.

Examiner respectfully disagrees with Applicant’s argument above as it is directly contradicted by the Gupta reference itself. Claim 1 recites, inter alia, merging by a processor, a first operation group in a first neural network and second operation group in a second neural network, including identical operations, as a shared operation group. 

As set forth in the Office Action, Gupta was cited as teaching the limitations of merging, by a processor, a first operation group in a first… and a second operation group in a second…, including identical operations as shared operations. To this effect, Gupta Col. 1, Lines 44-47 teaches a plurality of tasks that may include multiple clustering tasks in the plurality of parallel tasks. As noted in the Office Action, this citation and the paragraphs surrounding it teach grouping up operations that are common between the parallel tasks. Gupta Col. 1, Lines 52-55 was further cited as teaching a task that requires computation of the highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks. As noted in the Office Action, this citation was shown to teach grouping of tasks each having identical operations [i.e., shared tasks among clusters].

	Gupta Col. 8, Lines 62-67 and Col. 9, Lines 1-3 further teach “to help reduce processing, I/O and/or communication overhead, sharing process 10 may identify 200 one or more resource sharing opportunities across a plurality of parallel tasks. As noted below, the plurality of parallel tasks may include relational operations (e.g., join, union, grouping, difference, intersection, Cartesian product, division, etc.), and at least one non-relational operation (e.g., merge, clustering, classification, etc.), or a combination thereof.”
	
Gupta Col. 9, Lines 44-47 further teaches that clustering may be performed by resource sharing process 10 using a set of attributes where one or more clustering attributes are common across tasks.

Gupta Col. 9, Lines 48-55 further teaches “As noted above, the plurality of parallel tasks executed 202 by resource sharing process 10 may include zero or more relational operations and at least one non-relational operation, or a combination thereof. Examples of sharing across tasks may be additionally illustrated by FIGS. 5-8. For instance, FIG. 5 illustrates an example flowchart for merging of multiple k-means clustering jobs as a single job using shareClusteringM algorithm.” Thus, the merging of k-means clustering jobs is merely an additional example of merging multiple clustering jobs. This is in addition to the relational and non-relational operations performed on the tasks as noted in Gupta Col. 8, Lines 62-67 and Col. 9, Lines 1-3.

	Gupta Col. 3, Lines 19-58 teaches that the plurality of parallel tasks may include multiple clustering tasks…In the case of multiple clustering tasks, a task of the plurality of tasks may be designated as a primary task….In a second reduce function, a new cluster center may be calculated, and wherein values may be grouped and aggregated for at least one grouping task.

Applicants further argue that the Office Action appears to provide no explanation of why it would be obvious, based on the teachings of Yokono, for the first operation group and a second operation group allegedly taught by Gupta to come from a first neural network and second neural network, respectively.

	As set forth in the Office Action, Gupta does not distinctly disclose neural networks as required by the claim. However, Yokono was cited as teaching merging of two or more distinct neural networks. To this effect, as noted in the Office Action, Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed.

In response to Applicants’ argument above, it was noted in the Office Action that it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta with the merging of two or more distinct neural networks as taught by Yokono. The reason it would be obvious is that one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would save the neural network system from having to repeat any shared processes [ Yokono (Col. 3, Lines 4-10) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall and require less computational resources for training the machine learning networks.
	
	In view of all of the foregoing, the rejection of claim 1 under 35 U.S.C. 103 has been maintained.
	For at least the same reasons set forth for claim 1, the rejection of claims 2-4, 6, 9-12, 16, and 17 is also maintained under 35 U.S.C. 103.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 6, 9-12, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta (US 8984515 B2) in view of Yokono (US 5295227 A) and further in view of Lee (US 20200412799 A1).


In regard to claim 1, Gupta teaches the following:
A method of operating a … system, the method comprising: merging, by a processor, a first operation group in a first … and a second operation group in a second …, including identical operations, as a shared operation group;
[ (Col. 1, Lines 44-47) “The plurality of parallel tasks may include multiple clustering tasks or a clustering task and multiple grouping tasks, using one or more common data inputs in the plurality of parallel tasks”
	This citation and the paragraphs surrounding it teach grouping up operations that are common between the parallel tasks.]
[ (Col. 1, lines 52-55) “a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks,”
	This citation teaches the grouping of tasks each having identical operations (shared tasks among clusters) ]
 selecting, by the processor, a first hardware to execute the shared operation group, from among a plurality of hardware; 
 [ (Col. 2, Lines 17-20) “When executed by a processor, the instructions cause the processor to perform operations comprising identifying one or more resource sharing opportunities across a plurality of parallel tasks.” 
Examiner notes that the selection of a first hardware from the plurality of hardware is not distinctly disclosed by Gupta and is instead taught by Lee as seen below. The limitation is kept together for clarity. Gupta does, however, teach an embodiment containing a plurality of processors (Col. 6, Lines 37-41) but does not select from among them. This citation and the one below it from Gupta do teach the processor executing the shared operation group. ]
[ (Col. 2, Lines 45-50) “Sharing one or more resources may include at least one of sharing data reads, sharing computations, sharing intermediate results, sharing at least one of map and reduce computations, sharing data processing resources” ]
and executing the shared operation group by using the first hardware.
[ (Col. 2, Lines 22-24) “The plurality of parallel tasks involving zero or more relational operations and at least one non-relational operation are executed.”
	This citation teaches the processor identifying the shared opportunities in the task and then executing those tasks. ]
	Gupta does not distinctly disclose that the operation groups are specifically coming from respective neural networks. However, Yokono teaches that the operations are merged from neural networks as seen below:
	Neural network
[ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta with the merging of two or more distinct neural networks as taught by Yokono. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would save the neural network system from having to repeat any shared processes [ Yokono (Col. 3, Lines 4-10) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall and require less computational resources for training the machine learning networks.
	What is not distinctly disclosed by Gupta or Yokono and is instead taught by Lee is seen below:
	selecting, by the processor, a first hardware… from among a plurality of hardware;
[ (¶0085) “An application server may be selected based on the determined application requirements and based on the determined application server conditions (block 840). For example, server selection mechanism 520 may compare the determined application requirements with the determined application server conditions to identify a particular application server 150 that best matches the application requirements.” 
	This citation from Lee teaches a system where the application server(s) (equivalent to the plurality of hardware) can be selected based on the application requirements and application server conditions. The application server conditions include various processing resources such as those seen in (¶0086) which include examples like memory requirements or memory capacity among other possible limitations.]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations in a neural network as taught by Gupta/Yokono with the hardware resource management system as taught by Lee. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the resource management system of Lee with the teachings of Gupta/Yokono would provide a more efficient system for reducing redundant operations across a plurality of operation groups. This would facilitate the recognized benefit of an increased efficiency for the system overall from processing times being reduced from the removed redundant operations and the allocation of resources to better manage processing times.
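For illustration only, the combination discussed for claim 1 (identifying identical operation groups across two neural networks, treating them as one shared group, and selecting one hardware unit to execute it) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Gupta, Yokono, Lee, or the application: the names `OperationGroup`, `merge_identical_groups`, and `select_hardware`, the operation names, and the "most free capacity" selection criterion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationGroup:
    """Hypothetical operation group: an ordered tuple of operation names."""
    ops: tuple


def merge_identical_groups(first_network, second_network):
    """Return the groups that appear, with identical operations, in both networks."""
    second = set(second_network)  # frozen dataclasses hash and compare by value
    return [group for group in first_network if group in second]


def select_hardware(free_capacity):
    """Pick the hardware name with the most free capacity (illustrative criterion only)."""
    return max(free_capacity, key=free_capacity.get)


# Two toy "networks", each represented as a list of operation groups (assumed data).
net_a = [OperationGroup(("conv3x3", "relu")), OperationGroup(("fc", "softmax"))]
net_b = [OperationGroup(("conv3x3", "relu")), OperationGroup(("fc", "sigmoid"))]

shared = merge_identical_groups(net_a, net_b)
hardware = select_hardware({"cpu": 0.2, "gpu": 0.7, "npu": 0.5})
```

In this sketch, only the group whose operations match exactly in both networks is treated as shared, and it would then be executed once on the selected hardware rather than once per network.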



In regard to claim 2, the method of claim 1 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 1 above. Gupta continues to teach the following:
wherein the merging of the first operation group and the second operation group comprises: obtaining the first operation group and the second operation group from among a plurality of operations in the first … and a plurality of operations in the second … respectively;
[ (Col. 1, Lines 50-55) “The primary task selection criteria may include selection of a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks, or a combination thereof.”
	This citation from Gupta teaches selecting operations (tasks) from the operation groups (clusters) based on arbitrary selection data that the user is free to modify. ]
and assigning, to the first operation group and the second operation group, a shared identification (ID) indicating the shared operation group.
[ (Col. 1, Lines 61-64) “A map output of the merged task may include a combination of the cluster-id and a job-id as map-key, and a data value as the map-value may enable, at least in part, sharing of the map-output for multiple tasks.”
	This citation teaches the use of IDs for the cluster, which is equivalent to the operation group ID.]
Gupta does not distinctly disclose that the operation groups are specifically coming from respective neural networks. However, Yokono teaches that the operations are merged from neural networks as seen below:
Neural network
[ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]
	Please refer to the motivation to combine from claim 1.
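For illustration only, the composite map key described in the cited passage of Gupta (a combination of a cluster-id and a job-id as the map key, with a data value as the map value) can be sketched as follows. This is a minimal sketch under stated assumptions: it is not Gupta's shareClusteringM algorithm, the function names `map_phase` and `reduce_phase` and the sample records are hypothetical, and summation is used only as a stand-in reduce operation.

```python
from collections import defaultdict


def map_phase(records):
    """Emit ((cluster_id, job_id), value) pairs, one per input record."""
    for cluster_id, job_id, value in records:
        yield (cluster_id, job_id), value


def reduce_phase(pairs):
    """Group values by the composite key and sum them (illustrative reduce step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


# Assumed sample data: (cluster_id, job_id, value)
records = [
    ("c1", "jobA", 2.0),
    ("c1", "jobB", 3.0),
    ("c1", "jobA", 4.0),
    ("c2", "jobB", 1.0),
]
totals = reduce_phase(map_phase(records))
```

Because the job-id is part of the key, a single map output can serve several jobs while each job's values remain separable at the reduce step.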



In regard to claim 3, the method of claim 2 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 2 above. Gupta continues to teach the following:
wherein the obtaining of the first operation group and the second operation group comprises: obtaining the first operation group and the second operation group, based on at least one of, operation group IDs, or a sub neural network ID set for each of the first … and the second … 
[ (Col. 1, Lines 61-64) “A map output of the merged task may include a combination of the cluster-id and a job-id as map-key, and a data value as the map-value may enable, at least in part, sharing of the map-output for multiple tasks.”
	This citation teaches the cluster ID being used to enable the sharing of the map-output for multiple tasks. ]
	What is not distinctly disclosed by Gupta and is instead taught by Yokono is seen below:
Neural network
[ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]
	Please refer to the motivation to combine from claim 1.




In regard to claim 4, the method of claim 2 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 2 above. Yokono continues to teach the following:
wherein the obtaining of the first operation group and the second operation group comprises: analyzing a layer topology of at least one of the first neural network and the second neural network,
[ (Fig. 5) and (Col. 3, Lines 24-27) “or a learning parameter share their input layer or input and intermediate layers so that the optimum neural network can be obtained by providing the same pattern to the plurality of networks. FIG. 5 shows an example of a learning system configured by two networks each having a different structure.”
	This citation and corresponding figure from Yokono teach the merging of neural network layers. ]
	What is not distinctly disclosed by Yokono and is instead taught by Gupta is seen below:
and obtaining the first operation group and the second operation group, based on a result from the analyzing.
[ (Col. 1, Lines 44-47) “The plurality of parallel tasks may include multiple clustering tasks or a clustering task and multiple grouping tasks, using one or more common data inputs in the plurality of parallel tasks”
	This citation and the paragraphs surrounding it teach grouping up operations that are common between the parallel tasks.]
[ (Col. 1, lines 52-55) “a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks,”
	This citation teaches the grouping of tasks each having identical operations (shared tasks among clusters) ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta with the analysis of a layer’s topology as taught by Yokono to form a topological analysis that obtains operation groups based off of that analysis. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would simplify the learning process for neural networks [ Yokono (Col. 4, Lines 58-66) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall and require less computational resources for training the machine learning networks.




In regard to claim 6, the method of claim 2 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 2 above. Gupta continues to teach the following:
wherein the obtaining of the first operation group and the second operation group comprises: generating an operation history by tracing an operation process of at least one of the first … and the second … during runtime of at least one of the first … and the second … and obtaining the first operation group and the second operation group, based on the operation history.
[ (Col. 2, Lines 6-14) “A series of map and reduce functions may be called until a cluster termination condition for various tasks are obtained. In the merged task, the cluster-id of the primary task may be used as the map output key, and wherein a data value may be used as map output values. In a second reduce function, a new cluster center may be calculated, and wherein values may be grouped and aggregated for at least one grouping task. In multiple map-reduce calls of the clustering tasks, one or more grouping operations may be performed.”
	This citation from Gupta teaches the map-reduce calls which break down the parallel tasks and assign keys and values. As the keys represent the operations being performed (cluster IDs), the reduce function then merges the results based on key values. This is equivalent to the tracing of the claim where each operation is carried out (with their operation output being assigned as their map output) and then sorted and merged based off identical keys. ]
What is not distinctly disclosed by Gupta and is instead taught by Yokono is seen below:
Neural network
[ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]
	Please refer to the motivation to combine from claim 1.


In regard to claim 9, the method of claim 2 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 2 above. Gupta continues to teach the following:
wherein the assigning of the shared ID comprises: assigning, to the first operation group and the second operation group, a shared count indicating a number of operation groups having the shared ID.
[ (Col. 1, Lines 50-55) “The primary task selection criteria may include selection of a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks, or a combination thereof”
	This citation from Gupta mentions the hierarchy for selecting which task(s) are going to be completed in which order when there are a plurality of operation groups and shared tasks. Specifically, the citation mentions that one of the criteria could be which one of the tasks is shared between the fewest or maximum shared clusters. This implicitly shows that the cluster-IDs, Job-IDs and other data recorded as taught by Gupta would keep track of how many times each task is shared between the clusters and how many clusters share each task. ]
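For illustration only, a shared count of the kind recited in claim 9 (a number of operation groups having the same shared ID) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Gupta or the application: the group labels, the shared IDs "S1" and "S2", and the variable names are hypothetical.

```python
from collections import Counter

# Assumed mapping from each operation group to the shared ID it was assigned.
group_shared_ids = {
    "net1/group0": "S1",
    "net2/group3": "S1",
    "net3/group1": "S1",
    "net1/group2": "S2",
    "net2/group0": "S2",
}

# Shared count: how many operation groups carry each shared ID.
shared_counts = Counter(group_shared_ids.values())

# Attach the shared count to every group alongside its shared ID.
annotated = {
    group: (shared_id, shared_counts[shared_id])
    for group, shared_id in group_shared_ids.items()
}
```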



In regard to claim 10, the method of claim 9 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 9 above. Lee continues to teach the following:
wherein the selecting of the first hardware comprises: selecting the first hardware based on at least one of preferred information of applications, amounts of available processing resources, or the shared count;
[ (¶0085) “An application server may be selected based on the determined application requirements and based on the determined application server conditions (block 840). For example, server selection mechanism 520 may compare the determined application requirements with the determined application server conditions to identify a particular application server 150 that best matches the application requirements.” 
	This citation from Lee teaches a system where the application server(s) (equivalent to the plurality of hardware) can be selected based on the application requirements and application server conditions. The application server conditions include various processing resources such as those seen in (¶0086) which include examples like memory requirements or memory capacity among other possible limitations.]
and assigning a resource ID of the first hardware to the shared operation group.
[ (Fig. 11) and (¶0113) 
	The tailored session activation request explained in the above cited figure and paragraph is equivalent to the resource ID of the claim limitation. The session activation is tied to the application itself (which would be the operation group that is taught by Gupta above) and follows the application wherever it goes in the system. The session activation may be updated (tailored) whenever the application server associated with the application session is assigned. ]
	Please refer to the motivation to combine from claim 1.
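For illustration only, a selection that compares application requirements with server conditions, as described in the cited passage of Lee, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Lee's server selection mechanism 520: the function name `select_server`, the resource fields, the server names, and the interpretation of "best matches" as "meets every requirement with the least surplus" are all hypothetical.

```python
def select_server(requirements, servers):
    """Return the server that meets every requirement with the least surplus, or None."""
    candidates = []
    for name, conditions in servers.items():
        # A server qualifies only if it satisfies every stated requirement.
        if all(conditions.get(key, 0) >= need for key, need in requirements.items()):
            surplus = sum(conditions.get(key, 0) - need for key, need in requirements.items())
            candidates.append((surplus, name))
    return min(candidates)[1] if candidates else None


# Assumed requirement and condition data.
requirements = {"memory_gb": 8, "cpu_cores": 4}
servers = {
    "server_a": {"memory_gb": 4, "cpu_cores": 8},
    "server_b": {"memory_gb": 16, "cpu_cores": 8},
    "server_c": {"memory_gb": 8, "cpu_cores": 4},
}
chosen = select_server(requirements, servers)
```

A resource ID for the chosen server could then be recorded against the shared operation group; that step is omitted here because the cited passages do not specify its form.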



In regard to claim 11, the method of claim 10 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 10 above. Gupta continues to teach the following:
wherein the selecting of the first hardware comprises: setting a priority with respect to the shared operation group, such that the first hardware executes the shared operation group prior to other operations.
[ (Col. 1, Lines 50-55) “The primary task selection criteria may include selection of a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks, or a combination thereof”
	This citation teaches that the operation groups in Gupta will have a priority hierarchy in terms of which tasks are selected to be processed first. Although priority is not explicitly mentioned, it is implicit that the system will select tasks based on different criteria, such as how many clusters utilize the task and how long each task will take to process, and can prioritize the task by whichever criteria are selected by the user. ]
	



In regard to claim 12, the method of claim 1 is taught by Gupta/Yokono/Lee as seen in the rejection of claim 1 above. Gupta continues to teach the following:
wherein the merging of the first operation group and the second operation group
[ (Col. 1, Lines 44-47) “The plurality of parallel tasks may include multiple clustering tasks or a clustering task and multiple grouping tasks, using one or more common data inputs in the plurality of parallel tasks”
	This citation and the paragraphs surrounding it teach grouping up operations that are common between the parallel tasks which were previously taught in the combination of Gupta/Yokono of claim 1 to be operation groups within neural networks. ]
[ (Col. 1, lines 52-55) “The plurality of parallel tasks involving zero or more relational operations and at least one non-relational operation are executed. In response to executing the plurality of parallel tasks, one or more resources of the identified resource sharing opportunities are shared across tasks involving zero or more relational operations and at least one non-relational operation.” (emphasis added)
This citation teaches that the merge operation happens during runtime, since it clarifies that the merge operation occurs in response to the execution of the parallel tasks. ]
What is not distinctly disclosed by Gupta and is instead taught by Lee is seen below:
and the selecting of the first hardware are performed during runtime of at least one of the first neural network and the second neural network
[ (¶0085) “An application server may be selected based on the determined application requirements and based on the determined application server conditions (block 840). For example, server selection mechanism 520 may compare the determined application requirements with the determined application server conditions to identify a particular application server 150 that best matches the application requirements.” 
	This citation from Lee teaches a system where the application server(s) (equivalent to the plurality of hardware) can be selected based on the application requirements and application server conditions. The application server conditions include various processing resources such as those seen in (¶0086) which include examples like memory requirements or memory capacity among other possible limitations. ]
[ (Fig. 11) and (¶0110) “FIGS. 11 and 12 describe processes relating to an implementation where UE 110 and resource information server 140 negotiate with respect to application requirements associated with UE applications.”
	In general, this figure and the corresponding paragraph describing it detail the process of the system. Reference number 1110 in Fig. 11 shows that the application process is started before the application server is selected, which shows the process is occurring during runtime. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations in a neural network as taught by Gupta/Yokono with the runtime operation and hardware selection as taught by Lee. The reason it would be obvious is that one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the runtime operation and hardware selection system of Lee with the teachings of Gupta/Yokono would provide a more efficient system for reducing redundant operations across a plurality of operation groups. This would facilitate the recognized benefit of an increased efficiency for the system overall from processing times being reduced from the removed redundant operations and the allocation of resources to better manage processing times.



In regard to claim 17, Gupta teaches the following:
a plurality of processors;
[ (Col. 6, Lines 37-39) “The instruction sets and subroutines of resource sharing process 10, which may be stored on storage device 16 coupled to computer 12, may be executed by one or more processors” ]
and a … merge module configured to search for an operation group included in common in a plurality of … performing different tasks,
[ (Col. 3, Lines 5-10) “In another implementation, a computing system includes a processor and memory configured to perform operations comprising identifying one or more resource sharing opportunities across a plurality of parallel tasks. The plurality of parallel tasks includes zero or more relational operations and at least one non-relational operation.”
	This citation teaches the system being able to identify tasks and select from the plurality of tasks ones which are common (resource sharing opportunities) or otherwise. Examiner is interpreting the tasks of Gupta to be equivalent to the operation groups of the claim limitation being merged. Examiner also notes that the specification describes the merge module as capable of being software, hardware, or a combination of the two (¶0034). The specification further adds (¶0035) that general circuitry along with memory and specific computer instructions could be used to carry out said operations. In light of the specification, Examiner notes that the citations above that teach the processor and memory from Gupta, along with the operations being taught here, teach the neural network merge module. ]
set a shared identification (ID) 
[ (Col. 1, Lines 61-64) “A map output of the merged task may include a combination of the cluster-id and a job-id as map-key, and a data value as the map-value may enable, at least in part, sharing of the map-output for multiple tasks.” ]
[ (Col. 1, Lines 50-55) “The primary task selection criteria may include selection of a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks, or a combination thereof”
	This citation and the one above it teach the use of IDs attached to both the clusters and jobs. The cluster-ID would be equivalent to the shared ID as it denotes which tasks are shared within the cluster. Examiner notes that Applicant’s specification describes the shared ID as having the same function/use as the operation group ID, with the only difference being that the shared ID is directed towards the neural networks rather than the shared operations within a neural network, as the operation group ID is. ]
thereby setting the found operation group to be computed in one of the plurality of processors during execution of the plurality of ...
[ (Col. 2, Lines 22-24) “The plurality of parallel tasks involving zero or more relational operations and at least one non-relational operation are executed.”
	This citation and the one above it teach the processor identifying the shared opportunities in the task and then executing those tasks. ]
[ (Col. 2, Lines 45-50) “Sharing one or more resources may include at least one of sharing data reads, sharing computations, sharing intermediate results, sharing at least one of map and reduce computations, sharing data processing resources” ]
Gupta does not distinctly disclose that the operation groups are specifically coming from respective neural networks. However, Yokono teaches that the operations are merged from neural networks as seen below:
A neural network system, comprising:
Neural network
[ (Abstract) “The neural network learning system operates on, for example, a plurality of neural networks” ]
[ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta with the merging of two or more distinct neural networks as taught by Yokono. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would save the neural network system from having to repeat any shared processes [ Yokono (Col. 3, Lines 4-10) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall and require less computational resources for training the machine learning networks.
	What is not distinctly disclosed by Gupta or Yokono and is subsequently taught by Lee is seen below:
set a shared identification (ID) and a resource ID for an operation group that is found as a result of the searching,
[ (Fig. 11) and (¶0113) 
	The tailored session activation request explained in the above cited figure and paragraph is equivalent to the resource ID of the claim limitation. The session activation is tied to the application itself (which would be the operation group that is taught by Gupta above) and follows the application wherever it goes in the system. The session activation may be updated (tailored) whenever the application server associated with the application session is assigned. Examiner notes that Lee teaches assigning multiple pieces of data to the application which would be the Shared-ID as taught by Gupta above. ]
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations in a neural network as taught by Gupta/Yokono with the hardware resource management system as taught by Lee. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the resource management system of Lee with the teachings of Gupta/Yokono would provide a more efficient system for reducing redundant operations across a plurality of operation groups. This would facilitate the recognized benefit of an increased efficiency for the system overall from processing times being reduced from the removed redundant operations and the allocation of resources to better manage processing times.


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta/Yokono/Lee, and further in view of Chilimbi et al. (US 20150324690 A1).
In regards to claim 7, The method of claim 1, is taught by Gupta/Yokono/Lee as seen in the rejection for claim 1 above. Gupta continues to teach the following:
wherein the executing of the shared operation group comprises: storing, by the first hardware, an output of a last operation of the shared operation group in a shared buffer,
[ (Col. 1, Lines 55-59) “Sharing one or more resources may include at least one of sharing data reads, sharing computations, sharing intermediate results, sharing at least one of map and reduce computations, sharing data processing resources, sharing storage resources”
	This citation teaches the shared storage resources. ]
[ (Col. 7, Lines 5-10) “The instruction sets and subroutines of client applications 22, 24, 26, 28, which may be stored on storage devices 30, 32, 34, 36 coupled to client electronic devices 38, 40, 42, 44, may be executed by one or more processors (not shown) and one or more memory architectures” 
This citation teaches that the operations will be stored on the storage devices (including the output of the last operation). ]
[ (Col. 6, Lines 41-44) “Storage device 16 may include but is not limited to: a hard disk drive; a flash drive, a tape drive; an optical drive; a RAID array; a random access memory (RAM); and a read-only memory (ROM).”
	This citation teaches the variety of different storage solutions, one of which is RAM which is a functional equivalent to a data buffer. ]
	The combination does not distinctly disclose, but Chilimbi does teach, the shared buffer being shared between a second hardware and a third hardware, the second hardware being a hardware to which operations of the first neural network that follow the first operation group are assigned, the third hardware being a hardware to which operations of the second neural network that follow the second operation group are assigned, the second hardware and the third hardware being separate hardware with respect to each other (Chilimbi, Fig. 5 and Paragraph [0008] teaches model replicas (e.g., 502A-C) share a common set of parameters that is stored on a global parameter server 506 [reading on shared buffer], wherein model worker machines are arranged into model replicas such as 502A, 502B, and 502C; Chilimbi, Abstract teaches replicas of training machines communicate asynchronously with a global parameter server to provide updates to a shared model and return updated weight values.).
Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta as modified by the merging of two or more distinct neural networks as taught by Yokono, as further modified by the hardware resource management system, as taught by Lee, to further include the shared global parameter server, as taught by Chilimbi, in order to include computation and communication optimizations that improve system efficiency and scaling of large neural networks (Chilimbi, Abstract).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta/Yokono/Lee/Chilimbi as applied above, and further in view of CN (CN 108780406 B).

In regards to claim 8, The method of claim 7, is taught by Gupta/Yokono/Lee/Chilimbi as seen in the rejection for claim 7 above. CN continues to teach the following:
further comprising: accessing, by the second hardware, the shared buffer in response to a shared buffer ready signal; 
[ CN (Pg. 7, Paragraph 20) “As yet another example of using RDMA in the context of database management, a situation may arise where a replicated database server is newly designated as the primary replication server. When this occurs, the buffer pool of the newly designated master server is "cold" and may impair the performance of the database before filling the buffer pool with relevant data. However, when RDMA is available, RDMA memory accesses may be used to fill the buffer pool of the newly designated primary server by copying the existing contents of the buffer pool from the previous primary server and the now secondary replica server using RDMA memory transfers.”
	This citation from CN teaches the multiple computing systems with a shared buffer that allows any of the computing systems connected to access said buffer. Examiner is interpreting the designation of the server as the shared buffer ready signal. As seen in (Pg. 9, Section “N”), whenever the system designates one of the computing servers to be the master replication server, the action(s) consisting of accessing the shared buffer begin. ]
executing, by the second hardware, operations of the first neural network that follow the first operation group; 
accessing, by the third hardware, the shared buffer in response to a shared buffer ready signal; and [ CN (Pg. 7, Paragraph 20) “As yet another example of using RDMA in the context of database management, a situation may arise where a replicated database server is newly designated as the primary replication server. When this occurs, the buffer pool of the newly designated master server is "cold" and may impair the performance of the database before filling the buffer pool with relevant data. However, when RDMA is available, RDMA memory accesses may be used to fill the buffer pool of the newly designated primary server by copying the existing contents of the buffer pool from the previous primary server and the now secondary replica server using RDMA memory transfers.”]
executing, by the third hardware, operations of the second neural network that follow the second operation group. 
[CN (Pg. 4, Lines 39-44) “In some embodiments, the same application 106 may be executed by each server 102. Similarly, the database 108 of each server 102 may be the same. For example, databases 108 may have the same logical structure and organization and may be synchronized using network communications such that they represent the same data. In other embodiments, applications 106 and RDBMS 104 of different servers 102 may access a common database stored on one of servers 102 or on one or more other servers dedicated to database storage.”
	This citation shows that in some embodiments the computing system from CN will be able to execute the same operations as one another. ]

Gupta/Yokono teaches operations of the first neural network and operations of the second neural network [ (Fig. 5-7) and (Col. 3, Lines 15-31)
	Yokono teaches the merging of two distinct neural networks into one structure as seen in the above figures with the additional texts as support. Yokono teaches the environment (two distinct neural networks) where the operations Gupta teaches would be performed. ]

	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta as modified by the merging of two or more distinct neural networks as taught by Yokono, as further modified by the hardware resource management system, as taught by Lee, as modified by the shared global parameter server, as taught by Chilimbi, to further include the shared memory and processing system as taught by CN. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would provide faster memory access speeds [ CN (Abstract) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall by speeding up the processing time and reducing the memory load times.


Claims 13 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta (US8984515B2), and further in view of Miranda et al., “Reducing the Training Time of Neural Networks by Partitioning”, ICLR 2016.

In regards to claim 13, Gupta teaches the following:
An application processor, comprising: a memory storing programs; a processor configured to execute the programs stored in the memory;
[ (Col. 5, Lines 50-57) “These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified” 
This citation teaches the processor and the memory within the system. Further it teaches the processor running instructions stored in the memory to execute the operations. ]
However, Gupta does not distinctly disclose wherein the processor is further configured to, by executing the neural network module, identify common sub neural networks included in a plurality of neural networks performing different tasks, and merge the common sub neural networks included in the plurality of neural networks into a shared sub neural network to be executed by a single process.
Nevertheless, Miranda teaches wherein the processor is further configured to, by executing the neural network module, identify common sub neural networks included in a plurality of neural networks performing different tasks, and merge the common sub neural networks included in the plurality of neural networks into a shared sub neural network to be executed by a single process [Miranda, Pg. 2, ¶ 6, teaches partitioning the neural network into smaller neural networks whose tasks are equivalent to the original training task, wherein the subtasks are independent of each other, which allows them to be learnt in parallel. After the proposed pre-training is complete, the obtained smaller neural networks are merged and used as the initial condition for the original neural network.]

	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method for reducing repeated operations as taught by Gupta with the merging of two or more distinct neural networks as taught by Miranda, in order to reduce the training time required while generalization capability is not affected [Miranda, Pg. 2, ¶ 7].


In regards to claim 16, The processor of claim 13, is taught by Gupta/Miranda as seen in the rejection for claim 13 above. Gupta continues to teach the following:
wherein the processor is further configured to, by executing the … merge module, with respect to the shared sub networks models, generates a shared ID, a shared count,
[ (Col. 1, Lines 61-64) “A map output of the merged task may include a combination of the cluster-id and a job-id as map-key, and a data value as the map-value may enable, at least in part, sharing of the map-output for multiple tasks.” ]
[ (Col. 1, Lines 50-55) “The primary task selection criteria may include selection of a task that requires computation of highest number of clusters, a task that requires maximum number of iterations to converge, a task with fewest or containing maximum shared clustering attributes across the tasks, or a combination thereof”
	This citation and the one above it teach the use of IDs attached to both the clusters and jobs. The cluster-ID would be equivalent to the shared-ID as it denotes which tasks are shared within the cluster. Examiner notes that the applicant’s specification describes the shared-ID as having the same function/use as the operation group ID, with the only difference being that the shared-ID is directed towards the neural networks rather than the shared operations within a neural network, as the operation group ID is. In regards to the shared count, examiner interprets the citation to be equivalent to the shared count. Specifically, the citation mentions that one of the criteria could be which one of the tasks is shared between the fewest or maximum shared clusters. This implicitly shows that the cluster-IDs, Job-IDs and other data recorded as taught by Gupta would keep track of how many times each task is shared between the clusters and how many clusters share each task, which would be a shared count. ]
Gupta does not distinctly disclose that the operation groups are specifically coming from respective neural networks. However, Miranda teaches that the operations are merged from neural networks as seen below:
	Neural network
[Miranda, Pg. 2, ¶ 6, teaches partitioning the neural network into smaller neural networks whose tasks are equivalent to the original training task, wherein the subtasks are independent of each other, which allows them to be learnt in parallel. After the proposed pre-training is complete, the obtained smaller neural networks are merged and used as the initial condition for the original neural network.]

	What is not distinctly disclosed by Gupta or Miranda and is instead taught by Lee is seen below:
 	and a resource ID indicating hardware by which the shared neural network model is executed, and adds the shared ID, the shared count, and the resource ID to model information regarding the neural network model.
[ (Fig. 11) and (¶0113) 
	The tailored session activation request explained in the above cited figure and paragraph is equivalent to the resource ID of the claim limitation. The session activation is tied to the application itself (which would be the operation group that is taught by Gupta above) and follows the application wherever it goes in the system. The session activation may be updated (tailored) whenever the application server associated with the application session is assigned. Examiner notes that Lee teaches assigning multiple pieces of data to the application, which would be the resource ID taught here and the shared count as taught by Gupta. ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations in a neural network as taught by Gupta/Miranda with the hardware resource management system as taught by Lee. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the resource management system of Lee with the teachings of Gupta/Miranda would provide a more efficient system for reducing redundant operations across a plurality of operation groups. This would facilitate the recognized benefit of an increased efficiency for the system overall from processing times being reduced from the removed redundant operations and the allocation of resources to better manage processing times.


Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta/Miranda as applied above, and further in view of Hotson (US 20190065944 A1).

In regards to claim 14, The processor of claim 13, is taught by Gupta/Miranda as seen in the rejection for claim 13 above. However, the combination does not distinctly disclose the remaining limitations.

Nevertheless, Hotson teaches the remaining limitations of wherein the processor is further configured to, by executing the neural network merge module, identify the common sub neural networks, based on sub neural network identifications (IDs) preset with respect to each of the plurality of sub neural networks [Hotson, [0043] teaches a processor may process the sensor data based on the weights and values of a base portion or common layers 202 of a neural network 505. For example, the common layers 202 may include layers that were sequentially trained for various tasks using EWC. The common layers 202 may include layers that process sensor data and produce an output that is used by one or more branches of a neural network. For example, a branch may include a sub-portion of a neural network that performs a sub-task such as object recognition (e.g., of a specific object type), lane detection, or image segmentation. The method includes processing 706 an output of the common layers using a plurality of sub-portions (subnetworks) to perform a plurality of different subtasks. For example, each of the first task layers 204, second task layers 304, and third task layers 404 may make up a different subnetwork or branch of the neural network. Each of the sub-portions or subnetworks may provide an output of the task indicating a classification or other result of the task.

	In the citation above identifying the “common layers” reads on identifying common subnetworks and labeling them as “common” reads on identification (ID).]

Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations as taught by Gupta as modified by the merging of two or more distinct neural networks as taught by Miranda, to further include the identification of common subnetworks, as taught by Hotson, in order to reduce computing resources and reduce training load during deployment by using elastic weight consolidation on only a subset of the network, such that processes or tasks that are common between different neural networks may be combined into a single network.



Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta/Miranda as applied above, and further in view of Keskin (US 20180314524 A1).


In regards to claim 15, The processor of claim 13, is taught by Gupta/Miranda as seen in the rejection for claim 13 above. Keskin continues to teach the following:
wherein the processor is further configured to, by executing the neural network merge module, trace an operation process in runtime with respect to at least one neural network from among the plurality of neural networks,
[ (¶0023) “In operation, the branch predictor circuit may identify branch instructions in a given application program using a sample trace of that program. In some embodiments, a neural network, such as a convolutional neural network is then trained “offline”, using a branch history data for these particular branches”
	This citation from Keskin teaches tracing a neural network to go through all of the operations within all of its branches. Examiner notes that the training is done “offline” post-trace with the tracing operation still being performed during runtime. Further, the multiple neural networks are taught by Yokono.]
thereby generating an operation history with respect to the at least one neural network.
[ (¶0055) “The resulting decision tree may be a flow chart like structure stored on a memory device (e.g., memory 201 of FIG. 2) that may be used to classify the portion of features.”
	This citation teaches the operation history of the neural network being saved (in the form of a decision tree). ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method for reducing repeated operations as taught by Gupta, as modified by the merging of two or more distinct neural networks, as taught by Miranda, to further include the neural network operation tracing as taught by Keskin. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would eliminate the need for the processing system to wait for an outcome to continue down multiple branches [ Keskin (Abstract) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall by speeding up the processing time and reducing the waiting time for different threads/branches throughout the different networks. 


Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Gupta/Yokono/Lee as applied above, and further in view of Keskin (US 20180314524 A1).


In regards to claim 18, The neural network system of claim 17, is taught by Gupta/Yokono/Lee as seen in the rejection for claim 17 above. Keskin continues to teach the following:
wherein the neural network merge module is further configured to trace operation processes in runtime with respect to at least one neural network from among the plurality of neural networks,
[ (¶0023) “In operation, the branch predictor circuit may identify branch instructions in a given application program using a sample trace of that program. In some embodiments, a neural network, such as a convolutional neural network is then trained “offline”, using a branch history data for these particular branches”
	This citation from Keskin teaches tracing a neural network to go through all of the operations within all of its branches. Examiner notes that the training is done “offline” post-trace with the tracing operation still being performed during runtime. Further, the multiple neural networks are taught by Yokono.]
thereby generating an operation history with respect to the at least one neural network.
[ (¶0055) “The resulting decision tree may be a flow chart like structure stored on a memory device (e.g., memory 201 of FIG. 2) that may be used to classify the portion of features.”
	This citation teaches the operation history of the neural network being saved (in the form of a decision tree). ]
	Therefore, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a method for reducing repeated operations in a plurality of neural networks as taught by Gupta/Yokono/Lee with the neural network operation tracing as taught by Keskin. The reason it would be obvious is one of ordinary skill in the art would recognize, prior to the effective filing date, that combining the two would eliminate the need for the processing system to wait for an outcome to continue down multiple branches [ Keskin (Abstract) ]. This would facilitate the recognized benefit of an increased efficiency for the system overall by speeding up the processing time and reducing the waiting time for different threads/branches throughout the different networks. 

Allowable Subject Matter
Claims 5 and 19-20 (as amended) are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri. 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.B./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123