Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/02/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Status of the Claims
The present application is being examined under the claims filed on 07/02/2018.
Claims 1-20 are pending.
Claims 1-20 are rejected.	

Drawings
The drawings filed on 07/02/2018 are acceptable for examination purposes.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
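(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.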

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) are:
Claim 16
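a first model parser configured to analyze a network structure of the DNN, the network structure comprising a plurality of layers; (Specification [0097]-[0100])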
a first early-stopper configured to set a hyper parameter based on the network structure and real-time context information of the system configured to drive the DNN, the hyper parameter being used for performing an early-stop function; (Specification [0097]-[0100])
a first job assigner configured to assign a plurality of depth-wise jobs based on the hyper parameter, each of the plurality of depth-wise jobs including at least a part of the plurality of computing operations; and (Specification [0097]-[0100])
a plurality of resources configured to execute the plurality of depth-wise jobs, when an early-stop event for a first layer among the plurality of layers is generated while the plurality of depth-wise jobs are executed, the plurality of resources configured to perform some of a plurality of computing operations included in at least one second layer and to stop a remainder of the plurality of computing operations, the at least one second layer being arranged prior to the first layer. (Specification [0097]-[0100])
Claim 17
a context manager configured to update the real-time context information based on an operating status of the system, wherein the first early-stopper is configured to update the hyper parameter based on the real-time context information. (Specification [0055], [0083], [0097]-[0100])
Claim 18
a resource manager configured to generate resource status information that represents performance and utilization of the plurality of resources. (Specification [0052], [0097]-[0100])
Claim 19
a second model parser configured to analyze the network structure;
a second early-stopper configured to set the hyper parameter based on the network structure and the real-time context information; and (Specification [0098]-[0100])
a second job assigner configured to assign the plurality of depth-wise jobs based on the hyper parameter. (Specification [0098]-[0100])
Because these claim limitation(s) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


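Claims 10, 13, and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.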
Claim 10:
The term “major” in claim 10 is a relative term which renders the claim indefinite. The term “major” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The limitation of “major characteristic values” in claim 10 has been rendered indefinite by the use of the term “major”. For examination purposes, “major characteristic values” is being interpreted as any characteristic value.
Claim 13:
Claim 13 recites the limitation "the some" in lines 1 and 4.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the some” is being interpreted as “the subset”.
Claims 16-20:
Each of the claim limitations in claims 16-19, as indicated above in section “Claim Interpretation” starting on page 2, invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specification [0096]-[0101] does not definitively identify the structure for the claimed “first model parser” (Specification [0097]-[0100]), “first early-stopper” (Specification [0097]-[0100]), “system” (Specification [0097]-[0100]), “first job assigner” (Specification [0097]-[0100]), “plurality of resources” (Specification [0097]-[0100]), “context manager” (Specification [0097]-[0100]), “resource manager” (Specification [0097]-[0100]), “second model parser” (Specification [0097]-[0100]), “second early-stopper” (Specification [0097]-[0100]), and “second job assigner” (Specification [0097]-[0100]); instead 
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
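(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.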

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 13, 15, 16, and 19:
Claims 1, 13, 15, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hara et al. (US 20170228639, “Hara”) in view of S. Teerapittayanon, B. McDanel and H. T. Kung, "BranchyNet: Fast inference via early exiting from deep neural networks," 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 2464-2469, doi: 10.1109/ICPR.2016.7900006 (“Teerapittayanon”).
Regarding claim 1:
Hara teaches a method of controlling a plurality of computing operations in a deep neural network (DNN), the method comprising:
analyzing a network structure of the DNN, the network structure comprising a plurality of layers;
Hara in at least para. [0047] discloses a “learning setting … used for training of first neural networks … may include a learning rate, dropout parameter, number of layers, and selection of activation functions.” Determining the learning setting based on for example a number of layers, requires analyzing a structure of the first neural networks to determine the number of layers.
setting a hyper parameter, based on the network structure and real-time context information of a system configured to drive the DNN, the hyper parameter being used for performing an early-stop function;
Hara in at least para. [0068] discloses generating “a new setting used for training of second neural networks.” Hara in at least para [0038] discloses generating “[the] new setting based on tentative weight data of the second neural network”. Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting … In other words, the training section may stop training of the second neural network at the t-th updating epoch.” That is, Hara discloses generating a new setting (i.e., setting a hyper parameter) based on tentative weight data (i.e., network structure) and whether or not training of the second neural network is terminated (i.e., real-time context information of a system). Hara in at least para. [0040] discloses to “terminate the training of the second neural network with the new setting in response to the estimated evaluation value of the second neural network with the new setting not satisfying a criterion”. That is, Hara discloses the new setting being used for terminating the training (i.e., performing an early-stop function).
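assigning a plurality of depth-wise jobs to a plurality of resources included in the system based on the hyper parameter to execute the plurality of depth-wise jobs, each of the plurality of depth-wise jobs including at least a part of the plurality of computing operations; and
Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting based on the training data”, and Hara in at least para. [0101] discloses “CPU 2000 may perform various types of processing … which includes various types of operations, processing of information, condition judging, search/replace of information, etc.” That is, Hara discloses assigning operations (i.e., a plurality of depth-wise jobs) to CPU 2000 to train a second neural network with the new setting (i.e., based on the hyper parameter).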
Hara does not explicitly disclose:
when an early-stop event for a first layer among the plurality of layers is generated while the plurality of depth-wise jobs are executed, performing a subset of a plurality of computing operations included in at least one second layer and stopping a remainder of the plurality of computing operations, the at least one second layer being arranged prior to the first layer.
However, Teerapittayanon discloses:
when an early-stop event for a first layer among the plurality of layers is generated while the plurality of depth-wise jobs are executed, performing a subset of a plurality of computing operations included in at least one second layer and stopping a remainder of the plurality of computing operations, the at least one second layer being arranged prior to the first layer.
Teerapittayanon in at least FIG. 1 and Section III.C. discloses “Once trained, BranchyNet can be used for fast inference by classifying samples at earlier stages in the network based on the algorithm in Figure 2. If the classifier at an exit point of a branch has high confidence about correctly labeling a test sample x, the sample is exited and returns a predicted label early with no further computation performed by the higher branches in the network.” That is, referring to FIG. 1 of Teerapittayanon, when using BranchyNet for classification (i.e., while the plurality of depth-wise jobs are executed) and an 
Hara and Teerapittayanon are analogous art to the claimed invention because they are directed to early stopping/termination approaches to training neural networks based on classification accuracy/confidence. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate adding branch exits for early termination when classification accuracy/confidence meets a criterion. One of ordinary skill in the arts would have been motivated to make this modification to exploit the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points and to avoid the cost of added latency and energy usage (Teerapittayanon in at least Abstract).
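For illustration only, the early-exit behavior quoted above from Teerapittayanon Section III.C. can be sketched as follows. This is a minimal sketch and not the reference's implementation: the names early_exit_inference, stages, and thresholds are hypothetical, and the use of softmax entropy compared against a per-exit threshold as the "high confidence" test is an assumption made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Each stage pairs a trunk segment with the exit head (branch classifier) after it.
Stage = Tuple[Callable[[Vector], Vector], Callable[[Vector], Vector]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy; a low value is treated here as high confidence (assumption)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_inference(
    x: Vector, stages: Sequence[Stage], thresholds: Sequence[float]
) -> Tuple[int, int]:
    """Run trunk segments in order and return (label, exit_index).

    The sample leaves at the first exit whose entropy is below that exit's
    threshold; later segments are then never executed. The last exit always
    returns a label.
    """
    features = x
    for i, (trunk, exit_head) in enumerate(stages):
        features = trunk(features)
        probs = softmax(exit_head(features))
        is_last = i == len(stages) - 1
        if is_last or entropy(probs) < thresholds[i]:
            label = max(range(len(probs)), key=probs.__getitem__)
            return label, i
    raise ValueError("stages must not be empty")
```

In this sketch, a sample that exits at index 0 causes no computation in any later trunk segment or exit head, which corresponds to the "no further computation performed by the higher branches" language of the quoted passage.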
Regarding claim 13:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Teerapittayanon further discloses:
wherein, after the some of the plurality of computing operations are performed and the remainder of the plurality of computing operations are stopped, remaining jobs among the plurality of depth-wise jobs are continuously executed based on a result of the some of the plurality of computing operations.
Teerapittayanon in at least FIG. 1 and Section III.C. discloses “If the classifier at an exit point of a branch has high confidence about correctly labeling a test sample x, the sample is exited and returns a predicted label early with no further computation performed by the higher branches in the network.” That is, referring to FIG. 1 of Teerapittayanon, when an early-stop event for layer “Conv 5x5” (i.e., first layer) is generated, computations for the “Exit 1” branch are performed (i.e., the some of the plurality of 
Hara and Teerapittayanon are analogous art to the claimed invention because they are directed to early stopping/termination approaches to training neural networks based on classification accuracy/confidence. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate adding branch exits for early termination when classification accuracy/confidence meets a criterion. One of ordinary skill in the arts would have been motivated to make this modification to exploit the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points and to avoid the cost of added latency and energy usage (Teerapittayanon in at least Abstract).
Regarding claim 15:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Hara further discloses:
wherein the plurality of resources include at least one of a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an image signal processor (ISP), dedicated hardware and a neural processing unit (NPU).
Hara in at least para. [0101] discloses “CPU 2000 may perform various types of processing … which includes various types of operations, processing of information, condition judging, search/replace of information, etc.” That is, Hara discloses assigning operations to CPU 2000 (i.e., wherein the plurality of resources include at least one of a central processing unit (CPU)).
Regarding claim 16:

a first model parser configured to analyze a network structure of the DNN, the network structure comprising a plurality of layers;
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., first model parser). Hara in at least para. [0047] discloses a “learning setting … used for training of first neural networks … may include a learning rate, dropout parameter, number of layers, and selection of activation functions.” Determining the learning setting based on for example a number of layers, requires analyzing a structure of the first neural networks to determine the number of layers.
a first early-stopper configured to set a hyper parameter based on the network structure and real-time context information of the system configured to drive the DNN, the hyper parameter being used for performing an early-stop function;
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., first early-stopper). Hara in at least para. [0068] discloses generating “a new setting used for training of second neural networks.” Hara in at least para [0038] discloses generating “[the] new setting based on tentative weight data of the second neural network”. Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting … In other words, the training section may stop training of the second neural network at the t-th updating epoch.” That is, Hara discloses generating a new setting (i.e., set a hyper parameter) based on tentative weight data (i.e., network structure) and whether or not training of the second neural network is terminated (i.e., real-time context information of a system). Hara in at least para. [0040] discloses to “terminate the training of the second neural network with the new setting in response to the estimated evaluation value of the second neural network with the new setting not satisfying a criterion”. That is, Hara discloses the new setting being used for terminating the training (i.e., performing an early-stop function).
a first job assigner configured to assign a plurality of depth-wise jobs based on the hyper parameter, each of the plurality of depth-wise jobs including at least a part of the plurality of computing operations; and
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., first job assigner). Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting based on the training data”, and Hara in at least para. [0101] discloses “CPU 2000 may perform various types of processing … which includes various types of operations, processing of information, condition judging, search/replace of information, etc.” That is, Hara discloses assigning operations (i.e., a plurality of depth-wise jobs) to CPU 2000 to train a second neural network with the new setting (i.e., based on the hyper parameter).
Hara does not explicitly disclose:
a plurality of resources configured to execute the plurality of depth-wise jobs, when an early-stop event for a first layer among the plurality of layers is generated while the plurality of depth-wise jobs are executed, the plurality of resources configured to perform some of a plurality of computing operations included in at least one second layer and to stop a remainder of the plurality of computing operations, the at least one second layer being arranged prior to the first layer.
However, Teerapittayanon discloses:
a plurality of resources configured to execute the plurality of depth-wise jobs, when an early-stop event for a first layer among the plurality of layers is generated while the plurality of depth-wise jobs are executed, the plurality of resources configured to perform some of a plurality of computing operations included in at least one second layer and to stop a remainder of the plurality of computing operations, the at least one second layer being arranged prior to the first layer.
Teerapittayanon in at least FIG. 1 and Section III.C. discloses “Once trained, BranchyNet can be used for fast inference by classifying samples at earlier stages in the network based on the algorithm in Figure 2. If the classifier at an exit point of a branch has high confidence about correctly labeling a test sample x, the sample is exited and returns a predicted label early with no further computation 
Hara and Teerapittayanon are analogous art to the claimed invention because they are directed to early stopping/termination approaches to training neural networks based on classification accuracy/confidence. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate adding branch exits for early termination when classification accuracy/confidence meets a criterion. One of ordinary skill in the arts would have been motivated to make this modification to exploit the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points and to avoid the cost of added latency and energy usage (Teerapittayanon in at least Abstract).
Regarding claim 19:
Hara in view of Teerapittayanon teaches the system of claim 16 (as mentioned above). Hara further discloses:
a second model parser configured to analyze the network structure;
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., second model parser). Hara in at least para. [0047] discloses a “learning setting … used for training of first neural networks … may include a learning rate, dropout parameter, number of layers, and selection of activation functions.” Determining the learning setting based on for example a number of layers, requires analyzing a structure of the first neural networks to determine the number of layers.
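a second early-stopper configured to set the hyper parameter based on the network structure and the real-time context information; and
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., second early-stopper). Hara in at least para. [0068] discloses generating “a new setting used for training of second neural networks.” Hara in at least para [0038] discloses generating “[the] new setting based on tentative weight data of the second neural network”. Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting … In other words, the training section may stop training of the second neural network at the t-th updating epoch.” That is, Hara discloses generating a new setting (i.e., set the hyper parameter) based on tentative weight data (i.e., network structure) and whether or not training of the second neural network is terminated (i.e., real-time context information).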
a second job assigner configured to assign the plurality of depth-wise jobs based on the hyper parameter.
Hara in at least para. [0029] discloses “apparatus 100 may comprise processor” (i.e., second job assigner). Hara in at least para. [0069] discloses “the training section may incompletely train a second neural network with the new setting based on the training data”, and Hara in at least para. [0101] discloses “CPU 2000 may perform various types of processing … which includes various types of operations, processing of information, condition judging, search/replace of information, etc.” That is, Hara discloses assigning operations (i.e., plurality of depth-wise jobs) to CPU 2000 to train a second neural network with the new setting (i.e., based on the hyper parameter).
Claims 2, 14, 17, 18, and 20:
Claims 2, 14, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hara and Teerapittayanon as applied to claims 1 and 16 above, and further in view of Savvides et al. (US 20180053091, “Savvides”).
Regarding claim 2:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
updating the real-time context information based on an operating status of the system; and
updating the hyper parameter based on the real-time context information.
However, Savvides discloses:
updating the real-time context information based on an operating status of the system; and
updating the hyper parameter based on the real-time context information.
Savvides in at least para. [0027] discloses “FIG. 3 is a method 50 for data and model compression. The method 50 enables the network (e.g., CNN, fully connected network, neural network, etc.) to be selected, trained, and compressed to enable operation on the embedded system 10. For example, a selection step enables selection of a reduced size network (block 52). As will be described below, the selection step reduces the size of the network by removing layers, removing kernels, or both. That is, the selection step may review parameters of the embedded system 10, such as processor speed, available memory, etc. and determine one or more networks which may operate within the constraints of the embedded system 10. That is, the parameters of the embedded system 10 (e.g., speed, accuracy, size, etc.) may be utilized to develop one or more thresholds to constrain selection of the network.” That is, Savvides determines parameters representing “processing speed, available memory, etc.” (i.e., updating the real-time context information based on an operating status of the system), and Savvides removes layers or kernels of the neural network (i.e., updating the hyper parameter) based on the parameters of the embedded system (i.e., based on the real-time context information).
Hara, Teerapittayanon, and Savvides are analogous art to the claimed invention because they are directed to optimizing the training of neural networks. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate reviewing parameters of a system used for training a neural network and updating the neural network based on the parameters. One of ordinary skill in the arts would have been motivated to make this modification to enable the system to run the network with the resources available (Savvides in at least para. [0003]-[0004], [0027]).
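For illustration only, the selection step quoted above from Savvides para. [0027], in which parameters of the embedded system constrain which reduced-size network is selected, can be sketched as follows. This is a minimal sketch under stated assumptions and not the reference's implementation: the names SystemParams, Candidate, and select_network, the specific fields (memory and latency), and the tie-breaking rule of preferring the most accurate feasible candidate are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SystemParams:
    """Hypothetical snapshot of embedded-system constraints."""
    available_memory_mb: float
    max_latency_ms: float


@dataclass(frozen=True)
class Candidate:
    """Hypothetical reduced-size network (fewer layers and/or kernels)."""
    name: str
    num_layers: int
    memory_mb: float
    latency_ms: float
    accuracy: float


def select_network(
    candidates: Sequence[Candidate], system: SystemParams
) -> Optional[Candidate]:
    """Return the most accurate candidate that fits the system thresholds.

    Returns None when no candidate can operate within the constraints.
    """
    feasible = [
        c
        for c in candidates
        if c.memory_mb <= system.available_memory_mb
        and c.latency_ms <= system.max_latency_ms
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)
```

Re-running select_network whenever the SystemParams snapshot changes is one way to picture the asserted mapping of "updating the hyper parameter based on the real-time context information"; the sketch itself takes no position on whether that mapping is correct.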
Regarding claim 14:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
wherein the real-time context information includes at least one of performance and utilization of the plurality of resources, accuracy of the plurality of depth-wise jobs and power status of the system.
However, Savvides discloses:
wherein the real-time context information includes at least one of performance and utilization of the plurality of resources, accuracy of the plurality of depth-wise jobs and power status of the system.
Savvides in at least para. [0027] discloses “FIG. 3 is a method 50 for data and model compression. The method 50 enables the network (e.g., CNN, fully connected network, neural network, etc.) to be selected, trained, and compressed to enable operation on the embedded system 10. For example, a selection step enables selection of a reduced size network (block 52). As will be described below, the selection step reduces the size of the network by removing layers, removing kernels, or both. That is, the selection step may review parameters of the embedded system 10, such as processor speed, available memory, etc. and determine one or more networks which may operate within the constraints of the embedded system 10. That is, the parameters of the embedded system 10 (e.g., speed, accuracy, size, etc.) may be utilized to develop one or more thresholds to constrain selection of the network.” That is, Savvides determines parameters (i.e., real-time context information) representing “processing speed, available memory, etc.” (i.e., includes at least one of performance and utilization of the plurality of resources).
Hara, Teerapittayanon, and Savvides are analogous art to the claimed invention because they are directed to optimizing the training of neural networks. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate reviewing parameters of a system used for training a neural network and updating the neural network based on the parameters. One of ordinary skill in the arts would have been motivated to make this modification to enable the system to run the network with the resources available (Savvides in at least para. [0003]-[0004], [0027]).
Regarding claim 17:
Hara in view of Teerapittayanon teaches the system of claim 16 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
a context manager configured to update the real-time context information based on an operating status of the system, wherein the first early-stopper is configured to update the hyper parameter based on the real-time context information.
However, Savvides discloses:
a context manager configured to update the real-time context information based on an operating status of the system, wherein the first early-stopper is configured to update the hyper parameter based on the real-time context information.
Savvides in at least para. [0027] discloses “FIG. 3 is a method 50 for data and model compression. The method 50 enables the network (e.g., CNN, fully connected network, neural network, etc.) to be selected, trained, and compressed to enable operation on the embedded system 10. For example, a selection step enables selection of a reduced size network (block 52). As will be described below, the selection step reduces the size of the network by removing layers, removing kernels, or both. That is, the selection step may review parameters of the embedded system 10, such as processor speed, available memory, etc. and determine one or more networks which may operate within the constraints of the embedded system 10. That is, the parameters of the embedded system 10 (e.g., speed, accuracy, size, etc.) may be utilized to develop one or more thresholds to constrain selection of the network.” That is, Savvides determines parameters representing “processing speed, available memory, etc.” (i.e., update the real-time context information based on an operating status of the system), and Savvides removes layers or kernels of the neural network (i.e., update the hyper parameter) based on the parameters of the embedded system (i.e., based on the real-time context information).
Hara, Teerapittayanon, and Savvides are analogous art to the claimed invention because they are directed to optimizing the training of neural networks. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate reviewing parameters of a system used for training a neural network and updating the neural network based on the parameters. One of ordinary skill in the arts would have been motivated to make this modification to enable the system to run the network with the resources available (Savvides in at least para. [0003]-[0004], [0027]).
Regarding claim 18:
Hara in view of Teerapittayanon teaches the system of claim 16 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
a resource manager configured to generate resource status information that represents performance and utilization of the plurality of resources.
However, Savvides discloses:
a resource manager configured to generate resource status information that represents performance and utilization of the plurality of resources.
Savvides in at least para. [0027] discloses “FIG. 3 is a method 50 for data and model compression. The method 50 enables the network (e.g., CNN, fully connected network, neural network, etc.) to be selected, trained, and compressed to enable operation on the embedded system 10. For example, a selection step enables selection of a reduced size network (block 52). As will be described below, the selection step reduces the size of the network by removing layers, removing kernels, or both. That is, the selection step may review parameters of the embedded system 10, such as processor speed, available memory, etc. and determine one or more networks which may operate within the constraints of the embedded system 10. That is, the parameters of the embedded system 10 (e.g., speed, accuracy, size, etc.) may be utilized to develop one or more thresholds to constrain selection of the network.” That is, Savvides determines parameters (i.e., resource status information) representing “processor speed, available memory, etc.” (i.e., that represents performance and utilization of the plurality of resources).
Hara, Teerapittayanon, and Savvides are analogous art to the claimed invention because they are directed to optimizing the training of neural networks. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate reviewing parameters of a system used for training a neural network and updating the neural network based on the parameters. One of ordinary skill in the arts would have been motivated to make this modification to enable the system to run the network with the resources available (Savvides in at least para. [0003]-[0004], [0027]).
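For illustration only, and not as part of the cited disclosure, the relied-upon concept of reviewing system parameters and constraining network selection to what the system can support may be sketched as follows. The class names, field names, and function name are hypothetical and do not appear in Savvides or in the claims; the sketch assumes that memory and processor speed are the only parameters reviewed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceStatus:
    """Hypothetical snapshot of the resources a system currently has available."""
    available_memory_mb: float
    processor_speed_ghz: float


@dataclass
class CandidateNetwork:
    """Hypothetical description of a network and the resources it needs."""
    name: str
    required_memory_mb: float
    min_processor_speed_ghz: float


def networks_within_constraints(status: ResourceStatus,
                                candidates: List[CandidateNetwork]) -> List[CandidateNetwork]:
    """Keep only the candidates whose requirements fit the reported resources."""
    return [
        net for net in candidates
        if net.required_memory_mb <= status.available_memory_mb
        and net.min_processor_speed_ghz <= status.processor_speed_ghz
    ]
```

Under these assumptions, a reduced-size network is retained when the full-size network exceeds the reported memory or processor speed.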
Regarding claim 20:
Hara in view of Teerapittayanon teaches the system of claim 19 (as mentioned above). Hara further discloses:
wherein the second model parser, the second early-stopper and the second job assigner are used for setting an initial value of the hyper parameter, and
Hara in at least para. [0068] discloses generating “a new setting used for training of second neural networks.” Hara in at least para. [0038] discloses generating “[the] new setting based on tentative weight data of the second neural network”. That is, Hara discloses generating a new setting (i.e., setting an initial value of the hyper parameter).
Hara in view of Teerapittayanon does not explicitly disclose:
wherein the first model parser, the first early-stopper and the first job assigner are used for updating the hyper parameter in real-time.
However, Savvides discloses:
wherein the first model parser, the first early-stopper and the first job assigner are used for updating the hyper parameter in real-time.
Savvides in at least para. [0027] discloses “FIG. 3 is a method 50 for data and model compression. The method 50 enables the network (e.g., CNN, fully connected network, neural network, etc.) to be selected, trained, and compressed to enable operation on the embedded system 10. For example, a selection step enables selection of a reduced size network (block 52). As will be described below, the selection step reduces the size of the network by removing layers, removing kernels, or both. That is, the selection step may review parameters of the embedded system 10, such as processor speed, available memory, etc. and determine one or more networks which may operate within the constraints of the embedded system 10. That is, the parameters of the embedded system 10 (e.g., speed, accuracy, size, etc.) may be utilized to develop one or more thresholds to constrain selection of the network.”
Hara, Teerapittayanon, and Savvides are analogous art to the claimed invention because they are directed to optimizing the training of neural networks. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate reviewing parameters of a system used for training a neural network and updating the neural network based on the parameters. One of ordinary skill in the arts would have been motivated to make this modification to enable the system to run the network with the resources available (Savvides in at least para. [0003]-[0004], [0027]).
Claims 3-10:
Claims 3-10 are rejected under 35 U.S.C. 103 as being unpatentable over Hara and Teerapittayanon as applied to claim 1 above, and further in view of Raveane et al. (WO 2015083199, “Raveane”).
Regarding claim 3:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
wherein the first layer is a maximum pooling layer that selects a maximum value among a plurality of characteristic values, and 
wherein the plurality of characteristic values are included in a predetermined region of first volume data that is input to the first layer.
However, Raveane discloses:
wherein the first layer is a maximum pooling layer that selects a maximum value among a plurality of characteristic values, and 
wherein the plurality of characteristic values are included in a predetermined region of first volume data that is input to the first layer.
Raveane in at least para. [0034] discloses “FIG. 4 shows a possible architecture of the convolutional neural networks used by the system. … The input layer [ 18 ] receives the image data in YUV color space (native to most mobile computer device cameras) and prepares it for further analysis through a contrast normalization process. In the case of devices equipped with a depth sensor, the neural network architecture is modified to provide one additional input channel for the depth information, which is then combined to the rest of the network in a manner similar to the U and V color channels. The first convolutional layer [ 19 ] extracts a high level set of features through alternating convolutional and max-pooling layers.” That is, the max-pooling layer (i.e., first layer) selects a maximum value from the output of a convolutional layer (i.e., among a plurality of characteristic values) based on a high level set of features included in the image data in YUV color space (i.e., in a predetermined region of first volume data that is input to the first layer).
Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input (Raveane in at least para. [0034]).
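For illustration only, and not as part of the cited disclosure, a max-pooling operation of the kind discussed above (selecting a maximum value among the values in a region of the input) may be sketched as follows. The function names are hypothetical; the sketch assumes non-overlapping square regions on a single depth slice and also reports the position of each maximum.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def max_pool_region(plane: Matrix, top: int, left: int, size: int) -> Tuple[float, Tuple[int, int]]:
    """Return the maximum value and its (row, col) position in one size x size region."""
    best_val = plane[top][left]
    best_pos = (top, left)
    for r in range(top, top + size):
        for c in range(left, left + size):
            if plane[r][c] > best_val:
                best_val = plane[r][c]
                best_pos = (r, c)
    return best_val, best_pos


def max_pool(plane: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling over one depth slice (stride equals size)."""
    rows, cols = len(plane), len(plane[0])
    return [
        [max_pool_region(plane, r, c, size)[0] for c in range(0, cols - size + 1, size)]
        for r in range(0, rows - size + 1, size)
    ]
```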
Regarding claim 4:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 3 (as mentioned above). Raveane further discloses:
wherein, before the plurality of depth-wise jobs are executed, a position of the maximum value is predetermined based on a training operation

Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.
Regarding claim 5:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 3 (as mentioned above). Raveane further discloses:
wherein, while the plurality of depth-wise jobs are executed, a position of the maximum value is determined by tracking the at least one second layer in real-time.
Raveane in at least para. [0034] discloses “The first convolutional layer [ 19 ] extracts a high level set of features through alternating convolutional and max-pooling layers.” That is, the max-pooling layers track a position of maximum value while training jobs are executed.
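Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.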
Regarding claim 6:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 5 (as mentioned above). Raveane further discloses:
wherein second volume data that is input to the at least one second layer includes first through N depths, where N is a natural number greater than or equal to two, and 
wherein only K consecutive depths among the first through N depths are computed, where K is a natural number greater than or equal to two and less than or equal to N.
Raveane in at least FIG. 4 discloses image data in YUV color space (i.e., second volume data that includes first through N depths) that is input to a convolutional layer of convolutional stage 1 (i.e., at least one second layer), and computing the Y, U, and V channels of the image data (i.e., K consecutive depths among the first through N depths are computed).
Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.
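For illustration only, and not as part of the cited disclosure, the limitation of computing only K consecutive depths among first through N depths may be sketched as follows. The function name and the `start` parameter are hypothetical; the sketch assumes the volume data is represented as a sequence of depth slices.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_consecutive_depths(volume: Sequence[T], start: int, k: int) -> List[T]:
    """Return k consecutive depth slices of the volume, beginning at index start."""
    n = len(volume)
    if not (2 <= k <= n):
        raise ValueError("k must satisfy 2 <= k <= N")
    if start < 0 or start + k > n:
        raise ValueError("the selected window must lie within the volume")
    return list(volume[start:start + k])
```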
Regarding claim 7:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 6 (as mentioned above). Raveane further discloses:
wherein the at least one second layer is a convolutional layer that performs a convolutional operation on the second volume data.
Raveane in at least FIG. 4 discloses inputting image data in YUV color space (i.e., second volume data) into a convolutional layer of convolutional stage 1 (i.e., at least one second layer) that performs convolutional operations on the image data.
Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.
Regarding claim 8:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 5 (as mentioned above). Raveane further discloses:
wherein second volume data that is input to the at least one second layer includes first through N depths, where N is a natural number greater than or equal to two, 
wherein only M depths among the first through N depths are computed, where M is a natural number greater than or equal to two and less than or equal to N, and wherein any two depths among the M depths are arranged spaced apart from each other.
Raveane in at least FIG. 4 discloses image data in YUV color space (i.e., second volume data that includes first through N depths) that is input to a convolutional layer of convolutional stage 1 (i.e., at least one second layer), and computing the Y, U, and V channels of the image data (i.e., M depths among the first through N depths are computed) such that the Y, U, and V input channels are spaced apart (i.e., wherein any two depths among the M depths are arranged spaced apart from each other).

Regarding claim 9:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 8 (as mentioned above). Raveane further discloses:
wherein the M depths are selected based on at least one of a predetermined interval, a predetermined number of times and a predetermined ratio.
Raveane in at least FIG. 4 discloses computing the image data (i.e., wherein the M depths are selected) based on a predetermined interval of the Y, U, and V input channels.
Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.
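For illustration only, and not as part of the cited disclosure, selecting M depths that are spaced apart from each other based on a predetermined interval may be sketched as follows. The function name is hypothetical; the sketch assumes zero-based depth indices and that selection always begins at the first depth.

```python
from typing import List


def select_spaced_depth_indices(n: int, interval: int) -> List[int]:
    """Return indices of depths taken every `interval` depths out of n depths.

    An interval of at least 2 keeps any two selected depths non-adjacent,
    and an interval below n ensures at least two depths are selected.
    """
    if not (2 <= interval < n):
        raise ValueError("interval must satisfy 2 <= interval < N")
    return list(range(0, n, interval))
```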
Regarding claim 10:
Hara in view of Teerapittayanon and Raveane teaches the method of claim 3 (as mentioned above). Raveane further discloses:
wherein the maximum value is selected by tracking X major characteristic values among the plurality of characteristic values, where X is a natural number.
Raveane in at least para. [0034] discloses “The first convolutional layer [ 19 ] extracts a high level set of features through alternating convolutional and max-pooling layers.” That is, the max-pooling layer (i.e., first layer) selects a maximum value from the output of a convolutional layer (i.e., by tracking X major characteristic values among a plurality of characteristic values).
Hara, Teerapittayanon, and Raveane are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a max-pooling layer that can select a maximum value from the output of a convolutional layer. One of ordinary skill in the arts would have been motivated to make this modification to extract higher level and lower level features that can be used to classify the input.
Claims 11-12:
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Hara and Teerapittayanon as applied to claim 1 above, and further in view of Lockett (US 10108902).
Regarding claim 11:
Hara in view of Teerapittayanon teaches the method of claim 1 (as mentioned above). Hara in view of Teerapittayanon does not explicitly disclose:
wherein the first layer is an average pooling layer that obtains an average value of a plurality of characteristic values, and 
wherein the plurality of characteristic values are included in a predetermined region of first volume data that is input to the first layer.
However, Lockett discloses:
wherein the first layer is an average pooling layer that obtains an average value of a plurality of characteristic values, and 
wherein the plurality of characteristic values are included in a predetermined region of first volume data that is input to the first layer.
Lockett in at least FIG. 10 and col. 16 lines 30-37 discloses “Text input 1014 is passed through two convolutional layers (1013 and 1029) in parallel, the output of layer 1029 is normalized to, for example, a fixed-sized matrix, at each pooling region (e.g., at each region of a text line) 1027 to serve as a probabilistic attention gate. Attention pooling layer 1011 performs average regional pooling weighted according to attention gates 1037.” That is, the attention pooling layer 1011 (i.e., the first layer) performs average pooling of characteristic values determined by convolutional layers 1013 and 1029 (i.e., is an average pooling layer that obtains an average value of a plurality of characteristic values), and the pooling is regional pooling weighted according to attention gates 1037 (i.e., wherein the plurality of characteristic values are included in a predetermined region of first volume data that is input to the first layer).
Hara, Teerapittayanon, and Lockett are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate an attention pooling layer that performs average regional pooling weighted according to attention values. One of ordinary skill in the arts would have been motivated to make this modification to integrate feedback during training periods so that the neural network can be rapidly trained (Lockett in at least Background and col. 16 lines 53-57). 
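For illustration only, and not as part of the cited disclosure, average regional pooling weighted according to gate values may be sketched as follows. The function name is hypothetical; the sketch assumes one non-negative gate weight per characteristic value in the region, so that uniform gates reduce to a plain average.

```python
from typing import Sequence


def weighted_average_pool(values: Sequence[float], gates: Sequence[float]) -> float:
    """Return the gate-weighted average of the values in one pooling region."""
    if len(values) != len(gates):
        raise ValueError("values and gates must have the same length")
    total_weight = sum(gates)
    if total_weight == 0:
        raise ValueError("at least one gate weight must be non-zero")
    return sum(v * g for v, g in zip(values, gates)) / total_weight
```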
Regarding claim 12:
Hara in view of Teerapittayanon and Lockett teaches the method of claim 11 (as mentioned above). Lockett further discloses:
wherein the average value is obtained by selecting Y flows among a plurality of flows used for obtaining the average value, where Y is a natural number.

Hara, Teerapittayanon, and Lockett are analogous art to the claimed invention because they are directed to deep neural networks for classifying image data. It would have been obvious to one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate an attention pooling layer that performs average regional pooling weighted according to attention values. One of ordinary skill in the arts would have been motivated to make this modification to integrate feedback during training periods so that the neural network can be rapidly trained (Lockett in at least Background and col. 16 lines 53-57). 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROHIT K. KRISHNA whose telephone number is (571)272-0924. The examiner can normally be reached M-F 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571)272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.



/ROHIT K. KRISHNA/               Examiner, Art Unit 2125          


/KAMRAN AFSHAR/               Supervisory Patent Examiner, Art Unit 2125