DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application, filed on 04/17/2020, is a 371 of PCT/EP2018/077995 (filed on 10/15/2018), and claims priority to Application No. 10 2017 218 851.0 (filed in Germany on 10/23/2017). 
This action is in response to preliminary amendments filed on 04/17/2020. In the preliminary amendments, claims 1-14 are cancelled and claims 15-27 are added. Claims 15-27 are pending and have been examined. 

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
Information disclosure statements (IDS) were submitted on 04/17/2020, 08/03/2021, and 03/10/2022.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 16-22 are objected to because of the following informalities: 
Claim 16 does not end with a period. See MPEP 608.01(m). Claims 17-22 are dependent on claim 16 and are objected to based on the same rationale. 
Claim 18 recites “the basis of” in line 5; it should be “a basis of”.
Claim 19 recites “baesd” in line 4; it should be “based”.
Claim 22 recites “the adapted posterior distribution function” in line 3; it should be “an adapted posterior distribution function”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
Claim 27:
A device configured to create a deep neural network, the deep neural network including a plurality of layers and connections having weights, and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values,
the device configured to: provide at least one training input variable for the deep neural network; ascertain a variable characterizing a cost function, the variable characterizing the cost function including a first variable, which characterizes a deviation of an output variable of the deep neural network ascertained as a function of the provided training input variable relative to a predefinable setpoint output variable, and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values; train the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network, at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function; and map values of the weights on one discrete value each contained in the predefinable list.
The Specification provides the following description with respect to “device” (paragraph numbers are based on PGPUB US 20200342315 A1):
[0025]: “In a further aspect, the present invention provides an example device, which is configured to carry out each step of one of the methods.”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 27 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Claim limitations in claim 27 (see Claim Interpretation section) invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The Specification in paragraph [0025] merely identifies that an “example device” would carry out the steps of a method as disclosed, but does not provide any description of the corresponding structure of the claimed “device” in claim 27. Therefore, claim 27 is rejected under 35 U.S.C. 112(a) for lack of written description. 

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 15-27 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim limitations in claim 27 (see Claim Interpretation section) invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The Specification in paragraph [0025] merely identifies that an “example device” would carry out the steps of a method as disclosed, but does not provide any description of the corresponding structure of the claimed “device” in claim 27. Therefore, claim 27 is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
For examination purposes, the “device” of claim 27 has been interpreted as a computer with a processor.

Claim 15 recites “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” (emphasis added). This limitation lacks clarity because neither the claim nor the Specification describes what would be considered “in such a way”. Therefore, one of ordinary skill in the art would not be able to ascertain the metes and bounds of “in such a way”. For examination purposes, “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” has been interpreted as “training the deep neural network to detect an object as a function of the training input variable of the deep neural network”.
Claim 26 recites “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” (emphasis added). This limitation lacks clarity because neither the claim nor the Specification describes what would be considered “in such a way”. Therefore, one of ordinary skill in the art would not be able to ascertain the metes and bounds of “in such a way”. For examination purposes, “training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” has been interpreted as “training the deep neural network to detect an object as a function of the training input variable of the deep neural network”.
Claim 27 recites “train the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” (emphasis added). This limitation lacks clarity because neither the claim nor the Specification describes what would be considered “in such a way”. Therefore, one of ordinary skill in the art would not be able to ascertain the metes and bounds of “in such a way”. For examination purposes, “train the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network” has been interpreted as “train the deep neural network to detect an object as a function of the training input variable of the deep neural network”.
Claim 17 recites the limitation "the neural network" in line 2.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the neural network" has been interpreted as "the deep neural network".
Claim 18 recites the limitation "the penalization function" in line 1.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the penalization function" has been interpreted as "a penalization function".
Claim 18 recites the limitation “the respective predefinable discrete value” in line 6.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the respective predefinable discrete value” has been interpreted as “a respective predefinable discrete value”.
Claim 19 recites the limitation "the positions" in line 3.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the positions" has been interpreted as "a plurality of positions".
Claim 20 recites the limitation "the respective weighting functions" in line 2.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the respective weighting functions" has been interpreted as "the weighting function".
Claim 22 recites the limitation "the weight" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the weight" has been interpreted as "the weights".
Claim 25 recites the limitation "the detected object" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the detected object" has been interpreted as "a detected object".
Each of dependent claims 16-25 is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 15 and 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2017/0083772 A1) in view of Doshi et al. (“Deep Learning Neural Networks Optimization using Hardware Cost Penalty”) and further in view of Prokhorov et al. (US 2018/0074493 A1).
Regarding Claim 15,
Kim et al. teaches A method for creating a deep neural network, the deep neural network including a plurality of layers and...weights (Fig. 3 and pg. 1 [0012]: “The deep neural network-based model may include any one or any combination of a convolutional neural network (CNN) model that uses spatial information and a recurrent deep neural network (RDNN) model that uses time information” teach creating a deep neural network-based model with a plurality of layers; pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teaches the deep neural network-based model has weights),
...the method comprising the following steps: providing at least one training input variable for the deep neural network (Fig. 2 and pg. 4 [0076]: “The adjacent pixel setter 210 may set a neighboring pixel pair consisting of a first pixel and a second pixel adjacent to the first pixel, with respect to each pixel in an image frame” teach the adjacent pixel setter 210 provides the trainer 230 (which trains the deep neural network-based recognition model) with pixel data, which corresponds to providing at least one training input variable for the deep neural network; also see Fig. 3);
ascertaining a variable characterizing a cost function, the variable characterizing the cost function including a first variable, which characterizes a deviation of an output variable of the deep neural network ascertained as a function of the provided training input variable relative to a predefinable setpoint output variable (pg. 4-5 [0078]: “Once each pixel in the image frame has been labeled, the cost function calculator 220 may calculate a cost function using the difference in entropy between the first pixel and the neighboring pixel pair. In an example, the cost function may indicate a difference between a label of the first pixel and a ground truth label. The ground truth label is data that is related to the actual label of each pixel that is to be labeled using the deep neural network-based model; the ground truth label serves as a standard to gauge the accuracy of labeling. The deep neural network-based model produces a probability density function to be used to label each pixel, and labels each pixel with a class of the highest probability. The cost function calculator 220 calculates a difference between the label of the first pixel and the ground truth label to obtain an index that represents the accuracy of the label of the first pixel that is chosen by the pixel labeler” teaches a variable characterizing the cost function includes a first variable which characterizes a difference (corresponds to deviation) of the label (corresponds to output variable) generated by the deep neural network-based model ascertained as a function of the provided input including the first pixel (corresponds to provided training input variable) relative to a ground truth label (corresponds to predefinable setpoint output variable)),...
training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network (pg. 1 [0017]: “The object recognition apparatus may include a memory configured to store instructions, and wherein the processor may be configured to execute the instructions to set the neighboring pixel pairs in the image frame, the each neighboring pixel pair including the first pixel and the one or more second pixels adjacent to the first pixel, label the first pixel using the deep neural network-based model based on the probability density function value of the neighboring pixel pairs, and recognize the object based on the labeled first pixel” and pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teach training the deep neural network-based model to detect an object as a function of the training input variable).
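For illustration only, the "first variable" discussed above (a deviation of the network output from a ground truth label) may be sketched as follows. The sketch assumes a mean cross-entropy over per-pixel class probabilities; the names first_variable and label_pixels are hypothetical, and this is not the cost function of Kim et al., which additionally uses entropy differences between neighboring pixel pairs that are not reproduced here.

```python
import numpy as np


def first_variable(probs, truth):
    """Mean cross-entropy between predicted class probabilities and labels.

    probs: array of shape (n_pixels, n_classes), each row summing to 1.
    truth: integer array of shape (n_pixels,) holding ground truth classes.
    Cross-entropy is an assumed choice of deviation measure.
    """
    probs = np.asarray(probs, dtype=float)
    truth = np.asarray(truth, dtype=int)
    eps = 1e-12  # guards against log(0)
    # Probability the model assigned to the correct class of each pixel.
    picked = probs[np.arange(len(truth)), truth]
    return float(-np.mean(np.log(picked + eps)))


def label_pixels(probs):
    """Label each pixel with the class of highest probability."""
    return np.argmax(np.asarray(probs), axis=1)


# Toy data: three pixels, two classes (values are made up).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
truth = np.array([0, 1, 1])

cost = first_variable(probs, truth)
labels = label_pixels(probs)
print(cost, labels)
```

In this sketch, training "in such a way that the cost function can be minimized" would correspond to adjusting the weights that produce probs so that cost decreases; the third pixel is mislabeled and contributes most of the cost.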
Kim et al. does not appear to explicitly teach and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values,...and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values;...at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function; and mapping values of the weights on one discrete value each contained in the predefinable list.
However, Doshi et al. teaches and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches the weights in the created deep neural network assume predefinable discrete values from a given range of values (corresponds to predefinable list of discrete values)),...
and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values...at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function (pg. 1954 last full paragraph: “In this paper, we propose an approach to simultaneously minimize complexity in terms of total bit depths and maximize the accuracy of fixed-point DNNs. A bit depth penalty term is proposed to be incorporated into the cost function during the training step of DNNs. The new bit penalty encourages lower bit depths, in addition to lower magnitudes of weights. Specifically, the training step adjusts weights near bit depth boundaries to take on a lower bit depth values” and pg. 1955 Section III C: 
[Image media_image1.png: excerpt of Doshi et al., pg. 1955 Section III C; not reproduced in this text]
 teaches a cost function with a bit depth penalty term (corresponds to penalization variable) that characterizes the deviation of a weight value represented in bits (for example, a weight near a bit depth boundary) from a weight value represented at a lower bit depth, wherein the weight values are adapted during the training of the deep neural network as a function of the variable characterizing the cost function); 
and mapping values of the weights on one discrete value each contained in the predefinable list (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches constraining (corresponds to mapping) the weights on one discrete value each contained in a given range of values (corresponds to values in predefinable list of discrete values)).
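For illustration only, the penalization variable and the mapping step recited in claim 15 may be sketched as follows. The squared distance to the nearest discrete value, the 95th percentile of the absolute weights as the magnitude r, and all function names are assumptions made for this sketch; it is not the bit depth penalty of Doshi et al., whose formula is not reproduced in this action.

```python
import numpy as np


def linear_levels(weights, bits):
    """Evenly spaced discrete values over [-r, r].

    r is taken as the 95th percentile of |w| (an assumed reading of
    "captures approximately the central 95% of values").
    """
    w = np.asarray(weights, dtype=float).ravel()
    r = float(np.percentile(np.abs(w), 95))
    return np.linspace(-r, r, 2 ** bits)


def penalization(weights, levels):
    """Sum of squared distances from each weight to its nearest level."""
    w = np.asarray(weights, dtype=float).ravel()
    lv = np.asarray(levels, dtype=float).ravel()
    dist = np.abs(w[:, None] - lv[None, :])  # shape (n_weights, n_levels)
    return float(np.sum(np.min(dist, axis=1) ** 2))


def map_to_levels(weights, levels):
    """Map every weight onto the nearest value in the predefined list."""
    w = np.asarray(weights, dtype=float).ravel()
    lv = np.asarray(levels, dtype=float).ravel()
    idx = np.argmin(np.abs(w[:, None] - lv[None, :]), axis=1)
    return lv[idx]


def total_cost(first_variable, weights, levels, strength=0.1):
    """Cost = deviation term + strength * penalization term (assumed form)."""
    return first_variable + strength * penalization(weights, levels)


levels = np.array([-1.0, 0.0, 1.0])          # predefinable list (made up)
weights = np.array([-0.9, 0.1, 0.45, 1.3])   # trained weights (made up)

pen = penalization(weights, levels)
mapped = map_to_levels(weights, levels)      # values -1, 0, 0, 1
print(pen, mapped, total_cost(0.4, weights, levels))
```

Under these assumptions the penalization term vanishes once every weight sits on a listed value, so minimizing the total cost during training pulls the weights toward the list before the final mapping is applied.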
Kim et al. and Doshi et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Doshi et al. into the disclosed invention of Kim et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “an effective method that modified the optimization procedure during the fine-tuning of the DNN training. Experimental results showed that having incorporated the proposed bit penalty term during the training of DNNs, the number of bits can be obviously reduced with limited accuracy drop. The proposed method can be used to develop the tool to analyze and adjust the hardware complexity of DNNs” (Doshi et al. pg. 1957 Section V).
Kim et al. in view of Doshi et al. does not appear to explicitly teach deep neural network including...connections having weights.
Prokhorov et al. teaches deep neural network including...connections having weights (pg. 6 [0080]: “FIG. 4 is an illustration of applying training map data 318 to train a deep neural network of the autonomous decision device 304. The autonomous decision device 304 may include an input layer 412, a batch normalization layer 414, a fully-connected hidden layer 416, and an output layer 481” and pg. 6 [0090]: “The fully-connected hidden layer 416 of the autonomous decision device 304 receives the output of the batch normalization layer 414. The term "fully-connected" relates to a user of convolutions over an input generate an output, providing a local connection, where each region of an input is connected to a neuron of an output. Each layer applies different filters (as may also be referred to as parameters). The fully-connected hidden layer 416 operates to perform transformations that are a function of activations based on the input, but also of filter (that is, weights and biases of the layer's neurons)” teach a deep neural network has neurons wherein inputs are connected to neurons (correspond to deep neural network with connections) and the connections have weights).
Kim et al., Doshi et al., and Prokhorov et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Prokhorov et al. into the disclosed invention of Kim et al. in view of Doshi et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage a deep neural network for object detection because “the deep neural network presents a knowledge-base of processed information data and/or experience data based on a learning based on previous driving experience data. This information data and/or experience data may then be drawn on to produce desired vehicle operational data based on a map layers and/or data input relating to a travel route of a vehicle” (Prokhorov et al. pg. 2 [0021]).
Regarding Claim 24,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. teaches the method as recited in claim 15.
Kim et al. further teaches further comprising the following steps: after the training of the deep neural network, ascertaining an input variable of the deep neural network; and detecting an object using the trained deep neural network as a function of the ascertained input variable (pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” and pg. 5 [0088]: “If the cost function is 0 or merely a certain difference lying within the probability density interval, it can be deemed that the label of a particular pixel chosen by the pixel labeler 120 is the same as the ground truth label” teach labeling the input pixel (corresponds to ascertaining input variable) after training the deep neural network; pg. 1 [0017]: “The object recognition apparatus may include a memory configured to store instructions, and wherein the processor may be configured to execute the instructions to set the neighboring pixel pairs in the image frame, the each neighboring pixel pair including the first pixel and the one or more second pixels adjacent to the first pixel, label the first pixel using the deep neural network-based model based on the probability density function value of the neighboring pixel pairs, and recognize the object based on the labeled first pixel” teaches detecting an object using trained deep neural network as a function of the labeled first pixel (corresponds to ascertained input variable)).
Regarding Claim 25,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. teaches the method as recited in claim 24.
Prokhorov et al. further teaches further comprising the following step: activating an at least semiautonomous machine as a function of the detected object (pg. 7 [0091]: “For example, as may be appreciated by one of skill in the art, a transformation may operate to detect edges, which in turn bases a decision on identifying an obstacle of the training map data 318, and outputs a corresponding desired vehicle operational data 310” teaches the desired vehicle operation data is generated as a function of detected objects; Fig. 9 Step 906: “produce vehicle actuator control data from the desired vehicle operational data” and Fig. 9 Step 908: “transmit the vehicle actuator control data to effect the autonomous vehicle control” and pg. 8 [0111]: “Because the vehicle target data 604 may not yet be within a segment of the driving map data 302, an indicator 616 is placed on a boundary or edge of the driving map data 308 indicating a desired direction of autonomous and/or semiautonomous travel by the vehicle 100” teach actuating autonomous vehicle control of a vehicle as a function of desired vehicle operation data, wherein the vehicle is one that can provide autonomous and/or semiautonomous travel (thus rendering the vehicle to be a semiautonomous machine)).
Kim et al., Doshi et al., and Prokhorov et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Prokhorov et al., into the disclosed invention of Kim et al. in view of Doshi et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage a deep neural network for object detection because “the deep neural network presents a knowledge-base of processed information data and/or experience data based on a learning based on previous driving experience data. This information data and/or experience data may then be drawn on to produce desired vehicle operational data based on a map layers and/or data input relating to a travel route of a vehicle” (Prokhorov et al. pg. 2 [0021]).
Regarding Claim 26,
Kim et al. teaches A non-transitory machine-readable memory element on which is stored a computer program for creating a deep neural network, the deep neural network including a plurality of layers..., the computer program, when executed by a computer, causing the computer to perform the following steps (Fig. 3 and pg. 1 [0012]: “The deep neural network-based model may include any one or any combination of a convolutional neural network (CNN) model that uses spatial information and a recurrent deep neural network (RDNN) model that uses time information” teach creating a deep neural network-based model with a plurality of layers; pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teaches the deep neural network-based model has weights; pg. 10 [0138] teaches non-transitory computer-readable storage media storing instructions or software that can be executed by computers):
providing at least one training input variable for the deep neural network (Fig. 2 and pg. 4 [0076]: “The adjacent pixel setter 210 may set a neighboring pixel pair consisting of a first pixel and a second pixel adjacent to the first pixel, with respect to each pixel in an image frame” teach the adjacent pixel setter 210 provides the trainer 230 (which trains the deep neural network-based recognition model) with pixel data, which corresponds to providing at least one training input variable for the deep neural network; also see Fig. 3);
ascertaining a variable characterizing a cost function, the variable characterizing the cost function including a first variable, which characterizes a deviation of an output variable of the deep neural network ascertained as a function of the provided training input variable relative to a predefinable setpoint output variable (pg. 4-5 [0078]: “Once each pixel in the image frame has been labeled, the cost function calculator 220 may calculate a cost function using the difference in entropy between the first pixel and the neighboring pixel pair. In an example, the cost function may indicate a difference between a label of the first pixel and a ground truth label. The ground truth label is data that is related to the actual label of each pixel that is to be labeled using the deep neural network-based model; the ground truth label serves as a standard to gauge the accuracy of labeling. The deep neural network-based model produces a probability density function to be used to label each pixel, and labels each pixel with a class of the highest probability. The cost function calculator 220 calculates a difference between the label of the first pixel and the ground truth label to obtain an index that represents the accuracy of the label of the first pixel that is chosen by the pixel labeler” teaches a variable characterizing the cost function includes a first variable which characterizes a difference (corresponds to deviation) of the label (corresponds to output variable) generated by the deep neural network-based model ascertained as a function of the provided input including the first pixel (corresponds to provided training input variable) relative to a ground truth label (corresponds to predefinable setpoint output variable)),...
training the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network (pg. 1 [0017]: “The object recognition apparatus may include a memory configured to store instructions, and wherein the processor may be configured to execute the instructions to set the neighboring pixel pairs in the image frame, the each neighboring pixel pair including the first pixel and the one or more second pixels adjacent to the first pixel, label the first pixel using the deep neural network-based model based on the probability density function value of the neighboring pixel pairs, and recognize the object based on the labeled first pixel” and pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teach training the deep neural network-based model to detect an object as a function of the training input variable).
Kim et al. does not appear to explicitly teach and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values,...and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values;... at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function; and mapping values of the weights on one discrete value each contained in the predefinable list.
However, Doshi et al. teaches and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches the weights in the created deep neural network assume predefinable discrete values from a given range of values (corresponds to predefinable list of discrete values)),...
and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values;... at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function (pg. 1954 last full paragraph: “In this paper, we propose an approach to simultaneously minimize complexity in terms of total bit depths and maximize the accuracy of fixed-point DNNs. A bit depth penalty term is proposed to be incorporated into the cost function during the training step of DNNs. The new bit penalty encourages lower bit depths, in addition to lower magnitudes of weights. Specifically, the training step adjusts weights near bit depth boundaries to take on a lower bit depth values” and pg. 1955 Section III C: 
[Image media_image1.png: excerpt from Doshi et al., pg. 1955, Section III C]
teaches a cost function with a bit depth penalty term (corresponds to penalization variable) that characterizes the deviation of a weight value represented in bits (for example, weights near bit depth boundaries) from a weight value represented at a lower bit depth, wherein the weight values are adapted during the training of the deep neural network as a function of the variable characterizing the cost function); 
and mapping values of the weights on one discrete value each contained in the predefinable list (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches constraining (corresponds to mapping) the weights on one discrete value each contained in a given range of values (corresponds to values in predefinable list of discrete values)).
Kim et al. and Doshi et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Doshi et al., into the disclosed invention of Kim et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “an effective method that modified the optimization procedure during the fine-tuning of the DNN training. Experimental results showed that having incorporated the proposed bit penalty term during the training of DNNs, the number of bits can be obviously reduced with limited accuracy drop. The proposed method can be used to develop the tool to analyze and adjust the hardware complexity of DNNs” (Doshi et al. pg. 1957 Section V).
Kim et al. in view of Doshi et al. does not appear to explicitly teach deep neural network including...connections having weights.
Prokhorov et al. teaches deep neural network including...connections having weights (pg. 6 [0080]: “FIG. 4 is an illustration of applying training map data 318 to train a deep neural network of the autonomous decision device 304. The autonomous decision device 304 may include an input layer 412, a batch normalization layer 414, a fully-connected hidden layer 416, and an output layer 481” and pg. 6 [0090]: “The fully-connected hidden layer 416 of the autonomous decision device 304 receives the output of the batch normalization layer 414. The term "fully-connected" relates to a user of convolutions over an input generate an output, providing a local connection, where each region of an input is connected to a neuron of an output. Each layer applies different filters (as may also be referred to as parameters). The fully-connected hidden layer 416 operates to perform transformations that are a function of activations based on the input, but also of filter (that is, weights and biases of the layer's neurons)” teach a deep neural network has neurons wherein inputs are connected to neurons (correspond to deep neural network with connections) and the connections have weights).
Kim et al., Doshi et al., and Prokhorov et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Prokhorov et al., into the disclosed invention of Kim et al. in view of Doshi et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage a deep neural network for object detection because “the deep neural network presents a knowledge-base of processed information data and/or experience data based on a learning based on previous driving experience data. This information data and/or experience data may then be drawn on to produce desired vehicle operational data based on a map layers and/or data input relating to a travel route of a vehicle” (Prokhorov et al. pg. 2 [0021]).
Regarding Claim 27,
Kim et al. teaches A device configured to create a deep neural network, the deep neural network including a plurality of layers...the device configured to (Fig. 3 and pg. 1 [0012]: “The deep neural network-based model may include any one or any combination of a convolutional neural network (CNN) model that uses spatial information and a recurrent deep neural network (RDNN) model that uses time information” teach creating a deep neural network-based model with a plurality of layers; pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teaches the deep neural network-based model has weights; pg. 9 [0136] teaches implementation by a computer that includes a digital signal processor or microprocessor):
provide at least one training input variable for the deep neural network (Fig. 2 and pg. 4 [0076]: “The adjacent pixel setter 210 may set a neighboring pixel pair consisting of a first pixel and a second pixel adjacent to the first pixel, with respect to each pixel in an image frame” teach the adjacent pixel setter 210 provides the trainer 230 (which trains the deep neural network-based recognition model) with pixel data, which corresponds to providing at least one training input variable for the deep neural network; also see Fig. 3);
ascertain a variable characterizing a cost function, the variable characterizing the cost function including a first variable, which characterizes a deviation of an output variable of the deep neural network ascertained as a function of the provided training input variable relative to a predefinable setpoint output variable (pg. 4-5 [0078]: “Once each pixel in the image frame has been labeled, the cost function calculator 220 may calculate a cost function using the difference in entropy between the first pixel and the neighboring pixel pair. In an example, the cost function may indicate a difference between a label of the first pixel and a ground truth label. The ground truth label is data that is related to the actual label of each pixel that is to be labeled using the deep neural network-based model; the ground truth label serves as a standard to gauge the accuracy of labeling. The deep neural network-based model produces a probability density function to be used to label each pixel, and labels each pixel with a class of the highest probability. The cost function calculator 220 calculates a difference between the label of the first pixel and the ground truth label to obtain an index that represents the accuracy of the label of the first pixel that is chosen by the pixel labeler” teaches a variable characterizing the cost function includes a first variable which characterizes a difference (corresponds to deviation) of the label (corresponds to output variable) generated by the deep neural network-based model ascertained as a function of the provided input including the first pixel (corresponds to provided training input variable) relative to a ground truth label (corresponds to predefinable setpoint output variable)),...
train the deep neural network in such a way that the deep neural network detects an object as a function of the training input variable of the deep neural network (pg. 1 [0017]: “The object recognition apparatus may include a memory configured to store instructions, and wherein the processor may be configured to execute the instructions to set the neighboring pixel pairs in the image frame, the each neighboring pixel pair including the first pixel and the one or more second pixels adjacent to the first pixel, label the first pixel using the deep neural network-based model based on the probability density function value of the neighboring pixel pairs, and recognize the object based on the labeled first pixel” and pg. 5 [0087]: “The trainer 230 trains the deep neural network-based recognition model based on the computed cost function. According to one example, the trainer 230 trains the deep neural network-based model in such a way that the cost function can be minimized. The trainer 230 may set parameters that minimize the cost function, and train the deep neural network-based model using the set parameters. In this case, the trainer 230 may designate the parameters as weights of the model” teach training the deep neural network-based model to detect an object as a function of the training input variable).
Kim et al. does not appear to explicitly teach and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values,...and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values;...at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function; and map values of the weights on one discrete value each contained in the predefinable list.
However, Doshi et al. teaches and the weights in the created deep neural network being able to assume only predefinable discrete values from a predefinable list of discrete values (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches the weights in the created deep neural network assume predefinable discrete values from a given range of values (corresponds to predefinable list of discrete values)),...
and the variable characterizing the cost function further including at least one penalization variable which characterizes a deviation of a value of one of the weights from at least one of at least two of the predefinable discrete values;...at least one value of one of the weights being adapted during the training of the deep neural network as a function of the variable characterizing the cost function (pg. 1954 last full paragraph: “In this paper, we propose an approach to simultaneously minimize complexity in terms of total bit depths and maximize the accuracy of fixed-point DNNs. A bit depth penalty term is proposed to be incorporated into the cost function during the training step of DNNs. The new bit penalty encourages lower bit depths, in addition to lower magnitudes of weights. Specifically, the training step adjusts weights near bit depth boundaries to take on a lower bit depth values” and pg. 1955 Section III C: 
[Image media_image1.png: excerpt from Doshi et al., pg. 1955, Section III C]
teaches a cost function with a bit depth penalty term (corresponds to penalization variable) that characterizes the deviation of a weight value represented in bits (for example, weights near bit depth boundaries) from a weight value represented at a lower bit depth, wherein the weight values are adapted during the training of the deep neural network as a function of the variable characterizing the cost function); 
and map values of the weights on one discrete value each contained in the predefinable list (pg. 1955 Section III B: “In order to convert the floating-point DNN to a fixed-point DNN model, we calculated the range of values of parameters in the DNN. We firstly train the DNN model in floating-point and record the range of possible values for each target parameter being constrained. Then, we determine a magnitude r which captures approximately the central 95% of values (to avoid outliers). This entails constraining certain DNN model parameters to a given range [−r, r] and linearly quantizing all intermediate values...The fixed-point model constrains weights, biases, layer-outputs, back-propagation error, weight updates, and bias updates. Furthermore, the sigmoid function often used in the training of the hidden-layers has been quantized and represented with a lookup table to optimize execution speed” teaches constraining (corresponds to mapping) the weights on one discrete value each contained in a given range of values (corresponds to values in predefinable list of discrete values)).
Kim et al. and Doshi et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Doshi et al., into the disclosed invention of Kim et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “an effective method that modified the optimization procedure during the fine-tuning of the DNN training. Experimental results showed that having incorporated the proposed bit penalty term during the training of DNNs, the number of bits can be obviously reduced with limited accuracy drop. The proposed method can be used to develop the tool to analyze and adjust the hardware complexity of DNNs” (Doshi et al. pg. 1957 Section V).
Kim et al. in view of Doshi et al. does not appear to explicitly teach deep neural network including...connections having weights.
Prokhorov et al. teaches deep neural network including...connections having weights (pg. 6 [0080]: “FIG. 4 is an illustration of applying training map data 318 to train a deep neural network of the autonomous decision device 304. The autonomous decision device 304 may include an input layer 412, a batch normalization layer 414, a fully-connected hidden layer 416, and an output layer 481” and pg. 6 [0090]: “The fully-connected hidden layer 416 of the autonomous decision device 304 receives the output of the batch normalization layer 414. The term "fully-connected" relates to a user of convolutions over an input generate an output, providing a local connection, where each region of an input is connected to a neuron of an output. Each layer applies different filters (as may also be referred to as parameters). The fully-connected hidden layer 416 operates to perform transformations that are a function of activations based on the input, but also of filter (that is, weights and biases of the layer's neurons)” teach a deep neural network has neurons wherein inputs are connected to neurons (correspond to deep neural network with connections) and the connections have weights).
Kim et al., Doshi et al., and Prokhorov et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Prokhorov et al., into the disclosed invention of Kim et al. in view of Doshi et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage a deep neural network for object detection because “the deep neural network presents a knowledge-base of processed information data and/or experience data based on a learning based on previous driving experience data. This information data and/or experience data may then be drawn on to produce desired vehicle operational data based on a map layers and/or data input relating to a travel route of a vehicle” (Prokhorov et al. pg. 2 [0021]).
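For illustration only, the combination of limitations mapped above for claims 26 and 27 (a cost function with a first deviation variable plus a penalization variable, followed by mapping each weight onto a value from a predefinable list) can be sketched in a few lines of Python. The sketch assumes a squared-error first variable and a squared-distance penalization variable; the function names, the penalty form, and the `strength` factor are hypothetical and are not taken from the claims, Kim et al., Doshi et al., or Prokhorov et al. In particular, it does not reproduce the bit depth penalty of Doshi et al.

```python
# Illustrative sketch only. All names and the penalty form are assumptions.

def nearest_discrete(weight, levels):
    """Return the entry of `levels` that is closest to `weight`."""
    return min(levels, key=lambda v: abs(weight - v))

def penalization(weights, levels):
    """Sum of squared deviations of each weight from its nearest level."""
    return sum((w - nearest_discrete(w, levels)) ** 2 for w in weights)

def cost(outputs, targets, weights, levels, strength=0.1):
    """Output deviation from the setpoints plus a weighted weight penalty."""
    deviation = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    return deviation + strength * penalization(weights, levels)

def map_to_levels(weights, levels):
    """Map each weight onto one discrete value contained in the list."""
    return [nearest_discrete(w, levels) for w in weights]

levels = [-1.0, 0.0, 1.0]       # hypothetical predefinable list
weights = [0.9, -0.2, 0.4]      # hypothetical weights after training
mapped = map_to_levels(weights, levels)
total = cost([0.5], [1.0], weights, levels)
```

In this sketch the penalization term is zero once every weight coincides with an entry of `levels`, so minimizing `cost` pulls weights toward the list before the final mapping step.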

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2017/0083772 A1) in view of Doshi et al. (“Deep Learning Neural Networks Optimization using Hardware Cost Penalty”) in view of Prokhorov et al. (US 2018/0074493 A1) and further in view of Li et al. (“Ternary weight networks”).
Regarding Claim 23,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. teaches the method as recited in claim 15.
Kim et al. in view of Doshi et al. in view of Prokhorov et al. does not appear to explicitly teach wherein one of the at least two of the predefinable discrete values is a value of "0".
However, Li et al. teaches wherein one of the at least two of the predefinable discrete values is a value of "0" (pg. 2 Section 2.2, Equation (3) teaches weights can be set to one of three predefinable discrete values (corresponds to at least two of the predefinable discrete values), including +1, 0, or -1).
Kim et al., Doshi et al., Prokhorov et al., and Li et al. are analogous art to the claimed invention because they are directed to the implementation of deep neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s), as taught by Li et al., into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “an approximated solution with a simple but accurate ternary function” for ternary weight networks (TWNs) optimization wherein “[t]he proposed TWNs find a balance between the high accuracy of TWNs and the high model compression rate as well as potentially low computational requirements of BPWNs” (Li et al. pg. 4 Section 4).
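For illustration only, a list of discrete values that includes "0" can be sketched as a symmetric threshold mapping onto +1, 0, and -1. The function name `ternarize` and the threshold value below are assumptions chosen for the example and are not taken from Li et al. or from the claims.

```python
# Illustrative sketch only. The threshold value is a hypothetical choice.

def ternarize(weights, threshold):
    """Map each weight to +1, 0, or -1 using a symmetric threshold."""
    result = []
    for w in weights:
        if w > threshold:
            result.append(1)
        elif w < -threshold:
            result.append(-1)
        else:
            result.append(0)
    return result

ternary = ternarize([0.8, -0.05, -0.6, 0.1], threshold=0.3)
```

Weights whose magnitude does not exceed the threshold are mapped to 0, which is the discrete value at issue in claim 23.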

Claims 16-17 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2017/0083772 A1) in view of Doshi et al. (“Deep Learning Neural Networks Optimization using Hardware Cost Penalty”) in view of Prokhorov et al. (US 2018/0074493 A1) and further in view of Kingma et al. (“Variational Dropout and the Local Reparameterization Trick”).
Regarding Claim 16,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. teaches the method as recited in claim 15.
Kim et al. in view of Doshi et al. in view of Prokhorov et al. does not appear to explicitly teach wherein the penalization variable characterizes a deviation of a posterior distribution function of one of the weights from a prior distribution function of the predefinable discrete values of the one of the weights.
However, Kingma et al. teaches wherein the penalization variable characterizes a deviation of a posterior distribution function of one of the weights from a prior distribution function of the predefinable discrete values of the one of the weights (pg. 2 Section 2: 
[Image media_image2.png: excerpt from Kingma et al., pg. 2, Section 2]
teaches $q_\phi(w)$, the approximation of posterior distribution of weights and $p(w)$, prior distribution of predefinable discrete values of weights; pg. 2 Section 2.1: “We’ll assume that the remaining term in the variational lower bound, $D_{KL}(q_\phi(w) \,\|\, p(w))$, can be computed deterministically, but otherwise it may be approximated similarly” teaches the KL-divergence penalty term $D_{KL}(q_\phi(w) \,\|\, p(w))$ (corresponds to penalization variable) characterizes a deviation between
                            
                                
                                    q
                                
                                
                                    φ
                                
                            
                            
                                
                                    w
                                
                            
                             
                        
                    and                         
                            p
                            
                                
                                    w
                                
                            
                        
                    )
Kim et al., Doshi et al., Prokhorov et al., and Kingma et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Kingma et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification because “[e]fficiency of posterior inference using stochastic gradient-based variational Bayes (SGVB) can often be significantly improved through a local reparameterization where global parameter uncertainty is translated into local uncertainty per datapoint” (Kingma et al. pg. 8 Section 6).
Regarding Claim 17,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. teaches the method as recited in claim 16.
Kingma et al. further teaches wherein the prior distribution function for a predefinable subset of the weights of the neural network is selected as a function of a topology of a part of the deep neural network associated with the predefinable subset (pg. 4 Section 3: “Dropout is a technique for regularization of neural network parameters, which works by adding multiplicative noise to the input of each layer of the neural network during optimization...Here, we re-interpret dropout with continuous noise as a variational method, and propose a generalization that we call variational dropout. In developing variational dropout we provide a firm Bayesian justification for dropout training by deriving its implicit prior distribution and variational objective. This new interpretation allows us to propose several useful extensions to dropout, such as a principled way of making the normally fixed dropout rates p adaptive to the data” and pg. 7 third to fourth paragraph: “We choose the same architecture as [20]: a fully connected neural network with 3 hidden layers and rectified linear units (ReLUs)...To do this we train the neural network described above for either 10 epochs (test error 3%) or 100 epochs (test error 1.3%), using variational dropout with independent weight noise” teach the prior distribution for subset of weights of a neural network is based on the topology of neural network as affected by variational dropout).
Kim et al., Doshi et al., Prokhorov et al., and Kingma et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Kingma et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification because “[e]fficiency of posterior inference using stochastic gradient-based variational Bayes (SGVB) can often be significantly improved through a local reparameterization where global parameter uncertainty is translated into local uncertainty per datapoint” (Kingma et al. pg. 8 Section 6).
Regarding Claim 21,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. teaches the method as recited in claim 16.
Kingma et al. further teaches wherein the deviation of the posterior distribution function from the prior distribution function is ascertained based on an approximation of a Kullback-Leibler divergence between the posterior distribution function and the prior distribution function (pg. 2 Section 2: [image: media_image2.png] teaches $q_\phi(w)$, the approximation of posterior distribution and $p(w)$, the prior distribution; pg. 2 Section 2.1: “We’ll assume that the remaining term in the variational lower bound, $D_{KL}(q_\phi(w) \| p(w))$, can be computed deterministically, but otherwise it may be approximated similarly” teaches the Kullback-Leibler (KL)-divergence penalty term $D_{KL}(q_\phi(w) \| p(w))$ characterizes a deviation between $q_\phi(w)$ and $p(w)$; since the KL-divergence is determined based on an approximation of the posterior distribution, the KL-divergence can be considered an approximation).
Kim et al., Doshi et al., Prokhorov et al., and Kingma et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Kingma et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification because “[e]fficiency of posterior inference using stochastic gradient-based variational Bayes (SGVB) can often be significantly improved through a local reparameterization where global parameter uncertainty is translated into local uncertainty per datapoint” (Kingma et al. pg. 8 Section 6).
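For illustration of the KL-divergence relied upon in the rejections of claims 16 and 21, the following is a minimal sketch of how a deviation of a posterior distribution from a prior distribution over predefinable discrete weight values may be computed as a penalty. It is not taken from Kingma et al. or from the application. The function name kl_divergence, the three discrete values, and the probabilities are hypothetical and are assumed here only to show the form of the calculation for categorical distributions.

```python
import math


def kl_divergence(q, p):
    """Return KL(q || p) for two discrete distributions of equal length.

    q and p are sequences of probabilities over the same discrete values.
    Terms with q_i == 0 contribute nothing; if p_i == 0 while q_i > 0 the
    divergence is infinite.
    """
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue
        if pi == 0.0:
            return math.inf
        total += qi * math.log(qi / pi)
    return total


# Hypothetical example: one weight restricted to three discrete values.
# The prior is uniform over the values; the posterior is an assumed
# distribution concentrated on the middle value.
prior = [1 / 3, 1 / 3, 1 / 3]
posterior = [0.1, 0.8, 0.1]

# The deviation of the posterior from the prior, usable as a penalty term.
penalty = kl_divergence(posterior, prior)
```

The penalty is zero when the posterior equals the prior and grows as the posterior departs from it, which is the sense in which the term characterizes a deviation between the two distributions.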
Regarding Claim 22,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. teaches the method as recited in claim 16.
Kingma et al. further teaches wherein during the training of the deep neural network, the posterior distribution functions is adapted as a function of the cost function, the weight, which is characterized by the adapted posterior distribution function, being adapted as a function of the adapted posterior distribution function (pg. 2 Section 2: [image: media_image2.png] teaches $q_\phi(w)$, the approximation of posterior distribution of weights and $p(w)$, prior distribution of predefinable discrete values of weights; pg. 2 Section 2.1: “We’ll assume that the remaining term in the variational lower bound, $D_{KL}(q_\phi(w) \| p(w))$, can be computed deterministically, but otherwise it may be approximated similarly” teaches the posterior distribution $q_\phi(w)$ is adapted as a function of the KL-divergence penalty $D_{KL}(q_\phi(w) \| p(w))$ (corresponds to cost function) and the weight $w$ is adapted as a function of $q_\phi(w)$).
Kim et al., Doshi et al., Prokhorov et al., and Kingma et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Kingma et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification because “[e]fficiency of posterior inference using stochastic gradient-based variational Bayes (SGVB) can often be significantly improved through a local reparameterization where global parameter uncertainty is translated into local uncertainty per datapoint” (Kingma et al. pg. 8 Section 6).


Claims 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2017/0083772 A1) in view of Doshi et al. (“Deep Learning Neural Networks Optimization using Hardware Cost Penalty”) in view of Prokhorov et al. (US 2018/0074493 A1) in view of Kingma et al. (“Variational Dropout and the Local Reparameterization Trick”) and further in view of Li et al. (“Dropout Inference in Bayesian Neural Networks with Alpha-divergences”; hereinafter “Li-2”).
Regarding Claim 18,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. teaches the method as recited in claim 16.
Kim et al. further teaches wherein the penalization function characterizes a weighted summation of ascertained deviations (pg. 5 [0079]-[0080] teaches the cost function (corresponds to penalization function) defined by Equation 2, which characterizes a summation of ascertained KL-divergence (deviations) between i and j wherein the deviations are weighted by λ).
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. does not appear to explicitly teach ...one deviation each of the posterior distribution function of one of the weights relative to the prior distribution function being ascertained at one position each, which is assigned to one each of the predefinable discrete values, and the ascertained deviation being weighted on the basis of a weighting function, which is assigned to the respective predefinable discrete value.
However, Li-2 teaches ...one deviation each of the posterior distribution function of one of the weights relative to the prior distribution function being ascertained at one position each, which is assigned to one each of the predefinable discrete values, and the ascertained deviation being weighted on the basis of a weighting function, which is assigned to the respective predefinable discrete value (pg. 5 Section 4 second column: [image: media_image3.png] teaches the KL divergence (deviation) of the posterior distribution $q$ and prior distribution $p_0$ wherein each distribution contains multiple points with values (see pg. 3 first paragraph) wherein Equation (7) provides that the KL divergence (deviation) is weighted on the basis of weighting function [image: media_image4.png]).
Kim et al., Doshi et al., Prokhorov et al., Kingma et al., and Li-2 are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Li-2 into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage a technique that “often supersedes existing approximate inference techniques (even sparse Gaussian processes), and is easy to implement” (Li-2 pg. 9 Section 6).
Regarding Claim 20,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. in view of Li-2 teaches the method as recited in claim 18.
Li-2 further teaches wherein one of the ascertained deviations is weighted based on a predefinable value less a sum of the respective weighting functions (pg. 5 Section 4 second column: [image: media_image3.png] teaches the ascertained KL divergence (deviation) is weighted based on a predefinable value (const) less a sum of weighting functions [image: media_image4.png]).
Kim et al., Doshi et al., Prokhorov et al., Kingma et al., and Li-2 are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Li-2 into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage a technique that “often supersedes existing approximate inference techniques (even sparse Gaussian processes), and is easy to implement” (Li-2 pg. 9 Section 6).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2017/0083772 A1) in view of Doshi et al. (“Deep Learning Neural Networks Optimization using Hardware Cost Penalty”) in view of Prokhorov et al. (US 2018/0074493 A1) in view of Kingma et al. (“Variational Dropout and the Local Reparameterization Trick”) and further in view of Li et al. (“Dropout Inference in Bayesian Neural Networks with Alpha-divergences”; hereinafter “Li-2”) in view of Lin et al. (US 2016/0328646 A1).
Regarding Claim 19,
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. in view of Li-2 teaches the method as recited in claim 18.
Kingma et al. further teaches wherein the ascertained deviations are, in each case, an ascertained deviation of the posterior distribution function relative to a log uniform distribution function (pg. 2 Section 2: [image: media_image2.png] teaches $q_\phi(w)$, the approximation of posterior distribution of weights and $p(w)$, prior distribution of predefinable discrete values of weights; pg. 2 Section 2.1: “We’ll assume that the remaining term in the variational lower bound, $D_{KL}(q_\phi(w) \| p(w))$, can be computed deterministically, but otherwise it may be approximated similarly” teaches the KL-divergence penalty term $D_{KL}(q_\phi(w) \| p(w))$ characterizes a deviation between $q_\phi(w)$ and $p(w)$; pg. 5 Section 3.3: “we show that the only prior that meets this requirement is the scale invariant log-uniform prior” teaches the prior distribution is a log uniform distribution, therefore the deviation is of a posterior distribution relative to a log uniform distribution).
Kim et al., Doshi et al., Prokhorov et al., and Kingma et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Kingma et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al.
One of ordinary skill in the art would have been motivated to make this modification because “[e]fficiency of posterior inference using stochastic gradient-based variational Bayes (SGVB) can often be significantly improved through a local reparameterization where global parameter uncertainty is translated into local uncertainty per datapoint” (Kingma et al. pg. 8 Section 6).
Li-2 further teaches the ascertained deviation...and being weighted baesd on the weighting function, which is assigned to the respective predefinable discrete value (pg. 5 Section 4 second column: [image: media_image3.png] teaches the KL divergence (deviation) of the posterior distribution $q$ and prior distribution $p_0$ wherein each distribution contains multiple points with values (see pg. 3 first paragraph) wherein Equation (7) provides that the KL divergence (deviation) is weighted on the basis of weighting function [image: media_image4.png]).
Kim et al., Doshi et al., Prokhorov et al., Kingma et al., and Li-2 are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Li-2 into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage a technique that “often supersedes existing approximate inference techniques (even sparse Gaussian processes), and is easy to implement” (Li-2 pg. 9 Section 6).
Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. in view of Li-2 does not appear to explicitly teach the ascertained deviation being shifted to one of the positions, respectively, of one of the at least two predefinable discrete values.
However, Lin et al. teaches the ascertained deviation being shifted to one of the positions, respectively, of one of the at least two predefinable discrete values (Fig. 6A-6B and pg. 6 [0066]: “FIG. 6A illustrates an input distribution 600 for an exemplary deep convolutional network. In this example, the input distribution 600 includes a variance (a) and a mean value (μ). Aspects of the present disclosure are directed towards specifying a zero mean (μ=0) for the distributions of weights, biases and activation values, for example, as shown in FIG. 6B” and pg. 6 [0067]: “FIG. 6B illustrates a modified input distribution 650 for an exemplary deep convolutional network. In this configuration, the mean value (μ) is added to the standard deviation (variance (a)) to reduce computational overhead when determining the range of encoding” teach the ascertained deviation is being shifted to a position of a discrete value).
Kim et al., Doshi et al., Prokhorov et al., Kingma et al., Li-2, and Lin et al. are analogous art to the claimed invention because they are directed to the implementation of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Lin et al. into the disclosed invention of Kim et al. in view of Doshi et al. in view of Prokhorov et al. in view of Kingma et al. in view of Li-2.
One of ordinary skill in the art would have been motivated to make this modification to modify the distribution of weights in a neural network “to reduce computational overhead when determining the range of encoding” (Lin et al. pg. 6 [0067]).


Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Yoo et al. (US 2017/0140247 A1) teaches recognizing objects using a neural network, which is relevant to Fig. 1 of the present application. 



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/               Examiner, Art Unit 2125