Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The effective filing date is 3-13-18.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the 

Claims 1-8 are rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”) and in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008.


Ferguson discloses “1. A system for vehicle behavior prediction, the system comprising: an imaging device that captures images of a vehicle in traffic;” (see perception system 172 where the vehicle has camera, radar, sonar and LIDAR; see col. 8, line 61 to col. 9, line 5) (see col. 10, lines 1-19)
“a processing device including policy stored in a memory of the processing device in communication with the imaging device to stochastically model future behavior of the vehicle based on the captured images;” (this model is identified in paragraph 45 as a CNN or RNN neural network) (see col. 14, lines 50 to 67 where trajectories at or below a threshold value of 15 percent can be discarded as not reliable; in FIG. 9, all possible future trajectories are generated in block 930 based on a set of possible actions, a likelihood of each is then determined based on the contextual information, and a final future trajectory is then determined based on the likelihood for each trajectory in blocks 910-960, where low probability events under 15 percent are discarded to reduce the number of trajectories based on contextual data)
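
For illustration only, the likelihood thresholding described in the cited passage of Ferguson might be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Ferguson: the `Trajectory` class, the action labels, and the likelihood values are hypothetical, and only the 15 percent cutoff comes from the citation above.

```python
from dataclasses import dataclass

# Cutoff taken from the cited passage (15 percent and below is discarded).
LIKELIHOOD_THRESHOLD = 0.15


@dataclass
class Trajectory:
    label: str         # hypothetical name of a possible action
    likelihood: float  # likelihood given contextual information, in [0, 1]


def prune_trajectories(candidates, threshold=LIKELIHOOD_THRESHOLD):
    """Discard candidates whose likelihood is at or below the threshold."""
    return [t for t in candidates if t.likelihood > threshold]


def most_likely(candidates):
    """Pick the remaining candidate with the highest likelihood."""
    return max(candidates, key=lambda t: t.likelihood)


# Hypothetical candidate set for one observed vehicle.
candidates = [
    Trajectory("go_straight", 0.55),
    Trajectory("turn_left", 0.30),
    Trajectory("turn_right", 0.10),
    Trajectory("u_turn", 0.05),
]

kept = prune_trajectories(candidates)
final = most_likely(kept)
print([t.label for t in kept], final.label)
```

In this sketch the two low-likelihood candidates are dropped before a final trajectory is selected, which mirrors the simplification of the trajectory set that the citation describes.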

“a policy simulator in communication with the processing device that simulates the policy” (see col. 8, lines 11-45 where the intersection and information can be stored in the navigation system 168 and 132 to identify the characteristics of the road including lane lines and crosswalks and provides context information at col. 10, line 64 associated with the intersection) (see col. 16, lines 15 to 65 and col. 15, lines 8-65) (see col. 14, lines 50 to 67 where trajectories at or below a threshold value of 15 percent can be discarded as not reliable; in FIG. 9, all possible future trajectories are generated in block 930 based on a set of possible actions, a likelihood of each is then determined based on the contextual information, and a final future trajectory is then determined based on the likelihood for each trajectory in blocks 910-960, where low probability events under 15 percent are discarded to reduce the number of trajectories based on contextual data) (see FIG. 7C and 8C where the road is a flat street and col. 8, lines 1-16 where the vegetation on the vehicle lanes also is monitored and whether the lanes are blocked) (see FIG. 7C where the vehicle 580 is a second vehicle and includes different trajectories of 710-2, 720-2 and 730 and 740-2) (see col. 10, lines 18 to 65) (see col. 10, lines 53 to 67 where the brake lights and the signals such as a turn signal are also taken) (see FIG. 7A-7C where the street is free of traffic and only includes two cars 580 and 100).
Ferguson is silent but Shayer teaches “…as a reparameterized pushforward policy of a base distribution;” (see pages 1-3)
“an evaluator that receives the simulated policy from the policy simulator and performs cross-entropy optimization” (see pages 5-6)
“on the future behavior of the vehicle by analyzing the simulated policy and updating the policy according to cross-entropy error; and” (see pages 7-8).
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a

Ferguson is silent but Christophe teaches “…an alert system that retrieves the future behavior of the vehicle and recognizes hazardous trajectories of the future trajectories and generates an audible alert using a speaker” (see paragraphs 3-16 and claim 6).
                It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of 

Ferguson is silent but Shayer teaches “2. The system as recited in claim 1, further including a density estimator that estimates a ground-truth probability for evaluating the future behavior.” (see pages 5-6 and section 4.3)

                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization

Ferguson is silent but Shayer teaches “3.    The system as recited in claim 1, further including a policy model that simulates the policy as an autoregressive map of random noise sequences in terms of deterministic drift and stochastic diffusion”.  (see page 4-7 and table 1 where the error rate is shown and formulae 1-5 on page 3-4 where they use 
                 It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a processor to train discrete weights. See abstract. In a first phase the network can be trained with discrete weights. See section 3.2. This is a forward pass. At the backward pass, however, backpropagation is performed to determine an optimization. See page 4. A probability of the weights is then added, but it is set between .05 and .95 so that it never crosses zero, which would lose data. See page 5. Then an entropy of the model is reduced. As can be seen, the validation error rates are shown to be much more accurate, or 40.1 percent, with much less data, as low probability items can

Ferguson is silent but Shayer teaches “4. The system as recited in claim 3, further including a sampler to sample the random noise sequences from a base distribution.” (see page 4, section 4.1 where a random initializer is used that is a Xavier initializer)
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a processor to train discrete weights. See abstract. In a first phase the network can be trained with discrete weights. See section 3.2.

Ferguson is silent but Shayer teaches “5. The system as recited in claim 4, wherein the base distribution is a Gaussian distribution.” (see page 5, section 4.2)
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization

Ferguson is silent but Shayer teaches “6. The system as recited in claim 1, wherein the evaluator includes the cross-entropy optimization including comparing the future behavior to a probability distribution of an example distribution to determine error of density of a predicted probability distribution corresponding to the future behavior and precision of probabilities of the predicted probability distribution.” (see pages 5-6, section 4.3: “for this case they add a beta density regularizer on our probabilities, $R(p) = p^{\alpha-1}(1-p)^{\beta-1}$ with $\alpha, \beta = 2$. From here on we shall denote this regularization hyper-parameter as beta parameter. It is important to note that we are using $R(p)$ as our regularizer and not $-\log(R(p))$ (with $\alpha, \beta < 1$) as might be more standard. This is because $-\log(R(p))$ has a much stronger pull towards zero entropy. In general we note that unlike ternary weights, binary weights require more careful fine-tuning in order to perform well;”)
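
The two regularizer forms contrasted in the quoted passage can be compared numerically. The following is a minimal sketch only, assuming the unnormalized beta density written in the quotation; the function names are hypothetical, the exponents of 0.5 for the logarithmic form are an arbitrary example of α, β < 1, and how Shayer weights or signs the term inside the training loss is not shown here.

```python
import math


def beta_density_regularizer(p, alpha=2.0, beta=2.0):
    """Unnormalized beta density R(p) = p^(alpha-1) * (1-p)^(beta-1)."""
    return p ** (alpha - 1.0) * (1.0 - p) ** (beta - 1.0)


def neg_log_regularizer(p, alpha, beta):
    """The alternative form -log(R(p)) mentioned in the quotation."""
    return -math.log(beta_density_regularizer(p, alpha, beta))


# With alpha = beta = 2, R(p) reduces to p * (1 - p), which stays bounded
# and peaks at p = 0.5. The -log form with alpha, beta < 1 decreases without
# bound as p approaches 0 or 1, i.e. it pulls harder toward zero entropy.
for p in (0.5, 0.9, 0.99):
    print(p, beta_density_regularizer(p), neg_log_regularizer(p, 0.5, 0.5))
```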
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a processor to train discrete weights. See abstract. In a first

Ferguson is silent but Shayer teaches “7. The system as recited in claim 1, wherein the policy is a linear generator.” (see page 5, formulas 6-8)
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization

Ferguson is silent but Shayer teaches “8. The system as recited in claim 1, wherein the policy is a convolutional neural network.” (see abstract, FIG. 1, pages 1-2, and page 6, element c7)


Claim 9 is rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”), in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008, and further in view of NPL, Diederik P. Kingma, Auto-Encoding Variational Bayes (https://arxiv.org/pdf/1312.6114.pdf) (May 2014) (hereinafter “Kingma”).

Ferguson is silent but Kingma teaches “…9.    The system as recited in claim 1, wherein the policy includes:
an encoder convolutional neural network to generate an environmentally reasoned encoding;  (see page 1, where a variational auto encoder is used for a recognition model and a neural network and 
an encoder recurrent neural network in parallel with the encoder convolutional neural network to generate a historically reasoned encoding; and (see page 3, where a probabilistic encoder is used for a recognition model and a neural network and page 6-8 and 12-14 and algorithm 2 on page 14)
 a decoder recurrent neural network to decode a combination of the environmentally reasoned encoding and the historically reasoned encoding. (see page 6-7 where a variational decoder is used); see page  

                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Kingma with the disclosure of Ferguson, since Kingma teaches that a probabilistic encoder can be used with a Bayes neural network with an algorithm. The algorithm imposes a variational lower bound via a lower bound estimator. See abstract, section 2.2, and formulas 1-3. This can provide a decreased amount of training, at 20-40 minutes per million training samples, and an average variational lower bound per data point that can improve, which provides faster training. See FIG. 2 and formulas 13-15 on page 12. See abstract and pages 2-8.
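
The lower bound estimator referenced above relies on a reparameterized Gaussian sample and, for a standard normal prior, a closed-form Kullback-Leibler term. The following is a minimal sketch under those assumptions, not Kingma's implementation: the function names and the example values of `mu` and `sigma` are hypothetical, and the reconstruction term of the bound (which requires a decoder) is omitted.

```python
import math
import random


def sample_latent(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over independent dimensions."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma)
    )


rng = random.Random(0)     # fixed seed so the sketch is repeatable
mu = [0.5, -1.0]           # hypothetical encoder means
sigma = [1.0, 0.5]         # hypothetical encoder standard deviations

z = sample_latent(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)
print(z, kl)
```

The variational lower bound is the expected reconstruction log-likelihood minus this KL term, so a smaller KL value tightens the bound for a fixed reconstruction quality.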

Claim 10 is rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”), in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008, further in view of NPL, Diederik P. Kingma, Auto-Encoding Variational Bayes (https://arxiv.org/pdf/1312.6114.pdf) (May 2014) (hereinafter “Kingma”), and in view of NPL, Shakir, The Spectator, Machine Learning Blog, Machine Learning Trick of the Day (4): Reparameterisation Tricks (http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/) (October 29, 2015) (hereinafter “Shakir”).
 Ferguson is silent but Shakir teaches “…10. The system as recited in claim 9, wherein each of the encoder recurrent neural network and the decoder recurrent neural network include gated recurrent units”.  (see page 2-4 where the bayes network is auto-encoded and decoded and 
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shakir with the disclosure of Ferguson, since Shakir teaches that a neural network can involve a troublesome stochastic optimization problem. This can be problematic for the neural network, as a random variable may be difficult to process. Re-expressing the troublesome stochastic optimization problem using random variate reparameterisation, with a different recurring unit or variable, can be helpful. The random variate reparameterisation is a tool by which the neural network can substitute a random variable of a known distribution using a deterministic transformation of another random variable. This can provide faster processing or a scalable Monte Carlo type gradient estimation using


Claims 11-14 are rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”) and in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008.

Ferguson discloses “11. A system for vehicle behavior prediction, the system comprising: an imaging device that captures images of a vehicle in traffic;” (see perception system 172 where the vehicle has camera, radar, sonar and LIDAR; see col. 8, line 61 to col. 9, line 5) (see col. 10, lines 1-19)
“a processing device including policy stored in a memory of the processing device in communication with the imaging device to stochastically model future behavior of the vehicle based on the captured images;” (see col. 14, lines 50 to 67 where trajectories at or below a threshold value of 15 percent can be discarded as not reliable; in FIG. 9, all possible future trajectories are generated in block 930 based on a set of possible actions, a likelihood of each is then determined based on the contextual information, and a final future trajectory is then determined based on the likelihood for each trajectory in blocks 910-960, where low probability events under 15 percent are discarded to reduce the number of trajectories based on contextual data)

“a policy simulator in communication with the processing device that simulates the policy … on the future behavior of the vehicle” (see col. 8, lines 11-45 where the intersection and information can be stored in the navigation system 168 and 132 to identify the characteristics of the road including lane lines and crosswalks and provides context information at col. 10, line 64 associated with the intersection) (see col. 16, lines 15 to 65 and col. 15, lines 8-65) (see col. 14, lines 50 to 67 where trajectories at or below a threshold value of 15 percent can be discarded as not reliable; in FIG. 9, all possible future trajectories are generated in block 930 based on a set of possible actions, a likelihood of each is then determined based on the contextual information, and a final future trajectory is then determined based on the likelihood for each trajectory in blocks 910-960, where low probability events under 15 percent are discarded to reduce the number of trajectories based on contextual data) (see FIG. 7C and 8C where the road is a flat street and col. 8, lines 1-16 where the vegetation on the vehicle lanes also is monitored and whether the lanes are blocked) (see FIG. 7C where the vehicle 580 is a second vehicle and includes different trajectories of 710-2, 720-2 and 730 and 740-2) (see col. 10, lines 18 to 65) (see col. 10, lines 53 to 67 where the brake lights and the signals such as a turn signal are also taken) (see FIG. 7A-7C where the street is free of traffic and only includes two cars 580 and 100).

Ferguson is silent but Shayer teaches “…as a reparameterized pushforward of a base distribution” (see pages 1-3), “including a policy model that simulates the policy as an autoregressive map of random noise sequences in terms of deterministic drift and stochastic diffusion;” (see pages 4-7 and table 1 where the error rate is shown and formulae 1-5 on pages 3-4 where they use a stochastic network model in which each weight $w^{l}_{ij}$ is sampled independently from a multinomial distribution $W^{l}_{ij}$; see also page 5, section 4.2 where a probability decay hyper-parameter is added in addition to a weight decay parameter)
“a density estimator that estimates a ground-truth probability for evaluating the future behavior;” (see pages 5-6 and section 4.3)
“an evaluator in communication with the policy simulator and the density estimator that performs cross-entropy optimization” (see pages 5-6)
“…by analyzing the simulated policy and the ground-truth probability according to density of a probability distribution and precision of predicted probabilities corresponding to the future behavior, and updating the policy according to cross-entropy error; and” (see pages 7-8).


Ferguson is silent but Christophe teaches “…an alert system that recognizes hazardous trajectories of the probable future trajectories and generates an audible alert using a speaker.” (see paragraphs 3-16 and claim 6)
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Christophe with the disclosure of Ferguson, since Christophe teaches that a hazardous trajectory can be identified and a vehicle can provide a warning message in the form of an audible alarm that repeats, so that the operator can avoid the hazard and then take a new trajectory. See abstract and paragraphs 1-16 of Christophe et al.

Ferguson is silent but Shayer teaches “12. The system as recited in claim 11, further including a sampler to sample the random noise sequences from a base distribution.” (see page 4, section 4.1 where a random initializer is used that is a Xavier initializer)


Ferguson is silent but Shayer teaches “13. The system as recited in claim 12, wherein the base distribution is a Gaussian distribution.” (see page 5, section 4.2)
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a processor to train discrete weights. See abstract. In a first phase the network can be trained with discrete weights. See section 3.2. This is a forward pass. At the backward pass, however, backpropagation is performed to determine an optimization. See page 4. A probability of the weights is then added, but it is set between .05 and .95 so that it never crosses zero, which would lose data. See page 5. Then an entropy of the model is reduced. As can be seen, the validation error rates are shown to be much more

Ferguson is silent but Shayer teaches “14. The system as recited in claim 11, wherein the evaluator includes the cross-entropy optimization including comparing the future behavior to a probability distribution of an example distribution to determine error of density of a predicted probability distribution corresponding to the future behavior and precision of probabilities of the predicted probability distribution.” (see pages 5-6, section 4.3: “for this case they add a beta density regularizer on our probabilities, $R(p) = p^{\alpha-1}(1-p)^{\beta-1}$ with $\alpha, \beta = 2$. From here on we shall denote this regularization hyper-parameter as beta parameter. It is important to note that we are using $R(p)$ as our regularizer and not $-\log(R(p))$ (with $\alpha, \beta < 1$) as might be more standard. This is because $-\log(R(p))$ has a much stronger pull towards zero
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson, since Shayer teaches that a neural network can operate much faster on compact computing devices using the so-called “LR-nets” (local reparameterization networks) method for training neural networks with discrete weights using stochastic parameters. The algorithm can use the local reparameterization trick, previously used to train Gaussian-distributed weights, to enable a processor to train discrete weights. See abstract. In a first phase the network can be trained with discrete weights. See section 3.2. This is a forward pass. At the backward pass, however, backpropagation is performed to determine an optimization. See page 4. A probability of the weights is then added, but it is set between .05 and .95 so that it never crosses zero, which would lose data. See page 5. Then an entropy of the model is reduced. As can be seen, the validation error rates are shown to be much more accurate, or 40.1 percent, with much less data, as low probability items can be discarded as high probability trained weights are used. This can provide low power consumption for training of the neural network as it is

Claim 15 is rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”), in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008, and further in view of NPL, Diederik P. Kingma, Auto-Encoding Variational Bayes (https://arxiv.org/pdf/1312.6114.pdf) (May 2014) (hereinafter “Kingma”).

Ferguson is silent but Kingma teaches “… 15.    The system as recited in claim 11, wherein the policy includes:
an encoder convolutional neural network to generate an environmentally reasoned encoding;  (see page 1, where a variational auto encoder is used for a recognition model and a neural network and section 3 page 5)
an encoder recurrent neural network in parallel with the encoder convolutional neural network to generate a historically reasoned encoding; and (see page 3, where a probabilistic encoder is used for a recognition model and a neural network and page 6-8)
a decoder recurrent neural network to decode a combination of the environmentally reasoned encoding and the historically reasoned encoding. (see page 6-7 where a variational decoder is used) ;
                 It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Kingma with the disclosure of Ferguson, since Kingma teaches that a probabilistic encoder can be used with a Bayes neural network with an algorithm. The algorithm imposes a variational lower bound via a lower bound estimator. See abstract, section 2.2, and formulas 1-3. This can provide a decreased amount of training, at 20-40 minutes per million training samples, and an average variational lower bound per data point which can

Claim 16 is rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”), in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008, further in view of NPL, Diederik P. Kingma, Auto-Encoding Variational Bayes (https://arxiv.org/pdf/1312.6114.pdf) (May 2014) (hereinafter “Kingma”), and in view of NPL, Shakir, The Spectator, Machine Learning Blog, Machine Learning Trick of the Day (4): Reparameterisation Tricks (http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/) (October 29, 2015) (hereinafter “Shakir”).
Ferguson is silent but Shakir teaches “…16. The system as recited in claim 15, wherein each of the encoder recurrent neural network and the decoder recurrent neural network include gated recurrent units.” (see pages 2-4 where the Bayes network is auto-encoded and decoded and where the continuous random variable is expressed with a known one-line reparameterization: “In the second line, we reparameterised our random variable in terms of a one-line generating mechanism. In the final line, the gradient is now unrelated to the distribution with which we take the expectation, so easily passes through the integral. Our assumptions throughout this process were simple: 1) the use of a continuous random variable z with a known one-line reparameterisation,”)
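
The gradient step described in the quoted passage can be illustrated numerically. This is a minimal sketch only, assuming a Gaussian random variable and the test function f(z) = z^2, neither of which is taken from Shakir; the function name, sample count, and parameter values are hypothetical. Because E[z^2] = mu^2 + sigma^2, the exact gradient with respect to mu is 2 * mu, which gives a value to compare the Monte Carlo estimate against.

```python
import random


def pathwise_gradient_mu(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of d/dmu E[z^2] for z ~ N(mu, sigma^2).

    Uses the one-line reparameterisation z = mu + sigma * eps with
    eps ~ N(0, 1), so the gradient passes inside the expectation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        total += 2.0 * z  # f'(z) * dz/dmu, where f(z) = z^2 and dz/dmu = 1
    return total / n_samples


mu, sigma = 1.5, 1.0                 # hypothetical distribution parameters
estimate = pathwise_gradient_mu(mu, sigma)
exact = 2.0 * mu                     # analytic gradient for comparison
print(estimate, exact)
```

The expectation is taken over the fixed noise distribution rather than over a distribution that depends on mu, which is the property the quoted passage relies on.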
                It would have been obvious to one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shakir with the disclosure of Ferguson, since Shakir teaches that a neural network can involve a troublesome stochastic optimization problem. This can be problematic for the neural network, as a random variable may be difficult to process. Re-expressing the troublesome stochastic optimization problem using random variate reparameterisation, with a different recurring unit or variable, can be helpful. The random variate reparameterisation is a tool by which the neural network

Claims 17-20 are rejected under 35 U.S.C. § 103 as being unpatentable over U.S. Patent No. 10,421,453 B2 to Ferguson et al. (assigned to Waymo™ and filed in 2016, which is prior to the effective filing date of 3-13-18) in view of Shayer, Oran, et al., Learning Discrete Weights Using the Local Reparameterization Trick, Cornell University (https://arxiv.org/abs/1710.07739) (Feb. 2, 2018) (hereinafter “Shayer”) and in view of U.S. Patent Application Pub. No. US 2010/0073199 A1 to Christophe et al., filed in 2008.

Ferguson discloses “17. A method for vehicle behavior prediction, the method comprising: capturing images of a vehicle in traffic with an imaging device;” (see perception system 172 where the vehicle has camera, radar, sonar and LIDAR; see col. 8, line 61 to col. 9, line 5) (see col. 10, lines 1-19)

stochastically modelling future behavior of the vehicle with policy stored in a memory of a processing device based on the captured images; (see col. 14, lines 50 to 67 where a threshold value of 15 percent and below can be discarded as not reliable; whereas in FIG. 9 all possible future trajectories are taken in block 930 based on a set of possible actions and then a likelihood of each based on the contextual information is taken and then a final future trajectory is based on the likelihood and for each trajectory in blocks 910—960 where low probability events under 15 percent are discarded to simplify the number of trajectories based on contextual data)

simulating the policy (see col. 8, lines 11-45, where the intersection and information can be stored in the navigation system 168 and 132 to identify the characteristics of the road, including lane lines and crosswalks, and provides context information at col. 10, line 64 associated with the intersection) (see col. 16, lines 15 to 65 and col. 15, lines 8-65), (see col. 14, lines 50 to 67, where trajectories at a threshold value of 15 percent and below can be discarded as not reliable; in FIG. 9, all possible future trajectories are taken in block 930 based on a set of possible actions, a likelihood of each trajectory is then determined based on the contextual information, and a final future trajectory is determined based on the likelihood for each trajectory in blocks 910-960, where low-probability events under 15 percent are discarded to reduce the number of trajectories based on contextual data) (see FIGS. 7C and 8C, where the road is a flat street, and col. 8, lines 1-16, where the vegetation on the vehicle lanes also is monitored and if the lanes are blocked lanes), (see FIG. 7C, where the vehicle 580 is a second vehicle and includes different trajectories 710-2, 720-2, 730 and 740-2), (see col. 10, lines 18 to 65) (see col. 10, lines 53 to 67 where the brake lights and ) (see FIGS. 7A-7C, where the street is free of traffic and only includes two cars 580 and 100),
Ferguson is silent but Shayer teaches “… as a reparameterized pushforward of a base distribution (see pages 1-3) with a policy simulator; (see pages 5-6)
performing cross-entropy optimization on the future behavior of the vehicle by analyzing the simulated policy (see pages 1-8) and updating the policy according to cross-entropy error using an evaluator; and (see pages 7-8)”.
                It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson since Shayer teaches that a neural network can operate much faster with compact computing devices using the method of so-called “LR-nets” (local reparameterization networks) for training neural networks with discrete weights using stochastic parameters. An algorithm can use a local reparameterization trick, previously used to train Gaussian distributed weights, to enable a processor to train discrete weights.  See abstract.   In a first phase, the network can be trained with discrete weights. See section 3.2.  
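
For illustration only, the local reparameterization idea referenced above can be sketched for a single neuron with ternary weights. This is a minimal sketch and not Shayer's code: it assumes the Gaussian (central-limit) approximation of the pre-activation that is commonly associated with the local reparameterization trick, and the function names, the ordering of the three probabilities, and the example values are hypothetical.

```python
import math
import random

# Assumed ternary weight values; each weight has probabilities
# (P(w = -1), P(w = 0), P(w = +1)) in this order.
TERNARY_VALUES = (-1.0, 0.0, 1.0)


def weight_moments(probs):
    """Return the mean and variance of one discrete (ternary) weight."""
    mean = sum(p * v for p, v in zip(probs, TERNARY_VALUES))
    second_moment = sum(p * v * v for p, v in zip(probs, TERNARY_VALUES))
    return mean, second_moment - mean * mean


def sample_preactivation(inputs, weight_probs, rng):
    """Sample z = sum_i w_i * x_i from its Gaussian approximation.

    Instead of sampling every discrete weight, the pre-activation is drawn
    as z = mu + sigma * eps with eps ~ N(0, 1), which keeps z a
    differentiable function of the weight probabilities.
    """
    mu = 0.0
    var = 0.0
    for x, probs in zip(inputs, weight_probs):
        m, v = weight_moments(probs)
        mu += x * m
        var += x * x * v
    eps = rng.gauss(0.0, 1.0)
    return mu + math.sqrt(max(var, 0.0)) * eps


rng = random.Random(0)
z = sample_preactivation([1.0, 2.0], [(0.25, 0.5, 0.25), (0.1, 0.2, 0.7)], rng)
```

When every weight distribution is concentrated on a single value, the variance term is zero and the sampled pre-activation reduces to the ordinary weighted sum.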

Ferguson is silent but Christophe teaches “… recognizing hazardous trajectories of the probable future trajectories and generates an audible alert using a speaker with an alert system.” (see paragraphs 3-16 and claim 6);
                It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of CHRISTOPHE with the disclosure of Ferguson since Christophe teaches that a hazardous trajectory can be identified and a vehicle can provide a 

Ferguson is silent but Shayer teaches “ 18.    The system as recited in claim 17, further including estimating a ground-truth probability with a density estimator for evaluating the future behavior. (see page 5-6 and section 4.3)”.
                It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson since Shayer teaches that a neural network can operate much faster with compact computing devices using the method of so-called “LR-nets” (local reparameterization networks) for training neural networks with discrete weights using stochastic parameters. An algorithm can use a local reparameterization trick, previously used to train Gaussian distributed weights, to enable a processor to train discrete weights.  See abstract.   In a first phase, the network can be trained with discrete weights. See section 3.2.  This is a forward pass.  However, at the backward pass a back propagation 

Ferguson is silent but Shayer teaches “19.    The system as recited in claim 17, further including simulating the policy as an autoregressive map of random noise sequences in terms of deterministic drift and stochastic diffusion using a policy model.” (see pages 4-7 and table 1, where the error rate is shown, and formulae 1-5 on pages 3-4, where they use a stochastic network model in which each weight $w^{l}_{ij}$ is sampled independently from a multinomial distribution $W^{l}_{ij}$; see also page 5, section 4.2, where a probability decay hyper-parameter is added in addition to a weight decay parameter)
                 It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson since Shayer teaches that a neural network can operate much faster with compact computing devices using the method of so-called “LR-nets” (local reparameterization networks) for training neural networks with discrete weights using stochastic parameters. An algorithm can use a local reparameterization trick, previously used to train Gaussian distributed weights, to enable a processor to train discrete weights.  See abstract.   In a first phase, the network can be trained with discrete weights. See section 3.2.  This is a forward pass.  However, at the backward pass a back propagation is performed to determine an optimization. See page 4.  A probability of the weights is then added, but it is kept between .05 and .95 so that it never reaches zero, which would lose data.  See page 5. Then an entropy of the model is reduced.  As can be seen, the validation error rates are shown to be much more accurate, or 40.1 percent, with much less data, because low-probability items can be discarded while high-probability trained weights are used.   This can provide low power consumption for training of the neural network, as it is trained with modest computation and memory using a sparse ternary network.  See abstract and pages 1-8.
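
For illustration only, two quantities mentioned in the rationale above, a weight probability kept between .05 and .95 and the entropy of a weight distribution, can be sketched as follows. This is a minimal sketch under stated assumptions and not Shayer's code; the function names and example values are hypothetical, and only the .05 and .95 bounds are taken from the passage.

```python
import math


def clip_probability(p, low=0.05, high=0.95):
    """Bound a probability to [low, high] so that it never reaches 0 or 1."""
    return min(max(p, low), high)


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A near-uniform ternary weight distribution has high entropy, while a
# distribution concentrated on one value has low entropy.
uncertain = entropy([1 / 3, 1 / 3, 1 / 3])
confident = entropy([0.95, 0.025, 0.025])
```

A lower entropy indicates that the weight distribution is concentrated on one discrete value, which corresponds to the reduction in model entropy noted above.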

Ferguson is silent but Shayer teaches “20. The system as recited in claim 17, wherein the cross-entropy optimization includes comparing the future behavior to a probability distribution of an example distribution to determine error of density of a predicted probability distribution corresponding to the future behavior and precision of probabilities of the predicted probability distribution.” (see pages 5-6, section 4.3: “for this case they add a beta density regularizer on our probabilities, $R(p) = p^{\alpha-1}(1-p)^{\beta-1}$ with $\alpha, \beta = 2$. From here on we shall denote this regularization hyper-parameter as beta parameter. It is important to note that we are using $R(p)$ as our regularizer and not $-\log(R(p))$ (with $\alpha, \beta < 1$) as might be more standard. This is because $-\log(R(p))$ has a much stronger pull towards zero entropy. In general we note that unlike ternary weights, binary weights require more careful fine-tuning in order to perform well”)
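
For illustration only, the beta density regularizer quoted above can be evaluated as follows. This is a minimal sketch and not Shayer's code; the function name is hypothetical, and the sketch only evaluates the quoted expression without assuming how it enters the training loss.

```python
def beta_regularizer(p, alpha=2.0, beta=2.0):
    """Evaluate R(p) = p**(alpha - 1) * (1 - p)**(beta - 1) for a probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p ** (alpha - 1.0) * (1.0 - p) ** (beta - 1.0)


# With alpha = beta = 2 the expression reduces to p * (1 - p), which is
# largest at p = 0.5 and vanishes at p = 0 and p = 1.
values = [beta_regularizer(p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
```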
                It would have been obvious for one of ordinary skill in the art before the effective filing date of the disclosure to combine the teachings of Shayer with the disclosure of Ferguson since Shayer teaches that a neural network can operate much faster with compact computing devices using the method of so-called “LR-nets” (Local reparameterization 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEAN PAUL CASS whose telephone number is (571)270-1934.  The examiner can normally be reached on Monday to Friday 7 am to 7 pm; Saturday 10 am to 12 noon.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James J Lee can be reached on 571-270-5965.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JEAN PAUL CASS/Primary Examiner, Art Unit 3668