DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to communication dated 05/17/2022.
Claims 1, 2, 3, 5, 8, 9, 10, 12, 15, 16, 17, 18, and 19 are amended.
Claims 1 – 20 are presented for examination.

Final Action
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
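For illustration only, the reply-period rule stated above can be sketched as date arithmetic. This is a minimal sketch and not an official calculator: the function names, the example dates, and the clamping of a month-end date to the last day of a shorter month are assumptions, and weekends, holidays, and extensions under 37 CFR 1.136(a) are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`.

    Assumption: if the target month has no such day, the last day of
    that month is used.
    """
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(mailing, first_reply=None, advisory_mailed=None):
    """Sketch of the shortened statutory period described above.

    - Default: three months from the mailing date of the final action.
    - If a first reply is filed within two months and the advisory action
      is mailed after the three-month date, the period ends on the
      advisory mailing date.
    - The result is never later than six months from the mailing date.
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    end = three_months
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > three_months
    ):
        end = advisory_mailed
    return min(end, six_months)
```

Under these assumptions, a hypothetical mailing date of January 31, 2022 with no reply gives a period ending April 30, 2022; the same mailing date with a first reply on March 15, 2022 and an advisory action mailed May 10, 2022 gives May 10, 2022.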




Response to Arguments
The Applicant argues that they have amended the claims and that the art of record does not make obvious the amended limitations.

In response the Examiner notes that the amended claim language states: “... generating, via a trained machine-learning model... the sensor fusion representation being defined by sensor-fusion data that corresponds to the simulation object...”

Wu makes obvious “generating, via a trained machine-learning model” (Figure 7 block 702 recites “training image data” block 704 “training sensor data”; Figure 9 block B914: “Train a sequential deep Neural Network”; Figure 11 block B1112; Figure 3 B1306; Par 5: “... data for training a sequential DNN...”; par 30: “... by training the DNN(s)...” par 80, 81, 82, 83, 84: “... training a sequential deep neural network... during training, the sequential DNN may use ground truth data that is generated from  a combination of training image data and training sensor data (e.g., using cross-sensor fusion)... training sensor data may be... fused, with the training image data... correlated... ground truth generation 706, as described herein, may include automatic ground truth generation using a combination of the training image data 702 and the training sensor data – such as... through cross-sensor fusion...” NOTE: this clearly teaches to train a neural network and a neural network is a machine-learning model and that the machine-learning model is trained using sensor-fusion training data.)

Wu also makes obvious “a sensor-fusion representation as output upon receiving the visualization input” (par 84: “ground truth data generation 706, as described herein, may include automatic truth generation... produced from... machine-automated (e.g., using feature analysis and learning...” NOTE: this clearly teaches using machine learning to output machine-automated sensor-fusion representations (i.e., ground truth)).

Therefore, Wu teaches ground truth data for training a machine-learning model, where the ground truth data is sensor-fusion data and may also be produced automatically by machine-learning methods (par 84). This results in a trained machine-learning model which accepts sensor-fusion data as input.

As outlined in the Office action, the claimed “template” is interpreted as a simulated or virtual object. While Wu does not explicitly teach a simulated or virtual object, as outlined in the Office action, Nygaard_2017 makes a “template” (e.g., simulated or virtual objects) obvious at least at Figure 5 element 510 and at Figure 11, where “object type” = “pedestrian” with characteristics illustrates a template (col 3 lines 25 – 30: “… fictitious sensor data may include… physical characteristics (size, shape, etc.) for each virtual object at a series of different times…”).

Further, because Wu teaches the sensor-fusion data and Nygaard_2017 teaches the claimed template, it is the combination of Wu in view of Nygaard_2017 that makes obvious that “the sensor-fusion representation being defined by sensor-fusion data that corresponds to the simulation object”: if the environment includes simulated objects such as simulated cars and pedestrians (Nygaard_2017 Figure 8 elements 850, 860), then the sensor fusion data includes the data for the simulated objects.

Additionally, Wu makes obvious that the machine-learning model is a generative adversarial network, as recited in claim 3, at paragraph 40, which recites “... the sequential DNN(s) 104 may include.... generative adversarial... types of machine learning models...”.

While it would have been obvious to one of ordinary skill in the art that a machine-learning model known as a generative adversarial network which is trained on sensor-fusion data will generate, as output, sensor-fusion data, Wu does not explicitly recite this.

Therefore, the argument is persuasive and the rejection is withdrawn; however, a new ground of rejection is presented below.

End Response to Arguments



Claim Interpretation
Claims 1, 8, and 15 each recite the term “realistic,” which in plain English is a relative term or descriptor. Ordinarily such a term, if not properly defined in the specification, would be found indefinite under 35 U.S.C. 112(b). However, the Examiner notes that paragraph 46 of the instant specification states: “… model versions or template versions (e.g., non-sensor-based versions) of these real-world objects, thereby not be accurate or realistic input for the vehicle processing system 30 compared to real-world detections, which are captured by sensors 220A…”. This discloses that “realistic” describes the template (non-sensor-based) versions of objects and means that the template versions of objects are as accurate as the real-world sensor detections (versions) of objects.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


(1) Claims 1 - 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US 2020/0293064 A1; 62/819,412, filed on March 15, 2019) in view of Nygaard_2017 (US 9,836,895), further in view of Wei_2016 (US 2016/0314224 A1), and further in view of Shaikh_2017 (“Introductory guide to generative adversarial networks (GANs) and their promise!”, dated June 15, 2017).

Claim 1. Wu makes obvious a system (par 5: “… the system of the present disclosure…”; par 7: “…the present system and methods…”; par 230: “… the system 1476 may include server(s) 1478, network(s) 1490, and vehicles, including the vehicle 1400…”; Figure 15 computing system 1500; par 237: “… computing device 1500 suitable for use in implementing some embodiments of the present disclosure…”) the system (Fig 15 computing system 1500, Figure 14D element 1476) comprising:
a non-transitory computer readable medium (Fig. 15 memory 1504; par 237: “… memory 1504…” par 241: “… the memory 1504 may store computer-readable instructions… computer storage media does not comprise signals per se…”) including a visualization of a scene (Figure 4 block B402: “receive image data representative of a sequence of images”; Figure 8; par 35: “… the image data 102 may include data representative of images of a field of view of one or more camera of a vehicle, such as stereo camera(s)…”; par 56: “… the method of providing the input data (e.g., the image data… images are processed by the sequential DNN 104… feature map for each image…”; par 231: “… the server(s) 1478 may receive… image data…”) that includes an object within a region (Figure 8 objects 814, 808, 802; par 30: “the image data… data representative of a current location of an object… locations of an object…”; par 41: “… predict a bounding shape for each of the objects in the image…”);
a processing system communicatively connected to the non-transitory computer readable medium, the processing system including at least one processing device and being configured to execute computer readable data (Figure 14D, Figure 15) that implements a method (par 237: “FIG. 15 is a block diagram of an example computing device 1500 suitable for use in implementing some embodiments of the present disclosure…” par 243: “… computing device 1500 to perform one or more of the methods and/or processes described herein…”) that includes:
generating, via a trained machine-learning model (Figure 7 block 702 recites “training image data” block 704 “training sensor data”; Figure 9 block B914: “Train a sequential deep Neural Network”; Figure 11 block B1112; Figure 3 B1306; Par 5: “... data for training a sequential DNN...”; par 30: “... by training the DNN(s)...” par 80, 81, 82, 83, 84: “... training a sequential deep neural network... during training, the sequential DNN may use ground truth data that is generated from a combination of training image data and training sensor data (e.g., using cross-sensor fusion)... training sensor data may be... fused, with the training image data... correlated... ground truth generation 706, as described herein, may include automatic ground truth generation using a combination of the training image data 702 and the training sensor data – such as... through cross-sensor fusion...” NOTE: this clearly teaches to train a neural network and a neural network is a machine-learning model and that the machine-learning model is trained using sensor-fusion training data.) a sensor-fusion representation upon receiving the visualization as input (par 5: “… using cross-sensor fusion…”; par 16: “FIG. 8 is an example visualization of a cross-sensor fusions technique for ground truth generation…”; par 29: “… Deep neural network (DNN(s)) of the present disclosure may be trained using image data correlated with sensor data… such as by using cross-sensor fusion techniques… using a combination of image data and sensor data…”; par 49: “… applying cross-sensor fusion or another sensor data to an image data…”), the sensor-fusion representation being defined by sensor-fusion data that corresponds to the object (FIG 7);
and generating a scene with a sensor-fusion detection estimate of the simulation object within the region, the sensor-fusion detection estimate including object contour data indicating bounds of the sensor-fusion representation (par 31: “… output by the DNN(s) as location or origin, or centers, of bounding shapes in addition to dimension information (e.g., lengths of sides of the bounding shape… bounding shape in addition to dimension information…”; par 41: “… the sequential DNN 104 may also predict a bounding shape for each of the objects in the image, such as bounding shape 204 for the bus 202, bounding shape 208 for the vehicle 206, and bounding shape 212 for the pedestrian 210. In the visualization 200, the bounding shapes may be represented with borders…”).

While Wu teaches performing simulations to generate training data for the DNN (par 232: “the server(s) 1478 may be used to train machine learning models (e.g., neural networks) based on training data. The training data may be generated by the vehicles, and/or may be generated in a simulation (e.g., using a game engine). In some examples, the training data is tagged…”; par 171: “… determine the position and extent of objects (e.g., within a world model), to generate real-time visualization simulations…”), Wu does not explicitly teach “a template of a simulation object” used “for generating a realistic simulation.”

NOTE: As outlined above in the claim interpretation a “realistic simulation” is an object in a simulation that is generated based upon an object template rather than sensor data.

Therefore, Wu does not explicitly teach “for generating a realistic simulation” nor “template of a simulation” nor “of the template” nor “simulation of the scene” nor “instead of the template” nor “simulation” nor sensor fusion representation “as output” of the trained machine-learning model.

Nygaard_2017, however, makes obvious “for generating a realistic simulation” (abstract: “… using virtual objects… and virtual object data is received corresponding to a virtual object in the autonomous vehicle’s environment. The virtual object represents a real object that is not in the vehicle’s environment. The autonomous vehicle is maneuvered based on both the sensor data and the virtual object data…”; Figure 8 elements 850, 860; col 1 lines 29 – 40: “… testing an autonomous vehicle using virtual objects… receiving virtual object data corresponding to a virtual object… the virtual object representing a real object that is not in the vehicle’s environment…”)

and “template of a simulation object” and “of the template” (Figure 5 element 510; Figure 11, where “object type” = “pedestrian” with characteristics illustrates a template; col 3 lines 25 – 30: “… fictitious sensor data may include… physical characteristics (size, shape, etc.) for each virtual object at a series of different times…”).

Wu and Nygaard_2017 are analogous art because they are from the same field of endeavor called autonomous vehicles. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Wu and Nygaard_2017. The rationale for doing so would have been that Wu teaches “to augment the training images such that the training data size and robustness of the system is increased” (par 105), where images are sensor data from stereo cameras (par 135: “stereo camera”), and Nygaard_2017 teaches that sensor data can be augmented with virtual object data to generate simulated/virtual objects because “it can be difficult, expensive, and dangerous to set up certain real-life scenarios” (col 1 lines 20 – 25). Therefore, it would have been obvious to combine Wu and Nygaard_2017, for the benefit of training and testing on scenarios which would be too dangerous to set up in real life, to obtain the invention as specified in the claims.


While Wu teaches a simulation, as outlined above, while both Wu and Nygaard_2017 teach a bird’s eye view of the world space (Wu par 32: “… in a bird’s-eye view in 2D world space…”; Nygaard_2017 col 2 lines 18 – 30: “… bird’s eye view…”), and while Nygaard_2017 teaches creating a virtual object based on an object template, the combination of Wu and Nygaard_2017 does not explicitly teach a “simulation of the scene” and “instead of the template” nor “simulation” nor sensor fusion representation “as output” of the trained machine-learning model.

Wei_2016, however, makes obvious “simulation of the scene” and “simulation” (Figure 8 block 358: “… in the simulated virtual environment during the simulated mission of the autonomous vehicle…”; par 4: “… a method for simulating a mission for an autonomous vehicle… in the simulated virtual environment during the simulated mission of the autonomous vehicle…”) and “instead of the template” (par 26: “… view 152 can correspond to an image of the dynamic object in a user-recognizable manner… a graphical rendering… or a video image, such as can be provided via a camera and/or other types of electro-optical imaging sensors… various texture mapped mesh models can be associated with the dynamic object…”).

Wu and Nygaard_2017 and Wei_2016 are analogous art because they are from the same field of endeavor called autonomous vehicles. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Wu and Wei_2016. The rationale for doing so would have been that Wu teaches training an autonomous vehicle using simulated data (par 232) and Wei_2016 teaches testing an autonomous vehicle using simulated objects. Therefore, it would have been obvious to combine Wu and Wei_2016, for the benefit of training and testing an autonomous vehicle to ensure it functions correctly, to obtain the invention as specified in the claims.

Wu and Nygaard_2017 and Wei_2016 do not explicitly teach sensor fusion representation “as output” of the trained machine-learning model.

Shaikh_2017 makes obvious sensor fusion representation “as output” of the trained machine-learning model (pages 6 – 7 steps 3, 4, 5, 6. In the GAN, the outputs from the discriminator are input to the generator, and the outputs from the generator are inputs to the discriminator.)

Wu and Shaikh_2017 are analogous art because they are from the same field of endeavor called machine learning. Before the effective filing date it would have been obvious to a person of ordinary skill in the art to combine Wu and Shaikh_2017. The rationale for doing so would have been that Wu teaches having a trained generative adversarial network and Shaikh_2017 teaches the fundamentals of generative adversarial networks, including training them. Therefore, it would have been obvious to combine Wu and Shaikh_2017, for the benefit of training a generative adversarial network, to obtain the invention as specified in the claims.


Claim 8. The limitations of claim 8 are substantially the same as those of claim 1 and are rejected due to the same reasons as outlined above for claim 1.

Claim 15. The limitations of claim 15 are substantially the same as those of claim 1 and are rejected due to the same reasons as outlined above for claim 1.

Claims 2, 9, and 16. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Wu makes obvious “the trained machine-learning model is trained with (i) other sensor-fusion obtained from sensors during real-world drives of vehicles and (ii) annotations identifying object contour data of detections of objects from among the sensor-fusion data” (par 81: “… the sequential DNN may use ground truth data that is generated from a combination of training image data and training sensor data (e.g., using cross-sensor fusion)… to generate bounding boxes corresponding to locations of objects…”; par 84: “… training data… through cross-sensor fusion. As a result of the labels or annotations representing ground truth being automatically generated… produced from real and/or raw data (e.g., designed and produced from real-world data…” NOTE: real-world data is real-world drives of vehicles; par 94: “… data augmentation… images may be similarly augmented such that the ground truth labels and annotations are accurate…”; par 232: “… the training data may be generated by the vehicles… in some examples, the training data is tagged (e.g., where the neural network benefits from supervised learning…”; par 35: “… the image data 102 captured from a forward-facing camera… and/or boundaries of the driving surface…”).

Also Shaikh_2017 makes obvious “the trained machine-learning model is trained with (i) other sensor-fusion obtained from sensors during real-world drives of vehicles” (page 7 step 3: “train discriminator on real data for n epochs...”).


Claims 3, 10, and 17. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Wu also makes obvious “the trained machine-learning model is a generative adversarial network (GAN) model” (paragraph 40, which recites “... the sequential DNN(s) 104 may include.... generative adversarial... types of machine learning models...”); “the processing system is configured to generate a sensor-fusion occupancy map directly from the visualization via the GAN model in which the sensor-fusion representation is a part of the sensor-fusion occupancy map” (par 136: “… location data (e.g., the HD map 1422 of FIG 14C), location data (e.g., the vehicle’s location 1400 location, such as on a map), direction, location of other vehicles (e.g., an occupancy grid), information about objects…”; par 142: “… generating an occupancy grid…”; par 145: “… create and update the occupancy grid…”; par 5: “… the system of the present disclosure uses correlations between sensor data and image data to generate ground truth data for training a sequential DNN to predict… 2D object motion, and 3D object motion…”; par 40: “the sequential DNN(s) 104 may include… generative adversarial…”); and “the processing system is configured to extract the object contour data based on occupancy criteria of the sensor-fusion occupancy map and provide the object contour data as the sensor-fusion detection estimate” (par 81: “… the sequential DNN may use… cross-sensor fusion… to generate bounding boxes corresponding to location of objects in each image…”; page 26 item 3: “… generating… a bounding shape corresponding to the object…”).
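For illustration only, the claimed step of extracting object contour data based on occupancy criteria of an occupancy map can be pictured as thresholding a grid and taking the bounds of each connected group of occupied cells. The sketch below is not taken from Wu, the other cited references, or the instant application; the function name, the 0.5 threshold, the 4-connectivity rule, and the example grid are all assumptions.

```python
from collections import deque


def extract_bounds(grid, threshold=0.5):
    """Return one (row_min, col_min, row_max, col_max) box per occupied region.

    A cell is treated as occupied when its value is >= threshold
    (hypothetical occupancy criterion). Occupied cells that touch
    horizontally or vertically are grouped into one region.
    """
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] < threshold:
                continue
            # Breadth-first flood fill over the connected occupied cells.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and grid[ny][nx] >= threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Hypothetical 3 x 4 occupancy grid with two separate occupied regions.
example_grid = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
example_boxes = extract_bounds(example_grid)
```

With the hypothetical grid above, the sketch yields two bounding shapes, one spanning rows 0 to 1 and columns 1 to 2, and one covering the single cell at row 2, column 3.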

Claims 4, 11, and 18. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Wu also makes obvious “wherein the visualization includes a multi-channel pixel image in which the simulation object is in a channel for simulation objects that is distinct from the other channel” (par 135: “… stereo camera…”; par 144: “one or more stereo cameras 1468 may also be included in a front-facing configuration…”; par 146: “… stereo camera…”; par 139: “… clear pixel camera, such as cameras with an RCCC, and RCCB, and/or RGBC color filter array may be used…”).

Claims 5, 12, and 19. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Nygaard_2017 also makes obvious “the processing system is configured to receive location data of the simulation object as input along with the visualization” (col 3 lines 5 – 15: “… behavior plan may include an object path. The simplest object path may include a linear path between the initial geographic location and a second geographic location… a series of locations as opposed to only two… as an example, the fictitious sensor data may include information such as location, direction, speed, physical characteristics (size, shape, etc.) for each virtual object…”).

Wu further makes obvious “to generate the sensor-fusion representation of the simulation object via the trained machine-learning model; and the trained machine-learning model includes a generative adversarial network (GAN) model” (par 40: “… the sequential DNN(s) 104 may include… this is not intended to be limiting. For example, and without limitation, the sequential DNN(s) 104 may more broadly include any type of machine learning model, such as… generative adversarial…”).

Claims 6, 13, and 20. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Wu further makes obvious “wherein the visualization includes a two-dimensional top view of the simulation object within the region” (par 32: “… in a bird’s-eye view in 2D world space…”). Nygaard_2017 also makes obvious “wherein the visualization includes a two-dimensional top view of the simulation object within the region” (col 2 lines 17 – 30: “… bird’s eye view of a selection of roadway…”).

Claims 7 and 14. Wu and Nygaard_2017 and Wei_2016 and Shaikh_2017 make obvious all the limitations of claims 1, 8, and 15 as outlined above. Wu further makes obvious “wherein the sensor-fusion representation is based on a plurality of sensors including at least a camera, a satellite-based sensor, a light detection and ranging sensor, and a radar sensor” (par 33: “… LIDAR… RADAR… SONAR…”; par 35: “… more than one camera or other sensor (e.g., LIDAR sensor, RADAR sensor, etc.) may be used to incorporate multiple fields of view…”). Nygaard_2017 also makes obvious “wherein the sensor-fusion representation is based on a plurality of sensors including at least a camera, a satellite-based sensor, a light detection and ranging sensor, and a radar sensor” (col 1 lines 10 – 15: “… using various sensors such as camera, radar, lasers, and other similar devices…”).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN S COOK whose telephone number is (571)272-4276. The examiner can normally be reached 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamini S. Shah, can be reached at 571-272-2279. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRIAN S COOK/Primary Examiner, Art Unit 2146