DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is responsive to the amendment received 2/16/2022.

In the response to the Non-Final Office Action dated 8/17/2021, the applicant states that claims 1, 5, 13, and 14 have been amended. Claims 1-20 are pending in the current application.

Response to Arguments
Applicant's arguments filed 2/16/2022 have been fully considered but they are not persuasive. 
Regarding claim 1, the applicant argues that the art of record does not teach or disclose “generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images at perspectives different from any perspective used to obtain the video data; [and] generating a data set that associates the label with the rendered three-dimensional object in each frame of the plurality of simulated frames”. The arguments have been fully considered, but they are not persuasive. The examiner cannot concur with the applicant for the following reasons:
Fathi discloses “generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images”. For example, in paragraph [0012], Fathi teaches producing and generating validated 3D information by cross-referencing. In paragraph [0013], Fathi teaches generating and rendering 3D information from the scene. In paragraph [0050], Fathi teaches obtaining the plurality of 2D images from a video camera. In paragraph [0052], Fathi teaches that a plurality of 2D images are derived from video; Fathi further teaches sequential frames. In paragraph [0109], Fathi teaches that 3D information is derived from the plurality of 2D images of the scene; Fathi further teaches that 3D information about the scene is provided for processing in predictive models; Fathi furthermore teaches that the predictive models are used to identify and label the one or more objects of interest.
Micks discloses “generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images at perspectives different from any perspective used to obtain the video data”. For example, in paragraph [0023], Micks teaches generating a scenario of models of vehicles moving along models of roads. In paragraph [0049], Micks teaches the various sensors have different frame or scanning rates. In Fig. 3 and paragraph [0050], Micks teaches simulating sensor perception; Micks further teaches the result of the simulating of sensor perception at step 306 may be a streams of sensor outputs, e.g. a series of images, an audio signal, a set of point clouds, a sets of reflections at different time points. In paragraph [0051], Micks teaches sensor outputs for each type of sensor were simulated. In paragraph [0023], Micks teaches detecting and identifying lane-splitting vehicles;
Fathi further discloses “generating a data set that associates the label with the rendered three-dimensional object in each frame of the plurality of simulated frames”. For example, in paragraph [0033], Fathi teaches the attributes of a given object are used to create the association with its corresponding label. In paragraphs [0052-0053], Fathi teaches image frames. In paragraph [0104], Fathi teaches generating predicted labels. In paragraph [0109], Fathi teaches 2D and 3D information are derived from the plurality of 2D images of the scene; Fathi further teaches defining weights for each of the predictive models; Fathi furthermore teaches the predictive models are used to identify and label the one or more objects of interest.

Regarding claim 5, the applicant argues that the art of record does not teach or disclose “generat[ing] a plurality of simulated frames by at least rendering a three-dimensional model of the object within a set of real-world images at perspectives different from any perspective in the instance of the content [and] generating a data set that associates a label of the object with the rendered three-dimensional object in each frame of the plurality of simulated frames”. The arguments have been fully considered, but they are not persuasive. The examiner cannot concur with the applicant for the following reasons:
Micks discloses “generate a plurality of simulated frames by at least rendering a three-dimensional model of the object within a set of real-world images at perspectives different from any perspective in the instance of the content”. For example, in paragraph [0023], Micks teaches generating a scenario of models of vehicles moving along models of roads. In paragraph [0049], Micks teaches the various sensors have different frame or scanning rates. In Fig. 3 and paragraph [0050], Micks teaches simulating sensor perception; Micks further teaches the result of the simulating of sensor perception at step 306 may be a streams of sensor outputs, e.g. a series of images, an audio signal, a set of point clouds, a sets of reflections at different time points. In paragraph [0051], Micks teaches sensor outputs for each type of sensor were simulated. In paragraph [0023], Micks teaches detecting and identifying lane-splitting vehicles; the scenario module 114a may add models of lane-splitting vehicles to the scenario moving in inter-lane areas between vehicles. In paragraph [0031], Micks teaches the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles. In claim 1, Micks teaches rendering a three dimensional (3D) model.
Micks further discloses “generate a data set that associates a label of the object with the rendered three-dimensional object in each frame of the plurality of simulated frames”. For example, in Fig. 1 and paragraph [0019], Micks teaches the vehicle models are based on actual vehicles; Micks further teaches the database 104 stores vehicle models that include the geometry data of the vehicle. In paragraph [0023], Micks teaches generating a scenario of models of vehicles moving along models of roads. In paragraph [0049], Micks teaches the various sensors have different frame or scanning rates. In Fig. 3 and paragraph [0050], Micks teaches simulating sensor perception; Micks further teaches the result of the simulating of 
Fathi discloses “generate a data set that associates a label of the object”. For example, in paragraph [0104], Fathi teaches validated 2D and 3D information are produced by cross-referencing between the projective geometry information, clustered 3D information, and/or segmented 2D image information, and used to label the objects in the scene. In paragraph [0109], Fathi teaches multiple labels are generated for the object(s) of interest.

Regarding claim 13, the applicant argues that the art of record does not teach or disclose “generat[ing] a plurality of simulated frames by at least rendering a three-dimensional model of the object, based on the determined portions of instances of the content that correspond to the object, within a set of real-world images at perspectives different from any perspective in the content.” The arguments have been fully considered, but they are not persuasive. The examiner cannot concur with the applicant for the following reasons:
Micks discloses “generat[ing] a plurality of simulated frames by at least rendering a three-dimensional model of the object, based on the determined portions of instances of the content that correspond to the object, within a set of real-world images at perspectives different from any perspective in the content”. For example, in paragraph [0023], Micks teaches generating a scenario of models of vehicles moving along models of roads. In paragraph [0049], Micks teaches the various sensors have different frame or scanning rates. In Fig. 3 and paragraph [0050], Micks teaches simulating sensor perception; Micks further teaches the result .

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 5, and 13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications generating a plurality of simulated frames”. Therefore, the newly claimed limitation “generating a plurality of simulated frames” as recited in claims 1, 5, and 13 is new matter.
Claims 2-4, 6-12, and 14-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, due to their dependency on claims 1, 5, and 13.
Claim 13 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The specification describes “new simulated data can be generated rapidly” in paragraph [0024]. The specification further describes “generates simulated data” in paragraph [0025]. The specification furthermore describes “the corresponding frames of the video”. However, the specification does not describe “based on the determined portions of instances of the content that correspond to the object”. Therefore, the newly claimed limitation “based on the determined portions of instances of the content that correspond to the object” as recited in claim 13 is new matter.
Claims 14-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, due to their dependency on claim 13.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5 and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims 5 and 13 recite the limitation "the generated training set" in lines 12-13. There is insufficient antecedent basis for this limitation in the claim.


Claims 15 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims 15 and 16 recite the limitation "the mathematical three-dimensional model". There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Fathi (US 20170220887 A1) in view of Micks (US 20170213149 A1).
Regarding claim 1 (Currently Amended), Fathi discloses a computer-implemented method ([0013]: a method of generating information about one or more objects of interest in a scene; [0034]: systems and methods to generate information about one or more objects of interest in a scene; Fig. 1; [0145]: an example of a process 100 and a method; generate 2D 
obtaining video data of a three-dimensional object ([0023]: extract information about more than one object of interest in a scene; [0026]: depth image, video or any captured representation; [0050]: a video camera captures and obtains video and the plurality of 2D images; the plurality of 2D images are obtained from a moving camera device; [0065]: video camera captures video; [0134]: automatically detect, identify, and extract 3D objects; extracted objects; [0146]: the 3D information is generated from the 2D images; [0148]: 3D information is provided; [0174]: a video camera), the video data comprising a plurality of two-dimensional frames each capturing the three-dimensional object from a different perspective ([0026]: depth image, video or any captured representation; [0050]: the plurality of 2D images are obtained from a moving camera device; [0051]: video comprises at least about 24 frames per second; [0102]: capture multiple perspectives of the object  via the use of virtual views; [0151]: the 3D information includes all or part of the one or more objects of interest 230); 
generating, based on the video data, a three-dimensional model of the object ([0034]: 3D information includes 3D models; [0055]: generate 3D information; [0094]: predefined models are established through the use of annotated training data; extracted features are modelled; [0149]: the 3D information and 2D image information is processed to generate projective geometry information that combines the 3D information and 2D image information); 
obtaining a label for the three-dimensional object ([0104]: validated 2D and 3D information are used to label the objects in the scene; [0109]: multiple labels are generated for 
generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images ([0012]: produce and generate validated 3D information by cross-referencing; [0013]: generate 3D information from the scene; [0050]: obtain the plurality of 2D images from a video camera; [0052]: a plurality of 2D images are derived from video; sequential frames; [0109]: 3D information is derived from the plurality of 2D images of the scene; 3D information about the scene is provided for processing in predictive models; the predictive models are used to identify and label the one or more objects of interest); 
generating a data set that associates the label with the rendered three-dimensional object in each frame of the plurality of simulated frames ([0033]: the attributes of a given object are used to create the association with its corresponding label; [0052-0053]: image frames; [0104]: generate predicted labels; [0109]: 2D and 3D information are derived from the plurality of 2D images of the scene; define weights for each of the predictive models; the predictive models are used to identify and/or label the one or more objects of interest).
Fathi fails to explicitly disclose: using the data set to train a model to be used for object recognition.
In the same field of endeavor, Micks teaches:
generating, based on the video data, a three-dimensional model of the object (FIG. 1; [0019]: the vehicle models 106a are based on actual vehicles; [0020]: the vehicle model 106b includes a sensor model 108d; [0022]: the machine learning model 110 is trained using the 
generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images at perspectives different from any perspective used to obtain the video data ([0023]: generate a scenario of models of vehicles moving along models of roads; [0049]: the various sensors have different frame or scanning rates; Fig. 3; [0050]: simulate sensor perception; the result of the simulating of sensor perception at step 306 may be a streams of sensor outputs, e.g. a series of images, an audio signal, a set of point clouds, a sets of reflections at different time points; [0051]: sensor outputs for each type of sensor were simulated; [0023]: detect and identify lane-splitting vehicles; the scenario module 114a may add models of lane-splitting vehicles to the scenario moving in inter-lane areas between vehicles; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles; claim 1: a three dimensional (3D) model);
using the data set to train a model to be used for object recognition ([0022]: the machine learning model 110 is trained using the models 106a-106c; a deep neural network; [0025]: the machine learning model 110 is trained to identify and recognize the location of a lane-splitting vehicle; [0053]: a machine learning model 110 is trained to output for an input stream of data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Fathi to include generating, based on the video data, a three-dimensional model of the object; generating a plurality of simulated frames by at least rendering the three-dimensional model within a set of real-world images at perspectives different from any perspective used to obtain the video data; and using the data set to train a model to be used for object recognition, as taught by Micks. The motivation for doing so would have been to train the machine learning model 110 and to train a model to provide an accurate and correct output based on the input stream of data, as taught by Micks in paragraphs [0022], [0043] and [0053].

Regarding claim 2 (Original), Fathi in view of Micks discloses the computer-implemented method of claim 1, further comprising: 
obtaining an image of an additional three-dimensional object (Fathi; [0023]:  extract information about more than one object of interest in a scene; [0050]: a video camera captures and obtains the plurality of 2D images; the plurality of 2D images are obtained from a moving camera device; [0065]: a single passive video camera; [0134]: automatically detect, identify, and extract 2D/3D objects; extracted objects; [0146]: the 3D information is generated from the 2D images; [0148]: 3D information is provided; [0174]:  a video camera); and 
determining a location of the additional three-dimensional object using the three-dimensional model of the object and the model used for object recognition (Fathi; [0023]: extract information about more than one object of interest in a scene; the one or more objects are identified from the scene; [0044]: provide location information about the objects; [0055]: the 2D images appropriate for recognizing the one or more objects). 

Regarding claim 3 (Original), Fathi in view of Micks discloses the computer-implemented method of claim 1, wherein the video data is generated with an infrared image 
Fathi in view of Micks further discloses wherein the video data is generated with an infrared image sensor, a radar sensor, or a LIDAR sensor (Micks; [0023]: video; [0028]: LIDAR, and RADAR; [0048]: a point cloud from the point of view of the LIDAR sensor; [0049]: point clouds from LIDAR, and reflections from RADAR; [0054]: a LIDAR sensor may product a point cloud, and RADAR and SONAR may record reflections of radio and sound).

Regarding claim 4 (Original), Fathi in view of Micks discloses the computer-implemented method of claim 1, further comprising: 
displaying a representation of a three-dimensional environment on an interactive video display terminal (Fathi; [0135]: interact with the objects; [0141]: provide the user with interactive feedback with regard to the image-acquisition parameters; [0142]: the user display of the output herein thereof is configured to provide user generated inputs; [0160]: one or more display devices; [0170]: one or more interaction devices, such as a touch pad or screen); 
obtaining, via the interactive video display terminal, information that identifies the three-dimensional object (Fathi; [0026]: identify a specific object in a digital image, point cloud, depth image, video or any captured representation; [0035]: allow one or more objects of interest in a scene to be detected, and recognized; [0144]: the image capture device is integrated is configured to elicit and receive from the user a selection of a region/area of interest in a captured images of the object of interest from which 2D image and 3D information 
obtaining a label associated with the three-dimensional object via the interactive video display terminal (Fathi; [0109]: multiple labels are generated for the objects of interest; a final label prediction is made with the assignment defined by the class or label; [0142]: the user display of the output herein thereof is configured to provide user generated inputs to facilitate and enhance generation of the plurality of 2D images, 3D information, and/or the extracted object information; [0144]: other information are provided, e.g., measurements, dimensions, topology, and labels). 

Claims 5-13 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Micks (US 20170213149 A1) in view of Fathi (US 20170220887 A1).
Regarding claim 5 (Currently Amended), Micks discloses a system ([0016]: a general purpose computer; methods, apparatus, and systems; [0019]: a server system and training of a detection system; Fig. 2; [0034]: computing device 200), comprising: 
one or more processors ([0016]: a processor of a general purpose computer); and 
memory storing executable instructions that, as a result of being executed by the one or more processors, cause the system to ([0014]: RAM; ROM; non-transitory medium; the instruction execution system; [0016]: these computer program instructions are provided to a processor of a general purpose computer, special purpose computer for execution): 
use a model generating algorithm to determine portions of instances of content that correspond to an object represented in the content (FIG. 1; [0019]: the vehicle models 106a are 
generate a plurality of simulated frames by at least rendering a three-dimensional model of the object within a set of real-world images at perspectives different from any perspective in the instance of the content ([0023]: generate a scenario of models of vehicles moving along models of roads; [0049]: the various sensors have different frame or scanning rates; Fig. 3; [0050]: simulate sensor perception; the result of the simulating of sensor perception at step 306 may be a streams of sensor outputs, e.g. a series of images, an audio signal, a set of point clouds, a sets of reflections at different time points; [0051]: sensor outputs for each type of sensor were simulated; [0023]: detect and identify lane-splitting vehicles; the scenario module 114a may add models of lane-splitting vehicles to the scenario moving in inter-lane areas between vehicles; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles; claim 1: a three dimensional (3D) model);
generate a data set that associates a label of the object with the rendered three-dimensional object in each frame of the plurality of simulated frames (Fig. 1; [0019]: the vehicle models are based on actual vehicles; the database 104 stores vehicle models that include the geometry data of the vehicle; [0023]: generate a scenario of models of vehicles moving along models of roads; [0049]: the various sensors have different frame or scanning rates; Fig. 3; [0050]: simulate sensor perception; the result of the simulating of sensor perception at step 306 may be a streams of sensor outputs, e.g. a series of images, an audio signal, a set of point 
use the generated training set to train recognition of the object as associated with the label ([0022]: the machine learning model 110 is trained using the models 106a-106c; a deep neural network; [0025]: the machine learning model 110 is trained to identify and recognize the location of a lane-splitting vehicle; [0053]: a machine learning model 110 is trained to output for an input stream of data).
Micks fails to explicitly disclose a label of the object.
In the same field of endeavor, Fathi teaches: generate a data set that associates a label of the object ([0104]: validated 2D and 3D information are produced by cross-referencing between the projective geometry information, clustered 3D information, and/or segmented 2D image information, and used to label the objects in the scene; [0109]: multiple labels are generated for the object(s) of interest).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Micks to include generating a data set that associates a label of the object, as taught by Fathi. The motivation for doing so would have been to provide validated 2D and 3D information that can be produced by cross-referencing between the projective geometry information, clustered 3D information, and/or segmented 2D image information, and used to label the objects in the scene; to train one or more machine learning algorithms to recognize; to train multiple machine learning models as taught by Fathi in paragraphs [0012], [0028], and [0107].


obtaining an object label for the object (Fathi; [0104]: label the objects in the scene; [0109]: multiple labels are generated for the objects of interest);
 associating the object label with the portions of the instances of content (Fathi; [0104]: validated 2D and 3D information are produced by cross-referencing between the projective geometry information, clustered 3D information, and/or segmented 2D image information, and used to label the objects in the scene; [0109]: multiple labels are generated for the object(s) of interest);
Micks in view of Fathi further discloses: 
identifying the object in the portions of the instances of content (Micks; [0019]: training of a detection system; [0023]: generate a scenario of models of vehicles moving along models of roads; detect lane-splitting vehicles; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles); and 
associating the object label with the portions of the instances of content (Micks; Fig. 1; [0019]: the vehicle models 106a are based on actual vehicles; the database 104 stores vehicle models that include the geometry data of the vehicle). 

Regarding claim 7 (Original), Micks in view of Fathi discloses the system of claim 6, wherein: 

the object is identified by a user using an interactive display terminal (Fathi; [0035]: allows one or more objects of interest in a scene to be detected, recognized; [0142]: the user display of the output herein thereof is configured to provide user generated inputs to facilitate and enhance generation of the plurality of 2D images, 3D information, and/or the extracted object information; [0144]: other information are provided, e.g., measurements, dimensions, topology, and labels); and 
the object label is obtained from the user (Fathi; [0104]: generate predicted labels; [0109]: 2D and 3D information are derived from the plurality of 2D images of the scene; define weights for each of the predictive models; the predictive models are used to identify and/or label the one or more objects of interest; [0142]: the user display of the output herein thereof is configured to provide user generated inputs to facilitate and enhance generation of the plurality of 2D images, 3D information, and/or the extracted object information; [0144]: other information are provided, e.g., measurements, dimensions, topology, and labels). 

Regarding claim 8 (Original), Micks in view of Fathi discloses the system of claim 5, wherein the executable instructions, as a result of being executed by the one or more processors (same as rejected in claim 5), further cause the system to: 

identify the portions of the instances of content the object using the three-dimensional model (Micks; [0019]: training of a detection system; [0023]: generate a scenario of models of vehicles moving along models of roads; detect lane-splitting vehicles; [0031]:  the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles; claim 1: a three dimensional (3D) model). 

Regarding claim 9 (Original), Micks in view of Fathi discloses the system of claim 5, wherein: 
the instances of content are images (Micks; [0028]: the controller 122 receives one or more image streams from one or more imaging devices 124; one or more cameras may be mounted to the vehicle and output image streams received by the controller 122); and 
the images are generated in part by adding a rendering of the object to each image in a set of background images (Fathi; [0005]: generate the image having well-defined illumination, background, and object pose, or object position and orientation of the target object relative to the camera; [0076]: segmentation and clustering of the 2D image and 3D information, respectively, of the scenes and the one or more objects in the scene may be conducted by associating regions in the image information; alternative segmentation and clustering may comprise other types of perceptual grouping, such as, foreground vs. background elements; [0096]: region grow is premised on the understanding that pixels belonging to a specific object 

Regarding claim 10 (Original), Micks in view of Fathi discloses the system of claim 9, wherein each image of the images includes a rendering of the object where the object has a different orientation (Fathi; [0005]: generate the image having well-defined illumination, background, and object pose, or object position and orientation of the target object relative to the camera; [0043]: provide information about the orientation and position of the object(s) in a 3D environment; [0044-0045]: the methodology herein can be used to generate information about the orientation or placement of object(s) in the scene; the methodology can generate information about the geometry of all or part of an object(s) of interest, such as angles, and orientation of elements to other elements). 

Regarding claim 11 (Original), Micks in view of Fathi discloses the system of claim 5, wherein: 
the data set is provided to a machine learning system (Micks; [0022]: the database 104 may store a machine learning model 110; the machine learning model 110 is trained using the models; the machine learning model 110 may be a deep neural network, Bayesian network, or other type of machine learning model; [0025]: the sensor outputs are provided as an input data set); and 
the model is a machine learning model that configures the machine learning system to identify the object in additional instances of content (Micks; [0022]: the machine learning 

Regarding claim 12 (Original), Micks in view of Fathi discloses the system of claim 11, wherein: 
the instances of content are frames of a first video stream (Micks; [0023]: scenarios are modeled based on video; [0039]: video projection device; [0050]: a stream of sensor outputs, e.g. a series of images; [0054]: the imaging devices may produce streams of images); and 
the additional instances of content are frames of a second video stream (Micks; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles; [0049]: the various sensors may have different frame or scanning rates; [0052]: the machine learning model 110 may then be generated 312 by updating the model 110 for the sensor outputs input 308 for each time step and the vehicle 404 location at each time step).
Micks in view of Fathi further discloses the additional instances of content are frames of a second video stream (Fathi; [0050]: the plurality of 2D images are obtained from a video camera; [0051]: video suitable for use in the present disclosure comprises at least about 24 frames).

Regarding claim 13 (Currently Amended), Micks discloses a non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of 
generate a plurality of simulated frames by at least rendering a three-dimensional model of the object, based on the determined portions of instances of the content that correspond to the object, within a set of real-world images at perspectives different from any perspective in the content ([0023]: generate a scenario of models of vehicles moving along models of roads; [0049]: the various sensors have different frame or scanning rates; Fig. 3; [0050]: simulate sensor perception; the result of the simulating of sensor perception at step 306 may be a stream of sensor outputs, e.g. a series of images, an audio signal, a set of point clouds, a set of reflections at different time points; [0051]: sensor outputs for each type of sensor were simulated; [0023]: detect and identify lane-splitting vehicles; the scenario module 114a may add models of lane-splitting vehicles to the scenario moving in inter-lane areas between vehicles; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles; claim 1: a three dimensional (3D) model);
The remaining limitations are similar to the limitations recited in claim 5. Therefore, the same rationale used to reject claim 5 is also used to reject the remaining limitations. 



Regarding claim 18 (Original), Micks in view of Fathi discloses the non-transitory computer-readable storage medium of claim 13, wherein the executable instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to: 
generate a simulated environment (Micks; [0021]: enable simulation of perception of the objects; [0024]: the training engine 112 includes a sensor simulation module 114b; a perception of the scenario may be simulated by the sensor simulation module 114b); 
add the object to the simulated environment (Micks; [0024]: the training engine 112 includes a sensor simulation module 114b; a perception of the scenario may be simulated by the sensor simulation module 114b; [0025]: identify the location of a lane-splitting vehicle by inputting the simulated sensor outputs as well as the location of the model of a lane-splitting vehicle in the scenario; [0043]: a lane-splitting vehicle 404 may be placed in the model); and 


Regarding claim 19 (Original), Micks in view of Fathi discloses the non-transitory computer-readable storage medium of claim 13, wherein the instances of content are video frames (Micks; [0023]: scenarios may be modeled based on video; [0031]: identify vehicle images in the image stream; [0039]: video projection device). 

Claims 14-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Micks (US 20170213149 A1) in view of Fathi (US 20170220887 A1), and further in view of Shotton (US 20130156297 A1).
Regarding claim 14 (Currently Amended), Micks in view of Fathi discloses the non-transitory computer-readable storage medium of claim 13, wherein the executable instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to (same as rejected in claim 13): 
obtain the object label for the object based on the determined portions of instances of content that correspond to an object represented in the content (Fathi; [0104]: label the objects in the scene; [0109]: multiple labels are generated for the objects of interest); and
associate the object label with the portions of the instances of content (Fathi; [0104]: validated 2D and 3D information are produced by cross-referencing between the projective geometry information, clustered 3D information, and/or segmented 2D image information, and used to label the objects in the scene; [0109]: multiple labels are generated for the object(s) of interest).
Micks in view of Fathi further discloses:
identifying the object in the portions of the instances of content (Micks; [0019]: training of a detection system; [0023]: generate a scenario of models of vehicles moving along models of roads; detect lane-splitting vehicles; [0031]: the obstacle identification module 132b analyzes the one or more image streams and identifies potential obstacles); and 
associate the object label with the portions of the instances of content (Micks; Fig. 1; [0019]: the vehicle models 106a are based on actual vehicles; the database 104 stores vehicle models that include the geometry data of the vehicle).
Micks in view of Fathi fails to explicitly disclose: adding the labeled training image to the data set.
In the same field of endeavor, Shotton teaches: 
adding a rendering of the mathematical three-dimensional model to a real-world image to produce a labeled training image ([0032]: applying a function learnt by the machine learning system 102; [0039]: the labels may be obtained by applying an interest point detection process to an image rendered from the scene reconstruction); 
adding the labeled training image to the data set ([0027]: label image elements; [0029]: human judges label empirical images with ground truth labels for each of the different tasks; 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Micks in view of Fathi by adding the labeled training image to the data set as taught by Shotton. The motivation for doing so would have been to generate suitable training data, from which a machine learning system learns at least one parameter of a function for performing the image processing task, as taught by Shotton in paragraphs [0007] and [0029].
 
Regarding claim 15 (Original), Micks in view of Fathi and Shotton discloses the non-transitory computer-readable storage medium of claim 14, wherein the executable instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to (same as rejected in claim 13): 
generate a plurality of renderings of the mathematical three-dimensional model in a variety of different orientations (Micks; FIG. 1; [0019]: the vehicle models 106a are based on actual vehicles; [0020]: the vehicle model 106b additionally includes a sensor model 108d; a sensor model 108d indicates the locations of the sensors on the vehicle, the orientations of the sensors, and one or more descriptors of the sensors; [0023]: scenarios are modeled based on video; [0024]); 
add the plurality of renderings to one or more images to produce a plurality of training images (Fathi; [0031]: the training set includes a well-defined association between each of the 
add the plurality of training images to the data set (Shotton; [0027]: label image elements; [0029]: human judges label empirical images with ground truth labels for each of the different tasks; [0031]: form labeled images from a 2 or higher dimensional model of a scene; FIG. 2; [0032]: image elements are labeled; [0039]: the labels may be obtained by applying an interest point detection process to an image rendered from the scene reconstruction). 

Regarding claim 16 (Original), Micks in view of Fathi and Shotton discloses the non-transitory computer-readable storage medium of claim 14, wherein the executable instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to (same as rejected in claim 13): 
generate a plurality of renderings of the mathematical three-dimensional model using a variety of different illumination conditions (Micks; FIG. 1; [0019]: the vehicle models 106a are based on actual vehicles; [0020]: the vehicle model 106b additionally includes a sensor model 108d; [0023]: scenarios are modeled based on video; [0024]); 
add the plurality of renderings to one or more images to produce a plurality of training images (Fathi; [0044-0045]: the methodology herein can be used to generate information about 
add the plurality of training images to the data set (Shotton; [0027]: label image elements; [0029]: human judges label empirical images with ground truth labels for each of the different tasks; [0031]: form labeled images from a 2 or higher dimensional model of a scene; FIG. 2; [0032]: image elements are labeled; [0039]: the labels may be obtained by applying an interest point detection process to an image rendered from the scene reconstruction). 

Regarding claim 20 (Original), Micks in view of Fathi and Shotton discloses the non-transitory computer-readable storage medium of claim 13, wherein the model generating algorithm is a simultaneous localization (Micks; [0020]: a sensor model 108d indicates the locations of the sensors on the vehicle, the orientations of the sensors; [0024]: the training engine 112 may include a sensor simulation module 114b. In particular, for a scenario, and a vehicle included in the scenario including sensor model data 108d, a perception of the scenario by the sensors may be simulated by the sensor simulation module 114b).
Micks in view of Fathi and Shotton further discloses wherein the model generating algorithm is a simultaneous localization and mapping algorithm (Shotton; [0030]: find a mapping from image elements of one image to image elements of another image; [0075]: simultaneous localization and mapping; the real-time camera tracking system 712 may also .

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAI TAO SUN/Primary Examiner, Art Unit 2616