Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Preliminary Amendment
The preliminary amendment filed on February 25, 2021 has been entered.
In view of the amendment to the claims, the amendment of claims 6, 9-11, 13, 15, 18 and 22 has been acknowledged. Claims 20 and 21 have been canceled.

The preliminary amendment filed on September 16, 2021 has been entered.
In view of the amendment to the claims, claim 22 has been canceled. Claims 1-19 are pending in the present application.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 7, 9-10 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Tremblay et al ("Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW); pages 1082-1090, June 18, 2018, listed in IDS submitted by Applicant on 02/25/2021) in view of Tremblay et al (U.S. Patent Application Publication 2019/0251397 A1, hereinafter referred to as “Tremblay_2018”).

	Regarding claim 1, Tremblay discloses a method implemented by one or more processors, the method comprising: 
identifying a size at which to render a foreground three-dimensional (3D) object model in a foreground layer for a synthetic image (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene; page 1085, top, FIG. 2, second row shows a car as a "foreground three-dimensional object" rendered. Thus, a certain size of synthetic objects “car” is identified to render a foreground three-dimensional (3D) object model in a foreground layer); 
for each of a plurality of randomly selected background 3D object models (Page 1083, section 3,  a random number of geometric shapes are added to the scene; page 1084, top, FIG. 1, random flying distractors (geometric shapes next to the background images) in a scene): 
rendering the background 3D object model, at a corresponding background location in a background layer for the synthetic image, with a corresponding rotation (Page 1083, section 3,  a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image; page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene. Thus, a random number of geometric shapes are added at a corresponding background location in a background layer for the synthetic image); 
rendering the foreground 3D object model at a foreground location in the foreground layer (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene), the rendering of the foreground 3D object model being at the size and being at a given rotation of the foreground 3D object model (Page 1083, section 3, we begin with 3D models of objects of interest (such as cars). A random number of these objects are placed in a 3D scene at random positions and orientations. As shown in FIGS. 1 and 2, the car is rendered as the foreground 3D object model at a certain size and a certain orientation “rotation”); 
generating the synthetic image based on fusing the background layer and the foreground layer (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene with random lighting from random viewpoints. Before rendering, random texture is applied to the objects of interest as well as to the flying distractors. The resulting images, along with automatically-generated ground truth (right), are used for training a deep neural network); 
assigning, to the synthetic image, a ground truth label for the rendering of the foreground 3D object model (Page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network); and 
providing a training instance, that includes the synthetic image paired with the ground truth label, for training of at least one machine learning model based on the training instance (Page 1083, section 3, a random number of these objects are placed in a 3D scene at random positions and orientations. To better enable the network to learn to ignore objects in the scene that are not of interest, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image. The resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network).
	However, Tremblay does not specifically disclose rendering the background 3D object model with a corresponding size that is determined based on the size at which the foreground 3D object model is to be rendered.
	In the similar field of endeavor, Tremblay_2018 discloses (Paragraph [0021], FIG. 1A illustrates a block diagram of a labeled training data generation system 100, in accordance with an embodiment. The labeled training data generation system 100 includes a graphics processing unit (GPU) 110, a task-specific training data computation unit 115, and an input image generator 120; paragraph [0038], FIG. 2A illustrates a block diagram of another labeled training data generation system 200, in accordance with an embodiment. The labeled training data generation system 200 includes a graphics processing unit (GPU) 110, the task-specific training data computation unit 115, and an input image generator 220) rendering the background 3D object model with a corresponding size (Paragraph [0037], a random number of rendered 3D geometric shapes may be inserted into the input image. The rendered geometric shapes may be referred to as flying distractors; paragraph [0022], the GPU 110 receives a 3D synthetic object (object of interest) and rendering parameters; paragraph [0039], the GPU 110 also renders the geometric shapes according to the rendering parameters to produce rendered images of the geometric shapes. The rendering parameters may specify a position and/or orientation of the geometric shape in a 3D scene, a position and/or orientation of a virtual camera, one or more texture maps, one or more lights including color, type, intensity, position and/or orientation, and the like; paragraph [0043], each rendered geometric shape may be scaled in size and/or rotated. In an embodiment, the number of the rendered geometric shapes and the position, scale, and/or rotation for each rendered object of interest is defined by the rendering parameters. In addition, paragraph [0039] of Tremblay_2018 describes “In an embodiment, the geometric shape may be rendered according to different rendering parameters to produce additional rendered images of the object of interest”. 
The Examiner interprets that both the 3D synthetic object (object of interest) and the geometric shape are rendered with the same rendering parameters. Thus, rendering 3D geometric shapes with a corresponding size that is determined based on the size of the 3D synthetic object (object of interest)) that is determined based on the size at which the foreground 3D object model is to be rendered (Paragraph [0039], the GPU 110 renders the 3D objects of interest as previously described to produce rendered images of objects of interest; paragraph [0022], the GPU 110 receives a 3D synthetic object (object of interest) and rendering parameters. The GPU 110 processes the 3D object according to the rendering parameters to generate a rendered image of the 3D object, specifically, a rendered image of the object of interest … The rendering parameters may specify a position and/or orientation of the object of interest in a 3D scene …; paragraph [0032], each rendered object of interest may be scaled in size and/or rotated. In an embodiment, the number of rendered objects of interest and the position, scale, and/or rotation for each rendered object of interest is defined by the rendering parameters).
	Tremblay and Tremblay_2018 are analogous art because both pertain to generating synthetic images for training deep neural networks. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Tremblay to incorporate the teachings of Tremblay_2018, applying the synthetic image generation taught by Tremblay_2018 to provide the rendering parameter for generating and scaling the 3D object of interest in size, in addition to rendering the geometric shapes “not of interest” using the rendering 

	Regarding claim 7, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay further discloses wherein for each of a plurality of the selected background 3D object models (Page 1083, section 3, a random number of geometric shapes are added to the scene; page 1084, top, FIG. 1, random flying distractors (geometric shapes next to the background images) in a scene), rendering the selected background 3D object model at the corresponding background location comprises selecting the background location based on no other background 3D object having yet been rendered at the background location (Page 1083, section 3, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image; page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene. Thus, the first selected geometric shape is rendered at a randomly selected location based on no other geometric shape having yet been rendered at the background location).

	Regarding claim 9, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay discloses further comprising: 
selecting an additional background 3D object model (Page 1084, top, FIG. 1, random flying distractors (geometric shapes next to the background images) in a scene, for example, 3D box is selected); 
identifying a random location within a bounding area that bounds the rendering of the foreground 3D object model (Page 1083, section 3, a random number of geometric shapes are added to the scene. We call these flying distractors; page 1084, top, FIG. 1, the bounding box is selected in the upper-left image of the resulting images); and 
rendering the additional background 3D object model, in the random location and in an occlusion layer of the synthetic image (Page 1083, section 3, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image), rendering the additional background 3D object model comprising scaling the additional background 3D object model before rendering so as to occlude only a portion of the rendering of the foreground 3D object model (Page 1084, left hand, number, types, colors, and scales of distractors, selected from a set of 3D models (cones, pyramids, spheres, cylinders, partial toroids, arrows, pedestrians, trees, etc.); page 1084, top, FIG. 1, the upper-left image of the resulting images shows the selected 3D box occluding a portion of the car); 
wherein generating the synthetic image is based on fusing the background layer, the foreground layer, and the occlusion layer (Page 1084, top, FIG. 1, the upper-left image shows the resulting image based on fusing the background, the car and the 3D box).
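
As a purely illustrative sketch of the fusing step recited in claim 9 (not code taken from Tremblay, Tremblay_2018, or the present application), the three layers can be composited back to front. The `fuse_layers` name, the nested-list image representation, and the use of `None` to mark a transparent pixel are all assumptions made for this example.

```python
def fuse_layers(*layers):
    """Composite equally sized layers back to front.

    Each layer is a list of rows; a pixel value of None is treated as
    transparent (an assumption of this sketch), so later layers only
    overwrite earlier ones where they actually contain a rendering.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    height, width = len(layers[0]), len(layers[0][0])
    fused = [[None] * width for _ in range(height)]
    for layer in layers:  # order: background, foreground, occlusion
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:
                    fused[y][x] = layer[y][x]
    return fused


# Hypothetical 1x3 layers: "b" = background object, "f" = foreground, "o" = occluder.
background = [["b", "b", "b"]]
foreground = [[None, "f", "f"]]
occlusion = [[None, None, "o"]]
synthetic_image = fuse_layers(background, foreground, occlusion)
```

In this toy example the occluder covers only one of the two foreground pixels, which corresponds to occluding only a portion of the foreground rendering.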

	Regarding claim 10, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay further discloses wherein the foreground 3D object model is selected from a corpus of foreground 3D object models (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center); left hand, number and types of objects, selected from a set of 36 downloaded 3D models of generic sedan and hatchback cars), wherein the background 3D object models are randomly selected from a corpus of background 3D object models (Page 1084, top, FIG. 1, random flying distractors (geometric shapes next to the background images)), and wherein the corpus of foreground objects and the corpus of background objects are disjoint (Page 1084, left hand, number and types of objects, selected from a set of 36 downloaded 3D models of generic sedan and hatchback cars; number, types, colors, and scales of distractors, selected from a set of 3D models (cones, pyramids, spheres, cylinders, partial toroids, arrows, pedestrians, trees, etc.)).

	Regarding claim 13, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay further discloses wherein the ground truth label comprises a bounding shape for the foreground object (Page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes); FIG. 1 shows the resulting images, along with automatically-generated ground truth (right)), a six-dimensional (6D) pose for the foreground object, and/or a classification for the foreground object.

	Regarding claim 14, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 13), and Tremblay further discloses wherein the ground truth label comprises the bounding shape (Page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes)), and wherein the bounding shape is a two-dimensional bounding box (Page 1084, top, FIG. 1 shows the resulting images, along with automatically-generated ground truth (right); the bounding box is a two-dimensional bounding box).
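
For illustration only, a two-dimensional bounding box ground truth label of the kind discussed for claims 13 and 14 could be derived from a mask of the rendered foreground object as sketched below. The `bounding_box_2d` name and the binary-mask representation are assumptions of this sketch; Tremblay states only that the labels are automatically generated and does not give this procedure.

```python
def bounding_box_2d(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all truthy mask pixels.

    `mask` is a list of rows, with truthy values where the foreground
    object was rendered. Returns None when the object is not visible.
    """
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 3x4 foreground mask.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
label = bounding_box_2d(mask)
```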

	Regarding claim 15, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay further discloses wherein rendering the foreground 3D object model at the foreground location in Page 1083, section 3, a random number of these objects are placed in a 3D scene at random positions and orientations; page 1085, right hand, each car instance was randomly picked from a set of 36 models).

Claims 2, 6 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Tremblay et al ("Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW); pages 1082-1090, June 18, 2018, listed in IDS submitted by Applicant on 02/25/2021) in view of Tremblay et al (U.S. Patent Application Publication 2019/0251397 A1, hereinafter referred to as “Tremblay_2018”) in view of Armeni et al (U.S. Patent Application Publication 2018/0077376 A1).

	Regarding claim 2, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay discloses further comprising: 
wherein, for each of the selected background 3D object models, rendering the selected background 3D object model with the corresponding size (Page 1083, section 3,  a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image; page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene. Thus, a random number of geometric shapes are added at a corresponding background location in a background layer for the synthetic image) comprises: 
selecting a corresponding scaling value (Page 1084, left hand, number, types, colors, and scales of distractors, selected from a set of 3D models (cones, pyramids, spheres, cylinders, partial toroids, arrows, pedestrians, trees, etc.)); 
scaling the selected background 3D object model, based on the corresponding scaling value, to generate a corresponding scaled background 3D object model (Page 1084; top, FIG. 1, geometric shapes next to the background images are scaled in a scene before rendering, random texture is applied to the objects of interest as well as to the flying distractors); and 
rendering the scaled background 3D object model at the corresponding background location in the background layer (Page 1083, section 3, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image).    
However, Tremblay does not specifically disclose determining, based on the size at which to render the foreground 3D object model, a range of scaling values. 
In the similar field of endeavor, Armeni discloses (Paragraph [0027], FIG. 1 illustrates an example embodiment of a system for generating synthetic images; paragraph [0029], FIG. 2 illustrates an example embodiment of the flow of information in a system for generating synthetic images and the operations that are performed by a synthetic-image-generation device; paragraph [0080], FIG. 13 illustrates an example embodiment of a system for generating synthetic images) determining, based on the size at which to render the foreground 3D object model (Paragraph [0086], the scene-composition module 1303B includes instructions that, when executed, or circuits that, when activated, cause the synthetic-image-generation device 1300 to select a size for an object model, select a pose of the object model, add a support plane to a scene, add a background plane to a scene, deform the background plane, add a texture to an object model, add a background image to a support plane, or add a background image to a background plane; paragraph [0030], in block B201, the synthetic-image-generation device 200 obtains one or more scene components 220 (e.g., from cameras, from other computing devices, from storage, from a library-storage device) and selects scene components 220 for a synthetic scene. This includes selecting one or more object models 221 (e.g., a CAD model), such as an object model 221 that belongs to one or more object categories for which synthetic images are desired. FIG. 3B illustrates example embodiments of object models 221 for the `chair` and `table` object categories. Also, FIG. 4A illustrates example embodiments of object models 421A-C in a `furniture` category, of which object model 421C is a selected object model), a range of scaling values (Paragraph [0038], also for example, to scale the dimensions of an object model 221, some embodiments of the synthetic-image-generation device 200 perform operations that can be described by the following pseudo code: …; paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model. Starting with the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B. The scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B but larger than the initial scale 527A in this example).
Tremblay and Armeni are analogous art because both pertain to generating synthetic images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Tremblay to incorporate the teachings of Armeni, applying the synthetic image generation taught by Armeni to provide the scaling range for selecting a scaling value to adjust the dimension of the selected object to match the scene’s scale. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Tremblay according to the relied-upon teachings of Armeni to obtain the invention as specified in the claim.

Regarding claim 6, the combination of Tremblay in view of Tremblay_2018 in view of Armeni discloses everything claimed as applied above (see claim 2), and Tremblay further discloses wherein, for each of the selected background 3D object models, selecting the corresponding scaling value comprises randomly selecting the corresponding scaling value (Page 1084, left hand, number, types, colors, and scales of distractors, selected from a set of 3D models (cones, pyramids, spheres, cylinders, partial toroids, arrows, pedestrians, trees, etc.); page 1083, section 3, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image).
However, Tremblay does not specifically disclose selecting the corresponding scaling value, from amongst all scaling values within the range of scaling values.
In the similar field of endeavor, Armeni discloses selecting the corresponding scaling value, from amongst all scaling values within the range of scaling values (Paragraph [0029], FIG. 2 illustrates an example embodiment of the flow of information in a system for generating synthetic images and the operations that are performed by a synthetic-image-generation device; paragraph [0038], also for example, to scale the dimensions of an object model 221, some embodiments of the synthetic-image-generation device 200 perform operations that can be described by the following pseudo code: … the scale value is selected within the range; paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model. Starting with the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B. The scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B but larger than the initial scale 527A in this example).
Tremblay and Armeni are analogous art because both pertain to generating synthetic images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Tremblay to incorporate the teachings of Armeni, applying the synthetic image generation taught by Armeni to provide the scaling range for selecting a scaling value to adjust the dimension of the selected object to match the scene’s scale. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Tremblay according to the relied-upon teachings of Armeni to obtain the invention as specified in the claim.
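
For illustration only, the limitations at issue in claims 2 and 6 (determining a range of scaling values from the size at which the foreground 3D object model is to be rendered, then randomly selecting a scaling value from within that range) can be sketched as follows. The function names and the 0.5 and 2.0 bounding factors are hypothetical values chosen for this example; they are not taken from Tremblay, Tremblay_2018, Armeni, or the present application.

```python
import random


def scaling_range(foreground_size, low_factor=0.5, high_factor=2.0):
    """Return (min_size, max_size) for background objects.

    The range is derived from the foreground render size; the two
    factors are illustrative assumptions, not disclosed values.
    """
    if foreground_size <= 0:
        raise ValueError("foreground_size must be positive")
    return (low_factor * foreground_size, high_factor * foreground_size)


def select_scaling_value(model_size, foreground_size, rng):
    """Randomly pick a scale factor for one background 3D object model.

    A target size is drawn uniformly from the foreground-dependent range
    and converted into a multiplicative scale for the unscaled model.
    """
    min_size, max_size = scaling_range(foreground_size)
    target_size = rng.uniform(min_size, max_size)
    return target_size / model_size


rng = random.Random(0)  # seeded only to make the example repeatable
scales = [select_scaling_value(2.0, 100.0, rng) for _ in range(5)]
```

With a foreground size of 100 and an unscaled model size of 2, every selected scale factor in this sketch falls between 25 and 100, so the scaled background objects stay within a size band tied to the foreground object.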

	Regarding claim 11, the combination of Tremblay in view of Tremblay_2018 discloses everything claimed as applied above (see claim 1), and Tremblay discloses further comprising: 
generating an additional synthetic image that includes the foreground 3D object model rendered at a different size than the size at which the foreground 3D object is rendered in the synthetic image (Page 1086, left hand, during training, we applied the following data augmentations: random brightness, random contrast, and random Gaussian noise. We also included more classic augmentations to our training process, such as random flips, random resizing, box jitter, and random crop), and that includes alternative background 3D object models rendered at corresponding alternative sizes determined based on the smaller size at which the foreground 3D object model is rendered (Page 1084, top, FIG. 1, the resulting images show four synthetic images, each synthetic image includes the rendered alternative background 3D object model “geometric shape” corresponding to the synthetic objects (in this case cars, top-center) based on the different size at which the synthetic object is rendered in the additional synthetic image); 
assigning, to the additional synthetic image, an additional ground truth label for the rendering of the foreground 3D object model in the additional synthetic image (Page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network); and 
providing an additional training instance, that includes the additional synthetic image paired with the additional ground truth label (Page 1083, section 3, a random number of these objects are placed in a 3D scene at random positions and orientations. To better enable the network to learn to ignore objects in the scene that are not of interest, a random number of geometric shapes are added to the scene. We call these flying distractors. Random textures are then applied to both the objects of interest and the flying distractors. A random number of lights of different types are inserted at random locations, and the scene is rendered from a random camera viewpoint, after which the result is composed over a random background image. The resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network), for further training of the at least one machine learning model subsequent to training of the at least one machine learning model based on the training instance (Page 1083, section 3, our approach to using domain randomization (DR) to generate synthetic data for training a neural network is illustrated in Fig. 1; page 1086, left hand, for all architectures, training was stopped when performance on the test set saturated to avoid overfitting, and only the best results are reported …Table 1 compares the performance of the three architectures when trained on VKITTI versus our DR dataset).
However, Tremblay does not specifically disclose the foreground 3D object model rendered at a smaller size than the size at which the foreground 3D object is rendered in the synthetic image.
In the similar field of endeavor, Armeni discloses (Paragraph [0027], FIG. 1 illustrates an example embodiment of a system for generating synthetic images; paragraph [0029], FIG. 2 illustrates an example embodiment of the flow of information in a system for generating synthetic images and the operations that are performed by a synthetic-image-generation device; paragraph [0080], FIG. 13 illustrates an example embodiment of a system for generating synthetic images) the foreground 3D object model rendered at a smaller size than the size at which the foreground 3D object is rendered in the synthetic image (Paragraph [0086], the scene-composition module 1303B includes instructions that, when executed, or circuits that, when activated, cause the synthetic-image-generation device 1300 to select a size for an object model, select a pose of the object model, add a support plane to a scene, add a background plane to a scene, deform the background plane, add a texture to an object model, add a background image to a support plane, or add a background image to a background plane; paragraph [0030], in block B201, the synthetic-image-generation device 200 obtains one or more scene components 220 (e.g., from cameras, from other computing devices, from storage, from a library-storage device) and selects scene components 220 for a synthetic scene. This includes selecting one or more object models 221 (e.g., a CAD model), such as an object model 221 that belongs to one or more object categories for which synthetic images are desired. FIG. 3B illustrates example embodiments of object models 221 for the `chair` and `table` object categories. Also, FIG. 4A illustrates example embodiments of object models 421A-C in a `furniture` category, of which object model 421C is a selected object model; paragraph [0038], also for example, to scale the dimensions of an object model 221, some embodiments of the synthetic-image-generation device 200 perform operations that can be described by the following pseudo code: … the scale value is selected within the range; paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model. Starting with the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B. The scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B but larger than the initial scale 527A in this example).
Tremblay and Armeni are analogous art because both pertain to generating synthetic images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Tremblay to incorporate the teachings of Armeni, applying the synthetic image generation taught by Armeni to provide the scaling range for selecting a scaling value to adjust the dimension of the selected object to match the scene’s scale. 

	Regarding claim 12, the combination of Tremblay in view of Tremblay_2018 in view of Armeni discloses everything claimed as applied above (see claim 11), and Tremblay discloses further comprising: 
training the machine learning model based on the training instance (Page 1086, left hand, every architecture was trained on a batch size of 4 on an NVIDIA DGX Station. (We have also trained on a Titan X with a smaller batch size with similar results.)); and 
subsequent to training the machine learning model based on the training instance: training the machine learning model based on the additional training instance (Page 1086, right hand, in an additional experiment, we explored the benefits of fine-tuning [34] on real images after first training on synthetic images. For fine-tuning, the learning rate was decreased by a factor of ten while keeping the rest of the hyperparameters unchanged, the gradient was allowed to fully flow from end-to-end, and the Faster R-CNN network was trained until convergence).
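
The fine-tuning schedule quoted from Tremblay (page 1086) can be illustrated with a minimal sketch. The `TrainConfig` type, its field names, and the base learning rate are hypothetical placeholders that do not come from the reference; only the batch size of 4 and the factor-of-ten learning-rate reduction are taken from the passages quoted above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for training hyperparameters."""
    learning_rate: float  # assumed base value, not stated in the quoted text
    batch_size: int       # quoted text: batch size of 4


def fine_tune_config(base: TrainConfig) -> TrainConfig:
    """Derive the fine-tuning configuration from the synthetic-data one.

    Per the quoted passage, the learning rate is decreased by a factor of
    ten and the remaining hyperparameters are kept unchanged.
    """
    return replace(base, learning_rate=base.learning_rate / 10.0)


# Stage 1: train on synthetic images; stage 2: fine-tune on real images.
synthetic_cfg = TrainConfig(learning_rate=1e-3, batch_size=4)
real_cfg = fine_tune_config(synthetic_cfg)
```

The sketch only models how the second training stage reuses the first stage's settings; it does not implement the Faster R-CNN training itself.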

Claims 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Armeni et al (U.S. Patent Application Publication 2018/0077376 A1) in view of Tremblay et al ("Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW); pages 1082-1090, June 18, 2018, listed in IDS submitted by Applicant on 02/25/2021).

	Regarding claim 16, Armeni discloses a method implemented by one or more processors, the method comprising: 
selecting a foreground three-dimensional (3D) object model (Paragraph [0027], FIG. 1 illustrates an example embodiment of a system for generating synthetic images; paragraph [0029], FIG. 2 illustrates an example embodiment of the flow of information in a system for generating synthetic images and the operations that are performed by a synthetic-image-generation device; paragraph [0030], in block B201, the synthetic-image-generation device 200 obtains one or more scene components 220 (e.g., from cameras, from other computing devices, from storage, from a library-storage device) and selects scene components 220 for a synthetic scene. This includes selecting one or more object models 221 (e.g., a CAD model), such as an object model 221 that belong to one or more object categories for which synthetic images are desired. FIG. 3B illustrates example embodiments of object models 221 for the `chair` and `table` object categories. Also, FIG. 4A illustrates example embodiments of object models 421A-C in a `furniture` category, of which object model 421C is a selected object model; paragraph [0072], FIG. 11 illustrates an example embodiment of an operational flow for generating synthetic images; paragraph [0074], block B1102, where a synthetic-image-generation device obtains one or more object models); 
generating, with the foreground 3D object model at a first scale (Paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model … the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B; paragraph [0074], in block B1104, the synthetic-image-generation device adds the one or more object models to a synthetic scene. The flow then moves to block B1106, where synthetic-image-generation device selects respective sizes and poses for the one or more object models), a plurality of first scale rotations for the foreground 3D object model (Paragraph [0033], FIG. 4B illustrates an example embodiment of an object model and viewpoints that observe the object model in different poses. In this example, the object model 421 is a model of a chair. Also, five viewpoints 441A-E from which the object model 421 can be observed are labeled, although this object model 421 and other object models can be viewed from many more viewpoints. From the perspective of each viewpoint 441A-E, the object model 421 is in a different pose. Thus, to change the pose of the object model 421 that is observed by a viewer (e.g., a simulated image sensor), the viewer may be moved to a different viewpoint (e.g., one of viewpoints 441A-E) or the object model 421 may be moved (e.g., rotated on one or more axis, translated on one or more axis)); 
for each of the plurality of first scale rotations for the foreground 3D object model (Paragraph [0040], to account for the synthetic scene's context, the synthetic-image-generation device 200 adds two planes to the synthetic scene: a support plane and a background plane.  The support plane may be a two- or three-dimensional object, and the background plane may be another two- or three-dimensional object.  FIG. 5B illustrates example embodiments of a support plane 542 and a background plane 543, as well as an object model 521): 
rendering the foreground 3D object model, at a corresponding one of the first scale rotations and at the first scale (Paragraph [0049], in block B202 the synthetic-image-generation device 200 also adds a texture 222 (e.g., texture image) to the object model 221 … The texture 222 that is applied to the object model 221 may be an image that depicts a material that can compose the object model 221 in the real-world; paragraph [0075], in block B1114, the synthetic-image-generation device adds respective textures to the one or more object models), in a corresponding randomly selected location in a corresponding first scale foreground layer (Paragraph [0053], for example, some embodiments of the synthetic-image-generation device 200 first adjust the simulated image sensor's location so that the object model's projection fits on the image plane. Given this new image-sensor location, some embodiments of the synthetic-image-generation device 200 shift the image sensor in such a way that (a) it introduces a variety and randomness in the composition of the synthetic 3D scene during the generation process, and (b) the distances of the object model and the background plane from the image sensor fall within the image sensor's maximum range); 
generating first scale synthetic images, generating each of the corresponding first scale synthetic images comprising: 
fusing a corresponding one of the corresponding first scale foreground layers with a corresponding one of a plurality of disjoint first scale background layers that each Paragraph [0075], in block B1116, the synthetic-image-generation device applies one or more respective background images to the background plane and the support plane; paragraph [0035], when composing a modality-consistent synthetic scene, the synthetic-image-generation device 200 may account for three issues: First, the synthetic-image-generation device 200 may account for the scale of the scene.  In a depth image, the size of the object matters: An object model 221 of an arbitrary scale or in a different unit system than the rest of the synthetic scene may produce a synthetic multi-modal-image pair that does not comply with real-world object dimensions and thus real-world depth images.  Second, the synthetic-image-generation device 200 may account for the synthetic scene's context.  To generate an appropriate context for a synthetic scene in an image that has only color information (e.g., RGB data), the object model is placed in front of a background image 224, for example a background image 224 that depicts a random scene); 
generating first scale training instances that each include a corresponding one of the first scale synthetic images (Paragraph [0076], in block B1120, the synthetic-image-generation device generates a multi-modal-image pair based on the synthetic scene; paragraph [0079], FIG. 12 illustrates an example embodiment of an operational flow for deep learning. The flow starts in block B1200 and then proceeds to block B1202, where the synthetic-image-generation device obtains respective libraries of object models, textures, and background images.  Next, in block B1204, the synthetic-image-generation device generates multi-modal-image pairs, for example as described in FIG. 11); 
generating, with the foreground 3D object model at a second scale that is a smaller scale than the first scale (Paragraph [0077], the flow then moves to block B1122, where the synthetic-image-generation device determines if another multi-modal-image pair is to be generated.  If yes (block B1122=Yes), then the flow proceeds to block B1124.  In block B1124, the synthetic-image-generation device alters the scene.  For example, the synthetic-image-generation device may change the size of an object model, the pose of an object model, the position of the image sensor, one or more textures, one or more background images, or the deformation of the background plane; paragraph [0039], the scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B), a plurality of second scale rotations for the foreground 3D object model (Paragraph [0033], FIG. 4B illustrates an example embodiment of an object model and viewpoints that observe the object model in different poses.  In this example, the object model 421 is a model of a chair.  Also, five viewpoints 441A-E from which the object model 421 can be observed are labeled, although this object model 421 and other object models can be viewed from many more viewpoints.  From the perspective of each viewpoint 441A-E, the object model 421 is in a different pose.  Thus, to change the pose of the object model 421 that is observed by a viewer (e.g., a simulated image sensor), the viewer may be moved to a different viewpoint (e.g., one of viewpoints 441A-E) or the object model 421 may be moved (e.g., rotated on one or more axis, translated on one or more axis)); 
for each of the plurality of second scale rotations for the foreground 3D object model (Paragraph [0077], as they repeatedly perform the operations in block B1124): 
rendering the foreground 3D object model, at a corresponding one of the second scale rotations and at the second scale (Paragraph [0077], some embodiments of the synthetic-image-generation device rotate an object model incrementally around the x, y, and z axes in rotation angles that range from -10° to 10° on the x axis, from 0° to 20° on the y axis, and from 70° to 100° on the z axis), in a corresponding randomly selected location in a corresponding second scale foreground layer (Paragraph [0053], for example, some embodiments of the synthetic-image-generation device 200 first adjust the simulated image sensor's location so that the object model's projection fits on the image plane. Given this new image-sensor location, some embodiments of the synthetic-image-generation device 200 shift the image sensor in such a way that (a) it introduces a variety and randomness in the composition of the synthetic 3D scene during the generation process, and (b) the distances of the object model and the background plane from the image sensor fall within the image sensor's maximum range); 
generating second scale synthetic images, generating each of the corresponding second scale synthetic images comprising: 
fusing a corresponding one of the corresponding second scale foreground layers with a corresponding one of a plurality of disjoint second scale background layers that Paragraph [0035], when composing a modality-consistent synthetic scene, the synthetic-image-generation device 200 may account for three issues: First, the synthetic-image-generation device 200 may account for the scale of the scene.  In a depth image, the size of the object matters: An object model 221 of an arbitrary scale or in a different unit system than the rest of the synthetic scene may produce a synthetic multi-modal-image pair that does not comply with real-world object dimensions and thus real-world depth images.  Second, the synthetic-image-generation device 200 may account for the synthetic scene's context.  To generate an appropriate context for a synthetic scene in an image that has only color information (e.g., RGB data), the object model is placed in front of a background image 224, for example a background image 224 that depicts a random scene); 
generating second scale training instances that each include a corresponding one of the second scale synthetic images (Paragraph [0076], in block B1120, the synthetic-image-generation device generates a multi-modal-image pair based on the synthetic scene; paragraph [0079], FIG. 12 illustrates an example embodiment of an operational flow for deep learning. The flow starts in block B1200 and then proceeds to block B1202, where the synthetic-image-generation device obtains respective libraries of object models, textures, and background images.  Next, in block B1204, the synthetic-image-generation device generates multi-modal-image pairs, for example as described in FIG. 11); 
training a machine learning model based on the first scale training instances prior to training of the machine learning model based on the second scale training instances (Paragraph [0077], the flow then moves to block B1122, where the synthetic-image-generation device determines if another multi-modal-image pair is to be generated. If yes (block B1122=Yes), then the flow proceeds to block B1124. In block B1124, the synthetic-image-generation device alters the scene. Thus, the machine learning model is trained based on the first scale training instances “based on selected object 527B” prior to training of the machine learning model based on the second scale training instances “based on selected object 527C”).
However, Armeni does not specifically disclose randomly selected background images are 3D object models;
a corresponding ground truth label for the rendering of the foreground 3D object model in the corresponding one of the first scale synthetic images;
a corresponding ground truth label for the rendering of the foreground 3D object model in the corresponding one of the second scale synthetic images.
In the similar field of endeavor, Tremblay discloses (Abstract, we present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator—such as lighting, pose, object textures, etc.—are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest) randomly selected background images are 3D object models (Page 1083, section 3,  a random number of geometric shapes are added to the scene);
a corresponding ground truth label for the rendering of the foreground 3D object model in the corresponding one of the first scale synthetic images (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene; page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network);
a corresponding ground truth label for the rendering of the foreground 3D object model in the corresponding one of the second scale synthetic images (Page 1086, left hand, during training, we applied the following data augmentations: random brightness, random contrast, and random Gaussian noise. We also included more classic augmentations to our training process, such as random flips, random resizing, box jitter, and random crop; page 1084, left hand, number, types, colors, and scales of distractors, selected from a set of 3D models (cones, pyramids, spheres, cylinders, partial toroids, arrows, pedestrians, trees, etc.); page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network).
Armeni and Tremblay are analogous art because both pertain to generating synthetic images for deep learning. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Armeni to incorporate the teachings of Tremblay, applying the synthetic image generation taught by Tremblay to provide the 3D object models for the background and to generate ground truth labels for training the neural network. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Armeni according to the relied-upon teachings of Tremblay to obtain the invention as specified in the claim.
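
For illustration only, the per-scale rotation sweep mapped above for claim 16 can be sketched as follows. The angle ranges are those quoted from Armeni, paragraph [0077]; the step size, the pixel-based random location, the image size, and the names `rotation_sweep` and `scene_specs` are assumptions made for this sketch and are not taken from either reference.

```python
import itertools
import random

# Rotation ranges in degrees, as quoted from Armeni, paragraph [0077].
ROTATION_RANGES = {"x": (-10, 10), "y": (0, 20), "z": (70, 100)}


def rotation_sweep(step_deg):
    """Enumerate (rx, ry, rz) tuples at a fixed increment.

    The increment is an assumption; the quoted text only says the object
    model is rotated "incrementally" within the ranges above.
    """
    axes = [range(lo, hi + 1, step_deg) for lo, hi in ROTATION_RANGES.values()]
    return list(itertools.product(*axes))


def scene_specs(scale, step_deg, image_size, rng):
    """Pair every rotation with one scale and a randomly selected location.

    The location is modeled here as a random pixel coordinate, which is a
    simplification chosen for this sketch.
    """
    width, height = image_size
    return [
        {
            "scale": scale,
            "rotation": rotation,
            "location": (rng.randrange(width), rng.randrange(height)),
        }
        for rotation in rotation_sweep(step_deg)
    ]


rng = random.Random(0)
# First scale, then a smaller second scale (hypothetical values).
first_scale_specs = scene_specs(1.0, 10, (640, 480), rng)
second_scale_specs = scene_specs(0.5, 10, (640, 480), rng)
```

Each resulting specification would then be rendered into a foreground layer and fused with a background layer; those rendering steps are outside the scope of this sketch.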

	Regarding claim 19, Armeni discloses a method implemented by one or more processors, the method comprising: 
training a machine learning model (Paragraph [0079], FIG. 12 illustrates an example embodiment of an operational flow for deep learning. The flow starts in block B1200 and then proceeds to block B1202, where the synthetic-image-generation device obtains respective libraries of object models, textures, and background images. Next, in block B1204, the synthetic-image-generation device generates multi-modal-image pairs, for example as described in FIG. 11) utilizing first scale training instances that each include a corresponding first scale synthetic image (Paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model … the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B; paragraph [0074], in block B1104, the synthetic-image-generation device adds the one or more object models to a synthetic scene. The flow then moves to block B1106, where synthetic-image-generation device selects respective sizes and poses for the one or more object models), wherein the corresponding first scale synthetic images each include one or more corresponding first scale foreground objects (Paragraph [0027], FIG. 1 illustrates an example embodiment of a system for generating synthetic images; paragraph [0029], FIG. 2 illustrates an example embodiment of the flow of information in a system for generating synthetic images and the operations that are performed by a synthetic-image-generation device; paragraph [0030], in block B201, the synthetic-image-generation device 200 obtains one or more scene components 220 (e.g., from cameras, from other computing devices, from storage, from a library-storage device) and selects scene components 220 for a synthetic scene. This includes selecting one or more object models 221 (e.g., a CAD model), such as an object model 221 that belong to one or more object categories for which synthetic images are desired. FIG. 3B illustrates example embodiments of object models 221 for the `chair` and `table` object categories. Also, FIG. 4A illustrates example embodiments of object models 421A-C in a `furniture` category, of which object model 421C is a selected object model; paragraph [0072], FIG. 11 illustrates an example embodiment of an operational flow for generating synthetic images; paragraph [0074], block B1102, where a synthetic-image-generation device obtains one or more object models) that are each within a first range of sizes (Paragraph [0038], also for example, to scale the dimensions of an object model 221, some embodiments of the synthetic-image-generation device 200 perform operations that can be described by the following pseudo code: …; paragraph [0039], FIG. 5A illustrates different dimensions of an example embodiment of an object model. Starting with the initial scale 527A of the object model 521, the scale of the object model 521 is increased to the second scale 527B); 
subsequent to training the machine learning model utilizing the first scale training instances (Paragraph [0077], the flow then moves to block B1122, where the synthetic-image-generation device determines if another multi-modal-image pair is to be generated.  If yes (block B1122=Yes), then the flow proceeds to block B1124.  In block B1124, the synthetic-image-generation device alters the scene.  For example, the synthetic-image-generation device may change the size of an object model, the pose of an object model, the position of the image sensor, one or more textures, one or more background images, or the deformation of the background plane): 
further training the machine learning model utilizing second scale training instances that each include a corresponding second scale synthetic image (Paragraph [0077], in block B1124, the synthetic-image-generation device alters the scene. For example, the synthetic-image-generation device may change the size of an object model, the pose of an object model, the position of the image sensor, one or more textures, one or more background images, or the deformation of the background plane), wherein the corresponding second scale synthetic images each include one or more corresponding second scale foreground objects that are each within a second range of sizes (Paragraph [0039], the scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B but larger than the initial scale 527A in this example); 
wherein the sizes of the second range of sizes are all smaller than the sizes of the first range of sizes (Paragraph [0038], also for example, to scale the dimensions of an object model 221, some embodiments of the synthetic-image-generation device 200 perform operations that can be described by the following pseudo code: …; paragraph [0039], the scale of the object model 521 is then decreased to the third scale 527C, which is smaller than the second scale 527B. Thus, the sizes of the second range of sizes are all smaller than the sizes of the first range of sizes); and 
wherein the corresponding first scale synthetic images, of the first scale training instances, are void of any foreground objects that are within the second range of sizes (Paragraph [0035], in block B202, the synthetic-image-generation device 200 composes one or more modality-consistent synthetic scenes.  When composing a modality-consistent synthetic scene, the synthetic-image-generation device 200 may account for three issues: First, the synthetic-image-generation device 200 may account for the scale of the scene.  In a depth image, the size of the object matters: An object model 221 of an arbitrary scale or in a different unit system than the rest of the synthetic scene may produce a synthetic multi-modal-image pair that does not comply with real-world object dimensions and thus real-world depth images … Second, … the object model is placed in front of a background image 224, for example a background image 224 that depicts a random scene. Third, the synthetic-image-generation device 200 may account for the range of the simulated image sensor … In a depth image, this distance may be important because the ability of image sensors to collect depth information is often limited by a maximum range within which they can accurately collect depth information, and any objects or parts of objects that fall outside this range will not be accurately depicted in the depth image).
However, Armeni does not specifically disclose at least one corresponding label.
In the similar field of endeavor, Tremblay discloses (Abstract, we present a system for training deep neural networks for object detection using synthetic images. To handle the variability in real-world data, the system relies upon the technique of domain randomization, in which the parameters of the simulator—such as lighting, pose, object textures, etc.—are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest) at least one corresponding label (Page 1084, top, FIG. 1, synthetic objects (in this case cars, top-center) are rendered on top of a random background (left) along with random flying distractors (geometric shapes next to the background images) in a scene; page 1083, section 3, the resulting images, with automatically generated ground truth labels (e.g., bounding boxes), are used for training the neural network).
Armeni and Tremblay are analogous art because both pertain to generating synthetic images for deep learning. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the synthetic data generation taught by Armeni to incorporate the teachings of Tremblay, applying the synthetic image generation taught by Tremblay to provide the 3D object models for the background and to generate ground truth labels for training the neural network. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Armeni according to the relied-upon teachings of Tremblay to obtain the invention as specified in the claim.
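
The ordering recited in claim 19 (training on larger first scale instances before smaller second scale instances, with the two size ranges not overlapping) can be pictured with the following minimal sketch. It illustrates the claim language only and is not a method taken from Armeni or Tremblay. The `TrainingInstance` type, the single `foreground_size` value per image, and the example ranges are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingInstance:
    """Hypothetical training instance: one synthetic image and its label."""
    image_id: str
    foreground_size: float  # assumed size measure, e.g. object height in pixels
    label: str


def split_by_scale(instances, first_range, second_range):
    """Split instances into first scale (larger) and second scale (smaller).

    Raises ValueError unless every size in the second range is smaller than
    every size in the first range.
    """
    (f_lo, f_hi), (s_lo, s_hi) = first_range, second_range
    if not s_hi < f_lo:
        raise ValueError("second range must lie entirely below the first range")
    first = [i for i in instances if f_lo <= i.foreground_size <= f_hi]
    second = [i for i in instances if s_lo <= i.foreground_size <= s_hi]
    return first, second


def training_order(instances, first_range, second_range):
    """Return instances so that all first scale instances precede second scale ones."""
    first, second = split_by_scale(instances, first_range, second_range)
    return first + second


instances = [
    TrainingInstance("a", 40.0, "car"),
    TrainingInstance("b", 200.0, "car"),
    TrainingInstance("c", 60.0, "car"),
    TrainingInstance("d", 150.0, "car"),
]
ordered = training_order(instances, first_range=(100.0, 300.0), second_range=(20.0, 80.0))
```

Because the ranges are checked for disjointness, no first scale instance in this sketch can contain a foreground object whose size falls within the second range.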

Allowable Subject Matter
Claims 3-5, 8 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Dependent claim 3 depends from dependent claim 2 and recites additional limitations of “determining a lower bound scaling value of the scaling values based on determining that the lower bound scaling value, if used to scale any one of the background 3D object models before rendering, would result in the corresponding size being at a lower percentage bound of a foreground size, wherein the foreground size is based on the size at which the foreground 3D object model is to be rendered; determining an upper bound scaling value of the scaling values based on determining that the upper bound scaling value, if used to scale any one of the background 3D object models before rendering, would result in the corresponding sizes being at an upper percentage bound of the foreground size” for determining the range of scaling values.

Dependent claim 8 depends from dependent claim 7 and recites additional limitations of “wherein the foreground size is the same as the size at which to render the foreground 3D object model, or is a function of the size and of at least one additional size of at least one additional foreground 3D object model that is also rendered in the foreground layer” for rendering the selected background 3D object model.

Dependent claims 17-18 recite additional limitations of “wherein the corresponding renderings of the corresponding randomly selected background 3D object models, in the first scale background layers, are all of a smaller size than the corresponding renderings of the corresponding randomly selected background 3D object models in the second scale background layers” and “wherein the corresponding renderings of the corresponding randomly selected background 3D object models, in the first scale background layers, are all within a threshold percentage range of the first scale; and wherein the corresponding renderings of the corresponding randomly selected background 3D object models, in the second scale background layers, are all within a threshold percentage range of the second scale” for rendering the selected background 3D object model.
However, the search results fail to show the obviousness of the claims as a whole. None of the prior art cited alone or in combination provides the motivation to teach the above limitations.

Dependent claims 4-5 are objected to for the same reasons, at least by virtue of their dependency from dependent claim 3.

	Conclusion
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Xilin Guo whose telephone number is (571)272-5786. The examiner can normally be reached Monday - Friday 9:00 AM-5:30 PM EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory J Tryder can be reached on 571-270-7365. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XILIN GUO/Primary Examiner, Art Unit 2616