DETAILED ACTION
This action is in response to the communications filed on 09/07/2022, in which claims 1, 2 and 15 are amended and claim 5 is canceled. Claims 1-4, 6-12 and 15 are therefore pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-4, 7-9, 11-12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Schlosser ("Fusing LIDAR and images for pedestrian detection using convolutional neural networks") in view of Ros ("The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes").

In regard to claims 1 and 15, Schlosser teaches: A computer device for training a deep neural network, the computer device comprising: (Schlosser, section F. Experimental Design "Training of each network was performed from scratch with 450,000 iterations in Caffe [11] on several NVIDIA K20 and K40 GPUs [A computer device].", i.e. a GPU cluster is a computer cluster including a computer device.)
a receiver configured to receive a two-dimensional input image frame;  (Schlosser, section B. Base Network Design "… we chose fixed dimensions of 368×160 pixels (height × width) [a 2D input image frame] for the network input…")
the deep neural network configured to examine the two-dimensional input image frame in view of objects being included in the two-dimensional input image frame, wherein the deep neural network comprises a plurality of hidden layers and an output layer representing a decision layer;
(Schlosser, abstract "In this paper, we explore various aspects of fusing LIDAR and color imagery for pedestrian [object] detection in the context of convolutional neural networks (CNNs)…"; section I. Introduction "...utilize sensor data to identify pedestrians and their coordinates. Convolutional neural networks (CNNs) operating on color image data channels... To extend this work, we explore fusing multi-modal sensor data... We perform this evaluation on a training/validation split of the KITTI urban pedestrian detection, which has..."; see Fig. 1 and Table I: data [368×160 pixels] is an input layer, conv 1-5, pooling, relu 1-7, fc 6-7 are hidden layers; fc8 is an output layer generating 2 outputs, i.e. pedestrians are objects, and the CNN includes hidden and output layers.)
…an output configured to output a result of the deep neural network based on the model, (Schlosser, see Fig. 1 and Table I: fc8 is an output layer generating 2 outputs; see also Ros, Fig. 6: segmented image output by the network.)
wherein the trainer is further configured to provide a hierarchical training, and wherein the hierarchical training includes using a baseline model to increase a capability of the model by additionally using more complex images. (Schlosser, F. Experimental Design "In training and testing our CNNs, we utilized the KITTI dataset and benchmarks [6] throughout. Ground truth objects in KITTI have been classified by the benchmark as easy, medium, or hard, depending on the difficulty of identifying the object within the context of the full frame."; G. Results "Figure 5 shows the percent improvement over the original proposal baseline (DPM using RGB only), for the three difficulty categories as well as averaged over all categories"; also see Fig. 5 "Percent improvement in map in the three difficulty categories...", i.e. training using easy, medium, or hard categories is a hierarchical training, which includes different complexity of the images.)
a trainer configured to train the deep neural network using transfer learning… (Schlosser, section 2 "For network training, development, and design, we employ various tools and strategies... As pre-trained models are commonly used as a starting point upon which fine-tuning can be performed, we utilize the ubiquitous ImageNet [2] models when fine-tuning our networks..."; F. Experimental Design "... In the case of the better mid to late fusion architectures, we also fine-tuned from the pre-trained model trained on 1.2 million ImageNet images", i.e. [transfer learning]: a pre-trained/initial model is transferred to an enhanced model by training/fine-tuning with ImageNet images.)

Schlosser does not teach, but Ros teaches: a synthetic data generator for generating synthetic images, wherein the synthetic images are generated for different counts of objects; (Ros, section 3. The SYNTHIA Dataset "Here we describe our synthetic dataset of urban scenes, which we call the SYNTHetic collection of Imagery and Annotations (SYNTHIA)."; section 3.1 Virtual World Generator "SYNTHIA has been generated by rendering a virtual city created with the Unity development platform [38]."; figure 2 "Dynamic objects catalogue of SYNTHIA. (Top) vehicles examples; (middle) cyclists; (bottom) pedestrians."; p. 3234 abstract "In this paper, we propose to use a virtual world to automatically generate realistic synthetic images with pixel-level annotations… we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations."; p. 3237 "SYNTHIA consists of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes, i.e., sky, building, road, sidewalk, fence, vegetation, lane-marking, pole, car, traffic signs, pedestrians, cyclists and miscellaneous (see Fig. 1)."; see Fig. 1: a sample frame is a synthetic image including different objects such as pedestrians, cars, signs, etc., and in the center image of Fig. 1 there are two signs colored pink and a greater number of pedestrians/cars colored dark gray/purple, etc. [different counts of objects].)
a trainer configured to train the deep neural network… based on the synthetic images for generating a model comprising trained parameters; and (Ros, section 4: "During training the contraction blocks are initialized using VGG-F [7] for T-Net and VGG-16 [36] for FCN, pretrained on ILSVRC [31]." and section 5.2: "In our first experiment we evaluate the capability of SYNTHIA-Rand in terms of the generalization of the trained models on state-of-the-art datasets ... The networks trained on just synthetic data [synthetic images] produce good results recognizing roads, buildings, cars and pedestrians in the presented datasets."; section 4.1. Architectures Specification "We use weighted cross-entropy as a loss function for both architectures, where the weights [trained parameters] are computed as the inverse frequencies of each of the classes for the training data [2]...", i.e. the DNN is initialized with ILSVRC data and trained for the target domain with synthetic images.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have applied the pedestrian detection using CNN of Schlosser to the synthetic images of Ros in the training stages. Doing so would significantly improve performance on the semantic segmentation task. (Ros, abstract "We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.")

Claim 15 recites substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claim 15.

In regard to claim 3, reference is made to the rejection of claim 1, and further, Schlosser teaches: wherein the trainer is further configured to use an initial model of the deep neural network to initialize parameters of the deep neural network. (Schlosser, F. Experimental Design "... the weights from the publicly-available models are used to initialize the network parameters.")

In regard to claim 4, reference is made to the rejection of claim 1, and further, Schlosser teaches: wherein the trainer is further configured to perform transfer learning from an initial model to a baseline model of the deep neural network, from the baseline model to an enhanced model of the deep neural network, from the initial model to the enhanced model of the deep neural network, and/or from the enhanced model to an improved model of the deep neural network, or any combination thereof. (Schlosser, section 2 "For network training, development, and design, we employ various tools and strategies... As pre-trained models [e.g. an initial model] are commonly used as a starting point upon which fine-tuning can be performed, we utilize the ubiquitous ImageNet [2] models when fine-tuning our networks..."; F. Experimental Design "... In the case of the better mid to late fusion architectures, we also fine-tuned from the pre-trained model trained on 1.2 million ImageNet images", i.e. a pre-trained/initial model is transferred to an enhanced model by training/fine-tuning with ImageNet images)

In regard to claim 7, reference is made to the rejection of claim 1, and further, Schlosser does not teach, but Ros teaches: wherein the objects are objects before a background of the two-dimensional input image frame. (Ros, figure 2 "Dynamic objects catalogue of SYNTHIA. (Top) vehicles examples; (middle) cyclists; (bottom) pedestrians.";  section 5.2, "the networks trained on just synthetic data produce good results recognizing roads, buildings, cars and pedestrians in the presented datasets".)
The rationale for combining the teachings of Schlosser and Ros is the same as set forth in the rejection of claim 1.

In regard to claim 8, reference is made to the rejection of claim 1, and further, Schlosser does not teach, but Ros teaches: wherein the objects are pedestrians. (Ros, figure 2 "Dynamic objects catalogue of SYNTHIA. (Top) vehicles examples; (middle) cyclists; (bottom) pedestrians.";  section 5.2, "the networks trained on just synthetic data produce good results recognizing roads, buildings, cars and pedestrians in the presented datasets".)
The rationale for combining the teachings of Schlosser and Ros is the same as set forth in the rejection of claim 1.

In regard to claim 9, reference is made to the rejection of claim 1, and further, Schlosser does not teach, but Ros teaches: wherein the trainer is further configured to train the deep neural network using a combination of an activation function, a linear neuron output in a first step and a cross entropy loss, a squared error loss in a second step, or any combination thereof. (Ros, section 4.1: "Fig. 6 shows a graphical schema of T-Net. The architecture is based on a combination of contraction, expansion blocks and a soft-max classifier. Contraction blocks consist of convolutions, batch normalization, ReLU [e.g. activation function] and max-pooling with indices storage. Expansion blocks consist of an unpooling of the blob using the pre-stored indices, convolution, batch normalization and ReLU... We use weighted cross-entropy as a loss function for both architectures".)
The rationale for combining the teachings of Schlosser and Ros is the same as set forth in the rejection of claim 1.
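By way of illustration, the weighted cross-entropy loss quoted from Ros section 4.1, with class weights computed as inverse class frequencies, can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used by Ros: the function names (inverse_frequency_weights, softmax, weighted_cross_entropy), the use of integer class labels, and the normalization by total weight are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its relative frequency (assumed form)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of -log p(true class); labels index into each logit list."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits)
        w = weights[label]
        total_loss += -w * math.log(probs[label])
        total_weight += w
    return total_loss / total_weight


# Hypothetical toy data: class 1 is rare, so it receives the larger weight.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy([[0.0, 0.0]] * 4, labels, weights)
```

Under this weighting, errors on a rare class such as pedestrians contribute more to the loss than errors on a frequent class such as road.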

In regard to claim 11, reference is made to the rejection of claim 1, and further, Schlosser does not teach, but Ros teaches: wherein the output layer is configured to provide a classification of the objects, is configured to provide a regression value, is configured to generate images, or any combination thereof. (Ros, see Fig. 6 "Output Seg." and Fig. 7; section 4.2 "The aim of this work is to show that the use of synthetic data helps to improve semantic segmentation results on real imagery."; section 5.2, "The networks trained on just synthetic data produce good results recognizing roads, buildings, cars and pedestrians in the presented datasets.", see Fig. 6, the output layer provides segmentation of images.)
The rationale for combining the teachings of Schlosser and Ros is the same as set forth in the rejection of claim 1.

In regard to claim 12, reference is made to the rejection of claim 1, and further, Schlosser does not teach, but Ros teaches: wherein the result of the deep neural network includes a probability distribution, a single value, a decision, images, or any combination thereof. (Ros, see Fig. 6 "Output Seg." and Fig. 7; section 4.2 "The aim of this work is to show that the use of synthetic data helps to improve semantic segmentation results on real imagery."; section 5.2, "The networks trained on just synthetic data produce good results recognizing roads, buildings, cars and pedestrians in the presented datasets.", see Fig. 6 and 7, the result is segmentation of images.)
The rationale for combining the teachings of Schlosser and Ros is the same as set forth in the rejection of claim 1.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Schlosser in view of Ros in further view of Benzschawel (US 20050131790 A1).

In regard to claim 2, reference is made to the rejection of claim 1, and further, Schlosser and Ros do not teach, but Benzschawel teaches: wherein the output is further configured to feed back the result of the deep neural network to the trainer. (Benzschawel, [0033] "The training of the neural network model at 230 is now described with reference to 300 in FIG. 3. At 310... At 330, from calculations of the chosen values for the input variables and weights by the neural network model, the difference between the given output (i.e., output variable) of the network and its desired output (i.e., the actual market move from the training data) is backpropagated by adjusting slightly each of the weights in the network in proportion to their ability to reduce the output error, i.e., optimizing the weights... the gradient descent method aims at small, step-wise reductions in model errors by feedback adjustments (trainings) based on adjustments of each of the weights in the model...")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the pedestrian detection using CNN of Schlosser and Ros to include the back-propagation of Benzschawel. Doing so would allow the training to adjust, and eventually optimize, the weights. (Benzschawel, [0033] "the difference between the given output (i.e., output variable) of the network and its desired output (i.e., the actual market move from the training data) is backpropagated by adjusting slightly each of the weights in the network in proportion to their ability to reduce the output error, i.e., optimizing the weights.")
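The feedback adjustment quoted from Benzschawel [0033], in which the difference between the given output and the desired output is used to adjust each weight in small steps, can be illustrated with a single linear neuron. This is a minimal sketch under stated assumptions, not the model of Benzschawel or of the claimed invention: the function name train_linear_neuron, the squared-error loss, the learning rate, and the toy data are hypothetical.

```python
def train_linear_neuron(samples, lr=0.1, epochs=500):
    """Fit output = w * x + b by step-wise gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = w * x + b
            # The output error is fed back to the weight update.
            error = output - target
            # Gradient of 0.5 * error**2 with respect to w is error * x,
            # and with respect to b is error.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical training data drawn from target = 2 * x + 1.
w, b = train_linear_neuron([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

Each update moves a weight in proportion to how much that weight contributed to the output error, which corresponds to the small, step-wise reductions described in the quoted passage.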

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Schlosser in view of Ros in further view of Seguí ("Learning to count with deep object features").

In regard to claim 6, reference is made to the rejection of claim 1, and further, Schlosser and Ros do not teach, but Seguí teaches: wherein the deep neural network is configured to provide as result a count of the objects in the two-dimensional input image frame. (Seguí, abstract "We also present preliminary results about a deep network that is able to count the number of pedestrians in a scene".)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have applied the pedestrian detection using CNN of Schlosser and Ros to solve the counting problems of Seguí. Doing so would provide good performance results of counting the number of pedestrians from a surveillance camera. (Seguí, 5 Conclusions "In this paper we explore the task of counting occurrences of a concept of interest with CNN... The proposal is illustrated in two synthetic scenarios: learning to count even handwritten digits in an image and counting the number of pedestrians form a surveillance camera... The performance results on these problems is high. This suggests that the task of counting even digits may be used as a surrogate for finding good representations for these new tasks... The results in this scenario are encouraging...")

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Schlosser in view of Ros in further view of Kingma ("Adam: A Method for Stochastic Optimization").

In regard to claim 10, reference is made to the rejection of claim 9, and further, Schlosser and Ros do not teach, but Kingma teaches: wherein the trainer is further configured to train the deep neural network using regularization. (Kingma, section 6.2: "Stochastic regularization methods, such as dropout, are an effective way to prevent over-fitting and often used in practice due to their simplicity... We compare the effectiveness of Adam to other stochastic first order methods on multi-layer neural networks trained with dropout noise."; section 6.1 "We evaluate our proposed method on L2-regularized multi-class logistic regression using the MNIST dataset.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the pedestrian detection using CNN of Schlosser and Ros to include the stochastic regularization of Kingma. Doing so would prevent over-fitting during training of the CNN. (Kingma, section 6.2: "Stochastic regularization methods, such as dropout, are an effective way to prevent over-fitting and often used in practice due to their simplicity.")
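The two forms of regularization named in the Kingma citations, an L2 penalty and dropout, can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the procedure of Kingma: the function names l2_penalty and dropout, the penalty form lam * sum(w^2), and the "inverted" dropout scaling are hypothetical choices made for the example.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term added to the training loss (assumed form)."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Dividing by `keep` leaves the expected activation unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical usage with a seeded generator for repeatability.
rng = random.Random(0)
penalty = l2_penalty([1.0, 2.0], lam=0.5)
dropped = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng)
```

The penalty discourages large weights, while dropout injects noise during training; both are commonly used to limit over-fitting, consistent with the rationale stated above.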

Response to Arguments
Applicant's amendments with respect to the claim objections have been fully considered and are sufficient to overcome the objections. The objections have been withdrawn.
Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered but are moot:
Applicant argues: (see p. 8 top): “… However, nowhere does Ros disclose synthetic images generated specifically for different counts of objects. This is because Ros only discloses frames rendered from multiple viewpoints from a virtual city, and the data of the frames rendered is only associated to the virtual city. Thus, instead of providing a broad range of different data (i.e., different counts of objects) to improve the training of neural networks, Ros discloses that the data is limited to the data of the frames from the virtual city. It follows that Ros does not teach…” 

Examiner answers: The arguments do not apply to the new citations from the reference (Ros) relied upon in the current rejection. The claim has been amended to recite “different counts of objects”; additional citations are therefore used to teach synthetic images with different counts of objects. See Fig. 1: a sample frame is a synthetic image including different objects such as pedestrians, cars, signs, etc., and in the center image of Fig. 1 there are two signs colored pink and a greater number of pedestrians/cars colored dark gray/purple, etc. [different counts of objects]. See details in the 35 U.S.C. 103 section above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.C./Examiner, Art Unit 2122                 

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122