DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/7/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2 and 13 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US Patent Application Publication No. 202020272888 (Wang et al.).
	Regarding claim 1, US Patent Application Publication No. 202020272888 (Wang et al.) discloses: “an apparatus for detecting an object, the apparatus comprising: at least one processor (ABSTRACT: “a processor configured to execute a convolutional neural network that has been trained, the convolutional neural network including a backbone network”; FIG. 1: 10, 14; [0022]: “The computing system 10 may include a processor 14 and associated memory, which in FIG. 1 is depicted as volatile memory 16 and non-volatile memory 18.  These programs and the data they utilize may include image pre-processing programs 20, convolutional neural network (CNN) 22”) configured to extract information for object detection ([0024]: “From the training process, the CNN 22 may learn human part localization and association”) from image data frames (ABSTRACT: “At the backbone network, the processor is configured to receive an input image as input and output feature maps extracted from the input image”; FIG. 13: 204; [0044]: “receiving, at the backbone network 42, an input image 27 as input and outputting feature maps extracted from the input image 27”) based on a hierarchical structure of a convolutional neural network (CNN) (FIG. 3: C1 – C5, 56; [0024]: “residual neural network (RNN) 56 including a plurality of intermediate layers that may be configured as convolutional neural network layers.  In FIG. 3, these intermediate layers include C1 to C5”) and transmit information ([0024]: “The plurality of intermediate layers may be connected on a downstream side to a concatenation layer”; [0044]: “Feature maps from the intermediate layers C1 to C5 may be concatenated as input to downstream convolutional layers, as shown in FIG. 3”) for object detection ([0020]: “detect human body parts to be associated with each other to construct full figures”; [0044]: “As the CCPN learns body part localization and association, feature maps may be extracted from input images”) extracted from an uppermost layer (FIG. 3: C5) of the hierarchical structure to a lower layer (FIG. 3: C1) to detect an object based on information received at each layer; and storage configured to store the information for object detection and detected object information” (FIG. 1: 16, 18; [0022]: “The computing system 10 may include a processor 14 and associated memory, which in FIG. 1 is depicted as volatile memory 16 and non-volatile memory 18”).
	Regarding claim 13, Wang et al. discloses: “extracting information for object detection ([0024]: “From the training process, the CNN 22 may learn human part localization and association”) from image data frames (ABSTRACT: “At the backbone network, the processor is configured to receive an input image as input and output feature maps extracted from the input image”; FIG. 13: 204; [0044]: “receiving, at the backbone network 42, an input image 27 as input and outputting feature maps extracted from the input image 27”) based on a hierarchical structure of a convolutional neural network (CNN) (FIG. 3: C1 – C5, 56; [0024]: “residual neural network (RNN) 56 including a plurality of intermediate layers that may be configured as convolutional neural network layers”); and transmitting information ([0024]: “The plurality of intermediate layers may be connected on a downstream side to a concatenation layer”; [0044]: “Feature maps from the intermediate layers C1 to C5 may be concatenated as input to downstream convolutional layers, as shown in FIG. 3”) for object detection ([0020]: “detect human body parts to be associated with each other to construct full figures”; [0044]: “As the CCPN learns body part localization and association, feature maps may be extracted from input images”) extracted from an uppermost layer (FIG. 3: C5) of the hierarchical structure to a lower layer (FIG. 3: C1) to detect an object based on information received at each layer” ([0024]: “In FIG. 3, these intermediate layers include C1 to C5”).
	With respect to claim 2, Wang et al. discloses: “the at least one processor is configured to detect the object in an one-stage scheme” ([0051]: “the computing system 10 described herein may provide a process for single-stage detection”).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of US Patent Application Publication No. 20170206441 (Miyano).
	Claims 3 and 14 are dependent upon claims 2 and 13, respectively.  As discussed above, claims 2 and 13 are disclosed by Wang et al.  Thus, those limitations of claims 3 and 14 that are recited in claims 2 and 13, respectively, are also disclosed by Wang et al.
	However, Wang et al. does not clearly disclose the remaining limitations of the claims.  To that end, regarding claim 3, Miyano discloses: “the at least one processor (FIG. 4: 170) is configured to correct the detected object information by using the information for object detection” ([0079]: “The mobile object detection model constructing unit 170 generates a mobile object detection model (parameter corresponding to CNN) for detecting a mobile object by using CNN, by using a correct answer data given by the region designation unit 120.  More specifically, the mobile object detection model constructing unit 170 is set so as to be the closest to the given correct answer data”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art at the time of the invention to combine Wang et al. with the invention of Miyano in order to provide correct answer data for detecting a mobile object using CNN (e.g., see Miyano @ [0079]).
	With respect to claim 14, Miyano discloses: “correcting the detected object information by using the information for object detection” ([0079]: “The mobile object detection model constructing unit 170 generates a mobile object detection model (parameter corresponding to CNN) for detecting a mobile object by using CNN, by using a correct answer data given by the region designation unit 120.  More specifically, the mobile object detection model constructing unit 170 is set so as to be the closest to the given correct answer data”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of US Patent Application Publication No. 20190258895 (Sachetti et al.).
	Claim 4 is dependent upon claim 1.  As discussed above, claim 1 is disclosed by Wang et al.  Thus, those limitations of claim 1 that are recited in claim 4 are also disclosed by Wang et al.
	However, Wang et al. does not clearly disclose the remaining limitations of the claim.  To that end, with respect to claim 4, Sachetti et al. discloses: “the information for object detection includes feature information for large object detection and contextual information for small object detection” ([0027]: “an exemplary visual search model 104 is adapted for filtering and ranking of contextually-related content based on exemplary categorical object classifications and other associated data (e.g., feature maps, intent, bounding box identification, contextual signal data and analysis) propagated by an exemplary object detection model 104”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art at the time of the invention to combine Wang et al. with the invention of Sachetti et al. in order to provide feature information (e.g., feature maps) and contextually-related information (e.g., see Sachetti et al. @ [0027]).

Claims 5 - 7 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of US Patent Application Publication No. 20200139973 (Palanisamy et al.).
	Claims 5 - 7 are ultimately dependent upon claim 1.  As discussed above, claim 1 is disclosed by Wang et al.  Thus, those limitations of claim 1 that are recited in claims 5 - 7 are also disclosed by Wang et al.  In addition, regarding claim 5, Wang et al. discloses: “a backbone network” (FIG. 3: “Backbone Network 42”).
	However, Wang et al. does not clearly disclose the remaining limitations of the claim.  To that end, regarding claim 5, Palanisamy et al. discloses: “a backbone network (FIG. 7A: 130-3) for extracting feature information for object detection (FIG. 7A: 129-1 – 129-3; 132-3; [0102]: “Processing of image data (sT) 129-3 begins when CNN module 130-3 receives and processes the image data (sT) 129-3 to extract features and generate a feature map/tensors (represented by the top big cuboid inside 130-3).  A set of region vectors 132-3 that collectively make up the feature map are extracted. Each region vector corresponds to features extracted from a different region/location in the convolutional feature maps/tensors (represented by the top big cuboid inside 130-3)”) and contextual information from the image data frames” (FIG. 7A: 129-1 – 129-3; 135-3; [0102]: “The region vectors 132-3 are used by the spatial attention network 134-3 in calculating the context feature vectors 135-3 (which are then processed by the LSTM 150-3 based temporal attention block 160)”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art at the time of the invention to combine Wang et al. with the invention of Palanisamy et al. in order to provide feature information (e.g., “feature map/tensors”) and context information for object identification (e.g., “context feature vectors”; see Palanisamy et al. @ [0102]).
	With respect to claim 6, Wang et al. discloses: “the backbone network (FIG. 3: 42) includes a scale-based hierarchical feature structure (FIG. 3: C1 – C5, 56) for each of the image data frames” (FIG. 3: 27).
	In addition, Palanisamy et al. discloses: “to substitute time-series data ([0101]: “a time series (t-1, t, t+1 . . . T)”) for data for each scale of the scale-based hierarchical feature structure (FIG. 7A: 130-1 – 130-3; [0101]: “dotted-line boxes”) and output the time-series data” ([0101]: “FIG. 7A includes three dotted-line boxes, where each dotted-line box includes an instance of common elements: an input of image data (s) image data 129, CNN module 130, a set of region vectors 132, hidden state vector 133, attention network 134, spatial context vector 135 and a Long Short-Term Memory (LSTM) network 150.  Each one of the dotted-line boxes represents the actor-critic network architecture 102 being continuously applied to updated information at different steps within a time series (t-1, t, t+1 . . . T).  In other words, each dotted-line box represents processing by the actor-critic network architecture 102 at different instances in time”).
	Regarding claim 7, Palanisamy et al. discloses: “the at least one processor further includes a hidden state top-down structure for receiving the feature information and the contextual information extracted for each layer of each image data frame” ([0114]: “Referring again to FIG. 7A, the LSTM network 150-3 processes the hidden state vector (ht+1) 133-3 and the spatial context vector (ZT) 135-3 to generate an output 152-3 that is equal to the product of a temporal attention weight (wT) and a hidden state vector (hT).  The LSTM network 150-3 will process the temporal information in the network instead of just having stacked historical observations as input.  Also, a longer sequence of history information can be incorporated and considered due to the connectivity through time via RNN, which could help generate more complex driving strategies”).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. in view of Miyano and Palanisamy et al.
	Claim 15 is dependent upon claim 14.  As discussed above, claim 14 is disclosed by the combination of Wang et al. and Miyano.  Thus, those limitations of claim 14 that are recited in claim 15 are also disclosed by the combination of Wang et al. and Miyano.  In addition, with respect to claim 15, Wang et al. discloses: “in a backbone network (FIG. 3: 42) having a scale-based hierarchical feature structure (FIG. 3: C1 – C5, 56) for each of the image data frames” (FIG. 3: 27).
	However, the combination of Wang et al. and Miyano does not clearly disclose the remaining limitations of the claim.  To that end, Palanisamy et al. discloses: “substituting time-series data ([0101]: “a time series (t-1, t, t+1 . . . T)”) for data for each scale of the scale-based hierarchical feature structure” (FIG. 7A: 130-1 – 130-3; [0101]: “dotted-line boxes”).  It is respectfully submitted that it would have been obvious to one of ordinary skill in the art at the time of the invention to further modify the combination of Wang et al. and Miyano with the invention of Palanisamy et al. in order to substitute time-series data for data of a scale-based hierarchical structure (e.g., see Palanisamy et al. @ [0101]).

Allowable Subject Matter
Claims 8-12 and 16-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MYRON K WYCHE whose telephone number is (571)272-3390.  The examiner can normally be reached on 7:30 am - 3:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kathy Wang-Hurst, can be reached at 571-270-5371.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished 




/Myron Wyche/                           2/27/2021
Primary Examiner                      AU2644