DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d).  Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Response to Amendment
The amendment filed 2022-09-28 has been entered.  The status of claims is as follows:
Claims 1-25 are pending in the application.
Claims 1, 3-5, 10-12, 21-22, and 24-25 are amended.
Response to Arguments
Applicant’s arguments with respect to objections to the drawings have been considered and are persuasive, and thus the objections are hereby withdrawn.
Applicant’s arguments with respect to rejections of claims 22 and 24 under 35 USC 112(b) have been considered, and the rejections are hereby withdrawn in light of the amendments to the claims. 
Applicant's arguments with respect to rejections under 35 USC 101 have been fully considered but they are not persuasive.
Applicant argues, regarding claims 1-23 and 25, on Remarks Page 10, that “‘estimating ego motion information based on the input data using a motion recognition model,’ cannot practically be performed in the human mind, as the human mind is not equipped to perform such complex operation of the model.”  Examiner respectfully disagrees.  The claim limitations recite no details of the model structure and therefore do not establish how “complex” the model is; as claimed, the model could be a relatively simple one that can be evaluated with pen and paper.  Examiner also points out that evaluating even a “complex” model could be considered to fall within another abstract-idea grouping, namely “mathematical concepts”.
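For illustration only, the following minimal Python sketch shows what a relatively simple motion recognition model could look like under stated assumptions.  The function `estimate_ego_speed` is hypothetical; it assumes a single stationary point directly ahead of the sensor and a uniform frame interval.  It is not Applicant's model or the model of any cited reference; it merely shows that estimating ego motion from input data of a current frame and a previous frame can reduce to arithmetic that a person could carry out with pen and paper.

```python
def estimate_ego_speed(range_prev_m: float, range_curr_m: float, frame_interval_s: float) -> float:
    """Hypothetical 'motion recognition model': one subtraction and one division.

    Assumes the detected point is stationary and directly ahead of the sensor,
    so any change in range between two frames is due to ego motion.
    """
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    # Range closed between the previous and current frames, divided by elapsed time.
    return (range_prev_m - range_curr_m) / frame_interval_s


# Illustrative values: a 4 Hz sensor (0.25 s per frame) and a static point
# whose range changes from 50.0 m to 47.5 m between consecutive frames.
print(estimate_ego_speed(50.0, 47.5, 0.25))  # 10.0
```

The 0.25 s interval corresponds to a 4 Hz radar such as the one described in Cen; the range values are arbitrary numbers chosen for illustration.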
Applicant argues, regarding Claim 24, on Remarks Pages 10-12, that the claim does not recite a mathematical concept.  Applicant points to PEG Example 38, which recites a sequence of limitations that are not, in themselves, deemed to be mathematical concepts because they do not recite a mathematical relationship, formula, or calculation, in contrast to PEG Example 41, which explicitly recites a formula.  However, Examiner points out that broadly recited “training” of a machine learning model is a mathematical concept, comprising the mathematical operations of iteratively updating model parameters using the gradient of a loss function until the error is minimized.  Applicant also cites the October 2019 guidance, which states that “A claim that recites a numerical formula or equation will be considered as falling within the 'mathematical concepts' grouping. In addition, there are instances where a formula or equation is written in text format that should also be considered as falling within this grouping.”  Examiner points out that this language is not restrictive, and does not state that a mathematical concept is only a numerical formula or equation.  MPEP 2106.04(a)(2)(I) states:  “It is important to note that a mathematical concept need not be expressed in mathematical symbols, because ‘words used in a claim operating on data to solve a problem can serve the same purpose as a formula.’”  Examiner reiterates that this applies to broadly stated “training” of a machine learning model, which is a mathematical concept.
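For illustration only, the following minimal Python sketch trains a hypothetical one-parameter linear model (y = w * x) with a mean squared error loss by gradient descent.  The model, loss, learning rate, and step count are assumptions chosen for simplicity and do not represent Applicant's model; the sketch shows that “training parameters of a model to output reference output data based on reference input data”, when recited without further detail, amounts to repeated mathematical calculation: computing the gradient of a loss and updating a parameter.

```python
def train_linear_model(inputs, targets, learning_rate=0.01, steps=500):
    """Fit y = w * x by gradient descent on mean squared error (illustrative only)."""
    w = 0.0  # model parameter, arbitrarily initialized
    n = len(inputs)
    for _ in range(steps):
        # Gradient of L(w) = (1/n) * sum((w*x - y)^2) with respect to w.
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(inputs, targets))
        # Parameter update: step against the gradient to reduce the loss.
        w -= learning_rate * grad
    return w


# "Reference input data" and "reference output data" related by y = 2x.
w = train_linear_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))  # 2.0
```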
Applicant argues on Remarks Page 13 that “Applicant respectfully submits that the present claims impose a meaningful limit on the claimed features cited by the Office”.  Examiner respectfully disagrees.  Although amended claim 1 recites the additional limitation “controlling a function of an apparatus based on the estimated ego motion information”, this limitation amounts to “Mere Instructions to Apply an Exception” under MPEP 2106.05(f), Subsection (1) of which states:  “Whether the claim recites only the idea of a solution or outcome i.e., the claim fails to recite details of how a solution to a problem is accomplished”.  Here, the claim recites no details of how the ego motion estimation by the model is applied to control a function of an apparatus, and it is therefore unclear what “controlling a function” refers to.
Applicant argues on Remarks Page 13 that the claimed invention “improves the technical field of ego motion estimation”.  However, neither the nature of the alleged improvement nor how the claim limitations would achieve it is evident from the claims, and neither is explained in the remarks.
Applicant's arguments with respect to rejections under 35 USC 103 have been fully considered but they are not persuasive.  Applicant argues on Remarks Page 17 that “As shown, Du merely discloses that ‘each video frame is fed into a CNN part in sequential order to extract semantic visual features individually,’ where Figure 1 shows the 25 frames of the video (i.e., from a first frame F(1) to a twenty-fifth frame F(25)) are sequentially and individually input to the CNN.  That is, Du does not extract a visual feature of a current frame (e.g., F(2)) from the current frame (e.g., F(2)) and a previous frame (e.g., F(1)).  Rather, the visual feature of the current frame (e.g., F(2)) is only extracted from the current frame itself (e.g., F(2)).”  Examiner respectfully disagrees.  Even if Du processes consecutive frames “sequentially”, Du still inputs both the previous and current frames to the CNN, one after the other, and extracts visual features from each.  The visual features of either frame can be interpreted as “current feature data”, since the features of the previous frame were “current” as of the time of the previous frame.  Further, because “current feature data” is not restrictively defined in the Specification, the term may encompass the features of the previous and current frames collectively.  Examiner points out that no limitation of Claims 1-5 requires that the previous and current frames be input to the CNN concurrently as a single input.  In fact, Claims 1-5 recite no details about the structure of the CNN, i.e., its layers or how input is arranged into the layers.  Therefore, Du still reads on these claims as amended.
Examiner further points out that the features argued above are more specifically recited in previously existing limitations of Claims 9, 14, and 20.  For example, Claim 9 recites “wherein each of the layers corresponds to a respective one of the plurality of time frames” and Claims 14 and 20 recite “plurality of time frames stacked in the input data”.  These limitations were taught by Karpathy et al. in the previous Non-Final Office Action.  For the reasons explained above, Examiner does not consider the submitted amendments sufficient to require incorporating Karpathy into the rejections of Claims 1-5, and thus the combination of prior art applied to these claims remains unchanged.  Examiner also draws attention to the Qiao, Sudhakaran, and Muller references in the “prior art not relied upon” section of the Conclusion of this action.
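For illustration only, the following minimal Python sketch models the sequential arrangement described in Du under simplifying assumptions: a hypothetical per-frame averaging function stands in for Du's CNN, and a hypothetical exponential moving average stands in for Du's LSTM.  Neither is Du's actual network.  The sketch shows that, although each frame is passed through the feature extractor individually and one after the other, the output for the current frame is computed from data of both the current frame and the previous frame.

```python
from typing import List, Sequence


def extract_features(frame: Sequence[float]) -> float:
    """Stand-in for a per-frame CNN: summarizes one frame independently."""
    return sum(frame) / len(frame)


def recurrent_estimates(frames: List[Sequence[float]], decay: float = 0.5) -> List[float]:
    """Stand-in for an LSTM: each output mixes the current frame's feature
    with the state carried over from all previous frames."""
    state = 0.0
    outputs = []
    for frame in frames:  # frames are fed in sequential order
        feature = extract_features(frame)  # extracted from this frame alone
        state = decay * state + (1.0 - decay) * feature  # combines with prior frames
        outputs.append(state)
    return outputs


frames = [[0.0, 2.0], [4.0, 4.0]]  # previous frame, current frame
print(recurrent_estimates(frames))  # [0.5, 2.25]
```

Changing only the previous frame changes the output for the current frame, which is the dependence relied on above.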

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-23 and 25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mental process, without significantly more. 
Step 1 Analysis:
Claims 1-22 are directed to a method, Claim 23 is directed to a non-transitory computer-readable storage medium, and Claim 25 is directed to an apparatus.  Therefore, the claims are all directed to one of the four statutory categories of patent eligible subject matter.
Step 2A Prong 1 Analysis:
Claims 1, 23, and 25 recite:  “estimating, using a motion recognition model, ego motion information based on feature data of the current frame extracted from the input data of the current frame and from the input data of the previous frame”; estimating data based on a model can be performed by a human with pen and paper, and is thus a mental process.
Step 2A Prong 2 Analysis:
The judicial exception is not integrated into a practical application because the additional element “generating input data based on radar sensing data collected by one or more radar sensors for each of a plurality of time frames including a current frame and a previous frame” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).  Also, the additional element “controlling a function of an apparatus based on the estimated ego motion information” amounts to “Mere Instructions to Apply an Exception” under MPEP 2106.05(f), Subsection (1) of which states:  “Whether the claim recites only the idea of a solution or outcome i.e., the claim fails to recite details of how a solution to a problem is accomplished”.  Here, the claims recite no details of how the ego motion estimation by the model is applied to control a function of an apparatus, and it is therefore unclear what “controlling a function” refers to.  As per Claim 25, the additional element “one or more radar sensors” amounts to merely applying the judicial exception to a particular field of use and technological environment, as the data gathering is limited to a particular data source (radar).  See MPEP 2106.05(h):  “For instance, a data gathering step that is limited to a particular data source (such as the Internet) or a particular type of data (such as power grid data or XML tags) could be considered to be both insignificant extra-solution activity and a field of use limitation.”  There are no meaningful limits placed on the practice of the judicial exception.
Step 2B Analysis:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above, the additional element “generating input data based on radar sensing data collected by one or more radar sensors for each of a plurality of time frames including a current frame and a previous frame” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).  Also, the additional element “controlling a function of an apparatus based on the estimated ego motion information” amounts to “Mere Instructions to Apply an Exception” under MPEP 2106.05(f), Subsection (1) of which states:  “Whether the claim recites only the idea of a solution or outcome i.e., the claim fails to recite details of how a solution to a problem is accomplished”.  Here, the claims recite no details of how the ego motion estimation by the model is applied to control a function of an apparatus, and it is therefore unclear what “controlling a function” refers to.  As per Claim 25, the additional element “one or more radar sensors”, as discussed above, amounts to merely applying the judicial exception to a particular field of use and technological environment.  There is no indication that the data is collected from the sensors in a novel or unconventional manner, and therefore this additional element does not amount to significantly more than the judicial exception.  The claims are directed to the judicial exception.
Dependent Claims 2-22 have been determined to also not integrate the judicial exception into a practical application, for the following reasons.
Claim 2 recites “wherein the estimating of the ego motion information comprises: extracting feature data from the input data using a first model of the motion recognition model; and determining the ego motion information based on the feature data using a second model of the motion recognition model”; determining information based on a model can be performed by a human with pen and paper, and is thus a mental process, while additional element “extracting feature data” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 3 recites “wherein the estimating of the ego motion information comprises: determining either one or both of a position and a pose of an apparatus as the ego motion information”; this determining can be performed by a human with pen and paper, and is thus a mental process.
Claim 4 recites “wherein the estimating of the ego motion information comprises: inputting, as the input data, radar sensing data corresponding to at least two time frames into a layer of the motion recognition model corresponding to one of the at least two time frames”; inputting the radar sensing data amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 5 recites “wherein the estimating of the ego motion information comprises: extracting current feature data from input data of a current frame and a previous frame of the time frames, using a first model; and determining current ego motion information based on the current feature data, using a second model”; determining information based on a model can be performed by a human with pen and paper, and is thus a mental process, while additional element “extracting feature data” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 6 recites “wherein the estimating of the ego motion information comprises: extracting subsequent feature data from input data of a subsequent frame of the time frames and the current frame, using the first model; and determining subsequent ego motion information based on the subsequent feature data, using the second model”; determining information based on a model can be performed by a human with pen and paper, and is thus a mental process, while additional element “extracting feature data” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 7 recites “wherein the extracting of the subsequent feature data comprises excluding the input data of the previous frame from the extracting of the subsequent feature data”; “extracting feature data”, or a subset thereof, amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 8 recites “wherein the first model comprises a convolutional neural network and the second model comprises a recurrent neural network”; by merely reciting neural network models without further details, this claim is directed to another abstract idea, namely a mathematical concept. 
Claim 9 recites “wherein the motion recognition model comprises: a first model including layers, wherein each of the layers corresponds to a respective one of the plurality of time frames; and a second model connected to the layers of the first model, and wherein the estimating of the ego motion information comprises: extracting, using a layer of the layers in the first model corresponding to a time frame of the plurality of time frames, feature data from input data of the time frame; and determining ego motion information of the time frame based on the extracted feature data using on the second model”; the claim merely recites that the model comprises two models, with no details on how each model is constructed or what type of models they are, and thus extracting and determining information with the constituent models still amounts to a mental process.
Claim 10 recites “wherein the estimating of the ego motion information comprises: extracting current feature data from input data corresponding to a current frame using a first model; loading previous feature data corresponding to a previous frame from a memory; and determining the ego motion information based on the previous feature data and the current feature data using a second model”; determining the information can be performed by a human with pen and paper, and is thus a mental process, while extracting feature data and loading it from memory amount to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 11 recites “storing, in a memory, feature data determined for a current frame using a first model included in the motion recognition model”; storing feature data in memory does not integrate the judicial exception into a practical application because it amounts to mere instructions to apply the judicial exception on a computer (MPEP 2106.05(f)), and it is not sufficient to amount to significantly more than the judicial exception because it is well-understood, routine, and conventional activity (storing and retrieving information in memory; MPEP 2106.05(d)(II)(iv)).
Claim 12 recites “wherein the generating of the input data comprises: detecting a radar signal using one or more radar sensors arranged along an outer face of an apparatus; generating the radar sensing data by preprocessing the detected radar signal; and generating the input data based on the preprocessed radar signal”; preprocessing and generating can be performed by a human with pen and paper, and are thus mental processes, while the additional element “detecting a radar signal” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)) as well as merely applying the judicial exception to a particular field of use and technological environment (MPEP 2106.05(h)).
Claim 13 recites “wherein the generating of the input data comprises: selecting two or more items of the radar sensing data corresponding to time frames of the time frames differing from each other by a preset time interval; and generating the input data based on the selected items of radar sensor data”; selecting items is a mental process, and generating input data amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 14 recites “wherein the generating of the input data comprises: excluding radar sensing data corresponding to a first frame of the plurality of time frames stacked in the input data, in response to radar sensing data corresponding to a subsequent frame being received”; excluding data based on some condition is a judgment that can be performed in the human mind, and is thus a mental process.
Claim 15 recites “wherein the generating of the input data comprises: generating radar sensing data indicating an angle and a distance from a point detected by the one or more radar sensors for each quantized velocity from a radar signal”; generating data from radar sensors amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 16 recites “wherein the generating of the input data comprises: generating input data indicating a horizontal angle and a distance from a point detected by the one or more radar sensors for each quantized elevation angle from the radar sensing data”; generating data from radar sensors amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 17 recites “wherein the generating of the input data comprises: classifying the radar sensing data into static data of a static point detected by the one or more radar sensors and dynamic data of a dynamic point detected by the one or more radar sensors; generating static input data indicating a horizontal angle and a distance from the static point for each quantized elevation angle based on the static data; and generating dynamic input data indicating a horizontal angle and a distance from the dynamic point for each quantized elevation angle based on the dynamic data”; classifying data is a mental process, and generating input data amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
Claim 18 recites “wherein the motion recognition model includes a convolutional neural network and a recurrent neural network (RNN)”; by merely reciting neural network models without further details, this claim is directed to another abstract idea, namely a mathematical concept.
Claim 19 recites “wherein the RNN is a bi-directional neural network”; by merely reciting neural network models without further details, this claim is directed to another abstract idea, namely a mathematical concept.
Claim 20 recites “wherein the estimating of the ego motion information comprises: determining, in response to a plurality of items of radar sensing data corresponding to a plurality of time frames being stacked in the input data, ego motion information for each of the plurality of time frames”; determining information can be performed by a human with pen and paper, and is thus a mental process.
Claim 21 recites “detecting an object in a vicinity of an apparatus based on the estimated ego motion information”; detecting an object based on information can be performed by a human with pen and paper and is thus a mental process.
Claim 22 recites “generating reference input data for a plurality of training time frames based on reference radar sensing data and reference output data corresponding to the reference input data; and generating the motion recognition model by training parameters of a model to output the reference output data based on the reference input data”; training a machine learning model, broadly recited with no details on how the training is achieved, is another abstract idea, namely a mathematical concept; generating input data amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).
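For illustration only, the following minimal Python sketch shows the frame selection of Claim 13 and the frame stacking of Claim 14 under stated assumptions: frames arrive at a uniform rate, so a preset time interval corresponds to a fixed index step, and frames are represented by placeholder labels.  The function names are hypothetical.  Both operations reduce to simple bookkeeping that a person could perform mentally or with pen and paper: keeping every n-th frame, and discarding the oldest stacked frame when a subsequent frame is received.

```python
from collections import deque


def select_frames(frames, interval):
    """Claim 13 analogue: keep frames separated by a preset interval (in frames)."""
    return frames[::interval]


def stack_frames(stream, window_size):
    """Claim 14 analogue: a fixed-size stack that drops the first (oldest) frame
    whenever a subsequent frame is received."""
    window = deque(maxlen=window_size)
    stacks = []
    for frame in stream:
        window.append(frame)  # appending to a full deque discards its oldest item
        stacks.append(list(window))
    return stacks


print(select_frames(["f0", "f1", "f2", "f3", "f4"], 2))  # ['f0', 'f2', 'f4']
print(stack_frames(["f0", "f1", "f2", "f3"], 3)[-1])  # ['f1', 'f2', 'f3']
```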



Claim 24 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mathematical concept, without significantly more. 
Step 1 Analysis:
Claim 24 is directed to a method. Therefore, the claim is directed to one of the four statutory categories of patent eligible subject matter.
Step 2A Prong 1 Analysis:
Claim 24 recites:  “generating a motion recognition model by training parameters of a model to output the reference output data based on feature data of the current frame extracted from the reference input data of the current frame and from the reference input data of the previous frame”; training a model, broadly recited without any specifics, amounts to a mathematical concept.
Step 2A Prong 2 Analysis:
The judicial exception is not integrated into a practical application because additional element “generating reference input data for each of a plurality of time frames including a current frame and a previous frame, based on reference radar sensing data and based on reference output data corresponding to the reference input data” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).  There are no meaningful limits placed on the practice of the judicial exception.
Step 2B Analysis:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above, the additional element “generating reference input data for each of a plurality of time frames including a current frame and a previous frame, based on reference radar sensing data and based on reference output data corresponding to the reference input data” amounts to insignificant extra solution activity (mere data gathering; MPEP 2106.05(g)(3)).  There is no indication that the data is collected in a novel or unconventional manner, and therefore this additional element does not amount to significantly more than the judicial exception.  The claim is directed to the judicial exception.

Remarks – 35 USC 101
While merely reciting “training” or use of a model is not sufficient on its own, a specific method of training a machine learning model is understood to not be an abstract idea, as noted in MPEP 2106.04(a)(1)(vii):  “a method of training a neural network for facial detection comprising: collecting a set of digital facial images, applying one or more transformations to the digital images, creating a first training set including the modified set of digital facial images; training the neural network in a first stage using the first training set, creating a second training set including digital non-facial images that are incorrectly detected as facial images in the first stage of training; and training the neural network in a second stage using the second training set”.  The preceding example recites a specific 2-step training process.  However, Claims 22 and 24 in the instant application merely recite “training parameters of a model to output the reference output data based on the reference input data”.  Training a model to produce output based on input is a redundant statement that merely restates the definition of training.  If there is support in the Specification, a limitation describing the training process may be enough to amount to more than the judicial exception.
Examiner also points out that, if supported by the Specification, the judicial exception could potentially be integrated into a practical application with a subsequent limitation such as “and automatically steering a vehicle based on the results of the ego motion estimation.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-13, 16, 18, and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (“Ego-motion Classification for Driving Vehicle”; hereinafter “Du”) in view of Cen et al. (“Precise Ego-Motion Estimation with Millimeter-Wave Radar under Diverse and Challenging Conditions”; hereinafter “Cen”).
As per Claim 1, Du teaches A processor-implemented ego motion estimation method, the method comprising (Du, Page 277 Section IV Sentence 1, discloses ego motion estimation:  “We propose an end-to-end deep model to address the problem of ego-motion classification.”  Du, Page 278 Section V Sentence 2, discloses a processor:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”)
generating input data based on [radar] sensing data collected by one or more [radar] sensors for each of a plurality of time frames including a current frame and a previous frame (Du, Page 277 Section III, discloses cameras detecting videos with multiple frames:  “To our knowledge, there is no dataset public available for ego-motion classification. In this paper, to provide a better benchmark on ego-motion classification, we collected a new video dataset, named Campus20. The videos are recorded on some typical roads in 20 different days. Data was collected in the clear days, in the day time. The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels. Each frame is labeled with a specific state of ego-motion action.”  If there are multiple frames (“frame rate”), then there is a current frame and a previous frame.)
and estimating, using a motion recognition model, ego motion information based on feature data of the current frame extracted from the input data of the current frame and from the input data of the previous frame (Du, Page 277 Section IV, discloses a motion recognition model:  “We propose an end-to-end deep model to address the problem of ego-motion classification. The overall architecture is shown is Figure 2. It takes raw video sequences as inputs and output is the probability distribution of the corresponding video frames.”
Du, Pg 276 Figure 1, discloses:

    [Image: Du, Page 276, Figure 1]


Du, Fig 1, shown above, shows 25 frames.  Any frame, for example the 24th frame, can be the “current frame” and contains feature data, while frames 1-23 are previous frames containing feature data about the same features in the video, including those of the current frame.  Feature data from the previous and current frames are sequentially extracted using the first model (CNN).  Current ego motion information based on the 24th frame is determined by the second model shown in the figure (LSTM))
controlling a function of an apparatus based on the estimated ego motion information (Du, Page 4 Section VI, discloses:  “We propose a real-time ego-motion prediction task related to autonomous driving.”  Here, “autonomous driving” is controlling a function of an apparatus, and Du states that its ego motion prediction is used to this end.  While no specific mechanism is given, this is comparable to the level of detail given in paragraph [0135] of the instant Specification:  “The ego motion estimation apparatus 1500 assists autonomous driving and various advanced driver assistance systems (ADAS) functions of the vehicle.”)
However, Du does not explicitly teach radar sensing data collected by one or more radar sensors.
Cen teaches radar sensing data collected by one or more radar sensors (Cen, Page 6050 Section IV, discloses:  “We utilize the Navtech CTS350-X, a FMCW scanning radar without Doppler information. For this radar, M = 399, N = 2000, and β = 0.25 m. The beam spread is 2 degrees in azimuth and 25 degrees in elevation. The radar operates at 4 Hz, and our algorithm (not fully optimized) operates at approximately 3 Hz. The radar is placed on the roof of a ground vehicle with an axis of rotation perpendicular to the driving plane.”)
Du and Cen are analogous art because they are both in the field of endeavor of ego motion estimation.
It would have been obvious before the effective filing date of the claimed invention to combine the ego motion estimation of Du with the radar of Cen.  One of ordinary skill in the art would be motivated to do so to save money and resources and to achieve greater accuracy in all weather conditions (Cen, Pg 6045 Abstract:  “In contrast to cameras, lidars, GPS, and proprioceptive sensors, radars are affordable and efficient systems that operate well under variable weather and lighting conditions, require no external infrastructure, and detect long-range objects.”)

As per Claim 2, the combination of Du and Cen teaches the method of claim 1. Du teaches wherein the estimating of the ego motion information comprises: 
extracting feature data from the input data using a first model of the motion recognition model (Du, Page 277 Section IV Para 1, discloses:  “The network consists of two parts, a CNN part which works as a visual feature extractor and a LSTM which acts as a temporal feature extractor.”  Here, Du discloses extracting feature data with a first model (“CNN part which works as a visual feature extractor”)).
and determining the ego motion information based on the feature data using a second model of the motion recognition model (Du, Page 277 Section IV Para 1, shown above, discloses determining ego motion information (“temporal feature extractor”) with a second model (“LSTM”)).

As per Claim 3, the combination of Du and Cen teaches the method of claim 1. Du teaches wherein the estimating of the ego motion information comprises: 
determining either one or both of a position and a pose of the apparatus as the ego motion information (Du, Page 276 Intro Para 2, discloses:  “In this paper, we formulate the problem of ego-motion classification as event detection in video streams: given video streams recorded in real time, we categorize each frame into one of possible action states (turning, lane-changing, reversing, lane-following, crossing, turn-left and turn-right).”  Turning and lane changing correspond to positions and poses of a vehicle.)

As per Claim 4, the combination of Du and Cen teaches the method of claim 1 as well as radar sensing data (see Rejection to Claim 1). Du teaches wherein the estimating of the ego motion information comprises: 
inputting, as the input data, radar sensing data corresponding to at least two time frames into a layer of the motion recognition model corresponding to one of the at least two time frames.  (Du, Pg 276 Figure 1, discloses:

[Image: Du, Figure 1]

Here, Du shows sensing data for 25 frames being input into corresponding layers of a CNN.  Du confirms 25 frames in Section V A Para 3:  “The CNN part in our model output 25×4096 data, when fed into a 25-frame video sequence each time step.”  The time frames are input to the CNN sequentially, and at the time an individual frame is input, the input layer “corresponds” to that time frame.)
wherein the at least two time frames comprises the current frame and the previous frame (Du, above in Fig. 1, discloses 25 time frames, thus including a current and a previous time frame).
wherein the [radar] sensing data comprises the feature data of the current frame (Recall in Claim 1 that Cen discloses radar.  Du, as shown above, discloses 25 time frames, including feature data of the current frame.)

As per Claim 5, the combination of Du and Cen teaches the method of claim 1.  Du teaches wherein the estimating of the ego motion information comprises: 
extracting the feature data of the current frame from the input data of the current frame and the input data of the previous frame of the time frames, using a first model; and determining current ego motion information of the ego motion information based on the feature data of the current frame, using a second model.  (Du, Fig 1 shown above, shows 25 frames.  Taking, for example, the 24th frame as the current frame, frames 1-23 are previous frames of the same video containing the same features.  Current feature data of the 24th frame is extracted using the first model (CNN).  Current ego motion information based on the 24th frame is determined in the second model shown in the figure (LSTM)).

As per Claim 6, the combination of Du and Cen teaches the method of claim 5.  Du teaches wherein the estimating of the ego motion information comprises: 
extracting subsequent feature data from input data of a subsequent frame of the time frames and the current frame, using the first model; and determining subsequent ego motion information based on the subsequent feature data, using the second model.  (Du, Fig 1 shown above, shows 25 frames.  If, for example, the current frame is the 24th frame, then the subsequent frame is the 25th frame.  Subsequent feature data of the 25th frame is extracted using the first model (CNN).  Subsequent ego motion information based on the 25th frame is determined in the second model shown in the figure (LSTM)).

As per Claim 7, the combination of Du and Cen teaches the method of claim 6.  Du teaches wherein the extracting of the subsequent feature data comprises excluding the input data of the previous frame from the extracting of the subsequent feature data.  (Du, Fig 1 shown above, shows only one frame at a time being entered into the first model (CNN).  Therefore, Du discloses excluding the input data of the previous frame from the extracting of the feature data of the subsequent frame.)

As per Claim 8, the combination of Du and Cen teaches the method of claim 5.  Du teaches wherein the first model comprises a convolutional neural network and the second model comprises a recurrent neural network.  (Du, Fig 1 shown above, discloses that the first model is a CNN, and the second model is an LSTM.  One of ordinary skill in the art will appreciate that an LSTM is a type of RNN.)


As per Claim 10, the combination of Du and Cen teaches the method of claim 1 and ego motion (see Rejection to Claim 1).  Du teaches wherein the estimating of the ego motion information comprises: 
extracting the feature data of the current frame from the input data of the current frame using a first model (Du, Fig 1 shown above, discloses extracting feature data with a CNN from Frame 24, which can be considered a current frame).
loading previous feature data corresponding to the previous frame from a memory; (Du, Fig 1 shown above, discloses extracted feature data from adjoining frames (i.e., 23 and 24) being connected together in an LSTM in order to determine ego motion information.  Du, Page 278 Section V Sentence 2, discloses a processor:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”  As Du is using a processor to implement this method, this requires that the GPU load the operands for its operations from memory.  Thus, the extracted feature data from the previous frame must be retrieved from memory in order to input it into the second model (LSTM) with the extracted feature data from the current frame.)
and determining the ego motion information based on the previous feature data and the feature data of the current frame using a second model (Du, Fig 1 shown above, discloses extracted feature data from adjoining frames (i.e., 23 and 24) being connected together in an LSTM in order to determine ego motion information.)

As per Claim 11, the combination of Du and Cen teaches the method of claim 1.  Du teaches further comprising: storing, in a memory, the feature data of the current frame determined for the current frame using a first model included in the motion recognition model. (Du, Fig 1 shown above, discloses extracting feature data from adjoining frames (i.e., 23 and 24) using a first model (CNN).  Du, Page 278 Section V Sentence 2, discloses a processor:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”  As Du is using a processor to implement this method, this requires that the GPU load the operands for its operations from memory.  Thus, the extracted feature data must be stored in memory for the GPU to execute the subsequent operations, including the operations of the second model (LSTM)).
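For illustration only, the store-and-load pattern discussed for Claims 10 and 11 may be sketched as follows.  This sketch is not taken from Du, Cen, or the instant application; `extract_features` and `estimate_motion` are hypothetical stand-ins for the first model (CNN) and the second model (LSTM), and the toy features are assumed purely so that the example runs.

```python
from typing import Dict, List, Optional

Features = List[float]


def extract_features(frame: List[float]) -> Features:
    # Hypothetical stand-in for the first model (e.g., a CNN):
    # returns the mean and maximum of the frame's values.
    return [sum(frame) / len(frame), max(frame)]


def estimate_motion(prev: Optional[Features], curr: Features) -> float:
    # Hypothetical stand-in for the second model (e.g., an LSTM step):
    # the change in the first feature between consecutive frames.
    if prev is None:
        return 0.0
    return curr[0] - prev[0]


class FeatureMemory:
    """Stores per-frame feature data so each frame passes through the first model once."""

    def __init__(self) -> None:
        self._store: Dict[int, Features] = {}
        self.extract_calls = 0

    def step(self, t: int, frame: List[float]) -> float:
        curr = extract_features(frame)  # feature data of the current frame
        self.extract_calls += 1
        prev = self._store.get(t - 1)   # load previous feature data from memory
        self._store[t] = curr           # store current feature data in memory
        return estimate_motion(prev, curr)


frames = [[0.0, 1.0], [1.0, 2.0], [3.0, 4.0]]
memory = FeatureMemory()
outputs = [memory.step(t, f) for t, f in enumerate(frames)]
print(outputs, memory.extract_calls)  # [0.0, 1.0, 2.0] 3
```

In this sketch the first model is invoked once per frame; the previous frame's feature data is read back from memory rather than recomputed.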

As per Claim 12, the combination of Du and Cen teaches the method of claim 1.  Cen teaches wherein the generating of the input data comprises: 
detecting a radar signal using one or more radar sensors arranged along an outer face of the apparatus (Cen, Page 6050 Section IV, discloses:  “The radar is placed on the roof of a ground vehicle with an axis of rotation perpendicular to the driving plane.”)
generating the radar sensing data by preprocessing the detected radar signal (Cen, Page 6047, Right Column Para 3, discloses:  “The landmark extraction method, as described next, references Figure 3 and Algorithm 1. To begin, an unbiased signal q that preserves high-frequency information (box 2) is acquired by subtracting the noise floor of v(s) from s (line 1)”  Here, Cen discloses preprocessing (“subtracting the noise floor”)).
and generating the input data based on the preprocessed radar signal (Cen, Page 6047, Right Column Para 3, continues:  “The result is then smoothed to obtain the underlying low frequency signal p (box 3), which better exposes obvious landmark peaks (line 2)”).
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.

As per Claim 13, the combination of Du and Cen teaches the method of claim 1 as well as radar (see Rejection to Claim 1).  Du teaches wherein the generating of the input data comprises: 
selecting two or more items of the radar sensing data corresponding to time frames of the time frames differing from each other by a preset time interval; (Du, Page 277 Section III Para 2, discloses:  “The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels.”  Here, Du discloses that the frames differ by a preset time interval (“frame rate is 18 FPS”)).
and generating the input data based on the selected items of radar sensor data (Du, Page 277 Section IV, discloses:  “We propose an end-to-end deep model to address the problem of ego-motion classification. The overall architecture is shown is Figure 2. It takes raw video sequences as inputs.”)

As per Claim 16, the combination of Du and Cen teaches the method of claim 1.  Cen teaches wherein the generating of the input data comprises: generating input data indicating a horizontal angle and a distance from a point detected by the one or more radar sensors for each quantized elevation angle from the radar sensing data. (Cen, Page 6050 Section IV, discloses:  “We utilize the Navtech CTS350-X, a FMCW scanning radar without Doppler information. For this radar, M = 399, N = 2000, and β = 0.25 m. The beam spread is 2 degrees in azimuth and 25 degrees in elevation.”  Here, Cen discloses a radar, which measures distance, sweeping through horizontal angles (“azimuth”) for each quantized elevation angle (“25 degrees in elevation”)).
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.
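As a purely illustrative sketch of the data arrangement recited in Claim 16 (a horizontal angle and a distance for each quantized elevation angle), detected points may be grouped by elevation bin as follows.  The point format `(azimuth_deg, range_m, elevation_deg)` and the 2.5-degree bin width are assumptions chosen for the example; neither is taken from Cen or the instant application.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Point = Tuple[float, float, float]  # (azimuth_deg, range_m, elevation_deg), assumed format


def quantize_by_elevation(points: List[Point], bin_deg: float) -> Dict[int, List[Tuple[float, float]]]:
    """Group points into elevation bins; each bin lists (horizontal angle, distance) pairs."""
    bins: DefaultDict[int, List[Tuple[float, float]]] = defaultdict(list)
    for azimuth, distance, elevation in points:
        index = math.floor(elevation / bin_deg)  # quantized elevation index
        bins[index].append((azimuth, distance))
    return dict(bins)


points = [(10.0, 5.0, 1.0), (20.0, 7.5, 4.0), (-15.0, 12.0, -3.0)]
print(quantize_by_elevation(points, bin_deg=2.5))
# {0: [(10.0, 5.0)], 1: [(20.0, 7.5)], -2: [(-15.0, 12.0)]}
```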

As per Claim 18, the combination of Du and Cen teaches the method of claim 1.  Du teaches wherein the motion recognition model includes a convolutional neural network and a recurrent neural network (RNN) (Du, Fig 1 shown above, discloses that the first model is a CNN, and the second model is an LSTM.  One of ordinary skill in the art will appreciate that an LSTM is a type of RNN.)


As per Claim 21, the combination of Du and Cen teaches the method of claim 1.  Cen teaches further comprising: detecting an object in a vicinity of the apparatus based on the estimated ego motion information. (Cen, Page 6046 Last Paragraph before Section II, discloses “landmark extraction” as part of “motion estimation”:  “In this paper, we present robust radar-only motion estimation using our own algorithms for landmark extraction and scan matching.”)
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.

As per Claim 22, the combination of Du and Cen teaches the method of claim 1 as well as radar (see Rejection to Claim 1).  Du teaches further comprising: 
generating reference input data for a plurality of training time frames based on reference radar sensing data and based on reference output data corresponding to the reference input data; (Du, Page 277 Section III, discloses cameras detecting videos with multiple frames:  “To our knowledge, there is no dataset public available for ego-motion classification. In this paper, to provide a better benchmark on ego-motion classification, we collected a new video dataset, named Campus20. The videos are recorded on some typical roads in 20 different days. Data was collected in the clear days, in the day time. The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels. Each frame is labeled with a specific state of ego-motion action.”  Du, as shown above, discloses “15 videos for training”, which means Du has a known output for training.  Du discloses “Each frame is labeled”, and these labels are the reference output data corresponding to the reference input data.)
and generating the motion recognition model by training parameters of a model to output the reference output data based on the reference input data (Du, as shown above, discloses “15 videos for training”).

	As per Claim 23, Claim 23 is a non-transitory computer-readable storage medium claim corresponding to method Claim 1.  Du, Page 278 Section V Sentence 2, discloses the use of a processor, and thus suggests a computing system, which includes a computer readable storage medium:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”  Claim 23 is rejected for the same reasons as Claim 1.

As per Claim 24, Du teaches A processor-implemented ego motion estimation method comprising: (Du, Page 277 Section IV Sentence 1, discloses ego motion estimation:  “We propose an end-to-end deep model to address the problem of ego-motion classification.”  Du, Page 278 Section V Sentence 2, discloses a processor:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”)
generating reference input data for each of a plurality of time frames including a current frame and a previous frame, based on reference [radar] sensing data and based on reference output data corresponding to the reference input data (Du, Page 277 Section III, discloses cameras detecting videos with multiple frames:  “To our knowledge, there is no dataset public available for ego-motion classification. In this paper, to provide a better benchmark on ego-motion classification, we collected a new video dataset, named Campus20. The videos are recorded on some typical roads in 20 different days. Data was collected in the clear days, in the day time. The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels. Each frame is labeled with a specific state of ego-motion action.”
Du, Pg 276 Figure 1, discloses:

[Image: Du, Figure 1]

Du, Fig 1 shown above, shows 25 frames.  Any frame, for example the 24th frame, can be the “current frame” containing feature data, and frames 1-23 are then previous frames containing feature data about the same features in the video, including those of the current frame.  Feature data from the previous and the current frames are sequentially extracted using the first model (CNN).  Du, as shown above, discloses “15 videos for training”, which means Du has a known output for training.  Du discloses “Each frame is labeled”, and these labels are the reference output data corresponding to the reference input data.)
and generating a motion recognition model by training parameters of a model to output the reference output data based on feature data of the current frame extracted from the reference input data of the current frame and from the reference input data of the previous frame. (Du, as shown above, discloses features extracted from the current frame and previous frame in a motion recognition model.  Du, as shown above, discloses “15 videos for training”, and thus discloses the training of the model.)
However, Du does not explicitly teach radar sensing data.
Cen teaches radar sensing data (Cen, Page 6050 Section IV, discloses:  “We utilize the Navtech CTS350-X, a FMCW scanning radar without Doppler information. For this radar, M = 399, N = 2000, and β = 0.25 m. The beam spread is 2 degrees in azimuth and 25 degrees in elevation. The radar operates at 4 Hz, and our algorithm (not fully optimized) operates at approximately 3 Hz. The radar is placed on the roof of a ground vehicle with an axis of rotation perpendicular to the driving plane.”)
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.

As per Claim 25, Du teaches An ego motion estimation apparatus comprising: (Du, Page 277 Section IV Sentence 1, discloses ego motion estimation:  “We propose an end-to-end deep model to address the problem of ego-motion classification.”)
one or more [radar] sensors configured to generate [radar] sensing data; (Du, Page 277 Section III, discloses cameras detecting videos with multiple frames:  “To our knowledge, there is no dataset public available for ego-motion classification. In this paper, to provide a better benchmark on ego-motion classification, we collected a new video dataset, named Campus20. The videos are recorded on some typical roads in 20 different days. Data was collected in the clear days, in the day time. The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels. Each frame is labeled with a specific state of ego-motion action.”)
and one or more processors configured to: (Du, Page 278 Section V Sentence 2, discloses a processor:  “We used a GTX 1080 GPU and implement the network in Caffe [8].”)
generate input data based on the radar sensing data for each of a plurality of time frames of the [radar] sensing data including a current frame and a previous frame (Du, Page 277 Section III, discloses cameras detecting videos with multiple frames:  “To our knowledge, there is no dataset public available for ego-motion classification. In this paper, to provide a better benchmark on ego-motion classification, we collected a new video dataset, named Campus20. The videos are recorded on some typical roads in 20 different days. Data was collected in the clear days, in the day time. The dataset consists of 15 videos for training and 5 videos for testing, each lasts for 5 mins. The frame rate is 18 FPS, and the spatial resolution is 320×240 pixels. Each frame is labeled with a specific state of ego-motion action.” If there are multiple frames (“frame rate”), then there is a current frame and a previous frame.)
estimate, using a motion recognition model, ego motion information based on feature data of the current frame extracted from the input data of the current frame and from the input data of the previous frame; (Du, Page 277 Section IV, discloses:  “We propose an end-to-end deep model to address the problem of ego-motion classification. The overall architecture is shown is Figure 2. It takes raw video sequences as inputs and output is the probability distribution of the corresponding video frames.” Du, Pg 276 Figure 1, discloses:

[Image: Du, Figure 1]


Du, Fig 1 shown above, shows 25 frames.  Any frame, for example the 24th frame, can be the “current frame” containing feature data, and frames 1-23 are then previous frames containing feature data about the same features in the video, including those of the current frame.  Feature data from the previous and the current frames are sequentially extracted using the first model (CNN).  Current ego motion information based on the 24th frame is determined in the second model shown in the figure (LSTM))
controlling a function of an apparatus based on the estimated ego motion information (Du, Page 4 Section VI, discloses:  “We propose a real-time ego-motion prediction task related to autonomous driving.”  Here, “autonomous driving” involves controlling a function of an apparatus, and Du states that its ego-motion prediction task is related to autonomous driving.  While Du gives no specific control mechanism, this is a level of detail similar to that given in the instant Specification at [0135]:  “The ego motion estimation apparatus 1500 assists autonomous driving and various advanced driver assistance systems (ADAS) functions of the vehicle.”)
However, Du does not explicitly teach radar sensors configured to generate radar sensing data.
Cen teaches radar sensors configured to generate radar sensing data (Cen, Page 6050 Section IV, discloses:  “We utilize the Navtech CTS350-X, a FMCW scanning radar without Doppler information. For this radar, M = 399, N = 2000, and β = 0.25 m. The beam spread is 2 degrees in azimuth and 25 degrees in elevation. The radar operates at 4 Hz, and our algorithm (not fully optimized) operates at approximately 3 Hz. The radar is placed on the roof of a ground vehicle with an axis of rotation perpendicular to the driving plane.”)
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.

Claims 9, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Cen further in view of Karpathy et al. (“Large-scale Video Classification with Convolutional Neural Networks”; hereinafter “Karpathy”).
As per Claim 9, the combination of Du and Cen teaches the method of claim 1.  Du teaches wherein the motion recognition model comprises: 
a first model including layers; (Du, Page 277 Section IV A discloses:  “We build the CNN part from one of the popular backbone architectures, i.e., AlexNet [9]. As shown in Figure 3, it mainly contains 5 convolutional layers, one fully connected layer.”)
and a second model connected to the layers of the first model (Du, Pg 276 Figure 1, discloses:

[Image: Du, Figure 1]

Here, Du discloses a second model (LSTM) connected to the first model (CNN), and thus connected to the layers of the CNN.)
and wherein the estimating of the ego motion information comprises: 
extracting, using a layer of the layers in the first model corresponding to a time frame of the plurality of time frames, feature data from input data of the time frame (Du, Figure 1 shown above, discloses a video of 25 frames, each frame undergoing “feature learning” via the first model (CNN)).
and determining ego motion information of the time frame based on the extracted feature data using on the second model.  (Du, Figure 1 shown above, discloses the feature data being input to the second model (LSTM).  Du Page 277 Section IV discloses this is used for ego motion:  “We propose an end-to-end deep model to address the problem of ego-motion classification”).
However, the combination of Du and Cen does not teach wherein each of the layers corresponds to a respective one of the plurality of time frames.
Karpathy teaches wherein each of the layers corresponds to a respective one of the plurality of time frames. (Karpathy, Page 3 Bottom Left “Late Fusion”, discloses:  “The Late Fusion model places two separate single-frame networks (as described above, up to last convolutional layer C(256, 3, 1) with shared parameters a distance of 15 frames apart and then merges the two streams.”  Here, Karpathy discloses a CNN constructed from CNNs for individual frames, then merged in a final layer.  Therefore, this CNN has layers that correspond to a respective one of the plurality of time frames.)
Karpathy and the combination of Du and Cen are analogous art because they are both in the field of endeavor of using neural networks to analyze frames of sensor information.
It would have been obvious before the effective filing date of the claimed invention to combine the ego motion estimation with CNN and LSTM of Du, with the Late Fusion of Karpathy.  One of ordinary skill in the art would be motivated to do so in order to better detect motion across the temporal domain (Karpathy, Page 3, Upper Right Column, discloses:  “Therefore, neither single frame tower alone can detect any motion, but the first fully connected layer can compute global motion characteristics by comparing outputs of both towers.”)

As per Claim 14, the combination of Du and Cen teaches the method of claim 1 as well as radar.  However, the combination of Du and Cen does not teach wherein the generating of the input data comprises: excluding radar sensing data corresponding to a first frame of the plurality of time frames stacked in the input data, in response to radar sensing data corresponding to a subsequent frame being received.
Karpathy teaches wherein the generating of the input data comprises: excluding radar sensing data corresponding to a first frame of the plurality of time frames stacked in the input data, in response to radar sensing data corresponding to a subsequent frame being received. (Karpathy, Page 3 Bottom Left “Early Fusion”, discloses:  “The Early Fusion extension combines information across an entire time window immediately on the pixel level. This is implemented by modifying the filters on the first convolutional layer in the single-frame model by extending them to be of size 11 × 11 × 3 × T pixels, where T is some temporal extent (we use T = 10, or approximately a third of a second).”  Here, Karpathy discloses “stacking” the frames into a single input into a CNN (also shown in Karpathy Page 3 Figure 1, reproduced below).  Further, because Karpathy discloses a fixed temporal extent (“T = 10”), as each subsequent frame is received, the frame that is then 11 frames in the past is excluded from the stack.)

[Image: Karpathy, Page 3, Figure 1]

Karpathy and the combination of Du and Cen are analogous art because they are both in the field of endeavor of using neural networks to analyze frames of sensor information.
It would have been obvious before the effective filing date of the claimed invention to combine the ego motion estimation with CNN and LSTM of Du, with the Early Fusion of Karpathy.  One of ordinary skill in the art would be motivated to do so in order to more precisely detect local motion direction and speed. (Karpathy, Page 3, “Early Fusion”, discloses:  “The early and direct connectivity to pixel data allows the network to precisely detect local motion direction and speed”)
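The sliding-window behavior relied on above for Claim 14 (a fixed temporal extent T, with the oldest frame dropped from the stack when a new frame arrives) can be illustrated with a short sketch.  It is not Karpathy's implementation; the `FrameStack` class and the use of Python's `collections.deque` with `maxlen` are assumptions made for illustration, with T = 10 taken from the quoted passage.

```python
from collections import deque
from typing import Deque, List

T = 10  # temporal extent, following the "T = 10" quoted from Karpathy


class FrameStack:
    """Holds the T most recent frames; pushing a new frame drops the oldest one."""

    def __init__(self, extent: int = T) -> None:
        # A deque with maxlen discards from the opposite end once it is full.
        self._frames: Deque[List[float]] = deque(maxlen=extent)

    def push(self, frame: List[float]) -> List[List[float]]:
        self._frames.append(frame)
        return list(self._frames)  # stacked input, oldest frame first


stack = FrameStack()
for i in range(12):
    stacked = stack.push([float(i)])
print(len(stacked), stacked[0], stacked[-1])  # 10 [2.0] [11.0]
```

After twelve frames have been received, frames 0 and 1 have been excluded and the stack holds only the ten most recent frames.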


As per Claim 20, the combination of Du and Cen teaches the method of claim 1 as well as ego motion and radar (see Rejection to Claim 1).  However, the combination of Du and Cen does not teach wherein the estimating of the ego motion information comprises:  determining, in response to a plurality of items of radar sensing data corresponding to a plurality of time frames being stacked in the input data, ego motion information for each of the plurality of time frames.  
Karpathy teaches wherein the estimating of the ego motion information comprises:  
determining, in response to a plurality of items of radar sensing data corresponding to a plurality of time frames being stacked in the input data, ego motion information for each of the plurality of time frames.  (Karpathy, Page 3 Bottom Left “Early Fusion”, discloses:  “The Early Fusion extension combines information across an entire time window immediately on the pixel level. This is implemented by modifying the filters on the first convolutional layer in the single-frame model by extending them to be of size 11 × 11 × 3 × T pixels, where T is some temporal extent (we use T = 10, or approximately a third of a second).”  Here, Karpathy discloses “stacking” the frames into a single input into a CNN (also shown in Karpathy Figure 1, reproduced above)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Karpathy with Du and Cen for at least the reasons recited in Claim 14.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Cen further in view of Kellner et al. (“Instantaneous Ego-Motion Estimation using Multiple Doppler Radars”; hereinafter “Kellner”).
As per Claim 15, the combination of Du and Cen teaches the method of claim 1.  However, the combination of Du and Cen does not teach wherein the generating of the input data comprises: generating radar sensing data indicating an angle and a distance from a point detected by the one or more radar sensors for each quantized velocity from a radar signal. 
Kellner teaches wherein the generating of the input data comprises: generating radar sensing data indicating an angle and a distance from a point detected by the one or more radar sensors for each quantized velocity from a radar signal (Kellner, Page 1593 Bottom of Left Column, discloses:  “The azimuth position and radial velocity of each target is registered and evaluated using the System Motion Equation”.  Here, Kellner discloses, for each quantized velocity (“radial velocity”), generating sensing data indicating an angle and a distance from a point (“azimuth position…of each target”).  “Azimuth” indicates an angle from a given direction, and one of ordinary skill in the art will appreciate that distance is part of this equation, and that radar indicates the distance to a given object.)
Kellner and the combination of Du and Cen are analogous art because they are both in the field of endeavor of using radar to estimate ego motion.
It would have been obvious before the effective filing date of the claimed invention to combine Du and Cen’s radar ego motion estimation with Kellner’s measurement of the radial velocity and azimuth position.  One of ordinary skill in the art would be motivated to do so in order to achieve improved accuracy (Kellner, End of Page 1596:  “It has been shown that the results remain stable even with a large number of moving objects or clutter in the field of view and also show promising accuracy in high dynamic maneuvers.”)

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Cen further in view of Barjenbruch et al. (“Joint spatial- and Doppler-based ego-motion estimation for automotive radars”; hereinafter “Barjenbruch”).
As per Claim 17, the combination of Du and Cen teaches the method of claim 1.  Cen teaches wherein the generating of the input data comprises: generating static input data indicating a horizontal angle and a distance from the static point for each quantized elevation angle based on the static data; (Cen, Page 6050 Section IV, discloses:  “We utilize the Navtech CTS350-X, a FMCW scanning radar without Doppler information. For this radar, M = 399, N = 2000, and β = 0.25 m. The beam spread is 2 degrees in azimuth and 25 degrees in elevation.”  Here, Cen discloses a radar, which measures distance, sweeping through horizontal angles (“azimuth”) for each quantized elevation angle (“25 degrees in elevation”).
Cen also discloses an assumption that the data is static in the same paragraph:  “We adopt the usual odometry assumptions that the environment is mostly static and non-deformable.”)
It would have been obvious to combine the teachings of Du and Cen for at least the reasons recited in Claim 1.
However, the combination of Du and Cen does not teach classifying the radar sensing data into static data of a static point detected by the one or more radar sensors and dynamic data of a dynamic point detected by the one or more radar sensors; and generating dynamic input data indicating a horizontal angle and a distance from the dynamic point for each quantized elevation angle based on the dynamic data.
Barjenbruch teaches classifying the radar sensing data into static data of a static point detected by the one or more radar sensors and dynamic data of a dynamic point detected by the one or more radar sensors (Barjenbruch, Page 842 Section C, discloses:  “Additionally to the position information the Doppler radar provides very precise velocity information. By matching the expected with the measured Doppler velocity the accuracy can be improved significantly. Further this step will suppress distortion by non-stationary objects in the vehicles environment. If the Doppler velocity of a target does not meet the expected velocity it will receive a very low weight in the metric function.”  Here, Barjenbruch discloses assigning lower weights to objects determined to be non-stationary, thus classifying data into stationary and non-stationary data.)
The combination of Du, Cen, and Barjenbruch thus teaches generating dynamic input data indicating a horizontal angle and a distance from the dynamic point for each quantized elevation angle based on the dynamic data, as Cen discloses generating data indicating a horizontal angle and distance for each elevation, and Barjenbruch discloses applying a low (but nonzero) weight to a dynamic object.
Barjenbruch and the combination of Du and Cen are analogous art because they are both in the field of endeavor of using radar to estimate ego motion.
It would have been obvious before the effective filing date of the claimed invention to combine Du and Cen’s radar ego motion estimation with Barjenbruch’s low weighting of non-stationary objects.  One of ordinary skill in the art would be motivated to do so in order to improve accuracy by avoiding distortion by dynamic objects in the vehicle environment (Barjenbruch, Page 842 Section C:  “Further this step will suppress distortion by non-stationary objects in the vehicles environment.”)
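For illustration, the Doppler-based separation of static and dynamic points discussed above for Claim 17 may be sketched as follows.  This is a simplified, hypothetical sketch and not Barjenbruch's method: it assumes a sensor moving straight ahead at speed `ego_speed` with no rotation, uses the sign convention that a stationary point at azimuth θ has an expected radial velocity of -v·cos θ, and applies a hard tolerance `tol` (an assumed value), whereas Barjenbruch, as quoted above, assigns a low weight rather than a hard label.

```python
import math
from typing import List, Tuple

Detection = Tuple[float, float, float]  # (azimuth_rad, range_m, doppler_mps), assumed format
Polar = Tuple[float, float]             # (azimuth_rad, range_m)


def split_static_dynamic(
    detections: List[Detection], ego_speed: float, tol: float = 0.5
) -> Tuple[List[Polar], List[Polar]]:
    """Label each detection static or dynamic by comparing measured and expected Doppler."""
    static: List[Polar] = []
    dynamic: List[Polar] = []
    for azimuth, distance, doppler in detections:
        # Radial velocity expected for a stationary point when the sensor moves forward
        # (assumed convention: negative means the point is approaching the sensor).
        expected = -ego_speed * math.cos(azimuth)
        bucket = static if abs(doppler - expected) <= tol else dynamic
        bucket.append((azimuth, distance))
    return static, dynamic


detections = [(0.0, 20.0, -10.0), (math.pi / 2, 5.0, 0.0), (0.0, 30.0, -2.0)]
static, dynamic = split_static_dynamic(detections, ego_speed=10.0)
print(len(static), dynamic)  # 2 [(0.0, 30.0)]
```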

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Cen further in view of Bin et al. (“Describing Video With Attention-Based Bidirectional LSTM”; hereinafter “Bin”).
As per Claim 19, the combination of Du and Cen teaches the method of claim 18.  However, the combination of Du and Cen does not teach wherein the RNN is a bi-directional neural network.
Bin teaches wherein the RNN is a bi-directional neural network (Bin, Page 2632 Left Column Last Paragraph, discloses:  “In this paper, we propose a bidirectional long short-term memory (BiLSTM) structure, which fully explores both the forward and backward temporal information among the whole sequence of video frames. Specifically, we design a joint model by integrating a forward pass long-short term memory (LSTM), a backward pass LSTM, and CNN features to comprehensively exploit the bidirectional global temporal.”)
Bin and the combination of Du and Cen are analogous art because they are both in the field of endeavor of using neural networks to analyze frames of sensor information.
It would have been obvious before the effective filing date of the claimed invention to combine the ego motion estimation with CNN and LSTM of Du with the bidirectional LSTM of Bin.  One of ordinary skill in the art would be motivated to do so in order to improve performance by analyzing relationships in both the forward and backward time directions (Bin, Page 2632 Upper Right Column, discloses:  “Different from unidirectional approaches, our bidirectional network not only explores future fragments in videos, but also utilizes previous information… Extensive experiments on several real-world video captioning datasets illustrate the superiority of our proposal compared to unidirectional ones and other state-of-the-art approaches.”)
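For illustration only, the bidirectional structure described in Bin can be reduced to a toy example: one recurrent pass runs forward over the frame sequence, a second pass runs over the reversed sequence, and the two hidden states for each time step are paired.  The sketch below substitutes a scalar tanh RNN cell for Bin's LSTM cells to keep it short; the cell, the weights, the feature values, and the names rnn_pass and bidirectional are hypothetical and are not taken from Bin, Du, or the application.

```python
import math
from typing import List, Tuple


def rnn_pass(xs: List[float], w_x: float, w_h: float, b: float) -> List[float]:
    """Run a scalar tanh RNN cell (stand-in for an LSTM) over xs and
    return the hidden state after each step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bidirectional(
    xs: List[float],
    fwd_params: Tuple[float, float, float],
    bwd_params: Tuple[float, float, float],
) -> List[Tuple[float, float]]:
    """Pair each step's forward state (past context) with its backward
    state (future context), as a bidirectional RNN does."""
    fwd = rnn_pass(xs, *fwd_params)
    # Run over the reversed sequence, then re-align to the original time order.
    bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]
    return list(zip(fwd, bwd))


frames = [0.2, -0.1, 0.4]  # hypothetical per-frame features
out = bidirectional(frames, (1.0, 0.5, 0.0), (1.0, 0.5, 0.0))
for t, (h_fwd, h_bwd) in enumerate(out):
    print(t, round(h_fwd, 3), round(h_bwd, 3))
```

Because each paired output holds both a past-context and a future-context state, a change to the last frame alters the backward state at the first time step while leaving the forward state there unchanged.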

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Qiao et al. (“Learning the frame-2-frame ego-motion for visual odometry with convolutional neural network”) discloses on Page 501:

[Image: excerpt from Qiao, Page 501]

Sudhakaran et al. (“Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions”) discloses on Page 2340:

[Image: excerpt from Sudhakaran, Page 2340]

And on Page 2342:

    PNG
    media_image5.png
    481
    1338
    media_image5.png
    Greyscale

Muller et al. (“Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry”) discloses on Page 630 Section 5:  “The Flowdometry application introduces several contributions. An end-to-end convolutional neural network system takes two consecutive video frames as input, converts them to an optical flow image, then regresses odometry information based on the raw flow data.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126