Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Amendments and Remarks filed in the U.S. on 9/21/2022. Claims 1-4, 6-15, and 17-20 are pending in the case. Claims 1, 12, and 13 are written in independent form. Claims 5 and 16 have been cancelled.
Applicant’s amendments and remarks filed on 9/21/2022 have been fully considered but were not found to overcome the previously cited prior art. Accordingly, THIS ACTION IS MADE FINAL.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Satazoda et al. (U.S. Pre-Grant Publication No. 2018/0012082, hereinafter referred to as Satazoda), and further in view of Cirit (U.S. Patent No. 10,111,043).

Regarding Claim 1:
Satazoda teaches a computer-implemented method of searching a collection of sensor samples, the method comprising:
maintaining a collection of sensor samples and, for each sensor sample, one or more embeddings of the sensor sample, wherein:
Satazoda teaches “the system can additionally include one or more databases 230, which function to store data (e.g., image data, image labels, model parameters, feature values, embedding values, relative embedding values, etc.)….[and] can also function to facilitate retrieval according to any suitable metadata parameters” (Para. [0053]). Satazoda further teaches “sensor signals can be timestamped (e.g., with the sampling time)” (Para. [0033]) thereby teaching the sensor data as sensor samples.
each sensor sample in the collection is generated from sensor data captured by a corresponding vehicle and characterizes an environment region in a vicinity of the corresponding vehicle, and
Satazoda teaches “the sensors 211 of the vehicle system function to acquire sensor signals (e.g., image data)” where “sensor signals can be timestamped (e.g., with the sampling time), geotagged, associated with the vehicle system identifier, associated with the user identifier, associated with the vehicle identifier, or associated with any suitable set of data” (Para. [0033]). Satazoda further teaches that the image data is “imagery of the surroundings of the vehicle” that is then to be analyzed (Para. [0059]) thereby teaching the image data characterizing an environment region in the vicinity of the corresponding vehicle.
each embedding has been generated by processing data from the corresponding sensor sample through a neural network from a set of one or more embedding neural networks;
Satazoda teaches “the object detection module functions to detect that an object is depicted in image data” and “the output of the object detection module can include bounding boxes…[and] feature vectors based on image words (e.g., embeddings)” (Para. [0046]). Satazoda further teaches “each of the above modules can utilize one or more…an artificial neural network model” (Para. [0044]) thereby teaching using a neural network for the object detection module to generate the feature vectors based on image words (e.g., embeddings).

Satazoda explicitly teaches all of the elements of the claimed invention as stated above except:
an embedding neural network from a set of one or more embedding neural networks that have each been trained to process data from input sensor samples to generate a respective embedding for each input sensor sample;
receiving a request specifying a query sensor sample, wherein the query sensor sample characterizes a query environment region; and
identifying, from the collection of sensor samples, a plurality of relevant sensor samples that characterize similar environment regions to the query environment region, comprising:
processing the query sensor sample through one or more of the embedding neural networks in the set to generate one or more query embeddings; and
identifying, from sensor samples in a subset of the sensor samples in the collection, a plurality of sensor samples that have embeddings that are closest to each of the query embeddings.

However, in the related field of endeavor of applying embeddings to sensor data, Cirit teaches:
an embedding neural network from a set of one or more embedding neural networks that have each been trained to process data from input sensor samples to generate a respective embedding for each input sensor sample;
Cirit teaches “the machine learning engine 240 uses machine learning techniques to train a model to generate embeddings for data samples” where “the machine learning engine 240 trains models based on feature vectors derived from data samples captured” and “the machine learning engine 240 may implement machine learning techniques such as deep learning, logistic regression, convolutional neural networks, or other types of dimensionality reduction processes” (Col. 7 Lines 47-67).
receiving a request specifying a query sensor sample, wherein the query sensor sample characterizes a query environment region; and
Cirit teaches receiving a request specifying a query data sample that characterizes an environment associated with sensor data for a vehicle trip record and “the embedding engine 230 searches the embedding data store 235 for reference embeddings with characteristics that correspond to the characteristics of an embedding of the data sample” (Col. 9 Lines 26-49).
identifying, from the collection of sensor samples, a plurality of relevant sensor samples that characterize similar environment regions to the query environment region, comprising:
Cirit teaches “the embedding engine 230 searches the embedding data store 235 for reference embeddings with characteristics that correspond to the characteristics of an embedding of the data sample” (Col. 9 Lines 26-49).
processing the query sensor sample through one or more of the embedding neural networks in the set to generate one or more query embeddings; and
Cirit teaches receiving a request specifying a query data sample that characterizes an environment associated with sensor data for a vehicle trip record and “the embedding engine 230 searches the embedding data store 235 for reference embeddings with characteristics that correspond to the characteristics of an embedding of the data sample” (Col. 9 Lines 26-49) thereby teaching processing the query data sample to generate embedding of the data sample to be used to compare to characteristics of other data samples.
identifying, from sensor samples in a subset of the sensor samples in the collection, a plurality of sensor samples that have embeddings that are closest to each of the query embeddings; and
Cirit teaches “the embedding engine 230 can search for and retrieve reference embeddings stored in the embedding data store 235 by comparing characteristics of data samples (e.g., for test embeddings) with reference characteristics of reference embeddings” (Col. 12 Lines 55-67).
using the relevant sensor samples that have been identified from the collection of sensor samples that characterize similar environment regions to the query environment region to generate training data for a machine learning model.
Cirit teaches “the embedding engine 230 can search for and retrieve reference embeddings stored in the embedding data store 235 by comparing characteristics of data samples (e.g., for test embeddings) with reference characteristics of reference embeddings” (Col. 12 Lines 55-67) where “storing, indexing, or searching previously stored trip records that are uncompressed may be computationally expensive…Thus, compressing the data samples using embeddings allows the network system 100 to save computational resources and enable more efficient look-up of embeddings” (Col. 8 Lines 46-62). Cirit further teaches “the machine learning engine 240 trains models based on feature vectors derived from data samples captured” where “in some use cases for training models, the feature vectors are labeled based on characteristics of the data samples” (Col. 7 Lines 47-67). Therefore, Cirit teaches searching the embeddings (compressed data samples) for relevant data samples that characterize similar environment regions to a query environment region to generate training data of feature vectors for training machine learning engine 240.
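For illustration only, the search steps recited in the limitations above (embedding a query sample, finding the stored samples whose embeddings are closest, and collecting those samples as training data) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of either cited reference: the `embed` function is a hypothetical stand-in for a trained embedding neural network, and the record layout, sample values, and the label string are invented for the example.

```python
import math

def embed(sample):
    """Hypothetical stand-in for a trained embedding neural network.

    Here it only L2-normalizes the input; a real system would run a model.
    """
    norm = math.sqrt(sum(x * x for x in sample)) or 1.0
    return [x / norm for x in sample]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(collection, query_sample, k=2):
    """Return the k stored records whose embeddings are closest to the query."""
    query_embedding = embed(query_sample)
    ranked = sorted(
        collection,
        key=lambda rec: euclidean(rec["embedding"], query_embedding),
    )
    return ranked[:k]

# Maintained collection: each sample is stored alongside its embedding.
raw_samples = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
collection = [
    {"id": sid, "sample": s, "embedding": embed(s)}
    for sid, s in raw_samples.items()
]

# Query, then reuse the retrieved samples as (input, label) training pairs.
# The label string is an assumption made purely for illustration.
hits = search(collection, [1.0, 0.05], k=2)
training_data = [(rec["sample"], "query_region_type") for rec in hits]
```

In this sketch the embeddings are computed once when the collection is built, so a query only requires one pass of the embedding function followed by distance comparisons.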


Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Satazoda and Cirit at the time that the claimed invention was effectively filed, to have combined the search system for finding relevant embeddings, as taught by Cirit, with the system and method for image analysis, as taught by Satazoda.
One would have been motivated to make such combination because Cirit teaches “the embedding engine 230 can use the embeddings as a method of data compression” where “compressing the data samples using embeddings allows the network system 100 to save computational resources and enable more efficient look-up of embeddings” (Col. 8 Lines 46-62) and it would have been obvious to a person having ordinary skill in the art to use the embeddings taught by Satazoda with a look-up system taught by Cirit to more efficiently search for and compare embeddings data.


Regarding Claim 2:
Cirit and Satazoda further teach:
processing each sensor sample in the collection using each of the embedding neural networks in the set to generate the embeddings of the sensor sample.
Cirit teaches “the machine learning engine 240 uses machine learning techniques to train a model to generate embeddings for data samples” where “the machine learning engine 240 trains models based on feature vectors derived from data samples captured” and “the machine learning engine 240 may implement machine learning techniques such as deep learning, logistic regression, convolutional neural networks, or other types of dimensionality reduction processes” (Col. 7 Lines 47-67).
It is noted that Independent Claim 1 recites the set of embedding neural networks as “set of one or more embedding neural networks”, and thus it is understood that “each of the embedding neural networks in the set” can be only a single embedding neural network.

Regarding Claim 3:
Cirit and Satazoda further teach:
generating a visual representation for each of the plurality of relevant sensor samples; and
Satazoda teaches “the verification module includes a user interface, and Block S400 includes receiving a user selection of the annotation at the user interface (e.g., a click action proximal a bounding box displayed about an object in the image)” and “receiving the user selection includes receiving a bounding box input (e.g., the user draws a bounding box about an object within an image displayed at the user interface)” (Para. [0077]).
providing the visual representations for presentation on a user device.
Satazoda teaches “the verification module includes a user interface, and Block S400 includes receiving a user selection of the annotation at the user interface (e.g., a click action proximal a bounding box displayed about an object in the image)” and “receiving the user selection includes receiving a bounding box input (e.g., the user draws a bounding box about an object within an image displayed at the user interface)” (Para. [0077]) thereby teaching a visual representation for presentation on a user device.

Regarding Claim 6:
Cirit and Satazoda further teach:
wherein the query environment region has been identified as depicting an object of a particular object type, and
Cirit teaches receiving a request specifying a query data sample that characterizes an environment associated with sensor data for a vehicle trip record and “the embedding engine 230 searches the embedding data store 235 for reference embeddings with characteristics that correspond to the characteristics of an embedding of the data sample” (Col. 9 Lines 26-49). Satazoda further teaches “the object detection module functions to detect that an object is depicted in image data” and “detects any of the predetermined set of object types within image data” (Para. [0046]).
wherein the machine learning model is a machine learning classifier configured to classify sensor samples as depicting objects of the particular object type.
Satazoda teaches “the object detection module functions to detect that an object is depicted in image data” and “detects any of the predetermined set of object types within image data” (Para. [0046]) thereby teaching a machine learning model to classify sensor samples as depicting objects of a particular object type.


Regarding Claim 7:
Cirit and Satazoda further teach:
the sensor samples in the collection of sensor samples are each associated with a high-level classification that identifies an object type of an object located in the environment region characterized by the sensor sample;
Satazoda teaches “the classification module functions to determine a class of an object (e.g., object class) depicted in image data” where “a cascaded classifier that is made up of hierarchical classification modules (e.g., wherein each parent classification module performs a higher level classification than a child classification module)” (Para. [0047]).
the request specifies a query high-level classification; and
Satazoda teaches a cascade classifier made up of hierarchical classification modules “wherein each parent classification module performs a higher level classification than a child classification module” (Para. [0047]).
the subset includes only sensor samples in the collection that are associated with the query high-level classification.
Satazoda teaches a cascade classifier made up of hierarchical classification modules “wherein each parent classification module performs a higher level classification than a child classification module” and “the output of the classification module can include bounding boxes (e.g., drawn around all or a portion of the classified object), annotated image data (e.g., with objects annotated with a text fragment corresponding to an associated object class), feature vectors based on image words (e.g., embeddings), and any other suitable output” (Para. [0047]). Therefore, Satazoda teaches segmenting sensor samples based on higher level classifications and returning “feature vectors based on image words (e.g., embeddings)” based on the higher level classification when a cascade classifier is used.
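As a hedged illustration of the Claim 7 limitations, restricting the search to the subset of samples that share the query high-level classification can be sketched as a filter applied before the nearest-embedding comparison. The record fields (`high_level_class`, `embedding`), the class names, and the numeric values are hypothetical and are not drawn from the application or from Satazoda.

```python
# Hypothetical records: each stored sample carries a high-level class label
# and a precomputed embedding (values are illustrative only).
collection = [
    {"id": "s1", "high_level_class": "vehicle", "embedding": [0.0, 0.0]},
    {"id": "s2", "high_level_class": "pedestrian", "embedding": [0.1, 0.0]},
    {"id": "s3", "high_level_class": "vehicle", "embedding": [1.0, 1.0]},
]

def squared_distance(a, b):
    """Squared Euclidean distance; sufficient for ranking by closeness."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_within_class(records, query_embedding, query_class, k=1):
    """Keep only records with the query class, then rank by distance."""
    subset = [r for r in records if r["high_level_class"] == query_class]
    subset.sort(key=lambda r: squared_distance(r["embedding"], query_embedding))
    return subset[:k]

# "s2" is the closest record overall, but it is excluded by the class filter.
result = search_within_class(collection, [0.1, 0.0], "vehicle", k=1)
```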


Regarding Claim 8:
Cirit and Satazoda further teach:
wherein the embeddings of the sensor samples in the collection and the query embedding are each generated in accordance with trained values of parameters of the embedding neural networks.
Satazoda teaches “executing the trained model at one or more processing modules to analyze…labeled data...and adjusting model parameters (e.g., weights, feature vector templates, etc.) to achieve a desired output (e.g., an accurately recognized object)” (Para. [0082]) thereby teaching generating the sensor sample embeddings and the query embeddings in accordance with a trained model having trained model parameters.


Regarding Claim 9:
Cirit and Satazoda further teach:
wherein each sensor sample represents measurements from a plurality of sensors of the corresponding vehicle, with the measurements from each sensor characterizing the same region at the same time.
Satazoda teaches “the vehicle system can include: a set of sensors 211” (Para. [0032]) and “the sensors 211 of the vehicle system function to acquire sensor signals (e.g., image data)” and “the set of sensors can include: cameras…orientation sensors…acoustic sensors…optical sensors…temperature sensors, pressure sensors, flow sensors, vibration sensors, proximity sensors, chemical sensors, electromagnetic sensors, force sensors, or any other suitable type of sensor” (Para. [0033]). Satazoda further teaches combining vehicle sensors that characterize the same region at the same time by teaching “determining that the vehicle is driving on a highway or freeway based on a combination of an inertial measurement unit (IMU) signal and a speed signal received from an OBD module coupled to the vehicle” (Para. [0024]).

Regarding Claim 10:
Cirit and Satazoda further teach:
wherein the plurality of sensor samples that have embeddings that are closest to the query embedding are the sensor samples that are nearest to the query embedding according to a distance metric.
Cirit teaches “the embedding engine 230 can determine similarity scores between two or more different embeddings, e.g., by comparing each latent dimension of the different embeddings, by determining a cosine similarity between the embeddings, or by using symbolic processing on discretized sensor traces” (Col. 7 Lines 4-26).

Regarding Claim 11:
Cirit and Satazoda further teach:
wherein the distance metric is Euclidean distance or cosine similarity.
Cirit teaches “the embedding engine 230 can determine similarity scores between two or more different embeddings, e.g., by comparing each latent dimension of the different embeddings, by determining a cosine similarity between the embeddings, or by using symbolic processing on discretized sensor traces” (Col. 7 Lines 4-26).
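For reference, the two metrics named in Claim 11 can be written out as below. This is an illustrative sketch only; the vectors and sample identifiers are invented. Note that with Euclidean distance the closest sample is the one with the smallest value, whereas with cosine similarity it is the one with the largest value, and the two metrics need not select the same sample.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical query embedding and candidate embeddings.
query = [1.0, 0.0]
candidates = {"s1": [2.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 0.5]}

# Smallest distance wins for Euclidean; largest similarity wins for cosine.
nearest_by_euclidean = min(
    candidates, key=lambda k: euclidean_distance(query, candidates[k])
)
nearest_by_cosine = max(
    candidates, key=lambda k: cosine_similarity(query, candidates[k])
)
```

Here `s3` is nearest by Euclidean distance (0.5), while `s1` is nearest by cosine similarity because it points in exactly the same direction as the query despite having a different magnitude.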

Regarding Claim 12:
Some of the limitations herein are similar to some or all of the limitations of Claim 1.

Cirit and Satazoda further teach:
one or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform respective operations (Satazoda – Para. [0094]).

Regarding Claim 13:
Some of the limitations herein are similar to some or all of the limitations of Claim 1.

Cirit and Satazoda further teach:
a system comprising one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to perform operations (Satazoda – Para. [0094]).

Regarding Claim 14:
All of the limitations herein are similar to some or all of the limitations of Claim 2.

Regarding Claim 15:
All of the limitations herein are similar to some or all of the limitations of Claim 3.

Regarding Claim 17:
All of the limitations herein are similar to some or all of the limitations of Claim 7.

Regarding Claim 18:
All of the limitations herein are similar to some or all of the limitations of Claim 8.

Regarding Claim 19:
All of the limitations herein are similar to some or all of the limitations of Claim 9.

Regarding Claim 20:
All of the limitations herein are similar to some or all of the limitations of Claim 10.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Satazoda and Cirit, and further in view of Goma et al. (U.S. Pre-Grant Publication No. 2011/0242342, hereinafter referred to as Goma).

Regarding Claim 4:
Satazoda and Cirit explicitly teach all of the elements of the claimed invention as stated above except:
wherein generating a visual representation for each of the plurality of relevant sensor samples comprises:
for each relevant sensor sample, identifying other sensor samples that were captured within a specified time window of the relevant sensor sample; and
generating a video representation of the other sensor samples and the relevant sensor sample.

However, in the related field of endeavor of processing data from image sensors, Goma teaches:
wherein generating a visual representation for each of the plurality of relevant sensor samples comprises:
for each relevant sensor sample, identifying other sensor samples that were captured within a specified time window of the relevant sensor sample; and
Goma teaches combining images that come from data streams that “have substantially the same timing characteristics” (Para. [0060]) and “the image processor 1308 is configured to process the frame data 1318 and to output the frame data 1320 to the display device. The processed frame data 1320 may have a 3D image format or a 3D video format” (Para. [0110]).
generating a video representation of the other sensor samples and the relevant sensor sample.
Goma teaches combining images that come from data streams that “have substantially the same timing characteristics” (Para. [0060]) and “the image processor 1308 is configured to process the frame data 1318 and to output the frame data 1320 to the display device. The processed frame data 1320 may have a 3D image format or a 3D video format” (Para. [0110]).
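As an illustration of the Claim 4 limitations only, selecting the other sensor samples captured within a specified time window of a relevant sample, and ordering them as frames for a video representation, can be sketched as follows. The timestamp field `t` (in seconds), the sample identifiers, and the window size are assumptions made for the example and do not describe the implementation of the application or of Goma.

```python
# Hypothetical timestamped samples (t in seconds).
samples = [
    {"id": "f1", "t": 10.0},
    {"id": "f2", "t": 11.5},
    {"id": "f3", "t": 12.0},
    {"id": "f4", "t": 14.5},
    {"id": "f5", "t": 20.0},
]

def frames_for_clip(all_samples, relevant, window_s):
    """Return samples within window_s seconds of the relevant sample,

    including the relevant sample itself, ordered by capture time so they
    can be played back as consecutive video frames.
    """
    nearby = [s for s in all_samples if abs(s["t"] - relevant["t"]) <= window_s]
    return sorted(nearby, key=lambda s: s["t"])

# Build the frame list around the relevant sample "f3" with a 2-second window.
clip = frames_for_clip(samples, samples[2], window_s=2.0)
clip_ids = [s["id"] for s in clip]
```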

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Goma, Satazoda and Cirit at the time that the claimed invention was effectively filed, to have combined the aggregation of sensor data from multiple image sensors, as taught by Goma, with the search system for finding relevant embeddings, as taught by Cirit, and the system and method for image analysis, as taught by Satazoda.
One would have been motivated to make such combination because Goma teaches combining images from multiple sensors into a 3D image or video format to be displayed to a user (Para. [0110]) and it would have been obvious to a person having ordinary skill in the art that viewing a display of combined sensor data in 3D would allow for a more realistic visualization to view the data captured by the sensors.


Response to Amendment
Applicant’s Amendments, filed on 9/21/2022, are acknowledged and accepted.
As stated above and restated here for convenience, Applicant’s amendments and remarks filed on 9/21/2022 have been fully considered but were not found to overcome the previously cited prior art. Accordingly, THIS ACTION IS MADE FINAL.


Response to Arguments
On pages 8-9 of the Remarks filed on 9/21/2022, Applicant argues that “the applied references, either applied independently or in combination, fail to teach or suggest, at least, ‘using the relevant sensor samples that have been identified from the collection of sensor samples that characterize similar environment regions to the query environment region to generate training data for training a machine learning model’…as recited in amended claim 1, and similarly recited in amended independent claims 12 and 13” because “in Cirit, the reference embeddings are used to ‘determine if the test embedding is similar to the reference embedding generated for other trips having one or more of the same (or similar) characteristics to the test embedding and thereby verify the test embedding was generated during a trip having those characteristics’…[and] Cirit merely describe training a model ‘based on feature vectors derived from data samples captured for trip records of the network system 100’”.
Applicant’s argument is not convincing because Cirit teaches “storing, indexing, or searching previously stored trip records that are uncompressed may be computationally expensive…Thus, compressing the data samples using embeddings allows the network system 100 to save computational resources and enable more efficient look-up of embeddings” (Col. 8 Lines 46-62). Therefore, when Cirit teaches determining feature vectors of the data samples for training the machine learning engine (Col. 7 Lines 47-67), Cirit is teaching searching the embeddings for relevant data samples for training using an embedding engine 230 to “search for and retrieve reference embeddings stored in the embedding data store 235 by comparing characteristics of data samples…with reference characteristics of reference embeddings” (Col. 12 Lines 55-67).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Non-Patent Literature Hallac et al., "Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data", December 10, 2018, IEEE (Year: 2018) teaches a deep learning-based method, called Drive2Vec, for embedding sensor data in a low-dimensional yet actionable form based on stacked gated recurrent units (GRUs) and accepting short interval automobile sensor data as input.
Dalal et al. (U.S. Pre-Grant Publication No. 2020/0176121) teaches receiving sensor data from at least one sensor that generates biophysical readings for a subject; by operation of a first neural network (NN), embedding the sensor data to generate embedded values; by operation of a second NN, generating imputed embedded values in response to the embedded values, the imputed embedded values including imputed values corresponding to one or more regions of the sensor data; and normalizing the embedded imputed values to generate imputed values (Para. [0019]).
Sun et al. (U.S. Pre-Grant Publication No. 2019/0129436) teaches receiving training data from a real world data collection system; obtaining ground truth data corresponding to the training data; performing a training phase to train a plurality of trajectory prediction models; and performing a simulation or operational phase to generate a vicinal scenario for each simulated vehicle in an iteration of a simulation, the vicinal scenarios corresponding to different locations, traffic patterns, or environmental conditions being simulated, provide vehicle intention data corresponding to a data representation of various types of simulated vehicle or driver intentions, generate a trajectory corresponding to perception data and the vehicle intention data, execute at least one of the plurality of trained trajectory prediction models to generate a distribution of predicted vehicle trajectories for each of a plurality of simulated vehicles of the simulation based on the vicinal scenario and the vehicle intention data, select at least one vehicle trajectory from the distribution based on pre-defined criteria, and update a state and trajectory of each of the plurality of simulated vehicles based on the selected vehicle trajectory from the distribution.
Krishnan (U.S. Pre-Grant Publication No. 2019/0156134) teaches human sensors are used to capture human eye movement, hearing, hand grip and contact area, and foot positions. Event signatures corresponding to human actions, reactions and responses are extracted from these sensor values and correlated to events, status and situations acquired using vehicle and outside environment sensors. These event signatures are then used to train vehicles to improve their autonomous capabilities. Human sensors are vehicle mounted or frame mounted. Signatures obtained from events are classified and stored.


THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT F MAY whose telephone number is (571)272-3195. The examiner can normally be reached Monday-Friday 9:30am to 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT F MAY/Examiner, Art Unit 2154    9/24/2022

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154