DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the patent application filed on August 10, 2020. 
Claims 1-20 are currently pending and have been examined.
This action is made Non-FINAL.
The examiner would like to note that this application is being handled by examiner Christine Huynh.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 10, 2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: Figure numbers 406 and 232.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities: In paragraph [0034] of the specification, “sensors 1116” should be “sensors 116”. Appropriate correction is required.
The use of the terms “Bluetooth” and “Zigbee” in paragraph [0015], which are trade names or marks used in commerce, has been noted in this application. Each term should be accompanied by the generic terminology; furthermore, each term should be capitalized wherever it appears or, where appropriate, include a proper symbol indicating use in commerce such as ™, SM, or ® following the term.
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 3-10, and 12-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1 and 3-9:
Claims 1 and 3-9 are directed to a method for trajectory prediction and motion device control, which is one of the statutory categories of invention. (Step 1: YES)
Claim 1 recites the limitation of “computing a first trajectory…”. This step falls into the mental processes grouping of abstract ideas because “computing a first trajectory” could be done mentally, for example by using pen and paper to solve the mathematical steps to get the trajectory. This limitation is a process that, under the broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. With respect to claim 1, other than reciting a computer, nothing in the limitations precludes the idea from practically being performed in the human mind. The recitation of generic components in a claim does not necessarily preclude that claim from reciting an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea)
The claims recite elements additional to the abstract concepts. However, these additional elements fail to integrate the abstract idea into a practical application. The “receiving a first set of data”, “receiving a second set of data”, “extracting features from the first set of data”, “extracting features from the second set of data”, and “inputting the features” steps of claim 1 are insignificant extra-solution activity because they are merely data gathering and data inputting. The hardware/software is recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic component, or it is merely insignificant extra-solution activity because it only gathers data. Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, claim 1 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application)
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered separately and as an ordered combination, they do not add significantly more (also known as an “inventive concept”) to the exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using computer hardware amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See Applicant’s specification para. [0012-0016] regarding implementation using general purpose or special purpose computing devices, and MPEP 2106.05(f), under which applying a computer as a tool is not indicative of significantly more, as well as MPEP 2106.05(d) and MPEP 2106.05(g), if applicable. Accordingly, these additional elements do not change the outcome of the analysis when considered separately and as an ordered combination. Thus, claim 1 is not patent eligible. (Step 2B: NO. The claims do not provide significantly more)
Dependent claims 3-9 further define the abstract idea that is present in independent claim 1 and thus correspond to a mental process; hence they are abstract for the reasons presented above. The dependent claims do not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception when considered both individually and as an ordered combination. However, dependent claim 2 recites “controlling the motion device based on the first trajectory and the second trajectory”, which is NOT directed to an abstract idea. Therefore, only dependent claims 3-9 are directed to an abstract idea. Thus, claims 1 and 3-9 are not patent-eligible.
Regarding claims 10 and 12-14:
Claims 10 and 12-14 are directed to a method for trajectory prediction and motion device control, which is one of the statutory categories of invention. (Step 1: YES)
Claim 10 recites the limitations of “generating a first feature representation”, “generating a second feature representation”, “determining a first trajectory”, “determining a second trajectory”, and “generating a control instruction”. These steps fall into the mental processes grouping of abstract ideas because, for example, “generating a first feature representation” could be done mentally by using pen and paper to solve the mathematical steps to determine the feature representation. Each limitation is a process that, under the broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. With respect to claim 10, other than reciting a computer, nothing in the limitations precludes the idea from practically being performed in the human mind. The recitation of generic components in a claim does not necessarily preclude that claim from reciting an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea)
The claims recite elements additional to the abstract concepts. However, these additional elements fail to integrate the abstract idea into a practical application. The “receiving a first set of data”, “receiving a second set of data”, and “inputting the first feature representation” steps of claim 10 are insignificant extra-solution activity because they are merely data gathering and data inputting. The hardware/software is recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic component, or it is merely insignificant extra-solution activity because it only gathers data. Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, claim 10 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application)
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered separately and as an ordered combination, they do not add significantly more (also known as an “inventive concept”) to the exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using computer hardware amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See Applicant’s specification para. [0012-0016] regarding implementation using general purpose or special purpose computing devices, and MPEP 2106.05(f), under which applying a computer as a tool is not indicative of significantly more, as well as MPEP 2106.05(d) and MPEP 2106.05(g), if applicable. Accordingly, these additional elements do not change the outcome of the analysis when considered separately and as an ordered combination. Thus, claim 10 is not patent eligible. (Step 2B: NO. The claims do not provide significantly more)
Dependent claims 12-14 further define the abstract idea that is present in independent claim 10 and thus correspond to a mental process; hence they are abstract for the reasons presented above. The dependent claims do not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception when considered both individually and as an ordered combination. However, dependent claim 11 recites “controlling motion of the motion device based on the driving instructions”, which is NOT directed to an abstract idea. Therefore, only dependent claims 12-14 are directed to an abstract idea. Thus, claims 10 and 12-14 are not patent-eligible.
Regarding claims 15-20:
Claims 15-20 are directed to a system for trajectory prediction and motion device control, which is one of the statutory categories of invention. (Step 1: YES)
Claim 15 recites the limitations of “determines a first trajectory”, “determines a second trajectory”, and “generating a control instruction”. These steps fall into the mental processes grouping of abstract ideas because, for example, “determining a trajectory” could be done mentally by using pen and paper to solve the mathematical steps to get the trajectory. Each limitation is a process that, under the broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. With respect to claim 15, other than reciting a computer, nothing in the limitations precludes the idea from practically being performed in the human mind. The recitation of generic components in a claim does not necessarily preclude that claim from reciting an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea)
The claims recite elements additional to the abstract concepts. However, these additional elements fail to integrate the abstract idea into a practical application. Claim 15 recites encoders, decoders, and a processor, which are generic computer parts being applied to the recited abstract limitations. The “inputs the first sequence of images” and “inputs the second sequence of images” limitations of claim 15 are insignificant extra-solution activity because they are merely data inputting. The hardware/software is recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic component, or it is merely insignificant extra-solution activity because it only gathers data. Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, claim 15 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application)
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered separately and as an ordered combination, they do not add significantly more (also known as an “inventive concept”) to the exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using computer hardware amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. See Applicant’s specification para. [0012-0016] regarding implementation using general purpose or special purpose computing devices, and MPEP 2106.05(f), under which applying a computer as a tool is not indicative of significantly more, as well as MPEP 2106.05(d) and MPEP 2106.05(g), if applicable. Accordingly, these additional elements do not change the outcome of the analysis when considered separately and as an ordered combination. Thus, claim 15 is not patent eligible. (Step 2B: NO. The claims do not provide significantly more)
Dependent claims 16-20 further define the abstract idea that is present in independent claim 15 and thus correspond to a mental process; hence they are abstract for the reasons presented above. The dependent claims do not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception when considered both individually and as an ordered combination. Therefore, dependent claims 16-20 are directed to an abstract idea. Thus, claims 15-20 are not patent-eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 7-11, 13-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Evans et al. (US 11200429 B1) in view of Zhang et al. (US 20190147250 A1).
Regarding claims 1-4, 7-11, 13-16, and 19-20: 
With respect to claim 1, Evans teaches: 
receiving a first set of data about a surrounding environment of a motion device, wherein the first set of data includes first sensory data and first position data from the motion device; (“tracking an object may comprise receiving first data (such as a first image, first light detection and ranging (LIDAR) data, first RADAR data, etc.) captured by a sensor at a first capture time” (column 2, lines 48-51), “the perception engine 110 may receive sensor data from sensor(s) 104 of the autonomous vehicle 102, determine perception data from the sensor data, and transmit the perception data to a planner 112 for use by the planner 112 to localize a position of the autonomous vehicle 102 on a global map and/or a local map” (column 4, lines 54-65)) 
receiving a second set of data about the surrounding environment of the motion device, wherein the second set of data includes second sensory data and second position data from the motion device, wherein the second sensory data is a different modality than the first sensory data; (“… and second data (e.g., a second image, second LIDAR data, second RADAR data, etc.) captured by the sensor at a second capture time.” (column 2, lines 51-53), “the perception engine 110 may receive sensor data from sensor(s) 104 of the autonomous vehicle 102, determine perception data from the sensor data, and transmit the perception data to a planner 112 for use by the planner 112 to localize a position of the autonomous vehicle 102 on a global map and/or a local map” (column 4, lines 54-65))
extracting features from the first set of data that model motion behavior of the motion device; (“the perception engine 110 may also determine a track 120 of an object, such as vehicle 116. In some examples, the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object. In some examples, any of these characteristics may be real world values (e.g., a velocity of the object in the real world in meters per second, kilometers per hour, a heading relative to the sensor that captured them image) and/or image-relative characteristics (e.g., a velocity associated with movement of a representation of the object across images in pixels per second, a heading that identifies an angle of movement of the representation of the object in the image)” (column 5, lines 20-28)) 
extracting features from the second set of data that model motion behavior of the motion device; (“the perception engine 110 may also determine a track 120 of an object, such as vehicle 116. In some examples, the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object. In some examples, any of these characteristics may be real world values (e.g., a velocity of the object in the real world in meters per second, kilometers per hour, a heading relative to the sensor that captured them image) and/or image-relative characteristics (e.g., a velocity associated with movement of a representation of the object across images in pixels per second, a heading that identifies an angle of movement of the representation of the object in the image)” (column 5, lines 20-28)). Because the system extracts features from the first set of data, it is capable of doing the same for a second set of data.
inputting the features from the first set of data and the features from the second set of data into a machine learning model with a shared embedded space; (“The techniques may include comparing a first segmentation generated for first sensor data and a second segmentation generated for second sensor data. Comparing the segmentations may comprise determining a first embedding for a first segmentation and a second embedding for a second segmentation using an additional ML model (an “embedding model”).” (column 3, lines 35-41))
However, Evans does not teach but Zhang teaches: 
computing a first trajectory based on a first latent variable sampled from the shared embedded space and a second trajectory based on a second latent variable sampled from the shared embedded space; (“The method includes accessing, by the computing system, a machine-learned semantic segmentation model. The method includes providing as an input, by the computing system, the data indicative of the two-dimensional voxel representation associated with the environment into the machine-learned semantic segmentation model. The method includes obtaining, by the computing system, an output from the machine-learned semantic segmentation model.” [0005], “A motion plan 785 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 710 as well as the objects' predicted movements. For instance, the motion planning system 770 can implement an optimization algorithm, model, etc. that considers cost data associated with a vehicle action as well as other objective functions (e.g., cost functions based on speed limits, traffic lights, etc.), if any, to determine optimized variables that make up the motion plan 785” [0074]). This shows that the system uses the raw data that is sensed from the environment to create representative data for a voxel, which can be used as latent variables to plan the vehicle motion including the trajectory based off of the surrounding environment. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation in order to obtain “an improved approach for semantic segmentation of a three-dimensional dataset associated with an environment (e.g., a three-dimensional point cloud, etc.)” (see Zhang [0033]) and to “also provide an improvement to vehicle computing technology, such as machine-learned related computing technology…help generate an improved set of input data that can be ingested by a machine-learned model.” (see Zhang [0035]).

With respect to claim 10, Evans teaches: 
receiving a first set of data captured during a plurality of time steps about a surrounding environment of a motion device, wherein the first set of data includes first sensory data having a first modality and first position data from the motion device; (“tracking an object may comprise receiving first data (such as a first image, first light detection and ranging (LIDAR) data, first RADAR data, etc.) captured by a sensor at a first capture time” (column 2, lines 48-51), “the perception engine 110 may receive sensor data from sensor(s) 104 of the autonomous vehicle 102, determine perception data from the sensor data, and transmit the perception data to a planner 112 for use by the planner 112 to localize a position of the autonomous vehicle 102 on a global map and/or a local map” (column 4, lines 54-65))
receiving a second set of data captured during the plurality of time steps about the surrounding environment of the motion device, wherein the second set of data includes second sensory data having a second modality and second position data from the motion device, and wherein the first modality and the second modality are different; (“… and second data (e.g., a second image, second LIDAR data, second RADAR data, etc.) captured by the sensor at a second capture time.” (column 2, lines 51-53), “the perception engine 110 may receive sensor data from sensor(s) 104 of the autonomous vehicle 102, determine perception data from the sensor data, and transmit the perception data to a planner 112 for use by the planner 112 to localize a position of the autonomous vehicle 102 on a global map and/or a local map” (column 4, lines 54-65))
generating a first feature representation describing a motion behavior of the motion device with respect to the surrounding environment based on the first set of data; (“the perception engine 110 may also determine a track 120 of an object, such as vehicle 116. In some examples, the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object. In some examples, any of these characteristics may be real world values (e.g., a velocity of the object in the real world in meters per second, kilometers per hour, a heading relative to the sensor that captured them image) and/or image-relative characteristics (e.g., a velocity associated with movement of a representation of the object across images in pixels per second, a heading that identifies an angle of movement of the representation of the object in the image)” (column 5, lines 20-28)) 
generating a second feature representation describing the motion behavior of the motion device with respect to the surrounding environment based on the second set of data; (“the perception engine 110 may also determine a track 120 of an object, such as vehicle 116. In some examples, the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object. In some examples, any of these characteristics may be real world values (e.g., a velocity of the object in the real world in meters per second, kilometers per hour, a heading relative to the sensor that captured them image) and/or image-relative characteristics (e.g., a velocity associated with movement of a representation of the object across images in pixels per second, a heading that identifies an angle of movement of the representation of the object in the image)” (column 5, lines 20-28)). Because the system generates a feature representation from the first set of data, it is capable of doing the same for a second set of data.
inputting the first feature representation and the second feature representation into a machine learning model including a shared cross-modal latent space embedded with features extracted from different modalities; (“The techniques may include comparing a first segmentation generated for first sensor data and a second segmentation generated for second sensor data. Comparing the segmentations may comprise determining a first embedding for a first segmentation and a second embedding for a second segmentation using an additional ML model (an “embedding model”).” (column 3, lines 35-41), “receiving, from one or more sensors, a first image comprising a representation of an object; receiving, as a first segmentation, a first portion of the first image associated with the object; inputting, into a machine learned model, the first segmentation, the machine learned model being trained to output embeddings associated with an input segmentation;” (column 17, lines 36-42)) 
generating a control instruction based on the first trajectory and the second trajectory; (“select one of the potential trajectories as a trajectory of the autonomous vehicle 102 that may be used to generate a drive control signal that may be transmitted to drive components of the autonomous vehicle 102.” (column 6, lines 32-36)) 
However, Evans does not teach, but Zhang teaches: 
determining a first trajectory based on the first feature representation and a first latent variable sampled from the shared cross-modal latent space; (“The method includes accessing, by the computing system, a machine-learned semantic segmentation model. The method includes providing as an input, by the computing system, the data indicative of the two-dimensional voxel representation associated with the environment into the machine-learned semantic segmentation model. The method includes obtaining, by the computing system, an output from the machine-learned semantic segmentation model.” [0005], “A motion plan 785 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 710 as well as the objects' predicted movements” [0074]). This shows that the system uses the raw data that is sensed from the environment to create representative data for a voxel, which can be used as latent variables to plan the vehicle motion including the trajectory based off of the surrounding environment.
determining a second trajectory based on the second feature representation and a second latent variable sampled from the shared cross-modal latent space; (“The method includes accessing, by the computing system, a machine-learned semantic segmentation model. The method includes providing as an input, by the computing system, the data indicative of the two-dimensional voxel representation associated with the environment into the machine-learned semantic segmentation model. The method includes obtaining, by the computing system, an output from the machine-learned semantic segmentation model.” [0005], “A motion plan 785 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 710 as well as the objects' predicted movements” [0074]). This shows that the system uses the raw data that is sensed from the environment to create representative data for a voxel, which can be used as latent variables to plan the vehicle motion including the trajectory based off the surrounding environment. Because the system can determine a trajectory from a first set of data, it is capable of doing the same for a second set of data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation in order to obtain “an improved approach for semantic segmentation of a three-dimensional dataset associated with an environment (e.g., a three-dimensional point cloud, etc.)” (see Zhang [0033]) and to “also provide an improvement to vehicle computing technology, such as machine-learned related computing technology…help generate an improved set of input data that can be ingested by a machine-learned model.” (see Zhang [0035]).

With respect to claim 15, Evans teaches: 
one or more feature encoders; (“an autoencoder (e.g., structuring the embedding model to have input and output layers of the same dimension where an intermediate layer learns to encode data received via the input layers and to output code that matches or nearly matches the original data via the output layers, stochastic gradient variational Bayes (SGVB) algorithm, sparse autoencoder))” (column 10, lines 27-35))
a processor operatively connected for computer communication to the one or more feature encoders and the one or more trajectory encoders; (“The vehicle computing device 504 may include one or more processors 518 and memory 520 communicatively coupled with the one or more processors 518.” (column 15, lines 45-47)) 
inputs a first sequence of images having a first modality and a first set of past positions into the one or more feature encoders to generate a first feature representation; (“tracking an object may comprise receiving first data (such as a first image, first light detection and ranging (LIDAR) data, first RADAR data, etc.) captured by a sensor at a first capture time” (column 2, lines 48-51), “the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object” (column 5, lines 20-24), “receiving, from one or more sensors, a first image comprising a representation of an object; receiving, as a first segmentation, a first portion of the first image associated with the object; inputting, into a machine learned model, the first segmentation, the machine learned model being trained to output embeddings associated with an input segmentation;” (column 17, lines 36-42))
inputs a second sequence of images having a second modality and a second set of past positions into the one or more feature encoders, wherein the first modality and the second modality are different to generate a second feature representation; (“… and second data (e.g., a second image, second LIDAR data, second RADAR data, etc.) captured by the sensor at a second capture time.” (column 2, lines 51-53), “the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object” (column 5, lines 20-24), “receiving, from one or more sensors, a first image comprising a representation of an object; receiving, as a first segmentation, a first portion of the first image associated with the object; inputting, into a machine learned model, the first segmentation, the machine learned model being trained to output embeddings associated with an input segmentation;” (column 17, lines 36-42)) Because the system can input a first set of data, it is capable of doing the same for a second set of data.
generates a control instruction based on the first trajectory and the second trajectory; (“select one of the potential trajectories as a trajectory of the autonomous vehicle 102 that may be used to generate a drive control signal that may be transmitted to drive components of the autonomous vehicle 102.” (column 6, lines 32-36))
However, Evans does not teach, but Zhang teaches: 
one or more trajectory decoders; (“the model(s) 125 (e.g., the machine-learned semantic segmentation model) can include an encoder and decoder network.” [0054]) 
determines a first trajectory by inputting the first feature representation into the one or more trajectory decoders and sampling a shared latent space with a latent variable having the same modality as the first modality; (“The method includes accessing, by the computing system, a machine-learned semantic segmentation model. The method includes providing as an input, by the computing system, the data indicative of the two-dimensional voxel representation associated with the environment into the machine-learned semantic segmentation model. The method includes obtaining, by the computing system, an output from the machine-learned semantic segmentation model.” [0005], “A motion plan 785 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 710 as well as the objects' predicted movements” [0074]). This shows that the system uses the raw data that is sensed from the environment to create representative data for a voxel, which can be used as latent variables to plan the vehicle motion including the trajectory based off the surrounding environment.
determines a second trajectory by inputting the second feature representation into the one or more trajectory decoders and sampling the shared latent space with a latent variable having the same modality as the second modality; (“The method includes accessing, by the computing system, a machine-learned semantic segmentation model. The method includes providing as an input, by the computing system, the data indicative of the two-dimensional voxel representation associated with the environment into the machine-learned semantic segmentation model. The method includes obtaining, by the computing system, an output from the machine-learned semantic segmentation model.” [0005], “A motion plan 785 can include vehicle actions (e.g., planned vehicle trajectories, speed(s), acceleration(s), other actions, etc.) with respect to one or more of the objects within the surrounding environment of the vehicle 710 as well as the objects' predicted movements” [0074]). This shows that the system uses the raw data that is sensed from the environment to create representative data for a voxel, which can be used as latent variables to plan the vehicle motion including the trajectory based off the surrounding environment. Because the system can determine a trajectory from a first set of data, it is capable of doing the same for a second set of data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation for (“an improved approach for semantic segmentation of a three-dimensional dataset associated with an environment (e.g., a three-dimensional point cloud, etc.)” See Zhang [0033] and “also provide an improvement to vehicle computing technology, such as machine-learned related computing technology…help generate an improved set of input data that can be ingested by a machine-learned model.” See Zhang [0035]).

With respect to claims 2 and 11, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claims 1 and 10. 
The combination of Evans and Zhang teaches the trajectory prediction system of claims 1 and 10. Evans further teaches controlling the motion device based on the first trajectory and the second trajectory; (“the perception engine 110 may receive sensor data from sensor(s) 104 of the autonomous vehicle 102, determine perception data from the sensor data, and transmit the perception data to a planner 112 for use by the planner 112 to localize a position of the autonomous vehicle 102 on a global map and/or a local map, determine one or more trajectories, and/or control motion of the autonomous vehicle 102 to traverse a path or route” (column 4, lines 54-65)) 

With respect to claim 3, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 1.  
The combination of Evans and Zhang teaches the trajectory prediction system of claim 1. Evans further teaches the first sensory data is LiDAR point cloud data from one or more sensors of the motion device; (“tracking an object may comprise receiving first data (such as a first image, first light detection and ranging (LIDAR) data, first RADAR data, etc.) captured by a sensor at a first capture time” (column 2, lines 48-51)). 

With respect to claim 4, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 3.  
The combination of Evans and Zhang teaches the trajectory prediction system of claim 3. Evans further teaches the second sensory data is image data from one or more sensors of the motion device; (“… and second data (e.g., a second image, second LIDAR data, second RADAR data, etc.) captured by the sensor at a second capture time… Note that, in some examples, the sensor may comprise different sensors (image sensors, LIDAR sensors, RADAR sensors, time-of-flight sensors, depth cameras, stereo cameras, and the like)” (column 2, lines 51-58))

With respect to claims 7 and 13, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claims 1 and 10.  
The combination of Evans and Zhang teaches the trajectory prediction system of claims 1 and 10. Evans further teaches the features from the first set of data and the features from the second set of data are each composed of environmental stimuli, inertial stimuli, and social stimuli; (“the sensor data may include a location signal (e.g., a GPS signal), an inertia signal (e.g., an accelerometer signal, a gyroscope signal, etc.), a magnetometer signal, a wheel encoder signal, a speedometer signal, a point cloud of accumulated LIDAR and/or RADAR points, an image (or images), an audio signal, and/or bariatric or other environmental signals, etc.” (column 4, lines 37-43), “The perception engine 110 may include one or more ML models and/or other computer-executable instructions for detecting, identifying, segmenting, classifying, and/or tracking objects from sensor data collected from the environment of the autonomous vehicle 102, such as ML model 114” (column 4, lines 54-59)) which shows that the data set features include sensor data from a plurality of different types of sensor that collect environment, motion, and social information. 

With respect to claims 8 and 14, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claims 7 and 13.   
The combination of Evans and Zhang teaches the trajectory prediction system of claims 7 and 13. Evans further teaches wherein the environmental stimuli includes contextual cues of the surrounding environment of the motion device (“The perception engine 110 may include one or more ML models and/or other computer-executable instructions for detecting, identifying, segmenting, classifying, and/or tracking objects from sensor data collected from the environment of the autonomous vehicle 102, such as ML model 114” (column 4, lines 54-59)), the inertial stimuli includes motion intent of the motion device (“the sensor(s) 506 may include LIDAR sensors, RADAR sensors, ultrasonic transducers, sonar sensors, location sensors (e.g., global positioning system (GPS), compass, etc.), inertial sensors (e.g., inertial measurement units (IMUs), accelerometers, magnetometers, gyroscopes, etc.)” (column 13, line 66 – column 14, line 4)), social stimuli includes interactions between the motion device and other interactive agents in the surrounding environment of the motion device; (“the perception engine 110 may detect an object in the environment and classify the object (e.g., “four-wheeled vehicle,” “semi-truck,” “pedestrian,” “animal,” “construction vehicle”).” (column 5, lines 9-12), “the perception engine 110 may also determine a track 120 of an object, such as vehicle 116. In some examples, the track 120 may comprise a current, predicted, and/or historical position, heading, velocity, acceleration, distance, ROI(s), and/or center of the object” (column 5, lines 20-24))

With respect to claim 9, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 1. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 1. Evans does not teach, but Zhang further teaches the embedded space mapping includes a single cross-modal latent space; (“The voxel grid 410 can be defined in a three-dimensional Euclidian space that encompasses all or at least a portion of the plurality of points of the three-dimensional data. For example, the voxel grid 410 can confine one or more points associated with the object 405 (e.g., located within an environment 200A-B). Given a set of observations O={o.sub.1} representing a three-dimensional set of data (e.g., a three-dimensional point cloud), the computing system 100 can predict a probability distribution across (K) classes for all (o.sub.1), where each (o.sub.1) is a vector consisting of (p.sub.i) representing its (x, y, z) location in Euclidean space and (m.sub.1) representing any other available sensor modalities (e.g., RGB, intensity, etc.)” [0044]) where the system includes mapping a space by using sensor data that represents the environment around the vehicle.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation and space for (“an improved approach for semantic segmentation of a three-dimensional dataset associated with an environment (e.g., a three-dimensional point cloud, etc.)” See Zhang [0033] and “also provide an improvement to vehicle computing technology, such as machine-learned related computing technology…help generate an improved set of input data that can be ingested by a machine-learned model.” See Zhang [0035]).

With respect to claim 16, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 15. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 15. Evans further teaches transmitting the control instruction to the motion device thereby controlling motion of the vehicle; (“select one of the potential trajectories as a trajectory of the autonomous vehicle 102 that may be used to generate a drive control signal that may be transmitted to drive components of the autonomous vehicle 102.” (column 6, lines 32-36))

With respect to claim 19, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 15. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 15. Evans further teaches the first feature representation and the second feature representation are each composed of a plurality of stimuli; (“The perception engine 110 may include one or more ML models and/or other computer-executable instructions for detecting, identifying, segmenting, classifying, and/or tracking objects from sensor data collected from the environment of the autonomous vehicle 102, such as ML model 114” (column 4, lines 54-59)) which shows that the data set features include sensor data from a plurality of stimuli.  

With respect to claim 20, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 19. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 19. Evans further teaches the plurality of stimuli includes environmental stimuli, inertial stimuli, and social stimuli; (“the sensor data may include a location signal (e.g., a GPS signal), an inertia signal (e.g., an accelerometer signal, a gyroscope signal, etc.), a magnetometer signal, a wheel encoder signal, a speedometer signal, a point cloud of accumulated LIDAR and/or RADAR points, an image (or images), an audio signal, and/or bariatric or other environmental signals, etc.” (column 4, lines 37-43), “The perception engine 110 may include one or more ML models and/or other computer-executable instructions for detecting, identifying, segmenting, classifying, and/or tracking objects from sensor data collected from the environment of the autonomous vehicle 102, such as ML model 114” (column 4, lines 54-59)) which shows that the data set features include sensor data from a plurality of different types of sensor that collect environment, motion, and social information.

Claim(s) 5-6, 12, and 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Evans et al. (US 11200429 B1) in view of Zhang et al. (US 20190147250 A1) in further view of Zeng et al. (US 20200159225 A1). 
Regarding claims 5-6, 12, and 17-18: 
With respect to claim 5, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 1. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 1. Evans does not teach, but Zeng teaches the first sensory data is a past sequence of top-down views; (“obtain sensor data such as LIDAR point cloud data associated with an environment external to an autonomous vehicle. The means can be configured to project the LIDAR point cloud data to a bird's eye view representation of the LIDAR point cloud data in some examples” [0128], “The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102.” [0060]), where a bird’s eye view provides a top down view of the sensor information. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation with Zeng’s sensor data views because (“the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.” See Zeng [0022]).

With respect to claim 6, Evans in combination with Zhang and Zeng, as shown in the rejection above, discloses the limitations of claim 5. 
The combination of Evans, Zhang, and Zeng teaches the trajectory prediction system of claim 5. Evans does not teach, but Zeng further teaches the second sensory data is a past sequence of frontal views; (“the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102” [0055], “The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102.” [0060]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation with Zeng’s sensor data views because (“the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.” See Zeng [0022]).

With respect to claim 12, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 10. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 10. Evans does not teach, but Zeng teaches the first sensory data is a past sequence of top-down views, and the second sensory data is a past sequence of frontal views; (“obtain sensor data such as LIDAR point cloud data associated with an environment external to an autonomous vehicle. The means can be configured to project the LIDAR point cloud data to a bird's eye view representation of the LIDAR point cloud data in some examples” [0128], “the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102” [0055], “The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102.” [0060]), where a bird’s eye view provides a top down view of the sensor information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation with Zeng’s sensor data views because (“the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.” See Zeng [0022]).

With respect to claim 17, Evans in combination with Zhang, as shown in the rejection above, discloses the limitations of claim 15. 
The combination of Evans and Zhang teaches the trajectory prediction system of claim 15. Evans does not teach, but Zeng teaches the first sequence of images is a past sequence of top-down views of a surrounding environment of the motion device derived from LiDAR data; (“obtain sensor data such as LIDAR point cloud data associated with an environment external to an autonomous vehicle. The means can be configured to project the LIDAR point cloud data to a bird's eye view representation of the LIDAR point cloud data in some examples” [0128], “The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102.” [0060]), where a bird’s eye view provides a top down view of the sensor information.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation with Zeng’s sensor data views because (“the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.” See Zeng [0022]).

With respect to claim 18, Evans in combination with Zhang and Zeng, as shown in the rejection above, discloses the limitations of claim 17. 
The combination of Evans, Zhang, and Zeng teaches the trajectory prediction system of claim 17. Evans does not teach, but Zeng further teaches the second sequence of images is a past sequence of frontal-views of the surrounding environment of the motion device derived from image data; (“the vehicle sensor(s) 116 can include a front-facing RGB camera mounted on top of the vehicle 102 and the sensor data 118 can include an RGB image depicting the surrounding environment of the vehicle 102” [0055], “The vehicle computing system 110 can generate perception data 142 that is indicative of one or more states (e.g., current and/or past state(s)) of a plurality of objects that are within a surrounding environment of the vehicle 102.” [0060])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to have combined Evans’ tracking object sensor data and Zhang’s data segmentation with Zeng’s sensor data views because (“the motion planning model provides an end-to-end driving approach that is optimized for the motion planning task, while also providing intermediate representations that can be accessed to improve the effectiveness of the model, such as by training to optimize the intermediate representations for motion planning.” See Zeng [0022]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Christine N Huynh whose telephone number is (571)272-9980. The examiner can normally be reached Monday - Friday 8 am - 4 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aniss Chad, can be reached at (571)270-3832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHRISTINE NGUYEN HUYNH/Examiner, Art Unit 3662                      

/ANISS CHAD/Supervisory Patent Examiner, Art Unit 3662