Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over EP 3783538 A1 to Blaiotta in view of US 2022/0292867 A1 to Zhang et al., hereinafter, “Zhang”.
Claim 1. An image processing method comprising: defining relations between entities of a target of which a motion is to be predicted from an image of a first time point based on a feature vector of the entities; Blaiotta [0004] teaches it would be desirable to be able to model time-varying interaction configurations and use them to more accurately predict future dynamics of interacting physical objects.

Blaiotta [0006] teaches observations of the separate physical objects may be available in the form of sequences of observed object feature vectors of the separate objects. The object feature vectors may be indicative of respective physical quantities of the physical objects. For example, at a given set of discrete measurement times, the object feature vectors of the multiple interacting objects, e.g., their locations, velocities, accelerations, etcetera, may have been determined. Generally, the observed object feature vectors may be determined based on input signals from various sensors, e.g., camera, radar, LiDAR, ultrasonic sensors, or any combination. Accordingly, the object feature vectors of respective objects may represent the state of the physical system at a point in time.

estimating a dynamic interaction between the entities at the first time point based on the defined relations between the entities; Blaiotta [0004] teaches it would be desirable to be able to model time-varying interaction configurations and use them to more accurately predict future dynamics of interacting physical objects.

Blaiotta [0007] teaches in order to model the latent connectivity dynamics of such interacting objects, interestingly, the inventors devised to capture pairwise interactions between the objects at a particular point in time using interaction types from a set of multiple interaction types. For example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. 

predicting a motion of the entities changing at a second time point based on the estimated dynamic interaction; Blaiotta [0007] teaches for example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. Interestingly, the interaction types between a pair of objects may vary over time, e.g., in a sequence of interaction types for a pair of objects, at least two different interaction types may occur. For example, two pedestrians may first be moving independently from each other but may start affecting each other's trajectories as they get closer to each other. Thereby, a more accurate model of the connectivity dynamics may be obtained than if interaction types are assumed to remain static over time.

Blaiotta [0017] teaches the sequences of predicted object feature vectors may represent trajectories of fictitious interacting objects at discrete points in time. 

Blaiotta [0023] teaches such probability distributions may be used to generate multiple predictions of future trajectories of the interacting objects that are representative of various possible evolutions of the system.

and outputting a result to which the motion predicted at the second time point is applied. Blaiotta [0020] teaches as the inventors realized, in order to take into account correlations among interacting objects, it may be possible to model attributes of different pairs of object at a given time-step as being conditionally dependent on the previous interaction types. However, automatically learning such a prior may be inefficient, e.g., involving estimating K.sup.N2-N+1 free parameters where K is the number of interaction types and N the number of objects. By determining representations of respective objects based on object feature vectors of these objects, however, this may be avoided; e.g., the representations can be determined separately for the interacting objects. Still, by using these representations to determine pairwise interaction types, in other words by conditioning the prior on the history of object trajectories, correlations among interacting objects may be taken into account.

Blaiotta [0022] teaches the representations of the interacting objects used to predict pairwise interaction types are determined from immediately preceding object feature vectors of the respective interacting agents. For example, in various real-life settings, interactions between objects may be assumed to show first order Markovian dynamics, e.g., to predict behaviour at a time step, only information from the immediately preceding time step may be relevant.

Blaiotta fails to explicitly teach outputting a result to which the motion predicted at the second time point is applied. Zhang, in the field of trajectory prediction of an object, teaches at [0020] that the trajectory module 104 is generally configured to generate output reflecting predicted movement of objects depicted in one or more images, e.g., images captured by the image capture device 103 and/or images received from another source.

Zhang [0037] teaches the decoder model 109 is generally configured to generate output to predict the movement of a given person depicted in an image at time interval t.

Thus, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Blaiotta with those of Zhang, because the ability to more accurately predict future locations using fewer resources may provide significant improvements in collision prevention (Zhang [0001]).

Claim 2. Blaiotta and Zhang further teach wherein the relations between the entities are determined based on at least one of connections between the entities, positions of the entities, postures of the entities, movement directions of the entities, movement speeds of the entities, motion trajectories of the entities, a rule applied to the entities, motion patterns of the entities based on the rule, a regulation applied to the entities, or motion patterns of the entities based on the regulation. Blaiotta [0002] teaches various physical systems can be naturally represented as a set of objects with relational dependencies among them. For example, a traffic situation around an autonomous vehicle may be seen as a system in which various other actors, such as other vehicles, bikes, and pedestrians, interact with each other and thereby influence each other's trajectories. As another example, a manufacturing process may be considered as a system in which various devices, e.g., autonomous robots, interact with each other according to interaction dynamics. In such cases, although no explicit rules for interactions may be known, still the interactions may show a degree of predictability. In various settings, e.g., in a monitoring system for the manufacturing process or a control system of the autonomous vehicle, it is desirable to be able to use this predictability to automatically reason about the state of such a system, e.g., in order to make predictions about its future state. For example, the autonomous vehicle may brake if a dangerous traffic situation is predicted. In order to reason accurately about such systems, it is desirable to model the systems in terms of the individual physical objects and the way that they interact with each other.

Blaiotta [0006] teaches various measures discussed herein relate to interacting physical objects, for example, traffic participants, autonomous robots in a manufacturing plant, physical particles, etcetera. Observations of the separate physical objects may be available in the form of sequences of observed object feature vectors of the separate objects. The object feature vectors may be indicative of respective physical quantities of the physical objects. For example, at a given set of discrete measurement times, the object feature vectors of the multiple interacting objects, e.g., their locations, velocities, accelerations, etcetera, may have been determined.

Blaiotta [0018] teaches Optionally, the current object feature vector comprises one or more of a position, a velocity, and an acceleration of the first interacting object. This way, spatial configurations of interacting objects, e.g., traffic participants in a traffic situation, autonomous robots in a warehouse, etcetera can be modelled. For example, object feature vectors of respective objects may be derived from camera images, or obtained from location and/or motion sensors of the objects themselves, etcetera.

Zhang [0022] teaches the social graphs 107 may be generated to reflect the pairwise social relationships between people depicted in the images at the corresponding time interval. Based on an analysis of the captured images, the trajectory module 104 may identify persons in the image, determine the present location of the person, and update the trajectory history for each identified person (e.g., as metadata of the image and/or in a separate data store). The trajectory history may reflect the actual movement of each person at each time interval and may include a vector reflecting direction and/or velocity of movement at each time interval. The movement of each person at each time interval may be based on a respective image captured by the image capture device 103 depicting the person.

Claim 3. Blaiotta and Zhang further teach wherein the defining of the relations between the entities comprises: generating hidden state information corresponding to the relations between the entities at the first time point by applying the feature vector to a graph neural network (GNN) comprising nodes corresponding to the entities and edges corresponding to the relations between the entities. Blaiotta [0077] teaches a Graph Neural Network (GNN) 630 may be used to determine parameters of a probability distribution for predicting an object feature vector. The GNN may take as input sequence embeddings of previous object feature vectors together with the current latent graph configuration, e.g. interaction types, and may output node-level graph embeddings providing the parameters of the probability distribution for predicting the object feature vector. Graph Neural Network 630 may be parametrized with various numbers of layers and various types of aggregation functions, although it is preferred to use aggregation functions that are permutation invariant and/or differentiable.

Blaiotta [0014] teaches in the encoder model, propagation data to an interacting object from one or more interacting objects may be used to determine a hidden state of the interacting object that can then be used by a classification model to determine an interaction type. 

Blaiotta [0015] teaches the decoder model may be combined with a prior model for predicting pairwise interaction types to form together a generative model explaining dynamics of systems with hidden interaction configurations.

Blaiotta [0024] teaches the classification model may be applied to at least the hidden states of the first and the second object.

Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM. As discussed above, hidden states are typically determined using propagation data. By using a recurrent model, also history data may be efficiently and generically used in the hidden representations. For example, propagation data from the second to the first object may comprise the hidden states of the objects as determined by the recurrent model. However, it is also possible to, instead or in addition, use the current observed feature vectors as input to the propagation model.

Blaiotta [0026] teaches not only hidden states based on preceding observations are used to determine interaction types, but also further hidden states based on following observations. In particular, a further hidden state of a first interacting object may be determined based on at least an immediately following further hidden state of the first interacting object. Similarly to the regular hidden states, a propagation model may be used to integrate further hidden states or object feature vectors of other interacting objects into the further hidden state, and the further hidden states may be determined using a recurrent model such as a GRU or LSTM. To determine an interaction type between a first and second interaction object, the further hidden state may be used as input to the classification model. By using the further hidden states, additional knowledge from future observations may be used to more accurately infer the interaction types.

Blaiotta [0059] teaches interaction type 411 may be predicted by a prior model with which the decoder model together forms a generative model. This may be an appropriate choice for making predictions on future behaviour of interacting objects after having observed one or more time steps.

Blaiotta [0063] teaches here, values and may be comprised in set of parameters DPAR. In this particular example, the prior parameters are shared across edges and time steps, e.g., the prior on graph connectivity may comprise a time-homogeneous Markov chains for every graph edge. However, the prior model could be made more expressive by allowing time-dependent and/or edge-dependent parameters. It may also be possible to use a more powerful prior in which attributes, e.g., interaction types, of different edges at a given time-step are conditionally independent only given the entire graph configuration at the previous time-step, however, such a prior may have a lot more parameters so in most cases, the above prior is desirable.
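
For illustration only, the time-homogeneous edge prior described in Blaiotta [0063] might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the function names, the initial distribution, and the transition values are hypothetical, and the only property carried over from the cited passage is that one set of transition parameters is shared across all edges and all time steps.

```python
import random


def sample_edge_types(num_steps, transition, initial, rng):
    """Sample one edge's interaction-type sequence from a Markov chain."""
    # Draw the interaction type at the first time step.
    k = rng.choices(range(len(initial)), weights=initial)[0]
    sequence = [k]
    for _ in range(num_steps - 1):
        # The same transition row is reused at every step (time-homogeneous).
        k = rng.choices(range(len(transition[k])), weights=transition[k])[0]
        sequence.append(k)
    return sequence


def sample_graph_prior(num_objects, num_steps, transition, initial, seed=0):
    """Sample an independent chain for every ordered pair of distinct objects."""
    rng = random.Random(seed)
    edges = {}
    for i in range(num_objects):
        for j in range(num_objects):
            if i != j:
                # Parameters are shared across edges (hypothetical values).
                edges[(i, j)] = sample_edge_types(num_steps, transition, initial, rng)
    return edges


if __name__ == "__main__":
    # Two hypothetical interaction types with "sticky" transitions.
    transition = [[0.9, 0.1], [0.2, 0.8]]
    print(sample_graph_prior(3, 5, transition, [0.5, 0.5]))
```

Making the transition values depend on the time step or on the edge would correspond to the more expressive prior that the same passage mentions as an alternative.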

Blaiotta [0064] teaches a second possibility for predicting pairwise interaction types may involve determining respective representations of interacting objects based on previous object feature vectors of the respective interacting objects, and determining the pairwise interaction types based on the representations of the interacting objects. In other words, pairwise interaction types may be conditioned on the history of node trajectories. Thereby, a more expressive graph transition prior than the one using transition probabilities may be obtained, but with better scaling characteristics than a model in which interaction types are made conditionally dependent on the entire graph configuration of the previous time step.

Zhang [0023] teaches a social graph 107 may be a directed graph G=(N;E;A), where N is a plurality of graph nodes, E is one or more graph edges connecting two nodes, and A is a non-symmetric adjacency matrix. Based on a given image (which may be analyzed by the CV algorithms 106 to identify persons, determine movement, determine that one person is in view of another person, identify interactions, the types of interactions, etc.), each pedestrian is assigned to a node (n.sub.j∈N) in the social graph 107, and an edge e.sub.ij=(n.sub.i, n.sub.j)∈E linking from i-th to j-th person exists when the adjacency matrix entry a.sub.ij=1. Generally, at each time interval, the current position and speed direction of each person depicted in the corresponding image is used to determine whether another person is in the view of the person and generate the social graph 107 for the corresponding time interval. For example, a CV algorithm 106 and/or the trajectory module 104 may determine whether one or more rays emitted from a first person intersect with a second person in the image to determine whether the second person is in view of the first person at a given time interval…

Zhang [0047] teaches FIG. 3A is a schematic 300 illustrating an example representation of a social graph 107. As shown, FIG. 3A depicts four example persons 301-304. Each person 301-304 may be represented as a node in the social graph 107. The edges 306-314 reflect that one of the persons 301-304 (e.g., person 304) is in view of a different one of the persons 301-304 (e.g. person 303), and the future path of person 304 may be affected by person 303. More generally, the existence of an edge in the graph 107 is determined by pairwise positions. Therefore, if person A is ahead of (or in view of) person B, an edge in the graph 107 from person A to person B may be generated.

Zhang [0048] teaches edge 306 reflects that person 302 is in the view of person 301, while edge 307 reflects that person 301 is in the view of person 302. When the interaction is in one direction, only a single edge is generated in the social graph 107. For example, edge 311 reflects that person 303 is paying attention to person 301, while the absence of an edge from person 301 to person 303 indicates that person 301 is not interacting with person 303.
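
For illustration only, the directed social graph with a non-symmetric adjacency matrix described in Zhang [0023] and [0047]-[0048] might be sketched as follows. This is a minimal sketch under stated assumptions, not Zhang's implementation: the function name, the 60-degree half field of view, and the convention that entry a[i][j] = 1 means person j is in view of person i are all hypothetical, and a simple angular test stands in for the ray-intersection test mentioned in the reference.

```python
import math


def build_social_graph(positions, headings, half_fov_deg=60.0):
    """Return adjacency a[i][j] = 1 when person j is in view of person i."""
    n = len(positions)
    # Assumed field-of-view limit; not a value from the cited reference.
    cos_limit = math.cos(math.radians(half_fov_deg))
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        hx, hy = headings[i]
        h_norm = math.hypot(hx, hy)
        for j in range(n):
            if i == j or h_norm == 0.0:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            d_norm = math.hypot(dx, dy)
            if d_norm == 0.0:
                continue
            # Cosine of the angle between i's heading and the direction to j.
            cos_angle = (hx * dx + hy * dy) / (h_norm * d_norm)
            if cos_angle >= cos_limit:
                adjacency[i][j] = 1
    return adjacency


if __name__ == "__main__":
    # Person 0 walks toward person 1; person 1 walks away from person 0.
    print(build_social_graph([(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]))
```

In the usage example only one of the two possible edges is set, which mirrors the one-directional case in Zhang [0048] where a single edge is generated.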

Zhang [0055] teaches FIG. 4 illustrates an example image 400. The image 400 may be captured by an image capture device 103 and/or received by the trajectory module 104 from another source. The CV algorithms 106 may analyze the image 400 to identify persons 401, 402 therein. As stated, destination feature vector f.sub.j.sup.(D) may be computed for each person 401, 402. Similarly, a social graph 107 may be generated for the image 400. The social graph 107 may assign persons 401, 402 to respective nodes, and an edge may connect the nodes representing persons 401, 402. For example, the edge may associate persons 401, 402 based on one or more of: that persons 401, 402 are walking side-by-side, that persons 401, 402 are in view of each other, that persons 401, 402 are holding hands, and/or the ground truth trajectories 405, 406 of persons 401, 402. The ground truth trajectories 405, 406 may correspond to the actual paths of the persons 401, 402 at prior time intervals. The social network 113 may then extract the feature vector f.sub.j.sup.(S) for each person 401, 402, and the stochastic model 108 may sample a value for the latent variable z.sub.t from the learned prior.

Claim 4. Blaiotta further teaches wherein the GNN comprises: a fully-connected GNN configured to generate the hidden state information corresponding to a state of relations between pairs of the entities based on the feature vector. Blaiotta [0024] teaches the classification model may be applied to at least the hidden states of the first and the second object.

Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM. As discussed above, hidden states are typically determined using propagation data. By using a recurrent model, also history data may be efficiently and generically used in the hidden representations. For example, propagation data from the second to the first object may comprise the hidden states of the objects as determined by the recurrent model. However, it is also possible to, instead or in addition, use the current observed feature vectors as input to the propagation model.

Blaiotta [0026] teaches not only hidden states based on preceding observations are used to determine interaction types, but also further hidden states based on following observations. In particular, a further hidden state of a first interacting object may be determined based on at least an immediately following further hidden state of the first interacting object. 

Blaiotta [0026] teaches similarly to the regular hidden states, a propagation model may be used to integrate further hidden states or object feature vectors of other interacting objects into the further hidden state, and the further hidden states may be determined using a recurrent model such as a GRU or LSTM. To determine an interaction type between a first and second interaction object, the further hidden state may be used as input to the classification model. By using the further hidden states, additional knowledge from future observations may be used to more accurately infer the interaction types.

Blaiotta [0059] teaches interaction type 411 may be predicted by a prior model with which the decoder model together forms a generative model. This may be an appropriate choice for making predictions on future behaviour of interacting objects after having observed one or more time steps.

Claim 5. Blaiotta and Zhang further teach wherein the estimating of the dynamic interaction comprises: generating prior information corresponding to the entities based on the hidden state information; Blaiotta [0077] teaches a Graph Neural Network (GNN) 630 may be used to determine parameters of a probability distribution for predicting an object feature vector. The GNN may take as input sequence embeddings of previous object feature vectors together with the current latent graph configuration, e.g. interaction types, and may output node-level graph embeddings providing the parameters of the probability distribution for predicting the object feature vector. Graph Neural Network 630 may be parametrized with various numbers of layers and various types of aggregation functions, although it is preferred to use aggregation functions that are permutation invariant and/or differentiable.

generating posterior information predicted in association with the entities based on the prior information and the hidden state information; Zhang [0051-0055] 

Zhang [0097] teaches Example 7 includes the subject matter of example 1, the computer-readable storage medium storing instructions that when executed by the processor circuit cause the processor circuit to: learn the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0108] teaches Example 18 includes the subject matter of example 12, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: learn the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0119] teaches Example 29 includes the subject matter of example 23, further comprising: learning the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0120] teaches Example 30 includes the subject matter of example 23, the hierarchical LSTM comprising at least two LSTMs including a first LSTM and a second LSTM, the first LSTM to receive the value of the latent variable and the second feature vector as input, the second LSTM to receive an output of the first LSTM and the first feature vector as input, the second LSTM to generate the output vector.

and generating a latent variable corresponding to the dynamic interaction between the entities based on the prior information and the posterior information. Zhang [0097], [0108], [0119] and [0120] 

Claim 6. Blaiotta and Zhang further teach wherein the prior information is determined based on a history of relations between the entities up to a time point before the first time point, and on feature vectors of the entities input up to the first time point. Blaiotta [0020] teaches still, by using these representations to determine pairwise interaction types, in other words by conditioning the prior on the history of object trajectories, correlations among interacting objects may be taken into account.

Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM. As discussed above, hidden states are typically determined using propagation data. By using a recurrent model, also history data may be efficiently and generically used in the hidden representations.

Blaiotta [0020] teaches pairwise interaction types between first and second interacting objects are predicted by determining respective representations of interacting objects based on previous object feature vectors of the respective interacting objects, and determining pairwise interaction types based on such representations. Determining the representations and/or using them to predict interaction types may be performed with trainable models that are part of the decoder model. Interestingly, this way, correlations among the different interacting objects can be taken into account for the predictions in a relatively efficient way. As the inventors realized, in order to take into account correlations among interacting objects, it may be possible to model attributes of different pairs of object at a given time-step as being conditionally dependent on the previous interaction types.

Zhang [0043] teaches FIG. 2 is a schematic 200 illustrating an example of stochastic trajectory prediction, according to one embodiment. As shown, the schematic 200 depicts example trajectory histories 201-203 for persons 204-206 depicted in one or more images. On the left side of FIG. 2, the trajectory histories 201-203 may include locations of each person 204-206 at an example time interval t−1. Generally, the trajectory module 104 may then predict the location of each person 204-206 at time interval t, where time interval t is later in time than time interval t−1.

Zhang [0044] teaches as shown, the feature vectors f.sub.1,t-1.sup.(D), f.sub.2,t-1.sup.(D), f.sub.3,t-1.sup.(D) may be computed for each person 204-206, respectively, e.g., based on Equation 1 above. As stated, these feature vectors may correspond to destination-based features, such as where each person 204-206 is traveling towards, the velocity of travel, prior history of movement, etc. Based on the extracted features f.sub.1,t-1.sup.(D), f.sub.2,t-1.sup.(D), f.sub.3,t-1.sup.(D) and/or the analysis of each image, the social graph 107 may be generated. As stated, the social graph 107 represents each person identified in an image as a node. If two people are determined to interact in any way, an edge may connect the nodes representing the two people in the social graph 107.

Claim 7. Blaiotta further teaches wherein the generating of the prior information comprises: generating the prior information by transferring the hidden state information as forward state information to a forward long short-term memory (LSTM). Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM.

Blaiotta [0085] teaches for example, the hidden state 541 at t = τ of a first interacting object may be determined in a forward recursion operator Fwr, 520, using hidden state 540 of the interacting object at t = τ - 1. Effectively, forward recursion Fwr may filter information coming from the past. 

Blaiotta [0091] teaches to determine an interaction type between a first and second interacting object, a forward pass 730 may be made in which hidden states of respective interacting objects are determined, and optionally, also a backward pass 780 may be made in which further hidden states of respective interacting objects are determined.

Blaiotta [0092] teaches as an input to the forward pass 730, a graph neural network (GNN) 710 may be used to determine aggregated propagation data, e.g., Mjf,t−1, for an interacting object at a particular point in time. The aggregated propagation data may aggregate propagation data from second interacting objects to the first interacting object determined by applying a propagation model, e.g., fefk, selected according to the interaction type, e.g., zijkt−1, between the interacting objects. The aggregated propagation data may be concatenated, 720, with current observation data, and input into a recurrent model, in this case a GRU 731. Applying the GRUs at respective time points may result in a hidde state, e.g., fit at the time point for which the interaction type is to be determined. Accordingly, hidden states, e.g., fit, fjt,may be determined for the two interacting objects between which the interaction type is to be determined.
In the backward pass 780, further hidden states, e.g., bit, bjt, of the interacting objects may be based on immediately following further hidden states of the interacting objects by again applying a recurrent model, in this case a GRU 781.
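
For illustration only, and not as part of the cited disclosure, the forward and backward recursions described in Blaiotta [0085] and [0092] can be sketched as follows. The `cell` function is an assumption: a toy tanh update standing in for a GRU or LSTM cell, with hypothetical weights and hypothetical scalar observations.

```python
import math


def cell(h_prev, x, w_h=0.5, w_x=1.0):
    """Toy recurrent update; an assumed stand-in for a GRU/LSTM cell."""
    return math.tanh(w_h * h_prev + w_x * x)


def forward_pass(xs, h0=0.0):
    """Hidden state at time t is computed from the hidden state at t - 1."""
    hs, h = [], h0
    for x in xs:
        h = cell(h, x)
        hs.append(h)
    return hs


def backward_pass(xs, b_end=0.0):
    """Further hidden state at time t is computed from the state at t + 1."""
    bs, b = [0.0] * len(xs), b_end
    for t in range(len(xs) - 1, -1, -1):
        b = cell(b, xs[t])
        bs[t] = b
    return bs


# Hypothetical observation sequence for one interacting object.
observations = [0.1, 0.4, -0.2, 0.3]
f = forward_pass(observations)   # carries information from the past
b = backward_pass(observations)  # carries information from the future
```

In this sketch the forward states summarize observations up to each time point, while the backward states summarize the observations that follow it, which mirrors the roles of the forward pass 730 and the backward pass 780 in the quoted passages.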

Claim 8. Blaiotta further teaches wherein the generating of the posterior information comprises: generating the posterior information by transferring the prior information and the hidden state information as backward state information to a backward LSTM. Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM.

Blaiotta [0085] teaches apart from hidden state 541, also a further hidden state 561 may be used to determine interaction type 580. In a backward recursion operation Bwr, 550, further hidden state 561, e.g., at time t = τ, of an interacting object may be determined based on at least an immediately following further hidden state 560, e.g., at time t = τ + 1.

Blaiotta [0091] teaches to determine an interaction type between a first and second interacting object, a forward pass 730 may be made in which hidden states of respective interacting objects are determined, and optionally, also a backward pass 780 may be made in which further hidden states of respective interacting objects are determined.

Blaiotta [0092] teaches as an input to the forward pass 730, a graph neural network (GNN) 710 may be used to determine aggregated propagation data, e.g., Mjf,t−1, for an interacting object at a particular point in time. The aggregated propagation data may aggregate propagation data from second interacting objects to the first interacting object determined by applying a propagation model, e.g., fefk, selected according to the interaction type, e.g., zijkt−1, between the interacting objects. The aggregated propagation data may be concatenated, 720, with current observation data, and input into a recurrent model, in this case a GRU 731. Applying the GRUs at respective time points may result in a hidden state, e.g., fit, at the time point for which the interaction type is to be determined. Accordingly, hidden states, e.g., fit, fjt, may be determined for the two interacting objects between which the interaction type is to be determined.
In the backward pass 780, further hidden states, e.g., bit, bjt, of the interacting objects may be based on immediately following further hidden states of the interacting objects by again applying a recurrent model, in this case a GRU 781.

Claim 9. Blaiotta and Zhang further teach wherein the generating of the latent variable comprises: sampling a result in which the prior information and the posterior information are combined; Zhang [0097] teaches Example 7 includes the subject matter of example 1, the computer-readable storage medium storing instructions that when executed by the processor circuit cause the processor circuit to: learn the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0108] teaches Example 18 includes the subject matter of example 12, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: learn the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0119] teaches Example 29 includes the subject matter of example 23, further comprising: learning the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

Zhang [0130] teaches Example 40 includes the subject matter of example 34, further comprising: means for learning the prior distribution based on a plurality of recursive hidden states of a posterior LSTM and a prior vector of the first person at a third time interval, the third time interval prior to the first time interval, the prior vector to comprise a direction of movement and a speed of the direction of movement of the first person at the second time interval, the prior distribution learned based at least in part on the following equation: p.sub.ψ(z.sub.t|f.sub.<t.sup.S)=LSTM.sub.ψ(f.sub.t-1.sup.S), the value for the latent variable sampled based at least in part on the following equation: q.sub.ϕ(z.sub.t|f.sub.≤t.sup.S)=LSTM.sub.ϕ(f.sub.t.sup.S).

and generating the latent variable corresponding to the dynamic interaction between the entities at the first time point based on a result of the sampling. Blaiotta [0007] teaches for example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. Interestingly, the interaction types between a pair of objects may vary over time, e.g., in a sequence of interaction types for a pair of objects, at least two different interaction types may occur. For example, two pedestrians may first be moving independently from each other but may start affecting each other's trajectories as they get closer to each other. Thereby, a more accurate model of the connectivity dynamics may be obtained than if interaction types are assumed to remain static over time.

Blaiotta [0020] teaches in particular, a pairwise interaction type between a first and second interacting object may be predicted based on a previous pairwise interaction type between the first and the second interacting object by sampling the pairwise interaction type according to the set of transition probabilities.
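
As a minimal sketch of the sampling step described in Blaiotta [0020], and not a reproduction of the reference's model, the following draws each pairwise interaction type from transition probabilities conditioned on the previous type. The two interaction types, the `TRANSITIONS` table, and the seed are hypothetical values chosen for illustration.

```python
import random

# Hypothetical transition probabilities (assumption, not from the reference):
# row = previous interaction type, column = next interaction type.
TRANSITIONS = [
    [0.9, 0.1],  # from type 0, e.g., "moving independently"
    [0.3, 0.7],  # from type 1, e.g., "affecting each other's trajectories"
]


def sample_next_type(prev_type, rng):
    """Sample the next interaction type given the previous one."""
    weights = TRANSITIONS[prev_type]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def sample_sequence(z0, steps, seed=0):
    """Sample a sequence of interaction types for one pair of objects."""
    rng = random.Random(seed)
    seq = [z0]
    for _ in range(steps):
        seq.append(sample_next_type(seq[-1], rng))
    return seq


sequence = sample_sequence(z0=0, steps=10)
```

Because each draw depends on the preceding type, the sampled sequence can switch between interaction types over time rather than remaining static.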

Blaiotta [0024] teaches similarly, also an interaction type may be determined using the classification model by applying the classification model to determine parameters of a probability distribution for the interaction type, and sampling the interaction type from the parameters of the probability distribution. In other words, the problem of estimating interaction types may be phrased from a Bayesian perspective as that of evaluating a posterior probability distribution over interaction types of the interacting objects. The classification model may be applied to at least the hidden states of the first and the second object.

Blaiotta [0096] teaches to train the models, a training instance X = {x.sub.1, ..., x.sub.N} may be obtained, e.g., randomly selected from training data 050, and processed by the encoder model to obtain a finite set of samples drawn from a variational posterior distribution over the interaction types. Such a sample Z may provide interaction types z.sub.ij for sets of interacting objects (i,j). The samples returned by the encoder model may then be processed by the decoder model to obtain parameters of the conditional probability distribution p(X|Z) for respective samples Z. The parameter values computed by the decoder model may be used to evaluate a reconstruction error loss, e.g., an expected negative log-likelihood of the training data.

Blaiotta [0097] teaches to obtain a full variational lower bound loss, it is possible to include a regularizing term to the reconstruction error. The regularizing term may evaluate a divergence, e.g., a Kullback-Leibler divergence, between the determined sequences of pairwise interaction types, e.g., their variational posterior distribution, and predicted interaction types obtained by applying the decoder model, e.g., the prior model.
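
The loss described in Blaiotta [0096] and [0097], a reconstruction error plus a Kullback-Leibler regularizing term between the variational posterior and the prior over interaction types, can be illustrated with the following sketch. The categorical distributions and the reconstruction value are hypothetical numbers; the reference's encoder and decoder models are not reproduced here.

```python
import math


def kl_categorical(q, p):
    """KL(q || p) for two categorical distributions over interaction types."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def variational_loss(recon_nll, q, p):
    """Negative variational lower bound: reconstruction error plus KL term."""
    return recon_nll + kl_categorical(q, p)


# Hypothetical values for a single pair of objects at a single time point.
posterior = [0.8, 0.2]  # encoder output (assumed)
prior = [0.5, 0.5]      # prior model output (assumed)
loss = variational_loss(recon_nll=1.25, q=posterior, p=prior)
```

The KL term is zero when the posterior equals the prior and grows as the two distributions diverge, so minimizing the loss penalizes posteriors that stray from the prior without a matching gain in reconstruction accuracy.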

Claim 10. Blaiotta and Zhang further teach wherein the generating of the latent variable comprises: optimizing the latent variable based on the prior information. Blaiotta [0015] teaches models are provided that allow evolving latent connectivity dynamics between a set of interacting objects to be effectively inferred from a set of sequential observations and/or that allow behaviour of such interacting objects corresponding to such connectivity dynamics to be predicted. Interestingly, this may be done without directly measuring pairwise relations between the objects. The decoder model, e.g., a conditional likelihood model, may define how observed object feature vectors are generated given the latent representation, e.g., time-dependent pairwise interaction types. Optionally, the decoder model may be combined with a prior model for predicting pairwise interaction types to form together a generative model explaining dynamics of systems with hidden interaction configurations. The decoder model and/or prior may be trained similarly to a variational autoencoder together with the encoder model, e.g., by optimizing a loss corresponding to a training dataset, e.g., in terms of a variational lower bound on the marginal likelihood of the training data. Thereby, a more expressive description of the latent connectivity is obtained and as a result, more accurate predictions of future dynamics can be made.

Zhang [Abstract] teaches systems, methods, apparatuses, and computer program products to provide stochastic trajectory prediction using social graph networks. An operation may comprise determining a first feature vector describing destination features of a first person depicted in an image, generating a directed graph for the image based on all people depicted in the image, determining, for the first person, a second feature vector based on the directed graph and the destination features, sampling a value of a latent variable from a learned prior distribution, the latent variable to correspond to a first time interval, and generating, based on the sampled value and the feature vectors by a hierarchical long short-term memory (LSTM) executing on a processor, an output vector comprising a direction of movement and a speed of the direction of movement of the first person at a second time interval, subsequent to the first time interval. See also Zhang [0010], [0020], [0035-0040], [0046], [0052], [0054-0055].

Claim 11. Blaiotta and Zhang further teach wherein the entities of the target comprise at least one of body parts of a user, joints of a user, pedestrians, vehicles, or players of a sports team. Blaiotta [0002] teaches various physical systems can be naturally represented as a set of objects with relational dependencies among them. For example, a traffic situation around an autonomous vehicle may be seen as a system in which various other actors, such as other vehicles, bikes, and pedestrians, interact with each other and thereby influence each other's trajectories.

Blaiotta [0007] teaches for example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. Interestingly, the interaction types between a pair of objects may vary over time, e.g., in a sequence of interaction types for a pair of objects, at least two different interaction types may occur. For example, two pedestrians may first be moving independently from each other but may start affecting each other's trajectories as they get closer to each other. Thereby, a more accurate model of the connectivity dynamics may be obtained than if interaction types are assumed to remain static over time.

Zhang [0020] teaches while people are used as a reference example of objects herein, the disclosure is applicable to predicting the movement of other types of objects, such as autonomous vehicles, robots, animals, and the like.

Claim 12. Blaiotta and Zhang further teach wherein the predicting of the motion of the entities comprises: predicting the motion changing at the second time point by decoding the estimated dynamic interaction. Blaiotta [0007] teaches in order to model the latent connectivity dynamics of such interacting objects, interestingly, the inventors devised to capture pairwise interactions between the objects at a particular point in time using interaction types from a set of multiple interaction types. For example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. Interestingly, the interaction types between a pair of objects may vary over time, e.g., in a sequence of interaction types for a pair of objects, at least two different interaction types may occur. For example, two pedestrians may first be moving independently from each other but may start affecting each other's trajectories as they get closer to each other. Thereby, a more accurate model of the connectivity dynamics may be obtained than if interaction types are assumed to remain static over time.

Zhang [0040] teaches the social network 113 may then extract the feature vectors f.sup.(D), f.sup.(S), the stochastic model 108 may sample the latent variable, and the decoder model 109 may predict the next location of each person depicted in the image. The weights, biases, activations, and any other learnable parameters (e.g., of the models 108-109, LSTMs 110-111, social graph network 113, etc.) may then be refined during training based on how close the predicted location for each person is to the ground-truth location for each person (e.g., on the accuracy of the predicted location generated by the decoder model 109).

Claim 13. Zhang further teaches wherein the outputting of the result to which the predicted motion is applied comprises: processing the image of the first time point to be an image of the second time point by applying the predicted motion to the entities comprised in the image of the first time point; Zhang [Abstract] teaches generating, based on the sampled value and the feature vectors by a hierarchical long short-term memory (LSTM) executing on a processor, an output vector comprising a direction of movement and a speed of the direction of movement of the first person at a second time interval, subsequent to the first time interval.

Zhang [0010] teaches the social graph may comprise a directed graph that is updated at each of a plurality of time intervals given the location of persons depicted in the images and the velocity of any movement of the persons. Similarly, the temporal stochastic method to model uncertainty on social interactions between two or more persons depicted in the images may be updated at each time interval. Generally, at each time interval, the temporal stochastic method may sample a latent variable from a learned prior (that may vary across time) and use the sampled latent variable to generate diverse predictions. To generate all destination-oriented and/or social-plausible paths, the temporal stochastic method may leverage a hierarchical long short-term memory (LSTM) to progressively predict where the persons may move to next.

and outputting the image of the second time point. Zhang [0020] teaches the trajectory module 104 is generally configured to generate output reflecting predicted movement of objects depicted in one or more images, e.g., images captured by the image capture device 103 and/or images received from another source.

Claim 14. Blaiotta and Zhang further teach wherein the outputting of the result to which the predicted motion is applied comprises: processing the image of the first time point to be an image of the second time point by applying the predicted motion to the entities comprised in the image of the first time point; perceiving whether a dangerous situation occurs based on the image of the second time point;
Blaiotta [0002] teaches various physical systems can be naturally represented as a set of objects with relational dependencies among them. For example, a traffic situation around an autonomous vehicle may be seen as a system in which various other actors, such as other vehicles, bikes, and pedestrians, interact with each other and thereby influence each other's trajectories. As another example, a manufacturing process may be considered as a system in which various devices, e.g., autonomous robots, interact with each other according to interaction dynamics. In such cases, although no explicit rules for interactions may be known, still the interactions may show a degree of predictability. In various settings, e.g., in a monitoring system for the manufacturing process or a control system of the autonomous vehicle, it is desirable to be able to use this predictability to automatically reason about the state of such a system, e.g., in order to make predictions about its future state. For example, the autonomous vehicle may brake if a dangerous traffic situation is predicted. In order to reason accurately about such systems, it is desirable to model the systems in terms of the individual physical objects and the way that they interact with each other.

Blaiotta [0016] teaches as an example, an autonomous vehicle may use the inferred interaction configurations of traffic participants, e.g., as a latent scene representation. For example, the inferred interaction types may be used as an input to a behavioural planning module of the autonomous vehicle in order to select manoeuvres that minimize the risk of collision with other traffic participants and/or ensure socially-compliant interaction with pedestrians.

Blaiotta [0017] teaches the trained models are used to generate synthetic observation data for use as training and/or test data in training a further machine learning model, for example, a neural network. For example, the sequences of predicted object feature vectors may represent trajectories of fictitious interacting objects at discrete points in time. Simulated data may be used for data augmentation, e.g., in order to train the second machine learning on larger datasets and/or datasets of situations for which it is hard to obtain training data, e.g., dangerous traffic situations, rare combinations of weather and/or traffic conditions, etcetera, without the need to perform further real physical measurements

Zhang [0011] teaches embodiments disclosed herein provide techniques to more accurately predict the movement of persons depicted in images. Doing so may improve the safety and reliability of different computing systems that predict where a person is moving. For example, using the techniques of the disclosure, a computing system may more accurately determine the future locations of one or more pedestrians depicted in an image. An autonomous vehicle may use the location data to determine that a future collision is likely to occur between the autonomous vehicle and one or more of the pedestrians. The autonomous vehicle may then perform an operation to avoid a collision with the pedestrian, e.g., by generating an alert that is outputted to the pedestrian (e.g., honking the horn of the autonomous vehicle) and/or changing the movement of the autonomous vehicle (e.g., slowing down, changing direction, and/or stopping). Embodiments are not limited in this context.

Zhang [0018] teaches for example, the trajectory module 104 may determine that a collision is likely to occur (e.g., beyond a threshold level of likelihood) with a pedestrian depicted in an image. In such an example, the navigation logic 112 may modify the movement of the autonomous vehicle (e.g., change direction of movement, change the speed of movement, stop movement, etc.).

and outputting an alarm corresponding to the dangerous situation. Zhang [0018] teaches Furthermore, the navigation logic 112 may receive signals from the trajectory module 104 based on processing of images captured by the image capture device 103.  For example, the trajectory module 104 may determine that a collision is likely to occur (e.g., beyond a threshold level of likelihood) with a pedestrian depicted in an image. In such an example, the navigation logic 112 may modify the movement of the autonomous vehicle (e.g., change direction of movement, change the speed of movement, stop movement, etc.). Similarly, the trajectory module 104 and/or the navigation logic 112 may output a warning signal (e.g., honking a horn of the autonomous vehicle, emitting light signals from the autonomous vehicle, etc.). More generally, regardless of the implementation, the trajectory module 104 may output warning signals that include audio signals, visual signals, and/or data signals. 
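
The threshold test described in Zhang [0018], in which a warning is output when a collision is likely beyond a threshold level of likelihood, can be sketched as follows. This is an assumed illustration rather than Zhang's implementation: the function names, the safety radius, the threshold, and the sampled pedestrian positions are all hypothetical.

```python
import math


def collision_likelihood(vehicle_xy, pedestrian_samples, radius):
    """Fraction of sampled predicted positions within `radius` of the vehicle."""
    hits = sum(1 for p in pedestrian_samples if math.dist(vehicle_xy, p) <= radius)
    return hits / len(pedestrian_samples)


def should_warn(likelihood, threshold=0.5):
    """Raise a warning when the likelihood exceeds the (assumed) threshold."""
    return likelihood > threshold


# Hypothetical predicted pedestrian positions at the second time point,
# e.g., obtained from several samples of the latent variable.
samples = [(1.0, 0.5), (0.5, 0.0), (4.0, 3.0), (0.2, 0.1)]
likelihood = collision_likelihood((0.0, 0.0), samples, radius=1.5)
alarm = should_warn(likelihood)
```

Here three of the four sampled positions fall within the assumed radius, so the estimated likelihood exceeds the assumed threshold and the sketch would raise the alarm.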

Claim 15. Blaiotta further teaches further comprising: determining the entities of the target of which the motion is to be predicted. Blaiotta [0002] teaches various physical systems can be naturally represented as a set of objects with relational dependencies among them. For example, a traffic situation around an autonomous vehicle may be seen as a system in which various other actors, such as other vehicles, bikes, and pedestrians, interact with each other and thereby influence each other's trajectories. 

Claim 16. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, causes the processor to perform the image processing method of claim 1. The review and analysis of claim 1 apply to claim 16. See the above analysis.

Claim 17. Claim 17 differs from claim 1 in that it recites an image processing apparatus that performs the method of claim 1. Therefore, claim 17 has been reviewed and analyzed in the same way as claim 1. See the above analysis.

Claim 18. Blaiotta and Zhang further teach wherein the processor further comprises: a prior configured to generate prior information that is determined based on a history of relations between the entities up to a time point before the first time point and on feature vectors corresponding to the entities input up to the first time point; Blaiotta [0020] teaches still, by using these representations to determine pairwise interaction types, in other words by conditioning the prior on the history of object trajectories, correlations among interacting objects may be taken into account.

Blaiotta [0025] teaches when determining pairwise interaction types based on observation data, a hidden state of an interacting object may be determined by recursively applying a recurrent model such as a GRU or LSTM. As discussed above, hidden states are typically determined using propagation data. By using a recurrent model, also history data may be efficiently and generically used in the hidden representations.

Blaiotta [0020] teaches pairwise interaction types between first and second interacting objects are predicted by determining respective representations of interacting objects based on previous object feature vectors of the respective interacting objects, and determining pairwise interaction types based on such representations. Determining the representations and/or using them to predict interaction types may be performed with trainable models that are part of the decoder model. Interestingly, this way, correlations among the different interacting objects can be taken into account for the predictions in a relatively efficient way. As the inventors realized, in order to take into account correlations among interacting objects, it may be possible to model attributes of different pairs of object at a given time-step as being conditionally dependent on the previous interaction types.

Zhang [0020] teaches the types of human (or social) interactions may include, but are not limited to, a distance between two or more people depicted in image and/or whether one person is in view of another person in the image (e.g., based on whether vectors associated with two people intersect). See also Zhang [0034] and [0042-0050].

an encoder configured to generate a latent variable corresponding to the dynamic interaction between the entities based on the feature vector and the prior information; Blaiotta [0009] teaches apart from predicting object feature vectors based on future interaction types, it is also possible to determine past interaction types corresponding to sequences of observed object feature vectors. A model, referred to herein as encoder model, may be trained to most accurately determine these interaction types, e.g., to accurately select particular interaction type for a pair of objects, or to accurately determine probabilities of respective interaction types for a pair of objects. For example, such a latent representation may indicate, for each pair of objects and each measured time point, an estimate of the probability that these two objects, at that point in time, were interacting according to respective interaction type. Interaction types determined in this way may provide a particularly meaningful representation of the past observations, for example, for use in another machine learning model…

and a decoder configured to predict the motion of the entities changing at the second time point based on the latent variable. Blaiotta [0007] teaches in order to model the latent connectivity dynamics of such interacting objects, interestingly, the inventors devised to capture pairwise interactions between the objects at a particular point in time using interaction types from a set of multiple interaction types. For example, given observations of the objects at multiple points in time, the latent connectivity dynamics corresponding to these observations may be expressed in terms of interaction types between each pair of objects at each point in time. Interestingly, the interaction types between a pair of objects may vary over time, e.g., in a sequence of interaction types for a pair of objects, at least two different interaction types may occur. For example, two pedestrians may first be moving independently from each other but may start affecting each other's trajectories as they get closer to each other. Thereby, a more accurate model of the connectivity dynamics may be obtained than if interaction types are assumed to remain static over time. 

Claim 19. Blaiotta and Zhang further teach wherein the encoder comprises: a fully-connected graph neural network (GNN) configured to generate hidden state information corresponding to a state of relations between pairs of the entities based on the feature vector; a forward long short-term memory (LSTM) configured to generate the prior information corresponding to the entities of the target in the image of the first time point based on the hidden state information; a backward LSTM configured to generate posterior information predicted based on the dynamic interaction between the entities based on the prior information and the hidden state information; The rationale and analysis of claims 4, 7 and 8 apply. See the above analysis.

and a multi-layer perceptron (MLP) configured to generate the latent variable corresponding to the dynamic interaction between the entities at the first time point based on the prior information transferred through the forward LSTM and the posterior information transferred through the backward LSTM. Blaiotta [0078] teaches for example, GNN 630 may perform edge-wise concentration 632 in the sense that, to determine propagation data from a second interacting object to a first interacting object, representations of the interacting objects as computed by the GRU are used. The respective representations may be input to message passing functions 634, in this case, implemented as multi-layer perceptrons (MLPs).

Zhang [0025] teaches for the individual features f.sup.(D), the social graph network 113 may comprise a one-layer multi-layer perceptron (MLP) (e.g., a neural network) with a rectified linear unit (ReLU) to concatenate the (x,y) coordinates for the person p.sub.j,t and the velocity v.sub.j,t=p.sub.j,t−p.sub.j,t-1 of the person as input. Stated differently, the individual features f.sup.(D) of person j at time interval t may be determined according to the following Equation 1:

Zhang [0028] teaches in Equation 2, f.sub.j.sup.(0)=f.sub.j.sup.(D) at initialization, M.sub.ij corresponds to a message passed from person i to person j in the social graph 107, (W.sup.i, b.sup.i) denote weight and bias parameters for the input MLP with ReLU, and (W.sup.g, b.sup.g) denotes the weight and bias parameters for the global MLP with ReLU. The message may generally represent the first edge, e.g., that person i interacted with (and/or is in view of) person j in some way. The input x.sub.ij.sup.L to the social graph network 113 to compute vector f.sup.(S) may be denoted by Equation 3:

Claim 20. Blaiotta and Zhang further teach comprising at least one of a head-up display (HUD), a three-dimensional (3D) digital information display (DID) (3D DID), a 3D mobile device, or a smart vehicle. Blaiotta [0002] teaches as another example, a manufacturing process may be considered as a system in which various devices, e.g., autonomous robots, interact with each other according to interaction dynamics.

Blaiotta [0044] teaches as a concrete example, system 200 may be an automotive control system for controlling a vehicle. Sensor 072 may be a camera providing images based on which observation data 030 is determined. The vehicle may be an autonomous or semi-autonomous vehicle, but system 200 can also be a driver-assistance system of a non-autonomous vehicle.

Zhang [0011] teaches an autonomous vehicle may use the location data to determine that a future collision is likely to occur between the autonomous vehicle and one or more of the pedestrians. The autonomous vehicle may then perform an operation to avoid a collision with the pedestrian, e.g., by generating an alert that is outputted to the pedestrian (e.g., honking the horn of the autonomous vehicle) and/or changing the movement of the autonomous vehicle (e.g., slowing down, changing direction, and/or stopping). Embodiments are not limited in this context.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571) 272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661