NOTE:
Applicant's arguments filed 20 September 2021 have been fully considered but they are not persuasive.

Applicant argues that the cited art does not teach estimating optimal rewards based on transition probabilities predicted using state trajectories of the expert demonstrations.
However, while Li does teach using image states as inputs to a CNN, it also teaches that the agent imitates the behavior of an expert policy πE by matching the generated state-action distribution with the expert distribution, minimizing the divergence between them, while the discriminator tries to distinguish state-action pairs from the trajectories (pg. 3, section 2.3). Li further teaches that, for the policy network, input visual features are passed through two convolutional layers and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward (pg. 7, section 5.2, etc.).

Regarding the proposed amendments to claim 1, the examiner agrees that Li does not appear to teach the claimed calculation of the reward function, but notes that the attached references are related (see attached forms).


/GEORGE GIROUX/
Primary Examiner, Art Unit 2128