Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This Office Action is in response to Application No. 16/680,395, filed on 11/11/2019.
Claims 1 – 20 have been examined and are pending in this application.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/28/2020 and 01/14/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 4, 10 – 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shalev-Shwartz et al. (US 2018/0314266 A1) in view of Bai et al. (US 2019/0084151 A1).

Regarding claim 1, Shalev-Shwartz discloses: “a computer-implemented method for multi-agent reinforcement learning with periodic parameter sharing [see para: 0215; the system may implement a multi-agent approach. For example, the system may take into account data from various sources and/or images capturing from multiple angles], comprising: 
inputting at least one occupancy grid to a convolutional neural network (CNN) and at least one vehicle dynamic parameter into a first fully connected layer, wherein the at least one occupancy grid and the at least one vehicle dynamic parameter are associated with at least one of: an ego agent and a target agent [see para: 0205; In FIG. 11B, the situation is slightly different. Here, host vehicle 1105 senses one or more target vehicles 1107 entering the main roadway 1112 from merge lane 1111. In this situation, once driving policy module 803 encounters merge node 913, it may choose to initiate an overtake left maneuver in order to avoid the merging situation]; 
providing Q value estimates for agent actions based on processing of the concatenated outputs and choosing at least one autonomous action to be executed by at least one of: the ego agent and the target agent [see para: 0226; A double merge navigational situation, as depicted in FIG. 11D, provides an example further illustrating these concepts. In a double merge, vehicles approach the merge area 1130 from both left and right sides. And, from each side, a vehicle, such as vehicle 1133 or vehicle 1135, can decide whether to merge into lanes on the other side of merge area 1130. Successfully executing a double merge in busy traffic may require significant negotiation skills and experience and may be difficult to execute in a heuristic or brute force approach by enumerating all possible trajectories that could be taken by all agents in the scene. In this double merge example, a set of Desires, D, appropriate for the double merge maneuver may be defined. D may be the Cartesian product of the following sets: D = [0, v_max] × L × {g, t, o}^n, where [0, v_max] is the desired target speed of the host vehicle, L = {1, 1.5, 2, 2.5, 3, 3.5, 4} is the desired lateral position in lane units where whole numbers designate a lane center and fractional numbers designate lane boundaries, and {g, t, o} are classification labels assigned to each of the n other vehicles. The other vehicles may be assigned “g” if the host vehicle is to give way to the other vehicle, “t” if the host vehicle is to take way relative to the other vehicle, or “o” if the host vehicle is to maintain an offset distance relative to the other vehicle]; and 
processing a multi-agent policy that accounts for operation of the ego agent and the target agent with respect to one another within a multi-agent environment based on the at least one autonomous action to be executed by at least one of: the ego agent and the target agent [see para: 0215; Further, in some embodiments, the system may implement a multi-agent approach. For example, the system may take into account data from various sources and/or images capturing from multiple angles. Further, some disclosed embodiments may provide economy of energy, as anticipation of an event which does not directly involve the host vehicle, but which may have an effect on the host vehicle can be considered, or even anticipation of an event that may lead to unpredictable circumstances involving other vehicles may be a consideration (e.g., radar may “see through” the leading vehicle and anticipation of an unavoidable, or even a high likelihood of an event that will affect the host vehicle)].
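For illustration only, the set of Desires quoted from para. 0226 of Shalev-Shwartz can be enumerated as a Cartesian product of target speeds, lateral positions, and one label per other vehicle. The sketch below is not taken from the reference: the speed interval is continuous in the quoted passage, so the discretization SPEEDS, the value V_MAX, and the function name desires are assumptions made here to keep the example finite.

```python
from itertools import product

# Assumed discretization of the continuous speed interval [0, v_max];
# the quoted passage does not specify v_max or any discretization.
V_MAX = 30.0
SPEEDS = [0.0, 10.0, 20.0, V_MAX]

# Lateral positions in lane units, as listed in the quoted passage:
# whole numbers are lane centers, fractional numbers are lane boundaries.
LATERAL = [1, 1.5, 2, 2.5, 3, 3.5, 4]

# Per-vehicle labels: "g" (give way), "t" (take way), "o" (keep an offset).
LABELS = ("g", "t", "o")


def desires(n_other_vehicles):
    """Enumerate (speed, lateral position, labels) for n other vehicles."""
    label_tuples = product(LABELS, repeat=n_other_vehicles)
    return list(product(SPEEDS, LATERAL, label_tuples))


# With two other vehicles there are 4 * 7 * 3**2 combinations.
print(len(desires(2)))
```

The size of the set grows as 3^n in the number of other vehicles, which is consistent with the passage's remark that brute-force enumeration is difficult in busy traffic.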
Shalev-Shwartz does not explicitly disclose: “concatenating outputs of the CNN and the first fully connected layer, wherein the concatenated outputs of the first fully connected layer and the CNN are inputted into a long short-term memory unit (LSTM)”.
However, Bai, from the same or similar field of endeavor teaches: “concatenating outputs of the CNN and the first fully connected layer, wherein the concatenated outputs of the first fully connected layer and the CNN are inputted into a long short-term memory unit (LSTM) [see para: 0066; At the top of FIG. 2A is an instance grasping model 135. The instance grasping model 135 includes a first branch that is a CNN portion 136 that includes a plurality of convolutional layers. The instance grasping model 135 also includes a second branch that is a mask CNN portion 137 that also includes a plurality of convolutional layers. The output of CNN portion 136 and the output of mask CNN portion 137 are both connected to the input of combined layers 138. For example, in use, output generated based on processing of data over CNN portion 136 can be concatenated with output generated based on processing of separate data over mask CNN portion 137—and the concatenated outputs can be applied as input to combined layers 138. The combined layers 138 can include, for example, one or more fully connected layers]; 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the reinforcement-learning-based autonomous vehicle system disclosed by Shalev-Shwartz to add the teachings of Bai as above, so that output generated by processing data over a CNN portion can be concatenated with output generated by processing separate data over a mask CNN portion, with the concatenated outputs applied as input to combined layers that can include, for example, one or more fully connected layers. The instance grasping model can be a machine learning model, such as a deep neural network model that includes one or more convolutional neural network (“CNN”) portions [Bai see para: 0066]. 
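For illustration only, the data flow recited in claim 1 (occupancy grid into a CNN, vehicle dynamic parameters into a fully connected layer, concatenation, an LSTM, then Q value estimates and action selection) can be sketched as below. This is a minimal sketch and not the implementation of the application or of either reference: a single convolution kernel stands in for the CNN, the weights are random and untrained, and all names, dimensions, and input values (make_params, q_values, the 4x4 grid, three actions) are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv_features(grid, kernel):
    """Single-kernel valid convolution with ReLU, flattened (CNN stand-in)."""
    k = len(kernel)
    out = []
    for i in range(len(grid) - k + 1):
        for j in range(len(grid[0]) - k + 1):
            acc = sum(kernel[a][b] * grid[i + a][j + b]
                      for a in range(k) for b in range(k))
            out.append(max(0.0, acc))
    return out


def lstm_step(x, h, c, weights, bias):
    """One LSTM step; weights map [x; h] to stacked gates (i, f, g, o)."""
    n = len(h)
    z = linear(weights, bias, x + h)
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def make_params(grid_size=4, k=2, n_dyn=3, fc_dim=4, hidden=5,
                n_actions=3, seed=0):
    """Random, untrained weights with hypothetical dimensions."""
    rng = random.Random(seed)
    joint = (grid_size - k + 1) ** 2 + fc_dim
    return {
        "kernel": rand_matrix(k, k, rng),
        "fc_w": rand_matrix(fc_dim, n_dyn, rng), "fc_b": [0.0] * fc_dim,
        "lstm_w": rand_matrix(4 * hidden, joint + hidden, rng),
        "lstm_b": [0.0] * (4 * hidden),
        "q_w": rand_matrix(n_actions, hidden, rng), "q_b": [0.0] * n_actions,
    }


def q_values(grid, dynamics, params, h, c):
    cnn_out = conv_features(grid, params["kernel"])
    fc_out = [max(0.0, v)
              for v in linear(params["fc_w"], params["fc_b"], dynamics)]
    joint = cnn_out + fc_out  # concatenate CNN and fully connected outputs
    h, c = lstm_step(joint, h, c, params["lstm_w"], params["lstm_b"])
    return linear(params["q_w"], params["q_b"], h), h, c


# Usage with made-up inputs: a 4x4 occupancy grid and three dynamic parameters.
params = make_params()
grid = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
h0 = [0.0] * 5
c0 = [0.0] * 5
q, h1, c1 = q_values(grid, [12.0, 0.5, -0.1], params, h0, c0)
action = max(range(len(q)), key=q.__getitem__)  # greedy action over Q values
```

The sketch makes the point of disagreement between the references concrete: the concatenation step is a single list join, while the recurrent step that consumes it is a separate component with its own state.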

Regarding claim 2, Shalev-Shwartz and Bai disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
	Furthermore, Shalev-Shwartz discloses: “wherein inputting at least one occupancy grid includes processing the at least one occupancy grid based on LiDAR data and image data that is sensed by at least one of: the ego agent and the target agent [see para: 0003; For example, an autonomous vehicle may need to process and interpret visual information (e.g., information captured from a camera), information from radar or lidar, and may also use information obtained from other sources (e.g., from a GPS device, a speed sensor, an accelerometer, a suspension sensor, etc.). At the same time, in order to navigate to a destination, an autonomous vehicle may also need to identify its location within a particular roadway (e.g., a specific lane within a multi-lane road), navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation. And see para: 0245].

Regarding claim 3, Shalev-Shwartz and Bai disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
	Furthermore, Shalev-Shwartz discloses: “further including processing at least one vehicle dynamic state that includes the at least one vehicle dynamic parameter, wherein the at least vehicle dynamic parameter is based on vehicle dynamic data that is sensed by at least one of: the ego agent and the target agent [see para: 0188; Another technique may include model based learning and planning (learning the probability of state transitions and solving the optimization problem of finding the optimal V). Combinations of these techniques may also be used to train the learning system. In this approach, the dynamics of the process may be learned, namely, the function that takes (s_t, a_t) and yields a distribution over the next state s_{t+1}. Once this function is learned, the optimization problem may be solved to find the policy π whose value is optimal. This is called “planning”. One advantage of this approach may be that the learning part is supervised and can be applied offline by observing triplets (s_t, a_t, s_{t+1})].

Regarding claim 4, Shalev-Shwartz and Bai disclose all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim.
	Furthermore, Shalev-Shwartz discloses: “wherein concatenating the outputs includes concatenating processed data associated with image and LiDAR coordinate points output by the CNN [see para: 0075; Sensory information (such as images, radar signal, depth information from lidar or stereo processing of two or more images) of the environment may be processed together with position information, such as a GPS coordinate, vehicle's ego motion, etc. to determine a current location of the vehicle relative to the known landmarks, and refine the vehicle location] and processed data associated with the at least one vehicle dynamic parameter output by the first fully connected layer into agent environmental and dynamic data that is associated with at least one of: the ego agent and the target agent [see para: 0337; As described above, the next state, s_{t+1}, may be decomposed into a sum of a predictable part, N̂(s_t, a_t), and a non-predictable part, ν_t. The expression, N̂(s_t, a_t), may represent the dynamics of vehicle locations and velocities (which may be well-defined in a differentiable manner), while ν_t may represent the target vehicles' acceleration. It may be verified that N̂(s_t, a_t) can be expressed as a combination of ReLU functions over an affine transformation, hence it is differentiable with respect to s_t and a_t. The vector ν_t may be defined by a simulator in a non-differentiable manner, and may implement aggressive behavior for some targets and defensive behavior for other targets. Two frames from such a simulator are shown in FIGS. 17A and 17B].

Regarding claims 10 and 19, claims 10 and 19 are rejected under the same art and evidentiary limitations as determined for the method of claim 1.

Regarding claim 11, claim 11 is rejected under the same art and evidentiary limitations as determined for the method of claim 2.

Regarding claim 12, claim 12 is rejected under the same art and evidentiary limitations as determined for the method of claim 3.

Regarding claim 13, claim 13 is rejected under the same art and evidentiary limitations as determined for the method of claim 4.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Madani et al. (US 2019/0188848 A1)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Masum Billah, whose telephone number is (571) 270-0701. The examiner can normally be reached Monday - Friday, 9 AM - 5 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie J. Atala, can be reached at (571) 272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MASUM BILLAH/Primary Patent Examiner, Art Unit 2486