DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 20 Dec 21 has been entered.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claims 8-10 and 18-19 are rejected under 35 U.S.C. 112(a) because the specification, while being enabling for a response threshold duration that specifically begins when an alert prompt is presented to a user on an alertness monitor such as one or more cockpit devices (per Paragraphs 42-44), does not reasonably provide enablement for these broader claim limitations that merely use the term “response threshold time” (which under Broadest Reasonable Interpretation could be either an instant in time or a duration), and are not used in conjunction with any requirement for that time/duration to specifically start when the alert prompt is provided by the cockpit output device.  As such, the response threshold time as claimed could potentially be any arbitrary instant in time or arbitrary duration that does not necessarily have to start/stop according to any particular trigger such as the instant time” (as examples, if a user does respond to the alert prompt almost instantaneously but there is a failure and/or latency of the response due to communication and/or electro-mechanical issues, then are these claim limitations met?  Or what if a user does respond to the alert prompt almost instantaneously but the response is not an appropriate, expected, and/or reasonable response?).  By applying the factors set forth in In re Wands (858 F.2d 731, 737, 8 USPQ2d 1400, 1404 (Fed. Cir. 1988)) and MPEP 2164.08 “Enablement Commensurate in Scope With the Claims”, specifically at least: (A) the breadth of the claims, (F) the amount of direction provided by the inventor, and (G) the existence of working examples, it is clear that the specification thus does not enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention commensurate in scope with these claims.
It is therefore suggested to amend “a/the response threshold time” to specifically be “a/the response threshold duration that begins upon the alert prompt being provided by the cockpit output device”, or the like.  For purposes of compact prosecution, Examiner is interpreting “a/the response threshold time” as though it is actually “a/the response threshold duration that begins upon the alert prompt being provided by the cockpit output device” (commensurate in scope with the enabled portion of the Specification’s Paragraphs 42-44).  Appropriate corrections are required.
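The distinction drawn above, between an arbitrary "response threshold time" and a duration that begins when the alert prompt is provided, can be illustrated with a minimal sketch. The function and parameter names are hypothetical and are not taken from the application or the cited references; timestamps are assumed to be in seconds.

```python
from typing import Optional


def responded_within_duration(prompt_time_s: float,
                              response_time_s: Optional[float],
                              threshold_duration_s: float) -> bool:
    """Return True if a response arrived inside the window opened by the prompt.

    Hypothetical illustration: the window opens at prompt_time_s (the moment
    the alert prompt is provided) and closes threshold_duration_s later.
    """
    if response_time_s is None:
        return False  # no response was received at all
    elapsed_s = response_time_s - prompt_time_s
    # A response before the prompt, or after the window closes, does not count.
    return 0.0 <= elapsed_s <= threshold_duration_s
```

Under this reading the window is tied to a specific trigger, whereas a bare threshold time carries no start point from which elapsed time could be measured.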
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-2, 11-12, and 20 (and Claims 3-10, 13, and 15-19 due to dependency) are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.  Firstly, each of the independent Claims 1, 11, and 20 includes a first limitation that states that a plurality of inputs comprises a first labeling input and a first alert input, but then includes in a later second limitation that an alert prompt signals the user to provide an alert input indicating that the user is alert.  It is thus indefinite as to whether or not “an alert input” is meant to be the same as “a first alert input” (and if so, “an alert input” should remote-sensing data captured by at least one remote-sensing devices of the vehicle; however, these claims are dependent upon independent Claims 1 and 11, respectively, which already state that a perception system is trained to detect objects outside of the vehicle based on sensor data captured by at least one sensor of the vehicle.  It is thus indefinite as to whether or not “remote-sensing data” is meant to be the same as “sensor data” (and if so, “remote-sensing data” should instead read “the sensor data”), or if they are meant to be different (and if so, “remote-sensing data” should instead read something along the lines of “remote-sensing data unique from the sensor data” to clearly differentiate the two types of data from each other).
It is thus also indefinite as to whether or not “at least one remote-sensing devices” (which appears to also include an inadvertent “s” at the end of “devices”) is meant to be the same as “at least one sensor” (and if so, “at least one remote-sensing devices” should instead read “the at least one sensor”), or if they are meant to be different (and if so, “at least one remote-sensing devices” should instead read something along the lines of “at least one remote-sensing device unique from the at least one sensor” to clearly differentiate the two types of data collecting means from each other).  For purposes of compact prosecution, Examiner is interpreting “remote-sensing data” to instead read “the sensor data”, indicating that they are being treated as the same type of data, and interpreting “at least one remote-sensing devices” to instead read “the at least one sensor”, indicating that they are being treated as the same data collecting means.  Appropriate corrections are required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 11-13, 15-16, and 20 are rejected under 35 U.S.C. 103 as being obvious over Aizawa et al. (US 2020/0023863, which has foreign priority to JP 2017-048169, filed 14 Mar 17), herein “Aizawa”, in view of Rajkumar et al. (US 10730181, filed 27 Dec 17), herein “Rajkumar”, further in view of Liu et al. (US 2018/0275667, filed 27 Mar 17), herein “Liu”.
As an initial matter, it should be noted that the Rajkumar reference is a published US Patent, and purely in an effort to expedite the Applicant’s review of this reference so as to not require referencing citations with columns and line numbers, which has been known to be cumbersome at times, the citations provided by the Examiner are instead listed as paragraph numbers, similar to how the Examiner cited to paragraphs within any pre-grant publications used for the prior art rejections found herein.  For citations to this Rajkumar reference, a paragraph number of “1” indicates either the “Background” (from the Brief Summary of the Invention, which per Rajkumar starts at “(1)” and goes up to “(58)”) or it indicates the “Brief Description of the Drawings” (from the Detailed Description of the Invention, which follows the Brief Summary of the Invention, which per Rajkumar restarts at “(1)” and goes up to “(293)”).  As such, any paragraph number over 58 must be from the section for Detailed Description of the Invention, but any paragraph number 58 or below could be from either section.  If Applicant has difficulty finding these citations within this published US Patent using this citation format, or simply prefers the Examiner instead use the more traditional citation format of columns and line numbers, Examiner requests Applicant say so in their reply to this Office action, and at that point the Examiner would gladly change their citation format for the Rajkumar reference (or any other published US Patents) in all future Office correspondence.
Regarding Claims 1, 11, and 20 (each independent), Aizawa discloses:
vehicle 1), the vehicle system comprising: at least one hardware processor unit programmed to perform operations (“control unit 23 includes a processor 231”, Paragraph 52, “The embodiment may be implemented by a storage medium such as a read only memory (ROM) that stores a program causing the processor 231 to execute processing of each unit included in the processor 231”, Paragraph 113) comprising: (per Claim 1) / a method (“a concentration degree determination method”, Paragraph 1, also see Claim 6) for operating an autonomous vehicle (automatic driving control device 14, vehicle 1, “an automatic driving mode in which the vehicle is caused to run along a previously-set route regardless of the driving operation of the driver has been developed as an driving mode of a vehicle”, Paragraph 2), comprising: (per Claim 11) / a machine-readable medium comprising instructions thereon that, when executed by at least one processor unit, cause the at least one processor unit to perform operations (“control unit 23 includes a processor 231”, Paragraph 52, “The embodiment may be implemented by a storage medium such as a read only memory (ROM) that stores a program causing the processor 231 to execute processing of each unit included in the processor 231”, Paragraph 113, also see Claims 7-14) comprising: (per Claim 20)
executing a perception system, the perception system being trained to detect objects outside of the vehicle based on sensor data captured by at least one sensor of the vehicle (per Claim 1) / executing a perception system at the autonomous vehicle, the perception system being trained to detect objects outside of the autonomous vehicle based on sensor data captured by at least one sensor of the autonomous vehicle (per Claim 11) / executing a perception system at a vehicle, the perception system being trained to detect objects outside of the vehicle based on sensor data captured by at least one sensor of the vehicle (per Claim 20) (“The vehicle 1 further includes an external camera 6”, Paragraph 33, “The external camera 6 is installed at any position of the vehicle 1 so as to capture an image of an outside of the vehicle 1. Although one external camera 6 is illustrated in FIG. 1, the vehicle 1 may include a plurality of external cameras that capture images in different  “The input and output interface unit 21 connects each of the external camera 6…the automatic driving control device 14…to the control unit 23”, Paragraph 47; also see discussion below pertaining to both Rajkumar and Liu);
receiving a plurality of inputs from a user of the vehicle / the autonomous vehicle (i.e. eye gaze direction/-s of the user at one or more specific objects each time one or more objects (that may or may not be a first object) are shown on display 131, per the below citations), the plurality of inputs comprising a first labeling input received from the user, the first labeling input comprising a user-provided label for a first object outside the vehicle / the autonomous vehicle detected by the perception system, the user-provided label indicating a classification of the first object (wherein the first labeling input describes/confirms/rejects a classification label for a first object from the surroundings of the vehicle being shown on display 131 of Aizawa that is either not yet known to the vehicle’s classification system or is only known with a low degree of confidence and thus a user input confirming or rejecting its classification would help improve the vehicle’s classification system for the future; see discussion below pertaining to both Rajkumar and Liu) and a first alert input received from the user (i.e. the actual eye gaze direction of the user to a specific object following an object (that may or may not be the first object) being shown on display 131, each time there is a new object shown on the display, and obviously the driver’s response time to do so, “The object recognition degree is an index how much the driver recognizes an object (for example, visually), and is a degree to which the driver consciously confirms ), wherein the first labeling input is used to train the perception system to detect a second object outside of the vehicle / the autonomous vehicle, the second object matching the classification of the first object (see discussion below pertaining to Liu); and
after determining that no user input has been received for an input threshold duration (i.e. the actual eye gaze direction of the user to a specific object has not been made within a certain time duration, and thus the driving concentration degree estimated by the concentration degree estimator is below the applicable first or second reference; “the concentration degree estimator 2316 can refer to the concentration degree table stored in the concentration degree table storage 223, and estimate the level of the driving concentration degree corresponding to the state of the driver from the plurality of levels…The driving concentration degree may be estimated by the concentration degree estimator 2316 using an artificial intelligence (AI) function such as machine learning and deep learning”, Paragraphs 59-60, “When the driving mode is the automatic driving mode, and when the driving concentration degree estimated by the concentration degree estimator 2316 does not satisfy the first reference, the signal output unit 2318 outputs the instruction signal. On the other hand, when the driving mode is the manual driving mode, and when the driving concentration degree estimated by the concentration degree estimator 2316 does not satisfy the second reference, the signal output unit 2318 outputs the instruction signal”, Paragraph 69; also see obviousness discussion below), providing an alert prompt to a cockpit output device (“Upon receiving the instruction signal from the signal output unit 2318, the support providing device performs predetermined support to the driver. For example, the support providing device is the navigation device 13 or the audio output device 16”, Paragraph 69), causing the cockpit output device to provide the alert prompt to signal to the user to provide an alert input indicating that the user is alert (i.e. the actual eye gaze direction of the user to .
Firstly, Aizawa does not specifically disclose (a) the plurality of inputs comprise a first labeling input received from the user, the first labeling input comprising a user-provided label for a first object/event outside the vehicle, the user-provided label indicating a classification of the first object/event, and (b) wherein the first labeling input is used to train the perception system to detect a second object/event outside of the vehicle / the autonomous vehicle, the second object/event matching the classification of the first object/event.  However, these limitations are taught by Rajkumar (“The robot may include navigation components allowing it to set a course and travel along a self-directed path. The robot may include sensory capabilities allowing the robot to perceive its surrounding environment. The robot can include body components such as a chassis and other connecting components”, Paragraph 19, “robots may use a perception system that is designed to classify many types of objects”, Paragraph 28, “the user can issue a voice command, interact with the display of client device 108 that communicates with the robot, or interact with a screen of a robot”, Paragraph 31, “the user 106 may provide a classification label 120 for the embedding 118 produced by the robot 104B. For instance, the robot 104B may request that the user 106 provide a classification label 120 for an object that the robot does not recognize. The user 106 may provide the classification label 120, in any of various ways, e.g., voice input to the robot, or entering the information to a client device 108, such as a phone or computer that communicates with each of the robots 104. The client device 108 may communicate with the robots 104A-104D over any communication protocol, such as Bluetooth or Wi-Fi. For example, the user 106 may type in the classification label 120 "cup" into the client device 108. 
Alternatively, the user may speak an utterance detected by the client device 108, as illustrated in FIG. 1, reciting, "This is a `Cup`". In other implementations, the user 106 may communicate directly with the the computer system of the robot 1110 may include a machine learning model that generates an embedding that is related to an object identified by the robot 1110. In some cases, a user may teach the robot the classification of the object”, Paragraph 196) and Liu (“According to another aspect of the present disclosure, existing vehicle data logs that include event labels recorded by a human passenger can be used to train the machine learned classifier model. In particular, a human co-pilot or passenger can ride in an autonomous vehicle and make a record of when particular events occur. At the same time, the autonomous vehicle can record or otherwise store its collected vehicle data. The humanly-generated events records can be attached to the data collected by the autonomous vehicle at such time. Thus, although generally inefficient for the reasons discussed above, use of a human passenger to provide manual labels can result in a corpus of vehicle data logs that include event labels that are generally aligned in time with vehicle data that reflects such events. According to an aspect of the present disclosure, such corpus of vehicle data logs with event labels can be used to train the machine learned classifier model. However, using a human passenger to create and complete event records can, in some instances, cause a delay between when the event occurred and when the record was created, thereby Other techniques can be used to apply the manual event labels to their corresponding features as well. 
In some implementations, to generate the training data from the vehicle data logs and manual labels, the systems and methods of the present disclosure can extract, for each time at which an event label has been applied to the vehicle data, one or more features from the vehicle data log. For example, the feature extraction process described above can be applied at each instance in which a label has been applied to the vehicle data. The event label can then be associated with the one or more features extracted from the vehicle data log for the corresponding time or vice versa. Thus, as a result, a training set can be generated that includes a plurality of sets of features, where each set of features is labeled with a particular event label (e.g., “high acceleration”). The classification model can be trained using such training data”, Paragraphs 41-44).  It would have been obvious to one of ordinary skill in the art at the time of filing to modify the display 131 of Aizawa to include the capabilities of client device 108 of Rajkumar as well as the learned classifier model using human passenger-created event/feature labels of Liu, so that there is a way that the user/driver/passengers can provide classification labels to one or more particular objects/events to improve the vehicle’s ability to recognize that/those specific object/-s/event/-s or that/those type-s/classification/-s of object/-s/event/-s in the future, and/or to share that user-acquired information pertaining to that/those object/-s/event/-s with its machine learning model and/or other vehicles’, thus improving that vehicle’s and possibly other vehicles’ machine learning model/-s and object/event recognition capabilities for the future.
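The label-to-feature association described in the quoted Liu passage (extract features at each time an event label was applied, then pair them with that label) can be sketched as follows. This is an assumption-laden illustration, not Liu's implementation: the function name, the dictionary-based log keyed by timestamp, and the caller-supplied feature extractor are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def build_training_set(
    vehicle_log: Dict[float, dict],
    event_labels: List[Tuple[float, str]],
    extract_features: Callable[[dict], List[float]],
) -> List[Tuple[List[float], str]]:
    """Pair each manual event label with features from the log at the same time.

    Hypothetical sketch: vehicle_log maps a timestamp to a raw data record,
    and event_labels is a list of (timestamp, label) entries.
    """
    training_set = []
    for timestamp, label in event_labels:
        record = vehicle_log.get(timestamp)
        if record is None:
            continue  # no vehicle data logged at the labeled time; skip it
        training_set.append((extract_features(record), label))
    return training_set
```

Exact timestamp matching is a simplification; the quoted passage itself notes that a human-entered record may lag the event it describes.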
Secondly, Aizawa does not specifically disclose that a new object/event is shown on the display (i.e. an alert prompt) specifically when no user input (i.e. the actual eye gaze direction of the user to a first object/event has not been made within a certain duration, and thus the driving concentration degree estimated by the concentration degree estimator is below the applicable first or second reference) is received within a threshold duration.  However, it is old and well known in the art, and would certainly be obvious, to utilize a time threshold such as a required response time threshold or a time until correct response time threshold, as a means for determining driving concentration degree, since concentration during driving is highly correlated to how quickly a user responds to specific stimuli (e.g. red light turns green but the driver takes longer than expected to accelerate, driver approaches a stop sign but takes longer than expected to start deceleration, obstacle ahead moves into driving path but the driver takes longer than expected to see the obstacle and/or adjust the trajectory of the vehicle, etc.).  As such, it would have been obvious to one of ordinary skill in the art at the time of filing to have further modified Aizawa to only provide alert prompts when no user input is received within an input threshold time, as is old and well known in the art and would certainly be obvious in view of Aizawa as modified with Rajkumar and Liu, in order to limit the number of times that the user is interrupted, especially during autonomous driving, to prove (besides helping to improve object/event recognition and machine learning capabilities) that their concentration degree is sufficient for whichever mode of driving the vehicle is in (which is obviously a higher degree in manual driving than autonomous driving).
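The input-threshold logic discussed above, in which a prompt is provided only after a period with no user input, can be sketched as follows. This is only an illustration; the class name, method names, and the use of seconds are assumptions and are not taken from Aizawa, Rajkumar, Liu, or the application.

```python
class InputWatchdog:
    """Tracks the most recent user input and reports when a prompt is due."""

    def __init__(self, input_threshold_s: float, start_time_s: float = 0.0) -> None:
        self.input_threshold_s = input_threshold_s
        self.last_input_time_s = start_time_s

    def record_input(self, time_s: float) -> None:
        # Any user input (labeling or alert) restarts the quiet period.
        self.last_input_time_s = time_s

    def prompt_due(self, now_s: float) -> bool:
        # A prompt is due once the quiet period reaches the threshold.
        return (now_s - self.last_input_time_s) >= self.input_threshold_s
```

Because every recorded input restarts the quiet period, a user who is already providing inputs is not interrupted with additional prompts.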
Regarding Claims 2 and 12, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Aizawa further discloses:
detecting the first object based at least in part on remote-sensing data (“The state detector 2315 extracts the object from the external image data in order to detect the object recognition degree. For example, the object is an installed object such as a sign or a building, but the object is not particularly limited as long as the object has a possibility of being visually recognized by the driver”, Paragraph 78) captured by at least one remote-sensing devices of the vehicle / autonomous vehicle (external camera 6, “The external camera 6 is installed at any position of the vehicle 1 so as to capture an image of an outside of the vehicle 1. Although one external camera 6 is illustrated in FIG. 1, the vehicle 1 may include a plurality of external cameras that capture images in different directions. The external camera 6 continuously captures the image of a running environment in a vicinity of the vehicle 1. The external camera 6 is activated in response to start of driving of the vehicle 1, and continuously captures the image of the outside of the vehicle 1. The external camera 6 outputs the captured image (hereinafter, also referred to as "external image data") to the concentration degree determination device 2 and the automatic driving control device 14”, Paragraph 34);
providing an indication of the first object to a cockpit output device (see Paragraphs 81-82 as shown in the rejections of Claims 1 and 11 above); and
receiving, via the cockpit input device, the first labeling input from the user (see the above citations to Rajkumar and Liu as described in the rejections of independent Claims 1 and 11 above).
Regarding Claims 3 and 13, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Rajkumar further teaches that the user-provided label for the first object outside the vehicle / the autonomous “the user can issue a voice command, interact with the display of client device 108 that communicates with the robot, or interact with a screen of a robot”, Paragraph 31, “The spoken utterance can include various types of phrases and/or instructions directed towards each robot 104. If the robot 104B does not understand the spoken utterance, the robot 104B may ask the user 106 to repeat the instruction. For instance, the robot 104B may process the spoken utterance using speech recognition to determine the context of the instruction”, Paragraph 43, “the user 106 may type in the classification label 120 "cup" into the client device 108. Alternatively, the user may speak an utterance detected by the client device 108, as illustrated in FIG. 1, reciting, "This is a `Cup`". In other implementations, the user 106 may communicate directly with the robot 104B to provide the classification label 120 using a text input or speaking to the robot 10”, Paragraph 50).  It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified the display 131 of Aizawa to allow not just touch screen or button input but also voice recognition and/or talk-to-text recognition, as further taught by Rajkumar, so that the driver may provide their classification labels (or confirmations/rejections of a previous classification label) without physically having to take their hands off the steering wheel while driving, thus improving safety.
Regarding Claim 4, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1, and Rajkumar further teaches that there are a plurality of buttons positioned at a cockpit of the vehicle (display 131 of Aizawa as previously modified by Rajkumar, since the client device 108 includes the capability of receiving text input, and text input includes a plurality of buttons; “For example, the user 106 may type in the classification label 120 "cup" into the client device 108…the user 106 may…provide the classification label 120 using a text input”, Paragraph 50), wherein a first button of the plurality of buttons is indicative of a first labeling input type (i.e. using a first letter versus a second letter when inputting the ), wherein a second button of the plurality of buttons is indicative of a second labeling input type (i.e. using a second letter versus a first letter when inputting the classification label, or for example, a second specific button to press for confirming and/or rejecting a previously created classification label for the object/event that may or may not be correct and/or detailed enough for the user, as would also be an obvious design choice), and wherein the user-provided label for the first object outside the vehicle comprises data indicating one of the plurality of buttons that was actuated (i.e. the letters corresponding to the pressed letter buttons while typing in the label, or for example, indicating either a user-input-generated label or a machine-learning-generated label that the user perhaps confirms and/or rejects, as would also be an obvious design choice, “A robot that encounters a new object and learns the correct classification shares the embedding for the object with the rest of the fleet, so that each robot does not need to be individually trained to identify the object. 
As a result, when a human trains a single robot to identify an object, the system enables the other robots to also be able to identify the object, due to receiving the corresponding embedding and classification information”, Paragraph 24, “additional metadata can also include a description of how the robot 104 determined the classification of the identified object, such as with user input or with comparison to other classification labels and corresponding embeddings stored in the local cache of robot 104”, Paragraph 137).  However, regardless of these teachings from Rajkumar, it is merely a matter of obvious design choice to specifically utilize first/second buttons out of a plurality of buttons for user inputs relating to the classification labels for objects/events, as opposed to any other form of inputs (such as more buttons, fewer buttons, a touch screen, speech/voice recognition/commands, motion detection, eye gaze recognition, head orientation recognition, heat detection, etc.), as the better intended results.  Furthermore, the Applicant claiming within Claims 3 and 13 that the labeling input comprises an audio recording whereas Claim 4 instead requires a plurality of buttons indicates that even to the Applicant the structural means used to accomplish this function is merely a matter of design choice, since they are providing two completely different options within the claims themselves (rather than providing a single “best mode”).
As such, it would have been obvious to one of ordinary skill in the art at the time of filing to have further modified Aizawa to utilize a plurality of buttons as the structural means for determining one or more labeling inputs, as is further taught by Rajkumar and/or is merely a matter of obvious design choice, in order to provide a means for not just the user to input/approve/reject labeling inputs, which helps to improve object/event recognition and machine learning capabilities, but also for determining whether or not the user is sufficiently concentrating on their driving, which can then provide a historical record of when the user has sufficient concentration for driving vs. when the user does not have sufficient concentration for driving, which can be used to determine insurance rates and/or for improving safety measures and/or for improving the machine learning of the system.
Regarding Claims 5 and 15, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Aizawa/Rajkumar/Liu further discloses/teaches that:
the first labeling input is received at a first time (see discussion pertaining to Rajkumar/Liu as per the rejections of independent Claims 1 and 11 above, wherein the first labeling input describes/confirms/rejects a classification label for an object/event from the surroundings of the vehicle being shown on );
accessing first sensor data captured at the first time (i.e. the exterior camera data from external camera 6 that includes an image showing at least the object/event, as per Aizawa);
detecting the first object (“The state detector 2315 extracts the object from the external image data in order to detect the object recognition degree. For example, the object is an installed object such as a sign or a building, but the object is not particularly limited as long as the object has a possibility of being visually recognized by the driver”, Paragraph 78) using the first sensor data (“The external camera 6 is installed at any position of the vehicle 1 so as to capture an image of an outside of the vehicle 1. Although one external camera 6 is illustrated in FIG. 1, the vehicle 1 may include a plurality of external cameras that capture images in different directions. The external camera 6 continuously captures the image of a running environment in a vicinity of the vehicle 1. The external camera 6 is activated in response to start of driving of the vehicle 1, and continuously captures the image of the outside of the vehicle 1. The external camera 6 outputs the captured image (hereinafter, also referred to as "external image data") to the concentration degree determination device 2 and the automatic driving control device 14”, Paragraph 34); and
storing a first training data package (“When the robot acquires new information, such as the classification for a previously unknown object, the robot can store a representation of the new information in the cache to make the information immediately available for the robot to use”, Paragraph 5 of Rajkumar) comprising the first labeling input (i.e. the classification label for the ), the first sensor data (i.e. the exterior camera data that includes an image showing at least the object/event, as per Aizawa), and an indication of the first object (i.e. the image including the object/event, as per Aizawa).
Regarding Claims 6 and 16, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Aizawa/Rajkumar/Liu further discloses/teaches that:
the first labeling input is received at a first time (see discussion pertaining to Rajkumar and Liu as per the rejections of independent Claims 1 and 11 above, wherein the first labeling input describes/confirms/rejects a classification label for an object/event from the surroundings of the vehicle being shown on display 131 of Aizawa that is either not yet known to the vehicle’s classification system or is only known with a low degree of confidence, and thus a user input confirming or rejecting its classification would help improve the vehicle’s classification system);
determining first sensor data captured by the vehicle / the autonomous vehicle at the first time (i.e. the exterior camera data from external camera 6 that includes an image showing at least the object/event, as per Aizawa); and
storing an indication that the first sensor data corresponds to the first object (i.e. the indication is the object/event actually shown in the image; “the method further includes storing, by the robot, the sensor data used to generate input to the machine learning model used to generate the embedding”, Paragraph 23 of Rajkumar).
Claims 7-10 and 17-19 are rejected under 35 U.S.C. 103 as being obvious over Aizawa in view of Rajkumar and Liu, further in view of Gao (US 2017/0329331, filed 16 May 17).
Regarding Claims 7 and 17, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Aizawa/Rajkumar/Liu further discloses/teaches that:
the first labeling input is received at a first time (see discussion pertaining to Rajkumar and Liu as per the rejections of independent Claims 1 and 11 above, wherein the first labeling input describes/confirms/rejects a classification label for an object/event from the surroundings of the vehicle being shown on display 131 of Aizawa that is either not yet known to the vehicle’s classification system or is only known with a low degree of confidence, and thus a user input confirming or rejecting its classification would help improve the vehicle’s classification system);
the first labeling input comprises a description of a vehicle control input (“if the autonomous vehicle performs an uncomfortably high deceleration at a particular time, the human passenger might make a record (e.g., an electronic record) that a high deceleration event occurred at such time. The humanly-generated record can be attached to any data collected by the autonomous vehicle at such time (e.g., vehicle data)”, Paragraph 20 of Liu; additionally if the first labeling input is received in a sufficiently short time period after the object/event is shown on the display, and it’s not conflicting with other labels for other similar objects/events that are already known/confirmed, then this could be considered a vehicle control input since it signifies the driver is alert and paying attention (and thus may be used for enabling the vehicle to switch from an autonomous mode to a manual mode), but if it is not received in a sufficiently short time period after the object/event is shown on the display, or it is but it conflicts with other labels for other similar objects/events that are already known/confirmed, then this could be considered a vehicle control input since it signifies the driver is less alert and possibly not paying attention; especially if the actual eye gaze direction of the user to a specific object/event has not been made within a certain time duration, and thus the driving concentration degree estimated by the concentration degree estimator is below the applicable first or second );
determining first sensor data captured by the vehicle / the autonomous vehicle at the first time (i.e. the exterior camera data from external camera 6 that includes an image showing at least the object/event, as per Aizawa); and
storing an indication that the first sensor data corresponds to the vehicle control input (“Once the classifier model has been trained, the systems and methods of the present disclosure can assist in evaluating an autonomous vehicle motion control system completely based on sensor feedback and/or other vehicle data, thereby eliminating the need for human passengers to ride along and manually record event occurrences”, Paragraph 24 of Liu; additionally adjusting the value of the user’s driving concentration degree is based on the object/event from the image detected by the exterior camera, and that user’s driving concentration degree may be stored in association with one or more vehicle controls such as enabling the vehicle to perhaps switch from an autonomous mode to a manual mode (if the value is high enough), or disabling the vehicle from changing from an autonomous mode to a manual mode or even slowing or stopping the vehicle (if the value is too low), etc.; also see discussion pertaining to Gao described below).
While Liu (but not necessarily Aizawa/Rajkumar) does discuss the labeling inputs as describing a vehicle control input, it should be noted that if the labeling inputs at least describe a classification of the object/event that might or might not make sense for a particular type of object/event and/or whether or not the user’s gaze direction was aligned to a first object/event, and the true or false values of those “checks” cause an adjustment to the user’s driving concentration degree, and that adjusted driving concentration degree is used for vehicle control purposes, then the labeling inputs per Aizawa/Rajkumar could also be considered to comprise a description of a vehicle control input.  Regardless, Gao teaches slowing or stopping the vehicle when an alert prompt is not responded to within a threshold amount of time (“the system may, responsive to a determination that the driver has not taken over control of the vehicle (such as after a period of time elapses following when an alert to the driver to take over control of the vehicle), function to slow or stop the vehicle. For example, if a pedestrian or deer or other vehicle is determined (such as via processing of image data captured by one or more cameras of the vehicle or processing of sensor data captured by one or more radar or lidar sensors of the vehicle) to be present in the path or route (or approaching the path or route) where the system does not expect such objects, the system may generate an alert to have the driver take over, and if, after, for example, 1 second or 2 seconds, the driver has not started manually driving/controlling the vehicle, the system may slow and/or stop the vehicle”, Paragraph 18).
It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified Aizawa to have the labeling inputs include a description of a vehicle control input as further taught by Liu (such as slowing or stopping the vehicle as taught by Gao), in order to ensure that the vehicle does not continue operating in its current mode if the driver is not alert enough to deem continued driving safe, and that whenever this occurs, the data associating that need to slow or stop the vehicle is also associated/stored with the other relevant captured data pertaining to the user, the user’s gaze as detected by the internal camera, and the object/event as detected by the external camera (that was presumably missed, ignored, or mis-classified by the user).
Regarding Claims 8-10 and 18-19, Aizawa as modified by Rajkumar and further modified by Liu renders obvious the vehicle system of Claim 1 and the method of Claim 11, respectively, and Aizawa remains silent as to, but Gao teaches: determining that no response to the alert prompt is received within a response threshold time (per Claims 8 and 18); after determining that no response to the alert prompt is received within the response threshold time, stopping the vehicle (per Claims 9 and 19, dependent on Claims 8 and 18, respectively); and after determining that no response to the alert prompt is received within the response threshold time, disengaging a vehicle autonomy system (per Claim 10, dependent “the system may, responsive to a determination that the driver has not taken over control of the vehicle (such as after a period of time elapses following when an alert to the driver to take over control of the vehicle), function to slow or stop the vehicle. For example, if a pedestrian or deer or other vehicle is determined (such as via processing of image data captured by one or more cameras of the vehicle or processing of sensor data captured by one or more radar or lidar sensors of the vehicle) to be present in the path or route (or approaching the path or route) where the system does not expect such objects, the system may generate an alert to have the driver take over, and if, after, for example, 1 second or 2 seconds, the driver has not started manually driving/controlling the vehicle, the system may slow and/or stop the vehicle”, Paragraph 18; wherein the disengagement of a vehicle autonomy system is the disengagement of the autonomous driving itself once the vehicle is slowed/stopped).  
It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified Aizawa to utilize an alert response threshold time for judging whether or not the driver is alert enough to continue safe driving, and if not, to disengage a vehicle autonomy system and/or stop the vehicle, as taught by Gao, in order to ensure that the vehicle does not continue operating in its current mode if the driver is not alert enough to deem continued driving safe.
Response to Arguments
Applicant's arguments filed 20 Dec 21 have been fully considered but have not been found persuasive.  Applicant’s remarks argue that the amendment to independent Claim 1 (and the similar amendments to the other independent Claims 11 and 20) now requires (a) “executing a perception system, the perception system being trained to detect objects outside of the vehicle based on sensor data captured by at least one sensor of the vehicle”, and (b) “receiving a plurality of inputs from a user of the vehicle, the plurality of inputs comprising a first labeling input received from the user and a first alert input received from the user, the first labeling input comprising a user-provided label for a first object outside the vehicle detected by the perception system, the user-provided label indicating a classification of the first object, wherein the first labeling input is used to train the perception system to detect a second object outside of 
Firstly, regarding (a), Examiner holds that this is disclosed by Aizawa, previously of record (“The vehicle 1 further includes an external camera 6”, Paragraph 33, “The external camera 6 is installed at any position of the vehicle 1 so as to capture an image of an outside of the vehicle 1. Although one external camera 6 is illustrated in FIG. 1, the vehicle 1 may include a plurality of external cameras that capture images in different directions. The external camera 6 continuously captures the image of a running environment in a vicinity of the vehicle 1. The external camera 6 is activated in response to start of driving of the vehicle 1, and continuously captures the image of the outside of the vehicle 1. The external camera 6 outputs the captured image (hereinafter, also referred to as “external image data”) to…the automatic driving control device 14”, Paragraph 34, “The input and output interface unit 21 connects each of the external camera 6…the automatic driving control device 14…to the control unit 23”, Paragraph 47, “The state detector 2315 extracts the object from the external image data in order to detect the object recognition degree. For example, the object is an installed object such as a sign or a building, but the object is not particularly limited as long as the object has a possibility of being visually recognized by the driver”, Paragraph 78).
Secondly, regarding (b), Examiner holds that a person of ordinary skill in the art at the time of filing would deem this obvious in further view of Rajkumar, previously of record (“The robot may include navigation components allowing it to set a course and travel along a self-directed path. The robot may include sensory capabilities allowing the robot to perceive its surrounding environment. The robot can include body components such as a chassis and other connecting components”, Paragraph 19, “robots may use a perception system that is designed to classify many types of objects”, Paragraph 28, “the user can issue a voice command, interact with the display of client device 108 that communicates with the robot, or interact with a screen of a robot”, Paragraph 31, “the user 106 may provide a classification label 120 for the embedding 118 produced by the robot 104B. For instance, the robot 104B may request that the user 106 provide a classification label 120 for an object that the robot does not recognize. The user 106 may provide the classification label 120, in any of various ways, e.g., voice input to the robot, or entering the information to a client device 108, such as a phone or computer that communicates with each of the robots 104. The client device 108 may communicate with the robots 104A-104D over any communication protocol, such as Bluetooth or Wi-Fi. For example, the user 106 may type in the classification label 120 "cup" into the client device 108. Alternatively, the user may speak an utterance detected by the client device 108, as illustrated in FIG. 1, reciting, "This is a `Cup`". In other implementations, the user 106 may communicate directly with the robot 104B to provide the classification label 120 using a text input or speaking to the robot 104B”, Paragraph 50, “robot 104D can repeat the process discussed in FIG. 3 for each object observed by the robot 104D in order to classify the objects the robot 104D encounters. 
This recognition or classification process may be done very frequently, e.g., on an almost continuous basis as the robot observes its surroundings, and for multiple objects that may be in view at any given time”, Paragraph 84, “in the system 1100, a robot 1110 receives data from one or more sensors of the robot 1110. The computer system of the robot 1110 uses the sensor data to identify and classify one or more objects in the robot's surroundings. In some implementations, the computer system of the robot 1110 may include a machine learning model that generates an embedding that is related to an object identified by the robot 1110. In some cases, a user may teach the robot the classification of the object”, Paragraph 196).
Additionally, the newly found prior art reference, Liu, also clearly renders obvious (b) when used to further modify Aizawa (“According to another aspect of the present disclosure, existing vehicle data logs that include event labels recorded by a human passenger can be used to train the machine learned classifier model. In particular, a human co-pilot or passenger can ride in an autonomous vehicle and make a record of when particular events occur. At the same time, the autonomous vehicle can record or otherwise store its collected vehicle data. The humanly-generated events records can be attached to the data collected by the autonomous vehicle at such time. Thus, although generally inefficient for the reasons discussed above, use of a human passenger to provide manual labels can result in a corpus of vehicle data logs that include event labels that are generally aligned in time with vehicle data that reflects such events. According to an aspect of the present disclosure, such corpus of vehicle data logs with event labels can be used to train the machine learned classifier model. However, using a human passenger to create and complete event records can, in some instances, cause a delay between when the event occurred and when the record was created, thereby leading to at least some of the event records having a slight delay in their timestamp relative to when the event occurred. To remedy this issue, in some implementations, the systems of the present disclosure can apply or re-apply each event label to any potentially referenced events that occurred within some time window prior to the timestamp of such event label. Thus, for each event label included in the vehicle data log, the computing system can identify a plurality of potentially referenced events within a time window prior to the particular time associated with the event label and associate the event label with each of the plurality of potentially referenced events. 
To provide an example, for a particular high acceleration event label, all potential events of high acceleration (e.g., instances in which acceleration is greater than a threshold value) within a sixty second window prior to the event label can be labeled as positives for high acceleration events. In other implementations, candidate events within the time window can be identified as described above (e.g., through identification of relative peaks in scale components), and the event label can be applied to each of such candidate events. Other techniques can be used to apply the manual event labels to their corresponding features as well. In some implementations, to generate the training data from the vehicle data logs and manual labels, the systems and methods of the present disclosure can extract, for each time at which an event label has been applied to the vehicle data, one or more features from the vehicle data log. For example, the feature extraction process described above can be applied at each instance in which a label has been applied to the vehicle data. The event label can then be associated with the one or more features extracted from the vehicle data log for the corresponding time or vice versa. Thus, as a result, a training set can be generated that includes a plurality of sets of features, where each set of features is labeled with a particular event label (e.g., “high acceleration”). The classification model can be trained using such training data”, Paragraphs 41-44).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify the display 131 of Aizawa to include the capabilities of client device 108 of Rajkumar, as well as the learned classifier model using human passenger-created event/feature labels of Liu, so that the user, driver, or passengers can provide classification labels for one or more particular objects/events. Doing so improves the vehicle’s ability to recognize those specific objects/events, or those types/classifications of objects/events, in the future, and/or allows that user-acquired information to be shared with the vehicle’s machine learning model and/or those of other vehicles, thus improving that vehicle’s, and possibly other vehicles’, machine learning models and object/event recognition capabilities for the future.
As such, Examiner holds that the claims as currently presented are still in fact unpatentable under 35 USC 103 for being rendered obvious by at least the combination of Aizawa, Rajkumar, and now Liu.
Conclusion
The Examiner has cited particular paragraphs in the references applied to the claims above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well.  It is respectfully requested that the Applicant, in preparing responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.  See MPEP 2141.02 [R-07.2015] VI. A prior art reference must be considered in its entirety, i.e., as a whole, including portions that would lead away from the claimed invention.  W.L. Gore & Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984).  See also MPEP §2123.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, and may be found on an accompanying PTO-892, when applicable.  When a PTO-892 exists, all cited references have either (a) been utilized in the above rejections for their specific teachings (wherein .
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS E WORDEN whose telephone number is 571-272-4876.  The examiner can normally be reached between 1000-1700hrs, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Geepy Pe, can be reached at 571-270-3703.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.