DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	The examiner acknowledges receipt of remarks dated May 16, 2022.

Response to Arguments
Applicant's arguments filed May 16, 2022 have been fully considered but they are not persuasive. In particular, the applicants state that the applied art Yan does not teach or suggest “determining an action target box based on the plurality of candidate boxes, wherein the action target box comprises a face local region and an action interactive object; and categorizing the predetermined action based on the action target box to obtain an action recognition result.” The examiner respectfully traverses. Yan, in Section II(B) and Figures 1 and 2, teaches determining a plurality of skin-like regions and then selecting the most influential contextual region as the secondary region for input into the action recognition network. For example, in Figure 1 Yan shows a plurality of candidate skin-like regions including the hand, arm or face of the driver. Further, in Figure 6, Yan teaches selecting the most influential region (shown with a blue box) that includes the action target. Thus, the examiner submits that Yan clearly teaches determining the action target region (the blue box shown in Figure 6) and inputting that box into the CNN to determine the action class.
Additionally, the applicants state that “However, in the 3rd or 4th image in row 2 of FIG. 6 of Yan, the red box comprises all areas of a face, which is different from the face local region of the present application. The blue box does not comprise any face area, which is also different from the present application.” The examiner respectfully traverses. Figure 6 of Yan clearly shows multiple blue boxes that include a face region of the driver. Moreover, in response to applicant's argument that the references fail to show certain features of applicant's invention, it is noted that the features upon which applicant relies (i.e., “the face local region of the present application”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
For these reasons, the examiner submits that Yan still teaches claim 1.

Response to Amendment
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-8, 10-12, 15, 16 and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Driver Behavior Recognition Based on Deep Neural Networks” by Yan et al. (hereinafter ‘Yan’).
In regards to claim 1, Yan teaches an action recognition method, comprising: (See Yan Figure 1) 
extracting a feature in an image comprising a human face; (See Yan Section II(A); Yan teaches extracting a skin region from regions comprising a human face.)
determining, based on the feature, a plurality of candidate boxes comprising a predetermined action; (See Yan Figure 2, green boxes; Yan teaches candidate regions comprising an action.)
determining an action target box based on the plurality of candidate boxes, wherein the action target box comprises a face local region and an action interactive object; and categorizing the predetermined action based on the action target box to obtain an action recognition result. (See Yan Figure 6 and Section III(E); Yan teaches an action target box with the classified driver action within the box.)

In regards to claim 2, Yan teaches wherein the face local region comprises at least one of: a mouth region, an ear region, or an eye region, wherein the action interactive object comprises at least one of: a container, a cigarette, a mobile phone, food, a tool, a beverage bottle, glasses, or a mask, wherein the action target box further comprises a hand region, wherein the predetermined action comprises at least one of: calling, smoking, drinking water/beverages, eating food, using a tool, putting on glasses, or doing makeup.  (See Yan Figure 1).

In regards to claim 3, Yan teaches further comprising: capturing an image of a person in a vehicle by a vehicle-mounted camera, wherein the image comprises the human face, wherein the person in the vehicle comprises at least one of: a driver at a driving region of the vehicle, a person at a front passenger seat region of the vehicle, or a person at a rear seat of the vehicle, wherein the vehicle-mounted camera comprises an RGB camera, an infrared camera, or a near-infrared camera. (See Yan Section III(A)).

In regards to claim 4, Yan teaches wherein extracting the feature in the image comprising the human face comprises: extracting the feature in the image comprising the human face by using a feature extraction branch of a neural network to obtain a feature map. (See Yan Section III(C)).

In regards to claim 5, Yan teaches wherein determining, based on the feature, the plurality of candidate boxes comprising the predetermined action comprises: determining, by using a candidate box extraction branch of the neural network, the plurality of candidate boxes comprising the predetermined action from the feature map, wherein determining, on the feature map by using the candidate box extraction branch of the neural network, the plurality of candidate boxes comprising the predetermined action comprises: dividing features in the feature map according to a feature corresponding to the predetermined action to obtain a plurality of candidate regions; and obtaining, according to the plurality of candidate regions, the plurality of candidate boxes and a first confidence of each of the plurality of candidate boxes, wherein the first confidence indicates a probability that the candidate box is the action target box. (See Yan Section II(B); Yan teaches an RPN method for determining candidate action boxes and the action target box.)

In regards to claim 6, Yan teaches wherein determining the action target box based on the plurality of candidate boxes comprises: determining the action target box based on the plurality of candidate boxes by using a bounding box refining branch of the neural network.  (See Yan Figure 2).

In regards to claim 7, Yan teaches wherein determining the action target box based on the plurality of candidate boxes by using the bounding box refining branch of the neural network comprises: removing, by using the bounding box refining branch of the neural network, a candidate box having a first confidence smaller than a first threshold to obtain at least one first candidate box; performing pooling processing on the at least one first candidate box to obtain at least one second candidate box; and determining the action target box according to the at least one second candidate box. (See Yan Figure 2 and Section II(B)).
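
The confidence-based removal step recited in claim 7 can be illustrated with a minimal sketch. It assumes a hypothetical `CandidateBox` record and an arbitrary threshold value; neither the names nor the values come from Yan or from the application, and the pooling and box-refinement steps are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateBox:
    # Hypothetical layout: corner coordinates plus the "first confidence",
    # i.e. the estimated probability that this box is the action target box.
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def filter_candidates(boxes, first_threshold):
    """Remove boxes whose first confidence is smaller than the threshold."""
    return [box for box in boxes if box.confidence >= first_threshold]


# Illustrative values only (assumed, not taken from any reference).
candidates = [
    CandidateBox(10, 10, 50, 50, 0.92),
    CandidateBox(12, 8, 48, 52, 0.40),
    CandidateBox(60, 20, 90, 70, 0.75),
]
first_candidates = filter_candidates(candidates, first_threshold=0.5)
```

In this sketch a box whose confidence equals the threshold is kept, since the claim language only removes boxes with a confidence smaller than the threshold.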

In regards to claim 8, Yan teaches wherein performing pooling processing on the at least one first candidate box to obtain the at least one second candidate box comprises: respectively performing pooling processing on the at least one first candidate box to obtain at least one first feature region corresponding to the at least one first candidate box; and adjusting a position and a size of a respective first candidate box based on each of the at least one first feature region to obtain the at least one second candidate box. (See Yan Section II(B); Yan teaches R*CNN, which comprises region pooling.)

In regards to claim 10, Yan teaches wherein categorizing the predetermined action based on the action target box to obtain the action recognition result comprises: obtaining a region map corresponding to the action target box from a feature map by using an action categorization branch of a neural network, and categorizing the predetermined action based on the region map to obtain the action recognition result. (See Yan Figure 1 and Section II(B).)

In regards to claim 11, Yan teaches wherein the neural network is obtained by pre-supervised training based on a training image set, and the training image set comprises a plurality of sample images, wherein annotation information of each of the plurality of sample images comprises an action supervision box and an action category corresponding to the action supervision box. (See Yan Section II(B); Yan teaches training for the R*CNN network).

In regards to claim 12, Yan teaches wherein the training image set comprises a positive sample image and a negative sample image, the action of the negative sample image is similar to that of the positive sample image, and wherein an action supervision box of the positive sample image comprises: a face local region and an action interactive object, or a face local region, a hand region, and an action interactive object, wherein the action of the positive sample image comprises calling, and the negative sample image comprises scratching an ear; the positive sample image comprises smoking, eating food, or drinking water, and the negative sample image comprises opening mouth or putting a hand on lips. (See Yan Section III(A)).

In regards to claim 15, Yan teaches comprising: acquiring, by using a vehicle-mounted camera, a video stream comprising a face image of a driver; obtaining an action recognition result of at least one image frame from the video stream through the action recognition method according to claim 1; and generating dangerous driving prompt information in response to the action recognition result satisfying a predetermined condition. (See Yan Figure 1 and Section I; Yan teaches a driver behavior recognition system).

In regards to claim 16, Yan teaches wherein the predetermined condition comprises at least one of: occurrence of a particular predetermined action, a number of times that the particular predetermined action occurs within a predetermined duration, or a maintained duration of the occurrence of the particular predetermined action in the video stream. (See Yan Figure 6 and Section III(E)).
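
The three alternative conditions recited in claim 16 can be illustrated with a minimal sketch over per-frame recognition results. The event format (timestamp in seconds, action label), the function names, and all threshold values are assumptions made for illustration; they are not taken from Yan or from the application.

```python
def runs(events, action):
    """Return (start, end) timestamps of consecutive-frame runs labelled `action`."""
    result = []
    start = prev = None
    for timestamp, label in events:
        if label == action:
            if start is None:
                start = timestamp
            prev = timestamp
        elif start is not None:
            result.append((start, prev))
            start = None
    if start is not None:
        result.append((start, prev))
    return result


def evaluate_conditions(events, action, window, min_count, min_duration):
    """Evaluate each of the three alternative conditions separately."""
    found = runs(events, action)
    now = events[-1][0] if events else 0.0
    # Occurrences that started within the last `window` seconds.
    recent = sum(1 for start, _ in found if start >= now - window)
    # Longest uninterrupted duration of the action.
    longest = max((end - start for start, end in found), default=0.0)
    return {
        "occurred": bool(found),
        "count_in_window": recent >= min_count,
        "maintained": longest >= min_duration,
    }


# Illustrative frame results (assumed): two separate "calling" occurrences.
frames = [
    (0.0, "none"), (1.0, "calling"), (2.0, "calling"),
    (3.0, "none"), (4.0, "calling"), (5.0, "none"),
]
status = evaluate_conditions(frames, "calling", window=10.0, min_count=2, min_duration=2.0)
```

Because the claim recites "at least one of" the three conditions, a caller of this sketch would treat the predetermined condition as satisfied when any selected entry of the returned dictionary is true.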

Claims 18-20 recite limitations that are similar to those of claim 1. Therefore, claims 18-20 are rejected on the same grounds as claim 1.

Claim 19 recites limitations that are similar to those of claim 15. Therefore, claim 19 is rejected on the same grounds as claim 15.

Allowable Subject Matter
Claims 9, 13, 14 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	The following is a statement of reasons for the indication of allowable subject matter:  
In regards to claims 9, 13, 14 and 17, the applied art does not teach or suggest the claimed limitations.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UTPAL D SHAH whose telephone number is (571)272-5729. The examiner can normally be reached M-F: 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward Urban, can be reached on 571-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/UTPAL D SHAH/Primary Examiner, Art Unit 2665