DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 5/19/2022 has been entered.
Response to Amendment
In response to the final action of 2/22/2022 and the after final of 5/10/2022, the applicant has submitted an RCE amending claims 1 and 10 and cancelling claims 2-5, 7, 11-13, 18, and 20, while arguing to traverse the prior art rejections. Applicant’s arguments have been fully considered and are persuasive in light of the latest amendments. Therefore, claims 1, 8-10, 15-17, and 19 are allowable over the prior art of record for the reasons for allowance provided below.
EXAMINER’S AMENDMENT
The examiner has changed the title of the invention to “LIP LANGUAGE IDENTIFICATION METHOD AND APPARATUS, AND AUGMENTED REALITY (AR) DEVICE AND STORAGE MEDIUM, WHICH IDENTIFIES AN OBJECT BASED ON AN AZIMUTH ANGLE ASSOCIATED WITH AR FIELD OF VIEW” so as to be more descriptive of the invention.
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with the attorney on file, Mr. William C. Collard on 8/4/2022.

Amend claims 1 and 10 and the abstract:

As Per Claim 1:
Claim 1 (Currently Amended): A lip-language identification method based on an augmented reality device, the augmented reality device comprising a camera device and an infrared sensor, the method comprising:
acquiring, by the augmented reality device, a sequence of face images for an object to be identified;

sending, by the augmented reality device, the sequence of face images to a server;
performing, by the server, lip-language identification based on the sequence of face images, so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in the face images; and
receiving, by the augmented reality device, the semantic information sent by the server and outputting the semantic information,
wherein acquiring the sequence of face images for the object to be identified, comprises:
acquiring a sequence of images including the object to be identified;
positioning the object to be identified and acquiring azimuth of the object to be identified; and
determining a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and generating the sequence of face images by cropping an image of the face region of the object to be identified from each frame of the images; and
wherein positioning the azimuth of the object to be identified, comprises:
positioning the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and
positioning the azimuth of the object to be identified by sensing the object to be identified through the infrared sensor;
wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of a field of view range of the camera device;
wherein the semantic information is semantic text information and/or semantic audio information;
wherein outputting the semantic information comprises:
displaying, by the augmented reality device, the semantic text information within a visual field of a user wearing the augmented reality device, in response to receiving a display mode instruction; and
playing, by the augmented reality device, the semantic audio information, in response to receiving an audio mode instruction.

As Per Claim 10:
Claim 10 (Currently Amended): A lip-language identification apparatus, comprising:

a face image sequence acquiring unit, configured to acquire a sequence of face images for an object to be identified;
a sending unit, configured to send the sequence of face images to a server, wherein the server determines semantic information corresponding to lip actions in the face images by performing lip-language identification; and
a receiving unit, configured to receive semantic information from the server,
an output unit, configured to output semantic information;
wherein the face image sequence acquiring unit comprises:
an image sequence acquiring subunit, configured to acquire a sequence of images for the object to be identified;
a positioning subunit, configured to position an azimuth of the object to be identified; and
a face image sequence generation subunit, configured to determine a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and crop an image of the face region of the object to be identified from the each frame image so as to generate the sequence of face images; and
wherein the positioning subunit is further configured to position the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and
position the azimuth of the object to be identified by sensing the object to be identified through an infrared sensor;
wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of a 
wherein the output unit comprises:
an output mode instruction generation subunit, configured to generate a display mode instruction, wherein the output mode instruction includes a display mode instruction and an audio mode instruction;
wherein the semantic information is semantic text information and/or semantic audio information, and the output unit further comprises:
a display subunit, configured to display the semantic text information within a visual field of a user wearing an augmented reality device upon receiving the display mode instruction; and
a play subunit, configured to play the semantic audio information upon receiving the audio mode instruction.

As Per Abstract:

A lip motion recognition method and a device therefor, and an augmented reality (AR) device and a storage medium. The present lip motion recognition method comprises: obtaining a facial image sequence of an object to be recognized, and acquiring an azimuth angle which depends on the AR field of view; on the basis of the facial image sequence and the azimuth, performing lip motion recognition to determine the semantic information of the speech content of said object to which the lip motions in the facial images correspond; and displaying the semantic information. The present lip motion recognition method is able to convert the recognized lip motions of an object to be recognized into text or sound, realizing the translation of lip motions. In addition, components of existing AR devices can be used, without the need to add individual pieces of hardware, thus expanding the functionality of AR devices without adding cost, further improving user experience.

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance: Independent claims 1 and 10 recite an “augmented reality device” that performs lip reading by capturing lip images and processing them to determine “semantic information” (spec. ¶ 0061: “For example” “identify all pronunciations represented by the lip-shape changes”), and that then presents this information on the “augmented” device as “text” and/or “audio” to the person wearing the “augmented” device.
The capturing of the lip images is via acquiring sequences of images including the object to be identified and acquiring azimuth of the object to be identified, “wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of a field of view range of a camera” associated with the augmented reality device.
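The azimuth definition quoted above can be illustrated with a minimal sketch. The sketch is not taken from the application or from any reference of record: it assumes an ideal pinhole camera with a known horizontal field of view, and the function names (azimuth_deg, azimuth_to_column, crop_bounds) and all parameters are hypothetical. It shows one way a positioned azimuth could be turned into the horizontal location of a face region in each frame before cropping.

```python
import math


def azimuth_deg(lateral_offset, axial_distance):
    """Angle (degrees) between the object position and the camera's central axis.

    lateral_offset: sideways distance of the object from the axis (assumed input).
    axial_distance: distance of the object along the axis, same units.
    """
    return math.degrees(math.atan2(lateral_offset, axial_distance))


def azimuth_to_column(azimuth, image_width, horizontal_fov_deg):
    """Pixel column where a ray at the given azimuth meets the image.

    Assumes an ideal pinhole camera whose central axis passes through
    the middle column of the image.
    """
    half_fov = math.radians(horizontal_fov_deg) / 2
    focal_px = (image_width / 2) / math.tan(half_fov)
    return image_width / 2 + focal_px * math.tan(math.radians(azimuth))


def crop_bounds(center_col, crop_width, image_width):
    """Left and right columns of a crop window, clamped to the frame."""
    left = int(round(center_col - crop_width / 2))
    left = max(0, min(left, image_width - crop_width))
    return left, left + crop_width


if __name__ == "__main__":
    # Hypothetical numbers: object 1 m to the side and 2 m ahead,
    # 640-pixel-wide frames, 90 degree horizontal field of view.
    angle = azimuth_deg(1.0, 2.0)
    column = azimuth_to_column(angle, 640, 90)
    print(crop_bounds(column, 100, 640))
```

An object on the central axis has an azimuth of zero and maps to the middle column; the crop window is clamped so that it never leaves the frame.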
The prior art of record Vartanian et al. (US 2012/0242865), does teach in ¶ 0054 lines 3-7: “detect” “lip, mouth” “movement” “for speech recognition” and used by “device 100” to “automatically augment” according to ¶ 0045 lines 4-5. Furthermore, according to ¶ 0054 lines 5+: “Lip, mouth, or tongue movement may be detected when the user is speaking with sound or silently speaking without sound” “Images captured by camera in I/O devices” (acquiring a sequence of face images of user (object) being detected by the “device 100” camera). Finally, according to ¶ 0054 last 7 lines: “Images captured by camera” (the sequence of facial images) “processed” “to determine user input” “object device 100 may use lip or tongue movement” “for inputting text” “to assist with an existing speech or voice recognition system to interpret spoken language” (to determine i.e., the “text” (semantic information) of the “input” (speech content) corresponding to the lip or mouth movements of the “user” (object) being identified).
Vartanian et al. is silent on positioning the azimuth of the object to be identified according to an emitted voice signal, and also on the means of outputting the semantic information, e.g., by displaying it within the AR visual field and/or playing it as audio.
Prasad et al. (US Patent 5,680,481) do teach in Col. 10 lines 5+: “speakers’ head axis of symmetry is constrained to be within a small angle of the vertical” (determining an “angle” (azimuth) of a “speaker’s” “lips” (object) while he is speaking as shown in Figs. 3, 6, 9). Bailey et al. (US 2014/0129207) do teach in ¶ 0111 last sentence: “the destination language text” (the semantic text information) “can be displayed” (displayed) “on viewing surface 148 of prism 144 and superimposed on the user’s field of view” (within user’s visual field) “thereby achieving augmented reality” (in an augmented reality device) “functionality”.
However, the “angle θ” in Prasad et al. cannot be mapped to the claimed “azimuth”, because it is simply an angle between the object’s “head axis of symmetry and the vertical direction [in the plane of the head]”. It is completely independent of “a central axis of a field of view range of [AR] camera device”, which is basically an axis at right angles to the “object” “head” and its “axis of symmetry”. Use of this “azimuth” by the “AR” “camera” enables much better tracking of the target object by the camera, and thus a much better image sequence and, consequently, better semantic information obtained by analysis of the images.
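The independence of the two angles discussed above can be shown with a short sketch. It is illustrative only and is not drawn from Prasad et al. or from the application: the coordinate conventions (x to the right, y up in the image plane, z along the camera's central axis) and the function names head_tilt_deg and object_azimuth_deg are assumptions made for this example.

```python
import math


def head_tilt_deg(chin_xy, crown_xy):
    """Angle (degrees) between the head's axis of symmetry and the vertical.

    Measured entirely in the image plane (x right, y up), from the chin
    point to the crown point. Both points are assumed inputs.
    """
    dx = crown_xy[0] - chin_xy[0]
    dy = crown_xy[1] - chin_xy[1]
    return math.degrees(math.atan2(dx, dy))


def object_azimuth_deg(head_center_xz):
    """Angle (degrees) between the head position and the camera's central axis.

    Measured in the horizontal plane (x right, z along the central axis).
    """
    x, z = head_center_xz
    return math.degrees(math.atan2(x, z))


if __name__ == "__main__":
    # Upright head moved sideways: tilt stays at 0, azimuth changes.
    print(head_tilt_deg((0, 0), (0, 1)), object_azimuth_deg((2, 2)))
    # Tilted head kept on the central axis: tilt changes, azimuth stays at 0.
    print(head_tilt_deg((0, 0), (1, 1)), object_azimuth_deg((0, 2)))
```

In this sketch either angle can be varied while the other stays at zero, which is the sense in which the tilt angle does not determine the azimuth.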
New search mandated by the latest amendments gave rise to Yang, Guang (CN 103853190A) that teaches on page 6 lines 8-9: “adjust the shooting angle” (position the azimuth) “of the monitoring camera to follow the trail of the target object” (so as to depend on the field of view of the camera) “the target object can always appear in the vicinity of the central position of the monitoring picture”. However, this does not involve “monitoring” “the target object” exclusively when “the object to be identified is speaking”, i.e., by “adjusting the shooting angle” according to the “target object” “lip movement”, and is basically directed to a “security monitoring system” which is non-analogous art with respect to the instant application.
Further search did not give rise to any other reference teaching this feature, and these claims are therefore allowable. Claims 8-9, 15, 19 (dependent on claim 1), and 16-17 (dependent on claim 10), further limit the scope of their allowed parent claims and are thus allowable under similar rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/Farzad Kazeminezhad/
Art Unit 2657
August 4th 2022.