DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). 
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4/17/2020 and 12/8/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 6-7, 19-20 stand rejected:
Claim 6 recites the limitation "the azimuth" in the 2nd limitation.  There is insufficient antecedent basis for this limitation in the claim.

Regarding claims 19-20, the phrase "may execute instruction …" renders the claims indefinite because it is unclear whether the limitation(s) following the word “may” are part of the claimed invention. See MPEP § 2173.05(d).



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8, 11, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vartanian et al. (US 2012/0242865), and further in view of Bailey et al. (US 2014/0129207).
Regarding claim 1, Vartanian et al. do teach a lip-language identification method (¶ 0054 lines 3-7: “detect” “lip, mouth” “movement” “for speech recognition” and used by  “device 100” to “automatically augment” according to ¶ 0045 lines 4-5), 
comprising:

performing lip-language identification based on the sequence of face images, so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in the face images (¶ 0054 last 7 lines: “Images captured by camera” (the sequence of facial images) “processed” “to determine user input” “object device 100 may use lip or tongue movement” “for inputting text” “to assist with an existing speech or voice recognition system to interpret spoken language” (to determine, i.e., the “text” (semantic information) of the “input” (speech content) corresponding to the lip or mouth movements of the “user” (object) being identified)).
Vartanian et al. do not specifically disclose:
and outputting the semantic information.
Bailey et al. do teach:
Outputting the semantic information (Abstract lines 6+: “the commencement of lip movement by one of the potential speakers and reception of the utterance” “The utterance can be converted to text” (semantic information associated also with lip movement determined) “converted text can then be displayed to the user” (“text” 
It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the methods associated with perceptions in the augmented device of Bailey et al. into the corresponding ones associated with the augmented object device of Vartanian et al., which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable a user wearing the augmented device in Vartanian et al. to determine “which of” “potential speakers the converted text should be attributed” when there are a plurality of users in the field of view of the user wearing the augmented device, as disclosed in the last sentence of the abstract of Bailey et al.

Regarding claim 2, Vartanian et al. do teach the lip-language identification method according to claim 1, wherein the performing lip-language identification based on the sequence of face images, so as to determine the semantic information of the speech content of the object to be identified corresponding to the lip actions in the face image, comprises:
sending the sequence of face images to a server, and performing, by the server, the lip-language identification so as to determine the semantic information of the speech content of the object to be identified corresponding to the lip actions in the face 

Regarding claim 3, Vartanian et al. do teach the lip-language identification method according to claim 2, further comprising: 
receiving semantic information sent by the server, 
Vartanian et al. do not specifically disclose:
receiving semantic information prior to outputting the semantic information.
Bailey et al. do teach:

For obviousness to combine Vartanian et al. and Bailey et al. see claim 1.

Regarding claim 4, Vartanian et al. do teach the lip-language identification method according to claim 1,
 wherein the semantic information is semantic text information (¶ 0054 last 7 lines: “Images captured by camera” (the sequence of facial images) “processed” “to determine user input” “object device 100 may use lip or tongue  movement” “for inputting text” (semantic information is textual) “to interpret spoken language”)
and/or semantic audio information.

Regarding claim 5, Vartanian et al. do not specifically disclose the lip-language identification method according to claim 4, wherein outputting the semantic information comprises:
displaying the semantic text information within a visual field of a user wearing an augmented reality device;

Bailey et al. do teach:
displaying the semantic text information within a visual field of a user wearing an augmented reality device (¶ 0111 last sentence: “the destination language text” (the semantic text information) “can be displayed” (displayed) “on viewing surface 148 of prism 144 and superimposed on the user's field of view” (within user’s visual field) “thereby achieving augmented reality” (in an augmented reality device) “functionality” (see Fig. 9, the texts associated with each user); as Fig. 1, top left, shows, the device is wearable like a pair of glasses).
For obviousness to combine Vartanian et al. and Bailey et al. see claim 1.

Regarding claim 8, Vartanian et al. do teach the lip-language identification method according to claim 2, further comprising saving the sequence of face images, after acquiring the sequence of face images for the object to be identified (¶ 0042 last 3 lines: “The other user” “information” (e.g., his sequence of images) “may be stored” (saved) “and accessed on storage device 110”; furthermore, ¶ 0054 lines 4+: “read lip” by “Lip, mouth, or tongue movement” when “silently speaking” “Images captured by camera” (i.e., they are saved because) “processed by” (they are processed later by a) “software” “to determine user input” (by a program; i.e. speech here is recognized not 

Regarding claim 11, Vartanian et al. do not specifically disclose the lip-language identification apparatus according to claim 10, further comprising:
an output unit, configured to output semantic information.
Bailey et al. do teach:
An output unit, configured to output semantic information (Abstract lines 6+: “the commencement of lip movement by one of the potential speakers and reception of the utterance” “The utterance can be converted to text” (semantic information associated also with lip movement determined) “converted text can then be displayed to the user” (“text” (semantic information) is outputted) “in an augmented reality environment” (to the augmented reality device (output unit))).
It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the methods associated with perceptions in the augmented device of Bailey et al. into the corresponding ones associated with the augmented object device of Vartanian et al., which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable a user wearing the augmented device in Vartanian et al. to determine “which of” “potential speakers the converted text should be

Regarding claim 15, Vartanian et al. do teach a lip-language identification apparatus, comprising:
a processor (¶ 0059 line 9: “processor”); 
and
a machine-readable storage medium, storing instructions that are executed by the processor (¶ 0059 line 5+: “The methods, processes, or flow charts provided herein may be implemented” “in a computer-readable storage medium for execution by a general purpose computer or a processor”) ;
for performing the lip-language identification method according to claim 1 (it is rejected under similar rationale as claim 1).

Regarding claim 17, Vartanian et al. do teach the augmented reality device according to claim 16, further comprising a camera device, a display device or a play device (¶ 0054 line 3 “camera in I/O devices 118” (a camera or display device); ¶ 0051 lines 1-3: “For” “augmented audio” “augmented reality” (a play device) “used to play a song associated with another user”);

Vartanian et al. do not specifically disclose:
the display device is configured to display semantic information; and
the play device is configured to play the semantic information.
Bailey et al. do teach:
the display device is configured to display the semantic text information (¶ 0111 last sentence: “the destination language text” (the semantic text information) “can be displayed” (displayed) “on viewing surface 148 of prism 144 and superimposed on the user's field of view” (within user’s visual field) “thereby achieving augmented reality” (in an augmented reality device) “functionality” (see Fig. 9 the texts associated with each user));
and
the play subunit is configured to play the semantic information (¶ 0088 lines 1+: “the destination language text” (the semantic information) “can be converted to an audio signal” (converted to audio) “and output to the user via a speaker” (and played)).


Regarding claim 18, Vartanian et al. do teach a lip-language identification method (¶ 0054 lines 3-7: “detect” “lip, mouth” “movement” “for speech recognition” and used by  “device 100” to “automatically augment” according to ¶ 0045 lines 4-5), 
comprising:
receiving a sequence of face images for an object to be identified sent by an augmented reality device (¶ 0054 lines 5+: “Lip, mouth, or tongue movement may be detected when the user is speaking with sound or silently speaking without sound” “Images captured by camera in I/O devices” (acquiring a sequences of face images of user (object) being detected by the “device 100” camera), this corresponds to step 
determining semantic information of speech content of the object to be identified corresponding to lip actions in the face images, by performing lip-language identification based on the sequence of face images (¶ 0054 last 7 lines: “Images captured by camera” (the sequence of facial images) “processed” “to determine user input” “object device 100 may use lip or tongue movement” “for inputting text” “to assist with an existing speech or voice recognition system to interpret spoken language” (to determine, i.e., the “text” (semantic information) of the “input” (speech content) corresponding to the lip or mouth movements of the “user” (object) being identified)).
Vartanian et al. do not specifically disclose:
Sending the semantic information to the augmented reality device.
Bailey et al. do teach:
Sending the semantic information to the augmented reality device (Abstract lines 6+: “the commencement of lip movement by one of the potential speakers and reception of the utterance” “The utterance can be converted to text” (semantic information associated also with lip movement determined) “converted text can then be displayed to the user” (“text” (semantic information) is outputted) “in an augmented reality environment” (to the augmented reality device)).


Regarding claim 19, Vartanian et al. in view of Bailey et al. do teach a storage medium that stores non-transitorily computer readable instructions that, when executed by a computer, the computer may execute instructions for the lip-language identification method according to claim 1 (Vartanian et al.: ¶ 0059 line 5+: “The methods, processes, or flow charts provided herein may be implemented” “in a computer-readable storage medium for execution by a general purpose computer or a processor”, and rejected under similar rationale as claim 1).

Regarding claim 20, Vartanian et al. in view of Bailey et al. do teach a storage medium that stores non-transitorily computer readable instructions that, when .

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Vartanian et al. in view of Bailey et al., and further in view of Prasad et al. (US Patent 5,680,481).
Regarding claim 6, Vartanian et al. do teach the lip-language identification method according to claim 1, wherein acquiring the sequence of face images for the object to be identified, comprises:
acquiring a sequence of images including the object to be identified (¶ 0054 lines 5+: “Lip, mouth, or tongue movement may be detected when the user is speaking with sound or silently speaking without sound” “Images captured by camera” (acquiring a sequence of images of the face of the “user” “lip, mouth” (object to be identified))).
Vartanian et al. in view of Bailey et al. do not specifically disclose:
positioning the object to be identified and acquiring the azimuth of the object to be identified; and

Prasad et al. do teach:
positioning the object to be identified and acquiring the azimuth of the object to be identified (Col. 10 lines 5+: “speakers’ head axis of symmetry is constrained to be within a small angle of the vertical” (determining an “angle” (azimuth) of a “speakers” “lips” object while he is speaking, see Fig. 3, 6, 9)); 
and determining a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and generating the sequence of face images by cropping an image of the face region of the object to be identified from each frame of the images (Col. 9 lines 47+: “The pixels belonging to ROI” (i.e., “region of interest” (see Fig. 3 e.g., mouth or lip (the object)) “may be found by defining two coordinate systems (x,y) and (x’,y’)” (the “(x,y)” (position of pixels of the object are according to the equation in Col. 9 lines 55+  determined in terms of “θ” or the “angle” (azimuth); furthermore as Figs. 3 and/or 4, 6, 9 show, this corresponds to cropping an image of the face region (object) of the speaker to be identified in a given frame)).
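
For illustration, the kind of angle-dependent region-of-interest mapping discussed above can be sketched as follows. This is a minimal sketch only, assuming a standard planar rotation by θ about a chosen center; the function names, the rectangular ROI, and the half-width/half-height parameters are hypothetical and are not taken from Prasad et al.

```python
import math

def rotate_to_head_frame(x, y, cx, cy, theta):
    """Map image pixel (x, y) to head-aligned coordinates (x', y').

    Assumption: a plain planar rotation by theta (radians) about (cx, cy).
    """
    dx, dy = x - cx, y - cy
    xp = dx * math.cos(theta) + dy * math.sin(theta)
    yp = -dx * math.sin(theta) + dy * math.cos(theta)
    return xp, yp

def crop_roi(frame, cx, cy, theta, half_w, half_h):
    """Collect the pixels of one frame (a list of rows) that fall inside
    a hypothetical rectangular ROI defined in the head-aligned frame."""
    roi = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            xp, yp = rotate_to_head_frame(x, y, cx, cy, theta)
            if abs(xp) <= half_w and abs(yp) <= half_h:
                roi.append(((x, y), value))
    return roi

def crop_sequence(frames, centers, thetas, half_w, half_h):
    """Apply the per-frame crop to a sequence of frames, one angle per frame."""
    return [
        crop_roi(frame, cx, cy, theta, half_w, half_h)
        for frame, (cx, cy), theta in zip(frames, centers, thetas)
    ]
```

In this sketch each frame carries its own angle, so the cropped sequence follows the orientation of the face from frame to frame.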


Regarding claim 7, Vartanian et al. in view of Bailey et al. do not specifically disclose the lip-language identification method according to claim 6, wherein positioning the azimuth of the object to be identified, comprises:
positioning the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking.
Prasad et al. do teach:
positioning the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking (Col. 10 lines 5+: “speakers’ head axis of symmetry is constrained to be within a small angle of the vertical” (determining the  “angle” (azimuth) of a “speaker’s” “lips” (object) while he is speaking)).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Vartanian et al. in view of Bailey et al., and further in view of YOSHIGAHARA et al. (US 2016/0078318).
Regarding claim 9, Vartanian et al. in view of Bailey et al. do not specifically disclose the lip-language identification method according to claim 8, wherein sending the sequence of face images to the server comprises:
sending the saved sequence of face images to the server upon receiving a sending instruction.
YOSHIGAHARA et al. do teach:
sending the saved sequence of face images to the server upon receiving a sending instruction (¶ 0084 lines 4+: “Transmission of the input image” (sending an image) “in response to a user instruction” (by a sending instruction) so “an object displayed on the screen be identified or tracked” “when input image is transmitted in response” “a feature dictionary is provided from the dictionary server” (to a server), where the “instruction” “is from a user via the input unit 106” according to ¶ 0080; ¶ 0005 lines 4-6: “One application of such object identification is an augmented reality (AR) application”).
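
For illustration, the claimed behavior of saving a sequence and sending it only upon a sending instruction can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, the "send" instruction string, and the callback standing in for the server connection are all hypothetical and are not drawn from the cited references or the application.

```python
from typing import Callable, List

class SavedFaceSequence:
    """Hypothetical buffer that saves face images until told to send them."""

    def __init__(self) -> None:
        self._frames: List[bytes] = []

    def save(self, frame: bytes) -> None:
        # Save each acquired face image locally.
        self._frames.append(frame)

    def send_on_instruction(
        self, instruction: str, send: Callable[[List[bytes]], None]
    ) -> bool:
        # Only a "send" instruction (an assumed value) triggers transmission.
        if instruction != "send":
            return False
        send(list(self._frames))  # pass a copy of the saved sequence
        return True
```

Here the `send` callback stands in for whatever transport delivers the images to the server; nothing is transmitted until the instruction arrives.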

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 10 and 16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Vartanian et al.
Regarding claim 10, Vartanian et al. do teach a lip-language identification apparatus (¶ 0054 lines 3-7: “detect” “lip, mouth” “movement” “for speech recognition” and used by  “device 100” to “automatically augment” according to ¶ 0045 lines 4-5), 
comprising:

a sending unit, configured to send the sequence of face images to a server (¶ 0046 lines 5+: “using well-known techniques for text recognition, character recognition, image recognition” (e.g. recognition of images comprising of lip movements above) “speech recognition” (to decipher the associated “text” (semantic information)) “processed by one or processors 102 or remotely on a server” (sent to a server and performed by the server)), 
wherein the server determines semantic information corresponding to lip actions in the face images by performing lip-language identification (¶ 0054 last 7 lines: “Images captured by camera” (the sequence of facial images) “processed” (i.e., by the “server” (¶ 0046 lines 5+)) “to determine user input” “object device 100” (or “server” (¶ 0046 lines 5+)) “may use lip or tongue movement” “for inputting text” “to assist with an existing speech or voice recognition system to interpret spoken language” (to determine, i.e., the “text” (semantic information) of the “input” (speech content) corresponding to the lip or mouth movements of the “user” (object) being identified)); and


Regarding claim 16, Vartanian et al. do teach 
an augmented reality device (title and abstract: e.g. Abstract lines 1-2: “providing augmented or mixed reality environments based on other user or third party information”), 
comprising the lip-language identification apparatus according to claim 10 (it is rejected under the same rationale as claim 10).


Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Vartanian et al. in view of Bailey et al., and further in view of Shpigelman (US 2015/0302651).

an output mode instruction generation subunit, configured to generate a display mode instruction, wherein the output mode instruction includes a display mode instruction and an audio mode instruction.
Shpigelman does teach:
an output mode instruction generation subunit, configured to generate a display mode instruction, wherein the output mode instruction includes a display mode instruction and an audio mode instruction (¶ 0042 last 4 lines and ¶ 0043 lines 1-4: “the file or streamable content represents” “visual scenes” “as well as an audio stream, for playback on the VR/AR headset” (generating visual scenes and audio playback by an augmented reality headset) “These steps may be followed by user selection of a “PLAY” button” (using the “PLAY” “button” (an output mode instruction) by a “button” (subunit) enables both display and audio playback, i.e., it functions simultaneously as a display mode instruction and an audio mode instruction)).
It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “PLAY” “button” functionality of the augmented reality (“AR”) device of Shpigelman into the augmented reality devices of Vartanian et al. in view of Bailey et al., which would enable the combined

Regarding claim 13, Vartanian et al. in view of Bailey et al. do teach the lip-language identification apparatus according to claim 12, wherein the semantic information is semantic text information and/or semantic audio information, and the output unit further comprises:
a display subunit, configured to display the semantic text information within a visual field of a user wearing an augmented reality device 
and

Vartanian et al. do not specifically disclose:
Displaying upon receiving the display mode instruction, and play upon receiving the audio mode instruction.
Shpigelman does teach:
Displaying upon receiving the display mode instruction, and play upon receiving the audio mode instruction (¶ 0042 last 4 lines and ¶ 0043 lines 1-4: “the file or streamable content represents” “visual scenes” “as well as an audio stream, for playback on the VR/AR headset” “These steps may be followed by user selection of a “PLAY” button” (using “PLAY” “button” (the output (display plus audio) mode instructions))).
For obviousness to combine Vartanian et al. in view of Bailey et al. and Shpigelman see claim 12.
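
For illustration, the mode-dependent output addressed in claims 12-13 can be sketched as follows. This is a minimal sketch only; the enum, the function name, and the returned action tuples are hypothetical stand-ins for a display subunit and a play subunit and are not taken from Shpigelman, Bailey et al., or the application.

```python
from enum import Enum
from typing import Iterable, List, Tuple

class OutputMode(Enum):
    """Hypothetical output mode instructions."""
    DISPLAY = "display"
    AUDIO = "audio"

def output_semantic_information(
    text: str, instructions: Iterable[OutputMode]
) -> List[Tuple[str, str]]:
    """Return the output actions triggered by the received mode instructions."""
    actions: List[Tuple[str, str]] = []
    for mode in instructions:
        if mode is OutputMode.DISPLAY:
            # Stand-in for showing text within the wearer's visual field.
            actions.append(("display", text))
        elif mode is OutputMode.AUDIO:
            # Stand-in for playing the semantic information as audio.
            actions.append(("play", text))
    return actions
```

A single control that issues both instructions at once, as the examiner reads the “PLAY” button, would correspond to passing both modes in one call.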

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Vartanian et al., and further in view of Prasad et al.

An image sequence acquiring subunit, configured to acquire a sequence of images for an object to be identified (¶ 0054 lines 5+: “Lip, mouth, or tongue movement may be detected when the user is speaking with sound or silently speaking without sound” “Images captured by camera” (acquiring a sequence of images of the face of the “user” “lip, mouth” (object to be identified))).
Vartanian et al. do not specifically disclose:
A positioning subunit, configured to position an azimuth of the object to be identified; and
A face image sequence generation subunit, configured to determine a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and crop an image of the face region of the object to be identified from each frame image so as to generate the sequence of face images.
Prasad et al. do teach:
A positioning subunit, configured to position an azimuth of the object to be identified (Col. 10 lines 5+: “speakers’ head axis of symmetry is constrained to be within 
and a face image sequence generation subunit, configured to determine a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and crop an image of the face region of the object to be identified from each frame image so as to generate the sequence of face images (Col. 9 lines 47+: “The pixels belonging to ROI” (i.e., “region of interest” (see Fig. 3 e.g., mouth or lip (the object)) “may be found by defining two coordinate systems (x,y) and (x’,y’)” (the “(x,y)” (position of pixels of the object are according to the equation in Col. 9 lines 55+  determined in terms of “θ” or the “angle” (azimuth) and each “θ” is associated with a given frame and therefore this angle is parameter defining a sequence of images; furthermore as Figs. 3 and/or 4, 6, 9 show, this corresponds to cropping an image of the face region (object) of the speaker to be identified in a given frame)).
It would therefore have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the mathematical methods using the pixel analysis pertaining to lip and mouth positions in Prasad et al. into the lip image analysis and processing of Vartanian et al., which would enable the combined systems and their associated methods to perform in combination as they do separately and would further enable Vartanian et al. to benefit from a more “effective speech” “recognition” .


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 10, and 18 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims (1+15), (8+15), and (1+15), respectively, of copending Application No. 16/346,815 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because:

Reference application, claim 1:
A lip language recognition method, applied to a mobile terminal having a sound mode and a silent mode, the method comprising:
training a deep neural network in the sound mode;
collecting a user's lip images in the silent mode; and
identifying content corresponding to the user’s lip images with the deep neural network trained in the sound mode.

Claim 15:
sending portion configured to encode the synthesized voice data and send the encoded synthesized voice data to a communication station wirelessly.

Instant application, claim 1:
A lip-language identification method, comprising:
acquiring a sequence of face images for an object to be identified;
performing lip-language identification based on the sequence of face images, so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in the face images; and
outputting the semantic information.

In re Karlson, 136 USPQ 184: “Omission of an element and its function in a combination where the remaining elements perform the same functions as before involves only routine skill in the art”.

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860.  The examiner can normally be reached on 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Farzad Kazeminezhad/
Art Unit 2657
September 28th 2021.