DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 11/30/2021.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement
The Information Disclosure Statements (IDS) filed on 11/30/2021 have been accepted and considered in this Office Action and are in compliance with the provisions of 37 CFR 1.97.

Response to Arguments
Applicant’s arguments, page 7, filed 11/30/2021, with respect to the rejection of claims 1 and 7 under 35 U.S.C. 102(a)(2) have been fully considered but are moot. Upon further consideration, a new ground(s) of rejection is made for independent claims 1, 8, and 15 under AIA 35 U.S.C. 103 as being unpatentable over LIPMAN (US 2020/0194028 A1), and further in view of MIYAZAKI (US 2006/0155549 A1). Dependent claims 2-5, 9-12, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over LIPMAN, and further in view of MIYAZAKI and ZHANG (US 2019/0341053 A1). Please see the rejections below for more details.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over LIPMAN (US 2020/0194028 A1), and further in view of MIYAZAKI (US 2006/0155549 A1).

REGARDING 1, LIPMAN discloses eyewear, comprising: 
a frame (LIPMAN Fig. 1 Unit 101; Par 20 – “FIG. 1 depicts glasses with display 100. Glasses with display 100 may include frame 101.”); a display supported by the frame (LIPMAN Fig. 1 Unit 109; Par 21 – “One or both of lenses 107 may include display 109 positioned to allow information to be displayed to a wearer of glasses with display 100 as further described below.”); a microphone coupled to the frame (LIPMAN Fig. 1 Units 113a-d; Par 22 – “In some embodiments, glasses with display 100 may include directional microphone array 113 adapted to receive sound from the environment surrounding a user of glasses with display 100 and may be used to determine the direction of the incoming sound. In some embodiments, directional microphone array 113 may include one or more microphones, such as microphones 113 a-d depicted in FIG. 1, which may be positioned in bridge 103 (with another microphone positioned on the opposite side).”); and an electronic processor (LIPMAN Fig. 2 – “Processor 123”) configured to: 
receive speech from a plurality of human speakers via the microphone (LIPMAN Par 28 – “In certain embodiments, such as depicted in FIG. 3B, text may appear under or in the direction of the speaker, such that when there are two or more people speaking, such as speakers 15 a and 15 b, text will appear under each such speaker on closed caption display ...”; Par 31 – “... receive audio signal 121 a from directional microphone array 113. Processor 123 may process audio signal 121 a by recognizing speech at speech recognition 201. Speech recognition 201 may analyze audio signal 121 a and determine what words were spoken, converting audio signal 121 a into text speech output 203. Text speech output 203 may be passed to closed caption display 127 of display 109.”); 
perform diarization on the received speech to segment spoken language into different speakers (LIPMAN Fig. 3B – “Hello, how can I help you today? 133a ….. What do you want to eat? 133b”; Par 31 – “Speech recognition 201 may analyze audio signal 121 a and determine what words were spoken, converting audio signal 121 a into text speech output 203. Text speech output 203 may be passed to closed caption display 127 of display 109.”; Par 33 –“ In some embodiments, direction analysis 205 may be used to determine when two or more speakers are detected and to determine which speech is detected from which speaker. This directional information 207 may be used by speech recognition 201 to refine text speech output 203 to, for example and without limitation, isolate the speech of a single speaker for display, to output multiple text speech outputs 203 for multiple users, or to otherwise affect speech recognition 201.”; Par 35 – “In some embodiments, speech recognition 201 may further analyze audio signal 121 a to determine whether a single speaker or multiple speakers are detected. Speech recognition 201 may use, for example and without limitation, frequency analysis, volume analysis, or other techniques to differentiate the speech of multiple speakers and to determine a primary or selected speaker to analyze. In some embodiments, speech recognition 201 may store the results of this analysis to, for example and without limitation, recognize speakers and, in some embodiments, identify speakers based on past analysis.”); and 
display text associated with each speaker on the display (LIPMAN Fig. 3B – “Hello, how can I help you today? 133a ….. What do you want to eat? 133b”), [wherein the displayed text associated with each speaker of the plurality of speakers has a unique attribute that is different from an attribute of the text associated with another speaker of the plurality of speakers] (LIPMAN Figs. 3-3B; Par 28 – “In certain embodiments, such as depicted in FIG. 3B, text may appear under or in the direction of the speaker, such that when there are two or more people speaking, such as speakers 15 a and 15 b, text will appear under each such speaker on closed caption display 127′, shown as text boxes 133 a and 133 b, respectively. Multiple displays of text may thus appear simultaneously if each other person has spoken.”), 
wherein the processor is configured to display the text proximate the respective speaker such that an eyewear user can visually associate the text to the respective speaker (LIPMAN Fig. 3B – “Hello, how can I help you today? 133a ….. What do you want to eat? 133b”; Par 28 – “In some embodiments, directional display 125 may indicate the direction relative to the user of glasses with display 100 from which speech is detected. In some embodiments, as shown in FIG. 3, directional display 125 may include one or more visual directional indicators such as arrow 131. In such an embodiment, arrow 131 may appear to rotate to point toward speaker 15 while speaker 15 is speaking to the user. In some embodiments, when a different speaker is speaking to the user, arrow 131 may then point to the different speaker. In other embodiments, as depicted in FIG. 3A, directional display 125 a may instead modify the positioning of one or more elements of display 109 a such as, for example and without limitation, closed caption display 127′ such that closed caption display 127′ moves to a position on display 109 a to correspond with the relative direction between the current direction of view of the user of glasses with display 100 and speaker 15. In certain embodiments, such as depicted in FIG. 3B, text may appear under or in the direction of the speaker, such that when there are two or more people speaking, such as speakers 15 a and 15 b, text will appear under each such speaker on closed caption display 127′, shown as text boxes 133 a and 133 b, respectively. Multiple displays of text may thus appear simultaneously if each other person has spoken.”), and such that the text does not substantially obscure a view of the eyewear user (LIPMAN Fig. 3B – “Hello, how can I help you today? 133a ….. What do you want to eat? 133b”; Par 28 – “In certain embodiments, such as depicted in FIG. 3B, text may appear under or in the direction of the speaker, such that when there are two or more people speaking, such as speakers 15 a and 15 b, text will appear under each such speaker on closed caption display 127′, shown as text boxes 133 a and 133 b, respectively. Multiple displays of text may thus appear simultaneously if each other person has spoken.”; Examiner reviewed the specification for the limitation, and paragraph [0100] of the specification describes the limitation in a similar manner to the method/system of LIPMAN, where textual data are displayed at the bottom portion of the display so as not to substantially obscure the view of the user.).
LIPMAN does not explicitly teach the [square-bracketed] limitations. In other words, LIPMAN teaches displaying the textual data under each speaker so that a user would recognize who speaks what, but LIPMAN does not explicitly teach using different attributes, e.g., colors and/or fonts for each speaker.

MIYAZAKI discloses a method/system for displaying textual data for speakers, comprising:
display text associated with each speaker on the display (MIYAZAKI Figs. 11A and 11B – “Really! …. Delicious!”), [wherein the displayed text associated with each speaker of the plurality of speakers has a unique attribute that is different from an attribute of the text associated with another speaker of the plurality of speakers] (MIYAZAKI Par 101 – “According to the imaging device 10 of the present embodiment, text data is converted to text image data having different appearances (font, font size, color or the like) for different speakers and synthesized, and therefore the correspondence between text and speakers can be distinguished visually more easily.”), 
wherein the processor is configured to display the text proximate the respective speaker such that an eyewear user can visually associate the text to the respective speaker (MIYAZAKI Par 99 – “FIGS. 11A and 11B illustrate examples of a synthesized image. The voice print registrants A, B and voice print non-registrant shown in FIGS. 11A and 11B correspond to those in FIG. 5. As shown in FIGS. 11A and 11B, the text image data corresponding to the voices of voice print registrants A, B are laid out based on the speaker direction information, for example, the voice of the voice print registrant B who is on the left when viewed from the imaging device 10 is laid out on the left in the image and the voice of the voice print registrant A who is in the center is laid out close to the center. On the other hand, the voice of the user recorded by the microphone M3 is laid out at such a position that does not overlap with the object, on the back or the like.”), and such that the text does not substantially obscure a view of the eyewear user (MIYAZAKI Par 100 – “Furthermore, as shown in FIG. 11A, the text image data may be embedded in an image or may be disposed in the margin of the image as shown in FIG. 11B.”; Par 99 – “As shown in FIGS. 11A and 11B, the text image data corresponding to the voices of voice print registrants A, B are laid out based on the speaker direction information, for example, the voice of the voice print registrant B who is on the left when viewed from the imaging device 10 is laid out on the left in the image and the voice of the voice print registrant A who is in the center is laid out close to the center. On the other hand, the voice of the user recorded by the microphone M3 is laid out at such a position that does not overlap with the object, on the back or the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN to include different attributes for different speakers, as taught by MIYAZAKI.
One of ordinary skill would have been motivated to include different attributes for different speakers, in order to allow a user to distinguish the correspondence between text and speakers visually more easily (MIYAZAKI Par 101).


REGARDING 8, LIPMAN in view of MIYAZAKI discloses a method for use with eyewear, the eyewear having a frame (LIPMAN Fig. 1 Unit 101; Par 20 – “FIG. 1 depicts glasses with display 100. Glasses with display 100 may include frame 101.”), a display supported by the frame (LIPMAN Fig. 1 Unit 109; Par 21 – “One or both of lenses 107 may include display 109 positioned to allow information to be displayed to a wearer of glasses with display 100 as further described below.”), a microphone coupled to the frame (LIPMAN Fig. 1 Units 113a-d; Par 22 – “In some embodiments, glasses with display 100 may include directional microphone array 113 adapted to receive sound from the environment surrounding a user of glasses with display 100 and may be used to determine the direction of the incoming sound. In some embodiments, directional microphone array 113 may include one or more microphones, such as microphones 113 a-d depicted in FIG. 1, which may be positioned in bridge 103 (with another microphone positioned on the opposite side).”), and an electronic processor (LIPMAN Fig. 2 – “Processor 123”), the processor: performing the steps of Claim 1; thus, it is rejected under the same rationale.

REGARDING 15, LIPMAN in view of MIYAZAKI discloses a non-transitory computer-readable medium storing program code which, when executed by a processor of eyewear having a frame (LIPMAN Fig. 1 Unit 101; Par 20 – “FIG. 1 depicts glasses with display 100. Glasses with display 100 may include frame 101.”), a display supported by the frame (LIPMAN Fig. 1 Unit 109; Par 21 – “One or both of lenses 107 may include display 109 positioned to allow information to be displayed to a wearer of glasses with display 100 as further described below.”), a microphone coupled to the frame (LIPMAN Fig. 1 Units 113a-d; Par 22 – “In some embodiments, glasses with display 100 may include directional microphone array 113 adapted to receive sound from the environment surrounding a user of glasses with display 100 and may be used to determine the direction of the incoming sound. In some embodiments, directional microphone array 113 may include one or more microphones, such as microphones 113 a-d depicted in FIG. 1, which may be positioned in bridge 103 (with another microphone positioned on the opposite side).”), is operative to cause the processor (LIPMAN Fig. 2 – “Processor 123”) to perform the steps of Claim 1; thus, it is rejected under the same rationale.


Claims 2-5, 9-12, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over LIPMAN (US 2020/0194028 A1), and further in view of MIYAZAKI (US 2006/0155549 A1) and ZHANG (US 2019/0341053 A1).

REGARDING 2, LIPMAN in view of MIYAZAKI discloses the eyewear of claim 1.
LIPMAN in view of MIYAZAKI does not explicitly teach using a CNN.
ZHANG discloses a method/system for analyzing utterances of multiple speakers for diarization, wherein the processor is configured to use a convolutional neural network (CNN) to perform the diarization (ZHANG Par 39 – “Diarization machine 132 is a sensor fusion machine configured to use the various received signals to associate recorded speech with the appropriate speaker. The diarization machine is configured to attribute information encoded in the beamformed signal or another audio signal to the human responsible for generating the corresponding sounds/speech. In some implementations (e.g., FIG. 11), the diarization machine is configured to attribute the actual audio signal to the corresponding speaker (e.g., label the audio signal with the speaker identity). In some implementations (e.g., FIG. 12), the diarization machine is configured to attribute speech-recognized text to the corresponding speaker (e.g., label the text with the speaker identity).”; Par 45 – “Speech recognition machine 130 may employ any suitable combination of state-of-the-art and/or future natural language processing (NLP), AI, and/or ML techniques. Non-limiting examples of techniques that may be incorporated in an implementation of speech recognition machine 130 include support vector machines, multi-layer neural networks, convolutional neural networks (e.g., including temporal convolutional neural networks for processing natural language sentences), word embedding models (e.g., GloVe or Word2Vec), recurrent neural networks, associative memories, unsupervised spatial and/or clustering methods, graphical models, and/or natural language processing techniques (e.g., tokenization, stemming, constituency and/or dependency parsing, and/or intent recognition).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN in view of MIYAZAKI to include a CNN for diarization, as taught by ZHANG.
One of ordinary skill would have been motivated to include a CNN for diarization, in order to improve machine learning performance (ZHANG Pars 46-47).


REGARDING 3, LIPMAN in view of MIYAZAKI and ZHANG discloses the eyewear of claim 2.
MIYAZAKI further discloses, wherein the attribute is a color (MIYAZAKI Par 101 – “According to the imaging device 10 of the present embodiment, text data is converted to text image data having different appearances (font, font size, color or the like) for different speakers and synthesized, and therefore the correspondence between text and speakers can be distinguished visually more easily.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN to include different colors for different speakers, as taught by MIYAZAKI.
One of ordinary skill would have been motivated to include different colors for different speakers, in order to allow a user to distinguish the correspondence between text and speakers visually more easily (MIYAZAKI Par 101).



REGARDING 4, LIPMAN in view of MIYAZAKI and ZHANG discloses the eyewear of claim 2. MIYAZAKI further discloses, wherein the attribute is a font size (MIYAZAKI Par 101 – “According to the imaging device 10 of the present embodiment, text data is converted to text image data having different appearances (font, font size, color or the like) for different speakers and synthesized, and therefore the correspondence between text and speakers can be distinguished visually more easily.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN to include different font sizes for different speakers, as taught by MIYAZAKI.
One of ordinary skill would have been motivated to include different font sizes for different speakers, in order to allow a user to distinguish the correspondence between text and speakers visually more easily (MIYAZAKI Par 101).


REGARDING 5, LIPMAN in view of MIYAZAKI and ZHANG discloses the eyewear of claim 2. MIYAZAKI further discloses, wherein the attribute is a font style (MIYAZAKI Par 101 – “According to the imaging device 10 of the present embodiment, text data is converted to text image data having different appearances (font, font size, color or the like) for different speakers and synthesized, and therefore the correspondence between text and speakers can be distinguished visually more easily.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN to include different fonts for different speakers, as taught by MIYAZAKI.
One of ordinary skill would have been motivated to include different fonts for different speakers, in order to allow a user to distinguish the correspondence between text and speakers visually more easily (MIYAZAKI Par 101).


Claim 9 is a method claim similar to Claim 2; thus, it is rejected under the same rationale.

Claim 10 is a method claim similar to Claim 3; thus, it is rejected under the same rationale.

Claim 11 is a method claim similar to Claim 4; thus, it is rejected under the same rationale.

Claim 12 is a method claim similar to Claim 5; thus, it is rejected under the same rationale.

Claim 16 is a non-transitory computer readable medium claim similar to Claim 2; thus, it is rejected under the same rationale.

Claim 17 is a non-transitory computer readable medium claim similar to Claim 3; thus, it is rejected under the same rationale.

REGARDING 18, LIPMAN in view of MIYAZAKI and ZHANG discloses the non-transitory computer readable medium as specified in claim 16.
MIYAZAKI further discloses, wherein the attribute is a font size or a font type (MIYAZAKI Par 101 – “According to the imaging device 10 of the present embodiment, text data is converted to text image data having different appearances (font, font size, color or the like) for different speakers and synthesized, and therefore the correspondence between text and speakers can be distinguished visually more easily.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of LIPMAN to include different fonts for different speakers, as taught by MIYAZAKI.
One of ordinary skill would have been motivated to include different fonts for different speakers, in order to allow a user to distinguish the correspondence between text and speakers visually more easily (MIYAZAKI Par 101).


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday 8:00 AM thru 4:00 PM EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/
Primary Examiner, Art Unit 2655