DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/8/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims
Claims 1-20 are pending in this application.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 12 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “real time” in claims 2, 12 and 17 is a relative term which renders the claim indefinite. The term “real time” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Specifically, there is no commonly accepted definition for the lower bound of the speed of real-time processing (i.e., at what specific speed or time duration processing transitions from real-time to non-real-time).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3, 5, 7-11, 13, 15-16, 18 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (U.S. Patent Application Publication 2020/0219517).
As per claims 1, 11 and 16, Wang et al. discloses:
A computer-implemented method (Figures 1A & 5) comprising: 
obtaining, by a computing system, a stream of audio waveform data that represents speech involving a plurality of speakers (Figure 1A, items 120 & 10a-10n and Paragraphs [0025-0032] – multiple speakers are involved in a conversation); 
as the stream of audio waveform data is obtained, determining, by the computing system, a plurality of audio chunks, wherein an audio chunk is associated with one or more identity embeddings (Figure 1A, items 210, 220, 230 & 240 and Paragraphs [0029-0030] – the audio stream is chunked into fixed length segments each associated with a speaker identity embedding); 
segmenting, by the computing system, the stream of audio waveform data into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks, wherein a segment can be associated with a speaker included in the plurality of speakers (Figure 1A, items 260 & 280 and Paragraphs [0031-0032] – each chunk then has a speaker identity assigned to it); and 
providing, by the computing system, information describing the plurality of segments associated with the stream of audio waveform data (Figure 1A, items 260 & 280 and Paragraphs [0031-0032] – the diarization information for the audio is output).
Claim 11 is directed to a computer system for implementing the method of claim 1 and is therefore rejected for similar reasons. See Figure 6 and Paragraphs [0059-0069] for details of the hardware and software used to implement the invention.
Claim 16 is directed to a computer readable medium containing instructions to cause a processor to execute the method of claim 1 and is therefore rejected for similar reasons. See Figure 6 and Paragraphs [0059-0069] for details of the hardware and software used to implement the invention.
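
For illustration only, the chunk-then-segment flow recited in claim 1 can be sketched as follows. This sketch is not taken from Wang et al. or from the application: the names chunk_bounds, merge_chunks and Segment are hypothetical, the per-chunk speaker labels are assumed to have already been derived from the identity embeddings, and merging consecutive same-speaker chunks is only one plausible way to form segments.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    speaker: str   # hypothetical speaker label
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds


def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_seconds: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for fixed-length chunks.

    The final chunk may be shorter if the stream length is not a multiple
    of the chunk size.
    """
    size = int(sample_rate * chunk_seconds)
    return [(i, min(i + size, num_samples))
            for i in range(0, num_samples, size)]


def merge_chunks(labels: Sequence[str], chunk_seconds: float) -> List[Segment]:
    """Merge consecutive chunks that share a speaker label into segments."""
    segments: List[Segment] = []
    for idx, speaker in enumerate(labels):
        start = idx * chunk_seconds
        end = start + chunk_seconds
        if segments and segments[-1].speaker == speaker:
            segments[-1].end = end          # extend the current segment
        else:
            segments.append(Segment(speaker, start, end))
    return segments
```

Under these assumptions, per-chunk labels A, A, B, A with 0.5-second chunks yield three segments: A from 0.0 to 1.0, B from 1.0 to 1.5, and A from 1.5 to 2.0.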

As per claims 3, 13 and 18, Wang et al. discloses all of the limitations of claims 1, 11 and 16 above. Wang et al. further discloses:
each audio chunk in the plurality of audio chunks corresponds to a fixed length of time (Paragraphs [0029-0032] – the chunks are fixed length).

As per claims 5, 15 and 20, Wang et al. discloses all of the limitations of claims 1, 11 and 16 above. Wang et al. further discloses:
determining, by the computing system, that a first audio chunk matches a second audio chunk associated with a speaker included in a speaker inventory; and assigning, by the computing system, the first audio chunk to the speaker included in the speaker inventory (Figure 1A, item 280 and Paragraphs [0031-0032] - Multiple chunks can be assigned to the same speaker).

As per claim 7, Wang et al. discloses all of the limitations of claim 5 above. Wang et al. further discloses:
the speaker inventory maintains associations between speakers identified in the stream of audio waveform data, audio chunks, and identity embeddings (Figure 1A, item 280 and Paragraphs [0031-0032] - Multiple chunks can be assigned to the same speaker and the diarization tracks speaker to chunk associations).

As per claim 8, Wang et al. discloses all of the limitations of claim 5 above. Wang et al. further discloses:
the speaker inventory is refreshed at regular time intervals to reconcile a first speaker in the speaker inventory and a second speaker in the speaker inventory as a same speaker (Figure 1A, item 280 and Paragraphs [0031-0032] - Multiple chunks can be assigned to the same speaker and the diarization tracks speaker to chunk associations. Since the speaker inventory can be construed as the speakers that have been assigned to a chunk in the diarization output, and that output is refreshed as each fixed-length chunk comes in (i.e., on a regular time interval), the limitation is met).

As per claim 9, Wang et al. discloses all of the limitations of claim 5 above. Wang et al. further discloses:
determining, by the computing system, that an audio chunk does not match any audio chunks associated with speakers included in a speaker inventory; and updating, by the computing system, the speaker inventory to include a new speaker associated with the audio chunk (Figure 1A, item 280 and Paragraphs [0031-0032] - Multiple chunks can be assigned to the same speaker and the diarization tracks speaker to chunk associations. Since the speaker inventory can be construed as the speakers that have been assigned to a chunk in the diarization output, the limitation is met).
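
For illustration only, the match-or-add behavior of a speaker inventory as recited in claims 5 and 9 can be sketched as follows. This sketch is not taken from Wang et al. or from the application: the SpeakerInventory class, the use of cosine similarity, the 0.8 threshold, and the speaker_N naming scheme are all assumptions chosen to make the example concrete.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SpeakerInventory:
    """Hypothetical inventory mapping speaker IDs to their chunk embeddings."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed similarity cutoff for a match
        self.speakers: Dict[str, List[List[float]]] = {}

    def assign(self, embedding: Sequence[float]) -> str:
        """Assign a chunk embedding to the best-matching speaker.

        If no stored embedding is similar enough, a new speaker is added.
        """
        best_id, best_score = None, self.threshold
        for speaker_id, stored in self.speakers.items():
            score = max(cosine(embedding, e) for e in stored)
            if score >= best_score:
                best_id, best_score = speaker_id, score
        if best_id is None:
            best_id = f"speaker_{len(self.speakers) + 1}"
            self.speakers[best_id] = []
        self.speakers[best_id].append(list(embedding))
        return best_id
```

Under these assumptions, a chunk whose embedding is close to a stored embedding is assigned to the existing speaker, while a dissimilar chunk causes a new speaker entry to be created.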

As per claim 10, Wang et al. discloses all of the limitations of claim 1 above. Wang et al. further discloses:
the information describing the plurality of segments provides labels for the plurality of segments, and wherein a label can indicate that a segment represents a particular speaker (Figure 1A, item 280 and Paragraphs [0031-0032] – each chunk is labeled as being spoken by a particular speaker).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (U.S. Patent Application Publication 2020/0219517) in view of Gao (U.S. Patent Application Publication 2021/0020161).
As per claims 2, 12 and 17, Wang discloses all of the limitations of claims 1, 11 and 16 above. Wang fails to disclose, but Gao in the same field of endeavor teaches:
the segmenting is performed in real-time based on a computational graph (Figures 17 & 18 and Paragraphs [0183-0206] – the speaker vector functions are embodied as computational graphs).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method, system and computer readable medium of Wang with the computational graphing techniques of Gao because it is a case of combining prior art elements according to known methods to yield predictable results.

Claims 4, 6, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (U.S. Patent Application Publication 2020/0219517) in view of Ramasubramanian (U.S. Patent 10,706,857).
As per claims 4, 14 and 19, Wang discloses all of the limitations of claims 1, 11 and 16 above. Wang fails to disclose, but Ramasubramanian in the same field of endeavor teaches:
the one or more identity embeddings associated with the audio chunk are generated by a temporal convolutional network that pre-processes the audio chunk and outputs the one or more identity embeddings (Figure 1, items 2-6 and Column 11, line 54 – Column 12, line 16 – a temporal CNN processes the audio to generate identity embeddings).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method, system and computer readable medium of Wang with the temporal CNN of Ramasubramanian because it is a case of combining prior art elements according to known methods to yield predictable results.

As per claim 6, Wang discloses all of the limitations of claim 5 above. Wang further discloses:
a network evaluates at least one identity embedding associated with the first audio chunk and at least one identity embedding associated with the second audio chunk to determine whether the first audio chunk matches the second audio chunk (Figure 1A, item 280 and Paragraphs [0031-0032] - Multiple chunks can be assigned to the same speaker and the diarization tracks speaker to chunk associations).
Wang fails to disclose, but Ramasubramanian in the same field of endeavor teaches:
the network is a temporal convolutional network (Figure 1, items 2-6 and Column 11, line 54 – Column 12, line 16 – a temporal CNN processes the audio to generate identity embeddings).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method, system and computer readable medium of Wang with the temporal CNN of Ramasubramanian because it is a case of simple substitution of one known element for another to obtain predictable results.

Examiner Notes
The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.
Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so:
“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.”
Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at (571) 272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EDWIN S LELAND III/
Primary Examiner, Art Unit 2677