Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 7, 13 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent claims 1, 7, 13 and 19 recite a processor/system/method/machine-readable medium, comprising: one or more circuits to indicate an end of one or more speech segments based, at least in part, on one or more characters predicted to be within the one or more speech segments.
The limitations of “indicate” and “predict”, as drafted, cover the organizing of human activity, for example where two people are talking and a third person writing down the conversation predicts the completion of a sentence based on long pauses between utterances, which indicate that the sentence is complete.
This judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional element of a “processor”, which is a form of generic computer equipment. The as-filed Specification states: “[0059] In at least one embodiment, inference and/or training logic 615 may include, without limitation, code and/or data storage 601 to store forward and/or output weight and/or input/output data, and/or other parameters to configure neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, 
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than the use of a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
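(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.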

Claims 1, 7, 13 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Takahashi (US 20200213457 A1).
With respect to claims 1, 7, 13 and 19, Takahashi teaches a processor/system/method/machine-readable medium ([0187] Embodiment(s) can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s).), comprising: one or more circuits to indicate an end of one or more speech segments based, at least in part, on one or more characters predicted to be within the one or more speech segments ([0090] The determination of whether the speech has ended may be made on the basis of a predetermined word spoken by the user 106, not on the basis of the length of time during which no speech takes place (hereinafter referred to as “blank period”). For example, if a predetermined word, such as “Yes”, “No”, “OK”, “Cancel”, “Finish”, “Start”, or “Begin”, is received, the speech-end determining unit 608 may determine that the speech has ended, without waiting for a predetermined length of time. The determination of the speech end may be made by the server 102, instead of the audio control apparatus 100. The end of the speech may be determined from the meaning and context of the speech made by the user 106).
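For illustration only, the word-based speech-end determination quoted from Takahashi [0090] can be sketched as below. This is a minimal sketch, not Takahashi's implementation: the function name `speech_has_ended`, the 2.0 second blank-period limit, and the case-insensitive word matching are all assumptions introduced here.

```python
# Predetermined words listed in the quoted passage of Takahashi [0090].
END_WORDS = {"yes", "no", "ok", "cancel", "finish", "start", "begin"}


def speech_has_ended(last_word, blank_period_s, max_blank_s=2.0):
    """Return True if the speech is judged to have ended.

    last_word      -- most recently recognized word, or None (assumed input)
    blank_period_s -- seconds during which no speech has taken place
    max_blank_s    -- hypothetical blank-period limit, not from the reference
    """
    # A predetermined word ends the speech without waiting for the blank period.
    if last_word is not None and last_word.strip().lower() in END_WORDS:
        return True
    # Otherwise fall back to the length of the blank period.
    return blank_period_s >= max_blank_s
```

Under these assumptions, `speech_has_ended("Cancel", 0.0)` is true immediately, while an ordinary word only ends the speech once the blank period reaches the limit.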

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2, 3, 8, 9, 14, 15, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Takahashi as applied to claims 1, 2, 7, 13, 14, 19 and 20, respectively, and further in view of Zhou (US 20200005765 A1).

With respect to claims 2, 8, 14 and 20, Takahashi fails to explicitly disclose, but Zhou teaches, wherein the one or more circuits are further to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments ([0047] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t:).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi in view of Zhou, in order to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments, so as to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
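For illustration only, the softmax normalization described in the quoted Zhou passage (one output unit per label plus a "blank" unit, normalized at each time step) can be sketched as below. This is a sketch of the standard softmax, not code from Zhou; the function names and the convention that each frame is a plain list of raw scores are assumptions.

```python
import math


def softmax(scores):
    """Normalize one frame of raw output scores into probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ctc_label_probabilities(score_frames):
    """Apply softmax independently at each time step t.

    score_frames[t][k] is the raw score of label k (including the blank)
    at time t; the result holds the probability of emitting label k at t.
    """
    return [softmax(frame) for frame in score_frames]
```

Each returned frame sums to one, so entry k of frame t can be read as the probability of emitting the label or blank with index k at time t.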

With respect to claims 3, 9, 15 and 21, Takahashi fails to explicitly disclose, but Zhou teaches, wherein the one or more circuits are further to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps ([0047] For the connectionist temporal classification (CTC), consider an entire neural network to be simply a function that takes in some input sequence of length T and outputs some output sequence y also of length T [generated string of characters], and [0048] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t, and [0058] Training with the defined objective is efficient, since both sampling and greedy decoding are cheap).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi in view of Zhou, in order to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps, so as to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
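For illustration only, greedy decoding of per-time-step label probabilities can be sketched as below: the most probable label is chosen at each time step, giving a string of length T, and the usual CTC collapse then merges repeats and drops blanks. The label alphabet, the `_` blank symbol, and the function names are assumptions made for this sketch and are not taken from Zhou or from the application.

```python
BLANK = "_"  # hypothetical symbol standing in for the CTC blank


def greedy_path(prob_frames, labels):
    """Pick the most probable label at every time step (one char per step)."""
    chars = []
    for frame in prob_frames:
        best = max(range(len(frame)), key=frame.__getitem__)
        chars.append(labels[best])
    return "".join(chars)


def collapse(path, blank=BLANK):
    """Standard CTC collapse: merge repeated characters, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


labels = [BLANK, "h", "i"]
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
path = greedy_path(frames, labels)  # "hh_ii"
text = collapse(path)               # "hi"
```

The per-time-step string (`path`) keeps the blanks, which is the form a later blank-counting step would need; `text` is the collapsed transcript.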

Claims 6, 12, 18 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Takahashi and further in view of Gorny (US 20210020181 A1).
With respect to claims 6, 12, 18 and 24, Takahashi fails to explicitly disclose, but Gorny teaches, wherein transcripts for the one or more speech segments are to be provided as input for one or more voice-controllable devices ([0042] According to embodiments, transcription module 206 accesses local device audio data 214 and transcribes the audio data stored in local device audio data 214 into a local device text transcript and [0006] In embodiments of the disclosed subject matter, the computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript. The computer transmits the master audio transcript to each of the two or more communication devices.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi in view of Gorny, in order to generate transcripts for the one or more speech segments, so as to provide a consensus mechanism for merging a plurality of local text transcripts, which may comprise differing text transcriptions of the same audio data communications between communication devices, into a single master text transcript ([0018], Gorny).


Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Takahashi and further in view of Srinivasan (US 20210133577 A1).
With respect to claim 25, Takahashi teaches a voice transcription system comprising: one or more circuits to indicate an end of one or more speech segments based, at least in part, on one or more characters predicted to be within the one or more speech segments ([0090] The determination of whether the speech has ended may be made on the basis of a predetermined word spoken by the user 106, not on the basis of the length of time during which no speech takes place (hereinafter referred to as “blank period”). For example, if a predetermined word, such as “Yes”, “No”, “OK”, “Cancel”, “Finish”, “Start”, or “Begin”, is received, the speech-end determining unit 608 may determine that the speech has ended, without waiting for a predetermined length of time. The determination of the speech end may be made by the server 102, instead of the audio control apparatus 100. The end of the speech may be determined from the meaning and context of the speech made by the user 106);
Takahashi fails to explicitly disclose, but Srinivasan teaches, memory for storing network parameters for the one or more neural networks ([0095] At process block 1110, parameters, such as weights and biases, of the neural network can be initialized. As one example, the weights and biases can be initialized to random normal-precision floating-point values. As another example, the weights and biases can be initialized to normal-precision floating-point values that were calculated from an earlier training set. The initial parameters can be stored in a memory or storage of the machine learning system. In one example, the parameters can be stored as quantized floating-point values which can reduce an amount of storage used for storing the initial parameters.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi in view of Srinivasan, in order to store network parameters for the .

Claims 26 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Takahashi and Srinivasan as applied to claim 25, and further in view of Zhou (US 20200005765 A1).

With respect to claim 26, Takahashi and Srinivasan fail to explicitly disclose, but Zhou teaches, wherein the one or more circuits are further to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments ([0047] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t:).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi and Srinivasan in view of Zhou, in order to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments, so as to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).

With respect to claim 27, Takahashi and Srinivasan fail to explicitly disclose, but Zhou teaches, wherein the one or more circuits are further to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps ([0047] For the connectionist temporal classification (CTC), consider an entire neural network to be simply a function that takes in some input sequence of length T and outputs some output sequence y also of length T [generated string of characters], and [0048] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t, and [0058] Training with the defined objective is efficient, since both sampling and greedy decoding are cheap).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi and Srinivasan in view of Zhou, in order to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps, so as to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Takahashi and Srinivasan as applied to claim 25, and further in view of Gorny (US 20210020181 A1).
With respect to claim 30, Takahashi and Srinivasan fail to explicitly disclose, but Gorny teaches, wherein transcripts for the one or more speech segments are to be provided as input for one or more voice-controllable devices ([0042] According to embodiments, transcription module 206 accesses local device audio data 214 and transcribes the audio data stored in local device audio data 214 into a local device text transcript and [0006] In embodiments of the disclosed subject matter, the computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript. The computer transmits the master audio transcript to each of the two or more communication devices.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Takahashi and Srinivasan in view of Gorny, in order to generate transcripts for the one or more speech segments, so as to provide a consensus mechanism for merging a plurality of local text transcripts, which may comprise differing text transcriptions of the same audio data communications between communication devices, into a single master text transcript ([0018], Gorny).

Allowable Subject Matter
Claims 4, 10, 16, 22 and 28 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims.
Claims 4, 10, 16, 22 and 28 recite “wherein the one or more circuits are further to analyze the string of characters using a sliding window of a specified length, wherein the end of the one or more speech segments is determined in response to a percentage of blank characters contained within the sliding window being determined to satisfy an end of speech threshold.” The closest teachings come from Wada (US 20200160871 A1), who teaches “([0129] The disclosure is not limited to the embodiment and the modification examples described above, and for example, when the speech speed 
Claim 5 depends on claim 4, claim 11 on claim 10, claim 17 on claim 16, claim 23 on claim 22, and claim 29 on claim 28. These claims are allowed over the prior art of record by virtue of their dependencies.
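For illustration only, the sliding-window limitation recited in claims 4, 10, 16, 22 and 28 can be sketched as below. This is a sketch under stated assumptions, not the applicant's implementation: the window length of 5, the 0.8 threshold, the `_` blank symbol, the reading of "satisfy" as "greater than or equal to", and the function name are all hypothetical.

```python
def end_of_speech_index(chars, window_len=5, threshold=0.8, blank="_"):
    """Scan a per-time-step character string with a sliding window.

    Returns the index of the last time step of the first window whose
    fraction of blank characters meets the threshold, or None if no
    window does (including when the string is shorter than the window).
    """
    for end in range(window_len, len(chars) + 1):
        window = chars[end - window_len:end]
        if window.count(blank) / window_len >= threshold:
            return end - 1
    return None


# "hh_ii" followed by five blank time steps: the window "i____" ending at
# index 8 is the first to contain 4 of 5 blanks.
index = end_of_speech_index("hh_ii_____")
```

With these assumed parameters the end of speech is flagged at time step 8; a shorter window or lower threshold would flag it earlier, at the cost of more false triggers on ordinary pauses.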


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Gejji (US 11056098 B1) teaches (Col. 17, ll. 26-39): “An endpoint to the speech can be identified when the silent count exceeds a silence threshold, such as 5 for example, and when at least two other factors are present: (1) a state score is above a state score threshold and (2) the current state has a positive final mark. The silence threshold can be indicative of a predetermined amount of time that differentiates between a pause in speech and the end of speech. Taking into consideration these other factors when determining and endpoint in speech can enable a more robust and reliable determination 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA, whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657     

/HUYEN X VO/Primary Examiner, Art Unit 2656