Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
In the Office action dated Feb 15, 2022, claims 4, 10, 16, 22 and 28 were listed as Allowable Subject Matter. The amendments to claims 1, 7, 13, 19 and 25 require a new reference, and that same reference, together with an existing reference, is now also used to reject the previously objected-to claims 4, 5, 10, 11, 16, 17, 22, 23, 28 and 29. This Office action therefore rescinds the Allowable Subject Matter in relation to those claims.  Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, this action is made final.  See MPEP § 706.07(a).



Rejections Under 35 USC 101
The applicant has amended independent claims 1, 7, 13, 19 and 25. The examiner maintains the 35 USC 101 rejections. Please see the 35 USC 101 section below.

On page 11 the applicant states under the Prong One Step 2A/2B:

    [Image: media_image1.png]

In the 101 section below, which uses the revised claim language, the claims are found under Step 2A, Prong 1 to describe certain methods of human activity. In the current Office Action, the examiner has stated:

    [Image: media_image2.png]

The passage demonstrates an example of human organizing of activities.
With regard to whether the judicial exception is integrated into a practical application, the limitation “a proportion of one or more non-speech characters” does not describe an improvement in technology but a particular method of human activity that a human can perform.

Rejections Under 35 USC 102
The applicant has amended independent claims 1, 7, 13, 19 and 25. The examiner maintains the 35 USC 102 rejections using a new reference. Please see the 35 USC 102 section below.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 7, 13 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

The independent claims 1, 7, 13 and 19 recite a processor/system/method/machine-readable medium, comprising: one or more circuits to indicate an end of one or more speech segments based, at least in part, on a proportion of one or more non-speech characters within the one or more speech segments.
The limitation “indicate”, as drafted, covers a human organizing of activities in which two people are talking and a third person, who is writing down the conversation, predicts the end of a sentence by counting over four five-second intervals whether 3 out of 4 of those intervals consist only of the special word “pause” (the specification at [0049] specifies “…special characters such as blanks to represent time steps or audio frames in which no other character is detected”). On such a determination, the third person writes down <EOS> to indicate the end-of-sentence word/character.
This judicial exception is not integrated into a practical application. In particular, claim 1 recites the additional element of a “processor”, which is a form of generic computer equipment. The as-filed specification states: “[0059] In at least one embodiment, inference and/or training logic 615 may include, without limitation, code and/or data storage 601 to store forward and/or output weight and/or input/output data, and/or other parameters to configure neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, training logic 615 may include, or be coupled to code and/or data storage 601 to store graph code or other software to control timing and/or order, in which weight and/or other parameter information is to be loaded to configure, logic, including integer and/or floating point units (collectively, arithmetic logic units (ALUs).” The element “processor” is a general-purpose computer device.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is noted to be a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
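(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.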

Claims 1, 7, 13 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Abrash (US 20060241948 A1).
With respect to claims 1, 7, 13 and 19, Abrash teaches a processor/system/method/machine-readable medium ([0053] Therefore, in one embodiment, a general purpose computing device 500 comprises a processor 502, a memory 504, a speech endpointer or module 505…[0054] Thus, in one embodiment, the speech endpointer 505 for endpointing audio signals described herein with reference to the preceding Figures can be stored on a computer readable medium or carrier (e.g., RAM, magnetic or optical drive or diskette, and the like).), comprising: one or more circuits to indicate an end of one or more speech segments based, at least in part, on a proportion of one or more non-speech characters within the one or more speech segments (Claim 55. The computer readable medium of claim 51, wherein said step of locating a second speech endpoint comprises: counting a number of frames of said audio signal for which a most likely word in a pre-defined quantity of preceding frames is silence [non-speech character]; determining whether said number of frames exceeds a second pre-defined threshold; [counting silence words in frames that exceed a threshold equates to proportion of non-speech characters]).
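For illustration only, the count-and-compare logic described in the cited claim language can be sketched in Python as follows. This is a minimal sketch and is not taken from Abrash: the function name, the `<sil>` label, and the `lookback` and `threshold` parameters are assumptions introduced here, and the per-frame labels are assumed to have already been produced by some recognizer.

```python
def silence_frames_exceed(frame_labels, lookback, threshold, silence="<sil>"):
    """Count silence-labelled frames among the last `lookback` frames.

    Returns (count, exceeded), where `exceeded` is True when the count
    is strictly greater than `threshold`.
    All names and the "<sil>" label are hypothetical, for illustration.
    """
    if lookback <= 0:
        raise ValueError("lookback must be positive")
    recent = frame_labels[-lookback:]            # the preceding frames considered
    count = sum(1 for label in recent if label == silence)
    return count, count > threshold
```

Dividing the count by `lookback` would give the same test expressed as a proportion of silence frames rather than as a raw count.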

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2, 3, 4, 5, 8, 9, 10, 11, 14, 15, 16, 17, 20, 21, 22, 23, 28, 29 are rejected under 35 U.S.C. 103 as being unpatentable over Abrash as applied to claims 1, 2, 7, 13, 14, 19 and 20, respectively, in further view of Zhou (US 20200005765 A1).
Zhou was used in the previous Office Action.

With respect to claims 2, 8, 14 and 20, Abrash fails to explicitly disclose, however, Zhou teaches wherein the one or more circuits are further to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments ([0047] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t :).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Zhou, in order to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments, to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
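The softmax normalization described in the quoted passage (output vectors normalized per time step and read as label-or-blank probabilities) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not Zhou's code: the function names, the raw score values, and the three-label alphabet with `_` as the blank are all hypothetical.

```python
import math


def softmax(scores):
    """Normalize one time step's raw scores into probabilities."""
    peak = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_probabilities(raw_outputs, labels):
    """Map each time step's raw output vector to {label: probability}.

    `raw_outputs` holds one list of scores per time step, with one score
    per entry in `labels` (hypothetical alphabet; "_" stands for blank).
    """
    return [dict(zip(labels, softmax(row))) for row in raw_outputs]
```

Each returned dictionary sums to 1 and gives, for one time step, the probability of emitting each label or the blank.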

With respect to claims 3, 9, 15 and 21, Abrash fails to explicitly disclose, however, Zhou teaches wherein the one or more circuits are further to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps ([0047] For the connectionist temporal classification (CTC), consider an entire neural network to be simply a function that takes in some input sequence of length T and outputs some output sequence y also of length T [generated string of characters], and [0048] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t, and [0058] Training with the defined objective is efficient, since both sampling and greedy decoding are cheap).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Zhou, in order to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps, to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
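As a generic illustration of greedy decoding over per-time-step probabilities, the following Python sketch picks the most probable label at each time step and yields a string of the same length as the input sequence. It is a minimal sketch under stated assumptions, not code from either reference: the function name, the label alphabet, and the probability values used in the usage note are hypothetical.

```python
def greedy_decode(frame_probs, labels):
    """Return one character per time step: the label with the highest probability.

    `frame_probs` is a list of probability rows (one per time step), each
    aligned with `labels`. Ties resolve to the earliest label in `labels`.
    """
    chars = []
    for row in frame_probs:
        best = max(range(len(row)), key=row.__getitem__)   # index of the largest probability
        chars.append(labels[best])
    return "".join(chars)
```

With labels `["_", "H", "I"]` and four time steps whose largest entries fall on `H`, `H`, `_`, `I`, the function returns a four-character string with one character per time step.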

With respect to claims 4, 10, 16, 22 and 28, Abrash further teaches further to analyze the string of characters [[using a sliding window of a specified length]], wherein the end of the one or more speech segments is determined in response to a percentage of blank characters contained [[within the sliding window]] being determined to satisfy an end of speech threshold (Claim 55. The computer readable medium of claim 51, wherein said step of locating a second speech endpoint comprises: counting a number of frames of said audio signal for which a most likely word in a pre-defined quantity of preceding frames is silence [non-speech character]; determining whether said number of frames exceeds a second pre-defined threshold; [counting silence words in frames that exceed a threshold equates to proportion of non-speech characters]).
Abrash does not explicitly disclose but Zhou teaches using a sliding window of specified length ([0043] FIG. 2 shows a block diagram for preprocessor 148 which includes spectrogram generator 225 which takes as input, sampled speech audio wave 252 and computes, for each speech input, a spectrogram with a sliding 20 ms window and 10 ms step size.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Zhou, in order to use a sliding window of specified length to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
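The claim limitation discussed above (a sliding window over a character string, with end of speech determined when the percentage of blanks in the window satisfies a threshold) can be illustrated with the Python sketch below. It is one possible reading for illustration only and is not drawn from the application or from either reference: the function name, the `_` blank symbol, and the choice of "greater than or equal to" for satisfying the threshold are assumptions.

```python
def find_end_of_speech(chars, window, threshold, blank="_"):
    """Return the index of the first time step at which the trailing window
    of `window` characters has a blank fraction >= `threshold`, else None.

    `chars` is a per-time-step character string (hypothetical input);
    `threshold` is a fraction between 0 and 1.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    blanks = 0
    for i, ch in enumerate(chars):
        if ch == blank:
            blanks += 1                          # character entering the window
        if i >= window and chars[i - window] == blank:
            blanks -= 1                          # character leaving the window
        if i + 1 >= window and blanks / window >= threshold:
            return i
    return None
```

The running count avoids rescanning the window at every step, so the check costs constant work per time step.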

With respect to claims 5, 11, 17, 23 and 29, Abrash does not explicitly disclose but Zhou teaches further to analyze the string wherein the probabilities for each of the one or more characters are decoded up to the end of the one or more speech segments in order to generate one or more text transcripts of the one or more speech segments ([0053] FIG. 4 shows an example whole transcription sampled [decoded] by e sampling module 125 from softmax probabilities generated by the RNN 352 after processing a speech sample annotated with a “HALO” transcription. The illustrated example would use CER as the evaluation metric. Another example could include words instead of characters, and calculate WER. In FIG. 4, the x axis shows the letters predicted for each 20 ms window, and the y axis lists the twenty-six letters of the alphabet and blank 472 and space 482. The bright red entries correspond to letters sampled by the sampling module 125. The sampled whole transcription is “HHHEE_LL_LLLOOO”. In some implementations, a collapsing module (not show) enforces CTC collapsing rules and removes repeated letters and blanks to produce a final whole transcription “HELLO”.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Zhou, in order to analyze the string wherein the probabilities for each of the one or more characters are decoded up to the end of the one or more speech segments in order to generate one or more text transcripts of the one or more speech segments, to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).
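The collapsing step mentioned in the quoted passage, which turns the sampled string “HHHEE_LL_LLLOOO” into “HELLO”, follows the standard CTC collapsing rule: merge consecutive repeats, then drop blanks. A minimal Python sketch is given below; the function name is hypothetical and `_` is assumed to be the blank symbol, as in the quoted example.

```python
def ctc_collapse(sampled, blank="_"):
    """Merge consecutive repeated characters, then remove blanks.

    A blank between two identical letters keeps them distinct, which is
    why "LL_LLL" collapses to "LL" rather than "L".
    """
    out = []
    prev = None
    for ch in sampled:
        if ch != prev and ch != blank:
            out.append(ch)                       # first character of a new run
        prev = ch
    return "".join(out)
```

Applied to the sampled transcription in the quoted passage, `ctc_collapse("HHHEE_LL_LLLOOO")` yields `"HELLO"`.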


Claims 6, 12, 18, 24 are rejected under 35 U.S.C. 103 as being unpatentable over Abrash in further view of Gorny (US 20210020181 A1).
With respect to claims 6, 12, 18 and 24, Abrash fails to explicitly disclose, however, Gorny teaches wherein transcripts for the one or more speech segments are to be provided as input for one or more voice-controllable devices ([0042] According to embodiments, transcription module 206 accesses local device audio data 214 and transcribes the audio data stored in local device audio data 214 into a local device text transcript and [0006] In embodiments of the disclosed subject matter, the computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript. The computer transmits the master audio transcript to each of the two or more communication devices.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Gorny, in order to generate transcripts for the one or more speech segments to provide a consensus mechanism for merging a plurality of local text transcripts, which may comprise differing text transcriptions of the same audio data communications between communication devices, into a single master text transcript ([0018], Gorny).


Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Abrash in further view of Srinivasan (US 20210133577 A1).
With respect to claim 25, Abrash teaches one or more processors to indicate an end of one or more speech segments based, at least in part, on a proportion of one or more non-speech characters (Claim 55. The computer readable medium of claim 51, wherein said step of locating a second speech endpoint comprises: counting a number of frames of said audio signal for which a most likely word in a pre-defined quantity of preceding frames is silence [non-speech character]; determining whether said number of frames exceeds a second pre-defined threshold; [counting silence words in frames that exceed a threshold equates to proportion of non-speech characters]);
Abrash fails to explicitly disclose, however, Srinivasan teaches memory for storing network parameters for the one or more neural networks ([0095] At process block 1110, parameters, such as weights and biases, of the neural network can be initialized. As one example, the weights and biases can be initialized to random normal-precision floating-point values. As another example, the weights and biases can be initialized to normal-precision floating-point values that were calculated from an earlier training set. The initial parameters can be stored in a memory or storage of the machine learning system. In one example, the parameters can be stored as quantized floating-point values which can reduce an amount of storage used for storing the initial parameters.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash in view of Srinivasan, in order to store network parameters for the one or more neural networks to adjust the model parameters to improve the correlation of the machine learning model output values to a set of desired output values. ([0030], Srinivasan).

Claims 26, 27 are rejected under 35 U.S.C. 103 as being unpatentable over Abrash and Srinivasan as applied to claim 25, and in further view of Zhou (US 20200005765 A1)

With respect to claim 26, Abrash and Srinivasan fail to explicitly disclose, however, Zhou teaches wherein the one or more circuits are further to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments ([0047] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t :).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash and Srinivasan in view of Zhou, in order to use a connectionist temporal classification (CTC) function with one or more neural networks to generate probabilities for each of the one or more characters based on features extracted from one or more audio signals containing the one or more speech segments, to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).

With respect to claim 27, Abrash and Srinivasan fail to explicitly disclose, however, Zhou teaches wherein the one or more circuits are further to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps ([0047] For the connectionist temporal classification (CTC), consider an entire neural network to be simply a function that takes in some input sequence of length T and outputs some output sequence y also of length T [generated string of characters], and [0048] Connectionist temporal classification (CTC) 172 utilizes an objective function that allows RNN 352 to be trained for sequence transcription tasks without requiring any prior alignment between the input and target sequences. The output layer contains a single unit for each of the transcription labels, such as characters or phonemes plus an extra unit referred to as the "blank" which corresponds to a null emission. Given a length T input sequence X, the output vectors yt are normalized with the softmax function, then interpreted as the probability of emitting the label or blank with index k at time t and [0058] Training with the defined objective is efficient, since both sampling and greedy decoding are cheap).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash and Srinivasan in view of Zhou, in order to analyze the probabilities for each of the one or more characters using a greedy decoder to generate a string of characters for individual time steps, to optimize a performance metric, such as CER or WER, defined over output transcriptions ([0032], Zhou).

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Abrash and Srinivasan as applied to claim 25, in further view of Gorny (US 20210020181 A1).
With respect to claim 30, Abrash and Srinivasan fail to explicitly disclose, however, Gorny teaches wherein transcripts for the one or more speech segments are to be provided as input for one or more voice-controllable devices ([0042] According to embodiments, transcription module 206 accesses local device audio data 214 and transcribes the audio data stored in local device audio data 214 into a local device text transcript and [0006] In embodiments of the disclosed subject matter, the computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript. The computer transmits the master audio transcript to each of the two or more communication devices.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Abrash and Srinivasan in view of Gorny, in order to generate transcripts for the one or more speech segments to provide a consensus mechanism for merging a plurality of local text transcripts, which may comprise differing text transcriptions of the same audio data communications between communication devices, into a single master text transcript ([0018], Gorny).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ATHAR N PASHA/               Examiner, Art Unit 2657                                                                                                                                                                                         

/DANIEL C WASHBURN/               Supervisory Patent Examiner, Art Unit 2657