DETAILED ACTION
This action is in response to the initial filing of Application No. 17/472,724 on 09/13/2021. Claims 1 – 20 are pending in this application, with claims 1 and 11 being independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Aside from the non-prior art rejections set forth below, it has been determined that the prior art fails to teach or suggest, alone or in reasonable combination, the limitations recited in the independent claims.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

The claim mapping for the following rejections is as follows.

Current Application
1. A method of speech recognition, sequentially executed by a processor, on a plurality of consecutive speech segments, the method comprising: obtaining digital information of a speech segment, the digital information comprising a spectrogram representation; and assigning at least one label to said speech segment by: dividing each of a plurality of time frames of the speech segment to a plurality of frequency bands and each of the plurality of frequency bands to a plurality of frequency bins each having a bin value; calculating a speech feature value of each of a plurality of speech features, for each of a plurality of combinations, each of the combinations is of one of the plurality of time frames and one of the plurality of respective frequency bands, said plurality of speech features includes at least a mean value of the respective bin values, a standard deviation value of the respective bin values, a maximum value of the respective bin values and a voice-unvoiced ratio value of the respective bin values; determining a segment vector based on inner relations between two or more speech features of different combinations from the plurality of combinations, said inner relations represent cross effects between said two or more speech features; and determining the at least one label by classifying said segment vector using machine learning classification algorithm receiving as input at least one labeled segment vector of respective at least one previously analyzed speech segment.

2. The method of claim 1, wherein said obtaining digital information further comprising digitizing, by a processor, an analog speech signal originated from a device having a microphone in real time or from a device having an audio recording, wherein, the analog speech signal comprising analog voice portions and non-voice portions; and wherein the digitizing of the analog voice portion produces the digital information of a segment.

3. The method of claim 1, wherein the speech segment represents an element selected from a group comprising of: a syllable, a plurality of syllables, a word, a fraction of a word, a plurality of words and any combination thereof.

4. The method of claim 1, wherein the calculating the speech feature value of each of the plurality of speech features, for each of the plurality of combinations, comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of speech features, wherein assembling the index matrix is based on a spectrogram having said plurality of time frames and said plurality of frequency bands, wherein the index matrix dimensions correlates with the plurality of time frames and the plurality of frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a speech feature value of a time frame and a frequency band indicated by the index matrix.

5. The method of claim 4, wherein one or more portions of frequency bands of the index matrix having a time duration which expands along a total number of time frames smaller than a minimal duration defined by a threshold of minimum number of consecutive time frames, are filtered out of the index matrix and the plurality of matrixes.

6. The method of claim 4, wherein contiguous time frames containing similar speech features values are replaced with a time interval in the index matrix and the plurality of matrixes.

7. The method of claim 4, wherein said inner relations between two or more speech features are inner relations between band pairs, wherein the determining a segment vector further comprises compiling a plurality of components each comprising equal number of operands, wherein the first component of the plurality of components is an index component corresponding with the index matrix while the rest of the plurality of components are features components corresponding with the features matrixes, wherein a total number of operands is all possible combinations of frequency bands pairs, and wherein the index component indicates operands having band pairs presence in the segment vector.

8. The method of claim 7, wherein the segment vector further comprises said inner relations.

9. The method of claim 7, wherein properties of operands, having pairs presence, of each feature component are determined by calculating cross effect between sets of aggregated pairs, wherein each set of aggregated pairs is associated with a predetermined time zone of the segment.

10. The method of claim 1, wherein said at least one label comprising at least one alphanumeric character manifestation of a speech segment and wherein said at least one label is a representation of at least one member of a group consisting of: an accent, a pronunciation level, an age of a speaker and a gender of the speaker.

11. A system for speech recognition, comprising: at least one hardware processor adapted to execute code, said code comprising code instructions to sequentially conduct analysis on a plurality of consecutive speech segments, said analysis comprising: obtaining digital information of a speech segment, the digital information comprising a spectrogram representation; and assigning at least one label to said speech segment by: dividing each of a plurality of time frames of the speech segment to a plurality of frequency bands and each of the plurality of frequency bands to a plurality of frequency bins each having a bin value; calculating a speech feature value of each of a plurality of speech features, for each of a plurality of combinations, each of the combinations is of one of the plurality of time frames and one of the plurality of respective frequency bands, said plurality of speech features includes at least a mean value of the respective bin values, a standard deviation value of the respective bin values, a maximum value of the respective bin values and a voice-unvoiced ratio value of the respective bin values; determining a segment vector based on inner relations between two or more speech features of different combinations from the plurality of combinations, said inner relations represent cross effects between said two or more speech features; and determining the at least one label by classifying said segment vector using machine learning classification algorithm receiving as input at least one labeled segment vector of respective at least one previously analyzed speech segment.

12. The system of claim 11, wherein said obtaining said digital information is conducted from devices selected from a group comprising of: image capturing device, video capturing device, images storage, video storage, a real time sound sensor and a sound recording system.

13. The system of claim 11, wherein said obtaining digital information further comprising digitizing an analog speech signal originated from a device having a microphone in real time or from a device having an audio recording, wherein the analog speech signal comprising analog voice portions and non-voice portions, and wherein the digitizing of the analog voice portion produces the digital information of a segment.

14. The system of claim 11, wherein the speech segment represents an element selected from a group comprising of: a syllable, a plurality of syllables, a word, a fraction of a word, a plurality of words and any combination thereof.

15. The system of claim 11, wherein the calculating the speech feature value of each of the plurality of speech features, for each of the plurality of combinations, comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of speech features, wherein assembling the index matrix is based on a spectrogram having said plurality of time frames and said plurality of frequency bands, wherein the index matrix dimensions correlates with the plurality of time frames and the plurality of frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a speech feature value of a time frame and a frequency band indicated by the index matrix.

16. The system of claim 15, wherein one or more portions of frequency bands of the index matrix having a time duration which expands along a total number of time frames smaller than a minimal duration defined by a threshold of minimum number of consecutive time frames, are filtered out of the index matrix and the plurality of matrixes; and

17. The system of claim 15, wherein contiguous time frames containing similar speech features values are replaced with a time interval in the index matrix and the plurality of matrixes.

18. The system of claim 15, wherein said inner relations between two or more speech features are inner relations between band pairs, wherein the determining a segment vector further comprises compiling a plurality of components each comprising equal number of operands, wherein the first component of the plurality of components is an index component corresponding with the index matrix while the rest of the plurality of components are features components corresponding with the features matrixes, wherein a total number of operands is all possible combinations of frequency bands pairs, and wherein the index component indicates operands having band pairs presence in the segment vector; and wherein the segment vector further comprises said inner relations.

19. The system of claim 15, wherein properties of operands, having pairs presence, of each feature component are determined by calculating cross effect between sets of aggregated pairs, wherein each set of aggregated pairs is associated with a predetermine time zone of the segment.

20. The system of claim 19, wherein properties of operands, having pairs presence, of each feature component are determined by calculating cross effect between sets of aggregated pairs, wherein each set of aggregated pairs is associated with a predetermine time zone of the segment.

US 11,120,793

1. A method of speech recognition, voice recognition, and a combination thereof, sequentially executed by at least one processor, on a plurality of consecutive segments, the method comprising: obtaining digital information of a segment selected from the group consisting of speech segment, voice segment, and a combination thereof, wherein the digital information comprises a spectrogram representation; extracting a plurality of features characterizing the segment from the spectrogram representation; determining a consistent structure segment vector based on the features; deploying machine learning to determine at least one label of the segment vector; and outputting the at least one label; wherein the extracting a plurality of features further comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of features, wherein assembling the index matrix is based on a spectrogram having time frames and frequency bands, wherein the index matrix dimensions correlates with the time frames and frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a feature value of a time frame and a frequency band indicated by the index matrix, wherein one or more portions of frequency bands of the index matrix falling below a threshold of minimum number of consecutive time frames are filtered out of the index matrix and the plurality of matrixes.

2. The method of claim 1, wherein said obtaining digital information further comprises digitizing, by the at least one processor, an analog signal originated from a device having a microphone in real time or from a device having an audio recording, wherein, the analog signal comprising analog voice portions and non-voice portions; and wherein the digitizing of the analog voice portion produces the digital information of a segment.

3. The method of claim 1, wherein the segment represents an element selected from the group consisting of a syllable, a plurality of syllables, a word, a fraction of a word, a plurality of words, and any combination thereof.

4. The method of claim 1, wherein the extracting a plurality of features further comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of features, wherein assembling the index matrix is based on a spectrogram having time frames and frequency bands, wherein the index matrix dimensions correlates with the time frames and frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a feature value of a time frame and a frequency band indicated by the index matrix.

5. A method of speech recognition, voice recognition, and a combination thereof, sequentially executed by at least one processor, on a plurality of consecutive segments, the method comprising: obtaining digital information of a segment selected from the group consisting of speech segment, voice segment, and a combination thereof, wherein the digital information comprises a spectrogram representation; extracting a plurality of features characterizing the segment from the spectrogram representation; determining a consistent structure segment vector based on the features; deploying machine learning to determine at least one label of the segment vector; and outputting the at least one label; wherein the extracting a plurality of features further comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of features, wherein assembling the index matrix is based on a spectrogram having time frames and frequency bands, wherein the index matrix dimensions correlates with the time frames and frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a feature value of a time frame and a frequency band indicated by the index matrix, wherein contiguous time frames containing similar features values are replaced with a time interval in the index matrix and the plurality of matrixes.

6. A method of speech recognition, voice recognition, and a combination thereof, sequentially executed by at least one processor, on a plurality of consecutive segments, the method comprising: obtaining digital information of a segment selected from the group consisting of speech segment, voice segment, and a combination thereof, wherein the digital information comprises a spectrogram representation; extracting a plurality of features characterizing the segment from the spectrogram representation; determining a consistent structure segment vector based on the features; deploying machine learning to determine at least one label of the segment vector; and outputting the at least one label; wherein the extracting a plurality of features further comprises assembling a plurality of matrixes and an index matrix, having identical number of cells, wherein each matrix of the plurality of matrixes represent a different feature of the plurality of features, wherein assembling the index matrix is based on a spectrogram having time frames and frequency bands, wherein the index matrix dimensions correlates with the time frames and frequency bands of the spectrogram, wherein the plurality of matrixes overlap with the index matrix, and wherein a content of each cell of each matrix of the plurality of matrixes represents a feature value of a time frame and a frequency band indicated by the index matrix, wherein the determining a consistent structure segment vector further comprises compiling a plurality of components each comprising equal number of operands, wherein the first component of the plurality of components is an index component corresponding with the index matrix while the rest of the plurality of components are features components corresponding with the features matrixes, wherein a total number of operands is all possible combinations of frequency bands pairs, and wherein the index component indicates operands having band pairs presence in the segment vector.

7. The method of claim 6, wherein the segment vector further comprises inner relations that carry extra information necessary for the speech recognition and the voice recognition.

8. The method of claim 6, wherein properties of operands, having pairs presence, of each feature component are determined by calculating cross effect between sets of aggregated pairs, wherein each set of aggregated pairs is associated with a predetermine time zone of the segment.

9. The method of claim 6, wherein deploying machine learning further comprises classifying a segment vector based on preceding segment vectors and their labels, wherein each vector has at least one label comprising at least one alphanumeric character manifestation of a speech segment or a voice segment.

10. A system configured to execute the method of claim 1 the system comprising: at least one client device configured to communicate information; wherein the at least one processor executes a code for determining a consistent structure segment vector based on features selected from the group comprising of speech features, voice features, and any combination thereof; and at least one label of the segment vector.

11. The system of claim 10, wherein the at least one client device is further configured to obtain the information from devices selected from the group consisting of a real time sound system and a sound recording system.

12. The system of claim 10, wherein the information further comprises information for interfacing with a user.

13. The system of claim 10, wherein the at least one client device is adapted to perform duties selected from the group consisting of duties attributed to the segment vector generator, duties attributed to the machine learning server, and a combination thereof.

14. The system of claim 10, wherein the at least one client device is further configured to obtain the information from devices selected from the group consisting of image capturing, video capturing, images storage, video storage, and any combination thereof.

15. The system of claim 10, wherein the at least one client device is adapted to perform duties selected from the group consisting of duties attributed to the segment vector generator, duties attributed to the machine learning server, and a combination thereof.


Claims 1 – 9 and 11 – 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 15 of U.S. Patent No. 11,120,793 in view of Guven (US 2013/0297297). Although the claims at issue are not identical, they are not patentably distinct from each other.

As shown above, claims 1 – 15 of US 11,120,793, in combination, recite the limitations of claims 1 – 9 and 11 – 20 except for the following: dividing each of a plurality of time frames of the speech segment to a plurality of frequency bands and each of the plurality of frequency bands to a plurality of frequency bins each having a bin value; calculating a speech feature value of each of a plurality of speech features, for each of a plurality of combinations, each of the combinations is of one of the plurality of time frames and one of the plurality of respective frequency bands, said plurality of speech features includes at least a mean value of the respective bin values, a standard deviation value of the respective bin values, a maximum value of the respective bin values and a voice-unvoiced ratio value of the respective bin values.
	However, Guven discloses a system and method for classification of emotions in human speech (Abstract), comprising the following: dividing each of a plurality of time frames of a speech segment into a plurality of frequency bands (an STFT of the speech signal is calculated, [0010] [0023] [0031]) and each of the plurality of frequency bands into a plurality of frequency bins, each having a bin value (the output of the STFT is quantized by Bark Scale filters into bins, [0010] [0023] [0032]); and calculating a speech feature value of each of a plurality of speech features (linear regression coefficients of the time-frequency-power surface, and first and second formants) for each of a plurality of combinations, each of the combinations being of one of the plurality of time frames and one of the plurality of respective frequency bands, wherein the plurality of speech features includes at least a mean value of the respective bin values (average power per bin over a given time slot) (Finally, linear regression coefficients of the time-frequency-power surface can be determined. This corresponds to average power per bin over a given time slot, the slope of the power parallel to the time axis, and the slope of the power parallel to the frequency axis for each bin and time slot, [0010] [0023] [0033 – 0035] [0037]).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the recited limitations of claims 1 – 15 of US 11,120,793 with Guven’s teaching to achieve the following: dividing each of a plurality of time frames of the speech segment to a plurality of frequency bands and each of the plurality of frequency bands to a plurality of frequency bins each having a bin value; calculating a speech feature value of each of a plurality of speech features, for each of a plurality of combinations, each of the combinations is of one of the plurality of time frames and one of the plurality of respective frequency bands, said plurality of speech features includes at least a mean value of the respective bin values, a standard deviation value of the respective bin values, a maximum value of the respective bin values and a voice-unvoiced ratio value of the respective bin values, for the purpose of improving speech recognition by using a digital signal processing pipeline which is computationally equivalent to a cochlea (Guven, [0005 – 0010]).
	Therefore, claims 1 – 9 and 11 – 20 of the currently pending application are obvious variants of claims 1 – 15 of US 11,120,793.

Claim 10 is rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 – 15 of U.S. Patent No. 11,120,793 in view of Guven (US 2013/0297297) and further in view of Guven (“Robust Classification of Emotion in Human Speech Using Spectrogram Features”) (“Guven2”). Although the claims at issue are not identical, they are not patentably distinct from each other.
	For claim 10, the limitations recited by claims 1 – 15 of US 11,120,793 in view of Guven fail to recite the following: wherein said at least one label is a representation of at least one member of a group consisting of: an accent, a pronunciation level, an age of a speaker and a gender of the speaker.
	However, Guven2 discloses a method for classification of emotion in human speech using spectrogram features (Abstract), wherein a segment vector is classified and labeled using gender (Chapter 3 and Section 4.2.3, pp. 48 – 49, 65, and 66).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of claims 1 – 15 of US 11,120,793 and Guven with the teachings of Guven2 so that the at least one label is a representation of at least one member of a group consisting of: an accent, a pronunciation level, an age of a speaker and a gender of the speaker, for the purpose of achieving greater efficiency of human-computer interactions by understanding a human voice in a variety of conditions and discriminating characteristic variations of human voice (Guven, [0002]) (Guven2, pg. 65).
	Therefore, claim 10 of the currently pending application is an obvious variant of claims 1 – 15 of US 11,120,793.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951. The examiner can normally be reached Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SONIA L GAY/Primary Examiner, Art Unit 2657