DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the Amendments and Arguments filed on 09 June 2021. The Applicants’ amendment and remarks have been carefully considered, but they are moot in view of new grounds for rejection. Hence, this Action has been made FINAL. 
Any rejections of the previous office action not addressed in this action are considered resolved and no longer pertain to the prosecution of this application.

Response to Amendments and Arguments
The applicant argues step 2A, prong 1 of the 101 rejection by stating the features of claim 1 do not fall within the category of “human activity”. The examiner maintains that the steps of claim 1 may be performed mentally and/or with pen and paper. The applicant further argues step 2A, prong 2 and step 2B, referring to paragraph [0092] of the specification. However, the examiner observes that although the specification refers to “the identification of the type of language in a hands-free manner”, this is not conveyed in the claim language. Moreover, identification of the type of language in a hands-free manner may be performed mentally by a person.
The art rejections are moot in view of new grounds for rejection (Pashine). The limitation requiring a new reference is “acquiring, by performing voice recognition for the input voice in each of a plurality of languages, a plurality of speech recognition results”.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Claims 1, 6, and 11 are directed to the abstract idea of “a method of organizing human activities”. The claim is directed to a process, which is a statutory category of invention. A similar analysis yields the same conclusion for the applicant’s claim 1 and other independent claims: Step 1: YES.
Next, the claims are assessed according to Prong One of the 2019 Revised Patent Subject Matter Eligibility Guidance – Judicial Exception Recited?. The limitations of the applicant’s claims 1, 6, and 11 are all steps which are directed toward the judicial exception of “organizing human activity”. Therefore, Step 2A, Prong One: YES. 
Next, the claims are assessed according to Prong Two of the 2019 Revised Patent Subject Matter Eligibility Guidance – 2A: Integrated into a Practical Application? and 2B: Claim provides an Inventive Concept?. Claim 11 recites the additional elements: processor and memory. However, the processor is recited at a high level of generality, i.e., as a generic processor performing a generic computer function of processing data. This generic processor limitation is no more than mere instructions to apply the exception using a generic computer component. The memory is similarly recited at a high level of generality.
Independent claims 1 and 6 are also ineligible on similar grounds.
The dependent claims also address the abstract idea of “a method of organizing human activities” without adding significantly more. They are all drawn to analyzing a conversation and providing results based on this analysis.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

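The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.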

Claims 1-16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claim 1 includes the limitation “specifying a phoneme count for each of the plurality of languages from the corresponding one of the phoneme strings obtained by the conversion for the respective languages”. A similar limitation is also in claims 6 and 11. However, the examiner does not see support in the applicant’s specification for the term “specifying” as used in this context. The examiner recommends reverting to the term “calculating” as used in the original claim language, for which there is clear support in the specification.
Additionally, claim 1 has been amended to include the phrase “acquiring, by performing voice recognition for the input voice in each of a plurality of languages, a plurality of speech recognition results”, for which no support is found in the applicant’s specification. A similar amendment has been made to claims 6 and 11. The specification contains support for “speech recognition”, but not for “voice recognition”. 

Claim Rejections - 35 USC § 103
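The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.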

Claims 1, 6, and 11 are rejected under pre-AIA 35 U.S.C. 103(a) as being obvious over US 20110035219, hereinafter referred to as Kadirkamanathan et al., in view of US 20170169814, hereinafter referred to as Pashine.

Regarding claim 1, Kadirkamanathan et al. discloses a non-transitory computer-readable recording medium (Kadirkamanathan et al., para [0076]) having stored therein a program for causing a computer to execute processing (Kadirkamanathan et al., para [0076]) comprising: 

acquiring, by performing voice recognition for the input voice in each of a plurality of languages, a plurality of speech recognition results (“In an embodiment, a human language and accent attribute filter consists of four language models 409-412 receive the audio information stream 402 to compare the output from the different human language models 409-412 at approximately the same time to generate a robust confidence rating for each recognized word.  The four exemplary human language models are a U.S.  English language model 410, a U.K.  English language model 411, European Spanish language model 408, and a Colombian Spanish language model 412.  The human language models 409-412 may be resident on the same machine or networked across multiple machines.  The audio information stream 402 may be 
FIG. 5 illustrates a graph of the continuous engine monitoring and transcribing the phone conversation. In U.S. English, a first speaker states the words, "Is that correct." In European Spanish, a second speaker responds with the words, "No mas!",” Kadirkamanathan et al., para [0058]); 

converting each of the plurality of speech recognition results into a phoneme string (“Each of the databases was filled with phoneme and phoneme sequences being trained on for a particular language in the set of two or more spoken languages, and each of the databases received the phoneme and phoneme sequences from a phone output from the same universal phoneme decoder independent of which spoken language in the set of two or more potential languages was being trained on. The run-time language identifier module identifies a particular human language being spoken in the audio stream in the set of two or more potential languages by utilizing the one or more statistical models. The language identification system that may be used with for example, a continuous speech recognition engine that includes various components that includes front end filters, a speech recognition decoder module, one or more statistical language models, and an output module,” Kadirkamanathan et al., para [0013]. See also Kadirkamanathan et al., fig. 3.); 

specifying a phoneme count for each of the plurality of languages from the corresponding one of the phoneme strings obtained by the conversion for the respective languages (“Each statistical model analyzes an amount of different phones and phone sequences that occur in a training audio data and counts of a total number of phonemes for the training audio data upon which the model is based on. A statistical inference methodology uses the extracted phoneme sequence to do the language identification,” Kadirkamanathan et al., para [0033].); and

identifying a type of language matched with the input voice based on the phoneme counts calculated for the respective languages (“The language ID trainer 114 then approximates the n-gram distribution as the weighted sum of the probabilities of the n-gram sequence of phonemes and supplies this back to the statistical language model for that language. In essence the statistical language model compares both the ratios of counts of phone sequences observed in the training data compared to 1) how often particular phonemes and phoneme sequences are used in that human language, such as French, to an occurrence of other phoneme and phoneme sequences in that human language, and 2) how often particular phonemes and phoneme sequences are used in that human language, such as French, to an occurrence of the same or very similar sounding phonemes and phoneme sequences are used in another human language, such as English,” Kadirkamanathan et al., para [0029]. Also, “The language ID parameters database 116 couples to the run-time language identifier module 218. The language ID parameters database 116 is a populated database specific to a linguistic domain that contains at least the number of counts that the sequence of phones x followed by y occurs in the overall corpus of human language specific acoustic data analyzed from this domain analyzed C(xy), as well as the number of counts C(xyz) the N-grams (xyz), phone sequences of x followed by y followed by z, occurs in the overall corpus of domain-specific acoustic data from this analyzed domain,” Kadirkamanathan et al., para [0035]. And, “The set of languages trained on as discussed above may be two or more. However, more typically the set of languages for which the universal phoneme decoder contains a universal phoneme set representing phonemes occurring in the set of languages will be five or more languages. Thus, the set of language will be five or more languages,” Kadirkamanathan et al., para [0036].).

Kadirkamanathan et al., though, does not disclose acquiring, by performing voice recognition for the input voice in each of a plurality of languages, a plurality of speech recognition results.

Pashine is cited to disclose acquiring, by performing voice recognition for the input voice in each of a plurality of languages, a plurality of speech recognition results (“The above systems and methods also provide for an example system including an accented phonetic and transformed ID generation unit that includes a database of accented substrings…A third example of the system optionally includes one or more of the first example and the second example, and further includes the speech recognition system wherein the voice recognition logic unit comprises a context unit that includes a grammar file associated with the certain language,” Pashine, para [0072]. See also Pashine, fig. 3. Here, the voice recognition aids in the speech recognition process.). Pashine benefits Kadirkamanathan et al. by providing speech recognition with automatic accent detection (Pashine, para [0005]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Kadirkamanathan et al. with those of Pashine to improve the spoken language identification of Kadirkamanathan et al.   
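
For illustration only, the four steps recited in claim 1 (recognition in each language, conversion to a phoneme string, phoneme counting, and identification from the counts) can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation or that of any cited reference: the `Recognizer` callables, the word-to-phoneme `Lexicon` lookup, and the decision rule in `identify_language` are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a recognizer maps audio to text in one language,
# and a lexicon maps each word of that language to its phonemes.
Recognizer = Callable[[bytes], str]
Lexicon = Dict[str, List[str]]


def to_phonemes(text: str, lexicon: Lexicon) -> List[str]:
    """Convert a recognition result into a phoneme string via lexicon lookup."""
    phonemes: List[str] = []
    for word in text.lower().split():
        # Words missing from the lexicon contribute no phonemes.
        phonemes.extend(lexicon.get(word, []))
    return phonemes


def phoneme_counts(audio: bytes,
                   recognizers: Dict[str, Recognizer],
                   lexicons: Dict[str, Lexicon]) -> Dict[str, int]:
    """Run recognition in every language and count phonemes per language."""
    counts: Dict[str, int] = {}
    for language, recognize in recognizers.items():
        result = recognize(audio)                            # recognition
        phonemes = to_phonemes(result, lexicons[language])   # conversion
        counts[language] = len(phonemes)                     # phoneme count
    return counts


def identify_language(counts: Dict[str, int]) -> str:
    """One possible decision rule (an assumption): the largest phoneme count."""
    return max(counts, key=counts.get)
```

The decision rule is kept in its own function because claim 1 only requires identification "based on" the phoneme counts; the dependent claims then narrow how that decision is made.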

As to claim 6, method claim 6 and CRM claim 1 are related as CRM and method of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 6 is similarly rejected under the same rationale as applied above with respect to CRM claim.

As to claim 11, device claim 11 and CRM claim 1 are related as CRM and device of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 11 is similarly rejected under the same rationale as applied above with respect to the CRM claim. Also, Kadirkamanathan et al., para [0075] and [0077] teach processor and memory/instructions, respectively.

Claims 2, 7, and 12 are rejected under pre-AIA 35 U.S.C. 103(a) as being obvious over US 20110035219, hereinafter referred to as Kadirkamanathan et al., in view of US 20170169814, hereinafter referred to as Pashine, and further in view of US 20070219777, hereinafter referred to as Chu et al.

Regarding claim 2 (original), Kadirkamanathan et al., as modified by Pashine, discloses the non-transitory computer-readable recording medium according to claim 1, wherein 

the identifying includes identifying a language having the largest phoneme count among the plurality of languages as the language matched with the input voice (“The language ID parameters database 116 is trained/filled with phoneme sequences for each spoken language. Sequences of phonemes unique to one or a few languages are identified. ... phonemes and phoneme sequences that occur common to many languages but occur so commonly in those one or few languages that a high count of those phoneme or phoneme sequences occurrence is also a good indication that particular language is being spoken in the audio file under analysis,” Kadirkamanathan et al., para [0032].).

Chu et al. is also included to emphasize that the identifying includes identifying a language having the largest phoneme count among the plurality of languages as the language matched with the input voice (“Language identification has been done for spoken languages. Using one technique, a speech utterance is first converted into a phoneme string by a speech recognition engine, then the probabilities that the phoneme string belongs to each candidate language are estimated by phoneme N-grams of that language, and finally the language with the highest likelihood is selected,” Chu et al., para [0003].). Chu et al. benefits Kadirkamanathan et al. by determining language origin from a letter string (Chu et al., para [0005]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Kadirkamanathan et al. with those of Chu et al. to improve the spoken language identification of Kadirkamanathan et al.
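
For illustration only, the two selection criteria discussed above can be placed side by side: selecting the language with the largest phoneme count (the wording of claim 2) and selecting the language whose phoneme model assigns the highest likelihood to the phoneme string (the technique described in the Chu et al. quotation). This is a minimal sketch, not code from either document. The function names are hypothetical, and a unigram (N=1) model with a small `floor` probability for unseen phonemes is assumed in place of the general phoneme N-grams that Chu et al. mentions.

```python
import math
from typing import Dict, List


def select_by_count(counts: Dict[str, int]) -> str:
    """Pick the language with the largest phoneme count."""
    return max(counts, key=counts.get)


def log_likelihood(phonemes: List[str], unigram: Dict[str, float],
                   floor: float = 1e-6) -> float:
    """Log-probability of a phoneme string under a unigram (N=1) model.

    Phonemes absent from the model receive the assumed `floor` probability.
    """
    return sum(math.log(unigram.get(p, floor)) for p in phonemes)


def select_by_likelihood(phonemes: List[str],
                         models: Dict[str, Dict[str, float]]) -> str:
    """Pick the language whose model gives the string the highest likelihood."""
    return max(models, key=lambda lang: log_likelihood(phonemes, models[lang]))
```

The first rule needs one phoneme string per language and compares their lengths; the second scores a single phoneme string against every language model.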

As to claim 7, method claim 7 and CRM claim 2 are related as CRM and method of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 7 is similarly rejected under the same rationale as applied above with respect to CRM claim.

As to claim 12, device claim 12 and CRM claim 2 are related as CRM and device of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 12 is similarly rejected under the same rationale as applied above with respect to the CRM claim. Also, Kadirkamanathan et al., para [0075] and [0077] teach processor and memory/instructions, respectively.


Claims 3, 5, 8, 10, 13, and 15 are rejected under pre-AIA  35 U.S.C. 103(a) as being obvious over US 20110035219, hereinafter referred to as Kadirkamanathan et al., in view of US 20170169814, hereinafter referred to as Pashine, and further in view of US 20180286411, hereinafter referred to as Nakadai et al.

Regarding claim 3 (original), Kadirkamanathan et al., as modified by Pashine, discloses the non-transitory computer-readable recording medium according to claim 1, wherein the processing further includes:

the identifying includes identifying the type of language matched with the input voice based on the phoneme counts calculated for the respective languages (Kadirkamanathan et al., para [0032].).

Kadirkamanathan et al., though, does not disclose calculating a sentence likelihood for each of the plurality of languages based on a linguistic model from the speech recognition result of the speech recognition performed on the input voice for the respective languages and the sentence likelihoods for the respective languages calculated based on the linguistic models.

Nakadai et al. is cited to disclose calculating a sentence likelihood for each of the plurality of languages based on a linguistic model from the speech recognition result of the speech recognition performed on the input voice for the respective languages (“The voice recognition unit 13 calculates a second likelihood for candidates for sentences indicating the contents of speech corresponding to the determined phoneme sequence candidates using a predetermined language model for each phoneme sequence candidate,” Nakadai et al., para [0038].) and the sentence likelihoods for the respective languages calculated based on the linguistic models (Nakadai et al., para [0038].). Nakadai et al. benefits Kadirkamanathan et al. by incorporating speaker identification along with speech recognition and language identification of Kadirkamanathan et al. Therefore, it would be obvious for one skilled in the art to combine the teachings of Kadirkamanathan et al. with those of Nakadai et al. to extend the language identification and speech recognition techniques of Kadirkamanathan et al. to speaker identification.

As to claim 8, method claim 8 and CRM claim 3 are related as CRM and method of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 8 is similarly rejected under the same rationale as applied above with respect to CRM claim.

As to claim 13, device claim 13 and CRM claim 3 are related as CRM and device of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 13 is similarly rejected under the same rationale as applied above with respect to the CRM claim. Also, Kadirkamanathan et al., para [0075] and [0077] teach processor and memory/instructions, respectively.


Regarding claim 5 (original), Kadirkamanathan et al., as modified by Pashine and Nakadai et al., discloses the non-transitory computer-readable recording medium according to claim 3, wherein 

the identifying includes identifying the language in which a score calculated from the phoneme count and the sentence likelihood based on the linguistic model is the highest as the language matched with the input voice (“Here, the voice recognition unit 13 calculates a sound feature quantity for each frame with respect to the signals for respective sound sources, calculates a first likelihood for each possible phoneme sequence using a sound model preset for the calculated sound feature quantity, and determines candidates for a predetermined number of phoneme sequences in descending order of first likelihood. For example, the sound model is the Hidden Markov Model (HMM). The voice recognition unit 13 calculates a second likelihood for candidates for sentences indicating the contents of speech corresponding to the determined phoneme sequence candidates using a predetermined language model for each phoneme sequence candidate. For example, the language model is the n-gram. The voice recognition unit 13 calculates a total likelihood by combining the first likelihood and the second likelihood for each sentence candidate and determines a sentence candidate having a maximum total likelihood as the contents of speech,” Nakadai et al., para [0038].).
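
For illustration only, the selection recited in claim 5 (choosing the language whose score, calculated from the phoneme count and the sentence likelihood, is the highest) can be sketched as below. Neither the claim language nor the Nakadai et al. quotation reproduced here specifies how the two quantities are combined, so the weighted sum in `combined_score` and its `weight` parameter are assumptions made purely for the sketch.

```python
from typing import Dict


def combined_score(phoneme_count: int, sentence_log_likelihood: float,
                   weight: float = 1.0) -> float:
    """Hypothetical score: phoneme count plus a weighted sentence log-likelihood."""
    return phoneme_count + weight * sentence_log_likelihood


def identify_by_score(counts: Dict[str, int],
                      log_likelihoods: Dict[str, float],
                      weight: float = 1.0) -> str:
    """Return the language with the highest combined score."""
    scores = {
        lang: combined_score(counts[lang], log_likelihoods[lang], weight)
        for lang in counts
    }
    return max(scores, key=scores.get)
```

With `weight=0.0` the rule reduces to choosing the largest phoneme count; larger weights let the language model outvote a small difference in counts.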

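As to claim 10, method claim 10 and CRM claim 5 are related as CRM and method of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 10 is similarly rejected under the same rationale as applied above with respect to the CRM claim.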

As to claim 15, device claim 15 and CRM claim 5 are related as CRM and device of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 15 is similarly rejected under the same rationale as applied above with respect to the CRM claim. Also, Kadirkamanathan et al., para [0075] and [0077] teach processor and memory/instructions, respectively.


Claims 4, 9, and 14 are rejected under pre-AIA 35 U.S.C. 103(a) as being obvious over US 20110035219, hereinafter referred to as Kadirkamanathan et al., in view of US 20170169814, hereinafter referred to as Pashine, further in view of US 20180286411, hereinafter referred to as Nakadai et al., and further in view of US 20190096396, hereinafter referred to as Jiang et al.

Regarding claim 4 (original), Kadirkamanathan et al., as modified by Pashine and Nakadai et al., discloses the non-transitory computer-readable recording medium according to claim 3, wherein

identifying the language having the largest phoneme count among the extracted one or more languages as the language matched with the input voice (“The language ID parameters database 116 is trained/filled with phoneme sequences for each spoken language. Sequences of phonemes unique to one or a few languages are identified. Phonemes patterns common to many different languages are also identified. The set of phonemes unique to one or a few languages may include phonemes and phoneme sequences that occur common to many languages but occur so commonly in those one or few languages that a high count of those phoneme or phoneme sequences occurrence is also a good indication that particular language is being spoken in the audio file under analysis,” Kadirkamanathan et al., para [0032].).

Kadirkamanathan et al., though, does not disclose that the identifying includes extracting one or more languages in which the sentence likelihood based on the linguistic model is equal to or more than a predetermined threshold value from the plurality of languages.

Jiang et al. is cited to disclose that the identifying includes extracting one or more languages in which the sentence likelihood based on the linguistic model is equal to or more than a predetermined threshold value from the plurality of languages (“For example, for the speech sentence 1, if the probability of belonging to recognized as the Shandong dialect is 0.99, since the 0.99 exceeds the preset threshold (e.g., 0.95), it is determined that the target linguistic category corresponding to the speech information is Shandong dialect,” Jiang et al., para [0037].). Jiang et al. benefits Kadirkamanathan et al. by avoiding a manual language switching method as part of the speech recognition, which is inefficient (Jiang et al., para [0003]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Kadirkamanathan et al. with those of Jiang et al. to improve the spoken language identification of Kadirkamanathan et al.
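
For illustration only, the two-stage selection recited in claim 4 (extracting the languages whose sentence likelihood is equal to or more than a predetermined threshold, then choosing the largest phoneme count among them) can be sketched as below. This is a minimal sketch, not the applicant's implementation. The function name is hypothetical, and returning `None` when no language passes the threshold is an assumption, since the claim language quoted here does not address that case.

```python
from typing import Dict, Optional


def identify_with_threshold(counts: Dict[str, int],
                            likelihoods: Dict[str, float],
                            threshold: float) -> Optional[str]:
    """Filter languages by sentence likelihood, then pick the largest phoneme count."""
    # Stage 1: keep only languages whose likelihood meets the threshold.
    candidates = [lang for lang in counts if likelihoods[lang] >= threshold]
    if not candidates:
        # Assumed behavior: report that no language qualified.
        return None
    # Stage 2: among the survivors, choose the largest phoneme count.
    return max(candidates, key=lambda lang: counts[lang])
```

For example, with phoneme counts `{"en": 10, "es": 12, "fr": 8}`, likelihoods `{"en": 0.97, "es": 0.40, "fr": 0.96}`, and a threshold of 0.95 (the example value in the Jiang et al. quotation), Spanish is excluded despite having the largest count, and English is selected.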

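As to claim 9, method claim 9 and CRM claim 4 are related as CRM and method of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 9 is similarly rejected under the same rationale as applied above with respect to the CRM claim.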

As to claim 14, device claim 14 and CRM claim 4 are related as CRM and device of using the same, with each claimed element’s function corresponding to the CRM step. Accordingly claim 14 is similarly rejected under the same rationale as applied above with respect to the CRM claim. Also, Kadirkamanathan et al., para [0075] and [0077] teach processor and memory/instructions, respectively.

Claim 16 is rejected under pre-AIA  35 U.S.C. 103(a) as being obvious over US 20110035219, hereinafter referred to as Kadirkamanathan et al., in view of US 20170169814, hereinafter referred to as Pashine, and further in view of US 20150127349, hereinafter referred to as Agiomyrgiannakis.

Regarding claim 16 (New), Kadirkamanathan et al., as modified by Pashine, discloses the non-transitory computer-readable recording medium according to claim 1, but not wherein the processing further includes: 

converting the input voice from the identified type of language to another type of language which is different from the identified type of language outputting the converted input voice.

Agiomyrgiannakis is cited to disclose converting the input voice from the identified type of language to another type of language which is different from the identified type of language outputting the converted input voice (“More specifically, an input utterance 1001 in an input language and in the voice of an input speaker may recognized by the ASR subsystem 1002, converted to text 1003 in the input language, translated by the language translation system 1004 into translated text 1005 in the output language, the converted by the TTS subsystem 1006 to a translated utterance 1007 in the output language and in the voice of the input speaker,” Agiomyrgiannakis, para [0167].). Agiomyrgiannakis benefits Kadirkamanathan et al. by providing cross-lingual voice conversion (Agiomyrgiannakis, Abstract). Therefore, it would be obvious for one skilled in the art to combine the teachings of Kadirkamanathan et al. with those of Agiomyrgiannakis to extend the language identification capabilities of Kadirkamanathan et al.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.  

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656