DETAILED ACTION
This office action is in response to Applicant’s submission filed on 9/7/2021. Claims 1, 3, 4, 6 – 9, and 11 are pending in the application.  Claims 1, 4, 7 – 9, and 11 are amended. Claims 2, 5, and 10 are cancelled. As such, claims 1, 3, 4, 6 – 9, and 11 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. PCT/JP2018/018051, filed on 5/17/2017.

Drawings
The drawings filed on 11/13/2019 have been accepted and considered by the examiner.

Response to Arguments
Applicant’s amendment filed with respect to the 35 USC §112(f) rejection raised in the previous office action has been fully considered. The claimed invention, as 
Applicant's arguments filed with respect to the 35 USC §101 (Signal per se) rejections raised in the previous office action have been fully considered and are persuasive. The claimed invention, as currently amended, overcomes the 35 USC §101 (Signal per se) rejections. Therefore, the 35 USC §101 (Signal per se) rejections are withdrawn.
Applicant's arguments filed with respect to the 35 USC §101 rejections raised in the previous office action have been fully considered but are not persuasive. The claimed invention, as currently amended, does not overcome the 35 USC §101 rejections. Therefore, the 35 USC §101 rejections are sustained. On pages 5 through 7, Applicant argues:
However, amended Claim 1 now recites "a step for generating a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, the word sequence being generated by automatic speech recognition; a step for generating a primary multi-stream expression obtained by integrating the primary stream expression; a step for generating a secondary stream expression for each speaker, the secondary stream expression being a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression; a step for generating a secondary multi-stream expression obtained by integrating the secondary stream expression and a step for calculating a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression." 
First, the claim refers to a combination of additional elements a) - g). 
a) A primary multi-stream expression is generated by integrating the primary stream expression. 
b) a secondary stream expression for each speaker, which is a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression, is generated for each speaker. 

Therefore, the claims as a whole integrate any interpreted judicial exception into a practical application. Specifically, additional elements have made it possible to build highly accurate multi-stream document discriminative models based on methods that provide specific improvements to traditional systems. Additionally, this constitutes an improvement of computer functions or improvement of other technologies and technical fields. 
Furthermore, the amendments make it clear that "the word sequence being generated by automatic speech recognition.... calculating a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression." Therefore, the invention of amended Claim 1, when considered as a whole, provide the additional practical application of allowing a system to determine which predetermined class is applicable to a speech between a plurality of speakers where it is explicitly that a human has not listened to the speech and transcribed the utterances with a pen and paper.

Examiner respectfully disagrees and does not find the speculation put forth by the Applicant persuasive. For the claims to fall outside the abstract idea (organizing human activity), a human would have to be unable to carry out the claimed concept without employing the proposed invention. If the processing circuitry and the other generic elements are set aside, a human can still perform the same activities as claimed. Therefore, the Applicant’s argument is not persuasive. Examiner respectfully directs the Applicant to the updated 101 section of this Office Action for further detail. As such, the 35 USC §101 rejections of claims 1, 3, 4, 6 – 9, and 11 are sustained.

With respect to the rejection of claims 1, 3, 4, 6 – 9, and 11 under 35 U.S.C. §102(a)(1) as being anticipated by Bouaziz et al. (Parallel Long Short Term Memory for Multi-Stream Classification, 2016 IEEE Spoken Language Technology 
With respect to the rejection of Claim 1 under 35 U.S.C. §102(a)(1), Applicant respectfully traverses this ground of rejection. Claim 1 recites: 
generate a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, the word sequence being generated by automatic speech recognition; 
generate a primary multi-stream expression obtained by integrating the primary stream expression; 
generate a secondary stream expression for each speaker, the secondary stream expression being a fixed- length vector generated based on the word sequence of each speaker and the primary multi-stream expression; 
generate a secondary multi-stream expression obtained by integrating the secondary stream expression; and 
calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression. 
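For orientation, the data flow recited in the five limitations above can be sketched in code. The following is an illustrative sketch only and is not taken from the Applicant’s specification or from Bouaziz: the tanh recurrent encoder, the element-wise mean used for “integrating”, the softmax output layer, the random untrained weights, and all function and variable names are assumptions, and the hard-coded word lists merely stand in for automatic speech recognition output.

```python
import math
import random


def rnn_encode(tokens, embed, w_in, w_rec, context=None):
    """Run a simple tanh RNN over a word sequence and return the final hidden
    state, i.e. a fixed-length vector. If `context` is given, it is appended
    to every word embedding before the recurrent update."""
    hidden = [0.0] * len(w_rec)
    extra = context if context is not None else []
    for tok in tokens:
        x = embed[tok] + extra  # list concatenation: embedding plus optional context
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[i], x))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(hidden))
        ]
    return hidden


def integrate(vectors):
    """Integrate per-speaker vectors into one vector (element-wise mean; an assumption)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(streams, params):
    embed, w1_in, w1_rec, w2_in, w2_rec, w_out = params
    # Step 1: primary stream expression for each speaker.
    primary = [rnn_encode(s, embed, w1_in, w1_rec) for s in streams]
    # Step 2: primary multi-stream expression.
    primary_multi = integrate(primary)
    # Step 3: secondary stream expression, re-reading each word sequence
    # together with the primary multi-stream expression.
    secondary = [
        rnn_encode(s, embed, w2_in, w2_rec, context=primary_multi) for s in streams
    ]
    # Step 4: secondary multi-stream expression.
    secondary_multi = integrate(secondary)
    # Step 5: posterior probability over the predetermined classes.
    scores = [sum(w * v for w, v in zip(row, secondary_multi)) for row in w_out]
    return softmax(scores)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
E, H, C = 3, 4, 2  # embedding size, hidden size, number of classes (all hypothetical)
vocab = ["hello", "refund", "please", "sure"]
params = (
    {w: [rng.uniform(-1.0, 1.0) for _ in range(E)] for w in vocab},
    rand_matrix(H, E, rng), rand_matrix(H, H, rng),      # first-pass weights
    rand_matrix(H, E + H, rng), rand_matrix(H, H, rng),  # second-pass weights
    rand_matrix(C, H, rng),                              # output layer
)
streams = [["hello", "refund", "please"], ["hello", "sure"]]  # one word list per speaker
posterior = classify(streams, params)
```

With untrained weights the resulting posterior has no predictive meaning; the sketch only shows that the second pass re-reads each speaker’s word sequence together with the primary multi-stream expression.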
Turning to the applied art, Bouaziz does not use information from other streams (for example, in the case of a call center, the words spoken by the operator and the words spoken by the customer are common topics) at all when converting each stream into a fixed-length vector, so it does not use common information between streams, and high discrimination performance could not be expected. 
In the invention defined by Claim 1, the afore-mentioned features are utilized to realize high discrimination performance. 
Specifically, the information of all streams is once integrated as a fixed-length vector to generate "a primary multi-stream expression", which is used as additional information when each stream is read again. 
Specifically, by generating a secondary stream expression, which is a fixed-length vector based on the word sequence for each speaker and the "primary multi-stream expression," for each speaker, the important part of each stream can be obtained. 
Thus, Claim 1 allows construction of an important fixed-length vector. 
On the contrary, Bouaziz does not disclose or suggest any idea of integrating the primary stream expression as a fixed length vector to generate a "primary multi-stream expression" and reflecting this in the secondary stream expression. 
Therefore, Bouaziz fails to disclose or suggest all of "generate a primary stream expression for each speaker, the primary stream expression being a 

According to the Specification as filed, Par. 0005: “… provides a technique for identifying a class of a document by using a multi-stream document. [[Bouaziz]] employs a method in which recurrent neural network [RNN] structures are prepared for respective streams [texts corresponding to speech uttered by respective persons who have attended calls and meetings], the respective streams are converted into fixed-length [fixed-dimension] vectors, and then, the pieces of information are integrated to perform identification, with respect to a target multi-stream document.” Therefore, per the specification as filed, Bouaziz discloses fixed-length vectors, which contradicts the argument set forth by the Applicant on page 8 of the remarks filed. Furthermore, providing two sets of streams from the same set of information amounts to having a couple of hypotheses at the output of an ASR system, and it does not appear to be a genuine inventive concept. In Examiner’s view, once Bouaziz discloses a fixed-length vector, which according to Applicant it does, the rest of the argument is moot.
With respect to a posteriori probability, Examiner has introduced a new reference which teaches the art in question: Huang et al. (Rapid Feature Space Speaker Adaptation for Multi-stream HMM-Based Audio-Visual Speech Recognition, 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, Netherlands). See Huang, Section 1, right column, second paragraph: “Moreover when only a small amount of on-line adaptation data is available, we could estimate the fMLLR transform to maximize the a posteriori probability as opposed to just the likelihood, we call this method fMAPLR.”, and Section 2, second page, left column, second paragraph: “In the multi-stream HMM decision fusion approach, the single modality observations are assumed generated by audio-only and visual-only HMMs of identical topologies with class-conditional emission probabilities P_a(O_{a,t} | c) and P_v(O_{v,t} | c), respectively, where c∈C denotes the speech classes of interest such as context dependent sub-phonetic units. Both are modeled as mixtures of Gaussian densities. Based on the assumption that audio and visual streams are independent, we compute the joint probability P_av(O_{av,t} | c), as follows”. Consequently, Examiner respectfully disagrees and finds the Applicant’s argument moot in view of Bouaziz and Huang as mentioned supra.
For at least the reasons provided supra, Applicant’s arguments are found not persuasive. Examiner respectfully disagrees, and therefore the rejections of claims 1 and 4, previously made under 35 U.S.C. §102(a)(1), are now maintained under 35 U.S.C. §103 and updated accordingly.
With respect to the rejection of the remaining dependent claims 3, 6 – 9, and 11 under 35 U.S.C. §103, to the extent these claims are argued on the same rationale presented in the remarks filed 9/7/2021, the reasons provided above in response to the arguments directed to claims 1 and 4 apply equally. For at least those same reasons, Examiner likewise respectfully disagrees, and Applicant’s arguments are also found not persuasive. Consequently, the rejections of claims 1, 3, 4, 6 – 9, and 11 are sustained.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3, 4, 6 – 9, and 11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter without significantly more. The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea.
Independent claims 1 and 4 recite: “A document identification device comprising: processing circuitry configured to generate a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, the word sequence being generated by automatic speech recognition; generate a primary multi-stream expression obtained by 
The limitations of “generation”, “obtaining”, and “integrating”, as drafted, cover methods of organizing human activity; as such, they all point to an abstract idea. Generating a primary stream expression can be performed by a human by attentively listening to the speech and transcribing the utterances with pen and paper. Obtaining a single-stream or multi-stream expression can similarly be accomplished by a human by attentively listening to an utterance, writing it down on a piece of paper, and then arranging the words in the proper order to form a single stream or multiple streams. Integrating the primary stream expression to create a secondary stream expression can also be carried out on paper, by writing out the various transcriptions and compiling them into a single entity, mixing the original or primary series of words with another set of words and ordering them as required.  Furthermore, there is an amended and added limitation of “a class identification unit that calculates a posteriori probability with respect to a predetermined class, based on the 
This judicial exception is not integrated into a practical application. Even though claims 1 and 4 do not recite any direct dependency on a processor, they are tied to a processor through the “unit” recited in the claim, a storage device, or programs; however, the Applicant’s specification as filed relies on executing the controller on a general-purpose computer. For example, Par. 0045 states: “Each device according to the present invention has, as a single hardware entity, for example, an input unit to which a keyboard or the like is connectable, an output unit to which a liquid crystal display or the like is connectable, a communication unit to which a communication device (for example, communication cable) capable of communication with the outside of the hardware entity is connectable, a central processing unit (CPU, which may include cache memory and/or registers), RAM or ROM as memories, an external storage device which is a hard disk, and a bus that connects the input unit, the output unit, the CPU, the RAM, the ROM, and the external storage device so that data can be exchanged between them. The hardware entity may also include, for example, a device (drive) capable of reading and writing a recording medium such as a CD-ROM as desired. A physical entity having such hardware resources may be a general-purpose computer, for example”. These additional elements (pre- and post-solution activities plus the computer elements enumerated here) do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f), 2106.04(d)). The claim is directed to an abstract idea.
Likewise, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer, which due to lack of specificity is considered a general-purpose computer (or processor), amounts to no more than mere instructions to apply the exception. See Par. 0046 of the Applicant’s Specification: “The external storage device of the hardware entity has stored therein programs necessary for embodying the aforementioned features and data necessary in the processing of the programs (in addition to the external storage device, the programs may be prestored in ROM as a storage device exclusively for reading out, for example). Also, data or the like resulting from the processing of these programs are stored in the RAM and the external storage device as appropriate.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the 
Claims 8 and 9 are directed toward human activity. They recite: “wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression.” Putting a stream of word sequence into a fixed-length vector can be accomplished by a human with pen and paper. Having a “feature” of a recurrent neural network is not clearly defined and as such does not by itself constitute an additional element. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
Claims 3 and 6 are directed toward human activity. They recite: “wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression.” Putting a stream of word sequence along with another sequence into a fixed-length vector can be accomplished by a human with pen and paper. Having a “feature” of recurrent 
Claims 7 and 11 are directed to a program for making a computer function as means for performing the abstract idea. These recited means are interpreted as functions/algorithms performed by a processor, which is nothing more than a generic computer executing program code. As a result, a computer executing the program does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d). As a result, these claims are directed to an abstract idea.
Therefore, claims 1, 3, 4, 6 – 9, and 11 are not patent eligible under 35 USC 101.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 4, 6 – 9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Bouaziz et al. (Parallel Long Short Term Memory for Multi-Stream Classification, 2016 IEEE Spoken Language Technology Workshop) (hereinafter Bouaziz), Applicant’s Admitted Prior Art (AAPA) as set forth in Paragraphs 0005 – 0008 of the Specification as filed, and Huang et al. (Rapid Feature Space Speaker Adaptation for Multi-stream HMM-Based Audio-Visual Speech Recognition, 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, Netherlands) (hereinafter Huang).

Bouaziz was applied in the previous Office Action.
Regarding claim 1, AAPA of Bouaziz (Par. 5 of Specification) teaches a document identification device comprising: processing circuitry configured to generate a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, the word sequence being generated by automatic speech recognition; generate a primary multi-stream expression obtained by integrating the primary stream expression; generate a secondary stream expression for each speaker, the secondary stream expression being a fixed-length vector generated based on the word sequence of each speaker and the primary multi-stream expression; generate a secondary multi-stream expression obtained by integrating the secondary stream expression (ABS: “This paper presents an original LSTM-based architecture, named Parallel LSTM [PLSTM], that carries out multiple parallel synchronized input sequences in order to predict a output. The proposed PLSTM method could be used for parallel sequence classification purposes. The PLSTM approach is evaluated on an automatic telecast genre sequences classification task and compared with different state-of-the-art architectures. Results show that the method outperforms the baseline n-gram models as well the state-of-the-art LSTM approach.”, and Applicant Specifications as filed Par. 0005: “… provides a technique for identifying a class of a document by using a multi-stream document. [[Bouaziz]] employs a method in which recurrent neural network [RNN] structures are prepared for respective streams [texts 
Bouaziz does not teach calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression.
Huang teaches calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression (Huang, Section 1, right column, second paragraph: “Moreover when only a small amount of on-line adaptation data is available, we could estimate the fMLLR transform to maximize the a posteriori probability as opposed to just the likelihood, we call this method fMAPLR.”, and Section 2, second page, left column, second paragraph: “In the multi-stream HMM decision fusion approach, the single modality observations are assumed generated by audio-only and visual-only HMMs of identical topologies with class-conditional emission probabilities P_a(O_{a,t} | c) and P_v(O_{v,t} | c), respectively, where c∈C denotes the speech classes of interest such as context dependent sub-phonetic units. Both are modeled as mixtures of Gaussian densities. Based on the assumption that audio and visual streams are independent, we compute the joint probability P_av(O_{av,t} | c), as follows:

[Equation image not reproduced: the joint probability P_av(O_{av,t} | c) of the audio and visual streams.]
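The passage quoted from Huang combines the class-conditional probabilities of two streams into a joint probability under an independence assumption, and separately refers to an a posteriori probability. A minimal sketch of that kind of computation follows. It is not reproduced from Huang: the stream-weight exponents `lam_a` and `lam_v` are an assumption (with both set to 1 the result is the plain independence product), and the class labels, likelihood values, priors, and function names are hypothetical.

```python
def fuse_streams(p_audio, p_visual, lam_a=1.0, lam_v=1.0):
    """Joint class-conditional likelihood from per-stream likelihoods.
    With lam_a = lam_v = 1 this is the product implied by stream independence."""
    return {c: (p_audio[c] ** lam_a) * (p_visual[c] ** lam_v) for c in p_audio}


def posterior(joint, prior):
    """Bayes' rule: P(c | O) is proportional to P(O | c) * P(c)."""
    unnormalized = {c: joint[c] * prior[c] for c in joint}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


# Hypothetical per-class likelihoods for one observation at time t.
p_a = {"c1": 0.6, "c2": 0.2}   # audio-only stream
p_v = {"c1": 0.5, "c2": 0.4}   # visual-only stream
prior = {"c1": 0.5, "c2": 0.5}

joint = fuse_streams(p_a, p_v)
post = posterior(joint, prior)
```

Note that the joint value is a likelihood given the class; it becomes a posterior over classes only after it is multiplied by a class prior and normalized, as in the second function.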

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Bouaziz in 

With respect to claim 3, Bouaziz teaches wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression. (Entire publication, and Applicant Specifications as filed Par. 0005: “… the respective streams are converted into fixed-length [fixed-dimension] vectors, and then, the pieces of information are integrated to perform identification, with respect to a target multi-stream document.”).

Regarding claim 4, AAPA of Bouaziz (Par. 5 of Specification) teaches a document identification device comprising: processing circuitry configured to generate a primary stream expression for each speaker, the primary stream expression being a fixed-length vector of a word sequence corresponding to each speaker's speech recorded in a setting including a plurality of speakers, the word sequence being generated by automatic speech recognition; generate a primary multi-stream expression obtained by integrating the primary stream expression; generate a secondary stream expression for each speaker, the secondary stream expression being a fixed-length vector generated based on the word sequence of each speaker 
Bouaziz does not teach calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression.
Huang teaches calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression (Huang, Section 1, right column, second paragraph: “Moreover when only a small amount of on-line adaptation data is available, we could estimate the fMLLR transform to maximize the a posteriori probability as opposed to just the likelihood, we call this method fMAPLR.”, and Section 2, second page, left column, second paragraph: “In the multi-stream HMM decision fusion approach, the single modality observations are assumed generated by audio-only and visual-only HMMs of identical topologies with class-conditional emission probabilities P_a(O_{a,t} | c) and P_v(O_{v,t} | c), respectively, where c∈C denotes the speech classes of interest such as context dependent sub-phonetic units. Both are modeled as mixtures of Gaussian densities. Based on the assumption that audio and visual streams are independent, we compute the joint probability P_av(O_{av,t} | c), as follows:

[Equation image not reproduced: the joint probability P_av(O_{av,t} | c) of the audio and visual streams.]

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Bouaziz in view of Huang to calculate a posteriori probability with respect to a predetermined class, based on the secondary multi-stream expression, in order to estimate the fMLLR transform to maximize the a posteriori probability as opposed to just the likelihood, as evidenced by Huang (see Section 1, next-to-last paragraph).

With respect to claim 6, Bouaziz teaches wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression. (Entire publication, and Applicant Specifications as filed Par. 0005:”… the respective streams are converted into fixed-length [fixed- 

With respect to claim 7, Bouaziz teaches a non-transitory computer-readable medium that stores a program for making a computer function as the document identification device according to (Section 4.3: “The classical LSTM, and the proposed P2LSTM and P4LSTM, are composed with 3 layers: input layer X of size varying from 1 to 4, a hidden layer h of size 80 for all LSTM-based models and an output layer y with size equals to the number of different possible TV genres [11]. The Keras library [20], based on Theano [21] for fast tensor manipulation and CUDA-based GPU acceleration, has been employed to train neural networks on an Nvidia GeForce GTX TITAN X GPU card.”).

With respect to claim 8, Bouaziz teaches wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression. (Entire publication, and Applicant Specifications as filed Par. 0005: “… the respective streams are converted into fixed-length [fixed-dimension] vectors, and then, the pieces of information are integrated to perform identification, with respect to a target multi-stream document.”).

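With respect to claim 9, Bouaziz teaches wherein the secondary stream expression is a fixed-length vector that is generated by calculating a function having a feature of a recurrent neural network based on the word sequence and the primary multi-stream expression. (Entire publication, and Applicant Specifications as filed Par. 0005: “… the respective streams are converted into fixed-length [fixed-dimension] vectors, and then, the pieces of information are integrated to perform identification, with respect to a target multi-stream document.”).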

With respect to claim 11, Bouaziz teaches a non-transitory computer-readable medium that stores a program for making a computer function as the document identification device according to (Section 4.3: “The classical LSTM, and the proposed P2LSTM and P4LSTM, are composed with 3 layers: input layer X of size varying from 1 to 4, a hidden layer h of size 80 for all LSTM-based models and an output layer y with size equals to the number of different possible TV genres [11]. The Keras library [20], based on Theano [21] for fast tensor manipulation and CUDA-based GPU acceleration, has been employed to train neural networks on an Nvidia GeForce GTX TITAN X GPU card.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Xiong et al. (US Patent Application Number: US20180329884A1) teach (Par. 0019):”generating a response string based at least .
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DARIOUSH AGAHI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656