Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed on 4/19/2022 have been fully considered by the examiner.
Applicant's arguments on page 6 with respect to the rejection under 35 U.S.C. § 103 are persuasive. However, in light of the amendments, the claims are rejected on a new ground that relies on a new reference.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 4, 8, 10, 11, 15, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Korbecki (US 20150149179 A1) in view of Ganguly (US 20210357433 A1), Girardi (US 20210286831 A1) and Shafiullah (US 20170357636 A1).
With respect to claims 1, 8 and 15, Korbecki teaches an apparatus/method/system comprising:
a memory (Korbecki: [0024] The computer readable media may be transitory, including, but not limited to, propagating electrical or electromagnetic signals, or may be non-transitory including, but not limited to, volatile and non-volatile computer memory or storage devices such as a hard disk, floppy disk, USB drive, DVD, CD, media cards, register memory, processor caches, Random Access Memory ("RAM"), etc.); and
a first device; a second device (Korbecki: [0109] It should be noted that process 700 or any step thereof could be provided by any of the devices shown in FIGS. 3-4. For example, process 700 may be executed by control circuitry 304 (FIG. 3) as instructed by the media application);
a hardware processor communicatively coupled to the memory (Korbecki: [0024] The media guidance application and/or any instructions for performing any of the embodiments discussed herein may be encoded on computer readable media. Computer readable media includes any media capable of storing data. The computer readable media may be transitory, including, but not limited to, propagating electrical or electromagnetic signals, or may be non-transitory including, but not limited to, volatile and non-volatile computer memory or storage devices such as a hard disk, floppy disk, USB drive, DVD, CD, media cards, register memory, processor caches, Random Access Memory ("RAM"), etc.), the hardware processor configured to:
detect a plurality of spoken words in an audio signal of a conversation between a first person and a second person (Korbecki: [0053] For example, when two users are in viewing area 100, the media guidance application may present audible and/or visually any social network communication [conversation] that is received that is associated with either of the two users when both are engaged (e.g., are both associated with an attentiveness level that exceeds a threshold), and [0040] In some embodiments, the content recognition module or algorithm may also include speech recognition techniques, including but not limited to Hidden Markov Models, dynamic time warping, and/or neural networks (as described above) to translate spoken words [detect word] into text and/or processing audio data.);
generate a text file comprising a plurality of textual words representing the detected plurality of spoken words (Korbecki: [0040] In some embodiments, the content recognition module or algorithm may also include speech recognition techniques, including but not limited to Hidden Markov Models, dynamic time warping, and/or neural networks (as described above) to translate spoken words [detect word] into text and/or processing audio data);
[[transform each word in the plurality of textual words into a vector indicative of a meaning of that word to produce a plurality of vectors]]; and
[[analyze, using a multi-attention network, the plurality of vectors to produce]]
a first score indicative of how attentive the first person was to the second person during the conversation (Korbecki: [0107] In response to determining that the conversation pertains to the content being presented, the media application may determine whether the users who were engaged in the conversation are associated with a level of attentiveness that exceeds a given threshold [score] (absent the conversation). When the users who were engaged in the conversation are also associated with a level of attentiveness that exceeds a given threshold, the media application may determine that the users have a heightened level of interest in the particular content being presented.); and
[[a second score indicative of how pleased the second person was with the first person during the conversation]].
Korbecki fails to explicitly disclose, but Ganguly teaches, transform each word in the plurality of textual words into a vector having an orientation indicative of a meaning of that word to produce a plurality of vectors (Ganguly: [0005] For example, a ML system is trained on a corpus of words (objects). During the training, a vector representation with a word length of multiple numeric parameters (parameters) mapped to the word “cat”. When the training/mapping is complete the vector will have parameters with values that associate/map the vector with the word “cat”. When an unknown target word has a numeric representation that matches the values of the parameters in the vector mapped to the word “cat”, the system infers that the target word is “cat” and/or has the meaning of “cat”. In ML systems, this process is repeated for an enormous number of objects.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki in view of Ganguly in order to transform each word in the plurality of textual words into a vector indicative of a meaning of that word, because it is desirable to have vectors in a database with variable word lengths (or with variable numbers of non-zero value parameters) so that the word length can vary to be only as long as needed to represent the respective object ([0033], Ganguly).
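For illustration only, the limitation of transforming words into vectors whose orientation indicates meaning can be sketched as a table lookup followed by a cosine comparison. This is a minimal sketch and is not taken from Ganguly or from the application: the three-dimensional values in `TOY_EMBEDDINGS` are invented, and the names `words_to_vectors` and `cosine_similarity` are hypothetical.

```python
import math

# Hypothetical toy embeddings; the numbers are made up for illustration only.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def words_to_vectors(words):
    """Map each textual word to its vector (raises KeyError for unknown words)."""
    return [TOY_EMBEDDINGS[word] for word in words]


def cosine_similarity(u, v):
    """Compare the orientation of two vectors, ignoring their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


cat, kitten, invoice = words_to_vectors(["cat", "kitten", "invoice"])
# Words with related meanings point in similar directions.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Because cosine similarity depends only on direction, scaling a vector does not change the result, which is one common reading of "orientation indicative of a meaning".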
Korbecki and Ganguly fail to explicitly disclose, but Girardi teaches, analyze, using a multi-attention network, the plurality of vectors to produce (Girardi: [0074] The input set of query terms may be a vector representation of initial tokens. The input set of query terms x1 . . . xn are received at a multi-head-attention layer 402 that helps the encoder look at other words in the input set as it encodes a specific word. The outputs of the multi-head-attention layer 402 [multi-attention network] are fed to a feed-forward neural network 404.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki and Ganguly in view of Girardi in order to analyze the plurality of vectors using a multi-attention network, thereby increasing the reformulation accuracy of the received query and thus improving the performance of the data retrieval ([0063], Girardi).
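As a rough illustration of what a multi-head attention layer does with a plurality of vectors, the sketch below splits each vector into slices ("heads"), lets every position attend to every other position within each slice using scaled dot-product attention, and concatenates the results. It is a simplified sketch under stated assumptions, not Girardi's layer 402 or the claimed network: the learned query, key, value, and output projections of a real layer are omitted, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def multi_head_self_attention(vectors, num_heads):
    """Attend within each slice of the vectors, then concatenate the heads.

    Assumption: identity projections (no learned weight matrices).
    """
    dim = len(vectors[0])
    if dim % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        sliced = [v[h * head_dim:(h + 1) * head_dim] for v in vectors]
        head_outputs.append(attention(sliced, sliced, sliced))
    # Concatenate the per-head outputs for each position.
    return [
        sum((head[i] for head in head_outputs), [])
        for i in range(len(vectors))
    ]


out = multi_head_self_attention([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], 2)
print(len(out), len(out[0]))
```

Each output vector has the same length as its input, so the result can be passed to a later layer such as a feed-forward network.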
Korbecki, Ganguly and Girardi fail to explicitly disclose, but Shafiullah teaches, a second score indicative of how pleased the second person was with the first person during the conversation (Shafiullah: [0076] As referenced above, FIG. 6 illustrates an updated example of the sentimeter 502, illustrated as sentimeter 602. As shown, in the example of FIG. 6, the sentimeter 602 has been updated during the voice conversation to reflect a portion 604 illustrating 52% level of happiness [happiness] or satisfaction for the customer, while a remaining portion 606 illustrates a 48% level of dissatisfaction.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki, Ganguly and Girardi in view of Shafiullah in order to indicate how pleased the second person was with the first person during the conversation, so as to provide all related data processing and analyses ([0049], Shafiullah).



With respect to claims 3, 10 and 17, Korbecki teaches wherein the first score is higher the more attentive the first person was to the second person during the conversation (Korbecki: [0107] In response to determining that the conversation pertains to the content being presented, the media application may determine whether the users who were engaged in the conversation are associated with a level of attentiveness that exceeds a given threshold [score] (absent the conversation)).

With respect to claims 4, 11 and 18, Shafiullah further teaches wherein the second score is higher the more pleased the second person was with the first person during the conversation (Shafiullah: [0076] As referenced above, FIG. 6 illustrates an updated example of the sentimeter 502, illustrated as sentimeter 602. As shown, in the example of FIG. 6, the sentimeter 602 has been updated during the voice conversation to reflect a portion 604 illustrating 52% level of happiness [happiness] or satisfaction for the customer, while a remaining portion 606 illustrates a 48% level of dissatisfaction.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki, Ganguly and Girardi in view of Shafiullah in order for the second score to be higher the more pleased the second person was with the first person during the conversation, so as to provide all related data processing and analyses ([0049], Shafiullah).



Claims 5, 6, 12, 13, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Korbecki (US 20150149179 A1), Ganguly (US 20210357433 A1), Girardi (US 20210286831 A1) and Shafiullah (US 20170357636 A1) as applied to claims 1, 5, 8, 12, 15 and 19 respectively, and in further view of Cheung (US 20210151058 A1).

With respect to claims 5, 12 and 19, Korbecki, Ganguly and Girardi fail to explicitly disclose, but Cheung teaches, wherein the hardware processor is further configured to determine which words of the plurality of spoken words were spoken by the first person and which words of the plurality of spoken words were spoken by the second person (Cheung: [0060] Speech transcriber 346 perform functions relating to transcribing speech segments recognized by speech recognition engine 342. For example, speech transcriber 346 produces text output of the one or more speech segments recognized by speech recognition engine 342 with an indication of the one or more speakers identified by speaker identifier 344. In some examples, speech transcriber 346 produces text output of the one or more speech segments recognized by speech recognition engine 342 that are associated with the user of HMD 112 (e.g., user 110). In other words, in some examples, speech transcriber 346 only produces text output for the one or more speech segments spoken by the user of HMD 112, as identified by speaker identifier 344. Either way, speech transcriber 346 then stores the text output in transcriptions 336.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki, Ganguly and Girardi in view of Cheung in order to determine which words of the plurality of spoken words were spoken by the first person and which words of the plurality of spoken words were spoken by the second person, so as to improve transcription and/or speaker identity accuracy ([0039], Cheung).

With respect to claims 6, 13 and 20, Korbecki teaches wherein the first score is determined by analyzing the words spoken by the first person (Korbecki: [0107] In response to determining that the conversation pertains to the content being presented, the media application may determine whether the users who were engaged in the conversation are associated with a level of attentiveness that exceeds a given threshold [score] (absent the conversation). When the users who were engaged in the conversation are also associated with a level of attentiveness that exceeds a given threshold, the media application may determine that the users have a heightened level of interest in the particular content being presented.); and
Korbecki does not explicitly disclose, but Shafiullah teaches, the second score is determined by analyzing the words spoken by the second person (Shafiullah: [0076] As referenced above, FIG. 6 illustrates an updated example of the sentimeter 502, illustrated as sentimeter 602. As shown, in the example of FIG. 6, the sentimeter 602 has been updated during the voice conversation to reflect a portion 604 illustrating 52% level of happiness [happiness] or satisfaction for the customer, while a remaining portion 606 illustrates a 48% level of dissatisfaction.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki, Ganguly and Girardi in view of Shafiullah in order to determine the second score by analyzing the words spoken by the second person, so as to provide all related data processing and analyses ([0049], Shafiullah).

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Korbecki (US 20150149179 A1), Ganguly (US 20210357433 A1), Girardi (US 20210286831 A1) and Shafiullah (US 20170357636 A1) as applied to claims 1 and 8 respectively, and in further view of Thomson (US 20210151058 A1).
With respect to claims 7 and 14, Korbecki, Ganguly and Girardi fail to explicitly disclose, but Thomson teaches, wherein the audio signal has a bitrate of at least 64 kilobits per second (Thomson: [0068] For example, if the audio is encoded in a 64 kb/s format, and the transcriptions are generated at a peak rate of 200 bits/second, then the threshold may be set at 64.1 kb/s.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Korbecki, Ganguly, Girardi and Shafiullah in view of Thomson in order for the audio signal to have a bitrate of at least 64 kilobits per second, so as to improve technology with respect to audio communications and transfer of communications between devices ([0025], Thomson).
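To make the 64 kilobits per second figure concrete, the sketch below computes the bitrate of uncompressed PCM audio as sample rate times bits per sample times channel count, and checks it against the claimed minimum. The example parameters (8 kHz, 8 bits, mono, as in ordinary telephone-grade audio) and the names `pcm_bitrate_bps` and `meets_minimum_bitrate` are assumptions chosen for illustration; they do not come from Thomson or from the application.

```python
MIN_BITRATE_BPS = 64_000  # "at least 64 kilobits per second"


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def meets_minimum_bitrate(bitrate_bps, minimum_bps=MIN_BITRATE_BPS):
    """True when the audio signal satisfies the minimum-bitrate limitation."""
    return bitrate_bps >= minimum_bps


# Assumed example: 8,000 samples/s * 8 bits * 1 channel = 64,000 bits/s.
telephone_grade = pcm_bitrate_bps(8_000, 8)
print(telephone_grade, meets_minimum_bitrate(telephone_grade))
```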

Allowable Subject Matter
Claims 2, 9 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims.
Claims 2, 9 and 16 recite “a multi-task model that produces the first score and the second score based on an output from the bi-directional attention layer.” The closest teachings come from Song (US 20200193974 A1), who teaches “18. The apparatus of claim 12, wherein, for the recognition of the plural frames of the first speech, the processor is configured to use calculated acoustic scores of frames of the extracted select frames, calculated by the bi-directional neural network-based acoustic model, for inferring acoustic scores of respective frames of the plural frames of the first speech that correspond to the frames of the extracted select frames, the use of the calculated acoustic scores including deriving an acoustic score of one of the plural frames other than the extracted select frames, as an adjacent frame and being adjacent to one or more of the frames of the extracted select frames or the respective frames of the first speech, based on one or more calculated acoustic scores of the frames of the extracted select frames and/or one or more derived acoustic scores of the respective frames of the first speech.” However, neither Song nor any other cited reference teaches a bi-directional attention multi-task model that produces the first and second scores based on an output from the bi-directional attention layer.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
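The period arithmetic in the preceding paragraph can be sketched as simple calendar-month addition. This is an illustrative sketch only and not a docketing tool: the mailing date used is hypothetical, the helper names are invented, and the code does not model the advisory-action rule described above or any adjustment for weekends and federal holidays. It assumes that a period counted in months ends on the same day of the month, or on the last day of the month when that day does not exist.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Add calendar months, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def reply_window(mailing_date):
    """Return the three-month shortened period end and the six-month outer limit."""
    return {
        "shortened_period_ends": add_months(mailing_date, 3),
        "statutory_maximum": add_months(mailing_date, 6),
    }


# Hypothetical mailing date, chosen only to exercise the arithmetic.
window = reply_window(date(2022, 7, 15))
print(window["shortened_period_ends"], window["statutory_maximum"])
```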
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657   

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657