DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 21, 23-25, 31, and 33-35 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 21 and 31 recite “first feature data” in the limitation “determining first feature data representing acoustic attributes corresponding to the first reference audio data.” The specification at ¶0140 refers to Fig. 11 as showing feature data values. “First feature data” is thus not clear.
Claims 21 and 31 recite “second feature data” in the limitation “determining second feature data representing acoustic attributes corresponding to the input audio data.” The specification at ¶0140 refers to Fig. 11 as showing feature data values. “Second feature data” is thus not clear.
Claims 21 and 31 recite “first feature data” and “second feature data” in the limitation “processing the first feature data and the second feature data using a trained model to determine output data representing at least one difference between the input audio data and the first reference audio data.” The specification at ¶0140 refers to Fig. 11 as showing feature data values. “First feature data” and “second feature data” are thus not clear.
Claims 23-25 and 33-35 recite “first context data.” The specification at ¶0132 discloses “context data,” which represents the first speaker’s environment, circumstances, location, etc. “First context data” is thus not clear.

Claims 1-20 are cancelled.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 21-22, 25-32, and 35-40 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Craner (US #2004/0013252).

Regarding Claim 21, Craner discloses a computer-implemented method (Fig. 15; ¶0147), comprising:
receiving input audio data (Craner Fig. 15: talker);
determining that the input audio data represents audio from a user associated with a user profile (Craner Fig. 15: talker indicator 1503; ¶0129);
selecting first reference audio data associated with the user profile (Craner ¶0131 discloses that, since the prior neutral sample is determined when, for example, the talker first introduces themselves, the voice is assumed to be in a relatively relaxed state [in most cases]. ¶0132 discloses that the prior neutral sample can also be obtained at some other time during the telephone conversation. For example, the listener may select some time in the middle of the talker's conversation when it is believed that the talker's voice is in a relaxed state. Alternatively, the sample can be taken over an extended duration so the prior neutral sample device 1508 can statistically determine the neutral sample of that particular speaker. Additionally, the neutral sample can be retrieved from a call history database or a corporate server; Fig. 15: prior neutral sample 1508);
determining first feature data representing acoustic attributes corresponding to the first reference audio data (Craner ¶0133 discloses the talker dependent emotion detector compares the prior neutral sample obtained from the prior neutral sample device 1508 to the voice signal transmitted over the network, by using a correlation program. For example, if the talker is mad, the voice will likely be louder than if the talker is speaking in their normal voice);
determining second feature data representing acoustic attributes corresponding to the input audio data (Craner ¶0134 discloses by comparison, if the talker is happy, the talker will likely be talking in a softer voice); and
processing the first feature data and the second feature data using a trained model to determine output data representing at least one difference between the input audio data and the first reference audio data (Craner ¶0135 discloses that a talker independent emotion detector 1504 is also provided that operates similarly to the talker dependent emotion detector 1506 except that a "typical" person's voice is stored in the prior neutral sample device 1508 instead of the actual talker's voice. This compares the talker's voice to a normal talker. For example, a typical angry person will "raise their voice", thus increasing the volume. ¶0136 discloses that a judge emotion detector 1514 is included, which minimizes errors caused by inaccuracies of the talker independent emotion detector 1504 used alone by weighing the output from the talker independent emotion detector 1504 and the talker dependent emotion detector 1506. As such, the emotion indicated by both the talker independent emotion detector 1504 and the talker dependent emotion detector 1506 can be quantified).
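
For illustration only, the comparison of a prior neutral sample against the current voice signal cited above from Craner ¶0133 can be sketched as follows. This is a hypothetical sketch, not Craner's implementation: Craner refers to a "correlation program," whereas this example substitutes a simple RMS loudness comparison, and the function names `rms` and `loudness_difference_db` are assumptions introduced here.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudness_difference_db(neutral: Sequence[float],
                           current: Sequence[float]) -> float:
    """Loudness of the current signal relative to the neutral sample, in dB.

    A positive value means the current signal is louder than the neutral
    sample. This is an assumed stand-in for the comparison step, not the
    correlation program mentioned in the reference.
    """
    neutral_rms = rms(neutral)
    current_rms = rms(current)
    if neutral_rms == 0.0 or current_rms == 0.0:
        raise ValueError("silent input: loudness ratio is undefined")
    return 20.0 * math.log10(current_rms / neutral_rms)
```

Under these assumptions, a current signal with twice the amplitude of the neutral sample yields a difference of roughly +6 dB, consistent with the cited example of a louder voice.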

Regarding Claim 22, Craner discloses the computer-implemented method of claim 21, further comprising:
determining the user profile is associated with the first reference audio data and second reference audio data (Craner ¶0131 discloses the prior neutral sample device 1508 receives, and stores, a sample of each of the talker's voices. The talker dependent emotion detector 1506, which has received the identity of the present talker who is speaking from the talker indicator, also receives the voice signal directly over the network via the splitter 1520, and also receives a prior neutral sample from the prior neutral sample device 1508; Fig. 15),
wherein the selecting comprises selecting the first reference audio data instead of the second reference audio data (Craner Fig. 15: pre-stored talker 1520 is optional. A prior neutral sample from the prior neutral sample device 1508 is therefore selected).

Regarding Claim 25, Craner in view of Osotio discloses the computer-implemented method of claim 23,
wherein the first context data indicates the user is interacting with another user (Craner ¶0037 discloses that the overall configuration allows for the identity of a current talker to be transmitted to a listener located proximate the voice-projecting device. The listener located at the voice-projecting device can therefore use output from the voice-projecting device to determine the identity of the current talker. The listener can make this determination by being presented with other talker characteristics such as the talker's geographical location, identifier, name, biographical data, emotional state, group membership, opinion on various topics, estimated absolute coordinates, estimated relative coordinates with respect to other conference participants).

Regarding Claim 26, Craner discloses the computer-implemented method of claim 21,
wherein the first reference audio data corresponds to baseline speech spoken by the user (Craner ¶0132 discloses the prior neutral sample can also be obtained at some other time during the telephone conversation. For example, the listener may select some time in the middle of the talker's conversation when it is believed that the talker's voice is in a relaxed state. Alternatively, the sample can be taken over an extended duration so the prior neutral sample device 1508 can statistically determine the neutral sample of that particular speaker).

Regarding Claim 27, Craner discloses the computer-implemented method of claim 21,
wherein the first reference audio data corresponds to a neutral state of the user (Craner ¶0132 discloses the prior neutral sample can also be obtained at some other time during the telephone conversation. For example, the listener may select some time in the middle of the talker's conversation when it is believed that the talker's voice is in a relaxed state. Alternatively, the sample can be taken over an extended duration so the prior neutral sample device 1508 can statistically determine the neutral sample of that particular speaker).

Regarding Claim 28, Craner discloses the computer-implemented method of claim 21,
wherein the output data further represents a first state of the user corresponding to the audio relative to a second state of the user corresponding to the first reference audio data (Craner ¶0137 discloses these quantifiable outputs from both the talker independent emotion detector 1504 and the talker dependent emotion detector 1506 are then weighted as indicated by W1 and W2 and input to the judge emotion detector 1514, which determines the most likely emotion of the talker. Based upon the determination by the judge emotion detector, an output verdict of emotion signal is transmitted to the visual emotion indicator 1516. ¶0138 discloses another talker may sound very angry. Both the talker dependent emotion detector 1506 [that would compare their angry voice to a prior neutral sample 1508] and the talker independent emotion detector 1504 [that would compare the angry voice to the voices of the public at large] would output these indications of angriness that would be weighed by the judge emotion detector. The judge emotion detector 1514 would then provide an indication of angry as the verdict of emotion to the visual emotion indicator 1516).
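
For illustration only, the weighting of the two detector outputs cited above from Craner ¶0137 can be sketched as follows. This is a hypothetical sketch under stated assumptions: the reference does not specify the score format, which of W1 and W2 applies to which detector, or how the verdict is chosen, so the per-emotion score dictionaries, the weight assignment, and the function name `judge_emotion` are all introduced here.

```python
from typing import Dict


def judge_emotion(independent: Dict[str, float],
                  dependent: Dict[str, float],
                  w1: float = 0.5,
                  w2: float = 0.5) -> str:
    """Return the emotion with the highest weighted combined score.

    Assumption: w1 scales the talker independent scores and w2 scales the
    talker dependent scores. An emotion missing from one detector's output
    is treated as a score of 0.0 for that detector.
    """
    emotions = set(independent) | set(dependent)
    if not emotions:
        raise ValueError("at least one detector must report a score")
    combined = {
        e: w1 * independent.get(e, 0.0) + w2 * dependent.get(e, 0.0)
        for e in emotions
    }
    # Sorting first makes ties resolve alphabetically, so the result is
    # deterministic.
    return max(sorted(combined), key=combined.get)
```

Under these assumptions, if both detectors score "angry" highest, the verdict is "angry"; when the detectors disagree, the weights determine which detector's indication prevails.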

Regarding Claim 29, Craner discloses the computer-implemented method of claim 21, wherein determining the first feature data comprises:
processing the first reference audio data using a first encoder to determine the first feature data, the encoder having at least a first processing layer and a second processing layer (Craner ¶0049 discloses talker indicators 32 that include spatialization equipment act to capture spatial coordinates of sound sources, [e.g. a plurality of talkers] relative to a reference at a first location [or multiple locations]. The coordinated information is encoded over the standard [typically mono] audio, and relayed by analog or digital means to a remote location where the coordinate information is decoded).

Regarding Claim 30, Craner discloses the computer-implemented method of claim 29, wherein determining the second feature data comprises:
processing the input audio data using a second encoder and data corresponding to the second processing layer to determine the second feature data (Craner ¶0050 discloses providing the sound spatialization equipment at the voice capture device 14 [but not at the voice projecting device 23] results in encoding of spatial information, and should not limit the resulting audio from being used by a listener who does not have equipment to decode this spatial relationship).

Claims 31-32 and 35-40 are rejected for the same reasons as set forth for Claims 21-22 and 25-30.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 23-24 and 33-34 are rejected under 35 U.S.C. 103 as being unpatentable over Craner (US #2004/0013252) in view of Osotio et al. (US #2018/0101776).

Regarding Claim 23, Craner discloses the computer-implemented method of claim 21, but may not explicitly disclose further comprising: receiving first context data associated with the input audio data; and determining the first context data corresponds to the first reference audio data, wherein the selecting is based at least in part on the first context data corresponding to the first reference audio data.
However, Osotio (abstract; Figs. 1-7) teaches receiving first context data associated with the input audio data (Osotio ¶0077 discloses that the emotional engine 604 receives data 616 from a variety of sources. Thus, the emotional engine 604 receives user data 606 from a user and/or user sources such as a user profile and/or user data 620. The emotional engine also receives device data 608 from devices 626. The data can come from applications 628, system functionality 630, sensors 632 and/or other device sources. This data includes, but is not limited to, location information, biometric [heart rate, skin temperature, respiration, etc.], movement history, application utilization, and/or a wide variety of other data. The emotional engine also receives service and/or application data 610 from services 624 that may provide data relevant to the user. Services and applications can analyze communication, such as voice, and other mechanisms to extract information about a user, the individual whom the user is communicating with, and/or the substance of the user's communication. Cameras can capture body language, eye contact, and so forth. Sensors can capture information about a user's environment [temperature, lighting, music playing, ambient noise level, and/or so forth], the force of typing on a keyboard, the smoothness and pressure of an ink trail, the pressure and duration of a tap on a touch input device, and/or so forth. Thus, the service/application data 610 can comprise any data from services or applications used by the user that are not accounted for by the devices. For example, in the conference call scenario above, the service/application data 610 can represent data collected by the conferencing service and/or digital assistant service. The user data 606, the device data 608 and the service/application data 610 comprise the contextual data utilized by the emotional engine to ascertain the emotional state vector); and
determining the first context data corresponds to the first reference audio data (Osotio ¶0078 discloses in addition to contextual data, public data 612 exists that can help ascertain the emotional state vector),
wherein the selecting is based at least in part on the first context data corresponding to the first reference audio data (Osotio ¶0031 discloses an algorithm can be used to capture tone of voice, volume, and other indicators of a user's emotional state).
Craner and Osotio are analogous art as they pertain to emotion state detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the talker emotion detection (as taught by Craner) to present emotional responses along with the context for the emotional responses (as taught by Osotio, ¶0033) in order to extract an emotional state from the variety of data and use the emotional state to customize the interaction of systems with a given user (Osotio, ¶0001).

Regarding Claim 24, Craner in view of Osotio discloses the computer-implemented method of claim 23. Craner may not explicitly disclose wherein the first context data indicates the user is participating in a first activity.
However, Osotio (abstract; Figs. 1-7) teaches wherein the first context data indicates the user is participating in a first activity (Osotio ¶0036 discloses in the present scenario, the processes identify an action to have the digital assistant recommend that Max and James work individually together on that part of the presentation. This can be identified, for example, by evaluating Max's emotional state vector which indicates a negative emotion such as frustration, a context which is collaborating on a presentation, with progress stalled on discussions for a particular section).
Craner and Osotio are analogous art as they pertain to emotion state detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the talker emotion detection (as taught by Craner) to present emotional responses along with the context for the emotional responses (as taught by Osotio, ¶0033) in order to extract an emotional state from the variety of data and use the emotional state to customize the interaction of systems with a given user (Osotio, ¶0001).

Claims 33-34 are rejected for the same reasons as set forth for Claims 23-24.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESHKUMAR PATEL/
Primary Examiner, Art Unit 2651