DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
        The prior objections to Claims 4, 6, 13, 15 and 16 (12/22/2020) are hereby withdrawn in light of the amendments to the claims.

Response to Arguments
Applicant’s arguments with respect to independent claims 1, 12 and 20 and dependent claims 8 and 17, asserting that references Kumar and Sato do not disclose the new limitation “wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting” or the amended limitation “wherein the delivery setting comprises one or more of length of initial silence, urgency, insertion of non-content expressions, or an amount of change to any one or more of the delivery settings” (Amendment, pg. 8-9), have been considered but are moot in light of the new grounds of rejection relying on references Fedorov and Reddy, as provided in the rejections below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
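
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
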
1.   Claims 1-4, 7, 9-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar US PGPUB 2015/0281445 A1 (“Kumar”) in view of Sato US 5,949,854 (“Sato”) and Fedorov US PGPUB 2013/0300645 A1 (“Fedorov”).
         Per Claim 1, Kumar discloses a server, comprising: 
            a microprocessor (para. [0011]; para. [0063]; para. [0072]; para. [0423]); 
            a data storage (para. [0063]; para. [0120]; para. [0423]); and 
            a network interface (para. [0068]; para. [0125]; para. [0132], communication with the customer over a network requires a network interface); and 
            wherein the microprocessor: performs a two-way interactive voice communication with a user device connected, via the network interface, to a network, wherein the microprocessor provides a first portion of the two-way voice communication and receives a second portion of the two-way communication and wherein the first portion comprises speech generated by the microprocessor and the second portion comprises speech received by the microprocessor (para. [0003]; para. [0032]; The voice site called by the customer may be an automated interactive voice site that is configured to process, using pre-programmed scripts, information received from the customer that is input through the voice communications device being used by the user, and, in response, provide information to the user that is conveyed to the user through the voice communications device. The interaction between the customer and the voice site may 
          stores, in the data storage, a recorded two-way voice communication comprising an audio waveform recording of the second portion (para. [0026]-[0027]; para. [0120]; para. [0197])
           Kumar does not explicitly disclose the recorded two-way voice communication in the data storage comprising speech data utilized by the microprocessor to generate the speech of the first portion
           However, this feature is taught by Sato (Abstract; col. 2, ln 25-39; col. 17, ln 41-58)
           Kumar in view of Sato does not explicitly disclose wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting
           However, this feature is taught by Fedorov (fig. 13, elements 2040, 2050, 2060, 2070; para. [0068]; para. [0072]; para. [0075]-[0076]; Utilizing one or more of the statistical variables selected in (a)-(d) above, a voice expression value is generated, the generated voice expression value being associated with user 8's current emotional state…The CPU 20 outputs the determined emotion to mapping algorithm 1020 which maps input means 49 to input means-independent events 1030, wherein the events 1030 are defined to trigger changes in computer's emotional state…, para. [0077]-[0078]; Output of the update algorithm 1050 are time-stamped new values for each of the basic emotions in an emotional matrix 1062 and time-stamped records of events related to changes in emotional variables written into a learning database…, para. [0094])
         It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sato with the server of Kumar in arriving at the “recorded two-way voice communication in the data storage comprising speech data utilized by the microprocessor to generate the speech of the first portion”, as well as to combine the teachings of Fedorov with the server of Kumar in view of Sato in arriving at “wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting”, because such combination would have resulted in improving service by synthesizing voices with voice quality suited to a user (Sato, col. 18, ln 5-24) as well as emotionally enriching the process of man-machine communication, fulfilling man's natural expectation of the emotional stream complementing information (Fedorov, para. [0011]; para. [0021])
          Per Claim 2, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
              Sato discloses wherein the microprocessor generates the speech of the first portion, comprising: selecting content from a plurality of content maintained in the data storage, wherein the selected content comprises a content identifier and an audio waveform file (The log data storage unit stores contents of a plurality of processes executed in an interactive form with a user through voice responses in the form of log data indicating a record of time-transitions and stores identification data for identifying each of the contents of the plurality of processes as a part of contents of the voice responses…, col. 2, ln 25-39; col. 8, ln 13-55); 
            generating sound from the audio waveform file (col. 8, ln 13-38; col. 9, ln 3-20);
            providing the sound to the first portion (col. 9, ln 3-20); and 
            setting the speech data to comprise the content identifier (Abstract; col. 2, ln 50-55; col. 17, ln 1-21).
          Per Claim 3, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
             Sato discloses wherein the microprocessor provides the first portion, comprising: selecting content from a plurality of content maintained in the data storage, wherein the selected content comprises a content identifier and an associated textual representation of the content (The log data storage unit stores contents of a plurality of processes executed in an interactive form with a user through voice responses in the form of log data indicating a record of time-transitions and stores identification data for identifying each of the contents of the plurality of processes as a part of contents of the voice responses…a content of the voice response that is stored in the log data storage unit is defined as character data…, col. 2, ln 25-55); 
            generating a spoken form of the textual representation of the content (col. 2, ln 50-55);
            providing the spoken form of the textual representation of the content as the first portion (col. 2, ln 25-55); and 
           setting the speech data to comprise the content identifier (Abstract; col. 2, ln 50-55; col. 17, ln 1-21).
          Per Claim 4, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
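              Sato discloses wherein the microprocessor provides the first portion, comprising: accessing an attribute of a user of the user device (An attribute storage stores an age of each user.  A voice volume controller sets a volume level of a voice response for a user to a level corresponding to the user's age …, Abstract);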
             setting at least a portion of the speech data in accordance with the attribute (col. 6, ln 12-18; col. 17, ln 1-30); and
              generating the speech in accordance with the speech data (Abstract; col. 6, ln 12-18; col. 17, ln 1-30).
         Per Claim 7, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
                Sato discloses wherein the speech data comprises a content delivery setting (col. 6, ln 12-18; col. 16, ln 40-47). 
          Per Claim 9, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
                Sato discloses wherein the speech data is a non-audio representation of the first portion without any audio waveform data (Abstract; col. 2, ln 25-39; col. 17, ln 41-58).
         Per Claim 10, Kumar in view of Sato and Fedorov discloses the server of claim 1, 
               Sato discloses wherein the speech data comprises a speech setting determining non-verbalized content of the first portion (Abstract; col. 17, ln 1-21). 
          Per Claim 11, Kumar in view of Sato and Fedorov discloses the server of claim 10, 
               Sato discloses wherein the speech setting comprises at least one of apparent age, gender, nationality, accent, location, dialect, formality, education, confidence, patience, pace, or mood (Abstract; col. 17, ln 1-21).
         Per Claim 12, Kumar discloses a method, comprising: 

           generating, by the microprocessor, a first portion of the interactive two-way voice communication comprising speech generated by the microprocessor and provided to the user device via the network (in response, provide information to the user that is conveyed to the user through the voice communications device…, para. [0034]-[0035]; para. [0132]); and
           receiving, by the microprocessor, a second portion of the interactive two-way voice communication comprising speech received from the user device via the network (The voice site called by the customer may be an automated interactive voice site that is configured to process, using pre-programmed scripts, information received from the customer that is input through the voice communications device being used by the user…, para. [0034]-[0035]; para. [0132]); 
            recording the two-way voice communication, the recorded two-way voice comprising an audio recording of the second portion (para. [0026]-[0027]; para. [0197])
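           Kumar does not explicitly disclose the recorded two-way voice communication comprising speech data utilized by the microprocessor to generate the speech of the first portion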
           However, this feature is taught by Sato (Abstract; col. 2, ln 25-39; col. 17, ln 41-58)
          Kumar in view of Sato does not explicitly disclose wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting
           However, this feature is taught by Fedorov (fig. 13, elements 2040, 2050, 2060, 2070; para. [0068]; para. [0072]; para. [0075]-[0076]; Utilizing one or more of the statistical variables selected in (a)-(d) above, a voice expression value is generated, the generated voice expression value being associated with user 8's current emotional state…The CPU 20 outputs the determined emotion to mapping algorithm 1020 which maps input means 49 to input means-independent events 1030, wherein the events 1030 are defined to trigger changes in computer's emotional state…, para. [0077]-[0078]; Output of the update algorithm 1050 are time-stamped new values for each of the basic emotions in an emotional matrix 1062 and time-stamped records of events related to changes in emotional variables written into a learning database…, para. [0094])   
         It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sato with the method of Kumar in arriving at the “recorded two-way voice communication in the data storage comprising speech data utilized by the microprocessor to generate the speech of the first portion”, as well as to combine the teachings of Fedorov with the method of Kumar in view of Sato in arriving at “wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting”, because such combination would have resulted in improving service by synthesizing voices with voice quality suited to a user (Sato, col. 18, ln 5-24) as well as emotionally enriching the process of man-machine communication, fulfilling man's natural expectation of the emotional stream complementing information (Fedorov, para. [0011]; para. [0021])
          Per Claim 13, Kumar in view of Sato and Fedorov discloses the method of claim 12, 
            Sato discloses selecting content from a plurality of content maintained in a data storage, wherein the selected content comprises a content identifier and an audio file (The log data storage unit stores contents of a plurality of processes executed in an interactive form with a user through voice responses in the form of log data indicating a record of time-transitions and stores identification data for identifying each of the contents of the plurality of processes as a part of contents of the voice responses…, col. 2, ln 25-39; col. 8, ln 13-55);
            generating sound from the audio file (col. 8, ln 13-38; col. 9, ln 3-20);
            providing the sound to the first portion (col. 9, ln 3-20); and 
            setting the speech data to comprise the content identifier (Abstract; col. 2, ln 50-55; col. 17, ln 1-21).
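          Per Claim 14, Kumar in view of Sato and Fedorov discloses the method of claim 12, 
             Sato discloses selecting content from a plurality of content maintained in a data storage, wherein the selected content comprises a content identifier and an associated textual representation of the content (col. 2, ln 25-55);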
             generating a spoken form of the textual representation of the content (col. 2, ln 50-55);
             providing the spoken form of the textual representation of the content as the first portion (col. 2, ln 25-55);  and 
             setting the speech data to comprise the content identifier (Abstract; col. 2, ln 50-55; col. 17, ln 1-21).
          Per Claim 15, Kumar in view of Sato and Fedorov discloses the method of claim 12, 
               Sato discloses accessing an attribute of a user of the user device (An attribute storage stores an age of each user.  A voice volume controller sets a volume level of a voice response for a user to a level corresponding to the user's age …, Abstract);
              setting at least a portion of the speech data in accordance with the attribute (col. 6, ln 12-18; col. 17, ln 1-30); and 
              generating the speech in accordance with the speech data (Abstract; col. 6, ln 12-18; col. 17, ln 1-30).
          Per Claim 16, Kumar in view of Sato and Fedorov discloses the method of claim 15, 
               Sato discloses wherein the speech data comprises at least one content delivery setting (col. 6, ln 12-18; col. 16, ln 40-47).
           Per Claim 18, Kumar in view of Sato and Fedorov discloses the method of claim 12, 
               Sato discloses wherein the speech data comprises a speech setting determining non-verbalized content of the first portion (Abstract; col. 17, ln 1-21).
           Per Claim 19, Kumar in view of Sato and Fedorov discloses the method of claim 18, 
                Sato discloses wherein the speech setting comprises at least one of apparent age, gender, nationality, accent, location, dialect, formality, education, confidence, patience, or mood (Abstract; col. 17, ln 1-21).
          Per Claim 20, Kumar discloses a system, comprising: 
             means to engage in an interactive two-way voice communication between a microprocessor and a user device over a network (fig. 1; para. [0003]; para. [0032]; The voice site called by the customer may be an automated interactive voice site that is configured to process, using pre-programmed scripts, information received from the customer that is input through the voice communications device being used by the user, and, in response, provide information to the user that is conveyed to the user through the voice communications device. The interaction between the customer and the voice site may be done using an interactive voice response system (IVR) that is included in a 
             means to generate a first portion of the interactive two-way voice communication comprising speech generated by the microprocessor and provided to the user device via the network (in response, provide information to the user that is conveyed to the user through the voice communications device…, para. [0034]-[0035]; para. [0132]); and 
             means to receive a second portion of the interactive two-way voice communication comprising speech received from the user device via the network (The voice site called by the customer may be an automated interactive voice site that is configured to process, using pre-programmed scripts, information received from the customer that is input through the voice communications device being used by the user…, para. [0034]-[0035]; para. [0132]); 
            means to record the two-way voice communication, the recorded two-way voice comprising an audio recording of the second portion (para. [0026]-[0027]; para. [0197])
           Kumar does not explicitly disclose the recorded two-way voice comprising speech data utilized by the microprocessor to generate the speech of the first portion and wherein the speech data is absent sound waveform data
           However, this feature is taught by Sato (Abstract; col. 2, ln 25-39; col. 17, ln 41-58)
          Kumar in view of Sato does not explicitly disclose wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting
           However, this feature is taught by Fedorov (fig. 13, elements 2040, 2050, 2060, 2070; para. [0068]; para. [0072]; para. [0075]-[0076]; Utilizing one or more of the statistical variables selected in (a)-(d) above, a voice expression value is generated, the generated voice expression value being associated with user 8's current emotional state…The CPU 20 outputs the determined emotion to mapping algorithm 1020 which maps input means 49 to input means-independent events 1030, wherein the events 1030 are defined to trigger changes in computer's emotional state…, para. [0077]-[0078]; Output of the update algorithm 1050 are time-stamped new values for each of the basic emotions in an emotional matrix 1062 and time-stamped records of events related to changes in emotional variables written into a learning database…, para. [0094])   
         It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sato with the system of Kumar in arriving at the “recorded two-way voice communication in the data storage comprising speech data utilized by the microprocessor to generate the speech of the first portion”, as well as to combine the teachings of Fedorov with the system of Kumar in view of Sato in arriving at “wherein the speech data comprises an initial session setting, a subsequent session setting, and a timestamp of a transition from the initial session setting to the subsequent session setting”, because such combination would have resulted in improving service by synthesizing voices with voice quality suited to a user (Sato, col. 18, ln 5-24) as well as emotionally enriching the process of man-machine communication, fulfilling man's natural expectation of the emotional stream complementing information (Fedorov, para. [0011]; para. [0021]).

2.        Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of Sato and Fedorov as applied to claims 7 and 16 above, and further in view of Reddy et al. US PGPUB 2014/0032471 A1 (“Reddy”).
          Per Claim 8, Kumar in view of Sato and Fedorov discloses the server of claim 7, 
                 Kumar in view of Sato and Fedorov does not explicitly disclose wherein the delivery setting comprises one or more of length of initial silence, urgency, insertion of non-content expressions, or an amount of change to any one or more of the delivery settings 
                However, this feature is taught by Reddy (The conversation filler window allows the user to create various filler words or phrases that the synthetic character can use at its discretion to work around technical limitations…, para. [0060], filler words/phrases as non-content expressions)
             It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Reddy with the server of Kumar in view of Sato and Fedorov in arriving at “wherein the delivery setting comprises one or more of length of initial silence, urgency, insertion of non-content expressions, or an amount of change to any one or more of the delivery settings”, because such combination would have resulted in allowing the system to provide speech output while working around technical difficulties like network lag (Reddy, para. [0060])
           Per Claim 17, Kumar in view of Sato and Fedorov discloses the method of claim 16, 
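                 Kumar in view of Sato and Fedorov does not explicitly disclose wherein the delivery setting comprises one or more of length of initial silence, urgency, insertion of non-content expressions, or an amount of change to any one or more of the delivery settings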
             However, this feature is taught by Reddy (The conversation filler window allows the user to create various filler words or phrases that the synthetic character can use at its discretion to work around technical limitations…, para. [0060], filler words/phrases as non-content expressions)
             It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Reddy with the method of Kumar in view of Sato and Fedorov in arriving at “wherein the delivery setting comprises one or more of length of initial silence, urgency, insertion of non-content expressions, or an amount of change to any one or more of the delivery settings”, because such combination would have resulted in allowing the system to provide speech output while working around technical difficulties like network lag (Reddy, para. [0060])

Allowable Subject Matter
Claims 5 and 6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUJIMI A ADESANYA whose telephone number is (571)270-3307.  The examiner can normally be reached on 8:30-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/OLUJIMI A ADESANYA/Primary Examiner, Art Unit 2658