DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed May 18, 2022 has been entered.  Claims 1, 3 – 14 and 18 – 21 remain pending in the application.  Applicant’s amendments to the Claims have overcome each and every 35 U.S.C. 101 rejection and 35 U.S.C. 112(a) and 112(b) rejection previously set forth in the Non-Final Office Action mailed February 18, 2022, and the amended claims 1 and 18 no longer invoke a claim interpretation under 35 U.S.C. 112(f).
Response to Arguments
Applicant’s arguments with respect to claims 1, 3 – 14 and 18 – 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Specification
The disclosure is objected to because of the following informalities:
In paragraph 0098, line 3, the acronym “DB” is used without being defined.
In paragraph 0098, line 4, the acronym “DB” is used without being defined.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3 – 5, 12 – 14 and 18 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis et al. (US Patent No. 9,865,247), hereinafter Agiomyrgiannakis, in view of Yella et al. ("Inferring Social Relationships in a Phone Call from a Single Party's Speech"), hereinafter Yella.
Regarding claim 1, Agiomyrgiannakis teaches an information processing apparatus (Figure 1A, Device 100) comprising:
a processor (Figure 1A, Processor 106) configured to:
acquire learning data set including first time-series data for a neural network (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 9, lines 45-47, “The neural network may be configured to learn mapping from an input sequence (e.g., linguistic features) to output sequence (e.g., phase vectors).”);
identify first feature information from the first time-series data (Column 1, lines 36-38, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data.");
learn a parameter of the neural network (Column 9, lines 52-56, "For example, parameters of the statistical distributions may correspond to outputs of the neural network, and weights of the neural network may be trained based on an error measure associated with the statistical distributions."  Adjusting weights of the neural network during training demonstrates learning a parameter of the neural network.);
infer second feature information from the first time-series data and the identified first feature information based on the neural network (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."; Column 9, lines 41-45, “An example for neural network implementations and variants of neural networks (e.g., mixture density network, recurrent neural network, long short-term memory, etc.) of the mapping module 114 for the statistical mapping is as follows.”  Determining the acoustic feature parameter of phase data from the speech signal and mapping the phase data to linguistic features demonstrates inferring second feature information from the time-series data and the first feature information.),
wherein a meaning is assigned to the first feature information, the meaning is not assigned to the second feature information (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information with meaning assigned, and the circular space representation and linguistic features corresponding to the second feature information with meaning not assigned.),
the first feature information includes information corresponding to a context of the first time-series data (Column 3, line 64 - column 4, line 1, "Additionally, for example, the linguistic features may include context features such as preceding/following phonemes, position of speech sound within the speech, distance from stressed/accented syllable in the speech, prosodic context, length of speech sound, etc."),
and generate second time-series data based on the first feature information and the second feature information (Column 1, lines 45-47, "The method also includes providing a synthetic audio pronunciation of the linguistic content based on the mapping."  The mapping refers to mapping the phase data to linguistic features based on the circular space representations, demonstrating the generation of time-series synthetic audio derived from the feature information.).
Agiomyrgiannakis does not specifically disclose: the information corresponding to the context includes information related to a conversational partner of a speaker.
Yella teaches:
the information corresponding to the context includes information related to a conversational partner of a speaker (Abstract, lines 11-14, “We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers.”).
Yella teaches identifying feature information corresponding to a conversational partner of the speaker in order to detect the social relationship between two people (Abstract, lines 1-9, “People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect the social relationship with the other person. To infer relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers.”).
Agiomyrgiannakis and Yella are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis to incorporate the teachings of Yella and identify feature information corresponding to a conversational partner of the speaker.  Doing so would allow for detecting the social relationship between two people (Yella, Abstract, lines 1-9).
Regarding claim 3, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the first feature information includes information indicating a framework of the first time-series data (Column 3, lines 45-52, "The device may be configured to determine acoustic feature parameters for the speech that include amplitude data and phase data. For example, the device may utilize various techniques (e.g., vocoder analysis techniques) that provide a parametric representation (e.g., spectral envelopes, aperiodicity envelopes, etc.) of the speech in the input."  The amplitude data, phase data, and parametric representation correspond to features indicating a framework, or structure, of the time-series data.).
Regarding claim 4, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 3.
Agiomyrgiannakis further discloses:
wherein the information indicating the framework is text information, and the second time-series data includes speech data in which the text information is read aloud (Column 4, lines 4-12, "In some examples, the device may be configured to receive the linguistic content along with the speech in the input. For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device). In other examples, the linguistic content may be received as a separate input by the device for which the device may generate a synthetic audio pronunciation based on an analysis of the speech.").
Regarding claim 5, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the information corresponding to the context includes information related to the speaker (Column 17, lines 28-33, "In one example, the input speech may include speech by a first speaker, and the output synthesized audio pronunciation may correspond to speech by a second speaker or speech having different voice characteristics that corresponds to the same linguistic content as the input speech."  Using speech input from one speaker and modifying the voice characteristics to output synthesized speech that matches the voice characteristics of a second speaker demonstrates acquiring context features that relate to the speaker.).
Regarding claim 12, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the first feature information and the second feature information correspond to features included in a speech of the speaker (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information, and the circular space representation and linguistic features corresponding to the second feature information, where the first feature information and second feature information correspond to features in user speech.).
Regarding claim 13, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the processor is further configured to: acquire identification information corresponding to the first feature information and the second feature information (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information and the circular space representation corresponding to the second feature information, and the linguistic features corresponding to the identification information.),
and generate the second time-series data having features corresponding to the identification information (Column 1, lines 45-47, "The method also includes providing a synthetic audio pronunciation of the linguistic content based on the mapping."  The mapping refers to mapping the phase data to linguistic features based on the circular space representations, demonstrating the generation of time-series synthetic audio with features corresponding to the identification information.).
Regarding claim 14, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the processor is further configured to generate the second time-series data based on the neural network (Column 9, lines 45-47, "The neural network may be configured to learn mapping from an input sequence (e.g., linguistic features) to output sequence").
Regarding claim 18, Agiomyrgiannakis teaches an information processing apparatus (Figure 1A, Device 100) comprising:
a first processor (Figure 1A, Processor 106) configured to:
notify a second information processing apparatus of learning data set including first time-series data (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 5, lines 6-13, “Additionally or alternatively, the input interface 102 and/or the output interface 104 may include network interface components configured to, respectively, receive and/or transmit the input signal and/or the output signal described above. For example, an external computing device (e.g., server, etc.) may provide the input signal (e.g., speech content, linguistic content, etc.) to the input interface 102 via a communication medium”);
wherein the second information processing apparatus includes a second processor (Column 17, lines 41-45, “FIG. 8 illustrates an example distributed computing architecture 800, in accordance with an example embodiment. FIG. 8 shows server devices 802 and 804 configured to communicate, via network 806, with programmable devices 808a, 808b, and 808c.”; Column 4, lines 49-52, “The processor 106 included in the device 100 may comprise one or more processors configured to execute the program instructions 110 to operate the device 100.”) that: 
acquires the learning data set for a neural network (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 9, lines 45-47, “The neural network may be configured to learn mapping from an input sequence (e.g., linguistic features) to output sequence (e.g., phase vectors).”);
identifies first feature information from the first time-series data (Column 1, lines 36-38, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data.");
learns a parameter of the neural network (Column 9, lines 52-56, "For example, parameters of the statistical distributions may correspond to outputs of the neural network, and weights of the neural network may be trained based on an error measure associated with the statistical distributions."  Adjusting weights of the neural network during training demonstrates learning a parameter of the neural network.);
infers second feature information from the first time-series data and the identified first feature information based on the neural network (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."; Column 9, lines 41-45, “An example for neural network implementations and variants of neural networks (e.g., mixture density network, recurrent neural network, long short-term memory, etc.) of the mapping module 114 for the statistical mapping is as follows.”  Determining the acoustic feature parameter of phase data from the speech signal and mapping the phase data to linguistic features demonstrates inferring second feature information from the time-series data and the first feature information.),
wherein a meaning is assigned to the first feature information, the meaning is not assigned to the second feature information (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information with meaning assigned, and the circular space representation and linguistic features corresponding to the second feature information with meaning not assigned.),
the first feature information includes information corresponding to a context of the first time-series data (Column 3, line 64 - column 4, line 1, "Additionally, for example, the linguistic features may include context features such as preceding/following phonemes, position of speech sound within the speech, distance from stressed/accented syllable in the speech, prosodic context, length of speech sound, etc."),
and generates second time-series data based on the first feature information and the second feature information (Column 1, lines 45-47, "The method also includes providing a synthetic audio pronunciation of the linguistic content based on the mapping."  The mapping refers to mapping the phase data to linguistic features based on the circular space representations, demonstrating the generation of time-series synthetic audio derived from the feature information.);
and acquire, from the second information processing apparatus, the generated second time-series data (Column 10, lines 51-55, "In some examples, the device 100 in FIG. 1C may be configured to provide an output that includes synthetic speech 152 indicative of a synthetic audio pronunciation of the linguistic content 150. The output, for example, may be provided via the output interface 104"; Column 5, lines 6-13, “Additionally or alternatively, the input interface 102 and/or the output interface 104 may include network interface components configured to, respectively, receive and/or transmit the input signal and/or the output signal described above.”).
Agiomyrgiannakis does not specifically disclose: the information corresponding to the context includes information related to a conversational partner of a speaker.
Yella teaches:
the information corresponding to the context includes information related to a conversational partner of a speaker (Abstract, lines 11-14, “We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers.”).
Yella teaches identifying feature information corresponding to a conversational partner of the speaker in order to detect the social relationship between two people (Abstract, lines 1-9, “People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect the social relationship with the other person. To infer relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers.”).
Agiomyrgiannakis and Yella are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis to incorporate the teachings of Yella and identify feature information corresponding to a conversational partner of the speaker.  Doing so would allow for detecting the social relationship between two people (Yella, Abstract, lines 1-9).
Regarding claim 19, Agiomyrgiannakis teaches an information processing method, comprising:
acquiring learning data set including first time-series data for a neural network (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 9, lines 45-47, “The neural network may be configured to learn mapping from an input sequence (e.g., linguistic features) to output sequence (e.g., phase vectors).”);
identifying first feature information from the first time-series data (Column 1, lines 36-38, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data.");
learning a parameter of the neural network (Column 9, lines 52-56, "For example, parameters of the statistical distributions may correspond to outputs of the neural network, and weights of the neural network may be trained based on an error measure associated with the statistical distributions."  Adjusting weights of the neural network during training demonstrates learning a parameter of the neural network.);
inferring second feature information from the first time-series data and the identified first feature information based on the neural network (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."; Column 9, lines 41-45, “An example for neural network implementations and variants of neural networks (e.g., mixture density network, recurrent neural network, long short-term memory, etc.) of the mapping module 114 for the statistical mapping is as follows.”  Determining the acoustic feature parameter of phase data from the speech signal and mapping the phase data to linguistic features demonstrates inferring second feature information from the time-series data and the first feature information.),
wherein a meaning is assigned to the first feature information, the meaning is not assigned to the second feature information (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information with meaning assigned, and the circular space representation and linguistic features corresponding to the second feature information with meaning not assigned.),
the first feature information includes information corresponding to a context of the first time-series data (Column 3, line 64 - column 4, line 1, "Additionally, for example, the linguistic features may include context features such as preceding/following phonemes, position of speech sound within the speech, distance from stressed/accented syllable in the speech, prosodic context, length of speech sound, etc."),
and generating second time-series data based on the first feature information and the second feature information (Column 1, lines 45-47, "The method also includes providing a synthetic audio pronunciation of the linguistic content based on the mapping."  The mapping refers to mapping the phase data to linguistic features based on the circular space representations, demonstrating the generation of time-series synthetic audio derived from the feature information.).
Agiomyrgiannakis does not specifically disclose: the information corresponding to the context includes information related to a conversational partner of a speaker.
Yella teaches:
the information corresponding to the context includes information related to a conversational partner of a speaker (Abstract, lines 11-14, “We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers.”).
Yella teaches identifying feature information corresponding to a conversational partner of the speaker in order to detect the social relationship between two people (Abstract, lines 1-9, “People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect the social relationship with the other person. To infer relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers.”).
Agiomyrgiannakis and Yella are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis to incorporate the teachings of Yella and identify feature information corresponding to a conversational partner of the speaker.  Doing so would allow for detecting the social relationship between two people (Yella, Abstract, lines 1-9).
Regarding claim 20, Agiomyrgiannakis teaches an information processing method, comprising:
notifying a second information processing apparatus of learning data set including first time-series data (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 5, lines 6-13, “Additionally or alternatively, the input interface 102 and/or the output interface 104 may include network interface components configured to, respectively, receive and/or transmit the input signal and/or the output signal described above. For example, an external computing device (e.g., server, etc.) may provide the input signal (e.g., speech content, linguistic content, etc.) to the input interface 102 via a communication medium”);
wherein the second information processing apparatus includes a second processor (Column 17, lines 41-45, “FIG. 8 illustrates an example distributed computing architecture 800, in accordance with an example embodiment. FIG. 8 shows server devices 802 and 804 configured to communicate, via network 806, with programmable devices 808a, 808b, and 808c.”; Column 4, lines 49-52, “The processor 106 included in the device 100 may comprise one or more processors configured to execute the program instructions 110 to operate the device 100.”) that: 
acquires the learning data set for a neural network (Column 4, lines 53-56, “The input interface 102 may include an audio input device such as a microphone or any other component configured to provide an input signal comprising audio content associated with speech to the processor 106.”; Column 4, lines 6-8, “For example, the linguistic content may include text that corresponds to the speech (e.g., the speech and the linguistic content may be training data for the device).”; Column 9, lines 45-47, “The neural network may be configured to learn mapping from an input sequence (e.g., linguistic features) to output sequence (e.g., phase vectors).”);
identifies first feature information from the first time-series data (Column 1, lines 36-38, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data.");
learns a parameter of the neural network (Column 9, lines 52-56, "For example, parameters of the statistical distributions may correspond to outputs of the neural network, and weights of the neural network may be trained based on an error measure associated with the statistical distributions."  Adjusting weights of the neural network during training demonstrates learning a parameter of the neural network.);
infers second feature information from the first time-series data and the identified first feature information based on the neural network (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."; Column 9, lines 41-45, “An example for neural network implementations and variants of neural networks (e.g., mixture density network, recurrent neural network, long short-term memory, etc.) of the mapping module 114 for the statistical mapping is as follows.”  Determining the acoustic feature parameter of phase data from the speech signal and mapping the phase data to linguistic features demonstrates inferring second feature information from the first time-series data and the first feature information.),
wherein a meaning is assigned to the first feature information, the meaning is not assigned to the second feature information (Column 1, lines 36-45, "The method also includes determining acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The method also includes determining circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The method also includes mapping the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content."  The acoustic feature parameters correspond to the feature information, with the phase data corresponding to the first feature information with meaning assigned, and the circular space representation and linguistic features corresponding to the second feature information with meaning not assigned.),
the first feature information includes information corresponding to a context of the first time-series data (Column 3, line 64 - column 4, line 1, "Additionally, for example, the linguistic features may include context features such as preceding/following phonemes, position of speech sound within the speech, distance from stressed/accented syllable in the speech, prosodic context, length of speech sound, etc."),
and generates second time-series data based on the first feature information and the second feature information (Column 1, lines 45-47, "The method also includes providing a synthetic audio pronunciation of the linguistic content based on the mapping."  The mapping refers to mapping the phase data to linguistic features based on the circular space representations, demonstrating the generation of time-series synthetic audio derived from the feature information.);
and acquiring the generated second time-series data from the second information processing apparatus (Column 10, lines 51-55, "In some examples, the device 100 in FIG. 1C may be configured to provide an output that includes synthetic speech 152 indicative of a synthetic audio pronunciation of the linguistic content 150. The output, for example, may be provided via the output interface 104"; Column 5, lines 6-13, “Additionally or alternatively, the input interface 102 and/or the output interface 104 may include network interface components configured to, respectively, receive and/or transmit the input signal and/or the output signal described above.”).
Agiomyrgiannakis does not specifically disclose: the information corresponding to the context includes information related to a conversational partner of a speaker.
Yella teaches:
the information corresponding to the context includes information related to a conversational partner of a speaker (Abstract, lines 11-14, “We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers.”).
Yella teaches identifying feature information corresponding to a conversational partner of the speaker in order to detect the social relationship between two people (Abstract, lines 1-9, “People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect the social relationship with the other person. To infer relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers.”).
Agiomyrgiannakis and Yella are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis to incorporate the teachings of Yella and identify feature information corresponding to a conversational partner of the speaker.  Doing so would allow for detecting the social relationship between two people (Yella, Abstract, lines 1-9).
Regarding claim 21, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1.
Agiomyrgiannakis further discloses:
wherein the second feature information includes utterance style information of the speaker (Column 16, lines 39-46, “By way of example, the method 600 may associate the phase data (and/or the amplitude data) in the acoustic feature parameters with linguistic features such as a phonemic representation of the speech. Identifying such linguistic features may be enhanced by the method 600, for example, due to incorporating the phase data to characterize context features such as prosodic context of the speech.”  Linguistic features such as a phonemic representation of the speech read on utterance style information of the speaker.).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis in view of Yella as applied to claim 1 above, and further in view of Djellab et al. ("Regional Accents Recognition based on i-vectors approach: The Case of the Algerian linguistic environment"), hereinafter Djellab.
Regarding claim 6, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1, but does not specifically disclose: wherein the information corresponding to the context includes information related to a region in which the speaker is located.
Djellab teaches:
wherein the information corresponding to the context includes information related to a region in which the speaker is located (Abstract, lines 4-6, “This work presents some preliminary results about the Algerian regional accents recognition using the i-vectors approach.”).
Djellab teaches identifying feature information corresponding to a regional location of the speaker in order to improve speech recognition (Section I, lines 9-14, “Investigating dialects and regional accents can provide important benefits to speech technology beyond improving speech recognition. It can help in speaker recognition by narrowing the search space at the front end once features used in Automatic Speaker Recognition Systems (ASRS) are adapted to regional origin.”).
Agiomyrgiannakis, Yella, and Djellab are considered to be analogous to the claimed invention because they are in the same field of speech processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella to incorporate the teachings of Djellab and identify feature information corresponding to a regional location of the speaker.  Doing so would allow for improving speech recognition (Djellab, Section I, lines 9-14).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis in view of Yella as applied to claim 1 above, and further in view of Trilla et al. ("Sentence-Based Sentiment Analysis for Expressive Text-to-Speech"), hereinafter Trilla.
Regarding claim 7, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1, but does not specifically disclose: wherein the information corresponding to the context includes information related to a subject corresponding to a speech from the speaker.  
Trilla teaches:
wherein the information corresponding to the context includes information related to a subject corresponding to a speech from the speaker (Section 1, lines 14-17, "This work is focused on the latter, since the detection and classification of the expression present in textual input is the requisite first step in the fully automatic generation of naturally expressive synthetic speech"; Section 1, lines 19-21, "Some researchers in TTS tend to relate expression in speech with domain (i.e., the topic)").
Trilla teaches that generating expressive synthetic speech, with expression in the speech determined from the subject of the speech, allows the synthetic speech to convey the social and psychological aspects of the speech (Section 1, lines 1-3, "Speech researchers are increasingly focusing on the full range and variation of speech in order to signal the social and psychological aspects of a message.").
Agiomyrgiannakis, Yella, and Trilla are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella to incorporate the teachings of Trilla and use context information related to a subject to generate expressive synthetic speech.  Doing so would allow the synthetic speech to convey the social and psychological aspects of the speech (Trilla, Section 1, lines 1-3).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis in view of Yella as applied to claim 3 above, and further in view of Browne (US Patent No. 6,297,439).
Regarding claim 8, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 3, but does not specifically disclose: wherein the information indicating the framework is musical score information, and the second time-series data is music data performed in accordance with the musical score information.
Browne teaches:
wherein the information indicating the framework is musical score information, and the second time-series data is music data performed in accordance with the musical score information (Column 3, lines 9-13, "Referring to FIG. 1, there is shown a schematic of a system 1 for automatically generating music on the basis of an initial note sequence input. The system 1 includes a score interpreter 2, which generates duration data, context data and pitch data from an input musical score 10.").
Browne teaches that generating music based on the information derived from a musical score allows the generated music to emulate a certain musical style or the music of a certain composer (Column 1, lines 42-44, "It is an object of the present invention to provide an improved automatic music generation system for generating music which is evocative of a given style or composer.").
Agiomyrgiannakis, Yella, and Browne are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella to incorporate the teachings of Browne and use context information related to a musical score to generate music.  Doing so would allow the generated music to emulate a certain musical style or the music of a certain composer (Browne, Column 1, lines 42-44).
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis in view of Yella as applied to claim 1 above, and further in view of Endres et al. (“Learning the Dynamics of Doors for Robotic Manipulation”), hereinafter Endres.
Regarding claim 9, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1, but does not specifically disclose: wherein in a case where the first time-series data is acquired from a sensor configured to detect a moving body, the information corresponding to the context is information indicating a movement category.
Endres teaches:
wherein in a case where the first time-series data is acquired from a sensor configured to detect a moving body, the information corresponding to the context is information indicating a movement category (Abstract, lines 2-5, "In this paper we present an approach to learn a dynamic model of a door from sensor observations and utilize it for effectively swinging the door open to a desired angle.").
Endres teaches that considering a category of movement while utilizing sensor data from sensors detecting motion allows for developing a model for that category of movement that determines a strategy to perform the movement, and allows for reducing the complexity of the model (Abstract, lines 5-7, "The learned model enables the realization of dynamic door-opening strategies and reduces the complexity of the door opening task.").
Agiomyrgiannakis, Yella, and Endres are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella to incorporate the teachings of Endres to process movement data and use context information related to a movement category when utilizing sensor data from sensors detecting motion to develop a model of the movement.  Doing so would allow the movement model to determine a strategy to perform the movement while reducing the complexity of the model (Endres, Abstract, lines 5-7).
Regarding claim 10, Agiomyrgiannakis in view of Yella and further in view of Endres discloses the information processing apparatus as claimed in claim 9.
Endres further teaches:
wherein the second time-series data includes a control signal that causes an actuator included in the moving body to operate (Section 5A, lines 5-7, "We let the robot push the door using a linear position-controlled motion, such that the door achieves a velocity sufficient to reach the goal state.").
Endres teaches that considering the category of movement while developing a model for a movement and generating control signals for an actuator that performs that movement allows for training mobile robots to perform the tasks required to operate in human environments (Abstract, lines 1-2, "Opening doors is a fundamental skill for mobile robots operating in human environments."; Abstract, lines 5-7, "The learned model enables the realization of dynamic door-opening strategies and reduces the complexity of the door opening task.").
Agiomyrgiannakis, Yella, and Endres are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella and further in view of Endres to further incorporate the teachings of Endres to process movement data and use context information related to a movement category when developing a model for a movement that generates control signals for an actuator that performs that movement.  Doing so would allow for training mobile robots to perform the tasks required to operate in human environments (Endres, Abstract, lines 1-2 and 5-7).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Agiomyrgiannakis in view of Yella as applied to claim 1 above, and further in view of Jang et al. (US Patent No. 10,310,702), hereinafter Jang.
Regarding claim 11, Agiomyrgiannakis in view of Yella discloses the information processing apparatus as claimed in claim 1, but does not specifically disclose: wherein the first feature information is edited in accordance with an instruction by the speaker.  
Jang teaches:
wherein the first feature information is edited in accordance with an instruction by the speaker (Column 15, lines 63-67, "The controller 150 or the image display apparatus 100 may recognize a voice corresponding to the voice information and/or recognize the recognized voice as a control command for controlling the operation of the image display apparatus 100."  Recognizing voice commands for controlling the operation of the image display apparatus demonstrates the use of voice commands to control functions of the information processing apparatus, including the editing of feature information.).
Jang teaches that using voice commands provides an additional option for entering user inputs (Column 15, lines 6-8, "The user input unit 220 may include a keypad, a key button, a touch screen, a scroll key, a jog key, etc., to facilitate entering of an input.").
Agiomyrgiannakis, Yella, and Jang are considered to be analogous to the claimed invention because they are in the same field of information processing systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Agiomyrgiannakis in view of Yella to incorporate the teachings of Jang and allow voice commands for controlling functions of the information processing system.  Doing so would provide an additional option for entering user inputs (Jang, Column 15, lines 6-8).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/
Examiner, Art Unit 2657

/EDGAR X GUERRA-ERAZO/
Primary Examiner, Art Unit 2656