DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-17 and 19-20 have been considered but are moot because of the new ground of rejection in view of Vilermo for claims 1-8, 13-16 and 19-20; Vilermo and Naik for claim 9; Vilermo and Mizuki for claims 10-12; and Vilermo and Gagner for claim 17. The examiner notes that the new ground of rejection is given on the basis of the amendments, which require that the analysis of the utterance and non-utterance sections identify sections of content that include spoken voice output and sections of content that include no spoken voice output, and that the extension be performed on the basis of that analysis.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “output control unit” in claims 1-18, “analyzer” in claim 18 and “voice output unit” in claim 19.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-8, 13-16 and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Vilermo (US PG Pub 20150380054).
	As per claims 1 and 20, Vilermo discloses:
	An information processing apparatus and method comprising:
	at least one processor configured to (Vilermo; Fig. 3, item 72; p. 0036 – a processor):
	execute an analytic process on content to identify an utterance section of the content and a non-utterance section of the content (Vilermo; p. 0049 - For each audio object of an audio signal, the apparatus 70 may include means, such as the processor 72 or the like, for determining each audio object to be either a transient object or a non-transient object… In an example embodiment, the transient and non-transient objects include speech and non-speech objects, respectively; see also p. 0056-0057), wherein
	the utterance section of the content refers to a first section of the content that includes spoken voice output, and the non-utterance section of the content refers to a second section of the content that includes no spoken voice output (Vilermo; p. 0049 - For each audio object of an audio signal, the apparatus 70 may include means, such as the processor 72 or the like, for determining each audio object to be either a transient object or a non-transient object… In an example embodiment, the transient and non-transient objects include speech and non-speech objects, respectively; see also p. 0056-0057);
	extend the non-utterance section of the content based on the identification of the utterance section and the non-utterance section of the content (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects);
	control an output of a spoken utterance [...] (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion).

As per claim 2, Vilermo discloses: 	The information processing apparatus according to claim 1, wherein the at least one processor is further configured to extend the non-utterance section of the content based on a detail of the content (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects).

As per claim 3, Vilermo discloses:	The information processing apparatus according to claim 2, wherein the at least one processor is further configured to extend the non-utterance section of the content based on a [...] (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects).

As per claim 4, Vilermo discloses:	The information processing apparatus according to claim 2, wherein the at least one processor is further configured to extend the non-utterance section of the content based on relevant information related to the detail of the content (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects).

As per claim 5, Vilermo discloses:	The information processing apparatus according to claim 2, wherein the at least one processor is further configured to determine a length of the extension of the non-utterance section of the content based on a duration of the output of the spoken utterance (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects).

As per claim 6, Vilermo discloses:	The information processing apparatus according to claim 1, wherein the at least one processor is further configured to:
control consecutive reproduction of a plurality of pieces of the content (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion);
determine that a non-utterance section of a first piece of the plurality of pieces of the content is one of extendable or non extendable;
extend a non-utterance section of a second piece of the plurality of pieces of the content based on the determination of the first piece of the content as non extendable, wherein the second piece of the content reproduces subsequent to the reproduction of the first piece of the content (Vilermo; Fig. 5; p. 0058-0060 - During a respective period of time, the average level of the non-speech signal at standard speed may be determined. Thereafter, at slow motion speed, the level associated with each segment of the non-speech signal may be extended or multiplied by the multiple, such as 3 times, such that the same plurality of discrete levels are associated with the extended representation of the non-speech signal, albeit with each level extending longer, such as 3 times, relative to the corresponding level at standard speed. As noted above, the level of the extended non-speech signal may be changed from segment to segment more gradually than that depicted in FIG. 5 and, in some embodiments, a non-speech object may be repeated with some overlap so as to mask the boundaries between the repeated non-speech objects); and
control the output of the spoken utterance during the reproduction of the non-utterance section of the second piece of the content (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion).

As per claim 7, Vilermo discloses:	The information processing apparatus according to claim 6, wherein the at least one processor is further configured to control a reproduction order of the plurality of pieces of the content based on a characteristic of the spoken utterance (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion).

As per claim 8, Vilermo discloses:	The information processing apparatus according to claim 7, wherein the output control unit moves up the reproduction order of the content including the non-utterance section adapted to the output of the spoken utterance on a basis of an importance degree of the spoken utterance (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion).

As per claim 13, Vilermo discloses:	The information processing apparatus according to claim 1, wherein the content includes video content (Vilermo; p. 0034 - A method, apparatus and computer program product are provided in accordance with an example embodiment of the present invention in order to maintain synchronization, such as both in time and direction, between audio signals and video signals as the video signals are played with modified motion, such as in slow motion).

As per claim 14, Vilermo discloses:	The information processing apparatus according to claim 13, wherein the at least one processor is further configured to extend a non-utterance section of the video content based on a still image extracted from the video content (Vilermo; p. 0060 - As shown in FIG. 6, for example, the non-speech object may be repeated a number of times dependent upon the slow motion of the video).

As per claim 15, Vilermo discloses:	The information processing apparatus according to claim 13, wherein the at least one processor is further configured to extend a non-utterance section of the video content based on a still image related to a detail of the video content (Vilermo; p. 0060 - As shown in FIG. 6, for example, the non-speech object may be repeated a number of times dependent upon the slow motion of the video).

As per claim 16, Vilermo discloses:	The information processing apparatus according to claim 1, wherein the content includes audible content (Vilermo; p. 0005 - A method, apparatus and computer program product are provided in accordance with an example embodiment in order to facilitate synchronization of audio signals with corresponding video images that are replayed with a modified motion, such as in slow motion).

As per claim 19, Vilermo discloses:	The information processing apparatus according to claim 1, further comprising a voice output unit that outputs the spoken utterance (Vilermo; Fig. 5; p. 0059 - In order to mask the leakage between the speech and non-speech objects, the apparatus 70, such as the processor 72, of an example embodiment may be configured to synchronize the segments of the speech object, e.g., Word 1, Word 2 and Word 3, with the non-speech object during replay in slow motion).	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Vilermo in view of Naik (US PG Pub 20170346872).
	As per claim 9, Vilermo discloses:
Vilermo, however, fails to disclose wherein the output control unit controls the output of the spoken utterance on a basis of respective pieces of notification information received from at least one or more terminals.
Naik does teach wherein the output control unit controls the output of the spoken utterance on a basis of respective pieces of notification information received from at least one or more terminals (Naik; Fig. 6, items 610-614; p. 0102-0104 - At step 614, user device 102 can present the audio notification at the determined location. For example, notification application 106 can present the generated audio notification between media items, during a lull in a media item, or in response to the user pausing, rewinding, skipping a media item or otherwise modifying playback of the media stream).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the apparatus of Vilermo to include wherein the output control unit controls the output of the spoken utterance on a basis of respective pieces of notification information received from at least one or more terminals, as taught by Naik, in order to provide notifications to a user without interrupting media item playback (Naik; p. 0004).

Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Vilermo in view of Mizuki et al. (JP 2006047237; hereinafter “Mizuki”).

As per claim 10, Vilermo discloses:	The information processing apparatus according to claim 1, upon which claim 10 depends.
Vilermo, however, fails to disclose wherein the content includes music content. Mizuki does teach wherein the content includes music content (Mizuki; p. 0002 – music DJ functionality).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the apparatus of Vilermo to include wherein the content includes music content, as taught by Mizuki, in order to provide voice based song introductions and navigation guidance in vehicles (Mizuki; p. 0001).

As per claim 11, Vilermo in view of Mizuki discloses:	The information processing apparatus according to claim 10, upon which claim 11 depends.
And further, Mizuki teaches wherein a non-utterance section of the music content includes at least one of a prelude, an interlude, or a postlude in the music content, and the at least one processor is further configured to extend the non-utterance section of the music content based on control of repeat reproduction of the at least one of the prelude, the interlude, or the postlude (Mizuki; p. 0017 - When the music data management unit 31 (FIG. 4) requests the continuous music playback / music introduction DJ from the operation key unit 18a (step 601), the music data management unit 31 reads the music information (song information) of the first song from the HDD 15, The data is input to the audio signal generation unit 20 via the data / music information output unit 32. The audio signal generator 20 creates a music introduction audio signal based on the input music information, inputs it to the music playback device 13, and outputs the music introduction audio from the speaker 14 (music introduction DJ: step 602)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the apparatus of Vilermo to include wherein a non-utterance section of the music content includes at least one of a prelude, an interlude, or a postlude in the music content, and the at least one processor is further configured to extend the non-utterance section of the music content based on control of repeat reproduction of the at least one of the prelude, the interlude, or the postlude, as taught by Mizuki, in order to provide voice based song introductions and navigation guidance in vehicles (Mizuki; p. 0001).

As per claim 12, Vilermo in view of Mizuki discloses:	The information processing apparatus according to claim 10, upon which claim 12 depends.
And further, Mizuki discloses wherein the spoken utterance includes one of basic information or additional information related to the music content (Mizuki; p. 0017 - When the music data management unit 31 (FIG. 4) requests the continuous music playback / music introduction DJ from the operation key unit 18a (step 601), the music data management unit 31 reads the music information (song information) of the first song from the HDD 15, The data is input to the audio signal generation unit 20 via the data / music information output unit 32. The audio signal generator 20 creates a music introduction audio signal based on the input music information, inputs it to the music playback device 13, and outputs the music introduction audio from the speaker 14 (music introduction DJ: step 602)).
Therefore, it would have been obvious to one of ordinary skill in the art to modify the apparatus of Vilermo to include wherein the spoken utterance includes one of basic information or additional information related to the music content, as taught by Mizuki, in order to provide voice based song introductions and navigation guidance in vehicles (Mizuki; p. 0001).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Vilermo in view of Gagner (US PG Pub 20100240455).

As per claim 17, Vilermo discloses the information processing apparatus according to claim 1, upon which claim 17 depends.	Vilermo, however, fails to disclose wherein the content includes game content. 	Gagner does teach wherein the content includes game content (Gagner; [0007] - determining a player identifier associated with a wagering game player; using the player identifier to access a user account comprising one or more user preferences; comparing the one or more user preferences to secondary content, wherein said comparing comprises analyzing the secondary content to find a correlation with the one or more user preferences, resulting in correlated secondary content; generating control information that references the correlated secondary content; and causing a device to process the control information to present the correlated secondary content during the wagering game).

Conclusion
	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art includes:	Bostick (US PG Pub 20170292853) which teaches a method for tailoring voice navigation instruction output [0005] to be played during playback of music currently playing an instrumental portion of a song.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658