DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Objections - 37 CFR 1.75(a)
1.	The following is a quotation of 37 CFR 1.75(a):
The specification must conclude with a claim particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention or discovery.

2.	Claims 3, 12, 14 & 18 are objected to under 37 CFR 1.75(a) as failing to particularly point out and distinctly claim the subject matter which the applicant regards as his invention or discovery.
	Claims 3 & 14 state in part “displaying on the speech recording interface a preset text sentence “and/or” playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence “and/or” voicing sentence displayed”. 
Here, it is not clear whether the Applicant intends…
1) displaying on the speech recording interface a preset text sentence “and” playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence “and” voicing sentence displayed
2) displaying on the speech recording interface a preset text sentence “or” playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence “or” voicing sentence displayed
3) displaying on the speech recording interface a preset text sentence “and” playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence “or” voicing sentence displayed
4) displaying on the speech recording interface a preset text sentence “or” playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence “and” voicing sentence displayed
For continued examination purposes and in the best interests of compact prosecution, Examiner assumes that Claims 3 & 14 are intended to be examined as indicated in #2 above.

Claims 12 & 18 state in part “sending the client a text sentence “and/or” a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence “and/or” voicing sentence displayed on the client”.
Here, it is not clear whether the Applicant intends…
1) sending the client a text sentence “and” a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence “and” voicing sentence displayed on the client
2) sending the client a text sentence “or” a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence “or” voicing sentence displayed on the client
3) sending the client a text sentence “and” a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence “or” voicing sentence displayed on the client
4) sending the client a text sentence “or” a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence “and” voicing sentence displayed on the client
For continued examination purposes and in the best interests of compact prosecution, Examiner assumes that Claims 12 & 18 are intended to be examined as indicated in #2 above.
The Examiner has tried to interpret the claims, as best the Examiner can ascertain, to develop an appropriate prior art rejection in the interests of compact prosecution. If any interpretation of the Examiner's is considered incorrect or off-base, the Examiner invites the Applicant to show the portions of the Applicant's specification which give a more proper interpretation of the claimed subject matter.
Appropriate correction is required.
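For illustration only, and not as part of the objection: the four readings enumerated above for each claim pair are the combinations obtained by resolving each of the two “and/or” occurrences independently to “and” or “or”. A minimal Python sketch of that enumeration follows. The function name `readings` and the abbreviated template string are hypothetical, and the order produced by the sketch differs from the numbering used above.

```python
from itertools import product


def readings(template: str) -> list[str]:
    """Resolve every "{}" slot in the template to either "and" or "or".

    Each slot stands for one "and/or" occurrence in the claim language.
    """
    slots = template.count("{}")
    return [
        template.format(*choice)
        for choice in product(("and", "or"), repeat=slots)
    ]


# Abbreviated, hypothetical stand-in for the language of Claims 3 & 14.
template = (
    "displaying a preset text sentence {} playing a voicing sentence; "
    "obtaining speech data according to the text sentence {} voicing sentence"
)

for reading in readings(template):
    print(reading)
```

Two occurrences yield four readings; a claim with three occurrences would yield eight, which is why an express election by the Applicant is requested.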


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-4, 11-14 & 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Holdren et al. (US 20150248881 A1, hereinafter Holdren ‘881) in combination with Rajagopalan et al. (US 20150100318 A1, hereinafter Rajagopalan ‘318).
Regarding claim 13; Holdren ‘881 discloses an electronic device (Fig. 1, Vehicle Electronics 28), implemented in a client side (i.e. A client computer used by the vehicle owner or other subscriber for such purposes as accessing or receiving vehicle data or setting up or configuring subscriber preferences or controlling vehicle functions. Paragraph 0026)
wherein the electronic device comprises: 
at least one processor (Fig. 1, Electronic Processing Device 52) 
and a storage (Fig. 1, Memory Device 54) 
communicatively connected with the at least one processor (i.e. Fig. 1 shows wherein Electronic Processing Device 52 and Memory Device 54 are coupled together in Vehicle Electronics 28. The vehicle electronics 28, shown generally in Fig. 1, can be connected directly to the telematics unit and indirectly connected using one or more network connections, such as a communications bus 44 or an entertainment bus 46. Paragraph 0015);
wherein, the storage stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for generating a speech packet (i.e. The speech recognition decoder 314 processes the feature vectors using the appropriate acoustic models, grammars, and algorithms to generate an N-best list of reference patterns. Those skilled in the art will recognize that reference patterns can be generated by suitable reference pattern training of the ASR system and stored in memory.  Paragraph 0052)
wherein the method comprises: 
providing a speech recording interface to a user (i.e. At step 430, a vehicle occupant is asked for a recitation of the text data. The vehicle telematics unit 30 can then prompt or ask the vehicle occupant to verbally recite the text data and follow the prompt with a listening period during which speech from the microphone 32 is received and recorded. Paragraph 0057)
obtaining speech data entered by the user after obtaining an event of triggering speech recording on the speech recording interface (i.e. The vehicle telematics unit 30 can then prompt or ask the vehicle occupant to verbally recite the text data and follow the prompt with a listening period during which speech from the microphone 32 is received and recorded. The vehicle telematics unit 30 can then generate a prompt to the vehicle occupant that asks "please recite the text following the tone." After the tone, the vehicle telematics unit 30 can begin listening for the vehicle occupant's correct pronunciation of the artist's name and record that pronunciation as an audio file. Paragraph 0057);
uploading the speech data entered by the user to a server side in response to determining that the speech data entered by the user meets requirements for training a speech synthesis model (i.e. Other such accessible computers 18 can be, for example: a service center computer where diagnostic information and other vehicle data can be uploaded from the vehicle via the telematics unit 30.  In general, a vehicle occupant vocally interacts with an automatic speech recognition system (ASR) for one or more of the following fundamental purposes: training the system to understand a vehicle occupant's particular voice. Generally, ASR extracts acoustic data from human speech, compares and contrasts the acoustic data to stored subword data, selects an appropriate subword which can be concatenated with other selected subwords, and outputs the concatenated subwords or words for post-processing such as dictation or transcription, address book dialing, storing to memory, training ASR models or adaptation parameters, or the like. Paragraphs 0026 & 0041) 
and receiving a downloading address of the speech packet generated by the server side after training the speech synthesis model with the speech data (i.e. When used for packet-switched data communication such as TCP/IP, the telematics unit can be configured with a static IP address or can set up to automatically receive an assigned IP address from another device on the network such as a router or from a network address server. Paragraph 0017). 
Examiner reasonably believes that Holdren ‘881 at Paragraph 0016 discloses generating a speech packet: there, data can be sent either via a data connection, such as via packet data transmission over a data channel, or via a voice channel using techniques known in the art. Nevertheless, Examiner cites Rajagopalan ‘318 to cure any presumed deficiencies of Holdren ‘881.
Rajagopalan ‘318 discloses generating a speech packet (i.e. Fig. 4 is a block diagram illustrating one example of an electronic device in which systems and methods for decoding a speech packet may be implemented. Paragraph 0025).
Holdren ‘881 and Rajagopalan ‘318 are combinable because they are from the same field of endeavor, namely speech systems (Rajagopalan ‘318 at “Background”).
	Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Holdren ‘881 by adding the generation of a speech packet as taught by Rajagopalan ‘318. Doing so would have been advantageous because the use of electronic devices has become common. In particular, advances in electronic technology have reduced the cost of increasingly complex and useful electronic devices. Cost reduction and consumer demand have proliferated the use of electronic devices such that they are practically ubiquitous in modern society. As the use of electronic devices has expanded, so has the demand for new and improved features of electronic devices. More specifically, electronic devices that perform functions faster, more efficiently or with higher quality are often sought after. Therefore, it would have been obvious to combine Holdren ‘881 with Rajagopalan ‘318 to obtain the invention as specified.

Regarding claim 14; Holdren ‘881 discloses wherein the obtaining speech data entered by the user comprises: displaying on the speech recording interface a preset text sentence and/or playing a voicing sentence corresponding to the text sentence; and obtaining the speech data entered by the user according to the text sentence and/or voicing sentence displayed (i.e. A vehicle occupant can request a song by visually scanning a list of artists shown on the visual display 38 of the vehicle 12 and select a song by an artist, such as the artist named "?uestlove." The vehicle telematics unit 30 can receive this request and respond by audibly repeating the artist's name as speech converted from the text shown on the visual display 38. Paragraph 0055).

Regarding claim 17; Claim 17 contains substantially the same subject matter as claim 13. Therefore, claim 17 is rejected on the same grounds as claim 13. However, claim 17 further discloses implementation in a server side. Paragraph 0045 of Holdren ‘881 discloses wherein grammar models, acoustic models, and the like can be stored in memory of one of the servers 82 and/or databases 84 in the call center 20 and communicated to the vehicle telematics unit 30 for in-vehicle speech processing. Similarly, speech recognition software can be processed using processors of one of the servers 82.

Regarding claim 18; Claim 18 contains substantially the same subject matter as claim 12. Therefore, claim 18 is rejected on the same grounds as claim 12.

Regarding claim 19; Claim 19 contains substantially the same subject matter as claim 13. Therefore, claim 19 is rejected on the same grounds as claim 13. However, claim 19 further discloses a non-transitory computer-readable storage medium storing computer instructions therein, wherein the computer instructions are used to cause the computer to perform a method. Holdren ‘881 discloses at paragraph 0040 wherein the program(s) can be embodied on computer readable media, which can be non-transitory and can include one or more storage devices, articles of manufacture, or the like to execute the method.

Regarding claim 20; Claim 20 contains substantially the same subject matter as claim 17. Therefore, claim 20 is rejected on the same grounds as claim 17. However, claim 20 further discloses a non-transitory computer-readable storage medium storing computer instructions therein, wherein the computer instructions are used to cause the computer to perform a method. Holdren ‘881 discloses at paragraph 0040 wherein the program(s) can be embodied on computer readable media, which can be non-transitory and can include one or more storage devices, articles of manufacture, or the like to execute the method.

Regarding claim 1; Claim 1 contains substantially the same subject matter as claim 13. Therefore, claim 1 is rejected on the same grounds as claim 13. 

Regarding claim 2; Holdren ‘881 discloses wherein the event of triggering speech recording comprises at least one of: detecting a gesture of triggering speech recording on the speech recording interface; or receiving a speech instruction of triggering speech recording from the user when the speech recording interface is displayed (i.e. To provide some context via an example, a vehicle occupant can request a song by visually scanning a list of artists shown on the visual display 38 of the vehicle 12 and select a song by an artist, such as the artist named "?uestlove." Paragraph 0055).

Regarding claim 3; Claim 3 contains substantially the same subject matter as claim 14. Therefore, claim 3 is rejected on the same grounds as claim 14.

Regarding claim 4; Holdren ‘881 discloses obtaining a speech recognition result by recognizing the speech data entered by the user (i.e. At step 420, the accuracy of the speech converted from text data is detected. Similar to the ASR systems, modern Text-To-Speech systems can be based on Hidden Markov Models. The output from a TTS Engine could be determined using a confidence score or likelihood probability based approach. It can also be determined that the detected accuracy is below a predetermined threshold. The TTS system 200 could classify the speech output as falling into one of three categories: a high confidence result, a medium confidence result, and a low confidence result. Paragraph 0056);
and comparing the speech recognition result with the text sentence to judge whether the speech data entered by the user meets a recording quality requirement (i.e. The unit selector 220 compares output from the synthesis engine 216 to stored speech data and selects stored speech that best corresponds to the synthesis engine output. The speech selected by the unit selector 220 can include pre-recorded sentences, clauses, phrases, words, subwords of pre-recorded words, and/or the like. The selector 220 may use the acoustic models 226 for assistance with comparison and selection of most likely or best corresponding candidates of stored speech. The acoustic models 226 may be used in conjunction with the selector 220 to compare and contrast data of the synthesis engine output and the stored speech data, assess the magnitude of the differences or similarities therebetween, and ultimately use decision logic to identify best matching stored speech data and output corresponding recorded speech. Paragraph 0035).
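For illustration only, and not as a characterization of either the application or Holdren ‘881: the comparing step recited in claim 4 can be pictured as a similarity check between the recognized transcript and the prompted text sentence. The Python sketch below assumes a word-level comparison using the standard-library `difflib.SequenceMatcher` and a hypothetical acceptance threshold of 0.9; the function names and the threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def meets_recording_quality(recognized: str, prompt: str,
                            threshold: float = 0.9) -> bool:
    """Judge a recording by comparing its recognition result with the prompt.

    The threshold value is an assumption made for this sketch only.
    """
    ratio = SequenceMatcher(None, normalize(recognized), normalize(prompt)).ratio()
    return ratio >= threshold


print(meets_recording_quality("Please recite the text.", "please recite the text"))
print(meets_recording_quality("please delete the text", "please recite the text"))
```

Under these assumptions the first call accepts the recording (the token sequences are identical) and the second rejects it (three of four words match, a ratio of 0.75).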

Regarding claim 11; Claim 11 contains substantially the same subject matter as claim 13. Therefore, claim 11 is rejected on the same grounds as claim 13. However, claim 11 further discloses implementation in a server side. Paragraph 0045 of Holdren ‘881 discloses wherein grammar models, acoustic models, and the like can be stored in memory of one of the servers 82 and/or databases 84 in the call center 20 and communicated to the vehicle telematics unit 30 for in-vehicle speech processing. Similarly, speech recognition software can be processed using processors of one of the servers 82.

Regarding claim 12; Holdren ‘881 discloses sending the client a text sentence and/or a voicing sentence corresponding to the text sentence, so that the user enters speech data according to the text sentence and/or voicing sentence displayed on the client (i.e. The method 400 begins at step 410 by performing text-to-speech conversion of text data at the vehicle 12 and presenting the converted speech via an audio system at the vehicle 12. The default text data can be accessed from the text source 212 and ultimately output to the speaker 230 via the acoustic interface 228. The vehicle telematics unit 30 can receive this request and respond by audibly repeating the artist's name as speech converted from the text shown on the visual display 38. Paragraph 0055).


Allowable Subject Matter
1.	Claims 5-10, 15 & 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

2.	Claim 6 depends on objected-to claim 5. Therefore, by virtue of its dependency, Claim 6 is also indicated as objected-to subject matter.

3.	Claim 10 depends on objected-to claim 9. Therefore, by virtue of its dependency, Claim 10 is also indicated as objected-to subject matter.

4.	Claim 16 depends on objected-to claim 15. Therefore, by virtue of its dependency, Claim 16 is also indicated as objected-to subject matter.


Examiner's Statement of Reasons for Allowance
The cited reference (Holdren ‘881) teaches a system and method of tuning speech recognition systems that includes performing text-to-speech conversion of text data; detecting the accuracy of speech converted from text data; determining that the detected accuracy is below a predetermined threshold; recording a user recitation of the text data in response to the determination; and storing the user recitation in an exception database located at a vehicle.
The cited reference (Rajagopalan ‘318) teaches a method for decoding a speech signal. The method includes obtaining a packet. The method also includes obtaining a previous lag value. The method further includes limiting the previous lag value if the previous lag value is greater than a maximum lag threshold. The method additionally includes disallowing an adjustment to a number of synthesized peaks if a combination of the number of synthesized peaks and an estimated number of peaks is not valid.
The cited references fail to disclose:
wherein the requirements for training the speech synthesis model comprise at least one of: the speech data entered by the user meets a recording quality requirement; or an amount of the speech data entered by the user meets a preset amount requirement;
wherein the uploading the speech data entered by the user to a server side in response to determining that the speech data entered by the user meets requirements for training a speech synthesis model comprises: judging whether a current piece of speech data entered by the user meets the recording quality requirement; in response to determining that the current piece of speech data entered by the user meets the recording quality requirement, obtaining a next piece of speech data entered by the user until the amount of speech data entered by the user meeting the recording quality requirement meets a preset amount requirement; and in response to determining that the current piece of speech data entered by the user does not meet the recording quality requirement, prompting the user to re-enter the current piece of speech data;
before obtaining the speech data entered by the user, displaying voice class options on the speech recording interface; and obtaining voice class information selected by the user and updating the voice class information to the server side to train the speech synthesis model;
wherein the uploading the speech data entered by the user to a server side in response to determining that the speech data entered by the user meets requirements for training a speech synthesis model comprises: displaying on the speech recording interface a component for uploading the speech data, in response to determining that the speech data entered by the user meets the requirements for training the speech synthesis model; and uploading the speech data entered by the user to the server side after obtaining an event of the user triggering the component for uploading the speech data;
displaying a downloading link of the speech packet, wherein the downloading link includes the downloading address of the speech packet; and downloading the speech packet from the server side after obtaining an event of the user triggering the downloading link, and integrating the speech packet to the client so that the client performs speech broadcast using the speech packet;
wherein the client performing speech broadcast using the speech packet comprises one of: sending a broadcast text and model parameters included in the speech packet to the server side, so that the server side performs speech synthesis with the text and the model parameters to obtain a broadcast speech; or invoking the speech synthesis model so that the speech synthesis model performs speech synthesis with the broadcast text and model parameters included in the speech packet to obtain the broadcast speech;
wherein the method further comprises: displaying a downloading link of the speech packet, wherein the downloading link includes the downloading address of the speech packet; and downloading the speech packet from the server side after obtaining an event of the user triggering the downloading link, and integrating the speech packet to the client so that the client performs speech broadcast using the speech packet;
wherein the client performing speech broadcast using the speech packet comprises one of: sending a broadcast text and model parameters included in the speech packet to the server side, so that the server side performs speech synthesis with the text and the model parameters to obtain a broadcast speech; or invoking the speech synthesis model so that the speech synthesis model performs speech synthesis with the broadcast text and model parameters included in the speech packet to obtain the broadcast speech.
As a result, and for these reasons, Examiner indicates Claims 5-10, 15 & 16 as objected to but containing allowable subject matter.
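For illustration only: the quality-gated collection recited in the limitations not found in the cited references (judge the current piece, advance to the next piece when it passes, prompt re-entry when it fails, and stop once a preset amount of passing pieces is reached) can be sketched in Python as follows. All names are hypothetical. The `record` and `meets_quality` callables stand in for the recording interface and the quality check, and the `max_retries` cap is an assumption added so the sketch terminates; the claim language recites no such cap.

```python
from typing import Callable, Iterable


def collect_training_speech(
    prompts: Iterable[str],
    record: Callable[[str], str],
    meets_quality: Callable[[str, str], bool],
    required_count: int,
    max_retries: int = 3,  # assumption: not recited in the claims
) -> list[str]:
    """Gather recordings until `required_count` pieces pass the quality check."""
    accepted: list[str] = []
    for prompt in prompts:
        if len(accepted) >= required_count:
            break  # preset amount requirement met
        for _ in range(1 + max_retries):
            clip = record(prompt)
            if meets_quality(clip, prompt):
                accepted.append(clip)  # move on to the next piece
                break
            # Otherwise the user is prompted to re-enter the current piece.
    return accepted


def ready_to_upload(accepted: list[str], required_count: int) -> bool:
    """True once the amount of accepted speech meets the preset amount."""
    return len(accepted) >= required_count
```

In this sketch, the upload to the server side would be offered only after `ready_to_upload` returns true, which mirrors the ordering recited in the limitations above.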

Relevant Prior Art References Not Relied Upon
1.	Hoffberg et al. (US 6,640,145 B2) - An intelligent media device, comprising a packet data communications interface; a media communication interface for receiving audio and/or video data; a digital memory for persistently storing received audio and/or video data; and an intelligent server for generating a virtual interface for controlling the media communication interface and the digital memory through said packet data communications interface. The intelligent server may be adaptive. A variety of devices may be interfaced through the packet data communications interface, including telephony, imaging, videoconferencing, security, alarm, environmental control, vehicular, illumination system, domestic appliance; fluid and handling systems, as well as consumer electronic devices. A digital rights manager for enforcing a set of externally supplied restrictions associated with the received audio and/or video data may be incorporated, with a cryptographic processor for selectively cryptoprocessing audio and/or video data in dependence on said rights manager being provided to limit access to the audio and/or video data content.

2.	Lakaniemi (US 7,573,907 B2) - Packets for a discontinuous transmission of a speech signal via a packet switched network may be provided in shorter transmission intervals during an active state and in longer transmission intervals during an inactive state. The active state may be selected whenever a speech signal comprises a speech burst, optionally with a hangover period after a respective speech burst. For enhancing the control of an adaptive jitter buffer at a receiver at the beginning of a respective transmission session, an active state is enforced in addition for a predetermined period at a beginning of a transmission session, irrespective of a presence of speech bursts. In case hangover periods are used, the length of the predetermined period exceeds the length of these hangover periods.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ. whose telephone number is (571)270-1581. The examiner can normally be reached 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy P. Goddard can be reached on 517-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Primary Examiner
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677