Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 10/21/2019.  These drawings are accepted.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 7-9, 12-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hoffmeister et al (US Patent No.: 9070367).
Claim 1, Hoffmeister et al discloses
receiving, at a device, a portion of an audio file (Fig. 11, label 1102. Col. 4, lines 53-62 discloses “a recognition score may represent a probability that a portion of audio data corresponds to a particular phoneme, word or phrase …”), wherein the portion of 
comparing the portion of the voice signal to one or more guidance queries (Fig. 11, label 1104. Col. 4, lines 34-44 discloses “the ASR module 214 may compare the input audio data with models for sounds (e.g., speech units or phonemes) and sequences of sounds to identify words and phrases that match the sequence of sounds spoken in the utterance of the audio data.” Col. 5, lines 15-18 discloses “compares the speech recognition data with acoustic, language and other data models and information stored in the speech storage 220 for recognizing the speech containing the original audio data.” Col. 6, lines 25-30 discloses that the speech storage 220 contains individual user speech input.),
wherein the one or more guidance queries correspond to portions of one or more voice queries (Col. 6, lines 25-30 discloses the speech storage 220 contains individual user speech input.);
determining, based on the comparing the portion of the voice signal to the one or more guidance queries, that the portion of the voice signal is not capable of being processed at a cache associated with the device (Fig. 11, label 1110 shows that the audio signal is transmitted to the server for speech recognition processing as a result of the determination at label 1106.); and
sending the portion of the voice signal for processing (Fig. 11, label 1110).
Claim 2, Hoffmeister et al discloses the voice query comprises a plurality of utterances (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.”), and wherein 
Claim 3, Hoffmeister et al discloses wherein determining that the portion of the audio file is not capable of being processed at the cache associated with the device (Fig. 11, labels 1106, 1110) comprises determining, prior to receiving at least one other portion of the audio file, that the portion of the audio file is not capable of being processed at the cache associated with the device (Col. 11, lines 44-50 discloses that the local device processes the portion of the audio signal containing a frequent phrase or word and transmits all or only a remainder (such as additional speech) of the audio over the network to the remote device for ASR processing. This can occur before or after reception of at least one other portion of the audio file, since speech recognition is conducted according to speech models of frequently spoken utterances as disclosed in Col. 11, lines 57-62.).
Claim 4, Hoffmeister et al discloses sending the portion of the audio file for processing comprises sending, after receiving the at least one other portion of the audio file, the portion of the audio file for processing (Col. 11, lines 44-50 discloses that the local device processes the portion of the audio signal containing a frequent phrase or word and transmits all or only a remainder (such as additional speech) of the audio over the network to the remote device for ASR processing. This can occur before or after reception of at least one other portion of the audio file, since speech recognition is conducted according to speech models of frequently spoken utterances as disclosed in Col. 11, lines 57-62.).

Claim 8, Hoffmeister et al discloses
receiving, at a device (Fig. 2), a portion of an audio file (Fig. 11, label 1102. Col. 4, lines 53-62 discloses “a recognition score may represent a probability that a portion of audio data corresponds to a particular phoneme, word or phrase …”), wherein the portion of the voice signal corresponds to a portion of a voice query (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.”);
comparing the portion of the voice signal to one or more guidance queries (Fig. 11, label 1104. Col. 4, lines 34-44 discloses “the ASR module 214 may compare the input audio data with models for sounds (e.g., speech units or phonemes) and sequences of sounds to identify words and phrases that match the sequence of sounds spoken in the utterance of the audio data.” Col. 5, lines 15-18 discloses “compares the speech recognition data with acoustic, language and other data models and information stored in the speech storage 220 for recognizing the speech containing the original audio data.” Col. 6, lines 25-30 discloses that the speech storage 220 contains individual user speech input.),

determining, based on the comparing the portion of the audio file to the one or more guidance queries, to monitor for at least one other portion of the audio file (Fig. 11, label 1104. Col. 4, lines 34-44 discloses comparing the input audio data with models for sounds and sequences of sounds. Lines 54-67 disclose a portion of audio data corresponding to a particular phoneme in order to determine whether a particular set of words matches those spoken in the utterance (lines 36-54). Because the speech recognition module determines whether the spoken utterance matches models for sounds and sequences of sounds to identify words and phrases, it monitors for at least one other portion of the audio file as well as the portion of the audio file.).
Claim 9, Hoffmeister et al discloses determining to monitor for the at least one other portion of the voice signal comprises determining that the portion of the audio file corresponds to at least one of the one or more guidance queries (Col. 4, lines 34-44 discloses comparing the input audio data with models for sounds and sequences of sounds. Lines 54-67 disclose a portion of audio data corresponding to a particular phoneme in order to determine whether a particular set of words matches those spoken in the utterance (lines 36-54). This indicates a determination of whether the portion of the audio file corresponds to models for sounds and sequences of sounds. Col. 5, lines 15-18 discloses “compares the speech recognition data with acoustic, language and other 
Claim 12, Hoffmeister et al discloses the voice query comprises a plurality of utterances (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.”), and 
wherein the portion of the voice query corresponds to a subset of the plurality of utterances (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.” Any portion of the audio data or audio signal received includes a subset of the spoken utterances.).
Claim 13, Hoffmeister et al discloses the portion of the voice signal is determined based on a particular time interval of the voice query (Col. 4, lines 30-50 discloses comparing the input audio data with models for sounds (e.g., speech units or phonemes) and sequences of sounds to identify words and phrases of the spoken utterances. This indicates a particular time interval, such as the time interval of a speech unit or phoneme.).
Claim 14, Hoffmeister et al discloses 
receiving, at the device (Fig. 2), at least one other portion of the audio file (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.”), wherein the at least one other portion of the audio file corresponds to at least one other portion of the voice query (Col. 4, lines 28-30 discloses the audio data includes spoken utterances, which indicates any portion of the audio data in the audio file includes at least one other portion of the voice query or spoken utterances.);
determining that the portion of the audio file and the at least one other portion of the audio file correspond to a stored voice query (Col. 4, lines 53-62,34-38 discloses 
processing, at a cache of the device (Fig. 11, label 1108), based on the determining that the portion of the audio file and the at least one other portion of the audio file correspond to a stored voice query, the portion of the audio file and the at least one other portion of the audio file (Fig. 11, labels 1106, 1108. Col. 4, lines 34-38 and 53-62 disclose that processing of the audio file at the local device is performed based on the comparison of the audio file to a stored voice query, or models of sounds and sequences of sounds.).
Claim 15, Hoffmeister et al discloses
receiving, at the device (Fig. 2), at least one other portion of the audio file (Fig. 11, label 1102. Col. 4, lines 53-62 discloses “a recognition score may represent a probability that a portion of audio data corresponds to a particular phoneme, word or phrase …”), wherein the at least one other portion of the audio file corresponds to at least one other portion of the voice query (Col. 4, lines 28-30 discloses “Audio data including spoken utterances may be processed in real time or may be saved and processed at a later time.”);
determining that the portion of the audio file and the at least one other portion of the voice signal do not correspond to a stored voice query (Fig. 11, label 1104. Col. 4, 
sending, based on the determining that the portion of the audio file and the at least one other portion of the audio file do not correspond to a stored voice query, the portion of the voice signal and the at least one other portion of the audio file for processing (Fig. 11, labels 1106, 1110. When the determination at label 1106 finds no match between the models of sounds and sequences of sounds and the utterances in the audio file, the portion of the audio file is sent to the server.).
Claim 16, Hoffmeister et al discloses
accessing, by a device (Fig. 2), a voice query (Fig. 10, label 1002);
determining, based on the voice query, a plurality of guidance queries (Fig. 10, label 1004-1018, Col. 12, line 50 - Col. 13, line 25  discloses the device stores the audio signal including the spoken utterance by the user. Speech recognition models are generated for the number of the most frequently spoken utterances. The speech recognition models are updated.), wherein each of the plurality of guidance queries correspond to a portion of the voice query (Col. 13, lines 1-25 discloses the speech 
storing the plurality of guidance queries (Fig. 10, label 1014, Col. 12, line 50 - Col. 13, line 25 discloses storing the speech recognition models or plurality of guidance queries.).
Claim 17, Hoffmeister et al discloses the voice query comprises a plurality of utterances (Fig. 10, label 1002. Col. 12, lines 50-53 discloses that an audio signal is received. Col. 11, lines 44-50 discloses that the received audio signal includes a frequent phrase or word with additional speech.) and wherein the portion of the voice query comprises a subset of the plurality of utterances (Col. 11, lines 44-50 discloses that the received audio signal includes a frequent phrase or word with additional speech.).
Claim 19, Hoffmeister et al discloses wherein the plurality of guidance queries comprise at least a first guidance query that corresponds to a portion of the voice query and a second guidance query that corresponds to the portion of the voice query and at least one other portion of the voice query (Col. 13, lines 1-20 discloses storage of frequently spoken utterances, and Col. 15, lines 20-40 discloses matching the audio input with the stored recognition models. Where the frequently spoken utterances include words also found in the voice query, the recognition models of those utterances may include a portion of the voice query. Col. 12, lines 54-56 discloses storing the audio data or voice query. Where the audio data or voice query is itself a frequently spoken utterance, the frequently spoken utterance includes the voice query, or the portion of the voice query and at least one other portion of the voice query.).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Hoffmeister et al (US Patent No.: 9070367) in view of Postelnicu et al (US Patent No.: 9536151).
Claim 18, Hoffmeister et al discloses the plurality of guidance queries (Col. 12, line 50 - Col. 13, line 25 discloses the device stores the audio signal including the spoken utterance by the user. Speech recognition models are generated for the number of the most frequently spoken utterances. The speech recognition models are updated.), but fails to disclose determining, for each of the plurality of guidance queries, an audio fingerprint based on the guidance query.
Postelnicu et al discloses determining, for each of the plurality of guidance queries, an audio fingerprint based on the guidance query (Col. 3, lines 50-67 discloses that an audio fingerprint is generated for an audio track and stored. Fig. 5, label 307 shows the fingerprints database or storage.).
It would have been obvious to one skilled in the art before the effective filing date of the application to simply substitute one well-known element, the database of guidance queries as disclosed by Hoffmeister et al, with another well-known element, the database or storage of audio fingerprints of audio tracks as disclosed by Postelnicu et al, so as to yield the predictable result of a database of stored audio, wherein such audio can include speech.

Allowable Subject Matter
Claims 5-6 and 10-11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
CN104866274 discloses voice recognition configured to receive a first voice sent by a wake-up unit and to recognize the instruction voice part and the content voice part to form a recognition result.
Ushida et al (US Publication No.: 20040010409) discloses voice recognition of audio performed at a local device. When the local device rejects all the possible candidates, the audio is sent to the server for voice recognition.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044.  The examiner can normally be reached on 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on (571) 272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LINDA WONG/
Primary Examiner, Art Unit 2656