DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 2 – 36 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2, 3, 5, and 7 – 36 of U.S. Patent No. 11,024,308. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 2 – 36 of the instant application are similar in scope and content to claims 2, 3, 5, and 7 – 36 of the patent, which is from the same applicant.
Here is a comparison between claim 35 of the instant application and claim 2 of the US Patent.
Instant Application 17/303,325: 35. A computer-implemented method comprising:
US Patent 11,024,308: 2. A computer-implemented method comprising:
Comparison: Same

Instant Application 17/303,325: storing, at a client device, a digital representation of a spoken utterance; sending, to a server computer system, a digital representation of a spoken utterance;
US Patent 11,024,308: performing primary automatic speech recognition (ASR) processing upon an utterance to produce a dataset including a nominal primary transcription, comprising a sequence of nominal transcribed words;
Comparison: Similar

Instant Application 17/303,325: receiving, from said server computer system, an indication of at least one putative spoken proper name entity present within said digital representation of said spoken utterance, an indication of at least one putative type of said at least one putative spoken proper name entity, and an indication of an acoustic span associated with said at least one putative spoken proper name entity;
US Patent 11,024,308: augmenting said dataset with a nominal meaning of said nominal primary transcription; detecting a putative presence of a spoken proper name entity within said nominal primary transcription, wherein said spoken proper name entity is associated with a contiguous portion of said utterance, comprising a target span;
Comparison: Similar

Instant Application 17/303,325: performing secondary automatic speech recognition (ASR) processing upon said acoustic span of said digital representation of said spoken utterance to produce a transcription and associated meaning for said acoustic span; and
US Patent 11,024,308: performing an instance of secondary ASR processing upon at least a portion of said utterance including said target span to produce a transcription and an associated meaning of said target span, wherein said instance of secondary ASR processing is performed with an automatic speech recognizer specialized to process a plurality of putative types of said spoken proper name entity; and
Comparison: Similar

Instant Application 17/303,325: attributing a meaning to said digital representation of said spoken utterance based on said transcription and associated meaning for said acoustic span.
US Patent 11,024,308: attributing a revised meaning to said utterance by incorporating said transcription and associated meaning of said target span obtained from said secondary ASR processing into said dataset.
Comparison: Similar


Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 2 – 8, 12 – 28, and 30 – 36 are rejected under 35 U.S.C. 103 as being unpatentable over Abrams et al. (US PAP 2014/0019126) in view of Griggs et al. (US PAP 2011/0125499).
As per claims 2, 35, 36, Abrams et al. teach a client device comprising: 
a processor; and a memory including instructions that, when executed by said processor (paragraph 10), cause said client device to: 
send, to a server computer system, a digital representation of a spoken utterance (“user speech… on a remote computer or entirely on the remote computer or server”; paragraphs 3, 30); 
perform secondary automatic speech recognition (ASR) processing upon said digital representation of said spoken utterance to produce a transcription and associated meaning for said acoustic span (“updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text.”; paragraphs 3, 13); and 
attribute a meaning to said digital representation of said spoken utterance based on at least said transcription and associated meaning for said acoustic span (paragraphs 13 – 20).
However, Abrams et al. do not specifically teach receiving, from said server computer system, an indication of at least one putative spoken proper name entity present within said digital representation of said spoken utterance, an indication of at least one putative type of said at least one putative spoken proper name entity, and an indication of an acoustic span associated with said at least one putative spoken proper name entity.
Griggs et al., disclose that word lattice 108 represents one or more possibilities for words that may occur in audio stream 101 at particular times. Each possible word included in word lattice 108 is associated with a start time t1, an end time t2, and a confidence score representative of the probability that the word is a correct match to the word spoken between time t1 and time t2 in audio stream 101. Word recognition engine 104 may identify multiple possible words for a given time period or for overlapping time periods, each possible word having a different confidence score... the new word may be a proper name, a word or phrase in a foreign language, a word or phrase related to a current event or recently identified technical problem, or the name of a product or promotion... wordspotting engine 124 searches audio data 102 to determine time intervals for putative occurrences 125 of the new word in audio stream 101 (paragraphs 21 – 24).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to detect a putative spoken proper name as taught by Griggs et al. in Abrams et al., because that would help provide an improved method and system for performing speech-to-text recognition of non-dictionary words (Abrams et al., paragraph 2).
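For orientation only, the client-side flow recited in claims 2, 35, and 36 can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not code from the application or from either reference: the names EntityHint, LEXICONS, secondary_asr, and attribute_meaning, the token-based stand-in for audio, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EntityHint:
    """Hypothetical server indication: putative type and acoustic span."""
    entity_type: str
    start_s: float  # span start, in seconds
    end_s: float    # span end, in seconds


# Hypothetical type-specific vocabularies standing in for specialized recognizers.
LEXICONS = {
    "street_name": {"el m": "Elm Street"},
    "person_name": {"an a": "Anna"},
}


def secondary_asr(frames, hint):
    """Toy secondary pass: decode only the frames inside the hinted span."""
    span = [tok for t, tok in frames if hint.start_s <= t < hint.end_s]
    text = LEXICONS[hint.entity_type].get(" ".join(span))
    meaning = {"type": hint.entity_type, "value": text}
    return text, meaning


def attribute_meaning(nominal_words, span_index, text, meaning):
    """Splice the span transcription into the nominal primary transcription."""
    words = list(nominal_words)
    words[span_index] = text
    return " ".join(words), meaning


# Stored digital representation: (time in seconds, toy acoustic token).
frames = [(0.0, "go"), (0.4, "to"), (0.8, "el"), (1.0, "m")]
hint = EntityHint("street_name", 0.8, 1.2)  # as if received from the server
text, meaning = secondary_asr(frames, hint)
sentence, meaning = attribute_meaning(["go", "to", "<unk>"], 2, text, meaning)
print(sentence)
```

In this sketch the server supplies only the hint; the second pass and the final attribution of meaning both run on the locally stored representation.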

	As per claim 3, Abrams et al. in view of Griggs et al., further disclose prior to performing said secondary ASR: receive, from said server computer system, a nominal primary transcription of said digital representation of said spoken utterance along with said indication of said at least one putative spoken proper name entity, said indication of at least one putative type of said at least one putative spoken proper name entity; and said indication of an acoustic span associated with said at least one putative spoken proper name entity (“the location data include any combination of street names, business names, places of interest, and municipality names... [someone/person's name]”; Griggs et al., paragraphs 21 – 24; Abrams et al., paragraphs 13 – 25).

	As per claim 4, Abrams et al. in view of Griggs et al., further disclose prior to performing said secondary ASR: receive, from said server computer system, a nominal meaning along with said nominal primary transcription of said digital representation of said spoken utterance (“in response to a portion of the speech being unrecognizable, determining if the speech contains a location-based phrase that contains a term relating to any combination of a geographic origin or destination, a current location, and a route”; Abrams et al., paragraph 3).

	As per claim 5, Abrams et al. in view of Griggs et al., further disclose said meaning attributed to said digital representation of said spoken utterance corresponds to a revised version of said nominal meaning for said nominal primary transcription to include said transcription and associated meaning for said acoustic span (Abrams et al., paragraphs 13 – 20).

	As per claim 6, Abrams et al. in view of Griggs et al., further disclose said nominal primary transcription comprises at least a sequence of nominal transcribed words (Griggs et al., paragraphs 20 – 26; Abrams et al., paragraphs 13 – 25).

	As per claim 7, Abrams et al. in view of Griggs et al., further disclose said at least a sequence of nominal transcribed words includes associated nominal word timings (“the dictionary enhancer 44 may remove words previously added to the word dictionary 34 after a predetermined period of time, e.g., 1/4 hour to 1 1/2 hours.”; Abrams et al., paragraph 25).
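The temporary dictionary enhancement quoted from Abrams et al. (words added from location data and later removed after a predetermined period) can be pictured with the following Python sketch. It is an assumption-laden illustration only: the class name EnhancedDictionary, its methods, and the 900-second (1/4 hour) default are hypothetical and are not taken from the reference.

```python
class EnhancedDictionary:
    """Toy word dictionary with temporarily added words that expire."""

    def __init__(self, base_words, ttl_s=900.0):
        self.base = set(base_words)   # permanent vocabulary
        self.temp = {}                # temporary word -> time added (seconds)
        self.ttl_s = ttl_s            # hypothetical retention period

    def add_temporary(self, words, now_s):
        """Add location-derived words that are not already in the base set."""
        for word in words:
            if word not in self.base:
                self.temp[word] = now_s

    def purge(self, now_s):
        """Drop temporary words whose retention period has elapsed."""
        self.temp = {w: t for w, t in self.temp.items()
                     if now_s - t < self.ttl_s}

    def __contains__(self, word):
        return word in self.base or word in self.temp


dictionary = EnhancedDictionary({"turn", "left", "on"})
dictionary.add_temporary(["Elm"], now_s=0.0)
print("Elm" in dictionary)   # the street name is recognizable for now
dictionary.purge(now_s=900.0)
print("Elm" in dictionary)   # removed once the period has elapsed
```

Timestamps are passed in explicitly so the expiry behavior is deterministic; a real system would presumably read a clock.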

	As per claim 8, Abrams et al. in view of Griggs et al., further disclose sending said digital representation of said spoken utterance comprises causing said client device to: cause said server computer to generate a nominal primary transcription including a sequence of nominal transcribed words generated based on a primary ASR processing of said digital representation of said spoken utterance (“in response to a portion of the speech being unrecognizable, determining if the speech contains a location-based phrase that contains a term relating to any combination of a geographic origin or destination, a current location, and a route”; Abrams et al., paragraphs 3, 13).

	As per claim 12, Abrams et al. in view of Griggs et al., further disclose attributing said meaning to said spoken utterance comprises causing said client device to: incorporate said transcription and associated meaning for said acoustic span into a nominal primary transcription of said digital representation of said spoken utterance received from said server computer system (Griggs et al., paragraphs 20 – 26; Abrams et al., paragraphs 13 – 25).

	As per claim 13, Abrams et al. in view of Griggs et al., further disclose attributing said meaning to said spoken utterance comprises causing said client device to: incorporate said transcription and associated meaning for said acoustic span into said nominal primary transcription (Griggs et al., paragraphs 20 – 26; Abrams et al., paragraphs 13 – 25).

	As per claim 14, Abrams et al. in view of Griggs et al., further disclose prior to sending said digital representation of a spoken utterance to said server computer system: capture said spoken utterance at said client device; store, in memory of said client device, said spoken utterance as said digital representation of said spoken utterance (“The speech-to-text system 10 is implemented as an electronic device 12 that may exist in various forms, including a vehicle navigation/entertainment system, a smartphone, tablet, or any other type of device or computer that is equipped with a global positioning system (GPS)”; Abrams et al., paragraph 8).

	As per claim 15, Abrams et al. in view of Griggs et al., further disclose said secondary ASR processing is performed with an automatic speech recognizer that is specialized to process said at least one putative type of said at least one putative spoken proper name entity (Griggs et al., paragraphs 27 – 30; Abrams et al., paragraphs 3 and 13).

	As per claim 16, Abrams et al. in view of Griggs et al., further disclose said secondary ASR processing is performed with a plurality of automatic speech recognizers, each said automatic speech recognizer specialized to process a specific putative type of said at least one putative spoken proper name entity (“the location-phrase detector 42 may be configured to recognize geographic origin and destination phrases in the speech 21, such as "I'm coming from"/"I'm leaving from/the"/"I'm on my way to"/"I'm heading towards"/I'm meeting [someone/person's name] at", and the like. The location-phrase detector 42 may be further configured to recognize current location phrases when terms are detected in the speech 21, such as "I'm near"/"I'm right beside"/"I'm next two"/"passing by"/and the like. The location-phrase detector 22 may be further configured to recognize route phrases when terms are detected in the speech 21, such as "I'm traveling [direction] [on/along][highway or street name]"/"turning [right/left] on"/and the like.”; Abrams et al., paragraph 19).

	As per claim 17, Abrams et al. in view of Griggs et al., further disclose multiple given putative types of proper name entities within said digital representation of said spoken utterance are processed by said secondary ASR processing to produce respective transcriptions and associated meanings (“Word recognition engine 104 may identify multiple possible words for a given time period or for overlapping time periods, each possible word having a different confidence score... the new word may be a proper name, a word or phrase in a foreign language, a word or phrase related to a current event or recently identified technical problem, or the name of a product or promotion... wordspotting engine 124 searches audio data 102 to determine time intervals for putative occurrences 125 of the new word in audio stream 101”; Griggs et al., paragraphs 21 – 24).

	As per claim 18, Abrams et al. in view of Griggs et al., further disclose said multiple given putative types are associated with one putative spoken proper name entity that is included within said digital representation of said spoken utterance (Griggs et al., paragraphs 21 – 24).

	As per claim 19, Abrams et al. in view of Griggs et al., further disclose said multiple given putative types are associated with a plurality of putative spoken proper name entities included within said digital representation of said spoken utterance; and wherein one or more of said given putative types is associated with each said putative spoken proper name entity (“the location data include any combination of street names, business names, places of interest, and municipality names... [someone/person's name]”; Griggs et al., paragraphs 21 – 24; Abrams et al., paragraphs 13 – 25).

	As per claim 20, Abrams et al. in view of Griggs et al., further disclose prior to performing said secondary ASR: receive, from said server computer system, a nominal primary transcription of said spoken utterance including a contiguous sequence of nominal transcribed words within said acoustic span (Griggs et al., paragraphs 21 – 24; Abrams et al., paragraphs 13 – 25).

	As per claim 21, Abrams et al. in view of Griggs et al., further disclose performing said secondary ASR comprises causing said client device to: perform said secondary ASR on an entirety of said digital representation of said spoken utterance including said acoustic span (“updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text.”; Abrams et al., paragraphs 3, 13).

	As per claim 22, Abrams et al. in view of Griggs et al., further disclose performing said secondary ASR comprises causing said client device to: perform said secondary ASR on a portion of said digital representation of said spoken utterance including said acoustic span (“updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text.”; Abrams et al., paragraphs 3, 13).

	As per claim 23, Abrams et al. in view of Griggs et al., further disclose an instance of said secondary ASR processing is performed on multiple distinct putative spoken proper name entities, each said distinct proper name entity having an associated acoustic span and having at least one associated putative type (Griggs et al., paragraphs 21 – 24; Abrams et al., paragraphs 13 – 25).

	As per claim 24, Abrams et al. in view of Griggs et al., further disclose said secondary ASR processing comprises a single instance of secondary ASR processing (Abrams et al., paragraphs 13 - 25).

	As per claim 25, Abrams et al. in view of Griggs et al., further disclose performing said secondary ASR processing comprises causing said client device to: produce multiple associated meanings for said acoustic span (Griggs et al., paragraphs 27 – 30; Abrams et al., paragraphs 3 and 13 – 25).

	As per claim 26, Abrams et al. in view of Griggs et al., further disclose said secondary ASR processing comprises one or more distinct instances of secondary ASR processing, wherein each said instance of secondary ASR processing comprises secondary ASR processing of a portion of said digital representation of said spoken utterance (Griggs et al., paragraphs 27 – 30; Abrams et al., paragraphs 3 and 13 – 25).

	As per claim 27, Abrams et al. in view of Griggs et al., further disclose creating a complete transcription for an entirety of said spoken utterance based upon a nominal primary transcription of said digital representation modified by said transcription for said acoustic span, wherein said nominal primary transcription is obtained from said server computer system (“The enhanced word dictionary 34 is used by the speech-to-text recognizer 30 to recognize any of the use's spoken words that were previously unrecognizable, thereby increasing accuracy of the by the speech-to-text recognizer 30.”; Abrams et al., paragraphs 13 — 25).

	As per claim 28, Abrams et al. in view of Griggs et al., further disclose outputting, on said client device, a complete transcription and associated meaning for an entirety of said spoken utterance (Griggs et al., paragraphs 20 – 26; Abrams et al., paragraphs 13 – 25).

	As per claim 30, Abrams et al. in view of Griggs et al., further disclose detection of said at least one putative spoken proper name entity in said digital representation of said spoken utterance is based on information external to said digital representation of said spoken utterance (“Each possible word included in word lattice 108 is associated with a start time t1, an end time t2, and a confidence score representative of the probability that the word is a correct match to the word spoken between time t1 and time t2 in audio stream 101. Word recognition engine 104 may identify multiple possible words for a given time period or for overlapping time periods, each possible word having a different confidence score... the new word may be a proper name, a word or phrase in a foreign language, a word or phrase related to a current event or recently identified technical problem, or the name of a product or promotion... wordspotting engine 124 searches audio data 102 to determine time intervals for putative occurrences 125 of the new word in audio stream 101”; Griggs et al., paragraphs 21 – 24).
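The word lattice quoted from Griggs et al. (each candidate word carrying a start time, an end time, and a confidence score, with putative occurrences of a new word located by time interval) can be illustrated with the Python sketch below. It is a simplified assumption, not the reference's implementation: the names LatticeEntry, overlaps, and apply_putative_hits are hypothetical, and the rule that a putative hit replaces overlapping entries only when its confidence is higher than all of them is chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatticeEntry:
    word: str
    t1: float          # start time in the audio stream, seconds
    t2: float          # end time in the audio stream, seconds
    confidence: float  # score in [0, 1]


def overlaps(a, b):
    """True when the two time intervals share more than an endpoint."""
    return a.t1 < b.t2 and b.t1 < a.t2


def apply_putative_hits(lattice, hits):
    """Keep a putative hit only if it outscores every overlapping entry."""
    result = list(lattice)
    for hit in hits:
        rivals = [e for e in result if overlaps(e, hit)]
        if all(hit.confidence > r.confidence for r in rivals):
            result = [e for e in result if not overlaps(e, hit)] + [hit]
    return sorted(result, key=lambda e: e.t1)


lattice = [
    LatticeEntry("call", 0.0, 0.4, 0.90),
    LatticeEntry("acne", 0.4, 0.8, 0.30),
    LatticeEntry("now", 0.8, 1.1, 0.85),
]
# Putative occurrences of a hypothetical new product name, "Acme".
hits = [
    LatticeEntry("Acme", 0.4, 0.8, 0.75),
    LatticeEntry("Acme", 0.8, 1.1, 0.20),
]
updated = apply_putative_hits(lattice, hits)
print([e.word for e in updated])
```

Only the first putative occurrence is accepted here, because the second scores below the existing entry for the same interval.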

	As per claim 31, Abrams et al. in view of Griggs et al., further disclose detection of said at least one putative spoken proper name entity in said digital representation of said spoken utterance is based on a prior action on said client device (Griggs et al., paragraphs 21 – 24; Abrams et al., paragraphs 13 – 25).

	As per claim 32, Abrams et al. in view of Griggs et al., further disclose prior to sending said data to a server computer system: capture, at said client device, said spoken utterance by a user of said client device, wherein detection of said at least one putative spoken proper name entity in said digital representation of said spoken utterance is based on a prior spoken utterance by said user of said client device (Griggs et al., paragraphs 21 — 24; Abrams et al., paragraphs 3, 13 - 25).

	As per claim 33, Abrams et al. in view of Griggs et al., further disclose said at least one putative spoken proper name entity pertains to a content identifier for controlling a video system (“The set of audio signals may include one or more of the following: a live audio stream, a legal deposition, a telephone call, and broadcast media. At least a first audio signal of the set of audio signals may be associated with video.”; paragraphs 11, 18).

	As per claim 34, Abrams et al. in view of Griggs et al., further disclose said spoken utterance is received by a software application running on said client device, and wherein said software application is configured for searching for content based on said at least one putative spoken proper name entity (“for each putative occurrence of the new word identified in audio stream 101, wordspotting engine 124 determines whether the original word lattice 108 or the putative occurrence better matches audio stream 1… Speech processing system 100 may be implemented in software, in firmware,”; Griggs et al., paragraphs 25, 35).

6.	Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Abrams et al. (US PAP 2014/0019126) in view of Griggs et al. (US PAP 2011/0125499), and further in view of Gruber et al. (US PAP 2012/0265528).
	As per claim 29, Abrams et al. in view of Griggs et al., further disclose producing multiple complete transcriptions and associated meanings for said spoken utterance (Griggs et al. paragraphs 20 — 26; Abrams et al., paragraphs 3, 13 — 25).
	However, Abrams et al. in view of Griggs et al. do not specifically teach ranking a set of said multiple complete transcriptions and associated meanings to create an ordered list of complete transcriptions and associated meanings; and output said ordered list of complete transcriptions and associated meanings.
	Gruber et al. disclose speech recognition: receiving voice input and generating candidate interpretations in text, for example, "call her", "collar", and "call Herb". Context can be used to constrain which words and phrases are considered by a speech recognition module, how they are ranked, and which are accepted as above a threshold for consideration. For example, the user's address book can add personal names to an otherwise language-general model of speech, so that these names can be recognized and given priority... Context 1000 can be used, for example, for disambiguation in speech recognition to guide the generation, ranking, and filtering of candidate hypotheses that match phonemes to words. Different speech recognition systems use various mixes of generation, rank, and filter, but context 1000 can apply in general to reduce the hypothesis space at any stage (paragraphs 20, 315 – 317).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to rank complete transcriptions and associated meanings as taught by Gruber et al. in Abrams et al. in view of Griggs et al., because that would help improve the speech recognition system (Gruber et al., paragraph 315).
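The context-guided ranking and filtering of candidate interpretations described in the Gruber et al. passage can be sketched in Python as follows. The sketch is illustrative only and rests on assumptions: the function rank_candidates, the additive boost, the threshold value, and the base scores are hypothetical; only the three example phrases come from the passage cited above.

```python
def rank_candidates(candidates, context_names, boost=0.2, threshold=0.0):
    """Order candidate transcriptions best-first, favoring context names.

    candidates: list of (text, base_score) pairs.
    context_names: names from user context, e.g. an address book.
    """
    names = {name.lower() for name in context_names}
    scored = []
    for text, base in candidates:
        # Give priority to a candidate that contains a known personal name.
        has_name = any(word in names for word in text.lower().split())
        score = base + (boost if has_name else 0.0)
        # Filter out hypotheses that fall below the acceptance threshold.
        if score >= threshold:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


candidates = [("call her", 0.50), ("collar", 0.45), ("call Herb", 0.40)]
ordered = rank_candidates(candidates, context_names={"Herb"})
print(ordered)
```

With "Herb" present in the assumed address book, "call Herb" moves ahead of the acoustically stronger "call her"; raising the threshold would additionally drop low-scoring hypotheses from the ordered list.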

Allowable Subject Matter
7.	Claims 9 – 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if a terminal disclaimer over U.S. Patent No. 11,024,308 is filed.
The following is a statement of reasons for the indication of allowable subject matter:  
As to claim 9, the prior art made of record does not teach or suggest prior to performing said secondary ASR: store a first copy of said digital representation of said spoken utterance in memory at said client device; generate a second copy of said digital representation of said spoken utterance; and send said second copy of said digital representation of said spoken utterance to said server computer while said first copy of said digital representation of said spoken utterance remains stored in memory, wherein a nominal primary transcription is generated based on said second copy of said digital representation of said spoken utterance. 

As to claim 10, the prior art made of record does not teach or suggest prior to performing said secondary ASR: generate a first copy of a first segment of said digital representation of said spoken utterance; store said first copy of said first segment of said digital representation of said spoken utterance in memory at said client device; send said first copy of said first segment of said digital representation of said spoken utterance to said server computer for processing; wherein said generating, storing, and sending steps are repeated for each subsequent segment of said digital representation of said spoken utterance.

	As to claim 11, the prior art made of record does not teach or suggest a first copy of said digital representation of said spoken utterance is stored in memory of said client device and a second copy of said digital representation of said spoken utterance is sent to said server computer system, and wherein to perform said secondary ASR processing comprises causing said client device to: produce said transcription and associated meaning for said acoustic span based on said acoustic span obtained from said first copy of said digital representation of said spoken utterance stored in memory of said client device.
Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Gruenstein teaches disambiguation of spoken proper names. Verhasselt et al. teach N-best list rescoring in speech recognition.
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571)272-4247. The examiner can normally be reached Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEONARD SAINT-CYR/Primary Examiner, Art Unit 2658