Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 27 and 37 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 27 and 37 each require an entity data structure defining relationships among a plurality of keywords, used to determine a second keyword associated with a first keyword to identify a digital component. There is no description of such an entity data structure in the specification as filed. Paragraphs [0054-0056] of the specification describe an entity data structure such as a knowledge graph. However, the knowledge graph “may include nodes that represent known entities (and in some cases, entity attributes), as well as edges that connect the nodes and represent relationships between entities represented by nodes.” An “entity” or a “node” is not equivalent to a keyword. For these reasons, the claimed entity data structure defining relationships among a plurality of keywords was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor had possession of the claimed invention.
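
For illustration only, the distinction drawn above can be sketched as two different data structures. This is a minimal sketch under stated assumptions and is not taken from the specification, the claims, or any cited reference: the class names, identifiers, and sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Hypothetical knowledge graph: nodes are entities, edges relate entities."""
    nodes: dict = field(default_factory=dict)  # entity_id -> attribute dict
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def neighbors(self, entity_id):
        # Follow edges outward from one entity node to other entity nodes.
        return [target for source, _, target in self.edges if source == entity_id]


@dataclass
class KeywordMap:
    """Hypothetical structure relating keywords directly to other keywords."""
    relations: dict = field(default_factory=dict)  # keyword -> list of keywords

    def second_keyword(self, first_keyword):
        # Return a keyword associated with the first keyword, if any.
        related = self.relations.get(first_keyword, [])
        return related[0] if related else None


# Sample data is invented purely to exercise the two structures.
graph = EntityGraph(
    nodes={"E1": {"name": "Eiffel Tower"}, "E2": {"name": "Paris"}},
    edges=[("E1", "located_in", "E2")],
)
keywords = KeywordMap(relations={"tower": ["landmark"]})
```

In the first structure a lookup starts from an entity identifier and returns entity identifiers; in the second it starts from a keyword and returns a keyword.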


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

In regard to claim 21, Stahl discloses a system to resolve requests in audio-based networked environments (Fig. 10, 100), comprising: 
a data processing system having one or more processors coupled with memory (processors 101 and 102, RAM 104, and non-volatile memory 105) to: 
parse an input audio acquired via a sensor at a client device to determine a characteristic, the characteristic including at least one of a vocal characteristic, a grammar level, or a vocabulary level (see Fig. 4, an utterance spoken by a user 40 is processed by a classifier 45, where the classifier classifies the speaker according to vocal characteristics such as accent, and/or n-grams and lexicons associated with particular classes of speakers, paragraphs [0040-0042] and [0047]); 
select, from a plurality of interaction models, an interaction model based on the characteristic (particular domain sets of grammar rules are selected to determine the intent of the utterance, based on the classification, paragraphs [0053-0055]); 

identify a digital component based on one or more keywords identified from the input audio in accordance with the interaction model (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]); and 
provide, in response to the input audio, the digital component to present via the client device (e.g. the user is provided the selected picture, paragraph [0058]).
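
As a reading aid, the flow that this rejection maps onto Stahl (classify the speaker from the utterance, choose a grammar rule set for that class, then resolve a keyword to a digital component) can be sketched as follows. This is a minimal sketch under stated assumptions, not Stahl's implementation or the applicant's: the function names, the pitch threshold, the speaker classes, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InteractionModel:
    """Hypothetical stand-in for a set of grammar rules tied to a speaker class."""
    name: str
    keyword_to_component: dict  # keyword -> digital component identifier


# Invented models: the same keyword resolves differently per speaker class.
MODELS = {
    "child": InteractionModel("child", {"engineer": "train_engineer.png"}),
    "adult": InteractionModel("adult", {"engineer": "software_engineer.png"}),
}


def classify_speaker(features: dict) -> str:
    # Assumed toy classifier: a single pitch threshold stands in for
    # whatever vocal characteristics a real classifier would use.
    return "child" if features.get("pitch_hz", 0.0) > 250.0 else "adult"


def select_model(characteristic: str) -> InteractionModel:
    # Select one interaction model from the plurality based on the characteristic.
    return MODELS[characteristic]


def identify_component(model: InteractionModel, transcript: str) -> Optional[str]:
    # Identify a digital component from the first keyword the model recognizes.
    for word in transcript.lower().split():
        if word in model.keyword_to_component:
            return model.keyword_to_component[word]
    return None


def resolve(features: dict, transcript: str) -> Optional[str]:
    # Parse -> select model -> identify component -> value to be provided.
    model = select_model(classify_speaker(features))
    return identify_component(model, transcript)
```

Under these assumptions, the same utterance yields a different component depending only on the speaker class inferred from the audio features.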

In regard to claim 22, Stahl discloses the data processing system to: 
identify a trigger to disambiguate the parsing of the input audio (see Fig. 8, transcription hypotheses are parsed to produce interpretation hypotheses, paragraph [0057]); and 
select, responsive to the identification of the trigger to disambiguate, the interaction model from the plurality of interaction models to process the input audio (in response to receiving the interpretation hypotheses, module 84 utilizes characteristics of the utterance to select and apply weights to produce reweighted interpretation hypotheses, paragraph [0057]).

In regard to claim 23, Stahl discloses the data processing system to: 
identify an account profile associated with the client device via which the input audio is acquired (e.g., the owner of a mobile phone is identified, paragraph [0033]); and 
select, from the plurality of interaction models, the interaction model based on the account profile (the grammar selected for interpretation is based on the owner, paragraph [0033]).

In regard to claim 24, Stahl discloses the data processing system to: 

select, from the plurality of candidate intents, an intent based on the one or more keywords of the input audio identified in accordance with the interaction model (an interpretation is selected based on weightings applied to homonyms according to the utterance characteristics, paragraph [0058]); and 
identify the digital component based on the intent identified from the plurality of candidate intents (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]).

In regard to claim 25, Stahl discloses the data processing system to: 
identify an intent based on at least one of the one or more keywords identified from the input audio in accordance with the interaction model, the intent defined for a third-party application interfacing with a digital assistant application executing on the client device (various companies define proprietary domain sets of grammar rules that interpret an utterance to invoke proprietary APIs on the system, paragraph [0055]); and 
identify the digital component based on the intent defined for the third-party application (the interpretation is selected based on the identified authorized domains, paragraph [0054]).

In regard to claim 26, Stahl discloses the data processing system to: 
identify an intent based on a mapping defined by the interaction model between at least one of the one or more keywords to one of a plurality of intents (the interpretation is determined based on grammar rules, paragraph [0053]; the grammar rules mapping keywords to particular intents, paragraphs [0035-0036]); and 
identify the digital component based on the intent defined by the mapping (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]).


In regard to claim 28, Stahl discloses the data processing system to: 
select the interaction model from the plurality of interaction models, each of the plurality of interaction models having a tolerance level for at least one of a pronunciation, a grammar, and a vocabulary in processing the input audio (e.g., selected grammar rules have varying tolerances for mature or offensive subject matter, paragraph [0056]); and 
modify a first keyword parsed from the input audio to a second keyword in accordance with the tolerance level of the interaction model to identify the digital component (e.g., a request for mature or offensive content from a child will be matched to a generic grammar rule in accordance with the tolerance for mature or offensive subject matter, paragraph [0056]).

In regard to claim 31, Stahl discloses a method of resolving requests in audio-based networked environments (Fig. 10, 100), comprising: 
parsing, by a data processing system, an input audio acquired via a sensor at a client device to determine a characteristic, the characteristic including at least one of a vocal characteristic, a grammar level, or a vocabulary level (see Fig. 4, an utterance spoken by a user 40 is processed by a classifier 45, where the classifier classifies the speaker according to vocal characteristics such as accent, and/or n-grams and lexicons associated with particular classes of speakers, paragraphs [0040-0042] and [0047]); 
selecting, by the data processing system, from a plurality of interaction models, an interaction model based on the characteristic (particular domain sets of grammar rules are selected to determine the intent of the utterance, based on the classification, paragraphs [0053-0055]); 
identifying, by the data processing system, a digital component based on one or more keywords identified from the input audio in accordance with the interaction model (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]); and 

providing, by the data processing system, in response to the input audio, the digital component to present via the client device (e.g. the user is provided the selected picture, paragraph [0058]).

In regard to claim 32, Stahl discloses identifying, by the data processing system, a trigger to disambiguate the parsing of the input audio (see Fig. 8, transcription hypotheses are parsed to produce interpretation hypotheses, paragraph [0057]); and 
selecting, by the data processing system, responsive to the identification of the trigger to disambiguate, the interaction model from the plurality of interaction models to process the input audio (in response to receiving the interpretation hypotheses, module 84 utilizes characteristics of the utterance to select and apply weights to produce reweighted interpretation hypotheses, paragraph [0057]).

In regard to claim 33, Stahl discloses identifying, by the data processing system, an account profile associated with the client device via which the input audio is acquired (e.g., the owner of a mobile phone is identified, paragraph [0033]); and 
selecting, by the data processing system, from the plurality of interaction models, the interaction model based on the account profile (the grammar selected for interpretation is based on the owner, paragraph [0033]).


selecting, by the data processing system, from the plurality of candidate intents, an intent based on the one or more keywords of the input audio identified in accordance with the interaction model (an interpretation is selected based on weightings applied to homonyms according to the utterance characteristics, paragraph [0058]); and 
identifying, by the data processing system, the digital component based on the intent identified from the plurality of candidate intents (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]).

In regard to claim 35, Stahl discloses identifying, by the data processing system, an intent based on at least one of the one or more keywords identified from the input audio in accordance with the interaction model, the intent defined for a third-party application interfacing with a digital assistant application executing on the client device (various companies define proprietary domain sets of grammar rules that interpret an utterance to invoke proprietary APIs on the system, paragraph [0055]); and 
identifying, by the data processing system, the digital component based on the intent defined for the third-party application (the interpretation is selected based on the identified authorized domains, paragraph [0054]).

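In regard to claim 36, Stahl discloses identifying, by the data processing system, an intent based on a mapping defined by the interaction model between at least one of the one or more keywords to one of a plurality of intents (the interpretation is determined based on grammar rules, paragraph [0053]; the grammar rules mapping keywords to particular intents, paragraphs [0035-0036]); and 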
identifying, by the data processing system, the digital component based on the intent defined by the mapping (e.g., based on the keyword “engineer”, a picture is identified based on the grammar rules selected for the classification, paragraph [0058]).

In regard to claim 38, Stahl discloses selecting, by the data processing system, the interaction model from the plurality of interaction models, each of the plurality of interaction models having a tolerance level for at least one of a pronunciation, a grammar, and a vocabulary in processing the input audio (e.g., selected grammar rules have varying tolerances for mature or offensive subject matter, paragraph [0056]); and 
modifying, by the data processing system, a first keyword parsed from the input audio to a second keyword in accordance with the tolerance level of the interaction model to identify the digital component (e.g., a request for mature or offensive content from a child will be matched to a generic grammar rule in accordance with the tolerance for mature or offensive subject matter, paragraph [0056]).



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 29 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Stahl, in view of Orbach (U.S. Patent Application Pub. No. 2004/0215453).
In regard to claim 29, Stahl does not disclose selecting, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio.
Orbach discloses a system to:
select, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio (a speech sample is analyzed to determine speech characteristics, such as voice tone and choice of words, paragraph [0017]); and 
provide the digital component to present via the client device using the voice synthesis model (a voice response set is determined based on the characteristics, paragraph [0019]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to select, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio and provide the digital component to present via the client device using the voice synthesis model, because providing synthesized voice responses based on a characteristic determined from the input audio allows the responses to be tailored to the user automatically, as suggested by Orbach (paragraphs [0004] and [0006]).

In regard to claim 39, Stahl does not disclose selecting, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio.
Orbach discloses a method comprising selecting, by the data processing system, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio (a speech sample is analyzed to determine speech characteristics, such as voice tone and choice of words, paragraph [0017]); and 
providing, by the data processing system, the digital component to present via the client device using the voice synthesis model (a voice response set is determined based on the characteristics, paragraph [0019]).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to select, from a plurality of voice synthesis models, a voice synthesis model based on the characteristic determined from the input audio and provide the digital component to present via the client device using the voice synthesis model, because providing synthesized voice responses based on a characteristic determined from the input audio allows the responses to be tailored to the user automatically, as suggested by Orbach (paragraphs [0004] and [0006]).



Claims 30 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Stahl, in view of Sachidanandam et al. (U.S. Patent Application Pub. No. 2015/0134334, hereinafter “Sachidanandam”).
In regard to claim 30, Stahl does not disclose the data processing system to identify a second digital component associated with the digital component based on the one or more keywords identified from the input audio in accordance with the interaction model.
Sachidanandam discloses a system to identify a second digital component associated with the digital component based on the one or more keywords identified from the input audio in accordance with the interaction model (a speech recognizer determines a plurality of media items matching a user’s spoken request, paragraph [0035]); and 
provide, in response to the input audio, the digital component and the second digital component to present via the client device (a plurality of media items are returned to the user, paragraph [0043]).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to identify a second digital component associated with the digital component, because it would allow the user to request and receive multiple digital components related to a keyword, such as a plurality of songs on a particular playlist or album, as suggested by Sachidanandam (paragraph [0041]).

In regard to claim 40, Stahl does not disclose the data processing system to identify a second digital component associated with the digital component based on the one or more keywords identified from the input audio in accordance with the interaction model.
Sachidanandam discloses a method comprising identifying, by the data processing system, a second digital component associated with the digital component based on the one or more keywords identified from the input audio in accordance with the interaction model (a speech recognizer determines a plurality of media items matching a user’s spoken request, paragraph [0035]); and 
providing, by the data processing system, in response to the input audio, the digital component and the second digital component to present via the client device (a plurality of media items are returned to the user, paragraph [0043]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to identify a second digital component associated with the digital component, because it would allow the user to request and receive multiple digital components related to a keyword, such as a plurality of songs on a particular playlist or album, as suggested by Sachidanandam (paragraph [0041]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached Mon-Thurs 9AM-3PM (Part time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





BLA 3/3/22
/BRIAN L ALBERTALLI/
Primary Examiner, Art Unit 2656