DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2.	Applicant's arguments filed 07/28/2022 have been fully considered but they are not persuasive. 
	Applicant argues that Fife et al. fail to disclose determining, by the computing device, a selected portion of the audio for analysis based at least in part on a predetermined time period preceding receipt of the request from the user (Amendment, pages 7 – 9).
	The examiner disagrees, since Fife et al. disclose “a guidance application client residing on the user's equipment may initiate sessions with source 418 to obtain guidance data when needed, e.g., when the guidance data is out of date or when the user equipment device receives a request from the user to receive data. Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.)” [paragraph 63, lines 3 – 7].  “Audio to text module 814 processes the received audio signal to convert it to text using any known speech recognition process.” (providing media guidance to the user based on a request from the user specifying a period of time for receiving the response implies determining, by the computing device, a selected portion of the audio for analysis based at least in part on a predetermined time period preceding receipt of the request from the user; paragraph 101).

Claim Rejections - 35 USC § 102
3.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

4.	Claims 1 – 6, 8, 10, 12 – 15, and 18 – 21 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Fife et al. (US PAP 2014/0088952).
As per claims 1, 15, Fife et al. teach a method/device for identifying semantic entities within an audio signal, comprising:
obtaining, by a computing device comprising one or more processors and one or more memory devices, an audio signal concurrently heard by a user (“control circuitry processes verbal data received during an interaction between a user of a user device and a person with whom the user is interacting”; paragraph 4);
receiving, by the computing device, a request from a user to display one or more semantic entities (“Media guidance data" or "guidance data" may also include athletic teams or athletes, stadium names, names of hosts, names of commentators, place names, store names, restaurants, character names, occupations, artists, band names, album titles, song titles, and other words or phrases that could be used to identify content”; paragraphs 32, 63);
determining, by the computing device, a selected portion of the audio for analysis based at least in part on a predetermined time period preceding receipt of the request from the user to display the one or more semantic entities (“when the user equipment device receives a request from the user to receive data. Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.)”; paragraphs 63, 101);
analyzing, by a machine-learned model stored on the computing device, the selected portion of the audio signal in a background of the computing device to determine the one or more semantic entities ("Media guidance data" or "guidance data" may also include athletic teams or athletes, stadium names, names of hosts, names of commentators, place names, store names, restaurants, character names, occupations, artists, band names, album titles, song titles, and other words or phrases that could be used to identify content…an audio analytics model analyzes the audio signal directly to identify discussed media assets without converting the audio signal to text.”; paragraphs 32, 102); and
in response to receiving the request, displaying the one or more semantic entities on a display screen of the computing device (“display screen showing a second automated recommendation generated by media asset recommendation system 508 or 608”; figs. 13, 14; paragraph 114).
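
For illustration only, the limitation argued by Applicant (selecting a portion of the audio based at least in part on a predetermined time period preceding receipt of the request) can be pictured with the minimal sketch below. The sketch is not taken from the application or from Fife et al.; the names RollingAudioBuffer, window_seconds, append, and select_portion are hypothetical, and the use of timestamped byte chunks is an assumption made only to keep the example small.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingAudioBuffer:
    """Hypothetical buffer of recent (timestamp, chunk) pairs.

    window_seconds stands in for the "predetermined time period";
    it is an assumed parameter, not a value from the record.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._chunks: Deque[Tuple[float, bytes]] = deque()

    def append(self, timestamp: float, chunk: bytes) -> None:
        """Store a new chunk and evict chunks older than the window."""
        self._chunks.append((timestamp, chunk))
        cutoff = timestamp - self.window_seconds
        while self._chunks and self._chunks[0][0] < cutoff:
            self._chunks.popleft()

    def select_portion(self, request_time: float) -> List[bytes]:
        """Return the chunks captured in the window preceding the request."""
        start = request_time - self.window_seconds
        return [c for t, c in self._chunks if start <= t <= request_time]


# Audio is buffered continuously; the request arrives afterward.
buffer = RollingAudioBuffer(window_seconds=10.0)
for t in (0.0, 5.0, 10.0, 15.0, 20.0):
    buffer.append(t, str(int(t)).encode())

selected = buffer.select_portion(request_time=20.0)
```

In this sketch the selected portion is fixed by looking backward from the time of the request, so only the chunks stamped 10, 15, and 20 seconds are passed on for analysis.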

As per claim 2, Fife et al. further disclose receiving, by the computing device, a user selection of a selected semantic entity; wherein the selected semantic entity comprises one of the one or more semantic entities displayed on display screen of the computing device selected by a user (“Interactive media guidance applications may generate graphical user interface screens that enable a user to navigate among, locate and select content. As referred to herein, the terms "media asset" and "content" should be understood to mean an electronically consumable user asset”; paragraph 30).

As per claim 3, Fife et al. further disclose determining, by the computing device, one or more supplemental information options associated with the selected semantic entity; and displaying the one or more supplemental information options associated with the selected semantic entity on the display screen of the computing device (“The content presented on the second screen device may be any suitable content that supplements the content presented on the first device.”; paragraph 56).

As per claim 4, Fife et al. further disclose the one or more supplemental information options are determined based at least in part on a context of the audio signal obtained by the computing device or a context of the selected semantic entity (“In response to the number of interactions in which the media asset was identified exceeding the predetermined threshold, the identified media asset is added to the list of media assets associated with the user”; paragraphs 8, 56).

As per claim 5, Fife et al. further disclose the one or more supplemental information options comprises one or more of: a database entry associated with the selected semantic entity, search results associated with the selected semantic entity, or one or more application interaction options associated with the selected semantic entity (“In response to the number of interactions in which the media asset was identified exceeding the predetermined threshold, the identified media asset is added to the list of media assets associated with the user…Text analytics module 704 uses the data in media asset database 706 to determine a media asset”;  paragraphs 8, 83).

As per claim 6, Fife et al. further disclose the audio signal concurrently heard by the user comprises at least one of: an audio signal associated with an application being executed by the computing device, an audio signal generated by the computing device based on ambient audio, or an audio signal associated with a communication signal communicated to or from the computing device (“automatically generating a media asset recommendation based on a user's interaction. First, media asset recommendation system 508 or 608 processes verbal data received during an interaction, e.g., the audio signal recorded during a user's interaction”; paragraph 123).

As per claim 8, Fife et al. further disclose the request is received in response to the user performing a user interaction with an associated peripheral device (“control circuitry processes verbal data received during an interaction between a user of a user device and a person with whom the user is interacting”; paragraph 4).

As per claim 10, Fife et al. further disclose the receiving the request comprises receiving a user interaction with the computing device (“The first user device 602 includes a microphone 604 for detecting audio, and in particular, for detecting an interaction between the user 610 and another person 612 such as the user's friend or a family member.”; paragraph 79).

As per claim 12, Fife et al. further disclose displaying the one or more semantic entities on a display screen of the computing device comprises displaying the one or more semantic entities in a user interface of an application being executed by the computing device (“media asset recommendation module 708 may identify one or more candidate media assets…display screen showing a second automated recommendation generated by media asset recommendation system 508 or 608”; figs. 13, 14; paragraphs 95, 114).

As per claim 13, Fife et al. further disclose the audio signal concurrently heard by the user comprises an audio signal associated with the application being executed by the computing device (“The control circuitry analyzes the verbal data to automatically identify a media asset referred to during the interaction by at least one of the user and the person with whom the user is interacting”; Abstract).

As per claim 14, Fife et al. further disclose the machine-learned model comprises at least one of: a speech recognition semantic entity identifier model, a song recognition semantic entity identifier model, or a language translation semantic entity identifier model (“an audio analytics model analyzes the audio signal directly to identify discussed media assets without converting the audio signal to text”; paragraph 102).

As per claim 18, Fife et al. teach a system for identifying semantic entities within an audio signal, comprising: 
a computing device comprising one or more processors and a display screen (paragraph 113); and
 a speaker device configured to play one or more audio signals for a user, the speaker device further configured to receive one or more user interactions from the user (“Speakers 314 may be provided as integrated with other elements of user equipment device 300 or may be stand-alone units. The audio component of videos and other content displayed on display 312 may be played through speakers 314. In some embodiments, the audio may be distributed to a receiver (not shown), which processes and outputs the audio via speakers 314.”; paragraph 50); 
wherein the speaker device is operable to: receive a user interaction indicative of a user request; and responsive to receiving the user interaction, communicate the user request to the computing device (“control circuitry processes verbal data received during an interaction between a user of a user device and a person with whom the user is interacting”; paragraph 4); and 
wherein, the computing device is operable to: receive the user request; determine a first portion of a first audio signal for analysis based at least in part on a predetermined time period preceding receipt of the user request from the user interaction (“when the user equipment device receives a request from the user to receive data. Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.)”; paragraphs 63, 101); 
responsive to receiving the user request, analyze a first portion of a first audio signal for analysis based at least in part on predetermined time period preceding receipt  of the user request [“Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.)”]; analyze the first portion of the first audio signal with a machine-learned model (“an audio analytics model analyzes the audio signal directly to identify discussed media assets”) to identify one or more semantic entities ("Media guidance data" or "guidance data" may also include athletic teams or athletes, stadium names, names of hosts, names of commentators, place names, store names, restaurants, character names, occupations, artists, band names, album titles, song titles, and other words or phrases that could be used to identify content…an audio analytics model analyzes the audio signal directly to identify discussed media assets without converting the audio signal to text.”; paragraphs 32, 63, 102);  and 
display the one or more semantic entities on the display screen (“display screen showing a second automated recommendation generated by media asset recommendation system 508 or 608”; figs. 13, 14; paragraph 114).

As per claim 19, Fife et al. further disclose the computing device is further operable to: communicate the first audio signal to the speaker device; wherein the speaker device is configured to receive the first audio signal, and responsive to receiving the first audio signal, play the first audio signal; wherein the first portion of the first audio signal comprises a portion of the first audio signal played by the speaker device at a time period preceding receipt of the user interaction (“when the user equipment device receives a request from the user to receive data. Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.). Media guidance data source 418 may provide user equipment devices”; paragraph 63).

As per claim 20, Fife et al. further disclose the computing device further comprises a microphone configured to generate audio signals based on ambient audio; wherein the first audio signal is generated by the microphone of the computing device; and wherein the first portion of the first audio signal is received by the computing device at a time period preceding receipt of the user interaction by the speaker device (“when the user equipment device receives a request from the user to receive data. Media guidance may be provided to the user equipment with any suitable frequency (e.g., continuously, daily, a user-specified period of time, a system-specified period of time, in response to a request from user equipment, etc.)… The first user device 602 includes a microphone 604 for detecting audio”; paragraphs 63, 79).

As per claim 21, Fife et al. further disclose that the audio signal comprises at least one of: an audio signal associated with an application being executed by the device, an audio signal generated by the device based on ambient audio, or an audio signal associated with a communication signal communicated to or from the device (“automatically generating a media asset recommendation based on a user's interaction. First, media asset recommendation system 508 or 608 processes verbal data received during an interaction, e.g., the audio signal recorded during a user's interaction”; paragraph 123).

Claim Rejections - 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Fife et al. (US PAP 2014/0088952) in view of Zilberman et al. (US PAP 2020/0275207).
As per claim 9, Fife et al. do not specifically teach the associated peripheral device comprises an earbud device communicatively coupled to the computing device; and wherein the user interaction comprises a fetch gesture performed on the earbud device.
Zilberman et al. disclose that said output sound generator utilizes the map data to determine said at least one selected transducer unit in accordance with said data about spatial location of the at least one user's ear received from the corresponding user detection module such that the respective coverage zone of said selected transducer unit includes said location of said at least one user's ear… said gesture detection is adapted to apply gesture recognition processing to said at least a portion of the sensory data to identify whether one or more predetermined gestures are performed by the user (paragraphs 49, 73).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to detect a fetch gesture as taught by Zilberman et al. in Fife et al., because that would help generate and transmit corresponding commands for operating said communication system for performing one or more corresponding actions (paragraph 73).

Conclusion
7.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571)272-4247. The examiner can normally be reached Monday – Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LEONARD SAINT-CYR/
Primary Examiner, Art Unit 2658