DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/27/2022 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claims 1, 11, and 17 have been considered but are moot because the new ground of rejection does not rely on the same combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
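The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
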
Claims 1, 2, 6-11, 14, 15, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Howcroft et al. ("Howcroft" US 20160100208), and further in view of Law et al. (“Law” US 20190197369), and Bakish et al. (“Bakish” US 20140149117).

Regarding claim 1, Howcroft teaches a computer-implemented method, comprising: 
receiving a request to modify audio events in video corresponding to an audio event type; [Howcroft – Para 0052, Fig. 10: teaches a user can utilize a computing device to select media content of interest to be monitored and replacement content to replace the media content of interest.  The user can select an option in a GUI to monitor every occurrence of an offensive word for a selected media program] 
obtaining first video content to be provided for presentation on a display device, the video content including visual content and audio content; [Howcroft – Fig. 10, Para 0047: teaches at step 1004, transmitting the selections to an application on a server.  Para 0051, 0042: teaches media content being relayed to a media device for display, wherein media content can include audio content, video content, still image content, text content, text content associated with the audio content (such as closed captioning content), and other content] 

identifying an occurrence of a first defined audio event of the first audio event type [i.e. offensive language] in the audio content; [Howcroft – Para 0049, 0052, and 0054, Fig. 10: teaches analyzing audio content to detect the media content of interest in the streamed audio content.  If the media content of interest is detected, an indicator with a pointer to the replacement content can be inserted into the audio stream at the start of the media content of interest.  For example, the user can select an option in a GUI to monitor every occurrence of an offensive word for a selected media program] 
associating the first modified audio content with a first segment [i.e. the indicated point in the stream] of the visual content that corresponds to the first defined audio event; and [Howcroft – Fig. 10, Para 0054, 0016, 0042: teaches inserting an indicator with a pointer to replacement content at a start point of the media content of interest. For example, if the media content of interest was the word “computer,” the pointer can be inserted at the point in the stream where the word “computer” begins.  The media content can include audio content, video content, still image content, text content, text content associated with the audio content (such as closed captioning content), and other content] 
providing the first modified audio content in association with the first segment for presentation on the display device. [Howcroft – Para 0054, Fig. 10: teaches splicing the replacement content into the audio stream in place of the media content of interest]
Howcroft teaches audio event types, but does not explicitly teach a non-speech audio event type, or
identifying, using a machine learning model, non-speech audio event, wherein the machine learning model is trained on a collection of training video content including both (a) visual content corresponding to one or more audio events of the non-speech audio event type and (b) audio content corresponding to the one or more audio events of the non-speech audio event type;
Further, Howcroft teaches modifying audio content of the first defined audio event, but does not explicitly teach modifying audio content of the first defined audio event according to a first modification operation by at least (a) preserving perceptibility of first frequencies not corresponding to the first defined audio event and (b) phase shifting one or more features of second frequencies corresponding to the first defined audio event, to generate first modified audio content of the first defined audio event; 

However, Law teaches non-speech audio event type;
identifying, using a machine learning model, non-speech audio event, wherein the machine learning model is trained on a collection of training video content including both (a) visual content corresponding to one or more audio events of the non-speech audio event type and (b) audio content corresponding to the one or more audio events of the non-speech audio event type; [Law – Para 0105, 0106, Fig. 4: teaches retrieving one or more audio and/or video streams captured (step 412), identifying one or more machine learning training modules corresponding to 
Howcroft and Law are analogous in the art because they are from the same field of detecting events in audio/video content [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Howcroft’s identification of defined audio events to use machine learning as taught by Law, for the reason of improving audio event detection accuracy by learning and determining non-speech events such as gunshots. 
Howcroft and Law teach modifying audio content of the first defined audio event, but do not explicitly teach modifying audio content of the first defined audio event according to a first modification operation by at least (a) preserving perceptibility of first frequencies not corresponding to the first defined audio event and (b) phase shifting one or more features of second frequencies corresponding to the first defined audio event, to generate first modified audio content of the first defined audio event;

However, Bakish teaches modifying audio content of the first defined audio event according to a first modification operation by at least (a) preserving perceptibility [i.e. reducing noise and cutting non-speech areas] of first frequencies not corresponding to the first defined audio event and (b) phase shifting one or more features of second frequencies corresponding to the first defined audio event, to generate first modified audio content of the first defined audio event; [Bakish – Para 0059: teaches modifying audio content with an analysis module 300 outputting an improved audio pattern, which may be a cleaner version of the audio pattern--noise reduced and having the signal indication in the identified non-speech areas completely removed/cut from the original audio pattern, which results in preserving perceptibility of frequencies not corresponding to the defined audio event. Para 0040: teaches that phase shifts may allow outputting a first output pattern which includes the velocity or displacement of the reflected signals over time. This first pattern may then be used for extracting the blank spaces for identification of speech segments therefrom to generate a modified audio content] 
Howcroft, Law, and Bakish are analogous in the art because they are from the same field of speech recognition [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio modification of Howcroft and Law to phase shift frequencies as taught by Bakish, for the reason of improving perceptibility by modifying the frequency content to reduce unwanted segments in the content. 

Regarding claim 2, Howcroft, Law, and Bakish teach the computer-implemented method of claim 1, wherein identifying the occurrence of the defined first audio event includes identifying a time period in the video content that corresponds to the non-speech audio event type. [Howcroft – Para 0056: teaches the database 516 can store an identification information associated with the media 

Regarding claim 6, Howcroft, Law, and Bakish teach the computer-implemented method of claim 1, comprising: 
receiving, by a server, an indication by a viewer of the occurrence of the first defined audio event in the video content, wherein the occurrence is identified as a result of receiving the indication. [Howcroft – Para 0052: teaches the user can select an option in a GUI to monitor every occurrence of an offensive word for a selected media program. The user can also select a non-offensive replacement word such as "hello" to replace each occurrence of the offensive word in the media program]

Regarding claim 7, Howcroft, Law, and Bakish teach the computer-implemented method of claim 1, comprising: 
determining the first modification operation to be performed based on modification setting information provided by a viewer. [Howcroft – Para 0043, Fig. 6: teaches a user interface that features audio replacement application settings, wherein the user is enabled to identify what media content of interest the application needs to monitor, and to select a replacement word or words to use to replace]

Regarding claim 8, Howcroft, Law, and Bakish teach the computer-implemented method of claim 1, comprising: 
receiving a request to modify audio events corresponding to an other audio event type different than the non-speech audio event type; [Howcroft – Para 0043, 0052, Fig. 6: teaches a plurality of “Words to Replace”]
identifying an occurrence of a second defined audio event of the second audio event type in the audio content; [Howcroft – Para 0053: teaches modifying the audio-visual core portion with at least one revised content portion in accordance with the at least one selection signal to create a dynamically customized audio-visual content at 630 may include at least one of replacing a culturally inappropriate portion with a culturally appropriate portion]
modifying the second defined audio event to generate second modified audio content of the second defined audio event; [Bakish – Para 0008: teaches the non-speech part identified in the VAD process are then cut out of the audio signal, resulting in audio files that only represent the identified speech parts thereof.]
associating the second modified audio content with a second segment of the visual content; and [Howcroft – Fig. 10: teaches inserting an indicator with pointer to replacement content at a start point of the media content of interest]
providing the second modified audio content in association with the second segment for presentation on the display device. [Howcroft – Para 0054, Fig. 10: teaches splicing the replacement content into the audio stream in place of the media content of interest]

Regarding claim 9, Howcroft, Law, and Bakish teach the computer-implemented method of claim 8, wherein the second defined audio event is modified according to a second modification operation different than the first modification operation. [Bakish – Para 0008: teaches the non-speech part identified in the VAD process are then cut out of the audio signal, resulting in audio files that only represent the identified speech parts thereof.]

Regarding claim 10, Howcroft, Law, and Bakish teach the computer-implemented method of claim 1, wherein the non-speech audio event type is an event type selected from an infant crying event type, a doorbell event type, an animal noise event type, a knocking event type, and a gunshot event type. [Law – Para 0051, 0090: teaches shot detection sensor 180 that includes an acoustic sensor for identifying and timestamping strong impulsive noises, perhaps including an array of acoustic sensors for triangulating a direction and/or location of a detected shot or visual confirmation for detecting a gunshot from the barrel of a gun]

Regarding system claim 11, claim 11 recites limitations that are similar in scope to the limitations recited in method claim 1. 
Therefore, claim 11 is subject to rejection under the same rationale as applied hereinabove for claim 1.
[Examiner notes: Howcroft – Para 0062: teaches the instructions 1124 may also reside, completely or at least partially, within the main memory 1104, the static memory 1106, and/or within the processor 1102 during execution thereof by the computer system 1100]

Regarding claim 14, Howcroft, Law, and Bakish teach the system of claim 11, wherein execution of the instructions causes the system to: 
obtain, in response to identifying the occurrence, second audio content previously selected for replacement of occurrences of the non-speech audio event type, wherein the first modification operation includes replacement of the first defined audio event with the second audio content. [Howcroft – Para 0049, Fig. 5: teaches the actual replacement content, information about the media content of interest, and other information can be stored in database 516.  Fig. 6: teaches multiple options in which to choose what to replace the words with.]

Regarding claim 15, Howcroft, Law, and Bakish teach the system of claim 11, wherein execution of the instructions causes the system to: 
receive, over a network from a remotely located content receiver, a request to provide the video content; and [Howcroft – Fig. 1: suggests that STB 106 receives content via the network 132]
transmit, over the network to the remotely located content receiver, the first modified audio content in association with the first segment. [Howcroft – Para 0047: teaches that once the user or other individual has made his/her selections pertaining to the media content of interest and replacement content, the media content can enter a network associated with the devices of system 500, and can be received by an encoder]

Regarding non-transitory computer-readable medium claim 17, claim 17 recites limitations that are similar in scope to the limitations recited in method claim 1. 
Therefore, claim 17 is subject to rejection under the same rationale as applied hereinabove for claim 1.
[Examiner notes: Howcroft – Para 0062: teaches the disk drive unit 1116 may include a machine-readable medium 1122 on which is stored one or more sets of instructions (e.g., software 1124) embodying any one or more of the methodologies or functions described herein]

Regarding claim 18, Howcroft, Law, and Bakish teach the one or more non-transitory computer-readable media of claim 17, wherein the instructions cause the one or more processors to: 
compare the modification setting information with event information associated with the video content, the event information specifying a set of defined audio events and audio event types for each of the set of defined audio events, wherein the occurrence of the defined audio event is identified based on determining a match between the audio event type specified in the modification setting information and the event information. [Howcroft – Fig. 6: teaches replacement words in the event “Words to Replace” are identified.  Para 0054, Fig. 6, 10: teaches If the selected media content of interest is found in the audio stream, an indicator with a pointer to the replacement content can be inserted at the beginning of the media content of interest at step 1018. For example, if the media content of interest 

Regarding claim 20, Howcroft, Law, and Bakish teach the one or more non-transitory computer-readable media of claim 17, wherein the instructions cause the one or more processors to:
attenuate conspicuity of the occurrence of the defined audio event in a segment of the audio content and maintain conspicuity of other audio in the audio content. [Howcroft – Para 0043, 0049, Fig. 6: teaches replacing a specific word or words to replace, wherein an indicator with a pointer is inserted at the start of the content of interest so that only the content of interest is replaced]

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claim 1 above, and further in view of Farrell et al. ("Farrell" US 9531998).

Regarding claim 3, Howcroft, Law, and Bakish do not explicitly teach claim 3.  However, Farrell teaches the computer-implemented method of claim 1, wherein identifying the occurrence of the defined first audio event includes identifying image content in the visual content that corresponds to the non-speech audio event type. [Farrell – C 23, L 4-17: teaches facial/vocal recognition unit 318 determines 
Howcroft, Law, Bakish, and Farrell are analogous in the art because they are from the same field of analyzing video content [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the identification of defined audio events of Howcroft, Law, and Bakish to identify image content as taught by Farrell, for the reason of improving accuracy by identifying which specific frames are determined to include inappropriate content.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claim 1 above, and further in view of Vorenkamp et al. ("Vorenkamp" US 20010007151).

Regarding claim 4, Howcroft, Law, and Bakish do not explicitly teach claim 4.  However, Vorenkamp teaches the computer-implemented method of claim 1, comprising: 
generating an opposing audio event having a waveform with amplitudes aligned with and opposing amplitudes of the first defined audio event, wherein performing the first modification operation includes combining the opposing audio event with the first defined audio event. [Vorenkamp – Para 0299, 0366, Fig. 6: teaches creating two frequencies by each multiplication, when these signals are added together one frequency component, the difference, that is present in each signal has twice the amplitude of the individual 
Howcroft, Law, Bakish, and Vorenkamp are analogous in the art because they are from the same field of television receivers [Para 0003].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio event modification of Howcroft, Law, and Bakish to generate and combine signals as taught by Vorenkamp, for the reason of improving the quality of the audio by cancelling out certain signals.
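
For illustration only, the opposing-waveform limitation of claim 4 can be sketched in Python as follows. This sketch is not taken from Howcroft, Law, Bakish, or Vorenkamp; the function names, the 8000 Hz sample rate, and the two test tones are hypothetical stand-ins chosen to make the cancellation visible.

```python
import math

def make_tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Return a sine tone as a list of float samples."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def opposing_waveform(event):
    """Same timing as the event, with every amplitude negated."""
    return [-sample for sample in event]

def combine(a, b):
    """Sample-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]

SAMPLE_RATE = 8000  # Hz, assumed for illustration
event = make_tone(1000.0, SAMPLE_RATE, 64)            # stand-in for the defined audio event
other_audio = make_tone(200.0, SAMPLE_RATE, 64, 0.3)  # stand-in for the remaining audio
mixed = combine(event, other_audio)

# Combining the opposing audio event with the mix cancels the event
# and leaves the remaining audio (up to floating-point rounding).
modified = combine(mixed, opposing_waveform(event))
residual = max(abs(m - o) for m, o in zip(modified, other_audio))
```

In this idealized case the event is known exactly, so the residual is limited only by rounding; a practical system would need an estimate of the event waveform.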

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claims 1 and 11 above, and further in view of Renner et al. ("Renner" US 20200136580).

Regarding claim 5, Howcroft, Law, and Bakish do not explicitly teach claim 5.  However, Renner teaches the computer-implemented method of claim 1, comprising: 
detecting frequency characteristics of the first defined audio event; [Renner – Para 0045: teaches determining audio characteristics (e.g. frequency, amplitude, time values, etc.) in real time]
determining a first filter to be applied to the first defined audio event based on frequency characteristics detected; and [Renner – Para 0079: teaches determining EQ filter settings by transforming the input audio signal into a frequency and/or characteristic form to be utilized by the EQ model query generator]
applying the first filter to the first defined audio event, wherein the modified audio content is generated based on output of the first filter. [Renner – Para 0079: teaches The EQ filter selector 218 determines one or more of the filters represented by the EQ settings to apply to the input media signal 202. The EQ adjustment implementor 220 applies the selected filters using smoothing based on parameters from the smoothing filter configurator 222.]
Howcroft, Law, Bakish, and Renner are analogous in the art because they are from the same field of audio signals [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio event modification of Howcroft, Law, and Bakish to apply filtering as taught by Renner, for the reason of improving audio quality by smoothing the audio signal with filters.
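
For illustration only, the three steps recited in claim 5 (detect frequency characteristics, determine a filter from them, apply the filter) can be sketched in Python as follows. This is not Renner's EQ pipeline; it substitutes a simple discrete Fourier transform peak search and a standard second-order notch filter, and all function names, the Q value, and the test tones are assumptions.

```python
import cmath
import math

def bin_magnitude(samples, freq_hz, sample_rate):
    """Magnitude of a single DFT component at freq_hz."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
    return abs(acc)

def dominant_frequency(samples, sample_rate):
    """Step 1: frequency (Hz) of the strongest DFT bin between DC and Nyquist."""
    n = len(samples)
    best_bin = max(range(1, n // 2),
                   key=lambda k: bin_magnitude(samples, k * sample_rate / n, sample_rate))
    return best_bin * sample_rate / n

def design_notch(center_hz, sample_rate, q=5.0):
    """Step 2: second-order notch coefficients, normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(samples, b, a):
    """Step 3: run the filter over the samples (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000  # Hz, assumed for illustration
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)          # stand-in event tone
          + 0.3 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)   # stand-in other audio
          for n in range(400)]

detected_hz = dominant_frequency(signal, SAMPLE_RATE)
b, a = design_notch(detected_hz, SAMPLE_RATE)
filtered = apply_biquad(signal, b, a)
```

The filter here is chosen from the detected frequency, so the event tone is suppressed while audio away from the notch passes with little change once the filter's start-up transient has decayed.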

Regarding claim 13, Howcroft, Law, and Bakish teach the system of claim 11, wherein execution of the instructions causes the system to: 
perform the first modification operation on the frequency content based on modification setting information to generate the modified audio content. [Bakish – Para 0059: teaches the analysis module 300, produces the following outputs: (i) data including indications of the voice activity areas in the audio pattern and the characterizing pitch of the speaker; (ii) the original optical and audio patterns; and/or (iii) an improved audio pattern, which may be a cleaner version of the audio pattern--noise reduced and having the signal indication in the identified non-speech areas completely removed/cut from the original audio pattern]
Howcroft, Law, and Bakish do not explicitly teach determine frequency content of the first defined audio event by at least performing a frequency domain transform on the occurrence of the first defined audio event; 

However, Renner teaches determine frequency content of the first defined audio event by at least performing a frequency domain transform on the occurrence of the first defined audio event; [Renner – Para 0079: teaches transforming the input audio signal into a frequency and/or characteristic form]
In addition, the rationale of claim 5 is used for this claim.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claim 11 above, and further in view of Renner et al. ("Renner" US 20200136580), and Vorenkamp et al. ("Vorenkamp" US 20010007151).

Regarding claim 12, Howcroft, Law, and Bakish do not explicitly teach claim 12.  However, Renner teaches the system of claim 11, wherein execution of the instructions causes the system to: 
determine a frequency content of the first defined audio event by at least performing a frequency domain transform on the occurrence of the first defined audio event; [Renner – Para 0079: teaches transforming the input audio signal into a frequency and/or characteristic form]
perform a time domain transform on the modified frequency content to obtain a time domain signal of the modified frequency content. [Renner – Para 0100: teaches the time to frequency domain converter 232 may utilize any type of transform (e.g., a short-time Fourier Transform, a Constant-Q transform, Hartley transform, etc.) to convert the input media signal 202 from a time-domain representation to a frequency-domain representation]
In addition, the rationale of claim 5 is used for these limitations. 
Howcroft, Law, Bakish, and Renner do not explicitly teach shift a phase of the frequency content to generate modified frequency content of the first defined audio event; and 

However, Vorenkamp teaches shift a phase of the frequency content to generate modified frequency content of the first defined audio event; and [Vorenkamp – Para 0248: teaches tuning with a 90 degree phase shift across capacitors internal to the phase detector, that corresponds to 0 degrees of phase shift across the filter.]
Howcroft, Law, Bakish, Renner, and Vorenkamp are analogous in the art because they are from the same field of television receivers [Para 0003].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio event modification of Howcroft, Law, Bakish, and Renner to shift phase as taught by Vorenkamp, for the reason of improving the quality of the audio by cancelling out certain signals.
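
For illustration only, the sequence recited in claim 12 (frequency domain transform, phase shift of the frequency content, time domain transform) can be sketched in Python as follows. This is not the method of Renner or Vorenkamp; it uses a plain discrete Fourier transform, and the function names, band edges, and test tones are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Frequency domain transform (naive discrete Fourier transform)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
            for k in range(n)]

def inverse_dft(spectrum):
    """Time domain transform; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * i / n)
                 for k, c in enumerate(spectrum)) / n).real
            for i in range(n)]

def shift_phase_in_band(samples, sample_rate, low_hz, high_hz, phase_rad):
    """Shift the phase of bins inside [low_hz, high_hz]; leave other bins untouched."""
    n = len(samples)
    spectrum = dft(samples)
    rotation = cmath.exp(1j * phase_rad)
    # Rotate each positive-frequency bin and its mirror bin in opposite
    # directions so the result stays a real-valued signal.
    for k in range(1, (n + 1) // 2):
        if low_hz <= k * sample_rate / n <= high_hz:
            spectrum[k] *= rotation
            spectrum[n - k] *= rotation.conjugate()
    return inverse_dft(spectrum)

SAMPLE_RATE = 8000  # Hz, assumed for illustration
N = 64
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)          # stand-in event tone
          + 0.3 * math.sin(2 * math.pi * 250 * n / SAMPLE_RATE)   # stand-in other audio
          for n in range(N)]

# Shift only the 900-1100 Hz band by 90 degrees; the 250 Hz tone is preserved.
shifted = shift_phase_in_band(signal, SAMPLE_RATE, 900.0, 1100.0, math.pi / 2)
```

With a 90 degree shift the 1000 Hz sine becomes a cosine while the 250 Hz component is returned unchanged, which is one way to picture modifying second frequencies while preserving first frequencies.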

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claim 11 above, and further in view of Strubbe et al. ("Strubbe" US 6795808).

Regarding claim 16, Howcroft, Law, and Bakish do not explicitly teach claim 16.  However, Strubbe teaches the system of claim 11, wherein execution of the instructions causes the system to: 
provide the first video content as input to the machine learning model; and [Strubbe – C 10, L57-62: teaches various inputs may be used for such a machine-learning process including specific words]
receive output from the machine learning model indicating the occurrence of the non-speech audio event type in the first video content. [Strubbe – C 29, L 20-37: teaches identifying a situation where speech output by the conversation simulator is inappropriate and instead the template selector/store generates white sound (or music, no sound at all, or a lowering of the lights)]
Howcroft, Law, Bakish, and Strubbe are analogous in the art because they are from the same field of speech analysis [C 12, L 10-33].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio event type identification of Howcroft, Law, and Bakish to use machine learning as taught by Strubbe, for the reason of improving the detection of inappropriate content through analysis of speech.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Howcroft, Law, and Bakish as applied to claim 17 above, and further in view of Milazzo et al. ("Milazzo" US 20200073902).

Regarding claim 19, Howcroft, Law, and Bakish do not explicitly teach claim 19.  However, Milazzo teaches the one or more non-transitory computer-readable media of claim 17, wherein the instructions cause the one or more processors to: 
determine a visual indicator that corresponds to the audio event type; [Milazzo – Para 0117: teaches indicator module may determine an offensive speech indicator for an article]
associate the visual indicator with the segment of the visual content; and [Milazzo – Para 0122: teaches the indicator module 206 may determine offensive speech (e.g., number of words and phrases) for each unit or portion.]
provide the visual indicator in association with the segment for display on the display device. [Milazzo – Fig. 8: teaches a graphic representing a degree to which an article is using dramatic or sensational language]
Howcroft, Law, Bakish, and Milazzo are analogous in the art because they are from the same field of speech analysis [abstract].  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the audio event type handling of Howcroft, Law, and Bakish to provide indicators as taught by Milazzo, for the reason of improving user experience by warning the user of unwanted content when offensive language is present.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAYCEE IMPERIAL whose telephone number is (571)270-0604. The examiner can normally be reached 8-6 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached at 571-272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAYCEE IMPERIAL/Examiner, Art Unit 2426

/NASSER M GOODARZI/Supervisory Patent Examiner, Art Unit 2426