DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Status of Claims

This action is in response to the amendment filed on 7/2/2022.
Claims 1-20 have been amended.
No additional claims have been added.
Claims 1-20 are pending and have been examined.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claims are directed to a device (claim 1), a method (claim 5), and a system (claim 7). Claims 1, 5, 7, and their dependents fall within the four statutory categories of patentable subject matter.
Step 2A prong 1: The claims are directed to certain methods of organizing human activity and mental processes. The claims store phrases related to a sponsorship message together with an estimated period of time before or after each phrase that is treated as the sponsor message, perform voice recognition on an audio signal of a broadcast program, detect a stored related phrase in the voice recognition result, use the appearance time of the detected related phrase as a starting point to treat the estimated period of time around the detected related phrase as the sponsor message, and detect a segment that continues for a predetermined period of time as the sponsorship credit display. But for the recitation of the device (claim 1), computer (claim 5), and processor (claim 7), under the broadest reasonable interpretation the claims cover performance of the limitations in the human mind or with pen and paper. A human could store (i.e., write down) a set of related phrases and estimated times around those phrases, interpret an audio signal (i.e., hear it), detect the related phrase in the audio signal (i.e., match it), consider the time frame around the detected related phrase as the sponsorship credit, and detect that it continues for at least a predetermined amount of time. Thus the claims are considered to recite a mental process. Further, determining sponsorship of a broadcast program falls under advertising, marketing, or sales activities and behaviors, as the claims seek to identify a sponsored segment.
The following limitations, when considered both individually and as an ordered combination, are considered as merely descriptive of abstract concepts: 
Storing a combination including a phrase and an estimation period of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit, wherein the estimation period from a start to an end of the sponsorship credit display segment includes at least one predetermined period relative to a time when the phrase is spoken; generating voice data of the broadcast program by recognizing voice from an audio signal of the broadcast program using a voice recognition acoustic model/language model; detecting the stored phrase from the voice data; estimating, using a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program; and determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (claim 1)
Storing a combination of a phrase and an estimation period of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit, and wherein the estimation period from a start to an end of the sponsorship credit display segment includes at least one predetermined period relative to a time when the phrase is spoken; generating voice data of the broadcast program by recognizing voice from an audio signal of the broadcast program; detecting the stored phrase from the voice data; estimating, using a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program; and determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (claim 5)
Storing a combination including a phrase and an estimation period of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit, wherein the estimation period from a start to an end of the sponsorship credit display segment includes at least one predetermined period relative to a time when the phrase is spoken; generating voice data associated with the broadcast program by recognizing voice from an audio signal of the broadcast program; detecting the stored phrase from the voice data; estimating, based on a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program; and determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (claim 7)
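For illustration only, the storing, detecting, estimating, and determining steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the voice recognition result is assumed to already be available as time-stamped text, the estimation period is assumed to be a pair of offsets before and after the time the phrase is spoken, and every name (`PhraseRule`, `Utterance`, `estimate_segments`) and every value is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseRule:
    """Hypothetical stored combination: a phrase plus its estimation period."""
    phrase: str
    before_s: float  # assumed seconds of segment before the phrase is spoken
    after_s: float   # assumed seconds of segment after the phrase is spoken


@dataclass(frozen=True)
class Utterance:
    """Hypothetical voice data: recognized text with the time it was spoken."""
    time_s: float
    text: str


def estimate_segments(rules, utterances):
    """Return (start, end) periods around each detected stored phrase."""
    segments = []
    for utt in utterances:
        for rule in rules:
            # Detecting step: plain substring match against the recognized text.
            if rule.phrase in utt.text:
                # Estimating step: apply the stored offsets to the spoken time.
                start = max(0.0, utt.time_s - rule.before_s)
                end = utt.time_s + rule.after_s
                segments.append((start, end))
    return segments


# Illustrative values only.
RULES = [PhraseRule("brought to you by", before_s=3.0, after_s=7.0)]
TRANSCRIPT = [
    Utterance(12.0, "welcome back"),
    Utterance(95.0, "this program is brought to you by the following sponsors"),
]
SEGMENTS = estimate_segments(RULES, TRANSCRIPT)
print(SEGMENTS)
```

In this sketch the phrase is matched at 95.0 seconds, so the part of the program from 92.0 to 102.0 seconds would be determined to be the sponsorship credit display segment.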

The following dependent claim limitations, when considered both individually and as an ordered combination, are considered as merely descriptive of abstract concepts:
estimating the sponsorship credit display segment based on a video signal of the broadcast program, and estimating, based at least on a logical sum operation or a logical product operation upon data associated with the broadcast program, a segment that includes the stored phrase being spoken and continues for at least a predetermined time period as the sponsorship credit display segment associated with the broadcast program (claims 2, 6, 14);
wherein the estimated period of the sponsorship credit display segment including the stored phrase depends on when the stored phrase is likely to be spoken during the sponsorship credit display (claims 3, 10, 15);
wherein the detecting the stored phrase further comprises outputting time information of the detected sponsorship credit display segment (claims 4, 9, 11, 13, 16, 18);
wherein the estimated period of the sponsorship credit display segment including the stored phrase depends on when the stored phrase is likely to be spoken during the sponsorship credit display (claims 8, 12, 17); and
wherein the phrase includes a name of a sponsor of the broadcast program (claims 19, 20).
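For illustration only, the logical sum or logical product operation and the "continues for at least a predetermined time period" condition recited in claims 2, 6, and 14 can be sketched as follows. This is a hedged sketch and not the applicant's implementation: it assumes the audio-based and video-based estimates are each available as one boolean flag per second of the program, and the names `combine` and `runs_at_least` and all values are hypothetical.

```python
def combine(audio_flags, video_flags, mode="or"):
    """Logical sum ("or") or logical product ("and") of two per-second estimates."""
    if len(audio_flags) != len(video_flags):
        raise ValueError("flag lists must cover the same number of seconds")
    if mode == "or":
        return [a or v for a, v in zip(audio_flags, video_flags)]
    if mode == "and":
        return [a and v for a, v in zip(audio_flags, video_flags)]
    raise ValueError("mode must be 'or' or 'and'")


def runs_at_least(flags, min_len):
    """Return (start, end) index pairs, end exclusive, of True runs >= min_len."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # a run of True values begins here
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the program.
    if start is not None and len(flags) - start >= min_len:
        runs.append((start, len(flags)))
    return runs


# Illustrative per-second estimates only.
AUDIO = [False, True, True, True, False, False, True, False]
VIDEO = [False, False, True, True, True, False, False, False]

print(runs_at_least(combine(AUDIO, VIDEO, "or"), 3))
print(runs_at_least(combine(AUDIO, VIDEO, "and"), 2))
```

With these values the logical sum yields one run from second 1 up to second 5, and the logical product yields one run from second 2 up to second 4; the isolated flag at second 6 is discarded because it does not continue for the assumed minimum length.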
Step 2A prong 2: This judicial exception is not integrated into a practical application. The claims recite the additional elements of a sponsorship credit display detection device comprising a processor (claim 1) and a processor (claim 7). Claim 5 implies that the method is to be implemented by a computer but does not recite which steps are actually performed by a computer. Even assuming that the computer is sufficiently claimed, at best claim 5 can be interpreted to merely include a computer. The generic computing devices are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer (See MPEP 2106.04(d) and subsequently 2106.05(f)). Further, claims 1, 5, and 7 include use of a neural network. The use of a neural network is recited at a high level of generality and merely provides a general link to the use of neural networks (See MPEP 2106.04(d) and subsequently 2106.05(h)). Finally, the recitation of using a neural network is insignificant extra-solution activity. The neural network is recited at a high level of generality and is only tangentially related to the invention (See MPEP 2106.04(d) and subsequently 2106.05(g)). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: The claims, when considered both individually and as an ordered combination, do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As with regard to practical application, the current claims merely recite generic computing devices that amount to no more than mere instructions to apply the abstract idea using a computer. Mere instructions to apply an exception using a generic computer do not provide an inventive concept. Further, the generic recitation of a neural network is a general link to a particular field of use or technological environment (i.e., machine learning). Further, the generic use of a neural network does not provide an inventive concept related to neural networks and is considered only tangentially related to the invention. Thus it is also considered insignificant extra-solution activity.
The use of neural networks was well-understood, routine, and conventional at the time of the invention (See https://www.datanami.com/2017/05/10/machine-learning-deep-learning-ai-whats-difference/ - popularity of machine learning and various types of machine learning algorithms including neural networks – 2017; https://towardsdatascience.com/neural-network-architectures-156e5bad51ba - Deep neural networks and Deep Learning are powerful and popular algorithms – 2017; https://steemit.com/academia/@krishtopa/what-are-neural-networks-why-they-are-so-popular-and-what-problems-can-solve - neural networks are very popular now due to advances in technology – 2016).
As a result the claims are not patent eligible.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 5, 7, 10, 15, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Forbes et al. (US 2009/0006193) in view of Ollis et al. (US 2005/0008325) and further in view of Lee et al. (US 2017/0018272).

As per claim 1:

Forbes teaches a sponsorship credit display detection device for determining a sponsorship credit display segment associated with a broadcast program, the device comprising a processor configured to execute a method comprising (paragraph [0045]):
storing a combination including a phrase and an estimation period {…} wherein the estimation period from a start to an end {…} includes at least one predetermined period relative to a time when the phrase is spoken (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding the detected segment being a sponsorship credit display will be addressed below.)
generating voice data of the broadcast program by recognizing voice from an audio signal of the broadcast program {…} (paragraph [0030] For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In some embodiments, all or a portion of the recognition module may reside on a network server such 110, 112, or 120. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding the use of language/acoustic models will be addressed below.)
detecting the stored phrase from the voice data {…} (paragraph [0030] In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124. In this way, an utterance (word, phrase, phoneme, etc.) may be compared with a database 124 including utterances, pre-selected words (e.g. "sponsored" words in the database 124 that trigger advertisement forwarding), phrases, combinations of words, particular meanings and so on for determining associated advertisements. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding neural networks will be addressed below.)
estimating, using a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program (paragraph [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
Forbes does not expressly teach the phrase being indicative of sponsorship credit display.
Ollis teaches {wherein the phrase is indicative} of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit (paragraph [0019] According to another preferred embodiment, the broadcast receiver uses voice recognition to identify particular words which introduce particular types of broadcast programs transmitted by radio stations. For instance, traffic reports are often sponsored and therefore are usually introduced using a sponsor's name, and use the same words to identify the beginning of such a report. When the broadcast receiver detects these introductory words, the predetermined criteria is satisfied, and a broadcast program containing an information report for recording is known to follow.)
It would have been obvious to one of ordinary skill in the art to use voice recognition to determine a sponsorship credit display segment as taught by Ollis in order to identify specific segments in a broadcast (paragraph [0019]). Further, determining a sponsorship credit display is the use of a known technique to improve similar devices/methods in the same way.
The combination does not expressly teach using an acoustic model/language model or a neural network for detecting a phrase.
Lee teaches {generating voice data} using a voice recognition acoustic model/language model (paragraph [0057] The voice analyzer 200, according to one or more embodiments, includes a text generator and a text analyzer, as seen, for example, in FIG. 2. The text generator may recognize speech and convert the recognition result into text, such as through the aforementioned acoustic or linguistic models, or other algorithms, for example, the voice analyzer 200 may extract the raw voice data as audio frame data, and input audio frame data to the acoustic or linguistic models. The text analyzer may compare the generated text with the user's topics of interest to evaluate the relevance. The voice analyzer 200 may apply one or more of support vector machine (SVM) classification, neural network classification, or other classification, as would be known to one of skill in the art after gaining a thorough understanding of the entirety of the description provided herein, to classify the generated text, and then evaluate the relevance between the classification result and the user's topics of interest. In the case of using the neural network classification, for example, the voice analyzer 200 may extract keywords from the recognized result, identify a context in which the speech is involved, and classify the recognized speech into a specific category.)
{detecting a stored phrase from} voice data using a trained neural network (paragraph [0057] The voice analyzer 200, according to one or more embodiments, includes a text generator and a text analyzer, as seen, for example, in FIG. 2. The text generator may recognize speech and convert the recognition result into text, such as through the aforementioned acoustic or linguistic models, or other algorithms, for example, the voice analyzer 200 may extract the raw voice data as audio frame data, and input audio frame data to the acoustic or linguistic models. The text analyzer may compare the generated text with the user's topics of interest to evaluate the relevance. The voice analyzer 200 may apply one or more of support vector machine (SVM) classification, neural network classification, or other classification, as would be known to one of skill in the art after gaining a thorough understanding of the entirety of the description provided herein, to classify the generated text, and then evaluate the relevance between the classification result and the user's topics of interest. In the case of using the neural network classification, for example, the voice analyzer 200 may extract keywords from the recognized result, identify a context in which the speech is involved, and classify the recognized speech into a specific category.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include using a neural network as taught by Lee in order to identify a context in which speech is involved and classify recognized speech into specific categories (paragraph [0054]). Further, using a neural network to identify a phrase is the use of a known technique to improve similar devices/methods in the same way.


As per claim 5:


Forbes teaches a computer implemented method for determining a sponsorship credit display segment associated with a broadcast program, the method comprising (paragraph [0045]):
storing a combination including a phrase and an estimation period {…} wherein the estimation period from a start to an end {…} includes at least one predetermined period relative to a time when the phrase is spoken (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding the detected segment being a sponsorship credit display will be addressed below.)
generating voice data of the broadcast program by recognizing voice from an audio signal of the broadcast program (paragraph [0030] For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In some embodiments, all or a portion of the recognition module may reside on a network server such 110, 112, or 120.)
detecting the stored phrase from the voice data {…} (paragraph [0030] In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124. In this way, an utterance (word, phrase, phoneme, etc.) may be compared with a database 124 including utterances, pre-selected words (e.g. "sponsored" words in the database 124 that trigger advertisement forwarding), phrases, combinations of words, particular meanings and so on for determining associated advertisements. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding neural networks will be addressed below.)
estimating, using a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program (paragraph [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
Forbes does not expressly teach the phrase being indicative of sponsorship credit display.
Ollis teaches {wherein the phrase is indicative} of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit (paragraph [0019] According to another preferred embodiment, the broadcast receiver uses voice recognition to identify particular words which introduce particular types of broadcast programs transmitted by radio stations. For instance, traffic reports are often sponsored and therefore are usually introduced using a sponsor's name, and use the same words to identify the beginning of such a report. When the broadcast receiver detects these introductory words, the predetermined criteria is satisfied, and a broadcast program containing an information report for recording is known to follow.)
It would have been obvious to one of ordinary skill in the art to use voice recognition to determine a sponsorship credit display segment as taught by Ollis in order to identify specific segments in a broadcast (paragraph [0019]). Further, determining a sponsorship credit display is the use of a known technique to improve similar devices/methods in the same way.
The combination does not expressly teach using a neural network for detecting a phrase.
Lee teaches  {detecting a stored phrase from} voice data using a trained neural network  (paragraph [0057] The voice analyzer 200, according to one or more embodiments, includes a text generator and a text analyzer, as seen, for example, in FIG. 2. The text generator may recognize speech and convert the recognition result into text, such as through the aforementioned acoustic or linguistic models, or other algorithms, for example, the voice analyzer 200 may extract the raw voice data as audio frame data, and input audio frame data to the acoustic or linguistic models. The text analyzer may compare the generated text with the user's topics of interest to evaluate the relevance. The voice analyzer 200 may apply one or more of support vector machine (SVM) classification, neural network classification, or other classification, as would be known to one of skill in the art after gaining a thorough understanding of the entirety of the description provided herein, to classify the generated text, and then evaluate the relevance between the classification result and the user's topics of interest. In the case of using the neural network classification, for example, the voice analyzer 200 may extract keywords from the recognized result, identify a context in which the speech is involved, and classify the recognized speech into a specific category.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include using a neural network as taught by Lee in order to identify a context in which speech is involved and classify recognized speech into specific categories (paragraph [0054]). Further, using a neural network to identify a phrase is the use of a known technique to improve similar devices/methods in the same way.

As per claim 7:

Forbes teaches a system comprising a processor configured to execute a method comprising (paragraph [0045]):
storing a combination including a phrase and an estimation period {…} wherein the estimation period from a start to an end {…} includes at least one predetermined period relative to a time when the phrase is spoken (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding the detected segment being a sponsorship credit display will be addressed below.)
generating voice data of the broadcast program by recognizing voice from an audio signal of the broadcast program (paragraph [0030] For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In some embodiments, all or a portion of the recognition module may reside on a network server such 110, 112, or 120.)
detecting the stored phrase from the voice data {…} (paragraph [0030] In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124. In this way, an utterance (word, phrase, phoneme, etc.) may be compared with a database 124 including utterances, pre-selected words (e.g. "sponsored" words in the database 124 that trigger advertisement forwarding), phrases, combinations of words, particular meanings and so on for determining associated advertisements. The {…} indicate a modification to the claim language to show what is expressly taught by Forbes. Limitations regarding neural networks will be addressed below.)
estimating, using a time of the detected phrase spoken in the voice data of the broadcast program and the estimation period, a segment period corresponding to the sponsorship credit display segment including the stored phrase in the broadcast program (paragraph [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
determining a part of the broadcast program according to the segment period as the sponsorship credit display segment associated with the broadcast program (paragraphs [0030] During a digital audio session, the audio content streams may be analyzed. For example, a recognition module (such as recognition module 122 in the first client 102) may identify words or utterances using speech recognition algorithm which converts the digital audio data into computer recognizable data. In an example, the recognition module 122 may identify the term "truck" in the digital audio session, and generate the corresponding Unicode text (or any other suitable computer recognizable data) equivalent of "truck" for comparison with the database 124… [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)
Forbes does not expressly teach the phrase being indicative of sponsorship credit display.
Ollis teaches {wherein the phrase is indicative} of a sponsorship credit display segment including a display of a sponsorship credit and the phrase spoken during sponsorship credit (paragraph [0019] According to another preferred embodiment, the broadcast receiver uses voice recognition to identify particular words which introduce particular types of broadcast programs transmitted by radio stations. For instance, traffic reports are often sponsored and therefore are usually introduced using a sponsor's name, and use the same words to identify the beginning of such a report. When the broadcast receiver detects these introductory words, the predetermined criteria is satisfied, and a broadcast program containing an information report for recording is known to follow.)
It would have been obvious to one of ordinary skill in the art to use voice recognition to determine a sponsorship credit display segment as taught by Ollis in order to identify specific segments in a broadcast (paragraph [0019]). Further, determining a sponsor display credit is the use of a known technique used to improve similar devices/methods in the same way.
The combination does not expressly teach using a neural network for detecting a phrase.
Lee teaches {detecting a stored phrase from} voice data using a trained neural network  (paragraph [0057] The voice analyzer 200, according to one or more embodiments, includes a text generator and a text analyzer, as seen, for example, in FIG. 2. The text generator may recognize speech and convert the recognition result into text, such as through the aforementioned acoustic or linguistic models, or other algorithms, for example, the voice analyzer 200 may extract the raw voice data as audio frame data, and input audio frame data to the acoustic or linguistic models. The text analyzer may compare the generated text with the user's topics of interest to evaluate the relevance. The voice analyzer 200 may apply one or more of support vector machine (SVM) classification, neural network classification, or other classification, as would be known to one of skill in the art after gaining a thorough understanding of the entirety of the description provided herein, to classify the generated text, and then evaluate the relevance between the classification result and the user's topics of interest. In the case of using the neural network classification, for example, the voice analyzer 200 may extract keywords from the recognized result, identify a context in which the speech is involved, and classify the recognized speech into a specific category.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include using a neural network as taught by Lee in order to identify content in which speech is involved and classify recognized speech into specific categories (paragraph [0054]). Further, using a neural network to identify a phrase is the use of a known technique used to improve similar devices/methods in the same way.
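For illustration only, the general technique discussed in the mapping above (a stored phrase paired with an estimation period, applied around the time the phrase is detected in recognized speech) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or from Forbes, Ollis, or Lee: the names PhraseEntry and estimate_segments, the use of seconds, and the timestamped transcript format are all hypothetical, and plain substring matching stands in for any trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseEntry:
    """Hypothetical stored combination: a phrase plus its estimation period."""
    phrase: str
    before_s: float  # assumed seconds of segment preceding the spoken phrase
    after_s: float   # assumed seconds of segment following the spoken phrase


def estimate_segments(transcript, entries):
    """Return (start_s, end_s) periods around each detected stored phrase.

    transcript: list of (time_s, text) pairs from some speech recognizer
    (the recognizer itself is outside this sketch).
    """
    segments = []
    for time_s, text in transcript:
        lowered = text.lower()
        for entry in entries:
            # Simple substring match; an assumption made for illustration.
            if entry.phrase.lower() in lowered:
                start = max(0.0, time_s - entry.before_s)
                end = time_s + entry.after_s
                segments.append((start, end))
    return segments


transcript = [
    (10.0, "welcome back"),
    (65.0, "This program is brought to you by Acme"),
]
entries = [PhraseEntry("brought to you by", before_s=5.0, after_s=10.0)]
print(estimate_segments(transcript, entries))
```

The start time is clamped at zero so that a phrase detected near the beginning of the program does not yield a negative start.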


Forbes and Ollis teach the limitations of claims 1, 5, and 7. As per claims 3, 10, and 15:

Ollis further teaches wherein the estimated period of the sponsorship display segment including the stored phrase depends on when the stored phrase is likely to be spoken during the sponsorship credit display (paragraph [0019] According to another preferred embodiment, the broadcast receiver uses voice recognition to identify particular words which introduce particular types of broadcast programs transmitted by radio stations. For instance, traffic reports are often sponsored and therefore are usually introduced using a sponsor's name, and use the same words to identify the beginning of such a report. When the broadcast receiver detects these introductory words, the predetermined criteria is satisfied, and a broadcast program containing an information report for recording is known to follow.)
It would have been obvious to one of ordinary skill in the art to use voice recognition to determine a sponsorship credit display segment as taught by Ollis in order to identify specific segments in a broadcast (paragraph [0019]). Further, determining a sponsor display credit is the use of a known technique used to improve similar devices/methods in the same way.


Forbes and Ollis teach the limitations of claims 1 and 5. As per claims 19 and 20:

Ollis further teaches wherein the phrase includes a name of a sponsor of the broadcast program (paragraph [0019] According to another preferred embodiment, the broadcast receiver uses voice recognition to identify particular words which introduce particular types of broadcast programs transmitted by radio stations. For instance, traffic reports are often sponsored and therefore are usually introduced using a sponsor's name, and use the same words to identify the beginning of such a report. When the broadcast receiver detects these introductory words, the predetermined criteria is satisfied, and a broadcast program containing an information report for recording is known to follow.)
It would have been obvious to one of ordinary skill in the art to use voice recognition to determine a sponsorship credit display segment as taught by Ollis in order to identify specific segments in a broadcast (paragraph [0019]). Further, determining a sponsor display credit is the use of a known technique used to improve similar devices/methods in the same way.


Claims 2, 8, 6, 12, 14, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Forbes et al (US 2009/0006193) in view of Ollis et al (US 2005/0008325) in view of Tajima et al (US 2001/0031129)

Forbes and Ollis teach the limitations of claims 1, 5, and 7. As per claims 2, 6, and 14:

The combination does not expressly teach using video signals in combination with the audio to determine segments. 
Tajima teaches estimating the sponsorship credit display segment based on a video signal of the broadcast program, and estimating, based at least on a logical sum operation or a logical product operation upon data associated with the broadcast program, a segment that includes the stored phrase being spoken and continues for at least a predetermined time period as the sponsorship credit display segment associated with the broadcast program (paragraph [0066] And the face image database 5 and the audio database 15 can be combined into one unit, and also the person designating means 4 and the audio designating means 14 can be combined into one unit, and an inquiring object is designated by using a keyboard or a mouse. In this case, a designated voice in a designated period, or a designated face image or a designated object in one frame of the video signal, is designated. This designation can be executed as a logical sum OR, in which only an audio signal is designated or only a video signal is designated separately. Or this designation can be executed as a logical product AND, in which an audio signal and a video signal are added. These designating means makes an inquiring object or an inquiring voice desired by a user display on a display (not shown) and designates the inquiring object or the inquiring voice.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include using video signals in combination with the audio to determine segments as taught by Tajima in order to determine desired content (paragraph [0001]). Further the use of such technique is the use of a known technique used to improve similar devices/methods in the same way.
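For illustration only, combining audio-based and video-based detections by a logical sum (OR) or logical product (AND), and then keeping only segments that continue for at least a predetermined period, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or from Tajima: the function names combine and runs_at_least, and the representation of detections as one boolean flag per time unit, are hypothetical.

```python
def combine(audio, video, mode="or"):
    """Combine per-time-unit detection flags by logical sum or product."""
    if len(audio) != len(video):
        raise ValueError("audio and video flag lists must have equal length")
    if mode == "or":   # logical sum: either signal indicates the segment
        return [a or v for a, v in zip(audio, video)]
    if mode == "and":  # logical product: both signals must agree
        return [a and v for a, v in zip(audio, video)]
    raise ValueError("mode must be 'or' or 'and'")


def runs_at_least(flags, min_len):
    """Return (start, end) index pairs, end exclusive, for runs of True
    flags that continue for at least min_len time units."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the program.
    if start is not None and len(flags) - start >= min_len:
        runs.append((start, len(flags)))
    return runs


audio = [False, True, True, True, False, False, True, False]
video = [False, False, True, True, True, False, False, False]
print(runs_at_least(combine(audio, video, "or"), 3))
print(runs_at_least(combine(audio, video, "and"), 2))
```

The logical sum tends to produce longer, more inclusive segments, while the logical product keeps only the portions where both signals agree.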

Forbes, Ollis, and Tajima teach the limitations of claims 2, 6, and 14. As per claims 8, 12, and 17:

Forbes further teaches wherein the estimated period of the sponsorship credit display segment including the stored phrase depends on when the stored phrase is likely to be spoken  during the sponsorship credit display (paragraph [0031] The recognition module 122 may sample utterances at set intervals, throughout the digital audio session and so on. For example, the recognition module may use a speech recognition algorithm which samples audio data when a particular client is providing audio, sample at intervals, sample throughout the digital audio event, sample at intervals until a "sponsored" term, or a term included in a database 124 is identified at which point the speech recognition is applied for a period of time (or until a decision is reached as to the applicability of the term) and the like.)


Claims 4, 11, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Forbes et al (US 2009/0006193) in view of Ollis et al (US 2005/0008325) in view of Lester (US 2015/0067459)


Forbes and Ollis teach the limitations of claims 1, 5, and 7. As per claims 4, 11, and 16:

The combination does not expressly teach outputting time information.
Lester teaches wherein the detecting the stored phrase further comprises outputting time information of the detected sponsorship credit display segment (paragraph [0014] The textual transcription may be obtained either by obtaining a pre-existing transcription or by generating one using speech-to-text recognition techniques. The transcription may include timestamps for each of the words recognized in the audio content representing a time interval during which a particular word is spoken, sung, shouted, or otherwise presented in the audio content.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include time information as taught by Lester in order to aid in the analyzing of substantive material (paragraph [0014]). Further, recording time information in relation to identified data is the use of a known technique used to improve similar devices/methods in the same way.


Claims 9, 13, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Forbes et al (US 2009/0006193) in view of Ollis et al (US 2005/0008325) in view of Tajima et al (US 2001/0031129) in view of Lester (US 2015/0067459)

Forbes, Ollis, and Tajima teach the limitations of claims 2, 6, and 14. As per claims 9, 13, and 18:

The combination does not expressly teach outputting time information.
Lester teaches wherein the detecting the stored phrase further comprises outputting time information of the detected sponsorship credit display segment (paragraph [0014] The textual transcription may be obtained either by obtaining a pre-existing transcription or by generating one using speech-to-text recognition techniques. The transcription may include timestamps for each of the words recognized in the audio content representing a time interval during which a particular word is spoken, sung, shouted, or otherwise presented in the audio content.)
It would have been obvious to one of ordinary skill in the art at the time of filing the invention to include time information as taught by Lester in order to aid in the analyzing of substantive material (paragraph [0014]). Further, recording time information in relation to identified data is the use of a known technique used to improve similar devices/methods in the same way.

Response to Arguments

The examiner has considered and finds persuasive applicant’s arguments regarding previous objections to claim 18. As a result such objection has been withdrawn. 
The examiner has considered and finds persuasive applicant’s arguments with regard to previous rejections under 35 USC 112. As a result such rejections have been withdrawn.
The examiner has considered and finds persuasive applicant’s arguments regarding the software per se 101 rejection of claims 7 and 14-18. As a result such rejection has been withdrawn.
The examiner has considered but does not find persuasive applicant’s arguments with regard to rejections under 35 USC 101 of claims 1-20 as being directed to abstract ideas without significantly more. Applicant argues that the claims are not directed to abstract ideas. The examiner respectfully disagrees. But for the inclusion of generic computing devices, the claims implement a series of steps that can be done in the human mind or with pen and paper. Additionally, the claims are directed to identifying a sponsored segment and thus qualify as organizing human activities in the form of advertising, marketing or sales behaviors. Neither the claims nor the specification provide any meaningful details with regard to the acoustic/language model. Thus, under broadest reasonable interpretation, the model could merely be listening to something or interpreting language. Further, neural networks are merely the application of mathematical calculations by a computing device which could be done in the same manner by a human. The use of a neural network could potentially overcome 101 with a level of detail that indicated that the way in which the neural network was being implemented involved performing the process in a way that a human would not, similar to the findings in McRO. However, the claims only make reference to a generic use of a neural network without providing any detail as to how the neural network is performing the detection. As such, the neural network provides a general link to a particular technological environment and is only tangentially related to the invention. Nothing in the claims provides an improvement to the field of machine learning or in particular neural networks. Thus the claims do not contain an inventive concept related to neural networks. As a result such rejections have been maintained.
Applicant’s arguments with regard to previous rejections under 35 USC 103 are moot in light of new grounds of rejection necessitated by amendment. With regard to Forbes and Ollis, the examiner respectfully disagrees with applicant’s arguments. Forbes analyzes audio data to determine if particular words or phrases were heard and then analyzes a specific time period around the word to identify content. Admittedly, Forbes does not identify content specifically to identify a sponsored segment; however, Ollis detects the names of sponsors for identifying sponsored segments. Thus the examiner finds the combination teaches the claimed manner of identifying sponsored segments.
 
Conclusion

Other prior art not relied upon but considered relevant includes:
Baughman et al (US 2019/0313154) – paragraph [0065]
Tsunokawa (US 2008/0285944) – paragraph [0069]
David et al (US 2018/0278999) – paragraph [0035]

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER STROUD whose telephone number is (571)272-7930. The examiner can normally be reached Mon. - Fri. 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kambiz Abdi, can be reached on 571-272-6702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.S/             Examiner, Art Unit 3688                                                                                                                                                                                           
/KAMBIZ ABDI/             Supervisory Patent Examiner, Art Unit 3688