Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The independent claims 1 and 7 recite “ A computer/method for controlling navigation through a content item, for production of an output to a user corresponding to that content item, the computer comprising: an audio stream acquisition unit for acquiring a stream of audio samples; a sound detector for detecting, on the stream of audio samples, one or more non-verbal sound identifiers, each non-verbal sound identifier identifying a non-verbal sound signature on the stream of audio samples; a translator for translating a sequence of one or more non-verbal sound identifiers into a sequence of one or more navigation commands relating to the content item; a content navigator, responsive to navigation commands from the translator, to cause navigation through the content item and generation of a corresponding output to a user.”
The limitations of “acquisition”, “acquiring”, “detecting”, and “translating” as drafted cover a method of organizing human activity, in which a human observes a person interacting with a document and, when the person wants to navigate to a different section of the document, the human interacts with the computer and navigates to the right place.
The specification states: “The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or computer readable storage device such as one or more non-transitory memories or other type of hardware-based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.” The elements “computer” and “computer program” are therefore all general-purpose computer devices.
 Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is noted as a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed towards insignificant extra-solution activity. The claims are not patent eligible.




Claims 3 and 9 recite wherein the translator is operable to acquire the sequence correspondence model by virtue of processing the content item to identify one or more elements of the content that can be associated with a detectable sound event. This amounts to a human listening for the sounds of a person cutting onions while that person follows a recipe; the human advances the document to the next step when the cutting stops.
Claims 4 and 10 recite wherein the translator is operable to acquire the sequence correspondence model by way of text analysis of the content item. This amounts to a human analyzing a text document that a person is reading and, from the context of the document, knowing to navigate to a different section upon hearing a snap from the person.
Claims 5 and 11 recite wherein the translator is operable to acquire the sequence correspondence model by way of image analysis of the content item. This amounts to a human observing people reading a document and, if they look perplexed, taking them to a reference that has more details, or otherwise navigating them to the next section.
Claims 6 and 12 recite configured to receive a human input action, and wherein the translator is operable to acquire a sequence correspondence model on the basis of a received human input action. This amounts to a human observing a person reading a document and moving to the appropriate section of the document based on the person’s verbal commands.

Claim Rejections - 35 USC § 102
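In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.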
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Harif (US-6820056-B1).
With respect to claims 1 and 7, Harif teaches a computer for controlling navigation through a content item, for production of an output to a user corresponding to that content item, the computer comprising:
an audio stream acquisition unit (Col 4 ll 14-15: Sound and/or speech input 50 is applied through microphone 51 which represents a speech input device.) for acquiring a stream of audio samples;
a sound detector (Col 4 ll 14-15: Sound and/or speech input 50 is applied through microphone 51 which represents a speech input device; and Col 4 ll 44-47: These sounds may be discerned by the above-described voice recognition apparatus based upon digitized sound patterns) for detecting, on the stream of audio samples, one or more non-verbal sound identifiers (Col 4 ll 37-46: ...In addition, there is stored a basic set of sound commands 52. These sound commands are represented by stored non-verbal sounds. Some examples are vocal: long and short whistles, coughs or hacks, teeth clicks, mouth-tongue clacks and hisses; or manual-physical: knocking on a desk, tapping on a computer case with a metallic object, clapping hands or rubbing sounds. These sounds may be discerned by the above-described voice recognition apparatus based upon digitized sound patterns.), each non-verbal sound identifier identifying a non-verbal sound signature on the stream of audio samples (Col 4 ll 50-53: Thus, a comparison 55 is made of an input of non-verbal sound to the stored non-verbal sound commands 52…);
a translator for translating a sequence of one or more non-verbal sound identifiers (Col 4 ll 33-54: The input speech goes through a recognition process which seeks a comparison 54 to a stored set of speech words in word tables 53...These sound commands are represented by stored non-verbal sounds. Some examples are vocal: long and short whistles, coughs or hacks, teeth clicks, mouth-tongue clacks and hisses; or manual-physical: knocking on a desk, tapping on a computer case with a metallic object, clapping hands or rubbing sounds...Thus, a comparison 55 is made of an input of non-verbal sound to the stored non-verbal sound commands 52 and recognized non-verbal sounds are input via display adapter 36 to display 38.) into a sequence of one or more navigation commands (Col 5 ll 40-49, table of SOUND COMMAND / TO MOVE CURSOR: Hand Clap / Move to Right; Mouth-Tongue Clack / Move to Left; Knock / Move Up; Metallic Tap / Move Down) relating to the content item (Col 4 ll 65-66: The initial display screen of FIG. 3 shows a display screen 61. In the panel, a window will show the recognized speech words that the user speaks arranged in a conventional text string 62);
a content navigator, responsive to navigation commands from the translator, to cause navigation through the content item and generation of a corresponding output to a user (Col 5 ll 8-15: The cursor 68 may then be moved by commands, e.g. hand clap moves cursor to the right, tongue/mouth clack moves cursor to the left, knocking on desk moves cursor up and metallic tapping moves cursor down. The sound recognition and command execution may be set up so that a sequence of the command sounds (claps, clacks, knocks and taps) accelerate the cursor in the selected direction.).
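For illustration only, the sound-to-command mapping cited from Harif (Col 5 ll 8-15 and ll 40-49) can be sketched as a lookup table followed by a cursor update. This is a minimal sketch and not Harif's implementation: the names SOUND_TO_COMMAND, COMMAND_DELTAS, translate and navigate, the string labels used for the sounds, and the screen-coordinate convention (moving up decreases y) are all assumptions introduced here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical labels for the non-verbal sounds listed in Harif, Col 5 ll 40-49.
SOUND_TO_COMMAND: Dict[str, str] = {
    "hand_clap": "move_right",
    "mouth_tongue_clack": "move_left",
    "knock": "move_up",
    "metallic_tap": "move_down",
}

# Assumed screen convention: x grows to the right, y grows downward.
COMMAND_DELTAS: Dict[str, Tuple[int, int]] = {
    "move_right": (1, 0),
    "move_left": (-1, 0),
    "move_up": (0, -1),
    "move_down": (0, 1),
}


def translate(sound_ids: Iterable[str]) -> List[str]:
    """Map a sequence of sound identifiers to navigation commands.

    Sounds with no entry in the table are ignored.
    """
    return [SOUND_TO_COMMAND[s] for s in sound_ids if s in SOUND_TO_COMMAND]


def navigate(cursor: Tuple[int, int], commands: Iterable[str]) -> Tuple[int, int]:
    """Apply each navigation command to the cursor position in order."""
    x, y = cursor
    for command in commands:
        dx, dy = COMMAND_DELTAS[command]
        x, y = x + dx, y + dy
    return (x, y)


if __name__ == "__main__":
    commands = translate(["hand_clap", "cough", "knock"])
    print(commands)
    print(navigate((0, 0), commands))
```

The sketch omits the acceleration behavior that Harif describes for repeated command sounds.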

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2, 3, 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Harif in view of Celinski (US-20180374512 A1).
With respect to claims 2 and 8, Harif does not teach wherein the translator is operable to acquire a sequence correspondence model from the content item.
Celinski teaches wherein the translator is operable to acquire a sequence correspondence model ([0052]…elements of interest are identified using a learning classifier based on a deep-learning mode, and [0046] … the method 200 may extract clips of the media stream using the timestamp and metadata information associated with the identified elements of interest. The method 200 then generates the curated media clips, such as by stitching together the extracted media clips into a curated media stream (214)) from the content item ([0043] Referring to FIG. 4, a media stream curation method 200 receive at least one media stream).  
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Harif to include the teachings of Celinski, motivation being to 
With respect to claims 3 and 9, Harif does not teach wherein the translator is operable to acquire the sequence correspondence model by virtue of processing the content item to identify one or more elements of the content that can be associated with a detectable sound event.
Celinski teaches wherein the translator is operable to acquire the sequence correspondence model ([0052] In some embodiments, elements of interest are identified using a learning classifier based on a deep-learning mode) by virtue of processing the content  item ([0043] Referring to FIG. 4, a media stream curation method 200 receive at least one media stream) to identify one or more elements of the content that can be associated ([0046] The method 200 then proceeds to process the media stream using the identified elements of interest (212). For example, the method 200 may extract clips of the media stream using the timestamp and metadata information associated with the identified elements of interest. The method 200 then generates the curated media clips, such as by stitching together the extracted media clips into a curated media stream (214)) with a detectable sound event ([0043] The audio cues can be verbal cues and/or non-verbal cues. The method 200 generates timestamps and associated metadata for the detected audio cues (206)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Harif to include the teachings of Celinski, motivation being to infer non-verbal audio commands directly from the accompanying audio without need of external voice input  (Celinski, [0019]).
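Claims 4 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Harif in view of Celinski and in further view of Nassar (Nassar, Hamed & Taha, Ahmed & Nazmy, T & Nagaty, Khaled. (2008). RETRIEVING OF VIDEO SCENES USING ARABIC CLOSED-CAPTION. International Journal of Intelligent Computing and Information Systems (IJICIS). 8. 191-203.).
With respect to claims 4 and 10, Harif and Celinski do not teach wherein the translator is operable to acquire the sequence correspondence model by way of text analysis of the content item.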
Nassar teaches wherein the translator is operable to acquire the sequence correspondence model (p5 para 1: In the indexing process, the video document is first segmented into a set of scenes using a video scene detection algorithm. Then, the closed caption text of the video document is segmented to extract the caption of each detected scene. After that, each video scene extracted in the scene segmentation process is classified into predefined semantic categories. This will provide users a table of content similar to that of printed books to facilitate navigation) by way of text analysis (p3 para 3 ll4-6: The closed-caption is the video transcript that represents the spoken dialogue or narration and it includes also sound effects, speaker identification, music, and other nonspeech information.) of the content item (p5 last paragraph: The proposed approach is built on the idea that the embedded text in the video document is very useful in video analysis, especially for high-level semantic content analysis.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Harif and Celinski to include the teachings of Nassar, motivation being to use embedded non-verbal cues for fast navigation in  video applications (Nassar, p6 para 1).
Claims 5 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Harif in view of Celinski and in further view of Hsu (US 20190066681 A1).
With respect to claims 5 and 11, Harif and Celinski do not teach wherein the translator is operable to acquire the sequence correspondence model by way of image analysis of the content item.
Hsu teaches wherein the translator is operable to acquire the sequence correspondence model ([0065] In block 508, at least one statement command is determined based on the lip movement of the user, and [0054] The controller 302 can utilize a learning model to update the elevator command database 330 with terms and keywords for later usage. For example, a user might state, “I'd like to go to the top.” Keywords such as a floor or a number are not included in this statement command. The controller 302, through the display 308, might provide feedback indicating that the command was not recognized.) by way of image analysis (Abstract:...wherein the receiving the statement command from the user includes capturing, by a sensor, a series of frames of the user, wherein the series of frames includes lip movements of the user) of the content item ([0047] Referring to FIG. 3, a block diagram of an elevator control system for a spoken command interface...The system 300 includes a controller 302 for performing the elevator control functions described herein).  
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Harif and Celinski to include the teachings of Hsu, motivation being to infer non-verbal audio commands directly from video without need of external voice (Hsu, Abstract).
Claims 6 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Harif in view of Igarashi (Igarashi T, Hughes JF. Voice as sound: using non-verbal voice input for interactive control. In Proceedings of the 14th annual ACM symposium on User interface software and technology 2001 Nov 11 (pp. 155-156). (Year: 2001)).
With respect to claims 6 and 12, Harif does not teach configured to receive a human input action, and wherein the translator is operable to acquire a sequence correspondence model on the basis of a received human input action.
Igarashi teaches wherein the translator is operable to acquire a sequence correspondence model (Introduction ll 9-12: This paper proposes the use of non-verbal features in speech, features like pitch, volume, and continuation, to directly control interactive applications..) on the basis of a received human input action. (p1 col 2 2nd to last para: When the user says "move up, ahhhh", the map on the screen scrolls down while the sound continues. When the user increases the pitch of his voice, the scrolling speed increases, and vice versa. When the user stops speaking, the scrolling ends. We also combined this technique with a speed-dependent automatic zooming interface.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Harif to include the teachings of Igarashi, motivation being to provide a more direct, speedy interaction using voice pitch and volume (Igarashi, Introduction).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/A.N.P./ Examiner, Art Unit 2657
/Paras D Shah/ Primary Examiner, Art Unit 2659
04/04/2021