DETAILED ACTION
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 26, 38 and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Terterov et al. (USPAP 2019/0246,138), hereinafter, “Terterov”, in view of Negishi et al. (USPAP 2005/0204,291), hereinafter, “Negishi”.

        Regarding claim 26, Terterov recites at least one memory; and logic, at least a portion of the logic comprised in hardware coupled to the at least one memory (Please note, figure 1), the logic to: receive a source video comprising a plurality of frames (Please note, figure 2, block 202, “receiving video information to be encoded, the video information comprising a sequence of video frames”), determine a plurality of regions for the plurality of frames, generate at least one region-sequence connecting the determined plurality of regions (Please note, figure 2, block 204, “Aligning a plurality of consecutive frames in the sequence of video frames by spatially shifting at least one frame of the plurality of consecutive frames, giving rise to a pre-processed sequence of video frames including the aligned plurality of consecutive frames”).
        Terterov does not expressly recite applying a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video.
        Negishi recites applying a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video (Please note, paragraph 0004. As indicated, there are contents described with scene description enabling interaction by user input, such as digital TV broadcasting and DVD (Digital Video/Versatile Disk), Internet home pages described with HyperText Markup Language (hereafter referred to as "HTML") or the like, Binary Format for the Scene (hereafter referred to as "MPEG-4 BIFS"), which is a scene description format stipulated in ISO/IEC 14496-1, Virtual Reality Modeling Language (hereafter referred to as "VRML"), which is stipulated in ISO/IEC 14772, and so forth. The data of such contents will hereafter be referred to as "scene description". Scene description also includes the data of audio, images, computer graphics, etc., used within the contents).
        Terterov & Negishi are combinable because they are from the same field of endeavor.
        Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to utilize this language model operation of Negishi in Terterov’s invention.
        The suggestion/motivation for doing so would have been, as indicated in paragraph 0004, “to further enable interaction by user”.
        Therefore, it would have been obvious to combine Negishi with Terterov to obtain the invention as specified in claim 26.
        Regarding claims 38 and 45, analysis similar to that presented for claim 26 is applicable.

Claims 27, 39 and 46 are rejected under 35 U.S.C. 103 as being unpatentable over Terterov et al. (USPAP 2019/0246,138), hereinafter, “Terterov”, in view of Negishi et al. (USPAP 2005/0204,291), hereinafter, “Negishi”, as applied to claim 26 above, and further in view of Hammoud et al. (USPAP 2017/0024,899), hereinafter, “Hammoud”.

        Regarding claim 27, Terterov recites at least one memory; and logic, at least a portion of the logic comprised in hardware coupled to the at least one memory (Please note, figure 1), the logic to: receive a source video comprising a plurality of frames (Please note, figure 2, block 202, “receiving video information to be encoded, the video information comprising a sequence of video frames”), determine a plurality of regions for the plurality of frames, generate at least one region-sequence connecting the determined plurality of regions (Please note, figure 2, block 204, “Aligning a plurality of consecutive frames in the sequence of video frames by spatially shifting at least one frame of the plurality of consecutive frames, giving rise to a pre-processed sequence of video frames including the aligned plurality of consecutive frames”).
        Negishi recites applying a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video (Please note, paragraph 0004. As indicated, there are contents described with scene description enabling interaction by user input, such as digital TV broadcasting and DVD (Digital Video/Versatile Disk), Internet home pages described with HyperText Markup Language (hereafter referred to as "HTML") or the like, Binary Format for the Scene (hereafter referred to as "MPEG-4 BIFS"), which is a scene description format stipulated in ISO/IEC 14496-1, Virtual Reality Modeling Language (hereafter referred to as "VRML"), which is stipulated in ISO/IEC 14772, and so forth. The data of such contents will hereafter be referred to as "scene description". Scene description also includes the data of audio, images, computer graphics, etc., used within the contents).

        Hammoud recites generating a captioned video comprising at least one of the plurality of frames annotated with the at least one region-sequence and the description information (Please note, paragraph 0005. As indicated, the correspondences between the video frames and associated text are determined in order to annotate the video frames with more reliable labels and descriptions).
        Terterov, Negishi & Hammoud are combinable because they are from the same field of endeavor.
        Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to utilize Hammoud’s generation of a captioned video, comprising at least one of the plurality of frames annotated with the at least one region-sequence and the description information, in Terterov & Negishi’s invention.
        The suggestion/motivation for doing so would have been, as indicated in paragraph 0005, “to arrive at more reliable labels and descriptions”.
        Therefore, it would have been obvious to combine Hammoud with Terterov and Negishi to obtain the invention as specified in claim 27.
        Regarding claims 39 and 46, analysis similar to that presented for claim 27 is applicable.

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Terterov et al. (USPAP 2019/0246,138), hereinafter, “Terterov”, in view of Negishi et al. (USPAP 2005/0204,291), hereinafter, “Negishi”, as applied to claim 26 above, and further in view of Cheng et al. (USPAP 2016/0154,882), hereinafter, “Cheng”.

        Regarding claim 28, Terterov recites at least one memory; and logic, at least a portion of the logic comprised in hardware coupled to the at least one memory (Please note, figure 1), the logic to: receive a source video comprising a plurality of frames (Please note, figure 2, block 202, “receiving video information to be encoded, the video information comprising a sequence of video frames”), determine a plurality of regions for the plurality of frames, generate at least one region-sequence connecting the determined plurality of regions (Please note, figure 2, block 204, “Aligning a plurality of consecutive frames in the sequence of video frames by spatially shifting at least one frame of the plurality of consecutive frames, giving rise to a pre-processed sequence of video frames including the aligned plurality of consecutive frames”).
        Negishi recites applying a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video (Please note, paragraph 0004. As indicated, there are contents described with scene description enabling interaction by user input, such as digital TV broadcasting and DVD (Digital Video/Versatile Disk), Internet home pages described with HyperText Markup Language (hereafter referred to as "HTML") or the like, Binary Format for the Scene (hereafter referred to as "MPEG-4 BIFS"), which is a scene description format stipulated in ISO/IEC 14496-1, Virtual Reality Modeling Language (hereafter referred to as "VRML"), which is stipulated in ISO/IEC 14772, and so forth. The data of such contents will hereafter be referred to as "scene description". Scene description also includes the data of audio, images, computer graphics, etc., used within the contents).

        Cheng recites utilizing a natural language description (Please note, paragraph 0005. As indicated, the video search assistant may identify a video likely depicting the complex event of interest and present to a user a natural language description of one or more segments of the video that relate to the complex event).
        Terterov, Negishi & Cheng are combinable because they are from the same field of endeavor.
        Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to utilize this natural language description of Cheng in Terterov & Negishi’s invention.
        The suggestion/motivation for doing so would have been, as indicated in paragraph 0005, “to address a complex event”.
        Therefore, it would have been obvious to combine Cheng with Terterov and Negishi to obtain the invention as specified in claim 28.
                   
Claim 37 is rejected under 35 U.S.C. 103 as being unpatentable over Terterov et al. (USPAP 2019/0246,138), hereinafter, “Terterov”, in view of Negishi et al. (USPAP 2005/0204,291), hereinafter, “Negishi”, as applied to claim 26 above, and further in view of Dai et al. (USPN 10,528,866), hereinafter, “Dai”.

        Regarding claim 37, Terterov recites at least one memory; and logic, at least a portion of the logic comprised in hardware coupled to the at least one memory (Please note, figure 1), the logic to: receive a source video comprising a plurality of frames (Please note, figure 2, block 202, “receiving video information to be encoded, the video information comprising a sequence of video frames”), determine a plurality of regions for the plurality of frames, generate at least one region-sequence connecting the determined plurality of regions (Please note, figure 2, block 204, “Aligning a plurality of consecutive frames in the sequence of video frames by spatially shifting at least one frame of the plurality of consecutive frames, giving rise to a pre-processed sequence of video frames including the aligned plurality of consecutive frames”).
        Negishi recites applying a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video (Please note, paragraph 0004. As indicated, there are contents described with scene description enabling interaction by user input, such as digital TV broadcasting and DVD (Digital Video/Versatile Disk), Internet home pages described with HyperText Markup Language (hereafter referred to as "HTML") or the like, Binary Format for the Scene (hereafter referred to as "MPEG-4 BIFS"), which is a scene description format stipulated in ISO/IEC 14496-1, Virtual Reality Modeling Language (hereafter referred to as "VRML"), which is stipulated in ISO/IEC 14772, and so forth. The data of such contents will hereafter be referred to as "scene description". Scene description also includes the data of audio, images, computer graphics, etc., used within the contents).

        Dai recites utilizing long short-term memory networks (LSTMs) (Please note, claim 18. As indicated, processing the LSTM output using the language model output layer to generate a set of word scores, wherein the set of word scores comprises a respective score for each of a plurality of vocabulary words that represents a likelihood that the vocabulary word is the word that appears in the predetermined position in the particular input document relative to the words in the sequence of input words).
        Terterov, Negishi & Dai are combinable because they are from the same field of endeavor.
        Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to utilize these long short-term memory networks (LSTMs) of Dai in Terterov & Negishi’s invention.
        The suggestion/motivation for doing so would have been, as indicated in claim 18, “to generate a set of word scores”.
        Therefore, it would have been obvious to combine Dai with Terterov and Negishi to obtain the invention as specified in claim 37.

Allowable Subject Matter

Claims 29-36, 40-44 and 47-50 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
        The following is a statement of reasons for the indication of allowable subject matter: The closest applied prior art of record fails to disclose or reasonably suggest wherein the logic is further to determine the at least one region-sequence based on at least one selection criterion, the at least one selection criterion comprises an informativeness selection criterion configured to maximize information in the at least one region-sequence.

Examiner’s Note

               The examiner cites particular figures, paragraphs, columns and line numbers in the references as applied to the claims for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. 
               It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.

/AMIR ALAVI/
Primary Examiner, Art Unit 2668
Friday, May 21, 2021