Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation - 35 USC § 112(f)

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
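
The three prongs above can be read as a simple conjunction. The following is only an illustrative sketch of that reading, not part of the MPEP or of any Office procedure; the class name, field names, and the two example limitations are hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimLimitation:
    """Hypothetical record of how one claim limitation fares under each prong."""
    uses_means_or_placeholder: bool        # prong (A): "means", "step", or a generic placeholder
    modified_by_functional_language: bool  # prong (B): e.g. "for", "configured to", "so that"
    recites_sufficient_structure: bool     # prong (C) is met when this is False


def invokes_112f(lim: ClaimLimitation) -> bool:
    """Return True only when all three prongs of the test are met."""
    return (
        lim.uses_means_or_placeholder
        and lim.modified_by_functional_language
        and not lim.recites_sufficient_structure
    )


# Hypothetical examples: a bare "unit configured to ..." versus a limitation
# that names definite structure for performing the function.
bare_unit = ClaimLimitation(True, True, False)
structural_term = ClaimLimitation(False, True, True)

print(invokes_112f(bare_unit))        # all three prongs met
print(invokes_112f(structural_term))  # prongs (A) and (C) not met
```

The sketch deliberately leaves out the rebuttable presumptions discussed in this section; it only encodes the three-prong conjunction.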

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.  

Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 

Claim limitations in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.

This application includes one or more claim limitations in claims 1, 4, 9 and 10 that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) include the placeholders “unit” of claim 1 and “generator” and “module” of claims 4, 9 and/or 10.

A review of the specification shows that the following appears to be the corresponding structure (or material or acts) for performing the claimed function described in the specification for the 35 U.S.C. 112(f) limitations: Fig. 2 and its corresponding description on pages 11-12.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f).

For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 3-5 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu et al. (“Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network,” arXiv:1606.00625v1; 2 Jun 2016).

Regarding claim 1, Liu discloses:
extracting features from a plurality of respective images by using a first extraction unit of a deep learning network;
[Fig. 2 and P. 5, 2nd paragraph, lines 1-2 (“…Given a photo stream denoted as S = {I1, I2,…IN}, we first extract the CNN features…from the fc7 layer of VGGNet”).  Note that the fc7 layer of VGGNet is considered a first extraction unit.  Note further that the combination of the CNN layer, the image embedding layer and the BMRNN is considered a deep learning network]
generating a structure of a story based on an overall feature of the plurality of images by using a second extraction unit of the deep learning network;
[Fig. 2; P. 5, line 2 (“…(3) a BMRNN network integrated with sGRU for visual structure modeling and text story generation”); P. 5, 2nd paragraph, lines 7-10 (“…the proposed BMRNN model integrates the sGRU and use the skip relation to model the complex visual flow based on input of image embedding vectors X = {x1, x2,…,xN}.   The sentence embedding vectors H = {h1, h2,…,hN} are predicted for retrieval”); P. 5, section 3.2, line 1 (“The role of BMRNN is to model the complex structure of visual stories”).  Note that the set of image embedding vectors X = {x1, x2,…,xN} is considered an overall feature and that the BMRNN is considered a second extraction unit]
generating the story by using outputs of the first and second extraction units
[Fig. 2 (near top: compatibility measure function) and Section 3.3.  Note that, as is clear from Fig. 2, the outputs of both the first extraction unit (image feature extraction, via Skip Relation Detection) and the second extraction unit (BMRNN) are used by the Compatibility Measure Function to generate the story]
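
For orientation only, the data flow recited in the three steps of claim 1 can be sketched as below. This is a toy illustration under stated assumptions and is neither Liu's implementation nor the applicant's: every function name is hypothetical, the mean pixel value stands in for CNN features, and an ordering of image indices stands in for a story structure. The only point shown is that the story generation step consumes the outputs of both extraction units.

```python
from typing import List

Image = List[float]  # toy stand-in for pixel data (assumption)


def first_extraction_unit(images: List[Image]) -> List[float]:
    """Per-image feature; here the mean pixel value stands in for CNN features."""
    return [sum(img) / len(img) for img in images]


def second_extraction_unit(per_image_features: List[float]) -> List[int]:
    """Toy 'structure of a story' derived from the whole set of features:
    an ordering of image indices by ascending feature value."""
    return sorted(range(len(per_image_features)), key=lambda i: per_image_features[i])


def generate_story(per_image_features: List[float], structure: List[int]) -> List[str]:
    """Uses both outputs: the structure fixes the order, the features fill in each line."""
    return [f"image {i}: feature {per_image_features[i]:.1f}" for i in structure]


images = [[3.0, 5.0], [1.0, 1.0], [2.0, 4.0]]
features = first_extraction_unit(images)
structure = second_extraction_unit(features)
story = generate_story(features, structure)
print(story)
```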

Regarding claim 3, Liu further discloses:
wherein generating the story comprises
generating the story based on the generated structure of the story and generating sentences by connecting pieces of information between sentences included in the story
[Figs. 2 (Compatibility Measure Function) and 3 (see the connected sentences of the proposed method); P. 5, line 2 (“…(3) a BMRNN network integrated with sGRU for visual structure modeling and text story generation”); P. 5, section 3.2, line 1 (“The role of BMRNN is to model the complex structure of visual stories”); PP. 5-6, section 3.3 (regarding the compatibility measure function).  Note that the story is finally generated after the application of the compatibility measure function and that the Proposed Method blocks of Fig. 3 show the generated sentences being connected into a story for each sequence of input images]

Regarding claim 4, Liu further discloses:
wherein generating the story is performed by
applying a cascading mechanism such that a hidden value output by each sentence generator included in a story generation module configured to generate the sentences is input to a subsequent sentence generator
[Fig. 2 (BMRNN).  Note that the skip-GRUs in each layer are cascaded]
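
The cascading recited in claim 4, in which the hidden value output by each sentence generator is input to the subsequent sentence generator, can be illustrated with the following minimal sketch. It is an assumption-laden toy and not the BMRNN of Liu or the applicant's network: the function names, the scalar hidden value, and the fixed 0.5 weights are all hypothetical.

```python
import math
from typing import List, Tuple


def sentence_generator(feature: float, hidden_in: float) -> Tuple[str, float]:
    """Toy generator: emits a sentence and a new hidden value from its inputs."""
    hidden_out = math.tanh(0.5 * feature + 0.5 * hidden_in)
    sentence = f"sentence(feature={feature:.2f}, carried={hidden_in:.2f})"
    return sentence, hidden_out


def generate_story(features: List[float], initial_hidden: float = 0.0) -> Tuple[List[str], List[float]]:
    """Chain the generators so each one receives the previous generator's hidden value."""
    hidden = initial_hidden
    sentences: List[str] = []
    hiddens: List[float] = []
    for feature in features:
        sentence, hidden = sentence_generator(feature, hidden)
        sentences.append(sentence)
        hiddens.append(hidden)
    return sentences, hiddens


sentences, hiddens = generate_story([1.0, 0.0])
print(sentences)
```

In this sketch the second sentence depends on the first image only through the carried hidden value, which is the property the cascade is meant to show.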

Regarding claim 5, Liu further discloses:
wherein extracting the features from the plurality of respective images comprises
extracting features from the plurality of respective images by using a convolution neural network
[Fig. 2 and P. 5, 2nd paragraph, lines 1-2 (“…Given a photo stream denoted as S = {I1, I2,…IN}, we first extract the CNN features…from the fc7 layer of VGGNet”)]

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 6, 7 and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network,” arXiv:1606.00625v1; 2 Jun 2016) and Wang et al. (US 2017/0200065).

Regarding claim 6, Liu discloses all the limitations of claim 1 but not a non-transitory computer-readable storage medium.  This feature is taught by Wang.   [Fig. 12 (especially ref. 1206) and paragraphs 89 (“The example computing device 1202 is illustrated as including a processing system 1204, one or more computer-readable media 1206”), 95 (“…The computer-readable storage media…for storage of information such as computer readable instructions, data structures, program modules”).]

Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to use a storage medium, at least in order to facilitate the distribution or storage of the computer-executable instructions that implement the claimed method.

Regarding claim 7, Liu discloses:
a deep learning network comprises:
a first extraction unit configured to extract features of the plurality of respective images;
a second extraction unit configured to generate a structure of the story based on an overall feature of the plurality of images;
a story generation module configured to generate the story by using outputs of the first extraction unit and the second extraction unit
[Per the analysis of claim 1.  Note that the units and the module can be implemented in the processor taught by the Wang reference applied below]
Liu does not expressly disclose the following, which are taught by Wang:
an input/output unit configured to receive a plurality of images from an outside, and to output a story generated from the plurality of images;
[Fig. 12 (especially ref. 1208) and paragraphs 89 (“The example computing device 1202…including a processing system 1204, one or more computer-readable media 1206, and one or more I/O interface 1208 that are communicatively coupled, one to another”), 92 (“…Input/output interface(s) 1208…allow information to be presented to the user and/or other components or devices using various input/output devices. Examples of input devices include…a scanner…a camera”)]
a storage unit configured to store a program for generating a story from a plurality of images;
[Fig. 12 (especially ref. 1206) and paragraphs 89 (“The example computing device 1202 is illustrated as including a processing system 1204, one or more computer-readable media 1206”), 95 (“…The computer-readable storage media…for storage of information such as computer readable instructions, data structures, program modules”)]
a control unit configured to include at least one processor;
[Fig. 12 (especially ref. 1204) and paragraphs 89 (“The example computing device 1202…including a processing system 1204”), 90 (“…the processing system 1204…including hardware elements 1210 that may be configured as processors”), 98 (“…The computing device 1202 may be configured to implement particular instructions and/or functions corresponding to the software…through use of computer-readable storage media and/or hardware elements 1210 of the processing system 1204”)]
(that the) deep learning network is implemented by executing the program by the control unit
[Fig. 4 and paragraph 43 (“… a pre-trained convolution neural network (CNN) 402 is used”).  Note that implementing a functionality (such as a neural network) using a processor is also taught by Wang; see the analysis above]

Prior to the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to modify Liu with Wang’s teaching as set forth above.  The reasons at least would have been to realize the claimed method, as would have been obvious to one of ordinary skill in the art.

Claims 9-11 are similarly analyzed and rejected as per the analyses of claims 7 (base claim) and claims 3-5 (including respective claim limitations).

Allowable Subject Matter

Claims 2 and 8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: the closest art of record, alone or in combination, does not disclose, teach or fairly suggest at least the following common features of claims 2 and 8:
extracting, by the second extraction unit, the overall feature of the plurality of images; 
understanding, by the second extraction unit, context based on the overall feature; and
generating, by the second extraction unit, the structure of the story based on the understood context

For example, 
The applied reference Liu further discloses using two layers of long short-term memory (LSTM) operating in different directions.  [Fig. 2 (BMRNN).  Note the two layers of skip-GRUs, each of which is a version of an LSTM.  (See, for example, the last paragraph on page 3 of the IDS reference “Image Caption Generation with Context-Gate” by Changki Lee: “…it uses a Gated Recurrent Unit GRU…a variant of LSTM-ERNN”).]  However, the LSTMs are not bidirectional.
Liu further discloses extracting the overall feature of the plurality of images.  [Fig. 2; P. 5, lines 7-10 of the 2nd paragraph recited above in the analysis of claim 1 and note that the set of image embedding vectors X = {x1, x2,…,xN} is the overall feature.]  However, the overall feature is not extracted by the second extraction unit that is the BMRNN.
Yu et al. (“An end-to-end neural network approach to story segmentation,” 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference; Date of Conference: 12-15 December 2017) discloses using bi-directional LSTMs for feature extraction.  [Fig. 1 and P. 171, right column, 1st complete paragraph (“…we use one LSTM layer for feature extraction, which forms sentence vectors by accumulating word sequence information.  The derived sentence vectors are fed into another LSTM layer that captures the context information of each sentence…We also investigated bi-directional LSTM (BLSTM) layers as an alternative, since they can accumulate both past and future information”).]
Wang et al. (CN 106650789A) teaches using multiple cascaded LSTMs to generate words.  [Fig. 3a and the abstract (“The invention claims a LSTM network based on depth of image description generation method, comprising the following steps: 1) extracting the image description data set image of CNN feature and obtaining the image description embedded vector corresponding word in the reference sentence; 2) establishing double-LSTM network, binding-LSTM network and CNN network sequence modelling to generate multimode LSTM model 3) by means of joint training for multi-mode LSTM model for training, 4) gradually hierarchical LSTM network multi-mode in the LSTM model. is increased by one layer image description model and training, finally obtaining multiple target optimization and multi-layer probability fusion, 5) the multiple target optimization and multi-layer probability fusion image description model in multi-layer LSTM network probability score of each branch output, employing a common decision method, the probability of the maximum output of the corresponding word”).]  But again, the LSTMs are not bidirectional.
Lee (“Image Caption Generation with Context-Gate,” The Korean Institute of Information Scientists and Engineers, June 2018.  Provided in the IDS) also discloses cascading outputs from similar modules each of which includes a GRU.  [Figs. 1, 2 and P. 4, 1st paragraph - P. 5, 1st paragraph (“…The GRU-DO1 (Gate Recurrent Unit with Deep Output-1) model consists of a CNN that extracts 4,096-dimensional features from an input image and an RNN that generates image captions word by word…The RNN uses GRU…a variant of LSTM-ERNN…The first input W0 of the RNN is the starting symbol of a sentence. From the first input W0, a probability that the next word W1, is generated…The next word is selected …and sent back as the input of the RNN, This process is repeated until the end symbol of the sentence appears as the result of the RNN”).]  However, the GRUs are not bi-directional.
None of the above-cited closest prior art references discloses or suggests extracting an overall feature, the context of which is subsequently understood for generating the structure of the story, as required by claims 2 and 8.

Conclusion and Contact Information

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Zhong et al. (“Learning Video-Story Composition via Recurrent Neural Network,” 2018 IEEE Winter Conference on Applications of Computer Vision; Date of Conference: 12-15 March 2018)
Sigurdsson et al. (“Learning Visual Storylines with Skipping Recurrent Neural Networks,” arXiv:1604.04279v2; 26 Jul 2016)
Weinberger et al. (US 2010/0332958)—[Fig. 4 and paragraph 36 (“…A plurality of multimedia content…is uploaded…The multimedia content may contain… photos…Contextual information associated with the multimedia content is identified from the metadata…The plurality of multimedia content is grouped into one or more groups…each of the groups…is integrated into a corresponding photo story”)]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUBIN HUNG whose telephone number is (571)272-7451. The examiner can normally be reached M-F 7:30-16:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YUBIN HUNG/
Primary Examiner, Art Unit 2662
June 14, 2022