DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
2.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
Claim Objections
3.	Claim 4 is objected to because of the following informalities:  each of the limitations “proposed object locations”, “tracked proposal positions”, “set of proposal tubes”, “tubes”, “label sets”, “collection of tubes”, “graphical model”, “set of potential labels”, “similarity measure”, and “predicate scores” lacks antecedent basis and is interpreted omitting the word “the” which precedes the limitation.  Appropriate correction is required.
	It is suggested that each of the claims be reviewed by Applicant to ensure correct antecedent basis for each of the limitations.
4.	Claim 4 is objected to because of the following informalities:  steps a-e are provided in the alternative, therefore each of the limitations in the steps should be reviewed for antecedent basis.  Appropriate correction is required.
	It is suggested that each of the claims be reviewed by Applicant to ensure correct antecedent basis for each of the limitations.
5.	Claim 4 is objected to because of the following informalities:  the phrase “collections of vertices that are associated with occurrences of different nouns in the same sentence associated with a video are attached by a factor whose arity is the arity of a predicate in the conjuction of the predicates” should read “collections of vertices that are associated with occurrences of different nouns in the same sentence associated with a video are attached by a factor whose arity is the arity of a predicate in a conjunction of predicates”.  Appropriate correction is required.
6.	Claim 4 is objected to because of the following informalities:  the phrase “between the tubes selected for from the label sets for the two vertices” should read “between the tubes selected from label sets for the two vertices”.  Appropriate correction is required.
7.	Claims 5 and 6 are objected to because of the following informalities:  in each of claims 5 and 6, the phrase “wherein the proposal generation mechanism” should read “wherein a proposal generation mechanism”.  Appropriate correction is required.
8.	Claim 14 is objected to because of the following informalities:  the phrase “wherein the set of proposal is augmented with proposals rotated by multiples of 90 degrees” should read “wherein the set of proposals is augmented with proposals rotated by multiples of 90 degrees”.  Appropriate correction is required.
9.	Claim 16 is objected to because of the following informalities:  the phrase “wherein the similarity measures and predicate scores are combined by taking their product” should read “wherein the similarity measures and predicate scores are combined by taking their sum”.  Appropriate correction is required. 
10.	Claim 18 is objected to because of the following informalities:  the phrase “wherein the set of proposals is augmented by detections produced by a pretrained object detector” should read “wherein the set of proposals is augmented by detections produced by a machine-trained semantic parser”.  Appropriate correction is required. 
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

11.	Claim 19 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.  The limitation “wherein the method of claim 1 is first applied and then the method of claim 18 is applied in one or more subsequent iterations, each iteration using an object detector trained on the proposals selected in earlier iterations” is not supported by the disclosure.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

12.	Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  In the limitation “describing one or more activities in which those objects participate in a corresponding video” it is unclear whether a single “corresponding video” corresponding to all of “those objects” is intended or rather a distinct “corresponding video” for each of “those objects”.  For examination purposes, “describing one or more activities in which those objects participate in a corresponding video” is interpreted as “describing one or more activities in which those objects participate in the set of videos”.
13.	Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  In claim 2, it is unclear if the limitation “sentences” refers to the limitation “one or more sentences” in claim 1.  For purposes of examination, “sentences” in claim 2 is interpreted as “the sentences”.
14.	Claims 3 and 4 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  In each of claims 3 and 4, it is unclear if the limitation “predicate” refers to a grammatical or logical predicate.  For examination purposes, “predicate” is interpreted as “logical predicate”.
15.	Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  It is unclear what is meant by the limitation “using one or more object trackers to track the positions of the proposed object locations forward or backward in time”.  For examination purposes, “using one or more object trackers to track the positions of the proposed object locations forward or backward in time” is interpreted as “using one or more object trackers to track the positions of the proposed object locations in previous or following frames”.
16.	Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  It is unclear whether the tubes contain “tracked proposal positions” as suggested by step c. or “portion of the images” as suggested by step d.  For examination purposes, it is assumed the tubes contain tracked proposal positions.
17.	Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  It is unclear what is meant by the limitation “pairs of vertices that are associated with occurrences of the same noun in two sentences associated with different videos are attached by a binary factor”.  For examination purposes, the limitation “pairs of vertices that are associated with occurrences of the same noun in two sentences associated with different videos are attached by a binary factor” is interpreted as “pairs of vertices that are associated with occurrences of the same noun in two sentences associated with different videos are attached by a predicate”.
18.	Claims 4 and 10-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  It is unclear what is meant by the limitation “tube”.  For examination purposes, “tube” is interpreted as “tube, wherein a tube is a collection of tracked proposal positions”.
19.	Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.  It is unclear what is meant by the limitation “wherein the set of proposals is augmented by detections produced by a pretrained object detector” since it conflicts with claim 1 which recites “wherein no use is made of a pretrained object detector”.  For examination purposes, “wherein the set of proposals is augmented by detections produced by a pretrained object detector” is interpreted as “wherein the set of proposals is augmented by detections produced by a machine-trained semantic parser”.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

20.	Claims 1-3 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Siskind et al. (US2015/0369596).
Regarding claim 1, Siskind discloses A method for determining the locations and types of objects in a plurality of videos (“Candidate object(s) are detected in the frames of the video” in abstract; “one or more candidate object(s) are detected in each of the plurality of frames of the video” in par. [0086]; “video clips” and “nouns” in par. [0045]; “An object detector is run on every frame of a video producing a set of axis-aligned rectangles” in par. [0168]), comprising: 
using a computer processor (“2186” in fig. 21), receiving a plurality of videos (“2262 MORE VIDEOS” in fig. 22; “plurality of videos” in par. [0067]; “for each of the plurality of videos, at least one respective negative aggregate query (e.g., sentence) is received” in par. [0073]); 
pairing each of the videos with one or more sentences (“short video clips paired with sentences” in par. [0045]; “individual sentences are paired each with a short video clip that depicts that sentence” in par. [0049]); 
using the processor, describing one or more activities in which those objects participate in a corresponding video (“Computing features between pairs of tracks to encode the relative position and motion of the pairs of objects that participate in events that involve two participants” in par. [0046]; “Various aspects relate to "seeing what you're told," e.g., sentence-guided activity recognition in video” in par. [0196]; “The sentence tracker can focus its attention on just those objects that participate in an event specified by a sentential description” in par. [0227]); and 
wherein no use is made of a pretrained object detector. 
Regarding claim 2, Siskind discloses The method of claim 1, wherein locations of the objects are specified by the processor as rectangles in frames of the videos (“An object detector is run on every frame of a video producing a set of axis-aligned rectangles” in par. [0168]; “Each detection j has an associated axis-aligned rectangle b.sub.i.sup.t and score f(b.sub.j.sup.t) and each pair of detections has an associated temporal coherence score g(b.sub.j.sub.t-1.sup.t-1, b.sub.j.sub.t.sup.t) where t is the index of the current frame in a video of length T” in par. [0169]), the object types are specified as nouns (detecting nouns in video clips in par. [0045]), and sentences describe the relative positions and motions of the objects in the videos referred to by the nouns in the sentences (video clips are paired with sentences containing nouns in par. [0045]; “Computing features between pairs of tracks to encode the relative position and motion of the pairs of objects that participate in events that involve two participants” in par. [0046]; “A detection-based tracker to track object motion” in par. [0126]; “Nouns (e.g., “person”) may be represented by constructing static FSMs over discrete features.  Motion prepositions (e.g., “towards” and “away from”) may be represented as FSMs that describe the changing relative position of two participants” in par. [0221]). 
Regarding claim 3, Siskind discloses The method of claim 1, wherein the relative positions and motions of the objects in the video are described by a conjunction of predicates constructed to represent the activity described by the sentences associated with the videos (“Computing features between pairs of tracks to encode the relative position and motion of the pairs of objects that participate in events that involve two participants” in par. [0046]; “That predicate is constructed as a conjunction of predicates representing the semantics of the individual words in that sentence” in par. [0202]; “A sentence may describe an activity involving multiple tracks, where different (collections of) tracks fill the arguments of different words” in par. [0223]). 
21.	Claims 1-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu et al. (“Sentence Directed Video Object Codetection”, arXiv.org, Cornell University Library, 5 June 2015).
Regarding claim 1, Yu discloses A method for determining the locations and types of objects in a plurality of videos (“Right: output original videos with objects codetected” in Figure 2 caption on pg. 3; bounding boxes are highlighted on the right of Figure 2, which requires determining the locations of the codetected objects; “We extract object instances (see all 15 classes in Section 4) from the sentences and model them as vertices in a graph…We put an edge between every two vertices that belong to the same object class” in “3.4 Joint Inference” on pg. 6; “The input is a set of videos paired with human-elicited sentences” in “3 Sentence Directed Codetection” on pg. 3), comprising: 
using a computer processor (implicit in “computer vision” in “5 Results and Conclusions”, last paragraph, on pg. 8), receiving a plurality of videos (“Left: input a set of videos paired with sentences” in Figure 2 caption on pg. 3; “The input is a set of videos paired with human-elicited sentences” in “3 Sentence Directed Codetection” on pg. 3); 
pairing each of the videos with one or more sentences (“Left: input a set of videos paired with sentences” in Figure 2 caption on pg. 3; “The input is a set of videos paired with human-elicited sentences” in “3 Sentence Directed Codetection” on pg. 3); 
using the processor, describing one or more activities in which those objects participate in a corresponding video (“We tackle the problem of video object codetection by leveraging weak semantic constraints implied by sentences that describe the video content” in abstract; the example sentences in Figure 2 make it clear that the sentences describe activities and that the objects participate in such activities in the corresponding videos); and 
wherein no use is made of a pretrained object detector (“Note that no pretrained object detectors are used in this whole process” in Figure 2 caption on pg. 3). 
Regarding claim 2, Yu discloses The method of claim 1, wherein locations of the objects are specified by the processor as rectangles in frames of the videos (see rectangles in “OUTPUT” photographs in Figure 2 on pg. 3), the object types are specified as nouns (see highlighted nouns in sentences in “INPUT” in Figure 2 on pg. 3), and sentences describe the relative positions and motions of the objects in the videos referred to by the nouns in the sentences (“spatial-relation prepositions (e.g., TOTHELEFTOF and ABOVE), motion prepositions (e.g., AWAYFROM and TOWARDS)” in “1 Introduction”, fourth paragraph on pg. 2; see sentences in “INPUT” in Figure 2 on pg. 3). 
Regarding claim 3, Yu discloses The method of claim 1, wherein the relative positions and motions of the objects in the video are described by a conjunction of predicates constructed to represent the activity described by the sentences associated with the videos (“We use a conjunction of predicates to represent (a portion of) the semantics of a sentence. Object instances in a sentence fill the arguments of the predicates in that sentence. An object instance that fills the arguments of multiple predicates is said to be coreferenced. For a coreferenced object instance, only one track is codetected. For example, a sentence like “the person put the cleaner into the sink near the cabbage” implies the following conjunction of predicates: DOWN(cleaner) ∧ NEAR(cleaner, cabbage)” in “3.1 Sentence Semantics”, first paragraph, on pg. 4). 
Regarding claim 4, Yu discloses The method of claim 1, wherein the locations and types of the objects in the plurality of videos are determined by: 
a. using one or more object proposal mechanisms to propose locations for possible objects in one or more frames of the videos (“We use EdgeBoxes [46] to obtain the N/2 top-ranking object candidates and MCG [4] to obtain the other half, filtering out candidates larger than 1/20 of the video-frame size to focus on small and medium-sized objects” on pg. 4 col. 2 ln. 6-9); 
b. using one or more object trackers to track the positions of the proposed object locations forward or backward in time; 
c. collecting the tracked proposal positions for each proposal into a tube; 
d. computing features for each tube based on image features for the portion of the images inside the tubes; or 
e. forming a graphical model, wherein: 
i. one or more noun occurrences in sentences associated with a video are associated with vertices in the model; 
ii. the set of potential labels of each vertex is the set of proposal tubes for the associated video; 
iii. pairs of vertices that are associated with occurrences of the same noun in two sentences associated with different videos are attached by a binary factor computed as a similarity measure between the tubes selected for from the label sets for the two vertices; 
iv. collections of vertices that are associated with occurrences of different nouns in the same sentence associated with a video are attached by a factor whose arity is the arity of a predicate in the conjuction of the predicates used to represent the activity described by the sentence where the score of said represents the degree to which the collection of tubes selected for those vertices exhibits the properties of that predicate; or 
iv. the graphical model is solved by selecting a single proposal tube for each vertex from the set of potential labels for that vertex that collectively maximizes a combination of the similarity measure for all pairs of vertices connected by a similarity factor and the predicate scores of all collections of vertices connected by a predicate factor.
Regarding claim 5, Yu discloses The method of claim 1, wherein the proposal generation mechanism is MCG (“We use EdgeBoxes [46] to obtain the N/2 top-ranking object candidates and MCG [4] to obtain the other half, filtering out candidates larger than 1/20 of the video-frame size to focus on small and medium-sized objects” on pg. 4 col. 2 ln. 6-9).
Regarding claim 6, Yu discloses The method of claim 1, wherein the proposal generation mechanism is EdgeBoxes (“We use EdgeBoxes [46] to obtain the N/2 top-ranking object candidates and MCG [4] to obtain the other half, filtering out candidates larger than 1/20 of the video-frame size to focus on small and medium-sized objects” on pg. 4 col. 2 ln. 6-9).
Regarding claim 7, Yu discloses The method of claim 1, wherein the proposals are tracked by CamShift (“We use the CamShift algorithm [9] to track both MOVING and STATIONARY objects” on pg. 4 col. 2 ln. 26-27).
Regarding claim 8, Yu discloses The method of claim 1, wherein moving proposals are tracked in HSV color space and allowed to change size (“We track MOVING objects in HSV color space and STATIONARY objects in RGB color space” on pg. 4 col. 2 ln. 32-33).
Regarding claim 9, Yu discloses The method of claim 1, wherein stationary proposals are tracked in RGB color space and are required to remain of constant size (“We track MOVING objects in HSV color space and STATIONARY objects in RGB color space” on pg. 4 col. 2 ln. 32-33).
Regarding claim 10, Yu discloses The method of claim 1, wherein PHOW features are used as image/tube features (“For each sampled detection, we extract PHOW [8] and HOG [13] features to represent its appearance and shape” on pg. 5 col. 1 ln. 8-10).
Regarding claim 11, Yu discloses The method of claim 1, wherein HOG features are used as image/tube features (“For each sampled detection, we extract PHOW [8] and HOG [13] features to represent its appearance and shape” on pg. 5 col. 1 ln. 8-10).
Regarding claim 12, Yu discloses The method of claim 1, wherein similarity is measured using a chi-squared distance between image/tube features (“We use g_χ² to compute the χ² distance between the PHOW features and g_L2 to compute the Euclidean distance between the HOG features” on pg. 5 col. 1 ln. 16-18).
Regarding claim 13, Yu discloses The method of claim 1, wherein similarity is measured using Euclidean distance between image/tube features (“We use g_χ² to compute the χ² distance between the PHOW features and g_L2 to compute the Euclidean distance between the HOG features” on pg. 5 col. 1 ln. 16-18).
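For illustration, the two distance measures relied upon for claims 12 and 13 can be sketched as follows. The function names, the factor of one half, and the small epsilon guarding against division by zero are assumptions made for this sketch; the quoted passage of Yu does not specify a normalization.

```python
import math


def chi_squared_distance(p, q, eps=1e-12):
    """Chi-squared distance between two non-negative feature histograms.

    Assumed convention: 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)).
    eps is a hypothetical guard against empty bins in both histograms.
    """
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """Euclidean (L2) distance between two feature vectors."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Under this sketch, a smaller distance indicates more similar features; how a distance is converted into a similarity score is not addressed here.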
Regarding claim 14, Yu discloses The method of claim 1, wherein the set of proposal is augmented with proposals rotated by multiples of 90 degrees (“we extract PHOW [8] and HOG [13] features to represent its appearance and shape. We also do so after we rotate this detection by 90°, 180°, and 270°” on pg. 5 col. 1 ln. 9-11).
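For illustration, augmentation by rotations in multiples of 90 degrees, as relied upon for claim 14, can be sketched on a small pixel grid. The grid representation (a list of rows), the function names, and the clockwise direction are assumptions made for this sketch and are not taken from Yu.

```python
def rotate_90_clockwise(patch):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    # Reversing the row order and then transposing yields a clockwise turn.
    return [list(row) for row in zip(*patch[::-1])]


def rotations(patch):
    """Return copies of the patch rotated by 0, 90, 180, and 270 degrees."""
    result = [[list(row) for row in patch]]
    for _ in range(3):
        result.append(rotate_90_clockwise(result[-1]))
    return result
```

Features would then be extracted from each of the four returned grids; the feature extraction itself is outside this sketch.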
Regarding claim 15, Yu discloses The method of claim 1, wherein the similarity measures and predicate scores are combined by summation (“3.1. Our problem, then, is to select a proposal for each vertex that maximizes the joint score on this graph, i.e., solving the following optimization problem: max_k Σ_v h_v(k_v) + Σ_{(v,u)∈C} g_{v,u}(k_v, k_u) + Σ_{(v,u)∈P} h_{v,u}(k_v, k_u)” on pg. 5 col. 2 ln. 27-30).
Regarding claim 16, Yu discloses The method of claim 1, wherein the similarity measures and predicate scores are combined by taking their product (interpreted as sum) (“3.1. Our problem, then, is to select a proposal for each vertex that maximizes the joint score on this graph, i.e., solving the following optimization problem: max_k Σ_v h_v(k_v) + Σ_{(v,u)∈C} g_{v,u}(k_v, k_u) + Σ_{(v,u)∈P} h_{v,u}(k_v, k_u)” on pg. 5 col. 2 ln. 27-30).
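For illustration, the summation relied upon for claims 15 and 16 can be sketched as a joint score with a per-vertex term, a similarity term over vertex pairs in C, and a predicate term over vertex pairs in P, maximized here by exhaustive search. The data layout, the function names, and the use of exhaustive search are assumptions made for this sketch; exhaustive search is practical only for very small graphs.

```python
from itertools import product


def joint_score(assignment, unary, similarity_edges, predicate_edges):
    """Sum of per-vertex, similarity, and predicate scores for one labeling.

    assignment: dict mapping vertex -> index of the selected proposal.
    unary: dict mapping vertex -> list of per-proposal scores.
    similarity_edges, predicate_edges: dicts mapping (v, u) -> score table,
        indexed as table[label_of_v][label_of_u].
    """
    score = sum(unary[v][k] for v, k in assignment.items())
    score += sum(g[assignment[v]][assignment[u]]
                 for (v, u), g in similarity_edges.items())
    score += sum(h[assignment[v]][assignment[u]]
                 for (v, u), h in predicate_edges.items())
    return score


def best_assignment(unary, similarity_edges, predicate_edges):
    """Exhaustively search all labelings and return the best one and its score."""
    vertices = sorted(unary)
    best, best_score = None, float("-inf")
    for labels in product(*(range(len(unary[v])) for v in vertices)):
        assignment = dict(zip(vertices, labels))
        s = joint_score(assignment, unary, similarity_edges, predicate_edges)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score
```

Because the terms are added, a labeling with a strong similarity score can still be selected when its predicate score is modest, which would not hold in general if the terms were multiplied.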
Regarding claim 17, Yu discloses The method of claim 1, wherein the graphical model is solved using Belief Propagation (“This discrete inference problem on graphical models can be solved approximately by Belief Propagation” on pg. 5 col. 2 ln. 33-35).
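For illustration, max-sum message passing, one form of Belief Propagation, can be sketched on a chain of vertices, where it is exact. The chain topology, the data layout, and the function name are assumptions made for this sketch; the graph described in Yu may contain cycles, in which case Belief Propagation is only approximate, as the quoted passage states.

```python
def max_sum_chain(unary, pairwise):
    """Max-sum message passing on a chain of vertices.

    unary[i][k]: score of label k at vertex i.
    pairwise[i][a][b]: score of label a at vertex i with label b at vertex i + 1.
    Returns (labels, score) for a highest-scoring labeling.
    """
    # msg[i][b] holds the best score of any labeling of vertices 0..i
    # that ends with label b at vertex i.
    msg = [list(unary[0])]
    back = []  # back[i - 1][b]: best label at vertex i - 1 given label b at vertex i
    for i in range(1, len(unary)):
        cur, ptr = [], []
        for b in range(len(unary[i])):
            cands = [msg[i - 1][a] + pairwise[i - 1][a][b]
                     for a in range(len(unary[i - 1]))]
            a_best = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[a_best] + unary[i][b])
            ptr.append(a_best)
        msg.append(cur)
        back.append(ptr)
    # Trace the best labeling backward from the last vertex.
    last = max(range(len(msg[-1])), key=msg[-1].__getitem__)
    labels = [last]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, msg[-1][last]
```

On a chain this procedure visits each pair of adjacent labels once, so its cost grows linearly with the number of vertices rather than exponentially as in exhaustive search.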
Regarding claim 18, Yu discloses The method of claim 1, wherein the set of proposals is augmented by detections produced by a pretrained object detector (interpreted as machine-trained semantic parser) (“we employ simpler handwritten rules to fully automate the semantic parsing process for our limited corpus. Nothing, in principle, precludes using a machine-trained semantic parser in its place” on pg. 3 col. 2 ln. 4-7).
Allowable Subject Matter
22.	Claim 19 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph rejection above is overcome.  Nothing in the prior art showed or suggested the specific limitations of this claim.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRENDA C BERNARDI whose telephone number is (571)270-7125.  The examiner can normally be reached on M-F 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MATTHEW BELLA, can be reached on (571) 272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENDA C BERNARDI/Primary Examiner, Art Unit 2667