DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/10/2019 and 11/21/2019 have been considered by the examiner and placed of record in the file.

Allowable Subject Matter
Claims 7-9 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Claims 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

35 USC § 112(f) (pre-AIA  35 USC 112, 6th) 
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. § 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that § 112(f) (pre-AIA  § 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function. 

Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke § 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke § 112(f) except as otherwise indicated in an Office action. 
Claim limitations in claims 1-10 have been interpreted under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, because they use the generic placeholders “localization module”, “fusion component”, “probability computing module” and “action start generation module” coupled with functional language such as “receiving” and “generating” without reciting sufficient structure to achieve the function.  Furthermore, the generic placeholders are not preceded by a structural modifier.
Since these claim limitations invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, claims 1-10 are interpreted to cover the corresponding structures described in the specification that achieve the claimed functions, and equivalents thereof.  
A review of the specification shows that there appears to be no corresponding structure described in the specification for the 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph limitations.

If applicant does not wish to have the claim limitation treated under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, applicant may amend the claim so that it will clearly not invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, or present a sufficient showing that the claim recites sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance with 35 U.S.C. § 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


 
Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 1-10 recite the means plus function limitations that invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structures, materials, or acts for the claimed functions. 



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 10-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shou et al. (Online Action Detection in Untrimmed, Streaming Videos-Modeling and Evaluation) in view of Escorcia et al. (US 2019/0108400 A1).

Claim 1. Shou et al. disclose a system for detection of an action start in a video (read as Online Action Detection (OAD), as shown in Figure 1, aims to detect the occurrence and class of the action start as soon as it happens…This task is important in many practical applications, such as monitoring surveillance cameras, robot cognition, etc. (Introduction)), the system comprising:
a classification module for generating a set of action scores corresponding to a first video frame from the video (read as confidence score s(t) (Section 3.1). The scores are being generated by some module), based on the first video frame and video frames before the first video frames in the video (read as when a new frame arrives at time t, we feed the window ending at t into our network to make prediction for the time stamp t (Section 3.1).), each action score indicating a respective probability (read as confidence score s(t) (Section 3.1)) that the first video frame contains a respective action class (read as we assign its label as the action class of the last frame of the window (Section 3.1));
a localization module coupled to the classification module (read as evaluate classification accuracy, detection timeliness, and localization precision (Section 4.1)) for: 
receiving the set of action scores from the classification module (read as confidence score s(t) (Section 3.1)), and generating an action-agnostic start probability that the first video frame contains an action start; and 
a fusion component coupled to the localization module and the localization module (This statement is not clear. Is the fusion component coupled just to the localization module, or to the classification module also as shown in the drawings?) for: 
generating, based on the set of action scores (read as confidence score s(t) (Section 3.1)) and the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
Shou et al. do not explicitly disclose: the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
However, in the related field of endeavor Escorcia et al. disclose: The actor detection is action class agnostic. That is, the actor detector does not detect an actor performing a specific type of action, such as cricket bowling. Aspects of the present disclosure focus on actors to retrieve the locations that are most likely to contain actions. Action proposals may be generated based on the most likely action locations. The action may be classified based on the action proposal…The actor detector 504 is action class agnostic… Because the actor detector 504 is not specific to an action, the actor detector 504 may identify the actor 508 performing any type of action. The actor detector 504 may be trained on images and/or video [0057-0060]. 
The idea of using action-agnostic detection is clearly disclosed by Escorcia et al.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Shou et al. with the teaching of Escorcia et al. in order to provide action-agnostic detection. This feature is important. For example, in an autonomous vehicle, the actor detector may detect pedestrians, other cars, and bicyclists. The actor detection is action class agnostic. That is, the actor detector does not detect an actor performing a specific type of action, such as cricket bowling. Aspects of the present disclosure focus on actors to retrieve the locations that are most likely to contain actions (Escorcia et al. [0057]).

Claim 2. The system of claim 1, the combination of Shou et al. and Escorcia et al. teaches,
wherein the classification module further comprises a deep convolutional network  (Escorcia et al.: FIG. 2D is a diagram illustrating an exemplary deep convolutional network (DCN) [0015]) for: 
receiving the first video frame from the video (Escorcia et al.: read as input from an image capturing device 230, such as a car-mounted camera [0040]. FIG. 2D); and 
generating a feature representation of the first video frame (Escorcia et al.: read as a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps [0041]).

Claim 3. The system of claim 2, the combination of Shou et al. and Escorcia et al. teaches,
wherein the classification module further comprises: 
a recurrent neural network coupled to the deep convolutional network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]) for: 
receiving the feature representation of the first video frame (Escorcia et al.: read as the first layer may learn to recognize relatively simple features, such as edges, in the input stream [0035]); 
generating an updated state based on an input of the feature representation and a previous state of the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]).

Claim 4. The system of claim 3, the combination of Shou et al. and Escorcia et al. teaches,
wherein the classification module further comprises: 
a probability computing module (Escorcia et al.: read as an output 222 of the DCN 200 is a probability of the image 226 including one or more features [0043]) coupled to the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]) for: 
receiving the updated state from the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]); and 
computing, via a softmax operation (Escorcia et al.: read as A softmax function (not shown) may convert the numbers in the second feature vector 228 to a probability), the set of action scores based on the updated state and a set of parameters of the classification module (Escorcia et al.: read as To adjust the weights, a learning algorithm may compute a gradient vector for the weights [0045]).
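
For illustration only, the softmax operation referenced above can be sketched as follows. This is a minimal sketch of the standard softmax function and is not taken from either cited reference or from the application; the function name softmax and the three input values are hypothetical.

```python
import math


def softmax(values):
    """Convert a list of real-valued numbers into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability and does not
    # change the resulting probabilities.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class values for one video frame (assumption: three classes).
action_scores = softmax([2.0, 1.0, 0.1])
```

Each returned value lies between 0 and 1, and the values sum to 1, so each can be read as a per-class probability for the frame.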

Claim 5. The system of claim 1, the combination of Shou et al. and Escorcia et al. teaches,
wherein the classification module further comprises: 
a loss module for computing an entropy loss based on set of action scores, wherein the entropy loss is used for updating a set of parameters of the classification module via backpropagation (Escorcia et al.: read as The weights may then be adjusted to reduce the error. This manner of adjusting the weights may be referred to as “back propagation” as it involves a “backward pass” through the neural network [0045]).

Claim 6. The system of claim 1, the combination of Shou et al. and Escorcia et al. teaches,
wherein the localization module further comprises: 
a recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]) for: 
receiving the set of action scores from the classification module (Escorcia et al.: read as The DCN 200 may include a feature extraction section and a classification section [0041].); and 
updating a current state (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037].) based on an input of the set of action scores (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]. The feedback is related to some score in order to provide the correct update), a previous state of the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037].) and a set of previously generated action-agnostic start probabilities (Escorcia et al.: read as actor detector 504 is action class agnostic. That is, the actor detector 504 detects actors 508 and does not detect classes of actions [0060]. The device must calculate some probability to which action is being detected.).

Claim 10. The system of claim 1, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
an action start generation module that generates an action start point of a specific action class in the video based at least in part on a largest value of the set of action-specific start probabilities (Shou et al.: In order to detect the action start, we compare prediction at t - 1 and t. We output an action start prediction when the following conditions are all satisfied: (1) c(t) is action; (2) c(t) ≠ c(t-1); (3) s(t) exceeds the threshold obtained by grid search on the training set (Section 3.1)).
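
For illustration only, the three conditions quoted from Shou et al. (Section 3.1) can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the function name detect_action_starts, the BACKGROUND label, the sample labels and scores, and the fixed threshold of 0.5 are all hypothetical (the reference obtains its threshold by grid search on the training set).

```python
BACKGROUND = "background"  # hypothetical label meaning "no action"


def detect_action_starts(labels, scores, threshold):
    """Return the time indices t at which an action start is output.

    labels[t] is the predicted class c(t); scores[t] is the confidence s(t).
    """
    starts = []
    # Begin at t = 1 because the rule compares the predictions at t - 1 and t.
    for t in range(1, len(labels)):
        is_action = labels[t] != BACKGROUND    # condition (1): c(t) is action
        changed = labels[t] != labels[t - 1]   # condition (2): c(t) != c(t-1)
        confident = scores[t] > threshold      # condition (3): s(t) exceeds threshold
        if is_action and changed and confident:
            starts.append(t)
    return starts


# Hypothetical per-frame predictions and confidence scores.
labels = ["background", "background", "jump", "jump", "background", "run"]
scores = [0.9, 0.8, 0.7, 0.9, 0.6, 0.4]
starts = detect_action_starts(labels, scores, threshold=0.5)  # starts == [2]
```

In this example only t = 2 satisfies all three conditions; at t = 5 the class changes to an action, but the confidence does not exceed the threshold.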

Claim 11. Shou et al. disclose a method for detection of an action start in a video (read as Online Action Detection (OAD), as shown in Figure 1, aims to detect the occurrence and class of the action start as soon as it happens…This task is important in many practical applications, such as monitoring surveillance cameras, robot cognition, etc. (Introduction)), the method comprising:
generating, via a classification module, a set of action scores corresponding to a first video frame from the video (read as confidence score s(t) (Section 3.1). The scores are being generated by some module), based on the first video frame and video frames before the first video frames in the video (read as when a new frame arrives at time t, we feed the window ending at t into our network to make prediction for the time stamp t (Section 3.1).), 
each action score indicating a respective probability (read as confidence score s(t) (Section 3.1)) that the first video frame contains a respective action class (read as we assign its label as the action class of the last frame of the window (Section 3.1));
generating, via a localization module (read as evaluate classification accuracy, detection timeliness, and localization precision (Section 4.1)), an action-agnostic start probability that the first video frame contains an action start; and 
generating, at a fusion component coupled to the localization module and the localization module (This statement is not clear. Is the fusion component coupled just to the localization module, or to the classification module also as shown in the drawings?), based on the set of action scores and the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
Shou et al. do not explicitly disclose: the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
However, in the related field of endeavor Escorcia et al. disclose: The actor detection is action class agnostic. That is, the actor detector does not detect an actor performing a specific type of action, such as cricket bowling. Aspects of the present disclosure focus on actors to retrieve the locations that are most likely to contain actions. Action proposals may be generated based on the most likely action locations. The action may be classified based on the action proposal…The actor detector 504 is action class agnostic… Because the actor detector 504 is not specific to an action, the actor detector 504 may identify the actor 508 performing any type of action. The actor detector 504 may be trained on images and/or video [0057-0060]. 
The idea of using action-agnostic detection is clearly disclosed by Escorcia et al.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Shou et al. with the teaching of Escorcia et al. in order to provide action-agnostic detection. This feature is important. For example, in an autonomous vehicle, the actor detector may detect pedestrians, other cars, and bicyclists. The actor detection is action class agnostic. That is, the actor detector does not detect an actor performing a specific type of action, such as cricket bowling. Aspects of the present disclosure focus on actors to retrieve the locations that are most likely to contain actions (Escorcia et al. [0057]).

Claim 12. The method of claim 11, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
receiving, at the classification module, the first video frame from the video (Escorcia et al.: read as input from an image capturing device 230, such as a car-mounted camera [0040]. FIG. 2D); and 
generating a feature representation of the first video frame (Escorcia et al.: read as a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps [0041]).

Claim 13. The method of claim 12, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
receiving, at a recurrent neural network coupled to the deep convolutional network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]), the feature representation of the first video frame (Escorcia et al.: read as input from an image capturing device 230, such as a car-mounted camera [0040]. FIG. 2D); and 
generating an updated state based on an input of the feature representation and a previous state of the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]).

Claim 14. The method of claim 13, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
receiving, at a probability computing module (Escorcia et al.: read as an output 222 of the DCN 200 is a probability of the image 226 including one or more features [0043]) coupled to the recurrent neural network, the updated state from the recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]); and 
computing, via a softmax operation (Escorcia et al.: read as A softmax function (not shown) may convert the numbers in the second feature vector 228 to a probability), the set of action scores based on the updated state and a set of parameters of the classification module (Escorcia et al.: read as To adjust the weights, a learning algorithm may compute a gradient vector for the weights [0045]).

Claim 15. The method of claim 11, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
computing an entropy loss based on set of action scores, wherein the entropy loss is used for updating a set of parameters of the classification module via backpropagation (Escorcia et al.: read as The weights may then be adjusted to reduce the error. This manner of adjusting the weights may be referred to as “back propagation” as it involves a “backward pass” through the neural network [0045]).

Claim 16. The method of claim 11, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
receiving, at a recurrent neural network (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]), the set of action scores from the classification module (Escorcia et al.: read as The DCN 200 may include a feature extraction section and a classification section [0041].); and 
updating a current state (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037].) based on an input of the set of action scores (Escorcia et al.: read as Neural networks may also have recurrent or feedback (also called top-down) connections [0037]. The feedback is related to some score in order to provide the correct update), a previous state of the recurrent neural network and a set of previously generated action-agnostic start probabilities (Escorcia et al.: read as actor detector 504 is action class agnostic. That is, the actor detector 504 detects actors 508 and does not detect classes of actions [0060]. The device must calculate some probability to which action is being detected.).

Claim 20. The method of claim 11, the combination of Shou et al. and Escorcia et al. teaches,
further comprising: 
generating an action start point of a specific action class in the video based at least in part on a largest value of the set of action-specific start probabilities (Shou et al.: In order to detect the action start, we compare prediction at t - 1 and t. We output an action start prediction when the following conditions are all satisfied: (1) c(t) is action; (2) c(t) ≠ c(t-1); (3) s(t) exceeds the threshold obtained by grid search on the training set (Section 3.1)).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED RACHEDINE whose telephone number is (571)272-9249.  The examiner can normally be reached on Mon-Fri 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lester Kincaid, can be reached at (571)272-7922.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MOHAMMED RACHEDINE
Examiner
Art Unit 2649



/MOHAMMED RACHEDINE/Primary Examiner, Art Unit 2646