Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial Office action issued in response to patent application 16/439,508, filed on 06/12/2019. Claims 1-20, as originally filed, are currently pending and have been considered below. Claims 1, 13, and 20 are independent claims.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are as follows (the generic placeholders are “input unit” and “output device”):
Claim 1:
an input unit configured to receive a Spatio-Temporal Graph (STG) (Specification Para. [0009] reiterates the function, but does not provide description of the structure)
an output device configured to output an indication 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-12 and 19 are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, as being indefinite or failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Each limitation in claim 1 that contains the following generic placeholder:
Claim 1:
input unit

invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description in the Specification of each generic placeholder listed above (see Section 4 of this Office action, Claim Interpretation) substantially reiterates the claim language and does not describe the structure that performs the corresponding function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claims 2-12 and 19, which depend from claim 1, are rejected for the same reasons discussed above with respect to claim 1, because they inherit the indefinite limitation of claim 1 and do not cure it.

Claim 19 is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, as being indefinite or failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	Claim 19 recites “the method of Claim 1”; however, Claim 1 is a system claim, so it is unclear which method recited in Claim 1 is being referred to. For the purpose of examination, Claim 19 will be interpreted as depending from Claim 1 and further limiting the system of Claim 1.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a system, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a system for identifying complex events from hierarchical representation of data set features. Each of the following limitation(s):  
to process the STG to identify the one or more multimodal subevents for the event
as drafted, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgment, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer or other machinery as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “an input unit configured to”, “a computation engine comprising processing circuitry for”, “a machine learning system comprising one or more STGCNs”, and “an output device configured to”, as drafted, recite generic computer components at a high level of generality (i.e., generic computer components performing generic computer functions), such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the limitations “receive a Spatio-Temporal Graph (STG)” and “output an indication of the one or more multimodal subevents” amount to mere data gathering and output, which is insignificant extra-solution activity. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform the abstract idea amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the insignificant extra-solution activities of “receive a Spatio-Temporal Graph (STG)” and “output an indication of the one or more multimodal subevents” are well-understood, routine, and conventional, as recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 2,
Claim 2, dependent upon Claim 1, recites “wherein the one or more stacked STGCNs are further configured to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors” and “wherein the STG comprises the variable-length feature descriptors”, each of which describes a mental process corresponding to an evaluation or judgment.
Regarding claim 3,
Claim 3, dependent upon Claim 2, recites “wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs further comprise, for each cluster of the plurality of clusters, a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors” and “wherein the plurality of nodes of the STG comprise a plurality of clusters of nodes, each cluster of the plurality of clusters having feature descriptors that have a length that is different from a length of feature descriptors of each other cluster of the one or more clusters”, which describe a mental process corresponding to an evaluation or judgment.
Regarding claim 4,
Claim 4, dependent upon Claim 2, recites “to convert the variable-length feature descriptors of the plurality of nodes to the fixed-length feature descriptors”, which is a mental process corresponding to an evaluation, judgment, or a combination thereof; the claim does not recite any new additional elements.
Regarding claim 5,
Claim 5, dependent upon Claim 1, recites “to perform a plurality of down-sampling operations and a plurality of up-sampling operations on the adjacency matrix to identify the one or more multimodal subevents for the event” and “wherein the STG comprises an adjacency matrix that represents irregular connections between the plurality of nodes”, which describe a mental process corresponding to an evaluation or judgment; the claim recites no new additional elements.
Regarding claim 6,
Claim 6, dependent upon Claim 5, recites “perform the plurality of down-sampling operations” and “perform the plurality of up-sampling operations”, which are mental processes corresponding to an evaluation, judgment, or a combination thereof. Claim 6 also recites the new additional elements “wherein each stacked STGCN of the plurality of stacked STGCNs is a graph convolutional neural network comprising a first plurality of graph convolutional layers and a second plurality of graph convolutional layers”, “wherein the first plurality of layers comprises one or more STGCN layers and one or more convolutional layers, the first plurality of layers configured to”, and “wherein the second plurality of layers comprises one or more deconvolutional layers, the second plurality of layers configured to”, which amount to “generally linking the use of a judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h).
Regarding claim 7,
Claim 7, dependent upon Claim 6, does not recite any additional abstract ideas and only recites the additional element “wherein the one or more STGCN layers comprise one or more spatial graph convolution layers, one or more temporal graph convolution layers, and one or more non-linear convolution layers”, which amounts to “generally linking the use of a judicial exception to a particular technological environment or field of use”. See MPEP 2106.05(h).
Regarding claim 8,
Claim 8, dependent upon Claim 1, does not recite any additional abstract ideas, and only recites additional elements “each STG of the plurality of STGs labeled with one or more multimodal subevents present within the plurality of spatial edges and the plurality of temporal edges of the STG”. These additional elements neither integrate the claim into a practical application nor provide significantly more. The recitation of “wherein the plurality of stacked STGCNs are trained with a plurality of STGs” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 9,
Claim 9, dependent upon Claim 1, recites “each node of the plurality of nodes of the STG is connected to one or more of the plurality of spatial edges and the plurality of temporal edges to enable one or more spatial relationships and temporal relationships between the plurality of nodes”, which does not alter the conclusion that the limitation “to process the STG” is a mental process.
Regarding claim 10,
Claim 10, dependent upon Claim 1, does not recite any additional abstract ideas and only recites the additional elements “wherein the one or more multimodal subevents comprise one or more recognized human or animal activities, wherein the feature descriptor of each node of the plurality of nodes describes a human event feature, the human event feature including one of at least an actor, an object, a scene, or a human action, wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the human event features, and wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the human event features”, which merely specify a particular technological environment in which to apply the abstract idea (MPEP 2106.05(h)).
Regarding claim 11,
Claim 11, dependent upon Claim 1, does not recite any additional abstract ideas and only recites the additional elements “wherein the one or more multimodal subevents comprise one or more meteorological predictions, wherein the feature descriptor of each node of the plurality of nodes describes a meteorological feature, wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the meteorological features, and wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the meteorological features”, which merely specify a particular technological environment in which to apply the abstract idea (MPEP 2106.05(h)).
Regarding claim 12,
Claim 12, dependent upon Claim 1, does not recite any additional abstract ideas and only recites the additional elements “wherein the one or more multimodal subevents comprise one or more financial predictions, wherein the feature descriptor of each node of the plurality of nodes describes a financial feature, wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the financial features, and wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the financial features”, which merely specify a particular technological environment in which to apply the abstract idea (MPEP 2106.05(h)).
Regarding claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for identifying complex events from hierarchical representation of data set features. Each of the following limitation(s):
to process the STG to identify the one or more multimodal subevents for the event 

as drafted, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgment, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer or other machinery (e.g., the STGCN) as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “by a computing system comprising a machine learning system”, “by the computing system”, and “by an output device”, as drafted, recite generic computer components at a high level of generality (i.e., generic computer components performing generic computer functions), such that they amount to no more than mere instructions to apply the exception using a generic computer component. Further, the limitations “receiving... a Spatio-Temporal Graph (STG)” and “outputting… an indication of the one or more multimodal subevents” amount to mere data gathering and output, which is insignificant extra-solution activity. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform the abstract idea amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the insignificant extra-solution activities of “receiving... a Spatio-Temporal Graph (STG)” and “outputting… an indication of the one or more multimodal subevents” are well-understood, routine, and conventional, as recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 14,
Claim 14, dependent upon Claim 13, recites “processing the STG to convert the variable-length feature descriptors to fixed-length feature descriptors” and “wherein the STG comprises the variable-length feature descriptors”, which describe a mental process corresponding to an evaluation or judgment.
Regarding claim 15,
Claim 15, dependent upon Claim 14, recites “to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors” and “wherein the plurality of nodes of the STG comprise a plurality of clusters of nodes, each cluster of the plurality of clusters having feature descriptors that have a length that is different from a length of feature descriptors of each other cluster of the one or more clusters”, which describe a mental process corresponding to an evaluation, judgment, or a combination thereof, and new additional elements
Regarding claim 16,
Claim 16, dependent upon Claim 14, recites “converting, by one or more convolutional layers of the one or more stacked STGCNs, the variable-length feature descriptors of the plurality of nodes to the fixed-length feature descriptors”, which is a mental process corresponding to an evaluation, judgment, or a combination thereof; the claim does not recite any new additional elements.
Regarding claim 17,
Claim 17, dependent upon Claim 13, recites “performing, by the one or more STGCNs, a plurality of down-sampling operations and a plurality of up-sampling operations on the adjacency matrix to identify the one or more multimodal subevents for the event” and “wherein the STG comprises an adjacency matrix that represents irregular connections between the plurality of nodes”, which describe a mental process corresponding to an evaluation, judgment, or a combination thereof; the claim recites no new additional elements.
Regarding claim 18,
Claim 18, dependent upon Claim 13, does not recite any additional abstract ideas and only recites the additional element “wherein each node of the plurality of nodes of the STG is connected to one or more of the plurality of spatial edges and the plurality of temporal edges to enable one or more spatial relationships and temporal relationships between the plurality of nodes”, which does not change the conclusion that the “to process the STG” limitation of Claim 13 is a mental process.
Regarding claim 19,
Claim 19, dependent upon Claim 1, does not recite any additional abstract ideas and only recites the additional elements “wherein the one or more multimodal subevents comprise one or more recognized human or animal activities, wherein the feature descriptor of each node of the plurality of nodes describes a human event feature, the human event feature including one of at least an actor, an object, a scene, or a human action, wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the human event features, and wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the human event features”, which merely recite a particular technological environment in which to apply the judicial exception (MPEP 2106.05(h)).
Regarding claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to a non-transitory computer-readable medium, which is directed to a manufacture, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a non-transitory computer-readable medium for identifying complex events from hierarchical representation of data set features. Each of the following limitation(s):   
to process the STG to identify the one or more multimodal subevents for the event
as drafted, under its broadest reasonable interpretation, covers mental processes corresponding to an evaluation, judgment, or a combination thereof.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer or other machinery (e.g., the STGCN) as a tool to perform an abstract idea. See MPEP 2106.05(f). Further, the limitations “receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes” and “output an indication of the one or more multimodal subevents” amount to mere data gathering and output, which is insignificant extra-solution activity. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using generic computer components to perform the abstract idea amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the insignificant extra-solution activities of “receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes” and “output an indication of the one or more multimodal subevents” are well-understood, routine, and conventional, as recited in MPEP 2106.05(d)(II): “The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity... i. Receiving or transmitting data over a network, e.g., using the Internet to gather data”. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 10-11, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ascari et al. (US11074497B2) in view of Jain et al. (“Structural-RNN: Deep Learning on Spatio-Temporal Graphs”).
Regarding Claim 1,
Ascari et al. teaches a system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features, the system comprising (Ascari et al., Col. 4 Lines 9-10, “The invention relates to a method or system or architecture” teaches a system. Col. 4 Lines 54-65, “a second processing stage (further denoted “wired layer”), separate from the first processing stage, the second processing stage comprising a plurality of Forced Temporal Pooler nodes, each being adapted to extract and save prototypes from raw data, each prototype being representative of data chunks with features which are similar, whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches identifying multimodal primitives and the relationships among those primitives, the relationships comprising spatial-temporal events and inter-event states (corresponds to multimodal subevents within an event having spatially-related and temporally-related features)).
… a computation engine comprising processing circuitry for executing a machine learning system comprising one or more stacked Spatio-Temporal Graph Convolutional Networks (STGCNs) each comprising a plurality of STGCN layers, the one or more stacked STGCNs configured to process the STG to identify the one or more multimodal subevents for the event (Ascari et al., Figure 8 and Col. 3 Lines 52-65, “wherein the architecture is a first neural network having structures and mechanisms for abstraction… whereby one or more or essentially all of said nodes each being a second neural network adapted for time series analysis, comprising of neurons and edges connecting two or more of said neurons in a graph, preferably a topological or temporal graph” teaches one or more neural network with Spatial and Temporal Pooler Nodes (corresponds to one or more stacked Spatio-Temporal Graph Convolutional Networks). Col. 4 Lines 49-65, “In a further embodiment the architecture comprises (1) a first processing stage (further denoted “conscious layer”)… and (2) a second processing stage (further denoted “wired layer”)… whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches a plurality of layers and further identifying one or more multimodal events and inter-event states (corresponds to subevents)).
an output device configured to (Ascari et al., Col. 40 Lines 58-62, “In some implementations, a display system, a keyboard, and a pointing device may be included as part of a user interface subsystem” teaches the output device).
Ascari et al. does not appear to explicitly teach an input unit configured to receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes, wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors or (2) temporal edges that span multiple time steps for the event; and… output an indication of the one or more multimodal subevents.
However, Jain et al. teaches an input unit configured to receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Figure 2a shows an example st-graph capturing human-object interactions during an activity. The nodes v ∈ V and edges e ∈ E_S ∪ E_T of the st-graph repeats over time. In particular, Figure 2b shows the same st-graph unrolled through time. In the unrolled st-graph, the nodes at a given time step t are connected with undirected spatio-temporal edge e = (u, v) ∈ E_S, and the nodes at adjacent time steps (say the node u at time t and the node v at time t + 1) are connected with undirected temporal edge iff (u, v) ∈ E_T. Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors)… For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches a spatio-temporal graph comprising a plurality of nodes having feature vectors (corresponds to feature descriptor) that represent the human and object poses (corresponds to features present in the event). The STG further comprises undirected spatio-temporal edges (E_S) connecting nodes at a given time step and undirected temporal edges (E_T) connecting nodes at adjacent time steps, which describe spatial and temporal relationships, respectively, between the plurality of nodes).
wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors or (2) temporal edges that span multiple time steps for the event (Jain et al., Section 3.2 Pg. 4, “The factors in the st-graph operate in a temporal manner, where at each time step the factors observe (node & edge) features and perform some computation on those features” teaches temporal edges that span multiple time steps for the activity (corresponds to the event)).
… output an indication of the one or more multimodal subevents (Jain et al., Section 4.3 Pg. 7, “At each time step, the human nodeRNN outputs the sub-activity label (10 classes)” teaches outputting the sub-activity label (corresponds to an indication of the one or more multimodal subevents)). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the Spatio-Temporal Graph and its composition, and the outputting of an indication of the one or more multimodal subevents, as taught by Jain et al., to the system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. The motivation would have been to combine the power of high-level spatio-temporal graphs with the sequence learning success of Recurrent Neural Networks (Jain et al., Abstract, “In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs and sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled as it can be used for transforming any spatio-temporal graph through employing a certain set of well-defined steps. The evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, shows improvement over the state-of-the-art with a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks”).
Regarding Claim 2,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1.
The combination, as described in the rejection of claim 1, further teaches wherein the STG comprises the variable-length feature descriptors, and wherein the one or more stacked STGCNs are further configured to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors (Jain et al., Section 3.3 Pg. 5, “The forward-pass involves the edgeRNNs RE1 (human-object edge) and RE3 (human-human edge). Since the human node v interacts with two object nodes {u,w}, we pass the summation of the two edge features as input to RE1. The summation of features, as opposed to concatenation, is important to handle variable number of object nodes with a fixed architecture. Since the object count varies with environment, it is challenging to represent variable context with a fixed length feature vector” teaches handling a variable context (corresponds to variable-length feature descriptors) by summing the edge features into a single feature vector of fixed length regardless of the number of object nodes (corresponds to converting to fixed-length feature descriptors)).
Regarding Claim 10,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1.
The combination, as described in the rejection of claim 1, further teaches wherein the one or more multimodal subevents comprise one or more recognized human or animal activities (Jain et al., Section 4.3 Pg. 7, “In this section we present S-RNN for modeling human activities… Each activity consist of a sequence of sub-activities (e.g. moving, drinking etc.) and objects affordance (e.g., reachable, drinkable etc.), which evolves as the activity progresses” teaches sub-activities (corresponds to the one or more multimodal subevents) of the modeled human activities (corresponds to one or more recognized human activities)).
wherein the feature descriptor of each node of the plurality of nodes describes a human event feature, the human event feature including one of at least an actor, an object, a scene, or a human action (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors) y_v^t at each time step t. For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches feature vectors (corresponds to feature descriptor) associated with the nodes that represent the human and object poses in a human-object interaction (corresponds to the human event feature including one of at least an actor, an object, a scene, or a human action)).
wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the human event features (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors) y_v^t at each time step t. For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches edge features (corresponds to each spatial edge of the plurality of spatial edges) that describe the relative orientation of the human and the object (corresponds to a spatial relationship between two of the human event features)).
wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the human event features (Jain et al., Figure 2 and Section 3.1 Pg. 3, “In the unrolled st-graph, the nodes at a given time step t are connected with undirected spatio-temporal edge e = (u, v) ∈ E_S, and the nodes at adjacent time steps (say the node u at time t and the node v at time t + 1) are connected with undirected temporal edge iff (u, v) ∈ E_T” teaches undirected temporal edges connecting nodes at adjacent time steps of the human activity (corresponds to a temporal relationship between two of the human event features)).
Regarding Claim 11,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1.
The combination, as described in the rejection of claim 1, further teaches wherein the one or more multimodal subevents comprise one or more meteorological predictions (Ascari et al., Col. 40 Lines 3-6, “For instance, it can sense the weather conditions (like temperature, humidity, barometric pressure, luminosity, . . . ), presence of obstacles in front of the aircraft and the GPS position” teaches predicting weather conditions (corresponds to one or more meteorological predictions)).
wherein the feature descriptor of each node of the plurality of nodes describes a meteorological feature (Ascari et al., Col. 2 Lines 55-64, “In particular for the invention input values provided to the first nodes are probability distributions, except for the first level of first nodes receiving as input sensor values, whereas internally to the first nodes the input for their internal algorithms (for instance the neural gas method in accordance to an embodiment of the invention) is an n-dimensional feature vector composed by a probability distribution (as received as input to the first node) and/or a temporal context vector and/or a permanence estimation descriptor” teaches an n-dimensional feature vector (corresponds to the feature descriptor) of the first nodes. Col. 40 Lines 3-6, “For instance, it can sense the weather conditions (like temperature, humidity, barometric pressure, luminosity, . . . ), presence of obstacles in front of the aircraft and the GPS position” teaches weather conditions (corresponds to the meteorological feature)).
wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the meteorological features (Ascari et al., Col. 2 Lines 46-49, “The first nodes are based on a set of components that represent input data and connection between said components that represent temporal or spatial relationship between represented data” teaches connections between components that represent spatial relationships (corresponds to each spatial edge of the plurality of spatial edges) between represented data (corresponds to two of the meteorological features)).
wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the meteorological features (Ascari et al., Col. 2 Lines 46-49, “The first nodes are based on a set of components that represent input data and connection between said components that represent temporal or spatial relationship between represented data” teaches connections between components that represent temporal relationships (corresponds to each temporal edge) between represented data (corresponds to two of the meteorological features)).
Regarding Claim 13,
Ascari et al. teaches a method for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features, the method comprising (Ascari et al., Col. 4 Lines 9-10, “The invention relates to a method or system or architecture” teaches a method. Col. 4 Lines 54-65, “a second processing stage (further denoted “wired layer”), separate from the first processing stage, the second processing stage comprising a plurality of Forced Temporal Pooler nodes, each being adapted to extract and save prototypes from raw data, each prototype being representative of data chunks with features which are similar, whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches identifying multimodal spatio-temporal primitives and learning relationships among those primitives, the relationships comprising spatial-temporal events and inter-event states (corresponds to multimodal subevents within an event having spatially-related and temporally-related features)).
… executing, by the computing system, the machine learning system to process the STG to identify the one or more multimodal subevents for the event, wherein the machine learning system comprises one or more stacked Spatio-Temporal Graph Convolutional Networks (STGCNs) each comprising a plurality of STGCN layers (Ascari et al., Col. 40 Lines 52-56, “Thus, one or more aspects of the method according to embodiments of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them” teaches the computing system. Figure 8 and Col. 3 Lines 52-65, “wherein the architecture is a first neural network having structures and mechanisms for abstraction… whereby one or more or essentially all of said nodes each being a second neural network adapted for time series analysis, comprising of neurons and edges connecting two or more of said neurons in a graph, preferably a topological or temporal graph” teaches one or more neural networks with Spatial and Temporal Pooler nodes (corresponds to one or more stacked Spatio-Temporal Graph Convolutional Networks). Col. 4 Lines 49-65, “In a further embodiment the architecture comprises (1) a first processing stage (further denoted “conscious layer”)… and (2) a second processing stage (further denoted “wired layer”)… whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches a plurality of processing layers and further teaches identifying one or more multimodal spatial-temporal events and inter-event states (corresponds to subevents)).
an output device (Ascari et al., Col. 40 Lines 58-62, “In some implementations, a display system, a keyboard, and a pointing device may be included as part of a user interface subsystem to provide for a user to manually input information, such as parameter values” teaches the output device).
Ascari et al. does not appear to explicitly teach receiving, by a computing system comprising a machine learning system, a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes, wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors; or (2) temporal edges that span multiple time steps for the event; outputting… an indication of the one or more multimodal subevents.
However, Jain et al. teaches receiving, by a computing system comprising a machine learning system, a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Figure 2a shows an example st-graph capturing human-object interactions during an activity. The nodes v ∈ V and edges e ∈ E_S ∪ E_T of the st-graph repeats over time. In particular, Figure 2b shows the same st-graph unrolled through time. In the unrolled st-graph, the nodes at a given time step t are connected with undirected spatio-temporal edge e = (u, v) ∈ E_S, and the nodes at adjacent time steps (say the node u at time t and the node v at time t + 1) are connected with undirected temporal edge iff (u, v) ∈ E_T. Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors)… For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches a spatio-temporal graph comprising a plurality of nodes having feature vectors (corresponds to feature descriptor) that represent the human and object poses (corresponds to feature present in the event). The STG further comprises undirected spatio-temporal edges (E_S) connecting nodes at a given time step and undirected temporal edges (E_T) connecting nodes at adjacent time steps, which describe spatial and temporal relationships, respectively, between the plurality of nodes).
wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors; or (2) temporal edges that span multiple time steps for the event (Jain et al., Section 3.2 Pg. 4, “The factors in the st-graph operate in a temporal manner, where at each time step the factors observe (node & edge) features and perform some computation on those features” teaches temporal edges that span multiple time steps for the activity (corresponds to the event)).
outputting… an indication of the one or more multimodal subevents (Jain et al., Section 4.3 Pg. 7, “At each time step, the human nodeRNN outputs the sub-activity label (10 classes)” teaches outputting the sub-activity label (corresponds to an indication of the one or more multimodal subevents)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the Spatio-Temporal Graph and its composition, and the outputting of an indication of the one or more multimodal subevents, as taught by Jain et al., to the method for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. The motivation would have been to combine the power of high-level spatio-temporal graphs with the sequence learning success of Recurrent Neural Networks (Jain et al., Abstract, “In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs and sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled as it can be used for transforming any spatio-temporal graph through employing a certain set of well-defined steps. The evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, shows improvement over the state-of-the-art with a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks”).
Regarding Claim 14,
The Ascari et al. in view of Jain et al. combination of claim 13 teaches the method of claim 13.
The combination, as described in the rejection of claim 13, further teaches wherein the STG comprises the variable-length feature descriptors, and wherein executing the machine learning system to process the STG further comprises processing the STG to convert the variable-length feature descriptors to fixed-length feature descriptors (Jain et al., Section 3.3 Pg. 5, “The forward-pass involves the edgeRNNs RE1 (human-object edge) and RE3 (human-human edge). Since the human node v interacts with two object nodes {u,w}, we pass the summation of the two edge features as input to RE1. The summation of features, as opposed to concatenation, is important to handle variable number of object nodes with a fixed architecture. Since the object count varies with environment, it is challenging to represent variable context with a fixed length feature vector” teaches handling a variable context (corresponds to variable-length feature descriptors) by summing the edge features into a single feature vector of fixed length regardless of the number of object nodes (corresponds to converting to fixed-length feature descriptors)).
Regarding Claim 19,
The Ascari et al. in view of Jain et al. combination of claim 13 teaches the method of claim 13.
The combination, as described in the rejection of claim 13, further teaches wherein the one or more multimodal subevents comprise one or more recognized human or animal activities (Jain et al., Section 4.3 Pg. 7, “In this section we present S-RNN for modeling human activities… Each activity consist of a sequence of sub-activities (e.g. moving, drinking etc.) and objects affordance (e.g., reachable, drinkable etc.), which evolves as the activity progresses” teaches sub-activities (corresponds to the one or more multimodal subevents) of the modeled human activities (corresponds to one or more recognized human activities)).
wherein the feature descriptor of each node of the plurality of nodes describes a human event feature, the human event feature including one of at least an actor, an object, a scene, or a human action (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors) y_v^t at each time step t. For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches feature vectors (corresponds to feature descriptor) associated with the nodes that represent the human and object poses in a human-object interaction (corresponds to the human event feature including one of at least an actor, an object, a scene, or a human action)).
wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the human event features (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors) y_v^t at each time step t. For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches edge features (corresponds to each spatial edge of the plurality of spatial edges) that describe the relative orientation of the human and the object (corresponds to a spatial relationship between two of the human event features)).
wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the human event features (Jain et al., Figure 2 and Section 3.1 Pg. 3, “In the unrolled st-graph, the nodes at a given time step t are connected with undirected spatio-temporal edge e = (u, v) ∈ E_S, and the nodes at adjacent time steps (say the node u at time t and the node v at time t + 1) are connected with undirected temporal edge iff (u, v) ∈ E_T” teaches undirected temporal edges connecting nodes at adjacent time steps of the human activity (corresponds to a temporal relationship between two of the human event features)).
Regarding Claim 20,
Ascari et al. teaches a non-transitory computer-readable medium comprising instructions for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features, the instructions configured to cause processing circuitry of a computation engine to (Ascari et al., Col. 44 Lines 60-62, “Such a computer program product can be tangibly embodied in a carrier medium carrying machine-readable code for execution by a programmable processor” teaches a carrier medium (corresponds to a non-transitory computer-readable medium). Col. 4 Lines 54-65, “a second processing stage (further denoted “wired layer”), separate from the first processing stage, the second processing stage comprising a plurality of Forced Temporal Pooler nodes, each being adapted to extract and save prototypes from raw data, each prototype being representative of data chunks with features which are similar, whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches identifying multimodal spatio-temporal primitives and learning relationships among those primitives, the relationships comprising spatial-temporal events and inter-event states (corresponds to multimodal subevents within an event having spatially-related and temporally-related features)).
… execute a machine learning system to process the STG to identify the one or more multimodal subevents for the event, wherein the machine learning system comprises one or more stacked Spatio-Temporal Graph Convolutional Networks (STGCNs) each comprising a plurality of STGCN layers (Ascari et al., Figure 8 and Col. 3 Lines 52-65, “wherein the architecture is a first neural network having structures and mechanisms for abstraction… whereby one or more or essentially all of said nodes each being a second neural network adapted for time series analysis, comprising of neurons and edges connecting two or more of said neurons in a graph, preferably a topological or temporal graph” teaches one or more neural networks with Spatial and Temporal Pooler nodes (corresponds to one or more stacked Spatio-Temporal Graph Convolutional Networks). Col. 4 Lines 49-65, “In a further embodiment the architecture comprises (1) a first processing stage (further denoted “conscious layer”)… and (2) a second processing stage (further denoted “wired layer”)… whereby the second processing stage is suited for identifying and recognizing (e.g. single and/or multimodal spatio-temporal) primitives while the first stage is suited for learning, storing, recalling and predicting more abstract relationships among those primitives, these relationships comprising spatial-temporal events and inter-event states” teaches a plurality of processing layers and further teaches identifying one or more multimodal spatial-temporal events and inter-event states (corresponds to subevents)).
Ascari et al. does not appear to explicitly teach receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes; wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors; or (2) temporal edges that span multiple time steps for the event; and output an indication of the one or more multimodal subevents.
However, Jain et al. teaches receive a Spatio-Temporal Graph (STG) comprising (1) a plurality of nodes, each node of the plurality of nodes having a feature descriptor that describes a feature present in the event, (2) a plurality of spatial edges, each spatial edge of the plurality of spatial edges describing a spatial relationship between two of the plurality of nodes, and (3) a plurality of temporal edges, each temporal edge of the plurality of temporal edges describing a temporal relationship between two of the plurality of nodes (Jain et al., Figure 2 and Section 3.1 Pg. 3, “Figure 2a shows an example st-graph capturing human-object interactions during an activity. The nodes v ∈ V and edges e ∈ E_S ∪ E_T of the st-graph repeats over time. In particular, Figure 2b shows the same st-graph unrolled through time. In the unrolled st-graph, the nodes at a given time step t are connected with undirected spatio-temporal edge e = (u, v) ∈ E_S, and the nodes at adjacent time steps (say the node u at time t and the node v at time t + 1) are connected with undirected temporal edge iff (u, v) ∈ E_T. Given a st-graph and the feature vectors associated with the nodes x_v^t and edges x_e^t, as shown in Figure 2b, the goal is to predict the node labels (or real value vectors)… For instance, in human-object interaction, the node features can represent the human and object poses, and edge features can [represent] their relative orientation; the node labels represent the human activity and object affordance” teaches a spatio-temporal graph comprising a plurality of nodes having feature vectors (corresponds to feature descriptor) that represent the human and object poses (corresponds to feature present in the event). The STG further comprises undirected spatio-temporal edges (E_S) connecting nodes at a given time step and undirected temporal edges (E_T) connecting nodes at adjacent time steps, which describe spatial and temporal relationships, respectively, between the plurality of nodes).
wherein the STG comprises at least one of: (1) variable-length descriptors for the feature descriptors; or (2) temporal edges that span multiple time steps for the event (Jain et al., Section 3.2 Pg. 4, “The factors in the st-graph operate in a temporal manner, where at each time step the factors observe (node & edge) features and perform some computation on those features” teaches temporal edges that span multiple time steps for the activity (corresponds to the event)).
… output an indication of the one or more multimodal subevents (Jain et al., Section 4.3 Pg. 7, “At each time step, the human nodeRNN outputs the sub-activity label (10 classes)” teaches outputting the sub-activity label (corresponds to an indication of the one or more multimodal subevents)). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the Spatio-Temporal Graph and its composition and outputting an indication of the one or more multimodal subevents, as taught by Jain et al., to the system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. The motivation would have been to combine the power of high-level spatio-temporal graphs and the sequence learning success of Recurrent Neural Networks (Jain et al., Abstract, “In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs and sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled as it can be used for transforming any spatio-temporal graph through employing a certain set of well-defined steps. The evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, shows improvement over the state-of-the-art with a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks”).
Claims 3-9 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ascari et al. in view of Jain et al. in further view of Bhoi (“Spatio-temporal Action Recognition: A Survey”).
Regarding Claim 3,
The Ascari et al. in view of Jain et al. combination of claim 2 teaches the system of claim 2,
The combination, as described in the rejection of claim 2, further teaches wherein the plurality of nodes of the STG comprise a plurality of clusters of nodes, each cluster of the plurality of clusters having feature descriptors that have a length that is different from a length of feature descriptors of each other cluster of the one or more clusters (Ascari et al., Col. 2 Lines 43-64, “Note that said nodes are denoted clustering nodes as their function to find data representations is realized by finding clusters of data or cluster of sequences therein… one or more or all of said first nodes being based on the neural gas concept being artificial neural networks able to find suitable or optimal input data representations. More in particular neural gas is a quantization method. It is an algorithm for finding suitable or optimal data representations based on n-dimensional vectors of features (feature vector)… input values provided to the first nodes are probability distributions, except for the first level of first nodes receiving as input sensor values, whereas internally to the first nodes the input for their internal algorithms (for instance the neural gas method in accordance to an embodiment of the invention) is an n-dimensional feature vector composed by a probability distribution (as received as input to the first node) and/or a temporal context vector and/or a permanence estimation descriptor” teaches a plurality of clustering nodes that have n-dimensional vectors of features (corresponds to feature descriptors that have a length that is different from a length of feature descriptors of each other cluster)). 
Ascari et al. in view of Jain et al. does not appear to explicitly teach wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs further comprise, for each cluster of the plurality of clusters, a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors.
However, Bhoi teaches wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs further comprise, for each cluster of the plurality of clusters, a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors (Bhoi, Section 7.2 Pg. 9, “Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches a spatial temporal graph convolutional network with clusters of joints as graph nodes (corresponds to each cluster of the plurality of clusters). Section 7.2.3 Pg. 9, “Let us just consider graph CNN model within one single frame. At a single frame at time τ, there will be N joint nodes V_t along with skeleton edges E_S(τ) = {v_ti v_tj | (i, j) ∈ H}” teaches the spatial graph convolutional neural network that processes the joint nodes (corresponds to the cluster). Section 4.2 Pg. 2, “ToI layers are used to produce fixed length feature vectors which solves the problem of variable length feature vectors” teaches converting the variable length feature vectors to fixed length feature vectors).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors, as taught by Bhoi, to the system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. in view of Jain et al. The motivation would have been to extend the technique of spatiotemporal convolutions while incorporating more features to solve the problem of action localization (Bhoi, Abstract, “Finally, spatiotemporal convolutions provide an interesting proposition and possibly we can extend this technique while incorporating more features to solve the problem of action localization.”).
Regarding Claim 4,
The Ascari et al. in view of Jain et al. combination of claim 2 teaches the system of claim 2,
Bhoi further teaches wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs comprise one or more convolutional layers configured to convert the variable-length feature descriptors of the plurality of nodes to the fixed-length feature descriptors (Bhoi, Section 7.2 Pg. 9, “Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches a spatial temporal graph convolutional network with clusters of joints as graph nodes (corresponds to each cluster of the plurality of clusters). Section 7.2.3 Pg. 9, “Let us just consider graph CNN model within one single frame. At a single frame at time τ, there will be N joint nodes V_t along with skeleton edges E_S(τ) = {v_ti v_tj | (i, j) ∈ H}” teaches the spatial graph convolutional neural network that processes the joint nodes (corresponds to the cluster). Section 8.1.3 Pg. 12, “Batch normalization is applied to all convolutional layers and batch size is set to 32 per GPU” teaches the one or more convolutional layers. Section 4.2 Pg. 2, “ToI layers are used to produce fixed length feature vectors which solves the problem of variable length feature vectors” teaches converting the variable length feature vectors to fixed length feature vectors).
Regarding Claim 5,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein to process the STG to identify the one or more multimodal subevents for the event, the one or more STGCNs are configured to perform a plurality of down-sampling operations and a plurality of up-sampling operations on the adjacency matrix to identify the one or more multimodal subevents for the event (Jain et al., Section 4.2 Pg. 7, “We trained two independent S-RNN models – a slower human and a faster human (by down sampling data) – and swapped the left leg nodeRNN of the trained models” teaches a down-sampling operation to identify the human activities (corresponds to the one or more multimodal subevents for the event). Section 1 Pg. 2, “Figure 1 schematically illustrates this process, where a sample spatio-temporal problem is shown at the bottom, the corresponding st-graph representation is shown in the middle, and our RNN mixture counterpart of the st-graph is shown at the top” teaches up-sampling operations).
Bhoi further teaches wherein the STG comprises an adjacency matrix that represents irregular connections between the plurality of nodes (Bhoi, Section 7.2.4 Pg. 10, “The intra-body connections are represented by an adjacency matrix A and identity matrix I” teaches an adjacency matrix A that represents intra-body connections (corresponds to irregular connections between the plurality of nodes)).
Regarding Claim 6,
The Ascari et al. in view of Jain et al. in view of Bhoi combination of claim 5 teaches the system of claim 5,
The combination, as described in the rejection of claim 5, further teaches wherein each stacked STGCN of the plurality of stacked STGCNs is a graph convolutional neural network comprising a first plurality of graph convolutional layers and a second plurality of graph convolutional layers (Bhoi, Figure 7 and Section 8.1.1 Pg. 11, “In the authors’ experiments of a 5 layer residual block, two variants come out where first three layers are 3D convolutions while last two are 2D” teaches the graph convolutional neural network comprising a plurality of 3D convolutions (corresponds to a first plurality of graph convolutional layers) and 2D convolutions (corresponds to a second plurality of graph convolutional layers)).
wherein the first plurality of layers comprises one or more STGCN layers and one or more convolutional layers, the first plurality of layers configured to perform the plurality of down-sampling operations (Bhoi, Section 8.1.3 Pg. 11-12, “Frame input size is 112× 112. The authors use one spatial downsampling of 1×2×2, and three spatiotemporal downsampling with convolutional striding of 2 × 2 × 2. For training, L consecutive frames are randomly sampled. Batch normalization is applied to all convolutional layers and batch size is set to 32 per GPU. The initial learning rate is set to 0.01 and is decayed by 0.1 every 10 epochs. The R(2+1)D layer architecture reported an average 3% improvement from previous methods” teaches downsampling being performed in the first layers).
wherein the second plurality of layers comprises one or more deconvolutional layers, the second plurality of layers configured to perform the plurality of up-sampling operations (Jain et al., Section 1 Pg. 2, “Figure 1 schematically illustrates this process, where a sample spatio-temporal problem is shown at the bottom, the corresponding st-graph representation is shown in the middle, and our RNN mixture counterpart of the st-graph is shown at the top” teaches up-sampling operations).
Regarding Claim 7,
The Ascari et al. in view of Jain et al. in view of Bhoi combination of claim 6 teaches the system of claim 6,
The combination, as described in the rejection of claim 6, further teaches wherein the one or more STGCN layers comprise one or more spatial graph convolution layers, one or more temporal graph convolution layers, and one or more non-linear convolution layers (Bhoi, Figure 7 and Section 8.1 Pg. 11, “The second new convolution is a complete decomposition of the 3D convolution into separate 2D spatial convolution and 1D temporal convolution. This is called the R(2+1)D convolution. This decomposition brings in two advantages. Firstly, the decomposition introduces an additional nonlinear rectification between two operations. This means, you double the number of nonlinearities compared to a network using full 3D convolutions for same number of parameters” teaches one or more temporal graph convolution layers and one or more nonlinear convolution layers).  
Regarding Claim 8,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1,
Bhoi further teaches wherein the plurality of stacked STGCNs are trained with a plurality of STGs, each STG of the plurality of STGs labeled with one or more multimodal subevents present within the plurality of spatial edges and the plurality of temporal edges of the STG (Bhoi, Section 7.2.2 Pg. 9, “The authors create an undirected spatial temporal graph G = (V, E) on a skeleton sequence with N joints and T frames feature both intra-body and inter-frame connection. The node set V = {v_ti | t = 1, ..., T, i = 1, ..., N} includes all joints in skeleton sequence. As ST-GCN’s input, feature vector on node F(v_ti) consists of coordinate vectors as well as estimation confidence on i-th join on frame t… the edge set E is composed of two subsets: E_S = {v_ti v_tj | (i, j) ∈ H} consisting of intraskeleton connection at each frame where H is set of naturally connected human body joints. The second subset consists of inter-frame edges connecting same joints in consecutive frames and is expressed as E_F = {v_ti v_(t+1)i}” teaches the plurality of stacked Spatial-Temporal Graph Convolutional Networks are trained with a plurality of spatial temporal graphs with subsets of inter-frame in the edge set (corresponds to one or more multimodal subevents present in the plurality of edges). Section 7.2 Pg. 9, “The graph representation contains spatial edges that conform to natural connectivity of joints and temporal edges that connect to same joints across consecutive time steps” teaches the plurality of spatial edges and the plurality of temporal edges of the STG).
Regarding Claim 9,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1,
Bhoi further teaches wherein each node of the plurality of nodes of the STG is connected to one or more of the plurality of spatial edges and the plurality of temporal edges to enable one or more spatial relationships and temporal relationships between the plurality of nodes (Bhoi, Section 7.2 Pg. 9, “The graph representation contains spatial edges that conform to natural connectivity of joints and temporal edges that connect to same joints across consecutive time steps… Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches the nodes of the spatial temporal graph being connected to the plurality of spatial and temporal edges to enable connection (corresponds to relationships) between the nodes).
Regarding Claim 15,
The Ascari et al. in view of Jain et al. combination of claim 14 teaches the method of claim 14,
The combination, as described in the rejection of claim 14, further teaches wherein the plurality of nodes of the STG comprise a plurality of clusters of nodes, each cluster of the plurality of clusters having feature descriptors that have a length that is different from a length of feature descriptors of each other cluster of the one or more clusters (Ascari et al., Col. 2 Lines 43-64, “Note that said nodes are denoted clustering nodes as their function to find data representations is realized by finding clusters of data or cluster of sequences therein… one or more or all of said first nodes being based on the neural gas concept being artificial neural networks able to find suitable or optimal input data representations. More in particular neural gas is a quantization method. It is an algorithm for finding suitable or optimal data representations based on n-dimensional vectors of features (feature vector)… input values provided to the first nodes are probability distributions, except for the first level of first nodes receiving as input sensor values, whereas internally to the first nodes the input for their internal algorithms (for instance the neural gas method in accordance to an embodiment of the invention) is an n-dimensional feature vector composed by a probability distribution (as received as input to the first node) and/or a temporal context vector and/or a permanence estimation descriptor” teaches a plurality of clustering nodes that have n-dimensional vectors of features (corresponds to feature descriptors that have a length that is different from a length of feature descriptors of each other cluster)).  
Ascari et al. in view of Jain et al. does not appear to explicitly teach wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs further comprise, for each cluster of the plurality of clusters, a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors.
However, Bhoi teaches wherein, to process the STG to convert the variable-length feature descriptors to fixed-length feature descriptors, the one or more stacked STGCNs further comprise, for each cluster of the plurality of clusters, a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors (Bhoi, Section 7.2 Pg. 9, “Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches a spatial temporal graph convolutional network with clusters of joints as graph nodes (corresponds to each cluster of the plurality of clusters). Section 7.2.3 Pg. 9, “Let us just consider graph CNN model within one single frame. At a single frame at time τ, there will be N joint nodes V_t along with skeleton edges E_S(τ) = {v_ti v_tj | (i, j) ∈ H}” teaches the spatial graph convolutional neural network that processes the joint nodes (corresponds to the cluster). Section 4.2 Pg. 2, “ToI layers are used to produce fixed length feature vectors which solves the problem of variable length feature vectors” teaches converting the variable length feature vectors to fixed length feature vectors).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add a spatial graph convolutional network (GCN) configured to process the cluster to convert the variable-length feature descriptors of the nodes of the cluster to the fixed-length feature descriptors, as taught by Bhoi, to the system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. in view of Jain et al. The motivation would have been to extend the technique of spatiotemporal convolutions while incorporating more features to solve the problem of action localization (Bhoi, Abstract, “Finally, spatiotemporal convolutions provide an interesting proposition and possibly we can extend this technique while incorporating more features to solve the problem of action localization”).
Regarding Claim 16,
The Ascari et al. in view of Jain et al. combination of claim 14 teaches the method of claim 14,
Bhoi further teaches wherein processing the STG to convert the variable-length feature descriptors to fixed-length feature descriptors comprises converting, by one or more convolutional layers of the one or more stacked STGCNs, the variable-length feature descriptors of the plurality of nodes to the fixed-length feature descriptors (Bhoi, Section 7.2 Pg. 9, “Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches a spatial temporal graph convolutional network with clusters of joints as graph nodes (corresponds to each cluster of the plurality of clusters). Section 7.2.3 Pg. 9, “Let us just consider graph CNN model within one single frame. At a single frame at time τ, there will be N joint nodes V_t along with skeleton edges E_S(τ) = {v_ti v_tj | (i, j) ∈ H}” teaches the spatial graph convolutional neural network that processes the joint nodes (corresponds to the cluster). Section 8.1.3 Pg. 12, “Batch normalization is applied to all convolutional layers and batch size is set to 32 per GPU” teaches the one or more convolutional layers. Section 4.2 Pg. 2, “ToI layers are used to produce fixed length feature vectors which solves the problem of variable length feature vectors” teaches converting the variable length feature vectors to fixed length feature vectors).
Regarding Claim 17,
The Ascari et al. in view of Jain et al. combination of claim 13 teaches the method of claim 13,
The combination, as described in the rejection of claim 13, further teaches wherein processing the STG to identify the one or more multimodal subevents for the event comprises performing, by the one or more STGCNs, a plurality of down-sampling operations and a plurality of up-sampling operations on the adjacency matrix to identify the one or more multimodal subevents for the event (Jain et al., Section 4.2 Pg. 7, “We trained two independent S-RNN models – a slower human and a faster human (by down sampling data) – and swapped the left leg nodeRNN of the trained models” teaches a down-sampling operation to identify the human activities (corresponds to the one or more multimodal subevents for the event). Section 1 Pg. 2, “Figure 1 schematically illustrates this process, where a sample spatio-temporal problem is shown at the bottom, the corresponding st-graph representation is shown in the middle, and our RNN mixture counterpart of the st-graph is shown at the top” teaches up-sampling operations).
Bhoi further teaches wherein the STG comprises an adjacency matrix that represents irregular connections between the plurality of nodes (Bhoi, Section 7.2.4 Pg. 10, “The intra-body connections are represented by an adjacency matrix A and identity matrix I” teaches an adjacency matrix A that represents intra-body connections (corresponds to irregular connections between the plurality of nodes)).
Regarding Claim 18,
The Ascari et al. in view of Jain et al. combination of claim 13 teaches the method of claim 13,
Bhoi further teaches wherein each node of the plurality of nodes of the STG is connected to one or more of the plurality of spatial edges and the plurality of temporal edges to enable one or more spatial relationships and temporal relationships between the plurality of nodes (Bhoi, Section 7.2 Pg. 9, “The graph representation contains spatial edges that conform to natural connectivity of joints and temporal edges that connect to same joints across consecutive time steps… Given these sequences of body joints, the model constructs a spatial temporal graph with joints as graph nodes and natural connectivities in both human body structures and time as graph edges” teaches the nodes of the spatial temporal graph being connected to the plurality of spatial and temporal edges to enable connection (corresponds to relationships) between the nodes).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Ascari et al. in view of Jain et al. in further view of Jakobson (US7788109B2).
Regarding Claim 12,
The Ascari et al. in view of Jain et al. combination of claim 1 teaches the system of claim 1,
Ascari et al. in view of Jain et al. does not appear to explicitly teach wherein the one or more multimodal subevents comprise one or more financial predictions, wherein the feature descriptor of each node of the plurality of nodes describes a financial feature, wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the financial features, and wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the financial features.
However, Jakobson et al. teaches wherein the one or more multimodal subevents comprise one or more financial predictions (Jakobson et al., Col. 2 Lines 39-49, “In financial trading for fund management, information is collected continuously using data processing networks and systems… data processing models and automated data processing analysis to identify fund trends, trading opportunities and recommendations” teaches financial predictions).
wherein the feature descriptor of each node of the plurality of nodes describes a financial feature (Jakobson et al., Col. 15 Lines 62-65, “adaptation by substitution covers those episodes in which an object that occurs as a descriptor in the current situation should be substituted throughout for an object that occurs as a descriptor in the retrieved case” teaches that nodes of the STG consist of a descriptor. Col. 2 Lines 39-46, “In financial trading for fund management, information is collected continuously using data processing networks and systems about securities, interest rates, financial market conditions, analysts' reports, mergers and acquisitions, regulatory changes, company announcements and filings, and external events. This information is combined with information about portfolios, funds, and customers, and presented to fund managers, research analysts and trader” teaches financial features).
wherein each spatial edge of the plurality of spatial edges describes a spatial relationship between two of the financial features (Jakobson et al., Col. 5 Lines 11-16, “Equivalently, a situation is a time-dependent state of a system that can be described by a collection of declarations and a set of logical, arithmetic, spatial, temporal, structural, causal, modal, or other domain-specific relations and qualities defined over the collection of declarations” teaches the situation (corresponds to financial features) based on spatial relationships. Col. 9 Lines 17-20, “In the STG each arc represents a transition from one situation to another. Any path through the STB using the arcs is an evolution of the situation starting at the beginning of the path to the end point” teaches the spatial arc (corresponds to spatial edge) that represents the evolution of the situation (corresponds to two of the financial features)).
wherein each temporal edge of the plurality of spatial edges describes a temporal relationship between two of the financial features (Jakobson et al., Col. 5 Lines 11-16, “Equivalently, a situation is a time-dependent state of a system that can be described by a collection of declarations and a set of logical, arithmetic, spatial, temporal, structural, causal, modal, or other domain-specific relations and qualities defined over the collection of declarations” teaches the situation (corresponds to financial features) based on temporal relationships. Col. 9 Lines 17-20, “In the STG each arc represents a transition from one situation to another. Any path through the STB using the arcs is an evolution of the situation starting at the beginning of the path to the end point” teaches the temporal arc (corresponds to temporal edge) that represents the evolution of the situation (corresponds to two of the financial features)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the one or more multimodal subevents comprise one or more financial predictions, as taught by Jakobson et al., in the system for identifying one or more multimodal subevents within an event having spatially-related and temporally-related features of Ascari et al. in view of Jain et al. The motivation would have been to increase the confidence level of the situation (Jakobson et al., Col. 20 Lines 62-64, “The situation may contain parameters which, if provided, may increase the confidence level of the situation. Thus, the SBM may take action to be provided the missing information or provide instructions to external resources, such as the EC, to provide such information”).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/Examiner, Art Unit 2125

/BRIAN M SMITH/Primary Examiner, Art Unit 2122