DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The following action is in response to the original filing of 08/09/2018.

Claims 1-17 are pending and have been considered below.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Such claim limitations are: “a sensing unit sensing”, “a memory unit storing”, and “a processing unit executing” in independent claims 8 and 14.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding independent claims 8 and 14, the claim limitations “a sensing unit sensing”, “a memory unit storing” and “a processing unit [..] executes a performing procedure” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The disclosure is devoid of any structure that performs the functions of the claims and merely recites a generic gesture recognition system comprised of the units. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Molchanov et al. (US 2017/0206405 published 07/20/2017, hereafter “Molchanov”).

Regarding claim 1, Molchanov discloses a gesture recognition method, comprising a performing procedure (¶22: gesture detection and classification algorithm);
wherein the performing procedure comprises steps of: 
receiving a sensing signal from a sensing unit (Fig. 1A, ¶26: At step 110, an unsegmented stream of data associated with a hand gesture is received. ¶34: The system 125 includes a processor 130 and multiple sensors that may each provide one or more data streams to the processor 130.); wherein the sensing signal comprises a plurality of sensing frames (¶34: Each one of the sensors may provide a data stream of a different modality that is captured or computed within a gesture environment. ¶35, ¶36: The frames of data may be encoded as a multi-dimensional tensor in dimensions of x and y (e.g., frame size in pixels), channels (e.g., RGB), and temporal length (e.g., frame number).); 
selecting one of the sensing frames (¶37: The local spatio-temporal features represent micro-level and/or short-term details of the gesture. In other words, the local spatio-temporal features are features within a clip including one or more frames or a short buffer of frames.); 
determining a soft label of the selected sensing frame according to a neural network (¶29: At step 130, a class label for the hand gesture is produced based on the spatio-temporal features. In one embodiment, a class label is a class-conditional probability vector associated with one hand gesture. ¶37: The processor 130 then processes the local spatio-temporal features using a recurrent layer to produce global spatio-temporal features. The global spatio-temporal features represent macro-level and/or long-term details of the gesture. In other words, the global spatio-temporal features are features extracted over multiple clips. Finally, the processor 130 processes the global spatio-temporal features using a softmax layer to predict class-conditional gesture probabilities (e.g., gesture labels).);
classifying a gesture event when the soft label of the selected sensing frame is approved (¶37: When data streams for more than one modality are input to the processor 130, the data stream for each modality may be processed separately and the resulting modality-specific probabilities may be combined using averaging to produce the gesture label 102. ¶46: Observe that fusing any pair of sensors improves individual results. In addition, combining different modalities of the same sensor (e.g., color and optical flow) also improves the accuracy.). 
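For illustration only, the softmax and modality-averaging steps quoted from Molchanov ¶37 may be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the example scores are hypothetical and are not taken from Molchanov or from the application.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into class-conditional probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(per_modality_probs):
    """Average modality-specific probability vectors element-wise."""
    n = len(per_modality_probs)
    return [sum(column) / n for column in zip(*per_modality_probs)]

def gesture_label(probs):
    """Return the index of the class with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for three gesture classes from two modalities.
color = softmax([2.0, 0.5, 0.1])
depth = softmax([1.5, 1.0, 0.2])
fused = fuse_modalities([color, depth])
label = gesture_label(fused)
```

Because each modality-specific vector sums to one, the averaged vector also sums to one and can be read as a fused class-conditional probability vector.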
 
Regarding claim 2, Molchanov discloses the gesture recognition method as claimed in claim 1, wherein the soft label of the sensing frame is encoded into one-hot vectors to train the neural network (¶29: In one embodiment, a class label is a class-conditional probability vector associated with one hand gesture. ¶54: At first, feature maps from the last 3D convolutional layer of the 3D-CNN 200 are vectorized and passed through fully connected layers 218 followed by the recurrent layer 220.).  
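For reference, the difference between a one-hot vector as recited in claim 2 and a class-conditional probability vector as described in ¶29 may be illustrated as follows. This is a sketch only; the helper names are hypothetical, and cross-entropy is used here as an assumed, commonly used training loss rather than one identified in the cited passages.

```python
import math

def one_hot(class_index, num_classes):
    """Encode a class index as a vector with a single 1.0 entry."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

def cross_entropy(predicted, target):
    """Cross-entropy between a predicted probability vector and a target vector."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))

# Hypothetical network output (class-conditional probabilities) and target.
predicted = [0.1, 0.7, 0.2]
target = one_hot(1, 3)
loss = cross_entropy(predicted, target)
```

With a one-hot target, only the predicted probability of the target class contributes to the loss.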

Regarding claim 3, Molchanov discloses the gesture recognition method as claimed in claim 1, wherein the performing procedure comprises steps of:  
determining whether the soft label of the selected sensing frame exceeds a confidence threshold before classifying the gesture event (¶47: As understood by those skilled in the art, CTC is a cost function designed for training a CNN on a sequence that is unsegmented or weakly segmented. The CTC 240 implements a CTC forward algorithm that identifies and correctly labels the nucleus of the gesture, while assigning a no gesture class to the remaining clips, addressing the alignment of class labels to particular clips in the data stream.);  
when the soft label of the selected sensing frame exceeds the confidence threshold, the soft label of the selected sensing frame is approved, and the gesture event is classified (¶48: The label for the gesture class w having the highest probability is output as the gesture label.).  

Regarding claim 4, Molchanov discloses the gesture recognition method as claimed in claim 3, wherein the performing procedure further comprises steps of:  
when the soft label of the selected sensing frame does not exceed the confidence threshold, selecting another one of the sensing frames, determining a soft label of the selected another one of the sensing frames according to the neural network, and determining again whether the soft label of said selected another sensing frame exceeds the confidence threshold (¶47: The CTC 240 implements a CTC forward algorithm that identifies and correctly labels the nucleus of the gesture, while assigning a no gesture class to the remaining clips, addressing the alignment of class labels to particular clips in the data stream. ¶48).  
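The frame-by-frame threshold test recited in claims 3 and 4 may be sketched as follows, for illustration only. The function name, the stand-in predictor, and the example values are hypothetical; the sketch assumes the soft label is a single scalar per frame, which is not confirmed by the cited passages.

```python
def classify_first_confident(frames, predict_soft_label, threshold):
    """Scan frames in order and return (index, soft_label) for the first
    frame whose soft label exceeds the threshold, or None if none does."""
    for index, frame in enumerate(frames):
        soft_label = predict_soft_label(frame)
        if soft_label > threshold:
            # Soft label approved: the gesture event would be classified here.
            return index, soft_label
        # Otherwise fall through and select the next frame.
    return None

# Hypothetical frames; the identity function stands in for the neural network.
frames = [0.1, 0.4, 0.8, 0.95]
result = classify_first_confident(frames, lambda f: f, threshold=0.7)
no_result = classify_first_confident(frames, lambda f: f, threshold=0.99)
```

In this example the third frame (index 2) is the first whose soft label exceeds 0.7, so `result` is `(2, 0.8)`; with a threshold of 0.99 no frame is approved and `None` is returned.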

Regarding claim 5, Molchanov discloses a gesture recognition method as claimed in claim 1, further comprising a training procedure (¶51: FIG. 2F illustrates a flowchart of a method for training the R3DCNN 230 or 250 for gesture detection and labeling, in accordance with one embodiment.); 
wherein the training procedure comprises steps of:  
receiving a training signal (¶51: At step 272, a training data stream associated with a hand gesture is received by the R3DCNN 230 or 250.); 
wherein the training signal comprises a plurality of training frames (¶51: Note that weakly-segmented videos contain the preparation, nucleus, and retraction phases and frames from the no gesture class.);  
determining an amount of the training frames (¶52: In one embodiment, the 3D-CNN 200 is initialized with the C3D network trained on the large-scale Sport-1M human recognition dataset. The network has 8 convolutional layers of 3×3×3 filters and 2 fully-connected neural network layers trained on 16-frame clips.);  
determining a function according to the amount of the training frames (¶54: At first, feature maps from the last 3D convolutional layer of the 3D-CNN 200 are vectorized and passed through fully connected layers 218 followed by the recurrent layer 220. The output of the recurrent layer 220 is fed into the linear classifier 225 to produce output class conditional probabilities.);  
calculating a soft label of each training frame according to the function (¶54: At step 280, a class label is produced for the hand gesture based on the spatio-temporal features. ); and  
training the neural network with the soft labels of the training frames as ground truth of the neural network (¶54: The CTC 240 computes the probability of the sequence and, given a ground truth sequence, propagates errors to the preceding layer.).  
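For illustration, the claimed steps of determining a function according to the amount of training frames and calculating a soft label for each training frame may be sketched as follows. A linear ramp is assumed here solely as an example of a monotonic function that maps the first frame to zero and the last frame to one (compare claim 7); the application may use a different function, and the function name is hypothetical.

```python
def soft_labels(num_frames):
    """Return one soft label per training frame using a linear ramp
    that depends only on the number of frames in the training signal."""
    if num_frames < 2:
        raise ValueError("need at least two frames to span 0 to 1")
    return [i / (num_frames - 1) for i in range(num_frames)]

# Hypothetical training signal with five frames.
labels = soft_labels(5)
```

The resulting labels are non-decreasing, start at 0.0, and end at 1.0, so they could serve as per-frame ground truth in a training loop.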

Regarding claim 6, Molchanov discloses a gesture recognition method as claimed in claim 5, wherein the training procedure is executed two times for receiving two said training signals (¶54: The training procedure may be repeated iteratively until convergence is achieved.);  
wherein a percentage of a sequence of the training frames of one of the training signals equals to a percentage of a sequence of the training frames of another one of the training signals having the same soft label when the soft label equals to the confidence threshold (¶54: The training procedure may be repeated iteratively until convergence is achieved.).  
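One possible reading of the percentage limitation of claim 6 may be illustrated as follows. This sketch assumes a linear-ramp soft label based on normalized frame position, which is an assumption and not a function identified in the claims or in Molchanov; the function name and the frame counts are hypothetical.

```python
def fraction_at_threshold(num_frames, threshold):
    """Return the fraction of the sequence at which a linear-ramp soft
    label first reaches the threshold, or None if it never does."""
    for i in range(num_frames):
        position = i / (num_frames - 1)  # normalized position in [0, 1]
        if position >= threshold:
            return position
    return None

# Two hypothetical training signals of different lengths.
short_signal = fraction_at_threshold(5, 0.5)
long_signal = fraction_at_threshold(9, 0.5)
```

Under this assumption, both signals reach a soft label of 0.5 at the same percentage of their sequences (50%), even though the frame index differs (frame 2 of 5 versus frame 4 of 9).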

Regarding claim 7, Molchanov discloses a gesture recognition method as claimed in claim 6, wherein:  
the function is a monotonic function (¶54: Given the error in estimate, each layer updates its parameters in the direction of error reduction and propagates any remaining error to reduce errors. In one embodiment, the training procedure is implemented as stochastic gradient descent with momentum.);  
the training frames of the training signal are arranged in sequence (¶58: A path π is defined as a possible mapping of the unsegmented training data stream X into a sequence of class labels y.); 
a first training frame of the training signal is mapped to the soft label of zero through the function (¶58: The probability of observing a particular sequence y given an input sequence X is the sum of the conditional probabilities of all paths π mapping to that sequence, [equation images media_image1.png and media_image2.png not reproduced]); and 
a last training frame of the training signal is mapped to the soft label of one through the function (¶58: The probability of observing a particular sequence y given an input sequence X is the sum of the conditional probabilities of all paths π mapping to that sequence, [equation images media_image1.png and media_image2.png not reproduced]).
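The statement quoted from ¶58, that the probability of a sequence y given an input X is the sum of the conditional probabilities of all paths π mapping to that sequence, may be illustrated by brute-force enumeration. This is a sketch only: it assumes the standard CTC collapse rule (merge consecutive repeated labels, then remove the blank or no-gesture label) and per-step independence of path probabilities, neither of which is confirmed by the passages quoted in this action. All names and values are hypothetical.

```python
from itertools import product

BLANK = 0  # hypothetical index of the blank / no-gesture class

def collapse(path):
    """Map a path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return tuple(out)

def sequence_probability(step_probs, target):
    """Sum the probabilities of every path that collapses to the target."""
    num_classes = len(step_probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(step_probs)):
        if collapse(path) == tuple(target):
            p = 1.0
            for t, label in enumerate(path):
                p *= step_probs[t][label]
            total += p
    return total

# Two time steps, two classes (blank and gesture 1), uniform probabilities.
step_probs = [[0.5, 0.5], [0.5, 0.5]]
p_gesture = sequence_probability(step_probs, [1])
p_empty = sequence_probability(step_probs, [])
```

Here three of the four possible paths, (1,1), (0,1), and (1,0), collapse to the sequence (1), giving a probability of 0.75; the remaining path (0,0) collapses to the empty sequence. Enumeration is exponential in the number of time steps and is shown only to make the summation concrete; a CTC forward algorithm computes the same quantity efficiently.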

Regarding claim 8, claim 8 recites limitations similar to claim 5 and is similarly rejected.

Regarding claim 9, claim 9 recites limitations similar to claim 7 and is similarly rejected.

Regarding claim 10, claim 10 recites limitations similar to claim 2 and is similarly rejected.

Regarding claim 11, claim 11 recites limitations similar to claim 3 and is similarly rejected.

Regarding claim 12, claim 12 recites limitations similar to claim 4 and is similarly rejected.

Regarding claim 13, claim 13 recites limitations similar to claim 6 and is similarly rejected.

Regarding claim 14, claim 14 recites limitations similar to claim 1 and is similarly rejected.

Regarding claim 15, claim 15 recites limitations similar to claim 2 and is similarly rejected.

Regarding claim 16, claim 16 recites limitations similar to claim 3 and is similarly rejected.

Regarding claim 17, claim 17 recites limitations similar to claim 4 and is similarly rejected.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sachs et al. (US 20090265671 A1): Mobile devices with motion gesture recognition
Park et al. (US 20180232606 A1): Sensory information providing apparatus, video analysis engine, and method thereof
Yang et al. (US 20180373985 A1): Transforming convolutional neural networks for visual sequence learning
Rao et al. (US 20190087009 A1): System and method for radar gesture recognition
Nowozin et al. (US 9619035 B2): Gesture detection and recognition
Karmon et al. (US 9857881 B2): Electrical device for hand gestures detection
Gu et al. (US 10373332 B2): Systems and methods for dynamic facial analysis using a recurrent neural network
Mcmahon et al. (WO 2016170011 A1): Gesture recognition with sensors
Xiong (CN 103926999 A): Palm gesture recognition method and device, human-computer interaction method and device
Yuan et al. (CN 107590432 A): A gesture recognition method based on circular three-dimensional convolutional neural network
De Mello et al. (DE 102017100609 A1): Computer-implemented method for detecting and classifying gestures, involves generating class label for hand gesture based on spatio-temporal features extracted from unsegmented stream through three-dimensional folding neural network
Tu et al. (EP 2362326 A2): Systems and methods for motion recognition with minimum delay
Galstyan, A., Cohen, P.R. (2008). Empirical Comparison of “Hard” and “Soft” Label Propagation for Relational Classification. In: Blockeel, H., Ramon, J., Shavlik, J., Tadepalli, P. (eds) Inductive Logic Programming. ILP 2007. Lecture Notes in Computer Science, vol 4894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78469-2_13
P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree and J. Kautz, "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4207-4215, doi: 10.1109/CVPR.2016.456.
Kopinski, Thomas, Alexander Gepperth, and Uwe Handmann. "A simple technique for improving multi-class classification with neural networks." Proceedings. Presses universitaires de Louvain, 2015, https://doi.org/10.48550/arXiv.1601.01157
D. Wu et al., "Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 8, pp. 1583-1597, 1 Aug. 2016, doi: 10.1109/TPAMI.2016.2537340.
Goh, "Why Momentum Really Works", Distill, 2017. http://doi.org/10.23915/distill.00006


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW L TANK whose telephone number is (571)270-1692. The examiner can normally be reached Monday-Thursday 9a-6p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached at 571-270-1104. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW L TANK/
Primary Examiner, Art Unit 2179