DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Amendment filed on March 2, 2021.  Claims 1, 6, and 11 are amended. Claims 2, 4, 7, 9, 12, and 14 are cancelled.  Claims 1, 3, 5, 6, 8, 10, 11, 13, and 15 are pending in the case.  Claims 1, 6, and 11 are the independent claims.  
This action is non-final.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on March 2, 2021 has been entered.
 
Applicant’s Response
In Applicant’s Amendment filed March 2, 2021, Applicant amended the claims in response to the rejections of the claims under 35 USC 103 and 112 in the previous office action.

Response to Argument/Amendment
Applicant’s amendments to the claims in response to the rejection of the claims under 35 USC 112 in the previous office action are acknowledged, and Applicant’s associated arguments have been fully considered.  As the amendments to the claims remove the basis for the rejection, the rejection is withdrawn.
Applicant’s amendments to the claims in response to the rejection of the claims under 35 USC 103 in the previous office action are acknowledged, and Applicant’s arguments have been fully considered.  As noted by Applicant, Examiner has previously indicated that the rejection under 35 USC 103 is overcome by the amendments to the claims (see the Advisory Action mailed February 26, 2021), i.e. the previously cited references do not appear to explicitly disclose at least the newly-recited limitations “wherein the CDLM comprises an object detector, a Fingertip regressor and a Bidirectional Long Short Term Memory (Bi-LSTM) Network, for accurate gesture recognition, and wherein the CDLM ported on the mobile communication device and removes hand gesture recognition framework dependence on a remote server,” and “wherein the Fingertip regressor is implemented based on a Convolutional Neural Network (CNN) architecture to localise a first coordinate and a second coordinate of the fingertip, wherein the CNN consists of two convolutional blocks and three fully connected layers to regress over the fingertip spatial location, wherein each of the two convolutional blocks have three convolutional layers followed by a max-pooling layer.”  Therefore, Applicant’s arguments are persuasive and the rejection is withdrawn.
However, new grounds of rejection are provided below.

Claim Objections
Claims 1, 6, and 11 are objected to because of the following informalities:  
Claims 1, 6, and 11 recite, on lines 8-9, 13, and 10, respectively, “the CDLM ported on the mobile communication device” when “the CDLM is ported on the mobile communication device” was perhaps intended.
Claims 1, 6, and 11 recite, on lines 26, 29, and 27, respectively, “the quality of image features” when “a quality of image features” or simply “quality of image features” was perhaps intended.
Claim 6 additionally appears to contain multiple spacing and/or punctuation issues, including: 
“interfaces ;” (line 4, with an unnecessary space between “interfaces” and the semicolon); 
“interfaces ,” (line 6, with an unnecessary space between “interfaces” and the comma);
“gesture, ,” (line 11, with an extra comma following “gesture”);
“candidates, ,” (line 27, with an extra comma following “candidates”);
“location;and” (line 38, where a space is needed between the semicolon and “and”).
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 3, 5, 6, 8, 10, 11, 13, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
With respect to claims 1, 6, and 11, these claims recite, on lines 6, 10-11, and 8, respectively, “each of the plurality of RGB input images comprises a hand gesture.”  In addition, these claims recite, on lines 25, 28, and 26, respectively, “downscaling the plurality of RGB input images comprising hand candidates.”  Finally, these claims recite, on lines 44, 44, and 42-43, respectively, “set of consecutive frames in the plurality of RGB input images.”  It is unclear whether the first recitation of RGB input images, each comprising a hand gesture, is the same as, or different from, the second recitation of RGB input images comprising hand candidates (e.g. the second recitation may be a subset which includes some but not all of the RGB input images comprising a hand gesture).  In addition, if these two recitations of RGB input images are intended to refer to different sets of RGB input images, it is further unclear whether the third recitation of the plurality of RGB input images (that having the set of consecutive frames) is intended to refer to the first recitation of RGB input images, the second recitation of RGB input images, or some other set/subset of RGB input images.  Therefore, these limitations are indefinite.  In the interest of providing full examination on the merits, these limitations are interpreted as each referring to a set of RGB input images which includes some but not necessarily all of the images of the initially-recited plurality of RGB input images.
With respect to claims 3, 5, 8, 10, 13, and 15, these claims depend upon claims 1, 6, and 11, respectively, and inherit the deficiencies identified above with respect to claims 1, 6, and 11.  Therefore, these claims are rejected on the same basis as is identified above with respect to claims 1, 6, and 11.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1, 3, 6, 8, 11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Garg, Gaurav & Hegde, Srinidhi & Perla, Ramakrishna & Jain, Varun & Vig, Lovekesh & Hebbalaguppe, Ramya. (2019). DrawInAir: A Lightweight Gestural Interface Based on Fingertip Regression. In: Leal-Taixé L., Roth S. (eds) Computer Vision – ECCV 2018 Workshops. ECCV 2018. Lecture Notes in Computer Science, vol 11134. Springer, Cham. https://doi.org/10.1007/978-3-030-11024-6_15.  [retrieved on May 18, 2021].  Retrieved from the Internet:  https://link.springer.com/content/pdf/10.1007%2F978-3-030-11024-6_15.pdf (hereinafter referred to as “Garg”), in view of Dani, Meghal & Garg, Gaurav & Perla, Ramakrishna & Hebbalaguppe, Ramya.  (2018). Mid-Air Fingertip-Based User Interaction in Mixed Reality. 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 2018, pp. 174-178, doi: 10.1109/ISMAR-Adjunct.2018.00061. [retrieved on May 18, 2021].  Retrieved from the Internet:  https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8699224 (hereinafter referred to as “Dani”).
With respect to claims 1, 6, and 11, Garg teaches 
a system for classification of fingertip motion patterns into gestures, the system comprising: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces (e.g. page 230 first full paragraph, frugal HMDs/smartphones; i.e. where HMD/smartphone includes hardware processor executing instructions stored in memory), wherein the one or more hardware processors are configured by the instructions to perform a method, 
one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause an on-device classification of fingertip motion patterns into gestures (e.g. page 230 first full paragraph, frugal HMDs/smartphones; i.e. where HMD/smartphone includes hardware processor executing instructions stored in memory), and
the method, which is a processor implemented method for an on-device classification of fingertip motion patterns into gestures, the method comprising: 
receiving in real-time, in a Cascaded Deep Learning Model (CDLM) executed via the one or more hardware processors of a mobile communication device, a plurality of Red, Green and Blue (RGB) input images from a real-time feed or a video from an image capturing device, wherein each of the plurality of RGB input images comprises a hand gesture (e.g. page 230, first and second full paragraphs, frugal HMD/smartphone; neural network architecture; page 231 first full paragraph, neural network architecture uses only RGB image sequence, works in real-time; Fig. 2 and its caption, classifying images and subsequent frames of hand including fingertip into different gestures; page 234 second full paragraph, adaptable to videos/live feeds); 
wherein the CDLM comprises an object detector, a Fingertip regressor and a Bidirectional Long Short Term Memory (Bi-LSTM) Network, for accurate gesture recognition, and wherein the CDLM ported on the mobile communication device and removes hand gesture recognition framework dependence on a remote server (e.g. page 230, second full paragraph, neural network architecture comprising of a base CNN and DSNT layer followed by a Bi-LSTM; layer transforms heatmap from CNN to output spatial location of fingertip; page 231 first full paragraph, architecture is for efficient classification of user gestures, works in real-time, ported on mobile devices due to low memory footprint (i.e. and therefore removes dependence on server); page 231, Fig. 2 and its caption, DrawInAir comprises a Fingertip Regressor module which accurately localize the fingertip and Bi-LSTM network for classification);
detecting in real-time, using the Fingertip regressor comprised in the CDLM executed via the one or more hardware processors on the mobile communication device, a spatial location of a fingertip from the images (e.g. page 231 first full paragraph, system works in real-time, implemented on mobile device; Fig. 2 and its caption, fingertip regressor module localizes fingertip; page 233 second full paragraph, regressing over coordinates x, y of the fingertip),
wherein the spatial location of the fingertip from the hand candidates represents a fingertip motion pattern (e.g. page 232 final paragraph through page 233 first paragraph, classifying point gesture motion patterns into different gestures),
classifying in real-time, via the Bi-LSTM Network comprised in the CDLM executed via the one or more hardware processors on the mobile communication device, using the first coordinate and the second coordinate from the spatial location of the fingertip, the fingertip motion pattern into the one or more hand gestures (e.g. page 231 first full paragraph, system works in real-time, implemented on mobile device; page 234 first full paragraph, spatial location of fingertip fed to gesture classification network; employing Bi-LSTM for classification of gestures),
wherein the spatial location of the fingertip is detected based on a presence of a positive pointing-finger hand detection on a set of consecutive frames in the plurality of RGB input images, and wherein the presence of the positive pointing-finger hand detection is indicative of a start of the hand gesture (e.g. page 231 first full paragraph, neural network architecture uses only RGB image sequence, works in real-time; Fig. 2 and its caption, classifying images and subsequent frames of hand including fingertip into different gestures; page 234 second full paragraph, adaptable to videos/live feeds; page 234 first full paragraph, using only gestures that have pointing fingers).
Garg does not explicitly disclose:
detecting in real-time, using the object detector comprised in the CDLM executed via the one or more hardware processors on the mobile communication device, a plurality of hand candidate bounding boxes from the received plurality of RGB input images, wherein each of the plurality of hand candidate bounding boxes is specific to a corresponding RGB image from the received plurality of RGB input images, wherein each of the plurality of hand candidate bounding boxes comprises a hand candidate, and wherein each of the plurality of hand candidate bounding boxes comprising the hand candidate depicts a pointing gesture pose to be utilized for classifying into one or more hand gestures; 
downscaling in real-time, the hand candidate from each of the plurality of hand candidate bounding boxes to obtain a set of down-scaled hand candidates, wherein downscaling comprises downscaling the plurality of RGB input images comprising hand candidates to a specific resolution to reduce processing time without compromising on the quality of image features; 
that the spatial location of the fingertip is detected from each down-scaled hand candidate from the set of down-scaled hand candidates, where the spatial location of the fingertip from the set of down-scaled hand candidates represents a fingertip motion pattern;
wherein the Fingertip regressor is implemented based on a Convolutional Neural Network (CNN) architecture to localize a first coordinate and a second coordinate of the fingertip, wherein the CNN consists of two convolutional blocks and three fully connected layers to regress over the fingertip spatial location, wherein each of the two convolutional blocks have three convolutional layers followed by a max-pooling layer.
However, Dani teaches:
detecting in real-time, using the object detector comprised in the CDLM executed via the one or more hardware processors on the mobile communication device, a plurality of hand candidate bounding boxes from the received plurality of RGB input images, wherein each of the plurality of hand candidate bounding boxes is specific to a corresponding RGB image from the received plurality of RGB input images, wherein each of the plurality of hand candidate bounding boxes comprises a hand candidate, and wherein each of the plurality of hand candidate bounding boxes comprising the hand candidate depicts a pointing gesture pose to be utilized for classifying into one or more hand gestures (e.g. page 175 second column, fourth full paragraph, real-time gesture recognition; hand candidate detection given an RGB input image; page 176, Fig. 3 and its caption, along with first column, section 3.1, taking RGB input image and outputting hand candidate bounding box, detecting specific pointing hand pose, such as using Faster R-CNN, YOLOv2, or MobileNet);
downscaling in real-time, the hand candidate from each of the plurality of hand candidate bounding boxes to obtain a set of down-scaled hand candidates, wherein downscaling comprises downscaling the plurality of RGB input images comprising hand candidates to a specific resolution to reduce processing time without compromising on the quality of image features (e.g. page 175, Fig. 2 and its caption, smartphone sends downscaled video frames to gesture recognition framework; page 175 second column, third full paragraph, each frame is down-scaled to 640x480 resolution to achieve real-time performance by reducing computational time; page 176, Fig. 3 and its caption, cropping and resizing hand candidate to feed into fingertip regressor; page 176 second column, first paragraph, hand candidate bounding box is cropped and resized to 99x99 resolution; i.e. the smartphone sends the downscaled frames and therefore performs the down-scaling); 
that the spatial location of the fingertip is detected from each down-scaled hand candidate from the set of down-scaled hand candidates, where the spatial location of the fingertip from the set of down-scaled hand candidates represents a fingertip motion pattern (e.g. page 175 second column, fourth full paragraph, fingertip regressor accurately estimating fingertip spatial location given hand candidate detection from previous block as input; page 176, Fig. 3 and its caption, cropped and resized hand candidate fed to fingertip regressor block for accurately localizing fingertip; page 176, second column first and second paragraphs, regressing over x, y coordinates of the fingertip, determining continuous valued outputs corresponding to fingertip positions);
wherein the Fingertip regressor is implemented based on a Convolutional Neural Network (CNN) architecture to localize a first coordinate and a second coordinate of the fingertip, wherein the CNN consists of two convolutional blocks and three fully connected layers to regress over the fingertip spatial location, wherein each of the two convolutional blocks have three convolutional layers followed by a max-pooling layer (e.g. page 176 second column first full (i.e. second) paragraph, architecture consists of two convolutional blocks each with three convolutional layers followed by a max-pooling layer, and uses three fully connected layers to regress over two coordinate values of fingertip point at the last layer; determining continuous valued outputs corresponding to positions; page 177, Fig. 4 and its caption; fingertip regressor architecture as previously described).
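For illustration only, the layer arrangement recited in the limitation above (a cropped hand candidate resized to 99x99, passed through two convolutional blocks of three convolutional layers each with a max-pooling layer after each block, then three fully connected layers ending in two coordinate outputs) can be traced as a sequence of tensor shapes. The sketch below is not taken from Garg, Dani, or the application. The 3x3 kernels, unit stride, padding of 1, 2x2 pooling, channel counts (32, 64), and fully connected widths (256, 128) are assumptions chosen only to make the arithmetic concrete, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int


def conv(shape, out_channels, kernel=3, stride=1, padding=1):
    # Standard convolution output-size formula; kernel/stride/padding are assumed values.
    h = (shape.height + 2 * padding - kernel) // stride + 1
    w = (shape.width + 2 * padding - kernel) // stride + 1
    return Shape(out_channels, h, w)


def max_pool(shape, kernel=2, stride=2):
    # Pooling halves the spatial size (floor division); 2x2 pooling is an assumption.
    h = (shape.height - kernel) // stride + 1
    w = (shape.width - kernel) // stride + 1
    return Shape(shape.channels, h, w)


def conv_block(shape, out_channels):
    # One block as recited: three convolutional layers followed by a max-pooling layer.
    for _ in range(3):
        shape = conv(shape, out_channels)
    return max_pool(shape)


def fingertip_regressor_shapes(input_shape=Shape(3, 99, 99),
                               block_channels=(32, 64),   # assumed channel counts
                               fc_widths=(256, 128, 2)):  # last width 2 = (x, y)
    trace = [("input", input_shape)]
    shape = input_shape
    for i, channels in enumerate(block_channels, start=1):
        shape = conv_block(shape, channels)
        trace.append((f"block{i}", shape))
    trace.append(("flatten", shape.channels * shape.height * shape.width))
    for i, width in enumerate(fc_widths, start=1):
        trace.append((f"fc{i}", width))
    return trace


if __name__ == "__main__":
    for name, value in fingertip_regressor_shapes():
        print(name, value)
```

Under these assumed hyperparameters, the 99x99 input is reduced to 49x49 after the first block and to 24x24 after the second, and the final fully connected layer has width 2, one value per fingertip coordinate.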

With respect to claims 3, 8, and 13, Garg in view of Dani teaches all of the limitations of claims 1, 6, and 11, as previously discussed, and Garg further teaches wherein the step of classifying the fingertip motion pattern into one or more hand gestures comprises (i.e. the fingertip motion pattern is classified into one or more hand gestures by) applying a regression technique on the first coordinate and the second coordinate of the fingertip (e.g. page 231 first full paragraph, system works in real-time, implemented on mobile device; Fig. 2 and its caption, fingertip regressor module localizes fingertip; page 233 second full paragraph, regressing over coordinates x, y of the fingertip).
Claims 5, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Garg in view of Dani, further in view of Kumar et al. (US 20170161555 A1).
With respect to claims 5, 10, and 15, Garg in view of Dani teaches all of the limitations of claims 1, 6, and 11, as previously discussed.  Garg and Dani do not explicitly disclose wherein an absence of a positive pointing-finger hand detection on a set of consecutive frames in the plurality of RGB input images is indicative of an end of the hand gesture.  However, Kumar teaches wherein an absence of a positive pointing-finger hand detection on a set of consecutive frames in the plurality of RGB input images is indicative of an end of the hand gesture (e.g. paragraph 0033, single finger pointing gesture detected over plurality of images, performing drawing/tracing operation based on gesture; paragraph 0034, image 108 fed into system, five fingertips are all detected; when multiple fingertips are detected, the system no longer draws on the screen; in first frame of this gesture, system is unable to determine swipe right gesture is occurring, and therefore correctly predicts current frame 108 has no swipe-right gesture performed; image 110 fed into system, still too early to determine swipe right gesture; image 112 fed into system, no fingertips tracked/detected, detecting swipe right gesture performed based on context of previous frames; i.e. the system detects that the pointing gesture for drawing/tracing on the screen is ended when multiple and/or no fingers are detected, and, therefore, the drawing/tracing operation corresponding to the gesture is stopped).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Garg, Dani, and Kumar in front of him to have modified the teachings of Garg (directed to a lightweight gestural interface based on fingertip regression) and Dani (directed to fingertip based user interaction in mixed reality), to incorporate the teachings of Kumar (directed to improved virtual reality interaction utilizing deep learning) to include the capability to detect, after detecting a single finger pointing gesture corresponding to a drawing operation, that the single finger gesture is no longer detected and interpret this as an end of the gesture.  One of ordinary skill would have been motivated to perform such a modification in order to provide improved neural network object detection as described in Kumar (paragraph 0005).
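For illustration only, the start and end conditions discussed above (a positive pointing-finger hand detection on a set of consecutive frames indicating a start of the hand gesture, and an absence of such a detection on a set of consecutive frames indicating an end) can be sketched as a small state machine over per-frame detection results. This sketch is not taken from Garg, Dani, Kumar, or the application. The function name, the boolean per-frame input, and the threshold of three consecutive frames are assumptions made solely for the example.

```python
def segment_gestures(detections, min_run=3):
    """Split per-frame pointing-hand detections into gesture segments.

    detections: sequence of truthy/falsy values, one per frame.
    min_run:    assumed number of consecutive frames needed to confirm a
                start (positives) or an end (negatives).
    Returns a list of (start, end) frame indices, with end exclusive.
    """
    segments = []
    active = False   # True while a gesture is in progress
    run = 0          # length of the current confirming run
    start = None
    for i, positive in enumerate(detections):
        if not active:
            # Count consecutive positive detections to confirm a start.
            run = run + 1 if positive else 0
            if run == min_run:
                active = True
                start = i - min_run + 1
                run = 0
        else:
            # Count consecutive missing detections to confirm an end.
            run = run + 1 if not positive else 0
            if run == min_run:
                segments.append((start, i - min_run + 1))
                active = False
                run = 0
    if active:
        # The feed ended while a gesture was still in progress.
        segments.append((start, len(detections)))
    return segments


if __name__ == "__main__":
    frames = [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
    print(segment_gestures(frames))
```

In this sketch a single missed detection inside a gesture (frame 5 in the sample input) does not end the gesture, because the end is only confirmed after the assumed number of consecutive frames without a positive detection.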

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain,” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY STANLEY whose telephone number is (469)295-9105. The examiner can normally be reached on Mon-Thurs 8:00-5:00 CST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached on (571) 270-1104. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JEREMY L STANLEY/
Examiner, Art Unit 2179