Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1 and 10 recite receiving, extracting, determining, and analyzing a video to identify a first sign language gesture, steps that can also be performed mentally by a human. This judicial exception is not integrated into a practical application because the claims do not provide an improvement to the functioning of any technology, use any particular machine, or effect a transformation into a different state. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the limitation “by …gesture detection model” merely recites the words "apply it" (or an equivalent, “by”) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f).
The dependent claims fail to add any additional elements that are sufficient to amount to significantly more than the judicial exception and are therefore rejected as well.
Independent claim 19 recites receiving a request, obtaining a video, and generating and returning identified sign language subtitles, steps that can also be performed mentally by a human. This judicial exception is not integrated into a practical application because the claim does not provide an improvement to the functioning of any technology, use any particular machine, or effect a transformation into a different state. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the limitation “using a subtitle generator system” merely recites the words "apply it" (or an equivalent, “by”) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f).
The dependent claim fails to add any additional elements that are sufficient to amount to significantly more than the judicial exception and is therefore rejected as well.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-8, 10-13, 15-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (US 11,263,409) in view of Kelly (US Pub. 2022/0327961).  
With respect to claim 1, Zhang discloses a computer-implemented method comprising:
receiving an input video comprising a representation of one or more sign language gestures;
extracting landmark coordinates associated with a signer represented in the input video;
determining derivative information from the landmark coordinates; and
analyzing the landmark coordinates and the derivative information by at least one gesture detection model to identify [a first] sign language gesture, (see figure 9, 906, 908 and 914 and also see col. 3, lines 17-19 video data for the scene, col. 7, lines 22-35 for the details on the steps 906, 908 and 914), as claimed.
However, Zhang fails to explicitly disclose a gesture detection model to identify a first sign language gesture (emphasis added), as claimed.
Kelly, in the same field, teaches a signing detection embodiment (see paragraphs 0029-0030), i.e., a detection of signing corresponding to the “detection model to identify a first sign language gesture” (emphasis added), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the two references, as they are analogous art addressing the similar problem of sign language recognition using image analysis. The teaching of Kelly to detect the signing gesture can be incorporated into the Zhang system, as suggested in figure 9 of Zhang (for suggestion), and the modified system yields a starting point for the translation of the sign language (see paragraph 0001 of Kelly) for motivation.

With respect to claim 2, the combination of Zhang and Kelly further discloses iteratively processing frames of the input video to identify a plurality of sign language gestures (see Zhang figures 2 and 3 for acquiring the images and figure 9 for processing, i.e., iteratively processing), as claimed.

With respect to claim 3, the combination of Zhang and Kelly further discloses receiving, by a natural language processing (NLP) accumulator, a plurality of words or phrases corresponding to the plurality of sign language gestures;
identifying a timestamp associated with each of the plurality of words or phrases; and
generating subtitles based on the plurality of words or phrases and associated timestamps, (see Zhang figure 5, 522 sentence level translation with the time on X-axis showing the breaks in the words and 524 as subtitles based on the movements of the hands; and col. 4, lines 36-38), as claimed.  

With respect to claim 4, the combination of Zhang and Kelly further discloses wherein generating subtitles based on the plurality of words or phrases and associated timestamps, further comprises:
detecting a pause between identified gestures from the plurality of sign language gestures greater than a pause threshold; and identifying a sentence boundary based on the pause (see Zhang figure 5, 522, sentence-level translation with time on the X-axis showing the breaks in the words, and 524 as subtitles based on the movements of the hands, the sentence “I want to drink.”), as claimed.
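
For illustration only, the pause-based sentence boundary limitation recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Zhang, or Kelly: the function name `split_sentences`, the `(word, start, end)` tuple layout, and the 1.0 second default threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (word, start_time_s, end_time_s) for each identified gesture.
TimedWord = Tuple[str, float, float]


def split_sentences(words: List[TimedWord],
                    pause_threshold: float = 1.0) -> List[List[str]]:
    """Group words into sentences, starting a new sentence whenever the gap
    between one gesture's end and the next gesture's start exceeds the
    pause threshold (an assumed value, in seconds)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for word, start, end in words:
        # A pause longer than the threshold marks a sentence boundary.
        if prev_end is not None and current and start - prev_end > pause_threshold:
            sentences.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        sentences.append(current)
    return sentences
```

Under these assumptions, the words “I”, “want”, “to”, “drink” separated by short gaps would fall into one sentence, and a later word following a gap longer than the threshold would begin a new sentence.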

With respect to claim 6, the combination of Zhang and Kelly further discloses wherein analyzing the landmark coordinates and the derivative information by at least one gesture detection model to identify the first sign language gesture, further comprises:
setting a cooldown period associated with a gesture detection model that identified the first sign language gesture, wherein the cooldown period disables the gesture detection model until the cooldown period expires (see Zhang figure 5, 522, sentence-level translation with time on the X-axis showing the breaks, i.e., the “cooldown period,” in the words, and col. 4, lines 36-44 for small and big ASL signs; the breaks in the words are the cooldown time), as claimed.
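
For illustration only, the cooldown limitation recited in claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or the cited references: the class name `CooldownGate`, its method names, and the use of per-model names as dictionary keys are hypothetical.

```python
from typing import Dict


class CooldownGate:
    """Tracks, per gesture detection model, the time until which that model
    is disabled after it has identified a gesture (hypothetical helper)."""

    def __init__(self, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s
        self._disabled_until: Dict[str, float] = {}

    def is_enabled(self, model: str, now: float) -> bool:
        # A model with no recorded detection is always enabled.
        return now >= self._disabled_until.get(model, float("-inf"))

    def record_detection(self, model: str, now: float) -> None:
        # Disable only the model that made the detection until the cooldown expires.
        self._disabled_until[model] = now + self.cooldown_s
```

In this sketch a caller would check `is_enabled` before running a given model on a frame and call `record_detection` when that model identifies a gesture; other models remain unaffected.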

With respect to claim 7, the combination of Zhang and Kelly further discloses wherein each gesture detection model is a LSTM model trained to identify one or more gestures (see Zhang col. 3, lines 39-41, neural networks), as claimed.

With respect to claim 8, the combination of Zhang and Kelly further discloses wherein extracting landmark coordinates associated with a signer represented in the input video, further comprises:
extracting landmark coordinates from a plurality of consecutive frames of the input video (see Zhang col. 5, line 45 to col. 6, line 35 for using the coordinates of the hands and fingers to translate the sign language), as claimed.

Claims 10-13 and 15-17 are rejected for the same reasons as set forth in the rejections of claims 1-4 and 6-8 because claims 10-13 and 15-17 claim subject matter of similar scope to that claimed in claims 1-4 and 6-8, respectively.

With respect to claim 19, Zhang discloses a computer-implemented method comprising:
[receiving a request to generate subtitles for sign language content represented in a digital video, the request including at least a reference to the digital video];
obtaining the digital video; generating subtitles for the digital video using a subtitle generator system, the subtitle generator system including a plurality of moderately deep long short-term memory (LSTM) networks configured to identify dynamic gestures across multiple frames of the digital video, (see figure 5, col. 3 lines 39-41, neural networks, and Zhang figure 5, 522 sentence level translation with the time on X-axis showing the breaks in the words and 524 as subtitles based on the movements of the hands, the sentence “I want to drink.”); and returning at least a subtitle track for the digital video, (figure 5, 524 as subtitles based on the movements of the hands, the sentence “I want to drink.”), as claimed. 
However, Zhang fails to explicitly disclose receiving a request to generate subtitles for sign language content represented in a digital video, the request including at least a reference to the digital video, as claimed.  
Kelly, in the same field, teaches receiving a request to generate subtitles for sign language content represented in a digital video, the request including at least a reference to the digital video (see paragraphs 0029-0030, when a user signing is detected from a video call), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the two references, as they are analogous art addressing the similar problem of sign language recognition using image analysis. The teaching of Kelly to detect the signing gesture can be incorporated into the Zhang system, as suggested in figure 9 of Zhang (for suggestion), and the modified system yields a starting point for the translation of the sign language (see paragraph 0001 of Kelly) for motivation.

With respect to claim 20, the combination of Zhang and Kelly further discloses wherein generating subtitles for the digital video using a subtitle generator system, the subtitle generator system including a plurality of moderately deep long short-term memory (LSTM) networks configured to identify dynamic gestures across multiple frames of the digital video, further comprises:
extracting landmark coordinates associated with a signer represented in the digital video;
determining derivative information from the landmark coordinates; analyzing the landmark coordinates and the derivative information by at least one gesture detection model to identify a first sign language gesture, (see figure 9, 906, 908 and 914 and also see col. 3, lines 17-19 video data for the scene, col. 7, lines 22-35 for the details on the steps 906, 908 and 914);
iteratively processing frames of the digital video to identify a plurality of sign language gestures, (see Zhang figure 2 and 3, for acquiring the images and figure 9 for processing i.e. iteratively processing);
receiving, by a natural language processing (NLP) accumulator, a plurality of words or phrases corresponding to the plurality of sign language gestures; identifying a timestamp associated with each of the plurality of words or phrases; and generating subtitles based on the plurality of words or phrases and associated timestamps, (see Zhang figure 5, 522 sentence level translation with the time on X-axis showing the breaks in the words and 524 as subtitles based on the movements of the hands; and col. 4, lines 36-38), as claimed.  

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (US 11,263,409) in view of Kelly (US Pub. 2022/0327961) as applied to claim 8 above, and further in view of Infantino et al., “A framework for sign language sentence recognition by commonsense context.”
With respect to claim 5, the combination of Zhang and Kelly discloses all the elements as claimed and as rejected above in claim 1. Furthermore, the combination of Zhang and Kelly discloses extracting hand landmark coordinates using a hand landmark detector; and extracting pose landmark coordinates using a pose landmark detector (see Zhang figure 6, 906 hand shape information and 908 hand movement “pose”), as claimed.
However, they fail to disclose extracting face landmark coordinates using a face landmark detector, as claimed.   
Infantino, in the same field, teaches extracting face landmark coordinates using a face landmark detector (see figure 3, lip extraction, i.e., face landmarks extracted), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references, as they are analogous art addressing the similar problem of sign language recognition using image analysis. The teaching of Infantino to detect the signing gesture by facial landmarks, such as lip movements, can be incorporated into the Zhang system to yield a predictable result in translating the sign language (for motivation), as claimed.

Claim 14 is rejected for the same reasons as set forth in the rejections of claim 5 because claim 14 is claiming subject matter of similar scope as claimed in claim 5.  

Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (US 11,263,409) in view of Kelly (US Pub. 2022/0327961) as applied to claim 8 above, and further in view of Banerjee et al (US 11,307,667).  
With respect to claim 9, the combination of Zhang and Kelly discloses all the elements as claimed and as rejected above in claim 8. However, they fail to disclose wherein determining derivative information from the landmark coordinates, further comprises:
computing velocity data for the landmark coordinates based on the landmark coordinates from two of the plurality of consecutive frames of the input video; and computing acceleration data for the landmark coordinates based on the landmark coordinates from the plurality of consecutive frames of the input video, as claimed.
Banerjee, in the same field, teaches computing velocity data for the landmark coordinates based on the landmark coordinates from two of the plurality of consecutive frames of the input video; and computing acceleration data for the landmark coordinates based on the landmark coordinates from the plurality of consecutive frames of the input video (see col. 4, lines 5-15, where velocity and acceleration vectors are calculated for sign language models), as claimed.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references, as they are analogous art addressing the similar problem of sign language recognition using image analysis. The teaching of Banerjee to detect the signing gesture by calculating the velocity and acceleration can be incorporated into the Zhang system, as suggested in col. 10, line 54 (signing speed) of Zhang (for suggestion), and the modified system yields predictable results in translating the sign language for motivation.
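
For illustration only, the derivative information recited in claim 9 can be sketched with finite differences: velocity from two consecutive frames and acceleration from three. This is a minimal sketch under stated assumptions and is not taken from the application, Zhang, Kelly, or Banerjee: the function names, the flattened per-frame coordinate layout, and the constant frame interval `dt` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical layout: one frame is a flat sequence of landmark coordinates.
Frame = Sequence[float]


def velocity(prev: Frame, curr: Frame, dt: float) -> List[float]:
    """First-order finite difference between two consecutive frames."""
    return [(c - p) / dt for p, c in zip(prev, curr)]


def acceleration(f0: Frame, f1: Frame, f2: Frame, dt: float) -> List[float]:
    """Difference of the two successive velocities over three consecutive frames."""
    v01 = velocity(f0, f1, dt)
    v12 = velocity(f1, f2, dt)
    return [(b - a) / dt for a, b in zip(v01, v12)]
```

With `dt` set to the assumed frame interval (for example, 1/30 second for 30 frames per second video), the resulting values could be concatenated with the landmark coordinates as model input.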

Claim 18 is rejected for the same reasons as set forth in the rejections of claim 9 because claim 18 is claiming subject matter of similar scope as claimed in claim 9.  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIKKRAM BALI whose telephone number is (571)272-7415. The examiner can normally be reached Monday-Friday 7:00AM-3:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VIKKRAM BALI/Primary Examiner, Art Unit 2663