DETAILED ACTION
This action is responsive to the Application filed on 12/16/2020.  Claims 1-20 are pending in the case. Claims 1, 10, and 17 are the independent claims.
This action is non-final.
The instant application appears to be related by common assignee, inventor, and/or subject matter to US Patent Nos. 11,175,746 and 11,294,474.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement of References Cited By Applicant
As required by MPEP 609 (c), the Applicants’ submission of the Information Disclosure Statement(s) is/are acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. 
As required by MPEP 609 (c)(2), a copy of each PTOL-1449, initialed and dated by the Examiner, is attached to the instant office action.
Specification
The disclosure is objected to for the following informalities: on page 28, referring to FIG 7, the disclosure refers to "first setting 702" and "second setting 702"; however, FIG 7 shows first setting 702 (to enable context) and second setting 704 (to infer gesture).
Applicant’s assistance is required in identifying and correcting any deficiencies in the disclosure discovered during prosecution.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-9 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over WEBER (Pub. No.: US 2019/0171716 A1) in view of TARDIF (Pub. No.: US 2011/0301934 A1) further in view of NOWOZIN et al. (Pub. No.: US 2019/0392587 A1).
Regarding claim 1, WEBER teaches (or suggests) the first device, comprising: at least one processor; and storage accessible to the at least one processor and comprising instructions executable by the at least one processor to (e.g. in method 600 of figure 6: first endpoint of the video conference system, example endpoints shown in FIGs 4, 5; structural components shown in FIG 7):
receive one or more images from a camera, the one or more images indicating a first gesture being made by a person using a hand-based sign language, 
provide the one or more images to a sign language gesture classifier established at least in part by an artificial neural network (suggested at [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note that endpoint is configured [0038] preview should include only portions of the textual translation that the video conference system assesses do not meet a threshold confidence level of accuracy; note [0003] many techniques for automatic translation of sign language are known); 
receive, from the sign language gesture classifier, plural candidate first text words for the first gesture (suggested at [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note that endpoint is configured [0038] preview should include only portions of the textual translation that the video conference system assesses do not meet a threshold confidence level of accuracy; note [0003] many techniques for automatic translation of sign language are known);
use at least a second text word correlated to a second gesture different from the first gesture to select one of the candidate first text words for the first gesture (suggested at [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; where the entirety of the video stream (i.e. multiple gestures) may be translated or only a portion; note [0003] many techniques for automatic translation of sign language are known);
combine the second text word with the selected first text word for the first gesture to establish a text string (suggested at [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; where the entirety of the video stream (i.e. multiple gestures) may be translated or only a portion; note [0003] many techniques for automatic translation of sign language are known); and
provide the text string to an apparatus different from the first device ([0040] step 608 Display, on a display of the first endpoint of the video conference system, the entirety or the portion of the textual translation of the first message;  simply transmit the textual translation without changes (step 612); or [0041] the first endpoint may modify the textual translation of the first message based on the feedback (step 614) and then transmit a modified version of the textual translation of the first message to a second endpoint (step 616)).
As noted above, WEBER cannot be relied upon to expressly disclose at least part of the first gesture extending out of at least one respective image frame of the one or more images, and WEBER at best suggests, without expressly disclosing, the operational steps for translating the sign language gestures into a textual translation because WEBER states at [0003] many techniques for automatic translation of sign language are known, thus WEBER does not provide the explicit details.
TARDIF is a teaching example for using a machine learning based system to translate sign language gestures into text such as is suggested in WEBER. Broadly, (abstract) A capture device detects motions defining gestures and detected gestures are matched to signs. Successive signs are detected and compared to a grammar library to determine whether the signs assigned to gestures make sense relative to each other and to a grammar context. Each sign may be compared to previous and successive signs to determine whether the signs make sense relative to each other. TARDIF teaches provide the one or more images to a sign language gesture classifier…; receive, from the sign language gesture classifier, plural candidate first text words for the first gesture; use at least a second text word correlated to a second gesture different from the first gesture to select one of the candidate first text words for the first gesture; and combine the second text word with the selected first text word for the first gesture to establish a text string by relying on the operations in FIG 7 as briefly explained below:
[0092] FIG. 7 illustrates a method in accordance with the present technology for providing a sign language interpretation system based on motion tracking and gesture interpretation. Operations include [0094] (708) monitoring user action (710) detecting that a gesture has occurred [0095] (712) gestures are recognized, compared to known sign data, and when a possible match is found (714) an initial probability of recognition is assigned (including possible alternatives). These initial determinations are evaluated at (718) with respect to other signs in the stream, as well as other context information. [0096] corrections may be made (722, 724) if the sign is not correct.  [0097] Previous gesture (726) information is compared (728) to confirm the probability that the initial gesture was recognized. If so, the weight assigned to the initial sign is increased (736) and an output generated (734). [0099] if the previous gesture and sign are not confirmed (728), then other alternative signs are considered and, if appropriate, assigned, and the probability (732) is adjusted and output generated (724). [0100] The system continues to receive signs and these later signs can be used to determine whether previous signs were correctly assigned.
The operations of FIG 7 are implemented using, for example, the system components in FIG 2.
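For illustration only, the candidate-selection flow summarized above (initial probabilities per gesture, then adjustment against the previous and following signs) can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of TARDIF or of the claimed invention: the table COMPATIBLE_PAIRS (a stand-in for a grammar library), the boost factor, and the function names rescore and to_text are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a grammar library: adjacent word pairs that
# "make sense" relative to each other. Not taken from any cited reference.
COMPATIBLE_PAIRS = {("I", "drink"), ("drink", "water"), ("I", "eat")}


def rescore(candidates: List[Dict[str, float]], boost: float = 1.5) -> List[str]:
    """Select one word per gesture from its candidate words.

    Each gesture arrives with candidate words and initial probabilities.
    A candidate's weight is increased (by the assumed factor `boost`) when
    it fits the word already selected for the previous gesture, and again
    when it fits at least one candidate word of the next gesture.
    """
    selected: List[str] = []
    for i, options in enumerate(candidates):
        adjusted: Dict[str, float] = {}
        for word, prob in options.items():
            weight = prob
            # Context from the previous gesture's selected word.
            if selected and (selected[-1], word) in COMPATIBLE_PAIRS:
                weight *= boost
            # Context from the following gesture's candidate words.
            if i + 1 < len(candidates) and any(
                (word, nxt) in COMPATIBLE_PAIRS for nxt in candidates[i + 1]
            ):
                weight *= boost
            adjusted[word] = weight
        # Keep the highest-weighted candidate for this gesture.
        selected.append(max(adjusted, key=adjusted.get))
    return selected


def to_text(candidates: List[Dict[str, float]]) -> str:
    """Combine the selected words into a single text string."""
    return " ".join(rescore(candidates))


if __name__ == "__main__":
    stream = [
        {"I": 0.9},
        {"drive": 0.5, "drink": 0.45},  # ambiguous gesture
        {"water": 0.8},
    ]
    print(to_text(stream))
```

In this sketch the second gesture initially favors "drive", but the neighboring words raise the weight of "drink", so the selected word changes once context is considered.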
Accordingly, it would have been obvious to one having ordinary skill in graphical user interfaces before the effective filing date of the claimed invention, having the teachings of WEBER and TARDIF before them, to have combined WEBER (teaching a use of sign language gesture translation, relying on using known translation techniques) and TARDIF (teaching specific operations for a sign language translation technique) with a reasonable expectation of success, the combination motivated by the suggestion in WEBER as explained above.
WEBER in view of TARDIF does not explicitly describe that the sign language gesture classifier is established at least in part by an artificial neural network, nor does WEBER in view of TARDIF explicitly describe at least part of the first gesture extending out of at least one respective image frame of the one or more images.
NOWOZIN is broadly directed to (abstract) a system to predict a location of a feature point of an articulated object from a plurality of data points relating to the articulated object of which some possess and some are missing 2D location data. The problem that NOWOZIN is solving is [0002] if part of a body to be detected is partially obscured by another part of the body (self-occlusion) or by an additional object, or because the person is partially outside a field of view of a camera, then feature detection can fail and, consequentially, additional computation that relies on the feature detection can fail. The prediction technique taught in NOWOZIN is [0025] suitable for use with known systems for identifying and tagging features such as trained classifier that labels image elements as being one of a plurality of possible features, a classifier that uses depth camera data depicting an articulated object to compute 2D feature positions of the articulated object for a plurality of specified feature points (in other words, gesture detection, such as might be used for sign language). Further, the gesture recognition system uses a neural network with a number of different layers (see e.g. [0042, 0053, 0071]).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of WEBER-TARDIF and NOWOZIN before them, to have combined WEBER-TARDIF (teaching a system for recognizing and translating sign language gestures in a video conference) and NOWOZIN (teaching an improvement for gesture recognition that predicts location information when at least part of the gesture is partially outside a field of view of a camera) by applying the technique of NOWOZIN to the sign language gesture recognition of WEBER-TARDIF with a reasonable expectation of success, the combination resulting in recognizing the sign language gesture when at least part of the first gesture extends out of at least one respective image frame of the one or more images, where the images are provided to a gesture classifier established at least in part by an artificial neural network. The combination is motivated by the need to improve gesture recognition and prevent failure in properly recognizing the gesture.
Regarding dependent claim 2, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein at least part of the first gesture extends out of each respective image frame of the one or more images (the problem being solved in NOWOZIN with general gesture recognition, applied to the sign language gesture recognition taught in WEBER-TARDIF as explained above).
Regarding dependent claim 3, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein natural language understanding is executed to select the selected first text word from the candidate first text words using the second text word (the process in TARDIF FIG 7 for determining whether the output (word or character) for current gesture is correct within the context of the previous gesture output and the next gesture output; the technique relies on grammar library, thus teaching “natural language understanding” under the breadth of the term; note FIG 2, gesture library 455 results passed to lexicon/grammar matcher 195 with lexicon library 193 and grammar 185, output generator 188).
Regarding dependent claim 4, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the sign language gesture classifier is configured for receiving as input images of respective gestures and providing as output one or more respective text words corresponding to respective gestures from the input (the broad teachings of WEBER, using the sign language gesture recognition of TARDIF as explained in claim 1).
Regarding dependent claim 5, incorporating the rejection of claim 4, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the sign language gesture classifier uses a database of image frames corresponding to respective gestures to provide the output (taught in TARDIF [0095] (712) gestures are recognized, compared to known sign data).
Regarding dependent claim 6, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches the instructions are further executable to: 
use at least the second text word and a third text word correlated to a third gesture different from the first and second gestures to select one of the candidate first text words for the first gesture, wherein the second gesture as indicated in images from the camera was gestured before the first gesture and wherein the third gesture as indicated in images from the camera was gestured after the first gesture (as explained in TARDIF [0100] The system continues to receive signs and these later signs can be used to determine whether previous signs were correctly assigned); and 
combine the second and third text words with the selected first text word for the first gesture to establish the text string, the text string comprising the second text word placed before the first text word and comprising the third text word placed after the first text word (WEBER generates text for the sequence of gestures using, as explained, the operations in TARDIF; note also that TARDIF FIG 12 shows a string of words based on a sequence of gestures).
Regarding dependent claim 7, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the apparatus is a display controlled by the first device (WEBER: the first endpoint includes display where text is displayed and feedback solicited), wherein the text string is presented on the display (WEBER [0040] step 608 Display, on a display of the first endpoint of the video conference system, the entirety or the portion of the textual translation of the first message), and wherein the first device receives the one or more images from a third device different from the first device and apparatus (e.g. WEBER from the camera associated with the endpoint).
Regarding dependent claim 8, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the first device is a server, wherein the apparatus is an end-user device, and wherein the one or more images are received from a third device different from the first device and apparatus (alternative interpretation of WEBER, where the translation is performed not at the user’s endpoint itself, but at a server, see e.g. [0006] server-based translation; [0019] part of video conferencing system; [0025] application server 214 (e.g., a user experience engine) facilitate automated sign language-to-text and/or sign language-to-speech translation; images are received from a camera associated with the first endpoint (third device different from both the server and the receiver) and results are provided to a second endpoint (end-user device)).
Regarding dependent claim 9, incorporating the rejection of claim 1, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches the first device comprising the camera (WEBER first endpoint includes camera), wherein the first device is a first end-user device (the first endpoint), and wherein the apparatus is a second end-user device (after feedback, translated text sent to second endpoint; see WEBER [0040]).
Regarding claim 17, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, similarly teaches the at least one computer readable storage medium (CRSM) that is not a transitory signal, the computer readable storage medium comprising instructions executable by at least one processor to (e.g. the storage of the first device of claim 1; taught in WEBER first endpoint of the video conference system, example endpoints shown in FIGs 4, 5; structural components shown in FIG 7; relying on method 600 in FIG 6): 
receive one or more images at a first device, the one or more images indicating a first gesture being made by a person using a sign language (taught in WEBER [0039] step 604 Capture, by a video camera of the first endpoint, a video stream of the first message expressed in sign language by the user), at least part of the first gesture extending out of at least one respective image frame of the one or more images (WEBER in view of the improvement taught by NOWOZIN for predicting missing 2D data when body part is outside field of view of camera; see discussion claim 1);
provide the one or more images to a gesture classifier (suggested at WEBER [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note WEBER [0003] many techniques for automatic translation of sign language are known; specific example of gesture classifier and operations for translating sign language taught in TARDIF FIG 2 and FIG 7 as explained in claim 1);
receive, from the gesture classifier, plural candidate first text words for the first gesture (suggested at WEBER [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note WEBER [0003] many techniques for automatic translation of sign language are known; specific example of gesture classifier and operations for translating sign language taught in TARDIF FIG 2 and FIG 7 as explained in claim 1);
use context determined from at least a second text word that has been correlated to a second gesture different from the first gesture to select one of the candidate first text words for the first gesture (suggested at WEBER [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note WEBER [0003] many techniques for automatic translation of sign language are known; specific example of natural language understanding of sequence of gestures and operations for translating sign language taught in TARDIF FIG 2 and FIG 7 as explained in claim 1);
combine the second text word with the selected first text word to establish a text string (suggested at WEBER [0039] step 606 Automatically translate the video stream of the first message expressed in sign language into the textual translation of the first message; note WEBER [0003] many techniques for automatic translation of sign language are known; specific example of natural language understanding of sequence of gestures and operations for translating sign language taught in TARDIF FIG 2 and FIG 7 as explained in claim 1); and
provide the text string to an apparatus (see at least WEBER [0040] step 608 Display, on a display of the first endpoint of the video conference system, the entirety or the portion of the textual translation of the first message; simply transmit the textual translation without changes (step 612); or [0041] the first endpoint may modify the textual translation of the first message based on the feedback (step 614) and then transmit a modified version of the textual translation of the first message to a second endpoint (step 616)).
Regarding dependent claim 18, incorporating the rejection of claim 17, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the gesture classifier is established at least in part by a trained artificial neural network (ANN) (using the gesture classifier of NOWOZIN which is explicitly described as a trained neural network (e.g. [0025] trained classifier, improved by [0028-0029] conditional variational autoencoder 20 of FIG 2), as the gesture recognizer 190 in TARDIF FIG 2; note TARDIF [0042] explains details of skeletal extraction and motion tracking are described in an earlier reference), the ANN being trained prior to the gesture classifier outputting the plural candidate first text words (inherently, a trained system cannot be used until at least some training has been completed prior to its use), the ANN being trained using labeled sample image frames indicating various gestures in the sign language (TARDIF [0041] Motions and gesture components are translated into gestures which are matched against a library 193 of known signs which are equivalent to gestures, relying on NOWOZIN to teach the underlying trained neural network for recognizing gestures generally).
Regarding dependent claim 19, incorporating the rejection of claim 17, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein natural language understanding is executed to determine the context from the second text word (the process in TARDIF FIG 7 for determining whether the output (word or character) for current gesture is correct within the context of the previous gesture output and the next gesture output; the technique relies on grammar library, thus teaching “natural language understanding” under the breadth of the term; note FIG 2, gesture library 455 results passed to lexicon/grammar matcher 195 with lexicon library 193 and grammar 185, output generator 188).
Regarding dependent claim 20, incorporating the rejection of claim 17, WEBER-TARDIF-NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the instructions are further executable to: execute the gesture classifier to output the candidate first text words (relying on TARDIF FIG 7 as previously explained, words are presented to user for feedback at (722); final output is generated at (734) and if needed altered at (744), output can be displayed to user as in FIG 12; note that WEBER also provides output of text string to the first and/or second endpoints as previously noted), wherein the gesture classifier uses a database of image frames indicating various gestures to identify the candidate first text words (relying on TARDIF FIG 2 to implement the operations of FIG 7, as suggested by WEBER (606); TARDIF operation [0095] (712) gestures compared to known sign library, e.g. gesture library (455) of FIG 2).
Claims 10-14 are rejected under 35 USC 103 as unpatentable over TARDIF in view of NOWOZIN.
Regarding claim 10, TARDIF teaches the method comprising (e.g. FIG 7, using device of FIG 2; discussed in detail in the rejection of claim 1, used with FIGs 11 and 12):
providing, at a first device, at least one image showing a first gesture into a gesture classifier to receive, as output from the gesture classifier, a first text word corresponding to the first gesture (illustrated in FIGs 11, 12; gesture is recognized using operations in FIG 7, text word is determined and displayed as part of sequence of sign language gestures);
providing, at the first device, at least one image showing a second gesture into the gesture classifier to receive, as output from the gesture classifier, a second text word corresponding to the second gesture, the second gesture being different from the first gesture, the second text word being different from the first text word (operations of FIG 7 used with a sequence of gestures, thus a first gesture and a second gesture; broadly determines whether second (subsequent) gesture output is appropriate in view of the first (previously-received) gesture output); and
providing, to an apparatus different from the first device, a text string indicating the first text word and the second text word (in FIGs 11, 12 the images are provided on a display apparatus (e.g. a television), which is distinct from the device which is performing the gesture analysis (e.g. device 12, see FIG 2)).
TARDIF cannot be relied upon to expressly disclose the at least one image partially but not fully showing a second gesture. As explained in the rejection of claim 1 above, NOWOZIN is directed to improving gesture recognition with a neural network when the person is partially outside a field of view of a camera (which can cause errors in gesture interpretation) by predicting the location of the gesture 2D points (e.g. which are outside the field of view).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of TARDIF and NOWOZIN before them, to have improved the sign language gesture recognition in TARDIF using the prediction technique of NOWOZIN with a reasonable expectation of success, the combination resulting in providing, at the first device, at least one image partially but not fully showing a second gesture into the gesture classifier, the combination motivated by the improvement to gesture recognition generally taught in NOWOZIN (e.g. using prediction will decrease incorrect or failing results).
Regarding dependent claim 11, incorporating the rejection of claim 10, TARDIF further teaches wherein the gesture classifier determines plural candidate second text words for the second gesture as the output (TARDIF FIG 7 [0095] (714, 716) possible matches with probabilities for gesture, gestures may have alternative possible meanings; note this is prior to (726) compare with previous sign), and wherein the first device uses the first text word to select one of the candidate second text words to use in the text string (TARDIF FIG 7 after (726) comparing with previous sign, [0097] confirm probability for initial sign and (724) generate output; such as is displayed on FIG 12).
Regarding dependent claim 12, incorporating the rejection of claim 11, TARDIF further teaches wherein the first device executes natural language understanding to use the first text word to select one of the candidate second text words based on context determined from the first text word (the process in TARDIF for determining whether the output (word or character) for current gesture is correct within the context of the previous gesture output and the next gesture output; the technique relies on grammar library, thus teaching “natural language understanding” under the breadth of the term; note FIG 2, gesture library 455 results passed to lexicon/grammar matcher 195 with lexicon library 193 and grammar 185, output generator 188).
Regarding dependent claim 13, incorporating the rejection of claim 10, TARDIF in view of NOWOZIN, combined at least for the reasons discussed above, further teaches wherein the gesture classifier extrapolates additional portions of the second gesture extending out of the at least one image partially but not fully showing the second gesture (the technique of NOWOZIN for predicting data points which are missing from the gesture image; see at least (abstract), [0006][0024-0025][0027]), and wherein the gesture classifier uses the extrapolation to output plural candidate second text words (using the improved gesture recognition taught by NOWOZIN for interpreting the sign language gestures in TARDIF FIG 7 in order to determine the sign translation probabilities and provide output).
Regarding dependent claim 14, incorporating the rejection of claim 10, TARDIF further teaches wherein the first gesture is correlated to the first text word according to a sign language corresponding to a first written language (FIG 7 implemented using device of FIG 2 which includes lexicon library 193 and grammar 185 for first written language (textual output to be generated), e.g. English as can be seen in FIG 12), and wherein the second gesture is correlated to the second text word according to the sign language (correlation illustrated by matching gesture to word (sign) probability in at least FIG 7 sequence (710,712,714,716); as well as correlation of other gestures to entire sequence of gestures at (726-734) and (738-742)).
Claim 15 is rejected under 35 USC 103 as unpatentable over TARDIF in view of NOWOZIN, further in view of YAMAMOTO et al. (Pub. No.: US 2002/0111794 A1).
Regarding dependent claim 15, incorporating the rejection of claim 14, TARDIF does not appear to expressly disclose wherein the text string provided to the apparatus comprises a third text word in a second written language that corresponds to the first text word in the first written language, the second written language being different from the first written language, and wherein the text string provided to the apparatus further comprises a fourth text word in the second written language that corresponds to the second text word in the first written language because while TARDIF acknowledges that [0086] there are hundreds of sign languages corresponding to different spoken languages, the disclosure of TARDIF is primarily with respect to translating [0087] American Sign language to English.
YAMAMOTO is similarly directed to a method for translating sign language gestures using the dataflow of FIG 8 [0107-0113]. Note that sign language recognition 53 unit receives analyzed images from 52 and under the control of text conversion controller 29 converts the sign language to text 26 based on a text database 25, which is then output 27. Further note that [0107] makes clear that FIG 8 is an apparatus which converts a sign-language image to text data, and performs conversion processing of the text data according to the above-noted control commands (that is, conversion processing of other embodiments). Accordingly, the teachings of [0040] The text analyzer 11, if necessary, can convert (translate) the input text data to a prescribed language; and [0057] can cause the text analyzer 11 to change a word in the input text to an arbitrary dialect, or to a different language entirely (that is, to perform translation) are presumed to be a “conversion process” which may be applied to the text data generated by the sign language translation process.
Applying a translation technique to text data obtained from sign language gesture analysis is thus known in the art, and applying it yields the predictable result that the text string provided to the apparatus comprises a third text word in a second written language that corresponds to the first text word in the first written language, the second written language being different from the first written language, and that the text string provided to the apparatus further comprises a fourth text word in the second written language that corresponds to the second text word in the first written language, as required by the claim.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of TARDIF in view of NOWOZIN and YAMAMOTO before them, to have applied the known technique of text translation for sign language text data taught in YAMAMOTO to the sign language text string generated by TARDIF in view of NOWOZIN, the results being clearly predictable to one of ordinary skill in the art (one or more words of the sign language text is translated to a second language), the ability to apply the technique suggested at least by YAMAMOTO itself (applying text conversion process to the text generated by sign language), and arrived at the claimed invention. The combination is motivated at least by the desired goal of YAMAMOTO in [0020-0021] (information exchange that enables rich and enjoyable expression of emotions and, in a case in which information transmission is done, smooth communication is enabled without an increase in the amount of information transmitted, by extracting information and converting character data based on the extracted information).
Claim 16 is rejected under 35 USC 103 as unpatentable over TARDIF in view of NOWOZIN, further in view of WEBER.
Regarding dependent claim 16, incorporating the rejection of claim 10, TARDIF, while teaching a remote computer (server) in communication with the user’s device [0081], does not appear to expressly disclose wherein the first device {that receives the gesture images and generates the text words} comprises a server, and wherein the apparatus {that receives the text from the first device} comprises a second device of an end-user.
WEBER, as discussed in the rejection of claim 1, is directed to the use of sign language translation in a video conference setting. Within this context, WEBER executes the method of FIG 6 and discloses [0025] that the reference relies on known techniques for translating sign language to text and/or speech, but introduces such activities in the context of video conferences. Along with providing these facilities in this context, and preferably via cloud-based server translation components rather than endpoints, the present invention also introduces user interfaces for use at video conference endpoints used by hearing-impaired users.
Thus, WEBER may be relied upon to teach wherein the first device {that receives the gesture images and generates the text words} comprises a server, and wherein the apparatus {that receives the text from the first device} comprises a second device of an end-user (e.g. cloud-based server providing translation services for communication between video conference endpoints; see also [0040] first endpoint can provide translation for feedback purposes; note also [0041] transmit modified or unmodified version of textual translation to second endpoint [0042] through cloud-based server suggested at [0042] which may also synthesize audio translation).


It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).


CONCLUSION
The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
US 2019/0340426 A1 (RANGARAJAN) systems and methods for interpreting gesture(s) and/or sign(s) using a machine-learned model; information regarding the interpreted gesture(s) and/or sign(s) is provided (e.g., displayed as visual text and/or an audible output) to the second user (FIG 4); includes training system (FIG 5); and different target languages [0028].
US 10,176,366 B1 (MAXWELL) video relay service for providing automatic translation services during a real-time communication session, including translation of human sign language.
US 2017/0277684 A1 (DHARMARJAN) sign language communication system
US 10,304,208 B2 (CHANDLER) automated gesture identification (sign language) using neural networks
US 2019/0138607 A1 (ZHANG) word and sentence level sign language translation
US 2020/0005673 A1 (LIN) word-by-word interpretation with neural network and language model
US 2020/0167556 A1 (KAUR) real-time gesture recognition including sign language (see cover image, e.g. FIG 1B)
US 2021/0397266 A1 (GUPTA) much more comprehensive neural network for determining any sign language word
US 2013/0304451 A1 (SARIKAYA) another teaching for translating sign language into different target languages using machine translation
US 2019/0251344 A1 (MENEFEE) [0013] facilitate communication between a sign language speaker and a non-sign language speaker. This can include computer implemented methods for visual language interpretation and spoken language interpretation. In some embodiments, the method includes receiving video data of a sign language speaker signing and displaying the video data on a primary display. The video data of the sign language speaker signing can be translated into translation text and displayed on the primary display for viewing by a user.



	

	
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is (571)270-3771. The examiner can normally be reached Mon-Fri 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Amy M Levy/Primary Examiner, Art Unit 2173