DETAILED ACTION

Introduction
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted on 07/13/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments/Amendments
3.	With respect to the claim rejections under 35 U.S.C. § 102/103, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
	
Claim Objections
4.	Claims 3, 5, and 20 are objected to because of the following informalities: typographical errors. The phrase “phoneme/sign fragment” in claims 3, 5 and 20 should be changed to “phoneme” as claimed in claim 1. Appropriate correction is required.
 	Claim 12 is objected to because of the following informalities: typographical errors. The period (.) at the end of claim 12 is missing. Appropriate correction is required.

Claim Rejections - 35 USC § 102
5.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

6.	Claims 1-8, 10, 12-13, 15-22, 24, 26, 27, 29-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhou et al. (US 2016/0307469 A1).

 	With respect to Claim 1, Zhou et al. disclose 
 	A method of translating sign language utterances into a target language, comprising: 
 	receiving motion capture data (Zhou et al. Fig. 9 element 904 Receive dominant hand and non-dominant hand input corresponding to a sequence of signs via user interface device); 
extracting features in parallel for each group of features of a plurality of groups of features on the basis of an independent temporal segmentation of the motion capture data for each group of features (Zhou et al. Fig. 9 element 908 Extract features from dominant and non-dominant hand inputs); 
producing phonemes based on the extracted features (Zhou et al. [0027] As used herein, the terms “sign phoneme” or more simply “phoneme” are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language. Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term “sign” refers to a word or other unit of language that is formed from one or more sign phoneme, Fig. 9 element 912);
 	producing a plurality of sign sequences from the phonemes (Zhou et al. Fig. 9 elements 912-920 Identify start of sequence from features using a start/end model stored in memory, Identify next sign in sequence from features using a sign model using a sign model stored in memory, Identify next transition or end of sequence using transition model stored in memory and start/end model stored in memory); 
 	parsing the sign sequences to produce grammatically parsed sign utterances (Zhou et al. Fig. 9 element 925 Identify words corresponding to identified sign using language model stored in memory, see paragraphs [0059 and 0060]); 
 	translating the grammatically parsed sign utterances into grammatical representations in the target language (Zhou et al. [0031] The sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences, [0060] To ensure a proper grammatical output, the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output); and 
 	generating output utterances in the target language based upon the grammatical representations (Zhou et al. [0061] the processor 104 generates synthesized speech for the words in the output using a loudspeaker or other audio output device 136. In this mode, the system 100 can produce audio output for recipients who can hear but are unable to understand the signs from the user 102.)

 	With respect to Claim 2, Zhou et al. disclose 
 	wherein confidence values are produced for each generated output utterance (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057].)  

 	With respect to Claim 3, Zhou et al. disclose 
 	wherein producing phonemes from motion capture data includes producing a confidence value for each produced phoneme/sign fragment (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057].)  

 	With respect to Claim 4, Zhou et al. disclose 
 	wherein producing phonemes includes producing a plurality of segments as time intervals matching the phonemes, where these intervals of the segments may overlap (Zhou et al. [0057] The processor 104 uses the same sets of extracted features for recognition using the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as sets of FR and GA features for the dominant hand and GA features for the non-dominant hand or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment. As indicated in FIG. 1, the HMMs 116-124 are linked to each other so after detection of the start of a sequence, the processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the HMM 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence, see paragraph [0058].)

 	With respect to Claim 5, Zhou et al. disclose  
wherein producing phonemes and their intervals includes determining of a set of possible succeeding phoneme/sign fragment for each phoneme/sign fragment (Zhou et al. [0057] The processor 104 uses the same sets of extracted features for recognition using the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as sets of FR and GA features for the dominant hand and GA features for the non-dominant hand or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment. As indicated in FIG. 1, the HMMs 116-124 are linked to each other so after detection of the start of a sequence, the processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the HMM 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence, see paragraph [0058].) 

 	With respect to Claim 6, Zhou et al. disclose  
 	wherein producing sign sequences from the phonemes includes matching potential paths in a graph of phonemes to each sign in each sign sequence (Zhou et al. [0058] The processor 104 identifies a path using the HMM network 300 and a maximum a posteriori (MAP) estimation process with a Bayes decision formulation to identify the sign phonemes and sequences of signs in the trained HMMs that have the greatest likelihood of matching the extracted features from the input data to determine if a set of features corresponds to a transition between signs or the end of a phrase and determines the transition or end of phrase, see paragraphs [0027, 0035, 0038, 0046].)

 	With respect to Claim 7, Zhou et al. disclose  
 	wherein producing grammatically parsed sign utterances includes producing a grammatical context and using the grammatical context of previous utterances (Zhou et al. [0027] Many signed words can use the same phoneme or different phonemes more than once in a sequence to form a single word or express an idea in the context of a larger sentence or phrase. As used herein, the term “transition” refers to the movements of the hands that are made between individual signs, such as between sequences of sign phonemes that form individual words or between the phonemes within a single sign. Unlike a sign phoneme, a transition movement is not a typical unit of a sign language that has independent meaning, but transition movements still provide important contextual information to indicate the end of a first word or phoneme and the beginning of a second word or phoneme, see paragraph [0046].)

 	With respect to Claim 8, Zhou et al. disclose  
 	wherein producing grammatically parsed sign utterances includes producing a confidence value based on the confidences of the signs, the confidence of the parsing and confidence of the parse matching a grammatical context for each parse of each sign sequence (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057, 0058].)  
  
 	With respect to Claim 10, Zhou et al. disclose  
 	further comprising detecting the end of a sign language utterance before parsing the sign sequence to produce a grammatically parsed sign utterance (Zhou et al. Fig. 9 element 912, see paragraphs [0027, 0031, 0033, 0035].)

 	With respect to Claim 12, Zhou et al. disclose 
 	wherein the motion capture data includes data captured using marked gloves used by a user to produce the sign language utterance (Zhou et al. Fig. 5, [0037] As described above, in one embodiment the system 100 uses gloves with sensors that record acceleration and hand angle position information for the dominant and non-dominant hands of the user. FIG. 5 depicts examples of gloves 500 that are worn during the process 800 and during subsequent sign language recognition processing. The gloves 500 include multiple sensors, such as micro-electromechanical systems (MEMS) sensors that record the altitude, orientation, and acceleration of different portions of the dominant and non-dominant hands of the user as the user performs hand movements/postures to form the signs in the predetermined sequence. FIG. 6 depicts a more detailed view of one embodiment of the sensors 600 in a gloves for a right hand, see paragraphs [0021, 0029, 0030].)
	
 	With respect to Claim 13, Zhou et al. disclose 
 	wherein user specific parameter data (Zhou et al. [0032] During operation, a user 102 provides sign language input to the system 100 using the input device 132. In a training mode, the user 102 provides sign inputs to the input devices 132 for one or more predetermined sequences of signs that the processor 104 stores with the training data 130 in the memory 108) is used for one of: 
 	producing phonemes (Zhou et al. [0027] As used herein, the terms “sign phoneme” or more simply “phoneme” are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language. Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term “sign” refers to a word or other unit of language that is formed from one or more sign phoneme, Fig. 9 element 912); 
 	producing a plurality of sign sequences (Zhou et al. Fig. 9 elements 912-920); 
 	parsing these sign sequences (Zhou et al. Fig. 9 element 925); and 
 	translating the grammatically parsed sign utterances (Zhou et al. [0031] The sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences, [0060] To ensure a proper grammatical output, the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output.)  

	With respect to Claim 15, Zhou et al. disclose 
 	A system configured to translate sign language utterances into a target language, comprising: 
 	an input interface configured to receive motion capture data (Zhou et al. Fig. 9 element 904 Receive dominant hand and non-dominant hand input corresponding to a sequence of signs via user interface device); 
 	a memory (Zhou et al. [0016] a memory, and a processor); and 
 	a processor in communication with the input interface and the memory (Zhou et al. see paragraphs [0014-0016]), the processor being configured to: 
 	extracting features in parallel for each group of features of a plurality of groups of features on the basis of an independent temporal segmentation of the motion capture data for each group of features (Zhou et al. Fig. 9 element 908 Extract features from dominant and non-dominant hand inputs);  
 	produce phonemes based on the extracted features (Zhou et al. [0027] As used herein, the terms “sign phoneme” or more simply “phoneme” are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language. Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term “sign” refers to a word or other unit of language that is formed from one or more sign phoneme, Fig. 9 element 912); 
 	produce a plurality of sign sequences from the phonemes (Zhou et al. Fig. 9 elements 912-920 Identify start of sequence from features using a start/end model stored in memory, Identify next sign in sequence from features using a sign model using a sign model stored in memory, Identify next transition or end of sequence using transition model stored in memory and start/end model stored in memory); 
 	parse the sign sequences to produce grammatically parsed sign utterances (Zhou et al. Fig. 9 element 925 Identify words corresponding to identified sign using language model stored in memory, see paragraphs [0059 and 0060]); 
 	translate the grammatically parsed sign utterances into grammatical representations in the target language (Zhou et al. [0031] The sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences, [0060] To ensure a proper grammatical output, the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output); and 
 	generate output utterances in the target language based upon the grammatical representations (Zhou et al. [0061] the processor 104 generates synthesized speech for the words in the output using a loudspeaker or other audio output device 136. In this mode, the system 100 can produce audio output for recipients who can hear but are unable to understand the signs from the user 102.)

 	With respect to Claim 16, Zhou et al. disclose 
 	wherein confidence values are produced for each generated output utterance (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057].)  

 	With respect to Claim 17, Zhou et al. disclose
 	wherein producing phonemes from motion capture data includes producing a confidence value for each produced phoneme (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057].)  

 	With respect to Claim 18, Zhou et al. disclose 
 	wherein producing phonemes includes producing a plurality of segments as time intervals matching the phonemes, where these intervals of the segments may overlap (Zhou et al. [0057] The processor 104 uses the same sets of extracted features for recognition using the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as sets of FR and GA features for the dominant hand and GA features for the non-dominant hand or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment. As indicated in FIG. 1, the HMMs 116-124 are linked to each other so after detection of the start of a sequence, the processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the HMM 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence, see paragraph [0058].)

 	With respect to Claim 19, Zhou et al. disclose 
 	wherein producing phonemes and their intervals includes determining of a set of possible succeeding phonemes for each phoneme (Zhou et al. [0057] The processor 104 uses the same sets of extracted features for recognition using the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as sets of FR and GA features for the dominant hand and GA features for the non-dominant hand or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment. As indicated in FIG. 1, the HMMs 116-124 are linked to each other so after detection of the start of a sequence, the processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the HMM 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence, see paragraph [0058].) 
 
 	With respect to Claim 20, Zhou et al. disclose 
 	wherein producing sign sequences from the phonemes/sign fragments includes matching potential paths in a graph of phonemes to each sign in each sign sequence (Zhou et al. [0058] The processor 104 identifies a path using the HMM network 300 and a maximum a posteriori (MAP) estimation process with a Bayes decision formulation to identify the sign phonemes and sequences of signs in the trained HMMs that have the greatest likelihood of matching the extracted features from the input data to determine if a set of features corresponds to a transition between signs or the end of a phrase and determines the transition or end of phrase, see paragraphs [0027, 0035, 0038, 0046].)
 
 	With respect to Claim 21, Zhou et al. disclose 
 	wherein producing grammatically parsed sign utterances includes producing a grammatical context and using the grammatical context of previous utterances (Zhou et al. [0027] Many signed words can use the same phoneme or different phonemes more than once in a sequence to form a single word or express an idea in the context of a larger sentence or phrase. As used herein, the term “transition” refers to the movements of the hands that are made between individual signs, such as between sequences of sign phonemes that form individual words or between the phonemes within a single sign. Unlike a sign phoneme, a transition movement is not a typical unit of a sign language that has independent meaning, but transition movements still provide important contextual information to indicate the end of a first word or phoneme and the beginning of a second word or phoneme, see paragraph [0046].)

 	With respect to Claim 22, Zhou et al. disclose  
 	wherein producing grammatically parsed sign utterances includes producing a confidence value based on the confidences of the signs, the confidence of the parsing and confidence of the parse matching a grammatical context for each parse of each sign sequence (Zhou et al. [0008] The method includes generation of explicit models for the transition signals before, between, and after signs within a HMM framework. In traditional speech recognition, an HMM based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures, see paragraphs [0046, 0057, 0058].)  

 	With respect to Claim 24, Zhou et al. disclose  
 	wherein the processor is further configured to detect the end of a sign language utterance before parsing the sign sequence to produce a grammatically parsed sign utterance (Zhou et al. Fig. 9 element 912, see paragraphs [0027, 0031, 0033, 0035].)

 	With respect to Claim 26, Zhou et al. disclose 
 	wherein the motion capture data includes data captured using marked gloves used by a user to produce the sign language utterance (Zhou et al. Fig. 5, [0037] As described above, in one embodiment the system 100 uses gloves with sensors that record acceleration and hand angle position information for the dominant and non-dominant hands of the user. FIG. 5 depicts examples of gloves 500 that are worn during the process 800 and during subsequent sign language recognition processing. The gloves 500 include multiple sensors, such as micro-electromechanical systems (MEMS) sensors that record the altitude, orientation, and acceleration of different portions of the dominant and non-dominant hands of the user as the user performs hand movements/postures to form the signs in the predetermined sequence. FIG. 6 depicts a more detailed view of one embodiment of the sensors 600 in a gloves for a right hand, see paragraphs [0021, 0029, 0030].)
	
 	With respect to Claim 27, Zhou et al. disclose 
 	wherein user specific parameter data (Zhou et al. [0032] During operation, a user 102 provides sign language input to the system 100 using the input device 132. In a training mode, the user 102 provides sign inputs to the input devices 132 for one or more predetermined sequences of signs that the processor 104 stores with the training data 130 in the memory 108) is used for one of: 
 	producing phonemes (Zhou et al. [0027] As used herein, the terms “sign phoneme” or more simply “phoneme” are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language. Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term “sign” refers to a word or other unit of language that is formed from one or more sign phoneme, Fig. 9 element 912); 
 	producing a plurality of sign sequences (Zhou et al. Fig. 9 elements 912-920); 
 	parsing these sign sequences (Zhou et al. Fig. 9 element 925); and 
 	translating the grammatically parsed sign utterances (Zhou et al. [0031] The sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences, [0060] To ensure a proper grammatical output, the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output.)  

 	
 	With respect to Claim 29, Zhou et al. disclose 
 	further comprising sensors producing motion capture data (Zhou et al. Fig. 5, [0021] FIG. 5 is a depiction of an input device that includes gloves with sensors to detect the motion, position, orientation, and shape of the hands of a user who provides sign language input to the system of FIG. 1.) 

 	With respect to Claim 30, Zhou et al. disclose 
 	further comprising marked gloves used by a user to produce the sign language utterance (Zhou et al. Fig. 5, [0037] As described above, in one embodiment the system 100 uses gloves with sensors that record acceleration and hand angle position information for the dominant and non-dominant hands of the user. FIG. 5 depicts examples of gloves 500 that are worn during the process 800 and during subsequent sign language recognition processing. The gloves 500 include multiple sensors, such as micro-electromechanical systems (MEMS) sensors that record the altitude, orientation, and acceleration of different portions of the dominant and non-dominant hands of the user as the user performs hand movements/postures to form the signs in the predetermined sequence. FIG. 6 depicts a more detailed view of one embodiment of the sensors 600 in a gloves for a right hand, see paragraphs [0021, 0029, 0030].)

Claim Rejections - 35 USC § 103
7.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.



8.	Claims 9, 14, 23, 28 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 2016/0307469 A1) in view of Kanter (US 9,230,160 B1).

 	With respect to Claim 9, Zhou et al. disclose all the limitations of Claim 1 upon which Claim 9 depends. Zhou et al. fail to explicitly teach 
 	further comprising generating a plurality of output utterances in the target language based upon the plurality of sign sequences; 
 	displaying the plurality output utterances to a user; and 
 	receiving an indication from the user selecting one of the plurality of displayed output utterances as the correct translation.  
	However, Kanter teaches 
 	further comprising generating a plurality of output utterances in the target language based upon the plurality of sign sequences (Kanter col. 3 lines 25-42 according to the systems and methods of the present disclosure, a user may perform one or more motions, gestures and/or mannerisms that may be sensed or captured via a video camera and analyzed by one or more computers and/or computer processors. The motions, gestures and/or mannerisms included in video imagery may then be converted into or otherwise treated as signals or instructions by the one or more computers and/or computer processors, such as by comparing the sensed or captured motions, gestures and/or mannerisms to those motions, gestures and/or mannerisms associated with various pertinent words or phrases in accordance with one or more recognized languages, which may be stored in a library, dictionary, data store or other pertinent resource. The most appropriate words or phrases may be determined by any standard means, such as by calculating a confidence level or factor associated with the sensed or captured motions, gestures and/or mannerisms with respect to one or more motions, gestures and/or mannerisms stored in the library or data store);  
 	displaying the plurality output utterances to a user (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication); and 
 	receiving an indication from the user selecting one of the plurality of displayed output utterances as the correct translation (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)
 	Zhou et al. and Kanter are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of converting the sign language into the textual-based language as taught by Zhou et al., using the teaching of displaying one or more potentially matching words or phrases corresponding to the sign language as taught by Kanter, for the benefit of enabling the user to confirm or reject the displayed potentially matching words or phrases (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)

 	With respect to Claim 14, Zhou et al. disclose all the limitations of Claim 1 upon which Claim 14 depends. Zhou et al. fail to explicitly teach 
 	further comprising: 
 	detecting that a user is using fingerspelling, after producing phonemes; 
 	translating the fingerspelling phonemes to letters in the target language; 
 	generating an output to the user showing translated letters to the user; and 
 	receiving an input from the user indicating the correctness of the translated letters.  
	However, Kanter teaches 
 	further comprising: 
 	detecting that a user is using fingerspelling, after producing phonemes (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,”); 
 	translating the fingerspelling phonemes to letters in the target language (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,” col. 2 lines 1-5, col. 4 lines 10-15); 
 	generating an output to the user showing translated letters to the user (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,”); and 
 	receiving an input from the user indicating the correctness of the translated letters (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)
 	Zhou et al. and Kanter are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of converting the sign language into the textual-based language as taught by Zhou et al., using the teaching of displaying one or more potentially matching words or phrases corresponding to the sign language as taught by Kanter, for the benefit of enabling the user to confirm or reject the displayed potentially matching words or phrases (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)

 	With respect to Claim 23, Zhou et al. disclose all the limitations of Claim 15 upon which Claim 23 depends. Zhou et al. fail to explicitly teach 
 	wherein the processor is further configured to:
generate a plurality of output utterances in the target language based upon the plurality of sign sequences; 
 display the plurality output utterances to a user; and 
 receive an indication from the user selecting one of the plurality of displayed output utterances as a correct translation.  
 	However, Kanter teaches 
 	wherein the processor is further configured to:
generate a plurality of output utterances in the target language based upon the plurality of sign sequences (Kanter col. 3 lines 25-42 according to the systems and methods of the present disclosure, a user may perform one or more motions, gestures and/or mannerisms that may be sensed or captured via a video camera and analyzed by one or more computers and/or computer processors. The motions, gestures and/or mannerisms included in video imagery may then be converted into or otherwise treated as signals or instructions by the one or more computers and/or computer processors, such as by comparing the sensed or captured motions, gestures and/or mannerisms to those motions, gestures and/or mannerisms associated with various pertinent words or phrases in accordance with one or more recognized languages, which may be stored in a library, dictionary, data store or other pertinent resource. The most appropriate words or phrases may be determined by any standard means, such as by calculating a confidence level or factor associated with the sensed or captured motions, gestures and/or mannerisms with respect to one or more motions, gestures and/or mannerisms stored in the library or data store);  
display the plurality output utterances to a user (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication); and 
 receive an indication from the user selecting one of the plurality of displayed output utterances as a correct translation (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)
 	Zhou et al. and Kanter are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of converting the sign language into the textual-based language as taught by Zhou et al., using the teaching of displaying one or more potentially matching words or phrases corresponding to the sign language as taught by Kanter, for the benefit of enabling the user to confirm or reject the displayed potentially matching words or phrases (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)

 	With respect to Claim 28, Zhou et al. disclose all the limitations of Claim 15 upon which Claim 28 depends. Zhou et al. fail to explicitly teach 
 	wherein the processor is further configured to: 
 	detect that the user is using fingerspelling, after producing phonemes;
 	 translate the fingerspelling phonemes to letters in the target language; 
 	generate an output to the user showing translated letters to the user; and 
 	receive an input from the user indicating the correctness of the translated letters.  
	However, Kanter teaches 
 	wherein the processor is further configured to: 
 	detect that a user is using fingerspelling, after producing phonemes (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,”); 
 	translate the fingerspelling phonemes to letters in the target language (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,” col. 2 lines 1-5, col. 4 lines 10-15); 
 	generate an output to the user showing translated letters to the user (Kanter col. 13 lines 28-32 When an online marketplace requests that a user provide his or her name, a particular library consisting primarily of gestures for forming letter may be identified, as proper names are typically formed by “fingerspelling,”); and 
 	receive an input from the user indicating the correctness of the translated letters (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)
 	Zhou et al. and Kanter are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of converting the sign language into the textual-based language as taught by Zhou et al., using the teaching of displaying one or more potentially matching words or phrases corresponding to the sign language as taught by Kanter, for the benefit of enabling the user to confirm or reject the displayed potentially matching words or phrases (Kanter col. 3 lines 43-48 One or more potentially matching words or phrases may be identified based on the confidence level or factor and displayed to the user, who may either confirm that the potentially matching words or phrases correspond to his or her intended communication, or indicate that the potential matches do not correspond to his or her intended communication.)

9.	Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 2016/0307469 A1) in view of Wohlert et al. (US 2015/0120293 A1.)

 	With respect to Claim 31, Zhou et al. disclose all the limitations of Claim 15 upon which Claim 31 depends. Zhou et al. fail to explicitly teach 
 	wherein the input interface further receives communication input from a second user to facilitate a conversation between the user and the second user.  
	However, Wohlert et al. teach
 	wherein the input interface further receives communication input from a second user to facilitate a conversation between the user and the second user (Wohlert et al. [0019] FIG. 1 depicts an illustrative embodiment of a system 100 that can utilize a multimedia accessibility platform 110 (hereinafter server 110) to facilitate a communication session between a first end user 101 utilizing an end user device 120 and a second end user 102 utilizing another end user device 120. The end user devices 120 can be various types of devices including smart phones, mobile devices, laptop computers, desktop computers, landline telephones, cordless telephones, set top boxes and/or any other communication device capable of engaging in a communication session to exchange or otherwise communicate voice, video and/or data. Platform 110 is described as a server, but it should be understood that the platform 110 can be implemented using any number of computing devices (e.g., a single server in a centralized system or multiple server in a distributed environment), any type of computing devices (e.g., a service provider server or a customer computing device), and/or any configuration of the computing device(s) (e.g., a server farm where one or more servers are in a master/slave arrangement with one or more other servers or a combination of service provider devices and customer equipment performing the multimedia accessibility platform functions.), [0017] translating the user into sign language images, [0023] speech impairment (e.g., sign language to speech or text to speech).)
 	Zhou et al. and Wohlert et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of converting the sign language into the textual-based language as taught by Zhou et al., using the teaching of identifying a number of different accessibility impairments or issues in communications as taught by Wohlert et al., for the benefit of adjusting the presentation of the multimedia content in order to facilitate the accessibility by the user to the content (Wohlert et al. [0017] translating the user into sign language images, [0023] speech impairment (e.g., sign language to speech or text to speech), [0020] The accessibility adaptation can include adjustment of the multimedia content, adjusting the presentation of the multimedia content or otherwise making adjustments associated with the presentation of the multimedia content to facilitate the accessibility by the user to the content.)

Conclusion
10.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Itzhaik (US 2015/0084859 A1.) In this reference, Itzhaik discloses a method/a system for recognition and response to gesture-based input. 
b.	Shin et al. (US 2013/0295529 A1.) In this reference, Shin et al. disclose a method/a system for recognizing sign language using an electromyogram sensor and a gyro sensor. 
c.  	Ludwig (US 2012/0317521 A1.) In this reference, Ludwig discloses a method/a system for a multi-touch gesture-based user interface wherein a plurality of gestures are defined as functions of abstract space and time, the gestures further being primitive gesture segments that can be concatenated over time and space to construct gestures. 

11.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

12. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655                                                                                                                                                                                                        
