DETAILED ACTION

	Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority Acknowledgment
2.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in Application 10-2019-0095581 on 08/06/2019 in the Republic of Korea Patent Office.

Response to Arguments/Amendments
3.	Claims 1-13 were previously allowed. 
 	The Applicant has amended independent Claim 14 by incorporating the limitations of Claim 15. Claim 15 was previously indicated as Allowable Subject Matter. Thus, the rejections of Claim 14 and its dependent claims are withdrawn.
 	With respect to Claim Interpretations, Applicant has amended Claims 8, 9, 11, 13 and 14 to provide structure. Thus, Claims 8, 9, 10, 11, 13 and 14 do not invoke 35 U.S.C. § 112 (f). 

Reasons for Allowance
4.	Claims 1-14 are allowed.
The prior art, taken alone or in combination, fails to teach the following elements in combination with the other recited elements in the claims.
	“a recognition step of recognizing a speech language sentence from speech information, and recognizing an appearance image and a background image from video information; 
 	a sign language joint information generation step of acquiring multiple pieces of word-joint information corresponding to the speech language sentence from a joint information database and sequentially inputting the word-joint information to a deep learning neural network to generate sentence-joint information; and
 	a video generation step of generating a motion model on the basis of the sentence-joint information, and generating a sign language video in which the background image and the appearance image are synthesized with the motion model.” as recited in Claim 1.  
 	“a recognition module configured to recognize, on the basis of video information and speech information, an appearance image and a background image from the video information and a speech language sentence from the speech information; 
 	a joint information generation module configured to acquire multiple pieces of word-joint information corresponding to the speech language sentence and sequentially input the word-joint information to a deep learning neural network to generate sentence-joint information; and 
 	a video generation module configured to generate a motion model on the basis of the sentence-joint information, and generate a sign language video in which the background image and the appearance image are synthesized with the motion model.” as recited in Claim 8.
	“wherein the sign language video is a video generated by allowing a recognition processor of the sign language video providing apparatus to recognize an appearance image and a background image from the video information and recognize a speech language sentence from the speech information on the basis of the video information and the speech information, 
 	the sign language video providing apparatus comprising: 
 	a joint information generating processor configured to acquire multiple pieces of word-joint information corresponding to the speech language sentence from a joint information database and sequentially receive the word-joint information to a deep learning network to generate sentence-joint information; and 
 	a video generation processor configured to generate a motion model on the basis of the sentence-joint information and synthesize the background image and the appearance image on the motion model.” as recited in Claim 14. 
  
	The closest prior art found is as follows.
a. 	Negishi (US 2021/0150145 A1). In this reference, Negishi discloses a method for converting speech into sign language and outputting, on a display, an image of an avatar expressing it in sign language (Negishi [0040] the sensor section 110 may include a microphone that senses sound information, [0041] the sensor section 110 may include an image sensor that senses an image (still image or moving image), [0136] the device 100 first recognizes the context of the activity of a user (step S110). For example, the device 100 recognizes the context of activity by classifying the activity of a user into the three classes of gesturing, speaking, and the others on the basis of sensing information obtained by a camera, [0180] The device 100 extracts a feature on the basis of sensed speech, and recognizes the languages used by the first user 10 and the second user 20 on the basis of the speech feature, and the contexts of attributes and places. Next, the device 100 uses an acoustic model and a language model to obtain the speech of the speakers as text, [0123] in a case where the first action subject is a spoken language user and the second action subject is a sign language user, the device 100 may convert a message expressed by the first action subject using the spoken language into a moving image of a hand performing the sign language gesture corresponding to the message. The gesture is superimposed on the first action subject displayed on a transmissive display, and displayed in an AR manner. Watching the gesture performed by a hand, superimposed on the first action subject displayed on the transmissive display, and displayed in an AR manner allows the second action subject to recognize the message outputted by the first action subject. This allows the second action subject to recognize a message from the first action subject as if the first action subject actually made a remark in sign language, [0156] the second user 20 briefly speaks, to the first user 10, “Turn right at the second corner on this street.” The device 100 then performs speech recognition in real time, and superimposes and displays, in an AR manner, an arm 30 performing the corresponding sign language gesture on the second user 20 displayed on the see-through display). Negishi recognizes a speech language sentence from speech information and recognizes an appearance image from video information. In Negishi, an image of a person engaging in some activity corresponds to the claimed appearance image. However, Negishi does not teach recognizing a background image from video information and generating a sign language video in which the background image and the appearance image are synthesized. Thus, Negishi fails to teach and/or suggest the allowable subject matter noted above. 
b. 	Yao (US 2020/0075011 A1). In this reference, Yao discloses a method/system for generating a sign language video (Yao [0098] S101: obtaining voice information and video information collected by a user terminal in real time, [0102] S102: determining, in the video information, a speaking object corresponding to the voice information, [0104] In an implementation of step S102, which speaking object the currently collected voice information belongs to may be determined by recognizing a person currently speaking. Specifically, it may be that, at least one face image is recognized in the video information at first; and then a face image showing an opening and closing action of a lip is determined as a target face image; and finally a portrait corresponding to the target face image is determined as the speaking object corresponding to the voice information, [0105] The method for recognizing at least one face image in the video information may be to perform a face recognition on video frames in the video information to obtain a face image when the video information is obtained. Specifically, the video information is obtained from a cache of the user terminal, or the video information is obtained from a cache of the server when the server receives the video information from the user terminal and stores the same in the cache. There may be multiple video frames parsed from the video information. Then, the video frames may be processed to obtain the face image in a manner of picture recognition and picture classification. For example, image classification based on pixel points is performed on the video frames by a semantic segmentation algorithm (for example, FCN algorithm) or an instance segmentation algorithm (for example, Mask RCNN algorithm), and face images in the video frames are recognized and located. Alternatively, face feature information is searched in the video frames, and an image area that conforms to face feature is used as a face area, [0111] S103: superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object to obtain a sign language video). Yao recognizes a speech language sentence from speech information and recognizes a person currently speaking. The person currently speaking corresponds to the claimed appearance image. However, Yao does not teach recognizing a background image from video information and generating a sign language video in which the background image and the appearance image are synthesized. Thus, Yao fails to teach and/or suggest the allowable subject matter noted above. 
c. 	Chandler et al. (US 2019/0251702 A1). In this reference, Chandler et al. disclose a method/system for real-time gesture recognition. One exemplary method for the real-time identification of a gesture communicated by a subject includes receiving, by a first thread of the one or more multi-threaded processors, a first set of image frames associated with the gesture, the first set of image frames captured during a first time interval, performing, by the first thread, pose estimation on each frame of the first set of image frames including eliminating background information from each frame to obtain one or more areas of interest, storing information representative of the one or more areas of interest in a shared memory accessible to the one or more multi-threaded processors, and performing, by a second thread of the one or more multi-threaded processors, a gesture recognition operation on a second set of image frames associated with the gesture (Chandler et al. [0294] FIG. 37A illustrates an example threading model that can be used for CPU processing in accordance with an example embodiment of the disclosed technology. For simplicity, only one pair of threads (also referred to as ping-pong threads) is used in the context of an image capture, processing and recognition example. The load balancing module first starts Thread A 3711 and Thread B 3712 at the same time. The load balancing module delegates Thread A 3711 to handle the task of input data capturing 3701. In some embodiments, as a part of the input data capturing task 3301, Thread A 3711 can perform some pre-processing operations on the captured data (e.g., color space conversion, or encoding) using the GPU cores. Thread A 3711 then produces an image frame for subsequent processing. For example, Thread A 3711 can preprocess the captured image to remove background pixels, so that only the areas of interest (e.g., foreground pixels that show the gestures) remain in the processed image for subsequent processing, [0327] In another example aspect, an apparatus in a sign language processing system includes a first processing unit and a second processing unit, and a memory including instructions stored thereupon. The instructions upon execution by the first processing unit cause the first processing unit to receive, by a first thread of a first processing unit, a set of data captured by a capture device, the set of data including an image frame that illustrates a gesture representing a letter, a word, or a phrase in a sign language. The instructions cause the first processing unit to eliminate, by the first thread of the first processing unit, background information in the image frame to obtain one or more areas of interest; prepare, by a second thread of the first processing unit concurrently as the set of data is preprocessed, a set of resources for a gesture recognition operation). Chandler et al. eliminate background information in the image frame when recognizing the sign language. However, Chandler et al. do not teach recognizing a speech language sentence from speech information, recognizing an appearance image and a background image from video information, and generating a sign language video in which the background image and the appearance image are synthesized. Thus, Chandler et al. fail to teach and/or suggest the allowable subject matter noted above. 
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
5.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/
Primary Examiner, Art Unit 2655