DETAILED ACTION
Information Disclosure Statement
1.	The information disclosure statements (IDS) submitted on 08/29/2019 and 10/31/2019 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner. However, it is noted that all Non-Patent Literature (NPL) citations need at least a month and year of publication: MPEP 609.04(a): The date of publication supplied must include at least the month and year of publication, except that the year of publication (without the month) will be accepted if the applicant points out in the information disclosure statement that the year of publication is sufficiently earlier than the effective U.S. filing date and any foreign priority date so that the particular month of publication is not in issue. NPL cited without at least the month and year of publication has been labeled with “no date available”.

Election/Restrictions
Applicant’s election without traverse of SPECIES I in the reply filed on 03/30/2021 is acknowledged. Claims 19-20 are cancelled by applicant.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more. The claim(s) recite(s) limitations that fall under the grouping of abstract idea of “Certain Methods of Organizing Human Activity”, e.g. Concepts Relating To Managing Human Behavior (step 2A). The claim limitations merely recite various steps of human 
Specifically, each limitation in the claims can be performed by a human. A human can ask a question, look at a screen to capture the image, extract a noun-phrase by listening, target the area of the screen by looking, and identify the target by looking up information. The use of various modules does not add a meaningful limitation to the abstract idea because they amount to simply implementing the abstract idea on a computer. The dependent claims recite various limitations that do not include additional elements sufficient to amount to significantly more than the judicial exception (e.g. speech to text conversion, acquiring metadata, etc.).

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 and 9-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Davoust (US Patent 10271109 B1).
(col 12 lines 32-63), said instructions comprising: 
a request acquisition module receiving an audibly spoken question including a noun-phrase and a video stream (col 2 lines 18-50 video content 103 corresponding to a movie is rendered upon a display for viewing by a user; While watching the video content 103, the user presents a verbal query 106 in the form of a question: "Who is the man at the right?"), said request acquisition module converting said audibly spoken question to text (col 7 lines 32-38 converting the audio received from the microphone 285 either to text or profile representations) and capturing a image data of a still frame of said video stream associated with a point in time of said video stream when said audibly spoken question is received (Fig. 3A-3C & col 7 lines 60-67 & col 8 lines 1-40 e.g. video content with grid being superimposed thereon); 
a noun-phrase extraction module receiving said text and extracting therefrom said noun-phrase (col 2 lines 18-50 While watching the video content 103, the user presents a verbal query 106 in the form of a question: "Who is the man at the right?"; col 7 lines 39-50 The query response service 218 performs natural language processing on the verbal query 106 to determine the items that are inquired about and the nature of the inquiry, e.g., who, what, when, where, why, how, etc.);
a target selection module identifying target data in said image data, said target data corresponding to said extracted noun-phrase (col 7 lines 60-67 & col 8 lines 1-40 e.g. grid 303 being superimposed thereon by the content access application 284; In some cases, a verbal query 106 may refer to a relative frame location that is unclear or ambiguous. In response, the grid 303 of a plurality of cells may be shown to facilitate easier specification of a relative frame location); 
a subject identification module generating a textual description of the identity of a target represented in said target data (col 2 lines 30-49 The response 109 in this case specifies the character name ("George") and the name of the cast member who plays the character ("Jim Kingsboro"); In various examples, the system may read out the response 109 using a speech synthesizer, or the system may present the response 109 via the display); and 
a response module generating a script comprising said noun-phrase and said textual description of said identity (col 2 lines 30-49 In various examples, the system may read out the response 109 using a speech synthesizer, or the system may present the response 109 via the display).

Regarding claim 2, Davoust discloses the medium of claim 1, wherein said audibly spoken question is converted to text by a speech recognition module (col 7 lines 32-50 query response service 218 performs natural language processing on the verbal query 106 to determine the items that are inquired about and the nature of the inquiry). 

Regarding claim 3, Davoust discloses the medium of claim 1, wherein said request acquisition module further includes program instructions for acquiring metadata about said video stream (col 5 lines 50-57 The time metadata 237 is used to correlate specific points in time in respective video content 103 with items in the extrinsic data library 230 and/or the performer data 233). 

Regarding claim 9, Davoust discloses the medium of claim 1, wherein said medium is included in a display device (Fig. 2 display 206). 

Regarding claim 10, Davoust discloses the medium of claim 9, wherein said display device is a smart television (col 5 lines 58-67 e.g. smart television). 

(col 5 lines 58-67 e.g. cellular telephones). 

Regarding claim 12, Davoust discloses the medium of claim 1, wherein said video stream is received via a telecommunications network (col 2 lines 50-59 The network 209 includes, for example, the Internet, intranets, extranets, wide area networks (WANs), local area networks (LANs), wired networks, wireless networks, or other suitable networks, etc., cable networks, satellite networks, or any combination of two or more such networks). 

Regarding claim 13, Davoust discloses the medium of claim 1, wherein said response module causes to be vocalized a response to said audibly spoken question, said vocalized response based at least in part on said script (col 2 lines 30-49 In various examples, the system may read out the response 109 using a speech synthesizer, or the system may present the response 109 via the display). 

Regarding claim 14, Davoust discloses the medium of claim 13, wherein said vocalization is performed using a voice user interface (col 6 lines 56-65 The speech synthesizer 288 may be executed to generate audio corresponding to synthesized speech for textual inputs; The content information application 287 is executed to receive verbal queries 106 from users via the microphone 285 and to present responses 109 via the speech synthesizer 288 and the audio device 286). 

Regarding claim 15, Davoust discloses the medium of claim 14, wherein said voice user interface comprises a digital assistant (col 5 lines 58-67 e.g. personal digital assistants). 

(col 7 lines 29-31 Users may also initiate purchases of items currently shown on screen or add the items to wish lists, watch lists, shopping carts, and so on). 

Regarding claim 17, Davoust discloses a computerized method for answering an ambiguous user query comprising: 
receiving a video stream and displaying said video stream (col 2 lines 18-50 video content 103 corresponding to a movie is rendered upon a display for viewing by a user); 
receiving an audibly spoken question at a first time during said display of said video stream (col 2 lines 18-50 video content 103 corresponding to a movie is rendered upon a display for viewing by a user; While watching the video content 103, the user presents a verbal query 106 in the form of a question: "Who is the man at the right?"); 
converting said audibly spoken question to text (col 7 lines 32-38 converting the audio received from the microphone 285 either to text or profile representations); 
capturing image data of said video stream at said first time (Fig. 3A-3C & col 7 lines 60-67 & col 8 lines 1-40 e.g. video content with grid being superimposed thereon); 
extracting a noun-phrase from said converted text (col 2 lines 18-50 While watching the video content 103, the user presents a verbal query 106 in the form of a question: "Who is the man at the right?"; col 7 lines 39-50 The query response service 218 performs natural language processing on the verbal query 106 to determine the items that are inquired about and the nature of the inquiry, e.g., who, what, when, where, why, how, etc.);
(col 7 lines 60-67 & col 8 lines 1-40 e.g. grid 303 being superimposed thereon by the content access application 284; In some cases, a verbal query 106 may refer to a relative frame location that is unclear or ambiguous. In response, the grid 303 of a plurality of cells may be shown to facilitate easier specification of a relative frame location); 
generating a textual description of said target data; generating a script comprising said noun-phrase and said textual description (col 2 lines 30-49 The response 109 in this case specifies the character name ("George") and the name of the cast member who plays the character ("Jim Kingsboro"); In various examples, the system may read out the response 109 using a speech synthesizer, or the system may present the response 109 via the display); and 
vocalizing said script (col 2 lines 30-49 In various examples, the system may read out the response 109 using a speech synthesizer, or the system may present the response 109 via the display).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Davoust as applied to claims 1 and 17 above, and further in view of Tang (US 20200380292).
Regarding claim 4, Davoust discloses the medium of claim 1, but fails to teach wherein said target selection module identifies said target data using a machine learning system. 
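Tang teaches wherein said target selection module identifies said target data using a machine learning system (¶39-40 machine learning; ¶49 method for identifying an object).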
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein said target selection module identifies said target data using a machine learning system from Tang into the medium as disclosed by Davoust. The motivation for doing this is to improve methods and device for identifying an object.

Regarding claim 5, the combination of Davoust and Tang discloses the medium of claim 4, wherein said machine learning system comprises a neural network (Tang ¶39-40 e.g. neural network). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein said machine learning system comprises a neural network from Tang into the medium as disclosed by Davoust. The motivation for doing this is to improve methods and device for identifying an object.

Regarding claim 6, Davoust discloses the medium of claim 1, but fails to teach wherein said subject identification module generates said textual representation using a machine learning system. 
Tang teaches wherein said subject identification module generates said textual representation using a machine learning system (¶39-40 machine learning; ¶68 Then, in step S430, a category corresponding to the largest one of the plurality of second feature similarities S2.sub.ref(i) may be determined as the category of the object). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein said subject identification module generates said textual representation using a machine learning system from Tang into the medium as disclosed by Davoust. The motivation for doing this is to improve methods and device for identifying an object.

Regarding claim 7, the combination of Davoust and Tang discloses the medium of claim 6, wherein said machine learning system comprises a plurality of neural networks, each neural network in said plurality being trained on a target category (Tang ¶54-55 the first neural network may be trained using a large number of object sample images; second neural networks trained for particular categories in a subsequent process). 
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of wherein said machine learning system comprises a plurality of neural networks, each neural network in said plurality being trained on a target category from Tang into the medium as disclosed by Davoust. The motivation for doing this is to improve methods and device for identifying an object.

Regarding claim 8, the combination of Davoust and Tang discloses the medium of claim 7, further comprising: 
a target categorization module assigning a category to said target data (Tang ¶68 in step S430, a category corresponding to the largest one of the plurality of second feature similarities S2.sub.ref(i) may be determined as the category of the object); and 
said subject identification module generating said textual description using a selected neural network from said plurality of neural network, said selected neural network being determined based on said assigned category (Tang ¶54-55 the first neural network may be trained using a large number of object sample images; second neural networks trained for particular categories in a subsequent process). 


Regarding claim 18, Davoust discloses the method of claim 17, but fails to teach assigning a category to said target data; and in said generating a textual description, generating said textual description using a neural network trained using image data corresponding to said category.
Tang teaches assigning a category to said target data (Tang ¶68 in step S430, a category corresponding to the largest one of the plurality of second feature similarities S2.sub.ref(i) may be determined as the category of the object); and in said generating a textual description, generating said textual description using a neural network trained using image data corresponding to said category (Tang ¶54-55 the first neural network may be trained using a large number of object sample images; second neural networks trained for particular categories in a subsequent process).
Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of assigning a category to said target data; and in said generating a textual description, generating said textual description using a neural network trained using image data corresponding to said category from Tang into the method as disclosed by Davoust. The motivation for doing this is to improve methods and device for identifying an object.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648.  The examiner can normally be reached on Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/KEVIN KY/Primary Examiner, Art Unit 2669