DETAILED ACTION
This Office Action is in response to the original application filed on 11/30/2018 and the preliminary amendment filed on 02/21/2019. Claims 1-20 are pending, of which claims 1 and 11 are presented in independent form.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/22/2019 and 03/20/2020 were filed in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
	The drawings submitted on 11/30/2018 are accepted.

Specification
The disclosure is objected to because of the following informalities: 
In [0059] line 3, “insert something here? may…” seems to be a placeholder.  
In [0071] line 7, “keyworks” should be read as “keywords”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 7-11, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hammontree et al. (U.S. Pub. No. 2013/0086105, cited in IDS), hereinafter Hammontree, in view of Goel et al. (U.S. Pub. No. 2019/0294668, which claims foreign benefit from India App. No. IN201641041399 filed on 12/03/2016), hereinafter Goel.
 
Regarding independent claim 1, Hammontree teaches a method for providing contextual search results to ambiguous queries, the method comprising: (Hammontree, [0031]-[0032], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The disambiguated search word provided by the query analysis component can be an additional search term where the search component can generate the result using the additional search term and detected objects from the visual content.)
receiving a search query during a presentation of a video; (Hammontree, [0004], discloses "A voice directed query that relates to visual content rendered on a display can be received")
in response to determining that the search query is ambiguous: (Hammontree, [0030], discloses a disambiguation component that disambiguates the search word from the voice directed query)
accessing a plurality of frames from the video that were presented concurrently with receiving the search query; (Hammontree, [0031], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query where the frames captured by the frame selection component can be the visual content.)
augmenting the search query with the retrieved keyword; and performing a search based on the augmented search query; and outputting results of the search. (Hammontree, [0031]-[0032], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The disambiguated search word provided by the query analysis component can be an additional search term where the search component can generate the result using the additional search term and detected objects from the visual content.)
However, Hammontree does not explicitly teach analyzing the plurality of frames to identify a performed action and retrieving a keyword associated with the identified action.
On the other hand, Goel teaches analyzing the plurality of frames to identify a performed action; and retrieving a keyword associated with the identified action; (Goel, [0068], discloses a content extraction module that extracts the audio portions (the speech content) and the video portions (the visual content/image frames) from the multimedia and classifies the extracted audio portions and video portions. The multimedia analysis engine then analyzes the image frames of the video portions to detect objects and/or actions and converts the detected objects and/or actions into a textual list to identify the keywords and/or keyphrases that represent the context of the contents presented in the image frames of the multimedia.)
Hammontree [0031]-[0032] teaches a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The extracted video portions of Goel, and the identified keywords that represent their context, can serve as the visual content and contextual information of Hammontree. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice directed context sensitive visual search system of Hammontree to incorporate the teachings of the video context analysis system of Goel, because both address the same field of video analysis and search systems, and incorporating Goel into Hammontree would provide the voice-based query system a way to analyze segments of video to identify actions in the analyzed video frames.
One of ordinary skill in the art would be motivated to do so in order to provide a quick summary, useful cross-references, and additional relevant information, giving a user an efficient way to select the right multimedia and the right content/portions of the multimedia, as taught by Goel [0008].
 
Regarding claim 5, Hammontree, in view of Goel, teaches the method of claim 1, wherein receiving the search query comprises: detecting user voice input; and performing speech to text analysis of the user voice input to derive the search query.  (Hammontree, [0004], discloses "A voice directed query that relates to visual content rendered on a display can be received". Hammontree, [0049], discloses "voice recognition component 404 can receive the voice directed query 104 and identify the search words")
Claim 15 recites substantially the same limitations as claim 5, and is rejected for substantially the same reasons.
 
Regarding claim 7, Hammontree, in view of Goel, teaches the method of claim 1, wherein accessing the plurality of frames from the video that were presented concurrently with receiving the search query comprises: (Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query where the frames captured by the frame selection component can be the visual content.)
receiving an audio sample of the video that were presented concurrently with receiving the search query; identifying a time location in the video where the audio sample occurred; and extracting frames corresponding to the time location. (Goel, [0068], discloses a content extraction module that extracts the audio portions (the speech content) and the video portions (the visual content/image frames) from the multimedia and classifies the extracted audio portions and video portions. The multimedia analysis engine then analyzes the image frames of the video portions to detect objects and/or actions and converts the detected objects and/or actions into a textual list to identify the keywords and/or keyphrases that represent the context of the contents presented in the image frames of the multimedia. Goel, [0054]-[0055], discloses an analytics generation unit that can generate the analytics based on the keywords and/or keyphrases, the timestamps of the occurrences of the keywords and/or keyphrases, the objects and/or the actions recognized in the video portions, etc.)
Claim 17 recites substantially the same limitations as claim 7, and is rejected for substantially the same reasons.
 
Regarding claim 8, Hammontree, in view of Goel, teaches the method of claim 7, wherein extracting frames corresponding to the time location comprises: extracting frames from a predetermined time period prior to the time location; and extracting frames from a predetermined time period after the time location. (Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query (e.g., when the voice recognition component begins to receive the voice directed query, when the voice recognition component finishes receiving the voice directed query, a time there between, etc.) where the frames captured by the frame selection component can be the visual content. Examiner interprets the frame selection component as selecting a set number of frames (i.e., a set number of frames or a set time period) based on the time the voice query is captured, where the time in between the start and finish of the voice query is the time location.)
Claim 18 recites substantially the same limitations as claim 8, and is rejected for substantially the same reasons.
 
Regarding claim 9, Hammontree, in view of Goel, teaches the method of claim 1, wherein accessing the plurality of frames from the video that were presented concurrently with receiving the search query comprises capturing displayed frames of the video for a predetermined time period after receiving the search query. (Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query where the frames captured by the frame selection component can be the visual content. Examiner interprets the frame selection component as selecting a set number of frames (i.e., a set number of frames or a set time period) based on the time the voice query is captured.)
Claim 19 recites substantially the same limitations as claim 9, and is rejected for substantially the same reasons.
 
Regarding claim 10, Hammontree, in view of Goel, teaches the method of claim 1, wherein accessing the plurality of frames from the video that were presented concurrently with receiving the search query comprises capturing a predetermined number of displayed frames after receiving the search query. (Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query where the frames captured by the frame selection component can be the visual content. Examiner interprets the frame selection component as selecting a set number of frames (i.e., a set number of frames or a set time period) based on the time the voice query is captured.)
Claim 20 recites substantially the same limitations as claim 10, and is rejected for substantially the same reasons.
 
Regarding independent claim 11, Hammontree teaches a system for providing contextual search results to ambiguous queries, the system comprising: (Hammontree, [0031]-[0032], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The disambiguated search word provided by the query analysis component can be an additional search term where the search component can generate the result using the additional search term and detected objects from the visual content.)
input circuitry of a device configured to: (Hammontree, [0082], discloses "The computing device 1100 also includes an input interface 1110 that allows external devices to communicate with the computing device")
receive a search query during a presentation of a video; (Hammontree, [0004], discloses "A voice directed query that relates to visual content rendered on a display can be received") and
control circuitry of the device configured to: (Hammontree, [0084], discloses "component and system are intended to encompass computer-readable data storage that is configured with computer-executable instructions that cause certain functionality to be performed when executed by a processor.")
in response to determining that the search query is ambiguous: (Hammontree, [0030], discloses a disambiguation component that disambiguates the search word from the voice directed query)
access a plurality of frames from the video that were presented concurrently with receiving the search query; (Hammontree, [0031], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. Hammontree, [0055], discloses a frame selection component that captures frames from the video stream rendered on the display at a time when the voice recognition component receives the voice directed query where the frames captured by the frame selection component can be the visual content.)
augment the search query with the retrieved keyword; and perform a search based on the augmented search query; and output results of the search. (Hammontree, [0031]-[0032], discloses a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The disambiguated search word provided by the query analysis component can be an additional search term where the search component can generate the result using the additional search term and detected objects from the visual content.)
However, Hammontree does not explicitly teach analyzing the plurality of frames to identify a performed action and retrieving a keyword associated with the identified action.
On the other hand, Goel teaches analyzing the plurality of frames to identify a performed action and retrieving a keyword associated with the identified action. (Goel, [0068], discloses a content extraction module that extracts the audio portions (the speech content) and the video portions (the visual content/image frames) from the multimedia and classifies the extracted audio portions and video portions. The multimedia analysis engine then analyzes the image frames of the video portions to detect objects and/or actions and converts the detected objects and/or actions into a textual list to identify the keywords and/or keyphrases that represent the context of the contents presented in the image frames of the multimedia.)
Hammontree [0031]-[0032] teaches a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The extracted video portions of Goel, and the identified keywords that represent their context, can serve as the visual content and contextual information of Hammontree. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice directed context sensitive visual search system of Hammontree to incorporate the teachings of the video context analysis system of Goel, because both address the same field of video analysis and search systems, and incorporating Goel into Hammontree would provide the voice-based query system a way to analyze segments of video to identify actions in the analyzed video frames.
One of ordinary skill in the art would be motivated to do so in order to provide a quick summary, useful cross-references, and additional relevant information, giving a user an efficient way to select the right multimedia and the right content/portions of the multimedia, as taught by Goel [0008].
 
 
 
Claims 2-4 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Hammontree, in view of Goel, and further in view of Rajan et al. (U.S. Pub. No. 2019/0258851, which claims benefit from U.S. Provisional App. No. 62/633,045 filed on 02/20/2018), hereinafter Rajan.
 
Regarding claim 2, Hammontree, in view of Goel, teaches all the limitations as set forth in the rejection of claim 1 above. Hammontree, in view of Goel, further teaches the method of claim 1, wherein identifying the performed action comprises: identifying a character in each of the plurality of frames; (Goel, [0068], discloses a content extraction module that extracts the audio portions (the speech content) and the video portions (the visual content/image frames) from the multimedia and classifies the extracted audio portions and video portions. The multimedia analysis engine then analyzes the image frames of the video portions to detect objects and/or actions and converts the detected objects and/or actions into a textual list to identify the keywords and/or keyphrases that represent the context of the contents presented in the image frames of the multimedia.)
wherein retrieving the keyword associated with the identified action comprises retrieving metadata of the movement template. (Goel, [0068], discloses a content extraction module that extracts the audio portions (the speech content) and the video portions (the visual content/image frames) from the multimedia and classifies the extracted audio portions and video portions. The multimedia analysis engine then analyzes the image frames of the video portions to detect objects and/or actions and converts the detected objects and/or actions into a textual list to identify the keywords and/or keyphrases that represent the context of the contents presented in the image frames of the multimedia.)
However, Hammontree, in view of Goel, does not explicitly teach generating a model of the identified character's movements and determining that the generated model matches a movement template.
On the other hand, Rajan teaches generating a model of the identified character's movements; determining that the generated model matches a movement template; (Rajan, [0059]-[0060], discloses a movement identification module that employs a movement model to determine movements in the observation volume by calculating joint angles, speed, acceleration, force, and repetitive motions. The movement identification module inputs the time series of key-points in the images into the movement model and identifies an action performed by a person in the image series. The movement identification module may calculate the angle between the person's ankle, knee, and hip based on the spatial representation of their associated key-points. Rajan, [0065], discloses that the movement identification module calculates an angle between an ankle, knee, and elbow for a person performing a movement in an active region. The movement identification module identifies the movement due, in part, to the repetitive change in angle.)
Hammontree [0031]-[0032] teaches a query analysis component that can disambiguate the search word based on the visual content and/or the contextual information. The movement identification within an observation volume of Rajan can serve as the visual content of Hammontree. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice directed context sensitive visual search system of Hammontree to incorporate the teachings of movement identification using movement models of Rajan, because both address the same field of video analysis systems, and incorporating Rajan into Hammontree would provide the voice-based query system a way of identifying actions performed by characters in a video clip using movement models.
One of ordinary skill in the art would be motivated to do so in order to provide movement identification through the use of active regions and the movements of users in those active regions, without the use of physical and/or electronic indicators in an observation volume, as taught by Rajan [0004].
Claim 12 recites substantially the same limitations as claim 2, and is rejected for substantially the same reasons.
 
Regarding claim 3, Hammontree, in view of Goel and Rajan, teaches the method of claim 2, wherein generating a model of the identified character's movements comprises: identifying body parts of the identified character; and calculating an angle between two body parts of the identified character. (Rajan, [0059]-[0060], discloses a movement identification module that employs a movement model to determine movements in the observation volume by calculating joint angles, speed, acceleration, force, and repetitive motions. The movement identification module inputs the time series of key-points in the images into the movement model and identifies an action performed by a person in the image series. The movement identification module may calculate the angle between the person's ankle, knee, and hip based on the spatial representation of their associated key-points. Rajan, [0065], discloses that the movement identification module calculates an angle between an ankle, knee, and elbow for a person performing a movement in an active region. The movement identification module identifies the movement due, in part, to the repetitive change in angle.)
Claim 13 recites substantially the same limitations as claim 3, and is rejected for substantially the same reasons.
 
Regarding claim 4, Hammontree, in view of Goel and Rajan, teaches the method of claim 3, wherein determining that the generated model matches the movement template comprises: comparing the calculated angle with a reference angle of the movement template; and in response to determining that the calculated angle matches a reference angle, determining that the generated model matches the movement template. (Rajan, [0059]-[0060], discloses a movement identification module that employs a movement model to determine movements in the observation volume by calculating joint angles, speed, acceleration, force, and repetitive motions. The movement identification module inputs the time series of key-points in the images into the movement model and identifies an action performed by a person in the image series. The movement identification module may calculate the angle between the person's ankle, knee, and hip based on the spatial representation of their associated key-points. Rajan, [0065], discloses that the movement identification module calculates an angle between an ankle, knee, and elbow for a person performing a movement in an active region. The movement identification module identifies the movement due, in part, to the repetitive change in angle.)
Claim 14 recites substantially the same limitations as claim 4, and is rejected for substantially the same reasons.
 
 
 
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Hammontree, in view of Goel, and further in view of Nguyen (U.S. Pat. No. 10,437,833).
 
Regarding claim 6, Hammontree, in view of Goel, teaches all the limitations as set forth in the rejection of claim 1 above. However, Hammontree, in view of Goel, does not explicitly teach the method of claim 1, wherein determining that the search query is ambiguous comprises: determining that the search query comprises at least one of: a pronoun and an auxiliary verb. 
On the other hand, Nguyen teaches wherein determining that the search query is ambiguous comprises: determining that the search query comprises at least one of: a pronoun and an auxiliary verb. (Nguyen, Col. 2 lines 33-34, discloses "semantic-syntactic parsing, performing discourse enhancements and other NLP user based interactions." Nguyen, Col. 3 lines 4-17, discloses that text includes the spoken form of natural language as processed from the signal to the speech-to-word level, and natural language processing including constrained semantic-syntactic subsequence matching, rules learned through transformation-based learning, ontology-constrained word sense disambiguation, and the like. Examiner interprets text in the spoken form of natural language, as processed from the signal to the speech-to-word level, as including voice directed search queries. Nguyen, Col. 9 lines 65-67, discloses separate ontologies for different parts of speech, including nouns, verbs, adverbs, adjectives, or prepositions. Examiner interprets pronouns and auxiliary verbs as different parts of speech, each of which would have its own separate ontology.)
Hammontree [0030]-[0032] teaches a disambiguation component that disambiguates search words from the voice directed query. Nguyen's ontology-constrained word sense disambiguation, which uses separate ontologies for different parts of speech and can operate on text processed to the speech-to-word level, could serve as the disambiguation of a voice directed query in Hammontree. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice directed context sensitive visual search system of Hammontree to incorporate the teachings of the parts-of-speech ontology-based natural language processing of Nguyen, because both address the same field of query disambiguation systems, and incorporating Nguyen into Hammontree would provide the voice-based query system a way of disambiguating ambiguous search queries that contain pronouns and auxiliary verbs.
One of ordinary skill in the art would be motivated to do so to provide a natural language processing system that is not resource heavy both in processing and memory, as taught by Nguyen Col. 1 lines 25-40.
Claim 16 recites substantially the same limitations as claim 6, and is rejected for substantially the same reasons.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDDY CHEUNG whose telephone number is (571)272-9785.  The examiner can normally be reached on MON-TH 8:00AM-4:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner can be reached on (571)270-1760.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Eddy Cheung/Examiner, Art Unit 2165