Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-12 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yakishyn et al. (US 2018/0181811 A1).
Regarding claim 1, Yakishyn teaches:
A method with image augmentation, (FIG. 1) the method comprising: 
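recognizing, based on a gaze of the user corresponding to the input image, any one or any combination of any two or more of an object of interest of a user, a situation of the object of interest, and a task of the user from partial regions of an input image; ([0050]-[0051], “For example, the user may have the display apparatus 10 fixed to his or her head and view the VR image 20 through the display apparatus 10, and the display apparatus 10 may sense a gaze of the user and determine in which area of the 360 degree image the VR image 20 that is being viewed by the user is being reproduced. In an embodiment, the display apparatus 10 may analyze the gaze of the user by using a sensor for sensing an orientation of the display apparatus 10 and a sensor for sensing a gaze direction of the user, and thus, may determine an area 30 (hereinafter referred to as a viewing area) of the VR image 20 being viewed by the user. …”)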
determining relevant information indicating an intention of the user based on any two or any other combination of the object of interest of the user, the situation of the object of interest, and the task of the user; ([0053], “The display apparatus 10 may provide information regarding the image of interest 40 to the user. In an embodiment, the information regarding the image of interest 40 may include, but is not limited to, coordinates of an area where the image of interest 40 is reproduced, a reproduction section of the image of interest 40, information regarding a key frame included in the image of interest 40, information regarding an object included in the key frame, and context information regarding the key frame.” [0086], “In an embodiment, the received information regarding the image of interest may include coordinates of an area where the image of interest is reproduced, a reproduction section of the image of interest, information regarding a key frame included in the image of interest, information regarding an object included in the key frame, and context information regarding the key frame. …”) and
generating a visually augmented image by visually augmenting the input image based on the relevant information. ([0054], “For example, the display apparatus 10 may determine a VR image 21 of an area where a certain building is reproduced as an image of interest 41. The display apparatus 10 may identify that an object included in a key frame of the image of interest 41 is `Building A`, and may provide information regarding a construction year and features of the `Building A` to the user.”)
The above limitation of “determining relevant information indicating an intention of the user based on any two or any other combination of the object of interest of the user, the situation of the object of interest, and the task of the user;” is taught by Yakishyn across different embodiments. However, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the different embodiments of Yakishyn to determine and acquire different information regarding the object of interest. The benefit would be the flexibility to offer users more, and more varied, information in an augmented environment.

Regarding claim 2, Yakishyn teaches:
The method of claim 1, wherein the recognizing comprises: 
generating, based on the image sequence, an image sequence including partial regions of the input image based on the gaze of the user; ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”) and
recognizing any one or any combination of any two or more of the object of interest of the user, the situation of the object of interest, and the task of the user. ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”)

Regarding claim 3, Yakishyn teaches:
The method of claim 2, wherein the generating comprises: extracting partial images mapped to the gaze of the user from the input image; and generating the image sequence by sequentially combining the partial images. ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”)
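For illustration only, the claim 3 limitation of extracting gaze-mapped partial images and combining them in order can be sketched as below. The sketch is not taken from Yakishyn or from the application; the function names, the square crop, and the list-of-rows image representation are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumption: a grayscale image is a list of rows of integer pixel values.
Image = List[List[int]]


def crop_at_gaze(image: Image, gaze: Tuple[int, int], half: int) -> Image:
    """Return the square region centered on gaze=(row, col), clipped to the image."""
    rows, cols = len(image), len(image[0])
    r, c = gaze
    top, bottom = max(0, r - half), min(rows, r + half + 1)
    left, right = max(0, c - half), min(cols, c + half + 1)
    return [row[left:right] for row in image[top:bottom]]


def build_image_sequence(
    frames: Sequence[Image], gazes: Sequence[Tuple[int, int]], half: int = 1
) -> List[Image]:
    """Extract one gaze-mapped partial image per timestep, kept in time order."""
    if len(frames) != len(gazes):
        raise ValueError("one gaze point is required per frame")
    return [crop_at_gaze(f, g, half) for f, g in zip(frames, gazes)]


# Hypothetical 5x5 frame whose pixel value encodes its position (10*row + col).
frame = [[10 * r + c for c in range(5)] for r in range(5)]
sequence = build_image_sequence([frame, frame], [(2, 2), (0, 0)])
```

Here each timestep contributes one cropped region, and a gaze point near the image border yields a smaller, clipped region.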

Regarding claim 4, Yakishyn teaches:
The method of claim 3, wherein the extracting comprises extracting the partial images based on gaze information on which the gaze of the user is determined in the input image input at each timestep of timesteps used to track the gaze of the user. ([0065]-[0066], “For example, as a result of calculating the slope for the change graph 310 of a value σ according to time, when the slope for the graph during a period t3~t4 is greater than the predetermined first critical value and less than or equal to the second critical value, the display apparatus 10 may obtain the coordinates 311 of the viewing area in which the gaze of the user is located during the period t3~t4 and a reproduction section t3~t4 312 of the VR image that is reproduced while the gaze of the user is located in the viewing area. Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. …”)

Regarding claim 5, Yakishyn teaches:
The method of claim 2, wherein the recognizing comprises: recognizing either one or both of the situation of the object of interest and the object of interest included in the image sequence by applying the image sequence to a first neural network configured to perform object recognition; ([0074], “Also, the display apparatus 10 may provide context information regarding a key frame. In an embodiment, the display apparatus 10 may provide the context information regarding the key frame by analyzing emotion information of the user and information regarding an object included in the key frame. For example, the display apparatus 10 may use a convolutional neural network (CNN) and a recurrent neural network language model (RNNLM) to provide the context information regarding the key frame.” [0066], “In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”) and recognizing a task being performed by the user by applying the image sequence to a second neural network configured to perform task recognition. (This feature is not a selected feature from parent claim 1.)

Regarding claim 6, Yakishyn teaches:
The method of claim 5, wherein the recognizing of the task comprises: generating a coded image by visually encoding temporal information included in each gaze corresponding to the image sequence; and predicting the task being performed by the user based on the image sequence and the coded image. (This feature is not a selected feature from parent claim 1.)

Regarding claim 7, Yakishyn teaches:
The method of claim 6, wherein the temporal information comprises any one or any combination of any two or more of a gaze trajectory, a velocity during eye movements, a duration of each fixation, whether the fixations are repeated on the partial regions, a count of recurrent/repeated fixations, an interval of the recurrent/repeated fixations, and a coverage area of the fixations. (This feature depends on a feature that is not selected from parent claim 1.)

Regarding claim 8, Yakishyn teaches:
The method of claim 6, wherein the generating of the coded image comprises: generating coded partial images by encoding the temporal information to each RGB channel in partial regions to which gazes corresponding to the image sequence are mapped in the input image; and generating the coded image by combining the coded partial images. (This feature depends on a feature that is not selected from parent claim 1.)

Regarding claim 9, Yakishyn teaches:
The method of claim 6, wherein the predicting comprises: obtaining feature vectors corresponding to the image sequence based on the image sequence and the coded image; and classifying the task based on the feature vectors. (This feature depends on a feature that is not selected from parent claim 1.)

Regarding claim 10, Yakishyn teaches:
The method of claim 9, wherein the obtaining comprises: extracting first feature vectors from partial regions to which gazes corresponding to the image sequence are mapped; extracting second feature vectors based on the coded image; and obtaining feature vectors corresponding to the image sequence by concatenating the first feature vectors and the second feature vectors. (This feature depends on a feature that is not selected from parent claim 1.)

Regarding claim 11, Yakishyn teaches:
The method of claim 1, wherein the situation of the object of interest comprises a situation in which any one or any combination of any two or more of occlusion, blur, distortion caused by rain, low illumination, and light reflection occurs with respect to the object of interest in the image sequence. ([0077], “Referring to FIG. 5B, the display apparatus 10 may provide a VR image of an area 530 where an image of interest is reproduced at a picture quality higher than that of a VR image 520 that is reproduced in remaining areas. In an embodiment, at the time of loading a stored VR image or streaming a real-time image, the display apparatus 10 may provide the VR image of the area 530 where the image of interest is reproduced at a picture quality higher than that of the VR image 520 that is reproduced in remaining areas.”)

Regarding claim 12, Yakishyn teaches:
The method of claim 1, wherein the task of the user comprises any one or any combination of any two or more of search, object identification, matching, counting, measurement, and freely viewing. (This feature depends on a feature that is not selected from parent claim 1.)

Regarding claim 14, Yakishyn teaches:
The method of claim 1, wherein the determining comprises: 
determining a descriptor corresponding to the object of interest of the user and the situation of the object of interest; ([0071], “The display apparatus 10 may provide information regarding the key frames 420. In an embodiment, information 430 regarding a key frame may include, but is not limited to, a name of the VR image including the key frame, coordinates of the key frame, a reproduction location of the key frame, information regarding an object included in the key frame, and context information regarding the key frame.” [0073]-[0074]) and
determining the relevant information by searching a table for a result of combining the descriptor and the task of the user, the table including information of a relationship between the object of interest and the task of the user. (This feature depends on the task feature that is not selected from parent claim 1.)

Regarding claim 15, Yakishyn teaches:
The method of claim 1, wherein the visually augmenting comprises either one or both of: 
visually augmenting the input image by matching the relevant information to the input image; and visually augmenting the input image by correcting the input image based on the relevant information. ([0054], “For example, the display apparatus 10 may determine a VR image 21 of an area where a certain building is reproduced as an image of interest 41. The display apparatus 10 may identify that an object included in a key frame of the image of interest 41 is `Building A`, and may provide information regarding a construction year and features of the `Building A` to the user.”)

Regarding claim 16, Yakishyn teaches:
The method of claim 1, wherein the visually augmenting comprises visually augmenting the input image by selectively providing additional information for each determined situation corresponding to the relevant information. ([0086], “In an embodiment, the received information regarding the image of interest may include coordinates of an area where the image of interest is reproduced, a reproduction section of the image of interest, information regarding a key frame included in the image of interest, information regarding an object included in the key frame, and context information regarding the key frame. Also, the received information regarding the image of interest may include, but is not limited to, a message regarding the image of interest input by the other users and obtained emotion information of the other users.” FIG. 1)

Regarding claim 17, Yakishyn teaches:
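The method of claim 1, further comprising: acquiring the input image and gaze information including a gaze of the user corresponding to the input image. ([0050]-[0051], “For example, the user may have the display apparatus 10 fixed to his or her head and view the VR image 20 through the display apparatus 10, and the display apparatus 10 may sense a gaze of the user and determine in which area of the 360 degree image the VR image 20 that is being viewed by the user is being reproduced. In an embodiment, the display apparatus 10 may analyze the gaze of the user by using a sensor for sensing an orientation of the display apparatus 10 and a sensor for sensing a gaze direction of the user, and thus, may determine an area 30 (hereinafter referred to as a viewing area) of the VR image 20 being viewed by the user. …”)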

Regarding claim 18, Yakishyn teaches:
The method of claim 1, further comprising outputting the visually augmented image. ([0054], “For example, the display apparatus 10 may determine a VR image 21 of an area where a certain building is reproduced as an image of interest 41. The display apparatus 10 may identify that an object included in a key frame of the image of interest 41 is `Building A`, and may provide information regarding a construction year and features of the `Building A` to the user.” FIG. 1)

Regarding claim 19, Yakishyn teaches:
A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 1. (Abstract: “A method a device and a computer readable medium for …”)

Regarding claim 20, Yakishyn teaches:
An apparatus with image augmentation, (FIG. 1) the apparatus comprising: 
a communication interface configured to acquire an input image and gaze information including a gaze of a user corresponding to the input image; ([0050], “A display apparatus 10 may provide a VR image 20 to a user. In an embodiment, the display apparatus 10 may provide a 360-degree image to the user, but a type of an image which is provided to the user is not limited thereto. For example, the user may have the display apparatus 10 fixed to his or her head and view the VR image 20 through the display apparatus 10, and the display apparatus 10 may sense a gaze of the user and determine in which area of the 360 degree image the VR image 20 that is being viewed by the user is being reproduced. In an embodiment, the display apparatus 10 may analyze the gaze of the user by using a sensor for sensing an orientation of the display apparatus 10 and a sensor for sensing a gaze direction of the user, and thus, may determine an area 30 (hereinafter referred to as a viewing area) of the VR image 20 being viewed by the user.”)
one or more processors (FIG. 1, 10) configured to: 
The remainder of the claim recites limitations similar to those of claim 1 and is therefore rejected under the same rationale.


Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Yakishyn in view of Wang et al. (US 2018/0150681 A1).
Regarding claim 13, Yakishyn teaches:
The method of claim 1, wherein the recognizing comprises: 
setting a first window region and a second window region corresponding to partial regions of different sizes in the input image, the second window region being larger than the first window region; (FIG. 3B and FIG. 7, 710)
However, Yakishyn does not, but Wang teaches:
resizing a resolution of the second window region by downsampling the second window region; ([0090], “Next in face detection system 300, each detected moving area 318, which is a portion of input video image 302, is received by pyramid and patch generation module 306. Pyramid and patch generation module 306 is configured to convert moving area 318 into a "pyramid" of multi-resolution representations of moving area 318 by downsampling moving area 318 with different downsampling factors, whereby allowing subsequent face detection modules to detect faces of different scales in moving area 318. More specifically, a higher-resolution representation of the moving area 318 in the "pyramid" can be used to detect smaller faces in the original input image 302, while a lower-resolution representation of moving area 318 in the "pyramid" can be used to detect larger faces in the original input image 302.” FIG. 3)
detecting a first object candidate from the first window region, and detecting a second object candidate from the downsampled second window region; ([0090], “Next in face detection system 300, each detected moving area 318, which is a portion of input video image 302, is received by pyramid and patch generation module 306. …”) and
recognizing the object of interest included in the input image based on either one or both of the first object candidate and the second object candidate. (FIG. 3, “As shown in FIG. 3, face detection system 300 receives a video image 302 as input and generates face detection decisions 316 as output.”)
Yakishyn teaches an image augmentation method based on object detection. Wang teaches a specific method of object detection that uses a low-cost CNN module to reduce system requirements.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Yakishyn with the specific method of Wang in order to reduce system requirements during object detection.
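As a rough illustration of the claim 13 limitations discussed above (a smaller first window, a larger second window, and downsampling of the second window before detection), a minimal sketch follows. It is not Wang's pyramid and patch generation module or the applicant's implementation; the helper names, the shared window center, and the use of simple pixel decimation as the downsampling step are assumptions.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of integer pixel values.
Image = List[List[int]]


def crop(image: Image, center: Tuple[int, int], half: int) -> Image:
    """Square window centered on (row, col), clipped to the image bounds."""
    rows, cols = len(image), len(image[0])
    r, c = center
    top, bottom = max(0, r - half), min(rows, r + half + 1)
    left, right = max(0, c - half), min(cols, c + half + 1)
    return [row[left:right] for row in image[top:bottom]]


def downsample(region: Image, factor: int) -> Image:
    """Keep every factor-th pixel in both directions (plain decimation)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [row[::factor] for row in region[::factor]]


def two_window_regions(
    image: Image, center: Tuple[int, int], small_half: int, large_half: int, factor: int
) -> Tuple[Image, Image]:
    """Return the first window at full resolution and the larger second window downsampled."""
    if large_half <= small_half:
        raise ValueError("the second window must be larger than the first")
    first = crop(image, center, small_half)
    second = downsample(crop(image, center, large_half), factor)
    return first, second


# Hypothetical 8x8 image whose pixel value encodes its position (10*row + col).
image = [[10 * r + c for c in range(8)] for r in range(8)]
first, second = two_window_regions(image, (4, 4), small_half=1, large_half=3, factor=2)
```

In this sketch both regions end up at a comparable size, so a single hypothetical detector could be run on each to produce the first and second object candidates.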

Claim Rejections - 35 USC § 102
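In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless -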
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 21-25 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yakishyn.
Regarding claim 21, Yakishyn teaches:
A processor-implemented method with image augmentation, (FIG. 1) the method comprising: 
recognizing, based on a gaze of the user corresponding to the input image, characteristics of a user from partial regions of an input image; ([0050]-[0051], “For example, the user may have the display apparatus 10 fixed to his or her head and view the VR image 20 through the display apparatus 10, and the display apparatus 10 may sense a gaze of the user and determine in which area of the 360 degree image the VR image 20 that is being viewed by the user is being reproduced. In an embodiment, the display apparatus 10 may analyze the gaze of the user by using a sensor for sensing an orientation of the display apparatus 10 and a sensor for sensing a gaze direction of the user, and thus, may determine an area 30 (hereinafter referred to as a viewing area) of the VR image 20 being viewed by the user. …” [0052], “The display apparatus 10 may determine an image that is reproduced in some of a plurality of viewing areas 30 as an image of interest 40. In an embodiment, the display apparatus 10 may determine the image of interest 40 by using coordinates of the viewing area 30 and a …”)
determining relevant information used for the user, based on the recognized characteristics of the user; ([0053], “The display apparatus 10 may provide information regarding the image of interest 40 to the user. In an embodiment, the information regarding the image of interest 40 may include, but is not limited to, coordinates of an area where the image of interest 40 is reproduced, a reproduction section of the image of interest 40, information regarding a key frame included in the image of interest 40, information regarding an object included in the key frame, and context information regarding the key frame.”) and 
generating a visually augmented image by visually augmenting the input image based on the relevant information. ([0054], “For example, the display apparatus 10 may determine a VR image 21 of an area where a certain building is reproduced as an image of interest 41. The display apparatus 10 may identify that an object included in a key frame of the image of interest 41 is `Building A`, and may provide information regarding a construction year and features of the `Building A` to the user.”)

Regarding claim 22, Yakishyn teaches:
The method of claim 21, wherein the characteristics of the user comprise any one or any combination of any two or more of an object of interest of a user, a situation of the object of interest, and a task of the user from partial regions of an input image. ([0050]-[0051], “For example, the user may have the display apparatus 10 fixed to his or her head and view the VR image 20 through the display apparatus 10, and the display apparatus 10 may sense a gaze of the user and determine in which area of the 360 degree image the VR image 20 that is being viewed by the user is being reproduced. …”)

Regarding claim 23, Yakishyn teaches:
The method of claim 22, wherein the recognizing comprises: generating an image sequence including partial regions of the input image based on the gaze of the user; ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”) and 
recognizing any one or any combination of any two or more of the object of interest of the user, the situation of the object of interest, and the task of the user, based on the image sequence. ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”)

Regarding claim 24, Yakishyn teaches:
The method of claim 23, wherein the generating comprises: extracting partial images mapped to the gaze of the user from the input image; and generating the image sequence by sequentially combining the partial images. ([0066], “Referring to FIG. 3B, the gaze of the user moves according to the movement of the object included in the VR image. The display apparatus 10 may determine areas 312a to 312d of the VR image corresponding to a movement path of the gaze of the user during the period t3~t4 312. Also, the display apparatus 10 may determine the VR image that is reproduced in the determined areas 312a to 312d as a dynamic image of interest. In an embodiment, the determined areas 312a to 312d may be different with respect to each frame included in the VR image. Also, the determined areas 312a to 312d may differ from frame to frame, and thus, each of the determined areas 312a to 312d is independent.”)

Regarding claim 25, Yakishyn teaches:
The method of claim 21, further comprising outputting the visually augmented image. ([0054], “For example, the display apparatus 10 may determine a VR image 21 of an area where a certain building is reproduced as an image of interest 41. The display apparatus 10 may identify that an object included in a key frame of the image of interest 41 is `Building A`, and may provide information regarding a construction year and features of the `Building A` to the user.” FIG. 1)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725.  The examiner can normally be reached on Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/YANNA WU/Primary Examiner, Art Unit 2611