Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 9-13, 15-17, 19-20, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Oshima (U.S. PGPUB 20160054859) in view of Wright et al. (U.S. PGPUB 20160378294), and further in view of Clement et al. (U.S. PGPUB 20170038830).
With respect to claim 1, Oshima discloses an image processing apparatus (paragraph 30, As shown in FIG. 1, the camera scanner 101 is connected to a host computer 102 and a printer 103 via a network 104 such as an Ethernet), comprising:
one or more processors (paragraph 36, As shown in FIG. 3, the controller unit 201 includes a CPU 302, a RAM 303, a ROM 304, an HDD 305, a network I/F 306, an image processing processor 307, a camera I/F 308, a display controller 309, a serial I/F 310, an audio controller 311, and a USB controller 312, which are connected to a system bus 301), configured to act as a plurality of units, comprising:
an operation point selection unit (paragraph 37, The CPU 302 is a central processing unit that performs overall control of operations of the controller unit 201, which acts as a selection unit) configured to select an operation point from a plurality of points corresponding to a contour of the selected operating part in accordance with respective positions of the plurality of points (paragraph 59, In step S603, processing for detecting the shape of the user's hand and a fingertip from the acquired group of three-dimensional points is performed as shown in steps S631 to S634. In step S631, the group of three-dimensional points corresponding to a hand is obtained from the group of three-dimensional points acquired in step S602, paragraph 63, This is performed on all of the contour points of the outline, and if the center of a circle that fits and has a curvature greater than a predetermined value is inside the outline of the hand, the point in the middle of the finite number of adjacent contour points is determined as the fingertip, paragraph 72, In step S606, touch gesture judgment processing is performed. The three-dimensional coordinates of the detected fingertip and the previously-described plane parameters of the document stand 204 are used in this calculation);
an operating unit (paragraph 36, FIG. 3 is a diagram showing an example of the hardware configuration of the controller unit 201) configured to operate the virtual object based on a positional relationship between the selected operation point (paragraph 74, in step S607, if the determination "touch gesture" was made in the immediately previous step, the procedure moves to step S608, paragraph 81, In step S609, touch position determination processing is performed. This is processing for estimating the position of the finger pad at which the user actually feels the touching, paragraph 92, In step S605, the judged touch gesture and the three-dimensional coordinates of the touch position are notified to the main control unit 402, and then the procedure returns to step S602, and gesture recognition processing is repeated). Gesture recognition processing is 
an extraction unit configured to extract a plurality of objects from an image captured by a capturing unit, as a plurality of candidates of operating parts used for operating a virtual object;
a determination unit configured to determine an operation state of the virtual object;
an operating part selection unit configured to select an operating part from the plurality of candidates of operating parts in accordance with a selection condition corresponding to the determined operation state of the virtual object;
a generation unit configured to generate an image of the virtual object based on a position and orientation of the capturing unit;
an output unit configured to output, to a display unit, a composite image of the captured image and the image of the virtual object; and
the processing unit is configured to operate the virtual object based on a positional relationship between the selected operation point and the virtual object.

Wright et al., who also deal with hand detection, disclose a method including:
a generation unit configured to generate an image of the virtual object based on a position and orientation of the capturing unit (paragraph 17, The head mounted display device 10 may further include a position sensor system 22 that may include one or more position sensors such as accelerometer(s), gyroscope(s), magnetometer(s), global positioning system(s), multilateration tracker(s), and/or other sensors that output position sensor information useable as a position, orientation, paragraph 19, the optical and positional sensor information may be used to create a virtual model of the real-world background. In some embodiments, the position and orientation of the vantage point may be characterized relative to this virtual space. Moreover, the virtual model may be used to determine positions of virtual objects in the virtual space and add additional virtual objects to be displayed to the user at a desired depth and location within the virtual world); and
an output unit configured to output, to a display unit, a composite image of the captured image and the image of the virtual object (paragraph 14, The head mounted display device 10 may be configured to display a virtual representation such as a three dimensional graphical rendering of the physical environment in front of the user that may include additional virtual objects, such as a virtual cursor, or may be configured to display camera-captured images of the physical environment along with additional virtual objects including the virtual cursor overlaid on the camera-captured images, paragraph 21, In an augmented reality configuration, the virtual cursor is a holographic cursor that is displayed on an at least partially see-through display, such that the virtual cursor appears to be superimposed onto the physical environment being viewed by the user).
	Oshima and Wright et al. are in the same field of endeavor, namely computer graphics.
	Before the effective filing date of the claimed invention, it would have been obvious to apply the method of including: a generation unit configured to generate an image of the virtual object based on a position and orientation of the capturing unit and 
	Clement et al., who also deal with hand detection, disclose a method including:
an extraction unit configured to extract a plurality of objects from an image captured by a capturing unit, as a plurality of candidates of operating parts used for operating a virtual object (paragraph 23, If the user wishes to interact with the VR environment, he or she may reach toward virtual objects in the VR environment using one or more fingers, hands, arms, feet, legs, and the like. Such a reach (e.g., movement) may be detected as input in which to simulate movement of virtual objects and modifications to the VR environment, paragraph 24, The user's fingers, palms, forearms, and possibly other arm or portions may trigger collisions and affect the physical world);
a determination unit configured to determine an operation state of the virtual object (paragraph 44, a collision mode may be adapted for small area selection (e.g., keys, menu items, or detailed virtual object manipulation) or large area selection (e.g., lifting objects, moving blocks or other virtual content, drawing in virtual space, etc.), small area selection states and large area selection states correspond to different operation states for virtual objects);
This can allow the user to use a palm to scroll and/or select, but if a finger inadvertently interacts with a collision zone, the module 112 can block (e.g., mute) the finger selection since the palm-based mode is the only active collision mode for performing scrolling in this example, blocking the finger selection selects the palm as the operating part based on the state of the scrollable content virtual object), wherein the operating unit is configured to operate the virtual object based on a positional relationship between the selected operation point and the virtual object (paragraph 45, in addition to triggering button clicks with an index finger, the collision mode module 112 can allow users to move a scrollable region with the palm center of their hands).
Oshima, Wright et al., and Clement et al. are in the same field of endeavor, namely computer graphics.
Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein an extraction unit configured to extract a plurality of objects from an image captured by a capturing unit, as a plurality of candidates of operating parts used for operating a virtual object;
a determination unit configured to determine an operation state of the virtual object;
an operating part selection unit configured to select an operating part from the plurality of candidates of operating parts in accordance with a selection condition corresponding to the determined operation state of the virtual object, wherein the 

With respect to claim 3, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1, wherein the operating unit arranges the virtual object at the operation point (Wright et al.: paragraph 15, the virtual object may be a virtual cursor that is displayed to the user, such that the virtual cursor appears to the user to be located at a desired location in the virtual three dimensional environment, paragraph 27, the user hand gesture may be used as an input to execute a programmatic function. It will be appreciated that while the example illustrated in FIG. 3 shows a hand gesture of a raised forefinger, paragraph 32, the visual appearance 28D of the holographic cursor affords the user with the understanding that a drawing function is currently being executed, and therefore any movements of the hand will cause a virtual image to be drawn on the wall 36, the holographic cursor is arranged at the operation point), and executes corresponding processing if the virtual object is in contact with another virtual object (Wright et al.: paragraph 27, the head mounted display device 10 may be configured to detect a recognized object in the three dimensional environment 32. In the example illustrated in FIG. 3, the recognized object is the wall 36 of the three dimensional environment 32, which is a room, paragraph 31, maintaining the hand gesture 40, and user may move that hand around in trackable space of the three dimensional environment, and the head mounted display device 10 may render and display a virtual image 41 on the wall 36 that corresponds to the tracked motion of the user's hand, paragraph 33, Now turning to FIG. 4, the recognized object may be a virtual object rather than a real physical object located in the three dimensional environment 32). Thus, the cursor virtual object interacts with the planar surface 44 of virtual object 42 when the virtual object 42 is the recognized object.
It would have been obvious to apply the method wherein the processing unit arranges a virtual object at the operation point, and executes corresponding processing if the virtual object is in contact with another virtual object, because the visual appearance of the holographic cursor may provide feedback to the user that the programmatic function is currently being executed (paragraph 32 of Wright et al.) and virtually execute a drawing function.
With respect to claim 4, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1, wherein the operating unit arranges a virtual object near the operation point (Wright et al.: paragraph 15, the virtual object may be a virtual cursor that is displayed to the user, such that the virtual cursor appears to the user to be located at a desired location in the virtual three dimensional environment, paragraph 27, the user hand gesture may be used as an input to execute a programmatic function. It will be appreciated that while the example illustrated in FIG. 3 shows a hand gesture of a raised forefinger, paragraph 32, the visual appearance 28D of the holographic cursor affords the user with the understanding that a drawing function is currently being executed). The holographic cursor is a virtual object near the operation point for performing the drawing operation.
Among these pixels, the group of pixels 814 includes nine pixels that were discovered as the fingertip, and it is assumed that a pixel 806 at the middle was discovered as the fingertip).
	With respect to claim 10, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1, wherein, while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects, as the operation point, a point satisfying a predetermined condition in the one point group, and, in a region based on the point, a point satisfying the predetermined condition (Oshima: paragraph 84, Among these pixels, the group of pixels 814 includes nine pixels that were discovered as the fingertip, and it is assumed that a pixel 806 at the middle was discovered as the fingertip. Also, 815 indicates the centroid of the group of pixels 814 that includes the fingertip point 806, and it is sufficient that the centroid 815 is determined as the touch position).
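The centroid-based touch position cited from Oshima paragraph 84 can be illustrated with a minimal sketch. This is not Oshima's implementation: the function name touch_position, the representation of the pixel group as (x, y) tuples, and the sample 3x3 group are all assumptions made for illustration only.

```python
from typing import Iterable, Tuple

Pixel = Tuple[float, float]  # (x, y) image coordinates; representation is assumed


def touch_position(fingertip_pixels: Iterable[Pixel]) -> Pixel:
    """Return the centroid of a group of pixels around a detected fingertip."""
    pixels = list(fingertip_pixels)
    if not pixels:
        raise ValueError("fingertip pixel group is empty")
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return (cx, cy)


# Hypothetical group of nine pixels (a 3x3 block), standing in for a group
# such as the "group of pixels 814" in the cited passage.
group = [(x, y) for x in (4, 5, 6) for y in (9, 10, 11)]
print(touch_position(group))  # (5.0, 10.0)
```

Using the centroid rather than the single fingertip pixel shifts the estimate toward the finger pad, which is the motivation the cited passage gives for the touch position.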
With respect to claim 11, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1, wherein the plurality of units further comprises:
In step S632, a two-dimensional image in which the acquired group of three-dimensional points corresponding to the hand is projected onto the plane of the document stand 204 is generated, and the outline of the hand is detected, as applied to the plurality of hands), and wherein, while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects a point satisfying a predetermined condition from the one point group out of point groups configuring the three-dimensional contours as the operation point (Oshima: paragraph 59, In step S603, processing for detecting the shape of the user's hand and a fingertip from the acquired group of three-dimensional points is performed as shown in steps S631 to S634. In step S631, the group of three-dimensional points corresponding to a hand is obtained from the group of three-dimensional points acquired in step S602, paragraph 63, This is performed on all of the contour points of the outline, and if the center of a circle that fits and has a curvature greater than a predetermined value is inside the outline of the hand, the point in the middle of the finite number of adjacent contour points is determined as the fingertip, paragraph 72, In step S606, touch gesture judgment processing is performed. The three-dimensional coordinates of the detected fingertip and the previously-described plane parameters of the document stand 204 are used in this calculation, as applied to multiple hands).
With respect to claim 12, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 11, wherein the contour generation unit generates the three-dimensional contour by using a stereo image (Wright et al.: paragraph 66, Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition). It would have been obvious to generate the three-dimensional contour using a stereo image, because this would implement one of many known methods for detecting a hand or target object.
	With respect to claim 13, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 11, wherein the contour generation unit generates the three-dimensional contour by using an image of the target object captured by a depth camera (Wright et al.: paragraph 66, Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition). It would have been obvious to generate the three-dimensional contour using an image from a depth camera, because this would implement one of many known methods for detecting a hand or target object.
With respect to claim 15, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 4, wherein the generation unit generates the image of the virtual object based on the operation point (Wright et al.: paragraph 32, As illustrated in FIG. 3, the visual appearance of the holographic cursor may provide feedback to the user that the programmatic function is currently being executed, the visual appearance 28D of the holographic cursor affords the user with the understanding that a drawing function is currently being executed, and therefore any movements of the hand will cause a virtual image to be drawn on the wall 36, the holographic cursor is arranged at the operation point).
the virtual object may be a virtual cursor that is displayed to the user, such that the virtual cursor appears to the user to be located at a desired location in the virtual three dimensional environment. In the augmented reality configuration, the virtual object may be a holographic cursor that is displayed to the user, such that the holographic cursor appears to the user to be located at a desired location in the real world physical environment). 
With respect to claim 17, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 14, wherein the display unit is a head-mounted display (Wright et al.: paragraph 14, FIG. 1 illustrates an example head mounted display device 10), and wherein the capturing unit is mounted on the head-mounted display (Wright et al.: paragraph 16, The head mounted display device 10 includes an optical sensor system 16 that may include one or more optical sensors. In one example, the optical sensor system 16 includes an outward facing optical sensor 18 that may be configured to detect the real-world background from a similar vantage point (e.g., line of sight) as observed by the user through the at least partially see-through stereoscopic display 12, the outward facing optical sensor 18 may include one or more component sensors, including an RGB camera and a depth camera). It would have been obvious to include a head-mounted display because this would allow for more user input flexibility in implementing augmented reality environments.
	

	With respect to claim 20, Oshima as modified by Wright et al. and Clement et al. disclose a non-transitory computer-readable storage medium storing a computer program (Oshima: paragraph 132, Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium) for causing a computer to implement the apparatus of claim 1; see the rationale for the rejection of claim 1.

	With respect to claim 25, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1, wherein the operation point selection unit switches, according to selection of the virtual object, a selection method of the operation point from one point group (Clement et al.: paragraph 45, the collision mode module 112 can determine that the virtual environment is providing scrollable content and in response, can select a palm-based collision mode. Selecting a palm-based collision mode may include configuring the content to be scrolled in response to receiving a palm gesture initiated by a hand of the user. In addition, selecting a palm-based collision mode may include modifying portions of the hand other than the palm to be ineffective, paragraph 48, the collision mode module 112 can dynamically change a user's arm into a spear or stylus that can point at a virtual object and be able to select a target that appears smaller than the user's finger). The selection method of the operation point is switched to one of finer detail according to selection of the target virtual object. It would have been obvious to apply the method wherein the selection unit switches, according to selection of the virtual object, a selection method of the operation point from the one point group because this would reduce the likelihood of the user selecting multiple targets or unwanted targets (paragraph 48 of Clement et al.).
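The state-dependent selection discussed for Clement et al. (a palm-based mode for scrollable content, with other portions of the hand made ineffective) can be sketched as a lookup from operation state to the operating part that stays active. This is only an illustration under stated assumptions: the names OperationState, SELECTION_CONDITION, and select_operating_part, and the string labels for hand parts, are hypothetical and appear in none of the cited references.

```python
from enum import Enum
from typing import Optional, Sequence


class OperationState(Enum):
    """Hypothetical operation states of the virtual object."""
    SCROLLING = "scrolling"            # large-area interaction
    FINE_SELECTION = "fine_selection"  # small-area interaction (keys, menu items)


# Assumed selection condition: which operating part is active in each state.
SELECTION_CONDITION = {
    OperationState.SCROLLING: "palm",
    OperationState.FINE_SELECTION: "index_finger",
}


def select_operating_part(state: OperationState,
                          candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate part permitted in this state, or None if absent.

    Candidates that do not match the state's condition are ignored, which
    mirrors muting an inadvertent finger input while a palm mode is active.
    """
    wanted = SELECTION_CONDITION[state]
    return wanted if wanted in candidates else None


detected = ["index_finger", "palm"]  # hypothetical extracted candidates
print(select_operating_part(OperationState.SCROLLING, detected))       # palm
print(select_operating_part(OperationState.FINE_SELECTION, detected))  # index_finger
```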
	
Claims 2 and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Oshima (U.S. PGPUB 20160054859) in view of Wright et al. (U.S. PGPUB 20160378294), Clement et al. (U.S. PGPUB 20170038830), and further in view of Lin (U.S. PGPUB 20050264527).
With respect to claim 2, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1. However, Oshima as modified by Wright et al. and Clement et al. do not expressly disclose while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects the operation point from one point group positioned most upward in the captured image out of the point groups corresponding to the contours of the plurality of objects.
Lin, who also deals with hand detection, discloses a method wherein, while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects the operation point from one point group positioned most upward in the captured image out of the point groups corresponding to the contours of the plurality of objects (paragraph 159, If the object is a fingertip, the edge is detected using the edge detection processes explained more fully below, the central point of the highest yi (in screen coordinate) edge pixels is determined in the image data, and the xi value of the object is obtained). The state where the operation point and the virtual object are associated is not being executed because Lin does not disclose interacting with the virtual object.
Oshima, Wright et al., Clement et al., and Lin are in the same field of endeavor, namely computer graphics.
Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein, while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects the operation point from one point group positioned most upward in the captured image out of the point groups corresponding to the contours of the plurality of objects, as taught by Lin, to the system of Oshima as modified by Wright et al. and Clement et al., because this information may be used to determine a predetermined identifier of the object (e.g. thumb, middle finger, ring finger, etc.) (paragraph 159 of Lin), thus facilitating determination of the finger object.
	With respect to claim 22, Oshima as modified by Wright et al., Clement et al., and Lin disclose the image processing apparatus according to claim 1, wherein, while the operating unit does not operate the virtual object in a state where the operation point and the virtual object are associated with each other, the operation point selection unit selects, out of contours of the plurality of objects, a contour positioned most upward in the captured image (Lin: paragraph 159, If the object is a fingertip, the edge is detected using the edge detection processes explained more fully below, the edge corresponding to the fingertip is the contour positioned most upward), and selects, from the one point group corresponding to the selected contour positioned most upward in the captured image, the point satisfying the predetermined condition as the operation point (Lin: paragraph 159, If the object is a fingertip, the edge is detected using the edge detection processes explained more fully below, the central point of the highest yi (in screen coordinate) edge pixels is determined in the image data, and the xi value of the object is obtained). The state where the operation point and the virtual object are associated is not being executed because Lin does not disclose interacting with the virtual object.
	With respect to claim 23, Oshima as modified by Wright et al., Clement et al., and Lin disclose the image processing apparatus according to claim 1, wherein the operation point selection unit selects the operation point in accordance with a positioning most upward in a row direction in the captured image (Lin: paragraph 159, If the object is a fingertip, the edge is detected using the edge detection processes explained more fully below, the central point of the highest yi (in screen coordinate) edge pixels is determined in the image data, and the xi value of the object is obtained). The highest yi coordinate corresponds to the most upward in a row direction in the captured image.
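The "most upward" selection discussed for claims 2, 22, and 23 can be sketched as follows. This is an illustrative reading, not Lin's or the applicant's implementation. It assumes image coordinates in which y increases downward, so the most upward point has the smallest y; if the coordinate system is the other way up, min would be replaced with max. The function names topmost_contour and operation_point and the sample contours are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]  # (x, y); y is ASSUMED to increase downward in the image


def topmost_contour(contours: Sequence[Sequence[Point]]) -> Sequence[Point]:
    """Pick the contour whose uppermost point lies nearest the top of the image."""
    candidates = [c for c in contours if c]
    if not candidates:
        raise ValueError("no non-empty contours")
    return min(candidates, key=lambda c: min(y for _, y in c))


def operation_point(contour: Sequence[Point]) -> Point:
    """Return the central point of the contour's uppermost row of pixels."""
    top_y = min(y for _, y in contour)
    xs = sorted(x for x, y in contour if y == top_y)
    return (xs[len(xs) // 2], top_y)


# Two hypothetical hand contours; hand_a reaches higher in the image.
hand_a = [(10, 40), (11, 38), (12, 38), (13, 38), (14, 41)]
hand_b = [(30, 50), (31, 47), (32, 52)]

best = topmost_contour([hand_a, hand_b])
print(operation_point(best))  # (12, 38)
```

The two steps are kept separate because claim 22 recites them separately: first a contour is selected, then a point within that contour's point group.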

Claims 14 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Oshima (U.S. PGPUB 20160054859) in view of Wright et al. (U.S. PGPUB .
With respect to claim 14, Oshima as modified by Wright et al. and Clement et al. disclose the image processing apparatus according to claim 1. However, Oshima as modified by Wright et al. and Clement et al. do not expressly disclose an occlusion relationship between the target object and the virtual object is made to correspond in the output composite image.
Holz et al., who also deal with hand detection, disclose a method wherein an occlusion relationship between the target object and the virtual object is made to correspond in the output composite image (Holz et al.: column 7, lines 65-67, column 8, lines 1-9, Association of anchor point 311-2 with virtual surface 311-5 can enable modeling of a user interaction "anchored" to a physical surface, e.g., a user's hand resting on a flat surface while typing while interacting meaningfully with the virtual space).
Oshima, Wright et al., Clement et al., and Holz et al. are in the same field of endeavor, namely computer graphics.
Before the effective filing date of the claimed invention, it would have been obvious to apply the method wherein an occlusion relationship between the target object and the virtual object is made to correspond in the output composite image, as taught by Holz et al., to the system of Oshima as modified by Wright et al. and Clement et al., because this would allow for realistic interactions with a virtual surface.
With respect to claim 21, Oshima as modified by Wright et al., Clement et al., and Holz et al. disclose the image processing apparatus according to claim 1, wherein A user can drag the virtual object as illustrated by example 504 and manipulate it in preparation for use as illustrated by example 505). Dragging the virtual objects moves the selected virtual object according to movement of the operation point. It would have been obvious to include the processing to be executed in the state where the operation point and the virtual object are associated with each other is processing in which a selected virtual object moves according to movement of the operation point, because this would allow a user to realistically interact with virtual objects.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1 and 19-21 have been considered but are moot in view of the new grounds of rejection.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 20170287222 to Fujimaki for a method of detecting multiple hands interacting with a virtual object
U.S. Patent No. 9,679,197 to Sills et al. for a method of determining an authorized hand for performing an operation
U.S. PGPUB 20150358614 to Jin for a method of detecting multiple interaction points of multiple hands
U.S. Patent No. 9,939,914 to Fleishman et al. for a method of tracking movements of multiple body parts for interacting with virtual objects.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW GUS YANG whose telephone number is (571)272-5514. The examiner can normally be reached on M-F 9 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman, can be reached on (571)272-7653. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANDREW G YANG/
Primary Examiner, Art Unit 2619
3/17/21