DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 18-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 18 recites the limitation "wherein performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image comprises" in lines 1-3. There is insufficient antecedent basis for this limitation in the claim. Claim 1, on which claim 18 depends, does not include the limitation “performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image”; however, claim 17, which depends on claim 1, includes that limitation. For the purposes of furthering prosecution, the examiner has interpreted claim 18 as depending on claim 17.
Claim 19 recites the limitation "wherein performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image comprises" in lines 1-3. There is insufficient antecedent basis for this limitation in the claim. Claim 1, on which claim 19 depends, does not include the limitation “performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image”; however, claim 17, which depends on claim 1, includes that limitation. For the purposes of furthering prosecution, the examiner has interpreted claim 19 as depending on claim 17.
Claim 20 recites the limitation "wherein performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image comprises" in lines 1-3. There is insufficient antecedent basis for this limitation in the claim. Claim 1, on which claim 20 depends, does not include the limitation “performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image”; however, claim 17, which depends on claim 1, includes that limitation. For the purposes of furthering prosecution, the examiner has interpreted claim 20 as depending on claim 17.
Claim 21 recites the limitation "wherein performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image comprises" in lines 1-3. There is insufficient antecedent basis for this limitation in the claim. Claim 1, on which claim 21 depends, does not include the limitation “performing the action in relation to the object within the captured image based on the texture image corresponding to the captured image”; however, claim 17, which depends on claim 1, includes that limitation. For the purposes of furthering prosecution, the examiner has interpreted claim 21 as depending on claim 17.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 4-18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Liang et al. ("Geometric Rectification of Camera-Captured Document Images", April 2008), hereinafter referred to as Liang, in view of Clark et al. (US 20180189974 A1), hereinafter referred to as Clark.

Regarding claim 1, Liang discloses a method (Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”) comprising: 
receiving, by a processor (Section 1, “handheld devices equipped with cameras, such as PDAs and cell phones, are ideal platforms for mobile OCR applications”, mobile cellphone has processor), an image of an object (Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) captured by a camera (Abstract, “handheld cameras”), wherein the captured image (Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) includes a surface that corresponds to a non-planar surface of the object and/or that has distortions (Abstract, “camera-captured documents may suffer from distortions caused by a nonplanar document shape and perspective projection”) introduced by a three-dimensional (3D) perspective of the camera relative to the object during image capture (Section 4.2, “curvature of the document shape and its pose relative to the camera”);
determining, by the processor (Section 1, “handheld devices equipped with cameras, such as PDAs and cell phones, are ideal platforms for mobile OCR applications”, mobile cellphone has processor), pose parameters from the captured image (Fig. 3, Section 3.2.1, “We formulate the major texture flow detection as a local skew detection problem. In document image analysis, skew detection finds the orientation of text lines with respect to the horizontal axis. We apply the classic projection profile analysis to detect the skew angle in a small neighborhood. Then, we use a relaxation labeling approach to smooth out possible errors and obtain a coherent result.” and “The rest of this paper describes our method of extracting a 3D structure from texture flows and using it to process general curved document images”, in the Specification para. 0020, pose parameters is defined as “The pose parameters 220 specify the pose of the object 204 within the captured image 202. The pose of the object 204 is the 3D spatial position and orientation of the object 204 relative to a reference camera.”, Liang teaches extracting the 3D structure from texture flows in the captured image as well as the orientation of the text, Liang also teaches obtaining Vh and Vv or the vanishing points of the major and minor texture flow tangent lines in Section 3.3.1); 
determining, by the processor (Section 1, “handheld devices equipped with cameras, such as PDAs and cell phones, are ideal platforms for mobile OCR applications”, mobile cellphone has processor), a plurality of image space 2D coordinates (Section 3.3.2, Metric Rectification, “Suppose that we set up a 2D coordinate system in the document plane so that the x-axis is aligned with Vh, whereas the y-axis is (must be) aligned with Vv. Every point on the document plane thus has a 2D coordinate.”) that planarize and/or undistort the surface of the captured image (Section 3.3.2, the homogeneous transformation H in equation 5 uses the 2D coordinates, “The inverse of H maps every point in the image plane back to the frontal-flat view of the document page and is called the rectification matrix”, Section 1, “we present a rectification framework that extracts the 3D document shape from a single 2D image and performs a shape-based geometric rectification to restore the frontal-flat view of the document”, rectification planarizes or undistorts the surface of the image as seen in Fig. 2), based on the pose parameters (Fig. 3, Section 3.2.1, “We formulate the major texture flow detection as a local skew detection problem. In document image analysis, skew detection finds the orientation of text lines with respect to the horizontal axis. We apply the classic projection profile analysis to detect the skew angle in a small neighborhood. Then, we use a relaxation labeling approach to smooth out possible errors and obtain a coherent result.” and “The rest of this paper describes our method of extracting a 3D structure from texture flows and using it to process general curved document images”, in the Specification para. 0020, pose parameters is defined as “The pose parameters 220 specify the pose of the object 204 within the captured image 202. The pose of the object 204 is the 3D spatial position and orientation of the object 204 relative to a reference camera.”, Liang teaches extracting the 3D structure from texture flows in the captured image as well as the orientation of the text), a parameterized surface model definition (Section 1, “Under these assumptions, we show that we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view.”, Section 2.4, “we use the developable-surface model to constrain the 3D shape estimation process.”), and camera properties (Section 3.4.43, “As for the focal length, we select a set of feasible values based on the physical lens constraint and perform the above process for each value. The f that results in minimum F is chosen as the best initial focal length”); and
interpolating (Section 3.3.2, equation 5), by the processor (Section 1, “handheld devices equipped with cameras, such as PDAs and cell phones, are ideal platforms for mobile OCR applications”, mobile cellphone has processor) using the image space 2D coordinates (Section 3.3.2, Metric Rectification, “Suppose that we set up a 2D coordinate system in the document plane so that the x-axis is aligned with Vh, whereas the y-axis is (must be) aligned with Vv. Every point on the document plane thus has a 2D coordinate.”) to produce a texture image corresponding to the captured image (Section 3.3, Rectification of Planar Documents and Section 3.4, Rectification of Curved Documents, Section 3.3.2, the 2D coordinates are used for the homogeneous transformation H which is seen in equation 5, “The inverse of H maps every point in the image plane back to the frontal-flat view of the document page and is called the rectification matrix.”), wherein the texture image includes a surface that corresponds to the surface of the captured image but that is planar and/or undistorted (Fig. 1, the result image of the image rectification is planar/undistorted compared to the original image which is skewed and curved).
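
For illustration only, the interpolating limitation of claim 1 (sampling the captured image at the determined image space 2D coordinates to produce a texture image) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Liang, or Clark: the image is assumed to be a grayscale list of rows, bilinear interpolation is assumed as the interpolation scheme, and the names bilinear_sample and build_texture are hypothetical.

```python
from math import floor


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a non-integer (x, y).

    Assumption: bilinear interpolation; the record does not specify a scheme.
    """
    height, width = len(image), len(image[0])
    # Clamp so the 2x2 neighbourhood stays inside the image bounds.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(floor(x)), int(floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the top and bottom rows, then vertically.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def build_texture(image, coords):
    """coords[r][c] is the image space (x, y) for texture pixel (r, c)."""
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in coords]
```

In this sketch, each texture pixel takes its value from the captured image at the corresponding image space coordinate, so the output is flat even when the coordinates trace a curved or perspective-distorted surface.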

Liang does not explicitly disclose determining, by the processor, pose parameters from the captured image by using a machine learning model.
	However, Clark discloses determining (Fig. 3, step 104a), by the processor (para. 0048, “FIG. 6A depicts an exemplary implementation of the present invention in a tablet computer-based AR application” which should have a processor), pose parameters (Fig. 3, step 104a, contextually refined image device pose estimate) from the captured image (Fig. 1, step 101, input: single 2d image) by using a machine learning model (para. 0042, “with the exception that an Object Segmentation & Classification Machine Learning algorithm, 107, which has been specifically trained to recognize a floor object in a 2D scene image, is executed concurrently with the standard scene depth-sensing CNN, 102a, and the Imaging Device AFOV determination steps, 103a. In the exemplary flow diagram depicted, a trained CNN is the Machine Learning algorithm applied in step 107. The outputs of steps 102a, 103s, and 107 and used in step 104a to calculate a contextually refined pose estimate for the imaging device”, the CNN is used to calculate a contextually refined pose estimate).
	Liang and Clark are both considered to be analogous to the claimed invention because they are in the same field of pose determination of an object within an image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Liang to incorporate the teachings of Clark of determining, by the processor, pose parameters from the captured image by using a machine learning model. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that adding a CNN in the loop serves as a significantly faster and more accurate proxy to human 2D image depth and object segmentation/classification analysis (to include scene space boundaries such as floor and wall planes) as well as non-Machine Learning Computer Vision techniques (Clark, para. 0016).

Regarding claim 4, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein the machine learning model comprises a neural network (Clark, Fig. 3, CNN is a neural network).

Regarding claim 5, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein determining the image space 2D coordinates (Liang, Section 3.3.2, Metric Rectification, “Suppose that we set up a 2D coordinate system in the document plane so that the x-axis is aligned with Vh, whereas the y-axis is (must be) aligned with Vv. Every point on the document plane thus has a 2D coordinate.”) comprises: 
constructing a pose matrix (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used) from the pose parameters (Liang, Fig. 3, Section 3.2.1, “We formulate the major texture flow detection as a local skew detection problem. In document image analysis, skew detection finds the orientation of text lines with respect to the horizontal axis. We apply the classic projection profile analysis to detect the skew angle in a small neighborhood. Then, we use a relaxation labeling approach to smooth out possible errors and obtain a coherent result.” and “The rest of this paper describes our method of extracting a 3D structure from texture flows and using it to process general curved document images”, in the Specification para. 0020, pose parameters is defined as “The pose parameters 220 specify the pose of the object 204 within the captured image 202. The pose of the object 204 is the 3D spatial position and orientation of the object 204 relative to a reference camera.”, Liang teaches extracting the 3D structure from texture flows in the captured image as well as the orientation of the text, Liang also teaches obtaining Vh and Vv or the vanishing points of the major and minor texture flow tangent lines in Section 3.3.1); 
determining uv minima and maxima (Liang, Section 3.2.1, “we detect the local text line and vertical character stroke directions, which we define as the major and minor texture flows, respectively”) based on the pose matrix (Liang, Fig. 3, Section 3.2.1, “We formulate the major texture flow detection as a local skew detection problem. In document image analysis, skew detection finds the orientation of text lines with respect to the horizontal axis. We apply the classic projection profile analysis to detect the skew angle in a small neighborhood. Then, we use a relaxation labeling approach to smooth out possible errors and obtain a coherent result.” and “The rest of this paper describes our method of extracting a 3D structure from texture flows and using it to process general curved document images”, in the Specification para. 0020, pose parameters is defined as “The pose parameters 220 specify the pose of the object 204 within the captured image 202. The pose of the object 204 is the 3D spatial position and orientation of the object 204 relative to a reference camera.”, Liang teaches extracting the 3D structure from texture flows in the captured image as well as the orientation of the text, Liang also teaches obtaining Vh and Vv or the vanishing points of the major and minor texture flow tangent lines in Section 3.3.1), the parameterized surface model definition (Liang, Section 1, “Under these assumptions, we show that we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view.”, Section 2.4, “we use the developable-surface model to constrain the 3D shape estimation process.”), and camera properties (Liang, Section 3.4.43, “As for the focal length, we select a set of feasible values based on the physical lens constraint and perform the above process for each value. The f that results in minimum F is chosen as the best initial focal length”); and
generating a uv texture sample grid (Liang, Fig. 3, grid of the texture flows) based on the uv minima and maxima (Liang, Section 3.2.1, “we detect the local text line and vertical character stroke directions, which we define as the major and minor texture flows, respectively”), wherein the uv texture sample grid comprises a plurality of points (Liang, Fig. 3, the grid comprises a plurality of points).
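
For illustration only, the limitation of generating a uv texture sample grid from uv minima and maxima may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from the application or the cited references: evenly spaced samples are assumed, and the function name uv_sample_grid and its parameters cols and rows are hypothetical.

```python
def uv_sample_grid(u_min, u_max, v_min, v_max, cols, rows):
    """Return a rows x cols grid of evenly spaced (u, v) points.

    Assumption: uniform spacing between the given minima and maxima.
    """
    if cols < 2 or rows < 2:
        raise ValueError("need at least 2 samples along each axis")
    du = (u_max - u_min) / (cols - 1)  # step between neighbouring u samples
    dv = (v_max - v_min) / (rows - 1)  # step between neighbouring v samples
    return [
        [(u_min + c * du, v_min + r * dv) for c in range(cols)]
        for r in range(rows)
    ]
```

Each grid entry is one of the plurality of points recited in the claim; the grid resolution would in practice set the resolution of the resulting texture image.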

Regarding claim 6, the combination of Liang in view of Clark discloses the method of claim 5 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein constructing the pose matrix  (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used) comprises: 
constructing an initial pose matrix  (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used) from the pose parameters (Liang, Fig. 3, Section 3.2.1, “We formulate the major texture flow detection as a local skew detection problem. In document image analysis, skew detection finds the orientation of text lines with respect to the horizontal axis. We apply the classic projection profile analysis to detect the skew angle in a small neighborhood. Then, we use a relaxation labeling approach to smooth out possible errors and obtain a coherent result.” and “The rest of this paper describes our method of extracting a 3D structure from texture flows and using it to process general curved document images”, in the Specification para. 0020, pose parameters is defined as “The pose parameters 220 specify the pose of the object 204 within the captured image 202. The pose of the object 204 is the 3D spatial position and orientation of the object 204 relative to a reference camera.”, Liang teaches extracting the 3D structure from texture flows in the captured image as well as the orientation of the text, Liang also teaches obtaining Vh and Vv or the vanishing points of the major and minor texture flow tangent lines in Section 3.3.1); and
inverting the initial pose matrix (Liang, Section 3.3.2, Liang teaches an initial pose matrix in Section 3.3.2, equation 5) to produce the pose matrix (Liang, Section 3.3.2, “The inverse of H maps every point in the image plane back to the frontal-flat view of the document page and is called the rectification matrix”).

Regarding claim 7, the combination of Liang in view of Clark discloses the method of claim 5 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein determining the image space 2D coordinates (Liang, Section 3.3.2, Metric Rectification, “Suppose that we set up a 2D coordinate system in the document plane so that the x-axis is aligned with Vh, whereas the y-axis is (must be) aligned with Vv. Every point on the document plane thus has a 2D coordinate.”) further comprises:
evaluating the parameterized surface model definition (Liang teaches this in Section 2.4, developable surface model which constrains the 3D shape estimation process, and Section 3.4.1, “Developable surfaces represent particular cases of a more general class of surfaces called ruled surfaces. Ruled surfaces are envelopes of a one-parameter family of straight lines (called rulings) in 3D space, and each ruling lies entirely on the underlying surface. In other words, a ruled surface is the locus of a moving line in 3D space”, Fig. 7) at each point of the uv texture sample grid (Liang, Fig. 3, texture flow grid) to generate a corresponding plurality of model space 3D coordinates (Liang, Fig. 11 and Fig. 12 show the corresponding 3D points from the 2D coordinates, Section 3.4.2, “we distinguish 2D texture flows and their 3D counterparts”);
transforming the model space 3D coordinates using the pose matrix (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used) to produce a corresponding plurality of camera space 3D coordinates (Liang, Section 3.4.2, Projected Ruling Estimation, “We call the projections of 3D rulings in the image projected rulings or 2D rulings. Similarly, we distinguish 2D texture flows and their 3D counterparts. In this section, we describe our method of detecting 2D rulings using 2D texture flow fields in document images.”, Fig. 8); and
projecting the camera space 3D coordinates using the camera properties to produce the image space 2D coordinates (Liang, Section 3.4.2, “We call the projections of 3D rulings in the image projected rulings or 2D rulings”).
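
For illustration only, the three steps recited in claim 7 (evaluating the surface model at a uv point, transforming the model space 3D coordinate by the pose matrix, and projecting with the camera properties) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the application or of Liang: a planar surface model, a 4x4 rigid pose matrix mapping model space to camera space, and a pinhole camera with focal length f and principal point (cx, cy) are all assumed, and every function name is hypothetical.

```python
def plane_model(u, v):
    """Hypothetical parameterized surface: the plane z = 0 in model space."""
    return (u, v, 0.0)


def transform(pose, point):
    """Apply a 4x4 pose matrix (model space to camera space) to a 3D point."""
    x, y, z = point
    return tuple(
        pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3]
        for i in range(3)
    )


def project(point, f, cx, cy):
    """Pinhole projection of a camera space point to image space (pixels)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / z + cx, f * y / z + cy)


def uv_to_image(u, v, pose, f, cx, cy, model=plane_model):
    """Chain the three steps: evaluate model, transform, project."""
    return project(transform(pose, model(u, v)), f, cx, cy)
```

As a usage example under the same assumptions, with a pose that translates the plane 2 units along the camera axis, uv_to_image(1.0, 0.5, pose, 100.0, 50.0, 50.0) maps the model point (1, 0.5, 0) to the camera point (1, 0.5, 2) and then to the pixel (100.0, 75.0).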

Regarding claim 8, the combination of Liang in view of Clark and in further view of Liang discloses the method of claim 7 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein transforming the model space 3D coordinates using the pose matrix (Liang, Section 3.4.2, Projected Ruling Estimation, “We call the projections of 3D rulings in the image projected rulings or 2D rulings. Similarly, we distinguish 2D texture flows and their 3D counterparts. In this section, we describe our method of detecting 2D rulings using 2D texture flow fields in document images.”, Fig. 8) comprises: 
transforming the model space 3D coordinates (Liang, Section 3.4.2, Projected Ruling Estimation, “We call the projections of 3D rulings in the image projected rulings or 2D rulings. Similarly, we distinguish 2D texture flows and their 3D counterparts. In this section, we describe our method of detecting 2D rulings using 2D texture flow fields in document images.”, Fig. 8) using an inverted pose matrix of the pose matrix (Liang teaches inverting the pose matrix to map every point of the matrix back to the frontal-flat view in Section 3.3.2).

Regarding claim 9, the combination of Liang in view of Clark discloses the method of claim 5 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein the surface corresponds to a planar surface of the object (Liang, Section 3.3, Rectification of Planar Documents, Fig. 3.a is a planar page).

Regarding claim 10, the combination of Liang in view of Clark discloses the method of claim 9 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein determining the uv minima and maxima (Liang, Section 3.2.1, “we detect the local text line and vertical character stroke directions, which we define as the major and minor texture flows, respectively”) based on the pose matrix (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used), the parameterized surface model definition (Liang, Section 1, “Under these assumptions, we show that we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view.”, Section 2.4, “we use the developable-surface model to constrain the 3D shape estimation process.”), and the camera properties (Liang, Section 3.4.43, “As for the focal length, we select a set of feasible values based on the physical lens constraint and perform the above process for each value. The f that results in minimum F is chosen as the best initial focal length”) comprises:
generating a plurality of camera space 3D frustum rays (Liang, Fig. 8 and Fig. 11 show 3D frustum rays) using the camera properties (Liang, Section 3.3.2, “Consider an arbitrary point (x0, y0) in the image plane. In the camera’s 3D coordinate system, its position is (x0, y0, f)”, f is the camera focal length);
transforming the 3D frustum rays using the pose matrix to produce a corresponding plurality of model space 3D frustum rays (Liang, Section 3.3.2, “homogeneous transformation from document plane to image plane is the concatenation”,  equation 5); 
determining intersections of the model space 3D frustum rays with the parameterized surface model definition  (Liang, Section 1, “we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view”, the model and 3D shape information are used with each other) to produce a corresponding plurality of uv parameter space intersection points (Liang, Section 5.2, parameter selection, Fig. 12 shows intersection points); and 
determining minima and maxima of the uv parameter space intersection points (Liang, Section 3.3.1, “we assume that the principal point rests at the image center. This is usually true unless the image is cropped. Under this assumption, suppose that the two vanishing points are vh = (xh, yh)^T and vv = (xv, yv)^T. Then, the 3D directions of the major and minor texture flows in the camera coordinate system are given by equation 1 where f is the focal length”, Fig. 8 shows the texture flow vectors as well as the intersection points).
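
For illustration only, the steps recited in claim 10 for a planar surface (generating camera space frustum rays, transforming them into model space, intersecting them with the surface model, and taking minima and maxima of the intersection points) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the application or of Liang: rays through the four image corners are assumed to define the frustum, the pose is assumed to be a rigid transform (rotation R and translation t with p_camera = R p_model + t), the surface is assumed to be the model space plane z = 0 with (u, v) = (x, y), and all function names are hypothetical.

```python
def frustum_rays(width, height, f, cx, cy):
    """Camera space ray directions through the four image corners."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [((x - cx) / f, (y - cy) / f, 1.0) for x, y in corners]


def camera_to_model(rotation, translation, direction):
    """Inverse rigid transform of a ray that starts at the camera centre.

    Origin maps to -R^T t and the direction maps to R^T d.
    """
    rt = [[rotation[j][i] for j in range(3)] for i in range(3)]  # R transposed
    origin = tuple(
        -sum(rt[i][k] * translation[k] for k in range(3)) for i in range(3)
    )
    model_dir = tuple(
        sum(rt[i][k] * direction[k] for k in range(3)) for i in range(3)
    )
    return origin, model_dir


def intersect_plane_z0(origin, direction):
    """Intersect a ray with the plane z = 0; return (u, v) or None."""
    if abs(direction[2]) < 1e-12:
        return None  # ray is parallel to the plane
    s = -origin[2] / direction[2]
    if s <= 0:
        return None  # intersection lies behind the ray origin
    return (origin[0] + s * direction[0], origin[1] + s * direction[1])


def uv_extents(width, height, f, cx, cy, rotation, translation):
    """Return (u_min, u_max, v_min, v_max) over the frustum intersections."""
    hits = []
    for d in frustum_rays(width, height, f, cx, cy):
        origin, model_dir = camera_to_model(rotation, translation, d)
        hit = intersect_plane_z0(origin, model_dir)
        if hit is not None:
            hits.append(hit)
    if not hits:
        raise ValueError("no frustum ray intersects the plane")
    us = [h[0] for h in hits]
    vs = [h[1] for h in hits]
    return (min(us), max(us), min(vs), max(vs))
```

Under these assumptions, the returned extents bound the portion of the plane visible in the image, which is the range a uv texture sample grid would need to cover.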
Regarding claim 11, the combination of Liang in view of Clark in further view of Liang discloses the method of claim 10 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein transforming the 3D frustum rays using the pose matrix (Liang, Section 3.4.2, Projected Ruling Estimation, “We call the projections of 3D rulings in the image projected rulings or 2D rulings. Similarly, we distinguish 2D texture flows and their 3D counterparts. In this section, we describe our method of detecting 2D rulings using 2D texture flow fields in document images.”, Fig. 8) comprises: 
transforming the 3D frustum rays (Liang, Section 3.3.2, “homogeneous transformation from document plane to image plane is the concatenation”,  equation 5) using an inverted pose matrix of the pose matrix (Liang teaches inverting the pose matrix to map every point of the matrix back to the frontal-flat view in Section 3.3.2).

Regarding claim 12, the combination of Liang in view of Clark discloses the method of claim 5 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein the surface corresponds to a cylindrical surface of the object (Section 3.4, Rectification of Curved Documents, Fig. 3.b, open book with cylindrical shape).

Regarding claim 13, the combination of Liang in view of Clark in view of Liang discloses the method of claim 12 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein determining the uv minima and maxima (Liang, Section 3.2.1, “we detect the local text line and vertical character stroke directions, which we define as the major and minor texture flows, respectively”) based on the pose matrix (Liang, Section 3.3.2, Vv and Vh are used to construct the matrix or homogeneous transformation H or equation 5, the 2D coordinates that was used to construct the pose matrix is from the 3D coordinate that was extracted by Liang in Section 3.2.1, the vanishing points are also used), the parameterized surface model definition (Liang, Section 1, “Under these assumptions, we show that we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view.”, Section 2.4, “we use the developable-surface model to constrain the 3D shape estimation process.”), and the camera properties (Liang, Section 3.4.43, “As for the focal length, we select a set of feasible values based on the physical lens constraint and perform the above process for each value. The f that results in minimum F is chosen as the best initial focal length”) comprises:
generating a plurality of camera space 3D frustum rays (Liang, Fig. 8 and Fig. 11 show 3D frustum rays) using the camera properties (Liang, Section 3.3.2, “Consider an arbitrary point (x0, y0) in the image plane. In the camera’s 3D coordinate system, its position is (x0, y0, f)”, f is the camera focal length);
transforming the 3D frustum rays using the pose matrix to produce a corresponding plurality of model space 3D frustum rays (Liang, Section 3.3.2, “homogeneous transformation from document plane to image plane is the concatenation”,  equation 5); 
determining a plurality of model space planes using the 3D frustum rays (Liang, Fig. 8, Fig. 11, and Fig. 12);
determining intersections of the model space planes with the parameterized surface model definition to produce v values (Liang, Section 1, “we can constrain the physical page by a developable surface model, obtain a planar-strip approximation of the surface using texture flow data extracted from the text in the image, and use the 3D shape information to restore the frontal-flat document view”; the model and 3D shape information are used with each other, and v or y values are produced in Section 3.3.2, equation 8); 
selecting u values of the parameterized surface model definition using the pose matrix (Liang, Section 5.2, parameter selection); and 
determining minima and maxima of the produced v values and the selected u values (Liang, Section 3.3.1, “we assume that the principal point rests at the image center. This is usually true unless the image is cropped. Under this assumption, suppose that the two vanishing points are vh = (xh, yh)^T and vv = (xv, yv)^T. Then, the 3D directions of the major and minor texture flows in the camera coordinate system are given by [equation 1] where f is the focal length”; Fig. 8 shows the texture flow vectors as well as the intersection points).
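
For illustration only, the first two steps recited in claim 13 (generating camera space frustum rays from the camera properties and carrying them into model space with a pose matrix) can be sketched as follows. This sketch is not taken from Liang, Clark, or the application. It assumes a pinhole camera with the principal point at the image center and a rigid pose of the form x_cam = R x_model + t; the function names `corner_rays`, `invert_rigid`, and `to_model_space` are hypothetical.

```python
import math

def corner_rays(width, height, focal):
    """Unit direction vectors through the four image corners, in camera space.

    Assumption: pinhole camera, principal point at the image center,
    focal length expressed in pixels.
    """
    cx, cy = width / 2.0, height / 2.0
    rays = []
    for px, py in ((0, 0), (width, 0), (width, height), (0, height)):
        d = (px - cx, py - cy, focal)
        norm = math.sqrt(sum(c * c for c in d))
        rays.append(tuple(c / norm for c in d))
    return rays

def invert_rigid(R, t):
    """Invert a rigid pose x_cam = R x_model + t, returning (R^T, -R^T t)."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3))
    return Rt, t_inv

def to_model_space(rays, R, t):
    """Map camera space ray directions into model space.

    Returns the camera eye point in model space and the rotated directions.
    """
    R_inv, eye = invert_rigid(R, t)
    dirs = [tuple(sum(R_inv[i][k] * d[k] for k in range(3)) for i in range(3))
            for d in rays]
    return eye, dirs
```

With an identity rotation and t = (0, 0, 5), the eye point lands at (0, 0, -5) in model space and the ray directions are unchanged.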

Regarding claim 14, the combination of Liang in view of Clark in further view of Liang discloses the method of claim 13 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein transforming the 3D frustum rays (Liang, Section 3.4.2, Projected Ruling Estimation, “We call the projections of 3D rulings in the image projected rulings or 2D rulings. Similarly, we distinguish 2D texture flows and their 3D counterparts. In this section, we describe our method of detecting 2D rulings using 2D texture flow fields in document images.”, Fig. 8) comprises: 
transforming the 3D frustum rays (Liang, Section 3.3.2, “homogeneous transformation from document plane to image plane is the concatenation”,  equation 5) using an inverted pose matrix of the pose matrix (Liang teaches inverting the pose matrix to map every point of the matrix back to the frontal-flat view in Section 3.3.2).

Regarding claim 15, the combination of Liang in view of Clark in further view of Liang discloses the method of claim 13 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein selecting the u values of the parameterized surface model definition using the pose matrix (Liang, Section 5.2, parameter selection) comprises: 
selecting the u values of the parameterized surface model definition (Liang, Section 5.2, parameter selection) using an inverted pose matrix of the pose matrix (Liang teaches inverting the pose matrix to map every point of the matrix back to the frontal-flat view in Section 3.3.2).

Regarding claim 16, the combination of Liang in view of Clark discloses the method of claim 13 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein selecting the u values of the parameterized surface model definition (Liang, Section 5.2, parameter selection) comprises: 
identifying a range of the u values in which a normal vector to the cylindrical surface has a negative dot product with a vector extending from an eye point of the camera to a corresponding location of the cylindrical surface (Liang, Section 3.3.2, the matrix or vector K is multiplied by the 3D point P in the camera’s coordinate system, which results in the vector or matrix H. “The inverse of H maps every point in the image plane back to the frontal-flat view of the document page and is called the rectification matrix”).

Regarding claim 17, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), further comprising: 
performing an action (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”) in relation to the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) based on the texture image corresponding to the captured image (Liang, Fig. 15, rectification results from different images).

Regarding claim 18, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein performing the action (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”) in relation to the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) based on the texture image corresponding to the captured image (Liang, Fig. 15, rectification results from different images) comprises: 
performing optical character recognition (OCR) on the texture image (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”).

Regarding claim 21, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein performing the action (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”) in relation to the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) based on the texture image corresponding to the captured image (Liang, Fig. 15, rectification results from different images) comprises: 
identifying the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) by performing image processing (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”; OCR is the image processing that was performed) on the texture image (Liang, Fig. 15, rectification results from different images).

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Liang in view of Clark and in further view of Greenhow et al. (“Adaptive image downsampling preprocessor for artificial neural networks”, July 2017), hereinafter referred to as Greenhow.

Regarding claim 2, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”).

The combination of Liang in view of Clark does not explicitly disclose performing, by the processor, preprocessing on the captured image, wherein the preprocessed captured image is provided as input to the machine learning model and the pose parameters is received as output from the machine learning model.
	However, Greenhow teaches performing, by the processor (Liang teaches a mobile cellphone which has a processor), preprocessing on the captured image (Greenhow, Fig. 6, preprocessor for input image), wherein the preprocessed captured image (Greenhow, Fig. 6, preprocessor for input image) is provided as input to the machine learning model (Greenhow, Fig. 6, after the image goes through the preprocessor, it goes to a neural network) and the pose parameters is received as output from the machine learning model (Clark teaches, in para. 0042, “with the exception that an Object Segmentation & Classification Machine Learning algorithm, 107, which has been specifically trained to recognize a floor object in a 2D scene image, is executed concurrently with the standard scene depth-sensing CNN, 102a, and the Imaging Device AFOV determination steps, 103a. In the exemplary flow diagram depicted, a trained CNN is the Machine Learning algorithm applied in step 107. The outputs of steps 102a, 103s, and 107 and used in step 104a to calculate a contextually refined pose estimate for the imaging device”; the CNN is used to calculate a contextually refined pose estimate).
	Greenhow is considered to be analogous to the claimed invention because it is in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Liang in view of Clark to incorporate the teachings of Greenhow of performing, by the processor, preprocessing on the captured image, wherein the preprocessed captured image is provided as input to the machine learning model. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that a preprocessing technique can, by itself, contribute to improving the accuracy of an ANN-based system (Greenhow, Section V.C).

Regarding claim 3, the combination of Liang in view of Clark and in further view of Greenhow discloses the method of claim 2 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein performing the preprocessing on the captured image (Greenhow, Fig. 6, preprocessor for input image) comprises: 
downscaling a resolution of the captured image (Greenhow, Section III.C, “A Regional Downsampling Algorithm (RDA) is a image processing algorithm designed to reduce the dimensionality of an image using a sequence of regions”).
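
For illustration only, downscaling the resolution of an image before it is passed to a machine learning model can be sketched as follows. This is a plain block-averaging reduction and is not Greenhow's Regional Downsampling Algorithm. It assumes a grayscale image stored as a list of equal-length rows; the function name `downscale` is hypothetical.

```python
def downscale(image, factor):
    """Average non-overlapping factor x factor blocks of a 2D grayscale image.

    Assumption: `image` is a list of equal-length rows of numbers. Rows and
    columns that do not fill a complete block are dropped.
    """
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

A 4 x 4 image reduced with factor 2 yields a 2 x 2 image in which each pixel is the mean of the corresponding 2 x 2 block.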


Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liang in view of Clark and in further view of Rodriguez et al. (WO2014063157A2), hereinafter referred to as Rodriguez.

Regarding claim 19, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein performing the action (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”) in relation to the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) based on the texture image corresponding to the captured image (Liang, Fig. 15, rectification results from different images).

The combination of Liang in view of Clark does not explicitly disclose detecting a watermark within the texture image.
	However, Rodriguez teaches detecting a watermark within the texture image (page 7 line 27-28, “the object-identifying information can be a machine-readable identifier, such as a barcode or a steganographic digital watermark”; Rodriguez also teaches perspective distortion of the surface in Fig. 21 and a cylindrical surface in Fig. 22, and Fig. 24 shows the reoriented label; Abstract, “Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification”).
	Rodriguez is considered to be analogous to the claimed invention because it is in the same field of barcode or watermark identification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Liang in view of Clark to incorporate the teachings of Rodriguez of detecting a watermark within the texture image. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been because product packaging is digitally watermarked over most of its extent to facilitate high-throughput item identification at retail checkouts (Rodriguez, Abstract).

Regarding claim 20, the combination of Liang in view of Clark discloses the method of claim 1 (Liang, Section 3.2.1, “our method of extracting a 3D structure from texture flows and using it to process general curved document images”), wherein performing the action (Liang, Section 4.2, “we use the OCR performance to measure the image quality from an application point of view. That is, we apply OCR to the original flat document, the synthetic curved document, and the rectified document”) in relation to the object (Liang, Fig. 4, the object is the text on the image) within the captured image (Liang, Section 2.3, “we assume a single image as the input”, Abstract, “camera-captured documents”) based on the texture image corresponding to the captured image (Liang, Fig. 15, rectification results from different images).

The combination of Liang in view of Clark does not explicitly disclose decoding a barcode within the texture image.
	However, Rodriguez teaches decoding a barcode within the texture image (page 7 line 27-28, “the object-identifying information can be a machine-readable identifier, such as a barcode or a steganographic digital watermark”; Rodriguez also teaches perspective distortion of the surface in Fig. 21 and a cylindrical surface in Fig. 22, and Fig. 24 shows the reoriented label; Abstract, “Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification”).
	Rodriguez is considered to be analogous to the claimed invention because it is in the same field of barcode or watermark identification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Liang in view of Clark to incorporate the teachings of Rodriguez of decoding a barcode within the texture image. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that widespread use of barcodes has greatly simplified supermarket checkout (Rodriguez, page 1 line 24).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENISE G ALFONSO/Examiner, Art Unit 2663                     

/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663