DETAILED ACTION

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/16/2021 has been entered.

Response to Arguments
Applicant’s arguments with respect to claims 1, 9, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 5, 9, 10, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) and in further view of  Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”).
Regarding claim 1, Jacobs discloses, a method, comprising: receiving, by a computer system, a grapheme image; (See Jacobs, “At 1702, a bitmap of character image is input.  In this case, the bitmap is the 29×29 pixel receptive field.”)
computing, by a neural network, a feature vector representing the grapheme image in a space of image features, wherein the convolutional neural network includes multiple 
and computing a confidence vector associated with the grapheme image, (See Jacobs ¶74, at 1704, “the network outputs a list of probabilities that indicates what characters are likely to be represented by the image.” As shown in Fig. 16, there are 76 output units, which would produce an output vector of length 76.)
Jacobs discloses a neural network with multiple convolutional layers but fails to disclose additional alternating pooling layers.
However, Wshah discloses, wherein the convolutional neural network includes multiple alternating sets of convolutional layers and pooling layers (See Wshah ¶32, “The convolutional neural network (CNN), according to systems and methods herein, is illustrated in FIGS. 3A-3F, and is generally referred to as 200.  The CNN 200 includes a plurality of layers 203.” Further see Wshah ¶33, “FIG. 4 shows an exemplary convolutional layer (or module), according to systems and methods herein.  1×1 convolutions are used to compute reductions before using 3×3 and 5×5 convolutions.  In addition, a pooling path is added in each module.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the CNN architecture as disclosed by Wshah for the CNN architecture used for OCR as suggested by Jacobs using known engineering techniques, with a reasonable expectation of success. The motivation for doing so, as disclosed by Wshah in ¶30, is that “The GoogLeNet achieved the best results for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).  Comparing to other CNN architectures, GoogLeNet is a 
Jacobs and Wshah disclose using a convolutional neural network to classify characters but they fail to disclose that a neural network can be used to determine distances between feature vectors in order to determine a probability vector.
However, Liu[1] discloses, wherein each element of the confidence vector reflects a distance, in the space of image features, between the feature vector and a center of a class of a set of classes, wherein the class is identified by an index of the element of the confidence vector.  (See Liu[1] ¶73, “As an example of a font probability vector, if the first set of text images 202 includes 600 fonts, the higher neural network layers 214 outputs a 600-dimensional font probability vector with entries ranging between zero and one (i.e., [0-1]).  Each dimensional in the font probability vector provides a correspondence (e.g., matching probability based on vector space distance) between the feature vectors of an input font and the feature vectors of each font in the first set of text images 202.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the determination of a probability vector based on distances between feature vectors, as suggested by Liu[1], in Jacobs and Wshah’s classification of a character with a neural network. This can be done using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to accurately determine the probability of each class in a set of classes.

However, Liu[2] discloses, computing a confidence vector associated with the grapheme image, wherein each element of the confidence vector is produced by a monotonically decreasing function of a distance, in the space of image features, between the feature vector and a center of a class of a set of classes. (See Liu[2] p. 3, section 3.2, where the L-Softmax loss function (eq. 4) uses the function in (eq. 5), which is a monotonically decreasing function as shown in Fig. 3. Note also that a regular Softmax loss function is also monotonically decreasing, as shown in Fig. 3.
In the loss function, the exponential term is the same as the confidence value. As shown in Fig. 4, W is a line located at the center of the features, and the angle is the distance between W and the features.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the L-Softmax loss and its associated confidence function, which are monotonically decreasing as suggested by Liu[2], for Liu[1]’s Softmax loss function using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is disclosed by Liu[2] on p. 4, section 3.4 (Discussion): “The L-Softmax loss defines a relatively difficult learning objective with adjustable margin (difficulty). A difficult learning objective can effectively avoid overfitting and take full advantage of the strong learning ability from deep and wide architectures. The L-Softmax loss can be easily used as a drop-in replacement for standard loss.”
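For illustration only, the claimed relationship between distance and confidence can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: the Euclidean distance, the exponential form exp(-scale * d), the function name confidence_vector, and the example class centers are all hypothetical choices. Liu[2], as discussed above, treats an angle as the distance.

```python
import math

def confidence_vector(feature, centers, scale=1.0):
    """Return one confidence per class; element i corresponds to class index i.

    Assumption: Euclidean distance and exp(-scale * d) are illustrative
    choices only; any monotonically decreasing function would fit the sketch.
    """
    confidences = []
    for center in centers:
        # Distance, in feature space, between the feature vector and this class center.
        distance = math.sqrt(sum((f - c) ** 2 for f, c in zip(feature, center)))
        # exp(-scale * d) decreases monotonically as the distance d grows.
        confidences.append(math.exp(-scale * distance))
    return confidences

# Hypothetical 2-D feature space with three class centers.
centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
conf = confidence_vector((0.0, 0.0), centers)
```

In this sketch the feature vector coincides with the first class center, so the first element of conf is the largest and the values shrink as the centers move farther away.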

Regarding claim 5, Jacobs, Wshah, Liu[1], and Liu[2] disclose, the method of claim 1, wherein each class of the set of classes corresponds to a character of an alphabet. (See Jacobs ¶44, “The present invention facilitates the capture and accurate optical character recognition (OCR) of symbols and text when using low resolution symbol and/or document images.”)

Regarding claim 9, Jacobs, Wshah, Liu[1], and Liu[2] disclose, a system, comprising: a memory; a processor, coupled to the memory, the processor configured to: (See Liu[1] ¶143, “For example, the components 506-524 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device or server device.”)
compute, by a neural network, a feature vector representing the grapheme image in a space of image features, wherein the convolutional neural network includes multiple alternating sets of convolutional layers and pooling layers producing a plurality of feature maps, wherein each feature map of the plurality of feature maps corresponds to a particular image feature of a plurality of image features; compute a confidence vector associated with the grapheme image, wherein each element of the confidence vector produced by a monotonically decreasing function of a distance, in the space of image features, between the feature vector and a center of a class of a set of classes, wherein the class is identified by an index of the element of the confidence vector; identify an element having a maximum value among elements of the confidence vector; and 

Regarding claim 10, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the system of claim 9, wherein each class of the set of classes corresponds to a character of an alphabet. (See the rejection of claim 5 as it is equally applicable for claim 10 as well.)

Regarding claim 15, Jacobs, Wshah, Liu[1], and Liu[2] disclose, a computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to: (See Liu[1] ¶143, “For example, the components 506-524 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device or server device.”)
receive a grapheme image; compute, by a neural network, a feature vector representing the grapheme image in a space of image features, wherein the convolutional neural network includes multiple alternating sets of convolutional layers and pooling layers producing a plurality of feature maps, wherein each feature map of the plurality of feature maps corresponds to a particular image feature of a plurality of image features; and compute a confidence vector associated with the grapheme image, wherein each element of the confidence vector is produced by a monotonically decreasing function of a distance, in the space of image features, between the feature vector and a center of a class of a set of classes, wherein the class is identified by an 

Regarding claim 19, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the computer-readable non-transitory storage medium of claim 15, wherein each class of the set of classes corresponds to a character of an alphabet.  (See the rejection of claim 5 as it is equally applicable for claim 19 as well.)

Claims 2, 3, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) in view of Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”) and in further view of Lee et al. (US Pat. No. 8,553,968 B1).
Regarding claim 2, Jacobs, Wshah, Liu[1], and Liu[2] disclose, the method of claim 1, where a probability vector is calculated, but they fail to disclose the following limitations.
However Lee discloses, further comprising: identifying an element having a maximum value among elements of the confidence vector; and associating the grapheme image with a grapheme class corresponding to the identified element of the confidence vector.  (See Lee 6:20-23, “wherein a selector 84 selects the maximum confidence metric as the detected output character. If at step 86 of FIG. 4B the 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the selection of the maximum confidence, as suggested by Lee, in Jacobs, Wshah, Liu[1], and Liu[2]’s probability vector using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to obtain the most accurate character classification.

Regarding claim 3, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the method of claim 1, but they fail to disclose the following limitations.
However Lee discloses, further comprising: identifying an element having a maximum value among elements of the confidence vector; and responsive to determining that the maximum value falls below a threshold, returning an error code indicating that the grapheme image is not recognizable.  (See Lee 6:24-26, “If the maximum confidence metric is less than the predetermined threshold, the OCR system 20 outputs an erasure pointer.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the selection of the maximum confidence, as suggested by Lee, in Jacobs, Wshah, Liu[1], and Liu[2]’s probability vector using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to ensure that only high-quality character recognitions are classified, by removing poor-quality recognitions.
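For illustration only, the maximum-selection and threshold limitations addressed in the rejections of claims 2 and 3 can be sketched together. This is a minimal sketch under stated assumptions: the function name classify, the default threshold of 0.5, and the use of None to stand in for an error code (or for the erasure pointer quoted from Lee) are hypothetical and are not taken from the claims or the references.

```python
def classify(confidences, threshold=0.5):
    """Return the index of the class with the maximum confidence.

    Returns None when the maximum falls below the threshold. Assumption:
    None is a stand-in for an error code indicating an unrecognizable image.
    """
    # Identify the element having the maximum value among the elements.
    best_index = max(range(len(confidences)), key=lambda i: confidences[i])
    # Reject the result when even the best confidence is too low.
    if confidences[best_index] < threshold:
        return None
    # Associate the image with the class identified by the element's index.
    return best_index

recognized = classify([0.1, 0.8, 0.3])
rejected = classify([0.1, 0.2, 0.3])
```

Here recognized is the index of the second class, while rejected is None because no confidence reaches the assumed threshold.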

Regarding claim 16, Jacobs, Wshah, Liu[1], Liu[2], and Lee disclose, the computer-readable non-transitory storage medium of claim 15, further comprising executable instructions causing the computer system to: identify an element having a maximum value among elements of the confidence vector; and associate the grapheme image with a grapheme class corresponding to the identified element of the confidence vector.  (See the rejection of claim 2 as it is equally applicable for claim 16 as well.)

Regarding claim 17, Jacobs, Wshah, Liu[1], Liu[2], and Lee disclose, the computer-readable non-transitory storage medium of claim 15, further comprising executable instructions causing the computer system to: identify an element having a maximum value among elements of the confidence vector; and responsive to determining that the maximum value falls below a threshold, return an error code indicating that the grapheme image is not recognizable.  (See the rejection of claim 3 as it is equally applicable for claim 17 as well.)

Claims 4, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) in view of Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”) and in further view of Cummins et al. (US Pub. No. 2015/0055866 A1).
Regarding claim 4, Jacobs, Wshah, Liu[1], and Liu[2] disclose, the method of claim 1, but they fail to disclose the following limitations.

and repeating, for the second grapheme image, operations of computing the feature vector and computing the confidence vector. (See Cummins ¶17, “performing a second character recognition on at least a portion of the second segmented portion of the image to produce a second sequence of characters.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the second segmentation after an initial classification, as suggested by Cummins, in Jacobs, Wshah, Liu[1], and Liu[2]’s character classification using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to correct the segmentation of characters if there is an error in the initial segmentation.

Regarding claim 14, Jacobs, Wshah, Liu[1], Liu[2], and Cummins disclose, the system of claim 9, wherein the processor is further configured to: perform, in view of the confidence vector, segmentation of an original image to produce a second grapheme image; and repeat, for the second grapheme image, operations of computing the 

Regarding claim 18, Jacobs, Wshah, Liu[1], Liu[2],  and Cummins disclose, the computer-readable non-transitory storage medium of claim 16, further comprising executable instructions causing the computer system to: perform, in view of the confidence vector, segmentation of an original image to produce a second grapheme image; and repeat, for the second grapheme image, operations of computing the feature vector and computing the confidence vector.  (See the rejection of claim 4 as it is equally applicable for claim 18 as well.)

Claims 6 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) in view of Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”) and in further view of Chen et al. (US Pub. No. 2018/0077181 A1).
Regarding claim 6, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the method of claim 1, wherein the neural network comprises a fully-connected layer employed to produce a set of class weights, such that each class weight characterizes a degree of association of the grapheme image with a certain class of a set of classes, (See Jacobs ¶72, “The last two layers (1608 and 1610) are fully connected, and can be viewed as forming a multipurpose classifier, since a 2-layer fully connected neural network can learn any function.”)

Jacobs, Wshah, Liu[1], and Liu[2]  disclose using a neural network for character recognition, but they fail to disclose the function that is used as part of the softmax layer.
However, Chen discloses, and wherein the method further comprises: computing, using a normalized exponential transformation, (See Chen ¶95, “In machine-learned neural networks, the softmax function is often implemented at the final layer of a network used for classification.  Such networks are then trained under a log loss (or cross-entropy) regime, giving a non-linear variant of multinomial logistic regression.  The softmax function, or normalized exponential is a generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values to a K-dimensional vector σ(z) of real values in the range (0, 1).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the normalized exponential function in the softmax layer of a neural network, as suggested by Chen, in Jacobs, Wshah, Liu[1], and Liu[2]’s neural network classification of a character using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to squash neural network output scores into values in the range from 0 to 1.
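For illustration only, the normalized exponential (softmax) transformation described in the passage quoted from Chen can be sketched as follows. This is a minimal sketch: the function name softmax, the example class weights, and the max-subtraction step (a common numerical-stability measure that does not change the result) are assumptions and are not taken from Chen.

```python
import math

def softmax(class_weights):
    """Map arbitrary real-valued class weights to probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() and leaves the output unchanged.
    shift = max(class_weights)
    exps = [math.exp(w - shift) for w in class_weights]
    total = sum(exps)
    # Normalize so that each value lies in (0, 1) and all values sum to 1.
    return [e / total for e in exps]

# Hypothetical class weights from a fully-connected layer.
probabilities = softmax([2.0, 1.0, 0.1])
```

Larger class weights map to larger probabilities, so the ordering of the classes is preserved by the transformation.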

Regarding claim 11, Jacobs, Wshah, Liu[1], Liu[2], and Chen disclose, the system of claim 9, wherein the neural network comprises a fully-connected layer employed to produce a set of class weights, such that each class weight characterizes a degree of association of the grapheme image with a certain class of a set of classes, and wherein the processor is further configured to: compute, using a normalized exponential transformation, a set of probabilities corresponding to the set of class weights, such that each probability characterizes a hypothesis of the grapheme image representing an instance of a certain class of the set of classes.  (See the rejection of claim 6 as it is equally applicable for claim 11 as well.)

Claims 7 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) in view of Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”)  and in further view of Jiang et al. (US Pub. No. 2019/0065606 A1).
Regarding claim 7, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the method of claim 6, but they fail to disclose the following limitations.
However Jiang discloses, wherein the confidence vector is determined for a subset of classes associated with highest probability values. (See Jiang ¶55, “In certain embodiments, the category selection module 306 can be configured to select a pre-determined number of categories.  For example, the pre-determined number may be 4.  In the example above, the category selection module 306 could select the four highest 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the selection of a subset of classes as suggested by Jiang to Jacobs, Wshah, Liu[1], and Liu[2]’s classification of a character using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to present a list of ranked characters to a user, if they want to manually correct the character recognition.

Regarding claim 12, Jacobs, Wshah, Liu[1], Liu[2], and Jiang disclose, the system of claim 11, wherein the confidence vector is determined for a subset of classes associated with highest probability values. (See the rejection of claim 7 as it is equally applicable for claim 12 as well.)
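For illustration only, determining the confidence vector for a subset of classes associated with the highest probability values can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the Euclidean distance, the exponential confidence function, and the example values are hypothetical and are not taken from Jiang or the other references.

```python
import math

def top_k_classes(probabilities, k):
    """Return the indices of the k classes with the highest probabilities."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return ranked[:k]

def confidences_for_subset(feature, centers, class_indices):
    """Compute a distance-based confidence only for the selected classes."""
    result = {}
    for i in class_indices:
        distance = math.sqrt(sum((f - c) ** 2 for f, c in zip(feature, centers[i])))
        # Assumed confidence function: decreases as the distance grows.
        result[i] = math.exp(-distance)
    return result

# Hypothetical probabilities for four classes and their class centers.
probabilities = [0.05, 0.60, 0.10, 0.25]
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
subset = top_k_classes(probabilities, k=2)
subset_confidences = confidences_for_subset((1.0, 0.0), centers, subset)
```

Only the two most probable classes are scored, which avoids computing distances to every class center.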

Claims 8, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jacobs et al. (US Pub. No. 2005/0259866 A1) in view of Wshah et al. (US Pub. No. 2017/0372174 A1) in view of Liu[1] et al. (US Pub. No. 2019/0147304 A1) in view of Liu[2] et al. (“Large-Margin Softmax Loss for Convolutional Neural Networks”) and in further view of Wen et al. (“A Discriminative Feature Learning Approach for Deep Face Recognition”).
Regarding claim 8, Jacobs, Wshah, Liu[1], and Liu[2]  disclose, the method of claim 1, but they fail to disclose the following limitations.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the linear combination of a cross entropy loss function and a center loss function, as suggested by Wen, in Jacobs, Wshah, Liu[1], and Liu[2]’s neural network loss function using known engineering techniques, with a reasonable expectation of success. The motivation for doing so, as disclosed by Wen on p. 1, Abstract, is “to obtain deep features with … inter-class dispension and intra-class compactness as much as possible.”
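For illustration only, a loss function formed as a linear combination of a cross entropy loss and a center loss can be sketched for a single training sample as follows. This is a minimal sketch under stated assumptions: the function names, the weighting factor lam, the one-half squared-distance form of the center loss, and the example values are hypothetical, and the sketch omits the updating of class centers during training.

```python
import math

def cross_entropy(probabilities, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probabilities[label])

def center_loss(feature, centers, label):
    """Half the squared distance between the feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, centers[label]))

def combined_loss(probabilities, feature, centers, label, lam=0.1):
    """Linear combination: cross entropy plus lam times the center loss."""
    return (cross_entropy(probabilities, label)
            + lam * center_loss(feature, centers, label))

# Hypothetical two-class example with the true class at index 0.
centers = [(0.0, 0.0), (2.0, 2.0)]
loss = combined_loss([0.5, 0.5], (1.0, 0.0), centers, label=0, lam=0.1)
```

The cross entropy term rewards separating the classes, while the center loss term pulls each feature vector toward the center of its own class; lam sets the balance between the two.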

Regarding claim 13, Jacobs, Wshah, Liu[1], Liu[2], and Wen disclose, the system of claim 9, wherein the processor is further configured to: train the neural network using a loss function represented by a linear combination of a cross entropy loss function and a center loss function.  (See the rejection of claim 8 as it is equally applicable for claim 13 as well.)

Regarding claim 20, Jacobs, Wshah, Liu[1], Liu[2], and Wen disclose, the computer-readable non-transitory storage medium of claim 15, further comprising executable instructions causing the computer system to: train the neural network using 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID PERLMAN whose telephone number is (571) 270-1417. The examiner can normally be reached on Monday - Friday; 10:00am - 6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/DAVID PERLMAN/Primary Examiner, Art Unit 2662