DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 6, 2018 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.


35 USC § 101 Statutory Analysis
The claims do not recite any of the judicial exceptions enumerated in the 2019 Revised Patent Subject Matter Eligibility Guidance. Further, the claims do not recite any method of organizing human activity, such as a fundamental economic concept or managing interactions between people. Finally, the claims do not recite a mathematical relationship, formula, or calculation. Thus, the claims are eligible because they do not recite a judicial exception.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. §102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-28 are rejected under 35 U.S.C. §102(a)(1) as being anticipated by Kliger et al. (U.S. Patent Application Publication No. US 2018/0336439 A1) (hereafter referred to as “Kliger”).  
	With regard to claim 1, Kliger describes inputting input data to a neural network (see Figure 1 and refer for example to paragraphs [0022] and [0024]); determining respective mapping functions corresponding to a multiclass output of the neural network in association with the input data, including determining a mapping function of a first class and a mapping function of a second class (see Figure 1 and refer for example to paragraph [0024]); acquiring a result of a loss function including a first probability component that changes correspondingly to a function value of the mapping function of the first class and a second probability component that changes contrastingly to a function value of the mapping function of the second class (see Figure 1 and refer for example to paragraphs [0021] and [0031]); determining a gradient of loss corresponding to the input data based on the result of the loss function (see Figure 1 and refer for example to paragraph [0024]); and updating a parameter of the neural network based on the determined gradient of loss for generating a trained neural network based on the updated parameter (see Figure 2 and refer to paragraphs [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
As to claim 2, Kliger describes wherein the first probability component increases with respect to increases in the function value of the mapping function of the first class and the second probability component decreases with respect to increases in the function value of the mapping function of the second class (refer for example to paragraphs [0027] and [0030]).
In regard to claim 3, Kliger describes wherein the first probability component is based on a probability function associated with the mapping function of the first class and the second probability component is based on an inverse probability function associated with the mapping function of the second class (refer to paragraph [0029]).
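For illustration only, the kind of loss recited in claims 1 to 3 can be sketched as below. This is a hypothetical form, not a reproduction of any equation in the application or in Kliger: the function name `contrastive_style_loss` and the specific expression are assumptions. The first component uses a sigmoid of the first-class mapping value (it grows as that value grows), and the second component uses one minus a sigmoid of each other-class mapping value (it shrinks as that value grows).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def contrastive_style_loss(scores: list[float], true_class: int) -> float:
    """Hypothetical loss (an assumption, not the claimed equation):
    -log(sigmoid(f_i(x))) - sum over j != i of log(1 - sigmoid(f_j(x))).
    `scores` holds the mapping-function values f_k(x) for each class k.
    """
    eps = 1e-12  # guards against log(0)
    # First probability component: increases with the true-class score.
    loss = -math.log(sigmoid(scores[true_class]) + eps)
    # Second probability component: decreases as other-class scores increase.
    for j, f_j in enumerate(scores):
        if j != true_class:
            loss -= math.log(1.0 - sigmoid(f_j) + eps)
    return loss
```

Under this assumed form, raising the first-class score lowers the loss, while raising a second-class score raises it.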
With regard to claim 4, Kliger describes wherein, in response to the parameter of the neural network being updated iteratively, a monotonically increasing relationship is established between the mapping function of the first class and a conditional probability of input data of the first class being recognized as the first class (see Figure 2 and refer to paragraphs [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
As to claim 5, Kliger describes wherein the loss function corresponds to an equation as follows:

[Equation image: media_image1.png]


where L2a denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
In regard to claim 6, Kliger describes further comprising acquiring another loss function, wherein the other loss function corresponds to another equation as follows:

[Equation image: media_image2.png]

determining another gradient of loss corresponding to the input data based on the other loss function; and updating another parameter of the neural network based on the determined other gradient of loss (refer for example to paragraphs [0024] and [0025], see equations 3 and 4).
With regard to claim 7, Kliger describes wherein the loss function corresponds to an equation as follows:

[Equation image: media_image3.png]

where L2b denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
As to claim 8, Kliger describes further comprising acquiring another loss function, wherein the other loss function corresponds to another equation as follows:

[Equation image: media_image2.png]

determining another gradient of loss corresponding to the input data based on the other loss function; and updating another parameter of the neural network based on the determined other gradient of loss (refer for example to paragraphs [0024] and [0025], see equations 3 and 4).
In regard to claim 9, Kliger describes wherein the loss function corresponds to a contrastive loss function, and the method further comprises determining another gradient of loss corresponding to the input data based on a cross-entropy loss function, and updating another parameter of the neural network based on the determined other gradient of loss for the generating of the trained neural network (refer for example to paragraph [0032]).
With regard to claim 10, Kliger describes wherein the updating comprises adjusting the parameter of the neural network in a direction opposite to a direction of the determined gradient of loss (see Figure 2 and refer to paragraphs [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
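The limitation of adjusting a parameter in a direction opposite to the gradient of loss corresponds to an ordinary gradient descent step, which can be sketched as follows. The names `gradient_step` and `quadratic_grad`, the learning rate, and the toy quadratic loss are all hypothetical and are not taken from the application or from Kliger.

```python
def gradient_step(params: list[float], grads: list[float],
                  learning_rate: float = 0.1) -> list[float]:
    """Move each parameter against its gradient (plain gradient descent)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def quadratic_grad(params: list[float]) -> list[float]:
    """Gradient of a toy loss L(w) = sum((w_k - 3)^2), used only as an example."""
    return [2.0 * (p - 3.0) for p in params]
```

Applying `gradient_step` repeatedly with `quadratic_grad` drives each parameter of the toy loss toward 3, since every step moves against the sign of the gradient.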
As to claim 11, Kliger describes a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1 (refer for example to paragraphs [0046] and [0048]).
In regard to claim 12, Kliger describes inputting input data respectively to each of a first neural network portion of a neural network trained using a first loss function and a second neural network portion of the neural network trained using a second loss function (see Figure 1 and refer for example to paragraphs [0022], [0024] and [0031]); respectively normalizing each of an output of the first neural network portion and an output of the second neural network portion based on a reference level (see Figure 1 and refer for example to paragraphs [0024] and [0031]); obtaining a weighted average of the normalized output of the first neural network portion and the normalized output of the second neural network portion (see Figure 1 and refer for example to paragraphs [0021] and [0031]); and indicating a recognition result of the neural network based on the obtained weighted average (see Figure 2 and refer to paragraphs [0010], [0011], [0029], [0031], [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
With regard to claim 13, Kliger describes wherein the reference level corresponds to a conditional probability of data of a first class being recognized by the neural network as the first class (refer for example to paragraphs [0029] and [0030]).
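The recognition steps recited in claims 12 and 13 (normalizing two outputs by a reference level, averaging them with weights, and indicating a result) can be sketched as below. This is a minimal illustration under stated assumptions: division by the reference level is only one possible normalization and is not the equation shown in the application, and the names `normalize`, `fuse`, and `weight_a` are hypothetical.

```python
def normalize(outputs: list[float], reference_level: float) -> list[float]:
    """Assumed normalization: scale each output by a per-network reference level."""
    return [o / reference_level for o in outputs]


def fuse(out_a: list[float], out_b: list[float],
         ref_a: float, ref_b: float,
         weight_a: float = 0.5) -> tuple[list[float], int]:
    """Weighted average of two normalized outputs, plus the winning class index."""
    norm_a = normalize(out_a, ref_a)
    norm_b = normalize(out_b, ref_b)
    fused = [weight_a * a + (1.0 - weight_a) * b
             for a, b in zip(norm_a, norm_b)]
    # The recognition result is taken as the class with the largest fused score.
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best
```

Normalizing first puts the two outputs on a comparable scale, so the weighted average is not dominated by whichever network portion happens to produce larger raw values.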
As to claim 14, Kliger describes wherein the first loss function, as a contrastive loss function, includes a first probability component that changes correspondingly to a function value of a mapping function of a first class and a second probability component that changes contrastingly to a function value of a mapping function of a second class (refer for example to paragraphs [0027] and [0030]).
In regard to claim 15, Kliger describes wherein the second loss function corresponds to a cross-entropy loss function (refer for example to paragraph [0032]).
With regard to claim 16, Kliger describes wherein the first loss function corresponds to an equation as follows:

[Equation image: media_image1.png]

where L2a denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
In regard to claim 17, Kliger describes wherein the second loss function corresponds to an equation as follows:

[Equation image: media_image3.png]
 
(refer for example to paragraph [0024] and see equation 2).
As to claim 18, Kliger describes wherein the normalizing comprises normalizing the output of the first neural network using an equation as follows:

[Equation image: media_image4.png]

(refer for example to paragraph [0031] and see equation 11).
With regard to claim 19, Kliger describes wherein the first loss function corresponds to an equation as follows: 

[Equation image: media_image3.png]

where L2b denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
As to claim 20, Kliger describes wherein the normalizing comprises normalizing the output of the first neural network portion using an equation as follows:

[Equation image: media_image4.png]

(refer for example to paragraph [0031] and see equation 11).
In regard to claim 21, Kliger describes a processor configured to input data to a neural network (see Figure 1 and refer for example to paragraphs [0022] and [0024]); determine respective mapping functions corresponding to a multiclass output of the neural network in association with the input data, including determining a mapping function of a first class and a mapping function of a second class (see Figure 1 and refer for example to paragraphs [0024] and [0031]); acquire a result of a loss function including a first probability component that changes correspondingly to a function value of the mapping function of the first class and a second probability component that changes contrastingly to a function value of the mapping function of the second class (see Figure 1 and refer for example to paragraphs [0021] and [0031]); determine a gradient of loss corresponding to the input data based on the result of the loss function (see Figure 1 and refer for example to paragraph [0024]); and update a parameter of the neural network based on the determined gradient of loss for generating a trained neural network based on the updated parameter (see Figure 2 and refer for example to paragraphs [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
With regard to claim 22, Kliger describes further comprising a memory including instructions, wherein, in response to the instructions being executed by the processor, the processor is controlled to perform (refer for example to paragraphs [0046] and [0048]) the determining of the respective mapping functions, the acquiring of the result of the loss function, the determining of the gradient of loss, and the updating of the parameter of the neural network for the generating of the trained neural network based on the updated parameter (refer for example to paragraphs [0021], [0022], [0024], [0031], [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).
As to claim 23, Kliger describes wherein the first probability component increases with respect to the function value of the mapping function of the first class and the second probability component decreases with respect to increases in the function value of the mapping function of the second class (refer for example to paragraphs [0027], [0029] and [0030]).
In regard to claim 24, Kliger describes wherein the first probability component is based on a probability function associated with the mapping function of the first class and the second probability component is based on an inverse probability function associated with the mapping function of the second class (refer for example to paragraphs [0027], [0029] and [0030]).
With regard to claim 25, Kliger describes wherein the loss function corresponds to an equation as follows:

[Equation image: media_image1.png]

where L2a denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
As to claim 26, Kliger describes wherein the loss function corresponds to an equation as follows:

[Equation image: media_image3.png]

where L2b denotes the loss function, i and j denote respective classes, c denotes the number of classes, x denotes input data, Xi denotes input data of a class i, σ denotes a sigmoid function, fi(x) denotes a mapping function of the class i, and fj(x) denotes a mapping function of a class j (refer for example to paragraph [0024] and see equation 2).
In regard to claim 27, Kliger describes wherein the loss function corresponds to a contrastive loss function, and the processor is further configured to determine another gradient of loss corresponding to the input data based on a cross-entropy loss function, and update another parameter of the neural network based on the determined other gradient of loss for the generating of the trained neural network (refer for example to paragraph [0032]).
With regard to claim 28, Kliger describes wherein the input data is training data and the trained neural network is a first neural network portion of a recognition neural network, and the processor is further configured, for performing recognition of non-training input data, to input the non-training input data to each of the first neural network portion trained using the loss function, and a second neural network portion trained using a different loss function (see Figure 1 and refer for example to paragraphs [0022], [0024] and [0031]); normalize each of an output of the first neural network portion and an output of the second neural network portion based on a reference level (see Figure 1 and refer for example to paragraphs [0024] and [0031]); obtain a weighted average of the normalized output of the first neural network portion and the normalized output of the second neural network portion (see Figure 1 and refer for example to paragraphs [0021] and [0031]); and indicate a recognition result of the recognition neural network based on the obtained weighted average (see Figure 2 and refer to paragraphs [0010], [0011], [0029], [0031], [0037] and [0038], where the adjusting of a parameter of the discriminator corresponds to applicant’s “updating a parameter of the neural network based on the determined gradient of loss”).

Relevant Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sainath, Chai, Tokkui, Pi, Gu, Bhaskar, Liu, Chen S, Li, Yang, Chen G, Yin, Ma and Pei all disclose systems similar to applicant’s claimed invention.  

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jose L. Couso whose telephone number is (571) 272-7388. The examiner can normally be reached on Monday through Friday from 6:00am to 2:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached on 571-272-7778. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.




/JOSE L COUSO/
Primary Examiner, Art Unit 2667
May 20, 2021