DETAILED ACTION
This action is in response to the communications filed on 04/22/2020, in which claims 1-7 are amended and claims 1-15 remain pending.
Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/24/2020 has been entered.
 
Priority 
Applicant’s claim for the benefit of a prior-filed foreign patent application No.  JP2013-042014 filed on March 04, 2013 and benefit of National Stage application No. PCT/JP2013/079688 filed November 11, 2013 is acknowledged.

Response to Arguments
Applicant’s arguments and amendments filed 04/22/2020 have been fully considered by the examiner.

In response to applicant’s arguments, see pg. 7 of response submitted 04/22/2020, with respect to the rejection of claims 1-15 under 35 U.S.C. § 112(b), the applicant’s remarks have been fully 

In response to applicant’s arguments, see pg. 7 of response submitted 04/22/2020, with respect to the rejection of claims 1-15 under 35 U.S.C. § 103, the arguments have been fully considered and were not persuasive. Applicant argues that the cited prior art fails to disclose the amended limitations directed to the use of a first and second loss function for computing the updates to the dictionary of learning parameters, and to repeating a search for an identification boundary until the calculated sum no longer varies.
In response, the examiner notes the following. 
Regarding the first element, the amended claims do not recite the use of a first and second loss function, as these elements have been removed from the amendments filed 04/22/2020. Applicant appears to argue limitations not required by the claim limitations. Therefore, the arguments are deemed unpersuasive.
Regarding the second limitation, the applicant’s arguments amount to mere allegations of patentability because there is no discussion as to how the amended language is different from the cited prior art. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
The rejection of claims 1-15 under 35 U.S.C. § 103 has been maintained. 

Claim Rejections - 35 USC § 112-Indefiniteness
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

Regarding claims 1, 6, and 7, the claims recite limitations that render the claims indefinite because the intended scope is unclear:
search for an identification boundary of the dictionary based on the dictionary, the supervised data, and labeled unsupervised data; assign a label to the unsupervised data in accordance with the identification boundary;
The intended scope of this limitation is unclear because of the following:
The claim limitation recites “labeled unsupervised data.” Such data would be considered part of the supervised data set, as one of ordinary skill in the art would classify supervised data as a collection of labeled data. It is therefore unclear how applicant differentiates the recited data sets, and unclear how labeled data is assigned a label when it is already labeled. The examiner interprets the scope of the claim to include labeled data, that is supervised data, divided into two data sets.
calculate an unsupervised loss by providing 1 as a loss to samples of the unsupervised data contained in an area in the vicinity of the identification boundary, and providing 0 to the other samples.
The intended scope of this limitation is unclear because of the following:
It is unclear how the loss is provided to samples in the vicinity of a boundary.
“An area in the vicinity” is a relative phrase, and the claim and specification fail to disclose some standard for measuring the scope of the phrase or the analysis for assessing the degree to which samples are in “the vicinity” as recited by the claim limitation.
It is unclear what is meant by providing a 0 to the other samples: whether it refers to changing the value, multiplying the values by 0, or some other method involving the application of a zero value to the training samples.
calculate a sum of the supervised loss and the unsupervised loss; 
The intended scope of this limitation is unclear because of the following:
The preceding limitations refer to calculating “a supervised loss by applying a function” and “an unsupervised loss by providing 1 as a loss,” each of which refers to the computation of a single loss; it is unclear what plurality of losses is summed as recited by the claim limitation.
update the dictionary to slightly shift the identification boundary;
The intended scope of this limitation is unclear because of the following:
The claim recites updating the dictionary to “slightly shift” a boundary. The phrase “slightly shift” is a relative phrase, and the claim and specification fail to disclose some standard for measuring the scope of the phrase or the analysis for assessing the degree of a “slight shift” as recited by the claim limitation.
repeat processing from the searching for the identification boundary of the dictionary to the updating the dictionary, and terminate the processing when the calculated sum of the losses no more vary; and
The intended scope of this limitation is unclear because of the following:
The repeating process and its termination condition, identifying when the sum no longer varies, are unclear, especially given that the method for computing the calculated sum appears to be based on a single value, applying a 1 to a sample and zeros to others, which implies that the sum, as claimed, is a non-varying value.
One of ordinary skill in the art would not be able to ascertain the intended scope of applicant’s claimed invention. Therefore, the limitations render the claims indefinite.
 The examiner interprets any optimizing process for determining a classification boundary as within the scope of the applicant’s claim limitations. 
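The concern with the phrase “an area in the vicinity” can be illustrated with a minimal sketch. It assumes a linear identification boundary w·x + b = 0 and a hypothetical half-width parameter `eps` that stands in for “the vicinity”; neither the claim nor this action defines such a parameter, and the function names and sample values below are illustrative assumptions only.

```python
import math

def distance_to_boundary(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def unsupervised_loss(samples, w, b, eps):
    """Sum of per-sample losses: 1 if a sample lies within eps of the boundary, else 0.

    eps is a hypothetical threshold; the claim language supplies no such value.
    """
    return sum(1 if distance_to_boundary(x, w, b) <= eps else 0 for x in samples)

# Illustrative unlabeled samples and the boundary x1 = 0 (w = (1, 0), b = 0).
samples = [(0.1, 2.0), (-0.4, 1.0), (1.5, 0.0), (-3.0, 0.5)]
w, b = (1.0, 0.0), 0.0

narrow = unsupervised_loss(samples, w, b, eps=0.2)  # only the sample at distance 0.1 counts
wide = unsupervised_loss(samples, w, b, eps=0.5)    # the samples at 0.1 and 0.4 both count
print(narrow, wide)
```

The same samples and the same boundary yield different loss totals for different choices of `eps`, which is the kind of measuring standard the rejection finds missing from the claim and specification.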

Regarding claims 2-5, 8-11, 12-15 which depend on claims 1, 6, and 7 respectively, the claims do not resolve the noted deficiencies above and are therefore appropriately rejected.
Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.


Claims 1-2, 4-15 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Guo et al. (US Patent Publication No. 8,527,432, hereinafter ‘Guo’) in view of Cortes et al. (US Pat. No. 5,640,492, hereinafter ‘Cortes’).

Regarding independent claim 1 limitations, Guo teaches an information processing device
which uses supervised data and unsupervised data to perform semi-supervised learning, the information processing device comprising: (Guo teaches the device system that performs semi-supervised learning using a semi-supervised learning framework for learning using unlabeled data, that is unsupervised learning data, in 2:1-13: Consequently, this approach is called a semiparametric regularization based semi-supervised learning. By selecting different loss functions for the supervised learning, different semi-supervised learning frameworks are obtained… The present invention provides a semi-supervised learning approach which incorporates the unlabeled data into the supervised learning by a parametric function learned from the whole data including the labeled and unlabeled data.)
at least one memory configured to store instructions, and; at least one processor configured to execute instructions to: (Guo teaches computer storage memory and processing device for executing computer instructions, in 16:4-65: The present method may be implemented on a general purpose computer or a specially adapted machine. Typically, a programmable processor will execute machine-readable instructions stored on a computer-readable medium… Depending on the exact configuration and type of computing device, the memory may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two…)
obtain a dictionary that includes a parameter group used in an identification device; (Guo teaches dictionary as the parameters associated with a function obtained from the geometric distribution of the data to develop a semi-supervised learning machine learner, that is the identification device, in Abstract: …A semi-supervised learning approach is provided which incorporates the unlabeled data into the supervised learning by a parametric function learned from the whole data including the labeled and unlabeled data….; the parameters of the dictionary are used as part of the function for determining the geometric structure of the marginal distribution of the data, as depicted in Fig. 1A and 1B, in 1:25-62: …The geometry of the marginal distribution must be considered such that the learned classification or regression function adapts to the data distribution… A variety of graph based methods are proposed in the literature to achieve this goal. The approach herein exploits the geometric structure in a different way. This is achieved by a 2-step learning process. The first step is to obtain a parametric function from the unlabeled data which describes the geometric structure of the marginal distribution. This parametric function is obtained by applying Kernel Principal Component Analysis (KPCA) algorithm to the whole data including the labeled and unlabeled data. In KPCA, the function to extract the most important principal component is a linear combination of the kernel functions in the Reproducing Kernel Hilbert Space (RKHS), f(x)=K(x, .)a, where K is a kernel function and a is the coefficients vector. This learned parametric function can be shown to reflect the geometric structure of the marginal distribution of the data…)
search for an identification boundary of the dictionary based on the dictionary, the supervised data, and labeled unsupervised data; (Guo teaches determining, by searching for a geometric structure of the marginal distribution of the data that is modeled as a classification function, that is a decision boundary, to cluster data using unlabeled, that is unsupervised, data and labeled examples, that is supervised labels, in 1:11-32: Semi-supervised learning has attracted considerable attention in recent years and many methods have been proposed to utilize the unlabeled data. Most of the semisupervised learning models are based on the cluster assumption which states that the decision boundary should not cross the high density regions, but instead lie in the low density regions… Moreover, the marginal distribution of the data is advantageously determined by the unlabeled examples if there is a small labeled data set available along with a relatively large unlabeled data set, which is the case for many applications. The geometry of the marginal distribution must be considered such that the learned classification or regression function adapts to the data distribution… ; where the cluster classification is based on the classification function that is used to search for the identification boundary as depicted in Fig. 1 for classification of supervised data (e.g. labeled data) and unsupervised (unlabeled) data, in 1:39-44: …On the other hand, the marginal distribution of the data described by the unlabeled data has a particular geometric structure. Incorporating this geometric structure into the learning process results in a better classification function, as shown in FIG. 1B. The above observation suggests that the unlabeled data help change the decision function towards the desired direction…)
assign a label to the unsupervised data in accordance with the identification boundary; (Guo teaches assigning the same label to data based on their determined similarity clusters, in 1:18-24: …Most of the semisupervised learning models are based on the cluster assumption which states that the decision boundary should not cross the high density regions, but instead lie in the low density regions. In other words, similar data points should have the same label and dissimilar data points should have different labels….; where assigning labels involves the use of the classification function in accordance with the identification boundary as depicted in Fig. 1 to classify the unsupervised data, that is the unlabeled data set with the assigned positive and negative label classes as depicted in Fig. 1, 1:25-41: The present approach is also based on the cluster assumption. Moreover, the marginal distribution of the data is advantageously determined by the unlabeled examples if there is a small labeled data set available along with a relatively large unlabeled data set, which is the case for many applications. The geometry of the marginal distribution must be considered such that the learned classification or regression function adapts to the data distribution. An example is shown in FIGS. 1A and 1B for a binary classification problem…; and assigning labels to the x training data set of the supervised and unsupervised data using the f(x) function, in 3:65-4:45: A variety of loss functions have been considered in the literature. The simplest loss function is 0/1 loss

[Image: media_image1.png, greyscale excerpt reproduced from the cited passage of Guo]

)
calculate a supervised loss by applying a function to the supervised data when a label which is preassigned to the supervised data is different from a label determined based on the identification boundary; (Guo teaches calculating the sum of loss as in equation 3.2 using supervised learning functions to perform supervised learning and to calculate a supervised loss by applying the equation (i.e. an applied function), to the supervised data set of preassigned labels to the data points, in 3:53-4:11: Assuming that the given data consist of l labeled data points (x_i, y_i), … which are generated according to P. The binary classification problem is assumed where the labels y_i, … are binary, i.e., y_i = ±1. In the supervised learning scenario, the goal is to learn a function f to minimize the expected loss called risk functional…. where L is a loss function. A variety of loss functions have been considered in the literature. The simplest loss function is 0/1 loss…; where the loss function determines the error for the decision function, that is based on the identification boundary as the Ψ(x) family of parametric functions that can be applied to the supervised data as depicted in Fig. 1, in 4:5-17.)
calculate an unsupervised loss by providing 1 as a loss to samples of the unsupervised data contained in an area in the vicinity of the identification boundary, and providing 0 to the other samples (Guo teaches the computation of the unsupervised loss as an expansion of the acquired knowledge from the supervised learning process to calculate the unsupervised loss by providing a 1 when the assigned label is the same as the function estimate and 0 to samples that do not match as part of the L loss function, in 3:65-4:45: A variety of loss functions have been considered in the literature. The simplest loss function is 0/1 loss

[Image: media_image1.png, greyscale excerpt reproduced from the cited passage of Guo]


…where $\gamma > 0$ is the regularization parameter which specifies the trade-off between minimization of $R_{emp}(f)$ and the smoothness or simplicity enforced by small $\Theta(f)$. A choice of $\Theta(f)$ is the norm of the RKHS representation of the feature space, $\Theta(f) = \|f\|_K^2$, where $\|\cdot\|_K$ is the norm in the RKHS $\mathcal{H}_K$ associated with the kernel K.
Where the loss function is incorporated as part of the unsupervised learning process, in 5:27-48 : 

[Image: media_image2.png, greyscale excerpt reproduced from the cited passage of Guo]

)
calculate a sum of the supervised data loss and the unsupervised data loss; (Guo teaches the summation of the supervised and unsupervised loss, in equation 4.6 using the L loss function capturing the supervised and unsupervised data loss in equation 4.7, in 5:28-65:

[Image: media_image3.png, greyscale excerpt reproduced from the cited passage of Guo]

[Image: media_image4.png, greyscale excerpt reproduced from the cited passage of Guo]

)
update the dictionary to slightly shift the identification boundary; repeat processing from the searching for the identification boundary of the dictionary to the updating the dictionary, and terminate the processing when the calculated sum of the losses no more vary; and (Guo teaches the parametric function that is learned from the supervised learning on data, that is an updating of the parametric function that is the dictionary, in 3:35-41; where the iterative (i.e. repeating) process decreases the sum of the losses by using the minimizer equation that solves the differential values of the sum of the losses as the α* and β* that are associated with the minimization of the sum of the losses, that is the sum varies no more for the minimized solution, in 6:5-2, for searching for the geometric structure as a function that models the marginal distribution of the supervised and unsupervised data classes, as depicted in Fig. 2, in 7:9-66, to update the dictionary by projecting on the principal axis v for updating the dictionary parameters as depicted in Fig. 1B, in 6:1-21.)
a dictionary output circuit configured to output the updated dictionary. (Guo teaches outputting the estimated parameter function, in 11:36-55.)
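The sequence of limitations mapped above (search for a boundary, compute a supervised and an unsupervised loss, sum them, shift the boundary slightly, and stop when the sum stops changing) can be pictured with a toy sketch. It is not Guo's method, Cortes' method, or applicant's disclosed embodiment. It assumes a one-dimensional threshold `t` as the "dictionary", a 0/1 supervised loss, and hypothetical `step` and `eps` parameters chosen only for illustration.

```python
def total_loss(t, labeled, unlabeled, eps):
    """Supervised 0/1 loss plus the count of unlabeled samples within eps of t."""
    supervised = sum(1 for x, y in labeled if (1 if x > t else -1) != y)
    unsupervised = sum(1 for x in unlabeled if abs(x - t) <= eps)
    return supervised + unsupervised

def fit_threshold(labeled, unlabeled, t, step=0.1, eps=0.25, max_iter=1000):
    """Shift t slightly while the summed loss decreases; stop when it no longer does."""
    current = total_loss(t, labeled, unlabeled, eps)
    for _ in range(max_iter):
        # Evaluate a slight shift of the boundary in each direction.
        candidates = [(total_loss(t + d, labeled, unlabeled, eps), t + d)
                      for d in (-step, step)]
        best_loss, best_t = min(candidates)
        if best_loss >= current:
            break  # no slight shift lowers the sum, so terminate
        t, current = best_t, best_loss
    return t, current

# Illustrative data: two clusters near -1 and +1, with one labeled point in each.
labeled = [(-1.0, -1), (1.0, 1)]
unlabeled = [-1.2, -1.0, -0.9, 0.9, 1.0, 1.1]

t_final, loss_final = fit_threshold(labeled, unlabeled, t=0.8)
print(round(t_final, 3), loss_final)
```

Starting inside the dense cluster at t = 0.8, the loop moves the threshold toward the low-density gap and stops near t = 0.6 once the summed loss reaches 0 and no neighboring shift improves it.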

Examiner notes that the term circuit is considered a structural term that has been found to not invoke § 112, ¶ 6; and is disclosed by Guo as an information processing device including a programmable processor for executing instructions implemented using application software, in 16:3-9.

Guo teaches the semi-supervised support vector machine (SVM) for learning the parameter function as the dictionary for searching for the geometric structure for determining the classification of data using a margin classifier, as recited in the limitations above.
Guo does not expressly teach the geometric structure for making classification decisions as a hyperplane when using margin classifiers such as Support Vector Machines. Cortes teaches the use of a margin classifier to separate training data into classes by searching for a geometric structure, disclosed as a hyperplane, that is considered the claimed identification boundary as claimed in the limitations:
search for an identification boundary of the dictionary based on the dictionary, the supervised data, and labeled unsupervised data; assign a label to the unsupervised data  (Cortes teaches the classification of data by searching for an optimal hyperplane by updating the parameter values which minimize the sum of the loss associated with the training data as depicted in Fig. 6 by applying ones and zeros using a step function as the error pattern capturing the training data loss including supervised and unsupervised training data samples depicted in Fig. 6, in 6:56-7:22: FIG. 6 illustrates the use of slack variables in relation to an optimal hyperplane between vectors of classes A, B represented by X's and O's respectively.  Hyperplanes 135, 140 correspond to w•x + b = -1 and w•x + b = 1, respectively, with soft margins extending from each hyperplane 135, 140. Hyperplane 145 corresponds to w•x + b = 0 and is intermediate of hyperplanes 135, 140.

[Image: media_image5.png, greyscale excerpt reproduced from the cited passage of Cortes]

)
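The hyperplanes w·x + b = ±1 and the slack variables quoted from Cortes can be sketched numerically. The snippet below uses the standard soft-margin relation, in which the smallest slack satisfying y(w·x + b) ≥ 1 − ξ with ξ ≥ 0 is ξ = max(0, 1 − y(w·x + b)). The weight vector, bias, and sample points are illustrative assumptions and are not taken from Cortes' Fig. 6.

```python
def decision_value(x, w, b):
    """Signed value w.x + b; zero on the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(x, y, w, b):
    """Smallest slack that satisfies y * (w.x + b) >= 1 - slack with slack >= 0."""
    return max(0.0, 1.0 - y * decision_value(x, w, b))

# Illustrative hyperplane x1 = 0 with margin hyperplanes at x1 = -1 and x1 = +1.
w, b = (1.0, 0.0), 0.0
points = [
    ((2.0, 0.0), 1),    # outside the margin on the correct side
    ((0.5, 1.0), 1),    # inside the margin, still correctly classified
    ((-0.5, 0.0), 1),   # on the wrong side of the separating hyperplane
    ((-2.0, 3.0), -1),  # outside the margin on the correct side
]
slacks = [slack(x, y, w, b) for x, y in points]
print(slacks)
```

A slack of zero marks a sample on or beyond its margin hyperplane, a slack between 0 and 1 marks a sample inside the soft margin, and a slack above 1 marks a misclassified sample.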

It would have been obvious to one of ordinary skill in the art before the claimed invention was made to integrate the method of training margin classifiers to search for hyperplanes in classification tasks, as disclosed by Cortes, with the method of developing semi-supervised information processing methods for data classification, as disclosed by Guo.
One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to provide an efficient and stable method of determining an optimized decision function for the parameters of the optimal hyperplane generated from the training data (Cortes, Abstract).

Examiner notes that under broadest reasonable interpretation, see MPEP 2111, the dictionary update is based on a condition that is not required, as the recitation discloses an “if” clause without disclosing an alternative limitation (e.g. else …). Therefore, any action would read on the alternative. In addition, although understanding the claim language may be aided by explanations contained in the written description, it is improper to import claim limitations from the specification, especially when the claim language is broader than the embodiment disclosed in applicant’s specification, see MPEP 2111.01 (II).

Regarding claim 2, the rejection of claim 1 is incorporated and Guo in combination with Cortes further teaches the information processing device according to claim 1, 
the at least one processor configured to execute the instructions to use a function representing Gaussian distribution when calculating the loss of the unsupervised data. (Guo teaches calculating the loss of the unsupervised data based on the B coefficient of the unlabeled data and a kernel function, K, to determine the classification function associated with the identified boundary as depicted in Fig. 1B, in 8:3-46; where the kernel function used is the Gaussian radial basis function in table 1 of col. 14, that is a function representing a Gaussian distribution, in 8:37-46.)
Guo teaches the information processing device including a programmable processor for executing instructions implemented using application software, in 16:3-9.
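For reference, a Gaussian radial basis function kernel is commonly written as K(x, z) = exp(−‖x − z‖² / (2σ²)). The sketch below uses that common form; the bandwidth parameter `sigma` and its parametrization are assumptions, and the exact form listed in Guo's Table 1 is not reproduced here.

```python
import math

def gaussian_rbf(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

same = gaussian_rbf((0.0, 0.0), (0.0, 0.0))   # identical points give the maximum value
near = gaussian_rbf((0.0, 0.0), (1.0, 0.0))   # squared distance 1
far = gaussian_rbf((0.0, 0.0), (3.0, 4.0))    # squared distance 25
print(same, near, far)
```

The kernel value decays with squared distance in the shape of a Gaussian density, which is why such a kernel is read above as “a function representing Gaussian distribution.”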

Regarding claim 4, the rejection of claim 1 is incorporated and Guo in combination with Cortes further teaches the information processing device according to claim 1,
the at least one processor configured to execute the instructions to determine, based on a change amount of the sum of the losses in response to an update of the dictionary, either output of the dictionary or continuation of update of the dictionary. (Guo teaches determining using semi-supervised learning in response to the learned parametric function, that is an update to the dictionary, using the supervised learning, in 1:60-2:9 to output the parametric functions based on the analysis of the loss functions, in 11-12:4; where the change in the amount of loss is based on minimizing, that is a change amount, the semiparametric regularized parameter in equation 4.6 that is a sum of loss functions to determine the final update as the classification function, in 8:8-46.)
Guo teaches the information processing device including a programmable processor for executing instructions implemented using application software, in 16:3-9.

Regarding claim 5, the rejection of claim 1 is incorporated and Guo in combination with Cortes further teaches the information processing device according to claim 1,
the at least one processor further configured to execute the instructions to identify by using the dictionary, data subject to identification which is input. (Guo teaches the semi-supervised input consisting of a test set, that is data subject to identification that is identified with labels that are used to measure the error rate, in 14:19-26.)

Regarding independent claim 6 limitations, Guo teaches an information processing method,
executed by an information processing device which uses supervised data and unsupervised data to perform semi-supervised learning, the information processing method comprising: (Guo teaches the information processing device as the programmable processor for executing instructions implemented using application software, in 16:3-9; where the system performs semi-supervised learning using unlabeled data, that is unsupervised learning data, 15:58-61)
obtaining a dictionary which includes a parameter group used in an identification device; (Guo teaches dictionary as the parameters associated with a function obtained from the geometric distribution of the data to develop a semi-supervised learning machine learner, that is the identification device, in Abstract: …A semi-supervised learning approach is provided which incorporates the unlabeled data into the supervised learning by a parametric function learned from the whole data including the labeled and unlabeled data….; the parameters of the dictionary are used as part of the function for determining the geometric structure of the marginal distribution of the data, as depicted in Fig. 1A and 1B, in 1:25-62: …The geometry of the marginal distribution must be considered such that the learned classification or regression function adapts to the data distribution… A variety of graph based methods are proposed in the literature to achieve this goal. The approach herein exploits the geometric structure in a different way. This is achieved by a 2-step learning process. The first step is to obtain a parametric function from the unlabeled data which describes the geometric structure of the marginal distribution. This parametric function is obtained by applying Kernel Principal Component Analysis (KPCA) algorithm to the whole data including the labeled and unlabeled data. In KPCA, the function to extract the most important principal component is a linear combination of the kernel functions in the Reproducing Kernel Hilbert Space (RKHS), f(x)=K(x, .)a, where K is a kernel function and a is the coefficients vector. This learned parametric function can be shown to reflect the geometric structure of the marginal distribution of the data…)
determining an identification boundary of the dictionary based on the dictionary, the supervised data, and labeled unsupervised data; (Guo teaches determining, by searching for a geometric structure of the marginal distribution of the data that is modeled as a classification function, that is a decision boundary, to cluster data using unlabeled, that is unsupervised, data and labeled examples, that is supervised labels, in 1:11-32: Semi-supervised learning has attracted considerable attention in recent years and many methods have been proposed to utilize the unlabeled data. Most of the semisupervised learning models are based on the cluster assumption which states that the decision boundary should not cross the high density regions, but instead lie in the low density regions… Moreover, the marginal distribution of the data is advantageously determined by the unlabeled examples if there is a small labeled data set available along with a relatively large unlabeled data set, which is the case for many applications. The geometry of the marginal distribution must be considered such that the learned classification or regression function adapts to the data distribution… ; where the cluster classification is based on the classification function that is used to search for the identification boundary as depicted in Fig. 1 for classification of supervised data (e.g. labeled data) and unsupervised (unlabeled) data, in 1:39-44: …On the other hand, the marginal distribution of the data described by the unlabeled data has a particular geometric structure. Incorporating this geometric structure into the learning process results in a better classification function, as shown in FIG. 1B. The above observation suggests that the unlabeled data help change the decision function towards the desired direction…)
The remaining limitations are similar to the ones rejected in claim 1 and are rejected under the same rationale.

Regarding independent claim 7 limitations, Guo teaches a non-transitory computer readable medium storing a program,
executed by an information processing device which uses supervised data and unsupervised data to perform semi-supervised learning, the program causing the information processing device to execute: (Guo teaches the device system that performs semi-supervised learning using unlabeled data, that is unsupervised learning data, 15:58-61; information processing device as the programmable processor for executing instruction for implementing using application software, in 16:3-9)
The remaining claim limitations recite similar content as the claim 1 limitations and are rejected under the same rationale.


Regarding claim 8, the rejection of claim 6 is incorporated and Guo in combination with Cortes further teaches the information processing method according to claim 6,
wherein the information processing device uses a function representing Gaussian distribution when calculating the loss of the unsupervised data. (Guo teaches calculating the loss of the unsupervised data based on the B coefficient of the unlabeled data and a kernel function, K, to determine the classification function associated with the identified boundary as depicted in Fig. 1B, in 8:3-46; where the kernel function used is the Gaussian radial basis function in table 1 of col. 14, that is a function representing a Gaussian distribution, in 8:37-46.)

Regarding claim 9, the rejection of claim 6 is incorporated and Guo in combination with Cortes further teaches the information processing method according to claim 6,
wherein the dictionary update circuit updates the dictionary by using a differential value of the sum based on the losses. (Guo teaches the parametric function that is learned from the supervised learning on data, that is an updating of the parametric function that is the dictionary, in 3:35-41; where the process decreases the sum of the losses by minimizing the squared loss function, in 9:12-20; where the parametric function reflects the choices of the loss function, in 2:14-19, that is a different value associated with the minimizing choice of the loss function by the algorithm to learn the function f, that is the parametric function, in 4:39-53, that is considered based on a sum of the losses as recited in equations 3.3 and 3.4 and based on the captured losses for computing the sum of loss of all the supervised and unsupervised data captured as the overall variance on the projected principal axis as depicted in Fig. 2, in 7:9-66.)
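An update “using a differential value of the sum based on the losses” can be pictured as a single gradient step on a summed squared loss. The sketch below is a generic illustration under that assumption, not a reproduction of Guo's equations; the one-parameter model `w * x`, the learning rate `lr`, and the data are hypothetical.

```python
def summed_squared_loss(w, data):
    """Sum over samples of (w * x - y)^2."""
    return sum((w * x - y) ** 2 for x, y in data)

def gradient(w, data):
    """Derivative of the summed squared loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data)

def update(w, data, lr):
    """One gradient-descent step: move w against the derivative of the summed loss."""
    return w - lr * gradient(w, data)

# Illustrative data generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0)]
w0 = 0.0
w1 = update(w0, data, lr=0.05)

loss_before = summed_squared_loss(w0, data)
loss_after = summed_squared_loss(w1, data)
print(w1, loss_before, loss_after)
```

With these values the derivative at w = 0 is −20, so one step with `lr=0.05` moves the parameter to 1.0 and the summed loss falls from 20 to 5.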


Regarding claim 10, the rejection of claim 6 is incorporated and Guo in combination with Cortes further teaches the information processing method according to claim 6,
wherein the dictionary output circuit determines, based on a change amount of the sum of the losses in response to an update of the dictionary, either output of the dictionary or continuation of update of the dictionary. (Guo teaches determining using semi-supervised learning in response to the learned parametric function, that is an update to the dictionary, using the supervised learning, in 1:60-2:9 to output the parametric functions based on the analysis of the loss functions, in 11-12:4; where the change in the amount of loss is based on minimizing, that is a change amount, the semiparametric regularized parameter in equation 4.6 that is a sum of loss functions to determine the final update as the classification function, in 8:8-46.)

Regarding claim 11, the rejection of claim 6 is incorporated and Guo in combination with Cortes further teaches the information processing method according to claim 6,
further including: identifying, by using the dictionary, data subject to identification which is input. (Guo teaches the semi-supervised input consisting of a test set, that is data subject to identification that is identified with labels that are used to measure the error rate using the dictionary function parameters, in 14:19-26.)

Regarding claim 12, the rejection of claim 7 is incorporated and Guo in combination with Cortes further teaches the program according to Claim 7, 
causing the information processing device to perform the loss calculation function in which a function representing Gaussian distribution is used when calculating the loss of the unsupervised data. (Guo teaches calculating the loss of the unsupervised data based on the B coefficient of the unlabeled data and a kernel function, K, to determine the classification function associated with the identified boundary as depicted in Fig. 1B, in 8:3-46; where the kernel function used is the Gaussian radial basis function in table 1 of col. 14, that is a function representing a Gaussian distribution, in 8:37-46; and based on the captured losses for computing the sum of loss of all the supervised and unsupervised data captured as the overall variance on the projected principal axis as depicted in Fig. 2, in 7:9-66.)

Regarding claim 13, the rejection of claim 7 is incorporated and Guo in combination with Cortes further teaches the program according to Claim 7, 
causing the information processing device to perform the dictionary update function of updating the dictionary by using a differential value based on the sum losses. (Guo teaches the parametric function that is learned from the supervised learning on data, that is an updating of the parametric function that is the dictionary, in 3:35-41; where the process decreases the sum loss by minimizing the squared loss function, considered a sum based on the losses, in 9:12-20; where the parametric function reflects the choices of the loss function, in 2:14-19, that is a different value associated with the minimizing choice of the loss function, based on the sum of the losses, by the algorithm to learn the function f, that is the parametric function, in 4:39-53.)

Regarding claim 14, the rejection of claim 7 is incorporated and Guo in combination with Cortes further teaches the program according to Claim 7, 
causing the information processing device to perform the dictionary update function of determining, based on a change amount of the sum of the losses in response to an update of the dictionary, either output of the dictionary or continuation of update of the dictionary. (Guo teaches determining using semi-supervised learning in response to learned parametric function, that is an update to the dictionary, using the supervised learning, in 1:60-2:9 to output the parametric functions based on the analysis of the loss functions, in 11-12:4; where the change in the amount of loss based on minimizing, that is a change amount, the semiparametric regularized parameter in equation 4.6 that is a sum of loss function to determine the final update as the classification function, in 8:8-46.)

Regarding claim 15, the rejection of claim 7 is incorporated and Guo in combination with Cortes further teaches the program according to Claim 7, 
causing the information processing device to further perform an identification function of identifying by using the dictionary, data subject to identification which is input. (Guo teaches the semi-supervised input consisting of a test set, that is data subject to identification that is identified with labels that are used to measure the error rate, in 14:19-26; based on the identification boundary as the Ψ(x) family of parametric functions that can be learned for labeling the data as depicted in Fig. 1, in 4:5-17 and 13:1-9.)


Claim 3 is rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Guo et al. (US Patent Publication No. 8,527,432, hereafter ‘Guo’) in view of Cortes et al. (US Pat. No. 5,640,492, .

Regarding claim 3, the rejection of claim 1 is incorporated and Guo in combination with Cortes further teaches the information processing device according to claim 1,
wherein the dictionary update circuit updates the dictionary by using ... (Guo teaches the parametric function that is learned from the supervised learning on data, that is an updating of the parametric function that is the dictionary, in 3:35-41, of the support vector machine classification using semi-supervised learning; where the process decreases the sum loss by minimizing the squared loss function, in 9:12-20, as an optimization method to solve the parameters associated with the support vector machine, in 10:65-11:30.)
Guo and Cortes do not expressly teach claim 3 limitation:
wherein … updates … by using a steepest descent method.
Chap teaches claim 3 limitation:
wherein ... updates … by using a steepest descent method. (Chap teaches the Gradient methods using gradient descent as a steepest descent method used in optimizing the parameters for minimizing the objective loss function in a semi-supervised learning approach for support vector machines, in pg. 210: Section number 2. Gradient Methods…)

The Guo, Cortes, and Chap references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing methods for data classification.

One of ordinary skill in the art would have been motivated to integrate the disclosed methods in order to provide an improvement that enables a solution to non-convex optimization problems (Chap, Abstract).
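
For illustration only, a steepest descent update of a parameter set, with the decision to stop or to continue updating made from the change amount of the sum of the losses, may be sketched as follows. The sketch assumes a plain squared loss over a toy linear model, which is convex and is not the objective of Guo, Cortes, or Chap; the names sum_of_losses, gradient, steepest_descent, step, and tol are hypothetical and do not appear in the record.

```python
def sum_of_losses(w, data):
    """Sum of squared losses of the linear model w[0]*x + w[1] over data."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data)


def gradient(w, data):
    """Derivative of the sum of losses with respect to each parameter."""
    g0 = sum(2.0 * (w[0] * x + w[1] - y) * x for x, y in data)
    g1 = sum(2.0 * (w[0] * x + w[1] - y) for x, y in data)
    return (g0, g1)


def steepest_descent(data, w=(0.0, 0.0), step=0.01, tol=1e-10, max_iter=10000):
    """Update w along the negative gradient until the sum of losses settles.

    Hypothetical sketch: step size, tolerance, and iteration cap are
    assumptions chosen for the toy data, not values from the cited art.
    """
    prev = sum_of_losses(w, data)
    cur = prev
    for _ in range(max_iter):
        g = gradient(w, data)
        # Move each parameter against its derivative (steepest descent).
        w = (w[0] - step * g[0], w[1] - step * g[1])
        cur = sum_of_losses(w, data)
        # Stop and output w once the change amount of the sum is negligible;
        # otherwise continue updating.
        if abs(prev - cur) < tol:
            break
        prev = cur
    return w, cur


if __name__ == "__main__":
    toy = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # points on y = 2x + 1
    params, loss = steepest_descent(toy)
    print(params, loss)
```

On the toy data, the returned parameters approach a slope of 2 and an intercept of 1. For a non-convex objective of the kind addressed in Chap, the same update rule would only be expected to reach a local minimum.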

Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below.
Chang et al. (NPL: “Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines”): teaches the use of the commonly used L1 and L2 loss functions for support vector machines, in pg. 1369: Sec. 1. Introduction; and using a first loss of min w f(w)+ε as a predetermined minimum value that is compared with a second loss function f(𝒘̂) to update the 𝒘̂ dictionary parameter values using an L2-loss function that is smaller than the predetermined value, that is the calculated sum as denoted by equation (3) on pg. 1369, in pg. 1371: 1st full para.: “We say 𝒘̂ is an …”, used to update the dictionary, pgs. 1380-1381: Sec. 4.3.
Principe et al. (US Pub. No. 2013/0132315): using the L1 and L2 as hinge loss computation for training support vector machine and neural network classifiers.
Soldevila et al. (US Patent Application Publication No. 20170011279): Using a loss objective function for learning neural networks for annotating images using a word dictionary.
Bangalore et al. (US Patent Application Publication No. 2012/0150531): Teaches the support vector machine learning algorithm for classifying data using semi-supervised learning and a gradient descent algorithm.
Tsao et al. (US Pat. Pub. 2012/0004116): teaches using steepest descent in a semi-supervised approach to defining a classification hyperspace using learning models such as support vector machines, in [0108]-[0206].
Any inquiry concerning this communication or earlier communications from the examiner should be
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/O.O.A./Examiner, Art Unit 2126         
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126