Detailed Action
This action is in response to Applicant's communications filed 09 August 2018.  
Claims 2-4 are cancelled.  Thus, claims 1 and 5-15 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 5, and 10-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Farabet et al. (Learning Hierarchical Features for Scene Labeling, hereinafter "Farabet").

Regarding Claim 1,
Farabet teaches an information processing apparatus comprising:
an acquisition section configured to acquire a semantic network including information indicating a relationship between nodes, identification information of data, and a label corresponding to the node forming the semantic network ("This paper presents a scene parsing system that relies on deep learning methods to approach both questions. The main idea is to use a convolutional network (ConvNet) [27] operating on a large input window to produce label hypotheses for each pixel location. The convolutional net is fed with raw image pixels (after band-pass filtering and contrast normalization), and trained in supervised mode from fully labeled images to produce a category for each pixel location. ConvNets are composed of multiple stages, each of which contains a filter bank module, a nonlinearity, and a spatial pooling module. With end-to-end training, ConvNets can automatically learn hierarchical feature representations." sec. 1, p. 1915; "We report our semantic scene understanding results on three different datasets" sec. 5, p. 1922; the convolutional network with semantic scene understanding teaches the semantic network, the fully labeled images used to produce a category teach identification information of data, and the training to produce a category for each pixel location teaches a label corresponding to the node forming the semantic network); and
a learning section configured to learn a classification model that classifies the data into the label, on a basis of the semantic network, the identification information, and the label that have been acquired by the acquisition section ("trained in supervised mode from fully labeled images to produce a category for each pixel location. ConvNets are composed of multiple stages, each of which contains a filter bank module, a nonlinearity, and a spatial pooling module. With end-to-end training, ConvNets can automatically learn hierarchical feature representations." sec. 1, p. 1915),
using a learning criterion as to whether a plurality of the labels included in a classification result of the data conforms to the relationship between the nodes in the semantic network ([image: media_image1.png] sec. 4.3.3, p. 1921; "We report our semantic scene understanding results on three different datasets: “Stanford Background” on which related state-of-the-art methods report classification errors, and two more challenging datasets with a larger number of classes: “SIFT Flow” and “Barcelona.”... We use the evaluation procedure introduced in [15], 5-fold cross validation: 572 images used for training, and 143 for testing." sec. 5, p. 1922; the results shown in Tables 1-3 comparing the model trained on the training images and tested on the testing images teach analyzing the conformity to the relationship between the nodes in the semantic network).

Regarding Claim 5,
Farabet teaches the information processing apparatus according to claim 1.  Farabet further teaches wherein the learning section performs learning on a basis of a feedback to output information regarding a learning result ([image: media_image1.png] sec. 4.3.3, p. 1921; the divergence between the classifier prediction and the true (known) distributions of labels teaches the feedback that is used to train the classifier to learn the proper classifications).
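The feedback relied upon above can be sketched as follows. This is a minimal sketch only, assuming the divergence in question is a Kullback-Leibler divergence between the true (known) label distribution of a segment and the distribution predicted by the classifier. The function name, the three-class setup, and the example distributions are hypothetical and are not taken from Farabet.

```python
import math


def kl_divergence(true_dist, pred_dist, eps=1e-12):
    """Return KL(true || pred) for two discrete distributions over the same labels.

    Assumption: both inputs are lists of probabilities that each sum to 1.
    Predicted probabilities are clamped to eps to avoid log(0).
    """
    if len(true_dist) != len(pred_dist):
        raise ValueError("distributions must cover the same set of labels")
    total = 0.0
    for d, p in zip(true_dist, pred_dist):
        if d > 0.0:  # terms with zero true probability contribute nothing
            total += d * math.log(d / max(p, eps))
    return total


# Hypothetical label distributions over three classes for one segment.
true_dist = [0.7, 0.2, 0.1]
good_pred = [0.6, 0.3, 0.1]
poor_pred = [0.1, 0.1, 0.8]

# A smaller divergence means the prediction is closer to the known labels;
# training would adjust the classifier to reduce this value.
loss_good = kl_divergence(true_dist, good_pred)
loss_poor = kl_divergence(true_dist, poor_pred)
```

Under these assumptions the divergence acts as the feedback signal: it is zero when the prediction matches the known distribution and grows as the prediction departs from it.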

Regarding Claim 10,
Farabet teaches the information processing apparatus according to claim 5.  Farabet further teaches wherein the classification model is mounted by a neural network and the output information includes output values of one or more units included in the neural network ("a convolutional network (ConvNet)" sec. 1, p. 1915).

Regarding Claim 11,
Farabet teaches the information processing apparatus according to claim 10.  Farabet further teaches wherein the output information includes a clustering result of the output values ("We used the gPb hierarchies of Arbelaez et al., which are computed using spectral clustering to produce semantically consistent contours of objects." sec. 5.3, p. 1924). 

Regarding Claim 12,
Farabet teaches the information processing apparatus according to claim 10.  Farabet further teaches wherein the one or more units correspond to a plurality of units constituting an intermediate layer ("ConvNets [26], [27] are trainable architectures composed of multiple stages. The input and output of each stage are sets of arrays called feature maps. For example, if the input is a color image, each feature map would be a two-dimensional array containing a color channel of the input image (for an audio input, each feature map would be a one-dimensional array, and for a video or volumetric image, it would be a three-dimensional array). At the output, each feature map represents a particular feature extracted at all locations on the input. Each stage is composed of three layers: a filter bank layer, a nonlinearity layer, and a feature pooling layer. A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module. Because they are trainable, arbitrary input modalities can be modeled beyond natural images." sec. 3.1, p. 1918; "we use a three-stage ConvNet. The first two layers of the network are composed of a bank of filters of size 7 x 7 followed by tanh units and 2 x 2 max-pooling operations. The last layer is a simple filter bank." sec. 5.1, p. 1923).

Regarding Claim 13,
Farabet teaches the information processing apparatus according to claim 10.  Farabet further teaches wherein the one or more units correspond to one unit of an intermediate layer ("ConvNets [26], [27] are trainable architectures composed of multiple stages. The input and output of each stage are sets of arrays called feature maps. For example, if the input is a color image, each feature map would be a two-dimensional array containing a color channel of the input image (for an audio input, each feature map would be a one-dimensional array, and for a video or volumetric image, it would be a three-dimensional array). At the output, each feature map represents a particular feature extracted at all locations on the input. Each stage is composed of three layers: a filter bank layer, a nonlinearity layer, and a feature pooling layer. A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module. Because they are trainable, arbitrary input modalities can be modeled beyond natural images." sec. 3.1, p. 1918; "we use a three-stage ConvNet. The first two layers of the network are composed of a bank of filters of size 7 x 7 followed by tanh units and 2 x 2 max-pooling operations. The last layer is a simple filter bank." sec. 5.1, p. 1923).

Regarding Claim 14,
Farabet teaches the information processing apparatus according to claim 5.  Farabet further teaches wherein the output information includes a co-occurrence histogram of the label ("A classifier is then applied to all the aggregated feature grids to produce a histogram of categories, the entropy of which measures the “impurity” of the segment. Each pixel is then labeled by the minimally impure node above it, which is the segment that best “explains” the pixel." Figure 4, p. 1920).
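The passage quoted above (a histogram of categories per segment, entropy as "impurity", and labeling by the minimally impure node) can be sketched as follows. This is a minimal sketch under stated assumptions: the segment names, class names, and histogram counts are hypothetical, and assigning the most frequent category of the chosen segment as the pixel label is an assumption made for illustration rather than a statement of Farabet's exact procedure.

```python
import math


def entropy(histogram):
    """Shannon entropy (natural log) of a histogram of category counts."""
    total = sum(histogram)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    h = 0.0
    for count in histogram:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def purest_segment(segments_above_pixel):
    """Pick the segment whose category histogram has the lowest entropy.

    segments_above_pixel: list of (segment_name, histogram) pairs for the
    nested segments that contain a given pixel (hypothetical structure).
    """
    return min(segments_above_pixel, key=lambda seg: entropy(seg[1]))


# Hypothetical categories and nested segments containing one pixel.
labels = ["sky", "tree", "road"]
segments = [
    ("leaf", [5, 4, 1]),     # mixed categories, fairly impure
    ("mid", [9, 1, 0]),      # dominated by one category
    ("root", [10, 10, 10]),  # uniform, maximally impure
]

name, hist = purest_segment(segments)
# Assumption: the pixel takes the most frequent category of that segment.
pixel_label = labels[hist.index(max(hist))]
```

With these hypothetical counts the "mid" segment has the lowest entropy, so it is the segment that best "explains" the pixel in the sense of the quoted passage.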

Regarding Claim 15,
Claim 15 recites a method executed by a processor ("processors" p. 1927) corresponding to the steps recited in claim 1.  Farabet teaches the limitations of claim 15 as set forth above in connection with claim 1.  Therefore, claim 15 is rejected under the same rationale as claim 1.

Claims 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet et al. (Learning Hierarchical Features for Scene Labeling, hereinafter "Farabet") in view of Mottaghi et al. (The Role of Context for Object Detection and Semantic Segmentation in the Wild, hereinafter "Mottaghi").

Regarding Claim 6,
Farabet teaches the information processing apparatus according to claim 5.  Farabet does not explicitly teach wherein the output information includes information that proposes an input of the semantic network that is new.
Mottaghi teaches wherein the output information includes information that proposes an input of the semantic network that is new ("Our dataset contains pixel-wise labels for the 10,103 trainval images of the PASCAL VOC 2010 detection challenge (Fig. 1 shows example labels). There are 540 categories in the dataset, divided into three types: (i) objects, (ii) stuff and (iii) hybrids. Objects are classes that are defined by shape." sec. 3, p. 2; "We provided the annotators with an initial set of 80 carefully chosen labels and asked them to include more classes if a region did not fit into any of these classes." sec. 3, p. 2; adding more classes teaches information that proposes an input of the semantic network that is new).
Farabet and Mottaghi are analogous art because they are both directed to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the image classifier of Farabet with the feedback of Mottaghi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve object detection, as suggested by Mottaghi ("We show that this contextual reasoning significantly helps in detecting objects at all scales" Abstract, p. 355).

Regarding Claim 7,
The Farabet/Mottaghi combination teaches the information processing apparatus according to claim 6.  Mottaghi further teaches wherein the output information includes information that proposes the semantic network that is new ("Our dataset contains pixel-wise labels for the 10,103 trainval images of the PASCAL VOC 2010 detection challenge (Fig. 1 shows example labels). There are 540 categories in the dataset, divided into three types: (i) objects, (ii) stuff and (iii) hybrids. Objects are classes that are defined by shape." sec. 3, p. 2; "We provided the annotators with an initial set of 80 carefully chosen labels and asked them to include more classes if a region did not fit into any of these classes." sec. 3, p. 2; adding more classes indicates that the semantic network is new or incomplete).
Farabet and Mottaghi are analogous art because they are both directed to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the image classifier of Farabet with the feedback of Mottaghi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve object detection, as suggested by Mottaghi ("We show that this contextual reasoning significantly helps in detecting objects at all scales" Abstract, p. 355).

Regarding Claim 8,
The Farabet/Mottaghi combination teaches the information processing apparatus according to claim 7.  Mottaghi further teaches wherein the output information includes information indicating the semantic network inferred from another label associated with other data ("Our dataset contains pixel-wise labels for the 10,103 trainval images of the PASCAL VOC 2010 detection challenge (Fig. 1 shows example labels). There are 540 categories in the dataset, divided into three types: (i) objects, (ii) stuff and (iii) hybrids. Objects are classes that are defined by shape." sec. 3, p. 2; "We provided the annotators with an initial set of 80 carefully chosen labels and asked them to include more classes if a region did not fit into any of these classes." sec. 3, p. 2; adding more classes indicates that the semantic network is new or incomplete).
Farabet and Mottaghi are analogous art because they are both directed to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the image classifier of Farabet with the feedback of Mottaghi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve object detection, as suggested by Mottaghi ("We show that this contextual reasoning significantly helps in detecting objects at all scales" Abstract, p. 355).

Regarding Claim 9,
Farabet teaches the information processing apparatus according to claim 5.  Farabet does not explicitly teach wherein the output information includes information that proposes association of the label that is new with the data.
Mottaghi teaches wherein the output information includes information that proposes association of the label that is new with the data ("Our dataset contains pixel-wise labels for the 10,103 trainval images of the PASCAL VOC 2010 detection challenge (Fig. 1 shows example labels). There are 540 categories in the dataset, divided into three types: (i) objects, (ii) stuff and (iii) hybrids. Objects are classes that are defined by shape." sec. 3, p. 2; "We provided the annotators with an initial set of 80 carefully chosen labels and asked them to include more classes if a region did not fit into any of these classes." sec. 3, p. 2; adding more classes outside the 80 chosen labels teaches information that proposes association of new labels with the data).
Farabet and Mottaghi are analogous art because they are both directed to image classification. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the image classifier of Farabet with the feedback of Mottaghi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve object detection, as suggested by Mottaghi ("We show that this contextual reasoning significantly helps in detecting objects at all scales" Abstract, p. 355).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477. The examiner can normally be reached M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES C KUO/           Examiner, Art Unit 2126       
/ANN J LO/           Supervisory Patent Examiner, Art Unit 2126