DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
2.	Claims 1-20 have been examined and rejected. This is the first Office action on the merits.

Claim Rejections - 35 USC § 102
3.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

4.	Claims 1-2, 11-14, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Liu et al. (“Raster-to-Vector: Revisiting Floorplan Transformation,” October 22, 2017).

Claims 1-2, 11 (Method)
Claims 14, 19 (Device)
4-1.	Regarding claims 1 and 14, Liu teaches a method for determining a function configured to determine a semantic segmentation of a 2D floor plan representing a layout of a building, the function having a neural network presenting a convolutional encoder-decoder architecture, the neural network comprising a pixel-wise classifier with respect to a set of classes, by disclosing converting a rasterized floorplan image into a vector-graphics representation using a learning-based approach [Abstract]. A Convolutional Neural Network (CNN) is used to convert a raster floorplan image into a first junction layer (i.e., junction maps and per-pixel room-classification scores) [Section 1 Introduction, paragraph 4; Section 3 Stratified floorplan representation, paragraph 1; Section 4 Raster to vector conversion, paragraph 1]. Heatmaps at pixel-level are predicted and three deconvolution layers are used in parallel, one for junction heatmap regression and two for per-pixel classifications [Section 4.1 Junction layer conversion via a CNN, first paragraph]. The junction layer has two types of per-pixel probability distribution maps over different semantic types, the first distinguishing if a pixel belongs to a wall or a certain room type and the second distinguishing if a pixel belongs to an opening, a certain icon type or empty [Section 3.1 Junction layer].
	Liu teaches the classes comprising at least two classes among a wall class, a door class and a window class, by disclosing that in the intermediate layers, the floorplan data are represented by three factors: walls, openings (doors or windows), or icons [Section 3 Stratified floorplan representation, paragraphs 1-2].
Liu teaches the method comprising: obtaining a dataset comprising 2D floor plans each associated to a respective semantic segmentation, by disclosing a large-scale dataset with groundtruth for vector-graphics floorplan conversion, based on the LIFULL HOME’S dataset [4] which contains 5 million floorplan raster images [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 1-5]. To create the groundtruth, 1,000 floorplan images are randomly sampled and human subjects are asked to annotate the geometric and semantic information for each floorplan image [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 5-8]. An annotator can draw a line representing either a wall or an opening, draw a rectangle and pick an object type for each object, or attach a room label at a specific location [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 8-11].
Liu teaches learning the function based on the dataset, by disclosing that for classification tasks, pixelwise softmax cross entropy loss is used where three branches are trained jointly [Section 4.1 Junction layer conversion via a CNN, paragraph 1]. Images with poor quality are dropped and 870 groundtruth floorplan images are collected, of which 770 are used for network training while the remaining 100 serve as test images [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, last five lines].
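For illustration only, the 770/100 partition of the 870 annotated images described above can be sketched as follows. The cited passage is not relied upon for how the partition was drawn; the random shuffle, the seed, the function name split_dataset, and the image identifiers are all assumptions made for this sketch.

```python
import random

def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    if not 0 <= n_train <= len(items):
        raise ValueError("n_train must be between 0 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumed: uniform random split
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 870 annotated floorplan images.
floorplan_ids = [f"floorplan_{k:04d}" for k in range(870)]
train_ids, test_ids = split_dataset(floorplan_ids, n_train=770)
print(len(train_ids), len(test_ids))  # 770 100
```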

4-2.	Regarding claim 2, Liu teaches all the limitations of claim 1, wherein the function presents a mean accuracy higher than 0.85 and/or a mean intersection-over-union higher than 0.75, by disclosing the accuracies presented in [page 2221, Table 1].
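For illustration only, the two metrics recited in claim 2 can be sketched as follows. The claim does not define them; this sketch assumes the common definitions, namely per-class accuracy TP/(TP+FN) and per-class intersection-over-union TP/(TP+FP+FN), each averaged over the classes present. The function name, the class indices, and the toy masks are hypothetical.

```python
def mean_accuracy_and_iou(pred, truth, num_classes):
    """Return (mean per-class accuracy, mean IoU) for two equal-size label masks."""
    tp = [0] * num_classes  # pixels correctly labelled as class c
    fp = [0] * num_classes  # pixels wrongly labelled as class c
    fn = [0] * num_classes  # pixels of class c labelled as another class
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                tp[t] += 1
            else:
                fp[p] += 1
                fn[t] += 1
    accs, ious = [], []
    for c in range(num_classes):
        if tp[c] + fn[c] > 0:          # class occurs in the ground truth
            accs.append(tp[c] / (tp[c] + fn[c]))
        if tp[c] + fp[c] + fn[c] > 0:  # class occurs in either mask
            ious.append(tp[c] / (tp[c] + fp[c] + fn[c]))
    return sum(accs) / len(accs), sum(ious) / len(ious)

# Hypothetical 2x3 masks; 0 = background, 1 = wall, 2 = door.
truth = [[1, 1, 0],
         [2, 0, 0]]
pred = [[1, 0, 0],
        [2, 0, 0]]
mean_acc, mean_iou = mean_accuracy_and_iou(pred, truth, num_classes=3)
print(round(mean_acc, 3), round(mean_iou, 3))  # 0.833 0.75
```

In this toy case one wall pixel is mislabelled as background, so the mean accuracy (0.833) falls below the 0.85 threshold of claim 2 while the mean intersection-over-union (0.75) sits exactly at the 0.75 threshold.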

4-3.	Regarding claim 11, Liu teaches all the limitations of claim 1, wherein providing the dataset comprises: obtaining a database of 2D floor plans each associated to a respective 3D model; and determining for each 2D floor plan the respective semantic segmentation from the respective 3D model, by disclosing processing floorplan images from the Rent3D database which contains 3D reconstructions of floor plans [page 2221, left column, paragraph 2].

4-4.	Regarding claim 19, Liu teaches all the limitations of claim 14, wherein the device further comprises the processor coupled to the data storage medium, by disclosing a computer that converts a rasterized floorplan image into a vector-graphics representation using a learning-based approach [Abstract].

Claims 12-13 (Method)
Claims 18, 20 (Device)
4-5.	Regarding claims 12 and 18, Liu teaches the claim comprising: determining a semantic segmentation of a 2D floor plan representing a layout of a building, by: obtaining the 2D floor plan, and applying a function to the 2D floor plan, the function being learnable according to a computer-implemented process for determining a function configured to determine a semantic segmentation of a 2D floor plan representing a layout of a building, the function having a neural network presenting a convolutional encoder-decoder architecture, the neural network comprising a pixel-wise classifier with respect to a set of classes, by disclosing converting a rasterized floorplan image into a vector-graphics representation using a learning-based approach [Abstract]. A Convolutional Neural Network (CNN) is used to convert a raster floorplan image into a first junction layer (i.e., junction maps and per-pixel room-classification scores) [Section 1 Introduction, paragraph 4; Section 3 Stratified floorplan representation, paragraph 1; Section 4 Raster to vector conversion, paragraph 1]. Heatmaps at pixel-level are predicted and three deconvolution layers are used in parallel, one for junction heatmap regression and two for per-pixel classifications [Section 4.1 Junction layer conversion via a CNN, first paragraph]. The junction layer has two types of per-pixel probability distribution maps over different semantic types, the first distinguishing if a pixel belongs to a wall or a certain room type and the second distinguishing if a pixel belongs to an opening, a certain icon type or empty [Section 3.1 Junction layer].
	Liu teaches the classes comprising at least two classes among a wall class, a door class and a window class, by disclosing that in the intermediate layers, the floorplan data are represented by three factors: walls, openings (doors or windows), or icons [Section 3 Stratified floorplan representation, paragraphs 1-2].
Liu teaches the process including: obtaining a dataset comprising 2D floor plans each associated to a respective semantic segmentation, by disclosing a large-scale dataset with groundtruth for vector-graphics floorplan conversion, based on the LIFULL HOME’S dataset [4] which contains 5 million floorplan raster images [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 1-5]. To create the groundtruth, 1,000 floorplan images are randomly sampled and human subjects are asked to annotate the geometric and semantic information for each floorplan image [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 5-8]. An annotator can draw a line representing either a wall or an opening, draw a rectangle and pick an object type for each object, or attach a room label at a specific location [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, lines 8-11].
Liu teaches learning the function based on the dataset, by disclosing that for classification tasks, pixelwise softmax cross entropy loss is used where three branches are trained jointly [Section 4.1 Junction layer conversion via a CNN, paragraph 1]. Images with poor quality are dropped and 870 groundtruth floorplan images are collected, of which 770 are used for network training while the remaining 100 serve as test images [Section 5.1 Annotating a Floorplan Dataset, paragraph 1, last five lines].

4-6.	Regarding claim 13, Liu teaches all the limitations of claim 12, further comprising: obtaining a 2D floor plan representing a layout of the building, and generating a 3D model representing the building based on the semantic segmentation, by disclosing converting a rasterized floorplan image into a vector-graphics representation that allows 3D model popup for better indoor scene visualization [Abstract; figures 2, 5].

4-7.	Regarding claim 20, Liu teaches all the limitations of claim 18, wherein the device further comprises the processor coupled to the data storage medium, by disclosing a computer that converts a rasterized floorplan image into a vector-graphics representation using a learning-based approach [Abstract].

Claim Rejections - 35 USC § 103
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 3-10 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (“Raster-to-Vector: Revisiting Floorplan Transformation,” October 22, 2017) in view of Merhav et al. (Pub. No. US 2017/0300811).

6-1.	Regarding claims 3 and 15, Liu teaches all the limitations of claims 1 and 14 respectively. Liu does not expressly teach wherein the neural network comprises weights, and the learning comprises, with an optimization algorithm, updating the weights according to the dataset and to a loss function. Merhav discloses training a Deep Convolutional Neural Network (DCNN) to identify relevant features of input images [paragraph 24]. A classification layer contains a filter that applies a classification function having weights that may be refined in the same manner as the weights in the functions of the filters of the normal convolutional layers [paragraph 55]. Back propagation involves calculating a gradient of a loss function in a loss layer, with respect to a number of weights in the DCNN [paragraph 56, lines 1-3]. The gradient is then fed to a method that updates the weights for the next iteration of the training of the DCNN in an attempt to minimize the loss function [paragraph 56, lines 3-8]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to provide an optimization algorithm that updates weights according to the dataset and to a loss function, as taught by Merhav. This would help minimize errors.

6-2.	Regarding claim 4, Liu-Merhav teach all the limitations of claim 3, wherein the optimization algorithm is a stochastic gradient descent, by disclosing use of a stochastic gradient descent function [Merhav, paragraphs 64-67].
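For illustration only, a stochastic gradient descent update of classifier weights against a cross-entropy loss can be sketched as follows. This is a minimal sketch of the general technique, not the training procedure of either reference: the linear softmax classifier, the learning rate, the step count, and the toy samples are all assumptions.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, features):
    """Class probabilities of a linear softmax classifier."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])

def sgd_step(weights, features, label, lr):
    """Update weights in place using the gradient from a single sample."""
    probs = predict(weights, features)
    for c, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. score c is prob_c - 1[c == label].
        grad_score = probs[c] - (1.0 if c == label else 0.0)
        for j, x in enumerate(features):
            row[j] -= lr * grad_score * x

def mean_loss(weights, samples):
    """Average cross-entropy loss over all samples."""
    return sum(-math.log(predict(weights, f)[y]) for f, y in samples) / len(samples)

# Hypothetical toy data: two features per sample, two classes.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([0.9, 0.1], 0), ([0.2, 0.8], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)

loss_before = mean_loss(weights, samples)
for _ in range(200):
    features, label = rng.choice(samples)  # "stochastic": one random sample per step
    sgd_step(weights, features, label, lr=0.5)
loss_after = mean_loss(weights, samples)
print(loss_after < loss_before)  # True
```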

6-3.	Regarding claim 5, Liu-Merhav teach all the limitations of claim 4, wherein the loss function is a cross-entropy loss function, by disclosing that for classification tasks, pixelwise softmax cross entropy loss is used [Liu, Section 4.1 Junction layer conversion via a CNN, paragraph 1].

6-4.	Regarding claims 6 and 16, Liu-Merhav teach all the limitations of claims 3 and 15 respectively, wherein the pixel-wise classifier outputs, for each input 2D floor plan, respective data for inference of a semantic segmentation mask of the input 2D floor plan, the semantic segmentation mask being a pixel-wise classification of the 2D floor plan with respect to the set of classes, the loss function penalizing, for each 2D floor plan of the dataset, inference of a semantic segmentation mask erroneous relative to the respective semantic segmentation associated to the 2D floor plan in the dataset, by disclosing that a Convolutional Neural Network (CNN) is used to convert a raster floorplan image into a first junction layer (i.e., junction maps and per-pixel room-classification scores) [Liu, Section 1 Introduction, paragraph 4; Section 3 Stratified floorplan representation, paragraph 1; Section 4 Raster to vector conversion, paragraph 1]. For classification tasks, pixelwise softmax cross entropy loss is used [Liu, Section 4.1 Junction layer conversion via a CNN, paragraph 1]. Forward and backward propagation of a training pattern’s input images are performed repeatedly until the error rate is below a particular threshold [Merhav, paragraphs 59-60].

6-5.	Regarding claims 7 and 17, Liu-Merhav teach all the limitations of claims 6 and 16 respectively, wherein the pixel-wise classifier outputs, for each pixel of an input 2D floor plan, respective data for inference of a class of the set of classes, the loss function penalizing, for each pixel of each 2D floor plan of the dataset, inference of a respective class different from a class provided for said pixel by the respective semantic segmentation associated to the 2D floor plan in the dataset, by disclosing that a Convolutional Neural Network (CNN) is used to convert a raster floorplan image into a first junction layer (i.e., junction maps and per-pixel room-classification scores) [Liu, Section 1 Introduction, paragraph 4; Section 3 Stratified floorplan representation, paragraph 1; Section 4 Raster to vector conversion, paragraph 1]. For classification tasks, pixelwise softmax cross entropy loss is used [Liu, Section 4.1 Junction layer conversion via a CNN, paragraph 1]. Forward and backward propagation of a training pattern’s input images are performed repeatedly until the error rate is below a particular threshold [Merhav, paragraphs 59-60].

6-6.	Regarding claim 8, Liu-Merhav teach all the limitations of claim 7, wherein the respective data outputted by the pixel-wise classifier comprises a distribution of probabilities over the set of classes, by disclosing that the junction layer has two types of per-pixel probability distribution maps over different semantic types [Liu, Section 3.1 Junction layer].
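For illustration only, a per-pixel distribution of probabilities over a set of classes, and the segmentation mask inferred from it, can be sketched as follows. The class list, the raw scores, and the function names are hypothetical; the sketch assumes a softmax followed by an argmax and is not taken from either reference.

```python
import math

CLASSES = ["wall", "door", "window"]  # hypothetical class order

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def segment(score_map):
    """Turn per-pixel class scores into (probability map, label mask)."""
    prob_map = [[softmax(pixel) for pixel in row] for row in score_map]
    # Each pixel is assigned the index of its most probable class.
    mask = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in prob_map]
    return prob_map, mask

# Hypothetical 1x2 image with three raw class scores per pixel.
scores = [[[2.0, 0.1, 0.1], [0.0, 3.0, 0.5]]]
prob_map, mask = segment(scores)
print([[CLASSES[k] for k in row] for row in mask])  # [['wall', 'door']]
```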

6-7.	Regarding claim 9, Liu-Merhav teach all the limitations of claim 8, wherein the loss function comprises a sum of loss terms each relative to a respective pixel, each loss term being of the type:
$$-\sum_{i=1}^{C} y_{true}^{i} \log\left(y_{pred}^{i}\right)$$
where:
- C is the number of classes of the set of classes;
- i designates a class of the set of classes;
- $y_{true}^{i}$ is a binary indicator if class i is the class provided for the respective pixel by the respective semantic segmentation associated to the 2D floor plan in the dataset; and
- $y_{pred}^{i}$ is a probability outputted by the pixel-wise classifier for class i
by disclosing that, for classification tasks, pixelwise softmax cross entropy loss is used [Liu, Section 4.1 Junction layer conversion via a CNN, paragraph 1]. The equation above corresponds to the mathematical definition of the cross-entropy loss function.
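For illustration only, the per-pixel loss term recited in claim 9 and its sum over the pixels of a floor plan can be sketched as follows. The function names and the toy one-hot and probability values are hypothetical; the arithmetic follows the cross-entropy definition directly.

```python
import math

def pixel_loss(y_true, y_pred):
    """Loss term for one pixel: -sum over classes i of y_true[i] * log(y_pred[i])."""
    # Classes with y_true[i] == 0 contribute nothing, so they are skipped;
    # this also avoids log(0) when such a class has zero predicted probability.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def total_loss(true_map, pred_map):
    """Sum of the per-pixel loss terms over every pixel of one floor plan."""
    return sum(
        pixel_loss(t, p)
        for t_row, p_row in zip(true_map, pred_map)
        for t, p in zip(t_row, p_row)
    )

# Hypothetical 1x2 image with C = 3 classes.
true_map = [[[1, 0, 0], [0, 1, 0]]]                # one-hot ground truth
pred_map = [[[0.7, 0.2, 0.1], [0.25, 0.5, 0.25]]]  # classifier probabilities
loss = total_loss(true_map, pred_map)
print(round(loss, 4))  # 1.0498
```

Because each ground-truth indicator is one-hot, each loss term reduces to the negative logarithm of the probability assigned to the correct class, here -log(0.7) - log(0.5).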

6-8.	Regarding claim 10, Liu-Merhav teach all the limitations of claim 3, wherein the loss function is multinomial and/or the pixel-wise classifier is a softmax classifier, by disclosing that the classification scheme uses more than two classes [Liu, Section 3 Stratified floorplan representation, paragraphs 1-2] and thus, training the neural network is multinomial. Further, for classification tasks, pixelwise softmax cross entropy loss is used [Liu, Section 4.1 Junction layer conversion via a CNN, paragraph 1].

Conclusion
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALVIN H TAN whose telephone number is (571)272-8595. The examiner can normally be reached M-F 10AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALVIN H TAN/
Primary Examiner, Art Unit 2178