Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The information disclosure statement (IDS) was submitted on 04/23/2021. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claims 1, 2, 4-9, 11-16 and 18-20 are pending.
Response to Arguments
Applicant's arguments with respect to claims 1, 8 and 15 have been considered; however, the reference previously used in rejecting dependent claim 7 is applicable to the amended independent claims.
Applicant presents the following arguments in the June 09, 2021 amendment:

A. There is no disclosure of pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters (Page 11, lines 1-8).
Examiner presents the following responses to Applicant's arguments:
With respect to Applicant's argument A, Examiner respectfully disagrees. Regarding Applicant's remark concerning “pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters,” Brewster discloses that the data sets are optionally preprocessed. For example, white spaces and/or special characters such as punctuation can be removed, certain special characters such as punctuation can be converted to other characters such as spaces, uppercase characters can be converted to lowercase characters (or vice versa), a spell check can be performed, etc. a first data set comprising a first 
Applicant presents the following arguments in the June 09, 2021 amendment:
B. Gao does not discuss providing a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (Page 11, lines 1-22).
Examiner presents the following responses to Applicant's arguments:
With respect to Applicant's argument B, Examiner respectfully disagrees. Regarding Applicant's remark concerning “providing a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers,” Gao discloses that auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each neuron in a layer connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. Therefore, Applicant's claimed concept is similar to what Gao discloses. Further, Gao discloses a multi-task deep neural network (DNN) for representation learning for semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches). The input is passed along to one or more hidden layers. The connections between input and output lead to the final parameter vector (w*) for which the loss function (f) takes a minimum value. According to the standard condition, if the neural network is at a minimum of the loss function, the gradient is the zero vector. This discloses the current claim language. Further clarification through amendments to the claim language may aid in differentiating from the current prior art citations.
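For illustration only, the standard condition noted above (the gradient is the zero vector at a minimum of the loss function) can be sketched as follows. The toy quadratic loss, the learning rate, and the names `loss`, `grad`, and `gradient_descent` are assumptions made for this sketch and are not taken from Gao or from the application.

```python
def loss(w):
    """Toy quadratic loss f(w); by construction its minimum is at w* = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def grad(w):
    """Gradient of the toy loss with respect to the parameter vector w."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


def gradient_descent(w, lr=0.1, steps=200):
    """Repeatedly step against the gradient (hypothetical settings)."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


w_star = gradient_descent([0.0, 0.0])
# At the approximate minimum w_star, every component of grad(w_star) is
# close to zero, which is the standard condition referred to above.
```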
Applicant presents the following arguments in the June 09, 2021 amendment:
C. Gao does not discuss providing a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (Page 11, lines 1-22).
Examiner presents the following responses to Applicant's arguments:
With respect to Applicant's argument C, Examiner respectfully disagrees. Regarding Applicant's remark concerning “providing a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers,” Gao discloses that auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each neuron in a layer connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. A one-hot vector representation is extended by allowing for representation of out-of-vocabulary words by n-gram vectors. Neural networks aim to find the parameter vector (w*) for which the loss function (f) takes a minimum value. According to the standard condition, if the neural network is at a minimum of the loss function, the gradient is the zero vector. A weight is a parameter within a neural network that transforms input data within the network's hidden layers. The input layer is multiplied by weight matrices, which gives the output of the hidden layer. In the network, each element of the input vector is connected to each neuron input through the weight matrix W. Each neuron has a summer that gathers its weighted inputs and bias to form its own scalar output n(i). The various n(i) taken together form an S-element net input vector. Finally, the neuron layer outputs form a column vector.
A network can have several layers. Each layer has a weight matrix W, a bias vector, and an output vector. Network inputs might have associated processing functions. Processing functions transform user input data to a form that is easier or more efficient for a network. This discloses the current claim language. Further clarification through amendments to the claim language may aid in differentiating from the current prior art citations.
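As a minimal sketch of the layer computation described above, in which each neuron sums its weighted inputs and bias to form n(i) and the n(i) together form an S-element vector, the following may be helpful. The function name `dense_layer` and the numeric values of W, b, and x are illustrative assumptions and do not come from any cited reference.

```python
def dense_layer(W, b, x):
    """Compute n(i) = sum_j W[i][j] * x[j] + b[i] for every neuron i.

    W is an S-by-R weight matrix (one row per neuron), b is an S-element
    bias vector, and x is an R-element input vector.
    """
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]


# Hypothetical layer with S = 2 neurons and R = 3 inputs.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, 1.0]
x = [2.0, 4.0, 6.0]

n = dense_layer(W, b, x)  # S-element net input vector: [-4.0, 7.0]
```

Because every input element is connected to every neuron through W, this is what is commonly called a fully connected layer; stacking several such calls gives a network with several layers.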
Applicant presents the following arguments in the June 09, 2021 amendment:
D. Gao does not discuss outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector, (Page 11, lines 1-22).
Examiner presents the following responses to Applicant's arguments:
With respect to Applicant's argument D, Examiner respectfully disagrees. Regarding Applicant's remark concerning “outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector,” Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network converts a list of data into a list of numerical vectors. A multidimensional vector is, in other words, a vector with two or more numbers in it. These vectors can be plotted in multidimensional space. The first step is to gauge similarity using the data vectors. The most common way to calculate it is the cosine similarity function, which returns a score as a similarity measure. The X axis would represent the angle between two vectors, and the Y axis would represent the similarity score that is returned. The loss information is obtained by averaging the data vectors together to create a level representation. Therefore, Applicant's claimed concept is similar to what Gao discloses. A neuron is the basic unit of a neural network. It receives input from an external source or from other nodes.
Each node is connected with another node from the next layer, and each such connection has a particular weight. Weights are assigned to a neuron based on its relative importance against other inputs. The layer or layers between the input and output layers are known as hidden layers, so called because they are always hidden from the external world. The main computation of a neural network takes place in the hidden layers. Neural networks aim to find the parameter vector (w*) for which the loss function (f) takes a minimum value. According to the standard condition, if the neural network is at a minimum of the loss function, the gradient is the zero vector. A weight is a parameter within a neural network that transforms input data within the network's hidden layers. The input layer is multiplied by weight matrices, which gives the output of the hidden layer. In the network, each element of the input vector is connected to each neuron input through the weight matrix W. Each neuron has a summer that gathers its weighted inputs and bias to form its own scalar output n(i). The various n(i) taken together form an S-element net input vector. Finally, the neuron layer outputs form a column vector. A network can have several layers. Each layer has a weight matrix W, a bias vector, and an output vector. The loss function is one of the important components of a neural network. Loss is the prediction error of the neural network, and the method used to calculate the loss is called the loss function. The loss function can give considerable practical flexibility to a neural network, and it defines how exactly the output of the network is connected with the rest of the network. This discloses the current claim language. Further clarification through amendments to the claim language may aid in differentiating from the current prior art citations.
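The cosine similarity measure mentioned above can be sketched as follows, purely as an illustration. The function names `cosine_similarity` and `cosine_distance` and the example vectors are assumptions for this sketch; treating one minus the similarity as a difference between two vectors is one common convention and is not asserted to be the loss function of Gao or of the claims.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """A simple difference measure: 0 when u and v point the same way."""
    return 1.0 - cosine_similarity(u, v)


first_vector = [1.0, 2.0, 3.0]     # hypothetical first multi-dimensional vector
second_vector = [2.0, 4.0, 6.0]    # hypothetical second multi-dimensional vector
difference = cosine_distance(first_vector, second_vector)  # close to 0.0
```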
Applicant presents the following arguments in the June 09, 2021 amendment:
E. Gao does not discuss, for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data for the first data set, and the second data set, respectively (Page 11, lines 1-22).
Examiner presents the following responses to Applicant's arguments:
With respect to Applicant's argument E, Applicant's Arguments/Remarks filed on June 09, 2021 with respect to independent claims 1, 8 and 15 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection. Regarding Applicant's remark concerning “for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data for the first data set, and the second data set, respectively,” this limitation is now rejected over newly cited art 'US 10,970,629 B1 Dirac' as explained in the body of the rejection below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 8, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Brewster et al. (US 2017/0262491 A1, hereinafter Brewster) in view of Dirac et al. (US 10,970,629 B1, hereinafter Dirac) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao) and in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson).  
Regarding independent claim(s) 1, Brewster discloses a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Brewster discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns.  Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. The data sets may be implemented to include tables storing the data in a data format of rows and columns, as well as in other data formats (e.g., list format, compressed data stream format, etc.) that can be interpreted/translated into a logical organization of tables. A data set can be stored in a memory. The term structured data refers to data that resides in a fixed field within a file or record. Structured data is typically stored in a relational database (RDBMS). For example, through the network interface 116, the processor 102 can receive information (e.g., data objects or program instructions) from another network. A part of the data preparation engine that transmits data to and receives data from a client application, or as a combination, (see Brewster: Para. 0018-0035). This reads on the claim concept of a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns); 
pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters (Brewster discloses that the data sets are optionally preprocessed. For example, white spaces and/or special characters such as punctuation can be removed, certain special characters such as punctuation can be converted to other characters such as spaces, uppercase characters can be converted to lowercase characters (or vice versa), a spell check can be performed, etc. Brewster discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns. Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. Normalization is the organization of data to appear similar across all records and fields. Examples of symbol types include punctuation, letter, and number classes, etc. For instance, the cell value of "123-abc-def" includes four symbol type transitions. ASCII is a character encoding that uses numeric codes to represent characters (it is a code for representing English characters as numbers, with each letter assigned a number from 0 to 127). In the following example, # represents a numeral and * represents a character, and #? represents a string of numerals and *? represents a string of characters. Referring to the example shown in FIGS. 1 and 6, for cluster 608, the application of the TOPE! function on the contents of the columns entitled "Order Number" and "Product ID" of Table 1 results in patterns of *####### (a letter followed by seven numerals) and ###### (six numerals), respectively. The values of the columns in the tables are considered during the identification process. Feature extraction is performed on the contents of cells in each column.
Some examples of features being extracted include: the number of spaces in the cells of the column, the number of punctuation marks in the cells of the column, the average length of values in the cells of the column, the variance of cell values in the column, the total number of words in cells of the column, the average number of words in cells of the column, and the number of symbol type transitions in cells of the column. The features are extracted for each column, normalized, and clustered in an N-dimensional space (see Brewster: Para. 0018-0045, 0055-0056 and FIG. 4). This reads on the claim concept of pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters);
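For illustration only, the kinds of column pre-processing discussed above can be sketched as follows. The sketch assumes hypothetical helper names (`normalize`, `pad_column`, `count_symbol_type_transitions`), and the padding step is an assumption about one way values in a column could be brought to a same length in number of characters; it is not asserted to be the specific technique of Brewster or of the application.

```python
import string


def normalize(value):
    """Lowercase a cell value and drop white space and punctuation."""
    drop = set(string.whitespace) | set(string.punctuation)
    return "".join(ch for ch in value.lower() if ch not in drop)


def pad_column(values, pad_char=" "):
    """Right-pad every value to the length of the longest value in the column
    (hypothetical step, assumed for this sketch)."""
    width = max((len(v) for v in values), default=0)
    return [v.ljust(width, pad_char) for v in values]


def symbol_type(ch):
    """Classify one character as a number, letter, punctuation, or other."""
    if ch.isdigit():
        return "number"
    if ch.isalpha():
        return "letter"
    if ch in string.punctuation:
        return "punctuation"
    return "other"


def count_symbol_type_transitions(value):
    """Count positions where the symbol type changes between neighbors."""
    types = [symbol_type(ch) for ch in value]
    return sum(1 for prev, cur in zip(types, types[1:]) if prev != cur)


padded = pad_column(["A1234567", "B12"])
transitions = count_symbol_type_transitions("123-abc-def")  # 4
```

The value "123-abc-def" changes from numbers to punctuation, to letters, to punctuation, and back to letters, which gives the four symbol type transitions mentioned above.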
However, Brewster does not appear to specifically disclose for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively.
In the same field of endeavor, Dirac discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Dirac discloses that models representing data relationships and patterns, such as functions, algorithms, systems, and the like, may accept input (sometimes referred to as an input vector), and produce output (sometimes referred to as an output vector) that corresponds to the input in some way. Sets of encoded training data input vectors (e.g., "mini batches") may be arranged as encoded input matrices. Each row of an input matrix may correspond to an individual encoded training data input vector, and each column of the input matrix may correspond to an individual node of the input layer. Deep neural networks have multiple layers of nodes. An encoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”). A high-dimensional input vector may be encoded to generate an encoded, reduced-dimensional input vector using a probabilistic structure with a plurality of mapping functions. A machine learning model 202 is trained from multi-label training data input vectors 210 and reference data output vectors. The individual rows in the weight matrix W1 may correspond to the individual nodes in the input layer 104, and the individual columns in the weight matrix W1 may correspond to the individual nodes in the internal layer. For a DNN, the input and the output may be encoded and decoded, respectively (see Dirac: Col. 4 lines 1-67, Col. 5 lines 1-67, Col. 6 lines 1-67, Col. 9 lines 1-67, Col. 10 lines 1-67, Col. 12 lines 1-67, Col. 15 lines 1-67 and FIGS. 1-3).
This reads on the claim concept of, for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively);
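As a rough sketch of what routing each column to an encoder chosen by column type could look like, the following uses two simple, non-neural stand-in encoders. The column types "numeric" and "categorical", the names `encode_numeric`, `encode_categorical`, `ENCODERS`, and `encode_table`, and the sample data are all hypothetical; neither Dirac nor the application is asserted to use these particular encoders.

```python
def encode_numeric(values):
    """Min-max scale numeric strings to [0, 1], one feature per value."""
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    span = hi - lo
    return [[(n - lo) / span if span else 0.0] for n in nums]


def encode_categorical(values):
    """One-hot encode values over the sorted set of distinct categories."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [
        [1.0 if index[v] == i else 0.0 for i in range(len(categories))]
        for v in values
    ]


# Hypothetical registry mapping a column type to its encoder.
ENCODERS = {"numeric": encode_numeric, "categorical": encode_categorical}


def encode_table(columns, column_types):
    """Encode every column with the encoder registered for its type."""
    return {
        name: ENCODERS[column_types[name]](values)
        for name, values in columns.items()
    }


first_data_set = {"amount": ["10", "20", "30"], "currency": ["USD", "EUR", "USD"]}
types = {"amount": "numeric", "currency": "categorical"}
encoded = encode_table(first_data_set, types)
```

The same `encode_table` call could be applied to a second data set with its own column-type mapping, so that each data set yields its own encoded data.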
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set, including the optionally preprocessed data sets, of Brewster in order to incorporate deep neural networks applied by column type, as disclosed by Dirac, since both of these are directed to neural networks. Neural networks can usually be read from left to right. The first layer is the layer in which inputs are entered. There are internal layers (called hidden layers) that do some math, and one last layer that contains all the possible outputs. For example, a neuron adds up the values of all neurons from the previous column to which it is connected; if there are inputs (x1, x2, x3) coming to the neuron, then 3 neurons of the previous column are connected to that neuron. Each value is multiplied, before being added, by another variable called “weight” (w1, w2, w3), which determines the connection between the two neurons. Each connection of neurons has its own weight, and those are the only values that will be modified during the learning process. The forward pass involves multiplying large weight matrices, representing the parameters of the model, by vectors corresponding to input feature vectors or hidden intermediate representations. The parameters of a model can be set in a process referred to as training. When an input is given to the neural network, it returns an output. The Encoder-Decoder architecture with recurrent neural networks has become an effective and standard approach for both neural machine translation and sequence-to-sequence (seq2seq) prediction in general. An Encoder-Decoder architecture was developed where an input sequence was read in its entirety and encoded to a fixed-length internal representation. The model uses the same two-model approach, here giving it the explicit name of the encoder-decoder architecture.
This vector aims to encapsulate the information for all input elements in order to help the decoder make accurate predictions. Encoder-decoder models allow for a process in which a machine learning model generates a sentence describing the data. It receives the data as the input and outputs a sequence of words. Each of the arrows in the network has an associated weight value. The value x1 going into the node i1 will be distributed according to the values of the weights. In order to efficiently execute all the necessary calculations, the weights are arranged into a weight matrix. The name wih indicates that the weights connect the input and the hidden nodes, i.e., they are between the input and the hidden layer. The matrix multiplication between the matrix wih and the matrix of the values of the input nodes x1, x2, x3 calculates the output which will be passed to the activation function. Incorporating the teachings of Dirac into Brewster would produce an input to a machine learning model that may be encoded using a probabilistic data structure with a plurality of mapping functions into a lower dimensional space. Encoding the input to the machine learning model results in a compact machine learning model with a reduced model size. The compact machine learning model can output an encoded representation of a higher-dimensional space, as disclosed by Dirac (see Abstract).
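One way to picture encoding an input into a lower dimensional space with a probabilistic data structure and a plurality of mapping functions is a Bloom-filter-style encoding, sketched below. This is an assumption made for illustration: the name `bloom_encode`, the use of seeded SHA-256 digests as the mapping functions, and the sizes chosen are hypothetical and are not asserted to be the structure disclosed by Dirac.

```python
import hashlib


def bloom_encode(active_indices, m, k):
    """Encode a sparse binary vector into an m-dimensional binary vector.

    active_indices lists the positions that are 1 in the original
    high-dimensional vector; each is mapped to up to k positions in the
    reduced vector by k seeded hash functions (the mapping functions).
    """
    encoded = [0] * m
    for idx in active_indices:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{idx}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % m
            encoded[position] = 1
    return encoded


# Hypothetical input: two active labels out of a very large label space,
# encoded into only 32 dimensions using 3 mapping functions.
reduced = bloom_encode([3, 1_000_003], m=32, k=3)
```

Because different indices can collide in the reduced vector, the encoding is probabilistic: decoding can produce false positives, which is the usual trade-off for the smaller model size.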
However, Brewster and Dirac do not appear to specifically disclose providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers; providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers; and outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector.
In the same field of endeavor, Gao discloses providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each neuron in a layer connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. This reads on the claim concept of providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers).
providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each neuron in a layer connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. A one-hot vector representation is extended by allowing for representation of out-of-vocabulary words by n-gram vectors. The data is read, cleaned, and split for the trained model, which returns a single vector that is prepared to be passed directly into a machine learning model. Many randomly assigned training sets are created. Each individual training set is used to train a recommender system independently, and the accuracy of the resulting system is then measured against the test set (see Gao: Para. 0072-0077). This reads on the claim concept of providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers).
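The idea of representing out-of-vocabulary words by n-gram vectors, mentioned above, can be sketched as follows. The use of letter trigrams, the "#" word-boundary marker, and the names `letter_ngrams` and `ngram_vector` are assumptions made for this illustration and are not quoted from Gao.

```python
def letter_ngrams(word, n=3):
    """Split a word into overlapping letter n-grams with boundary markers."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def ngram_vector(word, vocabulary, n=3):
    """Count how often each n-gram of a fixed n-gram vocabulary occurs.

    A word never seen as a whole can still get a non-zero vector as long
    as it shares letter n-grams with the vocabulary.
    """
    index = {gram: i for i, gram in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for gram in letter_ngrams(word, n):
        if gram in index:
            counts[index[gram]] += 1
    return counts


# Hypothetical n-gram vocabulary built from the single known word "good".
vocabulary = letter_ngrams("good")          # ['#go', 'goo', 'ood', 'od#']
unseen = ngram_vector("goods", vocabulary)  # shares '#go', 'goo', 'ood'
```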
outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector (Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network converts a list of data into a list of numerical vectors. A multidimensional vector is, in other words, a vector with two or more numbers in it. These vectors can be plotted in multidimensional space. The first step is to gauge similarity using the data vectors. The most common way to calculate it is the cosine similarity function, which returns a score as a similarity measure. The X axis would represent the angle between two vectors, and the Y axis would represent the similarity score that is returned. The loss information is obtained by averaging the data vectors together to create a level representation. This reads on the claim concept of outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector) and
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set, including the identified plurality of columns and the optionally preprocessed data sets, of Brewster and Dirac in order to incorporate deep neural networks that output multi-dimensional vectors to a loss-function, as disclosed by Gao, since both of these are directed to data sets. A data set is a set or collection of data. This set is normally presented in a tabular pattern. Every column describes a particular variable, and each row corresponds to a given member of the data set. The data are essentially organized according to a certain model that helps to process the needed information. A set of data is any permanently saved collection of information that usually contains either case-level data, gathered data, or statistical guidance level data. Data preprocessing provides operations which can organize the data into a proper form for better understanding in the data mining process. They are data cleaning/cleansing, data integration, data transformation, and data reduction. The neural network needs to learn continually to solve tasks in a more qualified manner or even to use various methods to provide a better result. When it gets new information in the system, it learns how to act according to the new situation. Learning becomes deeper as the tasks being solved get harder. A deep neural network represents the type of machine learning in which the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more abstract component. Each hidden layer is composed of neurons. The neurons are connected to each other. A neuron processes the input signal it receives and then propagates it to the layer above it.
The strength of the signal given to the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and processes them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. Such networks can learn automatically, without predefined knowledge explicitly coded by programmers. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex features than one with two layers. The first phase consists of applying a nonlinear transformation to the input and creating a statistical model as output. The second phase aims at improving the model with a mathematical method known as the derivative. Auto-encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. This process sometimes involves multiple auto-encoders, such as stacked sparse auto-encoder layers used in image processing. Incorporating the teachings of Gao into Brewster and Dirac would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao (see Abstract).
However, Brewster, Dirac and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set. 
In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes, (see Carlsson: Para. 0106, 0303, 00420-0462 and 0473-0481). This reads on the claim concept of the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
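For illustration only, the comparison of prediction model output against known outcomes described above can be sketched as follows. This is a minimal sketch and is not taken from Carlsson; the function name `exact_match_rate` and the sample values are hypothetical.

```python
def exact_match_rate(predicted, known):
    """Return the fraction of predicted outcomes identical to the known outcomes."""
    if len(predicted) != len(known):
        raise ValueError("predicted and known must have the same length")
    if not known:
        return 0.0
    # An "exact match" here means simple equality of the two values.
    matches = sum(1 for p, k in zip(predicted, known) if p == k)
    return matches / len(known)


# Hypothetical test data set with known outcomes.
predicted_outcomes = ["A", "B", "B", "C"]
known_outcomes = ["A", "B", "C", "C"]
rate = exact_match_rate(predicted_outcomes, known_outcomes)
```

Equality is only one possible test; a "similar to known outcomes" check would need a tolerance or distance measure, which is not shown here.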
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set of Brewster, Dirac and Gao, which are handled according to at least the identified plurality of columns and are optionally preprocessed for deep neural networks providing multi-dimensional vectors to a loss function, in order to incorporate deep neural networks providing multi-dimensional vectors to a loss function, as disclosed by Carlsson, since both of these are directed to datasets. A dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to the future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called 'weights' and added together. This value is then passed to a non-linear function, referred to as an 'activation function', whose result becomes the output. The features, e.g., size, brand, and location, are passed as inputs. The output is the target variable, i.e., the thing to be predicted, e.g., the price of an item. Machine learning algorithms create a model after training; the model is a mathematical function that can then be used to take a new observation and calculate an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions, which help the neural network to generalize all this information into a consistent input-output relationship. Incorporating the teachings of Carlsson into Brewster, Dirac and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson (see Abstract).   
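For illustration only, the general idea of creating a model from training samples and applying it to a new observation can be sketched as follows. This sketch deliberately substitutes a one-feature least-squares line for a neural network, to keep the example short; it is not the method of Carlsson or of any cited reference, and the names `fit_line` and `predict` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to training samples by least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the trained model (a mathematical function) to a new observation."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training samples: a feature (size) paired with its solution (price).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [12.0, 14.0, 16.0, 18.0]
model = fit_line(sizes, prices)
new_price = predict(model, 5.0)
```

The trained model is simply the pair of numbers returned by `fit_line`; a neural network would hold many more parameters but is used in the same train-then-predict manner.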
Regarding dependent claim(s) 7, the combination of Brewster, Dirac, Gao and Carlsson discloses the method as in claim 1. Brewster further discloses determining a column type for each column of the plurality of columns (The first data set and the second data set are appended according to at least the identified plurality of matching columns and the user specification to generate a resulting data set. Features extracted for each column include the number of spaces in the cells of the column, the number of punctuations in the cells of the column, the average length of values in the cells of the column, the variance of cell values in the column, the total number of words in cells of the column, the average number of words in cells of the column, and the number of symbol type transitions in cells of the column. This reads on the claim concept of determining a column type for each column of the plurality of columns, see Brewster: Para. 0020, 0021, 0025, 0040, 0048, 0050, 0051 and FIG. 9).  
Regarding independent claim(s) 8, Brewster discloses a non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising (Brewster discloses processor 102 can be implemented by a single-chip processor or by multiple processors. Using instructions retrieved from memory 110, the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 118). An instruction set is code that the computer processor (CPU) can understand. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network (see Brewster: Para. 0018-0027). This reads on the claim concept of a non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising): 
receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Brewster discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns.  Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. The data sets may be implemented to include tables storing the data in a data format of rows and columns, as well as in other data formats (e.g., list format, compressed data stream format, etc.) that can be interpreted/translated into a logical organization of tables. A data set can be stored in a memory. The term structured data refers to data that resides in a fixed field within a file or record. Structured data is typically stored in a relational database (RDBMS). For example, through the network interface 116, the processor 102 can receive information (e.g., data objects or program instructions) from another network. A part of the data preparation engine that transmits data to and receives data from a client application, or as a combination, (see Brewster: Para. 0018-0035). This reads on the claim concept of receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns);   
pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters (Brewster discloses the data sets are optionally preprocessed. For example, white spaces and/or special characters such as punctuations can be removed, certain special characters such as punctuations can be converted to other characters such as spaces, uppercase characters are converted to lower case characters (or vice versa), a spell check is performed, etc. Brewster discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns. Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. Normalization is the organization of data to appear similar across all records and fields. Examples of symbol types include punctuation, letter, number classes, etc. For instance, the cell value of "123-abc-def" includes four symbol type transitions. ASCII is a character encoding that uses numeric codes to represent characters (it is a code for representing English characters as numbers, with each letter assigned a number from 0 to 127). In the following example, # represents a numeral and * represents a character, and #? represents a string of numerals and *? represents a string of characters. Referring to the example shown in FIGS. 1 and 6, for cluster 608, the application of the TOPE! function on the contents of columns entitled "Order Number" and "Product ID" of Table 1 results in patterns of *####### (a letter followed by seven numerals) and ###### (six numerals), respectively. The values of the columns in the tables are considered during the identification process. Feature extraction is performed on the contents of cells in each column. 
Some examples of features being extracted include: the number of spaces in the cells of the column, the number of punctuations in the cells of the column, the average length of values in the cells of the column, the variance of cell values in the column, the total number of words in cells of the column, the average number of words in cells of the column, and the number of symbol type transitions in cells of the column. The features are extracted for each column, normalized, and clustered in an N-dimensional space, (see Brewster: Para. 0018-0045, 0055-0056 and FIG. 4). This reads on the claim concepts of pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters); 
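For illustration only, several of the per-column features and the symbol patterns described above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Brewster: the function names (`symbol_type`, `count_transitions`, `symbol_pattern`, `column_features`) are hypothetical, only a subset of the listed features is computed, and `symbol_pattern` is a simplified per-character stand-in rather than the function named in the reference.

```python
def symbol_type(ch):
    """Classify one character into a symbol type (assumed classes)."""
    if ch.isdigit():
        return "number"
    if ch.isalpha():
        return "letter"
    if ch.isspace():
        return "space"
    return "punctuation"


def count_transitions(value):
    """Count positions where the symbol type changes between adjacent characters."""
    types = [symbol_type(ch) for ch in value]
    return sum(1 for a, b in zip(types, types[1:]) if a != b)


def symbol_pattern(value):
    """Replace each numeral with '#' and each letter with '*'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("#")
        elif ch.isalpha():
            out.append("*")
        else:
            out.append(ch)
    return "".join(out)


def column_features(values):
    """Compute a few of the listed features over the cell values of one column."""
    if not values:
        raise ValueError("column has no cells")
    return {
        "spaces": sum(v.count(" ") for v in values),
        "punctuation": sum(
            1 for v in values for ch in v if symbol_type(ch) == "punctuation"
        ),
        "avg_length": sum(len(v) for v in values) / len(values),
        "transitions": sum(count_transitions(v) for v in values),
    }


transitions = count_transitions("123-abc-def")
pattern = symbol_pattern("A1234567")
features = column_features(["a b", "c-d"])
```

The normalization and clustering of these features in an N-dimensional space are not shown.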
However, Brewster does not appear to specifically disclose for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively. 
In the same field of endeavor, Dirac discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Dirac discloses   models representing data relationships and patterns, such as functions, algorithms, systems, and the like, may accept input (sometimes referred to as an input vector), and produce output (sometimes referred to as an output vector) that corresponds to the input in some way. Sets of encoded training data input vectors (e.g., "mini batches") may be arranged as encoded input matrices. Each row of an input matrix may correspond to an individual encoded training data input vector, and each column of the input matrix may correspond to an individual node of the input layer. Deep neural networks have multiple layers of nodes. Encoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”). A high-dimensional input vector may be encoded to generate an encoded, reduced-dimensional input vector using a probabilistic structure with a plurality of mapping functions. A machine learning model 202 from multi-label training data input vectors 210 and reference data output vectors. The individual rows in the weight matrix W1 may correspond to the individual nodes in the input layer 104, and the individual columns in the weight matrix W1 may correspond to the individual nodes in the internal layer. For a DNN, the input and the output may be encoded and decoded, respectively, (see Dirac: Col. 4 lines 1-67, Col. 5 lines 1-67, Col. 6 lines 1-67, Col. 9 lines 1-67, Col. 10 lines 1-67, Col. 12 lines 1-67, Col. 15 lines 1-67 and FIG. 1-3). 
This reads on the claim concept of, for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively);   
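For illustration only, encoding a high-dimensional input into a reduced-dimensional input vector using a plurality of mapping functions can be sketched as follows. This sketch assumes a Bloom-filter-style scheme with seeded SHA-256 hashes as the mapping functions; that choice, the function name `encode_sparse`, and the sizes used are assumptions and are not taken from Dirac.

```python
import hashlib


def encode_sparse(active_indices, encoded_size=16, num_hashes=3):
    """Map the active positions of a large sparse input onto a short binary vector.

    Each active index is passed through `num_hashes` mapping functions, and the
    resulting positions in the encoded vector are set to 1.
    """
    encoded = [0] * encoded_size
    for index in active_indices:
        for seed in range(num_hashes):
            # A seeded hash stands in for one mapping function (assumption).
            digest = hashlib.sha256(f"{seed}:{index}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % encoded_size
            encoded[position] = 1
    return encoded


# Two active positions out of a hypothetically very large input space.
encoded = encode_sparse([3, 1000003])
```

Because different indices can map to the same position, the encoding is lossy; that is the probabilistic trade-off accepted in exchange for the smaller input size.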
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set of Brewster, which are handled according to at least the identified plurality of columns and are optionally preprocessed, in order to incorporate deep neural networks for column types, as disclosed by Dirac, since both of these are directed to neural networks. Neural networks can usually be read from left to right. The first layer is the layer in which inputs are entered. There are internal layers (called hidden layers) that perform computations, and one last layer that contains all the possible outputs. For example, a neuron adds up the values of every neuron from the previous column to which it is connected; if inputs (x1, x2, x3) come into the neuron, then three neurons of the previous column are connected to that neuron. Each value is multiplied, before being added, by another variable called a “weight” (w1, w2, w3), which determines the connection between the two neurons. Each connection of neurons has its own weight, and those are the only values that will be modified during the learning process. The forward pass involves multiplying large weight matrices, representing the parameters of the model, by vectors corresponding to input feature vectors or hidden intermediate representations. The parameters of a model can be set in a process referred to as training. When an input is given to the neural network, it returns an output. The Encoder-Decoder architecture with recurrent neural networks has become an effective and standard approach for both neural machine translation and sequence-to-sequence (seq2seq) prediction in general. An Encoder-Decoder architecture was developed where an input sequence was read in its entirety and encoded to a fixed-length internal representation. The model uses the same two-model approach, here giving it the explicit name of the encoder-decoder architecture. 
This vector aims to encapsulate the information for all input elements in order to help the decoder make accurate predictions. Encoder-decoder models allow for a process in which a machine learning model generates a sentence describing data. It receives the data as the input and outputs a sequence of words. Each of the arrows in the network has an associated weight value. The value x1 going into the node i1 will be distributed according to the values of the weights. In order to efficiently execute all the necessary calculations, the weights are arranged into a weight matrix. The name of the matrix, wih, indicates that the weights connect the input and the hidden nodes, i.e., they are between the input and the hidden layer. The matrix multiplication between the matrix wih and the matrix of the values of the input nodes x1, x2, x3 calculates the output which will be passed to the activation function. Incorporating the teachings of Dirac into Brewster would produce an input to a machine learning model that is encoded using a probabilistic data structure with a plurality of mapping functions into a lower dimensional space. Encoding the input to the machine learning model results in a compact machine learning model with a reduced model size. The compact machine learning model can output an encoded representation of a higher-dimensional space, as disclosed by Dirac (see Abstract).  
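For illustration only, the multiplication of the weight matrix wih by the input values x1, x2, x3, followed by an activation function, can be sketched as follows. This is a minimal sketch: the layout (one row of wih per hidden node), the sigmoid activation, and the numeric values are assumptions and are not taken from Dirac or Brewster.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an input vector."""
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("each row must have one weight per input")
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    """A common non-linear activation function (assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(wih, inputs):
    """Compute hidden-layer outputs: activation(wih multiplied by inputs)."""
    return [sigmoid(z) for z in mat_vec(wih, inputs)]


# Hypothetical weights: two hidden nodes, each connected to inputs x1, x2, x3.
wih = [
    [0.1, 0.2, 0.3],
    [0.0, 0.0, 0.0],
]
weighted_sums = mat_vec(wih, [1.0, 2.0, 3.0])
hidden = forward(wih, [1.0, 2.0, 3.0])
```

During training, only the entries of `wih` would be adjusted; the structure of the computation stays the same.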
However, Brewster and Dirac do not appear to specifically disclose providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers; providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers; and outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector. 
In the same field of endeavor, Gao discloses providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind, is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers have, the more patterns the network can see as it tries to classify the data. The two layers neural network that accepts a text corpus as input and it returns a set vectors. This reads on the claim concept of providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers). 
providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind, is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers have, the more patterns the network can see as it tries to classify the data. The two layers neural network that accepts a text corpus as input and it returns a set vectors. A one-hot vector representation by allowing for representation of out-of-vocabulary words by n-gram vectors. Reading, cleaning and split it the data into trained model, which is this retures a single vector that prepared to be passed directly into machine learning model. Creating many randomly assigned training sets. Each individual training set, is used to train recommender system independently and then measure the accuracy of the resulting system against the test set, (see Gao: Para. 0072-0077). This reads on the claim concept of providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers). 
outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector (Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network converts a list of data into a list of numerical vectors. A multi-dimensional vector is, for example, a vector with two numbers in it. These vectors can be plotted in multi-dimensional space. Similarity is first gauged using the data vectors. The most common way to calculate similarity is the cosine similarity function, which returns a score as a similarity measure. The X axis would represent the angles between two vectors, and the Y axis is the similarity score that would be returned. The loss information by averaging the data vectors together to create level representation. This reads on the claim concept of outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector).   
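For illustration only, the cosine similarity between two multi-dimensional vectors, and a difference derived from it, can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and defining the difference as one minus the cosine similarity is an assumption for this example, not the cross entropy loss cited from Gao and not the loss-function recited in the claims.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_difference(a, b):
    """Assumed difference measure: 0 for same direction, larger as vectors diverge."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical two-dimensional vectors.
same_direction = cosine_difference([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction give a difference near zero regardless of their lengths, which is why this measure compares orientation rather than magnitude.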
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set of Brewster and Dirac, which are handled according to at least the identified plurality of columns and are optionally preprocessed, in order to incorporate deep neural networks providing multi-dimensional vectors to a loss function, as disclosed by Gao, since both of these are directed to data sets. A data set is a set or collection of data. This set is normally presented in a tabular pattern. Every column describes a particular variable, and each row corresponds to a given member of the data set. The data are organized according to a certain model that helps to process the needed information. A set of data is any permanently saved collection of information that usually contains either case-level data, gathered data, or statistical guidance level data. Data preprocessing provides operations that organize the data into a proper form for better understanding in the data mining process. These operations are data cleaning/cleansing, data integration, data transformation, and data reduction. A neural network needs to keep learning in order to solve tasks in a more qualified manner, or to use various methods to provide a better result. When new information enters the system, the network learns how to act in the new situation. Learning becomes deeper as the tasks being solved get harder. A deep neural network is a type of machine learning in which the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more creative and abstract component. Each hidden layer is composed of neurons, and the neurons are connected to each other. A neuron processes the input signal it receives and then propagates it to the layer above it. 
The strength of the signal given to the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and processes them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. Such networks can learn automatically, without predefined knowledge explicitly coded by programmers. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex features than one with two layers. The first phase consists of applying a nonlinear transformation to the input and creating a statistical model as output. The second phase aims at improving the model with a mathematical method known as the derivative. Auto encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. This process sometimes involves multiple auto encoders, such as stacked sparse auto encoder layers used in image processing. Incorporating the teachings of Gao into Brewster and Dirac would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao (see Abstract). 
However, Brewster, Dirac and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set.  
In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes, (see Carlsson: Para. 0106, 0303, 00420-0462 and 0473-0481). This reads on the claim concept of the second multidimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set of Brewster, Dirac and Gao, which are handled according to at least the identified plurality of columns and are optionally preprocessed for deep neural networks providing multi-dimensional vectors to a loss function, in order to incorporate deep neural networks providing multi-dimensional vectors to a loss function, as disclosed by Carlsson, since both of these are directed to datasets. A dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to the future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called 'weights' and added together. This value is then passed to a non-linear function, referred to as an 'activation function', whose result becomes the output. The features, e.g., size, brand, and location, are passed as inputs. The output is the target variable, i.e., the thing to be predicted, e.g., the price of an item. Machine learning algorithms create a model after training; the model is a mathematical function that can then be used to take a new observation and calculate an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions, which help the neural network to generalize all this information into a consistent input-output relationship. Incorporating the teachings of Carlsson into Brewster, Dirac and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson (see Abstract).  
Regarding dependent claim(s) 14 (drawn to a computer-readable storage medium): claim 14 is a computer-readable storage medium claim that corresponds to the method of claim 7. Therefore, claim 14 is rejected for at least the same reasons as the method of claim 7.
Regarding independent claim(s) 15, Brewster discloses a system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations, the operations comprising receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Brewster discloses processor 102 can be implemented by a single-chip processor or by multiple processors. Using instructions retrieved from memory 110, the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices (e.g., display 118). An instruction set is code that the computer processor (CPU) can understand. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. A storage device is any type of computing hardware that is used for storing, porting or extracting data files and objects. Further Brewster discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns.  Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. The data sets may be implemented to include tables storing the data in a data format of rows and columns, as well as in other data formats (e.g., list format, compressed data stream format, etc.) that can be interpreted/translated into a logical organization of tables. A data set can be stored in a memory. The term structured data refers to data that resides in a fixed field within a file or record. Structured data is typically stored in a relational database (RDBMS). 
For example, through the network interface 116, the processor 102 can receive information (e.g., data objects or program instructions) from another network. A part of the data preparation engine that transmits data to and receives data from a client application, or as a combination, (see Brewster: Para. 0018-0035). This reads on the claim concept of a system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations, the operations comprising receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns); 
pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters (Brewster discloses that the data sets are optionally preprocessed. For example, white spaces and/or special characters such as punctuation can be removed, certain special characters such as punctuation can be converted to other characters such as spaces, uppercase characters are converted to lowercase characters (or vice versa), a spell check is performed, etc. Brewster further discloses a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns. Data sets are shown to include tables of data records (also referred to as cells or entries) organized in rows and columns. Normalization is the organization of data to appear similar across all records and fields. Examples of symbol types include punctuation, letter, number classes, etc. For instance, the cell value of "123-abc-def" includes four symbol type transitions. ASCII is a character encoding that uses numeric codes to represent characters (it is a code for representing English characters as numbers, with each letter assigned a number from 0 to 127). In the following example, # represents a numeral and * represents a character, and #? represents a string of numerals and *? represents a string of characters. Referring to the example shown in FIGS. 1 and 6, for cluster 608, the application of the TOPE! function on the contents of columns entitled "Order Number" and "Product ID" of Table 1 results in patterns of *####### (a letter followed by seven numerals) and ###### (six numerals), respectively. The values of the columns in the tables are considered during the identification process. Feature extraction is performed on the contents of cells in each column. 
Some examples of features being extracted include: the number of spaces in the cells of the column, the number of punctuations in the cells of the column, the average length of values in the cells of the column, the variance of cell values in the column, the total number of words in cells of the column, the average number of words in cells of the column, and the number of symbol type transitions in cells of the column. The features are extracted for each column, normalized, and clustered in an N-dimensional space, (see Brewster: Para. 0018-0045, 0055-0056 and FIG. 4). This reads on the claim concepts of pre-processing data values of each column of each of the first data set and the second data set, such that data values within individual columns are of a same length in terms of number of characters);        
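For illustration only, the symbol-type patterns and column features described above can be sketched as follows. This sketch is not taken from Brewster; the function names and the sample value "A1234567" are hypothetical, and the three symbol classes (numeral, letter, other) are an assumed simplification.

```python
def symbol_class(ch):
    """Classify one character: '#' for a numeral, '*' for a letter, 'p' otherwise."""
    if ch.isdigit():
        return "#"
    if ch.isalpha():
        return "*"
    return "p"


def symbol_pattern(value):
    """Replace numerals with '#' and letters with '*'; keep other characters as-is."""
    return "".join(c if symbol_class(c) == "p" else symbol_class(c) for c in value)


def count_symbol_type_transitions(value):
    """Count positions where the symbol class changes between adjacent characters."""
    classes = [symbol_class(c) for c in value]
    return sum(1 for a, b in zip(classes, classes[1:]) if a != b)


def average_length(column):
    """Average number of characters per cell value in a column."""
    return sum(len(v) for v in column) / len(column)


# "123-abc-def": numeral -> punctuation -> letter -> punctuation -> letter
print(count_symbol_type_transitions("123-abc-def"))  # 4
print(symbol_pattern("A1234567"))                    # *#######
```

Under these assumptions, a column whose values all share one pattern such as ###### also shares one length in characters, which is the property the claim limitation addresses.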
However, Brewster does not appear to specifically disclose, for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data for the first data set and the second data set, respectively. 
In the same field of endeavor, Dirac discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Dirac discloses   models representing data relationships and patterns, such as functions, algorithms, systems, and the like, may accept input (sometimes referred to as an input vector), and produce output (sometimes referred to as an output vector) that corresponds to the input in some way. Sets of encoded training data input vectors (e.g., "mini batches") may be arranged as encoded input matrices. Each row of an input matrix may correspond to an individual encoded training data input vector, and each column of the input matrix may correspond to an individual node of the input layer. Deep neural networks have multiple layers of nodes. Encoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”). A high-dimensional input vector may be encoded to generate an encoded, reduced-dimensional input vector using a probabilistic structure with a plurality of mapping functions. A machine learning model 202 from multi-label training data input vectors 210 and reference data output vectors. The individual rows in the weight matrix W1 may correspond to the individual nodes in the input layer 104, and the individual columns in the weight matrix W1 may correspond to the individual nodes in the internal layer. For a DNN, the input and the output may be encoded and decoded, respectively, (see Dirac: Col. 4 lines 1-67, Col. 5 lines 1-67, Col. 6 lines 1-67, Col. 9 lines 1-67, Col. 10 lines 1-67, Col. 12 lines 1-67, Col. 15 lines 1-67 and FIG. 1-3). 
This reads on the claim concept of, for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data for the first data set and the second data set, respectively);   
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set, according to at least the identified plurality of columns and the optionally preprocessed data sets of Brewster, in order to have incorporated deep neural networks operating on column types, as disclosed by Dirac, since both of these are directed to neural networks. Neural networks can usually be read from left to right. The first layer is the layer in which inputs are entered. There are internal layers (called hidden layers) that do some math, and one last layer that contains all the possible outputs. For example, a neuron adds up the value of every neuron from the previous column it is connected to; where there are inputs (x1, x2, x3) coming to the neuron, 3 neurons of the previous column are connected to that neuron. Each value is multiplied, before being added, by another variable called "weight" (w1, w2, w3), which determines the connection between the two neurons. Each connection of neurons has its own weight, and those are the only values that will be modified during the learning process. The forward pass involves multiplying large weight matrices, representing the parameters of the model, by vectors corresponding to input feature vectors or hidden intermediate representations. The parameters of a model can be set in a process referred to as training. When an input is given to the neural network, it returns an output. The Encoder-Decoder architecture with recurrent neural networks has become an effective and standard approach for both neural machine translation and sequence-to-sequence (seq2seq) prediction in general. An Encoder-Decoder architecture was developed where an input sequence was read in its entirety and encoded to a fixed-length internal representation. The model uses the same two-model approach, here giving it the explicit name of the encoder-decoder architecture. 
This vector aims to encapsulate the information for all input elements in order to help the decoder make accurate predictions. Encoder-decoder models allow for a process in which a machine learning model generates a sentence describing data. It receives the data as the input and outputs a sequence of words. Each of the arrows in the network has an associated weight value. The value x1 going into the node i1 will be distributed according to the values of the weights. In order to efficiently execute all the necessary calculations, the weights are arranged into a weight matrix. The name should indicate that the weights are connecting the input and the hidden nodes, i.e. they are between the input and the hidden layer. The matrix multiplication between the matrix wih and the matrix of the values of the input nodes x1, x2, x3 calculates the output which will be passed to the activation function. Incorporating the teachings of Dirac into Brewster would produce a system in which the input to a machine learning model may be encoded using a probabilistic data structure with a plurality of mapping functions into a lower dimensional space. Encoding the input to the machine learning model results in a compact machine learning model with a reduced model size. The compact machine learning model can output an encoded representation of a higher-dimensional space, as disclosed by Dirac, (see Abstract).  
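As a non-limiting illustration of the weight-matrix calculation described above, the following minimal sketch multiplies a weight matrix by the input values x1, x2, x3 and passes each sum to an activation function. The sigmoid activation, the function names, and the sample weights are assumptions made for illustration and are not taken from Dirac or Brewster.

```python
import math


def sigmoid(z):
    """Assumed non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, inputs):
    """Compute one layer: each row of `weights` holds the weights into one hidden node."""
    outputs = []
    for row in weights:
        # Weighted sum of the inputs for this hidden node.
        total = sum(w * x for w, x in zip(row, inputs))
        outputs.append(sigmoid(total))
    return outputs


# Hypothetical 2x3 weight matrix connecting three inputs to two hidden nodes.
wih = [[0.0, 0.0, 0.0],
       [0.5, -0.5, 0.0]]
print(layer_forward(wih, [1.0, 2.0, 3.0]))
```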
However, Brewster and Dirac do not appear to specifically disclose providing a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers; providing a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers; and outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector.
In the same field of endeavor, Gao discloses providing a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. This reads on the claim concept of providing a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers).     
providing a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. A one-hot vector representation by allowing for representation of out-of-vocabulary words by n-gram vectors. The data is read, cleaned and split for the trained model, which returns a single vector that is prepared to be passed directly into a machine learning model. Many randomly assigned training sets are created. Each individual training set is used to train a recommender system independently, and the accuracy of the resulting system is then measured against the test set, (see Gao: Para. 0072-0077). This reads on the claim concept of providing a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers). 
outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector (Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective, (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network will convert a list of data into a list of numerical vectors. A multi-dimensional vector is, in other words, a vector with two or more numbers in it. These vectors can be plotted in multi-dimensional space. The first step is to gauge similarity using the data vectors. The most common way to calculate this is the cosine similarity function, which returns a score that serves as a similarity measure. The X axis would represent the angle between two vectors, and the Y axis the similarity score that would be returned. The loss information is obtained by averaging the data vectors together to create a level representation. This reads on the claim concept of outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector).
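By way of illustration only, the cosine similarity measure referenced above can be sketched as follows, with the difference between two multi-dimensional vectors expressed as one minus the similarity. This is a minimal sketch under stated assumptions and is not represented as Gao's loss function; the function names and the use of "1 - similarity" as the difference are hypothetical choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_difference(u, v):
    """Assumed difference measure: 0 for parallel vectors, larger as they diverge."""
    return 1.0 - cosine_similarity(u, v)


# Two hypothetical vectors produced for a record from each data set.
first_vector = [1.0, 2.0]
second_vector = [2.0, 4.0]
print(cosine_difference(first_vector, second_vector))
```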
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set according to at least the identified plurality of columns and the optionally preprocessed data sets of Brewster and Dirac in order to have incorporated deep neural networks within a multi-dimensional vector to a loss-function, as disclosed by Gao, since both of these are directed to data sets. A data set is a set or collection of data. This set is normally presented in a tabular pattern. Every column describes a particular variable, and each row corresponds to a given member of the data set, as per the given question. The data are essentially organized according to a certain model that helps to process the needed information. The set of data is any permanently saved collection of information that usually contains either case-level, gathered data, or statistical guidance level data. Data preprocessing provides operations which can organize the data into a proper form for better understanding in the data mining process. They are Data Cleaning/Cleansing, Data Integration, Data Transformation, and Data Reduction. The neural network needs to learn all the time to solve tasks in a more qualified manner or even to use various methods to provide a better result. When it gets new information in the system, it learns how to act according to the new situation. Learning becomes deeper as the tasks being solved get harder. A deep neural network represents the type of machine learning in which the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more creative and abstract component. Each hidden layer is composed of neurons. The neurons are connected to each other. The neuron will process and then propagate the input signal it receives to the layer above it. 
The strength of the signal given the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and operates them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. They can learn automatically, without predefined knowledge explicitly coded by the programmers. A neural network works quite the same. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex feature than with that with two layers. The first phase consists of applying a nonlinear transformation of the input and create a statistical model as output. The second phase aims at improving the model with a mathematical method known as derivative. Auto encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. This process sometimes involves multiple auto encoders, such as stacked sparse auto encoder layers used in image processing. Incorporating the teachings of Gao into Brewster and Dirac would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao, (see Abstract). 
However, Brewster, Dirac and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set.
In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes, (see Carlsson: Para. 0106, 0303, 00420-0462 and 0473-0481). This reads on the claim concept of the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set according to at least the identified plurality of columns, the optionally preprocessed data sets, and the deep neural networks within a multi-dimensional vector to a loss-function of Brewster, Dirac and Gao in order to have incorporated deep neural networks within a multi-dimensional vector to a loss-function, as disclosed by Carlsson, since both of these are directed to datasets. A dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to the future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called 'weights' and added together. This value is then passed to a non-linear function, referred to as an 'activation function', which becomes the output. The features are passed as inputs, e.g. size, brand, location, etc. The output is the target variable, the thing to be predicted, e.g. the price of an item. Machine learning algorithms create a model after training; this is a mathematical function that can then be used to take a new observation and calculate an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions that will help the neural network to generalize all this information into a consistent input-output relationship. Incorporating the teachings of Carlsson into Brewster, Dirac and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson, (see Abstract).  
Claims 2, 6, 9, 13, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Brewster et al. (US 2017/0262491 A1, hereinafter Brewster) in view of Dirac et al. (US 10,970,629 B1, hereinafter Dirac) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao), in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson) and in view of Netz et al. (US 2010/0030796 A1, hereinafter Netz). 
Regarding dependent claim(s) 2, the combination of Brewster, Dirac, Gao and Carlsson discloses the method as in claim 1. However, the combination of Brewster, Dirac, Gao and Carlsson does not appear to specifically disclose wherein a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set. 
In the same field of endeavor, Netz discloses wherein a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set (A column based encoder/compressor is provided for compacting large scale data storage and for making resulting scan/search/query operations over the data substantially more efficient as well. Netz discloses type columns (name 1, age 1, address 1, sex 1, etc.). This reads on the claim concept of inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data. As first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. This reads on the claim concept of the first data set and the second data set, respectively. One hot encoding takes a column which has categorical data, which has been label encoded, and then splits the column into multiple columns. The numbers are replaced by 1s and 0s, depending on which column has what value. A one hot encoding is a representation of categorical variables as binary vectors. This first requires that the categorical values be mapped to integer values. This reads on the claim concept of a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set, (see Netz: Para. 0048, 0051, 0053, 0055, 0057, 0059, 0061, 0069 and FIG. 1, 2, 12, 20 and 24). Netz discloses that the system includes areas having homogeneous repeated values to which run length encoding has been applied, and other areas so labeled, for which the same encoder has been used, see Netz: Para. 0085).    
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set using deep neural networks within a multi-dimensional vector to a loss-function of Brewster, Dirac, Gao and Carlsson in order to have incorporated the encoder providing encoded data, as disclosed by Netz, since both of these are directed to column encoding, which involves converting each value in a column to a number. Consider a dataset of bridges having column names and column types. There will be many more columns in the dataset, to understand label encoding. A Label Encoder converts each class under a specified feature to a numerical value. The Label Encoder is applied to each of the categorical columns. Each row represents a sample and each column represents a feature. One hot encoding refers to splitting the column which contains numerical categorical data into many columns depending on the number of categories present in that column. Each column contains "0" or "1" corresponding to which column it has been placed in. The data in the column usually denotes a category or value of the category, and also applies when the data in the column is label encoded. Encoders are used to convert categorical data, or text data, into numbers, which predictive models can better understand. Incorporating the teachings of Netz into Brewster, Dirac, Gao and Carlsson would produce column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns, as disclosed by Netz, (see Abstract). 
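For illustration only, label encoding followed by one hot encoding of a categorical column, as characterized above, can be sketched as follows. The function names and the sample "material" column are hypothetical and are not drawn from Netz; the point of the sketch is that the same mapping (i.e., a same encoder) can be applied to the corresponding column of both data sets.

```python
def fit_label_encoder(column):
    """Build a category-to-integer mapping from the distinct values of a column."""
    return {category: index for index, category in enumerate(sorted(set(column)))}


def label_encode(column, mapping):
    """Replace each categorical value with its integer label."""
    return [mapping[value] for value in column]


def one_hot(labels, num_categories):
    """Split integer labels into binary columns, one column per category."""
    return [[1 if i == label else 0 for i in range(num_categories)]
            for label in labels]


# Hypothetical categorical column appearing in both data sets.
first_column = ["steel", "wood", "steel"]
second_column = ["wood", "wood"]

mapping = fit_label_encoder(first_column + second_column)
print(one_hot(label_encode(first_column, mapping), len(mapping)))
print(one_hot(label_encode(second_column, mapping), len(mapping)))
```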
Regarding dependent claim(s) 9 (drawn to a computer-readable storage medium): claim 9 is a computer-readable storage medium claim that corresponds to the method of claim 2. Therefore, claim 9 is rejected for at least the same reasons as the method of claim 2. 
Regarding dependent claim(s) 16 (drawn to a system): claim 16 is a system claim that corresponds to the method of claim 2. Therefore, claim 16 is rejected for at least the same reasons as the method of claim 2. 
Regarding dependent claim(s) 6, the combination of Brewster, Dirac, Gao and Carlsson discloses the method as in claim 1. However, the combination of Brewster, Dirac, Gao and Carlsson does not appear to specifically disclose further comprising filtering at least one column from each of the first data set, and the second data set prior to providing encoded data. 
In the same field of endeavor, Netz discloses further comprising filtering at least one column from each of the first data set, and the second data set prior to providing encoded data (A column based encoder/compressor is provided for compacting large scale data storage and for making resulting scan/search/query operations over the data substantially more efficient as well. Netz discloses type columns (name 1, age 1, address 1, sex 1, etc.). This reads on the claim concept of inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data. As first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. This reads on the claim concept of the first data set and the second data set, respectively. One hot encoding takes a column which has categorical data, which has been label encoded, and then splits the column into multiple columns. The numbers are replaced by 1s and 0s, depending on which column has what value. A one hot encoding is a representation of categorical variables as binary vectors. This first requires that the categorical values be mapped to integer values. This reads on the claim concept of a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set, (see Netz: Para. 0048, 0051, 0053, 0055, 0057, 0059, 0061, 0069 and FIG. 1, 2, 12, 20 and 24). Prior to applying run length encoding to the column, the column can be reordered to group all of the most similar values as a re-ordered column. This reads on the claim concept of prior to providing encoded data, (see Netz: Para. 0079). Filtering and/or aggregation operations can be mathematically reduced to efficient operations over the data organized as columns. This reads on the claim concept of filtering at least one column, see Netz: Para. 0110, 0112, 0115, 0122 and FIG. 23).
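As a non-limiting illustration of the reordering and run length encoding characterized above, the following minimal sketch groups like values of a column and then records each value once with its repeat count. The function name and sample column are hypothetical, and sorting is an assumed stand-in for whatever reordering Netz applies.

```python
from itertools import groupby


def run_length_encode(column):
    """Encode consecutive repeats as (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


# Hypothetical column; reordering first groups the most similar values together.
column = ["b", "a", "b", "a", "a"]
print(run_length_encode(column))          # five runs of length 1 or 2
print(run_length_encode(sorted(column)))  # [('a', 3), ('b', 2)]
```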
Regarding dependent claim(s) 13, (drawn to a computer-readable storage medium): claim 13 is a computer-readable storage medium claim that corresponds to the method of claim 6. Therefore, claim 13 is rejected for at least the same reasons as the method of claim 6. 
Regarding dependent claim(s) 20, (drawn to a system): claim 20 is a system claim that corresponds to the method of claim 6. Therefore, claim 20 is rejected for at least the same reasons as the method of claim 6. 
Claims 4, 5, 11, 12, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Brewster et al. (US 2017/0262491 A1, hereinafter Brewster) in view of Dirac et al. (US 10,970,629 B1, hereinafter Dirac) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao), in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson) and in view of Hercus (US 2006/0149692 A1, hereinafter Hercus). 
Regarding dependent claim(s) 4, the combination of Brewster, Dirac, Gao and Carlsson discloses the method as in claim 1. However, the combination of Brewster, Dirac, Gao and Carlsson does not appear to specifically disclose wherein pre-processing comprises pre-appending one or more zeros to a numerical data value. 
In the same field of endeavor, Hercus discloses wherein pre-processing comprises pre-appending one or more zeros to a numerical data value (Hercus discloses that the product of data preprocessing is the final training set, and that a data preprocessing step is needed for machine learning and neural networks. It has thus become a universal technique used in computing in general. Training of such neural networks is accomplished, in its most basic form, by applying a specific input state to all the input neurons, selecting a specific output neuron to represent that input state, and adjusting the synaptic strengths or weights in the hidden layer. This reads on the claim concept of wherein pre-processing, (see Hercus: Para. 0081, 0082 and 0089). The number of neurons may be stored in an array, which is a numeric data value stored in memory storage, (see Hercus: Para. 0094 and 0143). This reads on the claim concept of a numerical data value. Zero padding (pre-appending) is a technique that allows the original input size to be preserved. Neural networks are based on the concept of three layers: an input neuron layer, a hidden neuron layer, and an output neuron layer. Something like a double border or triple border of zeros is added to maintain the original size of the input, depending on the size of the input and the size of the filters. Training a neural network might require zero padding (pre-appending) because the padding ensures that the output has the same shape as the input data. This reads on the claim concept of appending one or more zeros, see Hercus: Para. 0066, 0086, 0087 and FIG. 2). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set using deep neural networks within a multi-dimensional vector to a loss function of Brewster, Dirac, Gao and Carlsson in order to incorporate appending one or more zeros, as disclosed by Hercus, since both are directed to training neural networks, in which the network is most often composed of two separate parts. The last part of the network, which often contains densely connected layers, generates a classification or regresses a value based on the inputs received by the first dense layer. The first part, however, serves as a feature extraction mechanism. It transforms the original inputs into bits of information, which ensures that the dense layers perform better. In neural networks, two- and three-dimensional feature tensors can also be input to the model. During training, the machine adjusts its internal parameters to project each feature tensor close to its target. After training, the machine can be used to predict the target for previously unseen feature tensors. Incorporating the teachings of Hercus into Brewster, Dirac, Gao and Carlsson would produce a neural network comprising a plurality of neurons in which any one of the plurality of neurons is able to associate with itself or another neuron in the plurality of neurons via active connections to a further neuron in the plurality of neurons, as disclosed by Hercus, (see Abstract). 
Regarding dependent claim(s) 11, (drawn to a computer-readable storage medium): claim 11 is a computer-readable storage medium claim that corresponds to the method of claim 4. Therefore, claim 11 is rejected for at least the same reasons as the method of claim 4.
Regarding dependent claim(s) 18, (drawn to a system): claim 18 is a system claim that corresponds to the method of claim 4. Therefore, claim 18 is rejected for at least the same reasons as the method of claim 4.
Regarding dependent claim(s) 5, the combination of Brewster, Dirac, Gao and Carlsson discloses the method as in claim 1. However, the combination of Brewster, Dirac, Gao and Carlsson does not appear to specifically disclose wherein pre-processing comprises pre-appending one or more spaces to a string data value. 
In the same field of endeavor, Hercus discloses wherein pre-processing comprises pre-appending one or more spaces to a string data value (Hercus discloses that the product of data preprocessing is the final training set, and that a data preprocessing step is needed for machine learning and neural networks. It has thus become a universal technique used in computing in general. Training of such neural networks is accomplished, in its most basic form, by applying a specific input state to all the input neurons, selecting a specific output neuron to represent that input state, and adjusting the synaptic strengths or weights in the hidden layer. This reads on the claim concept of wherein pre-processing, (see Hercus: Para. 0081, 0082 and 0089). Zero padding (pre-appending) is a technique that allows the original input size to be preserved. Neural networks are based on the concept of three layers: an input neuron layer, a hidden neuron layer, and an output neuron layer. Something like a double border or triple border of zeros is added to maintain the original size of the input, depending on the size of the input and the size of the filters. Training a neural network might require zero padding (pre-appending) because the padding ensures that the output has the same shape as the input data. This reads on the claim concept of appending one or more zeros, (see Hercus: Para. 0066, 0086, 0087 and FIG. 2). A string is a data type used in programming to represent text rather than numbers. Without spaces between words, text would be completely unreadable and impossible to align. Spaces are a necessary tool for the creation of textual material. This reads on the claim concept of spaces to a string data value, (see Hercus: Para. 0077, 0111, 0122 and 0128). 
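For illustration only, the claim language of claims 4 and 5 (pre-appending zeros to numerical data values and spaces to string data values so that all values within a column have the same number of characters) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hercus or from applicant's specification; the function name `pad_column` and the rule used to decide whether a column is numerical are hypothetical.

```python
def pad_column(values):
    """Left-pad every value in a column to the length of its longest value.

    Assumption of this sketch: a column is treated as numerical only if every
    value is an int or float; any other column is treated as string data.
    """
    texts = [str(v) for v in values]
    if not texts:
        return []
    width = max(len(t) for t in texts)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        # str.zfill pre-appends zeros and keeps a leading sign in front.
        return [t.zfill(width) for t in texts]
    # str.rjust pre-appends spaces.
    return [t.rjust(width) for t in texts]


numbers = pad_column([7, 42, 1234])
strings = pad_column(["NY", "Boston"])
```

After padding, every value in a column has the same length in terms of number of characters, which is the property recited for the pre-processing step.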
Regarding dependent claim(s) 12, (drawn to a computer-readable storage medium): claim 12 is a computer-readable storage medium claim that corresponds to the method of claim 5. Therefore, claim 12 is rejected for at least the same reasons as the method of claim 5. 
Regarding dependent claim(s) 19, (drawn to a system): claim 19 is a system claim that corresponds to the method of claim 5. Therefore, claim 19 is rejected for at least the same reasons as the method of claim 5.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOHANES Demiss KELEMEWORK whose telephone number is (571)272-8772.  The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at 571-272-0631.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YOHANES D KELEMEWORK/Examiner, Art Unit 2164       

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164