Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to communications: Application filed on 03/27/2018. Claims 1, 8 and 15 are independent claims. Claims 1-20 have been examined and rejected in the current patent application. 
Response to Arguments
Applicant presents the following arguments in the January 19, 2021 amendment. 
Applicant's arguments with respect to claims 1, 8 and 15 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/19/2021 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 8, 10, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Santos Moraes et al. (US 2019/0155904 A1, hereinafter Santos Moraes) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao) and in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson).
Regarding independent claim(s) 1, Santos discloses a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Santos discloses receiving the structured data with various sets of values or entries (first data set and second data set). Models for scoring and ranking the answer can be trained on the basis of large sets of question (input) and answer (output) pairs (see Santos: Para. 0017, 0041 and FIG. 5). Santos determines that data is present within a box having multiple rows and columns. For purposes of this implementation, structured data 56 can be considered as being generated from a table having a general subject or topic (derived from metadata), one or more fields in the table (row and/or column headers), and multiple entries in a given field (an entry may be null) (see Santos: Para. 0031-0035). This allows creation of a model that is both wide, using logistic regression with sparse features, and deep, using a feed-forward neural network that has an embedding layer and several hidden layers. The model is trained using a technique called backpropagation: for each training step, the output error is computed for the weights that are currently in place for each connection between artificial neurons. This reads on the claim concept of a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns). 
	However, Santos does not appear to specifically disclose for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively.

	In the same field of endeavor, Gao discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Gao discloses a multi-task deep neural network (DNN) for representation learning for semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches). The input is passed along to one or more hidden layers. The connections run between input and output, where the final classification or decision is produced. An artificial neural network gets much of its power from the connections between different neurons. Each neuron holds a numerical value as a way to hold information. There is an input layer, several hidden layers, and an output layer. Data is fed into the machine, and the network then finds different patterns. Machine learning algorithms can find complex relationships and classify data based on patterns. Neural networks take this to the next level and can use thousands or millions of neurons to analyze the data and find complex patterns. Auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner by, for example, generating model parameters that optimize the reconstruction of documents after the documents have been converted into a semantic space (see Gao: Para. 0015-0021). At operational layer LO, either queries Q or documents D (e.g., D1, D2) (the first data set and the second data set) may be initially represented as a bag of words, among a vocabulary of 500 k words. Here, the size 500 k of a vocabulary is merely an illustrative example, and a vocabulary may have any other size (see Gao: Para. 0047-0060). 
A multi-layer perceptron is one in which the perceptrons are divided into multiple layers. The input is a linear combination, namely the sum of the products of the weights times the inputs. The input to a given perceptron is a linear combination of all of the outputs of the preceding layer multiplied by their weights. A deep neural network is then a neural network with one or more hidden layers, together with the set of inputs into the network and the set of outputs (see Gao: Para. 0061-0068 and 0094).
Model 500, for example, may satisfy such criteria with relatively high model compactness, which may be attributed to aggressive compression from the 500 k dimensional bag-of-words input (column) to the 300-dimensional semantic representation in the shared operational level. This reads on the claim concept of for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively). 
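For illustration only, and not as a characterization of what Gao or the present application actually implements, the concept of inputting each column into an encoder selected by that column's type can be sketched as follows. The encoder functions, the `ENCODERS` table, and the column names and types are hypothetical and were chosen solely for this example; min-max scaling and one-hot encoding are assumed stand-ins for whatever type-specific encoders a given system would use.

```python
from typing import Callable, Dict, List, Sequence


def encode_numeric(values: Sequence) -> List[List[float]]:
    """Hypothetical numeric encoder: min-max scale each value into [0, 1]."""
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    span = hi - lo
    return [[(n - lo) / span if span else 0.0] for n in nums]


def encode_categorical(values: Sequence) -> List[List[float]]:
    """Hypothetical categorical encoder: one-hot over the sorted vocabulary."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(vocab))]
            for v in values]


# Assumed mapping from a column type to the encoder specific to that type.
ENCODERS: Dict[str, Callable[[Sequence], List[List[float]]]] = {
    "numeric": encode_numeric,
    "categorical": encode_categorical,
}


def encode_data_set(columns: Dict[str, Sequence],
                    column_types: Dict[str, str]) -> List[List[float]]:
    """Encode every column with its type-specific encoder, then
    concatenate the per-column encodings row by row."""
    encoded = [ENCODERS[column_types[name]](vals)
               for name, vals in columns.items()]
    return [sum(parts, []) for parts in zip(*encoded)]


# Example structured data set with two columns of different types.
first_data_set = {"amount": [10, 20, 30], "currency": ["EUR", "USD", "EUR"]}
first_types = {"amount": "numeric", "currency": "categorical"}
encoded_rows = encode_data_set(first_data_set, first_types)
```

In this sketch the same `encode_data_set` function would be applied separately to the first data set and to the second data set, so that each yields its own encoded data.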
	providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layers help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. This reads on the claim concept of providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers).
	providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layers help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. A one-hot vector representation allows for representation of out-of-vocabulary words by n-gram vectors. The data is read, cleaned, and split for the trained model, which returns a single vector that is prepared to be passed directly into a machine learning model. Many randomly assigned training sets are created. Each individual training set is used to train the recommender system independently, and the accuracy of the resulting system is then measured against the test set (see Gao: Para. 0072-0077). This reads on the claim concept of providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers).
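As a minimal sketch only, and assuming a generic feed-forward arrangement rather than the specific architecture of Gao or of the claimed invention, the idea of two separate stacks of fully connected layers, each mapping its own input to a common latent space independently of the other, might look like the following. The layer sizes, the random initialization, the ReLU activation, and all function names are assumptions made for this example.

```python
import random
from typing import Dict, List

Layer = Dict[str, list]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create one fully connected layer with small random weights."""
    return {
        "w": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
              for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def dense(layer: Layer, x: List[float], activate: bool) -> List[float]:
    """Every output neuron sees every input: weighted sum plus bias,
    optionally followed by a ReLU activation."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(layer["w"], layer["b"])]
    return [max(0.0, v) for v in out] if activate else out


def make_tower(sizes: List[int], seed: int) -> List[Layer]:
    """Build a stack of fully connected layers with its own weights."""
    rng = random.Random(seed)
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def embed(tower: List[Layer], x: List[float]) -> List[float]:
    """Map an encoded row to the latent space; the last layer is linear."""
    for i, layer in enumerate(tower):
        x = dense(layer, x, activate=(i < len(tower) - 1))
    return x


# Two towers with separate weights but the same latent dimension (2).
tower_first = make_tower([3, 4, 2], seed=1)
tower_second = make_tower([5, 4, 2], seed=2)

vec_first = embed(tower_first, [0.0, 1.0, 0.0])
vec_second = embed(tower_second, [1.0, 0.0, 0.0, 1.0, 0.5])
```

Because the two towers share no weights and neither reads the other's output, each vector is produced independently; only the dimension of the latent space is common to both.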
	Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multidimensional vector to provide an output, the output comprising a difference between the first multidimensional vector (Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network will convert a list of data into a list of numerical vectors. A multi-dimensional vector is, for example, a vector with two numbers in it. These vectors can be plotted in multi-dimensional space. The first step is to gauge similarity using the data vectors. The most common way to calculate this is to use a cosine similarity function, which returns a score as a similarity measure. The X axis would represent the angle between two vectors, and the Y axis is the similarity score that would be returned. The loss information is obtained by averaging the data vectors together to create a level representation. This reads on the claim concept of Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multidimensional vector to provide an output, the output comprising a difference between the first multidimensional vector). 
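By way of a hedged example only, the cosine similarity measure mentioned above, and one possible way of turning it into a loss value that reflects the difference between two vectors, can be sketched as follows. Cosine similarity itself is a standard measure ranging from -1 to 1; the `match_loss` function and its treatment of matching and non-matching pairs are assumptions made for this example and are not taken from Gao or from the application.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_loss(u: Sequence[float], v: Sequence[float], is_match: bool) -> float:
    """Assumed loss: small when matching pairs point the same way,
    and small when non-matching pairs are not positively aligned."""
    sim = cosine_similarity(u, v)
    return 1.0 - sim if is_match else max(0.0, sim)


# Two vectors pointing in the same direction give a loss near zero
# when they are labeled as a match.
loss_aligned = match_loss([1.0, 2.0], [2.0, 4.0], is_match=True)

# Orthogonal vectors labeled as a non-match also incur no loss.
loss_orthogonal = match_loss([1.0, 0.0], [0.0, 1.0], is_match=False)
```

The output of such a function is a single scalar, which is the form of value that a training procedure would seek to minimize.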
	Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns of Santos in order to have incorporated deep neural
networks within multi-dimensional vector to a loss-function, as disclosed by Gao, since both of these are directed to the neural network needs to learn all the time to solve tasks in a more qualified manner or even to use various methods to provide a better result. When it gets new information in the system, it learns how to act accordingly to a new situation. Learning becomes deeper when tasks you solve get harder. Deep neural network represents the type of machine learning when the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more creative and abstract component. Each Hidden layer is composed of neurons. The neurons are connected to each other. The neuron will process and then propagate the input signal it receives the layer above it. The strength of the signal given the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and operates them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. They can learn automatically, without predefined knowledge explicitly coded by the programmers. A neural network works quite the same. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex feature than with that with two layers. The first phase consists of applying a nonlinear transformation of the input and create a statistical model as output. The second phase aims at improving the model with a mathematical method known as derivative. Auto encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. 
This process sometimes involves multiple auto encoders, such as stacked sparse auto encoder layers used in image processing. Incorporating the teachings of Gao into Santos would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao, (see Abstract).  
	However, Santos and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set.
	In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes (see Carlsson: Para. 0106, 0303, 0420-0462 and 0473-0481). This reads on the claim concept of the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns of Santos and Gao in order to have incorporated an exact match based on the training data, as disclosed by Carlsson, since both of these are directed to datasets. A dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to the future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called ‘weights’ and added together. This value is then passed to a non-linear function, referred to as an ‘activation function’, which becomes the output. The features are passed as inputs, e.g., size, brand, location, etc. The output corresponds to the target variable, i.e., the thing to be predicted, e.g., the price of an item. Machine learning algorithms create a model after training; this is a mathematical function that can then be used to take a new observation and calculate an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions that will help the neural network to generalize all this information into a consistent input–output relationship. Incorporating the teachings of Carlsson into Santos and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson, (see Abstract).  
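Purely as an illustrative sketch, the artificial neuron described above (inputs multiplied by weights, added together, and passed to a non-linear activation function) can be written as follows. The choice of the logistic sigmoid as the activation function, the bias term, and the function name `neuron` are assumptions made for this example and are not drawn from any of the cited references.

```python
import math
from typing import Sequence


def neuron(inputs: Sequence[float],
           weights: Sequence[float],
           bias: float = 0.0) -> float:
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# A weighted sum of exactly zero yields the sigmoid midpoint.
midpoint = neuron([1.0, 2.0], [0.5, -0.25])

# A larger weighted sum pushes the output toward 1.
activated = neuron([1.0, 2.0], [2.0, 1.0])
```

Here `midpoint` evaluates to 0.5 because the weighted sum is 0.5 - 0.5 = 0, and the sigmoid of zero is one half.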
Regarding dependent claim(s) 3, the combination of Santos, Gao and Carlsson discloses the method as in claim 1. However, Santos and Carlsson do not appear to specifically disclose wherein, prior to the encoder providing encoded data, data values of one or more of the first data set, and the second data set are pre-processed to provide revised data values.
In the same field of endeavor, Gao discloses wherein, prior to the encoder providing encoded data, data values of one or more of the first data set, and the second data set are pre-processed to provide revised data values (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner by, for example, generating model parameters that optimize the reconstruction of documents after the documents have been converted into a semantic space. An autoencoder is trained using the backpropagation algorithm frequently based on the mean square error cost function, the reason being an autoencoder is generally applied through neural networks, (see Gao: Para. 0017-0021). This reads on the claim concept of wherein, prior to the encoder providing encoded data, data values of one or more of the first data set, and the second data set are pre-processed to provide revised data values). 
Regarding independent claim(s) 8, Santos discloses a non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising: receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Santos discloses receiving the structured data with various sets of values or entries (first data set and second data set). Models for scoring and ranking the answer can be trained on the basis of large sets of question (input) and answer (output) pairs, (see Santos: Para. 0017-0029, 0041 and FIG. 5). Determining that data is present within a box having multiple rows and columns. For purposes of this implementation structured data 56 can be considered as being generated from a table having a general subject or topic (derived from metadata ), one or more fields in the table (row and/or column headers), and multiple entries in a given field (an entry may be null), (see Santos: Para. 0031-0035). This regressed allows you to create a model that is both wide, with logistic regression that has sparse features, and deep, using a feed-forward neural network that has an embedding layer and several hidden layers. It is using a technique called backpropagation or mathematical trick. For each training step compute the output error for the weights that currently in place for each connection between each artificial neuron. This reads on the claim concept of a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns). 
	However, Santos does not appear to specifically disclose for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively; providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers; providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers; and outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multidimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector.  
	In the same field of endeavor, Gao discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Gao discloses a multi-task deep neural network (DNN) for representation learning for semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches). It is gets passed along to one or more hidden layer. These lines all represent connection like between input and output, which is where you get the final classification or decision about what's happening. An artificial neural network gets a lot of its power from the connections between different neurons. They always uses neurons, except these neurons have a numerical value as a way to hold information. There's an input layer, several hidden layers and output layers. The feed data into the machine and then the network find different patterns. Machine learning algorithms can find complex relationships and classify data based on patterns. Neural networks take this the next level, which can use thousands or millions of neurons to analyze the data and find complex patterns. Auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner by, for example, generating model parameters that optimize the reconstruction of documents after the documents have been converted into a semantic space (see Gao: Para. 0015-0021). At operational layer LO, either queries Q or documents D (e.g., D1, D2) or (first data set and the second data) may be initially represented as a bag of words, among a vocabulary of 500 k words. Here, the size 500 k of a vocabulary is merely an illustrative example, and a vocabulary may have any other size, (see Gao: Para. 0047-0060). 
A multi-layer perceptron is one in which the perceptrons are divided into multiple layers. The input is a linear combination, namely the sum of the products of the weights times the inputs. The input to a given perceptron is a linear combination of all of the outputs of the preceding layer multiplied by their weights. A deep neural network is then a neural network with one or more hidden layers, together with the set of inputs into the network and the set of outputs (see Gao: Para. 0061-0068 and 0094).
Model 500, for example, may satisfy such criteria with relatively high model compactness, which may be attributed to aggressive compression from the 500 k dimensional bag-of-words input (column) to the 300-dimensional semantic representation in the shared operational level. This reads on the claim concept of for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively). 
	providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in the next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key point is that the activations and levels in the hidden layers help classify the data for the output layer. The more layers the network has, the more patterns it can see as it tries to classify the data. A two-layer neural network accepts a text corpus as input and returns a set of vectors. This reads on the claim concept of providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers).
	providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind, is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers have, the more patterns the network can see as it tries to classify the data. The two layers neural network that accepts a text corpus as input and it returns a set vectors. A one-hot vector representation by allowing for representation of out-of-vocabulary words by n-gram vectors. Reading, cleaning and split it the data into trained model, which is this retures a single vector that prepared to be passed directly into machine learning model. Creating many randomly assigned training sets. Each individual training set, is used to train recommender system independently and then measure the accuracy of the resulting system against the test set, (see Gao:
Para. 0072-0077). This reads on the claim concept of providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers).
	Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multidimensional vector to provide an output, the output comprising a difference between the first multidimensional vector (Gao discloses task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective, (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameter accomplish the task the network is intended to do. Neural network put weight on the connection between different neural and different layers. These weights can either positive or negative. Multi-layer neural that will convert a list of data into a list of numerical vectors. Multi-dimensional vectors, in other word, a vector with two numbers in it. Plot these vectors in multi-dimensional space. The first thing is gauge similarity using data vectors. The most common way to calculate using cosine similarity function, and it will return a score between as a similarity measure. The X axis would represent the angles between two vectors, and then the V axis is similarity score that would be returned. The loss information by averaging the data vectors together to create level representation. This reads on the claim concept of Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multidimensional vector to provide an output, the output comprising a difference between the first multidimensional vector). 
	Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns of Santos in order to have incorporated deep neural
networks within multi-dimensional vector to a loss-function, as disclosed by Gao, since both of these are directed to the neural network needs to learn all the time to solve tasks in a more qualified manner or even to use various methods to provide a better result. When it gets new information in the system, it learns how to act accordingly to a new situation. Learning becomes deeper when tasks you solve get harder. Deep neural network represents the type of machine learning when the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more creative and abstract component. Each Hidden layer is composed of neurons. The neurons are connected to each other. The neuron will process and then propagate the input signal it receives the layer above it. The strength of the signal given the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and operates them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. They can learn automatically, without predefined knowledge explicitly coded by the programmers. A neural network works quite the same. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex feature than with that with two layers. The first phase consists of applying a nonlinear transformation of the input and create a statistical model as output. The second phase aims at improving the model with a mathematical method known as derivative. Auto encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. 
This process sometimes involves multiple auto encoders, such as stacked sparse auto encoder layers used in image processing. Incorporating the teachings of Gao into Santos would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao, (see Abstract).  
	However, Santos and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set.
	In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes, (see Carlsson: Para. 0106, 0303, 0420-0462 and 0473-0481). This reads on the claim concept of the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
Accordingly, it would have been obvious to a person of ordinarily skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns Santos and Gao in order to have incorporated an exact match base on the training data, as disclosed by Carlsson, since both of these are directed to a dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called ‘weights’ and added together. This value is then passed to a non-linear function, referred to as an ‘activation function’, which becomes the output. The features are passed as inputs, e.g. size, brand, location, etc. This is the target variable, the thing we are trying to predict, e.g. the price of an item. Machine learning algorithms create a model after training, this is a mathematical function that can then be used to take a new observation and calculates an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions that will help the neural network to generalize all this information into a consistent input–output relationship. Incorporating the teachings of Carlsson into Santos and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson, (see Abstract).  
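For illustration only, the artificial neuron described above (inputs multiplied by weights, added together, and passed to a non-linear activation function) can be sketched as follows. This is a minimal sketch and is not drawn from any cited reference; the names `neuron` and `sigmoid`, the optional bias term, and the choice of a sigmoid activation are assumptions made for the example.

```python
import math


def sigmoid(z):
    """A common non-linear activation function, used here as an example."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Compute activation(sum(input_i * weight_i) + bias) for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


if __name__ == "__main__":
    # Hypothetical feature values and weights; the weighted sum is 0 here.
    features = [1.0, 2.0]
    weights = [0.5, -0.25]
    print(neuron(features, weights))  # sigmoid(0) is 0.5
```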
	Regarding dependent claim(s) 10, (drawn to a computer-readable storage medium): claim 10 is a computer-readable storage medium claim that corresponds to the method of claim 3. Therefore, claim 10 is rejected for at least the same reasons as the method of claim 3. 
	Regarding independent claim(s) 15, Santos discloses a system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations, the operations comprising receiving a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns (Santos discloses receiving the structured data with various sets of values or entries (first data set and second data set).
Models for scoring and ranking the answer can be trained on the basis of large sets of question (input) and answer (output) pairs, (see Santos: Para. 0017-0029, 0041 and FIG. 5). Determining that data is present within a box having multiple rows and columns. For purposes of this implementation structured data 56 can be considered as being generated from a table having a general subject or topic (derived from metadata), one or more fields in the table {row and/or column headers), and multiple entries in a given field (an entry may be null), (see Santos: Para. 0031-0035). This regressed allows you to create a model that is both wide, with logistic regression that has sparse features, and deep, using a feed-forward neural network that has an embedding layer and several hidden layers. It is using a technique called backpropagation or mathematical trick. For each training step compute the output error for the weights that currently in place for each connection between each artificial neuron. This reads on the claim concept of a computer-implemented method executed by one or more processors, the method comprising: receiving, a first data set and a second data set, both the first data set and the second data set comprising structured data in a plurality of columns).
	However, Santos does not appear to specifically disclose for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively; providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers; providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers; and Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector.  
	In the same field of endeavor, Gao discloses for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively (Gao discloses a multi-task deep neural network (DNN) for representation learning for semantic classification (e.g., query classification) and semantic information retrieval tasks (e.g., ranking for web searches). It is gets passed along to one or more hidden layer. These lines all represent connection like between input and output, which is where you get the final classification or decision about what's happening. An artificial neural network gets a lot of its power from the connections between different neurons. They always uses neurons, except these neurons have a numerical value as a way to hold information. There's an input layer, several hidden layers and output layers. The feed data into the machine and then the network find different patterns. Machine learning algorithms can find complex relationships and classify data based on patterns. Neural networks take this the next level, which can use thousands or millions of neurons to analyze the data and find complex patterns. Auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner by, for example, generating model parameters that optimize the reconstruction of documents after the documents have been converted into a semantic space (see Gao: Para. 0015-0021). At operational layer LO, either queries Q or documents D (e.g., D1, D2) or (first data set and the second data) may be initially represented as a bag of words, among a vocabulary of 500 k words. Here, the size 500 k of a vocabulary is merely an illustrative example, and a vocabulary may have any other size, (see Gao: Para. 0047-0060). 
A multi layers perceptron as being where the perceptron are divided up into multiple layers. The input is a linear combination of the sum the products of the weight time the input. The input to this perceptron is a linear combination of all of the output multiplied by weight. A deep neural network is then a neural network with one or more hidden layers. The set of inputs into the network and outputs, (see Gao: Para. 0061-0068 and 0094).
Model 500, for example, may satisfy such criteria with relatively high model compactness, which may be attributed/column to aggressive compression from the 500 k dimensional bag-of-words input to 300-dimensional semantic representation in shared operational level. This reads on the claim concept of for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column encoder providing encoded data for the first data set, and the second data set, respectively). 
	providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind, is that the activation and levels in the hidden layer help classify the data for the output layer.
The more layers have, the more patterns the network can see as it tries to classify the data. The two layers neural network that accepts a text corpus as input and it returns a set vectors. This reads on the claim concept of providing, a first multi-dimensional vector based on encoded data of the first data set by mapping a first output of first fully connected layers to a latent space independently of a second output of second fully connected layers).
	providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers (auto-encoders may leverage deep learning to project linguistic items into a semantic space. One approach trains these auto-encoders in an unsupervised manner, (see Gao: Para. 0017-0021 and 0041-0051). Each of the layers connects to all of the neurons in next layer. Each of the neurons in the first hidden layer connects to all the neurons in the second hidden layer. This second hidden layer tries to assemble some data from the first hidden layer. The key thing to keep in mind, is that the activation and levels in the hidden layer help classify the data for the output layer. The more layers have, the more patterns the network can see as it tries to classify the data. The two layers neural network that accepts a text corpus as input and it returns a set vectors. A one-hot vector representation by allowing for representation of out-of-vocabulary words by n-gram vectors. Reading, cleaning and split it the data into trained model, which is this retures a single vector that prepared to be passed directly into machine learning model. Creating many randomly assigned training sets. Each individual training set, is used to train recommender system independently and then measure the accuracy of the resulting system against the test set, (see Gao:
Para. 0072-0077). This reads on the claim concept of providing, a second multi-dimensional vector based on encoded data of the second data set by mapping the second output of the second fully connected layers to the latent space independently of the first output of the first fully connected layers).
	Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector (Gao discloses a task-specific objective. Such updating approximately optimizes the sum of the multi-task objectives. For query classification of class C, the processor may use the cross entropy loss function as the task-specific objective, (see Gao: Para. 0018, 0037, 0050, 0077-0089). A loss function is used to optimize the parameter values in a neural network model. Loss functions map a set of parameter values for the network onto a scalar value that indicates how well those parameters accomplish the task the network is intended to do. A neural network puts weights on the connections between different neurons and different layers. These weights can be either positive or negative. A multi-layer neural network converts a list of data into a list of numerical vectors, that is, multi-dimensional vectors, for example, a vector with two numbers in it. These vectors can be plotted in multi-dimensional space. The first step is to gauge similarity using the data vectors. The most common way to calculate this is a cosine similarity function, which returns a score as a similarity measure. The X axis would represent the angle between two vectors, and the Y axis the similarity score that would be returned. The loss information is obtained by averaging the data vectors together to create a level representation. This reads on the claim concept of Outputting, the first multi-dimensional vector and the second multi-dimensional vector to a loss-function, the loss-function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output comprising a difference between the first multi-dimensional vector). 
	Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns of Santos in order to have incorporated deep neural
networks within multi-dimensional vector to a loss-function, as disclosed by Gao, since both of these are directed to the neural network needs to learn all the time to solve tasks in a more qualified manner or even to use various methods to provide a better result. When it gets new information in the system, it learns how to act accordingly to a new situation. Learning becomes deeper when tasks you solve get harder. Deep neural network represents the type of machine learning when the system uses many layers of nodes to derive high-level functions from input information. It means transforming the data into a more creative and abstract component. Each Hidden layer is composed of neurons. The neurons are connected to each other. The neuron will process and then propagate the input signal it receives the layer above it. The strength of the signal given the neuron in the next layer depends on the weight, bias and activation function. The network consumes large amounts of input data and operates them through multiple layers; the network can learn increasingly complex features of the data at each layer. A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. They can learn automatically, without predefined knowledge explicitly coded by the programmers. A neural network works quite the same. Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge. A neural network with four layers will learn more complex feature than with that with two layers. The first phase consists of applying a nonlinear transformation of the input and create a statistical model as output. The second phase aims at improving the model with a mathematical method known as derivative. Auto encoder networks teach themselves how to compress data from the input layer into a shorter code, and then uncompress that code into whatever format best matches the original input. 
This process sometimes involves multiple auto encoders, such as stacked sparse auto encoder layers used in image processing. Incorporating the teachings of Gao into Santos would produce receiving a query or a document, and mapping the query or the document into a lower dimensional representation by performing at least one operational layer that shares at least two disparate tasks, as disclosed by Gao, (see Abstract).  
	However, Santos and Gao do not appear to specifically disclose the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set.
	In the same field of endeavor, Carlsson discloses the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set (Carlsson discloses the database may be any data structure containing data (e.g., a very large dataset of multidimensional data). The input module 314 receives a database identifier and accesses a large multidimensional database. The input module 314 may scan the database and provide the user with an interface window allowing the user to identify an ID field. Vector is an additional set of weights in a neural network. Data point is a member of the particular group, apply the prediction model to the second transformation data set to generate predicted outcomes. Values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point in the training data set if the particular Data point is a member of the particular group, and apply a machine learning model to the first transformation data set to generate a prediction model. A plurality of groups, each group of the plurality of groups including a different subset of data points of the training data set, each data point of the training data set being a member of at least one group of the plurality of groups, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets, each of the plurality of feature subsets being associated with at least one group of the plurality of groups. A dataset (or data set) is a collection of data, which is singular of data refers to a single point of data. The first data set is being trained by using a method such as predictive modeling, machine learning or neuro networks to predict the exact data match outcomes for incoming data points in a second data set. 
The prediction module 2702 may be tested using a test data set with known outcomes to assess whether the prediction model output is the same as (an exact match) or similar to known outcomes, (see Carlsson: Para. 0106, 0303, 0420-0462 and 0473-0481). This reads on the claim concept of the second multi-dimensional vector representing an exact match between a data point of the first data set and a data point of the second data set). 
Accordingly, it would have been obvious to a person of ordinarily skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set comprising structured data in a plurality of columns Santos and Gao in order to have incorporated an exact match base on the training data, as disclosed by Carlsson, since both of these are directed to a dataset is a structured collection of data generally associated with a unique body of work. A database is an organized collection of data stored as multiple datasets. Those datasets are generally stored and accessed electronically from a computer system that allows the data to be easily accessed, manipulated, and updated. A data point or observation is a set of one or more measurements on a single member of the unit of observation. Simply put, predictive analytics uses past trends and applies them to future. Neural networks can learn complex patterns using layers of neurons which mathematically transform the data. The layers between the input and output are referred to as hidden layers. A neural network can learn relationships between the features that other algorithms cannot easily discover. An artificial neuron is a mathematical function. It takes one or more inputs that are multiplied by values called ‘weights’ and added together. This value is then passed to a non-linear function, referred to as an ‘activation function’, which becomes the output. The features are passed as inputs, e.g. size, brand, location, etc. This is the target variable, the thing we are trying to predict, e.g. the price of an item. Machine learning algorithms create a model after training, this is a mathematical function that can then be used to take a new observation and calculates an appropriate prediction. 
Training samples consist of measured data of some kind combined with the solutions that will help the neural network to generalize all this information into a consistent input–output relationship. Incorporating the teachings of Carlsson into Santos and Gao would produce values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model, as disclosed by Carlsson, (see Abstract).  
	Regarding dependent claim(s) 17, (drawn to a system): claim 17 is a system claim that corresponds to the method of claim 3. Therefore, claim 17 is rejected for at least the same reasons as the method of claim 3. 
Claims 2, 6, 9, 13, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Santos Moraes et al. (US 2019/0155904 A1, hereinafter Santos Moraes) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao), in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson) and in view of Netz et al. (US 2010/0030796 A1, hereinafter Netz).
Regarding dependent claim(s) 2, the combination of Santos, Gao and Carlsson discloses the method as in claim 1. However, the combination of Santos, Gao and Carlsson does not appear to specifically disclose wherein a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set. 
In the same field of endeavor, Netz discloses wherein a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set (A column based encoder/compressor is provided for compacting large scale data storage and for making resulting scan/search/query operations over the data substantially more efficient as well. Netz discloses type columns (name 1, age 1, address 1, sex 1, etc.). This reads on the claim concept of each column into an encoder specific to a column type of a respective column the encoder providing encoded. First and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. This reads on the claim concept of first data set, and the second data set, respectively. One-hot encoding takes a column which has categorical data, which has been label encoded, and then splits the column into multiple columns. The numbers are replaced by 1s and 0s, depending on which column has what value. A one hot encoding is a representation of categorical variables as binary vectors. This first requires that the categorical values be mapped to integer values. This reads on the claim concept of a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set, (see Netz: Para. 0048, 0051, 0053, 0055, 0057 0059, 0061, 0069 and FIG. 1, 2, 12, 20 and 24). Netz discloses the system includes areas having homogeneous repeated values to which run length encoding has been applied, and other labeled areas, in which the same encoder has been used, see Netz: Para. 0085). 
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set using deep neural networks within multi-dimensional vector to a loss-function of Santos, Gao and Carlsson in order to have incorporated the encoder providing encoded data, as disclosed by Netz, since both of these are directed to converting each value in a column to a number. Consider a dataset of bridges having columns of different column types. There may be many more columns in the dataset, but these suffice to understand label encoding. A label encoder converts each class under a specified feature to a numerical value. The label encoder is applied to each of the categorical columns. Each row represents a sample and each column represents a feature. One-hot encoding refers to splitting a column which contains numerical categorical data into many columns depending on the number of categories present in that column. Each column contains "0" or "1" corresponding to which column the value has been placed in. The data in the column usually denotes a category or value of the category, and one-hot encoding is applied when the data in the column is label encoded. Encoders are used to convert categorical data, or text data, into numbers, which predictive models can better understand. Incorporating the teachings of Netz into Santos, Gao and Carlsson would produce column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns, as disclosed by Netz, (see Abstract).
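For illustration only, the label encoding and one-hot encoding steps described above can be sketched as follows. This is a minimal sketch and is not Netz's encoder; the function names `label_encode` and `one_hot`, the sample column values, and the rule of assigning integer codes in order of first appearance are assumptions made for the example.

```python
def label_encode(column):
    """Map each distinct value in a column to an integer code.

    Codes are assigned in order of first appearance (an assumption of
    this sketch; other encoders may sort the categories instead).
    """
    mapping = {}
    codes = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


def one_hot(codes, num_classes):
    """Split one label-encoded column into num_classes columns of 0s and 1s."""
    return [[1 if code == k else 0 for k in range(num_classes)] for code in codes]


if __name__ == "__main__":
    # Hypothetical categorical column from a dataset of bridges.
    material = ["steel", "wood", "steel", "iron"]
    codes, mapping = label_encode(material)
    print(codes)                         # [0, 1, 0, 2]
    print(one_hot(codes, len(mapping)))  # one row per sample, one column per category
```

Using the same `mapping` for both data sets is one way a "same encoder" could be applied to each, though that reading is an assumption of the sketch.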
Regarding dependent claim(s) 9, (drawn to a computer-readable storage medium): claim 9 is a computer-readable storage medium claim that corresponds to the method of claim 2. Therefore, claim 9 is rejected for at least the same reasons as the method of claim 2.
Regarding dependent claim(s) 16, (drawn to a system): claim 16 is a system claim that corresponds to the method of claim 2. Therefore, claim 16 is rejected for at least the same reasons as the method of claim 2.
Regarding dependent claim(s) 6, the combination of Santos, Gao and Carlsson discloses the method as in claim 1. However, the combination of Santos, Gao and Carlsson does not appear to specifically disclose further comprising filtering at least one column from each of the first data set, and the second data set prior to providing encoded data.
In the same field of endeavor, Netz discloses further comprising filtering at least one column from each of the first data set, and the second data set prior to providing encoded data (A column based encoder/compressor is provided for compacting large scale data storage and for making resulting scan/search/query operations over the data substantially more efficient as well. Netz discloses type columns (name 1, age 1, address 1, sex 1, etc.). This reads on the claim concept of each column into an encoder specific to a column type of a respective column the encoder providing encoded. First and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. This reads on the claim concept of first data set, and the second data set, respectively. One-hot encoding takes a column which has categorical data, which has been label encoded, and then splits the column into multiple columns. The numbers are replaced by 1s and 0s, depending on which column has what value. A one hot encoding is a representation of categorical variables as binary vectors. This first requires that the categorical values be mapped to integer values. This reads on the claim concept of a same encoder is used to provide the encoded data of the first data set, and the encoded data of the second data set, (see Netz: Para. 0048, 0051, 0053, 0055, 0057 0059, 0061, 0069 and FIG. 1, 2, 12, 20 and 24). Prior to applying run length encoding of the column, the column can be reordered to group all of the most similar values as a re-ordered column. This reads on the claim concept of prior to providing encoded data, (see Netz: Para. 0079). Filtering and/or aggregation operations can be mathematically reduced to efficient operations over the data organized as columns. This reads on the claim concept of filtering at least one column, see Netz: Para. 0110, 0112, 0115, 0122 and FIG. 23). 
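For illustration only, filtering a column out of a column-organized table and then run length encoding a reordered column can be sketched as follows. This is a minimal sketch and is not Netz's implementation; the names `drop_columns` and `run_length_encode`, the table layout (a dictionary of column name to list of values), and the sample data are assumptions made for the example.

```python
from itertools import groupby


def drop_columns(table, names):
    """Return a copy of a column-organized table without the named columns."""
    return {name: values for name, values in table.items() if name not in names}


def run_length_encode(column):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


if __name__ == "__main__":
    # Hypothetical table stored by columns rather than by rows.
    table = {
        "id": [101, 102, 103, 104, 105],
        "sex": ["F", "M", "F", "F", "M"],
    }
    filtered = drop_columns(table, {"id"})   # filter a column before encoding
    reordered = sorted(filtered["sex"])      # group equal values together
    print(run_length_encode(reordered))      # [('F', 3), ('M', 2)]
```

Sorting a single column in isolation is done here only to show why grouping similar values shortens the encoding; a real system would have to keep rows aligned across columns.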
Regarding dependent claim(s) 13 (drawn to a computer-readable storage medium): claim 13 is a computer-readable storage medium claim that corresponds to the method of claim 6. Therefore, claim 13 is rejected for at least the same reasons as the method of claim 6.
	Regarding dependent claim(s) 20 (drawn to a system): claim 20 is a system claim that corresponds to the method of claim 6. Therefore, claim 20 is rejected for at least the same reasons as the method of claim 6.
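For illustration only, the limitation discussed for claims 6, 13 and 20 (filtering at least one column from each data set before the same encoder produces encoded data) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the application or of Netz: the function names, the sample rows, and the choice of a per-column dictionary encoder are all hypothetical.

```python
# Hypothetical sketch: drop a column from two data sets, then encode both
# with one shared per-column dictionary encoder. Names and data are assumed.

def filter_columns(rows, drop):
    """Return copies of the rows without the columns named in `drop`."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

def build_column_dictionaries(*data_sets):
    """Map each distinct value of each column to an integer code."""
    dictionaries = {}
    for rows in data_sets:
        for row in rows:
            for column, value in row.items():
                codes = dictionaries.setdefault(column, {})
                codes.setdefault(value, len(codes))
    return dictionaries

def encode(rows, dictionaries):
    """Replace every value by its integer code from the shared dictionaries."""
    return [[dictionaries[c][v] for c, v in row.items()] for row in rows]

first = [{"name": "Ann", "age": "30", "id": "x1"}]
second = [{"name": "Bob", "age": "30", "id": "y9"}]

# Filtering happens before any encoded data is produced.
first_f = filter_columns(first, {"id"})
second_f = filter_columns(second, {"id"})

shared = build_column_dictionaries(first_f, second_f)
encoded_first = encode(first_f, shared)
encoded_second = encode(second_f, shared)
```

Because both data sets pass through the same dictionaries, equal values in the same column (here the age "30") receive the same integer code in both encoded outputs.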
Claims 4, 5, 11, 12, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Santos Moraes et al. (US 2019/0155904 A1, hereinafter Santos Moraes) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao), in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson) and in view of Hercus (US 2006/0149692 A1, hereinafter Hercus). 
Regarding dependent claim(s) 4, the combination of Santos, Gao and Carlsson discloses the method as in claim 3. However, the combination of Santos, Gao and Carlsson does not appear to specifically disclose wherein pre-processing comprises pre-appending one or more zeros to a numerical data value.
In the same field of endeavor, Hercus discloses wherein pre-processing comprises pre-appending one or more zeros to a numerical data value (Hercus discloses that the product of data preprocessing is the final training set, and that a data preprocessing step is needed for machine learning and neural networks. It has thus become a universal technique used in computing in general. Training of such neural networks is accomplished, in its most basic form, by applying a specific input state to all the input neurons, selecting a specific output neuron to represent that input state, and adjusting the synaptic strengths or weights in the hidden layer. This reads on the claim concept of wherein pre-processing (see Hercus: Para. 0081, 0082 and 0089). The number of neurons that may be stored in an array is a numeric data value stored in memory storage (see Hercus: Para. 0094 and 0143). This reads on the claim concept of a numerical data value. Zero pre-adding (pre-appending) is a technique that preserves the original input size. Neural networks are based on the concept of three layers: an input neuron layer, a hidden neuron layer, and an output neuron layer. A double border or triple border of zeros may be added to maintain the original size of the input, depending on the size of the input and the size of the filters. Training a neural network might need to apply zero pre-adding (pre-appending) because the pre-adding ensures that the output has the same shape as the input data. This reads on the claim concept of appending one or more zeros, see Hercus: Para. 0066, 0086, 0087 and FIG. 2).
	Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set using deep neural networks within a multi-dimensional vector to a loss function of Santos, Gao and Carlsson in order to incorporate appending one or more zeros, as disclosed by Hercus, since both are directed to training neural networks. Such a network is, most of the time, composed of two separate parts. The last part of the network, which often contains densely connected layers, generates a classification or regresses a value based on the inputs received by the first dense layer. The first part, however, serves as a feature extraction mechanism. It transforms the original inputs into bits of information, which ensures that the dense layers perform better. In neural networks, two- and three-dimensional feature tensors can also be input to the model. During training, the machine adjusts its internal parameters to project each feature tensor close to its target. After training, the machine can be used to predict the target for previously unseen feature tensors. Incorporating the teachings of Hercus into Santos, Gao and Carlsson would produce a neural network comprising a plurality of neurons in which any one of the plurality of neurons is able to associate with itself or another neuron in the plurality of neurons via active connections to a further neuron in the plurality of neurons, as disclosed by Hercus (see Abstract).
	Regarding dependent claim(s) 11 (drawn to a computer-readable storage medium): claim 11 is a computer-readable storage medium claim that corresponds to the method of claim 4. Therefore, claim 11 is rejected for at least the same reasons as the method of claim 4.
	Regarding dependent claim(s) 18 (drawn to a system): claim 18 is a system claim that corresponds to the method of claim 4. Therefore, claim 18 is rejected for at least the same reasons as the method of claim 4.
	Regarding dependent claim(s) 5, the combination of Santos, Gao and Carlsson discloses the method as in claim 3. However, the combination of Santos, Gao and Carlsson does not appear to specifically disclose wherein pre-processing comprises pre-appending one or more spaces to a string data value.
	In the same field of endeavor, Hercus discloses wherein pre-processing comprises pre-appending one or more spaces to a string data value (Hercus discloses that the product of data preprocessing is the final training set, and that a data preprocessing step is needed for machine learning and neural networks. It has thus become a universal technique used in computing in general. Training of such neural networks is accomplished, in its most basic form, by applying a specific input state to all the input neurons, selecting a specific output neuron to represent that input state, and adjusting the synaptic strengths or weights in the hidden layer. This reads on the claim concept of wherein pre-processing (see Hercus: Para. 0081, 0082 and 0089). Zero pre-adding (pre-appending) is a technique that preserves the original input size. Neural networks are based on the concept of three layers: an input neuron layer, a hidden neuron layer, and an output neuron layer. A double border or triple border of zeros may be added to maintain the original size of the input, depending on the size of the input and the size of the filters. Training a neural network might need to apply zero pre-adding (pre-appending) because the pre-adding ensures that the output has the same shape as the input data. This reads on the claim concept of appending one or more zeros (see Hercus: Para. 0066, 0086, 0087 and FIG. 2). A string is a data type used in programming to represent text rather than numbers. Without spaces between words, text would be completely unreadable and impossible to align. Spaces are a necessary tool for the creation of textual material. This reads on the claim concept of spaces to a string data value, see Hercus: Para. 0077, 0111, 0122 and 0128).
	Regarding dependent claim(s) 12 (drawn to a computer-readable storage medium): claim 12 is a computer-readable storage medium claim that corresponds to the method of claim 5. Therefore, claim 12 is rejected for at least the same reasons as the method of claim 5.
	Regarding dependent claim(s) 19 (drawn to a system): claim 19 is a system claim that corresponds to the method of claim 5. Therefore, claim 19 is rejected for at least the same reasons as the method of claim 5.
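For illustration only, the pre-processing recited in claims 4 and 5 (pre-appending zeros to a numerical data value and spaces to a string data value) can be sketched in Python as follows. The sketch assumes that pre-appending means left-padding a value to a fixed width, which this action does not state; the function names and the width parameter are hypothetical.

```python
# Hypothetical sketch: left-pad values to a fixed width before encoding.
# The fixed-width interpretation and all names are assumptions.

def pad_numeric(value, width):
    """Pre-append zeros so the numeric value occupies `width` characters."""
    return str(value).zfill(width)

def pad_string(value, width):
    """Pre-append spaces so the string value occupies `width` characters."""
    return value.rjust(width)

padded_number = pad_numeric(42, 6)     # "000042"
padded_text = pad_string("abc", 6)     # "   abc"
```

Values that already meet or exceed the width are returned unchanged, so the padding never truncates data.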
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Santos Moraes et al. (US 2019/0155904 A1, hereinafter Santos Moraes) in view of Gao et al. (US 2017/0032035 A1, hereinafter Gao), in view of Carlsson (US 2016/0267397 A1, hereinafter Carlsson) and in view of Brewster et al. (US 2017/0262491 A1, hereinafter Brewster).
Regarding dependent claim(s) 7, the combination of Santos, Gao and Carlsson discloses the method as in claim 1. However, the combination of Santos, Gao and Carlsson does not appear to specifically disclose further comprising determining a column type for each column of the plurality of columns.
In the same field of endeavor, Brewster discloses further comprising determining a column type for each column of the plurality of columns (The first data set and the second data set are appended according to at least the identified plurality of matching columns and the user specification to generate a resulting data set. Brewster further discloses the number of spaces in the cells of the column, the number of punctuations in the cells of the column, the average length of values in the cells of the column, the variance of cell values in the column, the total number of words in cells of the column, the average number of words in cells of the column, and the number of symbol type transitions in cells of the column. This reads on the claim concept of comprising determining a column type for each column of the plurality of columns, see Brewster: Para. 0020, 0021, 0025, 0040, 0048, 0050, 0051 and FIG. 9).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the first data set and the second data set using deep neural networks within a multi-dimensional vector to a loss function of Santos, Gao and Carlsson in order to incorporate a column type, as disclosed by Brewster, since both are directed to converting each value in a column to a number. Consider a dataset of bridges having a column names columns types. There will be many more columns in the dataset, to understand label encoding. A label encoder converts each class under a specified feature to a numerical value. The label encoder is applied on each of the categorical columns. Each row represents a sample and each column represents a feature. One-hot encoding refers to splitting the column which contains numerical categorical data into many columns depending on the number of categories present in that column. Each column contains "0" or "1" corresponding to which column it has been placed in. The data in the column usually denotes a category or value of the category, including when the data in the column is label encoded. Encoders are used to convert categorical data, or text data, into numbers, which predictive models can better understand. Fitting a neural network involves using a training dataset to update the model weights to create a good mapping of inputs to outputs. Training a neural network involves using an optimization algorithm to find a set of weights to best map inputs to outputs. The problem is hard, not least because the error surface is non-convex, contains local minima and flat spots, and is highly multidimensional. The key to the efficacy of neural networks is that they are extremely adaptive and learn very quickly. Each node weighs the importance of the input it receives from the nodes before it. The inputs that contribute the most towards the right output are given the highest weight.
A multilayer perceptron has three or more layers. It is used to classify data that cannot be separated linearly. It is a type of artificial neural network that is fully connected, because every single node in a layer is connected to each node in the following layer. Incorporating the teachings of Brewster into Santos, Gao and Carlsson would produce, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns, a plurality of matching columns and a plurality of nonmatching columns, as disclosed by Brewster (see Abstract).
Regarding dependent claim(s) 14 (drawn to a computer-readable storage medium): claim 14 is a computer-readable storage medium claim that corresponds to the method of claim 7. Therefore, claim 14 is rejected for at least the same reasons as the method of claim 7.
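For illustration only, determining a column type for each column, as discussed for claims 7 and 14, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Brewster's method: the two-way numeric/string rule, the function names, and the sample table are hypothetical, and only three of the per-column statistics listed above are computed. Columns are assumed to be non-empty lists of strings.

```python
# Hypothetical sketch: per-column statistics and a simple type decision.
# The typing rule and all names are assumptions, not taken from Brewster.

def column_features(values):
    """Compute a few of the cell statistics named in the rejection."""
    return {
        "spaces": sum(v.count(" ") for v in values),
        "avg_length": sum(len(v) for v in values) / len(values),
        "total_words": sum(len(v.split()) for v in values),
    }

def column_type(values):
    """Return "numeric" if every cell parses as a float, else "string"."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return "string"
    return "numeric"

table = {"age": ["30", "41"], "city": ["New York", "Oslo"]}
types = {name: column_type(cells) for name, cells in table.items()}
city_features = column_features(table["city"])
```

A type determined this way could then select which encoder handles each column; a fuller version would presumably combine several statistics rather than rely on a single parse test.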
Examiner's Notes
Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner, and the additional related prior art made of record that is considered pertinent to applicant's disclosure to further show the general state of the art.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOHANES Demiss KELEMEWORK whose telephone number is (571)272-8772.  The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at 571-272-0631.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/YOHANES D KELEMEWORK/Examiner, Art Unit 2164
/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164