DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A Request for Continued Examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission, the Response to Final Office Action (“Response”) filed 25 March 2022, has been entered.
 
Status of the Claims
The currently pending claims in the present application are claims 1-27 as presented in the Response.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 5, 8-11, 14, 17-20, 23, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pat. App. Pub. No. 2002/0143752 A1 to Plunkett et al. (“Plunkett”), in view of U.S. Pat. No. 10,902,344 B1 to Kenthapadi et al. (“Kenthapadi”), further in view of U.S. Pat. App. Pub. No. 2017/0300814 A1 to Shaked et al. (“Shaked”), and further in view of U.S. Pat. App. Pub. No. 2002/0002479 A1 to Almog et al. (“Almog”).
Regarding independent claim 1, Plunkett teaches the following limitations:
“A computer-implemented method of predictive benchmarking, the method comprising: collecting, by a number of processors, wage data from a number of sources, wherein the wage data comprises a number of dimensions.” Plunkett teaches, in para. [0027], “The server 150 includes a compensation calculation module 160.” Plunkett teaches, in para. [0027], “The server 150 includes database 155 (located in the storage drive 154) that is adapted to store various information concerning compensation by job title and geographic location. According to one embodiment, the compensation information of the various jobs are gathered periodically on a local, national and international basis and categorized by job title and geographic location.” The operation of the server, in Plunkett, reads on the claimed “computer-implemented method.” The compensation calculation, in Plunkett, reads on the claimed “method of predictive benchmarking.” The gathering of compensation information by the server, in Plunkett, reads on the claimed “collecting, by a number of processors, wage data from a number of sources.” The categorization parameters, in Plunkett, read on the claimed “wage data comprises a number of dimensions.”
“Receiving, by a number of processors, a user request for a number of wage benchmark forecasts.” Plunkett teaches, in para. [0029], “With reference to FIG. 2, at stage 202 of the method 200, a user of system 100 enters a specified Uniform Resonance Locator (URL) that connects the user to the Website of the server 150 containing the compensation calculation module 160. Upon access to the server 150, the user is greeted with a home page (stage 204) that allows the user to perform a compensation search at the server 150.” Plunkett teaches, in para. [0033], “The salaries are searched in the database 155 based on the job title but in any location and the retrieved salaries are compiled in a form suitable for graphical representation.” Receiving data tied to a compensation search, in Plunkett, reads on the claimed “receiving, by a number of processors, a user request for a number of wage benchmark forecasts.”
“Forecasting, for” “periods, by a number of processors, a number of wage benchmarks” “according to parameters in the user request.” Plunkett teaches, in para. [0041], “Referring now to FIG. 10, a salary comparison page 1000, which may be an extension of the salary report page 900 provides for graphical comparisons of salaries that the user may make in conjunction with the salary graph associated with a job title.” Plunkett teaches, in para. [0043], “With reference to FIG. 12, a comparison chart 1201 is produced that shows a salary graph 1202 for the job title at the first selected location and a salary graph 1204 for the job title at the location specified in the location entry 1003. At stage 425, the user may desire to compare a selected job title with a different but related job title at the same location previously selected by the user. The user enters a related job title in an job title entry 1005 and clicks the ‘Go’ button for a related job title. At stages 426-429, the server retrieves salary information based on the entered related job title and previously selected location. With reference to FIG. 13, a comparison chart 1301 is produced that shows a salary graph 1302 for the job title and a salary graph 1304 for the related job title at the selected location.” Providing multiple salary graphs and related information in Plunkett, based on user inputs, reads on the claimed “forecasting for” “periods, by a number of processors, a number of wage benchmarks” “according to parameters in the user request.” See also the references to “job seekers” in para. [0006] of Plunkett, and “expected to earn” phrasing in FIGS. 9, 9A, and 11-14 of Plunkett.
Kenthapadi teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by Plunkett:
“Preprocessing, by a number of processors, the wage data.” Kenthapadi teaches, in col. 2, ll. 45-49, “A client device 102 may utilize a confidential data frontend 104 to submit confidential information to a confidential data backend 106.” Kenthapadi teaches, in col. 5, ll. 39-42, “It should be noted that the information obtained by the databus listener 110 from the confidential information database 108 and placed in the backend queue 112 is anonymized.” Kenthapadi teaches, in col. 6, ll. 52-59, “In FIG. 2A, the user is prompted to enter a base salary in a text box 202, with a drop-down menu providing options for different time periods by which to measure the base salary (e.g., per year, per month, per hour, etc.). Additionally, the user may be identified by name at 204, the user's title may be identified at 206, and the user's current employer may be identified at 208.” Kenthapadi teaches, in col. 10, ll. 42-47, “a machine learning model is utilized to predict confidential data values for a member based on other members' submitted confidential data values and a social networking profile for the member. Specifically, a confidential data value prediction model is trained using a machine learning algorithm.” Anonymizing salary data, in Kenthapadi, among a number of other forms of data processing, including performing machine learning, reads on the claimed “preprocessing, by a number of processors, the wage data.”
Kenthapadi describes, in col. 3, ll. 5-7, obtaining and using salary information, similar to the claimed invention and to Plunkett. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data gathering processes of Plunkett, to include the data handling or processing steps of Kenthapadi, to address privacy concerns of users (see col. 1, ll. 19 and 20).
Shaked teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by the combination of Plunkett and Kenthapadi:
“Training, by a number of processors, a wide linear part of a wide-and-deep model to emulate benchmarks and to memorize exceptions and co-occurrence of dimensions in the wage data.” Shaked teaches, in para. [0019], “FIG. 1 is a block diagram of an example of a wide and deep machine learning model 102 that includes a deep machine learning model 104, a wide machine learning model 106, and a combining layer 134.” Shaked teaches, in para. [0034], “The deep machine learning model 104 is a deep model that includes an embedding layer 150 and a deep neural network 130.” Shaked teaches, in para. [0035], “the deep machine learning model 104 is configured to process a first set of features included in the model input of the wide and deep learning model 102 to generate a deep model intermediate predicted output. For example, the deep machine learning model 104 is configured to process the first set of features 108-114. The embedding layer can apply embedding functions to one or more of the first set of features 108-114. For example, the embedding layer 150 applies embedding functions 124-128 to features 110-114. In some cases, the features that are processed by the embedding layer are sparse, categorical features such as user features (e.g., country, language, and demographics), contextual features (e.g., device, hour of the day, and day of the week), and impression features (e.g., app age, historical statistics of an app).” Shaked teaches, in para. [0037], “The wide machine learning model 106 is a wide and shallow model, e.g., a generalized linear model 138, that is configured to process a second set of features (e.g., features 116-122) included in the model input of the wide and deep learning model 102 and to generate a wide model intermediate predicted output.” Shaked teaches, in para. 
[0040], “The combined machine learning model 102 also includes a combining layer 134 that is configured to process the deep model intermediate predicted output generated by the deep machine learning model 104 and the wide model intermediate predicted output generated by the wide machine learning model 106 to generate the predicted output 136.” Shaked teaches, in para. [0054], “FIG. 3 is a flow diagram of an example process 300 for training a machine learning system that includes a wide and deep learning model.” Shaked teaches, in para. [0055], “To determine trained values of the parameters of the wide model and of the deep model, the system trains the combined model on training data.” Shaked teaches, in para. [0058], “The system then trains the combined model by, for each of the training inputs, processing the features of the training input using the deep machine learning model to generate a deep model intermediate predicted output for the training input in accordance with current values of parameters of the deep machine learning model (step 304).” Shaked teaches, in para. [0059], “The system processes the features of the training input using the wide machine learning model to generate a wide model intermediate predicted output for the training input in accordance with current values of parameters of the wide machine learning model (step 306).” Applying the wide machine learning model and related training to the data processed by the machine learning model, of the combination of Plunkett and Kenthapadi, reads on the claimed “training, by a number of processors, a wide linear part of a wide-and-deep model to emulate benchmarks and to memorize exceptions and co-occurrence of dimensions in the wage data.”
“Training, by a number of processors, a deep part of the wide-and-deep model to generalize rules for wage predictions across employment sectors based on relationships between dimensions, wherein the deep part is trained concurrently with the wide linear part.” See the passages of Shaked and the applied rationales of the immediately preceding bullet point. Shaked teaches, in para. [0056], “as described in FIG. 3, the system trains the wide model and the deep model jointly.” Shaked teaches, in para. [0060], “The system then processes the deep model intermediate predicted output and the wide model intermediate predicted output for the training input using the combining layer to generate a predicted output for the training input (step 308).” Shaked teaches, in para. [0061], “The system then determines an error between the predicted output for the training input and the known output for the training input. In addition, the system backpropagates a gradient determined from the error through the combining layer to the wide machine learning model and the deep machine learning model to jointly adjust the current values of the parameters of the deep machine learning model and the wide machine learning model in a direction that reduces the error (step 310). Furthermore, through the method of backpropagation, the system can send an error signal to the deep learning model, which allows the deep learning model to adjust the parameters of its internal components, e.g., the deep neural network and the set of embedding functions, though successive stages of backpropagation. 
The system can also send an error signal to the wide learning model to allow the wide learning model to adjust the parameters of the generalized linear model.” Applying the deep part of the model and related training to the data processed by the machine learning model of the combination of Plunkett and Kenthapadi reads on the claimed “training, by a number of processors, a deep part of the wide-and-deep model to generalize rules for wage predictions across employment sectors based on relationships between dimensions.” Jointly training the wide model and the deep model in Shaked reads on the claimed “wherein the deep part is trained concurrently with the wide linear part.”
“Wherein linear coefficients produced by the wide linear part are summed with nonlinear coefficients produced by the deep part according to parameters in the user request.” Shaked teaches, in para. [0037], “The wide machine learning model 106 is a wide and shallow model, e.g., a generalized linear model 138.” Shaked teaches, in para. [0046], “The combining embedding function can merge the respective floating point vector using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination.” Shaked teaches, in para. [0051], “The deep network includes multiple layers with at least one layer including a non-linear transformation. A non-linear transformation can be defined based on values of a respective set of parameters.” Shaked teaches, in para. [0052], “The system processes a second set of features from the obtained features using a wide machine learning model to generate a wide model intermediate predicted output (step 206).” Shaked teaches, in para. [0053], “The system processes the deep model output and the wide model output to generate a predicted output using a combining layer (step 208). 
Generally, the combining layer combines the deep model output and the wide model output, e.g., by computing a sum or a weighted sum of the two outputs, to generate a combined output and then generates the predicted output from the combined output.” Data or other elements related to a linear function of the wide model (generalized linear model) in Shaked read on the claimed “linear coefficients produced by the wide linear part.” Data or other elements related to a non-linear transformation in Shaked read on the claimed “nonlinear coefficients produced by the deep part.” Foundational data on which models are built and trained in Shaked reads on the claimed “parameters in the user request.” Computing a sum of a deep model output and a wide model output in Shaked reads on the claimed “summed” step.
Shaked describes a machine learning model (see abstract), similar to the claimed invention and the combination of Plunkett and Kenthapadi. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the machine learning model of the combination of Plunkett and Kenthapadi, to include a wide and deep machine learning model as in Shaked, to obtain benefits of memorization and generalization, resulting in better outputs, as taught by Shaked (see para. [0013]).
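For illustration only, the joint training and summing described in the quoted passages of Shaked (paras. [0053] and [0058]-[0061]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Shaked or from the application: the single tanh hidden layer, the squared-error loss, the learning rate, and all names (forward, train_step, train, DATA, PARAMS) and toy values are hypothetical.

```python
import math

def forward(x_wide, x_deep, params):
    """Return (prediction, hidden activations) for one training input."""
    # Wide part: a linear model over the wide features.
    wide = sum(w * x for w, x in zip(params["wide"], x_wide))
    # Deep part: one hidden layer with a tanh non-linear transformation
    # (an assumed, minimal stand-in for a deep neural network).
    hidden = [math.tanh(sum(w * x for w, x in zip(row, x_deep)))
              for row in params["hidden"]]
    deep = sum(v * h for v, h in zip(params["out"], hidden))
    # Combining step: plain sum of the two intermediate outputs.
    return wide + deep, hidden

def train_step(x_wide, x_deep, target, params, lr=0.05):
    """One joint update: the same error adjusts both parts."""
    pred, hidden = forward(x_wide, x_deep, params)
    err = pred - target  # gradient of 0.5 * err**2 with respect to pred
    for j, row in enumerate(params["hidden"]):
        # Backpropagate through the output weight and the tanh unit,
        # using the output weight's value from before its own update.
        grad_pre = err * params["out"][j] * (1.0 - hidden[j] ** 2)
        params["out"][j] -= lr * err * hidden[j]
        for k in range(len(row)):
            row[k] -= lr * grad_pre * x_deep[k]
    for i in range(len(params["wide"])):
        params["wide"][i] -= lr * err * x_wide[i]
    return 0.5 * err * err

def train(data, params, epochs=200):
    """Return the summed loss of each epoch."""
    return [sum(train_step(xw, xd, y, params) for xw, xd, y in data)
            for _ in range(epochs)]

# Hypothetical toy inputs: (wide features, deep features, known output).
DATA = [([1.0], [0.5], 1.0), ([0.0], [1.0], 0.5), ([1.0], [-1.0], 0.2)]
PARAMS = {"wide": [0.0], "hidden": [[0.1], [-0.2]], "out": [0.3, 0.1]}
```

Calling train(DATA, PARAMS) updates the wide and deep parameters in the same pass, so neither part is trained separately from the other.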
Almog teaches limitations below of independent claim 1 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, and Shaked:
The claimed “forecasting” is “for future periods.” Almog teaches, in para. [0079], “server 22 optionally estimates changes in salaries expected in the future.” Almog teaches, in para. [0080], “Optionally, the rate at which the salaries of certain workers increase, the rate at which job openings are filled, and/or the changes in the numbers of workers in certain fields are used to estimate the salaries in certain industries in the future.” The estimating of future salaries and/or changes to salaries in the future, in Almog, reads on the claimed “forecasting for future periods.”
“Displaying, by a number of processors, the wage benchmark forecasts.” Plunkett teaches, in para. [0041], “Referring now to FIG. 10, a salary comparison page 1000, which may be an extension of the salary report page 900 provides for graphical comparisons of salaries that the user may make in conjunction with the salary graph associated with a job title.” Displaying salary graphs and related information, as depicted in the screenshots of Plunkett, when Plunkett is modified to include the future salary estimates of Almog, reads on the claimed “displaying, by a number of processors, the wage benchmark forecasts.”
Almog describes systems and methods used for job placement (see abstract) and job hunting (see paras. [0002], [0003], and [0005]), similar to the claimed invention and to the combination of Plunkett, Kenthapadi, and Shaked. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have widened the salary forecasts of the combination of Plunkett, Kenthapadi, and Shaked, to include forecasts for future salaries, as in Almog, to provide relevant data for career planning as trends change, as taught by Almog (see para. [0079]).
Regarding claim 2, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 1, wherein wage benchmarks comprise at least one of: average annual base salary; median annual base salary; percentiles of annual base salary; average hourly rate; median hourly rate; or percentiles of hourly rate.” Plunkett teaches, in para. [0034], “the line graph 902 shows a median base salary 904 of $49,769.”
Regarding claim 5, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 1, wherein the dimensions comprise at least one of: region; subregion; work state; metropolitan and micropolitan statistical area codes; combined metropolitan statistical area codes; North American Industry Classification System codes; industry sector; industry subsector; industry supersector; industry combo; industry crosssector; employee headcount band; employer revenue band; job title; occupation; job level; or tenure.” Plunkett teaches, in para. [0027], “the compensation information of the various jobs are gathered periodically on a local, national and international basis and categorized by job title and geographic location.”
Regarding claim 8, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 1, wherein cross terms provide sharing information between pairs of dimensions, and wherein dimensions are added to correct for the outliers in the wage data.” As explained above, Plunkett teaches elements that read on the claimed “wage data.” Shaked teaches, in para. [0038], “the wide machine learning model 106 is configured to process both the original input features (e.g. features 120 and 122) in the second set of features and transformed features generated from the other features (e.g., features 116-118), e.g., using a cross-product feature transformation 132, to generate the wide model intermediate output. In some cases, the cross-product feature transformation 132 is applied to categorical features.” Multiple features that are involved in a cross-product feature transformation in Shaked read on the claimed “cross terms provide sharing information between pairs of dimensions.” Providing additional features to the cross-product feature transformation in Shaked reads on the claimed “dimensions are added to correct for the outliers,” in that a resultant effect of adding features in Shaked would be to correct for any outliers. The reasoning for combining the teachings of Plunkett and Shaked, with the teachings of the other cited references, in the rejection of claim 1, also applies to this rejection of claim 8.
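For illustration only, a cross-product transformation over pairs of categorical features, of the general kind referenced in para. [0038] of Shaked, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name cross_terms, the key format, and the sample record are hypothetical and do not appear in any cited reference.

```python
from itertools import combinations

def cross_terms(record, dimensions):
    """Build one binary crossed feature per pair of categorical dimensions."""
    crossed = {}
    for a, b in combinations(dimensions, 2):
        # The crossed feature is active (1) only for this exact pair of values.
        crossed[f"{a}={record[a]}&{b}={record[b]}"] = 1
    return crossed

# Hypothetical wage record with three categorical dimensions.
record = {"job_title": "nurse", "region": "west", "industry": "health"}
features = cross_terms(record, ["job_title", "region", "industry"])
```

Each resulting key ties two dimension values together, which is how a linear model can memorize the co-occurrence of a specific pair.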
Regarding claim 9, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 1, wherein dimension embeddings map benchmark dimensions to lower-dimensional vectors, wherein categories predefined as similar to each other have values within a predefined proximity at one or more coordinates.” Shaked teaches, in para. [0035], “The embedding layer can apply embedding functions to one or more of the first set of features 108-114. For example, the embedding layer 150 applies embedding functions 124-128 to features 110-114. In some cases, the features that are processed by the embedding layer are sparse, categorical features such as user features (e.g., country, language, and demographics), contextual features (e.g., device, hour of the day, and day of the week), and impression features (e.g., app age, historical statistics of an app). Other features that are not processed by the embedding layer may include continuous features such as a number of installations of a software application. Each of the embedding functions 124-128 applies a transformation to each of the features 110-114 that maps each of the features 110-114 to a respective numeric embedding, e.g., a floating point vector presentation of the feature.” Shaked teaches, in para. [0046], “In order to identify the respective floating point vectors, the parallel embedding function may use a single look up table or multiple different look up tables. 
For example, for the ordered list {‘Atlanta’, ‘Hotel’}, the parallel embedding function may map ‘Atlanta’ to a vector [0.1, 0.2, 0.3] and ‘Hotel’ to [0.4, 0.5, 0.6], and then output the sum of the two vectors, i.e., [0.5, 0.7, 0.9].” The embeddings in Shaked read on the claimed “dimension embeddings.” The relationships between embeddings and vectors in Shaked read on the claimed “embeddings map benchmark dimensions to lower-dimensional vectors.” The assigning of values to related terms in Shaked reads on the claimed “categories predefined as similar to each other have values within a predefined proximity at one or more coordinates,” wherein values of the summed vectors in Shaked read on the claimed “one or more coordinates.” The reasoning for combining the teachings of Plunkett and Shaked, with the teachings of the other cited references, in the rejection of claim 1, also applies to this rejection of claim 9.
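For illustration only, the look-up-and-sum example quoted from para. [0046] of Shaked can be reproduced as follows. The vector values are those given in the quoted passage; the names LOOKUP and embed_and_sum are hypothetical, and a single dictionary is assumed as the look up table.

```python
# Look up table using the values from the quoted example.
LOOKUP = {"Atlanta": [0.1, 0.2, 0.3], "Hotel": [0.4, 0.5, 0.6]}

def embed_and_sum(tokens, table):
    """Map each token to its vector, then add the vectors component-wise."""
    vectors = [table[token] for token in tokens]
    return [sum(components) for components in zip(*vectors)]

summed = embed_and_sum(["Atlanta", "Hotel"], LOOKUP)
# Rounded for display because of floating point arithmetic.
print([round(value, 6) for value in summed])
```

The rounded result matches the summed vector stated in the quoted passage.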
Regarding independent claim 10, Plunkett teaches the following limitations:
“A system for predictive benchmarking, the system comprising:” “a storage device” “wherein the storage device stores program instructions; and a number of processors” “wherein the number of processors execute the program instructions to” perform steps listed below. Plunkett teaches, in para. [0026], “FIG. 1 illustrates a first embodiment of the present invention of a system 100 for calculating compensation.” Plunkett teaches, in para. [0027], “The server 150 includes a compensation calculation module 160. The compensation calculation module 160 may be implemented as computer instructions contained in a computer readable medium.” Plunkett teaches, in para. [0027], “The server comprises a processing unit 152, a storage device 154 such as a fixed disk drive and a main memory 156 that are in communication with each other.” A system for calculating compensation in Plunkett reads on the claimed “system for predictive benchmarking.” A storage device or other memory in Plunkett reads on the claimed “storage device stores program instructions.” A processing unit for executing computer instructions in Plunkett reads on the claimed “processors execute the program instructions.”
“Collect wage data from a number of sources, wherein the wage data comprises a number of dimensions.” Plunkett teaches, in para. [0027], “The server 150 includes database 155 (located in the storage drive 154) that is adapted to store various information concerning compensation by job title and geographic location. According to one embodiment, the compensation information of the various jobs are gathered periodically on a local, national and international basis and categorized by job title and geographic location.” The gathering of compensation information by a server in Plunkett reads on the claimed “collect wage data from a number of sources.” The categorizing of parameters in Plunkett reads on the claimed “wage data comprises a number of dimensions.”
Kenthapadi teaches limitations below of independent claim 10 that do not appear to be explicitly taught in their entirety by Plunkett:
“A bus system,” the claimed “storage device” being “connected to the bus system,” and the claimed “processors” being “connected to the bus system.” Kenthapadi teaches, in col. 21, ll. 43-46, “The machine 1300 may include processors 1310, memory/storage 1330, and I/O components 1350, which may be configured to communicate with each other such as via a bus 1302.” Kenthapadi teaches, in col. 21, ll. 64-67, “The memory/storage 1330 may include a memory 1332, such as a main memory, or other memory storage, and a storage unit 1336, both accessible to the processors 1310 such as via the bus 1302.” A bus of Kenthapadi reads on the claimed “bus system.” A memory/storage of Kenthapadi reads on the claimed “storage device,” a processor of Kenthapadi reads on the claimed “processors,” and their connections to the bus in Kenthapadi (see FIG. 13) read on the claimed “connected to the bus system.”
“Preprocess the wage data.” Kenthapadi teaches, in col. 2, ll. 45-49, “A client device 102 may utilize a confidential data frontend 104 to submit confidential information to a confidential data backend 106.” Kenthapadi teaches, in col. 5, ll. 39-42, “It should be noted that the information obtained by the databus listener 110 from the confidential information database 108 and placed in the backend queue 112 is anonymized.” Kenthapadi teaches, in col. 6, ll. 52-59, “In FIG. 2A, the user is prompted to enter a base salary in a text box 202, with a drop-down menu providing options for different time periods by which to measure the base salary (e.g., per year, per month, per hour, etc.). Additionally, the user may be identified by name at 204, the user's title may be identified at 206, and the user's current employer may be identified at 208.” Kenthapadi teaches, in col. 10, ll. 42-47, “a machine learning model is utilized to predict confidential data values for a member based on other members' submitted confidential data values and a social networking profile for the member. Specifically, a confidential data value prediction model is trained using a machine learning algorithm.” Anonymizing salary data in Kenthapadi, among a number of other forms of data processing, including performing machine learning, reads on the claimed “preprocess the wage data.”
Kenthapadi teaches, in col. 3, ll. 5-7, obtaining and using salary information, similar to the claimed invention and Plunkett. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the data gathering processes of Plunkett, to include the data handling or processing steps and related underlying computer architecture of Kenthapadi, to address privacy concerns of users (see col. 1, ll. 19 and 20).
Shaked teaches limitations below of independent claim 10 that do not appear to be explicitly taught in their entirety by the combination of Plunkett and Kenthapadi:
“Train a wide linear part of a wide-and-deep model to emulate benchmarks and to memorize exceptions and co-occurrence of dimensions in the wage data.” Shaked teaches, in para. [0019], “FIG. 1 is a block diagram of an example of a wide and deep machine learning model 102 that includes a deep machine learning model 104, a wide machine learning model 106, and a combining layer 134.” Shaked teaches, in para. [0034], “The deep machine learning model 104 is a deep model that includes an embedding layer 150 and a deep neural network 130.” Shaked teaches, in para. [0035], “the deep machine learning model 104 is configured to process a first set of features included in the model input of the wide and deep learning model 102 to generate a deep model intermediate predicted output. For example, the deep machine learning model 104 is configured to process the first set of features 108-114. The embedding layer can apply embedding functions to one or more of the first set of features 108-114. For example, the embedding layer 150 applies embedding functions 124-128 to features 110-114. In some cases, the features that are processed by the embedding layer are sparse, categorical features such as user features (e.g., country, language, and demographics), contextual features (e.g., device, hour of the day, and day of the week), and impression features (e.g., app age, historical statistics of an app).” Shaked teaches, in para. [0037], “The wide machine learning model 106 is a wide and shallow model, e.g., a generalized linear model 138, that is configured to process a second set of features (e.g., features 116-122) included in the model input of the wide and deep learning model 102 and to generate a wide model intermediate predicted output.” Shaked teaches, in para. 
[0040], “The combined machine learning model 102 also includes a combining layer 134 that is configured to process the deep model intermediate predicted output generated by the deep machine learning model 104 and the wide model intermediate predicted output generated by the wide machine learning model 106 to generate the predicted output 136.” Shaked teaches, in para. [0054], “FIG. 3 is a flow diagram of an example process 300 for training a machine learning system that includes a wide and deep learning model.” Shaked teaches, in para. [0055], “To determine trained values of the parameters of the wide model and of the deep model, the system trains the combined model on training data.” Shaked teaches, in para. [0058], “The system then trains the combined model by, for each of the training inputs, processing the features of the training input using the deep machine learning model to generate a deep model intermediate predicted output for the training input in accordance with current values of parameters of the deep machine learning model (step 304).” Shaked teaches, in para. [0059], “The system processes the features of the training input using the wide machine learning model to generate a wide model intermediate predicted output for the training input in accordance with current values of parameters of the wide machine learning model (step 306).” Applying the wide machine learning model and related training to the data processed by the machine learning model of the combination of Plunkett and Kenthapadi, reads on the claimed “train a wide linear part of a wide-and-deep model to emulate benchmarks and to memorize exceptions and co-occurrence of dimensions in the wage data.”
“Train a deep part of the wide-and-deep model to generalize rules for wage predictions across employment sectors based on relationships between dimensions, wherein the deep part is trained concurrently with the wide linear part.” See the passages of Shaked and the applied rationales of the immediately preceding bullet point. Shaked teaches, in para. [0056], “as described in FIG. 3, the system trains the wide model and the deep model jointly.” Shaked teaches, in para. [0060], “The system then processes the deep model intermediate predicted output and the wide model intermediate predicted output for the training input using the combining layer to generate a predicted output for the training input (step 308).” Shaked teaches, in para. [0061], “The system then determines an error between the predicted output for the training input and the known output for the training input. In addition, the system backpropagates a gradient determined from the error through the combining layer to the wide machine learning model and the deep machine learning model to jointly adjust the current values of the parameters of the deep machine learning model and the wide machine learning model in a direction that reduces the error (step 310). Furthermore, through the method of backpropagation, the system can send an error signal to the deep learning model, which allows the deep learning model to adjust the parameters of its internal components, e.g., the deep neural network and the set of embedding functions, though successive stages of backpropagation. 
The system can also send an error signal to the wide learning model to allow the wide learning model to adjust the parameters of the generalized linear model.” Applying the deep part of the model and related training to the data processed by the machine learning model of the combination of Plunkett and Kenthapadi, reads on the claimed “train a deep part of the wide-and-deep model to generalize rules for wage predictions across employment sectors based on relationships between dimensions.” Jointly training the wide model and the deep model in Shaked teaches the claimed “wherein the deep part is trained concurrently with the wide linear part.”
“Wherein linear coefficients produced by the wide linear part are summed with nonlinear coefficients produced by the deep part according to parameters in the user request.” Shaked teaches, in para. [0037], “The wide machine learning model 106 is a wide and shallow model, e.g., a generalized linear model 138.” Shaked teaches, in para. [0046], “The combining embedding function can merge the respective floating point vector using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination.” Shaked teaches, in para. [0051], “The deep network includes multiple layers with at least one layer including a non-linear transformation. A non-linear transformation can be defined based on values of a respective set of parameters.” Shaked teaches, in para. [0052], “The system processes a second set of features from the obtained features using a wide machine learning model to generate a wide model intermediate predicted output (step 206).” Shaked teaches, in para. [0053], “The system processes the deep model output and the wide model output to generate a predicted output using a combining layer (step 208). 
Generally, the combining layer combines the deep model output and the wide model output, e.g., by computing a sum or a weighted sum of the two outputs, to generate a combined output and then generates the predicted output from the combined output.” Data or other elements related to a linear function of the wide model (generalized linear model) in Shaked reads on the claimed “linear coefficients produced by the wide linear part.” Data or other elements related to a non-linear transformation in Shaked reads on the claimed “nonlinear coefficients produced by the deep part.” Foundational data on which models are built and trained in Shaked reads on the claimed “parameters in the user request.” Computing a sum of a deep model output and a wide model output in Shaked reads on the claimed “summed” step.
Shaked teaches a machine learning model (see abstract), similar to the claimed invention and the combination of Plunkett and Kenthapadi. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the machine learning model of the combination of Plunkett and Kenthapadi, to include a wide and deep machine learning model as in Shaked, to obtain benefits of memorization and generalization, resulting in better outputs, as taught by Shaked (see para. [0013]).
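For illustration only, the combining-by-sum and joint training described in the Shaked passages quoted above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not taken from Shaked, Plunkett, or Kenthapadi: the class name WideAndDeep, the single hidden layer, the squared-error loss, the learning rate, and the toy wage rows are all hypothetical.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


class WideAndDeep:
    """Toy regressor (hypothetical): prediction = wide(x_wide) + deep(x_deep)."""

    def __init__(self, n_wide, n_deep, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w = [0.0] * n_wide  # wide part: linear weights
        self.b = 0.0             # wide part: bias
        # deep part: one hidden layer with ReLU, then a linear output
        self.U = [[rng.uniform(-0.5, 0.5) for _ in range(n_deep)]
                  for _ in range(n_hidden)]
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x_wide, x_deep):
        wide_out = sum(wi * xi for wi, xi in zip(self.w, x_wide)) + self.b
        pre = [sum(u * x for u, x in zip(row, x_deep)) for row in self.U]
        hidden = [relu(p) for p in pre]
        deep_out = sum(vj * hj for vj, hj in zip(self.v, hidden))
        # combining step: plain sum of the two intermediate outputs
        return wide_out + deep_out, pre, hidden

    def train_step(self, x_wide, x_deep, y, lr=0.05):
        pred, pre, hidden = self.forward(x_wide, x_deep)
        err = pred - y  # gradient of 0.5 * err**2 with respect to pred
        # The same error signal is sent to both parts (joint training).
        for j in range(len(self.v)):
            # gradient reaching hidden unit j, using v[j] before it is updated
            grad_pre = err * self.v[j] * (1.0 if pre[j] > 0.0 else 0.0)
            self.v[j] -= lr * err * hidden[j]
            for k in range(len(x_deep)):
                self.U[j][k] -= lr * grad_pre * x_deep[k]
        for i in range(len(self.w)):
            self.w[i] -= lr * err * x_wide[i]
        self.b -= lr * err
        return 0.5 * err * err


def mean_loss(model, rows):
    return sum(0.5 * (model.forward(xw, xd)[0] - y) ** 2
               for xw, xd, y in rows) / len(rows)


# Hypothetical rows: (one-hot sector for the wide part,
#                     numeric features for the deep part, wage target)
rows = [
    ([1, 0], [0.5, 1.0], 3.0),
    ([0, 1], [1.0, 0.2], 1.0),
    ([1, 0], [0.1, 0.9], 3.2),
    ([0, 1], [0.8, 0.1], 0.9),
]

model = WideAndDeep(n_wide=2, n_deep=2, n_hidden=4)
loss_before = mean_loss(model, rows)
for _ in range(200):
    for xw, xd, y in rows:
        model.train_step(xw, xd, y)
loss_after = mean_loss(model, rows)
```

In this sketch a single error term drives the updates of both the linear weights and the hidden-layer weights within the same step, which is the sense in which the two parts are trained jointly rather than one after the other.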
Almog teaches limitations below of independent claim 10 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, and Shaked:
“Receive a user request for a number of wage benchmark forecasts.” Plunkett teaches, in para. [0029], “With reference to FIG. 2, at stage 202 of the method 200, a user of system 100 enters a specified Uniform Resonance Locator (URL) that connects the user to the Website of the server 150 containing the compensation calculation module 160. Upon access to the server 150, the user is greeted with a home page (stage 204) that allows the user to perform a compensation search at the server 150.” Plunkett teaches, in para. [0033], “The salaries are searched in the database 155 based on the job title but in any location and the retrieved salaries are compiled in a form suitable for graphical representation.” Receiving data tied to a compensation search in Plunkett reads on the claimed “receive a user request for a number of wage benchmark” values. Plunkett does not appear to explicitly teach that the values are forecasts, to the extent forecasts are for future periods (a contention made by the applicant in earlier communications tied to the present application). Almog teaches, in para. [0079], “server 22 optionally estimates changes in salaries expected in the future.” Almog teaches, in para. [0080], “Optionally, the rate at which the salaries of certain workers increase, the rate at which job openings are filled, and/or the changes in the numbers of workers in certain fields are used to estimate the salaries in certain industries in the future.” The estimated values for future salaries, in Almog, read on the claimed “wage benchmark forecasts.”
“Forecast a number of wage benchmarks” “according to parameters in the user request.” Plunkett teaches, in para. [0041], “Referring now to FIG. 10, a salary comparison page 1000, which may be an extension of the salary report page 900 provides for graphical comparisons of salaries that the user may make in conjunction with the salary graph associated with a job title.” Plunkett teaches, in para. [0043], “With reference to FIG. 12, a comparison chart 1201 is produced that shows a salary graph 1202 for the job title at the first selected location and a salary graph 1204 for the job title at the location specified in the location entry 1003. At stage 425, the user may desire to compare a selected job title with a different but related job title at the same location previously selected by the user. The user enters a related job title in an job title entry 1005 and clicks the ‘Go’ button for a related job title. At stages 426-429, the server retrieves salary information based on the entered related job title and previously selected location. With reference to FIG. 13, a comparison chart 1301 is produced that shows a salary graph 1302 for the job title and a salary graph 1304 for the related job title at the selected location.” Providing multiple salary graphs and related information in Plunkett, based on user inputs, reads on estimating “a number of wage benchmarks” “according to parameters in the user request.” Plunkett does not, however, appear to explicitly teach or suggest that the estimates include forecasts, to the extent that forecasts are for future periods. Almog teaches, in para. [0079], “server 22 optionally estimates changes in salaries expected in the future.” Almog teaches, in para. 
[0080], “Optionally, the rate at which the salaries of certain workers increase, the rate at which job openings are filled, and/or the changes in the numbers of workers in certain fields are used to estimate the salaries in certain industries in the future.” The estimating of values for future salaries, in Almog, reads on the claimed “forecast a number of wage benchmarks.”
“Display the wage benchmark forecasts.” Plunkett teaches, in para. [0041], “Referring now to FIG. 10, a salary comparison page 1000, which may be an extension of the salary report page 900 provides for graphical comparisons of salaries that the user may make in conjunction with the salary graph associated with a job title.” Displaying salary graphs and related information, as depicted in the screenshots of Plunkett, after modifying Plunkett to include the estimates of future salaries of Almog, reads on the claimed “display the wage benchmark forecasts.”
Almog describes systems and methods used for job placement (see abstract) and job hunting (see paras. [0002], [0003], and [0005]), similar to the claimed invention and to the combination of Plunkett, Kenthapadi, and Shaked. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have widened the salary forecasts of the combination of Plunkett, Kenthapadi, and Shaked, to include forecasts for future salaries, as in Almog, to provide relevant data for career planning as trends change, as taught by Almog (see para. [0079]).
Regarding claims 11, 14, 17, and 18, while the claims are of different scope relative to claims 2, 5, 8, and 9, the claims nevertheless recite limitations similar to the limitations recited by claims 2, 5, 8, and 9. Claims 11, 14, 17, and 18 are, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, and Almog for the same reasons as claims 2, 5, 8, and 9.
Regarding claims 19, 20, 23, 26, and 27, while the claims are of different scope relative to claims 1, 2, 5, 8, and 9 and claims 10, 11, 14, 17, and 18, the claims nevertheless recite limitations similar to the limitations recited by claims 1, 2, 5, 8, and 9 and claims 10, 11, 14, 17, and 18. Claims 19, 20, 23, 26, and 27 are, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, and Almog for the same reasons as claims 1, 2, 5, 8, and 9 and claims 10, 11, 14, 17, and 18.
Claims 3, 12, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Plunkett, in view of Kenthapadi, further in view of Shaked, further in view of Almog, and further in view of online publication Verbamour (2016, September 9). Can Tensorflow Wide and Deep model train to continuous values. Stack Overflow. https://stackoverflow.com/questions/39414369/can-tensorflow-wide-and-deep-model-train-to-continuous-values (hereinafter referred to as “Stack Overflow”).
Regarding claim 3, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 2, wherein the wide-and-deep model uses” “regression to calculate average base salary.” Plunkett teaches, in para. [0034], “the line graph 902 shows a median base salary 904 of $49,769,” which reads on the claimed “calculate average base salary.” As explained above, Kenthapadi teaches using a machine learning model to make predictions about compensation values like those in Plunkett. Shaked teaches, in para. [0007], “The combining layer can be a logistic regression layer that is configured to process the deep model intermediate predicted output generated by the deep machine learning model and the wide model intermediate predicted output generated by the wide machine learning model to generate a score that represents the likelihood that the particular objective will be satisfied if the content item is presented in the content presentation setting.” Using a regression layer in calculations in Shaked reads on the claimed “wide-and-deep model uses” “regression to calculate.” The rationales for combining the teachings of Plunkett, Kenthapadi, Shaked, and Almog, to reject claim 1, also apply to this rejection of claim 3.
Stack Overflow teaches limitations below of claim 3 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, Shaked, and Almog:
The claimed “regression” includes “linear regression.” Stack Overflow teaches “working with the Tensorflow Wide and Deep model” (see p. 1). Stack Overflow further teaches, regarding that model, “like the structure of the Wide and Deep model, especially the ability to run the linear regression” (see p. 1).
Stack Overflow teaches use of a wide and deep model (see p. 1), similar to the claimed invention and the combination of Plunkett, Kenthapadi, Shaked, and Almog. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the regression aspects in the combination of Plunkett, Kenthapadi, Shaked, and Almog to include the linear regression aspects of Stack Overflow, for the advantage of being able to determine how learnable data is, as taught by Stack Overflow (see p. 1).
Regarding claim 12, while the claim is of different scope relative to claim 3, the claim nevertheless recites limitations similar to the limitations recited by claim 3. Claim 12 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Stack Overflow for the same reasons as claim 3.
Regarding claim 21, while the claim is of different scope relative to claim 3 and claim 12, the claim nevertheless recites limitations similar to the limitations recited by claim 3 and claim 12. Claim 21 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Stack Overflow for the same reasons as claim 3 and claim 12.
Claims 4, 13, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Plunkett, in view of Kenthapadi, further in view of Shaked, further in view of Almog, further in view of Zweig, J. (2018, March 27). Deep Quantile Regression in Tensorflow. Towards Data Science. https://towardsdatascience.com/deep-quantile-regression-in-tensorflow-1dbc792fe597 (hereinafter referred to as “Zweig”), and further in view of Hoaglin, D. (2013, May 28). Re: st: Quantile vs. Quartile regression. Stata. https://www.stata.com/statalist/archive/2013-05/msg00946.html (hereinafter referred to as “Hoaglin”).
Regarding claim 4, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 2, wherein the wide-and-deep model uses” “regression to calculate” “base salary.” Plunkett teaches, in para. [0034], “the line graph 902 shows a median base salary 904 of $49,769,” which reads on the claimed “calculate” “base salary.” As explained above, Kenthapadi teaches using a machine learning model to make predictions about compensation values like those in Plunkett. Shaked teaches, in para. [0007], “The combining layer can be a logistic regression layer that is configured to process the deep model intermediate predicted output generated by the deep machine learning model and the wide model intermediate predicted output generated by the wide machine learning model to generate a score that represents the likelihood that the particular objective will be satisfied if the content item is presented in the content presentation setting.” Using a regression layer in calculations in Shaked reads on the claimed “wide-and-deep model uses” “regression to calculate.” The rationales for combining the teachings of Plunkett, Kenthapadi, Shaked, and Almog, in the rejection of claim 1, also apply to this rejection of claim 4.
Zweig teaches limitations below of claim 4 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, Shaked, and Almog:
The claimed “regression” includes quantile “regression to calculate percentile.” Zweig teaches, on p. 1, “A key challenge in deep learning is how to get estimates on the bounds of predictors. Quantile regression, first introduced in the 70’s by Koenker and Bassett [1], allows us to estimate percentiles of the underlying conditional data distribution even in cases where they are asymmetric.”
Zweig teaches deep learning (see p. 1) similar to the claimed invention and the combination of Plunkett, Kenthapadi, Shaked, and Almog. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the regression used by the wide and deep model of the combination of Plunkett, Kenthapadi, Shaked, and Almog to include the quantile regression for estimating percentiles of Zweig, for giving insight on the relationship of the variability between predictors and responses, as taught by Zweig (see p. 1).
Hoaglin teaches limitations below of claim 4 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, Shaked, Almog, and Zweig:
The aforementioned quantile “regression” includes “quartile regression.” Hoaglin teaches, on p. 1, the relationship between quantile regression and quartile regression.
Hoaglin teaches statistical analyses (see p. 1) similar to those of the claimed invention and the combination of Plunkett, Kenthapadi, Shaked, Almog, and Zweig. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the quantile regression of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Zweig, to include quartile regression as in Hoaglin, as they are related forms of statistical analyses widely known in the prior art, as taught by Hoaglin (see p. 1).
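For illustration only, the relationship relied upon above, in which quantile regression estimates percentiles and quartile regression is the special case at the 25th, 50th, and 75th percentiles, can be sketched with the standard quantile (pinball) loss. This is a minimal sketch under stated assumptions and is not taken from Zweig or Hoaglin: the function names and the salary figures are hypothetical, and a constant prediction stands in for a trained model.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for one observation at quantile level tau."""
    diff = y_true - y_pred
    # under-prediction is weighted by tau, over-prediction by (1 - tau)
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def best_constant(values, tau):
    """Return the observed value that minimizes total pinball loss at tau."""
    return min(values,
               key=lambda c: sum(pinball_loss(v, c, tau) for v in values))


# Hypothetical base salaries, in thousands of dollars
salaries = [30, 40, 50, 60, 70, 80, 90, 100, 200]

# Quartiles are the quantiles at tau = 0.25, 0.50, and 0.75
q1 = best_constant(salaries, 0.25)
median = best_constant(salaries, 0.50)
q3 = best_constant(salaries, 0.75)
```

With these hypothetical figures, the minimizers are 50, 70, and 90 for the three levels. The single outlier of 200 does not move the median, which illustrates why percentile estimates remain informative for asymmetric distributions.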
Regarding claim 13, while the claim is of different scope relative to claim 4, the claim nevertheless recites limitations similar to the limitations recited by claim 4. Claim 13 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, Zweig, and Hoaglin for the same reasons as claim 4.
Regarding claim 22, while the claim is of different scope relative to claim 4 and claim 13, the claim nevertheless recites limitations similar to the limitations recited by claim 4 and claim 13. Claim 22 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, Zweig, and Hoaglin for the same reasons as claim 4 and claim 13.
Claims 6, 15, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Plunkett, in view of Kenthapadi, further in view of Shaked, further in view of Almog, and further in view of Ruder, S. (2017, March 21). Transfer Learning - Machine Learning’s Next Frontier. Sebastian Ruder. https://ruder.io/transfer-learning/ (hereinafter referred to as “Ruder”).
Regarding claim 6, Ruder teaches limitations below that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, Shaked, and Almog:
“The method of claim 1, wherein the wide-and-deep model is trained through transfer learning.” As explained above, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches elements reading on the claimed “wide-and-deep model is trained.” Ruder teaches, on p. 3, “The traditional supervised learning paradigm breaks down when we do not have sufficient labeled data for the task or domain we care about to train a reliable model,” and “Transfer learning allows us to deal with these scenarios by leveraging the already existing labeled data of some related task or domain.”
Ruder teaches machine learning principles (see title) involving deep learning (see p. 1), similar to the claimed invention and the combination of Plunkett, Kenthapadi, Shaked, and Almog. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the machine learning model and related training of the combination of Plunkett, Kenthapadi, Shaked, and Almog, to include the transfer learning of Ruder, because transfer learning can help with addressing novel scenarios and may be necessary for production-scale use of machine learning that goes beyond tasks and domains where labeled data is plentiful, as taught by Ruder (see p. 6).
Regarding claim 15, while the claim is of different scope relative to claim 6, the claim nevertheless recites limitations similar to the limitations recited by claim 6. Claim 15 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Ruder for the same reasons as claim 6.
Regarding claim 24, while the claim is of different scope relative to claim 6 and claim 15, the claim nevertheless recites limitations similar to the limitations recited by claim 6 and claim 15. Claim 24 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Ruder for the same reasons as claim 6 and claim 15.
Claims 7, 16, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Plunkett, in view of Kenthapadi, further in view of Shaked, further in view of Almog, and further in view of Dietz, M. (2017, May 21). Understand Deep Residual Networks--a simple, modular, learning framework that has redefined state-of-the-art. Michael Dietz. https://medium.com/@waya.ai/deep-residual-learning-9610bb62c355 (hereinafter referred to as “Dietz”).
Regarding claim 7, the combination of Plunkett, Kenthapadi, Shaked, and Almog teaches the following limitations:
“The method of claim 1, wherein the linear wide part of the model assists the deep part of the model.” See the rationales in the rejection of claim 1, and in particular, those involving the teachings of Shaked.
Dietz teaches limitations below of claim 7 that do not appear to be explicitly taught in their entirety by the combination of Plunkett, Kenthapadi, Shaked, and Almog:
The claimed “assists” step involves “residual learning.” Dietz teaches, on p. 5, “Wide Residual Networks showed the power of these networks is actually in residual blocks, and that the effect of depth is supplementary at a certain point. Aggregated Residual Transformations for Deep Neural Networks builds on this, and exposes a new dimension called cardinality as an essential network parameter, in addition to depth and width.” The relationship between a wide residual network, residuals, and a deep neural network in Dietz, reads on the claimed “assists with residual learning.”
Dietz teaches deep and wide learning (see p. 1), similar to the claimed invention and the combination of Plunkett, Kenthapadi, Shaked, and Almog. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the wide-and-deep model of the combination of Plunkett, Kenthapadi, Shaked, and Almog to utilize the residual learning of Dietz, because network depth is of crucial importance in neural network architectures, but deeper networks are more difficult to train; and a residual learning framework eases the training of these networks, and enables them to be substantially deeper--leading to improved performance in both visual and non-visual tasks, as taught by Dietz (see p. 2).
Regarding claim 16, while the claim is of different scope relative to claim 7, the claim nevertheless recites limitations similar to the limitations recited by claim 7. Claim 16 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Dietz for the same reasons as claim 7.
Regarding claim 25, while the claim is of different scope relative to claim 7 and claim 16, the claim nevertheless recites limitations similar to the limitations recited by claim 7 and claim 16. Claim 25 is, therefore, rejected under 35 USC 103 as obvious in view of the combination of Plunkett, Kenthapadi, Shaked, Almog, and Dietz for the same reasons as claim 7 and claim 16.



Response to Arguments
On pp. 9-14 of the Response, the applicant argues for reconsideration and withdrawal of the claim rejections under 35 USC 103. More specifically, the applicant argues, on p. 11 of the Response, “what Plunkett teaches” “is not forecasting for future periods (looking at historical trends to predict future directions) but instead, computation and presentation of a current median or average salary using current salary data stored in the system. Plunkett does not look backwards at historical changes. In particular, Plunkett cannot project forward in time to future values.” The applicant also argues, on pp. 11 and 12 of the Response, “Kenthapadi and Shaked fail to cure the deficiencies of Plunkett noted above.” The applicant’s arguments above have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. That is, the cited Almog reference, and not Plunkett alone, is relied upon in the claim rejections under 35 USC 103, to teach the claimed forecasting for future periods.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Such prior art includes the following:
U.S. Pat. No. 7,571,110 B2 to Tarr et al. describes providing compensation reports (see abstract), wherein FIG. 27 shows the ability to “Find out” both “The average compensation” and “Expected increases with experience.”
U.S. Pat. No. 8,001,057 B1 to Hill describes job seeking processes (see abstract), involving “salary forecasting” in a way that does not necessarily entail looking at future periods (see col. 12, ll. 5, 9, and 10).
U.S. Pat. App. Pub. No. 2010/0057659 A1 to Phelon et al. describes a career navigation system (see abstract), wherein, “Statistics are also available to predict future annual compensation, as opposed to just salary (e.g., the U.S. Department of Labor's Bureau of Labor Statistics' National Compensation Survey--Average Annual Compensation)” (see para. [0092]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS Y HO, whose telephone number is (571)270-7918. The examiner can normally be reached Monday through Friday, 9:30 AM to 5:30 PM Eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor, can be reached at 571-272-6787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THOMAS YIH HO/ Examiner, Art Unit 3624


/Jerry O'Connor/ Supervisory Patent Examiner, Group Art Unit 3624