DETAILED ACTION
Response to Arguments
In response to Applicant’s amendments to the specification and claims and Applicant’s remarks filed on 11/18/2021, Examiner withdraws the rejection of the benefit claim and the 112(b) rejections. Examiner also withdraws the objection to the drawings.
Applicant’s arguments with respect to claims 1 and 12 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/17/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
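The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.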
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-19 are rejected under 35 U.S.C. 103 as being unpatentable over Miller et al., "Key-value memory networks for directly reading documents," arXiv preprint arXiv:1606.03126 (2016), in view of Akerib et al., US 2015/0146491 A1, and in view of Ehrman et al., US 2015/0200009 A1.
Regarding claim 1, Miller teaches a system for natural language processing, the system comprising: a memory array having rows and columns, said memory array being divided into a similarity section initially storing a plurality of feature or key vectors (Miller, pg. 3, sec. 3.1 Model Description, fig. 1 (Key embeddings memory section), “[O]ne defines a memory, which is a possibly very large array of slots which can encode both long-term and short term… [i]n KV-MemNNs we define the memory slots as pairs of vectors $(k_1, v_1), \ldots, (k_M, v_M)$ and denote the question x.”), a SoftMax section in which to determine probabilities of occurrence of said feature or key vectors (Miller, pg. 3, sec. Key Addressing, fig. 1 (Softmax memory section), “[D]uring addressing, each candidate memory is assigned a relevance probability by comparing the question to each key: $p_{h_i} = \mathrm{Softmax}(A\phi_X(x) \cdot A\phi_K(k_{h_i}))$ where $\phi$ are feature maps of dimension D, A is a d $\times$ D matrix and $\mathrm{Softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$.”), a value section initially storing a plurality of modified feature vectors (Miller, pg. 3, sec. Value Reading, fig. 1 (Value embeddings), “[T]he values of the memories are read by taking their weighted sum using the addressing probabilities, and the vector o is returned: $o = \sum_i p_{h_i} A\phi_V(v_{h_i})$.” & see also Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works…we pre-trained the word vectors[i.e., embeddings]…matrices A and B which are constrained to be identical…before training KV-MemNNs.” Note: The matrix $A$ times $\phi_V(v_{h_i})$ produces word vectors that represent a value section storing a plurality of modified feature vectors, as shown in fig. 1 under the title Value embeddings); a similarity operation in said similarity section between a vector question and each said feature vector stored in each said indicated column (Miller, pgs. 3-4, sec. Value Reading, fig. 1 (Question, Question embedding, weighted average, O, Hops, $R_i$), “The memory access process is conducted by the ‘controller’ neural network using $q = A\phi_X(x)$ as the query. After receiving the result [vector] o, the query is updated with $q_2 = R_1(q + o)$ where R is d $\times$ d matrix.” Note: It is being interpreted that $q = A\phi_X(x)$ represents the vector question, vector o represents the weighted sum of each said feature vector stored in each said indicated column, and $q + o$ represents the similarity operation); a SoftMax operation in said SoftMax section to determine an associated SoftMax value for each said indicated feature vector (Miller, pg. 3, sec. Key Addressing, fig. 1 (Softmax memory section), “[D]uring addressing, each candidate memory is assigned a relevance probability by comparing the question to each key: $p_{h_i} = \mathrm{Softmax}(A\phi_X(x) \cdot A\phi_K(k_{h_i}))$ where $\phi$ are feature maps of dimension D, A is a d $\times$ D matrix and $\mathrm{Softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$.”); a multiplication operation (“$o = \sum_i p_{h_i} A\phi_V(v_{h_i})$.” Note: It is being interpreted that $p_{h_i}$ represents said associated Softmax value, $A\phi_V(v_{h_i})$ represents said modified feature vector stored in each indicated value embedding column of the memory network, vector o represents a vector sum operation in said value section to accumulate an attention vector sum of output, and $p_{h_i} A\phi_V(v_{h_i})$ represents the multiplication operation), said vector sum to be used to generate a new vector question for a further iteration or to generate an output value in a final iteration (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$ where R is d $\times$ d matrix. The memory access is then repeated…using a different matrix $R_i$ on each hop. The motivation for this is that new evidence can be combined into the query to focus on and retrieve more pertinent information in subsequent accesses. Finally, after a fixed number H hops, the resulting state of the controller is used to compute a final prediction over the possible outputs: $\hat{a} = \arg\max_{i=1,\ldots,C} \mathrm{Softmax}(q_{H+1}^{\top} B\phi_Y(y_i))$ where $y_i$ are the possible candidate outputs….”).
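The single-hop computation quoted from Miller above (key addressing, value reading, and query update) can be sketched as follows. This is a minimal illustration only, assuming the key embeddings and value embeddings have already been computed and are supplied as plain lists. The function names, the toy vectors, and the identity matrix standing in for the update matrix are hypothetical and are not taken from Miller or from the application.

```python
import math

def softmax(scores):
    # Softmax(z_i) = exp(z_i) / sum_j exp(z_j); subtracting the max first
    # leaves the result unchanged and avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(R, v):
    return [dot(row, v) for row in R]

def hop(q, key_emb, val_emb, R):
    """One memory hop: returns (p, o, q_next). All inputs are assumed pre-embedded."""
    # Key addressing: compare the query with every key embedding, then softmax.
    p = softmax([dot(q, k) for k in key_emb])
    # Value reading: multiply each value embedding by its probability and sum.
    o = [sum(p_i * v[j] for p_i, v in zip(p, val_emb)) for j in range(len(q))]
    # Query update: q_next = R (q + o).
    q_next = matvec(R, [qj + oj for qj, oj in zip(q, o)])
    return p, o, q_next

# Hypothetical toy data: two memory slots, embedding dimension 2.
q = [1.0, 0.0]
key_emb = [[1.0, 0.0], [0.0, 1.0]]
val_emb = [[0.0, 2.0], [4.0, 0.0]]
R1 = [[1.0, 0.0], [0.0, 1.0]]  # identity, chosen only to keep the example simple

p, o, q2 = hop(q, key_emb, val_emb, R1)
```

In this toy case the first key matches the query more closely, so its probability is the larger of the two and the first value embedding dominates the weighted sum. Repeating `hop` with `q2` and a different matrix corresponds to the further iterations described in the quoted passage.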
Miller does not teach: and an in-memory processor to activate said memory array to perform the following operations in parallel in each column. 
However, Akerib teaches: and an in-memory processor to activate said memory array to perform the following operations in parallel in each column (Akerib, paras. 0042-0045, fig. 2(150, 158, 160, 156, 116, 114), “Each cell 150 in a row may be connected to a read word line 158 (RE) and a write word line 160 (WE) through which each cell in the row may be activated for reading and writing respectively… MUXs 116 may be used to transfer data between MLB data sections 114…without having to output the data from memory and rewriting into memory. That is, by activating one or more MUXs 116, the data result of an operation performed in a column, may be transferred from the column in one MLB data section to one or more columns in other MLB data sections in the same MLB or other MLBs.” Note: It is being interpreted that the read word lines, write word lines, and the MUXs transferring data between the different columns of the MLBs represent an in-memory processor to activate said memory array to perform the following operations in parallel in each column). 
Accordingly, one of ordinary skill in the art would modify Miller’s system in view of Akerib to teach: and an in-memory processor to activate said memory array to perform the following operations in parallel in each column. The motivation to do so would be to construct a  memory network out of bit cells that can perform data parallel computations in memory without executing costly rewriting instructions; thus speeding up computation and/or inference time (Akerib, para. 0033, “Applicants have realized that the performance of devices which use in-memory computations may be vastly improved by dividing the device's memory into blocks of memory cells which may be individually accessed and which in parallel may carry out different operations. Applicants have additionally realized that performance may be further improved by using the results of the operations carried out in a memory logic block (MLB)… in other MLBs to perform other operations, without first having to output the result of each operation and rewriting them into memory for use by the other MLBs.”). 

However, Ehrman teaches: and a marker section, storing a marker vector specifying columns to be operated upon, wherein operations in one or more columns of said memory array are associated with one feature vector to be processed (Ehrman, paras. 0041-0045, see also fig. 1A, fig. 1B, and figs. 5A-5E, “Memory array 102 may include memory cells arranged in rows and columns, with the columns of cells connected together using either NOR-type architecture (for NOR Boolean operations) or a NAND-type architecture (for NAND operations), both of which are known in the art… [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data. The RSP data may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated.” Ehrman teaches: memory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data that may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated, and figures 5A-5E (i.e., and a marker section, storing a marker vector specifying columns to be operated upon, wherein operations in one or more columns of said memory array are associated with one feature vector to be processed)); specified by said marker vector (Ehrman, paras. 0041-0045, see also fig. 1A, fig. 1B, and figs. 5A-5E, “Memory array 102 may include memory cells arranged in rows and columns… [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data. The RSP data may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated.” Ehrman teaches: an RSP data section 108 which may store RSP data that may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated, and figures 5A-5E (i.e., specified by said marker vector)). 
Accordingly, one of ordinary skill in the art would modify Miller’s system in view of Ehrman. The motivation to do so would be to increase the functionality of memory arrays that load input data vertically into columns, thereby speeding up various math operations (Ehrman, para. 0035, “Applicants have realized that the functionality of memory devices with memory arrays suitable for loading input data vertically into columns, as is frequently done in CAMs (content addressable memories), may be increased by using wired-OR circuitry which may generate a signal responsive to positive identification of a data candidate in at least one of the columns. The wired-OR circuitry, hereinafter referred to as RSP (responder) signal circuitry, may perform Boolean OR operations on bit line data in most, if not all, bit lines in the memory array, to generate a RSP signal. This RSP signal may then be used internally in the device to communicate a RSP signal value (RSP value) to the data stored in the array. The RSP signal value, which may be communicated to

Regarding claim 2, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 1 wherein said memory array comprises operational portions, one portion per iteration of a natural language processing operation, each portion being divided into said similarity, Softmax, value (Miller, pg. 3, Figure 1 details the Key-Value Memory Network model for question answering in natural language processing, in which the different sections are responsible for different NLP operations, such as computing the inner product between the question $q_{i+1}$ and the key embeddings in the key addressing section, implementing the Softmax function in the Softmax section, and a value embeddings section that outputs a weighted average) and marker sections (Ehrman, paras. 0041-0045, see also fig. 1A, fig. 1B, and figs. 5A-5E, “Memory array 102 may include memory cells arranged in rows and columns, with the columns of cells connected together using either NOR-type architecture (for NOR Boolean operations) or a NAND-type architecture (for NAND operations), both of which are known in the art… [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data. The RSP data may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Miller with the above teachings of Ehrman for the same rationale stated at Claim 1.
wherein said memory array is one of: an SRAM, a non-volatile, a volatile, and a non-destructive array (Akerib, paras. 0042-0045, fig. 2(150, 158, 160, 156, 116, 114), “Each cell 150 in a row may be connected to a read word line 158 (RE) and a write word line 160 (WE) through which each cell in the row may be activated for reading and writing respectively. Each cell 150 in a column may be connected to a bit line 156. Cells 150 may include volatile memory cells or non-destructive (non-volatile) memory cells… MUXs 116 may be used to transfer data between MLB data sections 114…without having to output the data from memory and rewriting into memory. That is, by activating one or more MUXs 116, the data result of an operation performed in a column, may be transferred from the column in one MLB data section to one or more columns in other MLB data sections in the same MLB or other MLBs.”).
Accordingly, one of ordinary skill in the art would modify Miller’s system in view of Akerib to teach: wherein said memory array is one of: an SRAM, a non-volatile, a volatile, and a non-destructive array. The motivation to do so would be to have different memory cell technologies available in the memory array to manipulate the memory logic blocks’ bit lines based on voltage differences of the technology at hand (Akerib, para. 0044, fig. 2, “MUX 116 may connect bit line 156 in a column of an MLB data section 114 with bit lines 156 of one or more columns in the MLB data section above or below…The columns to which bit line 156 may connect through M[U]X 116 may include that directly above, or below, the bit line, and the adjacent column on each side…Through MUX 116, a voltage charge (data) on bit line 156 in cell column 162 may be transferred to bit line 156 in any one of columns 164, 166, 168, or the inverse.”). 
MUXs 116 may be used to transfer data between MLB data sections 114…without having to output the data from memory and rewriting into memory. That is, by activating one or more MUXs 116, the data result of an operation performed in a column, may be transferred from the column in one MLB data section to one or more columns in other MLB data sections in the same MLB or other MLBs.” Note: It is being interpreted that the bit cell rows of each MLB (memory logic block) connected together by the MUXs (multiplexers) represent a multiplicity of bit line processors each operating on one bit of data, and each MLB represents one per column of each said section). 
Accordingly, one of ordinary skill in the art would modify Miller’s system in view of Akerib to teach: wherein said memory array comprises a multiplicity of bit line processors, one per column of each said section, each said bit line processor operating on one bit of data of its associated section. The motivation to do so would be to implement simple Boolean functions in memory by using just multiplexers and memory logic blocks’ bit-lines to create complex Boolean functions for further processing (Akerib, para. 0045, fig. 2, “As an example, to write the result of a NAND or NOR operation performed in column 168 in MLB data section 114B to cell C32 (column 162) in MLB data section 114, MUX 116 connects (responsive to a command) bit 
Regarding claim 5, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 1 and also comprising a neural network feature extractor to generate said feature and modified feature vectors (Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works… we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 6, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 1 and wherein said feature vectors comprise features of a word, a sentence, or a document (Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works… we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 7, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 1 wherein said feature vectors are the output of a pre-trained neural network (Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works… we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 8, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 1 and also comprising a pre-trained neural network to generate an initial vector question (Miller, pg. 3, fig. 1, “The memory access process is conducted by the “controller” neural network using $q = A\phi_X(x)$ as the query.” Note: It is being interpreted that the controller neural network represents the pre-trained neural network that generates the initial question; see also Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works…we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 9, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 8 and also comprising a question generator to generate a further question from said initial vector question and said attention vector sum (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$ where R is a $d \times d$ matrix. The memory access is then repeated…using a different matrix $R_i$ on each hop. The motivation for this is that new evidence can be combined into the query to focus on and retrieve more pertinent…” Note: It is being interpreted that the vector $q$ represents said initial vector question and vector $o$ represents said attention vector sum).
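For illustration only, the query update quoted above, $q_2 = R_1(q + o)$, can be sketched in a few lines of Python. This sketch is not taken from Miller, Akerib, or Ehrman; the helper names `mat_vec` and `update_query` and the toy values are hypothetical, and the matrix is assumed to be stored as a list of rows.

```python
def mat_vec(matrix, vector):
    """Multiply a d x d matrix (a list of rows) by a length-d vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def update_query(R, q, o):
    """Return R (q + o): the query for the next hop, per the quoted formula."""
    summed = [qi + oi for qi, oi in zip(q, o)]
    return mat_vec(R, summed)


# Toy values assumed purely for illustration (d = 2).
R1 = [[1.0, 0.0], [0.0, 2.0]]
q = [1.0, 2.0]    # initial vector question
o = [0.5, -1.0]   # attention vector sum returned by value reading
q2 = update_query(R1, q, o)
```

Here `q` plays the role of the initial vector question and `o` the attention vector sum, so `q2` corresponds to the further question of claim 9 under the interpretation stated above.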
Regarding claim 10, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 9 wherein said question generator is a neural network (Miller, pgs. 3-4, fig. 1, “Finally, after a fixed number H hops, the resulting state of the controller is used to compute a final prediction over the possible outputs: $\hat{a} = \arg\max_{i=1,\dots,C} \mathrm{Softmax}(q_{H+1}^{T} B \phi_Y(y_i))$ where $y_i$ are the possible candidate outputs e.g. all the entities in the KB, or all possible candidate answer sentences in the case of a dataset like WIKIQA… The whole network is trained end-to-end, and the model learns to perform the iterative accesses to output the desired target a by minimizing a standard cross-entropy loss between $\hat{a}$ and the correct answer a. Backpropagation and stochastic gradient descent are thus used to learn the matrices A, B and $R_1, \dots, R_H$.”).
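For illustration only, the final prediction step quoted above, $\hat{a} = \arg\max_{i=1,\dots,C} \mathrm{Softmax}(q_{H+1}^{T} B \phi_Y(y_i))$, can be sketched as follows. This is a minimal sketch and not code from the cited references; the function names `softmax` and `predict` are hypothetical, and each entry of `candidate_embeddings` is assumed to already hold the embedded candidate $B\phi_Y(y_i)$.

```python
import math


def softmax(scores):
    """Softmax(z_i) = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(q_final, candidate_embeddings):
    """Return the index of the highest-probability candidate and all probabilities.

    candidate_embeddings[i] stands in for B phi_Y(y_i) (assumed precomputed).
    """
    scores = [sum(a * b for a, b in zip(q_final, emb))
              for emb in candidate_embeddings]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy values assumed purely for illustration (d = 2, C = 3).
q_final = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
best, probs = predict(q_final, candidates)
```

Because Softmax is monotonic, the selected index is the candidate with the largest inner product with the final query state; the probabilities matter for the cross-entropy training loss described in the quotation.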
Regarding claim 11, Miller in view of Akerib and in view of Ehrman teaches the system according to claim 9 and wherein said question generator is implemented as a matrix multiplier on bit lines of said memory array (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$ where R is a $d \times d$ matrix. The memory access is then repeated…using a different matrix $R_i$ on each hop.” Note: It is being interpreted that the matrix $R_1$ times the vector $q + o$ represents a matrix multiplier on bit lines of said memory array).
Regarding claim 12, Miller teaches a method for natural language processing, the method comprising: having a memory array having rows and columns, said memory array being divided into a similarity section initially storing a plurality of feature or key vectors(Miller, pg. 3, sec.  [i]n KV-MemNNs we define the memory slots as pairs of vectors                         
                            
                                
                                    
                                        
                                            k
                                        
                                        
                                            1
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            v
                                        
                                        
                                            1
                                        
                                    
                                
                            
                            …
                            ,
                            (
                            
                                
                                    k
                                
                                
                                    M
                                
                            
                            ,
                             
                            
                                
                                    v
                                
                                
                                    M
                                
                            
                            )
                        
                     and denote the question x.”), a SoftMax section in which to determine probabilities of occurrence of said feature or key vectors(Miller, pg. 3, sec. Key Addressing, fig. 1(Softmax memory section), “[D]uring addressing, each candidate memory is assigned a relevance probability by comparing the question to each key:                         
                            
                                
                                    p
                                
                                
                                    
                                        
                                            h
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            =
                            S
                            o
                            f
                            t
                            m
                            a
                            x
                            (
                            A
                            
                                
                                    ϕ
                                
                                
                                    x
                                
                            
                            
                                
                                    x
                                
                            
                            ⋅
                            A
                            
                                
                                    ϕ
                                
                                
                                    K
                                
                            
                            (
                            
                                
                                    k
                                
                                
                                    
                                        
                                            h
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            )
                        
                     where                         
                            ϕ
                        
                     are feature maps of dimension D, A is a d                         
                            ×
                        
                     D matrix and Softmax                        
                            
                                
                                    
                                        
                                            z
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            =
                            
                                
                                    e
                                
                                
                                    
                                        
                                            z
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            /
                            
                                
                                    ∑
                                    
                                        j
                                    
                                
                                
                                    
                                        
                                            e
                                        
                                        
                                            
                                                
                                                    z
                                                
                                                
                                                    j
                                                
                                            
                                        
                                    
                                
                            
                        
.”), a value section initially storing a plurality of modified feature vectors (Miller, pg. 3, sec. Value Reading, fig. 1 (Value embeddings), “[T]he values of the memories are read by taking their weighted sum using the addressing probabilities, and the vector o is returned: $o = \sum_i p_{h_i} A\phi_V(v_{h_i})$
.” & see also Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works…we pre-trained the word vectors[i.e., embeddings]…matrices A and B which are constrained to be identical…before training KV-MemNNs.” Note: The matrix $A$ times $\phi_V(v_{h_i})$
produces word vectors that represent a value section storing a plurality of modified feature vectors, as seen in fig. 1 under the title “Value embeddings”); performing a similarity operation in said similarity section between a vector question and each said feature vector stored in each said indicated column (Miller, pgs. 3-4, sec. Value Reading, fig. 1 (Question, Question embedding, weighted average, o, Hops, $R_i$
), “The memory access process is conducted by the “controller” neural network using $q = A\phi_X(x)$
as the query. After receiving the result [vector] o, the query is updated with $q_2 = R_1(q + o)$ where R is $d \times d$ matrix.” Note: It is being interpreted that $q = A\phi_X(x)$
represents the vector question, vector o represents the weighted sum of each said feature vector stored in each said indicated column, and $q + o$
represents the similarity operation); “[D]uring addressing, each candidate memory is assigned a relevance probability by comparing the question to each key: $p_{h_i} = \mathrm{Softmax}(A\phi_X(x) \cdot A\phi_K(k_{h_i}))$
where $\phi$ are feature maps of dimension D, A is a $d \times D$ matrix and $\mathrm{Softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$
.”); performing a multiplication operation in said value section to multiply said associated SoftMax value by each said modified feature vector stored in each said indicated column; and performing a vector sum operation in said value section to accumulate an attention vector sum of output of said multiplication operation (Miller, pg. 3, sec. Value Reading, fig. 1 (Value embeddings, weighted average, o, Softmax), “[T]he values of the memories are read by taking their weighted sum using the addressing probabilities, and the vector o is returned: $o = \sum_i p_{h_i} A\phi_V(v_{h_i})$
.” Note: It is being interpreted that $p_{h_i}$ represents said associated Softmax value, $A\phi_V(v_{h_i})$ represents said modified feature vector stored in each indicated value embedding column of the memory network, vector o represents a vector sum operation in said value section to accumulate an attention vector sum of output, and $p_{h_i} A\phi_V(v_{h_i})$
represents the multiplication operation), said vector sum to be used to generate a new vector question for a further iteration or to generate an output value in a final iteration (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$
where R is $d \times d$ matrix. The memory access is then repeated…using a different matrix $R_i$
on each hop. The motivation for this is that new evidence can be combined into the query to focus on and retrieve more pertinent information in subsequent accesses. Finally, after a fixed number H hops, the resulting state of the controller is used to compute a final prediction over the possible outputs: $\hat{a} = \operatorname{argmax}_{i=1,\ldots,C} \mathrm{Softmax}(q_{H+1}^{T} B\phi_Y(y_i))$
where $y_i$ are the possible candidate outputs….”).
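Note: As an illustrative sketch only, one hop of the key addressing, value reading, and query update steps quoted from Miller may be expressed in Python as follows. The function names (`softmax`, `memory_hop`), the use of plain lists, and the assumption that the keys and values are already embedded (i.e., `keys[i]` stands for $A\phi_K(k_{h_i})$ and `values[i]` stands for $A\phi_V(v_{h_i})$) are hypothetical choices made for illustration and are not taken from Miller, Akerib, or Ehrman.

```python
import math


def softmax(z):
    """Softmax(z_i) = e^{z_i} / sum_j e^{z_j}, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_hop(q, keys, values, R):
    """One hop over a key-value memory (hypothetical helper).

    q      : query vector, assumed to already equal A*phi_X(x)
    keys   : embedded keys, one vector per memory slot
    values : embedded values, one vector per memory slot
    R      : d x d matrix (list of rows) used to update the query
    """
    # Key addressing: relevance probability of each memory slot.
    p = softmax([dot(q, k) for k in keys])
    # Value reading: weighted sum of the embedded values.
    d = len(q)
    o = [sum(p[i] * values[i][j] for i in range(len(values))) for j in range(d)]
    # Query update: q_next = R (q + o).
    summed = [qi + oi for qi, oi in zip(q, o)]
    q_next = [dot(row, summed) for row in R]
    return p, o, q_next


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    p, o, q_next = memory_hop([1.0, 0.0], identity, identity, identity)
    print(p, o, q_next)
```

In this sketch the multiplication of each Softmax value by its value vector and the accumulation into o are written as a single nested sum; repeating `memory_hop` with a different `R` on each call corresponds to the multiple hops described in the quoted passage.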
Miller does not teach: and activating said memory array to perform the following operations in parallel in each column.
However, Akerib teaches: and activating said memory array to perform the following operations in parallel in each column (Akerib, paras. 0042-0045, fig. 2 (150, 158, 160, 156, 116, 114), “Each cell 150 in a row may be connected to a read word line 158 (RE) and a write word line 160 (WE) through which each cell in the row may be activated for reading and writing respectively. Each cell 150 in a column may be connected to a bit line 156. Cells 150 may include volatile memory cells or non-destructive (non-volatile) memory cells… MUXs 116 may be used to transfer data between MLB data sections 114…without having to output the data from memory and rewriting into memory. That is, by activating one or more MUXs 116, the data result of an operation performed in a column, may be transferred from the column in one MLB data section to one or more columns in other MLB data sections in the same MLB or other MLBs.” Note: It is being interpreted that the read word lines, write word lines, and the MUXs transferring data between the different columns of the MLBs represent an in-memory processor to activate said memory array to perform the following operations in parallel in each column).
Accordingly, one of ordinary skill in the art would modify Miller’s method in view of Akerib to teach: and activating said memory array to perform the following operations in parallel in each column. The motivation to do so would be to construct a memory network out of bit cells that can perform data-parallel computations in memory without executing costly rewriting instructions, thus speeding up computation and/or inference time (Akerib, para. 0033, “Applicants have realized that the performance of devices which use in-memory computations may be vastly improved by dividing the device's memory into blocks of memory cells which may be individually accessed and which in parallel may carry out different operations. … other MLBs.”).
Miller does not teach: and a marker section, storing a marker vector specifying columns to be operated upon, wherein operations in one or more columns of said memory array are associated with one feature vector to be processed; indicated by said marker vector.
However, Ehrman teaches: and a marker section, storing a marker vector specifying columns to be operated upon, wherein operations in one or more columns of said memory array are associated with one feature vector to be processed (Ehrman, paras. 0041-0045, see also fig. 1A, fig. 1B, and figs. 5A-5E, “Memory array 102 may include memory cells arranged in rows and columns, with the columns of cells connected together using either NOR-type architecture (for NOR Boolean operations) or a NAND-type architecture (for NAND operations), both of which are known in the art… [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data. The RSP data may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated.” Ehrman teaches: [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data that may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated and figures 5A-5E (i.e., and a marker section, storing a marker vector specifying columns to be operated upon, wherein operations in one or more columns of said memory array are associated with one feature vector to be processed)); indicated by said marker vector (Ehrman, paras. 0041-0045, see also fig. 1A, fig. 1B, and figs. 5A-5E, “Memory array 102 may include memory cells arranged in rows and columns, with the columns of cells connected together using either NOR-type architecture (for NOR Boolean operations) or a NAND-type architecture (for NAND operations), both of which are known in the art… [m]emory array 102 may be partitioned into two sections, an input data section 106 which may store column arranged input data 112, and an RSP data section 108 which may store RSP data. The RSP data may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated.” Ehrman teaches: an RSP data section 108 which may store RSP data that may include processed data resulting from the manipulation of the stored data responsive to an obtained RSP value in an RSP signal 114, and may include temporary data which may be updated every time a new RSP signal is generated and figures 5A-5E (i.e., indicated by said marker vector)).
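Note: For illustration only, the claimed marker vector may be pictured as a list of one-bit flags, one per column, that gates which columns an operation touches. The sketch below is a software analogy under that assumption; the names `apply_to_marked_columns`, `columns`, and `marker` are hypothetical, the loop runs sequentially rather than in parallel, and nothing in it is drawn from Ehrman's circuitry.

```python
def apply_to_marked_columns(columns, marker, op):
    """Apply op to every column whose marker bit is set (hypothetical helper).

    columns : list of column vectors (each a list of numbers)
    marker  : list of 0/1 flags, one per column (assumed marker vector)
    op      : function mapping a column vector to a new column vector
    Unmarked columns are returned unchanged.
    """
    if len(columns) != len(marker):
        raise ValueError("marker must have one flag per column")
    return [op(col) if flag else col for col, flag in zip(columns, marker)]


if __name__ == "__main__":
    cols = [[1, 2], [3, 4], [5, 6]]
    doubled = apply_to_marked_columns(cols, [1, 0, 1], lambda c: [2 * x for x in c])
    print(doubled)
```

Only the first and third columns are operated upon in the usage above; the second column, whose flag is 0, passes through untouched.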
Accordingly, one of ordinary skill in the art would modify Miller’s system in view of Ehrman. The motivation to do so would be to increase the loading of input data vertically into columns to speed up various math operations (Ehrman, para. 0035, “Applicants have realized that the functionality of memory devices with memory arrays suitable for loading input data vertically into columns, as is frequently done in CAMs (content addressable memories), may be increased by using wired-OR circuitry which may generate a signal responsive to positive identification of a data candidate in at least one of the columns. The wired-OR circuitry, 
Regarding claim 13, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 12 wherein said memory array comprises a multiplicity of bit line processors, one per column of each said section, said method additionally comprising each said bit line processor operating on one bit of data of its associated section (Akerib, paras. 0042-0045, fig. 2 (150, 158, 160, 156, 116, 114), “Each cell 150 in a row may be connected to a read word line 158 (RE) and a write word line 160 (WE) through which each cell in the row may be activated for reading and writing respectively. Each cell 150 in a column may be connected to a bit line 156. Cells 150 may include volatile memory cells or non-destructive (non-volatile) memory cells… MUXs 116 may be used to transfer data between MLB data sections 114…without having to output the data from memory and rewriting into memory. That is, by activating one or more MUXs 116, the data result of an operation performed in a column, may be transferred from the column in one MLB data section to one or more columns in other MLB data sections in the same MLB or other MLBs.” Note: It is being interpreted that the bit cell rows of each MLB (memory logic block) connected together by the MUXs (multiplexers) represent a multiplicity of bit line processors each operating on one bit of data and each MLB represents one per column of each said section).
Accordingly, one of ordinary skill in the art would modify Miller’s method in view of Akerib to teach: wherein said memory array comprises a multiplicity of bit line processors, one per column of each said section, said method additionally comprising each said bit line processor operating on one bit of data of its associated section. The motivation to do so would be to implement simple Boolean functions in memory by using just multiplexers and memory logic blocks’ bit-lines to create complex Boolean functions for further processing (Akerib, para. 0045, fig. 2, “As an example, to write the result of a NAND or NOR operation performed in column 168 in MLB data section 114B to cell C32 (column 162) in MLB data section 114, MUX 116 connects (responsive to a command) bit line 156 in column 162 to bit line 156 in column 168 so that the two bit lines are now at substantially the same potential (a logical "0" or "1"). Write word line 160 connecting to C32 in MLB data section 114A is activated (write enabled) and the data on bit line 156 is written into C32. The data written into C32 in MLB data section 114A may be used to perform a Boolean operation in column 168.”).
Regarding claim 14, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 12 and also comprising generating said feature and modified feature vectors with a neural network(Miller, pg. 8, sec. 5.2 WikiQA, fig. 1(Key embeddings and value embeddings) “Due to the size of the training set, following many other works… we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”) and storing them into said similarity and value sections, respectively (Miller, pg. 8, sec. 5.2 WikiQA, fig. 1(Key embeddings and value embeddings) “To represent the memories, we used the Window-Level representation.” & see also Miller, pg. 4, sec. Window Level, Note: It is being interpreted that encoding the key as the entire window using the bag-of-words transformation represents storing said similarity section, and encoding the values as the center word using the bag-of-words transformation represents storing the value section).
Regarding claim 15, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 12 and wherein said feature vectors comprise features of a word, a sentence, or a document(Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works… we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 16, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 12 and also comprising generating an initial vector question using a pre-trained neural network (Miller, pg. 3, fig. 1, “The memory access process is conducted by the “controller” neural network using $q = A\phi_X(x)$
as the query.” Note: It is being interpreted that the controller neural network represents the pre-trained neural network that generates the initial vector question & see also Miller, pg. 8, sec. 5.2 WikiQA, “Due to the size of the training set, following many other works…we pre-trained the word vectors (matrices A and B which are constrained to be identical) before training KV-MemNNs. We employed Supervised Embeddings… for that goal, training on all of Wikipedia while treating the input as a random sentence and the target as the subsequent sentence.”).
Regarding claim 17, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 16 and also comprising generating a further question from said initial vector question and said attention vector sum (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$ where R is a $d \times d$ matrix. The memory access is then repeated…using a different matrix $R_i$ on each hop. The motivation for this is that new evidence can be combined into the query to focus on and retrieve more pertinent information in subsequent accesses.” Note: It is being interpreted that the vector $q$ represents said initial vector question and vector $o$ represents said attention vector sum).
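Note: For illustration only, a single hop of the query update quoted above can be sketched as follows. This is a minimal sketch and not code from Miller: the function name `hop`, the assumption that keys and values are already embedded as vectors of the same dimension as q, and the numeric values are all hypothetical. The result o is computed here as the softmax-weighted sum of the value vectors, which corresponds to the attention vector sum as interpreted above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hop(q, keys, values, R):
    """One memory hop: attend over keys, sum values, update the query."""
    weights = softmax([dot(q, k) for k in keys])
    # o: attention-weighted sum of the value vectors.
    o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(q))]
    summed = [qi + oi for qi, oi in zip(q, o)]
    # Next query: R applied to (q + o).
    q_next = [dot(row, summed) for row in R]
    return q_next, weights


# Hypothetical embedded memories and an identity matrix standing in for R_1.
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
R1 = [[1.0, 0.0], [0.0, 1.0]]
q2, weights = hop([1.0, 1.0], keys, values, R1)
```

Because both keys are identical in this example, the attention weights are equal, o is the average of the two values, and the updated query is simply q + o.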
Regarding claim 18, Miller in view of Akerib and in view of Ehrman teaches the method according to claim 17 wherein generating a further question utilizes a neural network (Miller, pgs. 3-4, fig. 1, “Finally, after a fixed number H hops, the resulting state of the controller is used to compute a final prediction over the possible outputs: $\hat{a} = \operatorname{argmax}_{i=1,\dots,C} \operatorname{Softmax}(q_{H+1}^{T} B \phi_Y(y_i))$ where $y_i$ are the possible candidate outputs e.g. all the entities in the KB, or all possible candidate answer sentences in the case of a dataset like WIKIQA… The whole network is trained end-to-end, and the model learns to perform the iterative accesses to output the desired target a by minimizing a standard cross-entropy loss between $\hat{a}$ and the correct answer a. Backpropagation and stochastic gradient descent are thus used to learn the matrices A, B and $R_1, \dots, R_H$.”).
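Note: For illustration only, the final prediction step quoted above can be sketched as follows. This is a minimal sketch and not code from Miller: the function name `predict` is hypothetical, and each candidate is assumed to be already embedded (that is, each list stands in for B applied to the feature map of a candidate output).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(q_final, candidate_embeddings):
    """Score each candidate against the final query and pick the best."""
    scores = [
        sum(a * b for a, b in zip(q_final, candidate))
        for candidate in candidate_embeddings
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs


# Hypothetical final query and three embedded candidate outputs.
best, probs = predict([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
```

Since softmax preserves the ordering of the scores, the selected index is the candidate with the largest inner product against the final query; the probabilities matter for the cross-entropy training loss rather than for the selection itself.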
wherein said generating a further question comprises performing matrix multiplication on bit lines of said memory array (Miller, pgs. 3-4, fig. 1, “After receiving the result o, the query is updated with $q_2 = R_1(q + o)$ where R is a $d \times d$ matrix. The memory access is then repeated…using a different matrix $R_i$ on each hop.” Note: It is being interpreted that the matrix $R_1$ times the vector $q + o$ represents a matrix multiplication on bit lines of said memory array).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129