DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the correspondence filed on 1/26/22.
Claims 1-20 are presented for examination.

IDS Considerations

The information disclosure statement (IDS) submitted on 6/24/20 is being considered by the examiner because the submission is in compliance with the provisions of 37 CFR 1.97.

Response to Arguments


Applicant's arguments filed 1/26/22 with respect to claims 1-20 have been considered but are not persuasive.

	Applicant argues on pages 7-8 that the prior art does not teach the perception processor configured to classify the feature and generate a common coordinate space for the feature in relation to the line of sight of the user’s head.

Klaus, page 192, para 4, teaches the following: to generate the sub-digraphs taking into account the presence of obstacles within walkable surfaces, we use approaches conventionally applied in the field of mobile robotics, in particular, visibility-based method. The defining characteristics of a visibility map are that its nodes share an edge if they are within line of sight of each other, and that all points in the free space are within line of sight of at least one node on the visibility map. The nodes vi of the visibility graph include the start location, the goal location, and all the vertices of the obstacles.
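For illustration only, the visibility-map property described in the cited passage (nodes share an edge if they are within line of sight of each other; the nodes are the start location, the goal location, and the obstacle vertices) can be sketched as follows. This is a minimal sketch and is not taken from Klaus: it assumes 2D polygonal obstacles given as lists of (x, y) tuples, and the function names (visible, visibility_graph) are hypothetical. Degenerate cases, such as a sight line that only grazes an obstacle vertex, are not handled.

```python
from itertools import combinations

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0 or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _properly_cross(p, q, a, b):
    """True if segments pq and ab cross at a point interior to both."""
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)

def _inside(pt, poly):
    """Ray-casting point-in-polygon test (boundary points are unspecified)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def visible(p, q, obstacles):
    """True if the straight segment pq is not blocked by any obstacle."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    for poly in obstacles:
        n = len(poly)
        edges = [(poly[i], poly[(i + 1) % n]) for i in range(n)]
        # Walking along an obstacle's own edge is allowed.
        if (p, q) in edges or (q, p) in edges:
            continue
        if any(_properly_cross(p, q, a, b) for a, b in edges):
            return False
        # Catches diagonals that run through the obstacle's interior.
        if _inside(mid, poly):
            return False
    return True

def visibility_graph(start, goal, obstacles):
    """Adjacency sets over start, goal, and all obstacle vertices."""
    nodes = [start, goal] + [v for poly in obstacles for v in poly]
    graph = {n: set() for n in nodes}
    for p, q in combinations(nodes, 2):
        if visible(p, q, obstacles):
            graph[p].add(q)
            graph[q].add(p)
    return graph

# Hypothetical scene: one square obstacle between start and goal.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
graph = visibility_graph((0, 2), (4, 2), [square])
```

In this hypothetical scene the start and goal are not mutually visible, so any route through the graph must pass through corner vertices of the square.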

Klaus, page 37, Fig. 2a-2b, shows that the stereo vision camera is located at the line of sight and, at page 37, para 5, teaches converting pixel coordinates from the captured images into world [common coordinate] point coordinates in the camera coordinate system. Thus, all sensor data collected by this system is converted to a world or global common coordinate system, which is correlated to the line of sight.
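For illustration only, the conversion of pixel coordinates into point coordinates in the camera coordinate system by way of a disparity map can be sketched as follows. This is a minimal sketch assuming a rectified pinhole stereo pair and the standard relation depth = focal length × baseline / disparity; it is not taken from Klaus, and the function name and parameters (focal_px, baseline_m, cx, cy) are hypothetical.

```python
def pixel_to_camera_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Convert a pixel (u, v) with a stereo disparity into (x, y, z) metres.

    Assumes rectified images and a pinhole model (illustrative only):
      focal_px   - focal length in pixels
      baseline_m - distance between the two cameras in metres
      cx, cy     - principal point in pixels
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (u - cx) * z / focal_px             # offset to the right of the axis
    y = (v - cy) * z / focal_px             # offset below the axis
    return (x, y, z)

# Hypothetical values: 700 px focal length, 10 cm baseline, 640x480 image.
point = pixel_to_camera_point(420, 240, 35.0, 700.0, 0.1, 320.0, 240.0)
```

Because the camera in such a system is mounted facing along the user's line of sight, a point expressed in the camera coordinate system is already expressed relative to that line of sight.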

Daniela was not cited for this limitation. Applicant's remaining arguments are not persuasive because they are not commensurate with the claim language.

CLAIM INTERPRETATION

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are found in claims 1, 6-7 and 14.



If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Examiner is invoking 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, for examining claims 1, 6-7 and 14 because these claims are drawn to a functionality comprising a unit/module which uses a generic placeholder, “module,” coupled with functional language, “a navigation module configured to measure localized coordinate data,” in claims 1, 6-7 and 14 without reciting sufficient structure to achieve the function.

However, a review of paragraph [0055] of the specification shows the corresponding structure.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Klaus (Computers Helping People with Special Needs - 15th International Conference, ICCHP 2016 - DOI 10.1007/978-3-319-41267-2 - Linz, Austria, July 13–15, 2016), in view of Daniela (Stereosonic vision: Exploring visual-to auditory sensory substitution mappings in an immersive virtual reality navigation paradigm - Published: July 5, 2018 - https://doi.org/10.1371/journal.pone.0199389).

Regarding claims 1, 14 and 20:

1. Klaus teaches a route guidance and proximity awareness system for a visually impaired user, comprising: (Klaus page 204 Fig. 1. Custom multi-speaker headphones and stereovision camera mount. Klaus page 207 para 2 the tests of the Sound of Vision are conducted with participation of visually impaired testers – blind and partially-sighted, as these groups of end-users are the best experts regarding how the SOV solution is going to meet their needs)
a sensor assembly including a camera configured to detect a feature in an environment proximal to the visually impaired user and capture feature data of that feature; (Klaus page 206 para 7 Model 3 – Depth scanning - A virtual “scanning plane”, a surface parallel to the camera view that moves away from the observer through the scene. The model distinguishes two categories of objects – walls (any object with a sufficiently large surface area) and generic obstacles. This model has been previously successfully implemented in the Naviton prototype [2])
a navigation module configured to measure localized coordinate data of the user and provide navigation data; (Klaus page 202 para 2 the overall concept of the Sound of Vision system is creation of an electronic aid for local navigation and obstacle avoidance, similar to a previous Naviton project [2])
the perception processor configured to classify the feature and generate a common coordinate space for the feature (Klaus page 37 para 5 images from the stereo vision camera (Fig. 2a, 2b) are devoid of geometric distortions and rectified, which is a prerequisite for the calculation of the disparity map. The disparity map in turn is used to convert pixel coordinates from the captured images into world [common coordinate] point coordinates in the camera coordinate system, from which depth of the obstacles [classify the feature] can be computed [5, 17]) in relation to the line of sight of the user’s head; (Klaus page 192 para 4 to generate the sub-digraphs taking into account the presence of obstacles within walkable surfaces, we use approaches conventionally applied in the field of mobile robotics, in particular, visibility-based method. The defining characteristics of a visibility map are that its nodes share an edge if they are within line of sight of each other, and that all points in the free space are within line of sight of at least one node on the visibility map. The nodes vi of the visibility graph include the start location, the goal location, and all the vertices of the obstacles)
a translation processor (Klaus page 583 para 3 taking the previous issues into consideration, a translation system, which is user-friendly and easily accessible, is needed and vital to the process of translation for people facing visual impairment and blindness) for receiving data from the perception processor and (Klaus page 583 para 1 blind translators who use technology do more than sighted people could ever imagine, as they can translate with technology anywhere and anytime. Nevertheless, fully understanding this issue is reflected in the growth of facing many challenges regarding their techniques of translating texts. One of these challenges is that blind people start scanning and skimming texts [perception processor] without having a highlighting tool. They pick every difficult word and look for it in electronic dictionaries and websites on the Internet, which of course, adds a heightened degree of difficulty during translation) configured to process that data to assign a sound to the feature and specialize that sound based upon the coordinates of the feature in relation to the user’s line of sight; and (Klaus page 192 para 4 to generate the sub-digraphs taking into account the presence of obstacles within walkable surfaces, we use approaches conventionally applied in the field of mobile robotics, in particular, visibility-based method. The defining characteristics of a visibility map are that its nodes share an edge if they are within line of sight of each other, and that all points in the free space are within line of sight of at least one node on the visibility map. The nodes vi of the visibility graph include the start location, the goal location, and all the vertices of the obstacles)
an earphone for broadcasting an immersive audio-augmented reality environment wherein each sound is conveyed through the earphone with depth and location. (Klaus page 124 para 2 2.2 AR is not limited to the visual augmentation, but it can be applied for audio augmentation [5–7]. With our AR tactile map, the physical tactile map can be augmented by audio and visual feedbacks which are enlarging/enhancing the focused area with voice over of the POR/POI according to the user’s input. We also propose intuitive user interface with hand gesture recognition for the interaction with the system. Klaus page 181 para 2 the system presented by Anderson [19] collects depth information about the environment, saves it in a chunk-based voxel representation, and generates 3D audio for sonification which is relayed to the VI user via headphones to alert him to the presence of obstacles)

Klaus does not explicitly teach a headset IMU having an inertial measurement unit configured to capture head data including the movements and line of sight of the user’s head; a perception processor configured to receive the feature data from the sensor assembly, coordinate data of the user from the navigation module, and head data from the headset IMU.

However, Daniela teaches a headset IMU having an inertial measurement unit configured to capture head data including the movements and line of sight of the user’s head; (Daniela page 10 para 4 the sonified environments were then deployed on a Google Tango tablet (Google, Mountain View, CA, USA). The tablet employs visual-inertial odometry using a 180° FOV fish-eye camera and its Inertial Measurement Unit (Fig 2B) to map its 3D position and rotation in real-time based on an initialisation point)
a perception processor configured to receive the feature data from the sensor assembly, coordinate data of the user from the navigation module, (Daniela Fig 4. Maze navigational behaviours and results. (A, B, & C) Examples of participant trajectories through a maze, using visual-only cues (A), echolocation audio-only cues (B), and humming audio-only cues (C). (D, E, F, & H) Basic efficiency and strategy of navigation through the maze) and head data from the headset IMU, (Daniela page 10 para 6 the sonified environments were then deployed on a Google Tango tablet (Google, Mountain View, CA, USA). The tablet employs visual-inertial odometry using a 180° FOV fish-eye camera and its Inertial Measurement Unit (Fig 2B) to map its 3D position and rotation in real-time based on an initialisation point)
 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Klaus, further incorporating Daniela in video/camera technology. One would be motivated to do so, to 

Regarding claim 2:

2. Klaus teaches the system of claim 1. Klaus does not explicitly teach wherein the camera of the sensor assembly includes an inertial measurement unit to capture the feature data, the feature data including raw movement and orientation data of the feature in relation to the user, the data being transmitted to the perception processor.

However, Daniela teaches wherein the camera of the sensor assembly includes an inertial measurement unit to capture the feature data, the feature data including raw movement and orientation data of the feature in relation to the user, the data being transmitted to the perception processor. (Daniela page 10 para 6 the sonified environments were then deployed on a Google Tango tablet (Google, Mountain View, CA, USA). The tablet employs visual-inertial odometry using a 180° FOV fish-eye camera and its Inertial Measurement Unit (Fig 2B) to map its 3D position and rotation in real-time based on an initialisation point)

Regarding claim 3:

3. Klaus teaches the system of claim 2. Klaus does not explicitly teach wherein the sensor assembly includes a second camera that is configured to capture video of the feature in the environment and depth data as measured from the user, the depth data and video being transmitted to the perception processor.

However, Daniela teaches wherein the sensor assembly includes a second camera that is configured to capture video of the feature in the environment and depth data as measured from the user, the depth data and video being transmitted to the perception processor. (Daniela page 25 para 3 by pairing an RGB camera with a depth sensor and applying computer vision algorithms to the incoming video streams, it is possible to extract 3D spatial information of the real-world environment. On the incoming RGB images, object detection methods [102] will localise the presence/absence of objects, and object recognition methods [103, 104] will identify the types or classes of these objects (for example, desks, chairs, people, but also more abstract categories like walls, floors and ceilings). The incoming depth maps (from the depth sensor) will provide real-world distance estimates from the camera to densely scattered points in the environment. If the camera were mounted on the user's head, these two streams of information would allow for the building and updating of a 3D egocentric representation of the user's environment which could subsequently be converted into its corresponding soundscape. For simulated echolocation, the dense distances provided by the depth maps are synonymous with the distances the projected particles travel)

Regarding claim 4:

4. Klaus teaches the system of claim 1. Klaus does not explicitly teach wherein the camera of the sensor assembly is configured to capture video of the feature in the environment and depth data as measured from the user, the depth data and video being transmitted to the perception processor.

However, Daniela teaches wherein the camera of the sensor assembly is configured to capture video of the feature in the environment and depth data as measured from the user, the depth data and video being transmitted to the perception processor. (Daniela page 3 para 5 these SSDs work by presenting a spatially informative modified audio signal to the user, but they have no access to a 3D model of the user's surroundings. This limitation can be addressed, however, by the recent proliferation of portable devices that can rapidly scan and reconstruct 3D environments (e.g. through stereoscopic depth or using active projection of infrared features such as in the XBox Kinect). Daniela page 5 para 1 central to our quantitative assessment of participants' navigational efficiency was the 3D tracking capability of the Google Tango tablet. The device captured the real-time 3D dynamics of participants' walking behaviour in the VR environments, thus enabling us to develop mobility-relevant metrics and provide an in-depth analysis of participants' movements in both auditory conditions and the visual baseline condition)

Regarding claim 5:

5. Klaus teaches the system of claim 1, wherein the camera is an RGB-D camera. (Klaus page 505 para 1 this system is composed of an adaptive interface with a motorized webcam and a cheap RGBD sensor for recognizing hand gestures)

Regarding claim 6:

6. Klaus teaches the system of claim 1, wherein the navigation module provides the GPS coordinate of the user. (Klaus page 3 para 2 there are other powerful environment sensing mechanisms that harness non-visual information. Accessible navigation apps using GPS (e.g., Blindsquare) allow blind pedestrians to localize themselves, follow a route, and discover nearby points of interest)

Regarding claim 7:

7. Klaus teaches the system of claim 1, wherein the navigation data of the navigation module is a navigation solution through one or more waypoints (Klaus page 3 para 2 there are other powerful environment sensing mechanisms that harness non-visual information. Accessible navigation apps using GPS (e.g., Blindsquare) allow blind pedestrians to localize themselves, follow a route, and discover nearby points of interest) located within the audio-augmented reality environment. (Klaus page 124 para 2 2.2 AR is not limited to the visual augmentation, but it can be applied for audio augmentation [5–7]. With our AR tactile map, the physical tactile map can be augmented by audio and visual feedbacks which are enlarging/enhancing the focused area with voice over of the POR/POI according to the user’s input)

Regarding claim 8:

8. Klaus teaches the system of claim 1, wherein the sensor assembly is mounted to the user. (Klaus page 204 Fig. 1. Custom multi-speaker headphones and stereovision camera mount)

Regarding claim 9:

9. Klaus teaches the system of claim 1, wherein common coordinate space includes the converted coordinates of the feature (Klaus page 37 para 5 images from the stereo vision camera (Fig. 2a) are devoid of geometric distortions and rectified, which is a prerequisite for the calculation of the disparity map. The disparity map in turn is used to convert pixel coordinates from the captured images into world [common coordinate] point coordinates in the camera coordinate system, from which depth of the obstacles [classify the feature] can be computed [5, 17]) and the user to that of a head space coordinates based on the line of sight of the user. (Klaus page 276 para 2 The movement of the viewpoint was measured using the head-mounted eye mark recorder (EMR) NAC EMR-8 (Image Technology Inc.) to determine how long and how many times the viewpoint was on the smartphone (Fig. 2))

Regarding claims 10 and 19:

10. Klaus teaches the system of claim 1, wherein the perception processor tracks movement of the feature. (Klaus page 503 para 1 these include a fast learning mechanism from an accurate six-degrees-of-freedom pose tracker, a real-time extended distance transform for the hand model, and a robust integration of support vector machine and superpixels)

Regarding claim 11:

11. Klaus teaches the system of claim 1, wherein the perception processor classifies the feature, assigns as status to the feature, (Klaus page 503 para 1 these include a fast learning mechanism from an accurate six-degrees-of-freedom pose tracker, a real-time extended distance transform for the hand model, and a robust integration of support vector machine and superpixels) and monitors the feature’s location. (Klaus page 504 para 1 Kiliboz and Gudukbay [1], in 2015, introduced a real-time [monitor] robust gesture recognition algorithm for hand using a quick learning mechanism and a six-degrees-of-freedom pose tracker interactively. By collecting gesture data and 

Regarding claim 12:

12. Klaus teaches the system of claim 1, wherein the perception processor identifies the feature as at least one of a semantic object and a non-semantic object. (Klaus page 543 para 3 we show how our hierarchical SVM-HCRF model combines two different sources characterizing motion activities: the temporally local segment discriminative power, represented by object information and motion semantic information inferred by SVM)

Regarding claim 13:

13. Klaus teaches the system of claim 1, wherein the translation processor receives feature data and navigation data from the perception processor. (Klaus page 180 para 3 3D depth sensor, accelerometer, ambient light sensor, barometer, compass, GPS, gyroscope), which allow it not only to track its own movement and orientation through 3D space in real time using computer vision techniques but also enable it to remember areas that it has travelled through and localize the user within those areas to up to an accuracy of a few centimeters. Its integrated infrared based depth sensors also allow it to measure the distance from the device to objects in the real world providing depth data about the objects in the form of point clouds)

Regarding claim 15:

15. Klaus teaches the method of claim 14, wherein the navigation data includes at least one of the GPS location of the user and a navigation solution. (Klaus page 3 para 2 there are other powerful environment sensing mechanisms that harness non-visual information. Accessible navigation apps using GPS (e.g., Blindsquare) allow blind pedestrians to localize themselves, follow a route, and discover nearby points of interest)

Regarding claim 16:

16. Klaus teaches the method of claim 14, further comprising: converting the coordinates of the feature data and the navigation data (Klaus page 180 para 3 3D depth sensor, accelerometer, ambient light sensor, barometer, compass, GPS, gyroscope), which allow it not only to track its own movement and orientation through 3D space in real time using computer vision techniques but also enable it to remember areas that it has travelled through and localize the user within those areas to up to an accuracy of a few centimeters. Its integrated infrared based depth sensors also allow it to measure the distance from the device to objects in the real world providing depth data about the objects in the form of point clouds) to a common coordinate space.

Regarding claim 17:

17. Klaus teaches the method of claim 14, wherein the navigation data includes waypoint navigation solutions (Klaus page 3 para 2 there are other powerful environment sensing mechanisms that harness non-visual information. Accessible navigation apps using GPS (e.g., Blindsquare) allow blind pedestrians to localize themselves, follow a route, and discover nearby points of interest) wherein navigation data is provided to the user in the audio-augmented reality environment (Klaus page 124 para 2 2.2 AR is not limited to the visual augmentation, but it can be applied for audio augmentation [5–7]. With our AR tactile map, the physical tactile map can be augmented by audio and visual feedbacks which are enlarging/enhancing the focused area with voice over of the POR/POI according to the user’s input. We also propose intuitive user interface with hand gesture recognition for the interaction with the system. Klaus page 181 para 2 the system presented by Anderson [19] collects depth information about the environment, saves it in a chunk-based voxel representation, and generates 3D audio for sonification which is relayed to the VI user via headphones to alert him to the presence of obstacles)

Klaus does not explicitly teach as a sound having a location and distance.

However, Daniela teaches as a sound having a location and distance. (Daniela page 4 para 2 we have explored two novel, relatively simple and sparse spatial audio representations of 3D environments: 1) simulated echolocation with discrete `sound particles' and 2) distance-dependent hum volume modulation of beacon sounds attached to objects. Daniela page 26 para 4 the current study has explored the feasibility of two novel visual-to-audio mappings for the task of spatial navigation: simulated echolocation and distance-dependent volume modulation of hums. Both sonification methods were implemented and tested in two virtual reality environments using a head-mounted 3D motion-tracking device. The device created an immersive virtual world in which participants were able to physically walk around virtual scenes)
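For illustration only, a sound having a location and distance, in the sense of the distance-dependent hum volume modulation quoted above, can be sketched as follows. This is a minimal sketch and is not taken from Daniela: the linear volume falloff, the 5 m maximum range, the constant-power stereo pan, and the function names (hum_gain, stereo_gains) are all assumptions chosen for simplicity.

```python
import math

def hum_gain(distance_m, max_range_m=5.0):
    """Volume in [0, 1]: loudest at the object, silent beyond max range.

    Linear falloff is an assumed mapping, not the cited reference's.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return max(0.0, 1.0 - distance_m / max_range_m)

def stereo_gains(azimuth_rad, distance_m, max_range_m=5.0):
    """Return (left, right) gains for a beacon sound.

    azimuth_rad is measured from the listener's line of sight:
    0 is straight ahead, positive is to the right. Angles are
    clamped to the frontal half-plane [-pi/2, pi/2].
    """
    az = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    pan = (az + math.pi / 2) / 2          # 0 (full left) .. pi/2 (full right)
    g = hum_gain(distance_m, max_range_m)
    return (g * math.cos(pan), g * math.sin(pan))

# Hypothetical beacon 2.5 m away, 45 degrees to the listener's right.
left, right = stereo_gains(math.pi / 4, 2.5)
```

In this sketch the overall loudness conveys distance and the left/right balance conveys direction relative to the line of sight.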

Regarding claim 18:

18. Klaus teaches the method of claim 14. Klaus does not explicitly teach further comprising: modifying the sounds assigned to at least one of the feature and the navigation point.

However, Daniela teaches further comprising: modifying the sounds assigned to at least one of the feature and the navigation point. (Daniela page 4 para 2 we have explored two novel, relatively simple and sparse spatial audio representations of 3D environments: 1) simulated echolocation with discrete `sound particles' and 2) distance-dependent hum volume modulation of beacon sounds attached to objects. Daniela page 26 para 4 the current study has explored the feasibility of two novel visual-to-audio mappings for the task of spatial navigation: simulated echolocation and distance-dependent volume modulation of hums. Both sonification methods were implemented and tested in two virtual reality environments using a head-mounted 3D motion-tracking device. The device created an immersive virtual world in which participants were able to physically walk around virtual scenes)
Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NASIM N NIRJHAR whose telephone number is (571)272-3792.  The examiner can normally be reached on Monday - Friday, 8 am to 5 pm ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Kelley can be reached on (571)272-7331.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.