Project 2
Milestone 1
Continuing from the idea I developed in Project 1, I would like to redesign and implement the voice transcription app on Android. The inspiration behind this app remains the same: I struggle on a daily basis to have a flowing conversation with a hearing person. I am profoundly hard of hearing, and with each year that passes my hearing continues to decline. Even since the development of Project 1, I still constantly ask “what?”, and it keeps getting worse while people assume I can hear like a hearing person. I want to learn how to develop the same application on two different platforms, iOS and Android. Since I have already completed the iOS version of the transcribing application, I want to implement it on Android. As before, I want it to be a one-button application: when the user presses the button, it transcribes what a person is saying in real time. The intended audience for this app is hard of hearing or deaf people. The overall user flow will be the same as the iOS app: the user asks the person to repeat what they said, the app listens for the speech using the device’s microphone, and it then converts the spoken words to written text.
Since Android is open source and has a much larger application ecosystem, there are more existing apps similar to what I wish to develop than there were on iOS. A few of them are close to what I want to create.
SpeechNotes - This is a note application: when you speak into the device, it transcribes the speech and outputs the written text. One of its biggest strengths is that you do not have to dictate punctuation. At the bottom of the screen there is a “punctuation keyboard,” so instead of saying “comma” or “semi-colon” out loud, you tap the punctuation you want as you talk. The premium version is free for a trial period, but after that you have to pay for transcriptions, and the cost grows the more files you have saved in the app. Many users have complained that the application is not as intuitive as they hoped, and that when they tried to use the free features the app crashed and then requested payment. Additionally, this is not an app meant for casual conversation; it is aimed more at longer conversations or lectures.
VoiceNotes - Specializes in taking very short, quick notes out of the blue, which is closer to what I will be developing. You can also live transcribe or open other audio files you have saved elsewhere on your device. One of its strongest features is organization: color coding, separate folders, and more. However, those features still require a purchase, which many people complain about. Another area for improvement is that there are far too many buttons for the user to navigate just to get the app to record something, and the free version only lets you record five seconds.
SpeechTexter - This speech-to-text application can work both online and offline, which is really helpful for some users. However, to work offline you need to download the required language files from Google (it uses Google's recognition backend). One thing that sets it apart is a custom dictionary where you can store contacts and other words you want recognized. Unfortunately, its accuracy is very low, which is extremely frustrating for users.
These are just a few examples of applications used by Android users, and one of the most common issues is a lack of intuitiveness. Many people asked for just one button to press to begin recording and one to end it. Some wanted the recording to auto-save when finished. Another piece of feedback was that the text could be edited during recording but not after recording had finished. All of these elements are things I considered in the iOS version of this application and will now carry over to Android. I want a single button at the top of the screen that starts and ends the recording of the user's speech. One piece of advice I received from people using my iOS application is that it was not obvious to press the button at first, so adding text telling the user where to press could be useful (maybe a toast). Additionally, I can add an editable text area large enough to display the transcribed text. An autosave feature is difficult to implement because we don't know where the user wants to store the documents (this can be custom), so a separate navigation button that lets the user choose whether and where to save the file is better. Given all of these changes that I hope to implement, I believe I will create an intuitive and easily usable application.
To implement this application, I will need the Android API documentation for the android.speech package. Using the SpeechRecognizer class in that package, I can capture audio from the user's device and convert the spoken words to text. Google Cloud also provides further speech-to-text documentation applicable to Android applications. With all of this open and available documentation, I do feel it is possible to implement this application.
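As a rough sketch of how the SpeechRecognizer class from android.speech could be wired up, under the assumption that the recognizer is created inside the activity and the listener callbacks are filled in later (the class below and its TODO are illustrative only, not my final code):

import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import java.util.ArrayList;
import java.util.Locale;

public class TranscriberSketch {

    private SpeechRecognizer recognizer;

    // Called once (for example from onCreate) to set up the recognizer.
    void setUp(Context context) {
        recognizer = SpeechRecognizer.createSpeechRecognizer(context);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override public void onResults(Bundle results) {
                // The recognized phrases arrive as a list of strings.
                ArrayList<String> words =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                // TODO: display the words in the transcription text view.
            }
            // Remaining callbacks left empty in this sketch.
            @Override public void onReadyForSpeech(Bundle params) {}
            @Override public void onBeginningOfSpeech() {}
            @Override public void onRmsChanged(float rmsdB) {}
            @Override public void onBufferReceived(byte[] buffer) {}
            @Override public void onEndOfSpeech() {}
            @Override public void onError(int error) {}
            @Override public void onPartialResults(Bundle partialResults) {}
            @Override public void onEvent(int eventType, Bundle params) {}
        });
    }

    // Called when the record button is pressed.
    void startListening() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        recognizer.startListening(intent);
    }
}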
The prototype of what I hope to develop in terms of the user interaction is displayed below on the note cards.
There are some differences from the iOS version because applications look and are developed differently on Android devices. The user will be presented with a home screen like the one in the iOS app, with one button in the center that is teal when not recording and red when recording. The transcription will appear below the button in the editable text area until the button is pressed again to stop recording. At the end of the transcription, the user has the option to edit the text and then save or delete it. The navigation bar is the last element; since Android does not use a navigation bar for a single-activity interaction, this may end up being a button or a FAB, which is still to be determined. Additionally, to save or share a document, Android uses a different share icon: not the sharrow, but a three-dot vector image with two lines connecting the dots. I want this app to be intuitive, since users have complained about existing apps not being intuitive.
Milestone 2
For my milestone 2, I completed a proof of concept with the transcribing library in Android. I used the Android documentation website, which links to examples as well as pseudocode to follow along with, to understand how to implement this library. The initial thing I needed to figure out was how to make a button with an image instead of just text. I found that Android has an ImageButton element that you can include in the view and that uses an image you define as the button rather than text. I decided to use the same logo image that I used for my iOS app. I then added two text views, one to tell the user to click the button to record and one to display the transcription of the user's speech. The next part, which was the longest and most difficult, was figuring out the speech library itself. In MainActivity.java, I needed a variable to hold the button state (clicked or not) and another variable to hold the transcribed text. I wanted the app to behave like the iOS app in that pressing the button starts the transcription, so I needed a click listener on the button. When the button is clicked, the program moves on to the function that transcribes the speech. Inside that function, I define an implicit intent; since I just want to map the intent to an activity, it does not have its own separate components that would make it explicit. Within this intent I added a few extras that were not strictly necessary but that I felt were key to a successful application. One extra reads the default language already set on the user's device and transcribes the text in that language. Another extra sets the prompt, so when the pop-up appears it tells the user to start speaking. If the user has allowed access to the microphone, the app begins transcribing; if not, a pop-up appears to let the user know they did not grant the app access to the microphone. Once speech has been detected, another function takes those words, passes them into an array, and displays the array of words as sentences in the transcribed-text view shown to the user. Of course, I ran into some issues during development. One of the first things I noticed after I got the transcribing to work was a silly mistake: I had forgotten to add vertical constraints, so the transcribed text was sitting directly on top of the prompt.
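A condensed sketch of the flow just described, assuming a layout with an ImageButton and a transcription TextView; the view ids, the request code, and the prompt string are placeholders rather than the exact values in my project:

import android.content.ActivityNotFoundException;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.widget.ImageButton;
import android.widget.TextView;
import android.widget.Toast;
import androidx.appcompat.app.AppCompatActivity;
import java.util.ArrayList;
import java.util.Locale;

public class MainActivity extends AppCompatActivity {

    private static final int SPEECH_REQUEST_CODE = 100;
    private TextView transcribedText;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        transcribedText = findViewById(R.id.transcribedText);
        ImageButton recordButton = findViewById(R.id.recordButton);

        // When the button is clicked, fire the implicit speech-recognition intent.
        recordButton.setOnClickListener(view -> {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            // Extra: transcribe in the language the device is already set to.
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
            // Extra: the prompt shown in the pop-up that asks the user to speak.
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
            try {
                startActivityForResult(intent, SPEECH_REQUEST_CODE);
            } catch (ActivityNotFoundException e) {
                Toast.makeText(this, "Speech recognition is not available",
                        Toast.LENGTH_SHORT).show();
            }
        });
    }

    // The recognizer hands back the spoken words as an array of strings.
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == SPEECH_REQUEST_CODE && resultCode == RESULT_OK && data != null) {
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            transcribedText.setText(results.get(0));
        }
    }
}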
Furthermore, at this time we had just learned about the toolbar on Android, so I considered implementing one. After further consideration, however, a toolbar would not be an effective way to let the user share the transcribed text, which is one action I want to support; a FAB can do it much more effectively. I then fixed the constraint layout for portrait (not yet moving on to landscape) and got the result below.
You may notice that I also removed the toolbar from the layout, since it was not necessary for the app's functionality. Below is the working prototype, shown in the order the user experiences it. Additionally, I changed the default image of the FAB from the mail icon to the share icon.
As you can see above, there are two separate prompts for the user: “Tap on button to speak” and then “Say something” within the pop-up. This matches how I implemented the code, described above. It is also worth noting that the application uses the Google speech service, because that is the default backend for Android's speech library. Furthermore, I attempted to have the image change from green to red when the user presses the record button. However, as you can see above, the button stays red after the user has finished recording, even though I want it to turn back to green. This was an obstacle to overcome. I tried to use a boolean variable to track when the button is pressed and swap the image when it is clicked. I did this by adding a line of code that changes the button back to the logo image before the break, but after I put the text into the array of strings and set it as the transcribed-text TextView.
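A hedged sketch of that fix, continuing the MainActivity sketch above; the drawable names logo_green and logo_red and the isRecording flag are placeholder names I am using for illustration, not necessarily the ones in my project:

// Fields assumed on MainActivity: the ImageButton and a flag tracking its state.
private ImageButton recordButton;
private boolean isRecording = false;

// Swap the button image to match whether we are currently recording.
private void setRecording(boolean recording) {
    isRecording = recording;
    recordButton.setImageResource(
            recording ? R.drawable.logo_red : R.drawable.logo_green);
}

// Call setRecording(true) in the click listener before firing the speech intent,
// and setRecording(false) in onActivityResult after the TextView has been updated.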
The next step is to get the FAB working so it can share content with different apps on the phone, similar to the sharrow button I used on iOS. This will take additional research.
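From my reading so far, the standard way to do this appears to be Android's ACTION_SEND intent with a chooser; a minimal sketch, assuming the fab and transcribedText views from the sketches above:

fab.setOnClickListener(view -> {
    Intent shareIntent = new Intent(Intent.ACTION_SEND);
    shareIntent.setType("text/plain");
    shareIntent.putExtra(Intent.EXTRA_TEXT, transcribedText.getText().toString());
    // The chooser lists every app on the phone that accepts plain text
    // (messaging, email, note apps, and so on).
    startActivity(Intent.createChooser(shareIntent, "Share transcription"));
});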
Furthermore, I want to implement a delete button so the user can delete the text displayed in the text view if they do not wish to save it. (MAYBE?)
The final element I need to work on is making the layout responsive for all device sizes and orientations. It already works for the most part in landscape view; however, I still need to add the piece of code so the app doesn't reload every time the user rotates their phone.
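One way to handle this is to save the transcription across the configuration change rather than letting it be lost; a minimal sketch, reusing the transcribedText field from the earlier sketches (the key name "transcription" is a placeholder). The alternative would be declaring android:configChanges="orientation|screenSize" on the activity in the manifest so it is not recreated at all.

@Override
protected void onSaveInstanceState(Bundle outState) {
    super.onSaveInstanceState(outState);
    // Keep the current transcription so rotation does not wipe it out.
    outState.putString("transcription", transcribedText.getText().toString());
}

@Override
protected void onRestoreInstanceState(Bundle savedInstanceState) {
    super.onRestoreInstanceState(savedInstanceState);
    transcribedText.setText(savedInstanceState.getString("transcription", ""));
}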
Milestone 4
The final product of the transcribing application is much more intuitive than the app I created for iOS. I incorporated the feedback I received about improving my iOS app by adding a text field telling the user where to click on the screen to begin recording speech. Furthermore, Android Studio lets you add your own custom theme, which gave the application a more cohesive design. Combined with the text field showing the user where to click to begin transcribing, this made for a much better-styled application than the iOS version. Functionally, the transcribing feature worked great when demonstrated on the phone, and the transcription was very accurate. When testing on the emulator, transcribing speech took a lot of the computer's memory, so the speech detector was much slower and therefore also less accurate. Like any project, there are elements that could be added, if done again or given more time, that would improve the flow of the app. One element discussed in feedback during my presentation was making the transcribed text editable so the user can change it before clicking the share button; once the user clicks the share button, they can edit the text before sending or saving it. Additionally, appending to the current transcription when you mess up mid-sentence is not possible with the way the app works now. To improve this and keep the existing text from being removed when transcription continues, I could use a += when adding the text to the text view, and only clear the text once the user shares or deletes it. Overall, I am very proud of this project, and friends and colleagues have already asked for a copy of this application on their phones to record ideas they have throughout the day.
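A minimal sketch of that append idea, reusing the names from the Milestone 2 sketch; the change would live in onActivityResult, where the TextView is currently overwritten:

// Append the new phrase instead of replacing whatever is already displayed.
String newPhrase = results.get(0);
String existing = transcribedText.getText().toString();
transcribedText.setText(existing.isEmpty() ? newPhrase : existing + " " + newPhrase);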