I see your OCR and raise you more Machine Learning

Since school started in September, my time and energy have been mostly devoted to it. I have been writing code for assignments rather than “cooler” projects. But this is about to change.

Yesterday my project proposal for COMP 400, a course titled Honours Project in Computer Science at McGill, got approved by my supervisor, Prof. Joseph Vybihal. This is probably the coolest project I will do for school. In fact, I expect it to be a big project (big enough to qualify for 3 university credits), and I will play the dual role of developer and PM for it over the next two months.

I intend to call this project “ScribeX”. According to Oxford Dictionaries, a scribe is “a person who copies out documents, especially one employed to do this before printing was invented.” In a nutshell, ScribeX is a program that copies handwritten characters from images (scanned or photographed) into text files, while keeping the original content and format of the handwritten document. As mentioned in the proposal, ScribeX addresses the need to convert handwritten documents into text-based ones, a need that is not uncommon among students and in professions where note-taking is important.

This project will be similar to an OCR solution, but it will involve more machine learning in order to tailor the recognition to each user’s handwriting.

User base

I imagine the most important user scenario for ScribeX to be situations where producing electronic documents is inconvenient, but storing, indexing and managing e-documents would make users’ lives easier. In fact, I developed the idea of ScribeX with school notes in mind: from what I have observed in classes, there are certain scenarios where it is simply too much work to take notes in Word, especially notes involving graphs and mathematical notation. Nevertheless, it would be handy to have the text parts indexed and searchable on a computer.

It should be fair to expect students to be a good user base for this application. Given my projected timeline (only two months before this project is due), I will give priority to features aimed at students.

Product and user flow

Now that we have a basic understanding of the potential users, we need a basic concept of the application itself.

Thinking as a student and user, I would want an app that I can pull out casually, so it needs to be lightweight from the user’s point of view. When I write my notes, I might want to either scan them later or take a picture immediately with my phone, so this app must be accessible from both computers and smartphones; in other words, cross-platform. With only two months, the only feasible solution seems to be building a web application.

However, web apps aren’t perfect either. Because there is no copy of the software on the user’s personal device, the app might need a login feature in order to retain the user’s handwriting history, which can be important in training the machine to provide customized service. This will be an additional task for me, but fortunately I have done similar work before.

The user flow of ScribeX should be as intuitive as possible, because again this is a “casual” app. The user should have the option to skip login and just go for it; it is only when the confidence level is low that ScribeX will prompt the user to log in, so that it can get help from previously learned material. After processing the document, a text or Word file should be downloaded to the user’s device.

If a user does choose to register and log in, she should be able to store her past data and benefit from the more accurate recognition that results from this data. Because an important user scenario is string search in handwritten notes, it would also be nice to have the user’s converted documents archived, so the user can choose to search across all her previous documents.
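To make the archive search idea concrete, here is a minimal sketch, assuming the converted documents are kept as plain text keyed by file name. The function name `search_archive` and the in-memory dictionary are hypothetical placeholders for illustration, not a decided design; a real version would read from whatever storage the web app ends up using.

```python
from typing import Dict, List, Tuple


def search_archive(archive: Dict[str, str], query: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive substring search over archived documents.

    `archive` maps a document name to its converted text (an assumed layout).
    Returns a list of (document name, 1-based line number, matching line).
    """
    needle = query.casefold()
    hits: List[Tuple[str, int, str]] = []
    if not needle:
        # An empty query would match every line, so return nothing instead.
        return hits
    for name, text in archive.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((name, line_no, line.strip()))
    return hits


# Hypothetical archive holding two converted notes.
notes = {
    "calculus.txt": "Limits\nThe derivative of x^2 is 2x",
    "physics.txt": "Newton's laws\nDerivative of position is velocity",
}
results = search_archive(notes, "derivative")
```

Plain substring matching is the simplest thing that could work here; it says nothing about how recognition errors in the converted text should be handled during search.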

A crucial requirement for ScribeX is to be accurate and, when accuracy cannot be guaranteed, let the user know. The app should be able to empirically rate its level of confidence, and when in doubt, either mark the part to let the user correct it, or present the original image. The latter might be preferable especially when the unrecognizable part is a graph.
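As a rough illustration of this requirement, the sketch below routes each recognized region by its confidence score: trusted text is kept, doubtful text is marked for correction, and anything else falls back to the original image. The `Region` structure, the two thresholds (0.90 and 0.50) and the `[?…?]` and `[image: …]` markers are all assumptions made up for this example; the real values would have to be tuned on actual notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    """One recognized piece of a page (hypothetical structure)."""
    text: Optional[str]  # recognizer's best guess, or None if nothing was read
    confidence: float    # recognizer's own score, assumed to lie in [0, 1]
    image_ref: str       # identifier of the cropped original image


def render_region(region: Region, accept: float = 0.90, review: float = 0.50) -> str:
    """Choose how to present one region based on its confidence."""
    if region.text is None or region.confidence < review:
        # Too uncertain (for example a graph): show the original image instead.
        return f"[image: {region.image_ref}]"
    if region.confidence < accept:
        # Readable but doubtful: mark it so the user can correct it.
        return f"[?{region.text}?]"
    return region.text


def render_page(regions: List[Region]) -> str:
    """Join the rendered regions of a page with single spaces."""
    return " ".join(render_region(r) for r in regions)


page = [
    Region("Hello", 0.97, "r1"),
    Region("wor1d", 0.60, "r2"),
    Region(None, 0.0, "r3"),
]
output = render_page(page)
```

With these assumed thresholds, the first region is kept as text, the second is marked for correction, and the third is replaced by a reference to its image.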

I do plan to extend this project a little bit beyond the scope of a university course. I want to improve this project with user testing, so it is of course important to have a feedback channel for the user.

Feature list (tentative)

MVP:

– Text recognition

– Level of confidence rating

– Direct graph extraction

MDP:

– Multiple files processing

– Special character recognition

– File archive service

– Manual training

– Feedback channel

Completion criteria

The last question is how I should define success for this project. The MVP should be completed before the end of the semester, as a showcase of my hands-on experience with machine learning. The definition of “complete” is tricky though, because it is hard to clearly define accuracy in this context. Generally speaking, if the application is able to recognize most of the text in my notes, and uses properly sized pictures to show whatever it cannot recognize, I would consider it done.

Thank you for reading. Because this is an ongoing project, feedback can be very helpful for the dev team, aka me. Once again, you can read the project proposal here.