You may recognize the ways Artificial Intelligence (A.I.) has permeated daily life, such as the TV show recommendations you receive from Netflix or the product recommendations from Amazon. And although it's easy to see how A.I. has successfully integrated into our lives in a variety of ways, using it in healthcare is relatively new. A.I. is being developed to assist doctors in detecting disease and determining the best treatment for a patient, and there's a groundswell of wearable devices that provide insights into your health patterns, to name just a few examples.
At its core, the kinds of A.I. we use at doc.ai are simply good at pattern recognition. Pattern recognition is something humans do very well from an early age, but computers require a huge amount of data and computational power to become equally good at this skill.
The pattern recognizers that we build here, or “models”, are just complex mathematical equations — very complex mathematical equations — which have been exposed to a lot of information and are now good at recognizing patterns in that information. If we’ve done a good job of “training” that model then it will also be able to recognize those same patterns in information it hasn’t seen before. We’ll say the model “generalizes well” and is good at making new predictions.
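To make the "model is just an equation" idea concrete, here is a minimal sketch. The weight and bias values are made up for illustration; a real model has millions of such learned numbers, but the principle is the same: inputs go into an equation, a prediction comes out.

```python
# Hypothetical one-line "model": a pattern recognizer really is just an
# equation with numbers that were adjusted during training.
def model(x, weight=2.0, bias=1.0):
    # weight and bias are the "learned" parts (made-up values here)
    return weight * x + bias

print(model(3.0))  # 7.0 -- the model's prediction for the input 3.0
```

Training is the process of nudging those numbers until the predictions match the patterns in the data.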
Accurate Models Need Data
Having enough data is essential if you want to end up with a good model that generalizes well.
Imagine, for example, that you want to distinguish an image of a dog from an image of a cat. Given a photo of an animal you might notice the shape of the ears and the nose, what the tail looks like, how big the teeth are, the color and texture of the fur, and so on. You see all these patterns at once, patterns you are already familiar with, and immediately recognize that it’s a photo of a ridiculously adorable kitten.
When a computer looks at an image of a dog or a cat it only sees numbers representing the red, green, and blue colors of each individual pixel in it, all the tiny dots that make up the image in a digital format. During training in which the A.I. considers all these dots, a model will build up an increasingly complex numerical representation of the image and in a sense also start to recognize the patterns that make up an ear, a nose, or fur.
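What the computer "sees" can be sketched in a few lines. The pixel values below are invented for illustration; a real photo is the same structure, just with millions of pixels.

```python
# A tiny 2x2 "image" as a computer sees it: each pixel is three numbers
# (red, green, blue) from 0 to 255. These values are made up.
image = [
    [(30, 30, 30), (200, 180, 150)],   # row 1: a dark pixel, a light pixel
    [(25, 28, 26), (210, 190, 160)],   # row 2
]

top_left = image[0][0]
print(top_left)  # (30, 30, 30): low values in all channels, a dark dot
```

A model never receives "ear" or "fur" directly; it has to discover those patterns from grids of numbers like this one.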
But it may happen that during training the model ends up focusing on the wrong parts of the image or the wrong patterns in it. If you trained a dog and cat recognizer on only two pictures, one of a black cat and the other of a white dog, the model might learn that the pattern “black” means cat and the pattern “white” means dog. When this model sees a new picture of a black dog, it will guess that it’s an image of a cat. This is a model that does not generalize well because it’s learned the wrong patterns.
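The failure described above can be sketched as a toy classifier. This is not doc.ai's actual model; it is a deliberately simplistic "learner" that, given only one black cat and one white dog, latches onto brightness, which is the wrong pattern.

```python
# Toy sketch: a "classifier" trained on just two examples that learns
# brightness (0 = black, 255 = white) instead of cat-ness or dog-ness.
def train(examples):
    # examples: two (average_brightness, label) pairs
    (b1, l1), (b2, l2) = examples
    threshold = (b1 + b2) / 2
    dark_label = l1 if b1 < b2 else l2
    light_label = l2 if b1 < b2 else l1
    def predict(brightness):
        return dark_label if brightness < threshold else light_label
    return predict

model = train([(20, "cat"), (230, "dog")])  # one black cat, one white dog

print(model(25))  # another dark image: predicts "cat"
print(model(30))  # a *black dog*: the model still wrongly says "cat"
```

With only two training examples, "dark means cat" fits the data perfectly, which is exactly why the model fails on a black dog.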
The solution is to train the model on more data--lots more data, thousands, even millions of pictures of cats and dogs, and certainly pictures of black dogs and white cats. In fact, a lot of what our artificial intelligence team here at doc.ai does is find and prepare data for training models so that they can learn to focus on the right patterns.
Normally we collect data on our servers and clean it up before training a new model that will eventually be shown to doc.ai users. This is the early research phase, where we use data from public sources as well as anonymized data that our users have explicitly given us permission to use for specific cases. For example, when you take a medical selfie in the doc.ai app and correct the A.I.'s predictions of your BMI, you're helping us build more accurate models by providing images taken in real-world conditions, images with the same kinds of patterns that the model will see in new images later on.
Building accurate models necessarily requires a lot of data, but privacy is extremely important to us at doc.ai. That’s why our A.I. team is also working on some of the most advanced privacy-preserving technology in the industry.
Federated learning is a type of technology that allows us to send a model to the data for training, instead of sending data to the model. Rather than collecting anonymized information on our servers, we'll send a model to your phone and train it on the data there. Then the phone will send the updated model back to us. Not only is your personal information anonymized--we never see it at all. We only see the effect it has on the model. Further privacy-preserving techniques also ensure that we can't infer anything specific about the data that was used to produce that update. A recent Inc.com article further discusses doc.ai's efforts in federated learning and why it's needed.
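The core loop of federated learning can be sketched in miniature. This is a heavily simplified illustration, not doc.ai's implementation: the "model" is a single number, each "phone" is a small list of private values, and `local_update` stands in for real on-device training. The key point survives, though: the server only ever averages model updates; the data never leaves the phones.

```python
# Minimal federated-averaging sketch (hypothetical data and model).
def local_update(weight, local_data, lr=0.5):
    # One gradient-descent step fitting the shared weight to this
    # phone's local data -- a stand-in for real on-device training.
    grad = sum(weight - x for x in local_data) / len(local_data)
    return weight - lr * grad

global_weight = 0.0
phones = [[1.0, 1.2], [0.8, 1.1], [1.3, 0.9]]  # private data, stays on device

for _ in range(50):
    # Each phone trains locally; the server sees only the updated weights.
    updates = [local_update(global_weight, data) for data in phones]
    global_weight = sum(updates) / len(updates)

print(round(global_weight, 2))  # 1.05 -- the mean of all the private data
```

The trained model ends up reflecting everyone's data, yet the server only ever handled weights, never the raw values on any phone. Real systems add secure aggregation and noise on top of this so that even the individual weight updates reveal nothing specific.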
At doc.ai, we use A.I. in healthcare applications designed to help people lead happier, healthier lives. And as we collect data and train models, we strive to do so in a way that preserves your privacy and addresses the concerns about trust and responsibility that you might have.