About Me

AI and engineering leadership in healthtech and fintech, nonprofit founder, former researcher at Harvard.

I'm Aron Szanto. I head the Machine Learning Products department at PathAI, delivering AI-powered diagnostics and therapies for cancer and other diseases. Previously, I led ML at Kensho, where we built a pioneering portfolio of language-model-backed technologies used by experts around the world. I've spent my career in the healthtech and fintech industries, building AI systems that enable the people performing the most critical jobs on the planet, from the CIA to the operating room. Back before Language Models got Large, I studied statistical machine learning, computational economics, and mathematical philosophy at Harvard.

I'm passionate about growing world-class teams that create outstanding and impactful technology. My work has focused on large language models and multimodal ML, complex disease diagnostics, fake news and misinformation detection, and human-computer hybrid systems. Some of my recent published research examines how fake and real news spread differently, building models that can identify fake news before it diffuses widely.

When I'm not working, you'll find me playing cello in my orchestra, on the ultimate frisbee field, or hacking on an open source project. My music recordings and tech-stracurriculars are below! Other things you should know about me: my first name is properly spelled Áron and pronounced /ɑ'rõn/ (AH-rown), since my multicultural parents needed to be very authentic. I love espresso, dogs, and cooking. I'm terrible at singing, but I'm a great whistler. And though many have tried, you'll never be able to convince me that there's a city better than New York.

Projects

Here are some of my recent research projects. Everything here is open source—drop me a line if you want in!

Content-Blind Fake News Detection

I developed a novel machine learning methodology for identifying fake news on Twitter. Using only the topology of the Twitter network that forms around a news article, the model applies graph kernels to predict the truthfulness, bias, and subject matter of a rumor with high accuracy. It outperforms both standard techniques and deep neural networks, demonstrating that network shape encodes rich and unique information about the content it surrounds. This work represents the state of the art in content-blind fake news identification and was published at The Web Conference (WWW), where I gave an invited talk about it.
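
If you're curious what that looks like in code, here's a minimal, illustrative sketch of a graph-kernel pipeline in the same spirit: a Weisfeiler-Lehman kernel over propagation graphs feeding a kernelized SVM, using grakel and scikit-learn. The toy cascades, labels, and hyperparameters are placeholders, not the actual research code.

```python
# Illustrative sketch: classify news propagation graphs with a graph kernel.
# The tiny hand-built cascades and labels below are placeholders; the real
# model operates on Twitter cascade topologies and is content-blind.
from grakel.kernels import WeisfeilerLehman
from sklearn.svm import SVC

# Each graph is (directed edge set, node labels). Labels are generic roles,
# not text features.
graphs = [
    ({(0, 1), (1, 2), (1, 3)}, {0: "src", 1: "share", 2: "share", 3: "share"}),
    ({(0, 1), (0, 2), (2, 3), (3, 4)},
     {0: "src", 1: "share", 2: "share", 3: "share", 4: "share"}),
    ({(0, 1), (1, 2)}, {0: "src", 1: "share", 2: "share"}),
    ({(0, 1), (0, 2), (0, 3)}, {0: "src", 1: "share", 2: "share", 3: "share"}),
]
y = [1, 0, 1, 0]  # placeholder labels: 1 = fake, 0 = real

# Weisfeiler-Lehman subtree kernel -> precomputed Gram matrix -> SVM.
kernel = WeisfeilerLehman(n_iter=3, normalize=True)
K = kernel.fit_transform(graphs)
clf = SVC(kernel="precomputed").fit(K, y)

# Score a new cascade against the training graphs.
new_cascade = ({(0, 1), (1, 2), (2, 3)},
               {0: "src", 1: "share", 2: "share", 3: "share"})
K_new = kernel.transform([new_cascade])
print(clf.predict(K_new))
```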

AI-Assisted Collaboration on GitHub

Sebastian Gehrmann and I built a system to understand how large-scale collaboration works on platforms like GitHub. We trained neural networks on large volumes of historical GitHub data, developing a model that predicts a user's future contributions and whether a project will be successful. These findings could inform an AI system that manages project contributors, organizes them into subteams, recruits relevant users, and shapes work to maximize success.
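
As a rough illustration of the prediction side, the sketch below fits a small neural network to per-user activity features to forecast future contribution volume. The feature names, synthetic data, and model size are stand-ins of my own choosing; the actual study used far richer GitHub event histories.

```python
# Illustrative sketch: predict a user's future contribution volume from
# hypothetical per-user activity features on synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users = 500
# Hypothetical features: past commits, PRs opened, issue comments, tenure (months)
X = rng.poisson(lam=[20, 5, 8, 12], size=(n_users, 4)).astype(float)
# Synthetic target: commits over the next quarter
y = 0.6 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 3, n_users)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out users:", round(model.score(X_test, y_test), 2))
```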

Airbnb Reidentification

Airbnb claims to protect host privacy using probabilistic location fuzzing. Along with Emily Houlihan and Neel Mehta, I developed an algorithm to reidentify Airbnb hosts using public voter records, demonstrating that Airbnb's platform is not identity-secure. This research gained attention through a hit Medium story and coverage in The International Business Times, and was published in The Journal of Technology Science.
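
The core idea is easy to sketch: the fuzzed map pin constrains the host's true location to a small disk, which can be intersected with geocoded public records and narrowed by the host's public first name. Everything below (the coordinates, names, and assumed fuzzing radius) is made up for illustration; the published study worked with real listings and voter files, and its exact matching procedure differs.

```python
# Illustrative sketch of the reidentification idea: intersect the disk of
# possible true locations implied by a fuzzed listing pin with geocoded
# public records, then narrow by the host's public first name.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

FUZZ_RADIUS_KM = 0.3  # assumed maximum displacement of the public pin
listing = {"host_first_name": "Alice", "pin": (40.7359, -73.9911)}

voter_records = [  # hypothetical geocoded records
    {"name": "Alice Smith", "lat": 40.7361, "lon": -73.9903},
    {"name": "Alice Jones", "lat": 40.7512, "lon": -73.9776},
    {"name": "Bob Brown", "lat": 40.7360, "lon": -73.9909},
]

candidates = [
    r for r in voter_records
    if r["name"].split()[0] == listing["host_first_name"]
    and haversine_km(*listing["pin"], r["lat"], r["lon"]) <= FUZZ_RADIUS_KM
]
print(candidates)  # -> only Alice Smith survives both filters
```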