About Me

Head of NLP at Kensho, CEO & Founder at the Zerobase Foundation, former researcher at Harvard.

I'm Aron Szanto. I head the NLP division at Kensho Technologies in NYC and lead the nonprofit Zerobase Foundation. Before working at Kensho, I was Product Engineering Manager at MarketFactory, a New York-based fintech startup. There, I led an R&D effort to merge cutting-edge research in machine learning and high-performance computing with macroeconomics to change the way that the world exchanges currencies. A few jobs before that, I studied applied math, computer science, and economics at Harvard.

I'm passionate about growing world-class teams that create outstanding and impactful technology. My work has centered on NLP, fake news and misinformation detection, artificial intelligence and multi-agent systems, big data and distributed systems, and machine learning. My most recent published research examines how fake and real news spread differently and builds models that identify fake news before it diffuses widely.

When I'm not working, you'll find me on the ultimate frisbee field, playing cello in an orchestra or chamber ensemble, or hacking on an open source project. Other things you should know about me: my first name is properly spelled Áron and pronounced ɑ'rõn (AH-rown), since my multicultural parents wanted to be very authentic. I love coffee, dogs, and cooking. I'm terrible at singing, but I'm a great whistler. And though many have tried, you'll never be able to convince me that there's a city better than New York.


Here are some of my recent research projects. Everything here is open source, so drop me a line if you want in!

Content-Blind Fake News Detection

Developed novel machine learning methodology for identifying fake news on Twitter. Using only information about the topology of the Twitter network that forms around a news article, the model uses graph kernels to predict the truthfulness, bias, and subject matter of a rumor with high accuracy. The model beats the performance of both standard techniques and deep neural nets, demonstrating that network shape encodes rich and unique information about the content that it surrounds. This is the first application of predictive analytics to the largest collection of fake news stories and associated social networks ever assembled. With its high accuracy, this work represents the state of the art for fake news identification in this domain. Research paper published at The Web Conference (WWW), where I gave an invited talk about the work.

AI-Assisted Collaboration on GitHub

Sebastian Gehrmann and I built a system to understand how large-scale collaboration works on platforms like GitHub. We used neural networks and machine learning to analyze huge amounts of historical GitHub data and developed a model that could both predict a user's future contributions and determine whether a given project is going to be successful. These findings might be used in an AI information system that assists collaboration by actively managing a project's contributors, organizing them into subteams, finding relevant users to bring into the project, and shaping the work that users do on the project to maximize its success. It's way cool.

Airbnb Reidentification

Airbnb claims to protect the privacy of its hosts using a probabilistic location-fuzzing mechanism. Along with Emily Houlihan and Neel Mehta, I developed an algorithm to reidentify Airbnb hosts using public voter records, demonstrating that Airbnb's platform is not identity-secure. Find our hit Medium story below, as well as news coverage from The International Business Times. Published in The Journal of Technology Science.