About Me

Harvard CS and Econ, Machine Learning Engineer at Kensho.

I'm Aron Szanto. I research machine learning at Kensho Technologies in NYC, working on projects ranging from statistical modeling to natural language processing. I studied computer science and economics at Harvard, working jointly toward my Bachelor's and Master's degrees.

Before Kensho, I was Product Engineering Manager at MarketFactory, a New York-based fintech startup. There, I led their R&D effort to merge cutting-edge research in machine learning and high-performance computing with macroeconomics to change the way the world exchanges currencies. Overall, my work and experience have centered on artificial intelligence and multi-agent systems, big data and distributed systems, and machine learning. My outside research focuses on the ways in which fake and real news spread differently, building models that can identify fake news before it diffuses widely.

When I'm not working, you'll find me on the ultimate frisbee field, playing cello in an orchestra or chamber ensemble, or hacking on an open source project. Other things you should know about me: my first name is properly spelled Áron and pronounced ɑ'rõn (AH-rown), since my multicultural parents wanted to be very authentic. I love coffee, dogs, and cooking. I'm terrible at singing, but I'm a great whistler. And though many have tried, you'll never be able to convince me that there's a city better than New York.

Projects

Here are some of my recent research projects. Everything is open source, because what's the point of building cool stuff if you can't share it? Drop me a line if you want in!
CBL Graphs

Content Blind Network Learning

Developed a new machine learning methodology for identifying fake news on Twitter. Using only the topology of the Twitter network that forms around a news article, the model applies graph kernels to predict an article's veracity, bias, and subject matter with high accuracy. It beats both standard techniques and deep neural nets, demonstrating that network shape encodes rich and unique information about the content it surrounds. This is the first application of predictive analytics to the largest collection of fake news stories and associated social networks ever assembled, and with its high accuracy it represents the state of the art for fake news identification in this domain. A research paper is forthcoming.
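If you're curious what "graph kernels on network topology" means in practice, here's a minimal, self-contained sketch. This is not the published model — it's a hand-rolled Weisfeiler-Lehman subtree kernel comparing two toy retweet cascades (a shallow "broadcast" star vs. a deep chain), just to show how a similarity score can be computed from shape alone, with no article content in sight.

```python
from collections import Counter

def wl_features(adj, n_iter=2):
    """Weisfeiler-Lehman label refinement on an unlabeled graph.

    adj: dict mapping node -> list of neighbor nodes.
    Nodes start labeled by degree; each iteration compresses a node's
    label with its neighbors' sorted labels. Returns a histogram of
    all labels seen, which acts as the graph's feature vector.
    """
    labels = {v: str(len(adj[v])) for v in adj}  # initial labels: degrees
    feats = Counter(labels.values())
    for _ in range(n_iter):
        labels = {
            v: labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
            for v in adj
        }
        feats.update(labels.values())
    return feats

def wl_kernel(a, b):
    """Kernel value = dot product of the two WL feature histograms."""
    fa, fb = wl_features(a), wl_features(b)
    return sum(fa[k] * fb[k] for k in fa.keys() & fb.keys())

# Toy cascades: one account retweeted by three followers (star)
# vs. a retweet passed down a chain of four accounts.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(wl_kernel(star, star) > wl_kernel(star, chain))  # True
```

In the real setting, a matrix of such pairwise kernel values over thousands of cascades would feed a kernel classifier (e.g. an SVM), which is the standard way graph kernels are used for prediction.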

GitHub AI-Assisted Collaboration

AI-Assisted Collaboration on GitHub

Sebastian Gehrmann and I built a system to understand how large-scale collaboration works on platforms like GitHub. We used neural networks and machine learning to analyze huge amounts of historical GitHub data and developed a model that predicts both a user's future contributions and whether a given project will be successful. These findings might power an AI information system that assists collaboration by actively managing a project's contributors, organizing them into subteams, finding relevant users to bring into the project, and shaping the work that users do to maximize the project's success. It's way cool.

Airbnb Reidentification

Airbnb Reidentification

Airbnb claims to protect the privacy of its hosts using a probabilistic location-fuzzing mechanism. Along with Emily Houlihan and Neel Mehta, I developed an algorithm that reidentifies Airbnb hosts using public voter records, demonstrating that Airbnb's platform is not identity-secure. Find our hit Medium story below, along with news coverage from The International Business Times. Published in The Journal of Technology Science.
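To give a flavor of the general idea (not the published algorithm — the paper has the real method), here's a toy sketch of the joining step: take the fuzzing circle shown on a listing, and filter a voter roll down to people whose first name matches the host's and whose registered address falls inside that circle. All names, coordinates, and the radius below are invented for illustration.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6_371_000  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_hosts(listing, voters, radius_m=200):
    """Voters whose first name matches the host's and whose registered
    address lies within the fuzzing circle shown on the listing."""
    return [
        v for v in voters
        if v["first_name"].lower() == listing["host_name"].lower()
        and haversine_m(listing["lat"], listing["lon"], v["lat"], v["lon"]) <= radius_m
    ]

# Hypothetical data, for illustration only.
listing = {"host_name": "Alex", "lat": 40.7410, "lon": -73.9897}
voters = [
    {"first_name": "Alex",  "lat": 40.7412, "lon": -73.9901},  # ~40 m away: kept
    {"first_name": "Alex",  "lat": 40.7600, "lon": -73.9800},  # ~2 km away: dropped
    {"first_name": "Maria", "lat": 40.7411, "lon": -73.9899},  # wrong name: dropped
]
print(len(candidate_hosts(listing, voters)))  # 1
```

When a filter like this narrows a public roll to a single person, the "fuzzed" location hasn't actually anonymized anyone — which is the heart of the privacy argument.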