Data Science at Alethea

By Matthew Kenney

I am a data scientist, and understanding data structures has always been a passion of mine. As I came into my own as an AI researcher, my focus on social media data analysis gave me a deep understanding of the power of the machine learning models that drive social media engagement. Working in this space, I have for years observed disinformation’s impact on our access to accurate information. Now, we are seeing its threat to democracy around the world.

Earlier this year, I decided to leave big tech behind and join Alethea, a technology company addressing and mitigating disinformation. This is an exciting new chapter in my career. I get to work on an important problem in an environment that allows for creative problem-solving, growth, and collaboration. As a Senior Data Scientist, I play a large role in product development, including product direction and new features. In this blog post, I’ll discuss how collaboration drives the development of our flagship SaaS product, Artemis.

Connecting AI and Counter-Disinformation

AI and countering disinformation are both rapidly growing fields. From advances in modeling coordinated disinformation campaigns to applying the latest transformer models to social media data, it’s a thrilling time to work at the intersection of both disciplines. In my day-to-day work, I get to collaborate with leaders in open-source intelligence (OSINT) who understand the inner workings of disinformation. There is huge value in having data scientists, machine learning engineers, software developers, and disinformation experts in the same room, collaborating toward a common goal. The knowledge transfer among our respective fields has taken Artemis from concept to reality.

Collaboration and Knowledge Transfer

Knowledge transfer among teams is crucial for building a platform like Artemis. Our data scientists and machine learning engineers work closely with disinformation analysts to keep up with the analysts’ distinct and evolving skill sets. One thing I love about the collaboration is the ability to walk through the analysts’ findings and ask questions about their techniques, tools, and approaches. Thinking about ways to improve our models so that they mirror this expertise is an exciting challenge and the perfect niche for my interests. No two disinformation campaigns are the same, but this collaboration allows my team to see the overlapping patterns that we, as data scientists, can capitalize on to create best-in-class machine learning models. Infusing the analysts’ domain knowledge into our products is a rewarding process, and I’m certain that what we are doing is the most promising approach to tackling disinformation at scale.

The significant advantage of machine learning is that it can automatically extract patterns from massive amounts of data. By leveraging machine learning in the disinformation space, we are creating solutions that efficiently surface disinformation threats and their malicious messaging before they reach critical mass online. It’s possible to automate substantial parts of the data analysis process with a breadth that no team of human investigators can match on their own: our insights come in real time and from analysis covering millions of data points. What’s more, AI models, particularly deep learning models, pick up on patterns that may not be apparent to human analysts.

Big Data Analysis

What sets Alethea apart is how we identify and track disinformation across vast amounts of data. This is complex work, but patterns inherent to disinformation campaigns make it possible for machine learning to track them at scale. A disinformation campaign may begin with blog posts and posts on message boards, and work its way through social media sites to fringe and then mainstream news sources. Where we differentiate ourselves is our ability to identify the source and track subsequent diffusion. Attributing a disinformation narrative to an individual, group, organization, or source of origin gives our customers a greater level of power to address the bad actor seeking to harm them.
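
To make the idea of source identification and diffusion tracking concrete, here is a toy sketch in Python. It is not how Artemis works (this post does not describe that); it only assumes we already have a handful of posts known to carry the same narrative, each with a timestamp and a venue type. The `Post` fields, the venue labels, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    source: str          # hypothetical account or site name
    venue: str           # hypothetical label: "blog", "message_board", "social", "news"
    posted_at: datetime  # when the narrative appeared at this source


def trace_diffusion(posts):
    """Return (earliest post, venues in order of first appearance)."""
    if not posts:
        raise ValueError("need at least one post to trace a narrative")
    ordered = sorted(posts, key=lambda p: p.posted_at)
    origin = ordered[0]  # earliest observed post, treated as the apparent origin
    first_seen = {}
    for post in ordered:
        # Record only the first time each venue type carries the narrative.
        first_seen.setdefault(post.venue, post.posted_at)
    path = list(first_seen)  # dicts preserve insertion order
    return origin, path


# Made-up sample data, deliberately out of chronological order.
posts = [
    Post("news-site-c", "news", datetime(2021, 3, 4, 9, 0)),
    Post("blog-a", "blog", datetime(2021, 3, 1, 8, 0)),
    Post("account-b", "social", datetime(2021, 3, 2, 12, 0)),
    Post("board-d", "message_board", datetime(2021, 3, 1, 20, 0)),
    Post("account-e", "social", datetime(2021, 3, 3, 7, 30)),
]

origin, path = trace_diffusion(posts)
print(origin.source)  # blog-a
print(path)           # ['blog', 'message_board', 'social', 'news']
```

In practice, the earliest observed post is only the apparent origin: real attribution also has to account for missing data, deleted posts, and coordinated accounts, which this sketch ignores.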

A key to our collaboration with analysts is our ability to tap their expertise to improve the performance of our machine learning models. We focus on a number of areas, including:

Understanding the correct datasets to use for pretraining and fine-tuning our models
Building knowledge bases of data such as terms, keywords and other indicators
Developing additional signifiers in the data that might signal a concerted disinformation campaign
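
As a rough illustration of the second item, a knowledge base of terms can be turned into simple indicator features for a piece of text. The sketch below is a minimal example under stated assumptions, not a description of Alethea's models: the category names, the terms, and the sample sentence are all invented for the example.

```python
import re

# Hypothetical analyst-curated knowledge base: indicator category -> terms.
KNOWLEDGE_BASE = {
    "narrative_terms": ["miracle cure", "banned video"],
    "amplification_cues": ["share before deleted", "wake up"],
}


def compile_patterns(knowledge_base):
    """Build one case-insensitive, whole-phrase regex per category."""
    return {
        category: re.compile(
            "|".join(r"\b" + re.escape(term) + r"\b" for term in terms),
            re.IGNORECASE,
        )
        for category, terms in knowledge_base.items()
    }


def match_indicators(text, patterns):
    """Return {category: sorted list of matched terms} for categories with hits."""
    hits = {}
    for category, pattern in patterns.items():
        found = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if found:
            hits[category] = found
    return hits


patterns = compile_patterns(KNOWLEDGE_BASE)
hits = match_indicators("SHARE BEFORE DELETED: the miracle cure doctors hide", patterns)
print(hits)
# {'narrative_terms': ['miracle cure'], 'amplification_cues': ['share before deleted']}
```

Matches like these would be weak signals on their own; they are more plausibly useful as input features alongside a trained model than as a detector in themselves.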

Our Artemis platform hits the sweet spot between large-scale data analysis and domain expertise, allowing customers access to analyst-guided insights generated by analyzing a vast amount of data. How do we go from a generic model to one that can capture the intricacies inherent in a wide variety of disinformation campaigns? From a team standpoint, deep collaboration comes first. From a technical standpoint, you’ll have to come work with us to find out.

We are growing our Data Science team and will be opening a number of Data Science and Machine Learning Engineer roles, so please keep an eye on our careers page. You can check out our postings for the two openings available now on the Data Science team: a VP in Data Science and a Data Scientist. In these roles, we’ll work together to develop a vision for the future of Artemis.

Interested in seeing Artemis in action? Sign up here for a demo.