Thursday, April 18, 2024

The Open Source Problem

People are having a big freakout about the Jia Tan user and I want to throw a little napalm on that kitchen fire by showing ya'll what the open source community looks like when you filter it for people with the same basic signature as Jia Tan. The summary here is: You have software on your machine right now that is running code from one of many similar "suspicious" accounts. 

We can run a simple scan for "Jia-Tans" with a test Reagent database and a few Cypher queries, the first on just looking at the top 5000 Pip packages for:

  • anyone who has commit access
  • is in Timezone 8 (mostly China)
  • has an email that matches the simple regular expression the Jia Tan team used for their email (a Gmail with name+number):

MATCH path=(p:Pip)<-[:PARENT]-(r:Repo)<-[:COMMITTER_IN]-(u:User) WHERE u.email_address =~ '^[a-zA-Z]+[0-9]+@gmail\\.com$' AND u.tz_guess = 8 RETURN path LIMIT 5000

This gets us a little graph with 310 Pip packages selected:

So many potential targets, so little time

One of my favorites is that Pip itself has a matching contributor: meowmeowcat1211@gmail.com


I'm sure whoever meowmeowcat is did a great job editing Pip.py


Almost every package of importance has a user that matches our suspicious criteria. And of course, your problems just start there when you look at the magnitude of these packages. 

I didn't scroll all the way down, but you can imagine how long this list is.

You can also look for matching Jia Tan-like Users who own (as opposed to just commit into) pip packages in the top 5000:

MATCH path=(u:User)-[:PARENT]->(p:Pip)<-[:PARENT]-(r:Repo)
WHERE u.tz_guess = 8
AND u.email_address =~ '^[a-zA-Z]+[0-9]+@gmail\\.com$'
RETURN path
ORDER BY r.pagerank DESC

Ok, there's not as many (less than 100), but some of these might be interesting given they are in the top 5000.


Pip packages can require other pip packages to be installed, and you also want to look at that entire chain of dependencies when looking at your risk profile. Reagent allows you to do this with a simple query. Below you can see the popular diffusers tool and scipy packages require pip packages that match "dangerous" users. In the scipy case, this is only if you install it as a dev. But nonetheless, this is interesting.

MATCH path=(u:User)-[:PARENT]->(p:Pip)<-[:REQUIRES]-(p2:Pip)<-[:PARENT]-(r:Repo) 
WHERE u.tz_guess = 8
AND u.email_address =~ '^[a-zA-Z]+[0-9]+@gmail\\.com$'
RETURN path



On the other hand, many people don't care about the particular regular expression that matches emails. What if we broadened it out to all Chinese owners of a top5000 Pip packages with either Gmail or QQ.com addresses and all the packages that rely on them. We sort by pagerank for shock value.
For customers that want to cut and paste into their DB:
MATCH path=(u:User)-[:PARENT]->(p:Pip)<-[:REQUIRES*..5]-(p2:Pip)<-[:PARENT]-(r:Repo)
WHERE u.tz_guess = 8
  AND ALL(rel IN relationships(path) WHERE rel.marker IS NULL)
  AND (u.email_address CONTAINS "gmail.com" OR u.email_address CONTAINS "qq.com")
RETURN p2.name, r.pagerank
ORDER BY r.pagerank DESC

Don't run Ansible, I guess?

One of the unique things about Reagent is we can say if a contributor is actually a maintainer, using some graph theory that we've gone into in depth in other posts. This is the query you could use:
MATCH path=(u:User)-[:MAINTAINS]->(c:Community)<-[:HAS_COMMUNITY]-(r:Repo)-[:PARENT]->(p:Pip)   
 WHERE u.tz_guess = 8
 AND u.email_address =~ '^[a-zA-Z]+[0-9]+@gmail\\.com$'
RETURN path LIMIT 15

As you can see, there are quite a few Pip packages where at least one maintainer (by our own definition) has been or currently is in the "Jin Tan"-style format.

Ok, so that's the tip of the iceberg! We didn't go over using HIBP as a verification on emails, or looking at any time data at all or commit frequencies or commit message content or anything like that. And of course, we also support NPM and Deb packages, and just Git repos in general. Perhaps in the next blog post we will pull the thread further. 

Also: I want to thank the DARPA SocialCyber program for sponsoring this work! Definitely thinking ahead! 



No comments:

Post a Comment