Written By: Myles Shin

Is it possible to engineer a program that remembers the whole internet? That’s exactly what the startup Diffbot is trying to achieve. Diffbot uses artificial intelligence (AI) to extract as much data from the internet as possible. AI is the computational simulation of human intelligence, and it powers products such as Amazon Alexa and self-driving cars.

AI did not emerge as a field until the latter half of the 1900s. In 1997, IBM’s supercomputer Deep Blue beat Russian grandmaster Garry Kasparov, the reigning world chess champion. Fourteen years of AI research later, IBM’s new supercomputer, Watson, beat two of Jeopardy!’s most successful champions, Brad Rutter and Ken Jennings. Historically, artificial intelligence was used to store large amounts of information for later retrieval.

Diffbot was founded in 2008 at Stanford University, and its mission is to “build the world’s first comprehensive map of human knowledge” via knowledge graphs. A knowledge graph compiles data from across the internet to add relevant information to whatever a user is searching for. For example, if you look up Michael Jordan on Google, a profile panel about him appears on the right side of the screen. This panel is powered by Google’s Knowledge Graph.
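
To make the idea concrete, here is a minimal sketch of a knowledge graph as a collection of facts, each linking a subject to a value through a named relationship. The `KnowledgeGraph` class and its method names are hypothetical and purely illustrative; this is not how Diffbot or Google actually store their data.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory knowledge graph (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each subject to a dictionary of {relationship: value}.
        self._facts = defaultdict(dict)

    def add_fact(self, subject, relationship, value):
        """Store one fact, e.g. ("Michael Jordan", "team", "Chicago Bulls")."""
        self._facts[subject][relationship] = value

    def profile(self, subject):
        """Return every known fact about a subject, like a search info panel."""
        return dict(self._facts.get(subject, {}))


graph = KnowledgeGraph()
graph.add_fact("Michael Jordan", "occupation", "basketball player")
graph.add_fact("Michael Jordan", "born", "1963")
graph.add_fact("Michael Jordan", "team", "Chicago Bulls")

# Looking up a subject gathers its facts into one profile.
print(graph.profile("Michael Jordan"))
```

A real knowledge graph also links subjects to one another (for instance, the Chicago Bulls would have a profile of their own), which is what lets a search engine surface related information.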

Diffbot’s knowledge graph contains more than double the information of Google’s, with a whopping 1 trillion facts, and it adds over 130 million facts a month. Google is only able to apply its knowledge graph to a few of the most popular search terms, such as singers and other famous people, places, objects, or events. If you type a generic question into the Google search bar, you receive only links, not the information panel on the right side. Diffbot, on the other hand, is trying to provide this type of relevant information for every single search item. Diffbot has been expanding rapidly and has already attracted major clients like Microsoft and eBay, which now employ the company to strengthen their search engines.

To accomplish this, Diffbot needs to scour the internet at lightning speed. It uses a high-performance version of Google Chrome to sift through the internet as efficiently as possible. It reads websites as raw pixels and uses algorithms to categorize each piece of data. This way, it can get through each page very quickly without spending unnecessary time on any of them. Even at such high speeds, Diffbot is able to rate and analyze content to determine whether the website it is “reading” is trustworthy and whether its information is up to date.

Another significant application of Diffbot is rapidly searching the internet to help companies spot counterfeit goods. Adidas and Nike both use Diffbot to find all the sites that mention a specific type of shoe, such as the Adidas Ultraboost. Diffbot is able to find every website that says “Adidas Ultraboost,” no matter the language, and then give the companies a list of all the results.
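
The kind of search described above can be sketched in a few lines. This is a simplified illustration that assumes the page text has already been downloaded; the `find_mentions` function and the example URLs are hypothetical, and Diffbot’s real system is certainly far more sophisticated.

```python
import unicodedata


def normalize(text):
    """Lowercase text, unify Unicode forms, and collapse repeated whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def find_mentions(pages, product_name):
    """Return the URLs of pages whose text mentions the product name."""
    needle = normalize(product_name)
    return [url for url, text in pages.items() if needle in normalize(text)]


# Hypothetical page text in German, French, and English.
pages = {
    "https://shop.example/de": "Neu: ADIDAS  Ultraboost jetzt günstig kaufen",
    "https://shop.example/fr": "Chaussures de course adidas ultraboost en promotion",
    "https://blog.example/en": "A review of trail running socks",
}

matches = find_mentions(pages, "Adidas Ultraboost")
print(matches)
```

Because a brand name usually stays the same from one language to the next, a search that ignores capitalization and spacing differences can catch the German and French pages here without translating anything.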

Diffbot has many potential future applications that could help improve our quality of life. It could help remove language barriers by powering an earpiece that translates languages instantaneously, or help make fully autonomous cars a possibility in the near future. However, Diffbot may also pose some privacy issues. As with any type of AI, there is always debate about whether our right to privacy is being invaded. Diffbot has spoken on this subject: while it will collect publicly known data, it will not collect sensitive data such as race or political alignment. With Diffbot making sure to protect the privacy of both consumers and search subjects, there truly is only an upside to the wide implementation of Diffbot. With the advent of Diffbot’s all-knowing knowledge graphs and the continual progress of artificial intelligence, there is so much that can be accomplished when the two work in tandem.