Unmasking the Internet Imposters: AI's Role in Identifying Fraudulent Websites

The digital realm is a playground for deception. Every day, new fraudulent websites appear, meticulously designed to look trustworthy, with the sole purpose of making an illicit profit, either directly or indirectly. These sites steal sensitive information such as passwords and other credentials, which are then sold on the dark web. They also trick unsuspecting individuals into making fraudulent money transfers or revealing their credit card data.

The Corporate Conundrum: The Threat of Fake Websites

For many businesses, the nightmare begins when their brands, logos, or websites are replicated and misused to gain the trust of employees, customers, or anyone unfortunate enough to land on these fraudulent sites. The attacker’s end goals vary, from breaking into the organization with stolen passwords to obtaining sensitive customer information.

Setting up such a fake website involves registering an internet domain name that closely resembles the target organization’s name. This act, known as cybersquatting or domain squatting, is a common tactic. For instance, if the legitimate website is www.company.com, an attacker could register www.company.org (the same name with a different extension, known as a top-level domain or TLD swap) if that domain name is still available. Most users won’t notice the difference. There are many other lookalike domain names to choose from (such as www.company-info.com or www.com-pany.com), making it virtually impossible for an organization to register all of them to prevent cybersquatting. It is therefore crucial to check whether a website is legitimate before trusting it.

Even though the organization is blameless, a successful attack can cause it significant damage. The internet is rife with cases and news articles about such attacks and their consequences.

According to the United States federal law known as the Anticybersquatting Consumer Protection Act, cybersquatting is the act of registering, trafficking in, or using an Internet domain name with bad faith intent to profit from the goodwill of a trademark belonging to someone else. The cybersquatter may offer to sell the domain to the person or company who owns a trademark contained within the domain name at an inflated price.

Since 1999, the World Intellectual Property Organization (WIPO), one of the 15 specialized agencies of the United Nations (UN), has provided an arbitration system wherein a trademark holder can attempt to claim a squatted site. The number of claims has been on the rise ever since.

However, the law often falls short in helping the victim, because in many cases the attacker cannot be identified. In such instances, the law also offers no way to recover damages. That’s why several organizations have taken proactive steps: they monitor the registration of potential cybersquatting domains and detect fake websites early, so that damage can be avoided or minimized.

Swift Detection of Fraudulent Websites: The Power of AI

Registering all possible lookalike internet domain names (so-called “cybersquatting candidates”) is not a feasible strategy. Registering and renewing them would require significant time and money. Automating the detection and monitoring process offers a solution here.

Here’s how it can work:

  1. Enumerate candidate cybersquatting (lookalike) domain names: Starting from a list of the organization’s known primary domain names, lookalike names can be generated. Several enumeration techniques can be applied, such as adding and removing delimiters (dots and dashes), replacing the extension with another one (top-level domain swaps), or changing characters. These techniques generate a large list of domain names.

  2. Verify if candidate cybersquatting websites are online: The next step is to continuously or frequently verify whether these domain names are registered AND whether they are actually hosting a website. If a site is online, a screenshot can be taken automatically so that the site can be analyzed and investigated further, producing an initial list of potential cybersquatting candidates. New candidates will keep popping up for an analyst to investigate. While this approach has proven to work, its downside is that it also generates many false positives, which can lead to alert fatigue.

  3. Further qualify cybersquatting websites for brand abuse indicators using AI: Additional checks should be added, such as verifying whether the organization’s name appears on the cybersquatting website. Artificial Intelligence (AI) can also dramatically reduce the false positives and turn the remaining ones into highly relevant alerts by searching for logos. A simple and effective approach is to list all the brand logos you use online and have AI verify whether a cybersquatting website is using a similar image. Comparing only the image file name, the raw image data, or the file size is not enough, as such checks are easily circumvented by renaming the file or recompressing the image in another format.
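
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not a complete tool: the TLD list and the character substitution table are hypothetical and deliberately tiny, the input is assumed to be a bare domain such as company.com (no "www." prefix), and only dashes are handled as delimiters. The DNS lookup is also only a rough proxy: a name that resolves is not proof that a website is hosted there, and a registered domain may not resolve at all.

```python
import socket

# Hypothetical, deliberately short lists; a real run would use far larger ones.
ALT_TLDS = ["com", "org", "net", "info"]
SUBSTITUTIONS = {"o": "0", "l": "1", "i": "1", "m": "rn"}


def enumerate_lookalikes(domain: str) -> set[str]:
    """Generate lookalike candidates for a bare domain such as 'company.com'."""
    name, _, tld = domain.partition(".")
    candidates = set()

    # Top-level domain swaps: company.com -> company.org
    for alt in ALT_TLDS:
        candidates.add(f"{name}.{alt}")

    # Add a dash between two characters: company -> com-pany
    for i in range(1, len(name)):
        if name[i - 1] != "-" and name[i] != "-":
            candidates.add(f"{name[:i]}-{name[i:]}.{tld}")

    # Remove existing dashes: my-company -> mycompany
    if "-" in name:
        candidates.add(f"{name.replace('-', '')}.{tld}")

    # Change one character to a lookalike: company -> c0mpany
    for i, ch in enumerate(name):
        if ch in SUBSTITUTIONS:
            candidates.add(f"{name[:i]}{SUBSTITUTIONS[ch]}{name[i + 1:]}.{tld}")

    candidates.discard(domain)  # the legitimate domain is not a candidate
    return candidates


def resolves(domain: str) -> bool:
    """Return True if the name resolves in DNS (a rough 'is it online' signal)."""
    try:
        socket.getaddrinfo(domain, 80)
        return True
    except (socket.gaierror, UnicodeError):
        return False


if __name__ == "__main__":
    for candidate in sorted(enumerate_lookalikes("company.com")):
        print(candidate, "resolves" if resolves(candidate) else "no DNS record")
```

For company.com, the generated set includes company.org and com-pany.com, the two examples mentioned earlier. In practice, the names that resolve would then be passed on to the screenshot and brand-indicator checks of steps 2 and 3.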

There are plenty of off-the-shelf tools available that can help in detecting logos and brand images. Common ones are the cloud vision services Amazon Rekognition, Google Vision AI, and Microsoft Azure AI Vision, as well as the open-source computer vision library OpenCV. These tools can analyze and compare images quickly, and they compare what an image looks like rather than its exact bytes, so a logo can still be recognized after it has been resized, recompressed, or saved in another format. That makes them effective at spotting brand logos on fraudulent websites. By integrating such engines into their cybersecurity strategy, organizations can significantly enhance their ability to detect and combat cybersquatting, thereby safeguarding their online presence and reputation.
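
To make concrete why comparing raw image data fails while a similarity-based comparison holds up, here is a small sketch using average hashing, a simple perceptual hashing technique. It is far simpler than the engines named above and is not how they work internally; it only illustrates the principle. The 4x4 grayscale grids are hypothetical stand-ins for a logo that has already been downscaled, and the small pixel shifts imitate what recompression might do.

```python
import hashlib

Grid = list[list[int]]  # grayscale pixels (0-255), all rows the same length


def byte_digest(pixels: Grid) -> str:
    """Exact digest of the raw pixel bytes; any pixel change alters it."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


def average_hash(pixels: Grid) -> int:
    """One bit per pixel, set when the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means similar images."""
    return bin(a ^ b).count("1")


# Hypothetical tiny "logo": two bright blocks on a dark background.
logo = [
    [250, 250, 10, 10],
    [250, 250, 10, 10],
    [10, 10, 250, 250],
    [10, 10, 250, 250],
]

# The same logo with slight per-pixel shifts, imitating recompression.
recompressed = [
    [min(255, max(0, p + d)) for p, d in zip(row, (3, -2, 4, -1))]
    for row in logo
]

if __name__ == "__main__":
    print("raw bytes identical:", byte_digest(logo) == byte_digest(recompressed))
    print("bits that differ:", hamming_distance(average_hash(logo), average_hash(recompressed)))
```

The exact digests of the two grids differ, while their average hashes are identical, so a check based on the hash distance still matches the altered copy. A production setup would rely on one of the engines above, which also cope with cropping, rotation, and logos embedded in a larger screenshot.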