A demonstration of domain generation algorithms (DGAs) that derives a regular expression and a Yara rule for each DGA.

ericyoc/gen_dga_regex_and_yara_rules_poc

Domain Generation Algorithm (DGA) Detection and Analysis

Domain Generation Algorithms (DGAs) are a technique used by malware authors to generate a large number of domain names programmatically. These generated domains are used for command and control (C&C) communication, data exfiltration, or other malicious purposes. DGAs make it difficult for security researchers and law enforcement to track and take down malicious domains, as the domains are constantly changing and can be generated on the fly.

Motivating Article and Related Work

Rizi, A., Yocam, E., Vaidyan, V., & Wang, Y. (2024, February 13). Robust Defense against LRS Obfuscated DGAs through Domain-Specific Noise and Deep Learning Analysis. https://doi.org/10.21203/rs.3.rs-3918608/v1

The Shadowserver https://dashboard.shadowserver.org/

MITRE ATT&CK https://attack.mitre.org/

Any.Run https://any.run/malware-trends/

Joe Sandbox https://www.joesandbox.com/analysispaged/0

Triage https://tria.ge/reports/public

URLhaus https://urlhaus.abuse.ch/statistics

Intezer Analyze https://analyze.intezer.com/scan

Tools

https://github.com/Yara-Rules/rules

https://urlhaus.abuse.ch/browse/

https://radar.cloudflare.com/domains

https://github.com/PeterDaveHello/top-1m-domains

https://github.com/baderj/domain_generation_algorithms

https://www.cybereason.com/blog/what-are-domain-generation-algorithms-dga

https://blog.didierstevens.com/programs/yara-rules/

https://github.com/certtools/intelmq-feeds-documentation/blob/master/DGArchive/Malware.md

https://www.botconf.eu/botconf-presentation-or-article/dgarchive-a-deep-dive-into-domain-generating-malware/

https://github.com/360netlab

https://malpedia.caad.fkie.fraunhofer.de/library

People

https://twitter.com/viql

https://twitter.com/push_pnx?lang=en

https://www.thecyberyeti.com/

Overview

This Python code demonstrates various types of DGAs, generates sample domains using each DGA type, and provides explanations, strengths, weaknesses, and deception methods for each DGA. It also includes functionality to create regular expressions and Yara rules for detecting DGA-generated domains.

Importance of Detecting DGAs

Detecting DGAs is crucial for identifying and preventing malware infections and attacks. By identifying DGA-generated domains, security professionals can:

  • Block access to malicious domains and prevent communication with C&C servers
  • Identify infected machines and take appropriate remediation actions
  • Gather intelligence on malware campaigns and threat actors
  • Proactively protect against future attacks that may use similar DGA techniques

Regular Expressions and Yara Rules

The Python code includes functions to generate regular expressions and Yara rules based on the sample domains generated by each DGA type. These regular expressions and Yara rules can be used to detect and identify DGA-generated domains in network traffic, DNS logs, or other data sources.

Regular expressions provide a pattern-matching mechanism to identify domains that exhibit characteristics of DGA-generated domains. Yara rules, on the other hand, are a more advanced detection method that allows for the creation of complex rules based on strings, regular expressions, and other conditions.
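As a rough illustration of the pattern-matching idea, the sketch below derives a coarse regular expression from a handful of sample domains by recording the characters, label lengths, and TLDs it observes. The helper name `build_regex` and the heuristic itself are assumptions made for this example; the repository's own code may build its patterns differently.

```python
import re


def build_regex(domains):
    """Build a coarse regex from sample domains (hypothetical helper).

    Assumes each sample is a single label plus a TLD, e.g. "abc123.com".
    """
    labels, tlds = [], set()
    for domain in domains:
        label, _, tld = domain.lower().rpartition(".")
        labels.append(label)
        tlds.add(tld)

    joined = "".join(labels)
    charset = "a-z"
    if any(c.isdigit() for c in joined):
        charset += "0-9"
    if "-" in joined:
        charset += "-"  # a trailing hyphen is literal inside a character class

    shortest = min(len(label) for label in labels)
    longest = max(len(label) for label in labels)
    tld_alternatives = "|".join(sorted(re.escape(t) for t in tlds))
    return rf"[{charset}]{{{shortest},{longest}}}\.({tld_alternatives})"


samples = ["abcde.com", "xyz12345.net"]
pattern = build_regex(samples)

# Keep only the candidates whose whole name fits the derived pattern.
candidates = ["qwert7.com", "ab.com", "example.org"]
matches = [d for d in candidates if re.fullmatch(pattern, d)]
```

A pattern this broad will also match many benign domains, so it is better treated as a starting point for triage than as a verdict.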

The generated regular expressions and Yara rules can be integrated into security tools, intrusion detection systems (IDS), or other monitoring solutions to identify and flag potential DGA activity.
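One way to hand such a pattern to a Yara-based tool is to render it as rule text. The sketch below does only that: `build_yara_rule`, its parameters, and the rule layout are assumptions for illustration, and it does not compile or run the rule against any data.

```python
import re


def build_yara_rule(rule_name, domain_regex, description=""):
    """Render a Yara rule matching domain_regex (hypothetical helper)."""
    # Yara identifiers may contain only letters, digits, and underscores,
    # and must not start with a digit.
    safe_name = re.sub(r"[^A-Za-z0-9_]", "_", rule_name)
    if not safe_name or safe_name[0].isdigit():
        safe_name = "_" + safe_name
    # Forward slashes delimit Yara regexes, so escape any inside the pattern.
    escaped_regex = domain_regex.replace("/", r"\/")
    escaped_description = description.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"rule {safe_name}\n"
        "{\n"
        "    meta:\n"
        f'        description = "{escaped_description}"\n'
        "    strings:\n"
        f"        $domain = /{escaped_regex}/ nocase\n"
        "    condition:\n"
        "        $domain\n"
        "}\n"
    )


rule_text = build_yara_rule(
    "seed-based dga",
    r"[a-z0-9]{5,8}\.(com|net)",
    "Sample DGA domain pattern",
)
print(rule_text)
```

The regex passed in should stick to constructs that both Python and Yara accept, such as character classes, bounded repetition, and plain groups.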

DGA Types

The program covers the following DGA types (see DGA Results):

  1. Zodiac-based DGA: Generates domains based on zodiac signs and random strings.
  2. Time-based DGA: Generates domains based on the current time.
  3. Seed-based DGA: Generates domains based on a seed value and a hash function.
  4. Dictionary-based DGA: Generates domains by combining random words from a predefined dictionary.
  5. Pseudorandom Number Generator (PRNG) based DGA: Generates domains using a pseudorandom number generator seeded with a specific value.
  6. Arithmetic-based DGA: Generates domains by performing arithmetic operations on a base value and a random number.
  7. Permutation-based DGA: Generates domains by permuting the characters of a base domain.
  8. Fibonacci-based DGA: Generates domains using the Fibonacci sequence and a character mapping.
  9. Base32/Base64 DGA: Generates domains by encoding a seed value using Base32 or Base64 encoding.
  10. Wordlist-based DGA: Generates domains by combining random words from a predefined wordlist.
  11. Vowel-Consonant DGA: Generates domains by alternating between vowels and consonants.
  12. Morse Code DGA: Generates domains using Morse code representation of characters.
  13. Emoji DGA: Generates domains using emojis.
  14. Coordinate-based DGA: Generates domains using coordinates.
  15. Musical Notes DGA: Generates domains using musical notes and octaves.
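To make two of these entries concrete, here is a minimal sketch of a seed-based DGA and a vowel-consonant DGA. The function names, the choice of SHA-256, the label lengths, and the TLDs are all assumptions picked for the example and are not taken from the repository's code.

```python
import hashlib
import random


def seed_based_dga(seed, count=5, length=12, tld=".com"):
    """Derive domains by hashing the seed together with a counter."""
    domains = []
    for index in range(count):
        digest = hashlib.sha256(f"{seed}:{index}".encode()).hexdigest()
        # Map each hex digit (0-15) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


def vowel_consonant_dga(rng, length=10, tld=".net"):
    """Alternate consonants and vowels to get a pronounceable label."""
    vowels = "aeiou"
    consonants = "bcdfghjklmnpqrstvwxyz"
    label = "".join(
        rng.choice(consonants if position % 2 == 0 else vowels)
        for position in range(length)
    )
    return label + tld


# The same seed always yields the same list, which is what lets malware
# and its operator agree on domains without communicating.
seed_domains = seed_based_dga("campaign-42")
pronounceable = vowel_consonant_dga(random.Random(7))
```

Both generators are deterministic for a given seed, so a defender who recovers the seed and algorithm can precompute the same domains.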

Dyre DGA Implementation

The provided code demonstrates a very simple implementation of the Dyre domain generation algorithm, in which the set of generated domains depends on the date.

Significance of Dyre DGA

The Dyre DGA was used by the Dyre banking trojan, a sophisticated piece of malware that targeted financial institutions and their customers. The Dyre trojan was first discovered in 2014 and was active until late 2015, when its infrastructure was taken down by law enforcement agencies.

The Dyre DGA played a crucial role in the success of the Dyre trojan by providing a resilient and constantly changing infrastructure for C&C communication. This made it challenging for security researchers and law enforcement to track and take down the malware's C&C servers, as the domains were constantly changing based on the date.

Understanding and being able to detect the Dyre DGA is important for several reasons:

  1. Identifying Dyre Infections: By detecting Dyre DGA domains in network traffic or DNS logs, security teams can identify systems that may be infected with the Dyre trojan or related malware variants.

  2. Threat Intelligence: Analyzing the Dyre DGA can provide valuable insights into the tactics, techniques, and procedures (TTPs) used by the threat actors behind the Dyre trojan. This intelligence can be used to improve defenses against similar threats.

  3. Historical Significance: The Dyre trojan was a significant threat during its active years, causing substantial financial losses to banks and their customers. Analyzing the Dyre DGA is important from a historical perspective to understand the evolution of malware and DGA techniques.

  4. Detecting DGA Variants: The techniques used in the Dyre DGA implementation can be adapted and evolved by other malware authors. By understanding the Dyre DGA, security researchers can better detect and mitigate similar DGA variants used by new or emerging threats.

The implementation details of the simplistic Dyre DGA are explained in the following section.

The code defines a set of characters and top-level domains (TLDs) that will be used to generate the domains. The generate_domains function generates a specified number of random domains by combining random characters from the character set and appending a random TLD.

The dyre_dga function is the core of the Dyre DGA implementation. It takes the year, month, and day as input and generates a seed value based on the provided date using the MD5 hash function. This seed value is then used to seed the random number generator, and a specified number of random domains (between 1000 and 5000) are generated using the generate_domains function.
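The two functions described above can be sketched as follows. This mirrors the description in this README only: the character set, TLD list, label length, date format fed to MD5, and the use of a private `random.Random` instance are assumptions, and the sketch is not a reproduction of the real malware's algorithm.

```python
import hashlib
import random
import string

# Assumed character set and TLDs; the repository's actual values may differ.
CHARSET = string.ascii_lowercase + string.digits
TLDS = [".com", ".net", ".org"]


def generate_domains(rng, count, length=12):
    """Return `count` random labels of `length` characters, each with a random TLD."""
    return [
        "".join(rng.choice(CHARSET) for _ in range(length)) + rng.choice(TLDS)
        for _ in range(count)
    ]


def dyre_dga(year, month, day):
    """Generate the domain list for one date from an MD5-derived seed."""
    seed_material = f"{year:04d}-{month:02d}-{day:02d}".encode()
    seed = int(hashlib.md5(seed_material).hexdigest(), 16)
    rng = random.Random(seed)  # private generator, so global state is untouched
    count = rng.randint(1000, 5000)  # list size also follows from the seed
    return generate_domains(rng, count)


todays_list = dyre_dga(2015, 1, 1)
same_day_again = dyre_dga(2015, 1, 1)
next_day = dyre_dga(2015, 1, 2)
```

Because the seed is derived only from the date, calling `dyre_dga` twice for the same day returns an identical list, while a different day gives a different one.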

The is_dyre_domain function checks if a given domain is a valid Dyre DGA domain for a specific date by generating the list of Dyre DGA domains for that date and checking if the provided domain is present in the list.

The generate_past_dga_domains function generates a list of Dyre DGA domains for a specified number of past days, including the current day.
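The membership check and the look-back over earlier days might look like the sketch below. So that it runs on its own, it includes a compact stand-in for `dyre_dga`; the stand-in's alphabet, TLDs, and label length are assumptions, as is the reading that `days` counts the past days in addition to the current one.

```python
import hashlib
import random
import string
from datetime import date, timedelta


def dyre_dga(year, month, day):
    """Compact stand-in generator: an MD5 of the date seeds the PRNG (assumed details)."""
    digest = hashlib.md5(f"{year:04d}-{month:02d}-{day:02d}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    alphabet = string.ascii_lowercase + string.digits
    return [
        "".join(rng.choices(alphabet, k=12)) + rng.choice([".com", ".net", ".org"])
        for _ in range(rng.randint(1000, 5000))
    ]


def is_dyre_domain(domain, year, month, day):
    """Regenerate the list for the given date and test membership."""
    return domain.lower() in set(dyre_dga(year, month, day))


def generate_past_dga_domains(days, today=None):
    """Map each date, from `today` back through `days` earlier days, to its domains."""
    today = today or date.today()
    result = {}
    for offset in range(days + 1):  # offset 0 is the current day
        current = today - timedelta(days=offset)
        result[current] = dyre_dga(current.year, current.month, current.day)
    return result


reference_day = date(2015, 6, 15)
sample = dyre_dga(2015, 6, 15)[0]
generated_hit = is_dyre_domain(sample, 2015, 6, 15)
benign_hit = is_dyre_domain("example.com", 2015, 6, 15)
history = generate_past_dga_domains(1, today=reference_day)
```

Checking a domain against several recent days rather than one helps when the infected host's clock or time zone differs from the analyst's.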

The main function demonstrates the usage of the Dyre DGA implementation. It generates a list of Dyre DGA domains for the current date, selects a random domain from the list, and checks if it is a valid Dyre DGA domain using the is_dyre_domain function. It also checks a non-Dyre DGA domain (example.com) for comparison. Finally, it generates and prints a list of Dyre DGA domains for the past 1 day, including the current day.

Overall, this code provides a way to generate and detect Dyre DGA domains based on a given date, which can be useful for identifying and mitigating malware infections and attacks that utilize this DGA technique.

Usage and Who Uses DGAs

DGAs are primarily used by malware authors and threat actors to evade detection and maintain a resilient command and control infrastructure. Some common uses of DGAs include:

  • Botnets: DGAs are used by botnets to generate a large number of domain names, making it difficult for security researchers to track and take down the botnet's C&C servers.
  • Malware: Various types of malware, such as ransomware, trojans, and information stealers, employ DGAs to establish communication channels with their C&C servers and exfiltrate data.
  • Advanced Persistent Threats (APTs): APT groups often use DGAs as part of their tactics to maintain a low profile and avoid detection while conducting targeted attacks.

Conclusion

DGAs pose a significant challenge for malware detection and prevention. By understanding the different types of DGAs and their characteristics, security professionals can develop effective strategies to identify and mitigate DGA-based threats.

This DGA analysis provides a comprehensive overview of various DGA types, their explanations, strengths, weaknesses, and deception methods. It also demonstrates the generation of regular expressions and Yara rules for detecting DGA-generated domains.

By leveraging the knowledge gained from this analysis, security teams can enhance their detection capabilities, improve their incident response procedures, and strengthen their overall security posture against DGA-based attacks.

Disclaimer

This repository is intended for educational and research purposes.