Supported File Types

PRX-SD identifies file types using magic number detection (examining the first bytes of a file) rather than relying on file extensions. This ensures accurate identification even when files are renamed or have missing extensions.

File Type Matrix

The following table shows all supported file types and which detection layers apply to each:

| File Type | Extensions | Magic Bytes | Hash | YARA | Heuristics | Archive Recursion |
| --- | --- | --- | --- | --- | --- | --- |
| PE (Windows) | .exe, .dll, .sys, .scr, .ocx | 4D 5A (MZ) | Yes | Yes | Yes | -- |
| ELF (Linux) | .so, .o, (no ext) | 7F 45 4C 46 | Yes | Yes | Yes | -- |
| Mach-O (macOS) | .dylib, .bundle, (no ext) | FE ED FA CE / FE ED FA CF (big-endian) or CE FA ED FE / CF FA ED FE (little-endian) | Yes | Yes | Yes | -- |
| Universal Binary | (no ext) | CA FE BA BE | Yes | Yes | Yes | -- |
| PDF | .pdf | 25 50 44 46 (%PDF) | Yes | Yes | Yes | -- |
| Office (OLE) | .doc, .xls, .ppt | D0 CF 11 E0 | Yes | Yes | Yes | -- |
| Office (OOXML) | .docx, .xlsx, .pptx | 50 4B 03 04 (ZIP) + [Content_Types].xml | Yes | Yes | Yes | Extracted |
| ZIP | .zip | 50 4B 03 04 | Yes | Yes | Limited | Recursive |
| 7-Zip | .7z | 37 7A BC AF 27 1C | Yes | Yes | Limited | Recursive |
| tar | .tar | 75 73 74 61 72 at offset 257 | Yes | Yes | Limited | Recursive |
| gzip | .gz, .tgz | 1F 8B | Yes | Yes | Limited | Recursive |
| bzip2 | .bz2 | 42 5A 68 (BZh) | Yes | Yes | Limited | Recursive |
| xz | .xz | FD 37 7A 58 5A 00 | Yes | Yes | Limited | Recursive |
| RAR | .rar | 52 61 72 21 (Rar!) | Yes | Yes | Limited | Recursive |
| CAB | .cab | 4D 53 43 46 (MSCF) | Yes | Yes | Limited | Recursive |
| ISO | .iso | 43 44 30 30 31 at offset 32769 | Yes | Yes | Limited | Recursive |
| Shell script | .sh, .bash | 23 21 (#!) | Yes | Yes | Pattern | -- |
| Python | .py, .pyc | Text / 42 0D 0D 0A | Yes | Yes | Pattern | -- |
| JavaScript | .js, .mjs | Text detection | Yes | Yes | Pattern | -- |
| PowerShell | .ps1, .psm1 | Text detection | Yes | Yes | Pattern | -- |
| VBScript | .vbs, .vbe | Text detection | Yes | Yes | Pattern | -- |
| Batch | .bat, .cmd | Text detection | Yes | Yes | Pattern | -- |
| Java | .class, .jar | CA FE BA BE / ZIP | Yes | Yes | Limited | .jar recursive |
| WebAssembly | .wasm | 00 61 73 6D | Yes | Yes | Limited | -- |
| DEX (Android) | .dex | 64 65 78 0A (dex\n) | Yes | Yes | Limited | -- |
| APK (Android) | .apk | ZIP + AndroidManifest.xml | Yes | Yes | Limited | Recursive |

Detection Layer Legend

| Layer | Meaning |
| --- | --- |
| Hash | SHA-256/MD5 hash checked against signature database |
| YARA | Full YARA rule set applied to file contents |
| Heuristics: Yes | Full file-type-specific heuristic analysis (see Heuristics) |
| Heuristics: Limited | Basic entropy and structure checks only |
| Heuristics: Pattern | Text-based pattern matching for suspicious commands and obfuscation |
| Archive Recursion | Contents are extracted and each file is scanned individually |
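
As a rough illustration of the Hash layer, the sketch below streams a file through SHA-256 and looks the digest up in a set. The function names and the in-memory set are assumptions made for this example; the format of the actual PRX-SD signature database is not described here.

```python
import hashlib
from typing import BinaryIO, Set


def sha256_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_known_bad(stream: BinaryIO, signatures: Set[str]) -> bool:
    """Hypothetical lookup: `signatures` stands in for the signature database."""
    return sha256_stream(stream) in signatures
```

With a real file, this would be called as `is_known_bad(open(path, "rb"), signatures)`.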

Magic Number Detection

PRX-SD reads the first 8192 bytes of each file to determine its type. This approach is more reliable than extension-based detection:

File: invoice.pdf
Extension suggests: PDF
Magic bytes: 4D 5A → PE executable
PRX-SD identifies: PE (correct)
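
A minimal sketch of this kind of lookup, assuming a small signature list copied from a few rows of the File Type Matrix. The type labels, `SIGNATURES`, and both function names are hypothetical and are not PRX-SD's internal API.

```python
from typing import Optional

# Illustrative subset of signatures: (offset, magic bytes, type label).
SIGNATURES = [
    (0, bytes.fromhex("4D5A"), "pe"),
    (0, bytes.fromhex("7F454C46"), "elf"),
    (0, bytes.fromhex("25504446"), "pdf"),
    (0, bytes.fromhex("504B0304"), "zip"),
    (0, bytes.fromhex("377ABCAF271C"), "7z"),
    (0, bytes.fromhex("1F8B"), "gzip"),
    (257, bytes.fromhex("7573746172"), "tar"),  # "ustar" at offset 257
]

HEADER_SIZE = 8192  # number of leading bytes inspected, as stated above


def detect_type(header: bytes) -> Optional[str]:
    """Return the first type whose magic bytes match, or None if unknown."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def detect_file(path: str) -> Optional[str]:
    """Read only the header of a file and classify it."""
    with open(path, "rb") as fh:
        return detect_type(fh.read(HEADER_SIZE))
```

The file name never enters the decision, which is why a renamed executable is still classified as PE.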

Extension Mismatch

When the file extension does not match the type detected from the magic number, PRX-SD adds a note to the scan report. Extension mismatches are a common social engineering technique (e.g., a PE executable renamed to photo.jpg).
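
One way such a check could look, as a sketch: compare the final extension of the file name against the extensions expected for the detected type. The `EXPECTED_EXTENSIONS` mapping and the function name are assumptions for illustration, populated from a few rows of the File Type Matrix.

```python
import os

# Hypothetical mapping from detected type to the extensions expected for it.
EXPECTED_EXTENSIONS = {
    "pe": {".exe", ".dll", ".sys", ".scr", ".ocx"},
    "pdf": {".pdf"},
    "zip": {".zip"},
}


def extension_mismatch(filename: str, detected_type: str) -> bool:
    """True when the final extension is not one expected for the detected type."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_EXTENSIONS.get(detected_type)
    if expected is None:
        return False  # no expectation recorded for this type
    return ext not in expected
```

Only the last extension is compared here, so a double extension such as `.jpg.exe` on a PE file would need a separate check if it should also be flagged.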

Magic Detection Priority

When multiple signatures could match (e.g., ZIP magic for both .zip and .docx), PRX-SD uses deeper inspection:

  1. Read magic bytes at offset 0
  2. If ambiguous (e.g., ZIP), inspect internal structure
  3. For ZIP-based formats, check for [Content_Types].xml (OOXML), META-INF/MANIFEST.MF (JAR), AndroidManifest.xml (APK)
  4. Fall back to the generic container type
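
The steps above can be sketched for ZIP-based formats with Python's standard `zipfile` module. The function name and return labels are hypothetical. Checking for APK before JAR is an assumption of this sketch (APKs commonly also carry a `META-INF/MANIFEST.MF`), since the order in which PRX-SD tests the marker files is not stated.

```python
import io
import zipfile


def classify_zip(data: bytes) -> str:
    """Refine a ZIP-magic match by looking for well-known member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "ooxml"
    if "AndroidManifest.xml" in names:
        return "apk"
    if "META-INF/MANIFEST.MF" in names:
        return "jar"
    return "zip"  # fall back to the generic container type
```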

Archive Recursive Scanning

When PRX-SD encounters an archive (ZIP, 7z, tar, gzip, RAR, etc.), it extracts the contents to a temporary directory and scans each file individually through the full detection pipeline.

Recursion Depth

| Setting | Default | Description |
| --- | --- | --- |
| max_archive_depth | 5 | Maximum nesting levels for archives within archives |
| max_archive_files | 10,000 | Maximum files to extract from a single archive |
| max_archive_size_mb | 500 | Maximum total extracted size before stopping |

These limits prevent resource exhaustion from zip bombs and deeply nested archives.

```toml
# ~/.config/prx-sd/config.toml
[scanning]
max_archive_depth = 5
max_archive_files = 10000
max_archive_size_mb = 500
```
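
To show how the three limits might interact, here is a simplified sketch that handles ZIP only. It makes several assumptions that differ from or go beyond the description above: members are read into memory rather than extracted to a temporary directory, the outermost archive counts as depth 1, the file and size limits apply per archive, and a breached limit raises an exception. `Limits` and `walk_archive` are hypothetical names.

```python
import io
import zipfile
from dataclasses import dataclass


@dataclass
class Limits:
    max_archive_depth: int = 5
    max_archive_files: int = 10_000
    max_archive_size_mb: int = 500


def walk_archive(data: bytes, limits: Limits, depth: int = 1, path: str = ""):
    """Yield (member_path, content) for each file, descending into nested ZIPs."""
    if depth > limits.max_archive_depth:
        raise ValueError(f"archive nesting exceeds {limits.max_archive_depth} levels")
    budget = limits.max_archive_size_mb * 1024 * 1024
    extracted = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        if len(members) > limits.max_archive_files:
            raise ValueError("archive contains too many files")
        for info in members:
            # file_size is the size declared in the archive's central directory.
            extracted += info.file_size
            if extracted > budget:
                raise ValueError("extracted size limit exceeded")
            content = zf.read(info)
            name = f"{path}/{info.filename}" if path else info.filename
            if zipfile.is_zipfile(io.BytesIO(content)):
                yield from walk_archive(content, limits, depth + 1, name)
            else:
                yield name, content
```

Checking the declared size before reading each member means the budget is enforced before any data is decompressed.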

Zip Bombs

PRX-SD detects zip bombs (archives with extreme compression ratios) and stops extraction before consuming excessive disk space or memory. A zip bomb detection is reported as SUSPICIOUS in the scan results.
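
A compression-ratio check of this kind can be sketched from the sizes recorded in a ZIP's central directory, without decompressing anything. The threshold of 100 is a placeholder chosen for this example, since the ratio PRX-SD treats as extreme is not stated, and the function names are hypothetical.

```python
import io
import zipfile

MAX_RATIO = 100.0  # placeholder threshold, not a documented PRX-SD value


def compression_ratio(data: bytes) -> float:
    """Ratio of declared uncompressed size to compressed size for a ZIP."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    compressed = sum(i.compress_size for i in infos)
    uncompressed = sum(i.file_size for i in infos)
    return uncompressed / compressed if compressed else 0.0


def looks_like_zip_bomb(data: bytes, max_ratio: float = MAX_RATIO) -> bool:
    return compression_ratio(data) > max_ratio
```

Because the check relies on declared sizes, it would normally be paired with a hard cap on bytes actually written during extraction.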

Password-Protected Archives

PRX-SD cannot extract password-protected archives. These are reported as skipped in the scan results with a note about the encryption. The archive file itself is still checked against hash and YARA databases.
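
For ZIP archives, encryption can be recognized without attempting extraction: bit 0 of each member's general purpose flag field marks an encrypted entry. The sketch below uses Python's standard `zipfile` module; the function name is hypothetical, and other archive formats would need their own checks.

```python
import io
import zipfile


def is_encrypted_zip(data: bytes) -> bool:
    """True if any member has the encryption bit (bit 0) set in its flags."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return any(info.flag_bits & 0x1 for info in zf.infolist())
```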

Script Detection

For text-based script files (shell, Python, JavaScript, PowerShell, VBScript, batch), PRX-SD applies pattern-based heuristics:

| Pattern | Points | Description |
| --- | --- | --- |
| Obfuscated strings | 10-20 | Base64-encoded commands, excessive string concatenation |
| Download + execute | 15-25 | curl/wget piped to bash/sh, Invoke-WebRequest + Invoke-Expression |
| Reverse shell | 20-30 | Known reverse shell patterns (/dev/tcp, nc -e, bash -i) |
| Credential access | 10-15 | Reading /etc/shadow, browser credential stores, keychain |
| Persistence mechanisms | 10-15 | Adding cron jobs, systemd services, registry keys |
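
A pattern-scoring pass along these lines could be sketched with regular expressions. Everything specific here is an assumption: the regexes are simplified stand-ins for PRX-SD's actual rules, and each rule awards the lower bound of its range because how a value within the range is chosen is not described.

```python
import re
from typing import List, Tuple

# (label, compiled regex, points) with illustrative patterns only.
RULES = [
    ("download_execute",
     re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(ba)?sh\b"), 15),
    ("reverse_shell",
     re.compile(r"/dev/tcp/|\bnc\b[^\n]*\s-e\b|\bbash\s+-i\b"), 20),
    ("credential_access", re.compile(r"/etc/shadow"), 10),
    ("persistence", re.compile(r"\bcrontab\b|\bsystemctl\s+enable\b"), 10),
]


def score_script(text: str) -> Tuple[int, List[str]]:
    """Return the total points and the labels of every rule that matched."""
    hits = [(label, points) for label, rx, points in RULES if rx.search(text)]
    return sum(points for _, points in hits), [label for label, _ in hits]
```

Each rule contributes at most once per file in this sketch, so repeating the same pattern does not inflate the score.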

Unsupported Files

Files that do not match any known magic number are still checked against hash and YARA databases. Heuristic analysis is not applied to unknown file types. Common examples:

  • Raw binary data
  • Proprietary formats without public magic numbers
  • Encrypted files (unless the container format is recognized)

These files appear as type: unknown in scan reports and receive hash + YARA scanning only.

Released under the Apache-2.0 License.