File Integrity Checker

A Python-based security tool for monitoring file integrity by creating a trusted baseline of file hashes and comparing future scans against that baseline.

This project was built to demonstrate practical Cyber Security, Detection Engineering, and Python scripting skills in a realistic and useful way. The tool detects whether files have been added, deleted, or modified. It reports the results in the terminal and can export them as a CSV report and an HTML report. It also includes a watch mode for continuous monitoring.

The purpose of this project is to simulate a simplified integrity monitoring workflow similar to what is used in real-world security environments to detect suspicious file changes, unauthorized tampering, or unexpected modifications.

Table of Contents
Project Overview
Why This Project Matters
Features
Tech Stack
How It Works
Prerequisites
Quickstart
How to Run the Tool
Usage
Reports
Screenshots
What Was Implemented and Why
Security Relevance
Possible Improvements

Project Overview

This repository contains a Python script called integrity_checker_v2.py that performs file integrity monitoring on a chosen file or directory.

The tool creates a baseline by hashing files with a chosen algorithm such as SHA-256 and saving the results in a JSON file. Later, the tool scans the same directory again and compares the current file hashes with the stored baseline. If a file was changed, removed, or newly created, the tool reports it clearly.

In addition to terminal output, the tool also offers:

a CSV report for structured export
an HTML report for easier visualization
a watch mode for repeated automatic checks

Why This Project Matters

File integrity monitoring is an important concept in Cyber Security because it helps detect:

unauthorized file modifications
malicious tampering
suspicious configuration changes
unexpected new files
accidental deletions

This kind of monitoring is relevant for:

system administration
incident response
hardening critical systems
change tracking
detection engineering
Security Operations Center workflows

Although this project is a simplified standalone Python implementation, it follows the same core principle used by larger integrity monitoring solutions.

Features

Feature	Description
Baseline creation	Stores the original file state in a JSON file
Integrity checking	Detects added, deleted, and modified files
Recursive scanning	Scans all files in a directory and subdirectories
Multiple hash algorithms	Supports `md5`, `sha1`, `sha256`, and `sha512`
Colored CLI output	Improves readability in the terminal
CSV export	Exports scan results into a structured CSV file
HTML export	Creates a readable visual report in HTML format
Include patterns	Allows scanning only matching file types
Exclude patterns	Ignores unwanted files and directories
Default excludes	Automatically ignores `.git`, `__pycache__`, `.DS_Store`, etc.
Watch mode	Repeatedly checks files at a defined interval

Tech Stack

Component	Purpose
Python 3	Main programming language
`hashlib`	Hash generation for integrity checking
`json`	Baseline storage
`csv`	CSV report export
`html`	Safe HTML output
`argparse`	CLI argument parsing
`fnmatch`	Include and exclude pattern matching
`colorama`	Colored terminal output
`pathlib`	File and path handling

How It Works

1. Baseline Creation

The tool scans the selected target path and calculates a hash for each file. These hashes are then saved in a baseline JSON file.

This baseline represents the trusted initial state of the monitored files.
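The baseline step described above can be sketched roughly as follows. This is a minimal illustration, not the code from integrity_checker_v2.py: the function names hash_file, build_baseline, and save_baseline, and the JSON layout with "algorithm" and "files" keys, are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading in chunks to keep memory use flat."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(target: Path, algorithm: str = "sha256") -> dict:
    """Map every file below target (or target itself) to its hash.

    The returned layout is a hypothetical one chosen for this sketch.
    """
    if target.is_file():
        files = {target.name: hash_file(target, algorithm)}
    else:
        files = {
            path.relative_to(target).as_posix(): hash_file(path, algorithm)
            for path in sorted(target.rglob("*"))
            if path.is_file()
        }
    return {"algorithm": algorithm, "files": files}


def save_baseline(baseline: dict, output: Path) -> None:
    """Write the baseline as human-readable JSON."""
    output.write_text(json.dumps(baseline, indent=2, sort_keys=True), encoding="utf-8")
```

Storing paths relative to the target keeps the baseline usable if the monitored directory is moved, and recording the algorithm name lets a later check hash with the same algorithm.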

2. Integrity Check

When the check command is executed, the tool scans the files again and compares the current hashes with the baseline.

It classifies files into four categories:

Added
Deleted
Modified
Unchanged
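The classification into these four categories comes down to set operations on two path-to-hash mappings. The sketch below assumes both the baseline and the current scan are plain dictionaries of relative path to hex digest. The function name compare_hashes is hypothetical and the real script may structure this differently.

```python
def compare_hashes(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths by comparing a current scan against a baseline."""
    baseline_paths = set(baseline)
    current_paths = set(current)
    common = baseline_paths & current_paths
    return {
        # Present now, but not when the baseline was taken.
        "added": sorted(current_paths - baseline_paths),
        # Present in the baseline, but missing now.
        "deleted": sorted(baseline_paths - current_paths),
        # Present in both, with different hashes.
        "modified": sorted(p for p in common if baseline[p] != current[p]),
        # Present in both, with identical hashes.
        "unchanged": sorted(p for p in common if baseline[p] == current[p]),
    }
```

Every path ends up in exactly one category, so the four lists together always cover the union of both scans.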

3. Reporting

After a comparison, the tool can display the results in the terminal and optionally export the results into:

a CSV file
an HTML report
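One way such exports could look, using only the csv and html modules from the tech stack: the column names, the HTML layout, and the function names write_csv_report and write_html_report are assumptions for illustration. The input is assumed to be a dictionary that maps each status to a list of paths.

```python
import csv
import html
from pathlib import Path

STATUSES = ("added", "deleted", "modified", "unchanged")


def iter_rows(result: dict[str, list[str]]):
    """Yield (status, path) pairs in a stable order."""
    for status in STATUSES:
        for path in result.get(status, []):
            yield status, path


def write_csv_report(result: dict[str, list[str]], output: Path) -> None:
    """Write one row per file; newline="" lets the csv module control line endings."""
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["status", "path"])
        writer.writerows(iter_rows(result))


def write_html_report(result: dict[str, list[str]], output: Path) -> None:
    """Write a small HTML table; file names are escaped so they cannot inject markup."""
    body = "\n".join(
        f"<tr><td>{html.escape(status)}</td><td>{html.escape(path)}</td></tr>"
        for status, path in iter_rows(result)
    )
    document = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Integrity Report</title></head><body>\n"
        "<table>\n<tr><th>Status</th><th>Path</th></tr>\n"
        f"{body}\n</table>\n</body></html>\n"
    )
    output.write_text(document, encoding="utf-8")
```

Escaping matters here because file names come from the scanned directory and are not trusted input.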

4. Watch Mode

Watch mode repeats the integrity check at a fixed interval, given in seconds with --interval. This simulates a simple continuous monitoring workflow and lets file changes be detected while the tool is running.
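A polling loop of this kind might be sketched as below. The function watch, its parameters, and the choice to report only when the result differs from the previous cycle are assumptions for this example, not a description of the actual script. The check itself is passed in as a callable so the loop stays independent of how hashing is done.

```python
import time
from typing import Callable, Optional


def watch(
    check_once: Callable[[], dict],
    interval: float,
    on_change: Callable[[dict], None],
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run check_once repeatedly and call on_change whenever the result changes.

    max_cycles and sleep exist only to make the loop testable; with the
    defaults it runs until interrupted (for example with Ctrl+C).
    """
    previous = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        result = check_once()
        if result != previous:
            # The first cycle always reports, which gives an initial status line.
            on_change(result)
            previous = result
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(interval)
```

Polling is simple and portable, at the cost of rehashing every file on each cycle. Event-based file watching would avoid that, but needs platform-specific support.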

Prerequisites

Before using the tool, make sure the following software is available:

Python 3.10 or newer
pip
a terminal environment such as macOS Terminal, Linux shell, or VS Code integrated terminal

Optional but recommended:

a virtual environment
colorama for colored CLI output

Quickstart

Clone the repository:

git clone <your-repository-url>

Move into the project directory:

cd file-integrity-checker

Create a virtual environment:

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

Install the dependency:

pip install colorama

Create a baseline:

python3 integrity_checker_v2.py baseline /path/to/target --output baseline.json

Run a check:

python3 integrity_checker_v2.py check /path/to/target --baseline baseline.json

How to Run the Tool

1. Create a Virtual Environment

python3 -m venv venv

2. Install Dependencies

Activate the environment:

source venv/bin/activate

Install colorama:

pip install colorama

3. Create a Test Folder

mkdir test-integrity

cd test-integrity

Create test files:

echo "hello world" > file1.txt

echo "secure config" > config.txt

Move back to the project directory, because the following commands reference ./test-integrity from there:

cd ..

4. Create a Baseline

python3 integrity_checker_v2.py baseline ./test-integrity --output baseline.json

5. Run an Integrity Check

python3 integrity_checker_v2.py check ./test-integrity --baseline baseline.json

At this point, no changes should be detected.

6. Simulate File Changes

Modify a file:

echo "hacked" >> ./test-integrity/file1.txt

Create a new file:

echo "new file" > ./test-integrity/newfile.txt

Delete a file:

rm ./test-integrity/config.txt

7. Run the Check Again

python3 integrity_checker_v2.py check ./test-integrity --baseline baseline.json

Now the tool should report:

one modified file
one added file
one deleted file

8. Generate Reports

Generate an HTML report:

python3 integrity_checker_v2.py check ./test-integrity --baseline baseline.json --html-report report.html

Generate a CSV report:

python3 integrity_checker_v2.py check ./test-integrity --baseline baseline.json --csv-report report.csv

Generate both:

python3 integrity_checker_v2.py check ./test-integrity --baseline baseline.json --html-report report.html --csv-report report.csv

9. Run Watch Mode

python3 integrity_checker_v2.py watch ./test-integrity --baseline baseline.json --interval 5

This command checks the target every 5 seconds and prints an alert if the file state changes.

Usage

Create a Baseline

python3 integrity_checker_v2.py baseline /path/to/target --output baseline.json

Check Files Against the Baseline

python3 integrity_checker_v2.py check /path/to/target --baseline baseline.json

Generate an HTML Report

python3 integrity_checker_v2.py check /path/to/target --baseline baseline.json --html-report report.html

Generate a CSV Report

python3 integrity_checker_v2.py check /path/to/target --baseline baseline.json --csv-report report.csv

Use Include Patterns

Scan only Python files:

python3 integrity_checker_v2.py baseline /path/to/target --output baseline.json --include "*.py"

Use Exclude Patterns

Ignore log files:

python3 integrity_checker_v2.py baseline /path/to/target --output baseline.json --exclude "*.log"

Ignore a full directory pattern:

python3 integrity_checker_v2.py baseline /path/to/target --output baseline.json --exclude "logs/*"
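To make the include and exclude behaviour concrete, here is a minimal sketch of pattern filtering with fnmatch. The matching rules are an assumption for illustration: a pattern is tested against the whole relative path and against each path component, and excludes win over includes. The actual script may match differently. The names should_scan and matches_any are hypothetical, and DEFAULT_EXCLUDES lists only the three entries named in the feature table.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the defaults named in this README; the real list may be longer.
DEFAULT_EXCLUDES = (".git", "__pycache__", ".DS_Store")


def matches_any(rel_path: str, patterns) -> bool:
    """True if any pattern matches the full relative path or one of its components."""
    parts = PurePosixPath(rel_path).parts
    return any(
        fnmatchcase(rel_path, pattern) or any(fnmatchcase(part, pattern) for part in parts)
        for pattern in patterns
    )


def should_scan(rel_path: str, include=(), exclude=(), default_excludes=DEFAULT_EXCLUDES) -> bool:
    """Apply excludes first, then includes; an empty include list means everything."""
    if matches_any(rel_path, (*default_excludes, *exclude)):
        return False
    return not include or matches_any(rel_path, include)
```

Note that in fnmatch the * wildcard also matches path separators, so a pattern like "logs/*" covers nested files such as logs/2024/app.log.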

Run Continuous Monitoring

python3 integrity_checker_v2.py watch /path/to/target --baseline baseline.json --interval 10
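The commands in this section imply a CLI with three subcommands. A rough argparse sketch of that interface is shown below, using only the flags that appear in this README. Which flags are required, whether --include and --exclude can be repeated, and the default interval of 5 seconds are all assumptions for this example. The function build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the baseline, check, and watch commands."""
    parser = argparse.ArgumentParser(description="File integrity checker (sketch)")
    commands = parser.add_subparsers(dest="command", required=True)

    baseline = commands.add_parser("baseline", help="create a baseline")
    baseline.add_argument("target")
    baseline.add_argument("--output", required=True)  # assumed to be required
    baseline.add_argument("--include", action="append", default=[])  # assumed repeatable
    baseline.add_argument("--exclude", action="append", default=[])  # assumed repeatable

    check = commands.add_parser("check", help="compare files against a baseline")
    check.add_argument("target")
    check.add_argument("--baseline", required=True)
    check.add_argument("--html-report")
    check.add_argument("--csv-report")

    watch = commands.add_parser("watch", help="check repeatedly")
    watch.add_argument("target")
    watch.add_argument("--baseline", required=True)
    watch.add_argument("--interval", type=float, default=5.0)  # default is an assumption

    return parser
```

argparse converts dashes in option names to underscores, so --html-report is read as args.html_report.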

Reports

The tool can produce the following outputs:

Report Type	Purpose
Terminal output	Quick analysis in the console
JSON baseline	Trusted reference of original hashes
CSV report	Structured export for analysis or documentation
HTML report	Human-readable visual report

The HTML report is especially useful for screenshots, portfolio documentation, and demonstrating project results in a clean format.

Screenshots

The following screenshots can be added to document the tool and demonstrate its functionality:

1. Baseline Creation

Shows the successful creation of the initial baseline JSON file.

[Screenshot: Baseline Creation]

2. Integrity Check – No Changes Detected

Shows a clean check where no file changes were detected.

[Screenshot: Integrity Check No Changes]

3. Integrity Check – Changes Detected

Shows a scan where files were added, deleted, or modified.

[Screenshot: Integrity Check With Changes]

4. HTML Report

Shows the generated visual HTML report.

[Screenshot: HTML Report]

5. Watch Mode

Shows the watch mode first in a normal state and then after a change was detected.

[Screenshot: Watch Mode]

What Was Implemented and Why

Baseline JSON storage

Implemented so the tool has a trusted reference state to compare against later.

Hash-based comparison

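Implemented because a cryptographic hash changes completely when even a single byte of a file changes, which makes comparing hashes a reliable way to detect modifications without storing the file contents.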
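A short illustration of this property, using the same content as the test files in the walkthrough. The variable names are chosen for this example only.

```python
import hashlib

# Content of file1.txt before and after the simulated tampering step.
original = b"hello world\n"
tampered = b"hello world\nhacked\n"

original_digest = hashlib.sha256(original).hexdigest()
tampered_digest = hashlib.sha256(tampered).hexdigest()

# Hashing the same bytes again always gives the same digest ...
same_again = hashlib.sha256(original).hexdigest() == original_digest

# ... while any change to the content gives a different one.
changed = original_digest != tampered_digest

print(same_again, changed)  # True True
```

The comparison only needs the stored digest, so the baseline stays small no matter how large the monitored files are.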

Multiple algorithms

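Implemented to make the tool more flexible and educational. SHA-256 and SHA-512 are the sensible choices for real integrity checks. MD5 and SHA-1 are no longer collision-resistant and are best used only for comparison or for compatibility with existing checksums.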
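Selecting the algorithm by name is straightforward with hashlib.new. The sketch below restricts the choice to the four algorithms listed in the feature table. The names SUPPORTED and digest_for are hypothetical and do not come from the script.

```python
import hashlib

SUPPORTED = ("md5", "sha1", "sha256", "sha512")


def digest_for(data: bytes, algorithm: str = "sha256") -> str:
    """Hash data with one of the supported algorithms, rejecting anything else."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest()


# Hex digest lengths differ per algorithm: 32, 40, 64, and 128 characters.
lengths = {name: len(digest_for(b"example", name)) for name in SUPPORTED}
```

A baseline created with one algorithm has to be checked with the same one, since digests from different algorithms are never comparable.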

Include and exclude patterns

Implemented to give the user control over what should or should not be scanned.

Default excludes

Implemented to avoid noise from automatically generated files and directories.

CSV report export

Implemented to support structured exports and easier external review.

HTML report export

Implemented to make the results easier to read and present visually.

Watch mode

Implemented to simulate lightweight continuous security monitoring.

Colored terminal output

Implemented to improve readability and make results easier to interpret quickly.
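As a rough idea of how colored status lines can be produced with colorama, consider the sketch below. The color choices and the function format_line are assumptions for illustration. Because colorama is listed as optional, the sketch falls back to plain text when it is not installed.

```python
try:
    from colorama import Fore, Style, init

    init()  # makes ANSI color codes work on Windows terminals as well
    COLORS = {"added": Fore.GREEN, "deleted": Fore.RED, "modified": Fore.YELLOW}
    RESET = Style.RESET_ALL
except ImportError:
    # Without colorama the output is simply uncolored.
    COLORS = {}
    RESET = ""


def format_line(status: str, path: str) -> str:
    """Return one report line, colored by status when colors are available."""
    color = COLORS.get(status, "")
    return f"{color}{status.upper():<10}{RESET} {path}"
```

For example, print(format_line("modified", "file1.txt")) prints the status in yellow followed by the path.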

Security Relevance

This project demonstrates important security concepts such as:

file integrity monitoring
change detection
hash comparison
incident response thinking
detection engineering
security tooling with Python

Possible real-world use cases include:

checking whether configuration files were modified
monitoring sensitive folders for unauthorized changes
verifying deployment artifacts
demonstrating tampering detection concepts in a lab environment

Possible Improvements

Possible future improvements for this project include:

email notifications when a change is detected
file metadata comparison beyond hashes
scheduled scans
log file output
GUI or web dashboard
alert severity levels
integration with a SIEM-style dashboard
packaging the tool as an installable CLI utility
support for checksum verification against signed files
Linux daemon or background service mode
