Explainable Smart Contract Vulnerability Detection
What this project does: This is an Explainable AI (XAI) framework that analyzes Solidity smart contracts to detect security vulnerabilities and provides human-readable explanations for why vulnerabilities were detected.
How It Works
The system uses a two-stage approach for vulnerability detection and explanation:
1. Vulnerability Classification
A trained machine learning classifier (TF-IDF + Logistic Regression) analyzes the contract source code and predicts which vulnerability types are present, with confidence scores for each label (e.g., reentrancy, integer-overflow, unchecked-external-call, etc.).
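To make the classification stage concrete, here is a minimal, self-contained sketch with toy data (the two labels and training strings are illustrative only; the project's actual training code appears in the backend/app/classifier.py snippets below):

# Minimal sketch of the classification stage: TF-IDF features + Logistic
# Regression probabilities. Toy data only; see the classifier.py snippets
# later in this document for the real pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_sources = ["call.value msg.sender withdraw", "block.timestamp now lottery"]
train_labels = ["reentrancy", "timestamp-dependence"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=2000)),
])
pipe.fit(train_sources, train_labels)

# predict_proba yields one confidence score per known label
probs = dict(zip(pipe.classes_, pipe.predict_proba(["msg.sender.call{value: x}"])[0]))
print(probs)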
2. Explainability Layer
The system provides three types of explanations to help developers understand why a vulnerability was detected:
- SHAP explanations: Token-level importance scores showing which code tokens (words, symbols) most contributed to the prediction.
- LIME explanations: Alternative token importance scores using a different explainability algorithm for comparison.
- Natural language rationale: A human-readable explanation generated either:
- Template-based (instant): Automatically generated from SHAP/LIME top tokens—provides immediate feedback.
- Llama2-generated (optional): Uses a local Llama2-7b model to generate more nuanced explanations (slower, requires model files).
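How the two rationale paths interact is shown in full in the backend/app/main.py snippet later in this document; the runnable sketch below (with stubbed function names) captures the essential behavior: the template rationale is produced first, and a Llama2 result replaces it only when the toggle is on and generation succeeds before the timeout.

# Sketch of the rationale fallback semantics (stubbed names, not the
# project's actual functions; the real flow is in backend/app/main.py).
def template_rationale(label: str, confidence: float) -> str:
    return f"Classified as {label} with {int(confidence * 100)}% confidence."

def llama_rationale(prompt: str) -> str:
    raise TimeoutError("generation exceeded timeout")  # simulate a slow CPU run

use_llama_rationale = True
rationale = template_rationale("reentrancy", 0.87)  # instant fallback
rationale_source, rationale_error = "template", None
if use_llama_rationale:
    try:
        rationale = llama_rationale("Vulnerability: reentrancy\nExplain why:")
        rationale_source = "llama2"
    except TimeoutError as e:
        rationale_error = str(e)  # keep the template rationale
print(rationale_source, "->", rationale)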
3. User Workflow
- User inputs a Solidity contract via the React frontend (paste text or upload a .sol file).
- Frontend sends an analysis request to the FastAPI backend with optional toggles for SHAP/LIME explanations and Llama2 rationale.
- Backend processes the contract:
- Runs classifier inference to get vulnerability predictions
- If explain=true, generates SHAP and LIME token importance scores
- Generates the rationale (template or Llama2, depending on the toggles)
- Saves the complete analysis as a JSON run in runs/
- Frontend displays results: Shows top predictions with confidence bars, explanation token tables, and rationale text. Users can export results as PDF or JSON.
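The same workflow can be driven without the UI. Below is a minimal client sketch, assuming the backend is running on localhost:8000 and the requests package is installed; the form field names match the /analyze endpoint shown later in backend/app/main.py.

# Minimal client sketch for the workflow above.
import requests

contract = open("examples/sample_contract.sol", encoding="utf-8").read()
resp = requests.post(
    "http://localhost:8000/analyze",
    data={
        "source": contract,            # or send files={"file": ...} to upload
        "explain": "true",             # SHAP + LIME token importances
        "use_llama_rationale": "false",
        "max_chars": "12000",
    },
)
run = resp.json()
print(run["prediction"]["top"])        # top-3 (label, probability) pairs
print(run.get("rationale"))            # template or Llama2 text, if generated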
Key Technologies
- React (Frontend Framework): A JavaScript library for building user interfaces. React allows us to create interactive, component-based UIs where the frontend state automatically updates when data changes. The frontend uses React to:
- Render the contract input form (paste/upload)
- Display analysis results with interactive charts and tables
- Manage routing between different pages (Analysis, Model, Vocabulary, Runs)
- Handle API calls to the backend and display responses
Why React? React provides efficient rendering, reusable components, and a large ecosystem of tools. It's ideal for building modern web applications with complex UI interactions.
- Python + FastAPI (Backend Framework): Python is the programming language used for the backend, and FastAPI is a modern web framework for building APIs. The backend uses:
- FastAPI: Provides automatic API documentation (Swagger/OpenAPI), type validation, and async support. Handles HTTP requests and responses.
- scikit-learn: Machine learning library for TF-IDF vectorization and Logistic Regression classification.
- SHAP & LIME: Explainability libraries that provide token-level feature importance for model predictions.
- transformers: Hugging Face library for loading and running local Llama2 model.
Why Python? Python has excellent ML/AI libraries (scikit-learn, transformers), great data processing capabilities, and is widely used in research and production ML systems.
- Llama2 (Large Language Model): Llama2 is an open-weight large language model (LLM) released by Meta. In this project, we use Llama2-7b-hf (7 billion parameters, Hugging Face format) to generate natural language explanations for vulnerability predictions.
- What is Llama2? It's a transformer-based neural network trained on vast amounts of text data. It can understand context and generate human-readable text.
- Why use Llama2 here? It converts technical SHAP/LIME token scores into plain-language explanations that developers can easily understand (e.g., "This contract has a reentrancy vulnerability because...").
- Local deployment: The model runs entirely on your machine (CPU or GPU); no internet connection is required. Model files are stored in the Llama2-7b-hf/ directory.
- Performance: CPU inference is slow (30-60 seconds), so we've added timeouts and optimizations. The system also provides an instant template-based rationale as a faster alternative.
- ML Pipeline: TF-IDF vectorization + Logistic Regression (fast on CPU, interpretable). TF-IDF converts code text into numerical features, and Logistic Regression predicts vulnerability probabilities.
- XAI (Explainable AI): SHAP and LIME for token-level feature importance showing which code tokens contributed most to predictions.
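For orientation, the explanation payloads returned by the backend have the shape below, as produced by the explain.py wrappers shown later in this document (SHAP entries carry "value", LIME entries carry "weight"); the numeric values here are illustrative only.

# Shape of the explanation payloads from backend/app/explain.py
# (sample values are illustrative only).
shap_payload = {
    "label": "reentrancy",
    "weights": [
        {"token": "call", "value": 0.42},
        {"token": "msg", "value": 0.17},
    ],
}
lime_payload = {
    "label": "reentrancy",
    "weights": [
        {"token": "call", "weight": 0.38},
        {"token": "balances", "weight": -0.05},
    ],
}
for entry in shap_payload["weights"]:
    print(f'{entry["token"]}: {entry["value"]:+.2f}')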
How to Create a React Project (for Reference)
This project's frontend was created using create-react-app. Here's how to set up a similar React project from scratch:
- Install Node.js: Download and install Node.js (v14+) from nodejs.org.
- Create React app:
npx create-react-app frontend-name
cd frontend-name
npm start
- Project structure: The generated project includes:
- src/ - Source code (components, pages, services)
- public/ - Static files (index.html, assets)
- package.json - Dependencies and scripts
- node_modules/ - Installed packages (auto-generated)
- Key React concepts used:
- Components: Reusable UI pieces (e.g., Sidebar.js, AnalysisPage.js)
- State: React's useState hook manages component data (e.g., user input, API results)
- Effects: React's useEffect hook handles side effects (e.g., API calls, polling)
- Props: Data passed from parent to child components
- Development server: Running npm start starts a development server at http://localhost:3000 with hot-reload (changes appear immediately).
- Production build: Run npm run build to create an optimized production build in the build/ folder.
Note: This project's frontend is already set up. You only need to run npm install and npm start in the frontend/ directory to use it.
Data Flow
Raw CSV datasets are converted to JSONL format, split into train/val/test sets, and used to train the classifier. The trained model artifacts are saved and loaded by the API server on startup. Each analysis run is persisted as JSON for later review and export.
See the Test Data & Sample Contracts, Project Workflow, Dataset & Training, Llama2 Connection, and System Architecture sections below for detailed diagrams and explanations.
Setup & Run
Backend
cd "RP 44"
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
cd backend
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Frontend
cd "RP 44/frontend"
npm install
npm start
Training
source .venv/bin/activate
python backend/scripts/convert_sc_csv.py --in_csv "data/SC_Vuln_8label.csv" --dataset_name sc_vuln_8label --out_dir data/processed
python backend/scripts/convert_sc_csv.py --in_csv "data/SC_4label.csv" --dataset_name sc_4label --out_dir data/processed
python backend/scripts/train_classifier.py \
--train_jsonl data/processed/sc_vuln_8label.train.jsonl \
--val_jsonl data/processed/sc_vuln_8label.val.jsonl \
--test_jsonl data/processed/sc_vuln_8label.test.jsonl \
--artifacts_dir backend/artifacts
Outputs: backend/artifacts/tfidf_lr.joblib, training_report.json, vocabulary.json
Test Data & Sample Contracts
The project includes sample/test data you can use to quickly test the vulnerability detection system without needing your own Solidity contracts.
Sample Contract Provided
A sample Solidity contract is included at examples/sample_contract.sol. This contract contains intentionally vulnerable code patterns to demonstrate how the system detects different vulnerability types.
Vulnerabilities in the Sample Contract:
- Reentrancy vulnerability: The withdraw() function sends Ether via a low-level msg.sender.call, the pattern classically associated with reentrancy attacks (in this sample the balance is zeroed before the call, so the pattern is demonstrated without being exploitable).
- Integer overflow: The add() function performs raw addition; Solidity 0.8+ reverts on overflow by default, so this demonstrates the pattern rather than an exploitable bug.
- Missing access control: The mint() function in TokenContract allows anyone to mint tokens without authorization checks.
- Front-running vulnerability: The transfer() function pattern can be exploited via front-running attacks.
How to Use the Sample Contract:
- Open the Analysis page in the React frontend (http://localhost:3000).
- Choose "Paste" mode (or upload the file).
- Copy the contents of examples/sample_contract.sol and paste into the text area, OR click "Choose File" and select examples/sample_contract.sol.
- Enable explanations: Toggle "SHAP & LIME Explanations" to see token-level importance scores.
- Optional: Enable "Llama2 Rationale" for natural language explanations (slower).
- Click "Analyze" to see vulnerability predictions, confidence scores, and explanations.
Sample Contract Code:
// Sample Solidity Contract for Testing Vulnerability Detection
// This contract demonstrates common vulnerabilities
pragma solidity ^0.8.0;
contract SimpleStorage {
uint256 private storedData;
address public owner;
mapping(address => uint256) public balances;
constructor() {
owner = msg.sender;
storedData = 0;
}
// Potential reentrancy vulnerability example
function withdraw() public {
uint256 amount = balances[msg.sender];
require(amount > 0, "No balance");
balances[msg.sender] = 0;
(bool success, ) = msg.sender.call{value: amount}("");
require(success, "Transfer failed");
}
// Potential integer overflow pattern (Solidity 0.8+ reverts on overflow by default)
function add(uint256 a, uint256 b) public pure returns (uint256) {
return a + b;
}
// Setter function
function set(uint256 x) public {
storedData = x;
}
// Getter function
function get() public view returns (uint256) {
return storedData;
}
}
// Example with potential access control issues
contract TokenContract {
mapping(address => uint256) balances;
address public owner;
constructor() {
owner = msg.sender;
}
// Missing access control - anyone can mint
function mint(address to, uint256 amount) public {
balances[to] += amount;
}
// Potential front-running vulnerability
function transfer(address to, uint256 amount) public {
require(balances[msg.sender] >= amount, "Insufficient balance");
balances[msg.sender] -= amount;
balances[to] += amount;
}
}
Creating Your Own Test Contracts
You can create additional test contracts by:
- Copying real contracts from Etherscan: Visit Etherscan, find verified contracts, and copy their source code.
- Using OpenZeppelin examples: The project includes OpenZeppelin contracts in openzeppelin-contracts-master/ that you can test.
- Writing minimal test cases: Create small Solidity files focusing on specific vulnerability patterns.
Analysis Run Examples
The runs/ directory contains saved analysis results (JSON files) from previous runs. You can:
- View run history: Go to the "Runs" page in the frontend to see all saved analyses.
- Open specific runs: Click on any run to view its full results, explanations, and rationale.
- Export runs: Download runs as PDF or JSON for offline review.
- Use as test cases: The JSON structure in runs can serve as examples of the API response format.
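If you want to consume runs programmatically, here is a small sketch; it assumes at least one run file exists in runs/ and that the saved JSON mirrors the payload assembled in the /analyze endpoint shown later (input, prediction, explanations, metrics, plus optional rationale fields).

# Sketch: load one saved run from runs/ and print its headline fields.
import json
from pathlib import Path

run_file = next(Path("runs").glob("*.json"))  # assumes at least one run exists
run = json.loads(run_file.read_text(encoding="utf-8"))

print("source length:", run["input"]["source_len"])
for label, prob in run["prediction"]["top"]:
    print(f"{label}: {prob:.2%}")
print("runtime:", run["metrics"]["runtime_sec"], "sec")
print("rationale source:", run.get("rationale_source", "n/a"))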
Note: The training datasets (data/SC_Vuln_8label.csv, data/SC_4label.csv) are real datasets used to train the classifier, not dummy data.
Project Workflow
Goal: analyze a Solidity contract, predict vulnerability labels, generate explainability (SHAP/LIME), optionally generate a natural-language rationale (template or Llama2), and save the run for export/review.
Analysis workflow diagram
Download draw.io file: docs/diagrams/analysis_workflow.drawio
Dataset & Training
Input formats: raw CSVs are converted to a unified JSONL schema (id, source, labels, optional meta/spans). The classifier is trained from JSONL splits.
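To make the schema concrete, here is a minimal sketch that writes and reads one record (field values, including the meta contents, are illustrative; it assumes data/processed/ exists, as created by the convert script):

# Sketch of one record in the unified JSONL schema: one JSON object per
# line with id, source, labels, and optional meta/spans.
import json

record = {
    "id": "example-0001",
    "source": "contract C { function withdraw() public { /* ... */ } }",
    "labels": ["reentrancy"],
    "meta": {"dataset": "sc_vuln_8label"},
}
with open("data/processed/example.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")

# Reading mirrors what train_multiclass consumes: text plus the first label.
with open("data/processed/example.jsonl", encoding="utf-8") as f:
    for line in f:
        ex = json.loads(line)
        print(ex["labels"][0] if ex["labels"] else "unknown", len(ex["source"]))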
Dataset pipeline diagram
Download draw.io file: docs/diagrams/dataset_pipeline.drawio
Training outputs
- Model artifact: backend/artifacts/tfidf_lr.joblib
- Training report: backend/artifacts/training_report.json (accuracy + classification report)
- Vocabulary map: backend/artifacts/vocabulary.json (word → index)
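A quick way to inspect these outputs is to read the training report directly; the keys below come from the train_multiclass() snippet later in this document (val/test entries exist only when those splits were supplied):

# Sketch: inspect the training report produced by train_classifier.py.
import json
from pathlib import Path

report = json.loads(Path("backend/artifacts/training_report.json").read_text(encoding="utf-8"))
print("train samples:", report["train_samples"])
print("labels:", report["labels"])
print("val accuracy:", report.get("val_accuracy"))
print("test accuracy:", report.get("test_accuracy"))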
How Llama2 is Connected
The backend loads the local model from Llama2-7b-hf/ using transformers with local_files_only=True. The model is optional and used only for rationale generation when enabled.
- Path wiring: backend/app/settings.py resolves llama_model_path to RP 44/Llama2-7b-hf.
- Loader: backend/app/llm.py validates shards and loads the tokenizer + model.
- Inference guardrails (CPU): small input length, small max tokens, and a hard timeout so requests cannot hang indefinitely.
- Fallback: when SHAP/LIME is enabled, the system can generate an instant template rationale without Llama2.
System Architecture
The system has a React UI, a FastAPI backend, a fast classical classifier, SHAP/LIME explainability, and an optional local Llama2 rationale generator. All results are persisted as runs for export/review.
Architecture diagram
Download draw.io file: docs/diagrams/system_architecture.drawio
Key Snippets: backend/app/classifier.py
TF-IDF + LogisticRegression: training, vocabulary map generation, reports, and model loading helpers.
Training pipeline + progress bars + report payload
def train_multiclass(
train_jsonl: str,
artifacts_path: str,
max_features: int = 50000,
val_jsonl: Optional[str] = None,
test_jsonl: Optional[str] = None,
) -> Tuple[Pipeline, List[str], Dict[str, Any]]:
print("Loading training data...")
X_train: List[str] = []
y_train: List[str] = []
examples = list(read_jsonl(train_jsonl))
for ex in tqdm(examples, desc="Loading train"):
X_train.append(ex.source)
y_train.append(ex.labels[0] if ex.labels else "unknown")
print(f"Training samples: {len(X_train)}")
labels = sorted(set(y_train))
print(f"Labels: {labels}")
print("\nBuilding vocabulary...")
vocab_map = build_vocabulary(X_train)
print(f"Vocabulary size: {len(vocab_map)}")
print("\nTraining classifier...")
pipe: Pipeline = Pipeline(
steps=[
("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=max_features, min_df=2)),
("clf", LogisticRegression(max_iter=2000, n_jobs=1, verbose=1)),
]
)
pipe.fit(X_train, y_train)
report: Dict[str, Any] = {
"train_samples": len(X_train),
"vocabulary_size": len(vocab_map),
"labels": labels,
"label_distribution": dict(Counter(y_train)),
}
if val_jsonl:
print("\nEvaluating on validation set...")
X_val: List[str] = []
y_val: List[str] = []
for ex in tqdm(read_jsonl(val_jsonl), desc="Loading val"):
X_val.append(ex.source)
y_val.append(ex.labels[0] if ex.labels else "unknown")
y_val_pred = pipe.predict(X_val)
val_acc = accuracy_score(y_val, y_val_pred)
report["val_accuracy"] = float(val_acc)
report["val_samples"] = len(X_val)
report["val_classification_report"] = classification_report(
y_val, y_val_pred, output_dict=True, zero_division=0
)
print(f"Validation accuracy: {val_acc:.4f}")
if test_jsonl:
print("\nEvaluating on test set...")
X_test: List[str] = []
y_test: List[str] = []
for ex in tqdm(read_jsonl(test_jsonl), desc="Loading test"):
X_test.append(ex.source)
y_test.append(ex.labels[0] if ex.labels else "unknown")
y_test_pred = pipe.predict(X_test)
test_acc = accuracy_score(y_test, y_test_pred)
report["test_accuracy"] = float(test_acc)
report["test_samples"] = len(X_test)
report["test_classification_report"] = classification_report(
y_test, y_test_pred, output_dict=True, zero_division=0
)
print(f"Test accuracy: {test_acc:.4f}")
p = Path(artifacts_path)
p.parent.mkdir(parents=True, exist_ok=True)
joblib.dump({"pipeline": pipe, "labels": labels, "vocab_map": vocab_map}, p)
return pipe, labels, report
Artifacts loading and top-k helper
def load_classifier(artifacts_path: str) -> Tuple[Pipeline, List[str], Optional[Dict[str, int]]]:
obj = joblib.load(artifacts_path)
vocab_map = obj.get("vocab_map")
return obj["pipeline"], obj["labels"], vocab_map
def predict_proba(pipe: Pipeline, text: str) -> Dict[str, float]:
probs = pipe.predict_proba([text])[0]
classes = list(pipe.classes_)
return {cls: float(p) for cls, p in zip(classes, probs)}
def topk(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
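A usage sketch for these helpers, assuming you run from the backend/ directory with a trained artifact present:

# Load the trained artifact bundle and rank the predicted labels.
from app.classifier import load_classifier, predict_proba, topk

pipe, labels, vocab_map = load_classifier("artifacts/tfidf_lr.joblib")
probs = predict_proba(pipe, 'function withdraw() { msg.sender.call{value: amount}(""); }')
for label, p in topk(probs, k=3):
    print(f"{label}: {p:.3f}")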
Key Snippets: backend/app/explain.py
SHAP + LIME integration: wraps the classifier into explainers and returns token importance weights.
LIME explanation wrapper
def lime_explain(pipe: Pipeline, text: str, top_label: str, num_features: int = 20) -> Dict[str, Any]:
classes = list(pipe.classes_)
explainer = lime.lime_text.LimeTextExplainer(class_names=classes)
exp = explainer.explain_instance(text, pipe.predict_proba, num_features=num_features, labels=[classes.index(top_label)])
weights = exp.as_list(label=classes.index(top_label))
return {"label": top_label, "weights": [{"token": t, "weight": float(w)} for t, w in weights]}
SHAP explanation wrapper
def shap_explain(pipe: Pipeline, text: str, top_label: str, max_evals: int = 200) -> Dict[str, Any]:
classes = list(pipe.classes_)
masker = shap.maskers.Text()
explainer = shap.Explainer(pipe.predict_proba, masker, output_names=classes)
sv = explainer([text], max_evals=max_evals)
idx = classes.index(top_label)
tokens = list(sv.data[0])
vals = list(sv.values[0][:, idx])
pairs: List[Tuple[str, float]] = [(t, float(v)) for t, v in zip(tokens, vals)]
pairs = sorted(pairs, key=lambda x: abs(x[1]), reverse=True)[:50]
return {"label": top_label, "weights": [{"token": t, "value": v} for t, v in pairs]}
Key Snippets: backend/app/llm.py
Local Llama2 wrapper: validates model files, loads with transformers, and generates text with CPU guardrails (timeout).
Model file checks + local loading
class Llama2Service:
def __init__(self, model_path: str, device: str = "cpu") -> None:
self.model_path = str(Path(model_path))
self.device = device
# Check if model directory exists
model_dir = Path(self.model_path)
if not model_dir.exists():
raise Llama2ModelError(f"Model directory not found: {self.model_path}")
# Check if required model files exist
config_file = model_dir / "config.json"
if not config_file.exists():
raise Llama2ModelError(f"Model config.json not found in {self.model_path}")
# Check for model weight files
index_file = model_dir / "pytorch_model.bin.index.json"
if index_file.exists():
# Check if all shard files exist
with open(index_file, "r") as f:
index_data = json.load(f)
weight_map = index_data.get("weight_map", {})
shard_files = set(weight_map.values())
missing_shards = []
for shard_file in shard_files:
shard_path = model_dir / shard_file
if not shard_path.exists():
missing_shards.append(shard_file)
if missing_shards:
raise Llama2ModelError(
f"Missing model shard files: {', '.join(missing_shards)}. "
f"Please ensure all model files are present in {self.model_path}"
)
else:
# Check for single model file
single_model_file = model_dir / "pytorch_model.bin"
if not single_model_file.exists():
raise Llama2ModelError(
f"Model weight file not found. Expected either pytorch_model.bin or "
f"pytorch_model.bin.index.json with shard files in {self.model_path}"
)
try:
self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, local_files_only=True)
self.model = AutoModelForCausalLM.from_pretrained(
self.model_path,
local_files_only=True,
torch_dtype=torch.float16 if device != "cpu" else torch.float32,
low_cpu_mem_usage=True,
)
self.model.to(device)
self.model.eval()
if device == "cpu":
try:
import torch.quantization as quantization
self.model = torch.quantization.quantize_dynamic(
self.model,
{torch.nn.Linear},
dtype=torch.qint8
)
except Exception as qe:
pass
except FileNotFoundError as e:
raise Llama2ModelError(
f"Failed to load Llama2 model: {str(e)}. "
f"Please ensure all model files are present in {self.model_path}"
) from e
except Exception as e:
raise Llama2ModelError(f"Failed to load Llama2 model: {str(e)}") from e
CPU speed controls: quantization + max tokens + timeout
if device == "cpu":
try:
import torch.quantization as quantization
self.model = torch.quantization.quantize_dynamic(
self.model,
{torch.nn.Linear},
dtype=torch.qint8
)
except Exception as qe:
pass
def _generate_internal(self, inputs: dict, max_new_tokens: int) -> torch.Tensor:
with torch.inference_mode():
return self.model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
temperature=1.0,
num_beams=1,
pad_token_id=self.tokenizer.eos_token_id,
)
def generate(self, prompt: str, max_new_tokens: int = 50, timeout_seconds: int = 60) -> str:
inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=256)
inputs = {k: v.to(self.device) for k, v in inputs.items()}
max_new_tokens = min(max_new_tokens, 50)
result_container = {"output": None, "error": None}
def generate_with_timeout():
try:
out = self._generate_internal(inputs, max_new_tokens)
result_container["output"] = out
except Exception as e:
result_container["error"] = e
thread = threading.Thread(target=generate_with_timeout)
thread.daemon = True
thread.start()
thread.join(timeout=timeout_seconds)
if thread.is_alive():
raise TimeoutError(f"Llama2 generation exceeded {timeout_seconds} seconds timeout. This is expected on CPU - consider using GPU or disabling Llama2 rationale.")
if result_container["error"]:
raise result_container["error"]
if result_container["output"] is None:
raise TimeoutError(f"Llama2 generation did not complete within {timeout_seconds} seconds")
out = result_container["output"]
text = self.tokenizer.decode(out[0], skip_special_tokens=True)
prompt_text = self.tokenizer.decode(inputs['input_ids'][0], skip_special_tokens=True)
if text.startswith(prompt_text):
text = text[len(prompt_text):].strip()
text = text.strip()
prefixes = ["Rationale:", "rationale:", "Explanation:", "explanation:"]
for prefix in prefixes:
if text.lower().startswith(prefix.lower()):
text = text[len(prefix):].strip()
text = text.lstrip(": -")
return text
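A usage sketch for the wrapper above; the relative model path assumes you run from backend/ with the model directory at the project root (on CPU, expect long generation times or a timeout):

# Generate a short rationale with the local Llama2 model, handling the
# failure modes the service can raise.
from app.llm import Llama2Service, Llama2ModelError

try:
    svc = Llama2Service("../Llama2-7b-hf", device="cpu")
    text = svc.generate(
        "Vulnerability: reentrancy\nExplain why in 2 sentences:\n",
        max_new_tokens=50,
        timeout_seconds=60,
    )
    print(text)
except (Llama2ModelError, TimeoutError) as e:
    print("Llama2 unavailable:", e)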
Key Snippets: backend/app/main.py
FastAPI entrypoint: defines API endpoints, CORS, inference flow, explanations, rationale, and run persistence.
/analyze endpoint (prediction + explanations + rationale)
@app.post("/analyze")
def analyze(
source: Optional[str] = Form(default=None),
file: Optional[UploadFile] = File(default=None),
explain: bool = Form(default=True),
use_llama_rationale: bool = Form(default=False),
max_chars: int = Form(default=12000),
) -> dict:
t0 = time.time()
if source is None and file is None:
return {"error": "Provide source or file"}
if source is None and file is not None:
source = file.file.read().decode("utf-8", errors="replace")
assert source is not None
if max_chars and len(source) > max_chars:
source = source[:max_chars]
pipe, _labels, _vocab = get_classifier()
proba = pipe.predict_proba([source])[0]
classes = list(pipe.classes_)
probs = {cls: float(p) for cls, p in zip(classes, proba)}
top = topk(probs, k=3)
top_label = top[0][0] if top else classes[0]
explanations = {}
if explain:
explanations["lime"] = lime_explain(pipe, source, top_label=top_label)
explanations["shap"] = shap_explain(pipe, source, top_label=top_label)
rationale = None
rationale_error = None
rationale_source = None
if explain and (explanations.get("shap") or explanations.get("lime")):
top_confidence = top[0][1] if top else probs.get(top_label, 0.0)
rationale = generate_template_rationale(
top_label=top_label,
confidence=top_confidence,
shap_explanations=explanations.get("shap"),
lime_explanations=explanations.get("lime"),
)
rationale_source = "template"
if use_llama_rationale:
try:
llm = get_llm()
source_preview = source[:1000]
shap_top = ""
if explain and explanations.get("shap") and explanations["shap"].get("weights"):
top_tokens = sorted(explanations["shap"]["weights"], key=lambda x: abs(x.get("value", 0)), reverse=True)[:3]
shap_top = ", ".join([f"{t['token']}({t['value']:.2f})" for t in top_tokens])
prompt = f"Vulnerability: {top_label}\nCode: {source_preview[:500]}\nExplain why in 2 sentences:\n"
llama_rationale = llm.generate(prompt, max_new_tokens=50, timeout_seconds=60)
rationale = llama_rationale
rationale_source = "llama2"
except TimeoutError as e:
rationale_error = str(e)
except Llama2ModelError as e:
rationale_error = str(e)
except Exception as e:
rationale_error = f"Failed to generate rationale: {str(e)}"
dt = time.time() - t0
run_data = {
"input": {"source_len": len(source)},
"prediction": {"probs": probs, "top": top},
"explanations": explanations,
"metrics": {"runtime_sec": dt},
}
if rationale is not None:
run_data["rationale"] = rationale
if rationale_source:
run_data["rationale_source"] = rationale_source
if rationale_error:
run_data["rationale_error"] = rationale_error
run = save_run(settings.runs_dir, run_data)
return run
Artifact endpoints (/model/info, /training/report, /vocabulary)
@app.get("/model/info")
def model_info() -> dict:
artifacts_path = Path(settings.artifacts_dir) / "tfidf_lr.joblib"
if not artifacts_path.exists():
return {"error": "Model not found. Train a model first."}
pipe, labels, vocab = load_classifier(str(artifacts_path))
return {
"labels": labels,
"num_labels": len(labels),
"vocab_size": len(vocab) if vocab else None,
"model_type": "TF-IDF + LogisticRegression",
}
@app.get("/training/report")
def training_report() -> dict:
report_path = Path(settings.artifacts_dir) / "training_report.json"
if not report_path.exists():
return {"error": "Training report not found. Train a model first."}
return json.loads(report_path.read_text(encoding="utf-8"))
@app.get("/vocabulary")
def vocabulary(limit: int = 1000) -> dict:
vocab_path = Path(settings.artifacts_dir) / "vocabulary.json"
if not vocab_path.exists():
return {"error": "Vocabulary not found. Train a model first."}
vocab = json.loads(vocab_path.read_text(encoding="utf-8"))
items = list(vocab.items())[:limit]
return {
"total_size": len(vocab),
"items": [{"word": word, "index": idx} for word, idx in items],
}
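A quick client-side sketch for these endpoints, assuming the backend is running on localhost:8000 and the requests package is installed:

# Query the artifact endpoints defined above.
import requests

base = "http://localhost:8000"
print(requests.get(f"{base}/model/info").json())           # labels, vocab size
print(requests.get(f"{base}/training/report").json().keys())
print(requests.get(f"{base}/vocabulary", params={"limit": 5}).json())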
Key Snippets: backend/app/rationale.py
Instant template rationale: converts SHAP/LIME top tokens into a human-readable explanation without LLM latency.
Instant template rationale from SHAP/LIME
def generate_template_rationale(
top_label: str,
confidence: float,
shap_explanations: Optional[Dict[str, Any]] = None,
lime_explanations: Optional[Dict[str, Any]] = None,
) -> str:
top_tokens = []
if shap_explanations and shap_explanations.get("weights"):
top_tokens = sorted(
shap_explanations["weights"],
key=lambda x: abs(x.get("value", 0)),
reverse=True
)[:5]
elif lime_explanations and lime_explanations.get("weights"):
top_tokens = sorted(
lime_explanations["weights"],
key=lambda x: abs(x.get("weight", 0)),
reverse=True
)[:5]
confidence_pct = int(confidence * 100)
if top_tokens:
token_descriptions = []
for token_info in top_tokens[:3]:
token = token_info.get("token", "") or token_info.get("text", "")
value = token_info.get("value") or token_info.get("weight", 0)
if abs(value) > 0.01:
token_descriptions.append(f'"{token}"')
tokens_str = ", ".join(token_descriptions) if token_descriptions else "various code patterns"
rationale = (
f"This contract has been classified as {top_label} with {confidence_pct}% confidence. "
f"The key indicators include code patterns like {tokens_str}, which are commonly associated with this vulnerability type. "
f"These patterns suggest potential security risks that should be reviewed and addressed."
)
else:
rationale = (
f"This contract has been classified as {top_label} with {confidence_pct}% confidence. "
f"The classification is based on patterns detected in the code that are characteristic of this vulnerability type. "
f"Please review the code carefully to identify and mitigate potential security risks."
)
return rationale
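Given the definition above, a usage sketch with a hand-made SHAP-style payload (token names and values are illustrative):

# Call the template generator directly with a fake SHAP payload.
shap_payload = {
    "weights": [
        {"token": "call", "value": 0.42},
        {"token": "balances", "value": -0.11},
        {"token": "msg", "value": 0.08},
    ]
}
print(generate_template_rationale("reentrancy", 0.87, shap_explanations=shap_payload))
# -> "This contract has been classified as reentrancy with 87% confidence. ..."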
Key Snippets: backend/scripts/train_classifier.py
CLI training script: trains model, writes artifacts, and saves training report + vocabulary.
Writes training_report.json and vocabulary.json
import argparse
import json
import sys
from pathlib import Path
import joblib
sys.path.insert(0, str(Path(__file__).parent.parent))
from app.classifier import train_multiclass
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--train_jsonl", required=True)
ap.add_argument("--artifacts_dir", required=True)
ap.add_argument("--val_jsonl", default=None)
ap.add_argument("--test_jsonl", default=None)
ap.add_argument("--max_features", type=int, default=50000)
args = ap.parse_args()
artifacts_dir = Path(args.artifacts_dir)
artifacts_dir.mkdir(parents=True, exist_ok=True)
out_path = str(artifacts_dir / "tfidf_lr.joblib")
vocab_path = artifacts_dir / "vocabulary.json"
report_path = artifacts_dir / "training_report.json"
pipe, labels, report = train_multiclass(
args.train_jsonl,
out_path,
max_features=args.max_features,
val_jsonl=args.val_jsonl,
test_jsonl=args.test_jsonl,
)
saved_obj = joblib.load(out_path)
vocab_obj = saved_obj.get("vocab_map", {})
if vocab_obj:
with vocab_path.open("w", encoding="utf-8") as f:
json.dump(vocab_obj, f, indent=2, ensure_ascii=False)
print(f"\nVocabulary saved to: {vocab_path} ({len(vocab_obj)} words)")
else:
print("\nWarning: Vocabulary map not found in saved artifacts")
with report_path.open("w", encoding="utf-8") as f:
json.dump(report, f, indent=2, ensure_ascii=False)
print(f"Training report saved to: {report_path}")
print("\nTraining completed successfully!")
if __name__ == "__main__":
main()
Key Snippets: frontend/src/App.js
Top-level React app: sidebar layout, page routing, backend health polling.
Health polling + routing
const DEFAULT_API_BASE = process.env.REACT_APP_API_BASE || 'http://localhost:8000';
function App() {
const [activePage, setActivePage] = useState('analysis');
const [apiBase, setApiBase] = useState(DEFAULT_API_BASE);
const [health, setHealth] = useState({ status: 'checking', lastCheck: null });
useEffect(() => {
let isMounted = true;
const check = async () => {
try {
const status = await checkHealth(apiBase);
if (isMounted) {
setHealth({ status: status ? 'ok' : 'error', lastCheck: new Date() });
}
} catch (error) {
if (isMounted) {
setHealth({ status: 'error', lastCheck: new Date() });
}
}
};
check();
const interval = setInterval(check, 5000);
return () => {
isMounted = false;
clearInterval(interval);
};
}, [apiBase]);
const renderPage = () => {
switch (activePage) {
case 'analysis':
return <AnalysisPage apiBase={apiBase} health={health} />;
case 'model':
return <ModelPage apiBase={apiBase} health={health} />;
case 'vocabulary':
return <VocabularyPage apiBase={apiBase} health={health} />;
case 'runs':
return <RunsPage apiBase={apiBase} health={health} />;
default:
return <AnalysisPage apiBase={apiBase} health={health} />;
}
};
return (
<div className="app">
<Sidebar
activePage={activePage}
onPageChange={setActivePage}
health={health}
apiBase={apiBase}
onApiBaseChange={setApiBase}
/>
<main className="main-content">
{renderPage()}
</main>
</div>
);
Key Snippets: frontend/src/components/Sidebar.js
Navigation + backend connection status + API base URL input.
Connection indicator + API base URL override
function Sidebar({ activePage, onPageChange, health, apiBase, onApiBaseChange }) {
const pages = [
{ id: 'analysis', name: 'Analysis', description: 'Analyze contracts' },
{ id: 'model', name: 'Model', description: 'Model information' },
{ id: 'vocabulary', name: 'Vocabulary', description: 'View vocabulary' },
{ id: 'runs', name: 'Runs', description: 'Analysis history' },
];
const getStatusColor = () => {
if (health.status === 'ok') return 'ok';
if (health.status === 'error') return 'error';
return '';
};
return (
<aside className="sidebar">
<div className="sidebar-header">
<div className="sidebar-logo">SCXAI</div>
<div className="status-indicator">
<span className={`status-dot ${getStatusColor()}`}></span>
<span>{health.status === 'ok' ? 'Connected' : health.status === 'error' ? 'Offline' : 'Checking'}</span>
</div>
</div>
<nav className="sidebar-nav">
{pages.map((page) => (
<div
key={page.id}
className={`nav-item ${activePage === page.id ? 'active' : ''}`}
onClick={() => onPageChange(page.id)}
>
<div>
<div className="nav-item-name">{page.name}</div>
<div style={{ fontSize: '12px', color: 'var(--color-text-tertiary)', marginTop: '2px' }}>
{page.description}
</div>
</div>
</div>
))}
</nav>
<div className="sidebar-footer">
<div style={{ marginBottom: 'var(--spacing-sm)' }}>
<label style={{ fontSize: '11px', color: 'var(--color-text-tertiary)', display: 'block', marginBottom: '4px' }}>
API Base URL
</label>
<input
type="text"
className="input"
value={apiBase}
onChange={(e) => onApiBaseChange(e.target.value)}
style={{ fontSize: '12px', padding: '6px 8px' }}
/>
</div>
<div style={{ fontSize: '11px', color: 'var(--color-text-tertiary)' }}>
{apiBase}
</div>
</div>
</aside>
Key Snippets: frontend/src/pages/AnalysisPage.js
Main workflow: paste/upload contract, call /analyze, render predictions, rationale, and SHAP/LIME weights.
UI state: toggles + analyze request
function AnalysisPage({ apiBase, health }) {
const [mode, setMode] = useState('paste');
const [source, setSource] = useState('');
const [file, setFile] = useState(null);
const [explain, setExplain] = useState(true);
const [useLlamaRationale, setUseLlamaRationale] = useState(false);
const [maxChars, setMaxChars] = useState(12000);
const [busy, setBusy] = useState(false);
const [result, setResult] = useState(null);
const [error, setError] = useState(null);
const [explanationTab, setExplanationTab] = useState('shap');
const canRun = health.status === 'ok' && (source.trim() || file) && !busy;
const topPrediction = useMemo(() => {
if (!result?.prediction?.top?.[0]) return null;
return result.prediction.top[0];
}, [result]);
const sortedProbs = useMemo(() => {
if (!result?.prediction?.probs) return [];
return Object.entries(result.prediction.probs)
.sort((a, b) => b[1] - a[1])
.slice(0, 10);
}, [result]);
const handleFileChange = (e) => {
const selectedFile = e.target.files?.[0];
if (selectedFile) {
setFile(selectedFile);
setMode('upload');
setSource('');
}
};
const handleAnalyze = async () => {
setBusy(true);
setError(null);
setResult(null);
try {
const data = await analyzeContract(apiBase, {
source: mode === 'paste' ? source : undefined,
file: mode === 'upload' ? file : undefined,
explain,
useLlamaRationale,
maxChars,
});
setResult(data);
} catch (err) {
setError(err.message || 'Analysis failed');
} finally {
setBusy(false);
}
};
Explanation toggles (SHAP/LIME + Llama2)
<div style={{ marginTop: 'var(--spacing-lg)', paddingTop: 'var(--spacing-lg)', borderTop: '1px solid var(--color-border)' }}>
<div style={{ marginBottom: 'var(--spacing-md)' }}>
<div className="flex items-center justify-between" style={{ marginBottom: 'var(--spacing-sm)' }}>
<label className="text-primary" style={{ fontWeight: 600 }}>
SHAP & LIME Explanations
</label>
<div
className={`toggle ${explain ? 'active' : ''}`}
onClick={() => !busy && setExplain(!explain)}
>
<div className="toggle-thumb"></div>
</div>
</div>
<div className="text-secondary" style={{ fontSize: '13px' }}>
Generate token-level importance scores for predictions
</div>
</div>
<div>
<div className="flex items-center justify-between" style={{ marginBottom: 'var(--spacing-sm)' }}>
<label className="text-primary" style={{ fontWeight: 600 }}>
Llama2 Rationale
</label>
<div
className={`toggle ${useLlamaRationale ? 'active' : ''}`}
onClick={() => !busy && setUseLlamaRationale(!useLlamaRationale)}
>
<div className="toggle-thumb"></div>
</div>
</div>
<div className="text-secondary" style={{ fontSize: '13px' }}>
Generate natural language explanation (CPU intensive - may take 1-3 minutes)
</div>
</div>
Rationale / error rendering
{(result.rationale || result.rationale_error) && (
<div className="card" style={{ marginTop: 'var(--spacing-lg)' }}>
<div className="card-header">
<div>
<h3 className="card-title">AI Explanation (Llama2)</h3>
<div className="card-subtitle">
{result.rationale_error
? 'Llama2 model encountered an error'
: 'Human-readable explanation generated by Llama2'}
</div>
</div>
</div>
{result.rationale_error ? (
<div style={{ padding: 'var(--spacing-md)' }}>
<div className="alert alert-error">
<div style={{ fontWeight: 600, marginBottom: 'var(--spacing-sm)', fontSize: '15px' }}>
⚠️ Llama2 Model Not Available
</div>
<div style={{ fontSize: '14px', lineHeight: 1.7, marginBottom: 'var(--spacing-md)' }}>
{result.rationale_error}
</div>
<div style={{
paddingTop: 'var(--spacing-md)',
borderTop: '1px solid var(--color-border)',
fontSize: '13px',
color: 'var(--color-text-secondary)',
lineHeight: 1.6
}}>
<strong>Note:</strong> The analysis completed successfully! You can still use the SHAP/LIME explanations shown above, which provide excellent insights into why the vulnerability was detected. The Llama2 feature is optional and requires all model files to be present.
</div>
</div>
</div>
) : result.rationale ? (
<div
style={{
padding: 'var(--spacing-lg)',
background: 'var(--color-surface-elevated)',
borderRadius: 'var(--radius-md)',
lineHeight: 1.8,
fontSize: '15px',
color: 'var(--color-text-primary)',
whiteSpace: 'pre-wrap',
fontFamily: '-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif',
}}
>
{result.rationale}
</div>
) : null}
</div>
Key Snippets: frontend/src/services/api.js
Centralized fetch wrappers for backend endpoints with friendly network error messages.
Backend health check + analyze request wiring
const DEFAULT_API_BASE = process.env.REACT_APP_API_BASE || 'http://localhost:8000';
export async function checkHealth(apiBase = DEFAULT_API_BASE) {
try {
const response = await fetch(`${apiBase}/health`);
return response.ok;
} catch (error) {
return false;
}
}
export async function analyzeContract(apiBase, { source, file, explain = true, useLlamaRationale = false, maxChars = 12000 }) {
const formData = new FormData();
if (file) {
formData.append('file', file);
} else if (source) {
formData.append('source', source);
}
formData.append('explain', String(explain));
formData.append('use_llama_rationale', String(useLlamaRationale));
formData.append('max_chars', String(maxChars));
try {
const response = await fetch(`${apiBase}/analyze`, {
method: 'POST',
body: formData,
});
if (!response.ok) {
const error = await response.json().catch(() => ({ error: `HTTP ${response.status}: ${response.statusText}` }));
throw new Error(error.error || `HTTP ${response.status}: ${response.statusText}`);
}
return await response.json();
} catch (error) {
// Re-throw with more context if it's a network error
if (error instanceof TypeError && error.message.includes('fetch')) {
throw new Error(`Failed to connect to backend at ${apiBase}. Make sure the backend server is running.`);
}
throw error;
}
Runs export PDF URL helper
export async function getRun(apiBase, runId) {
const response = await fetch(`${apiBase}/runs/${runId}`);
if (!response.ok) {
const error = await response.json().catch(() => ({ error: `HTTP ${response.status}` }));
throw new Error(error.error || `HTTP ${response.status}`);
}
return await response.json();
}
export function getRunPdfUrl(apiBase, runId) {
return `${apiBase}/runs/${runId}/pdf`;
}
Files (what each does)
All files below are part of the project. Only key files have code snippets above.
backend/app/__init__.py - Package marker for the backend app.
backend/app/classifier.py - TF-IDF + LogisticRegression: training, vocabulary map generation, reports, and model loading helpers.
See the key snippets section: backend/app/classifier.py
backend/app/dataset.py - JSONL reader/writer and schema helpers for contract examples.
backend/app/explain.py - SHAP + LIME integration: wraps the classifier into explainers and returns token importance weights.
See the key snippets section: backend/app/explain.py
backend/app/labels.py - Label normalization used during dataset conversion.
backend/app/llm.py - Local Llama2 wrapper: validates model files, loads with transformers, and generates text with CPU guardrails (timeout).
See the key snippets section: backend/app/llm.py
backend/app/main.py - FastAPI entrypoint: defines API endpoints, CORS, inference flow, explanations, rationale, and run persistence.
See the key snippets section: backend/app/main.py
backend/app/rationale.py - Instant template rationale: converts SHAP/LIME top tokens into a human-readable explanation without LLM latency.
See the key snippets section: backend/app/rationale.py
backend/app/report.py - PDF report generation and run loading utility.
backend/app/runs.py - Persists each analysis run to runs/<uuid>.json.
backend/app/settings.py - Central settings: resolves project-root paths for the Llama2 model dir, artifacts, and runs.
backend/artifacts/training_report.json - Training metrics output (accuracy, reports, label distribution).
backend/artifacts/vocabulary.json - Vocabulary (word -> index) generated from training data.
backend/LLAMA2_PERFORMANCE.md - Notes about CPU performance and constraints for Llama2 rationale generation.
backend/requirements.txt - Pinned Python dependencies for FastAPI, the ML pipeline, SHAP/LIME, and local Llama2 loading.
backend/scripts/convert_sc_csv.py - Converts the provided CSV datasets into processed JSONL splits used for training/eval.
backend/scripts/prepare_dataset.py - Dataset preparation helper (JSONL writing) for general workflows.
backend/scripts/train_classifier.py - CLI training script: trains the model, writes artifacts, and saves the training report + vocabulary.
See the key snippets section: backend/scripts/train_classifier.py
data/README.md - Defines the unified JSONL schema used by the backend training pipeline.
examples/README.md - How to get Solidity code to test in the UI; includes sources like GitHub/Etherscan.
examples/sample_contract.sol - Sample Solidity code for quick testing in the Analysis page.
frontend/package-lock.json - Exact npm dependency lockfile.
frontend/package.json - Frontend dependencies and scripts.
frontend/public/index.html - React HTML template.
frontend/src/App.css - Main UI system styles (cards, typography, layout, toggles, loading states).
frontend/src/App.js - Top-level React app: sidebar layout, page routing, backend health polling.
See the key snippets section: frontend/src/App.js
frontend/src/components/Sidebar.js - Navigation + backend connection status + API base URL input.
See the key snippets section: frontend/src/components/Sidebar.js
frontend/src/index.css - Global CSS and theme variables.
frontend/src/index.js - React entry: mounts <App/>.
frontend/src/pages/AnalysisPage.js - Main workflow: paste/upload contract, call /analyze, render predictions, rationale, and SHAP/LIME weights.
See the key snippets section: frontend/src/pages/AnalysisPage.js
frontend/src/pages/ModelPage.js - Calls /model/info and displays model type, label count, vocab size, and the label list.
frontend/src/pages/RunsPage.js - Calls /runs and /runs/{id} to browse and reopen analysis history.
frontend/src/pages/VocabularyPage.js - Calls /vocabulary and displays the word map.
frontend/src/services/api.js - Centralized fetch wrappers for backend endpoints with friendly network error messages.
See the key snippets section: frontend/src/services/api.js
README.md - Top-level project overview and the main commands to convert datasets, train the classifier, and run backend/frontend.
Each file below in runs/ is a saved analysis output (one JSON per run) containing prediction probabilities, optional explanations, runtime, and rationale fields:
runs/1d6faf53-7b39-4df2-ad35-625c03765061.json
runs/25c41055-eb2d-483e-a980-93545f2d8a39.json
runs/2e7bffe5-0ca3-4214-8251-e577e5bafd83.json
runs/316275c0-50b8-43ee-9580-8ffa9a238234.json
runs/39314a8d-d4c0-4417-993d-b66f2004ad24.json
runs/4539ada5-ce07-400e-91d4-80db814b89ae.json
runs/463667d9-ed8e-4085-acb1-4af4ace5cf66.json
runs/53c8f704-43e6-4f74-bb81-e9891e9c0efe.json
runs/5788a643-275c-4b6a-8f7c-fa99b154ffec.json
runs/70f62db3-9648-413a-9011-a33ac73ce44e.json
runs/7d97c7ae-6512-4980-9445-cf632a1409bb.json
runs/7f5cd82e-6a29-4a1c-ae0c-f17fdf644e3e.json
runs/8810c98e-f38b-45a5-b2c5-ab077faaac39.json
runs/8f19caf4-2258-44e9-99c7-dd6b7a3df255.json
runs/980610b8-13fe-471f-9465-e1243884c306.json
runs/bf8daf6d-0fde-4e74-8799-12466be40de3.json
runs/cb293717-ce7f-4156-aeee-3f7dddac4fb3.json
runs/def29c4f-dbb1-45a7-ae2d-d06e67f34959.json
runs/e745accc-1790-4345-93fc-a7608d4ea19c.json
runs/f03fe598-d656-4941-8d59-608d8384d895.json
runs/f1867f7b-396c-4a83-9429-03f580a2577a.json
runs/f55f293b-c82c-46e5-a6b4-773b9131e2e7.json
runs/fc177b50-b8c7-461d-905d-fdf5fa8e84a6.json