Explainable Smart Contract Vulnerability Detection
What this project does: This is an Explainable AI (XAI) framework that analyzes Solidity smart contracts to detect security vulnerabilities and provides human-readable explanations for why vulnerabilities were detected.
How It Works
The system uses a two-stage approach for vulnerability detection and explanation:
1. Vulnerability Classification
A trained machine learning classifier (TF-IDF + Logistic Regression) analyzes the contract source code and predicts which vulnerability types are present, with confidence scores for each label (e.g., reentrancy, integer-overflow, unchecked-external-call, etc.).
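To make the classification stage concrete, here is a minimal, self-contained sketch with toy data (the two labels and training strings are illustrative only; the project's actual training code appears in the backend/app/classifier.py snippets below):

# Minimal sketch of the classification stage: TF-IDF features + Logistic
# Regression probabilities. Toy data only; see the classifier.py snippets
# later in this document for the real pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_sources = ["call.value msg.sender withdraw", "block.timestamp now lottery"]
train_labels = ["reentrancy", "timestamp-dependence"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=2000)),
])
pipe.fit(train_sources, train_labels)

# predict_proba yields one confidence score per known label
probs = dict(zip(pipe.classes_, pipe.predict_proba(["msg.sender.call{value: x}"])[0]))
print(probs)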
2. Explainability Layer
The system provides three types of explanations to help developers understand why a vulnerability was detected:
- SHAP explanations: Token-level importance scores showing which code tokens (words, symbols) most contributed to the prediction.
- LIME explanations: Alternative token importance scores using a different explainability algorithm for comparison.
- Natural language rationale: A human-readable explanation generated either:
- Template-based (instant): Automatically generated from SHAP/LIME top tokens—provides immediate feedback.
- Llama2-generated (optional): Uses a local Llama2-7b model to generate more nuanced explanations (slower, requires model files).
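How the two rationale paths interact is shown in full in the backend/app/main.py snippet later in this document; the runnable sketch below (with stubbed function names) captures the essential behavior: the template rationale is produced first, and a Llama2 result replaces it only when the toggle is on and generation succeeds before the timeout.

# Sketch of the rationale fallback semantics (stubbed names, not the
# project's actual functions; the real flow is in backend/app/main.py).
def template_rationale(label: str, confidence: float) -> str:
    return f"Classified as {label} with {int(confidence * 100)}% confidence."

def llama_rationale(prompt: str) -> str:
    raise TimeoutError("generation exceeded timeout")  # simulate a slow CPU run

use_llama_rationale = True
rationale = template_rationale("reentrancy", 0.87)  # instant fallback
rationale_source, rationale_error = "template", None
if use_llama_rationale:
    try:
        rationale = llama_rationale("Vulnerability: reentrancy\nExplain why:")
        rationale_source = "llama2"
    except TimeoutError as e:
        rationale_error = str(e)  # keep the template rationale
print(rationale_source, "->", rationale)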
3. User Workflow
- User inputs a Solidity contract via the React frontend (paste text or upload a .sol file).
- Frontend sends an analysis request to the FastAPI backend with optional toggles for SHAP/LIME explanations and Llama2 rationale.
- Backend processes the contract:
- Runs classifier inference to get vulnerability predictions
- If explain=true, generates SHAP and LIME token importance scores
- Generates the rationale (template or Llama2, depending on the toggles)
- Saves the complete analysis as a JSON run in runs/
- Frontend displays results: Shows top predictions with confidence bars, explanation token tables, and rationale text. Users can export results as PDF or JSON.
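The same workflow can be driven without the UI. Below is a minimal client sketch, assuming the backend is running on localhost:8000 and the requests package is installed; the form field names match the /analyze endpoint shown later in backend/app/main.py.

# Minimal client sketch for the workflow above.
import requests

contract = open("examples/sample_contract.sol", encoding="utf-8").read()
resp = requests.post(
    "http://localhost:8000/analyze",
    data={
        "source": contract,            # or send files={"file": ...} to upload
        "explain": "true",             # SHAP + LIME token importances
        "use_llama_rationale": "false",
        "max_chars": "12000",
    },
)
run = resp.json()
print(run["prediction"]["top"])        # top-3 (label, probability) pairs
print(run.get("rationale"))            # template or Llama2 text, if generated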
Key Technologies
- React (Frontend Framework): A JavaScript library for building user interfaces. React allows us to create interactive, component-based UIs where the frontend state automatically updates when data changes. The frontend uses React to:
- Render the contract input form (paste/upload)
- Display analysis results with interactive charts and tables
- Manage routing between different pages (Analysis, Model, Vocabulary, Runs)
- Handle API calls to the backend and display responses
Why React? React provides efficient rendering, reusable components, and a large ecosystem of tools. It's ideal for building modern web applications with complex UI interactions.
- Python + FastAPI (Backend Framework): Python is the programming language used for the backend, and FastAPI is a modern web framework for building APIs. The backend uses:
- FastAPI: Provides automatic API documentation (Swagger/OpenAPI), type validation, and async support. Handles HTTP requests and responses.
- scikit-learn: Machine learning library for TF-IDF vectorization and Logistic Regression classification.
- SHAP & LIME: Explainability libraries that provide token-level feature importance for model predictions.
- transformers: Hugging Face library for loading and running local Llama2 model.
Why Python? Python has excellent ML/AI libraries (scikit-learn, transformers), great data processing capabilities, and is widely used in research and production ML systems.
- Llama2 (Large Language Model): Llama2 is an open-weight large language model (LLM) released by Meta. In this project, we use Llama2-7b-hf (7 billion parameters, Hugging Face format) to generate natural language explanations for vulnerability predictions.
- What is Llama2? It's a transformer-based neural network trained on vast amounts of text data. It can understand context and generate human-readable text.
- Why use Llama2 here? It converts technical SHAP/LIME token scores into plain-language explanations that developers can easily understand (e.g., "This contract has a reentrancy vulnerability because...").
- Local deployment: The model runs entirely on your machine (CPU or GPU); no internet connection is required. Model files are stored in the Llama2-7b-hf/ directory.
- Performance: CPU inference is slow (30-60 seconds), so we've added timeouts and optimizations. The system also provides an instant template-based rationale as a faster alternative.
- ML Pipeline: TF-IDF vectorization + Logistic Regression (fast on CPU, interpretable). TF-IDF converts code text into numerical features, and Logistic Regression predicts vulnerability probabilities.
- XAI (Explainable AI): SHAP and LIME for token-level feature importance showing which code tokens contributed most to predictions.
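For orientation, the explanation payloads returned by the backend have the shape below, as produced by the explain.py wrappers shown later in this document (SHAP entries carry "value", LIME entries carry "weight"); the numeric values here are illustrative only.

# Shape of the explanation payloads from backend/app/explain.py
# (sample values are illustrative only).
shap_payload = {
    "label": "reentrancy",
    "weights": [
        {"token": "call", "value": 0.42},
        {"token": "msg", "value": 0.17},
    ],
}
lime_payload = {
    "label": "reentrancy",
    "weights": [
        {"token": "call", "weight": 0.38},
        {"token": "balances", "weight": -0.05},
    ],
}
for entry in shap_payload["weights"]:
    print(f'{entry["token"]}: {entry["value"]:+.2f}')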
How to Create a React Project (for Reference)
This project's frontend was created using create-react-app. Here's how to set up a similar React project from scratch:
- Install Node.js: Download and install Node.js (v14+) from nodejs.org.
- Create React app:
npx create-react-app frontend-name
cd frontend-name
npm start
- Project structure: The generated project includes:
- src/ - Source code (components, pages, services)
- public/ - Static files (index.html, assets)
- package.json - Dependencies and scripts
- node_modules/ - Installed packages (auto-generated)
- Key React concepts used:
- Components: Reusable UI pieces (e.g., Sidebar.js, AnalysisPage.js)
- State: React's useState hook manages component data (e.g., user input, API results)
- Effects: React's useEffect hook handles side effects (e.g., API calls, polling)
- Props: Data passed from parent to child components
- Development server: Running npm start starts a development server at http://localhost:3000 with hot-reload (changes appear immediately).
- Production build: Run npm run build to create an optimized production build in the build/ folder.
Note: This project's frontend is already set up. You only need to run npm install and npm start in the frontend/ directory to use it.
Data Flow
Raw CSV datasets are converted to JSONL format, split into train/val/test sets, and used to train the classifier. The trained model artifacts are saved and loaded by the API server on startup. Each analysis run is persisted as JSON for later review and export.
See the Test Data & Sample Contracts, Project Workflow, Dataset & Training, Llama2 Connection, and System Architecture sections below for detailed diagrams and explanations.
Setup & Run
Backend
cd "RP 44"
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
cd backend
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Frontend
cd "RP 44/frontend"
npm install
npm start
Training
source .venv/bin/activate
python backend/scripts/convert_sc_csv.py --in_csv "data/SC_Vuln_8label.csv" --dataset_name sc_vuln_8label --out_dir data/processed
python backend/scripts/convert_sc_csv.py --in_csv "data/SC_4label.csv" --dataset_name sc_4label --out_dir data/processed
python backend/scripts/train_classifier.py \
--train_jsonl data/processed/sc_vuln_8label.train.jsonl \
--val_jsonl data/processed/sc_vuln_8label.val.jsonl \
--test_jsonl data/processed/sc_vuln_8label.test.jsonl \
--artifacts_dir backend/artifacts
Outputs: backend/artifacts/tfidf_lr.joblib, training_report.json, vocabulary.json
Test Data & Sample Contracts
The project includes sample/test data you can use to quickly test the vulnerability detection system without needing your own Solidity contracts.
Sample Contract Provided
A sample Solidity contract is included at examples/sample_contract.sol. This contract contains intentionally vulnerable code patterns to demonstrate how the system detects different vulnerability types.
Vulnerabilities in the Sample Contract:
- Reentrancy vulnerability: The withdraw() function sends Ether via a low-level msg.sender.call, the pattern classically associated with reentrancy attacks (in this sample the balance is zeroed before the call, so the pattern is demonstrated without being exploitable).
- Integer overflow: The add() function performs raw addition; Solidity 0.8+ reverts on overflow by default, so this demonstrates the pattern rather than an exploitable bug.
- Missing access control: The mint() function in TokenContract allows anyone to mint tokens without authorization checks.
- Front-running vulnerability: The transfer() function pattern can be exploited via front-running attacks.
How to Use the Sample Contract:
- Open the Analysis page in the React frontend (http://localhost:3000).
- Choose "Paste" mode (or upload the file).
- Copy the contents of examples/sample_contract.sol and paste into the text area, OR click "Choose File" and select examples/sample_contract.sol.
- Enable explanations: Toggle "SHAP & LIME Explanations" to see token-level importance scores.
- Optional: Enable "Llama2 Rationale" for natural language explanations (slower).
- Click "Analyze" to see vulnerability predictions, confidence scores, and explanations.
Sample Contract Code:
// Sample Solidity Contract for Testing Vulnerability Detection
// This contract demonstrates common vulnerabilities
pragma solidity ^0.8.0;
contract SimpleStorage {
uint256 private storedData;
address public owner;
mapping(address => uint256) public balances;
constructor() {
owner = msg.sender;
storedData = 0;
}
// Potential reentrancy vulnerability example
function withdraw() public {
uint256 amount = balances[msg.sender];
require(amount > 0, "No balance");
balances[msg.sender] = 0;
(bool success, ) = msg.sender.call{value: amount}("");
require(success, "Transfer failed");
}
// Potential integer overflow pattern (Solidity 0.8+ reverts on overflow by default)
function add(uint256 a, uint256 b) public pure returns (uint256) {
return a + b;
}
// Setter function
function set(uint256 x) public {
storedData = x;
}
// Getter function
function get() public view returns (uint256) {
return storedData;
}
}
// Example with potential access control issues
contract TokenContract {
mapping(address => uint256) balances;
address public owner;
constructor() {
owner = msg.sender;
}
// Missing access control - anyone can mint
function mint(address to, uint256 amount) public {
balances[to] += amount;
}
// Potential front-running vulnerability
function transfer(address to, uint256 amount) public {
require(balances[msg.sender] >= amount, "Insufficient balance");
balances[msg.sender] -= amount;
balances[to] += amount;
}
}
Creating Your Own Test Contracts
You can create additional test contracts by:
- Copying real contracts from Etherscan: Visit Etherscan, find verified contracts, and copy their source code.
- Using OpenZeppelin examples: The project includes OpenZeppelin contracts in openzeppelin-contracts-master/ that you can test.
- Writing minimal test cases: Create small Solidity files focusing on specific vulnerability patterns.
Analysis Run Examples
The runs/ directory contains saved analysis results (JSON files) from previous runs. You can:
- View run history: Go to the "Runs" page in the frontend to see all saved analyses.
- Open specific runs: Click on any run to view its full results, explanations, and rationale.
- Export runs: Download runs as PDF or JSON for offline review.
- Use as test cases: The JSON structure in runs can serve as examples of the API response format.
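If you want to consume runs programmatically, here is a small sketch; it assumes at least one run file exists in runs/ and that the saved JSON mirrors the payload assembled in the /analyze endpoint shown later (input, prediction, explanations, metrics, plus optional rationale fields).

# Sketch: load one saved run from runs/ and print its headline fields.
import json
from pathlib import Path

run_file = next(Path("runs").glob("*.json"))  # assumes at least one run exists
run = json.loads(run_file.read_text(encoding="utf-8"))

print("source length:", run["input"]["source_len"])
for label, prob in run["prediction"]["top"]:
    print(f"{label}: {prob:.2%}")
print("runtime:", run["metrics"]["runtime_sec"], "sec")
print("rationale source:", run.get("rationale_source", "n/a"))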
Note: The training datasets (data/SC_Vuln_8label.csv, data/SC_4label.csv) are real datasets used to train the classifier, not dummy data.
Project Workflow
Goal: analyze a Solidity contract, predict vulnerability labels, generate explainability (SHAP/LIME), optionally generate a natural-language rationale (template or Llama2), and save the run for export/review.
Analysis workflow diagram
Download draw.io file: docs/diagrams/analysis_workflow.drawio
Dataset & Training
Input formats: raw CSVs are converted to a unified JSONL schema (id, source, labels, optional meta/spans). The classifier is trained from JSONL splits.
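To make the schema concrete, here is a minimal sketch that writes and reads one record (field values, including the meta contents, are illustrative; it assumes data/processed/ exists, as created by the convert script):

# Sketch of one record in the unified JSONL schema: one JSON object per
# line with id, source, labels, and optional meta/spans.
import json

record = {
    "id": "example-0001",
    "source": "contract C { function withdraw() public { /* ... */ } }",
    "labels": ["reentrancy"],
    "meta": {"dataset": "sc_vuln_8label"},
}
with open("data/processed/example.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")

# Reading mirrors what train_multiclass consumes: text plus the first label.
with open("data/processed/example.jsonl", encoding="utf-8") as f:
    for line in f:
        ex = json.loads(line)
        print(ex["labels"][0] if ex["labels"] else "unknown", len(ex["source"]))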
Dataset pipeline diagram
Download draw.io file: docs/diagrams/dataset_pipeline.drawio
Training outputs
- Model artifact: backend/artifacts/tfidf_lr.joblib
- Training report: backend/artifacts/training_report.json (accuracy + classification report)
- Vocabulary map: backend/artifacts/vocabulary.json (word → index)
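A quick way to inspect these outputs is to read the training report directly; the keys below come from the train_multiclass() snippet later in this document (val/test entries exist only when those splits were supplied):

# Sketch: inspect the training report produced by train_classifier.py.
import json
from pathlib import Path

report = json.loads(Path("backend/artifacts/training_report.json").read_text(encoding="utf-8"))
print("train samples:", report["train_samples"])
print("labels:", report["labels"])
print("val accuracy:", report.get("val_accuracy"))
print("test accuracy:", report.get("test_accuracy"))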
How Llama2 is Connected
The backend loads the local model from Llama2-7b-hf/ using transformers with local_files_only=True. The model is optional and used only for rationale generation when enabled.
- Path wiring: backend/app/settings.py resolves llama_model_path to RP 44/Llama2-7b-hf.
- Loader: backend/app/llm.py validates shards and loads the tokenizer + model.
- Inference guardrails (CPU): small input length, small max tokens, and a hard timeout so requests cannot hang indefinitely.
- Fallback: when SHAP/LIME is enabled, the system can generate an instant template rationale without Llama2.
System Architecture
The system has a React UI, a FastAPI backend, a fast classical classifier, SHAP/LIME explainability, and an optional local Llama2 rationale generator. All results are persisted as runs for export/review.
Architecture diagram
Download draw.io file: docs/diagrams/system_architecture.drawio
Key Snippets: backend/app/classifier.py
TF-IDF + LogisticRegression: training, vocabulary map generation, reports, and model loading helpers.
Training pipeline + progress bars + report payload
def train_multiclass(
train_jsonl: str,
artifacts_path: str,
max_features: int = 50000,
val_jsonl: Optional[str] = None,
test_jsonl: Optional[str] = None,
) -> Tuple[Pipeline, List[str], Dict[str, Any]]:
print("Loading training data...")
X_train: List[str] = []
y_train: List[str] = []
examples = list(read_jsonl(train_jsonl))
for ex in tqdm(examples, desc="Loading train"):
X_train.append(ex.source)
y_train.append(ex.labels[0] if ex.labels else "unknown")
print(f"Training samples: {len(X_train)}")
labels = sorted(set(y_train))
print(f"Labels: {labels}")
print("\nBuilding vocabulary...")
vocab_map = build_vocabulary(X_train)
print(f"Vocabulary size: {len(vocab_map)}")
print("\nTraining classifier...")
pipe: Pipeline = Pipeline(
steps=[
("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=max_features, min_df=2)),
("clf", LogisticRegression(max_iter=2000, n_jobs=1, verbose=1)),
]
)
pipe.fit(X_train, y_train)
report: Dict[str, Any] = {
"train_samples": len(X_train),
"vocabulary_size": len(vocab_map),
"labels": labels,
"label_distribution": dict(Counter(y_train)),
}
if val_jsonl:
print("\nEvaluating on validation set...")
X_val: List[str] = []
y_val: List[str] = []
for ex in tqdm(read_jsonl(val_jsonl), desc="Loading val"):
X_val.append(ex.source)
y_val.append(ex.labels[0] if ex.labels else "unknown")
y_val_pred = pipe.predict(X_val)
val_acc = accuracy_score(y_val, y_val_pred)
report["val_accuracy"] = float(val_acc)
report["val_samples"] = len(X_val)
report["val_classification_report"] = classification_report(
y_val, y_val_pred, output_dict=True, zero_division=0
)
print(f"Validation accuracy: {val_acc:.4f}")
if test_jsonl:
print("\nEvaluating on test set...")
X_test: List[str] = []
y_test: List[str] = []
for ex in tqdm(read_jsonl(test_jsonl), desc="Loading test"):
X_test.append(ex.source)
y_test.append(ex.labels[0] if ex.labels else "unknown")
y_test_pred = pipe.predict(X_test)
test_acc = accuracy_score(y_test, y_test_pred)
report["test_accuracy"] = float(test_acc)
report["test_samples"] = len(X_test)
report["test_classification_report"] = classification_report(
y_test, y_test_pred, output_dict=True, zero_division=0
)
print(f"Test accuracy: {test_acc:.4f}")
p = Path(artifacts_path)
p.parent.mkdir(parents=True, exist_ok=True)
joblib.dump({"pipeline": pipe, "labels": labels, "vocab_map": vocab_map}, p)
return pipe, labels, report
Artifacts loading and top-k helper
def load_classifier(artifacts_path: str) -> Tuple[Pipeline, List[str], Optional[Dict[str, int]]]:
obj = joblib.load(artifacts_path)
vocab_map = obj.get("vocab_map")
return obj["pipeline"], obj["labels"], vocab_map
def predict_proba(pipe: Pipeline, text: str) -> Dict[str, float]:
probs = pipe.predict_proba([text])[0]
classes = list(pipe.classes_)
return {cls: float(p) for cls, p in zip(classes, probs)}
def topk(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
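A usage sketch for these helpers, assuming you run from the backend/ directory with a trained artifact present:

# Load the trained artifact bundle and rank the predicted labels.
from app.classifier import load_classifier, predict_proba, topk

pipe, labels, vocab_map = load_classifier("artifacts/tfidf_lr.joblib")
probs = predict_proba(pipe, 'function withdraw() { msg.sender.call{value: amount}(""); }')
for label, p in topk(probs, k=3):
    print(f"{label}: {p:.3f}")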
Key Snippets: backend/app/explain.py
SHAP + LIME integration: wraps the classifier into explainers and returns token importance weights.
LIME explanation wrapper
def lime_explain(pipe: Pipeline, text: str, top_label: str, num_features: int = 20) -> Dict[str, Any]:
classes = list(pipe.classes_)
explainer = lime.lime_text.LimeTextExplainer(class_names=classes)
exp = explainer.explain_instance(text, pipe.predict_proba, num_features=num_features, labels=[classes.index(top_label)])
weights = exp.as_list(label=classes.index(top_label))
return {"label": top_label, "weights": [{"token": t, "weight": float(w)} for t, w in weights]}
SHAP explanation wrapper
def shap_explain(pipe: Pipeline, text: str, top_label: str, max_evals: int = 200) -> Dict[str, Any]:
classes = list(pipe.classes_)
masker = shap.maskers.Text()
explainer = shap.Explainer(pipe.predict_proba, masker, output_names=classes)
sv = explainer([text], max_evals=max_evals)
idx = classes.index(top_label)
tokens = list(sv.data[0])
vals = list(sv.values[0][:, idx])
pairs: List[Tuple[str, float]] = [(t, float(v)) for t, v in zip(tokens, vals)]
pairs = sorted(pairs, key=lambda x: abs(x[1]), reverse=True)[:50]
return {"label": top_label, "weights": [{"token": t, "value": v} for t, v in pairs]}
Key Snippets: backend/app/llm.py
Local Llama2 wrapper: validates model files, loads with transformers, and generates text with CPU guardrails (timeout).
Model file checks + local loading
class Llama2Service:
def __init__(self, model_path: str, device: str = "cpu") -> None:
self.model_path = str(Path(model_path))
self.device = device
# Check if model directory exists
model_dir = Path(self.model_path)
if not model_dir.exists():
raise Llama2ModelError(f"Model directory not found: {self.model_path}")
# Check if required model files exist
config_file = model_dir / "config.json"
if not config_file.exists():
raise Llama2ModelError(f"Model config.json not found in {self.model_path}")
# Check for model weight files
index_file = model_dir / "pytorch_model.bin.index.json"
if index_file.exists():
# Check if all shard files exist
with open(index_file, "r") as f:
index_data = json.load(f)
weight_map = index_data.get("weight_map", {})
shard_files = set(weight_map.values())
missing_shards = []
for shard_file in shard_files:
shard_path = model_dir / shard_file
if not shard_path.exists():
missing_shards.append(shard_file)
if missing_shards:
raise Llama2ModelError(
f"Missing model shard files: {', '.join(missing_shards)}. "
f"Please ensure all model files are present in {self.model_path}"
)
else:
# Check for single model file
single_model_file = model_dir / "pytorch_model.bin"
if not single_model_file.exists():
raise Llama2ModelError(
f"Model weight file not found. Expected either pytorch_model.bin or "
f"pytorch_model.bin.index.json with shard files in {self.model_path}"
)
try:
self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, local_files_only=True)
self.model = AutoModelForCausalLM.from_pretrained(
self.model_path,
local_files_only=True,
torch_dtype=torch.float16 if device != "cpu" else torch.float32,
low_cpu_mem_usage=True,
)
self.model.to(device)
self.model.eval()
if device == "cpu":
try:
import torch.quantization as quantization
self.model = torch.quantization.quantize_dynamic(
self.model,
{torch.nn.Linear},
dtype=torch.qint8
)
except Exception as qe:
pass
except FileNotFoundError as e:
raise Llama2ModelError(
f"Failed to load Llama2 model: {str(e)}. "
f"Please ensure all model files are present in {self.model_path}"
) from e
except Exception as e:
raise Llama2ModelError(f"Failed to load Llama2 model: {str(e)}") from e
CPU speed controls: quantization + max tokens + timeout
if device == "cpu":
try:
import torch.quantization as quantization
self.model = torch.quantization.quantize_dynamic(
self.model,
{torch.nn.Linear},
dtype=torch.qint8
)
except Exception as qe:
pass
def _generate_internal(self, inputs: dict, max_new_tokens: int) -> torch.Tensor:
with torch.inference_mode():
return self.model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
temperature=1.0,
num_beams=1,
pad_token_id=self.tokenizer.eos_token_id,
)
def generate(self, prompt: str, max_new_tokens: int = 50, timeout_seconds: int = 60) -> str:
inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=256)
inputs = {k: v.to(self.device) for k, v in inputs.items()}
max_new_tokens = min(max_new_tokens, 50)
result_container = {"output": None, "error": None}
def generate_with_timeout():
try:
out = self._generate_internal(inputs, max_new_tokens)
result_container["output"] = out
except Exception as e:
result_container["error"] = e
thread = threading.Thread(target=generate_with_timeout)
thread.daemon = True
thread.start()
thread.join(timeout=timeout_seconds)
if thread.is_alive():
raise TimeoutError(f"Llama2 generation exceeded {timeout_seconds} seconds timeout. This is expected on CPU - consider using GPU or disabling Llama2 rationale.")
if result_container["error"]:
raise result_container["error"]
if result_container["output"] is None:
raise TimeoutError(f"Llama2 generation did not complete within {timeout_seconds} seconds")
out = result_container["output"]
text = self.tokenizer.decode(out[0], skip_special_tokens=True)
prompt_text = self.tokenizer.decode(inputs['input_ids'][0], skip_special_tokens=True)
if text.startswith(prompt_text):
text = text[len(prompt_text):].strip()
text = text.strip()
prefixes = ["Rationale:", "rationale:", "Explanation:", "explanation:"]
for prefix in prefixes:
if text.lower().startswith(prefix.lower()):
text = text[len(prefix):].strip()
text = text.lstrip(": -")
return text
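A usage sketch for the wrapper above; the relative model path assumes you run from backend/ with the model directory at the project root (on CPU, expect long generation times or a timeout):

# Generate a short rationale with the local Llama2 model, handling the
# failure modes the service can raise.
from app.llm import Llama2Service, Llama2ModelError

try:
    svc = Llama2Service("../Llama2-7b-hf", device="cpu")
    text = svc.generate(
        "Vulnerability: reentrancy\nExplain why in 2 sentences:\n",
        max_new_tokens=50,
        timeout_seconds=60,
    )
    print(text)
except (Llama2ModelError, TimeoutError) as e:
    print("Llama2 unavailable:", e)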
Key Snippets: backend/app/main.py
FastAPI entrypoint: defines API endpoints, CORS, inference flow, explanations, rationale, and run persistence.
/analyze endpoint (prediction + explanations + rationale)
@app.post("/analyze")
def analyze(
source: Optional[str] = Form(default=None),
file: Optional[UploadFile] = File(default=None),
explain: bool = Form(default=True),
use_llama_rationale: bool = Form(default=False),
max_chars: int = Form(default=12000),
) -> dict:
t0 = time.time()
if source is None and file is None:
return {"error": "Provide source or file"}
if source is None and file is not None:
source = file.file.read().decode("utf-8", errors="replace")
assert source is not None
if max_chars and len(source) > max_chars:
source = source[:max_chars]
pipe, _labels, _vocab = get_classifier()
proba = pipe.predict_proba([source])[0]
classes = list(pipe.classes_)
probs = {cls: float(p) for cls, p in zip(classes, proba)}
top = topk(probs, k=3)
top_label = top[0][0] if top else classes[0]
explanations = {}
if explain:
explanations["lime"] = lime_explain(pipe, source, top_label=top_label)
explanations["shap"] = shap_explain(pipe, source, top_label=top_label)
rationale = None
rationale_error = None
rationale_source = None
if explain and (explanations.get("shap") or explanations.get("lime")):
top_confidence = top[0][1] if top else probs.get(top_label, 0.0)
rationale = generate_template_rationale(
top_label=top_label,
confidence=top_confidence,
shap_explanations=explanations.get("shap"),
lime_explanations=explanations.get("lime"),
)
rationale_source = "template"
if use_llama_rationale:
try:
llm = get_llm()
source_preview = source[:1000]
shap_top = ""
if explain and explanations.get("shap") and explanations["shap"].get("weights"):
top_tokens = sorted(explanations["shap"]["weights"], key=lambda x: abs(x.get("value", 0)), reverse=True)[:3]
shap_top = ", ".join([f"{t['token']}({t['value']:.2f})" for t in top_tokens])
prompt = f"Vulnerability: {top_label}\nCode: {source_preview[:500]}\nExplain why in 2 sentences:\n"
llama_rationale = llm.generate(prompt, max_new_tokens=50, timeout_seconds=60)
rationale = llama_rationale
rationale_source = "llama2"
except TimeoutError as e:
rationale_error = str(e)
except Llama2ModelError as e:
rationale_error = str(e)
except Exception as e:
rationale_error = f"Failed to generate rationale: {str(e)}"
dt = time.time() - t0
run_data = {
"input": {"source_len": len(source)},
"prediction": {"probs": probs, "top": top},
"explanations": explanations,
"metrics": {"runtime_sec": dt},
}
if rationale is not None:
run_data["rationale"] = rationale
if rationale_source:
run_data["rationale_source"] = rationale_source
if rationale_error:
run_data["rationale_error"] = rationale_error
run = save_run(settings.runs_dir, run_data)
return run
Artifact endpoints (/model/info, /training/report, /vocabulary)
@app.get("/model/info")
def model_info() -> dict:
artifacts_path = Path(settings.artifacts_dir) / "tfidf_lr.joblib"
if not artifacts_path.exists():
return {"error": "Model not found. Train a model first."}
pipe, labels, vocab = load_classifier(str(artifacts_path))
return {
"labels": labels,
"num_labels": len(labels),
"vocab_size": len(vocab) if vocab else None,
"model_type": "TF-IDF + LogisticRegression",
}
@app.get("/training/report")
def training_report() -> dict:
report_path = Path(settings.artifacts_dir) / "training_report.json"
if not report_path.exists():
return {"error": "Training report not found. Train a model first."}
return json.loads(report_path.read_text(encoding="utf-8"))
@app.get("/vocabulary")
def vocabulary(limit: int = 1000) -> dict:
vocab_path = Path(settings.artifacts_dir) / "vocabulary.json"
if not vocab_path.exists():
return {"error": "Vocabulary not found. Train a model first."}
vocab = json.loads(vocab_path.read_text(encoding="utf-8"))
items = list(vocab.items())[:limit]
return {
"total_size": len(vocab),
"items": [{"word": word, "index": idx} for word, idx in items],
}
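A quick client-side sketch for these endpoints, assuming the backend is running on localhost:8000 and the requests package is installed:

# Query the artifact endpoints defined above.
import requests

base = "http://localhost:8000"
print(requests.get(f"{base}/model/info").json())           # labels, vocab size
print(requests.get(f"{base}/training/report").json().keys())
print(requests.get(f"{base}/vocabulary", params={"limit": 5}).json())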
Key Snippets: backend/app/rationale.py
Instant template rationale: converts SHAP/LIME top tokens into a human-readable explanation without LLM latency.
Instant template rationale from SHAP/LIME
def generate_template_rationale(
top_label: str,
confidence: float,
shap_explanations: Optional[Dict[str, Any]] = None,
lime_explanations: Optional[Dict[str, Any]] = None,
) -> str:
top_tokens = []
if shap_explanations and shap_explanations.get("weights"):
top_tokens = sorted(
shap_explanations["weights"],
key=lambda x: abs(x.get("value", 0)),
reverse=True
)[:5]
elif lime_explanations and lime_explanations.get("weights"):
top_tokens = sorted(
lime_explanations["weights"],
key=lambda x: abs(x.get("weight", 0)),
reverse=True
)[:5]
confidence_pct = int(confidence * 100)
if top_tokens:
token_descriptions = []
for token_info in top_tokens[:3]:
token = token_info.get("token", "") or token_info.get("text", "")
value = token_info.get("value") or token_info.get("weight", 0)
if abs(value) > 0.01:
token_descriptions.append(f'"{token}"')
tokens_str = ", ".join(token_descriptions) if token_descriptions else "various code patterns"
rationale = (
f"This contract has been classified as {top_label} with {confidence_pct}% confidence. "
f"The key indicators include code patterns like {tokens_str}, which are commonly associated with this vulnerability type. "
f"These patterns suggest potential security risks that should be reviewed and addressed."
)
else:
rationale = (
f"This contract has been classified as {top_label} with {confidence_pct}% confidence. "
f"The classification is based on patterns detected in the code that are characteristic of this vulnerability type. "
f"Please review the code carefully to identify and mitigate potential security risks."
)
return rationale
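Given the definition above, a usage sketch with a hand-made SHAP-style payload (token names and values are illustrative):

# Call the template generator directly with a fake SHAP payload.
shap_payload = {
    "weights": [
        {"token": "call", "value": 0.42},
        {"token": "balances", "value": -0.11},
        {"token": "msg", "value": 0.08},
    ]
}
print(generate_template_rationale("reentrancy", 0.87, shap_explanations=shap_payload))
# -> "This contract has been classified as reentrancy with 87% confidence. ..."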
Key Snippets: backend/scripts/train_classifier.py
CLI training script: trains model, writes artifacts, and saves training report + vocabulary.
Writes training_report.json and vocabulary.json
import argparse
import json
import sys
from pathlib import Path
import joblib
sys.path.insert(0, str(Path(__file__).parent.parent))
from app.classifier import train_multiclass
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--train_jsonl", required=True)
ap.add_argument("--artifacts_dir", required=True)
ap.add_argument("--val_jsonl", default=None)
ap.add_argument("--test_jsonl", default=None)
ap.add_argument("--max_features", type=int, default=50000)
args = ap.parse_args()
artifacts_dir = Path(args.artifacts_dir)
artifacts_dir.mkdir(parents=True, exist_ok=True)
out_path = str(artifacts_dir / "tfidf_lr.joblib")
vocab_path = artifacts_dir / "vocabulary.json"
report_path = artifacts_dir / "training_report.json"
pipe, labels, report = train_multiclass(
args.train_jsonl,
out_path,
max_features=args.max_features,
val_jsonl=args.val_jsonl,
test_jsonl=args.test_jsonl,
)
saved_obj = joblib.load(out_path)
vocab_obj = saved_obj.get("vocab_map", {})
if vocab_obj:
with vocab_path.open("w", encoding="utf-8") as f:
json.dump(vocab_obj, f, indent=2, ensure_ascii=False)
print(f"\nVocabulary saved to: {vocab_path} ({len(vocab_obj)} words)")
else:
print("\nWarning: Vocabulary map not found in saved artifacts")
with report_path.open("w", encoding="utf-8") as f:
json.dump(report, f, indent=2, ensure_ascii=False)
print(f"Training report saved to: {report_path}")
print("\nTraining completed successfully!")
if __name__ == "__main__":
main()
Key Snippets: frontend/src/App.js
Top-level React app: sidebar layout, page routing, backend health polling.
Health polling + routing
const DEFAULT_API_BASE = process.env.REACT_APP_API_BASE || 'http://localhost:8000';
function App() {
const [activePage, setActivePage] = useState('analysis');
const [apiBase, setApiBase] = useState(DEFAULT_API_BASE);
const [health, setHealth] = useState({ status: 'checking', lastCheck: null });
useEffect(() => {
let isMounted = true;
const check = async () => {
try {
const status = await checkHealth(apiBase);
if (isMounted) {
setHealth({ status: status ? 'ok' : 'error', lastCheck: new Date() });
}
} catch (error) {
if (isMounted) {
setHealth({ status: 'error', lastCheck: new Date() });
}
}
};
check();
const interval = setInterval(check, 5000);
return () => {
isMounted = false;
clearInterval(interval);
};
}, [apiBase]);
const renderPage = () => {
switch (activePage) {
case 'analysis':
return <AnalysisPage apiBase={apiBase} health={health} />;
case 'model':
return <ModelPage apiBase={apiBase} health={health} />;
case 'vocabulary':
return <VocabularyPage apiBase={apiBase} health={health} />;
case 'runs':
return <RunsPage apiBase={apiBase} health={health} />;
default:
return <AnalysisPage apiBase={apiBase} health={health} />;
}
};
return (
<div className="app">
<Sidebar
activePage={activePage}
onPageChange={setActivePage}
health={health}
apiBase={apiBase}
onApiBaseChange={setApiBase}
/>
<main className="main-content">
{renderPage()}
</main>
</div>
);
Key Snippets: frontend/src/components/Sidebar.js
Navigation + backend connection status + API base URL input.
Connection indicator + API base URL override
function Sidebar({ activePage, onPageChange, health, apiBase, onApiBaseChange }) {
const pages = [
{ id: 'analysis', name: 'Analysis', description: 'Analyze contracts' },
{ id: 'model', name: 'Model', description: 'Model information' },
{ id: 'vocabulary', name: 'Vocabulary', description: 'View vocabulary' },
{ id: 'runs', name: 'Runs', description: 'Analysis history' },
];
const getStatusColor = () => {
if (health.status === 'ok') return 'ok';
if (health.status === 'error') return 'error';
return '';
};
return (
<aside className="sidebar">
<div className="sidebar-header">
<div className="sidebar-logo">SCXAI</div>
<div className="status-indicator">
<span className={`status-dot ${getStatusColor()}`}></span>
<span>{health.status === 'ok' ? 'Connected' : health.status === 'error' ? 'Offline' : 'Checking'}</span>
</div>
</div>
<nav className="sidebar-nav">
{pages.map((page) => (
<div
key={page.id}
className={`nav-item ${activePage === page.id ? 'active' : ''}`}
onClick={() => onPageChange(page.id)}
>
<div>
<div className="nav-item-name">{page.name}</div>
<div style={{ fontSize: '12px', color: 'var(--color-text-tertiary)', marginTop: '2px' }}>
{page.description}
</div>
</div>
</div>
))}
</nav>
<div className="sidebar-footer">
<div style={{ marginBottom: 'var(--spacing-sm)' }}>
<label style={{ fontSize: '11px', color: 'var(--color-text-tertiary)', display: 'block', marginBottom: '4px' }}>
API Base URL
</label>
<input
type="text"
className="input"
value={apiBase}
onChange={(e) => onApiBaseChange(e.target.value)}
style={{ fontSize: '12px', padding: '6px 8px' }}
/>
</div>
<div style={{ fontSize: '11px', color: 'var(--color-text-tertiary)' }}>
{apiBase}
</div>
</div>
</aside>
Key Snippets: frontend/src/pages/AnalysisPage.js
Main workflow: paste/upload contract, call /analyze, render predictions, rationale, and SHAP/LIME weights.
UI state: toggles + analyze request
function AnalysisPage({ apiBase, health }) {
const [mode, setMode] = useState('paste');
const [source, setSource] = useState('');
const [file, setFile] = useState(null);
const [explain, setExplain] = useState(true);
const [useLlamaRationale, setUseLlamaRationale] = useState(false);
const [maxChars, setMaxChars] = useState(12000);
const [busy, setBusy] = useState(false);
const [result, setResult] = useState(null);
const [error, setError] = useState(null);
const [explanationTab, setExplanationTab] = useState('shap');
const canRun = health.status === 'ok' && (source.trim() || file) && !busy;
const topPrediction = useMemo(() => {
if (!result?.prediction?.top?.[0]) return null;
return result.prediction.top[0];
}, [result]);
const sortedProbs = useMemo(() => {
if (!result?.prediction?.probs) return [];
return Object.entries(result.prediction.probs)
.sort((a, b) => b[1] - a[1])
.slice(0, 10);
}, [result]);
const handleFileChange = (e) => {
const selectedFile = e.target.files?.[0];
if (selectedFile) {
setFile(selectedFile);
setMode('upload');
setSource('');
}
};
const handleAnalyze = async () => {
setBusy(true);
setError(null);
setResult(null);
try {
const data = await analyzeContract(apiBase, {
source: mode === 'paste' ? source : undefined,
file: mode === 'upload' ? file : undefined,
explain,
useLlamaRationale,
maxChars,
});
setResult(data);
} catch (err) {
setError(err.message || 'Analysis failed');
} finally {
setBusy(false);
}
};
Explanation toggles (SHAP/LIME + Llama2)
<div style={{ marginTop: 'var(--spacing-lg)', paddingTop: 'var(--spacing-lg)', borderTop: '1px solid var(--color-border)' }}>
<div style={{ marginBottom: 'var(--spacing-md)' }}>
<div className="flex items-center justify-between" style={{ marginBottom: 'var(--spacing-sm)' }}>
<label className="text-primary" style={{ fontWeight: 600 }}>
SHAP & LIME Explanations
</label>
<div
className={`toggle ${explain ? 'active' : ''}`}
onClick={() => !busy && setExplain(!explain)}
>
<div className="toggle-thumb"></div>
</div>
</div>
<div className="text-secondary" style={{ fontSize: '13px' }}>
Generate token-level importance scores for predictions
</div>
</div>
<div>
<div className="flex items-center justify-between" style={{ marginBottom: 'var(--spacing-sm)' }}>
<label className="text-primary" style={{ fontWeight: 600 }}>
Llama2 Rationale
</label>
<div
className={`toggle ${useLlamaRationale ? 'active' : ''}`}
onClick={() => !busy && setUseLlamaRationale(!useLlamaRationale)}
>
<div className="toggle-thumb"></div>
</div>
</div>
<div className="text-secondary" style={{ fontSize: '13px' }}>
Generate natural language explanation (CPU intensive - may take 1-3 minutes)
</div>
</div>
Rationale / error rendering
{(result.rationale || result.rationale_error) && (
<div className="card" style={{ marginTop: 'var(--spacing-lg)' }}>
<div className="card-header">
<div>
<h3 className="card-title">AI Explanation (Llama2)</h3>
<div className="card-subtitle">
{result.rationale_error
? 'Llama2 model encountered an error'
: 'Human-readable explanation generated by Llama2'}
</div>
</div>
</div>
{result.rationale_error ? (
<div style={{ padding: 'var(--spacing-md)' }}>
<div className="alert alert-error">
<div style={{ fontWeight: 600, marginBottom: 'var(--spacing-sm)', fontSize: '15px' }}>
⚠️ Llama2 Model Not Available
</div>
<div style={{ fontSize: '14px', lineHeight: 1.7, marginBottom: 'var(--spacing-md)' }}>
{result.rationale_error}
</div>
<div style={{
paddingTop: 'var(--spacing-md)',
borderTop: '1px solid var(--color-border)',
fontSize: '13px',
color: 'var(--color-text-secondary)',
lineHeight: 1.6
}}>
<strong>Note:</strong> The analysis completed successfully! You can still use the SHAP/LIME explanations shown above, which provide excellent insights into why the vulnerability was detected. The Llama2 feature is optional and requires all model files to be present.
</div>
</div>
</div>
) : result.rationale ? (
<div
style={{
padding: 'var(--spacing-lg)',
background: 'var(--color-surface-elevated)',
borderRadius: 'var(--radius-md)',
lineHeight: 1.8,
fontSize: '15px',
color: 'var(--color-text-primary)',
whiteSpace: 'pre-wrap',
fontFamily: '-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif',
}}
>
{result.rationale}
</div>
) : null}
</div>
Key Snippets: frontend/src/services/api.js
Centralized fetch wrappers for backend endpoints with friendly network error messages.
Backend health check + analyze request wiring
const DEFAULT_API_BASE = process.env.REACT_APP_API_BASE || 'http://localhost:8000';
export async function checkHealth(apiBase = DEFAULT_API_BASE) {
try {
const response = await fetch(`${apiBase}/health`);
return response.ok;
} catch (error) {
return false;
}
}
export async function analyzeContract(apiBase, { source, file, explain = true, useLlamaRationale = false, maxChars = 12000 }) {
const formData = new FormData();
if (file) {
formData.append('file', file);
} else if (source) {
formData.append('source', source);
}
formData.append('explain', String(explain));
formData.append('use_llama_rationale', String(useLlamaRationale));
formData.append('max_chars', String(maxChars));
try {
const response = await fetch(`${apiBase}/analyze`, {
method: 'POST',
body: formData,
});
if (!response.ok) {
const error = await response.json().catch(() => ({ error: `HTTP ${response.status}: ${response.statusText}` }));
throw new Error(error.error || `HTTP ${response.status}: ${response.statusText}`);
}
return await response.json();
} catch (error) {
// Re-throw with more context if it's a network error
if (error instanceof TypeError && error.message.includes('fetch')) {
throw new Error(`Failed to connect to backend at ${apiBase}. Make sure the backend server is running.`);
}
throw error;
}
Runs export PDF URL helper
export async function getRun(apiBase, runId) {
const response = await fetch(`${apiBase}/runs/${runId}`);
if (!response.ok) {
const error = await response.json().catch(() => ({ error: `HTTP ${response.status}` }));
throw new Error(error.error || `HTTP ${response.status}`);
}
return await response.json();
}
export function getRunPdfUrl(apiBase, runId) {
return `${apiBase}/runs/${runId}/pdf`;
}
Files (what each does)
All files below are part of the project. Only key files have code snippets above.
backend/app/__init__.py - Package marker for the backend app.
backend/app/classifier.py - TF-IDF + LogisticRegression: training, vocabulary map generation, reports, and model loading helpers.
See the key snippets section: backend/app/classifier.py
backend/app/dataset.py - JSONL reader/writer and schema helpers for contract examples.
backend/app/explain.py - SHAP + LIME integration: wraps the classifier into explainers and returns token importance weights.
See the key snippets section: backend/app/explain.py
backend/app/labels.py - Label normalization used during dataset conversion.
backend/app/llm.py - Local Llama2 wrapper: validates model files, loads with transformers, and generates text with CPU guardrails (timeout).
See the key snippets section: backend/app/llm.py
backend/app/main.py - FastAPI entrypoint: defines API endpoints, CORS, inference flow, explanations, rationale, and run persistence.
See the key snippets section: backend/app/main.py
backend/app/rationale.py - Instant template rationale: converts SHAP/LIME top tokens into a human-readable explanation without LLM latency.
See the key snippets section: backend/app/rationale.py
backend/app/report.py - PDF report generation and run loading utility.
backend/app/runs.py - Persists each analysis run to runs/<uuid>.json.
backend/app/settings.py - Central settings: resolves project-root paths for the Llama2 model dir, artifacts, and runs.
backend/artifacts/training_report.json - Training metrics output (accuracy, reports, label distribution).
backend/artifacts/vocabulary.json - Vocabulary (word -> index) generated from training data.
backend/LLAMA2_PERFORMANCE.md - Notes about CPU performance and constraints for Llama2 rationale generation.
backend/requirements.txt - Pinned Python dependencies for FastAPI, the ML pipeline, SHAP/LIME, and local Llama2 loading.
backend/scripts/convert_sc_csv.py - Converts the provided CSV datasets into processed JSONL splits used for training/eval.
backend/scripts/prepare_dataset.py - Dataset preparation helper (JSONL writing) for general workflows.
backend/scripts/train_classifier.py - CLI training script: trains the model, writes artifacts, and saves the training report + vocabulary.
See the key snippets section: backend/scripts/train_classifier.py
data/README.md - Defines the unified JSONL schema used by the backend training pipeline.
examples/README.md - How to get Solidity code to test in the UI; includes sources like GitHub/Etherscan.
examples/sample_contract.sol - Sample Solidity code for quick testing in the Analysis page.
frontend/package-lock.json - Exact npm dependency lockfile.
frontend/package.json - Frontend dependencies and scripts.
frontend/public/index.html - React HTML template.
frontend/src/App.css - Main UI system styles (cards, typography, layout, toggles, loading states).
frontend/src/App.js - Top-level React app: sidebar layout, page routing, backend health polling.
See the key snippets section: frontend/src/App.js
frontend/src/components/Sidebar.js - Navigation + backend connection status + API base URL input.
See the key snippets section: frontend/src/components/Sidebar.js
frontend/src/index.css - Global CSS and theme variables.
frontend/src/index.js - React entry: mounts <App/>.
frontend/src/pages/AnalysisPage.js - Main workflow: paste/upload contract, call /analyze, render predictions, rationale, and SHAP/LIME weights.
See the key snippets section: frontend/src/pages/AnalysisPage.js
frontend/src/pages/ModelPage.js - Calls /model/info and displays model type, label count, vocab size, and the label list.
frontend/src/pages/RunsPage.js - Calls /runs and /runs/{id} to browse and reopen analysis history.
frontend/src/pages/VocabularyPage.js - Calls /vocabulary and displays the word map.
frontend/src/services/api.js - Centralized fetch wrappers for backend endpoints with friendly network error messages.
See the key snippets section: frontend/src/services/api.js
README.md - Top-level project overview and the main commands to convert datasets, train the classifier, and run backend/frontend.
Each file below in runs/ is a saved analysis output (one JSON per run) containing prediction probabilities, optional explanations, runtime, and rationale fields:
runs/1d6faf53-7b39-4df2-ad35-625c03765061.json
runs/25c41055-eb2d-483e-a980-93545f2d8a39.json
runs/2e7bffe5-0ca3-4214-8251-e577e5bafd83.json
runs/316275c0-50b8-43ee-9580-8ffa9a238234.json
runs/39314a8d-d4c0-4417-993d-b66f2004ad24.json
runs/4539ada5-ce07-400e-91d4-80db814b89ae.json
runs/463667d9-ed8e-4085-acb1-4af4ace5cf66.json
runs/53c8f704-43e6-4f74-bb81-e9891e9c0efe.json
runs/5788a643-275c-4b6a-8f7c-fa99b154ffec.json
runs/70f62db3-9648-413a-9011-a33ac73ce44e.json
runs/7d97c7ae-6512-4980-9445-cf632a1409bb.json
runs/7f5cd82e-6a29-4a1c-ae0c-f17fdf644e3e.json
runs/8810c98e-f38b-45a5-b2c5-ab077faaac39.json
runs/8f19caf4-2258-44e9-99c7-dd6b7a3df255.json
runs/980610b8-13fe-471f-9465-e1243884c306.json
runs/bf8daf6d-0fde-4e74-8799-12466be40de3.json
runs/cb293717-ce7f-4156-aeee-3f7dddac4fb3.json
runs/def29c4f-dbb1-45a7-ae2d-d06e67f34959.json
runs/e745accc-1790-4345-93fc-a7608d4ea19c.json
runs/f03fe598-d656-4941-8d59-608d8384d895.json
runs/f1867f7b-396c-4a83-9429-03f580a2577a.json
runs/f55f293b-c82c-46e5-a6b4-773b9131e2e7.json
runs/fc177b50-b8c7-461d-905d-fdf5fa8e84a6.json