Skip to content

Latest commit

 

History

History
250 lines (181 loc) · 8.97 KB

File metadata and controls

250 lines (181 loc) · 8.97 KB

CLAUDE.md

This file provides guidance to AI coding agents (Claude Code, Codex, Gemini CLI, and similar) when working with code in this repository. The content is intentionally agent-agnostic; only the filename follows the CLAUDE.md convention.

Project Overview

sgraph is a Python library for working with hierarchic graph structures, typically used for representing software architectures. It provides data formats, structures, and algorithms for analyzing and manipulating software dependency graphs.

The library is maintained by Softagram and is used for building information models about analyzed software. See also sgraph-mcp-server for AI agent integration.

Repository Model (Fork-Based Contribution)

The canonical repository is softagram/sgraph. Contributors are expected to work from personal forks using the standard fork-and-PR model.

A typical working clone has two remotes:

Remote Purpose Example URL
origin Your personal fork (push target for feature branches) git@github.com:<you>/sgraph.git
upstream The canonical Softagram repository (source of truth for main) git@github.com:softagram/sgraph.git

Important for coding agents: when checking whether a branch's commits are already merged, always compare against upstream/main, not just origin/main. A contributor's fork may lag behind upstream, so commits that appear "unmerged" relative to origin/main can in fact already be live in the canonical repo.

# Always fetch both before reasoning about merge state
git fetch origin main
git fetch upstream main

# Are this branch's commits in the canonical main?
git rev-list --left-right --count upstream/main...HEAD
# → "N 0" means all HEAD commits are already in upstream/main (safe to switch/delete branch)

New contributors: fork softagram/sgraph on GitHub, clone your fork as origin, and add softagram/sgraph as upstream. Open PRs from feature branches on your fork against softagram/sgraph:main.

Development Commands

Environment Setup

# Install dependencies
pip install -r requirements.txt

# Install dev dependencies (includes yapf, flake8)
pip install -r requirements-dev.txt

# Install package in development mode
pip install -e .

Testing

# Run all tests
pytest

# Run specific test file
pytest tests/sgraph_test.py

# Run specific test
pytest tests/sgraph_test.py::test_deepcopy

Linting

# Run flake8 (max line length: 100)
flake8 src/

# Format code with yapf (config in .style.yapf)
yapf -r -i src/

Building

# Build distribution packages
python3 setup.py sdist bdist_wheel

Releasing

The project uses an automated release script to streamline the release process:

# Auto-bump patch version (1.1.1 -> 1.1.2)
python scripts/release.py --bump patch

# Auto-bump minor version (1.1.1 -> 1.2.0)
python scripts/release.py --bump minor

# Auto-bump major version (1.1.1 -> 2.0.0)
python scripts/release.py --bump major

# Release specific version
python scripts/release.py --version 1.2.0

# Dry run to preview changes
python scripts/release.py --bump patch --dry-run

The script automates the full release workflow: version bumping, branch creation, PR creation (requires gh CLI), tagging, building, PyPI upload (requires twine), and GitHub release creation with auto-generated release notes from merged PRs. See scripts/README.md for setup instructions.

Core Architecture

Data Model

The sgraph data model consists of three primary classes:

  1. SGraph (sgraph.py): The top-level model container

    • Contains a root SElement
    • Manages model-level attributes via modelAttrs and metaAttrs
    • Provides serialization to XML, deps, and other formats
    • Entry point for parsing models from files
  2. SElement (selement.py): Hierarchical graph nodes

    • Represents elements in a tree structure (like directories/files)
    • Each element has a name, parent, children (stored in both list and dict)
    • Contains incoming and outgoing lists of associations (edges)
    • Stores attributes as key-value pairs in attrs dict
    • Uses __slots__ for memory efficiency
  3. SElementAssociation (selementassociation.py): Edges between elements

    • Represents directed relationships between elements
    • Has fromElement, toElement, and deptype (dependency type)
    • Stores additional attributes in attrs dict
    • Must call initElems() to register with connected elements
    • Use create_unique_element_association() to avoid duplicates

High-Level APIs

  • ModelApi (modelapi.py): Query and traverse models

    • getElementByPath(): Find elements by path
    • getElementsByName(): Find all elements by name
    • getCalledFunctions(): Get function call relationships
    • Various filtering and traversal utilities
  • MetricsApi (metricsapi.py): Extract metrics from models

    • get_total_loc_metrics(): Lines of code metrics
    • get_total_tech_debt_metrics(): Tech debt analysis

Module Structure

  • algorithms/: Graph algorithms (metrics, filtering, generalization, analysis)
  • converters/: Format converters (XML, deps, GraphML, JSON, PlantUML, DOT, CytoscapeJS, SBOM)
  • compare/: Model comparison and diff functionality (rename detection, similarity analysis)
  • loader/: Model loading utilities
  • cli/: Command-line interface utilities
  • attributes/: Attribute query and management
  • cypher.py: Cypher query language support via sPyCy (optional dependency: spycy-aneeshdurg)

Data Formats

The library supports multiple serialization formats:

  1. XML Format: High-performance format for large models (10M+ elements)

    • Uses integer IDs for element references
    • Minimalist syntax: <e n="name"> for elements, <r r="id" t="type"> for relationships
    • Model version 2.1
  2. Deps Format: Line-based text format for simple scripting

    • Format: /path/to/from:/path/to/to:dependency_type
    • Easy to read and script, but not recommended for very large models
  3. Other formats: GraphML, JSON, PlantUML, DOT, CytoscapeJS

Key Patterns

  • Path-based element access: Elements are identified by hierarchical paths (e.g., /nginx/src/core/nginx.c)
  • Lazy element creation: createOrGetElementFromPath() creates elements on-demand
  • Element merging: Duplicate elements under the same parent are merged
  • Association initialization: Associations must call initElems() to register with their connected elements
  • Type-based filtering: Elements and associations support type attributes for categorization

Common Workflows

Creating a Model

from sgraph import SGraph, SElement, SElementAssociation

# Create empty model
model = SGraph(SElement(None, ''))

# Create elements from paths
e1 = model.createOrGetElementFromPath('/path/to/file.x')
e2 = model.createOrGetElementFromPath('/path/to/file.y')

# Create association
ea = SElementAssociation(e1, e2, 'use')
ea.initElems()  # Must call to register association

# Serialize
model.to_xml('output.xml')
model.to_deps('output.txt')

Loading and Querying

from sgraph import SGraph, ModelApi

# Load model
model = SGraph.parse_xml_or_zipped_xml('model.xml')

# Query with ModelApi
api = ModelApi(model=model)
element = api.getElementByPath('/some/path')
elements = api.getElementsByName('nginx.c')

Querying with Cypher (requires spycy-aneeshdurg)

from sgraph import SGraph
from sgraph.cypher import cypher_query

model = SGraph.parse_xml_or_zipped_xml('model.xml')
results = cypher_query(model, '''
    MATCH (a:file)-[:imports]->(b)
    RETURN a.name, b.name
''')

CLI: python -m sgraph.cypher model.xml.zip [query] — supports interactive REPL and 11 output formats (table, csv, tsv, json, jsonl, xml, deps, dot, plantuml, graphml, cytoscape). See docs/cypher.md for full documentation.

Comparing Models

from sgraph.compare.modelcompare import ModelCompare
from sgraph.compare.compareutils import SLIDING_WINDOW_ATTRS

mc = ModelCompare()

# Basic comparison from files
compare_model = mc.compare('model_a.xml', 'model_b.xml')

# Exclude noisy time-windowed metrics (author counts, commit counts, etc.)
compare_model = mc.compare('model_a.xml', 'model_b.xml', exclude_attrs=SLIDING_WINDOW_ATTRS)

# Compare in-memory models
compare_model = mc.compareModels(model1, model2, exclude_attrs={'commit_count_30', 'author_list_7'})

# Inspect results
mc.printCompareInfos(compare_model)

The exclude_attrs parameter accepts a set of attribute names to ignore during comparison. Use SLIDING_WINDOW_ATTRS as a preset to suppress time-windowed metric noise (author/commit/bug counts at various time windows).

File Locations

  • Source code: src/sgraph/
  • Tests: tests/
  • Automation scripts: scripts/ (includes release.py)
  • Package metadata: setup.cfg, setup.py
  • Documentation: README.md, releasing.md, CLAUDE.md
  • Graph conventions: docs/graph-conventions.md
  • Cypher query docs: docs/cypher.md