GitHub Profile Parser

Extract GitHub profile details, organizations, pinned repositories, and contribution stats from the HTML of a profile page.

Overview

This parser extracts four groups of data from a GitHub profile: personal information, organization memberships, pinned repositories, and contribution statistics. It suits developer portfolios, recruitment tools, and open-source analytics.

Provide the raw HTML of any public GitHub profile and the parser returns structured JSON.

Key Takeaways

  • Convert any public GitHub profile HTML into structured JSON covering people, organizations, and activity.
  • Feed recruiting dashboards or engineering analytics pipelines with a consistent schema across profiles.
  • Combine the GitHub parser with social parsers to build multi-channel developer intelligence.

Key Data Outputs

Profile Information

Extract core profile details including:

  • Avatar — Profile picture URL
  • Bio — User biography text
  • Display Name — Full name displayed on the profile
  • Username — GitHub handle
  • Social — Array of connected social accounts with handle and URL

Organizations

Get a list of organizations the user belongs to, with each organization containing:

  • Avatar — Organization logo URL
  • Name — Organization name
  • URL — Link to the organization profile

Pinned Repositories

Access the user's showcased projects with detailed information:

  • Name — Repository name
  • Owner — Repository owner username
  • Description — Project description
  • Language — Primary programming language
  • Stars — Star count (integer)
  • Forks — Fork count (integer)
  • URL — Direct link to the repository

Contribution Statistics

Track developer activity with key metrics returned as integers:

  • Followers — Number of followers
  • Following — Number of accounts followed
  • Repositories Count — Total public repositories
  • Stars Count — Total stars received across all repositories
  • Contributions Last Year — Number of contributions in the past year

How It Works

  1. Collect the HTML of a public GitHub profile via your crawler or fetch pipeline.
  2. Send the HTML payload to Parseium's github-profile endpoint with your API key.
  3. Receive normalized JSON that mirrors the schema below and feed it into your systems.

Implementation Steps

  1. Capture profile HTML on the cadence you need (e.g., daily for active candidates).
  2. Call the Parseium API and persist the JSON to your datastore or queue.
  3. Pipe the data into dashboards, alerts, or enrichment workflows for downstream teams.
  4. Monitor responses for pinned repository changes to trigger follow-up actions.
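
Steps 1 and 2 above could be sketched roughly as follows. The endpoint, the `X-API-Key` header, and the `{ html }` body are taken from the API call example in this document; the helper names (`profileUrl`, `buildParseRequest`, `captureAndParse`) and the injected `fetchFn` parameter are illustrative assumptions, not part of Parseium's API.

```typescript
// Minimal response and fetch shapes so the sketch does not depend on DOM typings.
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<HttpResponse>;

interface ParseRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Public profile pages live at https://github.com/<username>.
function profileUrl(username: string): string {
  return `https://github.com/${encodeURIComponent(username)}`;
}

// Builds the POST request used in this document's API call example.
function buildParseRequest(html: string, apiKey: string): ParseRequest {
  return {
    url: 'https://api.parseium.com/v1/parse/github-profile',
    init: {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ html }),
    },
  };
}

// Step 1: capture the profile HTML. Step 2: send it to the parser.
// fetchFn is injected (hypothetical design choice) so a crawler can replace it.
async function captureAndParse(
  username: string,
  apiKey: string,
  fetchFn: FetchLike
): Promise<unknown> {
  const page = await fetchFn(profileUrl(username));
  if (!page.ok) throw new Error(`Profile fetch failed: ${page.status}`);
  const request = buildParseRequest(await page.text(), apiKey);
  const parsed = await fetchFn(request.url, request.init);
  if (!parsed.ok) throw new Error(`Parse request failed: ${parsed.status}`);
  return parsed.json();
}
```

Injecting `fetchFn` keeps the request-building logic testable without network access and lets an existing crawler or fetch pipeline stand in for a plain HTTP request.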

Best Practices

  • Pair profile insights with repository metadata or traffic data for richer context.
  • Refresh contributions weekly or monthly to track velocity over time.
  • Cross-reference organizations with your CRM to spot shared connections.
  • Use Parseium automations to notify recruiters when stars or followers spike.
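
As one way to reason about "spikes", the sketch below compares two stored `stats` snapshots locally and flags metrics whose relative growth crosses a threshold. It does not use any Parseium automation feature; `detectSpikes`, the `StatsSnapshot` type, and the 20% default threshold are hypothetical choices made for illustration.

```typescript
// Subset of the "stats" object from the parser response.
interface StatsSnapshot {
  followers: number;
  stars_count: number;
}

interface Spike {
  metric: keyof StatsSnapshot;
  previous: number;
  current: number;
  growth: number; // relative growth, e.g. 0.3 for +30%
}

// Flags metrics whose relative growth since the previous snapshot meets the threshold.
// The 20% default is an arbitrary illustration, not a Parseium setting.
function detectSpikes(
  previous: StatsSnapshot,
  current: StatsSnapshot,
  threshold = 0.2
): Spike[] {
  const metrics: Array<keyof StatsSnapshot> = ['followers', 'stars_count'];
  const spikes: Spike[] = [];
  for (const metric of metrics) {
    const before = previous[metric];
    const after = current[metric];
    // Divide by at least 1 so a profile starting from zero does not divide by zero.
    const growth = (after - before) / Math.max(before, 1);
    if (growth >= threshold) {
      spikes.push({ metric, previous: before, current: after, growth });
    }
  }
  return spikes;
}
```

For example, going from 100 to 130 followers is 30% growth and would be flagged at the default threshold, while going from 50 to 52 stars (4%) would not.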

Use Cases

  • Build developer portfolios with live GitHub data.
  • Create recruitment and talent discovery platforms.
  • Analyze open-source contributions for security or compliance reviews.
  • Monitor developer engagement and project popularity for community teams.

Next Steps

Review the Parseium docs for authentication details, rate limits, and parser options. When you are ready, route high-intent profiles into the Prebuilt CTA workflow to convert visitors into trials or demos.

JSON Response

{
  "content": {
    "organizations": [
      {
        "avatar": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      }
    ],
    "pinned_repositories": [
      {
        "description": {
          "type": "string"
        },
        "forks": {
          "format": "int",
          "type": "number"
        },
        "language": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "owner": {
          "type": "string"
        },
        "stars": {
          "format": "int",
          "type": "number"
        },
        "url": {
          "type": "string"
        }
      }
    ]
  },
  "profile": {
    "avatar": {
      "type": "string"
    },
    "bio": {
      "type": "string"
    },
    "display_name": {
      "type": "string"
    },
    "social": [
      {
        "handle": {
          "type": "string"
        },
        "url": {
          "type": "string"
        }
      }
    ],
    "username": {
      "type": "string"
    }
  },
  "stats": {
    "contributions_last_year": {
      "format": "int",
      "type": "number"
    },
    "followers": {
      "format": "int",
      "type": "number"
    },
    "following": {
      "format": "int",
      "type": "number"
    },
    "repositories_count": {
      "format": "int",
      "type": "number"
    },
    "stars_count": {
      "format": "int",
      "type": "number"
    }
  }
}
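
The listing above describes field types rather than sample values. Assuming an actual response carries plain values in the same shape (an assumption, since no sample payload is shown here), it could be typed in TypeScript roughly as below. The interface names and the `isGitHubProfileResponse` guard are hypothetical, and the guard only spot-checks a few fields; whether fields can be missing or null is not stated in this document.

```typescript
interface Organization {
  avatar: string;
  name: string;
  url: string;
}

interface PinnedRepository {
  description: string;
  forks: number;
  language: string;
  name: string;
  owner: string;
  stars: number;
  url: string;
}

interface SocialAccount {
  handle: string;
  url: string;
}

interface GitHubProfileResponse {
  content: {
    organizations: Organization[];
    pinned_repositories: PinnedRepository[];
  };
  profile: {
    avatar: string;
    bio: string;
    display_name: string;
    social: SocialAccount[];
    username: string;
  };
  stats: {
    contributions_last_year: number;
    followers: number;
    following: number;
    repositories_count: number;
    stars_count: number;
  };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Shallow runtime check: verifies the three top-level groups and a few
// representative fields, not every property.
function isGitHubProfileResponse(value: unknown): value is GitHubProfileResponse {
  if (!isRecord(value)) return false;
  const content = value.content;
  const profile = value.profile;
  const stats = value.stats;
  if (!isRecord(content) || !isRecord(profile) || !isRecord(stats)) return false;
  const followers = stats.followers;
  return (
    Array.isArray(content.organizations) &&
    Array.isArray(content.pinned_repositories) &&
    typeof profile.username === 'string' &&
    typeof followers === 'number' &&
    Math.floor(followers) === followers
  );
}
```

Running the guard on the result of `res.json()` before using it narrows the value from `unknown` to `GitHubProfileResponse`, so a malformed payload is caught at the boundary rather than deep in downstream code.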

API Call (TypeScript)

// TypeScript example: call the GitHub profile pre-built parser
// Set PARSIUM_API_KEY in your environment (e.g. .env.local)

async function run() {
  const parserName = 'github-profile';
  // Provide the raw HTML of a public GitHub profile page
  const html = '<!doctype html>...';

  const res = await fetch(`https://api.parseium.com/v1/parse/${parserName}`, {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.PARSIUM_API_KEY ?? '',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ html }),
  });

  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }

  const data = await res.json();
  console.log(data);
}

run().catch(console.error);

FAQ

Do I need GitHub API access for this parser?

No. The parser works entirely on public HTML, so you do not need GitHub API tokens, and the only limits to manage are those of your Parseium plan.

Does the response include private or hidden profile data?

Only public information is captured. Private repositories, hidden contributions, and non-public metrics remain inaccessible by design.

How often should I refresh a GitHub profile?

Most teams refresh weekly, while high-priority candidates might be checked daily. Choose a cadence that balances freshness with your Parseium quota.

Can I track changes to pinned repositories?

Yes. Store previous responses and diff the pinned_repositories arrays to alert recruiters or customer success managers (CSMs) when featured projects change.
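
A minimal sketch of such a diff is shown below. It keys each pinned repository by `owner/name`, using the `owner` and `name` fields from the response schema; the `diffPinned` and `repoKey` helpers and the `PinnedDiff` shape are hypothetical names chosen for this example.

```typescript
// Only the fields needed to identify a pinned repository.
interface PinnedRepoRef {
  owner: string;
  name: string;
}

interface PinnedDiff {
  added: string[]; // "owner/name" keys present only in the current response
  removed: string[]; // "owner/name" keys present only in the previous response
}

function repoKey(repo: PinnedRepoRef): string {
  return `${repo.owner}/${repo.name}`;
}

// Compares two pinned_repositories arrays by key and reports what changed.
function diffPinned(previous: PinnedRepoRef[], current: PinnedRepoRef[]): PinnedDiff {
  const before = previous.map(repoKey);
  const after = current.map(repoKey);
  return {
    added: after.filter((key) => before.indexOf(key) === -1),
    removed: before.filter((key) => after.indexOf(key) === -1),
  };
}
```

An alert would then fire whenever `added` or `removed` is non-empty. Reordering of pins is deliberately ignored in this sketch; comparing the two key arrays position by position would catch that case as well.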

Conclusion

The GitHub Profile Parser converts unstructured profile pages into structured JSON for recruiting, product analytics, and community monitoring. Use it alongside other Parseium parsers to centralize developer intelligence.
